Thursday, January 27, 2011

Contexts in legacy filters and FSRTL_ADVANCED_FCB_HEADER

I've seen some questions about how a legacy filter can implement contexts similar to the ones fltmgr provides for a minifilter.

So what is a context? A context is a structure that is owned by some system component (in our case a filter, legacy or mini) and that is associated with some other structure. In a very general way, a context is a "value" and the object that it is associated with is a "key". In general, contexts are necessary when the flow of execution is controlled by a component other than the one that implements the actual code (for example for callbacks, services and library functions, where the code is provided by the library or service, but when the code is called depends on something else).

Anyway, because a context is simply a key-value pair, anyone can implement a generic context mechanism by using hashes, and this allows great flexibility in what one can attach a context to. For example, one can associate a context with a thread or a logged-on user or even a sector on a volume if they feel so inclined. One issue with this approach is how to know when the underlying object is released so that the context can be released as well. For example, suppose a context is associated with thread 128, then thread 128 terminates, and at some later point another thread is created with the same ID of 128. Clearly the context should have been released, since it no longer refers to the same underlying object, but unless the entity implementing the context is notified that thread 128 was terminated, it won't know to release it.
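
To make the key/value idea a bit more concrete, here is a minimal user-mode sketch in C of a generic context table. It is only an illustration of the idea, not kernel code: all the names (ctx_table, ctx_insert, ctx_lookup, ctx_remove) are made up for this example, the hash is a trivial modulo, and there is no locking, which a real filter would of course need.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CTX_BUCKETS 64u

/* One key/value pair: the key identifies the object, the value is the context. */
typedef struct ctx_entry {
    uintptr_t key;
    void *context;
    struct ctx_entry *next;   /* chaining for keys that share a bucket */
} ctx_entry;

/* Hypothetical table type; zero-initialize it before first use. */
typedef struct ctx_table {
    ctx_entry *buckets[CTX_BUCKETS];
} ctx_table;

unsigned ctx_bucket(uintptr_t key)
{
    return (unsigned)(key % CTX_BUCKETS);
}

/* Returns the context stored for key, or NULL if there is none. */
void *ctx_lookup(const ctx_table *t, uintptr_t key)
{
    assert(t != NULL);
    for (const ctx_entry *e = t->buckets[ctx_bucket(key)]; e != NULL; e = e->next) {
        if (e->key == key)
            return e->context;
    }
    return NULL;
}

/* Returns 0 on success, -1 if the key already has a context or on allocation failure. */
int ctx_insert(ctx_table *t, uintptr_t key, void *context)
{
    assert(t != NULL && context != NULL);
    if (ctx_lookup(t, key) != NULL)
        return -1;            /* possibly a stale context left by a previous owner of the key */
    ctx_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    unsigned b = ctx_bucket(key);
    e->key = key;
    e->context = context;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return 0;
}

/* Must be called when the underlying object goes away.
   Returns the context so the caller can free it, or NULL if the key was not found. */
void *ctx_remove(ctx_table *t, uintptr_t key)
{
    assert(t != NULL);
    for (ctx_entry **link = &t->buckets[ctx_bucket(key)]; *link != NULL; link = &(*link)->next) {
        if ((*link)->key == key) {
            ctx_entry *e = *link;
            void *context = e->context;
            *link = e->next;
            free(e);
            return context;
        }
    }
    return NULL;
}
```

In the thread 128 scenario, if nobody calls ctx_remove when the first thread 128 terminates, a later ctx_lookup for the new thread 128 hands back the old thread's context and ctx_insert for it fails. The table itself has no way to tell the two threads apart, which is exactly the lifetime problem described above.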

So returning to filters, filter manager offers support for the following types of contexts (at least, these are the ones that are typically interesting; the other contexts can usually be implemented fairly easily by legacy filters): Streams, StreamHandles and Files. Let's look at how each of these contexts can be implemented. These are just examples of how it could be done with little support from the OS, and they are definitely not the best way to do it… I'll address that after this section.

StreamHandle contexts

In terms of implementation in a legacy filter, the StreamHandle context is probably the easiest, since the key is the FILE_OBJECT and the time to remove the context is during IRP_MJ_CLOSE. The context can be created either when the filter processes the IRP_MJ_CREATE or the first time the filter sees the FILE_OBJECT in an operation. Because of stream file objects, the filter can't assume that it will always see an IRP_MJ_CREATE for each FILE_OBJECT, so it must always be prepared to get a FILE_OBJECT that it hasn't seen an IRP_MJ_CREATE for.

Stream contexts

The key for this type of context is the SCB, so whatever the FILE_OBJECT->FsContext member points to is a good key. Unfortunately, FILE_OBJECT->FsContext is not initialized until the file system processes the IRP_MJ_CREATE and opens the stream on disk, which means that a Stream context isn't available in preCreate (the same restriction as for minifilters). The more complicated part is how to know when the SCB is freed by the file system. One way to do this is to keep track of all the FILE_OBJECTs the filter is interested in that reference that SCB, and then, when the last of those FILE_OBJECTs is processing its IRP_MJ_CLOSE, free the context associated with the SCB. This is a bit more complicated than in the StreamHandle case, but not much more so. One notable thing is that since the SCB is a structure that belongs to the file system, it is possible that some file system is implemented in such a way that the address of the SCB changes throughout the lifetime of an SCB (for example, the FS could copy the SCB to a different memory location under some circumstances). I haven't seen this in practice and there may be other issues with it (since the OS uses some fields in the FSRTL_COMMON_FCB_HEADER), but I haven't seen anything definitive that disallows it either.
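
The bookkeeping described above (track the FILE_OBJECTs per SCB, free the stream context when the last one is closed) could look roughly like the following user-mode sketch in C. The names (stream_table, stream_track, stream_untrack) are hypothetical, the pointers simply stand in for FILE_OBJECT and FILE_OBJECT->FsContext values, and there is no locking. It is only meant to show the shape of the logic, not how a real filter should be written.

```c
#include <assert.h>
#include <stdlib.h>

/* One FILE_OBJECT that the filter has seen for a given stream. */
typedef struct file_ref {
    const void *file_object;
    struct file_ref *next;
} file_ref;

/* Hypothetical per-stream context, keyed by the FsContext (SCB) pointer. */
typedef struct stream_ctx {
    const void *fs_context;
    file_ref *files;
    struct stream_ctx *next;
} stream_ctx;

/* Zero-initialize before first use. */
typedef struct stream_table {
    stream_ctx *head;
    unsigned live_streams;
} stream_table;

stream_ctx *stream_find(stream_table *t, const void *fs_context)
{
    assert(t != NULL);
    for (stream_ctx *s = t->head; s != NULL; s = s->next) {
        if (s->fs_context == fs_context)
            return s;
    }
    return NULL;
}

/* Call for any operation once FsContext is valid (post-create or later).
   Creates the stream context on first use and records the FILE_OBJECT once.
   Returns the stream context, or NULL on allocation failure. */
stream_ctx *stream_track(stream_table *t, const void *fs_context, const void *file_object)
{
    assert(t != NULL && fs_context != NULL && file_object != NULL);
    stream_ctx *s = stream_find(t, fs_context);
    if (s != NULL) {
        for (file_ref *f = s->files; f != NULL; f = f->next) {
            if (f->file_object == file_object)
                return s;                 /* already tracked */
        }
    }
    file_ref *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    if (s == NULL) {
        s = calloc(1, sizeof *s);
        if (s == NULL) {
            free(f);
            return NULL;
        }
        s->fs_context = fs_context;
        s->next = t->head;
        t->head = s;
        t->live_streams++;
    }
    f->file_object = file_object;
    f->next = s->files;
    s->files = f;
    return s;
}

/* Call while handling IRP_MJ_CLOSE for file_object.
   Returns 1 if this was the last tracked FILE_OBJECT and the stream
   context was freed, 0 otherwise. */
int stream_untrack(stream_table *t, const void *fs_context, const void *file_object)
{
    assert(t != NULL);
    stream_ctx **slink = &t->head;
    while (*slink != NULL && (*slink)->fs_context != fs_context)
        slink = &(*slink)->next;
    if (*slink == NULL)
        return 0;                         /* stream was never tracked */
    stream_ctx *s = *slink;
    for (file_ref **flink = &s->files; *flink != NULL; flink = &(*flink)->next) {
        if ((*flink)->file_object == file_object) {
            file_ref *f = *flink;
            *flink = f->next;
            free(f);
            break;
        }
    }
    if (s->files != NULL)
        return 0;                         /* other FILE_OBJECTs still use the stream */
    *slink = s->next;
    free(s);
    t->live_streams--;
    return 1;
}
```

The sketch keeps a list of FILE_OBJECTs rather than a bare counter because the filter may first see a FILE_OBJECT in some operation other than IRP_MJ_CREATE, and it must not count the same FILE_OBJECT twice. Note that it frees the context when the last FILE_OBJECT known to the filter is closed, which is not necessarily when the file system frees the SCB.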


File contexts

For file systems that implement alternate data streams (ADS) it might be important to know whether two streams belong to the same file or not. In this case, the key for the context must be something that identifies the file. For example, if the file ID is guaranteed to be unique for the lifetime of the file (which is true for NTFS, for example, but is not true for the FASTFAT implementation; however, FASTFAT doesn't support alternate data streams so it doesn't really matter from this perspective) then the file ID can be used as a key. When to remove the context depends on what was used as the key. For example, if the file ID is used, then the context would need to be removed when the file is deleted (and detecting that is a complicated problem in itself).


Fortunately the nice folks at MS decided to offer some help to filter writers and developed some support APIs. They are covered in the MSDN pages "Tracking Per-Stream Context in a Legacy File System Filter Driver" and "Tracking Per-File Context in a Legacy File System Filter Driver". These APIs rely on the file system implementing support for the FSRTL_ADVANCED_FCB_HEADER structure. Please note that a file system is not required to implement this support, but if it doesn't then it won't work with Filter Manager. Anyway, these APIs allow any kernel component (filter or not) to associate a context with an SCB and to be notified when the SCB itself is torn down. Please note that the SCB might not be torn down immediately when the last FILE_OBJECT for it is closed, because some file systems implement SCB caching. A filter can benefit from this: its context stays attached to the cached SCB, so if someone opens a new handle to the same stream the filter finds its context already there.

There is another useful structure when implementing contexts, the RTL_GENERIC_TABLE. A generic table is an OS structure that can be used as a general-purpose lookup table, so that the filter doesn't need to implement its own hash. However, please note that it is implemented as a tree rather than a true hash, so if performance must be really good then a custom hash might still be necessary.

To wrap it up, a filter that wants something similar to FltMgr's contexts can use the following scheme:
  • Use OS support for stream contexts (FsRtlInsertPerStreamContext, FsRtlLookupPerStreamContext and so on)
  • Use OS support for file contexts (FsRtlInsertPerFileContext, FsRtlLookupPerFileContext and so on)
  • Implement a hash for per-FILE_OBJECT contexts. Either use a straight hash or use a per-stream structure that includes a hash of the FILE_OBJECTs for that stream (which is useful because the number of entries in each hash is much smaller, so the RTL_GENERIC_TABLE might be a good fit).

Finally, I'd like to point out that any filter (legacy or mini) that implements its own streams (that is, one that completes an IRP_MJ_CREATE itself and puts something in FILE_OBJECT->FsContext) should implement support for FSRTL_ADVANCED_FCB_HEADER; otherwise contexts won't work for those files and it might cause problems for other filters. This should be fairly easy to implement, though, by following the MSDN documentation.

5 comments:

  1. Alex,

    This is a good post, but I find missing the whole area generated by NTFS hard links, where the notion of a "link/name context" seems to be needed.

    Some questions the [mini]filter driver developer needs to contemplate:

    - Which name was changed in IRP_MJ_SET_INFORMATION/FileRenameInformation

    - Which name was (maybe will have been!) deleted in IRP_MJ_SET_INFORMATION/FileDispositionInformation (and "friend" FILE_DELETE_ON_CLOSE )

    The FileObject, FileObject->FsContext (aka FCB/SCB), model seems to be less than complete for these questions.

    There seems to be nothing in the "legacy filter" world other than brute force [normalized] file name (string) comparison.

    Then here also, filter manager seems [to me, at least] to come up short in terms of assistance for the mini-filter developer.

    Your thoughts, as ever, much appreciated.

    Best Wishes,
    Lyndon

  2. Hi Lyndon! Thanks for your comment. This is a very good point, I'll talk about the relation between a file, its links and streams in the next post then.

  3. Don't forget the mess that is symbolic links and all the work that Win32 and the IoManager do to make them almost impossible to deal with in a name-aware filter (because they do not respect the reparse point rules).

  4. Hi Rod. I can't think of any particular unpleasantness that symlinks introduce with regard to contexts. Or are you referring to them in general?
    I for one am grateful that they needed to add ECPs to implement symlinks or else I'm not sure we would have had them even now.

  5. Alex,

    No, my objection to symlinks has nothing to do with contexts (my context dislikes are aimed at RDR, but that's another story).

    But you appear to have hit one of my hot buttons.

    The issue with symlinks is that IOCFSDH goes straight through them on the same volume. This allows shoddy code (the stuff that doesn't handle STATUS_REPARSE) to work just dandy, but it breaks code that wants to understand the name space.

    An example (there are many more, I know because every namespace aware filter I have ever worked on has needed work because of this).

    If I open /foo/bar/foo/jim and it says yes, then I have every right to assume that bar is a directory. So I may want to set up my data structures like that. I may even want to open bar FILE_DIRECTORY. I can be sure that I won't enter a circularity when traversing it because NTFS doesn't allow hard links to directories (I have worked on a filesystem that allowed that, don't ask).

    But in fact bar is a symlink and IOCFSDH has *SILENTLY HANDLED IT FOR ME*, further there is no way to know that without traversing the path by hand and looking at the names.

    Then a bit later I open /foo/fred/jim and that fails, but as far as the user is concerned there is no difference - fred and bar are both symbolic links but jim is off disk and I have to explain that they are different. How am I going to do that and pretend that I have a well engineered solution?

    I'm sorry, but that is just shoddy engineering. I've been there and I can guess why: it was easier (under time pressure) to add a gross hack to one module than to fix a thousand others.

    As far as a filesystem is concerned, an on-disk symbolic link and a mount point are the same. They should have the same code to handle them. They don't and that sucks.

    It would have been *so* much better if IOCFSDH had been modified to return the reparse buffer and fail. Given that, I would have even been OK with a "go through symbolic links" option.

    I'm not sure why ECPs and symlinks are related, but I'll guess that this allows known modules to work around these issues.
