Thursday, February 11, 2010

Names and file system filters

Proper usage of names in file system filters and minifilters is a topic that comes up a lot. The reason for this is that sooner or later one has to deal with names and it is a particularly complicated area. In this post I’ll try to address some of the common problems minifilters have with names and suggest some ways that can be used to achieve some of the common scenarios.

There are a couple of factors that make names hard to work with:

1. Computer users are used to names. They think they understand file names. They might not understand many other things, but the fact that a file has a name is something that is pretty clear to everyone. This affects file system filter developers in two ways. First, all developers started as users and as such they suffer from thinking in terms of names in cases where they shouldn’t. Second, most file system filters are meant to be used by users (or written according to user specifications) and as such they need to work with names, because that’s what users know and want.

2. File systems don’t really care about names. I mean, they need names to talk to users, but file systems developers spend most of their time thinking about improving IO performance or reliability and in general about things that happen after a file is created (names matter to file systems pretty much only in the create path). File systems don’t care whether your file is a word document or not, but filters sometimes need to know and names play into this.

3. There are so many of them. You have long names and short names, and then file IDs and object IDs. And then you have hardlinks and symlinks. There are also alternate data streams, which have names of their own. And then you have remote file systems, where one needs to care about machine names and redirector names and so on. An additional problem is that users don’t usually get the finer details about all these, so the specs they come up with often don’t properly address all the possible interactions.

4. The IO stack in Windows is asynchronous, which means that a name can change at any time. That in turn means that by the time a filter gets a name, it might already be useless or wrong. Sure, one can argue it’s a rare occurrence and regular users wouldn’t run into it. But what about malicious users? You are unlikely to run into a race between renames and transactions on a regular user’s machine, but what if someone makes it happen? A real product can’t afford to ignore such cases.

Now let’s take a look at some of the things minifilters try to do with names. It turns out that there aren’t that many. In fact, there are pretty much four types of things that minifilters do with names. Each of these classes has specific requirements, which I will address at length:

1. Open files by calling FltCreateFile. This is done to scan the file or to read or write contents, to encrypt it or something like that. Once the file has been opened there usually is a handle that is used in subsequent operations so the name is not interesting anymore. Things to note here are:

  • the filter must know the name of the file at this moment. If the filter is trying to open a file that has been renamed, it needs to know the new name. Opening the file by the old name might lead to problems.
  • FltCreateFile can only be called at PASSIVE_LEVEL.

2. Send the name to user mode for some reason (to display it to the user, to open the file in user mode, to log operations and so on). The vast majority of these operations are not synchronous (i.e. there is no operation blocked in kernel mode waiting for the user to read the message). There is one common exception: anti-virus software that finds a virus will sometimes prompt the user for action, and it needs to display the file name to do so (it will probably also log the name, but that can be done asynchronously). The reason this is important is that by the time the name is consumed (the user reads the log, for example) the name could very easily have changed. Things to note:

  • the accuracy of the name is usually less important. If a filter logs writes and a rename happens at the same time as the writes, the order doesn’t usually matter much.
  • because the information is meant to be consumed by the user, performance and lag in presenting the information don’t matter much. Even in the AV case, where the user must choose some action before the kernel thread can continue, the user is much slower than the processor. So for these types of scenarios performance is not usually important (in these paths at least; the overall performance impact of the product is a different issue).

3. Policy checks. This is usually done in an effort to understand whether a file is interesting to the filter or not. This is usually the case where there is some policy that is set by the user. For example, an anti-virus filter might ignore files under a certain path, or an encryption filter might only encrypt .doc and .txt files. Key things here:

  • it is a bad idea to check whether the file is interesting by querying and parsing the file name every single time the filter needs to know this. A better design is to cache the information about the file somewhere and then update it only when it changes. Since we are talking about name-based policy here, the only place where it can change is in the rename path. Stream contexts are particularly suited for this task, and what filters normally do is attach a stream context if the file is interesting. Then, when they need to decide whether the file is interesting or not, they can simply get the stream context, and if one is present then the file is interesting.
  • the stream context is initialized at create time and is potentially changed at rename. Both of these operations happen at PASSIVE_LEVEL. Some filters prefer to query the name when the operation they care about happens, but this approach usually generates more problems than it solves.

4. Virtualization. Minifilters will use names to create a virtual namespace (inject virtual files or folders into the file system’s namespace, or hide files). This has a different set of challenges (many information classes that expose names, directory change notifications, oplocks and so on) but querying names is fairly easy. The minifilter is either the owner of part of the namespace, which means it can serialize things and is in a position to know authoritatively what the name of an object is, or it is hiding part of the namespace, which means there will be no operations on that part (since no one knows it’s there).
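
The caching approach described for policy checks (class 3) can be illustrated with a small user-mode model. This is only a sketch of the decision logic, not minifilter code: StreamModel stands in for "a stream context is attached", and the function names and the .doc/.txt policy are hypothetical, borrowed from the encryption example above. The point is that the name is parsed only at create and at rename, and every other path just checks the cached result.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "a stream context is attached to this stream". */
typedef struct StreamModel {
    bool has_context;
} StreamModel;

/* Case-insensitive check that name ends with ext. */
static bool ends_with_ci(const char *name, const char *ext)
{
    size_t n = strlen(name), e = strlen(ext);
    if (e > n) {
        return false;
    }
    name += n - e;
    for (size_t i = 0; i < e; i++) {
        if (tolower((unsigned char)name[i]) != tolower((unsigned char)ext[i])) {
            return false;
        }
    }
    return true;
}

/* Example policy (an assumption): only .doc and .txt files are interesting. */
static bool name_matches_policy(const char *name)
{
    return ends_with_ci(name, ".doc") || ends_with_ci(name, ".txt");
}

/* Create path: the name is parsed once and the decision is cached. */
void on_post_create(StreamModel *s, const char *opened_name)
{
    assert(s != NULL && opened_name != NULL);
    s->has_context = name_matches_policy(opened_name);
}

/* Rename path: the only other place where a name-based decision can change. */
void on_post_rename(StreamModel *s, const char *new_name)
{
    assert(s != NULL && new_name != NULL);
    s->has_context = name_matches_policy(new_name);
}

/* Every other path (read, write, ...): no name query, just a cached lookup. */
bool is_interesting(const StreamModel *s)
{
    return s->has_context;
}
```

In a real minifilter the lookup would be a call to get the stream context rather than a flag test, but the shape is the same: nothing outside create and rename ever needs to ask for the name.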

Now that we have all the pieces in place, let’s look at some of the common scenarios.

By far the most common failure is trying to get a name where that is not supported, for example at DPC level or in the paging path (people want names when writes happen to a file). In the past this has made people believe that the name support in filter manager (via FltGetFileNameInformation and friends) is broken. That is not the case. The important thing to understand is that the name is almost never actually needed in these situations, and by “needed” I’m referring to how the name is going to be used.

If it is a class 1 operation in my classification above (FltCreateFile), then FltCreateFile cannot be called anyway when the name can’t be obtained (by that I mean that if FltGetFileNameInformation can’t get the name in that context, then it is illegal to call FltCreateFile there as well).

For the 2nd class of operations, the approach is to queue an async work item that gets the name for the file and uses it (sends it to user mode, logs it to a file and so on). Remember that neither accuracy nor performance usually matters here, so not waiting for the work item to finish is usually OK.

For the 3rd class of operations it only matters if the context is not set up yet, because once a context is in place the decision should be made based on it. However, getting the name the first time it is needed has some drawbacks. Getting the name and setting the context can race with renames (outside of the IRP_MJ_CREATE path), so the name might become invalid immediately. Also, the lack of a context is ambiguous: it might mean that the file is not interesting, or it might mean that this is the first operation the filter has seen for the file…
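
For the 2nd class of operations, the work item approach could look roughly like the sketch below. It is a kernel-mode fragment that builds only against the WDK, and it is not taken from any shipping filter. SketchNameWorker and SketchQueueNameQuery are hypothetical names, the choice of FLT_FILE_NAME_NORMALIZED is an assumption, and the sketch assumes the queuing routine lives in nonpaged code and is called at an IRQL where the work item routines are allowed (check the documentation for each call).

```c
#include <fltKernel.h>

/* Hypothetical worker: runs later on a system worker thread at PASSIVE_LEVEL. */
static VOID
SketchNameWorker(
    _In_ PFLT_GENERIC_WORKITEM WorkItem,
    _In_ PVOID FltObject,       /* the PFLT_INSTANCE passed when queuing */
    _In_opt_ PVOID Context      /* the referenced PFILE_OBJECT */
    )
{
    PFILE_OBJECT fileObject = (PFILE_OBJECT)Context;
    PFLT_FILE_NAME_INFORMATION nameInfo = NULL;
    NTSTATUS status;

    /* There is no callback data here, so use the variant that takes a
       FILE_OBJECT and an instance. */
    status = FltGetFileNameInformationUnsafe(fileObject,
                                             (PFLT_INSTANCE)FltObject,
                                             FLT_FILE_NAME_NORMALIZED |
                                                 FLT_FILE_NAME_QUERY_DEFAULT,
                                             &nameInfo);
    if (NT_SUCCESS(status)) {
        /* Placeholder: send nameInfo->Name to user mode or log it here.
           The name may already be stale, which is acceptable for class 2. */
        FltReleaseFileNameInformation(nameInfo);
    }

    ObDereferenceObject(fileObject);
    FltFreeGenericWorkItem(WorkItem);
}

/* Hypothetical helper: defer the name query instead of doing it in place. */
NTSTATUS
SketchQueueNameQuery(
    _In_ PFLT_INSTANCE Instance,
    _In_ PFILE_OBJECT FileObject
    )
{
    PFLT_GENERIC_WORKITEM workItem;
    NTSTATUS status;

    workItem = FltAllocateGenericWorkItem();
    if (workItem == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    /* Keep the FILE_OBJECT alive until the worker has run. */
    ObReferenceObject(FileObject);

    status = FltQueueGenericWorkItem(workItem,
                                     Instance,
                                     SketchNameWorker,
                                     DelayedWorkQueue,
                                     FileObject);
    if (!NT_SUCCESS(status)) {
        /* The worker will never run, so undo what was done above. */
        ObDereferenceObject(FileObject);
        FltFreeGenericWorkItem(workItem);
    }
    return status;
}
```

The caller does not wait for the worker, which matches the point above that neither accuracy nor latency usually matters for this class.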

 

Another common scenario is to try to open the same file the user has open. Some anti-virus filters do this to scan files. The minifilter gets the name of the file in pre or post IRP_MJ_CREATE and then tries to open it. This works in the sense that one can get the name in both pre and post create, but it is problematic because the name of the file can change in the meantime (AV scanners should avoid scanning in preCreate for other reasons anyway…). It’s hard to come up with a scenario where a malicious file might end up on a user’s system by taking advantage of this, but even so it is something to consider. Another common scenario is for a filter to open an alternate data stream of a file the user has opened. The same set of issues around racing with renames applies.

A solution for this is to use a little-known feature of the IO system: relative opens. For any ZwCreateFile or FltCreateFile call, the InitializeObjectAttributes macro used to initialize the object attributes takes a parameter that accepts a handle to the root directory, so that a file can be opened relative to a directory. The same mechanism can be used to solve the problems in the examples above. If the name passed in to InitializeObjectAttributes is empty (Length = MaximumLength = 0 and Buffer = NULL) then the create will open the same stream. So if a filter wants to open another file object for a stream the user has open (or an alternate data stream of the same file), it can call InitializeObjectAttributes with a handle to the user’s FILE_OBJECT as the root directory (one way to generate a handle is via ObOpenObjectByPointer) and use as the ObjectName either an empty name (to open exactly the same stream) or just the name of the stream (to open an alternate data stream).
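
A minimal sketch of such a relative open is shown below. It is a kernel-mode fragment that builds only against the WDK. SketchOpenRelativeToUserFile is a hypothetical helper, and the access mask, share mode, create options and the IO_IGNORE_SHARE_ACCESS_CHECK flag are illustrative assumptions rather than recommendations. The stream name format in the comment (a leading colon) is also an assumption about what the file system expects for a relative stream open.

```c
#include <fltKernel.h>

/* Hypothetical helper: opens a new handle relative to the user's FILE_OBJECT.
   StreamName == NULL  -> empty name, opens exactly the same stream.
   StreamName != NULL  -> opens that alternate data stream of the same file
                          (assumed to be given as e.g. L":info").
   Must be called at PASSIVE_LEVEL, like any FltCreateFile call. */
NTSTATUS
SketchOpenRelativeToUserFile(
    _In_ PFLT_FILTER Filter,
    _In_ PFLT_INSTANCE Instance,
    _In_ PFILE_OBJECT UserFileObject,
    _In_opt_ PUNICODE_STRING StreamName,
    _Out_ PHANDLE NewHandle
    )
{
    NTSTATUS status;
    HANDLE rootHandle = NULL;
    OBJECT_ATTRIBUTES oa;
    IO_STATUS_BLOCK iosb;
    UNICODE_STRING emptyName = { 0, 0, NULL };  /* Length = MaximumLength = 0 */

    *NewHandle = NULL;

    /* Generate a kernel handle for the user's FILE_OBJECT to use as the root. */
    status = ObOpenObjectByPointer(UserFileObject,
                                   OBJ_KERNEL_HANDLE,
                                   NULL,
                                   0,
                                   *IoFileObjectType,
                                   KernelMode,
                                   &rootHandle);
    if (!NT_SUCCESS(status)) {
        return status;
    }

    /* The root handle replaces the path; no file name is ever queried. */
    InitializeObjectAttributes(&oa,
                               (StreamName != NULL) ? StreamName : &emptyName,
                               OBJ_KERNEL_HANDLE,
                               rootHandle,
                               NULL);

    status = FltCreateFile(Filter,
                           Instance,
                           NewHandle,
                           FILE_READ_DATA | SYNCHRONIZE,
                           &oa,
                           &iosb,
                           NULL,
                           FILE_ATTRIBUTE_NORMAL,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           FILE_OPEN,
                           FILE_SYNCHRONOUS_IO_NONALERT,
                           NULL,
                           0,
                           IO_IGNORE_SHARE_ACCESS_CHECK);

    /* The root handle is only needed for the duration of the create. */
    ZwClose(rootHandle);
    return status;
}
```

Because the open is anchored to the FILE_OBJECT rather than to a path, a rename that happens between the user’s create and the filter’s create does not send the filter to the wrong file.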

 

One more thing I would like to point out is that a call to FltGetFileNameInformation in preCreate might fail if the create itself is going to fail. So if FltGetFileNameInformation fails with a weird status in preCreate, please make sure to investigate whether the user’s create would actually have succeeded. If getting the file name in preCreate is vital to the operation of the filter, then the filter should most likely fail the user’s create when FltGetFileNameInformation fails. Generally it would be better if things were done in postCreate, where possible.
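
As a rough illustration of that last case, a preCreate callback that treats the name as vital might be shaped like the sketch below. It is a kernel-mode fragment that builds only against the WDK. SketchPreCreate is a hypothetical name, and both the choice of FLT_FILE_NAME_OPENED and the decision to fail the create with the status returned by the name query are assumptions of this sketch.

```c
#include <fltKernel.h>

/* Hypothetical preCreate callback for a filter that cannot work without the name. */
FLT_PREOP_CALLBACK_STATUS
SketchPreCreate(
    _Inout_ PFLT_CALLBACK_DATA Data,
    _In_ PCFLT_RELATED_OBJECTS FltObjects,
    _Flt_CompletionContext_Outptr_ PVOID *CompletionContext
    )
{
    PFLT_FILE_NAME_INFORMATION nameInfo = NULL;
    NTSTATUS status;

    UNREFERENCED_PARAMETER(FltObjects);
    *CompletionContext = NULL;

    status = FltGetFileNameInformation(Data,
                                       FLT_FILE_NAME_OPENED |
                                           FLT_FILE_NAME_QUERY_DEFAULT,
                                       &nameInfo);
    if (!NT_SUCCESS(status)) {
        /* The name is vital to this filter, so the create is failed rather
           than allowed through unchecked. The create may well have been
           about to fail in the file system anyway. */
        Data->IoStatus.Status = status;
        Data->IoStatus.Information = 0;
        return FLT_PREOP_COMPLETE;
    }

    /* Placeholder: a name-based check on nameInfo->Name would go here. */

    FltReleaseFileNameInformation(nameInfo);
    return FLT_PREOP_SUCCESS_NO_CALLBACK;
}
```

A filter that can tolerate a missing name would instead let the create through and do its work in postCreate, where the outcome of the create is already known.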

 

There are a lot more interesting things with names but these are some of the common things that filters try and have problems with. Feel free to ask questions about specific scenarios.

5 comments:

  1. I am trying to use ObOpenObjectByPointer to get a handle so that I can then open an alternate stream to store some information about the process which is writing to a file.

    It seems to be freezing in this call and I never get control back.

    I am doing this in my postWrite handler, any thoughts? I assume it has to do with write access and too many processes trying to write, leading to deadlock. But, I don't want to write, I just want to use the handle to get a pointer to a file object for an alt stream...

    My call looks like this -
    status = ObOpenObjectByPointer(
    FltObjects->FileObject,
    OBJ_KERNEL_HANDLE,
    NULL,
    0,
    NULL,
    KernelMode,
    &fileHandle);

    Any help would be appreciated. Also, I really appreciate the blog.

  2. There are a couple of issues with this approach: ObOpenObjectByPointer should only be called <= APC, but in postWrite you might be at DPC. Also, if the write is a paging write then you must not touch any paged memory (code or data) or you will deadlock.
    In general trying to open a file while processing a WRITE is not a good idea. You should probably revisit your architecture so that you don't rely on this.

    The NTFSD forum is a better place to ask this sort of question (http://www.osronline.com/showlists.cfm?list=ntfsd).

    With this said, one thing you could do is capture whatever information you want to log, reference the FILE_OBJECT and then queue a work item in which you would open the alternate data stream, do your logging and then dereference the FILE_OBJECT.

  3. Hi Alex, how do I retrieve a full file name in PreCreate (for policy checks) if my mini-filter is a layered FSD? If I call FltGetFileNameInformation() it seems to crash in NTFS, as it is sending my FileObject down the stack. Do I have to maintain the file name myself in the FCB and construct the full path myself while taking care of renames?

  4. Well, if you are a layered FSD you must take great care to NEVER send your FILE_OBJECT down to the file system. As such your minifilter must implement name provider callbacks. Once you have that you can call FltGetFileNameInformation() with the FLT_FILE_NAME_REQUEST_FROM_CURRENT_PROVIDER flag, which will send the request to your name provider. This is cool because it allows you to keep your implementation in one place and not scattered around to wherever you need to call FltGetFileNameInformation().
    This aside, it's not clear to me what you mean by "my FileObject" in PreCreate because a FILE_OBJECT becomes yours only when you complete the create yourself (the owner of a FILE_OBJECT is the layer that returns STATUS_SUCCESS (or one of the other appropriate success values except STATUS_REPARSE) for the IRP_MJ_CREATE request). So which FILE_OBJECT gets passed in to NTFS ?

  5. Great article!

    One more thing I should note: calling FltGetFileNameInformation with FLT_FILE_NAME_NORMALIZED for a network path in the pre-create operation consumes a lot of time. So much that we had to use FLT_FILE_NAME_OPENED...
