Thursday, April 7, 2011

Names in Minifilters - The Flags of FltGetFileNameInformation

While we're on the topic of names, I'd like to address a question that comes up pretty regularly: what do the various flags of FltGetFileNameInformation mean, and why do they exist in the first place? The actual documentation can be found on the FltGetFileNameInformation MSDN page. Flags can of course be added and their meaning might change over time, so always check that page first.
There is also a good discussion of each type of name on the MSDN page for the FLT_FILE_NAME_INFORMATION structure.

FLT_FILE_NAME_SHORT


The description of a short name in the documentation is pretty good. It is important to note that this is just the final component, not a full path. There is only one short name for a file; hardlinks can only be long names. Also please note that a name might have a ~ in it and might be 8.3 compatible, but that doesn't mean it is a short name. In fact, it is possible to create a file that has a long name like "foo~1.txt" and a short name "foobar.txt". Going even further, the short name can be set independently, so it doesn't have to resemble the long name at all. So please don't make assumptions about what long and short names look like. The only thing you can assume is that if a name is longer than 8.3 then it is a long name. If it fits in 8.3 then it's impossible to tell just by inspecting it whether it's a long or a short name.
Other than calling this API (which is the easiest way anyway), one can get the short name for a file from the parent directory (see the documentation for ZwQueryDirectoryFile and information classes like FileBothDirectoryInformation) or from the file directly (see ZwQueryInformationFile and the FileAlternateNameInformation information class). Also, please note that not every file has a short name. It is perfectly normal for there to be no short name at all.
This is not a very popular name request; I don't think I've ever used it or seen a minifilter that uses it.
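To make the "fits in 8.3" reasoning concrete, here is a minimal user-mode sketch in C. The helper name fits_in_8dot3 is made up for this illustration, and it only checks lengths and the single-dot rule; it deliberately ignores the character-set restrictions that real 8.3 names have. A result of 0 means the name must be a long name. A result of 1 says nothing about whether the name is the long or the short one.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `name` (a final path component) has the
 * shape of an 8.3 name: a base of 1..8 characters, optionally followed by a
 * single dot and an extension of 1..3 characters. Simplified on purpose: it
 * does not check which characters are legal in 8.3 names. */
static int fits_in_8dot3(const char *name)
{
    const char *dot = strchr(name, '.');
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);

    if (base_len < 1 || base_len > 8)
        return 0;                      /* base too long (or empty) */
    if (dot == NULL)
        return 1;                      /* no extension at all */
    if (strchr(dot + 1, '.') != NULL)
        return 0;                      /* more than one dot */

    size_t ext_len = strlen(dot + 1);
    return ext_len >= 1 && ext_len <= 3;
}
```

Both "foo~1.txt" and "foobar.txt" pass this check, which is exactly why neither one can be identified as the short name by inspection alone.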

FLT_FILE_NAME_OPENED


This returns a path to the file. The name might contain short names for components in the path. If there are multiple hardlinks and the file was opened by name, this name will be the one that was used for the open, with the corresponding directory entry and link. In other words, if C:\foo\test1.txt and C:\bar\test2.txt are hardlinks to the same file and a FILE_OBJECT is opened for "C:\foo\test1.txt", then the opened name for that FILE_OBJECT will never be "C:\bar\test2.txt". This is a very important point to remember for any file system or file system filter that implements something that looks like hardlinks (multiple names for the same file). For a more in-depth discussion on when opened names are useful, see this post.

FLT_FILE_NAME_NORMALIZED


The normalized name is described quite well on the MSDN page I mentioned above. One thing that page doesn't mention is that the normalized name also takes into account the hardlink the file was opened on (like the opened name), so two FILE_OBJECTs that are opened on the same underlying file (SCB) but through different hardlinks will have different normalized names. It is very useful wherever the file name influences the behavior of a filter (for example, encrypting some files depending on file name or path; any name-based policy in general). Again, see my post on using names in filters.

FLT_FILE_NAME_QUERY_FILESYSTEM_ONLY


This tells FltMgr (and any name provider minifilter) to regenerate the name by querying the file system. It is meant to prevent any name cache associated with the file or with any path component from being used. Please note that this can fail because it is not always safe to query the name from the file system (see where STATUS_FLT_INVALID_NAME_REQUEST is returned). The important thing to note is that when it's not safe to query the name, the problem isn't that the requests FltMgr issues would fail but rather that the system might bugcheck or deadlock. So FltMgr has a set of checks it performs to see if it is safe to even attempt to build the name; if it is not, it won't try to do anything and will fail right away. If that check passes then FltMgr proceeds to build the name, but the request might still fail because one of the operations FltMgr performs fails (not enough memory, the path does not exist and so on). This flag is particularly useful in debugging, where a minifilter developer might want to make sure they always exercise the full code path. The performance impact of never going to the cache is pretty significant, so it's unlikely to be useful in other cases or in a production environment.

FLT_FILE_NAME_QUERY_CACHE_ONLY


This tells FltMgr to never query the file system and instead always go to the cache. If the name is not in the cache then FltMgr will not return anything. This is possibly useful when the caller would like to get the name if it's cached but doesn't want to incur the cost of building it if it's not there. I can't think of a good use case, because the name cache is a very dynamic thing that gets purged when certain things happen, so it is impossible to rely on something being in the cache at any given time.

FLT_FILE_NAME_QUERY_ALWAYS_ALLOW_CACHE_LOOKUP


This is the option that is the most similar to a regular cache. It will look the name up in the cache and, if it finds it, return it. If it doesn't find it, it will check to make sure it's safe to get the name from the file system (and fail if it's not). If it is safe then it will proceed to build the name.

FLT_FILE_NAME_QUERY_DEFAULT


The difference between this query method and FLT_FILE_NAME_QUERY_ALWAYS_ALLOW_CACHE_LOOKUP is quite subtle and I see a lot of questions about it. In this case FltMgr will first perform the checks to see whether it's safe to build the name from the file system, before even looking at the cache. The reason it does this is that caches in general have the side effect of hiding problems in the data retrieval path (for name caches the data retrieval path is the name generation path). If the data retrieval path can fail in some cases then the cache might hide those failures by not exercising that path when the data is cached. So the FltMgr designers introduced this flag to help developers know when they are asking for the file name in a path where they shouldn't, because it's not safe to even try to get a name.
So in terms of implementing debugging code, FLT_FILE_NAME_QUERY_FILESYSTEM_ONLY is the flag with the biggest likelihood of failure (it fails if it's not safe to query the name or if it encounters any errors in the name generation path). Then comes FLT_FILE_NAME_QUERY_DEFAULT, which will fail if it's not safe to query the name, but if the name is in the cache it will return that (and so it won't fail in cases where errors would be encountered while trying to get the name from the file system). Finally, FLT_FILE_NAME_QUERY_ALWAYS_ALLOW_CACHE_LOOKUP is the least likely to fail, since it will always return the name from the cache when it is there, and can fail only when the name isn't cached (if the check to see whether it is safe to generate the name fails or if any other errors are encountered while trying to build the name). Personally I tend to always use FLT_FILE_NAME_QUERY_DEFAULT, even in release builds, but I've seen people change it to FLT_FILE_NAME_QUERY_ALWAYS_ALLOW_CACHE_LOOKUP for release builds to somewhat reduce the likelihood of failure.
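The ordering of the query methods can be written down as a small truth-table model. This is a user-mode sketch of the behavior as described in this post, not FltMgr code: the enum, the struct and the function name are all hypothetical, and the three booleans stand in for conditions that only FltMgr can really evaluate.

```c
#include <stdbool.h>

/* Hypothetical model of the four query methods; names are made up. */
typedef enum {
    QUERY_DEFAULT,
    QUERY_CACHE_ONLY,
    QUERY_FILESYSTEM_ONLY,
    QUERY_ALWAYS_ALLOW_CACHE_LOOKUP
} query_method;

typedef struct {
    bool safe_to_query;   /* would the "is it safe to build a name" check pass? */
    bool name_in_cache;   /* is the name currently cached? */
    bool build_succeeds;  /* would building the name from the file system work? */
} name_request_state;

/* Returns true if the modeled name query would succeed. */
static bool query_succeeds(query_method method, name_request_state s)
{
    switch (method) {
    case QUERY_FILESYSTEM_ONLY:
        /* Never uses the cache: needs the safety check and a working build. */
        return s.safe_to_query && s.build_succeeds;
    case QUERY_CACHE_ONLY:
        /* Never touches the file system: succeeds only on a cache hit. */
        return s.name_in_cache;
    case QUERY_ALWAYS_ALLOW_CACHE_LOOKUP:
        /* Cache first; the safety check only matters on a miss. */
        if (s.name_in_cache)
            return true;
        return s.safe_to_query && s.build_succeeds;
    case QUERY_DEFAULT:
        /* Safety check first, even when the name is cached. */
        if (!s.safe_to_query)
            return false;
        return s.name_in_cache || s.build_succeeds;
    }
    return false;
}
```

The interesting row is "not safe to query, but cached": in this model QUERY_DEFAULT fails while QUERY_ALWAYS_ALLOW_CACHE_LOOKUP succeeds, which is the failure the default method is meant to expose during development.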

FLT_FILE_NAME_REQUEST_FROM_CURRENT_PROVIDER


When developing proper name providers that implement different namespaces above and below their level, the minifilter might need the name for "above" its level in some code paths and the name for "below" its level in others. For example, if the minifilter wants to return a name to the user (for example when populating directory entries or when completing a call for FileNameInformation) then the minifilter must get the name for "above" its layer. On the other hand, when the minifilter wants to open a file on the file system or get a directory entry for a certain file, it needs the name for "below" its layer. Figuring out when to get the "above" name and when to use the "below" name is pretty complicated and really depends on the architecture of the minifilter, but even building the appropriate name can be very complicated. There is one place where a name provider must always return the "above" name, and that is in the name provider callbacks (since those are called specifically to build names for the layers above). So by specifying this flag a minifilter can simply call FltGetFileNameInformation(…,FLT_FILE_NAME_REQUEST_FROM_CURRENT_PROVIDER, …) when it needs the "above" name and call FltGetFileNameInformation() without this flag to get the "below" name. This flag is the only way that I know of for a minifilter to tell FltMgr "send this request to filters below me AND to me".
Please note that when this flag is set all operations related to generating and normalizing a name will be sent TO this layer instead of the layers below and so when a name provider uses this flag to request the "above" name, they will see IRP_MJ_CREATE, directory queries and other operations. In particular, a name provider that calls FltGetFileNameInformation(…,FLT_FILE_NAME_NORMALIZED|FLT_FILE_NAME_REQUEST_FROM_CURRENT_PROVIDER, …) in preCreate can easily get in trouble if IRP_MJ_CREATE is called on the normalization path (which is pretty common) because they'll see that create and call FltGetFileNameInformation again which will issue another create which will go to the minifilter again and so on until the stack runs out…
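Since calls like the one above OR several values together, it may help to see how the name options value breaks down. The sketch below is user-mode C with local copies of the constants so it builds without the WDK. The numeric values are as I recall them from fltKernel.h and should be treated as assumptions to verify against your own headers (the 0x01000000 value does at least match the test instruction in the disassembly in the comments below). The three helper functions are hypothetical; as far as I remember the real header offers the FltGetFileNameFormat and FltGetFileNameQueryMethod macros for the first two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies for a user-mode build. Values recalled from fltKernel.h:
 * assumptions, verify against your WDK before relying on them. */
#define FLT_FILE_NAME_NORMALIZED                      0x00000001u
#define FLT_FILE_NAME_OPENED                          0x00000002u
#define FLT_FILE_NAME_SHORT                           0x00000003u
#define FLT_FILE_NAME_QUERY_DEFAULT                   0x00000100u
#define FLT_FILE_NAME_QUERY_CACHE_ONLY                0x00000200u
#define FLT_FILE_NAME_QUERY_FILESYSTEM_ONLY           0x00000300u
#define FLT_FILE_NAME_QUERY_ALWAYS_ALLOW_CACHE_LOOKUP 0x00000400u
#define FLT_FILE_NAME_REQUEST_FROM_CURRENT_PROVIDER   0x01000000u
#define FLT_FILE_NAME_DO_NOT_CACHE                    0x02000000u
#define FLT_FILE_NAME_ALLOW_QUERY_ON_REPARSE          0x04000000u

/* The format lives in the low byte. It is a value, not a set of independent
 * bits: with the values above, SHORT (3) equals NORMALIZED | OPENED. */
static uint32_t name_format(uint32_t options)
{
    return options & 0x000000ffu;
}

/* The query method lives in the second byte, again as a single value. */
static uint32_t name_query_method(uint32_t options)
{
    return options & 0x0000ff00u;
}

/* The top byte holds true flags that can be combined freely. */
static bool requests_current_provider(uint32_t options)
{
    return (options & FLT_FILE_NAME_REQUEST_FROM_CURRENT_PROVIDER) != 0;
}
```

So a name provider asking for the "above" normalized name would pass something like FLT_FILE_NAME_NORMALIZED | FLT_FILE_NAME_QUERY_DEFAULT | FLT_FILE_NAME_REQUEST_FROM_CURRENT_PROVIDER: one format, one query method, and any number of flags.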

FLT_FILE_NAME_DO_NOT_CACHE


In the case of a file system, any IRP_MJ_CREATE request that reaches the layer must be resolved to a file or failed. However, for a filter things are more complicated. Virtualization filters, for example, might only virtualize a part of the namespace, so only some of the IRP_MJ_CREATE requests that they see belong to them. They must decide based solely on the information that is available to them at IRP_MJ_CREATE time whether they should step in and virtualize the file or not, which means the file name must be used for that. But as we already know, getting the file name can be pretty complicated and it would be nice if the minifilter could call FltGetFileNameInformation(). Calling FltGetFileNameInformation() might send requests all the way to the file system with a name that might be wrong at the layers below the minifilter (if the minifilter is indeed supposed to virtualize the file). The problem isn't so much that the wrong name might reach the wrong layers (since the operations involved in the name generation path are non-destructive) but rather that the wrong name might be cached at those layers. So in this case the name provider minifilter can call FltGetFileNameInformation() and specify FLT_FILE_NAME_DO_NOT_CACHE to make sure that the name (right or wrong) doesn't get cached as a result of that call.

FLT_FILE_NAME_ALLOW_QUERY_ON_REPARSE


This flag is needed because in postCreate a call to FltGetFileNameInformation() fails if the request wasn't successful (if the file wasn't opened). However, minifilters that rely on reparse points to know when to perform some action might be written so that they do nothing in preCreate, and in postCreate, if they get STATUS_REPARSE and the reparse point belongs to them, they know they must act. The problem is how to get the name in that case. They could send the IRP_MJ_CREATE down again, specify FILE_OPEN_REPARSE_POINT and query the name from the file system directly, but this is pretty high overhead and they would need to normalize the name themselves. Alternatively they could always call FltGetFileNameInformation in preCreate and only use the result in postCreate when they need to, but this adds a lot of overhead to each IRP_MJ_CREATE. This is where this flag can be used. When the file system completes an operation with STATUS_REPARSE and returns a reparse tag, the FILE_OBJECT->FileName is not modified, so the owner of that reparse tag can call FltGetFileNameInformation(…,FLT_FILE_NAME_ALLOW_QUERY_ON_REPARSE, …) and FltMgr will generate the name based on FILE_OBJECT->FileName. As the documentation clearly states, it is the caller's responsibility to ensure that the FileObject->FileName field was not changed. So this flag cannot be used in the general case, where a minifilter gets STATUS_REPARSE for a reparse tag it doesn't own.
Also, I'm not sure why this is listed as a name provider only flag; it seems to me it could be used by filters that are not name providers. I can't tell for sure whether it would work, though, so some experimentation will be necessary.
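Going back to the ownership rule for reparse tags, the postCreate decision can be sketched as a tiny function. This is a user-mode model with hypothetical names, not minifilter code: it assumes the filter already knows the completion status and the reparse tag that came back, it treats only a plain success status as "the file was opened", and the two status constants are local stand-ins (0x00000104 being the numeric value of STATUS_REPARSE).

```c
#include <stdint.h>

/* Local stand-ins so the sketch builds in user mode. */
#define MODEL_STATUS_SUCCESS 0x00000000u
#define MODEL_STATUS_REPARSE 0x00000104u  /* numeric value of STATUS_REPARSE */

typedef enum {
    QUERY_NAME_NORMALLY,          /* file was opened: a plain name query works */
    QUERY_NAME_ALLOW_ON_REPARSE,  /* our own tag: the flag is appropriate */
    DO_NOT_QUERY_NAME             /* failed create, or somebody else's tag */
} post_create_choice;

/* Hypothetical helper: decide how (or whether) to ask for the name in
 * postCreate, given the completion status and the returned reparse tag. */
static post_create_choice choose_post_create_query(uint32_t status,
                                                   uint32_t reparse_tag,
                                                   uint32_t owned_tag)
{
    if (status == MODEL_STATUS_SUCCESS)
        return QUERY_NAME_NORMALLY;
    if (status == MODEL_STATUS_REPARSE && reparse_tag == owned_tag)
        return QUERY_NAME_ALLOW_ON_REPARSE;
    /* For a tag we don't own we cannot vouch that FileName is unchanged. */
    return DO_NOT_QUERY_NAME;
}
```

The third branch is the important one: for a reparse tag owned by someone else the filter has no basis for promising that the file name field is untouched, so the model declines to query at all.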


7 comments:

  1. Our HSM minifilter uses FLT_FILE_NAME_ALLOW_QUERY_ON_REPARSE and isn't a name provider. We don't actually change the namespace, and so have no reason to implement the name provider callbacks. The reparse point just holds the information needed to recall the file from its remote location, we need the name from FltMgr to determine where the file with the reparse point is currently.

  2. Thanks for confirming that this works for regular minifilters that aren't name providers.

  3. Hi Alex, It would be great if you could share a post on Shadow/Proxy File Object (SFO) model where mini-filters owns file cache (either to present a different view of the data from underlying FSD or some other purpose).

  4. There is an NT Insider series of articles on this very subject, see this: http://www.osronline.com/article.cfm?article=571

  5. One thing that might be useful to expound upon: FltGetDestinationFileNameInformation does not appear to support all of the flags -- in particular FLT_FILE_NAME_REQUEST_FROM_CURRENT_PROVIDER. What should name provider minifilters do in this case?

  6. Hmm... I'm sorry, but could you please post a bit more information about your setup (OS version)? From my Win7 debugger I see this:
    0: kd> uf fltmgr!FltGetDestinationFileNameInformation
    ...
    fltmgr!FltGetDestinationFileNameInformation+0x84:
    960a8344 f7451c00000001 test dword ptr [ebp+1Ch],1000000h
    960a834b 8bb5e0feffff mov esi,dword ptr [ebp-120h]
    960a8351 7419 je fltmgr!FltGetDestinationFileNameInformation+0xac (960a836c)

    fltmgr!FltGetDestinationFileNameInformation+0x93:
    960a8353 6a01 push 1
    960a8355 ff7608 push dword ptr [esi+8]
    960a8358 e89507feff call fltmgr!FltpGetCallbackNodeForInstance (96088af2)
    960a835d 53 push ebx
    960a835e 894614 mov dword ptr [esi+14h],eax
    960a8361 ff7608 push dword ptr [esi+8]
    960a8364 e88907feff call fltmgr!FltpGetCallbackNodeForInstance (96088af2)
    960a8369 894618 mov dword ptr [esi+18h],eax
    ....

    Which I read to mean that if FLT_FILE_NAME_REQUEST_FROM_CURRENT_PROVIDER is set then the code calls FltpGetCallbackNodeForInstance twice, presumably for the Generate and Normalize callbacks. I imagine it does something with them once this is done.

    I don't have any minifilter handy that would reproduce this issue. Could you please double check that this is indeed the case before I spend some time writing something like this? Or feel free to contact me offline if you'd like to send me more information on the issue.

  7. Using your name provider additions to the passthrough minifilter, I was able to confirm that you are correct. Apparently there are some cases (which I haven't identified the exact cause yet) where it doesn't happen... probably a bug in the offending minifilter.
