Thursday, June 28, 2012

The Flags of FILE_OBJECTs - Part IV

This is the last part in the series of posts on the flags of the FILE_OBJECT. The previous posts can be found here:

  1. The Flags of FILE_OBJECTs - Part I
  2. The Flags of FILE_OBJECTs - Part II
  3. The Flags of FILE_OBJECTs - Part III

And here is the final set of flags:

  • FO_FILE_FAST_IO_READ - 0x00080000 - according to the documentation page for the FILE_OBJECT structure, this flag is set to indicate that a fast I/O read was performed on this FILE_OBJECT. However, looking at the FastFat source code reveals more. First, the flag can be set when the file is read through a regular request as well, not only through fast IO (see FatCommonRead and FatMultiAsyncCompletionRoutine). There is also an interesting bit in the create path (FatCommonCreate), where the flag is set if the caller requested EXECUTE access. So it appears that in fact this flag is set when the user performs a read on the file, but not for paging reads (which makes sense, paging reads are, in a way, the system reading from a file and not exactly the user). The reason for this becomes clear when looking at where the flag is used in FastFat: in the FatUpdateDirentFromFcb function, which uses the flag to figure out if it needs to update the last access time on the file.
  • FO_RANDOM_ACCESS - 0x00100000 - this flag is set to indicate that the caller intends to use this FILE_OBJECT for random access (forward and backward seeks). It maps to the FILE_RANDOM_ACCESS flag and is set when creating the handle. The flag is pretty similar to the FO_SEQUENTIAL_ONLY but works in the opposite way: it's an indication to the cache manager that it can't assume that once a location in the file was touched once it won't be touched again so it shouldn't be overly aggressive in removing things from the cache. Unlike FO_SEQUENTIAL_ONLY this flag can't be set or queried using the FileModeInformation.
  • FO_FILE_OPEN_CANCELLED - 0x00200000 - this flag indicates that the IRP_MJ_CREATE for the FILE_OBJECT was cancelled. In other words, someone called IoCancelFileOpen or FltCancelFileOpen. The flag is set in IoCancelFileOpen and is used in IopParseDevice where the IO manager needs to figure out if it should cleanup the FILE_OBJECT (by issuing an IRP_MJ_CLOSE) or not. The documentation for the FltCancelFileOpen routine is also a pretty good source of information for this flag.
  • FO_VOLUME_OPEN - 0x00400000 - this flag indicates that the FILE_OBJECT is an open for a volume. Interestingly enough this flag is set by the IO manager after the IRP_MJ_CREATE IRP completes, so it would not be available in preCreate. However, preCreate is where it's most useful (see the simrep and MetadataManager minifilter samples in the WDK) and so FltMgr parses the IRP_MJ_CREATE parameters and set the FO_VOLUME_OPEN flag when appropriate so that it's available even in preCreate (see this NTFSD post). This means that this flag is usable by minifilters for all operations. For legacy filters things are more complicated (as usual): they might see the flag set if there is a FltMgr frame above them but they might not see it otherwise (so I guess legacy filters shouldn't expect to see this flag set).
  • FO_REMOTE_ORIGIN - 0x01000000 - this is a flag that indicates that the FILE_OBJECT was created to satisfy a remote request, which is something done by remote file systems. There are two functions that deal with this flag: IoIsFileOriginRemote which pretty much just queries this flag, and IoSetFileOrigin which sets it. The documentation for both functions mentions this connection (between these functions and FO_REMOTE_ORIGIN) but fails to explain why this is done. The reason is that the cache manager handles files coming from a remote file system differently and so this flag is a hint to the cache manager.
  • FO_DISALLOW_EXCLUSIVE - 0x02000000 - this flag is set during the create processing if the FILE_DISALLOW_EXCLUSIVE flag was set on the request. It would appear that this flag can only be set during an IRP_MJ_CREATE operation (there is a definition for FO_FLAGS_VALID_ONLY_DURING_CREATE which is what I'm basing this on). I don't know exactly what this flag does and looking over my notes I've never actually used it or had to interact with it so I guess I'll have to investigate this and write a post on it :).
  • FO_SKIP_COMPLETION_PORT - 0x02000000 - this flag is pretty well explained in the FILE_OBJECT structure page. Additionally, the documentation for SetFileCompletionNotificationModes also explains how this flag might be used (this flag maps to the FILE_SKIP_COMPLETION_PORT_ON_SUCCESS flag). There is also this post that explains a bit more about the flags that affect behavior of IO completion (the following flags): Designing Applications for High Performance - Part III. I'm guessing this flag can be set using the FILE_IO_COMPLETION_NOTIFICATION_INFORMATION structure and respective information class.
  • FO_SKIP_SET_EVENT - 0x04000000 - this flag is similar to FO_SKIP_COMPLETION_PORT. It maps to the FILE_SKIP_SET_EVENT_ON_HANDLE (as per the page for the SetFileCompletionNotificationModes function).
  • FO_SKIP_SET_FAST_IO - 0x08000000 - this flag appears to be similar to the previous ones but doesn’t seem to be something that can be set using SetFileCompletionNotificationModes(). Based on the documentation page for the FILE_OBJECT structure it seems that this flag controls whether the FILE_OBJECT event should be signaled when an operation goes through a fast IO call and therefore completes inline. Naturally in that case there is no need to set the event. Looking at wdm.h it would seem that this flag maps to the FILE_SKIP_SET_USER_EVENT_ON_FAST_IO.

Thursday, June 21, 2012

The Flags of FILE_OBJECTs - Part III

This is the third part in the series about FILE_OBJECT flags. Hopefully the next one will be the last one, I didn't expect this would take this long. Anyway, here is the next set of FILE_OBJECT flags:

  • FO_DIRECT_DEVICE_OPEN - 0x00000800 - this flag indicates that this FILE_OBJECT was the result of a "direct device open". I have already explored this topic in my previous post on Opening Volume Handles in Minifilters. This means that the FILE_OBJECT is in fact a handle to the storage stack volume instead of the file system volume. Please note that this flag is not the same as the FO_VOLUME_OPEN flag (and in fact they're incompatible). This flag isn't really used by file systems at all since the target device is not a file system device but rather the actual storage underneath a file system. As such file system filters will also not be involved in processing FILE_OBJECTs that have this flag set.
  • FO_FILE_MODIFIED - 0x00001000 - this flag indicates that the FILE_OBJECT was used to modify the file. It is primarily used by the file system (to track handles that didn't modify the file or to figure out when it should flush things). As far as I can tell the IO manager doesn't really use it but it will set it. The FastFat sample in the WDK shows in great detail how FO_FILE_MODIFIED is used.
  • FO_FILE_SIZE_CHANGED - 0x00002000 - this flag indicates that the file size was changed using this FILE_OBJECT. This is also primarily used by the file system and the IO manager will only set it. This flag is used in conjunction with FO_FILE_MODIFIED and, like FO_FILE_MODIFIED, the FastFat sample is a great example of a file system uses it.
  • FO_CLEANUP_COMPLETE - 0x00004000 - this flag means that the file system has already processed an IRP_MJ_CLEANUP request on this FILE_OBJECT. There are certain restrictions on which operations are allowed on such FILE_OBJECTs, as can be seen in both the CDFS and FastFat samples. Windows actually uses such FILE_OBJECTs (that have been cleaned up) fairly frequently and thus filters might need to pay attention to this flag in order to be able to figure out which operations might be allowed on it.
  • FO_TEMPORARY_FILE - 0x00008000 - this flag indicates that the file was opened with the FILE_ATTRIBUTE_TEMPORARY. File systems don't seem to do much beyond setting the flag. The consumer of this information is the Cache Manager, that does more aggressive write caching for temporary files (since it's expected that the file will go away anyway and so there is no point in flushing data to disk; please note that memory pressure might require that data gets flushed, it's just that the caching is more aggressive). See the notes under "Caching Behavior" on the page for the CreateFile function.
  • FO_DELETE_ON_CLOSE - 0x00010000 - as noted in my previous post on Using FileModeInformation, I've never seen this flag used anywhere until I looked at the implementation of IopGetModeInformation. So as far as I can tell this flag really isn't used. In the current implementation however it can be read by a query for the FileModeInformation class.
  • FO_OPENED_CASE_SENSITIVE - 0x00020000 - this flag indicates whether name operations on the FILE_OBJECT should be performed in a case-sensitive way or not. This is very important for file system filters that deal with the namespace. This applies not only to filters that change the name space (name providers must know whether to process things in a case-sensitive or in a case-insensitive way) but also for filters that just look at file names and don't actually change anything. There is a related flag that is very important, the SL_CASE_SENSITIVE. I've written multiple posts that mention case sensitivity but the more interesting ones are the one on FILE_OBJECT Names in IRP_MJ_CREATE and the one on Names in Minifilters - Implementing Name Provider Callbacks.
  • FO_HANDLE_CREATED - 0x00040000 - this flag indicates that a handle has been created for the FILE_OBJECT. From the IO manager's perspective the CREATE operation has completed successfully (so it's too late to call FltCancelFileOpen or IoCancelFileOpen). The flag is set after the successful IRP_MJ_CREATE is completed back to the IO manager, in the IopParseDevice function. The documentation for ObOpenObjectByPointer implies that it's only safe to be called for FILE_OBJECTs that have this flag set. I should have mentioned this in my previous post on Duplicating User Mode Handles, but I can see I've only added an ASSERT but didn't explain it.

Thursday, June 14, 2012

The Flags of FILE_OBJECTs - Part II

In the post last week we discussed some of the various flags of the FILE_OBJECT, but since there are so many of them I left some for this week. So here we go again:
  • FO_SEQUENTIAL_ONLY - 0x00000020 - this flag is set to indicate that the caller guarantees that this FILE_OBJECT will only be used for sequential requests (no seeks). It maps to the FILE_SEQUENTIAL_ONLY flag and is usually set when Creating the handle, though it's possible to set and query it through the FileModeInformation information class. This flag is simply an indication to the cache manager that it can be more aggressive about removing things from the file system cache (since the flag implies that once data at a certain offset was touched it will likely not be needed again). If the user sets this flag but doesn't honor the contract (i.e. if IO isn't actually sequential (so it's at random offsets in the file)) then the cache performance will not be optimal. This flag (FILE_SEQUENTIAL_ONLY) is useful when copying a file or scanning a file for hashing and so on. Without the FILE_SEQUENTIAL_ONLY flag the file will be kept in the cache and will compete with everything else in there, so it might evict useful data from the cache (for example you have an application open and you switch to explorer and copy some files around... if the files that are copied are opened with the FILE_SEQUENTIAL_ONLY flag then the cache will likely prioritize existing things in cache over the file contents so when you switch back to your application it will still be in the cache. Without the flag it is more likely that the application will be removed from the cache because of the file copy...).
  • FO_CACHE_SUPPORTED - 0x00000040 - according to the MSDN page on the FILE_OBJECT structure this flag should only be set if the file system implements supports for the FSRTL_ADVANCED_FCB_HEADER structure. However, all windows file systems should implement this structure anyway. This flag is set by the filesystem as a means to keep track whether the open for this FILE_OBJECT had the FILE_NO_INTERMEDIATE_BUFFERING set (if FILE_NO_INTERMEDIATE_BUFFERING was set then this must be clear). This flag isn't terribly useful since the flag that controls most of the caching behavior is FO_NO_INTERMEDIATE_BUFFERING. I'm not sure why someone felt there was a need to add this flag when FO_NO_INTERMEDIATE_BUFFERING conveys the same information.
  • FO_NAMED_PIPE - 0x00000080 - this flag indicates that the FILE_OBJECT is for a named pipe. This is set by the file system (NPFS in this case) so it's not available in preCreate. According to a post on NTFSD, this flag can also be set for the connection to the IPC$ by the redirector, even though that FILE_OBJECT is not backed by NPFS on the remote server. On the local system everything works pretty much as you expect, with the flag being set for all NPFS FILE_OBJECTs.
  • FO_STREAM_FILE - 0x00000100 - this flag is set for FILE_OBJECTs created using the IoCreateStreamFileObject API (or friends: IoCreateStreamFileObjectEx, IoCreateStreamFileObjectEx2 and IoCreateStreamFileObjectLite). I've discussed stream FILE_OBJECTs in my post About IRP_MJ_CREATE and minifilter design considerations - Part V. Please note that this does NOT indicate the FILE_OBJECT is for an alternate data stream (I've seen some people confuse them). The two concepts (stream file objects and alternate data streams) are unrelated.
  • FO_MAILSLOT - 0x00000200 - this flag is similar in intent with the FO_NAMED_PIPE flag, except that in this case it indicates that the underlying file system is the mailslots file system (MSFS). According to the OSR thread that I pointed to earlier, apparently for FILE_OBJECTs for remote file systems the redirector doesn't set the FO_MAILSLOT flag even if the remote FILE_OBJECT has it. But anyway for local FILE_OBJECTs this flag should be set just like the FO_NAMED_PIPE flag.
  • FO_GENERATE_AUDIT_ON_CLOSE - 0x00000400 - as the MSDN page on the FILE_OBJECT structure points out this flag is deprecated. It has the same value as the next flag, FO_QUEUE_IRP_TO_THREAD.
  • FO_QUEUE_IRP_TO_THREAD - 0x00000400 - the only information I've been able to find has also been the MSDN page on the FILE_OBJECT structure. According to that page it means that IRPs will not be queued to this FILE_OBJECT and will be queued to the thread instead (at least that's what I get from the name). This has to do with the thread agnostic IO that was a new feature for Vista. The idea is that instead of queuing IO to a thread there are cases where it can be queued to a FILE_OBJECT (as an optimization). This is described in more detail in a post by PaulSli on Doron's blog, New for Windows Vista: thread agnostic I/O. My guess is that this flag disables that optimization. I've seen this flag set occasionally but I've never had to figure out where it's set and under what circumstances that happens. Still, I hope that this bit of investigation helps.
This is it for today, these things take quite a while to investigate so I can't do too many of them at once.

Thursday, June 7, 2012

The Flags of FILE_OBJECTs - Part I

In this set of posts I'd like to discuss the various flags defined for FILE_OBJECTs and how they are used in the system. The reason I'm doing this is because for most of them the documentation is scarce or wrong and I've been meaning to document them for a while. Please note that most of this information is based on my experience with them and is not meant to be an exhaustive list of how they might be used.

The only documentation I've found on MSDN is on the page about the FILE_OBJECT structure, but as I said before it's very scarce on the details. So without further delay let's start with the flags:

  • FO_FILE_OPEN - 0x00000001 - this flag is documented as being deprecated. The sample file systems (FastFat, CDFS) don't even mention it anywhere. I've also been unable to find it set anywhere (I expected either the file system to set it or the IO manager to set it but as far as I can tell it's not really used at all).
  • FO_SYNCHRONOUS_IO - 0x00000002 - this is a pretty complicated flag (in that it controls different types of behavior) . It can only be set during the create operation (before the FILE_OBJECT is sent through the IO stack) if the caller uses the FILE_SYNCHRONOUS_IO_ALERT or FILE_SYNCHRONOUS_IO_NONALERT options. If none of these is set then the FILE_OBJECT won't have the flag set. This flag controls a couple of different behaviors:
    • if the flag is set then the FILE_OBJECT->CurrentByteOffset field is updated after each IO operation to reflect the current file position pointer and also it enables using the FILE_OBJECT->CurrentByteOffset as the current file offset to perform IO at (FILE_USE_FILE_POINTER_POSITION can only be used for a file that has FO_SYNCHRONOUS_IO set). So in a way this flag controls whether the current position can be used for this FILE_OBJECT.
    • if the flag is set then the IO manager will try to synchronize operations using that FILE_OBJECT. So if multiple threads use the same FILE_OBJECT then the IO manager will only allow one operation at a time to proceed. Please note that this doesn't mean that file system filter will only see one operation at a time since it's possible that other components on the system issue IO directly on the FILE_OBJECT without synchronizing with the IO manager (pretty typical with the Cache Manager).
    • If the flag is set then IoIsOperationSynchronous (and its friend FltIsOperationSynchronous) will return TRUE (as long as this isn't an asynchronous paging IO request, in which case it doesn't matter whether FO_SYNCHRONOUS_IO is set). The return value of IoIsOperationSynchronous and FltIsOperationSynchronous might have deeper impact on how the request is processed by the file system or file system filters (which might decide to process it inline rather than queue it if it's synchronous).
  • FO_ALERTABLE_IO - 0x00000004 - as the FILE_OBJECT structure documentation page mentions, this flag controls how the IO manager waits when using the FILE_OBJECT. If the flag is set then most waits (which can be waits to acquire the FILE_OBJECT to synchronize IO on it or just waiting for IO to complete) are alertable. The flag will be set depending on the presence of FILE_SYNCHRONOUS_IO_ALERT or FILE_SYNCHRONOUS_IO_NONALERT. The flag is only meaningful for FILE_OBJECTs for which FO_SYNCHRONOUS_IO is set and can be set and cleared in the Create path and using the FileModeInformation information class. Please note that setting or clearing it using FileModeInformation only works if the FILE_OBJECT has the FO_SYNCHRONOUS_IO set. Another thing to note is that file systems seem to ignore this flag.
  • FO_NO_INTERMEDIATE_BUFFERING - 0x00000008 - this flag indicates whether the file was opened for non-cached IO. It maps to the FILE_NO_INTERMEDIATE_BUFFERING flag and can only be set during Create. It controls two different behaviors as well:
    • If it is set then there is no file system caching for this FILE_OBJECT. The implications here are pretty heavy. A lot of the cache management specific operations and flags are disabled (I'll discuss this more further down when I talk about some of the other flags). Also the file system code paths are quite different when dealing with cached IO and non-cached IO. This also means that for IO happening on this FILE_OBJECT the file system might need to perform flushes to keep the cached and non-cached views of the data coherent.
    • Another implication of this flag being set is that IO must be aligned to the sector size of the underlying device. This impacts reads and writes as well as setting the current file pointer for example. Unaligned operations on this FILE_OBJECT will generally fail at the IO manager level with STATUS_INVALID_PARAMETER.
  • FO_WRITE_THROUGH - 0x00000010 - this flag indicates that any data written on this FILE_OBJECT should be flushed immediately to disk. It maps to the FILE_WRITE_THROUGH flag. It can be set in the Create file path and also at any time with the FileModeInformation information class. This flag cannot be set when FO_NO_INTERMEDIATE_BUFFERING is set (it makes no sense if there is no cache at all). Please note that this flag doesn't require that operations be aligned to the underlying sector size, since caching is still in effect, it's just that writes are immediately flushed to disk.

I'll stop here for today but I plan to address all the flags in the following posts.

Thursday, May 31, 2012

Using FileModeInformation

I've recently had to debug again how the system handles the FileModeInformation information class and whenever that happens (debugging something more than once) it's generally a good indication that I need to write a blog post, at least so that I don't have to debug yet again in the future. Incidentally, it's quite scary when I occasionally search for something and I discover that I have actually written a blog post about it but I've completely forgotten not only the information in the post but even the fact that I have written about it. I hope this explanation will come in quite handy at some point in the future when I'll actually write a blog post about something I've already blogged about :).

Anyway, on to business. FileModeInformation is an interesting information class because it can be used to query and set some of the create options that were used when the file was opened. Another reason why it is interesting is because it's completely handled in the IO manager (i.e. a request is never sent to the file system). Since this information class is also part of the FileAllInformation information class, the fact that the file system doesn't handle this leads to pretty interesting semantics (since the IO manager must fill in part of the FILE_ALL_INFORMATION structure while the file system fills in the rest). I've discussed this in my previous blog post on Filters And IRP_MJ_QUERY_INFORMATION.

Other than the specifics of implementing the FileAllInformation class in a filter, there are a couple of other aspects that I think are interesting to discuss about the FileModeInformation class.

  • Querying - the list of CreateOptions that can be queried using the FileModeInformation is limited to (the information is listed on the MSDN page 2.4.24 FileModeInformation):
    • FILE_WRITE_THROUGH
    • FILE_SEQUENTIAL_ONLY
    • FILE_NO_INTERMEDIATE_BUFFERING
    • FILE_SYNCHRONOUS_IO_ALERT
    • FILE_SYNCHRONOUS_IO_NONALERT
    • FILE_DELETE_ON_CLOSE - please note that the MSDN documentation states that this flag isn't implemented yet and is never returned as set. However, peeking at the implementation (do a uf nt!IopGetModeInformation) clearly shows that the FILE_DELETE_ON_CLOSE is set when the FILE_OBJECT flag FO_DELETE_ON_CLOSE is set. This is the only time I've actually seen FO_DELETE_ON_CLOSE used anywhere. I had tried a couple of times in the past to see where it gets set but could never find it (though obviously I couldn't rule out the possibility that it does get set somewhere). Now, looking at the implementation of IopGetModeInformation and at the documentation page I can draw the conclusion that this flag isn't actually set anywhere at all.
  • Setting - the list of attributes that can be set is even shorter. It is addressed in the MSDN page 3.1.5.14.7 FileModeInformation. One thing to note is that the list of attributes that can be set is also available as a bitmask in the WDK, as FILE_VALID_SET_FLAGS. Also the list of flags that can be set using FileModeInformation indicates which of these modes make sense to change for a file that has already been open (and possibly in use) for a while (in other words, flags that can change file system behavior at a certain point and not only when the file is opened; for example, setting the FILE_DELETE_ON_CLOSE attribute doesn't make sense once the file was opened so it's not available to set). Anyway, these are the attributes that can be changed using the FileModeInformation information class:
    • FILE_WRITE_THROUGH
    • FILE_SEQUENTIAL_ONLY
    • FILE_SYNCHRONOUS_IO_ALERT
    • FILE_SYNCHRONOUS_IO_NONALERT
  • Issuing these requests from a minifilter - this is a pretty interesting issue. The dedicated FltMgr APIs, FltQueryInformationFile and FltSetInformationFile do little else that to allocate a FLT_CALLBACK_DATA structure and send it to the file system. However, since the file system doesn't actually implement these requests it's impossible to use FileModeInformation with either FltQueryInformationFile or FltSetInformationFile (though a request will actually be sent). Since the actual implementation happens in the IO manager a filter could call the ZwQueryInformationFile and ZwSetInformationFile APIs but those require a handle for the FILE_OBJECT and filters don't often have that. So what can a file system filter do if it wants to query or set these flags ? As far as I can tell there are two options:
    1. The file system filter can implement the equivalent IO manager code (which is actually not that hard since the steps are documented pretty well on the MSDN page 3.1.5.14.7 FileModeInformation and the implementation for nt!IopGetModeInformation is pretty straightforward).
    2. The file system filter can call an undocumented API, IoSetInformation(), that is a pretty good replacement for ZwSetInformationFile(). Unfortunately I couldn't find a similar function for ZwQueryInformationFile() so filters would most likely have to resort to checking flags directly.

Thursday, May 24, 2012

Writing to Read-Only Files

This week I want to talk about a topic that's pretty interesting, the topic of writing to a read-only file. I've mentioned this in my post About IRP_MJ_CREATE and minifilter design considerations - Part VI but I want to discuss it in a bit more depth.

Why is writing to a read-only file important ? Well, for one, it allows one to implement a file system based synchronization mechanism. It might also be useful for filters that might need to write data to files for such purposes as tracking access or simply to virtualize a file's data (deduplication filters and HSM might want to write to a read-only file , if only to put the original data in).

So first let's look at how the access checks when writing to a file actually work. It's a pretty straightforward operation:

  1. Caller calls NtWriteFile with a file handle.
  2. NtWriteFile tries to resolve the handle to a FILE_OBJECT by calling ObReferenceFileObjectForWrite()
  3. ObReferenceFileObjectForWrite() gets the handle information from the handle extracts the actual access that was granted to the caller of the handle.
  4. ObReferenceFileObjectForWrite() then a simple bit check between the requested access (which is for write) and the one granted to the handle. If the granted access doesn't include write this is where STATUS_ACCESS_DENIED is returned.

From this it's clear that an easy way to be able to write to handle is to have been granted that access when the handle was created. So one simple way to achieve the goal we set for ourselves in the title is to open a file that is not a read-only file for write and then set the read-only attribute. This can be done on the handle we have (that has been granted write access) and since we made the file read-only no one else open a handle for write, while we can use the handle to write to the file. However, if we close this handle then we can't open another handle for write on that file since it's now a read-only file. So this also shows a potential limitation when clearing the read-only flag: the handle where we'll do that can't have write data access and so it would be necessary to reset the read-only attribute on one handle and then open another handle with write data access to write data to the file.

I find this quite interesting because it exposes how the internal implementation of the handle access checks but it's not necessarily relevant to file system filters, since they can use the FILE_OBJECT directory to perform any write they want. This might be useful, however, to user mode services that work in conjunction with the file system filter that might require using a handle for IO.

Anyway, the thing I really wanted to point out was that there is another case when writing on a read-only file (a file that has a read-only attribute) is possible, a case that doesn't require that the file be opened without the read-only flag at all. The case I'm talking about happens when creating a read-only file. If the caller of the CreateFile (or one of the CreateFile APIs) is actually creating the file (it's not an "open" type of create) and they're asking for it to be a read-only file but at the same time they're requesting write access, then if the create operation succeeds the handle they get back can actually be used to write to the file. In my experience this behavior isn't that well known. It is also quite useful for implementing various types of atomic synchronization using the file system (creating a file with GENERIC_WRITE, CREATE_ALWAYS and FILE_ATTRIBUTE_READONLY will only succeed for the first caller and will fail for subsequent callers since the file will be read-only… the first caller can then reset the read-only attribute on the file and the next subsequent request with the same parameters will succeed (provided the sharing mode allows it)).

This is the kind of semantic that makes things interesting for a file system filter that tries to open files before the user (i.e. open the target file for an IRP_MJ_CREATE from the PreCreate callback for that operation). This is generally not a good idea for many reasons and this is just another one of them. It's also a good reminder of the subtle some of the file system semantics are and how easy it is for a filter to change things in a way that might break things.

Thursday, May 17, 2012

More Fun with FileIDs: Boot Critical Files

As you might have guessed from my previous posts, I'm actively working on a file system filter that uses FileIDs to open files. As it happens this isn't something most developers do so I occasionally find myself in some dark corner of the file system that I had no idea existed. Today's post is about a debugging session where I ran into one of these corners.
This all started when I was testing my filter on Server 2008 R2. I have tested it extensively on Win7 AMD64 and since Srv08R2 is pretty much identical in terms of the kernel and file system code I didn't expect I'd run into any issues. However, to my surprise, the machine wouldn't boot, crashing with the following bugcheck (edited a couple of lines here and there to make the output smaller, and highlighted the important bits):
kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

NTFS_FILE_SYSTEM (24)
    If you see NtfsExceptionFilter on the stack then the 2nd and 3rd
    parameters are the exception record and context record. Do a .cxr
    on the 3rd parameter and then kb to obtain a more informative stack
    trace.
Arguments:
Arg1: 00000000001904fb
Arg2: fffff880009903b8
Arg3: fffff8800098fc10
Arg4: fffff880017504cf

Debugging Details:
------------------

ExceptionAddress: fffff880017504cf (Ntfs! ?? ::NNGAKEGL::`string'+0x0000000000013040)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000070
Attempt to read from address 0000000000000070

CONTEXT:  fffff8800098fc10 -- (.cxr 0xfffff8800098fc10)
rax=fffff8a000aaa5e0 rbx=fffff8a000aaa940 rcx=0000000000000000
rdx=fffff88000990501 rsi=0000000000000000 rdi=0000000000000000
rip=fffff880017504cf rsp=fffff880009905f0 rbp=fffff88000990800
 r8=0000000000000000  r9=0000000000000000 r10=0000000000000000
r11=fffff880009905d0 r12=0000000000000000 r13=fffff8a000aaa501
r14=fffffa800e044500 r15=fffff8a000aaa760
iopl=0         nv up ei pl nz na pe nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
Ntfs! ?? ::NNGAKEGL::`string'+0x13040:
fffff880`017504cf 488b5770        mov     rdx,qword ptr [rdi+70h] ds:002b:00000000`00000070=????????????????
Resetting default scope

PROCESS_NAME:  lsass.exe

CURRENT_IRQL:  2

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  0000000000000070

READ_ADDRESS:  0000000000000070 

FOLLOWUP_IP: 
Ntfs! ?? ::NNGAKEGL::`string'+13040
fffff880`017504cf 488b5770        mov     rdx,qword ptr [rdi+70h]

FAULTING_IP: 
Ntfs! ?? ::NNGAKEGL::`string'+13040
fffff880`017504cf 488b5770        mov     rdx,qword ptr [rdi+70h]

BUGCHECK_STR:  0x24

DEFAULT_BUCKET_ID:  NULL_CLASS_PTR_DEREFERENCE

LAST_CONTROL_TRANSFER:  from fffff880016d8f46 to fffff880017504cf

STACK_TEXT:  
fffff880`009905f0 fffff880`016d8f46 : fffffa80`0e044500 00000000`00000000 00000000`00000000 00000000`00000001 : Ntfs! ?? ::NNGAKEGL::`string'+0x13040
fffff880`00990660 fffff880`016d96b4 : fffffa80`0e044500 fffffa80`0e6febb0 fffffa80`0d6d1420 00000000`00000000 : Ntfs!NtfsCommonFlushBuffers+0x3f2
fffff880`00990740 fffff880`00c02bcf : fffffa80`0e6fee78 fffffa80`0e6febb0 fffffa80`0e044500 fffff880`00990768 : Ntfs!NtfsFsdFlushBuffers+0x104
fffff880`009907b0 fffff880`00c016df : fffffa80`0d27d730 00000000`00000000 fffffa80`0d27d700 fffffa80`0e6febb0 : fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
fffff880`00990840 fffff800`019e171b : 00000000`00000002 fffffa80`0d6cf640 00000000`00000000 fffffa80`0e6febb0 : fltmgr!FltpDispatch+0xcf
fffff880`009908a0 fffff800`01978871 : fffffa80`0e6febb0 fffffa80`0e54b6a0 fffffa80`0d6cf640 fffff800`0184be80 : nt!IopSynchronousServiceTail+0xfb
fffff880`00990910 fffff800`016d88d3 : fffffa80`0e54b6a0 fffff8a0`016edeb0 fffffa80`0d27d730 fffffa80`0d6cf640 : nt!NtFlushBuffersFile+0x171
fffff880`009909a0 fffff800`016d4e70 : fffff800`01979627 00000000`00000010 00000000`ffffffff 00000000`00000001 : nt!KiSystemServiceCopyEnd+0x13
fffff880`00990b38 fffff800`01979627 : 00000000`00000010 00000000`ffffffff 00000000`00000001 00000000`00000000 : nt!KiServiceLinkage
fffff880`00990b40 fffff800`019797a7 : 00000000`0000001e fffff880`00990bd0 00000000`00b1e801 fffff800`0000001f : nt!CmpFileFlush+0x3f
fffff880`00990b80 fffff800`016d88d3 : fffffa80`0e54b6a0 fffff8a0`016edeb0 fffff880`00990ca0 00000000`00c17120 : nt!NtFlushKey+0xfb
fffff880`00990c20 00000000`770a1f7a : 000007fe`fcbf3b76 00000000`00000000 00000000`00c17120 00000000`00c17120 : nt!KiSystemServiceCopyEnd+0x13
00000000`00b1eba8 000007fe`fcbf3b76 : 00000000`00000000 00000000`00c17120 00000000`00c17120 00000000`00000000 : ntdll!ZwFlushKey+0xa
00000000`00b1ebb0 000007fe`fe7323d5 : 00000000`00c17101 000007fe`00000000 00000000`00000001 00000000`00b1eea0 : SAMSRV!SamrCloseHandle+0xf6
00000000`00b1ec00 000007fe`fe7db68e : 00000000`00000002 00000000`00000001 000007fe`fcc80220 00000000`00c20c30 : RPCRT4!Invoke+0x65
00000000`00b1ec50 000007fe`fe71ac40 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000001 : RPCRT4!Ndr64StubWorker+0x61b
00000000`00b1f210 000007fe`fe7250f4 : 230100ef`00000001 00000000`ac896745 00002f00`e3236899 00000000`00000001 : RPCRT4!NdrServerCallAll+0x40
00000000`00b1f260 000007fe`fe724f56 : 00000000`00c20ad0 00000000`00000018 00000000`00b1f410 00000000`00c1b330 : RPCRT4!DispatchToStubInCNoAvrf+0x14
00000000`00b1f290 000007fe`fe71d879 : 00000000`00bda928 000007fe`fe72db94 00000000`00000001 00000000`00000001 : RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x146
00000000`00b1f3b0 000007fe`fe71d6de : 00000000`00c20ad0 00000000`00c20ad0 00000000`00000003 00000000`00000000 : RPCRT4!OSF_SCALL::DispatchHelper+0x159
00000000`00b1f4d0 000007fe`fe71d4f9 : 00000000`00000014 00000000`00000000 00000000`00c1a2d0 00000000`00c20ad0 : RPCRT4!OSF_SCALL::ProcessReceivedPDU+0x18e
00000000`00b1f540 000007fe`fe71d023 : 00000000`00c20a18 00000000`00c20a70 00000000`00000000 00000000`00000000 : RPCRT4!OSF_SCONNECTION::ProcessReceiveComplete+0x3e9
00000000`00b1f5f0 000007fe`fe71d103 : 00000000`00bda850 00000000`00b1f9c0 00000000`00c20a18 00000000`00b1f8d8 : RPCRT4!CO_ConnectionThreadPoolCallback+0x123
00000000`00b1f6a0 000007fe`fd1c908f : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : RPCRT4!CO_NmpThreadPoolCallback+0x3f
00000000`00b1f6e0 00000000`7706098a : 00000000`00485840 00000000`00000000 00000000`77154da0 00000000`77070043 : KERNELBASE!BasepTpIoCallback+0x4b
00000000`00b1f720 00000000`7706feff : 00000000`00000002 00000000`00000000 00000000`00c20a70 00000000`00b1f9c0 : ntdll!TppIopExecuteCallback+0x1ff
00000000`00b1f7d0 00000000`76e4652d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!TppWorkerThread+0x3f8
00000000`00b1fad0 00000000`7707c521 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`00b1fb00 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d
So as you can see, this was an access violation in NTFS, in some function that the debugger couldn't resolve properly (hence the weird Ntfs! ?? ::NNGAKEGL::`string'+0x13040 name). The stack shows how this happens when a registry key is flushed as a result of closing a handle, and the registry eventually ends up flushing the actual hive file which makes it to NTFS where it bugchecks.
The first thing to do is to try to figure out what the function is that NTFS actually bugchecks in:
kd> ub Ntfs!NtfsCommonFlushBuffers+0x3f2
Ntfs!NtfsCommonFlushBuffers+0x3c7:
fffff880`016d8f1b 488b4348        mov     rax,qword ptr [rbx+48h]
fffff880`016d8f1f 488b4818        mov     rcx,qword ptr [rax+18h]
fffff880`016d8f23 48898c24f8000000 mov     qword ptr [rsp+0F8h],rcx
fffff880`016d8f2b 8b051389fdff    mov     eax,dword ptr [Ntfs!NtfsDebugDiskFlush (fffff880`016b1844)]
fffff880`016d8f31 4c8b8424f8000000 mov     r8,qword ptr [rsp+0F8h]
fffff880`016d8f39 488b542468      mov     rdx,qword ptr [rsp+68h]
fffff880`016d8f3e 498bcc          mov     rcx,r12
fffff880`016d8f41 e84efaffff      call    Ntfs!NtfsFlushBootCritical (fffff880`016d8994)
So NtfsCommonFlushBuffers calls NtfsFlushBootCritical, which then jumps around a bit and, due to pretty aggressive optimizations, ends up in a location that the debugger doesn't have a name for, hence the "Ntfs! ?? ::NNGAKEGL::`string'+0x13040" name.
Anyway, next step in this case is to try to find the FILE_OBJECT and see if it's somehow damaged or something. I also generally like to find the IRP and, since I'm working on minifilter, I also like to get the FLT_CALLBACK_DATA structure. This isn't too bad in most cases but it's a pain on amd64 systems because the calling convention makes sure the IRP_CTRL (the parent structure for the FLT_CALLBACK_DATA) isn't passed as a parameter on the stack. Also, since FltMgr implements a callback model it means that while my filter was definitely involved in processing this IO, my callback functions (for which I have symbols and I could get local variables and such) have already returned and can't be found on this stack. So the easiest thing to do in this case is to try to find the IRP_CALL_CTRL structure (see these posts for more details on what that is and how to get it: Debugging Minifilters: Finding the FLT_CALLBACK_DATA - Part I and Debugging Minifilters: Finding the FLT_CALLBACK_DATA - Part II).
kd> dp fffff880`00990840
fffff880`00990840  fffffa80`0d27d730 00000000`00000000
fffff880`00990850  fffffa80`0d27d700 fffffa80`0e6febb0
fffff880`00990860  fffffa80`0d3875a0 fffffa80`0e6febb0
fffff880`00990870  fffffa80`0e8c7790 ffffffff`ffffffff
fffff880`00990880  00000000`00000000 fffffa80`00000204
fffff880`00990890  fffffa80`0e6febb0 fffff800`019e171b
fffff880`009908a0  00000000`00000002 fffffa80`0d6cf640
fffff880`009908b0  00000000`00000000 fffffa80`0e6febb0
kd> dt fffff880`00990840+0x20 fltmgr!_IRP_CALL_CTRL
   +0x000 Volume           : 0xfffffa80`0d3875a0 _FLT_VOLUME
   +0x008 Irp              : 0xfffffa80`0e6febb0 _IRP
   +0x010 IrpCtrl          : 0xfffffa80`0e8c7790 _IRP_CTRL
   +0x018 StartingCallbackNode : 0xffffffff`ffffffff _CALLBACK_NODE
   +0x020 OperationStatusCallbackListHead : _SINGLE_LIST_ENTRY
   +0x028 Flags            : 0x204 (No matching name)
kd> !fltkd.cbd 0xfffffa80`0e8c7790

IRP_CTRL: fffffa800e8c7790  FLUSH_BUFFERS (9) [00000001] Irp
Flags                    : [10000000] FixedAlloc
Irp                      : fffffa800e6febb0 
DeviceObject             : fffffa800d27d730 "\Device\HarddiskVolume1"
FileObject               : fffffa800d6d1420 
CompletionNodeStack      : fffffa800e8c78e0   Size=5  Next=1
SyncEvent                : (fffffa800e8c77a8)
InitiatingInstance       : 0000000000000000 
Icc                      : fffff88000990860 
PendingCallbackNode      : ffffffffffffffff 
PendingCallbackContext   : 0000000000000000 
PendingStatus            : 0x00000000 
CallbackData             : (fffffa800e8c7840)
 Flags                    : [00000001] Irp
 Thread                   : fffffa800e54b6a0 
 Iopb                     : fffffa800e8c7898 
 RequestorMode            : [00] KernelMode
 IoStatus.Status          : 0x00000000 
 IoStatus.Information     : 0000000000000000 
 TagData                  : 0000000000000000 
 FilterContext[0]         : 0000000000000000 
 FilterContext[1]         : 0000000000000000 
 FilterContext[2]         : 0000000000000000 
 FilterContext[3]         : 0000000000000000 

   Cmd     IrpFl   OpFl  CmpFl  Instance FileObjt Completion-Context  Node Adr
--------- -------- ----- -----  -------- -------- ------------------  --------
 [0,0]    00000000  00   0000   0000000000000000 0000000000000000 0000000000000000-0000000000000000   fffffa800e8c7ae0
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 [0,0]    00000000  00   0000   0000000000000000 0000000000000000 0000000000000000-0000000000000000   fffffa800e8c7a60
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 [0,0]    00000000  00   0000   0000000000000000 0000000000000000 0000000000000000-0000000000000000   fffffa800e8c79e0
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 [0,0]    00000000  00   0000   0000000000000000 0000000000000000 0000000000000000-0000000000000000   fffffa800e8c7960
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 [9,0]    00060004  00   0002   fffffa800d43d610 fffffa800d6cf640 fffff880014ecea0-0000000000000000   fffffa800e8c78e0
            ("ivm","IVM")  ivm!IvmFSPostOpDispatch 
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Working IOPB:
>[9,0]    00060004  00          fffffa800e73e010 fffffa800d6d1420                     fffffa800e8c7898
            ("luafv","luafv")  
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
So now that I have the FLT_CALLBACK_DATA I need to get the FILE_OBJECT. Please note that the FILE_OBJECT changes between the parameters my filter gets and the ones FltMgr is using now (after my filter was called), which make sense since I replace the FILE_OBJECT in the callback. So let's see what that FILE_OBJECT looks like and what the file is:
kd> !fileobj fffffa800d6d1420

掃

Related File Object: 0xfffffa800d43d460

Device Object: 0xfffffa800d301cd0   \Driver\volmgr
Vpb: 0xfffffa800cee62f0
Event signalled
Access: Read Write 

Flags:  0x140008
 No Intermediate Buffering
 Handle Created
 Random Access

FsContext: 0xfffff8a000aaa6f0 FsContext2: 0xfffff8a000aaa940
CurrentByteOffset: 0
Cache Data:
  Section Object Pointers: fffffa800d6d3af8
  Shared Cache Map: 00000000


File object extension is at fffffa800d41b780:

As you have undoubtedly noticed, the file name is actually pretty strange (and in case you can read that character, one could possibly associate sweeping with flushing a file, but it's a bit of a stretch :)) . Whenever I see a filename that looks like one or a couple of characters that don't belong I suspect that that file was opened by ID (and in this case I actually know my filter does this so I was sort of expecting it anyway). A very important thing to check at this point is that the FILE_OBJECT is actually an NTFS FILE_OBJECT and I'm not somehow leaking FILE_OBJECTs from my filter into the file system. The easiest way to check is to look at the tags for the buffers in FsContext and FsContext2, since the owner of the FILE_OBJECT allocates those:
kd> !pool 0xfffff8a000aaa6f0 2
Pool page fffff8a000aaa6f0 region is Paged pool
*fffff8a000aaa5b0 size:  580 previous size:   80  (Allocated) *NtfF
  Pooltag NtfF : FCB_INDEX, Binary : ntfs.sys
kd> !pool 0xfffff8a000aaa940 2
Pool page fffff8a000aaa940 region is Paged pool
*fffff8a000aaa5b0 size:  580 previous size:   80  (Allocated) *NtfF
  Pooltag NtfF : FCB_INDEX, Binary : ntfs.sys
One final thing to check is that the actual file name is a file ID. This is pretty complicated to do since a fileID can look almost like anything. The easiest way to do this is to use a VM (so that it's easy to snap back to the same OS at an earlier point) and look at what the file system looks like without the filter installed and use fsutil.exe to resolve the FileID to a name:
kd> dt fffffa800d6d1420 nt!_FILE_OBJECT FileName.
   +0x058 FileName  :  "掃"
      +0x000 Length    : 8
      +0x002 MaximumLength : 0x38
      +0x008 Buffer    : 0xfffff8a0`00aed860  "掃"
kd> dp 0xfffff8a0`00aed860 L1
fffff8a0`00aed860  00010000`00006383
So anyway, this does look like a file ID and it actually was the fileID for the file "\Windows\System32\Config\SYSTEM", which is the system hive. This was pretty confusing because I was expecting I was doing something obviously wrong (like pass the wrong FILE_OBJECT down the stack) and yet I couldn't see anything broken here. So I decided to see what's actually going on in NTFS.
I spent some time looking at the disassembly for NtfsFlushBootCritical but like most NTFS code it's packed with structures I have no idea about. In these cases it's easier to walk through the code in the debugger to see what the structures look like (and the pool tags they use) to try to figure out what they might be. However, NtfsFlushBootCritical doesn't actually get called at all for (almost) any other file. So at this point I was pretty puzzled because it meant NTFS somehow knew that the system hive was a "boot critical" file, unlike any other file on the file system (since that function was never called again). I didn't know that NTFS did special things for files that are deemed boot critical (since NTFS is generally pretty uniform in how it does things, it's rare to see it do something special for a certain windows process) and I had no idea what those files were.
Anyway, as it turned out NtfsFlushBootCritical was called for the system hive even when my driver wasn't there so I could debug it in more detail. After walking through it a couple of times it became obvious that it was walking a list of SCBs, starting from the file I had. The SCB when my filter was present was NULL and then NTFS tried to access a member of the SCB, resulting in a NULL pointer dereference.
Now, I'd like to point out that NTFS keeps a pointer to the parent SCB for each SCB. So the SCB for the file "\Windows\System32\Config\SYSTEM" has a pointer to the SCB for the folder "\Windows\System32\Config", which in turn has a pointer to the SCB for its parent folder, "\Windows\System32" and so on. However this doesn't always happen for files that are opened by ID, because there is no path traversal necessary in order to open the file (NTFS can use the fileID as the index and immediately find the file) and it doesn't make much sense for NTFS to populate the SCB if it's not necessary.
So, obviously, the issue here is that when opening the file \Windows\System32\Config\SYSTEM by ID the pointer to the folder SCB wasn't initialized when NTFS expected it would be and so it ended up dereferencing a NULL pointer.
What puzzled me about this is that this never happened on Win7. So I debugged my code on that platform and guess what, in Win7 the link is actually initialized even when the file is opened by fileID. As it turns out, NTFS will not populate the pointer to the parent SCB if it doesn't have to, but if someone asks for the file name (a filter calling FltGetFileNameInformation() for example) then it will walk the list and build the path and then the SCB pointer will be initialized. This is exactly what happens on Win7 because there are other filters (FileInfo, Luafv) that query names, but on Srv08R2 they're not present (or at least not enabled) and so this doesn't happen.
Now the final bit of mystery was how did NTFS know that "\Windows\System32\Config\SYSTEM" should be handled in a special way. What was it that made it a boot critical file ? Well, as it turns out, there is an FSCTL, FSCTL_SET_BOOTLOADER_ACCESSED, which serves this purpose. When NTFS receives this for a FILE_OBJECT it sets an internal flag and then when an IRP_MJ_FLUSH_BUFFERS arrives on the same handle NTFS will flush not only that file but all the folders that make up the path to that file (which is all handled in NtfsFlushBootCritical).