Of Filesystems And Other Demons

Getting Started with Windows File System Filters - DriverEntry

2014-01-23T13:26:00.000-08:00

Now that we have everything in place (meaning the INF file with the Microsoft allocated altitude), we're ready to start coding. Minifilters are regular WDM drivers (well, they implement a subset of the features; no power manager or plug and play) and so their main() function is DriverEntry(). A minifilter's DriverEntry() is very simple and typically only registers the minifilter with FltMgr and starts filtering. Of course, a minifilter can choose a different flow, where it starts filtering (or even registers with FltMgr) at a later point. In my experience these are pretty rare so I'll just discuss the usual case.
So, as explained before, there are two steps to "starting" a minifilter:

Registering it with FltMgr - This is done by calling FltRegisterFilter() and it is during this time where the FLT_FILTER structure is allocated by FltMgr. When the function returns the system can see the filter but filtering hasn't started yet, no callbacks are being called.
Starting filtering - This is done in the call to FltStartFiltering() and a minifilter is expected to be ready to process callbacks even before the function returns. This separation is useful for cases where the filter needs to set up some additional things before it's ready to do any work, things that might require some time (for example it might need to create some threads).

The only potentially difficult thing here is dealing with the many fields of the FLT_REGISTRATION structure. It has its own documentation page, but i'd like to quickly go over the fields:

Size - this is pretty obvious, just sizeof(FLT_REGISTRATION).
Version - this has changed a couple of times. Mainly exists for future compatibility. Look in fltkernel.h if you'd like to see what the versions look like.
Flags - The values here are documented and pretty easy to understand. Keep in mind that support for NPFS and MSFS is only available for Win8 and newer.
ContextRegistration - Now we're getting into the tricky bits. This is an array of structures that define which types of contexts the filter will use. This concept of the array of structures might feel a bit strange initially, but it works out great for things that are really sparse. For example there are 7 types of context and most applications only use one or two.
OperationRegistration - This is also an array of the IRP_MJ operations you want to filter. I generally start with one or two operations (depending on the style of filter) and just keep adding stuff here.
FilterUnloadCallback - Callback for unloading the filter. If NULL the filter can't be unloaded.
InstanceSetupCallback - Callback for when an instance gets created.
InstanceQueryTeardownCallback - Callback to check whether a manual detach should be allowed.
InstanceTeardownStartCallback - Callback to indicate that the instance will start tearing down.
InstanceTeardownCompleteCallback - Callback to indicate that the instance has finished tearing down.
GenerateFileNameCallback - Name provider callback for generating names.
NormalizeNameComponentCallback - Name provider callback for normalizing names.
NormalizeContextCleanupCallback - Name provider callback for normalization context cleanup.
TransactionNotificationCallback - Callback for transaction management.
NormalizeNameComponentExCallback - Another normalization callback for with better transaction support.
SectionNotificationCallback - A callback for notifications for sections created by the filter.

As you can see this list is a pretty random set of things that the filter might need. So let's look at the minimal set of things that are required to make a filter:

ContextRegistration - for anything other than very trivial filters you'll need some contexts, so this will probably have something.
OperationRegistration - same here, you can only do very limited stuff (like watching volumes appear and go away) without any operations to monitor. You'll have at least one callback defined here.
FilterUnloadCallback - you can probably skip this initially but you won't be able to unload the filter. Doesn't matter much if you use a VM for debugging but in general I'd advise creating this callback, even if it doesn't do anything.
InstanceXxxCallback - you can probably initially skip all these. In my experience it'll be pretty clear when you need to use them.
XxxNameXxxCallback - unless you're a name provider you can skip all these. You probably shouldn't make your first filter a name provider anyway.
TransactionNotificationCallback - very likely you can skip this initially.
SectionNotificationCallback - same here, most likely not necessary.

So basically once you build this structure and call FltRegisterFilter() and FltStartFiltering() from DriverEntry() you should be able to play with your filter. If you don't want to do all the typing take a look at the nullFilter WDK sample. It doesn't really do anything (no operations are defined) but it should load and unload just fine. Easy way if you want a simple file system filter template to build on top of.

Getting Started with Windows File System Filters - What's That INF File For ?

2014-01-16T08:00:00.000-08:00

In a nutshell, INF files are files that contain parameters for drivers. There is an MSDN page for INF files in general and then there is a file system filter specific MSDN page. I won't go over the information there, which is very comprehensive, I'd rather focus a bit on how I work with them and what sort of decisions a file system designer might need to make.

In terms of how often does one use the INF files, I use the INF file every time I need to install a new version of my minifilter. My workflow is to rebuild the SYS file and copy it to a location shared with the VM, and then I need the INF to load it. Sometimes I load it by right-clicking the INF file and then clicking on Install:

This is fine when I just want to play with a filter but when actively working on it (when I'm building it 40 times a day) I use a script to load it. It's usually a one-line script but it can grow if I need to initialize other things as well:
rundll32.exe setupapi.dll,InstallHinfSection DefaultInstall 128 nullfilter.inf
The magic value "128" is documented on the page for the InstallHinfSection Function. I never need to reboot for development versions of my drivers so 128 works for me. I remove any bits that might require a reboot and if I need to run other services and such I just add more lines to my installation script for that.
Now, for most filters I've ever needed to work on I just take one of the INF files for any of the file system filter samples (nullfilter is the one I use as a template most often, for more advanced usage see simrep and minispy) and I change a couple of things:

The driver name. This is pretty much a straight find&replace, but make sure you preserve the case.
The LoadOrderGroup. This is the big one. Each file system filter developer needs to pick the right LoadOrderGroup for their filter, depending on what they plan do to. All the various load orders are listed on the MSDN page Load Order Groups for File System Filter Drivers. For an in-depth look at how loading a filter works and how to more closely control the load order see my previous post on Controlling the Load Order of File System Filters. However, at the time of writing the INF all one needs to worry about is the proper LoadOrderGroup.
The Class and ClassGuid. This is closely related to the LoadOrderGroup, basically just copy the name and the GUID from the MSDN page File System Filter Driver Classes and Class GUIDs.
The altitude. You need to get this from Microsoft, they allocate unique ones for each filter. If you follow the steps on the page Minifilter Altitude Request you should receive your altitude in an email. This might take a couple of weeks or more (AFAIK it's a manual process and each request is reviewed to make sure that the LoadOrderGroup for the filter type makes sense (based on the brief description in the submission form; thus it helps to explain what the filter does fairly clearly, to expedite the process)). But again, I don't think there are any guarantees when it comes to the time, so you want to do this early in the development process and not wait till the last minute. However, when starting development you'll need an altitude to load your filter so what I normally do is just pick an unused one in the right LoadOrderGroup from the page Allocated Altitudes while I wait for my request to be processed. Please keep in mind that this page isn't updated all the often so it's possible that the altitude you picked has already been allocated to someone else. But in general when you get the new altitude it's simply a matter of replacing it in the INF file and you should be good to go. Please make sure to use altitudes that you picked (not allocated by Microsoft) ONLY for development and debugging, you NEVER should to release a filter or share it with customers without a proper altitude from Microsoft.
The InstanceFlags. The only documentation I've been able to find for these is in a presentation on Loading and Unloading Minifilters. In short, 1 means 'don't allow automatic attachments' and 2 means 'don't allow manual attachments'. I mostly use 1 during development anyway since I prefer to attach my filter manually to specific drives (generally by adding more lines to the script that loads the driver).

Since I mentioned altitudes here and the fact that you must register your filter with Microsoft and get an altitude from them, there is another process that requires a similar registration with Microsoft. I'm talking about the process of requiring a reparse point. Basically, if you call FSCTL_SET_REPARSE_POINT in your product (assuming you don't set any of the Microsoft reparse points, which are undocumented anyway as far as I know) you need a reparse point tag. There is a page that describes that process, Reparse Point Tag Request and the only other thing you'll need when applying for one of those is a GUID, which you must generate and then use together with the Reparse Point Tag (the one you get back from Microsoft) every time you call FSCTL_SET_REPARSE_POINT. (NOTE: this is only necessary for a very small set of filters and it's not related with the INF file at all, I'm just mentioning it here because it also requires a tag from Microsoft, similar to altitudes, so it's easy to do them both at the same time (if you are one of those filters)).

Anyway, hopefully this page and the links to MSDN documentation will get you going with your INF so that you can finally start working on the actual filter.

Getting Started with Windows File System Filters - Hooking Up WinDbg to the VM

2013-12-19T11:17:00.004-08:00

Sorry I missed last Thursday, I've been traveling and I didn't get around to post anything.
This post is focused to hooking up WinDBG to a VM. There are multiple ways to do attach debuggers to physical machines, each with it's specific pluses and minuses (please note, however, that this list changes often since Microsoft keeps improving the tools, which is awesome). Also, please note that in most cases you'll use two machines, a target (the machine you want to debug) and a host (the debugger host). If I remember correctly there used to be a way to attach WinDBG to the kernel it was running on, which was useful to inspect various kernel structures, but I don't think that's supported anymore, and anyway with VMs this is easy to do and much more flexible. Anyway, here are the main ways to attach to machines (that I'm aware of at the time of writing this):

serial cable connection. This is pretty universal, works on all Windows versions and it's been tested for a very long time. However, not many machine have serial ports anymore, and serial is really slow. But it's really the solution that's most likely to work for all cases.
USB connection. I've never actually used this, I think I tried once but gave up for some reason I can't remember (and I was left with the impression that it's quite finicky). This is also only available on Vista and newer.
1394 (or firewire). I've used this a lot, it's really fast and pretty easy to set up. The downside is that it's expensive because it needs specialized hardware (like firewire cards and cables). This is available since XP and I highly recommend it if you need to work with real hardware and a VM won't work for you.
network. I've not used this at all since it's pretty new (Win8+) but it should be pretty awesome since you don't have to be physically connected to the machine you're trying to debug (which limits the usefulness of the other methods, since if you work in a larger group you'll often have to debug things on machines that are not in your office so you'll have to figure out a way to connect to them).

In general you really want a fast debugger connection because you might end up moving massive amounts of data across from the target to the host (for example, you might decide to capture a dump instead of keep using the machine, so if you'll have to capture all physical memory you'll want the 4, 8 or who knows, maybe even 128 GB to move quickly. I mean, over serial this can last weeks...
However, we're not talking about physical machines here so you should be able to just read the memory of the VM from the host, right ? Well, as it turns out, you can. There is a tool called VirtualKD, that actually does this and it has a couple of features that really help with driver development. There is a tutorial on how to install VirtualKD so please follow the steps there. Once you have everything working (the way I test that it's working is to boot up the VM and then breaking in the debugger. At that point the VM should freeze (please note that it might grab your mouse and you'll need to press some key combination (Ctrl+Alt for VMWare) to release it). Then I return to WinDBG and press F5 to resume running the VM) it's time to build and install a minifilter. This is what WinDBG should look like once it's connected properly and paused and resumed the VM:

One other thing we should set up is some method of copying files between the VM and host. I do this by setting up the Shared Folders feature to always share a certain folder between the host and the VM. Then all I need to do is copy the files I need to that folder on the host and I can access them from the VM (for this machine I used a folder D:\shared). After everything is set up, it's time to start the sequence of steps that will deploy the minifilter on the VM (which will become familiar to you pretty quickly):

So now let's build a minifilter. Using the steps described in the previous post, let's try to build the passThrough minifilter. Try to build the checked version for the platform you've set up your VM to use (basically, for my Win7 x86 VM I'll start the "x86 Checked Build Environment" under the Windows 7 folder and build there). Once it's built I'd like to add a breakpoint in DriverEntry() (which is basically the main() function for drivers) to break in the debugger once the filter loads. So please just add a breakpoint (a call to DbgBreakPoint();) somewhere in DriverEntry() and rebuild the driver (bcz again)).
Now we need to copy two files to the VM. Those two files are the actual driver (objchk_win7_x86\i386\passThrough.sys in my case) and the inf file (passThrough.inf). Put these two in the same directory and copy them to the VM through the shared folder.
And finally we need to install the minifilter on the VM. So now we're switching to the VM. Open the shared folder, find the passThrough folder, right-click on passThrough.inf and select Install (which will require administrator privileges).
Then we need to start an elevated CMD window from where we'll run "fltmc load passThrough".
If everything is fine and the minifilter is loading then we'll trigger the breakpoint, which will freeze the VM. So we'll need to switch back to the debugger and unfreeze it (F5 again or use "g" command to allow execution to continue).
After that we should see passthrough loaded and attached to all the volumes in your VM.

So this is it for today. In the next post I'll talk about some of my configurations and some optimizations I use to make this whole process faster, but that's not going to be until next year, so I wish you Happy Holidays! and a Happy New Year! in the meantime :).

Getting Started with Windows File System Filters - Installing the Tools

2013-12-05T08:00:00.000-08:00

Hello everyone,

It's been a while since I last posted anything. I took a break from file systems (and Windows, for that matter) but now I'm ready to get back. Since I'll be spending some time getting reacquainted with the whole thing I figured it's a good time to start a blog series for absolute beginners to this subject. However, you're expected to know C and understand synchronization (you should know what a mutexes and semaphores are).
So with that, the first step is to try to get the environment set up. We'll need the following tools:

Windows Driver Kit - I'll be installing the 7.1.0 WDK, which should be good for Windows 7, Windows Vista, Windows XP, Windows Server 2008 R2, Windows Server 2008, and Windows Server 2003. The drivers we build should also run on Windows 8 and newer but the 7.1 WDK doesn't include the additional features (new APIs and such) that are available there. I'll cover setting up the newer WDK in a different post. For now, just get the WDK from MSDN.
A Virtual Machine software - Personally I'm a big fan of VMware Workstation, but it's not free. I've used VirtualBox in the past and it's fine for what we need and it's also free so you can get it from here.

So let's get started. Install the WDK anywhere (please note that it's an ISO image so you can either mount it with any tool you like or you can just use a tool like 7zip (or most other decent archivers) to extract the files to a directory. Then just run KitSetup.exe and we're good to go.
This is what I have installed under D:\WinDDK (the red Xs are there because the kit can't find the kit I used so I can't add anything else - just ignore those):

Once the WDK installed you should be able to build a sample. So the steps are:

Start a cmd prompt to build the sample: Start->Windows Driver Kits->WDK 7600.16385.1->Build Environments->Windows 7->x86 Checked Build Environment
type cd src\filesys\miniFilter\nullFilter to get to the simplest minifilter sample
type bcz to build it
now you can go to the output directory (cd objchk_win7_x86\i386) and you should see the nullfilter.sys file, which is the null minifilter sample

At this point you should have the WDK installed properly. Please install VirtualBox on your own and create a new Win7 VM (or, if you have VMware Workstation, you can just use that) and next week we'll configure the debugger and load the filter and look at it a bit.

FltCreateSectionForDataScan

2012-11-15T11:21:00.002-08:00

Thanks everyone for your suggestions. I really appreciate you took the time and sent me emails. Some of the topics you've suggested I won't be able to address for various reasons (mostly having to do with the legal implications of my doing that). However, I have a couple of topics that I might go over in the next couple of months. Incidentally, I also plan to change the frequency of my posts because of changes around my work so I'll probably update the blog whenever I get around to it. I still plan to answer questions though so feel free to leave comments and suggest topics (preferably in email). And now let's proceed to the topic at hand.

Someone suggested it would be useful to take a closer look at what FltCreateSectionForDataScan does and discuss how the same functionality could be achieved in previous OS releases. The FltCreateSectionForDataScan function is documented as being supported only in Win8+ but looking at the documentation page for the FltRegisterForDataScan function there is a mention about the FsRtlCreateSectionForDataScan function, which is documented as being available since Win 2000 SP4 and XPSP2, basically from the time FltMgr is available. A quick look in the debugger at FltCreateSectionForDataScan shows that it's really a wrapper around FsRtlCreateSectionForDataScan so I thought it might be useful to take a closer look at that:

As one might expect the FsRtlCreateSectionForDataScan function begins with various parameter checks, primarily to filter out various invalid parameters. There is also a check to see if the FileObject supports sections.
The next step is to set the TopLevelIrp to the value 1, which maps to FSRTL_FSP_TOP_LEVEL_IRP.
Once that is done there is a call to nt!FsRtlAcquireFileExclusiveCommon. This function is not documented anywhere and I could only find some references in crash dumps. However, looking for just FsRtlAcquireFileExclusive returns more hits and we can see that FsRtlAcquireFileExclusive doesn't do much beyond calling nt!FsRtlAcquireFileExclusiveCommon. In the ntifs.h file we see a comment to the effect that FsRtlAcquireFileExclusive is called from NtCreateSection to avoid deadlocks with file systems, so we can guess that this the same thing going on here. We can also guess that whatever FsRtlCreateSectionForDataScan does is probably pretty similar to NtCreateSection.
The next step is to get the file size. This is done by a call to nt!FsRtlGetFileSize, and the documentation for FsRtlGetFileSize states that "If you already own file system resources, you should call FsRtlGetFileSize instead of ZwQueryInformationFile, because attempting to acquire the file object lock would violate locking order and lead to deadlocks. The ZwQueryInformationFile function should be only if you do not already own file system resources.". This makes sense since the function has already acquired the file object lock in the previous step.
There is a quick check to see if the file is empty and the function should return with STATUS_END_OF_FILE.
Then, once everything is set up we have a call to nt!MmCreateSection. Calling this function directly is not supported by Microsoft and there is no documentation for how to do that. This is basically why FsRtlCreateSectionForDataScan was added. This can be inferred from the msdn page "Installing the Filter Manager rollup package for Windows XP SP2". If the function returns STATUS_FILE_LOCK_CONFLICT the call to nt!MmCreateSection is retried after delaying the thread by nt!FsRtlHalfSecond (I had no idea this existed, but I plan to use it from now on :)).
Once the section is created there is a call to nt!CcZeroEndOfLastPage. I've not been able to find any documentation for this function (though it's pretty obvious what it does) except in the NT Insider article "Cache Me if You Can: Using the NT Cache Manager".
Once this is done the function releases the locks so the rest of the function is done outside the critical region (that was set up around step 2).
The only interesting thing happening in the rest of the function is the creation of the handle for the section in the current process handle table. The rest of it just deals with cleanup and the return path from the function.

One thing to note is that FsRtlCreateSectionForDataScan just creates the section. There are A LOT of FltMgr functions related to these data sections and based on the names most of them are dealing with conflict resolution. I'm not sure what that's about yet and I haven't actually used them anywhere but if you plan to use this API directly in downlevel OSes then you should probably expect some sort of conflicts. Good luck!

Suggestions

2012-10-04T09:31:00.001-07:00

Hi everyone. It seems lately I'm having a hard time coming up with good topics, so I wanted to ask you to send me emails with topics you might be interested in. These could be things you already know but you think might be beneficial to have posted somewhere to the rest of the community or things you're not sure about. However, please keep the requests specific. Things like "minifilter interaction with the memory manager" are simply too vague to be helful. You could ask about specific functions or behaviors or structure members (but again, please keep it specific). You can find my email address on my contact information on this blog.

Thanks,
Alex.

A Helpful Debugging Technique: Debug Stream Contexts

2012-09-27T10:04:00.000-07:00

In this post I'd like to discuss a debugging technique that I found pretty useful. I'm a big fan of keeping lots of information around in debug builds and ASSERTing all over the place. I try to be very careful about what information I keep so that I don't change the actual flow of execution for the code. For example, I tend to try to not add references when I copy some pointers because I don't want to alter when freeing the memory happens for the objects I'm tracking (but this approach requires extra care when asserting since I don't want to use pointers to free memory).
One particularly useful thing I like is to keep track of all the files I've processed in my filter and remember various decisions I made about them and keep back-pointers to things that would otherwise be impossible to find (for example when I base my decision on some parameters in the FLT_CALLBACK_DATA I might take a snapshot of those parameters; or I might keep a stack trace for certain operations and so on). The problem that arises with this approach is, of course, where to store this information. In some cases it's feasible to use some of the structures I'm already using (StreamContext or StreamHandleContext) and just add extra members in debug builds. However, there are other times where this becomes a significant pain and it would alter the filter quite a bit (for example, if the logic of the filter is to check for presence of a context to decide on what to do with a file then adding a context in which I only populate the debugging information for files I don't intend to process would change that logic). Also, certain contexts might store data that isn't fixed in size (like a file name or path) which I typically set to be the last thing in the context so adding more debugging information (which might itself be dynamic in size) would make things very complicated.
The ideal approach would be to set more than one context on an object, but FltMgr's functions don’t really allow that. Well, here is where the some FsRtl functions really come in handy. There is support built in the file system library (FsRtl code) for contexts (which I've discussed in more detail in my post on Contexts in legacy filters and FSRTL_ADVANCED_FCB_HEADER) and the library supports setting multiple contexts for the same stream, with different keys. Moreover, the contexts have two keys and there are pretty cool searching semantics that allow searching on only one key. The functions that deal with this are FsRtlInitPerStreamContext, FsRtlInsertPerStreamContext and FsRtlLookupPerStreamContext. They’re pretty easy to use and there is even a page on MSDN on Creating and Using Per-Stream Context Structures.
Here is some code that shows how to use these functions:

typedef struct _MY_DEBUG_CONTEXT {

    //
    // this is the header for the perStream context structure. 
    //

    FSRTL_PER_STREAM_CONTEXT PerStreamContext;

    // 
    // this is the information we want to track.
    //
    
    BOOLEAN Flag;

} MY_DEBUG_CONTEXT, *PMY_DEBUG_CONTEXT;



VOID
FreeDebugContext(
    __in PVOID ContextToFree
    )
{

    //
    // here we would release and uninitialize any other data.. 
    //

    //
    // and then we free the actual context.
    //
    
    ExFreePoolWithTag( ContextToFree, 'CgbD' );
}


NTSTATUS
SetDebugContext(
    __in PFILE_OBJECT FileObject,
    __in PVOID Key1,
    __in_opt PVOID Key2,
    __in BOOLEAN Flag
    )
{
    PMY_DEBUG_CONTEXT context = NULL;
    NTSTATUS status = STATUS_SUCCESS;

    __try {

        //
        // allocate and initialize the context.
        //

        context = ExAllocatePoolWithTag( PagedPool, 
                                         sizeof(MY_DEBUG_CONTEXT),
                                         'CgbD' );

        if (context == NULL) {

            status = STATUS_INSUFFICIENT_RESOURCES;
            __leave;    
        }

        FsRtlInitPerStreamContext( &context->PerStreamContext,
                                   Key1,
                                   Key2,
                                   FreeDebugContext );

        //
        // initialize all the members that we want to track.
        //

        context->Flag = Flag;

        //
        // try to set the context.
        //

        status = FsRtlInsertPerStreamContext( FsRtlGetPerStreamContextPointer(FileObject),
                                              &context->PerStreamContext );

        NT_ASSERT(status == STATUS_SUCCESS);

    } __finally {

        if (!NT_SUCCESS(status)) {

            if (context != NULL) {
            
                ExFreePoolWithTag( context, 'CgbD' );
            }
        }
    }

    return status;
}


BOOLEAN
GetDebugContextFlag(
    __in PFILE_OBJECT FileObject,
    __in PVOID Key1,
    __in_opt PVOID Key2
    )
{
    PMY_DEBUG_CONTEXT context = NULL;
    
    context = (PMY_DEBUG_CONTEXT)FsRtlLookupPerStreamContext( FsRtlGetPerStreamContextPointer(FileObject),
                                                              Key1,
                                                              Key2 );

    if ((context != NULL) &&
        (context->Flag)) {

        return TRUE;
    }

    return FALSE;
}

Finally, one more thing to show is how easy it is to find this structure in the debugger:

0: kd> dt @@(tempFileObject) nt!_FILE_OBJECT FsContext  <- get the FsContext
   +0x00c FsContext : 0xa24d05f8 Void
0: kd> dt 0xa24d05f8 nt!_FSRTL_ADVANCED_FCB_HEADER FilterContexts. <- look at the FilterContexts  in the FsContext
   +0x02c FilterContexts  :  [ 0xa25a1108 - 0x936e93e4 ]
      +0x000 Flink           : 0xa25a1108 _LIST_ENTRY [ 0x936e93e4 - 0xa24d0624 ]
      +0x004 Blink           : 0x936e93e4 _LIST_ENTRY [ 0xa24d0624 - 0xa25a1108 ]
0: kd> dl 0xa25a1108 <- look at all the entries in the list
a25a1108  936e93e4 a24d0624 9441eb40 92e3cb40
936e93e4  a24d0624 a25a1108 93437438 a24d05f8
a24d0624  a25a1108 936e93e4 00000000 a24d05f4
0: kd> !pool a25a1108 2  <- check the pool type for the first one
Pool page a25a1108 region is Paged pool
*a25a1100 size:   20 previous size:  100  (Allocated) *DbgC  <- this is our structure
  Owning component : Unknown (update pooltag.txt)
0: kd> !pool 936e93e4  2   <- for fun, let's see what the other context is 
Pool page 936e93e4 region is Nonpaged pool
*936e93d8 size:   68 previous size:    8  (Allocated) *FMsl   
  Pooltag FMsl : STREAM_LIST_CTRL structure, Binary : fltmgr.sys  <- obviously FltMgr's context
0: kd> dt a25a1108 _MY_DEBUG_CONTEXT  <- display the debugging context
PassThrough!_MY_DEBUG_CONTEXT
   +0x000 PerStreamContext : _FSRTL_PER_STREAM_CONTEXT
   +0x014 Flag             : 0x1 ''

Volume Mount Points, Directory Junctions and STATUS_ACCESS_VIOLATION

2012-09-20T08:00:00.000-07:00

Hi everyone. I've just spent a couple of days debugging an issue that I think is pretty interesting and I hope publishing this post might save someone a lot of pain.
Before I go into the specifics, let me explain a bit about my setup. My filter opens files for its own use but on behalf of the user (in other words it opens files in response to certain user actions, but the user must have access to the files as well (which is different from certain kinds of filters where the filter uses files but there is no connection between the user and the files; in those cases the files are similar to file system metadata)). It does this by calling FltCreateFile and it uses OBJ_FORCE_ACCESS_CHECK to make sure that all the access checks are performed. The files that the filter opens depend on the file that the user opens and so it's possible that the filter ends up opening symlinks or volume mount points and such. In these cases the file system will return a STATUS_REPARSE and the create will be retried on a different path; the same can happen if a file system filter returns STATUS_REPARSE. If the new path happens to be located on a different volume (which is usually the case with volume mount points, though not always since one can in fact create a mountpoint for C:\ at C:\mnt for example…) then the FltCreateFile call will generally fail with STATUS_MOUNT_POINT_NOT_RESOLVED. I've discussed this at great length in my post on Reparsing to a Different Volume in Win7 and Win8.
The problem I was running into was that when my filter tried to open a path containing a volume mount point it would fail with STATUS_ACCESS_DENIED. This behavior was different from symlinks where the call to FltCreateFile still failed with STATUS_MOUNT_POINT_NOT_RESOLVED. This was something I wanted to investigate since the STATUS_ACCESS_VIOLATION status is one I've learned to pay attention to because it generaly indicates a bug in my code. I thought that maybe I was calling FltCreateFile with some parameter that wasn't set up correctly which led to NTFS or the IO manager or the Object manager to try an invalid access, which was probably trapped in some exception handler and returned to the caller. This was definitely worth fixing since such problems can be exploited.
So I started the looong process of debugging this issue. Being an access violation I figured it comes from some improper memory access, which is different from calling some function that returns an error, because most instructions perform memory accesses so my usual approach of stepping to the next call and walking over it to see the return status would not work. Moreover, I had no guarantee that if I set a breakpoint later on I will actually hit it since any of the memory accesses up to that breakpoint could in fact be the culprit. And, to make things worse, this was in the create path for the file system, which is heavily exercised on a running system, which means that any breakpoint that I set in the debugger that's not threaded will often trigger when another thread happens to try to do a create. Anyway, to spare you the gory details, I eventually found the problem and this is what the stack looked like when the instruction I was about to execute was the one that triggered the access violation:

0: kd> kbn L0xa
 # ChildEBP RetAddr  Args to Child              
00 b0583190 82a8f20f 92637001 92637001 b0583734 nt!ObpCaptureObjectCreateInformation+0x61
01 b05831dc 82ae2587 b0583734 924d6f78 92637001 nt!ObOpenObjectByName+0x9b
02 b0583338 82ae17f6 b0583734 00120089 92637001 nt!IopFastQueryNetworkAttributes+0x127
03 b05833a4 82a88936 92637001 dc6a18a4 b0583550 nt!IopQueryNetworkAttributes+0x40
04 b0583480 82a6926b 938e34d8 974d6f78 92637008 nt!IopParseDevice+0x115e
05 b05834fc 82a8f2d9 00000000 b0583550 00000640 nt!ObpLookupObjectName+0x4fa
06 b0583558 82a8762b b0583734 924d6f78 00010000 nt!ObOpenObjectByName+0x165
07 b05835d4 82abee29 b058371c 00120089 b0583734 nt!IopCreateFile+0x673
08 b0583630 96297b62 b058371c 00120089 b0583734 nt!IoCreateFileEx+0x9e
09 b05836bc 9631f8f1 92ed7008 938fd008 b058371c fltmgr!FltCreateFileEx2+0xba
0: kd> u
nt!ObpCaptureObjectCreateInformation+0x61:
82a78339 8a01            mov     al,byte ptr [ecx]
82a7833b 833b18          cmp     dword ptr [ebx],18h
0: kd> r
eax=7fff0000 ebx=b0583734 ecx=7fff0000 edx=00000000 esi=00000000 edi=9260dd94
eip=82a78339 esp=b0583154 ebp=b0583190 iopl=0         ov up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000a02
nt!ObpCaptureObjectCreateInformation+0x61:
82a78339 8a01            mov     al,byte ptr [ecx]          ds:0023:7fff0000=??

So as you can see we 're trying to read one byte from ECX, which is set to 0x7fff0000. Since we're in kernel mode this will fail with access violation. Let's look a bit up the stack to see where that weird ECX value came from (and I added a couple instructions on the bottom just to get the full picture):

0: kd> ub . L0xE
nt!ObpCaptureObjectCreateInformation+0x35:
82a7830c 807d0800        cmp     byte ptr [ebp+8],0    <- compare a parameter to the function with 0. 
82a78310 7429            je      nt!ObpCaptureObjectCreateInformation+0x63 (82a7833b) <- and jump if it's 0
82a78312 64a124010000    mov     eax,dword ptr fs:[00000124h] <- look at some address 
82a78318 80b83a01000000  cmp     byte ptr [eax+13Ah],0 <- and see if another byte is 0
82a7831f 741a            je      nt!ObpCaptureObjectCreateInformation+0x63 (82a7833b) <- and if that is 0 jump to the same place as above (so we're probably exiting an IF statement)
82a78321 8bcb            mov     ecx,ebx
82a78323 f6c303          test    bl,3 <- check if any of the least significant two bits in ebx are set 
82a78326 7406            je      nt!ObpCaptureObjectCreateInformation+0x56 (82a7832e)
82a78328 e8281f0d00      call    nt!ExRaiseDatatypeMisalignment (82b4a255) <- and if they're set then raise an exception for data misalignment
82a7832d cc              int     3 <- and look, there's a breakpoint
82a7832e a11c079b82      mov     eax,dword ptr [nt!MmUserProbeAddress (829b071c)] <- load MmUserProbeAddress into EAX
82a78333 3bd8            cmp     ebx,eax <- and compare it with EBX
82a78335 7202            jb      nt!ObpCaptureObjectCreateInformation+0x61 (82a78339) <- and exit the IF statement if EBX is smaller than MmUserProbeAddress
82a78337 8bc8            mov     ecx,eax <-move MmUserProbeAddress into ECX
82a78339 8a01            mov     al,byte ptr [ecx] <- and try to read a byte from it.
82a7833b 833b18          cmp     dword ptr [ebx],18h
0: kd> dp nt!MmUserProbeAddress L1
829b071c  7fff0000

Before I explain what's going on let's see what the IF is actually checking:

0: kd> dp fs:[00000124h] L1
0030:00000124  926bfd48
0: kd> !pool 926bfd48 2
Pool page 926bfd48 region is Nonpaged pool
*926bfd18 size:  2e8 previous size:   10  (Allocated) *Thre (Protected)   <- this is a thread!
  Pooltag Thre : Thread objects, Binary : nt!ps
0: kd> dt nt!_KTHREAD
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 CycleTime        : Uint8B
...
   +0x13a PreviousMode     : Char   <-and at offset 0x13A we have the PreviousMode !
...

Let's piece it all together. This piece of code is trying to figure out if a buffer that it receives as a parameter (I didn't show that code but that's what's in EBX) is accessible. It first looks at some boolean parameter and if that's TRUE then it looks at the PreviousMode and if that's UserMode then it checks the buffer for alignment (and it raises an exception if it's not aligned) and then it checks it's in user mode memory range ( smaller than MmUserProbeAddress ). If it's not smaller than MmUserProbeAddress then it just does a probe at MmUserProbeAddress. The buffer in question is the _OBJECT_ATTRIBUTES structure, which is actually the same one as the one I pass in for the original create. This is interesting because ObOpenObjectByName can be seen twice on the stack, with the same _OBJECT_ATTRIBUTES parameter, but the first time the call to nt!ObpCaptureObjectCreateInformation works when checking the buffer and the second time it doesn't. Since the PreviousMode is UserMode in both cases, the only difference is that the boolean parameter that ObpCaptureObjectCreateInformation is called with is 1 in the first case and 0 in the second case. This parameter seems to be related to OBJ_FORCE_ACCESS_CHECK since if I don't use OBJ_FORCE_ACCESS_CHECK then the parameter is 1 in both cases and no access violation happens.
One additional thing I'd like to explain is why this check happens only for volume mount points and not symlinks. Volume mount points are different from symlinks in that there before a new IRP_MJ_CREATE is sent to the new path, for volume mount points the IO manager checks whether the user has access to the target volume. The same thing happens for directory junctions, which are similar in implementation to volume mount points. Symlinks don't perform this check and instead the new IRP_MJ_CREATE is simply issued to the new path and access checks will be performed there. Now, for volume mount points this access check takes the form of a IopQueryNetworkAttributes, which in turn is implemented as a special kind of OPEN operation (hence the call to ObOpenObjectByName). It's this second open where ObpCaptureObjectCreateInformation fails and the access violation happens.
This problem was fixed in Win8 and it was very likely a windows bug.
Before I end this post I'd like to summarize some of the things I've discussed:

There is a windows 7 bug where if FltCreateFile tries to open a path that traverses a volume mount point or directory junction the call will fail with STATUS_ACCESS_VIOLATION. Fixed in Win8.
This happens because of the extra access checking for volume mount points and directory junctions.
This doesn't happen for symlinks.
Workaround: I can't think of anything a filter could do but one posibility to address this problem is to ask customers to not use volume mount points and instead use directory symlinks.

Finally, in closing I'd like to show some code (a modification of the PassThrough sample) that calls FltCreateFile whenever the user tries to open anything that has the name "mnt" (file or folder). After the FltCreateFile call the minifilter breaks so that I get a chance to look at the status returned by the FltCreateFile. Here is the code:


    NTSTATUS status;

    UNREFERENCED_PARAMETER( FltObjects );
    UNREFERENCED_PARAMETER( CompletionContext );

    PT_DBG_PRINT( PTDBG_TRACE_ROUTINES,
                  ("PassThrough!PtPreOperationPassThrough: Entered\n") );




    if (Data->Iopb->MajorFunction == IRP_MJ_CREATE) {

        UNICODE_STRING myFile = RTL_CONSTANT_STRING( L"mnt" );
        OBJECT_ATTRIBUTES fileAttributes;
        HANDLE fileHandle = NULL;
        IO_STATUS_BLOCK ioStatus;
        PFLT_FILE_NAME_INFORMATION fileName = NULL;  

        NTSTATUS ecpFindStatus;
        ULONG targetEcpSize = 0;

        __try {

            //  
            // this is a preCreate, get the name.  
            //  

            status = FltGetFileNameInformation( Data,   
                                                FLT_FILE_NAME_OPENED | FLT_FILE_NAME_QUERY_DEFAULT,   
                                                &fileName );  
            if (!NT_SUCCESS(status)) {

                DbgBreakPoint();
                __leave;
            }
    
            //
            // we need to parse the name to get the final component
            //

            status = FltParseFileNameInformation( fileName );

            if (!NT_SUCCESS(status)) {

                DbgBreakPoint();
                __leave;
            }

            //
            //  Compare to see if this is a file we care about.
            //

            if (!RtlEqualUnicodeString( &fileName->FinalComponent,
                                        &myFile,
                                        TRUE )) {

                __leave;
            }

            //
            // intialize the attributes and issue the create.
            //

            InitializeObjectAttributes( &fileAttributes, 
                                        &fileName->Name,
                                        OBJ_KERNEL_HANDLE | OBJ_FORCE_ACCESS_CHECK,
                                        NULL,
                                        NULL );

            status = FltCreateFileEx2( FltObjects->Filter,
                                       FltObjects->Instance,
                                       &fileHandle,
                                       NULL,
                                       FILE_READ_DATA | SYNCHRONIZE | FILE_READ_ATTRIBUTES,
                                       &fileAttributes,
                                       &ioStatus,
                                       NULL,
                                       FILE_ATTRIBUTE_NORMAL,
                                       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                                       FILE_OPEN,
                                       FILE_SYNCHRONOUS_IO_ALERT,
                                       NULL,
                                       0,
                                       0,
                                       NULL );


            DbgBreakPoint();
        } __finally {

            if (fileName != NULL) {

                FltReleaseFileNameInformation( fileName );
            }

            if (fileHandle != NULL) {

                ZwClose(fileHandle);
            }
        }
    }



    //
    //  See if this is an operation we would like the operation status
    //  for.  If so request it.
    //
    //  NOTE: most filters do NOT need to do this.  You only need to make
    //        this call if, for example, you need to know if the oplock was
    //        actually granted.
    //

    if (PtDoRequestOperationStatus( Data )) {

        status = FltRequestOperationStatusCallback( Data,
                                                    PtOperationStatusCallback,
                                                    (PVOID)(++OperationStatusCtx) );
        if (!NT_SUCCESS(status)) {

            PT_DBG_PRINT( PTDBG_TRACE_OPERATION_STATUS,
                          ("PassThrough!PtPreOperationPassThrough: FltRequestOperationStatusCallback Failed, status=%08x\n",
                           status) );
        }
    }

Interesting article

2012-09-13T09:38:00.001-07:00

I'm pretty swamped at work so I won't post anything this week, but I'd like to share an article that I found interesting. It's about some malware, Backdoor.Proxybox, that hooks NTFS directly. This is the page on Symantec's blog: Backdoor.Proxybox: Kernel File System Hooking.

Completing Information Queries in File System Filters

2012-09-06T08:53:00.000-07:00

Completing IRP_MJ_QUERY_INFORMATION requests isn't a terribly difficult task but I wanted to discuss some of the basics as well as share some code to show how a minifilter might go about doing this. I've already discussed some of the more advanced topics related to this in my previous posts on Filters And IRP_MJ_QUERY_INFORMATION and on Setting IoStatus.Information.

There are many information classes and many of them can be found on the documentation page for the ZwQueryInformationFile function, though the complete list is documented on the 2.4 File Information Classes page in the MS_FSCC document. Please note that not all of them can be used with IRP_MJ_QUERY_INFORMATION. However, for this post I'll only be focusing on those that can.

Pretty much each information class has an associated structure. For example, the information class of FileAlignmentInformation has the associated FILE_ALIGNMENT_INFORMATION structure. Some of these structures have a fixed size so in that case all a filter has to do is to populate the information and complete the request:

        switch ( Data->Iopb->Parameters.QueryFileInformation.FileInformationClass ) {

            case FilePositionInformation:
                {
                    PFILE_POSITION_INFORMATION positionInfo = Data->Iopb->Parameters.QueryFileInformation.InfoBuffer;

                    // retrieve the current position from the file object

                    positionInfo->CurrentByteOffset = FltObjects->FileObject->CurrentByteOffset;

                    // then complete the operation

                    Data->IoStatus.Status = STATUS_SUCCESS;
                    Data->IoStatus.Information = sizeof(positionInfo->CurrentByteOffset);
                    callbackStatus = FLT_PREOP_COMPLETE;
                }
                break;
 ....
 }
 ...
 return callbackStatus;

The filter can expect that a buffer of the appropriate size is received, otherwise the IO manager will reject the operation with STATUS_INFO_LENGTH_MISMATCH. This is done inside the IO manager, where the IO manager has an array of minimum sizes for each information class (can be seen in the debugger as nt!IopQueryOperationLength) and if the size is smaller than that then the request will fail with the error code I mentioned:

0: kd> u nt!NtQueryInformationFile+0x36 L0xB
nt!NtQueryInformationFile+0x36:
82aa87dc 8b4518          mov     eax,dword ptr [ebp+18h]
82aa87df 83f838          cmp     eax,38h
82aa87e2 7377            jae     nt!NtQueryInformationFile+0xb5 (82aa885b)
82aa87e4 8a80d4a0a782    mov     al,byte ptr nt!IopQueryOperationLength (82a7a0d4)[eax]
82aa87ea 84c0            test    al,al
82aa87ec 746d            je      nt!NtQueryInformationFile+0xb5 (82aa885b)
82aa87ee 0fb6c0          movzx   eax,al
82aa87f1 394514          cmp     dword ptr [ebp+14h],eax
82aa87f4 730a            jae     nt!NtQueryInformationFile+0x5a (82aa8800)
82aa87f6 b8040000c0      mov     eax,0C0000004h
82aa87fb e98e080000      jmp     nt!NtQueryInformationFile+0x8e3 (82aa908e)

Now, there are certain information classes that have variable length. In general this happens with classes that contain names (for example, FileNameInformation) or that return certain lists (like FileStreamInformation that returns all the alternate data streams for the file). In these cases it's impossible for the IO manager to know what the required size might be without querying the file system and so file systems and file system filters will receive requests with what is possibly an insufficient buffer size (there are apps that allocate huge buffer just to avoid reissuing a request, but there are many callers that don't do that; I generally prefer to issue a request with enough buffer in cases where that's possible, for example most files don't have Alternate Data Streams so it's possible to calculate the size of the buffer). Those requests will be failed by the file system (or filter) with the STATUS_BUFFER_OVERFLOW status (which is actually a warning, not an error!). However, different information classes have different requirements about how the structure has to be populated in such cases. There are some structures (like FILE_NAME_INFORMATION) that have a field where the caller can fill in the required size so that the caller can just allocate a buffer of the appropriate size and resend the request. Other structures don't have any such field (like FILE_STREAM_INFORMATION) and so any caller must simply increase the size of the buffer without knowing in advance if it will be sufficient (I personally double the size of the buffer for each iteration but there are other ways of doing it) and then they must issue the request and see whether it was enough.

Finally, there are certain requirements for each information class about how to fill in the buffer even in the event when the request is going to be failed with STATUS_BUFFER_OVERFLOW. For example, for FileNameInformation the buffer must be filled with as many characters as will fit the buffer whereas the FILE_STREAM_INFORMATION must be filled only with complete records. These type of requirements make it necessary for whoever completes the request to be able to specify how much of the buffer actually contains real information. This can be achieved by setting the IoStatus.Information member to the number of bytes to return the caller. The IO manager will then copy the specified number of bytes to the caller's buffer. It is very important to get this right because returning a larger number here will result in the IO manager copying more bytes than were actually written by the file system or filter and is a security problem. For more details on this see Doron's post on How to return the number of bytes required for a subsequent operation. Of course, the specifics of how the buffer must be set up for each class are documented in the pages for each information class or associated structure, so whenever dealing with this subject make sure to re-read those pages and pay great attention to the details.

I wanted to show some more code on how to do this, but there is plenty of code that shows how this is done in the file system sources that come with the WDK. I found the CDFS sources to be a bit more clear for this specific topic, where CdCommonQueryInfo pretty much does it all and shows exactly how to set things for most information classes. FastFat can be a bit more convoluted at times.

Finally, I want to say that whenever I have to deal with this I find it extremely useful to just use FileTest and then query the information class I want while supplying different sized buffers (starting with 1 byte, then 2 bytes and so on) and make sure that I understand exactly what the file system returns without my filter present and how things look with the filter installed.

Mount Point Resolution with FltGetFileNameInformation - Part II

2012-08-30T10:44:00.000-07:00

In the previous post I showed a couple of examples of how FltMgr handles mount point resolution. I also ran into a problem where querying for the normalized name during preCreate for a path that would traverse a mountpoint failed, and I promised I would investigate and post my findings. In this post I will explain what I found and how I went about it (which might be interesting for someone that doesn't have a lot of experience debugging the workings of fltmgr or other kernel components).
So first, let me explain my findings. As a quick refresher, FltMgr performs name normalization by taking an opened name for a file (like C:\foo~1\bar~2.txt) and opening the parent directory (C:\foo~1) and then querying the directory (via the directory enumeration call) for the file name (bar~2.txt). FltMgr uses one of the directory structures that returns the long name for the file (like FILE_BOTH_DIR_INFORMATION or FILE_NAMES_INFORMATION) and then copies the long name for the file (let's say it is BarBarBar.txt). It then does the same for current directory name (for C:\foo~1 it would open C:\ and query for foo~1 and get back FooFooFoo) and then assembles everything back together into C:\FooFooFoo\BarBarBar.txt. Now, this process will fail with STATUS_NOT_SAME_DEVICE if when trying to open a parent directory (C:\foo~1) FltMgr encounters a reparse point that reparses to a different volume. The relevant code is this:

0: kd> u fltmgr!FltpExpandFilePathWorker+0x29e
fltmgr!FltpExpandFilePathWorker+0x29e:
96034a1c ff154c030296    call    dword ptr [fltmgr!_imp__IoGetAttachedDevice (9602034c)]
96034a22 ff765c          push    dword ptr [esi+5Ch]
96034a25 8bf8            mov     edi,eax
96034a27 ff1548010296    call    dword ptr [fltmgr!_imp__IoGetRelatedDeviceObject (96020148)]
96034a2d 3bc7            cmp     eax,edi
96034a2f 740c            je      fltmgr!FltpExpandFilePathWorker+0x2bf (96034a3d)
96034a31 c74508d40000c0  mov     dword ptr [ebp+8],0C00000D4h
96034a38 e9e8feffff      jmp     fltmgr!FltpExpandFilePathWorker+0x1a7 (96034925)

As you can see, FltMgr compares two devices, the one on which the IRP_MJ_CREATE to open the parent directory was issued and the one where it was actually opened on, and if they don't match it will fail with STATUS_NOT_SAME_DEVICE (which is an interesting and confusing choice since the actual error message (from ntstatus.h) is "The target file of a rename request is located on a different device than the source of the rename request." and there is no rename involved). There are a couple of interesting points to make:

This should only impact preCreate queries since after the IRP_MJ_CREATE completes successfully we are on the final volume and so the opened file name can't traverse any reparse points (as discussed in my previous post on this subject). So in postCreate (and this implies a successful create) there will be no reparse points to traverse.
The other interesting thing to note is that FltMgr actually issues its IRP_MJ_CREATE violating the file system filter layering rules, in that it sends the request to the top of the IO stack. It does this with the explicit goal of being able to survive reparses to other volumes (like I've discussed elsewhere on this blog). However, it then checks to figure out if it actually ended up on a different volume and fails the operation, which would have happened anyway if they targeted the IRP_MJ_CREATE properly. The only case where this might actually work and the proper layering of the IRP_MJ_CREATE would have failed is where there multiple reparses that eventually end up on the original volume where the IRP_MJ_CREATE was issued on (so you'd have C:\foo~1 reparse to D:\bar~1 which would in turn reparse back to C:\baz~1). This would fail with proper layering of the IRP_MJ_CREATE because the first reparse (from C:\ to D:\) would fail, but it would work with the current implementation because the request is targeted at C:\ and the directory that is opened eventually is on C:\ as well). However, I doubt that this behavior is what FltMgr's designers had in mind and instead I tend to believe that violating the layering in this case was unnecessary.

Anyway, let me quickly show how I've debugged this issue to find where the error comes from in a couple of minutes. I'll first show you the code that I added to the passthrough sample to detect when this case happens:

    if (PtDoRequestOperationStatus( Data )) {

        status = FltRequestOperationStatusCallback( Data,
                                                    PtOperationStatusCallback,
                                                    (PVOID)(++OperationStatusCtx) );
        if (!NT_SUCCESS(status)) {

            PT_DBG_PRINT( PTDBG_TRACE_OPERATION_STATUS,
                          ("PassThrough!PtPreOperationPassThrough: FltRequestOperationStatusCallback Failed, status=%08x\n",
                           status) );
        }
    }

    if (Data->Iopb->MajorFunction == IRP_MJ_CREATE) {

        status = FltGetFileNameInformation( Data, 
                                            FLT_FILE_NAME_NORMALIZED,
                                            &fileName );

        if (NT_SUCCESS(status)) {

            DbgPrint("preCreate|normalized -> \"%wZ\"\n", &fileName->Name); 

            FltReleaseFileNameInformation( fileName );

        } else {

            DbgBreakPoint();

            status = FltGetFileNameInformation( Data, 
                                                FLT_FILE_NAME_NORMALIZED,
                                                &fileName );

        
        }

As you can see, when the call to FltGetFileNameInformation fails I've added a breakpoint and then I retry the operation with the exact same parameters, which allows me to investigate the call. After that, when the breakpoint triggers, I do this:

Break instruction exception - code 80000003 (first chance)    <- here I've hit the breakpoint
PassThrough!PtPreOperationPassThrough+0xd2:
a414c0f2 cc              int     3
0: kd> p <- step over the breakpoint
PassThrough!PtPreOperationPassThrough+0xd3:
a414c0f3 8d45f8          lea     eax,[ebp-8]
0: kd> t <- trace into the function
fltmgr!FltGetFileNameInformation:
9601ce78 8bff            mov     edi,edi
0: kd> pc <- find the next call
fltmgr!FltGetFileNameInformation+0x118:
9601cf90 e833010000      call    fltmgr!FltpAllocateInitializeNameGenerationContext (9601d0c8)
0: kd> p <- and step over it
fltmgr!FltGetFileNameInformation+0x11d:
9601cf95 8bf0            mov     esi,eax
0: kd> r <- and now inspect the registers. In this calling convention eax will be the NTSTATUS , so eax = 00000000 means STATUS_SUCCESS
eax=00000000 ebx=92639008 ecx=944f6ae0 edx=944f6ae0 esi=92639068 edi=00000000
eip=9601cf95 esp=a194d96c ebp=a194d988 iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000246
fltmgr!FltGetFileNameInformation+0x11d:
9601cf95 8bf0            mov     esi,eax
0: kd> pc <- ok, find the next call
fltmgr!FltGetFileNameInformation+0x126:
9601cf9e e85df8ffff      call    fltmgr!FltpGetFileNameInformation (9601c800)
0: kd> p <- step over it 
fltmgr!FltGetFileNameInformation+0x12b:
9601cfa3 8bf0            mov     esi,eax
0: kd> r <- inspect the result again and note that eax=c00000d4, which happens to be STATUS_NOT_SAME_DEVICE .
eax=c00000d4 ebx=92639008 ecx=00000000 edx=00000003 esi=00000000 edi=00000000
eip=9601cfa3 esp=a194d96c ebp=a194d988 iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000246
fltmgr!FltGetFileNameInformation+0x12b:
9601cfa3 8bf0            mov     esi,eax

So now I know that the STATUS_NOT_SAME_DEVICE error came from the call to fltmgr!FltpGetFileNameInformation. Time to continue this investigation, but this time looking into that function:

0: kd> g <- hit go and trigger the condition (which in my case was trying to traverse a mountpoint so I just did a notepad C:\mnt\foo.txt, where C:\mnt was a mount point)
Break instruction exception - code 80000003 (first chance)
PassThrough!PtPreOperationPassThrough+0xd2:
a414c0f2 cc              int     3 <- so I hit the breakpoint again, as expected. 
0: kd> bp /t @$thread fltmgr!FltpGetFileNameInformation  <- set a threaded breakpoint on the function where I know the failure should happen. The thread is useful when a function is called from multiple threads and I don't want the breakpoint to trigger for those.
0: kd> g <- hit go and we should hit the breakpoint when the current thread enters that function
Breakpoint 0 hit  <- and so it is...
fltmgr!FltpGetFileNameInformation:
9601c800 8bff            mov     edi,edi
0: kd> bc * <- clear the breakpoint . This is sometimes necessary because if there is recursion it will hit again on the same thread and might confuse things
0: kd> pc <- find the first call
fltmgr!FltpGetFileNameInformation+0x48:
9601c848 e8db9bffff      call    fltmgr!FltpGetNextCallbackNodeForInstance (96016428)
0: kd> p <- and step over to see the status
fltmgr!FltpGetFileNameInformation+0x4d:
9601c84d 894614          mov     dword ptr [esi+14h],eax
0: kd> r <- check the status, which is STATUS_SUCCESS
eax=00000000 ebx=92639008 ecx=9444cd48 edx=00000000 esi=94501858 edi=00000000
eip=9601c84d esp=a194d864 ebp=a194d888 iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000246
fltmgr!FltpGetFileNameInformation+0x4d:
9601c84d 894614          mov     dword ptr [esi+14h],eax ds:0023:9450186c=00000000
0: kd> pc <- find the next call
fltmgr!FltpGetFileNameInformation+0x5a:
9601c85a e8c99bffff      call    fltmgr!FltpGetNextCallbackNodeForInstance (96016428)
0: kd> p <- step again
fltmgr!FltpGetFileNameInformation+0x5f:
9601c85f 894618          mov     dword ptr [esi+18h],eax
0: kd> r <- STATUS_SUCCESS again
eax=00000000 ebx=92639008 ecx=9444cd48 edx=00000000 esi=94501858 edi=00000000
eip=9601c85f esp=a194d864 ebp=a194d888 iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000246
fltmgr!FltpGetFileNameInformation+0x5f:
9601c85f 894618          mov     dword ptr [esi+18h],eax ds:0023:94501870=00000000
0: kd> pc <- and then the next call
fltmgr!FltpGetFileNameInformation+0xb2:
9601c8b2 e829160000      call    fltmgr!FltpGetStreamListCtrl (9601dee0)
0: kd> p 
fltmgr!FltpGetFileNameInformation+0xb7:
9601c8b7 894508          mov     dword ptr [ebp+8],eax
0: kd> r <- this fails with STATUS_NOT_SUPPORTED. This might be a failure that gets converted into the one we're looking for. I investigated it (not shown here) but it doesn't result in a failure, it just changes the code path through the function
eax=c00000bb ebx=92639008 ecx=9601e0a8 edx=00000000 esi=94501858 edi=00000000
eip=9601c8b7 esp=a194d864 ebp=a194d888 iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000246
fltmgr!FltpGetFileNameInformation+0xb7:
9601c8b7 894508          mov     dword ptr [ebp+8],eax ss:0010:a194d890=94501858
0: kd> pc <-find the next call
fltmgr!FltpGetFileNameInformation+0xc2:
9601c8c2 e887fdffff      call    fltmgr!HandleStreamListNotSupported (9601c64e)
0: kd> p <- step over
fltmgr!FltpGetFileNameInformation+0xc7:
9601c8c7 894508          mov     dword ptr [ebp+8],eax
0: kd> r <- and we can see that we've hit the status we wanted, eax=c00000d4. so now we know the failure comes from fltmgr!HandleStreamListNotSupported, so set a breakpoint on it.
eax=c00000d4 ebx=92639008 ecx=00000000 edx=00000003 esi=94501858 edi=00000000
eip=9601c8c7 esp=a194d864 ebp=a194d888 iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000202
fltmgr!FltpGetFileNameInformation+0xc7:
9601c8c7 894508          mov     dword ptr [ebp+8],eax ss:0010:a194d890=c00000bb
0: kd> g <- so now we need to go back to trigger the breakpoint
Break instruction exception - code 80000003 (first chance)
PassThrough!PtPreOperationPassThrough+0xd2:
a414c0f2 cc              int     3
0: kd> bp /t@$thread fltmgr!HandleStreamListNotSupported  <- set our threaded breakpoint on that function
0: kd> g <- and run
Breakpoint 0 hit
fltmgr!HandleStreamListNotSupported:
9601c64e 8bff            mov     edi,edi

After a couple of iterations (in about 5 minutes of debugging) we end up with this stack, where the status is actually generated.:

0: kd> k
ChildEBP RetAddr  
a194d8ac 96034c59 fltmgr!FltpExpandFilePathWorker+0x2b3
a194d8c4 96034dc3 fltmgr!FltpExpandFilePath+0x19
a194d8e0 96035505 fltmgr!FltpGetNormalizedFileNameWorker+0x7d
a194d8f8 96032765 fltmgr!FltpGetNormalizedFileName+0x19
a194d910 9601c773 fltmgr!FltpCreateFileNameInformation+0x81
a194d930 9601c8c7 fltmgr!HandleStreamListNotSupported+0x125
a194d960 9601cfa3 fltmgr!FltpGetFileNameInformation+0xc7
a194d988 a414c102 fltmgr!FltGetFileNameInformation+0x12b
a194d9ac 96016aeb PassThrough!PtPreOperationPassThrough+0xe2 [c:\temp11\passthrough\passthrough.c @ 705]
a194da18 960199f0 fltmgr!FltpPerformPreCallbacks+0x34d
a194da30 9602d1fe fltmgr!FltpPassThroughInternal+0x40
a194da44 9602d8b7 fltmgr!FltpCreateInternal+0x24
a194da88 828884bc fltmgr!FltpCreate+0x2c9
a194daa0 82a8c6ad nt!IofCallDriver+0x63
a194db78 82a6d26b nt!IopParseDevice+0xed7
a194dbf4 82a932d9 nt!ObpLookupObjectName+0x4fa
a194dc50 82a8b62b nt!ObOpenObjectByName+0x165
a194dccc 82ac667e nt!IopCreateFile+0x673
a194dd14 8288f44a nt!NtOpenFile+0x2a
a194dd14 776064f4 nt!KiFastCallEntry+0x12a
WARNING: Stack unwind information not available. Following frames may be wrong.

Mount Point Resolution in FltGetFileNameInformation

2012-08-23T14:04:00.000-07:00

In this post I'd like to discuss one particular aspect of using FltGetFileNameInformation and how mount point resolution is handled. The issue I want to talk about is mentioned in the documentation for the FLT_FILE_NAME_INFORMATION structure, where in the Remarks section it says that a name is normalized if all the mount points are resolved. This was mentioned in the context of the OSR thread http://www.osronline.com/showThread.CFM?link=230176, where it was suggested that the definition means that opened names would not resolve mount points (see the discussion starting at message 21).
So the plan is to discuss how mount points work and how name normalization works in this context and what's the actual behavior for mount point resolution for normalized and opened names.
A mount point is a special type of reparse point, where it reparses from a directory to the root of a volume. A typical scenario would be mounting a volume like \Device\HarddiskVolume4\ as a folder under \Device\HarddiskVolume3\mnt\ (the volume names are actual names that I'm going to use in my examples later). This can be easily done using mountvol.exe, like so:

C:\>mountvol.exe

Creates, deletes, or lists a volume mount point.
....
....

Possible values for VolumeName along with current mount points are:

    \\?\Volume{a681bf9c-c97b-11de-90d7-806e6f6e6963}\
        *** NO MOUNT POINTS ***

    \\?\Volume{f4810a4e-cfbb-11de-86cd-000c291f01a1}\
        D:\

    \\?\Volume{f4810a5a-cfbb-11de-86cd-000c291f01a1}\
        E:\

    \\?\Volume{a681bf9d-c97b-11de-90d7-806e6f6e6963}\
        C:\

    \\?\Volume{a681bfa1-c97b-11de-90d7-806e6f6e6963}\
        A:\

    \\?\Volume{a681bfa0-c97b-11de-90d7-806e6f6e6963}\
        F:\

C:\>mountvol.exe D:\mnt \\?\Volume{f4810a5a-cfbb-11de-86cd-000c291f01a1}\

C:\>mountvol.exe

Creates, deletes, or lists a volume mount point.
....
....

Possible values for VolumeName along with current mount points are:

    \\?\Volume{a681bf9c-c97b-11de-90d7-806e6f6e6963}\
        *** NO MOUNT POINTS ***

    \\?\Volume{f4810a4e-cfbb-11de-86cd-000c291f01a1}\
        D:\

    \\?\Volume{f4810a5a-cfbb-11de-86cd-000c291f01a1}\
        E:\
        D:\mnt\

    \\?\Volume{a681bf9d-c97b-11de-90d7-806e6f6e6963}\
        C:\

    \\?\Volume{a681bfa1-c97b-11de-90d7-806e6f6e6963}\
        A:\

    \\?\Volume{a681bfa0-c97b-11de-90d7-806e6f6e6963}\
        F:\

Please note that I can still use both E:\ and D:\mnt\ to access the volume. One could also remove the E:\ mount point and rely only on D:\mnt\.
So what happens when someone tries to open D:\mnt\foo.txt ? Here are the steps, and pretty much most of this has been explained in other posts on this site, mainly around STATUS_REPARSE and IRP_MJ_CREATE processing:

the IO manager issues an IRP_MJ_CREATE for the volume that is D: (\Device\HarddiskVolume3) with the FILE_OBJECT->FileName set to "\mnt\foo.txt"
NTFS sees there is a reparse point on "\mnt" and returns the new path for the file, concatenating the name of the volume in the reparse point with the rest of the path, as "\Device\HarddiskVolume4\FOO.TXT"
the IO manager issues a new IRP_MJ_CREATE for the volume that is E: (\Device\HarddiskVolume4) with the FILE_OBJECT->FileName set to "\foo.txt", which succeeds.

Before we start looking at what the names look like, let's quickly discuss what FltMgr does when generating names. There are two steps here, first FltMgr will generate a name for the file (the Generate step) and then, if the caller requested it, it will normalize it (the Normalize step).
The Normalize step is pretty simple so we'll discuss it first. FltMgr opens the parent directory for the name it has from the Generate step and does a directory query to get the long name associated with the file (or directory). It doesn't know if the name it has for the file is short or long and it doesn't really matter, it will figure it out from the directory query anyway. It then concatenates this with the normalized name for the parent directory. If it doesn't have the normalized name for the parent directory cached it will build it using the same algorithm. Please note that this is different in Win8, where NTFS can directly return the normalized name for a file. Also, this can explain why it can be pretty expensive to generate a normalized name, though FltMgr's name cache is pretty effective and so in most cases there is no need to perform the normalization of the parent directory name.
The Generate step differs between the time when the file isn't opened (preCreate) and the time after the file is opened (postCreate and all operations after that). If the file isn't opened FltMgr will take the name from the FILE_OBJECT->FileName and the RelatedFileObject (if there is one). If the file is opened then FltMgr will simply query NTFS for the file name using an IRP_MJ_QUERY_INFORMATION request. Please note that the OPENED name is actually the name FltMgr gets from the Generate step.
So now that all the pieces are in place, let's discuss the differences between the opened and the normalized name:

If the file is opened (so after a successful postCreate and for any operation where the FILE_OBJECT is opened) the name will always have the mount point resolved because during the Generate step FltMgr will simply concatenate the volume name with the path to the file that NTFS returns.
If this is a postCreate but the status is STATUS_REPARSE FltGetFileNameInformation will fail with STATUS_FLT_INVALID_NAME_REQUEST, so the issue of the mount point resolution doesn't apply.
The one case that remains is that of a preCreate, then whether mount points are resolved depends on whether this is the final IRP_MJ_CREATE or not. If it is the final IRP_MJ_CREATE then IO manager has already resolved the mount points so the opened name and the normalized name will have the same volume name. If this is not the final IRP_MJ_CREATE then the opened name will always have the volume name on which the IRP_MJ_CREATE was issued, and the normalized name will always use the name it has when opening the parent folder for the final component. This last bit is interesting because it's not clear what happens when opening the actual mount point. Will it be resolved or not ? It depends on how the query for the normalization is done, if necessary.

So let's go over some examples. First I'm going to show what happens when I open D:\mnt\folder_under_mount_point\foo.txt:

preCreate|opened -> "\Device\HarddiskVolume3\mnt\folder_under_mount_point\foo.txt" -> as expected, mount point not resolved.
preCreate|normalized -> "\Device\HarddiskVolume4\folder_under_mount_point\foo.txt" -> this is after the reparse, so mount point is resolved.
preCreate|opened -> "\Device\HarddiskVolume4\FOLDER_UNDER_MOUNT_POINT\FOO.TXT"  -> after reparse, so no mount point. Please note the case.
postCreate|normalized -> "\Device\HarddiskVolume4\folder_under_mount_point\foo.txt" -> normalize gets the actual proper case from NTFS
postCreate|opened -> "\Device\HarddiskVolume4\FOLDER_UNDER_MOUNT_POINT\FOO.TXT" -> but the opened name has the original case from the user's request.

Things to note here are the fact that there are no names from the postCreate for the first creates since they fail with STATUS_FLT_INVALID_NAME_REQUEST because those creates aren't successful. Also, interesting to note that for the first create (for which we get the opened name) trying to get the normalized name fails with STATUS_NOT_SAME_DEVICE on my system. Not sure why but I'll investigate it for next week's blog :).
Now let's look at what happens when we open something directly under the mount point (pretty much the same as above):

preCreate|opened -> "\Device\HarddiskVolume3\mnt\foo.txt" -> as expected, mount point not resolved.
preCreate|normalized -> "\Device\HarddiskVolume4\foo.txt" -> and again, after reparse, mount point resolved.
preCreate|opened -> "\Device\HarddiskVolume4\FOO.TXT" 
postCreate|normalized -> "\Device\HarddiskVolume4\foo.txt"
postCreate|opened -> "\Device\HarddiskVolume4\FOO.TXT"

And finally I want to show you what happens if I try to open the actual mount point itself, D:\mnt\.

preCreate|normalized -> "\Device\HarddiskVolume3\mnt" -> please note how the normalized name in this case is actually the mount point, so the mount point is NOT resolved.
preCreate|opened -> "\Device\HarddiskVolume3\mnt\" -> naturally, mount point not resolved since this is the opened name.
preCreate|normalized -> "\Device\HarddiskVolume4\"
preCreate|opened -> "\Device\HarddiskVolume4\"
postCreate|normalized -> "\Device\HarddiskVolume4\"
postCreate|opened -> "\Device\HarddiskVolume4\"

Hopefully this shows what's going on with mount point resolution and opened and normalized names, in more detail that anyone ever needed :).

Testing Case-Sensitive Behavior

2012-07-26T10:09:00.000-07:00

In this post I want to talk about case sensitivity and how to test such behavior for file systems and file system filters. I've addressed various case-sensitive file system filter specific topics in my previous posts on FILE_OBJECT Names in IRP_MJ_CREATE and on Names in Minifilters - Implementing Name Provider Callbacks.
This isn't a particular complicated subject but a developer must be careful in their implementation to support this concept. Other than figuring out the case sensitivity of the name in the IRP_MJ_CREATE path and in the name provider paths, a developer must in general pay attention to their implementation so that if, for example, they compare names or file extensions and they would use RtlEqualUnicodeString or RtlCompareUnicodeString then they must use the appropriate value for the CaseInSensitive parameter, deduced from the IRP_MJ_CREATE parameters or from the FILE_OBJECT flags. Also, when dealing with UNICODE_STRINGs, one should never try to change the case by directly touching the bits in the string, and instead always use functions like RtlUpcaseUnicodeString and RtlUpcaseUnicodeChar (and their counterparts, RtlDowncaseUnicodeString and RtlDowncaseUnicodeChar).
However, for this post I don't want to focus on implementation but on how to test this. In my opinion the best place to start is the IFS Test suite, since it has a lot of tests that deal with the create path, and in particular some that test both case sensitive behavior and case preserving behavior. However, when running this test on a regular system it will most likely fail with this message:

E94.E98 :  +TEST+SEV2      : Test         :CaseSensitiveTest
Group        :OpenCreateGeneral
Status       :C000001E (IFSTEST_TEST_NTAPI_FAILURE_CODE)
LastNtStatus     :C0000035 STATUS_OBJECT_NAME_COLLISION
ExpectedNtStatus :00000000 STATUS_SUCCESS
Description  :{Msg# OpCreatG!casesen!14} Failure while creating a
              test file \casesen\NeSesac.dat. Note that in a previous
              step we created another test file. That other test
              file \casesen\nesesac.dat, differs from this one by
              the case of the name. 
Lookup Query : http://www.microsoft.com/ContentRedirect.asp?prd=IFSKit&id=OpCreatG!casesen!14&pver=2195

This happens because by default Windows is case insensitive and so naturally the case sensitivity test fails. In order to enable case sensitive behavior for Windows all you need to do is change this registry value (it should be 1 be default so change it to 0):

HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\obcaseinsensitive

BTW, this is described in Microsoft's KB 817921, even though the case deals with something else entirely.
So anyway, once you set this key make sure that the file system itself is case sensitive and then test your filter like you normally would, though it might be a good idea to add a couple of tests with files that have the same name with different cases, just to make sure you're exercising the relevant code paths. For example, if implementing an anti-virus program it might be interesting to test with a clean file with the name "foo.txt" and the EICAR file "Foo.txt" in the same folder, to make sure that not only the scanning happens on the apropriate file, but also that the other parts of the program (such as the quarantine engine or the cleaning engine) work as expected. Another good test is to then switch file names (so that "Foo.txt" is clean and "foo.txt" is the EICAR file), so that in case the product somehow depends on the sorting order in the directory that issue can be found and addressed.

Finally, this is an easy way to check that the file system is case sensitive (UPDATE: as Phil pointed out in the comments, this doesn't mean that the requests are going to be case sensitive, it simply means the file system supports it; whether this is on or off generally depends on the file system implementation and not the registry key value):

C:\Windows\system32>fsutil fsinfo volumeinfo C:
Volume Name : System
Volume Serial Number : 0x4297004a
Max Component Length : 255
File System Name : NTFS
Supports Case-sensitive filenames
Preserves Case of filenames
Supports Unicode in filenames
Preserves & Enforces ACL's
Supports file-based Compression
Supports Disk Quotas
Supports Sparse files
Supports Reparse Points
Supports Object Identifiers
Supports Encrypted File System
Supports Named Streams
Supports Transactions
Supports Hard Links
Supports Extended Attributes
Supports Open By FileID
Supports USN Journal

Detaching Instances and FLTFL_POST_OPERATION_DRAINING

2012-07-19T10:21:00.000-07:00

I've recently realized that FLTFL_POST_OPERATION_DRAINING is not a very well understood feature. This is even though it is pretty well documented on the page for the PFLT_POST_OPERATION_CALLBACK function. So I think that in addition to the documentation page it might help to take a look at what happens under the hood.

So once a minifilter instance attaches to a volume it will start receiving IRPs and the filter will process them. There are many cases where filters need to process the results of the operation after the file system has completed it so the filter would register PostOperation callbacks and the filter would return an appropriate status to indicate that it wants a that PostOp callback called for each specific operation. These IRPs are generally called in-flight requests and in many cases the filter has data allocated associated with the operation (buffers allocated and possibly even replaced on the request, contexts that are sent from the PreOp callback to the PostOp callback, references (if the filter references a context in PreOp and dereferences it in PostOp) and so on). In case the instance needs to be detached (the minifilter is unloaded or just one instance is detached from the underlying volume) from a live volume, IO will continue to flow on the volume. One way to detach is to stop receiving new IO requests and then just wait for all the in-flight operations to be completed by the file system. Once all the in-flight operations have completed then FltMgr can pretty much free the instance because there can not be any more IO on it.

Unfortunately, this approach has a pretty big disadvantage. It is impossible to predict when the file system will complete all in-flight IRPs. There are certain operations that are very long-lived (for example, the directory change notification IRPs can be held in the file system indefinitely) which would make the time it takes to unload a filter or to detach an instance completely unpredictable. FltMgr howver is smarter than that and what it actually does is that it always tracks for each instance which operations are in-flight at any given time. So when an instance needs to be detached or a filter unloaded, FltMgr can actually know that it doesn't need to call any PostOp callbacks for that filter or that instance. However, there is still data associated with the request, buffers that have been replaced, contexts that are allocated and so on. If FltMgr simply stops calling the PostOp callbacks for all in-flight operations then the allocated data and the buffers would never be freed and the references would be leaked.

This is where FLTFL_POST_OPERATION_DRAINING comes in. When an instance is detached, FltMgr disconnects the instances from the IO path and then calls each PostOp callback for all the in-flight operations with the FLTFL_POST_OPERATION_DRAINING flag set. When a PostOp callback is called with FLTFL_POST_OPERATION_DRAINING flag the callback must not do any actual work (like checking the status of the operation or touching the FLT_CALLBACK_DATA structure at all) and instead the filter must just release any buffers or references they might have allocated. For most filters this is generally similar to what they do when the operation is unsuccessful (though there might be cases where this is different so this is more of a hint than an actual guideline to implement this feature). It is important to note that when the PostOp callback is called with the FLTFL_POST_OPERATION_DRAINING flag the actual operation might have already been completed or it might still be in the file system somewhere so the filter must not make any assumptions about the status of the actual operation.

The Flags of FILE_OBJECTs - Part V

2012-07-12T10:22:00.000-07:00

I wasn't really planning on adding another part to the series but there were still a couple of things I wanted to mention.
First, as I'm sure you've noticed if you looked at the actual flag values in the wdm.h file, there seems to be a flag missing, with the value 0x00800000, which would put it between FO_VOLUME_OPEN and FO_REMOTE_ORIGIN. Well, the easiest way to solve this mystery is to look at an older DDK where you'll find the missing flag:

FO_FILE_OBJECT_HAS_EXTENSION - 0x00800000

So what is this FO_FILE_OBJECT_HAS_EXTENSION and why is it gone from pretty much everything from documentation to the !fileobj extension (as I mentioned in my previous posts, an easy way to figure out flag values is to set them into a structure and use one of extension that knows how to parse that structure to see what name it displays; in this case however the !fileobj extension ignored that flag). As far as I can tell, this flag is no longer used since Vista, but before Vista this was used to indicate that there was some private data allocated past the end of the FILE_OBJECT (see this thread on NTFSD). From the thread it appears that the IO manager used this flag to store the targeting information associated with the FILE_OBJECT, which was also used by FltMgr (see this other thread on NTFSD as well). Since Vista this functionality is implemented using the FSRTL_PER_FILEOBJECT_CONTEXT structure. I've explored the way this works in my previous post on tracking a minifilter's ActiveOpens files.
Anyway, this FILE_OBJECT extension seems to have been used to implement the same functionality that FsRtlInsertPerFileObjectContext provides today. FltMgr's designers had a goal to make sure that everything that FltMgr does can be implemented by a 3rd party filter and so since they couldn't expect 3rd party filters to modify private OS structures they had exposed the same functionality through the FSRTL_PER_FILEOBJECT_CONTEXT structure and related functions (FsRtlInitPerFileObjectContext, FsRtlInsertPerFileObjectContext, FsRtlLookupPerFileObjectContext, FsRtlRemovePerFileObjectContext). Since this is implemented in a different way today it makes sense that this flag is obsolete for now. Please do note, however, that it's entirely possible that it will be reused at some point in the future.
Another thing I wanted to do was to take a closer look at how the FO_DISALLOW_EXCLUSIVE flag can be used, since as I mentioned in my previous post in the series I wasn't sure what it does. Well, after some investigation I have some idea what it does but I'm not sure why exactly this behavior would be desired. So first I'd like to mention that this flag doesn't seem to be used anywhere in the WDK, neither FastFat nor CDFS seem to need it. I couldn't find anything about it online. Naturally I searched for both FO_DISALLOW_EXCLUSIVE and FILE_DISALLOW_EXCLUSIVE but there was really no mention about what they might do or where they might be used. However, there was one bit of information in the WDK that helped a lot: the fact that FO_DISALLOW_EXCLUSIVE is the only flag defined for FO_FLAGS_VALID_ONLY_DURING_CREATE. So I decided to take a very close look at Ntfs' IRP_MJ_CREATE processing in the hope that I might be able to figure out where this flag might be used.
So after some debugging and using my favorite disassembler, IDA, I've been able to find a couple of places where this flag might be checked. In particular the Ntfs!NtfsOpenAttribute function looked very promising since it had code that checked for that exact value, and moreover it was using the same offset into a structure that the FILE_OBJECT->Flags are at:

1: kd> dt nt!_FILE_OBJECT
   +0x000 Type             : Int2B
   +0x002 Size             : Int2B
...
   +0x02c Flags            : Uint4B
...
   +0x07c FileObjectExtension : Ptr32 Void

1: kd> u 960e7c14 L1
Ntfs!NtfsOpenAttribute+0x1f0:
960e7c14 f7402c00000002  test    dword ptr [eax+2Ch],2000000h

1: kd> u 960e7ded L1
Ntfs!NtfsOpenAttribute+0x3ca:
960e7ded f7402c00000002  test    dword ptr [eax+2Ch],2000000h

So I figured I had a pretty good shot at finding out how this is used by looking at when these flags are checked (the path that leads to the checks). That involved a lot of debugging and backtracking, good thing VMWare makes it so easy to retry exactly the same code path over and over again :). Anyway, this is the piece of code that turned out to be relevant:

1: kd> u Ntfs!NtfsOpenAttribute+0x3af L0x10
Ntfs!NtfsOpenAttribute+0x3b0:
960e7dd3 7328            jae     Ntfs!NtfsOpenAttribute+0x3da (960e7dfd)
960e7dd5 ff7518          push    dword ptr [ebp+18h]
960e7dd8 50              push    eax
960e7dd9 53              push    ebx
960e7dda e8e9d70100      call    Ntfs!NtfsWriteCheck (961055c8)
960e7ddf 8845e7          mov     byte ptr [ebp-19h],al
960e7de2 c645e601        mov     byte ptr [ebp-1Ah],1
960e7de6 84c0            test    al,al
960e7de8 7535            jne     Ntfs!NtfsOpenAttribute+0x3fc (960e7e1f)
960e7dea 8b4618          mov     eax,dword ptr [esi+18h]
960e7ded f7402c00000002  test    dword ptr [eax+2Ch],2000000h
960e7df4 7429            je      Ntfs!NtfsOpenAttribute+0x3fc (960e7e1f)
960e7df6 a0187c0996      mov     al,byte ptr [Ntfs!NtfsStatusDebugEnabled (96097c18)]
960e7dfb 84c0            test    al,al
960e7dfd 7414            je      Ntfs!NtfsOpenAttribute+0x3f0 (960e7e13)
960e7dff 681f3a0000      push    3A1Fh

1: kd> u 960e7e13
Ntfs!NtfsOpenAttribute+0x3f0:
960e7e13 c745e0220000c0  mov     dword ptr [ebp-20h],0C0000022h
960e7e1a e98f040000      jmp     Ntfs!NtfsOpenAttribute+0x88a (960e82ae)

What this piece of code checks is that if the file is not writable and this flag is set and the user wants to open the file exclusively (doesn't share READ, WRITE and DELETE; this particular check happens a bit further up in the code, not pictured here…) then the IRP_MJ_CREATE will fail with error 0xC0000022, STATUS_ACCESS_DENIED. I've been able to validate that this is the case using FileTest (during debugging I resorted to nasty tricks to get the thread to take that path I wanted, so I had to validate that this can be done by regular creates too), by creating a file with a Deny Write DACL for EVERYONE (so that Ntfs!NtfsWriteCheck would fail) and then opening it exclusively. It works fine if I don't specify FILE_DISALLOW_EXCLUSIVE but fails with STATUS_ACCESS_DENIED when I add FILE_DISALLOW_EXCLUSIVE to the options.
As I said before I'm not sure why this is useful, so if you know a scenario where this might be useful please add a comment and enlighten me :). Anyway, this is what I've been to figure out about this flag so I hope it helps...

The Flags of FILE_OBJECTs - Part IV

2012-06-28T11:35:00.000-07:00

This is the last part in the series of posts on the flags of the FILE_OBJECT. The previous posts can be found here:

And here is the final set of flags:

FO_FILE_FAST_IO_READ - 0x00080000 - according to the documentation page for the FILE_OBJECT structure, this flag is set to indicate that a fast I/O read was performed on this FILE_OBJECT. However, looking at the FastFat source code reveals more. First, the flag can be set when the file is read through a regular request as well, not only through fast IO (see FatCommonRead and FatMultiAsyncCompletionRoutine). There is also an interesting bit in the create path (FatCommonCreate), where the flag is set if the caller requested EXECUTE access. So it appears that in fact this flag is set when the user performs a read on the file, but not for paging reads (which makes sense, paging reads are, in a way, the system reading from a file and not exactly the user). The reason for this becomes clear when looking at where the flag is used in FastFat: in the FatUpdateDirentFromFcb function, which uses the flag to figure out if it needs to update the last access time on the file.
FO_RANDOM_ACCESS - 0x00100000 - this flag is set to indicate that the caller intends to use this FILE_OBJECT for random access (forward and backward seeks). It maps to the FILE_RANDOM_ACCESS flag and is set when creating the handle. The flag is pretty similar to the FO_SEQUENTIAL_ONLY but works in the opposite way: it's an indication to the cache manager that it can't assume that once a location in the file was touched once it won't be touched again so it shouldn't be overly aggressive in removing things from the cache. Unlike FO_SEQUENTIAL_ONLY this flag can't be set or queried using the FileModeInformation.
FO_FILE_OPEN_CANCELLED - 0x00200000 - this flag indicates that the IRP_MJ_CREATE for the FILE_OBJECT was cancelled. In other words, someone called IoCancelFileOpen or FltCancelFileOpen. The flag is set in IoCancelFileOpen and is used in IopParseDevice where the IO manager needs to figure out if it should cleanup the FILE_OBJECT (by issuing an IRP_MJ_CLOSE) or not. The documentation for the FltCancelFileOpen routine is also a pretty good source of information for this flag.
FO_VOLUME_OPEN - 0x00400000 - this flag indicates that the FILE_OBJECT is an open for a volume. Interestingly enough this flag is set by the IO manager after the IRP_MJ_CREATE IRP completes, so it would not be available in preCreate. However, preCreate is where it's most useful (see the simrep and MetadataManager minifilter samples in the WDK) and so FltMgr parses the IRP_MJ_CREATE parameters and set the FO_VOLUME_OPEN flag when appropriate so that it's available even in preCreate (see this NTFSD post). This means that this flag is usable by minifilters for all operations. For legacy filters things are more complicated (as usual): they might see the flag set if there is a FltMgr frame above them but they might not see it otherwise (so I guess legacy filters shouldn't expect to see this flag set).
FO_REMOTE_ORIGIN - 0x01000000 - this is a flag that indicates that the FILE_OBJECT was created to satisfy a remote request, which is something done by remote file systems. There are two functions that deal with this flag: IoIsFileOriginRemote which pretty much just queries this flag, and IoSetFileOrigin which sets it. The documentation for both functions mentions this connection (between these functions and FO_REMOTE_ORIGIN) but fails to explain why this is done. The reason is that the cache manager handles files coming from a remote file system differently and so this flag is a hint to the cache manager.
FO_DISALLOW_EXCLUSIVE - 0x02000000 - this flag is set during the create processing if the FILE_DISALLOW_EXCLUSIVE flag was set on the request. It would appear that this flag can only be set during an IRP_MJ_CREATE operation (there is a definition for FO_FLAGS_VALID_ONLY_DURING_CREATE which is what I'm basing this on). I don't know exactly what this flag does and looking over my notes I've never actually used it or had to interact with it so I guess I'll have to investigate this and write a post on it :).
FO_SKIP_COMPLETION_PORT - 0x02000000 - this flag is pretty well explained in the FILE_OBJECT structure page. Additionally, the documentation for SetFileCompletionNotificationModes also explains how this flag might be used (this flag maps to the FILE_SKIP_COMPLETION_PORT_ON_SUCCESS flag). There is also this post that explains a bit more about the flags that affect behavior of IO completion (the following flags): Designing Applications for High Performance - Part III. I'm guessing this flag can be set using the FILE_IO_COMPLETION_NOTIFICATION_INFORMATION structure and respective information class.
FO_SKIP_SET_EVENT - 0x04000000 - this flag is similar to FO_SKIP_COMPLETION_PORT. It maps to the FILE_SKIP_SET_EVENT_ON_HANDLE (as per the page for the SetFileCompletionNotificationModes function).
FO_SKIP_SET_FAST_IO - 0x08000000 - this flag appears to be similar to the previous ones but doesn’t seem to be something that can be set using SetFileCompletionNotificationModes(). Based on the documentation page for the FILE_OBJECT structure it seems that this flag controls whether the FILE_OBJECT event should be signaled when an operation goes through a fast IO call and therefore completes inline. Naturally in that case there is no need to set the event. Looking at wdm.h it would seem that this flag maps to the FILE_SKIP_SET_USER_EVENT_ON_FAST_IO.

The Flags of FILE_OBJECTs - Part III

2012-06-21T08:00:00.000-07:00

This is the third part in the series about FILE_OBJECT flags. Hopefully the next one will be the last one, I didn't expect this would take this long. Anyway, here is the next set of FILE_OBJECT flags:

FO_DIRECT_DEVICE_OPEN - 0x00000800 - this flag indicates that this FILE_OBJECT was the result of a "direct device open". I have already explored this topic in my previous post on Opening Volume Handles in Minifilters. This means that the FILE_OBJECT is in fact a handle to the storage stack volume instead of the file system volume. Please note that this flag is not the same as the FO_VOLUME_OPEN flag (and in fact they're incompatible). This flag isn't really used by file systems at all since the target device is not a file system device but rather the actual storage underneath a file system. As such file system filters will also not be involved in processing FILE_OBJECTs that have this flag set.
FO_FILE_MODIFIED - 0x00001000 - this flag indicates that the FILE_OBJECT was used to modify the file. It is primarily used by the file system (to track handles that didn't modify the file or to figure out when it should flush things). As far as I can tell the IO manager doesn't really use it but it will set it. The FastFat sample in the WDK shows in great detail how FO_FILE_MODIFIED is used.
FO_FILE_SIZE_CHANGED - 0x00002000 - this flag indicates that the file size was changed using this FILE_OBJECT. This is also primarily used by the file system and the IO manager will only set it. This flag is used in conjunction with FO_FILE_MODIFIED and, like FO_FILE_MODIFIED, the FastFat sample is a great example of a file system uses it.
FO_CLEANUP_COMPLETE - 0x00004000 - this flag means that the file system has already processed an IRP_MJ_CLEANUP request on this FILE_OBJECT. There are certain restrictions on which operations are allowed on such FILE_OBJECTs, as can be seen in both the CDFS and FastFat samples. Windows actually uses such FILE_OBJECTs (that have been cleaned up) fairly frequently and thus filters might need to pay attention to this flag in order to be able to figure out which operations might be allowed on it.
FO_TEMPORARY_FILE - 0x00008000 - this flag indicates that the file was opened with the FILE_ATTRIBUTE_TEMPORARY. File systems don't seem to do much beyond setting the flag. The consumer of this information is the Cache Manager, that does more aggressive write caching for temporary files (since it's expected that the file will go away anyway and so there is no point in flushing data to disk; please note that memory pressure might require that data gets flushed, it's just that the caching is more aggressive). See the notes under "Caching Behavior" on the page for the CreateFile function.
FO_DELETE_ON_CLOSE - 0x00010000 - as noted in my previous post on Using FileModeInformation, I've never seen this flag used anywhere until I looked at the implementation of IopGetModeInformation. So as far as I can tell this flag really isn't used. In the current implementation however it can be read by a query for the FileModeInformation class.
FO_OPENED_CASE_SENSITIVE - 0x00020000 - this flag indicates whether name operations on the FILE_OBJECT should be performed in a case-sensitive way or not. This is very important for file system filters that deal with the namespace. This applies not only to filters that change the name space (name providers must know whether to process things in a case-sensitive or in a case-insensitive way) but also for filters that just look at file names and don't actually change anything. There is a related flag that is very important, the SL_CASE_SENSITIVE. I've written multiple posts that mention case sensitivity but the more interesting ones are the one on FILE_OBJECT Names in IRP_MJ_CREATE and the one on Names in Minifilters - Implementing Name Provider Callbacks.
FO_HANDLE_CREATED - 0x00040000 - this flag indicates that a handle has been created for the FILE_OBJECT. From the IO manager's perspective the CREATE operation has completed successfully (so it's too late to call FltCancelFileOpen or IoCancelFileOpen). The flag is set after the successful IRP_MJ_CREATE is completed back to the IO manager, in the IopParseDevice function. The documentation for ObOpenObjectByPointer implies that it's only safe to be called for FILE_OBJECTs that have this flag set. I should have mentioned this in my previous post on Duplicating User Mode Handles, but I can see I've only added an ASSERT but didn't explain it.

The Flags of FILE_OBJECTs - Part II

2012-06-14T08:00:00.000-07:00

In the post last week we discussed some of the various flags of the FILE_OBJECT, but since there are so many of them I left some for this week. So here we go again:

FO_SEQUENTIAL_ONLY - 0x00000020 - this flag is set to indicate that the caller guarantees that this FILE_OBJECT will only be used for sequential requests (no seeks). It maps to the FILE_SEQUENTIAL_ONLY flag and is usually set when Creating the handle, though it's possible to set and query it through the FileModeInformation information class. This flag is simply an indication to the cache manager that it can be more aggressive about removing things from the file system cache (since the flag implies that once data at a certain offset was touched it will likely not be needed again). If the user sets this flag but doesn't honor the contract (i.e. if IO isn't actually sequential (so it's at random offsets in the file)) then the cache performance will not be optimal. This flag (FILE_SEQUENTIAL_ONLY) is useful when copying a file or scanning a file for hashing and so on. Without the FILE_SEQUENTIAL_ONLY flag the file will be kept in the cache and will compete with everything else in there, so it might evict useful data from the cache (for example you have an application open and you switch to explorer and copy some files around... if the files that are copied are opened with the FILE_SEQUENTIAL_ONLY flag then the cache will likely prioritize existing things in cache over the file contents so when you switch back to your application it will still be in the cache. Without the flag it is more likely that the application will be removed from the cache because of the file copy...).
FO_CACHE_SUPPORTED - 0x00000040 - according to the MSDN page on the FILE_OBJECT structure this flag should only be set if the file system implements supports for the FSRTL_ADVANCED_FCB_HEADER structure. However, all windows file systems should implement this structure anyway. This flag is set by the filesystem as a means to keep track whether the open for this FILE_OBJECT had the FILE_NO_INTERMEDIATE_BUFFERING set (if FILE_NO_INTERMEDIATE_BUFFERING was set then this must be clear). This flag isn't terribly useful since the flag that controls most of the caching behavior is FO_NO_INTERMEDIATE_BUFFERING. I'm not sure why someone felt there was a need to add this flag when FO_NO_INTERMEDIATE_BUFFERING conveys the same information.
FO_NAMED_PIPE - 0x00000080 - this flag indicates that the FILE_OBJECT is for a named pipe. This is set by the file system (NPFS in this case) so it's not available in preCreate. According to a post on NTFSD, this flag can also be set for the connection to the IPC$ by the redirector, even though that FILE_OBJECT is not backed by NPFS on the remote server. On the local system everything works pretty much as you expect, with the flag being set for all NPFS FILE_OBJECTs.
FO_STREAM_FILE - 0x00000100 - this flag is set for FILE_OBJECTs created using the IoCreateStreamFileObject API (or friends: IoCreateStreamFileObjectEx, IoCreateStreamFileObjectEx2 and IoCreateStreamFileObjectLite). I've discussed stream FILE_OBJECTs in my post About IRP_MJ_CREATE and minifilter design considerations - Part V. Please note that this does NOT indicate the FILE_OBJECT is for an alternate data stream (I've seen some people confuse them). The two concepts (stream file objects and alternate data streams) are unrelated.
FO_MAILSLOT - 0x00000200 - this flag is similar in intent with the FO_NAMED_PIPE flag, except that in this case it indicates that the underlying file system is the mailslots file system (MSFS). According to the OSR thread that I pointed to earlier, apparently for FILE_OBJECTs for remote file systems the redirector doesn't set the FO_MAILSLOT flag even if the remote FILE_OBJECT has it. But anyway for local FILE_OBJECTs this flag should be set just like the FO_NAMED_PIPE flag.
FO_GENERATE_AUDIT_ON_CLOSE - 0x00000400 - as the MSDN page on the FILE_OBJECT structure points out this flag is deprecated. It has the same value as the next flag, FO_QUEUE_IRP_TO_THREAD.
FO_QUEUE_IRP_TO_THREAD - 0x00000400 - the only information I've been able to find has also been the MSDN page on the FILE_OBJECT structure. According to that page it means that IRPs will not be queued to this FILE_OBJECT and will be queued to the thread instead (at least that's what I get from the name). This has to do with the thread agnostic IO that was a new feature for Vista. The idea is that instead of queuing IO to a thread there are cases where it can be queued to a FILE_OBJECT (as an optimization). This is described in more detail in a post by PaulSli on Doron's blog, New for Windows Vista: thread agnostic I/O. My guess is that this flag disables that optimization. I've seen this flag set occasionally but I've never had to figure out where it's set and under what circumstances that happens. Still, I hope that this bit of investigation helps.

This is it for today, these things take quite a while to investigate so I can't do too many of them at once.

The Flags of FILE_OBJECTs - Part I

2012-06-07T10:55:00.000-07:00

In this set of posts I'd like to discuss the various flags defined for FILE_OBJECTs and how they are used in the system. The reason I'm doing this is because for most of them the documentation is scarce or wrong and I've been meaning to document them for a while. Please note that most of this information is based on my experience with them and is not meant to be an exhaustive list of how they might be used.

The only documentation I've found on MSDN is on the page about the FILE_OBJECT structure, but as I said before it's very scarce on the details. So without further delay let's start with the flags:

FO_FILE_OPEN - 0x00000001 - this flag is documented as being deprecated. The sample file systems (FastFat, CDFS) don't even mention it anywhere. I've also been unable to find it set anywhere (I expected either the file system to set it or the IO manager to set it but as far as I can tell it's not really used at all).
FO_SYNCHRONOUS_IO - 0x00000002 - this is a pretty complicated flag (in that it controls different types of behavior) . It can only be set during the create operation (before the FILE_OBJECT is sent through the IO stack) if the caller uses the FILE_SYNCHRONOUS_IO_ALERT or FILE_SYNCHRONOUS_IO_NONALERT options. If none of these is set then the FILE_OBJECT won't have the flag set. This flag controls a couple of different behaviors:
- if the flag is set then the FILE_OBJECT->CurrentByteOffset field is updated after each IO operation to reflect the current file position pointer and also it enables using the FILE_OBJECT->CurrentByteOffset as the current file offset to perform IO at (FILE_USE_FILE_POINTER_POSITION can only be used for a file that has FO_SYNCHRONOUS_IO set). So in a way this flag controls whether the current position can be used for this FILE_OBJECT.
- if the flag is set then the IO manager will try to synchronize operations using that FILE_OBJECT. So if multiple threads use the same FILE_OBJECT then the IO manager will only allow one operation at a time to proceed. Please note that this doesn't mean that file system filter will only see one operation at a time since it's possible that other components on the system issue IO directly on the FILE_OBJECT without synchronizing with the IO manager (pretty typical with the Cache Manager).
- If the flag is set then IoIsOperationSynchronous (and its friend FltIsOperationSynchronous) will return TRUE (as long as this isn't an asynchronous paging IO request, in which case it doesn't matter whether FO_SYNCHRONOUS_IO is set). The return value of IoIsOperationSynchronous and FltIsOperationSynchronous might have deeper impact on how the request is processed by the file system or file system filters (which might decide to process it inline rather than queue it if it's synchronous).
FO_ALERTABLE_IO - 0x00000004 - as the FILE_OBJECT structure documentation page mentions, this flag controls how the IO manager waits when using the FILE_OBJECT. If the flag is set then most waits (which can be waits to acquire the FILE_OBJECT to synchronize IO on it or just waiting for IO to complete) are alertable. The flag will be set depending on the presence of FILE_SYNCHRONOUS_IO_ALERT or FILE_SYNCHRONOUS_IO_NONALERT. The flag is only meaningful for FILE_OBJECTs for which FO_SYNCHRONOUS_IO is set and can be set and cleared in the Create path and using the FileModeInformation information class. Please note that setting or clearing it using FileModeInformation only works if the FILE_OBJECT has the FO_SYNCHRONOUS_IO set. Another thing to note is that file systems seem to ignore this flag.
FO_NO_INTERMEDIATE_BUFFERING - 0x00000008 - this flag indicates whether the file was opened for non-cached IO. It maps to the FILE_NO_INTERMEDIATE_BUFFERING flag and can only be set during Create. It controls two different behaviors as well:
- If it is set then there is no file system caching for this FILE_OBJECT. The implications here are pretty heavy. A lot of the cache management specific operations and flags are disabled (I'll discuss this more further down when I talk about some of the other flags). Also the file system code paths are quite different when dealing with cached IO and non-cached IO. This also means that for IO happening on this FILE_OBJECT the file system might need to perform flushes to keep the cached and non-cached views of the data coherent.
- Another implication of this flag being set is that IO must be aligned to the sector size of the underlying device. This impacts reads and writes as well as setting the current file pointer for example. Unaligned operations on this FILE_OBJECT will generally fail at the IO manager level with STATUS_INVALID_PARAMETER.
FO_WRITE_THROUGH - 0x00000010 - this flag indicates that any data written on this FILE_OBJECT should be flushed immediately to disk. It maps to the FILE_WRITE_THROUGH flag. It can be set in the Create file path and also at any time with the FileModeInformation information class. This flag cannot be set when FO_NO_INTERMEDIATE_BUFFERING is set (it makes no sense if there is no cache at all). Please note that this flag doesn't require that operations be aligned to the underlying sector size, since caching is still in effect, it's just that writes are immediately flushed to disk.

I'll stop here for today but I plan to address all the flags in the following posts.

Using FileModeInformation

2012-05-31T11:07:00.001-07:00

I've recently had to debug again how the system handles the FileModeInformation information class and whenever that happens (debugging something more than once) it's generally a good indication that I need to write a blog post, at least so that I don't have to debug yet again in the future. Incidentally, it's quite scary when I occasionally search for something and I discover that I have actually written a blog post about it but I've completely forgotten not only the information in the post but even the fact that I have written about it. I hope this explanation will come in quite handy at some point in the future when I'll actually write a blog post about something I've already blogged about :).

Anyway, on to business. FileModeInformation is an interesting information class because it can be used to query and set some of the create options that were used when the file was opened. Another reason why it is interesting is because it's completely handled in the IO manager (i.e. a request is never sent to the file system). Since this information class is also part of the FileAllInformation information class, the fact that the file system doesn't handle this leads to pretty interesting semantics (since the IO manager must fill in part of the FILE_ALL_INFORMATION structure while the file system fills in the rest). I've discussed this in my previous blog post on Filters And IRP_MJ_QUERY_INFORMATION.

Other than the specifics of implementing the FileAllInformation class in a filter, there are a couple of other aspects that I think are interesting to discuss about the FileModeInformation class.

Querying - the list of CreateOptions that can be queried using the FileModeInformation is limited to (the information is listed on the MSDN page 2.4.24 FileModeInformation):
- FILE_WRITE_THROUGH
- FILE_SEQUENTIAL_ONLY
- FILE_NO_INTERMEDIATE_BUFFERING
- FILE_SYNCHRONOUS_IO_ALERT
- FILE_SYNCHRONOUS_IO_NONALERT
- FILE_DELETE_ON_CLOSE - please note that the MSDN documentation states that this flag isn't implemented yet and is never returned as set. However, peeking at the implementation (do a uf nt!IopGetModeInformation) clearly shows that the FILE_DELETE_ON_CLOSE is set when the FILE_OBJECT flag FO_DELETE_ON_CLOSE is set. This is the only time I've actually seen FO_DELETE_ON_CLOSE used anywhere. I had tried a couple of times in the past to see where it gets set but could never find it (though obviously I couldn't rule out the possibility that it does get set somewhere). Now, looking at the implementation of IopGetModeInformation and at the documentation page I can draw the conclusion that this flag isn't actually set anywhere at all.
Setting - the list of attributes that can be set is even shorter. It is addressed in the MSDN page 3.1.5.14.7 FileModeInformation. One thing to note is that the list of attributes that can be set is also available as a bitmask in the WDK, as FILE_VALID_SET_FLAGS. Also the list of flags that can be set using FileModeInformation indicates which of these modes make sense to change for a file that has already been open (and possibly in use) for a while (in other words, flags that can change file system behavior at a certain point and not only when the file is opened; for example, setting the FILE_DELETE_ON_CLOSE attribute doesn't make sense once the file was opened so it's not available to set). Anyway, these are the attributes that can be changed using the FileModeInformation information class:
- FILE_WRITE_THROUGH
- FILE_SEQUENTIAL_ONLY
- FILE_SYNCHRONOUS_IO_ALERT
- FILE_SYNCHRONOUS_IO_NONALERT
Issuing these requests from a minifilter - this is a pretty interesting issue. The dedicated FltMgr APIs, FltQueryInformationFile and FltSetInformationFile do little else that to allocate a FLT_CALLBACK_DATA structure and send it to the file system. However, since the file system doesn't actually implement these requests it's impossible to use FileModeInformation with either FltQueryInformationFile or FltSetInformationFile (though a request will actually be sent). Since the actual implementation happens in the IO manager a filter could call the ZwQueryInformationFile and ZwSetInformationFile APIs but those require a handle for the FILE_OBJECT and filters don't often have that. So what can a file system filter do if it wants to query or set these flags ? As far as I can tell there are two options:
1. The file system filter can implement the equivalent IO manager code (which is actually not that hard since the steps are documented pretty well on the MSDN page 3.1.5.14.7 FileModeInformation and the implementation for nt!IopGetModeInformation is pretty straightforward).
2. The file system filter can call an undocumented API, IoSetInformation(), that is a pretty good replacement for ZwSetInformationFile(). Unfortunately I couldn't find a similar function for ZwQueryInformationFile() so filters would most likely have to resort to checking flags directly.

Writing to Read-Only Files

2012-05-24T15:06:00.000-07:00

This week I want to talk about a topic that's pretty interesting, the topic of writing to a read-only file. I've mentioned this in my post About IRP_MJ_CREATE and minifilter design considerations - Part VI but I want to discuss it in a bit more depth.

Why is writing to a read-only file important ? Well, for one, it allows one to implement a file system based synchronization mechanism. It might also be useful for filters that might need to write data to files for such purposes as tracking access or simply to virtualize a file's data (deduplication filters and HSM might want to write to a read-only file , if only to put the original data in).

So first let's look at how the access checks when writing to a file actually work. It's a pretty straightforward operation:

Caller calls NtWriteFile with a file handle.
NtWriteFile tries to resolve the handle to a FILE_OBJECT by calling ObReferenceFileObjectForWrite()
ObReferenceFileObjectForWrite() gets the handle information from the handle extracts the actual access that was granted to the caller of the handle.
ObReferenceFileObjectForWrite() then a simple bit check between the requested access (which is for write) and the one granted to the handle. If the granted access doesn't include write this is where STATUS_ACCESS_DENIED is returned.

From this it's clear that an easy way to be able to write to handle is to have been granted that access when the handle was created. So one simple way to achieve the goal we set for ourselves in the title is to open a file that is not a read-only file for write and then set the read-only attribute. This can be done on the handle we have (that has been granted write access) and since we made the file read-only no one else open a handle for write, while we can use the handle to write to the file. However, if we close this handle then we can't open another handle for write on that file since it's now a read-only file. So this also shows a potential limitation when clearing the read-only flag: the handle where we'll do that can't have write data access and so it would be necessary to reset the read-only attribute on one handle and then open another handle with write data access to write data to the file.

I find this quite interesting because it exposes how the internal implementation of the handle access checks but it's not necessarily relevant to file system filters, since they can use the FILE_OBJECT directory to perform any write they want. This might be useful, however, to user mode services that work in conjunction with the file system filter that might require using a handle for IO.

Anyway, the thing I really wanted to point out was that there is another case when writing on a read-only file (a file that has a read-only attribute) is possible, a case that doesn't require that the file be opened without the read-only flag at all. The case I'm talking about happens when creating a read-only file. If the caller of the CreateFile (or one of the CreateFile APIs) is actually creating the file (it's not an "open" type of create) and they're asking for it to be a read-only file but at the same time they're requesting write access, then if the create operation succeeds the handle they get back can actually be used to write to the file. In my experience this behavior isn't that well known. It is also quite useful for implementing various types of atomic synchronization using the file system (creating a file with GENERIC_WRITE, CREATE_ALWAYS and FILE_ATTRIBUTE_READONLY will only succeed for the first caller and will fail for subsequent callers since the file will be read-only… the first caller can then reset the read-only attribute on the file and the next subsequent request with the same parameters will succeed (provided the sharing mode allows it)).

This is the kind of semantic that makes things interesting for a file system filter that tries to open files before the user (i.e. open the target file for an IRP_MJ_CREATE from the PreCreate callback for that operation). This is generally not a good idea for many reasons and this is just another one of them. It's also a good reminder of the subtle some of the file system semantics are and how easy it is for a filter to change things in a way that might break things.

More Fun with FileIDs: Boot Critical Files

2012-05-17T09:08:00.000-07:00

As you might have guessed from my previous posts, I'm actively working on a file system filter that uses FileIDs to open files. As it happens this isn't something most developers do so I occasionally find myself in some dark corner of the file system that I had no idea existed. Today's post is about a debugging session where I ran into one of these corners.
This all started when I was testing my filter on Server 2008 R2. I have tested it extensively on Win7 AMD64 and since Srv08R2 is pretty much identical in terms of the kernel and file system code I didn't expect I'd run into any issues. However, to my surprise, the machine wouldn't boot, crashing with the following bugcheck (edited a couple of lines here and there to make the output smaller, and highlighted the important bits):

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

NTFS_FILE_SYSTEM (24)
    If you see NtfsExceptionFilter on the stack then the 2nd and 3rd
    parameters are the exception record and context record. Do a .cxr
    on the 3rd parameter and then kb to obtain a more informative stack
    trace.
Arguments:
Arg1: 00000000001904fb
Arg2: fffff880009903b8
Arg3: fffff8800098fc10
Arg4: fffff880017504cf

Debugging Details:
------------------

ExceptionAddress: fffff880017504cf (Ntfs! ?? ::NNGAKEGL::`string'+0x0000000000013040)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000070
Attempt to read from address 0000000000000070

CONTEXT:  fffff8800098fc10 -- (.cxr 0xfffff8800098fc10)
rax=fffff8a000aaa5e0 rbx=fffff8a000aaa940 rcx=0000000000000000
rdx=fffff88000990501 rsi=0000000000000000 rdi=0000000000000000
rip=fffff880017504cf rsp=fffff880009905f0 rbp=fffff88000990800
 r8=0000000000000000  r9=0000000000000000 r10=0000000000000000
r11=fffff880009905d0 r12=0000000000000000 r13=fffff8a000aaa501
r14=fffffa800e044500 r15=fffff8a000aaa760
iopl=0         nv up ei pl nz na pe nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
Ntfs! ?? ::NNGAKEGL::`string'+0x13040:
fffff880`017504cf 488b5770        mov     rdx,qword ptr [rdi+70h] ds:002b:00000000`00000070=????????????????
Resetting default scope

PROCESS_NAME:  lsass.exe

CURRENT_IRQL:  2

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  0000000000000070

READ_ADDRESS:  0000000000000070 

FOLLOWUP_IP: 
Ntfs! ?? ::NNGAKEGL::`string'+13040
fffff880`017504cf 488b5770        mov     rdx,qword ptr [rdi+70h]

FAULTING_IP: 
Ntfs! ?? ::NNGAKEGL::`string'+13040
fffff880`017504cf 488b5770        mov     rdx,qword ptr [rdi+70h]

BUGCHECK_STR:  0x24

DEFAULT_BUCKET_ID:  NULL_CLASS_PTR_DEREFERENCE

LAST_CONTROL_TRANSFER:  from fffff880016d8f46 to fffff880017504cf

STACK_TEXT:  
fffff880`009905f0 fffff880`016d8f46 : fffffa80`0e044500 00000000`00000000 00000000`00000000 00000000`00000001 : Ntfs! ?? ::NNGAKEGL::`string'+0x13040
fffff880`00990660 fffff880`016d96b4 : fffffa80`0e044500 fffffa80`0e6febb0 fffffa80`0d6d1420 00000000`00000000 : Ntfs!NtfsCommonFlushBuffers+0x3f2
fffff880`00990740 fffff880`00c02bcf : fffffa80`0e6fee78 fffffa80`0e6febb0 fffffa80`0e044500 fffff880`00990768 : Ntfs!NtfsFsdFlushBuffers+0x104
fffff880`009907b0 fffff880`00c016df : fffffa80`0d27d730 00000000`00000000 fffffa80`0d27d700 fffffa80`0e6febb0 : fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
fffff880`00990840 fffff800`019e171b : 00000000`00000002 fffffa80`0d6cf640 00000000`00000000 fffffa80`0e6febb0 : fltmgr!FltpDispatch+0xcf
fffff880`009908a0 fffff800`01978871 : fffffa80`0e6febb0 fffffa80`0e54b6a0 fffffa80`0d6cf640 fffff800`0184be80 : nt!IopSynchronousServiceTail+0xfb
fffff880`00990910 fffff800`016d88d3 : fffffa80`0e54b6a0 fffff8a0`016edeb0 fffffa80`0d27d730 fffffa80`0d6cf640 : nt!NtFlushBuffersFile+0x171
fffff880`009909a0 fffff800`016d4e70 : fffff800`01979627 00000000`00000010 00000000`ffffffff 00000000`00000001 : nt!KiSystemServiceCopyEnd+0x13
fffff880`00990b38 fffff800`01979627 : 00000000`00000010 00000000`ffffffff 00000000`00000001 00000000`00000000 : nt!KiServiceLinkage
fffff880`00990b40 fffff800`019797a7 : 00000000`0000001e fffff880`00990bd0 00000000`00b1e801 fffff800`0000001f : nt!CmpFileFlush+0x3f
fffff880`00990b80 fffff800`016d88d3 : fffffa80`0e54b6a0 fffff8a0`016edeb0 fffff880`00990ca0 00000000`00c17120 : nt!NtFlushKey+0xfb
fffff880`00990c20 00000000`770a1f7a : 000007fe`fcbf3b76 00000000`00000000 00000000`00c17120 00000000`00c17120 : nt!KiSystemServiceCopyEnd+0x13
00000000`00b1eba8 000007fe`fcbf3b76 : 00000000`00000000 00000000`00c17120 00000000`00c17120 00000000`00000000 : ntdll!ZwFlushKey+0xa
00000000`00b1ebb0 000007fe`fe7323d5 : 00000000`00c17101 000007fe`00000000 00000000`00000001 00000000`00b1eea0 : SAMSRV!SamrCloseHandle+0xf6
00000000`00b1ec00 000007fe`fe7db68e : 00000000`00000002 00000000`00000001 000007fe`fcc80220 00000000`00c20c30 : RPCRT4!Invoke+0x65
00000000`00b1ec50 000007fe`fe71ac40 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000001 : RPCRT4!Ndr64StubWorker+0x61b
00000000`00b1f210 000007fe`fe7250f4 : 230100ef`00000001 00000000`ac896745 00002f00`e3236899 00000000`00000001 : RPCRT4!NdrServerCallAll+0x40
00000000`00b1f260 000007fe`fe724f56 : 00000000`00c20ad0 00000000`00000018 00000000`00b1f410 00000000`00c1b330 : RPCRT4!DispatchToStubInCNoAvrf+0x14
00000000`00b1f290 000007fe`fe71d879 : 00000000`00bda928 000007fe`fe72db94 00000000`00000001 00000000`00000001 : RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x146
00000000`00b1f3b0 000007fe`fe71d6de : 00000000`00c20ad0 00000000`00c20ad0 00000000`00000003 00000000`00000000 : RPCRT4!OSF_SCALL::DispatchHelper+0x159
00000000`00b1f4d0 000007fe`fe71d4f9 : 00000000`00000014 00000000`00000000 00000000`00c1a2d0 00000000`00c20ad0 : RPCRT4!OSF_SCALL::ProcessReceivedPDU+0x18e
00000000`00b1f540 000007fe`fe71d023 : 00000000`00c20a18 00000000`00c20a70 00000000`00000000 00000000`00000000 : RPCRT4!OSF_SCONNECTION::ProcessReceiveComplete+0x3e9
00000000`00b1f5f0 000007fe`fe71d103 : 00000000`00bda850 00000000`00b1f9c0 00000000`00c20a18 00000000`00b1f8d8 : RPCRT4!CO_ConnectionThreadPoolCallback+0x123
00000000`00b1f6a0 000007fe`fd1c908f : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : RPCRT4!CO_NmpThreadPoolCallback+0x3f
00000000`00b1f6e0 00000000`7706098a : 00000000`00485840 00000000`00000000 00000000`77154da0 00000000`77070043 : KERNELBASE!BasepTpIoCallback+0x4b
00000000`00b1f720 00000000`7706feff : 00000000`00000002 00000000`00000000 00000000`00c20a70 00000000`00b1f9c0 : ntdll!TppIopExecuteCallback+0x1ff
00000000`00b1f7d0 00000000`76e4652d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!TppWorkerThread+0x3f8
00000000`00b1fad0 00000000`7707c521 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`00b1fb00 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

So as you can see, this was an access violation in NTFS, in some function that the debugger couldn't resolve properly (hence the weird Ntfs! ?? ::NNGAKEGL::`string'+0x13040 name). The stack shows how this happens when a registry key is flushed as a result of closing a handle, and the registry eventually ends up flushing the actual hive file which makes it to NTFS where it bugchecks.
The first thing to do is to try to figure out what the function is that NTFS actually bugchecks in:

kd> ub Ntfs!NtfsCommonFlushBuffers+0x3f2
Ntfs!NtfsCommonFlushBuffers+0x3c7:
fffff880`016d8f1b 488b4348        mov     rax,qword ptr [rbx+48h]
fffff880`016d8f1f 488b4818        mov     rcx,qword ptr [rax+18h]
fffff880`016d8f23 48898c24f8000000 mov     qword ptr [rsp+0F8h],rcx
fffff880`016d8f2b 8b051389fdff    mov     eax,dword ptr [Ntfs!NtfsDebugDiskFlush (fffff880`016b1844)]
fffff880`016d8f31 4c8b8424f8000000 mov     r8,qword ptr [rsp+0F8h]
fffff880`016d8f39 488b542468      mov     rdx,qword ptr [rsp+68h]
fffff880`016d8f3e 498bcc          mov     rcx,r12
fffff880`016d8f41 e84efaffff      call    Ntfs!NtfsFlushBootCritical (fffff880`016d8994)

So NtfsCommonFlushBuffers calls NtfsFlushBootCritical, which then jumps around a bit and, due to pretty aggressive optimizations, ends up in a location that the debugger doesn't have a name for, hence the "Ntfs! ?? ::NNGAKEGL::`string'+0x13040" name.
Anyway, next step in this case is to try to find the FILE_OBJECT and see if it's somehow damaged or something. I also generally like to find the IRP and, since I'm working on minifilter, I also like to get the FLT_CALLBACK_DATA structure. This isn't too bad in most cases but it's a pain on amd64 systems because the calling convention makes sure the IRP_CTRL (the parent structure for the FLT_CALLBACK_DATA) isn't passed as a parameter on the stack. Also, since FltMgr implements a callback model it means that while my filter was definitely involved in processing this IO, my callback functions (for which I have symbols and I could get local variables and such) have already returned and can't be found on this stack. So the easiest thing to do in this case is to try to find the IRP_CALL_CTRL structure (see these posts for more details on what that is and how to get it: Debugging Minifilters: Finding the FLT_CALLBACK_DATA - Part I and Debugging Minifilters: Finding the FLT_CALLBACK_DATA - Part II).

kd> dp fffff880`00990840
fffff880`00990840  fffffa80`0d27d730 00000000`00000000
fffff880`00990850  fffffa80`0d27d700 fffffa80`0e6febb0
fffff880`00990860  fffffa80`0d3875a0 fffffa80`0e6febb0
fffff880`00990870  fffffa80`0e8c7790 ffffffff`ffffffff
fffff880`00990880  00000000`00000000 fffffa80`00000204
fffff880`00990890  fffffa80`0e6febb0 fffff800`019e171b
fffff880`009908a0  00000000`00000002 fffffa80`0d6cf640
fffff880`009908b0  00000000`00000000 fffffa80`0e6febb0
kd> dt fffff880`00990840+0x20 fltmgr!_IRP_CALL_CTRL
   +0x000 Volume           : 0xfffffa80`0d3875a0 _FLT_VOLUME
   +0x008 Irp              : 0xfffffa80`0e6febb0 _IRP
   +0x010 IrpCtrl          : 0xfffffa80`0e8c7790 _IRP_CTRL
   +0x018 StartingCallbackNode : 0xffffffff`ffffffff _CALLBACK_NODE
   +0x020 OperationStatusCallbackListHead : _SINGLE_LIST_ENTRY
   +0x028 Flags            : 0x204 (No matching name)
kd> !fltkd.cbd 0xfffffa80`0e8c7790

IRP_CTRL: fffffa800e8c7790  FLUSH_BUFFERS (9) [00000001] Irp
Flags                    : [10000000] FixedAlloc
Irp                      : fffffa800e6febb0 
DeviceObject             : fffffa800d27d730 "\Device\HarddiskVolume1"
FileObject               : fffffa800d6d1420 
CompletionNodeStack      : fffffa800e8c78e0   Size=5  Next=1
SyncEvent                : (fffffa800e8c77a8)
InitiatingInstance       : 0000000000000000 
Icc                      : fffff88000990860 
PendingCallbackNode      : ffffffffffffffff 
PendingCallbackContext   : 0000000000000000 
PendingStatus            : 0x00000000 
CallbackData             : (fffffa800e8c7840)
 Flags                    : [00000001] Irp
 Thread                   : fffffa800e54b6a0 
 Iopb                     : fffffa800e8c7898 
 RequestorMode            : [00] KernelMode
 IoStatus.Status          : 0x00000000 
 IoStatus.Information     : 0000000000000000 
 TagData                  : 0000000000000000 
 FilterContext[0]         : 0000000000000000 
 FilterContext[1]         : 0000000000000000 
 FilterContext[2]         : 0000000000000000 
 FilterContext[3]         : 0000000000000000 

   Cmd     IrpFl   OpFl  CmpFl  Instance FileObjt Completion-Context  Node Adr
--------- -------- ----- -----  -------- -------- ------------------  --------
 [0,0]    00000000  00   0000   0000000000000000 0000000000000000 0000000000000000-0000000000000000   fffffa800e8c7ae0
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 [0,0]    00000000  00   0000   0000000000000000 0000000000000000 0000000000000000-0000000000000000   fffffa800e8c7a60
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 [0,0]    00000000  00   0000   0000000000000000 0000000000000000 0000000000000000-0000000000000000   fffffa800e8c79e0
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 [0,0]    00000000  00   0000   0000000000000000 0000000000000000 0000000000000000-0000000000000000   fffffa800e8c7960
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 [9,0]    00060004  00   0002   fffffa800d43d610 fffffa800d6cf640 fffff880014ecea0-0000000000000000   fffffa800e8c78e0
            ("ivm","IVM")  ivm!IvmFSPostOpDispatch 
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Working IOPB:
>[9,0]    00060004  00          fffffa800e73e010 fffffa800d6d1420                     fffffa800e8c7898
            ("luafv","luafv")  
     Args: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000

So now that I have the FLT_CALLBACK_DATA I need to get the FILE_OBJECT. Please note that the FILE_OBJECT changes between the parameters my filter gets and the ones FltMgr is using now (after my filter was called), which make sense since I replace the FILE_OBJECT in the callback. So let's see what that FILE_OBJECT looks like and what the file is:

kd> !fileobj fffffa800d6d1420

掃

Related File Object: 0xfffffa800d43d460

Device Object: 0xfffffa800d301cd0   \Driver\volmgr
Vpb: 0xfffffa800cee62f0
Event signalled
Access: Read Write 

Flags:  0x140008
 No Intermediate Buffering
 Handle Created
 Random Access

FsContext: 0xfffff8a000aaa6f0 FsContext2: 0xfffff8a000aaa940
CurrentByteOffset: 0
Cache Data:
  Section Object Pointers: fffffa800d6d3af8
  Shared Cache Map: 00000000


File object extension is at fffffa800d41b780:

As you have undoubtedly noticed, the file name is actually pretty strange (and in case you can read that character, one could possibly associate sweeping with flushing a file, but it's a bit of a stretch :)) . Whenever I see a filename that looks like one or a couple of characters that don't belong I suspect that that file was opened by ID (and in this case I actually know my filter does this so I was sort of expecting it anyway). A very important thing to check at this point is that the FILE_OBJECT is actually an NTFS FILE_OBJECT and I'm not somehow leaking FILE_OBJECTs from my filter into the file system. The easiest way to check is to look at the tags for the buffers in FsContext and FsContext2, since the owner of the FILE_OBJECT allocates those:

kd> !pool 0xfffff8a000aaa6f0 2
Pool page fffff8a000aaa6f0 region is Paged pool
*fffff8a000aaa5b0 size:  580 previous size:   80  (Allocated) *NtfF
  Pooltag NtfF : FCB_INDEX, Binary : ntfs.sys
kd> !pool 0xfffff8a000aaa940 2
Pool page fffff8a000aaa940 region is Paged pool
*fffff8a000aaa5b0 size:  580 previous size:   80  (Allocated) *NtfF
  Pooltag NtfF : FCB_INDEX, Binary : ntfs.sys

One final thing to check is that the actual file name is a file ID. This is pretty complicated to do since a fileID can look almost like anything. The easiest way to do this is to use a VM (so that it's easy to snap back to the same OS at an earlier point) and look at what the file system looks like without the filter installed and use fsutil.exe to resolve the FileID to a name:

kd> dt fffffa800d6d1420 nt!_FILE_OBJECT FileName.
   +0x058 FileName  :  "掃"
      +0x000 Length    : 8
      +0x002 MaximumLength : 0x38
      +0x008 Buffer    : 0xfffff8a0`00aed860  "掃"
kd> dp 0xfffff8a0`00aed860 L1
fffff8a0`00aed860  00010000`00006383

So anyway, this does look like a file ID and it actually was the fileID for the file "\Windows\System32\Config\SYSTEM", which is the system hive. This was pretty confusing because I was expecting I was doing something obviously wrong (like pass the wrong FILE_OBJECT down the stack) and yet I couldn't see anything broken here. So I decided to see what's actually going on in NTFS.
I spent some time looking at the disassembly for NtfsFlushBootCritical but like most NTFS code it's packed with structures I have no idea about. In these cases it's easier to walk through the code in the debugger to see what the structures look like (and the pool tags they use) to try to figure out what they might be. However, NtfsFlushBootCritical doesn't actually get called at all for (almost) any other file. So at this point I was pretty puzzled because it meant NTFS somehow knew that the system hive was a "boot critical" file, unlike any other file on the file system (since that function was never called again). I didn't know that NTFS did special things for files that are deemed boot critical (since NTFS is generally pretty uniform in how it does things, it's rare to see it do something special for a certain windows process) and I had no idea what those files were.
Anyway, as it turned out NtfsFlushBootCritical was called for the system hive even when my driver wasn't there so I could debug it in more detail. After walking through it a couple of times it became obvious that it was walking a list of SCBs, starting from the file I had. The SCB when my filter was present was NULL and then NTFS tried to access a member of the SCB, resulting in a NULL pointer dereference.
Now, I'd like to point out that NTFS keeps a pointer to the parent SCB for each SCB. So the SCB for the file "\Windows\System32\Config\SYSTEM" has a pointer to the SCB for the folder "\Windows\System32\Config", which in turn has a pointer to the SCB for its parent folder, "\Windows\System32" and so on. However this doesn't always happen for files that are opened by ID, because there is no path traversal necessary in order to open the file (NTFS can use the fileID as the index and immediately find the file) and it doesn't make much sense for NTFS to populate the SCB if it's not necessary.
So, obviously, the issue here is that when opening the file \Windows\System32\Config\SYSTEM by ID the pointer to the folder SCB wasn't initialized when NTFS expected it would be and so it ended up dereferencing a NULL pointer.
What puzzled me about this is that this never happened on Win7. So I debugged my code on that platform and guess what, in Win7 the link is actually initialized even when the file is opened by fileID. As it turns out, NTFS will not populate the pointer to the parent SCB if it doesn't have to, but if someone asks for the file name (a filter calling FltGetFileNameInformation() for example) then it will walk the list and build the path and then the SCB pointer will be initialized. This is exactly what happens on Win7 because there are other filters (FileInfo, Luafv) that query names, but on Srv08R2 they're not present (or at least not enabled) and so this doesn't happen.
Now the final bit of mystery was how did NTFS know that "\Windows\System32\Config\SYSTEM" should be handled in a special way. What was it that made it a boot critical file ? Well, as it turns out, there is an FSCTL, FSCTL_SET_BOOTLOADER_ACCESSED, which serves this purpose. When NTFS receives this for a FILE_OBJECT it sets an internal flag and then when an IRP_MJ_FLUSH_BUFFERS arrives on the same handle NTFS will flush not only that file but all the folders that make up the path to that file (which is all handled in NtfsFlushBootCritical).

Identifying the Target Device for an Opened File Object

2012-05-03T10:20:00.000-07:00

As you may have guessed from the title, in this post I plan to cover how the OS knows to which device a certain request needs to go for a certain FILE_OBJECT. We spent quite a bit of time on this blog talking about how IRP_MJ_CREATE works and how the actual DEVICE_OBJECT and file contents are found in that case, but once the IRP_MJ_CREATE has successfully completed and the FILE_OBJECT is initialized, subsequent requests for that FILE_OBJECT take a different path through the system.
So first let's look at the FILE_OBJECT structure and its fields (this is x86 Win7):

0: kd> dt nt!_FILE_OBJECT
   +0x000 Type             : Int2B
   +0x002 Size             : Int2B
   +0x004 DeviceObject     : Ptr32 _DEVICE_OBJECT
   +0x008 Vpb              : Ptr32 _VPB
   +0x00c FsContext        : Ptr32 Void
   +0x010 FsContext2       : Ptr32 Void
   +0x014 SectionObjectPointer : Ptr32 _SECTION_OBJECT_POINTERS
   +0x018 PrivateCacheMap  : Ptr32 Void
   +0x01c FinalStatus      : Int4B
   +0x020 RelatedFileObject : Ptr32 _FILE_OBJECT
   +0x024 LockOperation    : UChar
   +0x025 DeletePending    : UChar
   +0x026 ReadAccess       : UChar
   +0x027 WriteAccess      : UChar
   +0x028 DeleteAccess     : UChar
   +0x029 SharedRead       : UChar
   +0x02a SharedWrite      : UChar
   +0x02b SharedDelete     : UChar
   +0x02c Flags            : Uint4B
   +0x030 FileName         : _UNICODE_STRING
   +0x038 CurrentByteOffset : _LARGE_INTEGER
   +0x040 Waiters          : Uint4B
   +0x044 Busy             : Uint4B
   +0x048 LastLock         : Ptr32 Void
   +0x04c Lock             : _KEVENT
   +0x05c Event            : _KEVENT
   +0x06c CompletionContext : Ptr32 _IO_COMPLETION_CONTEXT
   +0x070 IrpListLock      : Uint4B
   +0x074 IrpList          : _LIST_ENTRY
   +0x07c FileObjectExtension : Ptr32 Void

As you can see the FILE_OBJECT has a DeviceObject structure that (at least according to the name) should what we're looking for. This is actually true for FILE_OBJECTs for a direct device open but is not really the case for FILE_OBJECTs that represent files on a file system (which is the main focus of this blog). Still, if you have one of those FILE_OBJECTs for a direct device open then the proper way to get the target DEVICE_OBJECT is by calling IoGetAttachedDevice() or IoGetAttachedDeviceReference() on the FILE_OBJECT->DeviceObject member. Because DEVICE_OBJECTs can be stacked in NT it is important to note that both these functions walk the stack of devices and return the topmost one.
What about FILE_OBJECTs that represent files on file system ? In that case the FILE_OBJECT->DeviceObject member actually points to the storage DEVICE_OBJECT and not the filesystem DEVICE_OBJECT, which is where the request should go. So in that case the IO manager uses the FILE_OBJECT->Vpb->DeviceObject to get the file system device object that is mounted on the actual storage device. Once the file system device is found IoGetAttachedDevice() should be called to get the topmost DEVICE_OBJECT.
Things aren't that complicated so far, but there is yet another twist. If the FILE_OBJECT was opened with a device hint (IoCreateFileSpecifyDeviceObjectHint() or IoCreateFileEx() or any of the FltCreateFile functions with an Instance parameter that is not NULL) then the IO manager remembers the DEVICE_OBJECT hint in the FILE_OBJECT and it uses that as the target DEVICE_OBJECT.
Fortunately, all this logic is hidden inside the IoGetRelatedDeviceObject() so developers don't really need to implement it. It is useful, however, to know how this works for debugging things.
I think it is worth mentioning is that inside IoGetRelatedDeviceObject() the IO manager also checks that the hint DEVICE_OBJECT (if the FILE_OBJECT was opened in that way) is actually attached to the mounted file system stack for that volume. This check is obviously unnecessary when there is no hint because the IO manager returns the mounted file system stack.
There is yet another function that does something pretty similar to this, IoGetBaseFileSystemDeviceObject(). This function is undocumented but what it does is very similar to what IoGetRelatedDeviceObject() except that it doesn't return the topmost DEVICE_OBJECT on the file system stack, but rather the lowest one. Should you need this functionality and don't want to use undocumented functions, the same thing can be achieved by calling IoGetRelatedDeviceObject() followed by IoGetDeviceAttachmentBaseRef().
Finally, when working with these function always pay attention to whether the DEVICE_OBJECTs you get are referenced or not. The documentation isn't always clear and so the debugger can really come in handy (just look at the PointerCount on the DEVICE_OBJECT before and after you call the function). Anyway, from the ones I mentioned here, IoGetAttachedDevice(), IoGetRelatedDeviceObject() and IoGetBaseFileSystemDeviceObject() don't return a referenced DEVICE_OBJECT, while IoGetAttachedDeviceReference() and IoGetDeviceAttachmentBaseRef() do.

Smart Reading and Storing of Files

2012-04-26T10:57:00.001-07:00

I often run into tools that move files around (back-up solutions or cloud storage solutions or file copy utilities) that don't support some core NTFS features like alternate data streams. This lack of support for a very useful feature (just think about how many different file formats are out there that provide little value besides adding support for storing metadata along with the main data stream) is very frustrating.

Anyway, from looking at various applications that don't support this I see two main reasons for it. One is that they're multi-platform apps that only support the common feature set between all the platforms (which makes sense in some cases but doesn't really make much sense in other cases). The other reason is that I suspect people just don't want to have to deal with enumerating alternate data streams and then coming up with a mechanism to serialize them into a single file. On the other hand, most such solutions must read other file metadata and preserve it, such as file attributes, timestamps and so on, which I suspect they do on their own, using various Windows APIs that provide that functionality. Smarter solutions need to also deal with sparse files (it would be pretty stupid for a cloud storage solution to ignore the file system information that a large chunk of a file is all 0s and instead use bandwidth to transfer those) and possibly even hardlinks and reparse points.

Fortunately, this is simpler that it sounds, at least on Windows :). There is a set of Windows APIs that operate on a stream of data that describes a file complete with alternate data streams information and sparse blocks information and security information and so on. This stream is formatted in a way that allows the caller to understand what type of data each part of the stream represents. The API set even allows skipping certain types of file data that is not relevant to the caller. The APIs I'm referring to are referred to as the Backup APIs:

There is also the WIN32_STREAM_ID structure, which describes the information that follows the structure in the stream and allows for figuring out whether the stream is interesting for the caller, how long it is and so on.

These APIs allow the caller to read the file contents and metadata as a long stream of bytes, skip through the stream and write all the information back as one stream of bytes. Moreover, since the information is formatted it's also possible that when reading the data only a certain type of data is read. For example, let's discuss a solution that archives data to some cloud storage and then wants to read the data into a file on an OS different than Windows. It's quite easy to keep track in some database of the information of where the main data stream begins and how long it is so that it can only download that information for that platform.

There is quite a bit of documentation on these APIs and on what a backup solution should do with them and so on. There are some documents under the [MS-BKUP]: Microsoft NT Backup File Structure page and even a basic sample, Creating a Backup Application.

So please, next time you run across a solution that doesn't handle sparse files or alternate data streams or some other such features, feel free to point the developers to these APIs :)

The Standby List and Storage Overprovisioning

2012-04-19T10:37:00.000-07:00

This post is about an interesting issue I spent quite a bit of time debugging. As is often the case with very complex system, I knew most of the bits of information related to the issue but I didn't quite put everything together and so this scenario still surprised me.
It all started with me playing with file sizes and directory entries and so I was copying a large number of files to a VHD. Since I didn't have a lot of space for the VHD in my VM I decided to make all the files sparse so that they won't take any space on the VHD, which was rather small (2GB). I created about 20GB worth of sparse files on it and all was well. I've actually been using this setup for a while but when doing this in a Win8 VM I quickly ran into problems. The VHD ran out of free space. I knew there was no way for that to happen since all my files were sparse files and I didn't expect that I suddenly had that many directory entries that the file system metadata actually used most of the 2GB of the VHD. So figured this would be an interesting investigation.
The first thing I noticed was that if I dismounted the VHD and mounted it again the VHD look pretty much the way I expected: all the files were there and there were roughly 2GB worth of free space. So it seems it was a transient situation. I spent a bit of time looking at various NTFS counters (fsutil is a good tool for that), looking at ProcMon logs and poking the file system in various ways, but I couldn't find anything. I was about to embark on the next step, which was to try to figure out which blocks on the volume are owned by which file in the file system so that I can see why those files aren't sparse, but I was lucky enough to discover by chance that exactly the same behavior happens on a Win7 machine when running Microsoft Security Essentials. This was quite helpful because I stopped suspecting there was some new behavior in NTFS in Win8 and instead I could focus on Microsoft Security Essentials (MSE). Other AV products I had running in my VMs didn't seem to have the same effect so this was particular to MSE. One thing I knew that was rather unique to MSE (at least it was a some point in the past) was the fact that MSE uses mapped files (also known as sections in the NT world) to read file data so I started wondering if that had anything to do with it.
So using fsutil I created a new 1GB file and made it sparse. Then I opened it with FileTest, created a file mapping and then mapped a view for the whole file. Guess what: NTFS reserved space for the range I was reading (naturally I didn’t expect MSE to change the files it was scanning so I was just reading the files). This is necessary because in case something writes to the file using the section NTFS must be able to save that information to disk. When working with mapped files NTFS can't know in advance what data will be written (if any) and so it does a lot of preparation to be able to accommodate the scenario where everything is written. So it was pretty clear what was going on, the fact that MSE created sections for my sparse files made NTFS reserve blocks for the files. The one last thing I had to figure out was why MSE held on to the sections for so long. My files were pretty small (less than 1 MB on average) and so it took quite a lot of them to get NTFS to run out of free space. Initially I suspected a section leak in MSE, but while I was playing with FileTest I noticed that even when I closed FileTest (and so I could know for sure that all the file handles and memory mapping handles and mapped views and so on were released) the blocks still weren't returned to the free space pool. And at this point it hit me that it must have been MM that kept the section open and indeed using RamMap I could see that was the case.
Here is a quick recap about what the standby list is. When a file is used for memory mapped IO, when the pages are no longer used (the view is unmapped or the section or the file handle are closed and even when the whole process is terminated) the pages that are backed by the file are moved to the standby list. They will be moved out of the standby list to either the free list or the zero list (depending on whether there is memory pressure in the system and who's asking for what kind of pages) or they will be reused if the same file is used for memory mapped or cached IO. This last behavior is pretty much a file cache (not to be confused with the cache manager which has quite a different role). In my case since there was no memory pressure the pages would remain on the standby list for quite a while and so NTFS would not see the section being closed and so it kept the reserved blocks.
Please note that this is not unique to sparse files, any form of files that are overprovisioned (such as compressed files) have the same semantics. So it is quite easy to run out of space on a volume where the total logical size of all the files exceeds the volume's capacity even without writing anything to the volume.
Now, since this is a file systems and filters development blog I should mention that if you are a file system or a filter and you do any work with compressed files or sparse files or some such, you can actually tell MM to close a section using MmForceSectionClosed().
Update: I wanted to add some steps on how to reproduce this problem, in case you're interested.

Create an empty 1GB file:

C:\>fsutil file createnew C:\TestFile.bin 1000000000
File C:\TestFile.bin is created

Make the file sparse:

C:\>fsutil sparse setflag C:\TestFile.bin
C:\>fsutil sparse setrange C:\TestFile.bin 0 1000000000

open the file in FileTest.exe. Make sure to request GENERIC_WRITE access:
create a read-only file mapping, map a view and read the whole file:
unmap the view, close the section handle and then the file handle and then close FileTest.exe (we could have closed it directly as well).

you now have 1 GB less free space on C:

C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 10FA-5C1D

 Directory of C:\

06/10/2009  02:42 PM                24 autoexec.bat
06/10/2009  02:42 PM                10 config.sys
03/02/2010  06:31 PM    >DIR>          Far
07/13/2009  07:37 PM    >DIR>          PerfLogs
08/11/2011  11:13 AM    >DIR>          Perl
11/05/2010  09:34 AM    >DIR>          Program Files
04/20/2012  10:12 AM     1,000,000,000 TestFile.bin
11/04/2009  12:57 PM    >DIR>          Users
11/05/2010  09:40 AM    >DIR>          Windows
               3 File(s)  1,000,000,034 bytes
               6 Dir(s)  39,859,396,608 bytes free

Use RamMap and see all the pages for the file on the standby list:
Empty the standby list (click on Empty->Empty Standby List).

Finally check the free space on C: again:

C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 10FA-5C1D

 Directory of C:\

06/10/2009  02:42 PM                24 autoexec.bat
06/10/2009  02:42 PM                10 config.sys
03/02/2010  06:31 PM    >DIR>          Far
07/13/2009  07:37 PM    >DIR>          PerfLogs
08/11/2011  11:13 AM    >DIR>          Perl
11/05/2010  09:34 AM    >DIR>          Program Files
04/20/2012  10:12 AM     1,000,000,000 TestFile.bin
11/04/2009  12:57 PM    >DIR>          Users
11/05/2010  09:40 AM    >DIR>          Windows
               3 File(s)  1,000,000,034 bytes
               6 Dir(s)  41,361,416,192 bytes free