Thursday, July 28, 2011

Rolling your Own IO in Minifilters

The topic of how to issue an "IRP" in a minifilter does come up occasionally. In general this is required when there is no Flt API to perform a certain operation (for example, the well-known case of FltQueryDirectoryFile(), which isn't available on XP) or when the Flt API does not provide some feature that the caller needs (see my example in a previous post about FltSetInformationFile not issuing a create with FILE_SHARE_DELETE). There are two main articles that I believe any file system filter writer should be familiar with: OSR's "Rolling Your Own - Building IRPs to Perform I/O" and Microsoft's own document about IO in minifilters, "Minifilter Generated I/O".

As I've said before, a minifilter can be either in the context of an IO operation or not. When the minifilter is not in the context of an IO operation, or when it wants to send a request to a different device than the one it's currently processing IO on, it can usually issue an IRP directly. This is rather rare, though, so if you need to issue an actual IRP from a minifilter, make sure there isn't something wrong with your design; layering problems have a nasty habit of not showing up until interop testing with other minifilters, which doesn't usually happen early on. In other words, a minifilter only needs to issue IO using the FltMgr framework in the same circumstances in which it would call a FltXxx routine, which I've discussed in an earlier post. There isn't anything special about issuing an IRP from a minifilter, so I'll focus only on issuing FLT_CALLBACK_DATA-based IO.

The steps involved in issuing a FLT_CALLBACK_DATA are pretty straightforward:

  1. Allocate a FLT_CALLBACK_DATA structure by calling FltAllocateCallbackData().
  2. Initialize the FLT_CALLBACK_DATA structure for your IO. This primarily requires setting up the FLT_CALLBACK_DATA->Iopb structure. Don't forget to set up the MajorFunction and MinorFunction members.
  3. Send the request down to the filters below using FltPerformSynchronousIo() or FltPerformAsynchronousIo().
  4. Either free the FLT_CALLBACK_DATA (FltFreeCallbackData()) or reuse it (FltReuseCallbackData()), depending on whether you need to issue additional IO or not.
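The ownership rules behind steps 3 and 4 can be illustrated with a small user-mode model in plain C. This is not FltMgr code: `fake_request`, `perform_async()` and `on_complete()` are hypothetical stand-ins that only mimic the contract described in this post, namely that the asynchronous completion routine runs exactly once even when the request fails before reaching a lower layer, which makes it a natural place to free the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a FLT_CALLBACK_DATA; not the real structure. */
typedef struct fake_request {
    long status;                 /* plays the role of IoStatus.Status */
} fake_request;

/* Records what the completion routine observed. */
typedef struct completion_state {
    int calls;
    long last_status;
} completion_state;

typedef void (*completion_fn)(fake_request *req, void *context);

/* Step 1: models FltAllocateCallbackData(). */
static fake_request *alloc_request(void) {
    return calloc(1, sizeof(fake_request));
}

/* Step 3 (async): models the guarantee that the completion routine runs
 * exactly once, whether the request fails early (in FltMgr) or is
 * completed by a lower layer. Negative status means failure. */
static void perform_async(fake_request *req, bool fail_early,
                          completion_fn completion, void *context) {
    req->status = fail_early ? -1 : 0;
    completion(req, context);    /* called on every path, exactly once */
}

/* Step 4: the completion routine owns the request and frees it, so the
 * issuer must not touch `req` after calling perform_async(). */
static void on_complete(fake_request *req, void *context) {
    completion_state *state = context;
    state->calls++;
    assert(state->calls == 1);   /* one completion per request */
    state->last_status = req->status;
    free(req);                   /* models FltFreeCallbackData() */
}
```

For synchronous IO the issuer simply frees (or reuses) the request after the call returns, as the example below does.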

Here is an example of what the code looks like to issue your own call similar to FltQueryDirectoryFile() (which is the code I have in my minifilters that need to run on XP):

//  Roughly equivalent to FltQueryDirectoryFile(), for minifilters that must
//  also run on XP. The function name is just an example.
NTSTATUS
MyQueryDirectoryFile (
    __in PFLT_INSTANCE  Instance,
    __in PFILE_OBJECT  FileObject,
    __out PVOID FileInformation,
    __in ULONG Length,
    __in FILE_INFORMATION_CLASS  FileInformationClass,
    __in BOOLEAN  ReturnSingleEntry,
    __in_opt PUNICODE_STRING  FileName,
    __in BOOLEAN  RestartScan,
    __out_opt PULONG  LengthReturned
    )
{
    PFLT_CALLBACK_DATA callbackData = NULL;
    PFLT_PARAMETERS params;
    NTSTATUS status;

    //  Step 1: allocate a FLT_CALLBACK_DATA for this instance and file object.
    status = FltAllocateCallbackData( Instance,
                                      FileObject,
                                      &callbackData );
    if (!NT_SUCCESS(status)) {

        return status;
    }

    //  Step 2: describe the operation: major/minor code, flags, parameters.
    callbackData->Iopb->MajorFunction = IRP_MJ_DIRECTORY_CONTROL;
    callbackData->Iopb->MinorFunction = IRP_MN_QUERY_DIRECTORY;

    if (RestartScan) {

        SetFlag( callbackData->Iopb->OperationFlags, SL_RESTART_SCAN );
    }

    if (ReturnSingleEntry) {

        SetFlag( callbackData->Iopb->OperationFlags, SL_RETURN_SINGLE_ENTRY );
    }

    params = &callbackData->Iopb->Parameters;
    params->DirectoryControl.QueryDirectory.Length = Length;
    params->DirectoryControl.QueryDirectory.FileName = FileName;
    params->DirectoryControl.QueryDirectory.FileInformationClass = FileInformationClass;
    params->DirectoryControl.QueryDirectory.FileIndex = 0;
    params->DirectoryControl.QueryDirectory.DirectoryBuffer = FileInformation;
    params->DirectoryControl.QueryDirectory.MdlAddress = NULL;

    //  Step 3: send it to the filters below this instance and wait for it.
    FltPerformSynchronousIo( callbackData );

    //  FltPerformSynchronousIo() returns nothing; the result is in IoStatus.
    status = callbackData->IoStatus.Status;

    if (LengthReturned != NULL) {

        *LengthReturned = (ULONG)(callbackData->IoStatus.Information);
    }

    //  Step 4: this request won't be reused, so free it.
    FltFreeCallbackData( callbackData );

    return status;
}

The MSDN documentation for these APIs is pretty thorough, and combined with the PowerPoint presentation from MS I think it covers the subject pretty well. However, there are some things I'd like to emphasize:

  • You simply can't issue an IRP_MJ_CREATE this way; use FltCreateFile(), FltCreateFileEx() or FltCreateFileEx2() instead.
  • Both FltPerformSynchronousIo() and FltPerformAsynchronousIo() set FLT_CALLBACK_DATA->IoStatus.Status to reflect the status of the operation. Unfortunately, it's impossible to tell from that status alone whether the request failed in FltMgr or whether it was actually sent down and failed in a lower layer. In the debugger, however, one can tell by looking at the IRP associated with the FLT_CALLBACK_DATA. A filter-allocated FLT_CALLBACK_DATA starts out without an associated IRP, so if the IRP is still NULL, that's an indication that the request failed in FltMgr. If the IRP is not NULL, it's possible to tell whether the IRP has completed and to see the status of the operation.
  • FltPerformAsynchronousIo() will ALWAYS call the asynchronous completion routine. Once a minifilter calls FltPerformAsynchronousIo(), it is guaranteed exactly one call to the async completion routine, no matter what. So don't assume that the async completion routine won't be called if the request fails in some way.
  • The best way that I've found to figure out how to initialize a FLT_CALLBACK_DATA structure for a certain operation is to filter that operation and see what a FLT_CALLBACK_DATA generated by the FltMgr for an existing IRP looks like.
  • For a call to FltPerformAsynchronousIo() that returned STATUS_PENDING, make sure to not free the FLT_CALLBACK_DATA until the IO actually completes. In fact, a good strategy is to call FltFreeCallbackData() from the async completion routine, which is guaranteed to be called only after the IO is complete.
  • Preallocating requests could be pretty useful in the event that FltAllocateCallbackData() or some other FltXxx API fails because of low system resources and the filter wants to guarantee forward progress (for more discussion on forward progress in general see this page and the RamDisk WDK sample). However, since FltAllocateCallbackData() doesn't allocate the IRP associated with the FLT_CALLBACK_DATA structure, even if a filter preallocates some FLT_CALLBACK_DATA structures for forward progress, the calls to FltPerformSynchronousIo() and FltPerformAsynchronousIo() might still fail when trying to allocate the IRP. This is why in Win7 FltMgr introduced FltAllocateCallbackDataEx(), which allows a minifilter to preallocate a FLT_CALLBACK_DATA together with all the memory it will need, thus enabling forward progress in a low-memory situation (see the explanation for the FLT_ALLOCATE_CALLBACK_DATA_PREALLOCATE_ALL_MEMORY flag).
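Why preallocating only the FLT_CALLBACK_DATA is not enough can be shown with a toy allocator. This is a user-mode model, not FltMgr code: `pool.budget` stands in for available system memory, and the `prealloc_all` argument loosely mirrors what the FLT_ALLOCATE_CALLBACK_DATA_PREALLOCATE_ALL_MEMORY flag promises. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy memory budget standing in for available system memory. */
typedef struct pool {
    int budget;
} pool;

static bool pool_take(pool *p) {
    if (p->budget <= 0) {
        return false;
    }
    p->budget--;
    return true;
}

/* A preallocated request: the callback data itself, plus (optionally)
 * the IRP that would otherwise be allocated when the request is sent. */
typedef struct reserved_request {
    bool have_callback_data;
    bool have_irp;
} reserved_request;

/* Reserve while memory is plentiful (for example at filter load time). */
static bool reserve(pool *p, reserved_request *r, bool prealloc_all) {
    r->have_callback_data = pool_take(p);
    r->have_irp = r->have_callback_data && prealloc_all && pool_take(p);
    return r->have_callback_data && (!prealloc_all || r->have_irp);
}

/* Sending needs an IRP; if none was reserved it must be allocated now,
 * which is exactly the step that can fail under memory pressure. */
static bool send_request(pool *p, reserved_request *r) {
    assert(r->have_callback_data);
    if (r->have_irp) {
        return true;
    }
    return pool_take(p);
}
```

Once the budget is exhausted, only the request that reserved everything up front can still be sent.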

Thursday, July 21, 2011

Using IoRegisterFsRegistrationChangeMountAware

I've already talked about how file system filters attach to volumes. However, there is one thing I didn't mention in the context of that discussion. There is a race that can happen in that path that can have unpleasant side-effects for filters.

I'll start with the steps involved in a legacy filter attaching to a volume. This is discussed in more detail in a couple of other posts on this blog, so I'll skip over some steps and focus only on the ones that are relevant to the problem at hand:

  1. Legacy filter calls IoRegisterFsRegistrationChange().
  2. The notification callback gets called for a file system (initially all the registered ones and then new ones when they register).
  3. In the notification callback the legacy filter attaches to the file system control device objects so that it can receive the IRP_MJ_FILE_SYSTEM_CONTROL with the IRP_MN_MOUNT_VOLUME minor code request whenever the file system is asked to mount a new volume (and thus the filter will be notified of all new mounted volumes).
  4. Also in the notification callback the legacy filter walks over the list of devices that the file system has already created, which are all file system volume device objects (VDOs) for all the volumes mounted by that file system, and attaches to each of them (this takes care of the existing volumes).

Now let's take a look at the steps that the IO manager takes when trying to mount a volume (again these are just the relevant ones to the problem):

  1. Start at the head of the list of registered file systems and get a reference to the CDO for that file system.
  2. Prepare an IRP_MJ_FILE_SYSTEM_CONTROL with the IRP_MN_MOUNT_VOLUME minor code IRP that will be sent to that CDO.
  3. Send the IRP and wait for it to complete.
  4. If the file system couldn't mount the volume then get the next entry in the list of registered file systems and reference the CDO for that file system and then go to step 2.

It is important to note that both the list of registered file systems and the list of drivers to be notified about the arrival of new file systems (drivers that called IoRegisterFsRegistrationChange()) are protected by the same lock. Incidentally, this lock (though private to the OS) is available in the debugger as nt!IopDatabaseResource:

0: kd> x nt!IopDatabaseResource
8299e860 nt!IopDatabaseResource = <no type information>
0: kd> !locks 8299e860 

Resource @ nt!IopDatabaseResource (0x8299e860)    Available
1 total locks

In step 3 of the volume mount path, where the IRP is sent to the CDO, the IO manager releases IopDatabaseResource before sending the IRP down to the file system and reacquires it once the IRP completes. After all, holding a lock across a call into a driver is to be avoided whenever possible. However, this opens a very small window in which things can go wrong. Suppose a volume mount is in progress: the IO manager has prepared the mount IRP and has just released IopDatabaseResource, and then the thread is preempted. On a different thread a filter calls IoRegisterFsRegistrationChange(), gets the list of registered file systems, attaches a device to the CDO of each file system and then enumerates all the VDOs. The filter will completely miss the volume that is about to be mounted, for two reasons. First, there is no VDO for it yet, since the IRP_MN_MOUNT_VOLUME IRP hasn't reached the file system and so no VDO has been created. Second, the device the filter has attached to the CDO won't see the IRP_MN_MOUNT_VOLUME request either, because when the IO manager referenced the top device for the CDO the filter wasn't there yet, so the IRP goes to the device right below the filter.

The result of all this is that the filter will completely miss a mounted volume and will not attach to it. Since this requires a filter to call IoRegisterFsRegistrationChange() exactly while a volume is being mounted, the window is very narrow. It can be avoided by using IoRegisterFsRegistrationChangeMountAware() (passing TRUE for its SynchronizeWithMounts parameter) instead of IoRegisterFsRegistrationChange(), in which case the IO manager synchronizes volume mounts with the registration.
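The interleaving described above can be replayed deterministically in a small user-mode model. Nothing here is the real IO manager: the "stacks" are just counters and flags, and every name is hypothetical. The `mount_aware` flag models the effect attributed to IoRegisterFsRegistrationChangeMountAware() in this post, namely that registration cannot run between the moment the mount path picks the top device of the CDO stack and the moment the file system creates the VDO.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one file system: its CDO stack and the volumes it mounted. */
typedef struct fs_model {
    bool filter_on_cdo;   /* legacy filter attached on top of the CDO? */
    int vdo_count;        /* VDOs created by the file system */
    int filtered_vdos;    /* VDOs the legacy filter is attached to */
} fs_model;

/* The filter's notification callback: attach to the CDO, then walk the
 * VDOs that already exist and attach to each one. */
static void filter_register(fs_model *fs) {
    fs->filter_on_cdo = true;
    fs->filtered_vdos = fs->vdo_count;
}

/* The mount IRP reaches the device the IO manager captured earlier. Only
 * if that device was the filter does the filter see the mount and attach
 * to the new VDO. */
static void deliver_mount(fs_model *fs, bool captured_filter) {
    fs->vdo_count++;
    if (captured_filter) {
        fs->filtered_vdos++;
    }
}

/* Replays one volume mount racing with one registration. Returns true if
 * the filter ends up not attached to some mounted volume. */
static bool volume_missed(bool mount_aware) {
    fs_model fs = { false, 1, 0 };            /* one volume already mounted */
    bool captured_filter = fs.filter_on_cdo;  /* mount path picks top device */

    if (mount_aware) {
        /* Registration is serialized with the mount, so it runs after the
         * mount completes and finds the new VDO in the list. */
        deliver_mount(&fs, captured_filter);
        filter_register(&fs);
    } else {
        /* IopDatabaseResource was released; registration slips in here. */
        filter_register(&fs);
        deliver_mount(&fs, captured_filter);
    }
    assert(fs.filtered_vdos <= fs.vdo_count);
    return fs.filtered_vdos < fs.vdo_count;
}
```

In the racy ordering the filter attaches to the one pre-existing VDO but never sees the second one; with the serialized ordering it ends up attached to both.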

Of course, this discussion is really only relevant to legacy filters; minifilters don't have to deal with any of this since they never register with the IO manager directly.

Thursday, July 14, 2011

More on Instances and Volumes

I've recently been playing some with instances and I've come across a couple of things that I wanted to share. Prerequisites for this discussion are my old posts on the FLT_INSTANCE structure and the FLT_VOLUME structure.

In Filter Manager terms, a volume is an attachment of FltMgr to the file system stack on a volume. The volume maps to a FltMgr DEVICE_OBJECT attached to a file system VDO. In most cases, where there are no legacy filters on a system, the volume represents the whole IO stack between the IO manager and the file system. However, when legacy filters are present, multiple volumes can be attached to each file system stack (my FLT_VOLUME post has a picture of this layout).

An interesting thing to note is the way FltMgr attaches to a file system stack. The simplified view is that FltMgr attaches a frame between each pair of legacy filters, but that picture is inaccurate in a couple of ways. First, a legacy filter can attach to only some volumes, which means that on other volumes there might be no legacy filter at all. Nevertheless, for consistency reasons, FltMgr still attaches a DEVICE_OBJECT even on volumes where no other legacy filter has a DEVICE_OBJECT. Second, since there is no mechanism to learn when a device is attached to a device stack, FltMgr can't know when a legacy filter attaches to a certain device stack, which prevents it from attaching immediately on top of each legacy filter. So FltMgr only looks at the file system stack and tries to attach when a minifilter is loaded.

At that time FltMgr tries to figure out which frame the minifilter should belong to, depending on its altitude (and in case you were wondering, the altitude comes from the default instance, which is why FltRegisterFilter() might fail with STATUS_OBJECT_NAME_NOT_FOUND if there is no default instance specified in the INF file). If no existing frame covers the minifilter's altitude (and since Frame 0 starts at altitude 0, this usually happens when the altitude of the new minifilter is higher than the altitude of the highest frame), FltMgr checks whether the top frame has any legacy filter attached on top of it. If not, it simply increases the highest altitude of that frame and loads the minifilter there. However, if a legacy filter has attached on top of the top frame, then on Vista and newer OSes FltMgr tries to figure out the altitude of that legacy filter based on its Group (as in LoadOrderGroup) and grows the top of the highest frame (increases its altitude) up to the altitude associated with that Group. Incidentally, this is another good reason for legacy filters to use the appropriate Group: this way they can benefit to some extent from the layering guaranteed by FltMgr. Anyway, if the minifilter's altitude is still higher than the altitude of the top frame even after the frame was extended (again, the extension only happens on Vista and newer OSes; on XP the altitude of the frame is not increased), then a new frame is needed, so FltMgr allocates a new frame and attaches a new set of DEVICE_OBJECTs to each stack. This can have a couple of implications:

  • There can be multiple legacy filters directly on top of each other, if no minifilter was loaded between the time when the first legacy filter was attached and the time when the second legacy filter was attached.
  • There can be some volumes on which FltMgr DEVICE_OBJECTs sit directly on top of each other. This should have no impact on minifilter developers, but it might surprise someone looking at the stack in the debugger. This is actually quite common and perfectly fine.
  • In extreme cases, it's possible that on one volume a legacy filter is attached above a certain frame while on a different volume it is attached below that frame. I've never seen this happen, but I can imagine it would if the legacy filter attaches to volumes late (for example, when a user mode app requests attachment) or if the attachment happens to race with a minifilter loading.
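The frame placement rules described above can be condensed into a small decision function. This is a simplified model under stated assumptions, not FltMgr's implementation: real altitudes are decimal strings while this sketch uses unsigned integers, frames are assumed to cover contiguous altitude ranges, and `legacy_group_altitude` is a hypothetical field standing in for the altitude FltMgr associates with a legacy filter's Group.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAMES 8

/* Simplified frame: an altitude range, whether a legacy filter sits on
 * top of it, and the altitude implied by that filter's Group (0 if none). */
typedef struct frame {
    unsigned low;
    unsigned high;
    bool legacy_on_top;
    unsigned legacy_group_altitude;
} frame;

typedef struct frame_list {
    frame frames[MAX_FRAMES];
    int count;                   /* frames[0] is Frame 0, starting at 0 */
} frame_list;

/* Returns the index of the frame a minifilter with `altitude` ends up in,
 * growing or creating frames along the way. */
static int place_minifilter(frame_list *fl, unsigned altitude,
                            bool vista_or_later) {
    for (int i = 0; i < fl->count; i++) {
        if (altitude >= fl->frames[i].low && altitude <= fl->frames[i].high) {
            return i;            /* fits in an existing frame */
        }
    }
    frame *top = &fl->frames[fl->count - 1];
    if (!top->legacy_on_top) {
        top->high = altitude;    /* nothing above: just grow the top frame */
        return fl->count - 1;
    }
    if (vista_or_later && top->legacy_group_altitude > top->high) {
        top->high = top->legacy_group_altitude;  /* grow up to the Group */
        if (altitude <= top->high) {
            return fl->count - 1;
        }
    }
    assert(fl->count < MAX_FRAMES);
    frame *next = &fl->frames[fl->count];        /* a new frame is needed */
    next->low = top->high + 1;
    next->high = altitude;
    next->legacy_on_top = false;
    next->legacy_group_altitude = 0;
    return fl->count++;
}
```

On XP (`vista_or_later == false`) a legacy filter on top of the highest frame always forces a new frame for any higher altitude.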

An instance is an attachment of a filter to a certain file system volume. The notable thing about this is that an instance can be attached at multiple altitudes on the same volume. The altitudes at which an instance can attach are limited, however, by the altitude range of the frame. In other words, once a filter is loaded it is associated with a frame and it can only create instances at altitudes within that frame. Why would a filter create multiple instances on the same volume? One good reason for that is to test that the filter can attach above itself, which is a good way to test that the design is safe and it doesn't violate any layering rules. Another reason might be to analyze the behavior of a specific filter. In this case one might attach logging instances above and below it.

One decision a file system filter developer must make pretty early on is whether the filter should attach to volumes automatically or whether it needs manual attachment. For manual attachment a minifilter can use the FltAttachVolume() and FltAttachVolumeAtAltitude() functions, but surprisingly these functions lack a context parameter. Looking at the PFLT_INSTANCE_SETUP_CALLBACK callback, we can see that no context parameter is passed there either (indicating that this is a design decision rather than a bug in the APIs). This can be problematic for filters that behave differently depending on which volume they are attached to. For example, imagine a filter that implements some form of file-level redundancy by duplicating some of the operations that happen on a file on one volume to a file on another volume. This implies that when the filter starts working it might need to be attached to both volumes, and it needs to know which is the source instance and which is the destination instance. One possible workaround is to use an instance context for each instance that contains information about the role of the instance. This way a filter can call FltAttachVolume() or FltAttachVolumeAtAltitude() and, if the call succeeds, use the pointer to the new instance to call FltSetInstanceContext() on that instance and inform the instance of the role it must perform. This is a rather unusual mechanism (passing a context to a callback is by far the more prevalent model in Windows), and the only reason I can think of for this design is FilterAttach() and FilterAttachAtAltitude(), for which passing in a context is not possible (passing a pointer between user mode and kernel mode is not a good idea).
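The role-via-instance-context workaround can be sketched as a toy model in plain C. The types and functions below (`toy_instance`, `role_context`, `set_role()`, `handle_write()`) are hypothetical: `set_role()` stands in for allocating a context and calling FltSetInstanceContext() after a successful FltAttachVolume(), and `handle_write()` stands in for a write callback that consults the instance context.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_instance toy_instance;

/* Hypothetical roles for the file-level redundancy example. */
typedef enum instance_role {
    ROLE_UNKNOWN = 0,
    ROLE_SOURCE,
    ROLE_DESTINATION
} instance_role;

/* What the filter would store as its instance context. */
typedef struct role_context {
    instance_role role;
    toy_instance *peer;          /* the other half of the pair */
} role_context;

struct toy_instance {
    const char *volume;
    role_context *context;       /* NULL until the filter sets it */
    int writes_seen;
};

/* Models attach: note there is no way to pass a context in. */
static void toy_attach(toy_instance *inst, const char *volume) {
    inst->volume = volume;
    inst->context = NULL;
    inst->writes_seen = 0;
}

/* After a successful attach the filter tags the instance with its role. */
static void set_role(toy_instance *inst, role_context *ctx,
                     instance_role role, toy_instance *peer) {
    ctx->role = role;
    ctx->peer = peer;
    inst->context = ctx;
}

/* Write path: a source instance duplicates the write to its peer; an
 * instance without a role yet just passes the write through. Returns the
 * number of writes performed. */
static int handle_write(toy_instance *inst) {
    inst->writes_seen++;
    if (inst->context != NULL && inst->context->role == ROLE_SOURCE) {
        assert(inst->context->peer != NULL);
        inst->context->peer->writes_seen++;
        return 2;
    }
    return 1;
}
```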

Finally, one last thing I'd like to point out is that there are two similar types of contexts, a volume context and an instance context. The vast majority of filters only have at most one instance per volume and so from a functional perspective they are equivalent. The instance context however is much faster to access because it is pretty much attached to the FLT_INSTANCE structure (so it's just a pointer deref) whereas the volume context is stored in some hash structure with the filter as the hash key so any lookup implies locking and walking the hash structure, which is much more costly.
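The cost difference comes from where each kind of context lives. The toy model below (plain C, hypothetical names, no locking) keeps the instance context as a field of the instance and keeps volume contexts in a small chained hash table keyed by the filter, as described above; FltMgr's real structures are private, and a real volume lookup would also take a lock around the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKETS 16

typedef struct ctx_entry {
    const void *filter;          /* hash key */
    void *context;
    struct ctx_entry *next;
} ctx_entry;

/* Volume: contexts for all filters, in a hash keyed by the filter. */
typedef struct toy_volume {
    ctx_entry *buckets[BUCKETS];
} toy_volume;

/* Instance: the context hangs directly off the instance. */
typedef struct toy_instance {
    void *context;
} toy_instance;

static void *get_instance_context(const toy_instance *inst) {
    return inst->context;        /* a single pointer dereference */
}

static size_t bucket_of(const void *filter) {
    return ((uintptr_t)filter >> 4) % BUCKETS;
}

/* Hash the key, then walk the chain (the real thing would lock here). */
static void *get_volume_context(const toy_volume *vol, const void *filter) {
    for (const ctx_entry *e = vol->buckets[bucket_of(filter)];
         e != NULL; e = e->next) {
        if (e->filter == filter) {
            return e->context;
        }
    }
    return NULL;
}

static void set_volume_context(toy_volume *vol, ctx_entry *e,
                               const void *filter, void *context) {
    size_t b = bucket_of(filter);
    assert(get_volume_context(vol, filter) == NULL);  /* one per filter */
    e->filter = filter;
    e->context = context;
    e->next = vol->buckets[b];
    vol->buckets[b] = e;
}
```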

So the couple of ideas that are important to remember from this post:

  • All filters must have a default instance, otherwise FltRegisterFilter() will fail with STATUS_OBJECT_NAME_NOT_FOUND (which incidentally is not documented as a possible return value).
  • When testing interop with legacy filters, try to load the legacy filter both above and below your filter (that might not be necessary on Vista+ environments where the legacy filter uses a Group, which might guarantee a fixed position relative to your filter).
  • Always use instance contexts instead of volume contexts. There is almost no reason not to.
  • If you are writing or maintaining a legacy filter, please take the time to make sure that the Group in the INF file is set to the right value. It's a text-only change and it might save a lot of time in support costs...

Thursday, July 7, 2011

Opening Volume Handles in Minifilters

This should be a pretty straightforward topic, right? After all, FltMgr even provides a function for this, FltOpenVolume(). However, a recent post on OSR's NTFSD made me take a deeper look at this issue, and I found some interesting things. First, let me say up front that the poster's real problem was that he was issuing an FSCTL using FltDeviceIoControlFile() instead of FltFsControlFile(). However, his post was about FltCreateFile() failing, and looking at the code I couldn't figure out why, which is usually a good sign that I'm missing something and that I should investigate further.

Here is a small function that I've added to everyone's favorite WDK sample, Passthrough (please note the hardcoded path to E:):

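NTSTATUS MyOpenVolume(
    __in PCFLT_RELATED_OBJECTS FltObjects
    )
{
    NTSTATUS status;
    OBJECT_ATTRIBUTES objectAttributes;
    IO_STATUS_BLOCK ioStatus;
    HANDLE volumeHandle = NULL;
    UNICODE_STRING gVolumeRoot = RTL_CONSTANT_STRING(L"\\DosDevices\\E:");

    InitializeObjectAttributes( &objectAttributes, &gVolumeRoot,
                                OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE,
                                NULL, NULL );

    //  The original access mask was lost; this one (without
    //  FILE_WRITE_ATTRIBUTES) is an assumed example.
    status = FltCreateFile( gFilterHandle, FltObjects->Instance,
                            &volumeHandle,
                            FILE_READ_ATTRIBUTES | SYNCHRONIZE,
                            &objectAttributes, &ioStatus,
                            NULL, FILE_ATTRIBUTE_NORMAL,
                            FILE_SHARE_READ | FILE_SHARE_WRITE,
                            FILE_OPEN, FILE_SYNCHRONOUS_IO_NONALERT,
                            NULL, 0, 0 );

    if (volumeHandle != NULL) {

        FltClose( volumeHandle );
    }

    return status;
}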

I'm simply calling this function for each IRP_MJ_CREATE (in PtPreOperationPassThrough):

    if (Data->Iopb->MajorFunction == IRP_MJ_CREATE) {

        status = MyOpenVolume( FltObjects );
    }

So anyway, FltCreateFile() simply fails with STATUS_INVALID_PARAMETER. This was quite unexpected because I had used similar code before, just not in a minifilter (and looking through some old code I was able to confirm that). So I decided to see what would happen if I called ZwCreateFile() instead of FltCreateFile(). To my surprise, it worked using the exact same parameters (except, of course, for the Filter, Instance and Flags parameters). I was surprised that it worked because I was expecting an infinite loop: ZwCreateFile() doesn't target the IRP_MJ_CREATE at a specific instance, so it should go into my create handler again and again… My next step was to replace ZwCreateFile() with FltCreateFile() using a NULL instance instead of my instance, so that the request would also go to the top of the stack just like ZwCreateFile() would. But that also failed with STATUS_INVALID_PARAMETER, which was also pretty strange. So I decided to look at the handle I had just opened with ZwCreateFile() to see if I'd notice anything:

1: kd> !fileobj 935afbf0  

Device Object: 0x92f0da60   \Driver\volmgr
Vpb is NULL
Event signalled

Flags:  0x40800
 Direct Device Open
 Handle Created

CurrentByteOffset: 0

There are a couple of things that looked unusual. First, there is no FsContext or FsContext2, meaning they are NULL. Then, the Flags field has the "Direct Device Open" flag (FO_DIRECT_DEVICE_OPEN). Also, there is no FO_VOLUME_OPEN flag even though this should be a volume open. And finally, the VPB is NULL, even though the volume is mounted (this is not obvious from this FO, I just happen to know it's mounted). All this means that the handle I have is in fact a handle to the storage stack volume instead of the file system volume. This is an interesting NT behavior that I had forgotten about. The idea is that opening a volume using certain access rights will open the storage volume without triggering a mount of the file system. This is useful when a driver wants to talk to the actual volume and query some attributes or something of that nature without forcing the file system to be mounted. You can find out more about this behavior on the MSDN page "Common Driver Reliability Issues", if you scroll all the way to "Requests to Create and Open Files and Devices" and then look at the entry for "Relative Open Requests for Direct Device Open Handles". Please note that this is not the same as DASD IO.
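The access-mask rule can be written down as a small predicate. The mask of "harmless" rights below is an assumption on my part, modeled on ReactOS's reimplementation of IopParseDevice rather than on Windows documentation, so the real rule may differ in details; the constants carry their WDK header values so the block compiles on its own. It does agree with what this post observes: adding FILE_WRITE_ATTRIBUTES is enough to turn the request into a regular file system volume open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Access right values as defined in the WDK headers. */
#define FILE_READ_DATA          0x00000001u
#define FILE_READ_ATTRIBUTES    0x00000080u
#define FILE_WRITE_ATTRIBUTES   0x00000100u
#define READ_CONTROL            0x00020000u
#define WRITE_DAC               0x00040000u
#define WRITE_OWNER             0x00080000u
#define SYNCHRONIZE             0x00100000u
#define ACCESS_SYSTEM_SECURITY  0x01000000u

/* Rights that (by assumption) don't require mounting the file system. */
#define DIRECT_OPEN_ALLOWED_ACCESS (SYNCHRONIZE | FILE_READ_ATTRIBUTES | \
    READ_CONTROL | ACCESS_SYSTEM_SECURITY | WRITE_OWNER | WRITE_DAC)

/* Predicts whether opening a volume device yields a direct device open. */
static bool is_direct_device_open(uint32_t desired_access,
                                  bool has_remaining_name,
                                  bool has_related_file_object) {
    if (has_remaining_name || has_related_file_object) {
        return false;     /* e.g. \DosDevices\E:\foo is a normal file open */
    }
    return (desired_access & ~DIRECT_OPEN_ALLOWED_ACCESS) == 0;
}
```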

So I wanted to see what happens if I also request FILE_WRITE_ATTRIBUTES when calling FltCreateFile(), thus changing the semantics of the IRP_MJ_CREATE so that it is no longer a direct device open. This time around FltCreateFile() worked. Here is the FILE_OBJECT that got created:

1: kd> !fileobj 94126488  

Device Object: 0x92f0da60   \Driver\volmgr
Vpb: 0x92f09570
Event signalled

Flags:  0x440008
 No Intermediate Buffering
 Handle Created
 Volume Open

FsContext: 0x92fb2e18 FsContext2: 0xa3f08bf8
CurrentByteOffset: 0
Cache Data:
  Section Object Pointers: 9352d4f4
  Shared Cache Map: 00000000

File object extension is at 9305e2f0:

So this is a file system volume open, as we can see from the Volume Open flag (FO_VOLUME_OPEN). Also, FsContext, FsContext2 and the VPB are no longer NULL. Still, it's not clear why FltCreateFile() would return STATUS_INVALID_PARAMETER for a direct device open. Once again, tracing through IopParseDevice provides the answer:

1: kd> kn
 # ChildEBP RetAddr  
00 a157d5bc 82a77ff2 nt!IopCheckTopDeviceHint+0x5c
01 a157d698 82a5926b nt!IopParseDevice+0x81c
02 a157d714 82a7f2d9 nt!ObpLookupObjectName+0x4fa
03 a157d774 82a7762b nt!ObOpenObjectByName+0x165
04 a157d7f0 82aaee29 nt!IopCreateFile+0x673
05 a157d920 a0ede0e1 nt!IoCreateFileEx+0x9e
06 a157d994 a0ede1d4 PassThrough!MyOpenVolume+0xd1 [c:\temp3\passthrough\passthrough.c @ 409]
07 a157d9ac 96029aeb PassThrough!PtPreOperationPassThrough+0xa4 [c:\temp3\passthrough\passthrough.c @ 873]
WARNING: Frame IP not in any known module. Following frames may be wrong.
08 a157da88 828744bc 0x96029aeb
09 a157daa0 82a786ad nt!IofCallDriver+0x63
0a a157db78 82a5926b nt!IopParseDevice+0xed7
0b a157dbf4 82a7f2d9 nt!ObpLookupObjectName+0x4fa
0c a157dc50 82a7762b nt!ObOpenObjectByName+0x165
0d a157dccc 82ab267e nt!IopCreateFile+0x673
0e a157dd14 8287b44a nt!NtOpenFile+0x2a
0f a157dd14 774764f4 nt!KiFastCallEntry+0x12a
10 0012d958 00439f12 0x774764f4
11 0012d99c 0049a03e 0x439f12
12 0012dbd0 0049b43f 0x49a03e
13 0012dbec 004551cc 0x49b43f
14 0012f590 00491382 0x4551cc
15 0012f5ac 004914e8 0x491382
16 0012f5d0 00491630 0x4914e8
17 0012fe50 0048ecfe 0x491630
18 0012fe94 0048f4bd 0x48ecfe
19 0012ff40 004ed433 0x48f4bd
1a 0012ff88 765e1194 0x4ed433
1b 0012ff94 7748b495 0x765e1194
1c 0012ffd4 7748b468 0x7748b495
1d 0012ffec 00000000 0x7748b468
1: kd> u nt!IopCheckTopDeviceHint+0x5c
82a9b354 b80d0000c0      mov     eax,0C000000Dh
82a9b359 5d              pop     ebp
82a9b35a c20400          ret     4
82a9b35d 90              nop
82a9b35e 90              nop
82a9b35f 90              nop
82a9b360 90              nop
82a9b361 90              nop

What is going on here is that nt!IopCheckTopDeviceHint simply fails with STATUS_INVALID_PARAMETER (0xC000000D, the value loaded into eax above) if it's a direct device open. Basically, the combination of a targeted IRP_MJ_CREATE (like the one FltCreateFile() issues when an Instance parameter is specified) and a direct device open always fails. But while this explains why ZwCreateFile() works, it's not clear why FltCreateFile() with a NULL instance fails. After a bit more tracing I discovered that FltCreateFileEx2() (which both FltCreateFile() and FltCreateFileEx() call) fails any request if the FILE_OBJECT it gets has the FO_DIRECT_DEVICE_OPEN flag set.
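The two checks can be summarized as a small decision function. This is only a model of the behavior traced in this post, not code from Windows or FltMgr; the enum names are hypothetical, and the status values are the real NTSTATUS constants, redefined locally so the block compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SUCCESS            0x00000000u
#define STATUS_INVALID_PARAMETER  0xC000000Du

typedef enum create_api { API_ZW_CREATE_FILE, API_FLT_CREATE_FILE } create_api;

typedef enum failure_point {
    FAIL_NONE,
    FAIL_TOP_DEVICE_HINT,          /* nt!IopCheckTopDeviceHint */
    FAIL_FLTMGR_DIRECT_OPEN_CHECK  /* FltCreateFileEx2's FO_DIRECT_DEVICE_OPEN check */
} failure_point;

/* Where (if anywhere) a create fails, per the tracing in this post. */
static failure_point predict_failure(create_api api, bool instance_specified,
                                     bool direct_device_open) {
    if (!direct_device_open || api == API_ZW_CREATE_FILE) {
        return FAIL_NONE;          /* untargeted ZwCreateFile is fine */
    }
    if (instance_specified) {
        return FAIL_TOP_DEVICE_HINT;  /* targeted create + direct open */
    }
    return FAIL_FLTMGR_DIRECT_OPEN_CHECK;  /* NULL instance, still rejected */
}

static uint32_t predicted_status(failure_point f) {
    return f == FAIL_NONE ? STATUS_SUCCESS : STATUS_INVALID_PARAMETER;
}
```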

So before this post gets waaaay too long, let's get to our conclusions:

  • FltCreateFile() simply cannot be used for direct device opens. Minifilter developers can use ZwCreateFile() in this scenario, which is safe because the IRP_MJ_CREATE issued does not go to any file system and so there is no reentrancy. This is the same as opening any non-file system device on the system.
  • Direct device handles are not the same as DASD handles. DASD FILE_OBJECTs have the FO_VOLUME_OPEN flag set and represent an open of the file system volume device, while direct device FILE_OBJECTs have the FO_DIRECT_DEVICE_OPEN flag set and are targeted directly at the storage volume.
  • Try to use FltOpenVolume() instead of rolling your own open volume code.
  • Do not use a direct device handle when issuing FSCTLs, it makes no sense. The FSCTLs must go to the file system device.
  • Do not send FSCTLs using ZwDeviceIoControlFile() or FltDeviceIoControlFile(). Instead one should use ZwFsControlFile() or FltFsControlFile().
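One way to sanity-check which family a control code belongs to is to look at the device type encoded in it. The CTL_CODE layout and the two sample codes below match the WDK headers (redefined here so the block compiles on its own), but the routing function is only a heuristic of mine: most FSCTLs use FILE_DEVICE_FILE_SYSTEM, yet not every file system control code does, so the definition of the code you are sending is the real authority.

```c
#include <assert.h>
#include <stdint.h>

/* CTL_CODE layout as in the WDK/SDK headers. */
#define CTL_CODE(DeviceType, Function, Method, Access) \
    (((DeviceType) << 16) | ((Access) << 14) | ((Function) << 2) | (Method))

#define FILE_DEVICE_DISK         0x00000007u
#define FILE_DEVICE_FILE_SYSTEM  0x00000009u
#define METHOD_BUFFERED          0u
#define FILE_ANY_ACCESS          0u

#define FSCTL_LOCK_VOLUME \
    CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 6, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_DISK_GET_DRIVE_GEOMETRY \
    CTL_CODE(FILE_DEVICE_DISK, 0, METHOD_BUFFERED, FILE_ANY_ACCESS)

/* Device type lives in the top 16 bits of the control code. */
static uint32_t device_type_of(uint32_t code) {
    return (code & 0xffff0000u) >> 16;
}

typedef enum send_with { SEND_FS_CONTROL, SEND_DEVICE_IO_CONTROL } send_with;

/* Heuristic routing: FILE_DEVICE_FILE_SYSTEM codes go through
 * FltFsControlFile()/ZwFsControlFile() (IRP_MJ_FILE_SYSTEM_CONTROL),
 * everything else through FltDeviceIoControlFile()/ZwDeviceIoControlFile()
 * (IRP_MJ_DEVICE_CONTROL). */
static send_with choose_api(uint32_t code) {
    return device_type_of(code) == FILE_DEVICE_FILE_SYSTEM
        ? SEND_FS_CONTROL
        : SEND_DEVICE_IO_CONTROL;
}
```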