Thursday, July 28, 2011

Rolling your Own IO in Minifilters

The topic of how to issue an "IRP" in a minifilter does occasionally come up. In general this is required when there is no Flt API to perform a certain operation (for example the well-known case of FltQueryDirectoryFile() in XP) or when the Flt API does not provide some feature that the caller needs (see my example in a previous post about FltSetInformationFile not issuing a create with FILE_SHARE_DELETE). There are two main articles that I believe any file system filter writer should be familiar with: OSR's Rolling Your Own - Building IRPs to Perform I/O and Microsoft's own document about IO in minifilters, Minifilter Generated I/O.

As I've said before, a minifilter can be either in the context of an IO operation or not. When the minifilter is not in the context of an IO operation, or when it wants to send a request to a different device than the one it's currently processing IO on, the minifilter can usually issue an IRP directly. This is rather rare, though, so if you need to issue an actual IRP from a minifilter then make sure there isn't something wrong with your design; layering problems have a nasty habit of not showing up until testing interop with other minifilters, which doesn't usually happen in early testing. In other words, a minifilter only needs to issue IO using the FltMgr framework in the same circumstances in which it would call a FltXxx routine, which I've discussed here. There isn't anything special about issuing an IRP from a minifilter so I'll focus only on issuing a FLT_CALLBACK_DATA type IO.

The steps involved in issuing a FLT_CALLBACK_DATA are pretty straightforward:

  1. Allocate a FLT_CALLBACK_DATA structure by calling FltAllocateCallbackData().
  2. Initialize the FLT_CALLBACK_DATA structure for your IO. This primarily requires setting up the FLT_CALLBACK_DATA->Iopb structure. Don't forget to set up the MajorFunction and MinorFunction members.
  3. Send the request down to the filters below using FltPerformSynchronousIo() or FltPerformAsynchronousIo().
  4. Either free the FLT_CALLBACK_DATA (FltFreeCallbackData()) or reuse it (FltReuseCallbackData()), depending on whether you need to issue additional IO or not.

Here is an example of what the code looks like to issue your own call similar to FltQueryDirectoryFile() (which is the code I have in my minifilters that need to run on XP):

NTSTATUS
MyFltQueryDirectoryFile(
    __in PFLT_INSTANCE  Instance,
    __in PFILE_OBJECT  FileObject,
    __out PVOID FileInformation,
    __in ULONG Length,
    __in FILE_INFORMATION_CLASS  FileInformationClass,
    __in BOOLEAN  ReturnSingleEntry,
    __in_opt PUNICODE_STRING  FileName,
    __in BOOLEAN  RestartScan,
    __out_opt PULONG  LengthReturned
    )
{
    NTSTATUS status = STATUS_SUCCESS;
    PFLT_CALLBACK_DATA callbackData = NULL;
    PFLT_PARAMETERS params = NULL;

    status = FltAllocateCallbackData( Instance,
                                      FileObject,
                                      &callbackData );
    if (!NT_SUCCESS(status)) {

        return status;
    }

    callbackData->Iopb->MajorFunction = IRP_MJ_DIRECTORY_CONTROL;
    callbackData->Iopb->MinorFunction = IRP_MN_QUERY_DIRECTORY;

    if (RestartScan) {

        SetFlag( callbackData->Iopb->OperationFlags, SL_RESTART_SCAN );
    }

    if (ReturnSingleEntry) {

        SetFlag( callbackData->Iopb->OperationFlags, SL_RETURN_SINGLE_ENTRY );
    }

    params = &callbackData->Iopb->Parameters;
    params->DirectoryControl.QueryDirectory.Length = Length;
    params->DirectoryControl.QueryDirectory.FileName = FileName;
    params->DirectoryControl.QueryDirectory.FileInformationClass = FileInformationClass;
    params->DirectoryControl.QueryDirectory.FileIndex = 0;
    params->DirectoryControl.QueryDirectory.DirectoryBuffer = FileInformation;
    params->DirectoryControl.QueryDirectory.MdlAddress = NULL;

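    //
    //  FltPerformSynchronousIo() returns VOID; the outcome of the
    //  request is reported in callbackData->IoStatus.
    //

    FltPerformSynchronousIo( callbackData );

    status = callbackData->IoStatus.Status;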

    if (LengthReturned != NULL) {

        *LengthReturned = (ULONG)(callbackData->IoStatus.Information);
    }

    FltFreeCallbackData( callbackData );

    return status;
}

The MSDN documentation for the APIs is pretty thorough and combined with the powerpoint presentation from MS I think it covers the subject matter pretty well. However, there are some things I'd like to emphasize:

  • You simply can't issue an IRP_MJ_CREATE this way; use FltCreateFile(Ex(2)) instead.
  • Both FltPerformSynchronousIo() and FltPerformAsynchronousIo() will set the FLT_CALLBACK_DATA->IoStatus.Status to reflect the status of the operation. Unfortunately, it's impossible to tell whether the request failed in FltMgr or if it was actually sent down and it failed in a lower layer. However, in the debugger one can tell whether the request failed in a lower layer by looking at the IRP associated with the FLT_CALLBACK_DATA. The filter allocated FLT_CALLBACK_DATA starts without being associated with an IRP so if the IRP is still NULL then that's an indication that the request failed in FltMgr. If the IRP is not null then it's possible to tell whether the IRP is completed or not and to see the status of the operation.
  • FltPerformAsynchronousIo() will ALWAYS call the asynchronous completion routine. Basically, once a minifilter calls FltPerformAsynchronousIo() it is guaranteed one call to the async completion routine no matter what. So don't make assumptions that the async completion routine won't be called if the request fails in any way.
  • The best way that I've found to figure out how to initialize a FLT_CALLBACK_DATA structure for a certain operation is to filter that operation and see what a FLT_CALLBACK_DATA generated by the FltMgr for an existing IRP looks like.
  • For a call to FltPerformAsynchronousIo() that returned STATUS_PENDING, make sure to not free the FLT_CALLBACK_DATA until the IO actually completes. In fact, a good strategy is to call FltFreeCallbackData() from the async completion routine, which is guaranteed to be called only after the IO is complete.
  • This could be a pretty useful way to preallocate requests in the event that FltAllocateCallbackData() or some other FltXxx API fails because of low system resources and the filter wants to try to implement forward progress (for more discussion on forward progress in general see this page and the RamDisk WDK sample). However, since FltAllocateCallbackData() doesn't allocate the IRP associated with the FLT_CALLBACK_DATA structure, it's possible that even if one preallocates some FLT_CALLBACK_DATA structures to use for forward progress, the calls to FltPerformSynchronousIo() and FltPerformAsynchronousIo() might still fail when trying to allocate the IRP. This is why in Win7 FltMgr introduced FltAllocateCallbackDataEx(), which allows a minifilter to allocate a FLT_CALLBACK_DATA for which all the necessary memory is guaranteed to be preallocated, thus enabling forward progress in a low-memory situation (see the explanation for the FLT_ALLOCATE_CALLBACK_DATA_PREALLOCATE_ALL_MEMORY flag).

Thursday, July 21, 2011

Using IoRegisterFsRegistrationChangeMountAware

I've already talked about how file system filters attach to volumes. However, there is one thing I didn't mention in the context of that discussion. There is a race that can happen in that path that can have unpleasant side-effects for filters.

I'll start with steps involved in a legacy filter attaching to a volume. This is discussed in more detail in another couple of posts on this blog so I'll skip over some steps and focus only on the ones that are relevant to the problem at hand:

  1. Legacy filter calls IoRegisterFsRegistrationChange().
  2. The notification callback gets called for a file system (initially all the registered ones and then new ones when they register).
  3. In the notification callback the legacy filter attaches to the file system control device objects so that it can receive the IRP_MJ_FILE_SYSTEM_CONTROL with the IRP_MN_MOUNT_VOLUME minor code request whenever the file system is asked to mount a new volume (and thus the filter will be notified of all new mounted volumes).
  4. Also in the notification callback the legacy filter walks over the list of devices that the file system has already created, which are all file system volume device objects (VDOs) for all the volumes mounted by that file system, and attaches to each of them (this takes care of the existing volumes).

Now let's take a look at the steps that the IO manager takes when trying to mount a volume (again these are just the relevant ones to the problem):

  1. Start at the head of the list of registered file systems and get a reference to the CDO for that file system.
  2. Prepare an IRP_MJ_FILE_SYSTEM_CONTROL with the IRP_MN_MOUNT_VOLUME minor code IRP that will be sent to that CDO.
  3. Send the IRP and wait for it to complete.
  4. If the file system couldn't mount the volume then get the next entry in the list of registered file systems and reference the CDO for that file system and then go to step 2.

It is important to note that both the list of registered file systems and the list of drivers to be notified about the arrival of new file systems (drivers that called IoRegisterFsRegistrationChange()) are protected by the same lock. Incidentally, this lock (though private to the OS) is available in the debugger as nt!IopDatabaseResource:

0: kd> x nt!IopDatabaseResource
8299e860 nt!IopDatabaseResource = <no type information>
0: kd> !locks 8299e860 

Resource @ nt!IopDatabaseResource (0x8299e860)    Available
1 total locks

In step 3 of the volume mount path, where the IRP is sent to the CDO, the IO manager releases the IopDatabaseResource before sending the IRP down to the file system and reacquires it once the IRP completes. After all, holding a lock across a call to a driver is to be avoided whenever possible. However, this opens a very small window in which things can go wrong. Say a volume mount is in progress: the IO manager has prepared the mount IRP and has just released the IopDatabaseResource when the thread is preempted. On a different thread a filter calls IoRegisterFsRegistrationChange(), gets the list of registered file systems, attaches a device on the CDO for each file system and then enumerates all the VDOs. The filter will completely miss the volume that is about to be mounted because there is no VDO for it yet (the IRP_MN_MOUNT_VOLUME IRP hasn't reached the file system, so no VDO was created). The device it has attached to the CDO will not see the IRP_MN_MOUNT_VOLUME request either, because when the IO manager referenced the top device for the CDO the filter wasn't there yet, so the IRP will go to the device right below the filter.

The result of all this is that the filter will completely miss a mounted volume and will not attach to it. Since this requires that a filter calls IoRegisterFsRegistrationChange() exactly at the time when a volume is mounted, it is a very narrow window. This window can be avoided by using IoRegisterFsRegistrationChangeMountAware() instead of IoRegisterFsRegistrationChange(), where the IO manager synchronizes volume mounts with calls to IoRegisterFsRegistrationChangeMountAware().
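The interleaving described above can be replayed with a small user-mode toy model. This is only a sketch of the reasoning, not of the IO manager's actual implementation: every name in it (fs_model, io_begin_mount, filter_register and so on) is hypothetical, and the mount-aware case is simply modeled as "registration does not proceed until the in-flight mount has finished".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state for one file system and one volume being mounted.
   All names are hypothetical; nothing here is a real kernel structure. */
struct fs_model {
    bool filter_on_cdo;        /* filter device attached on top of the CDO      */
    bool vdo_exists;           /* file system has created the VDO               */
    bool filter_on_vdo;        /* filter ended up attached to that VDO          */
    bool mount_in_flight;      /* IO manager released the lock, IRP not sent    */
    bool irp_goes_via_filter;  /* top-of-stack device captured before the send  */
};

/* IO manager: capture the current top of the CDO stack, then drop the lock. */
static void io_begin_mount(struct fs_model *m)
{
    m->irp_goes_via_filter = m->filter_on_cdo;
    m->mount_in_flight = true;
}

/* IO manager: the mount IRP reaches the file system and the VDO is created.
   The filter attaches to the new VDO only if the IRP passed through it. */
static void io_finish_mount(struct fs_model *m)
{
    assert(m->mount_in_flight);
    m->vdo_exists = true;
    if (m->irp_goes_via_filter) {
        m->filter_on_vdo = true;
    }
    m->mount_in_flight = false;
}

/* Filter: register, attach to the CDO, then attach to the VDOs that exist. */
static void filter_register(struct fs_model *m, bool mount_aware)
{
    if (mount_aware && m->mount_in_flight) {
        io_finish_mount(m);    /* modeled as waiting for the mount to finish */
    }
    m->filter_on_cdo = true;
    if (m->vdo_exists) {
        m->filter_on_vdo = true;
    }
}

/* Replays the bad interleaving; returns whether the filter got the volume. */
static bool run_race(bool mount_aware)
{
    struct fs_model m = { 0 };

    io_begin_mount(&m);
    filter_register(&m, mount_aware);
    if (m.mount_in_flight) {
        io_finish_mount(&m);
    }
    return m.filter_on_vdo;
}
```

In this model run_race(false) leaves the filter detached from the new volume, while run_race(true) does not, which is the whole point of the mount-aware registration.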

Of course, this discussion is really only relevant to legacy filters; minifilters don't have to deal with all this since they never register with the IO manager directly.

Thursday, July 14, 2011

More on Instances and Volumes

I've recently been playing some with instances and I've come across a couple of things that I wanted to share. Prerequisites for this discussion are my old posts on the FLT_INSTANCE structure and the FLT_VOLUME structure.

In Filter Manager terms, a volume is an attachment of FltMgr to the file system stack on a volume. The volume maps to a FltMgr DEVICE_OBJECT attached to a file system VDO. In most cases, where there are no legacy filters on a system, the volume represents the whole IO stack between the IO manager and the file system. However, when legacy filters are present on the stack, multiple volumes can be attached on each file system stack. See this picture which I'm reusing from my FLT_VOLUMES post.

An interesting thing to note is the way FltMgr attaches to a file system stack. The simplified view is that FltMgr attaches a frame between each legacy filter, but that's not an accurate picture in a couple of ways. First, a legacy filter can attach only to some volumes, which means that on the other volumes there might be no legacy filter at all. Nevertheless, for consistency reasons, FltMgr attaches a DEVICE_OBJECT even if there are no DEVICE_OBJECTs belonging to other legacy filters on a volume. Also, since there is no mechanism to know when a device was attached to a device stack, FltMgr can't know when a legacy filter attaches to a certain device stack, which prevents it from being able to attach immediately on top of each legacy filter. So FltMgr only looks at the file system stack and tries to attach when a minifilter is loaded. At that time FltMgr tries to figure out which frame the minifilter should belong to, depending on the altitude (and in case you were wondering, the altitude comes from the default instance, which is why FltRegisterFilter() might fail with STATUS_OBJECT_NAME_NOT_FOUND if there is no default instance specified in the INF file). If no frame already exists where the minifilter altitude fits (and since Frame 0 starts at altitude 0 this scenario usually happens when the altitude of the new minifilter is higher than the altitude of the highest frame), then FltMgr looks at whether the top frame has any legacy filter attached on top. If not, it will simply increase the highest altitude on that frame and load the minifilter there. However, if a legacy filter has attached to the top frame then in Vista and newer OSes FltMgr tries to figure out what the altitude of that legacy filter is based on the Group (as in LoadOrderGroup) and then it grows the top of the highest frame (it increases the altitude) up to the altitude associated with that Group. Incidentally this is another good reason for legacy filters to use the appropriate Group. This way they can benefit to some extent from the layering guaranteed by FltMgr. Anyway, if the altitude is higher than the altitude of the top frame even after it was extended (again, this is only true for Vista and newer OSes; in XP the altitude on the frame is not increased) then a new frame is needed and so FltMgr proceeds to allocate a new frame and attach a new set of DEVICE_OBJECTs to each stack. This can have a couple of implications:

  • There can be multiple legacy filters directly on top of each other, if no minifilter was loaded between the time when the first legacy filter was attached and the time when the second legacy filter was attached.
  • There can be some volumes on which there are only FltMgr DEVICE_OBJECTs directly on top of each other. This should have no impact on minifilter developers but it might surprise someone looking at the stack in the debugger. This is actually quite common and it's perfectly fine.
  • In extreme cases, it's possible that on one volume a legacy filter is attached above a certain frame while on a different volume it is attached below that frame. I've never seen this happen but I can imagine it would if the legacy filter attaches to volumes late (when some user mode app requests attachment) or if the attachment happens to race with a minifilter loading.
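The frame selection logic described before this list can be condensed into a toy model. It is only a sketch of the description in this post, not FltMgr's code: the types and the pick_frame() helper are hypothetical, altitudes are plain integers here (real altitudes are decimal strings), and the assumption that a new frame starts where the previous one ends is mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FRAMES 8

/* Hypothetical model of a frame: an altitude range plus what sits on top. */
struct frame {
    long low;
    long high;
    bool legacy_on_top;          /* a legacy filter attached above this frame  */
    long legacy_group_altitude;  /* altitude implied by its LoadOrderGroup     */
};

struct frame_list {
    struct frame frames[MAX_FRAMES];
    int count;
};

/* Returns the index of the frame a minifilter at `altitude` would land in,
   growing the top frame or adding a new one as described in the text.
   Returns -1 if the model runs out of frame slots. */
static int pick_frame(struct frame_list *fl, long altitude, bool vista_or_newer)
{
    struct frame *top;
    int i;

    assert(fl != NULL && fl->count >= 1 && fl->count <= MAX_FRAMES);

    /* Frame 0 starts at altitude 0, so the first frame whose top is at or
       above the altitude is the one where the minifilter fits. */
    for (i = 0; i < fl->count; i++) {
        if (altitude <= fl->frames[i].high) {
            return i;
        }
    }

    top = &fl->frames[fl->count - 1];

    /* Nothing attached above the top frame: just grow it. */
    if (!top->legacy_on_top) {
        top->high = altitude;
        return fl->count - 1;
    }

    /* Vista and newer: grow the top frame up to the legacy filter's Group. */
    if (vista_or_newer && top->legacy_group_altitude > top->high) {
        top->high = top->legacy_group_altitude;
        if (altitude <= top->high) {
            return fl->count - 1;
        }
    }

    /* Still does not fit (or XP): a new frame is needed. */
    if (fl->count == MAX_FRAMES) {
        return -1;
    }
    fl->frames[fl->count].low = top->high;
    fl->frames[fl->count].high = altitude;
    fl->frames[fl->count].legacy_on_top = false;
    fl->frames[fl->count].legacy_group_altitude = 0;
    return fl->count++;
}
```

With a single frame covering 0 to 100 and a legacy filter on top whose Group maps to 300, a minifilter at altitude 250 stays in frame 0 when vista_or_newer is true and forces a new frame when it is false.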

An instance is an attachment of a filter to a certain file system volume. The notable thing about this is that a filter can have multiple instances on the same volume, each attached at a different altitude. The altitudes at which an instance can attach are limited, however, by the altitude range of the frame. In other words, once a filter is loaded it is associated with a frame and it can only create instances at altitudes within that frame. Why would a filter create multiple instances on the same volume? One good reason is to test that the filter can attach above itself, which is a good way to check that the design is safe and doesn't violate any layering rules. Another reason might be to analyze the behavior of a specific filter. In this case one might attach logging instances above and below it.

One decision a file system filter developer must make pretty early on is whether the filter should attach to volumes automatically or whether it needs manual attachment. For manual attachments a minifilter can use the FltAttachVolume() and the FltAttachVolumeAtAltitude() functions, but surprisingly these functions lack a context parameter. Looking at the PFLT_INSTANCE_SETUP_CALLBACK callback, we can see that no context parameter is passed there either (indicating that this is a design decision rather than a bug with the APIs). This can be problematic for filters that behave differently depending on which volume the filter is attached to. For example, imagine there is a filter that implements some form of file-level redundancy by duplicating some of the operations that happen for a file on one volume to a file on another volume. This implies that when the filter starts working it might need to be attached to both volumes and it might need to know which is the source instance and which is the destination instance. One possible workaround would be to use an instance context for each instance that contains information about the role of the instance. This way a filter can call FltAttachVolume() or FltAttachVolumeAtAltitude() and, if the call is successful, use the pointer to the new instance to call FltSetInstanceContext() on that instance and inform the instance of the role it must perform. This is a rather unusual mechanism (passing the context to a callback is by far the more prevalent model in Windows) and the only reason I can think of for doing it this way is FilterAttach() and FilterAttachAtAltitude(), for which passing in a context is not possible (passing a pointer between user mode and kernel mode is not a good idea).

Finally, one last thing I'd like to point out is that there are two similar types of contexts, a volume context and an instance context. The vast majority of filters only have at most one instance per volume and so from a functional perspective they are equivalent. The instance context however is much faster to access because it is pretty much attached to the FLT_INSTANCE structure (so it's just a pointer deref) whereas the volume context is stored in some hash structure with the filter as the hash key so any lookup implies locking and walking the hash structure, which is much more costly.

So the couple of ideas that are important to remember from this post:

  • All filters must have a default instance, otherwise FltRegisterFilter() will fail with STATUS_OBJECT_NAME_NOT_FOUND (which incidentally is not documented as a possible return value).
  • When testing for interop with legacy filters, try to load the legacy filter both above your filter and below your filter (that might not be necessary on Vista+ environments where the legacy filter uses a Group, which might guarantee a fixed position relative to your filter).
  • Always use instance contexts instead of volume contexts. There is almost no reason not to.
  • If you are writing or maintaining a legacy filter, please take the time to make sure that the Group in the INF file is set to the right value. It's a text-only change and it might save a lot of time in support costs...

Thursday, July 7, 2011

Opening Volume Handles in Minifilters

This should be a pretty straightforward topic, right? After all, FltMgr even provides a function for this, FltOpenVolume. However, a recent post on OSR's NTFSD made me take a deeper look at this issue and there are some interesting things that I found. First, let me say upfront that the real problem of the poster was trying to issue an FSCTL using FltDeviceIoControlFile instead of FltFsControlFile. However, his post was about FltCreateFile failing and looking at the code I couldn't figure out why, which is usually a good sign I'm missing something and that I should investigate further.

Here is a small function that I've added to everyone's favorite WDK sample, Passthrough (please note the hardcoded path to E:):

NTSTATUS MyOpenVolume(
    __in PCFLT_RELATED_OBJECTS FltObjects
    )
{
    NTSTATUS status = STATUS_SUCCESS;
    OBJECT_ATTRIBUTES objectAttributes;
    IO_STATUS_BLOCK ioStatus;
    HANDLE volumeHandle = NULL;
    UNICODE_STRING gVolumeRoot = RTL_CONSTANT_STRING(L"\\DosDevices\\E:");

    InitializeObjectAttributes( &objectAttributes, 
                                &gVolumeRoot, 
                                OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE, 
                                NULL, 
                                NULL );

    status = FltCreateFile( gFilterHandle, 
                            FltObjects->Instance, 
                            &volumeHandle, 
                            FILE_READ_ATTRIBUTES, 
                            &objectAttributes, 
                            &ioStatus, 
                            NULL, 
                            FILE_ATTRIBUTE_NORMAL, 
                            FILE_SHARE_READ | FILE_SHARE_WRITE, 
                            FILE_OPEN, 
                            0 , 
                            NULL, 
                            0L, 
                            0);  

    if (volumeHandle != NULL) {

        ZwClose(volumeHandle);
    }

    return status;
}

I'm simply calling this function for each IRP_MJ_CREATE (in PtPreOperationPassThrough):

...
    if (Data->Iopb->MajorFunction == IRP_MJ_CREATE) {

        status = MyOpenVolume( FltObjects );        
    }

    return FLT_PREOP_SUCCESS_WITH_CALLBACK;
...

So anyway, FltCreateFile simply fails with STATUS_INVALID_PARAMETER. This was quite unexpected because I had used similar code before, just not in a minifilter (and looking through some old code I was able to confirm that). So I decided to see what would happen if I called ZwCreateFile() instead of FltCreateFile(). To my surprise, it worked using the exact same parameters (well, except of course for the Filter, Instance and Flags parameters). I was surprised that it worked because I was expecting an infinite loop since ZwCreateFile() doesn't target the IRP_MJ_CREATE and so it would go into my create handler again and again… Then my next step was to try to replace ZwCreateFile() with FltCreateFile() and instead of using my instance use a NULL instance so that the request should also go to the top of the stack just like ZwCreateFile() would. But that also failed with STATUS_INVALID_PARAMETER, which was also pretty strange. So I decided to look at the handle I've just opened to see if I notice anything:

1: kd> !fileobj 935afbf0  



Device Object: 0x92f0da60   \Driver\volmgr
Vpb is NULL
Event signalled

Flags:  0x40800
 Direct Device Open
 Handle Created

CurrentByteOffset: 0

There are a couple of things that looked unusual. First, there is no FsContext or FsContext2, meaning they are NULL. Then, the Flags field has the "Direct Device Open" flag (FO_DIRECT_DEVICE_OPEN). Also, there is no FO_VOLUME_OPEN flag even though this should be a volume open. And finally, the VPB is NULL, even though the volume is mounted (this is not obvious from this FO, I just happen to know it's mounted). All this means that the handle I have is in fact a handle to the storage stack volume instead of the file system volume. This is an interesting NT behavior that I had forgotten about. The idea is that opening a volume using certain access rights will open the storage volume without triggering a mount of the file system. This is useful when a driver wants to talk to the actual volume and query some attributes or something of that nature without forcing the file system to be mounted. You can find out more about this behavior on the MSDN page "Common Driver Reliability Issues", if you scroll all the way to "Requests to Create and Open Files and Devices" and then look at the entry for "Relative Open Requests for Direct Device Open Handles". Please note that this is not the same as DASD IO.
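The Flags value from the !fileobj output can also be decoded by hand. The following user-mode sketch does that for the handful of bits that matter in this post; the FO_* values are the ones from wdm.h, while the table and the helper names (describe_fo_flags, is_direct_device_open) are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* FILE_OBJECT flag values as defined in wdm.h (only a subset is listed). */
#define FO_NO_INTERMEDIATE_BUFFERING 0x00000008u
#define FO_DIRECT_DEVICE_OPEN        0x00000800u
#define FO_HANDLE_CREATED            0x00040000u
#define FO_VOLUME_OPEN               0x00400000u

struct fo_flag {
    unsigned int bit;
    const char *name;
};

static const struct fo_flag fo_flags[] = {
    { FO_NO_INTERMEDIATE_BUFFERING, "FO_NO_INTERMEDIATE_BUFFERING" },
    { FO_DIRECT_DEVICE_OPEN,        "FO_DIRECT_DEVICE_OPEN" },
    { FO_HANDLE_CREATED,            "FO_HANDLE_CREATED" },
    { FO_VOLUME_OPEN,               "FO_VOLUME_OPEN" },
};

/* Hypothetical helper: writes the names of the known bits set in `flags`
   into `out`, separated by '|', and returns the bits it did not recognize. */
static unsigned int describe_fo_flags(unsigned int flags, char *out, size_t out_size)
{
    unsigned int unknown = flags;
    size_t i;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    for (i = 0; i < sizeof(fo_flags) / sizeof(fo_flags[0]); i++) {
        if (flags & fo_flags[i].bit) {
            if (out[0] != '\0') {
                strncat(out, "|", out_size - strlen(out) - 1);
            }
            strncat(out, fo_flags[i].name, out_size - strlen(out) - 1);
            unknown &= ~fo_flags[i].bit;
        }
    }
    return unknown;
}

/* A FILE_OBJECT with this bit set never went through the file system. */
static int is_direct_device_open(unsigned int flags)
{
    return (flags & FO_DIRECT_DEVICE_OPEN) != 0;
}
```

Feeding it 0x40800 yields FO_DIRECT_DEVICE_OPEN|FO_HANDLE_CREATED with no leftover bits, which matches the two lines the debugger printed under Flags.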

So anyway I wanted to see what happens if when calling FltCreateFile() I also request FILE_WRITE_ATTRIBUTES, thus changing the semantics for the IRP_MJ_CREATE and not getting a direct device open. And this time around FltCreateFile() worked. Here is the FILE_OBJECT that got created:

1: kd> !fileobj 94126488  



Device Object: 0x92f0da60   \Driver\volmgr
Vpb: 0x92f09570
Event signalled

Flags:  0x440008
 No Intermediate Buffering
 Handle Created
 Volume Open

FsContext: 0x92fb2e18 FsContext2: 0xa3f08bf8
CurrentByteOffset: 0
Cache Data:
  Section Object Pointers: 9352d4f4
  Shared Cache Map: 00000000


File object extension is at 9305e2f0:

So this is a file system volume open, as we can see from the Volume Open flag (FO_VOLUME_OPEN). Also, FsContext and FsContext2 and the VPB are no longer null. Still, it's not clear why FltCreateFile would return STATUS_INVALID_PARAMETER for a direct device open. Once again tracing through IopParseDevice provides the answer:

1: kd> kn
 # ChildEBP RetAddr  
00 a157d5bc 82a77ff2 nt!IopCheckTopDeviceHint+0x5c
01 a157d698 82a5926b nt!IopParseDevice+0x81c
02 a157d714 82a7f2d9 nt!ObpLookupObjectName+0x4fa
03 a157d774 82a7762b nt!ObOpenObjectByName+0x165
04 a157d7f0 82aaee29 nt!IopCreateFile+0x673
05 a157d920 a0ede0e1 nt!IoCreateFileEx+0x9e
06 a157d994 a0ede1d4 PassThrough!MyOpenVolume+0xd1 [c:\temp3\passthrough\passthrough.c @ 409]
07 a157d9ac 96029aeb PassThrough!PtPreOperationPassThrough+0xa4 [c:\temp3\passthrough\passthrough.c @ 873]
WARNING: Frame IP not in any known module. Following frames may be wrong.
08 a157da88 828744bc 0x96029aeb
09 a157daa0 82a786ad nt!IofCallDriver+0x63
0a a157db78 82a5926b nt!IopParseDevice+0xed7
0b a157dbf4 82a7f2d9 nt!ObpLookupObjectName+0x4fa
0c a157dc50 82a7762b nt!ObOpenObjectByName+0x165
0d a157dccc 82ab267e nt!IopCreateFile+0x673
0e a157dd14 8287b44a nt!NtOpenFile+0x2a
0f a157dd14 774764f4 nt!KiFastCallEntry+0x12a
10 0012d958 00439f12 0x774764f4
11 0012d99c 0049a03e 0x439f12
12 0012dbd0 0049b43f 0x49a03e
13 0012dbec 004551cc 0x49b43f
14 0012f590 00491382 0x4551cc
15 0012f5ac 004914e8 0x491382
16 0012f5d0 00491630 0x4914e8
17 0012fe50 0048ecfe 0x491630
18 0012fe94 0048f4bd 0x48ecfe
19 0012ff40 004ed433 0x48f4bd
1a 0012ff88 765e1194 0x4ed433
1b 0012ff94 7748b495 0x765e1194
1c 0012ffd4 7748b468 0x7748b495
1d 0012ffec 00000000 0x7748b468
1: kd> u nt!IopCheckTopDeviceHint+0x5c
nt!IopCheckTopDeviceHint+0x5c:
82a9b354 b80d0000c0      mov     eax,0C000000Dh
82a9b359 5d              pop     ebp
82a9b35a c20400          ret     4
82a9b35d 90              nop
82a9b35e 90              nop
82a9b35f 90              nop
82a9b360 90              nop
82a9b361 90              nop

What is going on here is that nt!IopCheckTopDeviceHint simply fails with STATUS_INVALID_PARAMETER if it's a direct device open. Basically, the combination of targeted IRP_MJ_CREATE (like FltCreateFile() issues when an Instance parameter is specified) and direct device open always fails. But while this explains why ZwCreateFile works, it's not clear why FltCreateFile() with a NULL instance fails. So after another bit of tracing there I discovered that FltCreateFileEx2() (which both FltCreateFile() and FltCreateFileEx() call) fails any request if the FILE_OBJECT it gets has the FO_DIRECT_DEVICE_OPEN flag set.
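The outcomes observed so far can be summarized in a small user-mode model. It is a sketch under assumptions rather than a description of the real code paths: the set of access rights that keeps an open "direct" is my understanding of the IO manager's rule (the post only says "certain access rights"), and the enum, the mask name and both functions are hypothetical. The access-right and status values themselves are the standard ones from the WDK headers, and 0xC000000D is the value loaded into eax in the disassembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Standard access-right and NTSTATUS values from the WDK headers. */
#define FILE_READ_ATTRIBUTES     0x00000080u
#define FILE_WRITE_ATTRIBUTES    0x00000100u
#define READ_CONTROL             0x00020000u
#define WRITE_DAC                0x00040000u
#define WRITE_OWNER              0x00080000u
#define SYNCHRONIZE              0x00100000u
#define ACCESS_SYSTEM_SECURITY   0x01000000u
#define STATUS_SUCCESS           0x00000000u
#define STATUS_INVALID_PARAMETER 0xC000000Du

/* ASSUMPTION: asking for nothing beyond these rights on a device name,
   with no path after it, results in a direct device open. */
#define DIRECT_OPEN_ACCESS_MASK \
    (SYNCHRONIZE | FILE_READ_ATTRIBUTES | READ_CONTROL | \
     ACCESS_SYSTEM_SECURITY | WRITE_OWNER | WRITE_DAC)

/* Hypothetical: which API the caller used to issue the create. */
enum open_api { OPEN_VIA_ZW_CREATE_FILE, OPEN_VIA_FLT_CREATE_FILE };

static int would_be_direct_device_open(uint32_t desired_access,
                                       int has_remaining_name,
                                       int is_relative_open)
{
    if (has_remaining_name || is_relative_open) {
        return 0;
    }
    return (desired_access & ~DIRECT_OPEN_ACCESS_MASK) == 0;
}

/* Models the result of opening "\DosDevices\E:" as seen in this post:
   a direct device open works through ZwCreateFile() but fails through
   FltCreateFile(), with or without an Instance. */
static uint32_t model_volume_open(uint32_t desired_access, enum open_api api)
{
    assert(api == OPEN_VIA_ZW_CREATE_FILE || api == OPEN_VIA_FLT_CREATE_FILE);

    if (would_be_direct_device_open(desired_access, 0, 0) &&
        api == OPEN_VIA_FLT_CREATE_FILE) {
        return STATUS_INVALID_PARAMETER;
    }
    return STATUS_SUCCESS;
}
```

In this model FILE_READ_ATTRIBUTES alone fails through FltCreateFile() and succeeds through ZwCreateFile(), while FILE_READ_ATTRIBUTES | FILE_WRITE_ATTRIBUTES is no longer a direct device open and succeeds either way, which is what the experiments showed.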

So before this post gets waaaay too long, let's get to our conclusions:

  • FltCreateFile() simply cannot be used for direct device opens. Minifilter developers can use ZwCreateFile() in this scenario, which is safe because the IRP_MJ_CREATE issued does not go to any file system and so there is no reentrancy. This is the same as opening any non-file system device on the system.
  • Direct device handles are not the same as DASD handles. DASD FILE_OBJECTs have the FO_VOLUME_OPEN flag set and represent an open to the file system volume device, while direct device FILE_OBJECTs have the FO_DIRECT_DEVICE_OPEN flag set and are targeted directly at the storage volume.
  • Try to use FltOpenVolume() instead of rolling your own open volume code.
  • Do not use a direct device handle when issuing FSCTLs; it makes no sense. The FSCTLs must go to the file system device.
  • Do not send FSCTLs using ZwDeviceIoControlFile() or FltDeviceIoControlFile(). Instead one should use ZwFsControlFile() or FltFsControlFile().

Thursday, June 30, 2011

Using FltGetFileNameInformationUnsafe

I was talking to someone recently and I realized that FltGetFileNameInformationUnsafe() is an API that is virtually unknown and, as a result, unused. This post is meant to explain where FltGetFileNameInformationUnsafe() fits in the overall set of FltMgr name APIs and when and how it should be used. But first let's see what it looks like.

__checkReturn
__drv_maxIRQL(APC_LEVEL) 
NTSTATUS
FLTAPI
FltGetFileNameInformationUnsafe (
    __in PFILE_OBJECT FileObject,
    __in_opt PFLT_INSTANCE Instance,
    __in FLT_FILE_NAME_OPTIONS NameOptions,
    __deref_out PFLT_FILE_NAME_INFORMATION *FileNameInformation
    );

We'll compare it with FltGetFileNameInformation():

__checkReturn
__drv_maxIRQL(APC_LEVEL) 
NTSTATUS
FLTAPI
FltGetFileNameInformation (
    __in PFLT_CALLBACK_DATA CallbackData,
    __in FLT_FILE_NAME_OPTIONS NameOptions,
    __deref_out PFLT_FILE_NAME_INFORMATION *FileNameInformation
    );

So as you can see there are a couple of differences:

  • FltGetFileNameInformation() takes a FLT_CALLBACK_DATA structure. This makes it impossible to call when a minifilter wants to get the name of a file outside the context of an IO operation. Consider for example an activity monitor filter. In order to have as little impact on the performance of the system as possible, the minifilter should record some information about the operation as quickly as it can. Such a minifilter might implement a scheme where it references the FILE_OBJECT on which an operation happens and then resolves the FILE_OBJECT to a file name only later, in a different logging thread, outside of the context of the IO operation. This means that the minifilter might want to call FltGetFileNameInformation() without having a FLT_CALLBACK_DATA structure.
  • FltGetFileNameInformationUnsafe() takes a FILE_OBJECT and a FLT_INSTANCE parameter. However, the FILE_OBJECT alone should be enough for FltMgr to return a name (the way IoQueryFileDosDeviceName() can get the name) so where does the FLT_INSTANCE come in? As I've said in other posts, the name of the file might change at different points in the file system stack. If a minifilter virtualizes the namespace then it's possible that the name of a file as seen above that filter is different from the name as seen below the filter. As such, the altitude for the name is important and the FLT_INSTANCE is used to figure out for which altitude in the file system stack the file name should be returned.
  • FLT_INSTANCE is optional. The MSDN page for FltGetFileNameInformationUnsafe states that FLT_INSTANCE is optional to allow for the case when a minifilter doesn't yet have an instance, such as in DriverEntry. However, I'd be curious to see such a case, since I can't imagine how a filter would get a FILE_OBJECT in DriverEntry without having opened the file in the first place…
  • There is no FLT_FILE_NAME_QUERY_ALWAYS_ALLOW_CACHE_LOOKUP flag for FltGetFileNameInformationUnsafe(). Since FltGetFileNameInformationUnsafe() doesn't perform the checks that FltGetFileNameInformation() uses to know if it's safe to call into the file system, this flag doesn't make sense.

When I set a breakpoint on FltGetFileNameInformationUnsafe in a Win7 VM and then tried to start IE, it immediately triggered. The stack looked like this:

1: kd> kb
ChildEBP RetAddr  Args to Child              
99c714a4 96492bd4 941ae9a0 00000000 00000101 fltmgr!FltGetFileNameInformationUnsafe
99c714c4 96492c65 941ae9a0 99c714e4 99c714e0 tcpip!WfpAleQueryNormalizedImageFileName+0x26
99c714e8 96492e85 941ae9a0 99c71528 99c71538 tcpip!WfpAleCaptureImageFileName+0x21
99c7153c 82a8f2c6 9235b578 000001f8 99c71560 tcpip!WfpCreateProcessNotifyRoutine+0xe3
99c715f4 82a8e5af 92367948 0135b578 99c71650 nt!PspInsertThread+0x5c0
99c71d00 8287b44a 03cce628 03cce604 02000000 nt!NtCreateUserProcess+0x742
99c71d00 774764f4 03cce628 03cce604 02000000 nt!KiFastCallEntry+0x12a
WARNING: Stack unwind information not available. Following frames may be wrong.
03cce948 76592059 00000000 03f26e74 0246da18 ntdll!KiFastSystemCallRet
03cce980 759051e6 03f26e74 0246da18 00000000 kernel32!CreateProcessW+0x2c
03ccea78 75912c74 0002015e 00000000 03f26e74 SHELL32!_SHCreateProcess+0x251
03cceacc 75904fc5 00000001 03f250e8 00000001 SHELL32!CExecuteApplication::_CreateProcess+0xfc
The interesting thing to note here is that tcpip calls this API directly. As you can imagine, tcpip doesn't really have an instance (not being a minifilter), so I was wondering what FLT_INSTANCE it might be using, if any. Well, as you probably expect, it's not really using an instance, it's just passing in NULL. So this is an example of a regular driver (not a file system filter) calling FltGetFileNameInformationUnsafe to get the name of a file. This is actually a pretty neat idea since FltGetFileNameInformationUnsafe uses FltMgr's name cache and so it should perform better than querying the name from the file system every time. Also, this allows a caller to request a normalized name, as opposed to IoQueryFileDosDeviceName() which just gets the FileNameInformation information class directly from a file object. However, the implication is that name provider callbacks in a minifilter will impact more than just other file system filters, because name provider callbacks are used by FltGetFileNameInformationUnsafe(), which is in turn used by the OS.

In my opinion, the "Unsafe" part of the name refers to two things. First, it's not safe to call this API in some cases listed in the MSDN page for it, because doing so might deadlock. FltGetFileNameInformation() actually checks for these cases and won't call into the file system if it's not safe to do so. Second, it should never be called from a minifilter while processing an operation (either during a preOp or a postOp callback). FltGetFileNameInformation is the API to call in all those cases.
To wrap up, these are cases where this function can be called:

  • In a minifilter, but outside the context of an IO operation. For example, in a worker thread or something similar. Also, it can be used if the minifilter requires the name of a FILE_OBJECT that belongs to a different volume, but I have a hard time coming up with such a scenario.
  • In a legacy file system filter, outside the context of an IO operation. For example, in a worker thread or something similar.
  • In a minifilter in the PFLT_GENERATE_FILE_NAME callback, where the CallbackData parameter is null. The CallbackData parameter can only be null if this request originates from a FltGetFileNameInformationUnsafe as well.
  • In a regular driver (not a file system filter) where the driver has a FILE_OBJECT and it needs a normalized name. This is not documented by MS as being supported (in fact, they're being pretty specific about this case) so the supported route would be to call IoQueryFileDosDeviceName(). The only advantages of calling FltGetFileNameInformationUnsafe() instead of IoQueryFileDosDeviceName() are that FltGetFileNameInformationUnsafe() can use FltMgr's cache and that it can return a normalized name. Another edge case might be that IoQueryFileDosDeviceName() is documented as only being available since XP, while FltMgr was available on Win2K and as such FltGetFileNameInformationUnsafe() might be available there as well, though I've not tried it...
And these are cases where it should NOT be called:
  • In a minifilter in the context of an IO operation.
  • In a legacy file system filter in the context of an IO operation.
  • In any of the cases documented on the MSDN page for the function.

Thursday, June 23, 2011

Rename in File System Filters - part II

In this second part about renames in file system filters I'd like to cover what steps need to be taken if a filter wants to issue its own IRP_MJ_SET_INFORMATION request. I'd also like to look at the FastFat WDK implementation and point out some interesting things.

So what does a minifilter need to do if it wants to issue its own IRP_MJ_SET_INFORMATION? It turns out that it's not really that much. FltMgr provides the FltSetInformationFile() API that can be used for this purpose. A quick peek at that function reveals that for rename operations it calls fltmgr!FltpOpenLinkOrRenameTarget, which performs a similar role to IopOpenLinkOrRenameTarget. So it would seem that a minifilter never needs to send an IRP_MJ_SET_INFORMATION request directly.

However, I have run into a case where FltSetInformationFile failed with STATUS_SHARING_VIOLATION on a customer machine, in a scenario I've never been able to reproduce. After some investigation I discovered that fltmgr!FltpOpenLinkOrRenameTarget issues its own create with FILE_SHARE_READ | FILE_SHARE_WRITE and no FILE_SHARE_DELETE. I can't tell for sure why that's the case, but it's consistent with IopOpenLinkOrRenameTarget. In this particular case, which involved an interaction with the CSC driver (the client side caching component in Windows), NtSetInformationFile() worked without my minifilter in the picture while my call to FltSetInformationFile failed. I tried hard to reproduce the problem on my local machine but couldn't, and since there was only one IRP_MJ_CREATE issued I decided to try adding FILE_SHARE_DELETE to see if that fixed the problem (and it did). So I needed to implement my own function that builds a FLT_CALLBACK_DATA structure and sends it to the file system below. These are the necessary steps, which mimic what fltmgr!FltpOpenLinkOrRenameTarget does (the function itself is quite long, but if there is enough interest I'll post it as an example… leave me a private message):

  1. Allocate and initialize the FILE_RENAME_INFORMATION structure.
  2. Allocate and initialize a FLT_CALLBACK_DATA structure.
  3. Call FltCreateFileEx2 (or FltCreateFile depending on the OS version) to open a handle to the target directory. Make sure to use IO_OPEN_TARGET_DIRECTORY here.
  4. Compare the DEVICE_OBJECTs associated with the source and target FILE_OBJECTs (resolve the handle returned by FltCreateFile for that) and fail if they're not the same (STATUS_NOT_SAME_DEVICE).
  5. Set PFILE_RENAME_INFORMATION->RootDirectory to the handle I've just obtained.
  6. Set PFLT_CALLBACK_DATA->Iopb->Parameters.SetFileInformation.ParentOfTarget to the PFILE_OBJECT associated with the handle in PFILE_RENAME_INFORMATION->RootDirectory.
  7. Call FltPerformSynchronousIo( PFLT_CALLBACK_DATA …)
  8. Clean up (close the handles, dereference the FILE_OBJECTs, free any buffers and so on)...

Finally it's time to look at the FastFat WDK sample and see what IO_OPEN_TARGET_DIRECTORY does. I'm looking at the files under the \WinDDK\7600.16385.1\src\filesys\fastfat\Win7\ directory in case you want to follow along. This is interesting because FatSetRenameInfo makes it easy to see that if TargetFileObject is present, the new name for the file is exactly the name of TargetFileObject, and whatever is set in the actual FILE_RENAME_INFORMATION buffer is completely ignored (here is the interesting line):

            NewName = *((PUNICODE_STRING)&TargetFileObject->FileName);

So looking at the FatCommonCreate function to see what it does for SL_OPEN_TARGET_DIRECTORY, it is obvious that FatOpenTargetDirectory is the function where the magic happens. FatOpenTargetDirectory replaces the FILE_OBJECT->FileName with the final component of the path, which explains why FatSetRenameInfo can simply look at TargetFileObject->FileName to get the file name. This is pretty interesting since (as I said before) it means that when the IRP_MJ_SET_INFORMATION IRP is processed by filters, any modifications to FILE_RENAME_INFORMATION->FileName are ignored.
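To make the consequence for filters concrete, here is a small user-mode model of that name selection. This is only a sketch and not FastFat code: TargetFileObjectModel, open_target_directory_model and select_new_name are hypothetical names I made up, narrow strings stand in for UNICODE_STRING, and the fallback to the buffer name when there is no target file object is my assumption about the simple rename case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the FILE_OBJECT opened with
 * SL_OPEN_TARGET_DIRECTORY. Only the name is modeled. */
typedef struct {
    const char *file_name;  /* after the open: final component only */
} TargetFileObjectModel;

/* Models what the post describes FatOpenTargetDirectory doing:
 * FileName ends up holding just the final component of the path. */
void open_target_directory_model(TargetFileObjectModel *target,
                                 const char *full_path)
{
    assert(target != NULL && full_path != NULL);
    const char *last = strrchr(full_path, '\\');
    target->file_name = (last != NULL) ? last + 1 : full_path;
}

/* Models the name selection: when a target file object is present its
 * name wins and the name in the rename buffer is not consulted.
 * The NULL branch is an assumption made for this sketch. */
const char *select_new_name(const TargetFileObjectModel *target,
                            const char *buffer_name)
{
    assert(buffer_name != NULL);
    return (target != NULL) ? target->file_name : buffer_name;
}
```

With this model, a filter that rewrites only the buffer name after the target directory has been opened changes nothing: select_new_name still returns the component captured at open time.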

The next step was to see if the other file systems work in a similar fashion. Unfortunately the other file system that ships with the WDK is CDFS, which doesn't support renames (I guess that's because it was designed to work on CDs, which are read-only and so renames would make no sense). So I took the passthrough sample and modified it a bit so that it would break during a successful postCreate for an operation that had the SL_OPEN_TARGET_DIRECTORY flag set, so that I could investigate what happens with the file name. First let me post the source code for the minifilter (I've just modified PtPreOperationPassThrough and PtPostOperationPassThrough):

FLT_PREOP_CALLBACK_STATUS
PtPreOperationPassThrough (
    __inout PFLT_CALLBACK_DATA Data,
    __in PCFLT_RELATED_OBJECTS FltObjects,
    __deref_out_opt PVOID *CompletionContext
    )
...
{
    NTSTATUS status;

    UNREFERENCED_PARAMETER( FltObjects );

    PT_DBG_PRINT( PTDBG_TRACE_ROUTINES,
                  ("PassThrough!PtPreOperationPassThrough: Entered\n") );

    if ((Data->Iopb->MajorFunction == IRP_MJ_CREATE) &&
        (FlagOn(Data->Iopb->OperationFlags, SL_OPEN_TARGET_DIRECTORY))) {

        //
        // this is an IRP_MJ_CREATE operation for a target of a rename.
        // tell the postCreate callback we'd like to break. Use the 
        // CompletionContext like a BOOLEAN variable.
        //

        *CompletionContext = (PVOID)TRUE;
    }
….


FLT_POSTOP_CALLBACK_STATUS
PtPostOperationPassThrough (
    __inout PFLT_CALLBACK_DATA Data,
    __in PCFLT_RELATED_OBJECTS FltObjects,
    __in_opt PVOID CompletionContext,
    __in FLT_POST_OPERATION_FLAGS Flags
    )
...
{
    UNREFERENCED_PARAMETER( FltObjects );
    UNREFERENCED_PARAMETER( Flags );

    PT_DBG_PRINT( PTDBG_TRACE_ROUTINES,
                  ("PassThrough!PtPostOperationPassThrough: Entered\n") );

    if ((CompletionContext != NULL) &&
        (Data->IoStatus.Status == STATUS_SUCCESS)) {

        DbgBreakPoint();
    }

    return FLT_POSTOP_FINISHED_PROCESSING;
}

Once I had the minifilter in place I fired up FileTest.exe and renamed a file to C:\rename_target_dir\rename_target_file.bin. My plan was that once it broke into the debugger I would poke around and see what the FILE_OBJECT->FileName looks like. It turns out that NTFS follows a similar approach, except that the FILE_OBJECT->FileName for the directory that is opened (when the SL_OPEN_TARGET_DIRECTORY flag is set the CREATE always opens a directory) points to the actual path of the directory. However, that path comes from the original rename target file path, which is simply truncated so that it no longer includes the final component. That hidden final component is then used by the file system as the target of the rename, in a similar fashion to FastFat. To make sure that NTFS actually uses the name in the FileObject, I changed the buffer in the debugger so that the file name would be "rename_target_fi1e.bin".

1: kd> ?? FltObjects->FileObject
struct _FILE_OBJECT * 0x924a18e8
   +0x000 Type             : 0n5
   +0x002 Size             : 0n128
   +0x004 DeviceObject     : 0x92f0ebc8 _DEVICE_OBJECT
   +0x008 Vpb              : 0x92f0b210 _VPB
   +0x00c FsContext        : 0xa5962d08 Void
   +0x010 FsContext2       : 0xb127a610 Void
   +0x014 SectionObjectPointer : (null) 
   +0x018 PrivateCacheMap  : (null) 
   +0x01c FinalStatus      : 0n0
   +0x020 RelatedFileObject : (null) 
   +0x024 LockOperation    : 0 ''
   +0x025 DeletePending    : 0 ''
   +0x026 ReadAccess       : 0 ''
   +0x027 WriteAccess      : 0x1 ''
   +0x028 DeleteAccess     : 0 ''
   +0x029 SharedRead       : 0x1 ''
   +0x02a SharedWrite      : 0x1 ''
   +0x02b SharedDelete     : 0 ''
   +0x02c Flags            : 0
   +0x030 FileName         : _UNICODE_STRING "\rename_target_dir"
   +0x038 CurrentByteOffset : _LARGE_INTEGER 0x0
   +0x040 Waiters          : 0
   +0x044 Busy             : 0
   +0x048 LastLock         : (null) 
   +0x04c Lock             : _KEVENT
   +0x05c Event            : _KEVENT
   +0x06c CompletionContext : (null) 
   +0x070 IrpListLock      : 0
   +0x074 IrpList          : _LIST_ENTRY [ 0x924a195c - 0x924a195c ]
   +0x07c FileObjectExtension : (null) 
1: kd> ?? FltObjects->FileObject->FileName
struct _UNICODE_STRING
 "\rename_target_dir"
   +0x000 Length           : 0x24
   +0x002 MaximumLength    : 0x52
   +0x004 Buffer           : 0xb0796550  "\rename_target_dir"
1: kd> db 0xb0796550 L0x52
b0796550  5c 00 72 00 65 00 6e 00-61 00 6d 00 65 00 5f 00  \.r.e.n.a.m.e._.
b0796560  74 00 61 00 72 00 67 00-65 00 74 00 5f 00 64 00  t.a.r.g.e.t._.d.
b0796570  69 00 72 00 5c 00 72 00-65 00 6e 00 61 00 6d 00  i.r.\.r.e.n.a.m.
b0796580  65 00 5f 00 74 00 61 00-72 00 67 00 65 00 74 00  e._.t.a.r.g.e.t.
b0796590  5f 00 66 00 69 00 6c 00-65 00 2e 00 62 00 69 00  _.f.i.l.e...b.i.
b07965a0  6e 00                                            n.
1: kd> eb b0796596 0x31
1: kd> db 0xb0796550 L0x52
b0796550  5c 00 72 00 65 00 6e 00-61 00 6d 00 65 00 5f 00  \.r.e.n.a.m.e._.
b0796560  74 00 61 00 72 00 67 00-65 00 74 00 5f 00 64 00  t.a.r.g.e.t._.d.
b0796570  69 00 72 00 5c 00 72 00-65 00 6e 00 61 00 6d 00  i.r.\.r.e.n.a.m.
b0796580  65 00 5f 00 74 00 61 00-72 00 67 00 65 00 74 00  e._.t.a.r.g.e.t.
b0796590  5f 00 66 00 69 00 31 00-65 00 2e 00 62 00 69 00  _.f.i.1.e...b.i.
b07965a0  6e 00                                            n.
1: kd> g

After continuing execution the file was renamed to the name that I had changed in the debugger, which validated my theory (well, almost: it was still possible that the associated IRP_MJ_SET_INFORMATION was somehow initialized to use the buffer I had modified through FileObject->FileName, so I debugged further and made sure that's not the case).
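The layout seen in the debugger (Length 0x24 covering only "\rename_target_dir" while the 0x52 byte buffer still holds the whole path) can be modeled in user mode. This is just a sketch under assumptions: CountedString is a simplified stand-in for UNICODE_STRING, truncate_to_parent and hidden_component are hypothetical helpers, and treating everything up to MaximumLength as the hidden component is my simplification of what the dump shows, not a statement about how NTFS finds the end of the name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for UNICODE_STRING; lengths are in bytes. */
typedef struct {
    uint16_t  length;          /* bytes that are "visible" as the name */
    uint16_t  maximum_length;  /* bytes available in buffer */
    uint16_t *buffer;          /* UTF-16 code units, not NUL-terminated */
} CountedString;

/* Shrink the visible name to the parent directory by lowering length.
 * The buffer is left untouched, so the final component stays in memory. */
void truncate_to_parent(CountedString *s)
{
    assert(s != NULL && s->buffer != NULL);
    size_t chars = s->length / sizeof(uint16_t);
    while (chars > 0 && s->buffer[chars - 1] != (uint16_t)'\\') {
        chars--;
    }
    if (chars > 1) {
        chars--;  /* drop the separator unless the parent is the root */
    }
    s->length = (uint16_t)(chars * sizeof(uint16_t));
}

/* Return the component that sits past length, and its size in characters. */
uint16_t *hidden_component(const CountedString *s, size_t *chars_out)
{
    assert(s != NULL && chars_out != NULL);
    size_t start = s->length / sizeof(uint16_t);
    size_t total = s->maximum_length / sizeof(uint16_t);
    if (start < total && s->buffer[start] == (uint16_t)'\\') {
        start++;  /* skip the separator between parent and final component */
    }
    *chars_out = total - start;
    return s->buffer + start;
}
```

Filling the buffer with the 41 characters of \rename_target_dir\rename_target_file.bin and calling truncate_to_parent gives a length of 0x24, matching the dump. The eb command above patched byte offset 0x46 of the buffer, which is character 16 of the hidden component (the 'l' in "file").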

Here are some more things that make renames difficult to deal with in a file system filter (in addition to the list at the end of last post):

  • A file system filter that needs to redirect a rename operation can't just rely on changing the destination name in the FILE_RENAME_INFORMATION buffer for renames where the FILE_RENAME_INFORMATION->RootDirectory is not null, since some file systems ignore that. Instead it needs to make sure that it creates a handle to the parent directory using the IO_OPEN_TARGET_DIRECTORY flag. However a filter must also change the FILE_RENAME_INFORMATION because another filter might rely on that (FltMgr's FltGetDestinationFileNameInformation for example) so data in the buffer and the data that the file system will use must be kept in sync.
  • There are issues with FltSetInformationFile where calling it for a FileRenameInformation will fail because of a sharing violation. If someone has run into this problem and has figured out why it happens, or if they have some steps to reproduce it so that I could investigate it myself, I'd appreciate it if they contacted me offline.

Thursday, June 16, 2011

Rename in File System Filters - part I

Rename is one of the operations that a lot of different classes of minifilters need to handle. I have been surprised by how rename works on a couple of occasions and so I figured it might be worthwhile to talk about how it is implemented and what particular problems it creates for file system filters.

Before we go any further there are some architectural decisions that are important to understand in order to see where some of the peculiarities of the rename operation come from:

  • In general the file system on a volume isn't aware of the other volumes mounted by the same file system. This means that a mounted file system (which I will refer to as a file system volume to differentiate it from a storage volume) never performs operations on a different file system volume (for example the NTFS code running on C: never touches NTFS structures for D: and is generally unaware of the existence of D: altogether). This is a big architectural decision and it has the advantage that it can tremendously simplify implementation. It's easy to imagine how complicated the file system stack would become if a file system might need to take locks and references against a different file system volume for some operations. Because of this, rename operations at the file system level only support the same file system volume, and in fact even the OS doesn't support cross-volume renames (actually this depends on what your definition of the OS is; there is no support at the NT level (which to me is the OS), but Win32 introduces some support for cross-volume renames).
  • File paths are a big part of what a file system does. In fact, maintaining the namespace is the most important function of a file system when considering the fact that file data storage is largely implemented by the storage stack and all the file system has to do is to translate file offsets into storage offsets (to be fair there are cases where the file system does more than that, for example when compressing or encrypting). So clearly a lot of the code in a file system is dedicated to maintaining and operating on the namespace. The one code path that obviously needs to deal with file names is the IRP_MJ_CREATE path (regardless of whether files and directories are opened or created). So naturally a lot of the coding effort goes into implementing and optimizing that path. Once a file is opened, the file system would prefer to not have to deal with the file name at all (and this is indeed how it's implemented). With this in mind it's easy to see that it is best if any operation that deals with names can reuse as much of the CREATE code path as possible. This reuse can take two forms, either by having the OS call IRP_MJ_CREATE every time it needs to pass a file path to the file system and getting back a FILE_OBJECT on which to operate, or by having the file system internally call a lot of the functions that collectively make up the file system create path (which doesn't really happen). This is why a lot of the Win32 APIs take file paths (like DeleteFile()) and then internally convert this into "handle = ZwCreateFile(file_path), internal_operation(handle), ZwClose(handle)".

First let's look at what the APIs are that are usually used to rename a file. The usermode APIs are:

BOOL WINAPI MoveFile(
  __in  LPCTSTR lpExistingFileName,
  __in  LPCTSTR lpNewFileName
);

BOOL WINAPI MoveFileEx(
  __in      LPCTSTR lpExistingFileName,
  __in_opt  LPCTSTR lpNewFileName,
  __in      DWORD dwFlags
);

BOOL WINAPI MoveFileWithProgress(
  __in      LPCTSTR lpExistingFileName,
  __in_opt  LPCTSTR lpNewFileName,
  __in_opt  LPPROGRESS_ROUTINE lpProgressRoutine,
  __in_opt  LPVOID lpData,
  __in      DWORD dwFlags
);

BOOL WINAPI MoveFileTransacted(
  __in      LPCTSTR lpExistingFileName,
  __in_opt  LPCTSTR lpNewFileName,
  __in_opt  LPPROGRESS_ROUTINE lpProgressRoutine,
  __in_opt  LPVOID lpData,
  __in      DWORD dwFlags,
  __in      HANDLE hTransaction
);

These APIs follow a sort of progression, adding more and more options. However, for a file system developer it doesn't really matter much whether the user wants to track the progress of the operation or whether they specify a transaction or not (since at the file system filter's level the transaction is always available if the OS supports transactions). The most important parameters are the existing file name, the destination file name and the flags. Of the flags the most important one is MOVEFILE_COPY_ALLOWED, because it instructs the Win32 layer that if the destination file is on a different volume it should copy the file to the new volume and delete it from the source volume. This flag is usually specified and it's on by default when calling MoveFile (on my Win7 VM MoveFileW() is just a wrapper over MoveFileWithProgressW() where this is the only flag specified).

Now, when it gets to the OS layer, the way to issue a rename is by calling ZwSetInformationFile() with the FileRenameInformation information class, which takes a FILE_RENAME_INFORMATION structure:

NTSTATUS ZwSetInformationFile(
  __in   HANDLE FileHandle,
  __out  PIO_STATUS_BLOCK IoStatusBlock,
  __in   PVOID FileInformation,
  __in   ULONG Length,
  __in   FILE_INFORMATION_CLASS FileInformationClass
);

typedef struct _FILE_RENAME_INFORMATION {
  BOOLEAN ReplaceIfExists;
  HANDLE  RootDirectory;
  ULONG   FileNameLength;
  WCHAR   FileName[1];
} FILE_RENAME_INFORMATION, *PFILE_RENAME_INFORMATION;

The documentation for FILE_RENAME_INFORMATION and for IRP_MJ_SET_INFORMATION is pretty good and explains a lot about the parameters. However, the documentation for ZwSetInformationFile does not mention the FileRenameInformation case, which is actually pretty interesting.
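Since FileName is a variable-length trailing array and FileNameLength is expressed in bytes, sizing the buffer is a common place to slip. Below is a minimal user-mode sketch of the arithmetic. RenameInfoModel is a portable mirror of the structure that I defined for illustration only (driver code would use the WDK definition), and rename_info_size and build_rename_info are hypothetical helpers, not system APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Portable mirror of FILE_RENAME_INFORMATION, for illustration only. */
typedef struct {
    uint8_t  ReplaceIfExists;  /* BOOLEAN */
    void    *RootDirectory;    /* HANDLE, NULL for a fully qualified name */
    uint32_t FileNameLength;   /* length of FileName in bytes */
    uint16_t FileName[1];      /* WCHAR, variable length */
} RenameInfoModel;

/* Bytes needed for a name of name_chars UTF-16 code units. */
size_t rename_info_size(size_t name_chars)
{
    return offsetof(RenameInfoModel, FileName) + name_chars * sizeof(uint16_t);
}

/* Allocate and fill the structure. The caller frees it with free(). */
RenameInfoModel *build_rename_info(const uint16_t *name, size_t name_chars,
                                   int replace_if_exists)
{
    assert(name != NULL);
    size_t size = rename_info_size(name_chars);
    if (size < sizeof(RenameInfoModel)) {
        size = sizeof(RenameInfoModel);  /* never allocate less than the struct */
    }
    RenameInfoModel *info = calloc(1, size);
    if (info == NULL) {
        return NULL;
    }
    info->ReplaceIfExists = replace_if_exists ? 1 : 0;
    info->RootDirectory = NULL;
    info->FileNameLength = (uint32_t)(name_chars * sizeof(uint16_t));
    /* Copy through a byte pointer since the name runs past FileName[1]. */
    memcpy((unsigned char *)info + offsetof(RenameInfoModel, FileName),
           name, name_chars * sizeof(uint16_t));
    return info;
}
```

In this model the value from rename_info_size is what would be passed as the Length parameter, and the name is copied without a terminating NUL because FileNameLength carries the size.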

One thing I mentioned at the beginning of this post is that one file system volume doesn't ever interact with another file system volume. Since the target of a rename can be a full path and since it can point to a different volume, the OS should only send the path down if it is on the same volume. So what happens is that ZwSetInformationFile must validate if the rename is within the same volume before sending the IRP_MJ_SET_INFORMATION request down. This is achieved in the function IopOpenLinkOrRenameTarget, which performs a couple of simple steps:

  1. Check if the FILE_RENAME_INFORMATION->RootDirectory is a user mode handle and if so convert it to a kernel handle by calling IoConvertFileHandleToKernelHandle. This is very important for filter writers because it means that a filter cannot expect that the handle is a kernel handle.
  2. Issue an IoCreateFileEx to open the target of the rename, which is either a full rename or a relative one (depending on whether FILE_RENAME_INFORMATION->RootDirectory is NULL or not). This IoCreateFileEx inherits both the transaction and the DeviceObject hint from the source file object. Also this IRP_MJ_CREATE always has the SL_OPEN_TARGET_DIRECTORY flag set.
  3. If the create succeeds then the DeviceObject for the source FILE_OBJECT is compared with the DeviceObject for the target FILE_OBJECT and if they are different then IopOpenLinkOrRenameTarget returns STATUS_NOT_SAME_DEVICE.
  4. If they are the same then the new FILE_OBJECT (which is always a directory, the parent directory of the path specified by FILE_RENAME_INFORMATION->FileName, and is always obtained from the IRP_MJ_CREATE with the SL_OPEN_TARGET_DIRECTORY flag as explained above) is set into IrpSp->Parameters.SetFile.FileObject. This FILE_OBJECT is then used by the file system to determine the parent directory for the target of the rename in the file system.

Once IopOpenLinkOrRenameTarget returns the only thing that remains for ZwSetInformationFile to do is to send the IRP to the file system.
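Steps 3 and 4 above boil down to a pointer comparison followed by stashing the target directory in the request. Here is a user-mode model of just that part, as a sketch: DeviceModel, FileObjectModel, SetFileParamsModel and attach_rename_target are hypothetical names of mine and do not reflect the actual IopOpenLinkOrRenameTarget implementation. The numeric status values are the ones from ntstatus.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ModelStatus;
#define MODEL_STATUS_SUCCESS         ((ModelStatus)0x00000000u)
#define MODEL_STATUS_NOT_SAME_DEVICE ((ModelStatus)0xC00000D4u)

/* Hypothetical stand-ins for DEVICE_OBJECT and FILE_OBJECT. */
typedef struct { int id; } DeviceModel;
typedef struct { const DeviceModel *device; } FileObjectModel;

/* Stand-in for IrpSp->Parameters.SetFile; only FileObject is modeled. */
typedef struct { const FileObjectModel *file_object; } SetFileParamsModel;

/* Compare the devices of the source file and of the opened target
 * directory. On a match, record the target directory in the request
 * parameters so that the file system can find the parent later. */
ModelStatus attach_rename_target(const FileObjectModel *source,
                                 const FileObjectModel *target_dir,
                                 SetFileParamsModel *params)
{
    assert(source != NULL && target_dir != NULL && params != NULL);
    if (source->device != target_dir->device) {
        params->file_object = NULL;
        return MODEL_STATUS_NOT_SAME_DEVICE;  /* cross-volume: no request is sent */
    }
    params->file_object = target_dir;
    return MODEL_STATUS_SUCCESS;
}
```

The point of the model is the early return: when the devices differ the rename request is never built, which is why a filter only ever sees the IRP_MJ_CREATE in the cross-volume case.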

I have some more things to say about renames that I'll save for next week, but I'd like to close with a list of things that make life really hard for file system developers:

  • The fact that ZwSetInformationFile performs the check of whether a certain rename would be a cross-volume rename before issuing the actual IRP_MJ_SET_INFORMATION means that all a file system filter sees is an IRP_MJ_CREATE with SL_OPEN_TARGET_DIRECTORY, but it cannot know what the source file for this rename is. This is problematic for filters that might need to redirect renames to different locations depending on the source file (an application virtualization filter for example might want to redirect renames for certain files to some other location); such filters need to actually wait until the IRP_MJ_SET_INFORMATION request arrives and change the destination at that point. Another class of filters that is impacted by this are filters that create a virtual namespace in a volume (for example in a directory on a volume) by bringing in contents from another volume. Clearly renames won't work across volumes, but even if the filter is prepared to implement something similar to the MOVEFILE_COPY_ALLOWED flag and copy the file if it is across volumes, it doesn't have the option because an IRP_MJ_SET_INFORMATION will simply not arrive if the target and the source are on different volumes. One feature that I feel would be very helpful here would be to add the source FILE_OBJECT into an ECP on the IRP_MJ_CREATE issued by IopOpenLinkOrRenameTarget, which would allow filters to detect what the source file for the rename is.
  • Another thing that makes things unnecessarily complicated for filter developers is the reuse of the DeviceObject hint in IopOpenLinkOrRenameTarget. This means that anytime a minifilter wants to rename a FILE_OBJECT that it created via FltCreateFile it must use as the FILE_RENAME_INFORMATION->FileName parameter a path on the same device as the FILE_OBJECT that it has created. This isn't that big of a problem since the minifilter must already know the right device, but it might still need to do some file path manipulation to make sure that the path itself is on that device (which can happen if the target of a rename comes from the user and it contains a reparse point). Just something to keep in mind I guess.
  • The presence of a handle (which is possibly a user mode handle) in the FILE_RENAME_INFORMATION structure means that a minifilter that wants to pend a rename operation must take extra steps to make sure that the handle is still valid in the context of the process where the thread handling the pended rename runs. It must also keep all the data in the FILE_RENAME_INFORMATION and in IrpSp->Parameters.SetFile in sync, because it can't make any assumptions about which piece of data another file system filter, or even the file system, might use.
  • The fact that the FILE_OBJECT that was opened by IopOpenLinkOrRenameTarget is used by the file system when processing the rename, and in particular how it is used, also means that filters that want to issue their own rename operation (by directly issuing an IRP or FLT_CALLBACK_DATA) must duplicate the steps the OS takes in order to build a proper request. But more on this next week.