Thursday, March 17, 2011

How File System Filters Attach to Volumes - Part II

And now it's time to take a look at how minifilters fit in the picture. One of the first things to note is that the minifilter model does away completely with CDOs. The only objects that a minifilter interacts with are FLT_VOLUMEs, which are equivalent to VDOs. This is not normally a problem because most operations that are sent to the CDO are not really relevant to file system filters. However, IRP_MN_MOUNT_VOLUME is the one operation minifilters might be interested in, because some minifilters might want to block mounting of a volume. This is where IRP_MJ_VOLUME_MOUNT comes in. This is a virtual IRP (there isn't such an IRP function in the IO manager) that is only relevant to minifilters. When FltMgr receives an IRP_MN_MOUNT_VOLUME on the CDO for a file system, it creates a request of type IRP_MJ_VOLUME_MOUNT and sends it to the minifilters that have registered for that notification. Please note that there are some limitations on what the minifilter can do at this time. The intention is that a minifilter uses this notification only if it wants to block a volume mount, and it must figure out whether the volume needs to be blocked without much help from the file system. Blocking a mount means that the volume will not be mounted at all on the system. It is not the mechanism a minifilter uses to tell FltMgr that it doesn't want to be attached to that volume (that is handled in the InstanceSetupCallback). Here are some things that are special about this operation:
  • The minifilter sees this request before the volume is mounted by the file system (in fact at this point the file system doesn't know anything about the volume, the IRP_MN_MOUNT_VOLUME would be the first notification the file system receives about that volume), so it really can't do any file system operation on that volume.
  • Only the Filter and the Volume members of the FLT_OBJECTS structure are set up.
  • The IO manager has locks held at this point and any file system IO to the same volume or even a different volume that might not be mounted at this time might deadlock. Block level IO to the volume should work though.
  • This is listed as a FAST_IO operation, though in fact it's not. Don't return FLT_PREOP_DISALLOW_FASTIO to it. Don't return FLT_PREOP_PENDING either. You must either return FLT_PREOP_SUCCESS_WITH_CALLBACK, FLT_PREOP_SUCCESS_NO_CALLBACK or FLT_PREOP_COMPLETE (if you want to prevent the mount from reaching the file system CDO, in which case the status must not be STATUS_SUCCESS).
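The return-value rules in the last bullet can be sketched as a small user-mode model in C. This is only an illustration under stated assumptions: every `model_` / `MODEL_` name is a hypothetical stand-in so the snippet compiles outside the WDK, and the only real value in it is 0xC0000022 (STATUS_ACCESS_DENIED). A real callback would use the FltMgr pre-operation signature and set the status in the callback data rather than through a plain pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for NTSTATUS and the FLT_PREOP_* return values. */
typedef int32_t model_ntstatus;
#define MODEL_STATUS_SUCCESS       ((model_ntstatus)0)
#define MODEL_STATUS_ACCESS_DENIED ((model_ntstatus)0xC0000022) /* real NTSTATUS value */

typedef enum {
    MODEL_PREOP_SUCCESS_WITH_CALLBACK,
    MODEL_PREOP_SUCCESS_NO_CALLBACK,
    MODEL_PREOP_COMPLETE
} model_preop_status;

/*
 * Model of a pre-operation callback for IRP_MJ_VOLUME_MOUNT.
 * block_mount is the policy decision the filter has already made,
 * using only information it can get without file system IO.
 */
model_preop_status
model_pre_volume_mount(bool block_mount, model_ntstatus *io_status)
{
    assert(io_status != NULL);

    if (block_mount) {
        /* Completing the request keeps it from reaching the file
         * system CDO, so the status must be a failure code. */
        *io_status = MODEL_STATUS_ACCESS_DENIED;
        return MODEL_PREOP_COMPLETE;
    }

    /* Let the mount continue. The model never pends the request and
     * never treats it as fast IO, matching the rules above. */
    return MODEL_PREOP_SUCCESS_NO_CALLBACK;
}
```

Calling `model_pre_volume_mount(true, &status)` leaves a failure code in `status` and returns the "complete" value, while `model_pre_volume_mount(false, &status)` leaves `status` untouched and lets the request continue.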
This is what it looks like when the request reaches a minifilter (the passthrough sample in my case). Things to note are the fields in FltObjects and the fact that the volume isn't initialized yet (I've highlighted them):
1: kd> kn
 # ChildEBP RetAddr  
00 9960b8c4 9604319a PassThrough!PtPreOperationPassThrough+0x3c [c:\temp\passthrough\passthrough.c @ 675]
01 9960b930 960489ec fltmgr!FltpPerformPreMountCallbacks+0x1d0
02 9960b998 96048c5b fltmgr!FltpFsControlMountVolume+0x116
03 9960b9c8 828454bc fltmgr!FltpFsControl+0x5b
04 9960b9e0 829c102d nt!IofCallDriver+0x63
05 9960ba44 828a5424 nt!IopMountVolume+0x1d8
06 9960ba7c 82a48f9f nt!IopCheckVpbMounted+0x64
07 9960bb60 82a2a26b nt!IopParseDevice+0x7c9
08 9960bbdc 82a502d9 nt!ObpLookupObjectName+0x4fa
09 9960bc38 82a4862b nt!ObOpenObjectByName+0x165
0a 9960bcb4 82a53f42 nt!IopCreateFile+0x673
0b 9960bd00 8284c44a nt!NtCreateFile+0x34
0c 9960bd00 778464f4 nt!KiFastCallEntry+0x12a
1: kd> ?? FltObjects
struct _FLT_RELATED_OBJECTS * 0x9960b8e8
   +0x000 Size             : 0x18
   +0x002 TransactionContext : 0
   +0x004 Filter           : 0x9299d678 _FLT_FILTER
   +0x008 Volume           : 0x9297fad8 _FLT_VOLUME
   +0x00c Instance         : (null) 
   +0x010 FileObject       : (null) 
   +0x014 Transaction      : (null) 
1: kd> !fltkd.volume 0x9297fad8 
FLT_VOLUME: 9297fad8 "\Device\Harddisk0\DR0"
   FLT_OBJECT: 9297fad8  [04000000] Volume
      RundownRef               : 0x00000002 (1)
      PointerCount             : 0x00000001 
      PrimaryLink              : [92cdaa74-92cdaa74] 
   Frame                    : 92cda9c8 "Frame 0" 
   Flags                    : [00000008] Mounting
   FileSystemType           : [00000001] FLT_FSTYPE_RAW
   VolumeLink               : [92cdaa74-92cdaa74] 
   DeviceObject             : 926b16d8 
   DiskDeviceObject         : 92f036e8 
   FrameZeroVolume          : 00000000 
   VolumeInNextFrame        : 00000000 
   Guid                     : "" 
   CDODeviceName            : "\Device\RawDisk" 
   CDODriverName            : "\FileSystem\RAW" 
   TargetedOpenCount        : 0 
   Callbacks                : (9297fb6c)
   ContextLock              : (9297fdc4)
   VolumeContexts           : (9297fdc8)  Count=0
   StreamListCtrls          : (9297fdcc)  rCount=0 
   FileListCtrls            : (9297fe10)  rCount=0 
   NameCacheCtrl            : (9297fe58)
   InstanceList             : (9297fb28)
The next thing to talk about is the InstanceSetupCallback. This is the callback that gets called when a new instance gets created. This callback allows the minifilter to decide whether it needs to attach to the volume and to set up its internal state for that volume. The interesting thing about this is understanding when it gets called. One important factor in the decision is that the minifilter needs to be able to perform operations on the file system in its InstanceSetupCallback (like opening a file for example; see the MetadataManager minifilter sample in the WDK for a minifilter that does that). Let's look at some of the factors that would impact the decision about when the notification needs to be called:
  • The IO manager needs to be able to process operations on the volume. This means that the mount must be completed, because FltCreateFile (and most other operations) would go to the IO manager and if the IO manager doesn't know that the mount is completed it will block the operation behind the mount. So clearly, the InstanceSetupCallback could not have been called anywhere during IRP_MN_MOUNT_VOLUME processing.
  • The InstanceSetupCallback must be called before ANY other callback is sent to the minifilter on that volume, which means it can't be asynchronous with other operations because for asynchronous operations, the order in which they reach various layers is impossible to guarantee. So all operations above a certain layer must be blocked until InstanceSetupCallback for that layer is completed.
  • Because minifilters need to be able to perform IO on the file system in their InstanceSetupCallback, the filters below them need to see that operation (since minifilters should be able to filter all operations on a volume). This means that the InstanceSetupCallback must have already been called for all the minifilters below (otherwise they wouldn't be able to process operations).
So when considering all these factors we arrive at the current implementation. When any operation reaches FltMgr on a volume, if InstanceSetupCallbacks have not been called yet, FltMgr will block that operation and call the InstanceSetupCallbacks for all the minifilters, starting from the lowest one and going up (where up means higher altitude numbers). After all the InstanceSetupCallbacks are complete FltMgr will release the lock and IO can proceed normally. Let's take a look in the debugger and see these effects. Things to note are how we're still in the context of the NtCreateFile request where we were before. Also, please note that the FileInfo minifilter is below PassThrough and it's already set up:
1: kd> kn
 # ChildEBP RetAddr  
00 9960b880 96049bf5 PassThrough!PtInstanceSetup [c:\temp\passthrough\passthrough.c @ 393]
01 9960b8b4 9604a417 fltmgr!FltpDoInstanceSetupNotification+0x69
02 9960b900 9604a7d1 fltmgr!FltpInitInstance+0x25d
03 9960b970 9604a8d7 fltmgr!FltpCreateInstanceFromName+0x285
04 9960b9dc 96053cde fltmgr!FltpEnumerateRegistryInstances+0xf9
05 9960ba2c 960487f4 fltmgr!FltpDoFilterNotificationForNewVolume+0xe0
06 9960ba70 828454bc fltmgr!FltpCreate+0x206
07 9960ba88 82a496ad nt!IofCallDriver+0x63
08 9960bb60 82a2a26b nt!IopParseDevice+0xed7
09 9960bbdc 82a502d9 nt!ObpLookupObjectName+0x4fa
0a 9960bc38 82a4862b nt!ObOpenObjectByName+0x165
0b 9960bcb4 82a53f42 nt!IopCreateFile+0x673
0c 9960bd00 8284c44a nt!NtCreateFile+0x34
1: kd> !fltkd.volume 0x9297fad8 
FLT_VOLUME: 9297fad8 "\Device\Harddisk0\DR0"
   FLT_OBJECT: 9297fad8  [04000000] Volume
      RundownRef               : 0x00000006 (3)
      PointerCount             : 0x00000001 
      PrimaryLink              : [92cdaa68-924ea7f4] 
   Frame                    : 92cda9c8 "Frame 0" 
   Flags                    : [00000066] PendingSetupNotify SetupNotifyCalled EnableNameCaching FilterAttached
   FileSystemType           : [00000001] FLT_FSTYPE_RAW
   VolumeLink               : [92cdaa68-924ea7f4] 
   DeviceObject             : 926b16d8 
   DiskDeviceObject         : 92f036e8 
   FrameZeroVolume          : 9297fad8 
   VolumeInNextFrame        : 00000000 
   Guid                     : "" 
   CDODeviceName            : "\Device\RawDisk" 
   CDODriverName            : "\FileSystem\RAW" 
   TargetedOpenCount        : 0 
   Callbacks                : (9297fb6c)
   ContextLock              : (9297fdc4)
   VolumeContexts           : (9297fdc8)  Count=0
   StreamListCtrls          : (9297fdcc)  rCount=0 
   FileListCtrls            : (9297fe10)  rCount=0 
   NameCacheCtrl            : (9297fe58)
   InstanceList             : (9297fb28)
      FLT_INSTANCE: 92979b40 "PassThrough Instance" "370030"
      FLT_INSTANCE: 923f2dc8 "FileInfo" "45000"
Before we go on I'd like to recap the sequence of events during a mount:
  1. A request to open a file (that ultimately arrives in IopCreateFile) is sent to a volume that is not mounted yet.
  2. During the IopParseDevice call the IO manager discovers that the volume isn't mounted so it tries to mount it. It does this behind a lock so multiple requests would be queued here until the mount completes.
  3. The IO manager sends the IRP_MJ_FILE_SYSTEM_CONTROL with IRP_MN_MOUNT_VOLUME request to the CDO of each file system.
  4. FltMgr is attached to each CDO and so it gets this request and sends the IRP_MJ_VOLUME_MOUNT request to minifilters.
  5. If no minifilters blocked the request, FltMgr sends the request to the FS CDO below.
  6. The FS creates a VDO and the volume is mounted.
  7. The IO manager knows the volume is mounted and it releases all the operations blocked behind that volume mount.
  8. FltMgr gets all these operations and it discovers that the topmost instance on the volume hasn't been initialized yet, so it blocks all the operations behind a lock again.
  9. It then calls InstanceSetupCallback for the lowest minifilter in the stack, then for the one above it, and then for the one above that one and so on. Please note that this notification happens in the context of whichever thread happens to win the race for the lock, so if there is more than one thread trying to perform an operation on a volume, it's possible that the IRP_MJ_VOLUME_MOUNT callback is called in the context of one thread and InstanceSetupCallback in the context of a different one.
  10. Once all the instances have been set up, FltMgr allows all operations to continue and the initialization of the volume is now complete.
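Steps 8 to 10 can be sketched as a single-threaded user-mode model in C. This is a simplification under loud assumptions: all names are hypothetical, there is no real lock (the `setup_notify_called` flag stands in for it), and altitudes are modeled as integers even though real altitudes are decimal strings. The instance names and altitudes used in the usage note are the ones from the debugger output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_INSTANCES 8

typedef struct {
    const char   *name;
    unsigned long altitude;    /* simplified: real altitudes are strings */
    bool          setup_done;
} model_instance;

typedef struct {
    model_instance instances[MODEL_MAX_INSTANCES];
    size_t         count;
    bool           setup_notify_called;
    const char    *setup_order[MODEL_MAX_INSTANCES]; /* records call order */
    size_t         setup_calls;
} model_volume;

/* qsort comparator: ascending altitude, so the lowest instance comes first. */
static int model_compare_altitude(const void *a, const void *b)
{
    const model_instance *x = a;
    const model_instance *y = b;
    return (x->altitude > y->altitude) - (x->altitude < y->altitude);
}

/*
 * Called for every operation that reaches the volume. The first caller
 * runs instance setup for all instances, lowest altitude first; later
 * callers find the flag set and go straight through.
 */
void model_volume_operation(model_volume *vol)
{
    assert(vol != NULL && vol->count <= MODEL_MAX_INSTANCES);

    if (!vol->setup_notify_called) {
        qsort(vol->instances, vol->count, sizeof vol->instances[0],
              model_compare_altitude);
        for (size_t i = 0; i < vol->count; i++) {
            vol->instances[i].setup_done = true;
            vol->setup_order[vol->setup_calls++] = vol->instances[i].name;
        }
        vol->setup_notify_called = true;
    }
    /* From here on the operation would be sent to the instances normally. */
}
```

With "PassThrough Instance" at 370030 and "FileInfo" at 45000, the first call to `model_volume_operation` sets up FileInfo and then PassThrough, and a second call performs no further setup.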
Finally I'd like to talk about a couple of deadlocks that I've seen and some design decisions to avoid.
  • One interesting deadlock happened with a minifilter that blocked preCreate and called a user mode service to scan the file (like an anti-virus). When another minifilter above that one tried to create a file in its InstanceSetupCallback (it actually was the MetadataManager sample), this minifilter blocked that create and sent it to the user mode service. The user mode service tried to open the file to scan it but it was blocked in FltMgr because the instance setup phase wasn't complete so all top level IO was blocked. Alternative approaches that would have avoided that deadlock would have been to scan in postCreate (which is what most such filters do) or to use a private communication channel with the user mode service to ensure that all IO issued by the user mode service is layered properly.
  • Another interesting deadlock can happen with the registry. As you can tell from the stack above, FltMgr needs to read minifilter configuration information from the registry (see that call to fltmgr!FltpEnumerateRegistryInstances). However, the registry has some very complicated locking rules and so if it happens that the registry is locked when FltMgr needs to read its configuration, FltMgr will wait for it. In one case I've seen, a driver (not a minifilter) was calling ZwLoadKey() for a file on a different volume from the system volume. The volume wasn't mounted so inside the ZwLoadKey() call the registry would acquire a lock, try to open the file, which resulted in a mount and then FltMgr tried to check the registry for any minifilter instances and it got blocked behind the registry lock. One possible solution in this case would be to make sure that the volume is mounted before calling ZwLoadKey(). Please note that this might happen in many cases: any operation that ties a registry operation to a file system operation can potentially deadlock. For example, a registry filter that tries to log operations to a file might also cause the same deadlock if the volume containing the log file hasn't arrived yet.
  • Another pretty well known deadlock happens in InstanceSetup with the MountMgr. It is described in great detail here: http://www.osronline.com/showThread.cfm?link=90003, so I won't repeat it. This should be fixed in Vista and Win7.
I hope this post has been useful in explaining how instances get created on a volume and how they might sometimes deadlock and what to look for when such deadlocks occur.

Thursday, March 10, 2011

How File System Filters Attach to Volumes - Part I

I want to talk a bit about how FltMgr attaches to volumes and how instances are created when a new volume arrives. I want to use that as the basis to talk about what minifilters can do in their InstanceSetup callback. This should also explain some possible deadlocks in that path and emphasize the point that doing things in postCreate is preferable to preCreate. I also want to talk about IRP_MJ_VOLUME_MOUNT and how it works and why it's there. I was going to write just one post but it's too long already and I'm not done so I'll split it in a couple of posts...

I'll start with a refresher on how file systems mount volumes and how legacy file system filters attach to file systems. When a file system driver is initialized it creates what is called a Control Device Object (CDO). It can create more than one of those (look at the FastFat WDK sample for an example of a file system creating more than one CDO). The reason the file system needs to do that is that it must register a device with the IO manager when it tells the IO manager that it is a file system (by calling IoRegisterFileSystem and passing in the CDO(s)). Please note that this mechanism predates PNP and as you can see it is very different. These CDOs are named device objects and their purpose is to receive commands for the file system. One such command is the IRP_MJ_FILE_SYSTEM_CONTROL with the IRP_MN_MOUNT_VOLUME minor code (which I'll just refer to as IRP_MN_MOUNT_VOLUME from now on since IRP_MN_MOUNT_VOLUME is only delivered through an IRP_MJ_FILE_SYSTEM_CONTROL and there is no possibility of confusion), which is sent by the IO manager when it wants to mount a volume. One possible sequence of operations is this:

  1. Volume DEVICE_OBJECT is created, usually by the volume manager, with a name like "\Device\HarddiskVolume2".
  2. The volume manager alerts the system of the arrival of the volume by calling IoRegisterDeviceInterface() with the MOUNTDEV_MOUNTED_DEVICE_GUID or GUID_DEVINTERFACE_VOLUME (which are in fact the same GUID). This alerts MountMgr that a volume has arrived.
  3. MountMgr queries the volume for the name and sets up the NT volume name (which looks like "\\?\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}") and the DOS volume name (which might look like "C:"). Both these names point to the volume device ("\Device\HarddiskVolume2").
  4. At this point the volume is not mounted and it has a VPB structure associated with it that keeps track of that.
  5. After a while someone issues an operation to the volume (like trying to open "C:\foo.txt", or "\\?\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}\foo.txt" or "\Device\HarddiskVolume2\foo.txt", which are different names for the same thing). While trying to issue the IRP_MJ_CREATE, IO manager will check if the volume is mounted and if not it will mount it (nt!IopCheckVpbMounted). See my post "About IRP_MJ_CREATE and minifilter design considerations - Part II" and look at step 2 in my steps for nt!IopParseDevice.
  6. If the volume is not mounted in nt!IopCheckVpbMounted then the IO manager calls nt!IopMountVolume, which walks through the registered file systems for that device type (hence the need for more than one CDO) and sends the IRP_MN_MOUNT_VOLUME request to each of the devices on the list of registered file systems (which is a list of CDOs).
  7. When a file system receives an IRP_MN_MOUNT_VOLUME it checks whether it can mount the volume (it reads some sectors and does whatever it needs to do to figure out whether this is its volume) and then it creates a new DEVICE_OBJECT (anonymous this time) which is called a Volume Device Object (VDO), which is linked through the VPB to the actual volume DEVICE_OBJECT (the one that has a name and a drive letter).
  8. Once nt!IopCheckVpbMounted completes and a volume is mounted, nt!IopParseDevice continues and an IRP_MJ_CREATE is sent to the newly mounted volume, which is the first operation that the file system processes on that VDO.
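Steps 6 and 7 can be sketched as a small user-mode model in C. It is only a sketch: all `model_` names are hypothetical, recognition of a volume is reduced to a string comparison on a made-up `signature`, and the model simply reports failure if no registered file system claims the volume. The value 0x1 for the mounted flag is the real VPB_MOUNTED value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_VPB_MOUNTED 0x1u   /* same value as the real VPB_MOUNTED flag */

typedef struct {
    const char *name;            /* NULL for an anonymous device such as a VDO */
} model_device;

typedef struct {
    unsigned      flags;
    model_device *device_object; /* the VDO; NULL until a file system mounts */
    model_device *real_device;   /* the named storage volume device */
} model_vpb;

typedef struct {
    const char  *name;           /* e.g. "Ntfs" */
    const char  *signature;      /* hypothetical: what this file system recognizes */
    model_device vdo;            /* storage for the VDO it creates on mount */
} model_file_system;

/*
 * Model of the mount walk: offer the volume to each registered file
 * system in turn. The first one that recognizes it creates an anonymous
 * VDO, which is linked to the storage volume through the VPB.
 */
bool model_mount_volume(model_vpb *vpb, const char *volume_signature,
                        model_file_system *registered, size_t count)
{
    assert(vpb != NULL && volume_signature != NULL);

    if (vpb->flags & MODEL_VPB_MOUNTED) {
        return true;                      /* already mounted, nothing to do */
    }
    for (size_t i = 0; i < count; i++) {
        if (strcmp(registered[i].signature, volume_signature) == 0) {
            registered[i].vdo.name = NULL;            /* VDOs are anonymous */
            vpb->device_object = &registered[i].vdo;
            vpb->flags |= MODEL_VPB_MOUNTED;
            return true;
        }
        /* Not recognized: this file system fails the request, try the next. */
    }
    return false;
}
```

After a successful call the VPB has the mounted flag set and its `device_object` points at the anonymous VDO, while `real_device` still points at the named volume device.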
Another way to look at this is that the CDO device functions as a factory for file system instances, and the IRP_MN_MOUNT_VOLUME is a request for the factory to generate an instance associated with the storage volume DEVICE_OBJECT, which will either fail if the file system doesn't recognize the volume or will return the file system VDO, which is the file system instance for that volume. Here is some debugger output to illustrate all this. In order to generate all this I took a 32bit Win7 and rebooted it and put a breakpoint on nt!IopMountVolume (that's why NTFS has no volumes and just a CDO). I'm showing this mainly to showcase some more windbg commands that are useful when debugging file systems:
This is NTFS initialized, with just one DEVICE_OBJECT, the CDO. Also please note how the CDO is a named device:
0: kd> !drvobj NTFS
Driver object (924d5758) is for:
 \FileSystem\Ntfs
Driver Extension List: (id , addr)

Device Object list:
93215638  
0: kd> !devobj 93215638  
Device object (93215638) is for:
 Ntfs \FileSystem\Ntfs DriverObject 924d5758
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl 973af50c DevExt 00000000 DevObjExt 932156f0 
ExtensionFlags (0x00000800)  
                             Unknown flags 0x00000800
AttachedDevice (Upper) 93211020 \FileSystem\FltMgr
Device queue is not busy.
This is what the stack looks like when IopMountVolume is called. Please note that volsnap is opening a file on a volume. Also, note how the DeviceObject member of the VPB is null (since no file system is mounted on the volume), and the VPB flags are also all clear:
0: kd> kb
ChildEBP RetAddr  Args to Child              
984b18cc 828ad424 934cd768 924d7a00 00000000 nt!IopMountVolume
984b1904 82a50f9f 924d7a48 984b1a30 984b19c8 nt!IopCheckVpbMounted+0x64
984b19e8 82a3226b 934cd768 844d6f78 924f55e8 nt!IopParseDevice+0x7c9
984b1a64 82a582d9 00000000 984b1ab8 00000240 nt!ObpLookupObjectName+0x4fa
984b1ac4 82a5062b 984b1c44 924d6f78 93b15900 nt!ObOpenObjectByName+0x165
984b1b40 82a8b67e 984b1c90 00120089 984b1c44 nt!IopCreateFile+0x673
984b1b88 8285444a 984b1c90 00120089 984b1c44 nt!NtOpenFile+0x2a
984b1b88 828527c1 984b1c90 00120089 984b1c44 nt!KiFastCallEntry+0x12a
984b1c18 969b0414 984b1c90 00120089 984b1c44 nt!ZwOpenFile+0x11
984b1c94 969b9194 934d60d8 00000000 00000000 volsnap!VspOpenControlBlockFile+0x108
984b1d1c 969b9eea 934d60d8 935775ac 934c78bc volsnap!VspOpenFilesAndValidateSnapshots+0x2e
984b1d34 969a5e59 935775a8 00000000 93500020 volsnap!VspSetIgnorableBlocksInBitmapWorker+0x40
984b1d50 82a1f6d3 934c79ac 432d39b1 00000000 volsnap!VspWorkerThread+0x83
984b1d90 828d10f9 969a5dd6 934cd6a0 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19
0: kd> !obja 984b1c44 
Obja +984b1c44 at 984b1c44:
 Name is \Device\HarddiskVolume2\System Volume Information\{3808876b-c176-4e48-b7ae-04046e6cc752}
 OBJ_CASE_INSENSITIVE
0: kd> !devobj \Device\HarddiskVolume2
Device object (934cd768) is for:
 HarddiskVolume2 \Driver\volmgr DriverObject 93b00388
Current Irp 00000000 RefCount 1 Type 00000007 Flags 00003150
Vpb 934cb290 Dacl 973af50c DevExt 934cd820 DevObjExt 934cd908 Dope 934cab20 DevNode 934cfc48 
ExtensionFlags (0x00000800)  
                             Unknown flags 0x00000800
AttachedDevice (Upper) 934d0b70 \Driver\fvevol
Device queue is not busy.
0: kd> !vpb 934cb290 
Vpb at 0x934cb290
Flags: 0x0 
DeviceObject: 0x00000000
RealDevice:   0x934cd768
RefCount: 0
Volume Label: 
Next we're going to step out of this function and look at the objects again. There is a new, anonymous DEVICE_OBJECT that NTFS created, which is pointed to by VPB->DeviceObject, and the VPB flags have changed to indicate that the volume is mounted.
1: kd> gu
nt!IopCheckVpbMounted+0x64:
828ad424 8b4d10          mov     ecx,dword ptr [ebp+10h]
0: kd> gu
nt!IopParseDevice+0x7c9:
82a50f9f 8945c4          mov     dword ptr [ebp-3Ch],eax
0: kd> !drvobj NTFS
Driver object (924d5758) is for:
 \FileSystem\Ntfs
Driver Extension List: (id , addr)

Device Object list:
93690020  93215638  
0: kd> !devobj 93690020  
Device object (93690020) is for:
  \FileSystem\Ntfs DriverObject 924d5758
Current Irp 00000000 RefCount 0 Type 00000008 Flags 00040000
DevExt 936900d8 DevObjExt 93690fb0 
ExtensionFlags (0x00000800)  
                             Unknown flags 0x00000800
AttachedDevice (Upper) 93566c08 \FileSystem\FltMgr
Device queue is not busy.
0: kd> !vpb 934cb290 
Vpb at 0x934cb290
Flags: 0x1 mounted 
DeviceObject: 0x93690020
RealDevice:   0x934cd768
RefCount: 15
Volume Label: 
Filters have largely been out of the picture so far (except for the fact that FltMgr was attached both to NTFS' CDO and the newly created VDO). So let's talk about how legacy filters (FltMgr being a legacy filter) enter this picture. When NTFS calls IoRegisterFileSystem, FltMgr creates and attaches a DEVICE_OBJECT of its own on top of NTFS. So FltMgr will have a device attached to all CDOs. Then, when an IRP_MN_MOUNT_VOLUME request arrives on that CDO, FltMgr creates a new DEVICE_OBJECT (that will be attached to the VDO created by the file system if the mount is successful or discarded if the mount is not successful) and then it simply passes the IRP_MN_MOUNT_VOLUME request below. Please note that FltMgr can't know in advance if the file system will actually mount the volume or not, so it must wait until the IRP_MN_MOUNT_VOLUME is completed to do more significant work. However, if it waited for the completion of IRP_MN_MOUNT_VOLUME before allocating the new DEVICE_OBJECT, it might end up in the position where the mount was successful but allocating the new DEVICE_OBJECT failed so it wouldn't be able to attach to the volume. The only reason I'm mentioning this is to illustrate that the safe approach when filtering something is to pre-allocate all resources that might be necessary (and perform all checks) before the operation is sent to the layer below (and if anything fails then fail the operation), because if the layer below successfully completes the operation the filter must not fail in processing it or it might end up in a broken state. Alternatively it might have to undo the operation performed at the underlying layer, which might not be easy or even possible.
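The pre-allocation pattern described above can be sketched in user-mode C. This is a minimal model under stated assumptions, not FltMgr's actual code: all names are hypothetical, `malloc` stands in for creating the filter's DEVICE_OBJECT, and the lower layer is a function pointer that returns the new VDO or NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    void *attached_to;   /* the VDO this filter device sits on */
} model_filter_device;

/* Lower layer: returns the new VDO on success, NULL if the mount failed. */
typedef void *(*model_lower_mount_fn)(void *volume);

/*
 * Returns the attached filter device, or NULL if the mount did not
 * happen. *mount_sent reports whether the lower layer saw the request.
 */
model_filter_device *
model_filter_mount(model_lower_mount_fn lower, void *volume, bool *mount_sent)
{
    assert(lower != NULL && mount_sent != NULL);
    *mount_sent = false;

    /* 1. Pre-allocate everything that could fail. If this fails, the
     *    operation is failed before the lower layer changes any state. */
    model_filter_device *dev = malloc(sizeof *dev);
    if (dev == NULL) {
        return NULL;
    }
    dev->attached_to = NULL;

    /* 2. Only now pass the request down. */
    *mount_sent = true;
    void *vdo = lower(volume);
    if (vdo == NULL) {
        free(dev);           /* mount failed: discard the spare device */
        return NULL;
    }

    /* 3. Post-processing has nothing left that can fail. */
    dev->attached_to = vdo;
    return dev;
}

/* Two stub lower layers for trying the model out. */
void *model_lower_accepts(void *volume) { return volume; }
void *model_lower_rejects(void *volume) { (void)volume; return NULL; }
```

The ordering is the point of the design: the only failure that can happen after the lower layer succeeds has been moved in front of the call, so there is never a mounted volume that the filter failed to attach to.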
The key things to remember from this post are:
  • The drive letter (DOS name) and other volume names (NT name) are not associated with the file system device, but rather with the storage volume.
  • Mounting the volume happens on first access to that volume.
  • Also, the first IO on a volume is an IRP_MJ_CREATE, so for a filter (both legacy and minifilter) the preCreate callback will be the first operation callback called on a newly mounted file system volume.

Thursday, March 3, 2011

Duplicating User Mode Handles

Among the many new verifier checks in Win7 is a particular one about using user handles in kernel mode. I won't go into the details of why that is potentially bad and instead I'll focus on how to work around the issue. However I'd like to point to some documentation explaining the improvements in Driver Verifier in Win7 in general and this check in particular. There is a PPT "Driver Verifier Advancements In Windows 7" that is pretty good for a high-level view and there is also a more detailed document "Driver Verifier in Windows 7". The verifier bugcheck message in this case is this:

DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught.  This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 000000f6, Referencing user handle as KernelMode.
Arg2: xxxxxxxx, Handle value being referenced.
Arg3: xxxxxxxx, Address of the current process.
Arg4: xxxxxxxx, Address inside the driver that is performing the incorrect reference.
Before we go into more detail I'd also like to explain what I was trying to do in one case where I ran into this issue. I was writing a driver that had a user mode command line utility that was used to send commands to the driver. One of these commands required sending a user mode file to the driver so it could write logging information into that file. One approach could be to pass in the name of the file and use the driver to open the file, but this is not trivial for various reasons:

  • How to get the file name? The user might call the command line utility with a relative path (like "foo.exe -file ..\bar.txt") and so I need to figure out either the full path or to send the current directory path to the driver (yuck!).
  • Even using a full path wouldn't be enough because the drive letter might be different depending on the session the user is in. Besides, who knows what a path really points to? Some followers of this blog might know how much I dislike file names and how I try to avoid them.
  • The user might not actually have access to write to that file but the kernel would so I would have to impersonate the user before trying to create the file.
So anyway my decision was to open the file in user mode and then call the driver and tell it the handle to the file and let the driver figure out what the object is and how to use it. However, what I wanted was to use the ZwWriteFile API to write to the file and so I needed a handle to that object. Looking at the OB APIs it's easy to see that we could simply call ObReferenceObjectByHandle followed by ObOpenObjectByPointer to create the new kernel handle. This is what my code looked like:
        status = ObReferenceObjectByHandle( ioctlBuffer->UserHandle,
                                            FILE_READ_DATA | FILE_WRITE_DATA | SYNCHRONIZE | STANDARD_RIGHTS_READ | FILE_READ_ATTRIBUTES,
                                            *IoFileObjectType,
                                            UserMode,
                                            &userFileObject,
                                            NULL );
 
        if (!NT_SUCCESS(status)) {
 
            __leave;
        }
 
        ASSERT(FlagOn( userFileObject->Flags, FO_HANDLE_CREATED ) && 
                   !FlagOn( userFileObject->Flags, FO_CLEANUP_COMPLETE ));
 
        status = ObOpenObjectByPointer( userFileObject,
                                        OBJ_KERNEL_HANDLE,
                                        NULL,
                                        0,//FILE_READ_DATA | FILE_WRITE_DATA,
                                        *IoFileObjectType,
                                        KernelMode,
                                        &kernelFileHandle);
This worked pretty well for local files but when I tried to open a file that was on a remote file system it failed to open the kernel handle with STATUS_ACCESS_DENIED. I spent some time tracing through the code and what I found was that ObOpenObjectByPointer in this case always ends up sending an IRP_MJ_QUERY_SECURITY request to the file system. Moreover, this request seemed to always ask for all the security information (which you can see if you disassemble nt!ObpGetObjectSecurity and look at how it sets the SecurityInformation; on my Win7 it's a "mov dword ptr [xxx],1Fh"):
#define OWNER_SECURITY_INFORMATION       (0x00000001L)
#define GROUP_SECURITY_INFORMATION       (0x00000002L)
#define DACL_SECURITY_INFORMATION        (0x00000004L)
#define SACL_SECURITY_INFORMATION        (0x00000008L)
#define LABEL_SECURITY_INFORMATION       (0x00000010L)
 
3: kd> dt 0xfffff9801a3d8fb8 nt!_IO_STACK_LOCATION Parameters.QuerySecurity.
   +0x008 Parameters                : 
      +0x000 QuerySecurity             : 
         +0x000 SecurityInformation       : 0x1f
         +0x008 Length                    : 0x100
However, while this works well on the local system, it almost always fails over SMB. Looking at the page "2.2.1.3 SECURITY_INFORMATION", there is this table that describes what the caller needs in order to be able to read various information types. For SACL_SECURITY_INFORMATION we see that in fact READ_CONTROL is not enough and that a certain privilege is required. This privilege is unlikely to be granted to any client of the server and so in the general case the IRP_MJ_QUERY_SECURITY issued by nt!ObpGetObjectSecurity will fail with STATUS_ACCESS_DENIED.

Security information access requested   Rights required of caller on server   Privileges required of caller on server
OWNER_SECURITY_INFORMATION              READ_CONTROL                          Does not apply.
GROUP_SECURITY_INFORMATION              READ_CONTROL                          Does not apply.
DACL_SECURITY_INFORMATION               READ_CONTROL                          Does not apply.
SACL_SECURITY_INFORMATION               Does not apply.                       Security privilege.
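
As a quick check of the arithmetic, the 0x1f mask seen in the IRP is just the five flags OR'ed together, and per the table it is the SACL bit that drags in the privilege requirement. Here is a small sketch in C; the flag values are the ones quoted earlier, while `MODEL_ALL_SECURITY_INFORMATION` and `model_query_needs_security_privilege` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the definitions quoted earlier in this post. */
#ifndef OWNER_SECURITY_INFORMATION
#define OWNER_SECURITY_INFORMATION       (0x00000001L)
#define GROUP_SECURITY_INFORMATION       (0x00000002L)
#define DACL_SECURITY_INFORMATION        (0x00000004L)
#define SACL_SECURITY_INFORMATION        (0x00000008L)
#define LABEL_SECURITY_INFORMATION       (0x00000010L)
#endif

/* Hypothetical name for the mask observed in the IRP (0x1f). */
#define MODEL_ALL_SECURITY_INFORMATION \
    (OWNER_SECURITY_INFORMATION | GROUP_SECURITY_INFORMATION | \
     DACL_SECURITY_INFORMATION  | SACL_SECURITY_INFORMATION  | \
     LABEL_SECURITY_INFORMATION)

/*
 * Per the table above, only a request that includes the SACL needs the
 * security privilege on the server; READ_CONTROL covers owner, group
 * and DACL.
 */
bool model_query_needs_security_privilege(uint32_t security_information)
{
    assert((security_information & ~(uint32_t)MODEL_ALL_SECURITY_INFORMATION) == 0);
    return (security_information & SACL_SECURITY_INFORMATION) != 0;
}
```

A query for just owner, group and DACL (0x7) would not need the privilege, but the 0x1f mask does because it includes the SACL bit.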

So now since this approach was out of the picture, I needed something else. Unfortunately, I have been unable to figure out a documented way to achieve this (pretty much anything I tried called nt!ObpGetObjectSecurity at some point). However, there is one undocumented function that actually does exactly what I wanted, ZwDuplicateObject. So now my code looks something like this:
    //
    // duplicate the handle... first get a handle to the system process
    // so we can call ZwDuplicateObject on it.
    //

    status = ObOpenObjectByPointer( PsInitialSystemProcess,
                                    OBJ_KERNEL_HANDLE,
                                    NULL,
                                    STANDARD_RIGHTS_READ,
                                    NULL,
                                    KernelMode,
                                    &systemProcessHandle );

    
    if (!NT_SUCCESS(status)) {

        return status;
    }

    status = ZwDuplicateObject( NtCurrentProcess(),
                                ioctlBuffer->UserHandle,
                                systemProcessHandle,
                                &kernelFileHandle,
                                FILE_READ_DATA | FILE_WRITE_DATA,
                                OBJ_KERNEL_HANDLE,
                                DUPLICATE_SAME_ATTRIBUTES | DUPLICATE_SAME_ACCESS );

This approach works because, when DUPLICATE_SAME_ACCESS is used, ZwDuplicateObject() doesn't actually try to validate the access. This works fine in cases like the one I described, where I didn't want any more rights than the user had. However, if the driver needs more (or maybe just different) access to the object then this function will also perform access checks.
Another possible approach would have been to open the file again in the driver and create a new handle, which would work because the security privilege isn't necessary to open a file on a remote server. However, in this case I would have had to make sure that the user had the right type of access to the file. There is also the performance issue to consider, since issuing a new IRP_MJ_CREATE is not exactly cheap. It didn't really matter in my case, but I'm mentioning it here just in case.
And finally, there are some caveats to consider for this approach:
  • Because we've duplicated the handle, we are effectively using the same FILE_OBJECT as the user and so any changes we make will potentially affect them. For example, if the IO manager keeps track of the current byte offset for this FILE_OBJECT, operations the driver performs will change that offset and so the user mode component might get confused. So if the driver is planning on being transparent to the user mode client, then it needs to be extra careful about this sort of thing. This wasn't a concern in my case since the user mode client was aware it was sending the handle to a driver and didn't use the handle afterwards, but it might be different for a minifilter.
  • Since we've duplicated the handle, it is possible that IRP_MJ_CLEANUP no longer arrives in the context of the user process (depending on whether the kernel handle gets closed first or not). This might have an impact on some minifilters as well as on any byte range locks on the file.
  • Since ZwDuplicateObject() is not documented it might not be supported in this form (or at all) in future OS releases. Though IMO Microsoft should document this API.
  • In XP and Server 2003 (SRV03) there is a bug where the handle that is returned by ZwDuplicateObject is a kernel handle (it belongs in the system process' handle table) but is not marked as such (the most significant bit is not set). It was fixed in XP SP3 and SRV03 R2 SP1, though I'm not sure about the exact versions.
  • Finally, there is one other flag to ZwDuplicateObject that might be interesting to anyone using it to duplicate user mode handles: DUPLICATE_CLOSE_SOURCE. This closes the user mode handle before the function returns. According to Gary Nebbett's book, it will close the handle regardless of the status of the operation.
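
The first caveat can be pictured with a small user-mode model. Everything in it is hypothetical (model_file_object, model_handle, model_duplicate and model_read are stand-ins, not kernel structures or APIs). It only shows that two handles that refer to one object share that object's current byte offset.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a FILE_OBJECT for which a current byte offset is tracked. */
struct model_file_object {
    size_t current_byte_offset;
    size_t file_size;
};

/* Stand-in for a handle table entry: just a reference to the object. */
struct model_handle {
    struct model_file_object *object;
};

/* Models handle duplication: a new handle, but the SAME object. */
struct model_handle model_duplicate(struct model_handle source)
{
    struct model_handle copy = { source.object };
    return copy;
}

/* Models a read at the current position: returns the number of bytes
   "read" and advances the shared offset by that amount. */
size_t model_read(struct model_handle handle, size_t length)
{
    struct model_file_object *fo = handle.object;
    assert(fo != NULL);
    size_t remaining = fo->file_size - fo->current_byte_offset;
    size_t done = length < remaining ? length : remaining;
    fo->current_byte_offset += done;
    return done;
}
```

If the driver reads 30 bytes through its duplicated handle on a 100-byte file, a later read through the user's handle starts at offset 30 and can return at most 70 bytes, even though the user never moved the file position.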

Thursday, February 24, 2011

Tracking a minifilter's ActiveOpens files

I've recently done a bit of detective work that I thought might be an interesting thing to share, especially since we've been talking about contexts so much lately. The issue was how to find the files that were opened by a minifilter which prevent that minifilter from unloading. FltMgr actually keeps track of the files that were opened by a minifilter and if the minifilter gets unloaded it will wait for those files to be closed before it unloads the driver. You can see this in the debugger here:

1: kd> !fltkd.filter 94130008 

FLT_FILTER: 94130008 "luafv" "135000"
   FLT_OBJECT: 94130008  [02000000] Filter
      RundownRef               : 0x0000000a (5)
      PointerCount             : 0x00000001 
      PrimaryLink              : [922c75a4-92cdaa24] 
   Frame                    : 92cda9c8 "Frame 0" 
   Flags                    : [00000006] FilteringInitiated NameProvider
   DriverObject             : 9306bc50 
   FilterLink               : [922c75a4-92cdaa24] 
   PreVolumeMount           : 81fbe0cc  luafv!LuafvPreRedirect 
   PostVolumeMount          : 00000000  (null) 
   FilterUnload             : 00000000  (null) 
   InstanceSetup            : 81fca62b  luafv!LuafvInstanceSetup 
   InstanceQueryTeardown    : 00000000  (null) 
   InstanceTeardownStart    : 00000000  (null) 
   InstanceTeardownComplete : 00000000  (null) 
   ActiveOpens              : (941300dc)  mCount=1 
   Communication Port List  : (94130108)  mCount=0 
   Client Port List         : (94130134)  mCount=0 
   VerifierExtension        : 00000000 

So the task is to find that one file that ActiveOpens is tracking for LUAFV. Before we go any further, let's see what the chain of structures looks like, starting from the FILE_OBJECT. Some of the structures are documented while others are not. I've marked the undocumented structures with a '?' at the end of the name. We don't know (or care for this post) what the other members of the structures are.

Now, we need to walk the arrows backwards so the steps we need to follow are:

  1. From fltmgr!_FLT_FILTER->ActiveOpens find fltmgr!FO_CONTEXT?
  2. From fltmgr!FO_CONTEXT? find the nt!FILE_OBJECT_CONTEXTS_HEADER? structure
  3. From the nt!FILE_OBJECT_CONTEXTS_HEADER? structure find the nt!_IOP_FILE_OBJECT_EXTENSION pointing to it
  4. From the nt!_IOP_FILE_OBJECT_EXTENSION find the nt!_FILE_OBJECT structure that points to it
  5. ???
  6. Profit!!!!
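
As a rough picture of that chain, here is a user-mode model in C. The layouts are hypothetical and heavily simplified (the real structures are mostly undocumented and contain many more members); only the links between them are modeled. Slot 3 of the per-type array is the one the experiment described later in this post points at. The sketch shows why the forward direction is a plain pointer walk, while the backward direction only gets one free step (from an embedded LIST_ENTRY to its containing structure) and otherwise has to search memory for pointers.

```c
#include <assert.h>
#include <stddef.h>

/* User-mode copies of two familiar definitions. */
typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} LIST_ENTRY;

#define CONTAINING_RECORD(address, type, field) \
    ((type *)((char *)(address) - offsetof(type, field)))

/* Hypothetical, simplified layouts. Only the links matter here. */
typedef struct { LIST_ENTRY Links; } PER_FO_CONTEXT;            /* FSRTL_PER_FILEOBJECT_CONTEXT    */
typedef struct { LIST_ENTRY Contexts; } CONTEXTS_HEADER;        /* nt!FILE_OBJECT_CONTEXTS_HEADER? */
typedef struct { CONTEXTS_HEADER *PerType[4]; } FO_EXTENSION;   /* nt!_IOP_FILE_OBJECT_EXTENSION   */
typedef struct { FO_EXTENSION *FileObjectExtension; } FILE_OBJECT_MODEL; /* nt!_FILE_OBJECT       */
typedef struct {
    PER_FO_CONTEXT FsRtl;       /* linked into CONTEXTS_HEADER.Contexts */
    LIST_ENTRY     ActiveOpen;  /* linked into FLT_FILTER->ActiveOpens  */
} FO_CONTEXT;                   /* fltmgr!FO_CONTEXT?                   */

void list_init(LIST_ENTRY *head)
{
    head->Flink = head->Blink = head;
}

void list_insert_tail(LIST_ENTRY *head, LIST_ENTRY *entry)
{
    entry->Flink = head;
    entry->Blink = head->Blink;
    head->Blink->Flink = entry;
    head->Blink = entry;
}

/* Backwards, step 1: an ActiveOpens entry is embedded in its FO_CONTEXT. */
FO_CONTEXT *context_from_active_open(LIST_ENTRY *entry)
{
    return CONTAINING_RECORD(entry, FO_CONTEXT, ActiveOpen);
}

/* Forwards: FILE_OBJECT -> extension -> contexts header -> first context. */
FO_CONTEXT *first_context_of_file(const FILE_OBJECT_MODEL *file)
{
    CONTEXTS_HEADER *header = file->FileObjectExtension->PerType[3];
    LIST_ENTRY *first = header->Contexts.Flink;
    assert(first != &header->Contexts);   /* the list must not be empty */
    PER_FO_CONTEXT *fsrtl = CONTAINING_RECORD(first, PER_FO_CONTEXT, Links);
    return CONTAINING_RECORD(fsrtl, FO_CONTEXT, FsRtl);
}
```

In the debugger we don't have the types, so the CONTAINING_RECORD arithmetic is replaced by guessing offsets, and every "who points at this?" question is answered by a memory search.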

Starting with ActiveOpens, let's look at the structure. Please note that mCount appears to be shifted by 1. Also, please note that mList is a regular doubly linked list and we expect that it contains one entry (since mCount is 1):

1: kd> dt 941300dc _FLT_MUTEX_LIST_HEAD
fltmgr!_FLT_MUTEX_LIST_HEAD
   +0x000 mLock            : _FAST_MUTEX
   +0x020 mList            : _LIST_ENTRY [ 0x93712498 - 0x93712498 ]
   +0x028 mCount           : 2
   +0x028 mInvalid         : 0y0
1: kd> dl 0x93712498 
93712498  941300fc 941300fc 0421000e 706e5043
941300fc  93712498 93712498 00000002 00000001
1: kd> !pool 0x93712498 2
Pool page 93712498 region is Nonpaged pool
*93712430 size:   70 previous size:    8  (Allocated) *FMfc
  Pooltag FMfc : FLTMGR_FILE_OBJECT_CONTEXT structure, Binary : fltmgr.sys

So now we know that the structure that we've been calling fltmgr!FO_CONTEXT? is in fact called FLTMGR_FILE_OBJECT_CONTEXT. Step 1 is done and we're moving on to step 2. We also know the size isn't larger than 0x70. However, we don't know where the LIST_ENTRY is in there. The only other thing we know is that in that structure there must be a member that is of type FSRTL_PER_FILEOBJECT_CONTEXT (and we know this because that's how FILE_OBJECT contexts are implemented; see the documentation for FsRtlInsertPerFileObjectContext). Since the beginning of the _LIST_ENTRY is at address 0x93712498 and the pool block starts at 0x93712430, we can guess our LIST_ENTRY is towards the end of the structure, so we'll go back a bit and display the words.

Then we'll look for something that looks like an FSRTL_PER_FILEOBJECT_CONTEXT. FSRTL_PER_FILEOBJECT_CONTEXT doesn't really contain much easily identifiable information but it does contain a LIST_ENTRY, which means we should find two words that look like kernel mode addresses next to each other. I've highlighted possible candidates and then we simply try "dl" on them (if anyone knows a better way I'd love to hear about it, please leave a comment). Of course, if you look at the numbers carefully you can see that it's very likely that there is a LIST_ENTRY at 93712480 if the list has only one element. But in most cases the list has more than one element, so it's hard to tell at a quick glance. So we simply issue a "dl" on each candidate and hope that if they're not doubly linked lists they'll simply run into some invalid address sooner or later.

1: kd> dp 0x93712498-0x60 
93712438  48706345 00000000 00000000 00000000
93712448  00ac5851 45e4702b 2a5794ac 7e7ac1fe
93712458  00000000 00000047 00000068 96040c00
93712468  00000000 0034f110 001d0000 92cda9c8
93712478  94134ce8 93011008 9306b0c8 9306b0c8
93712488  938f1230 00000000 94130008 941307d8
93712498  941300fc 941300fc 0421000e 706e5043
937124a8  937395a8 937126b8 937699d8 93739ad8
1: kd> dl 92cda9c8
92cda9c8  0340f103 960409f8 960409f8 00000000
0340f103  00000000 00000000 00000000 00000000
1: kd> dl 94134ce8 
94134ce8  00800005 92f05788 92f06870 92fa4998
…
1: kd> dl 9306b0c8  
9306b0c8  93712480 93712480 00010006 e56c6946
93712480  9306b0c8 9306b0c8 938f1230 00000000
1: kd> !pool 9306b0c8 2
Pool page 9306b0c8 region is Nonpaged pool
*9306b0a0 size:   30 previous size:   10  (Allocated) *FOCX
  Pooltag FOCX : File System Run Time File Object Context structure, Binary : nt!fsrtl

It looks like we may have found our structure. Now we're starting our step 3 and we have a pointer into a structure of type nt!FILE_OBJECT_CONTEXTS_HEADER?, but since we don't know the type we don't know where the structure starts. Normally what I do is assume it starts right after the pool tag and search for that address, and if that fails search for the address at the next word boundary and so on. Let's see:

1: kd> db 9306b0a0 L0x20
9306b0a0  02 00 06 04 46 4f 43 58-01 00 00 00 00 00 00 00  ....FOCX........
9306b0b0  00 00 00 00 01 00 04 20-00 00 00 00 bc b0 06 93  ....... ........
1: kd> s -d 80000000 L?0x20000000 9306b0a8
9306ba78  9306b0a8 00000000 00000000 00000000  ................
1: kd> !pool 9306ba78  2
Pool page 9306ba78 region is Nonpaged pool
*9306ba60 size:   30 previous size:   10  (Allocated) *Io  
  Pooltag Io   : general IO allocations, Binary : nt!io

So actually this looks pretty good. In other cases I've found multiple random values that looked like references so I've just had to look at each one. But in this case it looks pretty clean. We expect that the pointer is in a structure that's allocated by the IO mgr, and that the structure's size is about the size of nt!_IOP_FILE_OBJECT_EXTENSION. So this means we've completed step 3 and we have the nt!_IOP_FILE_OBJECT_EXTENSION structure. In our picture I've shown that in the _IOP_FILE_OBJECT_EXTENSION it is FoExtPerTypeExtension[3] that points to this structure that we just found. I've figured this out by experimenting: I simply got a FILE_OBJECT, added a context by calling FsRtlInsertPerFileObjectContext and watched which value changed. So now we know where the structure starts and we need to search for a pointer to it. The pointer we would find here would be a FILE_OBJECT->FileObjectExtension, so we expect the pool tag to be "File". Also, we expect the FILE_OBJECT to start 0x7c bytes before the pointer. Please note that the first two results returned by the search were invalid (!pool told me so):

1: kd> ? 0x9306ba78-0x10
Evaluate expression: -1828275608 = 9306ba68
1: kd> s -d 80000000 L?0x20000000 9306ba68
82b80008  9306ba68 9306ba68 00000000 abcddcba  h...h...........
82b8000c  9306ba68 00000000 abcddcba 00000001  h...............
94134d64  9306ba68 04530015 6661754c 00000000  h.....S.Luaf....
1: kd> !pool 94134d64  2
Pool page 94134d64 region is Nonpaged pool
*94134cc0 size:   a8 previous size:  2d8  (Allocated) *File (Protected)
  Pooltag File : File objects
1: kd> !fileobj 0x94134d64-0x7c  



Device Object: 0x92f05788   \Driver\volmgr
Vpb: 0x92f06870
Event signalled
Access: Read Write SharedRead SharedWrite SharedDelete 

Flags:  0x440008
 No Intermediate Buffering
 Handle Created
 Volume Open

FsContext: 0x92fa4998 FsContext2: 0x97e0cbc0
CurrentByteOffset: 0
Cache Data:
  Section Object Pointers: 92f0cd14
  Shared Cache Map: 00000000


File object extension is at 9306ba68:

So this is it, we now know that LUAFV opens the volume using FltCreateFile (if it didn't, the file wouldn't be on the ActiveOpens list). For the record, this is win7:

1: kd> vertarget
Windows 7 Kernel Version 7600 MP (2 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 7600.16617.x86fre.win7_gdr.100618-1621
Machine Name:
Kernel base = 0x82809000 PsLoadedModuleList = 0x82951810
Debug session time: Thu Feb 17 07:44:10.641 2011 (UTC - 8:00)
System Uptime: 0 days 0:02:42.733
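
The search step used twice in this walk (guess where the structure starts, scan memory for a pointer to that guess, move the guess to the next word boundary if nothing turns up) can be sketched in user-mode C. This is only a model of the manual procedure: find_dword plays the role of "s -d" over a small array instead of kernel memory, and both function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOT_FOUND ((size_t)-1)

/* Rough equivalent of "s -d": index of the first dword equal to value. */
size_t find_dword(const uint32_t *memory, size_t count, uint32_t value)
{
    for (size_t i = 0; i < count; i++) {
        if (memory[i] == value) {
            return i;
        }
    }
    return NOT_FOUND;
}

/* Guess where a structure of unknown layout starts. Tries every
   dword-aligned address from first_guess (the first byte after the pool
   header) up to inner_pointer (the pointer we already hold into the
   structure) and returns the first guess that something in memory refers
   to, or 0 if none does. On success *hit_index is where it was found. */
uint32_t find_referenced_start(const uint32_t *memory, size_t count,
                               uint32_t first_guess, uint32_t inner_pointer,
                               size_t *hit_index)
{
    assert(first_guess <= inner_pointer);
    uint32_t span = inner_pointer - first_guess;
    for (uint64_t offset = 0; offset <= span; offset += 4) {
        uint32_t guess = first_guess + (uint32_t)offset;
        size_t index = find_dword(memory, count, guess);
        if (index != NOT_FOUND) {
            *hit_index = index;
            return guess;
        }
    }
    return 0;
}
```

With the numbers from this session, the pool block at 9306b0a0 has an 8 byte header, so the first guess is 9306b0a8 and the pointer we hold is 9306b0c8; the very first guess already produced a hit. As in the debugger, a hit is only a candidate and still has to be checked with something like !pool.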

Thursday, February 17, 2011

Filter Layering and IO Targeting in FltMgr - part II

Let's look at how targeting in filter manager can fail and what it looks like when it happens. I'd like to say that while such things do happen and I've analyzed a couple of cases over the years, they are far less frequent with minifilters than with legacy filters. Anyway, layering violations in some cases can just go unnoticed, but if they do cause trouble most likely what will happen is an infinite recursion which will result in a bugcheck. Deadlocks can also happen, but the cases I've investigated so far were all infinite recursions.

One way layering can fail is when a FILE_OBJECT is used in postCreate by a minifilter and the minifilter is not using Flt APIs to perform the IO. For example, using the setup from the previous blog post, if minifilter1 in postCreate calls a Zw function or creates a handle for a user mode app and lets the user mode app use that handle to do something like scan the file or recall it or decompress it, what will happen is that the requests generated by the user mode service or the Zw calls will go to Frame0 because the FILE_OBJECT stores the information about the device hint, but FltMgr will not find its targeting structure and will show the requests to all minifilters in the frame, starting with Minifilter 3. However, this is a clear violation of the minifilter rules because minifilter1 broke the layering contract by sending IO to the top of the IO stack. If it wants to do this sort of thing it needs to call its own FltCreateFile and, after that create succeeds, it can either use a Zw API or create a handle for that FILE_OBJECT to be used by a user mode service.

Now, if filter manager always used a device hint, things wouldn't be too bad. However, there is a case where FltMgr violates the rule of never sending IO to the top of the stack. This case is in the naming path (and more precisely in the FltpExpandFilePathWorker function). When a minifilter calls FltGetFileNameInformation and requests a normalized name, FltMgr gets a name for the file and then proceeds to normalize it. It does so by opening folders along the path to the file and querying information that contains the long name for each component on the path. In this case, however, FltMgr does use a targeting structure to identify which minifilters should see the create, but it does not use a DeviceHint for the IoCreateFileEx call. The reason, as far as I can tell, is that the request would fail if the name contains a reparse point that reparses to a different volume (remember that bit about IopCheckTopDeviceHint a couple of posts back?) so FltMgr just sends it to the top of the stack.

So in this case the IRP_MJ_CREATE is not targeted in the IO manager (it will go to the top of the IO stack) but there will be FltMgr targeting information attached to the IRP_MJ_CREATE. Looking at our picture from the previous post, we can see that if the IRP_MJ_CREATE issued by (or on behalf of) minifilter2 is one of these creates then Frame1, Legacy Filter B and Legacy Filter A will all see the IRP_MJ_CREATE and the subsequent requests. Frame1 will find the targeting information and infer that no minifilter in that frame should see the request and just send it below. However, Legacy Filter B and Legacy Filter A will not be aware that they shouldn't see this request and will perform their usual functions. So, since the targeting information has not been attached to the FILE_OBJECT yet (it will only happen when IoCreateFileEx returns to FltMgr) and there is no DeviceHint on the FILE_OBJECT, if Legacy Filter B issues any requests to the device below it, then not only will Legacy Filter A see that request (which is expected), but also when the request reaches Frame 0 all the minifilters will see it. But minifilter3 and minifilter2 should not have seen it, and in fact they haven't seen the IRP_MJ_CREATE either, so there are quite a few things that can go wrong here.

Another interesting case I've seen was when a legacy filter (let's use Legacy Filter B again as an example) tried to implement something similar to FltReissueSynchronousIo in the minifilter world and in its postCreate, if the IRP_MJ_CREATE failed, it changed something in the request and sent it down again. This worked well on Vista and newer but in XP it failed. As you remember, in XP the targeting information is stored in an EA, and for whatever reason the EA mechanism was designed such that the structure is associated with an IRP_MJ_CREATE by storing the EaLength in the IO_STACK_LOCATION and the EA buffer in the IRP (Irp->AssociatedIrp.SystemBuffer). This is in my opinion a pretty poor design, because it suggests that the EaLength can be layered but the EA buffer cannot. It also means there is only one EA buffer per IRP and so FltMgr must use EA chaining. When FltMgr receives such a create it needs to remove its EA information from the buffer before it sends the request down. However, once the EA buffer has been changed, if a legacy filter sends the request down again in the manner we explained then FltMgr (in Frame0 in our example) will not find the targeting information and will therefore show the request to all the minifilters in the frame, which can also result in a layering violation. Moreover, the EaLength and the EA buffer are now out of sync and the file system might not like this. For a very clear example of what this issue looks like, please read this thread: http://www.osronline.com/showthread.cfm?link=187295. Please note that in this thread we're not dealing with recursion but with the file system not being able to cope with the EaLength and the EA buffer being in an inconsistent state, though infinite recursion could still have happened if there were more minifilters installed on the system.
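
Here is a small user-mode model of that XP scenario, as I read the paragraph above. It is a sketch under stated assumptions, not FltMgr's actual logic: the structures and functions are hypothetical, the EA buffer is reduced to two byte counts, and the only things modeled are that the buffer is shared by the whole IRP while EaLength is per stack location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the EA buffer belongs to the IRP (one per request). */
struct model_irp {
    size_t caller_ea_bytes;     /* EAs supplied by the original caller       */
    size_t targeting_ea_bytes;  /* FltMgr's chained targeting EA, 0 once gone */
};

/* Hypothetical model: EaLength lives in each driver's stack location. */
struct model_stack_location {
    size_t ea_length;
};

/* What the model's FltMgr does when the create reaches its device: look for
   its targeting EA, strip it from the shared buffer, and set up the next
   stack location. Returns true if targeting information was found. */
bool model_fltmgr_receive(struct model_irp *irp,
                          const struct model_stack_location *mine,
                          struct model_stack_location *next)
{
    size_t removed = irp->targeting_ea_bytes;
    assert(mine->ea_length >= removed);
    irp->targeting_ea_bytes = 0;
    next->ea_length = mine->ea_length - removed;
    return removed != 0;
}

/* True when a stack location's EaLength matches what is in the buffer. */
bool model_lengths_consistent(const struct model_irp *irp,
                              const struct model_stack_location *location)
{
    return location->ea_length == irp->caller_ea_bytes + irp->targeting_ea_bytes;
}
```

On the first pass the legacy filter's EaLength (say 40) describes the whole buffer, FltMgr finds its 24 bytes of targeting data and the file system sees a consistent 16. When the legacy filter reissues the create from its own, unchanged stack location, EaLength is still 40 but the shared buffer no longer holds the targeting EA, so the model's FltMgr finds nothing and the file system is handed a length that doesn't match the buffer.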

Before ending this post, I'd like to point out that pretty much all layering issues require a legacy filter in the picture. In general FltMgr by itself with just minifilters is pretty good about it. Please note that even perfectly written legacy filters could trigger this issue (which is another way of saying that it isn't the legacy filter's fault); the real problem is that FltMgr breaks layering. What I would like FltMgr to do (in fact, what I wish it had done already) is to offer an API to legacy filters by which such a filter can tell whether a certain IRP or FILE_OBJECT is one they should ignore. Also, I think that FltMgr could address a large class of issues very easily by simply moving the targeting information to the FILE_OBJECT immediately after the IRP_MJ_CREATE completes in the file system.

Thursday, February 10, 2011

Filter Layering and IO Targeting in FltMgr

I've been talking about layering quite a lot on this blog. I've also mentioned how FltMgr performs IO targeting when a minifilter calls FltCreateFile in this post and how, after such a FILE_OBJECT is created, targeting works even when using Zw APIs in this post. However, let's take a closer look at how it is actually implemented in FltMgr and at some of the implications of the design.

As might be apparent from the previous links, there are two different kinds of targeting going on: the IO manager's targeting, which directs an IRP at the appropriate device, and FltMgr's targeting, which identifies the appropriate minifilter for that operation. Please take a look at the following picture, where the blue blocks represent devices (FltMgr's Frame0 and Frame1 and the attachments for the two legacy filters are all devices) and the red blocks are minifilters. The picture shows how an FltCreateFile request goes to the IO manager, and how it then finds the minifilter below the one issuing that call.

The steps involved in this are as follows:

  1. Minifilter2 calls FltCreateFile
  2. FltMgr allocates targeting information (fltmgr calls it TargetedIoControl) and inserts it into an ECP structure and then it calls IoCreateFileEx with the ECP and a device hint that points to the device for Frame0.
  3. IoCreateFileEx goes through the usual steps in the OB manager, the OPEN_PACKET is initialized, and it eventually gets to IopParseDevice
  4. IoMgr in IopParseDevice allocates an IRP_MJ_CREATE, attaches the ECP and sends it directly to the hint device, Frame0.
  5. FltMgr gets the IRP_MJ_CREATE on the device for Frame0, looks for the targeting ECP and extracts the TargetedIoControl from it and then it analyzes it and figures out that the first minifilter that should see this request is Minifilter1.
  6. Minifilter1's preCreate callback gets called.
  7. the IRP_MJ_CREATE is processed further by the IO stack, file system and so on
  8. the IRP_MJ_CREATE completes to the IO manager
  9. IoCreateFileEx returns to FltMgr
  10. the original call to FltCreateFile returns control to Minifilter2

Please note that things are a bit different in XP, for example instead of an ECP FltMgr uses an EA and instead of IoCreateFileEx FltMgr calls IoCreateFileSpecifyDeviceObjectHint. I will only focus on the behavior in Vista and newer releases but XP should be pretty similar anyway.
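
The decision FltMgr makes in step 5 can be modeled in a few lines of user-mode C. This is a hypothetical sketch of the behavior described in these posts, not FltMgr's code: frames are numbered from the bottom (Frame0 is 0), minifilters inside a frame are indexed from the top, and model_target stands in for the information carried by the TargetedIoControl.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the targeting information: which frame the
   issuing minifilter lives in, and its position in that frame (0 = top). */
struct model_target {
    int    frame;
    size_t issuer;
};

/* Index (0 = top) of the first minifilter in this frame that should see the
   request. A result equal to minifilters_in_frame means "none of them". */
size_t first_minifilter_to_call(int this_frame,
                                size_t minifilters_in_frame,
                                const struct model_target *target)
{
    if (target == NULL) {
        return 0;                       /* untargeted IO: everyone sees it      */
    }
    if (target->frame > this_frame) {
        return 0;                       /* issued from a frame above this one   */
    }
    if (target->frame < this_frame) {
        return minifilters_in_frame;    /* issued from below: skip whole frame  */
    }
    assert(target->issuer < minifilters_in_frame);
    return target->issuer + 1;          /* same frame: start below the issuer   */
}
```

Assuming Frame0 holds Minifilter3, Minifilter2 and Minifilter1 from top to bottom (which is how I read the text, since the picture isn't reproduced here), a request issued by Minifilter2 yields index 2, which is Minifilter1, while a request with no targeting information yields index 0, which is Minifilter3.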

Let's take a look at how IO manager's targeting is implemented. Each FILE_OBJECT structure has something called a FileObjectExtension and there are some functions in the IO manager that can set things in the extension. Please note that this is not the same as the FILE_OBJECT context support added in Vista (and which is available through APIs like FsRtlLookupPerFileObjectContext and friends):

0: kd> dt nt!_FILE_OBJECT
   +0x000 Type             : Int2B
   +0x002 Size             : Int2B
   +0x004 DeviceObject     : Ptr32 _DEVICE_OBJECT
   …
   +0x07c FileObjectExtension : Ptr32 Void

0: kd> x nt!*Extension*
...
828b1d09 nt!IopAllocateFileObjectExtension = 
...
828a40e2 nt!IopGetFileObjectExtension = 
...
828c8fe7 nt!IopSetTypeSpecificFoExtension = 
...
82a686f0 nt!IopDeleteFileObjectExtension = 
82aa3238 nt!IopSymlinkSetFoExtension = 
...
82a6c1e7 nt!IopAllocateFoExtensionsOnCreate = 
So these extensions are of different types, for internal use by various OS components. The interesting function here is nt!IopAllocateFoExtensionsOnCreate which initializes some extensions whenever a FILE_OBJECT is initialized. For example, if a DeviceHint was specified then some specific extension is allocated and then the IO manager will always use that extension on the FILE_OBJECT to figure out which device an IO request needs to be sent to. So IO manager's targeting information is associated with the FILE_OBJECT immediately upon creation.

FltMgr takes a different approach. For one, it is not involved in FILE_OBJECT creation and so it doesn't know when the FILE_OBJECT is created. So the approach it takes is a bit more complex. Looking at the steps listed above for the picture, in step 9 FltMgr takes the TargetedIoControl structure that was associated with the IRP_MJ_CREATE and associates it with the FILE_OBJECT, before returning from FltCreateFile. In fact, the flow in FltCreateFile looks something like this:

  1. Allocate TargetedIoControl
  2. Call IoCreateFileEx with the DeviceHint pointing to the current FltMgr device and the TargetedIoControl
  3. When IoCreateFileEx returns, if the create was successful, associate the TargetedIoControl with the FILE_OBJECT.
This algorithm is also employed in cases where FltMgr needs to open a file itself (mostly in the Naming code) because it doesn't internally call FltCreateFile and instead it simply follows these steps. In fact let's take one more look in the debugger at the functions in FltMgr that are associated with targeting:
0: kd> x fltmgr!*target*
96050f68 fltmgr!FreeTargetedIoCtrl = 
960394b6 fltmgr!FltpGetIoTargetFromFileObject = 
...
96051114 fltmgr!TargetedIOCtrlGenerateECP = 
9605132e fltmgr!TargetedIOCtrlAttachAsFoCtx = 
So as you can see, we have a function to add a TargetedIoControl as an ECP (undoubtedly for the IRP_MJ_CREATE case) and as a FILE_OBJECT context (after the IRP_MJ_CREATE is complete), as well as a function to get the target of an IO operation from the FILE_OBJECT context (fltmgr!FltpGetIoTargetFromFileObject). There doesn't seem to be a function that figures out the target from an ECP so that's probably only handled inline.

The really important thing to note here is that this mechanism is different from the IO manager mechanism in that the FILE_OBJECT doesn't have FltMgr's targeting information associated with it until after IoCreateFileEx returns. So for a fair bit of time, between the moment when the IRP_MJ_CREATE is completed by the file system (and the FILE_OBJECT becomes initialized) and the moment when the IoCreateFileEx call returns to FltMgr, the FILE_OBJECT is initialized but it doesn't have any FltMgr targeting information (it does however have IO manager's targeting information). We'll discuss the implications of this particular approach (and the whole class of issues it introduces) in the next blog post, as well as a couple of different approaches FltMgr could have used.
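
That window can be shown with a tiny user-mode model of the three-step flow. All names here are hypothetical stand-ins (model_io_create_file for IoCreateFileEx, model_flt_create_file for the FltCreateFile flow); the observer callback represents any code that gets to look at the FILE_OBJECT after the create has completed but before the create call has returned to FltMgr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical FILE_OBJECT model with the two kinds of targeting data. */
struct model_file_object {
    const void *io_device_hint;   /* IO manager's targeting                */
    const void *flt_target;       /* FltMgr's targeting (TargetedIoControl) */
};

struct snapshot {
    bool had_device_hint;
    bool had_flt_target;
};

typedef void (*observer_fn)(const struct model_file_object *, struct snapshot *);

/* Records what an observer sees on the FILE_OBJECT at the time it runs. */
void record_snapshot(const struct model_file_object *fo, struct snapshot *out)
{
    out->had_device_hint = fo->io_device_hint != NULL;
    out->had_flt_target = fo->flt_target != NULL;
}

/* Models the create call: the device hint is set when the FILE_OBJECT is
   created, then the observer runs before the call returns. */
int model_io_create_file(struct model_file_object *fo, const void *device_hint,
                         observer_fn observer, struct snapshot *out)
{
    assert(fo != NULL);
    fo->io_device_hint = device_hint;
    fo->flt_target = NULL;
    if (observer != NULL) {
        observer(fo, out);
    }
    return 0;
}

/* Models the FltCreateFile flow: the targeting data is attached to the
   FILE_OBJECT only after the create call has returned successfully. */
int model_flt_create_file(struct model_file_object *fo, const void *device_hint,
                          const void *target, observer_fn observer,
                          struct snapshot *out)
{
    int status = model_io_create_file(fo, device_hint, observer, out);
    if (status == 0) {
        fo->flt_target = target;
    }
    return status;
}
```

In this model the observer always sees a FILE_OBJECT that has the device hint but no FltMgr targeting information, and the targeting information only shows up once model_flt_create_file has returned.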

Thursday, February 3, 2011

More contexts: tracking hardlinks

In one of the comments to my previous post, Lyndon pointed out that there is not a lot of support from either the OS or FltMgr when it comes to tracking hardlinks. So I figured I'd explain why this is so complicated and explain what a filter would need to do to implement this. I'm not going to describe what hardlinks are or how they operate, focusing instead on what FltMgr does and what a filter might need to do as well.

However, there is one specific particularity about hardlinks that I'll keep referring to. Once a file is opened the file system remembers which link was used to open the file and it will return that name when querying the file name. If that link is renamed, the FS will of course return the new name.

So the problem with hardlinks is, like Lyndon pointed out, that the SCB model isn't granular enough. The SCB is associated with the stream and it doesn't really matter how the stream was opened (by which name), the SCB is the same. So a StreamContext is the same, regardless of how many hardlinks were used. On the other hand, StreamHandleContexts are too granular, in that they simply track the FILE_OBJECT and different opens even from the same link (using the same name for the file) will obviously get different FILE_OBJECTs and thus different StreamHandleContexts.

Filter manager doesn't offer an additional type of context. However, it does need to deal with hardlinks because it implements a name cache. The name cache is pretty simple to implement for files that only have one name, the name is stored in a structure associated with the stream. However, for hardlinks, clearly the structure needs to be different so that opens for the same name are cached properly. FltMgr solves this problem by not caching the file name in a structure associated with the SCB if the file has more than one link (as reported by the FileStandardInformation information class) and instead it caches the name per FILE_OBJECT.

If a filter wanted to keep track of hardlinks it would need to, as Lyndon indicated in his comment, look at the name that it gets from the file system (the FileNameInformation class) and from that deduce which link was used. This is complicated because a link can be renamed at any time so that must be taken into account. A possible implementation would need to keep some structure in a perStream context that would map each FILE_OBJECT to a link (possibly introducing an artificial concept like linkID or linkGuid or something) and in postCreate would map the newly opened FILE_OBJECT to the appropriate link (which requires looking at link names using the FileHardLinkInformation class) while disabling renames for that stream.
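
To give an idea of the bookkeeping involved, here is a minimal user-mode sketch of such a per-stream structure. It is only an illustration under my own assumptions: the names, the fixed-size tables and the integer link IDs are all made up, and real code would live in a stream context, synchronize against renames and resolve names through the file system rather than through plain strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINKS 8
#define MAX_OPENS 16
#define MAX_NAME  64

/* Hypothetical per-stream bookkeeping: one entry per hardlink, plus a table
   recording which link each open FILE_OBJECT came in through. */
struct link_entry { int link_id; char name[MAX_NAME]; };
struct open_entry { const void *file_object; int link_id; };

struct stream_links {
    struct link_entry links[MAX_LINKS];
    size_t            link_count;
    struct open_entry opens[MAX_OPENS];
    size_t            open_count;
    int               next_link_id;
};

/* Returns the ID of the link with this name, adding it if it is new.
   Returns -1 if the table is full or the name is too long. */
int link_id_for_name(struct stream_links *s, const char *name)
{
    assert(s != NULL && name != NULL);
    for (size_t i = 0; i < s->link_count; i++) {
        if (strcmp(s->links[i].name, name) == 0) {
            return s->links[i].link_id;
        }
    }
    if (s->link_count == MAX_LINKS || strlen(name) >= MAX_NAME) {
        return -1;
    }
    struct link_entry *entry = &s->links[s->link_count++];
    entry->link_id = s->next_link_id++;
    strcpy(entry->name, name);
    return entry->link_id;
}

/* postCreate: remember which link the new FILE_OBJECT was opened through. */
int track_open(struct stream_links *s, const void *file_object, const char *opened_name)
{
    if (s->open_count == MAX_OPENS) {
        return -1;
    }
    int id = link_id_for_name(s, opened_name);
    if (id < 0) {
        return -1;
    }
    s->opens[s->open_count].file_object = file_object;
    s->opens[s->open_count].link_id = id;
    s->open_count++;
    return id;
}

/* A rename changes a link's name but not its identity. Returns the link ID,
   or -1 if the old name is unknown or the new name is too long. */
int rename_link(struct stream_links *s, const char *old_name, const char *new_name)
{
    if (strlen(new_name) >= MAX_NAME) {
        return -1;
    }
    for (size_t i = 0; i < s->link_count; i++) {
        if (strcmp(s->links[i].name, old_name) == 0) {
            strcpy(s->links[i].name, new_name);
            return s->links[i].link_id;
        }
    }
    return -1;
}
```

The point of the artificial link ID is that opens keep referring to the same link across renames: two FILE_OBJECTs opened through the same name get the same ID, and renaming that link later changes only the name stored for that ID.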

I was planning on writing more on this topic and playing with hardlinks some more, but I'm busy at work and it'll have to wait for a future post.