Thursday, March 17, 2011

How File System Filters Attach to Volumes - Part II

And now it's time to take a look at how minifilters fit into the picture. One of the first things to note is that the minifilter model does away with CDOs completely. The only objects that a minifilter interacts with are FLT_VOLUMEs, which are equivalent to VDOs. This is not normally a problem because most operations that are sent to the CDO are not really relevant to file system filters. However, IRP_MN_MOUNT_VOLUME is the one operation minifilters might be interested in, because some minifilters might want to block the mounting of a volume. This is where IRP_MJ_VOLUME_MOUNT comes in. This is a virtual IRP (there is no such IRP major function in the IO manager) that is only relevant to minifilters. When FltMgr receives an IRP_MN_MOUNT_VOLUME on the CDO for a file system, it creates a request of type IRP_MJ_VOLUME_MOUNT and sends it to the minifilters that have registered for that notification. Please note that there are some limitations on what the minifilter can do at this time. The intention is that a minifilter uses this notification only if it wants to block a volume mount, and it must figure out whether the volume needs to be blocked without much help from the file system. Blocking a mount means that the volume will not be mounted on the system at all; it is not the mechanism a minifilter uses to tell FltMgr that it doesn't want to attach to that volume (that is handled in the InstanceSetupCallback). Here are some things that are special about this operation:
  • The minifilter sees this request before the volume is mounted by the file system (in fact at this point the file system doesn't know anything about the volume, the IRP_MN_MOUNT_VOLUME would be the first notification the file system receives about that volume), so it really can't do any file system operation on that volume.
  • Only the Filter and the Volume members of the FLT_RELATED_OBJECTS structure (the FltObjects parameter) are set up.
  • The IO manager has locks held at this point and any file system IO to the same volume or even a different volume that might not be mounted at this time might deadlock. Block level IO to the volume should work though.
  • This is listed as a FAST_IO operation, though in fact it's not. Don't return FLT_PREOP_DISALLOW_FASTIO for it, and don't return FLT_PREOP_PENDING either. You must return one of FLT_PREOP_SUCCESS_WITH_CALLBACK, FLT_PREOP_SUCCESS_NO_CALLBACK or FLT_PREOP_COMPLETE (use the last one if you want to prevent the mount from reaching the file system CDO, in which case the status must not be STATUS_SUCCESS).
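The return-value rules in the last bullet can be written down as a small check. This is only a user-mode model, not driver code: the enum and every `model_` name below are local stand-ins that mirror the WDK names and are not the real definitions from fltKernel.h (the enum numbering in particular is arbitrary). The only value taken from the real headers is 0xC0000022 for STATUS_ACCESS_DENIED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the FLT_PREOP_* return values (hypothetical numbering). */
typedef enum {
    MODEL_PREOP_SUCCESS_WITH_CALLBACK,
    MODEL_PREOP_SUCCESS_NO_CALLBACK,
    MODEL_PREOP_PENDING,
    MODEL_PREOP_DISALLOW_FASTIO,
    MODEL_PREOP_COMPLETE
} model_preop_status;

/* Stand-in for NTSTATUS; zero means success, as in the real type. */
typedef uint32_t model_ntstatus;
#define MODEL_STATUS_SUCCESS       ((model_ntstatus)0x00000000u)
#define MODEL_STATUS_ACCESS_DENIED ((model_ntstatus)0xC0000022u)

/* True if (ret, io_status) is an acceptable outcome of a pre-mount
 * callback under the rules listed above. */
bool model_mount_preop_ok(model_preop_status ret, model_ntstatus io_status)
{
    switch (ret) {
    case MODEL_PREOP_SUCCESS_WITH_CALLBACK:
    case MODEL_PREOP_SUCCESS_NO_CALLBACK:
        return true;
    case MODEL_PREOP_COMPLETE:
        /* Completing the request blocks the mount, so it must fail. */
        return io_status != MODEL_STATUS_SUCCESS;
    default:
        /* PENDING and DISALLOW_FASTIO are not valid for this operation. */
        return false;
    }
}

/* Hypothetical pre-mount decision: either block the mount with a failure
 * status or let it through without asking for a post-operation callback. */
model_preop_status model_pre_mount(bool block_mount, model_ntstatus *io_status)
{
    assert(io_status != NULL);
    if (block_mount) {
        *io_status = MODEL_STATUS_ACCESS_DENIED;
        return MODEL_PREOP_COMPLETE;
    }
    *io_status = MODEL_STATUS_SUCCESS;
    return MODEL_PREOP_SUCCESS_NO_CALLBACK;
}
```

In a real minifilter the failure status would be stored in Data->IoStatus.Status before returning FLT_PREOP_COMPLETE; the out-parameter here only stands in for that field.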
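This is what it looks like when the request reaches a minifilter (the passthrough sample in my case). Things to note are the fields in FltObjects (only Filter and Volume are set; Instance, FileObject and Transaction are null) and the fact that the volume isn't initialized yet (its Flags show Mounting and the InstanceList is empty):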
1: kd> kn
 # ChildEBP RetAddr  
00 9960b8c4 9604319a PassThrough!PtPreOperationPassThrough+0x3c [c:\temp\passthrough\passthrough.c @ 675]
01 9960b930 960489ec fltmgr!FltpPerformPreMountCallbacks+0x1d0
02 9960b998 96048c5b fltmgr!FltpFsControlMountVolume+0x116
03 9960b9c8 828454bc fltmgr!FltpFsControl+0x5b
04 9960b9e0 829c102d nt!IofCallDriver+0x63
05 9960ba44 828a5424 nt!IopMountVolume+0x1d8
06 9960ba7c 82a48f9f nt!IopCheckVpbMounted+0x64
07 9960bb60 82a2a26b nt!IopParseDevice+0x7c9
08 9960bbdc 82a502d9 nt!ObpLookupObjectName+0x4fa
09 9960bc38 82a4862b nt!ObOpenObjectByName+0x165
0a 9960bcb4 82a53f42 nt!IopCreateFile+0x673
0b 9960bd00 8284c44a nt!NtCreateFile+0x34
0c 9960bd00 778464f4 nt!KiFastCallEntry+0x12a
1: kd> ?? FltObjects
struct _FLT_RELATED_OBJECTS * 0x9960b8e8
   +0x000 Size             : 0x18
   +0x002 TransactionContext : 0
   +0x004 Filter           : 0x9299d678 _FLT_FILTER
   +0x008 Volume           : 0x9297fad8 _FLT_VOLUME
   +0x00c Instance         : (null) 
   +0x010 FileObject       : (null) 
   +0x014 Transaction      : (null) 
1: kd> !fltkd.volume 0x9297fad8 
FLT_VOLUME: 9297fad8 "\Device\Harddisk0\DR0"
   FLT_OBJECT: 9297fad8  [04000000] Volume
      RundownRef               : 0x00000002 (1)
      PointerCount             : 0x00000001 
      PrimaryLink              : [92cdaa74-92cdaa74] 
   Frame                    : 92cda9c8 "Frame 0" 
   Flags                    : [00000008] Mounting
   FileSystemType           : [00000001] FLT_FSTYPE_RAW
   VolumeLink               : [92cdaa74-92cdaa74] 
   DeviceObject             : 926b16d8 
   DiskDeviceObject         : 92f036e8 
   FrameZeroVolume          : 00000000 
   VolumeInNextFrame        : 00000000 
   Guid                     : "" 
   CDODeviceName            : "\Device\RawDisk" 
   CDODriverName            : "\FileSystem\RAW" 
   TargetedOpenCount        : 0 
   Callbacks                : (9297fb6c)
   ContextLock              : (9297fdc4)
   VolumeContexts           : (9297fdc8)  Count=0
   StreamListCtrls          : (9297fdcc)  rCount=0 
   FileListCtrls            : (9297fe10)  rCount=0 
   NameCacheCtrl            : (9297fe58)
   InstanceList             : (9297fb28)
The next thing to talk about is the InstanceSetupCallback. This is the callback that gets called when a new instance gets created. This callback allows the minifilter to decide whether it needs to attach to the volume and to set up its internal state for that volume. The interesting thing about this is understanding when it gets called. One important factor in the decision is that the minifilter needs to be able to perform operations on the file system in its InstanceSetupCallback (like opening a file for example; see the MetadataManager minifilter sample in the WDK for a minifilter that does that). Let's look at some of the factors that would impact the decision about when the notification needs to be called:
  • The IO manager needs to be able to process operations on the volume. This means that the mount must be completed, because FltCreateFile (and most other operations) would go to the IO manager and if the IO manager doesn't know that the mount is completed it will block the operation behind the mount. So clearly, the InstanceSetupCallback could not have been called anywhere during IRP_MN_MOUNT_VOLUME processing.
  • The InstanceSetupCallback must be called before ANY other callback is sent to the minifilter on that volume, which means it can't be asynchronous with other operations, because the order in which asynchronous operations reach the various layers is impossible to guarantee. So all operations above a certain layer must be blocked until the InstanceSetupCallback for that layer is completed.
  • Because minifilters need to be able to perform IO on the file system in their InstanceSetupCallback, the filters below them need to see that operation (since minifilters should be able to filter all operations on a volume). This means that the InstanceSetupCallback must have already been called for all the minifilters below (otherwise they wouldn't be able to process operations).
So when considering all these factors we arrive at the current implementation. When any operation reaches FltMgr on a volume, if InstanceSetupCallbacks have not been called yet, FltMgr will block that operation and call the InstanceSetupCallbacks for all the minifilters, starting from the lowest one and going up (where up means higher altitude numbers). After all the InstanceSetupCallbacks are complete FltMgr will release the lock and IO can proceed normally. Let's take a look in the debugger and see these effects. Things to note are how we're still in the context of the NtCreateFile request where we were before. Also, please note that the FileInfo minifilter is below PassThrough and it's already set up:
1: kd> kn
 # ChildEBP RetAddr  
00 9960b880 96049bf5 PassThrough!PtInstanceSetup [c:\temp\passthrough\passthrough.c @ 393]
01 9960b8b4 9604a417 fltmgr!FltpDoInstanceSetupNotification+0x69
02 9960b900 9604a7d1 fltmgr!FltpInitInstance+0x25d
03 9960b970 9604a8d7 fltmgr!FltpCreateInstanceFromName+0x285
04 9960b9dc 96053cde fltmgr!FltpEnumerateRegistryInstances+0xf9
05 9960ba2c 960487f4 fltmgr!FltpDoFilterNotificationForNewVolume+0xe0
06 9960ba70 828454bc fltmgr!FltpCreate+0x206
07 9960ba88 82a496ad nt!IofCallDriver+0x63
08 9960bb60 82a2a26b nt!IopParseDevice+0xed7
09 9960bbdc 82a502d9 nt!ObpLookupObjectName+0x4fa
0a 9960bc38 82a4862b nt!ObOpenObjectByName+0x165
0b 9960bcb4 82a53f42 nt!IopCreateFile+0x673
0c 9960bd00 8284c44a nt!NtCreateFile+0x34
1: kd> !fltkd.volume 0x9297fad8 
FLT_VOLUME: 9297fad8 "\Device\Harddisk0\DR0"
   FLT_OBJECT: 9297fad8  [04000000] Volume
      RundownRef               : 0x00000006 (3)
      PointerCount             : 0x00000001 
      PrimaryLink              : [92cdaa68-924ea7f4] 
   Frame                    : 92cda9c8 "Frame 0" 
   Flags                    : [00000066] PendingSetupNotify SetupNotifyCalled EnableNameCaching FilterAttached
   FileSystemType           : [00000001] FLT_FSTYPE_RAW
   VolumeLink               : [92cdaa68-924ea7f4] 
   DeviceObject             : 926b16d8 
   DiskDeviceObject         : 92f036e8 
   FrameZeroVolume          : 9297fad8 
   VolumeInNextFrame        : 00000000 
   Guid                     : "" 
   CDODeviceName            : "\Device\RawDisk" 
   CDODriverName            : "\FileSystem\RAW" 
   TargetedOpenCount        : 0 
   Callbacks                : (9297fb6c)
   ContextLock              : (9297fdc4)
   VolumeContexts           : (9297fdc8)  Count=0
   StreamListCtrls          : (9297fdcc)  rCount=0 
   FileListCtrls            : (9297fe10)  rCount=0 
   NameCacheCtrl            : (9297fe58)
   InstanceList             : (9297fb28)
      FLT_INSTANCE: 92979b40 "PassThrough Instance" "370030"
      FLT_INSTANCE: 923f2dc8 "FileInfo" "45000"
Before we go on I'd like to recap the sequence of events during a mount:
  1. A request to open a file (that ultimately arrives in IopCreateFile) is sent to a volume that is not mounted yet.
  2. During the IopParseDevice call the IO manager discovers that the volume isn't mounted, so it tries to mount it. It does this behind a lock, so multiple requests would be queued here until the mount completes.
  3. The IO manager sends the IRP_MJ_FILE_SYSTEM_CONTROL request with IRP_MN_MOUNT_VOLUME to the CDO of each file system in turn, until one of them mounts the volume.
  4. FltMgr is attached to each CDO and so it gets this request and sends the IRP_MJ_VOLUME_MOUNT request to minifilters.
  5. If no minifilters blocked the request, FltMgr sends the request to the FS CDO below.
  6. The FS creates a VDO and the volume is mounted.
  7. The IO manager knows the volume is mounted and it releases all the operations blocked behind that volume mount.
  8. FltMgr gets all these operations and discovers that the topmost instance on the volume hasn't been initialized yet, so it blocks all the operations behind a lock again.
  9. It then calls the InstanceSetupCallback for the lowest minifilter in the stack, then for the one above it, then for the one above that, and so on. Please note that this notification happens in the context of whichever thread happens to win the race for the lock, so if more than one thread is trying to perform an operation on the volume, it's possible that the IRP_MJ_VOLUME_MOUNT callback is called in the context of one thread and the InstanceSetupCallback in the context of a different one.
  10. Once all the instances have been set up, FltMgr allows all operations to continue and the initialization of the volume is now complete.
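Steps 8 to 10 can be modelled in a few lines of user-mode C. This is a toy sketch and not FltMgr's implementation: the `model_` types and functions are hypothetical, the lock is reduced to a single flag because the model is single-threaded, and altitudes are plain integers although real altitudes are decimal strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_INSTANCES 8

typedef struct {
    const char   *name;
    unsigned long altitude;  /* simplified: real altitudes are strings */
    bool          set_up;
} model_instance;

typedef struct {
    model_instance instances[MODEL_MAX_INSTANCES];
    size_t         count;
    bool           setup_done;                        /* stands in for the lock */
    unsigned long  setup_order[MODEL_MAX_INSTANCES];  /* altitudes, in call order */
    size_t         setup_calls;
} model_volume;

void model_add_instance(model_volume *v, const char *name, unsigned long altitude)
{
    assert(v->count < MODEL_MAX_INSTANCES);
    v->instances[v->count].name = name;
    v->instances[v->count].altitude = altitude;
    v->instances[v->count].set_up = false;
    v->count++;
}

/* qsort comparator: ascending altitude, so the lowest instance comes first. */
static int model_by_altitude(const void *a, const void *b)
{
    const model_instance *x = (const model_instance *)a;
    const model_instance *y = (const model_instance *)b;
    return (x->altitude > y->altitude) - (x->altitude < y->altitude);
}

/* Stand-in for one InstanceSetupCallback. Every instance below this one
 * must already be set up, so IO issued from here can be filtered by them. */
static void model_instance_setup(model_volume *v, size_t index)
{
    for (size_t i = 0; i < index; i++)
        assert(v->instances[i].set_up);
    v->instances[index].set_up = true;
    v->setup_order[v->setup_calls++] = v->instances[index].altitude;
}

/* Models an operation reaching FltMgr on the volume. The first operation
 * to arrive runs setup for every instance, lowest altitude first; later
 * operations pass straight through. Returns true if this call ran setup. */
bool model_operation_arrives(model_volume *v)
{
    if (v->setup_done)
        return false;
    qsort(v->instances, v->count, sizeof v->instances[0], model_by_altitude);
    for (size_t i = 0; i < v->count; i++)
        model_instance_setup(v, i);
    v->setup_done = true;
    return true;
}
```

With the two instances from the !fltkd.volume output above added in any order, "PassThrough Instance" at 370030 and "FileInfo" at 45000, the first call to model_operation_arrives sets up FileInfo before PassThrough, and a second call does nothing.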
Finally I'd like to talk about a couple of deadlocks that I've seen and some design decisions to avoid.
  • One interesting deadlock happened with a minifilter that blocked preCreate and called a user mode service to scan the file (like an anti-virus). When another minifilter above that one tried to create a file in its InstanceSetupCallback (it actually was the MetadataManager sample), this minifilter blocked that create and sent it to the user mode service. The user mode service tried to open the file to scan it, but it was blocked in FltMgr because the instance setup phase wasn't complete, so all top level IO was blocked. Alternative approaches that would have avoided that deadlock would have been to scan in postCreate (which is what most such filters do) or to use a private communication channel with the user mode service to ensure that all IO issued by the user mode service is layered properly.
  • Another interesting deadlock can happen with the registry. As you can tell from the stack above, FltMgr needs to read minifilter configuration information from the registry (see that call to fltmgr!FltpEnumerateRegistryInstances). However, the registry has some very complicated locking rules, so if the registry happens to be locked when FltMgr needs to read its configuration, FltMgr will wait for it. In one case I've seen, a driver (not a minifilter) was calling ZwLoadKey() for a file on a volume other than the system volume. The volume wasn't mounted, so inside the ZwLoadKey() call the registry would acquire a lock and try to open the file, which resulted in a mount; FltMgr then tried to check the registry for any minifilter instances and got blocked behind the registry lock. One possible solution in this case would be to make sure that the volume is mounted before calling ZwLoadKey(). Please note that this might happen in many cases: any operation that ties a registry operation to a file system operation can potentially deadlock. For example, a registry filter that tries to log operations to a file might also cause the same deadlock if the volume containing the log file hasn't been mounted yet.
  • Another pretty well known deadlock happens in InstanceSetup with the MountMgr. It is described in great detail here http://www.osronline.com/showThread.cfm?link=90003, so I won't repeat it. This should be fixed in Vista and Win7.
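The first deadlock in the list above is easier to see as a chain of "waits for" relationships that loops back on itself. The sketch below is a user-mode illustration only: the node names and the `model_waits_forever` helper are hypothetical, each party waits for at most one other, and the "layered" case reflects my reading of what properly layered service IO buys you, namely that the service's open no longer waits for the setup phase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three parties in the scanning deadlock (hypothetical names). */
enum {
    MODEL_SETUP_CALLBACK,    /* upper filter's InstanceSetupCallback    */
    MODEL_CREATE_IN_PRECREATE, /* its create, held in the scanner's preCreate */
    MODEL_SERVICE_OPEN,      /* user mode service opening the file to scan */
    MODEL_NODES
};

#define MODEL_NOT_WAITING (-1)

/* waits_for[i] is the node that node i is waiting on, or MODEL_NOT_WAITING.
 * Follows the chain from 'start'. With n nodes, taking n hops without
 * reaching a node that can make progress means some node repeated, so the
 * chain is a cycle and nobody in it will ever run. */
bool model_waits_forever(const int *waits_for, size_t n, size_t start)
{
    assert(start < n);
    size_t cur = start;
    for (size_t hop = 0; hop < n; hop++) {
        int next = waits_for[cur];
        if (next == MODEL_NOT_WAITING)
            return false;            /* someone can make progress */
        assert(next >= 0 && (size_t)next < n);
        cur = (size_t)next;
    }
    return true;
}
```

In the deadlocked case the setup callback waits for its create, the create waits for the service's open, and the open (being top level IO) waits for the setup phase, so the chain is { CREATE_IN_PRECREATE, SERVICE_OPEN, SETUP_CALLBACK } and the function reports a cycle. If the service's open does not have to wait for setup, the last entry becomes MODEL_NOT_WAITING and the chain terminates.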
I hope this post has been useful in explaining how instances get created on a volume and how they might sometimes deadlock and what to look for when such deadlocks occur.