Thursday, March 10, 2011

How File System Filters Attach to Volumes - Part I

I want to talk a bit about how FltMgr attaches to volumes and how instances are created when a new volume arrives. I want to use that as the basis to talk about what minifilters can do in their InstanceSetup callback. This should also explain some possible deadlocks in that path and emphasize the point that doing things in postCreate is preferable to preCreate. I also want to talk about IRP_MJ_VOLUME_MOUNT, how it works and why it's there. I was going to write just one post but it's already too long and I'm not done, so I'll split it into a couple of posts...

I'll start with a refresher on how file systems mount volumes and how legacy file system filters attach to file systems. When a file system driver is initialized it creates what is called a Control Device Object (CDO). It can create more than one of those (look at the FastFat WDK sample for an example of a file system creating more than one CDO). The file system needs a CDO because it must hand a device object to the IO manager when it registers itself as a file system (by calling IoRegisterFileSystem and passing in the CDO(s)). Please note that this mechanism predates PNP and, as you can see, it is very different. These CDOs are named device objects and their purpose is to receive commands for the file system. One such command is IRP_MJ_FILE_SYSTEM_CONTROL with the IRP_MN_MOUNT_VOLUME minor code (which I'll just refer to as IRP_MN_MOUNT_VOLUME from now on, since it is only delivered through an IRP_MJ_FILE_SYSTEM_CONTROL and there is no possibility of confusion), which is sent by the IO manager when it wants to mount a volume. One possible sequence of operations is this:

  1. Volume DEVICE_OBJECT is created, usually by the volume manager, with a name like "\Device\HarddiskVolume2".
  2. The volume manager alerts the system of the arrival of the volume by calling IoRegisterDeviceInterface() with the MOUNTDEV_MOUNTED_DEVICE_GUID or GUID_DEVINTERFACE_VOLUME (which are in fact the same GUID). This alerts MountMgr that a volume has arrived.
  3. MountMgr queries the volume for the name and sets up the NT volume name (which looks like "\\?\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}") and the DOS volume name (which might look like "C:"). Both these names point to the volume device ("\Device\HarddiskVolume2").
  4. At this point the volume is not mounted and it has a VPB structure associated with it that keeps track of that.
  5. After a while someone issues an operation to the volume (like trying to open "C:\foo.txt", or "\\?\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}\foo.txt" or "\Device\HarddiskVolume2\foo.txt", which are different names for the same thing). While trying to issue the IRP_MJ_CREATE, IO manager will check if the volume is mounted and if not it will mount it (nt!IopCheckVpbMounted). See my post "About IRP_MJ_CREATE and minifilter design considerations - Part II" and look at step 2 in my steps for nt!IopParseDevice.
  6. If the volume is not mounted in nt!IopCheckVpbMounted then IO mgr calls nt!IopMountVolume, which walks through the registered file systems for that device type (hence the need for more than one CDO) and sends the IRP_MN_MOUNT_VOLUME request to each of the devices on the list of registered file systems (which is a list of CDOs) in turn, until one of them mounts the volume.
  7. When a file system receives an IRP_MN_MOUNT_VOLUME it checks whether it can mount the volume (it reads some sectors and does whatever it needs to do to figure out whether this is one of its volumes) and, if so, it creates a new DEVICE_OBJECT (anonymous this time) which is called a Volume Device Object (VDO) and which is linked through the VPB to the actual volume DEVICE_OBJECT (the one that has a name and a drive letter).
  8. Once nt!IopCheckVpbMounted completes and a volume is mounted, nt!IopParseDevice continues and an IRP_MJ_CREATE is sent to the newly mounted volume, which is the first operation that the file system processes on that VDO.
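
The mount steps above can be sketched as a small user-mode model in C. This is only an illustration of the control flow, not kernel code: the Device, Vpb and FileSystem structs are simplified stand-ins for the real structures, the signature field and the function names fs_mount and check_vpb_mounted are hypothetical, and the per-device-type lists of registered file systems are collapsed into a single array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins; NOT the real kernel DEVICE_OBJECT / VPB layouts. */
typedef struct Device {
    const char *name;   /* NULL models an anonymous device such as a VDO */
    const char *owner;  /* driver that created the device */
} Device;

typedef struct Vpb {
    int     mounted;        /* models the "mounted" VPB flag */
    Device *device_object;  /* file system VDO, NULL until mounted */
    Device *real_device;    /* storage volume device (named) */
} Vpb;

/* A registered file system: its CDO plus a hypothetical on-disk marker. */
typedef struct FileSystem {
    Device      cdo;
    const char *signature;
} FileSystem;

/* Models a file system handling IRP_MN_MOUNT_VOLUME on its CDO. Returns a
   new anonymous VDO if the data is recognized; returns NULL if it is not
   (or, in this simplified model, if the allocation fails). */
Device *fs_mount(const FileSystem *fs, const char *boot_sector)
{
    if (strncmp(boot_sector, fs->signature, strlen(fs->signature)) != 0)
        return NULL;            /* not our volume, caller tries the next FS */

    Device *vdo = malloc(sizeof *vdo);
    if (vdo != NULL) {
        vdo->name  = NULL;      /* VDOs are anonymous */
        vdo->owner = fs->cdo.owner;
    }
    return vdo;
}

/* Models the IopCheckVpbMounted / IopMountVolume pair: if the volume is not
   mounted yet, ask each registered file system in turn. Returns 1 when the
   volume is (now) mounted and 0 when no file system recognized it. */
int check_vpb_mounted(Vpb *vpb, const FileSystem *registered, size_t count,
                      const char *boot_sector)
{
    assert(vpb != NULL);
    if (vpb->mounted)
        return 1;               /* already mounted, nothing to do */

    for (size_t i = 0; i < count; i++) {
        Device *vdo = fs_mount(&registered[i], boot_sector);
        if (vdo != NULL) {
            vpb->device_object = vdo;  /* link VDO to the volume via the VPB */
            vpb->mounted = 1;
            return 1;
        }
    }
    return 0;
}
```

In this model the create path would call check_vpb_mounted before sending the create to vpb->device_object, which is why the first operation a freshly created VDO sees is the create that triggered the mount.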
Another way to look at this is that the CDO functions as a factory for file system instances, and the IRP_MN_MOUNT_VOLUME is a request for the factory to generate an instance associated with the storage volume DEVICE_OBJECT. The request will either fail, if the file system doesn't recognize the volume, or return the file system VDO, which is the file system instance for that volume.

Here is some debugger output to illustrate all this. In order to generate it I took a 32bit Win7 machine, rebooted it and put a breakpoint on nt!IopMountVolume (that's why NTFS has no volumes and just a CDO). I'm showing it mainly to showcase some more windbg commands that are useful when debugging file systems:
This is NTFS initialized, with just one DEVICE_OBJECT, the CDO. Also please note how the CDO is a named device:
0: kd> !drvobj NTFS
Driver object (924d5758) is for:
 \FileSystem\Ntfs
Driver Extension List: (id , addr)

Device Object list:
93215638  
0: kd> !devobj 93215638  
Device object (93215638) is for:
 Ntfs \FileSystem\Ntfs DriverObject 924d5758
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl 973af50c DevExt 00000000 DevObjExt 932156f0 
ExtensionFlags (0x00000800)  
                             Unknown flags 0x00000800
AttachedDevice (Upper) 93211020 \FileSystem\FltMgr
Device queue is not busy.
This is what the stack looks like when IopMountVolume is called. Please note that volsnap is opening a file on a volume. Also, note how the DeviceObject member of the VPB is null (since no file system is mounted on the volume), and the VPB flags are also all clear:
0: kd> kb
ChildEBP RetAddr  Args to Child              
984b18cc 828ad424 934cd768 924d7a00 00000000 nt!IopMountVolume
984b1904 82a50f9f 924d7a48 984b1a30 984b19c8 nt!IopCheckVpbMounted+0x64
984b19e8 82a3226b 934cd768 844d6f78 924f55e8 nt!IopParseDevice+0x7c9
984b1a64 82a582d9 00000000 984b1ab8 00000240 nt!ObpLookupObjectName+0x4fa
984b1ac4 82a5062b 984b1c44 924d6f78 93b15900 nt!ObOpenObjectByName+0x165
984b1b40 82a8b67e 984b1c90 00120089 984b1c44 nt!IopCreateFile+0x673
984b1b88 8285444a 984b1c90 00120089 984b1c44 nt!NtOpenFile+0x2a
984b1b88 828527c1 984b1c90 00120089 984b1c44 nt!KiFastCallEntry+0x12a
984b1c18 969b0414 984b1c90 00120089 984b1c44 nt!ZwOpenFile+0x11
984b1c94 969b9194 934d60d8 00000000 00000000 volsnap!VspOpenControlBlockFile+0x108
984b1d1c 969b9eea 934d60d8 935775ac 934c78bc volsnap!VspOpenFilesAndValidateSnapshots+0x2e
984b1d34 969a5e59 935775a8 00000000 93500020 volsnap!VspSetIgnorableBlocksInBitmapWorker+0x40
984b1d50 82a1f6d3 934c79ac 432d39b1 00000000 volsnap!VspWorkerThread+0x83
984b1d90 828d10f9 969a5dd6 934cd6a0 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19
0: kd> !obja 984b1c44 
Obja +984b1c44 at 984b1c44:
 Name is \Device\HarddiskVolume2\System Volume Information\{3808876b-c176-4e48-b7ae-04046e6cc752}
 OBJ_CASE_INSENSITIVE
0: kd> !devobj \Device\HarddiskVolume2
Device object (934cd768) is for:
 HarddiskVolume2 \Driver\volmgr DriverObject 93b00388
Current Irp 00000000 RefCount 1 Type 00000007 Flags 00003150
Vpb 934cb290 Dacl 973af50c DevExt 934cd820 DevObjExt 934cd908 Dope 934cab20 DevNode 934cfc48 
ExtensionFlags (0x00000800)  
                             Unknown flags 0x00000800
AttachedDevice (Upper) 934d0b70 \Driver\fvevol
Device queue is not busy.
0: kd> !vpb 934cb290 
Vpb at 0x934cb290
Flags: 0x0 
DeviceObject: 0x00000000
RealDevice:   0x934cd768
RefCount: 0
Volume Label: 
Next we're going to step out of this function and look at the objects again. There is a new, anonymous DEVICE_OBJECT that NTFS created, which is pointed to by VPB->DeviceObject, and the VPB flags have changed to indicate that the volume is mounted.
1: kd> gu
nt!IopCheckVpbMounted+0x64:
828ad424 8b4d10          mov     ecx,dword ptr [ebp+10h]
0: kd> gu
nt!IopParseDevice+0x7c9:
82a50f9f 8945c4          mov     dword ptr [ebp-3Ch],eax
0: kd> !drvobj NTFS
Driver object (924d5758) is for:
 \FileSystem\Ntfs
Driver Extension List: (id , addr)

Device Object list:
93690020  93215638  
0: kd> !devobj 93690020  
Device object (93690020) is for:
  \FileSystem\Ntfs DriverObject 924d5758
Current Irp 00000000 RefCount 0 Type 00000008 Flags 00040000
DevExt 936900d8 DevObjExt 93690fb0 
ExtensionFlags (0x00000800)  
                             Unknown flags 0x00000800
AttachedDevice (Upper) 93566c08 \FileSystem\FltMgr
Device queue is not busy.
0: kd> !vpb 934cb290 
Vpb at 0x934cb290
Flags: 0x1 mounted 
DeviceObject: 0x93690020
RealDevice:   0x934cd768
RefCount: 15
Volume Label: 
Filters have largely been out of the picture so far (except for the fact that FltMgr was attached both to NTFS' CDO and to the newly created VDO). So let's talk about how legacy filters (FltMgr being a legacy filter) enter this picture. When NTFS calls IoRegisterFileSystem, FltMgr creates a DEVICE_OBJECT of its own and attaches it on top of the NTFS CDO. So FltMgr will have a device attached to all CDOs. Then, when an IRP_MN_MOUNT_VOLUME request arrives on that CDO, FltMgr creates a new DEVICE_OBJECT (that will be attached to the VDO created by the file system if the mount is successful, or discarded if the mount is not successful) and then it simply passes the IRP_MN_MOUNT_VOLUME request down to the file system.

Please note that FltMgr can't know in advance whether the file system will actually mount the volume, so it must wait until the IRP_MN_MOUNT_VOLUME is completed to do more significant work. However, if it also waited for the completion of IRP_MN_MOUNT_VOLUME before allocating the new DEVICE_OBJECT, it might end up in a position where the mount was successful but allocating the new DEVICE_OBJECT failed, so it wouldn't be able to attach to the volume. The only reason I'm mentioning this is to illustrate that the safe approach when filtering something is to pre-allocate all resources that might be necessary (and perform all checks) before the operation is sent to the layer below (and if anything fails, then fail the operation), because if the layer below successfully completes the operation the filter must not fail in processing it or it might end up in a broken state. Alternatively it might have to undo the operation performed at the underlying layer, which might not be easy or even possible.
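
The pre-allocation pattern described above can be sketched in user-mode C as follows. This is a minimal model under stated assumptions, not FltMgr's actual code: FilterDevice, filter_handle_mount, the status values and the two sample lower layers (lower_mount_ok, lower_mount_reject) are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes for this model. */
enum { MOUNT_OK = 0, MOUNT_NOT_RECOGNIZED = -1, MOUNT_NO_RESOURCES = -2 };

/* Stand-in for the filter's own device that gets attached to the VDO. */
typedef struct FilterDevice {
    int   attached;  /* 1 once attached on top of the VDO */
    void *target;    /* the VDO this filter device sits on */
} FilterDevice;

/* Stand-in for "the layer below": tries to mount and returns a VDO. */
typedef int (*LowerMountFn)(void **vdo_out);

static int fake_vdo;  /* dummy object whose address plays the role of a VDO */

/* Sample lower layer that recognizes the volume. */
int lower_mount_ok(void **vdo_out)
{
    *vdo_out = &fake_vdo;
    return MOUNT_OK;
}

/* Sample lower layer that does not recognize the volume. */
int lower_mount_reject(void **vdo_out)
{
    *vdo_out = NULL;
    return MOUNT_NOT_RECOGNIZED;
}

/* Filter's handling of a mount request. */
int filter_handle_mount(LowerMountFn lower, FilterDevice **out)
{
    assert(lower != NULL && out != NULL);
    *out = NULL;

    /* 1. Pre-allocate everything that could fail BEFORE calling down. If
          this fails, the request is failed while nothing has changed yet. */
    FilterDevice *dev = calloc(1, sizeof *dev);
    if (dev == NULL)
        return MOUNT_NO_RESOURCES;

    /* 2. Pass the request to the layer below. */
    void *vdo = NULL;
    int status = lower(&vdo);

    /* 3. Completion: discard the pre-allocated device if the mount failed. */
    if (status != MOUNT_OK) {
        free(dev);
        return status;
    }

    /* The mount succeeded; the remaining steps are plain assignments that
       cannot fail, so the filter never misses a mounted volume. */
    dev->target   = vdo;
    dev->attached = 1;
    *out = dev;
    return MOUNT_OK;
}
```

The cost of this ordering is one wasted allocation whenever the file system rejects the volume; in exchange, every step that can fail happens before the lower layer has changed any state.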
The key things to remember from this post are:
  • The drive letter (DOS name) and other volume names (NT name) are not associated with the file system device, but rather with the storage volume.
  • Mounting the volume happens on first access to that volume.
  • Also, the first IO on a volume is an IRP_MJ_CREATE, so for a filter (both legacy and minifilter) the preCreate callback will be the first operation callback called on a newly mounted file system volume.

3 comments:

  1. Thank you; waiting for the next part

  2. Great post. Thank you. I'm very pleased to know your blog.

    I have a question.
    Do you know Dokan project? It supports making filesystem in user mode, like linux FUSE.

    I'm making a network redirector with the project.
    But the IRPs which come to my redirector aren't caught by Process Monitor.
    Minifilter drivers never recognize the Dokan file system.

    The weird thing is, dokan never call the IoRegisterFileSystem function.

    http://code.google.com/p/dokan/source/browse/tags/dokan-0.6.0/sys/init.c#593

    I guess that's the key, but I'm out of ideas. Could you give me a hint?
    I really want to make this work well. Without Process Monitor, debugging is very difficult.

  3. Hello Benjamin,

    I have heard of the project but I have never used it.

    If it is a network redirector then it doesn't necessarily need to call IoRegisterFileSystem.

    There are many articles and posts on OSR's list on this topic. See http://www.osronline.com/article.cfm?id=79 and http://www.osronline.com/showThread.cfm?link=83334.
