Thursday, December 9, 2010

More on IRPs and IRP_CTRLs

Sometimes I see posts on discussion lists about how a callback is not being called for some operation that a minifilter registered for. In most (possibly all) cases it turns out that this isn't the problem: the callback is in fact called, it's just that the poster can't tell that it happened. It's happened to me a couple of times, but since I have a lot of confidence in FltMgr (having worked on it and all) I start off with the assumption that it must be something I'm doing wrong.

However, I've been wondering why people seem so keen on assuming that their minifilter callbacks aren't being called. And then I realized that it might have something to do with the fact that minifilters use a callback model whereas the NT IO model is call-through. I'll start with a brief refresher on the NT IO model and its call-through nature, then explain its limitations and how the minifilter model tries to address them. Then I'll explain some of the downsides and how to work around them.

When an IO request (open a file, read or write and so on) reaches the IO manager, the information about the request is put in an IO request packet (IRP). Then the IO manager sends the IRP to the driver that should process it by calling IoCallDriver. Multiple drivers may be needed in order to complete a single operation. For example, when the user opens a remote file, the IO request goes to a file system which then needs to send something to the network, so there are at least two drivers involved. One could design the OS so that the drivers go back to the IO manager and let it dispatch the request to the appropriate driver again, or so that the two drivers communicate directly. NT was designed to let the drivers communicate directly. Moreover, in many cases one request may pass through many drivers that make up an IO stack (like the file system stack or the storage stack or the network stack), where each driver performs a specific role. So the IRP is potentially modified by each driver and sent to the next driver by calling IoCallDriver.

This is a call-through model. In the debugger it can sometimes look like this (please note that the IRP model allows the request to be completely decoupled from the thread but in practice you still see a lot of cases where a lot of drivers simply call the next driver in the same thread):

1: kd> kn
 # ChildEBP RetAddr  
00 a204bb10 828734bc volmgr!VmReadWrite
01 a204bb28 963bc475 nt!IofCallDriver+0x63
02 a204bb34 963bc548 fvevol!FveRequestPassThrough+0x31
03 a204bb50 963bc759 fvevol!FveReadWrite+0x4e
04 a204bb80 963bc7a9 fvevol!FveFilterRundownReadWrite+0x197
05 a204bb90 828734bc fvevol!FveFilterRundownWrite+0x33
06 a204bba8 9639a76e nt!IofCallDriver+0x63
07 a204bc88 9639a8a5 rdyboost!SmdProcessReadWrite+0xa14
08 a204bca8 828734bc rdyboost!SmdDispatchReadWrite+0xcb
09 a204bcc0 965a0fd9 nt!IofCallDriver+0x63
0a a204bce8 965a12fd volsnap!VolsnapWriteFilter+0x265
0b a204bcf8 828734bc volsnap!VolSnapWrite+0x21
0c a204bd10 960b091c nt!IofCallDriver+0x63
0d a204bd1c 828a711e Ntfs!NtfsStorageDriverCallout+0x14
0e a204bd1c 828a7215 nt!KiSwapKernelStackAndExit+0x15a
0f 981c964c 828c711d nt!KiSwitchKernelStackAndCallout+0x31
10 981c96c0 960af939 nt!KeExpandKernelStackAndCalloutEx+0x29d
11 981c96ec 960b05a6 Ntfs!NtfsCallStorageDriver+0x2d
12 981c9730 960af0a0 Ntfs!NtfsMultipleAsync+0x4d
13 981c9860 960ae0a6 Ntfs!NtfsNonCachedIo+0x413
14 981c9978 960af85f Ntfs!NtfsCommonWrite+0x1ebd
15 981c99f0 828734bc Ntfs!NtfsFsdWrite+0x2e1
16 981c9a08 9605f20c nt!IofCallDriver+0x63
17 981c9a2c 9605f3cb fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x2aa
18 981c9a64 828734bc fltmgr!FltpDispatch+0xc5
19 981c9a7c 82a74f6e nt!IofCallDriver+0x63
1a 981c9a9c 82a75822 nt!IopSynchronousServiceTail+0x1f8
1b 981c9b38 8287a44a nt!NtWriteFile+0x6e8
1c 981c9b38 828798b5 nt!KiFastCallEntry+0x12a
1d 981c9bd4 82a266a8 nt!ZwWriteFile+0x11

So here we can see how a write (ZwWriteFile) goes through FltMgr, NTFS, volsnap, rdyboost, fvevol and volmgr (where I set my breakpoint for this blog post).

One big problem with this approach is that the kernel stack in NT is pretty small (it depends on the architecture and so on, but it's something like 12K or 20K), so if there are enough drivers, each of them using some stack space, it is possible to run out of stack. This does in fact happen in some cases (AV filters were notorious for using a lot of stack) and the outcome is a bugcheck. Please note that in the example above, most filters were just letting the request pass through them, without necessarily doing anything to it. So they still use stack space even if they don't care about the operation at all…

Another problem with this approach is that it is almost impossible to unload a driver, because very often each driver remembers which driver it needs to send the IRP to next. So the driver above is either referencing the next driver (in which case it will never go away) or just using it without referencing it (in which case there is a bugcheck immediately after it goes away).

FltMgr's main goal was to increase system reliability (yeah, making file system filter development easier was just a secondary objective), and it tried to address these issues by making the minifilter model a callback model. This addresses both problems. Unloading a minifilter works because a filter no longer needs to know which filter to call next, so the only component that must reference a minifilter is FltMgr. A minifilter can therefore go away by informing only FltMgr about it.

The way this takes care of stack usage is a bit more interesting. When the minifilter's callback is done, it returns a status to FltMgr that tells FltMgr whether the minifilter wants to be notified when the request completes (there are a couple of other statuses as well), and that's it. The stack space associated with the call to the minifilter's callback (the stack frame) is released and can be reused. This is why in the stack above, the IRP simply goes from the IO manager to FltMgr and then to the file system. It doesn't matter how many minifilters are attached to the volume; none of them uses any stack space at this point.
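To make the difference concrete, here is a small user-mode C model of the two approaches. It is only a sketch: every name in it is hypothetical, nothing here is an NT or FltMgr API, and counting nested calls is just a stand-in for real stack usage. The call-through version nests one frame per legacy filter, while the callback version never has more than the manager's frame plus one pre-operation callback live at a time.

```c
#include <assert.h>

/* Hypothetical user-mode model. None of these names are real NT or FltMgr APIs. */

static int g_depth;      /* number of "driver" frames currently live */
static int g_max_depth;  /* deepest nesting seen during one simulated request */

static void enter_frame(void)
{
    if (++g_depth > g_max_depth)
        g_max_depth = g_depth;
}

static void leave_frame(void)
{
    --g_depth;
}

/* Call-through: each legacy filter calls the next one before it returns,
 * so its frame stays live for the rest of the trip down the stack. */
static void legacy_dispatch(int remaining)
{
    enter_frame();
    if (remaining > 1)
        legacy_dispatch(remaining - 1);  /* stands in for IoCallDriver */
    leave_frame();
}

/* Callback: the pre-operation callback does its work and returns to the
 * manager, releasing its frame before the next minifilter is called. */
static void mini_preop(void)
{
    enter_frame();
    leave_frame();
}

/* Deepest nesting when 'filters' legacy filters are stacked call-through. */
int max_depth_call_through(int filters)
{
    assert(filters >= 0);
    g_depth = g_max_depth = 0;
    if (filters > 0)
        legacy_dispatch(filters);
    return g_max_depth;
}

/* Deepest nesting when a manager calls 'filters' minifilters in a loop. */
int max_depth_callback(int filters)
{
    assert(filters >= 0);
    g_depth = g_max_depth = 0;
    enter_frame();                       /* the manager's own frame */
    for (int i = 0; i < filters; i++)
        mini_preop();
    leave_frame();
    return g_max_depth;
}
```

In this model max_depth_call_through(5) returns 5, while max_depth_callback(5) and max_depth_callback(50) both return 2: the manager's frame plus whichever single callback is running.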

Now, let's look in more detail at filter manager's stack frames. There are no minifilter functions on the stack because they all returned nicely to FltMgr and no longer use any stack space. This is the most confusing thing about all this: the minifilters cannot be seen on the stack, so it looks like they have never been called at all… However, now that we know that FltMgr must have called some minifilters, is there a way to see which minifilters were called and so on? In a previous post I explained that FltMgr has an internal structure that wraps the IRP called the IRP_CTRL (also known as a CALLBACK_DATA), and all the information about the request is stored in there. FltMgr clearly must remember the IRP_CTRL associated with this IRP someplace, but where?

1: kd> kbn
 # ChildEBP RetAddr  Args to Child              
16 981c9a08 9605f20c 93460958 94301bf8 00000000 nt!IofCallDriver+0x63
17 981c9a2c 9605f3cb 981c9a4c 93460958 00000000 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x2aa
18 981c9a64 828734bc 93460958 94301bf8 94301bf8 fltmgr!FltpDispatch+0xc5
19 981c9a7c 82a74f6e 93715f80 94301bf8 94301dac nt!IofCallDriver+0x63

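Well, it turns out that there is another very useful structure called the IRP_CALL_CTRL, which associates an IRP with an IRP_CTRL and other context that FltMgr keeps for the operation. In the stack above, its address (981c9a4c) shows up as the first argument to fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted: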

1: kd> dt 981c9a4c fltmgr!_IRP_CALL_CTRL
   +0x000 Volume           : 0x932f1008 _FLT_VOLUME
   +0x004 Irp              : 0x94301bf8 _IRP
   +0x008 IrpCtrl          : 0x93591de0 _IRP_CTRL
   +0x00c StartingCallbackNode : 0xffffffff _CALLBACK_NODE
   +0x010 OperationStatusCallbackListHead : _SINGLE_LIST_ENTRY
   +0x014 Flags            : 0x204 (No matching name)

From here we can see the IRP_CTRL pointer and call my favorite extension, !fltkd (I get a complaint on my current symbols about how the PVOID type is not defined, which I've edited out):

1: kd> !fltkd.irpctrl 0x93591de0

IRP_CTRL: 93591de0  WRITE (4) [00000001] Irp
Flags                    : [10000004] DontCopyParms FixedAlloc
Irp                      : 94301bf8 
DeviceObject             : 93460958 "\Device\HarddiskVolume2"
FileObject               : 93715f80 
CompletionNodeStack      : 93591e98   Size=5  Next=1
SyncEvent                : (93591df0)
InitiatingInstance       : 00000000 
Icc                      : 981c9a4c 
PendingCallbackNode      : ffffffff 
PendingCallbackContext   : 00000000 
PendingStatus            : 0x00000000 
CallbackData             : (93591e40)
 Flags                    : [00000001] Irp
 Thread                   : 93006020 
 Iopb                     : 93591e6c 
 RequestorMode            : [00] KernelMode
 IoStatus.Status          : 0x00000000 
 IoStatus.Information     : 00000000 
 TagData                  : 00000000 
 FilterContext[0]         : 00000000 
 FilterContext[1]         : 00000000 
 FilterContext[2]         : 00000000 
 FilterContext[3]         : 00000000 

   Cmd     IrpFl   OpFl  CmpFl  Instance FileObjt Completion-Context  Node Adr
--------- -------- ----- -----  -------- -------- ------------------  --------
 [0,0]    00000000  00   0000   00000000 00000000 00000000-00000000   93591fb8
     Args: 00000000 00000000 00000000 00000000 00000000 0000000000000000
 [0,0]    00000000  00   0000   00000000 00000000 00000000-00000000   93591f70
     Args: 00000000 00000000 00000000 00000000 00000000 0000000000000000
 [0,0]    00000000  00   0000   00000000 00000000 00000000-00000000   93591f28
     Args: 00000000 00000000 00000000 00000000 00000000 0000000000000000
 [0,0]    00000000  00   0000   00000000 00000000 00000000-00000000   93591ee0
     Args: 00000000 00000000 00000000 00000000 00000000 0000000000000000
 [4,0]    00060a01  00   0002   9341d918 93715f80 9608e55e-2662d614   93591e98
            ("FileInfo","FileInfo")  fileinfo!FIPostReadWriteCallback 
     Args: 00020000 00000000 003a0000 00000000 92fc6000 0000000000000000
Working IOPB:
>[4,0]    00060a01  00          9341d918 93715f80                     93591e6c
     Args: 00020000 00000000 003a0000 00000000 92fc6000 0000000000000000

Here we can see what the minifilter stack looks like and that the FileInfo minifilter wanted a postOp callback for this operation. Another thing we can do is this (using the FLT_VOLUME pointer from the IRP_CALL_CTRL):

1: kd>  !fltkd.volume 0x932f1008

FLT_VOLUME: 932f1008 "\Device\HarddiskVolume2"
   FLT_OBJECT: 932f1008  [04000000] Volume
      RundownRef               : 0x00000074 (58)
      PointerCount             : 0x00000001 
      PrimaryLink              : [9334f404-932ad9b4] 
   Frame                    : 930adcc0 "Frame 0" 
   Flags                    : [00000064] SetupNotifyCalled EnableNameCaching FilterAttached
   FileSystemType           : [00000002] FLT_FSTYPE_NTFS
   VolumeLink               : [9334f404-932ad9b4] 
   DeviceObject             : 93460958 
   DiskDeviceObject         : 932b2320 
   FrameZeroVolume          : 932f1008 
   VolumeInNextFrame        : 00000000 
   Guid                     : "" 
   CDODeviceName            : "\Ntfs" 
   CDODriverName            : "\FileSystem\Ntfs" 
   TargetedOpenCount        : 55 
   Callbacks                : (932f109c)
   ContextLock              : (932f12f4)
   VolumeContexts           : (932f12f8)  Count=0
   StreamListCtrls          : (932f12fc)  rCount=2630 
   FileListCtrls            : (932f1340)  rCount=0 
   NameCacheCtrl            : (932f1388)
   InstanceList             : (932f1058)
      FLT_INSTANCE: 94114498 "luafv" "135000"
      FLT_INSTANCE: 9341d918 "FileInfo" "45000"

From here we can tell that there are in fact two minifilters attached to this volume, luafv and FileInfo. We knew about FileInfo from the IRP_CTRL, but what about luafv? Did it even get called? Well, unfortunately the only thing we can know for sure is that luafv was registered with FltMgr and attached to this volume. It might not have a callback registered for WRITEs, or its callback was called but returned FLT_PREOP_SUCCESS_NO_CALLBACK, in which case FltMgr didn't use a completion node for it and so there is no record of it… We can look at the filter and see the registered callbacks, but we might not be able to find a record of whether the callback was actually called.
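The reason there is no record can be sketched with another small user-mode C model. This is an illustration under assumptions, not FltMgr's actual code: the sim_ names are all hypothetical, and the status values only mirror the idea behind FLT_PREOP_SUCCESS_WITH_CALLBACK and FLT_PREOP_SUCCESS_NO_CALLBACK. The manager records a completion node only for filters that ask for a post-operation callback, so a filter that was called but returned the "no callback" status leaves nothing behind in the per-request structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-mode model. The sim_ names are NOT real FltMgr types. */

typedef enum {
    SIM_PREOP_SUCCESS_WITH_CALLBACK,  /* "call my post-op when this completes" */
    SIM_PREOP_SUCCESS_NO_CALLBACK     /* "I'm done with this request" */
} sim_preop_status;

typedef sim_preop_status (*sim_preop_fn)(void);

typedef struct {
    const char  *name;
    sim_preop_fn preop;         /* NULL means not registered for this operation */
    int          times_called;  /* exists only in this model, for illustration */
} sim_filter;

#define SIM_MAX_NODES 5

/* Stand-in for the per-request record that holds the completion nodes. */
typedef struct {
    const char *nodes[SIM_MAX_NODES];
    int         node_count;
} sim_irp_ctrl;

sim_preop_status sim_wants_postop(void) { return SIM_PREOP_SUCCESS_WITH_CALLBACK; }
sim_preop_status sim_no_postop(void)    { return SIM_PREOP_SUCCESS_NO_CALLBACK; }

/* Call each registered pre-op in order. Only filters that request a post-op
 * get a completion node, so only they are visible in the record afterwards. */
void sim_run_preops(sim_filter *filters, int count, sim_irp_ctrl *ctrl)
{
    ctrl->node_count = 0;
    for (int i = 0; i < count; i++) {
        if (filters[i].preop == NULL)
            continue;                     /* never called, no record */
        filters[i].times_called++;
        if (filters[i].preop() == SIM_PREOP_SUCCESS_WITH_CALLBACK) {
            assert(ctrl->node_count < SIM_MAX_NODES);
            ctrl->nodes[ctrl->node_count++] = filters[i].name;
        }
        /* NO_CALLBACK: the filter was called, but nothing is recorded */
    }
}
```

If two filters are run through sim_run_preops, one using sim_no_postop and one using sim_wants_postop, both end up with times_called equal to 1, but the record holds a single node for the second filter only. From the record alone, a filter that returned "no callback" is indistinguishable from one that never registered for the operation.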