Thursday, October 27, 2011

TargetInstance Redirection Problems for FastIO on WinXP

Frankly, I don't expect many people to run into this problem, but I'd like to show some of the debugging that led me to figure it out and what the implications are. I've already explained how using TargetInstance might help filters, and some of the issues associated with it, in my post on File IO Redirection Between Volumes Using FltMgr. I also have a post on Handling IRP_MJ_NETWORK_QUERY_OPEN in a Minifilter, and I encourage you to revisit those posts if you need a refresher.
In the post on handling IRP_MJ_NETWORK_QUERY_OPEN my suggestion was to return STATUS_FLT_DISALLOW_FAST_IO if you don't want to deal with all the weird semantics it introduces. However, there is a small performance overhead associated with failing IRP_MJ_NETWORK_QUERY_OPEN in this manner, so while I was chasing down some performance issues I decided to actually implement this path.
The filter I was working on was a pretty classic design, returning STATUS_REPARSE to redirect IRP_MJ_CREATEs to a different volume. Let's also use the simplifying assumption that the file name is exactly the same on the two volumes. This meant that in IRP_MJ_NETWORK_QUERY_OPEN I should be able to redirect the request simply by changing the TargetInstance to the instance associated with the other volume; the request would then follow that path and get the attributes for the file on the other volume. And since no handle is opened as a result of this operation, I didn't have to worry about subsequent operations.
I'll post some pseudocode because there is just too much infrastructure to set things up properly in the passthrough sample. There is an instance context that I use to figure out if we need to redirect requests and where to redirect them (if there is no context attached then I don't redirect anything):
typedef struct _MY_INSTANCE_CONTEXT {

    PFLT_INSTANCE InstanceToRedirectTo;

} MY_INSTANCE_CONTEXT, *PMY_INSTANCE_CONTEXT;
And the following piece of code I've added to PtPreOperationPassThrough:
        if (!NT_SUCCESS(status)) {

            PT_DBG_PRINT( PTDBG_TRACE_OPERATION_STATUS,
                          ("PassThrough!PtPreOperationPassThrough: FltRequestOperationStatusCallback Failed, status=%08x\n",
                           status) );
        }
    }

    if (Data->Iopb->MajorFunction == IRP_MJ_NETWORK_QUERY_OPEN) {

        status = FltGetInstanceContext( FltObjects->Instance, &instanceContext );

        if (NT_SUCCESS(status)) {

            //
            // send this request to the instance we want it to go to and we must
            // mark the FLT_CALLBACK_DATA dirty.
            //

            Data->Iopb->TargetInstance = instanceContext->InstanceToRedirectTo;

            FltSetCallbackDataDirty( Data );

            //
            // we'll release this in the postOp callback.
            //
            
            *CompletionContext = (PVOID)instanceContext;
        } else {

            if (status == STATUS_NOT_FOUND) {

                //
                // this isn't an instance for which we want to redirect this 
                // operation, send the request down and don't care about
                // the postOp Callback.
                //

                return FLT_PREOP_SUCCESS_NO_CALLBACK;

            } else {

                //
                // some other error. we can either fail the request here or
                // we can disallow FastIO (the caller then sees
                // STATUS_FLT_DISALLOW_FAST_IO) and we'll get another shot
                // at it on the IRP_MJ_CREATE path.
                //

                return FLT_PREOP_DISALLOW_FASTIO;
            }
        }
    }

    return FLT_PREOP_SUCCESS_WITH_CALLBACK;
}
Again, this is very simplified and only shows how to set the TargetInstance, but there are a couple of things I'd like to point out. Because of how FastIO works (each driver calls the next driver, passing parameters on the stack), FastIO doesn't have the problem described in my post on File IO Redirection Between Volumes Using FltMgr: there is no IRP and there are no IO_STACK_LOCATIONs. It is still possible to run out of thread stack, but that can be worked around. Also, in terms of referencing, please note that I'm keeping a reference to the instance context from the preOp to the postOp callback, which in turn keeps the instance pointed to by instanceContext->InstanceToRedirectTo around (though of course there are multiple different ways to achieve the same result).
So anyway, the code I have works fine in Win7, after I disabled LUAFV, because LUAFV always fails IRP_MJ_NETWORK_QUERY_OPEN with STATUS_FLT_DISALLOW_FAST_IO. If you're wondering why I went through all the trouble when LUAFV will be running on all Vista and Win7 machines anyway, let me remind you that server SKUs don't have LUAFV running, so there are machines out there running the Win7 kernel without LUAFV in the picture and my code might actually help them (and, as you'd expect, performance is a much bigger concern for servers). However, on WinXP SP3 I kept getting STATUS_OBJECT_NAME_NOT_FOUND (and the other statuses that indicate that the file isn't there) even though the file was definitely present. Having tested that Win7 worked, I started to wonder whether there was something different in WinXP that I needed to worry about. So I decided to see whether the request makes it to the right volume after all:
1: kd> kn L5 
 # ChildEBP RetAddr  
00 f53f293c f8477888 myfilter!PreNetworkQueryOpen // this is my preOp callback
01 f53f299c f84791a7 fltMgr!FltpPerformPreCallbacks+0x2d4 // this calls the preOp callbacks
02 f53f29b4 f8485c7a fltMgr!FltpPassThroughFastIo+0x3b // this is FltMgr's function to process FastIO operations
03 f53f29f8 f83d6f70 fltMgr!FltpFastIoQueryOpen+0xf4 // FltMgr's FastIO callback for this operation
04 f53f2a18 805830fe sr!SrFastIoQueryOpen+0x40 // SR is issuing the request
1: kd> ?? Data // we need the address of the FLT_CALLBACK_DATA
struct _FLT_CALLBACK_DATA * 0x81b49684
   +0x000 Flags            : 2
...
1: kd> dt 0x81b49684 fltmgr!_FLT_CALLBACK_DATA Iopb->TargetInstance // See what is the instance the request was originally going to 
   +0x008 Iopb                 : 
      +0x00c TargetInstance       : 0x820d3008 _FLT_INSTANCE
1: kd> dt 0x820d3008 fltmgr!_FLT_INSTANCE Volume // get the volume from the instance 
   +0x018 Volume : 0x8237e5c0 _FLT_VOLUME
1: kd> dt  0x8237e5c0 fltmgr!_FLT_VOLUME DeviceObject // get FltMgr's DEVICE_OBJECT from the volume
   +0x01c DeviceObject : 0x823dac70 _DEVICE_OBJECT
1: kd> !devstack 0x823dac70 // see what's the bottom DEVICE_OBJECT for this volume. 
  !DevObj   !DrvObj            !DevExt   ObjectName
  823637a8  \FileSystem\sr     82363860  
> 823dac70  \FileSystem\FltMgr 823dad28  
  822fe020  \FileSystem\Ntfs   822fe0d8  // so we have NTFS on the bottom
1: kd> bp /t @$thread f8477888 // ok, now let's step out of my preOp callback on this thread and see what we change the instance to 
1: kd> bl
 0 e f8477888     0001 (0001) fltMgr!FltpPerformPreCallbacks+0x2d4
     Match thread data 81a3cbe8

1: kd> g
Breakpoint 0 hit
fltMgr!FltpPerformPreCallbacks+0x2d4:
f8477888 83f802          cmp     eax,2
1: kd> bc 0
1: kd> dt 0x81b49684 fltmgr!_FLT_CALLBACK_DATA Iopb->TargetInstance // it's the same FLT_CALLBACK_DATA but the instance should be different
   +0x008 Iopb                 : 
      +0x00c TargetInstance       : 0x820d9008 _FLT_INSTANCE
1: kd> dt  0x820d9008 fltmgr!_FLT_INSTANCE Volume // get the volume for the new instance
   +0x018 Volume : 0x820ebae0 _FLT_VOLUME
1: kd> dt 0x820ebae0 fltmgr!_FLT_VOLUME DeviceObject // get the DEVICE_OBJECT for the volume
   +0x01c DeviceObject : 0x820ebee8 _DEVICE_OBJECT
1: kd> !devstack 0x820ebee8 // see what's the bottom DEVICE_OBJECT… again, NTFS… 
  !DevObj   !DrvObj            !DevExt   ObjectName
  820eb020  \FileSystem\sr     820eb0d8  
> 820ebee8  \FileSystem\FltMgr 820ebfa0  
  820ea020  \FileSystem\Ntfs   820ea0d8  
1: kd> bp /t @$thread Ntfs!NtfsNetworkOpenCreate // ok, put a break on NTFS's function that processes this FastIO on this thread 
1: kd> g
Breakpoint 0 hit
Ntfs!NtfsNetworkOpenCreate:
f834ffb8 6878010000      push    178h
1: kd> bc 0
1: kd> kb L5 // show us the stack with parameters so we can see which device the request was actually sent to.
ChildEBP RetAddr  Args to Child              
f53f2968 f84790e8 81a6c380 f53f2c00 822fe020 Ntfs!NtfsNetworkOpenCreate // what do you know, it's the original DEVICE_OBJECT: 822fe020
f53f2988 f84791e4 000000f2 00000000 81b496c0 fltMgr!FltpPerformFastIoCall+0x300
f53f29b4 f8485c7a 003f29d8 823637a8 81a6c510 fltMgr!FltpPassThroughFastIo+0x78
f53f29f8 f83d6f70 81a6c380 f53f2c00 823dac70 fltMgr!FltpFastIoQueryOpen+0xf4
f53f2a18 805830fe 81a6c380 f53f2c00 823637a8 sr!SrFastIoQueryOpen+0x40
So what I did was to get the FLT_CALLBACK_DATA at the beginning of my callback and from that extract the file system's DEVICE_OBJECT on which the original request was sent. Then I let my callback run, checked what the new TargetInstance was, and got the file system's DEVICE_OBJECT on that stack. Then I simply let the request go until it hit the file system (NTFS on both volumes in this case), where the stack shows which DEVICE_OBJECT the request was actually sent to. And, as I suspected, the request was sent to the original DEVICE_OBJECT and not to the DEVICE_OBJECT for the instance I switched to. But why? What should I have changed to make the request go where I wanted? With some stepping through the code and reading a bunch of assembly I got to this part:
1: kd> u fltMgr!FltpPassThroughFastIo+0x55 L0xE
fltMgr!FltpPassThroughFastIo+0x55:
f84791c1 8b0f            mov     ecx,dword ptr [edi] // what is EDI ?
f84791c3 8b4664          mov     eax,dword ptr [esi+64h] // what is ESI ?
f84791c6 8d5e68          lea     ebx,[esi+68h]
f84791c9 53              push    ebx
f84791ca ff711c          push    dword ptr [ecx+1Ch]
f84791cd 8d4810          lea     ecx,[eax+10h]
f84791d0 ff7640          push    dword ptr [esi+40h]
f84791d3 51              push    ecx
f84791d4 33c9            xor     ecx,ecx
f84791d6 8a4805          mov     cl,byte ptr [eax+5]
f84791d9 0fb64004        movzx   eax,byte ptr [eax+4]
f84791dd 51              push    ecx
f84791de 50              push    eax
f84791df e804fcffff      call    fltMgr!FltpPerformFastIoCall (f8478de8)
1: kd> !pool @esi 2
Pool page 81b49628 region is Nonpaged pool
*81b49620 size:  108 previous size:   18  (Allocated) *FMic
  Pooltag FMic : IRP_CTRL structure, Binary : fltmgr.sys
1: kd> r @edi
edi=f53f29d8 // this is an address on the current stack
1: kd> dp f53f29d8
f53f29d8  8237e5c0 00000000 81b49628 ffffffff // so this structure has a pointer to the FLT_VOLUME and IRP_CTRL.. Must be the IRP_CALL_CTRL
f53f29e8  00000000 00000000 000001b4 0000493e
1: kd> dt @edi fltmgr!_IRP_CALL_CTRL
   +0x000 Volume           : 0x8237e5c0 _FLT_VOLUME
   +0x004 Irp              : (null) 
   +0x008 IrpCtrl          : 0x81b49628 _IRP_CTRL
   +0x00c StartingCallbackNode : 0xffffffff _CALLBACK_NODE
   +0x010 OperationStatusCallbackListHead : _SINGLE_LIST_ENTRY
   +0x014 Flags            : 0 (No matching name)
1: kd> dt fltmgr!_FLT_VOLUME
   +0x000 Base             : _FLT_OBJECT
   +0x014 Flags            : _FLT_VOLUME_FLAGS
   +0x018 FileSystemType   : _FLT_FILESYSTEM_TYPE
   +0x01c DeviceObject     : Ptr32 _DEVICE_OBJECT
….
So as you can see it looks like FltMgr picks the DEVICE_OBJECT from the IRP_CALL_CTRL->Volume structure. Let's see what happens in Win7:
0: kd> u fltmgr!FltpPassThroughFastIo+0x5a L0xD
fltmgr!FltpPassThroughFastIo+0x5a:
96019198 8b4668          mov     eax,dword ptr [esi+68h] // offset 0x68 where we had 0x64 in XP
9601919b 8d5e6c          lea     ebx,[esi+6Ch] // offset 0x6C where we had 0x68 in XP… did the IRP_CTRL change ?
9601919e 832300          and     dword ptr [ebx],0
960191a1 53              push    ebx
960191a2 ff763c          push    dword ptr [esi+3Ch] // and then there is a push for IRP_CTRL+0x3c instead of IRP_CALL_CTRL->Volume+0x1c..
960191a5 8d4810          lea     ecx,[eax+10h]
960191a8 ff7640          push    dword ptr [esi+40h]
960191ab 51              push    ecx
960191ac 0fb64805        movzx   ecx,byte ptr [eax+5]
960191b0 0fb64004        movzx   eax,byte ptr [eax+4]
960191b4 51              push    ecx
960191b5 50              push    eax
960191b6 e803fcffff      call    fltmgr!FltpPerformFastIoCall (96018dbe)
1: kd> dt fltmgr!_IRP_CTRL
   +0x000 Type             : _FLT_TYPE
   +0x004 Flags            : _IRP_CTRL_FLAGS
   +0x008 MajorFunction    : UChar
   +0x009 Reserved0        : UChar
   +0x00a CompletionStackLength : UChar
   +0x00b NextCompletion   : UChar
   +0x00c CompletionStack  : Ptr32 _COMPLETION_NODE
   +0x010 SyncEvent        : _KEVENT
   +0x020 Irp              : Ptr32 _IRP
   +0x020 FsFilterData     : Ptr32 _FS_FILTER_CALLBACK_DATA
   +0x024 AsyncCompletionRoutine : Ptr32     void 
   +0x028 AsyncCompletionContext : Ptr32 Void
   +0x02c InitiatingInstance : Ptr32 _FLT_INSTANCE
   +0x030 PendingCallbackNode : Ptr32 _CALLBACK_NODE
   +0x030 StartingCallbackNode : Ptr32 _CALLBACK_NODE
   +0x034 preOp            : __unnamed
   +0x034 postOp           : __unnamed
   +0x038 PostCompletionRoutine : Ptr32     void 
   +0x03c DeviceObject     : Ptr32 _DEVICE_OBJECT // so we get the DEVICE_OBJECT from the IRP_CTRL
...
Ok, so what's going on is that in Win7 the DEVICE_OBJECT appears to be taken from the IRP_CTRL (the internal FltMgr structure that hosts the FLT_CALLBACK_DATA), which makes sense since we change the TargetInstance in the FLT_CALLBACK_DATA. In XP the DEVICE_OBJECT is taken from IRP_CALL_CTRL->Volume, and I haven't been able to find any code path that updates the IRP_CALL_CTRL. Based on this I've decided that this is an XP bug and that I can't really work around it on WinXP, since there is no way to update the FLT_VOLUME inside the IRP_CALL_CTRL (both of which are undocumented, btw). So I've updated my code so that on WinXP it always returns STATUS_FLT_DISALLOW_FAST_IO.
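The version check itself is simple enough to sketch. This is only an illustration of the decision, not the code from my filter: the helper name fastio_redirect_supported is made up, and the policy of trusting only Win7 (NT 6.1) and later is an assumption based on the fact that XP SP3 and Win7 are the only systems looked at here. In a real driver the major and minor numbers would typically come from RtlGetVersion, queried once at load time.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not a FltMgr API): decides whether the filter should
 * try TargetInstance redirection on a FastIO path.
 *
 * Assumption: only NT 6.1 (Win7) and later are treated as safe, because that
 * is the only version where the redirection was seen to work. Everything
 * older, WinXP (NT 5.1) included, is treated as "disallow FastIO instead".
 */
static bool fastio_redirect_supported(unsigned long major, unsigned long minor)
{
    if (major > 6) {
        return true;            /* newer than the NT 6.x family */
    }

    if (major == 6 && minor >= 1) {
        return true;            /* Win7 or later within NT 6.x */
    }

    return false;               /* XP, and anything else not verified here */
}
```

When this returns false the preOp callback would take the same path as the error case above and disallow FastIO, so the request comes back as a regular IRP_MJ_CREATE.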
Finally, there is one more aspect to discuss. It looks like FltpPassThroughFastIo is a generic handler for all FastIO routines and as such this problem might actually be affecting all FastIO in WinXP and not only IRP_MJ_NETWORK_QUERY_OPEN, so if you see that TargetInstance redirection isn't working then it might be this issue.
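To make the difference between the two code paths concrete, here is a small user-mode toy model. None of these types are the real FltMgr structures (which are undocumented); every name is made up and only the fields discussed above are modelled. In particular, the Win7-like lookup simply follows the current target instance, which is a simplification: the disassembly only shows that Win7 reads a DeviceObject field from the IRP_CTRL, not how that field is kept up to date.

```c
/* Toy stand-ins; all names are hypothetical and exist only for this model. */
typedef struct { int device_id; } toy_device;             /* DEVICE_OBJECT  */
typedef struct { toy_device *device; } toy_volume;        /* FLT_VOLUME     */
typedef struct { toy_volume *volume; } toy_instance;      /* FLT_INSTANCE   */
typedef struct { toy_instance *target_instance; } toy_irp_ctrl; /* IRP_CTRL */
typedef struct {
    toy_volume   *volume;     /* captured when the FastIO call starts */
    toy_irp_ctrl *irp_ctrl;
} toy_call_ctrl;                                          /* IRP_CALL_CTRL  */

/* What a preOp callback does when it changes TargetInstance. */
static void toy_redirect(toy_irp_ctrl *irp_ctrl, toy_instance *new_target)
{
    irp_ctrl->target_instance = new_target;
}

/* XP-like: the device comes from the volume captured in the call control,
 * which nothing updates after the redirect. */
static toy_device *pick_device_xp_like(const toy_call_ctrl *cc)
{
    return cc->volume->device;
}

/* Win7-like (simplified assumption): the device is resolved through the
 * IRP_CTRL, so it reflects the redirected target. */
static toy_device *pick_device_win7_like(const toy_call_ctrl *cc)
{
    return cc->irp_ctrl->target_instance->volume->device;
}
```

With two volumes set up and the call control created against the first one, calling toy_redirect to the second instance changes what pick_device_win7_like returns but not what pick_device_xp_like returns, which is the behavior seen in the debugger session.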