Friday, November 12, 2010

Some thoughts on FltDoCompletionProcessingWhenSafe

I've been meaning to talk about this for a while. There is a warning in the MSDN page for FltDoCompletionProcessingWhenSafe which is pretty interesting:

Caution   To avoid deadlocks, FltDoCompletionProcessingWhenSafe cannot be called for I/O operations that can be directly completed by a driver in the storage stack, such as the following:
• IRP_MJ_READ
• IRP_MJ_WRITE
• IRP_MJ_FLUSH_BUFFERS

Let's start by looking a bit at how file systems handle requests. There are multiple ways in which file systems can complete user requests, but largely they fall into a few cases. I'd like to point out that I'm simplifying things here, there are many ways in which file systems might handle operations and the same goes for storage devices… What I'm describing is not an exhaustive list of how things happen in a file system and storage stack, but rather a plausible way in which they can happen in some file systems in some cases:
• Synchronous - when all the data is readily available then the file system doesn't need to do any additional steps and can just perform the operation and return to the caller. For example, when setting the delete disposition on a file, the file system only needs to access the FCB and set the flag (because the delete disposition is a flag on the FCB). If the file system can acquire the FCB immediately it can just set the flag to whatever disposition the caller wanted, release the FCB and call IoCompleteRequest. When this happens the completion routines (and the postOp callbacks for minifilters) are actually called in the same thread as the original operation, at the same IRQL (which is very likely at PASSIVE_LEVEL)...
• Queued (asynchronous) - this happens when the file system realizes it can't complete the operation immediately and it needs to pend the request and complete it when some condition occurs. There are a lot of cases when this happens, for example when the file system needs to acquire some resource and it doesn't want to wait for it inline. Another case where this is pretty much the only course of action is when the caller registers notifications for something (oplocks, directory changes and such) and the IRP gets pended. In these cases, the postOp callbacks will be called generally in the context of the thread that released the resource or that did something to trigger the notification (acknowledge an oplock break, rename a file and so on). This is usually a different thread from the original thread the request came in, and usually the IRQL is <= APC_LEVEL.
• Forwarded - this can happen when the file system needs to get some data from the storage device and it simply forwards the request the underlying device. For example let's say that a user wants to read some aligned data from a file. The file system might simply calculate where the data begins on disk (by consulting its allocation maps which we'll assume are cached so no reading from the device is necessary), change the offset in the IRP_MJ_READ parameters to the right sector where the data is located, then lock the buffer in memory and then call IoCallDriver. When this request will be satisfied by the storage stack, it will call IoCompleteRequest and the file system will pretty much not do anything (or free some resources or some such) and then let the request go up. In this case, the thread in which the postOp callback gets called is the thread that was running when the disk IO was completed by the device (the IO will be completed in an interrupt, which will likely queue a DPC, which will then execute in whatever thread context the CPU happened to be running when the interrupt triggered) and at DPC_LEVEL.

Now, in a lot of cases the file system will need to perform a bunch of things in response to one single user request. For example, a request to write something might mean the file system will need to do at least the following (please ignore the order of the operations here):
• Write the data
• Update the last access time
• Update the file size
All these changes need to be saved to different places on disk (usually, it really depends on the filesystem) so the request might be pended by the file system while it issues a bunch of different IO requests to the storage device and when all of them complete it can complete the request. So in most cases operations are a combination of queued and forwarded operations.

The reason I went into all of this was because I wanted to make this point: in most cases, the postOp callback will be called at DPC only if the operation required one or more IOs to be sent to the storage device and the filesystem didn't need to synchronize the operation back to some internal thread and instead simply had a passthrough completion routine (see FatSingleAsyncCompletionRoutine in the FASTFAT sample ). The file system will not usually complete an operation at DPC in other cases (again, different file system do things differently so it MIGHT still happen).

Now, this means that the either warning or the function are useless, because the only reason the FltDoCompletionProcessingWhenSafe exists is to enable minifilters to write completion routines that use functions that require being <= APC_LEVEL and not worry about whether the postOp callback is called at DPC. So if according to the warning, "FltDoCompletionProcessingWhenSafe cannot be called for I/O operations that can be directly completed by a driver in the storage stack", then this is like saying that FltDoCompletionProcessingWhenSafe cannot be called for operations that might be completed at DPC_LEVEL, which is the only case where it is useful.

I'll talk about the actual deadlocks in a post next week.

Thursday, November 4, 2010

ObQueryNameString can return names with a NULL Buffer ( and an example with SR.sys)

ObQueryNameString is a very useful API. It's used in a lot of places and is a pretty good choice if you want to find the name for an OB object. However, using it is not without pitfalls. At the moment the documentation page on MSDN has this to say in the Remarks section: "If the given object is unnamed, or if the object name was not successfully acquired, ObQueryNameString sets Name.Buffer to NULL and sets Name.Length and Name.MaximumLength to zero.". What is not clearly spelled in there is the fact that the return status in this case will be STATUS_SUCCESS.

So let's recap. Any app developer can call ObQueryNameString and get STATUS_SUCCESS, but the Name.Buffer will be NULL and they might not expect that. I've seen this issue over and over again. People get a reference to an object and they query the name, get a NULL buffer and then try to read/compare/do whatever with it and they get a visit from the bugcheck fairy. Please note that since the Length and MaximumLength are both 0, people would be safe using the Rtl functions since they tend to check these sort of things..

It is interesting to understand the reasons about how people can get bitten by this as well. The documentation specifies that "If the given object is unnamed, or if the object name was not successfully acquired...", which I guess for most people it translates into "if the name was not successfully acquired then I will some error NTSTATUS… if this object is unnamed then it's not clear what I get, maybe also some error code?..". So I suppose that people that create named objects that they own (or objects that the system creates and are guaranteed to be named) imagine that they can never get the NULL buffer and STATUS_SUCCESS. But any named object can become unnamed when it is deleted. After all, the namespace entry is simply an additional reference to the object and deleting a named object simply deletes that reference, but the object might still be kept around by other references. One easy way to see this is to follow the calls to IoCreateDevice. For example, for an unnamed device one can see this :

Imediately after a IoCreateDevice for an unnamed device:

3: kd> !devobj 93602e48  
Device object (93602e48) is for:
  \FileSystem\FltMgr DriverObject 92cb6660
Current Irp 00000000 RefCount 0 Type 00000003 Flags 00000080
DevExt 93602f00 DevObjExt 93602f30 
ExtensionFlags (0x00000800)  
                             Unknown flags 0x00000800
Device queue is not busy.

3: kd> !object 93602e48  
Object: 93602e48  Type: (922d6440) Device
    ObjectHeader: 93602e30 (new version)
    HandleCount: 0  PointerCount: 1

And immediately after a named device:
2: kd> !devobj 930e0628  
Device object (930e0628) is for:
 FltMgr \FileSystem\FltMgr DriverObject 92f691e8
Current Irp 00000000 RefCount 0 Type 00000008 Flags 000000c0
Dacl 96fd2eec DevExt 00000000 DevObjExt 930e06e0 
ExtensionFlags (0x00000800)  
                             Unknown flags 0x00000800
Device queue is not busy.

2: kd> !object 930e0628
Object: 930e0628  Type: (922d7508) Device
    ObjectHeader: 930e0610 (new version)
    HandleCount: 0  PointerCount: 2
    Directory Object: 96e61948  Name: FltMgr

Please notice how the pointer count is different. Once the named device is deleted (IoDeleteDevice), the reference from the OB namespace is removed (and the object's name in the OB header is changed) and then, when the reference count eventually reaches 0, the object is freed. However, it anyone calls ObQueryNameString on one of those references, they will get the NULL Name.Buffer...

So it is perfectly possible for a driver that is working with an object that it knows it must be named to actually get in the window between when the object is removed from the OB namespace and the time the final reference is released (the driver will of course have a reference of its own in order to be able to access the object… ). So that this means is that calling ObQueryNameString might return STATUS_SUCCESS and a NULL Name.Buffer even for a named object.

I've recently had the pleasure to debug an issue with SR.sys and my virtual volume drive on XP SP3. I will share it since it was somewhat interesting and it points out to this specific issue. This is what the stack looks like:

1: kd> lm v m sr
start    end        module name
f8489000 f849af00   sr         (pdb symbols)          d:\symbols\sr.pdb\9D5432B7234C4CD2A8F6275B9D9AF41F1\sr.pdb
    Loaded symbol image file: sr.sys
    Image path: sr.sys
    Image name: sr.sys
    Timestamp:        Sun Apr 13 11:36:50 2008 (480252C2)
    CheckSum:         00012604
    ImageSize:        00011F00
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
The problem in SR is this one:
sr!SrGetObjectName+0xd4:
f849105c ff15c49b48f8    call    dword ptr [sr!_imp__ObQueryNameString (f8489bc4)]  <- call ObQueryNameString
f8491062 3bc3            cmp     eax,ebx   <- check for STATUS_SUCCESS
f8491064 894514          mov     dword ptr [ebp+14h],eax  <- save the status...
f8491067 7c24            jl      sr!SrGetObjectName+0x105 (f849108d)
f8491069 0fb707          movzx   eax,word ptr [edi]   <-  this is the Length member of the UNICODE_STRING for the name
f849106c 8b4f04          mov     ecx,dword ptr [edi+4]  <- this is the Buffer member of the UNICODE_STRING..
f849106f d1e8            shr     eax,1   <- calculate the number of characters instead of the number of bytes
f8491071 66897702        mov     word ptr [edi+2],si   <- write some value in MaximumLength… 
f8491075 66891c41        mov     word ptr [ecx+eax*2],bx  <-   write in the buffer a 0 (basically, make sure the string is NULL terminated).. But ECX can be NULL
The stack when I hit this problem looks like this:
1: kd> kbn
 # ChildEBP RetAddr  Args to Child              
00 f80b7944 f849440d 00000000 81dc0a18 e10eac08 sr!SrGetObjectName+0xed
01 f80b7990 f848ecf2 81dc0a18 8239a818 f80b79c0 sr!SrCreateAttachmentDevice+0x99
02 f80b79c4 f848ee0f 8239a818 8239a8d0 81fb4d48 sr!SrFsControlMount+0x2e
03 f80b79e0 804ef18f 8239a8d0 81fb4c90 81fb4c90 sr!SrFsControl+0x4b
04 f80b79f0 80581bc7 00000000 81dc0a18 806e6a4c nt!IopfCallDriver+0x31
05 f80b7a40 804f53d6 c000014f f80b7b00 00000000 nt!IopMountVolume+0x1b9
06 f80b7a70 80582bc0 81e1f268 81dc0a18 f80b7ba4 nt!IopCheckVpbMounted+0x5e
07 f80b7b60 805bf444 81dc0a18 00000000 81fc6600 nt!IopParseDevice+0x3d8
08 f80b7bd8 805bb9d0 00000000 f80b7c18 00000040 nt!ObpLookupObjectName+0x53c
09 f80b7c2c 80576033 00000000 00000000 00000001 nt!ObOpenObjectByName+0xea
0a f80b7ca8 805769aa 009bef80 00100001 009bef24 nt!IopCreateFile+0x407
0b f80b7d04 8057a1a9 009bef80 00100001 009bef24 nt!IoCreateFile+0x8e
0c f80b7d44 8054161c 009bef80 00100001 009bef24 nt!NtOpenFile+0x27
0d f80b7d44 7c90e4f4 009bef80 00100001 009bef24 nt!KiFastCallEntry+0xfc
0e 009beef4 7c90d58c 7c80ec86 009bef80 00100001 ntdll!KiFastSystemCallRet
0f 009beef8 7c80ec86 009bef80 00100001 009bef24 ntdll!NtOpenFile+0xc
10 009bf1f0 7c80ef87 01be31e8 00000000 01be7bf0 kernel32!FindFirstFileExW+0x1a7
11 009bf210 751b1e05 01be31e8 01be7bf0 751a2a04 kernel32!FindFirstFileW+0x16
12 009bf240 751aad1f 009bf714 00000001 000e1358 srsvc!Delnode_Recurse+0x12e
13 009bfb34 751abd1f 009bfd54 7c97b440 7c97b420 srsvc!CEventHandler::OnFirstWrite_Notification+0x3cd
14 009bff60 7c927ba5 00000000 0000006a 000e5f40 srsvc!IoCompletionCallback+0x17a
15 009bff74 7c927b7c 751abba5 00000000 0000006a ntdll!RtlpApcCallout+0x11
16 009bffb4 7c80b713 00000000 00000000 00000000 ntdll!RtlpWorkerThread+0x87
17 009bffec 00000000 7c910230 00000000 00000000 kernel32!BaseThreadStart+0x37
So as you can see, in the mount path SR.sys is trying to create their device to attach to the volume and while doing that it tries to get the name for this device:
1: kd> !devobj 81dc0a18 
Device object (81dc0a18) is for:
  \Driver\IvmVhd DriverObject 81fad590
Current Irp 00000000 RefCount 1 Type 00000007 Flags 00000050
Vpb 81ea0f10 Dacl e1f17924 DevExt 81dc0ad0 DevObjExt 81dc0c30 
ExtensionFlags (0x00000002)  DOE_DELETE_PENDING
Device queue is not busy.
This happens to be my virtual volume device, which as you can tell from the DOE_DELETE_PENDING flag, is about to be torn down. So what this all looks like is this:

1. Something is trying to open a file, see the IopCreateFile (frame 0xb)

2. Io manager, while trying to send the IRP_MJ_CREATE irp (frame 7) wants to make sure the volume is mounted. Please note that at this point the volume device is still in the OB namespace, since otherwise the ObpLookupObjectName call (frame 8) would not have been able to reference it… So at this point IO manager resolved the name to a device object and it now has a reference to the device...

3. IopCheckVpbMounted (frame 6) finds the volume is not mounted (since I dismount it before tearing it down) so it tries to mount it…

4. SR.sys gets the mount request and is trying to build a device to attach to the newly mounted volume (in case the mount succeeds). This is pretty standard stuff for a legacy filter… Anyway, in doing so it calls ObQueryNameString which no longer finds a name for the device and returns a NULL buffer. SR checks for NT_SUCCESS but doesn't check the buffer to make sure it's not null (or even the length which is 0) and it blindly tries to make sure the string is NULL terminated (which is also pointless since ObQueryNameString documentation mentions that "The object name (when present) includes a NULL-terminator and all path separators "\" in the name.")… bugcheck.

What my driver did was simply call IoDeleteDevice somewhere between frame 8 and frame 0.

I'm willing to bet that not checking for the null Name.Buffer is a pretty common mistake. For example, there is some code posted on a blog that looks like this:

Status = ObQueryNameString(FileObject->Vpb->RealDevice,OBI,Returned,&Returned ); 

if(NT_SUCCESS(Status)) { 
if(Root) { 
wcscat(OBI->Name.Buffer,L"\\"); 
OBI->Name.Length+=sizeof(WCHAR); 
}