Friday, November 12, 2010

Some thoughts on FltDoCompletionProcessingWhenSafe

I've been meaning to talk about this for a while. There is a warning in the MSDN page for FltDoCompletionProcessingWhenSafe which is pretty interesting:

Caution: To avoid deadlocks, FltDoCompletionProcessingWhenSafe cannot be called for I/O operations that can be directly completed by a driver in the storage stack, such as the following:
• IRP_MJ_READ
• IRP_MJ_WRITE
• IRP_MJ_FLUSH_BUFFERS

Let's start by looking at how file systems handle requests. There are multiple ways in which a file system can complete a user request, but they largely fall into a few cases. I'm simplifying things here: there are many ways in which file systems might handle operations, and the same goes for storage devices. What follows is not an exhaustive list of how things happen in a file system and storage stack, but rather a plausible way in which they can happen in some file systems in some cases:
• Synchronous - when all the data is readily available, the file system doesn't need any additional steps and can just perform the operation and return to the caller. For example, when setting the delete disposition on a file, the file system only needs to access the FCB and set the flag (because the delete disposition is a flag on the FCB). If the file system can acquire the FCB immediately, it can just set the flag to whatever disposition the caller wanted, release the FCB and call IoCompleteRequest. When this happens, the completion routines (and the postOp callbacks for minifilters) are called in the same thread as the original operation and at the same IRQL (very likely PASSIVE_LEVEL).
• Queued (asynchronous) - this happens when the file system realizes it can't complete the operation immediately and needs to pend the request and complete it when some condition occurs. There are a lot of cases where this happens, for example when the file system needs to acquire some resource and doesn't want to wait for it inline. Another case, where pending is pretty much the only course of action, is when the caller registers for notifications (oplocks, directory changes and such). In these cases, the postOp callbacks will generally be called in the context of the thread that released the resource or that did something to trigger the notification (acknowledged an oplock break, renamed a file and so on). This is usually a different thread from the one the request originally came in on, and the IRQL is usually <= APC_LEVEL.
• Forwarded - this can happen when the file system needs to get some data from the storage device and simply forwards the request to the underlying device. For example, let's say that a user wants to read some aligned data from a file. The file system might simply calculate where the data begins on disk (by consulting its allocation maps, which we'll assume are cached so no reading from the device is necessary), change the offset in the IRP_MJ_READ parameters to the sector where the data is located, lock the buffer in memory and then call IoCallDriver. When the storage stack satisfies the request, it calls IoCompleteRequest, and the file system does pretty much nothing in its completion routine (or frees some resources or some such) and lets the request go up. In this case, the postOp callback runs at DPC_LEVEL (that is, DISPATCH_LEVEL) in an arbitrary thread context: the IO is completed in an interrupt, which will likely queue a DPC, which then executes in whatever thread the CPU happened to be running when the interrupt triggered.
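To keep the three cases straight, here is a small user-mode C sketch that maps each one to the context its postOp callback would typically see. This is a toy model, not driver code: the type and function names (CompletionPath, typical_postop_context and so on) are made up for illustration, and the IRQLs are the "usually" and "very likely" values from the list above, not guarantees.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-mode model; NOT driver code. Values mirror the usual IRQL numbering. */
typedef enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 } Irql;

/* The three simplified completion paths from the list (hypothetical names). */
typedef enum { PATH_SYNCHRONOUS, PATH_QUEUED, PATH_FORWARDED } CompletionPath;

typedef struct {
    bool same_thread_as_request; /* does postOp run in the issuing thread? */
    Irql highest_likely_irql;    /* highest IRQL postOp should expect */
} PostOpContext;

/* Map a completion path to the context its postOp callback typically sees. */
static PostOpContext typical_postop_context(CompletionPath path)
{
    PostOpContext ctx = { false, PASSIVE_LEVEL };

    switch (path) {
    case PATH_SYNCHRONOUS: /* completed inline by the file system */
        ctx.same_thread_as_request = true;
        ctx.highest_likely_irql = PASSIVE_LEVEL;
        break;
    case PATH_QUEUED:      /* completed by whoever satisfied the condition */
        ctx.highest_likely_irql = APC_LEVEL;
        break;
    case PATH_FORWARDED:   /* completed from the storage stack's DPC */
        ctx.highest_likely_irql = DISPATCH_LEVEL;
        break;
    default:
        assert(!"unknown completion path");
    }
    return ctx;
}

/* Can postOp directly call routines that require IRQL <= APC_LEVEL? */
static bool postop_may_call_apc_level_routines(CompletionPath path)
{
    return typical_postop_context(path).highest_likely_irql <= APC_LEVEL;
}
```

In this model only the forwarded path answers "no" to postop_may_call_apc_level_routines, which is the case the rest of this post is concerned with.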

Now, in a lot of cases the file system will need to perform a bunch of things in response to one single user request. For example, a request to write something might mean the file system will need to do at least the following (please ignore the order of the operations here):
• Write the data
• Update the last write time
• Update the file size
All these changes need to be saved to different places on disk (usually; it really depends on the file system), so the file system might pend the request while it issues several different IO requests to the storage device, and complete the original request only when all of them have completed. So in most cases an operation is a combination of the queued and forwarded cases.

The reason I went into all of this is that I wanted to make this point: in most cases, the postOp callback will be called at DPC only if the operation required one or more IOs to be sent to the storage device and the file system didn't need to synchronize the operation back to some internal thread, but instead simply had a passthrough completion routine (see FatSingleAsyncCompletionRoutine in the FASTFAT sample). The file system will not usually complete an operation at DPC in other cases (again, different file systems do things differently, so it MIGHT still happen).
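A rough way to picture this is the user-mode C sketch below. It models a pended request that waits for several child IOs and is completed by whichever child finishes last. Everything here is hypothetical (PendedRequest, child_io_completed and the sync_back_to_worker flag are illustration only, not FASTFAT's actual code), and a real driver would use an interlocked decrement on the counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-mode model; NOT driver code. */
typedef enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 } Irql;

typedef struct {
    int  outstanding;      /* child IOs still in flight */
    bool completed;        /* has the original request been completed? */
    Irql completion_irql;  /* IRQL at which postOp callbacks would run */
} PendedRequest;

/* The file system pends the request and issues child_ios requests below. */
static void pend_request(PendedRequest *req, int child_ios)
{
    assert(child_ios > 0);
    req->outstanding = child_ios;
    req->completed = false;
    req->completion_irql = PASSIVE_LEVEL;
}

/*
 * Called once per finished child IO. current_irql is the IRQL of the
 * completing context (DISPATCH_LEVEL when called from the storage DPC).
 * sync_back_to_worker models a file system that hands the final completion
 * to one of its own threads instead of completing straight from the DPC.
 */
static void child_io_completed(PendedRequest *req, Irql current_irql,
                               bool sync_back_to_worker)
{
    assert(req->outstanding > 0);
    if (--req->outstanding > 0) {
        return; /* other child IOs are still pending */
    }
    req->completed = true;
    req->completion_irql = sync_back_to_worker ? PASSIVE_LEVEL : current_irql;
}
```

With a passthrough completion (sync_back_to_worker set to false), the request inherits the IRQL of the last child IO to finish, so postOp would run at DISPATCH_LEVEL. If the file system synchronizes back to a worker first, it would not.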

Now, this means that either the warning or the function is useless, because the only reason FltDoCompletionProcessingWhenSafe exists is to let minifilters write completion routines that use functions requiring IRQL <= APC_LEVEL without worrying about whether the postOp callback is called at DPC. So if, according to the warning, "FltDoCompletionProcessingWhenSafe cannot be called for I/O operations that can be directly completed by a driver in the storage stack", then this is like saying that FltDoCompletionProcessingWhenSafe cannot be called for operations that might be completed at DPC_LEVEL, which is the only case where it is useful.
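For readers who haven't used the function, the idea behind it can be sketched in user-mode C roughly like this. It is only a model of the "run inline if the IRQL is low enough, otherwise defer to a worker" behavior. The names (do_processing_when_safe, run_worker, record_irql) and the fixed-size queue are invented for illustration; the real FltDoCompletionProcessingWhenSafe takes different parameters and has more rules than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-mode model; NOT driver code and NOT the real Filter Manager API. */
typedef enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 } Irql;

/* A callback that needs IRQL <= APC_LEVEL; it is told where it is running. */
typedef void (*SafeCallback)(Irql running_at, void *context);

typedef struct {
    SafeCallback callback;
    void        *context;
} WorkItem;

#define MAX_QUEUED 8
static WorkItem g_queue[MAX_QUEUED]; /* stand-in for a system work queue */
static size_t   g_queued = 0;

/* Returns true if the callback ran inline, false if it was deferred. */
static bool do_processing_when_safe(Irql current_irql, SafeCallback callback,
                                    void *context)
{
    if (current_irql <= APC_LEVEL) {
        callback(current_irql, context); /* already safe: call directly */
        return true;
    }
    assert(g_queued < MAX_QUEUED);
    g_queue[g_queued].callback = callback; /* at DPC level: defer the work */
    g_queue[g_queued].context = context;
    g_queued++;
    return false;
}

/* Models the worker thread draining the queue at PASSIVE_LEVEL. */
static void run_worker(void)
{
    for (size_t i = 0; i < g_queued; i++) {
        g_queue[i].callback(PASSIVE_LEVEL, g_queue[i].context);
    }
    g_queued = 0;
}

/* Example callback: records the IRQL it was invoked at. */
static void record_irql(Irql running_at, void *context)
{
    *(Irql *)context = running_at;
}
```

Note that the deferral branch is only ever taken when the caller is at DISPATCH_LEVEL. In this model, a function like this adds nothing for operations that never complete at DPC.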

I'll talk about the actual deadlocks in a post next week.