Thursday, May 5, 2011

IRP Completion, STATUS_PENDING and FltMgr

One of the trickiest things filter developers need to deal with is IRP completion. There are two documents that describe the issue in great detail: Handling IRPs: What Every Driver Writer Needs to Know and Secrets of the Universe Revealed! - How NT Handles I/O Completion. I tend to re-read them every time I need to deal with IO completion because the subject is pretty complex and it doesn't come up frequently enough that I am confident to get it right without a refresher. This is going to be a pretty dry post, but it is a pretty important one because it is one of the issues that I've seen that causes a lot of application compatibility issues.

The thing I'd like to talk about in this post is STATUS_PENDING and how it impacts minifilter development. As a quick refresher, if any component in the IO path decides it might want to process an IRP at a later time (for reasons that can vary from resource usage to synchronization) it has the options of either blocking the calling thread until it can process the IRP or to return STATUS_PENDING which doesn't block the calling thread and lets the caller do something useful in the mean time. STATUS_PENDING is the preferred mechanism for doing this, but some components look at the request to figure out if it is a synchronous request or not (in other words, to figure out if it is worth releasing the thread since if the request is synchronous then the IO manager would wait for the IRP to complete anyway and releasing the thread won't do any good) using APIs such as IoIsOperationSynchronous and its minifilter counterpart FltIsOperationSynchronous and only return STATUS_PENDING if the request is not synchronous. Please note that this only applies to IRPs. FastIo and fsFilter type requests are synchronous anyway and cannot be pended. The really important thing here is that the general rule is that ANY IRP might return STATUS_PENDING and the caller should be prepared to handle this status, regardless of whether the FILE_OBJECT is opened for synchronous IO or the IRP is marked as synchronous and so on. Basically, any call to IoCallDriver can return STATUS_PENDING. In fact, driver verifier has a mode where it forces returning STATUS_PENDING for pretty much all IO just to detect drivers that are not prepared to handle it properly.

Minifilters however don't deal with IRPs directly and IO processing is a bit different. While processing a preOp a minifilter might return FLT_PREOP_PENDING to achieve a similar effect to a legacy filter returning STATUS_PENDING. Then, when the minifilter has processed the IO and is ready to return it to FltMgr it must call FltCompletePendedPreOperation (please note that this doesn't mean that it may only return FLT_PREOP_COMPLETE; the minifilter can return any of the usual FLT_PREOP_CALLBACK_STATUSes except FLT_PREOP_PENDING, which would be meaningless in this context anyway since if the minifilter is not done processing the request, why call FltCompletePendedPreOperation at all ?). During the postOp a minifilter can also do a similar thing by returning FLT_POSTOP_MORE_PROCESSING_REQUIRED to FltMgr and then by calling FltCompletePendedPostOperation when it is done with the request.

Someone that has written some legacy filters might notice that there is one significant difference between the IRP model and the minifilter model. In the IRP case, a filter cannot return STATUS_MORE_PROCESSING_REQUIRED from a completion routine if it STATUS_PENDING wasn't returned for IRP. This is usually pretty tricky in a legacy filter because it may not be possible to know for sure when the filter might want to return STATUS_MORE_PROCESSING_REQUIRED in a completion routine and so the legacy filter has to return STATUS_PENDING while processing the preOp in all possible cases where it might want to return STATUS_MORE_PROCESSING_REQUIRED and then in the completion routine decide if it wants to actually return it or not. Minifilters have it real easy here, they can just have a preOp that returns FLT_PREOP_SUCCESS_WITH_CALLBACK (to indicate they want a postOp callback) and the in the postOp callback decide whether they need to return FLT_POSTOP_MORE_PROCESSING_REQUIRED.

So how does this work? Well, FltMgr is a legacy filter and as I said before, legacy filters that couldn't know beforehand if they will want to pend the IRP during postOp processing had to return STATUS_PENDING anyway. So FltMgr calls the PreOp callbacks for all filters in the frame and if any of them registered for a postOp callback but didn't require the operation to be synchronized (since FLT_PREOP_SYNCHRONIZE is implemented by holding the thread until the request is completed by the layers below so that the postOp routine is called in the context of this same thread) then FltMgr will return STATUS_PENDING anyway. This is a very important point for this topic so I'm going to restate it: if a minifilter registers a postOp routine and if it returns FLT_PREOP_SUCCESS_WITH_CALLBACK from the preOp callback then FltMgr will return STATUS_PENDING for that operation (provided that another minifilter in the frame didn't return FLT_PREOP_SYNCHRONIZE).

So we finally get to the juicy part. Some applications have a type of bug where they open a handle for asynchronous IO and then use it in a synchronous fashion. For example, they might call CreateFile() for asynchronous IO (with the FILE_FLAG_OVERLAPPED set) and then call ReadFile() without specifying an OVERLAPPED structure. This might even work in most cases (there is KB article that describes some cases where asynchronous IO might be converted to synchronous IO: Asynchronous Disk I/O Appears as Synchronous on Windows NT, Windows 2000, and Windows XP.) and the issue might not be detected for a long time. However, as we've explained above, FltMgr might return STATUS_PENDING regardless of what a file system would return. One might expect that ReadFile would return some failure code when it detects this case, but unfortunately that's not what happens. ReadFile will check if the OVERLAPPED structure is present and if it is not then it expects that the file has been opened the synchronous IO, so if it gets STATUS_PENDING it will simply wait on the file handle. The problem in this case is the file handle will be signaled every time IO completes, so for a handle opened for asynchronous IO that is actually used asynchronously, the file handle will be signaled when ANY operation completes, so just waiting for the handle doesn't really achieve the intended effect and instead the function might return before the actual IO is completed. This is hinted at in the documentation, which states "If hFile is opened with FILE_FLAG_OVERLAPPED, the lpOverlapped parameter must point to a valid and unique OVERLAPPED structure, otherwise the function can incorrectly report that the read operation is complete."(in the page for the ReadFile() function).

So the net effect of all this corner cases is that an application that is broken in this way but has always worked because of some NTFS implementation quirks might stop working. What's even worse is the fact that it might stop working in pretty interesting and hard to detect ways. For example, consider the case when the application is writing to a file using WriteFile and it mistakenly believes the operation has completed and it frees or reuses the buffer. The actual data written in the file will be whatever happened to be in the buffer when the write is actually processed, which is a clear data corruption scenario. The same goes for ReadFile(), the application might think it has a buffer with data from a file and might get in a broken state trying to process that buffer.

Another issue is that even if a user is diligent enough to associate the presence of a filter on their system with some random file corruptions, the bug report might be something like "file gets corrupted when installing filter X". In this case the developer might have a really hard reproducing the issue because they might not have exactly the same application version as the user or they might have a different filter that masks the issue (remember that if another filter in the frame or in a frame above that frame happens to return FLT_PREOP_SYNCHRONIZE then STATUS_PENDING will not be returned so things might still work) or they don't even process IRP_MJ_READ and IRP_MJ_WRITE so they don't expect any data contents to change (but a file might become corrupt in other ways, for example if the EOF is moved improperly, which is an IRP_MJ_SET_INFORMATION call).

In terms of identifying the issue, one might use driver verifier to force the STATUS_PENDING behavior or just install the PassThrough sample in the WDK which it will also cause pretty much all operations to be pended because it always returns FLT_PREOP_SUCCESS_WITH_CALLBACK. Another way to try to identify the issue is to return FLT_PREOP_SYNCHRONIZE for the operations that a developer suspects might be causing this and see if the broken behavior still reproduces. Of course, returning FLT_PREOP_SYNCHRONIZE instead of FLT_PREOP_SUCCESS_WITH_CALLBACK when it's not necessary might have a huge performance impact on the system so it can't be used in a production environment, but it's a good way to test things nevertheless.