Thursday, December 30, 2010

About IRP_MJ_CREATE and minifilter design considerations - Part III

An interesting topic when discussing creates is the context (thread and process context) in which the create happens. This isn't really interesting from the OS perspective (since the OS always receives the request in the context of the requestor) but from a filtering perspective. In the previous post we discussed how the OS takes the request and eventually sends an IRP to the file system. There are some things to note:

  1. CREATE operations must be synchronized by the OS. I think this is true for any stateful protocol (and stateless protocols don't really have a CREATE operation anyway). The CREATE operation simply means "hey everyone, there will be some requests in this context for this object so you'd better set up your contexts so you know what we're talking about when you get the next request". So the requestor can't really do anything until the request is complete since they don't even have a handle. This means that the IO manager will pretty much execute in a single thread and when it needs to wait for some other service (like the FS) it will send a request (the IRP_MJ_CREATE IRP) and wait for it to come back.
  2. The FS stack however is layered. The implication of this is that while the user can treat the CREATE operation as synchronous, the layers involved in processing that create can't. For file system filters (legacy and minifilters), there are 3 distinct steps:
    1. Before the request makes to the minifilter (before the preCreate callback is called)
    2. After the request is seen by the minifilter, but before the minifilter knows the request has been completed by the lower layers (after the preCreate callback but before the postCreate callback)
    3. After the minifilter knows the request has completed, but before the IO manager knows about it (after the postCreate callback)
    This is important to understand because there are certain limitations, depending on what each layer of the OS knows about the request. For example, during a preCreate callback, the IO manager knows someone wants to open a file but the FS doesn't yet know about that file. So even though the minifilter has a FILE_OBJECT structure (which comes from the IO manager), trying to use it to request something from the FS (like reading or writing or even queries) cannot work since the FS has not yet seen the request and has no idea what the FILE_OBJECT is supposed to represent (the information about which stream on disk the FILE_OBJECT will represent is stored in the create IRP and not in the FILE_OBJECT). In a similar fashion, during the postCreate callback the filter knows how the FS handled the request (whether it was a successful request or not) but the IO manager doesn't, so trying to call a function that involves the IO manager for that FILE_OBJECT (for example ObOpenObjectByPointer, which will create a HANDLE given an OBJECT) will fail.
  3. FltMgr will also synchronize IRP_MJ_CREATE requests for a couple of reasons. From a minifilter perspective, this is beneficial because it simplifies the model quite a bit. In general synchronized operations are somewhat simpler to handle in the postOp callback but synchronizing every operation will have a negative impact on the system. So FltMgr won't synchronize by default any operation except CREATE, where there is no negative impact because the IO manager synchronizes it already. While this is guaranteed by documentation, minifilters should still always return FLT_PREOP_SYNCHRONIZE instead of FLT_PREOP_SUCCESS_WITH_CALLBACK for IRP_MJ_CREATE just so this behavior is made obvious.
  4. This brings us to the most important point. FltMgr documentation mentions in a bunch of different places that the postCreate callback will be called in the same context as the preCreate callback. In some cases I've this statement being interpreted as "FltMgr guarantees that the postCreate will be called in the same thread where the user request was issued". However, this is not the case. FltMgr makes no guarantees about what thread the preCreate callback will be called on, just that it will call postCreate on the same thread. What can happen is that a filter (legacy or minifilter) can return STATUS_PENDING for an IRP_MJ_CREATE and the continue the request on a different thread, in a different process altogether. This is a legal option and what happens is that the filter below the filter that returned pending will have its preCreate callback called on the new thread, in the new process context. This is a brief example of what happens in this case (let's say the FS will return STATUS_REPARSE):
    1. The IO manager receives the CREATE request on Thread1 and issues an IRP_MJ_CREATE on the same thread.
    2. FilterA (let's say it's a legacy filter) sees IRP_MJ_CREATE request on Thread1 and pends it and then sends it down on a different thread, Thread2 .
    3. MinifilterB (below FilterA) sees the IRP_MJ_CREATE request (i.e. minifilter B's preCreate callback is called) on Thread2, where it queues the request and returns FLT_PREOP_PENDING.
    4. MinifilterB then dequeues the request on a different thread (Thread3) and it sends it down (calls FltCompletePendedPreOperation with FLT_PREOP_SYNCHRONIZE for example)
    5. The FS receives the IRP_MJ_CREATE on Thread3, processes and discovers it is a reparse point and so it returns STATUS_REPARSE.
    6. FltMgr's completion routine gets called on Thread3 and since FltMgr knows the operation is synchronized, it simply signals Event2.
    7. FltMgr resumes the operation on Thread2 where it was waiting for the event and calls the postCreate callback for minifilterB.
    8. Minifilter B does whatever processing it does for STATUS_REPARSE and returns FLT_POSTOP_FINISHED_PROCESSING.
    9. FltMgr completes the request (we're still on Thread2).
    10. FilterA's IoCompletion routine gets called on Thread2 and FilterA performs whatever processing it needs before completing the IRP.
    11. the IO manager's IoCompletion routine gets called (still on Thread2), but the IO manager is synchronizing the operation so it signals Event1.
    12. IO manager's wait on Thread1 returns so the IO manager can inspect the result of the call. Since the FS returned STATUS_PENDING, it might return back to OB and restart parsing from there… This in turn might come down the same path and issue a new IRP_MJ_CREATE on Thread1 and so on...
    Here is a picture of what this would look like.
As you can see, it is impossible for a filter to guarantee that its preCreate callback will be called on the thread of the original request. So what can a file system filter (or a file system) do ? Well, there are largely three reasons why a file system (or filter) might care about the context of a certain operation:
  • The operation refers to some buffer and the VA is only valid in the process context of the originator.
  • The operation refers to some other variable that is process specific (for example , a handle), like IRP_MJ_SET_INFORMATION with FileRenameInformation or FileLinkInformation, where the parameters contain a handle.
  • The operation needs to evaluate security so it needs to know who is the requestor for the operation.
IRP_MJ_CREATE doesn't care about user buffers or other process dependent variables (they are all captured before getting to the IO manager) so file systems and filters don't need to worry about that. However, security is a really big part of IRP_MJ_CREATE processing so filters often need to know who is requesting the operation. However, as I mentioned in the previous post in this series, the security context is captured in nt!ObOpenObjectByName and sent in the IRP parameters (Parameters.Create.SecurityContext) and so the file system and the filters can simply use the context there to decide who is requesting the operation.
In conclusion, the fact that a filter can't guarantee that it will be called in the context of the thread where the original request was issued doesn't matter much.