Thursday, December 16, 2010

About IRP_MJ_CREATE and minifilter design considerations - Part I

This is the first in a series of posts where I'll try to address various common questions about IRP_MJ_CREATE. My plan is to address the following topics:

  • What exactly is it that IRP_MJ_CREATE creates ? (a bit of rambling on one of my favorite topics, operating systems design)
  • Why is there no IRP_MJ_OPEN ? Surely MS could afford one more IRP :)...
  • Flow of a file open request through the OS.
  • What is the difference between a stream and a file from an FS perspective ?
  • What does STATUS_REPARSE do ?
  • What is name tunneling ? How does it affect creates ?
  • How to open the same stream as an existing FILE_OBJECT in a name-safe way.
  • What are stream file objects and why are they necessary ?
  • Various strategies to redirect a file open to a different file.
  • How to track a create when reparsing ?

In order to address this properly, I'd like to explain some things about operating systems. This is a rather dry topic but in my opinion the things I'm going to talk about are fundamental for understanding not only how IRP_MJ_CREATE works, but also why it works the way it does.

There are many ways to define an operating system, but for this topic I think a very useful way to describe it is as a hardware abstraction layer. It is a library of functions combined with a machine abstraction. As such, OS code is pretty much dedicated to either "abstract stuff that people use a lot" (allocate memory, create a window, draw strings and so on) or "hardware interaction code" (talk to the disk, talk to the memory controller hardware, talk to the graphics hardware). Consequently, it should come as no surprise that the kernel part of the OS is designed around interaction with hardware (as opposed to the user mode part, which in general implements more abstract services).

File systems (and the whole file system stack including legacy filters and minifilters) are "higher level drivers" (since they don't usually talk to hardware directly). However, they must fit into the OS model, which is built around hardware. This is why file systems still create device objects and why the name returned by FltGetFileNameInformation starts with "\Device\....".

One other very important concept that plays into why IRP_MJ_CREATE functions the way it does is that the OS itself is implemented as a set of "services". Each service has its own protocol, usually described by an API set (the memory manager has its own command set, the object manager has its own set and so does the IO manager). Most (if not all) of these protocols are stateful. The caller issues an "initialize" command (ExAllocatePool, ZwCreateFile, FltRegisterFilter) and gets back a more or less opaque handle (for ExAllocatePool, the pointer serves as the handle; ZwCreateFile -> an actual handle; FltRegisterFilter -> a PFLT_FILTER pointer and so on). The caller can then issue additional commands that require that handle to be passed in (ExFreePool, ZwReadFile, FltStartFiltering). For stateful protocols the service (or server) has a blob of data that describes the internal state of each object, and based on that data it knows how to satisfy each request. The opaque handle is a key that helps the service find that data. For example, for ExAllocatePool the internal data blob is the nt!_POOL_HEADER, for ZwCreateFile the context is pretty much a set of granted access rights for that handle plus a reference to the FILE_OBJECT, and for FltStartFiltering it is the FLT_FILTER structure. From this point on I'll call that blob of data a context (as in MM's context, IO manager's context, FltMgr's filter context). For services that already provide support for caller defined contexts (like FltMgr) I'll use the terms "internal context" and "user's context" to differentiate the two. The conclusion here is that any stateful protocol must have some context on the service (or server) side that the service can use to keep track of the state of communication with the client.
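To make the handle-and-context idea a bit more concrete, here is a minimal user-mode sketch in C. It is a toy model only and does not show how any Windows component is actually implemented: every name in it (svc_open, svc_request, svc_close, svc_context and the SVC_* constants) is hypothetical and made up for illustration. The point is just the shape of a stateful protocol: an "initialize" command builds a context and returns an opaque key, and every later command uses that key to find the context again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy service; none of these names are real Windows APIs. */
#define SVC_MAX_CONTEXTS   8
#define SVC_INVALID_HANDLE (-1)
#define SVC_ACCESS_READ    0x1u
#define SVC_ACCESS_WRITE   0x2u

typedef int svc_handle;              /* opaque key handed back to the caller */

typedef struct {
    int in_use;                      /* slot currently holds a live context */
    unsigned granted_access;         /* state recorded by the "initialize" command */
    unsigned long requests_served;   /* state updated by follow-up commands */
} svc_context;

/* The service-side state: one context per outstanding handle. */
static svc_context svc_table[SVC_MAX_CONTEXTS];

/* "Initialize" command: create a context and return the key that finds it. */
svc_handle svc_open(unsigned desired_access)
{
    for (int i = 0; i < SVC_MAX_CONTEXTS; i++) {
        if (!svc_table[i].in_use) {
            svc_table[i].in_use = 1;
            svc_table[i].granted_access = desired_access;
            svc_table[i].requests_served = 0;
            return i;
        }
    }
    return SVC_INVALID_HANDLE;       /* no free slot */
}

/* Translate the opaque handle back into the context, or NULL if it is stale. */
static svc_context *svc_lookup(svc_handle h)
{
    if (h < 0 || h >= SVC_MAX_CONTEXTS || !svc_table[h].in_use)
        return NULL;
    return &svc_table[h];
}

/* Follow-up command: returns 0 on success, -1 if the handle is invalid
 * or the context does not grant the access this request needs. */
int svc_request(svc_handle h, unsigned needed_access)
{
    svc_context *ctx = svc_lookup(h);
    if (ctx == NULL || (ctx->granted_access & needed_access) != needed_access)
        return -1;
    ctx->requests_served++;
    return 0;
}

/* Teardown command: destroys the context, after which the handle is stale. */
int svc_close(svc_handle h)
{
    svc_context *ctx = svc_lookup(h);
    if (ctx == NULL)
        return -1;
    ctx->in_use = 0;
    return 0;
}
```

A caller that opens with SVC_ACCESS_READ can issue read requests through the returned handle, gets -1 for a write request, and gets -1 for any request after svc_close, because the service can no longer find a context for that key.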

The important thing I wanted to get to is that some operations require multiple OS components to work together to satisfy a user request, and each of those components might need to create its own context. For example, a single ZwCreateFile call might require creating some of the following contexts: a handle, a FILE_OBJECT, a FltMgr internal context, some minifilter contexts, one or more file system contexts and a couple of MM contexts (where all the other contexts will be stored).

So with all these things in place, we can start talking about IRP_MJ_CREATE. As I said above, the OS has an abstract interface which consists mainly of OBJECTs for various things. When someone needs to talk to a device (physical or a virtual device, like a file system; anything that can be represented internally by a DEVICE_OBJECT), the OS context is a FILE_OBJECT. In other words, the FILE_OBJECT simply represents the state associated with the OS communicating with a DEVICE_OBJECT. The "create" word in ZwCreateFile and IRP_MJ_CREATE simply refers to the FILE_OBJECT itself. There is no IRP_MJ_OPEN because there is no way to open an existing FILE_OBJECT. In order to get a FILE_OBJECT one must either create it, or already have a reference to it (pointer or handle) and call ObReferenceObject or ObReferenceObjectByHandle respectively to get another reference to that same FILE_OBJECT.
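The "create or reference, never open" rule can be sketched with a small reference-counted object in user-mode C. This is only an analogy under stated assumptions: toy_file_object, fobj_create, fobj_reference and fobj_dereference are hypothetical names, and the real FILE_OBJECT and object manager are far more involved. What the sketch shows is that the only ways to end up with a pointer to such an object are to create a brand new one or to take another reference on one you already hold.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a FILE_OBJECT: the state of one "conversation"
 * with a device. Not the real kernel structure. */
typedef struct {
    long ref_count;   /* number of outstanding references */
    int  device_id;   /* which (toy) device this conversation targets */
} toy_file_object;

/* "Create": always builds a new object with its own state.
 * Two creates against the same device give two distinct objects. */
toy_file_object *fobj_create(int device_id)
{
    toy_file_object *fo = malloc(sizeof *fo);
    if (fo == NULL)
        return NULL;
    fo->ref_count = 1;
    fo->device_id = device_id;
    return fo;
}

/* "Reference": requires an existing pointer and returns the same object
 * with one more reference. There is deliberately no lookup-by-name "open". */
toy_file_object *fobj_reference(toy_file_object *fo)
{
    assert(fo != NULL && fo->ref_count > 0);
    fo->ref_count++;
    return fo;
}

/* "Dereference": drops one reference and frees the object when the last
 * one goes away. Returns the number of references that remain. */
long fobj_dereference(toy_file_object *fo)
{
    assert(fo != NULL && fo->ref_count > 0);
    long remaining = --fo->ref_count;
    if (remaining == 0)
        free(fo);
    return remaining;
}
```

Calling fobj_create twice with the same device_id yields two different objects, which mirrors the fact that each successful create produces its own FILE_OBJECT even when the target is the same.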

The next topic, which is the flow of a create operation through the OS, is pretty long so I'll save it for next week. In the meantime please feel free to let me know what other topics related to the IRP_MJ_CREATE path you'd like me to address.