Of Filesystems And Other Demons: About IRP_MJ_CREATE and minifilter design considerations

In this post I want to talk about the IoCreateStreamFileObject API, as well as the difference between what is normally referred to as an FCB or an SCB and a FILE_OBJECT. But before that I'm going to rant a bit about my favorite subject, namespaces :).

Any stateful communication protocol needs a way to identify the connection once it has been established. In a lot of cases the requestor of the service initiates the connection and receives back a token that identifies the connection to the provider of the service. This token belongs to the namespace of the service provider. This is the case with file handles (a caller requests a file to be opened and they get back a token which they can then use when requesting reads and writes), network connections (where the token is a socket), web pages (the user logs on to a server, receives a session token or a cookie) and many other things. However, the protocol could also go the other way around, where the user could create a token (which would then belong to the user's namespace) and pass it in with the session initiation request. In such a scenario, opening a file would more like "hey file system, I plan to read from a file and I will use the value 123 for it, so whenever you see value 123 know that I'm talking about that file". So as you can see an important question when designing a protocol is who should learn the other's context? Should it be the service provider or the server requestor ? It's pretty easy with people because they can never remember some else's context (a token that someone else gives them) so in that case whichever end of the protocol needs human interaction should be the one that generates the token. For example, each file in a file system can be opened by ID, but humans still prefer names even though they could in theory learn the ID. Or when browsing a web page people don't remember the URL at the top for each page they visit, even for web pages where that URL doesn't change. In fact the reason Google is such a big company is because they figured out that people don't remember URLs even when they aren't random characters, but instead people remember key phrases about the page they're looking for (that's their token, not the URL). Going even further one could argue that the whole history of computer science is the history of building contexts for people . Assembly language was a way to associate names that were meaningful to people with memory locations (so instead of the human operator remembering the address, which is the machine's context, the human operation would remember a name and the assembler would convert that name into the address). A file system is a way to associate disk locations with a name and so on.

Anyway, now that my rant is over, let's get back to IRP_MJ_CREATE. IRP_MJ_CREATE is the type of protocol where the IO manager tells the file system to open something and it also tells it the token it's going to use to refer to it in the future, the FILE_OBJECT. IO manager allocates a FILE_OBJECT that it will use to identify that stream and it needs to tell the file system about it. However, the file system also needs a context associated with that stream. It needs to know where the stream is located on disk, whether it is encrypted and so on. This is all very specific to the file system (clearly only a file system that supports encryption will need to know whether the stream is encrypted) and so there is no general purpose structure that all the file systems can use. Therefore each file system needs to keep its own internal structure for streams it opens and it needs to learn how to associate the FILE_OBJECT with its internal structure. It could implement a key-value structure where the FILE_OBJECT is the key and the file system internal structure is the value, but that would potentially be time consuming (the key lookup would need to happen for each operation). The decision was made to allocate a field in the FILE_OBJECT to be used by the file system to store this context, and that field is FILE_OBJECT->FsContext. The way the protocol works is that during IRP_MJ_CREATE FsContext is NULL and when the request reaches the file system, the file system will allocate its internal context and store a pointer to it in the FsContext. In other words IRP_MJ_CREATE is a mechanism that allows a file system to initialize its fields in the FILE_OBJECT.

This internal context that identifies a stream to the file system is traditionally called an FCB (file control block) because there used to be only one stream per file. However, when file systems added the ability to associate multiple streams of data with a file (alternate data streams), the file system needed to be aware of the distinction between the stream and file and so in such file systems (NTFS and UDFS are examples of file systems that support alternate data streams (or ADS)) the FCB actually means "context associated with the file" and what traditionally used be the FCB is now called an SCB (stream control block). On a final note about SCBs, it is worth mentioning that they aren't really completely private and that in fact the OS cares about some information associated with the SCB. As such, all SCBs should start with an FSRTL_ADVANCED_FCB_HEADER. So for any file system developers, please make sure to implement this. I've done it for file systems that didn't support it and it can probably be done in a couple of hours. Without this your file system won't support FltMgr and minifilters and probably break some legacy filters as well (there are other negative side effects as well).

So now that I mentioned that the IRP_MJ_CREATE can be seen as way to associate an a FILE_OBJECT with the SCB, let's talk about IoCreateStreamFileObject. A file system needs to track a lot of metadata. Some of it is related to user files (names, directory structures, permissions) and some is internal to the file system (transaction log, journal). This metadata can be split into logical units (the journal, the transaction log, the directory information, the permissions hash) and they must be dynamic. So it makes sense that a file system would treat most of this data as if they were user files. In this way it can reuse a lot of the code it already has implemented for reads and writes and so on. So when user does something like enumerate a directory, the file system can simply say "open the stream associated with the directory and read it all" using its internal functions that deal with converting file offsets into disk offsets and so on. However, reading metadata from disk for every user operation will make things very slow so a file system might prefer to cache things. It could of course allocate memory and remember which data was more frequently accessed and it might decide that if there is memory pressure the size of the cache should decrease but there is already a component in the system that does all that, the Cache Manager. So if the file system could use the Cache Manager then it could benefit from all the logic in there. But the Cache Manager is a system component and it doesn't know anything about SCBs (which are specific to each file system). So in order to keep things generic, it would be nice if the file system could use FILE_OBJECTs for its internal streams and then use all the facilities in the OS that use FILE_OBJECTs.

Of course, creating FILE_OBJECTs for internal streams is a noble goal, but how does one create such FILE_OBJECTs ? Allocating memory and initializing the structure by hand is just asking for trouble, since the structure is different between OS versions, and if we don't call ObCreateObject (which is not documented) then we're probably going to break some OB integration anyway. One possible solution would be to call IoCreateFile from the file system. However, not all internal streams have names and while the file system could do something like use ECPs or allocate GUIDs as file names and use those as keys, this would still be a pretty ugly hack. Moreover, as we've discussed above, the IRP_MJ_CREATE is nothing but a way for the IO manager to tell the file system which internal stream to associate with a FILE_OBJECT, but since the file system already knows exactly which stream is wants to open, why even have an IRP_MJ_CREATE ? What a file system needs is a way to request a new FILE_OBJECT from the IO manager, which it then can associate with the right internal structure. IoCreateStreamFileObjectEx is an API to do just that. There are some examples in the WDK about how to call it and when it should be used.

This is a brief overview of what IoCreateStreamFileObjectEx does (IoCreateStreamFileObject simply calls IoCreateStreamFileObjectEx with a NULL handle).:

Call ObCreateObject to create the actual FILE_OBJECT
Setup a minimal set of the FILE_OBJECT fields
Set the FO_STREAM_FILE flag in the FILE_OBJECT.
Call ObInsertObjectEx to create a handle for the FILE_OBJECT
If the caller passed in a NULL pointer, close the handle.

From a file system filtering perspective, the implication is that filters should expect IO and possibly other operations on FILE_OBJECTs that they haven't seen an IRP_MJ_CREATE for. Depending on the filter's functionality the filter might want to ignore such FILE_OBJECTS. Since all stream FILE_OBJECTs have the FO_STREAM_FILE flag set in the FILE_OBJECT->Flags, checking for this flag is a pretty reliable way to identify such FILE_OBJECTs.

Of Filesystems And Other Demons

Thursday, January 13, 2011

About IRP_MJ_CREATE and minifilter design considerations - Part V

No comments:

Post a Comment

Translate

Search This Blog

Helpful links for file system developers

About Me

Followers

Blog Archive

Popular Posts