Thursday, October 28, 2010

Useful Models - how choosing the right abstraction can help design and some useful abstractions for working with minifilters

The poll on the site indicated this was the topic most people were interested in so here it is.

I find myself quite often in the position of trying to explain why something doesn't work the way someone expects it would. I guess this is due in large part that the work I do (storage and file systems) is something that people interact with quite often but in fact operates quite differently than the abstraction it presents to the users. I've mentioned this in my other posts anyway…

So in order to explain why some architecture won't work, I try to find an analogy or a model that would immediately make the problem obvious. Some of these models are very dependent on the problem I'm dealing with while some others I keep reusing. Some of the models are obviously not practical, but they highlight a certain features of the system. It would be nice if these models could be implemented as actual tools (like Driver Verifier) but the reality is that in some cases the effort to write something like this would not justify the benefits… So I guess most of them will remain in the realm of thought experiments but they can be useful nevertheless...

I'll go through a list of commonly asked questions and the models that I find help explain the problem. I'm sure most of the readers of this post could contribute their own examples so please do so through the comments.

Q: Why not send the file name directly to our minifilter from a service or some other user mode program ?
A: it really depends on the other minifilters on the system. The model here is a minifilter that implements ALL of the namespace perfectly, with file IDs and hardlinks and so on, at its level, and below itself it keeps a flat structure where all streams are identified by GUIDs and there are not directories. If your minifilter happens to be below such a filter then obviously the name of the file at your level (which is a GUID) has absolutely nothing to do with the name the user mode service sees (which can be a regular path). Now, it must be said that any minifilter that does anything like this to the namespace would be in the virtualization group, so if you are above the virtualization group you don't have this problem. But if you are IN or below the virtualization group, then you must take this into account.

Q: Why not communicate with my minifilter through a private communication channel and have it open and read files on behalf of my service ?
A: if you are in or below the virtualization group, see the example above. If you are below the AV group, then you should always think about malware. Let's say you do something very benign, like open your own file and read some configuration data (as opposed to opening and parsing or executing random user files). If there is a vulnerability with your parsing code, this allows someone to write a file based exploit targeting your product and no AVs will be able to see your accesses to the file and catch the vulnerability. Unfortunately, there isn't a good generic malware model so you need to construct your own every time you need to explain why bypassing some security measure is not a good idea…

Q: Why not create a back-up of a VHD file while the volume is mounted ? (which is another way of saying "why not try to read the data on a mounted volume by directly accessing the sectors ?").. This is a question that's not really related to file systems but to the storage stack.. However, I find a lot of people are confused about this and keep trying to read mounted volumes.
A: the model I find helps is that of a volume with a file system on top that on volume mount reads everything into memory and then it only writes the odd bytes (byte 1, 3, 5 and so on) of anything and keeps the even bytes in a cache, until it gets either a flush or a dismount. This makes immediately visible what would happen if you tried to read it. However, once I mention this people immediately ask whether we could flush and then take a snapshot, but then I point out that immediately after the flush the system might already have received some writes and then only the odd bytes have been written so you need a way to guarantee that no more writes happen on the file system, and the only way to do that is to dismount it.

Probably the most powerful model that exposes a lot of issues with filters (not only file system, any filters of any component really) is the "filter attached on top of itself" model. This is important because in general anything you can do in your filter someone else can do in theirs. For example, let's say the discussion is whether creating a new FSCTL that is currently unused and sending it down the FS stack to your filter is a good idea (spoiler: it's not). In the general case this wouldn't work with your filter attached twice, since all the IOCTLs will be captured by the top filter. This might not be an obvious problem (because depending on what the filter should do with the IOCTL , it might still work fine), but then consider that someone else can write a filter just like yours using the same IOCTL derived through the same mechanism and then you can expect more serious problems. So in this particular case you would want to make sure to either use a communication mechanism guaranteed to deliver messages directly to your filter like a control device or (if using a minifilter) communication ports. The same applies for file names (what if there already is a file with that name?) and other named resources.. Thinking about what would happen if your filter would be attached on top of itself is always an interesting thought experiment and highly recommended since it will expose potential problems with your design. Once you know what the problems are you can decide about how likely it is to happen and whether you should address the issue..

I thought I had more models and I should have done a better job at keeping track of them but I can't remember anymore right now. I will update the post when I do.

Thursday, October 21, 2010

Filtering in the Windows Storage Space

This post assumes that reader had some knowledge about the IO subsystem in windows.

The file system stack is simply a set of drivers between the IO manager and the file system (including the file system). These drivers are usually referred to as file system filters. In general the file system is the component that implements the hierarchy of files and directories and perhaps an additional set of features (like byte-range locking or hardlinks and so on). The file system filters usually add some functionality on top of what the file system provides (such as encryption or replication or security (think anti-virus scanners), quota management and so on). Most of these features could be implemented at any of these layers (for example, byte-range locking is usually done in the file system, but a filter can do it as well…). The decision is usually driven by customer requirements and even in the OS itself some things are done in filters, so that customers that don't need the feature don't pay the price.

For a pretty complete list of types of things file system filters can do, one can take a look at the list here. Of course, this is not a complete list, but still it shows how rich the ecosystem really is. I remember hearing that an average user on a Windows machine is running around 4 or 5 file system filters, usually without even realizing it.

The interface between the IO manager and the file system is very rich and complex. There are very many rules and everything is asynchronous which makes things very complicated. On top of this, while there is support in the NT model for filtering, it doesn't really provide some of the facilities that file system writers need (for example, there is not a lot of support for getting the name of a file or for attaching context to a certain file). This is where minifilters comes in. The minifilter infrastructure was written to primarily address some things that almost all file system filter need, without really changing the filtering model too much (which is why I'm avoiding the phrase "minifilter model" since it doesn't really change the IO model much, it just adds some features to it). This is all implemented via a support driver called filter manager. Filter manager is a legacy filter that is a part of the operating system and it provides things such as :
1. Support for contexts
2. An easier model for attaching to a volume
3. Easier model for file name querying
4. Support for unloading filters
5. Predictable filtering order
6. Easier communication between a user mode service and a driver.

Some of these are just nice features (like context support, where a legacy filter can still reliably implement their own scheme if they want) while some are downright impossible in the legacy model (for example, it used to be very problematic to make sure that an anti-virus filter would not be loaded below an encryption filter (which would make scanning files useless)).

The numbers that I've heard were that a legacy filter needs about 5000 lines of (very complicated and highly sensitive) code to just load and do nothing. With the minifilter model I'd say less than 50 are necessary, and most of them are just setting up structures and such.

Of course, a legacy filter can do all a minifilter can because filter manager itself is a legacy filter and it doesn't use private or undocumented interfaces. However, since the minifilter model is supported on all platforms since Windows 2000 there is really no reason for anyone developing a new filter to write a legacy filter. At least, that's my view. There are some people who disagree with this statement (as with any other model in fact) but the fact is that Microsoft is moving towards making the legacy model obsolete.

It is important to note that the storage infrastructure consists of two big parts, the file system stack and the disk stack. The disk stack deals with IO that is issued by the file system. The file system stack encapsulates all the complexity of operating with files and folders and such and issues just sector reads and writes. The disk stack has no concept of byte range locks, files and so on. What is deals with are sectors. The types of filters in this space are categorized about what they filter (disk, partition or volume) as well as the functionality they provide (encryption, compression, replication and so on). For example filters can offer things like volume snapshots, full volume encryption or full disk encryption, volume or partition replication, performance monitoring at all levels and so on.

As you can see, the storage subsystem is very rich and most of the time filters play a huge role in it (at least in the Windows world, where one can't just modify the source to add features to an operating system component). However, with so many ways to do things it is sometimes hard to know what architecture is best suited for a certain type of problem, and unfortunately selecting the wrong one can have a huge impact on the cost and complexity of a project.