Thursday, October 21, 2010

Filtering in the Windows Storage Space

This post assumes that reader had some knowledge about the IO subsystem in windows.

The file system stack is simply a set of drivers between the IO manager and the file system (including the file system). These drivers are usually referred to as file system filters. In general the file system is the component that implements the hierarchy of files and directories and perhaps an additional set of features (like byte-range locking or hardlinks and so on). The file system filters usually add some functionality on top of what the file system provides (such as encryption or replication or security (think anti-virus scanners), quota management and so on). Most of these features could be implemented at any of these layers (for example, byte-range locking is usually done in the file system, but a filter can do it as well…). The decision is usually driven by customer requirements and even in the OS itself some things are done in filters, so that customers that don't need the feature don't pay the price.

For a pretty complete list of types of things file system filters can do, one can take a look at the list here. Of course, this is not a complete list, but still it shows how rich the ecosystem really is. I remember hearing that an average user on a Windows machine is running around 4 or 5 file system filters, usually without even realizing it.

The interface between the IO manager and the file system is very rich and complex. There are very many rules and everything is asynchronous which makes things very complicated. On top of this, while there is support in the NT model for filtering, it doesn't really provide some of the facilities that file system writers need (for example, there is not a lot of support for getting the name of a file or for attaching context to a certain file). This is where minifilters comes in. The minifilter infrastructure was written to primarily address some things that almost all file system filter need, without really changing the filtering model too much (which is why I'm avoiding the phrase "minifilter model" since it doesn't really change the IO model much, it just adds some features to it). This is all implemented via a support driver called filter manager. Filter manager is a legacy filter that is a part of the operating system and it provides things such as :
1. Support for contexts
2. An easier model for attaching to a volume
3. Easier model for file name querying
4. Support for unloading filters
5. Predictable filtering order
6. Easier communication between a user mode service and a driver.

Some of these are just nice features (like context support, where a legacy filter can still reliably implement their own scheme if they want) while some are downright impossible in the legacy model (for example, it used to be very problematic to make sure that an anti-virus filter would not be loaded below an encryption filter (which would make scanning files useless)).

The numbers that I've heard were that a legacy filter needs about 5000 lines of (very complicated and highly sensitive) code to just load and do nothing. With the minifilter model I'd say less than 50 are necessary, and most of them are just setting up structures and such.

Of course, a legacy filter can do all a minifilter can because filter manager itself is a legacy filter and it doesn't use private or undocumented interfaces. However, since the minifilter model is supported on all platforms since Windows 2000 there is really no reason for anyone developing a new filter to write a legacy filter. At least, that's my view. There are some people who disagree with this statement (as with any other model in fact) but the fact is that Microsoft is moving towards making the legacy model obsolete.

It is important to note that the storage infrastructure consists of two big parts, the file system stack and the disk stack. The disk stack deals with IO that is issued by the file system. The file system stack encapsulates all the complexity of operating with files and folders and such and issues just sector reads and writes. The disk stack has no concept of byte range locks, files and so on. What is deals with are sectors. The types of filters in this space are categorized about what they filter (disk, partition or volume) as well as the functionality they provide (encryption, compression, replication and so on). For example filters can offer things like volume snapshots, full volume encryption or full disk encryption, volume or partition replication, performance monitoring at all levels and so on.

As you can see, the storage subsystem is very rich and most of the time filters play a huge role in it (at least in the Windows world, where one can't just modify the source to add features to an operating system component). However, with so many ways to do things it is sometimes hard to know what architecture is best suited for a certain type of problem, and unfortunately selecting the wrong one can have a huge impact on the cost and complexity of a project.