Thursday, February 11, 2010

Issuing IO in minifilters: Part 1 - FltCreateFile

In this post I'll try to address a couple of related questions that I've seen asked a lot. This is a rather long topic so I'll split it into a couple of posts. I’ll try to explain both how things work and why they are this way (at least, why I think they are this way…).

These are some of the questions that I'm trying to provide an answer to:

  1. How does FltCreateFile work? How does a CREATE request find the next filter to send the IO to?
  2. I don’t have an instance but I need to perform some operation on a file and I'd like to use the FltXxx functions. How do I do this?
  3. There is no Flt… function for what I want. What should I do?
  4. Why shouldn’t a minifilter send IO to the top of the stack? What does that mean and what happens if I do?
  5. Why is it a bad idea to hold locks across calls into the file system?

The fundamental issue here is that the file system stack in Windows is reentrant. What this means is that, in response to some IO operation, a component somewhere in the IO stack (a filter or the file system) may generate a new IO operation that is sent to the top of the stack and that needs to complete before the original operation can complete. The critical thing here is that the new IO needs to complete before the original one. If a filter or the file system issues IO to the top of the stack but doesn’t wait for it to complete, then the chance of getting into a deadlock is greatly reduced (it can still happen if the logic in some other component depends on the order in which the original IO and the second IO complete, but it’s a lot less likely).

[Figure: Reentrancy in the IO model]

For this example I chose a filter to trigger the reentrancy, but usually it is the file system that does this. Please note that the filter might trigger this just by trying to access some memory that is paged out, in which case operations 4-8 are simply trying to page the data in (step 4 wouldn’t go directly to the IO manager but rather to the memory manager, but you get the idea).

So why is this bad? Well, the main reason is that it’s very easy to end up deadlocking the system. If any of the filters happens to hold a lock that only one operation can acquire at a time, then the second IO will block behind that lock, waiting for the first IO to finish, while the first IO is itself waiting for the second IO to complete. Neither can make progress. This is a simple and clear example and one could try to imagine various schemes to prevent this from happening (like adding rules around when the lock is held, or making the second IO aware that locks are held so that it doesn’t try to acquire the lock again, and so on). This is in fact exactly what the file system does (since it pretty often holds locks while issuing IO to the top of the stack) and it all works well until something else gets in the way of that IO (usually a filter). This is why filters need to be extra careful about how they issue IO.

This is also one of the main reasons why it’s generally a bad idea to hold locks across calls to the file system. If there were no reentrancy, you would have a guarantee that the operation you sent down would finish without trying to reacquire the lock. However, the way things are now it’s impossible to know all the possible paths, especially since filters always alter the semantics in subtle ways, which can mean new dependencies between operations that were previously unrelated.
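
To make the lock problem concrete, here is a minimal user-mode sketch in plain C. It is a toy model and not Windows code: `Filter`, `send_io` and the flags are names I made up for illustration, every filter is assumed to take one exclusive, non-recursive lock per IO, and instead of really blocking the model just reports that the IO would deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-mode model, not Windows code. All names are hypothetical. */

typedef enum { IO_COMPLETED, IO_WOULD_DEADLOCK } IoResult;

typedef struct {
    bool lock_held;              /* exclusive, non-recursive lock of this filter */
    bool issues_nested_io;       /* sends a second IO to the top and waits for it */
    bool holds_lock_across_call; /* keeps the lock while that second IO is in flight */
} Filter;

/*
 * Send one IO from the top of the stack through every filter in order.
 * `nested` marks the second IO, which does not trigger further IO.
 */
IoResult send_io(Filter *filters, size_t count, bool nested)
{
    assert(filters != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        Filter *f = &filters[i];

        if (f->lock_held) {
            /* A real thread would block here, behind the first IO, forever. */
            return IO_WOULD_DEADLOCK;
        }
        f->lock_held = true;

        if (f->issues_nested_io && !nested) {
            if (!f->holds_lock_across_call) {
                f->lock_held = false;   /* drop the lock before reentering */
            }
            IoResult inner = send_io(filters, count, true);
            if (inner != IO_COMPLETED) {
                f->lock_held = false;
                return inner;
            }
        }

        f->lock_held = false;   /* done with this filter, pass the IO down */
    }
    return IO_COMPLETED;
}
```

With a three-filter stack where the middle filter issues the second IO, `send_io` reports a deadlock when that filter keeps its lock across the call and completes when it drops the lock first. The model only captures the simplest dependency; real stacks have many more paths, which is the whole point of the paragraph above.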

Before we start talking about FltCreateFile I must point out that the usual way of issuing IO (allocate an IRP or CALLBACK_DATA, fill in the parameters, send it to the driver below and wait for it to complete – this does not require reentering the IO stack) does not work well for creates. Anyone trying to issue an IRP_MJ_CREATE this way would soon find out that they need to handle things like security checks, handle creation, name resolution, reparse point or symlink handling, the whole lifetime of the FILE_OBJECT and many other sensitive operations. The safest bet is to let the system handle the CREATE operation, which means the create operation needs to go to the IO manager so it can handle all these things. So any filter that needs to open a file needs to reenter the IO stack at the top.

Now, let’s think about how a filter can open a file. Let’s say this is an anti-virus filter and it wants to open the file before the user can open it and scan the contents. Let’s also assume that it wants to do this only in kernel mode (this is not usually what AV filters do, but let’s keep it simple). So the logic in its CREATE processing routine is simple:

  1. Get the current file name that the user is trying to open
  2. Open the same file (remember that this goes to the top of the stack)
  3. Scan it
  4. Close the file
  5. If the file is clean, let the original create go to the file system. Otherwise fail it.

Looks pretty simple on paper. The real question is, of course, how to open the file. Should it simply send the create to the top of the stack (and not worry about reentrancy)? Just doing that quickly results in an infinite loop (the new create comes down the path and the AV filter will see it, and since it’s a create the filter needs to block it and send a new create, which will come down the path, etc…). So clearly the AV needs a way to identify that the create it sent to the top is its own create, so it shouldn’t block it (like filter 3 in the picture above). But how? Pretty much any scheme it can use to tag this create will be broken if there is another minifilter in the stack that does exactly the same thing. Imagine Filter 1 and Filter 2 are both AVs that work in the same way but don’t know about each other (two different products from different vendors, if you will). What will happen is this:

  1. User’s create gets to Filter 1
  2. Filter 1 checks for its tag, T1 (the tags can be anything the filter can do to mark a create, from doing something to the file name to setting some weird flag combination; also, Filter 1 knows nothing about Filter 2 so it doesn’t look for its tag, T2, which we will talk about later). Since it doesn’t find the tag, it issues a new Create (C1) which it tags and sends to the top of the stack.
  3. Filter 1 sees the new Create but it finds the tag T1 so it lets it go down.
  4. Filter 2 now sees a create and checks for its tag (T2; as I said before, Filter 2 knows nothing about how Filter 1 works so it doesn’t know to look for T1), which is not set, so it blocks the create and sends a new create (C2) with the T2 tag set to the top of the stack. In this process T1 might get lost or overwritten by T2 (assume they use exactly the same way to tag things…). Go to step 2.
  5. Actually, step 5 will never execute…
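
The steps above can be played out in a small user-mode sketch. Again this is a toy model in plain C and not driver code: `Create`, `AvFilter` and `send_create_from_top` are hypothetical names, the tag is assumed to live in a single slot (so a new create carries only its issuer's tag), tag 0 is assumed to mean untagged, and a budget of filter-issued creates stands in for "this never finishes".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-mode model, not Windows code. All names are hypothetical. */

typedef struct {
    int tag;   /* single tag slot; 0 means untagged */
} Create;

typedef struct {
    int my_tag;   /* the only tag this AV filter knows how to recognize */
} AvFilter;

/*
 * Send a create from the top of the stack. Returns true if it reaches the
 * file system, false if the budget of filter-issued creates runs out first.
 */
bool send_create_from_top(const AvFilter *filters, size_t count,
                          Create create, int *budget)
{
    assert(budget != NULL);

    for (size_t i = 0; i < count; i++) {
        if (create.tag == filters[i].my_tag) {
            continue;   /* my own create: let it go down */
        }
        if (*budget == 0) {
            return false;   /* still looping: stand-in for "never finishes" */
        }
        (*budget)--;

        /* Block this create and open the file for scanning from the top. */
        Create scan = { filters[i].my_tag };
        if (!send_create_from_top(filters, count, scan, budget)) {
            return false;
        }
        /* Scan finished; the create this filter was holding continues down. */
    }
    return true;   /* reached the file system */
}
```

With a single AV filter the user's create costs exactly one extra create and reaches the file system. With two filters that use different tags, every create is foreign to one of them, so the budget is exhausted no matter how large it is.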

So here you have another way where reentrancy can mess things up. This should also explain why tagging operations in a certain way and sending them all the way to the top can fail if you don’t know what the filters above yours do.

Clearly, it would be better if the filter could simply send the creates below itself, to the rest of the IO stack, while still using the IO manager’s code. Of course, if any other filter misbehaves and sends a new create to the top of the stack things can get ugly, but at least if all filters behave well this model will actually work. Fortunately, the IO manager provides a routine to do just that: IoCreateFileSpecifyDeviceObjectHint, which opens a file through the IO manager while skipping some devices on the stack. So this takes care of the legacy file system filter drivers. They should simply call IoCreateFileSpecifyDeviceObjectHint and target the create at the next device below themselves.

What about minifilters and FltCreateFile? Please note that the other functions (FltCreateFileEx and FltCreateFileEx2) work the same way, so I'm talking about all three of them. Well, as you may have noticed, FltCreateFile takes an FLT_INSTANCE parameter. This parameter, when present, indicates to Filter Manager that the Create should go only to instances below that instance. Of course, if the instance is missing then the Create will go to the top of the stack. So when issuing the create, Filter Manager needs to tag it with some information which it can then use (when it sees the create come down the stack) to figure out which minifilters should see it.

Now, let’s talk a bit about how FltMgr can tag the create. Starting with Vista there is a mechanism that allows filters to add extra information to creates. The structures that are allocated to hold this information are called ECPs (Extra Create Parameters) and each create operation can have an arbitrary number of such ECPs, all attached to an ECP_LIST. There is a whole set of APIs to allocate the list and the ECPs (start by reading the documentation for FsRtlAllocateExtraCreateParameterList or FltAllocateExtraCreateParameterList). Before Vista, filters would achieve some of the same functionality (but not all of it, the ECP model is more powerful) by using Extended Attributes (look at IoCreateFile and IoCreateFileSpecifyDeviceObjectHint, they take an EaBuffer parameter). However, for Filter Manager’s purposes, EAs are good enough.

Now that we have all the elements we can finally answer the first question in the list. FltCreateFile will allocate an internal targeting structure and store it in an ECP (or, before Vista, in an EA) and then call IoCreateFileEx and specify its own device (based on the instance that is passed in) as the hint. Then, while processing any create operation, Filter Manager checks for its ECP (or the EA), gets the targeting information structure from there, and from that structure figures out where the IO needs to go. If there is no targeting information then the assumption is that this IO was not targeted, so Filter Manager will simply show the IO to all minifilters. So at this point the DeviceObject hint takes care of the legacy filters above the target frame (including other Filter Manager frames) and the targeting information tells Filter Manager which minifilter should be the first to see this create.
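
Here is a rough user-mode sketch of how the two pieces fit together. It is only a model of the behavior described above and not Filter Manager's actual code or data layout (the real targeting structure is internal): `Device`, `Create`, `SeenList`, `saw` and `send_create` are hypothetical names, a boolean stands in for the presence of the targeting ECP or EA, and I'm assuming that frames below the issuing one show the create to all of their instances.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy user-mode model, not Filter Manager code. All names are hypothetical. */

#define MAX_INSTANCES 4
#define MAX_SEEN 16

typedef enum { LEGACY_FILTER, FLTMGR_FRAME } DeviceKind;

typedef struct {
    DeviceKind  kind;
    const char *name;                      /* name of the legacy filter or frame */
    const char *instances[MAX_INSTANCES];  /* frames only, top to bottom */
    size_t      instance_count;
} Device;

typedef struct {
    bool   targeted;  /* stands in for "the targeting ECP (or EA) is present" */
    size_t frame;     /* stack index of the frame that owns the issuing instance */
    size_t issuer;    /* index of the issuing instance inside that frame */
} Create;

typedef struct {
    const char *names[MAX_SEEN];
    size_t      count;
} SeenList;

static void record(SeenList *seen, const char *name)
{
    assert(seen->count < MAX_SEEN);
    seen->names[seen->count++] = name;
}

/* True if the named filter was shown the create. */
bool saw(const SeenList *seen, const char *name)
{
    for (size_t i = 0; i < seen->count; i++) {
        if (strcmp(seen->names[i], name) == 0) return true;
    }
    return false;
}

/* Walk the device stack (index 0 is the top) and record who sees the create. */
void send_create(const Device *stack, size_t count, Create create, SeenList *seen)
{
    /* The device hint makes the IO manager start at the issuing frame. */
    size_t start = create.targeted ? create.frame : 0;

    for (size_t d = start; d < count; d++) {
        if (stack[d].kind == LEGACY_FILTER) {
            record(seen, stack[d].name);
            continue;
        }
        /* In the issuing frame, only instances below the issuer see it. */
        size_t first = (create.targeted && d == create.frame) ? create.issuer + 1 : 0;
        for (size_t i = first; i < stack[d].instance_count; i++) {
            record(seen, stack[d].instances[i]);
        }
    }
}
```

For a stack of LegacyA, Frame1 (instances M3, M2), LegacyB, Frame0 (instance M1), a create targeted from M3 is seen by M2, LegacyB and M1 only, while an untargeted create is seen by all five. The hint alone would not be enough (M3 would see its own create again) and the targeting information alone would not be enough either (LegacyA would still see it), which is why both are used.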

In this post I’ve addressed questions 1, 4 and 5. The next post will tackle 2 and 3.