Thursday, February 11, 2010

Context Usage in Minifilters

I’m not sure why but in spite of there being pretty good documentation and even a sample available, the topic of how Contexts work and how filters should use them comes up a lot.

There are a couple of rules that govern contexts and pretty much everything follows from the interaction between these rules (this applies to all contexts). Please note that this is more of a design discussion and the implementation might be slightly different:

  1. When the reference count on the context gets to 0, the memory is freed.
  2. Any pointer to the context needs to have a corresponding increment on the reference count. This is done transparently when the filter requests the context via one of the functions (FltAllocateContext(),FltGetXxxContext(),FltReferenceContex() and even FltSetXxxContext() and so on).
  3. A context needs to be linked to the underlying structure (i.e. StreamContext  to the stream, StreamHandleContext to the FILE_OBJECT, VolumeContext to the volume and so on…; please note that we are talking from a design perspective, the implementation of exactly which structure has the pointer to the context might be different, but this is irrelevant for this discussion).

This is pretty much it. I’d like to walk through the most common scenarios and explain how those rules apply:

A filter allocates a context (FltAllocateContext) and it gets a pointer to a context (refcount 1). The context is not linked to anything at this point in time. If the filter calls FltReleaseContext, the refcount will drop to 0 and the context will be freed. If the filter tries to attach the context to the structure (say by using FltSetStreamContext – i’ll use StreamContexts for the rest of the discussion, and the underlying structure in this case is the SCB (Stream Context Block; or, for file systems that don’t support multiple streams per file (aka alternate data streams), the FCB)), then there are three cases:

  1. It succeeds. refcount is now 2, one for the link from the SCB and the other one is the one the filter has. 
  2. It fails and the filter doesn’t get a context back (for whatever reason: memory is low or the filter passed a NULL pointer for OldContext or there is some other failure). In this case there is still only one pointer, the one the filter has, so the refcount needs to be 1.
  3. It fails and the filter gets another context back (there already was a context attached and OldContext was not NULL). Now the filter has two contexts, the original context that it has allocated which has a refcount of 1 (only the filter has a pointer to it) and a new context (though the name is OldContext), with a reference count of at least 2 (because there are at least two pointers to it, one from the underlying structure, the SCB, and one that was just returned in OldContext so the filter can use it – there could be other references from other threads, but to keep things simple we will ignore those). The filter will need to release the original context it has allocated because it can’t use it (and since the refcount was 1 this will drop it to 0 and will free it). The filter will also need to eventually release the reference it got on OldContext, after using it (which will drop it back to 1, which represents the pointer from the SCB to the context).

Before we go any further i want to discuss what a filter can do when getting a context fails for whatever reason (this includes allocation failures and failing to set or get the context). Some filters can simply ignore that object (for example, a filter trying to log file IO might make a note in the log that IO to file X will not be logged and that’s that). Other filters might work in a degraded mode (for example, an anti-virus filter that is trying to be smart about scanning a file when it’s closed might want to remember whether there was any write to the file. If it fails to get a context it might scan the file anyway… performance might be worse but it will still work). And yet another case is where a filter might simply not be able to work when it doesn’t get a context. In that case the filter might want to allocate and initialize the context early enough so that the operation can be failed, usually in the Create path so in case the allocation fails the filter can fail the Create and the file won’t be opened at all.

Yet another thing to mention is that if a filter needs to use a context at DPC (let’s say in postWrite) then the context needs to be allocated from nonpaged pool and since the context functions are not callable at DPC the recommended way is to get the context (which might involve allocating it and attaching it) in the preOperation callback and pass it through the CompletionContext to the postOperation callback which can use it and then call FltReleaseContext to release the reference (yeah, even at DPC if the context is allocated from nonpaged pool).

One might wonder why the strange dance with the OldContext and NewContext. Couldn’t the filter just check if there is a context and only allocate one if there isn’t one ? Well, of course it could but because the environment is asynchronous, multiple threads might be doing the same thing at the same time, and they will all check if there is a context, find none, allocate a context and set it so now you have 10 threads each trying to attach an SCB with a different context… So the context operation needs to be a CAS (CompareAndSwap) so that only one thread succeeds.

Thus filter should not really start using a context that it allocated until it actually manages to attach it to the structure. However, immediately after the context is attached another filter might get to it, so it needs to be in some defined state, otherwise the other filter will get an invalid context (more on this later). The steps need to be something like this (this is pretty much the logic in CtxFindOrCreateStreamContext in the ctx sample in the WDK):

  1. context = FltGetStreamContext()..
  2. If we didn’t get one:
    1. context = NULL
    2. FltAllocateContext (NewContext)
    3. Initialize context to whatever default values make sense. Please note that those values need to take into account the current reference as well.. I’ll explain more below.
    4. FltSetStreamContext(NewContext, OldContext)
    5. If it failed:
      1. FltReleaseContext(NewContext) –> no point in keeping it around. Since we had the only reference refcount was 1 and it dropped to 0 so it will be freed.
      2. If we got OldContext, context = OldContext
      3. else, we didn’t get OldContext but we also couldn’t attach our context for some reason – the filter needs to continue without a context, whatever that means… (and no, KeBugCheck is not a good idea :)… )
    6. if it didn’t fail –> context = NewContext
  3. At this point context points to the context to use. If it is null, something went wrong and we should bail… (we could bail here or in 2.5.3., doesn’t matter). By bail i mean we should either fail the operation or popup a warning to the user or mark somewhere that we missed one so the results are not reliable anymore.. doesn’t matter.
  4. do things with context….
  5. FltReleaseContext(context) -  here we release our context. We can do this later, for example if we get the context in the PreOperation callback we might want to pass it via the CompletionContext to the PostOperation and release it there. Or we could queue some work item and pass it in a context and have the work item release it. Anyway, once the context gets release the reference count on the structure will drop back to 1 (for the link from the SCB to the context).

In step 2.3. i said that the context needs to be initialized to whatever values make sense, but IT MUST take into account the current reference. Well, this is not always needed but it depends on the particular design of the filter (it’s usually needed though so keep reading). Consider a filter that uses a StreamContext to keep track of how many threads it has doing IO on a Stream and is using a handle that the filter opened via FltCreateFileEx2. Let’s say that when the count gets to 0 the filter will call FltClose on the filter’s handle. Now let’s imagine a case where in step 2.3. the filter simply initializes the count to 0. The logic would be something like this:

  1. context = GetStreamContext(); // allocate new context or get the existing one. also get a handle by calling FltCreateFileEx2(…) if needed
  2. context->Count++
  3. Do things on context->Handle
  4. context->Count--;
  5. If (context->Count == 0) then FltClose(context->Handle).
  6. FltReleaseContext(context);

Do you see the problem here ? What happens if there are two threads, T1 and T2, and T1 allocates the new context, initializes so that context->Count is 0 (which means, it is initialized to a default value that doesn’t take into account the current reference) and then it sets the context (refcount is 2, 1 for T1 and one for the underlying SCB), before getting to step 2. it gets preempted by T2, which starts at the top. T2 will get a context (refcount is 3, 1 for T1, 1 for T2 and one from the SCB), it will increment the count (so context->Count is 1), it “does things”, then it decreases the count in step 4. (so context->Count is now 0) and then step 5. will proceed to close the handle. Step 6. will release the context (so refcount drops back to 2). Then when T1 resumes it will be at Step 2. and it will again increase the context->Count to 1 (from the wiki link above, this is a manifestation of the ABA problem), then it will do things on context->Handle which has been closed….. And there you have it… This could have been avoided if GetContext() actually initialized the newly allocated context Count to 1. This complicates things a bit because Step2. now might need to only be called when the context was not allocated in this thread, meaning that Step2. will probably need to move in GetContext() and so on..

Another thing worth mentioning is that once the underlying object is torn down, the link from it to the context will be broken (i.e. the pointer from the underlying object will go away), so the reference count will need to be decremented. In most cases where there are no outstanding references (there are no other threads using the context) the refcount will go to 0 and the context will be freed (and the filter’s context teardown callback will be called, if one was registered). There are a couple of implications this has. If a filter simply allocates a context, associates it with an object and then calls FltReleaseContext() (which is the normal way to set up a context), the filter doesn’t need to do anything else to make sure the context goes away. It will be torn down when the underlying object is torn down.

The other thing that follows from the fact that the context is tied to the lifetime of the underlying object is that if a filter can never leave the context in a bad state assuming that it will go away, because the underlying object might hang around for a while and get reused, reviving the context. For example, for a StreamContext where a filter has pointer to an ERESOURCE and an open count, it would be a mistake to free the ERESOURCE when the open count gets to 0 under the assumption that once the last handle goes away the SCB will go away as well, because that might not be true. The file system might cache the SCB and if a new open to the same files comes along the file system will reuse the cached SCB, which means that the filter will get a context that has an invalid pointer to an ERESOURCE. So in this case the right place to free the ERESOURCE is in the context teardown callback.

Finally, the last thing i want to mention is what FltDeleteContext does. FltDeleteContext unlinks the context from the underlying object. So if a filter decides it no longer needs a context associated with a stream (for example), it will need to do something like this:

  1. context = FltGetStreamContext();
  2. If (context != NULL)
    1. FltDeleteContext(context);
    2. FltReleaseContext(context);

At this point it should be obvious that FltDeleteContext needs to be called before FltReleaseContext (because since FltReleaseContext will release the reference associated with the context it is not safe to use context at all after calling it; FltDeleteContext will only remove the reference from the underlying object, if it is set, not doing anything about the current reference). Please note that after FltDeleteContext unlinks the StreamContext, any thread trying to get it from the object will not find it. This means that the filter should not try to use it in a meaningful way since other threads might not see the changes. Basically, once FltDeleteContext was called, the filter should simply call FltReleaseContext…

I hope this makes sense. If nothing else it should be useful when you have trouble sleeping (i almost fell asleep twice while proofreading it)

8 comments:

  1. I have a simple question. You use fltkd all over, but whenever I try to use it, WinDbg complains:

    "Could not read offset of field "FrameList.rList" from type fltmgr!_GLOBALS"

    Now this is on XPSP2, with symbols from Microsoft's symbol servers. What am I missing?

    ReplyDelete
  2. Hi Jeff,

    Sorry for the delay. The symbols have been broken for a long time and on some OS releases the extension just doesn't work. I've tried to fix them for Win7 but it's really complicated how this works and i've been unable to do so for downlevel OSes. However, there might be something i might be able to do for this.. I'll post a blog about it if and when i get to it.

    For all the posts in the blog i've been using Win7 so they should work. However, i was also using private symbols at the time.

    ReplyDelete
  3. Alex

    You stated, I quote:

    "FltReleaseContext() (which is the normal way to set up a context), the filter doesn’t need to do anything else to make sure the context goes away. It will be torn down when the underlying object is torn down."

    On the other hand, the FltReleaseContext documentation states:
    "FltReleaseContext decrements the reference count on the given context. When the reference count reaches zero, the context is freed immediately if the caller is running at IRQL <= APC_LEVEL. If the caller is running at IRQL DISPATCH_LEVEL, a work item is scheduled to free the context."

    Can you please elaborate on this?

    ReplyDelete
  4. Well, i think you're taking it a bit out of context (probably my fault as well, i should have phrased it better). What i meant was that a filter normally sets up a context by calling first FltAllocateContext, then FltSetXxxContext and then initializes it or whatever it does and then it calls FltReleaseContext. This works because FltAllocateContext creates a context with a refcount of 1, FltSetXxxContext associates the context with some structure and increments the refcount to 2 and then FltReleaseContext() takes the refcount back to 1, so now the context is only associated with the underlying object. When that object goes away it will also call FltReleaseContext, which will decrement the refcount from 1 to 0 and as such proceed to free the context like the documentation says.
    Does this make sense now ?

    ReplyDelete
  5. Thank you, this re-phrase looks better to me – it emphasizes that FltDeleteContext is not used in this scenario - it is very tempting for a programmer to have every “alloc” to be matched with “delete"

    ReplyDelete
  6. Exactly. That's what i was trying to emphasize as well.

    ReplyDelete
  7. Alex, first of all thank you for this blog!

    I am a bit confused, in the scenario you have explained to universalkludge, when there is 1 reference to the context lesf and associated with filesystem structures, I have noticed I can´t use FltReleaseContext() as it crashes, so I guess the filesystem releases this reference.

    My question is is that right and when does it happen?

    Thanks a lot!!

    Santi

    ReplyDelete
  8. Yes, there is one reference from the underlying structure (depending on the context type). For example for a stream context there is a reference from the file system's SCB to the context. When the file system releases the SCB it will in turn release that final reference with will then release the stream context. You can actually see this in the debugger if you look at the stack when your context release callback is called. If you want to break this link you must call FltDeleteContext().

    ReplyDelete