Thursday, August 25, 2011

Using EX_RUNDOWN_REF

This time I'd like to talk about a pretty cool OS primitive, the RundownReference. For some reason there doesn't seem to be an MSDN page on it, so as usual with undocumented stuff, use it at your own risk. Still, it's declared in WDM.H and it's simple enough that the function declarations are pretty much all one needs to figure out how it works.

I imagine everyone reading this blog is familiar with reference counting. However, there is one particular access pattern that I run into a lot where reference counting is almost right but not quite. When writing filters I often find myself in a position where I get an object and I have some structure (a context) associated with it. The context contains some member variables or pointers to structures that are not necessarily valid for the lifetime of the context. Let's say that at some point I want to tear those down while the rest of the context is still around. I guess an example would explain things a lot better:

//
//  Stream context data structure
//

typedef struct _MY_STREAM_CONTEXT {

    ...

    PVOID MyBuffer;
    ...


} MY_STREAM_CONTEXT, *PMY_STREAM_CONTEXT;

The scenario I have in mind is one where I have a buffer that I store in my StreamContext. However, when some event happens I need to change the buffer (possibly reallocate it). The problem I have is that when I get the event I can't tell who is using the buffer. One way to deal with the situation would be to add a lock to the structure, acquire the lock (could even be a shared lock), use the buffer and then release the lock, but that places some restrictions on what I can do with the buffer (IO with a lock held should be avoided where possible).

So if I can't hold a lock while using the buffer, the only alternative is to count how many threads are using it. So now I need to add a ref count. However, if it's just a regular ULONG then I have no mechanism to specify that I'm waiting for people to stop using my buffer and that they shouldn't use it anymore. So now I'd need to add a flag that I'd set when I'm waiting to free the buffer, and whoever gets a reference to the buffer would need to check the flag first; if the flag is set they'd know that I'm waiting to tear the buffer down. Of course, the flag and the refcount would have to be kept in sync, so setting the flag and incrementing the refcount would need to be protected by a lock. Moreover, I'd like to wait for the threads that already got a reference to the buffer to release it, but I don't want to busy-wait, so I'll need an event or some other sort of signaling mechanism. So as you can see, things have already gotten out of hand and I'd need to build a lot of additional infrastructure for such a simple thing… Enter the RundownReference!

The RundownReference works like a regular reference count, except that at some point someone can say "ok, no more references, let's wait for all the references to be released". After that point all new attempts to get a reference fail and the caller can block until all the references are gone…

This is the structure definition:

typedef struct _EX_RUNDOWN_REF {

#define EX_RUNDOWN_ACTIVE      0x1
#define EX_RUNDOWN_COUNT_SHIFT 0x1
#define EX_RUNDOWN_COUNT_INC   (1<<EX_RUNDOWN_COUNT_SHIFT)

    union {
        __volatile ULONG_PTR Count;
        __volatile PVOID Ptr;
    };
} EX_RUNDOWN_REF, *PEX_RUNDOWN_REF;

You can see that it's a very small structure, basically the size of a pointer. The APIs one can use with it are:

  • ExInitializeRundownProtection() - well, this initializes it...
  • ExAcquireRundownProtection() - this simply takes a reference. It returns a BOOLEAN and it fails (returns FALSE) if the RundownRef doesn't allow more references to be taken.
  • ExReleaseRundownProtection() - this releases a reference.
  • ExWaitForRundownProtectionRelease() - this sets the RundownRef into a "draining" mode, where no more references can be taken (ExAcquireRundownProtection() returns FALSE) and it blocks until the last reference is returned.
  • ExRundownCompleted() - this should be called after ExWaitForRundownProtectionRelease() returns. Calls to ExAcquireRundownProtection() will still fail, but it performs some cleanup on the structure.
  • ExReInitializeRundownProtection() - finally this will reinitialize the structure when and if it should be reused. Though looking at it in the debugger it's not really all that different from ExInitializeRundownProtection()… still, there it is.
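
The macros inside the structure definition suggest that the low bit of Count is an "active" flag and the reference count lives in the remaining bits. Here is a minimal user-mode sketch of those semantics, assuming that encoding. The MODEL_RUNDOWN_REF type and the Model* functions are hypothetical names made up for illustration, the model never blocks, and it says nothing about what the kernel actually stores in the field once a rundown is in progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the macros in the structure definition above. */
#define EX_RUNDOWN_ACTIVE      0x1
#define EX_RUNDOWN_COUNT_SHIFT 0x1
#define EX_RUNDOWN_COUNT_INC   (1 << EX_RUNDOWN_COUNT_SHIFT)

/* Hypothetical user-mode model, NOT the kernel's EX_RUNDOWN_REF. */
typedef struct { _Atomic uintptr_t Count; } MODEL_RUNDOWN_REF;

void ModelInitialize(MODEL_RUNDOWN_REF *ref)
{
    atomic_init(&ref->Count, 0);
}

/* Take one reference; fails once the active bit has been set. */
bool ModelAcquire(MODEL_RUNDOWN_REF *ref)
{
    uintptr_t old = atomic_load(&ref->Count);
    for (;;) {
        if (old & EX_RUNDOWN_ACTIVE) {
            return false;
        }
        /* On failure the CAS reloads 'old' and the loop retries. */
        if (atomic_compare_exchange_weak(&ref->Count, &old,
                                         old + EX_RUNDOWN_COUNT_INC)) {
            return true;
        }
    }
}

/* Drop one reference previously taken with ModelAcquire. */
void ModelRelease(MODEL_RUNDOWN_REF *ref)
{
    uintptr_t old = atomic_fetch_sub(&ref->Count, EX_RUNDOWN_COUNT_INC);
    assert((old >> EX_RUNDOWN_COUNT_SHIFT) > 0);
    (void)old;
}

/* Set the active bit so new acquires fail. Returns how many references
   were outstanding at that moment; the caller would wait for that many
   releases (the model itself does not block). */
uintptr_t ModelBeginRundown(MODEL_RUNDOWN_REF *ref)
{
    uintptr_t old = atomic_fetch_or(&ref->Count, EX_RUNDOWN_ACTIVE);
    return old >> EX_RUNDOWN_COUNT_SHIFT;
}

/* Current number of outstanding references. */
uintptr_t ModelReferences(MODEL_RUNDOWN_REF *ref)
{
    return atomic_load(&ref->Count) >> EX_RUNDOWN_COUNT_SHIFT;
}
```

Because the flag and the count share one word, a single atomic operation both checks the flag and bumps the count, which is what removes the need for the separate lock described earlier.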

So now all I need to do in my StreamContext is to add a RundownRef and I'm set:

typedef struct _MY_STREAM_CONTEXT {

    ...

    EX_RUNDOWN_REF MyBufferRundownRef;
    PVOID MyBuffer;
    ...


} MY_STREAM_CONTEXT, *PMY_STREAM_CONTEXT;
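
To make the buffer scenario concrete, here is a rough sketch of a reader path and a buffer-swap path built on the routines listed above. MyReadFirstByte and MySwapBuffer are hypothetical helpers, the context is trimmed to the two relevant fields, and the sketch assumes only one thread swaps the buffer at a time. The #else branch holds single-threaded user-mode stand-ins (not the kernel implementation) so the sketch compiles outside a driver build; the _KERNEL_MODE check assumes the driver is compiled with the MSVC /kernel switch.

```c
#if defined(_KERNEL_MODE)
#include <wdm.h>
#else
/* User-mode, single-threaded stand-ins for illustration only.
   They are NOT the kernel implementation and they never block. */
#include <assert.h>
#include <stddef.h>
#define TRUE  1
#define FALSE 0
typedef unsigned char BOOLEAN;
typedef void *PVOID;
typedef struct _EX_RUNDOWN_REF { size_t Count; } EX_RUNDOWN_REF, *PEX_RUNDOWN_REF;
void ExInitializeRundownProtection(PEX_RUNDOWN_REF r)   { r->Count = 0; }
void ExReInitializeRundownProtection(PEX_RUNDOWN_REF r) { r->Count = 0; }
BOOLEAN ExAcquireRundownProtection(PEX_RUNDOWN_REF r)
{
    if (r->Count & 1) return FALSE;   /* rundown in progress or completed */
    r->Count += 2;
    return TRUE;
}
void ExReleaseRundownProtection(PEX_RUNDOWN_REF r) { r->Count -= 2; }
void ExWaitForRundownProtectionRelease(PEX_RUNDOWN_REF r)
{
    assert((r->Count >> 1) == 0);     /* the real routine would block here */
    r->Count = 1;
}
void ExRundownCompleted(PEX_RUNDOWN_REF r) { r->Count = 1; }
#endif

/* Trimmed-down version of the context shown above. */
typedef struct _MY_STREAM_CONTEXT {
    EX_RUNDOWN_REF MyBufferRundownRef;
    PVOID MyBuffer;
} MY_STREAM_CONTEXT, *PMY_STREAM_CONTEXT;

/* Hypothetical reader: touches MyBuffer only while holding a reference. */
BOOLEAN MyReadFirstByte(PMY_STREAM_CONTEXT Context, unsigned char *Value)
{
    if (!ExAcquireRundownProtection(&Context->MyBufferRundownRef)) {
        return FALSE;                 /* buffer is being torn down */
    }
    *Value = *(unsigned char *)Context->MyBuffer;   /* no lock held here */
    ExReleaseRundownProtection(&Context->MyBufferRundownRef);
    return TRUE;
}

/* Hypothetical swap path: drains readers, replaces the buffer, then makes
   the RundownRef usable again. Returns the old buffer so the caller can
   free it. */
PVOID MySwapBuffer(PMY_STREAM_CONTEXT Context, PVOID NewBuffer)
{
    PVOID oldBuffer;

    ExWaitForRundownProtectionRelease(&Context->MyBufferRundownRef);
    oldBuffer = Context->MyBuffer;    /* no reader holds a reference now */
    Context->MyBuffer = NewBuffer;
    ExRundownCompleted(&Context->MyBufferRundownRef);
    ExReInitializeRundownProtection(&Context->MyBufferRundownRef);
    return oldBuffer;
}
```

Readers that arrive while the swap is in progress simply get FALSE and have to decide what to do (retry later, fail the operation, and so on); that policy is up to the filter.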

This is a pretty versatile primitive in that it can even live in paged pool (if you don't need to access it at high IRQL). I also think the routines should work at DISPATCH_LEVEL but I've never actually used them in that way (I see some calls to KeGetCurrentIrql() but I've not looked at everything going on, so maybe it'll just check for DISPATCH_LEVEL and fail in a very elaborate way). All the functions use the FASTCALL calling convention (on x86 the first arguments are passed in registers rather than on the stack), which makes them pretty quick.

For completeness I'll post the declarations here, but you should check WDM.H to get the latest and greatest.

NTKERNELAPI
VOID
FASTCALL
ExInitializeRundownProtection (
    __out PEX_RUNDOWN_REF RunRef
    );

NTKERNELAPI
VOID
FASTCALL
ExReInitializeRundownProtection (
    __inout PEX_RUNDOWN_REF RunRef
    );

__checkReturn
__drv_valueIs(==0;==1)
NTKERNELAPI
BOOLEAN
FASTCALL
ExAcquireRundownProtection (
    __inout PEX_RUNDOWN_REF RunRef
    );

__checkReturn
__drv_valueIs(==0;==1)
NTKERNELAPI
BOOLEAN
FASTCALL
ExAcquireRundownProtectionEx (
    __inout PEX_RUNDOWN_REF RunRef,
    __in ULONG Count
    );

NTKERNELAPI
VOID
FASTCALL
ExReleaseRundownProtection (
    __inout PEX_RUNDOWN_REF RunRef
    );

NTKERNELAPI
VOID
FASTCALL
ExReleaseRundownProtectionEx (
    __inout PEX_RUNDOWN_REF RunRef,
    __in ULONG Count
    );

NTKERNELAPI
VOID
FASTCALL
ExRundownCompleted (
    __out PEX_RUNDOWN_REF RunRef
    );

NTKERNELAPI
VOID
FASTCALL
ExWaitForRundownProtectionRelease (
    __inout PEX_RUNDOWN_REF RunRef
    );

4 comments:

  1. I have been researching this topic a while ago...

    http://rezkiy.livejournal.com/55599.html

    I will work as a devil's advocate here and will suggest to Timeo Danaos et dona ferentes

    Get the trick: instead of storing the refcount you store double that. So you always have an even number. When AddRefing, make sure you end up with an even number. When destroying, increment (or decrement, that's an implementation detail), just make it odd. If you've made it odd, you know no addrefs will ever happen. So you can figure how many releases will happen and if that's more than 0, wait on the event that you've set up in your own stack frame just before decrementing the refcount. The last releaser will signal the event. Bingo.

    After getting the trick I'd say it is much safer to just code everything by yourself. Pushlock implementation has changed at some point, this one can change too.

    ExWaitForRundownProtectionRelease will most definitely not work at DISPATCH.

  2. Hi Sergei,

    That's pretty much how EX_RUNDOWN_REF is implemented :). I was planning on implementing it myself in case it ever changes in a way that is incompatible with the current semantics, but honestly i don't exactly see what different semantics it might implement. Anyway, that's always the risk with using undocumented structures, as i mentioned.

    When you say "ExWaitForRundownProtectionRelease will most definitely not work at DISPATCH", can you please explain why ? From looking at the disassembly it looks like it's at least trying to work. What was your experience with it ?

  3. Very useful and interesting post!

    What do you think about the remove lock, IO_REMOVE_LOCK? It seems that I can use a remove lock to achieve the same aim as with a rundown reference?

  4. This API is now documented. No need to worry about it changing.

    http://msdn.microsoft.com/en-us/library/windows/hardware/jj569382%28v=vs.85%29.aspx
