Thursday, September 1, 2011

Using EX_PUSH_LOCK

After talking about EX_RUNDOWN_REF I'd like to talk about another primitive that is pretty cool yet also rather undocumented: the EX_PUSH_LOCK. Even though pushlocks are OS primitives (you can see the Exf functions if you look at the exports from the kernel), they are not documented or even declared in the WDK headers. However, FltMgr exports some wrapper functions for them, and those are documented. The cool thing is that you don't have to be a minifilter to use them. You'll still need to go through FltMgr's functions, but those don't require any minifilter-specific structures such as instances. Anyway, here is the link to FltInitializePushLock() and then there is this thread on OSR's list that is pretty interesting.

So let's talk briefly about what pushlocks actually are. They are shared-exclusive locks with semantics similar to ERESOURCEs, but with a few different properties:

  • They are smaller than ERESOURCEs (the size of a machine pointer)
  • They are more efficient for mostly shared access
  • They can live in paged pool
  • They CANNOT be acquired recursively
  • They have different fairness guarantees (or no guarantees if you prefer)

A very important difference (and a major drawback in my opinion) is that they are not at all convenient to debug. There is no !locks debugger extension for them, and looking at the structures directly isn't easy (or at least I've had a really hard time trying). Still, the structure is available in the debugger:

0: kd> dt nt!_EX_PUSH_LOCK
   +0x000 Locked           : Pos 0, 1 Bit
   +0x000 Waiting          : Pos 1, 1 Bit
   +0x000 Waking           : Pos 2, 1 Bit
   +0x000 MultipleShared   : Pos 3, 1 Bit
   +0x000 Shared           : Pos 4, 28 Bits
   +0x000 Value            : Uint4B
   +0x000 Ptr              : Ptr32 Void
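Since there is no debugger extension, one way to read a raw pushlock value is to split it by hand along the bit positions in the dump above. The following is only a sketch for the 32-bit layout shown here; the names pushlock_bits and decode_pushlock32 are made up for illustration, and the code separates the fields without trying to interpret what a given combination means.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of a 32-bit pushlock value, following the dt output above. */
typedef struct {
    unsigned locked;           /* bit 0 */
    unsigned waiting;          /* bit 1 */
    unsigned waking;           /* bit 2 */
    unsigned multiple_shared;  /* bit 3 */
    uint32_t shared;           /* bits 4..31 (28 bits) */
} pushlock_bits;

/* Split a raw 32-bit value (for example, one read with dd in the debugger). */
pushlock_bits decode_pushlock32(uint32_t value)
{
    pushlock_bits b;
    b.locked          = value & 1u;
    b.waiting         = (value >> 1) & 1u;
    b.waking          = (value >> 2) & 1u;
    b.multiple_shared = (value >> 3) & 1u;
    b.shared          = value >> 4;
    return b;
}

/* Print the fields in roughly the same order as the dt output. */
void print_pushlock32(uint32_t value)
{
    pushlock_bits b = decode_pushlock32(value);
    printf("Locked=%u Waiting=%u Waking=%u MultipleShared=%u Shared=%u\n",
           b.locked, b.waiting, b.waking, b.multiple_shared,
           (unsigned)b.shared);
}
```

For example, decode_pushlock32(0x11) yields Locked = 1 and Shared = 1, with the other three bits clear.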

And here is a list of FltMgr's functions that operate on pushlocks:

VOID
FLTAPI
FltInitializePushLock(
    __out PEX_PUSH_LOCK PushLock
    );

VOID
FLTAPI
FltDeletePushLock(
    __in PEX_PUSH_LOCK PushLock
    );

VOID
FLTAPI
FltAcquirePushLockExclusive(
    __inout __deref __drv_acquiresExclusiveResource(ExPushLockType)
    PEX_PUSH_LOCK PushLock
    );

VOID
FLTAPI
FltAcquirePushLockShared(
    __inout __deref __drv_acquiresExclusiveResource(ExPushLockType)        
    PEX_PUSH_LOCK PushLock
    );

VOID
FLTAPI
FltReleasePushLock(
    __inout __deref __drv_releasesExclusiveResource(ExPushLockType)        
    PEX_PUSH_LOCK PushLock
    );

One thing to note is that all the Flt wrappers are pretty thin: in general all they do is make sure that the Exf function that does the actual work is called inside a critical region. This makes things easier for the caller and makes them almost drop-in replacements for the FltXxxResource functions, which have similar semantics. Another interesting aspect is that even though there are functions for initialization and cleanup, they don't seem to do much. Looking in the debugger we can see that FltDeletePushLock is empty, and that FltInitializePushLock and ExInitializePushLock (the only pushlock function actually declared in the WDK headers) do nothing more than zero out the pushlock. In fact, the FsRtlSetupAdvancedHeader() function (which is an inline) has this bit of code, which confirms this:

//
//  API not avaialble down level
//  We want to support a driver compiled to the last version running downlevel,
//  so continue to use use the direct init of the push lock and not call
//  ExInitializePushLock.
//

    *((PULONG_PTR)(&localAdvHdr->PushLock)) = 0;
    /*ExInitializePushLock( &localAdvHdr->PushLock ); API not avaialble down level*/

So they seem to be pretty low cost (small size and low initialization overhead), which makes them pretty attractive if you want to have a shared-exclusive lock in many structures just in case you need it (for example, something you add to each context, where you don't want to pay the price of an ERESOURCE).
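To make the per-context case concrete, here is a minimal sketch of a structure guarded by a pushlock through the FltMgr wrappers listed earlier. The MY_CONTEXT structure and the MyContext* helpers are hypothetical names invented for this example. The #else branch contains user-mode stand-ins that do no real locking; they exist only so the sketch compiles outside a driver project, and the _KERNEL_MODE check assumes a build with MSVC's /kernel option.

```c
#ifdef _KERNEL_MODE            /* assumption: MSVC /kernel driver build */
#include <fltKernel.h>
#else
/* User-mode stand-ins (hypothetical, no real locking) so the sketch
   also compiles outside a driver project. */
#include <stdint.h>
typedef uintptr_t EX_PUSH_LOCK, *PEX_PUSH_LOCK;
typedef unsigned long ULONG;
static void FltInitializePushLock(PEX_PUSH_LOCK Lock) { *Lock = 0; }
static void FltDeletePushLock(PEX_PUSH_LOCK Lock) { (void)Lock; }
static void FltAcquirePushLockExclusive(PEX_PUSH_LOCK Lock) { (void)Lock; }
static void FltAcquirePushLockShared(PEX_PUSH_LOCK Lock) { (void)Lock; }
static void FltReleasePushLock(PEX_PUSH_LOCK Lock) { (void)Lock; }
#endif

/* Hypothetical context: the lock costs one pointer-sized field. */
typedef struct _MY_CONTEXT {
    EX_PUSH_LOCK Lock;
    ULONG        OpenCount;   /* protected by Lock */
} MY_CONTEXT;

void MyContextInit(MY_CONTEXT *Ctx)
{
    FltInitializePushLock(&Ctx->Lock);
    Ctx->OpenCount = 0;
}

/* Writers take the lock exclusively. */
void MyContextAddOpen(MY_CONTEXT *Ctx)
{
    FltAcquirePushLockExclusive(&Ctx->Lock);
    Ctx->OpenCount++;
    FltReleasePushLock(&Ctx->Lock);
}

/* Readers take the lock shared. Never re-acquire a pushlock that the
   current thread already holds: pushlocks are not recursive. */
ULONG MyContextGetOpenCount(MY_CONTEXT *Ctx)
{
    ULONG count;
    FltAcquirePushLockShared(&Ctx->Lock);
    count = Ctx->OpenCount;
    FltReleasePushLock(&Ctx->Lock);
    return count;
}

void MyContextCleanup(MY_CONTEXT *Ctx)
{
    FltDeletePushLock(&Ctx->Lock);
}
```

In a real driver these helpers would have to be called at IRQL <= APC_LEVEL, as the FltMgr documentation for the pushlock routines requires.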

Because they are so complicated to debug, what I've done (and I've seen others do the same) is use ERESOURCEs in debug builds and in initial releases of a product, in order to make sure there are no deadlocks and such, and switch to pushlocks only once the code is thoroughly tested. You can even use a runtime flag to make your code use ERESOURCEs instead of pushlocks, in case you need the option to run with a primitive that's easier to debug for whatever reason (and I can actually guess that reason :)).
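One possible shape for such a runtime switch is a small wrapper that picks the backing primitive at initialization time. This is only a sketch of the idea, not code from any particular product: MY_LOCK and the MyLock* functions are hypothetical, the #else branch holds user-mode stand-ins that merely count calls so the dispatch logic can be exercised outside a driver, and the _KERNEL_MODE check assumes a build with MSVC's /kernel option.

```c
#ifdef _KERNEL_MODE            /* assumption: MSVC /kernel driver build */
#include <fltKernel.h>
#else
/* User-mode stand-ins (hypothetical, no real locking). */
#include <stdint.h>
typedef unsigned char BOOLEAN;
typedef long NTSTATUS;
#define STATUS_SUCCESS ((NTSTATUS)0)
typedef uintptr_t EX_PUSH_LOCK, *PEX_PUSH_LOCK;
typedef struct _ERESOURCE { int Unused; } ERESOURCE, *PERESOURCE;
static int g_PushLockAcquires, g_ResourceAcquires;
static void FltInitializePushLock(PEX_PUSH_LOCK Lock) { *Lock = 0; }
static void FltDeletePushLock(PEX_PUSH_LOCK Lock) { (void)Lock; }
static void FltAcquirePushLockExclusive(PEX_PUSH_LOCK Lock) { (void)Lock; g_PushLockAcquires++; }
static void FltAcquirePushLockShared(PEX_PUSH_LOCK Lock) { (void)Lock; g_PushLockAcquires++; }
static void FltReleasePushLock(PEX_PUSH_LOCK Lock) { (void)Lock; }
static NTSTATUS ExInitializeResourceLite(PERESOURCE Res) { Res->Unused = 0; return STATUS_SUCCESS; }
static NTSTATUS ExDeleteResourceLite(PERESOURCE Res) { (void)Res; return STATUS_SUCCESS; }
static void FltAcquireResourceExclusive(PERESOURCE Res) { (void)Res; g_ResourceAcquires++; }
static void FltAcquireResourceShared(PERESOURCE Res) { (void)Res; g_ResourceAcquires++; }
static void FltReleaseResource(PERESOURCE Res) { (void)Res; }
#endif

/* Hypothetical wrapper. Because it embeds an ERESOURCE, it must live in
   nonpaged memory even when the pushlock path is selected. */
typedef struct _MY_LOCK {
    BOOLEAN UseResource;       /* nonzero: back the lock with an ERESOURCE */
    union {
        EX_PUSH_LOCK PushLock;
        ERESOURCE    Resource;
    } u;
} MY_LOCK;

NTSTATUS MyLockInit(MY_LOCK *Lock, BOOLEAN UseResource)
{
    Lock->UseResource = UseResource;
    if (UseResource) {
        return ExInitializeResourceLite(&Lock->u.Resource);
    }
    FltInitializePushLock(&Lock->u.PushLock);
    return STATUS_SUCCESS;
}

void MyLockAcquireExclusive(MY_LOCK *Lock)
{
    if (Lock->UseResource) FltAcquireResourceExclusive(&Lock->u.Resource);
    else                   FltAcquirePushLockExclusive(&Lock->u.PushLock);
}

void MyLockAcquireShared(MY_LOCK *Lock)
{
    if (Lock->UseResource) FltAcquireResourceShared(&Lock->u.Resource);
    else                   FltAcquirePushLockShared(&Lock->u.PushLock);
}

void MyLockRelease(MY_LOCK *Lock)
{
    if (Lock->UseResource) FltReleaseResource(&Lock->u.Resource);
    else                   FltReleasePushLock(&Lock->u.PushLock);
}

void MyLockDelete(MY_LOCK *Lock)
{
    if (Lock->UseResource) (void)ExDeleteResourceLite(&Lock->u.Resource);
    else                   FltDeletePushLock(&Lock->u.PushLock);
}
```

The trade-off in this layout is that every lock is as large as an ERESOURCE, so the size advantage of pushlocks is lost; it only makes sense while the easier-to-debug path still has to be available. Code written against such a wrapper should also avoid recursive acquisition on both paths, otherwise it works with ERESOURCEs and deadlocks with pushlocks.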

Finally, please note that since these are not magical (not being invented by Apple) they won't make crappy code that uses suboptimal algorithms any faster. In most cases your performance buck is better spent improving the logic and algorithms used by your driver than chasing faster primitives. Also, please note that the guarantees are not the same as with ERESOURCEs, so if you rely on the ordering guarantees of ERESOURCEs or you need to be able to acquire the lock recursively, you are better off with ERESOURCEs.

The best way to describe them is "use at your own risk" (especially since their behavior seems to change from one OS release to the next, as indicated by the OSR post I mentioned earlier). Still, if you need a very small and fast shared-exclusive lock you should give these guys a try.