Thursday, November 3, 2011

Byte Range Locks and IRP_MJ_CLEANUP

Byte range locks are not a complicated concept but there are some interesting implementation details that might make life hard for a filter. I ran into this a couple of days ago when I was tracking down some IFS test failures related to locking (in particular the UnlockRangeOnCloseTest test from the FileLocking group).
Byte range locks are documented fairly well, at least when compared with other concepts. There is the Lock 'Em Up - Byte Range Locking OSR article and an MSDN page on Locking and Unlocking Byte Ranges in Files. However, for this discussion, the relevant feature is described in the user mode API for locking files, LockFileEx(). This is the quote:

If a process terminates with a portion of a file locked or closes a file that has outstanding locks, the locks are unlocked by the operating system. However, the time it takes for the operating system to unlock these locks depends upon available system resources. Therefore, it is recommended that your process explicitly unlock all files it has locked when it terminates. If this is not done, access to these files may be denied if the operating system has not yet unlocked them.

So what this means is that a process doesn't necessarily have to release all its locks on a file before closing its handle, because the OS will release the locks on its behalf (though this is not the recommended way of doing things). There is an interesting aspect here that is worth noting. Whenever the documentation says that something happens automatically for a handle when it's closed, I immediately think about what happens to handles in different processes that point to the same object. For example, what happens when a file is opened with handle A (HA) in process A and then process A creates process B in such a way that process B inherits the handle from process A (HB)? Both HA and HB point to the same FILE_OBJECT and when the first handle is closed nothing particularly interesting happens for the file system (the IRP_MJ_CLEANUP only gets sent when the last handle to a FILE_OBJECT is closed). For the rest of this post let's assume that HA is closed first and then HB is closed, so closing HB is what prompts the IO manager to send the IRP_MJ_CLEANUP.
So now let's look at what happens in FastFat to handle this case. Looking at the code that processes IRP_MJ_CLEANUP (in \src\filesys\fastfat\Win7\cleanup.c) we find this block of code:
            //  Unlock all outstanding file locks.

            (VOID) FsRtlFastUnlockAll( &Fcb->Specific.Fcb.FileLock,
                                       FileObject,
                                       IoGetRequestorProcess( Irp ),
                                       NULL );
There are two interesting things to note about this call.

  • First we can see that a process is passed in (and this is the process associated with the IRP which FastFat gets from IoGetRequestorProcess()). Moreover, the process is a mandatory parameter, as we can see from the declaration for FsRtlFastUnlockAll():
    NTSTATUS FsRtlFastUnlockAll(
      __in      PFILE_LOCK FileLock,
      __in      PFILE_OBJECT FileObject,
      __in      PEPROCESS ProcessId,
      __in_opt  PVOID Context
    );
    The documentation clearly states that the locks that are released are specific to a process, so during IRP_MJ_CLEANUP FastFat will automatically release the locks that belong to the process whose handle triggered the IRP_MJ_CLEANUP (in our example, the process that owns HB). But what about the locks acquired on handle HA ? Are they going to be left behind ?
  • The second interesting thing to note is that the FILE_LOCK structure is a private member of the FCB, not part of the FSRTL_ADVANCED_FCB_HEADER. So the IO manager can't know where that structure is located without specific knowledge about each file system and as such it can't call FsRtlFastUnlockAll by itself.
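
To make the per-process behavior concrete, here is a minimal user-mode sketch. It is a toy model and not the FsRtl implementation: ToyFileLock, toy_lock(), toy_unlock_all() and toy_count() are hypothetical names, and the integer ids merely stand in for a FILE_OBJECT and a PEPROCESS. The only point it illustrates is that an "unlock all" keyed on (file object, process) leaves locks taken by another process in place.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not the real FsRtl structures. */
#define MAX_LOCKS 16

typedef struct {
    int  in_use;
    int  file_object;   /* hypothetical id standing in for a FILE_OBJECT */
    int  process;       /* hypothetical id standing in for a PEPROCESS   */
    long offset;
    long length;
} ToyLock;

typedef struct {
    ToyLock locks[MAX_LOCKS];   /* stands in for the FILE_LOCK kept in the FCB */
} ToyFileLock;

/* Record a byte range lock; returns 0 on success, -1 if the table is full. */
int toy_lock(ToyFileLock *fl, int file_object, int process,
             long offset, long length)
{
    for (size_t i = 0; i < MAX_LOCKS; i++) {
        if (!fl->locks[i].in_use) {
            fl->locks[i] = (ToyLock){ 1, file_object, process, offset, length };
            return 0;
        }
    }
    return -1;
}

/* Release every lock owned by (file_object, process).
   Returns the number of locks released. */
int toy_unlock_all(ToyFileLock *fl, int file_object, int process)
{
    int released = 0;
    for (size_t i = 0; i < MAX_LOCKS; i++) {
        ToyLock *l = &fl->locks[i];
        if (l->in_use && l->file_object == file_object && l->process == process) {
            l->in_use = 0;
            released++;
        }
    }
    return released;
}

/* Count the locks that are still held. */
int toy_count(const ToyFileLock *fl)
{
    int n = 0;
    for (size_t i = 0; i < MAX_LOCKS; i++) {
        if (fl->locks[i].in_use) {
            n++;
        }
    }
    return n;
}
```

In this model, if process A and process B each lock a range on the same file object and only process B's locks are released (as a cleanup issued in B's context would do), the lock taken by process A is still in the table afterwards.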

Searching for FsRtlFastUnlockAll() in the FastFat source we find that there is another place where it is called, in the FatFastUnlockAll() function (in \src\filesys\fastfat\Win7\lockctrl.c). As the name suggests, FatFastUnlockAll() is a fast IO callback for FastFat and it really doesn't do much else than release all the byte range locks associated with the calling process. This looks like a good mechanism to have the IO manager call the file system to instruct it to release all the locks when a handle is closed. However, there was still one puzzling aspect. FastIO is supposed to be optional so what happens if a filter fails the FastIO or a file system doesn't implement it at all ? I expected there would be an IRP equivalent for this FastIO but there is no other place in the code where FsRtlFastUnlockAll() is called. Well, in fact there is an IRP equivalent for the FastIO but it is not explicitly processed by the FastFat file system. Instead all the lock processing associated with the IRP_MJ_LOCK_CONTROL IRP is handled inside FatCommonLockControl(), which simply calls FsRtlProcessFileLock() and lets the FsRtl package handle it.
Finally, now that we know how the IO manager calls the file system to tell it to release the locks associated with a process, there is one more twist. Does the IO manager call an unlock all every time a handle is closed ? Or, if not, how does it know when to do it ? Clearly it doesn't need to do it for the last handle (since the file system's IRP_MJ_CLEANUP routine will do it) but what about the other handles ? It turns out that there is an optimization here. Whenever the IO manager issues a byte range lock request to the file system it sets the FILE_OBJECT->LockOperation boolean to TRUE. Then, whenever it is closing a handle, if FILE_OBJECT->LockOperation is set it knows that it must notify the file system to release any potential locks. Please note that this flag appears to never be cleared (i.e. even if a process locks and then unlocks all the ranges so that there are no locks to release when closing the handle) so don't be surprised if you receive this in your filter even when there are no locked ranges.
So to summarize things, this is the logic involved here:

  • On every lock operation the IO manager sets FILE_OBJECT->LockOperation. It is worth mentioning that LockOperation is never actually used by the file system (at least not that I've seen in any file system I've looked at).
  • When a handle is closed, if the FILE_OBJECT->LockOperation is set then the IO manager knows there were some locks taken on the FILE_OBJECT and so it must release them. So the IO manager will issue the IRP_MJ_LOCK_CONTROL IRP with the IRP_MN_UNLOCK_ALL minor function (or it will call the FastIO equivalent) to tell the file system to release all the locks. However, this is not necessary if this is the last handle for the FILE_OBJECT because the IO manager will issue the IRP_MJ_CLEANUP IRP in that case and the file system will release all the locks for that process anyway.
  • When a file system processes the IRP_MJ_CLEANUP IRP it must also release all the byte range locks held on the FILE_OBJECT by that process.
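
The decision described in the list above can be sketched as a small user-mode model. This is only an illustration of the logic as I understand it, not IO manager code: ToyFileObject, CloseAction, toy_note_lock_request() and toy_close_handle() are hypothetical names, and lock_operation simply models FILE_OBJECT->LockOperation.

```c
#include <assert.h>
#include <stdbool.h>

/* What the (modeled) IO manager asks the file system to do on a handle close. */
typedef enum {
    ACTION_NONE,        /* nothing lock-related is sent                      */
    ACTION_UNLOCK_ALL,  /* models IRP_MJ_LOCK_CONTROL / IRP_MN_UNLOCK_ALL    */
    ACTION_CLEANUP      /* models IRP_MJ_CLEANUP (last handle closed)        */
} CloseAction;

/* Hypothetical stand-in for the parts of a FILE_OBJECT that matter here. */
typedef struct {
    int  handle_count;    /* open handles that refer to this object */
    bool lock_operation;  /* models FILE_OBJECT->LockOperation      */
} ToyFileObject;

/* Called for every lock request; note that the flag is never cleared. */
void toy_note_lock_request(ToyFileObject *fo)
{
    fo->lock_operation = true;
}

/* Close one handle and report what the file system would be asked to do. */
CloseAction toy_close_handle(ToyFileObject *fo)
{
    if (fo->handle_count <= 0) {
        return ACTION_NONE;             /* nothing left to close */
    }
    fo->handle_count--;
    if (fo->handle_count == 0) {
        return ACTION_CLEANUP;          /* cleanup releases the locks itself */
    }
    if (fo->lock_operation) {
        return ACTION_UNLOCK_ALL;       /* locks may exist for this process */
    }
    return ACTION_NONE;
}
```

With two handles and a lock request recorded, the first close yields ACTION_UNLOCK_ALL and the second yields ACTION_CLEANUP. Without any lock request the first close yields ACTION_NONE.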

Ok, so now let's look at some of the problems that filters might introduce or might run into:

  • A filter that acquires locks on a FILE_OBJECT without going through the IO manager (i.e. without calling ZwLockFile() but by issuing its own IO (IRP or FLT_CALLBACK_DATA)) should also set the FILE_OBJECT->LockOperation flag so that the IO manager knows locks have been taken on that file, because otherwise it'll be really complicated to release the locks at the right time.
  • A filter that duplicates a handle for a FILE_OBJECT might also change the behavior a bit depending on when it closes the handle. If, for example, it closes the handle after the user has closed his handle then the IRP_MJ_CLEANUP IRP will be sent for the filter's close and not the user's close. Now, the IO manager should handle this properly and frankly I don't see any problem with it off the top of my head, but it's something to keep in mind.
  • When a filter calls ZwClose (or FltClose) for a handle it has opened, the IoGetRequestorProcess() call for the IRP_MJ_CLEANUP IRP will return the system process, so the file system will release all byte range locks on the FILE_OBJECT in the system process. This can break things. For example, suppose there are two handles, H1 and H2, for the same FILE_OBJECT in the system process and a lock was taken on handle H1. If the filter then closes H2, the IO manager finds FILE_OBJECT->LockOperation set and tells the file system to release all the locks in the system process for that FILE_OBJECT, which also releases the byte range lock taken through H1.
  • Also, there are some filters that open their own handles to certain files and then forward some requests that arrive on other files to the files they've opened (for example some back-up filter might forward all IRP_MJ_WRITE requests for a file (foo.txt) to another file (foo.txt.bak)). Shadow File Object type filters will often exhibit the same behavior. Now, if they ever forward a byte range lock request to the file they've opened (by doing something like changing the TargetFileObject) then when they close their file that close will most likely not be in the same process as the process that originally requested the byte range lock, and so some ranges of the file they've opened might remain locked. In this case the filter might need to issue IRP_MJ_LOCK_CONTROL with IRP_MN_UNLOCK_ALL itself from the process context where the forwarded lock request originated.
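
The H1/H2 scenario from the list above can be reproduced with a small user-mode model. It is a sketch under the assumption that locks are tracked per (FILE_OBJECT, process) and not per handle, which is what the FsRtlFastUnlockAll() parameters suggest. SharedFileObject, shared_lock_range() and shared_close_handle() are hypothetical names, and the model only tracks locks held by a single process (think of it as the system process).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one FILE_OBJECT with several handles that all
   live in the same process. */
typedef struct {
    int  handle_count;     /* handles in that process for this object        */
    bool lock_operation;   /* models FILE_OBJECT->LockOperation              */
    int  locks_held;       /* ranges locked by that process, whichever handle
                              was used to take them                          */
} SharedFileObject;

/* Take one byte range lock through some handle; the handle is not recorded,
   because the model keys locks on the process only. */
void shared_lock_range(SharedFileObject *fo)
{
    fo->lock_operation = true;
    fo->locks_held++;
}

/* Close one handle. Returns the number of locks released as a side effect. */
int shared_close_handle(SharedFileObject *fo)
{
    int released = 0;
    if (fo->handle_count <= 0) {
        return 0;
    }
    fo->handle_count--;
    /* Last handle: cleanup releases the process's locks.
       Other handles: the flag triggers an "unlock all" for the process. */
    if (fo->handle_count == 0 || fo->lock_operation) {
        released = fo->locks_held;
        fo->locks_held = 0;
    }
    return released;
}
```

If a lock is taken through H1 and then H2 is closed, shared_close_handle() reports one released lock even though H1 is still open, which is exactly the surprise described above.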

Finally, there is one more thing I'd like to say. There are no Flt equivalents for ZwLockFile or ZwUnlockFile. A filter that wants to lock files on the file system below must issue its own requests. There are some special Flt functions for byte range locks (like FltProcessFileLock()) but they are meant for filters that implement byte range locks for some FILE_OBJECTs (like a file system would). For example FltProcessFileLock() should be called where a file system would call the FsRtlProcessFileLock() function. Since FsRtlProcessFileLock() requires an IRP parameter, FltMgr had to implement a wrapper function that takes a FLT_CALLBACK_DATA structure instead of that IRP. This is not the case for all the FsRtlXxxLock() functions because not all of them take an IRP parameter (for example FsRtlFastUnlockAll() doesn't take an IRP, so there is no Flt equivalent and a filter that implements file locks simply calls FsRtlFastUnlockAll() directly). Basically a filter that implements file locks must mix calls to FsRtl functions with calls to Flt functions.