Thursday, October 13, 2011

The Delete Minifilter Sample

Update (3/9/2012): This is where you get the sample from: http://code.msdn.microsoft.com/windowshardware/Delete-File-System-b904651d.

As I mentioned in my previous post there is a new file system filter sample in the Win8 WDK, Delete. Unfortunately I don't think I can reproduce the sample code here (since it belongs to Microsoft) so you'll need a Win8 WDK to be able to follow along. This also makes this post rather dry and I'm sorry about that.
In my experience I've seen mainly three types of filters that need to interface with delete:
  • Layered FSD - these filters take complete control of some files or parts of the namespace and that means they need to implement deletion for those files as well. However, in that respect they are more similar to a file system than a filter (because they maintain the state of the file and thus are the authority for the state of the file) and so I think that the FastFat source code in the WDK is a better example for those.
  • Undelete Filters - these are filters that want to be able to "undelete" a file, either by moving it instead of deleting it or by preventing the user from deleting it in the first place under certain circumstances. For example, if the "Recycle Bin" feature were implemented in a filter it would be in this category. One typical problem these filters run into involves files that were opened with FILE_DELETE_ON_CLOSE: the filter can't tell whether the file is going to be deleted by querying the file system about its status (by looking at the FILE_STANDARD_INFORMATION->DeletePending flag), and it can't actually reset the DeletePending flag either. As I mentioned in my previous post, this can be worked around by removing the FILE_DELETE_ON_CLOSE flag from the create and then sending a FILE_DISPOSITION_INFORMATION from postCreate to set the delete disposition, which enables the filter to query and reset the disposition at a later time.
  • Filters that need to know when a certain stream disappears from the system, for example filters that keep metadata about some streams on a file system (like encryption keys or sizes or the time they were last modified and so on). Such filters might want to change their state or remove some metadata when a stream they are tracking disappears from the file system. For these it might not matter how the user tries to delete a file and in some cases it might not even matter what the file name is (so they would still track files across renames or deletes to Recycle Bin), what they need to know is when a certain stream is gone from the underlying file system without the possibility for it to come back. This is what the Delete sample is trying to show.
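The FILE_DELETE_ON_CLOSE workaround mentioned for undelete filters can be sketched as a small user-mode model. This is not code from the sample and it is not kernel code: the model_* names and structs are hypothetical stand-ins for the filter's state and the file system's per-file state. In a real minifilter the flag lives in Data->Iopb->Parameters.Create.Options (followed by a call to FltSetCallbackDataDirty), and the disposition would be sent with something like FltSetInformationFile; treat those details as my assumptions about how one would wire it up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the WDK definition of FILE_DELETE_ON_CLOSE. */
#define FILE_DELETE_ON_CLOSE 0x00001000u

/* Hypothetical stand-in for the file system's per-file state. */
struct model_file {
    bool delete_on_close;  /* set by the create; cannot be reset afterwards */
    bool delete_pending;   /* delete disposition; can be queried and reset */
};

/* Hypothetical state the filter carries from preCreate to postCreate. */
struct model_create {
    uint32_t options;            /* create options as the file system sees them */
    bool     filter_owns_delete; /* the filter stripped FILE_DELETE_ON_CLOSE */
};

/* preCreate: remove the flag before the file system sees the create. */
void model_pre_create(struct model_create *c)
{
    assert(c != NULL);
    if (c->options & FILE_DELETE_ON_CLOSE) {
        c->options &= ~FILE_DELETE_ON_CLOSE;
        c->filter_owns_delete = true;
    }
}

/* Model of the file system processing the (possibly modified) create. */
void model_fs_create(const struct model_create *c, struct model_file *f)
{
    f->delete_on_close = (c->options & FILE_DELETE_ON_CLOSE) != 0;
    f->delete_pending = false;
}

/* Model of a FILE_DISPOSITION_INFORMATION request reaching the file system. */
void model_set_disposition(struct model_file *f, bool delete_file)
{
    f->delete_pending = delete_file;
}

/* postCreate: re-express the caller's intent as a delete disposition. */
void model_post_create(const struct model_create *c, struct model_file *f)
{
    if (c->filter_owns_delete)
        model_set_disposition(f, true);
}

/* The file goes away at the last cleanup if either mechanism asked for it. */
bool model_deleted_at_cleanup(const struct model_file *f)
{
    return f->delete_on_close || f->delete_pending;
}
```

The point of the model is the last function: once the delete is expressed through delete_pending rather than delete_on_close, a later model_set_disposition(f, false) is enough to keep the file.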
So, to recap, the goal of the delete sample is to detect when files and streams disappear from the file system. Once this happens the filter simply prints a notification with the file (or stream) name on the debugger output.
One aspect to note is that the filter prints the name of a file after the file has been deleted. This means it can't query the file system for the name (since it doesn't make any sense to ask a file system for the name of a file that doesn't exist), so how is the filter supposed to know the name of the file? It might be tempting to try to implement a mechanism to be involved in all the name operations, and thus to know what the last name of the file was after the file has been deleted, but this is definitely not a trivial task. However, as I mentioned in my previous post about using names in file system filters, if a name is only needed for logging then it doesn't really matter whether the name is exactly in sync with the file system or not: the name will be consumed later (usually much later) after the event happened, so it will likely be out of sync with the file system anyway. This is especially true for delete operations since the file isn't even on the file system anymore. So the delete sample takes the approach of printing a name for the file or stream without trying too hard to make sure that the name is exactly the last name the file had in the file system (though it will be right in the vast majority of cases).
It's interesting to look at the actual implementation of how the name is generated and stored. If a stream is interesting (in other words, if there is a possibility that the stream will be deleted) then the name of the stream to be deleted is stored in the stream context as a referenced pointer to the FLT_FILE_NAME_INFORMATION structure that is populated during preCleanup. If the stream has been opened multiple times the filter will receive multiple IRP_MJ_CLEANUPs, and the code simply calls FltGetFileNameInformation every time and updates the FLT_FILE_NAME_INFORMATION structure, so that the name that is stored is the latest name in the file system right before the last IRP_MJ_CLEANUP for the file. Another thing to note is that the name that is generated is the OPENED_NAME. As I've said before, getting a NORMALIZED_NAME is expensive and pretty much only necessary when the name is to be compared with other names (or parts of names). In this case the name is intended to be "consumed" by someone looking at the debugger log and so the normalized name is not necessary.
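To make the "refresh the name on every cleanup" idea concrete, here is a minimal user-mode sketch. It is not the sample's code: model_name_info is a hypothetical stand-in for a referenced FLT_FILE_NAME_INFORMATION, model_get_name stands in for FltGetFileNameInformation, and model_release_name for FltReleaseFileNameInformation. Keeping the previous name when the query fails is my own assumption; I'm not claiming the sample does exactly that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a referenced FLT_FILE_NAME_INFORMATION. */
struct model_name_info {
    int  refcount;
    char name[260];
};

/* Hypothetical stream context: holds one reference to the last name seen. */
struct model_stream_ctx {
    struct model_name_info *name_info; /* NULL until the first cleanup */
};

/* Stand-in for the name query: returns a new object with one reference. */
struct model_name_info *model_get_name(const char *current_name)
{
    struct model_name_info *ni = calloc(1, sizeof(*ni));
    if (ni == NULL)
        return NULL;
    ni->refcount = 1;
    /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(ni->name, current_name, sizeof(ni->name) - 1);
    return ni;
}

/* Stand-in for releasing a name reference. */
void model_release_name(struct model_name_info *ni)
{
    assert(ni != NULL && ni->refcount > 0);
    if (--ni->refcount == 0)
        free(ni);
}

/* preCleanup: swap in the newest name; keep the old one if the query fails
 * (an assumption of this sketch). */
void model_pre_cleanup(struct model_stream_ctx *ctx, const char *current_name)
{
    struct model_name_info *fresh = model_get_name(current_name);
    if (fresh == NULL)
        return;
    if (ctx->name_info != NULL)
        model_release_name(ctx->name_info);
    ctx->name_info = fresh;
}
```

After the last cleanup the context still holds a reference to the most recent name, which is what gets printed once the filter decides the stream is gone.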
One interesting issue that filters face when trying to keep track of the state in the file system is that the IO stack in NT is asynchronous, and as such the order in which a minifilter sees requests is not necessarily the order in which the file system sees them. Let's use the example of two IRP_MJ_SET_INFORMATION calls that are racing down the IO stack, both trying to set a FILE_DISPOSITION_INFORMATION, one of them with DeleteFile set to TRUE and the other one with DeleteFile set to FALSE. Moreover, they are racing in such a way that the filter sees both preOp callbacks before it sees the postOp callback for either of them (in other words both requests are being processed by layers below the filter at the same time). When a filter sees these requests it might see the one that sets it to TRUE and then the one that sets it to FALSE, and assume that the delete disposition was set and then reset and so the file won't be deleted. However, it's very possible that the file system will receive the request that sets the delete disposition to FALSE before the one that sets it to TRUE, and so it will delete the file. This is clearly not a frequent case but it can happen. What the Delete filter does in this case is to keep a counter of the number of in-flight FileDispositionInformation operations it has seen. If there is only ever one operation at a time then the filter can know for sure what the state is in the file system, and so it registers a postOp callback where it checks whether the operation was successful and, if it was, updates the information it keeps in the stream context with the disposition. If the filter ever processes more than one FileDispositionInformation operation at the same time then it gives up on trying to figure out what the state of the flag is in the file system and falls back to its default behavior, where it tries to figure out if the file was deleted from the file system.
This is a perfect example of how a minifilter can optimize the common case (where there is only one FileDispositionInformation operation issued at a time) but when it detects that it can't do that it must use other ways that are possibly less efficient.
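The in-flight counter can be sketched like this. Again, this is a user-mode model under my own assumptions and not the sample's code: the struct, the sticky "raced" flag and all model_* names are hypothetical, and C11 atomics stand in for the interlocked operations a kernel filter would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-stream state kept by the filter. */
struct model_stream_ctx {
    atomic_int  in_flight;      /* disposition requests between preOp and postOp */
    atomic_bool raced;          /* sticky: two requests overlapped at least once */
    bool        delete_pending; /* trusted only while raced is false */
};

enum model_cleanup_action {
    MODEL_NOT_DELETED,     /* tracked disposition says the stream stays */
    MODEL_EXPECT_DELETE,   /* tracked disposition says the stream goes away */
    MODEL_ASK_FILE_SYSTEM  /* state unknown: fall back to probing the file system */
};

void model_ctx_init(struct model_stream_ctx *ctx)
{
    atomic_init(&ctx->in_flight, 0);
    atomic_init(&ctx->raced, false);
    ctx->delete_pending = false;
}

/* preOp: count the request; returns false once tracking can't be trusted. */
bool model_pre_set_disposition(struct model_stream_ctx *ctx)
{
    assert(ctx != NULL);
    /* atomic_fetch_add returns the previous value. */
    if (atomic_fetch_add(&ctx->in_flight, 1) > 0)
        atomic_store(&ctx->raced, true);
    return !atomic_load(&ctx->raced);
}

/* postOp: record the outcome only if no other request ever overlapped. */
void model_post_set_disposition(struct model_stream_ctx *ctx,
                                bool succeeded, bool delete_file)
{
    if (succeeded && !atomic_load(&ctx->raced))
        ctx->delete_pending = delete_file;
    atomic_fetch_sub(&ctx->in_flight, 1);
}

/* Last cleanup: use the cheap answer when it is reliable. */
enum model_cleanup_action model_on_last_cleanup(struct model_stream_ctx *ctx)
{
    if (atomic_load(&ctx->raced))
        return MODEL_ASK_FILE_SYSTEM;
    return ctx->delete_pending ? MODEL_EXPECT_DELETE : MODEL_NOT_DELETED;
}
```

In the common case (one request at a time) the cleanup path never has to issue extra IO. The moment two requests overlap, the model stops trusting its own bookkeeping for that stream for good.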
This brings us to discussing how the minifilter can tell whether a file or stream was actually deleted. The minifilter relies on a couple of implementation details in NTFS, where NTFS knows that the file or stream has been deleted and it answers an IO request in a slightly different way to indicate that. For example, querying for FileStandardInformation after the file system has processed IRP_MJ_CLEANUP for a stream will fail with STATUS_FILE_DELETED if the stream was actually deleted. This requires one additional IO request to the file system (so it does have a performance hit), which makes it less optimal than the case where the filter can know for sure what the delete disposition is. In addition to this, if the delete was for an alternate data stream (ADS) it is possible (but not guaranteed) that the whole file will be deleted. So once the filter figures out the stream was actually deleted it must find out whether the whole file was deleted as well. Again, the filter relies on some undocumented behavior (or at least I'm not aware of it being documented anywhere): trying to get the OBJECT_ID for a file in postCleanup will return STATUS_FILE_DELETED as well if the whole file was deleted (please note that the call might still fail if there isn't an OBJECT_ID for the file, but it won't fail with STATUS_FILE_DELETED if the file hasn't been deleted). This doesn't work for transacted files, so in that case the filter tries to open the file by ID, which fails with STATUS_INVALID_PARAMETER if the file has been deleted inside a transaction.
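The decision logic described above boils down to a small classification over the status codes that come back from the probes. Here is a sketch of just that decision, with the actual IO left out: model_probe and model_classify are hypothetical names, the status values match the ones in ntstatus.h, and the behavior being modeled is the NTFS behavior described in this post rather than anything documented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t model_status; /* stand-in for NTSTATUS */

#define STATUS_SUCCESS           ((model_status)0x00000000u)
#define STATUS_INVALID_PARAMETER ((model_status)0xC000000Du)
#define STATUS_FILE_DELETED      ((model_status)0xC0000123u)

enum model_outcome {
    MODEL_STILL_PRESENT,  /* the stream survived the cleanup */
    MODEL_STREAM_DELETED, /* the stream is gone but the file is still there */
    MODEL_FILE_DELETED    /* the whole file is gone */
};

/* Hypothetical record of the statuses returned by the probes. */
struct model_probe {
    model_status query_standard_info; /* FileStandardInformation after cleanup */
    model_status get_object_id;       /* object ID query (non-transacted case) */
    model_status open_by_id;          /* open by file ID (transacted case) */
    bool         transacted;
};

enum model_outcome model_classify(const struct model_probe *p)
{
    bool file_gone;

    assert(p != NULL);

    /* Step 1: was the stream itself deleted? */
    if (p->query_standard_info != STATUS_FILE_DELETED)
        return MODEL_STILL_PRESENT;

    /* Step 2: was the whole file deleted along with it? */
    if (p->transacted)
        file_gone = (p->open_by_id == STATUS_INVALID_PARAMETER);
    else
        file_gone = (p->get_object_id == STATUS_FILE_DELETED);

    return file_gone ? MODEL_FILE_DELETED : MODEL_STREAM_DELETED;
}
```

Note that in the non-transacted branch any failure other than STATUS_FILE_DELETED (for example, the file simply not having an object ID) is treated as "the file is still there", which matches the caveat above.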
Since I mentioned transactions it's interesting to see what the filter does in case of deletes inside transactions. It can use the same mechanism to detect delete operations as it would for non-transacted handles, but the additional complication is that a transaction can be rolled back, undoing all the delete operations. So whenever the filter detects a file being deleted inside a transaction it adds that file to a list, and when the transaction is finalized the filter checks whether the transaction was rolled back. If it was, the filter notifies that the streams that were previously deleted in that transaction have now come back.
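The bookkeeping for transacted deletes can be modeled as a per-transaction list that is drained when the transaction ends. This is only a sketch under my own assumptions: model_txn_ctx, the fixed-size array and the printf notifications are hypothetical and simply stand in for whatever list and debugger output a real filter would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_MAX_PENDING 16

/* Hypothetical per-transaction context: streams deleted inside the transaction. */
struct model_txn_ctx {
    const char *deleted[MODEL_MAX_PENDING]; /* caller keeps the strings alive */
    size_t      count;
};

/* Called when a delete is detected on a transacted handle.
 * Returns false if the (fixed-size) list is full. */
bool model_txn_record_delete(struct model_txn_ctx *txn, const char *name)
{
    assert(txn != NULL && name != NULL);
    if (txn->count == MODEL_MAX_PENDING)
        return false;
    txn->deleted[txn->count++] = name;
    return true;
}

/* Called when the transaction is finalized. On rollback every recorded
 * delete is undone, so each stream is reported as being back.
 * Returns the number of streams reported. */
size_t model_txn_finalize(struct model_txn_ctx *txn, bool rolled_back)
{
    size_t restored = 0;

    assert(txn != NULL);
    if (rolled_back) {
        for (size_t i = 0; i < txn->count; i++) {
            printf("rollback: %s is back\n", txn->deleted[i]);
            restored++;
        }
    }
    txn->count = 0; /* the list is only meaningful for this transaction */
    return restored;
}
```

On commit the list is simply discarded, since the delete notifications that were already printed turned out to be accurate.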
In fact, just the fact that the minifilter supports transactions makes it an interesting sample, since other than the minispy sample there is no example of how a transaction-aware minifilter needs to be implemented, and the minispy sample simply lists the requests it sees and doesn't try to do anything beyond that.
Finally, another thing worth mentioning is how contexts are allocated during preCreate and set during postCreate (when necessary), which is different from the Ctx sample that tries to allocate the context during postCreate. This method has a couple of advantages. First, it allows the filter to fail an operation before the operation is seen by the file system if the filter can't get a context (not enough memory for example) instead of trying to deal with the failure after the operation happened, which is sometimes impossible because some operations can't be undone. Also, for operations where the postOp callback can be called at DPC, this allows the filter to allocate and set the context during preOp, when the code isn't running at DPC, and pass it in through the CompletionContext to the postOp callback, which can update it and then just call FltReleaseContext(), which is supported at DPC (please note that if the context needs to be accessible at DPC then it must be allocated from non-paged pool).
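The "allocate in preCreate, attach in postCreate" pattern can be illustrated with a user-mode model. None of this is the sample's code: the model_* names are hypothetical, malloc and free stand in for context allocation and release, and the pointer handed from one function to the other plays the role of the CompletionContext.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stream context. */
struct model_stream_ctx {
    bool delete_on_close;
};

/* Allocator signature, so an allocation failure can be simulated. */
typedef void *(*model_alloc_fn)(size_t);

/* Allocator that always fails, used to exercise the low-memory path. */
void *model_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* preCreate: allocate up front. Returning false models failing the create
 * before the file system has seen it, so there is nothing to undo. */
bool model_pre_create(model_alloc_fn alloc,
                      struct model_stream_ctx **completion_ctx)
{
    struct model_stream_ctx *ctx;

    assert(alloc != NULL && completion_ctx != NULL);
    ctx = alloc(sizeof(*ctx));
    if (ctx == NULL) {
        *completion_ctx = NULL;
        return false;
    }
    ctx->delete_on_close = false;
    *completion_ctx = ctx;
    return true;
}

/* postCreate: no allocation happens here. The preallocated context is either
 * attached to the stream or released if it turns out not to be needed. */
void model_post_create(bool create_succeeded,
                       struct model_stream_ctx *completion_ctx,
                       struct model_stream_ctx **stream_slot)
{
    assert(stream_slot != NULL);
    if (create_succeeded && *stream_slot == NULL)
        *stream_slot = completion_ctx; /* first open: attach the context */
    else
        free(completion_ctx);          /* failed create or context already set */
}
```

The second open of the same stream shows the "when necessary" part: the context allocated in preCreate is thrown away because the stream already has one.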
Unfortunately the delete sample doesn't show how to deal with more complicated types of deletes (overwriting renames for example), but it's still a welcome addition to the set of filter samples anyway.