Thursday, February 2, 2012

Problems with STATUS_REPARSE - Part I

I've seen a lot of filters lately using STATUS_REPARSE to implement some of their functionality and I wanted to talk a bit about some of the problems that using STATUS_REPARSE might cause.

I've mentioned STATUS_REPARSE a couple of times before but I wanted to quickly go over some of the details, just as a refresher. The basic idea is that during file creation (IRP_MJ_CREATE) a filter can return STATUS_REPARSE and change the FILE_OBJECT->FileName to point to a different path, in which case the status gets propagated back to the Object Manager (OB), which will retry the create with the new path. Because the new path is resolved at OB level, the new file path must include the device name (either in "\Device\HarddiskVolume1" form or in "\??\C:" form, both of which are valid at OB level). There is even a minifilter sample in the latest WDK (SimRep) that shows a pretty easy way to implement a filter that redirects an open to a different file using STATUS_REPARSE. There are multiple ways in which this might be useful for a filter; in particular, I've seen this used in data deduplication filters and virtualization filters.

There is another way to use STATUS_REPARSE, which is generally useful for a file system filter that needs to be notified by the file system when someone is trying to access a file. It is possible to tag a specific file on the file system so that when someone tries to open it the file system itself returns STATUS_REPARSE. The filter can then watch for STATUS_REPARSE in postCreate, check whether it owns the tag and, if so, perform whatever operations it wants on the file. This is a very common approach for hierarchical storage management (HSM) solutions.

This technique of tagging a file can also be used by deduplication filters. For example, a deduplication filter might find that FileA and FileB are identical and copy the contents to a special folder (let's say to \SpecialFolder\UniqueFileA.bin). It would then remove all the contents from FileA and FileB, tag them, and store in an internal database the fact that the paths for FileA and FileB should be redirected to "\SpecialFolder\UniqueFileA.bin". After that the filter might simply wait for the file system to return STATUS_REPARSE with the right tag and then, based on the file path, update the FILE_OBJECT->FileName to point to some file under "\SpecialFolder". Of course, the filter might not need a tag in the file system at all, and it might instead opt to check the paths for all the files in preCreate and reparse them as appropriate (actually, as far as I can tell, this seems to be the more popular approach).
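
The bookkeeping described above can be sketched as a small lookup. This is a user-mode model in plain C, not kernel code, and it only illustrates the idea under a few assumptions: the names RedirectEntry, g_redirects and build_reparse_target are hypothetical, the "\Docs\FileA" and "\Docs\FileB" paths are made up for the example, and a real filter would compare names case-insensitively and work with UNICODE_STRINGs rather than C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the filter's internal database. */
typedef struct {
    const char *original; /* volume-relative path the caller asked for */
    const char *target;   /* volume-relative path of the shared copy   */
} RedirectEntry;

/* Illustrative entries only: both deduplicated files map to one target. */
static const RedirectEntry g_redirects[] = {
    { "\\Docs\\FileA", "\\SpecialFolder\\UniqueFileA.bin" },
    { "\\Docs\\FileB", "\\SpecialFolder\\UniqueFileA.bin" },
};

/*
 * Looks up `path` in the redirect table. If an entry exists, writes the
 * new name (device name + target path) into `out` and returns 1.
 * The device prefix is required because the reparsed name is resolved
 * at OB level. Returns 0 if no redirect applies or `out` is too small.
 * Simplification: exact, case-sensitive match.
 */
int build_reparse_target(const char *device, const char *path,
                         char *out, size_t out_size)
{
    assert(device != NULL && path != NULL && out != NULL);

    for (size_t i = 0; i < sizeof g_redirects / sizeof g_redirects[0]; i++) {
        if (strcmp(path, g_redirects[i].original) == 0) {
            int n = snprintf(out, out_size, "%s%s",
                             device, g_redirects[i].target);
            return n >= 0 && (size_t)n < out_size;
        }
    }
    return 0; /* not a deduplicated file: let the create through */
}
```

Calling build_reparse_target("\\Device\\HarddiskVolume1", "\\Docs\\FileA", buf, sizeof buf) fills buf with the full OB-level name of the shared file. A direct open of "\SpecialFolder\UniqueFileA.bin" finds no entry and returns 0, which is also what keeps the create that arrives after the reparse from being redirected a second time.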

Virtualization filters might also use this to redirect opens to a different file than the one the application or the user thinks it's opening. This is fairly useful for application virtualization filters: an application might try to open a specific configuration file in a certain location ("C:\program files\app..") but, since the app is virtualized, the file doesn't exist at that location, and so a filter will return STATUS_REPARSE to open the file at its actual location.

So now that we have covered some of the scenarios, let's move on to some of the problems these approaches might have.

  • reparse tracking is impossible before Vista - by this I mean that it's impossible for a filter to know whether an IRP_MJ_CREATE request is the result of a STATUS_REPARSE (by the same or a different filter) or not. In other words, in the case of the deduplication filter above, when the filter sees a create for "\SpecialFolder\UniqueFileA.bin" it can't tell if the IRP_MJ_CREATE was originally sent to FileA or FileB or even if it was a direct open to \SpecialFolder\UniqueFileA.bin. Consider what happens when the user wants to do something like rename FileA. All the filter sees is a rename arrive on the FILE_OBJECT opened for \SpecialFolder\UniqueFileA.bin but it doesn't know whether this rename means FileA should be renamed or FileB should be renamed. The same applies to other namespace operations like deletes and also to all operations that modify the file contents. Please note that with the introduction of ECPs in Vista it is possible to track whether a file has been reparsed and where it has been reparsed from (if a filter sets an ECP before returning STATUS_REPARSE the ECP will be attached to IRP_MJ_CREATE on the new path).
  • split IOs - in the scenario described above, even if the filter knows what the original file that the user tried to open was (FileA or FileB), we should discuss what a filter can do when it sees a request to modify the file data. Let's say the filter gets a request to truncate the file. Clearly the request can't be allowed to happen to UniqueFileA.bin because it would then impact both FileA and FileB, but the request was only for one of them and so the other one should remain unchanged. For a deduplication filter this is a time when the deduplication should be broken (and so the UniqueFileA.bin should be copied to the original file (either FileA or FileB) and the filter should no longer reparse from that file to UniqueFileA.bin). However, this can't be done at the time when the modifying operation happens and it must be done before the file is opened, and so the only thing the filter can rely on is the information available in IRP_MJ_CREATE (such as requested access for the file) to guess whether the handle will be used for modifying the file or not. That is not very reliable but let's assume that the filter somehow can figure out exactly when to break the deduplication. There is still a problem if there are already opened handles to the file because they now point to a different file. In other words, if an application opens FileA for read-only and so the filter correctly decides that it can reparse it to UniqueFileA.bin (Handle1) and then later another application (or even the same app, doesn't matter) opens another handle for FileA for write and so the filter breaks the dedup (and so now we have Handle2), the problem is that any modification performed on Handle2 will not be seen by Handle1 because they point to different files. This is what I call Split IOs. 
    In other words, whenever a file name changes the actual file it points to, it is possible to get this type of problem, and the only way to avoid this with a filter that returns STATUS_REPARSE is to make sure that the underlying mapping to the file on a file system never changes. Also, I'd like to point out this is a pretty common scenario. During my university days I used to have homework along the lines of "implement a producer-consumer solution where there are 10 consumer processes and 1 producer process and they all use a file for communication" and naturally, if I had such a filter on my system, all 10 consumer processes would point to UniqueFileA.bin and the producer would point to FileA, and so the consumers would never see the producer data… Also, please note that once there are handles that are supposed to point to the same file but point to different files, it's not just reads and writes that are going to be out of sync: locks, oplocks and many other file system semantics might be broken as well. This is a very serious issue because it can lead to data corruption, data loss or security issues (what if an antivirus thinks it scans FileA but instead scans UniqueFileA.bin, finds it clean and allows the IRP_MJ_CREATE to continue, but then the actual request ends up on a different file that is malicious?).
  • FileIDs are more complicated - please see my previous post on STATUS_REPARSE and fileIDs.
  • Performance is generally not great - in most cases using STATUS_REPARSE requires dealing with names, and querying the name in preCreate is a big performance hit. Also, looking up names in internal databases is generally done using normalized names, which are an even bigger performance hit (so normalized names in preCreate are the worst in terms of performance). Finally, if the architecture of the product requires getting a normalized name in preCreate, then because filters that need to run on XP or Server 2003 can't tell when a reparse has happened, they often end up getting the normalized name twice: in our example, once for FileA, when they determine they must reparse to "\SpecialFolder\UniqueFileA.bin", and then again for the subsequent IRP_MJ_CREATE for \SpecialFolder\UniqueFileA.bin, where the filter determines that no reparse is necessary but still has to query the name to make that determination. This can make the performance hit for this type of filter pretty significant.
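
The split IO problem from the list above can be reproduced with a toy model. The following is a minimal user-mode sketch in plain C, not filter code: SimFile, SimHandle, SimVolume and the sim_* functions are all hypothetical names, files are modeled as small in-memory buffers, and the "filter" is assumed to break the deduplication on the first open of FileA that requests write access.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILE_CAPACITY 32

/* A "file" is just a small in-memory buffer in this model. */
typedef struct { char data[FILE_CAPACITY]; } SimFile;

/* A "handle" remembers which file the create ended up on. */
typedef struct { SimFile *file; } SimHandle;

typedef struct {
    SimFile shared;       /* stands in for \SpecialFolder\UniqueFileA.bin */
    SimFile private_copy; /* FileA's own data once dedup is broken        */
    int dedup_broken;
} SimVolume;

void sim_volume_init(SimVolume *v, const char *contents)
{
    assert(v != NULL && contents != NULL);
    assert(strlen(contents) < FILE_CAPACITY);
    memset(v, 0, sizeof *v);
    strcpy(v->shared.data, contents);
}

/*
 * Models a create for FileA. A read-only open is "reparsed" to the
 * shared file. The first open for write breaks the deduplication by
 * copying the shared data back; from then on opens of FileA land on
 * the private copy.
 */
SimHandle sim_open_file_a(SimVolume *v, int want_write)
{
    assert(v != NULL);
    if (want_write && !v->dedup_broken) {
        v->private_copy = v->shared;
        v->dedup_broken = 1;
    }
    SimHandle h = { v->dedup_broken ? &v->private_copy : &v->shared };
    return h;
}

void sim_write(SimHandle h, const char *contents)
{
    assert(h.file != NULL && contents != NULL);
    assert(strlen(contents) < FILE_CAPACITY);
    strcpy(h.file->data, contents);
}

const char *sim_read(SimHandle h)
{
    assert(h.file != NULL);
    return h.file->data;
}
```

If Handle1 comes from sim_open_file_a(&v, 0) and Handle2 from a later sim_open_file_a(&v, 1), a sim_write on Handle2 changes only the private copy, so sim_read on Handle1 keeps returning the old contents even though both handles were opened for "FileA". Nothing in the model moves Handle1 over when the deduplication is broken, which is exactly the gap a real filter would have to close.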

This is pretty dense material so I'll leave the other issues I was going to talk about for next week.

4 comments:

  1. Hi, Alex. Thank you for the article. Currently, I am developing a legacy filter that uses the reparse technique to forward the original FS operations to a special folder (the storage).
    For me, there is also an important question: when should the copy-on-write operation, which copies the original file data to a new file in the storage, be performed? I see two ways: 1) in the create operation, when the file is opened with modifying access, or 2) on demand, when the modifying operation (a write, for example) is actually performed. I chose the second way, on demand, because of performance issues. What do you think about that?

    1. Hi Artem,

      This is what I was talking about when I mentioned breaking the deduplication. The problem with doing it in preCreate is that there are many applications that open a file with write access in case they need to modify it (or in some cases the developers are just careless and open for write without needing to), which means that you'll end up copying a lot more files than necessary. I agree with your approach of copying the file on demand, but that is generally pretty complicated. Once you've copied the file you must redirect all operations to the new file (the one you've copied the original file to), which means you must be in a position to do that. For some operations (consider oplocks) it's going to be hard to move them to the new file (if you need to). Also, in either situation, you'll have to deal with the split-IO problem I mentioned above (unless you track all the handles open against the original file and, when you decide you must copy the file, you move ALL the handles that are supposed to be moved and not only the one that's doing the write).

      Ultimately I'd say that for your decision it matters whether you must be more conservative (both in terms of performance and space), in which case you'd want to copy only when the modification happens (or will happen for sure), or whether you can afford to be less conservative, in which case you can just copy in preCreate. My choice is to copy on actual modification (and not in preCreate) but, as I said before, that's more complicated to get right.

  2. Alex,
    Another excellent post, thanks.

    A quick note for the record - People doing the "slam a new name into fileObject->FileName and return STATUS_REPARSE", should also do a couple of extra things:

    - Use the IO Manager function to do this to save eventual Verifier issues.
    - Set Data->IoStatus.Information to IO_REPARSE. The latter is important since upper filters will (should - I met a couple at the latest plugfest, including my own) be looking at it to see their tag.

    1. Thanks Rod!

      For the record, the IO manager function Rod mentioned is IoReplaceFileObjectName.
