Thursday, March 29, 2012

Directory Entries and File Properties

This post is about a fairly well-known behavior of NTFS that nevertheless seems to surprise people occasionally: directory entries aren't an authoritative source of information when it comes to file properties. What I mean is that it's perfectly possible for a "dir" command to return inaccurate information about a file. When they hear about this for the first time, most people think of the case where a file is open and actively being written to (appended to, for example). The dir command returns what was the file's size at the time, but the file has since been modified, so obviously that size no longer matches the file's current size.
What I've described above is actually a pretty straightforward case, and I've yet to meet anyone who was surprised or confused by it. However, following this train of thought leads to some interesting places. If the situation above can happen frequently, how is an application expected to get the actual file size and know that it won't change? (As an example of where this might be needed, think of a copy application that's trying to figure out whether there is enough space on a volume for a file before it starts the copy…) Well, the right approach is to open the file and query its size rather than rely on the directory entry. However, the file can still be modified while the application has it open, so if knowing the file size is really important, the application should deny other handles write access while it has the file open. This is done through the share access parameter of the create operation: by not specifying FILE_SHARE_WRITE, the application prevents other handles from being opened for write, which makes sure that neither the size nor the contents of the file can change.
So let's recap. If the application must know the file size and must make sure it doesn't change, it must open a handle to the file. Without a handle, the file size information can't be guaranteed to be accurate. So why even bother returning the file size in a dir command? It turns out that in a lot of cases knowing the file sizes, without any guarantee that they won't change, is sufficient. After all, think back to all the times you've done a dir and looked at file sizes. I bet in most cases you didn't care whether the file size changed a bit later.
So now we know that enumerating the files in a directory is, by definition, not an operation that is expected to be 100% accurate, and if an app needs that kind of guarantee, it must implement its own synchronization mechanism (which might involve opening all the files without sharing write, and so on). With this in mind, how should a file system implement IRP_MJ_DIRECTORY_CONTROL with the IRP_MN_QUERY_DIRECTORY minor function code? One might be tempted to go through all the files in the directory, look up each one's on-disk information and retrieve the file attributes and file information from there, but that would certainly be rather slow. Since the information isn't expected to be accurate anyway, wouldn't it be better to keep a sort of cache of the file information? In fact, that's how most file systems implement this: the directory stores information about the files it contains in a cache and returns the information from that cache, which is much faster. One can even see this in action by comparing the time it takes to get a directory listing in CMD with the time it takes to open the same directory in Explorer. Explorer displays additional information for each file (the icon), so as it goes through a directory it needs to open each file to figure out which icon to display. Please note, though, that Explorer implements an icon cache as well (there are many posts describing this icon cache; google it).
Now that we've established that a directory caches the file information, when does the cache get updated? It would make sense to update it when the file is closed, and that's pretty much how most file systems do it. Incidentally, this explains why, if a file is constantly being written to by one thread and you open and close it from a different thread (or process), the attributes and file size shown in the directory listing get updated.
The really interesting thing happens on NTFS when a file has hard links. One might expect the file system to update all the directories containing the file to show the new information, but that's not what happens. Instead, only the link that was used to open the file is updated. It's not even about the directory that contains that link; it's that particular link. BTW, this is documented behavior: see Hard Links and Junctions and the MSDN page for the CreateHardLink function. This is what it looks like (note how the file size changes for the link I modify the file through, and how it then changes for the other link once I open the file through it, even without modifying it):
D:\templink>dir
 Volume in drive D is Data
 Volume Serial Number is 3817-6E24

 Directory of D:\templink

03/29/2012  11:46 AM    <DIR>          .
03/29/2012  11:46 AM    <DIR>          ..
03/29/2012  11:46 AM                 8 foo.txt
               1 File(s)              8 bytes
               2 Dir(s)  74,514,755,584 bytes free

D:\templink>mklink /H bar.txt foo.txt
Hardlink created for bar.txt <<===>> foo.txt

D:\templink>dir
 Volume in drive D is Data
 Volume Serial Number is 3817-6E24

 Directory of D:\templink

03/29/2012  11:47 AM    <DIR>          .
03/29/2012  11:47 AM    <DIR>          ..
03/29/2012  11:46 AM                 8 bar.txt
03/29/2012  11:46 AM                 8 foo.txt
               2 File(s)             16 bytes
               2 Dir(s)  74,514,755,584 bytes free

D:\templink>echo hello world >foo.txt

D:\templink>dir
 Volume in drive D is Data
 Volume Serial Number is 3817-6E24

 Directory of D:\templink

03/29/2012  11:47 AM    <DIR>          .
03/29/2012  11:47 AM    <DIR>          ..
03/29/2012  11:46 AM                 8 bar.txt
03/29/2012  11:47 AM                14 foo.txt
               2 File(s)             22 bytes
               2 Dir(s)  74,514,755,584 bytes free

D:\templink>type bar.txt
hello world

D:\templink>dir
 Volume in drive D is Data
 Volume Serial Number is 3817-6E24

 Directory of D:\templink

03/29/2012  11:47 AM    <DIR>          .
03/29/2012  11:47 AM    <DIR>          ..
03/29/2012  11:47 AM                14 bar.txt
03/29/2012  11:47 AM                14 foo.txt
               2 File(s)             28 bytes
               2 Dir(s)  74,514,755,584 bytes free
This is very interesting to think about from a filter's perspective. This kind of behavior, where the file system returns data without any guarantee that it will remain consistent, is fairly common, and identifying the pattern can make life a lot easier for filter developers. For example, let's say we have a filter that wants to make certain files appear in a directory. To make things harder, let's say that all the files are stored somewhere on a network with very expensive characteristics, for example somewhere in the cloud, where every byte of traffic has a real dollar cost. If the filter is written with the assumption that the directory entries must always reflect the actual file size in the cloud, then on each IRP_MN_QUERY_DIRECTORY it might query the file size from the cloud, generating traffic that costs real money. However, once the developer understands this particular contract of the file system, they can get away with caching the file properties locally and only updating them when the file is actually opened.
Another such example that is very dear to me is file names. Most minifilters implement very complicated procedures to store names and cache them in file contexts and so on without taking advantage of the fact that in most cases names are meant to be transient information (and also without taking advantage of the fact that FltMgr's name cache is doing exactly that anyway). For more on this see my previous post on Names and file systems filters.