Thursday, April 19, 2012

The Standby List and Storage Overprovisioning

This post is about an interesting issue I spent quite a bit of time debugging. As is often the case with very complex systems, I knew most of the bits of information related to the issue, but I hadn't quite put everything together, so this scenario still surprised me.
It all started with me playing with file sizes and directory entries, which involved copying a large number of files to a VHD. Since I didn't have a lot of space for the VHD in my VM, I decided to make all the files sparse so that they wouldn't take any space on the VHD, which was rather small (2GB). I created about 20GB worth of sparse files on it and all was well. I had actually been using this setup for a while, but when doing this in a Win8 VM I quickly ran into problems: the VHD ran out of free space. I knew that shouldn't be possible, since all my files were sparse and I didn't expect to have so many directory entries that the file system metadata used most of the 2GB of the VHD. So I figured this would be an interesting investigation.
The first thing I noticed was that if I dismounted the VHD and mounted it again, the VHD looked pretty much the way I expected: all the files were there and there were roughly 2GB worth of free space. So it seemed to be a transient situation. I spent a bit of time looking at various NTFS counters (fsutil is a good tool for that), looking at ProcMon logs and poking the file system in various ways, but I couldn't find anything. I was about to embark on the next step, which was to figure out which blocks on the volume were owned by which file so that I could see why those files weren't sparse, but I was lucky enough to discover by chance that exactly the same behavior happens on a Win7 machine when running Microsoft Security Essentials (MSE). This was quite helpful because I stopped suspecting some new behavior in NTFS in Win8 and could instead focus on MSE. Other AV products I had running in my VMs didn't seem to have the same effect, so this was particular to MSE. One thing I knew was rather unique to MSE (at least it was at some point in the past) was that it uses mapped files (also known as sections in the NT world) to read file data, so I started wondering if that had anything to do with it.
So using fsutil I created a new 1GB file and made it sparse. Then I opened it with FileTest, created a file mapping and mapped a view for the whole file. Guess what: NTFS reserved space for the range I was reading (naturally I didn’t expect MSE to change the files it was scanning, so I was just reading the files). This is necessary because if something writes to the file through the section, NTFS must be able to save that data to disk. When working with mapped files NTFS can't know in advance what data will be written (if any), so it does a lot of preparation to be able to accommodate the scenario where everything is written. So it was pretty clear what was going on: the fact that MSE created sections for my sparse files made NTFS reserve blocks for them. The one last thing I had to figure out was why MSE held on to the sections for so long. My files were pretty small (less than 1 MB on average), so it took quite a lot of them to get NTFS to run out of free space. Initially I suspected a section leak in MSE, but while I was playing with FileTest I noticed that even after I closed FileTest (so I knew for sure that all the file handles, memory mapping handles and mapped views were released) the blocks still weren't returned to the free space pool. At this point it hit me that it must have been MM (the memory manager) that kept the section open, and indeed using RamMap I could see that was the case.
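For readers who would rather script the experiment than click through FileTest, here is a minimal sketch of the same access pattern in Python. It assumes that Python's standard mmap module is a fair stand-in for what FileTest does (on Windows it creates a file mapping and maps a view under the hood). The function name read_through_mapping and the chunk size are made up for illustration, and I have not verified that this triggers exactly the same reservation as the FileTest steps.

```python
import mmap
import os


def read_through_mapping(path, chunk=1 << 20):
    """Map the whole file read-only, touch every byte, return bytes read."""
    total = 0
    # "r+b" asks for read/write access on the file handle, while the
    # mapping itself is read-only (mirrors the repro steps in this post).
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        # Length 0 means "map the entire file".
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for offset in range(0, size, chunk):
                # Slicing copies the bytes out of the view, which forces
                # the pages backing this range to be read in.
                total += len(view[offset:offset + chunk])
    return total
```

Calling read_through_mapping(r"C:\TestFile.bin") on a sparse file and then comparing the volume's free space before and after should show whether the reservation happens on your system.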
Here is a quick recap of what the standby list is. When a file is used for memory mapped IO and its pages are no longer in use (the view is unmapped, the section or the file handle is closed, or even the whole process is terminated), the pages that are backed by the file are moved to the standby list. From there they will either be moved to the free list or the zero list (depending on whether there is memory pressure in the system and who's asking for what kind of pages), or they will be reused if the same file is used again for memory mapped or cached IO. This last behavior is pretty much a file cache (not to be confused with the cache manager, which has quite a different role). In my case, since there was no memory pressure, the pages remained on the standby list for quite a while, so NTFS did not see the section being closed and kept the reserved blocks.
Please note that this is not unique to sparse files: any kind of file that is overprovisioned (such as compressed files) has the same semantics. So it is quite easy to run out of space on a volume where the total logical size of all the files exceeds the volume's capacity, even without writing anything to the volume.
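To put numbers on that, here is a rough back-of-the-envelope sketch. It is a deliberately simplified model: it assumes the worst case is a reservation for each file's full logical size, and it ignores file system metadata and cluster rounding. The function name worst_case_shortfall and the exact file count are illustrative, chosen as round figures matching the 2GB VHD and roughly 20GB of sparse files described above.

```python
GB = 1 << 30
MB = 1 << 20


def worst_case_shortfall(capacity, files):
    """Bytes the volume would be short if every file were fully reserved.

    files is an iterable of (logical_size, allocated_size) pairs in bytes.
    Returns 0 when the worst case still fits on the volume.
    """
    files = list(files)
    allocated = sum(alloc for _, alloc in files)
    # Space that is not allocated yet but could be reserved for each file.
    reservable = sum(max(logical - alloc, 0) for logical, alloc in files)
    free = capacity - allocated
    return max(reservable - free, 0)


# 20480 fully sparse 1 MB files (20GB logical, nothing allocated) on 2GB.
sparse_files = [(1 * MB, 0)] * 20480
print(worst_case_shortfall(2 * GB, sparse_files) // GB)  # 18
```

In this model the volume runs dry once roughly a tenth of the files have been mapped and are still held on the standby list.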
Now, since this is a file systems and filters development blog, I should mention that if you are a file system or a filter and you do any work with compressed files, sparse files or the like, you can actually tell MM to close a section using MmForceSectionClosed().
Update: I wanted to add some steps on how to reproduce this problem, in case you're interested.
  1. Create an empty 1GB file:
    C:\>fsutil file createnew C:\TestFile.bin 1000000000
    File C:\TestFile.bin is created
    
  2. Make the file sparse:
    C:\>fsutil sparse setflag C:\TestFile.bin
    C:\>fsutil sparse setrange C:\TestFile.bin 0 1000000000
  3. Open the file in FileTest.exe. Make sure to request GENERIC_WRITE access.
  4. Create a read-only file mapping, map a view and read the whole file.
  5. Unmap the view, close the section handle and then the file handle, and then close FileTest.exe (we could have closed it directly as well).
  6. You now have 1 GB less free space on C:
    C:\>dir
     Volume in drive C has no label.
     Volume Serial Number is 10FA-5C1D
    
     Directory of C:\
    
    06/10/2009  02:42 PM                24 autoexec.bat
    06/10/2009  02:42 PM                10 config.sys
    03/02/2010  06:31 PM    <DIR>          Far
    07/13/2009  07:37 PM    <DIR>          PerfLogs
    08/11/2011  11:13 AM    <DIR>          Perl
    11/05/2010  09:34 AM    <DIR>          Program Files
    04/20/2012  10:12 AM     1,000,000,000 TestFile.bin
    11/04/2009  12:57 PM    <DIR>          Users
    11/05/2010  09:40 AM    <DIR>          Windows
                   3 File(s)  1,000,000,034 bytes
                   6 Dir(s)  39,859,396,608 bytes free
    
  7. Use RamMap and see all the pages for the file on the standby list.
  8. Empty the standby list (click on Empty->Empty Standby List).
  9. Finally check the free space on C: again:
    C:\>dir
     Volume in drive C has no label.
     Volume Serial Number is 10FA-5C1D
    
     Directory of C:\
    
    06/10/2009  02:42 PM                24 autoexec.bat
    06/10/2009  02:42 PM                10 config.sys
    03/02/2010  06:31 PM    <DIR>          Far
    07/13/2009  07:37 PM    <DIR>          PerfLogs
    08/11/2011  11:13 AM    <DIR>          Perl
    11/05/2010  09:34 AM    <DIR>          Program Files
    04/20/2012  10:12 AM     1,000,000,000 TestFile.bin
    11/04/2009  12:57 PM    <DIR>          Users
    11/05/2010  09:40 AM    <DIR>          Windows
                   3 File(s)  1,000,000,034 bytes
                   6 Dir(s)  41,361,416,192 bytes free
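
If you want to compare the two listings without doing the subtraction by hand, a small sketch like the following works. The helper names bytes_free and current_free are made up for illustration; the first only parses the summary line that dir prints, and the second is an alternative that asks the OS directly through Python's standard shutil.disk_usage.

```python
import re
import shutil


def bytes_free(dir_output):
    """Extract the 'bytes free' figure from the summary line of dir output."""
    match = re.search(r"([\d,]+) bytes free", dir_output)
    if match is None:
        raise ValueError("no 'bytes free' line found")
    return int(match.group(1).replace(",", ""))


def current_free(path):
    """Free bytes on the volume holding path, without parsing dir output."""
    return shutil.disk_usage(path).free


# Summary lines copied from the two listings above.
before = bytes_free("6 Dir(s)  39,859,396,608 bytes free")
after = bytes_free("6 Dir(s)  41,361,416,192 bytes free")
print(after - before)  # 1502019584
```

The difference is about 1.5 GB, somewhat more than the 1,000,000,000 bytes of the test file. The listings alone don't say where the rest came from, so treat the number as indicative and not as an exact measurement of the reservation.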