Thursday, March 1, 2012

Win8 Bugcheck 0x1a_61946

I was going to skip posting this week, but at plugfest last week I ran into an issue that I believe might affect anyone testing with Win8, so I want to explain what the problem is and how it can be avoided.

This is what the bugcheck might look like:

1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

MEMORY_MANAGEMENT (1a)
    # Any other values for parameter 1 must be individually examined.
Arguments:
Arg1: 00061946, The subtype of the bugcheck.
Arg2: d0427500
Arg3: 000214ce
Arg4: 00000000

Debugging Details:
------------------


BUGCHECK_STR:  0x1a_61946

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  2

LAST_CONTROL_TRANSFER:  from 8217aa43 to 82151d70

STACK_TEXT:  
8989f2ec 8217aa43 00000003 07cae677 0000001a nt!RtlpBreakWithStatusInstruction
8989f33c 8217a088 00000003 80bf4138 8989f744 nt!KiBugCheckDebugBreak+0x1c
8989f718 821509da 0000001a 00061946 d0427500 nt!KeBugCheck2+0x594
8989f73c 82150911 0000001a 00061946 d0427500 nt!KiBugCheck2+0xc6
8989f75c 821f0e08 0000001a 00061946 d0427500 nt!KeBugCheckEx+0x19
8989f79c 820b9b14 8989f7c0 000000c4 d0427500 nt! ?? ::FNODOBFM::`string'+0x1dc6f
8989f800 82518ec8 d0427500 00000000 00000001 nt!MmProbeAndLockPages+0x134
8989f820 8281fb38 d0427500 00000000 00000001 nt!VerifierMmProbeAndLockPages+0x7b
8989f860 8282553d d53fc4f8 d5601e28 00000001 Ntfs!NtfsLockUserBuffer+0x4c
8989f880 828177eb 00000001 00000000 d53fc4f8 Ntfs!NtfsPrePostIrpInternal+0xa1
8989f8a0 82824d68 00000001 8dcd8e00 db5e1c9f Ntfs!NtfsPostRequest+0x21
8989f90c 8281d01e d53fc4f8 d5601e28 c00000d8 Ntfs!NtfsProcessException+0x2b4
8989f988 82500e6b 88cac018 d5601e28 d5601c6c Ntfs!NtfsFsdRead+0x376
8989f9b0 82090047 8278b0ee d5601c68 8989fa14 nt!IovCallDriver+0x2f3
8989f9c0 8278b0ee d5601e28 88ca6ba8 d421e6c0 nt!IofCallDriver+0x72
8989fa14 8278c432 8989fa38 00000000 00000000 FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x25b
8989fa50 82500e6b 88ca6ba8 d5601e28 88ccf120 FLTMGR!FltpDispatch+0xca
8989fa78 82090047 822d9c57 d5601e28 8989fadc nt!IovCallDriver+0x2f3
8989fa88 822d9c57 d5601e28 d5602000 d5601e30 nt!IofCallDriver+0x72
8989fadc 822d3b37 88ca6ba8 00000001 00000000 nt!IopSynchronousServiceTail+0x10a
8989fb74 821cabec 88ca6ba8 80002e70 00000000 nt!NtReadFile+0x3f7
8989fb74 8214f1f1 88ca6ba8 80002e70 00000000 nt!KiFastCallEntry+0x12c
8989fc10 82780b85 80002038 80002e70 00000000 nt!ZwReadFile+0x11
8989fc94 82771fa9 8989fcc8 94c0f000 00001000 mydriver!MyDriverReadFile+0x28f 

STACK_COMMAND:  kb

FOLLOWUP_IP: 
nt! ?? ::FNODOBFM::`string'+1dc6f
821f0e08 8bd0            mov     edx,eax

SYMBOL_STACK_INDEX:  5

SYMBOL_NAME:  nt! ?? ::FNODOBFM::`string'+1dc6f

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrpamp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  4f23bd74

FAILURE_BUCKET_ID:  0x1a_61946_VRF_nt!_??_::FNODOBFM::_string_+1dc6f

BUCKET_ID:  0x1a_61946_VRF_nt!_??_::FNODOBFM::_string_+1dc6f

Followup: MachineOwner
---------

What MyDriver was doing was taking an MDL that was being used for a paging read, creating a system VA for it (by calling MmGetSystemAddressForMdlSafe()), and then populating parts of it by calling ZwReadFile to read into that buffer. The problem is that when ZwReadFile is called with that address, the pages get marked dirty (in the stack above, the bugcheck fires when NTFS locks that buffer through MmProbeAndLockPages). However, the pages were supposed to be populated from a paging read, so it makes no sense for them to be marked "dirty" in the paging read path. There are a couple of reasons why this is problematic:

  • If the pages are marked dirty, MM will try to save their contents (since it believes the pages contain data that hasn't been saved to disk yet; that's what a dirty page means). When MM is running low on physical pages, it makes room by reusing some of the existing ones. Pages that aren't dirty can simply be discarded, because MM knows it can get the contents back later (through another paging read). Dirty pages, however, must first be saved somewhere (in this case the page file) and faulted back in from the page file later. This is slow (MM has to write the contents to the page file even though it doesn't need to), and it can lead to problems because no page file space has been allocated for these pages, so MM might run out of space later and bugcheck.
  • It can also be problematic for image pages (for exes, dlls), because any fixups that would normally be applied on the fly when the pages are paged in from the file system become "permanent" once the data is dirty, which might lead to crashes later on when the page is used in a different process.
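
To make the cost difference in the first point concrete, here is a toy user-mode model in C. It is only a sketch and not how MM is actually implemented: ToyPage, ToyFill, toy_fill and toy_reclaim are all made-up names, and the only thing modeled is that a clean page can be dropped for free while a dirty page costs a page file write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in Windows. */
typedef struct {
    bool dirty;   /* true: contents are believed to differ from the backing file */
} ToyPage;

/* How the page got its contents (hypothetical labels for the two cases above). */
typedef enum {
    TOY_FILL_PAGING_READ,      /* populated by a paging read: page stays clean */
    TOY_FILL_ZWREADFILE_STYLE  /* populated through a locked "user" buffer: page ends up dirty */
} ToyFill;

typedef struct {
    size_t discarded;        /* clean pages dropped without any I/O */
    size_t pagefile_writes;  /* dirty pages that had to be written out first */
} ToyReclaimStats;

/* Record whether filling the page this way leaves it dirty. */
static void toy_fill(ToyPage *page, ToyFill how)
{
    assert(page != NULL);
    page->dirty = (how == TOY_FILL_ZWREADFILE_STYLE);
}

/* Reclaim every page in the array and count what each one cost. */
static ToyReclaimStats toy_reclaim(const ToyPage *pages, size_t count)
{
    ToyReclaimStats stats = {0, 0};
    assert(pages != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (pages[i].dirty) {
            stats.pagefile_writes++;  /* must be saved before the page is reused */
        } else {
            stats.discarded++;        /* can be re-read from the file later */
        }
    }
    return stats;
}
```

In this model, every page filled the "ZwReadFile way" shows up in pagefile_writes when memory gets tight, even though its contents could have been re-read from the file.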

Anyway, this is clearly bad so let's discuss what a driver should do about this:

  • If a file system filter (or any other type of driver) has an MDL and wants to populate it with data from a file or from multiple files (i.e. it needs to issue one or more reads to get the data), then it must issue an IRP_MJ_READ request using the same MDL that was passed in (please note that it might be necessary to split the MDL by using IoBuildPartialMdl()). Minifilters can use FltReadFileEx() to read data directly into the MDL (or MDLs). Legacy filters and other types of drivers must NOT use ZwReadFile; instead they must issue an IRP_MJ_READ with the partial MDLs.
  • If a file system (or file system filter) wants to populate the MDL or parts of the MDL by writing into it directly, it can call MmGetSystemAddressForMdlSafe() to get a system VA and then write directly into that buffer. One example is an encryption filter that writes the decrypted data directly into the user's buffer; another is a filter that completely owns a virtual file and supplies all of its contents without reading them from disk at all. However, the filter must not create a new MDL for any part of that buffer. If a filter must use an MDL for all or part of the buffer, it should build a partial MDL.
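
When the data comes from several reads, the part that is easy to get wrong is the bookkeeping for which slice of the original MDL each read should fill. Below is a minimal sketch of just that arithmetic, written as plain user-mode C so it can be compiled anywhere. ReadPiece and plan_pieces are hypothetical names, and the sketch ignores any alignment requirements the underlying file system might impose. In a real driver, each piece would be turned into a partial MDL with IoBuildPartialMdl() (using the original MDL's starting virtual address plus mdl_offset) and then handed to FltReadFileEx() or to an IRP_MJ_READ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one read that fills a slice of the original MDL. */
typedef struct {
    size_t mdl_offset;  /* byte offset of the slice from the start of the MDL's buffer */
    size_t length;      /* number of bytes this read must supply */
} ReadPiece;

/*
 * Walk the sources in order and assign each one the next slice of a buffer
 * that is total_len bytes long. A source longer than the remaining space is
 * clipped, and zero-length sources are skipped. Returns the number of pieces
 * written to out (never more than out_cap).
 */
static size_t plan_pieces(size_t total_len,
                          const size_t *source_lens, size_t source_count,
                          ReadPiece *out, size_t out_cap)
{
    size_t offset = 0;
    size_t n = 0;

    assert(source_lens != NULL && out != NULL);

    for (size_t i = 0; i < source_count && offset < total_len && n < out_cap; i++) {
        size_t len = source_lens[i];
        if (len > total_len - offset) {
            len = total_len - offset;   /* clip to what is left of the buffer */
        }
        if (len == 0) {
            continue;                   /* nothing to read from this source */
        }
        /* In a driver: build a partial MDL for [offset, offset + len) here. */
        out[n].mdl_offset = offset;
        out[n].length = len;
        n++;
        offset += len;
    }
    return n;
}
```

The point of the design is that every piece refers back to the original MDL by offset, so no new MDL is ever created for the buffer itself; only partial MDLs are derived from the one that was passed in.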

Please note that this has always been the correct way of doing things, but developers have been taking the easy way out and getting lucky until now. Checked builds of previous OS releases asserted when this was detected. One additional benefit of doing things the right way is that performance should be slightly better as well.

To see what this might look like for a user, take a look at this VirtualBox ticket, which illustrates the issue: https://www.virtualbox.org/ticket/10290