I recently had an opportunity to assist on a case involving stop errors (blue screens) in a Virtual Machine (VM) running on a Hyper-V Failover Cluster. I was told that two attempts to increase the VM’s memory had not helped (I’ll explain later on why the amount of memory assigned to the VM was suspect). The only thing I could initially get my hands on was a memory dump file, and I would like to take you through how one command in WinDbg can give you clues about the cause of the issue and how it was resolved.
Quick Memory Dump Analysis
I started by looking at the Kernel Memory Dump generated during the most recent crash, using the Debugging Tools for Windows (WinDbg). WinDbg can be downloaded at https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools. I’m not a regular debugger, but I made some interesting discoveries as soon as I opened the dump file.
The following are noticeable from the image above:
- User address space may not be available as this is a kernel dump
- Symbols and other information that may be useful such as product build
- Bugcheck analysis (in case of a crash) with some good guidance on next steps
Let us get the issue of assigned memory out of the way before we look at other data. I used the !mem command from the MEX Debugging Extension for WinDbg (https://www.microsoft.com/en-us/download/details.aspx?id=53304) to dump memory information. As can be seen in the image below, available memory is definitely low, which explains why assigned memory was increased (an approach that was later dropped as it did not help in this case).
The !vm command provides similar output if you don’t use the MEX extension.
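If you would rather capture this output without opening the WinDbg GUI, the console kernel debugger (kd.exe) that ships with the Debugging Tools can run the same commands in batch. The sketch below is a minimal example, not a definitive script: it assumes kd.exe is on your PATH, the dump path and local symbol cache folder are placeholders, and the helper names are made up for illustration. It relies only on kd's -z (open a dump file), -y (symbol path) and -c (run commands) options.

```python
import subprocess

# Public Microsoft symbol server; the local cache folder is an assumption.
SYMBOL_PATH = r"srv*C:\symbols*https://msdl.microsoft.com/download/symbols"


def build_kd_command(dump_path, commands, symbol_path=SYMBOL_PATH):
    """Return the argument list that runs `commands` against a dump file.

    The trailing `q` makes kd quit after the last command instead of
    waiting for more input.
    """
    script = "; ".join(list(commands) + ["q"])
    return ["kd.exe", "-z", dump_path, "-y", symbol_path, "-c", script]


def run_kd(dump_path, commands):
    """Run kd.exe and return everything it printed (needs kd.exe on PATH)."""
    result = subprocess.run(
        build_kd_command(dump_path, commands),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Example (placeholder path, point it at your own kernel memory dump):
#   print(run_kd(r"C:\dumps\MEMORY.DMP", ["!vm"]))
```

Any other debugger command can be passed in the list in the same way, which makes it easy to save the output next to the case notes.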
I ran !analyze -v to get detailed debugging information, as WinDbg suggests.
The output above shows that this was a Bug Check 0x7A: KERNEL_DATA_INPAGE_ERROR (https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x7a--kernel-data-inpage-error). More information can also be found in the WinDbg help file if you are unable to access the Internet. Additional debug text states that the Windows Memory Manager detected corruption of a pagefile page while performing an in-page operation: the data read from storage does not match the original data written. This indicates that the data was corrupted by the storage stack or by device hardware. Be careful with that conclusion, though, since this is a VM and does not have direct access to hardware!
This explanation is in line with what I picked up in the stack:
The article “How to determine the appropriate page file size for 64-bit versions of Windows” provides a nice summary and guidance on paging files.
Let’s take a brief look at the !analyze window above (Bugcheck Analysis). Here it can be seen that the BIOS date is 05/23/2012. This is concerning as system BIOS should be kept up to date. This also gave me a clue that we could be dealing with outdated Integration Services, which was the case.
Hyper-V Integration Services allow a virtual machine to communicate with the Hyper-V host. Many of these services are conveniences, such as guest file copy, while others are important to the virtual machine’s ability to function correctly.
What’s the cause of this unexpected behavior?
You’ve guessed it! Outdated Integration Services. Here’s what happened:
- The VM was configured with a startup RAM of 4 GB
- Guest physical memory dropped when the VM did not need it (memory was reclaimed by Hyper-V)
- A later attempt by the VM to reclaim this RAM when it was needed failed, because the VM had difficulty communicating with the host through the Dynamic Memory Integration Service
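To make the sequence above concrete, here is a toy model in Python. It is purely illustrative: the class, its fields, and every number other than the 4 GB startup RAM are made up, and it does not call or imitate any real Hyper-V API. It only shows why a guest that can give memory back but cannot ask for it again ends up under memory pressure.

```python
class ToyVm:
    """Hypothetical model of a guest using Dynamic Memory (not a Hyper-V API)."""

    def __init__(self, startup_mb, integration_services_ok):
        self.assigned_mb = startup_mb
        self.integration_services_ok = integration_services_ok

    def host_reclaims(self, idle_mb):
        """The host takes back memory the guest is not using."""
        self.assigned_mb -= idle_mb

    def request_more(self, needed_mb):
        """The guest asks for memory back; return True if the request succeeds.

        In this model the request only reaches the host when the
        integration services channel works.
        """
        if not self.integration_services_ok:
            return False
        self.assigned_mb += needed_mb
        return True


def simulate(integration_services_ok):
    """Replay the three steps and return the memory the guest ends up with."""
    vm = ToyVm(startup_mb=4096, integration_services_ok=integration_services_ok)
    vm.host_reclaims(idle_mb=2048)   # illustrative amount
    vm.request_more(needed_mb=2048)  # demand returns later
    return vm.assigned_mb
```

With a working channel the guest gets back to its 4096 MB; with a broken one it stays at 2048 MB even though it needs more, which is the kind of shortage the memory dump showed.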
Upgrading Integration Services resolved the issue. After monitoring for some time, the VM was stable and there was no more memory pressure – it was able to reclaim memory as it needed it. Here is an example of what it looked like in Process Explorer’s System Information View.
This document also states that Integration Services must be upgraded to the latest version and that the guest operating system (VM) must support Dynamic Memory in order for this feature to function properly.
I demonstrated how one command in WinDbg (!analyze -v) can give you some clues when dealing with system crashes. In this case, the cause was outdated Integration Services (the BIOS date was the clue). I would also like to highlight the importance of monitoring. There is a lot of information on the Internet on ensuring smooth and reliable operation of Hyper-V hosts and VMs.
If WinDbg and a memory dump were all you had, this would be one of the ways to go. Grab a free copy and have it ready on your workstation if you don’t already have it installed :)
Till next time…