VDR Backup Failures

I’ve spent the last month dealing with VDR backup failures. Sometime around vSphere 4.1, VDR started failing on my domain controllers, then my SQL servers, then all Windows servers in general. The Linux servers and any other VMs backing up without application quiescing never skipped a beat. Here’s the dreaded error message from the VDR appliance:

Failed to create snapshot for specops-cmd, error -3960 ( cannot quiesce virtual machine)

or from the vCenter server itself:

Cannot create a quiesced snapshot because the create snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine.

DAMNED LIES! That wasn’t the real problem.

After a whole lot of testing, about 12 emails, 2 WebEx sessions, and 5 phone calls to support, I narrowed the problem down to several separate causes. Any one of the following will cause this error message on a Server 2008 R2 VM backing up with the VMware Tools VSS driver.

  1. Independent Disks
  2. iSCSI Connections with MS iSCSI
  3. Running an older version of VMware Tools. Note that vCenter itself will tell you the tools are ‘OK’ unless you’re a major build behind. My systems were running b257589, and most of the VDR issues were resolved when I did an in-place upgrade to b299420.
  4. “Missing” drives in Disk Management. I had one system that kept adding ‘missing drives’. The key was that on boot, I’d get the message “VMWare Customization in Progress”. Apparently it never cleared sysprep out of the boot list. See the fix for that at my wiki here: vSphere on JP Wiki. Search for ‘customization’.
  5. There is a problem with the way VMware Tools requests a VSS snapshot on systems running Active Directory (the NTDS writer). The writer will show “non-retryable error” after a backup attempt. This is a known issue slated to be fixed in the next release of VMware Tools. For now, create a text file called vmbackup.conf in “C:\programdata\vmware\vmware tools” containing one line: “NTDS” without the quotes. This takes the NTDS writer out of the snapshot, but it’s better than setting Disk.EnableUUID=false because the rest of your system will still be application-quiesced.
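
If you have more than a couple of domain controllers to fix, the workaround in item 5 is easy to script. Here is a minimal Python sketch, assuming Python is available on the guest. The function name `write_vmbackup_conf` and its parameters are my own invention for illustration; only the folder, the file name, and the single “NTDS” line come from the steps above.

```python
from pathlib import Path

# Folder named in item 5 above; adjust it if your guest keeps ProgramData elsewhere.
DEFAULT_TOOLS_DIR = Path(r"C:\programdata\vmware\vmware tools")


def write_vmbackup_conf(tools_dir=DEFAULT_TOOLS_DIR, writers=("NTDS",)):
    """Create vmbackup.conf in tools_dir with one VSS writer name per line.

    Hypothetical helper: the name and signature are illustrative only.
    Returns the path of the file that was written.
    """
    conf_path = Path(tools_dir) / "vmbackup.conf"
    # One writer name per line, no quotes around the name.
    content = "".join(f"{name}\n" for name in writers)
    conf_path.write_text(content, encoding="ascii")
    return conf_path
```

Calling `write_vmbackup_conf()` with no arguments on the affected VM writes the file to the default folder. You will probably need an elevated prompt for that, and the folder must already exist, which it should if VMware Tools is installed.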

I hope this helps someone avoid the work of systematically ruling out the other 20+ suspected causes :).

7 thoughts on “VDR Backup Failures”

  1. Cheers for the post, and I know your pain. I’ve been investigating this and getting nowhere for weeks now. My problem is I’ve got Backup Exec 2010 R2 in the mix and am getting failed / corrupt vmdk files, updates not installing, and (more recently) the ‘quiesced snapshot’ issue.

    Anyway, will check out some of the things you’ve posted.
    Just wanted to say ‘Cheers for your research and info’.

  2. THANK YOU!

    I had this problem for a while now and resorted to removing the “missing” drive and running the VDR backup manually, which seemed to work.

    I added the conf file to the folder, and will see if the automated snapshot works.

  3. Hey windowsmasher,
    Can you give me the link to the page that covers the missing disk in Disk Management issue, posted as number 4 in the list above? Thanks.
