Agent Best Practices for Host-based Backups

Virtualization is a dominant trend across the IT landscape—and for good reason. However, adoption of virtualization has occurred in concert with the expansion in seamless, automated backup. In this world, it is crucial to understand the role of agents relative to host-based backup.

Author(s): Jason Buffington

Published: August 27, 2012

Overview

Organizations seek enhanced economics through virtualization, but they will do nothing to risk the safety of their data. In fact, a recent ESG survey showed that improving data backup and recovery and expanding server virtualization were both top-ranked priorities (see Figure 1).[1]

Figure 1. Top Ten IT Priorities for 2012

Indeed, the tie-in with server virtualization is notable: Virtualization solves so many problems, particularly from an IT server and service delivery perspective. However, a traditional backup methodology that doesn’t understand the unique nature of how VM sprawl affects storage consumption (with a great deal of duplication across virtualized operating systems) doesn’t adapt well to the needs of a virtualized infrastructure. So, although virtualization is a good thing, it can exacerbate problems in other aspects of IT, such as data protection.

In the world of virtualization, there have been three predominant ways to perform backup:

  • Guest-based, in which agents inside each virtual machine act as if the machine were physical.
  • Array-based, in which the storage solution is responsible for the copies, often through snapshotting.
  • Host-based, in which the hypervisor enables whole-VM backups, typically with help from the storage, as well as potentially from “agents” or other executable code that is inserted into each virtual machine.

But it hasn’t always been easy to protect virtual environments; many approaches have been tried and are often still in use today (see Figure 2).[2] Some tools back up data within each VM, using VM-specific agents on a per “guest” basis. Additionally, tools evolved that could make a copy of a VM’s system image. These tools usually operated at the level of the host (or array) as described earlier. That all sounds good, but in practice, guest-based backups were often overly complex and expensive, while host-based backups got better: they were not only providing a system image, but also enabling the restore of granular data from within whole VM backups.

Figure 2. Approaches to Protecting Virtual Infrastructure

Finding a better way to back up in a virtualized world has been complicated by a number of factors, not the least of which is the diversity of hypervisors, including multiple products and product versions from VMware, Microsoft, Citrix, etc.

Still, it is important to note that protection of virtualized environments is not “broken”—ESG surveys show that regardless of the method used, most IT organizations are relatively satisfied with their virtualized backups. In its 2012 Trends in Data Protection Modernization research report,[3] ESG found the following levels of satisfaction with survey respondents’ primary method of backing up virtual machines:

  • 18% Very Satisfied
  • 46% Satisfied
  • 31% Somewhat Satisfied
  • 2% Dissatisfied

How Host-based Backups Really Work

Against that background, it’s important to recognize that not all host-based backups are the same.

The goal of any host-based backup solution is application-consistent protection of virtualized applications. With Windows Server-based applications such as SQL Server, Exchange, and SharePoint being among the most frequently virtualized, a closer look at host-based backups must focus on Microsoft Volume Shadow Copy Service (VSS).

Microsoft Volume Shadow Copy Service (VSS)

VSS has been part of Windows for years. It supports backup activities and snapshotting, and it provides a framework of interoperability between backup agents, production applications, and underlying storage capabilities. Along with the VSS foundation within the Windows OS, there are three active components:

  • VSS Requester components are typically found in a traditional backup agent (often from a third party) and initiate the backup process.
  • VSS Writer components are in the application or workload services (e.g., SQL Server, Exchange, or even the Windows file system) and ensure that the workload is ready to be backed up by performing tasks such as memory-based transaction flushing or other backup preparation.
  • VSS Provider components are in the storage layer (OS, software based, or hardware based) and capture a snapshot of the to-be-protected data set.

The three VSS components then crunch through a series of seven steps:

  1. A VSS Requester (agent) asks VSS to enumerate the workloads that are eligible to be backed up, based on which production applications have registered their VSS Writer(s).
  2. The backup VSS Requester then asks that the workload be prepared for backup.
  3. Upon request, the workload’s VSS Writer works through the application to prepare its data to be backed up or snapshotted—including flushing caches or other memory areas, applying transaction logs, etc.
  4. With preparation completed, the VSS Writer notifies VSS and its VSS Provider that its data is ready.
  5. The VSS Provider “snaps” the data set and notifies the VSS Requester that it has the data.
  6. The VSS Requester references the snapshot within the VSS Provider and routes appropriate data to the backup server.
  7. Finally, having completed the backup and having received acknowledgement from the backup server, VSS informs the VSS Provider (storage) that it can release the snapped data. The workload can then do its post-backup maintenance tasks.
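
The ordering of those seven steps can be sketched as a toy model. This is only an illustration and does not call the real VSS API: the `Writer`, `Provider`, and `run_backup` names are hypothetical, and a Python dictionary stands in for both the production data and the backup server.

```python
from dataclasses import dataclass, field


@dataclass
class Writer:
    """Hypothetical stand-in for an application's VSS Writer."""
    name: str
    data: dict = field(default_factory=dict)
    prepared: bool = False
    maintained: bool = False

    def prepare(self):
        # Step 3: flush caches, apply transaction logs, and so on.
        self.prepared = True

    def post_backup(self):
        # End of step 7: post-backup maintenance.
        self.prepared = False
        self.maintained = True


class Provider:
    """Hypothetical stand-in for a VSS Provider in the storage layer."""

    def __init__(self):
        self.snapshots = {}

    def snap(self, writer):
        # Step 5: refuse to snap data that was never prepared.
        if not writer.prepared:
            raise RuntimeError(f"{writer.name} has not been prepared")
        self.snapshots[writer.name] = dict(writer.data)  # point-in-time copy
        return writer.name

    def release(self, snapshot_id):
        # Step 7: the snapped data is no longer needed.
        del self.snapshots[snapshot_id]


def run_backup(registered_writers, provider, backup_server):
    """Requester logic: walk the seven steps for each registered writer."""
    steps = []
    for writer in registered_writers:          # step 1: enumerate workloads
        steps.append((1, writer.name))
        steps.append((2, writer.name))         # step 2: ask for preparation
        writer.prepare()                       # step 3: writer prepares data
        steps.append((3, writer.name))
        steps.append((4, writer.name))         # step 4: writer reports ready
        snapshot_id = provider.snap(writer)    # step 5: provider snaps data
        steps.append((5, writer.name))
        backup_server[writer.name] = provider.snapshots[snapshot_id]  # step 6
        steps.append((6, writer.name))
        provider.release(snapshot_id)          # step 7: release the snapshot
        writer.post_backup()                   # workload does its maintenance
        steps.append((7, writer.name))
    return steps


sql = Writer("SQL Server", {"orders": 3})
server = {}
provider = Provider()
steps = run_backup([sql], provider, server)
sql.data["orders"] = 4  # a change made after the snapshot
print(steps)
print(server)
```

The point of the sketch is the ordering: the Provider only snaps data that a Writer has prepared, and the copy sent to the backup server reflects the moment of the snapshot, not later changes to the live data.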

That is how it works with a physical server. For this process to work in a virtualized environment, from the host perspective, two tiers of data conversations should occur—first between the backup server and the hypervisor (host), then between the guest OS(s) and the host and/or backup server.

In the case of Microsoft Hyper-V, the hypervisor has its own VSS Writer for its “workload” of virtualizing machines.

Host-based VSS Backups of Virtual Machines

This is how the steps look from a host perspective (using Hyper-V as the example):

  1. The backup software’s agent runs on the Hyper-V host and recognizes Hyper-V as able to be protected because of the host’s VSS Writer.
  2. The backup software asks to back up a particular virtual machine.
  3. The Hyper-V host’s VSS Writer does what it needs to do in order for its “data” (the VM) to be backed up.

If this were a SQL Server, it would prepare its data (the database) to be backed up. Because it is a hypervisor, its data to be backed up is a virtual machine. So, its preparation involves telling the guest’s Hyper-V Integration Components’ (ICs’) VSS Requester to be a backup agent—and the whole process happens inside the guest, which is why Microsoft refers to it as a Recursive VSS operation.

Inside the VSS-capable Guest

Step 4, the “preparation” of the data, is carried out inside the virtual machine being protected:

  1. The Hyper-V IC’s VSS Requester discovers VSS-capable workloads for protection, such as SQL Server or Exchange, by means of their VSS Writers, then instructs those workloads to be backed up.
  2. The guest-based applications then do what they do (flush logs, clear cache, etc.) to prepare for backups.
  3. When the workloads report being “ready” for backup, the workloads’ VSS Writer notifies the guest Windows OS VSS Provider that the data is ready.
  4. Then the Windows OS VSS Provider soft-snaps the data volumes, as instructed by the workloads.
  5. The Hyper-V IC VSS Requester notifies its requesting backup server, which is actually just the Hyper-V host, that the VM is in a protectable state, including an application-consistent, software-based snapshot.
  6. Now that the guest internals are “protected,” its container (the logical VM) is “ready” to be protected.

Remember, in a host-based backup model, the VM as a whole is the “data” to be backed up. So, just like any other VSS‑based workload, when it is ready, the original backup process continues.

The Host Process Continues

  1. The Hyper-V host’s VSS Writer notifies the host OS’s VSS and its underlying VSS Providers (hardware or software) that the VM is “ready” to be snapped.
  2. The VSS Provider (HW/SW) snaps the volume that the VM virtual hard disks (VHDs) reside on.
  3. The host-based backup agent that requested the backup is given access to the snap, and it feeds the VHDs to the backup server, wherever it might be.
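
The nesting described above, in which the host's preparation step triggers a complete VSS pass inside the guest, can be sketched as follows. This is a simplified model under stated assumptions, not Hyper-V code: `GuestVM`, `quiesce`, and `host_backup` are hypothetical names, and the trace strings merely label the order of events.

```python
class GuestVM:
    """Hypothetical model of a VSS-capable guest."""

    def __init__(self, name, workloads):
        self.name = name
        self.workloads = list(workloads)

    def quiesce(self):
        """Play the role of the Integration Components' VSS Requester."""
        trace = []
        for workload in self.workloads:
            # Each workload's VSS Writer prepares its own data.
            trace.append(f"guest: prepare {workload}")
        # The guest OS VSS Provider takes a software snapshot.
        trace.append("guest: software snapshot taken")
        return trace


def host_backup(vm, backup_server):
    """Host-side flow; the guest pass is nested inside host preparation."""
    trace = [f"host: backup agent requests {vm.name}"]
    # The host VSS Writer prepares its "data" (the VM) by recursing
    # into the guest before anything is snapped on the host.
    trace.extend(vm.quiesce())
    trace.append("host: writer reports VM ready")
    trace.append("host: provider snaps volume holding VHDs")
    backup_server.append(f"{vm.name}.vhd")
    trace.append("host: agent sends VHDs to backup server")
    return trace


vm = GuestVM("vm01", ["SQL Server", "Exchange"])
backup_server = []
trace = host_backup(vm, backup_server)
for line in trace:
    print(line)
```

In the printed trace, every guest line falls between the host's backup request and the host's snapshot, which is the recursive ordering the text describes.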

Figure 3 shows how VSS works.[4] There are, of course, variations with regard to scheduling and manageability, deduplication of common objects across multiple VMs, integration with higher-level management, the ability to recover individual items from within a host-based backup, and most importantly, ensuring that the guest-based application knows that it has been successfully backed up—so that it can do its log truncation and other management tasks.

Figure 3. How Recursive VSS Works—Using Hyper-V as an Example

The key to the process is that it is VSS enabled all of the way from the host’s Hyper-V VSS Writer, through the VSS components inside the guests, and back again. It is important to note that although the VSS operations within the guest are similar, VMware’s approach in vSphere 5 and its VMware APIs for Data Protection (VADP) is very different from a host perspective.

Good Agents and Bad Agents

One key difference among approaches to virtualized application backups is how transactional post-processing happens. Whereas Microsoft VSS provides recursive handling of VSS within each guest, as well as from the host, including post-backup application log truncation, not all other hypervisors do. And while ESG expects to continue to see those API methods mature, leading backup solutions are recognizing those deficiencies and filling the gaps by inserting their own “agents” or executable code into the guests prior to or during backups. They do so in order to facilitate the post-process notification or other metadata/indexing functions, and that isn’t a bad thing.

It is important to understand that not all agents are the same. Some may add more complexity than value; others are “good.”

(Not-as-good) Agents

Installing traditional backup agents may be necessary in a small number of situations with unique application requirements or a need for granular selection of the data to protect. For most of “us,” installing them just creates excessive I/O demands, which affect that virtual machine as well as the host and other virtual machines. That’s why host-based backup is almost always better for virtualized infrastructures, and guest-based backups should be avoided in most cases.

Good Agents

On the other hand, while most hypervisors provide a good way to back up a VM and its applications correctly, they do not all provide the same means of notifying the application after a successful backup has been achieved. Microsoft Hyper-V environments do provide this ability due to their utilization of VSS within both the Hyper-V host and the Windows guests, but not all other hypervisors do. Applications on those hypervisors won’t truncate their transaction logs or perform other database maintenance tasks. To compensate for this lack, some virtualization protection solutions will install additional agent-like executables (“good” agents) within the guests—not for backup, but simply for necessary post-backup processes.
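
A small sketch can make the consequence concrete. It assumes a hypothetical `GuestDatabase` whose transaction log is truncated only when something inside the guest tells it that a backup succeeded; the `notify_guest` flag stands in for the “good” agent and is not any vendor's real interface.

```python
class GuestDatabase:
    """Hypothetical transactional application running inside a guest."""

    def __init__(self):
        self.transaction_log = []

    def write(self, record):
        self.transaction_log.append(record)

    def on_backup_complete(self):
        # Post-backup maintenance: the logged records are now protected,
        # so the log can be truncated.
        self.transaction_log.clear()


def host_backup(db, notify_guest):
    """Back up the whole VM; optionally tell the guest it succeeded."""
    image = list(db.transaction_log)  # stands in for the whole-VM image
    if notify_guest:
        # The "good" agent's only job: post-backup notification.
        db.on_backup_complete()
    return image


unnotified = GuestDatabase()
notified = GuestDatabase()
for night in range(3):
    for db in (unnotified, notified):
        db.write(f"transactions for night {night}")
    host_backup(unnotified, notify_guest=False)
    host_backup(notified, notify_guest=True)

print(len(unnotified.transaction_log))
print(len(notified.transaction_log))
```

Both databases are backed up every night, but only the notified one truncates its log. The agent moves no backup data in either case.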

The Bigger Truth

Virtualization makes many things in IT easier—but backup isn’t usually one of them. The real differentiation isn’t in “how” the backup was done, but rather which recovery options are available—such as instant recovery of a virtual machine from the backup storage pool or granular recovery of not only files, but also other objects from within whole-VM backups.

That being said, the “how” for VM backup matters, and while there are a variety of methods in use today for protecting virtual machines, most boil down to either backing up from the “inside” of a virtual machine (guest-based) or the “outside” (host- or array-based).

In the old days, physical servers were often undersubscribed, so they had plenty of extra CPU and I/O to give when a backup began. But with virtualization, multiple virtual machines consume almost every free resource on their shared host. With most legacy-style guest-based backups, it is not uncommon for the I/O-CPU spikes incurred within one VM to affect performance of the underlying host as well as the other virtual machines within that host. So, as virtualization continues as mainstream and VM density per host continues to grow, it is important to be more efficient and less intrusive—and that means host-based or array-based backups.

Seeing that as the future, the next important note is to understand that not all backups are equal—even if they mostly use VSS within the guests. Understanding this, and selecting a backup (and recovery) solution that is truly application aware, can be the difference between those applications continuing to perform well or actually being hindered because of how they were backed up.

The question isn’t whether you have an agent (or other differently named binary) in the guest. The question is: What is its job? If your hypervisor doesn’t enable all of the functionality that you need for well-behaved virtual applications, then your backup application may take up the slack; that is okay, as long as it is handling metadata and post-processing, and not actually moving data as if it were on a physical server. Ignoring this distinction can leave transactional applications that the app owner must maintain manually, and data that might only be recovered by first restoring the entire VM and its virtual disks. Clearly, this is an undesirable end.

So for now, the “how” for backup boils down to good agents and “bad” agents—and each should be recognized for what it is, and either accepted or avoided, respectively.


[1] Source: ESG Research Report, 2012 IT Spending Intentions Survey, January 2012.

[2] Source: ESG Research Report, 2012 Trends in Data Protection Modernization, August 2012.

[3] Ibid.

[4] Source: Jason Buffington, Data Protection for Virtual Data Centers, Wiley Press, 2010.

 
