ESG Technical Review: Simplify Business Continuity and Disaster Recovery Orchestration with Veritas Resiliency Platform

Co-Author(s): Christophe Bertrand


This ESG Technical Review documents hands-on validation of the Veritas Resiliency Platform (VRP). The goal of the report is to validate that VRP, once integrated within a business’s IT environment, significantly enhances the ability of IT to support the business’s digital transformation initiatives by reducing the risk of interruptions caused by application downtime.

The Challenges

ESG research reveals that businesses feel that managing IT systems and applications isn’t getting any easier. In fact, two-thirds of businesses (66%) say that IT is more complex than it was two years ago. Many factors are driving this increased complexity, including: the number and type of endpoint devices, higher data volumes, the number and types of applications, and the need to use both on-premises data centers and public cloud providers.1 Businesses are also finding that managing both on-premises applications and cloud applications requires new skillsets and is more difficult and costly than they had anticipated. Migrating a workload from an on-premises environment to the cloud will often require an elaborate, multi-stepped process that may include machine image conversion, data exports and imports, networking reconfiguration, and other customized processes.

Because of these challenges, ESG believes that intelligent orchestration and automation technologies like the Veritas Resiliency Platform will play a major role in ensuring that IT organizations can reliably deliver highly available applications for the business, regardless of where the applications are running.

The Solution: Veritas Resiliency Platform

The Veritas Resiliency Platform gives a business the means to manage where virtual machines and applications are running across all their data centers, including the public cloud. With VRP, virtual machines and related data are grouped together in objects called “Resiliency Groups” (RGs), and RGs can be grouped together into objects called “Virtual Business Services” (VBS). In addition, VRP ensures that virtual machine data is replicated to one or more additional locations (e.g., a secondary data center or the public cloud) and that when required—for example, following an outage at the primary location—the virtual machine, application, group of applications, or entire data center can be moved to a new location.
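The grouping described above can be pictured with a minimal sketch. It is an illustration only, not the VRP object model or API: the class names, fields, site labels, and the `migrate` helper are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResiliencyGroup:
    """Hypothetical stand-in for an RG: VMs (and their data) that move together."""
    name: str
    vms: list[str]
    active_site: str
    recovery_site: str


@dataclass
class VirtualBusinessService:
    """Hypothetical stand-in for a VBS: a named collection of RGs."""
    name: str
    groups: list[ResiliencyGroup] = field(default_factory=list)

    def all_vms(self) -> list[str]:
        # Flatten the VMs of every member group, in group order.
        return [vm for rg in self.groups for vm in rg.vms]

    def migrate(self) -> None:
        # Illustrative only: swap the active and recovery site of each group.
        for rg in self.groups:
            rg.active_site, rg.recovery_site = rg.recovery_site, rg.active_site


# Made-up example: two RGs combined into one business service.
db = ResiliencyGroup("rg-db", ["db01"], active_site="site-a", recovery_site="site-b")
app = ResiliencyGroup("rg-app", ["app01", "app02"], active_site="site-a", recovery_site="site-b")
finance = VirtualBusinessService("Finance Group", [db, app])
finance.migrate()
```

The point of the sketch is the hierarchy: operations issued against the service fan out to every group, and through the groups to every VM.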

The key architectural components include:

  • The Resiliency Manager (RM). This is the main management server of the Veritas Resiliency Platform. This is a virtual appliance that may be installed in multiple locations to ensure that the environment can be managed in the event of an outage at one or more sites. The RM also hosts the HTML5-based user interface.
  • Infrastructure Management Server (IMS). This is a virtual appliance installed at each site that discovers, monitors, and manages the physical assets at each site. This appliance performs functions like starting and stopping VMs within the local data center. Multiple IMSs may have to be installed depending on the number and type of assets to be managed.
  • Replication Gateway or Data Mover (DM). This is a virtual appliance installed at each site that manages data replication between sites. Multiple DMs may be required based on the environments being replicated and the volume of data and associated change rates.
  • Application Virtual Machines (VMs). These are the VMs that are protected and managed by VRP. These may be VMs within VMware, Hyper-V, AWS, or Azure.
  • External Storage Platforms. VRP supports the native storage replication technologies on many external storage arrays as a mechanism for replicating VM data.

ESG Lab Validated

ESG performed a hands-on evaluation of the Veritas Resiliency Platform via a joint testing session hosted by Veritas. We evaluated the integration and operation of VRP with a vSphere cluster running in Veritas’s lab and on AWS EC2 virtual machines.

Getting Started

ESG began its review of VRP with an overview of Veritas’s test vSphere cluster being used for this report. We used VMware vCenter to review the vSphere configuration. Once we had a solid understanding of the in-scope vSphere environment, we reviewed the requirements and process involved for the VRP deployment and management.

During the testing process, ESG had the opportunity to review the deployment and configuration steps in detail (see Figure 3). When deploying a technology like VRP, a simple and straightforward installation and configuration process with end-to-end visibility is important so that organizations can quickly obtain value from the platform.

Key findings of ESG’s review of the installation and configuration with end-to-end visibility included:

  • All the virtual appliances required to install VRP, including those within the public cloud, are readily available from Veritas and the public cloud providers (via the Azure and AWS Marketplaces). These machine images are easily imported and booted for each environment.
  • Veritas has a sizing guide that recommends the number of appliances required based on the size of the environment (number of VMs, amount of data, data change rates, etc.). These factors affect the complexity of a VRP deployment and the time it takes.
  • The architecture of the environment will also impact installation and setup. For example, integrating one or more public cloud environments will require virtual appliances to be configured within those environments as well as the requisite VMs and associated storage.
  • The use of alternative data replication techniques (e.g., storage array-based replication) will require integrating those storage technologies with the VRP Infrastructure Management Servers, which will also add complexity and require additional installation and configuration steps.

Next, ESG started the VRP GUI by pointing our web browser at the Resiliency Manager (RM) virtual machine in the testing environment. From the main dashboard, we were presented with a map of the current VRP Resiliency Domain showing the two locations (see Figure 4). Note that the Veritas lab site is represented by a circle icon on the east coast of the US while the AWS location is represented by a cloud icon on the west coast. The map displays the relationship between the various sites and configured resiliency objects (Groups and Services).

The VRP dashboard is laid out in a manner that makes it easy to ascertain the overall state of resources within the Resiliency Domain. The VRP configuration is displayed, including where VRP services are being run across the environment. In this lab setting, there was one Resiliency Manager server running at the AWS site (site “B”). There was also one Infrastructure Management Server running at each site.

To the right of the map is an inventory and status summary of the Resiliency Groups and Virtual Business Services. From this display, one can quickly see which groups are functioning as expected, which are at risk, and which have configuration issues that must be addressed. The dashboard also displays any active migration or rehearsal activities, along with the elapsed time of the most recent data replication tasks. We also reviewed some of the key VRP constructs, including Resiliency Groups and Virtual Business Services (see Figure 4).

We then went through the steps required to create Resiliency Groups and a Virtual Business Service. These steps were very intuitive. The VRP GUI does a good job of preventing users from doing things that they aren’t supposed to do. For example, an asset (e.g., a VM) cannot belong to more than one RG. Therefore, when creating a new RG, only those assets that don’t belong to an RG are shown in the pick list. We were then able to review the state of the RG, as seen in Figure 5.
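The one-RG-per-asset rule and the filtered pick list can be sketched as follows. This is a hedged illustration of the constraint, not how VRP implements it; the function names, the dictionary layout, and the asset names are assumptions.

```python
from __future__ import annotations


def unassigned_assets(inventory: list[str], groups: dict[str, list[str]]) -> list[str]:
    """Return the assets that belong to no group, in inventory order (the pick list)."""
    taken = {vm for members in groups.values() for vm in members}
    return [vm for vm in inventory if vm not in taken]


def create_group(name: str, members: list[str],
                 inventory: list[str], groups: dict[str, list[str]]) -> None:
    """Add a new group, rejecting any asset that is unknown or already grouped."""
    if name in groups:
        raise ValueError(f"group already exists: {name}")
    available = set(unassigned_assets(inventory, groups))
    rejected = [vm for vm in members if vm not in available]
    if rejected:
        raise ValueError(f"already assigned or unknown: {rejected}")
    groups[name] = list(members)


# Made-up inventory: db01 is already protected by an existing group.
inventory = ["db01", "app01", "web01"]
groups = {"rg-db": ["db01"]}
pick_list = unassigned_assets(inventory, groups)   # db01 is filtered out
create_group("rg-app", ["app01"], inventory, groups)
```

Filtering the pick list up front means the invalid choice never reaches the user, and the check inside `create_group` guards against the same mistake arriving by another path.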

The Resiliency Group view displays all the key information regarding a specific RG. The “Details” section provides key information, including in which data center the group is active, what data replication method is being used, any current activity that may be taking place, and the status of recent activity. The “Service Objective” section identifies the service level requirements for the RG, including the RPO, the currently available recovery point (i.e., when the last replication completed), and the state of “readiness” for recovery. The “Risks” section summarizes the potential risks to the RG’s recoverability, continuity, and ability to meet the SLA. The user can drill down into each of these areas to obtain additional details on the associated risks.

The “Replication” section provides a summary of the state of replication for the RG, including a graphical representation of where the RG is currently active (in dark green in Figure 5), and where its recovery location has been defined (shown in light green). Also shown is the state of data synchronization, including whether data synchronization is lagging behind the stated RPO.
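The comparison between the available recovery point and the stated RPO comes down to simple time arithmetic. The sketch below assumes the lag is measured from the completion of the last replication to the current time; the function name, the returned fields, and the timestamps are invented for illustration and do not come from VRP.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def replication_status(last_recovery_point: datetime,
                       rpo: timedelta,
                       now: datetime) -> dict:
    """Compare the age of the newest recovery point with the RPO."""
    if last_recovery_point > now:
        raise ValueError("recovery point cannot be in the future")
    lag = now - last_recovery_point
    return {"lag": lag, "within_rpo": lag <= rpo}


# Made-up values: last replication finished 10 minutes ago, RPO is 5 minutes.
now = datetime(2019, 6, 1, 12, 0)
last = datetime(2019, 6, 1, 11, 50)
status = replication_status(last, timedelta(minutes=5), now)
```

With these sample values the lag is ten minutes against a five-minute RPO, so the group would be reported as lagging.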

Why This Matters

IT environments are getting more complex due to higher volumes of data, greater numbers and variety of endpoints, and greater numbers and variety of applications. No one wants the tools and applications that they use to manage these environments to follow suit, especially in areas like automation and orchestration where specialized skills are in short supply. However, simplicity alone is not always the answer if the tools and applications don’t have what it takes to do the job.

ESG confirmed that the installation and setup is straightforward, allowing a business to quickly extract value from the platform. The VRP console is user-friendly, intuitive, and very responsive. Gathering asset information from the current vSphere environment was as easy as pointing at the vCenter server and providing the necessary credentials. Setting up the connectivity to the AWS cloud was also straightforward and quick to do. Once the assets were defined in an existing AWS account, it was easy to create the required VRP objects necessary to manage migration between sites.

Automation, Orchestration, and Risk Mitigation

Automation and orchestration are among the key features of VRP. They allow you to define a scripted playbook for migrating a Resiliency Group, a Virtual Business Service, or an entire data center to another site. The interface used to build these playbooks is simple to use, involving the drag-and-drop of various assets in a specific order with the ability to add customized steps, scripts, or manual procedures in between. There are even options to take unique actions based on the return code from inserted scripts.

There are two kinds of automation plans:

  1. Resiliency Plan. This is a plan that you would use to perform a planned migration of an RG, VBS, or entire data center.
  2. Evacuation Plan. This is a plan that you would use to evacuate an entire primary site and bring it back up on a secondary site (like the “Takeover” function). With this option, a graceful shutdown of services is attempted at the primary site before migrating to the secondary site.

During ESG’s review, we created a Resiliency Plan for a previously created VBS called “Finance Group.” Figure 6 shows the tasks we selected to create a test Resiliency Plan.

As previously stated, creating a Resiliency Plan is a drag-and-drop operation within VRP. VRP provides the functions needed to build the plan, including stopping a service/group, starting a service/group, executing a script, inserting a manual task for activities that can’t be automated (e.g., flipping a power switch), reconfiguring the network, and many other common functions needed to migrate an application. The steps are linked sequentially by connecting arrows from one step to the next.

The process is very intuitive and offers a rehearsal feature that allows the plan to be tested before it is executed on a live application. Rehearsal mode is launched by clicking the rehearsal icon in the “Management Operations” section of the user interface, the same place you would run a Resiliency Plan. A rehearsal creates a snapshot of the replicated disk of a Resiliency Plan and automatically provisions the associated VMs. It can be run at the RG or VBS level. All the rehearsal resources run on an isolated network, so there is no impact on the production environment. A rehearsal can be used to test and report on DR or for other IT operations such as Test/Dev and analytics. Once the rehearsal is no longer needed, the administrator can click the Cleanup icon to remove all the provisioned resources.

Another useful feature when creating a plan is the ability to terminate and roll back the plan based on the return code from a custom script executed in a previous step.
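The sequential, return-code-driven behavior of a plan can be sketched in a few lines. This is a simplified model under stated assumptions, not the VRP plan engine: the `Step` type, the convention that a return code of 0 means success, and the reverse-order undo are all hypothetical choices made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """Hypothetical plan step: an action returning a code, plus an optional undo."""
    name: str
    action: Callable[[], int]                  # assumed convention: 0 means success
    undo: Optional[Callable[[], None]] = None


def run_plan(steps: list[Step]) -> tuple[bool, list[tuple[str, int]]]:
    """Run steps in order; on a nonzero code, undo completed steps in reverse."""
    completed: list[Step] = []
    log: list[tuple[str, int]] = []
    for step in steps:
        rc = step.action()
        log.append((step.name, rc))
        if rc != 0:
            for done in reversed(completed):
                if done.undo is not None:
                    done.undo()
                    log.append((f"undo {done.name}", 0))
            return False, log
        completed.append(step)
    return True, log


events: list[str] = []


def record(event: str) -> int:
    events.append(event)
    return 0


# Made-up plan: the custom script fails with code 3, so the stop is rolled back.
plan = [
    Step("stop finance-db", lambda: record("stop"), undo=lambda: events.append("restart")),
    Step("pre-check script", lambda: 3),
    Step("start on recovery site", lambda: record("start")),
]
succeeded, log = run_plan(plan)
```

In this run the third step never executes: the failing script terminates the plan, and the only completed step is undone.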

Next, as shown in Figure 7, ESG invoked a site-to-site failover plan and monitored the state of the VBS as it was gracefully stopped at the primary site (running on vSphere at site “A”) and then brought up on the secondary site (running on AWS EC2 at site “B”). The console provided an easy way to view the progress of the plan. The defined steps and their order are shown at the top of the console, and the status of individual tasks is shown below as they are executed, along with detailed statistics including start time, end time, and duration.

As intended, the RGs and associated assets in the VBS were gracefully shut down, migrated, and then started up within AWS. It should be noted that all data conversion (between vSphere and AWS) was handled by the VRP data mover and no special conversion steps had to be performed.

VRP also includes a rich reporting capability with many predefined reports that can be manually generated or automatically generated and distributed on a regular basis. One of the most useful reports that ESG generated during the review was the “Current Risks” report. This report provides a summary along with details on all the risks associated with the RGs and VBS being managed. Risks are classified according to whether they affect data recoverability, continuity of business applications, or the ability to meet the specified SLAs. The severity of the risk is also classified as either “error” or “warning.” This comprehensive risk report enables the IT organization and the business to quickly identify where remediation is needed in order to support the level of resiliency needed by the business. Figure 8 shows some of the output from the “Current Risks” report that we ran during our testing.
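The two-way classification used by the report (what the risk affects, and how severe it is) lends itself to a simple tally. The sketch below is illustrative only: the `Risk` record, the category labels, and the sample entries are invented for this example and are not output from ESG's testing or from VRP.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

CATEGORIES = ("recoverability", "continuity", "sla")   # assumed labels
SEVERITIES = ("error", "warning")


@dataclass(frozen=True)
class Risk:
    target: str      # the RG or VBS the risk applies to
    category: str
    severity: str
    detail: str


def summarize(risks: list[Risk]) -> dict[str, dict[str, int]]:
    """Count risks per category and severity, including zero counts."""
    counts: Counter = Counter()
    for risk in risks:
        if risk.category not in CATEGORIES or risk.severity not in SEVERITIES:
            raise ValueError(f"unclassified risk: {risk}")
        counts[(risk.category, risk.severity)] += 1
    return {c: {s: counts[(c, s)] for s in SEVERITIES} for c in CATEGORIES}


# Invented sample entries, for illustration only.
sample = [
    Risk("rg-db", "sla", "warning", "replication lag exceeds RPO"),
    Risk("rg-db", "recoverability", "error", "no recent recovery point"),
    Risk("rg-app", "sla", "warning", "replication lag exceeds RPO"),
]
summary = summarize(sample)
```

A summary of this shape makes it easy to see at a glance which dimension of resiliency needs remediation first.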

Why This Matters

In today’s complex and constantly changing data centers, it’s often difficult to understand what impact faults and changes within the environment will have on the resiliency of the business’s critical applications. The complexity of IT environments and the increased criticality of IT business systems have made the use of manual procedures and “playbooks” obsolete. Now more than ever, automation and orchestration are required to deliver the resiliency and availability of applications that the business demands.

The Veritas Resiliency Platform has robust capabilities for creating plans that automate and orchestrate the steps necessary to ensure that business function can be quickly and accurately restored. This capability reduces human error and the need for subject matter experts to manually perform the steps required to restore business function. The platform also provides end-to-end visibility of resiliency risks in a single console.

The Bigger Truth

Digital transformation has fundamentally changed the IT landscape, increasing the diversity of applications, the number and types of devices to be managed, the volume of data, and the prevalence of hybrid architectures consisting of traditional and cloud-native applications running within on-premises data centers as well as in public and private clouds. These factors make ensuring that applications are resilient and highly available more challenging than it has ever been. Meanwhile, the organization’s business units are demanding reduced risk, downtime, and cost from the solutions their IT teams deliver and manage. Adding to this complexity is the challenge IT organizations face in ensuring that they have the skillsets needed to manage these new environments.

Automation and orchestration technologies like Veritas Resiliency Platform will be a strategic element for organizations looking to ensure that their digital transformation initiatives are successful. VRP enables the business to meet its application availability requirements while integrating public and private cloud infrastructures. It also helps mitigate the added complexity that more traditional solutions typically introduce when delivering application fluidity. Change is happening at a rapid pace, even in small and mid-sized businesses, and traditional techniques for managing application availability and resiliency are no longer adequate. VRP does not require new skillsets to deploy and, once configured, proactively identifies risk elements across the landscape that may impact application availability. This fundamentally changes the management paradigm from siloed management of individual infrastructure elements (compute, networking, storage) to managing business services. This is how the IT organization ultimately becomes a business driver in collaboration with the lines of business, rather than just a cost center and impediment to the business’s goals.

1. Source: ESG Master Survey Results, 2019 IT Spending Intentions Survey, March 2019.
This ESG Technical Review was commissioned by Veritas and is distributed under license from ESG.