VMware vSphere Disaster Recovery (DR) Best Practices

Designing an environment for disaster recovery (DR), especially site-level recovery, requires careful thought and advance planning to choose the right strategy and recovery processes. When disaster strikes, those processes need to run smoothly, which means they must be proven through extensive testing beforehand.

In a VMware vSphere environment, many features and capabilities can help keep business-critical data available at all times in multiple locations. The 3-2-1 backup rule (keep at least three copies of your data, on two different types of media, with one copy offsite) is a good benchmark for designing DR plans that ensure data availability and recoverability.
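As a quick illustration of the 3-2-1 rule, the following minimal Python sketch checks an inventory of data copies against it. The `DataCopy` record, its field names, and the location and media labels are hypothetical; they are not part of vSphere or any backup product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataCopy:
    """One copy of a protected dataset (hypothetical inventory record)."""
    location: str  # e.g. "prod-datastore", "backup-nas", "dr-site"
    media: str     # e.g. "disk", "tape", "object-storage"
    offsite: bool  # True if stored away from the primary site


def meets_3_2_1(copies: List[DataCopy]) -> bool:
    """True if there are at least 3 copies, on at least 2 media types,
    with at least 1 copy offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


if __name__ == "__main__":
    inventory = [
        DataCopy("prod-datastore", "disk", offsite=False),    # production data
        DataCopy("backup-nas", "disk", offsite=False),        # local backup
        DataCopy("dr-site", "object-storage", offsite=True),  # offsite copy
    ]
    print(meets_3_2_1(inventory))  # True
```

Note that the production copy counts toward the three copies in this sketch, which matches the usual reading of the rule.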


Develop a Disaster Recovery Plan
Provision a Disaster Recovery Site
Make Use of VMware vSphere Cluster Technologies
Backups are Essential
Use Replication for Failover Capabilities
Use Network Automation in the DR Site Recovery
Verify Backups
Automatic Backups, Replication, Verification, and Network Automation

When thinking about DR in a VMware vSphere environment, which best practices help back up data and keep it available even in the event of a site-level failure?

Let’s take a look at some of the important VMware vSphere Disaster Recovery Best Practices that organizations need to give attention to when designing their disaster recovery and overall business continuity strategies.

Develop a Disaster Recovery Plan

The adage that “poor planning leads to poor performance” certainly holds true for disaster recovery and data availability. Organizations that fail to plan are inviting disaster, since it can strike when least expected. Often the disaster is a “data disaster” rather than a natural disaster or other catastrophic event: an accidental deletion of business-critical data, or an intentional disruption such as a ransomware attack that holds data hostage. Businesses need to plan for these and any other types of disaster that are relevant to their industry, location, and other factors.

Performing a risk analysis is an important step in disaster recovery planning because it identifies the disasters most likely to affect a particular organization. These will differ from company to company.
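One common way to carry out the risk analysis is a simple risk matrix that scores each scenario as likelihood × impact and ranks the results. The sketch below assumes hypothetical 1-to-5 scales, and the scores in the example are illustrative placeholders, not measured data; each organization assigns its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scenario:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int      # 1 (minor) to 5 (severe); hypothetical scale

    @property
    def score(self) -> int:
        # Simple risk matrix: risk = likelihood x impact
        return self.likelihood * self.impact


def rank_risks(scenarios: List[Scenario]) -> List[Scenario]:
    """Order scenarios from highest to lowest risk score."""
    return sorted(scenarios, key=lambda s: s.score, reverse=True)


if __name__ == "__main__":
    # Illustrative placeholder values only.
    scenarios = [
        Scenario("Hurricane affecting the region", likelihood=1, impact=5),
        Scenario("Ransomware attack", likelihood=4, impact=5),
        Scenario("Accidental deletion", likelihood=4, impact=3),
        Scenario("Host hardware failure", likelihood=3, impact=2),
    ]
    for s in rank_risks(scenarios):
        print(f"{s.score:>2}  {s.name}")
```

The ranked list gives a starting point for deciding which scenarios the DR plan must address first.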

This type of planning also forces deep thinking about the critical parts of the infrastructure and how each would be affected in different scenarios. The Disaster Recovery Plan needs to cover not only systems and infrastructure but also every process and workflow that would be disrupted or need to be rerouted during a disaster. A contingency plan defines the secondary processes and procedures that take over when primary business-critical processes are disrupted.

Provision a Disaster Recovery Site

A disaster recovery site is a critical part of an organization's overall disaster recovery strategy. It is a physically separate facility that can be located relatively close to the primary production site or in a completely different geographic region. Each option involves tradeoffs. Nearby sites normally have very low latency, which allows more advanced techniques, such as synchronous replication, to be used to replicate and protect data.

However, a nearby DR site is more likely to be hit by the same disaster when its impact is wide in scope, as with a hurricane. Sites much farther away are constrained by much higher latency than nearby sites, but because they may be in a different geographic region, they are far less likely to be affected by a wide-scale disaster.
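The latency tradeoff can be made concrete. With synchronous replication, a write is acknowledged only after the remote site confirms it, so each write incurs at least one network round trip (RTT) to the DR site. The sketch below uses that simplified model (it ignores bandwidth, queueing, and protocol overhead) to decide whether synchronous replication fits within a write-latency budget; the function names, the budget, and the RTT values are hypothetical.

```python
def sync_write_latency_ms(local_write_ms: float, rtt_ms: float) -> float:
    """Simplified model: a synchronously replicated write completes only after
    the DR site acknowledges it, adding one network round trip."""
    return local_write_ms + rtt_ms


def choose_replication_mode(rtt_ms: float, local_write_ms: float,
                            budget_ms: float) -> str:
    """Return "synchronous" if the added round trip fits the write-latency
    budget; otherwise "asynchronous" (writes complete locally and are shipped
    later, accepting a non-zero RPO)."""
    if sync_write_latency_ms(local_write_ms, rtt_ms) <= budget_ms:
        return "synchronous"
    return "asynchronous"


if __name__ == "__main__":
    # Hypothetical figures: 1 ms local write, 5 ms write-latency budget.
    print(choose_replication_mode(rtt_ms=2, local_write_ms=1, budget_ms=5))   # synchronous
    print(choose_replication_mode(rtt_ms=40, local_write_ms=1, budget_ms=5))  # asynchronous
```

Under these assumed numbers, a 2 ms RTT to a nearby site allows synchronous replication, while a 40 ms RTT to a distant region pushes the design toward asynchronous replication.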

What is contained in a DR site?

The DR site is generally a replica of the production site in terms of infrastructure (compute, network, storage). There may be differences, however, depending on the performance SLA during a disaster recovery event. Businesses often size DR sites for much less performance than production, having decided that reduced performance on less powerful hardware is tolerable during a DR event. Others opt for exactly the same hardware as production because they do not want any noticeable performance impact, even during a DR event.

A DR site may contain a vSphere cluster configured similarly to the one in production so that virtual machines can be replicated from production to the DR facility. This allows the data to be completely recreated in the DR site and readily available for failover within the agreed recovery point objective (RPO, the maximum acceptable data loss measured in time) and recovery time objective (RTO, the maximum acceptable time to restore service).
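RPO and RTO targets can be checked with simple arithmetic. The sketch below assumes interval-based asynchronous replication in which each cycle captures the VM state when it starts and becomes usable once its transfer completes; under that assumption the worst-case data loss is the replication interval plus the transfer time. Recovery time is modeled, again as an assumption, as the sum of sequential failover steps. The function names and step durations are illustrative.

```python
from datetime import timedelta
from typing import List


def worst_case_data_loss(interval: timedelta, transfer: timedelta) -> timedelta:
    """Worst case for interval-based replication (assumes transfer <= interval).

    Each cycle captures VM state when it starts and becomes usable when its
    transfer finishes. If the site fails just before a cycle finishes, the
    newest usable replica dates from the previous cycle's start."""
    return interval + transfer


def meets_rpo(interval: timedelta, transfer: timedelta, rpo: timedelta) -> bool:
    return worst_case_data_loss(interval, transfer) <= rpo


def meets_rto(steps: List[timedelta], rto: timedelta) -> bool:
    """Model recovery time as the sum of sequential failover steps."""
    return sum(steps, timedelta()) <= rto


if __name__ == "__main__":
    m = timedelta(minutes=1)
    print(meets_rpo(15 * m, 5 * m, rpo=30 * m))  # True: worst case is 20 min
    print(meets_rpo(60 * m, 5 * m, rpo=30 * m))  # False: worst case is 65 min
    # Hypothetical steps: declare disaster, power on replicas, re-IP networks.
    print(meets_rto([10 * m, 15 * m, 5 * m], rto=60 * m))  # True: 30 min total
```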

Make Use of VMware vSphere Cluster Technologies

What are some technologies that are used for the purposes of disaster recovery?

When thinking about disaster recovery and the availability of business-critical data, organizations do well to start with the basics and work forward from there. As mentioned, VMware vSphere includes useful availability features and capabilities right out of the box.

VMware vSphere clusters include built-in, cluster-level technologies that allow organizations to account for hardware failures and even performance-impacting events.

VMware High Availability (HA) is a cluster-level technology that restarts a virtual machine on a healthy host if the host on which the VM currently resides goes down (due to a hardware failure, for example). The VM experiences only a short downtime because it is quickly restarted on a healthy host, and no administrator action is required.
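To illustrate the restart behavior described above, here is a toy Python model of a cluster in which a failed host's VMs are restarted on the remaining healthy hosts. This is not vSphere HA's actual placement logic, which also involves admission control, restart priorities, and other settings; the largest-VM-first, most-free-memory placement rule and all host and VM names are assumptions chosen for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    capacity_gb: int  # memory available for VMs
    healthy: bool = True
    vms: Dict[str, int] = field(default_factory=dict)  # VM name -> memory (GB)

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.vms.values())


def fail_host(hosts: List[Host], failed: str) -> Dict[str, Optional[str]]:
    """Mark `failed` as down and restart its VMs on healthy hosts.

    Toy placement rule (an assumption, not vSphere HA's algorithm): restart
    the largest VMs first, each on the healthy host with the most free memory.
    Returns VM name -> new host, or None if no healthy host has room."""
    down = next(h for h in hosts if h.name == failed)
    down.healthy = False
    orphaned, down.vms = down.vms, {}
    placement: Dict[str, Optional[str]] = {}
    for vm, mem in sorted(orphaned.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [h for h in hosts if h.healthy and h.free_gb() >= mem]
        if not candidates:
            placement[vm] = None  # stays down until capacity is available
            continue
        target = max(candidates, key=Host.free_gb)
        target.vms[vm] = mem
        placement[vm] = target.name
    return placement


if __name__ == "__main__":
    cluster = [
        Host("esx01", 64, vms={"vm-a": 16, "vm-b": 8}),
        Host("esx02", 64, vms={"vm-c": 32}),
        Host("esx03", 64, vms={"vm-d": 8}),
    ]
    print(fail_host(cluster, "esx01"))  # {'vm-a': 'esx03', 'vm-b': 'esx03'}
```

The `None` case hints at why spare capacity matters: VMs can only be restarted if the surviving hosts have room for them.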

VMware Distributed Resource Scheduler (DRS) is another cluster-level technology that migrates virtual machines between hosts, using vMotion, based on load and performance metrics across the cluster. If a host begins to experience high memory pressure or CPU contention, VMs can be moved to a different host to relieve the resource constraint.
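Similarly, the idea behind DRS load balancing can be sketched as a greedy loop that moves VMs from the busiest host to the least busy one until the load gap falls within a threshold. This is a simplified stand-in, not the DRS algorithm; the abstract load units, the threshold, and the host and VM names are hypothetical.

```python
from typing import Dict, List, Tuple


def rebalance(hosts: Dict[str, Dict[str, int]],
              threshold: int) -> List[Tuple[str, str, str]]:
    """Greedy toy load balancer (not the actual DRS algorithm).

    `hosts` maps host name -> {VM name: load units} and is updated in place.
    While the busiest and least busy hosts differ by more than `threshold`,
    move the one VM from the busiest host that best narrows that gap.
    Returns the (vm, source, destination) moves, in order."""
    def load(host: str) -> int:
        return sum(hosts[host].values())

    moves: List[Tuple[str, str, str]] = []
    while True:
        busiest = max(hosts, key=load)
        idlest = min(hosts, key=load)
        gap = load(busiest) - load(idlest)
        if gap <= threshold:
            break
        # Moving a VM of size s turns the gap into |gap - 2s|.
        best = min(hosts[busiest].items(),
                   key=lambda kv: abs(gap - 2 * kv[1]), default=None)
        if best is None or abs(gap - 2 * best[1]) >= gap:
            break  # no single move helps; stop rather than thrash
        vm, size = best
        del hosts[busiest][vm]
        hosts[idlest][vm] = size
        moves.append((vm, busiest, idlest))
    return moves


if __name__ == "__main__":
    cluster = {"esx01": {"vm1": 30, "vm2": 25, "vm3": 20},
               "esx02": {"vm4": 10},
               "esx03": {"vm5": 15}}
    print(rebalance(cluster, threshold=20))
    # [('vm1', 'esx01', 'esx02'), ('vm3', 'esx01', 'esx03')]
```

Only moves that strictly narrow the gap are made, so each move reduces the overall imbalance and the loop is guaranteed to terminate.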

Enabling VMware HA in a VMware vSphere cluster

Enabling VMware DRS in a VMware vSphere cluster

While HA and DRS do not recover data in the classic sense, these availability mechanisms are an essential part of the overall data availability that organizations need to account for.

Backups are Essential

Backups are at the core of any data protection solution and are absolutely necessary. High-availability mechanisms fall short when data becomes unavailable for reasons other than a hardware failure: if data is simply deleted or corrupted, HA has no way to recover it. Backups are point-in-time copies of your data stored in a separate location that does not depend in any way on the production infrastructure or systems.
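Recovering from deletion or corruption usually means choosing the right restore point. A minimal sketch, assuming you know when the incident (deletion, corruption, or encryption) occurred: pick the newest backup taken strictly before that time. The function name and the example schedule are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Optional


def pick_restore_point(points: List[datetime],
                       incident: datetime) -> Optional[datetime]:
    """Newest restore point taken strictly before the incident, or None."""
    ordered = sorted(points)
    i = bisect_left(ordered, incident)  # first index with point >= incident
    return ordered[i - 1] if i > 0 else None


if __name__ == "__main__":
    day = datetime(2024, 1, 15)
    backups = [day.replace(hour=h) for h in (0, 6, 12, 18)]
    deleted_at = day.replace(hour=13, minute=30)
    print(pick_restore_point(backups, deleted_at))  # 2024-01-15 12:00:00
```

A backup taken at exactly the incident time is excluded, since it may already contain the damage.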

Modern backup solutions can generally take image-level backups of virtual machines, so that both the data and the characteristics of the VM itself are copied to a safe location. This includes information about the VM such as its virtual hardware configuration and the number and size of its VMDKs.
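The VM metadata captured by an image-level backup can be pictured as a small manifest stored alongside the disk data. The dataclasses and field names below are hypothetical and do not reflect any particular product's backup format; the point is that the virtual hardware configuration and disk layout travel with the backup so the VM can be recreated on restore.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DiskInfo:
    label: str    # e.g. "Hard disk 1"
    size_gb: int


@dataclass
class VmManifest:
    """Hypothetical metadata saved with an image-level backup."""
    name: str
    num_cpus: int
    memory_mb: int
    disks: List[DiskInfo] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts the nested DiskInfo records as well
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "VmManifest":
        data = json.loads(text)
        data["disks"] = [DiskInfo(**d) for d in data["disks"]]
        return cls(**data)


if __name__ == "__main__":
    vm = VmManifest("app01", num_cpus=4, memory_mb=8192,
                    disks=[DiskInfo("Hard disk 1", 60), DiskInfo("Hard disk 2", 200)])
    print(VmManifest.from_json(vm.to_json()) == vm)  # True
```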

Use Replication for Failover Capabilities

For site-level recovery, replication is the core technology that makes recovery possible. Replication copies a running production virtual machine to your DR facility and keeps that copy up to date. The secondary environment is then ready to start running your production VMs at a moment's notice, which enables failover: shifting connectivity and data access from the production location to a secondary location such as a DR site.
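Replication stays practical because, after an initial full copy, only changed data is shipped in each cycle. The sketch below illustrates block-level incremental replication by hashing fixed-size blocks and sending only those whose hash changed. This is a simplification: vSphere-based tools commonly rely on Changed Block Tracking (CBT) to learn which blocks changed without hashing the whole disk, and the tiny 4-byte block size is only for readability. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # tiny block size for readability; real tools use far larger blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """SHA-256 of each fixed-size block, recorded after every cycle."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(source: bytes, last_hashes: List[str],
                   block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Blocks of `source` that are new or differ from the previous cycle."""
    changes = {}
    for idx, h in enumerate(block_hashes(source, block_size)):
        if idx >= len(last_hashes) or last_hashes[idx] != h:
            changes[idx] = source[idx * block_size:(idx + 1) * block_size]
    return changes


def apply_changes(replica: bytearray, changes: Dict[int, bytes], new_len: int,
                  block_size: int = BLOCK_SIZE) -> None:
    """Patch the DR-side replica in place with the shipped blocks."""
    if len(replica) < new_len:
        replica.extend(b"\x00" * (new_len - len(replica)))
    for idx, block in changes.items():
        start = idx * block_size
        replica[start:start + len(block)] = block
    del replica[new_len:]  # handle a source that shrank


if __name__ == "__main__":
    prod_v1 = b"AAAABBBBCCCC"
    replica = bytearray(prod_v1)        # initial full copy at the DR site
    baseline = block_hashes(prod_v1)
    prod_v2 = b"AAAAXXXXCCCCDD"         # one block changed, one appended
    delta = changed_blocks(prod_v2, baseline)
    apply_changes(replica, delta, len(prod_v2))
    print(sorted(delta), bytes(replica) == prod_v2)  # [1, 3] True
```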

Replication lets you skip the time it would take to “restore” your data at the DR facility during a disaster, because that work is effectively done ahead of time by the replication process. This saves a tremendous amount of time exactly when you need to get your data back online quickly.

Use Network Automation in the DR Site Recovery

One challenge when replicating VMs to a DR facility is that the VMs land on a different network from production. This may mean a different local subnet, gateway, and DNS servers, as well as different virtual switch settings in the DR vSphere cluster.
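A common way to automate this is a per-subnet re-IP rule. The sketch below, using Python's standard `ipaddress` module, maps a production address into the DR subnet while keeping the host portion, and attaches the DR gateway and DNS servers. The `ReIpRule` structure, the keep-the-host-portion convention, and the example addresses are assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Tuple


def reip(prod_ip: str, prod_net: str, dr_net: str) -> str:
    """Move an address from the production subnet into the DR subnet,
    keeping its host portion (both subnets must be the same size)."""
    src = ipaddress.ip_network(prod_net)
    dst = ipaddress.ip_network(dr_net)
    if src.prefixlen != dst.prefixlen:
        raise ValueError("production and DR subnets must have the same prefix length")
    addr = ipaddress.ip_address(prod_ip)
    if addr not in src:
        raise ValueError(f"{prod_ip} is not in {prod_net}")
    host_part = int(addr) - int(src.network_address)
    return str(dst.network_address + host_part)


@dataclass(frozen=True)
class ReIpRule:
    """Hypothetical per-subnet rule: production subnet -> DR network settings."""
    prod_net: str
    dr_net: str
    dr_gateway: str
    dr_dns: Tuple[str, ...]


def dr_settings(prod_ip: str, rule: ReIpRule) -> Dict[str, object]:
    """Guest network settings to apply to a replica after failover."""
    return {
        "ip": reip(prod_ip, rule.prod_net, rule.dr_net),
        "prefix_len": ipaddress.ip_network(rule.dr_net).prefixlen,
        "gateway": rule.dr_gateway,
        "dns": list(rule.dr_dns),
    }


if __name__ == "__main__":
    rule = ReIpRule("10.1.20.0/24", "10.2.20.0/24", "10.2.20.1", ("10.2.0.10",))
    print(dr_settings("10.1.20.57", rule))
    # {'ip': '10.2.20.57', 'prefix_len': 24, 'gateway': '10.2.20.1', 'dns': ['10.2.0.10']}
```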

When choosing a data protection solution for this role, an essential capability to look for is automated network reconfiguration of the replicated virtual machines.

Reconfiguring these items manually would be painstaking if tens or even hundreds of VMs in the DR facility need to be brought online. A data protection solution that performs replication, automatically reconfigures the networking of the replicated virtual machines, and remaps their virtual switch connections takes the heavy lifting out of this tedious task during a DR situation.
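At scale, the same mappings are best applied as a plan over all replicas, so that gaps are caught before a failover rather than during one. The sketch below maps each VM's production port groups to DR port groups and reports any network with no mapping. The `ReplicaVm` record, the function name, and the port group names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReplicaVm:
    name: str
    port_groups: Tuple[str, ...]  # production port group of each NIC, in order


def plan_network_failover(
    vms: List[ReplicaVm], pg_map: Dict[str, str]
) -> Tuple[Dict[str, List[str]], List[Tuple[str, str]]]:
    """Map every replica's NICs to DR port groups in one pass.

    Returns (plan, gaps): plan holds VM -> DR port groups for fully mapped
    VMs; gaps lists (VM, production port group) pairs that have no mapping."""
    plan: Dict[str, List[str]] = {}
    gaps: List[Tuple[str, str]] = []
    for vm in vms:
        targets = []
        for pg in vm.port_groups:
            dr_pg = pg_map.get(pg)
            if dr_pg is None:
                gaps.append((vm.name, pg))
            else:
                targets.append(dr_pg)
        if len(targets) == len(vm.port_groups):
            plan[vm.name] = targets
    return plan, gaps


if __name__ == "__main__":
    replicas = [ReplicaVm("web01", ("PG-Prod-Web",)),
                ReplicaVm("db01", ("PG-Prod-DB", "PG-Prod-Backup"))]
    mapping = {"PG-Prod-Web": "PG-DR-Web", "PG-Prod-DB": "PG-DR-DB"}
    plan, gaps = plan_network_failover(replicas, mapping)
    print(plan)  # {'web01': ['PG-DR-Web']}
    print(gaps)  # [('db01', 'PG-Prod-Backup')]
```

Running such a check during DR testing surfaces missing mappings while there is still time to fix them.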

Verify Backups

A major concern for VMware vSphere administrators is making sure the backups of the vSphere environment are valid and the data in them is intact. The worst thing that can happen is discovering that a backup is corrupted during a real disaster recovery situation!

Backup verification is a step that often gets missed or overlooked in day-to-day data protection tasks. A solution that automates this process helps ensure that backups are usable and contain good data, a critical step in strengthening any data protection plan.
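One basic form of automated verification is checksum comparison: record a SHA-256 hash of every backup file when it is written, then recompute and compare later. The minimal sketch below assumes backups are stored as files in a directory; the function names are hypothetical. A checksum match only shows the data has not changed since it was written; it does not prove the VM will boot, which is why fuller verification may also include a test restore.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(backup_dir: Path) -> Dict[str, str]:
    """Record a checksum for every file when the backup is written."""
    return {str(p.relative_to(backup_dir)): sha256_file(p)
            for p in sorted(backup_dir.rglob("*")) if p.is_file()}


def verify_backup(backup_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the files that are missing or whose checksum no longer matches."""
    problems = []
    for rel, expected in manifest.items():
        p = backup_dir / rel
        if not p.is_file() or sha256_file(p) != expected:
            problems.append(rel)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "app01.vmdk").write_bytes(b"disk contents")
        manifest = build_manifest(root)                  # taken at backup time
        (root / "app01.vmdk").write_bytes(b"bit rot!")   # simulate corruption
        print(verify_backup(root, manifest))             # ['app01.vmdk']
```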

Automatic Backups, Replication, Verification, and Network Automation

All of the above points underscore the need for a data protection solution that provides automatic backups, replication, backup verification, and network automation for DR reconfiguration.

BDR Suite allows organizations to do all of the above effectively and efficiently, helping VMware vSphere administrators carry out both day-to-day operations and BC/DR (business continuity and disaster recovery) duties.

The BDR Suite solution provides:

Support for the latest release of VMware vSphere clustering features
Automated Backups with intelligent scheduling
Efficient Replication operations along with replica seeding
Automatic backup verification to ensure backups are valid
Network Automation for Replica failover network reconfiguration

With these and many other features, BDR Suite helps organizations meet VMware vSphere Disaster Recovery Best Practice objectives.


About the Author: Brandon Lee
