Organizations today have more options than ever for keeping multiple copies of their data in different places. Multiple copies have been fundamental to data protection from the start: they ensure that a good copy of your data always exists somewhere other than the production location, in keeping with the 3-2-1 best practice rule for data protection (three copies of the data, on two different types of media, with one copy offsite).
With the advent of software-defined technologies, there are new ways to keep data in multiple locations. One is the software-defined stretched cluster, which keeps copies of data in software-defined datastores at two locations. There is also the more traditional approach of replicating virtual machines to a secondary location such as a DR facility.
In this post, we will compare software-defined stretched clusters and VM replication and look at the advantages and disadvantages of each approach to keeping multiple copies of data for high availability.
Software-defined Stretched Clusters
In the world of software-defined stretched clusters, the software-abstracted storage layer affords many powerful capabilities. Taking VMware vSAN as an example, a vSAN stretched cluster is a straightforward way to keep multiple copies of the objects a virtual machine needs to stay available in more than one location.
VMware vSAN is a specialized object store with the built-in capability to create stretched clusters that span more than one physical location. In the simplest configuration, the two-node vSAN stretched cluster, two nodes hold the data for the vSAN datastore and a witness acts as the tie-breaker that establishes quorum for VM availability. In this configuration, the VM data objects are protected with a RAID 1 mirror. There are two fault domains: a preferred fault domain and a secondary fault domain.
Data is continuously synchronized between the two data nodes, so every write lands on both the preferred host and the secondary host. If the preferred host fails, the secondary host has all the virtual machine data and the virtual machine can be restarted in the secondary fault domain. Stretched cluster functionality is built natively into vSAN and is configured through the vSphere web client, so it can be set up from within the vSphere infrastructure without any special storage configuration outside the vSphere environment.
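The role of the witness as a tie-breaker can be illustrated with a small sketch. It assumes a simple majority rule in which a mirrored object stays accessible while more than half of its three voting pieces (the two data replicas and the witness) are reachable. The `Component` class and the function names are hypothetical and are not part of any vSAN API.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One voting piece of a mirrored VM object (illustrative only)."""
    name: str
    reachable: bool


def object_is_accessible(components):
    """Return True when a strict majority of components is reachable."""
    reachable = sum(1 for c in components if c.reachable)
    return reachable > len(components) / 2


def example_states():
    """Evaluate a few failure scenarios for a two-site mirror plus witness."""
    scenarios = {
        "all healthy": (True, True, True),
        "preferred site down": (False, True, True),
        "witness down": (True, True, False),
        "preferred site and witness down": (False, True, False),
    }
    results = {}
    for label, (preferred, secondary, witness) in scenarios.items():
        components = [
            Component("preferred replica", preferred),
            Component("secondary replica", secondary),
            Component("witness", witness),
        ]
        results[label] = object_is_accessible(components)
    return results


if __name__ == "__main__":
    for label, ok in example_states().items():
        print(f"{label}: {'accessible' if ok else 'unavailable'}")
```

Under this assumed rule, losing any single piece leaves the object accessible, while losing a data site together with the witness does not, which is why the witness is normally placed apart from both data sites.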
This means that with little effort administrators can configure synchronous storage that spans different fault domains and provides resiliency against an entire site failure. It also means that both failovers and failbacks are considerably easier than failing over to replicas maintained by replication. Replicas are point-in-time copies that are never completely up to date with production. They may be close, but they are not exact: they contain the data as of the last replication interval, which is generally set by the RPO defined in the SLA. RPOs differ from one business to the next depending on how much data loss the business can withstand. With VM replica failbacks, the challenge is reversing the process and replicating data from the secondary location back to the primary production environment. Not so with the stretched cluster, as the data is synchronously kept up to date in both locations.
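To make the link between the replication interval and the RPO concrete, here is a minimal sketch under a simplified model. It assumes each replication cycle captures the VM state at the start of the cycle and that the replica only becomes usable once the transfer has finished. The function names and the example figures are illustrative assumptions and do not describe any particular product.

```python
from datetime import timedelta


def worst_case_data_loss(interval, transfer_time):
    """Upper bound on lost changes under the simplified model.

    A failure just before a transfer completes falls back to the previous
    cycle's replica, so the exposure is one interval plus one transfer.
    """
    if transfer_time > interval:
        raise ValueError("transfer must finish before the next cycle starts")
    return interval + transfer_time


def meets_rpo(interval, transfer_time, rpo):
    """Check whether the modeled worst case stays within the RPO target."""
    return worst_case_data_loss(interval, transfer_time) <= rpo


if __name__ == "__main__":
    interval = timedelta(minutes=15)   # assumed replication schedule
    transfer = timedelta(minutes=3)    # assumed time to ship the changes
    print(worst_case_data_loss(interval, transfer))
    print(meets_rpo(interval, transfer, rpo=timedelta(minutes=30)))
```

The point of the sketch is that a 15-minute schedule does not by itself guarantee a 15-minute RPO in this model; the time needed to ship the changes counts as well.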
A great advantage of a stretched cluster is that its hardware can be used actively: the secondary fault domain hardware can carry running workloads. This helps to justify the expense of housing hardware in a DR or secondary location.
There are drawbacks to the stretched cluster configuration from a latency and bandwidth perspective. Generally speaking, stretched clusters need no more than 5 milliseconds of latency between sites to be a viable solution, which rules out extremely long-distance stretched clusters. Because of this restriction, stretched clusters cannot provide geographic diversity for data protection; the sites are typically connected over a metropolitan area network or sit on the same campus.
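As a rough illustration of how the latency limit shapes the design decision, the sketch below checks a set of measured inter-site latency samples against the 5 millisecond figure quoted above. The function names, the use of the worst sample as the deciding value, and the sample numbers are assumptions made for this example; a real assessment would also look at bandwidth and at the vendor's current requirements.

```python
MAX_STRETCHED_LATENCY_MS = 5.0  # figure quoted in the text above


def link_supports_stretched_cluster(latency_samples_ms,
                                    limit_ms=MAX_STRETCHED_LATENCY_MS):
    """Return True if even the worst measured sample is within the limit."""
    if not latency_samples_ms:
        raise ValueError("need at least one latency sample")
    return max(latency_samples_ms) <= limit_ms


def suggest_approach(latency_samples_ms):
    """Map measured inter-site latency to a candidate protection approach."""
    if link_supports_stretched_cluster(latency_samples_ms):
        return "stretched cluster is an option"
    return "use asynchronous VM replication"


if __name__ == "__main__":
    metro_link = [1.2, 1.8, 2.4]       # hypothetical same-metro samples
    long_haul_link = [38.0, 41.5, 44.2]  # hypothetical cross-country samples
    print(suggest_approach(metro_link))
    print(suggest_approach(long_haul_link))
```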
Advantages:
- Easy configuration – in the box with VMware vSAN
- Provides synchronization of VM objects between the data hosts in both fault domains
- Extremely easy failovers and failbacks
- Allows making use of the hardware at the secondary location
Disadvantages:
- Distance limitations – sites typically need to be in the same metro area or on the same campus
- Does not provide geographic diversity against disasters that affect the whole region
Replicated VMs
Virtual machine replication has long been used to keep “warm” VMs in a DR or secondary location that provide resiliency in the case of a total site-level failure. When replication is first configured, it performs a full copy of the virtual machine to the secondary location. With each following replication interval, only the changes are synchronized to the target replica. For all intents and purposes, the resulting virtual machine is an exact copy of the production virtual machine as of the last replication: the replica VM contains all of the data up to the last replicated changes.
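The full-copy-then-incremental pattern described above can be sketched in a few lines. This is a toy model that treats a virtual disk as a list of blocks and finds changes by comparing against the last synchronized state; the function names are hypothetical, and real products typically rely on a change-tracking facility in the hypervisor rather than comparing blocks. Disk shrinking is not handled.

```python
def full_copy(source_blocks):
    """Initial seeding: copy every block to the replica."""
    return list(source_blocks)


def changed_blocks(previous, current):
    """Return {index: data} for blocks that are new or differ since last sync."""
    return {
        i: block
        for i, block in enumerate(current)
        if i >= len(previous) or previous[i] != block
    }


def apply_changes(replica, changes):
    """Apply an incremental change set to the replica in place."""
    for index, data in sorted(changes.items()):
        if index < len(replica):
            replica[index] = data
        else:
            replica.append(data)
    return replica


def demo():
    """Seed a replica, change production, then ship only the delta."""
    production = [b"boot", b"app", b"data-v1"]
    replica = full_copy(production)
    last_synced = list(production)

    production[2] = b"data-v2"      # a block is overwritten
    production.append(b"logs")      # the disk grows by one block

    delta = changed_blocks(last_synced, production)
    apply_changes(replica, delta)
    return delta, replica, production


if __name__ == "__main__":
    delta, replica, production = demo()
    print(f"blocks shipped: {sorted(delta)}")
    print(f"replica matches production: {replica == production}")
```

Only the blocks in the change set cross the wire after the initial seeding, which is why replication copes well with limited bandwidth; the replica matches production only until the next write lands.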
As a result, with VM replication the business has to decide how much data it is willing to lose and configure the replication interval accordingly. As mentioned in the stretched cluster section, this is a disadvantage of VM replicas compared with stretched cluster VM objects, which always have up-to-date data at both locations.
The failover and failback processes are also much more complicated to carry out correctly with VM replication. After a successful failover, the process has to be performed in reverse to get the active VM data back to the preferred production location.
Hardware deployed as a passive “standby” replication target does nothing other than receive the replicated data at each replication interval. The disadvantage is that it is typically a much harder sell to convince management to purchase additional hardware that would, in theory, only be used during a production site failure.
Compared with stretched clusters, replication is not bound by strict latency requirements. Replicating virtual machines is an asynchronous process, so it is far more lenient from a network perspective and much better suited to environments with limited bandwidth or high-latency connections between sites. It also provides a great way to store VM replicas in different geographic locations, which is generally not possible with stretched cluster configurations. Nothing prevents replication from being used in conjunction with stretched clustering to add resiliency and diversify the geographic location of VM copies. In this sense, even when software-defined storage technologies such as vSAN are in use, replication can be a complementary technology on top of a stretched cluster configuration.
Advantages:
- VM replication is not bound by strict latency and bandwidth requirements
- Able to provide geographic diversity of data
- Can be used in conjunction with stretched clustering to provide additional geo-data diversity
Disadvantages:
- Replicated VMs always lag production by up to the replication interval (the RPO)
- Failovers and failbacks are more complicated
- Hardware for replicated environments is passive and generally not utilized
Thoughts
With the new options that software-defined solutions make available, organizations must weigh the advantages and disadvantages of each solution and then make decisions based on the needs of the business.
Having multiple copies of virtual machine data in virtualized environments is necessary for effective business continuity and disaster recovery. Both software-defined stretched clusters and VM replication provide a mechanism for keeping multiple copies of VM data in different locations, and each has advantages and disadvantages to consider.
As a whole, stretched clustering technology from the likes of VMware vSAN is a very viable way to provide this functionality. However, it has strict latency and bandwidth requirements that must be taken into consideration before deploying. Replicating virtual machines to a secondary location is the traditional, dependable approach and is not bound by those strict latency requirements. Replication can also be used in conjunction with software-defined storage technologies such as VMware’s vSAN.
Organizations will most likely want to take a holistic approach and use both technologies together as complementary rather than competing. Even so, the characteristics and best use cases of each are worth noting when designing data protection and high-availability mechanisms for business-critical data.