With today’s high-speed WAN links and the powerful Windows Failover Clustering features that underpin Hyper-V environments, organizations have many capabilities at their fingertips for solving a range of business problems and use cases.
Microsoft supports a configuration of Windows Failover Clustering called Geographically Dispersed Clusters (majority node set). They are also sometimes referred to as “Stretched Clusters” or “Geo-Clusters”. This cluster configuration allows member nodes in Windows Clusters to exist in different sites from one another.
Table of Contents
- Microsoft Hyper-V Multi-Site Clusters Use Cases
- Microsoft Hyper-V Geographically Dispersed Clusters Challenges
- Geographically Dispersed Clusters Networking Considerations
- Windows Server 2016 Site-Aware Failover Clusters
- Concluding Thoughts
A stretched cluster opens up some unique use cases for customers. However, these powerful configuration options require careful planning and coordination with network teams, storage teams, and hardware vendors. Network and storage requirements must be satisfied, and the hardware configuration must be compatible with the geographically dispersed configuration.
Let’s take a closer look at Hyper-V multi-site clusters, use cases, and the unique configuration considerations involved in provisioning.
Microsoft Hyper-V Multi-Site Clusters Use Cases
Clustering technology is not new, and it is certainly not a new concept in the world of Windows Server. However, when people talk about Windows Server clusters, they generally mean nodes in the same location. Most of us are familiar with two, three, or more nodes in the same server rack participating in a Windows Server cluster.
Why would you want to scale a Windows Server cluster across multiple sites?
What benefit would you gain by doing so?
Traditionally, organizations will have a cluster that resides in the primary datacenter and a secondary cluster that exists in a DR facility that receives replicated virtual machines ready to be powered on in case of a site failure in the production location. This is known as an active-passive configuration.
However, with a geographically dispersed cluster, the cluster nodes exist in multiple geographic locations but are a part of the same “logical” Windows Server cluster.
The primary advantage of the geographically dispersed cluster is the ability to seamlessly migrate virtual machines from one geographic location to another. This makes the stretched cluster a good fit for locations that are prone to natural disasters.
If a looming natural disaster such as a hurricane is headed for a specific location, business-critical virtual machines can simply be live-migrated to another host in the stretched cluster.
The alternative solution is Hyper-V Replica, which replicates virtual machines to another Hyper-V host or cluster, where they wait as warm standbys. Hyper-V Replica is generally preferable to a stretched cluster in terms of complexity and management overhead, since the two serve the same purpose for most applications.
Generally speaking, building high availability into the applications themselves is also smarter than solving the problem at the Windows Server Hyper-V cluster infrastructure level.
While traditional Windows Server clusters allow organizations to have redundancy at the application level, geographically dispersed clusters allow organizations to have real-time site resiliency.
Microsoft Hyper-V Geographically Dispersed Clusters Challenges
There are challenges to configuring geographically dispersed clusters that involve at least the three main areas mentioned at the outset: networking, storage, and hardware. Let’s take a look at these areas and the challenges specific to each.
Geographically Dispersed Clusters Networking Considerations
There are some pretty stringent requirements that must be kept in mind when thinking about geographically stretched clusters. These center on both networking and storage, and Microsoft documents specific requirements for geographically dispersed clusters and their design.
Windows Server 2016 Site-Aware Failover Clusters
Windows Server 2016 Failover Cluster technology has added to the features and overall functionality of the stretched cluster design by introducing site-aware failover clusters. With site-aware failover clusters, nodes that are part of the stretched cluster can be grouped based on their physical location or “site”. This greatly enhances key functionality such as failover, placement, heartbeats, and quorum behavior. The concept of a preferred site has also been introduced, which allows you to configure the preferred site for the placement of resources. This is a handy way to designate your primary datacenter location.
This is achieved by the new fault domain awareness in Windows Server 2016. Fault domains describe any set of hardware components that share a single point of failure. Defining fault domains allows organizations to provide chassis, rack, and site fault tolerance, which helps to achieve the cloud-like uptime demanded by today’s businesses.
Stretched clustering benefits from fault domains for storage affinity. Stretched clustering allows servers that are geographically distant to join the same cluster, so apps or VMs can be run on the servers closest to their storage. Fault domain awareness is what makes this storage affinity possible.
Additional technologies also benefit from fault domains, such as Storage Spaces and Storage Spaces Direct, which function like distributed RAID: multiple copies of data are kept in sync, and the surviving copies are used to recover from a hardware failure.
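To make the fault domain idea more concrete, here is a minimal Python sketch that models nodes with a site and a rack and shows which nodes remain if one fault domain fails entirely. It is only an illustration of the concept, not how Windows Server represents fault domains internally; the node names, site names, and rack names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A cluster node tagged with the fault domains it belongs to."""
    name: str
    site: str
    rack: str


def group_by_fault_domain(nodes, level):
    """Group node names by a fault-domain level ("site" or "rack")."""
    groups = defaultdict(list)
    for node in nodes:
        groups[getattr(node, level)].append(node.name)
    return dict(groups)


def survivors_after_loss(nodes, level, failed_domain):
    """Return the names of nodes left if one fault domain fails entirely."""
    return [n.name for n in nodes if getattr(n, level) != failed_domain]


# Hypothetical four-node stretched cluster spanning two sites.
NODES = [
    Node("hv01", site="Primary", rack="P-R1"),
    Node("hv02", site="Primary", rack="P-R2"),
    Node("hv03", site="DR", rack="D-R1"),
    Node("hv04", site="DR", rack="D-R1"),
]

if __name__ == "__main__":
    print(group_by_fault_domain(NODES, "site"))
    print(survivors_after_loss(NODES, "site", "Primary"))
```

In this example both DR nodes share a single rack, so losing that rack is equivalent to losing the whole DR site, which is exactly the kind of shared point of failure that defining fault domains is meant to expose.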
Site-Aware Failover Clusters also include the following benefits:
Failover Affinity:
- Groups failover to a node in the same site instead of failing over to a node in a different site
- Virtual machines are moved first to a node in the same site during a “drain” operation
- CSV load balancer distributes within the same site
- CSV load balancer is the mechanism that ensures CSV disks are evenly distributed across all cluster nodes
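The failover affinity behavior described above can be sketched in a few lines of Python. This is a simplified illustration of the "same site first" preference only; the real cluster placement logic considers many more factors, and the function name, node names, and alphabetical tie-break are assumptions made for the example.

```python
def pick_failover_target(failed_node, nodes_up, site_of):
    """Pick a failover target, preferring nodes in the failed node's site.

    failed_node: name of the node that went down
    nodes_up:    iterable of node names that are currently healthy
    site_of:     dict mapping every node name to its site name
    Returns the chosen node name, or None if no node is available.
    """
    candidates = [n for n in nodes_up if n != failed_node]
    if not candidates:
        return None
    home_site = site_of[failed_node]
    same_site = [n for n in candidates if site_of[n] == home_site]
    # Fall back to a node in another site only when no local node is left.
    # sorted() is just a deterministic tie-break for this sketch.
    return sorted(same_site or candidates)[0]


# Hypothetical site layout for a four-node stretched cluster.
SITE_OF = {"hv01": "Primary", "hv02": "Primary", "hv03": "DR", "hv04": "DR"}

if __name__ == "__main__":
    # hv02 is still up in the Primary site, so it is preferred.
    print(pick_failover_target("hv01", ["hv02", "hv03", "hv04"], SITE_OF))
    # No Primary node is left, so the group moves across sites.
    print(pick_failover_target("hv01", ["hv03", "hv04"], SITE_OF))
```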
Storage Affinity:
- VMs are moved to the site where their storage resides: they begin live migrating to the same site as their associated CSV 1 minute after the storage has been moved
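As a rough illustration of the storage affinity rule, the following Python sketch decides whether a VM should follow its CSV to another site. The one-minute delay comes from the description above; the function and constant names are hypothetical, and the real cluster service makes this decision internally.

```python
# Delay taken from the text: VMs follow their CSV 1 minute after it moves.
STORAGE_AFFINITY_DELAY_SECONDS = 60


def should_follow_storage(vm_site, csv_site, seconds_since_csv_move):
    """Return True if the VM should be live-migrated to its CSV's site."""
    in_wrong_site = vm_site != csv_site
    delay_elapsed = seconds_since_csv_move >= STORAGE_AFFINITY_DELAY_SECONDS
    return in_wrong_site and delay_elapsed


if __name__ == "__main__":
    # CSV moved to DR 30 seconds ago: too early, the VM stays put for now.
    print(should_follow_storage("Primary", "DR", 30))
    # 90 seconds after the move, the VM follows its storage.
    print(should_follow_storage("Primary", "DR", 90))
```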
Cross-Site Heartbeat Thresholds:
- This allows setting thresholds for cluster heartbeats between sites
- Two new properties have been introduced – CrossSiteDelay and CrossSiteThreshold to control the heartbeats between sites
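The two heartbeat properties combine in a simple way: the delay is the interval between heartbeats in milliseconds, and the threshold is the number of consecutive missed heartbeats that is tolerated. The Python sketch below works through that arithmetic. The values of 1000 ms and 20 are example inputs only, so check the actual settings on your own cluster; the function names are hypothetical.

```python
def tolerated_outage_ms(delay_ms, threshold):
    """Time without heartbeats before a node is considered down.

    delay_ms:  interval between heartbeats, in milliseconds
    threshold: number of consecutive missed heartbeats tolerated
    """
    return delay_ms * threshold


def survives_wan_blip(blip_ms, delay_ms, threshold):
    """Return True if a WAN interruption is shorter than the tolerated window."""
    return blip_ms < tolerated_outage_ms(delay_ms, threshold)


if __name__ == "__main__":
    # Example values only: a heartbeat every second, 20 misses tolerated.
    cross_site_delay_ms = 1000
    cross_site_threshold = 20
    print(tolerated_outage_ms(cross_site_delay_ms, cross_site_threshold))
    print(survives_wan_blip(5000, cross_site_delay_ms, cross_site_threshold))
```

With these example inputs, nodes in different sites can lose contact for up to 20 seconds before the cluster treats the remote node as down, so a 5-second WAN interruption would not trigger a failover. Raising either value makes the cluster more tolerant of a flaky inter-site link at the cost of slower detection of a genuine failure.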
Concluding Thoughts
Organizations today have the business need to think like public cloud providers. Making sure data, applications, and virtual machines are redundant at various levels is a must. With technologies such as geographically dispersed clusters and the new site-aware clusters in Windows Server 2016 that make use of fault domains, organizations are able to granularly define these failure points and architect Windows Server clusters accordingly. These technologies allow organizations to architect a more geographically resilient and diverse configuration for data and infrastructure.
The greater resiliency of a stretched cluster, however, comes with the downside of added complexity and management overhead. There are also much stricter hardware, network, and storage requirements that must be accounted for when designing the cluster.
The network and storage design must meet extremely strict latency requirements, and the private and public network connections between cluster nodes must appear as a single, non-routed LAN, using technologies such as virtual LANs (VLANs). These latency and non-routed network requirements can be challenging to implement. Organizations will certainly want to weigh a stretched cluster against the more traditional option of Hyper-V Replica, as the choice will come down to individual business needs and each business application’s ability to natively support failover.