Windows Server Hyper-V clusters are built on top of Windows Server Failover Clustering, which provides the underlying mechanism for high availability and resource distribution for Hyper-V. Failover Clustering protects production workloads running across multiple hosts by requiring a majority of the voting cluster members to agree on the state of the cluster. This mechanism is called quorum.
Table of Contents
- What is Windows Server Failover Split Brain?
- Windows Server Failover Cluster Quorum
- New Windows Server Failover Cluster Quorum Capabilities
- Configuring Windows Server Failover Cluster Quorum
- Concluding Thoughts
In this post, we will take a look at Windows Server Failover Clustering quorum and how it relates to Hyper-V.
- What is Quorum?
- How does it protect Windows Failover Cluster resources?
- How might this look in a Windows Server Failover Cluster running Hyper-V?
What is Windows Server Failover Split Brain?
In a Windows Server Failover Cluster configuration, you want your clusters to be protected against a “split-brain” scenario.
What is split-brain?
Split-brain, as the name suggests, is a situation in which more than one node in a cluster believes it is in control of a cluster resource when only one node should be. It arises when a cluster becomes “partitioned”: the nodes are separated from one another, generally by a network connectivity issue. The hosts on each side of the partition cannot communicate with the other side, so each may assume the other hosts have failed and that it needs to take control of their resources.
As you can imagine, two hosts fighting over shared resources can cause many issues. In a Hyper-V cluster, the resources in question are virtual machines. Fortunately, in most situations only one host will be able to gain access to a virtual machine disk. However, split-brain can still trigger repeated failover and failback, with resources flapping up and down. In a two-node cluster where the hosts cannot see each other, each Hyper-V host will assume the other has failed and attempt to take ownership of the virtual machine resources.
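To make the failure mode concrete, here is a minimal Python sketch of a partitioned two-node cluster with no quorum check at all. The host names and the `claim_owners` helper are hypothetical and only model the reasoning above; they are not how the cluster service is implemented.

```python
def claim_owners(partitions):
    """Return the node from each partition that claims the virtual machine.

    No quorum check is applied: every non-empty partition assumes the
    other side has failed and brings the VM online on its first node.
    """
    return [sorted(nodes)[0] for nodes in partitions if nodes]


# Two Hyper-V hosts (hypothetical names) that can no longer see each other.
partitions = [{"HV1"}, {"HV2"}]
owners = claim_owners(partitions)

print(owners)
print("split-brain" if len(owners) > 1 else "single owner")
```

With both hosts in one healthy partition, the same function returns a single owner. The problem only appears once the partition isolates the hosts from each other.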
Windows Server Failover Cluster Quorum
To resolve the Windows Server Failover Cluster split-brain scenario, Microsoft has introduced a mechanism called Quorum which allows Windows Server Failover Clusters to resolve the potential issue of an “even vote” among Cluster nodes.
How does Quorum resolve these issues?
In terms of human group functions, quorum is defined as the “minimum number of members of an assembly or society that must be present at any of its meetings to make the proceedings of that meeting valid”. In the same sense, a Windows Server Failover Cluster hosting the Hyper-V role requires a minimum number of voting members to be present before its nodes may serve highly available virtual machines, which keeps those virtual machines from being exposed to a split-brain scenario.
A common misconception about cluster quorum is that a cluster will stop running once so many nodes have failed that the remaining nodes cannot host all of the workloads without being overcommitted. In fact, the cluster does not take capacity limitations into consideration when determining whether it stays operational.
As mentioned earlier, quorum is designed for situations where network communication fails between sets of cluster nodes. It ensures that two servers do not try to host the same resource and write to the same disk at the same time, which can lead to corruption. Quorum is in place to ensure there is only one owner of a particular resource at a time. Having quorum is based on a majority of voters in a Windows Failover Cluster: the voting algorithm requires that more than half of the voters be online and able to communicate with one another.
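The majority rule itself is simple arithmetic. The following Python sketch is an illustration only (the function name `has_quorum` is our own, not a cluster API). It shows why a 3/2 split of a five-node cluster leaves exactly one side running, while a 2/2 split of a four-node cluster leaves neither side with a majority.

```python
def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """True when strictly more than half of all configured voters are reachable."""
    return 2 * reachable_voters > total_voters


# Five voters partitioned 3 / 2: only the larger side keeps quorum.
five_node_split = [has_quorum(3, 5), has_quorum(2, 5)]

# Four voters partitioned 2 / 2: an even split, so neither side has a majority.
four_node_split = [has_quorum(2, 4), has_quorum(2, 4)]

print(five_node_split)
print(four_node_split)
```

Because at most one partition can ever hold more than half of the votes, at most one partition can own a given resource.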
There are a number of ways quorum can be determined in a Windows Server Failover Cluster. The oldest, traditional mechanism used a single disk witness to determine quorum (the disk-only mode). As long as a Windows Server Failover Cluster node is able to reach this witness disk, it is able to remain online. However, this method of determining quorum is no longer recommended because the disk is a single point of failure. If the witness disk goes offline, no node has quorum and all nodes are forced offline.
The more common and recommended quorum mechanism is node majority. As you might have already discerned, under node majority a node can remain online as long as it can see and communicate with a majority of the nodes in the cluster, counting itself. This works cleanly when the Windows Server Failover Cluster has an odd number of hosts. Clusters with an even number of nodes can add a disk or file share witness that serves as a tiebreaker for the node majority set.
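The tiebreaker effect of a witness can be sketched by counting it as one extra vote. This is a simplified model under our own assumptions: the witness is worth exactly one vote and only one partition can hold it at a time. The function and parameter names are hypothetical.

```python
def partition_has_quorum(nodes_in_partition: int,
                         total_nodes: int,
                         witness_configured: bool = False,
                         holds_witness: bool = False) -> bool:
    """Majority check for one partition, counting the witness as one extra vote."""
    total_votes = total_nodes + (1 if witness_configured else 0)
    votes = nodes_in_partition
    if witness_configured and holds_witness:
        votes += 1
    return 2 * votes > total_votes


# Four-node cluster split 2 / 2 with no witness: nobody has a majority.
no_witness = partition_has_quorum(2, 4)

# Same split with a witness: the side that reaches the witness has 3 of 5 votes.
with_witness = partition_has_quorum(2, 4, witness_configured=True, holds_witness=True)
without_witness = partition_has_quorum(2, 4, witness_configured=True, holds_witness=False)

print(no_witness, with_witness, without_witness)
```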
New Windows Server Failover Cluster Quorum Capabilities
Several quorum capabilities have been added to Windows Server Failover Clustering over the past several iterations of Windows Server. These include the following:
- Windows Server 2012 – dynamic quorum
- Windows Server 2012 R2 – dynamic witness, which lets the cluster add or remove the witness vote automatically
- Windows Server 2016 – ability to create and use a Cloud Witness
Dynamic quorum
Dynamic quorum is the ability of the Windows Failover Cluster service to adjust the votes of the remaining active nodes so that quorum can be maintained in the event of yet another node failure or shutdown. Starting with Windows Server 2012, dynamic quorum is enabled by default. This is helpful in situations where cluster membership changes frequently. When the remaining votes would otherwise be tied, dynamic quorum strips the vote from one node so that the total number of votes stays odd. Microsoft recommends always configuring a witness, such as a disk or file share witness, when deploying clusters on Windows Server 2012 R2 and higher.
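The following Python sketch contrasts static and dynamic voting as nodes fail one at a time. It is a simplified model, not the cluster service's actual algorithm. In particular, the choice of which node loses its vote when the count is even (here, the last node in the list) is an arbitrary assumption for illustration, and no witness is modeled.

```python
def rebalance(votes, alive):
    """Give every running node a vote, then strip one vote if the count is even."""
    for node in alive:
        votes[node] = 1
    if alive and len(alive) % 2 == 0:
        votes[alive[-1]] = 0  # assumption: the last listed node loses its vote


def simulate(nodes, failure_order, dynamic=True):
    """Return the nodes still running, or [] once the survivors lose quorum."""
    votes = {node: 1 for node in nodes}
    alive = list(nodes)
    if dynamic:
        rebalance(votes, alive)
    for failed in failure_order:
        alive.remove(failed)
        reachable = sum(votes[node] for node in alive)
        total = sum(votes.values())
        if 2 * reachable <= total:
            return []  # survivors no longer hold a majority of the votes
        if dynamic:
            votes[failed] = 0  # the departed node no longer counts
            rebalance(votes, alive)
    return alive


nodes = ["N1", "N2", "N3", "N4"]
static_result = simulate(nodes, ["N4", "N3"], dynamic=False)
dynamic_result = simulate(nodes, ["N4", "N3", "N2"], dynamic=True)
unlucky_result = simulate(nodes, ["N4", "N3", "N1"], dynamic=True)

print(static_result, dynamic_result, unlucky_result)
```

In this model the static cluster stops once only two of four voters remain, while the dynamic cluster can shrink to a single node. The last case shows the catch: with two nodes left and one vote stripped, the cluster survives only if the node that kept its vote is the one still running.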
Cloud Witness
Cloud Witness is a type of Windows Server Failover Cluster quorum witness that was introduced in Windows Server 2016. The Cloud Witness functionality allows organizations to take advantage of Microsoft Azure as the quorum arbitration point. By reading and writing a blob file in Azure Blob storage, Windows Failover Clusters can arbitrate quorum and resolve a split-brain situation. This configuration can be especially valuable in a multi-site stretched Windows Server Failover Cluster where nodes are stretched across datacenters.
An example may be an organization with two datacenters and Windows Server Failover Cluster hosts in each datacenter participating in a stretched cluster. It is recommended to host the quorum witness outside of the two datacenters that contain the participating cluster nodes, which would normally require a third, separate datacenter. If an organization does not have a third datacenter, or does not want to take on the overhead of hosting a file share witness on a separate highly available file server, the Azure-backed cloud witness is a powerful and practical option that provides the “tie-breaker” quorum vote in a Windows Server Failover Cluster.
This provides many advantages, including:
- Uses Azure infrastructure for the cloud witness
- Azure Blob storage is used which minimizes cost
- The same Azure Storage account can be used for multiple clusters – stored as separate blob files per cluster
- This is a built-in cloud witness type with Windows Server 2016
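The reason for placing the witness outside both datacenters can be checked with the same vote arithmetic. The sketch below is a simplified model with hypothetical site names "A" and "B". It assumes one vote per node and one vote for the witness, and ignores dynamic quorum adjustments.

```python
def surviving_site_has_quorum(nodes_per_site, failed_site, witness_location):
    """Check whether the surviving site keeps quorum after a full site outage.

    witness_location is a site name, or "cloud" for a witness hosted
    outside both datacenters.
    """
    total_votes = sum(nodes_per_site.values()) + 1  # nodes plus one witness vote
    votes = sum(count for site, count in nodes_per_site.items()
                if site != failed_site)
    if witness_location != failed_site:
        votes += 1  # the witness is still reachable from the surviving site
    return 2 * votes > total_votes


sites = {"A": 2, "B": 2}

# Witness hosted in site A, and site A goes down: 2 of 5 votes remain.
witness_in_a = surviving_site_has_quorum(sites, failed_site="A", witness_location="A")

# Witness hosted outside both sites: either site can fail and 3 of 5 votes remain.
cloud_a_down = surviving_site_has_quorum(sites, failed_site="A", witness_location="cloud")
cloud_b_down = surviving_site_has_quorum(sites, failed_site="B", witness_location="cloud")

print(witness_in_a, cloud_a_down, cloud_b_down)
```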
Configuring Windows Server Failover Cluster Quorum
In general, letting the Windows Server Failover Cluster configure quorum automatically is the best choice for most environments. However, you can also configure the quorum mechanism manually. In Failover Cluster Manager, right-click the cluster name and choose More Actions >> Configure Cluster Quorum Settings to launch the Configure Cluster Quorum Wizard.
By default, the quorum configuration for a Windows Server Failover Cluster is determined automatically, which lets the cluster control these settings.
In the Voting Configuration, you can choose which nodes can participate in quorum voting.
On the Select Quorum Witness configuration screen, you can select the witness options for the cluster including:
- Disk witness
- File share witness
- Cloud witness
- No quorum witness
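The same witness choices can also be scripted. The Python sketch below only builds command strings and does not run anything. It assumes the `Set-ClusterQuorum` cmdlet from the FailoverClusters PowerShell module and its `-DiskWitness`, `-FileShareWitness`, `-CloudWitness` (with `-AccountName` and `-AccessKey`) and `-NoWitness` parameters, so verify these against the documentation for your Windows Server version. The disk name, share path, storage account and key shown are hypothetical placeholders.

```python
def quorum_command(witness: str, **options: str) -> str:
    """Build a Set-ClusterQuorum command line for the chosen witness type.

    Parameter names are assumed from the FailoverClusters module;
    nothing is executed here.
    """
    if witness == "disk":
        return f'Set-ClusterQuorum -DiskWitness "{options["disk"]}"'
    if witness == "file_share":
        return f'Set-ClusterQuorum -FileShareWitness "{options["share"]}"'
    if witness == "cloud":
        return ('Set-ClusterQuorum -CloudWitness '
                f'-AccountName "{options["account"]}" '
                f'-AccessKey "{options["key"]}"')
    if witness == "none":
        return "Set-ClusterQuorum -NoWitness"
    raise ValueError(f"unknown witness type: {witness}")


# Hypothetical resource names, for illustration only.
disk_cmd = quorum_command("disk", disk="Cluster Disk 1")
share_cmd = quorum_command("file_share", share=r"\\fileserver\witness")
cloud_cmd = quorum_command("cloud", account="examplestorage", key="<access-key>")
none_cmd = quorum_command("none")

for command in (disk_cmd, share_cmd, cloud_cmd, none_cmd):
    print(command)
```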
Concluding Thoughts
Windows Server Failover Cluster technology backing Microsoft Hyper-V provides a powerful mechanism to withstand failures, provide high availability, distribute resources, and protect against “split-brain” scenarios. Split-brain is a very real problem that can lead to very odd cluster behavior and possibly corruption of data. With the quorum functionality found in Windows Server Failover Clustering, Microsoft has provided a way for cluster nodes to determine whether a node is able to participate in the Failover Cluster. The node majority mechanism is a simple way for clusters with an odd number of hosts to look at the majority of reachable nodes and determine whether quorum is met. With new versions of Windows Server, Microsoft has introduced functionality such as dynamic quorum and the cloud witness, which extends quorum to meet the demands of today’s multi-datacenter stretched clusters and maintain the availability of Failover Cluster resources.