In the last VMware for Beginners blog, we discussed vSphere Proactive HA. In this final blog post about High Availability in vSphere, we will learn about and discuss vSphere Fault Tolerance (FT).
vSphere Fault Tolerance is a big subject, and it would take more than two blog posts to explain all of its configurations, features, and use cases.
To keep it simple, these two blog posts focus only on what it is, how it works, and how to configure a simple FT-protected VM and test a failover.
What will we discuss in this vSphere Fault Tolerance post?
- What is vSphere Fault Tolerance?
- How does vSphere Fault Tolerance work?
- vSphere Fault Tolerance restrictions
- vSphere Fault Tolerance requirements
What is vSphere Fault Tolerance
vSphere Fault Tolerance is a feature of VMware’s vSphere virtualization platform that provides continuous availability for virtual machines (VMs). It creates a secondary copy, or “shadow instance,” of a running VM that is synchronized with the primary instance in real-time.
The secondary instance is kept in lockstep with the primary instance using a technology called vLockstep, which mirrors all of the actions taken on the primary VM to the secondary VM. If the primary VM fails for any reason, the secondary VM seamlessly takes over without disrupting the applications or services running on it.
This provides a higher level of availability than traditional failover solutions, which typically require some amount of downtime during the failover process. With vSphere Fault Tolerance, there is no need for manual intervention or restarts, ensuring that critical applications and services remain available to end-users at all times.
With vSphere Fault Tolerance, you can create a live replica of your virtual machine, and of the applications running in it, and achieve zero downtime with full High Availability.
By eliminating even the smallest disruptions caused by server hardware failures, vSphere Fault Tolerance helps business-critical applications stay highly available. In the event of server failure, VMware Fault Tolerance provides instantaneous, non-disruptive failover, protecting organizations from even the smallest interruption or data loss when downtime costs can reach thousands of dollars.
VMware Fault Tolerance also provides continuous availability for critical applications. When hardware fails, applications continue to run without interruptions, user disconnections, or data loss due to automatic failure detection and seamless failover. Even homegrown and custom applications can be protected by VMware Fault Tolerance, ensuring continuous availability.
How does vSphere Fault Tolerance work
vSphere Fault Tolerance works by creating and maintaining a synchronized copy of a running virtual machine (VM) on a secondary host. This secondary VM, or “shadow instance,” is continuously synchronized with the primary VM using a technology called vLockstep.
When vSphere Fault Tolerance is enabled for a VM, the primary VM and its shadow instance are kept on separate hosts in the vSphere cluster. The shadow instance is continuously synchronized with the primary VM in real time, mirroring all of its CPU and memory operations.
If the primary VM fails for any reason, such as a hardware failure or an operating system crash, the shadow instance takes over seamlessly, without any disruption to the applications or services running on it. This is because the shadow instance has the same state as the primary VM, including its CPU and memory contents, and can immediately continue processing from where the primary VM left off.
The takeover process is automatic and transparent to end-users, without manual intervention or restarts. Once the secondary VM takes over, it becomes the new primary VM, and a new shadow instance is created on another host in the cluster to ensure continuous availability.
vSphere Fault Tolerance provides a higher level of availability than traditional failover solutions, which may incur some downtime during the failover process. By keeping a continuously synchronized copy of the primary VM, vSphere Fault Tolerance ensures that critical applications and services remain available to end-users at all times.
The following image shows how vSphere Fault Tolerance works in your infrastructure.
The next image shows an example of vSphere Fault Tolerance failover. When an ESXi host has a problem, or the Virtual Machine stops working, vSphere Fault Tolerance automatically puts the Secondary VM online, promotes it to Primary VM, and creates a new Secondary VM on the next available ESXi host.
The vSphere Fault Tolerance workflow then rebuilds the mirror, so the cluster again has a Primary VM and a new Secondary VM.
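The failover workflow described above can be sketched as a small Python model. This is only an illustration of the promote-then-rebuild logic, not VMware code: the `FtPair` class, the `fail_over` function, and the host names are hypothetical, and handling of a failed Secondary host is an assumption added for completeness.

```python
from dataclasses import dataclass, field


@dataclass
class FtPair:
    """A protected VM: which host runs the Primary and which runs the Secondary."""
    vm_name: str
    primary_host: str
    secondary_host: str
    history: list = field(default_factory=list)


def fail_over(pair: FtPair, failed_host: str, healthy_hosts: list) -> FtPair:
    """Model the workflow: promote the Secondary, then rebuild a new Secondary."""
    if failed_host == pair.primary_host:
        # The Secondary is already in lockstep, so it simply becomes the Primary.
        survivor = pair.secondary_host
        pair.history.append(f"promoted Secondary on {survivor} to Primary")
    elif failed_host == pair.secondary_host:
        # Assumption: the Primary keeps running and only protection is restored.
        survivor = pair.primary_host
        pair.history.append(f"Primary on {survivor} unaffected")
    else:
        return pair  # the failure did not touch this FT pair

    # Primary and Secondary must never share a host, so skip the survivor's host.
    candidates = [h for h in healthy_hosts if h not in (survivor, failed_host)]
    if not candidates:
        raise RuntimeError("no spare host: VM keeps running but is unprotected")

    pair.primary_host = survivor
    pair.secondary_host = candidates[0]
    pair.history.append(f"created new Secondary on {pair.secondary_host}")
    return pair


pair = FtPair("app01", primary_host="esxi-01", secondary_host="esxi-02")
fail_over(pair, failed_host="esxi-01", healthy_hosts=["esxi-02", "esxi-03"])
print(pair.primary_host, pair.secondary_host)  # esxi-02 esxi-03
```

The model makes one point visible: protection is only restored if a third healthy host exists, which is why a cluster with just two hosts stays unprotected after a failure until the failed host returns.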
vSphere Fault Tolerance restrictions
While vSphere Fault Tolerance provides a high level of availability for virtual machines (VMs), there are several restrictions that you should be aware of before implementing this feature:
- Limited vCPU count: Fault Tolerance supports virtual machines with up to 2 vCPUs or up to 8 vCPUs, depending on the license. If your VM has more vCPUs than your license allows, you will need to reduce the count before you can enable Fault Tolerance
- By default, the maximum number of Fault Tolerant VMs allowed on a host in the cluster is 4. Both Primary VMs and Secondary VMs count towards this limit. However, you can raise this number if the workload performs well in FT VMs
- To configure vSphere Fault Tolerance, your system must meet specific requirements. This includes having sufficient CPU resources, meeting virtual machine limits, and ensuring the correct licensing. When setting up vSphere Fault Tolerance, you should also consider other factors, such as the type of workloads, the size of the VMs, and the overall performance and scalability of the environment
- Limited hardware compatibility: Fault Tolerance requires specific hardware configurations to function correctly. You should consult the VMware Compatibility Guide to ensure your hardware is compatible with this feature
- Limited to certain types of VMs: Fault Tolerance is not available for all types of virtual machines, such as VMs with specific devices or configurations
- Virtual machines with more than 16 virtual disks, or with virtual disks larger than 2 TB
- Virtual machines with more than 128 GB of memory
- Virtual machines with more than 8 virtual CPUs (vCPUs)
- Virtual machines with RDM (Raw Device Mapping) disks in physical compatibility mode
- Virtual machines with CPU affinity configured
In addition, virtual machines with the following virtual devices cannot be protected with Fault Tolerance:
- USB devices
- Parallel ports
- SATA controllers
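As a rough illustration, the configuration restrictions listed above can be turned into a simple pre-check. The sketch below is plain Python, not a vSphere API call: the device labels and the `ft_blockers` function are hypothetical names chosen for this example.

```python
# Device types the list above names as incompatible with Fault Tolerance.
# The string labels are hypothetical; they are not vSphere API identifiers.
UNSUPPORTED_DEVICES = {"usb", "parallel_port", "sata_controller"}


def ft_blockers(devices, physical_rdm=False, cpu_affinity=False):
    """Return the reasons a VM configuration cannot be protected by FT."""
    found = sorted(set(devices) & UNSUPPORTED_DEVICES)
    blockers = [f"unsupported device: {d}" for d in found]
    if physical_rdm:
        blockers.append("RDM disk in physical compatibility mode")
    if cpu_affinity:
        blockers.append("CPU affinity is configured")
    return blockers


print(ft_blockers(["vmxnet3", "usb"], cpu_affinity=True))
# ['unsupported device: usb', 'CPU affinity is configured']
```

An empty list means none of the restrictions in this section apply to the VM.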
The following table lists the vSphere Fault Tolerance maximum limits.
vSphere Fault Tolerance max limits | vSphere Standard | vSphere Enterprise Plus | vSphere+ |
---|---|---|---|
vCPUs per FT VM | 2 | 8 | 8 |
Virtual disks per FT VM | 8 | 16 | 16 |
Virtual disk size | 2 TB | 2 TB | 2 TB |
RAM per FT VM | 128 GB | 128 GB | 128 GB |
FT VMs per host | 4 | 4 | 4 |
FT vCPUs per host | 8 | 8 | 8 |
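To make the table easier to apply, here is a minimal Python sketch that checks a VM against the per-edition limits and a host against the per-host limits. The numbers are copied from the table above; the edition keys, the `VmSpec` class, and both function names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass

# Per-VM limits copied from the table above, keyed by license edition.
FT_LIMITS = {
    "standard": {"vcpus": 2, "disks": 8, "disk_tb": 2, "ram_gb": 128},
    "enterprise_plus": {"vcpus": 8, "disks": 16, "disk_tb": 2, "ram_gb": 128},
    "vsphere_plus": {"vcpus": 8, "disks": 16, "disk_tb": 2, "ram_gb": 128},
}
MAX_FT_VMS_PER_HOST = 4
MAX_FT_VCPUS_PER_HOST = 8


@dataclass
class VmSpec:
    name: str
    vcpus: int
    ram_gb: int
    disk_sizes_tb: list


def ft_limit_violations(vm: VmSpec, edition: str) -> list:
    """List every per-VM limit that the VM exceeds for the given edition."""
    limits = FT_LIMITS[edition]
    problems = []
    if vm.vcpus > limits["vcpus"]:
        problems.append(f"{vm.vcpus} vCPUs exceeds the limit of {limits['vcpus']}")
    if len(vm.disk_sizes_tb) > limits["disks"]:
        problems.append(f"{len(vm.disk_sizes_tb)} disks exceeds the limit of {limits['disks']}")
    if any(size > limits["disk_tb"] for size in vm.disk_sizes_tb):
        problems.append(f"a disk is larger than {limits['disk_tb']} TB")
    if vm.ram_gb > limits["ram_gb"]:
        problems.append(f"{vm.ram_gb} GB RAM exceeds the limit of {limits['ram_gb']} GB")
    return problems


def host_has_room(ft_vcpus_on_host: list, new_vm_vcpus: int) -> bool:
    """Primary and Secondary VMs both count toward the per-host limits."""
    vm_count = len(ft_vcpus_on_host) + 1
    vcpu_count = sum(ft_vcpus_on_host) + new_vm_vcpus
    return vm_count <= MAX_FT_VMS_PER_HOST and vcpu_count <= MAX_FT_VCPUS_PER_HOST


vm = VmSpec("sql01", vcpus=4, ram_gb=64, disk_sizes_tb=[0.5, 1.0])
print(ft_limit_violations(vm, "standard"))         # ['4 vCPUs exceeds the limit of 2']
print(ft_limit_violations(vm, "enterprise_plus"))  # []
print(host_has_room([2, 2], new_vm_vcpus=4))       # True
```

The same 4 vCPU VM fails the check under vSphere Standard and passes under Enterprise Plus, which is the practical effect of the license column in the table.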
vSphere Fault Tolerance Requirements
To use vSphere Fault Tolerance, you must ensure that your environment meets the following requirements:
- Compatible hardware: Fault Tolerance requires specific hardware configurations to function correctly. You should consult the VMware Compatibility Guide to ensure your hardware is compatible with this feature
- At the host level, the CPUs in the host machines must be compatible with vSphere vMotion and must also support Hardware MMU virtualization (Intel EPT or AMD RVI)
- Virtual machine requirements: Fault Tolerance is limited to virtual machines with up to 2 vCPUs or up to 8 vCPUs (depending on the license)
- Network requirements: Fault Tolerance generates additional network traffic to keep the primary and secondary VMs in sync. You should ensure that your network infrastructure can handle this increased traffic and that your hosts are connected to the same network switch
- Storage requirements: Fault Tolerance requires additional storage resources to store the shadow instance of the VM. You should ensure that you have enough storage capacity to accommodate the additional overhead. The hosts must have an FT-compatible storage device, such as a shared or replicated storage system, so that the FT-enabled VMs can be replicated across hosts
- Host requirements: Fault Tolerance requires that the primary and secondary VMs be located on separate hosts in the vSphere cluster. Additionally, your hosts must be running in a vSphere HA cluster and have access to shared storage
The following CPUs are supported.
- Intel Sandy Bridge or later. Avoton is not supported
- AMD Bulldozer or later
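The host-level requirements above lend themselves to a short checklist. The following Python sketch assumes you have already gathered the facts about each host yourself; the `HostFacts` fields and both function names are hypothetical and do not correspond to any vSphere API.

```python
from dataclasses import dataclass


@dataclass
class HostFacts:
    """Facts about one ESXi host, collected by hand or by your own tooling."""
    name: str
    in_ha_cluster: bool
    has_shared_storage: bool
    vmotion_compatible_cpu: bool
    hardware_mmu: bool  # Intel EPT or AMD RVI


def unmet_requirements(host: HostFacts) -> list:
    """Return the FT host requirements from this section that the host fails."""
    checks = {
        "member of a vSphere HA cluster": host.in_ha_cluster,
        "access to shared storage": host.has_shared_storage,
        "CPU compatible with vSphere vMotion": host.vmotion_compatible_cpu,
        "hardware MMU virtualization (Intel EPT or AMD RVI)": host.hardware_mmu,
    }
    return [label for label, ok in checks.items() if not ok]


def can_host_ft_pair(primary: HostFacts, secondary: HostFacts) -> bool:
    """Primary and Secondary VMs need two different hosts that both qualify."""
    if primary.name == secondary.name:
        return False
    return not unmet_requirements(primary) and not unmet_requirements(secondary)


host_a = HostFacts("esxi-01", True, True, True, True)
host_b = HostFacts("esxi-02", True, False, True, True)
print(unmet_requirements(host_b))        # ['access to shared storage']
print(can_host_ft_pair(host_a, host_b))  # False
```

Returning the list of failed checks, rather than a single yes or no, tells you what to fix on a host before you try to turn on Fault Tolerance.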
Ensuring that your environment meets these requirements allows you to successfully implement vSphere Fault Tolerance and provide continuous availability for your virtual machines.
With the vSphere Fault Tolerance requirements, we finish this first part of VMware for Beginners – vSphere Fault Tolerance.