In this article, let’s look at one of the most interesting features of VMware vSphere, one that helps keep our mission-critical virtual machines up and running.
What is VMware vSphere Fault Tolerance?
Fault tolerance is a feature that keeps a virtual machine (VM) running when the physical server it runs on fails, whether the failure is in the server hardware or in the host's software. It works by replicating the VM across two separate physical servers: if one server fails, the other takes over and the VM keeps running. This redundancy lets services stay available through a host failure.
So what differentiates Fault Tolerance from High Availability?
In simple terms, high availability is about minimizing downtime: when a failure happens, the system recovers quickly, but users may notice a short interruption. Fault tolerance goes a step further and masks the failure, so service continues without interruption. In vSphere terms, High Availability (HA) restarts the VMs of a failed host on the remaining hosts in the cluster, so each VM is down while it reboots, whereas Fault Tolerance (FT) keeps a running copy of the VM ready to take over immediately.
VMware vSphere Fault Tolerance provides zero downtime and zero data loss for mission-critical virtual machines by running a Primary Virtual Machine alongside an exact replica, the Secondary Virtual Machine, which takes over if the primary or its host fails.
VMware vSphere Fault Tolerance supports virtual machines configured with at most 4 vCPUs and 64 GB of RAM, and a maximum of 4 fault-tolerant virtual machines can run on a single ESXi host. (The exact limits depend on the vSphere version and license edition.)
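To make these limits concrete, here is a minimal Python sketch of a pre-check you might run before turning FT on. The `VM` and `Host` classes and the `ft_eligibility_problems` function are hypothetical illustrations, not part of any VMware API, and the constants simply mirror the limits quoted above; check the configuration maximums for your own vSphere version before relying on them.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Limits as quoted in this article; they differ between vSphere versions and editions.
MAX_FT_VCPUS = 4
MAX_FT_MEMORY_GB = 64
MAX_FT_VMS_PER_HOST = 4


@dataclass
class VM:
    """Hypothetical description of a VM we want to protect with FT."""
    name: str
    vcpus: int
    memory_gb: int


@dataclass
class Host:
    """Hypothetical description of an ESXi host and the FT VMs it already runs."""
    name: str
    ft_vms: list[str] = field(default_factory=list)


def ft_eligibility_problems(vm: VM, host: Host) -> list[str]:
    """Return the reasons FT could not be enabled; an empty list means no problems found."""
    problems = []
    if vm.vcpus > MAX_FT_VCPUS:
        problems.append(f"{vm.name}: {vm.vcpus} vCPUs exceeds the limit of {MAX_FT_VCPUS}")
    if vm.memory_gb > MAX_FT_MEMORY_GB:
        problems.append(f"{vm.name}: {vm.memory_gb} GB RAM exceeds the limit of {MAX_FT_MEMORY_GB} GB")
    if len(host.ft_vms) >= MAX_FT_VMS_PER_HOST:
        problems.append(f"{host.name}: already runs {len(host.ft_vms)} FT VMs")
    return problems


db = VM("sql01", vcpus=4, memory_gb=48)
esx01 = Host("esx01", ft_vms=["web01", "web02"])
print(ft_eligibility_problems(db, esx01))  # []
```

Returning a list of reasons rather than a single boolean makes it easy to report every violated limit at once.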
We can also use VMware vMotion to migrate the primary and secondary virtual machines to other ESXi hosts as needed.
When we enable FT on a virtual machine, it becomes the Primary Virtual Machine, and a duplicate, known as the Secondary Virtual Machine, is created on another ESXi host. The secondary is an exact replica of the primary and can take over at any moment without interruption.
The primary and secondary virtual machines are logically identical: together they represent a single virtual machine state and a single network identity, even though their files can be placed on two different datastores. Each has its own set of virtual machine files (including the VMX and VMDK files), and FT keeps these in sync.
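The relationship between the two copies can be illustrated with a toy Python model. The `FTPair` class and its `fail_host` method are hypothetical and do not talk to vSphere; they only show that the logical VM keeps one name and one network identity while the roles move between hosts, and that redundancy is restored by starting a new secondary on a third host, which is what vSphere FT does after a failover.

```python
from dataclasses import dataclass


@dataclass
class FTPair:
    """Hypothetical toy model: one logical VM backed by a primary and a secondary copy."""
    vm_name: str
    network_identity: str  # one identity shared by both copies (illustrative value)
    primary_host: str
    secondary_host: str

    def fail_host(self, failed_host: str, spare_host: str) -> None:
        """Simulate the failure of `failed_host`; `spare_host` receives a new secondary."""
        if failed_host not in (self.primary_host, self.secondary_host):
            return  # the failed host ran neither copy, so nothing changes
        if spare_host in (self.primary_host, self.secondary_host):
            raise ValueError("the new secondary must run on a third host")
        if failed_host == self.primary_host:
            # The secondary takes over as primary; vm_name and network_identity are unchanged.
            self.primary_host = self.secondary_host
        # Either way, a new secondary is started elsewhere to restore redundancy.
        self.secondary_host = spare_host


pair = FTPair("sql01", "sql01.example.local", primary_host="esx01", secondary_host="esx02")
pair.fail_host("esx01", spare_host="esx03")
print(pair.primary_host, pair.secondary_host)  # esx02 esx03
```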
Apart from the virtual machine files, two more files are important when working with VMware vSphere Fault Tolerance. The first, shared.vmft, is a metadata file that maintains the UUIDs of both the primary and the secondary. The second, .ftgeneration, prevents a split-brain scenario, which can occur when a host becomes isolated: it ensures that only one virtual machine (primary or secondary) can read from and write to the virtual machine disks.
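The split-brain protection can be pictured as generation-number fencing, a common technique in which whoever takes ownership bumps a shared counter and any copy holding an older number is refused disk access. The sketch below assumes that .ftgeneration plays a role like this shared counter; the file's actual format and protocol are not covered in this article, so `GenerationFence` is a hypothetical, in-memory stand-in and not VMware's implementation.

```python
import threading


class GenerationFence:
    """Hypothetical stand-in for a shared generation record on a datastore both hosts reach."""

    def __init__(self) -> None:
        self._generation = 0
        self._lock = threading.Lock()  # models an atomic update of the shared record

    def take_over(self) -> int:
        """Claim ownership: bump the generation and return the new value."""
        with self._lock:
            self._generation += 1
            return self._generation

    def may_write(self, held_generation: int) -> bool:
        """Only the holder of the current generation may write to the VM's disks."""
        with self._lock:
            return held_generation == self._generation


fence = GenerationFence()
primary_gen = fence.take_over()      # the primary starts as owner (generation 1)

# The primary's host becomes isolated; the secondary assumes it failed and takes over.
secondary_gen = fence.take_over()    # generation 2

print(fence.may_write(primary_gen))    # False: the stale owner is fenced off
print(fence.may_write(secondary_gen))  # True
```

Even if the isolated primary is still running, it cannot corrupt the disks, because its generation number is no longer current.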
When we enable VMware vSphere Fault Tolerance on a virtual machine, an initial full synchronization of the VMDK files takes place with the help of VMware Storage vMotion, so that the primary and secondary virtual machines start with exactly the same disk state.
Once the initial full synchronization is done, VMware vSphere Fault Tolerance starts mirroring the VMDK write operations between the primary and secondary over the FT logging network, to ensure the storage of the replicas continues to be identical.
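The two phases described above (one full copy, then continuous mirroring of every write) can be modeled in a few lines of Python. This is a conceptual, single-process simulation under simplifying assumptions: `Disk`, `initial_full_sync`, and `MirroredDisk` are hypothetical names, the full copy stands in for Storage vMotion, and the in-memory `log` list stands in for the FT logging network.

```python
class Disk:
    """Hypothetical toy virtual disk: a fixed number of blocks held in memory."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [b"\x00"] * num_blocks

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data


def initial_full_sync(source: Disk, target: Disk) -> None:
    """Phase 1: copy every block once (the role Storage vMotion plays in the article)."""
    target.blocks = list(source.blocks)


class MirroredDisk:
    """Phase 2: every write to the primary is also applied to the secondary."""

    def __init__(self, primary: Disk, secondary: Disk) -> None:
        initial_full_sync(primary, secondary)
        self.primary = primary
        self.secondary = secondary
        self.log = []  # stands in for the FT logging network

    def write(self, index: int, data: bytes) -> None:
        self.primary.write(index, data)
        self.log.append((index, data))     # ship the write over the "FT network"
        self.secondary.write(index, data)  # replay it on the secondary copy


primary = Disk(8)
primary.write(3, b"boot")  # data written before FT is turned on
secondary = Disk(8)
mirror = MirroredDisk(primary, secondary)
mirror.write(5, b"txn42")
print(primary.blocks == secondary.blocks)  # True
```

The full copy alone would go stale after the next write, and mirroring alone would miss data written before FT was enabled; only the combination keeps the replicas identical.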
As for interoperability with VMware vSphere High Availability and VMware vSphere Distributed Resource Scheduler (DRS): vSphere High Availability is required for Fault Tolerance. DRS helps place the primary and secondary virtual machines on the most suitable hosts, but it does not automatically load balance fault-tolerant virtual machines. In every case, the primary and secondary are kept on different ESXi hosts, because a single host failure would otherwise take down both copies.
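The placement rule can be sketched as a small Python function that picks two different hosts with spare FT capacity. `place_ft_pair` is hypothetical and uses a simple least-loaded-first heuristic, which is not DRS's actual algorithm; the per-host cap reuses the limit of 4 FT VMs quoted earlier.

```python
from __future__ import annotations

MAX_FT_VMS_PER_HOST = 4  # per-host limit quoted earlier in this article


def place_ft_pair(ft_load: dict[str, int]) -> tuple[str, str]:
    """Pick two different hosts for the primary and secondary.

    `ft_load` maps host name -> number of FT VMs it already runs, and is updated in place.
    Hypothetical greedy heuristic: prefer the least-loaded hosts (ties broken by name).
    """
    candidates = sorted(
        (host for host, count in ft_load.items() if count < MAX_FT_VMS_PER_HOST),
        key=lambda host: (ft_load[host], host),
    )
    if len(candidates) < 2:
        raise RuntimeError("need two hosts with FT capacity; primary and secondary cannot share a host")
    primary, secondary = candidates[0], candidates[1]
    ft_load[primary] += 1
    ft_load[secondary] += 1
    return primary, secondary


load = {"esx01": 3, "esx02": 1, "esx03": 4, "esx04": 0}
print(place_ft_pair(load))  # ('esx04', 'esx02')
```

Raising an error instead of placing both copies on one host reflects the rule above: a pair that shares a host offers no protection against that host failing.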
Note on Fault Tolerance vs High Availability:
The basic difference between fault tolerance and high availability is that fault tolerance is the ability of a system to keep operating without interruption when a hardware, software, or network component fails, while high availability is the ability of a system to remain accessible as much of the time as possible, accepting brief outages while it recovers. Fault tolerance focuses on tolerating a failure and continuing to run; high availability focuses on keeping downtime to a minimum.