Preventing unexpected downtime has become a requirement for businesses that want to compete effectively in most markets. Virtualization has advanced both the way high availability can be engineered and the way disaster recovery plans are architected.
With VMware vSphere, there are several things to consider when it comes to preventing unexpected downtime.
What are key areas of vSphere that need attention to ensure your data is protected, available, and resilient to unexpected downtime?
Let’s take a look at preventing unexpected downtime with VMware vSphere and see what configurations, key areas, features, and solutions you should give attention to in this important aspect of running your VMware vSphere infrastructure.
How to Prevent Unexpected Downtime with VMware vSphere
Preventing unexpected downtime with VMware vSphere is accomplished by using several key configurations to ensure your vSphere environment is resilient to failures and does not have a single point of failure. In addition, there are vSphere-specific technologies that help the environment overcome failures and remain resilient when components and network or storage paths fail.
Let’s take a look at the following key areas in preventing unexpected downtime:
- Shared Storage
- Resilient Networking
- Multiple Storage Paths
- vSphere HA
- vSphere DRS
By giving attention to these key areas, you can minimize the impact of failures.
VMware vSphere Shared Storage
What are some of the key concepts in preventing unplanned downtime with VMware vSphere storage? Shared storage is a requirement for a VMware vSphere cluster that is configured for high availability of virtual machines. Shared storage gives all the hosts in the vSphere cluster access to the storage location of the VMs that are running in the cluster.
Most modern SAN storage devices have built-in redundancy features, including the underlying RAID configuration of the hard disks backing the storage, multiple storage controllers, and multiple power supply units (PSUs).
Additionally, by using SAN mirroring and other replication features that are contained in SAN storage devices, you can create additional copies of your data in another location such as a disaster recovery site. This allows keeping your data in multiple sites for redundancy and resiliency in times of failure.
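The shared storage requirement described above can be checked with a few lines of code. The following is a minimal sketch in plain Python, not a call into any VMware API. The host, datastore, and VM names are hypothetical, and the inventory is assumed to have been exported from your environment beforehand. It finds the datastores that every host can see and lists the VMs that live anywhere else.

```python
from typing import Dict, List, Set


def datastores_shared_by_all(host_datastores: Dict[str, Set[str]]) -> Set[str]:
    """Return the datastores that are mounted on every host in the cluster."""
    mounted = list(host_datastores.values())
    if not mounted:
        return set()
    return set.intersection(*mounted)


def vms_not_on_shared_storage(vm_datastore: Dict[str, str], shared: Set[str]) -> List[str]:
    """Return the VMs whose datastore is not visible to every host."""
    return sorted(vm for vm, ds in vm_datastore.items() if ds not in shared)


# Hypothetical inventory: which datastores each host has mounted.
hosts = {
    "esxi-01": {"san-ds-01", "san-ds-02", "local-01"},
    "esxi-02": {"san-ds-01", "san-ds-02", "local-02"},
    "esxi-03": {"san-ds-01", "local-03"},
}
# Hypothetical inventory: which datastore each VM resides on.
vms = {"web-01": "san-ds-01", "db-01": "san-ds-02", "test-01": "local-01"}

shared = datastores_shared_by_all(hosts)
at_risk = vms_not_on_shared_storage(vms, shared)
print("Shared by all hosts:", sorted(shared))
print("VMs that some host could not restart:", at_risk)
```

In this example only san-ds-01 is mounted everywhere, so db-01 and test-01 are flagged as VMs that at least one host in the cluster could not run.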
Resilient Networking
Effective and resilient VMware vSphere environments require fault tolerance to be built into the network. This requires eliminating single points of failure in all networking paths and having multiple physical devices carrying network traffic between your ESXi hosts in the vSphere cluster.
Starting with the physical network adapters in the ESXi host, you want to have multiple physical adapters configured for each host. In this way, if you have a failure in a physical network adapter in a particular host, you have additional network adapters that continue to carry traffic. Network teaming allows having multiple physical network adapters backing your vSwitches and port groups so that you eliminate those single points of failure for your host and VM networking.
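As a rough illustration of the teaming idea, the sketch below flags virtual switches that still have a single point of failure. It is plain Python working on hypothetical data. The vSwitch, vmnic, and physical switch names are made up, and the mapping of adapters to upstream switches is assumed to come from your own documentation or discovery tooling.

```python
from typing import Dict, List, Tuple

# (vmnic name, upstream physical switch); both values are hypothetical labels.
Uplink = Tuple[str, str]


def teaming_findings(vswitches: Dict[str, List[Uplink]]) -> Dict[str, str]:
    """Map each vSwitch that lacks redundancy to a short description of why."""
    findings = {}
    for name, uplinks in vswitches.items():
        nics = {nic for nic, _ in uplinks}
        physical_switches = {switch for _, switch in uplinks}
        if len(nics) < 2:
            findings[name] = "fewer than two physical adapters"
        elif len(physical_switches) < 2:
            findings[name] = "all adapters connect to one physical switch"
    return findings


host_vswitches = {
    "vSwitch0": [("vmnic0", "sw-a"), ("vmnic1", "sw-b")],  # redundant
    "vSwitch1": [("vmnic2", "sw-a")],                      # one adapter only
    "vSwitch2": [("vmnic3", "sw-a"), ("vmnic4", "sw-a")],  # one upstream switch
}

for vswitch, reason in teaming_findings(host_vswitches).items():
    print(f"{vswitch}: {reason}")
```

The second check matters because two teamed adapters that both plug into the same physical switch still share a single point of failure.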
Multiple Storage Paths
Carrying the idea forward from resilient networking, storage multipathing gives your ESXi hosts multiple paths to shared storage. This is extremely important because storage is at the heart of your virtualization infrastructure. You never want an ESXi host running business-critical VMs to become disconnected from the storage those VMs reside on.
This means you will want to ensure the following:
- Multiple NICs backing the VMkernel ports for iSCSI or NFS
- Multiple storage switches carrying traffic between the ESXi hosts and the shared storage
- Crossed (“X-ed out”) layer 1 cabling between the ESXi hosts and the shared storage
- Multiple storage controllers in your SAN device
Multiple storage paths in your VMware vSphere environment go hand in hand with your resilient network configuration from a logical, software, and physical cabling standpoint. You want to make sure your data has multiple paths to traverse in order to effectively prevent unexpected downtime in your vSphere environment.
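One way to reason about the checklist above is to write down every path from a host to the storage and look for a component that appears in all of them. The sketch below does that in plain Python. The Path type and the adapter, switch, and controller names are hypothetical, and it assumes component names are unique across the three kinds.

```python
from typing import List, NamedTuple, Set


class Path(NamedTuple):
    """One storage path; all three labels are hypothetical names."""
    adapter: str
    switch: str
    controller: str


def shared_components(paths: List[Path]) -> Set[str]:
    """Return the components present in every path.

    If any such component fails, every path is lost at once, so each one
    is a single point of failure.
    """
    components = {component for path in paths for component in path}
    return {c for c in components if all(c in path for path in paths)}


# Crossed cabling: each adapter uses its own switch, each switch reaches both controllers.
crossed = [
    Path("vmnic2", "sw-a", "ctrl-a"),
    Path("vmnic2", "sw-a", "ctrl-b"),
    Path("vmnic3", "sw-b", "ctrl-a"),
    Path("vmnic3", "sw-b", "ctrl-b"),
]
# Two adapters and two controllers, but everything runs through one switch.
weak = [
    Path("vmnic2", "sw-a", "ctrl-a"),
    Path("vmnic3", "sw-a", "ctrl-b"),
]

print("Crossed design:", shared_components(crossed))
print("Single-switch design:", shared_components(weak))
```

The crossed design returns an empty set, while the second design reports sw-a even though it has two adapters and two controllers.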
VMware vSphere HA
VMware vSphere HA is a vSphere cluster feature that you should definitely be using if you are not already. VMware vSphere HA builds resiliency into the architecture of your vSphere cluster so that each host watches the other hosts to ensure they are reachable.
When an ESXi host in a vSphere cluster fails, the failure is recognized and the virtual machines that were running on the failed host are restarted on healthy hosts. The VMs experience a short period of downtime while they are restarted. However, this prevents extended downtime for any VMs running on a host that has failed.
VMware vSphere HA is a key component in preventing unexpected downtime in your vSphere infrastructure. One of the key benefits of vSphere HA is that it is an automated solution. If a failure happens with a host in the middle of the night, vSphere HA will start working on getting your VMs restarted on a healthy host.
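For vSphere HA to restart VMs on healthy hosts, those hosts need spare capacity. The sketch below is a simplified back-of-the-envelope check, not the algorithm vSphere itself uses. It assumes memory is the only constraint, ignores fragmentation across hosts, and uses hypothetical host and VM sizes.

```python
from typing import Dict


def tolerates_host_failures(
    host_memory_gb: Dict[str, int],
    vm_memory_gb: Dict[str, int],
    failures: int = 1,
) -> bool:
    """Return True if all VMs still fit after the largest hosts fail.

    Worst case assumption: the hosts with the most memory are the ones lost.
    """
    capacities = sorted(host_memory_gb.values(), reverse=True)
    surviving_capacity = sum(capacities[failures:])
    return sum(vm_memory_gb.values()) <= surviving_capacity


# Hypothetical three-host cluster and VM memory sizes, in GB.
hosts = {"esxi-01": 256, "esxi-02": 256, "esxi-03": 256}
vms = {"web-01": 64, "db-01": 192, "app-01": 96, "file-01": 48}

print("Survives one host failure:", tolerates_host_failures(hosts, vms))
print("Survives two host failures:", tolerates_host_failures(hosts, vms, failures=2))
```

Here the VMs need 400 GB in total. Two surviving hosts offer 512 GB, so one failure is tolerated, while a single surviving host with 256 GB is not enough.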
A new feature of vSphere HA introduced in vSphere 6.5 is Proactive HA.
What is Proactive HA?
The vSphere Proactive HA feature is able to integrate with various hardware vendor-provided monitors that allow vSphere to monitor the actual state of hardware components in vSphere cluster hosts. When a failure or warning occurs on a component (such as a power supply) in an ESXi host, proactive HA is able to proactively vMotion virtual machines to healthy hosts in the vSphere cluster.
With Proactive HA, you avoid downtime because VMs are vMotioned to healthy ESXi hosts before a deteriorating component makes the physical ESXi host unstable. Again, this takes the burden of manually catching hardware warnings and failures off the vSphere administrator. This type of resiliency automation is key to preventing unexpected downtime in VMware vSphere environments.
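The decision Proactive HA automates can be pictured with a small sketch. This is an illustration in plain Python, not the vendor health provider interface or the actual vSphere logic. The health states "healthy" and "degraded" and all host and VM names are hypothetical, and the target is simply the healthy host currently running the fewest VMs.

```python
from typing import Dict, List, Tuple


def plan_evacuation(
    health: Dict[str, str],
    vms_on_host: Dict[str, List[str]],
) -> List[Tuple[str, str, str]]:
    """Return (vm, source host, target host) moves off every unhealthy host."""
    healthy = [host for host, state in health.items() if state == "healthy"]
    if not healthy:
        return []  # nowhere to move the VMs
    vm_counts = {host: len(vms_on_host.get(host, [])) for host in healthy}
    plan = []
    for host, state in health.items():
        if state == "healthy":
            continue
        for vm in vms_on_host.get(host, []):
            target = min(vm_counts, key=vm_counts.get)  # least populated healthy host
            plan.append((vm, host, target))
            vm_counts[target] += 1
    return plan


# Hypothetical state: esxi-01 reports a failing power supply.
health = {"esxi-01": "degraded", "esxi-02": "healthy", "esxi-03": "healthy"}
vms_on_host = {"esxi-01": ["db-01", "web-01"], "esxi-02": ["app-01"], "esxi-03": []}

for vm, source, target in plan_evacuation(health, vms_on_host):
    print(f"migrate {vm}: {source} -> {target}")
```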
VMware vSphere DRS
You may not initially think of VMware vSphere DRS as a component in your strategy to prevent unexpected downtime since it is a resource scheduler. However, VMware vSphere DRS plays a key role and works in tandem with vSphere HA. When vSphere HA restarts VMs on the healthy ESXi hosts remaining in the vSphere cluster, or Proactive HA migrates VMs to them, resource pressure can build in the cluster because fewer CPU and memory resources are servicing the VM workloads.
By using vSphere DRS in your cluster, DRS is able to equalize and move workloads around to balance out resources as needed, especially when resource contention builds in the cluster due to a host failure.
Outside of vSphere HA events, DRS continuously monitors the vSphere cluster for resource contention. The last thing you want is an overloaded ESXi host that causes severe performance degradation, which in turn leads to unexpected downtime. The downtime could occur because applications become unusable or because an ESXi host becomes unresponsive or crashes due to exhausted resources. With this in mind, VMware vSphere DRS is an integral part of an effective strategy to prevent unexpected downtime.
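To make the balancing idea concrete, here is a toy greedy rebalancer in plain Python. It is only a sketch of the general idea and does not reproduce how DRS actually scores or places workloads. It considers memory alone, and the host names, VM names, and sizes are hypothetical.

```python
from typing import Dict, List, Tuple


def rebalance(
    placement: Dict[str, Dict[str, int]],
    max_moves: int = 10,
) -> List[Tuple[str, str, str]]:
    """Move VMs from the busiest host to the least busy one while that helps.

    placement maps host -> {vm: memory in GB} and is modified in place.
    Returns the list of (vm, source host, target host) moves.
    """
    moves = []
    for _ in range(max_moves):
        load = {host: sum(vms.values()) for host, vms in placement.items()}
        source = max(load, key=load.get)
        target = min(load, key=load.get)
        gap = load[source] - load[target]
        # Moving a VM of size s changes the gap to |gap - 2s|,
        # which is an improvement only when 0 < s < gap.
        candidates = {vm: size for vm, size in placement[source].items() if 0 < size < gap}
        if not candidates:
            break
        vm = min(candidates, key=lambda name: abs(gap - 2 * candidates[name]))
        placement[target][vm] = placement[source].pop(vm)
        moves.append((vm, source, target))
    return moves


# Hypothetical cluster after a failover left esxi-01 carrying most of the load.
cluster = {
    "esxi-01": {"db-01": 16, "app-01": 12, "web-01": 8, "web-02": 8},
    "esxi-02": {"file-01": 8},
    "esxi-03": {"mail-01": 12},
}

moves = rebalance(cluster)
final_load = {host: sum(vms.values()) for host, vms in cluster.items()}
print(moves)
print(final_load)
```

In this example two moves bring the per-host load from 44, 8, and 12 GB to 20, 24, and 20 GB.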
Prevent Unexpected Downtime with Backups & Replication
The final key component of an effective strategy to prevent unexpected downtime in your VMware vSphere environment is backups and replication.
Backups are the most basic means of protecting your data. True backups take a copy of the actual production data housed in your virtual machines and save it to a totally separate environment that has no reliance on your production vSphere infrastructure. In this way, if your entire environment is lost, you can restore all the VMs it contained, including all VM settings and data, from the backups that were taken.
Replication, on the other hand, takes a copy of your production virtual machine and copies it to a different vSphere environment that is housed in a secondary location like a DR facility. With each replication interval, the VM is updated with the most recent production data at the time of the replication interval. This allows you to have site-level resiliency to help prevent unexpected downtime.
If you lose an entire site, you can fail over to your replicated virtual machines located in the secondary DR site. Traffic can be shifted from the primary production location and serviced from the DR facility. In this way, an otherwise major disruption caused by losing an entire site can be minimized with the replication process.
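Because a replica is only as current as the last completed replication interval, the interval bounds how much data a failover can lose. The sketch below works through that arithmetic in Python. The term recovery point objective (RPO) is introduced here for illustration and is not used elsewhere in this post, the timestamps are hypothetical, and transfer time is ignored.

```python
from datetime import datetime, timedelta


def data_loss_window(last_replication: datetime, site_failure: datetime) -> timedelta:
    """Changes made after the last replication are missing from the replica."""
    return site_failure - last_replication


def meets_rpo(replication_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case: the site fails just before the next replication runs."""
    return replication_interval <= rpo


# Hypothetical timeline: last replica at 02:00, site lost at 02:45.
last = datetime(2024, 1, 15, 2, 0)
failed = datetime(2024, 1, 15, 2, 45)
print("Data at risk:", data_loss_window(last, failed))

target = timedelta(minutes=30)
print("Hourly replication meets target:", meets_rpo(timedelta(hours=1), target))
print("15-minute replication meets target:", meets_rpo(timedelta(minutes=15), target))
```

With a 30-minute target, hourly replication falls short while a 15-minute interval stays within it.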
In order to have an effective backup and replication strategy, you need a data protection solution that is able to effectively back up and replicate your VMware vSphere virtual machines. Vembu BDR Suite allows building an effective backup and replication solution in your vSphere environment.
In addition, Vembu BDR Suite allows creating effective failover mechanisms with automated virtual network and IP addressing reconfiguration to minimize unexpected downtime as a result of failing over to a secondary location during a site-level failure. Vembu BDR Suite Free Edition provides this functionality for 10 VMs. Be sure to download a trial copy of Vembu BDR Suite.
Wrapping Up
Preventing unexpected downtime with VMware vSphere is dependent on making use of several key technologies and best practice configuration recommendations in your vSphere environment. This includes implementing shared storage, network resiliency, storage multipathing, vSphere HA, and vSphere DRS. In addition, backups and replication are both crucial to ensure uptime as well as to effectively protect your data in times of disaster where both uptime and data are in jeopardy.