Join Rick Crisci for an in-depth discussion in this video vSphere 6.5: What's new in HA, part of VMware vSphere 6.5 Essential Training Part 2.
- [Instructor] In this video, we'll take a look at what's new for High Availability in vSphere 6.5. There are a few big changes to the feature set of High Availability in 6.5, some really great new features that can help us not only respond to outages, but actually prevent them, and that's where Proactive HA comes in. Proactive High Availability is a way to prevent outages. We know that with normal High Availability, if you have a host failure or something along those lines, recovery involves downtime.
If we have an HA-type event, we're going to have virtual machines that are down. Proactive High Availability seeks to avoid preventable downtime by detecting hardware issues and proactively migrating virtual machines to different hosts. Proactive HA actually works with hardware vendors' monitoring solutions to find out about the health status of hardware components, things like power supplies, memory, fans, storage systems, and network systems. Then we can configure High Availability to respond to failures of these hardware components. This can help us avoid downtime for our virtual machines by detecting hardware failures, and then taking that host and performing some sort of action on it.
And one of those possibilities is that a host can be placed in something called quarantine mode. Now, this feature does require DRS. We can't enable Proactive High Availability unless we have DRS enabled. So Proactive High Availability is actually using vMotion, and don't confuse that with normal HA. Usually HA does not use vMotion. Usually a host fails and then those virtual machines are booted up on other ESXi hosts, and that's not vMotion.
Normal HA requires shared storage, but it's not vMotion. Proactive HA actually uses vMotion to migrate virtual machines with no downtime. Okay, so when we set up a cluster with Proactive High Availability, there are a couple of different automation levels that we can choose. Number one, we can choose manual mode, and this is very similar to DRS. If we set up Proactive HA in manual mode and some sort of problem is detected on a host, vCenter will give us suggestions about where virtual machines should be relocated, but it won't automatically move any VMs to different hosts. With automated mode, virtual machines will be automatically migrated to hosts that are healthy, and the degraded hosts will be placed into either quarantine or maintenance mode, depending on how we've configured Proactive HA.
When we're configuring Proactive HA, we can choose from a few different host modes. The first host mode that we can choose from is called quarantine mode, and quarantine mode basically just dictates that no new VMs can be added to that host. So no new virtual machines are gonna be moved to that host by DRS. It's essentially going to sit there with all of the VMs that are currently on it, but no new VMs will be added to that host. With mixed mode, all of the VMs will continue to run on that host unless there's some sort of severe hardware failure.
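As a rough illustration of how the three host modes differ, here is a minimal Python sketch. It is not VMware code or a vSphere API: the names `Severity`, `HostMode`, and `remediation` are hypothetical, and the two severity levels are an assumed simplification of what a hardware health provider reports. It covers quarantine and mixed mode as described above, plus maintenance mode, which is explained next.

```python
from enum import Enum


class Severity(Enum):
    """Assumed health levels reported for a degraded host."""
    MODERATE = "moderate"
    SEVERE = "severe"


class HostMode(Enum):
    """The three Proactive HA host modes discussed in this video."""
    QUARANTINE = "quarantine"
    MIXED = "mixed"
    MAINTENANCE = "maintenance"


def remediation(mode: HostMode, severity: Severity) -> str:
    """Return the action taken on a degraded host (illustrative only)."""
    if mode is HostMode.MAINTENANCE:
        # Always evacuate every VM from the degraded host with vMotion.
        return "maintenance"
    if mode is HostMode.MIXED and severity is Severity.SEVERE:
        # Mixed mode only evacuates the host when the failure is severe.
        return "maintenance"
    # Otherwise running VMs stay put and no new VMs are placed on the host.
    return "quarantine"
```

For example, `remediation(HostMode.MIXED, Severity.MODERATE)` returns `"quarantine"`, while the same call with `Severity.SEVERE` returns `"maintenance"`.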
So in mixed mode, Proactive HA is not going to move anything, it's not gonna do any vMotions unless the failure is something severe, whereas with maintenance mode, Proactive HA will place that degraded host into maintenance mode and migrate every single virtual machine off that host using vMotion. Another new feature of High Availability in vSphere 6.5 is orchestrated restart. The purpose of orchestrated restart is to allow High Availability to restart virtual machines in a specified order, so if we have certain virtual machines that have dependencies on each other, orchestrated restart is a great way to control the order in which those virtual machines boot up.
For example, let's say that we have multi-tier applications, maybe we have virtual machines that are web servers, database servers, and application servers, and all of these VMs work together to deliver an application. So maybe it's a web application and we need the web layer to deliver the website, we need the application layer, and on the back end, we need a database.
These sorts of applications typically don't work properly unless the virtual machines involved boot up in the appropriate order. So for example, we could create groups of virtual machines, like a group called web servers that contains all of the web servers in this multi-tier application, and then we can create other groups of virtual machines for this three-tier application called application servers and database servers.
And once we've created these groups, then we can start to establish dependency rules. So, if a host fails, High Availability will restart the VMs in the database servers group first. We can create a rule for this. And then, once that group is completed, it'll move on to the application servers group, and then maybe we can create another rule that says boot the application servers group and then boot the web servers group, alright? So we can create all of these rules and this logic to ensure that if we have a multi-tier application, the appropriate components are launched in the appropriate order. That's what orchestrated restart is all about: enabling the ordered restart of virtual machines and the establishment of dependencies.
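The dependency rules just described amount to ordering VM groups so that each group restarts only after the groups it depends on. Here is a minimal Python sketch of that idea using the standard library's `graphlib` (Python 3.9 or later). The group names come from the example above; the `depends_on` mapping and the `restart_order` helper are hypothetical and are not how vSphere stores or evaluates its rules.

```python
from graphlib import TopologicalSorter

# Each VM group maps to the set of groups that must restart before it.
# Assumed rules: database first, then application, then web.
depends_on = {
    "database servers": set(),
    "application servers": {"database servers"},
    "web servers": {"application servers"},
}


def restart_order(rules: dict) -> list:
    """Return VM groups in an order that respects every dependency rule."""
    # static_order() yields a group only after all of its prerequisites.
    return list(TopologicalSorter(rules).static_order())


order = restart_order(depends_on)
```

With these three rules the groups form a simple chain, so `order` lists the database servers first, the application servers second, and the web servers last. A circular set of rules would raise `graphlib.CycleError`.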
Finally, there are some big changes to admission control in vSphere 6.5, some things that you may not have even noticed if you just took a quick glance at it, but the biggest difference is that the default admission control policy is now based on cluster resource percentage. So if you ever worked with the "Host Failures Cluster Tolerates" admission control method, you might be familiar with slot size.
Slot size is how High Availability used to determine what amount of resources were available and what amount of resources had to be reserved for failover. It was honestly a little bit of a difficult system to work with, because slot size could get skewed by virtual machines with large reservations. It was just complicated, and what happened is, a lot of the time, people would just turn off admission control, because they couldn't figure out why it wasn't working properly and why it wasn't allowing them to boot up virtual machines.
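To see how a single large reservation can skew slot size, here is a simplified Python sketch. It assumes the slot is sized by the largest CPU reservation and the largest memory reservation among powered-on VMs, and it ignores details such as memory overhead and default minimums. The function names, dictionary keys, and all numbers are made up for illustration.

```python
def slot_size(vms):
    """Slot = largest CPU reservation and largest memory reservation."""
    cpu_mhz = max(vm["cpu_res_mhz"] for vm in vms)
    mem_mb = max(vm["mem_res_mb"] for vm in vms)
    return cpu_mhz, mem_mb


def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    """A host holds as many slots as its scarcer resource allows."""
    cpu_mhz, mem_mb = slot
    return min(host_cpu_mhz // cpu_mhz, host_mem_mb // mem_mb)


# Ten small VMs with modest reservations (hypothetical values).
small_vms = [{"cpu_res_mhz": 500, "mem_res_mb": 1024} for _ in range(10)]
# One VM with a large reservation changes the slot size for everyone.
big_vm = {"cpu_res_mhz": 4000, "mem_res_mb": 16384}

before = slots_per_host(20000, 65536, slot_size(small_vms))
after = slots_per_host(20000, 65536, slot_size(small_vms + [big_vm]))
```

In this example a host with 20,000 MHz and 64 GB goes from 40 slots to 4 slots as soon as the one large VM is powered on, which is the kind of surprise that led people to turn admission control off.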
In 6.5, admission control kind of takes the best of both admission control policies and combines them into one. So here's how it works now. You'll go in and enable admission control and you'll say something like, "Well, I want to tolerate one host failure." Well, that's the way we've always done it, right? But what's actually happening under the surface is that your vCenter Server is going to look at the cluster and determine the actual percentage of resources needed in order to satisfy this admission control policy.
What percentage of resources do I need to hold back? We don't use slot size for this anymore. That's the big improvement: all of that complexity involved with managing a slot size is now eliminated. So let's take a look at a screenshot. Here's admission control, we're specifying we want to tolerate one host failure, and we're going to determine the host failover capacity by cluster resource percentage instead of slot size here.
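As a back-of-the-envelope version of that calculation, here is a short Python sketch. It assumes every host in the cluster has identical capacity, which is a simplification; the function name `reserved_percentage` is hypothetical, and the exact calculation vCenter performs is not spelled out in this video.

```python
def reserved_percentage(num_hosts: int, failures_to_tolerate: int) -> float:
    """Percent of cluster resources to hold back, assuming identical hosts."""
    if not 0 < failures_to_tolerate < num_hosts:
        raise ValueError("failures_to_tolerate must be between 1 and num_hosts - 1")
    # Losing k of n equal hosts removes k/n of the cluster's capacity.
    return 100 * failures_to_tolerate / num_hosts
```

Under that assumption, tolerating one host failure in a four-host cluster means holding back 25 percent of the cluster's resources, and the percentage shrinks automatically as hosts are added.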
So we still have multiple options, but this is really the best way to do it. Let's figure out our required failover capacity by cluster resource percentage instead of slot size. And we can even establish acceptable levels of performance degradation. So here at the bottom, we see "Performance degradation VMs tolerate," and what that means is basically the amount of performance reduction that we're willing to handle after a failure.
So High Availability will actually monitor virtual machine performance data from Distributed Resource Scheduler (DRS) to determine whether or not there's enough capacity to continue performing adequately after a failover. So what we can do is say, we can do 100 percent here, right? 100 percent is actually disabling this warning, or we can say zero percent, right? We want everything to perform exactly the same, even if there's a host failure.
So this percentage allows us to kind of forecast and prepare for what percentage of performance impact we are willing to handle if a host failure does occur.
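One way to picture this setting is as a threshold check. The Python sketch below is an assumed model, not the documented vSphere calculation: it raises a warning when current usage, reduced by the tolerated percentage, would still exceed the capacity left after a failure. The function name `degradation_warning` and the GHz figures are hypothetical.

```python
def degradation_warning(current_usage_ghz: float,
                        capacity_after_failure_ghz: float,
                        tolerated_pct: int) -> bool:
    """Return True if a performance degradation warning would be raised."""
    # Usage we insist on still being able to serve after the failure.
    required = current_usage_ghz * (100 - tolerated_pct) / 100
    return required > capacity_after_failure_ghz
```

With 70 GHz in use and 50 GHz left after a failure, a setting of 0 or 20 percent raises the warning, because 70 and 56 are both greater than 50. A setting of 100 percent never raises it, because the required usage drops to zero, which matches the idea that 100 percent disables the warning.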