From the course: vSphere 6.7 Professional Part 03: Monitoring Tools

Demo: Monitoring virtual machines in vSphere 6.7 - vSphere Tutorial

- [Instructor] In this video, we'll take a look at a few of the different tools that are available for you to monitor your VMs, and we'll talk about some key performance indicators that you may want to be aware of. To demonstrate these tasks, we're going to be using the VMware hands-on lab environment. These labs are available at hol.vmware.com. The lab that we're going to focus on in this particular lesson is the vSphere 6.7 Performance, Diagnostics, and Benchmarking lab. I'm not really going to follow the instructions of this particular lab, I'm going to kind of do my own thing. But this hands-on lab includes a lot of great virtual machines that you can use to generate workloads, and they get us some nice statistics to look at within these performance charts. So if you're looking to improve the performance of your vSphere environment, this is a great lab to take a look at. Now, also, I'll note that the lab is for vSphere 6.7, but these lessons are applicable to either vSphere 6.5 or vSphere 6.7, and we're going to be using the vSphere web client.
So here we are in the hands-on lab environment. I'm just going to do a few things to give myself a little more room to work with. I'm going to make my browser full screen, and I'm going to unpin all of these little panes that I don't need to look at right now. So I'm just going to unpin them, and now I've got lots of room to work with, and I'm going to start by going to Hosts and Clusters.
So here, in the Hosts and Clusters view, you can see I've got all these virtual machines powered on, all these perf worker VMs, so let's start by focusing on one of these virtual machines. I'm just going to pick perf worker 02A. Anytime we're talking about monitoring VMs, what we want to do is click on the VM and go to the Monitor tab, and under the Monitor tab, there's a variety of information available. So here I could see if there were any alarms that were currently triggered.
I can also see all of the alarm definitions for this particular VM. So if we click on Triggered Alarms, it's going to show me I don't have any alarms triggered right now. Alarm Definitions, these are all of the alarms that can possibly get triggered by conditions on this virtual machine, or by events on this virtual machine. We've got a bunch of alarms defined there, but what I really want to focus on are the performance charts that are available in the vSphere web client. So if I click on Performance here, I can take a look at the Overview and Advanced charts.
So let's start with the overview charts. And the overview charts are exactly what they sound like: they're a great way to get an overview, performance-wise, of what's happening with this particular virtual machine. So if I'm not really sure what the issue might be, if I'm just experiencing slowness on a VM, the overview charts are a great place to begin my troubleshooting process, because there's a whole bunch of information here on CPU percentage, on memory usage, on CPU usage, that could potentially point me in the right direction. If I'm not really sure what the issue is with this VM, maybe it'll become apparent as I'm looking at these overview charts, and just getting a good overall idea of how this VM is performing. And so you may notice I'm clicking on these charts. You have to click on each chart to populate its data, and there's not a whole lot of great data here pointing me in the exact right direction, so at that point, I may move on and look at some of these advanced performance charts.
Now the advanced performance charts are more complex, for sure. They're more complicated to utilize, and you really have to have a decent idea of what area of performance you want to focus on before you get into the advanced charts. So for example, I may want to look at memory or CPU as a particular area of concern with this virtual machine.
So let's go back to perf worker 02A, and let's look at some of the advanced performance charts here. What I'm going to do is start with a real-time chart, so let's go to Chart Options here. When you look at a real-time chart for a virtual machine, or for anything in the vSphere web client, those real-time charts are updated every 20 seconds. Now if I want to look at a longer time interval, I can, but I'm going to simply look at a real-time chart, and so I'm going to choose a memory performance chart. Again, real-time, so it'll update every 20 seconds, and I want to take a look at a few key metrics here, so I'm going to deselect Active Memory. I want to look at Ballooned Memory, because I want to get an idea of whether any ballooning activity is happening on this particular VM. Then I'd like to look at other counters, like Consumed or Granted Memory. I can look at Entitlement, to see how much memory this virtual machine is entitled to at the moment. I can see if swapping activity is occurring as well, and that would be bad. If I see swapping activity occurring, I know that we have a serious memory contention issue. So I'm just going to go ahead and hit OK here, and take a look at this performance chart that I've just designed for this particular VM.
And it's going to display all this information in real time, so you'll see the chart kind of moving over towards the left as new information populates. So for example, I can click on Ballooned Memory, and I can see there hasn't been any ballooning activity. I can click on Memory Entitlement, and I can see that this number can change at any given moment. It depends on the share structure, and how much memory is in use. So my entitlement can potentially change depending on what's happening on the host at any given moment. And at the moment, I have no swapping activity, so that's good. If you see ballooning activity, that means the host is attempting to reclaim memory.
Swap in and out, that means the host is really starving for memory. So that's a basic real-time memory chart for a virtual machine.
Let's take a look at another real-time chart. Let's take a look at CPU this time, and I am just going to go ahead and pick the first processor of this virtual machine, and I want to look at two things. Number one, I want to look at Co-stop. Basically what Co-stop is going to tell me is, are my processors getting out of skew? Now that's definitely not going to be the case here, because I only have one processor, but I like to highlight that metric. And the other metric that we're going to take a look at is really our most important one, CPU Ready. So actually I'm going to deselect Co-stop, because there's a limit on what we can select here, and if you click on this little information icon, it's going to tell you, yup, you can only select two distinct units at a time. So we'll look at CPU Ready, and we'll look at CPU Usage as a percentage, and let's go ahead and hit OK here, and see what our performance chart looks like.
Okay, so now we can see our CPU Ready chart. You can see that CPU Ready was pretty high when the virtual machine initially booted. That was when I booted the VM. So, why was the CPU Ready value high when I booted this VM? Well, VMs use a lot of resources as they boot, and so, remember what ready values mean. When your virtual machine is ready to execute some instruction against the processor, it enters the ready state. The moment that those instructions get executed by the processor, the VM leaves the ready state. So a lot of time in the ready state, higher ready values, those are typically bad, because it means that the VM is ready to do something, but it's waiting on physical processors. We can see here our latest CPU Ready value is 39 milliseconds, and if we scroll over a little bit to the right here, we can see our average CPU Ready is kind of skewed because of the boot, but it's at 118 milliseconds. So let's focus on this CPU Ready value right here.
So let's just go with 30. Let's assume that our CPU Ready is staying somewhere around 30 milliseconds. What does that mean to us? Is that a good value, is that a bad value? It's hard to say without doing a little bit of math here. So my latest CPU Ready value's around 30, so 30 milliseconds of CPU Ready over the last 20 seconds. That means out of 20 seconds, the VM spent 30 milliseconds waiting. That sounds pretty good. I don't think we're going to have a big problem there. But just to make sure, I'll bring up my handy little calculator here, and we'll say, okay, about 30 milliseconds, we'll put in 30, over 20,000 milliseconds, because we had 20 seconds, so that's 20,000 milliseconds. That comes out to 0.0015, or 0.15%, so we can see here that less than 1% of the time our virtual machine is in the CPU Ready state. That's not bad at all. That's a great number.
So it depends on what chart you're looking at. You know, if you're looking at the past day, the sampling interval will be shown right here. If you're looking at the past week or past month, the sampling interval will be shown right here. So you have to take those CPU Ready values that are being expressed, and think, okay, if I have this many milliseconds, over this sampling interval, what does that mean? What's my percentage of CPU Ready? And you want to see something below 5%. That's your guideline. You want CPU Ready values to be less than 5%.
Now, here I can see my average CPU usage, and the average CPU usage is around 1.6%. If this virtual machine had multiple virtual CPUs, I would now go remove some of them. This virtual machine is barely using the CPU that it has, but it only has one CPU, so there's not much to do there. I'll just leave it alone. So those are some of your good performance charts for virtual machines, and you can look at other ones, like datastore and disk, so if we click on Disk, you can see latency information here.
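Going back to the CPU Ready math for a moment, the calculator step can be written as a small helper. This is only a sketch of the arithmetic described in the demo, not anything built into vSphere: the function names `cpu_ready_percent` and `is_ready_ok` are made up for illustration, the 20-second interval is the real-time chart interval mentioned above, and the 5% figure is the guideline quoted in the video. For any other chart, read the sampling interval off the chart itself and pass it in.

```python
# Hypothetical helper names; the formula is the one worked through in the demo:
# percent ready = ready milliseconds / (sampling interval in milliseconds) * 100

CPU_READY_GUIDELINE_PCT = 5.0   # "less than 5%" guideline quoted in the demo
REALTIME_INTERVAL_SECONDS = 20  # real-time charts update every 20 seconds


def cpu_ready_percent(ready_ms: float, interval_seconds: float) -> float:
    """Convert a CPU Ready value in milliseconds into a percentage of the sampling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    interval_ms = interval_seconds * 1000.0
    return ready_ms / interval_ms * 100.0


def is_ready_ok(ready_ms: float, interval_seconds: float) -> bool:
    """Return True when the ready percentage is under the 5% guideline."""
    return cpu_ready_percent(ready_ms, interval_seconds) < CPU_READY_GUIDELINE_PCT


if __name__ == "__main__":
    # Values read off the chart in the demo: about 30 ms latest, 118 ms average.
    for label, ready_ms in (("latest", 30), ("average", 118)):
        pct = cpu_ready_percent(ready_ms, REALTIME_INTERVAL_SECONDS)
        print(f"{label}: {ready_ms} ms over 20 s = {pct:.2f}% ready")
```

With the demo's numbers, 30 ms over 20 seconds works out to 0.15% and the boot-skewed average of 118 ms works out to 0.59%, both well under the guideline. The same raw millisecond value means something very different on a chart with a longer sampling interval, which is why the interval is a required argument rather than a default.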
Typically what you want to see is something around 10 to 12 milliseconds of latency or less. That's a fairly average number for storage latency. If you have, like, an all-flash storage array, or something like that, you're probably expecting to see better numbers than that. So with storage, really read and write latency, those are kind of the two big ones that you typically want to focus on from a storage performance perspective. How long are storage commands taking to be completed? Are they happening really quickly, or are they happening really slowly? We can see the average latency for this VM is 0.11 milliseconds for writes, and 0.02 for reads. This is extremely low latency, so normally you'd probably see something up around six, seven, eight, nine, 10 milliseconds or so. Again though, it really depends on the speed of your storage array. If it's something really fast, if it's all flash, it's going to be much lower than 10 milliseconds.
And then, finally, the last performance chart that I want to take a quick look at for this VM is the network performance chart. So again, I'm going to go to my advanced performance charts, and under Network, there's a couple of key metrics that I want to take a look at. What I want to look at is Receive Packets Dropped and Transmit Packets Dropped. And if you are seeing any values here, if you're seeing dropped receive or dropped transmit packets, that's usually a really strong indication that there's some sort of congestion in your network. So what I want to think when I see those numbers higher than zero is that I've got some kind of congestion going on in my network that I need to address.
Now finally, one more thing I want to look at from a monitoring perspective: we can go to Tasks and Events. Here we can see, basically, what's been going on with this virtual machine.
You can see that I powered it on, you can see some reconfigurations that have been done, and so under Tasks you can kind of get an idea of what we've been doing from an administrative perspective, and who has been doing those things. Those are our tasks. And our events are basically anything that's happened on this VM. So has it had alarms go off, for example? Here's an alarm that changed to green status. Has it been powered on? Basically, the events are going to show you everything that's happened.
Under Scheduled Tasks you can actually schedule certain things, like a reboot of this virtual machine, or a vMotion of this virtual machine. You can schedule all sorts of tasks here: a clone of it, or changes to the resource settings. If you want these to occur at a certain time of day, you can schedule those things here.
And you can also monitor compliance with a storage policy. So if you have assigned a storage policy to this virtual machine to control where its virtual disks are stored, what kind of datastore they should be on, if I've configured those storage policies, I can monitor that here to see if the virtual machine is currently in compliance with whatever storage policy has been assigned to it.
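To pull together the rules of thumb mentioned across this demo, here is a rough triage sketch. It is an illustration only: `VmSample`, `triage`, and every field name are hypothetical and do not correspond to any vSphere API, the readings would have to be copied from the advanced charts by hand, and the thresholds are simply the guidelines quoted in the video (any ballooning or swapping, CPU Ready of 5% or more, storage latency above roughly 12 ms, any dropped packets).

```python
from dataclasses import dataclass
from typing import List

# Guidelines quoted in the demo; treat them as rules of thumb, not hard limits.
CPU_READY_LIMIT_PCT = 5.0
STORAGE_LATENCY_LIMIT_MS = 12.0


@dataclass
class VmSample:
    """One hypothetical set of readings copied from a VM's advanced charts."""
    ballooned_kb: float      # Ballooned Memory (units assumed to be KB)
    swap_in_kb: float        # swap-in activity (units assumed to be KB)
    swap_out_kb: float       # swap-out activity (units assumed to be KB)
    cpu_ready_ms: float      # CPU Ready in milliseconds
    interval_seconds: float  # sampling interval shown on the chart
    read_latency_ms: float
    write_latency_ms: float
    rx_dropped: int          # Receive Packets Dropped
    tx_dropped: int          # Transmit Packets Dropped


def triage(sample: VmSample) -> List[str]:
    """Return a list of findings; an empty list means nothing crossed a guideline."""
    findings: List[str] = []

    # Swapping is the more serious memory signal, so report it ahead of ballooning.
    if sample.swap_in_kb > 0 or sample.swap_out_kb > 0:
        findings.append("memory: swapping, serious memory contention")
    elif sample.ballooned_kb > 0:
        findings.append("memory: ballooning, host is reclaiming memory")

    ready_pct = sample.cpu_ready_ms / (sample.interval_seconds * 1000.0) * 100.0
    if ready_pct >= CPU_READY_LIMIT_PCT:
        findings.append(f"cpu: ready at {ready_pct:.2f}%, waiting on physical processors")

    worst_latency = max(sample.read_latency_ms, sample.write_latency_ms)
    if worst_latency > STORAGE_LATENCY_LIMIT_MS:
        findings.append(f"storage: latency {worst_latency:.2f} ms is above the guideline")

    if sample.rx_dropped > 0 or sample.tx_dropped > 0:
        findings.append("network: dropped packets, possible congestion")

    return findings


if __name__ == "__main__":
    # Readings from the demo; the dropped-packet counts are assumed to be zero.
    demo_vm = VmSample(0, 0, 0, 30, 20, 0.02, 0.11, 0, 0)
    print(triage(demo_vm) or "nothing crossed a guideline")
```

Keeping the thresholds as named constants makes it easy to tighten them, for example lowering the storage latency limit for an all-flash array, where the demo notes you would expect much better than 10 ms.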
