Learn how to configure storage environments using vSphere 6.5. Explore storage architecture, VMFS, vSAN, DRS, and volumes. Prepare for the VMware Certified Professional exam.
at some different storage architectures and how we can effectively troubleshoot them. So, when we think about a virtual machine, it really has two parts: the running state of the virtual machine, which exists on our hypervisor, and the files that make up the virtual machine. We want the virtual machine to appear as if it has its own memory and its own CPU. We want it to think it has its own SCSI controller, and this is the critical part for storage. So, we're going to present drivers to the virtual machine that it can use to access storage, just as a physical machine would. A physical machine would have some sort of SCSI controller, so we're going to trick our virtual machine. We're going to give it a driver and let it use something like a Paravirtual SCSI controller or an LSI Logic SCSI controller. We trick that machine into thinking it has access to physical storage, when really it simply takes those storage commands, dumps them into the hypervisor, and the hypervisor takes it from there. The same goes for our virtual NIC. We're tricking the virtual machine into thinking it has a network interface card, when in reality it just has a driver for a VMXNET3 or an E1000 network interface. It's not real hardware. It just takes that network traffic, dumps it into the hypervisor, and the hypervisor takes it from there. The beautiful thing about this is that the hypervisor gives us a layer of abstraction: it doesn't matter to the virtual machine what the actual underlying hardware is. In the case of storage, the virtual machine never sees whether it's dealing with an NFS storage device, vSAN, iSCSI, Fibre Channel, or local physical storage. It has no awareness of any of that. Windows sees a virtual SCSI controller.
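The abstraction described above can be sketched as a toy model in Python. This is not how ESXi is implemented; every class and name here (`ScsiCommand`, `ToyDatastore`, `Hypervisor`, `VirtualScsiController`, `vm01`, `nfs-ds01`) is hypothetical. It only shows the shape of the idea: the guest talks to one virtual controller, the hypervisor decides which datastore actually services the command, and re-placing the disk (roughly what a Storage vMotion achieves) requires no change on the guest side.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


@dataclass(frozen=True)
class ScsiCommand:
    """A simplified guest SCSI command: an operation plus a block address."""
    opcode: str  # e.g. "READ" or "WRITE"
    lba: int     # logical block address within the virtual disk


class Datastore(Protocol):
    """Any backing storage the hypervisor can place a VMDK on (hypothetical interface)."""
    name: str

    def submit(self, vmdk: str, cmd: ScsiCommand) -> str: ...


@dataclass
class ToyDatastore:
    """Stand-in backend; a real one would speak NFS, iSCSI, FC, vSAN, or local I/O."""
    name: str

    def submit(self, vmdk: str, cmd: ScsiCommand) -> str:
        return f"{cmd.opcode} lba={cmd.lba} -> {vmdk} on {self.name}"


class Hypervisor:
    """Redirects each VM's SCSI commands to the VMDK on its current datastore."""

    def __init__(self) -> None:
        self._placement: Dict[str, Tuple[str, Datastore]] = {}

    def place_disk(self, vm: str, vmdk: str, datastore: Datastore) -> None:
        # Calling this again for a running VM loosely mimics a Storage vMotion.
        self._placement[vm] = (vmdk, datastore)

    def handle(self, vm: str, cmd: ScsiCommand) -> str:
        vmdk, datastore = self._placement[vm]
        return datastore.submit(vmdk, cmd)


class VirtualScsiController:
    """The only storage 'hardware' the guest OS ever sees."""

    def __init__(self, vm: str, hypervisor: Hypervisor) -> None:
        self._vm = vm
        self._hypervisor = hypervisor

    def send(self, cmd: ScsiCommand) -> str:
        # The guest never learns which datastore or protocol serviced the command.
        return self._hypervisor.handle(self._vm, cmd)


if __name__ == "__main__":
    hv = Hypervisor()
    hv.place_disk("vm01", "vm01.vmdk", ToyDatastore("nfs-ds01"))
    controller = VirtualScsiController("vm01", hv)
    print(controller.send(ScsiCommand("READ", 2048)))
    hv.place_disk("vm01", "vm01.vmdk", ToyDatastore("vsan-ds01"))  # "Storage vMotion"
    print(controller.send(ScsiCommand("READ", 2048)))
```

The controller object the guest holds is the same before and after the move; only the hypervisor's placement table changes.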
It sends those storage commands, those SCSI commands, to that virtual SCSI controller, and that's all it really knows. So, let's take a look at this process. We have a few different options when it comes to our virtual disks. Just like CPU and memory, a VM doesn't actually have any physical storage hardware of its own. It's accessing a shared resource, and in this case that resource is called the datastore. Windows needs to see storage hardware. Windows needs to think, hey, I've got a SCSI controller, something I can send these SCSI commands to. So, we trick it: we give it a virtual SCSI controller and a driver. When Windows needs to read or write data, those SCSI commands are sent to the virtual SCSI controller, from there they hit the hypervisor, and they're redirected to the appropriate VMDK file for this virtual machine. This is what makes a lot of the vSphere storage features possible. I can do something like a Storage vMotion: I can move this VMDK to some other datastore, and the virtual machine will be completely unaware that it has happened, because all the virtual machine sees is the SCSI command going out on the virtual SCSI controller. What the hypervisor does with that SCSI command afterward is completely hidden from the VM.

There are also some options for the type of virtual disk we create for our VMs, and a very common choice is a thin provisioned disk. Let's assume we have a VM created with an 80 gig virtual disk, but the VM only has 40 gigs of actual data. With thin provisioning, only 40 gigs of actual storage capacity is consumed on the datastore.

… to two different physical switches for failover purposes. But the other problem that can occur here is at the Ethernet switch level. I could have some kind of problem in the Ethernet switch itself: incorrect MTU settings, or a switch that's just plain overwhelmed.
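The thin provisioning example (an 80 gig disk holding 40 gigs of data) comes down to simple accounting, sketched below in Python. It is a simplified model under stated assumptions: it ignores VMFS metadata, VM swap files, and snapshots, and the 200 GB datastore and three identical disks in the usage example are hypothetical. It also makes one consequence explicit: with thin disks, the space promised to guests can exceed the datastore's real capacity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VirtualDisk:
    provisioned_gb: int  # size the guest OS sees
    written_gb: int      # data the guest has actually written
    thin: bool = True

    @property
    def consumed_gb(self) -> int:
        # Thin disks consume only what has been written; thick disks reserve the full size up front.
        return self.written_gb if self.thin else self.provisioned_gb


def datastore_usage(capacity_gb: int, disks: List[VirtualDisk]) -> Dict[str, float]:
    """Simplified space accounting for one datastore (ignores metadata, swap, snapshots)."""
    provisioned = sum(d.provisioned_gb for d in disks)
    consumed = sum(d.consumed_gb for d in disks)
    return {
        "provisioned_gb": provisioned,
        "consumed_gb": consumed,
        "free_gb": capacity_gb - consumed,
        # Above 1.0, guests have been promised more space than the datastore has.
        "overcommit_ratio": provisioned / capacity_gb,
    }


if __name__ == "__main__":
    disks = [VirtualDisk(provisioned_gb=80, written_gb=40) for _ in range(3)]
    print(datastore_usage(capacity_gb=200, disks=disks))
```

With three such thin disks on a hypothetical 200 GB datastore, 240 GB is provisioned but only 120 GB is consumed, so the datastore is overcommitted by a factor of 1.2 and would fill up if every guest wrote its full disk.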
Maybe its CPU and memory are overwhelmed, or something along those lines. That's another potential spot where my problem might exist. I could have a congested link between the switch and my actual storage device. The processors on my NAS device could be overwhelmed, and I could have physical storage issues in the NAS device itself. These sorts of problems are typically going to manifest on multiple ESXi hosts. So, maybe I don't have a high enough spindle count. I think of spindle count like bottles of ketchup. If I'm squirting ketchup out of one bottle, I'm limited to the throughput of one bottle. But if I'm squirting ketchup out of four bottles at once, I can squirt four times as much ketchup simultaneously. It's kind of a weird analogy, but disks work the same way: if I'm pulling data from four disks instead of one, I can pull it up to four times as fast, assuming the I/O is spread evenly across them. That's why spindle count is important. So, maybe I don't have enough spindles, maybe my disks aren't fast enough, or maybe the cache is inadequate. Whatever the case may be, there are a bunch of potential problems you could have on the storage device itself as well. That's why having a diagram like this is really useful.

Let's take a look at one more diagram, this one for iSCSI. Again, the diagram doesn't change that much, but there are some different pieces to the puzzle. If the problem is isolated to a single VM, look at the operating system of that VM. When the virtual machine generates a SCSI command, the SCSI command is relayed to the storage adapter of the ESXi host. In this case we're using iSCSI, so we've got some kind of iSCSI storage adapter here. It could be an independent hardware iSCSI initiator, a dependent hardware iSCSI initiator, or a software iSCSI initiator. In this case, I've chosen a software iSCSI initiator.
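The ketchup-bottle analogy for spindle count is just multiplication, as the short Python sketch below shows. The per-disk IOPS figures are assumed rules of thumb, not measured or vendor numbers, and the model deliberately ignores RAID write penalties, cache, and controller limits, so real arrays will fall short of these idealized totals.

```python
import math

# Assumed, rule-of-thumb random IOPS per spindle. Illustrative only: real figures
# depend on I/O size, read/write mix, RAID write penalty, and cache.
ASSUMED_IOPS_PER_DISK = {
    "7200_sata": 80,
    "15k_sas": 180,
}


def aggregate_iops(disk_type: str, spindles: int) -> int:
    """Idealized 'more ketchup bottles' model: evenly striped I/O scales linearly."""
    return ASSUMED_IOPS_PER_DISK[disk_type] * spindles


def spindles_needed(disk_type: str, target_iops: int) -> int:
    """Smallest spindle count whose idealized aggregate IOPS meets the target."""
    return math.ceil(target_iops / ASSUMED_IOPS_PER_DISK[disk_type])


if __name__ == "__main__":
    print(aggregate_iops("7200_sata", 1), aggregate_iops("7200_sata", 4))
    print(spindles_needed("7200_sata", 1000), spindles_needed("15k_sas", 1000))
```

Under these assumptions, four 7,200 RPM disks deliver four times the IOPS of one, and a hypothetical 1,000 IOPS workload needs 13 such disks but only 6 of the 15K disks, which is why both spindle count and disk speed show up as suspects.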
So, the job of the storage adapter is to receive SCSI commands in their raw format from the virtual machine and package them as iSCSI packets so they can traverse the physical network. So, what might I need to troubleshoot here? Because the software iSCSI adapter is a software component of ESXi, it only performs well if there are adequate CPU resources on the ESXi host. The same applies to the virtual switch and the VMkernel port: they only perform well if the host itself has adequate processing resources. Then there's the virtual switch, or potentially multiple virtual switches. With iSCSI, I can create a VMkernel port on one virtual switch and a VMkernel port on a second virtual switch, and use round robin so the traffic uses both of those VMkernel ports. That's a great way to ensure that, number one, you're spreading your traffic across multiple physical adapters, and number two, if one of these switches or network connections fails, the traffic continues to flow. So, if I'm having performance problems and they're local to this host, I look at the CPU of this host and at its physical network connections. Are they overwhelmed? If neither of those is the problem, I move on to the network itself. Are the physical switches overwhelmed in CPU or memory, or is there just too much traffic hitting them? Has one of the connections to a storage processor failed? Are all of my connections actually up and running? Is there some kind of network problem that's creating latency? And then I can look at my storage array. Again, if I'm having storage array or network issues, I'm probably seeing problems on multiple ESXi hosts. If the CPUs of my storage processor are overwhelmed, that's going to cause all sorts of latency.
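The round robin behavior across two VMkernel ports can be modeled as a toy path selector in Python. In ESXi this setup is typically built with iSCSI port binding and the Round Robin path selection policy (VMW_PSP_RR), which by default moves to the next path after 1,000 I/Os. The class below is a hypothetical sketch, not the NMP implementation; the path names `vmk1` and `vmk2` are assumptions. It shows the two benefits from the text: I/O is spread across paths, and a failed path is skipped so traffic keeps flowing.

```python
from typing import Dict, List


class RoundRobinPaths:
    """Toy round-robin selector over iSCSI paths (one per VMkernel port).

    I/O rotates across healthy paths after `ios_per_switch` commands;
    a failed path is skipped so traffic continues on the survivors.
    """

    def __init__(self, paths: List[str], ios_per_switch: int = 1) -> None:
        self._paths = list(paths)
        self._up: Dict[str, bool] = {p: True for p in paths}
        self._ios_per_switch = ios_per_switch
        self._index = 0  # position of the path currently in use
        self._count = 0  # I/Os sent on the current path so far

    def set_state(self, path: str, up: bool) -> None:
        self._up[path] = up

    def next_path(self) -> str:
        if not any(self._up.values()):
            raise RuntimeError("all paths are down; I/O cannot be delivered")
        current = self._paths[self._index]
        # Switch after enough I/Os on this path, or immediately if it has failed.
        if self._count >= self._ios_per_switch or not self._up[current]:
            self._count = 0
            for _ in range(len(self._paths)):
                self._index = (self._index + 1) % len(self._paths)
                if self._up[self._paths[self._index]]:
                    break
        self._count += 1
        return self._paths[self._index]


if __name__ == "__main__":
    rr = RoundRobinPaths(["vmk1", "vmk2"])
    print([rr.next_path() for _ in range(4)])  # alternates between both ports
    rr.set_state("vmk2", False)                # simulate a failed uplink or switch
    print([rr.next_path() for _ in range(3)])  # all I/O now uses vmk1
```

Setting `ios_per_switch=1` alternates on every command, which makes the spreading easy to see; a larger value mirrors the batched switching of the real policy.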
I might even see storage commands being aborted. You'll see those in esxtop as aborts per second (ABRTS/s). Aborts are bad news: a command that gets aborted has been outstanding for so long that it's effectively abandoned. Aborts can happen if the storage array is overwhelmed or, again, if I have an inadequate spindle count or disks that aren't fast enough. Maybe I'm using 7,200 RPM SATA when I should really be using 15,000 RPM SAS, or flash/SSD. So, the key to troubleshooting and problem determination with storage is always a solid understanding of my storage and network topology, so that I can systematically work my way through it and determine the root cause of my storage problem.
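The systematic walk through the topology can be summarized as a small triage helper in Python. It encodes only the rules of thumb from this section: a problem isolated to one VM points at its guest OS, a problem local to one host points at that host's CPU and network connections, a problem seen on multiple hosts points at the shared network and the array, and any nonzero aborts per second suggests an overloaded array or slow or insufficient spindles. The function, enum, and step wording are hypothetical, not an official VMware procedure.

```python
from enum import Enum
from typing import List


class Scope(Enum):
    SINGLE_VM = "single VM"
    SINGLE_HOST = "single ESXi host"
    MULTIPLE_HOSTS = "multiple ESXi hosts"


def triage(scope: Scope, aborts_per_sec: float = 0.0) -> List[str]:
    """Return an ordered list of suspects for a storage performance problem."""
    steps: List[str] = []
    if scope is Scope.SINGLE_VM:
        steps.append("guest OS of the affected VM")
    elif scope is Scope.SINGLE_HOST:
        steps += [
            "host CPU (software iSCSI adapter, vSwitch, and VMkernel ports depend on it)",
            "host physical network connections",
        ]
    else:
        steps += [
            "physical switches (CPU/memory, traffic load, MTU settings)",
            "links to the storage processors or NAS",
            "storage array (SP CPU, spindle count, disk speed, cache)",
        ]
    if aborts_per_sec > 0:
        # Aborted commands waited so long they were abandoned: suspect the array.
        steps.append("investigate aborts: overloaded array or too few/slow spindles")
    return steps


if __name__ == "__main__":
    for step in triage(Scope.MULTIPLE_HOSTS, aborts_per_sec=3.0):
        print("-", step)
```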
- Storage commands
- Virtualized operating systems
- Securing storage traffic
- Network topology
- Connections and arrays
- VMkernel port
- Local and shared storage
- Storage adapters
- Configuring encryption options
- Creating clones and snapshots
- Storage performance and availability
- Failure management
- Analyzing metadata