Join Neil Anderson for an in-depth discussion in this video The benefits of SAN and NAS storage, part of Introduction to SAN and NAS Storage.
- [Instructor] In the last lecture, you learned the basics of what SAN and NAS storage are. In this lecture, you'll learn why we would use that external storage rather than internal storage in the client machines. So, I'll list the benefits of external SAN and NAS storage here, and I'll start off with a big one, what a lot of people would say is the main benefit: improved disk utilization compared to using internal, direct attached storage.
So, for example, if a server requires 100 gigabytes of storage for its operating system and applications when it is initially deployed, and it's expected to grow to a maximum of 300 gigabytes of storage capacity, it's pretty typical to install 500 gigabytes of disk space for that server. The reason is that you want to leave some overhead in case there's any unexpected growth, if it goes above that 300 gigabyte maximum you're expecting.
In that case, it might not be possible to add capacity later without taking the server offline, which would be a whole heap of hassle, so what you'll see a lot of companies doing is over-provisioning to make sure they don't run out of disk space anywhere. The problem with this is that you'll often get utilization of around 30 percent when you do that with direct attached storage. In our example, the server is normally using about 160 gigabytes, but you've got 500 gigabytes in there, so you're only actually using about 30 percent of the storage that you paid for. It's very low disk utilization, and it's very wasteful.
Breaking that down into the maths: our initial requirement when we first installed the server, for the OS and the apps, was 100 gigabytes per server, and for this example, let's say that we've got 50 servers, 'cause in a normal organization you're not going to have just the one server. This obviously multiplies the waste, because you're wasting storage space on all of those servers. So 50 servers times 100 gigabytes required for the initial deployment adds up to 5 terabytes of space required in total.
But the expected size we think those servers will each grow to is around 300 gigabytes per server. Multiply that by 50 and we get 15 terabytes of storage space in total that we expect to need, but because we're going to over-provision, we're going to end up buying 50 times 500 gigabytes, so we pay for 25 terabytes of storage when we do the initial deployment of those servers.
So let's say that we've actually got 8 terabytes in use in total, with the servers using an average of about 160 gigabytes of space each. 25 take away 8 is 17 terabytes of physical disk space that we've put in there and paid for that is not being used. We're paying for a lot of wasted space. So that's how it works with direct attached storage, using internal storage in our clients.
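The arithmetic above can be written out as a quick calculation (the numbers are the illustrative ones from the example, not measurements):

```python
# Direct attached storage example: over-provisioned capacity vs. actual usage.
servers = 50
provisioned_per_server_gb = 500   # over-provisioned "just in case"
used_per_server_gb = 160          # typical actual usage per server

provisioned_tb = servers * provisioned_per_server_gb / 1000   # 25 TB paid for
used_tb = servers * used_per_server_gb / 1000                 # 8 TB actually in use
wasted_tb = provisioned_tb - used_tb                          # 17 TB sitting idle
utilization = used_tb / provisioned_tb                        # 0.32, i.e. ~30 percent

print(f"Provisioned: {provisioned_tb:.0f} TB, used: {used_tb:.0f} TB, "
      f"wasted: {wasted_tb:.0f} TB, utilization: {utilization:.0%}")
```

Plugging in different per-server figures shows how quickly over-provisioning compounds across a fleet of servers.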
How about if we move to using external SAN/NAS storage? Those external storage systems provide a pool of centralized, shared storage that all of those different machines can use. Devices and applications can be allocated storage as required and, on enterprise class storage systems, the size can be easily changed on the fly, non-disruptively, so we can give a client or an application a particular size right now.
If the requirement changes later, whether it needs less or more, we can easily change that on the fly, and it's transparent to the client; it doesn't even know about it. Because we can do this, centralized storage can provide disk utilization closer to 80 percent. Now, you're never going to get 100 percent utilization. Say your servers are using 160 gigs on average: you can't just give them exactly that, because there wouldn't be any space left to write new data.
So obviously you do need to leave some overhead, some additional space for new data to be written, but we can keep the amount of physical storage we've provisioned much closer to what is actually being used, with just a little headroom, and that way it's much more cost-effective. So what are the technologies that help with this? First, thin provisioning. Thin provisioning makes it appear to the servers that they've got more storage than you've actually paid for: you buy a certain amount of physical disk, but it looks to the servers as though they've got more than that available.
With centralized storage, for the same example again, we buy 50 times 200 gigabytes per server, which is 10 terabytes of storage in total, rather than 25 terabytes. But the servers each believe they've got 500 gigabytes of space available, so we've actually paid for less than it appears to the servers. It looks like there's 25 terabytes of storage space available, but we've only paid for 10 terabytes, which costs a lot less than buying that much physical storage up front.
That 10 terabytes of shared space on the centralized storage is used first-come, first-served: whenever any of the servers writes, it starts using up that disk space, so the pool gradually fills. When that happens, you add additional space as and when needed, and it's transparent to the servers; they don't know anything has happened. This lets you pay for storage only as and when you need it.
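Here's a minimal sketch of that thin provisioning accounting. The `ThinPool` class is hypothetical, invented just for illustration, not a real storage API; the point is that the logical size presented to servers exceeds the physical space, which is only consumed as data is actually written:

```python
# Hypothetical thin provisioning sketch: 50 servers each see a 500 GB volume,
# but physical space is only drawn from the shared pool when data is written.
class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb   # what we actually bought
        self.consumed_gb = 0             # used first-come, first-served

    def write(self, gb):
        if self.consumed_gb + gb > self.physical_gb:
            # In practice you'd add disks before this point, transparently.
            raise RuntimeError("Pool full: add physical capacity")
        self.consumed_gb += gb

pool = ThinPool(physical_gb=10_000)      # 10 TB bought instead of 25 TB
logical_gb = 50 * 500                    # servers collectively "see" 25 TB
for _ in range(50):
    pool.write(160)                      # each server writes ~160 GB

print(f"{pool.consumed_gb} GB consumed of {pool.physical_gb} GB physical; "
      f"{logical_gb} GB presented to servers")
```

The gap between `logical_gb` and `physical_gb` is exactly the capacity you deferred paying for.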
Other technologies that help with disk utilization are deduplication and compression. Deduplication, just like it sounds, detects and eliminates identical blocks. Each eliminated block is replaced with a pointer to a single copy of that block on disk, so it saves you a lot of space: anything that's duplicated, you keep just one copy of, whereas with direct attached storage you'd be storing all of those blocks separately. We also have compression; deduplication and compression are usually used together.
Compression detects and eliminates redundant data and white space in files; it makes your files smaller. These technologies can give you huge space savings. Think of those 50 servers: say they were SQL servers, for example. All of them are running the same operating system and have the same patches applied, so that's all duplicate information. With 50 separate copies of that data, like we would have with direct attached storage, each server needs that extra disk space.
When we use deduplication and compression, we can keep just one copy of it for all of them, so in effect we can fit much more data into a smaller space, giving us those huge space savings. The actual figure depends on the workload, on how much duplicate data you have in your environment, and on how compressible the files being used are, but you can get really big storage savings; a four to one saving is pretty typical.
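As a back-of-the-envelope calculation, here's what that typical four to one reduction ratio does to the example's 8 terabytes of written data (the ratio is the rough figure quoted above; real ratios vary with the workload):

```python
# Effective space savings from deduplication plus compression,
# assuming the ~4:1 combined reduction ratio mentioned above.
logical_data_tb = 8          # total data the 50 servers have written
reduction_ratio = 4          # 4:1 is quoted as fairly typical
physical_needed_tb = logical_data_tb / reduction_ratio

print(f"{logical_data_tb} TB of data stored in "
      f"{physical_needed_tb} TB of physical disk")
```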
So, to summarize: pooled storage and thin provisioning move you from a just-in-case to a just-in-time model of purchasing storage space. You don't front-load it with more than you need right at the start; you buy a little more than you need right now, and as your storage needs grow, you add capacity on the fly. Deduplication and compression provide additional space savings on top, and all of that provides cost savings on your hardware. And because you've got fewer disks, you're taking up less rack space, less power, and less cooling, giving you more savings.
And the savings are multiplied because storage costs tend to come down over time. Because you don't buy the disks right now, but maybe a year from now, storage costs will have come down by then and you can get more for your money. The next benefit of external storage is performance and capacity. Now, when I say performance, you might say, but wait, if the clients are accessing their data across a network rather than on direct attached storage, that's going to add latency.
Yes, that's correct, but it's offset because data can be striped across many disks in an enterprise class storage system. Rather than writing that same amount of data to a few disks, you write it to many disks at the same time, which gives you better performance. And because you can have so many disks in the storage system, that gives you additional capacity as well. Another thing is that storage vendors are at the cutting edge of new storage technologies.
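The striping idea can be sketched in a few lines: a single write is split into fixed-size chunks and spread round-robin across several disks, RAID-0 style, so the chunks can be written in parallel (the chunk size and disk count here are made up for illustration):

```python
# Toy striping sketch: split one write into chunks distributed
# round-robin across several disks so they can be written in parallel.
def stripe(data: bytes, num_disks: int, chunk_size: int = 4):
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), chunk_size):
        disks[(i // chunk_size) % num_disks] += data[i:i + chunk_size]
    return disks

disks = stripe(b"ABCDEFGHIJKLMNOP", num_disks=4)
print([bytes(d) for d in disks])   # each disk holds every 4th chunk
```

With four disks, one 16-byte write becomes four 4-byte writes happening at once, which is where the performance offset comes from.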
They're storage vendors, that's what they do, so any new technology that improves performance, the storage vendors are going to be at the forefront of, and you'll be among the first to have it available when you're using their storage systems. The next benefit is resiliency. SAN and NAS storage systems are always built to very high degrees of resiliency, because you're almost always going to be storing mission critical data on them. The vendors know that, so they build the systems with no single point of failure.
If any single component fails, there's another component, another backup, ready to take its place. The next benefit is centralized management. Using that example with 50 servers again, it's obviously much easier to manage their storage from one central location than to have 50 distributed systems where you're trying to keep track of how much disk space each one is using. It's much easier through a single pane of glass.
Another thing we can do with SAN storage is have diskless servers. With SAN protocols, your clients can boot from a logical disk on the remote storage, which means they don't need a local hard drive at all. That's a very popular option with blade servers, and again, it gives you savings on your power, your cooling, and your space.
The next benefit is storage tiering. The storage system can have media with different attributes, like high-performance SSD drives and high-capacity but lower-performance SATA drives. If you don't know about drive types yet, don't worry, we're going to cover that coming up right after this. When you have those different types of disks in the storage system, high-performance disks and lower-performance but higher-capacity disks, you can use storage tiering. It analyzes the data and automatically keeps your frequently accessed, hot data on the high-performance disks, and archives off your cold data that isn't being accessed frequently onto the lower-performance, high-capacity disks, so it really optimizes the cost of the storage system.
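A tiering policy like that boils down to a placement decision based on access frequency. Here's a toy version; the threshold, file names, and access counts are all made up for illustration, and real systems work at the block level with far more sophisticated heuristics:

```python
# Toy tiering policy: hot data goes on SSD, cold data on high-capacity SATA.
def assign_tier(accesses_per_day: int, hot_threshold: int = 10) -> str:
    return "SSD" if accesses_per_day >= hot_threshold else "SATA"

# Hypothetical workload: access counts per item per day.
workload = {"orders.db": 500, "logs-2019.tar": 1, "homepage.html": 80}
placement = {name: assign_tier(hits) for name, hits in workload.items()}
print(placement)
```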
The next one is centralized backups. I can tell you from experience, managing backups to tape on distributed systems is very inconvenient and time-consuming. At the place I worked before, it was the first thing I did every morning. It was just a small place, with five servers, each with its own tape drive, and every morning I had to come in and change the tapes in each of those drives, do the labeling, and then arrange for them to be taken off-site.
So that was inconvenient. Restores were even more inconvenient, because I had to phone up the company where the tapes were stored, tell them the correct tape to get, they had to bring it over, and then it took ages to restore from tape as well. A benefit you get with SAN and NAS storage is that rather than having distributed systems that you back up individually, it gives you a centralized location for your backup solution as well. You can back up to tape if you want, or you can back up to remote disk.
You can back up to a different storage system somewhere else. That reduces your backup windows, 'cause it's slow to back up to tape, and it doesn't require you to load and unload physical media every day like you do with tapes, which is super inconvenient. The next thing is snapshots. Snapshots are a point-in-time copy of the file system which can be used as a convenient short-term backup. A snapshot consists of pointers to the original blocks on disk rather than being a new copy of the data. With a traditional backup, you have to copy the data to another location, which is time-consuming.
With snapshots, you're not actually moving the data anywhere; you're just recording how the file system looks right now. It only takes up space when you make changes after that, so taking a snapshot is pretty much instantaneous and takes up pretty much zero space. If your data gets corrupted or someone accidentally deletes a file later, you can recover it very quickly from the snapshot. So snapshots give you very quick, very convenient backups and restores. Now, the snapshot is stored in the same location as the data; it's not a separate copy of the data, so it does not replace an off-site backup.
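The pointer idea can be illustrated with a minimal copy-on-write sketch. The block map below is a stand-in for a real file system's block layout, simplified to a dictionary: taking the snapshot just records which blocks the volume currently points at, and the old contents only survive separately once a block is changed afterwards:

```python
# Minimal copy-on-write snapshot sketch: the snapshot records pointers to
# the current blocks, so taking it is instant and costs almost no space.
volume = {0: "blockA", 1: "blockB", 2: "blockC"}   # block number -> contents
snapshot = dict(volume)        # record the current block map (the "pointers")

volume[1] = "blockB-modified"  # a later change: only now do the views diverge

print("live:", volume[1])      # the modified data
print("snap:", snapshot[1])    # the original block, recoverable after mistakes
```

Notice the snapshot consumed extra space only for the one block that changed, which is why snapshots are cheap until the file system starts diverging from them.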
If your storage system burns down, you lose the disks and you lose the snapshots too, 'cause they're in the same place, so you still need an off-site backup, but your snapshots can be used in conjunction with that to give you that super convenient backup and restore. The next one is disaster recovery. The different storage systems all support their own replication technologies, where you can copy the data off to a disaster recovery site and keep the two sites closely synchronized with each other.
That means that if your main site blows up, you can fail over to the disaster recovery site. Also, if the data is read-only, you can load balance between the two sites. Say you've got your main site in New York and your disaster recovery site in London. If there's some kind of natural disaster in New York, like a flood, you can fail over to London. And for your read-only data, you can have people close to the U.S. reading it from New York and people located in Europe reading it from London, so you get the best possible performance for your different customers.
Now, you can only do that for read-only data, 'cause for writable data you need to keep one single consistent master copy. Okay, the last benefit I'm going to cover here is virtualization support. Software such as VMware and Hyper-V from Microsoft allows you to run multiple virtual servers on the same underlying physical hardware server. You could have, for example, a Linux web server, a Microsoft Exchange mail server, and a SQL database server all running on the same physical box, and those different virtual servers don't know they're sharing it.
If you want to learn more about virtualization, you can check out my introduction to cloud course, where I go into this in a bit more detail. Okay, the killer feature of virtualization software is the ability to move those virtual servers between physical servers on the fly with no outage. When you do that, the virtual servers keep on running, even if the underlying physical server fails or is taken down for maintenance.
So with this you can have multiple physical servers, with different virtual machines running on them. If one of those physical servers goes down, say it has a power outage, the virtual machines can automatically move onto a different physical server. And if you want to take a server down for maintenance, same thing, the virtual machines can automatically move to a different physical server. So that's a huge benefit of using virtualization.
When we're using VMware for this, the feature is called vMotion, and to do it, you have to have external storage. Most medium to large companies now use virtualization, so this is going to be a required feature. There are some workarounds that don't need external storage, but they're hard to use and come with a lot of caveats. So basically, if you're using virtualization, you're going to want an external SAN or NAS system to support your vMotion.
Okay, that was it! There were a load of benefits there, right? So hopefully you can now see the advantages you get from using external storage.