From the course: Ubuntu Linux: Storage Management

Understanding storage terminology

- [Instructor] Storage refers to systems and components on which information can be saved and retrieved as needed. A flash drive, a hard disk drive, a NAS, and a SAN are all different types of storage that can be used for different purposes. When talking about systems, storage is considered separately from memory. While both kinds of components store information, storage media are designed for mass storage over the long term, whereas memory is designed for holding smaller amounts of information for a much shorter period of time. Memory, or RAM, is extremely fast, but it's volatile; that is, it doesn't retain information when the power's removed. It's also comparatively expensive per unit of capacity, so we use it where it makes the most sense: to give a system some space to work. Think of memory as a workbench, or the top of a desk, where the system can keep a few things that it's currently working on. Storage, on the other hand, is quite a bit slower than memory, but it does retain information when it's powered off, and that's why we use it for holding data for longer periods of time. It's also much less expensive per unit of capacity. If memory is a desk surface in an office, storage is the filing cabinet, where information that isn't currently being worked on is kept. The information is accessible to the system in case it's needed, but it's not stored in memory all the time. So even though they're both often measured in gigabytes, memory and storage are different things used for different purposes. We typically refer to the devices used for storage as disks. A disk is a physical device you can hold in your hands. These devices can be traditional magnetic disks, SSDs, or other flash media. Storage devices are either local or remote. 
Local means that they're plugged into the system you're using, as is usually the case with a boot drive, or a USB disk, or anything else plugged into a local data bus on your computer, like Serial or Parallel ATA, USB, FireWire, or Thunderbolt. Remote storage is accessed over some kind of network medium, like Ethernet or Wi-Fi. Remote storage is provided by another system, to which your system connects. The devices that we'll be working with in this course are called block devices. Block devices are those which keep track of and communicate information in terms of blocks of binary information, as opposed to the other kind of device, called character devices. Character devices track information in bytes, interpreted as characters. A parallel port for a printer, or a serial port, like you might use for communicating with a microcontroller or programming some sort of network hardware, is a character device. The pieces of information sent back and forth are read as individual characters. A block device, like a hard drive, doesn't think about information in terms of characters, but rather just as chunks, or blocks, of information of a given size. Locally connected storage devices are represented as devices in the /dev directory, where every device and partition the system can find is represented with a series of letters and numbers. On many modern systems, we have disks attached on a serial bus, either the SATA (Serial ATA) bus, or mounted via USB, the Universal Serial Bus. Drives attached in these ways are represented by the letters sd, a prefix that comes from the kernel's SCSI disk driver, which also handles SATA and USB drives. And then they're given a letter to identify them, in the order that the system detects the devices. On older systems, or those with disks on a Parallel ATA bus, and some other buses, disks are labeled with hd instead of sd. The order in which these disks are detected by the system isn't guaranteed to stay the same, and we'll cover that when we look into mounting disks. 
But for now, the first disk the system detects is called a, then b, then c, and so on. Divisions within these disks, usually called partitions, are represented with a number, starting at 1. So on a disk that's the first or only one in the system, with three partitions, here in the /dev directory, and in other places where we see the descriptors for the disks, we'd see sda, the whole disk, and sda1, sda2, and sda3, the partitions on that disk, represented right alongside the disk itself. Aside from necessary hardware like a controller or mechanical actuators, disks contain media that store information as bits, which you can think of as a big field or space of areas that can each represent either an on or off state. The space can be divided in various ways. Usually there's just one partition with one filesystem on devices like USB drives, but there can be more than one, for various purposes. Most frequently, on a running system, you'll see a single device with one or more partitions, each containing a filesystem. Information about the partitions is stored on-disk, in the partition table. A type of partition management that's been around since the 1980s is called MBR, or master boot record. This structure only allowed for up to four partitions, and these were called primary partitions. You could also replace one of those primary partitions with an extended partition, which could store any number of logical partitions as well. You may come across this partition management scheme if you're working on a very old system. A more recent type of partition management is called GPT, for GUID partition table, where GUID stands for Globally Unique Identifier, and it doesn't suffer from the same limitation on the number of partitions. Most modern computers, especially Windows systems and Macs, use this partition type. 
Linux can use it as well, of course, and you'll probably use it if you want to dual-boot Linux on a fairly recent system, but it still considers space in terms of partitions. Once we have partitions, we can make filesystems on top of them, as we'll see later on, or we can go one step further with a tool called LVM, the Logical Volume Manager, which is an additional layer of management on top of standard partitions. While it uses standard partitions, usually on a GPT disk, LVM thinks about the storage as volumes and groups, and gives us a few advantages over traditional partitioning. Among other things, it gives us the ability to span a filesystem across more than one disk, which gives us a lot more flexibility in terms of how much data we can store in a single filesystem, and also allows us to provide more redundancy in the case of hardware failure. LVM first divides space up into PVs, or physical volumes, which are standard partitions, but are treated a little differently. These PVs become members of a VG, or volume group, an organizational unit that keeps track of the underlying volumes, whether they're all on one disk or spread across more than one. And on top of this volume group, we have one or more logical volumes, or LVs, which are what contain the filesystems that we use. These layers of abstraction are what give us more useful features and a more flexible approach to managing space. Whether it's on a standard partition or a logical volume, our storage needs a filesystem. The filesystem is the most user-facing part of the storage that we see. It's the data structure that keeps track of bits and shows them to us as files in folders, with various information, or metadata, about their size, creation and modification dates, permissions, and so on. There are a number of different filesystems, with features like snapshots and journaling, and we'll get into those ideas in a little more detail later on. 
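As a rough illustration of the metadata a filesystem keeps alongside a file's contents, here is a minimal Python sketch. It assumes nothing beyond the standard library; the `describe` helper and the throwaway temporary file are made up for this example and are not part of the course material.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return a few pieces of filesystem metadata for the given path."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,                   # length of the file's data
        "modified": time.ctime(info.st_mtime),        # last modification time
        "permissions": stat.filemode(info.st_mode),   # e.g. a string like -rw-------
    }


# Create a throwaway file so the example can run anywhere.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"hello storage\n")
    sample_path = handle.name

metadata = describe(sample_path)
print(metadata)

# Clean up the temporary file.
os.remove(sample_path)
```

None of these values are stored inside the file's contents; the filesystem tracks them separately and reports them when asked.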
The filesystem is the way that raw data is represented as files and folders to the user on the system. The metaphor of representing pieces of information as files, organized into folders, goes back a long way in computing, and filesystems that work this way go back a long way too. Filesystems store more than just the information that makes up files, though. Depending on the type, they'll store all kinds of metadata, or data about the data, like the creation and modification dates, permissions that determine access to the file, and some other bits of information that systems need. Whether it sits on a standard partition or a logical volume, storage needs to be mounted before it can be used, which means that the system not only needs to be aware that the storage hardware's available, but also needs to read information about the device and understand how to make the data on it accessible to a system or user. The system needs to be able to read and understand the filesystem of the mounted device in order to make the volume available for use on the system. And when the system is done using storage, it can unmount the device, or logically disconnect the filesystem, so it's no longer available and is in a safe state to disconnect or power down. When a device is mounted, its filesystem becomes available somewhere in the host system's filesystem. So, to tell a system to mount a volume, we need to know where the volume can be found, and where to show it. We need to specify where the system should make the filesystem or volume available within its own filesystem. Very often, the primary filesystem for the system is mounted at /, also called root, so the operating system knows where to find the parts that it needs in order to run. A separate boot partition is typically mounted at /boot, and other disks that might be used on the system are commonly mounted in subfolders under /mnt, but that's just a convention, not a rule. 
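The pairing of a device with a mount point can be sketched in Python. This is only an illustration: the `SAMPLE` text below is invented for the example (the device names and filesystem types are assumptions), laid out in the same whitespace-separated format Linux exposes in /proc/mounts, and `parse_mounts` is a hypothetical helper.

```python
import os

# Invented sample data in the /proc/mounts layout:
# device, mount point, filesystem type, options, dump, pass
SAMPLE = """\
/dev/sda2 / ext4 rw,relatime 0 0
/dev/sda1 /boot ext4 rw,relatime 0 0
/dev/sdb1 /mnt/data ext4 rw,relatime 0 0
"""


def parse_mounts(text):
    """Return (device, mount_point, fs_type) tuples from mount-table text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        entries.append((device, mount_point, fs_type))
    return entries


sample_entries = parse_mounts(SAMPLE)
for device, mount_point, fs_type in sample_entries:
    print(f"{device} is mounted at {mount_point} ({fs_type})")

# On a Linux system, the same parser can read the live mount table.
if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as mounts_file:
        live_entries = parse_mounts(mounts_file.read())
    print(len(live_entries), "filesystems currently mounted")
```

The sample mirrors the conventions mentioned above: the primary filesystem at /, a separate boot partition at /boot, and an extra disk under /mnt.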
You can mount disks anywhere, and some distros mount removable media like optical disks and disk images in the /media directory instead. This flexibility lets us mount devices accessible to the whole system, or just to specific users, depending on where they are and what the permissions on those paths look like. That's a lot of terms, but knowing them before we start working with the mechanics of the filesystem, partitions, and disks will be helpful.
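To tie a couple of these terms together, the device naming convention described earlier (a prefix such as sd or hd, a letter for the disk, and an optional partition number) can be sketched as a small parser. This is an assumption-laden illustration: `parse_device_name` is a hypothetical helper, and it deliberately handles only the sd and hd patterns covered here, not other naming schemes.

```python
import re

# Prefix (sd or hd), one or more letters for the disk, optional partition number.
DEVICE_NAME = re.compile(r"^(?P<prefix>sd|hd)(?P<letters>[a-z]+)(?P<partition>\d+)?$")


def parse_device_name(name):
    """Split a name like 'sda3' into ('sda', 3); return None if it doesn't fit."""
    match = DEVICE_NAME.match(name)
    if match is None:
        return None
    disk = match.group("prefix") + match.group("letters")
    partition = match.group("partition")
    partition_number = int(partition) if partition is not None else None
    return (disk, partition_number)


for name in ["sda", "sda1", "sda3", "hdb2", "tty0"]:
    print(name, "->", parse_device_name(name))
```

A whole disk such as sda comes back with no partition number, while sda3 is reported as partition 3 on disk sda; names outside the pattern, like tty0, are rejected.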
