Updated: 3/26/2019
Released: 10/3/2017
Skill Level: Intermediate
- [Instructor] In the last episode we created a RAID 5 array with three disks to protect against disk loss. Now let's take a look at what happens when one of those disks fails. The system may notify you that the disk has failed, or the disk may just disappear if there's some kind of catastrophic hardware failure. If you're getting errors about bad blocks or something like that, you'll know which device is having issues, and you can use that information to start replacing the disk. Let's say I've been getting errors from /dev/sdd. I'll first mark that disk as failed with mdadm --manage /dev/md0 --fail /dev/sdd1.
We can take a look at how the details have changed with mdadm --detail /dev/md0. Here I can see that one of my disks has been removed from my array and that sdd1 has been marked as faulty. I'll remove that disk from the array with mdadm --manage /dev/md0 --remove /dev/sdd1. And again I'll take a look at the status.
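The fail, remove, and status steps above can be written out as typed. This is a minimal sketch, not the instructor's script: the `run` helper and the `DRY_RUN` variable are hypothetical additions that only print each command unless `DRY_RUN=0` is set, and the device names are the ones used in the transcript.

```shell
#!/usr/bin/env bash
# Sketch: mark a member as failed, remove it, then show the array details.
# DRY_RUN defaults to 1, so commands are printed rather than executed.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

fail_and_remove() {
  array=$1
  member=$2
  run mdadm --manage "$array" --fail "$member"
  run mdadm --manage "$array" --remove "$member"
  run mdadm --detail "$array"
}

# Device names from the transcript; adjust for your own system.
fail_and_remove /dev/md0 /dev/sdd1
```

Running it with `DRY_RUN=0` as root would execute the commands for real, so it is worth reading the printed commands first.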
My array is degraded and I'm missing one of the members. My files are still here on the array because their data and parity were written across all three disks. Now the two remaining disks and the distributed parity will recreate the missing stripes as needed, though access will be a bit slower while the array is in this degraded state. In your scenario you will need to identify the failed physical disk in order to remove the correct one, and to do that you can use hdparm -i and the path to the disk.
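To illustrate how a degraded array can still serve data, here is a small worked example of XOR parity, the scheme RAID 5 uses. It models a single stripe with two data bytes and one parity byte. The byte values are arbitrary and the variable names are only for illustration.

```shell
# One stripe on a three-disk RAID 5: two data blocks plus their XOR parity.
d1=$(( 0xA5 ))
d2=$(( 0x3C ))
parity=$(( d1 ^ d2 ))

# Suppose the disk holding d2 is gone: XOR the two survivors to rebuild it.
rebuilt=$(( d1 ^ parity ))

printf 'parity=0x%02X rebuilt=0x%02X\n' "$parity" "$rebuilt"
# prints: parity=0x99 rebuilt=0x3C
```

Every read of a missing block needs this extra computation over the surviving disks, which is why access is slower in the degraded state.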
The hdparm -i output shows the manufacturer and serial number of the device. Mine is a virtual disk so this serial number is just random, but on a physical disk you could match this to the serial number on the sticker to figure out which physical device corresponds to this logical device. I'll shut my system down now so I can add a new disk. (keys clicking) You may not need to shut down a system to add a disk if you have hardware that supports hotplugging, but my little virtual machine needs to be shut down to add SATA disks.
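The serial-number lookup described above could be scripted roughly as follows. This is a sketch: `serial_from_hdparm` is a hypothetical helper, and the sample line only imitates the shape of the `Model=..., FwRev=..., SerialNo=...` line that hdparm -i prints, with made-up values.

```shell
# Extract the SerialNo= field from `hdparm -i` output read on stdin.
serial_from_hdparm() {
  sed -n 's/.*SerialNo=\([^,]*\).*/\1/p'
}

# Placeholder text in the shape of hdparm -i output; the values are made up.
sample=' Model=EXAMPLE DISK, FwRev=1.0, SerialNo=EX0000-1111'
printf '%s\n' "$sample" | serial_from_hdparm

# On a real system (needs root):
#   hdparm -i /dev/sdd | serial_from_hdparm
```

The printed serial number is what you would compare against the sticker on the physical drive.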
I'll start my system up again. Once my system starts back up I'll go through the steps of finding that disk with lsblk, and here it is. It's sde. If you physically disconnected a failed disk from your system, the disk that you plug back in might get the same device name as the one that was removed. So keep an eye out for that. But for now, we'll work with sde as the replacement disk. I'll use fdisk to create a partition and I'll write that partition table to the disk.
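One way to spot the new, still-empty disk in lsblk output is to look for whole disks that have no partitions under them. The sketch below assumes the `NAME`, `TYPE`, and `PKNAME` (parent kernel name) columns of lsblk. `disks_without_children` is a hypothetical helper and the sample listing is hand-written, not captured from the instructor's machine.

```shell
# List whole disks that have no child devices, reading the output of
# `lsblk -ln -o NAME,TYPE,PKNAME` on stdin.
disks_without_children() {
  awk '
    $2 == "disk" { disks[$1] = 1 }   # remember every whole disk
    NF >= 3      { parent[$3] = 1 }  # third column names the parent device
    END { for (d in disks) if (!(d in parent)) print d }
  '
}

# Hand-written sample: two array members with partitions and one blank disk.
sample='sdb disk
sdb1 part sdb
sdc disk
sdc1 part sdc
sde disk'
printf '%s\n' "$sample" | disks_without_children

# On a real system:
#   lsblk -ln -o NAME,TYPE,PKNAME | disks_without_children
# then partition the disk interactively, for example: fdisk /dev/sde
```

A disk used whole, without a partition table, would also show up here, so treat the result as a hint and confirm the device before partitioning it.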
Now let's add this new disk to the array. I'll write mdadm --manage /dev/md0 --add /dev/sde1 and the array will start to resync. We can take a look at that with mdadm --detail /dev/md0. The array is in a sensitive state right now. If we lose another disk before the array is rebuilt, the whole array will be lost.
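While the array rebuilds, mdadm --detail reports progress on a "Rebuild Status" line. The sketch below pulls the percentage out of that line. `rebuild_percent` is a hypothetical helper, the sample text is hand-written in the shape of that output, and the exact layout may differ between mdadm versions.

```shell
# Print the percentage from the "Rebuild Status" line of `mdadm --detail`
# output read on stdin; prints nothing if no such line is present.
rebuild_percent() {
  sed -n 's/.*Rebuild Status : *\([0-9][0-9]*\)% complete.*/\1/p'
}

# Hand-written sample in the shape of mdadm --detail output.
sample='          State : clean, degraded, recovering
 Rebuild Status : 45% complete'
printf '%s\n' "$sample" | rebuild_percent

# On a real system (needs root):
#   mdadm --manage /dev/md0 --add /dev/sde1
#   mdadm --detail /dev/md0 | rebuild_percent
```

An empty result most likely means the rebuild has finished, at which point the state should no longer list the array as degraded.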
It's not uncommon for disks to fail during a rebuild, due to a combination of Murphy's Law and the disks being worked harder than they normally are. That's why RAID 6 is usually recommended for important data nowadays instead of RAID 5. RAID 6 gives you an extra disk of redundancy, so it can survive two failed disks, but it requires one more disk. Okay, that's done. The array is resynced, and if you haven't mounted the device already you can do that and start using the array again. RAID isn't magical and it can come with its own problems, but it allows us to keep our data a little bit more safe than if it were just on one disk. You should still always have a backup of critical data, just in case.
Remember, RAID is not a backup. We've seen how to replace a disk in an array. If you want to explore this even more, a virtual machine is a great place to experiment with different failure modes and different RAID configurations.
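As a starting point for that kind of experiment, the RAID 5 versus RAID 6 trade-off mentioned earlier comes down to simple arithmetic: with equal-sized disks, RAID 5 spends one disk's worth of space on parity and RAID 6 spends two. The function names below are only for illustration.

```shell
# Usable capacity, counted in whole disks, for arrays of n equal-sized disks.
raid5_usable() { echo $(( $1 - 1 )); }   # single parity
raid6_usable() { echo $(( $1 - 2 )); }   # double parity

echo "RAID 5, 3 disks: $(raid5_usable 3) disks of data, survives 1 failure"
echo "RAID 6, 4 disks: $(raid6_usable 4) disks of data, survives 2 failures"
```

Both layouts hold two disks of data here. The fourth disk in the RAID 6 case buys tolerance for a second failure, such as one that happens during a rebuild.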
Video: Repairing a RAID array