Learn how a hash is a one-way function that can be used to check whether a file has been modified. This is commonly used when distributing software, like Linux installers.
- [Instructor] A file is just a big collection of bits, and when we use a file like a Linux installer ISO or really anything else, it's good to know that the file contains all the bits we expect it to. Files can be altered either by someone doing something sneaky, like a download site serving up a file that's been modified to include malware, or simply by random errors that occur in storage or during transmission over a network. Networking protocols and storage devices try to maintain integrity, but a Linux installer ISO has billions of bits, and if even one gets flipped, things may not work correctly.
Rather than comparing a file bit by bit to the original file, we can use a checksum to compare file contents. A checksum is the result of a hashing function, an algorithm that produces a repeatable answer for a particular input. A hash function treats the file as one long string of data, and it computes the checksum based on what the file contains. If the contents of the file change, the checksum will be different. And by comparing the checksums of files, we can know with very high confidence whether they've changed. You may have seen checksums on download pages for installers, archive files, and so on.
These are provided by whoever's offering the download, so you can check for yourself whether the file you're using matches, bit for bit, the file that they intended to provide. Again, even though this hash comparison can be useful to detect malicious changes, like an installer with malware added by the download site itself, it can also be used to make sure all the bits flowed from the server to your computer correctly. Let's take a quick look at generating and sharing checksums. I'll create a text file here.
And then I'll create a copy of it. There are a few different algorithms that we can use to generate and verify file checksums, and there are individual tools for each: md5sum, sha1sum, sha256sum, and so on. Each of them generates a checksum of a different length. Let's take a look at md5sum. I'll run that on both of these files here. I can see they have the same hash, this long string of characters here.
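The steps just described might look something like the following. This is only a sketch: it assumes the GNU coreutils tools are installed, and the file names `original.txt` and `copy.txt` are placeholders, not the names used in the video.

```shell
# Work in a throwaway directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a text file and a copy of it (placeholder names).
echo "Hello, checksums" > original.txt
cp original.txt copy.txt

# The contents are identical, so both lines show the same hash.
md5sum original.txt copy.txt

# Each tool produces a checksum of a different length:
# 32, 40, and 64 hexadecimal characters respectively.
md5sum original.txt | awk '{ print length($1) }'
sha1sum original.txt | awk '{ print length($1) }'
sha256sum original.txt | awk '{ print length($1) }'
```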
I'll make a change to one of these files, just adding one character. And now when I run that command again, I can see that the checksum of the changed file is completely different, not just off by one character the way the file itself is. This makes it easy to eyeball a checksum and see if it looks like the one you were expecting. But there is a more precise way of comparing a checksum. Let's say we went to download this file and were provided this checksum.
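A one-character change like the one just described could be reproduced roughly as follows. Again, this is a sketch with placeholder file names, assuming GNU `md5sum` and `cut` are available.

```shell
# Work in a throwaway directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "Hello, checksums" > original.txt
cp original.txt copy.txt

# Append a single character to the copy.
printf 'x' >> copy.txt

# The two hashes now differ across the whole string, not in one position.
md5sum original.txt copy.txt

# Keep only the hash (the first field) to compare the two in a script.
sum_original=$(md5sum original.txt | cut -d ' ' -f 1)
sum_copy=$(md5sum copy.txt | cut -d ' ' -f 1)
if [ "$sum_original" != "$sum_copy" ]; then
    echo "checksums differ"
fi
```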
I'll copy this checksum. Once the file's downloaded, we could write echo, paste the checksum, and add a space and the file name, and then pipe that into md5sum -c. Now the tool compares the checksum for us, telling us that this file is OK, which means that the checksum matches. Usually when you get a checksum from somewhere, the provider will tell you which algorithm it's for. If not, you can use trial and error and see which tool outputs the checksum you're expecting.
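As a rough illustration of that check, the sketch below generates its own "published" checksum so that it can run anywhere. In practice that value would be pasted from the download page. The file name `download.txt` is a placeholder, and GNU `md5sum` is assumed.

```shell
# Work in a throwaway directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "Hello, checksums" > download.txt

# Stand-in for the checksum a download page would publish.
published=$(md5sum download.txt | cut -d ' ' -f 1)

# Feed "checksum  filename" to md5sum -c. Two spaces are used here
# because that is the layout md5sum itself prints.
echo "$published  download.txt" | md5sum -c
# prints: download.txt: OK

# A checksum that does not match is reported as FAILED, and the
# command exits with a non-zero status.
wrong=$(printf 'something else' | md5sum | cut -d ' ' -f 1)
echo "$wrong  download.txt" | md5sum -c || echo "verification failed"
```

Because `md5sum -c` sets its exit status, the same check can sit in a script and stop an install when the file doesn't match.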
One thing to note: when you're comparing a checksum, it's important that you get the original from a trusted source, like a website viewed over HTTPS. Some providers also use signed checksums, relying on a cryptographic key stored on a key server. I won't go over that here, because that gets into more than just basic checksums. Checksums are really helpful for verifying the integrity of a downloaded file, and you can provide checksums of files you are passing along to help the recipient ensure that the file they get hasn't been modified.
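Passing a checksum along with a file might be sketched like this. The file name `report.txt` is a placeholder, and the choice of `sha256sum` rather than `md5sum` is an assumption made here, not something shown in the video. MD5 and SHA-1 are known to be vulnerable to deliberately crafted collisions, so SHA-256 is a common choice when tampering is a concern.

```shell
# Work in a throwaway directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "Contents to share" > report.txt

# Sender: write the checksum line to a file that travels with report.txt.
sha256sum report.txt > report.txt.sha256

# Recipient: check the received file against that checksum file.
sha256sum -c report.txt.sha256
# prints: report.txt: OK
```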
Note: Because this is an ongoing series, viewers will not receive a certificate of completion.