In this movie we're going to take a look at a Unix tool called DIFF, which is useful for comparing two files. Imagine that a client sends you revisions to a text document. You still have the original document, but you can't tell what the client changed. You could put the two documents side by side and scan each one looking for the differences, and every time you found a difference you could take note of it. That's exactly what DIFF does for you, but much faster and with greater precision than you could. Using DIFF is easy. Notice that I am inside my user directory and inside unix_files, and we're going to be working with two files that I've added to this directory.
The first one is original_file.txt, the other one is revised_file.txt. Both of these are included in the Exercise Files but they're also easy for you to create yourself. The first one just has several lines that say delete, several lines that say change, and several lines that say append. And then on the far left side I've included the line numbers for reference. The revised file has something very similar except that now I've deleted one of the lines that says delete, changed one of the lines that says change, and appended a line to the section of lines that say append.
Now what we want to do is ask DIFF to compare these two files and report the differences between them. We do that with diff, and then as the two arguments we pass in the two filenames we want to compare. Typically, you want to put the old or original file on the left. You don't have to, but that's the convention. So we type diff original_file.txt revised_file.txt. DIFF compares the two files and reports the changes to us. Now the way that it reports those changes might seem a little cryptic at first, so let's understand what it's telling us. It found three changes altogether.
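If you don't have the Exercise Files handy, here is a minimal sketch of that command. The file contents are an assumption based on the description above (shorter stand-ins, not the actual exercise files), and the temporary directory is just a scratch area.

```shell
# Hypothetical stand-ins for the two exercise files (contents assumed).
workdir=$(mktemp -d)
printf '%s\n' '1 delete' '2 delete' '3 change' '4 change' '5 append' '6 append' \
  > "$workdir/original_file.txt"
printf '%s\n' '1 delete' '3 change' '4 changed' '5 append' '6 append' '7 appended' \
  > "$workdir/revised_file.txt"

# Old (original) file first, new (revised) file second.
# diff exits with 0 when the files match, 1 when they differ, and more than 1 on error,
# so we record the status instead of treating 1 as a failure.
status=0
diff "$workdir/original_file.txt" "$workdir/revised_file.txt" || status=$?

echo "exit status: $status"
```

Swapping the two arguments still works, but every deletion would then be reported as an append and the arrows would point the other way, which is why the original file usually goes first.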
The first change is represented by these two lines. The d in the first line is letting us know that it detected a deletion. Something was deleted. The numbers on either side of the d let us know the line number where this would occur in each of the two files. The number on the left corresponds to the file that was passed in on the left, the first argument. The number on the right corresponds to what I call the right file. That's the second argument. Notice also that it tells us the text that was deleted, and it has an angle bracket at the beginning that's acting as an arrow, letting us know that this occurs in the left file, the original file.
The second change that it found is described by these four lines. Now instead of a d we have a c, letting us know that there was a change. Once again, we have the line numbers, and you can see why it's useful to have those line numbers: the position of these two lines has changed, even though they're being compared. And if we had a lot of deletions, it might have changed a lot. This line might have jumped up a hundred lines in the file. Having the line number helps us to locate the change regardless of what else has happened. Notice that now, instead of one line, it gives us two lines with dashes in between. The first one has an arrow pointing to the left, letting us know that this is the text as it exists in the left file, and then we have an arrow pointing to the right, letting us know that this is the text as it exists in the right file.
And then last of all, we have append, which uses an a. So we have d, c, and a. Once again, it tells us the line numbers. Don't let it throw you. This says 11 and this says 12. That's because we deleted line 2. The arrow this time points to the right, letting us know that this is the text that exists in the right file. So deletes will always point to the left, appends will always point to the right, and changes will always have arrows that point in each direction and show us the text from both. So even though it may have seemed cryptic at first, once we understand how to read it, it's actually a very efficient way to describe the changes.
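To see all three kinds of change in one place, here is a small sketch using shortened stand-in files. Their contents are assumed, so the line numbers are smaller than the ones on screen in the video. The expected output in the comments is what the default format produces for exactly these two files.

```shell
# Hypothetical six-line stand-ins for the exercise files (contents assumed).
workdir=$(mktemp -d)
printf '%s\n' '1 delete' '2 delete' '3 change' '4 change' '5 append' '6 append' \
  > "$workdir/original_file.txt"
printf '%s\n' '1 delete' '3 change' '4 changed' '5 append' '6 append' '7 appended' \
  > "$workdir/revised_file.txt"

# diff exits with status 1 when the files differ; "|| true" keeps that
# from stopping a script that runs with "set -e".
output=$(diff "$workdir/original_file.txt" "$workdir/revised_file.txt" || true)
printf '%s\n' "$output"
# 2d1            <- line 2 of the left file was deleted
# < 2 delete
# 4c3            <- line 4 on the left changed into line 3 on the right
# < 4 change
# ---
# > 4 changed
# 6a6            <- a line was appended after line 6 of the left file
# > 7 appended

# Only the lines that start with a digit are the change commands themselves.
commands=$(printf '%s\n' "$output" | grep '^[0-9]')
printf '%s\n' "$commands"
```

Filtering on the leading digit is just a quick way to get a summary of what changed (2d1, 4c3, 6a6) without the text of each line.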
There are a number of other formats that we can output these results in. This is just the default one. We'll take a look at those, but before we do that, let's take a look at some of the options that we can pass to DIFF for how it goes about comparing the two files. If we give DIFF the -i option, then it performs a case-insensitive comparison. A capital A and a lowercase a would be considered exactly the same. It would not bother reporting that difference to us. If we want it to ignore changes in the amount of blank characters like Space and Tab, or all of the white space, or the number of blank lines that are in the file, then we can use the -b, -w, or capital -B options, respectively.
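Here is a rough sketch of those four options in action. The file names and contents are made up for illustration, and compare is a hypothetical helper that only reports whether diff found any differences.

```shell
w=$(mktemp -d)

# Hypothetical helper: prints "same" when diff exits with 0, "different" otherwise.
# Any arguments are passed straight through to diff.
compare() {
  if diff "$@" > /dev/null; then echo same; else echo different; fi
}

printf 'Hello World\n' > "$w/case_a.txt"
printf 'hello world\n' > "$w/case_b.txt"    # same text, different case
printf 'one two\n'     > "$w/space_a.txt"
printf 'one    two\n'  > "$w/space_b.txt"   # wider gap between the words
printf 'onetwo\n'      > "$w/space_c.txt"   # no gap at all
printf 'one\ntwo\n'    > "$w/blank_a.txt"
printf 'one\n\ntwo\n'  > "$w/blank_b.txt"   # one extra blank line

case_plain=$(compare    "$w/case_a.txt"  "$w/case_b.txt")   # different
case_i=$(compare -i     "$w/case_a.txt"  "$w/case_b.txt")   # same: case ignored
gap_b=$(compare -b      "$w/space_a.txt" "$w/space_b.txt")  # same: amount of space ignored
nogap_b=$(compare -b    "$w/space_a.txt" "$w/space_c.txt")  # different: -b still needs some space
nogap_w=$(compare -w    "$w/space_a.txt" "$w/space_c.txt")  # same: all white space ignored
blank_B=$(compare -B    "$w/blank_a.txt" "$w/blank_b.txt")  # same: blank lines ignored

echo "no options: $case_plain, -i: $case_i"
echo "-b (wider gap): $gap_b, -b (no gap): $nogap_b, -w (no gap): $nogap_w"
echo "-B (extra blank line): $blank_B"
```

The -b versus -w comparison is the one worth noticing: -b only forgives a change in how much white space there is, while -w forgives white space appearing or disappearing entirely.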
DIFF doesn't just compare files. It'll actually compare whole directories of files. We can use the lowercase -r option to recursively compare two directories. It will look for files that have the same name in each of those two directories and tell us what the differences are, going down through each and every one of those files. Now if two of those files are identical, it won't report it. It'll just ignore them and only tell us about the ones that actually have differences. If we also want it to tell us about the files that are identical, we have to use the -s option. These are the primary options for controlling how DIFF does this comparison, but there are a few others and you can read the man pages to see what those are.
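As a sketch of the directory comparison, assuming two made-up directories named old and new that each hold one identical file and one file that differs in a subdirectory:

```shell
w=$(mktemp -d)
mkdir -p "$w/old/sub" "$w/new/sub"

# Hypothetical directory contents, for illustration only.
printf 'unchanged\n'    > "$w/old/same.txt"
printf 'unchanged\n'    > "$w/new/same.txt"
printf 'first draft\n'  > "$w/old/sub/notes.txt"
printf 'second draft\n' > "$w/new/sub/notes.txt"

# -r walks into subdirectories; identical files are silently skipped.
# diff exits with 1 when it finds differences, so "|| true" keeps a
# "set -e" script from stopping here.
recursive=$(diff -r "$w/old" "$w/new" || true)
printf '%s\n' "$recursive"

# Adding -s also reports the files that turned out to be identical.
with_identical=$(diff -rs "$w/old" "$w/new" || true)
printf '%s\n' "$with_identical"
```

With -r alone, only sub/notes.txt shows up in the report. With -rs, same.txt is mentioned as well, flagged as identical.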
But these are the most common ones.