In this movie we're going to talk about the way that Git refers to its commits. Remember in the last movie, where we talked about the workflow, we had different changes that we moved from our working directory, to our staging index, and to the repository. And I just went ahead and gave those the very simple labels A, B, and C. Now that's not what Git calls them, and what we're going to be looking at in this movie is how Git refers to each of these snapshots of changes. Now be careful: don't mistakenly think that A, B, and C refer to a single file in any way at all. In our example we were using a single file, but these are change sets, sets of changes, and more often than not they will refer to multiple files.
So in a typical Git workflow, A might represent changes to five files, B might represent changes that were made to three files, and C might be two new files that were added to the repository. So A, B, and C are snapshots of the changes that were made, not individual files or versions of files. So let's take a look at how Git does refer to these change sets. When we submit these changes to the repository, at that point Git generates a checksum for each change set. A checksum is a number that's generated by taking data and feeding it into an algorithm, so a checksum algorithm converts data into a simple number, and we call that simple number a checksum.
The same data put into the algorithm always gives the same checksum coming out. That's important because if we change the data going in, we get a different checksum out. So one of the most common uses for checksums in computers is to make sure that the data didn't change; if the data changed, well then the checksum will be different. And this data integrity is fundamentally built into Git. That's very different from many other version control systems, which don't use checksums in this way to validate that the data hasn't changed. Git does: it makes sure that you can't change what's in a commit without also changing the checksum that comes out of it. Changing the data changes the checksum.
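Here is a small sketch of that idea in Python. It is only an illustration of how checksums behave, not Git's own code; the helper name `checksum` and the sample strings are made up for this example, and SHA-1 from the standard `hashlib` module stands in as the checksum algorithm.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-1 checksum of the data as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical sample data, chosen only for illustration.
original = b"Hello, Git!\n"
same_again = b"Hello, Git!\n"
modified = b"Hello, Git?\n"  # a single character differs

# The same data going in gives the same checksum coming out.
print(checksum(original) == checksum(same_again))  # True

# Changing the data changes the checksum.
print(checksum(original) == checksum(modified))    # False
```

Comparing a stored checksum against a freshly computed one is how you would notice that the data changed.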
Now the way that Git generates this checksum is by using the SHA-1 hash algorithm. You don't need to know anything about that hash algorithm itself, but you do need to know that it's called that, because you will often hear people refer to this checksum or hash as being the SHA, or S-H-A value. The value that the algorithm generates is always going to be a 40-character hexadecimal string. Hexadecimal means it can contain the digits 0 through 9 and the letters a through f. So an example might look something like this: 5c15e8bd540 and so on, 40 characters long, made up of those characters.
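To make the 40-character hexadecimal string concrete, here is a hedged sketch that hashes the contents of a single file the way Git's `git hash-object` command does for a file (a "blob" object): SHA-1 over a short header followed by the content. The function name `git_blob_sha1` is made up for this example, and it covers only file contents, not whole commits.

```python
import hashlib
import string


def git_blob_sha1(content: bytes) -> str:
    """SHA-1 of file content prefixed with Git's blob header."""
    # The header is the word "blob", a space, the content length in
    # bytes, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


sha = git_blob_sha1(b"hello\n")

# The result is always 40 characters long...
print(len(sha))  # 40

# ...and uses only the digits 0-9 and the letters a-f.
allowed = set(string.digits + "abcdef")
print(all(ch in allowed for ch in sha))  # True
```

If you have Git installed, you can compare the value this prints for a given input with what `git hash-object` reports for a file with the same contents.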
So what Git does is it takes the entire set of changes, runs them through an algorithm, and in the end comes out with this one 40-character string. We've seen this value before. When we ran our git log command, here is that ID that I told you Git uses to track each one of our commits. It's right there: this is the SHA, S-H-A value, or the commit ID, you can call it whatever you want really. But it is a value that will be unique to the changes that are in this commit. So the way that Git actually attaches that information is that if we have those three snapshots, those sets of changes, it feeds them into its algorithm to come up with the S-H-A value, and then it attaches a bit of meta information to each one of those snapshots: it has that commit ID at the top, it has the parent commit, the commit that comes before it, the author of the commit, and then the commit message.
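As a rough model of that meta information, here is a toy sketch in Python. It is a deliberate simplification and not Git's real storage format: the `Commit` class, its field names, the author name, and the commit messages are all hypothetical, and the snapshot is reduced to the labels A, B, and C.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Toy stand-in for a commit: metadata plus a label for its snapshot."""
    snapshot: str           # stands in for the set of changes (A, B, or C)
    parent: Optional[str]   # SHA of the commit before it; None for the first
    author: str
    message: str

    @property
    def sha(self) -> str:
        # Hash everything, including the parent's SHA, so changing any
        # earlier commit would change every SHA that comes after it.
        text = f"{self.snapshot}\n{self.parent}\n{self.author}\n{self.message}"
        return hashlib.sha1(text.encode("utf-8")).hexdigest()


a = Commit("A", None, "Author Name", "Initial commit")
b = Commit("B", a.sha, "Author Name", "Second commit")
c = Commit("C", b.sha, "Author Name", "Third commit")

# Walk backwards from the newest commit by following parent SHAs.
by_sha = {commit.sha: commit for commit in (a, b, c)}
history = []
current = c
while current is not None:
    history.append(current.snapshot)
    current = by_sha.get(current.parent) if current.parent else None

print(history)  # ['C', 'B', 'A']
```

Each commit records only the SHA of its parent, and that is enough to recover the whole sequence.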
So here you can see how the series of those commits are linked together. You can see that the parent for each one refers to the SHA-1 value of the one before it, the identifier that came before, and that's how Git knows the sequence of those commits. And then each one of those, each bit of meta information, points at a snapshot, a set of changes, or a Git object. Understanding how Git generates these hash values is important, because it helps us understand how Git summarizes these snapshots, it illustrates the data integrity that's built into Git, and most importantly we're going to be using these SHA-1 hash values to refer to the commits.