- [Instructor] Let's talk about data normalization. Normalizing your data is a really tough topic to get your head around. A lot of professionals don't understand normalization entirely, so for us to grasp it, we really need to break it down and look at it from a step-by-step viewpoint. And you'll take in what you can, and do what you can to use normalization as you're buiding your database. It fits in here with the defining your data elements, because normalizing your data has to begin as soon as you organize it. The reason that normalization is important is because it increases data integrity, and the integrity of your data is absolutely vital.
If you have problems with your data down the road, you're going to end up deleting files as you're trying to delete an aspect of a file, like a piece of data. For example, if you want to delete a customer's address so you can re-enter it, or because the customer has moved and hasn't given you any forwarding address yet. And suddenly, the entire customer file is gone. Well, that happens because you didn't properly normalize your data, and your data dependencies aren't set up properly. So this topic falls within the defining your data elements.
As soon as you start organizing your data, now's the time to look at normalization. Well, the question is, what is normalization? And I've got a few steps for you. There are many different forms of normalization. We're going to focus on the first three forms, and you may only be able to achieve the first one or two. Really, it's a choice. Sometimes, data normalization just isn't practical. It's way too involved for the small database that you're building. But, as I said earlier, databases have a way of growing out of control.
They take on a life of their own, and they continue to expand and expand, as users begin entering data and managing their workflow. Normalizing your data is an important part of making sure you have a proper foundation. Look at this information that we have here. I have a simple table defined in the box at the top of my screen. I'm going to be tracking a description, a manufacturer, a manufacturer's address and phone number, the quantity we have in stock, and the department that it falls into. This list is my attempt to manage my inventory.
Right now, the data is un-normalized, meaning that the data integrity is very low, and the chances for error are very high. If you look at the table that results from this particular organization, you'll see that in my descriptions, I have blue pens and black pens all in the same line. The manufacturer of both those type of pen is Pens, Inc., and you can see the information for the manufacturer there. The quantity I have in stock represents both of those pieces of data. Now, card stock and office chair are still individual, but I'm sure that Papers, Inc., and Chairs, Inc., both make products that we're going to carry multiple versions of.
In this current table, before we have applied even the first normalization rules, you can see that we have problems right away. And most people have just an intuitive common sense understanding of organizing your data where this wouldn't happen. But let's take a look at the first normalization. When you bring your data into first normal form, you're eliminating redundancies. Well, in order to eliminate the redundancies in my previous table, we needed to add one item, and that was the item number.
By adding an item number, we can uniquely identify each item, and as you can see, the redundancy isn't there that we had before. We had the blue pen and the black pen both being described in one line. Well, that isn't going to fly. You can't search for each individual item. Now that you have a unique identifier, this being the item number, we've brought our data into first normal form by eliminating those redundancies, and the manufacturer just appears twice, because we have two items that we're carrying from the same manufacturer.
This is still a relatively low version of normalized data. First normal form is easy to achieve. It's essential, but it's only the first step. Now, let's look at bringing this data into second normal form. Second normal form involves looking at your dependencies. Well, what do I mean by dependency? Well, if we go back a step, in first normal form, all of the information in one record, which is one row, was dependent on the item number.
The item number became the key for that record, the unique piece of data that all this information was being sorted by. Well, the description of blue pen is dependent on our item number. But the manufacturer is not dependent on the item number. So the manufacturer information really is a separate issue from the inventory. So if we go back to second normal form, you'll see that, to bring it into second normal form, I had to separate those items.
Now, we have a unique item identifier for each inventory item. And we have a unique identifier for each manufacturer. And then, we simply reference one with the other. So, in item number 1001, the blue pen, it's manufacturer 2001. Now, we can easily refer to the other table, for the manufacturers, and see that 2001 is Pens, Inc., at this address and this phone number. We've brought our data into second normal form by breaking out information that isn't dependent on the unique identifier for that record.
This is also an important process, it's absolutely essential. Bringing your information into second normal form requires that you have satisfied first normal form. The normalization of your data is a building process. You can't have second normal form without satisfying first normal form. And likewise, we can't move on to third normal form without satisfying first and second normal form. So let's take a look at third normal form. Third normal form may be a little more difficult to achieve.
And you may not absolutely satisfy third normal form, but it's definitely a goal that we want to keep. In some cases, it won't be practical. What third normal form states is that, all of your data in a record must be directly dependent on the key. So in this case, I needed to break out the departments, because department information was not directly dependent on the key. The item number 1001, for example, and that was our unique identifier for each record, does not describe stationery or supplies or furniture, which are the department types that we can assign.
Those department types are separated, and need to be separated, within our database structure. So that information gets broken out, and now, for the small little example that we're looking at, this satisfies third normal form. And, as I mentioned earlier, in order to satisfy third normal form, we had to meet second normal form and first normal form before it. I'll admit that normalization can be tricky. But if you look at normalization as a step-by-step process, and try to continually improve your data, it's definitely worth it.
The payoffs in the end is a very sound structure for your database that isn't going to be easy to corrupt. So, look at data normalization, there's a lot of material out there on it. Try to comprehend the concepts behind data normalization, and apply it in your design at an early stage, and keep going back to check and double-check yourself to make sure that you're following those forms of normalization.