Join Adam Wilbert for an in-depth discussion in this video Understanding the third normal form (3NF), part of Database Fundamentals: Storage.
- View Offline
- Once we've checked our tables to ensure that they satisfy both 1NF and 2NF we can turn our attention to satisfy the Third Normal Form. 3NF starts with the assumption that the table is in the Second Normal Form and then adds the requirement that every field is non-transitively dependent on the primary key. That means that the data contained in a field is not determined by another field that's not part of the key. This typically includes calculations in other types of derived data. Let's return to management studio and see how to apply this.
This is the text that we wrapped up with in the last movie. If you need to catch up you can use the 3NF-Start text file in order to get back to this point. Now what I want to do is go down and create another table down here at the very end. It's going to store about our orders so I'll go ahead and scroll down and we'll start a new create table statement and the name of the table is going to be Orders. And the type of information that I want to store in my Orders table is a unique identifier for each order.
So I'm going to call this Order ID and just like before I want to remember that this is going to be the primary key for the table so off on the side I'm just going to type a comment with the two hyphens and I'll just put a reminder to myself that we'll establish this as the primary key at some point later in the future. The next field I want to know who placed the order so I want to have a reference to the Customer ID. Next I want to know which product they ordered and that will come out of a products table that we haven't created yet but that will come eventually. So I'll call this Product ID at this point.
After that maybe I want to know the name of the product so I'll go ahead and type in Product Name. We'll need to know the price that the customer paid so I'll type in Purchase Price. I want to know how many they bought so I'll type in Quantity and finally I'm going to want to know the total for the entire order. So let's first evaluate this table against the first normal form. It has a primary key established and we're not trying to stuff multiple values into a single field so it meets first normal form. What about second normal form? Each of our columns describes the order and we don't have a composite key so we also satisfy the second normal form.
Now in determining 2NF we're concerned with attributes being dependent on the whole key. For third normal form we're checking that attributes depend on nothing but the key. Just like with 2NF if we find any offending columns we'll simply move them into their own related data table. In this case the product name is dependent on the product ID that was ordered which in turn is determined by the order ID number. So we have a situation where the product name is only transitively dependent on the primary key. So it should actually be removed from the table. It belongs in the products table instead.
So I'm going to go ahead and highlight this and just remove it from our list here. Now you might be saying that the purchase price is also dependent on the product ID and you can make a strong argument that it should also be removed and stored in the related products table with the name. I would argue that the purchase price and the product price represent two different things. They might be the same value at the time of the purchase but what happens when the product price changes at some point in the future. You wouldn't want the invoice to suddenly jump in price because it's tied to the current price instead of the price at the time of purchase. Storing historically data gets a little bit tricky but in this case I'd suggest keeping the purchase price separate from the current product price in order to avoid any accounting discrepancies.
so we'll leave this field here inside of the Orders table. Next the Totals field here is transitively dependent on the primary key based off of the quantity and purchase price fields. Once we know both of those we can calculate the total with some simple multiplication. We can do so without having any knowledge of the order ID. In fact this type of calculation is a job for a query to perform on the fly and we shouldn't actually store the total anywhere in the database so let's go ahead and just get rid of that column altogether instead of moving it to a new table. So there's our Order table.
We've just confirmed that it meets the Third Normal Form requirements. We can go through and double check some of our other tables to make sure that they also meet Third Normal Form. One way that we can check to make sure that our tables meet Third Normal Form is to run through a couple of mental questions inside of our data tables and see if we can find some answers. So for instance if I take a look at our customers table here and if I were to give you a single last name, would you be able to tell me without a doubt exactly which customer that name belonged to? Now in this case we might have multiple customers with the same last name so you wouldn't be able to identify a single specific customer just based off of their last name.
We can do the same thing with the city or the state. If I gave you the state of California, could you tell me the one customer that I'm talking about? You wouldn't be able to because we could have multiple customers from the same state. In fact, if you go through that same kind of scenario through all of the different fields in our data table you'll find that each one of these doesn't provide you a single conclusive piece of information that will provide you with a single specific customer. We can do the same thing with our credit card or our customer phone numbers. If I gave you a single phone number, would you be able to uniquely identify the exact customer that that phone number belongs to? No, because you can have multiple customers with the same phone number.
Same thing applies with our credit card numbers here. So now we can see that all of our tables satisfy the requirements of the Third Normal Form. Anytime you add a new table to the database or a new column to an existing table you should run through the process again to make sure that your database is still in 3NF.
Note: This course will also prepare certification candidates for the "Understanding data storage" domain of the Microsoft Technology Associate (MTA) Exam 98-364, Database Administration Fundamentals. Find out more at https://www.microsoft.com/learning/en-us/exam-98-364.aspx.
- Reducing redundancy with normalization
- Fixing partial dependencies
- Creating primary keys
- Linking tables with foreign keys
- Creating clustered and nonclustered indexes