Join Simon Allardice for an in-depth discussion in this video First normal form, part of Foundations of Programming: Databases.
- Before we apply the first set of criteria, what's called First Normal Form, often shortened to 1NF, I'm taking as a given that we already have our columns and our primary keys specified. Okay, First Normal Form says that each of your columns and each of your tables should contain one value, just one value, and there should be no repeating groups. Okay, what does this actually mean? Well, let's say I begin developing a database for my company, and one of my tables is an Employee table. Very simple stuff.
EmployeeID, LastName, FirstName, and so on, and we allocate every employee a computer. I want to keep track of that, so we'll add a ComputerSerial column to keep track of who has what. Now this actually okay right now, this technically is in First Normal Form. Here's the problem. Let's say i figure out that some of our employees need a Mac and PC to do their testing, others need a desktop and a laptop. So several people have multiple computers, and I want to keep track of all of them.
There is a couple of ways that I could deal with this. I could just start stuffing extra data into that one column, we could start putting commas or vertical bars, or any other delimiter and put in multiple values in the one ComputerSerial column. This is just something you just don't do in relational database design. We're violating First Normal Form. Understand, the relational databases will happily deal with hundreds of tables, each table could have hundreds of columns and millions of rows, but, they do not want columns that have a variable amount of values.
Perhaps there's one value, perhaps there's two in it, perhaps there's ten. Each column and each row should have one, and only one, value. Now, could you do it this way? Sure. Technically, if that column was defined as character data, the database wouldn't stop you from entering commas and multiple values into it. But, as far as the database is concerned, it's still just one value that just happens to have commas in them. So, you would find it hard to search directly for a serial number, you would find it hard to sort, you would find it hard to maintain.
So, it's not in First Normal Form if you do this, because First Normal Form demands that every column, every field, contains one, and only one, value. So, what we might do then, is go back to the original way and instead start adding new columns, so, ComputerSerial2, ComputerSerial3. This is what's called a repeating group, and there should be no repeating groups. The classic sign of a repeating group column is a column of the same name and a number tacked on to the end of it, just to make it unique, because usually this is a sign of an inflexible design.
Sure, if we could guarantee that there would only ever be two or three, that's fine, but what happens when we want to add the tablet and the smartphone? What happens when one employee manages testing and needs to be associated with six computers? We don't want to require a change to the database schema just because we buy a new computer, so what do we do here? Well, what we do is the same thing for a lot of these normalization stamps. We'll take this data out of the Employee table and put it in it's own table.
This then has relationships. We create a one-to-many relationship between employee and this new computer, or it could be called an Asset table, or whatever else makes sense, and it has a foreign key back to the Employee table. I can take any EmployeeID, like 551, follow it to the Computer table and find his two computers, or 553, find his three computers. There's no repeating values, no repeating groups in either table, and this would get us into First Normal Form.
Now, it's very common that the solution to a normalization issue is to create a new table, sometimes it's a one-to-many relationship like this, other times it might even require a many-to-many with a linking table.
- What is a database?
- Why do you need a database?
- Choosing primary keys
- Identifying columns and selecting data types
- Defining relationships: one-to-one, one-to-many, and many-to-many
- Understanding normalization
- Creating queries to create, insert, update, and delete data
- Understanding indexing and stored procedures
- Exploring your database options