Shows how to follow the trail of foreign keys to find how tables are linked together, in order to picture database linkages and work out the roles of different tables
- [Instructor] We've looked at keys and indexes and constraints on one table. And now we're going to use that information to look at the database as a whole and see how our tables link together. Here we are representing the rental table as a blue box. Under the name of the table, we've listed all the primary and foreign keys. P for primary and F for foreign. I'm not listing the indexes or unique constraints, if there'd been any, in this diagram, although you could if you liked. The purpose of the diagram for me is to show how the tables are linked together, and neither indexes nor unique constraints affect other tables.
They're kind of internal to their table. Now we can see that the rental table has links to the inventory table. So let's look at that one next. Go to the inventory table and the structure tab. We can see that there are two foreign keys here and one primary key. So now we can add the inventory table to our diagram, and also add a link in there too. The inventory_id field in the rental table is a foreign key and the master copy of that data is the inventory_id field in the inventory table, which is a primary key.
I've drawn a line there connecting those two fields. We can see from the blue box that the rental table also has connections to the customer and staff tables. So let's have a look at those. Customer has two foreign keys and an index on last name. Staff has two foreign keys as well but no index. I guess you wouldn't need to search on staff very often. So we can add staff and customer tables, and the links between them, into our diagram.
Now we can see here, where it says address_id (F), and address_id (F) in the staff table as well, that's telling us that both customer and staff addresses are stored in the same table, which I'm guessing will be called address, given that the name of the foreign key is address_id. That's a design choice. Many database designers would have kept the customer address in the customer table and staff address, if it were needed at all, would go into the staff table. Using this method has probably created a slightly smaller database because we're not declaring fields for street and zip code twice over.
But it does increase the chance of confusion, and it means that every time you set up a customer, you're inserting two rows, one into the customer table, and one into the address table, instead of just one. We can also see that both customer and staff tables have an associated store_id. This is telling us something about the business model of the film rental company. A customer is linked to a single store. It might just be a primary store, and actually they can borrow films from any of the stores, or it may be that they can only borrow films from their specified store.
I've had gym memberships like that. A well-designed database can imply certain things about the organization it's for. Next up, we can see the address and address-related tables. The address table is going to be updated fairly frequently, every time a new customer is set up, or a new member of staff, but it relies upon a city table, which in turn relies on a country table. The city and country tables effectively contain lists.
If we look at the city table, it's just a list of all possible cities, and it's worldwide by the look of it. Each city has an associated country, as it would in real life. This is reference data. The city and country lists aren't going to change very often at all. In fact, the system and its database will probably be obsolete before these lists need to change. Even with this kind of data, you can see you have a primary key field and an audit column.
So why bother? Why not just put the cities and country names directly into the address table and save yourself two tables? The answer lies jointly in data integrity and user convenience. If you allowed the staff member to type in the city and country, you'd get all sorts of typos creeping in. Plus it saves time for the staff member. Instead of typing something in, in the front-end system that's based on this database, they're probably just selecting the option from a drop-down.
They select the city, and then selecting the city probably automates the selection of the country. It's only a few characters or clicks for them, instead of typing in the contents of two fields. And it's a lot less responsibility for keeping the data in good order. The city and country table almost belong to the address table. They're not used anywhere else. They're like supporting reference tables. Next up, we have the payment table. Between them, the rental, payment, and inventory tables are like the prime movers of this database.
They are the tables that update frequently. Every time a movie is borrowed. It's quite normal to keep the payment table separate from others, particularly where you're taking card payments. This is the table that contains fields that your bank updates when you take a payment. And they contain things like whether the payment was successful, and if it was unsuccessful, why it was refused. The inventory table is presumably for stock checking, essentially. And again, this is a design choice.
If store_id had been stored in rental, these two tables could probably have been merged. I know, because I've checked, that we have two remaining categories of data. We've got information about the store, and information about the films. The store data is fairly simple. It's a single table, which relies upon the address field, just as the customer and staff tables do. Notice that we have a field called manager_staff_id in the store table, which is linked to a field called staff_id in the staff table.
This is the first time we've seen a foreign key that isn't called exactly the same thing as the master data field. How can we be sure that this is even the correct link, then? To double-check these links, we can go to the store table, click on the structure tab, look at relation view, and here we see the second foreign key here takes a field called manager_staff_id and links it to staff_id in the staff table. That's how you'd check.
So now, for all the other tables. They come as a group, really, or perhaps three mini-groups. And they all describe the film in some way. We have the film table, which depends upon the language table. Language is a reference table, containing a pretty static list of languages, and film uses this list to identify which language the film is in. Then we've got film_category, which is dependent on category. Again, category is just a small table containing a list.
And last, we have film_actor and actor. And this is how they're linked together. The film table contains one row per film. You might wonder why film_category and film_actor are needed. Couldn't the category and actor go straight into the film table? I suspect this was done to solve a common problem in database design. What I think of is the one-to-many problem. There will be multiple actors in each film, and there are probably multiple categories that apply to each film as well.
To save this information in the film table, we would either have to save multiple values into one cell, which is very poor design, or we would have to repeat the film listing once for each actor or category. It's generally a good idea instead to have one row per film in this table, one row per category, and one row per actor. You want to keep one row per unique value. Hopefully, by now you can see that not all tables are created equal. Logically they are, but they are used for very different purposes.
There are reference tables, which are now shaded in green. These are rarely, if ever, updated, and typically contain a glorified list. They are used by other tables to help maintain data integrity by allowing system users to select a value from a drop-down rather than typing it in. Then there are semi-static tables, which update for a new customer, say, or a new staff member. And they're now shown in pink. And the remaining blue tables update every time a rental is made, or a new film is brought out, so they update much more frequently.
I've created this database diagram just by following the trail of foreign keys, and you can do the same with your database. It's the quickest way to get well-acquainted with it. You might be able to save some time by asking your IT colleagues for a data dictionary or other documentation. This will often include diagrams just like this one.
Join Emma Saunders as she shows you how to design and write simple SQL queries for data reporting and analysis. Review the different types of SQL, and then learn how to filter, group, and sort data, using built-in SQL functions to format or calculate results. Learn a bit about data types and database design. Discover how to perform more complex queries, such as joining data together from different database tables. Last but not least, Emma shows how to save your queries as views, so you can run them again and again.
- Using different versions of SQL
- Retrieving data with SELECT statements
- Filtering and sorting your results
- Transforming results with built-in SQL functions
- Grouping SQL results
- Merging data from multiple tables
- Identifying data types, and how to make sense of your database design
- Saving SQL queries