navigate site menu

Start learning with our library of video tutorials taught by experts. Get started

Foundations of Programming: Databases

Foundations of Programming: Databases

with Simon Allardice

 


Discover how a database can benefit both you and your architecture, whatever the programming language, operating system, or application type you use. In this course, explore options that range from personal desktop databases to large-scale geographically distributed database servers and classic relational databases to modern document-oriented systems and data warehouses—and learn how to choose the best solution for you. Author Simon Allardice covers key terminology and concepts, such as normalization, "deadly embraces" and "dirty reads," ACID and CRUD, referential integrity, deadlocks, and rollbacks. The course also explores data modeling step by step through hands-on examples to design the best system for our data. Plus, learn to juggle the competing demands of storage, access, performance, and security—management tasks that are critical to your database's success.
Topics include:
  • What is a database?
  • Why do you need a database?
  • Choosing primary keys
  • Identifying columns and selecting data types
  • Defining relationships: one-to-one, one-to-many, and many-to-many
  • Understanding normalization
  • Creating queries to create, insert, update, and delete data
  • Understanding indexing and stored procedures
  • Exploring your database options

show more

author
Simon Allardice
subject
Developer, Programming Foundations, Databases
software
Access , MySQL , SQL , PostgreSQL , MongoDB , Cassandra , Oracle
level
Appropriate for all
duration
3h 11m
released
Jan 25, 2013

Share this course

Ready to join? get started


Keep up with news, tips, and latest courses.

submit Course details submit clicked more info

Please wait...

Search the closed captioning text for this course by entering the keyword you’d like to search, or browse the closed captioning text by selecting the chapter name below and choosing the video title you’d like to review.



Introduction
Welcome
00:03Hi, I'm Simon Allardice, and welcome to Foundations of Programming: Databases.
00:09It really doesn't matter what programming language you use, what operating system you
00:14like, what kind of app you're building, database skills are vital.
00:18So in this course, we're going to start at the very beginning with: what is a database?
00:23Why would you want one?
00:24And what problems you are likely to run into if you don't have one?
00:28And we'll talk about the available database software options from small personal databases
00:34installed on a desktop or laptop, all the way to the current open source big data solutions like MongoDB and Hbase.
00:42But we'll focus mainly on classic relational databases.
00:47What are the questions to ask, the steps to go through, the things you need to think about.
00:51And we'll talk about query languages, the actual code you'd write to get information in and out of the database.
00:59What we cover in this course, you could be using not just years but decades from now, so let's get started.
01:06
Collapse this transcript
What you need to know
00:01This course isn't about using one particular database in one particular environment.
00:06The aim here is for content that you will find relevant and useful across multiple languages and platforms.
00:12But I do expect you know the basics of programming in at least one language, because I will end
00:16up talking about common programming ideas like Loops and Conditions, Variables, and Arrays.
00:23And if those aren't the terms you're comfortable with, take a look at our Foundations of Programming: Fundamentals course.
00:29Now, you may have already worked with databases, they are kind of hard to avoid if you do anything
00:34with computing, and whatever experience you have with them, great.
00:38You may be familiar with some of the concepts and terms like tables, rows, and columns, primary keys, and foreign keys.
00:45And if so, you might be tempted to jump ahead past the initial content.
00:50However, if up to now you've been primarily using a database that somebody else designed
00:56or using desktop database software like Microsoft Access or FileMaker, know that there's often
01:01a lot more going on than is apparent at the surface level.
01:04So I would invite you to sit back and go through all the content, because even in the early
01:09stages you may hear something new or a concept will have a new meaning for you.
01:14
Collapse this transcript
1. Understanding Databases
What are databases?
00:01As with most things in programming, if you want to understand databases, then don't jump
00:06directly into the features of database software.
00:09First, go the other way, realize these things were designed to solve a problem.
00:15So what's the problem? Well, that might seem like an easy one.
00:20You have some data, some information, you, your company, your clients, or even the business
00:25that you want to create, there is some data you need to store.
00:30And this data could be anything.
00:31It could be information about customers, products, employees, orders, details about the visits to your website.
00:39This data could be in text format, it could be names and descriptions, or numeric, amounts
00:44or dates, or this could be document files or images, audio or video.
00:50But here's the thing, you can already store data.
00:54You could just open up a text file and type your information into that or open up a spreadsheet and do that too.
01:00If you've got documents, just organize them in folders.
01:03Now, many small businesses would start with something like this, with a spreadsheet and
01:08a lot will continue like this for years, and that might be okay, because just having data
01:15is not a good enough reason to need a database. Having data is not the problem.
01:20The problem is what comes next, and there's a lot of potential problems, but I'll describe
01:26six of them without even getting technical.
01:29Size of your data is a potential problem, ease of updating your data, accuracy of it,
01:34security of it, redundancy in it, importance of the data.
01:38Let's take these one by one.
01:40First, size, because what starts off as a small amount of data it has a tendency to
01:46turn into a large amount of data, and if your spreadsheet solution is nice and speedy when
01:52there's 100 lines in it, what happens when there's 2 million lines in it?
01:57Maybe you'd start to split that up into a bunch of different files and then you had a sub-problem, speed.
02:03How would you find anything?
02:06Next, there's the idea of ease of updating your data.
02:10What happens if two people need to edit this spreadsheet at the same time?
02:15How about 20 at the same time or 200 people at the same time?
02:18If you work with a file-based system, it's just not going to happen.
02:22You'll end up with everybody overwriting everybody else's changes.
02:27Then there's accuracy of the data.
02:29Is there anything that would actually prevent me from typing incorrect data into a spreadsheet
02:35like this, to fat finger a date as being in the past when it should be in the future or
02:39just missing some vital piece of information out with the intention that I'll come back
02:44to that and do it later.
02:46Well, usually, there's nothing that would prevent me from doing so.
02:49So as these files get older and older, they start to fill up with data that isn't accurate,
02:54it isn't consistent, it isn't trustworthy anymore.
02:58And then there's the idea of security. Most of the time, you need to show this data, but
03:02just because it needs to be shared with someone, it doesn't mean it needs to be shared with
03:07everyone, because some of this could be sensitive, it could be payroll information, healthcare information.
03:14So, who would get to view it, and who would get to edit it?
03:18And if they do get to edit this data, is all we know just the last person who edited this
03:23file or do we know who made every entry and every change at every point, because with
03:29some information, healthcare, financial, there are laws in place to make sure you're auditing
03:35every alteration to sensitive data.
03:37And there's the idea of redundancy or duplication of your data having multiple copies of the same data.
03:44Now, redundancy by itself is not a bad thing, but backups, it's good to have redundancy,
03:51but in the data itself, not so good, because it leads to conflicts.
03:56Maybe one file ends up with two entries for the same product.
04:00One says the product is $12 and the other says the product is $10, which one is right?
04:06Or maybe that information is spread over a couple of different files, one owned by one
04:10department, one by another, which one is true?
04:13And then there's importance of your data.
04:17You've probably felt the pain of working in a spreadsheet or a word processor where something
04:21happens, there's a crash or a disconnect, and you lose 5 minutes of work, or an hour of work, or a day.
04:30It's annoying, it's inconvenient, but if the information you just lost was yesterday's
04:35orders, or the allergies of a patient, or the details of a stock trade, or the seat
04:40allocation of a Transatlantic flight that someone just paid for, that's not inconvenient, that's unacceptable.
04:48The data is critical. With many companies your data is your entire business.
04:54You can't lose any of it, not a single change, and you might have one of these problems.
05:01You might have all of these problems and more besides, but these are the reasons we need a database.
05:08These are the problems.
05:10It's not about providing somewhere to put your data.
05:12We could already do that.
05:14It's about allowing that data to grow and still be manageable, still be easy to get
05:18to everything, about letting it be easily updated even by lots of people at the same time.
05:23About keeping it accurate and internally consistent, keeping it secure and controlling access to it.
05:30Who gets to read it? Who gets to edit it?
05:32And tracking exactly what they did, and when they did it.
05:35It's about making sure that while your backups might be redundant because that data is critical,
05:40your data itself is not.
05:42There is one version of the truth, and once you make a change to that data, you will never
05:48lose that change, you'll never lose that data.
05:52This is what a database can give us, structure. It's not a place to put your data.
05:57It's a structured system to put your data in that imposes rules upon that data, and the
06:04rules are yours, because the importance of these things change is based on your needs.
06:09Maybe your main problem is size.
06:12You have terabytes of information, then you'll have different priorities from someone who
06:17has much smaller amounts of data where every single change must be controlled, audited,
06:23and logged the moment that it's made.
06:26And this is why if your experience is just using a database someone else has made, you haven't seen what's important.
06:33That's totally forgivable, with someone whose only time working with databases is using
06:38an old FileMaker or Microsoft Access database that they think this is what a database is.
06:43It's like a spreadsheet, it's just more tedious to use, and that's not it at all.
06:50This is the true power of a database, it's invisible.
06:55It's the things you can't see that are going on in the background, the security, the enforced
07:00integrity of the data, the ability to get to it fast and get to it reliably, that it's
07:05robust, it can withstand lots of people at the same time and even correctly survive crashes
07:09and hardware issues without corrupting the data.
07:13And that's what we need to do here, understand how to describe our structure and define those
07:20rules, so all these invisible stuff will actually happen.
07:23Now up to this point, I haven't gone into any particular database software or even any
07:28particular type of database, because there are different types, and some of them are better
07:33at one or more of these problems, and we'll get there.
07:37But there are a couple more terms to cover before we do.
07:42
Collapse this transcript
Exploring databases and database management systems
00:01First, we better get clear with the most important term here, database, because it's one of these
00:07words that are often casually misused even in conversation between programmers.
00:12Someone will say their database is Oracle or it is SQL Server or it is MySQL or DB2
00:18or PostgreSQL or MongoDB and so on. But let's avoid some confusion.
00:25These aren't databases, they are Database Management Systems, often shortened to DBMS.
00:34And there's a big difference between your Database Management System and your database.
00:39So, all these products here, they are Database Management Systems.
00:43We would install the DBMS. We would install the software.
00:48That could be a simple desktop Database Management System like Access or Bento, all the way the
00:54things like Oracle or DB2, which you wouldn't install on a personal machine but on a separate server.
01:00And then you'd use that software, that DBMS, to create and manage one or more databases.
01:07The database is your data and your rules about that data.
01:13The Database Management System, the DBMS, is the program or the set of programs that surround
01:18and manage it to make sure your rules are applied.
01:23And one DBMS can manage many different databases inside it, each one separate from the other
01:30with different rules, different data, different security.
01:33Now in a diagram of these, you'll often see an entire databases represented as cylinders.
01:38In a perfect world, you might imagine that an organization would have just one database
01:45that contains every single scrap of information important to that company, and in practice
01:50it's very common to have multiple databases.
01:53The database that deals with your order and customer information might be completely independent
01:58from you database that deals with human resource information.
02:03And in many organizations, you don't just have multiple databases but multiple
02:08Database Management Systems, such as SQL Server and Oracle, DB2 and MySQL, PostgreSQL and Hbase.
02:17And sometimes it's because you're using another software product that requires a certain DBMS,
02:23sometimes it's because one Database Management System is better at something than another
02:28is and sometimes because the company might acquire another company and bring it along with the purchase.
02:35But even though there are differences between the different Database Management Systems,
02:42we don't really need to stress about learning every single one because Database Management
02:48Systems fall into broad categories, different types of DBMS.
02:52And the most common by far is what's called a Relational Database Management System, or RDBMS.
03:01Things like Oracle, SQL Server, DB2, MySQL, they all fall into this category.
03:06Other types of Database Management Systems include Hierarchical Database Systems, Network
03:11Database Systems, Object-Oriented Database Systems and more recently, several that fall
03:17in the category of NoSQL Database Systems.
03:21Now, I'm not going to get into the specifics of the differences, not yet anyway, because
03:27we don't have enough of a base of knowledge to make these distinctions worthwhile.
03:32We're going to focus first on understanding Relational Database Management Systems for a couple of different reasons.
03:39One, they are by far the most common, the most widely used.
03:43Two, the same principles are usable across all of them.
03:47And three, even if you know right now that you'll be jumping into the more recent NoSQL databases.
03:55You'll find that even the most introductory discussion of NoSQL databases will assume
04:00that you already understand Relational Database concepts, and it will use those concepts to
04:05compare and contrast and explain the newer systems.
04:10
Collapse this transcript
2. Database Fundamentals
The features of a relational database
00:01We're now going to cover the most important concepts in a typical relational database.
00:06I'm expecting many of you have had some exposure to these concepts already if you've done even
00:10a little work with databases, but let's make sure we're all on the same page.
00:15A database is constructed of one or more-- usually more--tables, and it's a table that is the
00:22most basic, most fundamental building block of a database.
00:27A database without table is just an empty shell devoid of meaning. Not only is there
00:31no data in it, there's nowhere for any data to go, because all your data goes into a table
00:38and each table describes a formalized repeating list of data.
00:43Visually, it's often shown like a spreadsheet.
00:47These tables consists of columns and rows, rows going from left to right, columns, like
00:53the columns holding up a building running up and down.
00:57And if you have multiple tables in a database, each different table is containing repeating
01:03information, repeating rows, but always about the same kind of thing.
01:07So, you might have one table with all your customer information in it, another table
01:12with all your order information, another table with all your employees and another with all your locations.
01:18And within each table, every repeating row represents one single customer, or one single order, or one single employee.
01:28But each of these rows is not free form. You must apply structure to this data.
01:35You must say what every row is made of, and you do this by defining the columns in that table.
01:43Now, you might have 2 columns in a table or 5 or 50 columns.
01:48Each column describes one piece of data.
01:52It gives it a name like FirstName, LastName, HireDate, and very important, not just a name,
01:58but it also says what type of data this must be, perhaps, a piece of text for a name or a date or a number.
02:09Now we're going to go more into the different types of data they can be a little later.
02:14But every row must follow that same structure, following that same format.
02:20It's not allowed to deviate from the way that the columns are set up.
02:25So if this table contains all customers, each row is one customer and each column describes
02:32one very specific piece of data about that customer.
02:37And by deciding what these columns are and defining them, we're imposing rules on the data.
02:43Because once we describe these rules, the Database Management System will not let us break them.
02:49If I say upfront that the column called, HireDate must contain a date, then I cannot create
02:54a new row with just random text in there.
02:58If I say that the grade must contain a value, then I cannot create a row that leaves that piece of data empty.
03:06So we have tables, rows, and columns. It doesn't sound too bad, and it isn't.
03:13But realize that you won't think about these three things in the same way at the same time,
03:20because the up-front work is your tables and columns, later on, comes rows.
03:26When you just talk about tables and columns, you're saying what data should be in this
03:30database and what exactly should each little piece of data be.
03:34And it's after you've defined it, your day-to-day use is working with rows, adding new rows of data,
03:41updating rows, deleting rows, reading existing rows.
03:45Now one of the first requirements you'll run into is the ability to specifically get to
03:51one particular row in a table, to one customer or one order or one location row.
03:58And to do that, we need the next idea, the idea of a key.
04:03
Collapse this transcript
Exploring unique values and primary keys
00:01Almost all tables in a database will require a key.
00:05The key is a way to identify just one particular row in any table, and it's created typically
00:11as one column that will contain a guaranteed unique value for each row.
00:17And this is something you can do in almost all database systems.
00:20Say that for a table this particular column contains unique values, and this one does not.
00:26Now when a column is defined as containing unique values, it means just that.
00:32But even if there were a million rows in this table, the same value cannot occur more than once in that column.
00:40If you try, it will disallowed by the Database Management System.
00:44So whether that's a number, a piece of text, a date, whatever, it cannot repeat.
00:50Now if it's not unique, which is the default in most Database Management Systems, it doesn't matter.
00:56You can just repeat that value all you want.
00:59And most of your columns don't need to be unique and shouldn't be.
01:04If in your table you made say the FirstName unique, then you could only ever have one
01:09employee row with the FirstName of James.
01:12Or if you make HireDate unique, then only one employee can have a HireDate of the 1 October, 2006.
01:19Now, it is true that there might be just one employee called James in your table, but you
01:25probably don't want a database that enforces that rule.
01:29Now, sometimes there might be a column in your natural data that does uniquely identify
01:36its row, like having a Social Security number for employees or an ISBN number for books.
01:43That should be a number that is naturally unique.
01:46There should never be duplicates.
01:49But much of the time, there isn't one piece of natural data that's unique, so you will make one instead.
01:56You will tell the database to generate a new column, a unique column like a customer number
02:03or a product ID or employee ID, and most Database Management Systems will help you
02:09generate those kinds of columns with these kinds of values.
02:13And this column is what is referred to as your Primary Key for this table.
02:19It's the one piece of data that will identify one and only one row in this table.
02:25And the thing is even if you do have a piece of real data that uniquely identifies a row,
02:32like a Social Security number, you'll generate one of these kind of columns anyway, like
02:38employee ID, because this is a number that could be passed around in emails or printed
02:42on an ID card, unlike a sensitive piece of data, like a Social Security number.
02:48Now sidebar, if this Primary Key is generated by the database and doesn't just naturally
02:54occur in the data itself, you might also hear referred to as a synthetic key or a surrogate key.
03:00It is still a Primary Key.
03:03Those words just suggest that it's a system-generated value.
03:06Its only reason to exist is to be the Primary Key.
03:11Now, there are other kinds of keys, Foreign Keys, Compound Keys, but we'll cover those later.
03:17The Primary Key is the first and most important key in any table, whether it's natural or
03:23generated, because it's your Primary Keys that let you not only identify each row and
03:29each table, but also start to connect your different tables together.
03:34
Collapse this transcript
Defining table relationships
00:01While any database begins with defining your tables, a vital next step is the ability to
00:06add relationships from one table to another, because much of your data is naturally connected.
00:13You're not trying to invent arbitrary relationships that don't exist, you're trying to describe what's already there.
00:19So, you might first define a customer table by itself or an order table or a product table.
00:25But orders are placed by customers and they are orders for specific products.
00:31Yes, information about an order is different from information about a customer, but they are related.
00:39So we need to formally describe the relationship between our tables, and the way we describe
00:44relationships is based on keys. Let's go through an example.
00:48I have a simple Customer table here, and it has this column in the start, CustomerID, which
00:56is set up to automatically generate a unique number for each customer row as it's added to the database.
01:03So, that's our Primary Key here.
01:06And one of the benefits of having that key is we can also use it elsewhere.
01:11So, I have another simple table, Order.
01:15And each row here represents one order, so it has its own Primary Key, order ID, which
01:21also is generating an automatic number.
01:23Now, I'm going to imagine we have such a straight forward business that we only sell one product.
01:29So I just need to know a quantity and the total amount due.
01:34But each order is an order for a particular customer.
01:38And to process an order, you would need to know who this applies to.
01:42So, how do we get that data?
01:44Well, we could add columns to the order table and start copying across all the relevant
01:50data from the Customer table into the Order table whenever an order is placed.
01:55But A, that would be duplication of data, something to avoid in general.
02:00And B, there's just no need to do this.
02:03Instead, what we do is we add a CustomerID column to our Order table.
02:10We're taking the key to one row in the Customer table and using it again in the Order table
02:15to describe the connection between the two. Now, we're not doing this casually.
02:20We actually tell the Database Management System that this is a formal relationship between
02:26these tables, so every order row now has a CustomerID.
02:31But whereas the Customer table has a CustomerID that has to be unique,
02:36in Order table, it doesn't have to be unique.
02:38We could have the same CustomerID occur twice or three times or a dozen times.
02:44In this case, 367 is occurring twice, multiple orders by the same customer, because here,
02:50this number is not uniquely identifying a row in the Order table, we have order ID for that.
02:56So in the Customer table, CustomerID is our Primary Key.
03:00But when it's being used in the Order table, it is not a Primary Key, though it is still a key.
03:05It's referred to as a Foreign Key and is not unique.
03:08Now, the benefit of having this relationship defined in the database is we can go either way.
03:14We could start at the customer row and then take that CustomerID and go get every order
03:20for that customer, or we can go from the order row, find the CustomerID and find out which
03:26customer is associated with that particular order.
03:30And when you have a relationship like this, it's what's defined as a one-to-many relationship.
03:38One customer can have many orders.
03:42In a database diagram, you'll see this sketched out a couple of different ways.
03:45Some people use the crow's foot symbol to show which way the relationship goes.
03:50In others, you might see the one to infinity symbol.
03:54So, one customer has many orders, one category has many products, one department has many
03:59employees, one classroom has many students, and so on.
04:03And it's very, very common to have these kinds of relationships between your tables.
04:09Most tables in well-designed databases have relationships to one or more other tables.
04:15Now, bear in mind, when you describe a one-to-many relationship in a database, you aren't required to have many.
04:24A customer is not required to have many orders.
04:27Customer might place only one order, or might place no orders at all.
04:31But the database internal rules support the idea that one customer can have many orders,
04:37one category can have many products, one department can have many employees.
04:42But the opposite of these statements is not true, at least in the imaginary business I'm describing.
04:48So, while each customer can have many orders, each order is for only one customer.
04:55I can't say that an order has many customers, it can't.
04:59A department has many employees, but an employee only has one department.
05:03A product only has one category.
05:05Now as always, I'm trying to describe simple business examples that might be a little mundane, but easier to grasp.
05:13If your actual business rules support a more flexible situation that perhaps an order can
05:19involve multiple customers or an employee can work for multiple departments at the same
05:25time, you may have to define a different kind of relationship.
05:28Instead of one-to-many, many-to-many, and that's up next.
05:33
Collapse this transcript
Describing many-to-many relationships
00:01While the most common relationship between tables in a relational database is one-to-many,
00:06it's not unusual to sometimes need to describe a many-to-many relationship.
00:10We've got two problems here.
00:14Problem number one, a small problem, one-to-many relationships are easy to describe, but they're all over the place.
00:20But many-to-many scenarios are a little trickier. They're not so obvious to spot.
00:25Problem number two, a bigger problem.
00:28In most relational Database Management Systems, you cannot express a many-to-many relationship directly.
00:35It sounds like this could be a short movie.
00:38But while you can't create a direct many-to-many relationship, you can do it indirectly, and here's how we do it.
00:46As ever, let's focus on the business problem, not the technological fix.
00:51So, say we're a publishing company.
00:54We have an Author table with a list of a few authors inside it and their names, contact,
00:59information, some kind of Primary Key for them here, AuthorID.
01:04And we also have a Book table with the list of our book Titles, also with its own Primary Key, BookID.
01:12Well, at first glance, we might say, okay, this looks like it could be a one-to-many relationship.
01:19We could take the Author number 447, Jordan Winters, and say we want to represent that
01:25author has written both book 1145, Designing Databases, and book 1147, Pocket Guide to SQL.
01:32And we'd like to say that, say Robert, number 446, wrote book 1146, SQLite Made Simple.
01:39So, if this was a classic one-to-many relationship, one Author with one or more Titles, we could
01:48do this by just adding a new column to the Book table, AuthorID column.
01:53It would be a Foreign Key to the Author table. So far, so good.
01:59But here's the issue.
02:02What happens if in a day or a week or a month, we're told that the Designing Databases book
02:07was written by two different authors?
02:09Well, the way we have this described right now, we can't do that because our AuthorID
02:16column can only store one, storing a Foreign Key to the author 447 or 445, so I can currently
02:22support a one-to-many relationship, but I need a many-to-many relationship.
02:28One author can have many books, but also one book can have many authors.
02:34Now the way some people try and model this is they will add another column to the Book table.
02:41They'll add in an AuthorID two column, and they'll have this one, say 445 pointing to a different author.
02:49However, adding new columns to your tables, particularly these kind what I refer to as
02:54repeating groups or repeating columns, you're just doing the same information again, that's
03:00a bad idea, and it's discouraged in database design.
03:04So we'll get rid of that technique.
03:06Well, some other people think I'll just cheat a little bit.
03:11I'll do something quick and dirty, and what I'll do is I'll slide in two values into that AuthorID column.
03:17Maybe I'll use comma separated values and squeeze in another reference so that the
03:22AuthorID column now relates to two authors. But that's also a cheat.
03:27And like adding a new column, this is highly discouraged as well.
03:32So how do we solve this?
03:34Well, in fact, we're going to solve this problem by getting rid of that AuthorID column entirely.
03:41And in fact, first, we go back to two completely detached tables with no official reference between them.
03:49And what we do to create a many-to-many relationship is we add another table.
03:55We add what's often referred to as a junction or linking table.
03:59Now the only reason for this table to exist is to join the Author table and the Book table together.
04:06And in fact, the name of this linking table by convention would be Author Book.
04:12It could also be Book Author.
04:13It doesn't really matter which way it goes, because what we're using it for is to set
04:18up two one-to-many relationship.
04:21So, what we do is we define a one-to-many relationship from Author to Author Book.
04:27So, one author with AuthorID 445, that's Primary Key here, but a Foreign Key here.
04:31It's not unique there.
04:33It could exist in that table once or twice or three times or a dozen.
04:38And using this, we could go from Author to Author Book, find a BookID and map that to
04:45the Book table, and we can also do it the other way.
04:50We can take the identity of a book like number 1145, take it back to this junction or linking
04:57table, in this case to two rows, and there, go to the Author table. Grab the AuthorID.
05:03Follow that back up to the Author table.
05:05So we're now expressing a many-to-many relationship.
05:10One author can have many books. One book can have many authors.
05:14And the only reason for the existence of this table in the middle is to join the other two
05:19together, and that's how we do it.
05:21Now in a large database, you're going to end up with a lot of one-to-many relationships
05:27between your tables, and just a few of those were really being used to create a many-to-many relationship.
05:35Officially, there is a third kind, a one-to-one relationship that is possible, but it's not common.
05:42If you think about it, if one row in one table is pointing to one and only one row and another
05:47table, well, you might as well just combine those tables so it's one row in both places.
05:52So these are the official kinds of relationships, what is sometimes called the cardinality of
05:57how tables relate to each other.
06:00One-to-many, very common, many-to-many like this, occasionally required and one-to-one not common at all.
06:09Now in any large database, you'll also find some tables without formal relationships.
06:13Some just don't need them and some people consider none to be a fourth kind of cardinality
06:19or fourth kind of relationship. I don't think so, but you will see it.
06:23I certainly don't have to connect every table to every other table, but most of them will
06:29end up of being related to at least one other.
06:34
Collapse this transcript
Transactions and the ACID test
00:01When we're entering or changing data in our databases, we'll often need to work with transactions.
00:05A transaction is an incredibly important thing in the world of databases.
00:10And to understand them, the best way is to think about what we think of as a transaction in the real world.
00:16If you talk about making a transaction, you often mean something commercial.
00:20You hand over $15 to a bookseller, the bookseller gives you the book.
00:25That's the transaction, and it's important that both of those things happen.
00:29If you hand over the money, you expect to get the book.
00:32If they hand you the book, they expect to get the money.
00:35Either both of these things happen, or neither of them do.
00:38Now in a computing system, a classic example of transactions is a banking system.
00:44Imagine that you log on to your bank's website, and you want to transfer $2,000 from one
00:50of your account to another.
00:52Now, that's going to require two updates to this data, one to subtract $2,000 from
00:59the savings account, and the other to add $2,000 to the checking account.
01:04Now if the first part of this happens successfully, subtracting $2,000 and then we try and
01:11add $2,000, but there's a problem with the second part, perhaps it's locked for editing.
01:17Well, we will need to reverse the first part of this transaction.
01:21We never want to be in a situation where $2,000 has been debited from the first account, but
01:27it didn't get added to the second.
01:29So, a transaction is how you define a combined unit of work, either both of these things
01:36happen or neither of them do, and the first change will be reversed instantly by the database
01:41if any part of the transaction fails.
01:43Now, there's a common acronym that you'll come across when working with transactions
01:48in a database, ACID, A-C-I-D.
01:51A transaction must be Atomic, Consistent, Isolated, and Durable.
01:57Being Atomic, and this is the Greek word atom meaning an indivisible unit, it refers to
02:03the core idea that this transaction must completely happen or not at all.
02:08So, whether there are two steps in the transaction or 20 steps, they're all contained within the transaction.
02:14They either all complete successfully, or they all return to the original state.
02:19There is no such thing as a transaction that halfway occurs.
02:23Now whether the reason for the transaction failing is that the database had a power failure
02:27or ran out of space to hold a new data, or there was an application error, it doesn't matter.
02:32Atomic is the all or nothing rule.
02:35Now Consistency means that any transaction must take the database from one valid state
02:41to another valid state based on the rules of the database.
02:44So, even if a transaction is successfully atomic, it still cannot result in a situation
02:50that violates any of the integrity rules defined in a database.
02:54Isolation refers to the data and the transaction being essentially locked for that moment in
03:00which the transaction is occurring.
03:02So, while we're trying to change a balance on one of our account records, another part
03:07of the system must not be allowed to access that data until the first transaction has finished.
03:13And durability refers to the transaction being considered robust.
03:18If it happens and the database says this transaction has happened successfully, then
03:23the transaction is guaranteed.
03:26Say if you go to a travel website, purchase a flight on an aircraft, and you're guaranteed
03:31seat to A, then you should be able to regard that transaction as being durable.
03:36Even if half a second later, the database suffers a power failure and shuts down, when
03:41it reboots, that transaction will have survived that failure and they are not going to sell
03:46that seat to someone else simply because there was a glitch in the system half a second after you made your purchase.
03:52Now, the great thing about working with most Database Management Systems is these capabilities
03:58are built into the system.
04:01You don't have to worry about how to program these.
04:04You just need to know when to tell the database. Now, this is a transaction.
04:08These three things, these four things, these 10 things must be done together.
04:14That's what we'll see how to do a little later on.
04:19
Collapse this transcript
Introduction to Structured Query Language (SQL)
00:01SQL, or Structured Query Language, is the common language that lies at the heart of every relational
00:07Database Management System that you'll likely to use.
00:09Now, you've probably noticed the different pronunciation already.
00:13Some people say S-Q-L, some people say sequel.
00:16I tend to say S-Q-L when I'm talking about the original language itself and sequel when it's
00:21combined with something else as in SQL Server, or T-SQL, or MySQL.
00:26Now SQL, the language, has been around since the 1970s, and that's one of the few languages
00:32I was writing 25 years ago that I'm still writing now.
00:35A lot of other programming languages have come and gone, become fashionable and unfashionable,
00:41but SQL has stuck around, and I expect it to stay for a long time to come.
00:45But SQL is a very small, very focused language, and the key to first learning it is to understand
00:52that it's a little different from other programming languages you might have come across, like C, C++, Java, or Python.
01:00SQL is what's considered a declarative query language, as opposed to procedural or imperative languages.
01:08Okay, that's jargon, yes.
01:10But what it means is you use SQL to describe what you want, and you let the Database Management
01:17System handle how that's actually done.
01:19You don't have to manually lay out the algorithm, the different steps of the procedure as you
01:24would do in other programming languages.
01:27Say if I have a Books table with, perhaps, thousand different books in it, and I want
01:32to know which books have a list price of more than $40.
01:36Well, in a procedural or imperative language like C or Java, I'll have to write the steps to do this.
01:43I would probably write some code that would start at the first book and then would loop
01:48through all of them one by one.
01:51And every time going through this loop you're asking the question, "Is this more than $40?
01:55If so, do one thing. If not, do something else." Even writing this in pseudo code, I'd be writing a loop.
02:02I'd be writing conditions. I'd be writing returns.
02:07But with SQL, you don't describe all the steps, you just describe the outcome.
02:13You describe what you want the same way you might do in English, where you just say, I
02:18want to know all the books more than $40, or this written in SQL, SELECT * FROM Books WHERE ListPrice > 40. That's it.
02:30Select or go get everything from the Books table where the price is more than $40.
02:35The Database Management System will take this, it will look at all your data, figure it out,
02:41and return the result set, whether that would be one book, 500 books, or even none, based on your query.
02:48So SQL can be used to select, to retrieve, or read data, and to ask questions of it.
02:55It can also be used to create data, to update data, and delete data.
03:00In fact, all of these things are referred to with the wonderful acronym of CRUD: Create, Read, Update, Delete.
03:08An SQL can be used to create not just your data, but to define the databases themselves.
03:14Now most Database Management Systems have their own slightly different implementation
03:19of the core SQL language. If you work with Oracle, you use PL-SQL.
03:25With Microsoft SQL Sever, it's Transact-SQL, or T-SQL for short, but basic SQL knowledge works with all of them.
03:32And we're going to get into the specifics of SQL a little later.
03:35But first, we're going to get an overview of why so many database software options exist and what the differences are.
03:43
Collapse this transcript
3. Database Modeling: Tables
Introduction to database modeling
00:01Time to see how to design or model a relational database, and this is regardless of which
00:06relational database management system you're going to use to do it.
00:09We're developing the formal description of our database, so what's called the database
00:13schema, our tables, our columns, our primary keys in relationships.
00:18If you want a foolproof method of building a terrible database, it's to jump directly
00:23into the Database Management System software, whatever that is, and just start building something.
00:28That's a really bad idea.
00:30Planning is vital, because while in other areas of development, mobile programming,
00:35desktop, web development, I'm a big fan of agile, incremental, iterative approach.
00:41Building something quickly, getting it out, revising it, adding new features over weeks
00:45or even days, that's not what I want to do with databases.
00:49Relational databases reward upfront planning, because the entire point is to impose rules
00:55and constraints and structure on your data.
00:58We don't want to have one set of rules one week and a different set of rules next week.
01:02I made a comment once that building a database is like getting tattooed.
01:06You really want it to be correct the first time you do it.
01:10Changes are possible, but they are painful.
01:12Unlike tattooing, some changes may be easier than others.
01:16Adding something extra is a little easier than trying to directly change something that's
01:20been in place for a couple of years.
01:23But the great thing is relational databases aren't new.
01:26They've been around since the '70s.
01:28So you're coming in at a time when the methods for modeling a database have been battle tested over four decades.
01:34We know what works and what doesn't.
01:37And to design a good database, a large part of it is just to go through the steps.
01:41You see, database modeling is not a place to try and express your inner creativity
01:46and find wild and crazy innovative new ways of doing things.
01:50If you want to get wild and crazy and innovative, fine, but do it in your user interface,
01:55do it in your application.
01:57In your database schema, you want to be patient, methodical, and step by step.
02:03Now, as with other areas of software development, there are specific diagrams associated
02:09with planning a database schema.
02:10Unlike other areas, you can get super formal with these, but you rarely need to.
02:15A few basic lines and shapes are all we'll need.
02:19While you can use a diagramming tool like Visio or OmniGraffle and many of the Database
02:24Management System tools, and programming IDEs come with basic database diagram tools as well.
02:31All you really need to model a database, at least initially, is pencil and paper and be
02:36prepared to think for a moment about a few core questions.
02:41
Collapse this transcript
Planning your database
00:01There are two initial questions you need to ask before even starting your database schema.
00:06First, what's the point?
00:08What is this database for? And be careful of the first answer that leaps to mind.
00:12Sure, in most cases, you're building a database to support an application, whether that's
00:16a desktop or mobile or web app.
00:19Let's say you're building an online bookstore, well, it's way too easy to say the point
00:23of the database is to store product and order information, and think you're done.
00:29See this might be true, but what's the intention of this bookstore, whether it's a website or an application?
00:36Because wherever you wanted to go over the next year or two or five should affect
00:40what you're building right now.
00:41So even having an elevated, but your mission statement as corny as that might sound, will
00:46help you build a better database.
00:49If you have something like, "We help customers find books, discover what others thought about
00:54them, purchase and track their orders, contribute their own reviews and opinions, and learn
00:58about other products they may like based on people with similar reading habits."
01:03Well, you're going to build a very different database from that second description than
01:08you would from the first one. Now the next question, what do you already have?
01:13You might be lucky enough to be building this for a completely new business where nothing has happened yet.
01:18Most of the time, there is some existing process, this database is intended to replace or supplement,
01:23even if it's a manual one.
01:25So, if so, get any physical assets together, print-outs, order sheets, filing cabinets itself, and of course, people.
01:33It's going to help you figure out what data you need to keep track of.
01:37Next, do you have an existing database even if one isn't really worthy of the name?
01:42It could be in a desktop application, it could be in a set of spreadsheets.
01:46If so, what's wrong with it? What's right with it?
01:50Don't just make the assumption that you'll take everything you have as gospel and import
01:55it all into the new database.
01:57You might need to do that, but take the opportunity to fix any problems that you have,
02:03so you're not just recreating the same problems in your brand new database schema.
02:07So, understanding what you already have is essential before you can answer the first
02:13real question when designing a database. Which is, what entities do you have?
02:19We know that the relational database consists of one or more tables.
02:24Tables are the basic building block of a relational database, and you will create separate tables
02:30for each entity, that is each object or each thing that needs to be represented in the database.
02:36I'm using the word entity here just to keep things a little loose and abstract.
02:40We're trying to identify what entities we naturally have, rather than just directly
02:45asking, what tables do I need to go and make?
02:48Some of your entities might represent things that exist in the real world, Customer, Product,
02:54Employee, a Patient, an Asset, but others could be more abstract, a Blog entry, a Category, a Comment, an Appointment.
03:04A sidebar, there is some debate about whether to use plural or singular nouns when naming your
03:09entities or tables. Is it a Customer entity or a Customer's entity?
03:14Now I'm strongly in preference for singular nouns for your tables.
03:19That also seems more natural when you describe your entities that way, but I have certainly
03:23encountered databases where every table was named plurally, a Customer's table an Order's table.
03:30It's not a big deal, but I'd strongly suggest to pick one or the other, don't mix them.
03:36If you're on the fence about it, pick singular. So, sidebar over and back to it.
03:42Right now, we want nothing more than just to pick the most useful words.
03:47We're not yet worried about exactly what data to store for each one.
03:50That will come later, but one word of warning.
03:53Take care if you've already done or are doing a similar kind of process for your application
03:58itself, and you're working in an object oriented language, because you often ask similar kinds of questions.
04:04What exists in this application?
04:06You might even be writing use cases, and going through and picking out all the nouns.
04:11Sure, there is some crossover in, for example, a sales application I am likely to have both
04:17an order table defined in my database and an order class defined in my application,
04:23but it is different.
04:24In the relational database, we are concerned about the data that needs to be saved.
04:29Our focus is not on things like methods and behavior, or inheritance, polymorphism, like
04:34it might be in an object oriented language.
04:37So we're concerned here just about our entities, and we will be concerned about relationships between those entities.
04:44Relationships like the fact that our customer places an order then an order contains order
04:49items that refer to specific products whether the Paycheck is for a specific employee who
04:56belongs to a particular department.
04:58In fact, the stage of the process determining your early design is often referred to as
05:04Entity Relationship or ER modeling.
05:06It's a complex sounding term that at heart really isn't that bad, it's boxes with lines drawn
05:13between them, or it can be done very formally with a lot of specific connector lines,
05:19but that's overkill right now.
05:21One thing you might commonly see is the diamond boxes being used to describe the relationships,
05:26but right now our focus is to take the first spin at our entities, which will become our
05:31tables and sketch out the vague relationships between them.
05:35We'll describe exactly what those relationships are a little later, whether they're one-to-many, one-to-one, many-to-many.
05:42Right now, it's simply enough to show they exist.
05:47
Collapse this transcript
Identifying columns and selecting data types
00:01After figuring out at least your first list of what entities need to exist, you take a
00:05look at each one to specify exactly what data is important to store about them.
00:10Now officially, in an entity-relationship model, these are referred to as our attributes.
00:16And just as entities will become our tables, the attributes we draw it here for each one
00:20will become the columns in those tables.
00:23Now in sketching out an ER diagram, attributes are often shown like this, and you can be
00:28quite informal about writing them.
00:30But as you move into actually defining the official columns of the table, you're going
00:34to need to become very specific about what they are, so let's just get specific now.
00:40First, as a rule when defining a table, let's say an Employee table, you're going to go
00:45very granular, meaning, as individual as possible, so not just one column for name, but separate
00:52columns for FirstName and LastName or if your prefer, family name and given name and even
00:57other columns for middle initial or title or suffix, if you need those.
01:03The question might be, why do you split them up like this?
01:06And one reason is because you may do something different with that piece of data, and you
01:10want to get to it by itself.
01:12It is much easier to sort all your employees by surname or find out how many customers
01:17you have in a particular city, if you're storing that piece of data independently, rather than
01:23if it has to be extracted out of some larger value.
01:26Well, what should we call these columns? What names should they have?
01:30Well, you'll see a variety of ways that people name the columns in their tables, sometimes
01:35using underscores between multiple words, sometimes a lower case using Camel case, Pascal
01:41case, this has really more to do with your own naming conventions, than anything a database will enforce on you.
01:48As ever, the best idea is simply to have a standard.
01:51Write it down and stay consistent.
01:54I'll be using Pascal casing in my examples, meaning, I'll upper case each word, and I won't use underscores.
02:00Now, some databases do support using spaces in your column names, but I avoid those as
02:06they often mean a name has to be enclosed in quotes when referring to it programmatically,
02:11and this becomes inconvenient. But this is by no means enough.
02:16You next have to say what kind of data is going to be stored in each of these columns,
02:21really what is the data type for each column?
02:23Is it text or character data, or is it numeric?
02:28Is it a date, a time, or even binary data like an image, or a piece of audio or video?
02:34Data types in a database are absolutely not the same as data types in a programming language.
02:39You'll find that most database systems want you to be much more specific about your columns
02:44than a programming language wants you to be about your variables, and the Database Management
02:49System wants to know these specifics so it can be efficient about storing and indexing
02:54them, and so it can enforce your rules.
02:58For example, if you want to store an integer in one of your columns, you'll find that most
03:03Database Management Systems have multiple kinds of integer data type for different sizes of integer.
03:09If I look at the online manual for the MySQL Database Management System, I can go into
03:14the Numeric Types, and I'll find that just for integer alone, we got the regular INT,
03:19we've also got TINYINT, SMALLINT, MEDIUMINT and a BIGINT, all different lengths in terms
03:25of storage from one to eight bytes, and all different sets of numbers that they can take,
03:30whether they're signed to unsigned values. So, you can be very, very specific indeed.
03:35And you'll find the same in things like SQL Server as well and other Database Management Systems.
03:41BIGINT, SMALLINT, but no MEDIUMINT in SQL Server.
03:45Now, we can't cover every possible option in every possible Database Management System,
03:51but we will cover the basic issues because they're similar.
03:54What you'll find is when you get into using a specific DBMS, you'll just have a cheat
03:58sheet or a bookmark of all the available data types until you get to know them.
04:02Now, with some data types, things like character or text data, you can also specify if it should
04:08be a fixed length or a variable length.
04:10I might specify it here, for example, that my state should be limited to just two characters.
04:15And really, this all comes down to you knowing what each piece of data should be.
04:19And if you've got some character data is it going to represent a FirstName or might it
04:24represent the content of a 1,000-page manuscript?
04:27The idea here is that you want to use a column definition that's as small as possible but no smaller.
04:34Fixed lengths are more efficient.
04:36But as you might imagine, they're just not all about flexible.
04:38If you try and store a 15-character name in a column only defined as 12-characters, you'll
04:43just lose the last three characters.
04:46You will also find that many database systems have different options for storing ASCII text and Unicode text.
04:52So, if you know you're storing international text, big five languages and so on, you'll
04:57need to pick a Unicode character format.
05:00And even though Unicode does take up more space, I tend to default the Unicode format
05:04unless I absolutely know I won't need them.
05:08And then there's the option of storing binary data, a photo, a video, audio, a document
05:13where you typically have a binary type.
05:15Now some databases have specific methods for dealing with very large binary objects, gigabytes of binary files.
05:23You may hear these data types referred to as Binary Large Objects or BLOBs.
05:28Now likewise, large amount of text can be managed differently as Character Large Objects or CLOBs.
05:36After defining the data type and optionally the length, one very important question is
05:41when you add a new row, will a value be required in this column or is it optional?
05:47Most databases default to requiring all columns to contain values, but sometimes that's not necessary.
05:53You might require, for example, a value in AddressLine 1, but not in AddressLine 2, it's
05:59just not needed, it's not relevant.
06:02So, what you do is you actually define particular columns to allow null values.
06:08Null is an important keyword in relational databases.
06:12A null is not the same as just a space or a blank entry.
06:16Null means the complete absence of a value.
06:19So, your columns can be defined as allowing null values, or more typically a column can be not null.
06:26A value is required.
06:28And one option here is that perhaps a default value for a column.
06:32So if you don't provide a value, can a default one be entered automatically?
06:37I might say, for example, that when I'm creating a new employee row, if I don't put anything
06:42in for the date hired, it will default to today's date, the date the row was created.
06:47Along the same lines, we might do some constraints to check are there maximum or minimum values for something.
06:54Should it match a pattern, like matching an email address or a phone number or a credit card number?
07:00So flexibility is usually our friend in programming, but it's not what you're looking for in most relational databases.
07:07If you want to store an email address for an employee, you want to know what will always
07:11be an email address, not sometimes an email address, sometimes an inspiring quote or a
07:16date or an MP3 file or just missing entirely.
07:20What you're hoping to end up with is that for each table, each entity that you have,
07:25you have a reasonable list of the columns to create and the data allowed in each column.
07:30And defining your columns as exactly as possible, means that your database will enforce these
07:36rules on those columns, that your data will stay valid and consistent, and you won't end
07:41up with a database full of garbage.
07:46
Collapse this transcript
Choosing primary keys
00:01Each table should have a Primary Key, a value that uniquely identifies an individual row
00:06where there can be no duplicates and no confusion.
00:09So, if we have an EmployeeID, it should take us to only one employee row.
00:13If we have a CustomerID, it takes us to only one customer row.
00:16If we have an ISBN number, it takes us to one specific book.
00:20When creating a database schema, we need to say which column contains that Primary Key
00:25for each of our tables, and if there isn't one, we usually need to make one.
00:30As I mentioned earlier, occasionally the key is already naturally in the data.
00:35If, for example, I define a Book table where we enter all the details of a book, including
00:40its ISBN, well, that ISBN number should be naturally unique.
00:45I could then tell the database that this is the Primary Key, and because it occurs in
00:51the data, this is what sometimes called a Natural Key.
00:55However, in many examples, it doesn't naturally occur.
01:00As I start to build our Customer table, I might realize there's nothing in this that's inherently unique.
01:06We might say that Email is unique, but sometimes the same Email address is shared by multiple
01:10people as this the same address.
01:13So, if I decide there's nothing that I could guarantee would always be unique, I would add a Customer ID column.
01:20Now in most Database Management Systems, the way that you do this is to define this column
01:25as an integer, but also add an option to make it automatically increment and then specify
01:31it as the Primary Key or the Identifier.
01:34In this way, when adding a new row, we don't have to say what this value is the database
01:40will automatically generate it for us, and it will always be a unique value in this table.
01:45
Collapse this transcript
Using composite keys
00:01One option for a Primary Key that you might find useful from time to time is something called a Composite Key.
00:07This is when one value does not uniquely identify a row, but two values do, where we'd combine
00:14two column values to create a unique Primary Key.
00:16It sounds a little weird, so let me give you an example.
00:20Let's imagine I've got a publishing company and part of what I do is deal with Yearbooks for high schools.
00:25So, I have a Yearbook table.
00:27Now, the way this currently stands, none of these columns are naturally a potential Primary Key.
00:33There's nothing here that's inherently unique. The School name repeats and the Year repeats.
00:39PageCount might be unique but that's just coincidence.
00:42We're not looking for something that may be unique, if we're lucky.
00:46We need something that will always be and must be unique and makes logical sense to
00:52use when identifying each row. So, I can't use the School name.
00:56I can't use the Year, but what we can do is combine them.
01:01If in this Yearbook business, we create one book for each School per year, then if I select
01:07the name and the year that can uniquely identify each row.
01:12There might be multiple rows for Orchard High and multiple Years for 2010, but there's only one Orchard High for 2010.
01:20Only one Orchard High for 2011, only one Lawstone Elementary for 2010, and so on.
01:27This is a legitimate Primary Key, and this is known as a Composite Key where it's composed of two or more values.
01:36Now, it's true it might sometimes be more useful or even just be more convenient to
01:41generate a surrogate Primary Key column anyway.
01:44Say in this case, a column called Yearbook ID that will automatically generate an integer,
01:50but Composite Keys can be a useful technique, and you will run into them from time to time.
01:56One place you will see them, and we'll see them later on is they're used when we're joining
02:00tables together to create many-to-many relationships.
02:05
Collapse this transcript
4. Database Modeling: Relationships
Creating relationships
00:00I talked about Relationships in the earlier section on features of relational database,
00:05but it's time to get more specific.
00:07You're organizing your database into individual tables, and many of these tables
00:12will need to know about each other.
00:13Now, if you've started by sketching out a rough Entity Relationship diagram, you begin
00:19by just attempting to show that some kind of relationship exists between your entities,
00:25but to get closer to actually building this in a database, we need to say, what these Relationships are.
00:31And while it is common in an ER diagram to use a phrase or a verb to describe it like,
00:37contains or refers to or belongs to. That's for your benefit.
00:41The database can't describe it that way.
00:44Now, there are only three kinds of relationship of cardinality between tables in a relational database.
00:52One of them dwarfs the others, there is one-to-one, one-to-many and many-to-many.
00:58Okay, some people might add a fourth of none-at-all.
01:02But the time to start defining these is yet another example of how you can keep to a process
01:08and order when designing databases.
01:11First, you sketch out your entities, which will become your tables.
01:15Then you're going to define the data, the attribute for those entities, which will become
01:19columns in your tables, because only then can you say, which columns will be the primary
01:24keys, or if you need to generate primary keys.
01:27You can't properly define your relationship until these primary keys are specified.
01:32Now, I'm not trying to pretend that it's a purely linear process, not all your tables
01:37will become obvious in the first couple of steps.
01:40When you go through the process of normalization, which we'll get to in the next section, you'll
01:44probably end up with a few new tables.
01:47So we will revisit everything we've started to define, but we can still work through a process.
01:52Now, the benefit of specifying our relationships formally in the database, rather than just
01:58informally putting things together in an application is that the database management system itself
02:03will take care of keeping those relationships valid and meaningful.
02:07You can then use those relationships to answer all sorts of questions.
02:11Everything from the basic idea of, how many orders has a particular customer had.
02:16How many products were in an order? Or say in a game system.
02:20What weapons belong to a particular player?
02:22How many quests has one player completed compared to another?
02:26All the data is stored in these separate tables, but it becomes easy to jump from one to another
02:32by following the relationships.
02:34Next, we'll get into the most common, the one-to-many.
02:39
Collapse this transcript
Defining one-to-many relationships
00:00The most typical kind of cardinality in a relational database management system is one-to-many,
00:06which is just as easily described as many-to-one.
00:08It just depends on which side of the relationship you start at.
00:11In the example I showed earlier was this, perhaps the archetypal one-to-many or many-to-one relationship.
00:19Customer and Order, one customer can have many orders.
00:23Meaning, one Customer row can be associated with multiple Order rows but each Order row
00:29is only for one Customer.
00:31Now creating this relationship depends on us having our primary keys defined first.
00:36If you're attempting to describe that we need to link from a row in one table to specific
00:41rows in another table, we need a way to get to those specific rows and getting to a specific row takes a primary key.
00:48So going into this, we must have our tables with our columns and primary keys at least roughly planned out.
00:54Next, implementing a new one-to-many relationship requires a change to whatever table represents
01:01the many side of the relationship.
01:03So, to relate Customer and Order tables, I don't need to change anything about the Customer table, that's the one side.
01:09I need to add some extra information to the Order table, and it's the key to the Customer table.
01:16So, Customer ID, and this again is called a Foreign Key.
01:20It represents a column in this Order table that is a key to a row in a different table.
01:27These will specifically refer to one and only one row in the Customer table.
01:33You might find the Customer ID occurs more than once in order, but it's always pointing
01:38to only one row in the customer.
01:40We always make the change to the many side of the relationship because it's the only way to do it.
01:45I can add one column with one value to every Order row that will always point to a correct
01:50customer, but I cannot add one column with one value to the Customer row that could point
01:55to a variety of different orders.
01:57So, I'm making a match between the columns in these different tables.
02:02Now very often, we would use the same column name across both of those tables.
02:06So in this case, Customer ID and Customer, Customer ID and Order, but it doesn't have
02:11to be the same because we are really making the match on the values in these columns.
02:16In some cases, you couldn't use the same name because there'd be a conflict.
02:21Perhaps both of these tables were defined with the primary key just called ID, or we
02:25couldn't reuse the ID column twice in the Order table.
02:29There would be a conflict there, it just wouldn't work.
02:32So, we call it something else like Customer ID. It's still a foreign key.
02:37It just happens to have a different column name between the different tables.
02:41Another thing that you might see is a completely different name that attempts instead to describe the relationship.
02:47So, instead of using Customer ID, we might use something like Placed By as a column name.
02:53And Order is Placed By a particular Customer ID.
02:56It still refers to a value that's in the ID column in the other table, it's just using a different name.
03:02So, the name of the column doesn't have to match but the data type certainly should.
03:08It doesn't make sense to have a primary key that's an integer in one table, try and match
03:13a column that's described as character data in a different table.
03:16Now it's also very common to have one table that takes part in multiple relationships.
03:21So, a customer can have many orders.
03:24We go to one-to-many relationship here, but we may decide that our customers can have
03:29multiple different addresses they may ship to, so we might add a new table for address
03:34and have another one-to-many relationship between customer and address.
03:39This is perfectly acceptable and very common.
03:42Another option is that a table that is on the many side of a one-to-many relationship
03:47could be on the one side of another.
03:50So, for example, Order might be the many side of the customer to order relationship but
03:56Order itself could have many order items, so it's on the one side of that relationship.
04:02One customer has many orders, one order can contain many order items.
04:06Now as we start to get a bit further, we're going to want to start to diagram these a bit more specifically.
04:12I've shown the basics of entity relationship diagrams as they are very simple to just get started conceptually.
04:19Basically, boxes with lines between them.
04:21Although as you start to more formalize the actual database diagram, you'll see a different
04:27kind of layout emerge, and there's no fixed one standard.
04:31You'll see different options used across different database management systems and different charting applications.
04:38But they usually boil down to this kind of idea.
04:41A box for each table with the name of the table at the top, then we'd have the column names.
04:48You may or may not add a little extra to show what kind of data these are, character data,
04:53dates, binary, integers, and so on.
04:56What is very common is you would add a PK for a primary key, and you would add an FK
05:02to denote a Foreign Key and then start adding the relationship connector lines between them.
05:08This kind of diagram is very, very common across all relational databases.
05:13It's easy to see what tables exist, what columns, what relationships exist between the tables.
05:18Now in most relational database management system administration software, there is an
05:23option to generate these diagrams from an actual existing database in that server, and
05:29the relationships themselves will be shown with these connector lines.
05:32Different software tends to generate different looking lines.
05:35As I mentioned earlier, some will use the Crow's foot style to show the many part of
05:40a one-to-many relationship. Some will show the infinity symbol.
05:44There are other ways of doing this, but it's usually not that difficult to pick any one
05:48relationship and figure out which way it's going.
05:53
Collapse this transcript
Exploring one-to-one relationships
00:00It's possible to create one-to-one relationships in your database, but it's actually very unusual and here is why.
00:07Let's imagine I'm building an HR database.
00:09So, I sketched out a few entities based on real world object.
00:14And as part of this system, I figured out I need to keep track of employee data and
00:18driver's license information so I'll create two tables just for those pieces of data.
00:24But let's say the only reason I need to do this, to keep track of driver's license data
00:28is so I can associate a license with an employee for making travel reservations.
00:33So, in this system that I'm describing, it's a pure one-to-one relationship.
00:39One employee row points to one driver's license row.
00:42There are never multiple employees for one driver's license, and there are never multiple
00:46licenses for one employee.
00:48Now, I understand there are always caveats, so I'm assuming that this system just doesn't
00:53care that a few people might possess two licenses for different states or countries.
00:59It's trying to model the idea that we need one listed primary driver's license.
01:04So, I could create these as two tables with a one-to-one relationship.
01:09I could store a foreign key to the driver's license table in my employee row, or I could
01:13do it the other way around or in fact, like I'm showing here, I could just use the same
01:18primary key for each table, just mapping one row directly to another.
01:23But if this is the actual situation that it's a pure one-to-one, then I might as well just
01:29combine the two tables and have the driver's license information stored directly in the
01:34employee row with no relationship needed.
01:38So even if that driver's license might be a physical different entity, this makes a
01:43lot more sense to do it this way.
01:45One thing to take care of, when you're new to database design, it sometimes happen that
01:49you think you've got a one-to-one relationship when you don't.
01:52Here is an example where I've seen this happen multiple times.
01:55Let's say we're drawing up a fairly basic situation with an Order, an OrderItem, and a Product.
02:01There's some kind of relationship here.
02:03So, Order is storing information like an Order ID, a date, a total amount due.
02:08Each OrderItem is storing a Foreign Key to order ID, storing a quantity, and product ID
02:15because it's linking to a different table product that's storing information like the
02:18description in the list price. Now I know there's a relationship between them.
02:23And if asked someone to diagram the probable relationships in between basic entities like
02:28this, I usually have something that goes this way.
02:31They know it's a one-to-many between Order and OrderItem.
02:34Okay, one order can contain multiple OrderItems and then they take the OrderItem which is
02:40for a product so it has a product key and a quantity.
02:44And the reasoning often goes, well, one OrderItem is for one and only one product so it's a one-to one.
02:53But of course it isn't.
02:55What they've correctly determined is that it is a two-one relationship, meaning an OrderItem
03:00is for one and only one product but they haven't looked at it the other way, that the same
03:07product can be associated with multiple OrderItems.
03:10So, this really is a many-to-one.
03:13It's going from OrderItem to product and a one-to-many going from product to OrderItem.
03:18One product could be ordered multiple times.
03:21So, make sure that you're looking at your relationship both ways, particularly when
03:26you think you have found a one-to-one.
03:31
Collapse this transcript
Exploring many-to-many relationships
00:00While I went through an example of a many-to-many relationship earlier, it's worth going through
00:05again, particularly if you're new to database design.
00:08So, I had used the example of author and book that one author could write multiple books,
00:15but also one book could be written by multiple authors.
00:18So, this is a many-to-many business scenario.
00:21And I'm going to use a different business problem to describe at this time around, in
00:25this case, a database for a training center.
00:28So, let's imagine we have two tables right now, class and student.
00:34Class includes the information like a title of a training course, what date it's on, what
00:38classroom it's in, and the student is just exactly the information you would imagine,
00:43first name, last name, email, and so on.
00:45And what I want to describe is a relationship between them, that a class can be attended
00:50by multiple students but that also a student could take multiple different classes.
00:56It is a many-to-many relationship.
00:58And I cannot add one column to either table that would successfully represent all the
01:03possible combinations because if I had a StudentID column to the Class table, then I can only
01:10have on student in each class.
01:13If add a ClassID column to the Student table, then I can say that each student can only take one class.
01:20So once again, in relational database, you cannot represent a many-to-many relationship directly.
01:26You need to create a new table to link the two, in several names with this kind of table,
01:32a joint table, a junction table, a linking table, a bridging table, a cross-reference
01:37table, it really doesn't matter, what ever you prefer.
01:41By convention, the name of this table is just usually made up by taking the names of the
01:46two tables that it's cross-referencing and putting them together.
01:49So in this case, ClassStudent, or StudentClass, either would work.
01:54And there's a one-to-many relationship from Class, the ClassStudent linking table, and
01:59another one-to-many relationship from the Student table to the ClassStudent table.
02:04But this linking table doesn't contain any other data than just two columns.
02:09One of those columns is a Foreign Key to the Class table with Class ID, the other is a
02:14Foreign Key to Student with StudentID, and this is how we represent a many-to-many relationship.
02:19So, if I want to take Student ID 102, Viola, and say that she takes multiple classes.
02:26Then what I need is multiple rows in that liking table both referring to Student ID
02:32102 but referring to different Class IDs, in this case, 441 and 442, one student, multiple classes.
02:41And if I want to go the other way taking one class, say the database design class, Class
02:47ID 441, I'll take that over into the liking table, find all the rows with 441, find the
02:54student for those rows, 102, 101, 103, and just map those to the Student table, one class, multiple students.
03:02So is this a little more tedious than just having a single one-to-many relationship?
03:07Sure, but it's the only way that we can represent this kind of cardinality.
03:12Now one question you might have about the Class-Student linking table is what is the
03:18primary key for it? Do we need to add a new column?
03:22Well, we could add one, add an automatically incrementing primary key but we're unlikely to ever use it.
03:29This is one of the cases where we could just tell the database that we're going to use
03:32a composite or compound key, but neither column by itself would be enough to make any row
03:38unique, but both of them together would.
03:41There should only ever be one row with a particular StudentID for a particular ClassID.
03:47ClassID can repeat, StudentID can repeat, but both of them together, won't.
03:53That's enough to make every row unique. Now one more thing.
03:57First, I'll recap that quite often, you begin by thinking a relationship is a typical one-to-many
04:05and only later realize it is in fact a many-to-many in disguise.
04:10One reason for this is that the relationship in many-to-many isn't always weighted equally.
04:16Let's say that in your organization, one department has multiple employees who each work for only one department.
04:24That would be a one to many. It's easy to show and easy to represent.
04:29You would make it so each employee row just get to Department ID, and you're done.
04:33But if there exist, that one employee who's supposed to be splitting their responsibility
04:39across multiple departments, and you need to represent that in your system, then this
04:44is simply not going to work anymore.
04:47It's now a many-to-many relationship, and you would need a linking table to be able
04:52to contain that in your system.
04:57
Collapse this transcript
Understanding relationship rules and referential integrity
00:01So, why do we go to all this trouble?
00:02Well, it's usually easy to understand that relationships are a convenient way to link
00:07from one table to another. For a customer, we can just get all the orders.
00:11Well, just as important, if not more so is the idea that a relationship describes a rule,
00:17a constraint that I cannot violate, that I now must not have an order row for a customer that doesn't exist.
00:25The database will not let me add a new order row with a Customer ID 388 if there is no
00:30Customer ID 388 in the Customer table.
00:34I may say but I'm just about to add that customer, but at that moment the database would be no longer valid.
00:40It would no longer be internally consistent.
00:43This is what's considered a referential constraint.
00:46It's a rule that applies between tables where each table isn't just applying the rules of
00:51its own column definitions, but it's also cross-referencing the data in the Customer
00:56table to make sure that this is or is not allowed.
01:00If there isn't a Customer ID 388, you would not be able to enter that row.
01:05So, defining these rules will often imply a sequence in your database for creation of
01:11new content, but in this if I have a new customer and a new order for that customer,
01:17I must add the customer first before I can add the order for it.
01:21It would allow a customer without an order, but it won't allow an order for a non-existing customer.
01:28The idea that data is not just valid within one particular row or one particular table
01:34according to the rules of that table, but valid and meaningful between all your tables
01:39is known as referential integrity.
01:41There are two places that it primarily has an impact.
01:46Adding new rows as I just described, making sure we can't add an order row with a Customer
01:50ID that doesn't exist. This also applies for updates.
01:54I can't bypass it by adding a row for Customer 369 who does exist and then changing that
02:00to a value that doesn't. It will refuse that too.
02:04This is still referential integrity enforcing the rules of the relationship.
02:09What's more interesting is what happens with referential integrity when you delete something.
02:14So, if our data currently exist between these two tables in a valid meaningful state, say,
02:20I have Customer ID 367 who has two associated orders, well, what happens if I delete Customer 367?
02:28Well, in most database systems, it depends.
02:32It depends on you and your rules and what you want to have happen.
02:36One option you have when deleting between tables that have relationships is something
02:41called a Cascading Delete.
02:44If I want to delete a row from the Customer table, the database says, "You're the boss",
02:49and we'll just immediately delete any related row.
02:52So, in this case I tell it to delete Customer 367, it will first cascade down and delete
02:58all the Order rows for that customer and then delete the Customer row.
03:02If you have multiple tables connected to each other, one delete operation could theoretically
03:08delete many different connected rows in other tables.
03:11Another option for deleting is cascading nullify.
03:14Some database systems have this option, although it is rarely used, where you would null the Foreign Key column.
03:22So, in this case, I delete Customer ID 367 and the database would automatically scan
03:27all the orders and where it finds 367, set that to null.
03:31So, it's a situation where I must keep the order, but it's kind of now detached from
03:36a particular customer. It still exists but with a null column.
03:39So, what we're not doing is pointing to a customer that doesn't exist anymore. But more
03:44likely than either of this is simply refusal. No action.
03:49This is the default in most database management systems, that if you've created a relationship
03:55between these two tables and then you try and delete a customer that has existing orders,
04:01it's not going to let you do that. It will refuse the delete option.
04:05So just as when creating new data, I'd first have to create the customer first then the order.
04:11When deleting, you'd need to delete all the orders first or transfer their ownership to
04:16some other Customer ID and only then would I be allowed to delete that Customer row.
04:21
Collapse this transcript
5. Database Modeling: Optimization
Understanding normalization
00:00Once we've started to plan out our tables, our columns, and relationships, we do something called Database Normalization.
00:07This is a process where you take your database design, and you apply a set of formal criteria
00:13of rules called Normal Forms.
00:16These were developed about 40 years ago mainly by Edgar Codd, the father of relational databases.
00:21And we step through them 1, 2, 3, first normal form, second normal form, and third normal form.
00:28There are others but these are the important ones.
00:33Normalization should be carried out for every database you design.
00:36It's really not that hard, even though, yes,
00:39when you first start reading about database normalization, you'll run into phrases like,
00:44your database will comply with third normal form when every non-prime attribute of R is
00:48non-transitively dependent (i.e. directly dependent) on every candidate key of R.
00:53But you don't have to get into all this language unless you are mathematically inclined.
00:58The entire point of normalization is to make your database easier and more reliable to work with.
01:04You usually will end up creating a few new tables as part of the process.
01:09But the end result is your database will contain a minimum of duplicate or redundant data.
01:15It will contain data that's easy to get to, easier to edit, and maintain, and you can
01:20preform operations, even difficult ones on your database without creating garbage in
01:25it, without invalidating the state of it.
01:28And it's something we will do as part of our initial design, and it's also something that
01:32we would reapply when our database design is tweaked and revisited.
01:36Now, let me get something straight.
01:39Normalization is not an ivory tower, theoretical, academic thing that people talk about,
01:44but nobody really does in the real world, no, everybody does this.
01:49If you're a working database administrator or database designer, you can do normalization in your sleep.
01:55It's a core competency of the job. It's important.
01:59And as you'll see, we've already been doing a little of it.
02:04
Collapse this transcript
First normal form
00:00Before we apply the first set of criteria, what's called first normal form, often shortened
00:05to 1NF, I'm taking as a given that we already have our columns and our primary keys specified. Okay.
00:13First normal form says that each of your columns and each of your tables should contain one
00:17value just one value, and there should be no repeating groups.
00:22Okay, what does this actually mean?
00:23Well, let's say I begin developing a database for my company and one of my tables is an
00:28Employee table, very simple stuff, EmployeeID, LastName, FirstName, and so on.
00:34And we allocate every employee a computer.
00:37I want to keep track of that, so we'll add a ComputerSerial column to keep track of who has what.
00:43Now, this is actually okay right now. This technically is in first normal form.
00:49Here's the problem.
00:50Let's say I figured out that some of our employees need a Mac and a PC to do the testing.
00:55Others need a desktop and a laptop.
00:57So, several people have multiple computers, and I want to keep track of all of them.
01:01There is a couple of ways that I could deal with this.
01:04I could just start stuffing extra data into that one column.
01:08We could start putting commas or vertical bars or any other delimiter and put in multiple
01:13values in the one ComputerSerial column.
01:17This is just something you just don't do in Relational Database Design.
01:22We're violating first normal form.
01:25Understand the relational databases will happily deal with hundreds of tables.
01:30Each table could have hundreds of columns and millions of rows.
01:33But they do not want columns that have a variable amount of values.
01:38Perhaps, there's one value, perhaps there are two in it, perhaps there are ten.
01:43Each column and each row should have one and only one value.
01:47Now, could you do it this way? Sure,
01:50technically, if that column was defined as character data, the database wouldn't stop
01:55you from entering commas and multiple values into it.
01:58But as far as the database is concerned, it's still just one value that just happens to have commas in them.
02:04So, you would find it hard to search directly for a serial number.
02:07You'd find it hard to sort. You'd find it hard to maintain.
02:10So, it's not in first normal form if you do this because first normal form demands that
02:15every column, every field contains one and only one value.
02:19So, what we might do then is go back to the original way, and instead start adding new columns.
02:26So, ComputerSerial2, ComputerSerial3, this is what's called a repeating group, and there
02:33should be no repeating groups.
02:35The classic sign of a repeating group column is a column with the same name, and the number
02:40tacked onto the end of it just to make it unique, because usually this is a sign of an inflexible design.
02:46Sure, if we could guarantee that there would only ever be two or three, that's fine.
02:51But what happens when we want to add the tablet and the smart phone?
02:55What happens when one employee manages testing and needs to be associated with six computers?
03:01We don't want to require a change to the database schema just because we buy a new computer.
03:06So, what do we do here?
03:08Well, what we do is the same thing for a lot of these normalization steps.
03:13We'll take this data out of the Employee table, and put it in its own table.
03:19This then has relationships.
03:21We create a one-to-many relationship between employee, and this new computer, or it could
03:27be called an asset table or whatever else makes sense.
03:30And it has a foreign key back to the Employee table.
03:33I can take any EmployeeID like 551, follow it to the Computer table, and find his two
03:39computers or 553, find his three computers, there are no repeating values, no repeating
03:46groups in either table. And this will get us into first normal form.
03:51Now, it's very common that the solution to a normalization issue is to create a new table.
03:57Sometimes, it's a one-to-many relationship like this, other times it might even require
04:01a many-to-many with a linking table.
04:06
Collapse this transcript
Second normal form
00:01Before you attempt to go into second normal form or 2NF, well first, you have to be in first normal form.
00:08You don't pick and choose between them. You go through this one, two, three.
00:12Now whereas first normal form is about the idea of repeating values in a particular column,
00:18second normal form, and third normal form are all about the relationship between your
00:23columns that are your keys, and your other columns that aren't your keys.
00:28The second normal form has the rather puzzling official description that any non-key field
00:34should be dependent on the entire primary key.
00:37And that is about as simple as it can get phrased.
00:40Now, when I say the word field, it usually refers to the idea that the actual value in
00:45a particular column position for a particular row.
00:49But what does this actually mean?
00:51Well, for most of what we've done in this course, this actually won't be an issue for us.
00:56Second normal form is only ever a problem when we're using a Composite Primary Key.
01:01That is a primary key made of two or more columns.
01:05So, let me show you a table that currently is in first normal form but not in second normal form.
01:12Going back to the idea of a database for a training center, I have an Events table here
01:16that has an ID of a Course, a Date, CourseTitle, Room, Capacity, AvailableSeats, and so on.
01:23Now, what's actually happening here is this table has been defined to use two columns
01:28as the primary key. It's a composite primary key.
01:31And that's instead of just the Course ID column, we can't do that because, as you can see,
01:37SQL101 occurs multiple times, that can't be a primary key, neither could we use date.
01:43But I could combine the ID with the Date, that is logically unique.
01:49The same course may happen multiple times, and we may start multiple courses on the same date.
01:55But there's always only ever one particular course on only one particular date.
02:00So, SQL101 on the 1st of March is unique. It's a legitimate composite primary key.
02:05Now, the issue with second normal form is that if you use a composite key, you need
02:10to look closely at the other columns in this table.
02:14So, going along to my non-key columns, I have CourseTitle, SQL Fundamentals, Room 4A,
02:21Capacity is 12, there are 4 seats available.
02:24A lot of this information would be unique to this one entry, this one course on this particular date. That's fine.
02:32But second normal form asks that all of my non-key columns, everything that isn't part
02:37of the key, so Course Title, Room, Capacity, Available, they all have to be dependent on the entire primary key.
02:44Now, that is the case for Room and Capacity and Available.
02:48These are unique values based on the fact that we're running this particular Date, this
02:53particular Course, and this particular room with a certain number of seats available.
02:57It will always be different.
02:59But CourseTitle, well, I could get that just from half of the key.
03:05I could get that just from the first part of the key.
03:08It has no connection to the Date whatsoever. SQL Fundamentals will always be based on SQL101.
03:15It doesn't matter if it's being run in March or April or May.
03:18Now, this might sound a little bit ivory tower. But here would be the impact.
03:23What happens if somebody reached into this table, and they changed that Course ID, but
03:28they didn't change the title? Now, we've got a conflict.
03:32We might have the wrong title for the wrong piece of data.
03:35That's because my data now isn't in second normal form, and we're trying to fix that
03:40conflict from ever happening. So, how do we fix it?
03:43Well, once again, we're going to rip out the CourseTitle.
03:46We're going to create a separate Courses table, where we want to again map the ID of the course into its own row.
03:54So, we'll always have one specific title for one specific ID.
04:00And then we create a one-to-many relationship between Events and Course.
04:04And removing that from the Event table means that everything in that table is now based
04:09on the entire key, particular course, at a particular date which may have a different
04:13room or different capacity, different number of available seats.
04:17Now, here is the great thing.
04:19If you are not using composite keys, second normal form isn't even a concern.
04:24You can step ahead, go right through second normal form and into the next one, third normal form.
04:29
Collapse this transcript
Third normal form
00:00Now, let's take a look at the third normal form.
00:03Well, as plainly as this can be described, it's that no non-key field, meaning, a column
00:09that is not part of the primary key is dependent on another non-key field.
00:14It is in a way similar to second normal form.
00:17Second normal form asks can I figure out any of the values in this row from just part of the composite key?
00:23While third normal form asks can I figure out any of the values
00:27in this row from any of the other values in this row? And I shouldn't be able to do that.
00:33Let's take a look at an example.
00:35I've got this updated version of the Events and Courses table from the previous example.
00:40So, it's in both first normal form, it doesn't have any repeating values or repeating groups,
00:45and it's in second normal form.
00:47Meaning, there's no part of this that's dependent on just on a piece of the key.
00:54What I need to do for third normal form is look at my non-key fields, Room, Capacity, Availability.
01:02If I scan the entire row, let's take the first row, we've got SQL101 course occurring on the 1st of March.
01:08There is apparently 4 seats available. It's in Room 4A with a capacity of 12.
01:13Now, this is at a first look at it perfectly acceptable, because this course could be being
01:19scheduled in a different room every time with a different number of available seats as we
01:24start to sell different seats for a particular date. That's all okay.
01:29Here's the problem. It's between Room and Capacity.
01:33These are both non-key fields. These columns aren't part of the primary key.
01:38But if I look down the column for Room, I see 4A has 12 seats capacity, 4A has 12 seats, 7B has 14 seats.
01:47So, if every time we're in Room 4A, we always have 12 seats or every time we're in 7B,
01:53we always have 14 seats. I don't need to repeat that information.
01:57I could figure out capacity from Room and Room alone.
02:01I have one non-key field that is based on another non-key field.
02:05So, we don't need these to be stored in the same table.
02:08What we need to do is, you guessed it, split some of this information out into its own table.
02:14So, we need to pull out Capacity from the Event table, and just keep Room.
02:19And that's as long as Room will always tell us a fixed capacity, we'd create our own table
02:24for it, 4A always has 12, 7B always has 14, and so on.
02:29Now, we're in third normal form, no non-key field is dependent on another non-key field.
02:38Now, as you're seeing, it's all about the redundancy of the information.
02:43This is what we're trying to do with normalization.
02:45Now, another example of third normal form would be something like this, which is very common.
02:52Let's say we've got an OrderItem table, which is calculating different parts of an invoice.
02:57So, it has a ProductID with a Quantity, a UnitPrice, and a Total.
03:02Now, you don't have to worry about how this might relate to different tables.
03:06All I'm interested in looking at is this part.
03:09We've got Quantity for UnitPrice of $10, Total is $40. Here is the issue.
03:16We can see that Total is based purely as on Quantity times UnitPrice.
03:20Now, Quantity and UnitPrice are both non-key fields.
03:25So we're figuring out Total from these other two non-key fields.
03:29We don't need to do this. We don't need to store this in the database.
03:33We don't want to store information in your table that's easily ascertained by adding
03:38other non-key fields together, or in this case multiplying them.
03:41One of the main reasons for this is to prevent any conflicts.
03:45If in this example I have a row that says we have a Quantity of 4 and the UnitPrice
03:49of 10, but the Total says 50, well, where is the problem?
03:53There is a problem. How do we do it? How would we fix it?
03:56Is the Total wrong or is the Quantity wrong? Your data doesn't make sense anymore.
04:00So, we would remove that Total column form this table.
04:04We can figure it out when we need to figure it out.
04:06Now, third normal form will help you figure out these potential problems.
04:10Now just a quick side bar, in cases like this where you might find a total useful in the
04:16table, many database systems offer you the option of defining a computed or calculated column.
04:22It's not actually stored in the database, it is a convenient read-only fiction.
04:28Its value is automatically calculated based on the other columns in the table, and you
04:32may find that useful from time to time.
04:37
Collapse this transcript
Database denormalization
00:01So, we should always take our database design through the first, second, and third normal forms.
00:07There are more criteria available.
00:09There are fourth, fifth, and sixth normal forms. There's something called Boyce-Codd normal form.
00:15But taking it to third normal form is the usual expectation in a business environment,
00:19and certainly all we need to cover in a course like this one.
00:22Now, you will actually find a lot of tables out there intentionally break normalization
00:27rules and some others seem like they do but they actually don't. Here's one example.
00:33Let's say we've got an Employee table, and I'm storing an Email and a Phone number.
00:37Well, what happens if I want to add another email address, another phone number?
00:42Technically, this can be described as breaking first normal form.
00:45It's a repeating group.
00:47But in practice, you may find it more convenient to just allow an Email and Email2 column or
00:54perhaps a HomePhone and MobilePhone column rather than splitting everything out into
00:59multiple tables and having to follow relationships every single time you read or write this data.
01:06This will be referred to as a de-normalization decision.
01:09You're consciously making the choice that something could be normalized out into another table.
01:15You could follow the official rules.
01:17But for convenience and/or for performance, you're not going to.
01:21However, if in this case I needed to support a flexible number of email addresses
01:26or flexible number of telephone numbers or associate extra data along with these, then absolutely,
01:31you should split these out into their own table. But it's an important distinction.
01:37Normalization is concerned about the quality of your data and reducing redundancy with it.
01:42But it isn't primarily a performance decision.
01:45And because normalization often involves splitting data into multiple tables, performance can occasionally take a hit.
01:52And sometimes, you'll choose to de-normalize for performance improvement.
01:56Now, one example that can seem like a normalization and/or de-normalization issue but really isn't
02:04any table that's full of address information. This situation can be a little deceptive.
02:09If I look at a table like this, and I can see I've got Zip code being stored as the last column here.
02:16Theoretically, I could figure out what the City, and the State are just from the Zip Code,
02:21if I separated them out into their own table.
02:24So technically, I have non-key fields, City and State that are dependent on another non-key
02:30field, Zip, that could be figured out from Zip alone.
02:34However, this kind of case is not the full story because while it might be true 99% of
02:40the time that a Zip code maps to a particular City or Town, there are some cases where multiple
02:47towns or cities are allowed in the same zip code, some Zip codes even cross multiple states.
02:53Normalizing a table like this, thinking that I've immediately spotted a non-key field dependency,
02:58well, that would actually be taking it too far and making things more inconvenient.
03:04And the question is you really want to understand your data before you can make all these choices
03:09whether to normalize or de-normalize.
03:11And you might de-normalize to make things a bit more efficient, but do it knowingly instead of accidentally.
03:18And these really are the three steps that we would go through, first normal form, second
03:23normal form, and third normal form.
03:26First being about having no repeating values and no repeating groups, second normal form,
03:31no values based on just part of say half of a composite key, and third normal form.
03:38None of your non-key values should be based on or determined from another non-key value.
03:46Taking your database design through these three steps will vastly improve the quality of your data.
03:51I'm going to say that one more time.
03:54Taking your database design through these three central criteria will vastly improve the quality of your data.
04:01
Collapse this transcript
6. Database Modeling: Querying
Creating SQL queries
00:00Time to get back to SQL.
00:02As I mentioned in the first section of the course, Structured Query Language is the shared
00:07vocabulary of relational databases. That's not for general purpose programming.
00:12It's a small language focused purely on working with databases and the first handful of keywords
00:17that you'd learn maybe a dozen, will take care of 90% of everything you'd ever do in SQL.
00:23Now, we do have an entire course on SQL here at lynda.com, it's Bill Wienman's SQL Essential Training.
00:30But I would be remiss to have a database course that doesn't talk about SQL.
00:33So, in the next few videos, I'll cover the basics.
00:36Now, I said there are perhaps a dozen very useful keywords in SQL.
00:40But in terms of frequency, there's one that's head and shoulders above all the others.
00:45By far the most common word you'll ever write in SQL is SELECT.
00:49This is the word we start with when we want to SELECT or read information from one of
00:55the tables in one of our databases.
00:57We're using it to ask the database a question, and we expect a reply.
01:01Actually, the better word is not question but query.
01:05A query as in the Q in SQL: Structured Query Language.
01:09So, I will begin this with SELECT.
01:12But if I want to SELECT some data out of my database, I'm going to have multiple tables
01:17and multiple columns in each table.
01:18So, I need to say what specific part of this database am I interested in selecting?
01:25The general format of a SELECT statement is the word SELECT, what columns you're interested
01:30in, then the word FROM what table? SELECT and FROM are your SQL keywords.
01:36They're part of the SQL language itself. Now, the columns and table names are up to you.
01:42It's whatever you named your columns and your tables in your database.
01:44So, if you're interested in getting some information, say out of this Employee table, I'll just
01:50pick one of the columns: FirstName, and the name of the table itself, so SELECT FirstName from Employee.
01:57Now, running or executing this query would return all the values in that FirstName column,
02:04from all the rows in the Employee table whether that's one row or 100,000 rows or none.
02:10Now, you often want multiple column values, so if you want multiple columns, you separate them with commas. FirstName, LastName.
02:19If I run that query, we get both columns, again, for all the rows in that Employee's table.
02:25If on the other hand, I want all the columns in this table, I don't actually have to write them all.
02:29I can use the asterisk instead.
02:32So just SELECT asterisk or SELECT star is the common phrase that you'll hear, from Employee.
02:39And this would return every column for every row in the Employee table.
02:43It's basically saying just show me everything in that table.
02:46Now, because you often want to restrict what is returned, optionally you'll add another
02:51keyword to your query which is WHERE.
02:54This is your WHERE clause, where a particular condition is true.
02:59So, WHERE LastName = Green or WHERE EmployeeID = 474, Salary > 50,000.
03:08In the next video, I'm going talk about what's allowed in the WHERE part of the query, but
03:13there are a couple things you may be wondering already.
03:17First, what about case sensitivity is going on here?
03:21What's going on with upper case and lower case? How much does it matter?
03:24Well, SQL is not case sensitive generally speaking.
03:28It really doesn't care if the word SELECT here is written in upper case or lower case.
03:34Now, by long convention over many years, most developers will write the SQL keywords all uppercase.
03:40It makes these statements easier to read.
03:43Your table names and column names on the other hand, should just match however they were
03:47named in the database, whether that was, lowercase, uppercase, mixture, or whatever.
03:51Having said that, many Database Management Systems don't even care about that and they
03:55treat table names and column names as case insensitive too.
03:58Now, line breaks also don't matter. SQL is not sensitive to white space.
04:04You can split an SQL statement across multiple lines, as I'm doing here for readability.
04:09Now, officially, a complete SQL statement should be ended with a semicolon, though in
04:15practice, many Database Management Systems just don't care.
04:18But you'll see the semicolon on most of what I write to make it explicit.
04:22Currently, I'm selecting everything from the Employee table, but one question you might
04:27have is what if you have multiple databases?
04:29How does it know which database this table is in?
04:33After all, I could even have several databases containing tables with the same name.
04:38If that's the case, I need to use not just the table name but the name of the database
04:42itself in my SQL statement, but depending on the way you're writing it, the Database
04:47Management System may assume that you're already pointing to a particular database.
04:52There's often a default database for the database server if you don't name one explicitly.
04:58But if I wanted to be explicit, I'd use the name of the database, then a period, then the name of the table.
05:03So, here I'm making the assumption I have a database called HumanResources, and I'm
05:07interested in the Employee table inside that database.
05:12This can be done a little differently depending on the Database Management System, but it is usually this simple.
05:17Now, right now, these statements are only selecting from one table at a time.
05:22We can write SQL to bring back information from multiple tables in the same query, depending
05:27on the relationships we've defined, but that comes later.
05:30First, we need to take a closer look at the WHERE clause.
05:35
Collapse this transcript
Creating the WHERE clause
00:00When writing a basic query, coming up with the SELECT and FROM parts should be fairly
00:05easy to decide, simply what tables you're interested in and what columns in that table you want to see.
00:11But writing the WHERE part, where you're specifying their condition or what's officially called
00:16a predicate, well, that can give you a few more options to choose from.
00:19But writing a WHERE clause to go against a particular table is a bit like writing an
00:24if statement in a conventional programming language.
00:27You're trying to describe something that at heart is either true or false.
00:31So, if I want to go against this table and bring back all the rows where people have
00:35a certain LastName, it's either true, or it is false.
00:39It's the same if they have a certain Email address or their Employee ID is in a particular
00:44range, or they were hired after a specific date.
00:47It's either true, they match that condition or false that they don't.
00:51And how you write this part, this WHERE statement, kind of depends on the data that you're matching
00:56to and how specific you want to be. So, let's get through a few examples.
01:00Right now, if I wanted to select all the rows where the LastName of the employee was equal
01:06to Green, well, this is the way I would do it.
01:09And if it was text that I'm matching on, I need to write that in quotes.
01:12Now unlike most general purpose programming languages, strings in SQL use single quotes,
01:18not double quotes here. I'll say that again, because it's important.
01:22String values in SQL are surrounded with single quotes.
01:26It is one of those situations where some Database Management Systems will be flexible and they'll
01:32allow you to use either single or double quotes, but if you're talking classic SQL, it's single quotes.
01:39Also notice if you're coming from a C-based language, we don't use the double equal sign
01:44here for equalities just single equals.
01:47We also don't need to surround our condition, our predicate in parenthesis, though you can, if it helps.
01:54By default, in most database systems this comparison would be case insensitive.
01:58It doesn't matter if I use an uppercase G or a lowercase G for green, and it wouldn't
02:03matter if it was uppercase or lowercase G, as it's actually stored in the database,
02:08although this is something that can be changed from database to database.
02:12Most management systems offer the option to make these kinds of checks case-sensitive.
02:18If on the other hand I'm checking a number, something that is stored as a numeric column,
02:23EmployeeID, for example, I don't need the quotes.
02:26So I just directly ask if it's equal to that particular value.
02:30And if we're working with numeric options like Salary, I have the comparison operators.
02:35We have the operators that you might expect.
02:38Greater than, less than, greater than or equal to, less than or equal
02:42to, and not equal is written with the less than and greater than signs right next to each other.
02:47So, in this case I'm asking for all the rows WHERE Salary > 50,000.
02:52And you can also use AND or OR to combine multiple conditions in your WHERE clause.
02:59So, in this case I'll say WHERE Salary > 50,000 AND the Department column value is equal to 'Sales'.
03:06And particularly for those of you coming from C-based languages, we actually use the word
03:11AND or OR not vertical bars or ampersands.
03:15So, in this case a perfectly acceptable way to check if the value of Department is equal
03:20to Marketing, or it's equal to Sales, we're going to bring back all those rows.
03:24Now if you wanted to check for multiple values where you're interested in the same column
03:30having several different options, in this case Department being Marketing or Sales,
03:35we can also use the word IN and then surround the multiple values inside parentheses separating them by commas.
03:43Because I'm checking for a text value, I need to surround each independent value with the
03:48single quote and just separate them by commas here.
03:51If on the other hand, what you're looking to do is be a bit more flexible on matching
03:55text, then instead of using the equal sign, which wants an exact match, we can use the SQL keyword LIKE.
04:02Now in a lot of other programming languages or other ways of doing wild card searches,
04:08you use a wild card character of an asterisk, but in SQL, it's the percent sign.
04:13So, I'm asking here where the last name column is LIKE 'Green%' and then anything after this.
04:19So this query would match all the rows with Green in that column, but also Green Span,
04:24Green Berg, Green Away, and so on.
04:27The wild card can be placed at the end of the text, it can be placed in the middle,
04:31it can be placed at the start. And it will match any number of characters.
04:36If however, you're just looking to match on a single letter, you could represent that
04:40with an underscore in your LIKE statement, which in this case would bring back anything
04:45that matches on Smith with an I or Smyth with a Y or any letter there.
04:50However, do you understand that in large tables, using LIKE can be quite inefficient as it's
04:55more demanding on the database to check multiple possible contents of a particular column value,
05:02rather than just checking for pure equality, which is a lot easier to do. What about null?
05:07I've told you about the idea of null values in the database.
05:10Some columns can be set to allow null, meaning the lack of a value, and often you want to
05:16find out if that is the case or not the case in your WHERE clause.
05:19So, let's say this Employee table that I've defined has a column called MiddleInitial,
05:24and that has been set to allow null values when people simply don't have a MiddleInitial.
05:30If I wanted to find those people, I might first think to write something like this,
05:35WHERE MiddleInitial = NULL, but actually equals is not a good word to use when we're checking
05:41for the existence of null, because it's not equal to NULL because NULL isn't a value.
05:46It's not equal to anything. What we want instead is the word IS.
05:51So instead of MiddleInitial = NULL, it's WHERE MiddleInitial IS NULL.
05:56And this would bring back all the rows where there had been no values stored in that column.
06:01Conversely, if I'm just interested in the rows that have a value in that column where
06:05we can say instead IS NOT NULL and execute that, and now we'd only get the rows where
06:11that column actually had a value for each row.
06:14And this covers the basics of the most commonly used options in WHERE clauses.
06:18I'm not trying to cover everything.
06:20Take a look at SQL Essential Training for a few more options.
06:25
Collapse this transcript
Sorting query results
00:00The way that the database will return the results of your query is not always what you
00:04want, so let's see how to sort those results.
00:07In this simple example I have a Product table in my database, and I've got a very straightforward
00:13SELECT statement that's going to select three of the columns Description, ListPrice, and
00:16Color from the Product table, meaning, give me all the rows.
00:20I haven't even applied a WHERE clause on this.
00:22So, get everything returned whether that's a thousand rows or a million rows, but there
00:27will be no inherent order in these.
00:29The way that they are going to be returned is currently more to do with the internal
00:33structure of the database.
00:34Now, it doesn't necessarily match what I might find most useful.
00:38So, let's say what I want to do is find out the most expensive products that I have.
00:43I would get that data in my results, but not presented to me in an easy to scan way.
00:48I'd like to see most expensive first, cheapest last.
00:51And I can do this with another optional keyword in an SQL query, which I'll put right at the
00:57end of this statement after the FROM clause.
01:00So, I'm going to add the keyword Order By, and it is written as two separate words.
01:05The question is Order By what?
01:07In this case, I'd like to order by the values in the column called, ListPrice, whatever those values are.
01:13So, I'll use the name of that column, ORDER BY ListPrice.
01:16Now by default, ordering is in ascending order, which would mean the row with the smallest
01:22ListPrice would be arranged first.
01:25If I want to order by the most expensive first, I need to make this descending order and to
01:30do that, I just type in the word DESC afterwards.
01:34There is an ASC keyword for ascending, but ascending is the default.
01:39The results of these will come back, and this time around, it's going to bring back the
01:43same number of rows, but it will order them by ListPrice descending.
01:47Now you can also pick multiple columns to ORDER BY.
01:51So if in this example, I was writing a simple SELECT statement to select all the rows from
01:56the Employee table, in this case where a Salary is greater than 50,000, because we can
02:00do WHERE clause as well as an ORDER BY. I'll do an Order By LastName, FirstName.
02:07I didn't use the word DESC or ASC, so they're both going to be ascending.
02:11So, the results will come back, again, it doesn't matter how many rows they were.
02:16We're going to first be ordering by LastName all the way through these results.
02:21But then wherever the LastName is the same, we're going to do a sub ordering within it
02:26and in this case we'll order by FirstName.
02:29So as you see a very simple format to start to impose some kind of structure on the results
02:34you're getting back in your query.
02:39
Collapse this transcript
Using aggregate functions
00:00Next up, we have a few more SQL keywords to work with and these are all grouped under
00:04the term aggregate functions, which doesn't really suggest how useful they can be.
00:09An aggregate or grouping function means it will perform some kind of calculation on a
00:13set of data that we describe, but return a single value.
00:17Now, what that single value is depends on which aggregate function we use.
00:21So, let's take a look.
00:22Now, we've seen already how we can do a simple SELECT statement like this one, and this will
00:27return everything, all our columns, all our rows, whether that's five employee rows or 5,000 or 500,000.
00:36But what if that number itself was the piece of information I wanted?
00:40What if I just wanted you to know how many rows are in this table?
00:43I don't need anything else. I just want to know how many rows.
00:46Well, I can do that by using an Aggregate function in SQL called Count.
00:50And what I'm going to do is just change the SELECT * to SELECT COUNT (*).
00:57Count everything in the Employee table.
00:59If I execute this, it will COUNT all the rows and just return a single value just, in this case 547, let's say.
01:07Now, you could of course use a WHERE clause if you wanted to restrict the results to just
01:12counting the number of employee rows that have a Salary greater than 5,000, we get a different result.
01:18That is all COUNT will do for you is just COUNT the number of rows for this particular condition.
01:24Now sometimes, however, the COUNT isn't what you want.
01:27In the previous video on Sorting, I used the example of selecting Products and using the
01:33Order By clause to sort them by, in this case ListPrice descending.
01:37But what if the only reason I was doing that was to find out what the maximum ListPrice was?
01:42Well, instead of using this Order By descending and looking at the top row, I could instead
01:47just do something like this. SELECT MAX.
01:50Now instead of saying SELECT MAX with the asterisk, saying SELECT MAX of everything,
01:55I'm just focused on one particular column.
01:57What's the maximum value of ListPrice in the entire Product table?
02:02In this case, we'll bring back $699.
02:04Well, if we have MAX for maximum, it's a pretty good guess that we're also going to have MIN
02:09for minimum, and we do. We also have A-V-G for average.
02:13We would add up all the values in the ListPrice column divided by the number of rows and return that single value.
02:21While we've seen COUNT, there's also an option of SUM.
02:24SUM, instead of counting the rows will total all the values up.
02:29So, in this case with this statement, we're looking for a CustomerID equal to 854 that
02:34will find all the rows for that customer and then add together all the values in the total
02:39due column. And that's one of the great things about working with Aggregate functions that
02:44they really don't allow an awful lot of complexity.
02:47They are going to return a single number.
02:49They need to be very straightforward to customize.
02:51However, we can take them one step further.
02:54So, we've seen how we can use the idea of SELECT COUNT (*) to get the number of rows.
03:01We can use it in conjunction with a WHERE clause.
03:04In this case, COUNT up the number of rows where the color is equal to 'Red', and we'll get some result back.
03:10But if I wanted to know not just the COUNT of Red Products or All Products, but I want
03:15to know how many Red Products we have, how many Black Products we have, Silver, Gold
03:19and so on, well, I could create multiple statements like this just changing the WHERE clause every time.
03:25But that's not only tedious, it assumes I know ahead of time what all the colors will
03:30be, and we might be adding new colors all the time.
03:33So what I would like you to do is retrieve both the COUNT and the Color column from that
03:39table, but to group the results by Color.
03:41Now, I might be tempted to just write something like this, To select the COUNT, and the Color
03:48column, but this will confuse SQL.
03:51It can either give you the single COUNT of all rows, or it can give you all the multiple
03:56values in the Color column, but it doesn't understand how to give you both at the same time.
04:00However, what I can do is add another SQL keyword, which is great for use with the Aggregate
04:07functions, and that is GROUP BY. So, I'm selecting here to GROUP BY Color.
04:13If we run this SQL query, this is what we're going to get or something very like this.
04:19It will count up all the products for a particular color, because we've told that to group by color.
04:24A GROUP BY is something that only makes sense with Aggregate functions.
04:29You don't use GROUP BY otherwise.
04:31Conversely on the other side, if you are using an Aggregate function, you are pretty much
04:35always going to use it by itself unless you used GROUP BY, because the most you would
04:40ever expect to return from your query is one single value unless you are using GROUP BY
04:45to categorize those results.
04:50
Collapse this transcript
Joining tables
00:00After a while, you're going to find that very straight forward SQL statements where you're
00:05just selecting a few pieces of data from a single table, well, they end up being a little limiting.
00:11If we've gone to all the trouble of defining multiple tables to store our data, we do that
00:15with the understanding that it will be possible to get it all back using the relationships that we've described.
00:20So, I want to have one SQL query not to be limited to selecting from one table but to
00:26be able to select from two different tables or even three or more and the phrase we're
00:30going to use is to join our tables together.
00:33So, in the example I'm going to go through, we have an Employee table and a Department
00:38table, and there's a one-to-many relationship between department and employee using the Department ID column.
00:46Department ID is a foreign key and employee is a primary key in Department, and it's just
00:51so we don't store redundant department details for each employee row.
00:56Now the question is how would we start to join these together in SQL?
01:00Well, let's begin just by doing a fairly regular SQL Statement.
01:05So, I'm selecting a few columns, First Name, Last Name, Hire Date, and Department ID
01:10just from the employee table. So, we've seen this one before.
01:14No surprises there in the results that we would expect.
01:17So to start involving the other table, the magic word here is JOIN.
01:22So, after the Employee table, I'm going to use the word JOIN and then say Department.
01:26So, from Employee join Department.
01:29This by itself isn't doing very much but we're starting to add the necessary pieces bit by bit.
01:36If I say I want to join these two tables together, I can then start adding columns from the Department
01:42table to the Select clause.
01:45So, after Department ID, I'm going to say common name, location that are both columns
01:50in my Department table. But immediately, we have a problem.
01:54If I'm now selecting from two different tables, it's really common that it will have a name conflict.
01:59In this case, Department ID is going to give us a problem here because Department ID exists
02:04as a column in the Employee table, and as a column in the Department table.
02:09So, the SQL query would be very confused, which one am I talking about?
02:13And in this case, what I could do is just to prefix that with the name of the table,
02:17and it wouldn't actually really matter which one we picked, but we need to be explicit
02:23so that SQL doesn't get confused.
02:26And if I wanted my SQL statement here to be very explicit, I could do this for every column.
02:31So, select employee.first name, employee.lastname, employee.hiredate, department.name, department.location
02:39and so on, but I still do have an issue.
02:42I still couldn't run this query because I need to describe exactly how these tables are to be joined together.
02:49The way that we do that is to use the word ON, and we'll use that in conjunction with JOIN.
02:55So, it's Employee JOIN Department ON, and I name the columns in each table and how they link together.
03:03In this case, it is Department ID. Employee.DepartmentID = Department.DepartmentID.
03:09We've now described how the join happens.
03:12If I run that statement, we'll get the results back which are combining, joining these two tables together.
03:19Now, one important idea here is in the kind of join that we're doing right now.
03:24It's only going to bring back rows where there it is a match between the two tables.
03:28So, if you notice in the actual employee table at the top, the third row is Alice Bailey.
03:34Well, Alice has a Department ID column value of null.
03:38She's not linking to Department.
03:40So, what that means is when we do the join, we will not get a row coming back for Alice
03:44Bailey because it has to have a match, and that's because what we're doing here is called an inner join.
03:50So, I am using this JOIN keyword here, and if I was to be good about this, and I usually
03:56would be, I should use the words INNER JOIN rather than JOIN, even though that is the default
04:02kind of JOIN is an INNER JOIN.
04:04Meaning, only bring back the rows where there is a match in both tables.
04:08So, we'll neither see the row like Alice Bailey because she has a Department ID of null but
04:14on the other hand nor will we see any of rows from the Department table that don't have matching Employees.
04:20In this simple example here, I don't have anybody with the Department ID of one so I'm
04:25never seeing the row that says Production CA with the budget code of A4, but sometimes
04:30you might want to start involving these other rows that don't exactly match, and that would
04:36be by creating something called an OUTER JOIN.
04:40An OUTER JOIN means we're going to pick one of the tables and say this one takes precedence.
04:45We want to see all of the rows return from a particular table and still show the matching data where possible.
04:51Now, that might sound a little weird so let me demonstrate what the difference would be.
04:54So instead of using the INNER JOIN keywords, I'm going to use OUTER JOIN, but I can't just
05:00write OUTER JOIN I have to be explicit.
05:03With an OUTER JOIN, you are typically saying one of these tables takes precedence over the other.
05:09We are interested in where they match but we still want to get the results where they don't.
05:13So, I would typically use the word left or right, A LEFT OUTER JOIN or a RIGHT OUTER JOIN.
05:19The left and right here simply means is that the table on the left-hand side of the word
05:24JOIN, which for us as employee, or it is the one on the right-hand side of the word join, which will be department.
05:30So, in this one, I'm going to do a LEFT OUTER JOIN, and it's going to look to the left to
05:34the word join and see employee and then say that employee will take precedence.
05:39Now, we get back very similar results but we are getting back a row for Alice Bailey
05:45even though Alice Bailey doesn't link to a row in Department, we're getting that row back.
05:49We're just getting null values because there's nothing we could put there.
05:53Conversely, what I could do instead of a LEFT OUTER JOIN if I just change that LEFT OUTER
05:57JOIN to RIGHT OUTER JOIN, we now put the focus on the Department table, and we say we want
06:02everything that matches but we want everything from that Department table.
06:07In this case, we would have rows coming back for the production department which it doesn't have any associated employees.
06:14We'd have rows coming back for the marketing department which doesn't have any associated
06:18employees, and we would still write out the ones that match, and that's the RIGHT OUTER JOIN.
06:24There is even a full OUTER Join which does both.
06:27I'm a little pressed for space here so I want show that but as you might imagine, it says
06:32to include all rows from both tables in the results matching them wherever possible but
06:38returning rows with null values when there's no match between them.
06:42Now, as you start to read more and more SQL, you probably come across a couple of different
06:46ways that joins can be written, but this is the recommended NC Sequel standard way of doing it.
06:56
Collapse this transcript
Inserting, updating, and deleting
00:00So far, we've been focused on using the Select statement to read information out of a database, but that begs the question,
00:06how did the data get in there in the first place?
00:09We have several different keywords in SQL for inserting, updating, and deleting information.
00:14Now, this goes back to the idea that any computer system, not just databases that deals with
00:20storing data needs to provide four fundamental functions, the ability to Create, Read, Update,
00:27and Delete often using the acronym CRUD.
00:29Well, if you want to work with CRUD in SQL, we don't use these exact words here.
00:35There are a couple of them that are different.
00:37We know already that when we want to read information out of the database, we use the
00:41SQL keyword SELECT, and if we want to create, we're going to use the SQL keyword INSERT.
00:48Now update and delete there's no conflict there. They are just UPDATE and DELETE in SQL.
00:53So, let's take a look at the ones we haven't covered yet.
00:56I'm going to talk about inserting first.
00:58So, the assumption is we're inserting a new row of data into a particular table and the
01:03general format is INSERT INTO, then your table name, then a list of columns and then the
01:10values that you want to insert into those columns.
01:13So, if we're going to insert into this simple imaginary Employee table, I'll use the name
01:18of the table, and then inside parenthesis, the name of the columns that I'm going to provide values for.
01:25Now, you typically do this because you don't always provide data for every single column in a new row.
01:32Some of these columns might be null.
01:34Some columns might provide their own default values and quite often you might have some
01:38columns that are automatically generating primary keys, and you certainly don't want
01:42to try and insert your own values in those situations.
01:46So, you name the columns you're going to provide values for.
01:50And then after the keyword values, you provide those values.
01:54If there are strings they need to be in single quotes, if they are numeric key, you don't need the quotes.
01:58The columns, and you values must match in number, in order, and type.
02:03You can't say you're going to provide five columns and then only provide values for four.
02:09And also the data types must match.
02:11If the column is defined as a number, you can try and insert a string into that column.
02:16So, if I go ahead and execute this, what it's going to do is create a new row in that table.
02:22Now, I'm making the assumption here that the way the database has been defined, Employee
02:27ID will be an automatically-generating primary key.
02:30So, we didn't need to provide a value there. Hire Date?
02:34We probably could have provided one, but I'm imagining that it was set up to do a default
02:39value of the today's date for this row to be created.
02:42And let's imagine that email was allowable to have null values there.
02:46So, if we didn't provide one, it's just null. Now of course, here's the thing.
02:51We don't have to rank manual insert statements for every new row in our database.
02:57It's much more likely that this process will be done programmatically.
03:00The people will be using a more pleasant user interface to enter data, but this is the kind
03:05of insert statement that's going on behind the scenes.
03:09Well, we know we can read information out of the database, but what if we want to change something?
03:14It's quite often that that we'll change information in the database once it has been created,
03:19and we use the update statement for that. The format here is UPDATE, table name.
03:24You're going to use the keyword SET, and you're going to change a particular column.
03:28So, what column and what value do you want it to be and then almost always you'll be using a WHERE clause.
03:35This is exactly the same as doing where in the select statement that allows you to limit
03:40the amount of rows that this update will apply to.
03:42So, an example would be, if we have that row that we just inserted, and I wanted to update
03:48it to change the email column, I'd say UPDATE the Employee table.
03:53I'm going to set the column code Email equal to, provide the string value, and then the WHERE condition.
03:59In this case, the easiest way would be employee ID equals 734, and if we run that it will
04:04reach directly into that row, and change that one value.
04:08Now technically, you don't have to have a WHERE clause, but just as leaving off a WHERE
04:14clause in the Select statement would return all the rows in that table leaving off a WHERE
04:18clause in an update statement would mean that you would set this column value for every single row in that table.
04:24Now, occasionally, that is what you want in an update, but more often you will restrict
04:28your updates to one row or a handful of rows.
04:31So, we use WHERE clauses for that and using where in an update, it's exactly the same
04:36as using it in a select. Finally, DELETE.
04:39Well, DELETE is actually the simplest because it's all the about the where.
04:43The format is DELETE FROM, particular table name, WHERE condition.
04:48We don't have to name any columns in this query because delete just works on rows.
04:53If you delete a row, you delete an entire row. You can't delete part of a row.
04:58So, there's no point in naming any of the columns here.
05:00If you just want to blank out some information in an existing row, well, that's an update.
05:05So, that WHERE clause and the DELETE just defines which rows and which table you're going to delete.
05:11So, we'd say to DELETE FROM Employee, WHERE. In this case employee ID equals 734.
05:18This is about as specific as we could possibly get.
05:21Now, you can delete multiple rows at a time the same way that you could use a where clause
05:26to select the multiple rows at the time, but for heaven sake take care with your delete
05:31statements, almost everyone who has worked with the databases for a while has a horror
05:35story about the DELETE statement that was written a little to casually.
05:39Because if you write something as simple as this and execute it, the database management
05:44system is just going to say, "Well, you're the boss and go ahead and delete every single
05:49row in this employee table." It won't ask, are you sure?
05:53It won't ask you to confirm anything. It will just do it.
05:57DELETE is a very powerful keyword.
06:00Now a good practice when working with DELETE and also when working with update is to create
06:05it first as a select statement.
06:07So in this case, I'll do a select staff, an employee where employee ID equals 734 and
06:12just confirm that the result I would expect back, in this case one row, well, that's what
06:18would also affect if I was doing an update, and that's what I'd affect if I was doing a delete.
06:22So, I'll create it as a select first and then change the select box to delete.
06:29Execute that, and we go to bit better faith that what we're actually affecting is just one row.
06:34So, to good practice to make sure the where clause is only selecting the rows you expect
06:39before you turn it into a delete or an update statement.
06:44
Collapse this transcript
The data definition language
00:00Now the majority of SQL that you'll write will be things like SELECT statements,
00:05INSERTS, UPDATES, and DELETES but that's not the whole picture.
00:09These keywords are sometimes lumped together as the DML part of SQL.
00:14DML standing for Data Manipulation Language, and that's an impressive sounding phrase for
00:20something that's really not all that important of distinction.
00:22It simply means that part of the sequel language that deals with working with and manipulating data in your table.
00:30So, selecting it, inserting it, updating it, deleting it, but none of these words actually
00:35allow us to alter the structure of the database itself.
00:38So, how do we create a new table? How would we say what columns exist?
00:43How would we define primary keys? We can't use any of these words for that.
00:47But that can be done in SQL using a part of the language that is lumped together under
00:52the phrase DDL or Data Definition Language.
00:57The three key words there are Create, Alter, and Drop.
01:02These words let us change the structure of the database itself.
01:06Let's take a look at a couple of them.
01:08If I want to write SQL to create a new table, this is the format that I'll use.
01:13The word CREATE, and then the table name you just make it up, it's whatever you want
01:17to call it and then in parentheses, the definition of your columns.
01:21This is after all what a table is it's simply naming the columns and giving them data types
01:27in any rules they must comply to.
01:29So, if I wanted to create a new employee table, I would say CREATE Employee and then inside
01:34parenthesis, here's how I would provide column definitions.
01:38We start off with EmployeeID as an INTEGER which is PRIMARY KEY.
01:43And these are key words we're using to tell SQL the importance of and the rules these columns must comply to.
01:51I'm not going to get into the specific of these for reasons I'll cover in just a moment,
01:56but the general format should be at least understandable about what this would do.
02:00It's going to create a new table with five columns, an integer, three VARCHAR, and that
02:06means the variable length character field.
02:08I can recognize even phrases like NOT NULL or NULL.
02:11So, it looks like we don't have to put anything in the department column and then we're finishing
02:15off with salary as an integer.
02:17Now, obviously in a production database, these can be far more complex than this but this
02:22is the basics of how it's done.
02:24Now, if we move on a little bit and realize we need to change the definition of it, we
02:28can use the key word, ALTER. Now, typically would alter a table.
02:32If you give it the table name and then you give it a bit of information about what you're trying to change.
02:37In this case, I'm saying I'm adding a new column.
02:40That column is called Email.
02:41It's going to be a variable length character string of 100 characters.
02:47If I have a problem with what I created earlier, I can use the last of the three keywords which is DROP.
02:52And we would simply say something like DROP TABLE Employee.
02:57We wouldn't need any WHERE clauses, any column definitions.
03:00We're simply saying destroy the entire table, again, something to be very, very careful about working with.
03:06Now, I wanted to introduce these phrases, but I'm really not going to spend much more
03:10time on these words here for two reasons.
03:13First, many people who work on the actual design and structure of the database will
03:18use a visual application like MySQL Workbench or Sequel Server Management Studio, rather
03:24than hand coding create an Alter statements.
03:28Second, if you do need to go into more detail, take a look at our SQL Essential Training course.
03:33It's true that if you're in a database administration role you all might then spend a lot of time
03:38with these kinds of data definition statements, Create, Alter, and Drop, but regular day to
03:44day developers working in these databases typically don't.
03:48You shouldn't be spending most of your time changing the definition of the database.
03:52You should be spending most of your time manipulating the actual data in it.
03:57So, that's why we're not going to focus on that.
04:00And it's worth mentioning that there are a couple of other keywords in SQL that are typically
04:04lumped together under the idea of data control keywords.
04:08This is where you can either grant or revoke permissions for people in the database.
04:13This is not an area we're going to cover in this course because that really gets into
04:18the specifics of things like the operating system and the authentication schemes that
04:23you're using for your particular database. So, it's just not general content.
04:28You'll get into that for a specific database management system.
04:32But it's worthwhile to know that SQL itself is broken up into these larger categories
04:38of the different things you're likely to do with it.
04:40Most of your time manipulating data, some of the time defining the actual database itself,
04:45and a little bit of the time controlling who has access to different parts of it.
04:50
Collapse this transcript
7. Database Modeling: Indexing and Optimization
Understanding indexes
00:00As your database starts to grow, there's a part of the database design that becomes more and more important,
00:05and that's creating and using indexes.
00:08An index center database is like an index at the back of a textbook.
00:12It will help you find things in that book.
00:15Say if I have a 500-page book on database management, and I want to find the content
00:20dealing just with date columns, I can look in the back in the index for date.
00:26It tells me that's on page 124, and I turn directly to that page.
00:30Indexes are all about speed of access.
00:33I could have found that content by going through the book page by page.
00:37So the index isn't adding any content. It's just helping me find it.
00:40Here's the most essential ideas.
00:42See, the data in your tables is inherently unstructured and unordered.
00:48Sure, all the data in one row is kept together, but there's no particular enforced order to
00:54those rows unless you say otherwise.
00:56And yes, when we write queries, we can use the order by clause to impose a sequence on
01:01our results, but that's not the issue here.
01:04The issue is getting directly to certain content like getting directly to one specific row in a large table.
01:12You might think, well, hang on a second.
01:14Didn't we take care of that by adding a primary key to our table? Well, yes and no.
01:20The primary key does let us uniquely identify a row, but that's a very different question
01:25from does it let us find that row really fast when it's somewhere in the middle of 500,000 other rows.
01:32That's the problem we're trying to fix with indexes.
01:35Like with reading a textbook, I don't want to have to leaf through every single page every time.
01:39I want to be able to jump directly to the relevant point.
01:43So we need an index.
01:44The most basic, the first index that's created, the primary index on any table is what's called
01:51the clustered index, and that means pick a column as the clustered index and the database
01:57will order the data in that table based on that column.
02:02Meaning that on that physical disk itself, where these rows of data are stored as bits
02:07and bytes, they're actually sequenced that way.
02:10So the clustered index is almost like having a residential phonebook or White Pages where
02:15the clustered index in a phonebook would be last name.
02:18So everyone is listed in the book in alphabetical order by last name.
02:22The actual data is structured and presented that way.
02:26You don't have to look in the back of the book to find out where the names beginning with D are.
02:30You know where they are.
02:31They're towards the front between the Cs and the Es, because the data is ordered that way.
02:37The most common clustered index in any database table is the primary key column, whether that's
02:43an automatically generated integer or a unique string or anything else.
02:48Actually, most database management systems when you define a new table will automatically
02:54make the primary key the clustered index unless you say otherwise.
02:59That is usually the best option, because we already chose the primary key as a significant
03:04way to get to the individual rows.
03:06But if you found yourself accessing the data primarily using another column, you might
03:12change that column to be the clustered index instead.
03:14See, each table can only have one clustered index.
03:19The same way the names in a phonebook can only be sorted one way, but what you can have
03:24more of is non-clustered or unclustered indexes. Here's the situation.
03:30I have a basic customer table here with a primary key on the CustomerID column and the
03:35clustered index is also on that column.
03:38So if I write a query that uses that, I select where CustomerID equals query, it's going
03:44to be very fast together. There are two things going on.
03:47The database engine knows how to identify that row, because it's a primary key, but
03:52it knows how to get to that row quickly, because it's a clustered index, but on the other hand,
03:57if I figure out that we often also access this table using last names,
04:01so if I write a lot of queries that looks something this, select Staff and Customer
04:05where LastName equals Smith, well, that's a different matter, because the database management system has no idea physically
04:13where on this table where that particular last name might turn up.
04:18That means a query like this would require the database management system to start at
04:23the first row and read through every single one, row by row looking for that piece of data.
04:30This is sometimes called a full table scan.
04:33It's a very inefficient way to get to your data, and the more rows you have, the worse it gets.
04:38It wouldn't be bad with a dozen or a few hundred rows, but once you get to thousands, tens
04:44of thousands, hundreds of thousands, it's highly inefficient.
04:47If I have to do this kind of query a lot where we're actually querying on an unindexed column,
04:53it's going to be a problem. Here's one way I can fix it.
04:56I can create a secondary index, a non-clustered index.
05:02This is more like having an index at the back of the textbook.
05:06It's created as a separate piece by itself with its own meaningful order.
05:11So in this case, I'm imagining I have this index existing, it's sorting now by last name
05:16ascending, the way that we can't actually do in the regular table, because we're already sorting by CustomerID.
05:23We're basically creating a map here of how we can go to LastName.
05:27In this case say Smith, find that particular name.
05:31It's a much easier to get to in the index, because the last name is ordered alphabetically.
05:37Then we find the location of that in the regular table, and we can jump directly to that place.
05:43Now it's not quite as quick as using the clustered index as we still do have to read from one
05:49place then jump to another, but it's much quicker than a full table scan.
05:54Very useful if you know there's a column you're going to use a lot in your queries.
05:59A common thought from people new to database design at this time is usually, well,
06:04why don't we just index everything?
06:06Why don't we add a non-clustered index to every column so whatever query we write is nice and fast?
06:13That is a really bad idea. Here's why. Every index has a cost.
06:20Indexes are a benefit when reading data, but they're a detriment when writing or changing it,
06:26because they must be maintained.
06:28So let's say, I've got a table here, pretty typical employee table, and I've got two non-clustered indexes on this table.
06:37So the clustered index, the way the data is actually sorted is based on EmployeeID.
06:42And I have a non-clustered index on LastName, and a non-clustered index on the FirstName.
06:48Well, this is pretty good.
06:49It means when we do a select, I can go by EmployeeID, I can go by FirstName, I can go
06:53by last name, and that will be quite fast.
06:56But when we write to this table, when we insert a row, we're not only writing data into the table itself.
07:03And if your clustered index is on an automatically incrementing number we just write to the end
07:08which is easy, but every insert we do is going to require another change, a write into every non-clustered index.
07:16If I have added this new row of Brenda Daniels, I'll add it to the table,
07:20I now need to go ahead and make a change to the non-clustered index for LastName to insert
07:25Daniels in there and point to the EmployeeID or the location that she's at.
07:30Then do another insert to make sure that Brenda is inserted on the non-clustered index for FirstName.
07:35So if you started adding even more of this.
07:38If you have say five non-clustered indexes as well as your clustered one, that means
07:43that one insert which started off as one physical write operation to the database disk is now
07:49six write operations, and you have made things significantly more inefficient.
07:54That's why you don't put indexes on everything.
07:57You only put them on the columns that you really need and use all the time.
08:02It's far better to do an occasional table scan than have multiple indexes that you don't really use.
08:09Now one of the reasons I don't talk much about indexing up front in the early modeling section
08:14of the course is that it's an easy mistake to think that you decide your indexes up front and then you're done.
08:20Well, it's true you often do select your primary clustered index then with your primary key,
08:26but working with indexes as a much more ongoing thing.
08:30Because you often can't tell what the best non-clustered secondary indexes will be until
08:36your database has been in use for a while, and you figure it out how people are really
08:40using it and what kind of queries they're really running rather than how you hoped or
08:44thought that they'd use it. So indexing is a trade-off.
08:48It improves read performance, but at the expense of writing to the database and also size as
08:54every index will require its own space on the drive.
08:58In a large database, indexing is something the database administrator may spend a lot
09:03of time working on as the system grows.
09:06But there's one good thing, changing and tweaking database indexes is something that can be
09:12done without breaking any applications that use that database.
09:16Because as long as you're not changing column names, all your existing queries should work exactly the same.
09:21They'll just get faster. Hopefully.
09:26
Collapse this transcript
Understanding write conflicts
00:00Earlier in the course I talked about transactions and the ACID test.
00:03Now it's time to cover a little bit more about them.
00:06If you need to refresh your memory, take a look at that earlier section where I talk
00:10about the characteristics of a transaction, because here I'm going to focus just on one part, the idea of isolation.
00:17The idea that when simultaneous users are interacting with the database at the same
00:22time, how do we stop them from treading on each other's toes?
00:25This is the inherent problem of concurrency, or concurrent users.
00:30Well, first, let's understand how they might conflict with each other. Here's an example.
00:36We own a small bank, and we've got Alice and Bob who are two customers in our system and
00:40they have three accounts, a joint account, and a separate account each for incidentals.
00:46These are each a different row in this account table, about as simple as I could possibly make it here.
00:53Every so often, Alice and Bob want to transfer money from their joint account into the individual one.
00:59Every time they do, the process goes something like this.
01:02I'm going to assume that Alice is doing this transaction.
01:05I'm just going write pseudo code here, because it's not the SQL I want to talk about, it's the steps.
01:10We get the balance of the joint account. We reach into the table.
01:14Okay, joint is $10,000.
01:15We then get balance of Alice's account which is $50.
01:19Then we'll perform an operation to update the balance of the joint account to what I
01:23just retrieved which was $10,000 minus $1,000.
01:26So we're reaching into the database, setting it to $9,000, and then we'll reach back in
01:30and update the balance of Alice's account to $1,050. All well and good.
01:36We've subtracted $1,000 from joint. We've added $1,000 to Alice.
01:40Then one day, they both log on to the website at the same time and both attempt to do this move.
01:48Alice starts her operation.
01:50She reads the balance of the joint account. It's $10,000.
01:53Bob starts his. He reads the balance of the joint account. It's also $10,000.
01:57No changes have happened yet. Alice gets her balance, which is $50.
02:01Bob gets his which, is $45. Then Alice issues the update to joint.
02:07She'd retrieved it as $10,000.
02:08She will subtract $1,000 from that, which is $9,000, she changes the database.
02:13Bob also tries to perform that operation. He believes that joint account is $10,000.
02:18So he will also set that to $9,000.
02:21Then Alice updates her account adding $1,000 to it.
02:24Bob updates his adding $1,000 to that.
02:27Now the end result is that the joint account is down $1,000, but both Alice and Bob's account
02:32have been credited $1,000 each. The balance should be $8,000, not $9,000.
02:38Now Alice and Bob might be perfectly happy with this, but our little bank will stay little
02:42if we don't take care of the situation. This is what often called a race condition.
02:46It's a conflict where people two people or two programming threads are doing very similar
02:51steps and just getting a little bit ahead of each other.
02:54They're trying to affect the same data.
02:56When they're doing it together, you'll end up with a very different outcome to what would
03:00have happened if they done exactly the same steps, but Alice did all hers first and then Bob did his.
03:07So the first step to fixing this situation as we make these atomic.
03:12We're making these several actions grouped into one indivisible unit by making them transactions.
03:19We do this by adding SQL keywords at the start and end.
03:23In this case, we use the words Begin Transaction.
03:26We'll do all the operations we want to do and then we write the SQL keyword commit to say Our transaction is done.
03:33Now, these are SQL keywords that deal with transactions, although you will find that
03:37you write slightly different code to create a transaction in the different database management systems.
03:44Sometimes it's just Begin, sometimes it's just Begin Work, sometimes it's Begin Transaction
03:48or Start the Transaction, but again it's the core concept we're going for here.
03:54We're starting a unit of work.
03:56We want to go all the way through and commit if everything is successful.
04:00If there's a problem anywhere in the transaction, we want the database to roll back to the start
04:05of the transaction as if no part of it had ever happened.
04:08I said this is step one, because this by itself may not help our race condition problem.
04:14If both of these transactions are still executing at exactly the same time and they're allowed
04:19to do that, because we didn't actually have any errors occur.
04:24The issue is that Bob is being allowed to read from the same table that Alice is in
04:30the process of changing, but she hasn't finished changing it yet.
04:34This is often what's referred to as a dirty read.
04:37There's a transaction going on that's partway through it.
04:40It's going to be changing this data, but we're allowed to read from somewhere in the middle of that transaction.
04:46Often what happens is we don't want to just treat these all as one unit.
04:50Just making the transaction is not good enough.
04:52We also want some kind of locking to occur so that we are prevented from simultaneously
04:57changing the same data at least until the transaction is over.
05:02Now one way of doing this is what's referred to as pessimistic locking where we are pessimistic
05:08that transactions will conflict with each other.
05:11So as soon as a transaction starts, we're going to lock the data that it's referring
05:15to, and only unlock it once the transaction commits.
05:19Here's how this would work.
05:21Going back to the beginning situation we are creating transactions for both Alice and Bob.
05:26So Alice's transaction starts.
05:28She reads the balance of the joint account and she will lock that information.
05:32Now Bob would then attempt to read in his transaction, but he won't be allowed to. He'll be refused.
05:38The database might return an error, but more typically we just use blocking.
05:43We're not returning anything.
05:45We're just making Bob wait, because hopefully the entire transaction on the left is going
05:49to take a tenth of a second. We'll commit and everything unlocks.
05:53If it's more complex, if we've got multiple updates, full table scans, the database is
05:57slow and not indexed well, well, Bob may end up waiting for a lot longer or the database
06:03might return an error or the application that Bob is using to access the database might time out.
06:08Anyway, let's move on because we should hopefully only have milliseconds going on here.
06:14Bob is waiting, Alice goes ahead. She gets the balance of her account.
06:19She reaches in and changes the joint account to $9,000.
06:23She then updates her account and then hits commit.
06:26Her transaction is over, and that means any lock that she was applying is now unlocked.
06:32Bob will now get that message back. The balance is $9,000.
06:36He is allowed to continue to get his balance. He can subtract 1,000 from that.
06:40We're down to $8,000 as we should be. We update, and we commit his changes.
06:46The thing is pessimistic locking is great for Alice, but not necessarily so great for Bob.
06:52If you always do pessimistic locking, you may end up with a lot of read errors or a lot
06:57of people having to wait.
06:59Although with our situation we know there's a conflict, there is lot of cases where transactions
07:03aren't really conflicting, because we're reading multiple pieces of data.
07:08We're not always changing everything.
07:10One option with some database systems is to allow what's called optimistic locking, which
07:15means Bob would be allowed to read from the table while the transaction is going on, because
07:20we are optimistic that there won't be a conflict.
07:23We begin with the same situation. Alice and Bob both start.
07:27Alice starts her transaction and reads the joint account balance.
07:30Bob tries, but where he was refused a moment ago with pessimistic locking, he's going to
07:34be allowed to right now. Both continue. Alice will read hers, Bob will read his.
07:41Alice will change the joint account down by $9,000.
07:44Now here's the difference. Bob will try and update that too.
07:48He thinks he is going to update it from $10,000 to $9,000.
07:52What's going to happen with optimistic locking is the database will immediately realize that
07:57the contents of this row has changed since he read it, and it's going to actually issue an error.
08:02It will detect that there was a dirty read, and this transaction can't go any further,
08:06because it doesn't know how to handle that conflict.
08:08In most cases, it will just roll back to the beginning of the transaction.
08:12Bob would get some kind of error that the application would need to deal with.
08:16Alice can continue on, because her transaction hasn't thrown any kind of issue.
08:21She can commit with her changes in place.
08:25When you're working with locking, both pessimistic and optimistic, this is one of the areas where
08:29it's implemented a little differently across Database Management Systems, although the principles are the same.
08:35Now with some DBMS, you have specific keywords to explicitly lock data that you're accessing.
08:41With others, there's default behavior, what's called the default isolation level of a database,
08:47meaning, do we naturally do pessimistic locking or optimistic locking or perhaps something in-between.
08:53We will lock any inserts and updates in the transaction until they commit, but not the select, just the basic read.
09:00Take a look at your chosen database management system when it comes time to do this.
09:05
Collapse this transcript
Understanding stored procedures and injection attacks
00:00I've shown a lot of examples of detached SQL Queries that you might write casually as needed
00:06in a database application or from a Command Line.
00:09However, as you start to do more complex queries, you want to keep and reuse them.
00:13You can do this by creating a stored procedure.
00:16A stored procedure, sometimes called a sproc, is nothing remarkable.
00:19It's a chunk of SQL that can be written, given a name, and stored directly in the database,
00:25kind of like creating a function or method in other programming languages, and then it
00:30can be executed multiple times. So here's the most basic way.
00:34We have some SQL that we've written, and we want to reuse.
00:39Okay, this one isn't all that complex, but let's imagine it was a bit more involved than this.
00:44All we can then do is take this and surround it with a little bit of code that will create a named stored procedure.
00:51In this case, the procedure is called HighlyPaid.
00:54What we can then do a little later is call it using the keyword CALL and the name of the stored procedure.
01:00That's going to execute it and return the results as if we just typed the SQL out in full.
01:05This is going to be the main way that you reuse SQL.
01:09You could call it form within a Database Administration Application, you could call it from an application
01:15you were writing yourself in some other programming language or from a reporting application.
01:20Now, it is another area where the actual creation of a stored procedure is a little different
01:25across the Database Management Systems, although there are core similarities.
01:29A stored procedure needs a name, but we can take it a little bit further.
01:34Let's say if you're writing slightly more complex where clauses, in this case selecting
01:39all the employees for a particular department and ordering them by last name and first name,
01:44you probably want to parameterize this in some way so you don't have to always return
01:48just the people from the sales department.
01:50We can create this as a stored procedure, I'll call it EmployeesInDept, and then inside
01:56the parenthesis here, I'm going to define a parameter.
02:00I'm using the SQL keyword IN to say that this is a parameter coming into the stored procedure.
02:05We can also have OUT, but I'm not going to cover that here.
02:08I'll call this parameter dept.
02:10I can call it whatever I want and say that it is a VARCHAR with a maximum length of 50.
02:16I can then use that dept and just replace the string sales, and this is our parameterized
02:24stored procedure, very much like creating a function or a method in a lot of other programming languages.
02:30It simply means when I want to call this later on, I call the name the of the stored procedure
02:35passing in whatever parameter I want it to be replaced when the SQL actually runs.
02:41We can take it even further than this.
02:42We can give our SQL parameters variables, even flow and error handling.
02:48It's very much like turning it into a mini- program, but this would be where we diverge into the
02:53specifics of the particular Database Management Systems, which I want to avoid here because
02:58they handle things like parameters and error handling a little differently.
03:03But it is very common to require developers to write their SQL in a stored procedure.
03:08I've worked in several organizations which didn't even allow developers to manually write
03:13SQL inserts, updates, and delete statements directly in their programs.
03:18They had to write them as stored procedures.
03:21One of the benefits there is all the SQL is actually staying in the database and can then
03:26be controlled and hopefully optimized by the database administrator.
03:30And using a stored procedure is also a level of protection against something called an SQL Injection Attack.
03:38These were and still are common vectors of attack against websites.
03:42Now, let's say you've got an order look-up page on your website.
03:46The customer sees something like this, a basic textbox and a button.
03:51This could be a more substantial form, it could be a login form, it could be a sales
03:55form, but let's take something simple like this.
03:58What you're imagining is that they will enter in their customer ID and hit the Submit button.
04:03You're going to take that and in your web application code, whatever that is PHP, C#,
04:09VB.NET, Java, you're going to take it and create a custom SQL string, something like this.
04:16You're actually concatenating a few things together.
04:19So SELECT * FROM Orders WHERE CustomerID =, and then you'll add the contents of this text
04:24box and then add another little bit to finish off the last quote, and you'll then combine
04:30that into a string that will look something like this and execute that SQL, all well and
04:36good, and you'll get results back for that customer.
04:40Here's the problem.
04:42What happens when the user who sees this page is malicious and they enter something like this?
04:49Looks a little strange, so x'; what's going on?
04:54Well, really the important thing is in the middle here, the SELECT * FROM Users.
04:59What someone can do if they're malicious is they can wonder whether your website is
05:03liable or responsive to SQL injection attacks.
05:08If it is, that usually means that this is going to be constructed in some kind of concatenated string.
05:14So I'll put in this X with the quote after it to try and basically finish off that first,
05:21the standard, the built-in SQL command.
05:23So, that comes Select * FROM Orders WHERE CustomerID = 'x' with the quotes surrounding it.
05:30Then there's the semicolon to finish that statement.
05:34And then I'm writing my own SQL statement that I want to do, SELECT * FROM Users.
05:39Finish that with a Semicolon, and then I'll put in these two dashes.
05:43That's actually a SQL comment, and it means to tell this to ignore that final quote, which
05:49the string was trying to concatenate on the end in our code.
05:52So, it's a very slightly deceptive way of dealing with this.
05:56What that means is when we execute the SQL, it will essentially pass through three SQL commands to the database.
06:02The first one which may not return any values, the second one that we've just injected into
06:07the system, and the third one which will count as a comment, so that in itself could be returning
06:12a lot of sensitive data to someone who was trying to do this.
06:16And the worst thing is what if they did something like this, very similar format, except they're
06:22doing something really malicious.
06:24And if your SQL is executed with high permissions, what's basically going to happen is they will
06:29pass through the DDL command to drop the table of orders, and you've just lost an entire part of your database.
06:36Now it's true, you might not have a table called Orders, or in the earlier example,
06:40a table called Users, but if someone's first attempt generates an error message which is
06:46passed back to the user, they can often figure out that they can at least get SQL injection
06:51to work and they can just keep trying statements until they get something they find interesting.
06:56Allowing SQL injection can allow someone to log into your system as someone else, even
07:01administrator, without knowing a username or password.
07:04Now, it is true that these days, most web application software platforms have precautions
07:09against this, they guard against injection attacks, but they are still a potential issue.
07:14And one way around them is to use stored procedures with parameters rather than constructing their own strings.
07:21As stored procedures inherently don't have this level of flexibility in what's passed into them.
07:27You can't dynamically break a stored procedure apart into several statements.
07:32
Collapse this transcript
8. Database Options
Desktop databases
00:00Now let's move from discussing ideas and concepts into the ways you can actually sit down
00:05at a computer and start using them.
00:07In the next few movies, I'm going to be going over database management system software,
00:11what are the main options, and why might we choose or have to use one over another.
00:17We're going to start with the simplest kind,
00:19Desktop Database Applications, meaning an application you would install yourself on
00:23your own desktop or laptop, and you'd use the same application to both design the database
00:28and also to use it and enter data into it.
00:31There have been many products in this category over the years.
00:35The two most relevant now are Microsoft Access and FileMaker.
00:40And Access in particular is everywhere because it's part of the Microsoft Office Suite.
00:44There are other options, there are things like the open source alternatives like Apache Base
00:49and smaller and more purely personal database software like Bento, but really FileMaker
00:55and Access are the ones to talk about here.
00:57Now these kinds of applications are not targeted primarily at professional developers, although
01:03development skills are often very useful.
01:05They are targeted at power users and business users.
01:08They are a step up from just using a collection of spreadsheets to manage your data, but they're
01:13aiming at the same kind of person who might have done that.
01:16Now, reasons why you might use these, they are simple to install.
01:21You don't need to get the guys from IT involved, doesn't need a complex back-end server.
01:26Just install and go. They are pretty easy to use.
01:29Again, targeted at business users, not solely developers, so you can get quickly up and
01:33running without knowing much about databases or how they should be built, and whether this
01:38is a good idea is a topic of some discussion.
01:40Now of some help with this is you'll often find templates or starter solutions to get you moving.
01:46Microsoft and FileMaker know that many people use these for managing pretty typical similar
01:51business scenarios, like keeping track of costumer contacts or asset management.
01:56So they provide pre-built starting points for those situations that a user can create and then customize.
02:02They also provide database and user interface design tools.
02:06So you're not just designing your tables, your columns, and your relationships, but you're
02:10also creating the user interface to enter data into those tables.
02:15Now much as that may sound obvious, most enterprise-level database systems do not provide any
02:20kind of application targeted at the end user.
02:24Whether you want someone to interact with your database using a website or a desktop
02:28app or a mobile app, and you're using something like DB2 or SQL server, well, you're going to have
02:33to build that application. They don't provide it for you.
02:37They also provide reporting tools, and things like Access and FileMaker include not just
02:41the user interface for viewing and adding data, but options for generating reports based on that data.
02:47So, several legitimate reasons for using them, but also several reasons that you wouldn't,
02:53and the primary ones being that you have a lot of people or a lot of data.
02:58And yes, okay, a lot is an unfortunate hand-wavy term here.
03:03What does a lot of data mean? What is a lot of people?
03:07Well, for desktop databases, the perfect number of users is one.
03:13If you create a new database in Access or FileMaker, by default it's a single file that's
03:17saved on your machine, like having a Word document or spreadsheet.
03:21It is one file that contains everything about the database, and that's where the data will
03:25remain unless you say otherwise.
03:27Now you can put that file on a server and share it across a network, and there are various
03:32techniques and workarounds for sharing these kinds of databases.
03:35But the more users you add, the less reliable it becomes.
03:39Microsoft Access 2010 has an official limit of 255 simultaneous or concurrent users, but
03:46personally, I would never get anywhere near that number.
03:50If I know I even need 10 simultaneous users, I'm not going to use a desktop database,
03:55though it does depend on the kind of activity, because it's much easier to support 20 people
04:01at the same time just reading data out of a database than to support even half that
04:05number editing data at the same time.
04:08Now FileMaker has an official maximum limit of nine simultaneous users, but you can use
04:13FileMaker Server to increase that to 100 or 200 people.
04:16With Access, you can also upload to SharePoint server to make it easier to allow multiple users.
04:22But if you know from the get-go that your database is intended for hundreds or thousands
04:26of simultaneous users, these are not what you should be looking at.
04:29Likewise, if you're building a public website and looking for a back-end database, again,
04:34FileMaker and Access, not the right call.
04:36These are for internal use with small groups and small to medium size businesses with small
04:41to medium amounts of data.
04:43As a good example, just look at the templates that these applications provide.
04:48There are things like a Contact database, Asset and Inventory Management, and Event Planning
04:54database, and for this kind of level, they're great.
04:57For larger projects, they can be useful for doing things like a proof of concept simply
05:02because they're very easy to get up and running.
05:05But as you can tell, we haven't spent really any time on desktop databases in this course,
05:09our focus has been on more substantial and more scalable solutions.
05:13But everything we've covered in this course could be used for, say, an Access Desktop Database,
05:18and it would allow you to built a better and a more efficient one.
05:23
Collapse this transcript
Relational database management systems
00:00If any database systems can be described as classic, it's these ones, the Relational Database Management Systems,
00:06or RDBMS, that we've been focusing on in this course.
00:10They all share the key features discussed.
00:13They're made of tables with defined columns, multiple rows of data,
00:16Primary keys and Foreign Keys, and they support Relationships, Transactions, and the ACID test
00:21and using SQL to talk to them.
00:25The most obvious in this category are Oracle, IBM's DB2, Microsoft's SQL Server, MySQL,
00:32but there are several others that can be mentioned in the same breath.
00:35SQLite, Sybase, Teradata and Microsoft Access uses these ideas, as does PostgreSQL.
00:42Although technically speaking, PostgreSQL adds a few extra features, so it's formally
00:46known as an ORDBMS, an Object-Relational Database Management System.
00:51Now, of course these systems aren't identical. There are differences between them.
00:56There are differences in features, differences in implementation and hardware requirements
01:01and expense, but it is the similarities we're interested in here.
01:06These are all based on ideas developed by a gentleman by the name of Edgar Codd back
01:10in the 1970s, and the products themselves have been around nearly as long.
01:15Oracle was first released in 1979, and the first relational database that I used was DB2,
01:20and that was in 1983.
01:22Now, you'll sometimes hear the phrase enterprise- level used to refer to these database systems.
01:28Now, enterprise-level has no strict definitions.
01:30It's really more of a buzz word to refer to large organizations or government departments
01:35with at least hundreds but often thousands or tens of thousands of users.
01:40It's not so much that strict number of users is important, just that by the time you get
01:44to that level, your demands are more stringent.
01:47You need constant uptime, disaster recovery, continual back-up, and these systems will provide that and more.
01:53Now, they might sound like one app, one product, but you don't just casually install Oracle or install SQL Server.
02:02The Relational Database Management Systems are often split up into a family of different applications.
02:08You'd first install the database engine, typically on a shared server or using several servers.
02:14It just runs continually in the background but might have no graphical user interface at all.
02:19You'd either use a command line to interact with it or install a separate application
02:24just for database administration, things like SQL Server Management Studio for Microsoft SQL Server,
02:30or Oracle SQL Developer, MySQL Workbench. There are many others.
02:36Sometimes you'll need a separate application to work with back-up, another application
02:41to monitor and report on the database, and so on.
02:45Now beyond that, they have multiple versions or additions available of these.
02:49Often there is at least a standard addition and also an enterprise edition with extra
02:53features for very large-scale deployments, but typically, none of them provide any application
02:59targeted at the end user, and you need to build that yourself.
03:02Now, the larger and more complex your databases get, the more time they take to maintain and
03:08the more likely that you'll need a formal database administrator, or DBA.
03:11In large organizations, you may have multiple employees whose full-time job is simply to
03:16maintain the Database Management System and the databases inside it. So, is it backed up?
03:22Do we have enough storage for the next week month or year?
03:24Is it responding quickly enough?
03:26Is the design of the database as efficient as it could be?
03:29Does anything need to be purged from it?
03:31Do we have the right tables, columns, keys and relationships, and very importantly,
03:36do we have the right indexes?
03:39As you might imagine, being a good database administrator requires not just a good general
03:43knowledge of database design, but the specifics of the particular Database Management System you're using.
03:49Now, one obvious difference also is that some of these are purely commercial products, while others are open source.
03:55Now, I'm sometimes asked, okay, if I can buy Microsoft Access for a couple of hundred dollars,
04:01how much is Oracle? How much is DB2? That's not a simple answer.
04:05Most of these database systems can be licensed multiple ways on a per user basis or by how
04:11many machines--or rather CPUs--they're running on, or by which addition you buy, or by what
04:16extra features you need or don't need, and we're not covering that at all.
04:20However, if we're talking about licensing, one thing that is worth mentioning is the
04:24growth in recent years of hosted cloud- based database system providers, like
04:30Microsoft Windows' Azure, which does provide an option for creating hosted SQL Server database servers.
04:36You simply choose the size of the database you want and the number of databases you're
04:40having, and it'll give you a price for this.
04:42In a similar fashion, there's also the Amazon Relational Database Service, Amazon RDS.
04:48This one gives you an option of using MySQL, Oracle, or SQL Server with different prices and options for each.
04:55Now, you still have to design the database and build the applications that use it, but
05:00you can pay them to take care of the technical infrastructure.
05:03Now, note that even the open source database systems like MySQL are not always free.
05:09MySQL is owned by Oracle now, and while the database engine is free to download and use,
05:14there is an enterprise edition that includes separate applications for back-up and monitoring,
05:19and that is a commercial product.
05:21However, on the flip side of that, even the commercial database vendors also provide a free edition.
05:27This is typically restricted, although what you'd called restricted is still usually much
05:32more powerful than a desktop database.
05:34The constraints are often a reduction in size, perhaps the database can't be larger than
05:394 gigabytes, or the database engine might only use one CPU even when installed in a multi-CPU machine.
05:46When you're looking for the free edition of any Commercial Database Management System,
05:51look for the word Express.
05:53There's a SQL Server Express Edition from Microsoft.
05:57Oracle also has an express edition which is free.
06:00DB2 has an Express edition, although strangely that's not free, but there is an Express-C
06:06or Express-C version that is free.
06:09These are often very useful in the learning processes. They are convenient enough to just
06:14install on your own machine for experimentation.
06:18While Relational Database Management Systems are the most widely used with the longest
06:22history, they aren't necessarily always the best option.
06:25Let's take a look at a couple of other database types.
06:30
Collapse this transcript
Object-based and XML-based databases
00:01With the rise of XML as a format for exchanging data, there are also a handful of
00:05Database Management Systems oriented specifically around using the XML as their internal structure,
00:11rather than the tables of columns and rows.
00:13If you haven't delved into the world of XML much, the next two minutes might be a world of jargon.
00:19That's okay. This is really targeted to those who know they have to deal with a lot of XML or are worried they might.
00:27XML specific Database Systems include BaseX, Sedna, and eXist.
00:32These are all open source. They store XML data directly.
00:37Large amounts of XML documents using the nested tree structure of XML.
00:41So, you not only store XML directly, you use XML languages, typically XQuery, to query
00:48that data rather than using SQL.
00:50Now, these are for quite specific users, rather than the general use of Relational Database Systems,
00:56so we're not going to be spending much time on these, other than just making
01:00sure you're aware of their existence.
01:03If XML is naturally your only kind of data, they are worth looking at.
01:08If, however, what you have is some XML, there's nothing that stops you from just storing XML
01:13in a regular relational database table. It is after all just text.
01:19And these days, it's very common to see XML as a specific column data type for a regular relational database.
01:27SQL Server, Oracle, and DB2 all support XML as a Column Type.
01:32With other database systems, you can just store it as text.
01:35The benefit of having a specific XML type is usually that the database will support
01:40using XQuery to manipulate that particular column, instead of just SQL.
01:47We can store XML data directly as a column value and either use SQL to work with it
01:52or XQuery, but enough, moving on.
01:56Object-Oriented Databases, unlike the previous section, the next two minutes will mainly
02:01benefit those who know a little about object-orientation.
02:05One of the down sides with the Relational Database Management Systems came with the rise of object-oriented
02:10languages in the '80s and '90s like C++, Java, C#, and so on, because the objects we would
02:16generate in software in those languages don't necessarily map exactly to rows in the
02:21database table, they are more complex than that.
02:24There has been an Object-Oriented Database Management System over the years.
02:29Object databases, or OODBMS, they're trying to fix this.
02:32Things like Objectivity/DB, VelocityDB, and Versant.
02:37Like XML-based databases, they never reached the popularity level of conventional relational database systems.
02:43They did become popular, but mainly in niche areas like engineering or physics, not in
02:48general use, and like XML database systems, we're not working with them here, just making
02:54sure you're aware of their existence.
02:57Here's the thing, relational databases are still enormously popular with object-oriented languages.
03:03Yes, the problem does exist that software objects don't always match one on one with rows in a table.
03:11What's very popular is to use an ORM Framework, Object-Relational Mapping Software.
03:16This is a software framework that helps us map from objects in an object-oriented language
03:22to regular relational database tables and back again without really thinking very much about that mapping.
03:29These frameworks include NHibernate in Java, the Core Data Framework for Objective-C,
03:34the ActiveRecord Library in Ruby, and so on.
03:38These are always very specific to the language that you're working in, so they're out of
03:42the scope of a course like this one.
03:44In fact, with this, this is all I'm saying on these topics.
03:47Neither XML nor object databases became anywhere near as popular as some people expected.
03:53In fact, the most impactful developments in database technology came with the rise of
03:58what are called the NoSQL databases, and we'll talk about those next.
04:03
Collapse this transcript
NoSQL databases
00:00Over the last few years, a number of database systems have grown in popularity, and these
00:04are grouped together under the term NoSQL databases.
00:08NoSQL as a category can be a little confusing.
00:12These days that term is taken to mean not only SQL rather than NoSQL at all.
00:18But it is genuinely a wide category with a lot of options in it.
00:23And first, you can't assume that they're all similar to each other.
00:27Unlike relational databases as a category, where the skills are broadly transferable
00:31between, say Oracle and SQL Server, or DB2 and MySQL,
00:36in the NoSQL category, just because you know MongoDB, doesn't mean you have any head start
00:41on learning Neo4J. They're very different from each there.
00:45There are no official rules as to what makes a NoSQL database system.
00:51That's kind of the point, because it's more about what they aren't.
00:54They aren't traditional relational databases, and they're getting away from those rules.
00:59So there are few things we can say that these usually aren't.
01:02This isn't always the case, but most typically they don't use SQL as a query language.
01:09Usually they don't store data in tables--a few of them do.
01:13Usually they don't really care about relationships.
01:17Usually they don't provide asset transactions, though a few can.
01:21Usually they don't require a formal data schema, and that's quite a big deal with the NoSQL
01:27databases, there's flexibility there.
01:30Several have been developed and oriented around web deployment or web development in a way
01:34that's not as natural for relational databases to do, simply because relational databases
01:39predate the web by about 25 years.
01:41And most of them are Open Source projects, but again, not all.
01:46But beyond that there's a lot of variety.
01:49There are a few worthwhile groupings within the category of NoSQL.
01:53First, we have several document databases what are more typically called Document Stores.
01:59These are organized around the idea where the fundamental thing to be stored is a self-contained
02:03document, rather than individual rows of data consisting of well-defined columns that you decided beforehand.
02:12Now it doesn't necessarily mean a document in the sense of a Word document or PDF.
02:17Just know that everything you store is a self- contained piece of data that describes its own schema.
02:23The format you're actually storing might be XML, or it might be JSON, which is just a loose
02:28structure where every new piece of data that you store can describe its own schema.
02:33In this case, I'm saying store a simple piece of data consisting of two pieces of information: a title and a rating.
02:41Title is a string, and the rating is an integer, it's 10.
02:43I can turn around and the next piece of data I store has a completely different description,
02:49including nested levels of information.
02:52And I don't know what this is ahead of time, or certainly I don't need to know what this is ahead of time.
02:58I can just store anything I want in this document store, just keep adding new documents into the database.
03:05Each one is given a unique ID, but beyond that, you have flexibility, often total flexibility.
03:12You don't need to provide a formal schema. You don't need to define relationships.
03:16But because you don't necessarily know what's inside each, then the database needs to provide
03:21a more flexible way of querying that data than SQL would provide.
03:26Examples in this category would be CouchDB and MongoDB, both very popular NoSQL databases.
03:33In fact, technically the XML databases I talked about earlier could also arguably fall into
03:38this category, too, even though they predate the popularity of the term NoSQL.
03:43Now there's another common category which is Key-value databases, or Key-value stores.
03:50This is another category which is all about having no predefined schema for your data,
03:55simply storing and retrieving everything based on a key-value pair.
04:00You can kind of make an analogy, this is a bit like having a two-column table with whatever
04:06you want in the key column and whatever you want in the value column, nothing more enforcing
04:12that, could be any data type, integer, string, large, small, binary data, whatever.
04:17You could fit in bits of XML or bits of JSON, so in some sense, there's a lot of crossover
04:22here with document stores like MongoDB and CouchDB, they could fall both under the document
04:28store and key-value store categories, but other well-known NoSQL databases would be
04:35MemcachedDB, Riak, and Project Voldemort.
04:37Now worth mentioning is that many database systems in this key/value category are specifically
04:42designed to be very fault-tolerant distributed architecture, meaning you easily install them
04:48across multiple machines, and no one machine is a point of failure.
04:52They'll survive machine failures and keep on ticking.
04:56And the last grouping I'll mention are graph databases, or graph storage.
05:00This is where your data is in a graph structure, small connected nodes.
05:05There's no one master point. Everything is connected to everything else.
05:10Whereas many of the other in the NoSQL databases tend to de-emphasize relationships, this goes the other way.
05:16It's all about having small nodes of data that have connections to other small nodes of data.
05:21This could be a handful--the way that I'm showing them here--where certain nodes manage
05:25other nodes that work in other nodes that are located in other nodes, and you can just
05:30follow the relationships around.
05:33Or in the large scale, there could be millions or tens of millions of these nodes and connections.
05:37Neo4J is probably the best example here, though there are others like AllegroGraph, and worth
05:43mentioning in some classic relational databases are starting to provide extra features in these spaces.
05:50DB2, for example, has the NoSQL Graph Store now.
05:55The thing is there are great reasons for choosing relational databases and great reasons for
06:00choosing NoSQL databases, but it's not flip a coin.
06:04These things are very different.
06:06The most important question that might steer you away from relational databases, first,
06:11do you need a flexible schema?
06:13If you can't define a relational database, because when you start to draw out your tables
06:18and columns, you just have no idea what they'll be, because the data you need to store will
06:23change from moment to moment.
06:24That might be one pointer for looking at a NoSQL database where you have freedom to store
06:29more flexible data with a schema that might change from moment to moment.
06:34The second question, do you have an enormous amount of data?
06:37And by enormous, I mean enormous.
06:40A good definition of this is described in the documentation for the popular HBase NoSQL database.
06:48It's asking when should I use HBase? First, make sure you have enough data.
06:53If you have hundreds of millions or billions of rows, then HBase is a good candidate.
06:59If you only have a few thousand or a few million rows, a traditional relational Database Management System
07:04might be a better choice.
07:07So if that fits your needs, take a look at some of the projects that were specifically
07:11designed for that kind of data, what's being generically referred to as big data situations
07:17which have a different kind of requirements from what people envisioned for classic relational databases 40 years ago.
07:24And the third option, valuing, scaling over consistency.
07:27What do I mean by this?
07:30You see with most enterprise-level relational databases, traditional databases, they are
07:35oriented towards never losing a single thing, that's why it's set to focus on transaction,
07:40because for things like banking systems or airline bookings or patient information, you
07:45just can't lose a thing.
07:47If, however, you have a different kind of setup, your database needs to crunch vast amounts
07:52of statistics on website hits but no particular one entry is all that important, then you're
07:59more concerned with being able to scale really fast, to add machines, lose machines, add
08:04and lose disk drives and still keep up and running, even if you lost an entry here and
08:09there, and some of the NoSQL databases are really more oriented to that kind of approach
08:14than the never lose a single row.
08:17So these databases are great, but make sure you're using them for the right reasons.
08:22Some people think, well, they're newer.
08:23There must be a more up-to-date way to approach relational database issues.
08:27But that isn't the case at all.
08:29These systems are trying to fix different kinds of problems, and if you have a problem
08:33that a relational database can fix, then go ahead.
08:36If you need flexible schemas, vast amounts of data, just go for it.
08:40Or if you have well-defined data, don't underestimate the ability of traditional relational databases,
08:46even if it might not be the newest technology around.
08:49But for more on NoSQL, take a look at our Up and Running with NoSQL Databases course
08:54that gets deeper into some of these, particularly CouchDB.
08:59
Collapse this transcript
Conclusion
Final thoughts
00:01We've reached the end of this Foundations of Programming: Databases course.
00:05As with everything else in programming, once you've got a decent grasp of the theoretical
00:09content, and you're not looking for perfection, just that most of it makes sense most of the
00:14time, it's time to get hands-on.
00:17And there are several more courses here at lynda.com you might be interested in.
00:20So, take a look at our courses on the SQL language and on the hands-on use of Database
00:25Management Systems like SQL Server, MySQL, and PostgreSQL.
00:31Everything we've done in this course, you will find relevant in those and many others.
00:35I'd also suggest that once you've started designing your databases, it's a good idea
00:39to revisit this content now and again.
00:41As many of these ideas and concepts reward repeated attention over months and even years.
00:48And thanks for watching. See you next time.
00:53
Collapse this transcript


Suggested courses to watch next:

SQL Essential Training (2h 22m)
Bill Weinman

MySQL Essential Training (2h 46m)
Bill Weinman


FileMaker Pro 12 Essential Training (10h 28m)
Cris Ippolite


Are you sure you want to delete this bookmark?

cancel

Bookmark this Tutorial

Name

Description

{0} characters left

Tags

Separate tags with a space. Use quotes around multi-word tags. Suggested Tags:
loading
cancel

bookmark this course

{0} characters left Separate tags with a space. Use quotes around multi-word tags. Suggested Tags:
loading

Error:

go to playlists »

Create new playlist

name:
description:
save cancel

You must be a lynda.com member to watch this video.

Every course in the lynda.com library contains free videos that let you assess the quality of our tutorials before you subscribe—just click on the blue links to watch them. Become a member to access all 104,069 instructional videos.

get started learn more

If you are already an active lynda.com member, please log in to access the lynda.com library.

Get access to all lynda.com videos

You are currently signed into your admin account, which doesn't let you view lynda.com videos. For full access to the lynda.com library, log in through iplogin.lynda.com, or sign in through your organization's portal. You may also request a user account by calling 1 1 (888) 335-9632 or emailing us at cs@lynda.com.

Get access to all lynda.com videos

You are currently signed into your admin account, which doesn't let you view lynda.com videos. For full access to the lynda.com library, log in through iplogin.lynda.com, or sign in through your organization's portal. You may also request a user account by calling 1 1 (888) 335-9632 or emailing us at cs@lynda.com.

Access to lynda.com videos

Your organization has a limited access membership to the lynda.com library that allows access to only a specific, limited selection of courses.

You don't have access to this video.

You're logged in as an account administrator, but your membership is not active.

Contact a Training Solutions Advisor at 1 (888) 335-9632.

How to access this video.

If this course is one of your five classes, then your class currently isn't in session.

If you want to watch this video and it is not part of your class, upgrade your membership for unlimited access to the full library of 2,024 courses anytime, anywhere.

learn more upgrade

You can always watch the free content included in every course.

Questions? Call Customer Service at 1 1 (888) 335-9632 or email cs@lynda.com.

You don't have access to this video.

You're logged in as an account administrator, but your membership is no longer active. You can still access reports and account information.

To reactivate your account, contact a Training Solutions Advisor at 1 1 (888) 335-9632.

Need help accessing this video?

You can't access this video from your master administrator account.

Call Customer Service at 1 1 (888) 335-9632 or email cs@lynda.com for help accessing this video.

preview image of new course page

Try our new course pages

Explore our redesigned course pages, and tell us about your experience.

If you want to switch back to the old view, change your site preferences from the my account menu.

Try the new pages No, thanks

site feedback

Thanks for signing up.

We’ll send you a confirmation email shortly.


By signing up, you’ll receive about four emails per month, including

We’ll only use your email address to send you these mailings.

Here’s our privacy policy with more details about how we handle your information.

Keep up with news, tips, and latest courses with emails from lynda.com.

By signing up, you’ll receive about four emails per month, including

We’ll only use your email address to send you these mailings.

Here’s our privacy policy with more details about how we handle your information.

   
submit Lightbox submit clicked