IntroductionWelcome| 00:03 | Hi, I'm Simon Allardice, and welcome
to Foundations of Programming: Databases.
| | 00:09 | It really doesn't matter what programming
language you use, what operating system you
| | 00:14 | like, what kind of app you're building,
database skills are vital.
| | 00:18 | So in this course, we're going to start at
the very beginning with: what is a database?
| | 00:23 | Why would you want one?
| | 00:24 | And what problems you are likely
to run into if you don't have one?
| | 00:28 | And we'll talk about the available database
software options from small personal databases
| | 00:34 | installed on a desktop or laptop, all the way to the current
open source big data solutions like MongoDB and Hbase.
| | 00:42 | But we'll focus mainly on
classic relational databases.
| | 00:47 | What are the questions to ask, the steps to
go through, the things you need to think about.
| | 00:51 | And we'll talk about query languages, the actual code you'd
write to get information in and out of the database.
| | 00:59 | What we cover in this course, you could be using not
just years but decades from now, so let's get started.
| | 01:06 |
| | Collapse this transcript |
| What you need to know| 00:01 | This course isn't about using one
particular database in one particular environment.
| | 00:06 | The aim here is for content that you will find
relevant and useful across multiple languages and platforms.
| | 00:12 | But I do expect you know the basics of programming
in at least one language, because I will end
| | 00:16 | up talking about common programming ideas
like Loops and Conditions, Variables, and Arrays.
| | 00:23 | And if those aren't the terms you're comfortable with, take
a look at our Foundations of Programming: Fundamentals course.
| | 00:29 | Now, you may have already worked with databases,
they are kind of hard to avoid if you do anything
| | 00:34 | with computing, and whatever
experience you have with them, great.
| | 00:38 | You may be familiar with some of the concepts and terms like
tables, rows, and columns, primary keys, and foreign keys.
| | 00:45 | And if so, you might be tempted to
jump ahead past the initial content.
| | 00:50 | However, if up to now you've been primarily
using a database that somebody else designed
| | 00:56 | or using desktop database software like
Microsoft Access or FileMaker, know that there's often
| | 01:01 | a lot more going on than is
apparent at the surface level.
| | 01:04 | So I would invite you to sit back and go through
all the content, because even in the early
| | 01:09 | stages you may hear something new or a
concept will have a new meaning for you.
| | 01:14 |
| | Collapse this transcript |
|
|
1. Understanding DatabasesWhat are databases?| 00:01 | As with most things in programming, if you
want to understand databases, then don't jump
| | 00:06 | directly into the features of database software.
| | 00:09 | First, go the other way, realize these
things were designed to solve a problem.
| | 00:15 | So what's the problem?
Well, that might seem like an easy one.
| | 00:20 | You have some data, some information, you,
your company, your clients, or even the business
| | 00:25 | that you want to create, there
is some data you need to store.
| | 00:30 | And this data could be anything.
| | 00:31 | It could be information about customers, products,
employees, orders, details about the visits to your website.
| | 00:39 | This data could be in text format, it could
be names and descriptions, or numeric, amounts
| | 00:44 | or dates, or this could be
document files or images, audio or video.
| | 00:50 | But here's the thing,
you can already store data.
| | 00:54 | You could just open up a text file and type your information
into that or open up a spreadsheet and do that too.
| | 01:00 | If you've got documents,
just organize them in folders.
| | 01:03 | Now, many small businesses would start with
something like this, with a spreadsheet and
| | 01:08 | a lot will continue like this for years, and
that might be okay, because just having data
| | 01:15 | is not a good enough reason to need a database.
Having data is not the problem.
| | 01:20 | The problem is what comes next, and there's a
lot of potential problems, but I'll describe
| | 01:26 | six of them without even getting technical.
| | 01:29 | Size of your data is a potential problem,
ease of updating your data, accuracy of it,
| | 01:34 | security of it, redundancy
in it, importance of the data.
| | 01:38 | Let's take these one by one.
| | 01:40 | First, size, because what starts off as a
small amount of data it has a tendency to
| | 01:46 | turn into a large amount of data, and if your
spreadsheet solution is nice and speedy when
| | 01:52 | there's 100 lines in it, what happens
when there's 2 million lines in it?
| | 01:57 | Maybe you'd start to split that up into a bunch of
different files and then you had a sub-problem, speed.
| | 02:03 | How would you find anything?
| | 02:06 | Next, there's the idea of
ease of updating your data.
| | 02:10 | What happens if two people need to
edit this spreadsheet at the same time?
| | 02:15 | How about 20 at the same time
or 200 people at the same time?
| | 02:18 | If you work with a file-based
system, it's just not going to happen.
| | 02:22 | You'll end up with everybody
overwriting everybody else's changes.
| | 02:27 | Then there's accuracy of the data.
| | 02:29 | Is there anything that would actually prevent
me from typing incorrect data into a spreadsheet
| | 02:35 | like this, to fat finger a date as being in
the past when it should be in the future or
| | 02:39 | just missing some vital piece of information
out with the intention that I'll come back
| | 02:44 | to that and do it later.
| | 02:46 | Well, usually, there's nothing
that would prevent me from doing so.
| | 02:49 | So as these files get older and older, they
start to fill up with data that isn't accurate,
| | 02:54 | it isn't consistent, it
isn't trustworthy anymore.
| | 02:58 | And then there's the idea of security. Most
of the time, you need to show this data, but
| | 03:02 | just because it needs to be shared with
someone, it doesn't mean it needs to be shared with
| | 03:07 | everyone, because some of this could be sensitive, it
could be payroll information, healthcare information.
| | 03:14 | So, who would get to view it,
and who would get to edit it?
| | 03:18 | And if they do get to edit this data, is all
we know just the last person who edited this
| | 03:23 | file or do we know who made every entry
and every change at every point, because with
| | 03:29 | some information, healthcare, financial, there
are laws in place to make sure you're auditing
| | 03:35 | every alteration to sensitive data.
| | 03:37 | And there's the idea of redundancy or duplication
of your data having multiple copies of the same data.
| | 03:44 | Now, redundancy by itself is not a bad thing,
but backups, it's good to have redundancy,
| | 03:51 | but in the data itself, not so good,
because it leads to conflicts.
| | 03:56 | Maybe one file ends up with
two entries for the same product.
| | 04:00 | One says the product is $12 and the other
says the product is $10, which one is right?
| | 04:06 | Or maybe that information is spread over a
couple of different files, one owned by one
| | 04:10 | department, one by another, which one is true?
| | 04:13 | And then there's importance of your data.
| | 04:17 | You've probably felt the pain of working in a
spreadsheet or a word processor where something
| | 04:21 | happens, there's a crash or a disconnect, and you
lose 5 minutes of work, or an hour of work, or a day.
| | 04:30 | It's annoying, it's inconvenient, but if
the information you just lost was yesterday's
| | 04:35 | orders, or the allergies of a patient, or
the details of a stock trade, or the seat
| | 04:40 | allocation of a Transatlantic flight that someone just
paid for, that's not inconvenient, that's unacceptable.
| | 04:48 | The data is critical. With many
companies your data is your entire business.
| | 04:54 | You can't lose any of it, not a single change,
and you might have one of these problems.
| | 05:01 | You might have all of these problems and more
besides, but these are the reasons we need a database.
| | 05:08 | These are the problems.
| | 05:10 | It's not about providing
somewhere to put your data.
| | 05:12 | We could already do that.
| | 05:14 | It's about allowing that data to grow and
still be manageable, still be easy to get
| | 05:18 | to everything, about letting it be easily
updated even by lots of people at the same time.
| | 05:23 | About keeping it accurate and internally consistent,
keeping it secure and controlling access to it.
| | 05:30 | Who gets to read it?
Who gets to edit it?
| | 05:32 | And tracking exactly what
they did, and when they did it.
| | 05:35 | It's about making sure that while your backups
might be redundant because that data is critical,
| | 05:40 | your data itself is not.
| | 05:42 | There is one version of the truth, and once
you make a change to that data, you will never
| | 05:48 | lose that change, you'll never lose that data.
| | 05:52 | This is what a database can give us, structure.
It's not a place to put your data.
| | 05:57 | It's a structured system to put your data
in that imposes rules upon that data, and the
| | 06:04 | rules are yours, because the importance of
these things change is based on your needs.
| | 06:09 | Maybe your main problem is size.
| | 06:12 | You have terabytes of information, then
you'll have different priorities from someone who
| | 06:17 | has much smaller amounts of data where
every single change must be controlled, audited,
| | 06:23 | and logged the moment that it's made.
| | 06:26 | And this is why if your experience is just using a database
someone else has made, you haven't seen what's important.
| | 06:33 | That's totally forgivable, with someone
whose only time working with databases is using
| | 06:38 | an old FileMaker or Microsoft Access
database that they think this is what a database is.
| | 06:43 | It's like a spreadsheet, it's just more
tedious to use, and that's not it at all.
| | 06:50 | This is the true power of
a database, it's invisible.
| | 06:55 | It's the things you can't see that are going
on in the background, the security, the enforced
| | 07:00 | integrity of the data, the ability to get
to it fast and get to it reliably, that it's
| | 07:05 | robust, it can withstand lots of people at
the same time and even correctly survive crashes
| | 07:09 | and hardware issues without corrupting the data.
| | 07:13 | And that's what we need to do here, understand
how to describe our structure and define those
| | 07:20 | rules, so all these
invisible stuff will actually happen.
| | 07:23 | Now up to this point, I haven't gone into
any particular database software or even any
| | 07:28 | particular type of database, because there
are different types, and some of them are better
| | 07:33 | at one or more of these
problems, and we'll get there.
| | 07:37 | But there are a couple more
terms to cover before we do.
| | 07:42 |
| | Collapse this transcript |
| Exploring databases and database management systems| 00:01 | First, we better get clear with the most
important term here, database, because it's one of these
| | 00:07 | words that are often casually misused
even in conversation between programmers.
| | 00:12 | Someone will say their database is Oracle
or it is SQL Server or it is MySQL or DB2
| | 00:18 | or PostgreSQL or MongoDB and so on.
But let's avoid some confusion.
| | 00:25 | These aren't databases, they are Database
Management Systems, often shortened to DBMS.
| | 00:34 | And there's a big difference between your
Database Management System and your database.
| | 00:39 | So, all these products here, they
are Database Management Systems.
| | 00:43 | We would install the DBMS.
We would install the software.
| | 00:48 | That could be a simple desktop Database
Management System like Access or Bento, all the way the
| | 00:54 | things like Oracle or DB2, which you wouldn't
install on a personal machine but on a separate server.
| | 01:00 | And then you'd use that software, that DBMS,
to create and manage one or more databases.
| | 01:07 | The database is your data
and your rules about that data.
| | 01:13 | The Database Management System, the DBMS, is
the program or the set of programs that surround
| | 01:18 | and manage it to make
sure your rules are applied.
| | 01:23 | And one DBMS can manage many different
databases inside it, each one separate from the other
| | 01:30 | with different rules,
different data, different security.
| | 01:33 | Now in a diagram of these, you'll often see
an entire databases represented as cylinders.
| | 01:38 | In a perfect world, you might imagine that
an organization would have just one database
| | 01:45 | that contains every single scrap of information
important to that company, and in practice
| | 01:50 | it's very common to have multiple databases.
| | 01:53 | The database that deals with your order and
customer information might be completely independent
| | 01:58 | from you database that deals
with human resource information.
| | 02:03 | And in many organizations, you don't just
have multiple databases but multiple
| | 02:08 | Database Management Systems, such as SQL Server
and Oracle, DB2 and MySQL, PostgreSQL and Hbase.
| | 02:17 | And sometimes it's because you're using another
software product that requires a certain DBMS,
| | 02:23 | sometimes it's because one Database Management
System is better at something than another
| | 02:28 | is and sometimes because the company might acquire
another company and bring it along with the purchase.
| | 02:35 | But even though there are differences
between the different Database Management Systems,
| | 02:42 | we don't really need to stress about learning
every single one because Database Management
| | 02:48 | Systems fall into broad
categories, different types of DBMS.
| | 02:52 | And the most common by far is what's called a
Relational Database Management System, or RDBMS.
| | 03:01 | Things like Oracle, SQL Server, DB2,
MySQL, they all fall into this category.
| | 03:06 | Other types of Database Management Systems
include Hierarchical Database Systems, Network
| | 03:11 | Database Systems, Object-Oriented Database
Systems and more recently, several that fall
| | 03:17 | in the category of NoSQL Database Systems.
| | 03:21 | Now, I'm not going to get into the specifics
of the differences, not yet anyway, because
| | 03:27 | we don't have enough of a base of
knowledge to make these distinctions worthwhile.
| | 03:32 | We're going to focus first on understanding Relational
Database Management Systems for a couple of different reasons.
| | 03:39 | One, they are by far the most
common, the most widely used.
| | 03:43 | Two, the same principles
are usable across all of them.
| | 03:47 | And three, even if you know right now that you'll
be jumping into the more recent NoSQL databases.
| | 03:55 | You'll find that even the most
introductory discussion of NoSQL databases will assume
| | 04:00 | that you already understand Relational Database
concepts, and it will use those concepts to
| | 04:05 | compare and contrast and
explain the newer systems.
| | 04:10 |
| | Collapse this transcript |
|
|
2. Database FundamentalsThe features of a relational database| 00:01 | We're now going to cover the most important
concepts in a typical relational database.
| | 00:06 | I'm expecting many of you have had some
exposure to these concepts already if you've done even
| | 00:10 | a little work with databases, but
let's make sure we're all on the same page.
| | 00:15 | A database is constructed of one or more--
usually more--tables, and it's a table that is the
| | 00:22 | most basic, most fundamental
building block of a database.
| | 00:27 | A database without table is just an empty
shell devoid of meaning. Not only is there
| | 00:31 | no data in it, there's nowhere for any data
to go, because all your data goes into a table
| | 00:38 | and each table describes a
formalized repeating list of data.
| | 00:43 | Visually, it's often shown like a spreadsheet.
| | 00:47 | These tables consists of columns and rows,
rows going from left to right, columns, like
| | 00:53 | the columns holding up a
building running up and down.
| | 00:57 | And if you have multiple tables in a database,
each different table is containing repeating
| | 01:03 | information, repeating rows, but
always about the same kind of thing.
| | 01:07 | So, you might have one table with all your
customer information in it, another table
| | 01:12 | with all your order information, another table with
all your employees and another with all your locations.
| | 01:18 | And within each table, every repeating row represents one
single customer, or one single order, or one single employee.
| | 01:28 | But each of these rows is not free form.
You must apply structure to this data.
| | 01:35 | You must say what every row is made of, and you
do this by defining the columns in that table.
| | 01:43 | Now, you might have 2 columns
in a table or 5 or 50 columns.
| | 01:48 | Each column describes one piece of data.
| | 01:52 | It gives it a name like FirstName, LastName,
HireDate, and very important, not just a name,
| | 01:58 | but it also says what type of data this must be, perhaps,
a piece of text for a name or a date or a number.
| | 02:09 | Now we're going to go more into the
different types of data they can be a little later.
| | 02:14 | But every row must follow that same
structure, following that same format.
| | 02:20 | It's not allowed to deviate from
the way that the columns are set up.
| | 02:25 | So if this table contains all customers, each
row is one customer and each column describes
| | 02:32 | one very specific piece of
data about that customer.
| | 02:37 | And by deciding what these columns are and
defining them, we're imposing rules on the data.
| | 02:43 | Because once we describe these rules, the
Database Management System will not let us break them.
| | 02:49 | If I say upfront that the column called,
HireDate must contain a date, then I cannot create
| | 02:54 | a new row with just random text in there.
| | 02:58 | If I say that the grade must contain a value, then I
cannot create a row that leaves that piece of data empty.
| | 03:06 | So we have tables, rows, and columns.
It doesn't sound too bad, and it isn't.
| | 03:13 | But realize that you won't think about these
three things in the same way at the same time,
| | 03:20 | because the up-front work is your
tables and columns, later on, comes rows.
| | 03:26 | When you just talk about tables and columns,
you're saying what data should be in this
| | 03:30 | database and what exactly
should each little piece of data be.
| | 03:34 | And it's after you've defined it, your day-to-day
use is working with rows, adding new rows of data,
| | 03:41 | updating rows, deleting rows, reading existing rows.
| | 03:45 | Now one of the first requirements you'll
run into is the ability to specifically get to
| | 03:51 | one particular row in a table, to one
customer or one order or one location row.
| | 03:58 | And to do that, we need the
next idea, the idea of a key.
| | 04:03 |
| | Collapse this transcript |
| Exploring unique values and primary keys| 00:01 | Almost all tables in a
database will require a key.
| | 00:05 | The key is a way to identify just one
particular row in any table, and it's created typically
| | 00:11 | as one column that will contain a
guaranteed unique value for each row.
| | 00:17 | And this is something you can do
in almost all database systems.
| | 00:20 | Say that for a table this particular column
contains unique values, and this one does not.
| | 00:26 | Now when a column is defined as
containing unique values, it means just that.
| | 00:32 | But even if there were a million rows in this table,
the same value cannot occur more than once in that column.
| | 00:40 | If you try, it will disallowed
by the Database Management System.
| | 00:44 | So whether that's a number, a piece of
text, a date, whatever, it cannot repeat.
| | 00:50 | Now if it's not unique, which is the default in
most Database Management Systems, it doesn't matter.
| | 00:56 | You can just repeat that value all you want.
| | 00:59 | And most of your columns don't
need to be unique and shouldn't be.
| | 01:04 | If in your table you made say the
FirstName unique, then you could only ever have one
| | 01:09 | employee row with the FirstName of James.
| | 01:12 | Or if you make HireDate unique, then only one
employee can have a HireDate of the 1 October, 2006.
| | 01:19 | Now, it is true that there might be just one
employee called James in your table, but you
| | 01:25 | probably don't want a
database that enforces that rule.
| | 01:29 | Now, sometimes there might be a column in
your natural data that does uniquely identify
| | 01:36 | its row, like having a Social Security
number for employees or an ISBN number for books.
| | 01:43 | That should be a number
that is naturally unique.
| | 01:46 | There should never be duplicates.
| | 01:49 | But much of the time, there isn't one piece of
natural data that's unique, so you will make one instead.
| | 01:56 | You will tell the database to generate a new
column, a unique column like a customer number
| | 02:03 | or a product ID or employee ID, and most
Database Management Systems will help you
| | 02:09 | generate those kinds of
columns with these kinds of values.
| | 02:13 | And this column is what is referred
to as your Primary Key for this table.
| | 02:19 | It's the one piece of data that will
identify one and only one row in this table.
| | 02:25 | And the thing is even if you do have a piece
of real data that uniquely identifies a row,
| | 02:32 | like a Social Security number, you'll
generate one of these kind of columns anyway, like
| | 02:38 | employee ID, because this is a number that
could be passed around in emails or printed
| | 02:42 | on an ID card, unlike a sensitive piece
of data, like a Social Security number.
| | 02:48 | Now sidebar, if this Primary Key is generated
by the database and doesn't just naturally
| | 02:54 | occur in the data itself, you might also hear
referred to as a synthetic key or a surrogate key.
| | 03:00 | It is still a Primary Key.
| | 03:03 | Those words just suggest that
it's a system-generated value.
| | 03:06 | Its only reason to exist
is to be the Primary Key.
| | 03:11 | Now, there are other kinds of keys, Foreign
Keys, Compound Keys, but we'll cover those later.
| | 03:17 | The Primary Key is the first and most
important key in any table, whether it's natural or
| | 03:23 | generated, because it's your Primary Keys
that let you not only identify each row and
| | 03:29 | each table, but also start to
connect your different tables together.
| | 03:34 |
| | Collapse this transcript |
| Defining table relationships| 00:01 | While any database begins with defining your
tables, a vital next step is the ability to
| | 00:06 | add relationships from one table to another,
because much of your data is naturally connected.
| | 00:13 | You're not trying to invent arbitrary relationships that
don't exist, you're trying to describe what's already there.
| | 00:19 | So, you might first define a customer table
by itself or an order table or a product table.
| | 00:25 | But orders are placed by customers and
they are orders for specific products.
| | 00:31 | Yes, information about an order is different from
information about a customer, but they are related.
| | 00:39 | So we need to formally describe the relationship
between our tables, and the way we describe
| | 00:44 | relationships is based on keys.
Let's go through an example.
| | 00:48 | I have a simple Customer table here, and it
has this column in the start, CustomerID, which
| | 00:56 | is set up to automatically generate a unique number
for each customer row as it's added to the database.
| | 01:03 | So, that's our Primary Key here.
| | 01:06 | And one of the benefits of having
that key is we can also use it elsewhere.
| | 01:11 | So, I have another simple table, Order.
| | 01:15 | And each row here represents one order, so
it has its own Primary Key, order ID, which
| | 01:21 | also is generating an automatic number.
| | 01:23 | Now, I'm going to imagine we have such a
straight forward business that we only sell one product.
| | 01:29 | So I just need to know a
quantity and the total amount due.
| | 01:34 | But each order is an order
for a particular customer.
| | 01:38 | And to process an order, you would
need to know who this applies to.
| | 01:42 | So, how do we get that data?
| | 01:44 | Well, we could add columns to the order
table and start copying across all the relevant
| | 01:50 | data from the Customer table into the
Order table whenever an order is placed.
| | 01:55 | But A, that would be duplication of
data, something to avoid in general.
| | 02:00 | And B, there's just no need to do this.
| | 02:03 | Instead, what we do is we add a
CustomerID column to our Order table.
| | 02:10 | We're taking the key to one row in the
Customer table and using it again in the Order table
| | 02:15 | to describe the connection between the two.
Now, we're not doing this casually.
| | 02:20 | We actually tell the Database Management
System that this is a formal relationship between
| | 02:26 | these tables, so every
order row now has a CustomerID.
| | 02:31 | But whereas the Customer table has
a CustomerID that has to be unique,
| | 02:36 | in Order table, it doesn't have to be unique.
| | 02:38 | We could have the same CustomerID occur
twice or three times or a dozen times.
| | 02:44 | In this case, 367 is occurring twice,
multiple orders by the same customer, because here,
| | 02:50 | this number is not uniquely identifying a row
in the Order table, we have order ID for that.
| | 02:56 | So in the Customer table,
CustomerID is our Primary Key.
| | 03:00 | But when it's being used in the Order table, it
is not a Primary Key, though it is still a key.
| | 03:05 | It's referred to as a
Foreign Key and is not unique.
| | 03:08 | Now, the benefit of having this relationship
defined in the database is we can go either way.
| | 03:14 | We could start at the customer row and then
take that CustomerID and go get every order
| | 03:20 | for that customer, or we can go from the
order row, find the CustomerID and find out which
| | 03:26 | customer is associated
with that particular order.
| | 03:30 | And when you have a relationship like this,
it's what's defined as a one-to-many relationship.
| | 03:38 | One customer can have many orders.
| | 03:42 | In a database diagram, you'll see this
sketched out a couple of different ways.
| | 03:45 | Some people use the crow's foot symbol
to show which way the relationship goes.
| | 03:50 | In others, you might see
the one to infinity symbol.
| | 03:54 | So, one customer has many orders, one
category has many products, one department has many
| | 03:59 | employees, one classroom
has many students, and so on.
| | 04:03 | And it's very, very common to have these
kinds of relationships between your tables.
| | 04:09 | Most tables in well-designed databases have
relationships to one or more other tables.
| | 04:15 | Now, bear in mind, when you describe a one-to-many
relationship in a database, you aren't required to have many.
| | 04:24 | A customer is not required to have many orders.
| | 04:27 | Customer might place only one order,
or might place no orders at all.
| | 04:31 | But the database internal rules support the
idea that one customer can have many orders,
| | 04:37 | one category can have many products,
one department can have many employees.
| | 04:42 | But the opposite of these statements is not true,
at least in the imaginary business I'm describing.
| | 04:48 | So, while each customer can have many
orders, each order is for only one customer.
| | 04:55 | I can't say that an order
has many customers, it can't.
| | 04:59 | A department has many employees, but
an employee only has one department.
| | 05:03 | A product only has one category.
| | 05:05 | Now as always, I'm trying to describe simple business
examples that might be a little mundane, but easier to grasp.
| | 05:13 | If your actual business rules support a more
flexible situation that perhaps an order can
| | 05:19 | involve multiple customers or an employee
can work for multiple departments at the same
| | 05:25 | time, you may have to define a
different kind of relationship.
| | 05:28 | Instead of one-to-many, many-to-many,
and that's up next.
| | 05:33 |
| | Collapse this transcript |
| Describing many-to-many relationships| 00:01 | While the most common relationship between
tables in a relational database is one-to-many,
| | 00:06 | it's not unusual to sometimes need to
describe a many-to-many relationship.
| | 00:10 | We've got two problems here.
| | 00:14 | Problem number one, a small problem, one-to-many relationships
are easy to describe, but they're all over the place.
| | 00:20 | But many-to-many scenarios are a little trickier.
They're not so obvious to spot.
| | 00:25 | Problem number two, a bigger problem.
| | 00:28 | In most relational Database Management Systems, you
cannot express a many-to-many relationship directly.
| | 00:35 | It sounds like this could be a short movie.
| | 00:38 | But while you can't create a direct many-to-many relationship,
you can do it indirectly, and here's how we do it.
| | 00:46 | As ever, let's focus on the business
problem, not the technological fix.
| | 00:51 | So, say we're a publishing company.
| | 00:54 | We have an Author table with a list of a few
authors inside it and their names, contact,
| | 00:59 | information, some kind of
Primary Key for them here, AuthorID.
| | 01:04 | And we also have a Book table with the list of our
book Titles, also with its own Primary Key, BookID.
| | 01:12 | Well, at first glance, we might say, okay, this
looks like it could be a one-to-many relationship.
| | 01:19 | We could take the Author number 447,
Jordan Winters, and say we want to represent that
| | 01:25 | author has written both book 1145, Designing
Databases, and book 1147, Pocket Guide to SQL.
| | 01:32 | And we'd like to say that, say Robert,
number 446, wrote book 1146, SQLite Made Simple.
| | 01:39 | So, if this was a classic one-to-many
relationship, one Author with one or more Titles, we could
| | 01:48 | do this by just adding a new column
to the Book table, AuthorID column.
| | 01:53 | It would be a Foreign Key to the
Author table. So far, so good.
| | 01:59 | But here's the issue.
| | 02:02 | What happens if in a day or a week or a month,
we're told that the Designing Databases book
| | 02:07 | was written by two different authors?
| | 02:09 | Well, the way we have this described right
now, we can't do that because our AuthorID
| | 02:16 | column can only store one, storing a Foreign
Key to the author 447 or 445, so I can currently
| | 02:22 | support a one-to-many relationship,
but I need a many-to-many relationship.
| | 02:28 | One author can have many books, but
also one book can have many authors.
| | 02:34 | Now the way some people try and model this is
they will add another column to the Book table.
| | 02:41 | They'll add in an AuthorID two column, and they'll
have this one, say 445 pointing to a different author.
| | 02:49 | However, adding new columns to your tables,
particularly these kind what I refer to as
| | 02:54 | repeating groups or repeating columns, you're
just doing the same information again, that's
| | 03:00 | a bad idea, and it's
discouraged in database design.
| | 03:04 | So we'll get rid of that technique.
| | 03:06 | Well, some other people think
I'll just cheat a little bit.
| | 03:11 | I'll do something quick and dirty, and what I'll do
is I'll slide in two values into that AuthorID column.
| | 03:17 | Maybe I'll use comma separated values and
squeeze in another reference so that the
| | 03:22 | AuthorID column now relates to two authors.
But that's also a cheat.
| | 03:27 | And like adding a new column,
this is highly discouraged as well.
| | 03:32 | So how do we solve this?
| | 03:34 | Well, in fact, we're going to solve this problem
by getting rid of that AuthorID column entirely.
| | 03:41 | And in fact, first, we go back to two completely
detached tables with no official reference between them.
| | 03:49 | And what we do to create a many-to-many
relationship is we add another table.
| | 03:55 | We add what's often referred to
as a junction or linking table.
| | 03:59 | Now the only reason for this table to exist is to
join the Author table and the Book table together.
| | 04:06 | And in fact, the name of this linking
table by convention would be Author Book.
| | 04:12 | It could also be Book Author.
| | 04:13 | It doesn't really matter which way it goes,
because what we're using it for is to set
| | 04:18 | up two one-to-many relationship.
| | 04:21 | So, what we do is we define a one-to-many
relationship from Author to Author Book.
| | 04:27 | So, one author with AuthorID 445, that's
Primary Key here, but a Foreign Key here.
| | 04:31 | It's not unique there.
| | 04:33 | It could exist in that table once
or twice or three times or a dozen.
| | 04:38 | And using this, we could go from Author to
Author Book, find a BookID and map that to
| | 04:45 | the Book table, and we can
also do it the other way.
| | 04:50 | We can take the identity of a book like number
1145, take it back to this junction or linking
| | 04:57 | table, in this case to two rows, and there,
go to the Author table. Grab the AuthorID.
| | 05:03 | Follow that back up to the Author table.
| | 05:05 | So we're now expressing a
many-to-many relationship.
| | 05:10 | One author can have many books.
One book can have many authors.
| | 05:14 | And the only reason for the existence of this
table in the middle is to join the other two
| | 05:19 | together, and that's how we do it.
| | 05:21 | Now in a large database, you're going to
end up with a lot of one-to-many relationships
| | 05:27 | between your tables, and just a few of those were
really being used to create a many-to-many relationship.
| | 05:35 | Officially, there is a third kind, a one-to-one
relationship that is possible, but it's not common.
| | 05:42 | If you think about it, if one row in one table
is pointing to one and only one row and another
| | 05:47 | table, well, you might as well just combine
those tables so it's one row in both places.
| | 05:52 | So these are the official kinds of
relationships, what is sometimes called the cardinality of
| | 05:57 | how tables relate to each other.
| | 06:00 | One-to-many, very common, many-to-many like this,
occasionally required and one-to-one not common at all.
| | 06:09 | Now in any large database, you'll also
find some tables without formal relationships.
| | 06:13 | Some just don't need them and some people
consider none to be a fourth kind of cardinality
| | 06:19 | or fourth kind of relationship.
I don't think so, but you will see it.
| | 06:23 | I certainly don't have to connect every
table to every other table, but most of them will
| | 06:29 | end up of being related to at least one other.
| | 06:34 |
| | Collapse this transcript |
| Transactions and the ACID test| 00:01 | When we're entering or changing data in our
databases, we'll often need to work with transactions.
| | 00:05 | A transaction is an incredibly
important thing in the world of databases.
| | 00:10 | And to understand them, the best way is to think
about what we think of as a transaction in the real world.
| | 00:16 | If you talk about making a transaction,
you often mean something commercial.
| | 00:20 | You hand over $15 to a bookseller,
the bookseller gives you the book.
| | 00:25 | That's the transaction, and it's
important that both of those things happen.
| | 00:29 | If you hand over the money,
you expect to get the book.
| | 00:32 | If they hand you the book,
they expect to get the money.
| | 00:35 | Either both of these things
happen, or neither of them do.
| | 00:38 | Now in a computing system, a classic
example of transactions is a banking system.
| | 00:44 | Imagine that you log on to your bank's website,
and you want to transfer $2,000 from one
| | 00:50 | of your account to another.
| | 00:52 | Now, that's going to require two updates to
this data, one to subtract $2,000 from
| | 00:59 | the savings account, and the other to
add $2,000 to the checking account.
| | 01:04 | Now if the first part of this happens
successfully, subtracting $2,000 and then we try and
| | 01:11 | add $2,000, but there's a problem with the
second part, perhaps it's locked for editing.
| | 01:17 | Well, we will need to reverse
the first part of this transaction.
| | 01:21 | We never want to be in a situation where $2,000
has been debited from the first account, but
| | 01:27 | it didn't get added to the second.
| | 01:29 | So, a transaction is how you define a
combined unit of work, either both of these things
| | 01:36 | happen or neither of them do, and the first
change will be reversed instantly by the database
| | 01:41 | if any part of the transaction fails.
| | 01:43 | Now, there's a common acronym that you'll
come across when working with transactions
| | 01:48 | in a database, ACID, A-C-I-D.
| | 01:51 | A transaction must be Atomic,
Consistent, Isolated, and Durable.
| | 01:57 | Being Atomic, and this is the Greek word
atom meaning an indivisible unit, it refers to
| | 02:03 | the core idea that this transaction
must completely happen or not at all.
| | 02:08 | So, whether there are two steps in the transaction or
20 steps, they're all contained within the transaction.
| | 02:14 | They either all complete successfully,
or they all return to the original state.
| | 02:19 | There is no such thing as a
transaction that halfway occurs.
| | 02:23 | Now whether the reason for the transaction
failing is that the database had a power failure
| | 02:27 | or ran out of space to hold a new data, or
there was an application error, it doesn't matter.
| | 02:32 | Atomic is the all or nothing rule.
| | 02:35 | Now Consistency means that any transaction
must take the database from one valid state
| | 02:41 | to another valid state based
on the rules of the database.
| | 02:44 | So, even if a transaction is successfully
atomic, it still cannot result in a situation
| | 02:50 | that violates any of the
integrity rules defined in a database.
| | 02:54 | Isolation refers to the data and the
transaction being essentially locked for that moment in
| | 03:00 | which the transaction is occurring.
| | 03:02 | So, while we're trying to change a balance
on one of our account records, another part
| | 03:07 | of the system must not be allowed to access that
data until the first transaction has finished.
| | 03:13 | And durability refers to the
transaction being considered robust.
| | 03:18 | If it happens and the database says this
transaction has happened successfully, then
| | 03:23 | the transaction is guaranteed.
| | 03:26 | Say if you go to a travel website, purchase
a flight on an aircraft, and you're guaranteed
| | 03:31 | seat to A, then you should be able to
regard that transaction as being durable.
| | 03:36 | Even if half a second later, the database
suffers a power failure and shuts down, when
| | 03:41 | it reboots, that transaction will have
survived that failure and they are not going to sell
| | 03:46 | that seat to someone else simply because there was a glitch
in the system half a second after you made your purchase.
| | 03:52 | Now, the great thing about working with most
Database Management Systems is these capabilities
| | 03:58 | are built into the system.
| | 04:01 | You don't have to worry
about how to program these.
| | 04:04 | You just need to know when to tell the database.
Now, this is a transaction.
| | 04:08 | These three things, these four things,
these 10 things must be done together.
| | 04:14 | That's what we'll see
how to do a little later on.
| | 04:19 |
| | Collapse this transcript |
| Introduction to Structured Query Language (SQL)| 00:01 | SQL, or Structured Query Language, is the common
language that lies at the heart of every relational
| | 00:07 | Database Management System
that you'll likely to use.
| | 00:09 | Now, you've probably noticed the
different pronunciation already.
| | 00:13 | Some people say S-Q-L, some people say sequel.
| | 00:16 | I tend to say S-Q-L when I'm talking about the
original language itself and sequel when it's
| | 00:21 | combined with something else as
in SQL Server, or T-SQL, or MySQL.
| | 00:26 | Now SQL, the language, has been around since
the 1970s, and that's one of the few languages
| | 00:32 | I was writing 25 years ago
that I'm still writing now.
| | 00:35 | A lot of other programming languages have come
and gone, become fashionable and unfashionable,
| | 00:41 | but SQL has stuck around, and I
expect it to stay for a long time to come.
| | 00:45 | But SQL is a very small, very focused language,
and the key to first learning it is to understand
| | 00:52 | that it's a little different from other programming languages
you might have come across, like C, C++, Java, or Python.
| | 01:00 | SQL is what's considered a declarative query
language, as opposed to procedural or imperative languages.
| | 01:08 | Okay, that's jargon, yes.
| | 01:10 | But what it means is you use SQL to describe
what you want, and you let the Database Management
| | 01:17 | System handle how that's actually done.
| | 01:19 | You don't have to manually lay out the
algorithm, the different steps of the procedure as you
| | 01:24 | would do in other programming languages.
| | 01:27 | Say if I have a Books table with, perhaps,
thousand different books in it, and I want
| | 01:32 | to know which books have a
list price of more than $40.
| | 01:36 | Well, in a procedural or imperative language like
C or Java, I'll have to write the steps to do this.
| | 01:43 | I would probably write some code that would
start at the first book and then would loop
| | 01:48 | through all of them one by one.
| | 01:51 | And every time going through this loop you're
asking the question, "Is this more than $40?
| | 01:55 | If so, do one thing. If not, do something else."
Even writing this in pseudo code, I'd be writing a loop.
| | 02:02 | I'd be writing conditions.
I'd be writing returns.
| | 02:07 | But with SQL, you don't describe all
the steps, you just describe the outcome.
| | 02:13 | You describe what you want the same way you
might do in English, where you just say, I
| | 02:18 | want to know all the books more than $40, or this written
in SQL, SELECT * FROM Books WHERE ListPrice > 40. That's it.
| | 02:30 | Select or go get everything from the Books
table where the price is more than $40.
| | 02:35 | The Database Management System will take this,
it will look at all your data, figure it out,
| | 02:41 | and return the result set, whether that would be one book,
500 books, or even none, based on your query.
| | 02:48 | So SQL can be used to select, to retrieve,
or read data, and to ask questions of it.
| | 02:55 | It can also be used to create data,
to update data, and delete data.
| | 03:00 | In fact, all of these things are referred to with the
wonderful acronym of CRUD: Create, Read, Update, Delete.
| | 03:08 | An SQL can be used to create not just your
data, but to define the databases themselves.
| | 03:14 | Now most Database Management Systems have
their own slightly different implementation
| | 03:19 | of the core SQL language.
If you work with Oracle, you use PL-SQL.
| | 03:25 | With Microsoft SQL Sever, it's Transact-SQL, or T-SQL
for short, but basic SQL knowledge works with all of them.
| | 03:32 | And we're going to get into the
specifics of SQL a little later.
| | 03:35 | But first, we're going to get an overview of why so many
database software options exist and what the differences are.
| | 03:43 |
| | Collapse this transcript |
|
|
3. Database Modeling: TablesIntroduction to database modeling| 00:01 | Time to see how to design or model a
relational database, and this is regardless of which
| | 00:06 | relational database management
system you're going to use to do it.
| | 00:09 | We're developing the formal description of
our database, so what's called the database
| | 00:13 | schema, our tables, our columns,
our primary keys in relationships.
| | 00:18 | If you want a foolproof method of building
a terrible database, it's to jump directly
| | 00:23 | into the Database Management System software,
whatever that is, and just start building something.
| | 00:28 | That's a really bad idea.
| | 00:30 | Planning is vital, because while in other
areas of development, mobile programming,
| | 00:35 | desktop, web development, I'm a big fan
of agile, incremental, iterative approach.
| | 00:41 | Building something quickly, getting it out,
revising it, adding new features over weeks
| | 00:45 | or even days, that's not what
I want to do with databases.
| | 00:49 | Relational databases reward upfront planning,
because the entire point is to impose rules
| | 00:55 | and constraints and structure on your data.
| | 00:58 | We don't want to have one set of rules one
week and a different set of rules next week.
| | 01:02 | I made a comment once that building
a database is like getting tattooed.
| | 01:06 | You really want it to be
correct the first time you do it.
| | 01:10 | Changes are possible, but they are painful.
| | 01:12 | Unlike tattooing, some
changes may be easier than others.
| | 01:16 | Adding something extra is a little easier
than trying to directly change something that's
| | 01:20 | been in place for a couple of years.
| | 01:23 | But the great thing is
relational databases aren't new.
| | 01:26 | They've been around since the '70s.
| | 01:28 | So you're coming in at a time when the methods for
modeling a database have been battle tested over four decades.
| | 01:34 | We know what works and what doesn't.
| | 01:37 | And to design a good database, a large
part of it is just to go through the steps.
| | 01:41 | You see, database modeling is not a place
to try and express your inner creativity
| | 01:46 | and find wild and crazy
innovative new ways of doing things.
| | 01:50 | If you want to get wild and crazy and
innovative, fine, but do it in your user interface,
| | 01:55 | do it in your application.
| | 01:57 | In your database schema, you want to be
patient, methodical, and step by step.
| | 02:03 | Now, as with other areas of software development,
there are specific diagrams associated
| | 02:09 | with planning a database schema.
| | 02:10 | Unlike other areas, you can get super
formal with these, but you rarely need to.
| | 02:15 | A few basic lines and shapes are all we'll need.
| | 02:19 | While you can use a diagramming tool like
Visio or OmniGraffle and many of the Database
| | 02:24 | Management System tools, and programming IDEs
come with basic database diagram tools as well.
| | 02:31 | All you really need to model a database, at
least initially, is pencil and paper and be
| | 02:36 | prepared to think for a
moment about a few core questions.
| | 02:41 |
| | Collapse this transcript |
| Planning your database| 00:01 | There are two initial questions you need to
ask before even starting your database schema.
| | 00:06 | First, what's the point?
| | 00:08 | What is this database for? And be careful
of the first answer that leaps to mind.
| | 00:12 | Sure, in most cases, you're building a
database to support an application, whether that's
| | 00:16 | a desktop or mobile or web app.
| | 00:19 | Let's say you're building an online bookstore,
well, it's way too easy to say the point
| | 00:23 | of the database is to store product and
order information, and think you're done.
| | 00:29 | See this might be true, but what's the intention of
this bookstore, whether it's a website or an application?
| | 00:36 | Because wherever you wanted to go over the
next year or two or five should affect
| | 00:40 | what you're building right now.
| | 00:41 | So even having an elevated, but your mission
statement as corny as that might sound, will
| | 00:46 | help you build a better database.
| | 00:49 | If you have something like, "We help customers
find books, discover what others thought about
| | 00:54 | them, purchase and track their orders,
contribute their own reviews and opinions, and learn
| | 00:58 | about other products they may like based on
people with similar reading habits."
| | 01:03 | Well, you're going to build a very different
database from that second description than
| | 01:08 | you would from the first one.
Now the next question, what do you already have?
| | 01:13 | You might be lucky enough to be building this for a
completely new business where nothing has happened yet.
| | 01:18 | Most of the time, there is some existing process,
this database is intended to replace or supplement,
| | 01:23 | even if it's a manual one.
| | 01:25 | So, if so, get any physical assets together, print-outs,
order sheets, filing cabinets itself, and of course, people.
| | 01:33 | It's going to help you figure out
what data you need to keep track of.
| | 01:37 | Next, do you have an existing database
even if one isn't really worthy of the name?
| | 01:42 | It could be in a desktop application,
it could be in a set of spreadsheets.
| | 01:46 | If so, what's wrong with it?
What's right with it?
| | 01:50 | Don't just make the assumption that you'll
take everything you have as gospel and import
| | 01:55 | it all into the new database.
| | 01:57 | You might need to do that, but take the
opportunity to fix any problems that you have,
| | 02:03 | so you're not just recreating the same
problems in your brand new database schema.
| | 02:07 | So, understanding what you already have is
essential before you can answer the first
| | 02:13 | real question when designing a database.
Which is, what entities do you have?
| | 02:19 | We know that the relational
database consists of one or more tables.
| | 02:24 | Tables are the basic building block of a
relational database, and you will create separate tables
| | 02:30 | for each entity, that is each object or each
thing that needs to be represented in the database.
| | 02:36 | I'm using the word entity here just to
keep things a little loose and abstract.
| | 02:40 | We're trying to identify what entities we
naturally have, rather than just directly
| | 02:45 | asking, what tables do I need to go and make?
| | 02:48 | Some of your entities might represent things
that exist in the real world, Customer, Product,
| | 02:54 | Employee, a Patient, an Asset, but others could be more
abstract, a Blog entry, a Category, a Comment, an Appointment.
| | 03:04 | A sidebar, there is some debate about whether to use
plural or singular nouns when naming your
| | 03:09 | entities or tables.
Is it a Customer entity or a Customer's entity?
| | 03:14 | Now I'm strongly in preference
for singular nouns for your tables.
| | 03:19 | That also seems more natural when you
describe your entities that way, but I have certainly
| | 03:23 | encountered databases where every table was
named plurally, a Customer's table an Order's table.
| | 03:30 | It's not a big deal, but I'd strongly
suggest to pick one or the other, don't mix them.
| | 03:36 | If you're on the fence about it, pick singular.
So, sidebar over and back to it.
| | 03:42 | Right now, we want nothing more than
just to pick the most useful words.
| | 03:47 | We're not yet worried about
exactly what data to store for each one.
| | 03:50 | That will come later, but one word of warning.
| | 03:53 | Take care if you've already done or are doing
a similar kind of process for your application
| | 03:58 | itself, and you're working in an object oriented
language, because you often ask similar kinds of questions.
| | 04:04 | What exists in this application?
| | 04:06 | You might even be writing use cases, and
going through and picking out all the nouns.
| | 04:11 | Sure, there is some crossover in, for example,
a sales application I am likely to have both
| | 04:17 | an order table defined in my database and
an order class defined in my application,
| | 04:23 | but it is different.
| | 04:24 | In the relational database, we are
concerned about the data that needs to be saved.
| | 04:29 | Our focus is not on things like methods and
behavior, or inheritance, polymorphism, like
| | 04:34 | it might be in an object oriented language.
| | 04:37 | So we're concerned here just about our entities, and we
will be concerned about relationships between those entities.
| | 04:44 | Relationships like the fact that our customer
places an order then an order contains order
| | 04:49 | items that refer to specific products
whether the Paycheck is for a specific employee who
| | 04:56 | belongs to a particular department.
| | 04:58 | In fact, the stage of the process
determining your early design is often referred to as
| | 05:04 | Entity Relationship or ER modeling.
| | 05:06 | It's a complex sounding term that at heart really
isn't that bad, it's boxes with lines drawn
| | 05:13 | between them, or it can be done very
formally with a lot of specific connector lines,
| | 05:19 | but that's overkill right now.
| | 05:21 | One thing you might commonly see is the diamond
boxes being used to describe the relationships,
| | 05:26 | but right now our focus is to take the first
spin at our entities, which will become our
| | 05:31 | tables and sketch out the
vague relationships between them.
| | 05:35 | We'll describe exactly what those relationships are a little
later, whether they're one-to-many, one-to-one, many-to-many.
| | 05:42 | Right now, it's simply
enough to show they exist.
| | 05:47 |
| | Collapse this transcript |
| Identifying columns and selecting data types| 00:01 | After figuring out at least your first list
of what entities need to exist, you take a
| | 00:05 | look at each one to specify exactly
what data is important to store about them.
| | 00:10 | Now officially, in an entity-relationship
model, these are referred to as our attributes.
| | 00:16 | And just as entities will become our tables,
the attributes we draw it here for each one
| | 00:20 | will become the columns in those tables.
| | 00:23 | Now in sketching out an ER diagram,
attributes are often shown like this, and you can be
| | 00:28 | quite informal about writing them.
| | 00:30 | But as you move into actually defining the
official columns of the table, you're going
| | 00:34 | to need to become very specific about what
they are, so let's just get specific now.
| | 00:40 | First, as a rule when defining a table,
let's say an Employee table, you're going to go
| | 00:45 | very granular, meaning, as individual as possible,
so not just one column for name, but separate
| | 00:52 | columns for FirstName and LastName or if
your prefer, family name and given name and even
| | 00:57 | other columns for middle initial or
title or suffix, if you need those.
| | 01:03 | The question might be, why do
you split them up like this?
| | 01:06 | And one reason is because you may do
something different with that piece of data, and you
| | 01:10 | want to get to it by itself.
| | 01:12 | It is much easier to sort all your
employees by surname or find out how many customers
| | 01:17 | you have in a particular city, if you're
storing that piece of data independently, rather than
| | 01:23 | if it has to be extracted
out of some larger value.
| | 01:26 | Well, what should we call these columns?
What names should they have?
| | 01:30 | Well, you'll see a variety of ways that
people name the columns in their tables, sometimes
| | 01:35 | using underscores between multiple words,
sometimes a lower case using Camel case, Pascal
| | 01:41 | case, this has really more to do with your own naming
conventions, than anything a database will enforce on you.
| | 01:48 | As ever, the best idea is
simply to have a standard.
| | 01:51 | Write it down and stay consistent.
| | 01:54 | I'll be using Pascal casing in my examples, meaning, I'll
upper case each word, and I won't use underscores.
| | 02:00 | Now, some databases do support using spaces
in your column names, but I avoid those as
| | 02:06 | they often mean a name has to be enclosed in
quotes when referring to it programmatically,
| | 02:11 | and this becomes inconvenient.
But this is by no means enough.
| | 02:16 | You next have to say what kind of data is
going to be stored in each of these columns,
| | 02:21 | really what is the data type for each column?
| | 02:23 | Is it text or character data,
or is it numeric?
| | 02:28 | Is it a date, a time, or even binary data like
an image, or a piece of audio or video?
| | 02:34 | Data types in a database are absolutely not
the same as data types in a programming language.
| | 02:39 | You'll find that most database systems want
you to be much more specific about your columns
| | 02:44 | than a programming language wants you to be
about your variables, and the Database Management
| | 02:49 | System wants to know these specifics so it
can be efficient about storing and indexing
| | 02:54 | them, and so it can enforce your rules.
| | 02:58 | For example, if you want to store an integer
in one of your columns, you'll find that most
| | 03:03 | Database Management Systems have multiple kinds of
integer data type for different sizes of integer.
| | 03:09 | If I look at the online manual for the
MySQL Database Management System, I can go into
| | 03:14 | the Numeric Types, and I'll find that just
for integer alone, we got the regular INT,
| | 03:19 | we've also got TINYINT, SMALLINT, MEDIUMINT
and a BIGINT, all different lengths in terms
| | 03:25 | of storage from one to eight bytes, and all
different sets of numbers that they can take,
| | 03:30 | whether they're signed to unsigned values.
So, you can be very, very specific indeed.
| | 03:35 | And you'll find the same in things like SQL
Server as well and other Database Management Systems.
| | 03:41 | BIGINT, SMALLINT, but no
MEDIUMINT in SQL Server.
| | 03:45 | Now, we can't cover every possible option
in every possible Database Management System,
| | 03:51 | but we will cover the basic
issues because they're similar.
| | 03:54 | What you'll find is when you get into
using a specific DBMS, you'll just have a cheat
| | 03:58 | sheet or a bookmark of all the
available data types until you get to know them.
| | 04:02 | Now, with some data types, things like character
or text data, you can also specify if it should
| | 04:08 | be a fixed length or a variable length.
| | 04:10 | I might specify it here, for example, that my
state should be limited to just two characters.
| | 04:15 | And really, this all comes down to you
knowing what each piece of data should be.
| | 04:19 | And if you've got some character data is it
going to represent a FirstName or might it
| | 04:24 | represent the content of
a 1,000-page manuscript?
| | 04:27 | The idea here is that you want to use a column
definition that's as small as possible but no smaller.
| | 04:34 | Fixed lengths are more efficient.
| | 04:36 | But as you might imagine,
they're just not all about flexible.
| | 04:38 | If you try and store a 15-character name in a
column only defined as 12-characters, you'll
| | 04:43 | just lose the last three characters.
| | 04:46 | You will also find that many database systems have
different options for storing ASCII text and Unicode text.
| | 04:52 | So, if you know you're storing
international text, big five languages and so on, you'll
| | 04:57 | need to pick a Unicode character format.
| | 05:00 | And even though Unicode does take up more
space, I tend to default the Unicode format
| | 05:04 | unless I absolutely know I won't need them.
| | 05:08 | And then there's the option of storing
binary data, a photo, a video, audio, a document
| | 05:13 | where you typically have a binary type.
| | 05:15 | Now some databases have specific methods for dealing
with very large binary objects, gigabytes of binary files.
| | 05:23 | You may hear these data types
referred to as Binary Large Objects or BLOBs.
| | 05:28 | Now likewise, large amount of text can be managed
differently as Character Large Objects or CLOBs.
| | 05:36 | After defining the data type and optionally
the length, one very important question is
| | 05:41 | when you add a new row, will a value be
required in this column or is it optional?
| | 05:47 | Most databases default to requiring all columns to
contain values, but sometimes that's not necessary.
| | 05:53 | You might require, for example, a value in
AddressLine 1, but not in AddressLine 2, it's
| | 05:59 | just not needed, it's not relevant.
| | 06:02 | So, what you do is you actually define
particular columns to allow null values.
| | 06:08 | Null is an important
keyword in relational databases.
| | 06:12 | A null is not the same as
just a space or a blank entry.
| | 06:16 | Null means the complete absence of a value.
| | 06:19 | So, your columns can be defined as allowing null
values, or more typically a column can be not null.
| | 06:26 | A value is required.
| | 06:28 | And one option here is that
perhaps a default value for a column.
| | 06:32 | So if you don't provide a value, can a
default one be entered automatically?
| | 06:37 | I might say, for example, that when I'm
creating a new employee row, if I don't put anything
| | 06:42 | in for the date hired, it will default to
today's date, the date the row was created.
| | 06:47 | Along the same lines, we might do some constraints to
check are there maximum or minimum values for something.
| | 06:54 | Should it match a pattern, like matching an email
address or a phone number or a credit card number?
| | 07:00 | So flexibility is usually our friend in programming, but
it's not what you're looking for in most relational databases.
| | 07:07 | If you want to store an email address for an
employee, you want to know what will always
| | 07:11 | be an email address, not sometimes an email
address, sometimes an inspiring quote or a
| | 07:16 | date or an MP3 file or just missing entirely.
| | 07:20 | What you're hoping to end up with is that
for each table, each entity that you have,
| | 07:25 | you have a reasonable list of the columns to
create and the data allowed in each column.
| | 07:30 | And defining your columns as exactly as
possible, means that your database will enforce these
| | 07:36 | rules on those columns, that your data will
stay valid and consistent, and you won't end
| | 07:41 | up with a database full of garbage.
| | 07:46 |
| | Collapse this transcript |
| Choosing primary keys| 00:01 | Each table should have a Primary Key, a
value that uniquely identifies an individual row
| | 00:06 | where there can be no
duplicates and no confusion.
| | 00:09 | So, if we have an EmployeeID, it
should take us to only one employee row.
| | 00:13 | If we have a CustomerID, it
takes us to only one customer row.
| | 00:16 | If we have an ISBN number, it
takes us to one specific book.
| | 00:20 | When creating a database schema, we need to
say which column contains that Primary Key
| | 00:25 | for each of our tables, and if there
isn't one, we usually need to make one.
| | 00:30 | As I mentioned earlier, occasionally
the key is already naturally in the data.
| | 00:35 | If, for example, I define a Book table where
we enter all the details of a book, including
| | 00:40 | its ISBN, well, that ISBN
number should be naturally unique.
| | 00:45 | I could then tell the database that this
is the Primary Key, and because it occurs in
| | 00:51 | the data, this is what
sometimes called a Natural Key.
| | 00:55 | However, in many examples,
it doesn't naturally occur.
| | 01:00 | As I start to build our Customer table, I might
realize there's nothing in this that's inherently unique.
| | 01:06 | We might say that Email is unique, but
sometimes the same Email address is shared by multiple
| | 01:10 | people as this the same address.
| | 01:13 | So, if I decide there's nothing that I could guarantee
would always be unique, I would add a Customer ID column.
| | 01:20 | Now in most Database Management Systems, the
way that you do this is to define this column
| | 01:25 | as an integer, but also add an option to
make it automatically increment and then specify
| | 01:31 | it as the Primary Key or the Identifier.
| | 01:34 | In this way, when adding a new row, we don't
have to say what this value is the database
| | 01:40 | will automatically generate it for us, and
it will always be a unique value in this table.
| | 01:45 |
| | Collapse this transcript |
| Using composite keys| 00:01 | One option for a Primary Key that you might find useful
from time to time is something called a Composite Key.
| | 00:07 | This is when one value does not uniquely
identify a row, but two values do, where we'd combine
| | 00:14 | two column values to
create a unique Primary Key.
| | 00:16 | It sounds a little weird, so
let me give you an example.
| | 00:20 | Let's imagine I've got a publishing company and part
of what I do is deal with Yearbooks for high schools.
| | 00:25 | So, I have a Yearbook table.
| | 00:27 | Now, the way this currently stands, none of
these columns are naturally a potential Primary Key.
| | 00:33 | There's nothing here that's inherently unique.
The School name repeats and the Year repeats.
| | 00:39 | PageCount might be unique
but that's just coincidence.
| | 00:42 | We're not looking for something
that may be unique, if we're lucky.
| | 00:46 | We need something that will always be and
must be unique and makes logical sense to
| | 00:52 | use when identifying each row.
So, I can't use the School name.
| | 00:56 | I can't use the Year, but
what we can do is combine them.
| | 01:01 | If in this Yearbook business, we create one
book for each School per year, then if I select
| | 01:07 | the name and the year that
can uniquely identify each row.
| | 01:12 | There might be multiple rows for Orchard High and multiple
Years for 2010, but there's only one Orchard High for 2010.
| | 01:20 | Only one Orchard High for 2011, only one
Lawstone Elementary for 2010, and so on.
| | 01:27 | This is a legitimate Primary Key, and this is known as
a Composite Key where it's composed of two or more values.
| | 01:36 | Now, it's true it might sometimes be more
useful or even just be more convenient to
| | 01:41 | generate a surrogate Primary Key column anyway.
| | 01:44 | Say in this case, a column called Yearbook ID
that will automatically generate an integer,
| | 01:50 | but Composite Keys can be a useful technique,
and you will run into them from time to time.
| | 01:56 | One place you will see them, and we'll see
them later on is they're used when we're joining
| | 02:00 | tables together to create
many-to-many relationships.
| | 02:05 |
| | Collapse this transcript |
|
|
4. Database Modeling: RelationshipsCreating relationships| 00:00 | I talked about Relationships in the earlier
section on features of relational database,
| | 00:05 | but it's time to get more specific.
| | 00:07 | You're organizing your database into
individual tables, and many of these tables
| | 00:12 | will need to know about each other.
| | 00:13 | Now, if you've started by sketching out a
rough Entity Relationship diagram, you begin
| | 00:19 | by just attempting to show that some kind
of relationship exists between your entities,
| | 00:25 | but to get closer to actually building this in a
database, we need to say, what these Relationships are.
| | 00:31 | And while it is common in an ER diagram to
use a phrase or a verb to describe it like,
| | 00:37 | contains or refers to or belongs to.
That's for your benefit.
| | 00:41 | The database can't describe it that way.
| | 00:44 | Now, there are only three kinds of relationship of
cardinality between tables in a relational database.
| | 00:52 | One of them dwarfs the others, there is
one-to-one, one-to-many and many-to-many.
| | 00:58 | Okay, some people might
add a fourth of none-at-all.
| | 01:02 | But the time to start defining these is yet
another example of how you can keep to a process
| | 01:08 | and order when designing databases.
| | 01:11 | First, you sketch out your
entities, which will become your tables.
| | 01:15 | Then you're going to define the data, the
attribute for those entities, which will become
| | 01:19 | columns in your tables, because only then
can you say, which columns will be the primary
| | 01:24 | keys, or if you need to generate primary keys.
| | 01:27 | You can't properly define your
relationship until these primary keys are specified.
| | 01:32 | Now, I'm not trying to pretend that it's a
purely linear process, not all your tables
| | 01:37 | will become obvious in
the first couple of steps.
| | 01:40 | When you go through the process of normalization,
which we'll get to in the next section, you'll
| | 01:44 | probably end up with a few new tables.
| | 01:47 | So we will revisit everything we've started to
define, but we can still work through a process.
| | 01:52 | Now, the benefit of specifying our
relationships formally in the database, rather than just
| | 01:58 | informally putting things together in an
application is that the database management system itself
| | 02:03 | will take care of keeping those
relationships valid and meaningful.
| | 02:07 | You can then use those
relationships to answer all sorts of questions.
| | 02:11 | Everything from the basic idea of, how
many orders has a particular customer had.
| | 02:16 | How many products were in an order?
Or say in a game system.
| | 02:20 | What weapons belong to a particular player?
| | 02:22 | How many quests has one player
completed compared to another?
| | 02:26 | All the data is stored in these separate tables,
but it becomes easy to jump from one to another
| | 02:32 | by following the relationships.
| | 02:34 | Next, we'll get into the
most common, the one-to-many.
| | 02:39 |
| | Collapse this transcript |
| Defining one-to-many relationships| 00:00 | The most typical kind of cardinality in a
relational database management system is one-to-many,
| | 00:06 | which is just as easily
described as many-to-one.
| | 00:08 | It just depends on which side
of the relationship you start at.
| | 00:11 | In the example I showed earlier was this, perhaps
the archetypal one-to-many or many-to-one relationship.
| | 00:19 | Customer and Order, one
customer can have many orders.
| | 00:23 | Meaning, one Customer row can be associated
with multiple Order rows but each Order row
| | 00:29 | is only for one Customer.
| | 00:31 | Now creating this relationship depends on
us having our primary keys defined first.
| | 00:36 | If you're attempting to describe that we
need to link from a row in one table to specific
| | 00:41 | rows in another table, we need a way to get to those
specific rows and getting to a specific row takes a primary key.
| | 00:48 | So going into this, we must have our tables with our
columns and primary keys at least roughly planned out.
| | 00:54 | Next, implementing a new one-to-many
relationship requires a change to whatever table represents
| | 01:01 | the many side of the relationship.
| | 01:03 | So, to relate Customer and Order tables, I don't need
to change anything about the Customer table, that's the one side.
| | 01:09 | I need to add some extra information to the
Order table, and it's the key to the Customer table.
| | 01:16 | So, Customer ID, and this
again is called a Foreign Key.
| | 01:20 | It represents a column in this Order table
that is a key to a row in a different table.
| | 01:27 | These will specifically refer to one
and only one row in the Customer table.
| | 01:33 | You might find the Customer ID occurs more
than once in order, but it's always pointing
| | 01:38 | to only one row in the customer.
| | 01:40 | We always make the change to the many side of the
relationship because it's the only way to do it.
| | 01:45 | I can add one column with one value to every
Order row that will always point to a correct
| | 01:50 | customer, but I cannot add one column with
one value to the Customer row that could point
| | 01:55 | to a variety of different orders.
| | 01:57 | So, I'm making a match between the
columns in these different tables.
| | 02:02 | Now very often, we would use the same
column name across both of those tables.
| | 02:06 | So in this case, Customer ID and Customer,
Customer ID and Order, but it doesn't have
| | 02:11 | to be the same because we are really
making the match on the values in these columns.
| | 02:16 | In some cases, you couldn't use the
same name because there'd be a conflict.
| | 02:21 | Perhaps both of these tables were defined
with the primary key just called ID, or we
| | 02:25 | couldn't reuse the ID
column twice in the Order table.
| | 02:29 | There would be a conflict
there, it just wouldn't work.
| | 02:32 | So, we call it something else like Customer ID.
It's still a foreign key.
| | 02:37 | It just happens to have a different
column name between the different tables.
| | 02:41 | Another thing that you might see is a completely
different name that attempts instead to describe the relationship.
| | 02:47 | So, instead of using Customer ID, we might
use something like Placed By as a column name.
| | 02:53 | And Order is Placed By a particular Customer ID.
| | 02:56 | It still refers to a value that's in the ID column
in the other table, it's just using a different name.
| | 03:02 | So, the name of the column doesn't have to
match but the data type certainly should.
| | 03:08 | It doesn't make sense to have a primary key
that's an integer in one table, try and match
| | 03:13 | a column that's described as
character data in a different table.
| | 03:16 | Now it's also very common to have one table
that takes part in multiple relationships.
| | 03:21 | So, a customer can have many orders.
| | 03:24 | We go to one-to-many relationship here,
but we may decide that our customers can have
| | 03:29 | multiple different addresses they may ship
to, so we might add a new table for address
| | 03:34 | and have another one-to-many
relationship between customer and address.
| | 03:39 | This is perfectly acceptable and very common.
| | 03:42 | Another option is that a table that is on
the many side of a one-to-many relationship
| | 03:47 | could be on the one side of another.
| | 03:50 | So, for example, Order might be the many
side of the customer to order relationship but
| | 03:56 | Order itself could have many order items,
so it's on the one side of that relationship.
| | 04:02 | One customer has many orders, one
order can contain many order items.
| | 04:06 | Now as we start to get a bit further, we're going to
want to start to diagram these a bit more specifically.
| | 04:12 | I've shown the basics of entity relationship diagrams as
they are very simple to just get started conceptually.
| | 04:19 | Basically, boxes with lines between them.
| | 04:21 | Although as you start to more formalize the
actual database diagram, you'll see a different
| | 04:27 | kind of layout emerge, and
there's no fixed one standard.
| | 04:31 | You'll see different options used across different database
management systems and different charting applications.
| | 04:38 | But they usually boil down to this kind of idea.
| | 04:41 | A box for each table with the name of the
table at the top, then we'd have the column names.
| | 04:48 | You may or may not add a little extra to show
what kind of data these are, character data,
| | 04:53 | dates, binary, integers, and so on.
| | 04:56 | What is very common is you would add a PK
for a primary key, and you would add an FK
| | 05:02 | to denote a Foreign Key and then start adding
the relationship connector lines between them.
| | 05:08 | This kind of diagram is very, very
common across all relational databases.
| | 05:13 | It's easy to see what tables exist, what columns,
what relationships exist between the tables.
| | 05:18 | Now in most relational database management
system administration software, there is an
| | 05:23 | option to generate these diagrams from an
actual existing database in that server, and
| | 05:29 | the relationships themselves will
be shown with these connector lines.
| | 05:32 | Different software tends to
generate different looking lines.
| | 05:35 | As I mentioned earlier, some will use the
Crow's foot style to show the many part of
| | 05:40 | a one-to-many relationship.
Some will show the infinity symbol.
| | 05:44 | There are other ways of doing this, but
it's usually not that difficult to pick any one
| | 05:48 | relationship and figure
out which way it's going.
| | 05:53 |
| | Collapse this transcript |
| Exploring one-to-one relationships| 00:00 | It's possible to create one-to-one relationships in your
database, but it's actually very unusual and here is why.
| | 00:07 | Let's imagine I'm building an HR database.
| | 00:09 | So, I sketched out a few
entities based on real world object.
| | 00:14 | And as part of this system, I figured out
I need to keep track of employee data and
| | 00:18 | driver's license information so I'll
create two tables just for those pieces of data.
| | 00:24 | But let's say the only reason I need to do
this, to keep track of driver's license data
| | 00:28 | is so I can associate a license with an
employee for making travel reservations.
| | 00:33 | So, in this system that I'm describing,
it's a pure one-to-one relationship.
| | 00:39 | One employee row points to
one driver's license row.
| | 00:42 | There are never multiple employees for one
driver's license, and there are never multiple
| | 00:46 | licenses for one employee.
| | 00:48 | Now, I understand there are always caveats,
so I'm assuming that this system just doesn't
| | 00:53 | care that a few people might possess two
licenses for different states or countries.
| | 00:59 | It's trying to model the idea that we
need one listed primary driver's license.
| | 01:04 | So, I could create these as two
tables with a one-to-one relationship.
| | 01:09 | I could store a foreign key to the driver's
license table in my employee row, or I could
| | 01:13 | do it the other way around or in fact, like
I'm showing here, I could just use the same
| | 01:18 | primary key for each table, just
mapping one row directly to another.
| | 01:23 | But if this is the actual situation that it's
a pure one-to-one, then I might as well just
| | 01:29 | combine the two tables and have the
driver's license information stored directly in the
| | 01:34 | employee row with no relationship needed.
| | 01:38 | So even if that driver's license might be
a physical different entity, this makes a
| | 01:43 | lot more sense to do it this way.
| | 01:45 | One thing to take care of, when you're new
to database design, it sometimes happen that
| | 01:49 | you think you've got a one-to-one
relationship when you don't.
| | 01:52 | Here is an example where I've
seen this happen multiple times.
| | 01:55 | Let's say we're drawing up a fairly basic
situation with an Order, an OrderItem, and a Product.
| | 02:01 | There's some kind of relationship here.
| | 02:03 | So, Order is storing information like
an Order ID, a date, a total amount due.
| | 02:08 | Each OrderItem is storing a Foreign Key to
order ID, storing a quantity, and product ID
| | 02:15 | because it's linking to a different table
product that's storing information like the
| | 02:18 | description in the list price.
Now I know there's a relationship between them.
| | 02:23 | And if asked someone to diagram the probable
relationships in between basic entities like
| | 02:28 | this, I usually have
something that goes this way.
| | 02:31 | They know it's a one-to-many
between Order and OrderItem.
| | 02:34 | Okay, one order can contain multiple
OrderItems and then they take the OrderItem which is
| | 02:40 | for a product so it has a
product key and a quantity.
| | 02:44 | And the reasoning often goes, well, one OrderItem
is for one and only one product so it's a one-to one.
| | 02:53 | But of course it isn't.
| | 02:55 | What they've correctly determined is that it
is a two-one relationship, meaning an OrderItem
| | 03:00 | is for one and only one product but they
haven't looked at it the other way, that the same
| | 03:07 | product can be associated
with multiple OrderItems.
| | 03:10 | So, this really is a many-to-one.
| | 03:13 | It's going from OrderItem to product and a
one-to-many going from product to OrderItem.
| | 03:18 | One product could be ordered multiple times.
| | 03:21 | So, make sure that you're looking at your
relationship both ways, particularly when
| | 03:26 | you think you have found a one-to-one.
| | 03:31 |
| | Collapse this transcript |
| Exploring many-to-many relationships| 00:00 | While I went through an example of a many-to-many
relationship earlier, it's worth going through
| | 00:05 | again, particularly if
you're new to database design.
| | 00:08 | So, I had used the example of author and
book that one author could write multiple books,
| | 00:15 | but also one book could be
written by multiple authors.
| | 00:18 | So, this is a many-to-many business scenario.
| | 00:21 | And I'm going to use a different business
problem to describe at this time around, in
| | 00:25 | this case, a database for a training center.
| | 00:28 | So, let's imagine we have two
tables right now, class and student.
| | 00:34 | Class includes the information like a title
of a training course, what date it's on, what
| | 00:38 | classroom it's in, and the student is just
exactly the information you would imagine,
| | 00:43 | first name, last name, email, and so on.
| | 00:45 | And what I want to describe is a relationship
between them, that a class can be attended
| | 00:50 | by multiple students but that also a
student could take multiple different classes.
| | 00:56 | It is a many-to-many relationship.
| | 00:58 | And I cannot add one column to either
table that would successfully represent all the
| | 01:03 | possible combinations because if I had a
StudentID column to the Class table, then I can only
| | 01:10 | have on student in each class.
| | 01:13 | If add a ClassID column to the Student table, then
I can say that each student can only take one class.
| | 01:20 | So once again, in relational database, you
cannot represent a many-to-many relationship directly.
| | 01:26 | You need to create a new table to link the
two, in several names with this kind of table,
| | 01:32 | a joint table, a junction table, a linking
table, a bridging table, a cross-reference
| | 01:37 | table, it really doesn't
matter, what ever you prefer.
| | 01:41 | By convention, the name of this table is
just usually made up by taking the names of the
| | 01:46 | two tables that it's cross-referencing
and putting them together.
| | 01:49 | So in this case, ClassStudent, or
StudentClass, either would work.
| | 01:54 | And there's a one-to-many relationship from
Class, the ClassStudent linking table, and
| | 01:59 | another one-to-many relationship from the
Student table to the ClassStudent table.
| | 02:04 | But this linking table doesn't contain
any other data than just two columns.
| | 02:09 | One of those columns is a Foreign Key to
the Class table with Class ID, the other is a
| | 02:14 | Foreign Key to Student with StudentID, and this
is how we represent a many-to-many relationship.
| | 02:19 | So, if I want to take Student ID 102, Viola,
and say that she takes multiple classes.
| | 02:26 | Then what I need is multiple rows in that
liking table both referring to Student ID
| | 02:32 | 102 but referring to different Class IDs, in this
case, 441 and 442, one student, multiple classes.
| | 02:41 | And if I want to go the other way taking one
class, say the database design class, Class
| | 02:47 | ID 441, I'll take that over into the liking
table, find all the rows with 441, find the
| | 02:54 | student for those rows, 102, 101, 103, and just map
those to the Student table, one class, multiple students.
| | 03:02 | So is this a little more tedious than just
having a single one-to-many relationship?
| | 03:07 | Sure, but it's the only way that we
can represent this kind of cardinality.
| | 03:12 | Now one question you might have about the
Class-Student linking table is what is the
| | 03:18 | primary key for it?
Do we need to add a new column?
| | 03:22 | Well, we could add one, add an automatically
incrementing primary key but we're unlikely to ever use it.
| | 03:29 | This is one of the cases where we could
just tell the database that we're going to use
| | 03:32 | a composite or compound key, but neither
column by itself would be enough to make any row
| | 03:38 | unique, but both of them together would.
| | 03:41 | There should only ever be one row with a
particular StudentID for a particular ClassID.
| | 03:47 | ClassID can repeat, StudentID can
repeat, but both of them together, won't.
| | 03:53 | That's enough to make every row unique.
Now one more thing.
| | 03:57 | First, I'll recap that quite often, you begin by
thinking a relationship is a typical one-to-many
| | 04:05 | and only later realize it is in
fact a many-to-many in disguise.
| | 04:10 | One reason for this is that the relationship
in many-to-many isn't always weighted equally.
| | 04:16 | Let's say that in your organization, one department has
multiple employees who each work for only one department.
| | 04:24 | That would be a one to many.
It's easy to show and easy to represent.
| | 04:29 | You would make it so each employee row
just get to Department ID, and you're done.
| | 04:33 | But if there exist, that one employee who's
supposed to be splitting their responsibility
| | 04:39 | across multiple departments, and you need
to represent that in your system, then this
| | 04:44 | is simply not going to work anymore.
| | 04:47 | It's now a many-to-many relationship, and
you would need a linking table to be able
| | 04:52 | to contain that in your system.
| | 04:57 |
| | Collapse this transcript |
| Understanding relationship rules and referential integrity| 00:01 | So, why do we go to all this trouble?
| | 00:02 | Well, it's usually easy to understand that
relationships are a convenient way to link
| | 00:07 | from one table to another.
For a customer, we can just get all the orders.
| | 00:11 | Well, just as important, if not more so is
the idea that a relationship describes a rule,
| | 00:17 | a constraint that I cannot violate, that I now must
not have an order row for a customer that doesn't exist.
| | 00:25 | The database will not let me add a new
order row with a Customer ID 388 if there is no
| | 00:30 | Customer ID 388 in the Customer table.
| | 00:34 | I may say but I'm just about to add that customer, but
at that moment the database would be no longer valid.
| | 00:40 | It would no longer be internally consistent.
| | 00:43 | This is what's considered
a referential constraint.
| | 00:46 | It's a rule that applies between tables
where each table isn't just applying the rules of
| | 00:51 | its own column definitions, but it's also
cross-referencing the data in the Customer
| | 00:56 | table to make sure that
this is or is not allowed.
| | 01:00 | If there isn't a Customer ID 388, you
would not be able to enter that row.
| | 01:05 | So, defining these rules will often imply
a sequence in your database for creation of
| | 01:11 | new content, but in this if I have a new
customer and a new order for that customer,
| | 01:17 | I must add the customer first
before I can add the order for it.
| | 01:21 | It would allow a customer without an order, but it
won't allow an order for a non-existing customer.
| | 01:28 | The idea that data is not just valid within
one particular row or one particular table
| | 01:34 | according to the rules of that table, but
valid and meaningful between all your tables
| | 01:39 | is known as referential integrity.
| | 01:41 | There are two places that
it primarily has an impact.
| | 01:46 | Adding new rows as I just described, making
sure we can't add an order row with a Customer
| | 01:50 | ID that doesn't exist.
This also applies for updates.
| | 01:54 | I can't bypass it by adding a row for
Customer 369 who does exist and then changing that
| | 02:00 | to a value that doesn't.
It will refuse that too.
| | 02:04 | This is still referential integrity
enforcing the rules of the relationship.
| | 02:09 | What's more interesting is what happens with
referential integrity when you delete something.
| | 02:14 | So, if our data currently exist between these
two tables in a valid meaningful state, say,
| | 02:20 | I have Customer ID 367 who has two associated
orders, well, what happens if I delete Customer 367?
| | 02:28 | Well, in most database systems, it depends.
| | 02:32 | It depends on you and your rules
and what you want to have happen.
| | 02:36 | One option you have when deleting between
tables that have relationships is something
| | 02:41 | called a Cascading Delete.
| | 02:44 | If I want to delete a row from the Customer
table, the database says, "You're the boss",
| | 02:49 | and we'll just immediately delete any related row.
| | 02:52 | So, in this case I tell it to delete
Customer 367, it will first cascade down and delete
| | 02:58 | all the Order rows for that
customer and then delete the Customer row.
| | 03:02 | If you have multiple tables connected to each
other, one delete operation could theoretically
| | 03:08 | delete many different
connected rows in other tables.
| | 03:11 | Another option for
deleting is cascading nullify.
| | 03:14 | Some database systems have this option, although it is
rarely used, where you would null the Foreign Key column.
| | 03:22 | So, in this case, I delete Customer ID 367
and the database would automatically scan
| | 03:27 | all the orders and where it
finds 367, set that to null.
| | 03:31 | So, it's a situation where I must keep the
order, but it's kind of now detached from
| | 03:36 | a particular customer.
It still exists but with a null column.
| | 03:39 | So, what we're not doing is pointing to a
customer that doesn't exist anymore. But more
| | 03:44 | likely than either of this
is simply refusal. No action.
| | 03:49 | This is the default in most database management
systems, that if you've created a relationship
| | 03:55 | between these two tables and then you try
and delete a customer that has existing orders,
| | 04:01 | it's not going to let you do that.
It will refuse the delete option.
| | 04:05 | So just as when creating new data, I'd first
have to create the customer first then the order.
| | 04:11 | When deleting, you'd need to delete all the
orders first or transfer their ownership to
| | 04:16 | some other Customer ID and only then
would I be allowed to delete that Customer row.
| | 04:21 |
| | Collapse this transcript |
|
|
5. Database Modeling: OptimizationUnderstanding normalization| 00:00 | Once we've started to plan out our tables, our columns,
and relationships, we do something called Database Normalization.
| | 00:07 | This is a process where you take your database
design, and you apply a set of formal criteria
| | 00:13 | of rules called Normal Forms.
| | 00:16 | These were developed about 40 years ago mainly
by Edgar Codd, the father of relational databases.
| | 00:21 | And we step through them 1, 2, 3, first normal
form, second normal form, and third normal form.
| | 00:28 | There are others but
these are the important ones.
| | 00:33 | Normalization should be carried
out for every database you design.
| | 00:36 | It's really not that hard, even though, yes,
| | 00:39 | when you first start reading about
database normalization, you'll run into phrases like,
| | 00:44 | your database will comply with third normal
form when every non-prime attribute of R is
| | 00:48 | non-transitively dependent (i.e. directly dependent)
on every candidate key of R.
| | 00:53 | But you don't have to get into all this
language unless you are mathematically inclined.
| | 00:58 | The entire point of normalization is to make
your database easier and more reliable to work with.
| | 01:04 | You usually will end up creating a
few new tables as part of the process.
| | 01:09 | But the end result is your database will
contain a minimum of duplicate or redundant data.
| | 01:15 | It will contain data that's easy to get to,
easier to edit, and maintain, and you can
| | 01:20 | preform operations, even difficult ones on
your database without creating garbage in
| | 01:25 | it, without invalidating the state of it.
| | 01:28 | And it's something we will do as part of our
initial design, and it's also something that
| | 01:32 | we would reapply when our
database design is tweaked and revisited.
| | 01:36 | Now, let me get something straight.
| | 01:39 | Normalization is not an ivory tower,
theoretical, academic thing that people talk about,
| | 01:44 | but nobody really does in the real
world, no, everybody does this.
| | 01:49 | If you're a working database administrator or
database designer, you can do normalization in your sleep.
| | 01:55 | It's a core competency of the job. It's important.
| | 01:59 | And as you'll see, we've
already been doing a little of it.
| | 02:04 |
| | Collapse this transcript |
| First normal form| 00:00 | Before we apply the first set of criteria,
what's called first normal form, often shortened
| | 00:05 | to 1NF, I'm taking as a given that we already have
our columns and our primary keys specified. Okay.
| | 00:13 | First normal form says that each of your
columns and each of your tables should contain one
| | 00:17 | value just one value, and there
should be no repeating groups.
| | 00:22 | Okay, what does this actually mean?
| | 00:23 | Well, let's say I begin developing a
database for my company and one of my tables is an
| | 00:28 | Employee table, very simple stuff,
EmployeeID, LastName, FirstName, and so on.
| | 00:34 | And we allocate every employee a computer.
| | 00:37 | I want to keep track of that, so we'll add a
ComputerSerial column to keep track of who has what.
| | 00:43 | Now, this is actually okay right now.
This technically is in first normal form.
| | 00:49 | Here's the problem.
| | 00:50 | Let's say I figured out that some of our
employees need a Mac and a PC to do the testing.
| | 00:55 | Others need a desktop and a laptop.
| | 00:57 | So, several people have multiple computers,
and I want to keep track of all of them.
| | 01:01 | There is a couple of ways
that I could deal with this.
| | 01:04 | I could just start stuffing
extra data into that one column.
| | 01:08 | We could start putting commas or vertical
bars or any other delimiter and put in multiple
| | 01:13 | values in the one ComputerSerial column.
| | 01:17 | This is just something you just
don't do in Relational Database Design.
| | 01:22 | We're violating first normal form.
| | 01:25 | Understand the relational databases
will happily deal with hundreds of tables.
| | 01:30 | Each table could have hundreds
of columns and millions of rows.
| | 01:33 | But they do not want columns that
have a variable amount of values.
| | 01:38 | Perhaps, there's one value, perhaps
there are two in it, perhaps there are ten.
| | 01:43 | Each column and each row
should have one and only one value.
| | 01:47 | Now, could you do it this way? Sure,
| | 01:50 | technically, if that column was defined as
character data, the database wouldn't stop
| | 01:55 | you from entering commas
and multiple values into it.
| | 01:58 | But as far as the database is concerned, it's still
just one value that just happens to have commas in them.
| | 02:04 | So, you would find it hard to
search directly for a serial number.
| | 02:07 | You'd find it hard to sort.
You'd find it hard to maintain.
| | 02:10 | So, it's not in first normal form if you do
this because first normal form demands that
| | 02:15 | every column, every field
contains one and only one value.
| | 02:19 | So, what we might do then is go back to the
original way, and instead start adding new columns.
| | 02:26 | So, ComputerSerial2, ComputerSerial3, this
is what's called a repeating group, and there
| | 02:33 | should be no repeating groups.
| | 02:35 | The classic sign of a repeating group column
is a column with the same name, and the number
| | 02:40 | tacked onto the end of it just to make it unique,
because usually this is a sign of an inflexible design.
| | 02:46 | Sure, if we could guarantee that there would
only ever be two or three, that's fine.
| | 02:51 | But what happens when we want to
add the tablet and the smart phone?
| | 02:55 | What happens when one employee manages testing
and needs to be associated with six computers?
| | 03:01 | We don't want to require a change to the
database schema just because we buy a new computer.
| | 03:06 | So, what do we do here?
| | 03:08 | Well, what we do is the same thing
for a lot of these normalization steps.
| | 03:13 | We'll take this data out of the
Employee table, and put it in its own table.
| | 03:19 | This then has relationships.
| | 03:21 | We create a one-to-many relationship between
employee, and this new computer, or it could
| | 03:27 | be called an asset table or
whatever else makes sense.
| | 03:30 | And it has a foreign key
back to the Employee table.
| | 03:33 | I can take any EmployeeID like 551, follow
it to the Computer table, and find his two
| | 03:39 | computers or 553, find his three computers,
there are no repeating values, no repeating
| | 03:46 | groups in either table.
And this will get us into first normal form.
| | 03:51 | Now, it's very common that the solution to a
normalization issue is to create a new table.
| | 03:57 | Sometimes, it's a one-to-many relationship
like this, other times it might even require
| | 04:01 | a many-to-many with a linking table.
| | 04:06 |
| | Collapse this transcript |
| Second normal form| 00:01 | Before you attempt to go into second normal form or
2NF, well first, you have to be in first normal form.
| | 00:08 | You don't pick and choose between them.
You go through this one, two, three.
| | 00:12 | Now whereas first normal form is about the
idea of repeating values in a particular column,
| | 00:18 | second normal form, and third normal form
are all about the relationship between your
| | 00:23 | columns that are your keys, and your
other columns that aren't your keys.
| | 00:28 | The second normal form has the rather
puzzling official description that any non-key field
| | 00:34 | should be dependent on the entire primary key.
| | 00:37 | And that is about as
simple as it can get phrased.
| | 00:40 | Now, when I say the word field, it usually
refers to the idea that the actual value in
| | 00:45 | a particular column
position for a particular row.
| | 00:49 | But what does this actually mean?
| | 00:51 | Well, for most of what we've done in this
course, this actually won't be an issue for us.
| | 00:56 | Second normal form is only ever a problem
when we're using a Composite Primary Key.
| | 01:01 | That is a primary key
made of two or more columns.
| | 01:05 | So, let me show you a table that currently is in
first normal form but not in second normal form.
| | 01:12 | Going back to the idea of a database for a
training center, I have an Events table here
| | 01:16 | that has an ID of a Course, a Date,
CourseTitle, Room, Capacity, AvailableSeats, and so on.
| | 01:23 | Now, what's actually happening here is
this table has been defined to use two columns
| | 01:28 | as the primary key.
It's a composite primary key.
| | 01:31 | And that's instead of just the Course ID
column, we can't do that because, as you can see,
| | 01:37 | SQL101 occurs multiple times, that can't
be a primary key, neither could we use date.
| | 01:43 | But I could combine the ID with
the Date, that is logically unique.
| | 01:49 | The same course may happen multiple times, and
we may start multiple courses on the same date.
| | 01:55 | But there's always only ever one
particular course on only one particular date.
| | 02:00 | So, SQL101 on the 1st of March is unique.
It's a legitimate composite primary key.
| | 02:05 | Now, the issue with second normal form is
that if you use a composite key, you need
| | 02:10 | to look closely at the
other columns in this table.
| | 02:14 | So, going along to my non-key columns, I have
CourseTitle, SQL Fundamentals, Room 4A,
| | 02:21 | Capacity is 12, there are 4 seats available.
| | 02:24 | A lot of this information would be unique to this one entry,
this one course on this particular date. That's fine.
| | 02:32 | But second normal form asks that all of my
non-key columns, everything that isn't part
| | 02:37 | of the key, so Course Title, Room, Capacity, Available,
they all have to be dependent on the entire primary key.
| | 02:44 | Now, that is the case for Room
and Capacity and Available.
| | 02:48 | These are unique values based on the fact
that we're running this particular Date, this
| | 02:53 | particular Course, and this particular
room with a certain number of seats available.
| | 02:57 | It will always be different.
| | 02:59 | But CourseTitle, well, I could
get that just from half of the key.
| | 03:05 | I could get that just from
the first part of the key.
| | 03:08 | It has no connection to the Date whatsoever.
SQL Fundamentals will always be based on SQL101.
| | 03:15 | It doesn't matter if it's
being run in March or April or May.
| | 03:18 | Now, this might sound a little bit ivory tower.
But here would be the impact.
| | 03:23 | What happens if somebody reached into this
table, and they changed that Course ID, but
| | 03:28 | they didn't change the title?
Now, we've got a conflict.
| | 03:32 | We might have the wrong title
for the wrong piece of data.
| | 03:35 | That's because my data now isn't in second
normal form, and we're trying to fix that
| | 03:40 | conflict from ever happening.
So, how do we fix it?
| | 03:43 | Well, once again, we're
going to rip out the CourseTitle.
| | 03:46 | We're going to create a separate Courses table, where we
want to again map the ID of the course into its own row.
| | 03:54 | So, we'll always have one
specific title for one specific ID.
| | 04:00 | And then we create a one-to-many
relationship between Events and Course.
| | 04:04 | And removing that from the Event table
means that everything in that table is now based
| | 04:09 | on the entire key, particular course, at a
particular date which may have a different
| | 04:13 | room or different capacity,
different number of available seats.
| | 04:17 | Now, here is the great thing.
| | 04:19 | If you are not using composite keys,
second normal form isn't even a concern.
| | 04:24 | You can step ahead, go right through second normal
form and into the next one, third normal form.
| | 04:29 |
| | Collapse this transcript |
| Third normal form| 00:00 | Now, let's take a look at the third normal form.
| | 00:03 | Well, as plainly as this can be described,
it's that no non-key field, meaning, a column
| | 00:09 | that is not part of the primary key
is dependent on another non-key field.
| | 00:14 | It is in a way similar to second normal form.
| | 00:17 | Second normal form asks can I figure out any of the
values in this row from just part of the composite key?
| | 00:23 | While third normal form asks can I
figure out any of the values
| | 00:27 | in this row from any of the other values in this row?
And I shouldn't be able to do that.
| | 00:33 | Let's take a look at an example.
| | 00:35 | I've got this updated version of the Events
and Courses table from the previous example.
| | 00:40 | So, it's in both first normal form, it doesn't
have any repeating values or repeating groups,
| | 00:45 | and it's in second normal form.
| | 00:47 | Meaning, there's no part of this that's
dependent on just on a piece of the key.
| | 00:54 | What I need to do for third normal form is look at
my non-key fields, Room, Capacity, Availability.
| | 01:02 | If I scan the entire row, let's take the first row,
we've got SQL101 course occurring on the 1st of March.
| | 01:08 | There is apparently 4 seats available.
It's in Room 4A with a capacity of 12.
| | 01:13 | Now, this is at a first look at it perfectly
acceptable, because this course could be being
| | 01:19 | scheduled in a different room every time
with a different number of available seats as we
| | 01:24 | start to sell different seats for
a particular date. That's all okay.
| | 01:29 | Here's the problem. It's between Room and Capacity.
| | 01:33 | These are both non-key fields.
These columns aren't part of the primary key.
| | 01:38 | But if I look down the column for Room, I see 4A has
12 seats capacity, 4A has 12 seats, 7B has 14 seats.
| | 01:47 | So, if every time we're in Room 4A, we
always have 12 seats or every time we're in 7B,
| | 01:53 | we always have 14 seats.
I don't need to repeat that information.
| | 01:57 | I could figure out capacity
from Room and Room alone.
| | 02:01 | I have one non-key field that is
based on another non-key field.
| | 02:05 | So, we don't need these to
be stored in the same table.
| | 02:08 | What we need to do is, you guessed it, split
some of this information out into its own table.
| | 02:14 | So, we need to pull out Capacity from
the Event table, and just keep Room.
| | 02:19 | And that's as long as Room will always tell
us a fixed capacity, we'd create our own table
| | 02:24 | for it, 4A always has 12,
7B always has 14, and so on.
| | 02:29 | Now, we're in third normal form, no non-key
field is dependent on another non-key field.
| | 02:38 | Now, as you're seeing, it's all
about the redundancy of the information.
| | 02:43 | This is what we're trying
to do with normalization.
| | 02:45 | Now, another example of third normal form
would be something like this, which is very common.
| | 02:52 | Let's say we've got an OrderItem table, which
is calculating different parts of an invoice.
| | 02:57 | So, it has a ProductID with a
Quantity, a UnitPrice, and a Total.
| | 03:02 | Now, you don't have to worry about how
this might relate to different tables.
| | 03:06 | All I'm interested in looking at is this part.
| | 03:09 | We've got Quantity for UnitPrice of $10,
Total is $40. Here is the issue.
| | 03:16 | We can see that Total is based
purely as on Quantity times UnitPrice.
| | 03:20 | Now, Quantity and
UnitPrice are both non-key fields.
| | 03:25 | So we're figuring out Total from
these other two non-key fields.
| | 03:29 | We don't need to do this.
We don't need to store this in the database.
| | 03:33 | We don't want to store information in your
table that's easily ascertained by adding
| | 03:38 | other non-key fields together,
or in this case multiplying them.
| | 03:41 | One of the main reasons for
this is to prevent any conflicts.
| | 03:45 | If in this example I have a row that says
we have a Quantity of 4 and the UnitPrice
| | 03:49 | of 10, but the Total says 50,
well, where is the problem?
| | 03:53 | There is a problem. How do we do it?
How would we fix it?
| | 03:56 | Is the Total wrong or is the Quantity wrong?
Your data doesn't make sense anymore.
| | 04:00 | So, we would remove that
Total column form this table.
| | 04:04 | We can figure it out when
we need to figure it out.
| | 04:06 | Now, third normal form will help you
figure out these potential problems.
| | 04:10 | Now just a quick side bar, in cases like
this where you might find a total useful in the
| | 04:16 | table, many database systems offer you the
option of defining a computed or calculated column.
| | 04:22 | It's not actually stored in the database,
it is a convenient read-only fiction.
| | 04:28 | Its value is automatically calculated based
on the other columns in the table, and you
| | 04:32 | may find that useful from time to time.
| | 04:37 |
| | Collapse this transcript |
| Database denormalization| 00:01 | So, we should always take our database design
through the first, second, and third normal forms.
| | 00:07 | There are more criteria available.
| | 00:09 | There are fourth, fifth, and sixth normal forms.
There's something called Boyce-Codd normal form.
| | 00:15 | But taking it to third normal form is the
usual expectation in a business environment,
| | 00:19 | and certainly all we need to
cover in a course like this one.
| | 00:22 | Now, you will actually find a lot of tables
out there intentionally break normalization
| | 00:27 | rules and some others seem like they do
but they actually don't. Here's one example.
| | 00:33 | Let's say we've got an Employee table,
and I'm storing an Email and a Phone number.
| | 00:37 | Well, what happens if I want to add
another email address, another phone number?
| | 00:42 | Technically, this can be
described as breaking first normal form.
| | 00:45 | It's a repeating group.
| | 00:47 | But in practice, you may find it more
convenient to just allow an Email and Email2 column or
| | 00:54 | perhaps a HomePhone and MobilePhone column
rather than splitting everything out into
| | 00:59 | multiple tables and having to follow
relationships every single time you read or write this data.
| | 01:06 | This will be referred to as
a de-normalization decision.
| | 01:09 | You're consciously making the choice that
something could be normalized out into another table.
| | 01:15 | You could follow the official rules.
| | 01:17 | But for convenience and/or for
performance, you're not going to.
| | 01:21 | However, if in this case I needed to
support a flexible number of email addresses
| | 01:26 | or flexible number of telephone numbers or
associate extra data along with these, then absolutely,
| | 01:31 | you should split these out into their own table.
But it's an important distinction.
| | 01:37 | Normalization is concerned about the quality
of your data and reducing redundancy with it.
| | 01:42 | But it isn't primarily a performance decision.
| | 01:45 | And because normalization often involves splitting data
into multiple tables, performance can occasionally take a hit.
| | 01:52 | And sometimes, you'll choose to
de-normalize for performance improvement.
| | 01:56 | Now, one example that can seem like a normalization
and/or de-normalization issue but really isn't
| | 02:04 | any table that's full of address information.
This situation can be a little deceptive.
| | 02:09 | If I look at a table like this, and I can see I've
got Zip code being stored as the last column here.
| | 02:16 | Theoretically, I could figure out what the
City, and the State are just from the Zip Code,
| | 02:21 | if I separated them
out into their own table.
| | 02:24 | So technically, I have non-key fields, City
and State that are dependent on another non-key
| | 02:30 | field, Zip, that could be
figured out from Zip alone.
| | 02:34 | However, this kind of case is not the full
story because while it might be true 99% of
| | 02:40 | the time that a Zip code maps to a particular
City or Town, there are some cases where multiple
| | 02:47 | towns or cities are allowed in the same zip
code, some Zip codes even cross multiple states.
| | 02:53 | Normalizing a table like this, thinking that
I've immediately spotted a non-key field dependency,
| | 02:58 | well, that would actually be taking it too
far and making things more inconvenient.
| | 03:04 | And the question is you really want to
understand your data before you can make all these choices
| | 03:09 | whether to normalize or de-normalize.
| | 03:11 | And you might de-normalize to make things a bit more
efficient, but do it knowingly instead of accidentally.
| | 03:18 | And these really are the three steps that we
would go through, first normal form, second
| | 03:23 | normal form, and third normal form.
| | 03:26 | First being about having no repeating values
and no repeating groups, second normal form,
| | 03:31 | no values based on just part of say half
of a composite key, and third normal form.
| | 03:38 | None of your non-key values should be based
on or determined from another non-key value.
| | 03:46 | Taking your database design through these three
steps will vastly improve the quality of your data.
| | 03:51 | I'm going to say that one more time.
| | 03:54 | Taking your database design through these three central
criteria will vastly improve the quality of your data.
| | 04:01 |
| | Collapse this transcript |
|
|
6. Database Modeling: QueryingCreating SQL queries| 00:00 | Time to get back to SQL.
| | 00:02 | As I mentioned in the first section of the
course, Structured Query Language is the shared
| | 00:07 | vocabulary of relational databases.
That's not for general purpose programming.
| | 00:12 | It's a small language focused purely on working
with databases and the first handful of keywords
| | 00:17 | that you'd learn maybe a dozen, will take
care of 90% of everything you'd ever do in SQL.
| | 00:23 | Now, we do have an entire course on SQL here at
lynda.com, it's Bill Wienman's SQL Essential Training.
| | 00:30 | But I would be remiss to have a
database course that doesn't talk about SQL.
| | 00:33 | So, in the next few videos,
I'll cover the basics.
| | 00:36 | Now, I said there are perhaps a
dozen very useful keywords in SQL.
| | 00:40 | But in terms of frequency, there's one
that's head and shoulders above all the others.
| | 00:45 | By far the most common word
you'll ever write in SQL is SELECT.
| | 00:49 | This is the word we start with when we
want to SELECT or read information from one of
| | 00:55 | the tables in one of our databases.
| | 00:57 | We're using it to ask the database
a question, and we expect a reply.
| | 01:01 | Actually, the better word
is not question but query.
| | 01:05 | A query as in the Q in SQL:
Structured Query Language.
| | 01:09 | So, I will begin this with SELECT.
| | 01:12 | But if I want to SELECT some data out of my
database, I'm going to have multiple tables
| | 01:17 | and multiple columns in each table.
| | 01:18 | So, I need to say what specific part of
this database am I interested in selecting?
| | 01:25 | The general format of a SELECT statement is
the word SELECT, what columns you're interested
| | 01:30 | in, then the word FROM what table?
SELECT and FROM are your SQL keywords.
| | 01:36 | They're part of the SQL language itself.
Now, the columns and table names are up to you.
| | 01:42 | It's whatever you named your
columns and your tables in your database.
| | 01:44 | So, if you're interested in getting some
information, say out of this Employee table, I'll just
| | 01:50 | pick one of the columns: FirstName, and the name of
the table itself, so SELECT FirstName from Employee.
| | 01:57 | Now, running or executing this query would
return all the values in that FirstName column,
| | 02:04 | from all the rows in the Employee table
whether that's one row or 100,000 rows or none.
| | 02:10 | Now, you often want multiple column values, so if you want multiple
columns, you separate them with commas. FirstName, LastName.
| | 02:19 | If I run that query, we get both columns,
again, for all the rows in that Employee's table.
| | 02:25 | If on the other hand, I want all the columns in
this table, I don't actually have to write them all.
| | 02:29 | I can use the asterisk instead.
| | 02:32 | So just SELECT asterisk or SELECT star is the
common phrase that you'll hear, from Employee.
| | 02:39 | And this would return every column
for every row in the Employee table.
| | 02:43 | It's basically saying just
show me everything in that table.
| | 02:46 | Now, because you often want to restrict
what is returned, optionally you'll add another
| | 02:51 | keyword to your query which is WHERE.
| | 02:54 | This is your WHERE clause, where
a particular condition is true.
| | 02:59 | So, WHERE LastName = Green or
WHERE EmployeeID = 474, Salary > 50,000.
| | 03:08 | In the next video, I'm going talk about
what's allowed in the WHERE part of the query, but
| | 03:13 | there are a couple things
you may be wondering already.
| | 03:17 | First, what about case
sensitivity is going on here?
| | 03:21 | What's going on with upper case and lower case?
How much does it matter?
| | 03:24 | Well, SQL is not case
sensitive generally speaking.
| | 03:28 | It really doesn't care if the word SELECT
here is written in upper case or lower case.
| | 03:34 | Now, by long convention over many years, most
developers will write the SQL keywords all uppercase.
| | 03:40 | It makes these statements easier to read.
| | 03:43 | Your table names and column names on the
other hand, should just match however they were
| | 03:47 | named in the database, whether that was,
lowercase, uppercase, mixture, or whatever.
| | 03:51 | Having said that, many Database Management
Systems don't even care about that and they
| | 03:55 | treat table names and column
names as case insensitive too.
| | 03:58 | Now, line breaks also don't matter.
SQL is not sensitive to white space.
| | 04:04 | You can split an SQL statement across
multiple lines, as I'm doing here for readability.
| | 04:09 | Now, officially, a complete SQL statement
should be ended with a semicolon, though in
| | 04:15 | practice, many Database
Management Systems just don't care.
| | 04:18 | But you'll see the semicolon on most
of what I write to make it explicit.
| | 04:22 | Currently, I'm selecting everything from
the Employee table, but one question you might
| | 04:27 | have is what if you have multiple databases?
| | 04:29 | How does it know which
database this table is in?
| | 04:33 | After all, I could even have several
databases containing tables with the same name.
| | 04:38 | If that's the case, I need to use not just
the table name but the name of the database
| | 04:42 | itself in my SQL statement, but depending
on the way you're writing it, the Database
| | 04:47 | Management System may assume that you're
already pointing to a particular database.
| | 04:52 | There's often a default database for the
database server if you don't name one explicitly.
| | 04:58 | But if I wanted to be explicit, I'd use the name of
the database, then a period, then the name of the table.
| | 05:03 | So, here I'm making the assumption I have
a database called HumanResources, and I'm
| | 05:07 | interested in the Employee
table inside that database.
| | 05:12 | This can be done a little differently depending on
the Database Management System, but it is usually this simple.
| | 05:17 | Now, right now, these statements are
only selecting from one table at a time.
| | 05:22 | We can write SQL to bring back information
from multiple tables in the same query, depending
| | 05:27 | on the relationships we've
defined, but that comes later.
| | 05:30 | First, we need to take a
closer look at the WHERE clause.
| | 05:35 |
| | Collapse this transcript |
| Creating the WHERE clause| 00:00 | When writing a basic query, coming up with
the SELECT and FROM parts should be fairly
| | 00:05 | easy to decide, simply what tables you're
interested in and what columns in that table you want to see.
| | 00:11 | But writing the WHERE part, where you're
specifying their condition or what's officially called
| | 00:16 | a predicate, well, that can give
you a few more options to choose from.
| | 00:19 | But writing a WHERE clause to go against a
particular table is a bit like writing an
| | 00:24 | if statement in a
conventional programming language.
| | 00:27 | You're trying to describe something
that at heart is either true or false.
| | 00:31 | So, if I want to go against this table and
bring back all the rows where people have
| | 00:35 | a certain LastName, it's
either true, or it is false.
| | 00:39 | It's the same if they have a certain Email
address or their Employee ID is in a particular
| | 00:44 | range, or they were hired after a specific date.
| | 00:47 | It's either true, they match that
condition or false that they don't.
| | 00:51 | And how you write this part, this WHERE statement,
kind of depends on the data that you're matching
| | 00:56 | to and how specific you want to be.
So, let's get through a few examples.
| | 01:00 | Right now, if I wanted to select all the rows
where the LastName of the employee was equal
| | 01:06 | to Green, well, this is the way I would do it.
| | 01:09 | And if it was text that I'm matching
on, I need to write that in quotes.
| | 01:12 | Now unlike most general purpose programming
languages, strings in SQL use single quotes,
| | 01:18 | not double quotes here.
I'll say that again, because it's important.
| | 01:22 | String values in SQL are
surrounded with single quotes.
| | 01:26 | It is one of those situations where some
Database Management Systems will be flexible and they'll
| | 01:32 | allow you to use either single or double quotes,
but if you're talking classic SQL, it's single quotes.
| | 01:39 | Also notice if you're coming from a C-based
language, we don't use the double equal sign
| | 01:44 | here for equalities just single equals.
| | 01:47 | We also don't need to surround our condition, our
predicate in parenthesis, though you can, if it helps.
| | 01:54 | By default, in most database systems
this comparison would be case insensitive.
| | 01:58 | It doesn't matter if I use an uppercase G
or a lowercase G for green, and it wouldn't
| | 02:03 | matter if it was uppercase or lowercase G,
as it's actually stored in the database,
| | 02:08 | although this is something that can
be changed from database to database.
| | 02:12 | Most management systems offer the option to
make these kinds of checks case-sensitive.
| | 02:18 | If on the other hand I'm checking a number,
something that is stored as a numeric column,
| | 02:23 | EmployeeID, for example,
I don't need the quotes.
| | 02:26 | So I just directly ask if it's
equal to that particular value.
| | 02:30 | And if we're working with numeric options
like Salary, I have the comparison operators.
| | 02:35 | We have the operators that you might expect.
| | 02:38 | Greater than, less than, greater
than or equal to, less than or equal
| | 02:42 | to, and not equal is written with the less than
and greater than signs right next to each other.
| | 02:47 | So, in this case I'm asking for
all the rows WHERE Salary > 50,000.
| | 02:52 | And you can also use AND or OR to combine
multiple conditions in your WHERE clause.
| | 02:59 | So, in this case I'll say WHERE Salary > 50,000
AND the Department column value is equal to 'Sales'.
| | 03:06 | And particularly for those of you coming
from C-based languages, we actually use the word
| | 03:11 | AND or OR not vertical bars or ampersands.
| | 03:15 | So, in this case a perfectly acceptable way
to check if the value of Department is equal
| | 03:20 | to Marketing, or it's equal to Sales,
we're going to bring back all those rows.
| | 03:24 | Now if you wanted to check for multiple
values where you're interested in the same column
| | 03:30 | having several different options, in this
case Department being Marketing or Sales,
| | 03:35 | we can also use the word IN and then surround the
multiple values inside parentheses separating them by commas.
| | 03:43 | Because I'm checking for a text value, I
need to surround each independent value with the
| | 03:48 | single quote and just
separate them by commas here.
| | 03:51 | If on the other hand, what you're looking
to do is be a bit more flexible on matching
| | 03:55 | text, then instead of using the equal sign, which wants
an exact match, we can use the SQL keyword LIKE.
| | 04:02 | Now in a lot of other programming languages
or other ways of doing wild card searches,
| | 04:08 | you use a wild card character of an
asterisk, but in SQL, it's the percent sign.
| | 04:13 | So, I'm asking here where the last name column
is LIKE 'Green%' and then anything after this.
| | 04:19 | So this query would match all the rows with
Green in that column, but also Green Span,
| | 04:24 | Green Berg, Green Away, and so on.
| | 04:27 | The wild card can be placed at the end of
the text, it can be placed in the middle,
| | 04:31 | it can be placed at the start.
And it will match any number of characters.
| | 04:36 | If however, you're just looking to match
on a single letter, you could represent that
| | 04:40 | with an underscore in your LIKE statement,
which in this case would bring back anything
| | 04:45 | that matches on Smith with an I or Smyth
with a Y or any letter there.
| | 04:50 | However, do you understand that in large tables,
using LIKE can be quite inefficient as it's
| | 04:55 | more demanding on the database to check multiple
possible contents of a particular column value,
| | 05:02 | rather than just checking for pure equality,
which is a lot easier to do. What about null?
| | 05:07 | I've told you about the idea
of null values in the database.
| | 05:10 | Some columns can be set to allow null,
meaning the lack of a value, and often you want to
| | 05:16 | find out if that is the case or
not the case in your WHERE clause.
| | 05:19 | So, let's say this Employee table that I've
defined has a column called MiddleInitial,
| | 05:24 | and that has been set to allow null values
when people simply don't have a MiddleInitial.
| | 05:30 | If I wanted to find those people, I might
first think to write something like this,
| | 05:35 | WHERE MiddleInitial = NULL, but actually equals
is not a good word to use when we're checking
| | 05:41 | for the existence of null, because it's
not equal to NULL because NULL isn't a value.
| | 05:46 | It's not equal to anything.
What we want instead is the word IS.
| | 05:51 | So instead of MiddleInitial = NULL,
it's WHERE MiddleInitial IS NULL.
| | 05:56 | And this would bring back all the rows where
there had been no values stored in that column.
| | 06:01 | Conversely, if I'm just interested in the
rows that have a value in that column where
| | 06:05 | we can say instead IS NOT NULL and execute
that, and now we'd only get the rows where
| | 06:11 | that column actually had a value for each row.
| | 06:14 | And this covers the basics of the most
commonly used options in WHERE clauses.
| | 06:18 | I'm not trying to cover everything.
| | 06:20 | Take a look at SQL Essential
Training for a few more options.
| | 06:25 |
| | Collapse this transcript |
| Sorting query results| 00:00 | The way that the database will return the
results of your query is not always what you
| | 00:04 | want, so let's see how to sort those results.
| | 00:07 | In this simple example I have a Product table in
my database, and I've got a very straightforward
| | 00:13 | SELECT statement that's going to select
three of the columns Description, ListPrice, and
| | 00:16 | Color from the Product table,
meaning, give me all the rows.
| | 00:20 | I haven't even applied a WHERE clause on this.
| | 00:22 | So, get everything returned whether that's
a thousand rows or a million rows, but there
| | 00:27 | will be no inherent order in these.
| | 00:29 | The way that they are going to be returned
is currently more to do with the internal
| | 00:33 | structure of the database.
| | 00:34 | Now, it doesn't necessarily
match what I might find most useful.
| | 00:38 | So, let's say what I want to do is find
out the most expensive products that I have.
| | 00:43 | I would get that data in my results, but
not presented to me in an easy to scan way.
| | 00:48 | I'd like to see most
expensive first, cheapest last.
| | 00:51 | And I can do this with another optional
keyword in an SQL query, which I'll put right at the
| | 00:57 | end of this statement after the FROM clause.
| | 01:00 | So, I'm going to add the keyword Order By,
and it is written as two separate words.
| | 01:05 | The question is Order By what?
| | 01:07 | In this case, I'd like to order by the values in the
column called, ListPrice, whatever those values are.
| | 01:13 | So, I'll use the name of
that column, ORDER BY ListPrice.
| | 01:16 | Now by default, ordering is in ascending order,
which would mean the row with the smallest
| | 01:22 | ListPrice would be arranged first.
| | 01:25 | If I want to order by the most expensive first,
I need to make this descending order and to
| | 01:30 | do that, I just type in
the word DESC afterwards.
| | 01:34 | There is an ASC keyword for
ascending, but ascending is the default.
| | 01:39 | The results of these will come back, and
this time around, it's going to bring back the
| | 01:43 | same number of rows, but it will
order them by ListPrice descending.
| | 01:47 | Now you can also pick
multiple columns to ORDER BY.
| | 01:51 | So if in this example, I was writing a simple
SELECT statement to select all the rows from
| | 01:56 | the Employee table, in this case where a
Salary is greater than 50,000, because we can
| | 02:00 | do WHERE clause as well as an ORDER BY.
I'll do an Order By LastName, FirstName.
| | 02:07 | I didn't use the word DESC or ASC,
so they're both going to be ascending.
| | 02:11 | So, the results will come back, again,
it doesn't matter how many rows they were.
| | 02:16 | We're going to first be ordering by
LastName all the way through these results.
| | 02:21 | But then wherever the LastName is the same,
we're going to do a sub ordering within it
| | 02:26 | and in this case we'll order by FirstName.
| | 02:29 | So as you see a very simple format to start
to impose some kind of structure on the results
| | 02:34 | you're getting back in your query.
| | 02:39 |
| | Collapse this transcript |
| Using aggregate functions| 00:00 | Next up, we have a few more SQL keywords
to work with and these are all grouped under
| | 00:04 | the term aggregate functions, which
doesn't really suggest how useful they can be.
| | 00:09 | An aggregate or grouping function means it
will perform some kind of calculation on a
| | 00:13 | set of data that we describe,
but return a single value.
| | 00:17 | Now, what that single value is
depends on which aggregate function we use.
| | 00:21 | So, let's take a look.
| | 00:22 | Now, we've seen already how we can do a simple
SELECT statement like this one, and this will
| | 00:27 | return everything, all our columns, all our rows,
whether that's five employee rows or 5,000 or 500,000.
| | 00:36 | But what if that number itself was
the piece of information I wanted?
| | 00:40 | What if I just wanted you to know
how many rows are in this table?
| | 00:43 | I don't need anything else.
I just want to know how many rows.
| | 00:46 | Well, I can do that by using an
Aggregate function in SQL called Count.
| | 00:50 | And what I'm going to do is just
change the SELECT * to SELECT COUNT (*).
| | 00:57 | Count everything in the Employee table.
| | 00:59 | If I execute this, it will COUNT all the rows and just
return a single value just, in this case 547, let's say.
| | 01:07 | Now, you could of course use a WHERE clause
if you wanted to restrict the results to just
| | 01:12 | counting the number of employee rows that have a
Salary greater than 5,000, we get a different result.
| | 01:18 | That is all COUNT will do for you is just COUNT
the number of rows for this particular condition.
| | 01:24 | Now sometimes, however, the
COUNT isn't what you want.
| | 01:27 | In the previous video on Sorting, I used the
example of selecting Products and using the
| | 01:33 | Order By clause to sort them by,
in this case ListPrice descending.
| | 01:37 | But what if the only reason I was doing that
was to find out what the maximum ListPrice was?
| | 01:42 | Well, instead of using this Order By
descending and looking at the top row, I could instead
| | 01:47 | just do something like this. SELECT MAX.
| | 01:50 | Now instead of saying SELECT MAX with the
asterisk, saying SELECT MAX of everything,
| | 01:55 | I'm just focused on one particular column.
| | 01:57 | What's the maximum value of
ListPrice in the entire Product table?
| | 02:02 | In this case, we'll bring back $699.
| | 02:04 | Well, if we have MAX for maximum, it's a
pretty good guess that we're also going to have MIN
| | 02:09 | for minimum, and we do.
We also have A-V-G for average.
| | 02:13 | We would add up all the values in the ListPrice column
divided by the number of rows and return that single value.
| | 02:21 | While we've seen COUNT,
there's also an option of SUM.
| | 02:24 | SUM, instead of counting the
rows will total all the values up.
| | 02:29 | So, in this case with this statement, we're
looking for a CustomerID equal to 854 that
| | 02:34 | will find all the rows for that customer and
then add together all the values in the total
| | 02:39 | due column. And that's one of the great
things about working with Aggregate functions that
| | 02:44 | they really don't allow
an awful lot of complexity.
| | 02:47 | They are going to return a single number.
| | 02:49 | They need to be very
straightforward to customize.
| | 02:51 | However, we can take them one step further.
| | 02:54 | So, we've seen how we can use the idea of
SELECT COUNT (*) to get the number of rows.
| | 03:01 | We can use it in
conjunction with a WHERE clause.
| | 03:04 | In this case, COUNT up the number of rows where the color
is equal to 'Red', and we'll get some result back.
| | 03:10 | But if I wanted to know not just the COUNT
of Red Products or All Products, but I want
| | 03:15 | to know how many Red Products we have, how
many Black Products we have, Silver, Gold
| | 03:19 | and so on, well, I could create multiple statements
like this just changing the WHERE clause every time.
| | 03:25 | But that's not only tedious, it assumes I
know ahead of time what all the colors will
| | 03:30 | be, and we might be
adding new colors all the time.
| | 03:33 | So what I would like you to do is retrieve
both the COUNT and the Color column from that
| | 03:39 | table, but to group the results by Color.
| | 03:41 | Now, I might be tempted to just write
something like this, To select the COUNT, and the Color
| | 03:48 | column, but this will confuse SQL.
| | 03:51 | It can either give you the single COUNT of
all rows, or it can give you all the multiple
| | 03:56 | values in the Color column, but it doesn't
understand how to give you both at the same time.
| | 04:00 | However, what I can do is add another SQL
keyword, which is great for use with the Aggregate
| | 04:07 | functions, and that is GROUP BY.
So, I'm selecting here to GROUP BY Color.
| | 04:13 | If we run this SQL query, this is what
we're going to get or something very like this.
| | 04:19 | It will count up all the products for a particular
color, because we've told that to group by color.
| | 04:24 | A GROUP BY is something that only
makes sense with Aggregate functions.
| | 04:29 | You don't use GROUP BY otherwise.
| | 04:31 | Conversely on the other side, if you are
using an Aggregate function, you are pretty much
| | 04:35 | always going to use it by itself unless
you used GROUP BY, because the most you would
| | 04:40 | ever expect to return from your query is
one single value unless you are using GROUP BY
| | 04:45 | to categorize those results.
| | 04:50 |
| | Collapse this transcript |
| Joining tables| 00:00 | After a while, you're going to find that very
straight forward SQL statements where you're
| | 00:05 | just selecting a few pieces of data from a single
table, well, they end up being a little limiting.
| | 00:11 | If we've gone to all the trouble of defining
multiple tables to store our data, we do that
| | 00:15 | with the understanding that it will be possible to get
it all back using the relationships that we've described.
| | 00:20 | So, I want to have one SQL query not to be
limited to selecting from one table but to
| | 00:26 | be able to select from two different tables
or even three or more and the phrase we're
| | 00:30 | going to use is to join our tables together.
| | 00:33 | So, in the example I'm going to go through,
we have an Employee table and a Department
| | 00:38 | table, and there's a one-to-many relationship between
department and employee using the Department ID column.
| | 00:46 | Department ID is a foreign key and employee
is a primary key in Department, and it's just
| | 00:51 | so we don't store redundant
department details for each employee row.
| | 00:56 | Now the question is how would we
start to join these together in SQL?
| | 01:00 | Well, let's begin just by doing a
fairly regular SQL Statement.
| | 01:05 | So, I'm selecting a few columns, First Name,
Last Name, Hire Date, and Department ID
| | 01:10 | just from the employee table.
So, we've seen this one before.
| | 01:14 | No surprises there in the
results that we would expect.
| | 01:17 | So to start involving the other
table, the magic word here is JOIN.
| | 01:22 | So, after the Employee table, I'm going to
use the word JOIN and then say Department.
| | 01:26 | So, from Employee join Department.
| | 01:29 | This by itself isn't doing very much but we're
starting to add the necessary pieces bit by bit.
| | 01:36 | If I say I want to join these two tables together,
I can then start adding columns from the Department
| | 01:42 | table to the Select clause.
| | 01:45 | So, after Department ID, I'm going to say
common name, location that are both columns
| | 01:50 | in my Department table.
But immediately, we have a problem.
| | 01:54 | If I'm now selecting from two different tables,
it's really common that it will have a name conflict.
| | 01:59 | In this case, Department ID is going to give
us a problem here because Department ID exists
| | 02:04 | as a column in the Employee table,
and as a column in the Department table.
| | 02:09 | So, the SQL query would be very
confused, which one am I talking about?
| | 02:13 | And in this case, what I could do is just
to prefix that with the name of the table,
| | 02:17 | and it wouldn't actually really matter
which one we picked, but we need to be explicit
| | 02:23 | so that SQL doesn't get confused.
| | 02:26 | And if I wanted my SQL statement here to be
very explicit, I could do this for every column.
| | 02:31 | So, select employee.first name, employee.lastname,
employee.hiredate, department.name, department.location
| | 02:39 | and so on, but I still do have an issue.
| | 02:42 | I still couldn't run this query because I need to describe
exactly how these tables are to be joined together.
| | 02:49 | The way that we do that is to use the word ON,
and we'll use that in conjunction with JOIN.
| | 02:55 | So, it's Employee JOIN Department ON, and I name
the columns in each table and how they link together.
| | 03:03 | In this case, it is Department ID.
Employee.DepartmentID = Department.DepartmentID.
| | 03:09 | We've now described how the join happens.
| | 03:12 | If I run that statement, we'll get the results back
which are combining, joining these two tables together.
| | 03:19 | Now, one important idea here is in the
kind of join that we're doing right now.
| | 03:24 | It's only going to bring back rows where
there it is a match between the two tables.
| | 03:28 | So, if you notice in the actual employee
table at the top, the third row is Alice Bailey.
| | 03:34 | Well, Alice has a
Department ID column value of null.
| | 03:38 | She's not linking to Department.
| | 03:40 | So, what that means is when we do the join,
we will not get a row coming back for Alice
| | 03:44 | Bailey because it has to have a match, and that's
because what we're doing here is called an inner join.
| | 03:50 | So, I am using this JOIN keyword here, and
if I was to be good about this, and I usually
| | 03:56 | would be, I should use the words INNER JOIN
rather than JOIN, even though that is the default
| | 04:02 | kind of JOIN is an INNER JOIN.
| | 04:04 | Meaning, only bring back the rows
where there is a match in both tables.
| | 04:08 | So, we'll neither see the row like Alice
Bailey because she has a Department ID of null but
| | 04:14 | on the other hand nor will we see any of rows from
the Department table that don't have matching Employees.
| | 04:20 | In this simple example here, I don't have
anybody with the Department ID of one so I'm
| | 04:25 | never seeing the row that says Production
CA with the budget code of A4, but sometimes
| | 04:30 | you might want to start involving these other
rows that don't exactly match, and that would
| | 04:36 | be by creating something called an OUTER JOIN.
| | 04:40 | An OUTER JOIN means we're going to pick one of
the tables and say this one takes precedence.
| | 04:45 | We want to see all of the rows return from a particular
table and still show the matching data where possible.
| | 04:51 | Now, that might sound a little weird so let
me demonstrate what the difference would be.
| | 04:54 | So instead of using the INNER JOIN keywords,
I'm going to use OUTER JOIN, but I can't just
| | 05:00 | write OUTER JOIN I have to be explicit.
| | 05:03 | With an OUTER JOIN, you are typically saying one
of these tables takes precedence over the other.
| | 05:09 | We are interested in where they match but we
still want to get the results where they don't.
| | 05:13 | So, I would typically use the word left or
right, A LEFT OUTER JOIN or a RIGHT OUTER JOIN.
| | 05:19 | The left and right here simply means is that
the table on the left-hand side of the word
| | 05:24 | JOIN, which for us as employee, or it is the one on the
right-hand side of the word join, which will be department.
| | 05:30 | So, in this one, I'm going to do a LEFT
OUTER JOIN, and it's going to look to the left to
| | 05:34 | the word join and see employee and then
say that employee will take precedence.
| | 05:39 | Now, we get back very similar results but
we are getting back a row for Alice Bailey
| | 05:45 | even though Alice Bailey doesn't link to a
row in Department, we're getting that row back.
| | 05:49 | We're just getting null values
because there's nothing we could put there.
| | 05:53 | Conversely, what I could do instead of a
LEFT OUTER JOIN if I just change that LEFT OUTER
| | 05:57 | JOIN to RIGHT OUTER JOIN, we now put the
focus on the Department table, and we say we want
| | 06:02 | everything that matches but we want
everything from that Department table.
| | 06:07 | In this case, we would have rows coming back for the
production department which it doesn't have any associated employees.
| | 06:14 | We'd have rows coming back for the marketing
department which doesn't have any associated
| | 06:18 | employees, and we would still write out the
ones that match, and that's the RIGHT OUTER JOIN.
| | 06:24 | There is even a full OUTER Join which does both.
| | 06:27 | I'm a little pressed for space here so I
want show that but as you might imagine, it says
| | 06:32 | to include all rows from both tables in the
results matching them wherever possible but
| | 06:38 | returning rows with null values
when there's no match between them.
| | 06:42 | Now, as you start to read more and more SQL,
you probably come across a couple of different
| | 06:46 | ways that joins can be written, but this is the
recommended NC Sequel standard way of doing it.
| | 06:56 |
| | Collapse this transcript |
| Inserting, updating, and deleting| 00:00 | So far, we've been focused on using the Select statement
to read information out of a database, but that begs the question,
| | 00:06 | how did the data get in
there in the first place?
| | 00:09 | We have several different keywords in SQL for
inserting, updating, and deleting information.
| | 00:14 | Now, this goes back to the idea that any
computer system, not just databases that deals with
| | 00:20 | storing data needs to provide four fundamental
functions, the ability to Create, Read, Update,
| | 00:27 | and Delete often using the acronym CRUD.
| | 00:29 | Well, if you want to work with CRUD in
SQL, we don't use these exact words here.
| | 00:35 | There are a couple of them that are different.
| | 00:37 | We know already that when we want to read
information out of the database, we use the
| | 00:41 | SQL keyword SELECT, and if we want to create,
we're going to use the SQL keyword INSERT.
| | 00:48 | Now update and delete there's no conflict there.
They are just UPDATE and DELETE in SQL.
| | 00:53 | So, let's take a look at the
ones we haven't covered yet.
| | 00:56 | I'm going to talk about inserting first.
| | 00:58 | So, the assumption is we're inserting a new
row of data into a particular table and the
| | 01:03 | general format is INSERT INTO, then your
table name, then a list of columns and then the
| | 01:10 | values that you want to
insert into those columns.
| | 01:13 | So, if we're going to insert into this
simple imaginary Employee table, I'll use the name
| | 01:18 | of the table, and then inside parenthesis, the name of
the columns that I'm going to provide values for.
| | 01:25 | Now, you typically do this because you don't always
provide data for every single column in a new row.
| | 01:32 | Some of these columns might be null.
| | 01:34 | Some columns might provide their own
default values and quite often you might have some
| | 01:38 | columns that are automatically generating
primary keys, and you certainly don't want
| | 01:42 | to try and insert your own
values in those situations.
| | 01:46 | So, you name the columns
you're going to provide values for.
| | 01:50 | And then after the keyword
values, you provide those values.
| | 01:54 | If there are strings they need to be in single quotes,
if they are numeric key, you don't need the quotes.
| | 01:58 | The columns, and you values must
match in number, in order, and type.
| | 02:03 | You can't say you're going to provide five
columns and then only provide values for four.
| | 02:09 | And also the data types must match.
| | 02:11 | If the column is defined as a number, you
can try and insert a string into that column.
| | 02:16 | So, if I go ahead and execute this, what it's
going to do is create a new row in that table.
| | 02:22 | Now, I'm making the assumption here that the
way the database has been defined, Employee
| | 02:27 | ID will be an automatically-generating primary key.
| | 02:30 | So, we didn't need to provide
a value there. Hire Date?
| | 02:34 | We probably could have provided one, but I'm
imagining that it was set up to do a default
| | 02:39 | value of the today's date
for this row to be created.
| | 02:42 | And let's imagine that email was
allowable to have null values there.
| | 02:46 | So, if we didn't provide one, it's just null.
Now of course, here's the thing.
| | 02:51 | We don't have to rank manual insert
statements for every new row in our database.
| | 02:57 | It's much more likely that this
process will be done programmatically.
| | 03:00 | The people will be using a more pleasant user
interface to enter data, but this is the kind
| | 03:05 | of insert statement that's
going on behind the scenes.
| | 03:09 | Well, we know we can read information out of the
database, but what if we want to change something?
| | 03:14 | It's quite often that that we'll change
information in the database once it has been created,
| | 03:19 | and we use the update statement for that.
The format here is UPDATE, table name.
| | 03:24 | You're going to use the keyword SET, and
you're going to change a particular column.
| | 03:28 | So, what column and what value do you want it to be
and then almost always you'll be using a WHERE clause.
| | 03:35 | This is exactly the same as doing where in
the select statement that allows you to limit
| | 03:40 | the amount of rows that
this update will apply to.
| | 03:42 | So, an example would be, if we have that row
that we just inserted, and I wanted to update
| | 03:48 | it to change the email column,
I'd say UPDATE the Employee table.
| | 03:53 | I'm going to set the column code Email equal to,
provide the string value, and then the WHERE condition.
| | 03:59 | In this case, the easiest way would be
employee ID equals 734, and if we run that it will
| | 04:04 | reach directly into that row,
and change that one value.
| | 04:08 | Now technically, you don't have to have a
WHERE clause, but just as leaving off a WHERE
| | 04:14 | clause in the Select statement would return
all the rows in that table leaving off a WHERE
| | 04:18 | clause in an update statement would mean that you would
set this column value for every single row in that table.
| | 04:24 | Now, occasionally, that is what you want in
an update, but more often you will restrict
| | 04:28 | your updates to one row or a handful of rows.
| | 04:31 | So, we use WHERE clauses for that and
using where in an update, it's exactly the same
| | 04:36 | as using it in a select. Finally, DELETE.
| | 04:39 | Well, DELETE is actually the simplest
because it's all the about the where.
| | 04:43 | The format is DELETE FROM,
particular table name, WHERE condition.
| | 04:48 | We don't have to name any columns in this
query because delete just works on rows.
| | 04:53 | If you delete a row, you delete an entire row.
You can't delete part of a row.
| | 04:58 | So, there's no point in
naming any of the columns here.
| | 05:00 | If you just want to blank out some
information in an existing row, well, that's an update.
| | 05:05 | So, that WHERE clause and the DELETE just defines
which rows and which table you're going to delete.
| | 05:11 | So, we'd say to DELETE FROM Employee, WHERE.
In this case employee ID equals 734.
| | 05:18 | This is about as specific
as we could possibly get.
| | 05:21 | Now, you can delete multiple rows at a time
the same way that you could use a where clause
| | 05:26 | to select the multiple rows at the time,
but for heaven sake take care with your delete
| | 05:31 | statements, almost everyone who has worked
with the databases for a while has a horror
| | 05:35 | story about the DELETE statement
that was written a little to casually.
| | 05:39 | Because if you write something as simple as
this and execute it, the database management
| | 05:44 | system is just going to say, "Well, you're
the boss and go ahead and delete every single
| | 05:49 | row in this employee table."
It won't ask, are you sure?
| | 05:53 | It won't ask you to confirm anything.
It will just do it.
| | 05:57 | DELETE is a very powerful keyword.
| | 06:00 | Now a good practice when working with DELETE
and also when working with update is to create
| | 06:05 | it first as a select statement.
| | 06:07 | So in this case, I'll do a select staff,
an employee where employee ID equals 734 and
| | 06:12 | just confirm that the result I would expect
back, in this case one row, well, that's what
| | 06:18 | would also affect if I was doing an update, and
that's what I'd affect if I was doing a delete.
| | 06:22 | So, I'll create it as a select first
and then change the select box to delete.
| | 06:29 | Execute that, and we go to bit better faith
that what we're actually affecting is just one row.
| | 06:34 | So, to good practice to make sure the where
clause is only selecting the rows you expect
| | 06:39 | before you turn it into a
delete or an update statement.
| | 06:44 |
| | Collapse this transcript |
| The data definition language| 00:00 | Now the majority of SQL that you'll write
will be things like SELECT statements,
| | 00:05 | INSERTS, UPDATES, and DELETES
but that's not the whole picture.
| | 00:09 | These keywords are sometimes
lumped together as the DML part of SQL.
| | 00:14 | DML standing for Data Manipulation Language,
and that's an impressive sounding phrase for
| | 00:20 | something that's really not all
that important of distinction.
| | 00:22 | It simply means that part of the sequel language that deals
with working with and manipulating data in your table.
| | 00:30 | So, selecting it, inserting it, updating it,
deleting it, but none of these words actually
| | 00:35 | allow us to alter the
structure of the database itself.
| | 00:38 | So, how do we create a new table?
How would we say what columns exist?
| | 00:43 | How would we define primary keys?
We can't use any of these words for that.
| | 00:47 | But that can be done in SQL using a part of
the language that is lumped together under
| | 00:52 | the phrase DDL or Data Definition Language.
| | 00:57 | The three key words there
are Create, Alter, and Drop.
| | 01:02 | These words let us change the
structure of the database itself.
| | 01:06 | Let's take a look at a couple of them.
| | 01:08 | If I want to write SQL to create a new
table, this is the format that I'll use.
| | 01:13 | The word CREATE, and then the table name
you just make it up, it's whatever you want
| | 01:17 | to call it and then in parentheses,
the definition of your columns.
| | 01:21 | This is after all what a table is it's simply
naming the columns and giving them data types
| | 01:27 | in any rules they must comply to.
| | 01:29 | So, if I wanted to create a new employee table,
I would say CREATE Employee and then inside
| | 01:34 | parenthesis, here's how I
would provide column definitions.
| | 01:38 | We start off with EmployeeID as
an INTEGER which is PRIMARY KEY.
| | 01:43 | And these are key words we're using to tell SQL the
importance of and the rules these columns must comply to.
| | 01:51 | I'm not going to get into the specific of
these for reasons I'll cover in just a moment,
| | 01:56 | but the general format should be at least
understandable about what this would do.
| | 02:00 | It's going to create a new table with five
columns, an integer, three VARCHAR, and that
| | 02:06 | means the variable length character field.
| | 02:08 | I can recognize even
phrases like NOT NULL or NULL.
| | 02:11 | So, it looks like we don't have to put anything
in the department column and then we're finishing
| | 02:15 | off with salary as an integer.
| | 02:17 | Now, obviously in a production database,
these can be far more complex than this but this
| | 02:22 | is the basics of how it's done.
| | 02:24 | Now, if we move on a little bit and realize
we need to change the definition of it, we
| | 02:28 | can use the key word, ALTER.
Now, typically would alter a table.
| | 02:32 | If you give it the table name and then you give it a
bit of information about what you're trying to change.
| | 02:37 | In this case, I'm saying
I'm adding a new column.
| | 02:40 | That column is called Email.
| | 02:41 | It's going to be a variable length
character string of 100 characters.
| | 02:47 | If I have a problem with what I created earlier, I
can use the last of the three keywords which is DROP.
| | 02:52 | And we would simply say
something like DROP TABLE Employee.
| | 02:57 | We wouldn't need any WHERE
clauses, any column definitions.
| | 03:00 | We're simply saying destroy the entire table, again,
something to be very, very careful about working with.
| | 03:06 | Now, I wanted to introduce these phrases,
but I'm really not going to spend much more
| | 03:10 | time on these words here for two reasons.
| | 03:13 | First, many people who work on the actual
design and structure of the database will
| | 03:18 | use a visual application like MySQL
Workbench or Sequel Server Management Studio, rather
| | 03:24 | than hand coding create an Alter statements.
| | 03:28 | Second, if you do need to go into more detail,
take a look at our SQL Essential Training course.
| | 03:33 | It's true that if you're in a database
administration role you all might then spend a lot of time
| | 03:38 | with these kinds of data definition statements,
Create, Alter, and Drop, but regular day to
| | 03:44 | day developers working in
these databases typically don't.
| | 03:48 | You shouldn't be spending most of your
time changing the definition of the database.
| | 03:52 | You should be spending most of your
time manipulating the actual data in it.
| | 03:57 | So, that's why we're not going to focus on that.
| | 04:00 | And it's worth mentioning that there are a
couple of other keywords in SQL that are typically
| | 04:04 | lumped together under the
idea of data control keywords.
| | 04:08 | This is where you can either grant or
revoke permissions for people in the database.
| | 04:13 | This is not an area we're going to cover
in this course because that really gets into
| | 04:18 | the specifics of things like the operating
system and the authentication schemes that
| | 04:23 | you're using for your particular database.
So, it's just not general content.
| | 04:28 | You'll get into that for a
specific database management system.
| | 04:32 | But it's worthwhile to know that SQL
itself is broken up into these larger categories
| | 04:38 | of the different things
you're likely to do with it.
| | 04:40 | Most of your time manipulating data, some of
the time defining the actual database itself,
| | 04:45 | and a little bit of the time controlling
who has access to different parts of it.
| | 04:50 |
| | Collapse this transcript |
|
|
7. Database Modeling: Indexing and OptimizationUnderstanding indexes| 00:00 | As your database starts to grow, there's a part of the
database design that becomes more and more important,
| | 00:05 | and that's creating and using indexes.
| | 00:08 | An index center database is like
an index at the back of a textbook.
| | 00:12 | It will help you find things in that book.
| | 00:15 | Say if I have a 500-page book on database
management, and I want to find the content
| | 00:20 | dealing just with date columns, I can
look in the back in the index for date.
| | 00:26 | It tells me that's on page 124,
and I turn directly to that page.
| | 00:30 | Indexes are all about speed of access.
| | 00:33 | I could have found that content by
going through the book page by page.
| | 00:37 | So the index isn't adding any content.
It's just helping me find it.
| | 00:40 | Here's the most essential ideas.
| | 00:42 | See, the data in your tables is
inherently unstructured and unordered.
| | 00:48 | Sure, all the data in one row is kept together,
but there's no particular enforced order to
| | 00:54 | those rows unless you say otherwise.
| | 00:56 | And yes, when we write queries, we can use
the order by clause to impose a sequence on
| | 01:01 | our results, but that's not the issue here.
| | 01:04 | The issue is getting directly to certain content like
getting directly to one specific row in a large table.
| | 01:12 | You might think, well, hang on a second.
| | 01:14 | Didn't we take care of that by adding a
primary key to our table? Well, yes and no.
| | 01:20 | The primary key does let us uniquely
identify a row, but that's a very different question
| | 01:25 | from does it let us find that row really fast when
it's somewhere in the middle of 500,000 other rows.
| | 01:32 | That's the problem we're
trying to fix with indexes.
| | 01:35 | Like with reading a textbook, I don't want to
have to leaf through every single page every time.
| | 01:39 | I want to be able to jump
directly to the relevant point.
| | 01:43 | So we need an index.
| | 01:44 | The most basic, the first index that's created,
the primary index on any table is what's called
| | 01:51 | the clustered index, and that means pick a
column as the clustered index and the database
| | 01:57 | will order the data in that
table based on that column.
| | 02:02 | Meaning that on that physical disk itself,
where these rows of data are stored as bits
| | 02:07 | and bytes, they're actually sequenced that way.
| | 02:10 | So the clustered index is almost like having
a residential phonebook or White Pages where
| | 02:15 | the clustered index in a
phonebook would be last name.
| | 02:18 | So everyone is listed in the book
in alphabetical order by last name.
| | 02:22 | The actual data is
structured and presented that way.
| | 02:26 | You don't have to look in the back of the book
to find out where the names beginning with D are.
| | 02:30 | You know where they are.
| | 02:31 | They're towards the front between the Cs and
the Es, because the data is ordered that way.
| | 02:37 | The most common clustered index in any database
table is the primary key column, whether that's
| | 02:43 | an automatically generated integer
or a unique string or anything else.
| | 02:48 | Actually, most database management systems
when you define a new table will automatically
| | 02:54 | make the primary key the
clustered index unless you say otherwise.
| | 02:59 | That is usually the best option, because we
already chose the primary key as a significant
| | 03:04 | way to get to the individual rows.
| | 03:06 | But if you found yourself accessing the
data primarily using another column, you might
| | 03:12 | change that column to be
the clustered index instead.
| | 03:14 | See, each table can only
have one clustered index.
| | 03:19 | The same way the names in a phonebook can
only be sorted one way, but what you can have
| | 03:24 | more of is non-clustered or unclustered indexes.
Here's the situation.
| | 03:30 | I have a basic customer table here with a
primary key on the CustomerID column and the
| | 03:35 | clustered index is also on that column.
| | 03:38 | So if I write a query that uses that, I
select where CustomerID equals query, it's going
| | 03:44 | to be very fast together.
There are two things going on.
| | 03:47 | The database engine knows how to identify
that row, because it's a primary key, but
| | 03:52 | it knows how to get to that row quickly,
because it's a clustered index, but on the other hand,
| | 03:57 | if I figure out that we often also
access this table using last names,
| | 04:01 | so if I write a lot of queries that looks
something this, select Staff and Customer
| | 04:05 | where LastName equals Smith, well, that's a different matter,
because the database management system has no idea physically
| | 04:13 | where on this table where that
particular last name might turn up.
| | 04:18 | That means a query like this would require
the database management system to start at
| | 04:23 | the first row and read through every single
one, row by row looking for that piece of data.
| | 04:30 | This is sometimes called a full table scan.
| | 04:33 | It's a very inefficient way to get to your data,
and the more rows you have, the worse it gets.
| | 04:38 | It wouldn't be bad with a dozen or a few
hundred rows, but once you get to thousands, tens
| | 04:44 | of thousands, hundreds of
thousands, it's highly inefficient.
| | 04:47 | If I have to do this kind of query a lot where
we're actually querying on an unindexed column,
| | 04:53 | it's going to be a problem.
Here's one way I can fix it.
| | 04:56 | I can create a secondary
index, a non-clustered index.
| | 05:02 | This is more like having an
index at the back of the textbook.
| | 05:06 | It's created as a separate piece by
itself with its own meaningful order.
| | 05:11 | So in this case, I'm imagining I have this
index existing, it's sorting now by last name
| | 05:16 | ascending, the way that we can't actually do in the
regular table, because we're already sorting by CustomerID.
| | 05:23 | We're basically creating a map
here of how we can go to LastName.
| | 05:27 | In this case say Smith,
find that particular name.
| | 05:31 | It's a much easier to get to in the index,
because the last name is ordered alphabetically.
| | 05:37 | Then we find the location of that in the
regular table, and we can jump directly to that place.
| | 05:43 | Now it's not quite as quick as using the
clustered index as we still do have to read from one
| | 05:49 | place then jump to another, but it's
much quicker than a full table scan.
| | 05:54 | Very useful if you know there's a column
you're going to use a lot in your queries.
| | 05:59 | A common thought from people new to
database design at this time is usually, well,
| | 06:04 | why don't we just index everything?
| | 06:06 | Why don't we add a non-clustered index to every
column so whatever query we write is nice and fast?
| | 06:13 | That is a really bad idea. Here's why.
Every index has a cost.
| | 06:20 | Indexes are a benefit when reading data, but
they're a detriment when writing or changing it,
| | 06:26 | because they must be maintained.
| | 06:28 | So let's say, I've got a table here, pretty typical employee
table, and I've got two non-clustered indexes on this table.
| | 06:37 | So the clustered index, the way the data
is actually sorted is based on EmployeeID.
| | 06:42 | And I have a non-clustered index on LastName,
and a non-clustered index on the FirstName.
| | 06:48 | Well, this is pretty good.
| | 06:49 | It means when we do a select, I can go by
EmployeeID, I can go by FirstName, I can go
| | 06:53 | by last name, and that will be quite fast.
| | 06:56 | But when we write to this table, when we insert a row,
we're not only writing data into the table itself.
| | 07:03 | And if your clustered index is on an
automatically incrementing number we just write to the end
| | 07:08 | which is easy, but every insert we do is going to require
another change, a write into every non-clustered index.
| | 07:16 | If I have added this new row of
Brenda Daniels, I'll add it to the table,
| | 07:20 | I now need to go ahead and make a change to
the non-clustered index for LastName to insert
| | 07:25 | Daniels in there and point to the
EmployeeID or the location that she's at.
| | 07:30 | Then do another insert to make sure that Brenda is
inserted on the non-clustered index for FirstName.
| | 07:35 | So if you started adding even more of this.
| | 07:38 | If you have say five non-clustered indexes
as well as your clustered one, that means
| | 07:43 | that one insert which started off as one
physical write operation to the database disk is now
| | 07:49 | six write operations, and you have made
things significantly more inefficient.
| | 07:54 | That's why you don't put indexes on everything.
| | 07:57 | You only put them on the columns that
you really need and use all the time.
| | 08:02 | It's far better to do an occasional table scan
than have multiple indexes that you don't really use.
| | 08:09 | Now one of the reasons I don't talk much about
indexing up front in the early modeling section
| | 08:14 | of the course is that it's an easy mistake to think
that you decide your indexes up front and then you're done.
| | 08:20 | Well, it's true you often do select your
primary clustered index then with your primary key,
| | 08:26 | but working with indexes as
a much more ongoing thing.
| | 08:30 | Because you often can't tell what the best
non-clustered secondary indexes will be until
| | 08:36 | your database has been in use for a while,
and you figure it out how people are really
| | 08:40 | using it and what kind of queries they're
really running rather than how you hoped or
| | 08:44 | thought that they'd use it.
So indexing is a trade-off.
| | 08:48 | It improves read performance, but at the expense
of writing to the database and also size as
| | 08:54 | every index will require
its own space on the drive.
| | 08:58 | In a large database, indexing is something
the database administrator may spend a lot
| | 09:03 | of time working on as the system grows.
| | 09:06 | But there's one good thing, changing and
tweaking database indexes is something that can be
| | 09:12 | done without breaking any
applications that use that database.
| | 09:16 | Because as long as you're not changing column names,
all your existing queries should work exactly the same.
| | 09:21 | They'll just get faster. Hopefully.
| | 09:26 |
| | Collapse this transcript |
| Understanding write conflicts| 00:00 | Earlier in the course I talked
about transactions and the ACID test.
| | 00:03 | Now it's time to cover a
little bit more about them.
| | 00:06 | If you need to refresh your memory, take a
look at that earlier section where I talk
| | 00:10 | about the characteristics of a transaction, because here
I'm going to focus just on one part, the idea of isolation.
| | 00:17 | The idea that when simultaneous users are
interacting with the database at the same
| | 00:22 | time, how do we stop them from
treading on each other's toes?
| | 00:25 | This is the inherent problem of
concurrency, or concurrent users.
| | 00:30 | Well, first, let's understand how they might
conflict with each other. Here's an example.
| | 00:36 | We own a small bank, and we've got Alice and
Bob who are two customers in our system and
| | 00:40 | they have three accounts, a joint account,
and a separate account each for incidentals.
| | 00:46 | These are each a different row in this account table,
about as simple as I could possibly make it here.
| | 00:53 | Every so often, Alice and Bob want to transfer
money from their joint account into the individual one.
| | 00:59 | Every time they do, the
process goes something like this.
| | 01:02 | I'm going to assume that
Alice is doing this transaction.
| | 01:05 | I'm just going write pseudo code here, because
it's not the SQL I want to talk about, it's the steps.
| | 01:10 | We get the balance of the joint account.
We reach into the table.
| | 01:14 | Okay, joint is $10,000.
| | 01:15 | We then get balance of
Alice's account which is $50.
| | 01:19 | Then we'll perform an operation to update
the balance of the joint account to what I
| | 01:23 | just retrieved which was $10,000 minus $1,000.
| | 01:26 | So we're reaching into the database,
setting it to $9,000, and then we'll reach back in
| | 01:30 | and update the balance of Alice's
account to $1,050. All well and good.
| | 01:36 | We've subtracted $1,000 from joint.
We've added $1,000 to Alice.
| | 01:40 | Then one day, they both log on to the website at
the same time and both attempt to do this move.
| | 01:48 | Alice starts her operation.
| | 01:50 | She reads the balance of the
joint account. It's $10,000.
| | 01:53 | Bob starts his. He reads the balance of
the joint account. It's also $10,000.
| | 01:57 | No changes have happened yet.
Alice gets her balance, which is $50.
| | 02:01 | Bob gets his which, is $45.
Then Alice issues the update to joint.
| | 02:07 | She'd retrieved it as $10,000.
| | 02:08 | She will subtract $1,000 from that,
which is $9,000, she changes the database.
| | 02:13 | Bob also tries to perform that operation.
He believes that joint account is $10,000.
| | 02:18 | So he will also set that to $9,000.
| | 02:21 | Then Alice updates her
account adding $1,000 to it.
| | 02:24 | Bob updates his adding $1,000 to that.
| | 02:27 | Now the end result is that the joint account
is down $1,000, but both Alice and Bob's account
| | 02:32 | have been credited $1,000 each.
The balance should be $8,000, not $9,000.
| | 02:38 | Now Alice and Bob might be perfectly happy
with this, but our little bank will stay little
| | 02:42 | if we don't take care of the situation.
This is what often called a race condition.
| | 02:46 | It's a conflict where people two people or
two programming threads are doing very similar
| | 02:51 | steps and just getting a
little bit ahead of each other.
| | 02:54 | They're trying to affect the same data.
| | 02:56 | When they're doing it together, you'll end
up with a very different outcome to what would
| | 03:00 | have happened if they done exactly the same steps,
but Alice did all hers first and then Bob did his.
| | 03:07 | So the first step to fixing this
situation as we make these atomic.
| | 03:12 | We're making these several actions grouped into
one indivisible unit by making them transactions.
| | 03:19 | We do this by adding SQL
keywords at the start and end.
| | 03:23 | In this case, we use the
words Begin Transaction.
| | 03:26 | We'll do all the operations we want to do and then we write
the SQL keyword commit to say Our transaction is done.
| | 03:33 | Now, these are SQL keywords that deal with
transactions, although you will find that
| | 03:37 | you write slightly different code to create a
transaction in the different database management systems.
| | 03:44 | Sometimes it's just Begin, sometimes it's
just Begin Work, sometimes it's Begin Transaction
| | 03:48 | or Start the Transaction, but again
it's the core concept we're going for here.
| | 03:54 | We're starting a unit of work.
| | 03:56 | We want to go all the way through
and commit if everything is successful.
| | 04:00 | If there's a problem anywhere in the transaction,
we want the database to roll back to the start
| | 04:05 | of the transaction as if no
part of it had ever happened.
| | 04:08 | I said this is step one, because this by
itself may not help our race condition problem.
| | 04:14 | If both of these transactions are still
executing at exactly the same time and they're allowed
| | 04:19 | to do that, because we didn't
actually have any errors occur.
| | 04:24 | The issue is that Bob is being allowed to
read from the same table that Alice is in
| | 04:30 | the process of changing, but she
hasn't finished changing it yet.
| | 04:34 | This is often what's
referred to as a dirty read.
| | 04:37 | There's a transaction going
on that's partway through it.
| | 04:40 | It's going to be changing this data, but we're allowed
to read from somewhere in the middle of that transaction.
| | 04:46 | Often what happens is we don't want
to just treat these all as one unit.
| | 04:50 | Just making the transaction is not good enough.
| | 04:52 | We also want some kind of locking to occur
so that we are prevented from simultaneously
| | 04:57 | changing the same data at least
until the transaction is over.
| | 05:02 | Now one way of doing this is what's referred to
as pessimistic locking where we are pessimistic
| | 05:08 | that transactions will conflict with each other.
| | 05:11 | So as soon as a transaction starts, we're
going to lock the data that it's referring
| | 05:15 | to, and only unlock it
once the transaction commits.
| | 05:19 | Here's how this would work.
| | 05:21 | Going back to the beginning situation we are
creating transactions for both Alice and Bob.
| | 05:26 | So Alice's transaction starts.
| | 05:28 | She reads the balance of the joint
account and she will lock that information.
| | 05:32 | Now Bob would then attempt to read in his
transaction, but he won't be allowed to. He'll be refused.
| | 05:38 | The database might return an error,
but more typically we just use blocking.
| | 05:43 | We're not returning anything.
| | 05:45 | We're just making Bob wait, because
hopefully the entire transaction on the left is going
| | 05:49 | to take a tenth of a second.
We'll commit and everything unlocks.
| | 05:53 | If it's more complex, if we've got multiple
updates, full table scans, the database is
| | 05:57 | slow and not indexed well, well, Bob may end
up waiting for a lot longer or the database
| | 06:03 | might return an error or the application that
Bob is using to access the database might time out.
| | 06:08 | Anyway, let's move on because we should
hopefully only have milliseconds going on here.
| | 06:14 | Bob is waiting, Alice goes ahead.
She gets the balance of her account.
| | 06:19 | She reaches in and changes
the joint account to $9,000.
| | 06:23 | She then updates her
account and then hits commit.
| | 06:26 | Her transaction is over, and that means any
lock that she was applying is now unlocked.
| | 06:32 | Bob will now get that message back.
The balance is $9,000.
| | 06:36 | He is allowed to continue to get his balance.
He can subtract 1,000 from that.
| | 06:40 | We're down to $8,000 as we should be.
We update, and we commit his changes.
| | 06:46 | The thing is pessimistic locking is great for
Alice, but not necessarily so great for Bob.
| | 06:52 | If you always do pessimistic locking, you
may end up with a lot of read errors or a lot
| | 06:57 | of people having to wait.
| | 06:59 | Although with our situation we know there's a
conflict, there is lot of cases where transactions
| | 07:03 | aren't really conflicting, because
we're reading multiple pieces of data.
| | 07:08 | We're not always changing everything.
| | 07:10 | One option with some database systems is to
allow what's called optimistic locking, which
| | 07:15 | means Bob would be allowed to read from the
table while the transaction is going on, because
| | 07:20 | we are optimistic that
there won't be a conflict.
| | 07:23 | We begin with the same situation.
Alice and Bob both start.
| | 07:27 | Alice starts her transaction and
reads the joint account balance.
| | 07:30 | Bob tries, but where he was refused a moment
ago with pessimistic locking, he's going to
| | 07:34 | be allowed to right now. Both continue.
Alice will read hers, Bob will read his.
| | 07:41 | Alice will change the
joint account down by $9,000.
| | 07:44 | Now here's the difference.
Bob will try and update that too.
| | 07:48 | He thinks he is going to
update it from $10,000 to $9,000.
| | 07:52 | What's going to happen with optimistic locking
is the database will immediately realize that
| | 07:57 | the contents of this row has changed since he
read it, and it's going to actually issue an error.
| | 08:02 | It will detect that there was a dirty read,
and this transaction can't go any further,
| | 08:06 | because it doesn't know
how to handle that conflict.
| | 08:08 | In most cases, it will just roll back
to the beginning of the transaction.
| | 08:12 | Bob would get some kind of error that
the application would need to deal with.
| | 08:16 | Alice can continue on, because her
transaction hasn't thrown any kind of issue.
| | 08:21 | She can commit with her changes in place.
| | 08:25 | When you're working with locking, both
pessimistic and optimistic, this is one of the areas where
| | 08:29 | it's implemented a little differently across Database
Management Systems, although the principles are the same.
| | 08:35 | Now with some DBMS, you have specific keywords
to explicitly lock data that you're accessing.
| | 08:41 | With others, there's default behavior, what's
called the default isolation level of a database,
| | 08:47 | meaning, do we naturally do pessimistic locking or
optimistic locking or perhaps something in-between.
| | 08:53 | We will lock any inserts and updates in the transaction until
they commit, but not the select, just the basic read.
| | 09:00 | Take a look at your chosen database management
system when it comes time to do this.
| | 09:05 |
| | Collapse this transcript |
| Understanding stored procedures and injection attacks| 00:00 | I've shown a lot of examples of detached SQL
Queries that you might write casually as needed
| | 00:06 | in a database application
or from a Command Line.
| | 00:09 | However, as you start to do more complex
queries, you want to keep and reuse them.
| | 00:13 | You can do this by creating a stored procedure.
| | 00:16 | A stored procedure, sometimes
called a sproc, is nothing remarkable.
| | 00:19 | It's a chunk of SQL that can be written,
given a name, and stored directly in the database,
| | 00:25 | kind of like creating a function or method
in other programming languages, and then it
| | 00:30 | can be executed multiple times.
So here's the most basic way.
| | 00:34 | We have some SQL that we've
written, and we want to reuse.
| | 00:39 | Okay, this one isn't all that complex, but
let's imagine it was a bit more involved than this.
| | 00:44 | All we can then do is take this and surround it with
a little bit of code that will create a named stored procedure.
| | 00:51 | In this case, the procedure is called HighlyPaid.
| | 00:54 | What we can then do a little later is call it using
the keyword CALL and the name of the stored procedure.
| | 01:00 | That's going to execute it and return the
results as if we just typed the SQL out in full.
| | 01:05 | This is going to be the
main way that you reuse SQL.
| | 01:09 | You could call it form within a Database Administration
Application, you could call it from an application
| | 01:15 | you were writing yourself in some other
programming language or from a reporting application.
| | 01:20 | Now, it is another area where the actual
creation of a stored procedure is a little different
| | 01:25 | across the Database Management Systems,
although there are core similarities.
| | 01:29 | A stored procedure needs a name, but
we can take it a little bit further.
| | 01:34 | Let's say if you're writing slightly more
complex where clauses, in this case selecting
| | 01:39 | all the employees for a particular department
and ordering them by last name and first name,
| | 01:44 | you probably want to parameterize this in
some way so you don't have to always return
| | 01:48 | just the people from the sales department.
| | 01:50 | We can create this as a stored procedure,
I'll call it EmployeesInDept, and then inside
| | 01:56 | the parenthesis here, I'm
going to define a parameter.
| | 02:00 | I'm using the SQL keyword IN to say that
this is a parameter coming into the stored procedure.
| | 02:05 | We can also have OUT, but I'm
not going to cover that here.
| | 02:08 | I'll call this parameter dept.
| | 02:10 | I can call it whatever I want and say that
it is a VARCHAR with a maximum length of 50.
| | 02:16 | I can then use that dept and just replace
the string sales, and this is our parameterized
| | 02:24 | stored procedure, very much like creating a function
or a method in a lot of other programming languages.
| | 02:30 | It simply means when I want to call this later
on, I call the name the of the stored procedure
| | 02:35 | passing in whatever parameter I want it
to be replaced when the SQL actually runs.
| | 02:41 | We can take it even further than this.
| | 02:42 | We can give our SQL parameters
variables, even flow and error handling.
| | 02:48 | It's very much like turning it into a mini-
program, but this would be where we diverge into the
| | 02:53 | specifics of the particular Database
Management Systems, which I want to avoid here because
| | 02:58 | they handle things like parameters
and error handling a little differently.
| | 03:03 | But it is very common to require developers
to write their SQL in a stored procedure.
| | 03:08 | I've worked in several organizations which
didn't even allow developers to manually write
| | 03:13 | SQL inserts, updates, and delete
statements directly in their programs.
| | 03:18 | They had to write them as stored procedures.
| | 03:21 | One of the benefits there is all the SQL is
actually staying in the database and can then
| | 03:26 | be controlled and hopefully
optimized by the database administrator.
| | 03:30 | And using a stored procedure is also a level of
protection against something called an SQL Injection Attack.
| | 03:38 | These were and still are common
vectors of attack against websites.
| | 03:42 | Now, let's say you've got an
order look-up page on your website.
| | 03:46 | The customer sees something like
this, a basic textbox and a button.
| | 03:51 | This could be a more substantial form, it
could be a login form, it could be a sales
| | 03:55 | form, but let's take something simple like this.
| | 03:58 | What you're imagining is that they will enter
in their customer ID and hit the Submit button.
| | 04:03 | You're going to take that and in your web
application code, whatever that is PHP, C#,
| | 04:09 | VB.NET, Java, you're going to take it and
create a custom SQL string, something like this.
| | 04:16 | You're actually concatenating
a few things together.
| | 04:19 | So SELECT * FROM Orders WHERE CustomerID =,
and then you'll add the contents of this text
| | 04:24 | box and then add another little bit to
finish off the last quote, and you'll then combine
| | 04:30 | that into a string that will look something
like this and execute that SQL, all well and
| | 04:36 | good, and you'll get
results back for that customer.
| | 04:40 | Here's the problem.
| | 04:42 | What happens when the user who sees this page
is malicious and they enter something like this?
| | 04:49 | Looks a little strange, so x'; what's going on?
| | 04:54 | Well, really the important thing is in
the middle here, the SELECT * FROM Users.
| | 04:59 | What someone can do if they're malicious
is they can wonder whether your website is
| | 05:03 | liable or responsive to SQL injection attacks.
| | 05:08 | If it is, that usually means that this is going to
be constructed in some kind of concatenated string.
| | 05:14 | So I'll put in this X with the quote after it
to try and basically finish off that first,
| | 05:21 | the standard, the built-in SQL command.
| | 05:23 | So, that comes Select * FROM Orders WHERE
CustomerID = 'x' with the quotes surrounding it.
| | 05:30 | Then there's the semicolon
to finish that statement.
| | 05:34 | And then I'm writing my own SQL statement
that I want to do, SELECT * FROM Users.
| | 05:39 | Finish that with a Semicolon,
and then I'll put in these two dashes.
| | 05:43 | That's actually a SQL comment, and it means
to tell this to ignore that final quote, which
| | 05:49 | the string was trying to
concatenate on the end in our code.
| | 05:52 | So, it's a very slightly
deceptive way of dealing with this.
| | 05:56 | What that means is when we execute the SQL, it will
essentially pass through three SQL commands to the database.
| | 06:02 | The first one which may not return any values,
the second one that we've just injected into
| | 06:07 | the system, and the third one which will count
as a comment, so that in itself could be returning
| | 06:12 | a lot of sensitive data to
someone who was trying to do this.
| | 06:16 | And the worst thing is what if they did something
like this, very similar format, except they're
| | 06:22 | doing something really malicious.
| | 06:24 | And if your SQL is executed with high permissions,
what's basically going to happen is they will
| | 06:29 | pass through the DDL command to drop the table of
orders, and you've just lost an entire part of your database.
| | 06:36 | Now it's true, you might not have a table
called Orders, or in the earlier example,
| | 06:40 | a table called Users, but if someone's first
attempt generates an error message which is
| | 06:46 | passed back to the user, they can often
figure out that they can at least get SQL injection
| | 06:51 | to work and they can just keep trying statements
until they get something they find interesting.
| | 06:56 | Allowing SQL injection can allow someone to
log into your system as someone else, even
| | 07:01 | administrator, without
knowing a username or password.
| | 07:04 | Now, it is true that these days, most web
application software platforms have precautions
| | 07:09 | against this, they guard against injection
attacks, but they are still a potential issue.
| | 07:14 | And one way around them is to use stored procedures
with parameters rather than constructing their own strings.
| | 07:21 | As stored procedures inherently don't have this
level of flexibility in what's passed into them.
| | 07:27 | You can't dynamically break a stored
procedure apart into several statements.
| | 07:32 |
| | Collapse this transcript |
|
|
8. Database OptionsDesktop databases| 00:00 | Now let's move from discussing ideas and
concepts into the ways you can actually sit down
| | 00:05 | at a computer and start using them.
| | 00:07 | In the next few movies, I'm going to be
going over database management system software,
| | 00:11 | what are the main options, and why might
we choose or have to use one over another.
| | 00:17 | We're going to start with the simplest kind,
| | 00:19 | Desktop Database Applications, meaning an
application you would install yourself on
| | 00:23 | your own desktop or laptop, and you'd use the
same application to both design the database
| | 00:28 | and also to use it and enter data into it.
| | 00:31 | There have been many products
in this category over the years.
| | 00:35 | The two most relevant now are
Microsoft Access and FileMaker.
| | 00:40 | And Access in particular is everywhere
because it's part of the Microsoft Office Suite.
| | 00:44 | There are other options, there are things
like the open source alternatives like Apache Base
| | 00:49 | and smaller and more purely personal
database software like Bento, but really FileMaker
| | 00:55 | and Access are the ones to talk about here.
| | 00:57 | Now these kinds of applications are not
targeted primarily at professional developers, although
| | 01:03 | development skills are often very useful.
| | 01:05 | They are targeted at power
users and business users.
| | 01:08 | They are a step up from just using a collection
of spreadsheets to manage your data, but they're
| | 01:13 | aiming at the same kind of
person who might have done that.
| | 01:16 | Now, reasons why you might use
these, they are simple to install.
| | 01:21 | You don't need to get the guys from IT
involved, doesn't need a complex back-end server.
| | 01:26 | Just install and go.
They are pretty easy to use.
| | 01:29 | Again, targeted at business users, not
solely developers, so you can get quickly up and
| | 01:33 | running without knowing much about databases
or how they should be built, and whether this
| | 01:38 | is a good idea is a topic of some discussion.
| | 01:40 | Now of some help with this is you'll often find
templates or starter solutions to get you moving.
| | 01:46 | Microsoft and FileMaker know that many people
use these for managing pretty typical similar
| | 01:51 | business scenarios, like keeping track
of costumer contacts or asset management.
| | 01:56 | So they provide pre-built starting points for those
situations that a user can create and then customize.
| | 02:02 | They also provide database
and user interface design tools.
| | 02:06 | So you're not just designing your tables,
your columns, and your relationships, but you're
| | 02:10 | also creating the user interface
to enter data into those tables.
| | 02:15 | Now much as that may sound obvious, most
enterprise-level database systems do not provide any
| | 02:20 | kind of application targeted at the end user.
| | 02:24 | Whether you want someone to interact with
your database using a website or a desktop
| | 02:28 | app or a mobile app, and you're using something
like DB2 or SQL server, well, you're going to have
| | 02:33 | to build that application.
They don't provide it for you.
| | 02:37 | They also provide reporting tools, and
things like Access and FileMaker include not just
| | 02:41 | the user interface for viewing and adding data, but
options for generating reports based on that data.
| | 02:47 | So, several legitimate reasons for using them,
but also several reasons that you wouldn't,
| | 02:53 | and the primary ones being that you
have a lot of people or a lot of data.
| | 02:58 | And yes, okay, a lot is an
unfortunate hand-wavy term here.
| | 03:03 | What does a lot of data mean?
What is a lot of people?
| | 03:07 | Well, for desktop databases, the
perfect number of users is one.
| | 03:13 | If you create a new database in Access or
FileMaker, by default it's a single file that's
| | 03:17 | saved on your machine, like
having a Word document or spreadsheet.
| | 03:21 | It is one file that contains everything about
the database, and that's where the data will
| | 03:25 | remain unless you say otherwise.
| | 03:27 | Now you can put that file on a server and
share it across a network, and there are various
| | 03:32 | techniques and workarounds for
sharing these kinds of databases.
| | 03:35 | But the more users you add,
the less reliable it becomes.
| | 03:39 | Microsoft Access 2010 has an official limit
of 255 simultaneous or concurrent users, but
| | 03:46 | personally, I would never
get anywhere near that number.
| | 03:50 | If I know I even need 10 simultaneous users,
I'm not going to use a desktop database,
| | 03:55 | though it does depend on the kind of activity,
because it's much easier to support 20 people
| | 04:01 | at the same time just reading data out of
a database than to support even half that
| | 04:05 | number editing data at the same time.
| | 04:08 | Now FileMaker has an official maximum limit
of nine simultaneous users, but you can use
| | 04:13 | FileMaker Server to increase
that to 100 or 200 people.
| | 04:16 | With Access, you can also upload to SharePoint
server to make it easier to allow multiple users.
| | 04:22 | But if you know from the get-go that your
database is intended for hundreds or thousands
| | 04:26 | of simultaneous users, these are
not what you should be looking at.
| | 04:29 | Likewise, if you're building a public
website and looking for a back-end database, again,
| | 04:34 | FileMaker and Access, not the right call.
| | 04:36 | These are for internal use with small groups
and small to medium size businesses with small
| | 04:41 | to medium amounts of data.
| | 04:43 | As a good example, just look at the
templates that these applications provide.
| | 04:48 | There are things like a Contact database,
Asset and Inventory Management, and Event Planning
| | 04:54 | database, and for this
kind of level, they're great.
| | 04:57 | For larger projects, they can be useful for
doing things like a proof of concept simply
| | 05:02 | because they're very easy to get up and running.
| | 05:05 | But as you can tell, we haven't spent really
any time on desktop databases in this course,
| | 05:09 | our focus has been on more
substantial and more scalable solutions.
| | 05:13 | But everything we've covered in this course
could be used for, say, an Access Desktop Database,
| | 05:18 | and it would allow you to built
a better and a more efficient one.
| | 05:23 |
| | Collapse this transcript |
| Relational database management systems| 00:00 | If any database systems can be described as classic, it's
these ones, the Relational Database Management Systems,
| | 00:06 | or RDBMS, that we've been
focusing on in this course.
| | 00:10 | They all share the key features discussed.
| | 00:13 | They're made of tables with
defined columns, multiple rows of data,
| | 00:16 | Primary keys and Foreign Keys, and they support
Relationships, Transactions, and the ACID test
| | 00:21 | and using SQL to talk to them.
| | 00:25 | The most obvious in this category are Oracle,
IBM's DB2, Microsoft's SQL Server, MySQL,
| | 00:32 | but there are several others that
can be mentioned in the same breath.
| | 00:35 | SQLite, Sybase, Teradata and Microsoft Access
uses these ideas, as does PostgreSQL.
| | 00:42 | Although technically speaking, PostgreSQL
adds a few extra features, so it's formally
| | 00:46 | known as an ORDBMS, an Object-Relational
Database Management System.
| | 00:51 | Now, of course these systems aren't identical.
There are differences between them.
| | 00:56 | There are differences in features,
differences in implementation and hardware requirements
| | 01:01 | and expense, but it is the
similarities we're interested in here.
| | 01:06 | These are all based on ideas developed by
a gentleman by the name of Edgar Codd back
| | 01:10 | in the 1970s, and the products
themselves have been around nearly as long.
| | 01:15 | Oracle was first released in 1979, and the
first relational database that I used was DB2,
| | 01:20 | and that was in 1983.
| | 01:22 | Now, you'll sometimes hear the phrase enterprise-
level used to refer to these database systems.
| | 01:28 | Now, enterprise-level has no strict definitions.
| | 01:30 | It's really more of a buzz word to refer to
large organizations or government departments
| | 01:35 | with at least hundreds but often
thousands or tens of thousands of users.
| | 01:40 | It's not so much that strict number of
users is important, just that by the time you get
| | 01:44 | to that level, your demands are more stringent.
| | 01:47 | You need constant uptime, disaster recovery, continual
back-up, and these systems will provide that and more.
| | 01:53 | Now, they might sound like one app, one product, but
you don't just casually install Oracle or install SQL Server.
| | 02:02 | The Relational Database Management Systems are
often split up into a family of different applications.
| | 02:08 | You'd first install the database engine,
typically on a shared server or using several servers.
| | 02:14 | It just runs continually in the background but
might have no graphical user interface at all.
| | 02:19 | You'd either use a command line to
interact with it or install a separate application
| | 02:24 | just for database administration, things like
SQL Server Management Studio for Microsoft SQL Server,
| | 02:30 | or Oracle SQL Developer, MySQL Workbench.
There are many others.
| | 02:36 | Sometimes you'll need a separate application
to work with back-up, another application
| | 02:41 | to monitor and report on
the database, and so on.
| | 02:45 | Now beyond that, they have multiple
versions or additions available of these.
| | 02:49 | Often there is at least a standard
addition and also an enterprise edition with extra
| | 02:53 | features for very large-scale deployments, but
typically, none of them provide any application
| | 02:59 | targeted at the end user, and
you need to build that yourself.
| | 03:02 | Now, the larger and more complex your
databases get, the more time they take to maintain and
| | 03:08 | the more likely that you'll need a
formal database administrator, or DBA.
| | 03:11 | In large organizations, you may have
multiple employees whose full-time job is simply to
| | 03:16 | maintain the Database Management System and
the databases inside it. So, is it backed up?
| | 03:22 | Do we have enough storage for
the next week month or year?
| | 03:24 | Is it responding quickly enough?
| | 03:26 | Is the design of the database
as efficient as it could be?
| | 03:29 | Does anything need to be purged from it?
| | 03:31 | Do we have the right tables, columns, keys
and relationships, and very importantly,
| | 03:36 | do we have the right indexes?
| | 03:39 | As you might imagine, being a good database
administrator requires not just a good general
| | 03:43 | knowledge of database design, but the specifics of
the particular Database Management System you're using.
| | 03:49 | Now, one obvious difference also is that some of these
are purely commercial products, while others are open source.
| | 03:55 | Now, I'm sometimes asked, okay, if I can buy
Microsoft Access for a couple of hundred dollars,
| | 04:01 | how much is Oracle? How much is DB2?
That's not a simple answer.
| | 04:05 | Most of these database systems can be
licensed multiple ways on a per user basis or by how
| | 04:11 | many machines--or rather CPUs--they're running
on, or by which addition you buy, or by what
| | 04:16 | extra features you need or don't need,
and we're not covering that at all.
| | 04:20 | However, if we're talking about licensing,
one thing that is worth mentioning is the
| | 04:24 | growth in recent years of hosted cloud-
based database system providers, like
| | 04:30 | Microsoft Windows' Azure, which does provide an option
for creating hosted SQL Server database servers.
| | 04:36 | You simply choose the size of the database
you want and the number of databases you're
| | 04:40 | having, and it'll give you a price for this.
| | 04:42 | In a similar fashion, there's also the
Amazon Relational Database Service, Amazon RDS.
| | 04:48 | This one gives you an option of using MySQL, Oracle,
or SQL Server with different prices and options for each.
| | 04:55 | Now, you still have to design the database
and build the applications that use it, but
| | 05:00 | you can pay them to take care
of the technical infrastructure.
| | 05:03 | Now, note that even the open source
database systems like MySQL are not always free.
| | 05:09 | MySQL is owned by Oracle now, and while the
database engine is free to download and use,
| | 05:14 | there is an enterprise edition that includes
separate applications for back-up and monitoring,
| | 05:19 | and that is a commercial product.
| | 05:21 | However, on the flip side of that, even the
commercial database vendors also provide a free edition.
| | 05:27 | This is typically restricted, although what
you'd called restricted is still usually much
| | 05:32 | more powerful than a desktop database.
| | 05:34 | The constraints are often a reduction in size,
perhaps the database can't be larger than
| | 05:39 | 4 gigabytes, or the database engine might only use
one CPU even when installed in a multi-CPU machine.
| | 05:46 | When you're looking for the free edition of
any Commercial Database Management System,
| | 05:51 | look for the word Express.
| | 05:53 | There's a SQL Server
Express Edition from Microsoft.
| | 05:57 | Oracle also has an
express edition which is free.
| | 06:00 | DB2 has an Express edition, although
strangely that's not free, but there is an Express-C
| | 06:06 | or Express-C version that is free.
| | 06:09 | These are often very useful in the learning processes.
They are convenient enough to just
| | 06:14 | install on your own machine for experimentation.
| | 06:18 | While Relational Database Management Systems
are the most widely used with the longest
| | 06:22 | history, they aren't
necessarily always the best option.
| | 06:25 | Let's take a look at a
couple of other database types.
| | 06:30 |
| | Collapse this transcript |
| Object-based and XML-based databases| 00:01 | With the rise of XML as a format for
exchanging data, there are also a handful of
| | 00:05 | Database Management Systems oriented specifically
around using the XML as their internal structure,
| | 00:11 | rather than the tables of columns and rows.
| | 00:13 | If you haven't delved into the world of XML much,
the next two minutes might be a world of jargon.
| | 00:19 | That's okay. This is really targeted to those who know
they have to deal with a lot of XML or are worried they might.
| | 00:27 | XML specific Database Systems
include BaseX, Sedna, and eXist.
| | 00:32 | These are all open source.
They store XML data directly.
| | 00:37 | Large amounts of XML documents
using the nested tree structure of XML.
| | 00:41 | So, you not only store XML directly, you
use XML languages, typically XQuery, to query
| | 00:48 | that data rather than using SQL.
| | 00:50 | Now, these are for quite specific users, rather
than the general use of Relational Database Systems,
| | 00:56 | so we're not going to be spending much
time on these, other than just making
| | 01:00 | sure you're aware of their existence.
| | 01:03 | If XML is naturally your only kind
of data, they are worth looking at.
| | 01:08 | If, however, what you have is some XML,
there's nothing that stops you from just storing XML
| | 01:13 | in a regular relational database table.
It is after all just text.
| | 01:19 | And these days, it's very common to see XML as a specific
column data type for a regular relational database.
| | 01:27 | SQL Server, Oracle, and DB2
all support XML as a Column Type.
| | 01:32 | With other database systems,
you can just store it as text.
| | 01:35 | The benefit of having a specific XML type
is usually that the database will support
| | 01:40 | using XQuery to manipulate that
particular column, instead of just SQL.
| | 01:47 | We can store XML data directly as a column
value and either use SQL to work with it
| | 01:52 | or XQuery, but enough, moving on.
| | 01:56 | Object-Oriented Databases, unlike the
previous section, the next two minutes will mainly
| | 02:01 | benefit those who know a
little about object-orientation.
| | 02:05 | One of the down sides with the Relational Database
Management Systems came with the rise of object-oriented
| | 02:10 | languages in the '80s and '90s like C++, Java,
C#, and so on, because the objects we would
| | 02:16 | generate in software in those languages
don't necessarily map exactly to rows in the
| | 02:21 | database table, they are more complex than that.
| | 02:24 | There has been an Object-Oriented
Database Management System over the years.
| | 02:29 | Object databases, or OODBMS,
they're trying to fix this.
| | 02:32 | Things like Objectivity/DB,
VelocityDB, and Versant.
| | 02:37 | Like XML-based databases, they never reached the
popularity level of conventional relational database systems.
| | 02:43 | They did become popular, but mainly in niche
areas like engineering or physics, not in
| | 02:48 | general use, and like XML database systems,
we're not working with them here, just making
| | 02:54 | sure you're aware of their existence.
| | 02:57 | Here's the thing, relational databases are still
enormously popular with object-oriented languages.
| | 03:03 | Yes, the problem does exist that software objects
don't always match one on one with rows in a table.
| | 03:11 | What's very popular is to use an ORM
Framework, Object-Relational Mapping Software.
| | 03:16 | This is a software framework that helps us
map from objects in an object-oriented language
| | 03:22 | to regular relational database tables and back again
without really thinking very much about that mapping.
| | 03:29 | These frameworks include NHibernate in Java,
the Core Data Framework for Objective-C,
| | 03:34 | the ActiveRecord Library in Ruby, and so on.
| | 03:38 | These are always very specific to the
language that you're working in, so they're out of
| | 03:42 | the scope of a course like this one.
| | 03:44 | In fact, with this, this is
all I'm saying on these topics.
| | 03:47 | Neither XML nor object databases became
anywhere near as popular as some people expected.
| | 03:53 | In fact, the most impactful developments
in database technology came with the rise of
| | 03:58 | what are called the NoSQL databases,
and we'll talk about those next.
| | 04:03 |
| | Collapse this transcript |
| NoSQL databases| 00:00 | Over the last few years, a number of
database systems have grown in popularity, and these
| | 00:04 | are grouped together under
the term NoSQL databases.
| | 00:08 | NoSQL as a category can be a little confusing.
| | 00:12 | These days that term is taken to mean
not only SQL rather than NoSQL at all.
| | 00:18 | But it is genuinely a wide
category with a lot of options in it.
| | 00:23 | And first, you can't assume that
they're all similar to each other.
| | 00:27 | Unlike relational databases as a category,
where the skills are broadly transferable
| | 00:31 | between, say Oracle and
SQL Server, or DB2 and MySQL,
| | 00:36 | in the NoSQL category, just because you know
MongoDB, doesn't mean you have any head start
| | 00:41 | on learning Neo4J. They're
very different from each there.
| | 00:45 | There are no official rules as to
what makes a NoSQL database system.
| | 00:51 | That's kind of the point, because
it's more about what they aren't.
| | 00:54 | They aren't traditional relational databases,
and they're getting away from those rules.
| | 00:59 | So there are few things we can
say that these usually aren't.
| | 01:02 | This isn't always the case, but most typically
they don't use SQL as a query language.
| | 01:09 | Usually they don't store data
in tables--a few of them do.
| | 01:13 | Usually they don't really
care about relationships.
| | 01:17 | Usually they don't provide asset
transactions, though a few can.
| | 01:21 | Usually they don't require a formal data
schema, and that's quite a big deal with the NoSQL
| | 01:27 | databases, there's flexibility there.
| | 01:30 | Several have been developed and oriented around
web deployment or web development in a way
| | 01:34 | that's not as natural for relational
databases to do, simply because relational databases
| | 01:39 | predate the web by about 25 years.
| | 01:41 | And most of them are Open Source
projects, but again, not all.
| | 01:46 | But beyond that there's a lot of variety.
| | 01:49 | There are a few worthwhile
groupings within the category of NoSQL.
| | 01:53 | First, we have several document databases
what are more typically called Document Stores.
| | 01:59 | These are organized around the idea where the
fundamental thing to be stored is a self-contained
| | 02:03 | document, rather than individual rows of data consisting
of well-defined columns that you decided beforehand.
| | 02:12 | Now it doesn't necessarily mean a document
in the sense of a Word document or PDF.
| | 02:17 | Just know that everything you store is a self-
contained piece of data that describes its own schema.
| | 02:23 | The format you're actually storing might be
XML, or it might be JSON, which is just a loose
| | 02:28 | structure where every new piece of data
that you store can describe its own schema.
| | 02:33 | In this case, I'm saying store a simple piece of data
consisting of two pieces of information: a title and a rating.
| | 02:41 | Title is a string, and the
rating is an integer, it's 10.
| | 02:43 | I can turn around and the next piece of data I
store has a completely different description,
| | 02:49 | including nested levels of information.
| | 02:52 | And I don't know what this is ahead of time, or
certainly I don't need to know what this is ahead of time.
| | 02:58 | I can just store anything I want in this document
store, just keep adding new documents into the database.
| | 03:05 | Each one is given a unique ID, but beyond that,
you have flexibility, often total flexibility.
| | 03:12 | You don't need to provide a formal schema.
You don't need to define relationships.
| | 03:16 | But because you don't necessarily know what's
inside each, then the database needs to provide
| | 03:21 | a more flexible way of querying
that data than SQL would provide.
| | 03:26 | Examples in this category would be CouchDB
and MongoDB, both very popular NoSQL databases.
| | 03:33 | In fact, technically the XML databases I
talked about earlier could also arguably fall into
| | 03:38 | this category, too, even though they
predate the popularity of the term NoSQL.
| | 03:43 | Now there's another common category which
is Key-value databases, or Key-value stores.
| | 03:50 | This is another category which is all about
having no predefined schema for your data,
| | 03:55 | simply storing and retrieving
everything based on a key-value pair.
| | 04:00 | You can kind of make an analogy, this is a
bit like having a two-column table with whatever
| | 04:06 | you want in the key column and whatever you
want in the value column, nothing more enforcing
| | 04:12 | that, could be any data type, integer,
string, large, small, binary data, whatever.
| | 04:17 | You could fit in bits of XML or bits of JSON,
so in some sense, there's a lot of crossover
| | 04:22 | here with document stores like MongoDB and
CouchDB, they could fall both under the document
| | 04:28 | store and key-value store categories, but
other well-known NoSQL databases would be
| | 04:35 | MemcachedDB, Riak, and Project Voldemort.
| | 04:37 | Now worth mentioning is that many database
systems in this key/value category are specifically
| | 04:42 | designed to be very fault-tolerant distributed
architecture, meaning you easily install them
| | 04:48 | across multiple machines, and no
one machine is a point of failure.
| | 04:52 | They'll survive machine
failures and keep on ticking.
| | 04:56 | And the last grouping I'll mention
are graph databases, or graph storage.
| | 05:00 | This is where your data is in a
graph structure, small connected nodes.
| | 05:05 | There's no one master point.
Everything is connected to everything else.
| | 05:10 | Whereas many of the other in the NoSQL databases tend
to de-emphasize relationships, this goes the other way.
| | 05:16 | It's all about having small nodes of data that
have connections to other small nodes of data.
| | 05:21 | This could be a handful--the way that I'm
showing them here--where certain nodes manage
| | 05:25 | other nodes that work in other nodes that
are located in other nodes, and you can just
| | 05:30 | follow the relationships around.
| | 05:33 | Or in the large scale, there could be millions or
tens of millions of these nodes and connections.
| | 05:37 | Neo4J is probably the best example here,
though there are others like AllegroGraph, and worth
| | 05:43 | mentioning in some classic relational databases are
starting to provide extra features in these spaces.
| | 05:50 | DB2, for example, has the NoSQL Graph Store now.
| | 05:55 | The thing is there are great reasons for
choosing relational databases and great reasons for
| | 06:00 | choosing NoSQL databases,
but it's not flip a coin.
| | 06:04 | These things are very different.
| | 06:06 | The most important question that might
steer you away from relational databases, first,
| | 06:11 | do you need a flexible schema?
| | 06:13 | If you can't define a relational database,
because when you start to draw out your tables
| | 06:18 | and columns, you just have no idea what they'll be,
because the data you need to store will
| | 06:23 | change from moment to moment.
| | 06:24 | That might be one pointer for looking at a
NoSQL database where you have freedom to store
| | 06:29 | more flexible data with a schema
that might change from moment to moment.
| | 06:34 | The second question, do you
have an enormous amount of data?
| | 06:37 | And by enormous, I mean enormous.
| | 06:40 | A good definition of this is described in the
documentation for the popular HBase NoSQL database.
| | 06:48 | It's asking when should I use HBase?
First, make sure you have enough data.
| | 06:53 | If you have hundreds of millions or
billions of rows, then HBase is a good candidate.
| | 06:59 | If you only have a few thousand or a few million rows,
a traditional relational Database Management System
| | 07:04 | might be a better choice.
| | 07:07 | So if that fits your needs, take a look at
some of the projects that were specifically
| | 07:11 | designed for that kind of data, what's being
generically referred to as big data situations
| | 07:17 | which have a different kind of requirements from what people
envisioned for classic relational databases 40 years ago.
| | 07:24 | And the third option, valuing,
scaling over consistency.
| | 07:27 | What do I mean by this?
| | 07:30 | You see with most enterprise-level
relational databases, traditional databases, they are
| | 07:35 | oriented towards never losing a single thing,
that's why it's set to focus on transaction,
| | 07:40 | because for things like banking systems or
airline bookings or patient information, you
| | 07:45 | just can't lose a thing.
| | 07:47 | If, however, you have a different kind of setup,
your database needs to crunch vast amounts
| | 07:52 | of statistics on website hits but no
particular one entry is all that important, then you're
| | 07:59 | more concerned with being able to scale
really fast, to add machines, lose machines, add
| | 08:04 | and lose disk drives and still keep up and
running, even if you lost an entry here and
| | 08:09 | there, and some of the NoSQL databases are
really more oriented to that kind of approach
| | 08:14 | than the never lose a single row.
| | 08:17 | So these databases are great, but make
sure you're using them for the right reasons.
| | 08:22 | Some people think, well, they're newer.
| | 08:23 | There must be a more up-to-date way
to approach relational database issues.
| | 08:27 | But that isn't the case at all.
| | 08:29 | These systems are trying to fix different
kinds of problems, and if you have a problem
| | 08:33 | that a relational database can fix, then go ahead.
| | 08:36 | If you need flexible schemas,
vast amounts of data, just go for it.
| | 08:40 | Or if you have well-defined data, don't
underestimate the ability of traditional relational databases,
| | 08:46 | even if it might not be
the newest technology around.
| | 08:49 | But for more on NoSQL, take a look at our
Up and Running with NoSQL Databases course
| | 08:54 | that gets deeper into some
of these, particularly CouchDB.
| | 08:59 |
| | Collapse this transcript |
|
|
ConclusionFinal thoughts| 00:01 | We've reached the end of this
Foundations of Programming: Databases course.
| | 00:05 | As with everything else in programming, once
you've got a decent grasp of the theoretical
| | 00:09 | content, and you're not looking for perfection,
just that most of it makes sense most of the
| | 00:14 | time, it's time to get hands-on.
| | 00:17 | And there are several more courses here
at lynda.com you might be interested in.
| | 00:20 | So, take a look at our courses on the SQL
language and on the hands-on use of Database
| | 00:25 | Management Systems like SQL Server,
MySQL, and PostgreSQL.
| | 00:31 | Everything we've done in this course, you
will find relevant in those and many others.
| | 00:35 | I'd also suggest that once you've started
designing your databases, it's a good idea
| | 00:39 | to revisit this content now and again.
| | 00:41 | As many of these ideas and concepts reward
repeated attention over months and even years.
| | 00:48 | And thanks for watching.
See you next time.
| | 00:53 |
| | Collapse this transcript |
|
|