Join Bill Weinman for an in-depth discussion in this video, Removing duplicates with SELECT DISTINCT, part of SQL Essential Training.
There are times when you may want to know all the different values of a column in your result, without duplication. For this purpose, SQL provides the SELECT DISTINCT statement. We're going to start with the world.db database for this lesson, and we're going to change databases partway through. We'll start with the country table, and I'm just going to list all of the continents from the country table like this: SELECT continent FROM country, and I'll press the Go button.
And you see, there's a list of all of the continent values in the table, all the values in the continent column. You notice that we get 239 rows, and there's a lot of duplication here: Asia, Europe, North America, there's another Europe, another Africa, another North America. And so we have all of these results, but we really just want to know how many different continents are represented here. The way that you do this is with the SELECT DISTINCT statement. So, where we have SELECT here, we're just going to add the word DISTINCT.
And now, we're saying SELECT DISTINCT instead of SELECT. And when I press Go, notice we get just seven results, because in this counting Europe and Asia are separate continents. Some people count them as one continent. So here, we have seven rows, the seven continents in this way of looking at the world. So, the SELECT DISTINCT statement takes this result, and internally it sorts it, and then it simply removes the duplicates. You see, that's why it's in order here, in alphabetical order.
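The comparison between SELECT and SELECT DISTINCT can be sketched in a few lines of code. This is only an illustration, not the course's exercise file: it uses Python's built-in sqlite3 module and a small, made-up country table standing in for world.db, so the row counts differ from the 239 and 7 seen in the lesson. The ORDER BY is an addition here, so that the order of the output does not depend on how the engine removes duplicates.

```python
import sqlite3

# Assumption: a tiny, made-up country table stands in for the course's world.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (name TEXT, continent TEXT)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [
        ("Japan", "Asia"),
        ("France", "Europe"),
        ("Canada", "North America"),
        ("Germany", "Europe"),
        ("Kenya", "Africa"),
        ("Mexico", "North America"),
    ],
)

# Plain SELECT returns one row per country, duplicates included.
all_continents = [row[0] for row in conn.execute("SELECT continent FROM country")]

# SELECT DISTINCT returns each continent once; ORDER BY makes the order explicit.
distinct_continents = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT continent FROM country ORDER BY continent"
    )
]

print(len(all_continents))    # 6
print(distinct_continents)    # ['Africa', 'Asia', 'Europe', 'North America']
conn.close()
```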
Many database engines work this way. The first thing they'll do is sort the column, or whatever columns are in your result. It can be combined for multiple columns; we'll see that in a little bit. And then they simply remove all of the duplicates. Other engines remove duplicates without sorting, so the order you see here isn't something to rely on unless you add an ORDER BY. So it's a really simple thing to do, SELECT DISTINCT, and it just gives you the results that are different: the distinct results, as opposed to all of the results from that query. Now let's take a closer look at how this works. We're going to switch over to the test.db database here, and I'm going to go ahead and create a table, just a very simple table, called test.
And we'll give it two columns, a int and b int. In SQLite, you can just type int instead of spelling out integer; a lot of database systems have shortcuts like that. And I'm going to insert a bunch of rows. So I'm going to say, INSERT INTO test VALUES (1, 1), like that. I'm just going to copy this a bunch of times. I actually made 10 of them, and we're going to edit them real quickly here. For the first five, we're going to say 1, 2, 3, 4, 5 in the first column.
And then we're just going to give all of these other ones a 2 in the second column. So now we have five rows that have a different value in the a column, and five more rows that all say 1, 2. And I'll just do a SELECT * FROM test here, and I'll press the Go button. So now we have this table, and it has ten rows.
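The table setup described so far might look like the following. This is a minimal sketch using Python's built-in sqlite3 module with an in-memory database standing in for test.db; the variable names are illustrative only, and the rows are inserted with executemany rather than ten separate INSERT statements.

```python
import sqlite3

# Assumption: an in-memory database stands in for the course's test.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INT, b INT)")

# Five rows with different values in a, then five identical rows of (1, 2).
rows_to_insert = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)] + [(1, 2)] * 5
conn.executemany("INSERT INTO test VALUES (?, ?)", rows_to_insert)

all_rows = conn.execute("SELECT * FROM test").fetchall()
print(len(all_rows))  # 10
for row in all_rows:
    print(row)
conn.close()
```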
The rows are 1 1, 2 1, 3 1, 4 1, 5 1, like that, and then five rows of 1 2. And this will allow us to investigate this SELECT DISTINCT statement. So I'm going to leave that result at the end of our queries here. And we're just going to say, SELECT DISTINCT a FROM test. Now, what you would expect to have happen is we'll only get 1, 2, 3, 4, 5, because all of these other rows have a 1 in a, and so those are not distinct. So, when I say Go, we just have 1, 2, 3, 4, 5, and there's our whole table.
I have that selected, everything from test, so you can see what the whole table looks like there. But our distinct query on column a is just those 5 distinct values. And if I do the same thing with b, you see our table has just two different values in b: it has a 1 five times and a 2 five times. So I should get just two rows, just 1 and 2, in my result. But here's the interesting thing. If I say SELECT DISTINCT a, b, what do we expect to have happen? Well, what it does is it takes that a and b, sorts the entire result, and takes out all of the duplicates, comparing both columns.
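The two single-column queries can be tried out as follows. This is a sketch under the same assumptions as before: Python's sqlite3 module, an in-memory stand-in for test.db, and an ORDER BY added so that the printed order does not depend on the engine's internals.

```python
import sqlite3

# Assumption: an in-memory database stands in for the course's test.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INT, b INT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)] + [(1, 2)] * 5,
)

# Column a holds 1 six times and 2 through 5 once each: five distinct values.
distinct_a = [row[0] for row in conn.execute("SELECT DISTINCT a FROM test ORDER BY a")]

# Column b holds 1 five times and 2 five times: two distinct values.
distinct_b = [row[0] for row in conn.execute("SELECT DISTINCT b FROM test ORDER BY b")]

print(distinct_a)  # [1, 2, 3, 4, 5]
print(distinct_b)  # [1, 2]
conn.close()
```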
So, what we're going to get here is just the rows where the combination of a and b is different. We'll get all five of these first ones, because together a and b are different: you have 1 1, 2 1, and so on. And then we'll also get one of the 1 2s, but no more. And so I'll press Go, and you see they're sorted. There's the 1 2, but it's the second one, because in order to do this distinct thing that it does, it actually sorts them first. You can change the order with an ORDER BY, but that's going to do a second sort, and you need to keep in mind, if you're optimizing your database queries, that that will take extra execution cycles.
So, this is really how SELECT DISTINCT works. You can think of it as a separate statement from plain SELECT: SELECT DISTINCT is distinct from just SELECT by itself, and it takes the entire result, sorts it, and weeds out the duplicates. So now let's go ahead and drop our test table from the test database, so we can return our database to its original state for further lessons. SELECT DISTINCT is a powerful tool for finding distinct results.
It's commonly used for finding distinct values from a single column. But it can also be used to find distinct values from any query or expression.
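To round off the lesson, here is a sketch of the multi-column case, a DISTINCT over an expression, and the final cleanup. As before, it assumes Python's sqlite3 module and an in-memory stand-in for test.db. The a + b expression and the total alias are illustrative additions that do not appear in the lesson, and the explicit ORDER BY is there because the order of a DISTINCT result should not be relied on without one.

```python
import sqlite3

# Assumption: an in-memory database stands in for the course's test.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INT, b INT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)] + [(1, 2)] * 5,
)

# With two columns, a row is a duplicate only if BOTH a and b match another row.
distinct_pairs = conn.execute(
    "SELECT DISTINCT a, b FROM test ORDER BY a, b"
).fetchall()
print(distinct_pairs)  # [(1, 1), (1, 2), (2, 1), (3, 1), (4, 1), (5, 1)]

# DISTINCT also applies to expressions (hypothetical example: the sum a + b).
distinct_totals = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT a + b AS total FROM test ORDER BY total"
    )
]
print(distinct_totals)  # [2, 3, 4, 5, 6]

# Drop the table to return the database to its original state.
conn.execute("DROP TABLE test")
remaining = conn.execute(
    "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = 'test'"
).fetchone()[0]
print(remaining)  # 0
conn.close()
```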
- Understanding SQL terminology and syntax
- Creating new tables and records
- Inserting and updating data
- Writing basic SQL queries
- Sorting and filtering
- Accessing related tables with JOIN
- Working with strings
- Finding the numeric type of a value
- Using aggregate functions and transactions
- Updating a table with triggers
- Creating views
Skill Level: Beginner
Q: For Mac OS X: When I try to start the Apache Web Server from the XAMPP control panel, it doesn't start, and when I open "localhost" in my web browser, I see a white screen that says "It Works!" instead of the XAMPP page.
sudo apachectl stop
Q: I'm on a Mac, and I get an error in SID that says "attempt to write a read only database." How can I fix this?
A: This usually means that the database folder does not have sufficient permissions for writing by the web user. This can happen if you create the SQL folder new, rather than copying it from the Exercise Files. Here's how to fix this:
- Open a Finder window and navigate to /Applications/XAMPP/htdocs/SQL
- Control-click on the SQL folder and select "Get Info" from the context menu.
- Under "Sharing and Permissions" (you may need to open the disclosure triangle), in the "everyone" row, select "Read & Write." Then you can close the Info window.
- Now repeat the process for the three *.db files inside the folder.