Introduces SQL's built-in grouping functions, coupling them with common numeric functions
- [Narrator] Sometimes all you need is information about a table, rather than returning a subset of information from within the table. So instead of getting a set of data and downloading to excel and then running some results on that, we can often do a lot of that aggregate reporting from within SQL. The most obvious thing that we need to know is how many rows there are and for that we can use count star, like so, and see that that returns a single row in a single column with the number 603 in it, and that's how many rows there are overall.
Notice that where we've used the asterisks before as select star, it's meant select all fields, but when we use it in count star, it means count all of the rows. So the asterisks is context dependent. Now we don't have to be counting all of the rows there, we could say specify a particular field, like so. Generally speaking that would give us exactly the same result, but because I checked this one earlier, I happen to know this gives us a different result.
A 599, instead of 603, which was the result before. Now the reason for this is that address two, is a field that contains four null values. Null in a database means the absence of data. It doesn't mean zero, or blank, or space, it means, nothing here. Now there may be times where you want to exclude null values from your count, in which case, this is the syntax for you, but if all you want is a count of all of the rows, stick with count star.
Now we can select regular fields along with count star like so, but this result can be very misleading. I've asked the database to return me all fields, plus an extra field containing a count of all the rows, and it's done the best it can, but what it's done is return me just the one row, with 603 at the end there. Using count start, which is a grouping function, means that the database is just returned me a single row.
Now this result, taken together, is very misleading. It looks as though number 47, MySativa Drive, is related someway to this 603, and of course we know that they are not connected, they are independent of each other. So we have to take a lot of care when mixing grouping and non grouping functions in a query. So let's see how to use them properly. Let's say that we want to know how many rows there are per district.
We can start off by saying give me the district and count star, and you see it returns to me, Alberta and 603. Now it's returned me Alberta because that's the first row, rather than because Alberta and 603 are connected. What we really want to do is group the data by district and for that we need to use a new function called Group By, and if we run that, we see if gives us 378 rows instead of one, which is a bit more plausible.
And it's listed all of the districts on the left, against the count of rows, or count of addresses in this case, from the address table. Now if you think that count star is a fairly unhelpful column heading there, we can use an alias on this, just as we have done before. So we can say As ct, for count and we've changed the column heading there. We can also then, order our results by the count descending and notice that the order by, comes up after group by.
So we can see that the most popular district in this database is 10. Now, one note of caution here, be careful grouping by field names that aren't being selected. Here we're grouping by district and we're selecting district, but actually the database will quite happily let you run a query like this, where you're selecting one thing and grouping by another. Now if we hit go on that, we're actually returning 603 rows, which is one row, for every item in the database, and we can tell this is what's happening, in another way as well, because we're ordering by count descending and we see that the maximum count is one.
So running this query has actually not grouped our data at all, although we might think it has at first sight. So although we've used the group by statement, we haven't successfully grouped anything. We can also use a where clause in this statement and we pop that after the from and before the group by. So we could say where the address ID is less than 10.
And if we run that, we can see that our row count from the original table, if we were to add all of these up adds up to nine, which is what we would expect given that the ID's begin at one. Now we can use the where clause on any field in the table, and it should always come before the group by statement, but we can't use the where on our new column, the aggregated data, which we've given a heading of CT. We'll look at how to do that in the next video.
Join Emma Saunders as she shows you how to design and write simple SQL queries for data reporting and analysis. Review the different types of SQL, and then learn how to filter, group, and sort data, using built-in SQL functions to format or calculate results. Learn a bit about data types and database design. Discover how to perform more complex queries, such as joining data together from different database tables. Last but not least, Emma shows how to save your queries as views, so you can run them again and again.
- Using different versions of SQL
- Retrieving data with SELECT statements
- Filtering and sorting your results
- Transforming results with built-in SQL functions
- Grouping SQL results
- Merging data from multiple tables
- Identifying data types, and how to make sense of your database design
- Saving SQL queries