Join Dan Sullivan for an in-depth discussion in this video Subqueries in SELECT clauses, part of Advanced SQL for Data Scientists.
- [Narrator] We usually select data directly from tables and views but sometimes it's helpful to be able to get data from the results of another select statement. Now we can do this using what are called Sub Queries. Sub Queries can be used in three different parts of a select statement. In the list of values returned, in the From Clause and in the Where Clause. Let's work through an example of each of these three. We'll do the first in this video. Let's start by building a basic query. So let's select oh the last name, the salary, and the department from the staff table.
Now, because we'll have multiple select statements within a single query we'll want to make sure the database can tell which table each value comes from. So to do this, we'll use a table alias and include that alias as a prefix for each value we'll return. I'll assign an alias of S1 to the staff table and add S1 as a prefix to each column I want to return from that table. Now when we execute this, we'll get the data we expect. Last names, salaries, and department names. Now, in addition to the last name, salary, and department, I'd like to include the average salary for each of those departments.
This will let us easily compare a person's salary with the average for their department. Now to do that we'll use a Sub Query that calculates the average salary for the department that matches the department of the employee's row. So to do that I'm going to add another value here but instead of the value coming from the table it's going to come from a select statement. And I will select the average salary. Now average can return a lot a decimal points so I'm going to actually round that to the nearest dollar and I'm going to select the salary or the average salary from the staff table.
Now I've already mentioned staff table once and have an alias for it. So I'm going to use a different alias to indicate that I'm talking about it, sort of a different reference to the staff table. And the other thing I want to do is make sure that I include in the average just the salaries for the department that I'm working with. And to do that I'll say anytime I refer to the department for S2, I want that department to be the same as the department that's listed up above in S1. Let's take a step back and look at this.
We have our outer query which includes a last name, a salary, and a department. And then we have this inner query or sub query that calculates the average salary from the staff table but only using rows where the department is equal to the department of the employee we're currently looking at. So that's how the sub query works and we'll execute this statement. And we'll see now that we've added an extra column. This column lists the average salary for the corresponding department, which in the first row is computers. The second row is sports, third is automotive and so on.
The important thing to note is that we use a sub query where we'd usually use a column name. Now within that sub query, we need a Where Clause that references a table in the top level query so that the sub query knows which row is referenced.
The course begins with a brief overview of SQL. Then the five major topics a data scientist should understand when working with relational databases: basic statistics in SQL, data preparation in SQL, advanced filtering and data aggregation, window functions, and preparing data for use with analytics tools.
- Data manipulation
- ANSI standards
- SQL and variations
- Statistical functions in SQL
- String, numeric, and regular expression functions in SQL
- Advanced filtering techniques
- Advanced aggregation techniques
- Windowing functions for working with ordered data sets