Join Dan Sullivan for an in-depth discussion in this video Window functions: OVER PARTITION, part of Advanced SQL for Data Scientists.
- [Instructor] Window functions allow us to make SQL statements about rows that are related to the current row during processing. This is somewhat like the way subqueries work. They let us do an operation that's related to the current row that SQL is processing. For example, instead of using a subquery to calculate an average salary for an employee's department, we can use a windowing function on rows called OVER PARTITION. Let's take a look at an example. Let's select department, last_name, and salary.
So that'll list for us the departments, last name, and salaries of each of the employees, but I also want to look at an average salary for each department. I can specify the average salary aggregate and then say I would like to have the average salary over a partition. And I use the phrase PARTITION BY, and then specify how I want to do my grouping. In this case, I want to do it by department, and I'll select this from the staff table. So again, what I have is I've selected department, last_name, and salary, and I'm going to also display an average of salary for each department, and that's what the OVER PARTITION BY statement does.
Let's see what we have here. What we'll notice is we have a list ordered by department, and then we have individual's last names, their salary, and then the average salary for each person in that department. So we'll notice for the automotive department, the average salary is 99,658. Now, let's scroll down and see a different department. We'll notice in the baby department the average salary now changes. It's about $93,750. We can use other aggregate functions too.
For example, let's change the average to max, and I'll leave everything else the same. And I'll execute that query. Now what I'm listing includes the department name, the employee's name, the employee's salary, and then the maximum salary in that department. So for example in automotive, the maximum salary is 146,167, and we'll scroll down again to a different department to see that it shifts. In the baby department, the maximum salary is 148,687.
Now, let's change both the aggregate function and the column we partition by. So I'm going to clean up. I'm going to use the view that we had created, and what I'd like to do is SELECT company_region, last_name, salary, and the minimum salary. And here, I'm going to do it OVER a PARTITION BY company_region. And since I'm referencing company_region, I'll select from our view which we created earlier called staff_div_reg. And I'll execute, and here, what we have is a similar list, but instead of department, we have company_region, and we have the minimum salary that's earned by anyone in that region.
So for example, in British Columbia, the minimum salary is 40,194, but let's scroll down to another region. For example, here in the northeast, the minimum salary is $41,026.
The course begins with a brief overview of SQL. Then the five major topics a data scientist should understand when working with relational databases: basic statistics in SQL, data preparation in SQL, advanced filtering and data aggregation, window functions, and preparing data for use with analytics tools.
- Data manipulation
- ANSI standards
- SQL and variations
- Statistical functions in SQL
- String, numeric, and regular expression functions in SQL
- Advanced filtering techniques
- Advanced aggregation techniques
- Windowing functions for working with ordered data sets