From the course: Learning Data Analytics: 1 Foundations

Building aggregate datasets

From the course: Learning Data Analytics: 1 Foundations

Building aggregate datasets

- [Narrator] We clean and transform a lot of data so that we can total, count it and average it. And then we send that aggregated data to other people. This type of aggregation can be frustrating when you're dealing with large data ranges in Excel and you're limited to doing sum, average and count functions. Let's quickly look at how we can produce these aggregate data sets in Power Query. Okay. So I'll start with our marketing list. I'll right click and go to edit. Let me explain my list of queries. Okay, so for the marketing list, I want to go ahead and right click it and I'll reference it because if I make changes to my marketing information I want it to carry over and I'll rename this, thank you counts. Okay, great. So now that I have my thank you counts I'll go ahead and create the counts. If you'll go to the transform tab you'll see a valuable function called group by. Okay. I want to actually work with the thank you notes. So when a group by a thank you note and I want to count the rows. Now, what this will do is tell me how many records get thank you notes and how many do not. I'll go ahead and click okay. Perfect. Now I get a count of how many records have no note versus thank you note. Okay. Let's go up to our sales order header. Again we want to do some counts of our sales orders. We may also want to see what the average sales order is, as well as what's the total orders. So I'll go ahead and right-click and I'll reference this dataset, and I'll go ahead and call this one order totals. Okay, I'll go to my group by, I want to get a count. I want to count the rows, but I also want to get a total of the order and the average order. So I'll go to advanced and I'll add an aggregation. And what I want to do is I want to actually create a total, order amount, and I want to sum, and want to sum the total order. I want to add another aggregation so I can do the average order amount and I'll change my operation to average. And I want to do that by the total order. Now, right now I have a grouping by customer key, but I can choose any of these fields. So if I wanted to do more meaningful like display name then what it would give me is a display name and as a grouping level, and then I would get a count the order and the average order amount. I can also choose options like date ordered or date shipped. Okay, let's leave it at customer key and then I'll click okay. So now I get account by each customer, I get a total order amount and an average order amount. Let's go ahead and convert these to currency. I think I actually want my average to be before my total order amount. Now I can swap them here or I could go to my grouped raws, choose my gear shape and leave them. So I'll hit my three little dots and I move that up. If I decide that I don't need the count I can just click those three little dots and delete it. And then I'll click okay. And now I have a subtotal for every customer, their average order and their total order amount. Let's go ahead and do closing load. We'll name this aggregate by customer and we'll name this sheet, thank you counts. There are a number of reasons why you might aggregate data. In some cases you might be the only person who has access to the raw data. You may only provide the high level aggregated data. Anything we can do to simplify that process will ensure that we've removed any human error, also, we can have a higher level of confidence and with Power Query, it's a lot less rinse and repeat than dealing with building functions.

Contents