This video covers some of the most common data transformation tools in the Query Editor, such as adding or removing columns or rows, changing data types, and so on.
- [Instructor] All right, so let's kick things off with some basic transformations. For these, we're going to use the Home tab of the Query Editor. Note that this isn't the Home tab of the generic Excel screen. This is the Home tab within the Query Editor itself. And here where we need it is a bunch of different tools. We've got the Choose Columns and Remove Columns options. One tip here. If you always want a specific set of columns and there may be more added or removed down the line, use the Remove Other option and make sure you always get the set that you want.
Similar tools for rows. You can keep or remove the rows, you know, a certain number from the top or bottom. That's nice if you're pulling data and it's got, like, header information or footers or subtotals that you want to remove. These row tools are really helpful for dealing with that kind of stuff. You can also get rid of blank rows or errors, which is nice. And one tip here. You can use the Remove Duplicates button if you want to create something like a new look-up table or a column containing just the unique set of values. From there, you've got sorting options, A to Z, low to high, kind of standard stuff.
You can change data types. This is a really commonly used one. Excel does a pretty good job recognizing data types, but sometimes it gets it wrong. There are a few places to change those. You can promote the header row, if Excel doesn't recognize that the first row of data is actually headers as opposed to values. Then you can also duplicate, move, and rename columns. Just another tip here. If you right-click the column header, you can access a lot of these same tools as well as a few others. So one thing you'll notice, kind of as a theme is that you'll find the same tools in multiple places throughout the Editor window.
So with that, let's go ahead and jump into Excel and load our first file. So what we're going to do is open up a brand new, blank workbook. I'm going to open the Data tab and choose from the New Query options in the Get and Transform section. Let's go into From File and pull a CSV. I've navigated to the folder that I created on my desktop with all those course resource files. So got all those look-ups, got transaction files, got return data.
And the first one that we're going to work with is the Customer_Lookup. So I can just double-click to open that up. And this is like our little preview window, where it shows the file origin. You can identify how it was delimited. Comma is correct since this was a CSV, comma-separated values. And then it looks at the first 200 rows and it says, all right, here's my best guess at what's going on with this dataset. I think this is a header row. I think these are the different data types. And it generally does a pretty good job with it.
So if you're happy with it, you could load or load to straight from this preview window. Personally, I'd recommend against that. Instead, we're going to hit Edit and that's going to take us to the Query Editor window. And this is just a nice way to spot check and QA your data. You can dig a little deeper. You can make any adjustments you need to make. So by default, I would say, let's always use the Edit option because then we just close and load from here. Now, a little insight into my personal process.
Generally, whenever I connect to a source with Power Query, the first thing that I do is check my table name because like I said before, that's really, really important for a few reasons. One, it's very hard to change after the fact, especially if it's referenced in a whole bunch of formulas and measures. And two, if you have long or confusing table names with spaces in them, for instance, you've got to surround them with single quotes. So I like to create short, pretty concise tables names with no spaces.
And the other little tip. You'll see that it says Customer_Lookup as opposed to Customer or Customers. We'll get at that in the Data Modeling 101 section, but it's a nice little label that I like to apply to differentiate my look-up tables from my data tables. So in this case, table name looks good. I'm going to keep this as Customer_Lookup, which it pulled right from the title of the CSV file. And the second thing that I usually check right off the bat is that my headers have been promoted to header rows and they're not being treated as values.
So we can check that box, that looks good. And then I look at the data types for each column. You can see that in the top-left corner of each one. So customer id is a number, that looks good. Account number is a decimal. That's fine, we can change it to a whole number. Doesn't really matter. And this is just saying, Do you want to replace any current steps or add a new one? I'm just going to replace that step, change it to a whole number. First name, last name, address, city, state province.
These are all text, which is good. And let's keep moving on here. We've got postal code as a number. Now, this is an interesting one because I don't want to ever perform any sort of mathematical or statistical operation based on the postal code as a value. So that's one that I'm actually going to change to text instead. And it's giving me this Change Column Type message because it had, by default, tried to set that type as a number. That was Excel trying to, you know, be smart.
So by saying Replace Current, I'm basically saying, no, Excel, this is not a number. I really want it to be text instead, so I can replace that. And there you go, it's converted to text. Country looks good. Birthday, that little calendar icon, that means it's a date. Sounds good. Got text for marital status. Income, gender. Total children is a whole number. Children at home, whole number. Education looks good. Account open date. Everything else looks spot-on.
So first two steps, pretty much every time I open a file. Check the table name. Check the column headers and the data types. Now, we can apply some other simple operations here. You know, maybe we don't need the account number here. You could either right-click and remove or you could go into Remove Columns from the menu. Again, two options to do the same thing. We don't want to remove any rows here because we don't have any headers or footers. We kind of want all of this data that we have to work with.
You know, one thing we might want to do is just make sure that our customer id is sorted. Ascending, because this is a set of sequential values. You know, not critical, but might be a nice way to just scroll through and make sure that id is populating as expected. And that's really it for now. Those are the only simple tools and transformations that we need to account for here with this dataset. And we're good to go. So the last step is just to close and load. And we're always going to choose Close and Load To.
We don't want to load this to a table into an Excel spreadsheet. We want to load this right to the data model, so Close and Load To. Just create the connection and absolutely add this to the data model. Boom, load. So as soon as I press Load, you'll see that Workbook Query window pop up. That was super quick, it was only 10000 rows. And this tells me that the Customer_Lookup data is now accessible in my data model. So quick sneak peek. We can manage the data model right from the Data tab.
And we'll see one tab with our one table. Customer_Lookup right here. And now, we can go ahead and just close that data model window. Back to Excel. And let's go ahead and save this project. And save it right to my desktop and let's call it FoodMart Data_Model. And we can save it just as normal Excel workbook, nothing fancy. Press Save and there we go. We just loaded our first dataset using Power Query.
- When to use Power Query (Get & Transform or Get Data)
- When to use Power Pivot
- Working with the Query Editor
- Options for loading data in Excel
- Working with text-specific and number-specific query editing tools
- Creating a rolling calendar with Power Query
- Grouping and aggregating data with Power Query
- Modifying Excel workbook queries
- Merging and appending queries
- Pivoting and unpivoting data with Power Query