From the course: Learning Data Analytics: 1 Foundations

Modeling data in Power Query

From the course: Learning Data Analytics: 1 Foundations

Modeling data in Power Query

- [Instructor] You don't always have backend access to a database and querying tools to use to model data. So what do you do when you don't? You can use Power Query. It's built right inside of Excel and we can bring our data in, and through the use of merge queries, model it the way we need it. I want to take a look at our wage data and some reusable data from postal code information that we got from geonames.org. So the very first thing we'll do is go to Power Query through the Data tab. I'll go to Data, Get Data, From File, and From Workbook. I'll navigate to my Exercise Files. And the very first thing I want to do is pull in my PostalCodeData. And I'll choose PostalCode. Immediately, I notice something a little bit off with my postal codes. One, postal codes are not three digits long. Let's go ahead and choose Tranform Data and see if we can discover the problem. I see the header is called TextPostalCode, but what it actually did is it changed the type. Power Query attempts to understand the data types and will automatically create them for you. I'm going to actually delete this step. And when I do that, I see my TextPostalCode and all my leading zeros come back. Want to go ahead and change this to a Text data type, that way it keeps those leading zeros. I don't have to worry about it again. I actually don't need the zip code anymore. I can right click and remove it. This data only represents the United States. That's still a lot of data. Let's go ahead and do Close & Load. And this'll bring this postal code data directly to the spreadsheet. And notice that it maintained my leading zeros, which Excel normally drops. All right, let's go grab our next dataset. I'll go to Data. I'll choose Get Data. I'll go From File, From Workbook, and I'll choose my WageSurvey data, and I'll choose Import. I'll choose Survey and I'll go ahead and transform the data. I have 343 responses from a survey. One of the very first things I want to check is the postal code. And I'll scroll over. First thing I'm going to do is right click and move it all the way to the beginning. Just more convenient when I get ready to merge it. I also noticed that it's a one, two, three or a number of data type. And really, to merge or to join, they need to be the same data type. So I'll go ahead and change this to a Text. Okay, great. Let me expand my queries there. So I have my postal code data and I have my survey data. Okay, now I have my two data sets ready to merge. On the Home tab of my ribbon in Power Query, I'll go to Merge Queries. If I choose Merge Queries here, then it will simply merge these two datasets into one. I like to keep mine separate so I'll hit the dropdown and tell it to merge the queries as new. Now it's asking me to select the tables. I'll go ahead and choose my PostalCode below, and just like a query where we have to join the two fields, here we select the fields that should match. Now, I have my survey data at the top and I have my postal code data at the bottom. I have both the postal codes selected and I see my join type, Left Outer, all from the first and matching from the second. So it's going to show me all of my survey data and where there's a matching postal code. It tells me that it's matching 337 of my 343 rows. That means either I have a miskeyed postal code or there's a postal code missing. I'll go ahead and click OK. And now I have my new merged query. Okay, what I'll do now is right click that Merge 1 and choose Rename. And I'll call this SurveyPostalCodes. Okay, I'm going to create another query because I want to analyze just the survey results of the people in the state of Alabama and the 67 counties. So I'll go ahead and choose Merge Queries. I'll do Merge Queries as New. And I'll perform the same join. I'll choose Survey. I'll choose PostalCode. Power Query doesn't offer left and right on the screen, it offers top and bottom, but the outer still works the same. So I'm less concerned about left or right and more concerned about outer and using those screen prompts. In Power Query, top is the left and bottom is the right. All right, so I'll choose a Left Outer. I'll choose my postal codes, and then I'll click OK. Let's go ahead and call this one AlabamaSurveyResults. Okay. So I'm going to go ahead and close my query settings, and I'm going to scroll over. So here I see my PostalCodes table. In a query, we drag and drop or double-click fields. In a merge query, we just expand. And here we can tell it what we want to bring over. I want the city, the state, and the county. I can click OK. Now, I'm going to report on this in map so I might need that latitude and longitude. Let me go back to my query settings which I closed earlier and hit that gear shape. Let's go ahead and add the latitude and longitude, and then I'll click OK. I'll go ahead and go back to SurveyPostalCodes. I'll scroll over here to the merge. And here I want everything but that TextPostalCode. Don't need time zone for my reporting either. I'll go ahead and click OK. Notice that it appended the postal code name. If I don't really want the title name in there, I can also address that. You see this Default column name prefix? I can delete that. And I can click OK. And then I just get those primary city and states. We do the same thing for my AlabamaSurveyResults. Again, that's just dependent upon what you want in your reporting. Perfect. Okay. So I have one final step and that's to actually filter for the state of Alabama. So let me scroll back over. I'll hit that dropdown by my state. I'll choose Alabama and click OK. All right, and then I'm ready to load all of this to the spreadsheet. Okay, and on my AlabamaSurveyResults, I'll go to Sheet4, and notice that it tells me I'm in the SurveyPostalCodes. I'll go back to Sheet5. So that on-screen Query & Connections can help you. So I'll go ahead and name this AlabamaOnly. I'll go to Sheet4, name this All Survey with Postal. Original Data. And my Postal Codes data. Rephrase, I'll go ahead and delete Sheet1. I don't need it. Okay, if I click back on the AlabamaSurveyResults, notice that it takes me directly to the Alabama sheet. Okay. I'll go ahead and save my work. Modeling data has become so much easier over the years. We model data to get it ready for further cleaning, for better reporting, but just remember, a little bit of knowledge will take you a long way.

Contents