Join Gini von Courter for an in-depth discussion in this video Troubleshooting queries, part of Power BI Features in Depth.
- View Offline
In the Chapter Two Exercise Files, you'll find a workbook called CIA_data. If you don't have access to the files but you've been working along then, this is the same section of data about population growth rate that we used previously in this chapter, this data comes from the CIA. And it has some issues that I'd like to show you, so we can talk about how we troubleshoot data. If I take a look at the data that I have, it looks like population growth rate, which is a key column, is text.
And I say that because it's on the left-hand side of the column. I'd like to convert this text to a number because I want to use it in calculations. I want to be able to illustrate it with a chart. And if it doesn't have any actual value, I won't be able to do that. So let's click Query and Edit to fire up the Query Editor. And I'm going to click this column, Population Growth Rate. And I want to convert its type to a number, currently it's text. So I'm going to choose Number, and it looks pretty good.
Now, one of the things I've learned to do though, is to take a look at the different types of information that might be in a column. Positive numbers, negative numbers, zero values. And when I scroll down I find that my zero values are okay, but when I hit the negative numbers, all of a sudden I get errors. And it says it can't convert this to a number. Now, before you assume that the Query Editor doesn't know what to do with negative numbers, let's take a look at some other possibilities. So, why might it not be possible to convert something that looks like a number to a number? One possibility is that there are extra characters that when somebody was typing the minus signs, in front of this particular set of data, that they were putting a space first.
Or perhaps they put an apostrophe. Perhaps there's something else there. So let's change this information back. I can either go back and choose Text. But it's easier if I get rid of this applied step. And let's scroll down and take a look. Now, I don't have to spend a lot of time trying to decide if there are extra spaces here. I can simply get rid of them if there are. I'm going to select this column. And I'm going to use a text transformation that's called Trim. The Trim text transformation works the same way that the Trim function does in Excel, which is, it gets rid of any leading or trailing spaces. So if you have somebody who's typing numbers, and they have a habit of typing a space afterwards, because they spend more time in a word processor or typing email than they do in Excel, then that trailing space might be preventing us from being able to transform this data. Any time you have an extra space at the start of, or at the end of a string, this is how you get rid of it. So I'm just going to trim this. And I could try then again to convert this to a number. That makes good sense to me. But unfortunately I still have errors. So let's change it back, get rid of the Change Type step. Another possibility is that there are characters in this column that don't print, and that prevent this from actually being a number. For example, sometimes you'll import data and it'll have a non-printing character, a small rectangle here. And that means that that data came, usually from a fairly large database, a relatively old database, when it just exported these little object markers that actually stand in for code that we can't show on the screen. So it's possible that we actually have some non-printing characters in here, and if we do, and again sometimes you'll see them, they'll usually be small blocks. I'm going to say, well let's just clean this column up. If there's any non-printable characters in there and that's my problem, let's clean it. So now I've done a second transformation. Doesn't look any different, but I wouldn't have expected it to. So now let's select our column and let's convert it to a number. Let's scroll down again. Nope, still have errors. Wow, so at this point, I change my type back. I could spend a lot of time coming up with various theories.
When I scroll down, what I see is that I get errors any time I have a negative number, even negative zero. So there's something about the minus here. There's something about the fact that these numbers are negative that makes them different. Now, before I spend any more time on this, I want to show you one more option. Once I've changed this column to numbers, if I take a look and I say, you know, I'm fine with just going with the first 196 rows here. Now, this really needs to be a good decision.
In this particular case, it leaves out a large part of the story. Some countries are declining in population, but there might be other times when you have 10,000 rows of data. You ultimately only need, for example, 500 in a sample and you have some random data that won't convert. In that case, you actually have the ability to go on and say, you know, just get rid of all the errors. And when you remove all the errors, you'll be left simply with data that is error-free.
But, I'm not going to do that, and I'm not going to change the type to a number. If we go take a look, this is an interesting thing, and the reason I'm showing you this is, this is not an infrequent issue. That looks for all the world, like minus 0.01. But it's not. The symbol that's there is actually not the minus sign. It's another symbol. And it is probably some version of an em dash. The way this normally happens, is you have data that was originally stored in Microsoft Word. And when it was transformed to Microsoft Excel, the minus character that was typed into Microsoft Word was actually converted to a longer dash by Microsoft Word.
And then when we pasted that data back into Excel, or imported it into Excel, it was no longer a minus. It was actually a dash when it came in. So because I have a dash here, Microsoft Excel wouldn't know what to do with this and Microsoft Power Query doesn't know what to do with it. It knows that it's text in the same way as if it were a comma, or an exclamation point, or any other piece of text that's not a minus sign. So what do we do? Well, what we're going to do is we're going to replace them.
I'm going to click replace values. And notice that I started by choosing one of the rows that actually had a minus in it. And I don't want to replace these. I simply want to replace the minus sign. So I'm going to tab down and I'm going to type a minus on my keyboard. And I want you to notice that those are definitely different in size. The minus symbol is a pretty small symbol. This is a dash, which stylistically, is easier to read when you're in Microsoft Word.
But makes a text rather than a number. So in this selected column, replace every one of those with this. This is the easiest way to do this. You can spend time trying to figure out exactly what this symbol is, you can use, for example, the code function in Excel, not in Power Query, but in Excel, to tell you what the numeric value ff this particular symbol is, but the truth is we don't care. We just want it to be a minus. So we're going to find every instance of that value and replace it with a minus sign, and I'm going to click OK.
Now when we scroll down, you'll notice that those minuses are a lot smaller. And now when I choose my column and say, you know what? I need a number data type here, and scroll all the way down, there are my negative numbers. This time, Power Query is actually able to transform the data type from text to number for every single one of my records. This is how you approach troubleshooting. Text is our most generic way to store data. And, almost anything is allowed, in a text field. Spaces, characters, symbols even, like the em dash.
When we want to store data as a date, or as a number, as a time, even as a logical. Something that we want to be able to use to manipulate it, to be able to summarize or assess. Whenever we want to do that, then the parameters are far tighter. And it's not unusual that we'll need to be able to go in and clean or trim our data in order to be able to change its data type to a type that we can use in a summarization in Power Query.
- Understanding data analysis and business intelligence
- Installing Office BI add-ins
- Searching for online data with Power Query
- Shaping data in the Query Editor
- Connecting to data sources
- Modeling data with Power Pivot
- Enhancing PivotTables and PivotCharts with PowerPoint
- Visualizing geospatial data with Power Map
- Creating and formatting Power View reports
- Sharing your data using Power BI for Office 365