To get the most from Excel's Data Management tools, make sure list-type data has no empty rows or columns and that column headers are in a single row on to. Avoid putting too much data in a single column; people's name should be in reverse order or in separate columns. City, State, Zip should be in separate columns.
- Before working with lists and Excel, you should be aware of a few standards that are going to help you be more proficient as you consider using, for example, sorting, filtering, subtotal, eventually maybe, pivot tables. All these tools work best if your data adheres to a few basic standards. Now, I've violated a number of standards in this workbook, it's called poor design. And Excel didn't stop me at any point, I've got data here, I can certainly use some of this data, I can print it, there are a lot of things I can do. But on the other hand, there are a lot of features that are not going to work very well as I try and, for example, sort or filter the data the way it currently appears.
First of all, what might strike your eye is that empty row 13. Now, again, with a small list, it's not going to cause any huge problems or anything like that. But as simple of example here, what if the list is substantially larger and there's an empty row in 100 or 300? What happens if you click within the data here and say to yourself, "I want to sort the data"? Now, with the obvious row showing here, immediately we'll see what can go wrong. Click on the data tab in the ribbon and even if you don't know anything about sort, as I click this button, look what happens.
You can see in the background that excel is about to sort only the data that we see. And what else are we seeing here? There's a hidden column, column E and we'll get to that in a moment, but I think you can see clearly, if we proceed with sorting here, we've got a mess on our hands. So let's get out of here, cancel. First of all, the sort only went down to row 12 because all this information down here is empty. Excel will not proceed below that unless we override it. So, let's get rid of empty rows. Someone might have put these here for printing reasons, just possibly to separate the list a little bit.
If you want to work around this and work more efficiently with Excel, get rid of the empty row. Right click row 13 in the example here, delete. Now, if in printing, if you want to simulate the look of an empty row or a space, you could click on row 13 and then drag its bottom border, the line between 13 and 14, drag it down roughly twice as tall, doesn't have to be perfect, like that. And then in your print preview, and I'll get there with control P, you'll see that there is a break in the data and that could be all you're looking for.
Escape to get our of here. But ideally, that wouldn't be the case. Go back to a normal view here. I'll double click the boundary between row 13 and 14. Now, the titles here look a little bit unusual. And they're in two separate rows. Bad idea. You wouldn't know that at first but, for example, if I click here and start to sort, Excel in the background, in this case, does figure out that I've got two rows of labels on top. But you can't count on that working all the time. Then if you start to work with this data and pivot tables possibly, or if you're about to insert subtitles, occasionally Excel doesn't see that properly.
Get your titles in a single row. A not so run on shortcut here that's really helpful, I'm going to adjust, not all these, but I'll show you an example. In column A when I put contact name together, we've got enough space there, so I'm going to double-click in front of the N in name, double click. And now I'm going to type contact. If I do want the word name to appear under this, but in the same cell, I'll press alt enter and then enter. And that's acceptable, I will eventually get rid of row one. I would adjust the others as well here too, it's going to take me a little bit of retyping time, but one more example here.
I can do this either in D one or D two, but eventually put these together. For example, right here, I'm going to double click behind the Y in social security. I'll press alt enter to introduce a line break and then type number and enter. And I'll get rid of this data here and simply move this down. And eventually I'd adjust the others in a similar way and get rid of the top row after I've realigned the names in the text here. So if you do need multiple words and it would make the column a lot wider, use the line wrap. You can do this using another technique on the home tab called wrap-text, or when typing you can use alt enter.
So you clean those up to make that work ideally. Now, earlier when I jumped in here to start sorting, I had an empty row, now if I start to sort, it does not see the empty row, but there's still that problem of a missing column. Not really missing here, but hidden. There's nothing wrong with hiding columns, makes good sense sometimes. Maybe you want to print this list and not show column D, let's hide it. But where's column E? Drag across the columns that surround the empty column, you can right click unhide or you can double click the boundary between the two.
There we are. That's empty, maybe you did intent to put data in. Fine, go ahead, give it a label, put in some information. But if we don't really need this, get rid of it. You don't want empty rows or empty columns within your data. Right click and delete this. We've got some other problems, columns A, B, and to a lesser extent column C, all have the same problem. And that is too much data in the same column. It doesn't necessarily look like a lot of data, but the way this is structured here, we cannot sort this list by last name.
Now if it's only five, 10, 15 that's not a big deal, we can probably find the names pretty quickly, but at some point or other you're likely to want to sort this list by last name. And you cannot do that with the names looking the way they are right now. The suggestion here is to do one of two things. Either split these names into two separate columns, one called first name, one called last name. And by the way, you certainly could have situations where you have middle initials as well, maybe it'll take three columns. That's one approach. Another approach is to restructure these names so that we see last name, comma, space, first name.
For example, Wagner, comma, space, Max. Now, there are multiple ways to clean up data. But in this course we're only going to discuss those in passing. Here's a technique you would want to use to clean up column A. I'm going to right click column B and insert a new column. There's a relatively new feature called flash fill, it's found on the data tab. It is one of the techniques used for cleaning up data. Another one takes a bit longer, more involved, but more capability ultimately, text to columns.
But here's a feature called flash fill. Automatically fill in values. I'll use this here in column B by typing Wagner, comma, space, Max. Now, I'm simply going to press control enter so the active cell doesn't move, I'll point to this flash fill feature, it has a key stroke shortcut by the way, control E, watch what happens in column B as I click flash fill. We see a corrected entry. Now the titles are a little bit off, we don't are about that necessarily, but the names are the way we want. And then we get rid of column A eventually.
So we clean that up. Now, I'm not going to clean up column C, but that's a problem too. Can we sort this list by state? No, we can't. Can we sort it by zip code? No, we can't. So either using flash fill or text to columns, we could clean up that data. But once again, we've got the issues of too much data in a column. In other words, too much data in each successive cell. We can't sort our list by state or zip code like this. We need to split this data into three separate columns. Now, column D you might not need to do this.
But, if this were a city department and we need to locate all the people on Elm Street or Elm Court, we can't sort our list here by Street name or even by the street indicator. And there might be an Elm Court, an Elm Lane, an Elm Place, and lots of other Elms within the city. Like in the Atlanta where there area lot of Peach Tree. So we can't sort this list by street name unless we split this into separate columns. And again, with certain kinds of data that might never be needed, but nevertheless, we could sort this by street number and street name and even street indicator.
In other words, court, lane, boulevard, etc. Alright, next column. Social security number. As I click on one of these cells, look in the formula bar, the dashes are already there. Now, that's not horribly wrong and certainly the data is readable and all that. If you are making comparisons or using look ups, trying to match this up with social security numbers elsewhere, it's quite possible the other social security numbers don't have dashes in them. Not everybody is aware of this idea. You can have a special format applied here and you don't need to have the dashes.
So I'm getting rid of the dashes. I'm going to click column E here, and on the home tab in the ribbon, far right button here, I'll choose replace, replace what? Dash, replace it with what? Don't put anything in the replace with panel, replace all. Look what happens. 38 of them eliminated. That means we've got 19 of them, 19 different rows here of data, we got rid of them. So what do we do now? Apply a special format. This is being applied even to cell E too, although there's text in it, won't make any difference.
There is a special format here. We can press control one to get there quickly. That takes us to format cells. On the number tab, special and there it is, social security number, click OK. Now we see the dashes, you look in the formula bar, you don't see them. What does this mean? Every time we type a social security number we type nine characters, not 11. Any time we make a change, if we double click in here to edit, or if we click on the formula bar to edit, we don't add the dashes we simply make the change and press enter.
Similarly, with phone numbers, it looks as if the parenthesis were actually types as well as the space and the dash. So these were entered by typing 14 characters. It's going to be faster if we need to only type 10. Or when adjusting these if we only make the number change. So here too, let's press column F, use find and replace. Let's begin by replacing all dashes with nothing, that'll take care of the dashes. We see that they're gone. Now we're going to replace the left parenthesis with nothing. Replace all, they're gone, 19 of them.
And now we want to replace right parenthesis and space with nothing. Right parenthesis space, replace all of them in all three cases here, those are gone, 19. And let's format these as we did with the others. We could close that dialogue box. Press control one again and this time we'll go to special and chose phone number, OK. And double click the column boundary between F and G. and here, too, look at the formula bar, we don't see those parenthesis space and dash.
And then 10 we double click to edit, we'll make our number change without adding or dealing with those other characters. Nothing wrong with column G, this is displayed in a way that you're not necessarily familiar with. But if you want to change that, a common choice would be in the number group on the home tab, click the drop arrow, choose short date. Not necessarily better in all cases, but it could be the display that you like. Column I, putting two kinds of dates in the same column, make each of these unworkable. We cannot use these in any computational way if we've got two sets of dates here.
You want these in two separate columns. Starting date, ending date, last date, something like that. So, this too is a problem. A number of different issues here in this list that are inappropriate for working with the data efficiently. There are certainly some other examples, other ways of putting in data that might also violate standard rules for improving productivity, but here are some common ones. And the one I see most commonly is first name last name as we see them over here, we want to turn these around. As I did in column B there.
Not the way we see them in column A. So a number of different techniques here on this worksheet called poor design. So when you're setting up data, remember some of the tips that we've seen in looking at this list. We want to avoid all the problems that can occur if we don't structure our data efficiently.
- Prepping data for analysis
- Multiple-key sorting
- Sorting by rows or by columns
- Setting single- and multi-level subtotals
- Using text, numeric, and date filters
- Creating custom filters
- Filtering tables using slicers
- Using Advanced Filter
- Eliminating duplicate data
- Using SUMIF and COUNTIF functions for quick data analysis
- Working with the database functions such as DSUM and DMAX