Many Excel users these days are dealing with increasingly large lists of data, and suppose this list that we're looking at here was sent to you. I think pretty quickly you'd realize that in rows 4 and 5 you've got somebody with the same name. Well, that's certainly possible. But then you look at the columns to the right: same building, same department, same Social Security number. Well, obviously we've got a problem here. We've got a duplicate record. Not uncommon these days. And Microsoft has recognized that, and as of Excel 2007, you will see that in the Data tab in the Ribbon, there's this choice called Remove Duplicates, and it's a great feature, and we'd definitely use it here.
But there are times when you want to know something else about the duplicate data. You want to, in effect, know which records, or which rows, have been duplicated. In other words, let's not simply get rid of the duplicates, which is very efficient and fast; let's find out which records are duplicated. This particular list here only goes down to row 762. I say "only," but even that's a bit difficult to scan and figure out manually what's duplicated and what's not. So, let's do this by way of a special kind of formula called an array formula, and let's insert a new column to the left of column A. You can simply right-click on column A and choose Insert.
And we're going to be putting in a formula here, and although ultimately the column doesn't have to be that wide, let's make it wider now, so we can see this a bit better. And in English, here's what we're about to do. We're about to use an IF function to say, in effect, if this cell is equal to this one. Of course we want to do this for the entire list, but let's just for the moment focus on these two: if these two cells have the same content. Now there could be, as suggested, people with the same name, but if that's true and the building is true (in other words, they're both the same), and the same thing with department, the same thing with the Social Security number, and theoretically we could go all the way across the columns and say, in effect, are all of these the same? Maybe if you say we only have to worry about this many of them, that's probably going to be sufficient.
In other words, if those are the same, we've got a duplicate. So, how might we do this? And let's just start right here in context. We'll essentially drag it downward, but also up a few cells as well. What we need to do here is to put in an IF function, and we in effect want to say, are the two Name cells the same, the Building cells, the Department cells, the Social Security Number cells? We're going to be using an IF function. If I were to put in right now B4=B5, we're more or less on the right path, but then we'd have to also, after putting in a comma, check to see if C4 equals C5. Do this for column D, column E, and maybe in a real-life situation all the way out to column M or N or however far it goes.
You can begin to see how long this formula might be. We also need to put the word AND in front of these comparisons. Now this is a function in its own right, but it's very often used within an IF function. Here's a shortcut that you wouldn't exactly guess. Instead of putting in these separate comparisons here, let's just change all this back to B4, and then we'll do it by dragging: B4 through E4 = B5 through E5. Right parenthesis.
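The two ways of writing the condition described here can be set side by side. This is only a sketch: it assumes the data sits in columns B through E once the new column has been inserted, and the long-hand version is written out for comparison rather than copied from the screen.

```excel
AND(B4=B5, C4=C5, D4=D5, E4=E5)
AND(B4:E4=B5:E5)
```

The first line spells out one comparison per column. The second compares the two four-cell ranges pair by pair, which is what makes the finished formula an array formula.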
We're not finished yet. This is a compound condition, comma. When all of those four tests are equal to one another, what do we have here? We've got a duplicate. Let's just put in "Duplicate," comma. When they're not all the same, in other words if any one of those pairings is not equal to the other one, we want to put in "Unique." In other words, the records are not the same. Double quote. Right parenthesis. Now, if you've never worked with an array formula and you saw this being explained and you missed the last step, you'd be lost, because with most formulas, we press Enter.
That's not very valuable. Jumping back in to edit this. An array formula, which allows us to handle multiple sets of data (you might call them parallel sets of data), works only when you complete the formula not by pressing Enter, but by pressing Control+Shift+Enter, as I'm doing right now. And that is a duplicate. It's comparing these four cells with these four, one by one. Once again, here's what the formula looks like. What I'm going to do first here is click in the cell and look in the formula bar.
An array formula is indicated by curly braces. Now, I didn't type these, and you never type them in these kinds of formulas. And when you edit them, either by clicking in the formula bar or by double-clicking here, you don't see them. And some of you are probably saying to yourself, "Is he making this up?" No, I'm not. This is the way it works. These are array formulas. You must press Control+Shift+Enter to make them work. I'm going to drag this manually up into these two cells, and we can see clearly here that in this formula here that compares these two, that's not a duplicate.
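Written out, the formula dictated above would look roughly like this. The cell addresses are an assumption: they suppose the formula was entered in A4, next to the first of the two matching rows, with the data in columns B through E.

```excel
=IF(AND(B4:E4=B5:E5),"Duplicate","Unique")
{=IF(AND(B4:E4=B5:E5),"Duplicate","Unique")}
{=IF(AND(B3:E3=B4:E4),"Duplicate","Unique")}
```

The first line is what gets typed before pressing Control+Shift+Enter. The second is how the formula bar then shows it, with the braces added by Excel. The third is the copy one row up, where the relative references shift so that row 3 is compared with row 4.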
In this formula here that compares the entries in rows 3 and 4, that's not a duplicate. But this one surely is. And double-click to copy this downward, all the way down the column. And as we scroll through here, here and there we will see a duplicate. There's one there, Angus Kent. And we will see a few more here. So we've identified which ones are duplicates. We haven't eliminated them, even though for many people that's the most important thing. But at certain points you do want to know which ones are duplicates, and this array formula does get the job done.
I use this formula a lot, and not that long ago I forgot to put in the AND, so don't forget the AND, and don't forget all the double quotes as you see them here. So this is a great way to identify which records have been duplicated in a list.