As fast and easy as the Remove Duplicates command is to get rid of duplicate data, what it does not do for you is indicate what has been duplicated. So for example, in this list here which does have duplicates in it, there is a Juan Bishop in Row 5, also one in Row 10. If we want to identify which records have been duplicated before getting rid of them, what we need to do is sort the data. So I do have another list here, an exact list, but it has been sorted, clicking over here.
And in this list we will see two Juan Bishops together. If you do want to identify duplicate records here, what we need to do is after doing the sorting, insert a new column to the left of Column A. And what we are going to do here is to put in a formula. I want to show you or start off at least with the long way and then show you a nice quick way using an array formula to identify which records have been duplicated. So, here's the formula I am going to put in here. I can start probably in Row 2.
It's good to see everything in place. =if. Now you may or may not be familiar with the If function but this I think almost explains itself. Here is what we'd like to say. And we want to include the word And. And for the moment we are only thinking of rows 2 and 3 and we know they are not duplicated but they could be. So we'd like to indicate if this cell B2=B3, comma, and if C2=C3, comma, and if D2=D3, comma, and on and on and on.
Now this is going to get really long, but if they are all equal, and why don't I just cut it short here for the moment and then show you a better way to do this? But we're on the path here for this to make sense. If these pairs are all equal to one another, I'll just close this with a right parenthesis, comma. In other words, if all those pairings are true, that they are all matched, then what do we want to see here over in cell A2? "Dup". But if that's not the case, if any one of those is not the same, then we want to put in the word Unique. Right parenthesis and Enter.
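Written out, the formula described so far would look roughly like the sketch below. It assumes the formula is typed into cell A2 of the new column and stops at Column D, the last pair named aloud; the narration cuts the list short, so any further pairs (E2=E3 and so on) would follow the same pattern.

```
=IF(AND(B2=B3,C2=C3,D2=D3),"Dup","Unique")
```

Because A2 compares Row 2 with Row 3, the "Dup" label lands on the first row of each matching pair in the sorted list.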
And those are unique and we can copy this down the column relatively quickly simply by double-clicking the lower right-hand corner. So all these are unique and sure enough down here in Row 50 there we go, 53, that's a duplicate, and we'll find others eventually. Now we've got several hundred names. No reason to scroll through all of this. We could have 70,000 or 700,000. This at least identifies the records that have been duplicated. Possibly you can use a filter and just show the duplicated ones. We could do that.
You will need a dummy heading up here. Put in anything for the moment. If we introduced filtering right now we could then on this column right here simply show those that are dup and then see the list of the records that have been duplicated. Sometimes that's important, sometimes not. But this is the way to get there. Let me remove the filter and to complete this formula and actually do it all the way across, in other words not just to Column F but all the way across, seems like it would take a lot of work.
In other words, to really do the check here fully we'd want to check all the way over, in this example, to Column L. But here is a better way to do this and there is a special kind of formula in Excel called an array formula. So what we'd like to do here is actually change this to read B2:L2=B3:L3. In other words, as I delete all this, all the cells B2 all the way over to L2 we are going to check all of those and we are going to compare them with the corresponding cells in the row below.
And this is a lot shorter than what we just saw. Now, this is what we call an array formula. In order for this to work properly I need to press Ctrl+Shift+Enter, not simply Enter. Ctrl+Shift+Enter, there we go, and I'll double-click to recopy all this. We'll still find our duplicates down there in the same location, but it's a much shorter formula. Let me once again display this a little larger so we can see it, make it a little bit wider, and so what we are seeing again in English-- let me make this even wider so we can see it.
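As a sketch of the shorter version, assuming it again goes in cell A2 and the data runs from Column B through Column L as described:

```
=IF(AND(B2:L2=B3:L3),"Dup","Unique")
```

When the formula is confirmed with Ctrl+Shift+Enter, Excel displays it in the formula bar wrapped in curly braces, as {=IF(AND(B2:L2=B3:L3),"Dup","Unique")}. The braces are added by Excel and are not typed by hand.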
We're going to compare B2 with B3 and C2 with C3 and on and on and on all the way over in Column L. If all those are the same, we've got a duplicate. If any one of them is different it's going to be unique. So this is how to identify the records that are duplicates. It's an unusual kind of formula. You may or may not have seen array formulas, but I think you'll see pretty clearly. Any time there is the need to identify the duplicate records, a formula like this will get the job done. If you'd like more information on array formulas and how they work, you might want to check out another course in this series, Advanced Formulas and Functions for Excel 2010.
There is an entire chapter on array formulas there.
- Multiple key sorting
- Single and multiple column numeric filters
- Creating a top-ten list with values or percentages
- Setting up subtotals
- Creating multiple-field criteria filters
- Creating unique lists from repeating field data
- Using the Remove Duplicates command
- Finding duplicate data with specialized arrays
- Counting the number of unique items in a list
- Using SUMIF and COUNTIF functions
- Working with the database functions such as DSUM and DMAX