You sometimes need to use Excel to check in
a data set where there are duplicate values.
So in this example, what we have got here
is a transaction number and you’ll see the
first one is 100123 and as you come down you’ll
see that same number is repeated a couple
of times.
So what we want to do is to tell Excel to
please find where these situations happen.
The function to use is the COUNT IF function.
The key logic is how we are going to do the
counting.
Just working back we want Excel to look at
the first row and to say, that transaction
number - have I seen it before? – just looking
upwards.
When it goes to the next cell, it must look
between these two and say, have I seen it
before, yes or no – how many times?
Eventually we will go all the way down here
and by the time we get to this cell (B18),
what it will do is count how many times its
seen this transaction before and in this case
it will say, oh, there is another one here.
So you are always looking above.
So lets do the “COUNT IF” function.
The range is where we are going to make a
major difference.
So I’m going to say please look at the range
from here and in this case by pushing semi-colon
(;), you’ve got the same range (B11;B11)
– and the criteria, I’m going to make
it this cell (B11).
What is key is, as I copy it down, I want
the first cell to stay the same (of the range)
but the second reference can move.
So I’m going to freeze SBS11 and press ok.
If you see Excel’s result – it has found
one time of this transaction.
If I copy it down to G31.
As we go down, you’ll see the area it looks
at is ever increasing until it gets to row
G18 where its now looking and saying in this
range, I have found this transaction twice
– and going down you’ll see 3, 4, 5 etc.
And now what we want to do is in a separate
column, we want to tell Excel where is the
first instance of any particular transaction
number shown.
So, for example, this 2, 3, 4 – all of them
are irrelevant to us – we don’t actually
care – we know that that means that they
are duplicates – all we care about is the
1’s.
So a simple way to do it is use the “IF”
function and just say, if this is a 1, that’s
fine – we’ll just put a 1 for example,
if its not a 1 put a 0 or another word.
So I’m just going to use the function wizard,
go to the “IF” function- the logical test
here is that if this cell is = to 1, then
I know that that is a unique situation, if
its true in my case I’m going to put a 1,
you may put a word saying “unique” or
whatever you want it to be.
If its not a 1, then it must be a duplicate,
so you can either be a word but I am going
to put a 0 in, say ok and now I copy it down.
So what you’ll see here is that all unique
transaction numbers, get 1s.
Wherever there are duplicates, you now get
0s.
If you wanted to now, you could set up the
data filter.
You could say just show me the 1s and you
will see only the unique numbers.
What we can now do is take it a step further.
What we want to do is uniquely identify the
unique items.
So what we want to be able to say is- give
an absolutely unique number to the lines that
are unique – so for example, that will be
a 1, that will be a 2.
When it gets down here where there are duplicates,
we don’t actually want numbers for these,
but we want unique numbers here.
This will allow you to do things like “VLOOKUP”
on the set of data.
The way you can do that is by using an “IF”
function.
So let’s activate the function wizard, get
the “IF” function.
What we want to do now, is every time we see
a 1, in this first instance column, I want
to have a unique number so I’ll take a number
and add 1, if there’s not a 1, and there’s
a 0, then I want Excel to ignore it and put
a 0.
So the logical test is, is this cell = 1.
If its true, and again we are going to use
a growing boundary – I want it to sum.
So we go from here to itself (I am just going
to freeze that).
If its false I want to put a 0.
Now this is the key – the easiest way to
show what this is doing is to copy it down
– say ok, copy it down.
What you’ll see the first time it finds
a 1 so it adds the 1 in if you go down again
– it sees a 1 so it adds these 2 together.
If we go down again, it sees a 1 so adds these
3 together.
When it gets to the first 0, it sees 0 and
we have told it- please make it a 0.
When it finally finds another 1, it then goes
and adds all of them together.
What you can now see, is that you have got
an absolutely unique number for every unique
row of information and you’ll see it goes
up to 13.
At this point now it is possible to extract
all this information using something like
a VLOOKUP command.
What you could do is using a VLOOKUP command
you could set up a separate spreadsheet, have
a grouping of numbers – 1,2 3 etc and all
the VLOOKUP will do is it will look on this
table, find the unique item called 1 and maybe
pull through this set of information.
Another problem with duplicate values is that
sometimes you are not looking for a duplicate
value at this level, you maybe want it to
be the same transaction number, the date,
etc.
For this methodology what you can then do
is you need to create a single cell that represents
absolute uniqueness.
The way you will see I have done it here is
I have used CONCATENATE and I’ve joined
all the cells together in this case - so what
I’m going to be looking for is the exact
duplicate where columns A, B, C, D etc are
the same.
The only way to do this is to create a new
column, create this unique key and then follow
through and do the exact same logic here (count
how many times you’ve seen this before).
