Welcome to Excel 2013
Statistical Analysis Video
number 12.
Hey, if you want to download
this workbook and follow along,
click on the link
below the video.
Hey.
In this video, we're going to
still be using our data set.
These are sales transactions
from our boomerang sales
website.
And in this video, we want to
look at age and payment method.
And so what it means is we
have one, two variables.
And the perfect tool
to look at frequency
distribution with
two variables--
the pivot table.
Now, so far, frequency
distributions
have been just one variable.
But the pivot table
can do 1, 2, 3,
4 variables as easy as dragging
and dropping the field names.
Now, I want to go look
at the end result.
We're going to go
to CrossTab Answer.
And here's the pivot table,
and here are the three charts.
But let's just see how a pivot
table with one, two variables
works.
Any intersecting cell is a
calculation with two conditions
or criteria.
Now, we're building a
frequency distribution,
so we're counting.
But what does that "six" mean?
That means we went over to
the data set and counted all
of the 34- to 40-year-olds
who used MasterCard--
over here, how many people
55 to 61 who used PayPal.
So any intersecting cell is a
calculation with two criteria.
Our calculation is counting.
Now, let's look at these
different charts, 1, 2, 3.
They're all looking
at the same data,
but each one is articulating
a slightly different message.
Now, look here.
We have up in the legend
the payment methods.
And down in the horizontal
axis are our age categories.
This chart here is called
a "clustered column."
In the textbook, they call
it "side-by-side bar."
I'm going to use the names that
Excel uses, because then it's
easier to pick them.
Now, what this
chart emphasizes is
the actual items
from the legend,
we're comparing them
for each age category.
We can clearly see
the purple one is
biggest for the age 27 to 33.
That means PayPal.
So the emphasis here is
comparing the legends
in each age category.
Let's go over to this chart.
This chart is not
covered in the book.
It's called a "stacked column."
And I very much like this chart.
This chart emphasizes the
height for each one of the ages.
And yet we still get to
see inside this column
the actual individual pieces.
The actual size here
means the count.
Now, whereas this chart
emphasizes the total column
height for each age, this one
is emphasizing the total column
height for each
individual payment method.
Now, the third chart--
and this one is called in
Excel "a 100% stacked column."
In the textbook, they
call it "stacked bar."
Now, the key to this
is that every column
is the same height.
These are percentages.
It's as if we have pie charts--
1, 2, 3, 4-- but they're
orientated like a column.
They're all the same height
because they all equal 100%.
So you've got to be careful
with this one, sometimes.
But the nice thing about
a 100% stacked column
is this is a quantitative
variable here, A.
So you can sort of see
the shape or pattern
as we go across
the age categories.
But don't get confused.
This is percentage.
This is the actual heights--
1, 2, 3-- all from a
cross-tabulated pivot table.
Let's go and create these.
I'm going to click on CrossTab.
Click in a single cell.
Insert, Pivot
Table or Alt-N-V. I
want to put it on
a new worksheet.
So I'm going to click
OK or hit Enter.
Now, we want to start with Age.
I'm going to grab Age
and drag it down to Rows.
And oh, how amazing
pivot tables are.
Right-click the
Row Label, Group.
And we're going to use our
same start age and increment
from our earlier videos.
Hey, we're going to
use the min age of 16.
Tab, Tab.
We don't need to change that.
It'll be automatic.
When we use our By field,
this is the class width
or increment.
We're going to use 7, just
like we did in earlier videos.
Click OK.
Instantly, that's grouped up.
Remember, this is an integer.
Even though it's a continuous
quantitative variable,
the integer gives us very polite
labels for our categories.
There's no ambiguity
here, 16 to 22.
And because there's
nothing between 22 and 23,
we immediately jump to 23 to 29.
Now, I can come over.
And we're going to get
our Payment Method.
Drag it to Columns.
Boom-- instantly,
one, two variables.
Notice this is a
quantitative variable.
But we've rolled it up in
groups so the pivot table
will interpret it as
a category and count.
Quantitative variable,
categorical variable--
for cross-tabs, you
can do it any way.
You can have two categorical,
two quantitative,
or, in our case, one of each.
Now, we're going to drag
any particular column down
to Values, because it will
just count any time it gets
a match for the two criteria.
So I'm going to drag Age down.
Now, I'm going to
close the field list.
I'm going to change the layout.
So Design, Report
Layout, because I want
to show the actual field name.
So I'm going to
show an outline--
Customer, Age, and
Payment Method.
Click in cell A3--
"Cross Tab of Count."
Now, you can do your
analysis from here.
We can clearly see MasterCard,
age 23 to 29, there were 10.
PayPal, for this
age category, 13--
Visa, there were 14.
Now, but how about
a visualization
to help us more quickly
see our information?
I'm going to click
in a single cell.
Insert.
Click the Column.
This one is clustered
column or, in the textbook,
side-by-side bar.
This is the stacked column.
It's not in the textbook.
This is what they
call a "stacked bar."
We're going to call it
a "100% stacked column."
I'm going to select
the first one.
Boop.
That's looking pretty
good from the start.
But we do want to
change a few things.
Right-click, Hide
All Field Buttons.
Click on the legend.
Right-click Format
Legend or Control-1.
And we're going to
put it at the top.
Let's close our Task pane.
Let's use the Plus.
And we're going to
select Axis Titles.
This one is selected.
Type an equal sign-- jumps
up to the Formula Bar.
Click on cell A3 and enter.
Click on the
horizontal axis label.
Equal sign-- I'm going to click
on Customer Age and enter.
Now, one important
thing is we don't
want the columns to be touching
here in this cross-tab.
But now, we can clearly
start comparing.
In the age group 37 to
43, PayPal is the most.
Over in 44 to 50, this age
group didn't even use Discover.
51 to 57-- clearly,
Visa was the highest.
But look at this.
Here's something.
You'd think that the younger
categories would use PayPal.
But for some reason,
this upper age, 58 to 64,
are using a lot of PayPal.
Maybe one of the
senior centers that
featured this website
for buying boomerangs
taught everyone how to
use PayPal or something.
All right.
So this chart is quite nice for
comparing our different payment
methods across our
age categories.
Now, I'm going to copy this.
Click on the edge--
Control-C. I'm going to
pull it off to the side.
Click right here--
Control-V.
And it's just as simple as
changing the chart type.
Right-click, Change Chart Type.
And I want this second
one, Stacked Column.
Stacked Column--
when I click OK, this
emphasizes the actual
total for each age group.
So which age group bought
the most boomerangs
or had the most transactions
from our website?
23 to 29.
Which one had the least?
Looks like 65 to 71.
Ha, but look at this.
We can still see the
individual comparisons
based on our legend.
Now, one last chart--
I'm not even going to copy it,
because it's already cued up.
I'm going to pull
this to the side.
Click here--
Control-V to paste it.
Right-click, Change Chart Type.
We want to select the
100% Stacked Column.
Click OK.
This chart allows us to look
at our quantitative variable
along the horizontal axis.
And up in the legend, we have
our categorical variable.
And we can clearly see
some kind of pattern.
Well, actually, there's
not much of a pattern.
It's up and down, up and down.
Some other data set, you
might clearly see some pattern
over whatever
quantitative variable
we have on the horizontal axis.
Wow, that's pretty amazing.
I'm going to Control
and roll my mouse
to roll out to zoom
out and arrange these.
And I want to do one last
cross-tab, just to show you
how amazingly easy this is.
CrossTab-- how about we
look at region and product?
These are two
categorical variables.
Click in a single cell--
Alt-N-V. And I'm
going to hit Enter.
I'm going to drag--
oops.
I have two bad sheet names.
I'm going to come over here.
And I can't call this "CrossTab"
because there's already
this or a "CrossTab Answer."
Since this is the answer,
I'm going to put "an2."
You can never have
the same sheet name.
This one, I'm going to
call "CrossTab Answer 3."
And all we're going
to do is drag Region
down to Columns, Product
down to Rows, and then
whichever one down to Values.
Close the field list.
Design, Report Layout,
Show an Outline.
Click in cell A3--
"CrossTab Frequency."
It's as easy as that.
We can clearly see for the
particular product "Carlota,"
we sold six in the
East, 14 in the Midwest,
three in the South,
and 21 in the West.
Now, one other important
aspect about a cross-tab,
check this out.
There's a grand
total here and here.
If we were to drag the region
field away, which we'll
do in a second, guess what?
Product names-- frequency.
Not only that, but you can
also see the regions here.
And here's the frequencies
for those in the grand total.
Now, check this out.
That means we have three-in-one
frequency distributions.
Inside is our cross-tab,
frequency distribution
for product, frequency
distribution for region.
Now, I actually
closed the field list.
So right-click, Show Field List.
Just for a second,
I'm going to drag--
this is the region.
So I'm just going to uncheck it.
There's our frequency table.
Drag it down to Columns.
Cross-tab, and the
frequency's still there.
Now, we want to go
over and look at what's
called "Simpson's paradox."
Now, here's a data set.
And I actually have
the filters turned on.
I just-- oh, you can't see that.
I'm going to right-click, Insert
a Column, just for a second.
The filters are quite polite,
because they'll always
give you a unique list.
So there are only two sales
reps in this particular data
set, Joe and Moe.
And we have a column that says
whether their communication
with a customer did
not end up in a sale
or ended up in a
sale, yes or no.
And we have Sales Channel.
They either communicated
by email or phone.
Now, what we're
going to do here is
we're going to build
a pivot table that
counts the number of
yeses for each sales rep.
"Yes" means they actually had a
sale from their communication.
But here's the thing.
When we use one, two variables
in our cross-tabulated table,
later, when we add
a third variable,
we'll see that
the actual results
we got from the first table
are mysteriously reversed.
Now, it's not that mysterious.
It's just if we don't
include this variable here,
then it is a hidden variable.
And so we have to be careful
when we do cross-tabs.
All right.
Let's check this out.
I'm going to click
in a single cell.
Actually, I'm going to
right-click, Delete.
Click in a single cell--
Alt-N-V.
And I want to put
it on this sheet.
Location-- I'm going to
try and put it in E1.
Click OK.
Our field list-- so I'm going to
drag Sales Rep down to Columns,
Contact Result in
Sale down to Rows,
and then it doesn't
matter which variable
we drag because
remember, the counting
is based on the transactions
that have these two
criteria or conditions.
I'm going to drag Contact
Result in Sale down to Values.
Now, let's fix this
up a little bit.
Design, Report Layout,
Show an Outline.
Cell E1 is selected.
I'm going to type "Cross Tab"--
Enter.
And this one's a
little bit too long.
I'm going to type
"Y/N" for Yes or No.
Now, I'm going to
highlight the columns E
and drag all the
way to H. And then
I want to make them all the same
size as the largest text here.
So I'm going to
point between H and I
and click and drag
just a little bit
so they're all the same size.
Now, let's copy this table--
Control-C, Control-V in J1.
And I'm going to
point to the smart tag
and say, give me the
same column widths.
Now, let's change
the calculation.
Right-click, Show Values
As-- this is amazing--
Percent of Column Total.
And instantly, we see
Joe had 14% of the yeses,
and Moe had 13%.
Now, check this out.
This is true based on
just the yeses and no.
There were actually
more yeses in comparison
to the total for Joe
than there were for Moe.
But remember, there
was a sales channel.
What if we un-aggregate
this because there's
a hidden variable there and
add that hidden variable?
We'll do one, two tables--
one for Joe, one for Moe--
with that third variable.
And we'll see that it's
reversed, because Joe actually
did overall better, just
looking at the yeses and noes,
but not for the phone
and email together.
Well, let's check this out.
I'm going to highlight
this table here--
Control-C. And I want to
make sure and go 1, 2,
3 cells below--
Control-V.
Now, I want to look at our
pivot table field list.
Right now, it's, in the Columns,
Sales Rep, in the Rows, Yes/No.
Now, in our earlier video,
we used a slicer as a filter
for the whole pivot table.
But you don't have
to use the slicer.
I'm going to drag
Sales Rep up to Filter.
Boom.
Now, it's everybody, right?
We could see it
says "All" there.
And then I'm going to drag
Sales Channel down to Column.
Now, this is a
little scary here.
I'm going to move
this out of the way.
That's too close up there.
I should've gone
four cells down.
So I'm going to Control-X.
Click right in J7--
Control-V.
Then I'm going to copy.
And because it already has
the filter, I can go just one
below--
Control-V. And now,
let's check this out.
Inside the pivot table, I'm
going to check and select Joe
only.
Click OK.
And I'm going to come down
to the third pivot table,
with our percentages of column
total, and select just Moe.
Click OK.
Email, phone-- email, phone.
And look at this.
Here are all the yeses.
Here are all the yeses.
This is Joe.
This is Moe.
Now, let's look at
Joe-- all the yeses.
Oh, OK.
So for email, 8.82%--
but look at Moe.
Compared to his
total, he had 12%.
So for the sales channel
Email, Moe actually
did better as a
percentage than Joe did.
Look at this one.
15%, that's Joe's.
16%, that's Moe's.
That means for Phone,
comparing the yeses
to the total number
of phone calls,
Moe did a percentage better.
That's an example of
Simpson's paradox.
And all it means is
nothing's lying here.
There was just a
hidden variable.
And when we
un-aggregated-- brp--
one, two, we had a
different picture.
Hey, that's a lot
about cross-tabs.
They're totally
amazing and easy.
Go over to the CrossTab 3.
We saw with this cross-tab
two categorical variables.
And it was like three-in-one--
cross-tab, frequency
distribution for product,
frequency distribution
for region.
And over here, we
did a cross-tab
with a quantitative and
categorical variable.
And we saw our 100% stacked
column, clustered column,
and our stacked column
charts as options
from visualizing our data.
All right.
Next video, we will talk about
two variables for xy data
and see how to do
a scatter chart.
All right.
See you next video.
