Welcome to Excel 2013,
statistical analysis video
number six.
Hey, if you want to
download this workbook file
and follow along, click on
the link below the video.
Hey, this video,
we're going to start
talking about chapter two,
descriptive statistics creating
tabular and graphical summaries.
That just means tables with
formulas or pivot tables.
Graphical means charts,
like column, bar, and pi.
Chapter three, hey, we'll do
more descriptive statistics,
but we'll do numerical.
Now let's go ahead
and scroll down.
These are the topics
for chapter two.
Remember, we have categorical
data and quantitative.
Categorical is something
like a label or a category.
Quantitative is a number.
We're going to have tabular
and graphical for both.
Here's the list of the
tabular, the graphical
for our categorical
variables and the tabular
and graphical items we'll create
for our quantitative variable.
Now I want to go
over to notes here.
And what we're going
to do in this video
is learn how to create a
frequency distribution,
a relative or percent frequency
distribution, column chart
or bar chart, pie chart,
and even Pareto chart.
Now check this out.
Below here are notes.
And if you Control P
and print this out,
there's five pages with
notes about what we're
going to see in this video.
I'm going to click escape.
Now let's go over to FD
for frequency distribution,
categorical, and here, we're
given a data set of peoples
purchases for phones.
So if I click on the top
cell and control down arrow,
you can see we have
over 500 records here.
Control home to jump
back to cell A1.
And as data analysts using
statistical techniques,
we need to organize this
into useful information.
Well, a frequency distribution
will allow us to count.
But now let's go back
over to the notes sheet
and just look at the definition
for a frequency distribution.
It's a tabular summary.
First thing is we need to list
all the unique values or ranges
of values later when we get
to a quantitative variable
in a set of data called
categories of classes.
So obviously, we need a
unique list of our phones.
That'll be our categories.
The categories must be
collectively exhaustive.
That just means you have
to have enough categories,
so nothing's left out.
So we need to get a complete
list of every phone,
and the categories have
to be mutually exclusive.
No item can fit into
more than one category.
That's something we more have
to worry about when we get
to our quantitative variable.
Hey, the frequency
distributions will
show the number of observations,
count or frequency,
for each category, and the
goal is to provide information
about frequencies or count.
Now that doesn't seem
like much, right?
Information about count, but
you won't believe how often.
It's very important
to know, hey, which
category has the most, right?
So for phones, we
want to know which
phone was purchased most often
or, for that matter, the least
often.
When we get to our
quantitative variable,
we want to know things
like, how many students
got scores between
80 and 90 points?
So counting even know it seems
very basic is very important.
Not only that, but these
frequency distributions
and the charts that
we create from them
will be very important
throughout the entire class.
All right, let's
go back over here.
We're going to see how
to use either a formula
or a pivot table.
And the reason why
we're going to do both
even though the textbook
only does pivot tables
is because each one has its
strengths and weaknesses.
Now I have a little list
here, but overall in the notes
when you print it out,
there's a complete list
of the differences
between the two.
And we'll talk about
them as we create both.
Oh, how are we going
to get a unique list?
Well, we're going to use
something up in data ribbon.
We saw filter in
week one, but we're
going to use Advanced Filter.
There's a checkbox here
that will automatically
extract a unique list.
All right, so I'm
going to click Escape.
Click on the top
cell Control Shift
down arrow to highlight
the whole column.
One thing that's very important
about extracting a unique list,
you better have a
field name at the top.
Now we've emphasized this
heavily in the class so far.
But if you use Advanced
Filter, and you
don't have a field name,
you'll get an incorrect list.
You'll get exactly
two of the first time,
because it will consider
the first item a field name.
No problem.
We have a field name, data.
Advanced Filter.
Here's our dialog box.
Now you just read it
from top to bottom.
Filter list in place, no way.
We want to copy to
another location.
The list range, well,
it got it right,
because we followed all our
rules for a proper data set.
Criteria range, well, we
don't have any criteria.
Copy to, we definitely
want to copy it to,
and I'm going to click cell D6.
And here it is.
It's as easy as this.
As soon as you check unique
records only and click OK,
boom.
Look at that.
We get an exact listing of
one for each unique item
in this list over here.
All right, now let's add
our second column over here.
And we're going to call
this frequency, and enter.
Now I want to use
the same formatting,
so I'm going to right click
it on the mini toolbar,
hit my paintbrush, and then,
boop, click on frequency.
Now I'm going to
highlight all these cells,
and up on the Home
ribbon, I'm going
to go and get all borders.
And I'm going to add that
green just because that's
our convention for this class.
That means there'll
be a formula there.
All right, so what do we do?
We use the count ifs.
Now count if, that's
for one condition only.
I'm always going
to use count if,
because we can do one
condition or more.
Criteria range, I
click on the top cell.
Control Shift down
arrow and F4 key,
because we need to put those
dollar signs into lock it.
Remember, we're copying
the formula down.
Criteria range, that
word range reminds me
that I need the whole
column with everything.
I type a comma.
Criteria, hey, that's going
to be the individual phone.
That's a relative
cell reference,
so as we copy it down, it knows
to move to each new phone.
That's it.
Control Enter, double
click, and send it down.
And there we have
our frequency table.
Some of the useful
information we
can gain in this particular
sample of over 500,
we can see the Samsung Galaxy 4
152 times, Apple iPhone 6, 122.
The one with the
least, Motorola Moto G.
Now this gives us the count.
Sometimes, we want to see
either a decimal representation
or a percentage
representation that
shows us parts compared to the
whole, so let's come down here.
And what's the whole?
I'm going to type total
tab, and whoa, now that's
something new in 2013.
It automatically copied
the formula down.
That's terrible.
Now I'm going to take
advantage of that.
It did bring some
formatting down,
but if I hit F2, that
is not what I want.
That's new in 2013.
I have never seen that before.
I'm going to delete that.
Now we want to
add, so we're going
to use not the sum button up in
the Home ribbon or the Formulas
tab.
Where we use our
keyboard alt equals.
We're looking carefully.
Are the dancing ants marching
around the correct range?
You bet, enter.
I'm going to turn my
speak cells on enter off.
Now I'm going to come
back to this later and do
some formatting.
But for now, I'm going
to highlight these two
and control B. We'll actually
have a few more totals,
and so we'll do a universal
formatting at the end.
Now we come up
here, and I'm going
to name this relative frequency.
What does relative mean?
It means relative to the total.
What's the decimal or
percentage going to be?
Now for relative
frequency, we'll
use this later in our
probability chapters.
We're always going to
show it as a decimal.
What this means is
I need to compare
122 divided by the total,
88 divided by the total, 152
divided by the total.
Relative frequency.
Now I'm at a right click
point to my paintbrush.
When I see my paintbrush,
I'm going to click and, boom,
I've formatted that.
I'm going to highlight
this whole column.
Right click, click on the
paint brush, and boom.
I love that.
Look at that.
It did the whole column just
by clicking in the top cell.
Now I'm going to do my
formula equals relative cell
reference divided by
E14, and I'm absolutely
going to hit F4 to lock it.
That's the dollar signs.
That means as I
copy down, locked.
The denominators locked on 559,
but each one of the blue cell
references will move.
Control Enter, and I'm going
to not double click and send it
down.
I'm going to click and drag.
Now one thing that's
true about this
is if these are
relative frequencies,
and we're comparing
each one of the parts
to the total when I add them
up, alt equals and enter,
it better equal one.
Now that's kind of
a messy decimal.
We can highlight this.
And when we go up to home
and decrease decimals,
it's not hacking them off.
See I'm clicking it.
It's just hiding them.
It is displaying
four decimals only.
The actual decimals that we
just hit are still in the cell.
For example, if I were
to come down here,
and you don't have to do this.
But I'm going to
show you, because I
want to emphasize something
important about number
formatting.
If I say truncate, the truncate
function will truncate.
It's looking at that
number underneath,
and I'm going to say hack
off everything after four.
And now when I Control
Enter, it looks the same.
If I were to highlight
both of these cells
and increase the decimals, you
can see that sure enough, it's
not the same.
Those decimals are
still there, Control ZZ.
You don't have to do this,
but I'm just doing this
as an exercise to emphasize.
Now I'm going to delete
this, and I'm going
to alt equals, Control Enter.
And you can see that if
we actually hacked off
the decimals underneath
meaning everything
after the fourth position to
the right, we would not get one.
Up here, we do get one when
that some function adds,
because all those decimals
are just hiding underneath.
Now why do we always
have to get one?
Or later when we do our
percent frequency column,
why do we get 100 percent?
Because the categories are
collectively exhaustive.
That means we have
enough categories,
and they're mutually exclusive.
We're not going to get a count
of a particular item in two
different categories.
All right, now
we're going to add
a column for percent frequency.
And I'm actually
going to show you
how to do it two ways,
the way I prefer it,
and I'm going to show you
the way the textbook does it.
Which does have some
uses occasionally.
Now I'm going to cheat here.
I'm going to highlight this
whole column and right click.
And I'm going to double
click that, and now watch.
one, two, I copied the
whole-- don't click over here.
Because if I click right now,
remember I double clicked it.
So wherever I click,
it's still going.
You see the dancing
ants right there.
But notice it took all the
formatting and all the way
down.
Now I'm going to click Escape
to turn off that double click
for the paintbrush.
Now we're going to have two
columns, percent frequency
Control Enter Control
C and Control V
just so I have the
same column there.
Now since I'm going to
use this as a label,
and it might show up
in the charting engine,
I'm going to put a
little note up here.
Number format tab space
times 100, just so I know.
Now I'm going to highlight both
columns, and with both of them
selected, I'm going
to click and drag.
So I have enough space
for the number formatting.
All right, now
percent frequency when
using number format,
that's the way I prefer.
Because then we can use
any of these numbers
freely in any
subsequent calculation.
We saw a great example
back in week one
of why this method is useful.
So I'm going to use
percent number formatting
based on this
decimal, I'm not going
to recreate the whole formula.
I'm just doing a
relative cell reference.
Control Enter and copy it down.
Now I'm going to go up to the
Home ribbon number dropdown
and select percentage
with two decimals.
Same thing here.
There's a bunch of
extra decimals there.
Hey, when I come down
to the bottom and alt
equals and enter, 100 percent.
Now the beauty of this
method is those decimals
are still under there.
So if we have a reference one
of the cells in a formula,
we'll be using the
correct number.
Now this method is
going to involve
taking the same
relative frequency
and multiplying it times 100.
The problem with this is
that it changes the number.
It's not 0.218247 anymore.
It's 21.
So if you use that any
subsequent calculations,
your result will be too big.
Now the reason that
this method is around
is because you want to
display the percentages
without the percentage symbol.
Sometimes in some reports, or
tabular summaries, or charts,
the percentage symbol all over
the place gets a little busy.
Now I'm going to decrease
the decimals, and we can see,
we get the same looking thing
without the percentage symbol.
If I come down here
and alt equals,
it's going to give me 100.
But that's actually
the integer 100.
All right, so I showed
you this and the logic
for why it's occasionally used.
I mentioned in
week one the reason
that I don't use this
method and always
use this method is because
looking at spreadsheets for two
decades, I just
see too many errors
when people do
formulas like this.
No problem.
If you want to use it,
you can, but only when
you're using these numbers
in some sort of tabular
or graphical display where
you don't want the percentage
symbol.
Hey, now we want to come
down to this last row,
and I'm going to
highlight the whole row.
And we got to talk
about borders.
And we're going to use
Format Cells, which
is home and any one of
these dialogue launchers.
And I'm going to use the
keyboard Control one.
And the nice thing about
the Format Cells dialog box
is it has the number tab,
alignment, font, border, fill,
and even protection.
Now we want border, and the nice
thing about the dialog boxes
is they have everything,
all the options.
Whereas the ribbon tabs don't
have as many options sometimes.
Now by default, it
comes with an outline,
and I'm going to clear this.
And want an outline
means is unless we
have something highlighted
here, and there's
some lines inside the outline.
So if I click outline,
it's only the outside.
Only the outsides edges would
get lines here, nothing inside.
As soon as I click inside, that
means it does inside as well.
Now what we want is we want
a dark line, so the order
as you pick your line, boop.
Then you select your
color if you want,
but our default is fine.
And you click.
Now you could use these buttons
here, but I like to draw it.
Boop, and you just click.
Line, color, and then
boop, and there we have it.
That's the famous dark
line and a double line
to indicate this
is the bottom line.
Click OK.
Hey, that didn't do anything.
But if I click
outside, sure enough,
I can see that the
borders are looking good.
Now for printing, we would
not have all of these columns.
We did all of these
columns just as we're
learning how to build a
frequency distribution.
Now I want to look at
the pivot table next,
and then we'll look at charts
from the various columns.
We'll select A1.
We have a proper data set, so
we go up to Insert pivot table.
Or we use the
keyboard Alt, N, V.
I'm going to select
existing worksheet.
The location will be cell J16.
Click OK.
Now let's scroll over, and
we'll build our pivot table.
Now remember there's
only one field.
And the remarkable thing is when
I click and drag down to rows,
instantly, I get a unique list.
That is fast and easy.
One of the big advantages
of pivot tables
is that they are fast and easy.
We'll compare and contrast
the formula and pivot table
in just a bit.
Now we want to then drag
the field down to values.
Instantly, because it's a
text field, we get count.
Let's rename these columns.
This is the frequency.
I can click and drag.
That is amazing and fast.
Now we want to add relative
frequency and percent
frequency.
That's going to
require that we drag
our field a couple
times down to values,
but first, let's pull the
bottom of the field list.
I don't know if
there's enough room,
so I'm going to
point to the middle.
And when I see my white vertical
arrow, I click and drag.
Now I'm going to drag it once.
Then that sign means,
no, no, no, no, no.
When you see that blue,
then you can drag and drop.
Let's do it again.
We're not going to use
the field list anymore,
so I will close it.
Now let's rename relative
frequency tab, percent
frequency, and enter.
Now watch this.
We got the count
by default, but we
need to change this
so that it's comparing
each one of the counts
to the column total.
Right click in the values area,
and point to show values as.
There's all sorts of
amazing calculations
that we'll use throughout
this statistics class,
but look at this,
percent of column total.
There's also percent of
row total, which we'll
use in percent of grand total.
Hey, percent of column total,
you gotta be kidding me.
Now the default is to
show percent number
format for relative frequency.
We don't want that,
so right click.
And we were using value field
settings as our one stop
shopping, but if it's
just number formatting,
come to the number formatting.
This does not exist if you come
out and right click a cell.
It's only in a pivot table that
you can format the actual field
with number formatting.
Now if I click general, I
have all those decimals,
and I don't want that.
I want to display
them four decimals.
Click OK.
Now right click.
You've got to be kidding me.
Show values as percent
of column total, done.
Now the pivot table
is fast and easy.
The actual calculation
where we multiply by the 100
is much harder to
do in a pivot table
than it is to do with formulas.
You'll see in the textbook,
they do that calculation outside
of the pivot table.
I do want to
compare and contrast
the pivot table to the formula.
Now the first thing
we want to know
is that formulas
instantly update
if your source data changes.
Pivot tables don't instantly
update, so let's just copy.
And before I paste
to change this,
watch the calculations
here and over here.
Control V, instantly,
this updates.
This does not.
Now it's not hard.
It's simple.
You have to come over to
the pivot table and right
click, Refresh.
And then it updates.
Sometimes, like if
you have sample data
that you're dumping into
your template over and over,
then the formulas
look pretty darn good.
Because they're
instantly updating.
However, the advantage
to the pivot table
is that it's super
easy to calculate.
Another advantage
to formulas that
don't apply in this
particular situation.
There is many, many functions
and formula possibilities
that just don't exist
in a pivot table.
The last difference, we'll
see when we create our chart.
All of these columns
will automatically
show up in any chart.
Whereas when we build
it with formulas,
we can pick and choose
our columns for charts.
All right, now let's
build a column chart.
I'm going to zoom in a bit.
I'm going to build the
first couple of charts
off of the formulas,
and then we'll
build one off the pivot table.
Now I just want to
highlight the actual phone
names and the frequency.
Do not highlight
the total, boom.
And now let's go to Insert, and
we are going to do a column.
Click the dropdown, and
we want the first one.
Now in the textbook, they
call this either a bar chart,
or later when we use this
column for cross tabulate,
they'll call it side by side.
But in Excel, it's called
a clustered column.
We'll use these two later
on and cross tabulation,
or we'll use all three
for cross tabulation.
So clustered column,
and there's our chart.
Now let's fix it
up a little bit.
I don't like this title.
Select the title.
When you see that solid line,
type in equal sign immediately.
There's an equal sign
in the formula bar.
Click on cell A1, enter,
so we have our title.
Now that's a pretty good
column chart right now.
One thing that's nice about
it is this vertical axis.
The numbers are neatly
placed off to the side.
I like to put for
some column charts
and histogram that
we'll see next video,
the frequency is right on top.
So I'm going to go ahead and
click on the axis and hit
delete.
We also want to delete
the horizontal lines,
and make sure you select not
the outside but the inside.
Now those lines
are important when
you have the numbers
in your vertical axis.
But we're not going to
have those, so delete.
Now let's select the
column, and notice
that they're all selected.
When I right click, I
can open the task pane
with either format data
series or control one.
I want to go to the
bucket of paint, fill,
and down at the bottom, I
want varied colors by point.
Now let's go over to the series
options or the little columns
and click.
One thing you don't want
to do with categorical data
is select zero gap width.
That's left for continuous
quantitative variables,
which we'll see next video.
I'm going to leave it at
maybe 100%, one, zero, zero,
and enter.
Now I can close the task pane.
I want to add the frequencies
at the top, so I click the plus.
I love this data labels.
Boop, just like that.
So we have a column chart.
The order of the
labels at the bottom
do not relay any information.
The information comes from
the heights of the column.
We can quickly see that the
green one is the biggest,
and the little teal
one is the smallest.
Now I want to do a second
column chart from the percent
frequency column over here.
I'm going to point to the edge,
and when I see my move cursor,
I can click and drag.
Now something that's hard
to do with a pivot table
is if I want to click and
select the phone names,
and then hold Control to click
and select a second column--
notice I don't have the totals.
--you can't just pick and choose
Columns from a pivot table.
But you can do that
from a set of formulas.
Insert, Column, and
then clustered column.
This one, I'm going to
control one to open up.
And on the series, I'm going
to select 100%, 100, and enter.
The only other thing
I'm going to change
is I'm going to select
the title equal sign.
Click in A1 and enter.
I'll close the task pane.
So we have our percent frequency
and our actual frequency.
Here are two options.
They're equally acceptable.
Oftentimes, it's based
on people's preferences.
Now let's go ahead,
and I'm going
to copy this chart,
Control C. And then
right below, I'm
going to Control V.
We talked about in
week one how if you
turn a column chart on its
side, it's a bar chart.
And sometimes for
emphasizing differences,
that's better or when
you have long titles.
I'm going to right
click in the white area
and point to change chart
type, and let's go to bar.
Click OK.
There is a comparison.
They're relaying
the same information
in slightly different forms.
Another way you can
relay this information,
let's copy this chart.
I select the outside edge,
Control C. I click below,
Control V. Now
let's right click,
and let's change it
to change chart type.
And we'll change it to a pie.
Now notice the pie will come
out with the automatic colors.
These colors are
based on the themes
that you select
in your workbook.
Now let's go to the plus,
because for pie charts,
we have a few more
options for data labels.
I'm going to click the
arrow and then more options.
This opens up the task pain.
Category name, percentage,
unchecked values.
I immediately can see
that it's kind of crowded,
so let's push these out.
I'm going to come back
over to the task pane.
Label position,
let's say outside
and still a little bit crowded.
But check this out.
I'm going to select the actual
pie, come back over to the task
pane.
And actually, one nice thing
is the different elements
that you select in the
chart, the task pane changes.
And look at this,
angle of first slice.
Now watch.
Here's the green one.
I'm going to slide
this a little bit
maybe to like 36 or something.
And there we have it.
If you wanted-- I kind
of liked that, right?
If you wanted, sometimes you
have to click on a label,
click on it a second
time, and drag it out
to get a leader line.
That leader line comes
from over in the columns
label options, show leader line.
I'm going to undo that Control
Z, because I kind of linked it.
Without the leader lines,
there's less clutter.
So there's four different
charts for relaying
the information for
categorical data
from a frequency distribution.
Now I want to
close the task pane
and go look at the pivot table.
I'm going to use the
horizontal scroll bar
and scroll over here.
Now even though this is
kind of over top of it,
I'm going to move
it to the side.
The problem with the
pivot table, and it's
an easy one to solve, but this.
If your goal is to have
this tabular report,
and you want frequency, relative
frequency, and percentage
frequency in the same
report, any chart
built off of this pivot
table will automatically
take all the columns.
Hey, just click on a single
cell, or let's do this.
Let's highlight, use
control, and all I want
is the percentages, right?
So I've highlighted just like
I would with formulas, Insert,
Column chart, column.
You can already see
the problem here.
All of the columns
are coming in.
Now you can barely
see it down there,
because they are
little percentages.
But even if you
hit them, you still
have the problem with
this weird gap width.
Now we can in most situations
go up to Design, select data.
That's the trick.
Later in this class, we'll
use this to really create
charts that normally can't
be created, and look at this.
If we want to edit
any of the series,
all the buttons are grayed out.
So the solution is--
and I'm going to delete this.
So the solution is to build
individual pivot tables, one
for frequency, one
for percent frequency.
But if you want all
your calculations
in the pivot table,
then you have
to switch over and do something
like this formula table.
Now I want to over to this
next key data product,
and we're going to see
that really a lot of times,
we want to use pivot tables.
Here's a different example.
We have a product.
This is boomerang name.
It's a list of all of
the products we sold.
Control, down arrow, so
there's like 200, control Home.
All I want to do is create
a pivot table of frequency
and a percent frequency.
Insert pivot table or Alt, N,
V, existing sheet, location, C1,
click OK.
I'm going to drag down to rows.
We get a unique list.
Drag down to values,
we get a count.
Click in cell C1.
Type product tab,
frequency enter.
Now watch this.
We're going to copy
this, Control C
and paste it right below,
Control V. Right click, show
values as percent
of column total.
Click in the cell, hit F2,
and I'm going to type percent
before frequency and enter.
I can change that
column width up here.
Now we have our two tables.
I can close the field list.
Click in a single
cell, Insert, Column.
Notice we get field buttons.
I'm going to right click Hide
all field buttons on chart.
Look on the title, start typing,
product count, and enter.
Click on our legend and delete.
Click in a cell, Insert, Column.
Click on the title equals sign.
I'm going to click on the
cell with percent frequency
and enter, right click
Hide all field buttons,
delete, point to the
corner, click and drag.
I'm not going to worry
about the exact layout,
but that's how you do it.
And often times when you
want multiple charts, quite
a bit easier to do it
with a pivot table.
Hey, two last points about
this categorical video.
We want to go over and look
at the sheet discrete column.
This is not a
categorical variable.
This is a quantitative variable.
We're going to count
the number of heads
in three flips of a coin.
We'll actually do
this in chapter four
when we're talking
about probability.
And because this
is a discrete, not
a continuous quantitative
variable-- discrete means
we're counting, right?
Zero, one, two, three.
--there are gaps.
We can't have 0.5 or 1.75.
When you have a discrete
quantitative variable,
then you want to use
columns with gaps.
The gaps indicate that we are
counting zero, one, two, three.
The last topic we want to
look over on the sheet Pareto.
Now this type of chart is just
mentioned in a one sentence
blurb in the margin, but
I did want to show you
a picture of it.
It's categorical data,
data from customers
that say why they're
dissatisfied with a product.
And the only difference
here is we have
our unique list, our count.
The frequency is sorted
from most complaints.
That's for the bad finish, the
next most unattractive design.
The next most did
not work all the way
down to the one that got the
fewest selections by customers.
No instruction, so
this column is sorted.
Then we do a cumulative percent.
This chart is for
quality control
where you list the reason
for dissatisfaction
with the most frequency or
highest percentage first sorted
in this direction.
We're not going to create
this in this class.
There's a reference
video here if you want
to learn how to create this.
I've done this in other
statistics series.
All right, we're going
to scroll over here.
We did a few things here.
We saw how to create frequency
tables, percent frequencies
with a pivot table
and pivot chart.
We also saw how to do it with
formulas and pivot tables,
and compared and contrast the
benefits and drawbacks of each.
And then we saw percent
frequency chart, a pie chart,
a bar chart, and a
column with frequencies.
All right, next video, we'll
start quantitative variables,
and figure out how to do
frequency distributions
and charts for those.
All right, we'll
see you next video.
