All right. So last, but certainly not least, is data exploration and visualization.
Data exploration and visualization are critically important to the practice of data science. In fact, we're going to spend the vast majority of the first day of the boot camp talking almost exclusively about data exploration and visualization, because it's just that important: you need to understand what your data looks like before you can start to model it properly.
So what is data exploration? Essentially, data exploration is the visualization and calculation that allow us to better understand the characteristics of a dataset.
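As a small illustration of "visualization and calculation," here is a minimal sketch in Python. The lecture doesn't specify a language or tool, so this choice, the hypothetical `text_histogram` helper, and the sample ages are all assumptions. It bins integer values and prints a text histogram, which is often enough to see a dataset's shape before reaching for a plotting library.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one text line per bin, with one '#' per value falling in that bin.
    Assumes integer values and an integer bin width (hypothetical helper)."""
    # Map each value to the lower edge of its bin, e.g. 34 -> 30 for width 10.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    # Walk every bin from the lowest to the highest edge so empty bins still show up.
    for edge in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{edge}-{edge + bin_width - 1}"
        lines.append(f"{label:>7} | {'#' * counts[edge]}")
    return lines


# Hypothetical sample: ages of twelve survey respondents.
ages = [23, 25, 31, 34, 35, 38, 41, 42, 44, 47, 52, 67]
for line in text_histogram(ages, 10):
    print(line)
```

Even this crude view shows the bulk of the values sitting in their 30s and 40s with a thin tail toward older ages, the kind of pattern a person picks up at a glance.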
The key motivations are, first, that we want to be sure we select the right tools for preprocessing and analysis, and second, that it uses the human mind's really, really powerful ability to recognize patterns. In a lot of contexts, a person will recognize a pattern that a data analysis tool won't. Building a neural network that will tell you whether a picture is of a face is a massive, very complicated endeavor. But humans can do it: most humans can do it innately, automatically, and very, very quickly.
This is, of course, related to the historical term exploratory data analysis, or EDA. The original book is Exploratory Data Analysis by John Tukey. If you're interested in data exploration specifically, there's some information here. This will, of course, be online shortly, so you can pull it up more quickly.
The original focus of the field of EDA is not the same as our focus as data scientists. As data scientists, our focus is on summary statistics and visualization.
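A minimal sketch of the summary-statistics side, using only Python's standard `statistics` module. The `summarize` helper and the sample data are hypothetical. Note that `statistics.quantiles` defaults to its "exclusive" method, and other tools may compute quartiles slightly differently.

```python
import statistics


def summarize(values):
    """Return a dict of common summary statistics for a list of numbers (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }


# Hypothetical sample: ages of twelve survey respondents.
ages = [23, 25, 31, 34, 35, 38, 41, 42, 44, 47, 52, 67]
for name, value in summarize(ages).items():
    print(f"{name:>6}: {value:.2f}")
```

Comparing the mean with the median, or the gap between `q3` and `max`, is a quick way to notice skew before choosing a model.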
EDA also took in clustering and anomaly detection: using clustering as an exploratory technique, and anomaly detection as an exploratory technique. (Ron, I suspect you have some background in this field, because you were talking about Natalie as well.)
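One classic exploratory rule for spotting anomalies comes from Tukey himself: the fences drawn on a box plot, which flag values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The sketch below assumes that rule; the `tukey_outliers` helper and the sensor readings are hypothetical, and the quartile convention is simply the `statistics` module's default.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
    k=1.5 is the conventional multiplier; the quartile method is an assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with one suspicious spike.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 42.0, 9.7, 10.0]
print(tukey_outliers(readings))  # → [42.0]
```

In an exploratory setting the flagged value is a prompt to go look at the record, not an automatic reason to delete it.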
In our context, clustering and anomaly detection are now major areas of data science interest, sub-fields in their own right, not just one piece of exploratory analysis. That said, clustering for exploratory purposes is still used a great deal. Good clustering algorithms and good clustering practice are among your more powerful tools if you have a very complicated dataset.
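To make exploratory clustering concrete, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. k-means is only one possible choice; the lecture doesn't name a specific algorithm. The helpers, the deterministic farthest-point initialization (used here so results are reproducible, whereas libraries commonly default to randomized k-means++ seeding), and the two-group sample data are all assumptions for illustration.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = farthest_point_init(points, k)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)  # → [(1.5, 1.5), (8.5, 8.5)]
```

On real data you would typically try several values of `k` and inspect the resulting groups; in exploration the goal is to see structure, not to settle on a final model.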
