[MUSIC PLAYING]
DANIEL SMILKOV: Hi, I'm Daniel.
MARTIN WATTENBERG:
Hi, I'm Martin.
FERNANDA VIEGAS:
Hi, I'm Fernanda.
Machine learning
is pretty complex.
So we've been
experimenting with ways
to visualize what's happening.
There's a core concept
in machine learning
called high-dimensional space.
Here's one way to wrap your
head around this concept.
You can think about people
as being high-dimensional.
For example, take
famous scientists.
You can think about when they
were born, where they were
born, their fields of study.
Each of these is like a
dimension of that person.
These dimensions become
difficult to untangle
when you think about
different people,
because someone might
be similar in some ways,
but very different in others.
MARTIN WATTENBERG: But
this is the kind of thing
you can use machine
learning for.
With machine
learning, the computer
isn't told the meaning
of these dimensions.
It just sees them as numbers.
And it sees each set of
numbers as a data point.
But by looking across all
of these dimensions at once,
it's able to place related
points closer together
in high-dimensional space.
DANIEL SMILKOV: Here's
a concrete example
where words are treated as
high-dimensional data points.
The important thing
to remember is
that we haven't told the
computer the meaning of words.
Instead, we've shown it
millions of sentences
as examples of how
words get used.
Here is a visualization
of the results.
We're looking at
a subset of words
that the computer
has learned about.
Each dot represents one word.
Each word is a data point
with 200 dimensions.
Using a technique called
t-SNE, the computer
clusters words together
that it considers related.
And clusters
form-base the meaning,
even though we've never taught
it the meaning of words.
Here is a cluster of
numbers, months of the year,
words related to space, people's
names, cities, and so on.
FERNANDA VIEGAS: We
can also look closely
at smaller sets of words.
If we search "piano,"
we can run t-SNE only
on words related to "piano."
We get clusters of composers,
genres, musical instruments,
and more.
MARTIN WATTENBERG:
And this approach
doesn't just work from words.
For example, you can
also treat an image
as a high-dimensional
data point.
Here's a dataset
where lots of people
wrote digits between 0 and 9.
People write in
all kinds of ways.
So the question is, instead
of us needing to manually code
rules for all the
ways people write,
could a machine figure it out
itself using machine learning?
Each image is 784 pixels.
The computer treats each
pixel as a dimension.
Again, using t-SNE, it
clusters these images
in a high-dimensional space.
We've color-coded them so
that it's easier for us
to see what's going on.
And you can see groups of
digits clustering together.
It's learned something about
the meaning of these digits.
FERNANDA VIEGAS: These
visualizations techniques
we've been exploring can be
useful for all kinds of things.
That's why we're working on
open sourcing all of this
as part of TensorFlow
so that anyone
can use these tools
to explore their data.
[MUSIC PLAYING]
