[VIDEO PLAYBACK]
[MUSIC PLAYING]
[END PLAYBACK]
DAVID J. MALAN: All right.
This is CS50, and this
is Yale University.
Welcome to week seven.
So this class marks a transition between
the first part of the course, where
we have been learning about
the magic wonder world of C--
an extremely powerful low-level
programming language that
has allowed us to solve
many problems, indeed--
to a second part of the
course, where we move
towards more high-level programming
languages, such as Python.
In fact, this transition
had already begun last week
when we learned about
HTML, CSS, and the like.
And we left off, really,
by commenting on the fact
that this new programming language,
Python, can be conveniently used
to write something like the back
end of google.com or facebook.com
or any web service, really,
that accepts some parameters,
parses them, possibly looks up
some data in a database,
possibly stores some data in
a database, and gets back
to the user with dynamic output.
So before we get to that end and to
see how Python can be used to indeed write
the back end of a web server,
it is instructive to see
how Python can be used as a tool,
really, to do data analysis,
much like a data scientist would do.
And this is what we
are going to see today,
diving into the magic
world of machine learning.
But first of all, what is
machine learning, really?
So this is a funny
sequence of vignettes that I
found online on pythonprogramming.net.
And they represent the stereotypical
life of programmers/researchers
in machine learning.
So let's see on the top left here.
Well, society thinks that they
are creating hordes of robots,
possibly with the idea
to conquer the world.
Right?
Friends think that they're
hanging out with robots, really.
Now, parents-- their parents think
that programmers in machine learning
spend most of their time in data
centers with no windows, apparently.
What about other programmers?
Well, they might think that
programmers do fancy mathematics.
And what about themselves?
Well, they typically think that they're
involved with some fancy visualization,
data analysis.
But at the end of the day, what
they really do-- and here we are--
is using Python, and not just using
Python to implement some algorithm,
but really using Python to import
some algorithm as we are seeing here.
So we do not know Python yet.
But this line of code looks
extremely readable, doesn't it?
There is English there.
It says "from sklearn"-- we don't
know what this is yet-- "import"--
I already mentioned svm,
support vector machine.
It's a function to run a
machine-learning algorithm.
So we don't know Python
yet, but we're already
able to decipher, more or
less, what is going on.
And indeed, sklearn, as we
will see, is a so-called module
in Python-- a library, if you
wish, in C-- from which we
are importing an algorithm, a function.
Now, this line exemplifies a
characteristic feature of Python--
namely, its readability.
And it is often the case that
Python code is referred to
as being similar to pseudocode,
precisely for the fact
that it allows us to express very
powerful ideas with a couple of lines
that are extremely readable.
Now, this very characteristic
of Python, together
with our familiarity, at this
point in the course, with C,
is what will allow us
today to quickly see
how Python can be used
as a data processing tool
to implement and run
machine learning algorithms.
Well, so back to the question--
what is machine learning, really?
So as the previous sequence
of vignettes was suggesting,
in the popular culture,
at least, machine learning
is often associated with the world
of AI, artificial intelligence.
Typically, we think of
machines in terms of robots.
And literally, there are
countless science fiction movies
about this theme, typically
representing some robots turning evil
against humanity, or along these lines.
Indeed, this is a modern
interpretation of an old fear
dating back, possibly, to
Frankenstein in the 1800s and beyond.
But really, if we think
of the way machine
learning is having an impact on
our lives on a day-to-day basis,
it's not necessarily
related to robots, per se.
It has more to do with
the following set of applications
that we indeed use on a daily basis.
So let's just think of search
engines, for instance,
that allow us to look for whatever we
might feel like in the world wide web,
and in a matter of literally
milliseconds to get back
an ordered list of results
based on the query that we enter.
Think about image
recognition, the possibility
to catalog, to search for an
image based on its subject.
Think about speech recognition software.
These days, we can all talk to our
phone and ask, what's the weather like
or show me the movies around me.
Or finally, rounding out the
four sets of applications
I mentioned, the world of
natural language processing,
the amazing ability to translate a
document of text from one language
to another in real time or
the ability to, say, infer
the meaning, the semantics of a
document with no human intervention.
So these are just a few of the
applications that are indeed backed up
by machine learning algorithms.
And here by the word machine, we really
mean an algorithm running, most likely,
on the cloud in a data center.
And today, we use these
applications on a daily basis.
And so have you ever wondered
what they're all about?
How can we get started thinking
about designing one of these applications?
And this is what we're
going to learn today.
So in particular, we will be focusing
on two applications-- image recognition
and natural language processing.
But before that, let's
go back to week zero.
So we have seen this diagram
early on in the course.
It represents a
general-purpose algorithm.
The black box is an algorithm that takes
some input from the left-hand side.
It processes it, and it
delivers the user an output.
Now the class of application
that we're going to see today
very much fits into this framework.
Just think about image
recognition, for instance.
Well, we might want to
have an algorithm that
takes as inputs images-- say here an
image of a horse, image of a car--
and is capable of realizing, of
recognizing, that there is, indeed,
a horse or a car in that
image and get back to us
with strings, such as "this is a horse."
Or, well, in the world of
natural language processing,
think about an algorithm where we
could do something of the following--
say I want to pass as an input to this
algorithm one of my favorite novels,
1984 by George Orwell.
So this is just a part of it--
"BIG BROTHER is watching you."
But say that we want really the
entire book to be fed as an input
to this algorithm.
We want the algorithm to be able to
recognize the semantics of the book.
So we want the algorithm,
with no human intervention,
to be able to get back to us
and tell us, look, Patrick,
this book is about politics.
It's about propaganda.
It's about privacy.
So in both of these two applications--
image recognition and natural language
processing-- it is already clear
that what the machine learning
algorithm is doing is trying to infer
some hidden structure in the input,
right?
And if we phrase the problem like that,
well, we have seen something like that
earlier on in the course-- in
fact, many examples of a scenario
where we are given an input
with some hidden structure,
and we want to design-- you have
done that in problem set four--
an algorithm that can decipher, that can
crack the input and gives us an output.
So this is Whodunit.
And per the spec, we were
given an image with some red noise.
And you were told to design an algorithm
to get back with the hidden message.
So in this respect, you might think
applications such as image recognition
seem to share similar
behavior, don't they?
But there is a key difference.
Can anyone spot the difference?
We're getting started.
So one difference in the picture
is that instead of just one image,
there are two images.
And indeed, that is somehow
to the point in the sense
that with problem set four, we were
given an image with a hidden message,
indeed.
But we were told the structure
of the hidden message.
In other words, we were told that
the image had a lot of red noise,
and we were told that in order
to decipher, to really find
the hidden message, we should lower
the intensity of the red pixel
and possibly raise the
intensity of the other pixels.
But in machine learning
applications, this
is not typically what we have in mind.
So we want an algorithm that
can work with any sort of image
we may want to feed it with.
And this is what is bringing us
to one of the key differences
in the class of application
that we are going to see,
namely the fact that it
is the algorithm itself
that is trying to find out the
hidden structure in the input.
And the way it can do this is by having
access to what is called training data.
In other words, even if we want
to design an algorithm that
is capable of telling us, Patrick, look,
this is a horse, in order to work,
this algorithm needs to have access
to a lot of images of horses.
And based on these images, it is able
to figure out the hidden structure
so that once we feed it, again,
an image of a horse, it can say,
this is a horse.
And this is what we
are going to see today,
starting precisely from the
example of image classification.
So what is the example
we're going to play with?
It's the following-- suppose that we are
given a data set of handwritten digits.
So there are 10 digits
from zero to nine.
And for each of these
digits, we are given
a collection of handwritten
images representing that digit.
So for instance, for digit zero,
we are given many of these images.
And each of them shows a different
way of writing the digit zero.
So what we set as our goal for today
is to actually design, in this case,
an algorithm that can take
an image of digit zero
or an image of digit
six or any other digit
and, indeed, get us back with
the string "0" or "6" or so on.
And the way we can do
that, as we will see,
is by having access to
the so-called training
data-- that is, by inferring the hidden
structure behind the images
we want to use.
But before we get to talk about images,
let us just abstract a little bit.
And let us think that we live in a
one-dimensional world, a Lineland,
if you wish, where the only thing
that we can see is a straight line.
Say we go to school, and the first thing
that we are told in day one of school
is that, OK, I see a point here.
The teacher is telling
us, well, look, Patrick.
This is number zero.
Then we go again, day two of school.
We see another point.
The teacher is telling us,
this represents a zero.
Day three, we see another point.
It's the only thing that we can
see in a one-dimensional world.
And the teacher is telling
us, well, this is a six.
And so on.
So day four, we see another point.
This represents number zero.
And so on-- this point represents
number six, and this one
the number six as well.
So in other words, we have been
exposed to a labeled training
set of data-- points and the
associated label corresponding
to the digits they represent.
So say now that we are presented,
the next day of school,
with a so-called test point over here.
And we are asked, what
should this number be?
What should this point represent?
Which digit?
Anyone?
Number six, indeed.
And congratulations because
this is the first example.
This is the first machine
learning algorithm
that we are going to discuss
today, easy as it sounds.
So there is a lot going on.
First of all, just appreciate
the fact that I haven't
told you anything about the structure.
I just presented to you a set of points.
I didn't tell you
anything about the game.
It's not like in problem
set four, where we
were told about the specifics
of the image to crack.
In this case, I was just
presenting you with points.
And then at a certain moment, I was
asking you, OK, what is this point?
And as you have guessed, well, this
should represent the number six.
Why?
Well, because somehow,
we have figured out
that the points representing number zero
are grouping in one side of the line.
Points representing number six are
grouping in another side of the line.
And so in our head,
what we are doing, we're
looking at the closest
point among the ones
that we have been exposed to previously,
so the point with the minimal distance,
so the so-called nearest
neighbor, and just
label the new point that
we're given-- the test
point-- with the label of the
closest point we've been exposed to.
So this algorithm is called
nearest neighbor classifier.
And indeed, this is what we're going
to use to classify handwritten digits.
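The one-dimensional version of this idea can be sketched in a few lines of code. This is a hypothetical sketch, not the lecture's implementation; the points and labels are made up for illustration.

```python
# Made-up 1D training data: points on the line, each labeled with
# the digit it represents (illustrative values, not the lecture's).
train_points = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
train_labels = ["0", "0", "0", "6", "6", "6"]

def nearest_neighbor(test_point):
    # Label the test point with the label of the closest training point.
    best = min(range(len(train_points)),
               key=lambda i: abs(train_points[i] - test_point))
    return train_labels[best]

print(nearest_neighbor(6.2))   # closest training point is 6.0, so: "6"
```

The same logic carries over unchanged to two or more dimensions; only the distance computation gets more involved.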
But before we get to that, let's
assume that we get promoted.
And so on the second year of school,
we now see the world in two dimensions,
much like a flat land.
And we are exposed to a set of points--
again, a labeled set of points.
So this point represents number zero.
This is the number six, number
six, number zero, number six,
number zero, six again.
And finally, we're
presented with a test point.
So any guess on what this
test point should represent?
It's the number zero.
And indeed, same reasoning, just done
in two dimensions-- nearest neighbor
classifier.
So in a way, this is
very much intuitive.
And what we are left
with is the question of, OK,
can we map an image
into a point in a space?
Because if we were able to do that,
if we can take an image of a digit
and just interpret that image
as a point in a space, then
we could repeat exactly the
same procedure I told you.
In this case, they are not just points.
They are images in this abstract space.
But we have access to
a labeled training set.
So we know what these digits represent.
And then when we are asked,
OK, what is this new digit?
Well, we apply the same logic, right?
And this is what we are going to see.
Indeed, the world can be
seen as having more than one
or two or three dimensions.
And much of machine learning
is really about interpreting
the data we are given as points in
some sort of high-dimensional world.
And in making this jump, we might
feel a little bit like the character
in one of my favorite novels, namely,
Flatland by Edwin Abbott Abbott.
This is the [INAUDIBLE] of
the first edition in 1884.
The plot of Flatland is
the following-- the story
describes a two-dimensional world
occupied by geometric figures.
The narrator is a square
named A Square who
guides the reader through some of the
implications of life in two dimensions.
On New Year's Eve, A Square dreams about
a visit to a one-dimensional world,
Lineland, inhabited by
lustrous points, in which
he attempts to convince the realm's
monarch of a second dimension,
but he's unable to do so.
Following his vision, A
Square is himself visited
by a three-dimensional
sphere named A Sphere,
which he cannot comprehend until he sees
Spaceland, a three-dimensional world.
So many movies have been
produced on this story line.
Let's see a trailer of one of those.
[VIDEO PLAYBACK]
-Imagine a vast plane, a
world of only two dimensions
on which triangles, squares,
pentagons, and other figures live
and move freely about.
-Configuration makes the man.
-Get to your squarical!
Now!
-You're only a square.
-Thanks, brother.
-They know nothing of our
three-dimensional world.
-Such a notion is, of course,
absurd and, furthermore, illegal!
-But that's all about to change.
-Where did you come from?
-I come from space, the third dimension.
-No, no, no, no, no, no, no, no!
No!
You're not serious!
-Based on the beloved
novel by Edwin A. Abbott.
Tonight, our world faces a grave threat.
[END PLAYBACK]
DAVID J. MALAN: Right.
And then it goes on.
I love it.
It's one of my favorites.
You should watch the movie.
And so indeed, are we ready to go beyond
Lineland, Flatland, and Spaceland?
So let's do it.
So here I just represented
what Lineland looks like.
Just one line, right?
The only thing that we can see in
one dimension are really points.
And here I wrote two points for us.
What we have here is
the coordinate system,
just to be able to measure
distances, so to speak.
So in this case, this
point is at location one.
This is what this one represents.
And this point is at location four.
So this is the picture in
one dimension, Lineland.
Flatland we have seen--
two-dimensional world.
Indeed, we can visualize points here.
In this case, each point is
represented by two coordinates.
We have the horizontal coordinate
and the vertical coordinate.
So coordinate 1,
horizontal, and 2, vertical.
And so on for the 4, 4.
Now we go to Spaceland-- indeed,
a three-dimensional world.
And even here, it's pretty easy to
visualize what points are, really.
There are three coordinates.
And so we can simply refer to a point
with the associated coordinates.
So can we go beyond?
Indeed.
Indeed we can.
It's not that easy to draw points
or the like in higher dimensions.
But we can indeed think of a
point, say, in four dimensions
by being referred to by a set of
coordinates-- say, 0, 1, 4, 5.
Right?
So OK, this reference, the
axes there don't mean much.
We're no longer in three dimensions.
Still, they just represent
sort of an abstract space.
But indeed, I cannot draw four axes here,
but we can definitely think of a point indexed
by four coordinates as a point
living in a four-dimensional world.
And so on.
If we want a point in a five-dimensional
world, well, here we are.
And so we can go back
to the idea, can we
map an image from this
data set we have access
to to a point in a
higher-dimensional space?
And as we have seen in
problem set four, images
are just a collection of
pixels-- in this case,
for the smiley, just a collection
of zeroes and ones,
zeroes being associated with the color
white, ones with the color black.
And even in the data set
that we are playing with,
each image in this data
set is simply an eight
by eight array of pixels,
whereby each pixel in this case
is a number between 0 and 16.
So in some sense, we have
eight by eight-- so it's 64.
So it's really a point in a
64-dimensional space, right?
So we can really use this
idea and interpret images
as points in this 64-dimensional space.
And now we can indeed
run the nearest neighbor
classifier we have seen previously.
Namely, we are in this
64-dimensional space.
We see labeled images
that are presented to us.
We are exposed to it, if you wish.
This represents a six, a six.
This represents a zero.
And so on, until we are presented
with a test point again.
Fine.
We know how to deal with it.
We just simply assign to this test
point the label of the training point
that is closest to it.
So the only thing that is missing here
is a notion of distance, right?
So let's see how we can go
about thinking about this.
So in a one-dimensional world, it's
pretty easy to compute distances.
Indeed, the distance between
these two points is simply what?
This line, right?
So it's simply 4 minus 1, 3.
In a two-dimensional
world, a little bit more
complicated, but we are
still able to do so.
If this is the distance
we want to have access to,
well, we have the good
old Pythagorean theorem.
So we first compute this
distance-- so the difference
between the horizontal coordinates of
the two points, which is 4 minus 1,
is a 3.
Then we square it.
And we add the vertical distance between
the two points, which is 4 minus 2-- 2.
And then we square it.
And then we take the square root.
Right?
So in this case, well, it
will be the square root of 13.
So even in a three-dimensional
world, it's the same idea.
In order to get this distance
between the two points,
we can simply work coordinate-wise.
So we can take the distance between
the first coordinate, which is a 3,
square it, plus the distance between
the second coordinate, which is a 2,
square it.
Plus the distance between the
third coordinate, which is a 3,
and square it again.
And we can take the square root of this.
So we indeed have a formula
to compute distances.
And this formula doesn't just hold in
one-, two-, or three-dimensional space.
It holds in however many dimensions
we may want to work in.
And so we have all the ingredients
to run the nearest neighbor
classifier at this point.
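The three worked examples above can be checked with a few lines of arithmetic; this is just the same formulas restated in code.

```python
from math import sqrt

# 1D: points at 1 and 4.
d1 = abs(4 - 1)                          # 3

# 2D: points (1, 2) and (4, 4), per the example above.
d2 = sqrt((4 - 1) ** 2 + (4 - 2) ** 2)   # sqrt(9 + 4) = sqrt(13)

# 3D: coordinate differences of 3, 2, and 3.
d3 = sqrt(3 ** 2 + 2 ** 2 + 3 ** 2)      # sqrt(9 + 4 + 9) = sqrt(22)

print(d1, d2, d3)
```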
So just to give you an idea
what these distances look
like, say we want the distance
between these two images,
thought of as points in
this 64-dimensional space.
Well, in this case, if we apply
this formula coordinate-wise,
we get a distance of 31.98.
Say now we want to consider the
distance between an image representing
digit zero and an image
representing digit six.
Well, we do the math, and we find
out 45.97, which is, indeed, bigger
than what we had previously.
Previously, we had 31 as the distance
between two images representing
the same digit.
So it should be smaller than a distance
between two images representing
different digits.
So at this point, we are
ready to see some Python code.
So we are going to actually
implement the nearest neighbor
classifier just described
to you in abstract terms
by using these data sets.
Again, for each digit, we have access
to a collection of different images
representing the digits.
So let's get to see some Python.
First of all, let me show
you what Python is.
We can go to the CS50 IDE and simply
fire up the Python interpreter
by writing, python.
And here we are.
We're inside the
interpreter, so to speak.
At this point, we can run Python code.
So let's see.
We can write, x = 3.
y = 5.
x + y equals-- guess what?
8.
Amazing, isn't it?
Coming from the world of
C, this is totally amazing.
Many things happening here.
First, there is no need to
declare variables whatsoever.
I didn't write int x = 3.
I simply wrote x = 3.
And in fact, we might write
something like x = 'a'.
y = 'b'.
And guess what-- x + y is
equal to the string 'ab'.
So in Python, there is no
difference between single quotes
and double quotes, unlike
in C. And indeed, we do not
need to declare what a variable is.
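Put together, the little interpreter session so far amounts to something like this:

```python
# No declarations: the same variable can hold an int now, a string later.
x = 3
y = 5
print(x + y)   # 8

x = 'a'        # single quotes...
y = "b"        # ...and double quotes are interchangeable
print(x + y)   # ab: + concatenates strings
```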
Another key difference with respect
to C is the fact that it's immediate.
There is no clang or make or the like.
There is no need for us
to call the compiler.
I was simply running the
code, and the interpreter
was in fact interpreting or
running the code line by line.
So indeed, there is a
compiler behind the scene.
But we do not need to
get involved with this.
This is one of the beauty of Python.
And in fact, coming from the world of C,
we can read off Python code fairly easy
at this point.
Now, the syntax is a
little bit different.
So let's see what a for loop will be.
for i in [3, 5, 7]: print i.
And ta-da, this is a for loop.
So a few syntax differences, right?
First of all, it is more
like a for each loop,
where we loop through each element
in this array, if you wish.
And then there is a colon there
that we might not even notice.
More interestingly, there are
no brackets, no curly brackets.
So how is Python knowing
that the block of code
we want to execute inside the
for loop is the line "print i"?
Well, it goes by indentation.
So if I were to repeat the
line of code that I just
presented to you but without
the indentation here,
you will see an error.
So in Python, there is no
need of curly brackets.
It's simply a matter of
the indentation you use.
And it is good style to use
indentation even in C code,
so why the curly brackets after all?
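The indentation rule can be sketched like this; `collected` is a hypothetical list added just to make the loop's effect visible.

```python
# Indentation, not curly braces, marks the loop body.
collected = []              # hypothetical list, to record what the loop did
for i in [3, 5, 7]:
    print(i)                # indented: runs once per element
    collected.append(i)     # still indented: also inside the loop
print(collected)            # back at the margin: runs once, after the loop
```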
So OK, I could carry on like this
and show you some more Python
code within the CS50 IDE.
But for the sake of
exposition, let me actually
close this and go to the following
way of presenting code to you.
So these are called
markdown-style notebooks.
So the idea is that while
the code is indeed run line
by line as Python Is doing, this
way of presenting the material
is allowing me to group
lines of code together.
So here it is, what
I was writing earlier
in the CS50 IDE, the same line
of code, same line of code again
when it comes to manipulating
strings, the for loop
that I presented to you, and so on.
So as we see again, we
don't know the syntax yet,
but we can read off what
is happening coming from C.
This is an if statement, for instance.
Again, there are syntax differences.
Indeed, there are no brackets here.
There is the colon and
there is the indentation.
But we can really
decipher what is going on.
So this is one of the beauties of Python,
and we are going to rely on this beauty now
and today to actually parse almost
line by line some blocks of code that
will allow us to quickly
jump into the action
and see some meaningful, cool output
from machine learning algorithms.
So let's do that.
We go to the world of supervised
learning, namely image recognition.
But before we get to that,
what is supervised learning?
So the classes of applications
in machine learning typically
fit into either supervised
learning or unsupervised learning.
Today we are going to see both.
So the example of image
recognition fits into the category
of supervised learning, as
we have access to a labeled
data set, as I mentioned
and stressed earlier.
So we are presented with a set
of points or images
with a label associated to them.
Since we have these labels, we're
doing supervised learning.
So let's start with points,
and let's see how Python
can be used in Flatland, if you wish.
So Python has many built-in
data types and functions.
Indeed, we are going to see
much more of this next week.
But when it comes to data processing,
data science type of application,
really typically, people rely on
external data structures and functions.
And so this is what we
are going to do today.
And this is the line of
code that I'm writing here.
I'm importing two modules,
as they're called-- libraries,
if you wish, in the world of
C. The first module is numpy.
It's one of the main modules
for scientific computing.
The second module is matplotlib.
It will allow us to easily plot graphs.
And the third line of code,
don't pay attention to it.
It is simply required there
in this notebook style
to tell the notebook just
print whatever graph we
are producing in line with the output.
So one of the cool
things about Python is
that it is an extremely
popular language.
And being popular in computer
science, it is really important,
one reason being that
there is a lot of code
out there that we can
simply go and import.
If you wish, these numpy and
matplotlib modules are along these lines.
Someone else has written
libraries, modules of code,
that we can easily import.
So indeed, we would need to install these
modules before we can import them.
But for the sake of today, just ignore
the fact that we need to install them.
So let's just assume that we can easily
import this module with a line of code
that I am presenting to you.
OK, let us create a training
data set in Flatland.
So here is the code.
The first line creates a
numpy array, meaning an array
within this numpy scientific module.
So it's very simple.
We're creating an array of six points.
Each point is a two-dimensional point,
so it is indexed by two coordinates.
And so in the second
line, instead, we are
creating, in this case, a built-in
Python list, as we will see.
But it's simply an array
of strings, if you wish.
And each element in the array Y_train
is simply a color, name of a color.
We have three red strings
and three blue ones.
So shortly, we will plot
this collection of points.
But before we get to
do that, let me just
show you some cool features
of the Python syntax.
So X_train is an array of six
points, each of them being
a two-dimensional vector, if you wish.
But you can also interpret this array
as being a two-dimensional array.
And so we can use this syntax
that here I present to you.
So the print function is simply
printing whatever is inside it.
And so with this line of code,
X_train 5, 0, what we are doing,
we are taking the fifth
element in the array of points.
So Python is a zero-index language, much
like C. So we start counting from zero.
And so in this case, the fifth point
is the last one in this array--
so, namely, 7, 6.
And so we take the fifth point,
the last one in that array,
and we can print the
zeroth index coordinate.
So if we do that, the
interpreter is outputting
7, which is indeed the horizontal
coordinate, if you wish,
of the last point in the collection.
Now, we can do the same while
changing from the horizontal
coordinate to the vertical one.
And so we get 6, which is here.
So one other key feature
of Python-- we are
going to get to see a
little bit of it today--
is the so-called slicing syntax.
So slicing syntax is a convenient way
to extract a collection of elements
from an array, as in this case.
So in this case, what we are doing,
we are taking-- the first line of code
is saying, OK, Python.
Just look at all the
points in the array.
Take the horizontal coordinate, the
zeroth coordinate, and just print
all the elements in the
horizontal coordinate.
Indeed, if you were to compare,
we have a 1, a 2, 3, 5.5.
These are all the first, the
horizontal, components of each point.
And the same with the second component.
So this is one of the uses
of this slicing syntax.
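Indexing and slicing can be sketched as follows, again on stand-in coordinates chosen to be consistent with the values the lecture mentions (the last point at (7, 6), horizontal coordinates starting 1, 2, 3, 5.5).

```python
import numpy as np

# Hypothetical 6x2 array of points (illustrative coordinates).
X_train = np.array([[1, 1], [2, 2.5], [3, 1.2],
                    [5.5, 6.3], [6, 9], [7, 6]])

# Plain indexing: row 5 (the last point, since counting starts at 0),
# column 0 (its horizontal coordinate).
print(X_train[5, 0])   # 7.0

# Slicing: all rows, one column at a time.
print(X_train[:, 0])   # all horizontal coordinates
print(X_train[:, 1])   # all vertical coordinates
```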
And what we can use this syntax for
is to conveniently plot this point.
So now we are using-- OK,
plt.figure is simply saying,
Python, just expect that
I'm going to print a figure.
And the last line, as well-- the .show
is simply saying, just output the plot.
So the magic, really, everything is
happening in the line in between,
the scatter function
that is contained, again,
in the matplot module we are importing.
So this scatter function takes
two arrays, an array of horizontal
coordinates, if you wish, that we have
access to through the slicing syntax,
and an array of vertical coordinates.
And then we have s equals 170-- simply
the size, if you wish, of these points.
And the color yellow.
This is one of the
beauties of Python, again.
We are using the slicing syntax
again from the label arrays.
Recall that Y_train
was an array of colors.
So every time we are
plotting a point, we
are also taking the corresponding
color from the array Y_train.
So if we do that, just appreciate that.
With essentially one line of
code, we have a plot here.
And there are indeed six points.
Three of them are red.
Three of them are blue.
So this represents our
so-called labeled training set.
Instead of having digits zero and
six, now we have colors red and blue.
The idea is the same.
So what we can do, we can now
add a so-called test point,
say at location 3 and 4.
So why don't we plot it?
We use the same lines of
code as before to plot.
We add this new line
with the new test point.
And we have it plotted
with the color green.
And this is the output.
Flatland.
So now what we want to do, we want to
run this nearest neighbor classifier.
And we know how, right?
We simply look at the point that
is closer to the green point here.
And we associate to the green
point either the color red or blue,
depending on whatever color the
nearest neighbor has-- in this case,
red.
So in order to do that, we need
to define a notion of distance.
Well, we know what the
distance should look like.
We have the mathematical formula.
Let's write it down in Python.
And so you get to see how we can
indeed define functions in Python.
So here, define-- and again, this
is resembling pseudocode,
right? def stands for define.
Just appreciate this.
So define a function that we
call dist that takes two points
and returns the following.
So let me just parse for you
precisely what we are doing here.
So the line of code that
we want to understand
is the following, where
we take two points.
These could be, in this case,
two-dimensional points, each of those.
But later on, we are going
to 64-dimensional points.
And returns the Euclidean distance.
So let's parse this.
Let's assume that we have two points.
y is the test point 3, 4, and x
is one of the training points.
And we want to see what this line above
is doing with respect to these inputs.
So first of all, we can
take the difference.
And the way Python is
thinking about this
is taking the difference
coordinate-wise.
This is, again, one of the magic
properties of the Python language--
namely, we can apply a
function to an entire vector.
It's called vectorization.
So whenever there are vectors-- in
this case, two-dimensional points
or the like or arrays-- we can apply a
function to each element of the array.
So in this case, if
we take the difference,
Python will automatically take the
difference of each coordinate at once.
So the difference between
1 and 3 is indeed minus 2.
And the difference between
1 and 4 is indeed minus 3.
So now we can raise it to the
second power-- we can square it.
And again, what Python
is doing with this code
is simply thinking coordinate-wise.
So it's taking minus 2, and
taking the square of it.
That's 4.
Taking minus 3, and taking
the square of it.
That's 9.
And so on.
Now we can use a function that comes
with the numpy module to simply sum
the two coordinates.
So 4 plus 9 is indeed 13.
And then we can take
the square root of it.
So this is what this single
line of code is doing.
Just appreciate the beauty behind that.
And indeed, we can define a
function that simply returns
the distance between any two points.
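Written out, such a function might look like this-- a minimal sketch, with variable names of my own choosing:

```python
import numpy as np

def dist(x, y):
    # x - y subtracts coordinate-wise (vectorization),
    # ** 2 squares each coordinate,
    # np.sum adds the squares, and np.sqrt takes the square root:
    # the Euclidean distance, in any number of dimensions
    return np.sqrt(np.sum((x - y) ** 2))

print(dist(np.array([1, 1]), np.array([3, 4])))  # sqrt(4 + 9) ≈ 3.606
```

The same function works unchanged for two-dimensional points and for 64-dimensional ones.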
And let us just compute, for
each point in the training set,
the distance with respect
to the single test point.
So what we are doing here:
there are six points in the training
set, three red and three blue.
So we are taking the
distance from the green point
to each of these six
points in the training set.
And so this is what this
block of code is doing.
The len function is
simply returning the number
of points in the X_train array.
Then we initialize an array
of zeros with six zeros.
And then this is simply a for loop
that is computing the distance,
using the dist function we defined,
between the test point and each
of the six training points.
And this is the output that we
print with the function print.
So this is an array where each element
represents the distance with respect
to the test point.
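That block of code might be sketched as follows; the training coordinates here are illustrative stand-ins, not necessarily the lecture's exact data:

```python
import numpy as np

# hypothetical labeled training set in Flatland: three red, three blue points
X_train = np.array([[1.0, 1.0], [2.0, 3.0], [6.0, 1.0],
                    [1.0, 6.0], [5.0, 5.0], [6.0, 6.0]])
colors = ["red", "red", "red", "blue", "blue", "blue"]
y_test = np.array([3.0, 4.0])  # the green test point

def dist(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

n = len(X_train)         # number of training points: 6
distances = np.zeros(n)  # an array of six zeros
for i in range(n):
    distances[i] = dist(X_train[i], y_test)

print(distances)
print(colors[np.argmin(distances)])  # label of the nearest neighbor
```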
Then, to complete our classifier,
we can just choose the point
with the smallest distance.
In this case, it will be the
second one, as we can see.
The distance is 1.8 here.
And we can simply print the
color associated with that point.
And indeed, if we go
back to the picture here,
we see that point number two here
is the closest to the green dot.
And so here it is.
So just appreciate the beauty.
With literally three,
four lines of code,
we can run a machine learning algorithm.
In fact, using some more
Pythonic syntax, as it is called,
we can get down to even
fewer lines of code.
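As one illustration (the data and names here are my own), the classifier can collapse to a single expression:

```python
import numpy as np

# hypothetical Flatland data: three red points, three blue points
X_train = np.array([[1, 1], [2, 3], [6, 1], [1, 6], [5, 5], [6, 6]])
colors = ["red", "red", "red", "blue", "blue", "blue"]
y_test = np.array([3, 4])

# distances to every training point at once, then the label of the closest
label = colors[np.argmin(np.linalg.norm(X_train - y_test, axis=1))]
print(label)  # prints "red": (2, 3) is the nearest neighbor
```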
So OK, this is the case for points.
But let's go back to the
image classification example.
Let us see how we can take exactly
the lines of code that I showed you
and apply them to images
in the digits data set.
So this is what we are doing.
We are importing from
the module sklearn--
it's a common module in
machine learning for Python.
We are importing a data set
that is the digit data set.
We call it digits.
So the digits data set
contains 1,797 images.
And there are multiple
structures within this database.
In particular, there are two arrays--
an array of images, called
digits.images, and an array of labels,
called digits.target.
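Loading the data set looks something like this in scikit-learn:

```python
from sklearn import datasets

digits = datasets.load_digits()
print(digits.images.shape)  # (1797, 8, 8): 1,797 images of 8x8 pixels
print(digits.target.shape)  # (1797,): one label per image
```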
So let us see.
Let us print the first element
in the array digits.images.
So this is what it contains.
It is an eight-by-eight grid
of pixels that indeed represents
the number zero, as
we can see by actually
plotting it with that line of
code that I showed you,
mapping each pixel value from 0 to
16 inclusive to an intensity of black.
So indeed, we can realize
that, well, this is a zero.
There are a few zeroes in the middle.
And indeed, if we plot it with this
line of code, we indeed get number zero.
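A minimal way to render one of these 8x8 grids, saving to a file rather than opening a window (the filename is my own):

```python
import matplotlib
matplotlib.use("Agg")  # draw without a display
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()
plt.imshow(digits.images[0], cmap=plt.cm.gray_r)  # values 0..16, white to black
plt.title("label: %d" % digits.target[0])
plt.savefig("digit0.png")
```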
So this is just the first element
indexed by zero in the data set.
And we can indeed also
print the true label
that comes with it, in digits.target.
And it is a zero.
So this is the data set.
And what we want to do is to run
the nearest neighbor classifier
to this data.
So in particular, what we
are doing-- so let's see.
This data set, again, looks
something like this.
What we are doing, we are saying, OK.
Let us consider just a
subset of the database.
So let us take 10 images.
And let us consider these 10
images as being the training data
that we have access to.
Potentially, we could have
access to the entire database.
But somehow, we want to split.
And this is a common
feature in machine learning,
to split the original data
set into multiple subsets
in order to test the performance of
the algorithm you are implementing.
So this is what we are doing.
By selecting 10 images, we
essentially have this picture,
where each point in this 64-dimensional
space represents an image of a digit.
And the yellow label here
represents the true label
we have access to in this training set.
So now what we can do, we can say, OK.
Let us take a test point.
So say that-- take a test point here.
And let us assume that we only have
access to the image of this test
point, which is indeed a three.
And we want to test the performance
of our machine learning algorithm
to classify this point.
So indeed, this is the line of
code that allows us to only take
10 images out of the original data set.
I'm using, again, the slicing
syntax, but in this case
a little bit differently, in the
sense that we are selecting elements
from zero included to 10 excluded.
This is per the specification
of the slicing syntax:
the rightmost element is excluded.
So indeed, if we use this syntax,
we can extract 10 images from it,
precisely like in the picture there.
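That slicing might look like this (variable names are mine):

```python
from sklearn import datasets

digits = datasets.load_digits()

# indices 0 through 9: the left bound is included, the right bound excluded
X_train = digits.images[0:10]
y_train = digits.target[0:10]
print(len(X_train))  # 10
```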
And then we can create a test image.
We can choose a random index
in the remaining part of the data set,
say the image at index 345.
Indeed, we can plot it with the same
line of code I presented to you.
It is indeed a three.
But this is easy for the human eye.
So we want to see how
good of a performance
we can get by applying the
nearest neighbor classifier.
And now the lines of code that
I'm going to present to you now
are precisely the same
lines of code that I
presented to you earlier in Flatland.
So let's see.
This is all together.
And indeed, we get that the
classifier is returning number three.
So the classifier,
what it's doing, again,
is computing all the distances
between this test point
and all the other points
in the training set.
And it's choosing the point with the
smallest distance, the nearest neighbor.
And it assigns the same label
as that point-- and in this case,
it is indeed correct.
So it may come as a surprise that,
indeed, such a simple algorithm--
two, three lines of Python
to implement it--
allows us to get such a good
result. But how good a result
is this, really?
Well, we can test it.
And indeed, we can
plot the true solution.
We do have access to the true
label, which is indeed a three.
So let us test how well we are
doing with 100 test images.
So what we are doing:
instead of just testing
the performance of our
algorithm with a single image,
let us consider a set of 100 images
and count how many mistakes the
algorithm we just implemented makes.
So if we run this code-- I
won't parse it for you.
It simply starts from a number
of errors equal to 0.
And then there is a counter here
that adds 1 every time the
algorithm outputs something
different from the truth.
So if we run this algorithm
over a set of 100 test images,
we get that we commit 37 errors.
So we get 63 correct answers out of
100, which is pretty good, really,
for such a simple algorithm, isn't it?
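The counting loop just described might be sketched as below. Which 100 images serve as the test set is my assumption-- here, the 100 images immediately after the training slice-- so the exact error count can differ from the lecture's 37:

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
X_train, y_train = digits.images[:10], digits.target[:10]

def dist(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

num_errors = 0
for j in range(10, 110):  # 100 assumed test images
    distances = [dist(x, digits.images[j]) for x in X_train]
    prediction = y_train[np.argmin(distances)]
    if prediction != digits.target[j]:
        num_errors += 1  # the nearest neighbor's label was wrong

print(num_errors, "errors out of 100")
```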
But indeed, much like
the way humans learn,
machine learning algorithms
also get better with the amount
of training data they have access to.
So in this case, we have
just chosen a subset
of the original data set-- 10 images.
So what we might try to do is to take
a training set which is much bigger
and see how well the
algorithm is doing with that.
So we can do that.
We indeed enlarge the training set.
Before, it was from 0 to 10 excluded.
Now it is from 0 to 1,000 excluded.
So it has 1,000 images.
We can run exactly the same code
as before over 100 test images.
And this time, look--
only three mistakes.
It's rather surprising, isn't it?
I mean, such a simple algorithm.
I described it to you
starting from Lineland.
And basically, the idea was there.
Now it was a matter of coding it up.
There was a notion of a point
in a higher-dimensional space.
There was a notion of a distance.
But once we figured that out
and coded it up in Python,
Python doesn't care about
the dimension, as we saw.
The same distance function that
works with two-dimensional points
equally works with
64-dimensional points and higher.
And so this is what we
achieve-- 97% correctness
with, really, five lines of code.
So the question is, what if
we try the very same algorithm
I just presented to you on a
data set that looks more like
the images we would actually
like to classify?
So this is a popular data set.
It's called CIFAR-10.
It is, again, the same idea--
it is a labeled data set
that contains, in this case,
really, tens of thousands
of images, each with a label.
So we indeed have 10 labels, as before.
But now, instead of the labels
being numbers from 0 to 9,
the labels are something like airplanes,
automobiles, birds, dogs, and so on.
So this is just one of the
data sets that you can find.
And indeed, there are
websites such as kaggle.com
that host sort of competitions
where machine learning researchers
and programmers try out their algorithm,
and there are challenges going on.
So this data set was popular a couple of
years ago for one of these challenges.
A typical challenge could last
anything in between two, three months
to even longer-- a year.
So it turns out that if we
run the nearest neighbor
classifier on this new set of
images, the performance is 30%.
So it is still much better
than random guessing.
After all, there are 10 categories.
So you might suspect that just by
random guessing, you get 10% correct.
Indeed, you get 30%.
But it's not what we would like, is it?
And in fact, there are more
advanced algorithms that
do something a little bit different.
But let us see first what could
be an issue with the algorithm
that we just ran?
So this is the training set for the
category zero in the previous data set.
And this is a few
elements, a few images,
from the category horse
in the new data set.
So one difference that
pops to the eye immediately
is that these are color pictures.
And indeed, they are.
So, in fact, instead of being eight
by eight pixels, they are 32 by 32.
So still rather small
pictures, but now each pixel
is indeed a triple-- RGB, as we
saw-- where each coordinate contains
a number from 0 to 255.
So in some sense, we are in
a higher-dimensional space.
It's not just 64 dimensions.
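Concretely, flattening a CIFAR-10 image the way we flattened the digits gives a point in a 3,072-dimensional space:

```python
# 32 x 32 pixels, each an (R, G, B) triple with values 0..255
dims = 32 * 32 * 3
print(dims)  # 3072
```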
But there is another key
difference, isn't there?
Anyone?
Please.
AUDIENCE: The image can be rotated.
So you can have a horse that's
facing one way or [INAUDIBLE].
DAVID J. MALAN: Great.
This is definitely one of the
issues here, is viewpoint variation.
So we indeed have a sequence of
pictures representing horses.
But the horses are taken from
different angles, poses, right?
And in fact, there are
all sort of issues here.
There are viewpoint variations,
illumination conditions,
scale variation, deformation,
occlusions, and the like.
And this is what is making image
recognition a really tough challenge
for machine learning algorithms.
Now, what more
sophisticated algorithms do
is not just interpret images
as collections of pixels, per se.
They work at a higher level somehow.
They group pixels together.
Instead of looking at one pixel
at a time, what is happening is
they try to extrapolate, to abstract,
some higher-level features of the image.
And this is an example.
So the digit zero can
indeed be represented
as having the following
four features, which
are four arcs at particular angles.
And indeed, we can go
higher in the hierarchy.
So the state-of-the-art algorithms for
this class of application, and not only,
are called deep learning.
They work much as I just described.
So instead of working
with the pixels themself,
as at the bottom of this image,
they try to group pixels together.
And they try to find out patterns
if you group a few pixels together.
And the first layer of patterns they
can extrapolate are edges, for instance.
And then from there, you
can go another step up.
So by grouping edges
together, you can come up
with objects such as eyes or
noses or mouths and the like.
And then grouping this
set of objects again,
you can get something like a face.
So this is indeed the logic.
Deep learning is really a game changer.
The idea behind this type of
technology is not new, in fact.
It relies on so-called neural
networks that were invented
in the '80s or probably even earlier.
But it's only recently--
literally four or five years ago--
that researchers have
been able to use this technology
to achieve really amazing results.
And now this technology is everywhere,
not just for image recognition.
Google's search engine
uses deep learning.
Just to name one-- Facebook.
The tagging feature for pictures in
Facebook is based on deep learning.
Speech recognition software--
Siri, Cortana, OK Google--
that last one doesn't have
a fancy name yet--
they are all based on deep learning.
And this is really a game changer.
It has brought a 10% or more
increase in the performance
of these algorithms in a
matter of literally one jump.
Something like that was
unseen in the field.
And indeed, deep learning is
beyond the scope of this class.
In fact, you really need a
lot of computational power
in order to run deep
learning algorithms.
Not just that, you need
tons of data out there.
And this is why in the
'80s, they couldn't
run it-- the theory was
there, but first of all,
we didn't have access
to the amount of data
we have access to today,
thanks to the World Wide Web.
And back then, we didn't have
the processing capabilities
that we have now.
So while deep learning algorithms
are beyond what we can actually
run in the CS50 IDE, we can
indeed use some tool sets
that are out there that
are based on deep learning.
So this is one example--
TensorFlow by Google.
So what you can do is
actually download, within
Python for instance, an
already-trained model.
So without us trying to
train an algorithm ourselves,
as we have been doing with the
nearest neighbor classifier,
we can download a file of 100 megabytes
or more for doing image recognition.
Another example is the
DeepDream generator.
So it turns out, as I mentioned,
that this type of algorithm
is able to figure out patterns
in the input we feed it.
So what we can do with
the DeepDream generator,
we can go online,
upload whatever picture
we might like-- a picture of
ourselves-- then upload a picture
of one of our favorite paintings.
And the algorithm is capable
of recognizing the painting
style behind that painting and applying
it to the original image we uploaded.
And this is why it's called DeepDream,
because apparently dreams work
by mixing together images.
So if you apply this type of
technology to the data set
I showed you earlier, we
get an amazing performance
of 95%, which is indeed close
to what the human eye can
achieve in this case.
So the question is, is 95% enough?
Well, it really depends
on the application.
And just imagine an
example-- self-driving car.
As you know, it's a hot area.
There are a lot of players out there.
Tesla is one of the first players
to have brought to market,
really, autopilot-like features.
So the Autopilot feature by
Tesla allows the car
to shift lanes on the
highway automatically,
to speed up or slow down based
on the traffic, and much more.
Now, this is just an assistant,
and the company makes that clear:
someone should always
be in control of the car.
But it is indeed
providing a lot of help.
And it turns out that the autopilot
feature in cars like Tesla
do rely on a variety of
technologies, such as GPS,
ultrasonic sensors, radars, and so on.
But they also do rely on
forward-facing cameras
with image recognition software.
And indeed, they use these so-called
deep learning technologies these days.
What is the problem?
Well, let us see a video.
[VIDEO PLAYBACK]
[MUSIC PLAYING]
[END PLAYBACK]
DAVID J. MALAN: So indeed, an
investigation is going on--
and let me actually close this.
And as we saw, a driver was
on a highway in Florida.
He was using the Autopilot
feature, apparently
relying almost exclusively on it, when
a tractor trailer drove perpendicular
to his car.
And if you read, from tesla.com, a
statement that the company has released
after the accident, which
happened a few months ago,
I read-- "Neither
Autopilot nor the driver
noticed the white side of the tractor
trailer against a brightly lit sky,
so the brake was not applied."
So it is indeed an issue
with image recognition.
Apparently, the color of the
trailer was whitish white.
And so against a brightly
lit sky, the algorithm--
although it performs correctly
something like 95% of the time--
had some trouble.
So these are a few of the challenges.
And, in fact, the [INAUDIBLE] will
be much interested in this respect.
Applying this type of
technology for self-driving cars
will bring a lot of interesting
questions in all fields--
not just computer science, of course,
but politics with policies, ethics,
philosophy, and the like.
All right.
That was it for image recognition.
So let's now just move to the next
application-- text clustering.
So we are going to go a
little bit faster on this one.
The application we have
in mind is the following.
Say that I want to design an
algorithm that takes as an input
the following list of movies-- in
fact, not just the movie title,
but the IMDB synopsis for the movies.
So the movies are Robin Hood, The
Matrix, The King's Speech, Aladdin,
and so on.
And I want to design an algorithm that,
based solely on these inputs,
is capable of clustering,
as it is called-- of grouping
these movies into two categories.
So if we stare at the list of
movies, it might be evident,
if you like, that the clustering that
we expect is something like this,
where we have A Beautiful
Mind, The Matrix, and The King's
Speech in one group and Robin Hood,
Aladdin, and Finding Nemo in another.
And indeed, if I were to ask you,
just group this list of movies
into two groups, most likely,
this would have been the answer.
But your answer would have
been based on something
different from what the machine
will do, most likely, as we will see.
In fact, you might say, OK,
these are the two categories.
Because we know from before that,
in a way, Robin Hood, Aladdin, and
Finding Nemo are really more
Disney-like movies, right?
They're for kids, if you wish, whereas
the others are more action-type movies.
But again, this way of
categorizing, clustering movies
is based on some sort of human learning,
whereas the machine has only access
to the synopsis of the movies.
So let's see what will happen if
we indeed try to run an algorithm
to do this clustering.
So as before, we try to abstract.
In this case, the set of
applications we are discussing now
is called unsupervised
learning because, contrary
to what happened before-- where
we were presented with a list of data,
a list of points, a list of images,
each with a label associated
to it-- this time we are simply
presented with a set of points.
And here it is in Lineland.
This is what we will see--
just a set of seven points.
And so if you are asked, OK, just split
this set of points into two groups,
well, we have an easy way of doing that.
And once again, I haven't told
you anything about the structure
to be inferred.
I was just presenting you with a
set of data points and asking you
to split this group of points
into k equals 2 categories, or groups.
So as before, this is indeed a
well-known machine learning algorithm.
It's called K-means.
K represents
the number of clusters
we want the algorithm to
divide the original data into.
And as before, from one dimension,
we can go to two dimensions.
And if I ask you, OK, split this
group of points into two groups, easy.
That's what K-means is doing.
What about text now?
So before, we somehow had
an easy way of mapping
an image, a collection of pixels, into
a point in a higher-dimensional space.
How can we go about thinking--
doing something like that with text?
It's not that easy, is it?
Because if we were
able to interpret each of these
synopses as a point in a space, then
most likely, the picture
would look like that.
And if I ask you to divide this
group of movies into two categories,
that will be your answer.
So indeed, let's see how we can map
some text input into a collection,
into a point in a
higher-dimensional space.
So in order to do so, let me
just step back for a moment
and describe the following, easier
type of example, where as an input,
we have four strings.
So let's read.
First string is, "I love CS50.
Staff is awesome, awesome, awesome."
String A. String B-- "I
have a dog and a cat."
String C-- "Best of CS50?
Staff.
And cakes.
OK.
CS50 staff."
String D-- "My dog keeps chasing my cat.
Dogs."
OK.
Say these are the four
strings, and we ask, OK,
let's split it into two groups.
Most likely, what you will guess
is that one cluster, one group
should include string A and
string C. And the other cluster
should include string B and string D,
based on the semantics, on the meaning.
So how can we do this?
And indeed, if this is the representation
in high-dimensional space,
this is what we will get.
So the missing step is
really this mapping, right?
Mapping each of these four strings into
a point in a higher-dimensional space.
And this is what we can do with
the following interpretation.
So here, we have the four strings.
The first thing that we can do is look
at the vocabulary used in these four
strings-- namely, extract the words
used in each of the strings.
So if we do not consider
so-called stop words-- namely,
words such as "I," "is,"
"a," "and," and the like,
which do not provide much meaning, this
is the dictionary that we will extract.
There is the word "awesome,"
"best," "cakes," "cats," and so on.
And now if we look at
each of the strings,
we can indeed map each
string to a numerical point
by using the so-called bag-of-words
interpretation, by which each string is
simply represented by its word counts.
So let's see, for
instance, the first string.
The word "awesome" is used three times.
And that's why we have a three there.
The word "CS50" in
lowercase is used once.
The word "love" is used once, again.
And "staff" is used once, again.
Again, we do not consider the
stop words such as "I" and "is."
So on-- so the second string
can also be represented
by a numerical vector as this.
And indeed, there are 12
words in this dictionary.
So we can indeed think
of each of these strings
as being a point in a
12-dimensional space, can't we?
So it's not quite that simple.
This is a great first step,
but we should also normalize by the
length of the string, if you wish,
just because if a string
is very long, then
the raw counts will simply
tend to be higher.
What we can easily do is just
divide each numerical vector
by the total number of
words in that string.
So we get a so-called frequency matrix.
So here it is.
We have a way to map a string into
a point in a high-dimensional space,
a 12-dimensional space.
And what we can do, we can
apply this algorithm K-means.
So let me just show you quickly
how this can be done with Python
in the realm of unsupervised learning.
So now we will move
much faster than before.
In this case, we're importing
the same modules as before,
numpy and matplotlib, and creating--
in this case, in the world of Flatland--
an array of seven points, which I
plot here with the same exact
lines of code as before.
But in this case, instead of
implementing the machine learning
algorithm from scratch ourselves,
we can do what, stereotypically at least,
most machine learning programmers
or researchers will do,
which is import the K-means
algorithm from an external module.
So we import this algorithm; the
details of how the algorithm works
are beyond the scope of this class.
But it wouldn't be that
difficult, in fact.
But we can simply run
this algorithm and say, OK,
algorithm, cluster this group of
points into two groups: k equals 2.
If we do that-- I won't parse
line by line what is happening-- simply
running the algorithm with k equals
2, this would be the output.
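A sketch of that import-and-run pattern, with made-up coordinates standing in for the seven points:

```python
import numpy as np
from sklearn.cluster import KMeans

# seven hypothetical Flatland points forming two visible groups
points = np.array([[1, 1], [1, 2], [2, 1], [2, 2],
                   [8, 8], [9, 8], [8, 9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # a cluster index for each point
print(kmeans.cluster_centers_)  # the "crosses": each cluster's center of mass
```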
So indeed, the algorithm is capable of
figuring out the two groups of points
based on their distance.
So we can also run the
algorithm with k equals 3.
After all, we are deciding the
number of groups to be created.
So if we run with k equals
3, the same lines of code
as before-- I'm changing k from
2 to 3-- we get three groups.
And so on.
With k equals 7, there are seven points.
And here it is.
So the crosses that are
present in the plot
are simply, if you wish,
the center of mass,
the center of gravity, of each group--
simply the middle of the group.
So this is what we can do easily
with points in the world of Flatland.
And we can, in fact, move, very
much like we have done before,
to the world of documents.
And so this is the collection
of strings I presented to you.
This is the bag-of-words
matrix that we can easily
construct using some functions
from the external module sklearn.
Again, I won't spend much
time parsing this code.
I want you to appreciate the fact
that really, in a few lines of code,
we can get something like
this running in Python.
So we can have a look at the
dictionary, which is what I presented
to you earlier-- "awesome," "best,"
"cakes," and the like.
We can get to the frequency
matrix, as before.
And we can indeed run
K-means with k equals 2.
And if we do that, we indeed
have the output that we expect,
meaning the algorithm is
capable of figuring out
that the two clusters should
be divided by the words "dog,"
"cat," "keeps," and the words
"awesome," "staff", and "CS50."
So much for the simple
example of strings.
We can go to the more
interesting example
of movies with the IMDB synopsis.
Run precisely the same line of code.
Now, the input for the
algorithm is the following list
of movies, with their titles
and the synopses from IMDB.
And we can easily import
this Google spreadsheet
into Python using the pandas module.
Again, I won't spend much time on it.
I want to get to the punch line.
This is the line of code
to import these data sets.
And indeed, if we print
this data frame in Python,
it's the same table as in
the Google spreadsheet.
And from this point on, we can
take precisely the same code
that we applied earlier to the
easier example and apply it here.
And if we do so, just by
creating the frequency matrix,
running K-means with k equals
2, we get the following output.
So indeed, the algorithm is capable of
figuring out that one class of movies
should include movies such as
The King's Speech, Frozen, Aladdin,
Cinderella, Robin Hood, and the like.
So The King's Speech, we wouldn't
really expect that, right?
Frozen, Aladdin, Cinderella,
Robin Hood-- OK, kids' movies.
But The King's Speech?
Well, let's hear the other cluster.
The other cluster would be Mad
Max, The Matrix, No Country For Old
Men, and the like.
So the way the algorithm is
thinking about it, when we map
to this higher-dimensional space,
is by grouping movies together
based on the word counts, as we saw.
And so the reason why it's
taking The King's Speech
in the same group as Frozen,
Aladdin, Robin Hood and so on
is because of words such as
"king," "prince," "duke."
So this is the machine learning
behind it-- this is what the machine is doing.
Again, it might sound a
little bit counterintuitive,
coming from a human
learning point of view.
But the machine only has access to those
inputs that I showed you earlier.
So let's just wrap up.
This was Python.
We have seen, indeed, two important
applications in the real world--
so the image recognition application
and the text clustering application.
These are just two applications out
of countlessly many applications that
are changing our life on a daily basis.
And in fact-- well, at this point,
just to mention one more thing.
This should ring a bell at
this point in the class.
It is, indeed, the pyramid from Mario.
And indeed, we can run a
machine learning algorithm
to play video games such as Mario.
The way we do that?
Well, as a training set,
we can have human players play Mario.
And we can have an algorithm
watching them and learning from them.
Or we can have an algorithm
watching another algorithm play.
And this is what indeed
happens in the case of Go.
So Go is a popular
board game, much like chess.
But now the number of possible
configurations is astonishingly large--
more than the number
of atoms in the universe.
So playing this game has always
been considered sort of hard.
And it has long been
believed to be outside
the reach of modern machine
learning algorithms.
This was the picture
up to a few months ago,
in March 2016, when an algorithm
made by researchers at Google
played against one of the
world champions of this game.
And here it is, the world champion,
world master of Go, Lee Sedol.
So before a series of five
games against the machine,
Lee released a statement claiming
that he expected to win
something like four to one.
Indeed, four to one
was the final outcome,
but it was the other way around.
So the machine won
four games out of five.
And what is really amazing-- this is
perceived, again, as a game-changer.
Deep learning algorithms
are behind this technology.
So much attention has
been drawn to the game
not just because the machine won, in
fact, four out of five, but because
during the game, really, new
plays were made by the algorithm.
And the algorithm was trained not
just by observing human masters
playing the game Go.
But it was also trained by looking
at itself-- at the algorithm itself--
playing against another algorithm at Go.
So, having access to
this set of training data,
the algorithm simply came up
with new, astonishing moves
that not even commentators of
the game were able to make sense of.
And so this is where
we will leave off-- by watching
some of the reactions
of these commentators
while trying to interpret the
moves made by the machine.
CS50 will be back next week with
more on Python and on web servers.
[APPLAUSE]
[VIDEO PLAYBACK]
-And this is what [INAUDIBLE] from
the Google team was talking about,
is this kind of evaluation, value--
-That's a very surprising move.
-I thought it was a mistake.
-Well, I thought it was a quick miss.
But--
-If it were online Go,
we'd call it a clicko.
-Yeah, it's a very strange-- something
like this would be a more normal move.
-OK, you're going to have to-- so
do you have to think about this?
-This would be kind of a normal move.
And locally, white would answer here.
-Sure.
-But--
[END PLAYBACK]
DAVID J. MALAN: Thanks again.
[VIDEO PLAYBACK]
-All those freshman were, like, stoked
for the first lecture, you know?
Like cake, candy, swag, candy.
They, like, got their own DJ.
-But the first lecture
didn't go as planned.
-Malan, he had that phone book
ready to, like, tear its heart out.
-Well, you might first open the phone
book roughly to the middle, look down,
and--
-And then the dude just let it drop.
[MUSIC PLAYING]
-Hmm.
-Mind if I bum one of those?
-Oh, sure.
-Dude, there was
something in those pages.
This was like nothing I'd ever
seen before on the YouTubes.
-Was there any mention of Rosebud?
-Rosebud?
Is that, like, a programming language?
-[EXHALES]
[END PLAYBACK]
