You only get one shot, do not miss your chance
to blow this opportunity.
It actually takes more than one shot to train
a model.
Hello World, it’s Siraj, and welcome to Fresh
Machine Learning. This course is all about
making bleeding edge machine learning accessible
to developers. The field is moving faster
than Steve Ballmer at a developer conference. (scene-ballmer)
And there are so many incredible discoveries
being made every single day. We’re going
to learn to apply some of the latest machine
learning techniques to practical examples
that you can integrate into your own apps.
Neural networks have been around since the
50s, but we’ve just never
had as much data and computing power as we
do now. These are the two key ingredients
that make neural nets capable of doing incredible
things like composing music and writing stories.
We call it Deep Learning these days and it
deserves the press it’s gotten, but so many
articles claim it replicates the human brain.
Some even make it sound like if we give a
deep neural net enough data and compute, it’ll
suddenly become self-aware!
The brain is indeed a neural network, but
do we really learn the way a deep neural net
does?
Well, let’s think about it. In order for
a deep neural net to learn to, say, recognize
an image of a banana, you first have to feed
it hundreds of thousands of banana images.
As the algorithm discovers commonalities
across the photos, it creates a series of
hierarchical abstractions, and very soon it’s
able to recognize a novel image of a banana.
But think about the way you and I learn. If
I were to show you an image of a banana for
the first time, you’d probably be able to
recognize a novel banana instantly even if
it was a different shape or color. And toddlers
work this way as well. We humans don’t seem
to need thousands of examples to generalize.
Just a few. And we learn richer representations
than machines do as well. We can use the concepts
we learn in other ways like creating new examples,
or parsing an object into parts, or creating
new abstract categories of objects based on
existing categories. If we could create an
algorithm to do this, to learn concepts with
few examples, wouldn’t that be incredible?
It would further democratize the field so
that not just the big companies like Google
and Amazon with huge private internal datasets
are able to train their models, but anyone
can. All you’d need are a few examples.
So is there an algorithm that does this? That’s
able to learn rich, abstract, flexible representations
from sparse data?
Well, there was a recent paper that came out
called Human-level concept learning through
probabilistic program induction. The authors
said, let’s build a model capable of what’s
called one-shot learning. One-shot learning
is a type of machine learning that learns an object category
after just one or a few examples. (scene-ohh)
They use something called Bayesian Program
Learning, or BPL, to do this. "Bayesian" refers
to Bayes’ theorem, the rule from probability
theory for updating the probability of a hypothesis
as new evidence comes in. BPL represents concepts
as "simple stochastic programs", that is, small
probabilistic algorithms. So given the knowledge
of how to write a known letter, it can calculate
the probability that a novel letter is the same
character or not. BPL builds these stochastic
programs "compositionally from parts, subparts,
and spatial relations." All of these exist
in a hierarchy of knowledge which the machine
has gained through very little experience. So they
trained it on a dataset of handwritten characters,
Omniglot, and it was able to recognize characters with
a lower error rate than deep learning, or
even humans. OK, so does that mean BPL is the
way to go? Well, it does have its flaws. It
lacks explicit knowledge of certain things
like parallel lines, symmetry, and the connections
between the ends of strokes and other strokes.
And the learning isn’t really transferable
to other domains, so it’s not better than
deep learning in every way.
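
To make the Bayesian part concrete, here’s a toy sketch of just the inference step: scoring a novel example against known concepts with Bayes’ theorem. To be clear, this is not the paper’s model. The concepts, priors, and Gaussian likelihoods below are made-up stand-ins for BPL’s structured stroke programs:

import numpy as np

# Toy Bayes' theorem classifier: not BPL itself, just the inference step.
# Each "concept" is a made-up Gaussian over one crude pen-stroke feature
# (say, total stroke length); real BPL uses structured stroke programs.
concepts = {
    'letter_a': {'prior': 0.5, 'mean': 10.0, 'std': 2.0},
    'letter_b': {'prior': 0.5, 'mean': 16.0, 'std': 3.0},
}

def likelihood(x, mean, std):
    # Gaussian probability density of observing feature value x
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def posterior(x):
    # Bayes' theorem: P(concept | x) is proportional to
    # P(x | concept) * P(concept), then normalized to sum to one.
    unnormalized = {name: c['prior'] * likelihood(x, c['mean'], c['std'])
                    for name, c in concepts.items()}
    total = sum(unnormalized.values())
    return {name: p / total for name, p in unnormalized.items()}

# A novel example with feature value 11 is probably another letter_a.
print(posterior(11.0))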
A few months later though, DeepMind challenged
the paper by releasing their own, called ‘One-shot
learning with memory-augmented neural
networks.’ The basic idea they had was that
deep learning is very data intensive, but
perhaps there’s a way to build a deep neural
net that only needs a few examples to learn.
Deep learning without the huge datasets. So
they built what’s called a Memory-Augmented
Neural Network, or MANN. A MANN has two parts:
a controller, which is either a feed-forward
neural net or an LSTM, and an external
memory module. The controller interacts with
the external memory module through a number of
read/write heads. It’s capable of long-term
storage via slow updates of its weights and short-term
storage via the external memory module.
They fed it a few examples of handwritten
characters at a time and trained it across thousands
of these episodes. It performed
meta-learning, which means it learned to learn.
And guess what? It outperformed humans as
well! So they proved that one-shot learning
can be accomplished with a neural network
architecture.
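
To get a feel for the external memory piece, here’s a minimal numpy sketch of content-based addressing, the mechanism a MANN’s read heads rely on. This is my illustration, not DeepMind’s code: a real MANN wraps this in a trained LSTM controller and uses a learned least-recently-used write scheme:

import numpy as np

# Toy content-addressed external memory, the core mechanism of a MANN.
MEMORY_SLOTS, WORD_SIZE = 128, 40
memory = np.zeros((MEMORY_SLOTS, WORD_SIZE))

def read(key):
    # Content-based addressing: cosine similarity between the key the
    # controller emits and every memory row, softmaxed into read weights.
    sims = memory @ key / (np.linalg.norm(memory, axis=1)
                           * np.linalg.norm(key) + 1e-8)
    weights = np.exp(sims) / np.exp(sims).sum()
    return weights @ memory  # weighted sum over memory rows

def write(vector, slot):
    # Simplest possible write head: store the vector at a chosen slot.
    memory[slot] = vector

# Store one "observation", then recall it from a noisy version of itself.
write(np.random.randn(WORD_SIZE), slot=0)
retrieved = read(memory[0] + 0.1 * np.random.randn(WORD_SIZE))
print(np.corrcoef(retrieved, memory[0])[0, 1])  # close to 1.0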
Which is pretty cool. So there are lots of
methodologies for implementing one-shot learning,
and in this episode we’re going to implement
our own. We’re gonna build a one-shot handwritten
character classifier in Python using the scipy
library, so hold on to your butts.
We gotta import our dependencies first. We’re
going to want three libraries: numpy, scipy, and
copy. Once we have those, we can define two
variables: the number of runs we want to complete
and a reference variable for where we store our
class labels. Then in our main method we can
create an array of the size of runs, which
is 20. We’ll use this array to store all
of our classification error rates, one for every
run. Then we’ll write a for loop to run
our classifier 20 times. For each run, we’ll
call a classification function, which will attempt
to classify a small sample set of images,
and store the error rate in the array.
Then we’ll print out the error rate to the terminal,
and when we’re done with all of our runs,
we’ll get the mean error rate
from the array and print it out as the last
statement in the terminal.
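
Here’s a minimal sketch of that driver loop. It assumes a classification_run function like the one we’ll walk through next; the names are mine, and the class-label bookkeeping is omitted for brevity:

import numpy as np

nrun = 20  # number of classification runs to average over

def main():
    perror = np.zeros(nrun)  # one classification error rate per run
    for r in range(nrun):
        # classification_run is the classification function we walk
        # through below; the name is an assumption for this sketch.
        perror[r] = classification_run(r)
        print('run %02d (error %.1f%%)' % (r + 1, perror[r]))
    print('average error %.1f%%' % perror.mean())

if __name__ == '__main__':
    main()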
So how does this classification step work?
Well, before we answer that, we need to understand
two helper functions: load_img_as_points and
modified_hausdorff_distance. The load_img_as_points
function loads an image file, in our case
a character image. It converts
the image to an array, finds all the nonzero
values, that is, all of the inked pixels,
then creates an array of the coordinates of
those pixels and returns it. The modified Hausdorff distance
is a metric that computes the similarity between
two images by comparing their pixel coordinate
arrays, the ones that come from the load_img_as_points
function. For each point in one image, it finds
the distance to the closest point in the other,
takes the mean in each direction, and returns
the worse of the two values. The last
parameter of the classification function, cost,
just tells the function that smaller values
from the modified Hausdorff distance mean
the images are more similar.
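
Here’s roughly what those two helpers look like. I’m assuming imageio for file reading, since scipy’s old imread is deprecated, and grayscale images with dark ink on a white background; the distance follows Dubuisson and Jain’s 1994 definition:

import numpy as np
from imageio import imread  # assumption: stand-in for scipy's removed imread
from scipy.spatial.distance import cdist

def load_img_as_points(fname):
    # Load a character image and return the (row, col) coordinates of its
    # inked pixels, centered on their mean. Assumes a grayscale image with
    # dark ink on a white background.
    img = imread(fname) < 128  # True wherever there is ink
    points = np.array(img.nonzero()).T.astype(float)
    return points - points.mean(axis=0)

def modified_hausdorff_distance(item_a, item_b):
    # Dubuisson & Jain (1994): for each point in one image, find the
    # distance to the closest point in the other; take the mean in each
    # direction and return the worse of the two. Smaller = more similar.
    d = cdist(item_a, item_b)
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())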
Lastly, let’s take a look
at the classification function itself. In
this function, we retrieve both our training
and testing images and load their image point
matrices into memory. Then we compute the
cost matrix using the modified Hausdorff distance,
one distance for every test/training pair.
After that, we pick each test image’s nearest
training image, compute the error rate, and
return it. That’s all! We do this for every
run and then average the results to get the
mean error rate.
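
And here’s a sketch of that classification function, using the two helpers above. The folder layout and the assumption that test image i belongs to the class of training image i are mine for illustration; the episode’s actual code tracks labels with the reference variable mentioned earlier:

import numpy as np

# Uses load_img_as_points and modified_hausdorff_distance from above.
def classification_run(run_index):
    # Hypothetical layout: each run folder holds 20 training images (one
    # per character class) and 20 test images drawn from those classes.
    train = [load_img_as_points('run%02d/training/%02d.png' % (run_index, i))
             for i in range(1, 21)]
    test = [load_img_as_points('run%02d/test/%02d.png' % (run_index, i))
            for i in range(1, 21)]

    # Cost matrix: one modified Hausdorff distance per test/training pair.
    cost = np.array([[modified_hausdorff_distance(t, c) for c in train]
                     for t in test])

    # Small cost means similar, so each test image takes the label of its
    # nearest training image. Assumes test i's true class is training i.
    predictions = cost.argmin(axis=1)
    correct = predictions == np.arange(len(test))
    return 100.0 * (1 - correct.mean())  # error rate in percent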
When we run this code, it returns an average
error rate of around a third. That isn’t
state of the art like DeepMind’s MANN or BPL,
but it does make for a good baseline demo of
one-shot learning. One-shot learning will only
get more popular over time, and pretty soon
we’re going to start seeing lots of
production-grade code using it.
Bunch of cool links down below, check ’em out,
and please subscribe for more ML videos. For
now, I’ve gotta go fix an index-out-of-bounds
error, so thanks for watching!
