The thing I would very much like to talk about today is the state of the art in deep learning. Here we stand in 2019, really at the height of some of the great accomplishments that have happened, but also at the beginning. It's up to us to define where this incredible data-driven technology takes us. So I'd like to talk a little bit about the breakthroughs of 2017 and 2018 that take us to this point.
This lecture is not about state-of-the-art results on the main machine learning benchmarks: the various image classification, object detection, NLP, or GAN benchmarks. It isn't about the cutting-edge algorithm available on GitHub that performs best on a particular benchmark. It's about ideas: ideas and developments that are at the cutting edge of what defines this exciting field of deep learning.
So I'd like to go through a bunch of different areas that I think are really exciting. Of course, this lecture is also not complete. There are things that may be totally missing that happened in 2017 and 2018 that are particularly exciting to people here and beyond. For example, medical applications of deep learning are something I don't touch on at all, and protein folding and other applications where there have been exciting developments from DeepMind and so on. So forgive me if your favorite developments are missing, but hopefully this encompasses some of the really fundamental things that have happened, both on the theory side and the application side, and on the community side of all of us being able to work together on these kinds of technologies.
I think 2018, in terms of deep learning, was the year of natural language processing. Many have described it as NLP's ImageNet moment, referring to 2012 in computer vision, when AlexNet was the first neural network to give that big jump in performance, and started to inspire people about what's possible with deep learning, with purely learning-based methods. In the same way, a series of developments from 2016 and 2017 led up to 2018 and the development of BERT, which produced a total leap, on benchmarks and in our ability to apply NLP to solve various natural language processing tasks. So let's tell the story of what takes us there.
There are a few developments. I mentioned a little bit on Monday the encoder-decoder recurrent neural networks: the idea that recurrent neural networks encode sequences of data and output something, either a single prediction or another sequence. When the input sequence and the output sequence are not necessarily the same size, like in machine translation, where we have to translate from one language to another, the encoder-decoder architecture takes the following approach. It takes in the sequence of words, or the sequence of samples, as the input and uses recurrent units, whether LSTM, GRU, or beyond, to encode that sentence into a single vector. So it forms an embedding of that sentence, a representation of that sentence. It then feeds that representation into the decoder recurrent neural network, which generates the sequence of words that forms the sentence in the language being translated to. So first you encode, by taking the sequence and mapping it to a fixed-size vector representation; then you decode, by taking that fixed-size vector representation and unrolling it into a sentence that can be of a different length than the input sentence. That's the encoder-decoder structure for recurrent neural networks. It has been very effective for machine translation and for dealing with arbitrary-length input and output sequences.
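To make the structure concrete, here is a minimal sketch of the encoder-decoder idea in PyTorch; the vocabulary and layer sizes are hypothetical, chosen just for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just for illustration.
SRC_VOCAB, TGT_VOCAB, EMB, HID = 10000, 12000, 256, 512

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)

    def forward(self, src):                  # src: (batch, src_len) word indices
        _, h = self.rnn(self.embed(src))     # h: (1, batch, HID)
        return h                             # the whole sentence as one fixed-size vector

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, tgt, h):               # unroll from the fixed-size vector
        y, h = self.rnn(self.embed(tgt), h)
        return self.out(y), h                # logits over the target vocabulary
```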
Next step: attention. What is attention? It's the next step beyond, an improvement on the encoder-decoder architecture. It provides a mechanism that allows the decoder to look back at the input sequence. So as opposed to saying that you have an input sentence that all gets collapsed into a single vector representation, you're allowed to look back at particular samples from the input sequence as part of the decoding process. That's attention. And you can learn which aspects of the input sequence are important for which aspects of the decoding process, which parts of the input sequence matter to the output sequence. Visualized another way, there are a few visualizations here that are quite incredible, done by Jay Alammar. I highly recommend you follow the links and look at the further details of these visualizations of attention.
So if we look at neural machine translation, the encoder RNN takes a sequence of words and, after each step, forms a set of hidden representations, a hidden state that captures a representation of the words that came before. Those sets of hidden representations, as opposed to being collapsed into a single fixed-size vector, are then all pushed forward to the decoder, which uses them to translate, but in a selective way. In the visualization here, with the input language on the y-axis and the output language on the x-axis, the decoder weighs the different parts of the input sequence differently in order to determine how best to generate each word of the full output sentence. That's attention: expanding the encoder-decoder architecture to allow for selective attention to the input sequence, as opposed to collapsing everything down into a fixed representation.
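A minimal sketch of that selective weighting, assuming the simple dot-product score (the original attention papers learn a small scoring network instead):

```python
import torch
import torch.nn.functional as F

def attend(decoder_state, encoder_states):
    # decoder_state: (batch, hid); encoder_states: (batch, src_len, hid)
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2))  # (batch, src_len, 1)
    weights = F.softmax(scores, dim=1)            # how much to look at each input word
    context = (weights * encoder_states).sum(1)   # weighted sum of encoder states
    return context, weights.squeeze(2)            # context feeds the next decoder step
```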
Okay, next step: self-attention. In the encoding process, self-attention allows the encoder, when forming each hidden representation, to selectively look at other parts of the input sequence in order to form that representation. For a given word, it lets you determine the relevant aspects of the input sequence that help you encode that word best. So it improves the encoding process by allowing it to look at the entirety of the context. That's self-attention.

Building a transformer: it uses the self-attention mechanism in the encoder to form these sets of representations of the input sequence, and then, as part of the decoding process, follows the same structure, but in reverse, with a bunch of self-attention plus attention that looks back at the input. So it's self-attention in the encoder and attention in the decoder, and that's where the entirety of the magic is: it's able to capture the rich context of the input sequence in order to generate the output sequence in a contextual way.
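The core of the transformer's self-attention is scaled dot-product attention over learned queries, keys, and values. Here is a minimal single-head sketch; real transformers use multiple heads plus residual connections and layer norm:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # queries
        self.k = nn.Linear(dim, dim)   # keys
        self.v = nn.Linear(dim, dim)   # values

    def forward(self, x):                        # x: (batch, seq_len, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(1, 2) / math.sqrt(x.size(-1))
        weights = F.softmax(scores, dim=-1)      # each word attends to every word
        return weights @ v                       # context-enriched representations
```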
So let's take a step back and look at what's critical to natural language: being able to reason about words, to construct a language model, and to reason about words in order to classify a sentence, translate a sentence, compare two sentences, and so on. Sentences are collections of words or characters, and those characters and words have to have an efficient representation that's meaningful for that kind of understanding. That's what the process of embedding is; we talked a little bit about it on Monday. The traditional Word2Vec process of embedding uses some kind of trick, in an unsupervised way, to map words into a compressed representation.
Language modeling is the process of determining which words usually follow each other. One way to do this is with a skip-gram model: take a huge dataset of words, there's writing all over the place, and feed a neural network that, in a supervised way, looks at which words usually follow the input. So the input is a word, and the output is which words are statistically likely to follow it, and the same for the preceding words. Doing this kind of learning, if you throw away the output and the input and just take the hidden representation formed in the middle, that's how you form this compressed embedding: a meaningful representation in which two words that are related, in a language modeling sense, will be close to each other, and two words that are totally unrelated, that have nothing to do with each other, will be far away.
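A minimal sketch of that skip-gram construct; the vocabulary size, embedding size, and the example word indices are all hypothetical:

```python
import torch
import torch.nn as nn

VOCAB, DIM = 10000, 300    # hypothetical vocabulary and embedding sizes

class SkipGram(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)   # the part we keep after training
        self.out = nn.Linear(DIM, VOCAB)        # the part we throw away

    def forward(self, center):                  # center: (batch,) word indices
        return self.out(self.embed(center))     # logits over likely context words

model = SkipGram()
# Train to predict a context word (index 7) from a center word (index 42).
loss = nn.CrossEntropyLoss()(model(torch.tensor([42])), torch.tensor([7]))
```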
ELMo is the approach of using bidirectional LSTMs to learn that representation. What does bidirectional mean? Looking not just at the sequence that led up to the word, but in both directions: the sequence that follows and the sequence that came before. That allows you to learn the rich, full context of the word, and in learning the rich, full context of the word, you're forming representations that are much better able to capture the statistical language model behind the corpus of language you're looking at. This produced a big leap for the algorithms that then use the language model to reason, to do things like sentence classification, sentence comparison, translation: that representation is much more effective for working with language.
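A minimal sketch of the bidirectional idea, assuming word-token inputs; the real ELMo stacks LSTM layers over character-level inputs and combines the layers with learned weights:

```python
import torch
import torch.nn as nn

class BiLM(nn.Module):
    def __init__(self, vocab=10000, dim=256, hid=512):   # hypothetical sizes
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # bidirectional=True reads the sequence left-to-right and right-to-left
        self.lstm = nn.LSTM(dim, hid, batch_first=True, bidirectional=True)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        y, _ = self.lstm(self.embed(tokens))
        return y                               # (batch, seq_len, 2 * hid) contextual embeddings
```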
The idea of the OpenAI transformer is the next step forward: take the same transformer I mentioned previously, the encoder with self-attention and the decoder with attention looking back at the input sequence, take the language model learned by the decoder, chop off layers, and train on a specific language task like sentence classification.
Now, BERT is the thing that made the big leap in performance. In that transformer formulation, there is no bidirectional element; it's always moving forward. The encoding step with BERT is richly bidirectional: it takes in the full sequence of the sentence and masks out some percentage of the words, 15% of the tokens in the sequence, and tasks the entire self-attention encoding mechanism with predicting the words that are missing. Take that construct and stack a ton of them together: a ton of those encoders, self-attention plus feed-forward network, self-attention plus feed-forward network. That allows you to learn the rich context of the language, and then at the end perform all kinds of tasks.
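To make the masking concrete, here is a minimal sketch of the data side of that objective. The mask token id is hypothetical, and real BERT sometimes keeps or randomly replaces the chosen tokens rather than masking all of them:

```python
import torch

MASK_ID = 103                     # hypothetical id of the [MASK] token

def mask_tokens(tokens, mask_prob=0.15):
    # tokens: (batch, seq_len) integer ids
    labels = tokens.clone()
    chosen = torch.rand(tokens.shape) < mask_prob   # pick ~15% of positions
    labels[~chosen] = -100        # default ignore_index: no loss on unmasked positions
    corrupted = tokens.clone()
    corrupted[chosen] = MASK_ID   # the encoder must reconstruct these words
    return corrupted, labels
```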
With that, you can, first of all, like ELMo and like Word2Vec, create rich contextual embeddings: take a set of words and represent them in a space that's very efficient to reason with. You can do language classification, sentence pair classification, similarity of two sentences, multiple-choice question answering, general question answering, and tagging of sentences. Okay, I lingered on that one a little too long, but it is also the one I'm really excited about: if there's been a breakthrough this year, it's thanks to BERT.
The other thing I'm very excited about is totally jumping away from NeurIPS, the theory, those kinds of academic developments in deep learning, and into the world of applied deep learning. Tesla has a system called Autopilot, where hardware version 2 of that system is an implementation built on the NVIDIA Drive PX 2 platform, which runs a ton of neural networks. There are 8 cameras on the car, and a variant of the Inception network takes in all the cameras at different resolutions as input and performs various tasks, like drivable-area segmentation, object detection, and some basic localization. So you now have a huge fleet of vehicles driven not by engineers, well, I'm sure some are engineers, but really by regular consumers: people who have purchased the car and in many cases have no understanding of the limitations and capabilities of neural networks. A neural network's perceptions, and the control decisions based on those perceptions, are now controlling the life of a human being. That to me is one of the great breakthroughs of 2017 and 2018 in terms of what AI can do, in a practical sense, in impacting the world. Over 1 billion miles have been driven on Autopilot.
There are two types of systems currently operating in Teslas: hardware version 1 and hardware version 2. Hardware version 1 was the Mobileye (now Intel) monocular camera perception system. As far as we know, it was not using a neural network, and it was a fixed system that wasn't learning, at least not learning online in the Teslas. The other is hardware version 2, and it's about half and half now in terms of the miles driven. Hardware version 2 has a neural network that's always learning: there are weekly updates, and the model is always improving, shipping new weights and so on.
The next exciting set of breakthroughs is in AutoML: the dream of automating some, many, or as many aspects as possible of the machine learning process, where you can just drop in the dataset you're working on and the system automatically determines all the parameters, from the details of the architecture, its size, its different modules, to the hyperparameters used for training and for running inference. Everything is done for you; all you feed it is data. That's been the success of neural architecture search in 2016 and 2017, and there have been a few ideas with Google AutoML, which is really trying to almost create an API where you just drop in your dataset. It uses reinforcement learning and recurrent neural networks to take a few modules and stitch them together in such a way that the objective function optimizes the performance of the overall system. And they've shown a lot of exciting results: Google and others have shown architectures that outperform state-of-the-art systems both in terms of efficiency and in terms of accuracy.
In 2018 there have been a few improvements in this direction, and one of them is AdaNet, which uses the same reinforcement-learning AutoML formulation to build ensembles of neural networks. In many cases, state-of-the-art performance can be achieved not by a single architecture but by building up an ensemble, a collection of architectures. That's what's happening here: given candidate architectures, stitch them together to form an ensemble that gets state-of-the-art performance. Now, that state-of-the-art performance is not a breakthrough leap forward, but it's nevertheless a step forward, and it's a very exciting field that's going to be receiving more and more attention.
There's an area of machine learning that's heavily understudied and that I think is extremely exciting. If you look at 2012, with AlexNet achieving the breakthrough performance that showed what deep learning networks are capable of, from that point to today there have been non-stop, extremely active developments of different architectures that, even on ImageNet alone, on the image classification task, have improved performance over and over with totally new ideas. On the other side, the data side, there have been very few ideas about how to do data augmentation. Data augmentation is, you know, what kids always do when they learn about an object, right? You look at an object and you kind of twist it around. It's taking the raw data and messing with it in such a way that it gives you a much richer representation of what this data can look like in other forms, in other contexts, in the real world. There have been very few developments here, I think, and AutoAugment is just a tiny step in that direction, one that I hope we as a community invest a lot of effort in.
So what does AutoAugment do? It says, okay, there are these data augmentation methods, like translating the image, shearing the image, doing color manipulation like color inversion. Let's take those as basic actions, and then use reinforcement learning, with an RNN construct again, to stitch those actions together in such a way that the augmented data, on a dataset like ImageNet, gets state-of-the-art performance when you train on it. So: mess with the data in a way that optimizes how you mess with the data.
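The learned policies are sequences of exactly these kinds of operations. Here is a hand-written stand-in using torchvision transforms; the specific operations, magnitudes, and probabilities are hypothetical, whereas AutoAugment searches for them:

```python
from torchvision import transforms

# A hand-written stand-in for one learned sub-policy (the real ones are searched).
augment = transforms.Compose([
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), shear=10),  # translate, shear
    transforms.RandomInvert(p=0.3),          # color inversion, applied with some probability
    transforms.ColorJitter(brightness=0.4),  # color manipulation
])
# augmented = augment(image)  # apply to a PIL image before training
```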
They also showed that, given the set of data augmentation policies learned to optimize, for example, for ImageNet on some architecture, you can take that learned set of policies and apply it to a totally different dataset, through the process of transfer learning.
So what is transfer learning? We've talked about transfer learning: you have a neural network that learns to do cat versus dog, or rather learns a thousand-class classification problem on ImageNet. Then you transfer: you chop off a few layers and retrain on the task of your own dataset of cat versus dog. What you're transferring are the weights learned on the ImageNet classification task, and you're then fine-tuning those weights on your specific, personal cat-versus-dog dataset. Now you can do the same thing here: as part of the transfer learning process, you can take the data augmentation policies learned on ImageNet and transfer those too. You can transfer both the weights and the policies.
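A minimal sketch of the weight-transfer half of that recipe in PyTorch, using a torchvision ResNet as the hypothetical pretrained network:

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained weights, then swap the thousand-class head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)   # new head: cat vs. dog
for p in model.parameters():
    p.requires_grad = True   # fine-tune everything (or freeze early layers instead)
```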
That's a really super exciting idea, I think. It wasn't demonstrated extremely convincingly here in terms of performance, it got an improvement in performance and so on, but it inspired an idea that we need to really think about: how do we augment data in an interesting way such that, given just a few samples, we can generate huge datasets from which meaningful, complex, rich representations can be formed? I think that's really exciting, and one of the ways to break open the problem of how we learn a lot from a little.
Training deep neural networks with synthetic data is also a really exciting topic that a few groups, but especially NVIDIA, have invested a lot in. Here's probably my favorite work on this topic, from CVPR 2018: they really went crazy and said, okay, let's mess with synthetic data in every way we possibly can. On the left is shown a set of backgrounds; then there's also a set of artificial objects, and you have a car, or some other kind of object you're trying to classify. So let's take that car and mess with it in every way possible: apply every lighting variation possible, rotate everything, go crazy. What NVIDIA is really good at is creating realistic scenes, and they said, okay, let's create realistic scenes, but let's also go above and beyond and not be realistic at all: do things that can't possibly happen in reality. They generate these huge datasets, train on them, and achieve quite good performance on image classification. Of course, if you try to apply this to ImageNet and those kinds of tasks, you're not going to outperform networks that were trained on ImageNet. But they showed that with just a small sample of those real images, they can fine-tune this network, trained on synthetic, totally fake images, to achieve state-of-the-art performance. It's another way to learn a lot from very little, by generating fake worlds synthetically.
The process of annotation is, for supervised learning, what you need to do in order to train the network: you need to be able to provide ground truth, to label whatever entity is being learned. For image classification, that's saying what's going on in the image, and part of that was done for ImageNet by doing Google searches to create candidates. Saying what's going on in the image is a pretty easy task. Then there's the object detection task of annotating the bounding box: drawing the actual bounding box is a little more difficult, but it's a couple of clicks and so on. And finally, probably one of the highest-complexity tasks of perception, of image understanding, is segmentation: actually drawing, at the pixel level or with polygons, the outline of a particular object. If you have to annotate that, it's extremely costly. The work on Polygon-RNN uses recurrent neural networks to make suggestions for those polygons, and it's really interesting. There are a few tricks to form these high-resolution polygons. The idea is: you draw a bounding box around an object, a convolutional neural network drops the first point, and then a recurrent neural network draws the polygon around the object. The performance is really good, and the tool is available online. It's a really interesting idea: again, the dream with AutoML is to remove the human from the picture as much as possible, and with data augmentation to remove the human from the picture as much as possible for menial data work. Automate the boring stuff; in this case, the act of drawing a polygon is automated as much as possible.
The other interesting dimension along which deep learning has recently been optimized is accessibility: how do we make deep learning fast, cheap, and accessible? The DAWNBench benchmark from Stanford formulated an interesting competition, which got a lot of attention and drove a lot of progress. It says: if we want to achieve 93% accuracy on ImageNet and 94% on CIFAR-10, that's the requirement, then let's compete on how you can do it in the least amount of time and for the least amount of dollars, literally the dollars you're allowed to spend to do the training. And fast.ai, you know, an awesome renegade group of deep learning researchers, was able to train on ImageNet in 3 hours, for 25 bucks: training a network that achieves 93% accuracy for 25 bucks, and 94% accuracy on CIFAR-10 for 26 cents.
The key idea they were playing with is quite simple: it really boils down to messing with the learning rate throughout the process of training. The learning rate is how much you adjust the weights based on the loss, based on the error the neural network observes. They found that if they crank up the learning rate while decreasing the momentum, which is a parameter of the optimization process, and do both jointly, they're able to make the network learn really fast.
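That schedule, learning rate up while momentum goes down, then both reversed, is what is now packaged as the one-cycle policy. A sketch using PyTorch's built-in scheduler, with a stand-in model and hypothetical hyperparameters:

```python
import torch

model = torch.nn.Linear(10, 2)   # stand-in model, just for illustration
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
sched = torch.optim.lr_scheduler.OneCycleLR(
    opt, max_lr=1.0,             # crank the learning rate way up mid-training
    total_steps=1000,
    cycle_momentum=True,         # momentum moves opposite to the learning rate
    base_momentum=0.85, max_momentum=0.95,
)
# after each batch: opt.step(); sched.step()
```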
That's really exciting, and the benchmark itself is also really exciting, because it's exactly for people sitting in this room: it opens up the door to working on all kinds of fundamental deep learning problems without the computational resources of Google DeepMind or OpenAI or Facebook. That's important for academia, for independent researchers, and so on.
So, GANs. There's been a lot of work on generative adversarial networks, and in some ways there have not been breakthrough ideas in GANs for quite a while. But BigGAN from Google DeepMind demonstrated the ability to generate incredibly high-resolution images. It's the same GAN technique, no breakthroughs or innovations, but scaled: they increased the model capacity and increased the batch size, the number of images fed to the network. It produces incredible images; I encourage you to go online and look at them. It's hard to believe they're generated. So 2018 for GANs was a year of scaling and parameter tuning, as opposed to breakthrough new ideas.
Video-to-video synthesis: this work from NVIDIA looks at the following problem. There's been a lot of work on going from image to image, generating one image from another, whether it's colorizing an image or the traditionally defined GAN tasks. The idea of video-to-video synthesis, which a few people have been working on but NVIDIA took a good step forward on, is to make the temporal consistency, the temporal dynamics, part of the optimization process: to make the video look not jumpy. If you look at the comparison here, the input is the labels on the top left and the output of the NVIDIA approach is on the bottom right: you can see it's very temporally consistent. If you look at the frame-by-frame image-to-image mapping, the state of the art there being pix2pixHD, it's very jumpy, not temporally consistent at all. And there are some naive approaches for trying to maintain temporal consistency, shown in the bottom left. You can apply this to all kinds of video-to-video mapping tasks. Here it's mapping face edges to faces: edge detection on faces, then generating faces from just the edges. You can also go from body pose to actual images: as input the network takes the pose of the person, and it generates the video of the person.
Okay, semantic segmentation. The problem of perception, which began with AlexNet and ImageNet, has seen further and further developments. The basic problem is image classification, where the input is an image and the output is a classification of what's going on in that image, and the fundamental architecture can be reused for more complex tasks, like detection and segmentation, interpreting what's going on in the image. These large networks, VGGNet, GoogLeNet, ResNet, SENet, DenseNet, all form rich representations that can then be used for all kinds of tasks, whether that task is object detection or something else. Shown here are the region-based methods, where the convolutional layers make region proposals, a bunch of candidates to be considered, and then a second step determines what's in those different regions and forms bounding boxes around them, in a for-loop way. Then there are the one-shot, single-shot methods, where in a single pass all of the bounding boxes and their classes are generated. There has been a tremendous amount of work in the space of object detection, some of it single-shot methods, some of it region-based methods: a lot of exciting work, but no, I would say, breakthrough ideas.
And then we take it to the highest level of perception, which is semantic segmentation. There's also been a lot of work there; the state-of-the-art performance, at least among open source systems, is DeepLabv3+ on the PASCAL VOC challenge. To catch you all up: semantic segmentation started in 2014 with fully convolutional neural networks, chopping off the fully connected layers and outputting a heatmap, very grainy, very low resolution. Improving on that was SegNet, which upsampled using the max-pooling indices. Then came a breakthrough idea that's reused in a lot of places: dilated convolutions, atrous convolutions, adding some spacing inside the filter, which increases the field of view of the convolutional filter. The key idea behind DeepLabv3, which is the state of the art, is multi-scale processing: without increasing the number of parameters, multiple scales are achieved by varying the "atrous rate", taking those atrous convolutions and increasing the spacing. You can think of increasing that spacing as enlarging the model's field of view. So you can consider all these different scales of processing when looking at the layers of features, allowing you to grasp the greater context as part of the upsampling, deconvolutional step.
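In PyTorch, the atrous rate is just the dilation argument of the convolution. Here is a sketch of the same 3x3 filter applied at several rates and concatenated, in the spirit of DeepLab's multi-scale processing; the channel counts and rates are illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 64, 64)   # a hypothetical feature map
# Same 3x3 filter, growing field of view, identical parameter count per branch.
branches = [nn.Conv2d(256, 256, kernel_size=3, padding=r, dilation=r)
            for r in (1, 6, 12, 18)]
multi_scale = torch.cat([b(x) for b in branches], dim=1)  # concatenate the scales
```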
That multi-scale context is what produces the state-of-the-art performance, and we have a tutorial on GitHub showing this DeepLab architecture trained on Cityscapes. Cityscapes is a driving segmentation dataset that is one of the most commonly used for the task of driving scene segmentation.
Okay, on to deep reinforcement learning. This touches a bit on 2017, but I think the excitement really settled in 2018, with the work from Google DeepMind and from OpenAI. It started with the DQN paper from Google DeepMind, where they beat a bunch of Atari games, achieving superhuman performance with deep reinforcement learning methods that take in just the raw pixels of the game. The same kind of architecture is able to learn how to beat all of these games. It's a super exciting idea that has echoes of what general intelligence is: taking in the raw information and understanding the game, the sort of physics of the game, sufficiently to be able to beat it.
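At its core, the DQN update is regression toward a bootstrapped target. A minimal sketch of the loss, where q_net and target_net are hypothetical networks over pixel observations:

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, s, a, r, s_next, done, gamma=0.99):
    # Q-value of the action actually taken in state s
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # bootstrapped target: reward plus discounted best next-state value
        target = r + gamma * target_net(s_next).max(1).values * (1 - done)
    return F.mse_loss(q, target)   # regress toward the target network's estimate
```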
Then in 2016, AlphaGo, with some supervision and some self-play, some supervised learning on expert world-champion players and some play against itself, was able to beat the top of the world champions at Go. In 2017, AlphaGo Zero, a specialized version of AlphaZero, was able to beat AlphaGo with just a few days of training and zero supervision from expert games. Through the process of self-play, this is again getting the human out of the picture more and more, which is why AlphaGo Zero was the cleanest demonstration of all the nice progress in deep reinforcement learning.
I think if we look back at the history of AI, when you're sitting on a porch a hundred years from now, reminiscing, AlphaZero will be a thing people remember as an interesting, key moment in time. The AlphaZero paper was in 2017, and this year it played Stockfish in chess, the best chess-playing engine, and was able to beat it with just four hours of training. Of course, the four hours comes with a caveat: four hours for Google DeepMind is highly distributed training, so it's not four hours for an undergraduate student sitting in their dorm room. But it means that through self-play it was able, very quickly, to learn to beat the state-of-the-art chess engine, and also to beat the state-of-the-art Shogi engine, Elmo.
The interesting thing here is that, you know, with perfect-information games like chess, you have a tree of all the decisions you could possibly make, and presumably the farther you look down that tree, the better you do. That's how Deep Blue beat Kasparov in the 90s: you just look as far down the tree as possible to determine which action is most optimal. If you look at the way human grandmasters think, it certainly doesn't feel like they're looking down a tree. There's something like creative intuition: you can see the patterns on the board, you can do a few calculations, but really it's on the order of hundreds of positions, not the millions or billions of the Stockfish, state-of-the-art chess engine approach. AlphaZero is moving closer and closer to the human grandmaster, considering very few future moves.
Through a neural network estimator that estimates the quality of a move, the current quality of the board and the quality of the moves that follow, it's able to do much, much less lookahead. The neural network learns the fundamental information, just like when a grandmaster looks at a board, they can tell how good it is.
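A toy sketch of that contrast: a brute-force engine reads the tree out deep, while a learned value function lets the search stop shallow and trust the evaluation. The game interface and the network here are entirely hypothetical:

```python
def shallow_search(state, depth, value_net):
    # Instead of reading the game out to the end like a brute-force engine,
    # stop early and trust the learned evaluation of the position.
    if depth == 0 or state.is_terminal():
        return value_net(state)               # "how good does this board look?"
    # Negamax: the value for us is the negation of the opponent's best reply.
    return max(-shallow_search(state.play(m), depth - 1, value_net)
               for m in state.legal_moves())  # far fewer positions than Stockfish
```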
So that's again interesting: it's a step toward at least echoes of what human intelligence is, in the very structured, formal, constrained worlds of chess and Go and Shogi.
And then there's the other side of the world, the part that's messy. It's still games, it's still constrained in that way, but OpenAI has taken on the challenge of playing games that are much messier, that have some resemblance to the real world in that you have to do teamwork, you have to look at long time horizons, with huge amounts of imperfect information, hidden information, uncertainty. Within that world, they've taken on the challenge of the popular game Dota 2. On the human side, there's The International, a competition hosted every year, where in 2018 the winning team got 11 million dollars. It's a very popular, very active competition that has been going on for a few years. OpenAI has been improving and achieved a lot of interesting milestones in 2017, when their 1v1 bot beat a top professional Dota 2 player. The way you achieve great things is you try, and in 2018 they tried to go 5v5: the OpenAI team lost two games against top Dota 2 players at The International 2018. Their MMR ranking in Dota 2 has been increasing over and over, but there are a lot of challenges that make it extremely difficult to beat the human players. And this is, you know, in every story, Rocky or whatever you think of, losing is an essential element of the story that leads to the movie, the book, and the greatness. So you better believe they're coming back next year, and there are going to be a lot of exciting developments there. Dota 2 is currently one of the games that really has the public eye in terms of AI taking on benchmarks.
We saw Go, an incredible accomplishment. What's next? Last year, associated with a best paper award at NeurIPS, there was the heads-up Texas No Limit Hold'em AI that was able to beat top-level players. What remains, well, not completely, but currently out of reach is the general game: not heads-up, one-versus-one, but multi-player Texas No Limit Hold'em. And on the gaming side, Dota 2 is now the benchmark everybody's targeting. It's actually an incredibly difficult one, and some people think it will be a long time before we can win.
On the more practical side of things, 2018, starting in 2017, has been a year of the frameworks growing up, maturing and creating ecosystems around themselves. TensorFlow, with a history dating back a few years, has really come to be a mature framework with TensorFlow 1.0; PyTorch 1.0 came out in 2018 and has matured as well. And now there are really exciting developments in TensorFlow, with eager execution and beyond, coming out in TensorFlow 2.0 in 2019. Those two players have made incredible leaps in standardizing deep learning, in the fact that a lot of the ideas I talked about today and Monday, and will keep talking about, almost all have a GitHub repository with implementations in TensorFlow and PyTorch. That makes them extremely accessible, and that's really exciting.
It's probably best to end by quoting Geoff Hinton, the "godfather" of deep learning and one of the key people behind backpropagation, who recently said of backpropagation: "My view is throw it all away and start again." He believes backpropagation is totally broken, an ancient idea that needs to be completely revolutionized, and his practical protocol for doing that: the future depends on some graduate student who is deeply suspicious of everything he has said. That's probably a good way to end the discussion about what the state of the art in deep learning holds, because everything we're doing is fundamentally based on ideas from the 60s and the 80s, and in terms of new ideas there have not been many; in particular, the state-of-the-art results I've mentioned are all fundamentally based on stochastic gradient descent and backpropagation. The field is ripe for totally new ideas. So it's up to us to define the real breakthroughs and the real state of the art for 2019 and beyond. With that, I'd like to thank you; the materials are on the website, deeplearning.mit.edu.
