Dear Fellow Scholars, this is Two Minute Papers
with Dr. Károly Zsolnai-Fehér.
Between 2013 and 2015, DeepMind worked on
an incredible learning algorithm by the name
of Deep Reinforcement Learning.
This technique looked at the pixels of the
game, was given a controller and played much
like a human would… with the exception that
it learned to play some Atari games on a superhuman
level.
I tried to train it a few years ago and
would like to invite you on a marvelous journey
to see what happened.
When it starts learning to play an old game,
Atari breakout, at first, the algorithm loses
all of its lives without any signs of intelligent
action.
If we wait a bit, it becomes better at playing
the game, roughly matching the skill level
of an adept player.
But here's the catch, if we wait for longer,
we get something absolutely spectacular.
Over time, it learns to play like a pro, and
finds out that the best way to win the game
is digging a tunnel through the bricks and
hitting them from behind.
This technique is a combination of a neural
network that processes the visual data that
we see on the screen, and a reinforcement
learner that comes up with the gameplay-related
decisions.
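To get a feel for the reinforcement learning half, here is a toy sketch. Note that this is not DeepMind's architecture: in the real system a convolutional neural network maps raw pixels to action values, while this sketch substitutes a simple lookup table and a hypothetical 5-state corridor environment, so only the core learning update is illustrated.

```python
import random

random.seed(42)

# Toy Q-learning: a table stands in for the neural network, and the
# environment is a hypothetical corridor with a reward at the far end.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration
N_STATES, ACTIONS = 5, [0, 1]           # actions: 0 = left, 1 = right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Hypothetical environment: +1 reward for reaching the last state."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            best = max(Q[(state, a)] for a in ACTIONS)
            action = random.choice([a for a in ACTIONS if Q[(state, a)] == best])
        nxt, reward, done = step(state, action)
        # The core update: nudge Q toward reward + discounted future value.
        target = reward + (0.0 if done else GAMMA * max(Q[(nxt, a)] for a in ACTIONS))
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt

# After training, moving right from the start should beat moving left.
print(Q[(0, 1)] > Q[(0, 0)])  # prints True
```

Swapping the table for a deep network that generalizes across millions of pixel configurations is, in essence, the leap that Deep Reinforcement Learning made.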
This is an amazing algorithm, a true breakthrough
in AI research.
However, it had its own issues.
For instance, it did not do well on Montezuma's
Revenge or Pitfall, because these games require
more long-term planning.
Believe it or not, the solution in a follow-up
work was to infuse these agents with a very
human-like property… curiosity.
That agent was able to do much, much better
at these games… and then got addicted to
the TV.
But that’s a different story.
Note that this has been remedied since.
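The idea behind curiosity can be sketched in a few lines. This is a simplified stand-in, not the paper's method: real systems measure novelty with a learned prediction or density model, while this hypothetical sketch simply counts state visits, which captures the same principle in its crudest form.

```python
from collections import Counter

visit_counts = Counter()

def curiosity_reward(state, extrinsic_reward, bonus_scale=1.0):
    """Total reward = game reward + a novelty bonus that shrinks with familiarity."""
    visit_counts[state] += 1
    intrinsic = bonus_scale / (visit_counts[state] ** 0.5)  # decays with visits
    return extrinsic_reward + intrinsic

# A brand-new state pays a full bonus even when the game itself gives nothing...
first = curiosity_reward("room_A", extrinsic_reward=0.0)
# ...but the hundredth visit pays almost nothing, pushing the agent onward.
for _ in range(99):
    later = curiosity_reward("room_A", extrinsic_reward=0.0)
print(first, later)  # prints 1.0 0.1
```

This also hints at the TV problem: a screen full of noise looks like an endless stream of never-before-seen states, so a naive novelty bonus never runs dry there.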
And, as impossible as it may sound, all of
this has been improved significantly.
This new work is called Agent57, and it plays
better than humans, on all 57 Atari games.
Absolute insanity.
Let’s have a look at it in action and then
in a moment, I’ll try to explain how it
does what it does.
You see Agent57 doing really well at the Solaris
game here.
This space battle game is one of the most
impressive games on the Atari, as it contains
16 quadrants, 48 sectors, space battles, warp
mechanics, pirate ships, fuel management,
and much more.
This game is not only quite complex, but it
also is a credit assignment nightmare for
an AI to play.
This credit assignment problem means that
we may choose an action, and only win or lose
hundreds of actions later, leaving us with
no idea which of our actions led to that win
or loss, thus making it very difficult to
learn from them.
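To put a number on how thin that delayed credit gets, here is a small hypothetical example. The 300-step episode and single end-of-game reward are made up for illustration; the discounted return formula G_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + … is the standard way reinforcement learners spread a late reward back over earlier actions.

```python
# One win signal at the very end of a hypothetical 300-step episode.
GAMMA = 0.99
rewards = [0.0] * 299 + [1.0]   # nothing... nothing... win!

# Compute the discounted return at every timestep with one backward sweep.
returns = [0.0] * len(rewards)
running = 0.0
for t in reversed(range(len(rewards))):
    running = rewards[t] + GAMMA * running
    returns[t] = running

print(round(returns[299], 3))  # the winning move gets full credit: 1.0
print(round(returns[0], 3))    # the opening move gets 0.99**299, about 0.05
```

The opening move's share of the credit is roughly twenty times smaller than the final move's, even though it may have mattered just as much. That is the nightmare in miniature.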
Let me try to bring this point to life by
talking about school.
In school, when we take an exam, we hand it
in, and the teacher gives us feedback for
every single one of our solutions and tells
us whether we were correct or not.
We know exactly where we did well, and what
we need to practice to do better next time.
Clear, simple, easy.
Solaris, on the other hand, not so much!
If this were a school project, the Solaris
game would be a brutal, merciless teacher.
Would you like to know your grade?
No grades, but he tells you that you failed.
Well, that’s weird, okay.
Where did we fail?
He won’t say.
What should we do better next time to improve?
You’ll figure it out bucko!
Also, we wrote this exam 10 weeks ago, why
do we only get to know about the results now?
No answer.
I think in this case, we can conclude that
this would be a challenging learning environment
even for a motivated human, so just imagine
how hard it is for an AI!
Hopefully this puts into perspective how incredible
it is that Agent57 performs well on this game.
It truly looks like science fiction.
So what does Agent57 add to this? It was
given something called a meta-controller
that can decide when to prioritize short-
and long-term planning.
In the short term, we typically have mechanical
challenges, like avoiding a skull in Montezuma's
Revenge or dodging the shots of an enemy ship
in Solaris.
The long-term part is also necessary to explore
new parts of the game, and to form a good
strategic plan to eventually win it.
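A toy version of such a meta-controller can be sketched as a bandit problem. To be clear, this is a simplification with made-up payoff numbers, not Agent57's actual algorithm: the idea shown is only that a controller can learn, episode by episode, which kind of policy is paying off in the current game.

```python
import math
import random

random.seed(0)

# Two hypothetical policies the meta-controller can hand the episode to.
policies = ["short_term", "long_term"]
counts = {p: 0 for p in policies}
totals = {p: 0.0 for p in policies}

def hypothetical_score(policy):
    """Made-up episode scores: pretend long-term play pays off more here."""
    mean = 1.0 if policy == "long_term" else 0.4
    return random.gauss(mean, 0.1)

for t in range(1, 501):
    def ucb(p):
        # Average score plus an uncertainty bonus, so rarely-tried
        # policies still get a chance (upper confidence bound rule).
        if counts[p] == 0:
            return float("inf")
        return totals[p] / counts[p] + math.sqrt(2 * math.log(t) / counts[p])
    choice = max(policies, key=ucb)
    counts[choice] += 1
    totals[choice] += hypothetical_score(choice)

# The controller quickly concentrates on the policy that wins this game...
print(counts["long_term"] > counts["short_term"])  # prints True
```

In a different game, where quick mechanical reflexes dominate, the same controller would drift the other way, which is exactly the kind of per-game prioritization described above.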
This is great because this new technique can
now deal with the brutal and merciless teacher
who we just introduced.
Alternatively, this agent can be thought of
as someone who is motivated to explore the
game and do well at mechanical tasks at the
same time, and who can also prioritize between
these tasks.
With this, for the first time, scientists
at DeepMind found a learning algorithm that
exceeds human performance on all 57 Atari
games.
And please, do not forget that DeepMind's
goal is to solve general intelligence, and
then use general intelligence to solve
everything else.
This is their holy grail.
In other words, they are seeking an algorithm
that can learn by itself and achieve human-like
performance on a wide variety of tasks.
There is still plenty to do, but, we are now
one step closer to that.
If you learn only one thing from this video,
let it be the fact that there are not 57 different
methods, but one general learning algorithm
that plays 57 games better than humans.
What a time to be alive!
I would like to show you a short message from
a few days ago that melted my heart.
I got this from Nathan, who has been inspired
by these incredible works and decided to
turn his life around and go back to study
more.
I love my job, and reading messages like this
is one of the absolute best parts of it.
Congratulations, Nathan! And note that you can
take this inspiration with you: greatness can
materialize in every aspect of life, not only
in computer graphics or machine learning research.
Good luck!
If you're a researcher or a startup looking
for cheap GPU compute to run these algorithms,
check out Lambda GPU Cloud.
I've talked about Lambda's GPU workstations
in other videos and am happy to tell you that
they're offering GPU cloud services as well.
The Lambda GPU Cloud can train ImageNet to
93% accuracy for less than $19! Lambda's web-based
IDE lets you easily access your instance right
in your browser.
And finally, hold on to your papers, because
the Lambda GPU Cloud costs less than half
of AWS and Azure.
Make sure to go to lambdalabs.com/papers and
sign up for one of their amazing GPU instances
today.
Our thanks to Lambda for helping us make better
videos for you.
Thanks for watching and for your generous
support, and I'll see you next time!
