Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
Today we are going to talk about PlaNet, a
technique designed to solve challenging
image-based planning tasks with sparse rewards.
Ok, that sounds great, but what do all of
these terms mean?
The planning part is simple: it means that
the AI has to come up with a sequence of actions
to achieve a goal, like balancing a pole on
a cart, teaching a virtual human or a cheetah
to walk, or hitting this box the right way
to make sure it keeps rotating.
The image-based part is big - this means that
the AI has to learn the same way as a human,
and that is, by looking at the pixels of the
images.
This is a huge difficulty bump because the
AI not only has to learn to defeat the
game itself, but also has to build an understanding
of the visual concepts within the game.
DeepMind’s legendary Deep Q-Learning algorithm
was also able to learn from pixel inputs,
but it was mighty inefficient at doing so.
And no wonder: this problem formulation is
immensely hard, and it is a miracle that we
can muster any solution at all that can figure
it out.
The sparse reward part means that we rarely
get feedback as to how well we are doing at
these tasks, which is a nightmare situation
for any learning algorithm.
A key difference between this technique and
classical reinforcement learning, which is
what most researchers reach for to solve
similar tasks, is that this one learns a model
of the environment and uses it for planning.
This means that it does not learn every new
task from scratch, but after the first game,
whichever it may be, it will have a rudimentary
understanding of gravity and dynamics, and
it will be able to reuse this knowledge in
the next games.
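To give a feel for what "using models for planning" means, here is a minimal sketch of one common model-based planning scheme, the cross-entropy method, which PlaNet itself applies in its learned latent space. The `dynamics` and `reward` functions below are toy stand-ins for illustration, not the paper's actual learned recurrent state-space model:

```python
import numpy as np

def plan_with_model(dynamics, reward, state, horizon=12, candidates=1000,
                    iterations=10, top_k=100, action_dim=1, rng=None):
    """Cross-entropy method (CEM) planner over a dynamics model.

    Samples candidate action sequences, scores them by rolling them
    forward through the model, and refits a Gaussian to the best ones.
    """
    rng = np.random.default_rng(rng)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        # Sample candidate action sequences around the current belief.
        actions = rng.normal(mean, std, size=(candidates, horizon, action_dim))
        returns = np.zeros(candidates)
        for i in range(candidates):
            s = state
            for t in range(horizon):
                s = dynamics(s, actions[i, t])   # imagined rollout, no env needed
                returns[i] += reward(s)
        # Refit the sampling distribution to the elite sequences.
        elite = actions[np.argsort(returns)[-top_k:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]  # execute only the first planned action, then replan

# Toy stand-in for a learned model: 1-D point mass, reward peaks at x = 1.
dynamics = lambda s, a: s + 0.1 * a
reward = lambda s: -((s - 1.0) ** 2).sum()

action = plan_with_model(dynamics, reward, state=np.array([0.0]), rng=0)
```

Because all rollouts happen inside the model rather than the real environment, the agent can evaluate thousands of imagined futures per decision, which is where the data efficiency comes from.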
As a result, it gets a head start when
learning a new game and is therefore often
50 times more data-efficient than the previous
technique that learns from scratch. And not
only that, it has other really cool advantages
as well, which I will tell you about in just
a moment.
Here you can see that indeed, the blue lines
significantly outperform the previous techniques
shown with red and green for each of these
tasks.
I like how this plot is organized in the same
grid as the tasks, as it makes it much
more readable when juxtaposed with the video
footage.
As promised, here are the two really cool
additional advantages of this model-based
agent.
The first is that we don’t have to train
six separate AIs for all of these tasks, but
finally, we can get one AI that is able to
solve all six of these tasks efficiently.
And second, it can look at as little as five
frames of an animation, which is approximately
one fifth of a second worth of footage. That
is barely anything, and yet it is able to predict
how the sequence continues with remarkably
high accuracy over a long time horizon,
which is quite a challenge.
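Conceptually, that long-horizon prediction works by encoding the observed frames into a compact latent state and then rolling that state forward without looking at any further frames. Here is a toy sketch where hand-picked linear maps stand in for the learned encoder, latent dynamics, and decoder:

```python
import numpy as np

# Toy stand-in for PlaNet-style open-loop video prediction. The real
# encoder, dynamics, and decoder are learned neural networks; these
# linear maps are made up purely for illustration.
encode = lambda frame: frame.mean()        # frame -> latent scalar
step = lambda z: 0.9 * z + 0.1             # latent transition model
decode = lambda z: np.full((4, 4), z)      # latent -> predicted frame

# Five observed context frames (about a fifth of a second of footage).
observed = [np.full((4, 4), v) for v in (0.0, 0.2, 0.4, 0.6, 0.8)]

z = encode(observed[-1])                   # condition on observed context
predictions = []
for _ in range(50):                        # predict far beyond the clip
    z = step(z)                            # roll latent state forward blindly
    predictions.append(decode(z))          # decode a predicted frame
```

The key point is that after the context frames, the loop never sees another observation: all fifty predicted frames come from the latent state alone.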
This is an excellent paper with beautiful
mathematical formulations, I recommend that
you have a look in the video description.
The source code is also available free of
charge for everyone, so I bet this will be
an exciting direction for future research
works, and I’ll be here to report on it
to you.
Make sure to subscribe and hit the bell icon
to not miss future episodes.
Thanks for watching and for your generous
support, and I'll see you next time!
