Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
We have talked about some awesome previous
works where we used learning algorithms to
teach digital creatures to navigate in complex
environments.
The input is a terrain and a set of joints,
feet, and movement types, and the output has
to be a series of motions that maximizes some
kind of reward.
This previous technique borrowed smaller snippets
of movements from a pre-existing database of motions
and learned to stitch them together in a way that
looks natural.
And as you can see, these results are phenomenal.
And the selling point of this new technique, whose
results you might say look less elaborate, is that
it synthesizes these motions from scratch.
This problem is typically solved via reinforcement
learning, which is a technique that comes
up with a series of decisions to maximize
a prescribed score.
This score typically needs to be something
reasonably complex, otherwise the algorithm
is given too much freedom to maximize it.
For instance, we may want to teach a digital
character to run or jump hurdles, but it may
start crawling instead, a behavior that still
satisfies the objective perfectly if that objective
is too simple, for instance, just maximizing the
distance from the starting point.
To alleviate this, we typically resort to
reward engineering, which means that we add
additional terms to this reward function to
regularize the behavior of these creatures.
For instance, we can specify that throughout
these motions, the body has to remain upright,
which likely favors locomotion-type solutions.
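To make this more concrete, here is a minimal sketch
of what such an engineered reward could look like, in
Python; the function name, the specific terms, and the
weights are hypothetical illustrations, not taken from
any particular paper.

```python
def shaped_reward(x_before, x_after, torso_pitch, max_tilt=0.5):
    # Hypothetical engineered reward: forward progress plus a
    # bonus for keeping the torso upright, nudging the agent
    # toward locomotion instead of crawling.
    progress = x_after - x_before  # distance covered this step
    upright_bonus = 0.1 if abs(torso_pitch) < max_tilt else 0.0
    return progress + upright_bonus
```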
However, one of the main advantages of machine
learning is that we can reuse our solutions
for a large set of problems.
If we have to specialize our algorithm for
all terrain and motion types, and for different
kinds of games, we lose out on one of the
biggest advantages of learning techniques.
So researchers at DeepMind decided that they
are going to solve this problem with a reward
function which is nothing else but forward
progress.
That's it.
The further we get, the higher score we obtain.
This is amazing because it doesn't require
any specialized reward function but at the
same time, there are a ton of different solutions
that get us far in these terrains.
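In code, the entire reward could be as simple as the
following sketch; the variable names are, of course,
my own.

```python
def forward_progress_reward(x_before, x_after):
    # The whole reward: how far forward the agent moved
    # this step. Every extra shaping term from the
    # engineered version above is gone.
    return x_after - x_before
```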
And as you see here, beyond bipeds, a bunch
of different agent types are supported.
The key to making this happen is to apply two
modifications to the original reinforcement
learning algorithm.
One makes the learning process more robust
and less dependent on what parameters we choose,
and the other one makes it more scalable,
which means that it is able to efficiently
deal with larger problems.
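To give a flavor of what a robustness-oriented
modification can look like, here is a minimal NumPy
sketch of a clipped surrogate loss in the style of
proximal policy optimization, one well-known way to
make policy updates less sensitive to the chosen step
size; the actual modifications in the paper may differ
in their details.

```python
import numpy as np

def clipped_surrogate_loss(ratio, advantage, eps=0.2):
    # ratio: new_policy_prob / old_policy_prob per sampled action
    # advantage: how much better the action was than expected
    # Clipping the ratio caps how far a single update can move
    # the policy, keeping learning stable across a wide range
    # of step sizes.
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))
```

Scalability, in turn, typically comes from collecting
experience with many simulated agents in parallel,
though again, the specifics vary from method to method.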
Furthermore, the training process itself happens
on a rich, carefully selected set of challenging
levels.
Make sure to have a look at the paper for
details.
A byproduct of this kind of problem formulation
is, as you can see, that even though this
humanoid does its job well with its lower body,
in the meantime, it is flailing its arms like
a madman.
The reason is likely because there is not
much of a difference in the reward between
different arm motions.
This means that we most likely get through
a maze or a heightfield even when flailing,
so the algorithm doesn't have any reason
to favor more natural-looking movements for
the upper body.
It will probably choose a random one, which
is highly unlikely to be a natural motion.
This creates high quality, albeit amusing
results that I am sure some residents of the
internet will honor with a sped-up remix video
with some Benny Hill music.
In summary, no precomputed motion database,
no handcrafting of rewards, and no additional
wizardry needed.
Everything is learned from scratch with a
few small modifications to the reinforcement
learning algorithm.
Highly remarkable work.
If you've enjoyed this episode and would like
to help us and support the series, have a
look at our Patreon page.
Details and cool perks are available in the
video description, or just click the letter
P at the end of this video.
Thanks for watching and for your generous
support, and I'll see you next time!
