Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
A few years ago, scientists at DeepMind published a learning algorithm they called deep reinforcement learning, which quickly took the world by storm.
This technique combines a neural network that processes the visual data we see on the screen with a reinforcement learner that makes the gameplay-related decisions, and it proved able to reach superhuman performance on computer games like Atari Breakout.
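To make these two ingredients a bit more concrete, here is a minimal sketch of the reinforcement learning half of the idea, with a simple value table standing in for the deep network and a tiny made-up chain environment instead of Atari pixels. Everything here is illustrative, not DeepMind's actual implementation.

```python
import random

# A toy stand-in for the idea: a value estimator (here a plain table
# instead of a deep network) plus a reinforcement learner that picks
# actions to maximize expected reward. Environment: a 5-state chain
# where repeatedly moving right eventually yields a reward of +1.
N_STATES, ACTIONS = 5, [0, 1]  # 0 = left, 1 = right

def step(state, action):
    """Transition: reaching the rightmost state gives reward 1 and ends."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True   # goal reached
    return next_state, 0.0, False

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value table
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update (temporal-difference rule).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train()
# After training, the greedy policy should prefer "right" in every state.
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]
print(policy)
```

The deep versions of these agents replace the table with a neural network that maps raw pixels to those same action values, but the learning loop has the same shape.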
This paper not only sparked quite a bit of mainstream media interest, but also provided fertile ground for new follow-up research works to emerge.
For instance, one of these follow-up papers infused these agents with a very human-like quality, curiosity, further improving many aspects of the original learning method. However, it had a disadvantage: I kid you not, the agent got addicted to the TV and kept staring at it forever.
This was perhaps a little too human-like.
In any case, you may rest assured that this shortcoming has been remedied since, and every follow-up paper recorded its scores on a set of Atari games.
Measuring and comparing is an important part of research and is absolutely necessary so we can evaluate new learning methods objectively.
It's like recording an athlete's time in the Olympic 100-meter dash.
In that case, it's quite easy to decide which athlete is the best.
However, this is not so easy in AI research.
In this paper, scientists at DeepMind note
that just recording the scores doesn’t give
us enough information anymore.
There’s so much more to reinforcement learning
algorithms than just scores.
So, they built a behavior suite that also evaluates the seven core capabilities of reinforcement learning algorithms.
Among these seven core capabilities, they list generalization, which tells us how well the agent is expected to do in previously unseen environments, and credit assignment, a prominent problem in reinforcement learning.
Credit assignment is very tricky to solve
because, for instance, when we play a strategy
game, we need to make a long sequence of strategic
decisions, and in the end, if we lose an hour
later, we have to figure out which one of
these many, many decisions led to our loss.
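One common way learning algorithms spread credit, or blame, back over such a long sequence is to discount rewards over time, so each decision is credited with all future rewards, weighted down the further away they are. A tiny sketch with made-up numbers, not the specific method of any paper mentioned here:

```python
def discounted_returns(rewards, gamma=0.9):
    """Return-to-go for each time step: credit each decision with
    all future rewards, geometrically discounted by gamma."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

# A made-up game: no reward for a while, then a final loss of -1.
rewards = [0.0, 0.0, 0.0, -1.0]
print(discounted_returns(rewards))
# Decisions closer to the loss receive a larger share of the blame;
# earlier decisions receive a smaller, discounted share.
```

This is only the simplest flavor of credit assignment; the hard part, which the suite measures, is doing it well when the delay spans thousands of decisions.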
Measuring this as one of the core capabilities was, in my opinion, a great design decision here.
How well the algorithm scales to larger problems
also gets a spot as one of these core capabilities.
I hope this testing suite will see widespread adoption in reinforcement learning research. What I am really looking forward to is seeing these radar plots for newer algorithms: they will quickly reveal whether we have a new method that makes a different tradeoff than previous methods, or in other words, has the same area within the polygon but a different shape, or whether, in the case of a real breakthrough, the area of these polygons will start to increase.
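The polygon-area intuition is easy to make concrete: on a radar chart with equally spaced axes, the enclosed area follows from the shoelace formula. A small sketch with made-up capability scores, showing how two agents with the same total score can trace polygons of different shape:

```python
import math

def radar_polygon_area(scores):
    """Area of the polygon traced by scores on a radar chart with
    equally spaced axes (shoelace formula in Cartesian coordinates)."""
    n = len(scores)
    points = [(s * math.cos(2 * math.pi * k / n),
               s * math.sin(2 * math.pi * k / n))
              for k, s in enumerate(scores)]
    area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Two hypothetical agents over seven capabilities, same total score:
spiky = [0.9, 0.2, 0.9, 0.2, 0.9, 0.2, 0.9]  # strong tradeoffs
balanced = [0.6] * 7                          # even profile
print(radar_polygon_area(spiky), radar_polygon_area(balanced))
```

Note that the shape matters too: with these made-up numbers, the balanced profile actually encloses more area than the spiky one despite the identical score total.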
Luckily, a few of these charts are already available in the paper, and they give us so much information about these methods that I could stare at them all day long. I cannot wait to see some newer methods appear here.
Now, note that there is a lot more to this paper. If you have a look at it in the video description, you will also find the experiments that are part of this suite, what makes a good environment to test these agents in, and the authors' plan to form a committee of prominent researchers to periodically review it.
I loved that part.
If you enjoyed this video, please consider
supporting us on Patreon.
If you do, we can offer you early access to
these videos so you can watch them before
anyone else, or, you can also get your name
immortalized in the video description.
Just click the link in the description if
you wish to chip in.
Thanks for watching and for your generous
support, and I'll see you next time!
