Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
This footage that you see here came freshly
from Google DeepMind's lab, and is about benchmarking
reinforcement learning algorithms.
Here, you see the classical cartpole swing-up
task from this package.
As the algorithm starts to play, a score is
recorded that indicates how well it is doing,
and the learner has to choose the appropriate
actions depending on the state of the environment
to maximize this score.
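That interaction loop, where the learner picks an action from the current state and racks up a score, can be sketched in a few lines of Python. The `ToyEnv` class and its faked physics here are hypothetical stand-ins for illustration, not the suite's actual API:

```python
import random

class ToyEnv:
    """Hypothetical stand-in for a continuous control task.
    State is a single pole angle in radians; reward is 1.0 when
    the pole is near upright, 0.0 otherwise."""

    def __init__(self):
        self.angle = 3.14  # start hanging straight down

    def reset(self):
        self.angle = 3.14
        return self.angle

    def step(self, action):
        # A continuous action nudges the angle; the physics is faked here.
        self.angle = max(0.0, self.angle - abs(action) * 0.1)
        reward = 1.0 if self.angle < 0.2 else 0.0
        return self.angle, reward

def run_episode(env, policy, steps=100):
    state = env.reset()
    total_score = 0.0
    for _ in range(steps):
        action = policy(state)           # learner chooses an action from the state
        state, reward = env.step(action)
        total_score += reward            # the score the learner tries to maximize
    return total_score

env = ToyEnv()
score = run_episode(env, policy=lambda s: random.uniform(-1.0, 1.0))
```

A smarter policy than the random one above would swing the pole up faster and collect a higher score, which is exactly what the benchmarked algorithms compete on.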
Reinforcement learning is an established research
subfield within machine learning with hundreds
of papers appearing every year.
However, we see that most of them cherry-pick
a few problems and test against previous works
on this very particular selection of tasks.
This paper describes a package that is not
about the algorithm itself, but about helping
future research projects to be able to test
their results against previous works on an
equal footing.
This is a great idea, and one that OpenAI
addressed earlier with their learning environment,
Gym.
So the first question is, why do we need a
new one?
The DeepMind Control Suite provides a few
differentiating features.
One, Gym contains both discrete and continuous
tasks, whereas this one concentrates on continuous
problems only.
This means that state, time and action are
all continuous, which is usually the hallmark
of more challenging and life-like problems.
For an algorithm to do well, it has to be
able to learn the concept of velocity, acceleration
and other meaningful physical concepts and
understand their evolution over time.
Two, there are domains where the new control
suite is a superset of Gym, meaning that it
offers equivalent tasks, and then some more.
And three, the action and reward structures
are standardized.
This means that the results and learning curves
are much more informative and easier to read.
This is crucial because research scientists
read hundreds of papers every year, and with
standardized structures they don't necessarily
have to look at videos; they immediately get
an intuition of how an algorithm performs and
how it relates to previous techniques just by
looking at the learning curve plots.
Many tasks also include a much more challenging
variant with sparser rewards.
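The standardized reward idea, and the difference between dense and sparse variants, can be illustrated with a small tolerance-style function. This is a hedged sketch of the concept only; the function name and Gaussian falloff are illustrative choices, not the suite's actual implementation:

```python
import math

def tolerance(x, lower, upper, margin=0.0):
    """Reward in [0, 1]: exactly 1 inside the target interval.
    With margin > 0 the reward decays smoothly outside the
    interval (a dense signal); with margin == 0 it drops
    straight to 0 (a sparse signal)."""
    if lower <= x <= upper:
        return 1.0
    if margin == 0.0:
        return 0.0
    distance = (lower - x) if x < lower else (x - upper)
    # Gaussian-style falloff controlled by the margin.
    return math.exp(-0.5 * (distance / margin) ** 2)

# Dense variant: still provides a learning signal near the target...
dense = tolerance(0.5, lower=0.9, upper=1.0, margin=0.5)
# ...while the sparse variant gives nothing until the target is reached.
sparse = tolerance(0.5, lower=0.9, upper=1.0, margin=0.0)
```

Because every per-step reward lives in the unit interval, the total score of an episode is bounded by its length, which is what makes learning curves from different tasks directly comparable.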
We discussed sparse rewards in a bit more
detail in the previous episode; if you are
interested, make sure to click the card in
the lower right at the end of this video.
The paper also contains an exciting roadmap
for future development, including quadruped
locomotion, multithreaded dynamics and more.
Of course, the whole suite is available, free
of charge for everyone.
The link is available in the description.
Super excited to see a deluge of upcoming
AI papers and see how they beat the living
hell out of each other in 2018.
Thanks for watching and for your generous
support, and I'll see you next time!
