Agent 57, come in, Agent 57! When you need a job done, you can always count on Agent 57 and his meta-controller to handle anything thrown his way. Agent 57 doesn't produce the best results in every area, but he's better than the humans.
So I get a lot of questions, and the most common one revolves around artificial general intelligence: what it means, and what it means for the future of artificial intelligence. I like to stay away from artificial general intelligence, because there's so much more we can focus on today and in the near future that'll increase our efficiency, our effectiveness, and our return on value. I think those are a lot more important to focus on in these videos for you guys.
But there are some monumental breakthroughs happening in artificial general intelligence, and this video will talk about one of them: Agent 57. The odds are that you haven't heard of Agent 57 before, and its triumphs weren't nearly as trumpeted as Garry Kasparov's, Lee Sedol's, and Team Liquid's falls to Agent 57's predecessors. Agent 57 is the latest advancement towards artificial general intelligence. It's the best Atari player, and it just broke a significant barrier: it's able to outperform humans in every one of the 57 games in the near-decade-old Atari benchmark.
You may have heard my spiel before about how games are great for training artificial intelligence, since they represent challenges similar to what AI would face in the real world, and Agent 57's case is unique in this area. In one game, you can train an AI to do one thing very, very well, because it only has to do that one thing in its niche: AlphaStar was excellent at StarCraft II but not very good at much else.
In this case, the Atari benchmark is an example of our pursuit of artificial general intelligence, because it forces the AI to learn and perform in all these diverse cases. Artificial general intelligence, or AGI, is different from AI (sometimes called narrow AI in comparison) in that its experience and learning extend beyond just one niche, similar to how humans learn and perform. When should AI explore? When should it exploit? What memory should it carry over into other domains? And how can it develop high-level, sophisticated behavioral strategies?
Have you ever played Yars' Revenge? Look at this game. What is going on here? How the heck is anything or anyone supposed to know what's happening, let alone excel at this? Well, Agent 57 knocks Yars' Revenge out of the park. Agent 57 displays better-than-human performance and problem solving, and I'll say this again: in every game of this benchmark, even the tricky ones like Solaris, Montezuma's Revenge, and Pitfall. Oh, Pitfall, man, that's tricky.
By developing and training Agent 57, DeepMind has taken one large step towards artificial general intelligence, even though it may seem small. AI has already defeated human champs at Dota 2 and StarCraft II, two of the most complex games today; check out those videos above. They aren't more "general" (and I'm using the term in air quotes here) than Agent 57, perhaps, but they are super powerful and groundbreaking achievements. Agent 57 can see more generally because it stands on the shoulders of its metaphorical giant silicon-wafer ancestors. Agent 57 has made a few significant tweaks to enable this outstanding general performance.
Memory seems like something that would be useful in a game more complicated than just batting a ball around, right? Suppose that game, or real life for that matter, required you to walk into rooms and remember how much fruit was in each room. Without memory, your performance would be no better than random guessing. Agent 57 comes equipped with recurrent neural networks, which form a basic memory, and with off-policy learning, which is a little more abstract.
So imagine that you need to know the size of any oranges you saw, but the weight of any apples. This is an example that's easy in hindsight but an immensely intricate problem to solve if the AI doesn't know explicitly what these rules are.
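To make the memory idea concrete, here's a tiny Python sketch of the fruit-in-rooms example. This is my own toy illustration, not DeepMind's code: the task, the agents, and all the names are hypothetical, and a plain list stands in for an RNN's hidden state.

```python
# Toy illustration: why carrying state between steps matters.

def memoryless_agent(observation):
    # Sees only the current room; everything earlier is gone, so it can
    # only guess about past rooms.
    return 0

class RecurrentAgent:
    """Carries a hidden state between steps, like an RNN's hidden vector."""
    def __init__(self):
        self.hidden = []  # stand-in for the recurrent hidden state

    def step(self, observation):
        self.hidden.append(observation)  # fold the observation into memory
        return observation

    def recall(self, room_index):
        return self.hidden[room_index]

rooms = [3, 7, 2]  # fruit counts seen while walking through three rooms
agent = RecurrentAgent()
for fruit in rooms:
    agent.step(fruit)

print(agent.recall(1))      # → 7: it remembers what was in the second room
print(memoryless_agent(2))  # → 0: the memoryless agent can only guess
```

The point of the sketch: once the answer depends on something you saw earlier, a purely reactive policy is stuck, and some recurrent state is the minimum fix.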
DeepMind also created distributed reinforcement learning schemes, where data collection has been decoupled from the learning process: actors feed data into a memory bank, and the learners can simply sample that data, just like a human playing football can learn from memories or video footage of the last games they played, and potentially even from other humans.
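That decoupling can be sketched in a few lines of Python. This is a toy illustration of the idea, not DeepMind's actual system: the actors only write experience into a shared buffer, the learner only reads from it, and the fake transitions and function names are my own.

```python
import random
from collections import deque

replay_buffer = deque(maxlen=10_000)  # the shared "memory bank"

def actor(env_steps, actor_id):
    """Plays the game and only stores transitions; it never learns."""
    for t in range(env_steps):
        transition = (actor_id, t, random.random())  # (who, step, fake reward)
        replay_buffer.append(transition)

def learner(batch_size):
    """Samples past experience, possibly generated by *other* actors,
    just as a player can learn from someone else's game footage."""
    return random.sample(list(replay_buffer), batch_size)

for i in range(4):                  # four actors fill the buffer in turn
    actor(env_steps=25, actor_id=i)

batch = learner(batch_size=8)
print(len(replay_buffer))           # → 100 stored transitions
print(len(batch))                   # → 8 sampled for one learning step
```

Because learners sample from stored experience rather than only the latest play, many actors can run in parallel and nothing they see goes to waste.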
This helps the AI minimize losses. One of the coolest tweaks DeepMind instilled in Agent 57 was its ability, and intrinsic motivation, for exploration, and its balance with exploitation. Agent 57 is rewarded for finding novel approaches amid high dimensionality, and it can even somewhat adapt this on the fly without adjusting the model's parameters.
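Here's a hedged sketch of what "rewarded for finding novel approaches" can look like: a novelty bonus added to the game's own reward, scaled by a knob beta. Agent 57 uses a learned novelty measure; the simple visit-counting below is just a stand-in I'm using for illustration.

```python
import math
from collections import Counter

visit_counts = Counter()

def total_reward(state, extrinsic_reward, beta):
    """Game reward plus an intrinsic bonus that shrinks with familiarity."""
    visit_counts[state] += 1
    intrinsic_bonus = 1.0 / math.sqrt(visit_counts[state])  # novel → big bonus
    return extrinsic_reward + beta * intrinsic_bonus

# First visit to a state pays a full bonus; repeats pay less and less.
print(total_reward("room_A", 0.0, beta=0.5))  # → 0.5
print(total_reward("room_A", 0.0, beta=0.5))  # → ~0.354, already less novel
print(total_reward("room_B", 0.0, beta=0.5))  # → 0.5 again: new state
```

With beta at zero the agent is a pure exploiter; crank beta up and it chases novelty even when the game pays nothing, which is exactly what sparse-reward games like Pitfall demand.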
That's pretty cool. And finally, the pièce de résistance: Agent 57's overarching meta-controller. Agent 57 knows how to adjust its strategy, or policy, to shift focus between exploration and exploitation. This is what helps Agent 57 surpass humans on the easy games as well as on those hard games, that bottom five percent we talked about. Agent 57 can choose the policy trade-offs between near- and long-term performance, and between exploring and exploiting. The meta-controller is instrumental in choosing which policies, or strategies, to pursue and when.
Also, interestingly, the term meta-controller has a dual meaning in this case, right? By controlling these policies, it's also controlling the actions chosen by the agent, and therefore the data the agent learns from. Meta-learning. Whoa, meta, man.
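One common way to get this kind of meta-control, which I'll sketch here as a toy, is a multi-armed bandit: treat each candidate policy (here, an exploration strength beta) as an arm, and keep picking whichever arm has paid off best so far. This is not Agent 57's actual code, and the game returns below are made up; it just shows the mechanism of choosing *which policy to run* rather than which action to take.

```python
import random

random.seed(0)

arms = [0.0, 0.3, 0.9]            # candidate exploration strengths (betas)
counts = [0, 0, 0]
mean_return = [0.0, 0.0, 0.0]

def fake_episode_return(beta):
    # Hypothetical game where a middling amount of exploration pays best.
    return 1.0 - abs(beta - 0.3) + random.gauss(0, 0.05)

def choose_arm(epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(len(arms))  # occasionally re-test an arm
    return max(range(len(arms)), key=lambda i: mean_return[i])

for _ in range(300):
    i = choose_arm()
    r = fake_episode_return(arms[i])
    counts[i] += 1
    mean_return[i] += (r - mean_return[i]) / counts[i]  # running average

best = arms[max(range(len(arms)), key=lambda i: mean_return[i])]
print(best)  # the bandit settles on beta = 0.3, the most profitable policy
```

Notice the double control at work: by picking the policy, the bandit also shapes which actions get taken, and therefore which data fills the memory bank for learning.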
So I'll leave you with some sweet moves and reaction times from Agent 57.
