Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
This one is going to be huge, certainly one
of my favorites.
This work is a combination of several techniques
that we have talked about earlier. If you
don't know some of these terms, it's perfectly
okay, you can remedy this by clicking on the
popups or checking the description box, but
you'll get the idea even watching only this
episode.
So, first, we have a convolutional neural
network - this helps process images and
understand what is depicted in an image.
And a reinforcement learning algorithm - this
helps create strategies, or, to be more exact,
it decides what our next action should
be: which buttons to push on the joystick.
So, this technique mixes these two concepts
together, and we call it Deep Q-learning.
It is able to learn to play games the
same way a human would - it is not exposed
to any additional information in the code;
all it sees is the screen and the current
score.
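The core idea can be sketched in a few lines of code. This is a minimal tabular Q-learning sketch, not DeepMind's actual method - their Deep Q-Network replaces the lookup table with a convolutional neural network reading raw screen pixels, and adds tricks like experience replay. The states, actions, and reward values below are hypothetical stand-ins for illustration.

```python
import random

ALPHA = 0.1    # learning rate
GAMMA = 0.99   # discount factor for future rewards
EPSILON = 0.1  # exploration probability

ACTIONS = ["left", "stay", "right"]  # hypothetical joystick actions
q_table = {}  # maps (state, action) -> estimated long-term score


def q(state, action):
    # Unseen state-action pairs start with an estimate of zero.
    return q_table.get((state, action), 0.0)


def choose_action(state):
    # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(state, a))


def update(state, action, reward, next_state):
    # Core Q-learning update: nudge the estimate toward the observed
    # reward plus the discounted value of the best next action.
    best_next = max(q(next_state, a) for a in ACTIONS)
    q_table[(state, action)] = q(state, action) + ALPHA * (
        reward + GAMMA * best_next - q(state, action)
    )


# One hypothetical transition: moving right returned the ball for a point.
update("ball_incoming", "right", reward=1.0, next_state="ball_returned")
print(q("ball_incoming", "right"))  # the estimate moves from 0.0 toward 1.0
```

Everything the agent learns lives in that update rule: it only ever sees states, the actions it tried, and the score change - exactly the "screen plus score" setup described above.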
When it starts learning to play an old game,
Atari Breakout, at first the algorithm loses
all of its lives without any sign of intelligent
action.
If we wait a bit, it becomes better at playing
the game, roughly matching the skill level
of an adept player.
But here's the catch, if we wait for longer,
we get something absolutely spectacular.
It finds out that the best way to win the
game is to dig a tunnel through the bricks
and hit them from behind. I really didn't
know this, and this is an incredible moment -
I can use my computer, this box next to
me, to create new knowledge, to find
out things I didn't know before. This
is completely absurd; science fiction is not
the future, it is already here.
It also plays many other games - the percentages
show the algorithm's scores relative to a
human player's. Above 70% means that it's
great, and above 100% it's superhuman.
As a follow-up work, scientists at DeepMind
started experimenting with 3D games, and after
a few days of training, it could learn to
drive on ideal racing lines and pass others
with ease. I've had a driving license for
a while now, but I still don't always get
the ideal racing lines right. Bravo.
I have heard the complaint that this is not
real intelligence because it doesn't know
the concept of a ball or what exactly it
is doing. Edsger Dijkstra once said,
"The question of whether machines can think...
is about as relevant as the question of whether
submarines can swim."
Beyond the fact that rigorously defining intelligence
leans more into the domain of philosophy than
science, I'd like to add that I am perfectly
happy with effective algorithms. We use
these techniques to accomplish different tasks,
and they are really good problem solvers.
In the Breakout game, you, as a person, learn
the concept of a ball precisely so you can
use this knowledge as machinery to perform
better. If that weren't the case, then whoever
knows a lot but can't use it to achieve
anything useful would be not an intelligent
being, but an encyclopedia.
What about the future? There are two major
unexplored directions: the algorithm doesn't
have long-term memory, and even if it did,
it couldn't generalize its knowledge to other,
similar tasks. Super exciting directions for
future work.
Thanks for watching and for your generous support, and I'll see you next time!
