Dear Fellow Scholars, this is Two Minute Papers
with Dr. Károly Zsolnai-Fehér.
Neural network-based learning algorithms are
making great leaps in a variety of areas.
And many of us are wondering whether it is
possible that one day we’ll get a learning
algorithm, show it a video, and ask it to
summarize it, and we can then decide whether
we wish to watch it or not? Or just describe
what we are looking for and it would fetch
the appropriate videos for us. I think today’s
paper gives us a good pointer as to whether we can expect
this to happen, and in a few moments, we’ll
find out together why.
A few years ago, these neural networks were
mainly used for image classification, or in
other words, they would tell us what kinds
of objects are present in an image. But they
are capable of so much more, for instance,
these days, we can get a recurrent neural
network to write proper sentences about images,
and it works well even for highly non-trivial
cases. For instance, it is able to infer that
work is being done here, or that a ball is
present in this image even if the vast majority
of the ball itself is concealed. The even
crazier thing about this is that this work
is not recent at all, this is from a paper that
is more than four years old! Insanity.
The first author of this paper was Andrej
Karpathy, one of the best minds in the game
who is currently the director of AI at Tesla
and works on making these cars able to drive
themselves.
So, as amazing as this work was, the progress
in machine learning research keeps on accelerating,
so let’s have a look at this newer paper
that takes it a step further, and looks
not at an image, but at a video, and explains
what happens therein. Very exciting.
Let’s have a look at an example! This was
the input video, and let’s stop right at
the first statement. The red sphere enters
the scene. So, it not only correctly identifies
what we are talking about in terms
of color and shape, but also knows what this
object is doing. That’s a great
start. Let’s proceed further. Now, it correctly
identifies the collision event with the cylinder,
then this cylinder hits another cylinder,
very good… and look at that. It identifies
that the cylinder is made of metal, I like
that a lot, because this particular object
is made of a very reflective material, which
shows us more about the surrounding room than
the object itself.
But we shouldn’t only let the AI tell us
what is going on on its own terms - let’s
ask questions and see if it can answer them
correctly. So, first, let’s ask - what is
the material of the last object that hit the
cyan cylinder? And it correctly finds that
the answer is Metal. Awesome. Now let’s
take it a step further and stop the video
here - can it predict what is about to happen
after this point? Look, it indeed can!
This is remarkable because of two things.
One, if we look under the hood, we see that to
be able to pull this off, it not only has
to understand what objects are present in
the video and predict how they will interact,
but also has to parse our questions correctly,
put it all together, and form an answer based
on all this information. If any of these tasks
works unreliably, the answers will be incorrect.
And two, there are many other techniques that
are able to do some of these tasks, so why
is this one particularly interesting? Well,
look here! This new method is able to do all
of these tasks at the same time.
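To make the question-answering step a little more tangible, here is a minimal sketch of how a question like "what is the material of the last object that hit the cyan cylinder?" could be answered once the objects and collision events have already been extracted from the video. This is not the method from the paper. The names Obj, Collision and last_object_that_hit are hypothetical, and the scene data is made up purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical scene description: in a real system, these would come from
# a perception module that watches the video. Here they are written by hand.
@dataclass(frozen=True)
class Obj:
    color: str
    shape: str
    material: str

@dataclass(frozen=True)
class Collision:
    frame: int  # video frame at which the two objects touch
    a: Obj
    b: Obj

def last_object_that_hit(target, collisions):
    """Return the other object in the latest collision involving target."""
    hits = [c for c in collisions if target in (c.a, c.b)]
    if not hits:
        return None  # the target never collided with anything
    last = max(hits, key=lambda c: c.frame)
    return last.b if last.a == target else last.a

# Made-up example scene, not taken from the paper's dataset.
red_sphere = Obj("red", "sphere", "rubber")
cyan_cylinder = Obj("cyan", "cylinder", "rubber")
gray_cylinder = Obj("gray", "cylinder", "metal")

collisions = [
    Collision(40, red_sphere, cyan_cylinder),
    Collision(75, gray_cylinder, cyan_cylinder),
]

answer = last_object_that_hit(cyan_cylinder, collisions).material
print(answer)  # prints "metal"
```

Note that the hard parts, finding the objects and events in the raw video and turning an English question into a query like this one, are exactly what this sketch assumes away.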
So there we go, if this improves further,
we might become able to search YouTube videos
by just typing something that happens in the
video and it would be able to automatically
find it for us. That would be absolutely amazing.
What a time to be alive!
This episode has been supported by Linode.
Linode is the world’s largest independent
cloud computing provider. Unlike entry-level
hosting services, Linode gives you full backend
access to your server, which is your step
up to powerful, fast, fully configurable cloud
computing. Linode also has One-Click Apps
that streamline your ability to deploy websites,
personal VPNs, game servers, and more. If
you need something as small as a personal
online portfolio, Linode has your back, and
if you need to manage tons of clients’ websites
and reliably serve them to millions of visitors,
Linode can do that too. What’s more, they
offer affordable GPU instances featuring the
Quadro RTX 6000, which is tailor-made for AI,
scientific computing and computer graphics
projects. If only I had access to a tool like
this while I was working on my last few papers!
To receive $20 in credit on your new Linode
account, visit linode.com/papers or click
the link in the description and give it a
try today! Our thanks to Linode for supporting
the series and helping us make better videos
for you.
Thanks for watching and for your generous
support, and I'll see you next time!
