Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
This paper was written by scientists at DeepMind,
and it is about teaching an AI to look at
a 3D scene and decompose it into its individual
elements in a meaningful manner.
This is typically one of those tasks that
is easy for humans and immensely
difficult for machines.
As this decomposition thing still sounds a
little nebulous, let me explain what it means.
Here you see an example scene, and a segmentation
of this scene that the AI came up with, which
shows where it thinks the boundaries
of these individual objects are.
However, we are not stopping there, because
it is also able to “rip out” these objects
from the scene one by one.
So why is this such a big deal?
Well, because of three things.
One, it is a generative model, meaning that
it is able to reorganize these scenes and
create new content that actually makes sense.
Two, it shows that it truly has an understanding
of 3D scenes by demonstrating that it can
deal with occlusions.
For instance, if we ask it to “rip out”
the blue cylinder from this scene, it is able
to reconstruct parts of it that weren’t
even visible in the original scene.
Same with the blue sphere here.
Amazing, isn’t it?
And three, this one is a bombshell: it is
an unsupervised learning technique.
Now, our more seasoned Fellow Scholars fell
out of their chairs hearing this, but just in
case, this means that the algorithm is able
to learn on its own: we still have to feed it
a ton of training data, but this training
data is not labeled.
In other words, it just looks at the videos
with no additional information, and from watching
all this content, it finds out on its own
about the concept of these individual objects.
The main motivation to create such an algorithm
was to have an AI look at some gameplay of
the StarCraft 2 strategy game and be able
to recognize all individual units and the
background without any additional supervision.
I really hope this also means that DeepMind
is working on a version of their StarCraft
2 AI that learns more like a human
does, that is, by looking at
the pixels of the game.
If you look at the details, this seems
almost unfathomably difficult, but it would,
of course, make me unreasonably happy.
What a time to be alive!
If you check out the paper in the video description,
you will find how all this is possible through
a creative combination of an attention network
and a variational autoencoder.
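To give a feel for how an attention network can carve a scene into objects, here is a minimal sketch of the "recurrent stick-breaking" idea, assuming the formulation described in DeepMind's MONet paper: at each step, the network outputs a per-pixel value alpha in [0, 1] that claims a fraction of whatever part of the image is still unexplained (the "scope"), and the last mask takes everything that is left, so all masks sum to 1 at every pixel. The alpha values below are made-up stand-ins for a trained attention network's outputs, the variational autoencoder that reconstructs each masked object is not shown, and turning the masks into a segmentation with a per-pixel argmax is a simplifying assumption on our part.

```python
from typing import List

Image = List[List[float]]  # a single-channel H x W map


def stick_breaking_masks(alphas: List[Image]) -> List[Image]:
    """Turn K-1 per-pixel attention maps (values in [0, 1]) into K masks.

    alphas[k] is a hypothetical stand-in for the attention network's output
    at step k; a real model would compute it from the image and the scope.
    """
    h, w = len(alphas[0]), len(alphas[0][0])
    scope = [[1.0] * w for _ in range(h)]  # s_0: nothing has been explained yet
    masks = []
    for alpha in alphas:
        # m_k = s_{k-1} * alpha_k: this step claims part of the remaining scope
        masks.append([[scope[i][j] * alpha[i][j] for j in range(w)] for i in range(h)])
        # s_k = s_{k-1} * (1 - alpha_k): what is still unexplained afterwards
        scope = [[scope[i][j] * (1.0 - alpha[i][j]) for j in range(w)] for i in range(h)]
    masks.append(scope)  # the final mask takes whatever scope remains
    return masks


def segment(masks: List[Image]) -> List[List[int]]:
    """Assign each pixel to the mask with the highest value (argmax)."""
    h, w = len(masks[0]), len(masks[0][0])
    return [
        [max(range(len(masks)), key=lambda k: masks[k][i][j]) for j in range(w)]
        for i in range(h)
    ]


# A tiny 2 x 2 "scene" with K = 3 components (two attention steps + remainder).
alphas = [
    [[0.9, 0.1], [0.1, 0.1]],  # step 1 mostly claims the top-left pixel
    [[0.5, 0.9], [0.2, 0.0]],  # step 2 mostly claims the top-right pixel
]
masks = stick_breaking_masks(alphas)
print(segment(masks))  # [[0, 1], [2, 2]]
```

Because each step can only take a share of the remaining scope, the masks never overlap by more than the whole image; in the paper's setup, each mask then tells its autoencoder which part of the scene it is responsible for reconstructing.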
This episode has been supported by Backblaze.
Backblaze is an unlimited online backup solution
for only 6 dollars a month, and I have been
using it for years to make sure my personal
data, family pictures and the materials required
to create this series are safe.
You can try it free of charge for 15 days,
and if you don’t like it, you can immediately
cancel it without losing anything.
Make sure to sign up for Backblaze today through
the link in the video description, and this
way, you not only keep your personal data
safe, but you also help support this series.
Thanks for watching and for your generous
support, and I'll see you next time!
