Dear Fellow Scholars, this is Two Minute Papers
with Dr. Károly Zsolnai-Fehér.
In early 2019, a learning-based technique
appeared that could perform common natural
language processing tasks, for instance,
answering questions, completing text, reading
comprehension, summarization, and more.
This method was developed by scientists at
OpenAI, and they called it GPT-2.
A follow-up paper introduced a more capable
version of this technique called GPT-3, and
among many incredible examples, it could generate
website layouts from a written description.
The key idea, in both cases, was that we would
provide it with an incomplete piece of text,
and it would try to finish it.
However, no one said that these neural networks
have to deal only with text.
And sure enough, in this work, scientists
at OpenAI introduced a new version of this
method that tries to complete not text, but…images!
The problem statement is simple: we give it
an incomplete image, and we ask the AI to
fill in the missing pixels.
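Image GPT frames this as the very same sequence-completion problem: flatten the pixels into one long sequence in raster (row-major) order, reveal a prefix of it, and predict every remaining pixel from the pixels before it. The sketch below shows only that framing, with a deliberately trivial stand-in predictor (repeat the previous pixel value); the actual method replaces that stand-in with a large Transformer.

```python
import numpy as np

# Image completion as sequence completion: flatten a tiny grayscale
# image in raster order, keep the top rows as visible context, and
# predict each missing pixel from the pixels that came before it.

def complete_image(img: np.ndarray, visible_rows: int) -> np.ndarray:
    h, w = img.shape
    seq = img.flatten().astype(int).tolist()  # raster-order pixel sequence
    n_visible = visible_rows * w              # pixels the model may see
    out = seq[:n_visible]
    for _ in range(h * w - n_visible):
        # Trivial stand-in "model": predict the previous pixel's value.
        out.append(out[-1])
    return np.array(out).reshape(h, w)

img = np.array([[0, 0, 1, 1],
                [0, 1, 1, 2],
                [1, 1, 2, 2],
                [1, 2, 2, 3]])
completed = complete_image(img, 2)  # model sees only the top two rows
```

Here the visible top half is copied through unchanged and the bottom half is filled in pixel by pixel; swapping the one-line predictor for a learned model that conditions on the whole prefix is, in essence, what Image GPT does.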
That is, of course, an immensely difficult
task, because these images may depict any
part of the world around us.
It would have to know a great deal about our
world to be able to continue the images, so
how well did it do?
Let’s have a look!
This is undoubtedly a cat.
But look!
See that white part that is just starting?
The interesting part has been cut out of the
image.
What could that be?
A piece of paper?
Something else?
Now, let’s leave the dirty work to the machine
and ask it to finish it!
Wow.
A piece of paper indeed, according to the
AI, and it even has text on it.
The text has a heading section and a paragraph
below it too.
Truly excellent.
You know what is even more excellent?
Perhaps the best part.
It also added the indirect illumination on
the fur of the cat, meaning that it sees that
a blue room surrounds it and therefore some
amount of color bleeds onto the fur of the
cat, making it bluer.
I am a light transport researcher by trade,
so I spend the majority of my life calculating
things like this, and I have to say that this
looks quite good to me.
Absolutely amazing attention to detail.
But it had more ideas.
What’s this?
The face of the cat has been finished, quite
well in fact, but the rest, I am not so sure.
If you have an idea what this is supposed
to be, please let me know in the comments.
And here go the rest of the results.
All quite good!
And here is the true image that was concealed
from the algorithm.
This is the reference solution.
Let’s see the next one.
Oh my, scientists at OpenAI pulled no punches
here; this one is also quite nasty.
How many stripes should this continue with?
Zero?
Maybe!
In any case, this solution is not unreasonable.
I appreciate the fact that it continued the
shadows of the humans.
Next one.
Yes, more stripes, great!
But likely a few too many.
Here are the remainder of the solutions, and,
the true reference image again.
Let’s have a look at this water droplet
example too.
We humans know that since we see the remnants
of some ripples over there too, there must
be a splash, but does the AI know?
Oh yes, yes it does!
Amazing!
And the true image.
Now, what about these little creatures?
The first continuation finishes them correctly,
and puts them on a twig.
The second one involves a stone.
The third is my favorite.
Hold on to your papers, and look at this.
They stand in the water and we can even see
their mirror images.
Wow!
The fourth is a branch, and finally, the true
reference image.
This is one of the best results I have seen
from it so far.
Here are some more results, and note that
these are not cherry-picked, or in other words,
there was no selection process for the results,
nothing was discarded; these came out of the
AI exactly as you see them.
There is a link to these and to the paper
in the video description, so make sure to
have a look and let me know in the comments
if you have found something interesting!
So what about the size of the neural network
for this technique?
Well, it contains from 1.5 to about 7 billion
parameters.
Let’s have a look together and find out
what that means.
These are the results from the GPT-2 paper,
the previous version of the text processor
on a challenging reading comprehension test
as a function of the number of parameters.
As you see, at around 1.5 billion parameters,
which is roughly the size of GPT-2, it learned
a great deal, but its understanding was nowhere
near the level of human comprehension.
However, as they grew the network, something
incredible happened.
Non-trivial capabilities started to appear
as we approached a hundred billion parameters.
Look!
It nearly matched the level of humans.
And all this was measured on a nasty reading
comprehension test.
So, this Image GPT has a parameter count that
is closer to GPT-2's than to GPT-3's, so we
can perhaps speculate that the next version
could, potentially, bring another explosion
in capabilities.
I can’t wait to have a look
at that.
What a time
to be alive!
Thanks for watching and for your generous
support, and I'll see you next time!
