Dear Fellow Scholars, this is Two Minute Papers
with Dr. Károly Zsolnai-Fehér.
In the last few years, the pace of progress
in machine learning research has been staggering.
Neural network-based learning algorithms can
now look at an image and describe what is
seen in it, or, even better, do the reverse
and generate images from a written description.
You see here a set of results from BigGAN,
a state-of-the-art image generation technique,
and can marvel at the fact that all of these
images are indeed synthetic.
The GAN part of this technique stands for
Generative Adversarial Network - this
means a pair of neural networks that battle
each other over time to master a task, for
instance, generating realistic-looking images
when given a theme.
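If you would like to see what this battle looks like in code, here is a minimal sketch of the adversarial training loop. Note that the tiny networks, the stand-in data, and the hyperparameters below are my own illustrative placeholders, not BigGAN's actual architecture.

```python
# Minimal sketch of adversarial (GAN) training - illustrative placeholders only.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2

# Generator: maps random noise to fake "data" samples.
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
# Discriminator: outputs the probability that a sample is real.
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(128, data_dim) * 0.5 + 2.0   # stand-in "real" distribution
    noise = torch.randn(128, latent_dim)
    fake = G(noise)

    # The discriminator tries to tell real samples from generated ones.
    d_loss = bce(D(real), torch.ones(128, 1)) + bce(D(fake.detach()), torch.zeros(128, 1))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # The generator tries to fool the discriminator into saying "real".
    g_loss = bce(D(fake), torch.ones(128, 1))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
```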
After that, StyleGAN, and then its second version,
appeared, which, among many other crazy good
features, opened up the possibility of locking
in several aspects of these images, for instance,
age, pose, some facial features and more,
and then mixing them with other images
to our liking, while retaining these locked-in
aspects.
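To make this style mixing idea a bit more concrete, here is a toy sketch of the concept. The mapping and synthesis networks below are tiny stand-ins I made up for illustration, not the real StyleGAN models, and the layer split at index 6 is just an assumption.

```python
# Toy sketch of StyleGAN-style "style mixing" - stand-in networks, not the real ones.
import torch
import torch.nn as nn

num_layers, w_dim = 14, 512
mapping = nn.Sequential(nn.Linear(512, w_dim), nn.ReLU(), nn.Linear(w_dim, w_dim))
synthesis = nn.Linear(num_layers * w_dim, 3 * 32 * 32)  # pretend "image" output

def to_styles(z):
    # Map a latent z to one style vector per synthesis layer.
    w = mapping(z)
    return w.unsqueeze(1).repeat(1, num_layers, 1)

w_a = to_styles(torch.randn(1, 512))  # image A: keep its coarse aspects (pose, age)
w_b = to_styles(torch.randn(1, 512))  # image B: take its fine aspects (colors, texture)

# Style mixing: lock in the coarse layers from A, borrow the fine layers from B.
w_mixed = torch.cat([w_a[:, :6], w_b[:, 6:]], dim=1)
image = synthesis(w_mixed.flatten(1)).view(1, 3, 32, 32)
```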
I am loving the fact that these newer research
works are moving in the direction of more
artistic control, and the paper we’ll discuss
today also takes a step in this direction.
With this new work, we can ask it to translate
our image into different seasons, weather
conditions, times of day, and more!
Let’s have a look!
Here, we have our input, and imagine that
we'd like to add more clouds and translate
it into a different time of day, and…there
we go!
Wow.
Or, we can take this snowy landscape image
and translate it into a blooming flowery field.
This truly seems like black magic, so I can’t
wait to look under the hood and see what is
going on!
The input is our source image and a set
of attributes with which we can describe our
artistic vision.
For instance, here, let’s ask the AI to
add some more vegetation to this scene.
That will do!
Step number one, this artistic description
is routed to a scene generation network, which
hallucinates an image that fits our description.
Well, that’s great, as you see here, it
kind of resembles the input image, but still,
it is substantially different!
So, why is that?
If you look here, you see that it also takes
the layout of our image as an input, or in
other words, the colors and silhouettes
describe which parts of the image contain a
lake, vegetation, clouds, and more.
It creates the hallucination according to
that, so we have more clouds, that’s great,
but the road here has been left out.
So, are we stuck with an image that only
kind of resembles what we want?
What do we do now?
Now, step number two: let’s not use this
hallucinated image directly, but instead apply its
artistic style to our source image.
Brilliant!
Now we have our original content, but with more vegetation.
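To recap the two steps in code, here is a minimal sketch of the idea, where every network is a toy stand-in of my own and not the paper's actual architecture - the point is only to show how the layout, the attributes, and the source image fit together.

```python
# Minimal sketch of the two-step idea - all networks here are toy stand-ins.
import torch
import torch.nn as nn

H = W = 64
num_classes, num_attributes = 8, 5   # e.g. sky/lake/vegetation... and cloudy/night/snowy...

# Step 1 stand-in: hallucinate an image from the semantic layout plus the attributes.
scene_generator = nn.Conv2d(num_classes + num_attributes, 3, kernel_size=3, padding=1)
# Step 2 stand-in: re-render the source image using the hallucination as a style guide.
stylizer = nn.Conv2d(6, 3, kernel_size=3, padding=1)

layout = torch.zeros(1, num_classes, H, W)       # one-hot semantic layout of the source
layout[:, 0] = 1.0                               # toy content: everything labeled "sky"
attributes = torch.tensor([1.0, 0, 0, 0, 0])     # toy request: "more clouds"
source = torch.rand(1, 3, H, W)                  # the photo we want to edit

# Step 1: hallucinate an image that matches the layout and the requested attributes.
attr_map = attributes.view(1, -1, 1, 1).expand(1, num_attributes, H, W)
hallucination = scene_generator(torch.cat([layout, attr_map], dim=1))

# Step 2: keep the source's content, borrow the hallucination's style.
output = stylizer(torch.cat([source, hallucination], dim=1))
```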
However, remember that we have the layout
of the input image.
That is a gold mine of information!
So, are you thinking what I am thinking?
Yes, including this indeed opens up a killer
application.
We can even change the scene around by modifying
the labels on this layout, for instance, by
adding some mountains, making it a grassy field,
and adding a lake.
Making a scene from scratch from a simple
starting point is also possible.
Just add some mountains, trees, a lake, and
you are good to go!
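For illustration, painting such a scene into the layout could look roughly like this; the label IDs, canvas size, and region coordinates below are hypothetical.

```python
# Toy sketch of editing the semantic layout directly - hypothetical label IDs.
import torch
import torch.nn.functional as F

SKY, GRASS, MOUNTAIN, LAKE = 0, 1, 2, 3          # hypothetical class labels
labels = torch.zeros(64, 64, dtype=torch.long)   # start from an all-"sky" canvas

labels[40:, :] = GRASS              # make the lower part a grassy field
labels[25:40, 10:50] = MOUNTAIN     # paint in some mountains
labels[50:, 20:45] = LAKE           # and add a lake

# One-hot layout, ready to be fed back through the scene generation step together
# with the desired attributes (season, time of day, fog, ...).
layout = F.one_hot(labels, num_classes=8).permute(2, 0, 1).unsqueeze(0).float()
```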
And then, you can use the other part of the
algorithm to transform it into a different
season or time of day, or to make it foggier.
What a time to be alive!
Now, as with every research work, there is
still room for improvements!
For instance, I find that it is hard to define
what it means to have a cloudier image.
The hallucination here works
according to the specification - it indeed
has more clouds than this.
But here, I am unsure whether we
have more clouds in the output - you can see that
there are perhaps even fewer than in the input.
It seems that not all of them made it to the
final image.
Also, do fewer, but denser clouds qualify
as cloudier?
Nonetheless, I think this is going to be an
awesome tool as is, and I can only imagine
how cool it will become two more papers down
the line.
This episode has been supported by Weights
& Biases.
In this post they show you how to easily iterate
on models by visualizing and comparing experiments
in real time.
Weights & Biases provides tools to track your
experiments in your deep learning projects.
Their system is designed to save you a ton
of time and money, and it is actively used
in projects at prestigious labs, such as OpenAI,
Toyota Research, GitHub, and more.
And, the best part is that if you are an academic
or have an open source project, you can use
their tools for free.
It really is as good as it gets.
Make sure to visit them through wandb.com/papers
or just click the link in the video description
and you can get a free demo today.
Our thanks to Weights & Biases for their long-standing
support and for helping us make better videos
for you.
Thanks for watching and for your generous
support, and I'll see you next time!
