Dear Fellow Scholars, this is Two Minute Papers
with Dr. Károly Zsolnai-Fehér.
We hear more and more about RGBD images these
days.
These are photographs endowed with
depth information, which enables us to do many
wondrous things.
For instance, this method was used to endow
self-driving cars with depth information and
worked reasonably well, and this other one
provides depth maps that are so consistent,
we can even add some AR effects on top of them.
And today’s paper is going to show us what 3D
photography is.
However, to perform these feats, we first need
not only color, but also depth information in
our images.
You see, phones with depth scanners already
exist, and even more are coming as soon as
this year, but even if you have a device that
only gives you 2D color images, don’t despair.
There is plenty of research on how we can
estimate these depth maps, even if we have
very limited information.
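As a taste of what that looks like in practice, here is a minimal sketch of single-image depth estimation using the publicly available MiDaS model; this is one assumed example from that body of research, not the specific estimator used in today's paper, and the input file name is hypothetical.

```python
import cv2
import torch

# A sketch of single-image depth estimation with the public MiDaS model,
# assumed here purely for illustration; "photo.jpg" is a hypothetical file.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    # The model outputs relative inverse depth, not metric distances.
    depth = midas(transform(img)).squeeze().cpu().numpy()
```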
And, with proper depth information, we can
now create these 3D photographs, where we
get even more information out of one still
image.
We can look behind objects and see things
that we wouldn’t see otherwise.
Beautiful parallax effects appear as objects
at different distances move by different amounts
as we move the camera around.
You see that the foreground changes a great
deal, the buildings in the background, less
so, and the hills behind them, even less so.
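To get a feel for why, note that the on-screen shift of a point is roughly the focal length times the camera translation divided by the point's depth. A tiny back-of-the-envelope sketch with made-up values:

```python
# Horizontal pixel shift (disparity) for a small sideways camera move:
# shift = focal_length_px * baseline / depth. Illustrative numbers only.
f, b = 1000.0, 0.05             # focal length in pixels, camera move in meters
for z in (2.0, 20.0, 200.0):    # foreground, buildings, distant hills
    print(f"depth {z:6.1f} m -> shift {f * b / z:6.2f} px")
# depth    2.0 m -> shift  25.00 px
# depth   20.0 m -> shift   2.50 px
# depth  200.0 m -> shift   0.25 px
```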
These photos truly come alive with this new
method.
An earlier algorithm, the legendary PatchMatch
method from more than a decade ago, could perform
something that we call image inpainting.
Image inpainting means looking at what we
see in these images, and trying to fill in
missing information with data that makes sense.
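To make this idea tangible, classical 2D inpainting ships with common libraries; here is a minimal sketch using OpenCV's built-in algorithm, with a hypothetical file name and mask region. This is neither PatchMatch nor the new method, just the general idea.

```python
import cv2
import numpy as np

# A minimal 2D inpainting sketch with OpenCV's classical algorithm;
# the file name and mask region below are hypothetical.
img = cv2.imread("photo.jpg")
mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask[100:150, 200:260] = 255  # mark the missing region in white
filled = cv2.inpaint(img, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("filled.jpg", filled)
```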
The key difference here is that this new technique
uses a learning-based method and performs this image
inpainting in 3D, filling in not only
color, but depth information as well.
What a crazy, amazing idea.
However, this is not the first method to perform
this, so how does it compare to other research
works?
Let’s have a look together.
Previous methods show a great deal of warping
and distortion on the bathtub here, and if
you look at the new method, you see that it
is much cleaner.
There is still a tiny bit of warping, but,
it is significantly better.
The dog's head here seems to be bobbing around
a great deal with this previous method, while
the other methods also have some problems
with it…look at this too.
And if you look at how the new method handles
it…it is significantly more stable.
And you see that these previous techniques
are from just one or two years ago.
It is unbelievable how far we have come since.
Bravo.
So this was a qualitative comparison, or in
other words, we looked at the results. But
what about the quantitative differences?
What do the numbers say?
Look at the PSNR column here; this means
Peak Signal-to-Noise Ratio, which is subject
to maximization, as the up arrow denotes.
The higher, the better.
The difference is between half a point and 2.5
points when compared to previous methods,
which does not sound like a lot at all.
So, what happened here?
Note that PSNR is not a linear but a logarithmic
scale, which means that even a small numeric
difference, say, just 0.5 points, typically
translates to a great deal of difference in
the images.
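To make that concrete: PSNR is defined as 10·log10(MAX²/MSE), so every decibel of improvement divides the mean squared error by a fixed factor. A quick back-of-the-envelope sketch, illustrative arithmetic only:

```python
# PSNR = 10 * log10(MAX^2 / MSE), so a gain of d decibels divides the
# mean squared error by 10 ** (d / 10).
for d in (0.5, 2.5):
    factor = 10 ** (d / 10)
    print(f"+{d} dB -> MSE divided by {factor:.2f} "
          f"({(1 - 1 / factor) * 100:.0f}% less error energy)")
# +0.5 dB -> MSE divided by 1.12 (11% less error energy)
# +2.5 dB -> MSE divided by 1.78 (44% less error energy)
```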
However, if we look at SSIM, the structural
similarity metric, all of them are quite similar,
and a previous technique appears to be winning
here.
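For reference, SSIM can be computed with standard tooling; here is a minimal sketch using scikit-image on made-up test images, purely as an illustration rather than the paper's evaluation code.

```python
import numpy as np
from skimage.metrics import structural_similarity

# Minimal SSIM sketch on synthetic test images; a score of 1.0
# means the two images are identical.
a = np.random.rand(64, 64)
b = np.clip(a + 0.05 * np.random.randn(64, 64), 0.0, 1.0)
print(structural_similarity(a, b, data_range=1.0))
```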
But this was the method that warped the dog
head, and in the visual comparisons, the new
method came out significantly better than
this.
So what is going on here?
Well, have a look at this metric, LPIPS, which
was developed by researchers at UC Berkeley,
OpenAI, and Adobe Research.
At the risk of oversimplifying,
this uses a neural network to look at the two
images, and uses its inner representations to
decide how close they are to each other.
Loosely speaking, it kind of thinks about
differences as we humans do, and is an excellent
tool to compare images.
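If you would like to try it yourself, a reference implementation is available as the open-source lpips Python package; here is a minimal sketch, with random placeholder tensors standing in for real images:

```python
import torch
import lpips  # pip install lpips

# Minimal LPIPS sketch; the random tensors below are placeholders
# standing in for real images scaled to the [-1, 1] range.
loss_fn = lpips.LPIPS(net='alex')          # AlexNet features by default
img0 = torch.rand(1, 3, 256, 256) * 2 - 1
img1 = torch.rand(1, 3, 256, 256) * 2 - 1
distance = loss_fn(img0, img1)             # lower means more similar
print(distance.item())
```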
And, sure enough, this also concludes that
the new method performs best.
However, this method is still not perfect.
There is some flickering going on behind these
fences, and the transparency of the glass here
isn’t perfect, but witnessing such a huge
leap in the quality of the results in so little
time is truly a sight to behold.
What a time to be alive!
I started this series to make people feel
how I feel when I read these papers, and I
really hope that this comes through with this
paper.
Absolutely amazing.
What is even more amazing is that with a tiny
bit of technical knowledge, you can run the
source code in your browser, so make sure
to have a look at the link in the video description.
Let me know in the comments how
it went!
Thanks for watching and for your generous
support, and I'll see you next time!
