Dear Fellow Scholars, this is Two Minute Papers
with Dr. Károly Zsolnai-Fehér.
About two years ago, we worked on a neural
rendering system that performed light
transport on this scene and guessed how the
image would change if we changed the material
properties of this test object.
It was able to closely match the output of
a real light simulation program, and it was
near-instantaneous: it took less than 5
milliseconds instead of the 40-60 seconds
the light transport algorithm usually requires.
This technique went by the name Gaussian Material
Synthesis, and the learned quantities were
material properties.
But this new paper sets out to learn something
more difficult, and also, more general.
We are talking about a 5D neural radiance
field representation.
So what does this mean exactly?
What this means is that we have three dimensions
for location and two for view direction; or,
in short, the input is where we are in space
and what we are looking at, and the output is
the image we would see from that view.
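To make this 5D input-output mapping more concrete, here is a minimal sketch, in PyTorch, of a radiance-field-style network. Treat it as an illustration under simplifying assumptions, not the authors' exact architecture: the real network is deeper, encodes its inputs first, and only feeds the view direction to the color output.

```python
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        # The 5D input: 3 position coordinates plus 2 view-direction angles.
        self.body = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)  # sigma: how "solid" space is here
        self.color_head = nn.Linear(hidden, 3)    # RGB emitted toward the viewer

    def forward(self, xyz, view_dir):
        h = self.body(torch.cat([xyz, view_dir], dim=-1))
        sigma = torch.relu(self.density_head(h))   # density must be non-negative
        rgb = torch.sigmoid(self.color_head(h))    # colors kept in [0, 1]
        return rgb, sigma

# Query the field at a batch of 3D points and viewing directions:
model = TinyRadianceField()
rgb, sigma = model(torch.rand(1024, 3), torch.rand(1024, 2))
```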
So here, we take a bunch of this input data,
train a neural network on it, and synthesize
new, previously unseen views of not just the
materials in the scene, but the entire scene itself.
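And how do these per-point colors and densities become an actual image? The paper renders each pixel with classic volume rendering: we march along the camera ray through that pixel, query the network at many sample points, and blend the results according to how much light survives to each sample. A simplified sketch of that compositing step:

```python
import torch

def render_pixel(rgbs, sigmas, deltas):
    # rgbs: (N, 3) colors and sigmas: (N,) densities at N samples along
    # one camera ray; deltas: (N,) distances between consecutive samples.
    alphas = 1.0 - torch.exp(-sigmas * deltas)                   # per-sample opacity
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=0)           # light surviving so far
    trans = torch.cat([torch.ones_like(trans[:1]), trans[:-1]])  # shift by one sample
    weights = alphas * trans                                     # contribution of each sample
    return (weights[:, None] * rgbs).sum(dim=0)                  # final pixel color

pixel = render_pixel(torch.rand(64, 3), torch.rand(64), torch.full((64,), 0.1))
```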
And here, we are talking not only about digital
environments, but about real scenes as well!
Now that’s quite a value proposition, so let’s
see if it can live up to this promise!
Wow!
So good.
Love it!
But, what is it really that we should be looking
at?
What makes a good output here?
The most challenging part is writing an algorithm
that can reproduce delicate, high-frequency
details while maintaining temporal coherence.
So what does that mean?
Well, in simpler words, we are looking for
sharp and smooth image sequences.
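A key trick that helps with the "sharp" part is that the raw coordinates are not fed to the network directly; they first go through a positional encoding, a set of sines and cosines at increasing frequencies, which makes it much easier for the network to represent fine detail. A small sketch of that idea, with the frequency count as an illustrative choice:

```python
import math
import torch

def positional_encoding(x, num_freqs=10):
    # Expand each raw coordinate into sines and cosines at exponentially
    # increasing frequencies, so the network can fit high-frequency detail.
    features = [x]
    for i in range(num_freqs):
        freq = (2.0 ** i) * math.pi
        features.append(torch.sin(freq * x))
        features.append(torch.cos(freq * x))
    return torch.cat(features, dim=-1)

encoded = positional_encoding(torch.rand(1024, 3))  # (1024, 3 + 3*2*10) = (1024, 63)
```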
Perfectly matte objects are easier to learn
here because they look the same from all directions,
while glossier, more reflective materials
are significantly more difficult, because
they change a great deal as we move our head
around, and this highly view-dependent information
is typically not present in the training
images.
If you read the paper, you’ll see these
referred to as non-Lambertian materials.
The paper and the video contain a ton of
examples of these view-dependent effects to
demonstrate that these difficult scenes are
handled really well by this technique.
Refractions also look great.
Now, if we define difficulty as things that
change a lot when we change our position or
view direction a little, then not only are
non-Lambertian materials going to give us
headaches; occlusion can be challenging as well.
For instance, you can see how well it
handles the complex occlusion situation between
the ribs of this skeleton.
It also has an understanding of depth, and
this depth information is so accurate that
we can do these nice augmented reality applications,
where we put a new, virtual object into the
scene and it correctly determines whether
it is in front of, or behind, the real objects
in the scene.
Kind of what these new iPads do with their
LiDAR sensors, but without the sensor.
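Here is a minimal sketch of how such depth-aware compositing could work, assuming we already have a per-pixel depth map of the real scene from the technique and a rendered virtual object with its own depth map; all the names here are illustrative, not from the paper:

```python
import torch

def composite(real_rgb, real_depth, virtual_rgb, virtual_depth, virtual_mask):
    # Wherever the virtual object covers a pixel AND is closer to the camera
    # than the reconstructed scene, its pixels win; elsewhere the real scene shows.
    in_front = virtual_mask & (virtual_depth < real_depth)  # (H, W) boolean
    out = real_rgb.clone()
    out[in_front] = virtual_rgb[in_front]
    return out

H, W = 480, 640
result = composite(
    torch.rand(H, W, 3), torch.rand(H, W),  # real frame + predicted scene depth
    torch.rand(H, W, 3), torch.rand(H, W),  # rendered virtual object + its depth
    torch.rand(H, W) > 0.5,                 # where the virtual object covers pixels
)
```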
As you see, this technique smokes the competition.
So what do you know, entire real-world scenes
can be reproduced from only a few views by
using neural networks.
And the results are just out of this world.
Absolutely amazing.
What you see here is an instrumentation of
this exact paper we have talked about, which
was made by Weights & Biases.
I think organizing these experiments really
showcases the usability of their system.
Weights & Biases provides tools to track your
experiments in your deep learning projects.
Their system is designed to save you a ton
of time and money, and it is actively used
in projects at prestigious labs, such as OpenAI,
Toyota Research, GitHub, and more.
And, the best part is that if you are an academic
or have an open source project, you can use
their tools for free.
It really is as good as it gets.
Make sure to visit them through wandb.com/papers
or just click the link in the video description
and you can get a free demo today.
Our thanks to Weights & Biases for their long-standing
support and for helping us make better videos
for you.
Thanks for watching and for your generous
support, and I'll see you next time!
