Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
Image matting is the process of taking an
input image, and separating its foreground
from the background.
It is an important preliminary step for creating
visual effects where we cut an actor out from
green-screen footage and change the background
to something else, and image matting is also
an important part of these new awesome portrait
mode selfies where the background looks blurry
and out of focus for a neat artistic effect.
To perform this properly, we need to know
how to separate the foreground from the background.
Matting human hair and telling accurately
which hair strand is the foreground and which
is the background is one of the more difficult
parts of this problem.
This is also the reason for many of the failure
cases of the portrait mode photos made with
the new iPhone and Pixel cameras.
The input of this problem formulation is a
colored image or video, and the output is
an alpha matte where white and lighter colors
encode the foreground, and darker colors are
assigned to the background.
After this step, it is easy to separate and
cut out the different layers and selectively
replace some of them.
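Under the hood, the matte drives a simple compositing equation: every output pixel is a weighted blend of the foreground and the new background, with the alpha value as the weight. Here is a minimal sketch in Python with NumPy; the tiny two-pixel image is purely illustrative:

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend a foreground over a background using an alpha matte.

    foreground, background: float arrays of shape (H, W, 3) in [0, 1]
    alpha: float array of shape (H, W) in [0, 1]; 1 means foreground
    """
    a = alpha[..., np.newaxis]  # broadcast the matte over the color channels
    return a * foreground + (1.0 - a) * background

# Tiny 1x2 example: left pixel is pure foreground, right is pure background.
fg = np.array([[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]])  # red foreground
bg = np.array([[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])  # blue background
matte = np.array([[1.0, 0.0]])
out = composite(fg, bg, matte)
```

Intermediate alpha values along hair strands produce the partial blends that make the cutout look natural instead of jagged.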
Traditional techniques rely on useful heuristics,
like assuming that the foreground and the
background are dominated by different colors.
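A toy version of such a color-based heuristic, assuming the background is dominated by one known color: label each pixel by how far it lies from that color. This is only an illustration of the general idea, not any specific published method:

```python
import numpy as np

def color_threshold_matte(image, bg_color, threshold=0.3):
    """Crude matte: pixels far from the background color become foreground.

    image: (H, W, 3) float array in [0, 1]
    bg_color: length-3 estimate of the dominant background color
    Returns a soft matte in [0, 1].
    """
    dist = np.linalg.norm(image - np.asarray(bg_color), axis=-1)
    return np.clip(dist / threshold, 0.0, 1.0)

# Green-screen-like example: a green pixel maps to background (0),
# a red pixel maps to foreground (1).
img = np.array([[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]])
matte = color_threshold_matte(img, bg_color=[0.0, 1.0, 0.0])
```

As soon as the foreground shares colors with the background, a heuristic like this falls apart, which is exactly the gap the learning-based method aims to close.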
This is useful, but of course, it's not always
true, and clearly, we would get the best results
if we had a human artist creating these alpha
mattes.
Of course, this is usually prohibitively expensive
for real-world use and costs a ton of time
and money.
The main reason why humans are successful
at this is that they have an understanding
of the objects in the scene.
So perhaps, we could come up with a neural
network-based learning solution that could
replicate this ideal case.
The first part of this algorithm is a deep
neural network that takes images as input
and outputs an alpha matte, and it was trained
on close to 50,000 input-output pairs.
So here comes the second, refinement stage,
where we take the output matte from the first
step and use a shallower neural network
to further refine the edges and sharpen the fine details.
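What the refinement stage is for can be mimicked very crudely, with no learning at all, by sharpening the coarse matte around its edges. The sketch below uses plain unsharp masking as a stand-in; the actual paper uses a second, shallow convolutional network, so treat this only as an illustration of the stage's purpose:

```python
import numpy as np

def refine_matte(coarse, amount=1.0):
    """Toy edge sharpening of a coarse matte via unsharp masking.

    This only mimics what the learned refinement stage is for (crisper
    edges); it is not the method from the paper.
    """
    # 3x3 box blur with edge padding as the low-pass component.
    padded = np.pad(coarse, 1, mode="edge")
    h, w = coarse.shape
    blurred = sum(
        padded[i:i + h, j:j + w] for i in range(3) for j in range(3)
    ) / 9.0
    # Add back the high-frequency residual, then keep values in [0, 1].
    return np.clip(coarse + amount * (coarse - blurred), 0.0, 1.0)

# A soft horizontal edge becomes steeper after "refinement".
coarse = np.tile(np.array([0.0, 0.2, 0.5, 0.8, 1.0]), (5, 1))
refined = refine_matte(coarse)
```

After this step, values near the transition are pushed toward 0 or 1, which is the same qualitative effect the comparisons between the raw and refined mattes show.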
There are a ton of comparisons in the paper,
and we are going to have a look at some of
them, and as you can see, it works remarkably
well for difficult situations where many tiny
hair strands are to be matted properly.
If you look closely here, you can also see
the minute differences between the results
of the raw and refined steps, and it is shown
that the refined version is more similar to
the ground truth solution, which is abbreviated
as GT here.
By the way, creating a dataset with tons of
ground truth data is also a huge endeavor
in and of itself, so thank you very much to
the folks at alphamatting.com for creating
this dataset. You can see how important
this kind of work is in making it easier to
compare state of the art research.
Adobe was a part of this research project
so if everything goes well, we can soon expect
such a feature to appear in their products.
Also, if you're interested, we have some
nice Two Minute Papers shirts for your enjoyment.
If you are located in the US, check twominutepapers.com,
and for worldwide shipping, check the video
description for the links.
All photos of you wearing them are appreciated.
Plus scholarly points if it depicts you reading
a paper!
Thanks for watching and for your generous
support, and I'll see you next time!
