
Chinese: 
各位學術夥伴，歡迎來到 
「Two Minute Papers」 with Károly Zsolnai-Fehér。
本篇論文是試著用影像輸入來估計深度資訊。
即是說，看一張相片，
然後分辨影像中的物件離攝影機有多遠。
 
結果看起來像這樣：左邊是輸入的相片，
右邊是真實距離的 heatmap，
這是我們想要得到的結果。
 
這表示需要收集大量的室內與室外影像，
並且知道真實的深度資訊。
再嘗試從中學習其對應的關係，了解其關聯性。
人行道、森林、
建築物，所有你想的到的。
這些影像跟深度的對應關係，由一個 3D 掃描器
配合一個很炫的客製化載具。
還有，就是一大筆的研究經費。
 
最後的目標是提供一張相片，
其中的深度資訊是完全未知的，
而演算法會提供深度資訊給我們。

English: 
Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
This piece of work tries to estimate depth
information from an input photograph. This
means that it looks at the photo and tries
to tell how far away parts of the image are
from the camera.
An example output looks like this: on the
left, there is an input photograph, and on
the right, you see a heatmap with true distance
information. This is what we're trying to
approximate.
This means that we collect a lot of indoor
and outdoor images with their true depth information,
and we try to learn the correspondence, how
they relate to each other. Sidewalks, forests,
buildings, you name it. These image and depth
pairs can be captured by mounting 3D scanners
on this awesome custom-built vehicle. And,
gentlemen, that is one heck of a way of spending
research funds.
The final goal is that we provide a photograph
for which the depth information is completely
unknown and we ask the algorithm to provide
it for us.
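
As a toy illustration of "learning the correspondence" from image and depth pairs, the sketch below fits a straight line from a single made-up per-pixel feature to the measured depth, then predicts depth for an unseen value. This is not the method from the paper, which learns a far richer mapping; the feature, the training numbers, and the names `fit_line` and `predict_depth` are all hypothetical.

```python
def fit_line(features, depths):
    """Least-squares fit of depth ~ slope * feature + intercept."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(depths) / n
    var_x = sum((x - mean_x) ** 2 for x in features)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(features, depths))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_depth(model, feature):
    """Apply the fitted line to a feature whose depth is unknown."""
    slope, intercept = model
    return slope * feature + intercept


# Hypothetical training pairs: feature value and scanner-measured depth.
train_features = [0.0, 1.0, 2.0, 3.0]
train_depths = [2.0, 4.0, 6.0, 8.0]

model = fit_line(train_features, train_depths)
estimate = predict_depth(model, 5.0)  # depth guess for an unseen input
```

The structure mirrors the pipeline described above: collect pairs with known depth, fit a model, then query it on an input whose depth is unknown.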

Chinese: 
你可以看到結果：
第一張影像是輸入的相片，第二張是真實的深度資訊。
第三張是這個演算法計算出來的深度資訊。
 
這些是一些網路上的影像產生的結果。
判斷得至少跟人一樣好。
嘆為觀止。
這聽起來像是個人類感知問題，
而對電腦來說最少是個危險的旅程。
真正值得注意的是，
這些關聯性可以被電腦的演算法學習。
 
這可以用來做甚麼呢？嗯，多的是，其中一個是
利用 2D 相片跟計算出的深度資訊進行多重視角的呈現。
這對於環境探索機器人的可靠度也是非常有用的，
而且只需要架設便宜的消費性攝影機。

English: 
Here you can see some results: the first image
is the input photograph, the second shows
the true depth information. The third image
is the depth information that was created
by this technique.
And here is a bunch of results for images
downloaded from the internet. It probably
does at least as well as a human would. Spectacular.
This sounds like a sensorial problem for humans,
and a perilous journey for computers to say
the least. What is quite remarkable is that
these relations can be learned by a computer
algorithm.
What can we use this for? Well, a number of
different things, one of which is to create
multiple views of this 2D photograph using
the guessed depth information.
It can also be super helpful in building robots
that can wander about reliably with inexpensive
consumer cameras mounted on them.
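
The multiple-views idea mentioned above can be sketched in a few lines, assuming the simplest possible setup: the camera slides sideways, and each pixel shifts by a disparity that is inversely proportional to its estimated depth. The function name `shift_row`, the `baseline_px` parameter, and the sample row are hypothetical, and this is only a rough illustration, not the rendering method used in the paper.

```python
def shift_row(row, depths, baseline_px):
    """Forward-warp one image row for a camera moved sideways.

    Each pixel moves left by round(baseline_px / depth), so nearby
    pixels move further than distant ones. Depths must be positive.
    Positions that receive no pixel stay None (holes).
    """
    width = len(row)
    out = [None] * width
    out_depth = [float("inf")] * width  # nearest depth written so far
    for x in range(width):
        disparity = round(baseline_px / depths[x])
        new_x = x - disparity
        # Keep the pixel only if it lands inside the row and is closer
        # than whatever was already written there (occlusion handling).
        if 0 <= new_x < width and depths[x] < out_depth[new_x]:
            out[new_x] = row[x]
            out_depth[new_x] = depths[x]
    return out


# Hypothetical row: a nearby tree in front of a distant sky.
row = ["sky", "sky", "tree", "sky"]
depths = [100.0, 100.0, 2.0, 100.0]
new_view = shift_row(row, depths, baseline_px=2)
```

In the new view the tree has moved one pixel to the left and covers the sky behind it, while the spot it vacated becomes a hole that a real system would have to fill in.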

English: 
Thanks for watching, and for your generous
support, and I'll see you next time!

Chinese: 
感謝您的收看與慷慨支持，我們下次見。
