
Chinese: 
边上还有位置
那么确认一下你们要上的课程是cs231n
用于计算机视觉的深度学习和神经网络
有人走错教室了吗？
很好，那么欢迎大家，各位新年快乐，寒假快乐
这堂课是cs231n ，是第二次开课
申请的人数比上次增加了差不多一倍
从180人增加到350人了
为了确保我们符合法律，先声明一下
我们正在为这堂课录制视频

English: 
There are more seats on the side for people walking in late.
So, just to make sure, you are in CS231n,
the deep learning and neural network class for visual recognition.
Anybody in the wrong class? Okay, good.
Alright, so welcome, and happy new year, happy first day of the winter break.
This is the second offering of this class, and
we have literally doubled our enrollment,
from a hundred eighty people last
time we offered it to about 350 of you
signed up. Just a couple of words
to make us all legally covered: we are

Chinese: 
如果你觉得不自在，今天可以去教室后面或者角落那边，这样摄像机就录不到你了
不过我们会发给你们表格以获得录像的允许
所以目前也不会公开发布
那么，我叫李飞飞，CS系的教授
这次课程我将会和两位高年级研究生共同执教
一位是Andrej Karpathy——和大家打个招呼
我觉得Andrej不用我介绍太多了
你们中很多人都知道他的工作，关注他的博客
可能还是他的twitter粉丝
……Andrej的粉丝比我还多~

English: 
video recording this class, so, you
know, if you're uncomfortable about this,
for today just go behind the camera or
go to a corner that the camera is not
going to turn to. But we are going to send out
forms for you to fill out in terms of
allowing video recording. So
that's just one bit of housekeeping.
All right, my name is Fei-Fei Li.
I'm a professor at the computer science
department. This class I'm co-teaching
with two senior graduate students, and
one of them is here: he's Andrej Karpathy.
Andrej, can you just say hi to everybody?
Well, I don't think Andrej needs
too much introduction; a lot of you
probably know his work, follow his blog,
his Twitter. Andrej has way more

English: 
followers than I do.
He's very popular. And also Justin
Johnson, who is still traveling
internationally but will be back in a
few days. So Andrej and Justin will be
picking up the bulk of the lecture
teaching,
and today I'll be giving the first
lecture. But as you can probably see,
I'm expecting a newborn soon,
speaking of weeks, so you'll see more of
Andrej and Justin in lecture time. We will
also introduce a whole team of TAs
towards the end of this lecture. Again,
people who are looking for seats: if you
go out of that door and come back, there
is a whole bunch of seats on this side.
Okay, so for this lecture
we're going to give an introduction of
the class, the kind of problems we work

Chinese: 
此外还有Justin Johnson……他还在环球旅行，不过几天内就会回来了
所以Andrej和Justin将会负责大部分的讲解课程
今天我会做第一次讲义，不过你们也许可以看得见……
我正在期待一个新的小生命的降临——也就是这几周的事儿了
所以你们会更多看到Andrej和Justin
我们也会介绍整个助教（TA）团队
……如果有人还在找座位，从那边出去，后面有一大排椅子……
这次讲义我们会做一个简要介绍
关于课程，关于我们要解决什么样的问题
以及我们要学习的工具

English: 
on, and the tools we'll be learning. So
again, welcome to CS231n. This is a
vision class. It's based on a very
specific modeling architecture called
the neural network, and even more
specifically, mostly on the convolutional
neural network. A lot of you hear this
term maybe through popular press
articles or coverage; we tend to call
this the deep learning network. Vision is
one of the fastest growing fields of
artificial intelligence. In fact, Cisco
has estimated (and we are on day four
of this) that by 2016, which we have already
arrived at, more than 85% of the internet
cyberspace data is in the form of pixels,

Chinese: 
那么，再次欢迎来到CS231n
这是一门关于计算机视觉的课程
基于一种专用的模型架构，叫做神经网络
或者分得更细一点，卷积神经网络（Convolutional Neural Network）
可能你们中很多人最近从报纸之类的地方看到
一个很流行的词，深度学习网络
 
 
（计算机）视觉是人工智能领域中发展最迅猛的一个分支
实际上，Cisco进行过评估，在2016年
我们已经到了
互联网上超过85%的信息都会是像素形式，也就是他们说的“多媒体”

Chinese: 
所以我们实际上进入了一个视觉的时代
图片和视频的时代。
何以这样？
这样的信息爆炸，部分是因为我们有互联网作为信息的载体
另外一部分原因是因为传感器。
我们的传感器比人的数量还多
你们每个人都拿着一个智能手机
数码相机
大街上跑的汽车也都有行车记录仪
所以传感器确实引发了视觉信息的大爆发
但是视觉信息或者叫像素信息确实是最难被利用的信息
如果你们听过我之前的演讲或者其他一些讲义

English: 
or what they call multimedia. So we
basically have entered an age of vision,
of images and videos. And why is it
so? Well, partially, and to a large extent,
it is because of the explosion of both the
internet as a carrier of data as well as
sensors. We have more sensors than the
number of people on earth these days.
Every one of you is carrying some kind
of smartphone or digital camera,
and, you know, cars are running on the
street with cameras. So the sensors
have really enabled the explosion of
visual data on the internet. But
visual data, or pixel data, is also the
hardest data to harness. So if you have

English: 
heard my previous talks, and some other
talks by computer vision professors,
we call this the dark matter of the
internet. Why is this the dark matter?
Just like the universe consists of
85% dark matter and dark energy, which are
matter and energy that are very hard to
observe (we can infer them by
mathematical models in the universe), on
the Internet
these are the dark matter: pixel data are
the data that we don't know; we have a
hard time grasping the contents. Here's
one very, very simple aspect for you to
consider.
So today, on YouTube servers, every 60
seconds we have more than 150 hours of
videos uploaded onto YouTube servers, for
Chinese: 
我们称之为“互联网中的暗物质”
为什么是暗物质呢？
就像银河系中据说有85%的质量属于暗物质和暗能量
这些都非常难于检测和观察
我们可以通过数学模型来推断这些暗物质的存在
在互联网上，这些像素数据我们不知道（它们的内容），我们很难获取到它们描述的内容
这里有一个非常简单的例子，便于大家理解
现在，youtube的服务器上
每60秒，服务器就会接受超过150小时的视频上传
每60秒而已
想想看，如此大的数据量，人类的眼睛根本没有办法浏览全部这些数据

Chinese: 
给如此大量的数据进行标记，分类
所以youtube团队或者谷歌都在想办法为这些数据进行标记、分类、索引等等的工作
用来做广告，或者是帮助我们检索或者是操作这些数据
但是没有成功，因为没有人能手工地去处理如此大量的数据。
完成这项工作的唯一希望，就是（计算机）视觉技术。
以期能够对照片进行标签、分类，处理视频中的每一帧
自动截取出篮球比赛中——比如说科比的一次精彩进球
那么这就是我们现在面对着的问题
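
To put the upload figure quoted above into perspective, here is a small back-of-the-envelope sketch. The 150 hours per 60 seconds comes from the lecture; the 8-hour viewing day is a hypothetical assumption added purely for illustration, as are the function names.

```python
# Back-of-the-envelope arithmetic for the upload rate quoted in the lecture:
# more than 150 hours of video arrive every 60 seconds.
UPLOAD_HOURS_PER_MINUTE = 150  # figure from the lecture


def realtime_factor(upload_hours_per_minute: float) -> float:
    """Minutes of video uploaded for every minute of wall-clock time."""
    return upload_hours_per_minute * 60


def hours_uploaded_per_day(upload_hours_per_minute: float) -> float:
    """Total hours of video uploaded in one 24-hour day."""
    return upload_hours_per_minute * 60 * 24


def viewers_needed(upload_hours_per_minute: float,
                   watch_hours_per_day: float = 8.0) -> float:
    """People needed just to watch everything once.

    watch_hours_per_day is a hypothetical assumption, not a lecture figure.
    """
    return hours_uploaded_per_day(upload_hours_per_minute) / watch_hours_per_day


if __name__ == "__main__":
    print("real-time factor:", realtime_factor(UPLOAD_HOURS_PER_MINUTE))
    print("hours per day:", hours_uploaded_per_day(UPLOAD_HOURS_PER_MINUTE))
    print("viewers needed:", viewers_needed(UPLOAD_HOURS_PER_MINUTE))
```

Under these assumptions, video arrives thousands of times faster than anyone can watch it, before any labeling or description even starts, which is the point being made about hand annotation.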

English: 
every 60 seconds. Think about the amount
of data: there's no way that human eyes
can sift through this massive amount of
data and make annotations, label
it, and describe the contents.
So think from the perspective of the
YouTube team or the Google company: if they
want to help us to search, index, manage,
and of course, for their purposes, put up
advertisements or otherwise manipulate the
content of the data, we're at a loss, because
nobody can hand-annotate this. The only
hope we have to do this is through vision
technology, to be able to label the
objects, find the things, find the frames,
you know, locate where in that basketball
video Kobe Bryant is making
that awesome shot. So these are
the problems that we are facing today:
the massive amount of data and the
English: 
challenges of the dark matter. So
computer vision is a field that touches
upon many other fields of study. I'm
sure that even sitting here,
many of you come from computer science,
but many of you come from
biology or psychology, or are specializing in
natural language processing, or graphics,
or robotics, or, you know, medical imaging,
and so on. So as a field, computer vision
is really a truly interdisciplinary
field: the problems we work on and the
models we use touch on engineering,
physics, biology, psychology, computer
science, and mathematics. So just a little
bit of a more personal touch: I am the
director of the computer vision lab at
Stanford. In our lab I work with

Chinese: 
非常大量的数据，以及“暗物质”的挑战。
计算机视觉是一个与很多领域紧密关联的学科
我想，除了计算机系的学生之外，在座的很多人，可能是来自生物学系，心理学系
以及自然语言处理方向，或者是机器人图形学、或者可能是医疗影像等等等等
所以说，计算机视觉确实可以说是一门跨学科的领域
我们面对的问题，我们使用的建模(手段）也是跨学科的
比如说工程、物理、生物、心理学、计算机科学，以及数学
从个人的角度讲，我是斯坦福大学计算机视觉实验室的主任
在我们的实验室里
我和很多研究生或者博士后一起工作
甚至也有本科生

Chinese: 
我们讨论的论题往往都是对我们的研究方向十分有益的
他们中的一些人……比如Andrej，Justin，来自我的实验室
我们是研究机器学习领域的——这是深度学习的一个超集
但我们也在神经科学和认知科学上投入了很多工作
自然语言处理（NLP)也是一样
这就是在我的实验室里，计算机视觉研究的现状，是这样
那么我们再来了解一下，斯坦福CS系提供的其他计算机视觉方面的课程
你们现在正在上这个——CS231n

English: 
graduate students and postdocs, and
even undergraduate students, on a
number of topics most dear to our
own research. Some of them, you know,
Andrej and Justin, come from my lab;
a number of TAs come from my lab. We work
on machine learning, which is a
superset of deep learning. We work a lot on
cognitive science and neuroscience, as
well as the intersection between NLP and
speech. So that's the kind of
landscape of computer vision research
that my lab works in. So also, to put
things in a little more perspective, what
are the computer vision classes that we
offer here at Stanford through the
computer science department?
Clearly you're in this class, CS231n,

English: 
and so some of you who have never
taken computer vision, who have
heard of computer vision for the first time,
probably should have already done CS131;
that's an intro class we offered the previous
quarter. And then
next quarter (it's normally offered
this quarter, but this year is a little
shifted) there's an important
graduate-level computer vision class called CS
231A, offered by Professor Silvio Savarese,
who works in robotics and 3D vision.
A lot of you asked us the question:
do these replace
each other, this class CS231n vs. CS
231A? And the answer is no. If you're

Chinese: 
可能你们中的一些人从来没有接触过计算机视觉
应该已经上过CS131
我们上个季度提供的一门前置课程
接下来下个季度——通常应该是这个季度，不过这一次我们调整了一下
有一门研究生层级的课程，叫做CS231a
由Silvio Savarese教授上课，他主要研究机器人3D视觉
很多人问到，CS231n和CS231a能替代彼此吗？
答案是：不能
如果你的兴趣范围是关于计算机视觉的一些广泛的讨论，比如工具之类

Chinese: 
或者一些基础讨论
关于3D视觉，或者是机器人视觉或者识别
你应该考虑申请CS231a
那是一个更为广泛通用的课程
CS231n针对的是更为专门的领域
在模型和应用范围都更有针对性
模型方面，我们只讨论神经网络
应用范围方面，我们基本只针对视觉识别
当然我们会有很多重叠的方面
但这是这两门课程主要的区别所在
下季度我们可能会有一些更高级的课程，不过这些还在筹划当中
最终以你们的课程表为准

English: 
interested in a broader coverage of
tools and topics of computer vision, as
well as some of the
fundamental topics that
relate to 3D vision, robotic vision,
and visual recognition, you should
consider taking 231A; that is the more
general class. 231n, which we'll go into
starting today, more deeply focuses on a
specific angle of both problem and model.
The model is the neural network, and the angle
is visual recognition, mostly. Of
course they have a little bit of overlap,
but that's the major difference. And
next quarter we also have possibly a
couple of advanced
seminar-level classes, but that's
still in the formation stage, so you just

English: 
have to check the syllabus. So that's
the kind of computer vision curriculum
we offer this
year at Stanford. Any questions so far?
Yes, 131 is not a strict requirement for
this class, but you'll soon see that if
you've never heard of computer vision
before, I suggest you find a
way to catch up, because this class
assumes a basic level of understanding
of computer vision. You can browse
the notes and so on. All right, okay. So
the rest of today is that I will give a
very brief, broad-stroke history of
computer vision, and then we'll talk
about 231n a little bit in terms of
the organization of the class. I actually
really care about sharing with you this

Chinese: 
那么这些就是近年来斯坦福提供的计算机视觉方面的课程
有什么问题吗？
131不是本课程的严格前置要求
不过如果你之前从没听说过计算机视觉
我建议你想办法补一下基础
因为这个课程假定你对计算机视觉有一些基础的了解
你可以去翻翻讲义……之类的
好，那么今天接下来的时间
我会给你们一个非常简要的，关于计算机视觉的历史简介
然后我们会关于231n的课程组织讨论一点点

Chinese: 
我其实非常想和你们分享计算机视觉识别的历史
因为你们可能是因为，对“深度学习”这个非常有意思的工具感兴趣
这也是我们这个课程的目的
我们会给大家提供一个有深度的视角
去探究深度学习模型到底是什么
但是如果不去深入理解问题本质，不去思考问题的真正定义
是很难继续前行
（很难）成为下一个解决问题的模型的发明者
（很难）开发出一个能解决实际难题的（系统）
而且，在通用的（计算机视觉）问题领域，这些模型从来没有彻底解耦过

English: 
brief history of computer vision, because,
you know, you might be here primarily
because of your interest in this really
interesting tool called deep learning,
and this is the purpose of this class:
we're offering you an in-depth look
and journey through
what this deep learning model is. But
without understanding the problem domain,
without thinking deeply about what this
problem is, it's very hard for you to
go on to be an inventor of the next
model that really solves a big problem
in vision, or to be, you know,
making impactful work in
solving a hard problem. And also, in
general, the problem domain and
the modeling tools themselves are never

Chinese: 
他们互相引用和借鉴
从深度学习的历史可以看到
卷积神经网络结构，是为了解决识别问题的需要而出现
此后，识别问题又促进了深度学习的进化
如此往复
我希望能够骄傲地看到各位同学
成为计算机视觉识别*和*神经网络的专家。
希望你们不但能掌握这些工具
也能用这些工具来解决实际的问题
那么下面，这是简史
简史，可不是短史
我们要从5亿4千万年前说起……

English: 
fully decoupled; they inform each
other. And you'll see through the history of
deep learning a little bit that the
convolutional neural network
architecture came from the need to solve
a vision problem, and
then the vision problem helps the deep
learning algorithm to evolve, back
and forth. So it's really important;
you know, I want you to finish this
course and feel proud that you're a student
of computer vision and of deep learning,
so you have both the tool set and the
in-depth understanding of how to use the
tools to tackle important
problems. So it's a brief history, but that
doesn't mean it's a short history. So
we're going to go all the way back to
five hundred forty
million years ago. So why did I pick

English: 
this? You know, on the scale of the
earth's history, this is a fairly specific
range of years. Well, I don't know if
you have heard of this, but this is a
very, very curious period of the Earth's
history. Biologists call this the Big
Bang of evolution. Before
five hundred forty million
years ago, the earth was a very peaceful
pot of water. I mean, it's a pretty big pot
of water. So we had very simple
organisms; these are like animals that
just float in the water, and the way
they eat on a daily basis is, you
know, they just float, and if some kind of
food comes by near their mouths or
Chinese: 
我为什么选择了这样一个特定的时间段呢？
在地球的历史上，这是一段相当传奇的时期。
我不知道你们有没有听说过，不过这确实是地球历史中很奇特的一段。
生物学家称之为“进化爆炸”（寒武纪生命大爆发）
5亿4千万年前
地球是一锅非常平静的水
一大锅水
我们有非常简单的生物圈，所谓动物就只是漂在水里
他们进食的方式就是张着嘴漂着
等着嘴边的食物撞进来

Chinese: 
他们就张开嘴吞下去就好了
我们也没有很多种类的生物
但是5.4亿年前，奇怪的事情发生了
从化石研究来看，物种数量突然地就爆发了
生物学家称之为speciation
那么突然间，由于某种原因，生物开始变得多样化
他们发展出非常非常复杂的形态
出现了食肉动物，猎食者们发展出
各种各样的工具来帮助自己生存
到底是什么力量触发了这一切？
这是一桩悬案
可能是诸如小行星撞地球啊，环境变迁啊之类的
一个最有说服力的理论

English: 
whatever, they just open the mouth and
grab it. And we don't have too many
different types of animals. But something
really strange happened around five
hundred forty million years ago: suddenly,
from the fossils we study, there's a huge
explosion of species, which biologists call
speciation. Like, suddenly, for some reason,
something hit the earth so that animals
started to diversify and
get really complex, and you
start to have
predators and prey, and then they have
all kinds of tools to survive. What
was the triggering force of this was a
huge question, because people were saying,
oh, did, you know, another set of, whatever,
meteoroids hit the earth, or, you know,
did the environment change? It turned out one
of the most convincing theories is by
English: 
this guy called Andrew Parker. He's a
modern zoologist from
Australia. He studied a lot of fossils,
and his theory is that it was the onset
of the eyes. So one of the first
trilobites developed an eye, a really,
really simple eye. It's almost like a
pinhole camera that just catches light
and makes some projections and registers
some information from the environment.
Suddenly life is no longer so mellow,
because once you have the eye, the first
thing you can do is you can go catch
food. You actually know where food is;
you're not just, like, blind and
floating in the water. And once you can go
catch food, guess what, the food had better
develop eyes and run away from you,

Chinese: 
是Andrew Parker提出的
他是来自澳大利亚的现代动物学家
他研究了很多化石，得出了结论：
这一切都是源于眼睛的出现
那么，第一个先驱发展出了一个非常非常简陋的眼睛
也就和针孔相机（小孔成像）差不多
只能捕捉到光线，察觉到一点环境的信息
突然之间
生活不再那么平淡
因为，有了眼睛之后
第一件事就是可以去捉些食物
你现在知道食物在哪儿了
再不是漂在水里的瞎子了
而当你能去抓食物之后，你猜怎么着？
那些食物最好赶紧长出眼睛来，从你身边跑掉
否则它们就要挂了

Chinese: 
所以第一只有眼睛的动物
简直就像进了谷歌员工的自助餐厅
它拥有最美好的时光，什么都随便它吃
因为这些眼睛的出现
生物们展开了“军备竞赛”
每一只动物都得学着去弄出来点什么东西，挣扎求存
在这种突然的物种爆发中，出现了食肉动物和猎食者
所以，这就是5.4亿年前，视觉出现时的情景
视觉能力不仅仅是“出现”而已
实际上它正是进化大爆发的最主要驱动力
我们暂时不在进化历史上探究太多细节

English: 
otherwise they'll be gone. You know,
so the first animals that
had eyes were like in an
unlimited buffet. It's like working at
Google: they just have the best
time, you know, eating everything they can.
But because of this onset of the eyes,
what the zoologists realized
is that the biological arms race
began. Every single animal needs
to learn to develop things to survive, or
to, you know, you suddenly have
prey and predators and all this,
and the speciation began. So that's when
vision began, 540 million
years ago. And not only did vision begin, vision
was one of the major driving forces of
the speciation, or the Big Bang of
evolution. Alright, so we're not going
to follow evolution with too much

Chinese: 
视觉领域另一项非常重要的突破
在工程技术方面
发生在文艺复兴时期
由达·芬奇这个传奇人物发明
在文艺复兴之前，全球各地的文明
从亚洲到欧洲、美洲、非洲
我们曾经见识过照相机的模型
亚里士多德曾经用树叶制作过相机
中国先贤墨子曾经用带小孔的盒子制作过相机
但是如果你去了解第一份描述现代照相机原理的文书
你会找到"照相暗盒“
是由利奥纳多·达·芬奇描述的
我不会探究太多的细节

English: 
detail. Another big, important work that
focuses on the engineering of vision
happened around the Renaissance, and of
course it's attributed to this amazing
guy,
Leonardo da Vinci. So before the Renaissance,
you know, throughout human civilization,
from Asia to Europe to India to the Arabic
world, we have seen models of cameras. So
Aristotle has proposed the camera
through the leaves; the Chinese philosopher
Mozi has proposed the camera through a
box with a hole. But if you look at the
first documentation of a really
modern-looking camera,
it's called the camera obscura, and
that is documented by Leonardo da Vinci.
I'm not going to get into the details,

Chinese: 
但是你能看到，这里至少有个镜头之类的东西
或者至少是个小孔，用来捕捉真实世界反射的光线
这里面还有一些机制来捕捉
来自真实世界的图像信息
那么，这就是现代视觉工程技术的开端了。
从“想要复制这个世界”开始
希望能为看到的世界留下一份视觉拷贝
这并没有涉及到试图去理解看到的信息
这时候我们只是在复制我们看到的信息
这是一个值得被铭记的重要成就
当然，自照相暗盒之后，我们有了一系列的进步

English: 
but, you know, you get the idea
that there is some kind of lens, or at
least a hole, to capture light reflected
from the real world, and then there is
some kind of projection to capture the
information of the real
world image. So that's the beginning of
the modern, you know, engineering of
vision.
It started with wanting to copy the
world, wanting to make a copy of the
visual world. It hasn't got anywhere
close to wanting to engineer the
understanding of the visual world; right
now we're just talking about duplicating
the visual world. So that's one important
work to remember. And of course, after the
camera obscura, we start to see

English: 
a whole series of successes: you know,
film gets developed; you know,
Kodak was one of the first
companies developing commercial cameras,
and then we start to have camcorders and
all this. Another very
important piece of work that I want you
to be aware of as vision students is
actually not an engineering work but a
piece of science work that
starts to ask the question: how does
vision work in our biological brain? You know,
we now know that it took 540 million
years of evolution to get a really
fantastic visual system in mammals, in
humans. But what did evolution do during
this time? What kind of architecture did
it develop, from that simple trilobite

Chinese: 
比如电影的发明
比如柯达开发出商用的相机产品
我们现在还有了摄像机之类的产品
另一项非常重要的工作
你们作为（计算机）视觉的学生应该知道的
并不是工程技术的成果
而是科学上的成就
那就是：生物的大脑是如何处理视觉信息的？
我们现在知道，我们用了5.4亿年进化出了
人类这样非常神奇的视觉系统
但是这段时间进化究竟做了些什么？
我们到底用着什么样的"架构"?

Chinese: 
从三叶虫到人类的眼睛，到底有什么样的变迁？
哈佛进行了一项重要研究
由两位当时非常年轻的博士后Hubel和Wiesel主持
他们弄了一只清醒的，但是被麻醉了的猫
并制作一根电极探针
打开了猫的头骨
将这根探针插入猫的大脑内的
基础视觉皮质层(primary visual cortex)中
这部分区域的神经元处理着和视觉相关的大量工作
但是此前我们并不知道基础视觉皮质层到底做着什么样的工作
我们只知道它负责眼睛看到东西之后

English: 
eye to today's, yours and mine? Well, a very
important piece of work happened at
Harvard, by two, at that time, young,
very ambitious postdocs, Hubel and
Wiesel. What they did is that they used
awake but anesthetized cats, and there
was enough technology to build this
little needle called an electrode, to push
the electrode through
the open skull into the brain of
the cat, into an area that we already
knew as the primary visual cortex. The primary
visual cortex is an area where neurons do
a lot of things for visual
processing, but before Hubel and Wiesel we
didn't really know what the primary visual
cortex was doing. We just knew it's one of
the earliest stages, other than your eyes

English: 
of course, but the earliest stage for visual
processing, and there are tons and tons of
neurons working on vision, and we really
ought to know what this is,
because
that's the beginning of vision, of the visual
process in the brain. So they put
this electrode into the primary visual
cortex, and interestingly (this is another
interesting fact; if I don't drop all my
stuff I'll show you)
the primary visual cortex, the first stage, or
second depending on where you count from (I'm
being very, very rough here), the first
stage of your cortical visual processing,
is in the back of your brain, not
near your eye. Okay, it's very interesting,
because your olfactory cortical
processing is right behind your nose,
your auditory is right behind your

Chinese: 
视觉处理流程中最为前期的部分
有难以计数的神经元参与这个流程
我们应该搞清楚它到底是怎么运作的
因为这是大脑处理视觉的开端
所以他们把电极插入了猫的基础视觉皮质层中
另一个有趣的事实是
我把东西放下演示给你们看啊
基础视觉皮质层
视觉处理流程的第一站
或者第二？取决于要不要把眼睛算进去
但是关键流程的第一步，是在后脑勺的位置上
而不是紧挨着眼睛
这个可真有意思
因为你的嗅觉处理部分是紧挨着鼻子
听觉处理的部分是紧挨着耳朵
然而基础视觉皮质层

Chinese: 
却在离眼睛最远的位置
另一个有意思的事儿
并不是只有基础这一块参与了视觉处理流程
差不多有50%的大脑都参与着视觉处理的过程
视觉是大脑的感知任务中，最艰难
也是最重要的一项工作
我倒不是说别的感官没用
但是自然进化用了如此长的时间
开发出我们的感知系统
视觉在其中占据了如此多的资源
为什么呢？
因为它太重要了
而且太***难了
所以它才占据了如此多的资源
那我们回到Hubel和Wiesel的实验上来
他们踌躇满志
想要搞清楚基础视觉皮质层到底在做什么
因为这是我们的深度学习神经网络的第一步知识

English: 
ear, but your primary visual cortex is
the farthest from your eye. And another
very interesting fact: not only
the primary, there's a huge area working
on vision; almost 50% of your brain is
involved in vision. Vision is the hardest
and most important sensory, perceptual,
cognitive system in the brain. You know,
I'm not saying anything else is
not useful, clearly, but, you know, it
took nature this long to develop this
sensory system, and it takes nature
this much real estate space to be used
for this system. Why? Because it's so
important and it's so damn hard. That's
why we need to use this much space. Let's
get back to Hubel and Wiesel. They were
really ambitious: they wanted to know what the
primary visual cortex is doing, because
this is the beginning of our knowledge

English: 
for the deep learning neural network. So
they were showing cats, so they put the
cats in this room, and they were
recording neural activities. And when I say
recording neural activity,
they're basically trying to see, you know,
if I put the neural electrode here,
do the neurons fire
when they see something. So for example,
if they show the cat,
their idea is, if I show this kind of fish
(you know, apparently at that time cats
ate fish rather than these beans),
would the cat's neurons, like, be
happy and start sending spikes? And
the funny thing here is a story of
scientific discovery: scientific
discovery takes both luck and care and
thoughtfulness. They were showing this
cat fish, whatever, mouse, flower; it just

Chinese: 
他们给猫看——
哦，他们先把猫放到屋子里
然后他们记录神经元的活动
我所说的“记录神经元活动”
基本上是指他们去观察
如果把电极放在这里
当看到东西时，神经元是否被激发
比如说，他们给猫看一张
鱼的图片
唔，看来那个时候猫还在吃鱼呢
那么，神经元会兴奋起来，发送脉冲吗？
这个故事教导我们
科学发现需要运气，细心和深思熟虑
他们给猫看了鱼的图片，耗子的图片
花的图片
结果全都没有用

English: 
doesn't work. The cat's neurons in the
primary visual cortex were silent; there
was no spiking, or very little spiking,
and they were really frustrated. But the
good news is that there was no computer
at that time, so what they had to do
when they showed these cats the stimuli
is they had to use a slide projector.
So they put in a slide of a fish and
then waited for the neuron to spike. If the
neuron didn't spike, they took the slide
out and put in another slide. And then
they noticed, every time they changed slides
like this, you know, the squarish
film (I don't even remember if they used
glass or film, but whatever), the neuron
spikes. That's weird! You know, like, the
actual mouse and fish and flower didn't
excite the neuron, but
the movement of taking a slide out

Chinese: 
猫的基础视觉区一片沉寂
没有任何脉冲
他们真的相当沮丧
然而
好消息是，
当时还没有计算机
所以他们想给猫看图片的话
得用幻灯片投影才行
所以他们放一张鱼的幻灯片
等着神经元的脉冲
如果没有，就拿出这张幻灯片，换下一张
结果他们发现
他们每次换幻灯片的时候
就像用……我不知道你们用不用什么胶片，无所谓
神经元被激活了
这很诡异啊
真正的鱼，耗子，花的图片都没有激活神经元
结果把幻灯片拿出去的动作反而激活了它
或者放进去的动作

Chinese: 
总不能说是猫在想
“哦他们总算给我换新图片了”
这表明了
更换幻灯片的动作
生成了一个“边缘”
可能是方形，矩形或者圆形之类的
这个移动的边缘
激活了这些神经元
他们立刻捕捉到了这个发现
如果他们太沮丧了，或者太粗心
便会漏掉这个发现
但他们没有
他们立刻深入探究
最终发现
基础视觉区的神经元是按一列一列的组织起来
每一列神经元只“喜欢”某一种特定的形状
某种简单的线条组合

English: 
or putting a slide in did excite the neuron.
It can't be that the cat is thinking, oh, finally
they're changing, you know, to a new
object for me. So it turned out there's
an edge that's created by the slide
that they're changing, right? The slide,
whatever, is a square, rectangular plate,
and that moving edge drove, or excited,
the neurons. So they really chased
after that observation. You know, if they
were too frustrated or too careless, they
would have missed that, but they were not.
They really chased
after that, and realized neurons in the
primary visual cortex are organized in
columns, and for every column of
neurons, they like to see a specific
orientation of the stimuli,

English: 
simple oriented bars, rather than the
fish or mouse. You know, I'm making this a
little bit of a simple story, because
there are still neurons in the primary
visual cortex for which we don't know what they
like; they don't like simple oriented
bars. But by and large, Hubel and Wiesel found
that the beginning of visual processing
is not a holistic fish or mouse; the
beginning of visual processing is simple
structures of the world: edges, oriented
edges. And this has very deep
implications for both neurophysiology and
neuroscience, as well as engineering
modeling. Later, when we visualize
our deep neural network features, we'll
see that simple, edge-like
structures emerge from our

Chinese: 
而不是鱼或者耗子
我讲这个小故事
有很多基础视觉区的神经元，我们不知道它们喜欢什么
但是总体来说，Hubel和Wiesel发现
视觉的最初，并不是对整体的鱼或者耗子进行处理
视觉处理流程的第一步，是对简单的形状结构处理
边缘，排列
这对认知科学、神经科学、工程模型
都产生了极为深远的影响
如果以后我们实现一些深度神经网络
我们会看到
简单的边缘结构
出现在我们的模型中

English: 
model. And even though the discovery was
in the late 50s and early 60s, they won a
Nobel Prize in medicine for this work in
1981. So that was another very important
piece of work related to vision and
visual processing. And so when did
computer vision begin? That's another
interesting
story, its history. The
precursor of computer vision as a modern
field was this particular dissertation
by Larry Roberts in 1963. It's called
Block World. Just as Hubel and Wiesel
were discovering that the visual world
in our brain is organized
by simple edge-like structures, Larry

Chinese: 
尽管这个发现是在50-60年代期间
他们还是在1981年凭借这个贡献
获得了诺贝尔医学奖
这就是视觉研究领域的另一项极为重要的成就
那么，计算机视觉领域，起于何时呢？
这是另一个有趣的故事了
现代计算机视觉领域的先驱
是这篇Lary Roberts在1963写的论文
名字叫做“方块世界”（block world）
就像Hubel和Wiesel发现的那样
我们大脑对视觉信息的处理是基于边缘和形状的

Chinese: 
Larry Roberts作为早期计算机科学的一名博士生
试图从图像中解析出这些边缘和形状
作为一项工程上的成果
在这个实例中，他的目标……
我们作为人类，能够识别出来图中的这块东西
我们知道这两张图是同一块东西
尽管光照和朝向都不同了
就如Hubel和Wiesel所说
是边缘决定了结构
这些边缘定义了形状，它们不会改变
所以Larry Roberts写了这篇博士论文
来解析出图片中的这些边缘
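
As an illustration of what extracting edges from an image can look like, here is a minimal sketch of the Roberts cross operator, a simple diagonal-difference edge detector commonly attributed to Roberts. It is not claimed to be the exact procedure of the dissertation discussed here; the tiny test image and the function name are hypothetical, and plain Python lists stand in for an image array.

```python
# A minimal sketch of the Roberts cross edge operator on a grayscale image
# stored as a list of rows. Illustrative only; not taken from the lecture.


def roberts_cross(image):
    """Return the gradient magnitude at each 2x2 neighborhood.

    image: list of equal-length rows of numbers (grayscale intensities).
    The output has one fewer row and one fewer column than the input.
    """
    height, width = len(image), len(image[0])
    edges = []
    for y in range(height - 1):
        row = []
        for x in range(width - 1):
            # Differences along the two diagonals of the 2x2 window.
            gx = image[y][x] - image[y + 1][x + 1]
            gy = image[y][x + 1] - image[y + 1][x]
            row.append((gx * gx + gy * gy) ** 0.5)
        edges.append(row)
    return edges


# Hypothetical 4x4 image: dark left half, bright right half (a vertical edge).
block = [[0, 0, 1, 1] for _ in range(4)]
# The same scene under uniformly brighter lighting.
brighter = [[value + 5 for value in row] for row in block]

if __name__ == "__main__":
    for row in roberts_cross(block):
        print(row)
    # A uniform brightness offset cancels out in the differences.
    print(roberts_cross(block) == roberts_cross(brighter))
```

The response is nonzero only where the intensity changes, and it is unchanged when a constant is added to every pixel. That is a very limited version of the idea that edges stay put while lighting varies; real lighting changes are not uniform offsets, so this is only a toy case.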

English: 
Roberts, as an early computer
science PhD student, was trying to
extract these edge-like structures in
images, as a piece
of engineering work. And in this
particular case, his goal is that, you
know, both you and I as humans can
recognize blocks no matter how they're
turned, right? Like, we know it's the same
block; these two are the same block even
though the lighting changed and the
orientation changed. And his conjecture
is that, just like Hubel and Wiesel told us, it's
the edges that define the structure;
the edges define the shape, and
they don't change, unlike all these
interior things. So Larry Roberts wrote a
PhD dissertation to just extract these

English: 
edges. You know, if you work as a PhD
student in computer vision, this is like,
you know, this is like undergraduate computer
vision; it wouldn't have been a PhD thesis.
But that was the first precursor
computer vision PhD thesis. Larry
Roberts is interesting: he kind of gave
up his work in computer vision
afterwards and went to DARPA and was
one of the inventors of the Internet. So,
you know, he didn't do too badly by
giving up computer vision. But we always
like to say that the birthday of
computer vision as a modern field is in
the summer of 1966. The summer of 1966:
the MIT artificial intelligence lab was
established before that. Actually, for one
piece of history you should feel proud
as a Stanford student: there are two
pioneering artificial intelligence labs

Chinese: 
如果你是计算机视觉的一名博士生
这个看起来像是本科学生的学业
不大能成为一篇博士论文
但是这是计算机视觉领域第一篇开创性的博士论文
Larry Roberts后来放弃了计算机视觉领域的研究
进了DARPA
也就是现在互联网的前身（发明者之一）
他放弃了计算机视觉研究之后做得也不错
但是我们仍然总是说
计算机视觉，作为现代科学领域
它的生日
在1966年夏季
1966年夏天，MIT（麻省理工学院）
人工智能实验室成立了
作为斯坦福的学生应该自豪的是
实际上在60年代早期，有两个前瞻性的AI实验室

English: 
established in the world in the early
1960s, one by Marvin Minsky
at MIT, one by John McCarthy at Stanford.
At Stanford, the artificial
intelligence lab was established before
the computer science department, and
Professor John McCarthy, who founded the AI
lab, is the one who is responsible for
the term artificial intelligence. So
that's a little bit of proud Stanford
history. But anyway, we have to give MIT
this credit for starting the field of
computer vision, because in the summer of
1966 a professor at the MIT AI lab decided
it's time to solve vision. You know, so AI
was established, we start to understand,
you know, first-order logic and all
this, and I think Lisp was probably
invented at that time. But anyway, vision
is so easy: you open your eyes, you see

Chinese: 
全世界建立了两个
一个是Marvin Minsky在MIT建立的
另一个是John McCarthy在斯坦福建立的
人工智能实验室甚至比计算机科学系建立的还要早
建立了AI实验室的John McCarthy教授
是AI（artificial intelligence）一词的提出者
这是斯坦福值得自豪的一个小历史故事
但是不论如何
我们将给予MIT建立计算机视觉学科的荣誉
因为在1966年夏天
MIT AI实验室的一名教授决定
着手解决计算机视觉的问题
于是AI实验室建立起来了
我们开始弄明白
Lisp可能就是在那时候发明的
不管怎么样吧
（计算机）视觉多简单啊！
睁开眼睛你就能看到啦！

Chinese: 
这玩意能有多难！
我们这个夏天就把它搞定吧！
MIT的学生总是很聪明的^_^
夏季视觉项目尝试高效地使用我们的暑期员工
来建立视觉系统中意义重大的一部分
这就是那个夏天他们的目标
可能他们没有“高效地使用暑期员工”
但不论怎么说，计算机视觉问题没能在那个夏天被搞定
在那之后，计算机视觉成为了人工智能方面
增长最快的一个领域
如果你参加一个顶级的计算机视觉大会
叫做CVPR或者ICCV
就能看到我们有2000到2500名研究人员
在全球各地进行这方面的研究
有一个相当实用的信息
就是

English: 
the world; how hard can this be? Let's
solve it in one summer.
So, especially, MIT students are smart,
right? So the Summer Vision Project is an
attempt to use our summer workers
effectively in the construction of a
significant part of a visual system. This
was the proposal for that summer, and
maybe they didn't use their summer workers
effectively, but in any case
computer vision was not solved in that
summer. Since then, computer vision has become the
fastest growing field of
AI. If you go to today's premier computer
vision conferences, called CVPR or ICCV,
we have like 2,000 to 2,500 researchers
worldwide attending this conference. And
a very practical note for students:

Chinese: 
如果你是一个计算机视觉/机器学习领域的优秀学生
根本不用担心在硅谷找不到工作
或者随便在什么地方
这的确是一个让人兴奋的领域
1966年是计算机视觉的诞生年
这意味着今年是它的50周年
非常值得庆祝的一年
我们艰难跋涉了如此之远
那么我们继续计算机视觉的历史
这也是一位值得铭记的人物，David Marr
他当时也在MIT
与一些才华横溢的科学家合作，比如
David Marr本身去世得非常早
在70年代（勘误：应为1980年）
他写了一本非常有影响力的书
《视觉》，非常薄的一本书

English: 
if you are a good computer vision slash
machine learning student, you will not
worry about jobs in Silicon Valley or
anywhere else. So it's actually
one of the most exciting fields. But that
was the birthday of computer vision,
which means this year is the 50th
anniversary of computer vision. That's a very
exciting year in computer vision. We
have come a long, long way.
Okay, so continuing on the history of
computer vision, this is a person to
remember: David Marr. He was also at
MIT at that time, working with a number
of very influential computer vision
scientists, Shimon Ullman and Tommy
Poggio. David Marr himself died early,
in the 70s, and he wrote a very influential

English: 
book called Vision. It's a very thin book,
and in David Marr's
thinking about vision, he took a lot of
insights from neuroscience. We already
said that Hubel and Wiesel gave us the
concept of simple structure: vision
starts with simple structure; it didn't
start with a holistic fish or holistic
mouse. David Marr gave us the next
important insight, and these two insights
together are the beginning of the deep
learning architecture: that vision is
hierarchical. You know, so Hubel and Wiesel
said, okay, we start simple, but Hubel and Wiesel
didn't say we end simple. This visual
world is extremely complex. In fact, if I
take a picture, a regular picture, today
with my iPhone (I don't know my
iPhone's resolution; let's suppose it's
like 10 megapixels), the potential

Chinese: 
David Marr认为视觉
他从神经科学领域领悟了很多
Hubel和Wiesel给了我们一些概念
视觉处理流程从一些简单形状开始
而不是整体的鱼或者耗子
David Marr给了我们第二个重要的领悟
这两个领悟共同形成了现在的深度学习架构的基石
这个领悟就是：视觉是分层的
所以Hubel和Wiesel阐述了我们应该从简单的形状开始
但并没有说我们也在简单的形状结束
这个世界极为复杂
实际上我用我的iPhone拍一张照片
我并不知道iPhone的分辨率
假设就是一千万像素好了

English: 
combinations of pixels to form a picture
in that is bigger than the total number
of atoms in the universe that's how
complex vision can be it's
really really complex so Hubel and Wiesel
told us start simple David Marr told us
build a hierarchical model of course
David Marr didn't tell us to build it in
a convolutional neural network which we
will cover for the rest of the quarter
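A rough back-of-the-envelope check of the pixel-combination remark above. The sketch assumes 8-bit RGB pixels and the commonly quoted rough figure of about 10^80 atoms in the observable universe; neither number comes from the lecture, and the function names are made up for illustration. It works in log10 because the raw counts are far too large to print.

```python
import math

# Assumptions (not from the lecture): 8-bit RGB pixels and a rough
# estimate of 10**80 atoms in the observable universe.
MEGAPIXELS = 10
VALUES_PER_CHANNEL = 256
CHANNELS = 3
LOG10_ATOMS = 80


def log10_image_count(num_pixels, channels=CHANNELS, values=VALUES_PER_CHANNEL):
    """Return log10 of the number of distinct images with num_pixels pixels."""
    return num_pixels * channels * math.log10(values)


def min_pixels_to_exceed(log10_target, channels=CHANNELS, values=VALUES_PER_CHANNEL):
    """Smallest pixel count whose number of images exceeds 10**log10_target."""
    per_pixel = channels * math.log10(values)
    return math.floor(log10_target / per_pixel) + 1


if __name__ == "__main__":
    digits = log10_image_count(MEGAPIXELS * 10**6)
    print(f"log10(number of {MEGAPIXELS} MP images) is about {digits:.3g}")
    print("pixels needed to exceed 10^80:", min_pixels_to_exceed(LOG10_ATOMS))
```

Under these assumptions the count for a 10-megapixel photo has tens of millions of digits, and a patch of only 12 such pixels already has more possible values than 10^80.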
but his idea is
this to represent or to think about
an image we think about it in several
layers the first one he thinks we should
think about the edge image which is
clearly an inspiration
he took the inspiration from Hubel and Wiesel
and he personally calls this the primal
sketch you know the name is
self-explanatory and then you think

Chinese: 
这么多像素的可能的排列组合
比宇宙中的原子总数还多
这说明了视觉究竟有多复杂
那么，Hubel和Wiesel告诉了我们从简单的形状开始
David Marr告诉我们建立一个分层的模型
当然David Marr并没有说“建立一个卷积神经网络吧”
那是我们后面整个季度都要去讲的
他的想法是，将图像表现为
把图像想成
由多个层构成
他认为第一层应该是边缘结构
显然是从Hubel&Wiesel处得到的灵感
他称之为原始草图（Primal sketch）
这个名字就足以解释自己的意思了

Chinese: 
接下来一层，他称之为2 (1/2)D
这里你将2D的图像信息调整为包含真实世界的3D信息
在这里你会认识到层次关系
比如我现在看到你们
我不会以为你们都只有脖子以上的部分
虽然我只看到了这些
但是我知道那是因为你们被前排给挡住了
这是一个计算机视觉领域非常核心的问题
我们会有遮挡的问题
自然界就会产生遮挡的问题
因为世界是3维的
而我们的视觉成像是2维的
大自然先解决了这个问题
用了一个硬件上的技巧
就是用两只眼睛而非一只
但是这就需要一大堆软件上的技巧
来将两只眼睛看到的信息合并到一起
所以我们在计算机视觉上也同样
需要解决2.5D的问题

English: 
about two-and-a-half-D this is where you
start to reconcile your 2d image with a
3d world you recognize there are layers
right you know I look at you right
now
I don't think half of you only has a
head and a neck even though
that's all I see I know
you're occluded by the row in front of
you and this is the fundamental
challenge of vision we have an ill-posed
problem to solve nature had an ill-posed
problem to solve because the world is 3d
but the imagery on our retina is 2d
nature solved it by first a hardware
trick which is two eyes it didn't use one
eye but then there's going to be a whole
bunch of software tricks to merge the
information of the two eyes and all this
so the same thing with computer vision
we have to solve that two-and-a-half-D
problem and then eventually we have to

English: 
put everything together so that we
actually have a good 3d model of the
world why do we have to have a 3d model
of the world because we have to survive
navigate manipulate the world when I
shake your hand I really need to know
how to you know extend out my hand and
grab your hand in the right way that is
a 3d modeling of the world otherwise I
won't be able to grab your hand in the
right way when I pick up a mug the same
thing so that's
David Marr's architecture for vision it's
a very high-level abstract architecture
it doesn't really inform us exactly what kind
of mathematical modeling we should use
it doesn't inform us of the learning
procedure and it really doesn't inform
us of the inference procedure which
we will be getting to through the deep

Chinese: 
最终，我们还是需要把一切整合起来
得到一个3D的世界的模型
我们为什么需要3D的模型呢？
因为我们需要生存，需要导航，需要改造这个世界
当我和你握手的时候
我必须知道
怎么伸出我的手去
通过合适的路线抓住你的手
在3D的空间中
否则我就没法判断该怎么去抓你的手
拿起杯子也是一样的
那么，这就是David Marr的
关于视觉的模型架构
这是一个高度抽象的架构
并不能指引我们建立一个什么样的数学模型
它指引我们去了解学习的过程
也指出了推理的过程
在深度学习网络中指导我们

English: 
learning network architecture but that's
the high-level view and it's
an important concept
to learn in vision and we call this
the representation a couple of really
important works and this is a little bit
Stanford centric just to show you as
soon as David Marr laid out this
important way of thinking about vision
the first wave of visual recognition
algorithms went after that 3d model
because that's the goal right like no
matter how you represent the stages
the goal here is to reconstruct a
3d model so that we can recognize objects
and this is really sensible because
that's what we go to the world and do so
both of these two influential works come
from Palo Alto one is from Stanford one
is from SRI so Tom Binford was a

Chinese: 
但这是一个高屋建瓴的，概念性的思想
非常重要的概念性思想
我们称之为representation
有一些非常重要的成果
展示给你们了解
在David Marr提出这种思考方式
之后不久
涌现出了第一波视觉识别算法
紧随着3D模型的思路
因为这就是目标，对吧？
目标就是重建3D模型，以便我们识别它
识别出一个物体
这是合情合理的
因为我们在现实中就是这么做的
这两个重大成果都出自Palo Alto（斯坦福所在城市）
一个来自斯坦佛，一个来自SRI

English: 
professor at Stanford AI lab and he had
his student Rodney Brooks proposed
one of the first so-called generalized
cylinder models I'm not going to get into
the details but the idea is that the
world is composed of simple shapes like
cylinders blocks and then any real-world
object is just a combination of these
simple shapes given a particular viewing
angle and that was a very influential
visual recognition model in the 70s
and Rodney Brooks went on to become the
director of MIT's AI lab and he was
also a founding member of the iRobot
company
Roomba and all this so he continued very
influential AI work
another interesting model coming from

Chinese: 
Tomas Binford是一名斯坦佛AI实验室的教授
他和他的学生Brooks提出了"generalized Cylinder"模型
我不会深入讨论它的细节
但它的主旨是
整个世界都是由
简单的形状组成
比如
圆柱体
世界上所有的实体
都只不过是这些形状的组合
再从不同的视角观察而已
这是一个在70年代
非常有影响力的
视觉识别模型
Brooks后来成为了斯坦福AI实验室的主任
同时，他也是iRobot公司的联合创始人
一直都在从事着AI方面的工作
另一个有趣的模型
来自于我们这里的斯坦福研究院（SRI）

English: 
the local Stanford Research Institute I
think SRI is across the street from
El Camino is this pictorial structure
model it's very similar it has
less of a 3d flavor but more of a
probabilistic flavor it is that the objects
are made of still simple parts like a
person's head is made of eyes and nose
and mouth and the parts are connected
by springs allowing for some deformation
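A minimal sketch of the parts-and-springs idea just described, not the original model's formulation: each part pays an appearance cost, and each spring pays a quadratic penalty for deviating from its rest offset. The part names, rest offsets, costs, and stiffness below are hypothetical values chosen only for illustration.

```python
# Hypothetical toy example: part names, rest offsets, appearance costs and
# stiffness are made up for illustration, not taken from any paper.

def spring_cost(pos_a, pos_b, rest_offset, stiffness=1.0):
    """Quadratic penalty for stretching the spring between two parts."""
    dx = (pos_b[0] - pos_a[0]) - rest_offset[0]
    dy = (pos_b[1] - pos_a[1]) - rest_offset[1]
    return stiffness * (dx * dx + dy * dy)


def configuration_cost(positions, appearance_cost, springs):
    """Total cost = per-part appearance mismatch + deformation of each spring."""
    total = sum(appearance_cost[part] for part in positions)
    for (a, b), rest in springs.items():
        total += spring_cost(positions[a], positions[b], rest)
    return total


# Rest offsets (dx, dy) between connected parts, in pixels.
springs = {("left_eye", "right_eye"): (30, 0), ("left_eye", "mouth"): (15, 40)}
appearance = {"left_eye": 0.2, "right_eye": 0.1, "mouth": 0.3}

# face_a sits exactly at the rest layout; face_b has slightly wider-set eyes
# and a lower mouth, so its springs are stretched.
face_a = {"left_eye": (100, 100), "right_eye": (130, 100), "mouth": (115, 140)}
face_b = {"left_eye": (100, 100), "right_eye": (134, 100), "mouth": (115, 143)}

if __name__ == "__main__":
    print(configuration_cost(face_a, appearance, springs))
    print(configuration_cost(face_b, appearance, springs))
```

The second face still matches, only at a higher cost, which is one way to read "allowing for some deformation": variability is tolerated but penalized.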
so this is getting a sense of okay we
recognize the world not every one of you
has exactly the same eyes and the
distance between the eyes we allow for
some kind of variability so this concept
of variability starts to get introduced
in a model like this and using models
like this you know the reason I want

Chinese: 
就在El Comino，街对面就是
这个模型称为Pictorial Structure
它主要专注于
3D方面的特色就比较少
反而是概率模型的味道浓一些
它也认为物体由简单的部分组成
比如人的头部由眼睛鼻子和嘴组成
各个部分之间由“弹簧”连接起来
允许有一些变形
ok，我们知道这个世界
你们眼睛之间距离并不都是完全一致的
有一定的多样性
多样性的概念引出了这样一个模型
利用这样的一些模型
我给你们看这些模型的原因

Chinese: 
是让你们知道在80年代的成果是多么simple
但这已经是80年代最为震撼的成果了
识别整个世界
整篇论文中，“整个世界”就是这些剃须刀了
用这些边缘
以及这些边缘组成的简单的形状
来识别这些物体。这就是David Lowe的成果
他也是斯坦福的学生
所以这就是计算机视觉世界古时候的情况了
我们能看到黑白的甚至是合成的图像
到了90年代
我们终于开始着手处理彩色的画面了
这可是个大进步
另一项重大的成果
这个并不是去识别图片中的物体

English: 
to show you this is to see how simple
the work was in the 80s this is one of
the most influential models in the 80s
for recognizing real world objects and the
entire paper's real world objects are
these shaving razors but using the
edges and simple shapes formed by
the edges to recognize this is by
David Lowe another Stanford
graduate so that's kind
of the ancient world of computer vision
we have been seeing black and white or
even synthetic images starting in the 90s
we finally start to move into like
colorful images of the real world it was a
big change again a very very influential
work
here it's not particularly about

English: 
recognizing an object it's about how
to carve out an image into
sensible parts right so if you enter
this room there's no way your visual
system is telling you oh my god I see so
many pixels right you immediately have
grouped things you see heads heads heads
chair chair chair a stage platform piece
of furniture and all this this is called
perceptual grouping perceptual grouping
is one of the most important problems in
vision biological or artificial if we
don't know how to solve the
perceptual grouping problem we're going
to have a really hard time to deeply
understand the visual world and you
will learn towards the end of this
course a problem as
fundamental as this is still not
solved in computer vision even though we

Chinese: 
而是去将图片分割成有意义的几部分
比如你进到这间屋子里
你的视觉系统不会告诉你
“哦天哪，好多像素”
对吧？你一瞬间就将像素按物品分组了
你看到了帽子帽子帽子
椅子椅子椅子
舞台，平台，家具……
这个称为感知分组
感知分组是视觉领域最为重要的问题
不论是生物视觉还是人工视觉
如果我们解决不了感知分组的问题
我们就会在深入理解视觉世界上遇到巨大的困难
你会在这堂课，这个课程结束时发现
如此基础的一个问题并没有彻底得到解决
尽管我们得到很多进展——

English: 
have made a lot of progress before deep
learning and after deep learning we're
still grasping for the final solution of a
problem like this so this is again
why I want to give you this introduction
for you to be aware of the deep
problems in vision and also the
current state and the challenges in
vision we did not solve all the problems
in vision despite whatever the news says
you know like we're far from developing
terminators who can do everything yet so
this piece of work is called normalized
cut it is one of the first computer vision
works that takes real-world images and
tries to solve a very fundamental
difficult problem and Jitendra Malik is
a senior computer vision researcher
now professor at Berkeley also a Stanford
Graduate and you can see the results are

Chinese: 
深度学习之前和之后都有
我们仍然在苦苦追寻着，这类问题的终极答案
这也是为什么我希望给你们上这节介绍课程
让大家了解到视觉领域的深层次的问题
同时也是我们面对的挑战
我们远远没有解决所有的问题
不管新闻怎么说
我们离建造什么都能干的终结者还远着呢
这项成果，称为“normalized cut”的
这项成果是第一次
使用现实世界的图片
并且试图去解决一个非常核心的难题
Malik是一个视觉领域高级研究员
现在是伯克利的教授
也是斯坦福毕业的
你能看到
这结果也并非特别好

English: 
not that great are we going to cover
any segmentation in this class
we might right you see we are
making progress but this is the
beginning of that another very
influential work that
I want to bring out and pay tribute to
even though these works we're not
covering them in the rest of the course
I think as a vision student it's
really important for you to be aware of
this because it not only introduces the
important problems we want to solve it
also gives you a perspective on the
development of the field this work is
called the Viola-Jones face detector and
it's very dear to my heart because as a
fresh graduate student
at Caltech it's one of the first
papers I read as a graduate student when
I entered the lab and I didn't know
anything my advisor said read this

Chinese: 
 
 
你们会看到我们在进步
而这是第一步
另一项我希望介绍给你们
非常震撼的成果，值得致敬
即使我们课程后面不会涉及这些内容
但是作为视觉领域的学生
你们应该了解这个
因为它不仅指出了我们需要解决的问题
还反映着整个领域的发展
这项成果称为Viola Jones Face Detector
在我心中它非常可贵
因为在我刚刚成为CALTech的研究生时
这是我最先读的论文之一
刚到实验室时什么都不懂
导师跟我说
来读读这篇超赞的论文

Chinese: 
我们全都在学着理解它
然后后来等我毕业的时候
这个成果
转化成为了第一个智能面孔检测的产品
富士相机2006年的数码相机产品
是第一台具有面孔检测功能的数码相机
 
从成果转化的角度讲，这是非常迅速的
而且这也是第一个应用到大众消费产品上的
高级视觉识别算法
这项成果用于面孔检测
面孔信息是一种“野生”的数据
而非模拟数据或者人造的数据
虽然这项成果并没有使用深度学习网络
但是它的特征学习过程却有很强的深度学习特质

English: 
amazing piece of work that you know
we're all trying to understand and then
by the time I graduated from Caltech
this very work was transferred to the
first smart digital camera by Fujifilm
in 2006 as the first digital camera that
has a face detector so from a
technology transfer point
of view it was extremely fast and it was
one of the first successful high-level
visual recognition algorithms that's
being used by a consumer product so this
work just learns to detect faces and
faces in the wild it's no longer you know
simulation data or very contrived data
these are any pictures and again
even though it didn't use a deep
learning network it has a lot of the
deep learning flavor the features were
learned you know the algorithm learns to

Chinese: 
你看，算法试图寻找一些
比如这些黑白的过滤器特征值
可以最好地识别面孔位置信息的特征
这是一项相当震撼的成果
也是第一件可以在电脑上实时运行的
计算机视觉方面的研究成果
此前，计算机视觉算法都非常的慢
这篇论文实际上就叫做“实时面孔检测”
它可以在奔腾II芯片上运行
不知道你们是否记得有这么一款CPU
这是一颗非常慢的芯片
但是它能让这个算法实时地运行检测
那么这就是这项重大的成果
需要指出的是
这不是这个时期唯一的成果
但这件成果却反映了

English: 
find features simple features like these
black and white filter features that can
give us the best localization of faces
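A minimal sketch of one such black-and-white rectangle feature. The lecture does not say how these features are computed; the sketch assumes the summed-area-table (integral image) trick, so that any rectangle sum costs four lookups. The function names and the tiny 4x4 image are made up for illustration, and nothing here covers how the detector selects or combines features.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) table where ii[y][x] is the sum of all
    pixels above row y and left of column x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Response of a two-rectangle feature: left half minus right half."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return white - black


# Toy image: bright left half, dark right half (a vertical edge).
img = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(img)

if __name__ == "__main__":
    print(two_rect_feature(ii, 0, 0, 4, 4))
```

The response is large where brightness changes across the rectangle's midline and near zero on flat regions, and its cost does not depend on the rectangle's size, which is part of why such features are cheap enough for real-time use.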
so this is a very influential piece of
work it's also one of the first computer
vision works that was deployed on a
computer and could run in real time before
that computer vision
algorithms were very slow the paper
actually is called real-time face
detection it ran on
Pentium II chips I don't know if anybody
remembers that kind of chip but it was on
a slow chip but nevertheless it ran in real
time so that was another very important
bit of work and also one more thing to
point out around this time this is not
the only work but this is a really good

Chinese: 
计算机视觉领域研究焦点的一次变迁
还记得吗
David Marr和斯坦福早期的工作
都试图去给真实的3D物体建模
现在我们变成了试图去“识别物体是什么”
我们跳过了
是否能对这些面孔重新建模的过程
现在还有一整个分支在继续研究这个方向
但有很大一部分计算机视觉的工作
将研究的焦点聚焦到识别领域
这个趋势，将计算机视觉带回到人工智能领域
现在
计算机视觉研究最重要的课题
就聚焦于这类识别问题
和AI问题

English: 
representation around this time
the focus of computer vision is shifting
remember that David Marr and the early
Stanford work was trying to model the 3d
shape of the object now we're shifting
to recognizing what the object is we
lost a little bit about can we really
reconstruct these faces or not there is
a whole branch of computer vision and
graphics that continues to work on that
but a big part of computer vision is now
at this time around the turn of the
century focusing on recognition
that's bringing computer vision back to
AI and today the most important part of
the computer vision work is focused on
these cognitive questions like
recognition and AI questions um another

English: 
very important piece of work is starting
to focus on features so around the time
of face recognition people start to
realize it's really really hard to
recognize an object by describing the
whole thing like I just said you know I
see you guys
heavily occluded I don't see the rest of
your torso I really don't see any of
your legs other than the first row but I
recognize you and I can infer you as an
object so people start to realize gee
it's not necessarily that global
shape that we have to go after in order
to recognize an object maybe it's the
features if we recognize the important
features on an object we can go a long
way and it makes a lot of sense think
about evolution right if you're out
hunting you don't need to recognize that

Chinese: 
另一项非常重要的成果
是关于特征（features）的
在面孔识别的那段时期人们发现
想要通过描述整个物体来实现识别
是非常非常困难的
如我刚才所说
我看到的你们
是被遮挡得很严重的
我看不到你们的躯干部位
除了第一排也都看不到腿
但我仍然能识别你们
我能推断出你们是一个物体
所以
人们发觉，并不是必须掌握整体全部的形状
来识别一个物体
也许重要的是特征
如果我们认识物品的一个重要特征
我们就能取得很大的进展，具有很大的意义
如果你出去打猎

Chinese: 
你不需要看见整只的老虎才懂得逃走
只需要从树叶间的空隙中
看到几处老虎身上的花纹
就足够引起警觉了
所以我们需要能快速地看并且识别
基于视觉进行决策的基础就是快速地识别
基于重要的特征大量识别
所以这被视作一次升级
由David Lowe提出——你们又见到他了
从物品上学习一些重要特征
一旦学会了这些特征
哪怕很少的几项
你就能够识别出它们，哪怕从完全不同的角度
或是在杂乱的场景下
所以
2010-2012年间深度学习复兴，在那之前
大概有10年的时间，整个计算机视觉领域都在研究

English: 
tiger's full body and shape to decide you
need to run away you know just a few
patches of the fur of the tiger through
the leaves probably can alarm you enough
so we need vision to be quick
decision-making based on vision is
really quick a lot of this happens on
important features so this work called
SIFT by David Lowe you saw that name
again is about learning
important features on an object and once
you learn these important features just
a few of them on an object you can
actually recognize this object from a
totally different angle in a totally
cluttered scene so up to deep learning's
resurrection around 2010 or 2012 for
about 10 years the entire field of

Chinese: 
如何用这些特征来建模
来识别物体和场景
我们已经做了很多工作，取得了很多进展
最近深度学习变得更有说服力的原因之一
就是很多人发现
深度学习网络学习到的特征
和那些出色的工程师设计的特征非常相似
这是一种佐证
David Lowe第一次告诉我们这些特征很有效
然后我们找到一些算法来自动地学习这些特征
二者互相佐证了对方
所以这项工作的重要性不应被忽视
这项成果是我们后面成果的智力基础之一
它告诉了我们

English: 
computer vision was focusing on using
these features to build models to
recognize objects and scenes and we've
done a great job we've gone a long way
one of the reasons the deep learning network
became more convincing to a
lot of people is we will see that the
features that a deep learning network
learns are very similar to these
engineered features by brilliant
engineers so it's kind of confirmed even
though you know we needed
David Lowe to first tell us these features
work and then we start to develop better
mathematical models to learn these
features by themselves but they confirmed
each other so the historical you know
importance of this work should not be
diminished
this work is one of the
intellectual foundations for us
to realize how critical or how
useful these deep learning features are

English: 
when we learn them I'm going to skip
this work and just briefly say because
of the features that David Lowe and
many other researchers taught us we
can use them to learn scene
recognition and around that time the
machine learning tools we use mostly are
either graphical models or support
vector machines and this is one
influential work on using support vector
machines and kernel models to recognize
the scene but I'll be brief here and
then almost one last model before
the deep learning model is this
feature based model called the deformable
part model where we learn parts of an
object like parts of a person and we
learn how they configure with each other

Chinese: 
深度学习得到的这些特征有多重要，多有用
由于David Lowe和其他很多研究者的工作
我们能用这些特征去进行场景的识别
那个时期我们进行机器学习的主要工具
是图形建模或者支持向量机（SVM）
使用SVN和核模式进行场景识别
也是那个时期的一项重要成果
在深度学习之前差不多最后一项重要成果
是一个特征模型
称为可变性部件模型(deformable part model)
是我们学习一个物体的一部分
例如人的一部分
然后我们学习他们互相之间如何关联

Chinese: 
在空间中的关联
然后使用支持向量机
某种模型
来识别物体，例如人体或者瓶子等等
这个时期，也就是2009，2010年间
计算机视觉已经足够成熟
我们已经开始解决一些重要而且复杂的问题
例如识别行人和汽车
这些都是现实中的问题而非人为
我们还需要一个基准测试
因为这个领域已经充分发展了
如果没有一个基准测试
大家发论文的时候都各自用各自的图像
建立起一个全球性的标准就太难了
那么一个重要的基准测试就是
PASCAL VOC物体识别基准测试
由欧洲建立

English: 
how they configure in space and
use a support vector machine kind of
model to recognize objects like humans
and bottles around this time that's 2009
2010 the field of computer vision has
matured enough that we're working on
these important and hard problems like
recognizing pedestrians and recognizing
cars they're no longer contrived problems
something else was needed it's
benchmarking because as a field advances
enough if we don't have good benchmarks
then everybody's just publishing
papers on a few sets of images and it's
really hard to really set a global
standard so one of the most important
benchmarks is called the PASCAL VOC object
recognition benchmark it's by a European

English: 
it's a European effort that researchers
put together tens of thousands of
images from 20 classes of objects and
these are
one example per object like cats
cows maybe no cats dogs cows
airplanes bottles you know horses trains
and all this and then
annually our computer vision researchers
and labs come to compete on the object
recognition task for the PASCAL object
recognition challenge and over the
past you know like through the years the
performance just keeps increasing
and that was when we started to feel
excited about the progress of the field
at that time here's a little bit of
a story closer to us which is that my

Chinese: 
研究人员集合了数万张图片
包含20个分类
这里是个例子
包含了猫，牛
好像没有猫？
狗，牛，飞机，瓶子
恩，马
火车等等
接下来我们
每年我们这些研究人员和实验室都来参加竞赛
参加PASCAL VOC的物体识别挑战
在过去，很多年时间里
（算法的）表现一直都在提高
我们对这个领域的发展感到兴奋
这时候有一个和我们关系比较近的小故事
我的实验室，我和我的学生们

Chinese: 
觉得真实世界远远不止20种物品
似乎比20种多那么一点
我们追随着PASCAL的成果
我们建立了一个超超大规模的项目，叫做ImageNet
有些同学可能听说过
在这个课程中，你们可能会使用其中一小部分导出的图像去做作业
ImageNet有5000万张图片
全部都是手工清洗
标注了超过2万个分类
哦不用担心，不会让研究生去清洗数据的
那确实挺恐怖的
是使用Amazon Mechanic Turk平台众包完成的
其实也有一些同学干了点苦力活
把这些平台输出整合到一起

English: 
lab and my students were thinking you
know the real world is not about twenty
objects the real world is a little
more than twenty objects so following
the work of the PASCAL visual object
recognition challenge we put together
this massive massive project called ImageNet
some of you might have heard of ImageNet
in this class you will be using a
tiny portion of ImageNet in some of
your assignments ImageNet is a
data set of 15 million images all
cleaned by hand and annotated over
20,000 object classes don't worry it's not
graduate students who cleaned it
that would be very scary it's the
Amazon Mechanical Turk platform the
crowdsourcing platform and having said
that graduate students also suffered
from you know putting together this

Chinese: 
这是一个非常棒的数据集
我们开始把这些整合到一起
每年举行一次竞赛
叫做ImageNet Competition for Object Recognition
比如一项ImageNet的标准竞赛是
对1000种类的接近150万张图片进行识别
比较各种算法的性能
说起来我听到过一些社会媒体称ImageNet为
计算机视觉界的奥林匹克竞赛
真是不胜荣幸
不过
下面展示的内容会让我们更了解
深度学习的历史成就
ImageNet挑战开始于2010年

English: 
platform but it's a very exciting data
set and we started to put
together competitions annually called the
ImageNet competition for object
recognition and for example a standard
competition of image classification by
ImageNet is a thousand object classes
over almost 1.5 million images and
algorithms compete on the performance so
actually I just heard somebody on
social media was referring to the ImageNet
challenge as the Olympics of
computer vision it was very flattering
but here is something that's
bringing us close to the history making
of deep learning so the ImageNet
challenge started in 2010 that's
actually around the time PASCAL you know

Chinese: 
那时候我们听说PASCAL要逐步终止
他们的20类别识别比赛
所以我们决定启动
我们的1000类别ImageNet挑战赛
Y轴是错误率
一开始我们的错误率还是相当高的
而每一年错误率都在下降
但是有一个特殊的年份
错误率非常明显地下降了
几乎是腰斩
就在2012年
2012年的ImageNet挑战赛
获得桂冠的模型
就是卷积神经网络
我们所谈论的“卷积神经网络”
可不是在2012年被发明出来的
尽管新闻上把它描述成最最新兴的产物
但其实不是
它在上世纪七、八十年代就被提出了

English: 
our colleagues they told us they're
going to start to phase out their
challenge of twenty objects so we phased
in the thousand object ImageNet
challenge and the y-axis is error rate and
we started with very
significant error and of course you know
every year the error decreased but
there's a particular year that the error
really decreased it was cut in half
almost that is 2012 2012 is the year that the
winning architecture of the ImageNet
challenge was a convolutional neural
network model and we'll talk about it
the convolutional neural network was not
invented in 2012 despite how all the
news makes it sound like it's the newest
thing around the block it's not it was

Chinese: 
但是，（直到现在）种种因素共同作用下
卷积神经网络展现出了强大的能力
作为一个高性能的端到端训练模型
以巨大优势赢得了ImageNet挑战赛
 
这是一个历史性的时刻
从数学的角度来看它不怎么新
但是从工程和解决真实问题的角度
这确实是一个历史性的时刻
 
这是深度学习革命的开端
也是这门课的前提
我们在这里切换一下
我们之前回顾了一下计算机视觉的简史
大概5.4亿年（笑）

English: 
invented back in the 70s or 80s
but with a convergence of things we'll
talk about the convolutional neural network
showed its massive power as a high
capacity end-to-end training
architecture and won the ImageNet
challenge by a huge margin and that was
you know quite a historical moment
from a mathematical point of view
it wasn't that new but from an
engineering and solving real-world problems
point of view this was a historical
moment that piece of work was
covered by you know the New York Times and
all this this is the onset this is the
beginning of the deep learning
revolution if you call it that
and this is the premise of this class so
at this point I'm gonna switch so we
went through a bit of brief history of

Chinese: 
现在我要切换到我们这门CS231n
有人有什么问题吗？
OK
 
我们谈论了这些，可能内容有点多
我们谈论了很多计算机视觉领域不同的任务和方向
CS231n将聚焦于视觉识别问题
从总体上来说
大部分基础讲义
都会探讨图像分类问题
你们现在知道
我们将会基于ImageNet图片集分类
我们也会接触其他的视觉识别场景
但是图像分类问题将是我们这堂课的主要问题
你们需要知道
视觉识别并不仅仅是图像分类
还有3D建模

English: 
computer vision for 540 million years
and now I'm going to switch to the
overview of this class is there any
other questions ok all right so um we've
talked about even though it was kind of
overwhelming we talked a lot about many
different tasks in computer vision
CS231n is going to focus on the visual
recognition problem and by and large
especially through most of the
foundation lectures we're going to talk
about the image classification problem
but now you know everything we talk
about is going to be based on that ImageNet
classification setup we
will be getting to other visual recognition
scenarios but the image classification
problem is the main problem we will
focus on in this class which means
please keep in mind visual recognition
is not just image classification
right there was 3d modeling there was

Chinese: 
还有感知分组以及（图像）分割等等
但是我们将聚焦（在图像分类）
我甚至不需要说服你们
即使在应用层面，图像分类也是极有价值的问题
从互联网巨头的观点
到初创企业
你会希望识别物体
识别食物
在线电商
手机电商
整理相册
所以图像分类可以成为
很多重要问题的根本解决之策
很多问题都和图像分类有关
今天我并不期望你们领悟其中的区别
但是我希望你们能够通过这堂课

English: 
perceptual grouping and segmentation and
all this but that's that's what we'll
focus on and I don't need to convince
you that even just application wise
image classification is an extremely useful
problem from
you know a big commercial internet
company's point of view to startup
ideas you know you want to recognize
objects you want to recognize food you
want to do online shopping mobile shopping
you want to sort your albums so image
classification can be a bread
and butter task for many many
important problems there are a lot of
problems that are related to image
classification and today I don't expect
you to understand the differences but I
want you to hear that throughout this
class we'll make sure you learn to
understand the nuances and the
details of different flavors of visual

English: 
recognition what is image classification
what's object detection
what's image captioning and these have
different flavors for example you know
while image classification may focus on
the whole big image object detection might
tell you where things exactly are like
where the car is the pedestrian or the
hammer and what the
relationships between objects are and so on
so there are nuances and details
that you will be learning about in this
class and I already said CNN or
convolutional neural network is one type
of deep learning architecture but it's
the overwhelmingly successful deep
learning architecture and this is the
architecture we will be focusing on and

Chinese: 
学到视觉识别领域不同的流派的特点
比如何谓图像分类
何谓物体检测
何谓图像说明
这些都是不同的领域流派
比如说
图像分类关注的是大图整体
物体检测告诉你东西具体出现在图片的哪里
比如车在哪儿、行人在哪
锤子
以及物体之间的联系是什么
之类的
你们需要在这堂课学习这些细微的差别和细节
 
我已经提到过
CNN，卷积神经网络只是深度学习架构的一种
但它是压倒性地成功的一种
也是我们将要学习的一种
回到ImageNet挑战赛

Chinese: 
我有提到2012年是历史性的一年
这一年Alex Krizhevsky和他的导师Geoff Hinton
提出了这个卷积神经网络
我想是个7层结构的卷积神经网络
获得了ImageNet挑战赛的冠军
在这一年以前
一直都是特征+支持向量机结构
它也是分层结构
但它并没有端到端学习的风格特色
我们快进到2015年
获胜的仍然是卷积神经网络架构
有151层
由微软亚洲研究院提出
称为深度残差网络

English: 
to just go back to the ImageNet
challenge so I said the historical
year is 2012 this is the year that Alex
Krizhevsky and his advisor Geoff
Hinton proposed this convolutional
neural network I think it's a seven
layer convolutional neural network to
win the ImageNet challenge
before this year it was a SIFT feature
plus
a support vector machine architecture
it's still hierarchical but it doesn't
have that flavor of end-to-end learning
and fast forward to 2015 the winning
architecture is still a convolutional
neural network it's a hundred fifty one [in fact 152]
layers by Microsoft Research Asia
researchers and it's called the

Chinese: 
我不确定我们会不会讲到这里
也不会指望你们能记住它每一层到底在干吗
额……其实它是重复的，所以也没那么难
但是2012年以后的每届冠军都是基于深度学习的架构
就如我所说，我总是愿意尊重历史
CNN不是一夜之间发明的
有很多杰出人物做出了贡献
你们知道，很多人打下了基础
我没有幻灯片
但是有一个重要的人物你们应该记住
就是Kunihiko Fukushima
Kunihiko Fukushima是一位日本家算计科学家
提出了一个模型，称为Neocognitron
那便是神经网络架构的开端

English: 
residual net right is it called the
residual net so I'm not so sure if we're
going to cover that definitely I don't
expect you to know every single layer what
they do actually they repeat so it's not
that hard but every year since
2012 the winning architecture of the
ImageNet challenge is a deep learning based
architecture so like I said I also want
you to respect history
CNN was not invented overnight there are a
lot of influential players today but you
know there are a lot of people who built
the foundation actually I don't have
the slides one important name to
remember is Kunihiko Fukushima Kunihiko
Fukushima was a Japanese computer
scientist who built a model called the Neocognitron

Chinese: 
Yann LeCun也是一位非常有影响力的人物
我心目中的奠基性成果就是Yann LeCun在90年代发布的
在数学家们和Geoff Hinton——Yann LeCun的导师
加入进来，并且搞明白了反向求导和学习策略
——如果根本听不明白这些词儿
Andrej会花好几周时间来告诉你们的
这些数学问题在八十年代和九十年代得到了解决
Yann LeCun在AT&T的贝尔实验室工作
那个年代的奇迹之地
如今已经没有了
他们发起了一个非常有野心的计划
他们准备识别手写数字

English: 
and that was the beginning of
the neural network architecture and
Yann LeCun is also a very influential
person and really the
groundbreaking work in my opinion of
Yann LeCun was published in the 1990s so
that's when mathematicians among which
Geoff Hinton Yann LeCun's PhD [in fact postdoctoral] advisor was
involved worked out the backpropagation
learning strategy which if this word
doesn't mean anything
Andrej will tell you in a couple of
weeks so the mathematical
model was worked out in the 80s and the
90s and Yann LeCun was
working for Bell
Labs at AT&T which was an amazing place
at that time there's no Bell Labs today
anymore
they were working on really
ambitious projects and he needed to
recognize digits because eventually that
product was shipped to banks and the US
post office to recognize zip codes and

English: 
checks and he constructed this
convolutional neural network and this is
where he was inspired by Hubel and
Wiesel he starts by looking at simple
edge-like structures of an image it's
not like the whole letter A it's really
just edges and then layer by layer
he filters these edges
pools them together filters pools and then
builds this architecture 2012 when
Alex Krizhevsky and Geoff Hinton
used almost exactly the same
architecture to participate in
the ImageNet challenge there were
very few changes but that became the
winning architecture so Andrej
will tell you more about the detailed
changes there is the capacity that the
Chinese: 
因为在银行或者邮政体系中会有识别支票和邮编的需求
于是他建立起了一个卷积神经网络
受到Hubel和Wiesel的启发
他开始从简单的边缘结构着手
整张图片是一个字母A
在每一层网络中，他对图像进行滤波
再放到一起，池化，滤波，池化
建立了这样一个结构
2012年
Alex Krizervsky和Geoff Hinton
使用了几乎完全一样的架构
来参加ImageNet挑战赛
只有极少的修改
却夺得了冠军
Andrej会告诉你们具体的区别
这个模型的能力确实有所增长

Chinese: 
因为摩尔定律帮了我们
另外还有一个非常细节的一点
一个函数改了一点点
从sigmoid改成了一个更为线性的函数（Relu）
没关系
有一些小改动
但是主要的方面没有任何改变
从数学的角度来说
但是有很重要的两点确实改变了
引发了深度学习架构的复兴
一个是摩尔定律
硬件（的发展）让很多事不一样了
这种高性能的模型
Yann LeCun在搞手写识别的时候
慢得让人抓狂
由于算力的瓶颈
他根本不能把模型设计得太大
如果你不能设计一个大模型
你就不能了解到它到底有多少潜力
你知道在机器学习的角度上来说

English: 
model did grow a little bit because
Moore's law helped us there's also a
very detailed function that
changed a little bit of shape from a
sigmoid to a more rectified linear
shape but whatever there's a couple of
small changes but really by and large
nothing had changed mathematically but
two important things did change and that
drove the deep learning architecture
back into its renaissance one is
like I said Moore's law and hardware
hardware made a huge difference because
these are extremely high capacity
models when Yann LeCun was doing this
it was just painfully slow because of the
bottleneck of computation he
couldn't build this model too big
and once you cannot build it big it
cannot fully realize its potential you

Chinese: 
有过拟合等等的问题，非常难以解决
但是我们现在有了一个快得多也大得多的
额……不是晶体管
更高性能的集成芯片和Nvidia的GPU
Nvidia在深度学习的历史中引发了一场大变革
我们现在可以
在合理的时间开销内训练这些模型
即便模型很大
另一个重要原因
我们应该归功于
我想是数据，大量可用的数据
或者叫，大数据
数据本身
你知道
并没有什么意义
如果你不知道该怎么使用它
但是在这个深度学习架构中
数据成了高性能架构的驱动力
来执行端到端的训练
帮助解决过拟合的问题
只要有足够的数据
所以如果你观察两个模型的像素数量

English: 
know from a machine learning
standpoint there's overfitting and all
these problems you cannot solve but now
we have much faster and bigger
transistors not transistors bigger
capacity microchips and GPUs from Nvidia
Nvidia made a huge difference in deep
learning history so that we can now train
these models in a reasonable amount of
time even if they're huge another thing
I think we do need to take credit for is
data the availability of data that was
the big data data itself is just you
know it doesn't mean anything if you
don't know how to use it but in this
deep learning architecture data becomes
the driving force for a high capacity
model to enable the end-to-end training
and to help avoid overfitting when you
have enough data so if

English: 
you look at the number of pixels that
machine learning people had in 2012
versus yellow ku had in 1998 it's a huge
difference orders of magnitude so so
that was that's so this is the focus of
231 n but we'll also go
it's also important one last time I'm
gonna drilling this idea that visual
intelligence does go beyond object
recognition I don't want any of you
coming out of this course thinking we've
done everything you know we've solved
vision and the ImageNet challenge defined
the entire space of visual recognition
it's not true there are still a lot of
cool problems to solve for example you
know dense labeling of an entire scene
with perceptual groupings so I know where
every single pixel belongs that's
still an ongoing problem combining

Chinese: 
机器学习界的人们在2012年使用的模型
和Yann LeCun在1998年用的模型对比
天差地别
相差好几个数量级
 
这些是231n所关注的焦点
但是我们也
有很重要的
我最后一次强化这个念头
（计算机）视觉智能比物体识别要更为任重而道远
我希望你们学完了这个课
不要以为我们已经搞定了所有计算机视觉的难题
以为ImageNet定义了整个计算机视觉识别领域
不是那样的
我们还有一大堆很酷的问题亟待解决
比如对整个照片进行密集标记
感知分组
使我能确定每个像素点的归属
这仍是一个研究中的问题
将识别和3D整合起来

English: 
recognition with 3D there's
a lot of excitement happening at the
intersection of vision and robotics and
this is this is definitely one area of
that and then anything to do with motion
affordance and this is another
big open area of research I
put this here because Justin is heavily
involved in this work you know
beyond just putting labels on a scene you
actually want to deeply understand a
picture what people are doing what are
the relationships between objects and we
start getting into the relations
between objects and this is the ongoing
project called Visual Genome in my lab
that Justin and a number of my students
are involved in and this goes far beyond

Chinese: 
这类问题在视觉和机器人的交叉领域中有很多
这仅是其中一个方面
另外的一些工作
动作，场景
这也是一个非常大的开放研究领域
我把这张图放在这里
因为Justin深入地参与了这项工作
和简单地“在东西上贴标签”比起来
你往往希望深入地理解图片
人们在做什么
各个物体之间的关系是什么
我们开始探究物体之间的关联
这是一项进行中的项目
称为visual genome，在我的实验室进行
Justin和我的一些学生参与了这个项目
这项工作也远远超过了我们刚才谈论的图像分类问题

Chinese: 
我们的愿景是什么呢？
计算机视觉领域的一个愿景
是能够看图讲故事
对吧？
想想你们自己，作为人类
你睁开眼的一瞬间就能够描述你看到的一切
另外
根据一项心理学实验的结果
我们发现只要给人们看这张图
只要超过500毫秒
仅仅半秒钟
人们就能写出一篇看图作文出来
我们为这一小时付给他们10美元呢
其实没那么长时间
不过我觉着多说点时间
他们就会把作文写得长一点
不过重点是
我们的生物视觉系统真的极为强大
我们能讲出来故事
我的梦想就是
这也是我给Andrej的毕业论文题目
就是
我们给你

English: 
image classification that we talked about
and what is one of our Holy Grails well
one of the Holy Grails of computer
vision is to be able to tell a story of
a scene right so think about you as a
human you open your eyes the moment you
open your eyes you're able to describe
what you see and in fact in psychology
experiments we find that even if you
show people this picture for only 500
milliseconds that's literally half of a
second people can write essays about it
we paid them $10 an hour so
it wasn't that long but you know I
figure if we took a little longer a
little more money they'd probably write
longer essays but the point is that our
visual system is extremely powerful we
can tell stories and my dream and this
is my challenge for Andrej's

English: 
dissertation is that can we give a
computer one picture and out comes a
description like this you know and we're
getting there you'll see work that you
give the computer one picture it gives
you one sentence or you give the
computer one picture it gives you a
bunch of short sentences but we're not
there yet but
that's one of the Holy Grails and another
Holy Grail is continuing
this I think is summarized really
well by Andrej's blog is you know take a
picture like this right
the stories are so refined
there's so much nuance in this picture
that you get to enjoy not only you
recognize the global scene it would be
very boring if all a computer can tell you
is man man man room
you know room scale mirror whatever
cabinet Locker that's it you know here

Chinese: 
给电脑一张图片
输出一段类似这样的描述文字
我们正在逐步实现它
你能看到一些成果，输入一张图片
输出一个句子
或者你输入电脑一张图片
它输出一组短句子
不过我们还没有实现它
这是我们的一个愿景
我们的另一个愿景
从这里说起
这篇Andrej的博文总结得非常好
你瞧，照一张这样的照片
这里面的故事如此精密复杂
图片里有这么多的细微之处可供玩味
不仅仅是判断出来整体的情况
如果所有的计算机都只是告诉你
这里是人、人、人和房间
那可太无聊了
你看这里无非是一些房间，秤，镜子
锁柜，没啦
但是你能够认出来他们是谁

Chinese: 
你弄明白了奥巴马搞的小把戏
你理解了他们之间的互动
你搞懂了他们的幽默
你知道了……
如此多的细微之处
这是我们视觉世界的用武之地
我们运用我们的视觉理解力
不仅仅用于生存，寻路，制造
同样也用于社交，娱乐，理解和学习这个世界
这是我们视觉的终极目的
我无须去说服你们
计算机视觉将让我们的世界变得更美好
不用去管外面那些耸人听闻的传言
到现在，我们在工业领域也好
学术领域也好
我们在使用计算机视觉建造更好的机器人
来拯救生命
来进行探险，等等
那么我还有……

English: 
you recognize who they are you recognize
the trick Obama is doing you recognize
the kind of interaction you recognize
the humor you recognize there's just so
much nuance that this is what visual
world is about we use our ability of
visual understanding to not only survive
navigate manipulate but we use it to
socialise to entertain to understand to
learn the world and this is what
the grand goal of vision is so
and I don't need to convince you
that computer vision technology will
make our world a better place
despite some scary talk out there you
know even going on today in industry as
well as research world we're using
computer vision to build better robots

Chinese: 
2……3……5分钟剩余时间
好极了
我来介绍一下团队成员
Andrej和Justin和我是核心讲师
TA们请起立和大家打个招呼
能不能简短地自我介绍一下，名字
以及入学年份
不用做一个演讲
从你开始吧
（不好意思听不清楚）
很好，这些就是我们的幕后英雄
请和我们保持联络

English: 
to save lives to go deep exploring and
all this now ok so I have like what two
minutes three minutes five minutes left
great time let me introduce the team and
Andrej and Justin are the co-instructors
with me TAs please stand up and say
hi to everybody can you just say your
name quickly and what year you're in
and just don't give a speech but yeah
start with you your name
so these are the heroes behind
the thing and so please stay in touch

Chinese: 
有两种最佳方式
其实我觉得只有一种方式（来联络）
然后我来告诉大家什么时候可以例外
请用Piazza和我们联络或者用这个邮件列表
任何与课程相关的内容请务必不要给我们发送个人邮件
如果你发送的是个人邮件
没有得到反馈的话，我很抱歉
因为这是个300人的大课
这个邮件列表会对邮件打标签
帮助我们来处理邮件
仅在如下情况我希望你们用个人邮件
发给我或者Andrej和Justin
就是关于私密的个人问题
我能理解你可能不愿意公开发布这样的问题
到TA团队中
可以的，但是

English: 
with us there are two really the best way
and I almost wanted to say the
only way and I'll tell you what
the exception is stay in touch through
Piazza as well as the staff mailing list
anything course related please please do
not send any of us personal email
because I'm just gonna say this
if you don't hear replies or your issue
is not taken care of because you send a
personal email I'm really sorry because
this is a 300 plus people class and
this mailing list actually tags our
email and helps us to process
the only time I expect you to send a
personal email mostly to me and Andrej
and Justin is confidential personal
issues you know and I understand if you
don't want that to be broadcasted to a
team of 10 TAs that's okay
but that should be really really minimal
that's the only time that you send us an

English: 
email and also you know just again I'm
going on maternity leave for a few
weeks starting the end of January so
please if you decide you just want to
send an email to me and it's like my
due date for the baby
I'm not likely going to reply to you
promptly sorry about that
priorities so a couple words about our
philosophy we're not going
to get into the details we really want
this to be a very hands-on project and
this is really I give a lot of credit to
Justin and Andrej they are extremely good
at walking through these hands-on
details with you so that when you come
out of this class you not only have a
high level understanding but you have a
really good ability
to build your own deep learning code

Chinese: 
除此之外，请尽量不要使用发送个人邮件的方式
另外就是我在一月底开始
会休假一段时间
如果你们铁了心一定要发email给我
没准正赶上我生孩子
不大可能及时回复你们
抱歉啦
（笑）优先级的问题
关于我们的哲学，一些话
我不会去逐条细说
我希望我们这个可以成为一个非常棒的项目
我必须要感谢Justin和Andrej
他们极为擅长带领大家遍历各种细节手段
当你结束课程时
你不仅拥有一大堆概念
而且有足够能力去着手去实现一个
深度学习系统

English: 
we want you to be exposed to
state-of-the-art material you're going
to be learning things that are as
fresh as 2015 and it'll be fun
you get to do things like this now not
all the time but you know like turn a
picture into a Van Gogh or this
weird-looking thing so it will be a fun class
in addition to all the important tasks
you learn we do have
grading policies these are all on our
website I'm not going to reiterate
this again one thing I want to be very
clear on actually two things one is
late policy you are grownups we treat
you like grownups we do not take
anything at the end of the course like oh
my professor wants me to go to this
conference and I have to have like three
more late days no you are responsible

Chinese: 
我们会向你们介绍最前沿的资料
你们将学习到2015年最新的进展
而且会很有意思
你们也会做这类的东西
可能不总是
比如把一张图片转换成梵高的作品
或者这种怪模怪样的东西
所以在这些重要的任务之外
这也会是一门好玩的课程
我们也有打分策略
都在我们网站上
我不做重复说明了
另外一点要明确的是
延迟政策
我们按照成年人来对待你们
我们在课程结束的时候不会接受任何
“哦我的教授让我去参加这个会议”
“我只能延迟个三到四天”
不行
你们一共有7天延迟期

English: 
for using your total late days you have
seven late days you can use them in
whatever way you want with zero penalty
beyond those you have to take a penalty
again if there's like really really
exceptional medical family emergency
talk to us on an individual basis but
anything else conference deadlines other
final exams
you know like missing cat
or whatever we budgeted that
into the seven days
another thing is honor code this is one
thing I have to say with a really
straight face you are at such a
privileged institution you are
grownups
I want you to be responsible for honor
code every single Stanford student
taking this class should know the honor
code if you don't there's no excuse you
should go back

Chinese: 
必须自行妥善安排
7天延迟之内
你可以任意安排时间不受惩罚
在此之外只能被惩罚
如果有极其极其特殊的
医疗或者家庭方面的紧急情况
和我们商谈个别处理
但除此之外会议、限期、其他期末考
或者什么丢掉的猫之类的
我们全都打包预留在那7天内
另一件事就是荣誉准则
这件事我必须严正声明
你们都有充分的自主权
也都是成年人
我希望你们尊重荣誉准则
每个参加本课程的斯坦福学生都应
了解该荣誉准则
一旦违背，绝不宽宥

English: 
we take collaboration extremely
seriously I almost hate to say that
statistically given a class this big
we're going to have a few cases but I
also want you to be an exceptional class
even with a size this big we do not
want to see anything that infringes on
the academic honor code so read the
collaboration policies and respect that
this this is really respecting yourself
I think I'm done with you know these
prerequisites you can read them I'm done
with everything I want to say are there any
burning questions that you feel are
worth asking yes a good question Andrej
do you have
a midterm a bit of everything which
means they haven't figured it out

Chinese: 
我们对串通行为极为严肃
我很不想说
从统计学角度
这么大的班级一定会有个例出现
但我仍然期盼你们能成为一个特例的班级
哪怕我们有这么多人，我也希望
没有任何人违反学院的荣誉准则
所以阅读“合作”政策，并且尊重它
这也是尊重你自己
我想我差不多讲完了
这些前置要求你们可以读一下
你们有什么觉得想要问的问题吗？
Andrej，你们有没有期中……
什么都有一点

Chinese: 
说明他们其实还没搞定呢
我们会给你们一次期中模拟的
好，那么欢迎来到这门课程！

English: 
yeah we will give you sample midterms okay
all right
thank you welcome to the class
the water's going
