Okay, hello everyone. We should get started. There actually are still quite a few seats left: a couple of seats right in front of me in the front row if you want to be really bold, or a few over there if you're less bold. Also, some of the rows have quite a few middle seats, so if people wanted to be really civic-minded, some of you could squeeze towards the edges and make more accessible some of the seats that still exist in the classroom. Okay, so it's really exciting and great to see so many people here, so a hearty welcome to CS 224N, occasionally also known as Ling 284, which is Natural Language Processing with Deep Learning. Just as a sort of personal anecdote, it still sort of blows my mind that so many people turn up to this class these days. For about the first decade that I taught NLP here, the number of people I got each year was approximately 45, so an order of magnitude smaller than it is now, which I guess says quite a lot about what a revolutionary impact artificial intelligence in general, and machine learning, deep learning, and NLP in particular, are starting to have on modern society.
So this is our plan for today, and I mean we're really going to get straight down to business today. There'll be a brief, very brief, introduction, some of the course logistics very briefly, some discussion and talk about human language and word meaning, and then we want to get right into talking about the first thing that we're doing, which is coming up with word vectors and looking at the word2vec algorithm, and that'll then sort of fill up the rest of the class. There are still two seats right in the front row for someone who wants to sit right in front of me, just letting you know.
Okay, so here are the course logistics in brief. I'm Christopher Manning; the person who bravely became the head TA is ABC, who's right there; and then we have quite a lot of wonderful TAs. Would the people who are wonderful TAs just stand up for one moment, so we have some sense of our wonderful TAs? Okay, great. Okay, so you know when the lecture is, because you made it here, and so welcome. Welcome also to SCPD people; this is also an SCPD class and you can watch it on video, but we love for Stanford students to turn up and show their beautiful faces in the classroom. Okay, so the webpage has all the info about the syllabus, etc.
So for this class, what do we hope to teach? One thing that we want to teach is an understanding of effective modern methods for deep learning, starting off by reviewing some of the basics and then particularly talking about the kinds of techniques, including recurrent networks and attention, that are widely used for natural language processing models. The second thing we want to teach is a big-picture understanding of human languages and some of the difficulties in understanding and producing them. Of course, if you want to know a lot about human languages, there's a whole linguistics department and you can do a lot of courses on that, but I want to give at least some appreciation, so you have some clue of what the challenges, difficulties, and varieties of human languages are. And then this is also kind of a practical class: we actually want to teach you how you can build practical systems that work for some of the major parts of NLP, so that if you go and get a job at one of those tech firms and they say, "Hey, could you build us a named entity recognizer?", you can say, "Sure, I can do that." And so, for a bunch of problems (obviously we can't do everything), we're going to do word meaning, dependency parsing, and machine translation, and you have an option to do question answering, actually building systems for those.
If you've been talking to friends who did the class in the last couple of years, here are the differences for this year, just to get things straight. We've updated some of the content of the course, so between me and the guest lectures there's new content. What, that looked bad; I wonder if that will keep happening; we'll find out. There's new content on various topics that are sort of developing areas. One of the problems with this course is that the area of deep learning at the moment is really still developing really, really quickly, so it sort of seems like one-year-old content is already kind of dated, and we're trying to update things. A big change that we're making this year is that we're having five one-week assignments instead of three two-week assignments at the beginning of the course, and I'll say a bit more about that in a minute. This year we're going to use PyTorch instead of TensorFlow, and we can talk about that more later, too. We're having the assignments due before class on either Tuesday or Thursday, so you're not distracted and can come to class. So, starting off, we're trying to give an easier, gentler ramp-up, but on the other hand a fast ramp-up: we've got this first assignment, which is sort of easy, but it's available right now and is due next Tuesday. And the final thing is that we're not having a midterm this year.
Okay, so this is what we're doing. There are five of these assignments that I just mentioned: six percent for the first one and twelve percent for each of the other ones. As always, we're going to use Gradescope for grading; it would really help out the TAs if you could use your SUNet ID as your Gradescope account ID. Then, for the second part of the course, people do a final project, and there are two choices for the final project: you can either do our default final project, which is a good option for many people, or you can do a custom final project. I'll talk about that more in a minute. And this is not working right. And so then at the end we have a final poster presentation session, at which your attendance is expected. We're going to be having that on a Wednesday in the evening, probably not quite five hours, but it'll be within that window; we'll work out the details in a bit. Three percent for participation (see the website for details), six late days, and collaboration: like always in computer science classes, we want you to do your own work and not borrow stuff from other people's GitHubs, and so we really do emphasize that you should read and pay attention to the collaboration policies.
Okay, so here's the high-level plan for the problem sets. Homework 1, available right now, is a hopefully easy on-ramp; there's an IPython notebook to just help get everyone up to speed. Homework 2 is pure Python plus NumPy, but that will start to teach you more about the sort of underlying how-do-we-do-deep-learning. If you're not so good at, or a bit rusty on, or have never seen Python or NumPy, we're going to have an extra section on Friday, from 1:30 to 2:50 in Skilling Auditorium: a section that's a Python review. That's our only planned section at the moment; we're not going to have a regular section, so you're encouraged to go to that, and it will also be recorded for SCPD and available on video as well. Then Homework 3 will start us on using PyTorch, and then for Homeworks 4 and 5 we're going to be using PyTorch on GPUs, and we're actually going to be using Microsoft Azure, with big thank-yous to the kind Microsoft Azure people who have sponsored our GPU computing for the last three years. Yes, so basically, all of modern deep learning has moved to the use of one or another of the large deep learning libraries, like PyTorch, TensorFlow, MXNet, etc., and then doing the computing on GPUs. Of course, since we're in the NVIDIA Auditorium, we should of course be using GPUs; but in general, the sort of parallelism and scalability of GPUs is what's powered most of modern deep learning.
Okay, so for the final project there are two things that you can do. We have a default final project, which is essentially our final-project-in-a-box, and this is building a question answering system, and we do it over the SQuAD dev set. What you build and how you can improve your performance is completely up to you; it is open-ended, but it has an easier start, a clearly defined objective, and we can have a leaderboard for how well things are working. So if you don't have a clear research objective, that can be a good choice for you. Or you can propose a custom final project, and presuming it's sensible, we'll approve your custom final project and will give you feedback from someone as a mentor. Either way, for the final project only, we allow teams of one, two, or three, but for the homeworks you're expected to do them yourself, though of course you can chat to people in a general way about the problems. Okay, so that is the course. All good? I'm not even behind schedule yet. Okay.
So the next section is human language and word meaning. You know, if I were really going to tell you a lot about human language, that would take a lot of time, which I don't really have here, so I'm just going to tell you two anecdotes about human language. The first is this xkcd cartoon. I actually really like this xkcd cartoon; it's not one of the classic ones that you see most often around the place, but I actually think it says a lot about language and is worth thinking about. I think a lot of the time, the kind of people who come to this class, who are mainly people like CS people and random others (there are some other people, I know, linguists and so on, around), have sort of spent their lives looking at formal languages, and the impression is that human languages are sort of somehow a little bit broken formal languages. And there's really a lot more to it than that, right? Language is this amazing human-created system that is used for all sorts of purposes and is adaptable to all sorts of purposes, so you can do everything from describing mathematics in human language to sort of nuzzling up to your best friend and getting them to understand you better. So it's actually an amazing thing, human language. Anyway, I'll just read it. The first person, the dark-haired person, says, "Anyway, I could care less." And her friend says, "I think you mean you couldn't care less. Saying you could care less implies you care at least some amount." And the dark-haired person says, "I don't know. We're these unbelievably complicated brains drifting through a void, trying in vain to connect with one another by blindly flinging words out into the darkness. Every choice of phrasing and spelling and tone and timing carries countless signals and contexts and subtexts and more, and every listener interprets those signals in their own way. Language isn't a formal system; language is glorious chaos. You can never know for sure what any words will mean to anyone. All you can do is try to get better at guessing how your words affect people, so you can have a chance of finding the ones that will make them feel something like what you want them to feel. Everything else is pointless. I assume you're giving me tips on how you interpret words because you want me to feel less alone. If so, thank you; that means a lot. But if you're just running my sentences past some mental checklist so you can show off how well you know it, then I could care less." And so I think this actually has some nice messages about how language is this uncertain, evolved system of communication, but somehow we have enough agreed meaning that we can pretty much communicate. Yet we're doing some kind of probabilistic inference of guessing what people mean, and we're using language not just for information functions but for social functions, etc.
Okay, and then here's my one other thought I'd give you about language. Essentially, if we want to have artificial intelligence that's intelligent, we need to somehow get to the point of having computers that have the knowledge of human beings, right? Because human beings have knowledge that gives them intelligence. And if you think about how we sort of convey knowledge around the place in our human world, mainly the way we do it is through human language. You know, some kinds of knowledge you can sort of work out for yourself by doing physical stuff: I can hold this and drop it, and I've learned something, so I've learned a bit of knowledge there. But sort of most of the knowledge in your heads, and why you're sitting in this classroom, has come from people communicating in human language to you. So one of the most famous deep learning people, Yann LeCun, likes to say this line: oh, you know, really, I think there's not much difference between the intelligence of a human being and an orangutan. And I actually think he's really wrong on that. The sense in which he means it is that an orangutan has a really good vision system; orangutans have very good control of their arms, just like human beings, for picking things up; orangutans can use tools; and orangutans can make plans, so that if you sort of put the food somewhere where they have to move the plank to get to the island with the food, they can do a plan like that. So yeah, in a sense they've got a fair bit of intelligence, but you know, orangutans just aren't like human beings. And why aren't they like human beings?
I'd like to suggest to you that the reason for that is what human beings have achieved: we don't just have sort of one computer, like, you know, a dusty old IBM PC in your mother's garage. What we have is a human computer network, and the way that we've achieved our human computer network is that we use human languages as our networking language. And so when you think about it, on any kind of evolutionary scale, language is super, super, super recent, right? Creatures have had vision for, well, people don't quite know, but maybe it's 75 million years, or maybe it's longer: a huge length of time. How long have human beings had language? People don't know that either, because it turns out, you know, when you have fossils you can't knock the skull on the side and say, "Do you have language?" But most people estimate that language is a very recent invention, before current human beings moved out of Africa, so many people think that we've only had language for something like a hundred thousand years or something like that. So it's sort of a blink of an eye on the evolutionary timescale. But it was the development of language, I'd assert, that sort of made human beings invincible, right? It wasn't that human beings developed poison fangs or developed the ability to run faster than any other creature or put a big horn on their heads or something like that. You know, humans are basically pretty puny, but they had this unbeatable advantage that they could communicate with each other and therefore work much more effectively in teams, and that sort of basically made human beings invincible. But even then, humans were kind of limited, right? That kind of got you to about the Stone Age, where you could bang on your stones and, with the right kind of stone, make something sharp to cut with. What got humans beyond that was that they invented writing.
Writing was then an ability where you could take knowledge and not only communicate it mouth-to-mouth to people that you saw: you could put it down on your piece of papyrus or your clay tablet or whatever it was at first, and that knowledge could then be sent places. It could be sent spatially around the world, and it could then be sent temporally through time. And how old is writing? Well, we sort of basically know about how old writing is, right? Writing is about 5,000 years old. It's incredibly, incredibly recent on this scale of evolution, but essentially writing was so powerful as a way of having knowledge that in those 5,000 years it enabled human beings to go from a Stone Age sharp piece of flint to, you know, having iPhones and all of these incredibly sophisticated devices. So language is a pretty special thing, I'd like to suggest.
But if I go back to my analogy, it's allowed humans to construct a networked computer that is way, way more powerful than just having individual creatures that are sort of intelligent like an orangutan. And if you compare it to our computer networks, it's a really funny kind of network, right? You know that these days we have networks that run around where we have sort of large network bandwidth. We might be frustrated sometimes with our Netflix downloads, but by and large we can download hundreds of megabytes really easily and quickly, and we don't think that's fast enough, so we're going to be rolling out 5G networks, which is an order of magnitude faster again. By comparison to that, human language is a pathetically slow network: the amount of information you can convey by human language is very slow. You know, whatever it is, I sort of speak at about 15 words a second, right? You can start doing your information theory if you know something about that, but you don't actually get much bandwidth at all. And that then leads you to think about how it works then. So humans have come up with this incredibly impressive system, which is essentially a form of compression, a very adaptive form of compression, so that when we're talking to people, we assume that they have an enormous amount of knowledge in their heads, which isn't the same as mine, but is broadly similar to it, when I'm talking to you. You know what English words mean, and you know a lot about how the world works, and therefore I can say a short message, communicating only a relatively short bit string, and you can actually understand a lot. Right, so I can say whatever, you know: "Imagine a busy shopping mall, and there are two guys standing in front of a makeup counter," and I've only said, whatever that was, about 200 bits of information, but that's enabled you to construct a whole visual scene that would take megabytes to represent as an image. So that's why language is good.
So, from that more ethereal level, I'll now move back to the concrete stuff. What we want to do in this class is not solve the whole of language; rather, we want to represent the meaning of words. Right, so a lot of language is bound up in words and their meanings, and words can have really rich meanings: as soon as you say the word "teacher", that's kind of got a lot of rich meaning, or you can have actions that have rich meanings, so if I say a word like "prognosticate" or "dawdle" or something, you know, these words have rich meanings and a lot of nuance on them. So we want to represent meaning, and so the question is: what is meaning? Well, dictionaries are meant to tell you about meaning, so you can look up dictionaries, and Webster sort of tries to relate meaning to ideas: the idea that is represented by a word or a phrase; the idea that a person wants to express by words, signs, etc. I mean, you know, you could think that these definitions are kind of a cop-out, because it seems like they're rewriting meaning in terms of the word "idea", and has that really gotten you anywhere?
How do linguists think about meaning? The most common way that linguists have thought about meaning is an idea that's called denotational semantics, which is also used in programming languages. The idea of that is that we think of meaning as what things represent: so if I say the word "chair", the denotation of the word "chair" includes this one here, and that one, and that one, and that one, and so the word "chair" is sort of representing all the things that are chairs. And you could then think of something like "running" as well: there's a sort of set of actions that people can partake in, and that's its denotation. That's what you most commonly see in philosophy or linguistics as denotation, but it's kind of a hard thing to get your hands on computationally.
So what do people most commonly do, or most commonly used to do, I guess I should say now, for working out the meaning of words on a computer? They commonly turned to something that was a bit like a dictionary; in particular, the favorite online thing was this online thesaurus called WordNet, which sort of tells you about word meanings and relationships between word meanings. This is just giving you the barest slice of a sense of what's in WordNet. So this is an actual bit of Python code up there, which you can type into your computer and run and do this for yourself. This uses a thing called NLTK; NLTK is sort of like the Swiss Army knife of NLP, meaning that it's not terribly good for anything, but it has a lot of basic tools. And so if you want to do something like just get some stuff out of WordNet and show it, it's the perfect thing to use.
Okay, so from NLTK I'm importing WordNet, and so then I can say: okay, for the word "good", tell me about the synonym sets that "good" participates in. And there's "good, goodness" as a noun; there's an adjective "good"; there's one "estimable, good, honorable, respectable"; and this looks really complex and hard to understand, but the idea of WordNet is that it makes these very fine-grained distinctions between senses of a word. So we're sort of saying that for "good" there are some senses where it's a noun, right? That's where you sort of say "I bought some goods for my trip"; that's one of these noun senses, like this one, I guess. Then there are adjective senses, and it's trying to distinguish: there's a basic adjective sense of "good" being good, and then in certain senses there are these extended senses of "good" in different directions. So I guess this is "good" in the sense of beneficial, and this one is sort of a person who is respectable, "he's a good man" or something like that, right? But you know, part of what makes this thing very problematic in practice to use is that it tries to make all these very fine-grained differences between senses that a human being can barely understand the difference between and relate to.
You can then do other things with WordNet. So with this bit of code you can sort of walk up what is a kind of hierarchy, so it's kind of like a traditional database. So if I start with a panda and say I'm going to start with a panda and walk up: pandas are procyonids (maybe you guys did bio), which are carnivores, placentals, mammals, and so on. Okay, so that's the kind of stuff you can get out of WordNet.
You know, in practice everyone sort of used to use WordNet because it gave you some sort of sense of the meaning of a word, but you know, it's also sort of well known that it never worked that well. The synonym sets miss a lot of nuance: one of the synonym sets for "good" has "proficient" in it, and "good" is sort of like "proficient", but doesn't "proficient" have some more connotations and nuance? I think it does. WordNet, like most hand-built resources, is sort of very incomplete, so as soon as you're coming to new meanings of words, or new words and slang words, WordNet gives you nothing. It's sort of built with human labor, in ways that, you know, make it hard to create and adapt. And in particular, what we want to focus on, which seems like a basic thing you'd like to do with words, is actually at least understanding similarities and relations between the meanings of words, and it turns out that WordNet doesn't actually do that that well, because it just has these sort of fixed, discrete synonym sets. So if you have words in a synonym set, they're sort of synonyms, but if words with maybe not exactly the same meaning are not in the same synonym set, you kind of can't really measure the partial resemblances of meaning for them. So something like "good" and "marvelous" aren't in the same synonym set, but there's something that they share in common that you'd like to represent. Okay, so that's kind of going to lead into us wanting to do something different and better for word meaning.
Before getting there, I just sort of want to again build a little from traditional NLP. Traditional NLP, in the context of this course, sort of means natural language processing up until approximately 2012. There were some earlier antecedents, but it was basically in 2013 that things really began to change, with people starting to use neural-net-style representations for natural language processing. So up until 2012, standardly, you know, we had words, and they're just words: so we had "hotel", "conference", "motel"; there were words, and we'd have lexicons and put words into our models. In neural networks land, this is referred to as a localist representation (I'll come back to those terms again next time), but that's sort of meaning that for any concept there's one particular place, which is the word "hotel" or the word "motel".
关于这一点，我们要考虑的是
about that is to think about what
当你制造一台机器时
happens when you build a machine
学习模式，如果你有
learning model so if you have a
分类变量就像你有单词一样
categorical variable like you have words
有了单词的选择，你想要
with the choice of word and you want to
坚持住
stick there
into some kind of classifier in a machine learning model, somehow you have to code that categorical variable, and the standard way of doing it is that you code it by having different levels of the variable, which means that you have a vector: this is the word house, this is the word cat, this is the word dog, this is the word agreeable, this is the word hotel, and this is another word for something different — right, so that you have put a one at that word's position. In neural net land we call these one-hot vectors, and so these might be our one-hot vectors for hotel and motel. So there are a couple of things that are bad here.
The one that's sort of a practical nuisance is that languages have a lot of words, right? So one of those dictionaries that you might have still had in school, they probably have about 250,000 words in them, but if you start getting into more technical and scientific English it's easy to get to a million words. I mean, actually, the number of words that you have in a language like English is infinite, because we have these processes, which are called derivational morphology, where you can make more words by adding endings onto existing words. So you can start with something like paternal, meaning fatherly, and then from paternal you can say paternalist, or paternalistic, or paternalism, and "I did it paternalistically" — right, there are all these ways that you can make bigger words by adding more stuff onto them, and so really you end up with an infinite space of words. Yeah, so that's a minor problem:
right, we have very big vectors if we want to represent a sensible-sized vocabulary. But there's a much bigger problem than that, which is: precisely what we want to do all the time is to understand relationships in the meaning of words. An obvious example of this is web search: if I do a search for Seattle motel, it'd be useful if it also showed me results that had Seattle hotel on the page, and vice versa, because a hotel and a motel are pretty much the same thing. But if we have these one-hot vectors like we had before, they have no similarity relationship between them — in math terms, these two vectors are orthogonal, no similarity relationship between them — and so you kind of get nowhere.
Now, there are things that you could do — hey, I just showed you WordNet, and it shows you some synonyms and stuff, so that might help a bit. There are other things you could do: you could say, well, wait, why don't we just build up a big table of word similarities, and work with that? And people used to try and do that — that's sort of what Google did in 2005 or something, it had word similarity tables. The problem with doing that is, you know, we were talking about how maybe we want five hundred thousand words, and if you then want to build up a word similarity table out of pairs of words from one-hot representations, that means the size of that table — my math is pretty bad — is 500,000 squared, around 250 billion cells, some very big number of cells in your similarity matrix. So that's almost impossible to do. So what we're going to do instead is explore a method in which we're going to represent words as vectors, in a way I'll show you in just a minute, in such a way that just the representation of a word gives you their similarity, with no further work. Okay.
And so that's going to lead into these different ideas. So I mentioned before denotational semantics; here's another idea for representing the meaning of words, which is called distributional semantics. And the idea of distributional semantics is: how we're going to represent the meaning of a word is by looking at the contexts in which it appears. So this is a picture of J.R. Firth, who was a British linguist, and he's famous for this saying: "You shall know a word by the company it keeps." But another person who's very famous for developing this notion of meaning is the philosopher Ludwig Wittgenstein, in his later writings, which he referred to as a use theory of meaning — actually he used some big German word that I don't know, but we'll call it a use theory of meaning. And essentially the point was: if you can explain in what contexts it's correct to use a certain word, versus in what contexts it would be the wrong word to use — this maybe gives you bad memories of doing English in high school, when people said that's the wrong word to use there — well, then you understand the meaning of the word. Right, and so that's the idea of distributional semantics, and it's been one of the most successful ideas in statistical NLP, because it gives you a great way to learn about word meaning.
And so what we're going to do is work out what the word banking means. So I'm going to grab a lot of text, which is easy to do now that we have the World Wide Web, and I'll find lots of sentences where the word banking is used — "government debt problems turning into banking crises, as happened in 2009" — and look, I'm just going to say that all of this stuff is the meaning of the word banking: those are the contexts in which the word banking is used. That seems a very simple, and perhaps even not quite right, idea, but it turns out to be a very usable idea that does a great job at capturing meaning. And so what we're going to do is say: rather than our old localist representation, we're now going to represent words in what we call a distributed representation. For the distributed representation, we're still going to represent the meaning of a word as a numeric vector, but now we're going to say that the meaning of each word is a smallish vector — but it's going to be a dense vector, whereby all of the numbers are nonzero. So the meaning of banking is going to be distributed over the dimensions of this vector. Now, my vector here is of dimension 9 because I wanted it to fit on the slide; life isn't quite that good in practice. When we do this, we use a larger dimensionality: kind of the minimum that people use is 50, a typical number that you might use on your laptop is 300, and if you want to really max out performance, maybe one thousand, two thousand, four thousand — but nevertheless orders of magnitude smaller compared to a length-500,000 vector.
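To contrast with the one-hot case, here's a toy sketch of dense vectors — the numbers are invented, not real learned vectors, and the dimension is 4 rather than the 50–4,000 just mentioned — where similarity falls out of the representation itself, for instance via the cosine of the angle between vectors:

```python
import math

# Made-up dense vectors: every component is nonzero, and meaning is spread
# ("distributed") across the dimensions. Real vectors would come from
# training, not be hand-written like this.
hotel  = [0.7, 0.3, -0.2, 0.5]
motel  = [0.6, 0.2, -0.3, 0.4]
france = [-0.5, 0.8, 0.1, -0.6]

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# hotel and motel point in similar directions; france does not.
hm = cosine(hotel, motel)
hf = cosine(hotel, france)
```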
Okay, so we have words with their vector representations, and since each word is going to have a vector representation, we then have a vector space in which we can place all of the words. And that's completely unreadable, but if you zoom into the vector space it's still completely unreadable, and if you zoom in a bit further you can find different parts of this space. So here's the part where countries are tending to exist: Japanese, German, French, Russian, and British, Australian, American, France, Britain, Germany, etc. And you can shift over to a different part of the space, so here's a part of the space where various verbs are: has, have, had, been, being, remain, be, are, is, was — where you can even see that the same morphological forms are grouping together, and things that sort of go together, like say, think, expect — the things that take those kinds of complements, "he said or thought something" — they group together. Now, what am I actually showing you here?
Really, this was built from 100-dimensional word vectors, and there's this problem that it's really hard to visualize 100-dimensional word vectors. So what is actually happening here is that these 100-dimensional word vectors are being projected down into two dimensions, and you're seeing a two-dimensional view, which I'll get back to later. So on the one hand, whenever you see these pictures you should hold onto your wallet, because there's a huge amount of detail in the original vector space that got completely killed and went away in the 2D projection, and indeed some of what pushes things together in the 2D projection may really just misrepresent what's in the original space. But even looking at these 2D representations, the overall feeling is: by gosh, this actually sort of works, doesn't it — we can sort of see similarities between words. Okay, so that was the idea of what we want to do; the next part is then how we actually go about doing it. I will pause for breath for half a minute —
has anyone got a question they're dying to ask? No? Okay, so what we're going to present is a learning algorithm, where we just sort of shovel in lots of text and, miraculously, these word vectors come out — the learning algorithm itself decides the dimensions. But that actually reminds me of something I meant to say, which was: since this is a vector space, in some sense the dimensions of it are arbitrary, right? Because you can just have your basis vectors in any different direction, and you could re-represent the words in the vector space with a different set of basis vectors, and it'd be exactly the same vector space, just rotated around to your new vectors. So you shouldn't read too much into the individual elements — though it actually turns out that, because of the way a lot of deep learning operations work, some things they do element-wise, so the dimensions do actually tend to get some meaning to them, it turns out. But the thing I really wanted to say was: one thing we can just think of is how close things are in the vector space, and that's a notion of meaning similarity that we're going to exploit. But you might hope that you get more than that — you might actually think that there's meaning in different dimensions and directions of the word vector space — and the answer is that there is, there is, and I'll come back to that a bit later.
Okay, so in some sense the thing that had the biggest impact in turning the world of NLP in a neural networks direction was this algorithm that Tomas Mikolov came up with in 2013, called the word2vec algorithm. It wasn't the first work on having distributed representations of words — there was older work from Yoshua Bengio that went back to about the turn of the millennium, but somehow it hadn't really hit the world over the head and had a huge impact — and it was really that Tomas Mikolov showed this very simple, very scalable way of learning vector representations of words, and that really opened the floodgates. And so that's the algorithm that I'm gonna show now. Okay, so the idea of this algorithm is you start with a big pile of text — whatever you find, you know, webpages, newspaper articles, something with a lot of continuous text, right, actual sentences, because we want to learn word meaning from context. Um, NLP people call a large pile of text a corpus, and that's just the Latin word for body — it's a body of text. Important things to know if you want to seem really educated: in Latin this is a fourth declension noun, so the plural of corpus is corpora, whereas if you say "corpi" everyone will know that you didn't study Latin in high school.
Okay, so, right, we then want to say that every word in a fixed vocabulary — which would just be the vocabulary of the corpus — is represented by a vector, and we just start those vectors off as random vectors. And so then what we're going to do is a big iterative algorithm where we go through each position in the text, and we say: here's a word in the text, let's look at the words around it. And what we're going to want to do is say, well, the meaning of a word is its context of use, so we want the representation of the word in the middle to be able to predict the words that are around it. And we're going to achieve that by moving the position of the word vector, and we just repeat that a billion times, and somehow a miracle occurs, and out comes at the end a word vector space that looks like the picture I showed, one that has a good representation of word meaning. So, slightly more graphically: here's the situation. We've got part of our corpus — "problems turning into banking crises" — and what we want to say is, well, we want to know the meaning of the word "into", and so we're going to hope that its representation can be used, in a way that I'll make precise, to predict what words appear in the context of "into", because that's the meaning of "into". So we're going to try and make those predictions, see how well we can predict, and then change the vector representations of words in such a way that we can do that prediction better. And then once we've dealt with "into" we just go on to the next word, and we say, okay, let's take "banking" as the word: the meaning of banking is predicting the contexts in which banking occurs; here's one context, let's try and predict these words that occur around banking and see how we do, and then we'll move on again from there. Okay.
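The window-sliding procedure just described — pick a center word, look at the words around it — can be sketched like this (a toy illustration using the lecture's example fragment; the window size m=2 is an arbitrary choice):

```python
# Slide a window over the text: for each position t, pair the center word
# with every outside word up to m positions away (skipping the center itself).
def context_pairs(words, m):
    pairs = []
    for t, center in enumerate(words):
        for j in range(-m, m + 1):
            if j != 0 and 0 <= t + j < len(words):
                pairs.append((center, words[t + j]))
    return pairs

corpus = "problems turning into banking crises".split()
pairs = context_pairs(corpus, m=2)
# With "into" as the center word, the model should learn to predict
# "problems", "turning", "banking" and "crises".
```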
Sounds easy so far. Now we go on and do a bit more stuff. Okay, so overall we have a big, long list of capital-T words — if we have a whole lot of documents, we just concatenate them all together and say, okay, here's a billion words as a big long list of words. And so what we're going to do is: for the first product, we're going to go through all the words, and then for the second product, we're going to choose some fixed-size window — it might be five words on each side or something — and we're going to try and predict the ten words that are around our center word. And we're going to predict in the sense of trying to predict that word given the center word — that's our probability model. So if we multiply all those things together, that's our model likelihood: how good a job it does at predicting the words around every word. And that model likelihood is going to depend on the parameters of our model, which we write as theta, and in this particular model the only parameters in it are actually going to be the vector representations we give to words — the model has absolutely no other parameters. So we're just going to say: we are representing a word with a vector in a vector space, and that representation of it is its meaning, and we're then going to be able to use that to predict what other words occur, in a way I'm about to show you. Okay, so that's our likelihood, and what we do in all of these models is define an objective function, and then we want to come up with vector representations of words in such a way as to minimize our objective function. So our objective function is basically the same as what's on the top half of the slide, but we change a couple of things. We stick a minus sign in front of it so we can do minimization rather than maximization — completely arbitrary, makes no difference. We stick a 1/T in front of it so that we're working out the average goodness of predicting for each choice of center word — again, that makes no difference, but it keeps the scale of things not dependent on the size of the corpus.
The bit that's actually important is that we stick a log in front of the function that was up there, because it turns out that everything always gets nicer when you stick logs in front of products when you're doing things like optimization. So when we do that, we've got a log of all these products, which allows us to turn the products into sums of the log of this probability — and we'll go through that again in just a minute. Okay, and so if we can change our vector representations of these words so as to minimize this J of theta, that means we'll be good at predicting words in the context of another word. So that all sounded good, but it was all dependent on having this probability function, where you want to predict the probability of a word in the context given the center word, and the question is: how can you possibly do that?
Well, remember what I said: actually, our model is just going to have vector representations of words, and those are the only parameters of the model. Now, that's almost true — it's not quite true. We actually cheat slightly, since we actually propose two vector representations for each word, and this makes things simpler. You could not do this — there are ways to get around it — but this is the simplest way to do it. So we have one vector for a word when it's the center word, predicting other words, but we have a second vector for each word when it's a context word, that is, one of the words in the context. So for each word type we have these two vectors: as center word and as context word. Then we're going to work out this probability of a word in the context given a center word purely in terms of these vectors, and the way we do it is with this equation right here, which I'll explain more in just a moment. So we're still in exactly the same situation, right — we're wanting to work out probabilities of words occurring in the context of our center word. So the center word is c, and the outside words are represented with o — this is some of the slide's notation — but basically we're saying there's one kind of vector for center words, there's a different kind of vector for context words, and we're going to work out this probabilistic prediction in terms of these word vectors. Okay, so how
can we do that? Well, the way we do it is with this formula here, which is a sort of shape that you see over and over again in deep learning with categorical stuff. So for the very center bit, the bit in orange — and the same thing occurs in the denominator — what we're doing there is calculating a dot product. We're going to go through the components of our vectors and multiply them together, and that means if different words have big components of the same sign, plus or minus, in the same positions, the dot product will be big; and if they have different signs, or one's big and one's small, the dot product will be a lot smaller. So that orange part directly calculates a similarity between words, where the similarity is the vectors looking the same — and that's the heart of it, right: words that have similar vectors, close together in the vector space, have similar meaning. So, for the rest of it: the next thing we do is take that number and put it in an exponential. The exponential has this nice property that no matter what number you stick into it — because the dot product might be positive or negative — it's going to come out as a positive number, and if we eventually want to get a probability, that's really good: we want positive numbers, not negative numbers. So that's good. Then the third part of it, which is the bit in blue, is that we wanted to have probabilities, and probabilities are meant to add up to one, and we do that in the standard, dumbest possible way: we sum up what this quantity is for every different word in our vocabulary, and we divide through by it, and so that normalizes things and turns them into a probability distribution.
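That three-step recipe — dot product, exponential, normalize — can be sketched directly (a minimal illustration; the tiny u and v vectors are made up, standing in for learned parameters):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax_prob(o, c, U, V):
    """P(o | c) = exp(u_o . v_c) / sum over the vocab of exp(u_w . v_c)."""
    exps = {w: math.exp(dot(u_w, V[c])) for w, u_w in U.items()}
    return exps[o] / sum(exps.values())

V = {"bank": [0.5, 0.2]}          # center-word ("v") vectors
U = {"branch": [0.6, 0.1],        # context-word ("u") vectors
     "withdrawal": [0.4, 0.3],
     "neural": [-0.7, -0.5]}

p = {w: softmax_prob(w, "bank", U, V) for w in U}
# The probabilities are all positive and sum to one, and words whose u
# vector points the same way as v_bank get most of the mass.
```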
Yeah, so in practice there are two parts. There's the orange part, which is this idea of using the dot product in a vector space as our similarity measure between words, and then the second part is all the rest of it, where we feed it through what we refer to, and use all the time, as a softmax distribution. So the two parts — exponentiating and normalizing — give you a softmax distribution, and softmax functions will map any numbers into a probability distribution, always, for the two reasons that I gave. And it's referred to as a softmax because it works like a soft max, right? If you have numbers, you could just say what's the max of these numbers, and that's sort of a hard max: you map your original numbers into — it's the max for the max, and everything else is 0. This is a soft max because of the exponential: if you imagine this, but just ignore the problem of negative numbers for a moment, and you got rid of the exp, then you'd sort of come out with a probability distribution, but by and large it'd be fairly flat and wouldn't particularly pick out the max of the different numbers; whereas when you exponentiate them, that makes big numbers way bigger, and so the softmax mainly puts mass where the max is, or where the couple of maxes are. So that's the "max" part, and the "soft" part is that this isn't a hard decision — it still spreads a little bit of probability mass everywhere else. Okay, so now we have a
loss function — a loss function with a probability model on the inside that we can build. And so what we want to be able to do is then move our vector representations of words around so that they are good at predicting what words occur in the context of other words. So at this point what we're going to do is optimization. We have vector components of different words — we have a very high-dimensional space again, but I've just got two dimensions for the picture — and we're going to want to say: how can we minimize this function? We're going to want to jiggle the numbers that are used in the word representations in such a way that we're walking down the slope of this space — walking down the gradient — and then we're going to minimize the function, and we'll have found good representations for words. So, doing this for this case: we want to make theta, a very big vector in a very high-dimensional vector space, of all the parameters of our model. And the only parameters that this model has are literally the vector space representations of words: so if they're hundred-dimensional word representations, there are a hundred parameters for aardvark as a context word, a hundred parameters for the word art as a context word, etc., going through to a hundred parameters for the word aardvark as a center word, etc., etc. That gives us a big vector of parameters to optimize, and we're going to run this optimization and move them downhill. Yeah, so that's essentially what you do. I sort of wanted to go through the details of this, just so we've gone through things concretely, to make sure everyone is on the same page. So I suspect that if I try and do this concretely there are a lot of people that this will bore, and some people that it will bore very badly, so I'll apologize to you, but I'm hoping and thinking that there are probably some people who haven't done as much of this stuff recently, and it might actually be good to do it concretely and get everyone up to speed right at the beginning. Yeah?
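The parameter count just walked through fits in one line (a sketch; the 500,000-word vocabulary and 100 dimensions are the lecture's example numbers):

```python
# Theta is one long vector holding every parameter of the model: one
# d-dimensional context ("u") vector and one d-dimensional center ("v")
# vector per vocabulary word, and nothing else.
def num_parameters(vocab_size, d):
    return 2 * vocab_size * d

theta_size = num_parameters(500_000, 100)  # the lecture's example scale
```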
So the way we calculate the u and v vectors is: we're literally going to start with a random vector for each word, and then we're iteratively going to change those vectors a little bit as we learn. And the way we're going to work out how to change them is we're going to say: I want to do optimization, and that is going to be implemented as — okay, we have the current vectors for each word; let me do some calculus to work out how I could change the word vectors so that the word vectors would calculate a higher probability for the words that actually occur in the context of this center word. And we will do that, and we'll do it again and again and again, and then we'll eventually end up with good word vectors. Thank you for that question, because that's the concept that you're meant to have understood of how this works, and maybe I didn't explain that high-level recipe well enough.
Yeah, okay, so let's just go through it. So we've seen it, right: we had this formula that we wanted to maximize, the original likelihood function, which was the product over t = 1 to T, and then the product over the word positions j with minus m less than or equal to j less than or equal to m, j not equal to 0, of the probability of w at t plus j given w at t, according to the parameters of our model. Okay, and then we'd already seen that we were going to convert that into the function that we were actually going to use, where we have J of theta, which is minus 1 over T times the sum over t = 1 to T of the sum over minus m less than or equal to j less than or equal to m, j not equal to 0, of the log of the probability of w at t plus j given w at t. Okay, so we had that, and then we had this formula that the probability of the outside word given the center word is the formula we just went through: exp(u_o^T v_c) over the sum over w = 1 to the vocabulary size V of exp(u_w^T v_c). Okay, so that's sort of our model.
We want to minimize this, and we want to minimize it by changing these parameters, and these parameters are the contents of these vectors. And so what we want to do now is do calculus, and we want to say: let's work out, in terms of these parameters, which are our u and v vectors, for the current values of the parameters, which were initialized randomly, what's the slope of the space, where is downhill. Because if we can work out downhill, we're just going to walk downhill and our model gets better. So we're going to take derivatives, work out what direction downhill is, and then we want to walk that way.

Yeah. What I'm wanting to achieve for my distributional notion of meaning is: I have, for a meaningful word, a vector, and that vector knows what words occur in the context of the word itself, and knowing what words occur in its context means it can accurately give a high probability estimate to those words that occur in the context, and it will give low probability estimates to words that don't typically occur in the context. So you know, if the word is "bank", I'm hoping that words like "branch" and "open" and "withdrawal" will be given high probability, because they tend to occur with the word "bank", and I'm hoping that some other words, like "neural network" or something, have a lower probability, because they don't tend to occur with the word "bank". Okay, does that make sense?
Yeah, and the other thing I meant to comment on: you know, obviously we're not going to be able to do this super well. It's just not going to be the case that we can say the word in the context is going to be this word with probability 0.97, right, because we're using this one simple probability distribution to predict all words in our context. In particular, we're using it to predict ten different words, generally, right, so at best we can be giving around a 5% chance to each of them; we can't possibly be guessing right every time, and, well, you know, there are going to be different contexts with different words in them. So you know, it's going to be a very loose model, but nevertheless we want to capture the fact that, you know, "withdrawal" is much more likely to occur near the word "bank" than something like "football". That's basically what our goal is, okay.
Yes, so we want to maximize this by minimizing this, which means we then want to do some calculus to work this out. So what we're then going to do is say: well, these parameters are our word vectors, and we're going to want to move these word vectors so as to work things out as to how to walk downhill. So the case that I'm going to do now is going to look at the parameters of this center word, v_c, and work out how to take derivatives with respect to it. Now, that's not the only thing that you want to do: you also want to work out the slope with respect to the u_o vector. I'm not going to do that, because time in class is going to run out, so it'd be really good if you did that one at home, and then you'd feel much more confident. Right, so what I'm wanting to do is work out the partial derivative with respect to my v_c vector
of this quantity that we were just looking at, which is the quantity in here, where we're taking the log of that quantity, right: the log of exp(u_o^T v_c) over the sum over w = 1 to V of exp(u_w^T v_c). Okay, so now we have a log of a division, so that's easy to rewrite: we have the partial derivative of the log of the numerator, minus, because I can distribute the partial derivative, the partial derivative of the log of the denominator. Okay, so this is sort of what was the numerator, and this is what was the denominator. Okay, so the part that was the numerator is really easy; in fact, maybe I can fit it in here. So log and exp are just inverses of each other, so they cancel out, so we've got the partial derivative of u_o^T v_c. Okay, so at this point I should just
remind people, right: this v_c here is a vector, still a vector, because we have a hundred-dimensional representation of a word, so this is doing multivariable calculus. So you know, if you at all remember any of this stuff, you can say "ah, this is trivial, the answer is u_o", and that's great. But you know, if you're feeling not so good on all of this stuff and you want to sort of cheat a little on the side and try to work out what it is, you can sort of say: well, let me work out the partial derivative with respect to one element of this vector, like the first element of this vector. Well, what I've actually got here for this dot product is u_{o,1} times v_{c,1}, plus u_{o,2} times v_{c,2}, plus dot dot dot, plus u_{o,100} times v_{c,100}, right, and I'm finding the partial derivative of this with respect to v_{c,1}, and hopefully you remember that much calculus from high school: none of these terms involves v_{c,1} except the first, so the only thing that's left is this u_{o,1}, and that's what I've got there for this dimension, this particular parameter. But I don't only want to do the first component of the v_c vector; I also want to do the second component of the v_c vector, etc., which means I'm going to end up with each of them turning up in precisely one of these terms, and so the end result is I get the vector u_o. Okay, but you know, if you're sort of getting confused and your brain is falling apart, I think it can be kind of useful to reduce things to single-dimensional calculus and actually play out what's actually happening. Anyway, this part was easy: for the numerator we get u_o.
So things aren't quite so nice when we do the denominator. So we now want to have d/dv_c of the log of the sum over w = 1 to V of exp(u_w^T v_c). Okay, so now, at this point, it's not quite so pretty: we've got this log-sum-exp combination that you see a lot, and so at this point you have to remember that there was this thing called the chain rule. Okay, so what we can say is: here's, you know, our function f, and here's the body of the function, and so what we want to do is do it in two stages, so that at the end of the day we've got this v_c at the end; so we have sort of some function here that is ultimately a function of v_c, and so we're going to do it with the chain rule. So the chain rule is: we first take the derivative of this outside thing, plugging in this body, and then we remember that the derivative of log is 1 over x, so we have 1 over the sum over w = 1 to V of exp(u_w^T v_c), and then we need to multiply that by the derivative of the inside part, which is what we have here, times the derivative of the inside part, with the important reminder that you need to do a change of variables and, for the inside part, use a different variable that you're summing over. Okay, so now we're trying to find the derivative of a sum of exps, and the first thing that we can do is very easy: we can move the derivative inside a sum, so we can rewrite that and have the sum first, the sum over x = 1 to V of the partial derivative with respect to v_c of these. So that's a little bit of progress, and at that point we have to sort of do the chain
rule all over again. Right, so here's our function, and here's the thing in it, which again is some function of v_c, so we again want to do the chain rule. And so we then have: well, the derivative of exp is exp, so we can have the sum over x = 1 to V of exp(u_x^T v_c), and then we're going to multiply that by the partial derivative with respect to v_c of the inside, u_x^T v_c. Well, we saw that one before, so the derivative of that is u, well, here, u_x, because we're doing it with the different variable x. Right, so this then comes out as u_x, and so we have the sum over x = 1 to V of exp(u_x^T v_c) times u_x. Okay, so by doing the chain rule twice we've got that, and so now, if we put it together, you know, our derivative with respect to v_c of the whole thing, this log of the probability of o given c, right: for the numerator it was just u_o, and then we're subtracting: we had this term here, which is sort of a denominator, and then we have this term here, which is a numerator. So, subtracting, in the numerator we have the sum over x = 1 to V of exp(u_x^T v_c) times u_x, and then in the denominator we have the sum over w = 1 to V of exp(u_w^T v_c). Okay, so we kind of get...
oh wait, yeah, yeah, that's right. Okay, we kind of get that, and then we can sort of just rearrange this a little, so we can have this sum right out front, and we can say that this is our big sum over x = 1 to V, and we can sort of take that u_x at the end and say, okay, let's put it over here as u_x. And if we do that, sort of an interesting thing has happened, because look: right here we've rediscovered exactly the same form that we use as our probability distribution for predicting the probability of words. So this is now simply the probability of x given c according to our model, so we can rewrite this and say that what we're getting is u_o minus the sum over x = 1 to V of the probability of x given c, times u_x. And this has a kind of an interesting meaning, if you think about it.
So this is actually giving us, you know, our slope in this multi-dimensional space, and how we're getting that slope is: we're taking the observed representation of the context word, and we're subtracting from that what our model thinks the context should look like. And what does the model think the context should look like? This part here is formally an expectation. So what you're doing is you're finding the weighted average of the model's representation of each word, multiplied by the probability of it in the current model. So this is sort of the expected context word according to our current model, and so we're taking the difference between the expected context word and the actual context word that showed up, and that difference then turns out to exactly give us the slope as to which direction we should be walking, changing the word's representation, in order to improve our model's ability to predict.
Okay, so that'll be assignment two. Yeah, so it'd be a great exercise for you guys to try and do that: what I did for the center word, try and do it for the context words as well, and show that you can do the same kind of piece of math and have it work out. I've just got a few minutes left at the end, so what I just wanted to show you, if I can get all of this to work, all right...

Okay, okay, so I just want to show you a quick example. So for the first assignment, it's an IPython notebook, so if you're all set up, you sort of run "jupyter notebook" and you have some notebook. Here's my little notebook I'm going to show you, and the trick will be to make this big enough that people can see it. Is that readable? Okay.
So, right: numpy is the sort of do-math package in Python; you'll want to know about that if you don't know about it. matplotlib is sort of one of the most basic graphing packages; if you don't know about that, you're going to want to know about it. This is sort of an IPython or Jupyter special that lets you have interactive matplotlib inline, and if you want to get fancy, you can play with your graphic styles, so there's that. scikit-learn is kind of a general machine learning package. gensim is kind of a word-similarity package, which started off with methods like latent Dirichlet allocation, if you know about that, for modeling word similarities; it's sort of grown into a good package for doing word vectors as well, so it's quite often used for word vectors and word similarities, and it's sort of efficient for doing things at large scale.
Yeah, so I haven't yet told you about, we will next time, our own homegrown form of word vectors, which are the GloVe word vectors. I'm using them not because it really matters for what I'm showing, but, you know, these vectors are conveniently small. It turns out that the vectors that Facebook and Google distribute have extremely large vocabularies and are extremely high dimensional, so it'd take me just too long to load them in the last five minutes of this class, whereas, conveniently, in our Stanford vectors we have hundred-dimensional vectors and 50-dimensional vectors, which are kind of good for doing small things on a laptop, frankly. Um, so what I'm doing here: gensim doesn't natively support GloVe vectors, but they actually provide a utility that converts the GloVe file format to the word2vec file format, so I've done that, and then I've loaded a pre-trained model of word vectors. And so this is what they call KeyedVectors, and a keyed vector is nothing fancy: it's just that you have words, like "potato", and there's a vector that hangs off each one, so it's really just sort of a big dictionary with a vector for each thing.
But so this model is a trained model, where we just used the kind of algorithm we looked at and, you know, trained it billions of times, fiddling our word vectors, and once we have one, we can then ask questions like: what is the most similar word to some other word? So we could ask something like: what are the most similar words to "Obama", let's say, and we get back Barack, Bush, Clinton, McCain, Gore, Hillary, Dole, Romney, Kerry. That seems actually kind of interesting. These vectors are from a few years ago, so we don't have the post-Obama staff. I mean, you can put in another word: you know, we can put in something like "banana", and we get coconut, mango, bananas, potato, pineapple; we get kind of tropical food. So you can actually ask it for words that are dissimilar to a word.
By itself, dissimilar isn't very useful, so if I ask for most_similar and I say negative equals banana, well, I'm not sure what your concept of what's most dissimilar to "banana" is, but, you know, actually, by itself, you don't get anything useful out of this, because you just sort of get these weird, really rare words, which definitely weren't the ones you were thinking of. But it turns out you can do something really useful with this negative idea, which was one of the highly celebrated results of word vectors when they first started off, and that was this idea that there are
actually dimensions of meaning in this space. And so this was the most celebrated example, which was: look what we could do. We could start with the word "king" and subtract from it the meaning of "man", and then we could add to it the meaning of "woman", and then we could say which word in our vector space is most similar in meaning to that, and that would be a way of sort of doing analogies; we'd be able to do the analogy "man is to king as woman is to what?". And so the way we're going to do that is to say we want to be similar to "king" and "woman", because they're both the positive ones, and far away from "man". So we could do that manually: here it is manually, most_similar with positive woman and king, negative man, and we can run this, and lo and behold it produces "queen". To make that a little bit easier, I defined this analogy predicate, so I can run other ones.
And so I can run another one, like the analogy "Japan is to Japanese as Austria is to?", and it says Austrian. And you know, I think it's fair to say that when people first saw that you could have this simple piece of math and run it and learn meanings of words, I mean, it actually just sort of blew people's minds how effective this was. You know, there's no smoke and mirrors here, right: it's not that I have some special list in my Python, a dict that I'm looking up for Austria and Austrian and things like that, but somehow these vector representations are such that they're actually encoding these semantic relationships. You know, so you can try different ones; it's not that only this one works: I can put in France, it says French; I can put in Germany, it says German; I can put in Australia (yeah, not Austria) and it says Australian.

So somehow it's learned a vector representation of words such that these ideas, like understanding the relationships between words, fall out of just doing vector-space manipulation on these hundred-dimensional numbers: it actually knows not only the similarities of word meanings, but also different semantic relationships between words, like country names and their peoples. And you know, that's actually pretty amazing, and it's sort of surprising that running such a dumb algorithm on vectors of numbers could capture the meaning of words so well, and so that sort of became the foundation of a lot of modern distributed neural representations of words. Okay, I'll stop there. Thanks a lot, guys, and see you on Thursday.
