
The following
content is provided
under a Creative
Commons license.
Your support will help MIT
OpenCourseWare continue
to offer high quality
educational resources for free.
To make a donation or
view additional materials
from hundreds of MIT courses,
visit MIT OpenCourseWare
at ocw.mit.edu.
PROFESSOR: So today's
lecture is on sorting.
We'll be talking about specific
sorting algorithms today.
I want to start
by motivating why
we're interested in sorting,
which should be fairly easy.
Then I want to discuss
a particular sorting
algorithm that's
called insertion sort.
That's probably the
simplest sorting algorithm
you can write; it's
five lines of code.
It's not the best
sorting algorithm
that's out there and so
we'll try and improve it.

We'll also talk about merge
sort, which is a divide
and conquer algorithm
and that's going
to motivate the last thing
that I want to spend time on,
which is recurrences and
how you solve recurrences.
Typically the
recurrences that we'll
be looking at in 6.006
are going to come from divide
and conquer problems
like merge sort
but you'll see
this over and over.
So let's talk about why
we're interested in sorting.
There's some fairly
obvious applications
like if you want to
maintain a phone book,
you've got a bunch of names
and numbers corresponding
to a telephone
directory and you want
to keep them in
sorted order so it's
easy to search, mp3 organizers,
spreadsheets, et cetera.
So there's lots of
obvious applications.
There's also some
interesting problems

that become easy once
items are sorted.
One example of that
is finding a median.
So let's say that you
have a bunch of items
in an array A 0 through
n, and A 0 through n
contains n numbers and
they're not sorted.
When you sort, you
turn this into B 0
through n, where if
it's just numbers, then
you may sort them in increasing
order or decreasing order.
Let's just call it
increasing order for now.

Or if they're records,
and they're not numbers,
then you have to provide
a comparison function
to determine which record is
smaller than another record.
And that's another
input that you
have to have in order
to do the sorting.
So it doesn't really
matter what the items are
as long as you have the
comparison function.
Think of it as less
than or equal to.
And if you have that and
it's straightforward,
obviously, to check that 3
is less than 4, et cetera.
But it may be a little
more complicated
for more sophisticated
sorting applications.
But the bottom line is that if
you have your algorithm that
takes a comparison
function as an input,
you're going to be able to,
after a certain amount of time,
get B 0 through n.
Now if you wanted to find the
median of the set of numbers
that were originally
in the array A,
what would you do once you
have the sorted array B?
AUDIENCE: Isn't there a more
efficient algorithm for median?

PROFESSOR: Absolutely.
But this is sort of a side
effect of having a sorted list.
If you happen to
have a sorted list,
there's many ways
that you could imagine
building up a sorted list.
One way is you have something
that's completely unsorted
and you run insertion
sort or merge sort.
Another way would be to
maintain a sorted list as you're
getting items put into the list.
So if you happened
to have a sorted list
and you need to have this
sorted list for some reason,
the point I'm making here
is that finding the median
is easy.
And it's easy because
all you have to do
is look at-- depending
on whether n is odd
or even-- look at B of n over 2.
That would give you the
median because you'd
have a bunch of numbers
that are less than that
and an equal set of numbers
that are greater than that,
which is the
definition of median.
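The point being made, that finding the median is constant time once you have the sorted array B, can be sketched in Python; the helper name `median_via_sort` is just for illustration, and the even-n averaging convention is an assumption:

```python
def median_via_sort(A):
    """Sort first, then the median is a constant-time lookup."""
    B = sorted(A)               # theta(n log n) with a comparison sort
    n = len(B)
    if n % 2 == 1:
        return B[n // 2]        # middle element when n is odd
    # one common convention for even n: average the two middle elements
    return (B[n // 2 - 1] + B[n // 2]) / 2
```

Once the sorting cost is paid, every later median query on B is O(1).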

Undetermined: 
教授：当然。
但是，这是那种具有排序列表的副作用。
如果你碰巧有一个排序的列表，
有很多方法，你可以想像
建立一个排序的列表。
一种方法是，你有什么事情，是完全不排序
并在运行插入排序或归并排序。
另一种方法是保持一个排序列表作为你
越来越投入列表项。
所以，如果你碰巧有一个排序列表
你需要有这个排序列表因为某些原因，
我在这里做的一点是，找到位数
是容易的。
而且很容易，因为所有你需要做的
是看at--取决于n是奇数
或even--看的n B，历经2。
这将使你的中位数，因为你最好
有一串数字是低于
和相等组数字是大于，
这是中值的定义。

Undetermined: 
因此，这并不一定是最好的方法，正如你所指出，
找到的中位数。
但它是固定的时间，如果你有一个排序的列表。
这是我想说明一点。
还有其他的事情，你可以做。
这个就在埃里克的演讲，
这是二进制search--发现的概念
在一个元件array--的特定元素。
你有items--名单再次0到n。
你正在寻找一个具体的数字或项目。
 
你可以很明显，扫描阵列，
这将带你线性时间找到这个项目。
如果阵列最少要排序，
那么你就可以在对数时间发现这

English: 
So this is not necessarily the
best way, as you pointed out,
of finding the median.
But it's constant time if
you have a sorted list.
That's the point
I wanted to make.
There are other things
that you could do.
And this came up
in Erik's lecture,
which is the notion of
binary search-- finding
an element in an array--
a specific element.
You have a list of items--
again A 0 through n.
And you're looking for a
specific number or item.
You could, obviously,
scan the array,
and that would take you
linear time to find this item.
If the array happened
to be sorted,
then you can find this
in logarithmic time

using what's called
binary search.
Let's say you're looking
for a specific item.
Let's call it k.
Binary search, roughly
speaking, would
work like-- you go compare
k to, again, B of n over 2,
and decide, given
that B is sorted,
you get to look at
1/2 of the array.
If B of n over 2 is not
exactly k, then-- well,
if it's exactly k you're done.
Otherwise, you look
at the left half.
You do your divide
and conquer paradigm.
And you can do this
in logarithmic time.
So keep this in mind,
because binary search
is going to come up
in today's lecture
and again in other lectures.
It's really a great
paradigm of divide
and conquer--
probably the simplest.
And it, essentially,
takes something

that's linear--
a linear search--
and turns it into
logarithmic search.
So those are a
couple of problems
that become easy if
you have a sorted list.
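The logarithmic search just described can be sketched as follows, assuming B is already sorted; the function name and the -1 not-found convention are illustrative choices, not from the lecture:

```python
def binary_search(B, k):
    """Return an index i with B[i] == k, or -1 if k is absent.

    B must be sorted in increasing order.
    """
    lo, hi = 0, len(B) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if B[mid] == k:
            return mid          # found it: done
        elif B[mid] < k:
            lo = mid + 1        # k can only be in the right half
        else:
            hi = mid - 1        # k can only be in the left half
    return -1
```

Each iteration discards half of the remaining range, giving theta(log n) compares.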
And there's some not
so obvious applications
of sorting-- for example,
data compression.
If you wanted to
compress a file,
one of the things that
you could do is to--
and it's a set of items--
you could sort the items.
And that automatically
finds duplicates.
And you could say, if I have 100
items that are all identical,
I'm going to compress the file
by representing the item once
and, then, having
a number associated
with the frequency of that
item-- similar to what
document distance does.
Document distance can
be viewed as a way
of compressing
your initial input.

Obviously, you lose the works of
Shakespeare or whatever it was.
And it becomes a bunch
of words and frequencies.
But it is something that
compresses the input
and gives you a
different representation.
And so people use sorting as a
subroutine in data compression.
Computer graphics uses sorting.
Most of the time,
when you render
scenes in computer graphics,
you have many layers
corresponding to the scenes.
It turns out that,
in computer graphics,
most of the time you're
actually rendering
front to back because,
when you have a big opaque
object in front, you want
to render that first,
so you don't have to worry
about everything that's
occluded by this
big opaque object.
And that makes things
more efficient.
And so you keep things
sorted front to back,

most of the time, in
computer graphics rendering.
But some of the time, if you're
worried about transparency,
you have to render
things back to front.
So typically, you
have sorted lists
corresponding to the different
objects in both orders--
both increasing order
and decreasing order.
And you're maintaining that.
So sorting is a real
important subroutine
in pretty much any sophisticated
application you look at.
So it's worthwhile to look
at the variety of sorting
algorithms that are out there.
And we're going to do
some simple ones, today.
But if you go and
look at Wikipedia
and do a Google search,
there's all sorts
of sorts like cocktail
sort, and bitonic sort,
and what have you.
And there's reasons why each of
these sorting algorithms exist.
Because in specific
cases, they end up
winning on types of inputs
or types of problems.
So let's take a look at our
first sorting algorithm.

I'm not going to write code
but it will be in the notes.
And it is in your document
distance Python files.
But I'll just give
you pseudocode here
and walk through what
insertion sort looks like
because the purpose
of describing
this algorithm to you is
to analyze its complexity.
We need to do some
counting here,
with respect to this
algorithm, to figure out
how fast it's going to run
in and what the worst case
complexity is.
So what is insertion sort?
For i equals 1, 2, through n,
given an input to be sorted,
what we're going to do is
we're going to insert A of i
in the right position.
And we're going
to assume that we
are sort of midway through
the sorting process, where

we have sorted A 0
through i minus 1.
And we're going to
expand this to this array
to have i plus 1 elements.
And A of i is going
to get inserted
into the correct position.
And we're going to do
this by pairwise swaps
down to the correct position
for the number that is initially
in A of i.
So let's go through
an example of this.
We're going to sort
in increasing order.
Just have six numbers.
And initially, we
have 5, 2, 4, 6, 1, 3.
And we're going to
take a look at this.

And you start with the index
1, or the second element,
because the very first
element-- it's a single element
and it's already
sorted by definition.
But you start from here.
And this is what
we call our key.
And that's essentially a pointer
to where we're at, right now.
And the key keeps
moving to the right
as we go through the different
steps of the algorithm.
And so what you do
is you look at this
and you have-- this is A of i.
That's your key.
And you have A of
0 to 0, which is 5.
And since we want to
sort in increasing order,
this is not sorted.
And so we do a swap.
So what this would do in
this step is to do a swap.
And we would go obtain
2, 5, 4, 6, 1, 3.
So all that's happened here,
in this step-- in the very
first step where the key
is in the second position--

is one swap happened.
Now, your key is
here, at item 4.
Again, you need to put
4 into the right spot.
And so you do pairwise swaps.
And in this case, you
have to do one swap.
And you get 2, 4, 5.
And you're done
with this iteration.
So what happens here is
you have 2, 4, 5, 6, 1, 3.
And now, the key
is over here, at 6.
Now, at this point,
things are kind of easy,
in the sense that you look
at it and you say, well, I
know this part is
already started.
6 is greater than 5.
So you have to do nothing.
So there's no swaps that
happen in this step.
So all that happens
here is you're

going to move the key to
one step to the right.
So you have 2, 4, 5, 6, 1, 3.
And your key is now at 1.
Here, you have to do more work.
Now, you see one aspect of the
complexity of this algorithm--
given that you're doing
pairwise swaps-- the way
this algorithm was defined, in
pseudocode, out there, was I'm
going to use pairwise swaps
to find the correct position.
So what you're going
to do is you're
going to have to
swap first 1 and 6.
And then you'll
swap-- 1 is over here.
So you'll swap this
position and that position.
And then you'll
swap-- essentially,
do 4 swaps to get to
the point where you have
1, 2, 4, 5, 6, 3.
So this is the result.

1, 2, 4, 5, 6, 3.
And the important thing
to understand, here,
is that you've done
four swaps to get 1
to the correct position.
Now, you could imagine a
different data structure
where you move this over
there and you shift them
all to the right.
But in fact, that shifting
of these four elements
is going to be computed
in our model as four
operations, or
four steps, anyway.
So there's no getting
away from the fact
that you have to do
four things here.
And the way the code that
we have for insertion sort
does this is by
using pairwise swaps.
So we're almost done.
Now, we have the key at 3.
And now, 3 needs to get put
into the correct position.
And so you've got
to do a few swaps.
This is the last step.

And what happens here is 3 is
going to get swapped with 6.
And then 3 needs to
get swapped with 5.
And then 3 needs to
get swapped with 4.
And then, since 3 is
greater than 2, you're done.
So you have 1, 2, 3, 4, 5, 6.
And that's it.
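The roughly five lines of pseudocode walked through above might look like this in Python; this is a sketch in the spirit of the lecture, not the exact code from the document distance files:

```python
def insertion_sort(A):
    """Sort A in place, in increasing order, by pairwise swaps."""
    for i in range(1, len(A)):          # invariant: A[0..i-1] is sorted
        j = i                           # j tracks the key's position
        while j > 0 and A[j - 1] > A[j]:
            A[j - 1], A[j] = A[j], A[j - 1]   # pairwise swap downward
            j -= 1
    return A
```

Running it on the lecture's example, [5, 2, 4, 6, 1, 3], performs exactly the swaps traced on the board.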
So, analysis.
How many steps do I have?
AUDIENCE: n squared?
PROFESSOR: No, how
many steps do I have?
I guess that wasn't
a good question.
If I think of a step as
being a movement of the key,
how many steps do I have?
I have theta n steps.
And in this case, you can
think of it as n minus 1 steps,
since you started with 2.

But let's just call
it theta n steps,
in terms of key positions.
And you're right.
It is n square because,
at any given step,
it's quite possible that
I have to do theta n work.
And one example is
this one, right here,
where I had to do four swaps.
And in general, you can
construct a scenario
where, towards the
end of the algorithm,
you'd have to do theta n work.
But if you had a list
that was reverse sorted.
You would, essentially,
have to do, on an average n
by two swaps as you go
through each of the steps.
And that's theta n.
So each step is theta n swaps.
And when I say
swaps, I could also

say each step is theta
n compares and swaps.
And this is going
to be important
because I'm going to ask
you an interesting question
in a minute.
But let me summarize.
What I have here is a
theta n squared algorithm.
The reason this is
a theta n squared
algorithm is because
I have theta n steps
and each step is theta n.
When I'm counting,
what am I counting
it terms of operations?
The assumption here--
unspoken assumption--
has been that an operation
is a compare and a swap
and they're, essentially,
equal in cost.
And in most computers,
that's true.
You have a single
instruction and, say, the x86
or the MIPS architecture
that can do a compare,
and the same thing for
swapping registers.
So perfectly
reasonably assumption
that compares and
swaps for numbers
have exactly the same cost.

But if you had a record and
you were comparing records,
and the comparison function that
you used for the records was
in itself a method
call or a subroutine,
it's quite possible
that all you're doing
is swapping pointers or
references to do the swap,
but the comparison could be
substantially more expensive.
Most of the time-- and
we'll differentiate
if it becomes
necessary-- we're going
to be counting comparisons
in the sorting algorithms
that we'll be putting out.
And we'll be assuming that
either comparison swaps are
roughly the same or
that compares are--
and we'll say which one,
of course-- that compares
are substantially more
expensive than swaps.
So if you had either of those
cases for insertion sort,
you have a theta n
squared algorithm.
You have theta n
squared compares
and theta n squared swaps.

Now, here's a question.
Let's say that compares are
more expensive than swaps.
And so, I'm concerned
about the theta
n squared comparison cost.
I'm not as concerned, because of
the constant factors involved,
with the theta n
squared swap cost.
So here's the question.
What's a simple fix-- change
to this algorithm that
would give me a better
complexity in the case
where compares are
more expensive,
or I'm only looking at the
complexity of compares.
So the theta
whatever of compares.
Anyone?
Yeah, back there.
AUDIENCE: [INAUDIBLE]
PROFESSOR: You could
compare with the middle.
What did I call it?

I called it something.
What you just said, I
called it something.
AUDIENCE: Binary search.
PROFESSOR: Binary search.
That's right.
Two cushions for this one.
So you pick them
up after lecture.
So you're exactly right.
You got it right.
I called it binary
search, up here.
And so you can
take insertion sort
and you can sort of trivially
turn it into a theta n log n
algorithm if we
are talking about n
being the number of compares.
And all you have to do
to do that is to say,
you know what, I'm
going to replace
this with binary search.
And you can do that-- and
that was the key observation--
because A of 0 through i
minus 1 is already sorted.
And so you can do binary search
on that part of the array.
So let me just write that down.

Do a binary search on A
of 0 through i minus 1,
which is already sorted.
And essentially, you can think
of it as theta log i time,
and for each of those steps.
And so then you get your
theta n log n in terms of compares.
Does this help the swaps
for an array data structure?
No, because binary search
will require insertion
into A of 0 though i minus 1.
So here's the problem.
Why don't we have a full-fledged
theta n log n algorithm,
regardless of the cost
of compares or swaps?
We don't quite have that.

We don't quite have that because
we need to insert our A of i
into the right position into
A of 0 through i minus 1.
You do that if you have
an array structure,
it might get into the middle.
And you have to shift
things over to the right.
And when you shift
things over to the right,
in the worst case, you may
be shifting a lot of things
over to the right.
And that gets back to worst
case complexity of theta n.
So a binary search
in insertion sort
gives you theta n
log n for compares.
But it's still theta
n squared for swaps.
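This binary-search variant can be sketched with Python's standard `bisect` module. Note that `pop` and `insert` on a Python list still shift elements one position at a time, which is exactly the theta n squared element-move cost just discussed; only the compares drop to theta n log n:

```python
import bisect

def binary_insertion_sort(A):
    """Insertion sort with binary search for the insert position.

    theta(n log n) compares, but still theta(n^2) element moves.
    """
    for i in range(1, len(A)):
        key = A[i]
        # binary search within the already-sorted prefix A[0..i-1]
        pos = bisect.bisect_right(A, key, 0, i)
        A.pop(i)                # removing and re-inserting still
        A.insert(pos, key)      # shifts theta(n) elements, worst case
    return A
```

So this is "binary insertion sort": the searching is logarithmic, the data movement is not.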
So as you can see,
there's many varieties
of sorting algorithms.
We just looked at
a couple of them.
And they were both
insertion sort.
The second one
that I just put up
is, I guess, technically
called binary insertion sort
because it does binary search.
And the vanilla
insertion sort is
the one that you have the code
for in the docdist program,
or at least one of
the docdist files.

So let's move on and talk
about a different algorithm.
So what we'd like to
do, now-- this class
is about constant improvement.
We're never happy.
We always want to do
a little bit better.
And eventually, once
we run out of room
from an asymptotic
standpoint, you
take these other classes
where you try and improve
constant factors and
get 10%, and 5%, and 1%,
and so on, and so forth.
But we'll stick to improving
asymptotic complexity.
And we're not quite happy
with binary insertion sort
because, in the case of numbers,
our binary insertion sort
has theta n squared complexity,
if you look at swaps.
So we'd like to go find an
algorithm that is theta n log
n.
And I guess, eventually,
we'll have to stop.
But Erik will take care of that.
There's a reason to stop.
It's when you can prove that
you can't do any better.

And so we'll get to
that, eventually.
So merge sort is also something
that you've probably seen.
But there probably
will be a couple
of subtleties that come out as
I describe this algorithm that,
hopefully, will be interesting
to those of you who already
know merge sort.
And for those of you who don't,
it's a very pretty algorithm.
It's a standard recursion
algorithm-- recursive
algorithm-- similar
to a binary search.
What we do, here, is we have
an array, A. We split it
into two parts, L and R.
And essentially, we kind of
do no work, really.
In terms of the L and R in
the sense that we just call,
we keep splitting,
splitting, splitting.
And all the work is
done down at the bottom
in this routine called
merge, where we are merging

a pair of elements
at the leaves.
And then, we merge two
pairs and get four elements.
And then we merge four tuples
of elements, et cetera,
and go all the way up.
So while I'm just saying L
turns into L prime, out here,
there's no real
explicit code that you
can see that turns
L into L prime.
It happens really later.
There's no real
sorting code, here.
It happens in the merge routine.
And you'll see
that quite clearly
when we run through an example.
So you have L and R turn
into L prime and R prime.
And what we end up getting
is a sorted array, A.
And we have what's called
a merge routine that

takes L prime and R
prime and merges them
into the sorted array.
So at the top level, what
you see is split into two,
and do a merge, and get
to the sorted array.
The input is of size n.
You have two arrays
of size n over 2.
These are two sorted
arrays of size n over 2.
And then, finally, you have
a sorted array of size n.
So if you want to follow
the recursive execution
of this in a small
example, then you'll
be able to see how this works.
And we'll do a fairly
straightforward example
with 8 elements.

So at the top level--
before we get there, merge
is going to assume that
you have two sorted arrays,
and merge them together.
That's the invariant in merge
sort, or for the merge routine.
It assumes the inputs are
sorted-- L and R. Actually
I should say, L
prime and R prime.
So let's say you have
20, 13, 7, and 2.
You have 12, 11, 9, and 1.
And this could be L prime.
And this could be R prime.
What you have is what we
call a two finger algorithm.
And so you've got two
fingers and each of them
point to something.
And in this case, one
of them is pointing
to L. My left finger
is pointing to L prime,
or some element L prime.
My right finger is pointing
to some element in R prime.
And I'm going to
compare the two elements
that my fingers are pointing to.

And I'm going to
choose, in this case,
the smaller of those elements.
And I'm going to put them
into the sorted array.
So start out here.
Look at that and that.
And I compared 2 and 1.
And which is smaller?
1 is smaller.
So I'm going to write 1 down.
This is a two finger
algo for merge.
And I put 1 down.
When I put 1 down, I
had to cross out 1.
So effectively, what
happens is-- let
me just circle that
instead of crossing it out.
And my finger moves up to 9.
So now I'm pointing at 2 and 9.
And I repeat this step.
So now, in this
case, 2 is smaller.
So I'm going to go
ahead and write 2 down.
And I can cross out 2 and
move my finger up to 7.
And so that's it.
I won't bore you with
the rest of the steps.
It's essentially walking up.
You have a couple of
pointers and you're
walking up these two arrays.

And you're writing down 1,
2, 7, 9, 11, 12, 13, 20.
And that's your merge routine.
And all of the work, really,
is done in the merge routine
because, other than
that, the body is simply
a recursive call.
You have to, obviously,
split the array.
But that's fairly
straightforward.
If you have an array, A 0
through n-- and depending on
whether n is odd
or even-- you could
imagine that you set L
to be A 0 through n over 2 minus 1,
and R similarly.
And so you just split it
halfway in the middle.
I'll talk about that
a little bit more.
There's a subtlety
associated with that
that we'll get to
in a few minutes.
But to finish up in terms of
the computation of merge sort.
This is it.
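The two finger merge and the recursive structure around it might be sketched together in Python as follows. This is a sketch under the lecture's conventions; the lists are written in ascending order, so the board's columns 20, 13, 7, 2 and 12, 11, 9, 1 read bottom to top:

```python
def merge(L, R):
    """Two-finger merge: L and R must each already be sorted."""
    out = []
    i = j = 0                       # the two "fingers"
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:            # compare the two fingered elements
            out.append(L[i])
            i += 1                  # advance the left finger
        else:
            out.append(R[j])
            j += 1                  # advance the right finger
    out.extend(L[i:])               # one list is exhausted;
    out.extend(R[j:])               # copy whatever remains of the other
    return out

def merge_sort(A):
    """Split in half, sort each half recursively, merge the results."""
    if len(A) <= 1:
        return A                    # base case: already sorted
    mid = len(A) // 2
    return merge(merge_sort(A[:mid]), merge_sort(A[mid:]))
```

All the comparison work happens in `merge`; `merge_sort` itself only splits and recurses.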

The merge routine is doing
most, if not all, of the work.
And this two finger
algorithm is going
to be able to take
two sorted arrays
and put them into a
single sorted array
by interspersing, or
interleaving, these elements.
And what's the
complexity of merge
if I have two arrays
of size n over 2, here?
What do I have?
AUDIENCE: n.
PROFESSOR: n.
We'll give you a cushion, too.
theta n complexity.
So far so good.
I know you know the
answer as to what
the complexity of merge sort is.
But I'm guessing
that most of you
won't be able to prove it to me
because I'm kind of a hard guy
to prove something to.
And I could always say,
no, I don't believe you
or I don't understand.

The complexity-- and you've
said this before, in class,
and I think Erik's
mentioned it--
the overall complexity of this
algorithm is theta n log n
And where does that come from?
How do you prove that?
And so what we'll do, now,
is take a look at merge sort.
And we'll look at
the recursion tree.
And we'll try and--
there are many ways
of proving that merge
sort is theta n log n.
The way we're
going to do this is
what's called proof by picture.
And it's not an established
proof technique,
but it's something
that is very helpful
to get an intuition
behind the proof
and why the result is true.
And you can always
take that and you
can formalize it and
make this something
that everyone believes.
And we'll also look at
substitution, possibly
in section tomorrow,
for recurrence solving.

So where we're right now is that
we have a divide and conquer
algorithm that has a merge
step that is theta n.
And so, if I just look at this
structure that I have here,
I can write a recurrence
for merge sort
that looks like this.
So when I say
complexity, I can say
T of n, which is the
work done for n items,
is going to be some
constant time in order
to divide the array.
So this could be the
part corresponding
to dividing the array.
And there's going to be two
problems of size n over 2.
And so I have 2 T of n over 2.
And this is the recursive part.
And I'm going to have c times
n, which is the merge part.
And that's some constant times
n, which is what we have,

here, with respect to
the theta n complexity.
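Before drawing the tree, one way to build intuition is to just evaluate this recurrence numerically. Here's a sketch, with the constant c = 1 chosen arbitrarily:

```python
import math

def T(n, c=1):
    """Evaluate the merge sort recurrence T(n) = c + 2*T(n/2) + c*n,
    with T(1) = c, for n a power of two."""
    if n <= 1:
        return c
    return c + 2 * T(n // 2, c) + c * n

# The ratio T(n) / (n log2 n) settles toward the constant c as n grows,
# which is the numeric face of T(n) = theta(n log n).
for k in (4, 8, 12, 16):
    n = 2 ** k
    print(n, T(n) / (n * math.log2(n)))
```

The printed ratios shrink toward 1 (our chosen c), so the cn term really does account for essentially all of the work.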
So you have a recurrence like
this and I know some of you
have seen recurrences in 6.042.
And you know how to solve this.
What I'd like to do is show you
this recursion tree expansion
that, not only tells you how
to solve this occurrence,
but also gives you a means
of solving recurrences where,
instead of having c of n, you
have something else out here.
You have f of n, which
is a different function
from the linear function.
And this recursion
tree is, in my mind,
the simplest way of
arguing the theta n log n
complexity of merge sort.
So what I want to do is
expand this recurrence out.
And let's do that over here.

So I have c of n on top.
I'm going to ignore this
constant factor because c of n
dominates.
So I'll just start with c of n.
I want to break things
up, as I do the recursion.
So when I go c of n, at
the top level-- that's
the work I have to do at
the merge, at the top level.
And then when I go down to two
smaller problems, each of them
is size n over 2.
So I do c times n
divided by 2 [INAUDIBLE].
So this is just a constant c.
I didn't want to
write thetas up here.
You could.
And I'll say a little bit
more about that later.
But think of this cn as
representing the theta n
complexity.
And c is this constant.
So c times n, here. c
times n over 2, here.

And then when I keep going,
I have c times n over 4,
c times n over 4, et cetera,
and so on, and so forth.
And when I come down
all the way here,
n is eventually going to become
1-- or essentially a constant--
and I'm going to have
a bunch of c's here.
So here's another question,
that I'd like you to answer.
Someone tell me what the number
of levels in this tree are,
precisely, and the number
of leaves in this tree are,
precisely.
AUDIENCE: The number of
levels is log n plus 1.
PROFESSOR: Log n plus 1.
Log to the base 2 plus 1.
And the number of leaves?
You raised your hand
back there, first.
Number of leaves.
AUDIENCE: I think n.
PROFESSOR: Yeah, you're right.
You think right.

So 1 plus log n and n leaves.
When n becomes 1, how
many of them do you have?
You're down to a single element,
which is, by definition,
sorted.
And you have n leaves.
So now let's add up the work.
I really like this
picture because it's just
so intuitive in terms
of getting us the result
that we're looking for.
So you add up the work in each
of the levels of this tree.
So the top level is cn.
The second level is cn because
I added 1/2 and 1/2, cn, cn.
Wow.
What symmetry.
So you're doing the same
amount of work modulo
the constant factors,
here, with what's
going on with the c1,
which we've ignored,
but roughly the same amount
of work in each of the levels.

And now, you know how
many levels there are.
It's 1 plus log n.
So if you want to write
an equation for T of n,
it's 1 plus log n times c of
n, which is theta of n log n.
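Written out as an equation, the level count times the per-level work gives:

```latex
T(n) = (1 + \log_2 n) \cdot cn = \Theta(n \log n)
```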
So I've mixed in
constants c and thetas.
For the purposes of
this description,
they're interchangeable.
You will see recurrences that
look like this, in class.
And things like that.
Don't get confused.
It's just a constant
multiplicative factor
in front of the
function that you have.
And it's just a little
easier, I think,
to write down these
constant factors

and realize that the
amount of work done
is the same in
each of the leaves.
And once you know the
dimensions of this tree,
in terms of levels and in
terms of the number of leaves,
you get your result.
So we've looked at
two algorithms, so far.
And insertion sort, if
you talk about numbers,
is theta n squared for swaps.
Merge sort is theta n log n.
Here's another
interesting question.
What is one advantage of
insertion sort over merge sort?
AUDIENCE: [INAUDIBLE]
PROFESSOR: What does that mean?
AUDIENCE: You don't have
to move elements outside
of [INAUDIBLE].
PROFESSOR: That's exactly right.

That's exactly right.
So the two guys who
answered the questions
before with the levels, and you.
Come to me after class.
So that's a great answer.
It's in-place sorting,
which is something
that has to do with
auxiliary space.
And so what you see, here--
and it was a bit hidden, here.
But the fact of the
matter is that you
had L prime and R prime.
And L prime and R prime are
different from L and R, which
were the initial halves of
the inputs to the sorting
algorithm.
And what I said here is, we're
going to dump this into A.
That's what this picture shows.
This says sorted
array, A. And so you
had to make a copy of the
array-- the two halves L
and R-- in order to
do the recursion,
and then to take the
results and put them
into the sorted array, A.
So you needed-- in
merge sort-- you

needed theta n auxiliary space.
So merge sort, you need
theta n extra space.
And the definition
of in-place sorting
implies that you have theta
1-- constant-- auxiliary space.
The auxiliary space
for insertion sort
is simply that
temporary variable
that you need when
you swap two elements.
So when you want to swap
a couple of registers,
you gotta store one of the
values in a temporary location,
override the other, et cetera.
And that's the theta 1 auxiliary
space for insertion sort.
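For concreteness, here's a minimal sketch of insertion sort in Python. The lecture's version swaps adjacent elements; this shift-based variant is the common textbook form, but the point is the same: the only extra storage is one temporary, i.e. theta 1 auxiliary space:

```python
def insertion_sort(a):
    """Sort list a in place using only theta(1) auxiliary space."""
    for i in range(1, len(a)):
        key = a[i]           # the single temporary we need
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # shift larger elements one slot right
            j -= 1
        a[j + 1] = key       # drop key into its sorted position
    return a
```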
So there is an advantage of
the version of insertion sort
we've talked about,
today, over merge sort.
And if you have a billion
elements, that's potentially
something you don't
want to store in memory.
If you want to do something
really fast and do everything

in cache or main
memory, and you want
to sort billions or maybe
even trillions of items,
this becomes an
important consideration.
I will say that you can
reduce the constant factor
of the theta n.
So in the vanilla
scheme, you could
imagine that you have to
have a copy of the array.
So if you had n
elements, you essentially
have n extra items of storage.
You can make that n over 2
with a simple coding trick
by keeping 1/2 of A.
You can throw away one of
the L's or one of the R's.
And you can get it
down to n over 2.
And that turns out--
it's a reasonable thing
to do if you have
a billion elements
and you want to reduce your
storage by a constant factor.
So that's one coding trick.
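The "keep 1/2 of A" trick can be sketched like this. `merge_halves` is a hypothetical helper, not code from the lecture; the idea is that only the left half gets copied out, while the right half is consumed in place:

```python
def merge_halves(a, lo, mid, hi):
    """Merge the sorted runs a[lo:mid] and a[mid:hi] back into a,
    copying only the left run into scratch space (n/2 extra slots).

    The write index k never catches up to the right-run read index j,
    so right-half elements are read before they're overwritten."""
    left = a[lo:mid]          # the only copy we make: n/2 items
    i, j, k = 0, mid, lo
    while i < len(left) and j < hi:
        if left[i] <= a[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = a[j]
            j += 1
        k += 1
    # Leftover left-run items slot in at the end; leftover right-run
    # items are already in their final positions.
    a[k:k + len(left) - i] = left[i:]
```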
Now it turns out that you
can actually go further.
And there's a fairly
sophisticated algorithm
that's sort of beyond
the scope of 6.006
that's an in-place merge sort.

And this in-place
merge sort is kind of
impractical in the sense
that it doesn't do very well
in terms of the
constant factors.
So while it's in-place,
it's still theta n log n.
The problem is that the running
time of an in-place merge sort
is much worse than the
regular merge sort that
uses theta n auxiliary space.
So people don't really
use in-place merge sort.
It's a great paper.
It's a great thing to read.
Its analysis is a bit
sophisticated for double 0 6.
So we won't go there.
But it does exist.
So you can take merge
sort, and I just
want to let you know that
you can do things in-place.
In terms of numbers, some
experiments we ran a few years
ago-- so these may not
be completely valid
because I'm going to
actually give you numbers--

English: 
but merge sort in Python, if
you write a little curve fit
program to do this, is 2.2n log
n microseconds for a given n.
So this is the
merge sort routine.
And if you look at
insertion sort, in Python,
that's something like 0.2
n square microseconds.
So you see the
constant factors here.
If you do insertion sort in C,
which is a compiled language,
then, it's much faster.
It's about 20 times faster.
It's 0.01 n squared
microseconds.
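Plugging in these fitted constants, a few lines of Python locate the break-even point. The constants are the lecture's rounded fits, so the exact n is only a ballpark; with these numbers the crossover lands in the low thousands:

```python
import math

def merge_py_us(n):
    """Fitted Python merge sort time, microseconds (lecture's 2.2 n log n)."""
    return 2.2 * n * math.log2(n)

def insert_c_us(n):
    """Fitted C insertion sort time, microseconds (lecture's 0.01 n^2)."""
    return 0.01 * n * n

# Find the first n where the theta(n log n) Python code wins.
n = 2
while merge_py_us(n) >= insert_c_us(n):
    n += 1
print(n)
```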

So a little bit of
practice on the side.
We do ask you to write code.
And this is important.
The reason we're
interested in algorithms
is because people
want to run them.
And what you can see is that
you can actually find an n-- so
regardless of whether
you're using Python or C,
this tells you that asymptotic
complexity is pretty important
because, once n gets
beyond about 4,000,
you're going to see that
merge sort in Python
beats insertion sort in C.
So the constant
factors get subsumed
beyond certain values of n.
So that's why asymptotic
complexity is important.
You do have a
factor of 20, here,
but that doesn't really
help you in terms
of keeping an n square
algorithm competitive.
It stays competitive
for a little bit longer,
but then falls behind.
That's what I wanted
to cover for sorting.
So hopefully, you
have a sense of what

happens with these two
sorting algorithms.
We'll look at a very different
sorting algorithm next time,
using heaps, which is a
different data structure.
The last thing I want to do in
the couple minutes I have left
is give you a little more
intuition as to recurrence
solving based on this diagram
that I wrote up there.
And so we're going to use
exactly this structure.
And we're going to look at a
couple of different recurrences
that I won't really
motivate in terms
of having a specific
algorithm, but I'll just
write out the recurrence.
And we'll look at the
recursion tree for that.
And I'll try and tease out of
you the complexity associated
with these recurrences of
the overall complexity.
So let's take a look at T
of n equals 2 T of n over 2

plus c n squared.
Let me just call that c--
no need for the brackets.
So constant c times n squared.
So if you had a
crummy merge routine,
and it was taking n square,
and you coded it up wrong.
It's not a great motivation
for this recurrence,
but it's a way this
recurrence could have come up.
So what does this
recursive tree look like?
Well it looks kind of
the same, obviously.
You have c n square; you
have c n square divided by 4;
c n square divided by
4; c n square divided
by 16, four times.
Looking a little bit
different from the other one.
The levels and the leaves
are exactly the same.
Eventually n is going
to go down to 1.
So you will see c
all the way here.
And you're going
to have n leaves.

And you will have, as
before, 1 plus log n levels.
Everything is the same.
And this is why I like this
recursive tree formulation so
much because, now,
all I have to do
is add up the work associated
with each of the levels
to get the solution
to the recurrence.
Now, take a look at
what happens, here.
c n square; c n square divided
by 2; c n square divided by 4.
And this is n times c.
So what does that add up to?
AUDIENCE: [INAUDIBLE]
PROFESSOR: Yeah, exactly.
Exactly right.
So if you look at what
happens, here, this dominates.
All of the other things are
actually less than that.
And you said bounded
by two c n square
because this part is
bounded by c n square
and I already have c n
square up at the top.
So this particular algorithm
that corresponds to this crummy

English: 
merge sort, or wherever
this recurrence came from,
is a theta n squared algorithm.
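Summing the per-level work makes the "bounded by two c n square" observation concrete; it's a geometric series dominated by the root:

```latex
\sum_{i=0}^{\log_2 n} 2^i \cdot c \left(\frac{n}{2^i}\right)^2
  = cn^2 \sum_{i=0}^{\log_2 n} 2^{-i}
  < 2cn^2 = \Theta(n^2)
```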
And in this case,
all of the work done
is at the root-- at the
top level of the recursion.
Here, there was a
roughly equal amount
of work done in each of
the different levels.
Here, all of the work
was done at the root.
And so to close
up shop, here, let
me just give you real
quick a recurrence where
all of the work is done at
the leaves, just for closure.
So if I had, magically, a merge
routine that actually happened
in constant time, either
through buggy analysis,
or because of it
was buggy, then what
does the tree look
like for that?
And I can think of
this as being theta 1.

Or I can think of this as
being just a constant c.
I'll stick with that.
So I have c, c, c.
Woah, I tried to move that up.
That doesn't work.
So I have n leaves, as before.
And so if I look at
what I have, here, I
have c at the top level.
I have 2c, and so
on and so forth.
4c.
And then I go all
the way down to nc.
And so what happens
here is this dominates.
And so, in this recurrence, the
whole thing runs in theta n.
So the solution to
that is theta n.
And what you have here
is all of the work
being done at the leaves.
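Level by level, the work doubles, so the sum is again geometric, this time dominated by the n leaves:

```latex
\sum_{i=0}^{\log_2 n} 2^i \cdot c = c\,(2n - 1) = \Theta(n)
```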
We're not going to really cover
this theorem that gives you

a mechanical way of figuring
this out because we think
the recursive tree is a
better way of looking at it.
But you can see that, depending
on what that function is,
in terms of the work being
done in the merge routine,
you'd have different
versions of recurrences.
I'll stick around, and people
who answered questions, please
pick up your cushions.
See you next time.
