Let's do this! Let's talk about theory of
computation and the Chomsky hierarchy.
So we've been talking about finite state
machines. Finite state machines are a
kind of abstraction of what a computer
program does. If you think about them
going from state to state with a
transition, it's sort of similar to saying: if something happens, then do something else; if I get this input, then perform this output.
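To make that concrete, here is a minimal sketch of a finite state machine in Python; the states, input symbols, and transitions are invented purely for illustration:

```python
# A tiny finite state machine as a transition table:
# (current state, input symbol) -> next state.
transitions = {
    ("start", "a"): "middle",
    ("middle", "b"): "accept",
}

def run(fsm, start, accepting, inputs):
    """Follow one transition per input symbol; accept if we end in an accepting state."""
    state = start
    for symbol in inputs:
        if (state, symbol) not in fsm:
            return False  # no transition for this input: reject
        state = fsm[(state, symbol)]
    return state in accepting

print(run(transitions, "start", {"accept"}, "ab"))  # True
print(run(transitions, "start", {"accept"}, "ba"))  # False
```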
A finite state machine is an abstraction of what a computer program can do, and it is only
one type of abstraction. There are other ways to describe the operations a computer program can perform on strings, and we call these formal grammars. The Chomsky hierarchy is the arrangement of these grammars from the most restrictive, which is what we've been seeing so far with finite state machines, to the most powerful, which is Turing machines, which can essentially perform any operation on a string.
Some of these formal grammars are gonna
help us overcome the difficulties that
we've had, for example the fact that some things in language have long-distance dependencies, where two distant states have to communicate with one another in order to work. They can help us overcome
the problem of recursion, where our finite state machine has to keep going down and down and down and then remember the path back upwards, so that if it goes down n times it comes back up the same n number of times.
Likewise they could help us with the
problem of context, where before you
apply a rule, you need to figure out what
other symbols are around you, and
depending on those, you perform the rule
in one way or the other.
So again, finite state machines are just one type of automaton, and there are other systems that are less restrictive; they sit above them in the Chomsky hierarchy. Type 3 is a regular grammar, which is the kind of grammar a finite state machine can describe. And we have other types of grammars: context-free, context-sensitive, and unrestricted, also called recursively enumerable. Let's look
at each one of them. A type 3 grammar, or regular grammar, is what we have been seeing so far; a finite state machine is the automaton that corresponds to a regular grammar. Each rule has a single symbol going in: you have one task you want to perform, say create a sentence, and you don't need to look at the sentences before or after to perform that task. You create the string by producing the symbol a some number n of times, or several symbols n times and m times, but there's no way to coordinate those counts so that they agree: each state is independent of the others in how many times it executes. This is
the grammar of the basic structures we've had so far, like Jane eats pizza: one noun for the subject, one verb, and zero or more nouns for the direct object.
In English sentences, the number of nouns in the subject and the number of nouns in the direct object are not connected. If you have Jane Smith as the subject, you do not need two words in the direct object: Jane Smith eats pizza, where the subject is two words and the object is one, is a perfectly valid English sentence. So this is the type of rule that can be described by a regular grammar.
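As a sketch of what "regular" buys us, here are the two patterns just described, written as regular expressions in Python (regular expressions describe exactly the regular languages); the noun/verb tokens are stand-ins, not a real parser:

```python
import re

# a^n b^m: any number of a's followed by any number of b's.
# n and m are independent; nothing forces the counts to agree.
anbm = re.compile(r"a*b*")
for s in ["aaab", "abbbb", "aabb"]:
    print(s, bool(anbm.fullmatch(s)))  # all True

# The same idea for "Jane Smith eats pizza": one or more subject nouns,
# a verb, then zero or more object nouns, with unrelated counts.
sentence = re.compile(r"(noun )+verb( noun)*")
print(bool(sentence.fullmatch("noun noun verb noun")))  # True
```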
Let's look at a context-free grammar.
This is one where you have a single variable going in: you want to generate a sentence, and then the output, the right side of the rule, can be anything; you can have more than one symbol. These symbols can coordinate how many times each one runs, but then you would need to program something to help them remember. It can be a variable somewhere that remembers that n equals 3, or it can be a data structure like a pushdown stack: if you push a onto the stack n times, then you know you have to pop it n times, and the number of pops can drive the number of times you produce b. So if you push one, two, three times, then you pop one, two, three times, and that gives you b three times: b b b. So as you can see,
you can implement memory with a structure like a pushdown stack, or you can implement it with a variable. Either way, there's something extra you have to do beyond describing the states and their transitions, and that is of course going to cost you computationally. One rule that can be explained with a context-free grammar is center embedding: for example, sentences like
The cat that I like eats tuna, where you
have a sentence like The cat eats tuna
and then the noun phrase can also contain an embedded sentence, The cat that I like eats tuna, for example. In the final configuration you have cat ... I ... like ... eats: you need to push two nouns and then pop two verbs, or you can just have a variable that remembers that n equals two and tracks that two nouns went in and two verbs come out. This is
a context-free grammar, and again it's context-free because in order to construct your sentence you do not need to look at the previous or following sentences.
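Here is a minimal sketch of that pushdown idea in Python, on the classic context-free pattern a^n b^n, where the b count must equal the a count, just as the popped verbs must match the pushed nouns in center embedding:

```python
def matches_anbn(s):
    """Recognize a^n b^n with a pushdown stack: push on each a, pop on each b."""
    stack = []
    i = 0
    while i < len(s) and s[i] == "a":
        stack.append("a")  # push one marker per a
        i += 1
    while i < len(s) and s[i] == "b":
        if not stack:
            return False   # more b's than a's
        stack.pop()        # pop one marker per b
        i += 1
    # accept only if the whole string was consumed and the counts matched
    return i == len(s) and not stack

print(matches_anbn("aaabbb"))  # True: n = 3 on both sides
print(matches_anbn("aaabb"))   # False: the counts don't agree
```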
A type 1 grammar is a context-sensitive grammar. In these
kinds of rules, you can have anything on
the left side and anything on the right
side, but for every input symbol you do need to produce an output: for every element, like each a in a^n, you need to output something, and it might be the same or it might be different. For example, in
the Yoruba emphatics, we had a rule that
took a vowel, a tone, and a consonant if
it existed. For every vowel in the input, you have to produce the vowel in the output. For every consonant in the input, you have to bring the consonant to the output. But for every tone in the input, you have to read it, perform an operation on it, and give us an output. So there's a one-to-one
correspondence even if the output itself
is different. We call it, again, context-sensitive because, for a rule like abc → def, in order to know what def is, you need to read some things about abc, and in order to know how a, b, and c are going to interact, you need to actually look at them, to look at the sounds before and after you.
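Here is a toy sketch of this one-to-one, context-dependent rewriting in Python. The symbols and the tone rule are invented for illustration; this is not real Yoruba phonology:

```python
def rewrite(units):
    """Toy context-sensitive rewrite: every input symbol yields an output symbol,
    but what a tone mark (H or L) becomes depends on the vowel before it.
    The rule itself is invented for illustration, not real Yoruba phonology."""
    out = []
    prev_vowel = None
    for sym in units:
        if sym in "aeiou":
            prev_vowel = sym
            out.append(sym)                     # every vowel in -> a vowel out
        elif sym in ("H", "L"):
            out.append(f"{sym}({prev_vowel})")  # tone output depends on its context
        else:
            out.append(sym)                     # every consonant in -> a consonant out
    return out

print(rewrite(["s", "u", "H", "n"]))  # ['s', 'u', 'H(u)', 'n']
```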
So regular grammars cover, for example, a basic English sentence; context-free grammars cover center embedding, which is more complex and requires you to coordinate things, but you still don't need to look at the sentences around you; and a context-sensitive grammar is one where you do need to look at the sounds around you, like in the Yoruba tones. And an unrestricted grammar, or a recursively enumerable grammar, is
anything else: one where you can have an
arbitrary number of symbols as the input
and then produce some output that is not
necessarily matched to the input. Human languages probably don't have these kinds of rules, but this is the kind of rule that can be handled by a Turing machine. A Turing machine takes an input string of any kind, with an arbitrary number of symbols, and transforms it into some other string with an arbitrary number of symbols in the output. Of course the
problem is that this added computational power comes at a cost: we can take any input and convert it into any output, but now not only is our processing going to skyrocket, we cannot even guarantee that the operation will ever stop. More on this in the next video.
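Here is a minimal Turing machine sketch in Python; the program format and the bit-flipping example are invented for illustration. Note the step cap: in general, nothing guarantees the machine ever halts.

```python
def run_tm(program, tape, state="start", max_steps=1000):
    """program maps (state, symbol) -> (symbol to write, move L/R, next state)."""
    cells = dict(enumerate(tape))  # sparse tape; "_" is the blank symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            return "".join(cells[i] for i in sorted(cells))
        symbol = cells.get(head, "_")
        if (state, symbol) not in program:
            return None  # no applicable rule: the machine is stuck
        write, move, state = program[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return None  # step budget exhausted; the machine may never halt

# A toy program: flip every bit, then halt at the first blank.
flip = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}
print(run_tm(flip, "0110"))  # 1001_
```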
In summary, we have four main types of grammars in the Chomsky hierarchy. We have regular grammars, which take one input symbol and can produce many output symbols, but each of those outputs is independent of the others and doesn't remember how the others behaved.
This is our syntax of sentences in
English. We have context-free grammars
where we have one input that doesn't need to look at its context, and many outputs that can communicate with one another, though we need to implement some computational means for them to communicate, a pushdown stack for example, so that we know we pushed so many in and need to pop so many out. A
context-sensitive grammar is one that takes an input where, for every input symbol, you need to write an output symbol. You can transform symbols, and what makes the rule context-sensitive is that you need to look at the symbols around them to know what you're doing, but you must provide an output for every input. Finally, an
unrestricted grammar is one where the rules
can take any form: you can take a string and transform it into any other string. Human languages probably don't have these, but this is the kind of rule that we actually do need in computer science, for example. So
we can finally make this definition very
precise: a finite state machine is a kind of automaton, one that generates regular languages of the form, for example, a^n b^m c^p. You can generate an arbitrary number of symbols an arbitrary number of times, but they're not going to communicate with one another. We can only have transitions between states, so you cannot have communication across distant states in finite state machines. You can have it in other types of grammars, but that is going to cost us computationally. This by
the way is the same hierarchy just
displayed slightly differently. As you can see, finite languages are the most restrictive kind of language; within them, for example, we have logic like true-or-false statements. Then we have rules that are regular, that can be modeled by finite state machines: for example, English consonant clusters have some structure for constructing syllables, but we only need to look inside our own syllable to construct it; we don't need to look at other syllables to know what our syllable needs to look like. We have
context-free rules, where you need to implement some means for the computer to remember what it's doing, such as a stack or variables or some other mechanism, so that you can have n nouns and a matching number of verbs, but you still don't need to look at the sentences around you.
At the outer edges of this hierarchy we have a rule like the Yoruba emphatic, where a rule takes part of a syllable, reads it, and then generates the output based on what it saw in its context, and not just its immediate context: it can be context several symbols away, several sounds away, like in the second case, sun un, where you have to go not one but two symbols back, and could potentially go even further back. And again, we don't know whether human languages have rules more complicated than this; we would call those recursively enumerable rules.
In summary, human languages do have rules that are context-sensitive and that cannot be modeled by finite state machines. Some rules in language are fairly easy to model and some are more complex. There are many types of formal grammars we can use to build computer programs, and the Chomsky hierarchy tells us how much power we're going to need in our computer program to handle a given rule of human language. Next week we're... sorry, next week? Next video! We're going to look at the
consequences of having to use different
grammars, and spoiler alert: the more complex the formal grammar, the more resources we're going to have to invest for it to be processed, the more costly it's going to be, and ultimately the problem is going to explode out of control.
