Hi and welcome to week 2 of accelerated
computational linguistics. So this is
gonna be one of the more theoretical
weeks of our course. After week 2, we're
gonna work mostly on natural language
processing and a little bit of machine
learning, but this week we're gonna study
the foundations of the whole edifice.
We're gonna study how to model human
language through rules, like the rules
that we usually write in programming. Can
it be done? Is it possible? We're gonna
study a series of abstractions
called automata, and we're gonna study
something called the Chomsky hierarchy,
which is a classification of formal
grammars and of how they can be used to
describe human languages and
computational languages as well. So in
general we're gonna try to figure out if
it is possible to model all of the
sequences of words and sounds in human
language using rules in a computer. In
order to do that, let's start by
thinking about human language as a
system of rules. Human languages are
systematic. Indeed they are made up of
rules and of recurring patterns,
things that we see again and again. So if
you're in a class trying to learn a
language, your teacher's probably going to
show you how the sentences of the
language are built. So
if you're learning English, for example,
you will learn that first you have the
subject, the person who does the action,
Jane, and then you have the verb: Jane
eats. Subject verb. Next the teacher will
show you that there are sentences in
English like Jane eats pizza: the subject,
Jane, the verb, eats, and then the direct
object, pizza. A direct object is
something that the action is done to. So
what are we eating? The pizza. The pizza
is the direct object. So in English, our
sentences should be subject verb, or subject
verb direct object.
We cannot scramble them. For example, eats
Jane is not a good sentence of English.
It's verb noun. Something like pizza Jane
eats is not a good sentence of English
either. This would be the direct object,
the subject, the verb. Other languages can
say this but English really cannot. So
the pattern subject verb direct
object is apparent and will recur
time and again. And indeed we could try
to use that pattern and all of
the similar patterns and turn them into
rules, so that ultimately you would have
a massive description of how English
works. Let's look at that sentence
more closely. N here means
noun. A noun is an object, a thing, a place,
a name, and V means verb. A verb is an
action like to eat, to dream, to walk. So
we have the sentence Jane eats. In this
sentence, Jane is the subject.
Jane is a noun and the verb is to eat so
maybe English sentences are Noun Verb. And
this could help us generate many other
English sentences. Indeed we could make a
semi-formal description of an English
sentence as one noun and one verb.
Maybe we can expand on that description. We
also have sentences like Jane eats pizza.
In Jane eats pizza, we have a noun as a
subject, one verb, and a noun for the
direct object, pizza. It's the thing that
we're eating. So look at some of the
regularities that we have here. The
sentence always begins with
one noun, so maybe the first
part of an English sentence is one noun.
These sentences always have one verb, so
maybe we need one noun and one verb, and
then the sentences vary on whether they
have a direct object. Sometimes there are zero
nouns, as in Jane eats, and sometimes
there is one noun, as in Jane eats pizza.
So maybe the formalization of these
sentences is one noun, one verb, zero or one
nouns.
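The pattern so far (one noun, one verb, zero or one nouns) can be sketched as a regular expression over part-of-speech tags. This is only a minimal sketch in Python: the one-letter tags, the tiny LEXICON, and the helper names tag_sequence and matches are assumptions made for illustration, not something defined in the course.

```python
import re

# Hypothetical toy lexicon: maps each word to a one-letter tag
# (N = noun, V = verb). Assumed for illustration only.
LEXICON = {"jane": "N", "eats": "V", "pizza": "N"}

def tag_sequence(sentence):
    """Turn a sentence into a string of tags, e.g. 'Jane eats pizza' -> 'NVN'."""
    return "".join(LEXICON[word.lower()] for word in sentence.split())

# One noun, one verb, then zero or one nouns.
PATTERN = re.compile(r"NVN?")

def matches(sentence):
    """True if the whole tag string fits the pattern."""
    return PATTERN.fullmatch(tag_sequence(sentence)) is not None

print(matches("Jane eats"))        # True
print(matches("Jane eats pizza"))  # True
print(matches("eats Jane"))        # False
print(matches("Pizza Jane eats"))  # False
```

The scrambled orders are rejected because fullmatch requires the entire tag string to fit the pattern, not just a part of it.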
Let's see if we can cover
more sentences. We have Jane eats and
Jane eats pizza. How about Jane eats ice
cream? Here the subject remains one noun,
the verb is one verb, and then the direct
object can be two nouns: ice cream. So now
our English sentence can be
one noun for the subject, one verb, and
then zero, one, or two nouns. It could be
three or more, actually. It could be
something like Jane eats chocolate ice
cream, so it could be zero, one, two, or three
nouns. It could be Jane eats Vermont
chocolate ice cream, so up to four nouns.
So in general an English sentence maybe
is one noun, one verb, and zero or more
nouns in the position of direct object.
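To see why zero or one nouns had to become zero or more, here is a small sketch in Python that compares the two patterns on the same sentences. The lexicon and the names tag_sequence, ZERO_OR_ONE, and ZERO_OR_MORE are hypothetical, and, as in the discussion above, every word of the direct object is simply tagged as a noun.

```python
import re

# Hypothetical toy lexicon (N = noun, V = verb). Following the lecture,
# every word of the direct object is tagged as a noun.
LEXICON = {
    "jane": "N", "eats": "V", "pizza": "N",
    "vermont": "N", "chocolate": "N", "ice": "N", "cream": "N",
}

def tag_sequence(sentence):
    """Map each word to its tag and join the tags into one string."""
    return "".join(LEXICON[word.lower()] for word in sentence.split())

ZERO_OR_ONE = re.compile(r"NVN?")   # one noun, one verb, zero or one nouns
ZERO_OR_MORE = re.compile(r"NVN*")  # one noun, one verb, zero or more nouns

for sentence in ["Jane eats",
                 "Jane eats ice cream",
                 "Jane eats Vermont chocolate ice cream"]:
    tags = tag_sequence(sentence)
    print(sentence, tags,
          bool(ZERO_OR_ONE.fullmatch(tags)),
          bool(ZERO_OR_MORE.fullmatch(tags)))
# Jane eats NV True True
# Jane eats ice cream NVNN False True
# Jane eats Vermont chocolate ice cream NVNNNN False True
```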
Let's bring another sentence into the
equation: Jane Smith eats pizza
margherita. Look at how this changes
our subject. Now we have two nouns, Jane
Smith. Maybe the subject is one or more
nouns. It could be James Smith Watson, for
example, one, two, three nouns, but it
always needs to be at least one in
every sentence that we have seen. There's
always at least one noun in the subject,
there's always one verb, eats, and there
can be zero, one, two, or more nouns for the
direct object. So we could formalize the
description of these sentences as one or
more nouns, one verb, zero or more nouns.
This is a
regular expression. This is the kind of
structure that we studied last week. An
English sentence would be one or more
nouns, one verb, and zero or more nouns. So
look at how a regular expression is
describing the syntax of English
sentences, and we could use this to
generate hundreds or thousands of new
English sentences. Indeed we managed
to turn one aspect of human language
into a formal rule: at least one noun,
one verb, zero or more nouns. And this can
describe things like Jane Smith eats
pizza margherita. So maybe with enough
time, with enough effort, we could find
one big regular expression to describe
every sentence of English. We could find
a regular expression to describe every
sentence of Spanish, and so forth. In
general we could then model English as a
sequence of symbols, as a deterministic
sequence of symbols that tells you: first
I need a noun, then I need something like
a verb, then I need something like a noun
again.
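As a rough illustration of the rule one or more nouns, one verb, zero or more nouns, the sketch below writes it as the regular expression N+VN* over tag strings, checks a couple of sentences against it, and lists the tag patterns it accepts up to a given length. It is a sketch in Python under assumptions: the lexicon and the names SENTENCE, is_sentence, and tag_patterns are hypothetical and only serve the example.

```python
import re
from itertools import product

# Hypothetical toy lexicon (N = noun, V = verb), for illustration only.
LEXICON = {"jane": "N", "smith": "N", "eats": "V",
           "pizza": "N", "margherita": "N"}

# One or more nouns, one verb, zero or more nouns.
SENTENCE = re.compile(r"N+VN*")

def is_sentence(text):
    """Tag each word, then check the whole tag string against the rule."""
    tags = "".join(LEXICON[w.lower()] for w in text.split())
    return SENTENCE.fullmatch(tags) is not None

def tag_patterns(max_len):
    """List every tag string of length <= max_len that the rule accepts."""
    found = []
    for n in range(1, max_len + 1):
        for combo in product("NV", repeat=n):
            tags = "".join(combo)
            if SENTENCE.fullmatch(tags):
                found.append(tags)
    return found

print(is_sentence("Jane Smith eats pizza margherita"))  # True
print(is_sentence("eats Jane"))                         # False
print(tag_patterns(4))
# ['NV', 'NNV', 'NVN', 'NNNV', 'NNVN', 'NVNN']
```

Filling each N and V slot of an accepted pattern with words from a lexicon would then produce candidate sentences, which is the sense in which one rule can generate many sentences.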
Indeed we can describe languages with rules,
rules that order the way that sounds
interact and the way that words appear in a
sentence. So maybe all of human language
could be modeled like this, as a
sequence of symbols and the order in
which they appear. This is the question
that we're going to be studying this
week: is it possible to model human
language like this? And the tools that we're
going to be using to study this kind of
ordering are a generalization of regular
expressions and a kind of abstraction
called an automaton, plural automata.
Thank you.
