Hi again. Let's continue exploring
different constituents, and how we can
build rules that can explain the
internal structure of our constituents.
So let's say we have some noun phrases,
maybe some noun phrases that look
like these: Harry, coppers, the
coppers, spot, the spot, reason, the reason,
and so forth. All those are possible noun
phrases in English. However, you can
probably see that there's a few
regularities popping up in the data. We
could make a rule that said a noun
phrase is any of these: Harry is a noun
phrase, coppers is a noun phrase, the
coppers is a noun phrase. But as you are noticing, this
rule is not the most efficient. We could
add a few generalizations, such as the
fact that noun phrases sometimes have a determiner, the word the, as
in the coppers, the reasons, the
parties, and sometimes noun phrases don't
have a determiner. Let's try to make a
rule to explain the structure of a noun
phrase. How about this? A noun phrase is
composed of a determiner, and a noun, and
then we have that determiners are
the set of one element, which is the, and
nouns are the set of five elements: Harry,
coppers, spot, reason, and parties. This is
a nice rule. For example, we could have
the coppers, where the noun phrase is
composed of the determiner the and the noun
coppers. We could have the parties, where
the noun phrase is made up of the
determiner the, and the noun parties. So
this is good. However, there are still a few
phrases that this rule would not
capture. For example, if it saw the word
Harry, it wouldn't think that it's a noun
phrase, because the only thing it
recognizes as a noun phrase is a determiner, the, and a
noun. So let's expand upon this rule, and
add another condition. Let's say a noun
phrase is a determiner and a noun, or a
noun. And again we have the determiner as
the set of one element, the, and the noun
as a set of five elements:
Harry, coppers, spot, reason, parties. As you
can see, we can have two types of noun
phrases, according to these rules.
We can have some noun phrases that are
determiner and noun, such as the coppers,
the spot, the reason, and the parties, and
there's another type of noun phrase,
which is just a noun, like Harry. And
there are others that we can build like
this, such as parties or spot. So these rules
can describe the structure of a noun
phrase in English, at least for
this little toy set. Let's add something
more to the equation. How about if a noun
phrase can be three things? It can be a
determiner followed by a noun, or a noun,
or a pronoun. We have the same sets
for determiner and noun, but we have
a new element, pronoun, which has one
element in its set: they. So now we
would have three types of noun phrases,
the determiner plus noun, like the coppers or the
spot; we have noun phrases that are just a
noun, like Harry; and we have noun phrases
that are just a pronoun, like they. So all
of these are noun phrases, just noun
phrases with different internal
structures.
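As a minimal sketch, the three kinds of noun phrase just described can be written down as plain Python data and checked against a phrase. The names `NP_RULES`, `LEXICON`, and `is_noun_phrase` are made up for this illustration and do not come from any parsing library; lowercasing the input is also an assumption, so that Harry matches the lower-case terminal.

```python
# Toy grammar from the lecture: NP -> Det N | N | Pronoun
NP_RULES = [["Det", "N"], ["N"], ["Pronoun"]]

# The actual words, grouped under the category that produces them.
LEXICON = {
    "Det": {"the"},
    "N": {"harry", "coppers", "spot", "reason", "parties"},
    "Pronoun": {"they"},
}

def is_noun_phrase(phrase):
    """Return True if the phrase matches one of the NP rules word by word."""
    words = phrase.lower().split()
    for rule in NP_RULES:
        # A rule fits when it has one category per word and each word
        # belongs to the set for its category.
        if len(rule) == len(words) and all(
            word in LEXICON[category] for category, word in zip(rule, words)
        ):
            return True
    return False

print(is_noun_phrase("the coppers"))  # True  (Det N)
print(is_noun_phrase("Harry"))        # True  (N)
print(is_noun_phrase("they"))         # True  (Pronoun)
print(is_noun_phrase("the they"))     # False (no rule is Det Pronoun)
```

Adding a new kind of noun phrase in this sketch means adding one list to `NP_RULES`, rather than listing every phrase by hand.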
There's something important here: the
ones that are in lower case here are
called terminals: the, Harry, coppers, they,
and so forth. These are the actual words
that we observe in the language, in the
writing, in the speaking, and so forth. We
call them terminals because
they're the last point of the derivation.
We're gonna call the others
non-terminals. The non-terminals are
abstractions that help us
explain why NPs can
have different configurations. So for
example, you have seen the word Harry in
text, but no one has seen an NP in text,
or just an abstract noun. What you see
are specific instantiations of the object
noun. What you see is an instantiation of
noun, which is spot, or an instantiation of
the abstract object noun, which is reason.
And if you combine two abstractions, the
determiner and the noun, you would have
an instantiation of the abstract object
noun phrase. So noun phrase, determiner,
and noun are non-terminals because they
are abstractions. The, Harry, and coppers are
terminals because they are the actual
objects that we're dealing with. So you
can see how we build rules: first with
some non-terminals that help us explain
the structure, and then we end up with
our terminals. So how can we use any of
this? We can use this to build parse
trees of our noun phrases, for example. So
let's say you wanted to explain the
structure of the phrase the parties. How
would you do it with these rules? We know
that this is a noun phrase, so the first
node that we need is the non-terminal
NP, or noun phrase, which I'm gonna
have here, and also here in the diagram.
Next you need to select the kind of noun
phrase that fits the phrase you want to
parse. For example, the parties has two
words, so it cannot be a noun phrase
composed of just a noun, and it cannot be a
noun phrase composed of just a pronoun. Also,
neither the nor parties is a pronoun, so
under these rules the parties has to be an NP composed
of determiner and noun. So we go to
that rule, and select that NP is
determiner and noun, and now we need to
go from these non-terminals to our
actual terminals. Let's look at the
determiner first, because the word that we have
is the, and we do have that rule. We can
have Det, the determiner, go to the
terminal
the. We can repeat this, and have the
noun, with this rule here, go to the
terminal parties. So look at what we have
here. We have a noun phrase which has a
determiner, and a noun, this one here, and
then the determiner has the terminal the.
And the noun has the terminal parties. So
we can use these rules to build a parsed
structure of a noun phrase, so that a
noun phrase has two components, and we
know exactly what these components are.
One of them is the determiner the, and one
of them is the noun
parties. We have seen these types of
rules before, in week two. We called them
context-free grammar rules. They
were rules such that one element on
the left could go to a number of
elements on the right. And for example,
you need one determiner for one noun. This
is a type of rule called a context-free grammar rule,
and as you can see, English sentences
mostly fall within this domain.
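The parse of the parties walked through above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Python parsing implementation the course uses: the names `RULES` and `parse` are hypothetical, and the sketch relies on the fact that in this toy grammar every symbol on the right-hand side covers exactly one word.

```python
# Each rule has one non-terminal on the left and a sequence on the right.
RULES = {
    "NP": [["Det", "N"], ["N"], ["Pronoun"]],
    "Det": [["the"]],
    "N": [["harry"], ["coppers"], ["spot"], ["reason"], ["parties"]],
    "Pronoun": [["they"]],
}

def parse(symbol, words):
    """Try to expand `symbol` so that it covers exactly `words`.

    Returns a nested tuple (symbol, child, ...) or None if no rule fits.
    """
    if symbol not in RULES:  # a terminal: it must be the single word itself
        return symbol if words == [symbol] else None
    for right_side in RULES[symbol]:
        # Simplification for this toy grammar: one word per right-side symbol.
        if len(right_side) != len(words):
            continue
        children = [parse(child, [word]) for child, word in zip(right_side, words)]
        if all(child is not None for child in children):
            return (symbol, *children)
    return None

tree = parse("NP", "the parties".split())
print(tree)  # ('NP', ('Det', 'the'), ('N', 'parties'))
```

The nested tuple mirrors the diagram: NP at the top, Det and N below it, and the terminals the and parties at the bottom. A phrase that no rule covers, such as the they, gives `None`.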
This rule style that I'm showing you right
now, by the way, was invented by Chomsky in
1957, and it's the one
that you're going to find in the Python
implementation of parsing. These rules are
called context-free because the
structures that you need
to build an NP are determined within the
NP. You don't need to look at a verb, for
example, to know how your NP is going to
be constructed. All the information for
the NP lies within the NP. We are going
to use context-free grammar rules for
parsing. A quick note if you've taken
LING 1, because right now you must be
having a heart attack,
telling me, wait, we're not using X-bar? We'll
go into more detail later, but no, we're
not going to use X-bar, or principles and
parameters, or minimalism, or anything
more modern than the phrase structure
grammar rules of the 1960s and 70s! The reason will come up in
the next few videos. For the CS people,
we're gonna call these types of rules
context-free grammar rules. In summary, we
have constituents such as professor, the
professor, pizza, pineapple pizza, the
pineapple pizza, the delicious pineapple
pizza.
We have constituents, for example, noun
phrases. We have rules for the internal
structure of constituents: for example,
a noun phrase can be a determiner like
the and a noun like pizza. We have
rules that are expressed in terms of
terminals, the specific tokens or
words like the and pizza, and rules that have
non-terminals, such as noun phrase,
determiner, noun, and so forth. These rules
are context-free, in that the explanation
for what a noun phrase is is self-contained
within the noun phrase. And on
the next day, you will continue to build
upon these structures. We're gonna look
at verb phrases and
prepositional phrases, and use them to
try to build a whole parsed English sentence.
