Welcome to week seven. Thank you so much
for being here.
This week, we're going to talk about
parsing. We're going to talk about how
sentences have structures in them; they
can be divided into noun phrases, verb
phrases, and we will use this to help the
computer understand sentences, to help
the computer extract information from
sentences, and to use this so that it
has a better understanding of the
world. So far we have been doing mostly
natural language processing, which is
turning human language, be it sign
language or spoken language, into
language models which contain, for
example, probabilities of two words being
together or the probability of two words
in English being related to one word in
Spanish, and so forth. From finite state
machines to BERTs,
we have turned information about
language into a language model, and
we've used that model for two things.
One of them is natural language
generation, which is, for example,
predicting a word on your cell phone
once you've typed a few letters, or
guessing the next word while you're
typing on your phone. We've also
done generation, where we make a new
paragraph. We used n-grams for this and then we
used deep learning for this as well. So
one of the things we've done with
natural language processing is natural
language generation. We've also done a little
bit of natural language understanding,
where we're just trying to understand what is
said in some text or some stretch of
language. For example, understanding if a
review is positive or negative, or
understanding what a tweet is about: is
it about sports, is it about spam, is it
about Arabic from Egypt, for example. In
the next two weeks, we'll focus on
language understanding. This week, week
seven, we will study how to parse a human
sentence so that we can distinguish its
different parts. In week eight, we will use this to learn
information from these sentences, learn
relationships like a restaurant serves a
certain kind of food, or London is the
capital of England, and so forth. This
will, of course, help bring information to
our language models, so that they can
better understand the world, and can
better interact with our users. This week
we'll talk about parsing, rule-based
parsing. We'll mention deep learning
parsing briefly, and we'll talk about
statistical parsing, a kind of parsing
called dependency parsing, which is
figuring out which words depend on which,
and a kind of rough parsing called chunking,
for when we just need rough parts of a
sentence. And again we're going to do
this to then perform knowledge
extraction, where we try to extract
information about the entities of the
world, and use this when we have systems
like chatbots, or question answering
systems, and so forth. Let's get to it.
Let's look at this sentence. We've been
looking at noun phrases since week two.
We have a sentence like I like pizza, and
in week two, we said that the first
element was a noun phrase,
the second element was a verb, and the
third element was a noun phrase, so that
a sentence in English could be defined
as noun phrase, verb phrase, noun phrase,
for example. Each of these is a
constituent, so you could have a noun
phrase constituent, which is the word I, a
noun phrase constituent, which is the
word pizza, and a verb phrase constituent
for the word like. We call them
constituents because they are pieces of
language that make sense on their own.
For example: I, like, pizza.
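As a minimal sketch, the three constituents of I like pizza can be written down as labeled pieces. The Constituent class and the label_pattern helper are hypothetical names chosen for this illustration, and the labels are assigned by hand here rather than computed by a parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constituent:
    """A labeled piece of a sentence (hypothetical helper for illustration)."""
    label: str    # e.g. "noun phrase" or "verb phrase"
    words: tuple  # the words that make up this constituent


def label_pattern(sentence):
    """Return the sequence of constituent labels for a sentence."""
    return [constituent.label for constituent in sentence]


# Hand-labeled constituents for the example sentence "I like pizza".
i_like_pizza = [
    Constituent("noun phrase", ("I",)),
    Constituent("verb phrase", ("like",)),
    Constituent("noun phrase", ("pizza",)),
]

# The label sequence is the sentence pattern described in the lecture.
print(label_pattern(i_like_pizza))

# Joining the words of every constituent gives back the sentence.
print(" ".join(word for c in i_like_pizza for word in c.words))
```

Keeping the label separate from the words means the same pattern can describe many different sentences.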
Constituents can have more than one word
in them.
The subject of this sentence could be
the professor, and then who likes the
pizza? The professor. So there's one
constituent, a noun phrase which is the
professor. It could also be I, it could
also be Jane, and so forth. We have the
same verb, and then we have another noun
phrase with two words, pineapple pizza, as
our object. So we can have constituents
like noun phrases, and they can be
composed of one word, or more words, but
they act as a noun. For example, the
professor, the weird professor, the
extremely weird professor: they might
have a lot of words in them, but together
they're acting like a noun, like the
subject of the verb likes: the extremely
weird professor likes pizza. So each one
of those parts of the sentence is called
a constituent. We're going to simplify
the names. We're gonna call noun phrases NP,
and we're gonna call the verb phrases VP.
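To make the idea of one word or more words acting as a noun concrete, here is a rough sketch that checks whether a hand-tagged group of words has the shape of a simple NP. The tag names (DET, ADV, ADJ, N, PRO, V) and the pattern itself are assumptions made for this illustration; they are not a grammar given in this lecture, and real noun phrases can be more complex than this pattern allows.

```python
import re

# Assumed tag names: DET (determiner), ADV (adverb), ADJ (adjective),
# N (noun), PRO (pronoun), V (verb).
# Assumed NP shape: a lone pronoun, or an optional determiner, then any
# number of adjectives (each optionally preceded by adverbs), then one
# or more nouns.
NP_PATTERN = re.compile(r"PRO|(DET )?((ADV )*ADJ )*(N )*N")


def looks_like_np(tagged_words):
    """Return True if the tag sequence fits the assumed simple NP shape."""
    tags = " ".join(tag for _word, tag in tagged_words)
    return NP_PATTERN.fullmatch(tags) is not None


# Hand-tagged candidates taken from the lecture's examples.
candidates = [
    [("I", "PRO")],
    [("the", "DET"), ("professor", "N")],
    [("the", "DET"), ("weird", "ADJ"), ("professor", "N")],
    [("the", "DET"), ("extremely", "ADV"), ("weird", "ADJ"), ("professor", "N")],
    [("pineapple", "N"), ("pizza", "N")],
    [("likes", "V")],
]

for tagged in candidates:
    phrase = " ".join(word for word, _tag in tagged)
    print(phrase, "->", looks_like_np(tagged))
```

Under these assumptions, every subject and object from the examples fits the NP shape regardless of how many words it has, while the verb likes does not.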
What happens if you have more than one
word in your constituent? If you do,
then one of the words is going to be the
head of the constituent. This means that
it's the main idea of the constituent or
what the constituent is talking about. For
example, in the sentence The weird
professor likes pineapple pizza,
you have three constituents: an NP, the
weird professor; a VP, likes; and an NP,
pineapple pizza. So in the constituent,
the weird professor, what are we talking
about? We're talking about a professor.
We're not really talking about weird, and
we're not talking about the, either. So
there are three words in the first
constituent, but only one of them is the
main idea: professor. So we have one
constituent, NP, the weird professor, and
the head of that NP is professor, which
is a noun. Also in the noun phrase
pineapple pizza, we have two words and
what are we talking about, when we talk
about pineapple pizza? Are we talking
about a pineapple, or a pizza? We're
talking about a pizza, and so that's why
pizza is the head of the constituent.
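One rough way to sketch head-finding for simple NPs like these is to pick the last noun in the first run of nouns. This is only a heuristic assumed for illustration, not a method presented in this lecture; the tag names (DET, ADJ, N, PRO) and the function name find_np_head are hypothetical, and the words are tagged by hand.

```python
def find_np_head(tagged_words):
    """Guess the head of a simple NP from hand-tagged (word, tag) pairs.

    Heuristic (an assumption, not a general rule): skip everything up to
    the first noun or pronoun, keep moving through any nouns that follow
    directly, and stop at the first word that is not a noun.
    """
    head = None
    for word, tag in tagged_words:
        if tag in ("N", "PRO"):
            head = word          # the latest noun in the run so far
        elif head is not None:
            break                # the noun run has ended
    return head


weird_professor = [("the", "DET"), ("weird", "ADJ"), ("professor", "N")]
pineapple_pizza = [("pineapple", "N"), ("pizza", "N")]

print(find_np_head(weird_professor))
print(find_np_head(pineapple_pizza))
```

For the two noun phrases discussed so far, this sketch picks professor and pizza, matching the heads identified in the lecture.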
So we have a constituent, pineapple pizza,
with one head, pizza, which is what we're
talking about. Also notice that the head
has the same category as the whole
constituent: pizza is a noun, therefore
the whole constituent
is a noun phrase. Professor is a noun,
therefore the whole constituent is also a
noun phrase. Now you go ahead and give it a
try. These are all noun phrases. Take a
moment to figure out what the head
of each of these noun phrase
constituents is. Pause the video, try to
figure it out, and then come back.
Welcome back. These are the heads of each
of those NPs. When we're talking
about a high-class spot such as Mindy's,
for example, we're talking about a spot.
When we're talking about the reason he
comes into the hot box, we're talking
about a reason. When we talk about
three parties from Brooklyn, we're
talking about parties. Even if you have a
single word like they, the head of that
constituent is the word they, because
it's what the constituent is talking about. So again,
the head is the word that can stand for
all of the rest of the constituent. It's
what the phrase is about. And the head
has the same category as the noun phrase,
so it determines the category of the
noun phrase. So for example, reason is a
noun, so that whole phrase is a noun
phrase. Spot is a noun, so that's why the
whole constituent is a noun phrase. We
can have other types of
constituents that we'll analyze in the following
videos, such as verb phrases,
prepositional phrases, and so forth. One
reason why we know that all of these are
noun phrases is that they all behave the
same way. For example, they can all be
subjects of a verb. In the sentence They
are great, for example, the word they can be
replaced by any of these noun phrases.
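The substitution idea can be sketched in a few lines. The function name substitute_subject is hypothetical, and the code only assembles strings: it does not judge whether the result is grammatical, which is still up to a human reader. The predicates are paired with each noun phrase by hand so that the verb agrees with the subject.

```python
def substitute_subject(noun_phrase, predicate):
    """Place a noun phrase in the subject slot in front of a predicate."""
    sentence = f"{noun_phrase} {predicate}"
    # Capitalize only the first letter so names like Brooklyn are kept.
    return sentence[0].upper() + sentence[1:]


# Each noun phrase is paired by hand with a predicate that agrees with it.
examples = [
    ("they", "are great"),
    ("the Broadway coppers", "are great"),
    ("the reason he comes into the hot box", "is money"),
    ("three parties from Brooklyn", "are arriving"),
]

for noun_phrase, predicate in examples:
    print(substitute_subject(noun_phrase, predicate))
```

If every substitution reads as a well-formed sentence, that is evidence that the substituted phrases belong to the same type of constituent.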
For example: the Broadway coppers are
great; the reason he comes into the hot
box is money; three parties from Brooklyn
are arriving. So any of these NPs can be
the subject of a verb, and that's how we
know they are all the same
type of constituent. They're all noun
phrases. So, to summarize so far: we have a
structure called a sentence which is
made up of constituents like noun
phrases, and verb phrases, and our first
step is trying to identify what these
constituents are. Maybe our language, like
English, will have noun phrases, maybe it'll have verb
phrases. In the next video, we will look
at how we can analyze the structure, the
internal structure of a constituent, and
then start building larger and larger
elements until we can build the whole
sentence.
