We have been looking at support vector
machines, and as an exercise, we're going to
train a support vector machine to assign
parts of speech to words. So before we do
that, let's take a quick look at parts of
speech. As a pre-summary, parts of speech
help us describe the properties and
behavior of a word. So for example, nouns
and verbs behave differently in
sentences, and knowing what's a noun and
what's a verb is going to help us. It is
also going to help us in natural
language processing, because if you ask
Siri, "Siri, play some music," Siri needs to
know that "play" is an action and that
"music" is what you want the action
performed on. You want to play music, you
want to do an action to a thing. So this
kind of labeling is gonna be very useful
for us in natural language processing. In
many languages the exact label of what's
a noun and a pronoun is going to be
dependent on context, and this is true
for English, and it's going to be true
for the language in our example, which is
Cook Islands Maori. So in order to know
that something is a noun or verb, you
need to look at the previous word and
the following word, at least; you need
to look at your context. And finally, not
all languages have the same parts of
speech. So you might find languages that
don't have adjectives, for example, or you
might find languages that have a part of
speech that English doesn't have.
So in summary, parts of speech are labels
like noun, verb, adjective that we
assign to a word according to its
properties:
maybe its morphological properties,
maybe its syntactic properties, or
positional properties. Which ones do you
remember from school? You remember things
like nouns, like verbs, like adjectives
and so forth.
There's quite a few. The main system for
describing English has 30-something
tags. It has quite a few. Let's look at
how we can get them for English. This is
an example from the NLTK
documentation. There's many packages in
Python that can give you part of speech,
but this is a very simple example. You
take a sentence like "And now for
something completely different," you
tokenize it, and you assign the list
of tokens to the variable text. Then you
take this list of tokens and run it
through the part-of-speech tagging
function, pos_tag, of the package NLTK. When you
print the results, you're going to get
this: a list of tuples where the first
element is the word (And, now, for,
something, completely, different) and the
second element is going to be a tag, like CC,
RB, IN, and so forth. What does the tag
mean here? On the bottom left you have a
list of parts of speech for a very
popular system in English called the Penn
Treebank. If we look at the list, the
first tag, CC, is the coordinating
conjunction, and the word "and" is indeed
a conjunction. The word "now" is an
adverb, "for" is a preposition, "something"
is tagged NN, a noun,
"completely" is an adverb, and "different" is
the adjective that describes something.
But for now it's just an adjective. And
that's a list of parts of speech in
English, but we cannot simply have
something like a dictionary that has
words and then their part of speech
because in English, one word can have
more than one part of speech. For example,
here we have the word Facebook. Sometimes
Facebook is a noun, as in the sentence
"Facebook is an app," because we could also say
something like "Facebooks are apps," singular and plural, so
it behaves like the noun cat/cats:
Facebook/Facebooks. On the other hand,
Facebook can be a verb, like in
"Let me Facebook that real quick." It's a
verb because you can have it behave like
a verb. You can say She
Facebooks, like She walks; She is Facebooking,
like She is walking; She Facebooked, like She walked.
So it can adopt the behavior of a verb
as well.
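For reference, the NLTK example described earlier (tokenize the sentence, then call pos_tag) can be sketched roughly as follows. The tagged output is the one read out in the lecture; the names DOCUMENTED_OUTPUT, PENN_TAGS, tag_with_nltk, and describe are made up for this sketch, and the tag descriptions cover only the five tags that appear in the example. Calling tag_with_nltk assumes NLTK and its tokenizer and tagger data are installed, so the sketch only calls it on request.

```python
# Output read out in the lecture for the NLTK documentation example
# "And now for something completely different".
DOCUMENTED_OUTPUT = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]

# A small subset of the Penn Treebank tag set: only the tags used above.
PENN_TAGS = {
    "CC": "coordinating conjunction",
    "RB": "adverb",
    "IN": "preposition",
    "NN": "noun",
    "JJ": "adjective",
}


def tag_with_nltk(sentence):
    """Tokenize and tag a sentence with NLTK (requires nltk and its data)."""
    import nltk  # imported lazily so the rest of the sketch runs without NLTK

    text = nltk.word_tokenize(sentence)  # list of tokens
    return nltk.pos_tag(text)            # list of (word, tag) tuples


def describe(tagged):
    """Pair each word with a readable description of its tag."""
    return [(word, PENN_TAGS.get(tag, tag)) for word, tag in tagged]


if __name__ == "__main__":
    for word, meaning in describe(DOCUMENTED_OUTPUT):
        print(f"{word:12} {meaning}")
```

To tag a new sentence, pass the result of tag_with_nltk(...) to describe(...); tags outside the small table are printed unchanged.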
So things that are nouns in English can
also behave like verbs. And you can go
back and forth, and this behavior is
sensitive to context. And look at how
sensitive it is: if you flip the
order of words, then the parts of
speech can change. For example, in the
sentence "Gus is learning piano," the
word "learning" is a verb, because it's
what Gus is doing: he is learning. And we
know it's a verb because the previous
word is "is": "is learning." However, if we
change that to "Learning is fun," now
"learning" is a noun, because it comes
before the word "is." So at the absolute
minimum, we need to know what our
previous word is, and what our following
word is, in order for us to identify the
part of speech of a word. That is at the
absolute minimum for a language like
English. Cook Islands Maori is gonna have
this behavior as well. So in summary, in
languages like English and Cook Islands
Maori, we can only really define nouns
according to their distributional
properties. For example, nouns can be
preceded by prepositions, like "by Jake." If
we have the word "by," what comes after that
is probably going to be a noun.
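The point about needing at least the previous and the following word can be made concrete with a small sketch. It uses the two "learning" sentences from above. The tag names (NOUN, VERB, ADJ), the edge markers <s> and </s>, and the function names are assumptions for illustration, and the lookup tables only stand in for a real classifier; this is not the support vector machine built later.

```python
from collections import defaultdict

# Hand-labeled toy data based on the lecture's two sentences. The tag
# names are illustrative choices, not a standard tag set.
TAGGED = [
    [("Gus", "NOUN"), ("is", "VERB"), ("learning", "VERB"), ("piano", "NOUN")],
    [("Learning", "NOUN"), ("is", "VERB"), ("fun", "ADJ")],
]


def context_features(words, i):
    """Return (previous word, word, next word), padding at sentence edges."""
    prev_word = words[i - 1].lower() if i > 0 else "<s>"
    next_word = words[i + 1].lower() if i < len(words) - 1 else "</s>"
    return (prev_word, words[i].lower(), next_word)


def build_lookups(tagged_sentences):
    """Collect the tags seen for each word alone and for each word in context."""
    by_word = defaultdict(set)
    by_context = defaultdict(set)
    for sentence in tagged_sentences:
        words = [word for word, _ in sentence]
        for i, (word, tag) in enumerate(sentence):
            by_word[word.lower()].add(tag)
            by_context[context_features(words, i)].add(tag)
    return by_word, by_context


by_word, by_context = build_lookups(TAGGED)

# The word alone is ambiguous: it was seen with two different tags.
print(sorted(by_word["learning"]))                # ['NOUN', 'VERB']

# With the previous and following word, each occurrence has one tag.
print(by_context[("is", "learning", "piano")])    # {'VERB'}
print(by_context[("<s>", "learning", "is")])      # {'NOUN'}
```

A three-word window like this is the kind of feature a classifier can be trained on; a plain word-to-tag dictionary has no way to tell the two uses apart.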
Nouns are gonna share properties with
things like pronouns. "Jane arrived
yesterday" is a sentence where we can
replace Jane with She, "She arrived yesterday,"
and these two are gonna be very similar.
So a noun is something that can occupy the
same spot as a pronoun. So in English,
and as we'll see in Cook Islands Maori,
we do need
to look at the context of a word if we
want to know its part of speech. This is
not true for every human language.
There's languages like Russian and Latin
where nouns have certain endings that
are very different from the endings of
verbs, and so it is very easy to
distinguish what's a noun and what's a
verb. If you've studied Russian or Latin,
you remember that we call these cases. So
for example, in Russian, the word for book, kniga,
is different depending on what we're
doing with the book. If I say the book is
beautiful, I have to say kniga. And if I
want to talk about the book, I have to
say knigye, because it's about the book.
And if it's the cover of the book, I
have to say knigi, which is similar to the 's in
book's, for example in "the book's cover" in
English. So because nouns behave like
this in Russian, it's very easy to
identify them and contrast them with the
verbs. But in English we're going to need
the context. Finally, not all languages
have the same parts of speech. For
example, Japanese probably doesn't have
adjectives. In English we can say a
beautiful book, but not a beautifuling book or a beautifuled
book. So adjectives don't behave like
verbs at all in English. But in Japanese
they do: "something is beautiful" is
utsukushi-i, and
"I'm walking" is aruk-u. "Something was
beautiful" is utsukushi-katta, "someone
walked" is aruk-atta, "something is not beautiful" is
utsukushi-kunai, and "someone is
not walking" is aruk-anai. So as you can see,
even though in English adjectives and
verbs don't behave the same way, in
Japanese they do.
So maybe Japanese does not have what
English would call adjectives, which
again doesn't mean that
Japanese is deficient in any way. It
means that Japanese is perfectly fine
and just behaves in a way that's
different from English. In addition to
this, other languages might have parts of
speech that you might not be familiar
with. For example, Cook Islands Maori has
something called a tense-aspect-mood
marker. These are words that
indicate whether a verb happened in the
past, in the present, or in the future. In
English we would say that Tere went
or that Tere walked, but in Cook Islands
Maori, you have to say "past go Tere to
Rarotonga": Kua 'aere Tere ki Rarotonga. So
we have the word for past, kua, and then we have
the word for go, 'aere, the word Tere
for who did the action, ki is the
preposition, and Rarotonga is the place.
Past go Tere to Rarotonga. There can be
other tense-aspect-mood markers, such as
the future marker: "will go Tere to
Rarotonga," Ka 'aere Tere ki Rarotonga.
There's one for the present: Tei 'aere nei
Tere ki Rarotonga, "is go is Tere to
Rarotonga," Tere is going to Rarotonga. So
even though English doesn't have this,
we're gonna need something called a
tense-aspect-mood part of speech to
correctly describe sentences in
Cook Islands Maori.
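As a rough sketch of what this means for tagged data, the past and future sentences above can be written as lists of (word, tag) pairs with an extra tag for the marker. The tag names (TAM, V, N, PREP) and the helper names are hypothetical labels chosen for this sketch, not an official tag set for Cook Islands Maori; the glosses are the ones given in the lecture.

```python
from collections import Counter

# Word-by-word analyses taken from the lecture's glosses. The tag names
# (TAM, V, N, PREP) are hypothetical labels for this sketch only.
SENTENCES = [
    [("Kua", "TAM"), ("'aere", "V"), ("Tere", "N"), ("ki", "PREP"), ("Rarotonga", "N")],
    [("Ka", "TAM"), ("'aere", "V"), ("Tere", "N"), ("ki", "PREP"), ("Rarotonga", "N")],
]

# Glosses for the two markers as given in the lecture.
TAM_GLOSS = {"kua": "past", "ka": "future"}


def tam_markers(sentence):
    """Return the words tagged TAM in one tagged sentence."""
    return [word for word, tag in sentence if tag == "TAM"]


def tag_counts(sentences):
    """Count how often each tag occurs across all sentences."""
    return Counter(tag for sentence in sentences for _, tag in sentence)


for sentence in SENTENCES:
    marker = tam_markers(sentence)[0]
    words = " ".join(word for word, _ in sentence)
    print(f"{words}  ->  {TAM_GLOSS[marker.lower()]}")

print(tag_counts(SENTENCES))
```

The two sentences differ only in the first word, and that word carries the tense, which is why it gets a tag of its own here and is not forced into an English category.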
Again why do we care about any of this?
Because it's gonna help us parse
sentences, and it's gonna help us
distinguish between actions and things.
So again, for example: "Siri, I don't know,
play me the Beatles." Play is the action
that you want the software to perform,
and the Beatles are the thing that you
want the action performed on. So it's
gonna be important for us to distinguish
between these two so that the software
can know what the action is, and what the
object of the action is. This can help
us do things like build
better replies for chatbots, by
identifying which part is the action that
the person is performing and which part
is the thing that they're performing it
on. In summary, parts of speech include
labels like nouns, verbs, adjectives and
so forth. In many languages, for example
in English, the only way to know for sure
what a word's part of speech is, is to look
at the word and its context, at the
minimum the preceding word and the
following word. This is also going to be
true of other languages such as Cook
Islands Maori. Languages can have
different parts of speech. For example, to
correctly describe Cook Islands Maori, we
need something called a tense-aspect-mood
marker, which English doesn't have. In
the next two videos, we're going to code
a support vector machine that can tell
you the parts of speech of Cook
Islands Maori sentences.
