Okay. Let's get started again.
Okay. So welcome back to, um,
week three of CS224N.
Okay. So we- we've got a bit of a change of pace today after week two.
So, um, this week in week three,
we're actually going to have some human language,
and so this lecture has no partial derivative signs in it.
And so we'll be moving away from sort of working out the technicalities of
neural networks and backpropagation,
and the sort of math-heavy week two.
So then, this week,
what we actually want, well,
in today's lecture, we want to look at
what kind of structures human language sentences have,
and how we can build models that
build that kind of structure for the sentences that we see.
Um, so first of all,
I'm gonna sort of explain and motivate a bit about
the structure of human language sentences.
So, that's kind of like linguistics in 20 minutes or something.
Um, then I'll focus particularly on dependency grammars,
present a method for dependency grammar parsing called transition-based dependency parsing,
and then talk about how you can make neural dependency parsers.
Um, so, um, going on just,
you know, a couple of announcements.
So, assignment two was due one minute ago,
so I hope everyone's succeeded,
um, in getting assignment two out of the way.
If you're still working on it,
do make sure to make use of the office hours and get help with that.
Coming out just today is assignment three.
Um, assignment three, um,
is basically about this lecture.
Um, so, [LAUGHTER] in assignment three,
what you're doing is building a neural dependency parser,
and so we hope that you can put together what you learned about
neural networks last week and the content of today,
and jump straight in to building a neural dependency parser.
Um, the other thing that happens in assignment three is that,
we start using a deep learning framework PyTorch.
So, for doing assignment three, instruction zero,
and this is in the PDF for the assignment,
is to install PyTorch as a Python package,
and start using that.
Um, so we've attempted to make assignment three sort of be a highly scaffolded tutorial,
where you can start to learn how to do things in PyTorch by just,
um, writing a few lines of code at a time.
Hopefully that works out for people.
Um, if you have any issues with that, well,
obviously, you can send Piazza messages
or come to office hours.
I mean, the one other thing you could think of doing: there's a sort of
one-hour introduction to PyTorch on the PyTorch site,
where you're directed for installing PyTorch,
and you could also look at that if that's helpful.
Um, now, one final mention: final projects.
We're going to focus on those more in week five,
but it's not bad to be thinking already about things you could do
if you're doing a custom final project.
You're certainly encouraged to come and talk to me or the TAs.
We have under the sort of office hours page on the website,
a listing of the expertise of some of the different TAs.
Um, since I missed my office hours yesterday,
I'm gonna have a shortened office hour tomorrow from 1:00 to 2:20.
Um, that's at the same time as the normal CS224N office hours,
so you can come for any reason you want,
but it might be especially good to come to me if you want
to talk about final projects.
Okay. So, let's leap in and start talking about the structure of sentences.
And so, I just sort of want to explain something about human language sentence structure,
and how people think about that structure,
and what kind of goals people in natural language processing
have in building structure to understand the meaning of sentences.
Um, all of the examples I'm going to give today are in English,
um, because that's the language that you're all expected to have some competence in.
But this really isn't meant to be sort of facts about English.
This is meant to be sort of ideas of how you can think about the structure of
human language sentences that are applied to all sorts of languages.
Okay. So in general,
there are two different ways that
linguists have thought about the structure of sentences,
though there are some relations between them.
One of them is called phrase structure,
or phrase structure grammars.
And if you vaguely remember from CS103, if you did that,
where you spent about a lecture on context-free grammars, um,
phrase structure grammars are using the tools of
context-free grammars to put structures over sentences.
So, I'm first of all going to just briefly introduce that, so you've seen it,
but actually, the main tool that we're going to
use in this class and for assignment three
is to put dependency structures over sentences,
so I'll then go on to that.
So, the idea of phrase structure is to say that
sentences are built out of units that progressively nest.
So, we start off with words like the, cat, cuddly,
et cetera, and then we're gonna put them into bigger units that we call phrases,
like "the cuddly cat",
and then you can keep on combining those up into even bigger phrases,
like "the cuddly cat by the door".
Okay, that's that.
So, how does this work?
Well, so the idea of it,
and this is sort of the way linguists think,
is to say, "Well,
here's this language, which,
you know, might not be English.
It might be Oaxacan or some other language.
What kind of structure does it have?
And well, we could look at lots of sentences of the language.
And so the linguist is gonna think,
"Well, I can see patterns,
like the cat, a dog,
the dog, a cat, et cetera.
So, it seems like there's one word class here,
which linguists often refer to as determiners.
Um, they're also sometimes referred to as articles in English.
There's another word class here of nouns.
And so, to capture this pattern here,
it seems like we can make this unit
that I see all over the place in the language,
which is made of a determiner followed by a noun.
So, I write a phrase structure grammar rule,
a context-free grammar rule, saying I can have
a noun phrase that goes to a determiner and a noun.
Okay. But, you know,
that's not the only thing that I can see.
So, I can also see other examples in my language,
like the large cat,
or a barking dog,
or the cuddly cat, the cuddly dog.
So, it seems that I need to put a bit more stuff into my grammar.
So, maybe I can say in my grammar that a noun phrase goes to a determiner,
then optionally an adjective,
and then a noun.
And then I poke around a little bit further and I
can find examples like the cat in a crate,
or a barking dog by the door.
And I can see lots of sentences like this.
And so I want to put those into my grammar.
But at that point, I notice something special, because look,
here are some other things,
and these things look a lot like the things I started off with.
So, it seems like we're having a phrase with
the same expansion potential nested inside this bigger phrase,
because these ones can also be expanded, right?
I could have something like the green door in here.
So, I just wanna capture that in some way.
So, maybe I could say that a noun phrase goes to a determiner,
optionally an adjective, a noun,
and then a something else,
which I'll call a prepositional phrase.
And then I'm gonna write a second rule saying that
a prepositional phrase goes to a preposition,
that's gonna be these words here,
followed by a noun phrase.
So then I'm reusing my noun phrase that I defined up here.
So then I could immediately generate other stuff.
I could sort of say,
"The cat by the large door."
Or indeed I could say,
"The cat by the large crate."
Um, "The cat by the large crate on the table",
or something like that,
because once the prepositional phrase includes a noun phrase,
and a noun phrase includes a prepositional phrase,
I've already got something where I can
recursively go back and forth between them,
and I can make infinitely big sentences, right?
Yeah?
So, I could write something like,
"The cat by the large crate on the large table by the door."
Right. I can keep on going and make big sentences.
And I could say, well,
I don't have space to fit it on this slide,
but I've got an analysis of this according to my grammar,
where that noun phrase goes to a determiner, noun, and prepositional phrase;
the prepositional phrase goes to a preposition and a noun phrase;
this noun phrase goes to a determiner, adjective, noun, and prepositional phrase;
that goes to a preposition and another noun phrase;
and I keep on going and can produce big sentences.
Okay. That kind of thing then continues on,
because I can then start seeing more bits of grammar.
So, I could say, "Well,
I can now talk to the cat."
Um, and so if I wanna capture this talking to a cat here, well,
that now means I've got a verb,
because words like talk and walk are verbs.
And then talk to the cat,
it seems like after that,
it could become a prepositional phrase.
And so I could write another rule saying that a verb phrase
goes to a verb followed by a prepositional phrase.
And then I can make bigger sentences like that.
And I could look at more sentences of the language and start building up these,
these context-free grammar rules to describe the structure of the language.
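Those little context-free grammar rules are easy to play with in code. Here's a minimal sketch in Python (the exact rule set and the depth cap are my own illustration, not anything from the lecture or the assignment) that expands a symbol by randomly picking rewrite rules:

```python
import random

# A toy version of the grammar from the lecture, as rewrite rules.
# Everything here is illustrative; the lecture only writes rules on a slide.
GRAMMAR = {
    "NP": [["Det", "N"], ["Det", "Adj", "N"], ["Det", "N", "PP"]],
    "PP": [["P", "NP"]],
    "VP": [["V", "PP"]],
    "Det": [["the"], ["a"]],
    "Adj": [["large"], ["cuddly"], ["barking"]],
    "N": [["cat"], ["dog"], ["door"], ["crate"], ["table"]],
    "P": [["by"], ["in"], ["on"]],
    "V": [["talk"], ["walk"]],
}

def generate(symbol, depth=0, max_depth=4):
    """Expand a symbol by picking rules at random; cap the recursion depth
    so the NP -> PP -> NP loop doesn't run forever."""
    if symbol not in GRAMMAR:
        return [symbol]          # a terminal word
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the cap, prefer rules that don't recurse through NP or PP.
        rules = [r for r in rules
                 if all(s not in ("NP", "PP") for s in r)] or rules[:1]
    words = []
    for s in random.choice(rules):
        words.extend(generate(s, depth + 1, max_depth))
    return words

print(" ".join(generate("NP")))
```

Because NP can contain PP and PP contains NP, the generator really can recurse back and forth just as described above; the depth cap only keeps each sampled phrase finite.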
And that's part of what linguists do,
and different languages, um, have different structures.
So, um, for example,
in this little grammar I've had, and in general in English,
what you find is that prepositional phrases follow the verb.
But if you go to a different language like Chinese,
what you find is that the prepositional phrases come before the verb.
And so, we could say, okay,
there are different rules for Chinese,
and I could start writing a context-free grammar for them. Okay, beauty.
Um, so that's the idea of context-free grammars,
and actually, you know,
this is the dominant approach to linguistic structure
that you'll see if you go and do a linguistics class in the linguistics department;
people make these kinds of phrase structure grammar trees.
Um, but just to be contrary,
no, it's not actually just to be contrary,
it's because this alternative approach has been
very dominant in computational linguistics:
what I'm going to show you instead
is the viewpoint of dependency structure.
So, the idea of dependency structure
is rather than having these sort of phrasal categories,
like, noun phrases and prepositional phrases,
and things like that,
we are going to directly
represent the structure of sentences by saying
how words are arguments or modifiers of other words, in a recursive fashion.
Which is sort of another way of saying how they depend on other words.
So, we have a sentence,
"Look in the large crate in the kitchen by the door".
And if we want to, we can give these words word classes,
so we can still say this is a verb,
and this is a preposition,
and this is a determiner,
and this is an adjective,
and this is a noun.
But to represent the structure,
what we're going to say is, "Well,
look here is the root of this whole sentence."
So, that's where things start.
Um, and then, well, where are we going to look? In the large crate,
so that is a dependent of look.
And then the crate has got some modifiers:
it's a large crate,
so large is a dependent of crate;
it's the large crate,
so the is also a dependent of crate.
And in this system of dependencies I'm going to show you,
we've got in as kind of
a modifier of crate in the large crate;
I'll come back to that.
Well, but this crate has its own modification,
because it's a crate in the kitchen.
So, we have in the kitchen as a modifier of crate:
kitchen is a dependent of crate,
and in and the are dependents of kitchen.
And well, then we have this next bit, by the door.
And as I'll discuss in a minute, well,
what is by the door modifying?
It's still modifying the crate;
it's saying, "It's the crate by the door."
Okay. So, by the door is also a dependent of crate,
and then we've got the structure of dependencies coming off of it.
Okay. And so that's then
the structure you get, maybe drawn a little bit more
neatly, since I did that in advance, like this.
And so we call these things dependency structures.
And so crucially (sorry, I had two different examples [LAUGHTER]),
what we're doing is saying:
what words modify other words?
And so, that allows us to
understand how the different parts of the sentence relate to each other.
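One minimal way to store that kind of analysis in code, my own illustration rather than the lecture's notation, is a head array: each word records the index of its head, with index 0 standing for a dummy root. The head choices for the function words below are an assumption in the style of Universal Dependencies.

```python
# Dependency analysis of "Look in the large crate in the kitchen by the door",
# stored as a head index for each word (index 0 is a dummy ROOT).
# The arrows follow the analysis sketched in the lecture; the attachments
# of the little function words are my assumption.
sentence = ["ROOT", "Look", "in", "the", "large", "crate",
            "in", "the", "kitchen", "by", "the", "door"]
heads = [None, 0, 5, 5, 5, 1, 8, 8, 5, 11, 11, 5]
# e.g. heads[5] == 1 means "crate" depends on "Look";
#      heads[8] == 5 means "kitchen" depends on "crate" (a crate in the kitchen).

def dependents(head_idx):
    """All words whose head is the word at head_idx."""
    return [sentence[i] for i, h in enumerate(heads) if h == head_idx]

print(dependents(5))  # the words hanging off "crate"
```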
And so, overall, let me just say here:
you might wonder, why do we need sentence structure?
You know, the way language seems to work when you're talking to
your friends is that you just blab something,
and I understand what you're saying, and
what goes on beyond that
is sort of not really accessible to consciousness.
But well, to be able to have machines that interpret language correctly,
we sort of need to understand the structure of these sentences,
because unless we know what words are arguments and modifiers of other words,
we can't actually work out what sentences mean.
And I'll show some examples of that as to how things go wrong immediately,
because actually, a lot of the time there are
different possible interpretations you can have.
And so, in general,
our goal is, you know,
up until now we've sort of looked at the meaning of words, right?
We did word vectors,
and we found words that had similar meaning,
and things like that.
Um, and you can get somewhere in human languages with just saying words.
I mean, you can say "Hi",
and "friendly", um, and things like that,
but you can't get very far with just words, right?
The way human beings can express
complex ideas and explain and teach things to each other,
is you can put together words to express more complex meanings.
And then, you can do that over and over again
recursively to build up more and more complex meanings,
so that by the time you're reading the morning newspaper,
you know most sentences are sort of 20-30 words long,
and they're saying, um,
some complex meaning, like you know,
"Overnight Senate Republicans resolve that they would not do blah blah blah blah.''
And you understand that flawlessly,
by just sort of putting together those meanings of words.
And so, we need to be able to know what is connected to
what in order to be able to do that.
And one of the ways of seeing why that's important is asking,
"What can go wrong?"
Okay. So here is a newspaper headline:
"San Jose cop kills man with knife".
Um, now, this has two meanings,
and which one you get depends on
what modifies what.
So, what are the two meanings? Meaning one.
The cop stabs the guy. [LAUGHTER]
The cop stabs the guy.
Right. So, meaning one is the cop stabs the guy.
So, what we've got here is
the cops doing the killing.
So, this is what we'll say is the subject of kill,
the cops, and I'll just call them the San Jose cops here.
And well, there's what they kill:
the man is the object of killing.
Um, and then on this reading, it's
the cop using the knife to kill the person.
And so with a knife is then a
modifier, and if we're being fancy, we call it an instrumental
modifier, to say that the cop is killing the person with a knife.
That's one possible analysis.
Okay. Then there's a second meaning the sentence can have,
which is that the man has a knife.
So, um, in that case,
what we wanna say is, well,
this word man has a
noun modifier, which is sort of saying something that the man possesses;
this dependency is the same,
and it's a man with a knife.
Okay. And so, the interpretations of these sentences that you can get depend on putting
different structures over the sentences, in terms of what is modifying what.
Um, here is another one that's just like that one.
Um, scientists count whales from space.
[LAUGHTER] Okay.
So again, this sentence has two possible structures, right?
[LAUGHTER] So, we have that the scientists are the subject
doing the counting, and the whales are the object.
Um, and, well, one possibility is that this is how they're doing the counting,
um, so that they're counting the whales from space using something like a satellite.
Um, but the other possibility is that these parts are the same,
this is the subject,
and this is the object,
but these are whales from space, which, you know,
we could have analyzed as a noun phrase goes to
a noun and a PP in a constituency grammar.
But in dependency grammar we're saying, "Oh,
from space is now a modifier of the whales,
and they are whales from space,"
which are starting to turn up, as in the bottom example.
Right? So, obviously, what you want is that this one is correct and this one is wrong.
Um, and so this choice is referred to as a prepositional phrase attachment ambiguity,
and it's one of the most common ambiguities in the parsing of English, right?
So, here's our prepositional phrase from space.
And so in general,
when you have prepositional phrases and before it you have verbs,
and noun phrases, or nouns,
that the prepositional phrase can modify
either of the things that come beforehand, right?
And so this is a crucial way in which
human languages are different from programming languages, right?
In programming languages, we have hard rules
as to how you're meant to interpret things that dangle afterwards, right?
So, in programming languages,
you have an else is always construed with the closest if.
Well, if that's not what you want, um,
you have to use parentheses or indentation or something like that.
I guess, it's different in Python because you have to use indentation.
But if we think of something like C or a similar language, right?
Um, if you haven't used,
um, braces to indicate,
it's just deterministically, the else goes with the closest if.
Um, but that's not how human languages are.
In human languages,
this prepositional phrase can go with anything preceding,
and the hearer is assumed to be smart enough to work out the right one.
And, you know, that's actually a large part of why
human communication is so efficient, right?
Like, um, we can do such a good job at communicating with
each other because most of the time we don't have to say very much,
and there's this really smart person on the other end, um,
who can interpret the words that we say in the right way.
Um, so, if you want to have artificial intelligence and smart computers,
we then start to need to build language understanding devices that can also
work on that basis:
they can just decide what would be the right thing for from space to modify.
And if we have that working really well,
we can then apply it back to programming languages,
and you could just not put in any braces in your programming languages,
and the compiler would work out what you meant.
Um, okay. So, this is prepositional phrase attachment.
It seems maybe not that hard there,
but, you know, it gets worse. I mean,
this isn't as fun an example,
but it's a real example of a sentence from The Wall Street Journal, actually:
"The board approved this acquisition by Royal Trustco Limited of Toronto
for $27 a share at its monthly meeting."
Boring sentence, but, um,
what is the structure of this sentence?
Well, you know, we've got a verb here,
and we've got the subject just the same as before,
and this noun as the object coming after it.
But then what happens after that?
Well, here, we've got a prepositional phrase.
Here, we've got a prepositional phrase.
You've just got four prepositional phrases in a row.
And so, well, what we wanna
do is say, for each of these prepositional phrases, what they modify.
And starting off, there are only two choices:
the verb and the noun preceding it, as before.
But it's gonna get more complicated as we go in, because look,
there's another noun here,
and another noun here,
and another noun here.
Um, so once we start getting further in there'll be more possibilities.
Okay. So, let's see if we can,
um, work it out.
So, um, by Royal Trustco Limited, what's that modifying?
Right, you said acquisition.
So, it's not the board approved by Royal Trustco Limited,
it's an acquisition by Royal Trustco Limited.
Okay. So, this one is a dependent of the acquisition.
Okay. Um, now, we move on to of Toronto,
and we have three choices:
it could be this, this, or this.
Okay. So, of Toronto is modifying?
Acquisition. [NOISE]
Its acquisition of Toronto?
[LAUGHTER] No, I think that's a wrong answer.
Um. [LAUGHTER] Is there another guess for what of Toronto is modifying?
Royal Trustco.
Royal Trustco, right. So, it's Royal Trustco Limited of Toronto.
So, this of Toronto is a dependent of Royal Trustco Limited.
And Royal Trustco Limited,
right, that's again this sort of noun phrase,
so it can also have modifiers by prepositional phrases.
Okay. For $27 a share is modifying acquisition, right?
So now, we leap right back.
I'm drawing this wrong.
Now, we leap right back,
and it's now the acquisition that's being modified.
And then finally, we have at its monthly meeting is modifying?
[NOISE]
Approved.
Well, approved, right?
It's approved, yeah.
It's approved at its monthly meeting.
Okay. I drew that one the wrong way around with the arrow.
Sorry, it should have been this way;
I'm getting my arrows wrong.
Okay. So, we've got this pattern of how things are modifying.
And so actually, you know,
once you start having a lot of things that have choices like this,
if I want to put an analysis onto this sentence,
I have to work out the right structure,
and I have to potentially consider an exponential number of possible structures, because
I've got this situation where for the first prepositional phrase,
there were two places that could have modified.
For the second prepositional phrase,
there are three places that could have modified.
For the fourth one,
there are five places that could have modified.
That just sounds like a factorial.
It's not quite as bad as a factorial, because normally,
once you've leapt back, that kind of closes off the ones in the middle.
And so, further prepositional phrases have to be
at least as far back in terms of what they modify.
And so, if you get into the combinatorics of this, the number of analyses you get
when you have multiple prepositional phrases is the sequence called the Catalan numbers.
Ah, but that's still an exponential series,
and it's one that turns up in a lot of places where there are tree-like contexts.
So, if any of you are doing or have done CS228,
where you see triangulation of
probabilistic graphical models and you ask how many triangulations there are,
that's sort of like making a tree over your variables,
and that, again, gives you the Catalan series as the number of them.
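As a quick check on that counting, the Catalan numbers have the closed form C(n) = (2n choose n)/(n + 1), and a run of k prepositional phrases after a verb and its object noun gives C(k+1) attachment analyses (2 for one PP, 5 for two, matching the choices counted above). A small sketch:

```python
from math import comb

def catalan(n):
    """n-th Catalan number via the closed form C(n) = C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

# Number of attachment analyses for k prepositional phrases in a row
# after a verb and an object noun is the Catalan number C(k + 1):
for k in range(1, 5):
    print(k, "PPs ->", catalan(k + 1), "analyses")
# 1 PP -> 2, 2 PPs -> 5, 3 PPs -> 14, 4 PPs -> 42
```

So the growth really is exponential, even though it is slower than the naive factorial-style product of choices.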
Okay. But the point is,
we end up with a lot of ambiguities.
Okay. So, that's prepositional phrase attachments.
A lot of those going on.
They are far from the only kind of ambiguity.
So, I wanted to tell you about a few others.
Um, okay: "Shuttle veteran and longtime NASA executive Fred Gregory appointed to board".
Um, why is this sentence ambiguous?
What are the different readings of this statement?
[NOISE].
Yes?
Uh, it's a better [inaudible]
Okay. So, um, right answer.
So, yeah there are two possibilities, right?
That is either that there's somebody who's
a shuttle veteran and a long time NASA executive,
and their name is Fred Gregory,
and that they've been appointed to the board.
Um, or, um, the other possibility
is that there's a shuttle veteran and there's a long time NASA executive,
Fred Gregory, and both of them have been appointed to the board.
And so, again, we can start to indicate the structure of that using our dependencies.
So, we can either say, okay,
there's Fred Gregory, and then this person is
a shuttle veteran and longtime NASA executive.
Or we can say, well,
we're doing appointment of a veteran and of the longtime NASA executive, Fred Gregory.
And so, we can represent these two different structures by dependencies.
Okay. Um, that's one.
Um, that one is not very funny,
so here's a funnier example that illustrates the same ambiguity effectively.
Um, so, here's the president's first physical.
Doctor: "No heart, cognitive issues".
[LAUGHTER] Um, so, there isn't actually an explicit
coordination word here.
But effectively, in natural language, or certainly in English,
you can use just a comma, a sort of list intonation,
to effectively act as if it was an "and" or an "or", right?
So, here, um, we have again two possibilities that either we have
issues and the dep- and the dependencies
of- the dependencies of issues is that there are no issues.
So, that's actually a determiner, ah, no issues.
Um, and then it's sort of like no heart or cognitive issues.
So, heart is another dependent.
It's sort of a non-compound heart issues.
And so, we refer to that as an independency,
and then it's heart or, um, cognitive.
Um, so that heart or cognitive is
a conjoined phrase inside of this "No heart" or "Cognitive issues".
But there's another possibility,
um, which is
that the coordination is at the top level, that we have "no heart" and "cognitive issues".
And, um, at that point,
we have cognitive as an adjectival modifier of issues, and in "no heart",
the determiner no is just a modifier of heart,
and then these are being conjoined together.
So, um, heart has a coordinated dependency with issues.
Okay. That's one.
Um, I've got more funny ones.
"Susan gets-" [LAUGHTER] Okay.
So, what the person [LAUGHTER] who wrote this intended
is that here we've got an adjectival modifier ambiguity.
So, the intended reading was
that first is an adjectival modifier of hand, and it's firsthand experience.
Um, so, the first hand is a modifier of
experience, and job is also a modifier of experience.
And then we have the same kind of subject,
object reading on that one.
Um, but unfortunately, um, this sentence
has a different reading,
where you change the modification relationships,
and you have it's the first experience, and it goes like this. Um. [LAUGHTER] Okay.
One more example.
Um, "Mutilated body washes up on Rio beach to be used for Olympics beach volleyball."
[LAUGHTER] What are the two readings that you can get for this one?
[NOISE]
We've got this big phrase, to be used for Olympics beach volleyball,
that I want to try and put a structure over.
Um, and then, you know,
this is sort of like a prepositional phrase attachment ambiguity,
but this time, instead of a prepositional phrase being attached,
we've now got this big verb phrase, as we call it, right,
so when you've got most of a sentence but without any subject to it,
that's sort of a verb phrase. To be used for
Olympics beach volleyball is in the infinitive form;
sometimes it's in participial form, like being used for beach volleyball.
And really, those kinds of verb phrases are sort of just like prepositional phrases:
whenever they appear towards the right end of sentences,
they can modify various things, like verbs or nouns.
Um, so, here, we have two possibilities
for this to be used for Olympics beach volleyball.
Um, what the right answer is meant to be is that it is a dependent of the Rio beach.
So, it's a modifier of the Rio beach.
Um, but the funny reading is
that, instead of that,
we have here another noun phrase, mutilated body,
and it's the mutilated body that's going to be used.
Um, and so then this would be
a noun phrase modifier of that.
Okay. Um, so knowing the right structure of sentences is
important to understand the interpretations you're
meant to get and the interpretations you're not meant to get.
Okay. But, you know,
I was using funny examples for the obvious reason;
this is sort of essential to all the things that
we'd like to get out of language most of the time.
So, you know, this is back to the kind of
boring stuff that we often work with: reading through
biomedical research articles and trying to extract facts
about protein-protein interactions from them, or something like that.
So, here:
"The results demonstrated that KaiC interacts rhythmically with SasA, KaiA, and KaiB."
Um, and well, I've turned the notifications off.
Um, so, if we wanna get out protein-protein interaction
facts, you know, well,
we have this KaiC that's interacting with these other proteins over there.
And well, the way we can do that is by looking at patterns in our dependency analysis,
so that we can see this repeated pattern where you have
the noun subject here, interacts, with a noun modifier,
and then it's the things beneath that, SasA
and its conjoined things KaiA and KaiB, that are the things it interacts with.
So, we can kind of think of these two things as essentially patterns.
[NOISE] I actually mis-edited this.
Sorry. This should also be nmod:with.
Um, we can kind of think of
these two things as sort of patterns in
dependencies that we could look for to find examples of
protein-protein interactions that appear in biomedical text.
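A minimal sketch of that pattern-matching idea over typed dependency triples; the triples and the exact labels below are my reconstruction from the slide, not the output of a real parser:

```python
# Typed dependencies for "KaiC interacts rhythmically with SasA, KaiA and KaiB",
# as (head, relation, dependent) triples. Labels loosely follow Universal
# Dependencies; the exact triples are my reconstruction from the slide.
deps = [
    ("interacts", "nsubj", "KaiC"),
    ("interacts", "advmod", "rhythmically"),
    ("interacts", "nmod:with", "SasA"),
    ("SasA", "conj:and", "KaiA"),
    ("SasA", "conj:and", "KaiB"),
]

def interactions(deps):
    """Find (protein, protein) pairs via the pattern:
    X <-nsubj- interacts -nmod:with-> Y, plus Y's conjuncts."""
    subjects = [d for h, r, d in deps if h == "interacts" and r == "nsubj"]
    partners = [d for h, r, d in deps if h == "interacts" and r == "nmod:with"]
    # Nouns conjoined with an nmod:with partner also count as partners.
    partners += [d for h, r, d in deps
                 if r.startswith("conj") and h in partners]
    return [(s, p) for s in subjects for p in partners]

print(interactions(deps))
# [('KaiC', 'SasA'), ('KaiC', 'KaiA'), ('KaiC', 'KaiB')]
```

Real relation-extraction systems are more elaborate, but this is the shape of the pattern the slide is pointing at.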
Okay. Um, so that's the general idea of what we wanna do,
and the tool we want to do it with is these dependency grammars.
And so, I've shown you some dependency structures;
I just want to motivate dependency grammar a bit more
formally and fully, right?
So, dependency grammar
postulates that what syntactic structure is, is
relations between lexical items:
binary asymmetric relations, which we draw as arrows
because they are binary and asymmetric,
and which we call dependencies.
And there are two common ways of writing them,
and I've sort of shown both now.
One way is you put the words in a line,
so you see the whole sentence,
and you draw these sort of loopy arrows above them.
The other way is you represent it more as a tree,
where you put the head of the whole sentence, submitted, at the top,
and then you say the dependents of submitted,
which are bills, were, and Brownback,
and then you say the dependents of each of those.
Um, so, it was bills on ports and immigration,
so ports and immigration are dependents of bills,
and bills and were are dependents of submitted,
and you're getting this kind of tree structure.
Okay. Um, so, in addition to the arrows, commonly what we do is
put a type on each arrow which says what grammatical relation holds between the two words.
So, is this the subject of the sentence?
Is it the object of the verb?
Is it a conjunct, and things like that?
We have a system of dependency labels.
Um, so, for the assignment,
what we're gonna do is use Universal Dependencies,
which I'll show you a little bit more of in a minute.
And if you think,
"Man, this stuff is fascinating.
I wanna learn all about these linguistic structures,"
there's a Universal Dependencies site
that you can go off and look at and learn all about them.
But, if you don't think that's fascinating, um,
for what we're doing for this class,
we're never gonna make use of these labels.
All we're doing is making use of the arrows.
And for the arrows,
you should be able to interpret things like prepositional phrases as to what they're
modifying just in terms of where
the prepositional phrases are connected and whether that's right or wrong.
Okay. Yes. So formally,
when we have this kind of Dependency Grammar,
we're sort of drawing these arrows, and we sort of refer to
the thing at the start of the arrow as the head of a dependency,
and the thing at the end of the arrow as the dependent of the dependency.
And as in these examples, our normal expectation,
and what our parsers are gonna do, is that the dependencies form a tree.
So, it's a connected, acyclic,
single-rooted graph at the end of the day.
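To make that tree condition concrete, here's a minimal Python sketch (not from the lecture; the head-index encoding is just one common convention, with word positions 1-indexed and 0 for the fake root):

```python
def is_tree(heads):
    """Check that head indices (1-indexed words, head 0 = root) form a
    connected, acyclic, single-rooted dependency tree."""
    n = len(heads)
    # exactly one word may attach directly to the root
    if sum(1 for h in heads if h == 0) != 1:
        return False
    # from every word, following heads must reach the root without cycling
    for i in range(1, n + 1):
        seen = set()
        node = i
        while node != 0:
            if node in seen:          # cycle detected
                return False
            seen.add(node)
            node = heads[node - 1]    # move to this word's head
    return True

# "I ate fish": ate (word 2) is the root; I and fish depend on ate
assert is_tree([2, 0, 2])
```

A head array with two root attachments, or one where following heads loops, fails the check.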
Okay. So, Dependency Grammar has an enormously long history.
So, basically, the first famous linguist that human beings know about is Panini, who,
um, wrote in the fifth century before the Common Era
and tried to describe the structure of Sanskrit.
And a lot of what Panini did was working out things about all of
the morphology of Sanskrit that I'm not gonna touch at the moment.
But beyond that, he started trying to describe the structure of Sanskrit sentences.
And, um, the notation was sort of different but, essentially,
the mechanism he used for describing the structure of
Sanskrit was dependencies of sort of working out these,
um, what are arguments and modifiers of what, relationships like we've been looking at.
And indeed, if you look at kind of the history of humankind, um,
most of attempts to understand the structure of
human languages are essentially Dependency Grammars.
Um, so, sort of in the later parts of the first millennium,
there was a ton of work by Arabic grammarians and essentially what
they used is also kind of basically a Dependency Grammar.
Um, so compared to that, you know,
the idea of context-free grammars and
phrase structure grammars is incredibly incredibly new.
I mean, you can basically, um, totally date it.
There was this guy Wells in 1947 who first proposed
this idea of having these constituents and phrase structure grammars,
and where it then became really famous is through the work of Chomsky, um,
which love him or hate him is by far the most famous, um,
linguist and also variously contributed to Computer Science.
Who's heard of the Chomsky hierarchy?
Do people remember that from 103?
Yeah. Okay, the Chomsky hierarchy,
the Chomsky hierarchy was not invented to torture beginning computer science students.
The Chomsky hierarchy was invented because Chomsky wanted to make
arguments as to what the complexity of human languages was, um.
Okay. Yeah. So, in modern work,
uh, there's this guy Lucien Tesnière.
Um, and he sort of formalized
the kind of version of dependency grammar that I've been showing you.
So, um, we sort of often talk about his work.
And you know, it's- it's long been influential in computational linguistics.
Some of the earliest parsing work in
US Computational Linguistics was dependency grammars.
But I won't go on about that um more now.
Okay. Um, just one,
two little things um, to note.
I mean, if you somehow start looking at other papers where their dependency grammars,
people aren't consistent on which way to have the arrows point.
There's sort of two ways of thinking about this um,
that you can either think okay,
I'm gonna start at the head and point to the dependent.
Or you can say I'm going to start at the dependent and say what its head is,
and you find both of them.
Uh, the way we're gonna do it in this class is to do it the way Tesnière did it,
which was he started at the head and pointed to the dependent.
Uh, sorry. I'm drawing that wrong.
Whoops, um because discussion of the outstanding issues.
So, really um, the dependent is sort of discussion.
Um, okay. We go from heads to dependents.
And usually, it's convenient, in addition to the sentence, to
sort of have a fake root node that points to the head of the whole sentence.
So, we use that as well.
Okay. Um, so to build a dependency parser, or indeed to build
any kind of human language structure
finder, including kind of constituency grammar parsers,
the central tool in recent work,
where recent work kind of means the last 25 years has been this idea of tree banks.
Um, and the idea of tree banks is to say we are going to get
human beings to sit around and [NOISE] put grammatical structures over sentences.
So, here are some examples I'm showing you from
Universal Dependencies where here are some um, English sentences.
I think Miramar was a famous goat trainer or something.
And some human being has sat and put
a dependency structure over this sentence and all the rest.
Um, and with the name Universal Dependencies,
this is just an aside.
Um, Universal Dependencies is actually a project I've been strongly involved with.
But precisely what the goal of universal dependencies
was is to say what we'd like to do is have
a uniform parallel system of
dependency description which could be used for any human language.
So, if you go to the Universal Dependencies website,
it's not only about English.
You can find Universal Dependency analyses of you know, French,
or German, or Finnish,
or Kazakh, or Indonesian,
um, lots of languages.
Of course, there are um, even more languages
which there aren't Universal Dependencies analyses of.
So, if you have a- a big calling to say I'm gonna
build a Swahili Universal Dependencies um,
treebank, um, you can get in touch.
Um, but anyway.
So, this is the idea of treebank.
You know, historically, treebanks weren't something that people thought of immediately.
It's an idea that took quite a long time to develop, right?
That um, people started thinking about grammars
of languages even in modern times in the fifties,
and people started building parsers for languages in the early 1960s.
So, there was decades of work in the 60s,
70s, 80s, and no one had tree banks.
The way people did this work is that they wrote grammars,
grammars like the one I showed for constituency of
noun phrase goes to determiner, optional adjective, noun,
noun goes to goat, um,
or the equivalent kind of grammars in a dependency format,
and they hand-built these grammars and then
had parsers that could parse sentences with those grammars.
Going into things, having a human being write a grammar feels more efficient.
Because if you write uh,
a rule like noun phrase goes to determiner optional adjective noun.
I mean, that- that describes
a huge number of phrases or actually infinite number of phrases.
Um, so that you know,
this is the structure of you know, the cat, the dog,
or cat or dog, or large dog all those things we saw at the beginning.
So, it's really efficient you're capturing lots of stuff with one rule.
Um, but it sort of turned out that in practice that wasn't such a good idea,
and it turned out to be much better to have
these kind of treebank supporting structures over sentences.
It's a bit more subtle as to why that
is, because it sounds like pretty menial work, um,
building treebanks, and in some sense it is.
Um, but you know,
it turns out to be much more useful.
I mean, so one huge benefit is that treebanks are very reusable.
Effectively what happened in the 60s, 70s,
and 80s was that, you know,
everyone who started building a parser invented
their own notation for grammar rules, which got more and more complex,
and it was only used by their parser and nobody else's parser.
So, there was no sharing and reuse of the work that was done by human beings.
Well, once you have a treebank,
it's reusable for all sorts of purposes; lots of people build parsers from it.
But also other people use it as well, like linguists now often use
treebanks to find examples of different constructions.
Um, but beyond that,
this sort of just became necessary once we wanted to do machine learning.
So that if we want to do machine learning,
we want to have data that we can build models on.
In particular, a lot of what
our machine learning models exploit is how common different structures are.
So, we want to know about the commonness and the frequency of things.
Um, but then treebanks gave us another big thing which is,
well, lots of sentences are ambiguous,
and what we want to do is build models that find the right structure for sentences.
If all you do is have a grammar you have no way of
telling what is the right structure for ambiguous sentences.
All you can do is say hey that sentence with
four prepositional phrases after it that I showed you earlier,
it has 14 different parses.
Let me show you all of them.
Um, but once you have um,
treebank examples, you can say this is the right structure for this sentence in context.
So, you should be building a machine learning model which will recover that structure,
and if you don't that you're wrong.
[NOISE]. Okay. Um, so that's treebanks.
Um, so how are we gonna build dependency parsers?
Well, somehow we want models that can kind of capture what's the right parse.
Just thinking about abstractly, you know,
there's sort of different things that we can pay attention to.
So, one thing that we can pay attention to is the sort of actual words, right?
Discussion of issues.
That's a reasonable thing.
So, it's reasonable to have issues as a dependent of discussion, um,
whereas, you know, discussion of outstanding.
That sounds weird.
So, you probably don't want that dependency.
Um, there's a question of how far apart words are.
Most dependencies are fairly short distance.
Though not all of them are.
There's a question of what's in between.
Um, if there's a semicolon in between,
there probably isn't a dependency across that.
Um, and the other issue is sort of how many arguments do things take?
So, here we have was completed.
If you see the words was completed,
you sort of expect that there'll be a subject before it, the something that was completed,
and it would be wrong if there wasn't.
So, you're expecting an argument on that side.
But on the other hand, it won't have an object after it.
You won't say the discussion was completed the goat.
Um, that's not a good sentence, right?
So, you won't have ah, um, an object after it.
So, there's sort of information of that sort,
and we want to have our dependency parsers be able to make use of that structure.
[NOISE] Okay.
Um, so effectively what we do when we build a dependency parser is going to say,
for each word, it is going to be a dependent of some other word or of the root.
So, this give here is actually the head of the sentence.
So, it's a dependent of root,
the talk is a dependent of give,
'll is a dependent of talk.
And so, for each word, we want to choose what it is
a dependent of, and we want to do it in such a way that the dependencies form a tree.
So that means it would be a bad idea if we made a cycle.
So, if we sort of said, Bootstrapping, um,
was a dependent of, um, talk,
um, but then we had things sort of move around.
So, this goes to here,
but then talk is a dependent of that,
and so I've got a cycle, and that's bad news,
we don't want cycles, we want a tree.
And there's one final issue,
um, which is whether we want
to allow dependencies to cross or not,
um, and here's an example of this.
So, most of the time, um,
dependencies don't cross each other.
Uh, but sometimes they do,
and this example here is actually an instance for that.
So, I'll give a talk tomorrow, um, on bootstrapping.
So, we're giving a talk that's the object,
and when it's being given is tomorrow,
but this talk has a modifier that's on bootstrapping.
So, we actually have another dependency here that crosses, um, that dependency.
And that's sort of rare,
that doesn't happen a ton in English,
but it happens sometimes in some structures like that.
And so, there's this question of whether dependencies can cross, and, um,
what we say is that the parse of a sentence is projective if there are
no crossing dependencies, and it's non-projective if there are crossing dependencies,
and most of the time, English is projective in its
parses of sentences, but occasionally not.
And when it's not is when you kind of have
these constituents that are delayed to the end of the sentence, right?
You could've said, I'll give a talk on bootstrapping tomorrow,
and then I'd have a projective parse, but if you want to,
you can kind of delay that extra modifier and say I'll give a talk
tomorrow on bootstrapping and then the parse becomes non-projective.
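As a small illustration (a sketch, not the lecture's code), projectivity can be checked by testing each pair of arcs for crossing; the word positions below for "I 'll give a talk tomorrow on bootstrapping" are my own numbering:

```python
def is_projective(arcs):
    """Return True if no two dependency arcs cross.
    arcs: (head, dependent) pairs over 1-indexed word positions."""
    spans = [(min(h, d), max(h, d)) for h, d in arcs]
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            (l1, r1), (l2, r2) = spans[i], spans[j]
            # two arcs cross when exactly one endpoint of one arc lies
            # strictly inside the other arc's span
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

# I(1) 'll(2) give(3) a(4) talk(5) tomorrow(6) on(7) bootstrapping(8):
# talk -> on(7) crosses give -> tomorrow(6)
crossing = [(3, 5), (3, 6), (5, 7)]
# ...whereas "a talk on bootstrapping tomorrow" nests instead of crossing
nested = [(3, 5), (5, 6), (3, 7)]
```

The crossing set comes out non-projective and the nested set projective, matching the two orderings in the example.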
Um, okay.
So, that's that.
Um, there are various ways of,
um, doing dependency parsing,
but basically what I am gonna tell you about today is this one called
transition-based or deterministic dependency parsing,
and this is, um,
the one that's just been enormously influential in practical deployments of parsing.
So, when Google goes off and parses every web page,
what they're using is a transition based parser.
Um, and so, this was a notion of parsing that, um,
was mainly popularized by this guy,
Joakim Nivre; he is a Swedish computational linguist.
Um, and what you do it's- it's sort of inspired by shift-reduce parsing.
So, probably in- in our CS103 or compilers class or something,
you saw a little bit of shift-reduce parsing.
And this is sort of like a shift-reduce parser,
apart from when we reduce,
we build dependencies instead of constituents.
Um, and there's a very technical description that
doesn't help you at all in terms of understanding what,
um, a shift-reduce parser does.
And here's a formal description of a
transition-based shift-reduce parser and which also doesn't help you at all.
Um, so, instead we kinda look at this example,
uh, [LAUGHTER] because that will hopefully help you.
So, what I wanna do is parse the sentence "I ate fish".
And formally, what I have is a way I start,
there are three actions I can take, and I have
a finished condition for a full parse.
Um, and so here's what I do.
So, I have a stack which is on this side and I have a buffer.
Um, so, the stack is what I have built,
and the buffer is all the words in the sentence I haven't dealt with yet.
So, I start the parse,
and that's the sort of instruction here, by putting root,
my root for my whole sentence, onto my stack,
and my buffer is the whole sentence,
and I haven't found any dependencies yet.
Okay, and so then,
the actions I can take is to shift things onto the stack
or to do the equivalent of a Reduce where I build dependencies.
So, starting off, um,
I can't build a dependency because I only have root on the stack,
so the only thing I can do is shift,
so I can shift I onto the stack.
Um, now, I could at this point say,
let's build a dependency,
I is a dependent of root,
but that would be the wrong analysis,
because really the head of this sentence is ate.
So, I'm a clever boy and I shift again.
And now I have root I ate on the stack.
Okay, and so, at this point,
I'm in a position where,
hey, what I'm gonna do is reductions that build structure, because look,
I have I ate here and I want to be able to say
that I is the subject dependent of ate,
and I will do that by,
um, by doing a reduction.
And so, what I'm gonna do is the left-arc reduction, which says, look,
I'm gonna treat the second from top thing on the stack
as a dependent of the thing that's on top of the stack.
And so, I do that,
and so, when I do that,
I make the second-from-top thing a subject dependent of ate,
and I leave the head, ate, on the stack,
but I sort of add this dependency to the other dependencies I've built.
Okay, um, so, I do that.
Um, now, I could immediately reduce again and say ate is a dependent of root,
but my sentence's actually I ate fish.
So, what I want to do is say, "Oh,
there's still fish on the buffer," so what I should first do is shift again,
have root ate fish on my stack,
and then I'll be able to say, Look,
I want to now build, um,
the thing on the top of this stack as
a right dependent of the thing that's second from top of the stack,
and so that's referred to as a Right-Arc move,
and so, I say Right Arc, and so,
I do a reduction where I've generated
a new dependency and I take the two things that are on top of the stack and say,
um, fish is a dependent of ate,
and so therefore, I just keep the head.
I always just keep the head on the stack, and I generate this new arc.
And so, at this point,
I'm in the same position. I want to say that this ate is a right dependent of my root,
and so, I'm again going to do Right Arc,
um, and make this extra dependency here.
Okay. So, then my finished condition of having
successfully parsed the sentence is my buffer is
empty and I just have root left on my stack because that's what I sort of said back here,
that was, buffer is empty as my finished condition.
Okay. So, I've parsed the sentence.
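The whole walk-through can be replayed in a few lines of Python. This is just a sketch of the transition mechanics, with the action sequence supplied by hand (the oracle role the lecturer was playing), not the classifier-driven parser itself:

```python
def parse(words, oracle_actions):
    """Run shift / left-arc / right-arc transitions over a sentence.
    Words are 1-indexed; 0 is the fake ROOT. Returns the final stack,
    buffer, and the (head, dependent) arcs that were built."""
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), []
    for action in oracle_actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # second-from-top becomes a dependent of the top; keep the head
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":
            # top becomes a dependent of second-from-top; keep the head
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return stack, buffer, arcs

# The lecture's walk-through for "I ate fish"
words = ["I", "ate", "fish"]
stack, buffer, arcs = parse(
    words, ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"])
# finished: buffer empty, only ROOT left on the stack
```

Running it reproduces the three arcs from the walk-through: I and fish depend on ate, and ate depends on ROOT.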
So that worked well but, you know,
I actually had different choices of when to pa- when to shift and when to reduce.
And I just miraculously made the right choice at each point.
And well, one thing you could do at this point is say, well,
you could have explored every choice and,
um, seen what happened and gotten different parses.
And I could have,
but if that's what I'd done,
I would've explored this exponential-size tree of different possible parses.
And if that was what I was doing,
I wouldn't be able to parse efficiently.
And indeed that's not what people did in the 60s, 70s and 80s.
Uh, clever people in the 60s said,
uh, rather than doing a crummy search here,
we can come up with clever dynamic programming algorithms and you
can relatively efficiently explore the space of all possible parses.
Uh, and that was sort of the mainstay of parsing in those decades.
But when Joakim Nivre came along,
he said "Yeah, that's true, um, but hey,
I've got a clever idea, uh,
because now it's the 2000s and I know machine learning."
Um, so, what I could do instead,
is say I'm at a particular position in the parse and I'm gonna build
a machine learning classifier and that machine learning
classifier is gonna tell me the next thing to do.
It's gonna tell me whether to shift,
um, with left arc or right arc.
So, if we're just talking about, well,
how to build the arrows,
there are just three actions:
shift, left arc, or right arc.
Um, if we also wanted to put labels on the dependencies,
and we have R different labels, um,
there are then sort of 2R plus 1 actions, because each is
sort of left arc subject or left arc object or something like that.
But anyway, there's a set of actions and so you gonna build
a classifier with machine learning somehow which will predict
the right action and Joakim Nivre showed the sort of slightly surprising fact
that actually you could predict the correct action to take with high accuracy.
So, um, in the simplest version of this,
um, there's absolutely no search.
You just run a classifier at each step and it
says "What you should do next is shift" and you shift,
and then it says "What you should do is left arc" and you left arc
and you run that through and he proved, no,
he showed empirically, that even doing that,
you could parse sentences with high accuracy.
Now if you wanna do some searching around,
you can do a bit better,
but it's not necessary.
Um, and we're not gonna do it for our, um, assignment.
But so if you're doing this just sort of run classify,
predict action, run classify, predict action,
we then get this wonderful result which
you're meant to explain a bit on your assignment 3,
is that what we've built is a linear time parser.
Right? That's because, as we chug through a sentence,
we're only doing a constant amount of work for
each word, and that was sort of an enormous breakthrough.
Because although people in the 60s had come
up with these dynamic programming algorithms,
dynamic programming algorithms for sentences were always cubic or worse.
And that's not very good if you want to parse the whole web,
whereas if you have something that's linear time,
that's really getting you places.
Okay. So this is the conventional way in which this was done.
Was, you know, we have a stack,
we might have already built some structure,
having worked out something's a dependent of something.
We have a buffer of words that we haven't dealt with yet, and we want to predict the next action.
So the conventional way to do this is to say well,
we want to have features.
And well, the kind of features you wanted were
usually some kind of conjunction of multiple things, so
that if the top word of the stack is good,
um, and something else is true, right,
that the second top word of the stack is has,
and its part of speech is verb,
then maybe that's an indicator to do some action.
So you had these very complex binary indicator features,
and you'd literally have millions of
these binary indicator features, and you'd feed them into
some big logistic regression or
support vector machine or something like that, and you would build parsers.
And these parsers worked pretty well.
Um, but you sort of had these sort of very complex hand engineered binary features.
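To give a flavor of those hand-engineered templates, here's a tiny sketch (the feature names and templates are hypothetical; real systems had millions of conjunctions like these):

```python
def indicator_features(config):
    """Sketch of conventional sparse feature templates: each feature is a
    conjunction of configuration tests, written out as a string key that
    would index into a huge binary feature vector."""
    s, b = config["stack"], config["buffer"]    # lists of (word, pos) pairs
    feats = []
    if s:
        feats.append(f"s1.w={s[-1][0]}")                    # top-of-stack word
        feats.append(f"s1.p={s[-1][1]}")                    # its part of speech
    if len(s) >= 2:
        feats.append(f"s2.p={s[-2][1]}+s1.w={s[-1][0]}")    # conjunction template
    if b:
        feats.append(f"b1.w={b[0][0]}")                     # first buffer word
    return feats

# a made-up configuration: "has" below "good" on the stack
config = {"stack": [("has", "VBZ"), ("good", "JJ")],
          "buffer": [("control", "NN")]}
feats = indicator_features(config)
```

Each returned string stands for one binary feature that fires for this configuration; everything else in the million-dimensional feature vector stays zero, which is exactly the sparseness the lecture is about to criticize.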
Um, so in the last bit of lecture I want to show you what people have done in the,
um, neural dependency parsing world.
But before I do that,
let me just explain how you,
um, how you evaluate, um, dependency parses.
And that's actually very simple, right?
So, what you do is well,
you assume because the human wrote it down,
that there is a correct dependency parse for a sentence.
She saw the video lecture like this.
And so these are the correct arcs and to evaluate our dependency parser,
we're simply gonna say,
uh, which arcs are correct.
So, there are the gold arcs,
so there's a gold arc,
um, from two to one,
She saw subject, and there's a gold arc from zero to two,
the root of the sentence,
these the gold arcs.
Um, if we generate a parse,
we're gonna propose some arcs as to what is the head of each word.
And we're simply going to count up how many of them are correct,
treating each arc individually.
And there are two ways we can do that.
We can either, as we're going to do,
ignore the labels and that's then,
uh, referred to as the unlabeled attachment score.
So here in my example, my dependency parse,
I've got most of the arcs right, but it got this one wrong.
So I say my unlabeled attachment score is 80 percent. Or we can also
look at the labels, and then my parser wasn't very good at getting the labels right,
so I'm only getting 40 percent.
And so we can just count up the number of dependencies and how many we get correct.
And that's then our accuracy, and in the assignment,
you're meant to build a dependency parser with a certain accuracy.
I forget the number now;
it's some number, 80-something, that you're meant to get to.
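The scoring itself is just counting. Here's a minimal sketch, with a made-up five-word example chosen so the numbers come out as in the 80 percent / 40 percent illustration above:

```python
def attachment_scores(gold, pred):
    """Compute UAS and LAS from per-word (head, label) pairs.
    gold and pred are lists aligned by word position."""
    assert len(gold) == len(pred)
    # UAS: only the head index has to match
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    # LAS: both the head index and the dependency label have to match
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las

# hypothetical 5-word sentence: 4/5 heads right, but only 2/5 labels right
gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (5, "amod"), (4, "amod"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)  # 0.8 and 0.4
```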
Okay. Um, maybe I'll skip that.
Okay. Um, so, now I wanted to sort of explain to you just a bit
about neural dependency parses and why they are motivated.
So I'd mentioned to you already that the conventional model, uh,
had these sort of indicator features of, um,
on the top of the stack is the word good and the second thing on
the stack is the verb has, or on
the top of the stack is some other word and the second top is of some part of speech.
And that part of speech has already been
joined with the dependency of another part of speech.
People hand-engineer these features.
And the problems with that,
was these features were very sparse.
Each of these features matches very few things.
Um, they match some configurations but not others so the features tend to be incomplete.
Um, and there are a lot of them,
they're are commonly millions of features.
And so it turned out that actually computing
these features was just expensive so that you had some configuration on
your stack and the buffer and then you wanted to know which of
these features were active for that stack and buffer configuration.
And so you had to compute features from it.
And it turned out that
conventional dependency parsers spent most of their time computing features
that then went into the machine learning model, rather than doing the sort of shifting
and reducing, which, as you're seeing, are just pure parser operations.
And so that seemed like it left open the possibility that, well,
what if we could get rid of all of this stuff and we could run
a neural network directly on the stack and buffer configuration,
then maybe that would allow us to build a dependency parser which was
faster and suffer less from issues of sparseness than the conventional dependency parser.
And so that was a project that Danqi Chen and I tried to do in 2014,
uh, where we built a neural dependency parser.
And, you know, effectively what we found,
is that that's exactly what you could do.
So, here's sort of a few stats here.
So these are these same UAS and LAS.
Uh, so MaltParser was Joakim Nivre's parser that I sort of,
uh, was showing before.
And it got, um,
a UAS on this data of 89.8.
But everybody loved it.
And the reason they loved it is it could parse at 469 sentences a second.
There had been other people that have worked out
different more complex ways
of doing parsing with so-called graph-based dependency parsers.
So this is another famous dependency parser from the 90s.
So it was actually, you know,
a bit more accurate but it was a bit more
accurate at the cost of being two orders of magnitude slower.
And, you know, people have worked on top of that.
So, here is an even more complex graph-based parser, uh,
from the 2000s and well, you know,
it's a little bit more accurate again but it's gotten even slower.
Um, okay.
So, what we were able to show is that using the idea of instead using
a neural network to make the decisions of a Joakim Nivre-style shift-reduce parser,
we could produce something that was almost
as accurate as the very best parsers available at that time.
I mean, strictly we won over here and we are a fraction behind on UAS.
Um, but, you know,
it was not only just as fast as Nivre's parser,
it was actually faster than Nivre's parser,
because we didn't have to spend as much time on feature computation.
And that's actually almost a surprising result, right?
It's not that we didn't have to do anything.
We had to do matrix multiplies in our neural network,
but it turned out, um,
you could do the matrix multiplies more quickly than
the feature computation that he was doing even though at the end of the day,
it was sort of looking at weights that went into a support vector machine.
So that was kind of cool.
And so the secret was we're gonna make use of
distributed representations like we've already seen for words.
So for each word,
we're going to represent it as a word embedding,
like we've already seen.
And in particular, um,
we are gonna make use of word vectors
and use them as the starting representations of words in our parser.
But well, if we're interested in distributed representations,
it seemed to us like maybe you shouldn't only have distributed representations of words.
Um, maybe it would also be good to have distributed representations of other things.
So we had parts of speech like,
you know, nouns and verbs and adjectives and so on.
Well some of those parts of speech have more to do with each other than others.
I mean, [NOISE] in particular, um,
most NLP work uses fine-grained parts of speech.
So you don't only have a part of speech like noun or verb,
you have parts of speech like singular noun versus
plural noun and you have different parts of speech for, you know,
work, works, working, kind of the different forms of
verbs are given different parts of speech, um, as well.
So there are sort of sets of part-of-speech labels that kind of cluster.
So maybe we could have distributed representations of
parts of speech that represent their similarity.
Why not? Um, well if we're gonna do that,
why not just keep on going and say the dependency labels.
They also, um, have a distributed representation.
And so, we built a representation that did that.
So the idea is that we have in our stack,
the sort of the top positions of the stack,
the first positions of the buffer and for each of those positions,
we have a word and a part of speech and if we've already built structure as here,
we kind of know about a dependency that's already been built.
And so we've got a triple for each position and we're gonna convert
all of those into a distributed representation,
um, which we are learning and we're gonna use those distributed representations,
um, to build our parser.
Okay. Now, so,
you know, starting from the next lecture forward,
we're gonna sort of start using more complex forms of neural models.
But for this model, um,
we did it in a sort of a very simple straightforward way.
We said, well, we could just use exactly the same model,
exactly the same parser structure that Nivre used, right?
Doing those shifts and left arcs and right arcs.
Um, the only part we're gonna turn into
a neural network is we're gonna have the decision of what to do next,
um, being controlled by our neural network.
So our neural network is
just a very simple classifier of the kind that we are talking about last week.
So based on the configuration,
we create an input layer, which means we're sort
of taking the stuff in these boxes and looking up
a vector representation for each one and concatenating them together to produce
an input representation that's sort of similar to when we were making
those window classifiers and concatenated a bunch of stuff together.
So that gives us our input layer.
[NOISE] Um, so from there,
we put things through a hidden layer just like last week.
We do Wx plus b and then put it through a ReLU non-linearity to get a hidden layer.
And then on top of that,
we're simply gonna stick a softmax output layer.
So multiplying by another matrix,
adding another, um, bias term,
and then that goes into the softmax which is gonna give
a probability over our actions as to whether it's shift left arc or right arc,
or the corresponding one with labels.
And then we're gonna use the same kind of cross entropy loss to say how good a job did we
do at guessing the action that we should have
taken according to the tree bank parse of the sentence.
And so each step of the shift-reduce parser,
we're making a decision as what to do next and we're doing it by this classifier
and we're getting a loss to
the extent that we don't give probability one to the right action.
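Putting the pieces together, here's a dependency-free Python sketch of that forward pass, with made-up dimensions, feature names, and random weights (the real model is trained in PyTorch with that cross-entropy loss; this just shows the shape of the computation):

```python
import math
import random

def softmax_classifier(features, embed, W1, b1, W2, b2):
    """Look up an embedding for each configuration element, concatenate
    into the input layer, apply one ReLU hidden layer, then a softmax
    over the parser actions."""
    x = [v for f in features for v in embed[f]]          # input layer: concat
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]                      # hidden: ReLU(W1 x + b1)
    logits = [sum(w * hi for w, hi in zip(row, h)) + b
              for row, b in zip(W2, b2)]                 # output scores: W2 h + b2
    z = [math.exp(l - max(logits)) for l in logits]      # stable softmax
    return [v / sum(z) for v in z]

# toy sizes: 2 features, 3-dim embeddings -> 6-dim input, 4 hidden, 3 actions
random.seed(0)
embed = {"s1.w=ate": [0.1, -0.2, 0.3], "b1.w=fish": [0.0, 0.4, -0.1]}
W1 = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
b1 = [0.0] * 4
W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
b2 = [0.0] * 3
probs = softmax_classifier(["s1.w=ate", "b1.w=fish"], embed, W1, b1, W2, b2)
# a probability distribution over shift / left arc / right arc
```

Training would then push the probability of the treebank-oracle action toward one, exactly the cross-entropy setup described above.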
Um, and so that's what we did using the tree bank.
We trained up our parser, um,
and it was then able to predict the sentences.
And the cool thing- the cool thing was,
um, that this, um,
had all the good things of Nivre's parser but, you know,
by having it use these dense representations,
it meant that we could get greater accuracy and
speed than Nivre's parser at the same time.
So here is sort of some results on that.
I mean, I already showed you some earlier results, right?
So this was showing, um, the fact, um,
that, you know, we're outperforming these earlier parsers basically.
But subsequent to us doing this work,
um, people at Google,
um, these papers here by Weiss and Andor,
um, they said, "Well, this is pretty cool.
Um, maybe we can get the numbers even better if we make our neural network,
um, bigger and deeper and we spend a lot more time tuning our hyper-parameters."
Um, sad but true.
All of these things help when you're building
neural networks and when you're doing your final project.
Sometimes the answer to making the results better is to make it bigger,
deeper and spend more time choosing the hyper-parameters.
Um, they put in Beam search as I sort of mentioned.
Um, Beam search can really help.
So in Beam search,
you know, rather than just saying,
"Let's work out what's the best next action,
do that one and repeat over",
you allow yourself to do a little bit of search.
You sort of say, "Well, let's consider two actions and explore what happens."
Um, quick question.
Do humans always agree on how to build these trees and if they don't,
what will be the [inaudible] or agreement of humans relative to [inaudible] [OVERLAPPING] [NOISE]
So that's a good question which I haven't addressed.
Um, humans don't always agree.
There are sort of two reasons they don't agree, fundamentally.
One is that, uh, humans,
um, sort of mess up, right?
Because the humans doing this aren't perfect.
And the other one is they genuinely think that there should be different structures.
Um, so, you know,
it depend- varies depending on the circumstances and so on.
If you just get humans to parse sentences and say,
"Well, what is the agreement on what they produced?"
You know, maybe you're only getting something like 92 percent.
But, you know, if you then do an adjudication phase and you say, "Um,
look at these differences,
um, is one of them right or wrong?"
There are a lot of them where, you know,
one of the people is effectively saying,
"Oh yeah, I goofed.
Um, wasn't paying attention or whatever."
Um, and so then,
what's the residual rate at which,
um, people can actually disagree about possible parses?
I think that's sort of more around three percent.
Um, yeah.
But there certainly are cases and that includes
some of the prepositional phrase attachment ambiguities.
Sometimes there are multiple attachments
that sort of make sense, and it's not really
clear which one is right, even though there are lots of
other circumstances where one of them is very clearly wrong.
Um, yeah.
[inaudible].
There's- there's still room to do better.
I mean, at the unlabeled attachment score,
it's actually starting to get pretty good.
But there's still room to do better. Um, yeah.
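As a rough illustration of what the unlabeled attachment score measures, here is a minimal sketch. The head indices below are made up for a hypothetical five-word sentence; this is not the official evaluation script:

```python
# Unlabeled attachment score (UAS): the fraction of words whose
# predicted head matches the gold (treebank) head. Labeled attachment
# score (LAS) additionally requires the dependency label to match.

def uas(gold_heads, pred_heads):
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# Each entry is the head of word i (0 denotes the artificial ROOT token).
gold = [2, 0, 2, 5, 3]  # hypothetical gold heads for a 5-word sentence
pred = [2, 0, 2, 5, 2]  # the parser got one head wrong
score = uas(gold, pred)  # 4 of 5 heads correct -> 0.8
```

So "95 percent unlabeled accuracy" means roughly 19 out of every 20 words get attached to the correct head.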
Um, yeah.
So, Beam search, and then
the final thing that they did, which we're not gonna talk about here,
is a sort of more global inference to make sure, um, the parse is sensible.
Um, and so, um,
that then led to Google developing these models that they gave silly names to,
especially the Parsey McParseface,
um, model of parsing.
Um, and so, yeah.
So that then- that's sort of pushed up the numbers even further so that they were sort of
getting close to 95 percent unlabeled accuracy score from these models.
And actually, this work has kind of,
you know, deep learning people like to optimize.
Um, this work [LAUGHTER] has continued along
in the intervening two years and the numbers are sort of getting,
um, a bit higher again.
But, you know, so this actually, um,
led to a sort of new era of better parsers, because effectively,
the 90's were the era of parsers that were sort of
around 90 percent accuracy, and then going into this sort of new generation of,
um, neural transition based dependency parsers,
we've sort of halved that error rate.
And we're now down to about a five percent error rate.
Yeah. I'm basically out of time now but, you know,
there is further work including, you know, at Stanford.
Um, another student, Tim Dozat, has some sort of more recent work
that's more accurate than 95 percent, right?
So we- we're still going on but I think I'd better stop here today,
um, and that's neural dependency parsing. [NOISE].
