Last time you [Sean] did a really good
animation about top-down parsing. We had
these sentences in this totally
artificial grammar, which has only got
about 30 words in it, all in all.  But we
started here with <sentence> is <subject> <verb>
<object> and, in the top-down approach, you
basically say: "OK, I'll push those on
the stack in reverse order: <object> <verb>
<subject>" And, starting at the top of the
stack, at the left, you say: "Well, what can
a <subject> be?" So, you do these things on
your stack, top down, one at a time, and
you start off at the root of the tree
and you develop the component parts, left
to right, one at a time. It is a lot
easier, totally by hand, to write a
top-down parser rather than a bottom-up one.
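
[Editor's sketch] The stack discipline just described can be tried out in a few lines of Python. This is a minimal illustration, not the parser from the video: the grammar below is only a fragment containing the words quoted in this transcript, and the names GRAMMAR, parse_top_down and accepts are made up for the example. It also tries alternatives by simple backtracking, which is an assumption of the sketch.

```python
# Toy fragment of the grammar: only words quoted in this transcript are listed.
GRAMMAR = {
    "<sentence>": [["<subject>", "<verb>", "<object>"]],
    "<subject>": [["<article>", "<noun>"], ["the", "robot"]],
    "<verb>": [["bit"], ["kicked"], ["stroked"]],
    "<object>": [["<article>", "<noun>"], ["two", "furry", "dice"]],
    "<article>": [["the"]],
    "<noun>": [["robot"], ["woman"], ["dog"]],
}


def parse_top_down(stack, words):
    """Return True if the symbols on the stack can derive exactly `words`.

    The top of the stack is the end of the list.
    """
    if not stack:
        return not words  # success only if the input is used up as well
    top, rest = stack[-1], stack[:-1]
    if top in GRAMMAR:
        # Non-terminal: replace it by each alternative in turn, pushing the
        # right-hand side in reverse so that its first symbol ends up on top.
        return any(parse_top_down(rest + rhs[::-1], words) for rhs in GRAMMAR[top])
    # Terminal: it has to match the next input word.
    return bool(words) and words[0] == top and parse_top_down(rest, words[1:])


def accepts(sentence):
    """Start from the root symbol and try to derive the whole sentence."""
    return parse_top_down(["<sentence>"], sentence.split())


print(accepts("the robot stroked two furry dice"))  # True
print(accepts("the woman bit the dog"))             # True
print(accepts("robot the stroked"))                 # False
```

Pushing each right-hand side in reverse is what makes the leftmost symbol the next one to be developed, which is the "left to right, one at a time" behaviour described above.
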
But what I shall be getting to, later on
and in fact I'm starting right now, is to
say: "Well what happens if instead of
starting up at the root and developing
the leaves of the tree, as it were, you
start right down at the bottom, with the
text string that you know is correct, and
try and work upwards, you know, can you
work back upwards, from the leaves 
to the root?"
So, in fact let's write that test sentence
below here: "the robot stroked two furry ..."
And actually, since layout doesn't matter
too much at the moment I'll squeeze the
word 'dice' in at the right there.
Now, be clear, we're talking about
bottom-up parsing now. In bottom-up
parsing you start with the string that
you think is correct and you say:
"Starting with a string, can I look into the
rules and see how to work up the tree,
not down the tree?" And, yes, so therefore
you're looking at possible matches on the
right-hand side [of the rules] for components of
this string, reading from left to right.
OK 'the'. How many ways are there you
can match the string 'the' against one
of these classifications here? Well the
text - the string 'the' - is the right-hand
side possibility of an <article>. That's
one way to do it. Oh! and then look up
here, right at the top of the grammar, if
you have an <article> followed by a <noun>,
like 'the' followed by something else,
that could be a <subject>, and that's looking
good because right up at the top of the
grammar we want to end up with <subject>
<verb> <object>. Looking just at "the robot",
and looking at the grammar right-hand
sides, I could do it by saying: "Well it's
the subject of the sentence, it's at the
left-hand side, and if I go <article> <noun>
I get 'the' and I get 'robot'." But what
I've done, just to act as a talking point
and it illustrates a lot of things here,
I've given you a shortcut. If you want to,
you can just do "the robot", with no
further interior analysis at all. It's an
allowed phrase; it's the <subject>. Now I
have to say that, as we develop this
story, we will get into bottom-up parsing.
Because one of the tools we're going to
use, called 'yacc', basically produces
bottom-up parsers for you, not top-down ones.
And it's a symptom of yacc's behaviour that,
when you're trying to match text strings,
it likes to match the longest one that
it can [legally] manage. So it is going to seize on
"the robot", all as one phrase, as being a
wonderful long solution. Why doesn't it
use 'the' and then wait patiently
for <noun>? That's not the bottom-up way.
If you can see a longer ... Oh! and this thing
by the way, that you're looking at, is
built upon a stack, of course, it's called
the 'handle'. I get a longer handle by
going for this option here and getting
"the robot", all in one go. OK "the robot"
then, and you've got to get used to
reading from right to left now,
in bottom-up parsing, "the robot" all as one
phrase is an example of a <subject>. So, we
can now say: "OK, 'the robot' is my <subject>".
Now that act - of picking up a substring
from your sentence and going upwards, and
making it more abstract if you like - it's
called 'reducing', in bottom-up parsing.
So, looking for a longer and longer and
longer string, to get your handle, that's
called 'shifting', because you're shifting
characters, one after another, and making
the string longer and longer and saying
"... can I go any further?" That's shifting but
when you say: "Ooh! that's a nice long
string - and it matches" and then you go
up and say: "Oh! that's my subject", that is
called 'reduction' because you're going to
something simpler further up the tree.
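
[Editor's sketch] Shifting and reducing, as just described, can be written out as a small loop. This is a hedged illustration in Python, not what yacc generates: yacc builds a table-driven parser, whereas the could_grow test below is a hand-rolled stand-in for "can I make the handle any longer?". The rule list is a cut-down fragment (it leaves out the long way of building an object), and all the names are invented for the example.

```python
# (left-hand side, right-hand side) pairs; a deliberately small fragment.
RULES = [
    ("<sentence>", ("<subject>", "<verb>", "<object>")),
    ("<subject>", ("<article>", "<noun>")),
    ("<subject>", ("the", "robot")),          # the short-cut phrase
    ("<verb>", ("bit",)),
    ("<verb>", ("kicked",)),
    ("<verb>", ("stroked",)),
    ("<object>", ("two", "furry", "dice")),   # the other short-cut phrase
    ("<article>", ("the",)),
    ("<noun>", ("robot",)),
]


def could_grow(stack, next_word):
    """True if some suffix of the stack plus the next word starts a right-hand side."""
    for k in range(1, len(stack) + 1):
        candidate = tuple(stack[-k:]) + (next_word,)
        if any(rhs[:len(candidate)] == candidate for _, rhs in RULES):
            return True
    return False


def longest_handle(stack):
    """Return the rule with the longest right-hand side sitting on top of the stack."""
    matches = [(lhs, rhs) for lhs, rhs in RULES
               if len(rhs) <= len(stack) and tuple(stack[-len(rhs):]) == rhs]
    return max(matches, key=lambda rule: len(rule[1]), default=None)


def shift_reduce(sentence):
    """Return (accepted, log of shift/reduce actions) for a sentence."""
    words, stack, log = sentence.split(), [], []
    while True:
        handle = longest_handle(stack)
        if words and could_grow(stack, words[0]):
            stack.append(words.pop(0))        # shift: keep lengthening the handle
            log.append("shift " + stack[-1])
        elif handle:
            lhs, rhs = handle                 # reduce: swap the handle for its left-hand side
            del stack[-len(rhs):]
            stack.append(lhs)
            log.append("reduce to " + lhs)
        elif words:
            stack.append(words.pop(0))        # nothing to reduce yet, so shift
            log.append("shift " + stack[-1])
        else:
            return stack == ["<sentence>"], log


ok, log = shift_reduce("the robot stroked two furry dice")
print(ok)        # True
print(log[:3])   # ['shift the', 'shift robot', 'reduce to <subject>']
```

With 'the' on the stack and 'robot' coming next, the sketch shifts rather than reducing 'the' on its own, so "the robot" is reduced as one handle.
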
So, you can tick that off as being done
bottom-up. [The] next thing is, you see this
string of characters called 'stroked' and,
once again, it's right-hand-side driven.
What is there, on the right hand side,
and which rule is it, that could possibly
match 'stroked'? You see in here, against
<verb>, 'bit', 'kicked' or 'stroked'. Those three
strings are your possibilities. So, that's
fine. Going right to left you say 'stroked'
is an example of a <verb>.
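
[Editor's sketch] "Right-hand-side driven" can be pictured as looking rules up by their right-hand side instead of by their left. A minimal Python illustration, assuming only the right-hand sides actually quoted in this transcript; BY_RHS and classify are invented names.

```python
from collections import defaultdict

# Only the right-hand sides actually quoted in the transcript.
RULES = [
    ("<subject>", ("the", "robot")),
    ("<verb>", ("bit",)),
    ("<verb>", ("kicked",)),
    ("<verb>", ("stroked",)),
    ("<object>", ("two", "furry", "dice")),
    ("<article>", ("the",)),
]

# Index the rules by right-hand side, because bottom-up parsing starts from the text.
BY_RHS = defaultdict(list)
for lhs, rhs in RULES:
    BY_RHS[rhs].append(lhs)


def classify(phrase):
    """Return every left-hand side whose right-hand side is exactly this phrase."""
    return BY_RHS.get(tuple(phrase.split()), [])


print(classify("stroked"))     # ['<verb>']
print(classify("the robot"))   # ['<subject>']
print(classify("furry"))       # []
```
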
So we've got our <verb> there. Now, again,
I've cheated but it's wonderful fun! I have
not analyzed "two furry dice" into
adjectives and nouns or anything like that.
I've just put it in as an interesting
short-cut to have there. And it is an
example of what I would call an object-phrase.
Some of you, who are really good
English linguists, may want to go on about
my lack of understanding about what a
direct and indirect object are -  not to
mention 'predicates' and so on. But please
forgive me. I regard it as being a phrase
in an object position. So, I'm saying
there's a quick match here and bottom-up
I love this: "two furry dice" is a great
long handle. Oh! and if I match it
there, what's the left-hand side it corresponds
to? [Answer is] <object>. OK then, we've won! We have worked
bottom-up to having <subject> <verb> <object>
on our stack starting with the string.
And, what's more, we've exhausted the
[input] string now. It's the end of it. There's a
sort of full stop after that.
There we are then.
We've got top-down which tends to be
more - how shall we say? Eager? - you know, a top-down
parse would very probably leap on the
word 'the', and not bother to go any
further because it's found a quick match for it,
whereas bottom-up is the other way round.
It's basically saying: "I want the longest
possible handle". Even at this stage in
the late 50s and early 60s there
was a sneaking suspicion, coming around,
that actually bottom-up parsing was a
little bit more powerful than top-down.
I'm going to put out a set of notes for
this so that you can look up for yourself. 
Just examples of why it [i.e. bottom up] is more
powerful. But roughly speaking I think
you can sense that because you've not
only got something you're looking for
but you've [also] got a handle that you've
already accumulated, it's like gathering
more contextual information - going
bottom-up. But, on the other hand, handling
the stack and working out what's
happening is a darn sight more complicated - if
you do it by hand - coming bottom up.
Rather than doing it all by hand, why not
me and you lot [together]? It's a good way
to learn 'lex' and 'yacc'. In other words,
don't write the C directly yourself. Get
a software tool, like these two,
to do it for you.  So, that's exactly what
we're going to do. I've got the program
'putty' that does 'ssh' connected here.
I'm talking to my other Linux machine in the
other room, where I have got set up a
parser -  complete parser: front-end lexical
analyzer [then] syntax analysis - for this 
'furry grammar' and all legal sentences in it.
And I know, first of all, you will want me
to call up the program that implements
this and the test sentence first of all is:
[In unison] "the robot stroked two furry dice". 
Here we go!
So, "furry". It's hanging there, it's waiting for us to give
a correct furry sentence  [reads from screen] "... dice ... return"
Look at that! It's happy with it!
I'm giving it in subject-verb-object
order and I have numbered those rules, in
the grammar as I showed you earlier, and
I now have, as it were, a map of how it
has parsed it.  Rule 3?
Now that is the one that effectively
says I can do "the robot" all as one
phrase. It has chosen not to go for 'the'
and 'robot' as <article> and <noun>,
as separate entities. It might well have done that,
had I gone top-down, but because this yacc-confected
parser system goes bottom-up
it's gone for the longest possible
handle at that stage and it's matched it.
Rule 4: the middle piece has matched
'stroked' as the verb and finally it has
spotted, right at the very end, that I put
in another sneaky short-cut: "two furry
dice" is Rule 6. And that is my parse.
So, should we try one more just to make sure?
Go on Sean, tell me which one to try next.
>> Sean: Try "the woman bit the dog" 
>> DFB: Yep, "the woman bit the dog"
There you are look - Rule 2 for "the woman" now. Rule 2
not Rule 3. If it has followed Rule 2 it's
gone down the <article> <noun> route, which
means it knows that's the only way to
match "the woman".  There is no
shortcut way. OK?
Rule 4 - a <verb> rule again; it chose 'bit'.
Rule 5: now this time, again, there is no
short-cut like "two furry dice" at Rule 6.
You've got to go the long way around and,
following Rule 5, you break it down
into <article> <noun> again: 'the' and 'dog'.
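
[Editor's sketch] The two rule traces read out above can be reproduced with a toy bottom-up parser in Python. Several things here are assumptions rather than facts from the video: rule numbers 2 to 6 follow what is said in this transcript, but numbering the <sentence> rule as 1, and giving <article> and <noun> no number at all, are guesses. The sketch also swaps in a naive backtracking search where yacc would use generated tables, so it shows the idea only. All names are invented for the example.

```python
# (rule number, left-hand side, right-hand side).
# Numbers 2-6 follow the transcript; 1 for <sentence> is an assumption.
# <article> and <noun> get no number here (None), which is also an assumption.
RULES = [
    (1, "<sentence>", ("<subject>", "<verb>", "<object>")),
    (2, "<subject>", ("<article>", "<noun>")),
    (3, "<subject>", ("the", "robot")),
    (4, "<verb>", ("bit",)),
    (4, "<verb>", ("kicked",)),
    (4, "<verb>", ("stroked",)),
    (5, "<object>", ("<article>", "<noun>")),
    (6, "<object>", ("two", "furry", "dice")),
    (None, "<article>", ("the",)),
    (None, "<noun>", ("robot",)),
    (None, "<noun>", ("woman",)),
    (None, "<noun>", ("dog",)),
]


def could_grow(stack, next_word):
    """True if some suffix of the stack plus the next word starts a right-hand side."""
    for k in range(1, len(stack) + 1):
        candidate = tuple(stack[-k:]) + (next_word,)
        if any(rhs[:len(candidate)] == candidate for _, _, rhs in RULES):
            return True
    return False


def parse(stack, words, trace=()):
    """Return the tuple of numbered rules used, or None if there is no parse."""
    if stack == ["<sentence>"] and not words:
        return trace
    # Reductions available right now, longest handle first (ties keep rule order).
    reductions = sorted(
        (rule for rule in RULES
         if len(rule[2]) <= len(stack) and tuple(stack[-len(rule[2]):]) == rule[2]),
        key=lambda rule: -len(rule[2]),
    )
    moves = [("reduce", rule) for rule in reductions]
    if words:
        shift = ("shift", None)
        # Shift first only when that could lengthen the handle.
        moves = [shift] + moves if could_grow(stack, words[0]) else moves + [shift]
    for kind, rule in moves:
        if kind == "shift":
            result = parse(stack + [words[0]], words[1:], trace)
        else:
            number, lhs, rhs = rule
            used = trace if number is None else trace + (number,)
            result = parse(stack[:-len(rhs)] + [lhs], words, used)
        if result is not None:
            return result
    return None  # dead end: the caller backs up and tries its next move


def rule_trace(sentence):
    """Parse bottom-up from an empty stack and report the numbered rules used."""
    return parse([], sentence.split())


print(rule_trace("the robot stroked two furry dice"))  # (3, 4, 6, 1)
print(rule_trace("the woman bit the dog"))             # (2, 4, 5, 1)
```

In the second sentence the sketch first tries Rule 2 for 'the dog', hits a dead end, and backs up to Rule 5. A generated parser avoids that wasted step because its state already records that a subject and a verb have been seen.
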
So, there we are. You could say:
"Well, you've written a compiler for the
'furry' language, with the help of lex and
yacc." We could go into details of that
later, if need be, but not now. It's fair enough 
but it's not doing anything really is it?
What more shall we do with this,
now we've written it this far then Sean?
You tell me?!
>> Sean: Well I think next time we need to come up 
with an action, it needs to do something ....
>> DFB: We need to transform that grammar in some way. 
Those of you who, in the previous video,
actually bothered to look at the EXTRA BITS
may have had a sneak preview as to
what we're going to do, as our much more
interesting actions now we have
recognized the innate structure.
So, remember - always watch the EXTRA BITS !
