I actually think that if we're going to
do parsing it might be easiest not to
dive into compilers straight away, as  some
of you might want me to do, but to
actually just take a simple, very simple,
subset of the English language and see
how we would parse that. And if you look
up in a dictionary "what does 'parsing' mean?",
one of the definitions it gives is, essentially:
"... breaking up sentence structure into its
component parts under the direction of a grammar."
So the grammar is the rules of your language.
Well, I have drawn up here a language
involving furry dice and robots and
people, men and women, and cats and dogs
and they can 'bite', they can 'stroke', they
can ... whatever. And I thought that little
grammar, which I'll walk you through, is
capable of generating between 200 and
300 sentences and they are all
valid English. But there's no way it
captures the spirit of English. 
The idea that your sentences have
subject-verb-object is pretty universal,
I think, among what's called the
Indo-European languages. However, it's not
always in the order of subject-verb-object,
although that's common. Sometimes they get twisted up,
particularly in German, where there's a
definite option to have the verb last.
In this example I'm pointing to here, I'm
using a notation that we did cover about
three or four years ago now. 
We'll put out a link.
I think the episode was called Angle
Brackets. And they invented a notation
that looks like this: pointy brackets.
Yes, back in the very late 50s and early 60s
people who wanted to write
compilers realized they had to write
down a formal specification of what the
rules of their language were. And two
computer scientists actually won the
Turing award - [the] ACM Turing Award - for their
notation and their development of a grammar
for the language Algol 60. And it uses
angle brackets all over the place.
So, that was so-called Backus-Naur form.
Those are the two computer scientists
who developed it. People really liked it.
And I've used it here because it's
verbose enough to make some sense. I'll
read it out to you. What I'm saying, up at
the top of the grammar - using the angle
brackets for what's going to become
parts of a so-called parse tree.
A <sentence> is defined as a <subject> then a
<verb> then an <object>. That's the most
common form of English sentence: the
subject is the person, or thing, doing the
action. The verb says what the action is:
'kicked' 'bit', 'stroked' ...  whatever. The object
says what it is doing it to. [e.g.] 'the man
kicked the robot' And it's such a
powerful language!  You can have either
'the' or 'a' [as] definite or indefinite
article. You can have 'the robot' or
'a robot'.  You can have a choice of verbs
either 'bit',  'kicked' or 'stroked' - you're
going to be very expressive in this language (!)
The subject of your sentence
can either be 'dog', 'cat', 'man', 'woman'
or 'robot'. And finally, the tailpiece: the object,
I've decided, could either be any <article>
followed by a <noun> again, like 'a dog' -
'the robot bites a dog' [is] perfectly possible. But
just to show that you can put in
shortcuts, if you like, and put bits of
structure in there that aren't parsed -
they're just gobbets you can put in - I
have defined the phrase 'two furry dice'
as being a valid <object>. Just the
string "two furry dice". So the target
sentence, that ultimately we're going to
try and construct from all of this, is
the [in lower case] robot stroked [writes on paper]
... running out of space here, carry on on the line below ...
"the robot stroked two furry dice" We all
agree it makes sense [but] is it parsable
against that set of rules? And how would
you go about showing that it was?
>> Sean: So by 'parsable' you mean "can we decode it"?
>> DFB:  [Yes] can we decode it? Can we assign everything to
be the right flavour of structural
component according to those [grammar] rules.
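[For reference, here is the grammar being described, reconstructed in BNF from what's read out above; the exact rule names on DFB's sheet are my assumption:]

```
<sentence> ::= <subject> <verb> <object>
<subject>  ::= <article> <noun> | "the robot"
<article>  ::= "a" | "the"
<noun>     ::= "dog" | "cat" | "man" | "woman" | "robot"
<verb>     ::= "bit" | "kicked" | "stroked"
<object>   ::= <article> <noun> | "two furry dice"
```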
You can either start with a string, which
we've got here, and start trying to group
things together and work upwards. Now we
all know about trees - the root is at the
top. We're computer scientists, the root
is at the top; the leaves are at the bottom.
They're always upside down. Do we
start with the leaves and say: "Ooh! the
robot stroked ... The robot stroked ...
Looking sideways at the grammar I can
see there's a shortcut for "the robot". So
I'm gonna start coming from the string
upwards and see if I end up back at the
root of the tree, or shall I start at the
root of the tree and say: "I must have a
subject then a verb then an object". Can I
see anything that could likely be the
start of a <subject>? That's called coming
top-down. So it's been there in computer
science ever since the dawn of computer
science: the top-down approach versus the
bottom-up approach.
There's no prizes for guessing that
functional languages like Haskell tend,
by their very structure, to favour the
top-down approach. It's messier, is the
bottom-up approach. I shall try to
indicate both, OK, but let's 
just this once,
let's go top down rather than bottom-up.
If we're coming top-down, this grammar is
telling me that I must have a <subject>
then a <verb> then an <object>. Now, for those
of you who are concerned with the
mechanics -- you can actually think
of these things, if you're coming top
down, as being pushed as targets, that
you've got to satisfy, on a stack which ....
We all know about stacks! This one is not
building upwards and downwards it's
building from right to left. So, at the
top of my stack currently, because I'm
coming top-down, I say my sub-target is <subject>.
I must find a <subject>.
But I must keep half an eye on the target
string to say: "does it really match?"
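[The stack-of-targets approach just described can be sketched in a few lines of Python. This is my own illustration, not DFB's code; the rule names are assumptions based on the grammar as read out:]

```python
# Toy grammar: keys are non-terminals, values are lists of alternative
# right-hand sides; anything that isn't a key is a terminal (a literal word).
GRAMMAR = {
    "sentence": [["subject", "verb", "object"]],
    "subject":  [["article", "noun"], ["the", "robot"]],   # incl. the shortcut
    "article":  [["a"], ["the"]],
    "noun":     [["dog"], ["cat"], ["man"], ["woman"], ["robot"]],
    "verb":     [["bit"], ["kicked"], ["stroked"]],
    "object":   [["article", "noun"], ["two", "furry", "dice"]],
}

def parses(stack, words):
    """True if the targets on `stack` can derive exactly `words`.
    The head of the stack is the current sub-target: non-terminals are
    expanded via the rules, terminals must match the next input word,
    and a failed choice backtracks to try the next alternative."""
    if not stack:
        return not words                       # success only if input used up
    top, rest = stack[0], stack[1:]
    if top in GRAMMAR:                         # non-terminal: try each rule
        return any(parses(rule + rest, words) for rule in GRAMMAR[top])
    # terminal: must match the next word of the target string
    return bool(words) and words[0] == top and parses(rest, words[1:])

print(parses(["sentence"], "the robot stroked two furry dice".split()))  # True
```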
If we cross-refer now, to this grammar,
for inspiration, we're trying to match a
<subject>. If I look at the grammar over
there it tells me that a <subject> can
either be an <article>, like "the" or "a",
with a <noun> like 'cat', 'dog' or 'robot';
or you can take a shortcut, because it
tells you here that the <subject> just
could be the phrase "the robot". So let's
go for broke.
I'm looking at the input string and I see
"the robot". So let's take the shortcut
just to be devils, right?! I can say
straight away then, taking the
shortcut, that <subject> can be the string
"the robot". That's good!
Now, having satisfied <subject>, I pop it
off the top of my stack. I've done that.
Next thing, to match this sentence,
you've got to be able to match the <verb>. Well,
we're going along the string we've coped
with "the robot". Next word - next valid leaf
word - that has to match up with being a
<verb>. I'm looking at 'stroked'. Is 'stroked' a
possibility? Look at <verb>. What are the
possibilities for <verb>? And I see 'bit',
'kicked' or 'stroked'. Great! I choose 'stroked'.
But, once again, if you can imagine the
stack I then pop that off the top of the stack
-- we've done it! The stack is shrinking!
All we've got left to identify
is an <object> that matches [in] that sentence.
Again, I've pulled a fast one here! In the
rules for <object> I said you can either
break it down into an <article> and a <noun>,
just like the <subject>, [only this time it's]
having the action done to it. Or you can take a
whizzy short-cut. An allowed shortcut for an
<object> - and I put it in deliberately, just
to make a point - is: "two furry dice". Wonderful!
Simplest parse going: an <object> can be
"two furry dice". We declare success and we say
we have matched
"the robot stroked two furry dice"
by going <subject> <verb> <object>.
<subject> came down into being the shortcut
"the robot". <verb> developed straight away
into being "stroked". <object> shortcut to
"two furry dice". It's a very shallow tree, really.
But it's developed the target
sentence: "the robot stroked two furry dice".
Now, the only thing you might say is: "Well,
yes, wonderful, we've parsed it. But would
it have worked if the <subject>, instead of
taking the shortcut - could you have
done it with the <article> <noun> option?"
Yes, you could.
I could have, basically, come here: let's
write out <subject> <verb> <object>
and, yes, right at the beginning,
instead of taking the shortcut phrase
"the robot", I could have said: "Ah! but I'm
going to do it as: <subject> can be an
<article> followed by a <noun>". So, the head
of the stack then would be - when we're
looking at the sentence and starting off
from the beginning again - it's got to be
an <article> to start with. <article> is 'a' or 'the'.
Oh! I'm looking at the string, I see "the".
Bingo! So, article can be "the" -  no problem.
Pop that off the stack. What's the next thing
I've got to look for? <noun>. Can <noun> be "robot",
which is the next thing I'm looking at?
Yes, "robot" is an option, so I could have done
"robot". Now the <verb> is as before.
It's got to be "stroked". And "stroked" is
there. And equally the <object> is as
before. There's only one way to develop
the string "two furry dice" and that's to
take the declared short-cut that does it for
you. The reason for doing this rather
artificial example is to say: "Oh dear!
Does this matter? We have got a sentence
that makes perfect sense to us: "the robot
stroked two furry dice". But you've drawn
two slightly different trees. First time
you took a shortcut subject-phrase, just
for "the robot". Next time you said: "Ah! but
we could also do it by breaking it down
further into a 'the' and 'robot'. Does that
matter? In this particular case, in
informal spoken English, no it doesn't.
If people understand you, it doesn't matter
that your grammar is technically called
'ambiguous'. So if you say to yourself: what does
'ambiguous' mean? It means there's more
than one parse tree that seems to
satisfy [the grammar] and give the final target phrase.
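[That definition can be made concrete: instead of asking whether the sentence parses, count how many distinct derivations there are. A sketch, using the same assumed rule names as before:]

```python
# Same toy grammar; an ambiguous grammar is one where some sentence
# has more than one parse tree.
GRAMMAR = {
    "sentence": [["subject", "verb", "object"]],
    "subject":  [["article", "noun"], ["the", "robot"]],   # two routes to "the robot"
    "article":  [["a"], ["the"]],
    "noun":     [["dog"], ["cat"], ["man"], ["woman"], ["robot"]],
    "verb":     [["bit"], ["kicked"], ["stroked"]],
    "object":   [["article", "noun"], ["two", "furry", "dice"]],
}

def count_parses(stack, words):
    """Number of distinct derivations of `words` from the targets on `stack`."""
    if not stack:
        return 0 if words else 1               # one derivation if input used up
    top, rest = stack[0], stack[1:]
    if top in GRAMMAR:                         # non-terminal: sum over the rules
        return sum(count_parses(rule + rest, words) for rule in GRAMMAR[top])
    if words and words[0] == top:              # terminal: consume matching word
        return count_parses(rest, words[1:])
    return 0

# "the robot" can be reached two ways: via <article> <noun>, or the shortcut.
print(count_parses(["sentence"], "the robot stroked two furry dice".split()))  # 2
```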
Does that matter? In this case 'no'. Does it
matter in general? Yes, because you might
get very different effects under certain
circumstances. What I think I'll show you
is almost a favourite one, which is to
check your 4-function desk calculator.
When you start typing things into a
so-called 4-function hand-calculator,
you're basically putting in, shall we say,
integer numbers - let's simplify it.
But the burning question is something like
this: that if you type in 8 * 4 * 2
You may not think of it in
those terms but computer scientists
would say: "Yes, if you've got the right
grammar rules, that's an example of a
'sentence'." That is
- what might you call it? - a sort
of subsection of a long calculation that
you do on the right-hand side, [say], of a C statement.
It's just 8 times 4 times 2. There's no problem, is there?
Eight 4s are 32.
Two 32s are 64. But then you say: "Ah! but
we're into tree structures now. How would
you parse that sentence, given that
multiply can only multiply two things
at a time. So, are you going to arrange it
so that your overall expression groups
as 8 * 4, done on the left.
Then you do the multiply by 2, which is
that tree structure. So, the question is
you've got a little bit of depth on your
tree structure here. It's gone down a
level whereas finally you've got one [operand]
up at the top level. Do you want that sub-structure 
to be on the left or on the right?
Because it would be equally OK,
as we've got things at the moment, for
you to draw something that's got 8
here, multiplied by here 4 * 2 with a
multiplier between. So have we got a
little bits of tree structure on the
left or on the right? And does it matter?
And you say: no, it doesn't matter. We all
know that you can't complete the
expression until its left operand
is known, and that involves forcing you
to do the multiply. 8 * 4 is 32.
Fine, 32. What's the other operand here?  
It's multiply by 2: 32 * 2 = 64.
Equally, over here, 8 x (four twos are 8; eight 8s are 64)
So, you're doing it two different ways
but you're getting the same answer. 
And the reason that's OK
is that multiply is well-behaved [over the integers]
Mathematicians will tell you it commutes
a * b is the same as b * a for integers
[and * is also associative over the integers]
However, just look at the ambiguity, OK?
Two different parse trees, and it
didn't matter for multiply. [But] just suppose
those multipliers weren't multiplies but
were divides [ / ] . Does it then matter that
you do 8 / 4 first in
this one? You get your 2 and the
multiply is now replaced by a divide:
2 / 2 is 1.
But look at this other possibility and
just mentally substitute divides for
multiplies. 8 divided by the result
of 4 / 2? Well, 4 / 2 is 2,
and 8 / 2 is 4.
Two completely different answers! Is it 1
or is it 4? If you're panicking, take
out your calculator now and check the
app. Just do 8 / 4 / 2 and
if the answer isn't 1, demand your money back!
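[The two groupings can be checked directly; a quick sketch in Python, using floating-point division as a calculator would:]

```python
# The two parse trees for 8 / 4 / 2, written out with explicit brackets.
left_grouped  = (8 / 4) / 2   # sub-structure on the left:  2 / 2
right_grouped = 8 / (4 / 2)   # sub-structure on the right: 8 / 2

print(left_grouped)    # 1.0 -- what a calculator gives you
print(right_grouped)   # 4.0 -- a completely different answer

# Multiplication is associative over the integers, so there the
# grouping didn't matter:
print((8 * 4) * 2 == 8 * (4 * 2))  # True: both are 64
```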
Here we see it then. Here is an example
of where ambiguity [two differing parse trees] 
does lead to problems,
because depending which tree you choose
you get a different answer. So, there we
are then. We now know what a grammar is.
We know that against a grammar you can
try and parse things like sentences but
"sentences" [is]  used in its widest sense.
It could be a program statement from C.
And just think of this: 8 * 4 * 2
could well be part of a phrase
on the right-hand side of an arithmetic statement.
And grammar is the guiding
principle for the whole thing.
