Today we're going to do some more
live coding, and we're going to talk
about something which is quite close to
my heart, because I wrote some of the
early papers on it many years ago, and
that's something called functional
parsing, or combinator parsing. Before
we get started with actual live coding, I
want to begin with a question. And the
question is: what actually is a parser?
So for me a parser is a program which takes
a string of characters as its input, and
as its output produces some form of tree.
And the idea is that the tree makes
explicit the structure in the input
string. So that's a bit of a mouthful, so
let's have a simple example to explain
what's going on with this. So what we've
got here is we've got a string of five
characters -- 2 plus 3 times 4 -- and
that's our input string. But when we look
at this we know that that's not just a
random sequence of five characters, it's
actually got some structure to it. So in
particular, we've got three numbers
here -- 2, 3 and 4 -- and we've got two 
arithmetic operators -- we've got the plus
and the times -- and one of the things we
learn at school is that the times happens
before the plus. So you really do 3 times 4
here, and then you add the 2 on at the
end. So that's our input string.
So what a parser does, is it takes a string
of characters -- like that -- and it
tries to recognize the structure in the
form of a tree. And the tree we would get
from a parser is something like this
here. And you see we've got some leaves
in the tree -- the leaves are the numbers
2, 3 and 4 -- and the nodes in the
tree are the two arithmetic operators --
that's the plus symbol and the times
symbol -- and you can see the structure
of the tree reflects the fact that we do
the multiplication first -- we
multiply 3 by 4, and then we do the
addition second. So that's the basic idea
of what a parser is: it takes as input a
string of characters, and as output it
produces a tree. So it's really a
function -- it's taking as input a string
and producing as output a tree. So we've seen
what a parser is. It's a function that
takes a string as an input and produces
a tree as an output. So now we want to
think how can we actually implement this:
how can we implement the idea of a parser?
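
To make the tree idea concrete, here is a minimal Haskell sketch of the tree for 2 plus 3 times 4. The names Expr, Num, Add, Mul, example and eval are assumptions made for this illustration; they do not come from the talk.

```haskell
-- Hypothetical tree type for arithmetic expressions (illustration only).
data Expr
  = Num Int        -- a leaf holding a number
  | Add Expr Expr  -- a '+' node with two subtrees
  | Mul Expr Expr  -- a '*' node with two subtrees
  deriving (Show, Eq)

-- The tree for "2+3*4": the multiplication sits below the addition,
-- so it is carried out first.
example :: Expr
example = Add (Num 2) (Mul (Num 3) (Num 4))

-- Walking the tree from the leaves upwards gives its value.
eval :: Expr -> Int
eval (Num n)   = n
eval (Add l r) = eval l + eval r
eval (Mul l r) = eval l * eval r
```

With these definitions, eval example gives 14, whereas the other possible tree, Mul (Add (Num 2) (Num 3)) (Num 4), would give 20.
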
And I'm going to do this in Haskell
today, but it doesn't matter if you don't
know anything about Haskell because
I'll be explaining everything as I go
along. And actually, nothing I'm going
to show you today is specific
to a language like Haskell.
You can do it in any general-purpose
programming language. So if you do a web
search for 'combinator parsing' or
'functional parsing', and then whichever
language you're interested in, you'll
find the same kind of stuff which I'm
going to show you today. So it's not
specific to the Haskell setting. So
what we're going to do then first, is
we're going to define precisely, in our
programming language, what it means to be
a parser. We're going to define a new
type called Parser, and it's simply going
to be a function which takes a string as
an input and produces a tree as an
output. So it captures the very simple
idea of what we are wanting to do -- a
parser is a function that takes a string
as an input, and gives the tree as an
output, and the arrow here just means
that we have a function from one thing
into the other. So this captures the
basic idea of what a parser is. But
unfortunately, it's not sufficient to
program with. We need to refine this a
little bit to actually write programs
with this. So the first little refinement
we're going to make is that a parser
might not consume all of its input. So
for example, if we're trying to parse a
number, like 2, maybe we find the 2,
and then maybe we've got some more stuff
left in the input string that we need to
parse. And if we want to chain parsers
together, we're going to need to have
access to the remaining input string
that we didn't manage to consume. So the
first little refinement that we'll make
is rather than just returning a single
tree, we're actually going to return a
pair now. We're going to return two
things -- we're going to return a tree as
before, and we're also going to return
the unconsumed part of the input string.
So that's the first refinement we've
made. The second refinement we're going
to make is that a parser may not always
succeed. We may be trying to parse a
number, and we don't find the number, we
find something else. So we need to have a
way of representing that a parser can fail.
So the way we're going to do that, is
we're actually going to make a parser
return a list of results, rather than a
single result. So lists in Haskell are
denoted using square brackets. So this
simply means that rather than returning
one pair of results, we could return zero,
or one, or two, or as many as we like. And
the idea is going to be if our parser can't
parse, so it doesn't succeed, we'll return
an empty list of results. And if it does
succeed, we'll return a list with
one pair -- we'll return a tree,
that represents the structure of the
input string, and we'll return the
unconsumed part of the input.  But
because we're working with lists here,
we could actually be more flexible.  We
could return two, three, or four, or five,
or as many parses as we like. And this
is actually quite a good flexibility,
because for some languages the input
string may be ambiguous -- maybe we're
trying to parse English, and English
sentences don't always have one parse,
they can be interpreted in many ways.
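
The refinements described so far can be written down as three Haskell type synonyms. This is only a sketch: the names Parser0, Parser1, Parser2, Tree and digitLeaf are made up for this illustration, and the Tree type is a placeholder.

```haskell
import Data.Char (digitToInt, isDigit)

-- Placeholder tree type, for illustration only.
data Tree = Leaf Int | Node Char Tree Tree
  deriving (Show, Eq)

-- First idea: a parser is a function from strings to trees.
type Parser0 = String -> Tree

-- Refinement 1: also return the unconsumed part of the input.
type Parser1 = String -> (Tree, String)

-- Refinement 2: return a list of results; the empty list means failure.
type Parser2 = String -> [(Tree, String)]

-- A hand-written parser of the refined type: one digit becomes a leaf.
digitLeaf :: Parser2
digitLeaf (c:cs) | isDigit c = [(Leaf (digitToInt c), cs)]
digitLeaf _ = []
```

For example, digitLeaf "2+3" gives [(Leaf 2,"+3")], and digitLeaf "abc" gives [].
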
So this type here is giving us a
flexibility to return many results if we
wanted. We're not actually going to use
that flexibility today, but it's nice to
actually have it. So I haven't told you
what the tree data type is here. And
that's because I'm actually going to get
rid of that now. Sometimes you may want
to return a number, or a program, or some
kind of other structure. So we're going
to replace that specific type of trees
by some arbitrary type 'a'. And I'll make
this a parameter of my type. This is our
final type, which we're going to work
with today. What we're saying is that a
parser whose results have type 'a' is
simply a function which takes a string
as an input, and then it gives a list of
results. And each result is a pair
comprising a single value of type 'a'
-- maybe a tree, maybe a number, maybe
something else -- and then an unconsumed
part of the input string. Okay, so this is
our final type. And if you look in any of
the articles, or books, about these kinds
of parser combinators, or functional
parsers, you'll find a type very
similar to this there. It's quite a
mouthful again, so let's think about how
we could understand this in a simpler
way. And actually, we can write a little
rhyme to understand what's going on with
this type. So let me write the rhyme out
for you as a comment. So what we can say
is: a parser for things, is a function
from strings, to lists of pairs, of things
and strings. This is a little Dr. Seuss
rhyme to tell you what a parser is, or
what a functional parser is. And that's
actually how I remember this type. So
that's our basic type now. We've seen so
far, a parser is basically a function
from strings to trees, but in order to
actually program with these kind of
things, we need to refine the type a
little bit. And this is the type we'll be
working with today. But you don't need to
worry about the details of it. Just
basically think it's a function from
strings into trees, or some other kind of
structure. What we're going to do now,
is we're going to load up the
parsing library. So I'll start up
the compiler. This is a library, which
contains a whole bunch of parsing stuff,
which allows us to program with parsers of
this form. And this is a parsing library I
wrote myself. I'll see if I can get Sean
to upload it as part of the video.
Parsing libraries like this exist for just about any
programming language you can think of.
And again, if you just search for parser
combinators, functional parsing, 
whichever language you like, you'll find
the library, which gives you all sorts of
basic ways of building parsers. And
that's what I'm going to show you now.
All of these libraries work in the same
way. So you have some basic primitives, or
basic building blocks, for parsers. And
then you have a way of combining parsers
to build bigger parsers. So it's like a
kind of Lego kit, or a construction kit.
You have some basic bricks that you can
do things with. And then you can put
those bricks, or components, or primitives,
together in all sorts of different ways.
So I'm going to show you a few of the
primitives, and a few of the combining
forms. And then we'll do an example. The
first primitive I want to show you is
very simple. It's just a way of parsing a
single digit. So what the digit parser
does, is it takes a string of characters,
and it tries to consume a single numeric
digit off the start of that string. And you
might think, well, what does 'parse' do
here? What parse does, is it takes a
parser, which in this case is just digit,
and it takes an input string to that
parser, and it just applies one to the
other. So of course, this parser here is
going to succeed, because we do have a
digit at the start of the input string.
So we get exactly the expected result. 
We get a list with one pair. And the first
thing in the pair is the actual digit. We
get the character '1'. And the second thing
in the pair is the unconsumed part of
the input. And that's something we could
then try and parse subsequently with
another parser. We can test, does this
thing fail properly? So, if I give it an
input string that doesn't have a single
digit at the start, then it's going to
fail. So if I give it the input string
"abc", there's no digit at the beginning, so
we're just going to get the empty list
of results. So I'll show you one more
quick primitive. If I parse a single
character, say an 'a', from that string, then 
that will do the right thing. If I parse
a single character 'a', and I didn't have an
'a' at the beginning, then it will fail.
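
As a rough sketch of how the final parser type and these two primitives could be defined, assuming nothing about the actual library beyond what is described above: the constructor name P and the helper sat are assumptions, and the input "123" is chosen here for illustration.

```haskell
import Data.Char (isDigit)

-- A parser for things is a function from strings
-- to lists of pairs of things and strings.
newtype Parser a = P (String -> [(a, String)])

-- Apply a parser to an input string.
parse :: Parser a -> String -> [(a, String)]
parse (P p) = p

-- Consume one character if it satisfies the test, otherwise fail.
sat :: (Char -> Bool) -> Parser Char
sat ok = P step
  where
    step (c:cs) | ok c = [(c, cs)]
    step _ = []

-- A single numeric digit.
digit :: Parser Char
digit = sat isDigit

-- One specific character.
char :: Char -> Parser Char
char c = sat (== c)
```

Loaded into GHCi, parse digit "123" gives [('1',"23")], parse digit "abc" gives [], and parse (char 'a') "abc" gives [('a',"bc")].
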
So we've seen two basic parsing
primitives here -- we've
seen a way of parsing a digit, 
a way of parsing a specific character,
and we've seen that these things can
succeed or fail. And in the library, or in
any of these libraries, there'll be a
bunch of these basic building blocks, or
basic bricks, or primitives, that you can
use to build up your parsers. Where
things get more interesting is when you
think how do you combine these kind of
things, how do you use these basic bricks
to build actually useful parsers? Let me
show you an example of this. So there's a
parsing combining form called 'some'. And
what it does is it takes a parser as its
input, and it tries to apply it one or
more times, as many times as possible. So
if we're trying to parse 'some' digits,
what we're trying to do is consume one
digit, then two, then three, and as many as
we can until we don't find any more
digits. So if we apply the 'some digit'
parser to the string "123" then
it will do the expected thing. It will
consume all three of the digits, and then
we'll get the empty string left here. So
that's 'some', it gives us a form of
repetition. And we also have a very
simple way of making a choice as well.
So if I want to make a choice here,
between a digit, and a letter. And let me
parse the string "abc123". So, what
we've got here, is this funny symbol
here -- with the three characters, <|> -- that's a
choice operator. It says do that, or do
that. So if I try to parse a digit, or a
letter, what it's going to try first is
it will take the first character in the
input string, and say is it a digit? If so,
I'll parse it. And if it's not a digit,
then I'll go over to the other side, and
say is it a letter? And I will try and
parse that. You can see what's happening
with this particular example here -- if I
look at the first character, it's not a
digit, it's a letter. So when I apply the
digit parser it would fail, and then the
'or' operator, or the choice operator here,
will go over to the other side and say,
well, is it a letter instead? And of
course it is, so we can parse the single
'a' off the front here, and then we
get everything else as unconsumed.
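
Repetition and choice can be sketched in Haskell using the standard Alternative class, which provides both some and the <|> operator. This is a minimal illustration rather than the library used in the talk; the constructor P and the helpers sat and orElse are assumptions.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isAlpha, isDigit)

newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) = p

-- Functor and Applicative are needed before Alternative can be declared.
instance Functor Parser where
  fmap g p = P (\inp -> [(g v, out) | (v, out) <- parse p inp])

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  pg <*> px =
    P (\inp -> [(g v, rest) | (g, out) <- parse pg inp, (v, rest) <- parse px out])

-- Choice: use the results of p, or those of q if p fails.
-- The class then supplies 'some' (one or more repetitions) by default.
instance Alternative Parser where
  empty = P (const [])
  p <|> q = P (\inp -> orElse (parse p inp) (parse q inp))
    where
      orElse [] ys = ys
      orElse xs _  = xs

sat :: (Char -> Bool) -> Parser Char
sat ok = P step
  where
    step (c:cs) | ok c = [(c, cs)]
    step _ = []

digit, letter :: Parser Char
digit = sat isDigit
letter = sat isAlpha
```

With these definitions, parse (some digit) "123" gives [("123","")] and parse (digit <|> letter) "abc123" gives [('a',"bc123")].
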
And of course, if we wanted to be a bit
more clever we could combine some of
these things. So I could say, some digit
or letter -- get my brackets right --
"abc123", and then that will parse
everything. Because all I'm doing here is
I'm repeating, or iterating, the choice
between either parsing a single digit, or
a single letter. And I've got a
string of digits and letters here,
so I can parse the whole thing. So I've
consumed them all. And then I get nothing
left at the end. So what we've seen so
far, is some basic building blocks, and
we've seen a couple of combining
forms -- we've seen a way of doing
repetition, which is 'some', and we've seen
a way of making a choice, which is the
funny operator in the middle there. What
I haven't shown you so far, is how to do
some form of sequencing. And this is the
most common thing you typically want to
do with parsers. You want to say do this,
and then do that, and maybe do that as
well. You want to sequence things
together. So I'll actually show you a bit
of the parsing library here. So here's
the parsing library. And I don't want to
go through all the details of this, but
one thing I want to note is that it's quite
short. If I kind of scroll down here, I
think it's about four and a half
screenfuls. And I've got quite a big font here,
and this is actually already quite a
sophisticated parsing library. So it shows
you the power of this method, that you
don't need hundreds of lines of code to
write parsers -- four and a half
screenfuls is a library which
is fully fledged and you can
basically implement any parser
that you like using this. So I'll
show you a couple of examples of
sequencing. The first example I want to
show you is a parser for natural numbers.
So what's a natural number? It's just a
non-negative integer, like 0, 1, 2, 3, or 10,
or something like that. So you think how
do you parse a natural number? Well I'm
going to use the sequencing notation for
parsers, which is the 'do' notation. And the
do notation is very simple -- you write
the word 'do', and then you have a whole
bunch of parsers one after the other, and
it just runs them each in sequence. So
the first thing here, is we're going to
parse 'some digits'. Because that's the
basics of what a natural number is -- it's
just some digits. And if that succeeds,
I'm going to call all those digits 'xs'.
So 'xs' is just going to be a list of all
the digits. And then what I'm going to do
here to be a bit more flexible, probably
when I parse a number I don't want a
string of characters back, I actually
want the number. So I'm going to pass the
string in to a little function called
'read', which just converts the string into
a number. And then I'm going to simply
return it. So the basic idea here is
we're sequencing two things together --
we're reading some digits, or parsing
some digits, and then we're translating
those into a normal number, and then
we're returning it. And we're sequencing
those things together using the 'do'
notation here. So just one more little
example, because we'll use this in a
minute. Here's how you could parse an
integer. So an integer is either a
negative number, or a
natural number. So there's a
choice there. So we're going to use the
choice operator. So here's the 'or'
operator, which we saw a few minutes
ago. And the two parts here just say -- have
we got a negative number, or have we got
a natural number? So the parser for a
negative number, we use the 'do' notation,
because we need to do three things. 
So the three things are here. So if we're
trying to parse a negative number, the
first thing we do is we parse a minus
sign. So we're using the 'char' primitive
that we saw previously -- that will parse a
minus sign. Then we're going to parse a
number, and call it 'n'. And then we
need to remember that we need to make it
negative, so we negate it and then we
return it. So again here, we're just using
the simple idea of sequencing three
parsers, one after the other. And then the
'or' here says, or we can just have a
simple natural number. Okay so this
illustrates the idea of sequencing. And
if you've seen some of my previous
videos, you may recognize the 'do' notation
here. And this is because parsers form an
example of what's known as a 'monad'. And
in fact, for me, parsers being monadic is
one of the key ways to understand what
a monad is. So if you've seen the monad
video, or even if you haven't seen
it, maybe have a look back at that,
and if you find it interesting, maybe
look up some of the work which people
have done on monadic parsing. And it's a
really good way to get a very good feel
for what's going on with both these kind
of parsers, and monads as well. We've seen
that parsers are basically functions --
they take a string as an input, and they
produce essentially a tree as an output.
We've seen some basic primitives for
consuming single digits, and single
characters, and things like that. And
we've seen some basic combining forms -- we
can have repetition, with 'some', we can
have choice, and we can have sequencing.
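
The three combining forms can be put together in one small sketch, including the natural number and integer parsers described earlier. It is an illustration under assumptions, not the library from the talk: the constructor P and the helpers sat and orElse are made up here, and the Monad instance is what makes the 'do' notation available.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isAlpha, isDigit)

newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) = p

instance Functor Parser where
  fmap g p = P (\inp -> [(g v, out) | (v, out) <- parse p inp])

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  pg <*> px =
    P (\inp -> [(g v, rest) | (g, out) <- parse pg inp, (v, rest) <- parse px out])

-- Sequencing: run p, then pass its result and leftover input to f.
instance Monad Parser where
  p >>= f = P (\inp -> concat [parse (f v) out | (v, out) <- parse p inp])

-- Choice: use the results of p, or those of q if p fails.
instance Alternative Parser where
  empty = P (const [])
  p <|> q = P (\inp -> orElse (parse p inp) (parse q inp))
    where
      orElse [] ys = ys
      orElse xs _  = xs

sat :: (Char -> Bool) -> Parser Char
sat ok = P step
  where
    step (c:cs) | ok c = [(c, cs)]
    step _ = []

digit, letter :: Parser Char
digit = sat isDigit
letter = sat isAlpha

char :: Char -> Parser Char
char c = sat (== c)

-- A natural number: some digits, converted from a string to an Int.
nat :: Parser Int
nat =
  do xs <- some digit
     return (read xs)

-- An integer: a minus sign followed by a natural number, or just a
-- natural number.
int :: Parser Int
int =
  do _ <- char '-'
     n <- nat
     return (negate n)
  <|> nat
```

For example, parse nat "123abc" gives [(123,"abc")], parse int "-42" gives [(-42,"")], and parse (some (digit <|> letter)) "abc123" gives [("abc123","")].
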
So we've got our basic kind of building
blocks for making larger parsers. So
let's wrap this up now by doing a little
example. And the example I want to do is
to build a really simple parser for the
kind of expressions, or arithmetic
expressions, that we saw back at the
start. So things like two plus three
times four. So what I'm doing here is I'm
writing a Haskell program, which is going
to implement this parser. What I've got
in the first line here is simply
importing the parsing library, which
we've just seen -- it's just four and a
half pages of code -- it's very
straightforward. And what we've got here
in the comments, is a simple way of
writing down what the syntax, or
form, or structure, of expressions is.
And this is what's known as a 'grammar'.
But it doesn't matter if you 
don't know what a grammar is, because the
basic idea is very simple here. The first
line says an expression can be one of
two things, so this means 'or' here, in
grammars. So an expression can either be
a term plus an expression, or it can be a
term. And then in turn, a term could
either be a factor times a term, or a
factor. And finally, a factor can be a
bracketed expression, or an integer. So
there's three simple rules here, which
explain the form, or structure, of what a
simple arithmetic expression can be. There's
actually quite a lot of things going on
here, but the only key thing I want to
point out is we've got three rules,
because there's three different levels
of priority in an expression. So the
highest level of priority is brackets. So
that's one thing you learn in school, you
learn that you do the brackets first,
that's the highest priority thing.
The middle level of priority here is
multiplication, so that's the thing in the
middle rule. And the lowest level of
priority is you do addition, and that's
sitting at the top rule. And again, this
priority order is something you learn at
school -- you do brackets first, then you do
multiplication second, and then you do
things like addition last. And these
three rules are just making that precise.
And again, if you want to know more
about grammars, you can search on that,
and you'll find a lot of
information about that online.
So what we want to do now, is take this
little grammar, and implement it as an
actual parser. And this is very
straightforward to do, because we're
using this functional parsing idea.
Essentially, we just take those three
grammatical rules, and we just implement
them using the combining forms, and the
primitives that we've seen. So it's
a very straightforward translation.
So the first one, is we want to say an
expression can either be a term plus an
expression, or a term. So let's do the
first part of that. So we've got term
plus expression. So what we're going to
do is parse a term, and if that succeeds
call it 'x'. Then we're going to parse a
'+' character, then we're going to parse
an expression and call it 'y', and then
we're going to return x plus y. So there's
four things going on in sequence here.
We're first of all parsing a term.
Then we're parsing the '+' character.
Then we're parsing an expression, and
we're getting the values x and y -- these
will be numbers. And then we're going to
simply add those two numbers together.
So you can see here we're actually doing
more than just parsing -- we're actually
evaluating the expression as well. And
that's one of the advantages of this
approach to parsing, that it's not just
about building a tree, you can actually
process things as you're going along. And
we're actually processing them here by
doing complete evaluation. So x and y
will be numbers which result from this
term and this expression, and then we just
add them together here, and return the
result. Then the last part of parsing an
expression is we can either be a term
plus an expression, or we could be a term.
So we just use the choice operator, and
we get term.  And these five lines here
are our full parser for expressions.
So we had this one rule up here -- an
expression could be a term plus an
expression, or a term, and we just
translate it directly into our parsing
notation. And again the key observation
here is that the grammatical rule here
looks basically the same as the parser.
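
To try the expression rule on its own before the other two rules are written, one option is to let term stand in as a plain natural number parser. That stand-in is an assumption made only for this sketch, as are the constructor P and the helpers sat and orElse.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) = p

instance Functor Parser where
  fmap g p = P (\inp -> [(g v, out) | (v, out) <- parse p inp])

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  pg <*> px =
    P (\inp -> [(g v, rest) | (g, out) <- parse pg inp, (v, rest) <- parse px out])

instance Monad Parser where
  p >>= f = P (\inp -> concat [parse (f v) out | (v, out) <- parse p inp])

instance Alternative Parser where
  empty = P (const [])
  p <|> q = P (\inp -> orElse (parse p inp) (parse q inp))
    where
      orElse [] ys = ys
      orElse xs _  = xs

sat :: (Char -> Bool) -> Parser Char
sat ok = P step
  where
    step (c:cs) | ok c = [(c, cs)]
    step _ = []

char :: Char -> Parser Char
char c = sat (== c)

-- Stand-in for the real term parser (assumption): just a natural number.
term :: Parser Int
term =
  do xs <- some (sat isDigit)
     return (read xs)

-- expr ::= term + expr | term
expr :: Parser Int
expr =
  do x <- term
     _ <- char '+'
     y <- expr
     return (x + y)
  <|> term
```

With this stand-in, parse expr "1+2+3" gives [(6,"")], while parse expr "2+3*4" gives [(5,"*4")]: the multiplication is left unconsumed until term is implemented properly.
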
So let's look at the second rule, and in
fact it's pretty much the same as the
first rule, except a few symbols are
changed. So let me just copy it. And I can
just change it. So I can say a term can
be a factor times a term, and then I can
do a multiply there, or I get a factor.
Okay, so in just a few key presses, I've
managed to implement my parser for terms.
And again the point to note here is that
the grammatical rule here looks
basically exactly the same as the parser.
Okay, I've just got a few more symbols
in here, because I'm actually writing a
program to do parsing, or actually
evaluation, but it's the same basic
structure. And then just to wrap things
up, I can implement what a factor is.
So a factor is either a bracketed
expression, or it's an integer. So let me
write the parser for that. So a bracketed
expression, I just parse an open bracket character, and
then I parse an expression and call it
'x', and then I parse a closed bracket, and
then I return the 'x'. Or, I can parse an
integer. And again, if you look at the
structure of the rule up here -- a factor
is a bracketed expression or an integer.
I've got exactly the same thing down
here -- here I'm parsing a bracketed
expression, and here I'm parsing an
integer. And this is actually our entire
parser. We've got three lines up here, and
we've just got kind of ten or fifteen
lines down here, and this is actually
a complete parser, and evaluator,
for arithmetic expressions.  And
again, the beauty of this approach is
the parser looks basically the same as
the grammar. So let's try and see if this
works, and hopefully I haven't made any
mistakes. So let's load it into the
system. Okay that's great, no errors. And
we can see that we've loaded two files
in now -- we've loaded the parsing file
in, which is about four and a half pages
of definitions, and we've loaded the
example program in now. So now we can check,
does our parser actually do what we want.
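
For reference, the whole program described above can be sketched as a single self-contained Haskell file. The talk imports a separate parsing library; here a few lines of it are written out inline instead, and those lines (the constructor P, the helpers sat and orElse, and the instance definitions) are assumptions rather than the library's actual code.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) = p

instance Functor Parser where
  fmap g p = P (\inp -> [(g v, out) | (v, out) <- parse p inp])

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  pg <*> px =
    P (\inp -> [(g v, rest) | (g, out) <- parse pg inp, (v, rest) <- parse px out])

instance Monad Parser where
  p >>= f = P (\inp -> concat [parse (f v) out | (v, out) <- parse p inp])

instance Alternative Parser where
  empty = P (const [])
  p <|> q = P (\inp -> orElse (parse p inp) (parse q inp))
    where
      orElse [] ys = ys
      orElse xs _  = xs

sat :: (Char -> Bool) -> Parser Char
sat ok = P step
  where
    step (c:cs) | ok c = [(c, cs)]
    step _ = []

char :: Char -> Parser Char
char c = sat (== c)

nat :: Parser Int
nat =
  do xs <- some (sat isDigit)
     return (read xs)

int :: Parser Int
int =
  do _ <- char '-'
     n <- nat
     return (negate n)
  <|> nat

-- expr   ::= term + expr | term
-- term   ::= factor * term | factor
-- factor ::= ( expr ) | int

expr :: Parser Int
expr =
  do x <- term
     _ <- char '+'
     y <- expr
     return (x + y)
  <|> term

term :: Parser Int
term =
  do x <- factor
     _ <- char '*'
     y <- term
     return (x * y)
  <|> factor

factor :: Parser Int
factor =
  do _ <- char '('
     x <- expr
     _ <- char ')'
     return x
  <|> int
```

Loaded into GHCi, parse expr "2+3*4" gives [(14,"")] and parse expr "(2+3)*4" gives [(20,"")].
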
So let's try out our parser with the
little example that we had at the start,
2 plus 3 times 4.  So we're going
to parse an expression, and the
expression is 2 plus 3 times 4.
And we press return, and we hope we get
the right result, and we do. So remember
from school, you do the multiply first,
you do the 3 times 4, so you get 12,
and then you add the 2 on at the end,
and you get 14. So we've managed to
get the result 14 here, and there's no
portion of the input string left. So
we've got a successful result -- we've got
a list, and we've got one result value,
we've got 14, and we've managed to consume
the whole thing. Or we could check,
does this actually work with more
sophisticated examples? So let's try
putting some brackets in, and let's put
brackets around the 2 plus 3, so we get (2
plus 3) times 4. So we hope then that we
do the addition first, and we get 5, and
then times by 4 to get 20. Yes, and it
works. We can try more sophisticated
examples. So let's do something like 2
plus 7 times 10 plus 8 times 20, and if
I've got the brackets right, yes then it
works fine. We can also check what
happens if you give it something which
doesn't parse. So suppose I do something
like, I parse expression, so 2 plus 3
times, and I forget to write the 4 at the
end. What's going to happen? Well,
the parser will still manage to
succeed, because it will manage to parse
the 2 plus 3, and we'll get the 5 out,
but it doesn't know what to do with
this symbol sitting on its own. So you get
that back as an unconsumed part of the
input. And again, we can try another
example. Suppose I forget to close the
brackets, so I do something like 2 plus 3,
and I forget to close the brackets, then
it won't know what to do with that at
all, and we'll just get the empty list.
That's basically it -- this is the idea of
functional parsing, or combinator
parsing. The idea is very simple --
parsers are basically functions,
you define a library with some
basic building blocks, or primitives, some
combining forms that let you put these
things together, and then you can end up
writing parsers as we've seen that look
very similar to the grammars that you
write to describe languages. 
