First thing we've got to talk about
today, Sean, because otherwise I'll get
it in the neck - and so will you for not
nudging me and reminding me.
Yes, everybody, I know ... this thing here
is still using Windows 7. And the useful
guys at work - thanks for doing this chaps -
have assured me that the moment this
filming session is finished, if I bring
this [PC] in [to work] they will upgrade it to
Windows 10 as well 
>> Sean:  We've been talking
about Regular Expressions, basically
about the theory of them and the idea of
them but we've not actually seen them in practice.
>> DFB: [video clip from previous episode]: Yeah! regex, 
regex, REs. They are a very good illustration of where 
theory can meet practice. But I think in the
previous one we did a little bit of
theory. What we ought to do now is just
see them in action I think.
[It's] a difficult one for me to tackle is this.
I think you'll all - I'm just trying to
get sympathy here (!). The span of what you
either know or don't know [you the
audience] is huge on this topic. Some of
you know way more than I do, Some of you
really are beginners and struggling to
get used to the somewhat abstract
notation, and so on. So, apologies up front
but this one will seem very simple and
very straightforward to those of you who
have got some expertise. But I think it's
important that we regroup and say: "Look,
this is the notation; we all agree on this".
Because I have, for the future, got a
very good example lined up of something
where Regular Expressions, if you like,
can only just cope. I am doing my
examples here in 'lex' because I hope that
some of these examples later on will
transfer into being part of a little
compiler of some sort. And it's software
I'm used to. But it's very
straightforward. You give a piece of
Regular Expression for a pattern you
want to match and then you give, if you
like, an action that you want to take.
Now, very often,  having recognized a piece of
Regular Expression all you want to do is
to echo it back, perhaps with a bit of
explanation as to what it means. 
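
As a rough illustration of the pattern-then-action idea, here is a minimal sketch in Python rather than in lex itself. The rule table, the `run_rules` helper and the wording of the messages are all hypothetical; none of them is taken from the script used in the video.

```python
import re

# Each rule pairs a pattern with an action, loosely like one line of a lex script.
# The action receives the matched text and returns what should be printed.
# (Hypothetical rules, chosen only to illustrate the pairing.)
RULES = [
    (re.compile(r"box"), lambda text: f"Reserved Word: {text}"),
    (re.compile(r"[0-9]+"), lambda text: f"Number: {text}"),
]

def run_rules(word):
    """Return the action's result for the first rule that matches the whole word."""
    for pattern, action in RULES:
        if pattern.fullmatch(word):
            return action(word)
    return word  # nothing matched: echo the input back unchanged

print(run_rules("box"))
print(run_rules("2024"))
```

The action here only echoes the text back with a label, which is the "echo it back with a bit of explanation" case described above.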
So here's my simple exercise. I'm going to
declare about seven reserved words in my
language. But my language is going to
ultimately end up as being an elementary
computer graphics language just like
Brian Kernighan's PIC. So, my reserved
words will be things like "circus" ...
Circus !!?    [Music]
[I meant] "circle"! In fact I should put in both
"circle" and "circus" to see if it can
distinguish between the two. 'line', 'arc'
'spline', 'box', that sort of thing. I want
those to be picked up as being Reserved
Words. But then, if it isn't a reserved word
in my scheme and it's some other bunch
of characters, is it a bunch of
characters that would do good service as
being a Variable Name? And, as I think
we've said, many a time, Variable Names in
many languages follow the pattern that
they must begin with a letter, but then
after that, they can have any mix of upper-
or lower-case letters and digits in the
name - zero or more of them.
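
The rule just described (a letter first, then zero or more letters or digits) can be written as a single regular expression. The sketch below uses Python's `re` module instead of lex; the name `is_variable_name` and the sample words are made up for illustration.

```python
import re

# One letter, followed by zero or more letters or digits.
VARIABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_variable_name(word):
    """True if the whole word fits the letter-then-letters-or-digits pattern."""
    return VARIABLE_NAME.fullmatch(word) is not None

for word in ["radius", "x2", "2x", ""]:
    print(repr(word), is_variable_name(word))
```

On its own this pattern cannot tell a reserved word from a variable name, so the reserved words have to be handled by separate rules.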
That's your Variable Name. So, reserved
words and named variables of that type
- of that particular reserved word type -
that's all we're gonna do today.  I have
here a 'lex' script which has got seven
specific lines in for recognizing `circle',
`line', `arrow', `spline', `box', `arc', or 
- bonus at the bottom - `circus'. Both start "circ". 
So when you look at it, it's either going to
match that, or that, or that, or that, or that - all
the reserved words. If it won't
match any of those it keeps coming down
trying to match, trying to use the next
Regular Expression to get a match.
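
A small Python sketch of this top-to-bottom behaviour follows, under two stated assumptions: each input is a single whole word, and the final catch-all is the letter-then-letters-or-digits pattern described earlier. The `classify` helper and its labels are hypothetical. Note that lex itself prefers the longest match and uses rule order only to break ties; for whole words like these the two approaches agree.

```python
import re

RESERVED = ["circle", "line", "arrow", "spline", "box", "arc", "circus"]

# Rules in script order: the seven reserved words, then the catch-all.
RULES = [(re.compile(re.escape(w)), "Reserved Word") for w in RESERVED]
RULES.append((re.compile(r"[A-Za-z][A-Za-z0-9]*"), "Variable Name"))

def classify(word, rules=RULES):
    """Try each rule in order and return the label of the first full match."""
    for pattern, label in rules:
        if pattern.fullmatch(word):
            return label
    return None  # no rule applies

print(classify("spline"))
print(classify("splines"))
# Moving the catch-all to the top would hide every reserved word:
print(classify("spline", rules=RULES[-1:] + RULES[:-1]))
```

The last call shows why the reserved words sit above the catch-all: with the order reversed, 'spline' would be reported as a Variable Name.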
And below here I just give [0-9]+, (Zero to nine, it
says in square brackets, plus) And that's
a piece of Regular Expression notation
that says: "any combination of digits
0 - 9, in any order, going on
arbitrarily long". For the moment here
those [A-Z] or [a-z] choices it
means anything in that range - literally
those characters in however many
combinations are possible. So, I've put
all this together; I've  fed it into 'lex'.
I've compiled it all up for you - I won't
bore you by doing it in front of you - but
believe me, I have saved this as a binary
executable. It's called testRE for "test
regular expression" but it only handles
these regular expressions [points to paper]. I think 
we're all ready to go.  I just type in the name
of the executable binary ... testRE ... let's
see if it works. 
Right. [looks at screen], silence signifies "I'm happy" 
Yes, it's waiting. So, go on, tell me something to try out, Sean.
>> Sean: Let's use the name Bob. 
>> DFB: Bob? you just want Bob? 
all on its own? 
>> Sean: Yeah! Bob.  what will Bob do?
>> DFB: Would you agree with that?
"bob" is a Variable Name.  In other
words it's a valid identifier for a
variable of some sort. Fine, yes,
there's nothing to stop you calling your
integers, or your circles, or your lines...
You can call them "bob" if you want to.
That's fine. I'm saying that this
thing, as advertised, really does treat
words like 'circle' and 'line' as being
special. Let's see if it filters those
out and gets it right. So, I'll just say
'circle', on its own, lower-case. Look at that!
As part of my pattern-matching, 'circle' is
one of my entries in Reserved Words that
must be recognized, just "as is", lower-case,
notice, and it's worked. It basically says:
"Yeah! got you, 'circle'. It's a Reserved Word
and - just to emphasize I've got it -
it's 'circle'." All right? Now, this time I'm spelling
it with a capital C and my guess, my hope,
is that it will [now] recognize the first
'circle' as being a Reserved Word.
The second 'Circle' can't be a reserved word
because it's case sensitive, right? It's
'circle', all lower-case, has been reserved.
But the version with an upper-case C isn't
[reserved], therefore - who knows -  it should
be a Variable. Let's see if that works.
Yeah, 'circle', all lower-case, is reserved.
ECHO it back just to be sure. Yeah, I got it.
It's 'circle'. But 'Circle' is a variable
name, which I think sounds right. Think
of something else [Sean] that might break it, go on?
>> Sean: Well, we talked earlier and you
kind of said the idea of putting the
word 'circus' in there, to throw it, because
it's so similar?
>> DFB: Yes, that's a good point
let's just try 'circus'; it's happy with that,
I did make it a Reserved Word, but it
hasn't sort of come up with: "Ooooh! I can't
decide between 'circle' and 'circus'". Part
of what I was saying, in the episode last
time, is that one of lex's jobs is to say:
"Despite the fact that 'circle' and 'circus'
have a common beginning, I'm very clever and
I very efficiently factorize that
beginning out". And then say: "Well, after
that, if it ends 'le' it's a Reserved Word.
If it ends 'us' it's also a Reserved Word".
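
To picture what "factorizing the common beginning out" means, here is a sketch using a trie (a prefix tree) in Python. This is a simpler stand-in, not what lex really generates: lex builds a finite automaton from all the patterns. The helper names and the `"<end>"` marker are invented for this example.

```python
RESERVED = ["circle", "circus", "line", "arc"]
END = "<end>"  # hypothetical marker key; longer than one character, so never a letter

def build_trie(words):
    """Build nested dicts in which words with a shared prefix share a path."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True  # a complete reserved word ends here
    return root

def is_reserved(trie, word):
    """Follow the word letter by letter; succeed only at an END marker."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node

trie = build_trie(RESERVED)

# Walk the shared prefix 'circ' once, then look at where the paths split.
node = trie
for ch in "circ":
    node = node[ch]
print(sorted(node))  # ['l', 'u']
```

The prefix 'circ' is stored and checked only once; the decision between 'circle' and 'circus' is deferred to the fifth letter.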
But it's happy. So a good thing to do, now
I think, would be: "Can you name a circus?"
Yeah!
[Music] [Screen shows circus tents named 
"Circus1" and "Dave"]
Better still perhaps, how about this 
I want to name another circus but I'm
going to call it 'circus1'. Now that
should be no problem because it's not
saying "circus circus". It's saying 'circus'
(reserved word) and that category. 
'circus1' can only be a Variable Name.
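
The reason 'circus1' comes out as a Variable Name is that lex prefers the longest match, and falls back on rule order only when two rules match the same length. A hedged Python sketch of that selection rule follows; `next_token` and the labels are hypothetical, and the variable pattern is assumed to be letter-then-letters-or-digits.

```python
import re

RESERVED = ["circle", "line", "arrow", "spline", "box", "arc", "circus"]
RULES = [(re.compile(re.escape(w)), "Reserved Word") for w in RESERVED]
RULES.append((re.compile(r"[A-Za-z][A-Za-z0-9]*"), "Variable Name"))

def next_token(text, pos=0):
    """Return (label, lexeme) for the longest match starting at pos, or None."""
    best = None
    for pattern, label in RULES:
        m = pattern.match(text, pos)
        # A strictly longer match wins; on a tie the earlier rule is kept.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (label, m.group())
    return best

print(next_token("circus"))
print(next_token("circus1"))
print(next_token("circus circus1", 7))
```

For 'circus' both the reserved-word rule and the variable rule match six characters, so the earlier (reserved) rule wins. For 'circus1' the variable rule matches seven characters and therefore takes precedence.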
>> Sean: So it's using the space to delineate ...
>> DFB: Yes! the way
I've got it set up at the moment is - I
haven't told it to ignore spaces yet,
I've left them in. Because it serves
in the way I've got it at the moment as a 
very handy break between these various
things, which can then be analyzed
separately. This, then, is, if you like,
aligning with the history of 'lex' and
regular expressions. It's that Mike Lesk
put them in this front end to enable you
to do Reserved Words, variables, all sorts
of things like that, but historically
they then migrated out into things that
have nothing to do with compilers. Many
of you will have heard of UNIX 'awk' and
that was the great grandaddy of all
sorts of things that you're more
familiar with like Perl, PHP, Python and
so on. Awk's characteristic was that it
just did regex pattern matches, then
actions, there was no context. It was
interpretive.  'awk' - you gave it the thing
to do, it comes straight back at you. You
didn't have to recompile it every time.
So here's the first beginnings of what
we need for a longer example. We've got
the ability to take a choice of characters, in
any combination, "zero or more of ...", to
name variables. Fixed sets of characters of 
[a] certain variety like 'circle', 'line', 'box',
are dealt with first. So I think the
thing to take away from this is that in
programs like 'awk' and 'lex' you've got to
remember that the various possibilities
you give will be done in that
order. It's ... you've got to imagine that
between the lines there's almost an OR
operation. You start up at the top. You
say it will either match 'circle' or it
will match 'box' or it will match 'line' or
it will match 'spline' or it will match
'arrow' and so it goes on. And then, down at
the bottom, the catch-all is: " ...  and if it
doesn't match any of those let's just
see if it could be a legal variable" And
then you just run out [of options]. And I have to
accept that if you put in a line of
punctuation, I think it would just
ECHO it back at me and not do anything
with it. Let's just see "£$ .
It just echoes back, it takes no action.
It just says: "I don't know what that is".
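
The echoing seen here is lex's default: text that no rule matches is copied to the output unchanged. A minimal Python sketch of that fall-through follows, with only one rule (the variable-name pattern) and a made-up output format; the `scan` function is hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def scan(text):
    """Label anything matching WORD; copy every other character through as is."""
    out = []
    pos = 0
    while pos < len(text):
        m = WORD.match(text, pos)
        if m:
            out.append(f"[Variable Name: {m.group()}]")
            pos = m.end()
        else:
            out.append(text[pos])  # no rule applies: echo the character
            pos += 1
    return "".join(out)

print(scan('"£$'))
print(scan("bob $ dave"))
```

Punctuation and spaces pass straight through, which is also why a space works as a separator between words in the session above.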
I think this has set us up now, I hope, into
being able to do a longer example than this.
But to me, at least, Regular
Expressions come into their own for this
kind of thing - one-liners to name things,
you know match a pattern, do that with it,
- all over one line. They're not all that
well suited to doing very long-range, big
strategic, structure. So many of you have
said to me: "Oh! cover why regex's can't do
XML properly" Well I might get onto that.
But yes, you know, you all know XML has
got big tree-like structure. Regex's do
not, of themselves, find it easy to do those.
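
One way to see the division of labour is the following Python sketch, which works on a heavily simplified tag syntax (no attributes, no self-closing tags, not real XML). A regular expression finds the individual tags, but checking that they nest properly needs a stack, which sits outside the regex. The `TAG` pattern and `well_nested` are invented for this illustration.

```python
import re

# Matches <name> or </name>; group 1 is "/" for a closing tag, group 2 the name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")

def well_nested(text):
    """Check that every opening tag is closed in the right order."""
    stack = []
    for m in TAG.finditer(text):
        closing, name = m.group(1), m.group(2)
        if not closing:
            stack.append(name)  # remember the open tag
        elif not stack or stack.pop() != name:
            return False        # closing tag with no matching open tag
    return not stack            # nothing may be left open

print(well_nested("<a><b></b></a>"))
print(well_nested("<a><b></a></b>"))
```

The regex does the one-line job of recognizing each tag; the tree-like structure is tracked by the stack.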
>> Sean: Can I ask one question? If you had
a real circus, what would it be called? 
Would it be like "The Great Brailsfordinis" ?!
>> DFB: Wasn't there a Circus Maximus in Rome?
>> Sean: There we go, I just think he needs a
bit more of a showman's title ...
Oh Barnum & Bailey! Is that right? Do you
want me to try Barnum & Bailey ?!
