Seven-segment displays are great.
We don't see them as much
in electronics these days
because screens are a lot cheaper than they
used to be, but these used to be everywhere
from clocks to supermarket checkouts
to calculators to slot machines.
Here, they're being used as part of the Megaprocessor
at the Centre for Computing History in Cambridge.
This is the clock speed that the whole system
is running at.
Seven-segment displays are a really clever
bit of design.
Seven is the minimum number of segments required
to show every digit using straight lines.
It doesn't matter that the 4 isn't actually
how most people write a number 4,
it's close enough that we've all just got
used to it.
And seven segments plus a decimal point
means that there are eight lights here to
turn on or off, which is convenient:
computers like working with eights,
there are eight bits in a byte,
so you can store the state of those lights
in just one byte of memory.
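As a rough sketch of that one-byte idea: the bit order below (bit 0 for the top segment "a", round to bit 6 for the middle segment "g", and bit 7 for the decimal point) is one common convention and an assumption here, not something the video specifies. The names DIGIT_PATTERNS, encodeDigit and litSegments are made up for this illustration.

```javascript
// Assumed layout: bit 0 = a (top), b, c, d, e, f clockwise from there,
// bit 6 = g (middle bar), bit 7 = decimal point.
const DIGIT_PATTERNS = [0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f];
const DECIMAL_POINT = 0x80;

// Pack one digit, with or without its decimal point, into a single byte.
function encodeDigit(digit, withPoint = false) {
  const pattern = DIGIT_PATTERNS[digit];
  return withPoint ? pattern | DECIMAL_POINT : pattern;
}

// List which of the eight lights are on for a given byte.
function litSegments(byte) {
  const names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'dp'];
  return names.filter((_, bit) => (byte & (1 << bit)) !== 0);
}

console.log(encodeDigit(4).toString(2).padStart(8, '0')); // 01100110
console.log(litSegments(encodeDigit(4, true)).join(' ')); // b c f g dp
```

With this layout, an 8 with its decimal point lit is every bit set: 0xff.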
And you can use these for some letters, too.
You could show the word "Error",
or even base-16 numbers.
But there's no way you can write an
M or a W on this display.
So here is a code question for you:
what is the longest English word that you
can write on a seven-segment display?
I love questions like this because there are
so many ways to approach them.
I'm going to give one solution here because
that's the way I want to tell the story,
but I can think of a couple of others just
off the top of my head,
and there will be a dozen more that I haven't
thought of,
or maybe couldn't even think of.
To start off, we need a dictionary,
and fortunately here the work
has been done for us.
There is a public-domain list of English words
available;
I've put the link in the description.
Now, this is not a perfect list:
right at the top there are
two different spellings of 'aarrgh'
and I'm really not sure either
of those should count,
but the list is good enough for our purposes.
We can manually filter it later if there are
any strange results.
Now, I'm going to code this in JavaScript
using Node.
Not the best language, but it's
not bad for beginners,
and importantly it's easy for me to explain.
First, we're going to add
a bit of boilerplate code,
just stock stuff to get it working.
Did I know all that code off by heart?
No, of course not,
I Googled 'load array from file in node' and
I adapted some of the results.
This is basically how everyone
codes stuff like that.
Don't ever be afraid that you're not a real
programmer because you still look stuff up.
The important part about programming is not
remembering exact words or syntax:
it is breaking down a problem,
working out how to solve it,
and then fixing all the inevitable bugs in
your solution.
It's about holding lots of
complicated connections in your head,
not the exact magic words
that you need this one time.
I still forget which order to put
basic stuff in sometimes.
Anyway, the first line loads in the bits of
Node that deal with reading and writing files,
and the next line loads the entire dictionary
into a single long string called "words".
The next line converts that single string
into an array, a long list of smaller strings,
based on where the new-line characters are.
And now we have an array of all the words
in the English language:
basically just a long list of strings.
Let's see how long that array is by telling
the console to output the array's length.
Okay, more than 370,000 words, each one
a separate item in that array called “words”.
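The on-screen code isn't part of this transcript, so here is a minimal sketch of what those steps might look like. To keep it runnable anywhere, it first writes a tiny stand-in word list to a temporary folder; the file name sample_words.txt and its three words are placeholders, not the real dictionary.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for the downloaded word list, so the sketch runs
// without the real file. Swap listPath for the path to the actual list.
const listPath = path.join(os.tmpdir(), 'sample_words.txt');
fs.writeFileSync(listPath, 'aa\naah\naahed\n');

// Load the whole file into one long string called "words"...
let words = fs.readFileSync(listPath, 'utf8');

// ...then split it into an array wherever there is a new-line character.
// Allowing an optional \r and dropping empty entries are extra precautions
// added here, not something described in the narration.
words = words.split(/\r?\n/).filter((word) => word.length > 0);

console.log(words.length); // 3 for this three-word sample
```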
Next problem: we need to filter that list
and remove any words that use letters that
we can't display with seven segments.
Which gives us an interesting design problem:
which letters can't we display?
Now, I'm going to use fancy graphics here
rather than actual seven-segment displays,
but for extra credit,
you can try and figure out the After Effects
expressions that I used to make these.
Letters A through F are easy, there's almost
a standard for those.
But G is difficult.
We can't use the obvious pattern,
because that's a 6, or a 9 if it’s lowercase.
And if we use an alternate pattern for it,
it's... not really a G?
It's a C with aspirations.
But to be fair, if any of those patterns appear
at the start of a word like 'GOAL',
no-one's going to look at it and say
'oh, six-OAL'.
But I'm going to make the call that
we don't allow it.
I think I is all right, though.
Like, that's clearly an I,
it's not like the half-assed G which
sorta looked kinda like a C?
It's clear. I mean, I don't care that I'm
not applying strict rules here,
I'm just going on what feels right,
and if you disagree,
you can fix it in your version.
Other letters that I'm ruling out: K.
Just can't be done, requires a diagonal.
M: I've seen it displayed like this before,
but: no. Not having it.
N is borderline, but I'm going to allow it
because there's nothing else it could be.
Q is out: that is just a 9.
R, I'll allow if it's lowercase.
S is OK, same reasons as I.
But there's no way to do V, or W, or X.
And as for Z... no.
It needs the diagonals. Not having it.
Here's our alphabet, then.
Eighteen letters left, eight disallowed.
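That count is easy to double-check. A quick sketch, assuming lowercase letters throughout and taking the eight rejected letters to be G, K, M, Q, V, W, X and Z as ruled out above:

```javascript
const alphabet = 'abcdefghijklmnopqrstuvwxyz';
const disallowed = 'gkmqvwxz';

// Keep every letter that is not in the disallowed set.
const allowed = [...alphabet].filter((letter) => !disallowed.includes(letter));

console.log(allowed.join(''));                  // abcdefhijlnoprstuy
console.log(allowed.length, disallowed.length); // 18 8
```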
That's actually more than I'd expected
left in there.
And I'm going to cheat.
I know this is called the Basics,
but doing this the long way
would be really dull,
so I'm going to put those disallowed letters
into something called a regular expression,
or a regex. Or "reg-ex", whichever.
Those slashes indicate that it’s a regex,
and whatever's inside those slashes is like
a test that a string can match against.
So this regex would match any word
with an X in it,
whether that X is at the
start or middle or end.
As long as there's an X somewhere in the string,
it passes that test.
If we put all our disallowed letters in,
then surround them with square brackets so
they're treated as a class,
this regex will match any string, any word,
with any of those letters anywhere in it.
If a string matches this, we cannot use it.
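Here is a small sketch of those two patterns side by side. The variable names are made up for this illustration, and it uses the regex's own test method purely because it gives a plain true or false; it also assumes the words are all lowercase.

```javascript
// Matches any string with an x anywhere in it.
const containsX = /x/;

// Square brackets make a class: any ONE of these letters is enough to match.
const containsDisallowed = /[gkmqvwxz]/;

console.log(containsX.test('box'));            // true
console.log(containsX.test('error'));          // false
console.log(containsDisallowed.test('goal'));  // true, because of the g
console.log(containsDisallowed.test('error')); // false, so it is usable
```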
Regular expressions are a heck of a lot
more complicated than this,
and they can boggle even experienced
programmers' minds,
but using one here will save me about
five minutes of really dull script later on.
The good news is: we can now just use the
function 'match'
to test a string against this regular expression,
which I'm calling badLetters,
and it'll tell us whether there are
any bad letters in there.
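A sketch of that test, assuming lowercase words. A string's match method returns null when nothing matches and an array describing the match when something does, so it can be used directly as a yes-or-no check. The helper name hasBadLetters is just for this example.

```javascript
const badLetters = /[gkmqvwxz]/;

// null means "no bad letters found"; anything else means at least one.
function hasBadLetters(word) {
  return word.match(badLetters) !== null;
}

console.log(hasBadLetters('goal'));        // true
console.log(hasBadLetters('error'));       // false
console.log('zebra'.match(badLetters)[0]); // z (the first bad letter found)
```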
So how do we filter the array?
One of the important trade-offs here is between
code that is fast
and code that everyone can understand.
This is not going to be an efficient and fast
approach.
But because we're running it on a modern PC,
at the command line,
and we don't mind waiting a fraction of a
second after telling it to go, that's no big deal.
It's more important that I can show the code
and explain it,
and look at it again later
and understand it.
But imagine if something like this had to
be run on the scale of YouTube or Google,
running millions or billions of times a day.
Every minor improvement you could make
would be worth it.
At some point, I should do a video about
Big O notation, but now is not the time.
Here, where we're just writing code to find
the answer to one simple question,
don't worry about it.
Sure, there are more elegant solutions.
But this is easy to explain.
I'm going to declare an empty string:
longestAcceptableWord.
Then I'm going to tell the code to start testing
every word in the array.
This line will run the code in between
those brackets once for each word,
and on each run through, this variable,
testWord,
will be the next word in the list.
First question: is the word we're looking
at shorter than, or the same length as,
the current longest acceptable word?
If it is, then we can just ignore it,
we know it’s not longer,
it's of no use to us,
and we can just say 'continue'.
That 'continue' skips the rest of the loop,
and kicks us back to the start
with the next word.
Some people hate 'continue',
they think it shouldn't be used anywhere
because it can cause confusion.
And, yeah, it can,
but I reckon for things like this it's fine.
Anyway, this'll save a lot of processing time,
because after the first few words,
we're not even going to bother to analyse
the short ones,
we'll just ignore them.
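To see how much work that first check saves, here is a sketch of just the length test on its own, with a counter added to show how many words get skipped. The for...of loop is a guess at the loop style used on screen, and the short word list, longestSoFar and skipped are invented for this example.

```javascript
const words = ['be', 'at', 'bee', 'tea', 'beet', 'of'];

let longestSoFar = '';
let skipped = 0;

for (const testWord of words) {
  // Not longer than what we already have: skip the rest of the loop body.
  if (testWord.length <= longestSoFar.length) {
    skipped += 1;
    continue;
  }
  // Only words that survived the check above ever reach this line.
  longestSoFar = testWord;
}

console.log(longestSoFar, skipped); // beet 3
```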
So let's say we've got to this point
in the code,
it's a new, longer possible word: is it acceptable?
Here we can use our
regular expression from earlier.
Does the word match our test for bad letters?
If it does, it's not acceptable,
ignore it, ignore the rest of the loop,
continue on to the next word in the list.
But if it’s passed both these tests,
if we’ve got to this point,
then we know that our word is longer than
anything that we've accepted before
and it has no bad letters in it.
So we change the longest acceptable word to
be our new word
and we start over again with the next one
in the array.
And when we're done, longestAcceptableWord
will be the longest acceptable word.
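Putting those steps together, the loop might look something like this. It is a sketch reconstructed from the description rather than the exact on-screen code, and it runs on a handful of made-up sample words instead of the full dictionary.

```javascript
const badLetters = /[gkmqvwxz]/;

// Small sample list; in the real script this would be the full words array.
const words = ['error', 'goal', 'supermarket', 'calculator', 'dictionary', 'display'];

let longestAcceptableWord = '';

for (const testWord of words) {
  // Shorter than, or the same length as, the current best: ignore it.
  if (testWord.length <= longestAcceptableWord.length) {
    continue;
  }
  // Longer, but it contains a letter we can't display: ignore it.
  if (testWord.match(badLetters)) {
    continue;
  }
  // Passed both tests, so this is the new longest acceptable word.
  longestAcceptableWord = testWord;
}

console.log(longestAcceptableWord); // calculator
```

In this sample, "supermarket" is longer but is thrown out for its M and K, and "dictionary" is skipped because it is the same length as "calculator", which came first.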
We tell the code to write that out, and the
result is...
Huh. That's actually the longest word
in the list anyway.
You know what?
I'm going to rule out I and O.
They are just numbers with aspirations.
Fortunately, we've made the code easy to edit,
so we can just make that change, and...
Sure, fine, that'll do.
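One way to make that kind of rule change painless is to pass the regex in as a parameter, so the same loop can be rerun under different rules. This is a sketch with a hypothetical helper name, findLongest, and a made-up sample list; it says nothing about what the real dictionary returns.

```javascript
// Same loop as before, wrapped so the rules can be swapped in and out.
function findLongest(words, badLetters) {
  let longest = '';
  for (const testWord of words) {
    if (testWord.length <= longest.length) continue;
    if (testWord.match(badLetters)) continue;
    longest = testWord;
  }
  return longest;
}

const sampleWords = ['error', 'calculator', 'start', 'bytes'];

console.log(findLongest(sampleWords, /[gkmqvwxz]/));   // calculator
console.log(findLongest(sampleWords, /[gkmqvwxzio]/)); // start (I and O now banned)
```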
The lesson here, I guess,
is that sometimes you might write code to
find something out
and the answer is really unsatisfying.
Of course, there is one thing
we're missing here.
Our test was only checking for words longer
than the current acceptable one.
What happens if there was another
acceptable word of the same length?
We'd have ignored it.
There could be multiple correct answers.
But I'll leave checking that up to you.
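If you'd like something to compare your own attempt against, here is one possible approach among many: keep an array of joint-longest words instead of a single string. The name longestAcceptableWords and the sample list are invented for this sketch.

```javascript
const badLetters = /[gkmqvwxz]/;
const words = ['error', 'calculator', 'dictionary', 'supermarket', 'display'];

let longestAcceptableWords = [];

for (const testWord of words) {
  const currentLength =
    longestAcceptableWords.length > 0 ? longestAcceptableWords[0].length : 0;

  // Only strictly shorter words are skipped now; ties are allowed through.
  if (testWord.length < currentLength) continue;
  if (testWord.match(badLetters)) continue;

  if (testWord.length > currentLength) {
    // A new record: throw away the old list and start again.
    longestAcceptableWords = [testWord];
  } else {
    // Same length as the current record: add it alongside.
    longestAcceptableWords.push(testWord);
  }
}

console.log(longestAcceptableWords.join(', ')); // calculator, dictionary
```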
Thank you very much to the
Centre for Computing History in Cambridge,
for lending me their space and their Megaprocessor.
Thank you also to all my proofreading team
who made sure I got the script right.
