So today I'm going
to talk about stories
every developer should know.
This came about because
Mark Richards and I (Mark
is a well-known
software architect)
do these workshops and
training classes on software
architecture fundamentals,
where we're talking to brand
new software architects,
and all these stories
keep coming up
because we keep saying, well the
reason we have elastic scale
is because of pets.com
and you know about pets.com
right?
And nobody knows
about pets.com. And so
that's one of the interesting
things about our field:
so many interesting
things are constantly
coming at you from the
future that you often
ignore really important things
that happened in the past.
But sometimes those lessons
are really important.
And that's mostly what
this talk is about.
And there's something
about stories
that is much more evocative
than just plain facts.
You can tell people
facts about something.
But if you tell a really good
story with a good punchline
you can move people a
lot more, because we're
used to hearing stories. Ever
since you were a little baby,
you heard stories with
a beginning, a middle,
and an end.
And so sometimes
this helps convey
messages better
than just facts.
A lot of these
stories end up
coming from the realm of
forensic engineering, which
is basically a
field of engineering
that looks at things
that break spectacularly
and why. I got really fascinated
with this kind of engineering
for a little while before
I realized that only
a few people get to do this.
But a lot of these are sort
of forensic engineering
from software.
But some of these are not
software projects at all.
And in fact, the first
one I want to talk about
is one of the two field trips
that I encourage developers
and architects to take.
And when I give this talk in the US,
it's a much bigger ask.
It's a much closer
trip for you folks.
So I start and end my
talk with a field trip
and the first field trip
here is to the Vasa,
and this is my
oldest story as well.
This comes from the
1600s, when Sweden
was at war with
Poland-Lithuania,
and the King of Sweden decided
he was finished with this war.
So he wanted to finish
this war by building
the most ultimate ship that
had ever been created up
until that time.
That was the Vasa.
This was a magnificent warship.
Up until that time, ships
were either gunships or
they were troop transports.
The Vasa was going to be
both. Up until that time they had
a certain size of gun;
the Vasa was going to have guns
twice that size. It was going to
be twice of everything.
This is actually a picture
of the Vasa. As I mentioned,
this is a field trip
and you can actually
go see the Vasa for
reasons I'll talk about
in just a second.
Part of the problem,
though, is that they had never
actually built a ship that
was both a troop
transport and a gunship,
with guns double the size
they had ever put on a ship
before.
And while they had the best
shipbuilders in the world,
they were really pushing
the envelope on this design.
And what happened was when
they got the ship finished,
they rolled it out into the
harbour and had a great celebration
and did a gun salute
off one side of the ship.
And because it was top heavy,
that caused it to tip over
and capsize and sink
to the bottom of the
bay there in Sweden.
Turns out Swedish water is
very cold and very anaerobic,
and that preserved the Vasa
until the 1920s, when
they raised it and put it
into a museum, now in Stockholm.
So you can go visit the
Vasa now and actually see
this great example of
requirements gone wrong.
Because that's basically
what the Vasa is:
this project where they
just kept adding requirements
and kept adding requirements.
And then you eventually
launch the thing
and it sinks to the
bottom of the ocean
somewhere. Because they had
built two-deck ships before,
and they had built
cannon ships before,
but they'd never really
built a two-deck cannon
ship and troop transport
kind of ship before.
And so this is a classic
example of trying
to scale some sort
of architecture past
where it naturally
wants to go. And this
is a really common pattern
that we see over and over
in software.
And this is the best metaphor
I've seen of this pattern
that we keep seeing
over and over.
We start with something simple
and beautiful and elegant
to solve the problem.
And it's like, OK, that's
nice, but, you know, scissors.
Every once in a while you need scissors.
But now we're good.
We're done.
It's beautiful. It does
exactly what we want.
But, you know, a
corkscrew. Every once
in a while you need a corkscrew.
But now we're really done.
We're finished. We're done.
It's good now.
It does everything we need.
It's a little
complicated, but that's
OK, because it's
solving some problems.
But, you know, a push pin. Every once in a
while, you just really need a push pin.
But now we're really, really done.
We're good.
We're sure it's finished now.
Every one of these things
always ends up with this.
We keep seeing this
pattern over and over
in software, where we
keep building things.
We start with something simple
and beautiful and elegant
and we keep adding
stuff to it to make it
useful to solve real problems.
And it gets to the point
where everybody says
it's too complicated.
Let's burn it all down and
start over from scratch.
And we've seen this
cycle several times.
We're currently in
the cycle right now
where we're burning everything
down and rewriting everything
in JavaScript. Atwood's Law:
Jeff Atwood's a famous
pundit in the software world,
and he has codified
Atwood's Law, which
I'm terrified is coming
true: that everything
that can be written
in JavaScript
will be written in JavaScript.
And I don't know if that's
the best thing or not.
But that's exactly
what we're seeing:
the cycle over and over again.
I fully expect that in
probably another two years,
people are going to
look at JavaScript code and say:
this thing has gotten too
bloated and enterprisey.
We need to burn this thing down
and start over from scratch
with something else.
So who knows what that
will be this time.
But fundamentally,
what all this is about
is misunderstanding
the trade-offs
that are at the
heart of everything
in software.
And, it turns out, in things
that are not software as well.
My second story is
another engineering project:
the Tacoma Narrows Bridge.
This was in Washington
State, at the
Kitsap Peninsula, and
this is a bridge that
opened to traffic
on July 1st, 1940
and then closed to traffic
on November 7th of that very
same year.
This is a picture of the
Tacoma Narrows Bridge.
This was a particular
bridge design
that was popular at the time.
It had a thin metal deck
supported by cables.
This is a more modern design.
It's the same kind of bridge,
just like the Golden Gate
Bridge in San Francisco:
a thin deck
supported by cables.
And the reason they
don't build bridges
like that much anymore is
because of the Tacoma Narrows
Bridge,
because it turns out
that in this gorge, when the wind
blew at a particular frequency,
it would cause a
resonance with this deck
and cause it to pitch like this.
Now the people who
lived here in 1940
thought this was the
coolest thing ever.
It was like a ride at a
fair. See this guy walking
in the middle here, riding this
thing like it was a carnival
ride or something like that.
You wouldn't get me within
a mile of this thing.
And there's a car on it
pitching back and forth wildly
in the wind.
And the reason it closed
eight months after it opened
was because one day the wind
pitched up exactly right.
And it caused the thing
to eventually collapse
into that river gorge.
This is the thing that
basically convinced
them to stop building bridges
like that, because what
happened was they kept pushing
the envelope on this design.
It's like, well, you know,
that design worked well;
can we make the plate a
little thinner on this one?
And a little thinner,
and a little thinner.
It turns out mostly you
can, until you find a river
gorge that has a certain
frequency of wind, it hits
the resonant frequency, and then
you get a disaster like this.
This is another example of
trying to scale architecture
in a particular way.
But this is taking
a proven concept
and pushing it until you find an
interesting boundary condition.
And a lot of times
our mathematics
don't handle these
boundary conditions
that we've never
encountered before.
Well, this is one
of those cases.
It turns out, contemporaneously,
there was another bridge
designer, in Sweden
again, coincidentally,
named Robert Maillart,
and he was designing bridges
with this revolutionary new
material called reinforced
concrete, and using arches.
And of course arches are a proven
way to design supporting
structures.
At the time the math couldn't
say whether these bridges were
safe or not, but he forged ahead
and actually designed and built
a lot of these
bridges. Of course,
now we have math that shows
that these are quite safe.
He also designed these pillars
that distribute weight better
and crack less than
other kinds of columns
in buildings. And the way
he was designing these things
was not necessarily with
mathematical design alone.
He started with
mathematical design,
and he was pushing the envelope,
but he was constantly verifying.
He was great at testing things.
He would build scale models and
test them under realistic load
conditions and constantly
iterate on those designs
and that's why he was able
to innovate in bridge design
without hitting those
boundary conditions
because he was always
testing to make sure
that the things he
was designing worked
the way he thought they did.
My next story is the
F-16 fighter jet. Back
in 1973, 1974,
the Air Force went to
Northrop Grumman and said,
we need a Mach 2.5 aircraft.
So they started trying to design
a Mach 2.5 aircraft, which
was really difficult at the
time, because if it was light enough
to go Mach 2.5,
it would vibrate apart and
other bad things would happen.
So they tried and
tried and tried,
and they eventually went
back to the Air Force and said,
OK, why does it need to
be Mach 2.5? The Air Force said,
well, these things
are expensive,
and so if it gets
into a dogfight,
we want it to be able to get
away in a hurry if it needs to.
So now, understanding
the actual requirements,
they went back and
designed the F-16,
which was a Mach 2.0 aircraft.
But it was the most manoeuvrable
aircraft ever designed,
and it could accelerate
really fast if it needed to.
And this is a great example
of one of the things
that we constantly have
to do, particularly
as software architects because
people keep bringing us
solutions rather than problems.
And it's very often
your job to be
the annoying three-year-old and
say: but why do you need that?
But why do you need that?
But why?
Oh, that's what you really need.
OK, now I see what the real
crunchy center of your problem
is.
And now I can actually come
up with a solution. Because
very often the solution
has been handed to you,
like: we need a Mach 2.5
aircraft. OK, from an engineering
standpoint, that's
not really possible.
Why? Why does it need to be that?
And when you get
to the root cause,
you can actually address that.
As architects in particular,
we deal with a lot of these
"-ilities" in software. Software
really consists of requirements
plus all these other things
like performance and scalability
and security, etc.
But there's one
of these -ilities
that I think is really
important, that we
need to pay more attention to
as developers and architects,
and that's feasibility:
should we even be doing
this thing or not?
Because what happens
a lot of times
is you're presented with
a problem that you suspect
may not be possible.
But you think:
well, you know, if
we work really hard,
and, you know, work
a bit of overtime,
and, you know, if nothing
unexpected comes up
on this project (because
when has that ever happened
on a software project,
something we didn't anticipate
coming up? You know, that just never
happens on software projects,
so we can safely discount
that as a possibility
on the software
project), then maybe,
if all those things come true,
we can get it done.
But you know that's
not going to happen.
Lots of things are
going to come up.
And a lot of times
you end up having
to ask why this is
more important now
than it was 10 years ago.
Because if you look at
the roles that people
had before the
continuous delivery and DevOps
revolution, architects
and developers
were very siloed off in
the engineering world.
And you had another silo
with DBAs and another silo
with operations.
But modern architectures
like microservices
have forced a lot
of these barriers
to go away, or at least get
softer, because now when
you look at things
like microservices,
you have to include things
like relational database
design, and a lot
of the superpowers
of that architecture come from
interactions with operations.
And so while your political
influence and money probably
haven't changed much,
your perspective
has changed a lot over
the last few years.
And you're actually in
a much better position now
to see if something
is in fact feasible:
should we be trying to
do this, or should we
be trying to do something
that is a lot simpler?
The next thing I want
to talk about is null.
So null was invented.
A lot of people
think that it's just a
natural part of the universe.
But it wasn't.
It was invented.
And in fact, it was
invented by this guy,
Sir Tony Hoare,
who did a talk on InfoQ
about a decade ago called "Null
References: The Billion Dollar
Mistake", because he figures
that the design of null
has probably cost the industry
about a billion dollars and counting.
And he asks that you please
don't write him a bill for that,
because he's saying that
he's sorry for doing that.
So where did this
thing come about.
How did this come
into our lives.
Well, if you go to
the O'Reilly website,
you can download this brief
history of computer programming
languages, which only
accounts for a few of them.
Let's zoom into
a particular area
here and look at the
origins of something like Java.
Now of course Java started its
life as this thing called Oak,
and you can see Oak there
in June 1991.
That was the
origin of Java.
If you look at the
progenitor languages
that lead into that, like ANSI C,
this blue line here, and C++,
this red line there,
let's go backwards
in time a little bit.
There's this language called
Cedar, which came from...
there's Pascal up
there, that's that purple line.
And that led to Modula-2,
which eventually
led to Objective-C and Java.
But if we keep going
back to the things
(see this purple line right
here) that inspired Pascal,
if we keep going back
and back and back,
what you'll end up
with is ALGOL 60,
and that's the language that Sir
Tony designed way back in 1960.
And that's the language that
introduced the idea of null
into our lexicon as
software developers.
So back at that time, there was
no such thing as open source.
And there were lots of
different computer makers.
He worked for a computer
maker named Elliott
that was based here in the UK.
And the way that
computer makers worked
was they built a piece of
proprietary hardware,
and then they would build
a proprietary operating
system for it and
a language that was
optimised for their hardware,
and they'd sell you this solution
as a package.
This is some
ALGOL 60, the language
he designed, so you
can see it's the beginning
of the structured programming
revolution. You can see
the origins of Pascal in
this, because this is one
of the progenitors of and
inspirations for Pascal.
And one of the things that
he thought about a lot
when he designed this language
was: should he build protections
into this language?
Because a really
common thing at that time
was to cheat on things
like array subscripts, and read
past arrays, and actually store
stuff, you know, before
the beginning of arrays,
and cheat and do
things like that.
He thought about adding
protections in his language,
including null pointer
checks in the language.
And he created a version of
ALGOL 60 that had null pointer
checks in it, just like we have
in Java and the .NET world now.
And he built a switch where
you could turn these checks
on or off if you wanted to.
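
To make that trade-off concrete, here is a minimal sketch in Python. It is not ALGOL 60 and not Hoare's actual implementation. It models memory as one flat list shared by two arrays, with a checks_enabled flag standing in for the on/off switch. The names Memory, Array and demo are made up for illustration.

```python
class Memory:
    """A flat block of words shared by several arrays, like raw machine memory."""

    def __init__(self, size, checks_enabled=True):
        self.words = [0] * size
        self.checks_enabled = checks_enabled  # hypothetical on/off switch


class Array:
    """A fixed-length window [base, base + length) into a Memory."""

    def __init__(self, memory, base, length):
        self.memory = memory
        self.base = base
        self.length = length

    def _address(self, index):
        # The safety check costs one extra comparison on every access.
        if self.memory.checks_enabled and not 0 <= index < self.length:
            raise IndexError(f"subscript {index} outside 0..{self.length - 1}")
        return self.base + index

    def read(self, index):
        return self.memory.words[self._address(index)]

    def write(self, index, value):
        self.memory.words[self._address(index)] = value


def demo(checks_enabled):
    memory = Memory(8, checks_enabled)
    first = Array(memory, base=0, length=4)
    second = Array(memory, base=4, length=4)
    second.write(0, 99)
    # Reading one element past the end of `first` lands inside `second`.
    return first.read(4)
```

With the switch off, demo(False) quietly returns 99, a value that belongs to the neighbouring array. With the switch on, demo(True) raises IndexError instead.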
And then he sent out a
survey to all the people
who were buying the Elliott
computer and said, hey,
would you like a much
safer programming language
for your computer? And
the overwhelming answer
was no, for two reasons.
The first reason was it slowed
things down noticeably.
In that era we're
talking
about very primitive software.
And of course, if you're
adding null checks and that kind
of stuff, it would make
a noticeable performance
difference.
But that wasn't really the
main reason they all said no.
The main reason they all said
no was because of the way the world
worked at that time. If you built a
new computer, an Elliott computer
that was running ALGOL
60, you couldn't convince
the entire world to write
everything in ALGOL 60
right away.
And so what you had to be was
backwards compatible with what was
common at the time, and
have a cross-compiler
to compile into your language,
so that you could support
all the code that was
already out there.
And the code that was
most dominant at the time
was Fortran, and in Fortran every
single sleazy trick in the book
was considered standard
when you wrote Fortran code:
cheating on array
subscripts and storing things
before the beginning of
arrays and after the end of arrays.
And it turns out that if you
added those safety checks
into ALGOL, it would compile
none of the legacy Fortran
code.
I mean zero of it.
And so it made
the computer essentially
pointless to try to sell, because
you could have all these safety
checks and, oh, by the way, they're
not applicable to all the code
that you currently have,
which is not a great sales
pitch to make. If you go
back to this languages
chart, see the line that
comes down from the top
and meets with ALGOL 58?
If you go all the way
back up there, there's
our friend Fortran,
the first language.
And this explains why we
can't have nice things, because
we're constantly
trying to be backwards
compatible with the
past, all the way back to
the first language. When you
get a null pointer exception
now in Java or .NET, it's because
those languages are trying
to be backwards compatible with
the C family of languages,
which are trying to be backwards
compatible with ALGOL,
and ALGOL
was just trying
to be backwards
compatible with the first
language, which is Fortran.
We can never get away
from all the legacy
stuff that's lying around.
This is one of the great
frustrations in software:
you never really get to start
greenfield. There's
always some legacy stuff
that you have to deal with.
Of course, the
thing he was trying
to trade off there was speed
versus safety and protection,
and obviously safety and
protection lost out.
This is true in virtually
every language that you see.
So one of the
lurking crazy legacy
things is in Java right now.
If you ever get a
chance to meet Brian Goetz,
who's basically the designer
of the Java language
right now,
If you ever want to
see the red veins
stick out in his forehead.
Just bring up the
topic of serialization
because when Java
was designed it
was assumed that every
application going forward
was going to be a three-tier
distributed application,
because that was all
the rage at the time.
So one of the design
criteria in Java was:
let's build serialization
into the core of the language.
Every single thing you
can put on the wire
and get back off of the wire.
And now, fast forward 20 years,
Brian Goetz is
trying to figure out
how to serialize a lambda,
which makes no sense whatsoever,
because there are a lot of
things it just doesn't make
sense to do that with.
But that's the legacy they've
been carrying. In fact,
there's a huge conversation
that just got kicked off
in the Java ecosystem:
what happens if we
kill serialization?
And it's a big deal, because
Brian works at Oracle,
and his number one mandate at
Oracle is: don't break Java.
His number two is
to make Java cool,
but number one is
don't break Java.
And so now he's got to
tread this very fine line.
In fact, the design of
lambdas is quite brilliant,
because it was completely
backwards compatible with all
the stuff that came
before, but still worked.
But that becomes increasingly
difficult as time goes on.
But it's really hard to judge
the long term implications
of some decisions like this.
Another great example
of that is Ada.
Now this is not
Lady Ada Lovelace,
who's generally regarded as the
first programmer in history.
This was the woman
who wrote
some code for Charles Babbage's
difference engine
that, it turns out, would
have worked if they'd
had the money to actually build
that steam-powered computer.
But her name was
used for a language
that was designed in the US
back in the 1980s. In 1975 the
Department of Defense in the US
noticed an increasing
problem that was coming up.
Remember the Elliott computers
I was talking about earlier?
Well, they had a whole bunch
of solutions like that.
The problem was, if
you're Elliott
and you're not successful,
you go out of business.
And that means that there's
no support for your hardware
or your software.
There was no such thing
as open source.
And so they kept
finding themselves
on dead-end platforms,
embedded systems, and a bunch
of other bad things.
And they also realized
they had written the code
to translate from x-y
to polar coordinates
in every single
language on earth.
And that gets less
and less valuable
with every language you have
to translate that to,
because they're having
to write everything
in so many different languages.
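
For a sense of how small the thing being rewritten over and over is, here is a minimal sketch of that x-y to polar translation in Python. The function names to_polar and to_cartesian are made up for illustration, and angles are assumed to be in radians.

```python
import math


def to_polar(x, y):
    """Return (radius, angle in radians) for the point (x, y)."""
    return math.hypot(x, y), math.atan2(y, x)


def to_cartesian(radius, angle):
    """Inverse of to_polar: return (x, y) for a radius and an angle in radians."""
    return radius * math.cos(angle), radius * math.sin(angle)
```

A few lines like these are trivial once, and pure overhead when they have to be rewritten and retested in every language an organization supports.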
And so they put together this
High Order Language Working
Group to try to figure out:
is there a single language
we could standardize on
organization-wide to solve
all these problems?
And they looked around
and couldn't find one.
So they did what all
government organizations do.
They put together a committee to
design a brand new programming
language, and that
language was called Ada.
This language fell into a
weird little crack here.
In the early to mid 70s there
was a structured programming
revolution with
Pascal and C, and then
there was a brief time,
about three or four
years, where modular
programming languages became
really popular. Modula
was one of those,
and Ada was one of them.
And then object-oriented
languages came along and swept
them all away.
That was C++ and Object
Pascal, etc., and Ada fell
into that modular programming
language era, right before
objects really became popular.
Ada was a very readable
language, very Pascal-like.
And so here's some
actual Ada code
that I wrote a long time ago.
A lot of this code
is still around.
In fact, the avionics on a
lot of aircraft are written in Ada.
The 777 avionics
code is written in Ada,
because it's deterministic.
It's really good for
multi-threaded real-time
systems, which is the reason it was
designed. And it turned out
this was a great
thing for the DOD,
because in 1983 they
were supporting more
than 450 languages, and by 1996
it was down to 37.
Almost everything
was written in Ada,
with a few exemptions for
very specialized cases.
But this turned out to be a
really good thing for them.
But then in the early 2000s
they started moving
away from Ada,
and in fact, they've
stopped standardizing on Ada
now, for a couple of reasons.
One is the rise of commercial
off-the-shelf software
and the idea of open
source, and the ability
to take on something once
it has reached end of life. Back
in the 70s that was not common
at all. If a company went out
of business,
they just took all their
toys and went home.
But now it's more common
for things to last.
It's common for things to
be based on common operating
systems like Linux, and
companies open source things
when they go away, and
you can work out deals now
to get that code.
But the main reason
they ended up
moving away from
the Ada strategy
was the ultimate problem that
some companies find themselves
in, which is: if you
build everything,
then you own everything.
I did some consulting work
for a finance company
outside of New York.
And they were
very, very paranoid
about open source and
any of their source code
getting out of their company.
So paranoid, in fact,
that they had taken the USB
slots on the desktop
computers and filled them
in with super glue,
so that nobody could
plug in a USB drive.
So we were actually there to
do a code review and assessment,
and it took over a week for them
to get us a copy of their code
in a way that we could
read it, because everything
was locked down. All the
CD writers were turned off,
all the USB slots.
So one of the managers
literally went home
and broke the rules
of the company
and burned it onto
a CD-ROM for us,
so that we could actually read
the code in this organization.
And they were so paranoid
about using open source
that they had built their
own application server.
They built their
own web framework
that was sort of like
Struts but not really.
They built their
own CORBA ORB.
They built their own...
At one point I said, did
you build your own operating
system?
And they said, no, we're using one
of those from somewhere else.
I said, why not?
You built everything
else from scratch.
The reason we were there
consulting with them,
I'm not kidding:
they were thinking about
building their own IDE.
And we told them, OK, stop.
You've got to just stop now,
because you can download
Eclipse, you know.
It takes like 15
minutes to download it.
You're talking man-decades' worth
of effort to create something
like this. 15
minutes versus this.
But the fundamental problem
they were running into,
and a lot of people
run into this problem,
is when you're on the
cutting edge of something.
Like, for example, those guys
built
their terrible version of
Struts before Struts came out.
So they had their own craptacular
version of Struts.
And then Struts came
out, and it was 20%
different from their version.
So now you have to
make the decision:
do we stay with our own internal
craptacular version of this,
or just switch over to
the open source one?
And they always decided, let's
keep with our internal version.
Well, now fast forward 10 years.
The best developers
in the organization
are doing nothing but
full time maintenance
mode on your own internal
broken crappy code,
because your code is nowhere
near as good as the Struts
code, because there are
literally thousands
of people working on that.
And you've got two people
working as hard as they can
in the basement of your
organization trying to keep up
with the rest of the world.
And that's exactly what
the DOD discovered.
OK, well, now the
army needs to be
able to support things
like TCP/IP. That means we
have to build TCP/IP in Ada.
Oh, man.
Really? Can't we just download
that somewhere and get on
with it?
I worked for a company that
had a bunch of mainframe stuff.
And we said, yeah,
we really want SOAP.
This was when SOAP
was really popular.
We need to make SOAP
calls on the mainframe.
The estimate was it would take
nine months to support that.
Nine months?
And they said,
yeah, we've got to write TCP/IP
for the mainframe.
So let's reinvent everything.
But this is the
problem that you run
into with standardization.
Standardization
is obviously good,
but once you get
to a certain point,
one of those trade-offs
shifts to something bad,
and it's really hard to tell
exactly when that happens.
So the next one
is the Ariane 5.
This is a commercial rocket that
takes satellites up into space.
And this was in 1996,
the 4th of June in 1996.
And this is one
of their launches
and you'll see that this
launch actually goes
tragically bad at 39 seconds.
This is an unmanned
spacecraft, and it
cost millions of dollars'
worth of satellite equipment,
but no one was harmed
during this.
You'll see that 39 seconds
into this launch the rocket
veers violently off
to one side, and that
causes an internal self-destruct
mechanism to trigger,
because if it ever goes off
course enough it will automatically
self-destruct, and that's what
we're about to see happen.
There's the veer, and there's
the automatic self-destruct,
which destroyed this rocket. And
of course, after this happened,
they did a forensic
analysis:
why did this happen?
And they learned
exactly why it happened.
So this is the Ariane 5.
Anyone want to guess
what came before the Ariane 5?
It is not a hard question.
The Ariane 4, exactly.
And when they designed
the Ariane 5,
they said, well, you
know, one of the things
we're going to need for the Ariane 5
is a guidance system.
Somebody said,
oh, no problem.
We've got one of those. We had
one for the Ariane 4. Check.
Good, we're good on the
guidance system part.
So we're good to go.
The only problem was
the guidance system
in the Ariane 4 was
16-bit, and the flight data
recorder on the Ariane 5 was 64-bit.
It turns out that 39
seconds into that flight
in 1996, the flight
data recorder
crammed a 64-bit number
into the guidance system
that was 16-bit.
It turns out some of those
bits might have been important.
And that's exactly
what caused the rocket
to crash: this
overflow error when they
tried to cram in too much data.
It got a crazy invalid reading.
That's what caused the
flight correction, which
caused the auto-destruct.
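
Here is a rough sketch, in Python, of what happens when a wide value is stored in a 16-bit signed field. It only illustrates the overflow as described here; it is not the Ariane flight code, and the function names are hypothetical.

```python
def store_in_int16(value):
    """Keep only the low 16 bits of value and reinterpret them as signed."""
    low_bits = value & 0xFFFF
    # Values with the top bit set are negative in two's complement.
    return low_bits - 0x10000 if low_bits >= 0x8000 else low_bits


def store_in_int16_checked(value):
    """Refuse values that do not fit in a signed 16-bit field."""
    if not -0x8000 <= value <= 0x7FFF:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return value
```

store_in_int16(32767) comes back unchanged, but store_in_int16(32768) comes back as -32768, which is the kind of crazy invalid reading in question. The checked version raises OverflowError instead of handing back garbage.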
And so the obvious problem here
is the guidance system being 16-bit,
because version 5 is
fashioned on version 4
and they're trying to reuse
this code, because code reuse is
such a benefit
on so many projects.
Here's the really
frustrating thing
about this particular
case. It turns out
that they actually had
two different guidance
systems in the Ariane 5, and
the old one from the Ariane 4
was only used on the ground, to
tell where the rocket was while
they were doing all
the moving around
and all that sort of stuff.
The problem was, though,
that when they first
put it in the rocket,
it would crash a lot
because it was buggy,
and it took a really long time
to restart that guidance code.
And so they just put a
hack in that said,
well, just keep that guidance
code for the ground running
for the first 40
seconds of flight
and then have it
automatically cut off,
so we won't have to incur
that reboot cost every time
while we're testing
it on the ground.
And of course, they forgot
that code was in there,
and 39 seconds into the flight the
flight data recorder crammed
this number into this
ground-based guidance system that
was never expecting that.
So basically what
these guys were doing
is debugging in
production, and it
caused them to lose millions
of dollars' worth of satellite
equipment because of this first
40 seconds of flight thing.
So this is a combination
of several cascading
problems: debugging
in production,
using too much legacy stuff,
but also another problem that I refer
to as abstraction distraction.
This attacks us all the time
and is getting worse and worse.
Here's what I mean by that.
If you look at a common
technology stack from 2005,
you could change the
names on the boxes,
but the general
topology is basically
that. If you look at the similar
lines-and-boxes view in 2016
or 2019, it's way more
complicated, with way more moving
parts and way more
abstractions, and that's
the problem we run
into as developers,
because all these things
are based on abstractions
which work almost all the time.
Almost all the time.
But that's the problem.
You trust these things
to work all the time
and you get really surprised and
annoyed when they don't work.
And what happens is some
weird little problem
will occur at the bottom
of this abstraction stack
and then bubble all
the way up to the front
and then this thing pops up in
your face with a 10,000-line
stack trace. It's like,
where did you come from,
and what primordial
swamp am I going
to crawl into to kill you, to
figure out how to get back
to what I was doing before?
The classic example of this
is something we fortunately
don't see much anymore,
which was this. Because
you'll be working along
in your files and folders and
nested hierarchy and all that,
and you see this, and your
first thought is: how much have I
lost? Because all those
abstractions just went away,
and now you're back down to
zeros and ones on a little
spinning rust disk and
trying to figure out
how to put those things
back together again.
This is a problem: as we
keep building more and more
sophisticated software, we keep
building it on abstractions
that almost always work,
but then when they don't,
it becomes more and more
difficult over time
to debug those things.
And of course, this
can never be a problem
because you can
always safely assume
that an int is always exactly the
same size across all platforms
ever.
Because who has ever
gotten in trouble
making that assumption.
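To make that sarcasm concrete, here is a minimal sketch in Python (not from the talk) using the standard `struct` module. It contrasts native C type sizes, which depend on the platform and compiler, with the module's fixed "standard" sizes. The variable names are just illustrative.

```python
import struct

# Native mode ("@") reports sizes as the local C compiler sees them,
# so these values can differ from one platform to another.
native_long = struct.calcsize("@l")
native_pointer = struct.calcsize("@P")

# Standard mode ("=") uses sizes fixed by the struct module itself,
# independent of the platform the code runs on.
standard_int = struct.calcsize("=i")        # fixed at 4 bytes
standard_long = struct.calcsize("=l")       # fixed at 4 bytes
standard_long_long = struct.calcsize("=q")  # fixed at 8 bytes

print("native long:", native_long, "bytes")
print("native pointer:", native_pointer, "bytes")
print("standard int/long/long long:",
      standard_int, standard_long, standard_long_long)
```

On many 64-bit Unix systems the native `long` is 8 bytes, while on 64-bit Windows it is typically 4, which is exactly the kind of leak the abstraction hides until it doesn't.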
My next one is the
one I brought up before, pets.com.
These next two are actually
sort of a pair,
because they have
opposite problems.
So there's probably
no chance any of you here
remember pets.com.
Even people in the US
who were alive in this period
don't remember the name, but
some people remember this.
This is from 1999.
This is the mascot
for pets.com.
1999 this is when
Amazon was just getting
started and pets.com
came out and they
were going to be this
massive online superstore
for pet supplies and had a
massive marketing campaign
with a sock puppet
and it worked.
This thing was everywhere on
TV and the big football game
ads and all that stuff.
And they became
super, super popular.
The problem was they
spent a lot of money
on their marketing
campaign, not so much
money on their infrastructure.
Back in 1999 elastic scale
wasn't a configuration
setting on your cloud provider
because none of those things
existed.
If you really wanted
a scalable solution
you had to buy hardware and you
had to buy a lot of hardware.
Pretty expensive hardware.
It turns out these guys did not,
and their marketing campaign
was great.
And so something
happened that nobody really
thought was possible: they
were so successful that it
destroyed them.
And you know, in the real world,
it's hard for enough people
to show up at your
brick-and-mortar store
to actually reduce it to rubble,
because the laws of physics
kind of prevent
that from happening,
you know, the size of doorways
and that kind of thing.
But on the internet
that's not true.
You can have enough
people show up at your site
to completely destroy it.
And that's exactly what
happened to these guys.
But here's the fundamental
problem they ran into.
You build an architecture
that looks like this.
And then your web
server gets slammed.
So what do you do? You
add more web servers. Then
your app server gets slammed.
So what do you do?
You add more app servers. Then your
database server gets slammed.
What do you do?
You know the answer there is
you add more database servers.
And they get slammed.
What do you do?
We've run out of
places to add stuff.
This is exactly the
problem Amazon ran into
as well: we ran
out of places to add more
stuff to our infrastructure.
Now, this actually kicked
off an architecture
that was popular in the late
90s and early 2000s, which
is very contemporaneous
with pets.com, called
space-based architecture.
This is based on this
mathematical theory called
tuple space.
And the idea of a space-based
architecture was you
could build an architecture
that can withstand incredible levels
of scale by splitting these
things out and basically
consolidating all those
results back together,
eventual consistency style.
These things turned
into big giant caches.
So Coherence started its life as
one of these things and became
what we know as
Coherence now. Hazelcast
is an example of one of these
big giant distributed caches.
But it's a way of handling
this kind of elastic scale
before we had the
cloud providers,
where it's a configuration
setting to get elasticity.
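For anyone who hasn't seen a tuple space, here is a minimal in-memory sketch in Python of the core idea: processes share data by writing tuples into a common space and reading or taking them back out by pattern. The class and method names (`TupleSpace`, `out`, `rd`, `take`) are assumptions for illustration. The classic Linda operations block until a match appears; this simplified version just returns `None`, and it is not how Coherence or Hazelcast are actually implemented.

```python
import threading

class TupleSpace:
    """Toy, single-process tuple space with non-blocking operations."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()

    def out(self, tup):
        # Write a tuple into the shared space.
        with self._lock:
            self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(pattern, tup):
        # None in the pattern acts as a wildcard for that position.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def rd(self, pattern):
        # Read the first matching tuple without removing it.
        with self._lock:
            for tup in self._tuples:
                if self._matches(pattern, tup):
                    return tup
        return None

    def take(self, pattern):
        # Remove and return the first matching tuple.
        with self._lock:
            for index, tup in enumerate(self._tuples):
                if self._matches(pattern, tup):
                    return self._tuples.pop(index)
        return None

space = TupleSpace()
space.out(("order", 1, "dog food"))
space.out(("order", 2, "cat toy"))

first = space.take(("order", None, None))      # a worker claims an order
remaining = space.rd(("order", None, None))    # another worker peeks
missing = space.take(("invoice", None, None))  # nothing matches
```

Because workers only coordinate through the space, you can add more workers without any of them knowing about each other, which is where the scalability comes from.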
So pets.com way underinvested
in infrastructure. Webvan way
overinvested in infrastructure.
This is a grocery
delivery service in the US
that spawned in the
early 2000s and they
built these enormous
warehouses everywhere.
They didn't rely on
existing grocery stores.
They built their own
infrastructure everywhere.
And they also had a really
bad plan for scaling
because they started on
the west coast of the US
and then tried to move to
the middle of the country.
And then the south and
then the dead middle.
You know if you're moving
perishable things around maybe
you want to keep things
close together not as far
apart as possible.
And so they had
the problem of way
too much infrastructure
before they
could support the kind of
scale that they needed.
My next story is a cautionary
tale about lazy engineering.
This is the story
of Knight Capital.
If you ever want to find a good
reference to this story just
Google the phrase
"bankrupt in 45 minutes",
and you'll find the
story of Knight Capital.
This is a great example of
cascading indiscipline causing
problems in software projects.
So Knight Capital was a commodities
trading firm in the US.
And the SEC
in the US releases guidelines
for algorithmic trading,
these formats that
you're supposed
to follow for
algorithmic trading. A new one
was coming out
that was going to go into
effect in 2012, called SMARS,
and that's an
acronym for something
I don't remember.
They had implemented several
of these algorithmic
trading schemes before.
And there was an old
one in their code base
called Power Peg that was
still in their code base
from years ago, underneath
a feature toggle that
had just been left off
for all those years.
So this is strike one: never
leave old feature toggles
lying around
in your code base.
But they did.
And so one of the
developers, who I
will nominate as maybe
the laziest developer
that I know of,
suggested that when
they implement the
new SMARS code,
why don't we implement that
underneath the old Power Peg
feature toggle? Because, you
know, if we create a new
SMARS feature toggle, we've
got to figure out something
to name the SMARS feature
toggle. You know,
naming is so hard in software. What
could we call the SMARS feature toggle?
Uhhh,
I don't know.
Let's just reuse the Power Peg
one. That seems like the safest
thing to do.
So that's what they did.
That is strike two:
never, ever reuse an old
feature toggle. But they did.
And then they deployed
the brand new code
to seven of their eight servers.
Now, I don't know what happened
between seven and eight.
Maybe their
operations person got
distracted by something shiny
and it just never happened.
But for whatever reason,
this is the setup
when they went live
on August 1st 2012
and when they went live it
turns out seven of the servers
are doing SMARS
and one is doing Power Peg.
It turns out that
SMARS is selling.
Power Peg is buying and they looked
at this and said, wait a minute.
That's not supposed
to be happening.
That's bad.
That's not supposed to
be happening like that.
What's causing that.
It's got to be the
new SMARS code.
We screwed up something
in the new SMARS code.
Get that SMARS code
off of all the servers.
And so they hot
undeployed the SMARS code
from all the servers,
but left the Power Peg
feature toggle turned on.
And so now they're all doing
Power Peg. At about 45 minutes
they pulled the plugs
out of the wall
on the servers to make it stop,
and had a venture capital
firm write them a check for
a little over $400 million
to keep them solvent,
because that's how much
they had racked up in sales
that they couldn't support
over that period of time.
This is a great example
of really lazy engineering
practice.
But it's a great example too
of what I call bad variability.
So variability on software
projects requires effort.
And this is a bad
kind of variability,
because if all their machines
had been exactly the same,
this could never have happened.
But allowing that
variability to creep in
allowed this to happen.
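Here is a small sketch in Python of how a reused toggle plus a mixed fleet produces that outcome. It is a toy model of the story as told here, not Knight Capital's actual code. The names `Server`, `behavior`, and `fleet_is_uniform` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    code_version: str  # "old" still contains Power Peg, "new" contains SMARS

    def behavior(self, reused_flag_on: bool) -> str:
        # The same flag means different things depending on deployed code.
        if not reused_flag_on:
            return "idle"
        return "SMARS" if self.code_version == "new" else "Power Peg"

def fleet_is_uniform(servers) -> bool:
    # A cheap pre-flight check: every server runs the same code version.
    return len({s.code_version for s in servers}) == 1

def behaviors(servers, flag_on: bool) -> dict:
    return {s.name: s.behavior(flag_on) for s in servers}

# Seven servers got the new code, the eighth was missed.
fleet = [Server(f"srv{i}", "new") for i in range(1, 8)]
fleet.append(Server("srv8", "old"))
live = behaviors(fleet, flag_on=True)

# The "fix": roll every server back to old code but leave the flag on.
rolled_back = [Server(s.name, "old") for s in fleet]
after_rollback = behaviors(rolled_back, flag_on=True)

print(fleet_is_uniform(fleet))        # the mixed fleet fails the check
print(live["srv8"])                   # the one server nobody updated
print(set(after_rollback.values()))   # what the whole fleet does now
```

A check like `fleet_is_uniform` run before flipping the flag, or simply a fresh flag name, would each have broken the chain on its own.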
And so this sort of
begs the question,
do you even know
what DevOps means?
And of course, this became
what was called a DevOps
cautionary tale.
But it's also good
advice to clean up
some of that technical debt.
My next one is
really ultimately,
the motivation for doing this
talk because every developer
should know about the
San Francisco project
and not nearly enough
people know about it.
Back in the late 1990s,
IBM looked around and said,
you know what?
We've seen a lot of
business applications.
They look pretty much all the same to us.
So we're going to build one
last business application.
And it's going to be
called the San Francisco
Project, because
we've looked around
and we've noticed every company
seems to have this thing called
general ledger.
So we're going to build
one last general ledger
module that will accommodate
every company on Earth.
And the idea is you just
download the general ledger
module and just tweak and
tweak and tweak and tweak
some properties,
and, if you will, add some custom
code, tweaking it,
and before long you've
modeled your business exactly.
That was the vision of
the San Francisco project.
They released these
design documents.
Here's some of the
design stuff here,
with these core
business processes:
general ledger, accounts
payable, accounts
receivable, warehouse
management, order management,
because every company
has this stuff, right?
So we'll just build one last
version of all those things.
And so for a while,
this was the largest
Java project on Earth.
It employed more Java
developers than anybody else.
And they actually released two
modules of the San Francisco
project before they realized
that it was the dumbest project
anybody had ever
tried to build, ever,
because they were trying
to chase this idea
of ultimate canonicality,
the single source
of truth, ultimate reuse across
everything in their world.
And it turns out it
was a huge disaster.
And so they started with
the foundation utilities
and these common
business objects. In 1997,
that was the original design.
You move forward into 2000.
They decided at this point,
they've shipped some stuff
and said, you know what?
That was a bad idea. Because
several people downloaded
the first versions,
and they said, OK, who are
going to be the first
people to use it?
And they said, well, it's
missing this and this and this
and this.
And this doesn't do this right.
It doesn't.
And they're like,
oh, wow,
this is really complicated.
Whoever imagined that?
How shocking.
And so by this point,
they had decided
that this is not going
to work and they're
going to abandon this project.
But they've written
so much code.
There's just so much code there.
You can't just throw it away.
You can't just throw it away.
There's just so much.
I mean, you just you can't
throw that away.
We've got to do
something with it.
But what do we do with it.
Well, all that code.
Well, it turned into
something you may
have heard of called EJB.
That was all the San Francisco
common business objects
all the crazy amount of
indirection and interfaces
and all that stuff.
In EJB, that all came from
the original design of the San
Francisco Project.
Lest you think this
wasn't a big deal.
There were books written about
this: "San Francisco Component
Framework: An
Introduction", which
has one five-star review on
Amazon, which I've got to think
is the author's mother,
because it's hard to believe
that this is the book that
finally just lit somebody up.
It's like, this is the book
I've been waiting for.
This was a big deal, because you
notice the foreword of this one is
by Martin Fowler.
So this was a big deal at the
time when it was created.
And I actually have
one of these books.
And it's kind of
interesting to look
at the way they were trying to
design this, because if you're
really trying to design
something to last a long time,
their design probably
would not have lasted.
But it's a great
example of something
that we see a lot in software.
And we codified this in the
evolutionary architecture book
as what we call
the last 10% rule.
I used to work for a company
many years ago that
built applications
in Microsoft Access.
And we ended up shutting down
that part of the business
because we realized every
Access project starts
as a booming success and
ends in total failure.
And we didn't understand
the dynamic of why that
kept happening over and over again.
And we codified this
as the last 10% rule.
So if you look at
what the user wants.
If you use a tool like
Access about 80% of what
they want is super
fast and easy.
It's really amazing how
fast you can get there.
The next 10% of what
they want is possible
but difficult, because
now you're trying
to do something that the
framework doesn't quite
want you to do.
But you can kind of hack it
and bend it and figure out
a way around it.
But that last 10% is impossible.
And it turns out users always
want 100% of what they want,
and they're never satisfied with
some random 90% of that.
And that's why every Access
project ended up in failure.
How many of you in the room
remember the fourth-generation
languages, the 4GLs, that
were all the rage in the 1990s?
So maybe you have enough gray
hair to remember 4GLs.
Where are they now? They're
all gone, because they all
suffered from this last 10% trap.
What about something
that suffers from this right
now? To me, all the serverless
stuff looks exactly
like this, because it's a
great way to prototype.
Look how fast you
can get stuff done
on somebody else's computer.
You don't have to
worry about it.
And then you try to
put it in production.
It's like, oh, latency.
Oh, startup. Oh.
And I've seen horrible hacks
where we'll build one Lambda to
constantly ping
this other one to
keep it alive,
so I don't have the startup cost.
That's a classic
last 10% problem. Now,
a lot of those platforms will
eventually probably build out
to full-blown platforms.
But right now they
still have very much of
this characteristic.
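The keep-alive hack can be sketched as a toy simulation in Python. This does not call any real cloud API. `SimulatedFunction`, the five-minute idle timeout, and the latency numbers are all made-up assumptions, chosen only to show why pinging hides the cold start.

```python
class SimulatedFunction:
    """Toy model of a serverless function that goes cold after being idle."""

    def __init__(self, idle_timeout_s=300.0, cold_start_s=2.0, warm_s=0.05):
        self.idle_timeout_s = idle_timeout_s  # assumed, not a real platform value
        self.cold_start_s = cold_start_s      # assumed cold-start latency
        self.warm_s = warm_s                  # assumed warm latency
        self.last_invoked_at = None

    def invoke(self, now_s):
        # Cold if never invoked, or idle for longer than the timeout.
        is_cold = (
            self.last_invoked_at is None
            or now_s - self.last_invoked_at > self.idle_timeout_s
        )
        self.last_invoked_at = now_s
        return self.cold_start_s if is_cold else self.warm_s

def latency_of_request(ping_interval_s, request_at_s):
    """Latency a real user sees at request_at_s, with or without pings."""
    fn = SimulatedFunction()
    fn.invoke(0.0)  # the very first invocation is always cold
    if ping_interval_s is not None:
        t = ping_interval_s
        while t < request_at_s:
            fn.invoke(t)  # the "other Lambda" pinging to keep this one warm
            t += ping_interval_s
    return fn.invoke(request_at_s)

without_pings = latency_of_request(None, 3600.0)
with_pings = latency_of_request(240.0, 3600.0)
print(without_pings, with_pings)
```

The hack works in the simulation, but notice what it costs: a second function that exists only to work around the first, which is the "possible but difficult" 10% in action.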
And this is one of
the problems you
run into with every giant
ERP package on earth
is they're trying to achieve
this ultimate level of business
reuse.
So my next one I
mentioned earlier,
this problem of legacy.
What if you had a project
with infinite time,
no deadlines, infinite
resources? It was literally
funded by a billionaire
who said, I
don't care how much you spend,
and I don't care
how long you take.
And there's no legacy
you have to support.
This sounds like fantasy
land for most developers
who are dealing with
schedule pressure and legacy
code and all that:
infinite time, infinite
resources, no legacy. And this
project still failed utterly.
This is written about
in the book Dreaming
in Code, which is a great book.
But I have to warn you,
if you're a developer,
it should come with
trigger warnings
because of what this
book is about. So Mitchell
Kapor was one of the
creators of Lotus 1-2-3,
which is a famous piece of
spreadsheet software.
But Lotus also created the
first personal information
manager, called Agenda, which was
a really powerful piece
of software
considering the
platform it was on. When
he left Lotus he decided
to found a company to build
a modern version
of Agenda called
Chandler, named after Raymond
Chandler, the detective
novelist.
So he's a billionaire.
So he hired rockstar developers
from all over Silicon Valley
and told them,
here's the thing: I want you
to build a desktop application,
a PIM, that
does this automatic peer-to-peer
replication over
the internet, has all
these movable properties and all
that stuff. Start working on it.
And this journalist,
Scott Rosenberg, thought,
oh, this is cool.
Here's an open source project
that's funded by a billionaire.
It'll be really interesting
to write about this
and see the dynamic of
how software gets built.
But he didn't actually see
a lot of software get built.
What he got to see was a
lot of meetings about stuff,
because the big
problem they had was they all
got together and said, well, you
know, none of the existing
database technologies
is quite perfect.
So we're going to design our
own database. With infinite time
and money, we want it
to be exactly perfect.
So they designed
their own database.
And then they had
another meeting and said,
you know, the database
that we designed,
it's just not quite perfect.
So they designed another database.
And then they had another
meeting about some stuff.
They said, well, you know,
that's not, it's almost good,
but you know, it's not quite
exactly. And so they started
designing another database.
And at some point, the author...
So they're trying to do
six-month releases here,
and they've missed
the first six of them.
They've released no software,
and we're years into this project.
And this writer's like,
I'm trying to write
a book about a software project.
You guys think you're ever
going to create a
software project here,
or is it just going to be another...
Oh no, it's another meeting.
It's another.
And so eventually the
journalist is like, that's it.
I'm out of here.
And so the book ends
before the project ends,
because the book basically
ends by saying software project
people are knuckleheads.
And that's basically
the summary of the book:
these people don't
know what in the world
they're doing.
They eventually limped
to the finish line
and shipped this thing,
and then shut down the
entire Chandler project.
It did about 1% of what
the original vision was,
did none of the magic stuff.
And they basically just shut
the whole thing down.
And basically they
built all this stuff,
but because they didn't have
any kind of deadline pressure,
nothing to keep them corralled,
they were just
building everything.
And this brings up
an important problem
we see in software, which
I describe as "metawork
is more interesting than work".
The problem is, if you
go to developers,
they've all written the
same code. If you really
want to describe to your
grandparents what
you do at work: take
things off web pages
and put them in databases, and
then take things out of databases
and put them on web pages.
Yeah, there's a bunch of
details in between all that.
But that's kind of how it works.
And if you've done that
for a bunch of projects,
it gets kind of boring.
You know, when what developers do in
their day job gets boring, they
invent cool puzzles to solve,
so they'll have
cool puzzles to work
on when they come into work.
This is "metawork is
more interesting than work".
So the Golden Gate
Bridge in San Francisco
there's a crew that paints it.
And it takes a year to
paint the entire thing
and the water is so corrosive
that once they get to the end,
they just have to start
over and paint some more.
So they have a full time
job painting the Golden Gate
Bridge.
I was talking to one of our
client developers recently,
and he said, yeah,
we're currently
ripping out all the Angular stuff
and replacing it with React.
And I said, oh, is there a
big business driver for that?
He said no. They're just
painting the Golden Gate Bridge,
because, you know, when
the business people
keep asking for stuff:
but you know, here, we're busy.
No, sorry, we're busy.
We'd like to do that,
but we're busy.
We're painting.
We're busy.
Well, you know, when
you get that done,
you will have the next thing you
take some time for. My last one
is the other
field trip: in Barcelona,
the Sagrada Familia.
This is the basilica that
looks like it was designed
by Dr. Seuss. It's the magnificent
Gaudí architecture. Inside, it's
supposed to look like a forest
where all these pillars go up
and meet into these
branches and trees.
These are all pictures I
took inside Sagrada familia.
It's an absolutely
magnificent place
inside with all
these stained glass
and all the lighting etc.
The interesting thing
about this for us,
though, is the structure
of this building and the way
that the structure
actually holds up.
This is for some scale.
You can see some
people standing here
to show how the scale works.
The fascinating thing for
us about Sagrada Familia
is this: you should go see
it and look at the basilica
because it's
absolutely beautiful.
But then go to the
museum in the basement,
because it actually shows how
Sagrada Familia was designed.
It turns
out that when Gaudí
was trying to design all
those interlocking support
structures,
he didn't have enough math to
figure out if it would actually
stand up or not.
But he understood
this physics principle
that if you take
string and let gravity
deform it, the shape it forms
is a weight-bearing arch
if you turn it upside down.
And so what he did was
design all the arch work,
and this is actually for a
similar cathedral
but with the same design criteria,
using strings and sandbags.
And so you can see the way
those deform into arches
of the kind of arches that he
designed for Sagrada Familia.
Here it is from the side.
But if you look at this,
if you look up here
at the top, you
can actually see some
people for scale, upside down.
And so using the
magic of slides,
I'm going to flip this over.
That's the cathedral.
You can see the people standing
here at the bottom for scale.
And you can see the arches and
the way all those arches form.
That's exactly how he
designed Sagrada Familia.
Now of course, we know
that all that math works.
But before we had
the math to do that.
He figured out a
way to figure out
all that weight bearing
to be able to design
that magnificent structure.
So it's a great
example of literally
experimental architecture,
of how we can
take architectures and
experiment with them
in interesting ways.
And we see this in the software
world: Netflix and chaos
engineering is
a great example of
pushing some envelope
like that, or all of
the web services
stuff from Amazon, or LMAX,
if you ever want to see
a fascinating project
that really pushes things
like transaction speed
to the ultimate limit.
So to summarize
all these things:
one of the interesting
things to me about this
is how a common thread runs
through so many of these, which
is this desire for
reuse, both good and bad,
because this is what nabbed the
Vasa: trying to reuse a design.
Tacoma Narrows was
trying to reuse a design
and pushing it past
where it could live.
Trying to reuse old code in null.
Trying to reuse everything in Ada.
Again, old code in Ariane 5.
Not enough reuse at
pets.com, and not enough
infrastructure. Investing way
too early in reuse for Webvan.
Reusing old
toggles for Knight Capital.
The San Francisco
project was trying
to reuse everything that's
ever been created, ever.
The Chandler project
was trying to build
all these things for reuse
before actually building
anything that was useful
and usable. And even
in Sagrada Familia
we saw reuse in design.
We see this in code as well.
One of the things that we
constantly keep trying to do
is reuse code.
One of the great
observations by John Cook
is that it's nowhere near as easy
as we think it's going to be,
and software reuse is typically
way more like an organ
transplant than it is
snapping together LEGO blocks.
And so we still need
to reuse things.
But we need to reuse things
very carefully in our world,
because it's a classic trade-off.
It comes with both
a benefit and a cost.
And so reuse stuff,
but reuse stuff carefully,
and know what you're reusing,
and realize what it is that
you're taking advantage of,
and that you're not
incurring a huge cost.
And my favorite quote about the
past is not the Santayana one
but actually the William
Faulkner quote, "The past
is never dead.
It's not even past."
Remember this
quote the next time
you deal with a null
pointer exception
in your modern shiny
programming language
so that it can be backwards
compatible with Fortran.
So thanks very much.
Hope you enjoyed it.
