>>Male Presenter: Hello everyone. And welcome
to Sharon Bertsch McGrayne's talk on "The
Theory That Would Not Die." When I got the
announcement of the book and I asked for a
copy of it, I said, "This looks like a very
Googley book." Bayes' theorem is used all
over the place and when I asked you guys if
you're interested, it was either the first
or the second most popular book that I've
ever put up.
So, it looks like we've had a very good turnout
given that everyone here probably knows more
about Bayes' theorem than I do. And I know
a fair bit. I will turn it right over to our
speaker. Welcome.
[applause]
>>Sharon Bertsch McGrayne: Thank you. Well,
thank you very much for inviting me and thank
you all for coming. I wanna start right out
with some truth in advertising and tell you
that I'm not a computer scientist. I'm not
a statistician, a scientist, or a mathematician.
I come to you from newspaper reporting and
from science writing. However, I became intrigued
with Bayes' rule seven or eight years ago when
I could Google the word "Bayesian" and get
fewer than a hundred thousand hits. And last
week, when I went on and I Googled "Bayesian,"
I got eleven million hits.
So, today I wanna talk to you about how you
all are real revolutionaries. You're participants
in a remarkable--almost overnight--revolution
about a very fundamental scientific issue--how
you deal with evidence, how you deal with
data, how you evaluate the evidence and measure
the uncertainties involved, update it as new
knowledge arises and then hopefully, change
minds in light of the new data.
Now, usually when I talk about Bayes' Rule,
I start with a long list of examples of where
Bayes is used and I do not think I need to
do that with this crowd. Google often uses
naive Bayes and other Bayesian methods and
recently, Google's Bayesian driverless car
got headlines all over the world.
And when I wrote a short piece about it for
Scientific American, it was one of the most
popular articles in that issue. So, it was
well-known. But there are a couple of examples
that I'd like to bring up today that you might
not know so much about. [pause] The first
is the Air France jet Flight 447 that took
off two years ago last month from Rio de Janeiro
bound for Paris overnight.
It went into a high altitude, intense electrical storm and disappeared without a trace with 228 people aboard. The world's most high-tech naval search ever lasted almost two years without success. They were searching
for the wreckage and for these two black boxes,
which as you can see are actually red and
white.
They're the size of shoe boxes and they were
searching in mountainous terrain two and a
half miles down under the ocean in an area
the size of Switzerland. They had no success.
Last winter, the French government hired some
of the same people who appear in "The Theory
That Wouldn't Die" who developed Bayesian
Naval search methods.
And their software calculated the most probable
sites for the wreckage, which was found in
April after an undersea search of one week.
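The Bayesian search idea behind that calculation can be sketched roughly as follows; the grid cells, prior probabilities, and detection probability below are invented purely for illustration. Each unsuccessful search of a cell lowers that cell's probability and raises the others', so the search keeps concentrating on the most promising remaining areas:

```python
def search_update(prior_map, searched_cell, p_detect):
    """Posterior wreck-location map after searching one cell and finding nothing."""
    # P(no find) = 1 - P(wreck in searched cell) * P(detecting it if it's there)
    p_no_find = 1 - prior_map[searched_cell] * p_detect
    posterior = {}
    for cell, p in prior_map.items():
        if cell == searched_cell:
            posterior[cell] = p * (1 - p_detect) / p_no_find
        else:
            posterior[cell] = p / p_no_find
    return posterior

beliefs = {"A": 0.5, "B": 0.3, "C": 0.2}             # prior from drift models, say
beliefs = search_update(beliefs, "A", p_detect=0.9)  # searched cell A, found nothing
next_cell = max(beliefs, key=beliefs.get)            # most probable place to look next
```

After the fruitless search of cell A, its probability drops sharply and cell B becomes the best candidate, which is exactly how repeated failures still narrow the map.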
OK? Now, about a month ago, the agency in
charge of British archaeological sites announced
that a wonderful Bayesian program had shown
them that most of what they'd known about the Neolithic era in Britain was, in their words,
bollocks.
[laughter]
Neolithic people built these strange hilltop enclosures surrounded by concentric circles of ditches.
And they had not built them gradually over
the ages. They built and abandoned them, often
within the space of one generation. And most
of them had been built during a building spree
roughly five thousand, five hundred years
ago.
Now, the remarkable thing to me about these
two examples is that the British and the French
governments were saying how wonderfully Bayes had worked. And as we're gonna see today,
a lot of people didn't even dare mention the
word 'Bayes' for decades in the 20th Century.
So, to understand why you all are such revolutionaries
we're gonna have to go back to the beginning.
And given the time constraints, I'm gonna
race through the beginning until we get to
the Second World War, where I'll slow down
some. But I hope we're gonna see two big patterns
emerging.
First, Bayes becomes an extreme example of
the gap between academia and the real world.
And second, military super-secrecy during
the Second World War and the Cold War had
a profound effect on Bayes. Now, Bayes' rule
of course, is named for the Reverend Thomas
Bayes, a wealthy Presbyterian minister and
amateur mathematician who lived in a chic
resort outside of London in the early 1700s.
We know very little about him. I'm not gonna
show you his portrait because it's indubitably
of someone else. We don't know his birthdate.
Wikipedia has his death date wrong. But we
do have something personal about Bayes. And
that is his handwriting: written with goose quills, in Latin.
OK? And we know another thing about him, excuse me. This handwriting comes from the
Institute and Faculty of Actuaries in London.
And we do know something else about Bayes
and that is that he discovered his theorem
during the 1740s when Europe was racked by
a religious controversy.
The issue is not unknown today. It was whether
or not we can use evidence about the natural
world around us to make rational conclusions
about the existence of God. Bayes' generation would have said it was not about God the creator, but about God the cause, God the primary cause, the first cause.
We do not know that Thomas Bayes wanted to
prove the existence of God, but we do know
that Bayes tried to deal with the issue of
cause and effect, mathematically. And in so
doing, of course, he produced a simple one-line theorem that says we modify our initial beliefs as new information arrives. And he actually called the initial belief a guess. He used the word "guess" and he said if nothing else works, start with 50-50 odds, and modify this guess with objective new information and get a new and improved
new information and get a new and improved
belief, which in turn carries with it a commitment
to update that belief each time a new piece
of information arrives.
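That updating cycle (start at 50-50 odds, fold in each new piece of evidence, and carry the result forward) can be sketched in a few lines of Python; the likelihood numbers here are invented purely for illustration:

```python
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: return P(hypothesis | evidence)."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))

belief = 0.5  # Bayes' suggested starting guess: 50-50 odds
# Suppose each new observation is twice as likely if the hypothesis holds.
for _ in range(3):
    belief = update(belief, 0.8, 0.4)
# Each pass takes the previous posterior as the new prior,
# so three consistent observations raise the belief from 0.5 to 8/9.
```

The commitment she mentions is visible in the loop: the output of one update becomes the prior of the next.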
But Bayes didn't believe in his theorem enough
to publish it. And he dies ten or fifteen
years later with it filed away in his notebook.
Now, going through Bayes' papers, a friend
of his, Richard Price, who was another Presbyterian
minister and an amateur mathematician, decides
that the theorem will help prove the existence
of God the cause.
Now, unlike Bayes, Richard Price was famous
in his day. This is a royal society portrait
done by a famous painter, Benjamin West. He
later becomes a famous supporter of the American Revolution, a friend of our Founding Fathers
and a founder of the insurance industry.
And he spends the next two years, off and
on, editing Bayes' theorem. Gets it published.
Unfortunately, in a journal that primarily
the British gentry read and not particularly
continental mathematicians and it sank out
of view and was neglected. But certainly by
today's standards, Richard Price would be
considered Thomas Bayes' co-author.
If, however, there were justice in this world,
Bayes' rule should be named for someone else
entirely. And that is the great French mathematician,
Pierre-Simon Laplace, who's better known today
for the Laplace transform. Now, as a young
man of 25, Laplace discovers the rule independently
of Bayes in 1774 and calls it the "Probability
of Causes".
Now, Laplace, also unlike Bayes, was the quintessential
scientific researcher. He mathematized every
science known to his day and he spends the
next 20 years, off and on, in the midst of
this enormous career, developing what we call
Bayes' rule into the form that's used today
and actually used it.
But when Laplace dies in 1827, the Western world begins an almost manic fad of collecting
precise and objective facts. There were clubs
that collected them. Even women could do it.
Some of the famous numbers were the chest
sizes of Scottish soldiers, the number of
Prussian officers who were killed by kicking
horses, [laughter] the number of victims of
cholera.
And with lots of these precise numbers at
their disposal, any up-to-date statistician
rejected Bayes' rule. They preferred to judge
the probability of an event by the frequency
that it occurred--nine times out of ten, three
out of four, and so on. And eventually they
will become known as the Frequentists.
And the Frequentists become the great opponent
of Bayes' rule up until quite recently because
for them, modern science requires both objectivity
and precise answers. And Bayes, of course,
calls for a measure of belief and approximations
and the Frequentists called that quote "subjectivity
run amuck," "ignorance coined into science."
By the 1920s, they were saying that Bayes "smacked of astrology, of alchemy."
And another said, "We use Bayes formula with
a sigh as the only thing available under the
circumstances." Now, the surprising thing that I discovered in all of this is that the theorists and the philosophers denounced Bayes' rule as subjective.
People who had to deal with real world emergencies,
who had to make one-time decisions based on
scanty data, they kept right on using Bayes'
rule because they had to make do with what
they had. So, for example, Bayes' rule helped
free Dreyfus from the treason trials in the
1890s in France.
Artillery officers in France, in Russia, and
the US, tested their ammunition and, and aimed
their fire using Bayesian tables in both World
Wars. The Bell telephone system survived the
1907 financial panic and the US insurance
industry started our first and only--for many
years--social insurance, workers compensation
insurance, almost overnight at the time of
the First World War despite having few facts
about the safety of particular industries
or particular businesses.
Now, every good book needs a villain. And
the villain of our piece is Ronald Fisher.
Despite Bayes' usefulness, Ronald Fisher started
attacking Bayes in the 1920s and '30s and
theoreticians' attitudes about Bayes changed
from tepid toleration to outright hostility.
One of the reasons is that Ronald Fisher was
a giant. He was a superb geneticist. He founded
modern statistics for scientific work, randomization
methods, sampling theory, experimental design
methods--these are all some of Fisher's great
achievements. But unfortunately for Bayes,
the thing that Fisher hated most was Bayes'
rule.
He was a fervent eugenicist with an explosive
temper and a remarkable inability to understand
other people. For example, if Fisher got bored
at a meeting, he might pull out his false
teeth and clean them in public.
[laughter]
He, he interpreted scientific and statistical
questions as, as personal attacks. And his
life became, a colleague said, "a sequence
of scientific fights, often several at a time
at scientific meetings and in scientific papers
and he hated Bayes' rule the most." He didn't
need Bayes.
He didn't work with great amounts of uncertainty.
He was a eugenicist and he filled his house
with cats and dogs and thousands of mice for
cross-breeding experiments and he could trace
their genealogy back for generations, precisely.
His experiments were repeatable and they produced
precise answers.
And Fisher calls Bayes' approximations and
measures of belief "an impenetrable jungle,
perhaps the only mistake to which the mathematical
world has so deeply committed itself. It's
founded on an error and must be wholly rejected."
And he kept up this very personal fight against
Bayes for 40 years, into the 1950s when a
lone Bayesian at NIH was showing that cigarettes,
smoking cigarettes, caused lung cancer.
Now, Fisher was a chain smoker. He even swims
smoking.
[laughter]
And he was a paid consultant to the tobacco
industry. And he proposed, believe it or not,
not that smoking caused lung cancer, but that
lung cancer probably caused smoking.
OK? So, Fisher's stature and his utter inability
to discuss Bayes' rationally delayed its development
for decades. And I think we have to say that
Fisher is an example of how a destructive
personality can affect a field, particularly
a small field. Now, I wanna switch gears a
bit and dwell on the personal history of Alan
Turing.
First, because he's a hero of mine and second
because it illustrates how Bayes' worked as
a pencil and paper method, as one of the earliest
computer techniques, and as an illustration
of the effect of government secrecy. This,
of course, is Alan Turing. [pause] Now by
the time the Second World War began, as we've
seen, Bayes was virtually taboo as far as
sophisticated statisticians were concerned.
Fortunately, Turing was not a statistician.
He was a mathematician. And besides fathering
the modern computer, computer science, software,
artificial intelligence, the Turing machine,
the Turing test, he will father the modern
Bayesian revival. Now, it's also important
to remember that during the Second World War,
England was cut off from the agricultural
produce and supplies of the continent, particularly
of France and could feed only one in three
of its residents. And it depended on convoys
of unarmed merchant marine ships bringing
in each year 30 million tons of food and supplies
from North and South America and from Africa.
Now, Hitler said he thought that the U-boats and their attacks on these convoys would win the war.
And Churchill said later "the only thing I
was really scared of during the war were the
U-boats" because of the attacks on the supply
lines. And in fact, German U-boats did sink
almost 3000 Allied ships and killed more than
50,000 merchant seamen.
Now, the German Navy ordered the U-boats around
the Atlantic Ocean via radio messages that
were encrypted with word scrambling machines
called 'Enigmas'. The German government bought
the Enigmas during the '20s and distributed
them to all of its different agencies.
So, the Navy had one set of codes and could
develop their own complexities and their own
security controls. The Army had another. The
foreign service, the Italians, the Spanish
nationalists and so on. The German railways
had one. And the most complex of all was that
operated by the German Navy.
And this is a German naval Enigma. They're hard to sort out in pictures, but this one comes from Frode Weierud's website called 'CryptoCellar'. Now, an Enigma machine, as you can see, looks like a complex, sturdy typewriter. It had wiring coming out of the very bottom.
It had wheels. This one has four wheels, but
they started with three and added more later.
It had starting places. It had code books.
And it had many other features that could
be changed within hours if necessary, and
that could produce millions upon millions
of permutations.
And no one, German or British, thought that the British could ever read those messages. So, this is one of the Enigma machines
that Alan Turing will use Bayes' to conquer.
Now, Turing has been working on the Enigma all summer by himself. But on September 4th, 1939, the day after the war is declared, he's ordered to go to Bletchley Park, which is
the British super-secret center for decryption
efforts just north of London. When he arrives
at Bletchley Park he was 27. He looked 16.
He was handsome and athletic. His mother sent
proper business suits for him to wear to work.
He preferred shabby sports coats. He had a
five o'clock shadow. Sometimes, his fingernails
were dirty. And he will spend the next six
years working on coding and decryption efforts.
When he arrives at Bletchley Park, no one
is working on the all-important Navy code, the one that will control whether or
not supplies can reach Great Britain.
Turing, however, liked to work alone. And
after a few weeks, he announced that no one was doing anything about it and he could have it to himself. And he goes up into the loft
of one of the stable buildings at Bletchley
Park and stays there during lunches and breaks
and the women on the staff organize a pulley
to take up baskets of food for him for lunch
and, and tea.
And, and if you don't mind, I'll read just
a bit from "The Theory That Wouldn't Die."
[reads from book]
"Late one night, soon after joining Bletchley
Park, Turing invented a manual method for
reducing the number of tests that would be
needed to determine the settings for those
wheels. It was a highly labor intensive Bayesian
system that he nicknamed 'Banburismus,' for
the nearby town of Banbury where needed supplies
were printed.
He wrote later during the war that he was not sure it would work in practice. But if it did, it would let him guess a stretch of letters in an Enigma machine, hedge his bets, measure his belief in their validity by using Bayesian methods to assess their probabilities, and add more clues as they arrived.
If it worked, it would identify settings for
two of Enigma's wheels and reduce the number
of wheel settings to be tested from 336 to
as few as 18."
And at a time, obviously, when every hour
counted, the difference could save sailors'
lives.
"So Turing and his slowly growing staff begin
to comb intelligence reports to collect what
they called 'cribs', which were the German
words that they predicted would occur in the
original uncoded German message. The first
cribs came primarily from German weather reports
because they were standardized and repeated
often.
Weather for the night, situation Eastern channel.
And as one blessed fool radioed each night,
'Beacons lit as ordered.' Reports from British
meteorologists about weather in the English
Channel provided more clues, more hunches.
And in a fundamental breakthrough, Turing
realized that he couldn't systematize his
hunches or compare his hypotheses, their probabilities,
excuse me, without a unit of measurement.
And he named his unit the 'ban', for Banburismus.
And he defined it as quote 'about the smallest
change of weight in evidence that is directly
perceptible to human intuition.' End quote.
One Ban represented odds of ten to one in
favor of a guess. But Turing normally dealt
with much smaller quantities, deciBans and
even centiBans.
The Ban was basically the same as the bit,
the measure of information that Claude Shannon
discovered by using Bayes' rule at roughly
the same time at Bell Telephone Laboratories.
Turing's measure of belief, the Ban, and its
supporting mathematical framework had been
called his greatest intellectual contribution
to Britain's defense.
Using Bayes' rule and Bans, Turing began calculating
credibility values for various kinds of hunches
and compiling reference tables of Bans for
technicians to use. It was a statistic-based
technique, produced no absolute certainties,
but when the odds of a hypothesis added up
to 50 to one, cryptanalysts could be close
to certain that they were right."
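The ban arithmetic described in that passage can be sketched like this; the likelihood ratios attached to each clue below are invented for illustration. Because the ban is a base-10 logarithm of the odds factor, weights from independent clues simply add, which is what made reference tables for technicians practical:

```python
import math

def decibans(bayes_factor):
    """Weight of evidence in decibans: 10 * log10(likelihood ratio)."""
    return 10 * math.log10(bayes_factor)

# One ban (10 decibans) represents odds of 10 to 1 in favor of a guess.
assert decibans(10) == 10.0

clue_factors = [2.0, 1.5, 3.0, 1.5, 4.0]    # likelihood ratios from cribs (invented)
score = sum(decibans(bf) for bf in clue_factors)  # evidence adds in log space
odds = 10 ** (score / 10)                   # back from decibans to odds form
# Bletchley treated odds of about 50 to 1 as close to certainty.
near_certain = odds >= 50
```

Adding decibans instead of multiplying odds is the whole trick: a clerk with a table of clue weights only ever needs to sum small numbers.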
Now, Turing was obviously developing a home
grown Bayesian system. No one knows where
he got it, whether he discovered it and developed
it on his own, with his assistant Jack Good,
or whether he knew about the rudiments of it from the lone defender of Bayes' at Cambridge University during the pre-war period, Harold Jeffreys, who used Bayes' in earthquake and tsunami research to locate the epicenters of earthquakes.
Now, within a year and a half of the war starting,
by June of 1941, Turing could read the U-boat
messages within an hour of arrival at Bletchley
Park. And the British could reroute the convoys
around the U-boats and for almost a month
that summer no convoy, no ship is attacked
by a U-boat.
By the fall, however, the fall of 1941, Banburismus
is critically short of typists and junior
clerks, otherwise known as "girl power". Turing
and the other decoders wrote a personal letter to Churchill, and one of them delivered it to Downing Street. And Churchill responded immediately. Among the help
that was attempted, Ian Fleming, of James
Bond fame, plans an elaborate raid to capture
code books for Turing. Fortunately, it's so elaborate that it's better suited to a novel than to real life, and it was cancelled. [laughter]
The Navy collected code books for Turing from sinking German ships, and two young men lost their lives doing it. But breaking Enigma--. If we could actually have that one
back. I think. [pause] Whoop. Not there. Never
mind.
But eventually breaking the Enigma codes becomes
routine, like a factory at Bletchley Park.
However, [pause] let's go back to Turing if
we can. There we go. Thank you. But shortly
after Germany attacks Russia in June of 1941,
the German army started using a vastly more
complex code and word scrambling machine called
the 'Lorenz'.
These were ultra-secret codes and the Supreme
Command in Berlin relies on these new Lorenz
codes to communicate high-level strategy to
high-level army commanders around Europe.
And some of the messages were so important
that Hitler himself signed them.
[reads from book]
"And a group of Britain's leading mathematicians
begin a year of desperate search. They used
Bayes' rule, logic, statistics, Boolean algebra,
and electronics. And they also began work on designing and building the first of
ten Colossi, the world's first large scale
electronic computers.
Turing invented a highly Bayesian method known
as 'Turingery', or 'Turingismus'. It was a
paper and pencil method again. The first step
was to make a guess and assume, as Bayes had suggested, that it had a 50-50 chance
of being correct. Add more clues, some good
and some bad.
And as one of the decoders described it 'with
patience, luck, a lot of rubbing out and a
lot of cycling back and forth, the plain text
appeared.'
Now, the engineer who built the Colossi, Thomas
Flowers, had strict orders to have the second
model of the Colossus ready by June 1, 1944.
He was not told why, but his team worked,
he said, so hard until they thought their
eyeballs would drop out.
They get it ready by June 1. And on June 5, they receive a message that Hitler is sending to his army commander in Normandy, General Erwin Rommel. And Hitler
tells Rommel, if there is an invasion of Normandy,
it will be a diversionary tactic and do nothing
for five days.
The real invasion will occur later, somewhere
else. The message is decrypted at Bletchley
Park. A courier takes it to General Eisenhower
where he and his staff are determining when
to launch the invasion of Normandy. A courier
gives the piece of paper to Eisenhower.
We know this from Thomas Flowers, by the way.
He gives the paper to Eisenhower, who reads
it. He cannot tell anything, even to his top
staff about Bletchley Park and the decoding
efforts. He gives the paper back to the courier
and he turns to his staff and says, "We go
tomorrow morning."
June 6, 1944. And later, Eisenhower says
that the decoding at Bletchley Park, Bletchley
Park shortened the war in Europe by two years.
Now, a few days after Germany's surrender in May of 1945, a year after D-Day, Churchill makes a surprising and still shocking move.
Everything that showed that decoding had helped
to win the Second World War was to be ultra-secret.
No one could reveal anything about what they'd
done at Bletchley Park. And Turing, of course, could not be mentioned. And the Colossi were
to be destroyed.
To be destroyed, cut into unidentifiable pieces,
except for the last two most complex ones.
[crash] Bombing.
[laughter]
Except for the two most--, the latest ones,
the most complex ones, which Britain apparently
used during the Cold War to decode Soviet
messages. But I think [sound of truck backing
up] we have to think that without Churchill's
orders, it might have been Great Britain that
became the leader of the 20th Century computer
revolution.
Now, after the war, Turing was working on
computers and other projects. No one knew
what he had done at Bletchley Park, when two
spies flee from Britain [door slam] to Moscow.
They had been spying for the Soviet Union
all during the war. And they fled in 1950.
One was named Guy Burgess and he was an openly
homosexual graduate of Cambridge University.
And US Intelligence told Britain that the
spies had been tipped off by another homosexual
graduate of Cambridge, an art historian named
Anthony Blunt. And the government panicked
and thought that it was probably a homosexual
spy ring of graduates from Cambridge. [laughter]
The number of arrests for homosexual activity
spikes. And the day after Queen Elizabeth
II is crowned, Alan Turing is arrested. He
is arrested for homosexual activity in the
privacy of his home with a consenting adult.
He is found guilty. He's sentenced to either
prison or chemical castration.
He chooses the estrogen injections. Over the
next year, he grows breasts. And the day after
the 10th Anniversary of the Normandy Invasion
that he helped make possible, Alan Turing
commits suicide. Anthony Blunt is later knighted and it's 55 years before the British government,
the Prime Minister in this case, apologizes
for its treatment of, of Turing.
Yeah, thank you. Now, after the Second World War, Bayes' wartime successes were totally classified. And Bayes' rule emerged from the Second World War even more suspect than before. The anti-Bayesians still focused on
Thomas Bayes' starting guess, this outrageously
subjective prior.
And without any public proof that their method
worked, the Bayesians were stymied. When Jack
Good, who had been Turing's statistical assistant
during the war, gave a talk at the Royal Statistical
Society about the theory of the method, no
mention of course of the application, the
next speaker's opening words were "after that
nonsense."
During Senator McCarthy's witch hunt against
communists in the US Federal government, a
Bayesian at the National Bureau of Standards
was called, only half-jokingly, "un-American
and undermining the United States' government."
At Harvard Business School, professors had
developed the very Bayesian decision trees
that MBAs use. But they were called 'Socialists'
and 'so-called scientists'. And a Swiss visitor to Berkeley's very Frequentist statistics department in the 1950s realized
that it was "kind of dangerous" to apply these
Bayesian methods.
Now, the Cold War military, of course, continued
to use and develop Bayes' rule. So, they knew
it worked, but it was secret. For example, in the 1950s it wrestled with the problem of how you predict the probability of something happening if it's never happened before.
There had never been an accidental H-bomb
explosion. So, the Frequentists said "you
couldn't predict the probability of its ever
happening in the future." So, a post-doc at
Rand Corporation, Albert Madansky, used Bayes' to warn that Curtis LeMay's Strategic Air Command--I think you know Curtis LeMay from the movie 'Dr. Strangelove'--could have produced at least 19 H-bomb accidents a year.
And the Kennedy Administration eventually
added safeguards. But there were other secret
Cold War projects, too. The National Security
Agency cryptographers used Bayes' to decode
Soviet messages.
And an advisor to the National Security Agency
and to the Institute for Defense Analyses
used it on election nights to predict the winners of congressional and presidential elections
for 20 years, but refused to let anyone say
that he was using Bayes', apparently, to keep
his connection to Bayesian cryptography totally
secret. And the US Navy used Bayes secretly
to search for a missing hydrogen bomb in Spain,
for a nuclear submarine, Scorpion, which sank
without a trace. And then there's a classified story, told for the first time in "The Theory That Wouldn't Die," of how Bayes' actually caught a Russian submarine in the Mediterranean and convinced the Navy.
Now, as a result, during the years of the
Cold War, Bayes becomes a real flesh and blood
story about a small group of maybe a hundred or more believers struggling for legitimacy
and acceptance. And for many years, they concentrated
on theory, trying to make probability and
Bayes a respectable branch of mathematics.
But many Bayesians of that generation remember
the exact moment when Bayes' overarching logic
descends on them and they talk about it like
an epiphany, where they're converted. To them,
Frequentism looked like a series of ad hoc
techniques, whereas Bayes' theorem had what
Einstein had called "the cosmic religious
feeling."
The reason, of course, was that it was concerned
with a very fundamental scientific issue.
As David Spiegelhalter told me, "It was basic. A huge swath of scientists say you can't use probability to express your lack of knowledge or to describe one-time events that don't have any frequency to them."
And many scientists find this rather disturbing
because it's not a process of discovery. It's
more a process of interpretation. So, both
sides were proselytizing their methods as
the one and only way to approach statistics.
It was, one statistician told me, a "food
fight, a devastating food fight."
And it was one that didn't subside until late
in the 20th Century. Both sides used these religious terms. When a Bayesian was appointed Chair of an English statistics department, Frequentists called him 'a Jehovah's Witness elected Pope'.
[laughter]
He, in turn, when asked how to encourage Bayes', replied tartly, "Attend funerals." The Frequentists retorted in kind that if Bayesians would only do what Thomas Bayes had done and publish after they were dead, we'd all be saved a lot of trouble. [pause]
The extraordinary fact though about Bayes'
during the Cold War is that although the military
was using Bayes' and the civilian Bayesians
were under attack, there were very few visible
civilian applications. For example, it was
an MIT physicist who used Bayesian methods
to do the first nuclear power plant safety
study 20 years after the industry started.
He did it in 1973. And he predicted what actually
would happen at Three Mile Island. But he
had the big, bad Bayes word hidden in the
appendix of Volume Three of the multi-volume
Rasmussen Report. And the only big public
application of Bayes was one using the words
in the Federalist Papers as data.
Now, the Federalist Papers were a series of
essays that appeared in New York State newspapers
to convince New York State voters to vote
for the Constitution. And some of them were
anonymous. And Fred Mosteller of Harvard and
David Wallace of the University of Chicago
launched a massive Bayesian study and concluded
and convinced everyone that all twelve of
the anonymous Federalist Papers were written
by James Madison.
But they also came to what they called an "awesome statistical conclusion." And that was that Thomas Bayes' beginning guess, the controversial, hated, subjective prior, was irrelevant if you have a lot of data to update it with. The problem was that Mosteller
had to organize an army of about a hundred Harvard students to punch the data into MIT's computer center and no one else was willing
to undertake such a mammoth organizational
problem.
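The core of the Mosteller-Wallace word-rate idea can be sketched like this; all the rates and counts below are invented, and the real study used far richer statistical models, so treat this only as the shape of the argument: each marker word occurs at a characteristic rate per author, and every occurrence nudges the log-odds toward the author who favors it.

```python
import math

# Hypothetical occurrences per 1,000 words for two candidate authors.
rates = {
    "upon":   (3.2, 0.2),    # (Hamilton's rate, Madison's rate)
    "whilst": (0.1, 0.5),
    "while":  (0.3, 0.05),
}

def log_odds_hamilton(word_counts, prior_odds=1.0):
    """Log-odds that Hamilton (vs. Madison) wrote the text."""
    total = math.log(prior_odds)  # prior odds of 1.0 is a 50-50 starting guess
    for word, count in word_counts.items():
        hamilton_rate, madison_rate = rates[word]
        # Each occurrence adds the log likelihood ratio for that word.
        total += count * math.log(hamilton_rate / madison_rate)
    return total

# An essay that says "whilst" often but "upon" only once leans Madisonian.
score = log_odds_hamilton({"upon": 1, "whilst": 4, "while": 0})
verdict = "Madison" if score < 0 else "Hamilton"
```

With enough such words, the accumulated evidence swamps whatever prior you started from, which is exactly the "awesome conclusion" she describes.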
By the 1980s though, another factor was working
against Bayes, too. And that was that the computer revolution was flooding the modern world with
enormous amounts of data and with a lot of
unknowns. And Laplace's method had involved
integration of functions and it was hopelessly
complex.
So, it was beginning to look, even to many
Bayesians, as though Bayes was an old-fashioned,
18th Century theory crying for a computer
and software. But many academic statisticians
thought computers were a copout. They'd started
out as abstract mathematicians. Most were
Frequentists.
They focused on small data sets, relatively
small data sets, with few unknowns. They didn't
need computers. Bayesians themselves during
this period also didn't realize that the key
to making Bayes useful in the workplace was
not more theory, but computational ease.
Theorist Dennis Lindley had been programming his own computer since 1965 and actually regarded Bayes' as ideal for computers. He wrote me, "I
consider it a major mistake of my professional
life not to have appreciated the need for
computing rather than mathematical analysis.
I should have seen that Bayes' enabled one
to compute numerical answers." It was a particularly
poignant case of the Canadian mathematician
Keith Hastings, who published in 1970 what
should have been a real breakthrough paper,
what's now called 'The Hastings Metropolis
Algorithm', or simply 'The Metropolis Algorithm'.
He used Markov chains, Monte Carlo sampling
techniques. Published it. Got no reaction
at all. A year later, drops out of research
and goes to teach at the University of British
Columbia. And it was not until 20 years after
his, after his work when he was fully retired
that he realized the importance of what he
had done.
And Hastings told me with some anguish in
his voice that his work was ignored because,
quote, "a lot of statisticians were not oriented
toward computing. Statisticians took these
theoretical courses, cranked out theoretical
papers, and some of them wanted exact answers,
not estimates."
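The idea in Hastings's 1970 paper can be sketched in a short random-walk Metropolis sampler. The target here is a toy choice (a standard normal, known only up to a constant), not anything from his paper: propose a nearby point, then accept or reject it based on the ratio of target densities.

```python
# Minimal random-walk Metropolis sketch (toy target, illustration only).
import math
import random

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Draw dependent samples from a density known up to a constant:
    propose x' ~ Normal(x, step), accept with prob min(1, p(x')/p(x))."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept or reject based on the ratio of target densities.
        if math.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

# Example target: standard normal, log density up to a constant.
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000)
mean = sum(draws) / len(draws)
print(round(mean, 2))  # should be near 0.0 for a long enough chain
```

Each step needs only two density evaluations and a coin flip; no integration, which is exactly why the method fit computers rather than pencil-and-paper analysis.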
So, as a result, during the 1980s, as computers
were pouring out this fascinating new data
about pulsars and plate tectonics and evolutionary
biology and pollution in the environment,
it was often not analyzed by statisticians.
It was analyzed by computer scientists, by
engineers, physicists, and biologists.
And it would be imaging that would force the
issue because by the late '70s, early 1980s,
industrial automation, the military, and medical
diagnostics were producing blurry images from
ultrasound machines, PET scans, MRIs,
electron micrographs, telescopes, military
aircraft, and infrared sensors.
And there was Bobby Hunt, who in 1977 finally
suggested that Bayes could be used for image
restoration. He had done it while working
on strategic weapons programs and digital
image processing at Sandia and Los Alamos
National Labs in New Mexico.
During this period, others were introducing
iterations and Monte Carlo. And in 1984, if
I can get the next slide--. [pause] Sorry.
It's just a picture of Adrian Smith and Alan
Gelfand. Thank you. [pause] They decide to
start something new. And Gelfand, reading
around, discovers the iterations and Gibbs
sampling.
Adrian Smith, on the left, was at the University
of Nottingham at the time; Alan Gelfand, on
the right, at the University of Connecticut.
And the minute Gelfand saw the papers on
iteration and Gibbs sampling, he said, all
the pieces fell together: Bayes, Gibbs sampling,
Monte Carlo, Markov chains, iterations.
And they wrote their watershed synthesis,
now called MCMC for Markov chain Monte Carlo,
very, very fast 'cause they were scared other
people would put the pieces together, too.
But they also wrote it very carefully. They
used the word 'Bayes' only five times in 12
pages.
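The Gibbs sampling piece of that synthesis can be sketched with a toy example not taken from their paper: a bivariate normal with correlation rho, sampled by repeatedly drawing each variable from its known conditional given the other.

```python
# Toy Gibbs sampler sketch: each variable is drawn in turn from its
# conditional distribution given the current value of the other.
import math
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """For a standard bivariate normal with correlation rho:
    x | y ~ Normal(rho * y, 1 - rho^2) and symmetrically for y."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)  # draw x given the current y
        y = rng.gauss(rho * x, sd)  # draw y given the new x
        samples.append((x, y))
    return samples

draws = gibbs_bivariate_normal(rho=0.8, n_samples=50_000)
corr_estimate = sum(x * y for x, y in draws) / len(draws)
print(round(corr_estimate, 2))  # should be near 0.8
```

The joint distribution is never evaluated directly; cycling through easy conditional draws is what made high-dimensional Bayesian models computable.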
And Gelfand told me later, "There was always
some concern about using the 'B' word, a natural
defensiveness on the part of Bayesians in
terms of rocking the boat. We were always
an oppressed minority trying to get some recognition.
And even if we thought we were doing it the
right way, we were only a small component
of the statistical community and we didn't
have much outreach into the scientific community."
But Bayesians thought the paper was an epiphany,
and the next ten years saw what I call a
frenzy of research, solving and calculating problems
that for two and a half centuries had only
been dreams. Gelfand, of course, says that
they were lucky because the relatively inexpensive
powerful workstations became available at
the same time.
Then Smith's student, David Spiegelhalter,
came out with his BUGS software, BUGS standing
for Bayesian inference Using Gibbs Sampling.
It was off-the-shelf software. He comes out
with the first one in 1991 and it's BUGS that
causes the biggest single jump in Bayesian
popularity.
And it sends Bayes out into the scientific
and technological world, where outsiders
from computer science, from physics, artificial
intelligence, refresh it, broaden it, secularize
it, de-politicize it and it gets adopted almost
overnight. It was a modern paradigm shift
for a very pragmatic age.
It happened overnight, not because people
changed their minds about a philosophy of
science, but because finally it worked. The
battle between Bayesians and Frequentists
subsided. Researchers could adopt the
method they thought fit their needs best.
Prominent Frequentists moderated their positions.
Bradley Efron, a National Medal of Science
recipient who had written a classic defense
of Frequentism, recently said, "I've always
been a Bayesian."
[laughter]
And someone else, who once called Bayes "the
crack-cocaine of statistics [laughter] --seductive,
addictive, and ultimately destructive" began
recruiting Bayesian interns for Google. Thank
you.
[applause]
Now I'd be happy to try to answer some questions,
but it would be very kind of you if you'd
use the microphone so people could hear the
questions. And I wouldn't have to maul them,
summarizing them.
>>MALE AUDIENCE MEMBER #1: So what was the
Neolithic discovery?
>>Sharon Bertsch McGrayne: It was that it
had all happened suddenly, within a
short time period five thousand five hundred
years ago. And that instead of their being
used for rites over the ages and built over
the ages, they'd actually been built
and abandoned in this very short time period.
And they're so excited by this Bayesian mega-analysis
that they've done that they're gonna try another,
on the early Anglo-Saxon period in Great Britain,
next. Yes.
>>MALE AUDIENCE MEMBER #2: OK. So--.
>>Sharon Bertsch McGrayne: Now remember I'm
not a mathematician.
[laughter]
>>MALE AUDIENCE MEMBER #2: So, my training
actually is mathematics and we have, our controversies
are, we have what's called the axiom of choice
which is about the behavior of infinite sets
and the continuum hypothesis and all these
other sort of questioned axioms.
And if you assume not the axiom of choice,
you get one mathematics. If you assume it,
you get a different mathematics. But mathematicians
will all agree that these are just models.
These are abstract ideas. It's not that this
axiom is true or not true 'cause it's formalisms.
In the real world, you can't even substantiate
an infinite set anyway. How come with the
statistical community, 'cause in the math
community it was like, whether it's useful.
It's like, you can extend math this way or
that way. Which one is more useful as a model?
Why weren't they able to say, "OK, Bayesianism
might not make sense in some platonic way."
Or it might, there's selecting the priors,
which is more of an art than a science. But
just based on the fact that it's useful, why
weren't people able to go with it? [pause]
>>Sharon Bertsch McGrayne: It's very odd because
a whole class of people, physicists, did use
Bayes'. And they used it even before the Second
World War. Fermi used to do computations if
he couldn't sleep at night. He'd do Bayesian
computations.
[laughter]
And then he'd come in the next morning and
announce how the experiments of the day were
probably gonna come out. Well, we're not all
Fermis, but--. And during the war, Los Alamos
used them.
And after the war, the physicists actually
tried to get statisticians to use them.
There were NATO conferences and so on. So,
it's very puzzling. There were articles that
told statisticians you can do it manually.
There were non-statisticians. There were historians
in the '60s using computers.
I think they were trapped. They were so defensive.
They were such a small group and they were
under attack. That's not a good way to be,
mentally. You can't burst out if you're under--
if your fortress is under attack. That's the
only thing I've been able to conclude.
>>MALE AUDIENCE MEMBER #3: So, you mentioned
in your book, but not in the talk, Lindley's
paradox and this research at Princeton
about psychokinetic random number generators:
according to a Frequentist analysis, it was
significant that people had random-number-generating
abilities, but Bayesians
said that this is silly and the same study
proves that it doesn't exist.
Would you go into that more? Like, the study
was cooked, then. Bad. Garbage in, garbage
out. But was this real data? [pause]
>>Sharon Bertsch McGrayne: Was it real data?
Well, I guess he had his machines recording
it, but I think the statisticians don't
regard it as real data at all. The same
question came up just recently with--come on,
what's it called--ESP. And a Frequentist
analysis said that ESP could work.
It was something routed in the last year,
I think. So, it's a long-standing problem,
but Dennis Lindley I think cooked that guy's
data and said it was garbage. Garbage in and
garbage out.
[laughter]
Yes, there's some papers about it and they're
in the bibliography.
>>MALE AUDIENCE MEMBER #4: To follow up on
that, there's an article on this very subject
in the latest issue of the Skeptical Inquirer.
>>Sharon Bertsch McGrayne: Good. Good.
>>MALE AUDIENCE MEMBER #4: I'll try to find
my copy and bring it in. I just wanna remark
about history. I've been in computing since
1964 and when I was at Cornell, there was
a tremendous, there was actually--. Oh, my
phone's going crazy.
There were actual random walk problems in
the textbook, the programming textbook, and
the social scientists were just generating
reports this big. I don't exaggerate. I hauled
them around--computer analyses, factor analyses,
crosstabs, and all kinds of crazy stuff.
So, I think there's this whole background
of -- that was going on underneath this controversy
that was preparing us for it.
>>Sharon Bertsch McGrayne: Howard Raiffa gives
his--. I was in a totally different kind of
meeting and someone asked what I was working
on and I said Bayes' rule. And he said, "Oh,
Bayes' rule? Howard Raiffa taught Bayes' rule
at Harvard Business School in the 1960s."
And he dug out of his notes this reference,
too. It's 60 or 80 pages on Bayes, Markov
chains and so on. So, you're absolutely right.
So it's such a puzzle why--. [pause] I really
think it must have been the fortress mentality
that -- it's just hard to break out of that.
If there aren't any more questions, I would
be happy to sign anyone's books if they want
me to.
>>Male Presenter: Thank you very much.
[applause]
