BORIS DEBIC: Welcome,
everybody to one more
Authors at Google talk.
Today with us, Brian
Christian and Tom Griffiths.
Their book, "Algorithms
to Live By,"
is now available in all the
fine bookstores in the area
and in the US and around
the world and whatnot.
Brian Christian is the author
of "The Most Human Human,"
a "Wall Street Journal"
bestseller, "New York
Times" editor's choice,
and "New Yorker"
favorite book of the year.
His writing has appeared in "The
New Yorker," "The Atlantic,"
"Wired," the "Wall Street
Journal," "The Guardian,"
and "The Paris Review," as
well as in scientific journals
such as "Cognitive Science."
He has been translated
into 11 languages.
He lives in San Francisco.
Tom Griffiths is a professor
of psychology and cognitive
science at UC Berkeley,
where he directs
the computational
cognitive science lab.
He has published more
than 150 scientific papers
on topics ranging
from cognitive psychology
to cultural evolution.
He has received awards from the
National Science Foundation,
the Sloan Foundation,
the American Psychological
Association, and the Psychonomic
Society, among others.
He lives in Berkeley.
So I'm not going to talk
too much about the book
because we have
Peter Norvig here,
who will explain
to you why this may
be the-- we call it the gateway
drug to computer science.
Come on, Peter.
PETER NORVIG: OK.
So you remember Apple
had this campaign that
said, "Think Different"?
And I think we
realize that we all
think different in some ways.
Right?
So we're all nerds or
whatever you want to call us.
And sometimes when you go out
with your friends and family,
you realize, oh, I think
differently than them.
And I think that's what
this book is about.
Boris talks about it
as a gateway drug,
and I think that's great.
That's a good way to think
about it because we're coming up
now-- there's another
hour of code thing,
and we're going to celebrate.
And we're going to try to
teach everybody to code.
But learning how to draw
a rectangle on the screen
and figuring out that
the third argument is
the color and the first two are
the x- and the y-coordinates,
that's not really it.
That doesn't change
the way you think.
I mean, that's a
good skill, to be
able to draw rectangles
and circles on the screen.
But really what's important is
being able to model the world.
Jeanette Wing talks about it
as computational thinking.
And I think there's a mix
between different types
of thinking.
There's this
computational thinking.
There's a mathematical thinking.
There's a statistical thinking.
And we all have
some combinations
of all of those things.
And those skills
together, I think,
are more important than
the details of the API
for JavaScript objects.
And this book really gets at it.
It doesn't teach
you how to code,
but it teaches you how
to think in that way.
And it gives you examples to
ponder about your everyday life
and why it might be
important to think that way
and what you can do with it.
So tell us all about
it, Brian and Tom.
[APPLAUSE]
BRIAN CHRISTIAN:
Thank you so much,
Boris and Peter, for
the introduction,
and thanks to Google for the
invitation to come speak.
The talk is "Algorithms
to Live By."
I'm Brian.
This is Tom, in case you were
curious which of us is which.
We sometimes get that question
if we've forgotten to introduce
ourselves at the top.
And the book opens
with an example
that I think will be acutely,
perhaps uncomfortably,
familiar to many of us
here in the Bay Area, which
is looking for housing.
So in a typical consumer
situation, the way
you make a choice is you
consider a pool of options.
You think hard about which
one you like the best.
And then you go with that.
But in a sufficiently
crowded real estate market,
in a sufficiently competitive
market-- which the Bay
Area certainly
is-- you don't have
the luxury of making the
decision in that way.
Rather, the decision
takes the form
of evaluating a sequence
of options one at a time,
going to a number of
different open houses,
and at each point
in time, you must
make an irrevocable commitment.
You either take the place that
you're looking at right there
on the spot, never knowing what
else might have been out there,
or you walk away,
never to return.
You have almost no chance
of going back and getting
the place.
This is certainly a much
more fraught setting
for making a decision because
here the critical question
that you have to ask yourself
is not which option to pick
but how many options
to even consider.
Intuitively, we have
this idea that you
want to look before you leap.
You don't want to make
a premature choice.
You want to get a sense
of what's out there.
But you don't want
to hold out too
long for some kind of perfect
thing that doesn't exist
and let the best
options pass you by.
So our intuition
tells us that we
have to strike some
kind of balance
between getting a feel
for what's out there
and setting a standard and
knowing a good thing when
we see it and being
ready to commit.
And the story that we get
from mathematics and computer
science is that this
notion of balance
is, in fact, precisely correct.
But our intuition
alone doesn't tell us
what that balance should be.
The answer is 37%.
If you want the very
best odds of finding
the very best
apartment, consider
exactly 37% of the pool of
options in front of you.
Or alternately, you can think
of it in terms of time--
37% of your time
just calibrating.
Leave your checkbook at home.
If you've given yourself a month
for the search, in this case,
that would be 11 days.
After that point, be prepared
to immediately commit
to the first thing you see
that's better than what
you saw in that first 37%.
This is not only the
intuitively satisfying
compromise between
looking and leaping;
it is the provably optimal
solution to this problem.
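For the curious, the 37% figure can be checked directly. Here is a minimal sketch in Python (our illustration, not code from the talk): for a pool of n candidates, the look-then-leap rule that observes the first k and then takes the first candidate better than all of them picks the single best with probability (k/n) times the sum of 1/(i-1) for i from k+1 to n; maximizing over k for n = 100 lands on k = 37.

```python
def success_probability(k, n):
    # Probability that looking at the first k of n candidates, then
    # leaping for the first one better than all of those, selects
    # the single best candidate overall.
    if k == 0:
        return 1.0 / n  # leaping immediately: first candidate is best 1/n of the time
    return (k / n) * sum(1.0 / (i - 1) for i in range(k + 1, n + 1))

n = 100
best_k = max(range(n), key=lambda k: success_probability(k, n))
print(best_k)                          # best cutoff: 37 out of 100
print(success_probability(best_k, n))  # success rate close to 37%
```

The coincidence that both the cutoff and the success rate hover near 37% (more precisely, 1/e) is a famous feature of this problem.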
So, for example,
if you're committed
to living in one of the painted
ladies here in San Francisco,
and they have their open
houses on successive weekends,
the algorithm tells you
to look at the first two,
and no matter how tempting
they seem, hold tight.
And then, starting
with the third one,
immediately leap
for the first one
better than what you
saw at the first two.
Now, more broadly,
this is known as what's
called an optimal
stopping problem.
And this structure
of encountering
a series of options
and being forced
to make a commitment one way or
another-- either you're all in
or you walk away--
some people have
argued that this is a structure
that describes not only
things like the apartment hunt
or real estate in general.
It also, many people have
argued, describes dating.
You're in a relationship,
and you have this decision
of when to commit.
Have you met enough people
to have a sense of who
your best match really is?
And so you can do some
back-of-the-envelope math
and say things like, OK, the
average expected lifespan
of an American is 79.
37% of that gives me 29.
And this roughly
divides my romantic life
into dating for fun
versus dating seriously
to really evaluate for a mate.
Of course, as we will see, it
all depends on the assumptions
that you're willing
to make about love.
TOM GRIFFITHS: So
as you've seen,
simple algorithms can solve
some of the problems which
we actually normally think about
not as problems for computers
but as problems for people.
So there are a set
of problems which
we have to solve just as
a consequence of the fact
that our lives are lived in
finite space and finite time--
so having to do things like try
and figure out how to organize
our house or our closet
or our office in a way
to make it most efficient
or trying to figure out
how to schedule our
time so that we can
do the most possible things.
And we normally
think about those
as being fundamentally
human problems.
But really, the argument
that we make in the book
is that they're not.
They have analogs
to the problems
that computers have to solve.
So, you know, your
computer has to figure out
how to manage its space--
its space on the hard disk
and its space in memory.
And it also has
to figure out how
to manage its time-- what
it's going to do next,
what program it's going to run.
And as a consequence,
computer scientists
have put a lot of thought into
coming up with good algorithms
for solving those problems.
So what we do in
the book is explore
how taking this perspective
gives us insight
into the problems that
human beings have to solve,
in some cases offering us
nice, elegant, simple solutions
to these problems,
like the 37% rule,
in other cases giving us new
ways of thinking about how
those problems might be
described in mathematical terms
and given, perhaps, a kind
of quantitative analysis.
So we consider a range of
different problems-- so
the optimal stopping problem,
which you've heard a little bit
about, as well as a
variety of other contexts
in which thinking
algorithmically actually
gives us a new insight into
a human sort of problem.
And what we're going to do
today is talk about just a few
of these problems.
In the book, we
also talk about some
of the more general
principles which
go into designing
good algorithms
or thinking about the
kinds of underlying ideas
that are useful
when we're trying
to engage with the world
in this computational way.
So the three problems that
we're going to focus on today
are optimal stopping, the
explore or exploit trade-off,
and caching.
And optimal stopping
you've already
heard a little bit about.
The 37% rule is
just one instance
of a strategy which is useful
for solving an optimal stopping
problem.
It's the solution
to a problem which
is known in the mathematical
literature as the secretary
problem.
So the canonical set-up
for this is that you're
trying to hire a secretary.
You have a pool of applicants.
Each applicant comes
in and is interviewed.
You don't have a way of
evaluating the applicants
except against one another.
So you can only make a
judgement of how good they
are based on the applicants
that you've seen so far.
And for each person
who comes in,
you have to make a
decision immediately as
to whether you hire that
person or whether you
dismiss them, in which case
you'll never see them again.
So this secretary
problem was first
presented to the
public in the 1960s
in a column that was
written by Martin Gardner.
But in the book, we trace the
history of this problem back,
and it turns out that
romance was at its core.
So what we actually found by
doing some archival research
is that the person who claims
to have originated the problem
is a mathematician
called Merrill Flood.
And the story that
he tells about it
is that his daughter,
who had just
graduated from high school,
was in a relationship
with a much older man, which
he and his wife
didn't really approve of.
And so he wanted to
somehow convince her
that this was a bad idea.
So being a
mathematician, he
approached the problem
through his work: his daughter
happened to be taking the
minutes at a mathematics
conference that Flood was
scheduled to present at.
And so he went up, and
he presented this problem
of a woman who's entertaining
a series of suitors
and faced with the
challenge of deciding
which of those suitors'
proposals she should accept.
And he didn't actually
know the solution
to the problem at that point.
But he was pretty sure that
the number was larger than one.
And so he was hoping that
his daughter would sort of
take the message as she
was writing it down,
and sort of think about how it
applied to her own situation.
So as we've seen, the
solution to this problem
turns out to be 37%.
But as Brian was
saying, a lot of that
depends on the
assumptions that you make.
And there's actually been
a history of mathematicians
seeking love that provide some
cautionary tales and perhaps
some insights into variants
on this problem, which
are things that are worth
paying attention to.
So one kind of
situation that you
can encounter when trying
to pursue this approach
is rejection.
And we actually found a
story of Michael Trick,
who's an operations researcher
at Carnegie Mellon University.
And he told the story of
applying the secretary problem
in very much the same way
that Brian was describing--
calculating the
period over which
he thought he'd be searching,
working out what 37% was.
And it happened that the age
at which he should switch
from having fun to being serious
was precisely his current age.
And so he went and
proposed to his girlfriend,
who turned him down.
So Trick ran into one
of the assumptions that
are being made here,
which is that when
you make an offer
to somebody, they
should be willing to accept it.
And only under
those circumstances
is the 37% rule valid.
If somebody is potentially
able to reject your offer,
then it changes the strategy
that you should follow.
And in particular, it means
that the period that you spend
looking should be reduced.
So, for example, if you have a
50% chance of being rejected,
then you should look
at the first 25%
of your potential
candidates, and then
be willing to make an offer
to the first person who's
better than anybody you
saw in that first 25%.
Another variant on
the secretary problem
is what's called recall.
And so this is having
the opportunity
to go back to somebody
who you passed over.
So you can imagine your
candidates come in, but rather
than dismissing them meaning
that you'll never see them
again, dismissing
them just decreases
the chance that you'll be
able to go back to them.
Or in the dating
setting, basically,
breaking up with somebody
means that maybe there's
a chance that they'd
take you back later on.
So there's actually
a mathematician
who experienced exactly
this phenomenon.
This is the astronomer
Johannes Kepler, who
after his first wife died went
through an extended period
of courting various
women, trying
to sort of evaluate and come up
with the person who he thought
was going to be the very
best person for him.
And so over an
extended period, he
ended up interacting
with 11 women.
And having sort of explored
those possibilities,
he decided, getting to
the end of that process,
that number five, who
he'd previously dismissed,
was actually the
best one for him.
So he went back, and he
made an offer to her.
And it turned out she hadn't
accepted any other proposals
in the meantime.
And so luckily, they
were able to be married
and had a happy
marriage together.
So this possibility-- that
you can actually go back
to somebody, and they
can still say yes--
changes where you should
set your threshold again.
So in this case, it
makes it something
where you should spend a
little longer looking around.
So, for example, if
there's a 50% chance
that somebody who
you go back to is
going to be willing to
accept your offer anyway,
then you should look
at the first 61%,
and then use that to set the
standard which you'll then use
for evaluating the remainder.
And then if you get to the
very end of this period
and haven't found
anybody, then you
go back and make an
offer to the person
who was the very best, which
you now know because you've
explored all of the options.
PETER NORVIG: Tom?
So if you had prior
knowledge of the distribution
that you're drawing
from, that should also
[INAUDIBLE] the period, right?
TOM GRIFFITHS: That's right.
So in this case, we're
assuming that the only way
that you can evaluate people is
relative to one another, right?
And so that's
something which then
leads to this kind
of strategy, where
you have to spend
some time building up
an impression of what the
distribution looks like.
So basically, pretty
much any variant
on this problem
you can imagine has
been explored by mathematicians
in the last 50 years.
The case that you're talking
about, where you have what's
called full information--
you actually
know what the
distribution is like-- is
one where the overall strategy
looks quite different.
So in that case, what you
should do is set a threshold.
And then as you go
through the pool
and your remaining
candidates become sparser,
that threshold becomes
lower and lower.
Some of us might
have experienced
this in the context of other
kinds of romantic interactions.
But it certainly makes
dire predictions, perhaps,
as you age and people
start getting married,
that you should be willing
to set a
slightly lower threshold
in terms of the way that
you approach the problem.
And there are also
variants which are
what are called partial
information versions,
where you come in not knowing
what the distribution is but
having some information about
what that distribution is.
And those turn out to
be some of the sort
of most interesting and
mathematically intricate cases.
So these examples
illustrate some
of the ways in
which the secretary
problem can be made
more complicated
and can accommodate some of
the other kinds of situations
that we might encounter in
realistic romantic scenarios.
But this is by no
means the limit
of optimal stopping in
terms of its implications
for human lives.
So basically, any
situation where
you have to make a decision
about whether to continue
to gather information or
to continue to consider
opportunities or to
act on the information
that you have so
far is going to be
one which is going to fit
this same kind of schema
of optimal stopping.
So another example is deciding
when to sell an asset.
So be it your house
or your company,
you have to make a
decision in a situation
where you might receive
a series of offers.
But in this case,
each of those offers
is going to cost
you money, right?
You're going to
be waiting around
for somebody to come along
and make another offer.
And your goal is to
maximize the amount of money
that you get out
of those offers.
So this can be formulated as
an optimal stopping problem.
You have to decide
for each offer
whether you're going
to take that offer.
And there's a
simple formula that
describes the strategy
that you should
take for solving this problem.
So basically, the strategy
takes the general form
of a threshold.
You should set a
particular value
based on your expectations
of the distribution of offers
that you're going to receive.
And then the first offer that
exceeds that value should
be the one that you accept.
This is more closely
analogous to the situation
that you were talking about.
So, for example, if you have,
say, a uniform distribution
between getting an
offer of $400,000
and an offer of $500,000,
then as a function of the cost
per offer, we can
actually work out
what this threshold looks like.
And so it gives you a very
simple kind of approach
that you can take in the context
of trying to decide whether you
should evaluate an offer.
And it also says that you
should never look back, right?
So if you turned down
an offer in the past,
that was because the offer
was below your threshold.
And as a consequence,
you shouldn't
regret having given up
on that opportunity.
You should just
be waiting around
for the next offer that
exceeds your threshold.
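That threshold has a short derivation in the uniform case. This is a sketch under simplifying assumptions (offers drawn uniformly between lo and hi, a fixed cost c for each additional offer you wait for, no discounting): keep waiting as long as the expected gain from one more offer exceeds its cost. At threshold t, the expected gain from one more draw is (hi - t)^2 / (2 * (hi - lo)); setting that equal to c and solving gives t = hi - sqrt(2 * c * (hi - lo)). The $2,000-per-offer waiting cost below is our own number, for illustration.

```python
import math

def sale_threshold(lo, hi, cost_per_offer):
    # Accept the first offer above t, where t equates the expected
    # gain from waiting for one more offer, (hi - t)^2 / (2*(hi - lo)),
    # with the cost of waiting for it.
    return hi - math.sqrt(2 * cost_per_offer * (hi - lo))

# Offers uniform between $400k and $500k, $2,000 cost per offer waited out:
print(sale_threshold(400_000, 500_000, 2_000))  # 480000.0
```

Note the threshold depends only on the distribution and the waiting cost, never on the offers already seen, which is exactly why the rule says you should never look back.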
Another example of an
optimal stopping problem,
which I think many of
us have encountered,
is the problem of figuring out
how to find a parking spot.
So in this case, the
kind of ideal scenario
that the mathematicians
imagine-- although, again,
there are lots of
variants-- is one
where you're driving
towards a destination,
and you kind of have a series
of possibly available parking
spots that are along
the side of the road.
And your goal is to minimize
the distance that you
have to end up walking.
So it turns out that the
solution to this problem
depends on what's
called the occupancy
rate, the proportion
of those parking spots
which are occupied.
And so for each occupancy
rate, what we can do
is identify at what
point you should
switch from driving to
being willing to take
the next available parking spot.
And so we calculate for you
a nice convenient table.
You can cut this out,
stick it on your dashboard.
But basically, you can get
some interesting conclusions
from this, such as if
10% of the parking spots
are free-- so a 90%
occupancy rate-- then
you should start looking
for a parking spot
seven spaces away
from your destination.
Whereas if 1% of those
parking spots are free,
then it's more like 70.
And maybe there's a
scenario down here
which fits to some
parts of San Francisco.
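The table's numbers can be approximated with a one-line rule of thumb (our sketch, not a formula from the talk): if each spot is taken independently with the given occupancy rate, start grabbing the first open spot at roughly the distance where the chance that every closer spot is full drops to one half.

```python
import math

def start_looking_at(occupancy_rate):
    # Distance (in spots) at which the probability that all nearer
    # spots are occupied falls to 50%: solve occupancy_rate ** d = 0.5.
    return math.ceil(math.log(2) / math.log(1 / occupancy_rate))

print(start_looking_at(0.90))  # 7, matching the 90% occupancy figure
print(start_looking_at(0.99))  # 69, in line with the "more like 70" figure
```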
So another famous problem which
has been analyzed in this way
is one which, unfortunately,
ruins the plot
of a lot of heist movies.
So this is the
problem of a burglar
who has to make a
decision about when
to give up a life of crime.
So there's a scene that happens
in a lot of heist movies
where the team has to sort
of coax the old thief out
of retirement.
The thief is kind
of going, well,
you know, I kind of
like living in my castle
and trying to figure out exactly
what they're going to do.
But to sort of spoil all
of those plots, in fact,
there's an exact
solution, and the thief
need only crunch the numbers to
work out what should be done.
And if you assume that you have
a thief who for each robbery
is going to get, on average,
about the same amount of money,
who has some probability
of succeeding
in executing each robbery,
and who, if caught,
loses everything, and if
the goal is to maximize
the expected amount of money
that you end up with,
then the optimal stopping
rule is to stop after
a number of robberies
equal to the probability
of success divided
by the probability of failure.
So if you are a ham-fisted
thief who succeeds about half
the time and fails
about half the time,
you could pull off one robbery.
And then you should just quit
if you got away with it
and you've been lucky.
But if you're a skilled thief
who succeeds 90% of the time
and fails 10% of the time, you
can pull off nine robberies,
and then that's the point at
which you should call it quits
and go and live in
the castle and not
listen to any of these
guys who come calling.
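The burglar's rule is simple enough to compute directly, under the talk's assumptions (same expected haul each time, independent success probability, total loss on capture):

```python
def robberies_before_retiring(p_success):
    # Optimal number of robberies: probability of success divided by
    # probability of failure, i.e., the odds of getting away with it.
    return p_success / (1 - p_success)

print(robberies_before_retiring(0.5))  # 1.0: the ham-fisted thief quits after one
print(robberies_before_retiring(0.9))  # about 9: the skilled thief can risk nine
```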
So these sorts of scenarios
might seem a little bit
artificial, right?
They're all cases where
you're kind of forced
into a situation where you have
to make a decision about when
to stop doing something.
But I think there's
a deeper point here,
which is that while we normally
think about decision-making
as being something
where we've got all
the information in front
of us, and it's just
a matter of choosing what
we're going to pursue,
in fact, in most
human decisions,
we're in a scenario which
is much more like one
of these optimal
stopping problems,
where we have to be
gathering information.
And one of the decisions
that we have to make
is whether we've got
enough information to act.
And so I think there's a
deeper point here about the way
that we think about the
structure of human decisions,
that often we're engaging in
exactly this kind of process
even though we might
not realize it.
BRIAN CHRISTIAN:
The second class
of problems that we
consider in the book are
ones that we describe as
explore/exploit trade-offs.
And this encompasses
a wide range
of problems where we get to
make a similar kind of decision
over and over and over
in an iterated fashion.
And typically,
that takes the form
of a tension between doing our
favorite things-- the things we
know and love-- or
trying new things.
And this type of a
structure appears
in a number of different
domains and in everyday life,
restaurants being
the obvious case.
Do you go to your
favorite place?
Or do you try the new
place that just opened up?
It happens in music:
do you
listen to your favorite album,
or do you put on the radio?
It also describes
our social lives.
How much time do you spend with
your close circle of friends,
your spouse, your family,
your best friends,
versus going out to
lunch with that colleague
that you just met and you think
you have something in common
and want to kind of foster
a new acquaintanceship?
This same trade-off
also occurs in a number
of different places in more
societal ways-- in business
and also in medicine.
So in business, this
comes up in the context
of ads, which I
hear is something
that you guys know a little
bit about here at Google.
So when you're deciding what
ad to run next to a keyword,
for example, you've got
this tension between there
is some ad that historically
has the best track
record of getting the clicks.
But there's also this
ever-changing dynamic pool
of new ads that are
entering the system that
might be worth trying.
They might be better.
They probably aren't,
but they could be.
In medicine, this
structure describes the way
that clinical trials work,
where for any condition,
there is some known best
treatment with some known
chance of success and
some known drawbacks.
And then there's
a series-- there's
kind of a pipeline of these
experimental treatments that,
again, might be
better, might be worse.
And so we have to trade off
between exploring-- that is,
spending our energy
gathering new information--
and exploiting, which is
leveraging the information
that we've gathered so far
to get a known good outcome.
The canonical
explore/exploit problem
that appears in the
computer science literature
is known as the
multi-armed bandit problem.
And this colorful
name-- I'm sure
this is familiar to
some of you in the room.
But the name comes from the
moniker of the one-armed bandit
for a slot machine.
So a multi-armed bandit you
can think of as just a roomful
of different slot machines.
So the basic setup
of the formal problem
is this-- you walk
into a casino.
There's all sorts of different
slot machines, each of which
pays off with some probability.
But every machine is different.
Some of them are more
lucrative than others.
But there's no way for you
to know that until you just
start pulling the
levers and seeing
which ones seem more promising.
So again, here's
this case where you
have to trade off
between gathering
information and leveraging
the information that you have.
And so there's this
question, which
vexed an entire generation
of mathematicians, of, OK.
Let's say you walk into the
casino, and you have 100 pulls.
You're there for an afternoon.
You have enough
time for 100 pulls.
What strategy is going to give
you the highest expected payout
before you leave the casino?
In fact, for much
of the 20th century,
this was considered unsolvable.
And during World War II, Allied
mathematicians in Britain
joked about dropping the
multi-armed bandit problem
over Germany as the
ultimate instrument
of intellectual
sabotage to just waste
the brainpower of the
German scientists.
And I think one of the
simplest explanations
of the way in which this problem
can be very tricky to think
about is the following choice.
So let's say you've played
one machine 15 times.
And nine times, it paid out.
Six times it did not.
Another machine
you've played twice.
Once it paid out.
Once it didn't.
Now, if we just want to very
straightforwardly compute
the expected value of
each of these machines,
the nine-and-six machine's
got a payout rate of 60%.
The one-and-one machine
has a payout rate of 50%.
And so there are these two kind
of competing intuitions here.
One is, well, you
should obviously
just do the thing with
the better expected value.
The other is, well, there's
a sense in which we just
don't know enough about
the second machine
to walk away from it forever.
Certainly, it must be worth one
more pull or two more pulls.
How do you decide what
that threshold is?
And it turns out this is,
in a way, a trick question
because it all depends on
something that we haven't given
you in this description
of the problem yet,
which is how long you
plan to be in the casino.
So this is a concept
that sometimes
gets referred to as the horizon;
in the book, we refer to it as the interval.
And this concept has,
just speaking personally,
given me a bit of
an axe to grind
with one of my favorite films
from my own childhood, which
is the inspirational 1980s Robin
Williams movie, "Dead Poets
Society."
It's one of these
really feel-good movies,
and he plays this
inspiring poetry teacher
who says to his students
in this rousing monologue,
"Seize the day, boys.
Make your lives extraordinary."
And going back through the
lens of the multi-armed bandit
problem, I can't help feeling
that Robin Williams is actually
giving two conflicting
pieces of advice here.
If we're just trying
to seize the day,
we probably want to pull
that nine-six because it's
got the higher expected value.
But if we want to make
our life extraordinary,
then we should
certainly see if there
isn't some value in trying
these new things because we
can always go back.
In standard American English,
we have all of these idioms
like "eat, drink, and be
merry, for tomorrow we die."
But it feels that we're missing
the idioms on the explore side
of the equation, which are
things like, life is long,
so learn a new language and
reach out to that new colleague
because who knows what could
blossom over many years' time.
We're still honing
the messaging,
but it does feel like there's
a gap in the culture that
can be filled here.
So when you're working
with a finite interval,
the solution to the
multi-armed bandit problem
comes from a method
described by Richard Bellman,
dynamic programming,
where you basically work
backwards and are
able to compute
the expected value of every
pull given all of the possible
pulls that you could make as a
result of whether that succeeds
or fails.
And you can actually work
out the expected value
all the way back to walking in
the door of the casino: what
should you do?
And this provides
an exact solution
to the multi-armed
bandit problem,
but there's a catch, which is
that it requires that you know
exactly how long you're
going to be there
and exactly how many
machines there are.
And it also requires doing a
lot of computation up front.
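That backward-induction idea can be sketched compactly (our illustration, not code from the talk): a two-machine Bernoulli bandit with uniform Beta(1, 1) priors over each machine's payout rate, solved exactly by working backwards from the last pull.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(w1, l1, w2, l2, pulls_left):
    # Expected total payout under the optimal policy, given win/loss
    # counts on each machine (Beta(1,1) priors) and pulls remaining.
    if pulls_left == 0:
        return 0.0
    p1 = (w1 + 1) / (w1 + l1 + 2)  # posterior mean payout rate, machine 1
    p2 = (w2 + 1) / (w2 + l2 + 2)  # posterior mean payout rate, machine 2
    pull_1 = (p1 * (1 + value(w1 + 1, l1, w2, l2, pulls_left - 1))
              + (1 - p1) * value(w1, l1 + 1, w2, l2, pulls_left - 1))
    pull_2 = (p2 * (1 + value(w1, l1, w2 + 1, l2, pulls_left - 1))
              + (1 - p2) * value(w1, l1, w2, l2 + 1, pulls_left - 1))
    return max(pull_1, pull_2)

# With only one pull left, just take the machine with the higher
# posterior mean: the nine-and-six machine, at 10/17 (about 0.588).
print(value(9, 6, 1, 1, 1))
```

Evaluating the same recursion with more pulls remaining shows how the horizon shifts the balance between the well-tested nine-and-six machine and the barely-tried one-and-one machine, which is exactly the horizon-dependence described above.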
But I think the
critical thing is
that we're able to look
at these solutions,
these exact solutions, and get
some broader principles out
of them.
So, for example, the
value of exploration
is greatest the minute you
walk through the door for two
reasons.
The first is that if you think
about it in the restaurant
analogy, you just
moved to Mountain View.
You go out to eat that night.
The first place you
try is guaranteed
to be the best
restaurant you've ever
experienced in Mountain View.
The next night, you
try a different place.
It has a 50% chance
of being the best
restaurant you've ever
seen in Mountain View,
and so on and so forth.
So the likelihood
that something new
is better than what
we already know about
goes down as we gain experience.
The second reason
that exploring is
more valuable at the
beginning of an interval
is that when we
make that discovery,
we have more chances to go back.
So discovering a really
amazing restaurant
on your last night in town is
actually a little bit tragic.
It would have been great
to find that sooner.
And so this gives us
this general intuition
that as we perceive ourselves
to be on some interval of time,
we should kind of
front-load our exploration
and weight our
exploitation to the end,
when we both have the most
experience with what to exploit
and the least time remaining
to discover and enjoy something
new, even if we did find it.
This is significant, I
think, because it gives us
a new way of thinking about
the arc of a human lifespan.
And so ideas from the
explore/exploit trade-off
are now influencing
psychologists and changing
the way we think about
both infancy and old age.
So to demonstrate
infancy, we have a picture
of a baby eating a power cord.
And I think this
demonstrates what
a lot of us kind of
culturally intuitively
think of as the
irrationality of babies.
They're totally-- have the
attention span of a goldfish.
They put everything
in their mouth.
They're distracted
really easily.
They're really bad at just
generally doing things.
We give the example in a
book-- like a baby gazelle
is expected to be
able to run away
from a wolf within the
first day of being alive.
But, you know, it
takes us 18 years
before we're allowed
to get behind the wheel
of a car, that kind of thing.
And the psychologist
Alison Gopnik uses ideas
from the explore or
exploit trade-off
as a way of saying, well,
maybe this extended period
of dependency is actually
optimal in some sense
because for the first 18 years
that you're in the casino,
someone else is buying
your food and paying your rent.
And so you don't need to be
getting those early payouts
to buy your lunch.
And so you can really use that
period of time to explore,
which is exactly what
you should be doing
at the beginning of your life.
There's a sense in
which just putting
every item in the
house in your mouth
at least just once
sort of resembles
walking into the casino and
just pulling all of the levers.
Similarly, the idea
of exploitation
is changing the way we
think about getting older.
So here we have a gentleman
who I like to think of
as enjoying the same
lunchtime restaurant
that he's been to
hundreds of times.
And he knows exactly
what he's going to get
and exactly what he
likes, and it's great.
There's a lot of
psychological data
that says that as
we go through life,
older folks have a smaller
circle of social connections.
They spend their time
with fewer people.
And there's one interpretation
of this that just says,
well, they're lonely, or they're
detached or disinterested,
or it's just kind
of sad to get older.
But thinking about it from the
perspective of exploitation
gives a totally different story.
And this comes up in the work of
the Stanford psychologist Laura
Carstensen, who studies aging in
an attempt to sort of overturn
some of the prejudices we have.
And so, for example,
one of the intuitions
you get from the idea of the
explore or exploit trade-off
is that towards the
end of your life,
you really should
be spending more
of your time doing the things
you already know and love
both because it's unlikely
you'll make a discovery that's
better than the things you
already really care about
and also because
there's less time
to enjoy it should you make one.
And so it just makes more sense
to spend more of your energy
on the things that you
already know and love.
And so as a result,
what I think is actually
a very encouraging story here
is that as you spend more time
in the casino, your average
payouts per unit of time
should go up.
So there's a sense in
which we should actually
expect to get steadily
happier as we go through life.
We are less disappointed
and less stressed out.
And her research supports this,
which I think is really lovely.
Now, there are many cases
where we don't necessarily
know where we are on
the interval of time,
or it doesn't
necessarily make sense
to think of there being
some finite interval.
So maybe you move
to Mountain View,
and you don't know
how long you're
going to live in Mountain View.
Or if you're a
company, you imagine
yourself being interested in
being around indefinitely.
But nonetheless,
there's still a sense
in which you care about the
present more than the future.
This framing of the problem
led to a different series
of breakthroughs, starting
with an Oxford mathematician
named John Gittins,
who was hired
by the pharmaceutical
company Unilever
to tell them, basically,
how much of their money
to invest in R&D. And
he almost accidentally
made this enormous
breakthrough in the problem
by thinking of it
not as there being
some finite interval of
time but as there being
some indefinite future that
is geometrically discounted.
So I don't know
how long I'm going
to be living in the Bay Area.
But a really good meal
next week is maybe only 90%
as valuable as a really
good meal this week.
Or making X dollars
next quarter is only 90%
as valuable as making
those dollars this quarter.
And it turns out
that, again, you
can get a very precise
solution to this problem.
So he explored it in
this business context.
But it's also, I think,
very interesting that it
was a pharmaceutical context,
because this also gives us
a way of thinking
differently about how
a clinical trial should be run.
The basic idea behind
Gittins's breakthrough here,
which is called
the Gittins index,
is that he imagines-- the
word we use is a bribe.
So if you think about-- there's
a game show that is on TV
called "Deal or No
Deal," where you
have a briefcase that
has somewhere between one
and a million dollars in it.
And someone calls you
on the phone and says,
I will pay you $10,000 not
to open that briefcase.
What do you do?
This is the basic intuition
behind the Gittins index.
So Gittins says,
for every machine
that we've tried and have
incomplete knowledge of--
or maybe no knowledge
of whatsoever--
there's some guaranteed
payout that's so good,
we'd never try that
machine ever again.
Maybe the nine-and-six versus
the one-and-one is more
of a toss-up.
But if it were nine-and-zero
versus the one-and-one,
then there's a sense
in which maybe it's
just never worth pulling the
one-and-one machine ever again.
And so Gittins comes
up with what he thinks
is a nice approximation,
which is just always
play the machine with
the highest bribe price.
And to his own astonishment,
as well as the field's, this
turns out to be, in fact, not
merely a good approximation
but the solution.
So this is another case where,
like with the parking problem,
we present this
table in the book,
and you can cut it out
and take it home with you.
And it provides
values for situations
where you're trying to
weigh the value between two
different options.
And so going back to our
two slot machines, the nine-
and-six machine has a
Gittins index of 63.00%.
But the one-and-one machine
has a Gittins index of 63.46%.
So case closed.
Pull the one-and-one
machine one more time.
What I think is kind of
philosophically significant
about this is the
zero-zero square has
a value of 70.29%, which
means that you should consider
something you've never tried as
being just as good as something
that you know works
70% of the time,
even though it only has
an expected value of 50%.
And you can see if you follow
the diagonal down to the right,
it goes from 70% to
63.46% to 60.10% to 58.09%.
And it does indeed converge
on 0.5 as you gain experience.
But there's this boost applied
for not having that experience.
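For the curious, these table values can be reproduced with the standard "calibration" computation: binary-search for the bribe at which playing on and taking the guaranteed payout are equally attractive. This Python sketch assumes the setup described here, a uniform prior over the machine's payout probability and a 90% geometric discount; the function name, truncation horizon, and tolerances are our own choices, not anything from the talk:

```python
def gittins_index(wins, losses, gamma=0.9, horizon=150, tol=1e-4):
    """Approximate the Gittins index of a slot machine with `wins` and
    `losses` observed so far, a uniform prior over its payout
    probability, and future rewards discounted by `gamma` per pull.

    Binary-searches for the guaranteed payout rate (the "bribe") at
    which playing on and retiring are equally attractive."""
    a0, b0 = wins + 1, losses + 1      # Beta posterior from a uniform prior

    def value(a, b, lam, memo):
        if a + b >= horizon:           # truncate: rewards this deep are
            return max(lam, a / (a + b)) / (1 - gamma)  # discounted to ~0
        if (a, b) in memo:
            return memo[(a, b)]
        p = a / (a + b)                # posterior chance of a win
        retire = lam / (1 - gamma)     # take the bribe forever
        play = p * (1 + gamma * value(a + 1, b, lam, memo)) \
            + (1 - p) * gamma * value(a, b + 1, lam, memo)
        memo[(a, b)] = max(retire, play)
        return memo[(a, b)]

    lo, hi = 0.0, 1.0
    while hi - lo > tol:               # search for the indifference point
        lam = (lo + hi) / 2
        if value(a0, b0, lam, {}) > lam / (1 - gamma) + 1e-12:
            lo = lam                   # playing still beats this bribe
        else:
            hi = lam
    return (lo + hi) / 2
```

Under these assumptions, `gittins_index(0, 0)` comes out near 0.7029 and `gittins_index(1, 1)` near 0.6346, matching the quoted values.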
So we tongue-in-cheek suggest
that you just print this out
and just use it to
decide where to eat.
But there are
sometimes some problems
that come up with this.
One is that we don't always
geometrically discount
our payoffs.
The other is that
actually computing
these values on the fly is kind
of computationally intensive.
And so there's a third way of
thinking about the problem that
has come in the wake
of Gittins's work,
and that is the idea
of minimizing regret.
So we give a lot of
examples that touch
on Google's work, but the
best illustration of this
actually comes from Amazon.
So Jeff Bezos talks about being
in this really lucrative hedge
fund position and deciding
whether to give up
his cushy job to start
an online bookstore.
And he approaches it from what
he calls a regret minimization
framework.
"I knew looking back
I wouldn't regret it."
In the context of the
multi-armed bandit problem,
you can formulate
regret as every time
you tried something that wasn't
the best thing in hindsight.
And so you can ask yourself,
what does my regret
look like as I proceed
through my time in the casino?
And we have good
news and bad news.
We'll start with the bad news.
The bad news is that you will
never stop getting more regrets
as you go through life.
The good news is that the rate
at which you add new regrets
goes down over time.
Specifically, if you were
following an optimal algorithm,
the rate at which
you add new regrets
is logarithmic with
respect to time.
And this has led to a series
of breakthroughs by computer
scientists looking for simpler
solutions than the Gittins
index that still have this
optimality of minimal regret.
One of our favorites, which
I think is the most thematic,
is called upper
confidence bound.
And this says that for every
slot machine-- you know,
you've got some error
bars around what
you think the payoff might be.
So the expected value would be
in the middle of that range.
But there's some error bars
on either side of that.
Upper confidence
bound says, simply,
always do the thing with the
highest top of the range.
Don't care about the
actual expected value,
and don't care about
the worst case scenario.
Just always do the thing with
the highest top of the range.
And I think that's sort
of a lovely, lyrical note
that the math brings us to,
which is that optimism is
the best prevention for regret.
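As a sketch of what that looks like in practice, here is UCB1, one standard upper-confidence-bound rule (due to Auer, Cesa-Bianchi, and Fischer): the "top of the range" it uses is the observed win rate plus sqrt(2 ln t / n), an error bar that shrinks as a machine gets played more. The payout probabilities and function name below are illustrative assumptions of ours:

```python
import math
import random

def ucb1(payout_probs, pulls=10_000, seed=0):
    """Play Bernoulli slot machines with the UCB1 rule: after trying
    each machine once, always pull the machine whose upper confidence
    bound -- observed win rate plus sqrt(2 ln t / n) -- is highest.
    Returns how many times each machine was pulled."""
    rng = random.Random(seed)
    arms = len(payout_probs)
    n = [0] * arms                     # pulls per machine
    wins = [0] * arms                  # wins per machine
    for t in range(1, pulls + 1):
        if t <= arms:
            arm = t - 1                # optimism: try everything once first
        else:
            arm = max(range(arms),
                      key=lambda i: wins[i] / n[i]
                      + math.sqrt(2 * math.log(t) / n[i]))
        if rng.random() < payout_probs[arm]:
            wins[arm] += 1
        n[arm] += 1
    return n
```

Run on machines with hidden win rates of 60%, 40%, and 30%, it ends up devoting the bulk of its pulls to the best machine, and the pulls it "wastes" elsewhere grow only logarithmically with time, which is the minimal-regret guarantee mentioned above.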
TOM GRIFFITHS: So
our third example
begins in a different place,
which is in the closet.
So I think all of us have
encountered a problem
of an overflowing closet.
Things need to be
organized, but you also
need to make a
decision about what
you're going to get rid of.
And in order to
solve this problem,
we'd like to be able
to turn to experts.
Fortunately, there are experts
on exactly this kind of thing.
So we could consult one of
these-- Martha Stewart, who
says to ask yourself a
series of questions--
how long have I had it?
Does it still function?
Is it a duplicate of
something I already own?
And when was the last
time I wore it or used it?
And then based on the
answers to these questions,
you can make a decision
about whether you should
keep that thing or not,
give it away to charity,
and, as a consequence, end up
with a more organized closet.
So there are a couple of
interesting observations here.
The first is that here there
are, in fact, multiple questions
that you should be answering.
And the answers
to these questions
could be quite different.
And the other is
that, in fact, there's
another group of experts who
have thought about exactly
these problems and come up
with slightly different advice.
In particular, they discovered
that one of these questions
is, in fact, far better
than any of the others.
So this other group of experts
don't think about closets.
They think about the
memory of computers.
This is a picture of
the Atlas computer, which
was built at Manchester
University in the 1960s.
And Atlas had an
interesting structure, where
it had two kinds of memory.
It had a drum which could
store information in a way
where it was very
slow to access.
And then it also
had a set of polarized
magnets which could be
used to store information
in a format which was
relatively fast to access.
And when they first
built the machine,
the way that they were using it
was to read off the information
that would be needed for a
computation from the drum,
and then store it in
the magnetic memory,
and then do the operations, and
then write it back to the drum,
and then take the next
part of the computation
and read off all of the relevant
information from the drum,
and then do the operations,
then write it back to the drum.
But a Cambridge mathematician
named Maurice Wilkes realized
that there was a better way
to solve this problem.
He realized that they
could make the whole system
work much faster if
they didn't always
take all of the information
out of the fast memory
and put it back into
the slow memory,
but rather they kept around
the pieces of information
which they thought
they were going to need
to use again in the future.
So the reason why
this speeds things up
is that then you
don't have to spend
the extra time reading
those things back
off the slow memory.
And as a consequence, the
computer runs much faster.
So this is an idea which
computer scientists now
recognize as caching.
So it's the idea of keeping
the information which you're
most likely to need in the
future in the part of memory
which is most easily and
most rapidly accessed.
But it brings with
it another kind
of algorithmic problem, which
is the problem of figuring out
exactly what those
items that you're
likely to need in the future
are going to be or, to put it
another way, what it is
that you should throw away.
And this is a
problem that's called
the problem of cache eviction.
So cache eviction
is something which
requires coming up
with good algorithms
for deciding what
we're not going
to need again in the future.
And the person who really
made the first breakthrough
in thinking about this is
this man, Laszlo Belady,
who was working at IBM.
So before he worked at IBM,
Belady had grown up in Hungary,
and then fled during
the revolution
there with only a bag containing
one change of underpants
and his thesis paper.
And then he ended up
having to emigrate again,
from Germany, where he had
moved, with very
minimal equipment.
He just had $100 and his wife.
So by the time he'd
reached IBM in the 1960s,
he'd built up a
significant amount
of experience in deciding
what it was that it made sense
to leave behind.
So Belady described
the optimal algorithm
for solving the problem
of deciding what you
should evict from your cache.
And this optimal algorithm
is essentially clairvoyance.
What you should do is evict
from the cache that piece
of information which
you are going to need
the furthest into the future.
So as long as you can
see into the future
as far as you need to go in
order to make that decision,
then you can solve this problem.
You can sort of do the
best possible solution
to this problem that
you can imagine.
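Belady's clairvoyant policy is easy to state in code, provided you are handed the entire future request sequence up front, which is exactly what real systems never have. A minimal Python sketch (the function name and example sequence are ours):

```python
def belady_misses(requests, capacity):
    """Count cache misses under Belady's clairvoyant policy: on a miss
    with a full cache, evict the item whose next use lies furthest in
    the future (items never needed again are evicted first)."""
    cache = set()
    misses = 0
    for i, item in enumerate(requests):
        if item in cache:
            continue                       # hit: nothing to do
        misses += 1
        if len(cache) >= capacity:
            def next_use(x):
                try:
                    return requests.index(x, i + 1)  # peek at the future
                except ValueError:
                    return float("inf")    # never needed again
            cache.remove(max(cache, key=next_use))
        cache.add(item)
    return misses
```

On any given request sequence, no real eviction policy can incur fewer misses than this, which is why it serves as the benchmark the practical algorithms below are measured against.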
Unfortunately,
when engineers have
tried to implement
this algorithm,
they've run into problems.
And so for mere mortals, we need
to have some different kinds
of algorithms.
And Belady actually evaluated
three different kinds
of algorithms-- one where
you just randomly evict items
from the cache, one where the
things which were first entered
into the cache are the
ones which first leave it,
and one where you evict
those things which
are least recently used.
That is, those items which have
been used the furthest distance
into the past are the ones
which leave the cache first.
And doing an
empirical evaluation
of these different
schemes, he discovered
that there was one clear winner,
which is the least recently
used algorithm.
So basically, the idea
is that the information
that you used least
recently is least
likely to be the
information which you're
going to need again in the
future as a consequence
of a principle that
computer scientists call
temporal locality-- basically
that if you just touched
a piece of information, you're
going to be likely to need
that information again
in the near future
because there's a kind
of correlation over time
in the pieces of information
which an algorithm might need
to access.
So taking this insight, you
can build caching systems
which work very
efficiently in a variety
of different situations.
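A least-recently-used cache itself is only a few lines: every access renews an item's place in the ordering, and eviction takes whatever has gone untouched the longest. This sketch uses Python's standard-library OrderedDict; the class and method names are our own:

```python
from collections import OrderedDict

class LRUCache:
    """A least-recently-used cache: every access renews an item, and
    eviction removes whatever has gone untouched the longest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()     # ordered oldest -> most recent

    def get(self, key):
        if key not in self.items:
            return None                # a miss
        self.items.move_to_end(key)    # touching an item renews it
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
```

With a capacity of two, putting "a" and "b", touching "a", then putting "c" evicts "b", the least recently used item, while "a" survives.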
So nowadays, caches are
used all over the place.
So if we look on
computers, we find
that you'll see multiple chips
that are dedicated to caching.
There are caches that are
built into hard disks.
There are other
kinds of caches which
are used in servers
for delivering websites
to people as quickly and
efficiently as possible.
But the one place where these
caching algorithms perhaps
haven't been applied, and perhaps
should be, is back in our closet.
And if we look at these
ideas that Martha provides
for how to organize
our possessions,
then as we go through
these possibilities,
it's clear that
one of them might
be a better recommendation
than the others, which
is when was the last time
I wore it or used it,
which is actually an
instantiation of the least
recently used principle.
So next time you're
thinking about trying
to organize your closet,
it might be worth
keeping this in mind, that as
long as your possessions obey
the same kind of principle
of temporal locality,
focusing on those possessions
that were least recently used
might be the best
predictor of the things
that you're least likely to
need again in the future.
So this kind of principle
isn't just something
which is useful in
thinking about how
to organize your closet.
It's also something
that might be
useful in thinking about
how to organize your office.
And this was a discovery that
was made somewhat accidentally
by a Japanese economist
called Yukio Noguchi.
So Noguchi is a tax economist.
He was constantly
receiving reports
and papers and documents
that needed to be filed away.
And he was kind of overwhelmed
with all of this information.
He didn't have time
to file it properly,
so he came up with
a simple solution,
which was just to put all
of those papers into a box.
But he didn't just
dump them into a box.
What he did was
actually put them
into a box in a
very orderly way.
So basically, he had a box
which was sort of horizontally
aligned.
And he put the information into
the box at one end of the box.
So as he'd get some
new papers, he'd
put those papers in
at the left-hand side.
And as a consequence,
you know, the papers
would sort of move down the
box as new things came in.
And then he did
something else important,
which is that as he used
one of those papers,
he'd pull it out of the box.
And then when he was
finished using it,
he'd put it back in again at
the left-hand side of the box.
So this has a clear connection
to least recently used caching,
right?
Once your box fills up, you
need to get rid of something.
And you can get rid
of the things that
hit the right end of
the box because those
are the things which you
used furthest in the past
and consequently, least likely
to need in the near future.
But this principle
of taking out a file
and then putting it back at
the left-hand side of the box
also corresponds to
another idea which
has shown up in theoretical
computer science.
So we can actually show
that this way of organizing
his information
is something which
is near optimal, or
at least as close
to optimal as we're
likely to be able to get.
So the actual data structure
that he had created here
is something that a computer
scientist would recognize
as a self-organizing list.
So basically, in a
self-organizing list,
you have a sequence of
pieces of information.
And then as you access
those pieces of information,
you have the opportunity
to change the order
that those pieces of
information appear in.
And so this idea of taking the
information that's accessed
and then putting it at the
very front of the list,
or at the left-hand
side of Noguchi's box,
is actually something
which turns out
to be a very
effective algorithm.
So Daniel Sleator
and Robert Tarjan
proved in 1985 that moving
the most recently used item
to the front of the list
is, at worst, twice as bad
as clairvoyance.
So clairvoyance is the best
that you could possibly do.
And it turns out, this
is the only algorithm
that comes with a
theoretical guarantee
of being at least
close to clairvoyance
in multiplicative terms.
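Noguchi's box corresponds to the move-to-front rule on a self-organizing list. A small Python sketch (the function name is ours) that tracks the total search cost:

```python
def move_to_front(initial, requests):
    """Serve `requests` from a self-organizing list with the
    move-to-front rule. The cost of a request is its 1-based position
    (you scan from the front); each accessed item then moves to the
    front, like a paper returned to the left end of Noguchi's box."""
    lst = list(initial)
    cost = 0
    for item in requests:
        pos = lst.index(item)
        cost += pos + 1                # pay for scanning to the item
        lst.insert(0, lst.pop(pos))    # renew: move it to the front
    return lst, cost
```

Starting from ["a", "b", "c"] and requesting "c" three times, the first lookup costs 3 and each repeat costs 1, since "c" now sits at the front, which is the behavior the Sleator-Tarjan guarantee bounds against the clairvoyant optimum.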
So if you're thinking about
implementing the Noguchi filing
system in your
office, it might be
reassuring to realize that
perhaps you already have.
So we normally think
about a messy office
as being a bad thing.
And in particular, a
giant pile of papers
on your desk like
this is something
which kind of seems like a
poor method of organizing
information.
But you can kind
of think about this
as taking the Noguchi system,
and then literally turning it
on its side.
Right?
So a big pile of papers is, in
fact, a self-organizing list.
And if you're taking
things out of the pile
and then sticking
them back on the top
and putting the most
recently used items
on the top of the
pile, then you're
implementing exactly this
relatively optimal strategy
for organizing the information
that's contained in that list.
So if you're somebody who
is familiar with these kinds
of messes, you'll
also be reassured
by some of the message in
our sorting chapter, where
we argue against sorting in
many domestic situations.
Thinking about
caching isn't just
useful for thinking about
how to organize information
around you.
It's also something which might
give you new insights into how
human memory works.
So I think there's an
intuition that a lot of us
would have about
memory, which is
kind of thinking
about it as something
sort of like Noguchi's box.
Right?
You've got kind of a limited
amount of space in your memory.
And so when you
learn something new,
you have to put it in there.
And when you do that, maybe
it pops something else out.
And as a consequence,
you forget that thing.
And so forgetting is
just a consequence
of kind of hitting a
capacity limit in terms
of the amount of information
that we can store.
But a cognitive scientist called
John Anderson has actually
proposed that there's a
different way of thinking
about how memory works and
thinking that the analogy is
less like a box of finite
size and more like a library
of infinite size.
So if you have a library which
has infinitely many books
arrayed along a sort of
linear shelf like this,
then the problem
that you have is
one not of figuring out
where to fit information
but rather one of figuring out
how to organize information.
Another good analog of this
is thinking about something
like web search, where you've
got a whole lot of web pages,
and you want to organize
those web pages in such a way
that you can find the
things that people
are likely to be looking for
with high probability close
to the top of the
list that you produce.
So from this perspective,
forgetting something
isn't that you've sort of
had that thing sort of pop
out of your memory but
rather that the time it
would take you to find
that thing is greater
than the amount of time that
you're willing to spend,
that the way that
information is organized
is not sufficiently
good to allow you
to identify that item quickly.
And so this makes an
interesting prediction,
which is that we
should be trying
to organize those
items in memory
such that the things that we
think we're most likely to need
in the future are going
to be the things which
we find easiest to recall.
And Anderson actually showed
some evidence for this.
So he took some famous data.
This is data from
an experiment which
was done in the 19th century
by an early psychologist,
Ebbinghaus, who
basically taught himself
some lists of
nonsense syllables,
and then would look at how well
he could recall those lists
some number of hours
after he'd memorized them.
And so what Anderson
did was look
at whether this pattern of
recall, where, basically, you
get this kind of rapid
falloff followed by a slow,
sort of long tail
could be predicted
by looking at how
likely it is that we're
going to encounter particular
pieces of information
in our environment.
And so what he did was go
to the "New York Times,"
look at headlines in
the "New York Times,"
and then look at how
likely a word was
to appear in the headline
in the "New York Times"
as a function of how
long ago it previously
appeared in those headlines.
So he could look at the
relationship between the number
of days that had elapsed,
and then the probability
of an item appearing.
And he found that this showed
a very similar kind of curve.
So this kind of correspondence
between the probability
that something is
likely to be needed
as a function of how
long ago it was used
and the patterns of recall
that we see in human memory
provides one suggestive
form of evidence,
that one of the things
that our minds are doing
is trying to organize
information in such a way
that we are keeping
around those things
that we are likely to
need again in the future.
BRIAN CHRISTIAN:
Thinking computationally
about the types of problems that
we encounter in everyday life
has payoffs at a number
of different scales.
The overarching
argument of the book
is there are deep parallels
between the problems that we
face in our lives and the
ones that are considered some
of the fundamental and canonical
problems in mathematics
and computer science.
And this is significant
because as a result,
there are these simple,
optimal strategies that
are directly relevant to those
domains in our own lives.
And there's something
very concrete
that we can use and learn from.
And even if we're not literally
going to stop at exactly,
you know, 37% and so
forth, having a vocabulary
and knowing what
an optimal stopping
or an explore or exploit
problem looks like
and having a general sense
of how the solution is
structured gives us a way to
bolster our own intuitions when
we find ourselves
in those situations.
And I think most broadly,
a lot of the solutions
that we explore
don't necessarily
look like what we think of when
we think of what computers do,
which gives us an opportunity
to actually rethink
our notion of
rationality itself.
So intuitively, we kind of have
this bias that being rational
means being exhaustive, exact,
deterministic, considering
everything, getting
an exact answer that's
correct 100% of the time.
In fact, this is
not what computers
do when they're up against the
hardest classes of problems.
This is kind of the
luxury of an easy problem.
And up against an
intractable problem,
computer scientists turn
to a totally different set
of techniques.
When we take into account the
cost, the labor of thought
itself, the best strategies may
not be to consider everything,
to think indefinitely, to
always get the right answer
in each situation.
We may want to, in fact,
trade off the labor
of the computation versus
the quality of the result.
And as the book progresses,
especially in the second half,
we look at what
these strategies are
for dealing with
intractable problems, which
most of the ones that we
face in real life are.
And that leads to
this conclusion,
that what computer
scientists do up
against the hardest
classes of problems
is they use approximations.
They trade off
the costs of error
against the costs of delay.
They relax some of the
constraints of the problem,
and they turn to chance.
These aren't the
concessions that we
make when we can't be rational.
These are what being
rational means.
We explore this line of
thinking through a number
of different domains.
We've talked about three today.
We also look at
sorting algorithms--
what do they tell you about
how to arrange your bookshelf
and, more importantly,
whether you should?
We look at scheduling theory.
Every operating
system has a scheduler
that tells the CPU what to
be doing when, for how long,
and what to do next.
So we look at what the
parallels are there
for thinking about time
management in our own lives.
And in the context
of Bayes's rule,
we think about problems of
predicting the future-- how
long a process will go on, how
much money something will make
based on what it's made so far.
And, on a personal
level, you've been dating
someone for a couple months.
It's going pretty well so far.
Is it premature to
book that weekend place
in Tahoe at the
end of the summer?
What should you do?
And we provide some rational
answers there, as well.
And unlike the types
of advice that you
might find in, for
example, self-help books,
the insights that are derived
from thinking computationally
about these problems
are backed by proofs.
Thank you so much.
[APPLAUSE]
BORIS DEBIC: Questions?
BRIAN CHRISTIAN: Do you want
to get more in the light?
Yeah.
AUDIENCE: So how useful is it to
have a proof of an abstraction
when real life doesn't
match the abstraction?
TOM GRIFFITHS: Yeah.
So I think the
way that we really
think about this is
there are definitely
simplifications that go into
formulating these problems.
But what you get out
of them is if you're
in exactly that
situation, you know
exactly what you should do.
But more generally,
you get insights
about how those solutions change
as a consequence of changing
the assumptions.
So, for example, I talked
about some of the variants
on the secretary problem.
And from that, you might
not know exactly what
the number is in your scenario.
But you can recognize, OK.
Well, if it becomes
more permissive,
then I should be more
willing to spend longer
looking before I start leaping.
And if it becomes
less permissive,
then I should have a much
more rapid transition
from those things.
And I think there's a
related question here,
which is if we think that
people should follow algorithms,
why don't we just
make computers that
will solve these
problems for people,
and then people don't have
to make any decisions at all?
And I think one of
the things that people
are really good
at is figuring out
how to interpolate between
these possibilities
and how to kind of evaluate
some of the fuzziness
around the particular
problems that we're facing.
And so we're really
providing tools
that can guide those
human capacities in terms
of thinking about solutions
to more realistic problems.
AUDIENCE: Very
interesting topic.
How long did it take you
guys to write this book?
BRIAN CHRISTIAN:
There's a quote that we
use at the beginning
of the book,
of the chapter on scheduling.
We have, the
epigraph says-- it's
from Eugene Lawler, who was
a researcher in scheduling
theory.
And he says, "Why don't we write
a book on scheduling theory,
I asked.
It shouldn't take much time.
Book writing, like
warmaking, often
entails grave miscalculations.
15 years later, scheduling
is still unfinished."
So I think Tom and I had
an analogous experience,
where we-- I think of the book
as really emerging in a dinner
that Tom and I had in
2011, where we-- I mean,
we've known each
other for 11 years.
These are shared
obsessions that we've
talked about for a long time.
But we basically
realized that we
wanted to write the same book,
what amounted to the same book.
And so that was when we decided
to team up and work together.
And we had what in hindsight
was a sort of predictably
naive sense of like,
oh, it should take
about 18 months or something.
It'll be out in 2013.
So here we are.
And you can tell how
that plan worked out.
TOM GRIFFITHS: I
should also point out
another terrible
thing that can happen
to a book is having
a small child.
So my wife and I
had our second child
somewhere in that process,
which threw everything off.
AUDIENCE: Are there any
rejected chapters in this book?
Are there any
areas of life where
you looked hard for an analog,
and it didn't work out?
TOM GRIFFITHS: Yeah,
that's interesting.
So our rejected
chapters are more
that there are lots and
lots of algorithmic ideas
that we would love
to write about,
but we just ran out of
space and time to include.
So we've actually
got a folder which
is called Sequel, which
is the only way that we
were happy cutting things,
so we could pretend.
And so that contains a bunch
of chapters and proto-chapters
that explored a lot of
different kinds of algorithms.
And in some cases, it was
that there are algorithms that
have already got good press.
And in other cases,
it was that we really
felt like there was a key
insight that those things could
give you about human lives.
But either you had to get
too far into the weeds
in order to get out what that
insight was or it was something
where-- like, basically,
the examples we give
in the book have been
sort of carefully
selected for exactly
the right blend of you
need to understand a
certain amount of math
in order to get a
practical payoff.
And some things just didn't
reach that threshold.
BRIAN CHRISTIAN: We
also had a beta version
of a chapter on data structures
that had a lot of great stuff
that we still kind of
wish we could have saved.
But it just didn't fit under
the rubric of the book.
So that, I guess, lives in
the Sequel folder now, too.
AUDIENCE: Do you feel
like you learned anything
about the limits or lack thereof
of what machine intelligences
could be capable of?
It seems like a lot
of these examples
right now involve pretty
specific parameters for what
the problem has to be,
but also that there's just
a ton of similarity between
how humans actually do things
and how computers
end up doing them.
BRIAN CHRISTIAN:
This is one where
I think Tom can speak as a
machine learning researcher.
But I would just
say, as a preamble,
I think exactly how
you've formulated
the question is how
I think of it, which
is that a lot of the
algorithms that we discuss
involve assumptions about
the distribution of values.
The Gittins index, the
way that we present it,
assumes a uniform distribution,
or the house-selling problem
assumes uniform distribution.
And often, the hardest
versions of these problems
are what are called partial
information problems,
where you have to
be making inferences
about what you think
the distribution looks
like on the fly based on the
values that you've encountered.
And those often turn out to
be the intractable problems.
And so there is a sense in which
human intelligence involves
making all of these
inferential leaps,
but also deciding
how to parameterize
the problem in the first place.
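[A minimal sketch of the full-information case Brian describes. Everything in it, the uniform [0, 1] offers, the per-offer cost, the function names, is an illustrative assumption, not something from the talk: when the distribution of offers is known, the house-selling problem collapses to a single fixed threshold.]

```python
import random

# House-selling sketch (illustrative assumptions): offers arrive one
# at a time, drawn uniformly from [0, 1], and waiting for each
# additional offer costs c. With the distribution known, the optimal
# rule is a threshold t where the expected gain from one more offer
# equals its cost:  c = (1 - t)^2 / 2  =>  t = 1 - sqrt(2 * c).

def optimal_threshold(c):
    """Accept the first offer at or above this value."""
    return 1 - (2 * c) ** 0.5

def sell(c, threshold, rng):
    """Simulate one sale; returns accepted offer minus waiting costs."""
    cost = 0.0
    while True:
        offer = rng.random()
        if offer >= threshold:
            return offer - cost
        cost += c  # reject and pay to see the next offer

rng = random.Random(0)
c = 0.02
t = optimal_threshold(c)
payoffs = [sell(c, t, rng) for _ in range(10000)]
print(f"threshold = {t:.3f}, mean payoff = {sum(payoffs)/len(payoffs):.3f}")
```

[The point of the sketch is the contrast: with the distribution known, the whole problem reduces to one number, while the partial-information versions are hard precisely because that threshold has to be inferred from the offers themselves as they arrive.]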
TOM GRIFFITHS: Yeah.
I think this also relates
to Peter's question.
So I think as you get closer
to the actual problems
that human beings solve,
you sort of start to hit
the edges of the theory.
Another good example of this
is in the explore or exploit
setting, where-- so if you
get people and put them
in an explore or
exploit scenario,
you find pretty consistently
that people deviate
from the predictions
that come out
of these sort of standard
multi-armed bandit models.
And they deviate in a
particular direction,
which is that they
tend to over-explore.
So kind of at the point where
the algorithm is committed
and said, hey, just
keep on pulling
that one lever, people
are still like, well,
I'm going to go try that one.
I'm going to try that one.
I'm going to try that
one, and still trying
out different options.
And so that looks
mysterious until you
realize that it's not that
people are being irrational.
It's that, in fact, the
assumptions of the model
are kind of wrong for
a real human situation.
So the assumption is that
a slot machine pays off
with a probability that's
fixed over time.
Whereas in a human environment,
the payoff probabilities
for different options are
things that change over time.
Right?
So they might be sort of
slowly drifting around.
A particular restaurant is
getting better or worse,
or a chef has changed, or
it's under new management,
or-- all of those
things are things
that over time mean that those
probabilities can be different.
And so if you're in a
situation like that,
then what you should
be doing is, in fact,
exploring more because you
need to go back and check
on the information that you had.
But that problem is what's
called a restless bandit
problem, and it's one
which still presents
a challenge for developing
sort of good machine learning
methods that can deal with it.
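[A hypothetical simulation of the point Tom makes, with every parameter, the drift size, epsilon, learning rate, and horizon, an illustrative assumption rather than anything from the talk: when payoff probabilities random-walk over time, an agent that commits after an initial sample falls behind one that keeps exploring and weights recent payoffs more heavily.]

```python
import random

# Two-armed "restless" bandit sketch: each arm's payoff probability
# drifts a little on every pull, so old information goes stale.

class Committed:
    """Sample both arms briefly, then pull the empirical best forever."""
    def __init__(self, explore=100):
        self.est = [0.0, 0.0]
        self.n = [0, 0]
        self.explore = explore
    def choose(self, t, rng):
        if t < self.explore:
            return t % 2  # alternate arms during the sampling phase
        return 0 if self.est[0] >= self.est[1] else 1
    def update(self, arm, reward):
        self.n[arm] += 1
        self.est[arm] += (reward - self.est[arm]) / self.n[arm]

class EpsGreedy:
    """Keep exploring forever; let old observations decay away."""
    def __init__(self, eps=0.1, alpha=0.2):
        self.est = [0.5, 0.5]
        self.eps, self.alpha = eps, alpha
    def choose(self, t, rng):
        if rng.random() < self.eps:
            return rng.randrange(2)  # occasional re-check of both arms
        return 0 if self.est[0] >= self.est[1] else 1
    def update(self, arm, reward):
        # Exponential moving average tracks a drifting payoff rate.
        self.est[arm] += self.alpha * (reward - self.est[arm])

def run(agent, rng, horizon=5000, step=0.1):
    probs = [0.5, 0.5]
    total = 0
    for t in range(horizon):
        arm = agent.choose(t, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        agent.update(arm, reward)
        total += reward
        # Each arm's payoff probability drifts a little every pull.
        probs = [min(1.0, max(0.0, p + rng.uniform(-step, step)))
                 for p in probs]
    return total

committed = sum(run(Committed(), random.Random(s)) for s in range(20)) / 20
explorer = sum(run(EpsGreedy(), random.Random(s)) for s in range(20)) / 20
print(f"committed: {committed:.0f}  keeps-exploring: {explorer:.0f}")
```

[Averaged over many runs, the persistent explorer comes out ahead, which is the sense in which human over-exploration stops looking irrational once the environment itself is restless.]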
So I think there's plenty
of room to take inspiration
from human cognition, as well as
room for making recommendations
to humankind.
BORIS DEBIC: And with
that, let's please
thank Tom and Brian
for coming to Google.
BRIAN CHRISTIAN:
Thank you so much.
