I'm not saying people are quantum devices,
maybe we are, who knows, but
I'm saying modeling these complex
problems as if they were discrete,
dice-game-like problems is wrong.
[MUSIC]
I'm gonna talk to you today about changing
some of the ideas of how we think
about intelligence and intelligent
behaviors manifested in machines.
We all want our computers, our phones,
our desktop computers to understand us.
We want them to understand us when we
give them commands and we would like them
to understand us well enough to
predict things that we're going to do.
And the basic message I'm
gonna give you here is that
statistics is not the right
set of tools for doing this.
So let me take you through it.
Statistics as probably many of you know,
is a very rigorous mathematical theory
based, more or less on counting.
The examples that you learned when you
study it always begin with gambling games.
So what is the probability of rolling a two with a pair of six-sided dice?
Well, there's 36 possible
outcomes when you roll dice and
only one of those outcomes yields a two.
So the probability of rolling
a two is one out of 36.
The probability of rolling a three,
well there's two ways to roll a three.
So the probability is two out of 36.
And of rolling a four?
Three out of 36.
And the probability of
rolling the two dice
such that the sum is between two and
12 is 100%, 1.0.
Every possible outcome can be enumerated.
So the world of chance games is
completely modeled by statistics and
this is the underlying foundation
of probability theory.
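As a quick check, the enumeration argument above can be written out directly; a small Python sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate every outcome of rolling a pair of six-sided dice.
outcomes = Counter(a + b for a, b in product(range(1, 7), repeat=2))
total = sum(outcomes.values())  # 36 possible outcomes

# Probability of each possible sum, as exact fractions.
probs = {s: Fraction(n, total) for s, n in outcomes.items()}

print(probs[2])             # 1/36
print(probs[3])             # 1/18, i.e. two ways out of 36
print(probs[4])             # 1/12, i.e. three ways out of 36
print(sum(probs.values()))  # 1, every possible outcome is accounted for
```

Because every state is enumerable, the probabilities necessarily sum to one, which is exactly the assumption the rest of the talk questions.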
Well, the mathematics that arises from these observations is commonly used.
We are very familiar with these.
To be less politically sensitive, I've chosen a historical example.
>> [LAUGH]
>> [COUGH]
And we know about these polls: they call you up, they ask you a question, what do you think of, what's your confidence level in, in this case, Bill Clinton, and what's your confidence level in Al Gore?
And if 76% of the people like
Al Gore then they basically
interpret this as a probability.
The probability of meeting someone on
the street and asking them if they
like Al Gore is, in this case,
76% and for Clinton, it was 58%.
Hm.
Well, we use a very similar set of
theories in artificial intelligence and
almost all of the really
exciting work that you
hear about an artificial intelligence
all these new capabilities
really ground themselves in the same
set of theories in counting.
So I'm skipping a lot of details, but more or less what happens when we understand language with machines is this: we take some particular word and we say, I want to understand what this word means.
Well, we basically go out on the web and
we count sentences in
which this word appears.
And we count all the other words
that this word appears with.
We call that the context.
And then to find the words that mean the same thing, we find all the words that co-occur with the set of words that alabaster co-occurs with, at the same rate.
So this is essentially what
alabaster means on the web.
And I don't mean to say that alabaster co-occurs with these words, but that these words co-occur with the same words that alabaster co-occurs with, at the same frequency.
That's how we derive the meaning of words in an automatic way.
And it's based on counting, this is just
the application of the same theory of
statistics for dice games to
counting words and deriving meaning.
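The counting procedure described above can be sketched in a few lines; the toy corpus and the cosine-similarity measure here are illustrative assumptions, not what any production system actually uses:

```python
from collections import Counter

# Toy corpus; in practice this would be billions of web sentences.
sentences = [
    "the statue was carved from white alabaster stone",
    "a vase of smooth alabaster stood on the shelf",
    "the statue was carved from pale marble stone",
    "a vase of smooth marble stood on the shelf",
]

def context_counts(word, corpus):
    """Count every word that co-occurs with `word` in a sentence."""
    ctx = Counter()
    for s in corpus:
        tokens = s.split()
        if word in tokens:
            ctx.update(t for t in tokens if t != word)
    return ctx

def similarity(w1, w2, corpus):
    """Cosine similarity between the two words' context-count vectors."""
    c1, c2 = context_counts(w1, corpus), context_counts(w2, corpus)
    dot = sum(c1[t] * c2[t] for t in c1)
    norm = (sum(v * v for v in c1.values()) ** 0.5 *
            sum(v * v for v in c2.values()) ** 0.5)
    return dot / norm if norm else 0.0

# "marble" appears with almost the same words as "alabaster",
# so by this counting measure the two words "mean" nearly the same thing.
print(similarity("alabaster", "marble", sentences))
```

The point is that the machinery is still counting, just like the dice: the meaning of a word is identified with a distribution over co-occurrence counts.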
And it works more or less.
But there's a problem here and
the problem is that the theory
that is grounded in dice games assumes that every possible state is enumerable.
That all the information you have in
the system is all the information there
possibly is.
But we know in reality that that is just not always the case; in fact, it's never the case.
There is always more information.
No matter how you circumscribe the world,
there's always more information
that may impact a decision,
a prediction that you need to make.
[COUGH] So let's take a slight turn into some studies done in psychology, and I'm going back to this very same poll.
So this was done in 1997, and in response to a request from psychologists, Gallup actually did something they had never done before.
They inverted the order in which
questions were being asked.
So they asked a bunch of respondents two
questions I don't remember the exact
questions, something like,
do you consider Clinton trustworthy, and
do you consider Gore trustworthy,
something along those lines.
And as you may be familiar with from statistics, you run these polls until you reach a certain confidence level, a p-value.
So you continue gathering data until
you've gained enough data that
the probabilities don't change much
when you start polling more people.
And once it stabilizes at some percentage, you gain a confidence, what we call a p-value, that your result is not going to change if you collect significantly more data.
And that's how these polls work.
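The "poll until the numbers stop moving" behavior is easy to simulate; the 76% approval rate below is just a stand-in for whatever the true population value is:

```python
import random

random.seed(1)
TRUE_RATE = 0.76  # hypothetical true approval rate in the population

def poll(n):
    """Ask n random respondents; return the fraction who approve."""
    return sum(random.random() < TRUE_RATE for _ in range(n)) / n

# As the sample grows, the estimate settles near the true rate and
# stops changing much -- that stability is what gives the confidence.
for n in (10, 100, 1000, 10000):
    print(n, poll(n))
```

Note that the simulation quietly bakes in the dice-game assumption: each respondent is a fixed coin with a single true rate, which is exactly the premise the question-order result below undermines.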
But in this case, they polled the same number of people, enough to give them high confidence that the numbers were correct.
When they inverted
the order of the questions,
they found a gigantic difference
in the percentages of people that
responded positively or negatively
with respect to these two questions.
And this can't be explained by the laws of statistics.
This was considered
a paradox of statistics.
People are irrational.
Some scientists said, what are you gonna do?
People are irrational, they can't make
up their minds whether they like or
dislike Clinton or Gore.
That's the problem here,
it makes no sense.
[COUGH] And the reason it doesn't
make sense is because it sounds like
a contradictory thing.
Because according to the theory of
statistics, what this result means
is that some people, and it wasn't
necessarily the same people being polled,
in fact,
it was not the same people polled.
But the way the laws of statistics work,
this implies that people can both like and
dislike the same person.
That doesn't seem to make any sense.
But the problem is that nobody
just likes or dislikes a person.
I mean, especially if you're talking about a political figure, but it applies to almost everything.
What you're doing is you're boiling down
something that's really quite complex
into a simple yes or no question.
I may not like Clinton's trade policy but
I liked his stance on the environment and
his foreign policy.
Or maybe I liked his foreign policy
with respect to Great Britain but
I didn't like his foreign
policy with respect to Nigeria.
I didn't like all the scandals
he was involved in, but
the economy was doing really well.
Yes or no sir.
Did you like him or not?
[SOUND]
No.
Well, what about Gore?
You know I did like Clinton.
>> [LAUGH]
>> [COUGH]
So in some sense,
what we're doing here is
applying the wrong tools.
We're boiling down a very complicated
scenario into something that's very simple
and trying to make it
sound like a dice game.
And then once we boil
it down to a dice game,
we then act surprised when
it doesn't behave like one.
Wait, you mean there's something more going on here than two six-sided dice?
And the same thing is happening
in artificial intelligence.
So in this example I gave, what if I change it: instead of counting sentences across the entire web, what if I concentrate just on books, fictional books?
Fantasy books.
And then I find, in some book,
there's a character named Alabaster and
I end up finding that Alabaster
means the same as other names,
other elvish fantasy names that
are mentioned in the same context as
the character, Alabaster,
in some set of stories.
Or, if instead of the whole web or
fiction, I focus on science
articles and I do this very same analysis
only on articles about science and
I find that in that context, the meaning
of the word alabaster is quite different.
It's like chalk or magnesium.
But I had a good p-value.
It had statistical significance.
What's going on here?
You're treating it like a dice game.
You're treating a complex
situation like a dice game.
There's always something else going on.
And the problem is
actually not statistics.
The problem is using statistics for
a problem for
which it wasn't really intended.
So 100 years ago,
physics was faced with a very similar kind
of dilemma, a story you may have heard.
Some of the experiments
that they were conducting,
the results of those experiments
similarly made no sense whatsoever.
So one dilemma was whether quantum
particles behaved like particles or
like waves.
Did they have wave-like properties or
particle-like properties?
Well, when you pass quantum particles through a single slit, you get a distribution pattern behind the slit that looks something like that.
The particles collect in kind of
a normal distribution behind the slit.
So you would expect, if these quantum particles are in fact particles, that if you shoot them through two slits, the pattern would look like that: the same as the single slit, but reproduced behind each slit.
If, however,
particles had wave-like properties,
you would expect an interference pattern.
You would expect the waves to interfere
with each other the way water would
as they went through those slits and
you would get a pattern like that.
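The arithmetic behind that interference pattern can be shown at a single point on the screen, using made-up complex amplitudes for the two paths:

```python
import math

# Hypothetical amplitudes for reaching one point on the screen
# via slit 1 and via slit 2 (the values are illustrative).
a1 = 0.5 + 0.3j
a2 = 0.4 - 0.5j

p1 = abs(a1) ** 2  # probability via slit 1 alone
p2 = abs(a2) ** 2  # probability via slit 2 alone

# With both slits open and unobserved, amplitudes add before squaring:
p_both = abs(a1 + a2) ** 2

# The extra piece is the interference term 2*Re(a1 * conj(a2)).
interference = 2 * (a1 * a2.conjugate()).real

# |a1 + a2|^2 = |a1|^2 + |a2|^2 + 2*Re(a1 * conj(a2)),
# so p_both differs from p1 + p2 whenever interference is nonzero.
print(p_both, p1 + p2, interference)
```

Putting a recorder on the slits amounts to squaring each amplitude separately, which kills the interference term and restores the particle-like p1 + p2 pattern.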
And that was what they observed, and so it was concluded that quantum particles are waves.
And then someone had the thought, well,
what if we put a recorder on each slit?
Just to see which slit
the particle passes through.
And then suddenly,
it behaves like a particle.
What? It's both a wave and a particle? That does not make sense.
Why doesn't it make sense?
Because it seems like being a wave and being a particle are disjoint.
It's like liking Clinton and
hating Clinton.
You can't do both.
[LAUGH]
Yes, actually you can.
It's reality.
It's not broken, it's just the way it is.
Your math is wrong.
Your assumptions are wrong.
Change your assumptions.
So what consequences does this have for
decision making, for
predicting, for building intelligent
machines that can understand us better?
These machines need to understand that, in fact, a lot of the time, events that are described as contradictory can both happen at the same time.
We can't make this assumption
that the probabilities sum to 1.
This is what happened with
the poll about Clinton.
If we asked in a different order, we got different probabilities, and the probability of liking Clinton plus the probability of the event we consider its contradiction, not liking Clinton, should sum to one, or be pretty close, and it was nowhere near that.
And experimentally, the baggage that statistics brings along with it means that people almost never check the contradictory case, because obviously the probability of that is one minus the probability of the positive case.
Why should I measure it?
I just need to subtract it from one.
But you've got all these
other things going on.
I mean, you ask someone a question
when they're in a bad mood,
they'll probably give
you a negative response.
Clinton, that damn, I hated him.
My feet!
>> [LAUGH]
>> It happens.
[COUGH]
So we did some experiments. I mean, that's people and opinions, I mean, come on, of course. [LAUGH] That's a wasteland. What about medicine? Ah-ha, medicine, now that's precise science. We don't have any of those problems here; there are no opinions, it's only fact.
So what if I poll a bunch of
medical experts and ask them,
how likely are they to use
antibiotics to treat typhus?
Antibiotics are the usual treatment. And we get something like 75% of them saying, yes, I would use antibiotics. A few of them say no, they wouldn't, and one person actually said, are you an idiot? Of course you use antibiotics to treat typhus. Well, that's basically what the 75% said.
Okay, now, what if we take
a different set of medical experts and
ask them the opposite question?
How likely would you not use
antibiotics to treat typhus?
We do not get the answer we should get if, by the laws of statistics, this boils down to a yes-or-no question. We expect to get 0.25 here. It should be the complement of the first question. It's just a negation; you can't both use antibiotics and not use antibiotics to treat typhus.
But, I mean, psychologically, what's happening here, I mean, apply common sense: when you ask the question that way, they kind of interpret it as, would I use antibiotics? Yeah, I would use antibiotics to treat it.
Would you not use
antibiotics to treat typhus?
Yeah, I suppose if the patient history indicated that, I wouldn't. If the hospital was out of antibiotics, if there was some kind of shortage, I might use something else.
So they begin to consider other things
because there's other things going on.
This is medicine, it's not so simple.
And yet, a lot of what we do,
from political polling to psychological
surveys, to news articles about
what's going to happen tomorrow,
are making these kinds of assumptions,
that are based on the observation that I
can boil all this down to a dice game.
[COUGH] So using classical probability
theory, it's impossible for
75% of the people to say yes and
50% of the people to say no.
But in the mathematics that underlies quantum theory, actually, this is perfectly acceptable, because of wave-particle duality.
A new mathematics was derived in which probabilities like this don't need to sum to one. It's okay for there to be something called a superposition of states, where something is both a particle and a wave at the same time.
And I'm not saying people
are quantum devices.
Maybe we are, who knows.
But I'm saying modeling these complex problems as if they were discrete, dice-game-like problems is wrong.
We should be modeling them the way the mathematics underlying quantum theory models the universe. Because that makes sense.
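For instance, here is a minimal quantum-style sketch of the question-order effect: model the respondent as a two-dimensional state and each question as a projection onto its own axis. The angles and names below are pure illustration, not fitted to any real poll:

```python
import math

def ask(state, angle):
    """Probability of a 'yes' when measuring along `angle`, plus the
    collapsed state the respondent is left in after answering 'yes'."""
    axis = (math.cos(angle), math.sin(angle))
    amp = state[0] * axis[0] + state[1] * axis[1]  # inner product
    return amp ** 2, axis

state = (1.0, 0.0)            # initial opinion state (illustrative)
q_clinton = math.radians(40)  # axis for the "Clinton" question
q_gore = math.radians(75)     # axis for the "Gore" question

# Clinton asked first, then Gore:
p, s = ask(state, q_clinton)
p2, _ = ask(s, q_gore)
yes_yes_clinton_first = p * p2

# Gore asked first, then Clinton:
p, s = ask(state, q_gore)
p2, _ = ask(s, q_clinton)
yes_yes_gore_first = p * p2

# Because the two projections don't commute, the joint "yes, yes"
# probability depends on which question comes first.
print(yes_yes_clinton_first, yes_yes_gore_first)
```

In this picture, answering the first question changes the state you bring to the second one, so order effects are not a paradox; they fall straight out of the mathematics.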
Thank you.
[MUSIC]
