[MUSIC PLAYING]
[APPLAUSE]
JENS LUDWIG: Thanks
so much for having me.
If the theme of the
session is cops and nerds,
I think I'm going to
force the audience
to guess which of those I am.
So here's what I
wanted to talk about.
What I wanted to talk about is something you realize especially when you live in a big city like Chicago: how many heartbreaking social problems we are trying to solve right now.
And you also realize
that we are increasingly
living in a world of big data.
And so it seems like there
is a very natural and easy
intellectual
arbitrage opportunity,
where we can take the sort
of machine learning tools
that we encounter all the
time in the commercial sector
and just kind of plug
and play, and apply
them to solve policy problems.
It looks like an easy
intellectual arbitrage
opportunity for making
the world a better place.
And so what I wanted to do is talk about how I think social progress on some of these really difficult policy problems is going to require a lot more than just off-the-shelf machine learning.
So that's the theme of the talk.
So let me talk about that a
little bit more concretely
within the context of one
particularly important problem
that we have here in Chicago.
But this is obviously not
just a Chicago problem.
So if you're here in
Chicago, you drive down
to 26th and California on the
southwest side of the city,
you will see the Cook
County Jail, which has,
depending on the day,
somewhere between 6,000 and 10,000 residents--
some of the most economically
disadvantaged people
in the city of
Chicago, 80% to 90%
of whom are either
African-American or Hispanic.
So we can back up
and say, how did we
wind up with so many people
in the Cook County Jail?
So here's a little primer
on how the criminal justice
system works.
So you get arrested
in the United States,
and the Constitution
says within 48 hours,
you've got to go in
front of a judge who
makes a decision about
where you're going to await
adjudication of your case.
Do you get to go home or do
you have to sit in jail waiting
for your case to be resolved?
And the law says that
the judge is supposed
to make that decision not
based on whether you're guilty
or not or what the
punishment is for the crime
that you're alleged
to have committed,
but the law says the
judge is supposed
to make that decision entirely
based on a prediction--
a prediction of your safety
risk and a prediction
of your flight risk.
And this is an enormously
consequential decision,
as you can imagine.
So if the judge jails
you, on average,
you'll spend two to four months
in a place like the Cook County
Jail.
You can imagine what that
does to your job prospects.
You can imagine what
that does to your family.
You can imagine what that does
to the local county budget
to keep 6,000, 8,000,
10,000 people in jail.
Releasing people has a serious downside risk as well.
The person that
the judge releases
might go on to
commit another crime.
And crime itself is also very
regressive in its impact.
So it's a super high-stakes decision.
If you're a tech person, you
look at this and say, well,
what is the criminal
justice system currently
doing to help judges make this
high-stakes very difficult
prediction?
And this is the status quo in most cities.
This is the same
technology that we
would have used in the 1950s,
or maybe the 1850s-- ye olde
decision aid.
And so it's natural
to think, well,
why not instead use
the sort of technology
that places like
Google are applying
to all sorts of
commercial applications?
And so this seems
super straightforward
because it seems
like we have most
of the important ingredients
that we need to make progress.
So for starters, we have these
amazing machine learning tools.
And I think it's easy--
because they're so
ubiquitous now--
it's easy to overlook exactly
how amazing these tools are.
So let me just spend
a minute reminding us
of this within the context of
one very common kind of machine
learning application, which
is sentiment analysis.
So if you're a
computer scientist,
you know one of the canonical
applications of machine
learning is to take a piece of
text that a human has written,
and try and infer what affect the author was trying to communicate.
So here's an example from a
more or less randomly selected
consumer product off of Amazon,
the Hutzler 571 Banana Slicer.
And so what we have here is
a product review-- some text,
and then we also have
a starred review,
which gives us sort
of ground truth--
what actually was
the author intending.
And so here's an example.
This is from someone
named Thrifty E., who
says, "I bought this in
order to speed up cutting up
a banana for my cereal.
Any time that I saved
in that endeavor
was spent cleaning
this implement.
It's not easy to clean.
You have to scrub between every
rung to thoroughly clean it."
You can see that's not a great
review-- two out of five stars.
You can look at another
one by Uncle Pookie who
says, "Once I figured
out I had to peel
the banana before using,
it works much better.
Ordering one for
my nephew, who is
in the Air Force in California.
He's been using an old
slinky to slice his bananas.
He should really
enjoy this product."
Five out of five stars.
"Confusing" by Q-Tip--
"There's no way to tell if this
is a standard or metric banana
slicer.
Additional markings on
it would help greatly."
One out of five stars.
And finally, from J.
Anderson, "Angle is wrong.
I tried the banana slicer
and found it unacceptable.
As shown in the picture, the slicer is curved from left to right.
All of my bananas are
bent the other way."
So this is also an example of how machine learning turns out to have such powerful commercial applications.
This helped the Hutzler company
figure out what their 572
Banana Slicer should look like.
Now, I think sort of
the key lesson from this
is, you're reading
these narrative reviews,
and it is super easy
for you to figure out
what the author was intending.
And it's so easy that when
the computer scientists
in the middle of the 20th
century were initially
working on artificial
intelligence,
this led to a natural
sort of conclusion
about how you would try and get
a computer to do what humans
do, especially for something
that is so easy for us to do,
which is just program them to do it the way we do it--
so introspect on
what you do and write
a program that does
exactly the same thing.
And here's sort of the problem
with that kind of approach
that the computer scientists
found in practice.
This is from a study done
at Cornell doing sentiment
analysis for movie reviews.
So they get a bunch of
Cornell nerds in the basement
to introspect on what
words they would think
would show up in a positive
or negative movie review.
They write a program that
looks for those words,
and then they classify
positive or negative reviews
on that basis.
Usually, we set up
the test set here so
that we have 50% positive
reviews, 50% negative reviews.
So an accuracy of 50% would
be like random guessing.
And so these are the words
that the Cornell nerds picked.
I don't know why "suck" and
"sucks" are both separate words
there.
It's Cornell, not the
University of Chicago.
So one of my co-authors
is from Cornell,
so I say that with love.
And so here's what we saw with this sort of programming approach: we're doing better than random guessing, but not much better than random guessing.
And the breakthrough here with these machine learning tools came when the computer scientists realized
that we just need to forget that
we know how to do these things,
treat the known like
the unknown and just
start treating this like a
brute force empirical exercise.
And basically,
mine the data, mine
the movie reviews
themselves for information
about what words turn
up more frequently
in positive and
negative reviews.
And once we start
doing that, we start
to get up to accuracy rates
on the order of like 95%.
So this really was
the big breakthrough,
and the tools that
enable us to do this
really are incredibly amazing.
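To make that contrast concrete, here is a minimal sketch, with a toy corpus and hand-picked word lists standing in for the study's setup; scikit-learn's CountVectorizer and LogisticRegression are one plausible stand-in for the learned approach, not the study's actual code or data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny invented corpus standing in for the movie-review data (1 = positive).
reviews = ["a dazzling and moving film", "thrilling from start to finish",
           "a gripping and excellent story", "dull and boring throughout",
           "a terrible waste of time", "awful pacing and a bad script"]
labels = [1, 1, 1, 0, 0, 0]

# The introspection approach: hand-picked word lists, fixed in advance.
POSITIVE = {"dazzling", "excellent", "thrilling"}
NEGATIVE = {"terrible", "awful", "suck", "sucks"}

def lexicon_score(text):
    words = set(text.split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

# The brute-force empirical approach: mine the labeled reviews themselves
# for which words matter and how much.
X = CountVectorizer().fit_transform(reviews)
model = LogisticRegression().fit(X, labels)

print([lexicon_score(r) for r in reviews])  # misses "moving", "boring", ...
print(model.predict(X))                     # weights learned from the data
```

On real review data, it is those learned word weights, mined from thousands of labeled examples, that push accuracy from near-chance lexicon guessing up toward the levels described here.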
I think what's also particularly
exciting from a public policy
perspective, if you think
about like pretrial release
decisions, is that there is
an enormous amount of data
to be had.
So this is what
we discovered when
we were working on building
a pretrial risk prediction
tool ourselves using data from
a large anonymous American city
of 8.5 million people.
And you know, you have
millions and millions
of observations, which
doesn't sound like a lot
when you're talking to
an audience at Google.
But I think in a
policy environment,
this is a new development
to have so much information
that we can bring to
bear on these problems.
And for each case that
goes through a bond court,
we have lots of information
about the current charge and the defendant's prior criminal record.
So lots of information that
can be brought to bear.
And these machine
learning tools now
are increasingly accessible.
So anybody with an
internet connection
can download our free
software and just
basically build a machine
learning algorithm,
and they're off to the races.
Now, normally in a
machine learning exercise,
the final step is
also easy, which
is determining how good of a
job your algorithm is actually
doing, right?
And so imagine that
I'm doing some sort
of commercial application
of machine learning,
and I am trying to
decide, like, how
good is my facial
recognition software doing?
And I want to give it a
bunch of new face pictures
that it hasn't seen.
And I want it to tell me
if a normal human face is
in the picture.
And so you can look at
that-- face, no face,
it's easy to tell-- and
score the algorithm.
I think one of the things I learned putting together this slide show is that Huskies have the funniest dog faces.
And whether this one
is a normal human face,
I think is sort of a deeper
philosophical question.
But this is easy, right?
This is the easy part.
This turns out to be the first
point in policy applications
where you realize
that this is very
different from commercial
machine learning.
And part of the issue here is
that at some fundamental level,
we really don't care
about prediction quality.
The thing that we really
care about instead
is decision quality.
What I want to know is, can
I take a machine learning
algorithm and build some new
release rule that I can give
to a judge, and
will that actually
turn into the world
becoming a better place?
Now, why is that complicated?
Think about how an algorithmic release rule could potentially
make the world a better place.
The algorithm might
want to detain someone
that the judge releases or it
might want to release someone
that the judge detains.
And so then we
can ask ourselves,
how do we score whether the
algorithmic release rule is
better than the judge or not
in a world in which the data
that we have available to
us to evaluate the algorithm
is generated by the human
decisions, generated
by the judge's decisions?
And so here's what
the issue is, right?
If the judge releases a defendant, we don't have any sort of problem.
I can observe what the crime
outcome is under the judge's
actual decision, and I
know what the crime outcome
is under the counterfactual
algorithmic decision
if the algorithm wants to detain
the defendant, because putting
someone in jail, by definition,
incapacitates them and prevents
them from engaging in crime.
So that's easy.
But what if the judge detains the defendant, and the algorithm wants to release them?
How do I know whether
the algorithm is right
or whether the
judge was right, OK?
Now, if you're a computer scientist, you'll look at this and say, well, at some level, this doesn't feel like a hard problem.
We have a bunch of data on the
people who the judge released.
We have a bunch of background
information on the people
that the judge released
and the judge detained.
Why don't we just assume
that the crime outcomes
for the people
the judge released
would be like the crime
outcomes for the people
the judge detained who have
observably similar current
arrests and prior
criminal record, right?
It seems like a very, very
straightforward imputation
exercise on its face.
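Concretely, the naive version of that imputation might look like the sketch below, with made-up column and feature names throughout; as the next part explains, this is exactly the approach that breaks.

```python
import pandas as pd

# Naive imputation: fill in the unobserved crime outcomes of detained
# defendants with the average outcome of released defendants who look
# the same on paper. All column names here are invented.
def naive_impute(df, features):
    """`released` is 0/1; `crime` is observed only when released (NaN else).
    Assumes df has a default integer index."""
    cell_means = (df[df["released"] == 1]
                  .groupby(features, as_index=False)["crime"].mean()
                  .rename(columns={"crime": "cell_mean"}))
    merged = df.merge(cell_means, on=features, how="left")
    return df["crime"].fillna(merged["cell_mean"])

# e.g. naive_impute(df, ["charge_type", "n_prior_arrests"])
# The flaw: the judge detains partly on things that are not in `features`,
# so detained cases are riskier than observably similar released ones.
```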
Here's the problem in
a policy application,
which is the judge sees things
that no algorithm will ever
see.
So this is an example of a
defendant, a 25-year-old guy
who reported his
occupation, when he
was arrested, as tattoo model.
I didn't know there
was such a thing.
He gets arrested
two different times.
One time, that's
him on the left.
And then this is him
again on the right.
So you'll notice that
the 25-year-old tattoo
model on the right has decided
that it was a good idea
to get his face tattooed
like Joker, the super villain
from the "Dark Knight" movie.
What you can't see, because it's on the other side of his face, is that he's got the Joker tattooed on his right side. And on his left-- just in case there was any ambiguity about what he's about-- it says in giant letters, "Fuck Batman."
Now to the algorithm, the
25-year-old tattoo model
on the left and the 25-year-old
tattoo model on the right
look exactly the same.
The algorithm looks at the
guy on the left and says,
he doesn't look--
no prior record, current
offense is a misdemeanor.
This does not look like
a very high-risk guy.
The judge can see something
extra about the defendant.
Now, imagine what happens.
The algorithm goes to
the judge and says,
why are you detaining these
25-year-old tattoo models?
These are all low-risk guys.
You should just let them go.
You let the guy on the right go,
and crime goes up by a lot more
than the algorithm
had anticipated.
And we wind up inadvertently
making the world a worse place
rather than a better place.
And the key here is that, in
these sorts of social science
or policy applications,
we have to take seriously
the possibility that the
judge has private information
that the algorithm doesn't have,
which makes this evaluation
problem enormously difficult.
And if we ignore that
evaluation problem, when
we're comparing the
algorithm to the judge,
we're basically stacking
the deck in favor of saying,
the algorithm almost has
to do better than the judge
if you ignore the possibility
that the judge has
extra information.
OK, so what is the
solution to that problem?
Well, the solution to that problem
comes from two insights.
The first insight
is to recognize
that the problem's one-sided.
So when the algorithm wants
to release a jailed defendant,
that's a problem, but
there's no problem
if the algorithmic decision
rule wants to jail someone
that the judge released.
And the second
insight is to rely
on a common trick that
we have in econometrics,
or the causal inference
literature, which
takes advantage of the
fact that, in this case,
we have something like random
assignment of cases to judges.
So what we wind up having
is multiple judges seeing
caseloads of defendants
that are similar on average,
and yet the judges
differ a lot with respect
to their leniency rates.
And so here's what we can do in that case.
So imagine that we have
two judges, one of them
with a 90% release
rate, and one of them
with an 80% release rate.
And they're seeing caseloads
that are similar on average.
And so here on the left,
we've got the judge
with the 90% release rate.
We can go into the pool-- the 90% of defendants that the lenient judge has released-- and we can rank-order
defendants by our algorithm's
predictions of their crime risk.
And then we can count down that
rank-ordered list of defendants
by predicted risk and pick
another 10% of defendants
to detain.
So this is what the
algorithm's guess
is for the most socially
productive marginal
10% of defendants to detain.
And that gets us
down effectively
to an 80% release rate.
And then what we
can do in that case
is compare the
crime rate that we
get in that case with
the actual crime rate
that we get from the
80% release judge.
And that turns out to be a
credible way and a fair way
to compare whether the algorithm
really is doing a better
job than the judge.
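In code, that comparison looks roughly like the sketch below, on synthetic caseloads with made-up column names; the published version of this contraction-style evaluation is more careful, but the mechanics are the same.

```python
import numpy as np
import pandas as pd

def contraction_crime_rate(df, lenient_id, target_release_rate):
    """Shrink the lenient judge's caseload down to the stricter judge's
    release rate by jailing the highest-predicted-risk releasees first,
    then return the implied crime rate over the whole caseload.
    Detained defendants contribute no crime, by incapacitation."""
    cases = df[df["judge_id"] == lenient_id]
    released = cases[cases["released"] == 1].sort_values("pred_risk")
    n_keep = int(target_release_rate * len(cases))   # e.g. keep 80% out
    return released["crime"].iloc[:n_keep].sum() / len(cases)

# Toy caseloads, as-good-as-randomly assigned to two judges:
# judge 0 releases ~90% of defendants, judge 1 ~80%.
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({"judge_id": rng.integers(0, 2, n),
                   "pred_risk": rng.random(n)})
release_prob = np.where(df["judge_id"] == 0, 0.90, 0.80)
df["released"] = (rng.random(n) < release_prob).astype(int)
df["crime"] = np.where(df["released"] == 1,           # observed only if out
                       (rng.random(n) < 0.3 * df["pred_risk"]).astype(float),
                       np.nan)

algo_rate = contraction_crime_rate(df, lenient_id=0, target_release_rate=0.80)
strict = df[df["judge_id"] == 1]
judge_rate = strict["crime"].fillna(0).sum() / len(strict)
print(f"algorithm at 80% release: {algo_rate:.3f}  strict judge: {judge_rate:.3f}")
```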
So what you see when you do that is, the algorithm is indeed doing better than the judge-- a lot better.
And so what you can
see is if we were
to hold the size of the
jail population constant
and just use the
algorithmic release
rule to decide who to detain, we
could reduce crime rates by 25%
without having to change
the jail population.
Now, you might be
sitting there thinking,
given that we've ramped
up the incarceration
population in the
United States so much,
maybe you think the big
problem in the United States
now is not crime, so much
as incarceration itself.
And so alternatively, what you
could imagine doing is saying,
let's hold the
crime rate constant.
How many fewer people do
we need to detain in order
to achieve the same
level of crime reduction
as the judge's current
decisions achieve?
And if you do that,
we can reduce the jail
population by fully 42% with
no increase in crime, right?
Now, that's completely amazing.
That's completely
amazing in part
because, if you think about how
hard these social problems are
to solve, usually in
those rare cases where
we find an effective
solution, they
turn out to be either very
expensive to implement
or very difficult to
scale up successfully.
That's not the case with the
machine learning algorithm.
Once the thing is
built, the marginal cost
of running this over and
over again is near zero.
And because it's
an automated tool,
it scales nearly perfectly,
at least over the range
that we have for something like
pretrial release decisions.
There is something that
you'd be worried about, which
is we don't care just about
crime and detention outcomes,
especially in the
criminal justice system.
We care a lot about things
like fairness as well.
And so what's also amazing about the machine learning application in this case is that, in addition to enabling us to simultaneously reduce crime and jail populations, it can reduce disparities within the detention population as well.
And the answer to why
that algorithm can do that
is easy to see
once you recognize
what the alternative
to the algorithm is,
which is the same human
decisions that gave us
the current criminal
justice system that
is so skewed in the direction
of over-representing racial
minorities and low-income
populations behind bars.
Now, if we were in a
commercial setting,
you would build an
algorithm like this.
You'd have your face
detector software,
you'd put it on a phone, and
you'd be off to the races.
The social benefits
of the algorithm
would be automatically realized.
If you think about the pretrial release application, though, there is another back-end step, which is that nobody imagines that the algorithm is ever actually going to be the decision-maker.
Realistically, the algorithm
for predicting defendant risk
is inevitably just
going to be a decision
aid for a human
being that then has
to translate the algorithm's
predictions into decisions.
And if the human being is getting things wrong, they can undo whatever potential for social good the algorithm's predictions might achieve, OK?
And so put differently,
the robot in this case
is not going to
replace the human.
We think of the robot as a complement to, not a substitute for, human judgment at the end of the day.
So here's one final, I
think, really interesting
potential application of
machine learning that I think
looks very different
from what we're
used to seeing in the
commercial setting.
And that is to basically
use machine learning
as a behavioral diagnostic
to better understand
what the human beings themselves
are doing right now partly
as a way to help solve this
problem of how we optimally
combine human and machine
intelligence in designing
a decision aid that
we can give the judges
and realize the potential
for social good.
So normally what
we do is we take
a machine learning algorithm.
We predict the defendant's risk.
We do a horse race between
the algorithm and the judge.
So the defendant's behavior
is the object of interest.
The other thing that we can do
is basically turn the algorithm
on the judge themselves.
That is, basically
predict what the judge is
going to do in a given
case-- what the judge is
going to do as a function of
the defendant's characteristics,
OK?
And that turns out
to give us lots
of really interesting insights
into what humans are doing.
So let me just give you a couple
of quick examples of that.
Notice what this lets
us do for starters.
So if you look at the
actual judge decisions,
the only thing that
you observe from judges
is whether they detain or
release a defendant, right?
But if you think about our
prediction of the judge's
behavior, that's a continuous
release probability
at that point.
And what we can do then is we
can look across the defendant
risk distribution to see where
judges are least certain.
And you can see in
this graph here,
what we've done is we have the judge's predicted detain probability on the y-axis.
And then what we've done is
we've binned defendants up.
We've grouped
defendants together
based on the algorithm's
predicted defendant crime risk.
So we are looking at the
dispersion of judge's release
probabilities as a function
of how serious the defendant's
crime risk actually is.
And what you can see
here is it's really
the highest-risk defendants
that are the ones where
the judges have the most
uncertainty and the most
difficulty making decisions.
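A rough sketch of what turning the algorithm on the judge looks like, with simulated stand-in data and invented variable names throughout:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Simulated stand-ins: X holds case characteristics, `detained` is the
# judge's actual 0/1 decision, `def_risk` is the (separate) algorithm's
# predicted defendant crime risk. Every name here is invented.
rng = np.random.default_rng(0)
n = 50_000
X = rng.normal(size=(n, 5))
def_risk = 1 / (1 + np.exp(-X @ np.ones(5)))             # toy risk score
detained = (rng.random(n) < 0.8 * def_risk).astype(int)  # noisy judge

# Turn the algorithm on the judge: predict the judge's decision.
judge_model = LogisticRegression().fit(X, detained)
p_detain = judge_model.predict_proba(X)[:, 1]            # continuous, not 0/1

# Bin defendants by predicted crime risk and ask where the predicted
# judge is most uncertain (detain probability closest to 0.5).
decile = pd.qcut(def_risk, 10, labels=False)
certainty = pd.Series(np.abs(p_detain - 0.5)).groupby(decile).mean()
print(certainty)   # lower values = judges harder to predict in that decile
```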
And it's not inevitable that
that's what we would have seen.
So for instance, if
you look at, like,
education applications,
what you see
is that principals do
amazingly well at predicting
which teachers are amazingly
good and amazingly bad.
And principals have huge difficulty
distinguishing the quality of
all the teachers in the middle.
So it didn't need to turn out
like this in the crime setting.
It turns out to be the
case that the judges seem
to have the most difficulty
with the highest-risk cases.
And one of the things that
we're learning from psychology
is that judges, like
all human beings,
have very constrained
bandwidth for judgment
and decision-making.
And so one of the things we can use this information for, for starters, is a sort of bandwidth triage tool for judges.
Here are the cases where you
need to devote extra time
and be extra sure in
mapping the algorithm's
predictions into a decision.
We know that these are going to
be the cases that are hardest
for you.
And let me give you one
more example of this.
The other thing
that we can do then
is we can look at the
crime and detention
outcomes of the
actual judge decisions
versus the predicted
judge decisions.
So if you think about what the
predicted judge decisions are,
for a case with a given
set of characteristics,
what does the judge
do on average?
And how does the
average judge do
compared to the actual
judge decisions that
vary around that average?
Now, economists would
look at this and say,
it's clear that the
actual judge decisions
have to do better, because the actual judge has more information than the algorithm-- the judge sees more things than the algorithm sees.
Psychologists look at this
and say, it's not obvious.
We know that there can be a lot of noise in the system too, in the extra information the judges have.
And so what we actually see in
the data is that the predicted
judge-- that is, the
information-reduced version
of the judge--
winds up leading to
better crime and detention
outcomes than the actual judge.
Now, why might that be?
Why do judges mispredict?
Let me just tell you a quick
story about what I think
might be going on here.
And it's a story that
comes from a conversation
that I had with a friend of
mine who is an emergency room
doctor, and he was the attending one night in the ER where he works.
And a patient comes
in complaining
of signs that look like
he's having a heart attack.
And so this is a
common situation
that they have in the ER.
And the ER docs, in that
case, their key goal
is to try and predict whether
the patient is actually
having a heart attack or not.
And if they are,
then they send them
to the ICU for
immediate treatment.
And so everybody else on the team, they go in and they run this cardiac enzyme test as part of their prediction. And if it's above some threshold level, that's one diagnostic sign. So they do that.
And then my friend goes to the
doctor and nurses' station,
says, what do we have here?
And everybody at the station
says, we've got this guy,
he's got chest pains.
We administered the
cardiac enzyme test.
It's above the
threshold level-- a lot
above the threshold level.
We've got to get this guy
into the ICU immediately.
So then my friend goes in to
see the guy in the waiting room,
and the guy is sitting there
and he looks up at my friend.
The guy is sitting
there eating a snack.
He's having a slice
of watermelon.
And he's talking to my
friend, and my friend's like,
what's going on?
And the guy's like, yeah,
I'm having some chest pains.
I don't feel great.
My friend talks to him
for a couple of minutes,
goes back out to the station.
And everyone's like, OK,
we're ready to go, right?
We're going to get
this guy up to the ICU.
And my friend says, no,
I think he's all right.
I talked to him.
He's pretty chill.
I'm not worried about it.
And everybody else
is like, no, no, no,
we've got to get him to the ICU.
And my friend says, look,
I'm the only one who actually
saw the patient in person.
You guys have just seen
his medical test results.
And I'm telling you,
I talked to the guy.
He's hanging out.
He's having a snack.
There's no big deal.
15 minutes later, the guy
goes into cardiac arrest
and they race him to the ICU.
I think what's going on there
is the human brain is designed
to be sensitive to very
salient information,
even if it's irrelevant.
We don't normally associate
having a snack with having
a major health disorder.
And so you can see what happened with my friend-- the fact of the snack was an enormously salient detail that turned out to be totally irrelevant.
I think the data that we're seeing are consistent with things like that distracting judges in the court setting as well.
And so the key goal for
realizing the social potential
of these tools is going
to be to figure out
what exactly those
salient-but-irrelevant pieces
of information are that are
leading the judge astray.
That is not a computer
science problem, right?
In some sense, that's kind
of like the last-mile problem
of building these algorithms
and helping judges.
And that last-mile problem is
not a computer science problem.
That's really a behavioral
science problem.
The other thing to say is that I'm using bail as an example here,
but nothing that I've
said is unique to bail.
There are tons and tons of super important decisions that policymakers and private citizens make every day, with really important policy consequences, that hinge on a prediction-- where the people themselves are currently making the prediction.
But in principle, we could be
doing that with data instead.
What college, major, or
course is best for me?
Millions of college
students are trying
to answer that every year.
How long am I going
to be unemployed
as I'm trying to think
about how much to scale back
my consumption, or where
to set my reservation wage for some new job
offer that I should be taking
or not, or how much
house can I afford?
What's going to happen to my
income stream over the future?
Behavioral science
tells us that these
are prediction
problems that are going
to be hugely difficult for us.
So what does this teach us?
Well, I think one
thing that we can see
is that prediction can be super
useful for making progress
on these policy problems.
At the same time, there
are potential pitfalls.
Now, what's not news is that there are pitfalls in applying machine learning to policy problems.
There are probably 10,000
papers in computer science
published about, for
instance, fairness.
I am reasonably optimistic
that the field will eventually
solve that sort of problem
because we recognize it,
and so many people
are devoting--
understandably and appropriately
devoting-- so much energy
to solving that problem.
What worries me about
this is that there
are lots of other
challenges here
that are not even on
anybody's radar screen,
and that could lead us to
inadvertently take these tools,
import them into
a policy setting,
and accidentally make things
worse, rather than better.
And I think the solution to
those sorts of problems--
they're not really about
machine learning engineering.
I think progress on
those sorts of problems
really is going to require
taking these amazing machine
learning tools
and combining them
with insights from social
science and behavioral science.
And I think the
payoff to getting
that right is potentially
enormous positive
social impact.
Thank you very much.
[APPLAUSE]
AUDIENCE: You mentioned
that you do things
or you're able to do
things like reduce
racial disparities in bail
setting and things like that.
And I can understand
where there's
been obviously a
lot of consternation
about accidentally encoding
demographic characteristics
into a data set that didn't
otherwise express them.
But there may be a
high correlation,
and so you could say low-income
areas, higher likelihood
of missing bail.
Therefore, an imputed flight risk.
Can you comment on how those
things get controlled for?
I understand the human
bias element, obviously,
and snap judgments,
on the one hand,
is probably a very bad thing.
I think it's less
well understood
how ML can be made to be more
fair in a way that explicitly
excludes those sort of
inadvertent characteristics.
JENS LUDWIG: Yeah.
Yeah.
It's a great question.
I think one of the challenges in
my mind, in taking these tools
and applying them to
policy settings now,
is that there is a wedge between what I think the statistics would tell us promotes fairness and what our current legal structures-- which were built around dealing with human predictions and decisions-- allow.
And so I might take your question and turn it on its head just a little bit.
So if a human being
is making something
like a hiring decision
or a jail decision,
you would say we ideally
would have that human being
completely blinded to defendant
race or job applicant race--
like, for sure that's true.
I think it's less
self-evident that that
is true for the algorithm.
And I think part of the reason for that is that, unlike a human being, the algorithm has no preferences and no capacity for implicit bias the way that the human brain does.
The algorithm
really is like a dog
that chases the car that
you tell it to chase.
Like it's just very
narrowly monomaniacal.
Let me give you a
stylized example
to highlight what I mean.
So imagine that we're
living in a city where
the rate at which whites and
minorities engage in crime--
the true offending rate--
is exactly the same.
Suppose a city is 50%
white, 50% minority.
Suppose that the police never make a false arrest of a white city resident, but that half the arrests of minorities are false arrests.
And the chances that you get
arrested, if you actually
engage in crime, are the same
for whites and minorities.
If I take that data
set and I give it
to an algorithm that
is blinded to race,
what's the algorithm
going to do?
The algorithm has no
choice but to assume
that each additional
arrest on your rap sheet--
suppose that I'm predicting an
outcome like failure to appear
in court.
And for the moment, let's
assume that that's not
susceptible to bias.
We can have a separate
conversation about that.
But just suppose that we're
willing to assume that
the outcome itself is
not susceptible to bias.
If the algorithm is blinded
to race, it has no choice
but to assume each
arrest on your rap sheet
is exactly the same--
no matter who you are.
And what's going to wind
up happening, in that case,
is that African-Americans
on average
are going to have
much higher predicted
failure-to-appear
risk than whites.
And if judges are
instructed by the law
to make detention
decisions based
on failure-to-appear
risk, we're going
to increase incarceration for
minorities relative to whites.
But now notice what happens
if I use a machine learning
algorithm that has access to
race and is allowed to mine
the data for evidence
of interactivity
between the predictors.
What the algorithm does, in that case, is immediately recognize
that each additional
arrest for a white
has twice as much signal
as each additional arrest
for a minority in this
city for risk of FTA.
And if you think about it as a linear regression, the slope in that case will be half as large for African-Americans as for whites.
And the average predicted
risk in this scenario
will wind up being exactly
the same for whites
and minorities-- but only with the algorithm that has access to race.
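Here is a small simulation of that stylized city, with every number invented for illustration; the race-blind model assigns minorities higher average predicted risk, while the race-aware model equalizes it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Stylized city: 50% white, 50% minority, identical true offending.
minority = rng.random(n) < 0.5
true_offenses = rng.poisson(2.0, n)              # same for both groups

# Whites: every arrest is a true arrest. Minorities: half of all
# arrests are false, so rap sheets are twice as long on average.
arrests = np.where(minority, 2 * true_offenses, true_offenses)

# Failure to appear depends only on true behavior, never on group.
fta = (rng.random(n) < 0.10 + 0.05 * np.minimum(true_offenses, 8)).astype(float)

# Race-blind model: one slope for everyone.
b_slope, b_int = np.polyfit(arrests, fta, 1)
blind_pred = b_int + b_slope * arrests

# Race-aware model: separate slope per group. The minority slope comes
# out about half the white slope, since each minority arrest carries
# half as much signal about true behavior.
aware_pred = np.empty(n)
for grp in (True, False):
    m = minority == grp
    s, b = np.polyfit(arrests[m], fta[m], 1)
    aware_pred[m] = b + s * arrests[m]

for name, pred in [("race-blind", blind_pred), ("race-aware", aware_pred)]:
    print(f"{name}: mean predicted FTA risk, "
          f"white={pred[~minority].mean():.3f}, minority={pred[minority].mean():.3f}")
```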
Now, I was at a National
Academy of Sciences panel
last fall where a question
like this came up,
and every computer
scientist, social scientist,
data scientist in the
room was like, for sure we need to think about the conditions under which the algorithm should have access to race, because it can, in some circumstances, promote fairness.
And every lawyer in the room
just went completely bananas,
and you can see exactly why
the lawyers have that view.
And so I think this
is an example where
the technology is outrunning
the legal structures.
And I think that
this is going to have
to be a big area
that we're going
to need to think about
resolving in the future,
as these sorts of algorithms
become baked into our policy
systems more and more.
SPEAKER: Joe.
AUDIENCE: Thanks for coming in.
This is awesome.
You talked earlier about
adjusting the algorithm
to choose between either
crime rate or release rate.
And is it truly just
one or the other,
or is there a way to kind
of balance both of them?
JENS LUDWIG: Oh, yeah, sorry.
I should have given you a better graph for this.
One way to think
about this is you
can imagine like a
two-dimensional space
where you have crime
rate on one axis
and the detention rate
on the other axis.
And the judge's current decision
is a point in that space.
And what the algorithm
lets you do--
the algorithm winds up
predicting crime risk much more
accurately than the judge.
And so the algorithm lets you have less of both, or less of one holding the other fixed.
So you have a range
of options that you
can choose from in
terms of how you
want to realize those gains.
So one extreme is I hold the
jail population constant.
I take all of the gain in
the form of reduced crime.
The other extreme is I hold
the crime rate constant.
I realize all of the
gain from reduced jail.
Or I can pick any
point in between there
and get some of both.
And where you fall on that
potential outcome space
is entirely a policy
decision, not a social science
or a computer science decision.
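A minimal sketch of that trade-off, with an invented risk score and hypothetical release outcomes: rank defendants by predicted risk, detain from the top, and read off the implied crime rate at whatever detention rate the policymaker chooses.

```python
import numpy as np

# Invented scores and outcomes, for illustration only.
rng = np.random.default_rng(0)
n = 10_000
pred_risk = rng.random(n)                        # algorithm's risk score
crime_if_out = (rng.random(n) < 0.4 * pred_risk).astype(float)

# Detain from the top of the risk ranking; detained defendants commit
# no crime by incapacitation, so each detention rate implies a crime rate.
sorted_crime = crime_if_out[np.argsort(-pred_risk)]
for detention_rate in (0.05, 0.10, 0.20, 0.30):
    k = int(detention_rate * n)
    crime_rate = sorted_crime[k:].sum() / n
    print(f"detain {detention_rate:.0%} -> crime rate {crime_rate:.3f}")
```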
AUDIENCE: Thank you.
AUDIENCE: Hey, so
how are you doing?
I'm one of those engineers--
computer scientist-- who is
going bananas because there
are a ton of concerns here.
And I've worked on
machine learning models,
and we know that data
makes a difference.
The data that we're ingesting-- it's pretty easy to beat, I think, Chicago's rates around racism, around racist decisions, around socioeconomic status, et cetera. I think Chicago is one of the most racist, segregated places there are.
And maybe it's because I grew
up a few blocks from that jail.
I've just seen it over
and over and over again.
So the fact-- the dichotomy that
the algorithm can beat Chicago
and Cook County judge rates is--
it's great to see,
and it's great
that we're making
better progress.
But that in and of itself, I don't think, actually says that the algorithm is the way we should go. There are other human factors that have to be taken into account.
So part of it is,
yes, the data's
messy because we have so much
embedded systemic racism.
And the last people I want to turn more data over to are the Chicago cops, because of everything that they do in the city to our communities.
So there's a lot of danger at
play in how we use the data.
And even if the answer to the previous question is that the demographics are not taken into account, we know from Cathy O'Neil's book, "Weapons of Math Destruction," that whether it's credit card rates or something else, there are other ways to proxy race that end up embedding the same conditions that led to the situation in the first place.
I understand that we
can do a lot better,
and an algorithm could do better.
But I definitely want to
challenge the dichotomy that
says the algorithm and the
technology in and of itself
is better without taking into
account all the other embedded
concerns.
JENS LUDWIG: It's a
very fair question.
Here's the way that I would think about it: I think the role of people like me and people like you, the other people in the room, is to build tools and honestly and accurately describe what they're capable of doing.
And in a commercial setting,
you build these tools
and then if you think they can
make money, you deploy them.
And if they're successful,
they keep going.
And if they're not
successful, they get scrapped.
In the policy setting,
it's different.
It's not my decision or
your decision to make.
It's like some larger
collective societal
decision about whether we want
to use these tools or not.
And I think in
this case, I think
it is a question that
is not easily dismissed.
So I come to you and
say, I have a policy.
Let's call it a black box
policy for the moment.
I have a jail system
that is 90% minority
in this large anonymous
city of 8.5 million people
that we're using the data from.
I can assure you that the city
population is not anywhere near
90% minority, OK?
I've got a black box policy that
can reduce the size of the jail
population by 42%.
Suppose for the moment we can get judges to follow the algorithmic recommendations-- the black box policy recommendations-- perfectly.
I've got a policy lever
that lets you reduce
the size of the jail by 42%.
Who benefits massively from
a 42% reduction in the jail
population in a world
in which the jail
population is 90% minority?
90% of the people that the black box policy releases in that case-- the people who benefit from this-- are minorities.
And then on top of that, we can further reduce the minority share of the jail population.
Now, it is true that
you can look at that
and say, yeah, but the
algorithm is not perfect
because it's still relying on
data that have baked-in biases.
Those are the same data that
we would hand to the judge.
I think the flip side is that the alternative is the judge using exactly the same biased data in the current status quo system-- a system that gives us a jail population that is much, much larger, and has a higher minority share, than the algorithm would produce.
And I feel like this is one
where reasonable people can
argue both sides of this.
And to me, it's not a
crazy position to say,
I will capitalize on
the fairness gains
that I have in
front of me, which
is the 42% reduction
in the jail,
and a reduction in the minority share by 10 percentage points.
AUDIENCE: Thank you.
SPEAKER: Time for
one more question.
AUDIENCE: I had a
question around also kind
of reusing data around the
police and things like that.
So are there projects that
you're doing to analyze arrests
and, like, the geographic
breakdowns, racial breakdowns,
of things like that
that the police are
doing-- versus continuing this kind of work but on the opposite side?
And then also, I think you
may have touched on it,
but how you're using this ML
model in real-time with judges
and what you've seen with
that, and whether the legal system is somewhat open to these tools?
Just to speak a little bit
more on that if you could.
JENS LUDWIG: Yeah.
Yeah, let me take the second
question for starters.
So we are working with a large anonymous American city of 8.5 million people to think about whether there are ways to actually implement this. The results that I described so far, we did as an academic study, as a kind of proof of concept.
And the large anonymous city
said their policy decision--
you might disagree with it,
but their policy decision
was if we could really reduce
our jail population by 42%
without increasing crime, that's
something that we want to do.
And that's something
that we view
as a step towards
promoting fairness.
And so we're working with
them now to build this thing.
We're seeing sort of similarly
encouraging results in terms
of the predictive accuracy.
I think the frontier science
problem that we're encountering
now is, how do you give
this to the judges in a way
where they're able to learn
their comparative advantage
versus when the algorithm
has the comparative advantage
in an environment in which right
now there's not much feedback?
And so we need to build a bunch
of scaffolding around what
the judge is doing to make
sure that the risk tool itself translates into genuinely better decisions.
Thanks.
SPEAKER: Well, thank you all for coming. That concludes our talk.
Let's give Jens a
round of applause.
[APPLAUSE]
