- Thank you all for coming.
This is pretty remarkable.
(laughing)
Let me formally welcome you to
the now-third annual workshop
on fairness, accountability,
and transparency
in machine learning.
My name is Solon Barocas.
I'm a postdoc researcher
at Microsoft Research,
and I just wanna take a moment to reflect
on the fact that this has become
a relatively large community.
This workshop started a little less
than three years ago as a
relatively intimate affair
at NIPS 2014 where a
colleague who, unfortunately,
couldn't be here today,
Moritz Hardt, and I tried
to bring together some people
from within computer science,
but also, in law and
policy, and social sciences
to kind of reflect on these questions
that were becoming urgent
policy questions, but also,
I think, interesting and
urgent technical questions.
And that conversation,
and the initial meeting,
actually led to a kind
of ongoing discussion
among a number of people who
then later became members
of the organizing committee,
all of whom, with the
exception of Moritz,
are actually here in the room.
We staged another event
(speaking faintly) last year
which kind of brought
together a number of people
who had been working on this
topic, largely, in Europe
and in East Asia to try to bridge
what is sort of becoming
an American style,
North American style of
conversation about these issues
with a longer-standing
conversation that's happening
in Europe and elsewhere.
And I don't think I have
to say that this event
takes place at a pretty unique moment.
I did not anticipate that we'd be having
this conversation in the
particular contexts of today.
And it made me think that, perhaps,
this wasn't the right time
for this conversation,
that there are more basic
issues to attend to.
But I actually think
that this is, in fact,
a great time to be
investing in these issues,
and to be doing so, perhaps,
not only as questions of politics,
but questions of technical practice,
of kind of basic research questions
within computer science.
And I'm really, really quite excited
about the program that we have for today.
There's a number of different things
over the course of the day.
The first will be an opening panel.
We'll also end the day
with a closing panel.
But in between, we have a couple
of interesting different things.
We'll have a spotlight
session which, I guess,
you can see here on the slide,
which is basically kind
of a rapid-fire sequence
of talks relating to
what the kind of group
of people organizing the event perceive
as some of the best published
work in the past year.
Then, we'll have morning
and afternoon sessions
which will feature,
both, some existing work,
presentations of existing work,
but a lot of new work, as well.
And we've tried to pair these in ways
where they'll speak to each other,
where the different
research that we present
addresses each other.
We'll also have some, this
is actually a really exciting
last-minute addition to the program.
If you take a look at, actually,
let me see if I can do this.
well, anyways, if you're in New York City,
we have a special regulatory agency called
the Commission on Human Rights,
which has interesting legal authority
on topics related to
discrimination and bias.
And we are fortunate enough to actually
have the chair and
commissioner of that agency
here today to actually address the group.
That will be a short
address right before lunch.
She'll share some thoughts
about these issues
and describe how the agency might be able
to address some of these things,
hopefully, I think, in
partnership with many
of the people in this room.
The other thing we'll have during lunch,
which will be catered,
so everyone can stick around for lunch,
is an excellent poster session.
So, perhaps, you saw, as you walked in,
there was a room on the
opposite side of this room
which is actually a
nice space for posters.
And so, we'll have nine posters
and we encourage everyone
to have some food.
About 30 minutes after we have lunch,
during that same hour and
a half break for lunch,
we'll then have an hour-long
session for the posters,
and I'd highly encourage
everyone to stop in to see those.
So that's really the course of the day.
I just wanna say a few
quick thank-yous before
I hand it over to the incoming panel.
So first, I just wanna
thank my co-organizers
who have been incredibly
helpful in putting
this all together, both, this
year, and previous years,
and in just helping to kind
of foster this community.
We also had a program committee
for the first time this year
which included many people in this room,
and we're very, very
grateful for their hard work.
I also wanna thank New York University,
the NYU School of Law,
for providing us with this
venue and, specifically,
I wanna call out some people.
Nicole Archer, who you've probably dealt with
when you first walked in.
Some student volunteers,
YOO-LEE and Casey,
also, who you probably met
when you walked in.
Amanda Levendowski, who
I think may be somewhere
in the room, as well, who
was helpful in putting
this all together.
And Jason Schultz, who
was really the person
who made it possible to secure this venue.
I also wanna just quickly
thank our sponsors,
which includes the Data Transparency Lab,
whose events took place
yesterday and the day before.
And I wanna call out Augustine,
who was very helpful in trying to arrange
a space for us at Columbia before we had
to actually relocate.
And finally, I just wanna
also thank two other sponsors,
major sponsors who have provided
some important financial support,
which is Microsoft, where
I am currently employed,
and Google, as well as some
other folks in the room.
So, with that, thank you all for coming.
It's going to be, I think,
a really interesting
and engaging day.
Please, if anyone has any
questions, pull me aside.
And, if you don't find me,
there will be someone
stationed at the entrance
all day long, too.
Thanks very much.
I'm now gonna pass it over to danah boyd,
who many of you, I'm sure,
know she is a principal researcher
at Microsoft Research,
and also, the founder
of Data & Society.
She is going to be the moderator,
but also, a panelist in her own right
on the opening panel
and (speaking faintly).
(audience applauding)
- I wanna ask the other panelists to come
and join me up here.
So I am honored and delighted to be here.
I am so excited that
there's so much energy here.
Our goal for this opening
panel is to get people excited
about what is FAT ML?
What's the state of the field?
Where are we going and how
should we be thinking about it?
And so, the way that this is gonna work
is that I'm gonna ask my fellow panelists
to basically provide some
provocative opening statements
to sort of see where
they're at and to actually,
in many ways, challenge
you to think critically
as you engage throughout the day.
And then, I'll ask some questions
and we will open up to questions.
So this will be a good opportunity
for everyone to engage.
I say that in advance
because you should feel free
to write down your
questions and think about
what you wanna ask, and
to note that I'm not
gonna let you ask them from the floor.
You've gotta actually go to a microphone,
because we do have a live stream,
because this has, indeed,
exploded far beyond
the basic interests of this room,
which is kind of exciting.
So first, a moment of introduction.
To my right is Sorelle Friedler,
who is an assistant
professor at Haverford,
and is also visiting,
she was a fellow with me
at Data and Society.
We've worked together for a long time.
Next is Rayid Ghani from
the University of Chicago.
And then, we have Cynthia Rudin from MIT.
- Duke.
- Oh, Duke, sorry.
Oh, sorry.
I didn't get that update.
Thank you.
From Duke.
I'm sorry.
(speaking faintly)
And they'll each tell a little bit more
about where they're at and
where they're starting from.
And so, I was asked by someone
to also provide some of my
own provocative statements
to the room as part of this as, you know,
a quasi member of the panel.
So I'll get started first,
and then, turn it over
to my fellow panelists.
The two provocative
things I wanted to offer
to this room is, first, the
notion that transparency which,
obviously, is the T in FAT ML,
as it's currently constructed outside
of the technical community,
has become a red herring and,
in many ways, is going to be
a problem for us, as a field.
I'll come back to that in a second.
The second comment that I wanna make
is that machine learning
requires a form of precision
that society isn't really prepared for,
which complicates the tension between
the FAT and the ML.
So let me start with that
notion of transparency.
So what is happening is that transparency
has moved within a broader
notion of accountability.
In fact, they often come together,
where there's the assumption
within broader political discourse that,
if we have transparency,
we will have accountability.
And that becomes a
challenge in its own right
because it ends up being this moment
where we obsess over just
achieving transparency
without really realizing the relationship
between transparency and accountability,
something that the technical
community understands
how to separate, but is
not actually separable
within the broader political element.
Transparency of data and algorithms
is often assumed to be
the desired end goal,
and many people in this
room would be like,
of course not, that's insane.
But part of what's at
stake right now is that,
as that notion of
transparency starts to go out
and become part of broader environments,
I would argue it's becoming a distraction.
What ends up happening
is we get so focused
on the mechanisms that are about trying
to achieve transparency that we lose
a broader picture of what's going on.
And, importantly, we start to lose
the notion of trade-offs.
What are we actually trading off
when we make some of these decisions?
The other thing that I
sort of struggle with
on a regular basis is when and where
is transparency a new mechanism
for enabling manipulation?
And I think that there's
an interesting moment
to think about because a
lot of what we're talking about
when we talk about
accountability and transparency
and fairness are these
moments where we're trying
to remedy against moments of manipulation
for conflicting value statements.
This is where, I think, we get to have
an interesting conversation about values.
To my second statement
that machine learning
requires a form of precision
that society's unprepared for,
a lot of it is we've started to see
some of these really fascinating debates
about what is necessary
to even define fairness?
In fact, there's so
many people in this room
who've been having a really
good conversation about,
you know, the mathematical
definitions of fairness.
And, you know, warning for anybody
who isn't familiar with the social science
and philosophical literature,
that's been debated for thousands of years.
(audience laughing)
So if y'all figure it out, please,
that's awesome for everybody.
But note that this is a
very contested term because
what's at stake, of course,
is that we're grappling with values.
And we're grappling with values
that the rule of law has
long tried to figure out
how to formularize in
different sorts of structures
with huge challenges.
And one of the things that's intriguing
about how the rule of law
tries to operationalize this
is that it often leaves
room for flexibility.
That room for flexibility is, of course,
where huge challenges of
bias and discrimination start
to enter the picture, where values start
to enter the picture.
And it's one thing, my
guess is many of you
in this room have jaywalked
in the last seven days, right,
and it's a really good thing
that you didn't get fined
every time you did something illegal.
Because, if you've actually looked
at what you've done recently,
there's a lot of illegal
activity in there.
And some of it, you don't even realize.
My favorite ridiculous
law is you cannot bicycle
in a swimming pool.
Who knew?
(audience laughing)
Now, I'm sure many of you
haven't tried that this week,
but it's an interesting
moment to think about
those little subtle elements.
And I say this because we require
a level of precision when
we're putting something
into code where laws often
leave room for fuzziness.
So what is the kind of fuzziness
that we wanna start talking about,
and what happens when we
start to go to precision,
and what are the consequences of that?
Because, in some ways, fuzziness is where
we actually allow for a
meaningful sense of judgment,
and the questions of fairness
often become human questions.
So as we think about what the role
of the human in the loop is,
a lot of it has to do with that fuzziness.
So those are my two
provocations for the day,
and I'm gonna turn it over to Sorelle
and see what she's got.
- Thanks.
So where you sort of ended,
thinking about questions of fairness
and how that's long been contested
within social sciences is sort of where
I wanna pick up.
Because, as Dana mentioned,
you know, within this community,
we've been thinking a lot about
what it means to be fair,
and what the appropriate mathematical way
of framing that question is.
And there've been a bunch of
different answers to that and,
you know, they have their various nuances,
but I'm gonna sort of
broadly characterize them
as being in two basic camps, right?
One is sort of the camp
that's sometimes known as group fairness
or nondiscrimination,
and it's coming out of,
I would sort of place it
in this historical context
as coming out of the
Civil Rights Movement,
of saying that it's important that groups
coming from different
backgrounds still have
the same opportunity in the world.
And these are often sort of
outcomes-focused measures.
So these are measures that sort of say,
if we find that black
people and white people
are hired for a job at the same rate,
then we'll consider
that to have been fair.
And if we find that those
rates are largely different,
then we'll say that that was unfair.
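To make the outcomes-focused measure Sorelle describes concrete, here is a minimal Python sketch that compares selection rates across groups; the groups, decisions, and numbers are invented for illustration, not from any real system.

```python
# A minimal group-fairness (outcomes-focused) check: compare the rate
# at which each group receives the positive outcome. Hypothetical data.
from collections import defaultdict

def selection_rates(decisions):
    """decisions: list of (group, hired) pairs."""
    counts = defaultdict(lambda: [0, 0])  # group -> [hired, total]
    for group, hired in decisions:
        counts[group][0] += int(hired)
        counts[group][1] += 1
    return {g: hired / total for g, (hired, total) in counts.items()}

decisions = [("A", True), ("A", False), ("A", True), ("A", True),
             ("B", True), ("B", False), ("B", False), ("B", False)]

rates = selection_rates(decisions)
print(rates)                    # {'A': 0.75, 'B': 0.25}
print(rates["A"] - rates["B"])  # 0.5: by this measure, unfair
```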
On the other hand, there's
a whole body of work
that is more concerned with
the rights of the individual,
which is, sometimes, in our community,
even termed individual fairness.
And these are ideas that
are saying that people
should be judged by
their individual merits.
And what I wanna draw out here
is that I think that, obviously,
both of these options are value choices.
And I think that, on the group fairness,
and on the discrimination side,
people are often more clear about the fact
that these are value choices
and a little more transparent saying,
these are my values, and now, I'm going
to mathematize them and go
ahead and try to optimize
towards this definition.
I think that people in our community
on the individual fairness side,
this is my provocation,
have been a little bit less clear
about stating that these are
also value-driven assumptions.
And I think that, you know,
I hope that we can all
state our value upfront.
And I think that the sort of
subtle values stated there
is that the data is correct, right?
That, when we are given a, you know,
CSV of a bunch of data,
that it somehow represents truth, right,
and not just, you know,
some observation of truth
as it was written down in, potentially,
noisy measurements into
some spreadsheet that,
potentially, left off some stuff,
and included some additional stuff.
So you can sort of see
where, on the values side,
I land on this, but I
think that it's important
to sort of be transparent
about that values system
and to say, you know,
if I'm going to go ahead
and choose this set of definitions,
what I'm implicitly assuming
is that my data's correct.
And then, if we all agree
that the data's correct,
then we can all sort of move on and say,
you know, let's gather
around that definition.
And, if we don't agree that it represents
some underlying truth with respect
to the task that we're
trying to determine,
then maybe, we should come
to some other sort of remedy.
But my hope is that we in the room
can start to move the
conversation to that place
so that we can actually
move it forward beyond that.
- Thank you.
Rayid?
- So, as the minority on
the panel (speaking faintly)
in many ways, I'm sure all of us
can take different sides of this,
so I'm gonna take a different side.
I think, you know, we're all here
assuming that we're so influential
and so powerful that all these people
are making decisions
based on machine learning
and data and they're so rational,
and we have to stop them from
being accidentally unfair
or accidentally, you
know, non-transparent.
Well, the reality is, they're
not doing much with data,
especially, if you're looking
at the large companies,
and hiring biases, and
Google and Facebook, true.
If you're looking at people
that are being affected,
you know, public health,
and criminal justice system,
and people being impacted by the police,
none of those people are using data
to make any decision.
So we're worried about
fairness and accountability
and transparency when
we should worry about
slightly better use of data.
So when I start doing these
things, part of me wants
to be a cheerleader,
just trying to get them,
just use any data.
I don't care.
Just start being better than today.
That's the baseline.
Better than today is the baseline,
with constraints on, and the other side
is trying to scare them into,
you know, when you start doing that,
you start running into the issues
that Sorelle and Dana talked about,
because the people who
are doing this work today
with data are not trained
to think about those issues.
So they take, you know, i.i.d.,
and data is perfect
and (speaking faintly).
So kind of, I have this, you know,
schizophrenic personality where I'm trying
to scare them and saying,
don't do too much, but do a little bit.
And I think it depends
on what group you're in,
you wanna take one or the other side.
So I think it's important for us
to not just go to them and say,
hey, data is scary.
If you use data, you're gonna make
really bad decisions.
It's gonna be horrible,
so don't do anything.
Because that's how they take it, right,
if you sort of talk to
governments and nonprofits.
A lot of them are really
scared about using any data,
which means they're doing things
that are unfair accidentally,
that can't be audited,
that can't be measured.
So that's one thing, I think.
I think, the second
thing is I think we have
to be a little more explicit in what
are we trying to protect, right?
So the way to think about it is, you know,
for a long time, we had kind of control
of what data we collected about people.
And then, from there, we moved on
to what inferences?
So everything on privacy, or no,
on security and access control,
is typically around
what data's collected
and who can access it?
Then, there's inference control, right,
where what can you infer from that data,
and that's sort of where most privacy is.
I think, what we're talking about now,
you know, I'm assuming most
of us don't care about,
well, some of us care about
fairness and transparency
as a theoretical construct.
But most of us care about
how that affects people.
So what we wanna do is
figure out action control.
Who is being acted upon fairly and not?
I can make unfair inferences about people,
and I can't stop anyone from doing that.
But what I can do is
control the action, right?
So the jaywalking example danah was giving,
yes, there's some fuzziness there.
But, if the probability of
somebody getting a ticket
for jaywalking is different for, you know,
white people versus black people,
that's the thing we need to control.
We can't control some of the other things.
So I think we should try
to be very explicit on,
let's not try to stop people
from doing everything.
Let's try to get them to figure out,
you can do whatever you want,
but at the end, you need to be transparent
on some of these things.
And I think, one thing
you can be transparent
on is exactly this:
what's the probability
of action, right,
that's just from the precision
that you can have arranged
in a distribution and all
of those different things.
And then, I think, the third piece
I wanna sort of prompt
people to think about is,
you know, most of us in this room
are not gonna be affected
too dramatically,
again, it might be an unfair statement,
by a lot of these things
we're talking about, right?
So maybe, we should talk to the people
who are being affected by all of this
instead of us making the rules
and us sort of asserting,
this might hurt people.
Well, maybe, we should ask somebody
who's gonna be affected by it,
you know, this could be unfair to you,
but it could help all these people.
How about you tell us
what your preferences are
and we design some of these policies
that are much more collaborative
and much more focused on
the people affected by it
than the people designing them
and kind of talking
about them in this room?
- Thank you.
- Okay, I'm gonna follow
up on something Rayid said,
which was that people are not using data
to create models, because
they just haven't done that
in the past.
So they've started to do it,
but the way they've done it has been
a little bit alarming, at least, to me.
The government started hiring companies
to produce predictions for them,
and the company gets paid every time
it makes a prediction.
And these are black box
predictions because,
otherwise, the company
wouldn't get paid, right?
So should I give this
person parole or bail?
Should I give the person bail?
So they hired this company
and the company says,
"We think this person is dangerous,
"so you should not give them bail."
And then, since we can't figure out
what exactly the company is doing,
and their predictions are clearly unfair,
there's a lot of documentation on this...
I find this very disturbing.
Why is our government
paying this private company
to produce predictions
that nobody can explain?
There should not be a
business model for this.
We need to erase that
business model and, instead,
create a business model around designing
transparent models or, at least,
models that you can explain.
I mean, I agree that
transparency is not the same
as accountability but,
if it's not transparent,
it's very difficult for
people to be accountable
for their decisions.
You have no reasons given for the decisions.
Yeah, one thing I should say, also,
is not only is accountability
a little bit different
than transparency, causality
is different than transparency.
I work on designing transparent models,
machine learning models and,
every time I put up a model
that people can understand,
all of the sudden, they start protesting,
which is great, because it means that they
can understand what's going on.
(audience laughing)
But then, once they sort of understand
what the model's doing, they go,
"Oh, well, does that
mean, if I change this,
"then the prediction is different?"
I was like, "No, no, no,
you're not allowed to do that
"because it's not a causal model."
So you have to actually,
you have to think about causality
when you're designing these things
because you realize that people
are gonna misinterpret what they are.
The other thing I wanted to talk about
is computational hardness.
Transparency in modeling
is tied inherently
to computational hardness.
If you wanna put a constraint
on like a fairness constraint,
you have to be willing to solve
a computationally-hard problem.
And because a lot of people in this field
are not willing to do that,
it sort of pushes the
computational hardness
onto the users.
So then, you have doctors
and people designing
criminal justice models who are trying
to do manual feature selection because
they can't deal with the
computational hardness problem.
And I feel like that is our fault because,
you know, computational
hardness is something
we should handle, not
push onto the users.
Okay.
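One reading of Cynthia's hardness point, with an illustrative sketch: a hard fairness constraint on training is in general a computationally hard problem, so a common research workaround is a soft, differentiable penalty instead. The code below penalizes the covariance between a sensitive attribute and the model's score during logistic regression training, on synthetic data; it is one such relaxation in the spirit of published work, not Cynthia's own method, and every number in it is invented.

```python
# Sketch: logistic regression with a soft fairness penalty on the
# covariance between the sensitive attribute s and the raw score x @ w.
# Synthetic data; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
s = rng.integers(0, 2, n).astype(float)           # sensitive attribute
x = np.column_stack([rng.normal(s, 1.0), rng.normal(0.0, 1.0, n)])
y = (x[:, 0] + 0.5 * rng.normal(size=n) > 0.5).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, lam, lr = np.zeros(2), 2.0, 0.1                   # lam: penalty weight
for _ in range(500):
    p = sigmoid(x @ w)
    grad_ll = x.T @ (p - y) / n                      # logistic loss gradient
    cov = np.mean((s - s.mean()) * (x @ w))          # score/attribute covariance
    grad_fair = x.T @ (s - s.mean()) / n             # d(cov)/dw
    w -= lr * (grad_ll + lam * 2 * cov * grad_fair)  # penalty is cov**2

p = sigmoid(x @ w)
for g in (0.0, 1.0):
    print(g, round(p[s == g].mean(), 3))  # group score means pulled closer
```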
- Thank you, guys.
So one of the things I'm
hearing from each of you
in different ways is
this moment in which data
is understood or not
understood in a relationship
to broader social issues as
we're seeing it play out.
Rayid, you made it very clear that people
don't even know how to use data.
They don't understand it, and so,
they're going with their gut in many ways
in these environments.
So it's a moment where, can data be just
a little bit better?
Sorelle, you've talked about this tension
that we have between group fairness
and individual fairness and this way
in which we don't even
necessarily understand
how to take the data
that we have to situate
in that model.
And, Cynthia, you know,
your point about causality
is a reminder that we don't even have
really basic mathematical literacy,
let alone more sophisticated
technical literacies.
So how do we think then about the role
of the efforts in FAT
ML which, in many ways,
are technical efforts
with a social implication?
What is the role of this community
to grapple with the
ways in which, frankly,
as a result of this, a lot of this stuff
can be misused?
A lot of the kinds of implementations
and the tools that we see
being built can be misused
if people don't know how
to read the material,
if they're going to misuse it because
they're just gonna black box it,
or if they're going to do it without
even necessarily thinking through values.
How do we challenge that?
- That's a good question.
(audience laughing)
- So, I think, one, I
don't have a good answer,
but one way of, at least,
the way I think about it is,
just like when machine learning and a lot
of technologies get started,
they don't get started from, you know,
governments and people doing stuff.
They get started from people
who have the resources
or the luxury of having the data,
and the luxury of not making
too many important decisions.
Lots of micro-decisions.
What ad do I show people?
Who cares?
But the benefit of that
is you've got tons of data
and you've got tons of, I mean,
you do care because it does
affect people in this room.
So don't take anything I say
too literally, by the way.
(audience laughing)
Because I could argue the other side.
So I think one thing we
could do is use those,
you know, there must be a use
for those companies, right?
A lot of people here work for those.
Can we use them to test
out these ideas to see
what needs to be developed,
how do they work,
as opposed to going directly
to the criminal justice world
and saying, "We're gonna make
you fair and transparent."
Well, first, we have to make
them slightly more rational.
And then, we have to make
them fair and transparent.
But, if we go to the people
who claim to be rational
and start using them as a testing ground
for fairness and transparency
and accountability,
and then, you know, I don't know if things
will trickle down, but
I think that's one hope
that I have is, let's start playing
in the sandbox that those
organizations already have
and figure out what
frameworks need to be there,
what challenges need to be solved,
what are the algorithms
we need to develop,
what do we need to empirically verify,
and then, go down to the problems where
it's really sensitive.
Because you also don't
wanna experiment too much
in some of those other domains.
Because you wanna go there knowing
that it's somewhat safe
and that your experiments
aren't gonna be unfair.
So that's kind of my view,
is use the large ad search, you know,
use those domains, and
use them for testing,
and we get something
good out of them, right,
and then, push them down to other areas.
- So, I think, another thing
that comes up from this,
so FAT ML is values-first, right,
and that ends up being the frame
of so much of what this
entire community's about.
But there's some implicit assumptions
within those values,
that those values are,
in many ways, progressive.
That those values are, in many ways,
thinking about broader societal elements.
And yet, Cynthia, you pointed out,
one of the big challenges
is that we're also dealing
with a set of values that
are economics-first, right,
business models-first,
which is, in some ways,
where the challenge is occurring.
There's also places where I'd say
that things are even more contested.
Folks don't agree on what
criminal justice should serve,
as a purpose, right?
There's very deep conflicts
within criminology
about what that even means.
So how do we start dealing and grappling
with conflicts in values?
I mean, one of the things which, I think,
you'd pointed out is
it's about making some
of this visible.
But part of what's also gonna come into play
is that there's a moment
where those elements
are actually just going to
be, themselves, political.
And so, what is the responsibility
from this community
when we're facing areas
where we're talking about serious
and deep entrenched
conflicts in values when,
in many ways, that's not
necessarily what we're skilled
to be working out for people?
Rayid, you mentioned
about getting communities
to be engaged as part of this.
But there's also which communities, right?
Because I think that
this is what's coming up.
So maybe, Cynthia, starting
with you, it's like,
how do we think about that
where one of the things
is that we're coming up
against economic desire,
first and foremost.
How do we deal with that tension at play?
- Yeah, I don't know
exactly how to deal with it,
but one thing I've noticed is that,
all of the sudden, everybody's
doing machine learning
regardless of whether
they're trained in it.
I've noticed that, even at
universities I've been at,
all of the sudden, there's
machine learning classes
in several different departments.
Who's teaching these classes?
I don't know those people.
Why do I not know those people?
- He's got AI now.
- Okay, yeah.
(audience laughing)
(speaking faintly)
It's just, you know, something gets hot,
and then, everybody decides
that they're qualified
to do this, like this company
that's producing these predictions.
What the heck are they doing?
Are they trained in machine learning?
I have no idea.
Maybe, they just got some
off-the-shelf who-knows-what
and they're producing predictions.
I mean, it would be good if the community
would start putting out statements
that people could listen to,
you know, written statements.
- So part of it is you wanna
see the community articulate
its own values?
- I think so.
I think, that would help.
Because, right now, it's
just like this Wild West.
Like, oh, let's all do machine learning
and get money for it.
- Well, I think, it, ironically,
becomes accountability of
accountability people, right?
(audience laughing)
This twisted element of how do
we actually hold accountable,
you know, what are those values
that are being articulated?
- Yeah, and I feel like
I should jump in here
and say that, in fact, some
of that has started happening
within this community,
and there have now been
a couple of workshops
that have been held to
try to say, you know,
what are our principles?
What do we believe are principles
of accountability within algorithms?
And one of the results of
that was posted on the website
for FAT ML last night.
- So, everyone, I'm sure
you've looked at it.
(audience laughing)
If you have computers, now is the time.
Sorry.
No, but, you know, one of the places
where I think we've seen it most recently
is I wanna talk about
the ProPublica article
and the different reactions to it.
And I know we have a piece, or, at least,
one piece, maybe, two pieces
later today discussing
some of this.
For those who didn't see it,
ProPublica released an analysis
of a risk assessment tool that was used
in Broward County where they looked solely
at the output of it and looked
at the disparate impact
of the different analyses,
the black box, if you will,
and came to the conclusion
that black individuals
who were not likely to commit a crime
were more likely to be
given a high-risk score and,
therefore, face greater punishment through
the criminal justice system upfront
than white individuals.
The company itself, Northpointe COMPAS,
or, Northpointe was the company,
the tool was COMPAS,
sort of came out with a response and said,
"Actually, no, that's not
how we were modeling this."
And what unfolded was a series of debates
that happened, both, within
technical communities,
and there's some great
papers on arXiv,
as well as all the way up
through the Washington Post,
trying to articulate out a
different definition of fairness.
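For the mechanics behind that exchange: ProPublica's critique was essentially about unequal false positive rates, while Northpointe's response was essentially about calibration, and when base rates differ across groups the two generally cannot both hold. The sketch below computes per-group false positive rates on made-up records, not COMPAS data.

```python
# Per-group false positive rate: of the people who did NOT reoffend,
# what fraction were labeled high-risk? Records here are invented.
def false_positive_rate(records, group):
    """records: (group, labeled_high_risk, reoffended) triples."""
    fp = sum(1 for g, hr, re in records if g == group and hr and not re)
    negatives = sum(1 for g, hr, re in records if g == group and not re)
    return fp / negatives

records = [
    ("black", True, True), ("black", True, False), ("black", True, False),
    ("black", False, False), ("black", False, True),
    ("white", True, True), ("white", False, False),
    ("white", False, False), ("white", False, False), ("white", False, True),
]

for g in ("black", "white"):
    print(g, round(false_positive_rate(records, g), 2))  # 0.67 vs 0.0
# A tool can be calibrated within score bands and still show this gap
# when base rates differ: that's the competing-definitions fight.
```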
And, maybe, Rayid, you
can speak to some of this.
This is like, how do we deal with what,
in many ways, was that
fundamental difference
in what should be done
with criminal justice that,
in many ways, was disconnected
from the communities because,
actually, what's at stake
is we're not even sure
who we're talking about in
criminology and, you know,
what is the role?
Is the community judges, is the community,
you know, those who have been
scored in different ways,
and how do we deal with that?
Because, in many ways, and
this goes to your other point,
which is that, judges aren't even sure
what to do with that data
half the time, right?
So they're not even necessarily using it.
But if we're pushing for
these things to be used,
how do we deal with that when
we're actually seeing that
there's not even agreement on this?
- Yeah, I mean, I think, that's,
so the last year or so,
I spent a lot of time with people
doing criminal justice work,
and again, I think the challenge there
is a lot of these systems are there,
the reason for them being
there is consistency,
not fairness.
Hey, judges are really,
they're subjective.
And so, if we give them
a consistent thing,
they will make similar
decisions every time,
so you reduce the variance,
but who knows where the mean goes.
(audience laughing)
And so, I think there's part of that.
So there is this, for bail,
there are very few companies,
very few jurisdictions
using that system.
Most of them are using, or
the more progressive ones
use the scorecard where
they score different things
by numbers and they have to add them up,
and numbers greater than X,
then you're high-risk.
Otherwise, you're not.
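The scorecard Rayid describes is just a point total with a cutoff. A hypothetical version, with invented items, weights, and threshold:

```python
# A hypothetical additive risk scorecard: score a few items, add them
# up, flag "high-risk" above a cutoff. Items and weights are invented.
SCORECARD = {"prior_arrests": 2, "age_under_25": 3, "unstable_housing": 2}
CUTOFF = 6

def score(person):
    total = min(person.get("prior_arrests", 0), 3) * SCORECARD["prior_arrests"]
    if person.get("age_under_25"):
        total += SCORECARD["age_under_25"]
    if person.get("unstable_housing"):
        total += SCORECARD["unstable_housing"]
    return total

person = {"prior_arrests": 2, "age_under_25": True, "unstable_housing": False}
s = score(person)
print(s, "high-risk" if s > CUTOFF else "not high-risk")  # 7 high-risk
```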
So what's happening is
there's all this, you know,
perception of using this information.
And then, you talk to the people
who are doing sentencing.
You talk to a judge.
So I was talking to
somebody in Salt Lake County
a couple of months ago and they said,
"Well, there was a time in the city
"where there was one of the
G20 summits happening and,
"because the G20 summit was happening,
"they were expecting to
arrest a bunch of people."
So they started releasing
people from jails
just pretty randomly just so
they could make space so that,
when new people are gonna be arrested,
they could start putting them in, right?
So we talked about, and
the reason that came up
was we're talking about
the data, and I said,
"Well, you be careful
because the same person,
"everything else being equal,
"two people have different probabilities
"of being sentenced into jail because
"of the capacity of the jail."
And when we're doing this type of work,
we never look at the capacity of jail.
We look at everything
else but what's happening
inside the jail.
So even those types of things where judges
are making these sentencing decisions
based on this information, is that fair?
It might be transparent,
(audience laughing)
and who are you being fair to,
and how are you dealing with that?
Sitting on the criminal justice side,
there are so many big issues there, right?
We've got a quarter of the
world's prison population
in the US.
Most people who (speaking
faintly) prisons.
But most people who are in jails right now
have chronic health issues,
substance abuse disorders.
There's sort of all these other things
that we have to deal with.
And so, I think there's, you know,
on the fairness side, sometimes,
it's fairer to put somebody
unfairly into, especially,
in some cases, a prison,
because they have long-term
support services that
can actually help them
if designed correctly.
And sometimes, the judges,
they're looking and they say,
does this person have
safe and secure housing
and a stable employment and,
if they don't have those things,
maybe, we should put them in a place
where they can get extra support
so that the outcomes for
them could be better.
So I think there is, when I'm
talking about being involved
in the community, there are the people
who are being impacted there.
So let's figure out what's the
best outcome for these people
together with these people,
as opposed to us telling them
what they should be doing.
And then, let's think
about sort of fairness
and transparency and
accountability in that framework
because right now, I
think, there are some,
criminal justice is very complex.
You know, some of the other
things are slightly easier.
But criminal justice in the
US is just a mess right now.
(overlapping discussion)
- So figuring out what
programs are gonna help people
is also an interesting
and important fairness
and machine learning problem.
There are a lot of
people trying to create,
assess a particular social program and,
in order to do that,
you need to collect data
and you need to know what,
it's a causal question.
How much is this program
gonna help this person?
- Warning to the audience,
I'm gonna ask one question,
and then, go to you.
So now is the time to actually go
and stand in front of that mic.
So my question for you.
Rayid picked up, Sorelle, this is for you.
Rayid picked up a notion
of consistency first,
and I'm curious how this squares away
with some of the group
in individual fairness
that you've been thinking about.
How do you position consistency first
as a framework which, in many ways,
is operational as on the ground,
with what you're thinking about
in terms of some of these
technical questions?
- Yeah, so I think my question there
would be consistency with
respect to what, right?
Because it's always gonna
be consistency with respect
to some measured information
about an individual.
And so, then you have to decide,
do you want to be sort of consistent
across races and what
does that end up meaning
given that people are coming
from very different contexts,
and that's gonna be reflected in the data.
- [Rayid] Con-FAT ML.
(laughing)
- Just wanted to add letters, huh?
- [Mark] So I'm Mark McCarthy.
I'm with the Software and
Information Industry Association.
I'm delighted to be here at this group.
We just published an issue
brief on algorithmic fairness
which I'd like to make
available to all you guys.
It's on my trade association's website.
And the theme is that the industry
wants to be part of an
ongoing conversation
on how to design fairness
into the algorithms
and how to monitor them so
that in the continuing use,
they remain fair.
That's the big picture message.
The detail is we can probably learn a lot
on how to do that by looking
at the regulated industry,
the people who've been, you know,
constructing statistical
models in housing,
and employment, and insurance, and credit
for many, many years.
And they try to handle this fairness issue
through disparate impact analysis.
They look to see if there's an adverse
disproportionate effect
on a protected class,
whether there's a
business reason for that,
and whether or not
there's a model that does
a better job.
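The first step Mark describes, checking for an adverse disproportionate effect, is often operationalized in regulated settings as the four-fifths rule: flag the procedure if any group's selection rate falls below 80% of the most-favored group's. A sketch, with invented rates:

```python
# The "four-fifths rule" screen used in disparate impact analysis.
# Selection rates below are invented for illustration.
def disparate_impact_flags(rates, threshold=0.8):
    best = max(rates.values())
    return {g: (r / best) < threshold for g, r in rates.items()}

rates = {"group_a": 0.60, "group_b": 0.45, "group_c": 0.58}
print(disparate_impact_flags(rates))
# group_b is flagged: 0.45 / 0.60 = 0.75 < 0.8. The next steps he
# names (business necessity, a less-discriminatory alternative) are
# value and modeling questions this screen alone can't answer.
```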
I do think that way of
structuring the issue
makes a lot of sense for a lot of people,
especially, the policymakers,
the people that I deal with,
and it seems to focus the
issue on the real problem
where help can come from
the technical community,
which is where is the
alternative to the policy,
or procedure, or algorithm that's in place
that might be able to do a
job that's almost as good,
but not quite?
There's a quantitative trade-off there,
and I think the world would be served
if that kind of trade-off
could be examined
in more detail.
You won't solve it.
Once you get to the
point where you've done
a disparate impact analysis and there's
a compelling business need,
and there's no alternative
that's any better,
you might be in a situation where you
just have to make a value choice.
Is this the kind of thing
where we think accuracy
is so important that we cannot let
any other value intercede,
or do we wanna put some
constraints on accuracy
in terms of providing for more fairness?
And there are lots of
examples in that area.
- I think, let's let folks respond.
Folks wanna respond?
- Yeah, so there has actually been
a whole lot of work on
questions of trade-offs
of fairness and accuracy
from within this community,
and the point that I wanna sort of pick on
a little bit there is that, I think that,
when we're all, and I've done
this in my own work, as well,
but so when we're all sort of thinking
about trade-offs between
fairness and accuracy,
I think, we also need
to be aware of the fact
that that is usually
actually really a sort
of consistency matter,
consistency with respect
to our previous decisions,
and not necessarily accuracy in the sense
of representing some fundamental truth
about what the correct
decisions were, right?
It's really about sort of
replicating historical decisions,
and that's the accuracy
that we're referring to
when we're talking about this trade-off.
- And, even to the extent that, say,
you're doing some sort of a trial
and you have a control group
and you can do accuracy in the future,
it's still, I mean, accuracy
is a pointless measure, right?
We're saying accuracy means something else
and we don't really have,
so let's say you wanna assume accuracy's
the right metric and
you've got 88% accuracy.
So you've got 12% error.
Error on whom?
What kinds of people?
Are you making equal errors on all types?
So I think we can simplify
by calling it a trade-off between accuracy
and something else, but
it's not that simple.
It's much more complicated
than that simplification
that we often use.
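Cynthia's "error on whom?" point in miniature: one overall accuracy number can hide very different per-group error rates. The counts below are hypothetical.

```python
# Overall accuracy can mask per-group disparities. Hypothetical counts:
# (group, n_examples, n_errors)
results = [("majority", 900, 60), ("minority", 100, 60)]

total_n = sum(n for _, n, _ in results)
total_err = sum(e for _, _, e in results)
print(f"overall accuracy: {1 - total_err / total_n:.0%}")  # 88%

for group, n, errors in results:
    print(f"{group}: error rate {errors / n:.0%}")  # 7% vs 60%
```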
- Thank you.
Just a warning to those of you in line,
do you see that camera right behind you?
It's staring at your head.
(laughing)
Thanks.
Go ahead.
- [Michael] I'm Michael
Veale and I'm a researcher
at the University College London,
between there and the
Government Office of Science,
and also, a researcher
at the Royal Society
on Machine Learning,
streams for science policy
and data governance.
So I do work on what machine learning
and ideas around responsibility
and responsible practices look like
in the public sector right now.
So we're going and interviewing people
across the public sector.
And what you find is
there's a lot of focus
on the decision of support
and how you actually design these systems.
A lot of them are designed
in-house in Europe
and the UK rather than contracted out,
or they involve contractors.
And there's concerns that
things like releasing
the logic behind the
models will make people
build inaccurate mental
models of how these are used
and start to judge on the risk scores,
on the factors rather
than the risk scores.
And there are also solutions
people are proposing
like creating, using these risk scores
just to order lists that
are currently unordered
to help people make decisions.
So a kind of question would be,
should we not be focusing
more as a community, as well,
on how this decision support is presented,
and what actually empirically happens
when people work on it,
and what is the best way of
presenting decision support
to enable fairness, and
understanding, and judgment?
- Cynthia, do you--
- [Cynthia] Yes.
(audience laughing)
We should definitely focus on this.
I'm working with some visualization guys
who, I suppose, can help with it.
I mean, I don't do user
studies, personally,
but I think that there's definitely
a decent realm of research to be done
on how these things are used
and how people interpret them.
- Yeah, and I would say that I'm aware
of some folks in the room who are starting
to do that work, right?
- So all of you find him afterwards.
(laughing)
- [Man] Morning, it's
been a good panel so far.
Here's my open-ended question.
I wanna hear your thoughts
on feedback cycles.
So a lot of definitions of fairness
are one-off definitions.
We have a single example, a single person,
what is a fair prediction for that person?
But then, there's societal
feedback issues and, often,
policies made to try to address them.
For example, affirmative
action might be one
where you try to say,
oh, we're going to make
what could be considered
a slightly-unfair decision
for one individual, but that then is going
to have a long-term impact.
- Sorry, what's your question?
- [Man] How should one model and think
about feedback cycles
when it comes to fairness?
- I mean, I'll take this
one for a little bit,
which is that, I think, this is one
of the most important things
that we don't think about holistically.
We talk about within small silos,
for example, search engine
optimization is a place
where we see some of
those feedback cycles.
We see it in these
discrete moments in time.
Obviously, many of the
people in the room know
Latanya Sweeney's example
which is often used
as an example around this.
I think part of what's
at stake as we start
to put these algorithms and
these data-centric systems
in place and, to Rayid's point,
they're mostly not in
place, but once they are,
they become a tool in which
you want to manipulate.
And, once you want to manipulate them,
that's where it's not just
the natural feedback cycles
that we start to pay attention to,
but the folks who have different interests
starting to feed into the feedback cycle.
And I think this is
something that, you know,
both, we have to think about technically,
but we also have to think about socially.
So, you know, into the
criminal justice context,
it's one of the biggest challenges
of predictive policing.
We know that where we send police,
they will increasingly arrest people.
And so, if we sent them to NYU frats,
we might be finding a lot of drug use
that we're not otherwise
paying attention to.
But we're not.
We're sending them to Harlem.
And so, this is that moment
of where are the data
that we're seeing getting
biased by the structures
that we've put in place
because we haven't realized
all of the various externalities
of what we don't see?
And so, I think, the
other important aspect
in all of this is to think about the data
that we see, as well as
the data that we don't see,
and really try to model within, at least,
conceptually model in your head,
what is the data that we aren't seeing
and how is this affecting the models
that we're building.
- And I would just quickly add that there
will be a couple of talks later today
that will be talking about
this when we're done.
(speaking faintly)
(laughing)
- [Richard] My name's Richard Johnson.
I'm an independent consultant
around data analytics.
I work with city agencies,
nonprofits in this space,
and I'm firmly in Rayid's
camp where it's just like,
the start is really using data better
and being more rational in the systems.
But I also very much appreciate the fact
that you need to have a conversation
around the ethics, the impact,
all those kinds of things,
and data scientists are
not conditioned, trained,
experienced, interested in
really assessing impact.
You know, their focus is
a little bit different.
So while I appreciate
that outside conversation,
I'm also very conscious of how,
if you can't discern
the difference between
the probable and the possible in terms
of machine learning,
Cynthia, you pointed out
the hardness problems
and things like that,
it can be very easy to see a lot of danger
where there isn't necessarily any.
So I would encourage,
if anyone in this room
has not ever done a data science model
or created a very basic
one, it's a lot easier
than you probably think,
which actually can also show you
how scary it is if people can
very easily create models.
I'm really interested,
though, with this panel,
your thoughts on this
idea between probability
and possibility and in terms of discerning
what are the actual threats
or the actual concerns from those
that aren't necessarily threats right now
and not the most concerning thing
that we should be focused on.
- Who wants to pick that one up?
- I guess, I'm a little confused as to
what types of threats?
Do you mean threats in the way the models
are gonna be used, I guess?
I'm not sure.
- Well, I think this is where we also see
the dynamic between, both, what
is statistically meaningful
and what is culturally meaningful.
But maybe I'll give it in a funny way,
which is that I'm fascinated by a struggle
that the National Oceanic
and Atmospheric Administration has.
You would think that their data
is clean and simple, right?
But they know that they've
got massive problems.
They've got sensor problems.
They've got all sorts of things.
They're operating in probabilities.
Just think about the weather
for a second, right?
The weather is not something that we know.
It's a probabilistic model.
So what does it mean then where they have
to transfer that
probability into something
that is about narrating
probability as a possibility?
There is a hurricane that may possibly hit
your geographic thing.
And then, of course, what
happens when that gets political,
when you have, oh,
let's just say a governor
deciding to turn it into a big fiasco,
and then, how does the public respond?
And so, I think, we get
these interesting moments
where we think about things in terms
of building a meaningful,
holistic, probabilistic system,
and we wanna be as
accurate and responsible
as possible to the data that we seek.
But the moment that
these things become part
of an impact story, or part
of a possibility narrative,
they have a whole set of
different implications.
And I think that's where it becomes
this interesting challenge for us,
is how do we convey those
models when we know that people
not just can't read causality,
they can't read probability, right?
So how do we deal with some of that?
- I think, we probably also wanna put cost
into our models, make them
real decision analysis models.
Maybe, the right thing
to do would be this,
but it's gonna cost us this much.
Maybe, we shouldn't do that.
This person maybe, like
you were saying, you know,
yesterday, this person
should've been in jail,
but now, we've run out of space.
- I mean, also, I think,
that's where the intersection,
in order to get these things into policy,
there is law in the middle, right?
We have to turn them into laws.
So a law's not gonna say maybe to how
you calculate probability.
But assuming the probability's correct,
you then have to put some
decision thresholds, right?
And so, that's where you
can start being precise.
Is my policy, you know,
so if your platform is,
I'm gonna only take
action if the probability
for this thing is greater than .79,
you can have the range.
But I think that's where
you start getting people
to be more explicit,
and if these decisions
are getting a little bit more automated,
it's riskier, but it also forces them
to define their policy,
and define the outcome
they care about, where the trade-offs are.
And I think that's a good
thing in the long term.
In the short-term, it might hurt us more.
But I think forcing government agencies
to be very transparent about
what is the policy there
that they have, not in the
hand-wavey hundred-page report,
but in a, here is the
probability I'll take action on.
Here's the value of the probability.
Here's who I wanna be fair to.
Here's who I'm okay being unfair to.
Because, if you're being
fair to some people,
you're being unfair to other people.
That's just gonna be, and it's okay
because we're defining that.
So I think a lot of it is how
do we write down these things
so that policymakers understand them,
and then, the people being impacted
understand them, as well?
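A sketch of the explicit, auditable policy Rayid is asking for: the action threshold (his .79 example) written down in code rather than a hand-wavey report, with per-group action rates measurable after the fact. The cases and groups here are hypothetical.

```python
# An explicit action policy: act only above a stated probability
# threshold, and measure how often the action falls on each group.
ACTION_THRESHOLD = 0.79

def decide(prob):
    return prob > ACTION_THRESHOLD

def action_rates(cases):
    """cases: list of (group, predicted_probability) pairs."""
    rates = {}
    for group in {g for g, _ in cases}:
        probs = [p for g, p in cases if g == group]
        rates[group] = sum(decide(p) for p in probs) / len(probs)
    return rates

cases = [("A", 0.91), ("A", 0.40), ("A", 0.85),
         ("B", 0.80), ("B", 0.30), ("B", 0.55), ("B", 0.20)]
print(action_rates(cases))  # A: ~0.67, B: 0.25
# The policy is transparent; whether those per-group rates are
# acceptable is the value question the panel is debating.
```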
- So recognizing the time,
I'm gonna ask each of you
to state your questions
as quickly as possible,
and then, we'll respond
collectively with closing statements
in relation to your questions.
- [Woman] Thank you so much
for the very interesting
discussion so far.
My question is coming from someone
who's outside the machine
learning community.
It sounds like a lot of
you are talking about
the machine learning community
as a single community,
a monolith, and I was
wondering if you think
that it's possible or
even healthy for there
to be a single framework
or principles put out,
like you'd mentioned, there's
some on the website already,
or do you imagine there
being multiple communities
that arise in the future,
a more conservative community,
and a more progressive community.
And then, also, different
frameworks depending
on different fields,
like a criminal justice
machine learning framework,
and so on, and so forth?
- Let me ask each of you, yeah.
- [Man] Oh, sure, yes.
My question is just,
are you seriously saying
that transparency's a red herring?
I can point to 15-20 students
currently in the room
who've done fantastic work to reveal
what is being done online,
who gave those results to, either,
computational analysts,
or social scientists who,
you know, look at the bigger picture.
I think, there's also a danger in trying,
as an engineer, to think about fairness.
I think there are people
with better training than us.
So I see myself as a
very, very strong opponent
to any attack on transparency.
So I just want clarification.
- Okay.
- [Woman] I was wondering, since it seems
that there's an inherent,
that machine learning
inherently gives more power to entities
that can collect lots of data,
that have large SHIH-BANDS capabilities,
and I was wondering if you think
that it's possible for
machine learning research
to be a democratizing force
and how to shape your research
around that sort of power distribution.
- Fabulous.
- [Daniel] Hi, Daniel
Neill from Carnegie Mellon.
I was wondering what should be the role
of machine learning in
quantifying and controlling
for human implicit biases?
So I'm sure that's a source of fuzziness
in human decisions that's
not such a good one.
- Great, so I'm gonna ask each of you
to respond collectively for questions,
and this is the closing
statement, effectively.
Maybe, Cynthia, can you start?
- Sorry, I need to think
about some of those questions.
- I can actually jump in if you want.
So sort of responding
to the first question,
which was about the
potential for, essentially,
separating into different communities,
I guess, what I would
say is that I would hope
that we could actually sort
of remain in one community
even if we end up on different sides
of various value
propositions because I think
that that's sort of, as
a technical community,
it's really important that
we all sort of be able
to talk to each other and, you know,
think about different models
for what we mean by fairness,
and wrestle though the
technical implications together,
and I would expect that there
might be some people who,
you know, are occasionally
on both sides of the fence.
And I also wanna recognize that,
even within the technical community,
at this point, we're no longer
really just machine learning.
There are people in the room from lots
of other pieces of the technical community
and I think it's also
good that we continue
to speak about these issues across
those technical boundaries, as well.
- Okay, I think, I can,
I'd like to respond
to Daniel's question.
So he's asking about human implicit biases
and it would be really
interesting if the machine
could sort of have a discussion
with the decision maker.
You know, the machine says,
"Look, this person is actually low-risk",
and the human says,
"But, no, I really think,
"looking at this person
that they're high-risk."
And then, the computer could say,
"Well, maybe, that's because
you're sort of biased.
"This is the sort of distribution
"of what humans would think.
"But, actually, the risk
"score's sort of over here."
And then, the human could say,
"No, but your data's wrong."
You know, something like
that that would allow
the combination of the two to make
a good decision together.
I think a system like
that would be really cool.
Good question.
- Rayid?
- Yeah, that's a great, I
think, on the community side,
I strongly feel we need to
have a single-ish community
because the work, and
it wasn't obvious to me
until I started looking
at criminal justice,
and public safety, and
education, and health,
and working with governments
across all of those areas.
All of these challenges are very similar.
And they're all very connected.
And so, if we all go off into our corners
and do these things separately,
we're not really gonna get anywhere,
because we need a lot
of help from each other
in convincing people, both,
in using these things,
but also, in kind of being
aware of what could go wrong
in one area versus the other.
So I think it's important
to deal with that.
I think, one of the
things we have to sort of,
so one kind of closing
thing is that, you know,
we're having a wonderful discussion here.
This discussion rarely, if ever,
happens in any, let's call it
a training program in machine learning or,
you know, grad school.
No grad student has
really had this discussion
as part of the curriculum.
And so, I think, one thing
I feel strongly about
is extending this into
having these things,
not as a separate course, because then
you do it for one semester
and you move on,
but as part of every course.
One of the things we've tried
in one of the programs
we run is, every Friday,
we would have this
discussion about the phase
of the project you're in.
Let's talk about
implications of that phase
on all of these things.
If you're doing data collection,
fairness, and ethics, and transparency.
If you're doing modeling,
if you're doing evaluations,
if you're doing method selection,
if you're doing communication.
So I think that's kind of,
this needs to happen a lot more.
It definitely shouldn't be
three or four people here,
or even, this room.
And, if we can figure
out a way to, at least,
help people create these,
figure out what curriculum
we can embed, that'd be great.
- So, realizing we're out of time,
I wanna sort of conclude for a moment.
I totally wanna take up
the transparency debate
and I'm happy to talk to
everybody at lunch about this.
I think, the thing that I would say
to all of this is, you
know, one thing from
the social science community is,
don't assume that the goals
you're trying to achieve
will be what you achieve
through your intervention.
And I'll give you just a concrete example
for autonomous systems.
So, Madeline, I'm not sure if she's here,
she's worked with Tim Hwang, who's there,
and one of the things
they did is they looked
at the history of automation
in the federal aviation context.
And one of the things
that came out was the idea
of how important it was
to put humans in the loop
as we went to autopilot.
The planes you all flew on to get here,
for those of you who fly at all,
are actually, pretty much,
autonomous at this point.
And what that human in the loop means
at this point is that their main role
is to serve as a liability sponge.
They basically pick up where things
go terribly awry, except
they've been deskilled
on the job.
They don't actually do that work anymore.
And they only solve it when
there's a real big problem.
So that's where we create
these moral crumple zones,
that moment where, you know,
a car has crumple zones,
where we put humans in that crumple zone.
And their paper on this is beautiful.
But the reason I bring this up
is let's be really cautious as we go
with a lot of these values,
that we're not actually unintentionally
creating moral crumple zones throughout
these kinds of systems as we're trying
to achieve fairness,
accountability, and transparency.
And that becomes a really
important thing to say,
because I think our intentions are good,
but our outcomes may not play out the way
we would like them to.
With that in mind, it's
time for some more fun talks
from different people.
So thank you, and thank you to the panel.
(audience applauding)
(ambient chatter)
- Hey, folks.
So, obviously, we're sorting through
a few technical challenges up here.
But while we're doing that,
I wanted to invite the folks
who are in the next spotlight session
to come up here and sit
in these lovely chairs,
and I also wanted to give
sort of a brief intro
to the session.
Come on up, come on up.
It's fine.
So this spotlight session,
our goal here is, as it sounds,
to spotlight some of the work
that we've seen published in this area
in the past year, work that
we think offers really shining examples
of some of the best work.
So we do sort of encourage you,
even though we've only given
them five minutes,
which is really only
a short amount of time
to tell you about the deep
work that they've done,
we do encourage you to go
and actually read the papers
and sort of grapple with the true extent
of the work that they're
gonna be presenting quickly.
(speaking faintly off mic)
(ambient chatter)
Okay, folks.
Technical challenges resolved,
so we're gonna try and do this.
I also wanted to say, for
folks standing in the back,
you're welcome to come
and sit in the front.
I know people don't often like to do that,
but you really are welcome.
Okay, and so, with that said,
thank you to this really exciting panel.
I'm looking forward to it
and I hope you all are, too.
(speaking faintly)
- Okay, so I'm gonna be
talking about, I'm Nati Srebro.
This is work with Moritz
Hardt and Eric Price,
trying to define and see how to work
with nondiscrimination
in supervised learning.
And I don't think I have to describe here
why it is desirable to
introduce nondiscrimination
constraints into supervised learning.
What I do wanna emphasize
is what I'm saying.
So when we're talking
about nondiscrimination,
we're talking nondiscrimination not
in general decision-making
but, specifically,
in supervised learning.
So in prediction tasks when
we're trying to predict,
there's a well-defined target
that we're trying to predict
based on some collected attributes,
and we wanna do this prediction in a way
that's nondiscriminatory with respect
to some specified protected
attribute or a group
like race and gender.
So what we want is to
build a predictor for Y
that's based on the features,
or maybe even those protected attributes,
in a way that does not discriminate
with respect to the (speaking faintly).
And the main thing we're grappling with
is what does it mean in this setting
to be nondiscriminating?
So I have to say, when
I started with this,
I said, nondiscrimination
in machine learning?
All you do in machine
learning is discrimination.
(speaking faintly)
So the goal is we wanna discriminate,
but not based on the protected attribute.
Okay, so the kind of two baselines,
one is blindness, when you
just don't use the attribute.
It is, of course, not sufficient
because we can pretty
efficiently predict the attribute
from the other features, and then,
either, intentionally
or inadvertently use it.
And also, there is another issue here
which I think is important
to this work, which is,
even if we don't really use them,
not even implicitly,
blindness doesn't address the
issue of accuracy disparity.
A predictor might be much more accurate
in one group than in another,
and this also can often
be seen as discriminatory.
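As a minimal illustration of that first point, here is a small sketch (an editor's example, not from the talk, on purely synthetic data) showing that the protected attribute can often be recovered from features that never include it:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 2000)                 # protected attribute
# "blind" features: the attribute itself is dropped, but it still
# correlates with every remaining feature
X = rng.normal(size=(2000, 5)) + 0.8 * a[:, None]

leakage = cross_val_score(LogisticRegression(), X, a, cv=5).mean()
print(f"attribute recovered from blind features: {leakage:.0%} accuracy")
```

Accuracy well above the 50% base rate means the attribute is redundantly encoded in the other features, so dropping it does not remove it.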
So another option, I think,
is demographic parity.
So demographic parity,
which was also mentioned earlier today
says that we're gonna make the same,
the distribution of
recommendations or predictions
is gonna be the same in both groups.
So if we're gonna,
given a model,
predict no-default on a loan
for 60% of the white population,
we're also gonna predict no-default
for 60% of the black population.
Another way to say it
is that our predictor
should be statistically independent
of the protected attribute.
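In code, the demographic parity condition is just a comparison of positive-prediction rates across groups. A minimal sketch (editor's illustration; the 0/1 arrays y_hat for predictions and a for group membership are assumed names, not from the talk):

```python
import numpy as np

def demographic_parity_gap(y_hat, a):
    """Difference in positive-prediction rates between two groups."""
    y_hat, a = np.asarray(y_hat), np.asarray(a)
    return abs(y_hat[a == 1].mean() - y_hat[a == 0].mean())

# predicting no-default for 60% of each group gives a gap of 0,
# regardless of who, within each group, actually gets the loan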
There are two problems with this.
So one is it's too strict,
and I think people appreciate it because
remember that the task
that we're looking at here
is nondiscrimination in the
supervised learning task,
where our task is to predict Y.
We're not looking at whether Y itself
is discriminatory or not, right?
We've just decided
that we wanna predict Y.
We wanna predict non-default on the loan.
We wanna predict appearance
at some later date.
And, if the correct target
actually correlates
with the protected attribute,
it's a bit much to
expect that the predictor
will not correlate.
In particular, we do wanna
allow perfect prediction.
So the perfect predictor,
which means that we
predict exactly the truth,
does not satisfy demographic parity.
And so, this precludes, for example,
giving loans to exactly those people
that we know
are not gonna default,
or not releasing on bail
exactly those people
that we know are actually
gonna not show up.
The other problem with demographic parity
is it's also too weak.
So it doesn't protect
from accuracy disparity.
So, for example, we're allowed,
according to demographic parity,
to give loans to all
qualified white people,
exactly to, among white
people (speaking faintly),
and for black people, just flip a coin
as long as the bias of the coin matches.
So our definition (speaking faintly)
is to add conditioning on Y.
So we're saying that it's not
that the prediction has to be independent
of the protected attribute;
it has to be independent,
conditioned on the truth.
So, in other words,
when you think about it,
we know that the true outcome
tells us something about
the protected attribute.
But once we know the true outcome,
knowing the prediction should
not tell us anything additional.
It shouldn't give us additional information
about the protected attribute.
And this protects from accuracy disparity,
and this allows the perfect predictor.
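A sketch of checking this condition, in the same assumed notation as before (y for the true outcome, y_hat for the prediction, a for the group; an editor's illustration, not the paper's code):

```python
import numpy as np

def equalized_odds_gaps(y, y_hat, a):
    y, y_hat, a = map(np.asarray, (y, y_hat, a))
    gaps = {}
    for outcome in (0, 1):            # condition on the true outcome
        r0 = y_hat[(y == outcome) & (a == 0)].mean()
        r1 = y_hat[(y == outcome) & (a == 1)].mean()
        gaps[outcome] = abs(r1 - r0)
    return gaps  # gaps[1] is the TPR gap, gaps[0] the FPR gap

# the perfect predictor y_hat == y has both gaps equal to zero,
# even when y itself correlates with a
```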
So there's much more in the paper,
but I'm getting a signal
that this is the end of time.
So I'll just tell you what
you can find in the paper.
So first of all, there's a discussion of how
to computationally efficiently correct
an optimal, possibly
discriminatory predictor
to be nondiscriminatory
according to this definition.
There's some interpretation
in terms of ROC curves
and incentive structures.
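One very simplified version of such a correction (an editor's sketch of the idea only; the paper's actual construction optimizes over randomized thresholds) is to pick a per-group threshold on a score so that every group ends up with roughly the same true positive rate:

```python
import numpy as np

def per_group_thresholds(scores, y, a, target_tpr=0.8):
    """Pick a threshold per group so each group's TPR is ~target_tpr.

    scores, y, a are numpy arrays; all names here are illustrative.
    """
    thresholds = {}
    for g in np.unique(a):
        pos = np.sort(scores[(a == g) & (y == 1)])  # scores of true positives
        k = int((1 - target_tpr) * len(pos))
        thresholds[g] = pos[k]  # accepting scores >= this passes ~target_tpr
    return thresholds
```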
And also, I wanna focus
on this last bullet here.
We talk at quite some length about
the inherent limitations
of oblivious tests,
tests that are only based
on (mumbling) and demographic parity,
which, like many others,
are only based on
viewing the predictions
of a black box.
And we show some
non-identifiability results
saying that you cannot distinguish
these two things, and argue why
our definition is the more accurate
definition, both,
(speaking faintly)
Thank you.
(audience applauding)
(speaking faintly)
- Great, good morning.
Okay, so I'm going to pick
up where that talk left off
and assume that you'd interpreted
all of that correctly.
(audience laughing)
So five-minute talks are difficult.
Okay, so the work I'm talking about today
is motivated by all these settings
that we've talked about so far.
The idea of using machine learning
for hiring decisions,
for lending decisions,
policing, sentencing, parole,
any setting in which we're thinking
about using machine learning algorithms
to pick individuals.
And individuals may have
some inherent qualities
for the task at hand, for example,
the expected revenue of giving
a loan to an individual and,
moreover, that that
quality of an individual
entitles high-quality
individuals to access
some resource that you are allocating.
For example, people who are going
to make a bank a lot of money should
have more access to the loans
that are being allocated.
And in all of these settings,
we're thinking of there being
some features which are
available to the bank
or the algorithm which the bank is making
a decision as a function of,
and the learning algorithm
is actually trying
to learn a relationship
between the observable features
and the qualities at hand.
And I wanna assume, just
as mentioned previously,
that these may be different
for different groups
because not doing so might actually allow
for other sorts of bias
to creep into the models
that we're learning.
So one source of bias in machine learning
that was briefly mentioned
in the Q&A before
is the idea that there may be some kind
of data feedback loop, right?
If I think certain people are low-quality,
I may never give them loans and never see
whether or not they would
have paid those loans back.
So we're going to accentuate this and think
about a particular model in
which this might be the case
and sort of discover what kinds
of fairness definitions
are reasonable here.
So, in particular, we
study a particular notion
of fairness which enforces that
machine learning algorithms
have to treat high-quality individuals
at least as well as
low-quality individuals,
and we study the cost of such a constraint
in terms of the learning rate
that's defined in a
particular technical way.
And I wanna focus on the fact
that this is actually saying something
about the process of learning a fair model,
being fair throughout that process,
rather than just taking a black box model
and turning it into
something that we think of
as more fair.
So the setting I wanna think about
is there being K different groups,
each of which has some function
which maps from features to qualities,
and these may be unknown and different
for different groups, right?
And on each day, I wanna think about there
being an individual from
each group represented
by some feature vector and
that the bank or, you know,
the algorithm's going to
pick one of those people.
One of those people's going to get a loan
on each day and, as a result,
the bank actually sees whether or not
that person paid it back.
I get some information
about the person I chose
and nothing about the remaining people.
And the goal in such a setting is usually
to maximize the expected average quality
of the people you're choosing.
So the definition that we
study is the following.
We say an algorithm is fair if,
given any failure probability,
with all the remaining probability,
for any sequence of features
that we've seen so far,
for all rounds, and for
all pairs of groups,
if the expected quality
of the person from Group I
at Time T is at least as
large as the expected quality
of the person from Group J at Time T,
I want the algorithm to treat I at least
as well as J.
And what I mean by that is
just some probabilistic thing.
The algorithm's going
to have some probability
of picking each.
I want the probability
to be at least as high
for I as for J.
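In spirit (this is an editor's sketch of the constraint's flavor, not the paper's exact algorithm, which chains overlapping confidence intervals), the selection rule has to randomize uniformly among candidates it cannot yet statistically distinguish:

```python
import numpy as np

def fair_pick(intervals, rng=np.random.default_rng()):
    """intervals: one (lower, upper) confidence bound on quality per group."""
    best_lower = max(lo for lo, _ in intervals)
    # anyone whose upper bound reaches the best lower bound is plausibly
    # the best, so all such candidates must be treated alike
    plausible = [i for i, (_, hi) in enumerate(intervals) if hi >= best_lower]
    return plausible[rng.integers(len(plausible))]

# e.g. fair_pick([(0.2, 0.9), (0.4, 0.6), (0.0, 0.1)]) randomizes
# uniformly between the first two groups and never picks the third,
# which is clearly worse
```

This is also why learning slows down: you can only exploit what you know once the intervals separate, which is the point made at the end of the talk.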
So the first technical thing that I'll say
about this paper is that
this costs something, right?
Even in this setting
where I have no features
that I'm predicting as a function of,
the best learning rate I can get subject
to this constraint is worse.
It's a slower learning rate
than in the case without
such a constraint.
So that's what this says here.
Don't worry about parsing
the precise technical definition.
This, obviously, remains once we look
at a setting where we
actually do have features,
but precisely how much more we lose
in terms of our learning rate depends on
the particularities of the problem.
So, for example, one
thing we do say is that,
when this mapping between
features and qualities is linear,
we can get non-trivial learning rates.
But if the mapping is
much more complicated
such as the set of conjunctions, right,
we sort of are forced to
experience exponential diminishment
in terms of our learning rates.
So there's a great amount of related work,
some of which is done by the people
who are currently talking on this panel.
But I wanna emphasize
that this work primarily
focuses on sort of the process of learning
a fair model and being
fair in that process
rather than just taking a model
and modifying it to be fair.
So, in conclusion, our
work (speaking faintly)
particular new notion of fairness
which sort of embeds the
idea that high-quality people
deserve to be treated at least as well
as lower-quality individuals.
And one technical thing that I didn't
get any chance to talk
about is that an implication
of this is sort of that
you have to be confident
about relative qualities of individuals
before you can exploit
information about people.
And there's going to be
some costs to satisfying
such a constraint and it
will depend on precisely
the parameters that your
problem (speaking faintly).
Thanks.
(audience applauding)
(speaking faintly off mic)
- Alright, thanks.
So I'm gonna discuss the paper,
Combating Police Discrimination
in the Age of Big Data,
which is joint work with Sharad Goel,
Maya Perelman, and David Sklansky.
So this paper examines big data
and police discrimination in the context
of New York City's stop,
question, and frisk policy
and discusses the legal implications
of a statistical technique that we call
Stop-level Hit Rate
Analysis, or SHR analysis.
So stop and frisk is a policy
of investigative stops,
which are also called Terry stops
after the 1968 Supreme Court decision,
Terry v. Ohio, where
officers briefly detain
an individual given reasonable suspicion
that crime is afoot and conduct a pat-down
if they suspect the individual
is armed and dangerous.
Stop-level Hit Rate analysis
uses a statistical model
to calculate the ex-ante likelihood,
based on the information
available to the officer
at the time the stop is conducted,
that a Terry stop will be successful.
That is, that it'll result in finding what
the officer suspects they'll find.
In other words, Stop-level Hit Rates
are numerical estimates of the likelihood
that the suspicion motivating a stop
will turn out to be correct.
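A schematic of what fitting such a model might look like (an editor's sketch; the column names and model choice are illustrative, not the authors'):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def fit_shr(stops: pd.DataFrame, feature_cols: list) -> pd.Series:
    """Ex-ante hit probability per stop, from pre-stop circumstances.

    `stops` is assumed to have one row per historical stop, the
    circumstance columns in `feature_cols` (things recorded before
    the stop), and a 0/1 `hit` column recording whether the
    suspected item was found.
    """
    X = pd.get_dummies(stops[feature_cols])
    model = LogisticRegression(max_iter=1000).fit(X, stops["hit"])
    return pd.Series(model.predict_proba(X)[:, 1], index=stops.index)
```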
So in our paper, we
outlined three applications
of SHR analysis.
So first, police departments
can use SHR analysis
to improve the efficiency and fairness
of their stop and frisk practices.
Second, courts can use SHR analysis
to assess whether a police department
is engaged in illegal discrimination
in violation of the 14th Amendment.
And third, courts can
use SHR analysis also
to assess whether or
not stops were supported
by reasonable and articulable
suspicion,
in violation of the 4th Amendment.
So speaking to efficiency and fairness,
the plot here displays, by race,
the model-estimated distribution
of Stop-level Hit Rates
for stops where the suspected
crime is criminal possession
of a weapon.
So the dashed line here
indicates the 1% probability mark.
So you can notice that many stops
have a low likelihood of weapon recovery,
and this differs by race.
So whereas 20% of white stops have less
than a 1% SHR, 50% of black stops
have a similarly low likelihood.
So this suggests that if you eliminated
low-likelihood stops,
if you eliminated stops
where the chance of recovering
a weapon was less than 1%,
you could simultaneously
increase the efficiency
of your stop practice while achieving
greater racial balance.
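Continuing the sketch above, the efficiency-and-fairness exercise amounts to dropping the stops below the cutoff and re-examining what remains by race (the 1% threshold is from the talk; column names remain illustrative):

```python
# apply the illustrative model, drop stops under the 1% cutoff,
# and compare what survives by race
stops["shr"] = fit_shr(stops, feature_cols)
kept = stops[stops["shr"] >= 0.01]
print(kept.groupby("race")["hit"].mean())   # hit rate among kept stops
print(kept.groupby("race").size()
      / stops.groupby("race").size())       # share of stops surviving
```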
So, similarly, SHR
analysis has three features
which should appeal to courts that have
been wary of relying
upon less-powerful kinds
of statistical evidence of discrimination.
So first, an individual
stop-level SHR analysis
can provide evidence of discrimination.
Second, SHR analysis can counter
a common neutral explanation
for racial disparities in stop rates,
namely, aggressive policing
in high-crime areas
as the cause of observed
racial disparities.
And, moreover, because SHR analysis
can be used by police departments
to increase the efficiency of their stops
while simultaneously decreasing
the racially disproportionate impact,
failure to make use of the lessons
in SHR analysis may be evidence
of discriminatory intent.
So our paper also makes the argument
that the assessment of
the constitutionality
of police stop policies needs to consider
such policies as programs and
not just isolated occurrences
of individual stops.
In particular, the
constitutionality of a stop policy
depends on the reasonableness
of the program,
which depends, in part, on the hit rate,
and that's where SHR analysis comes in.
The costs and benefits of
a stop and frisk program
need to be considered in determining
a numerical threshold
for reasonable suspicion,
and we argue that it should weigh against
a finding of reasonableness if the program
disproportionately
burdens racial minorities
or any other
traditionally-disadvantaged groups.
So, in conclusion, our paper
makes three broad points.
So first, the potential uses of big data
to make policing more
fair and more effective
are just beginning to be discovered.
New tools of police accountability warrant
reexamination of traditional
rules and assumptions
pertaining to legal
oversight of the police.
And Terry Stops should be analyzed
not as isolated interactions,
but as programs.
That's how they're implemented,
that's how they tend to be experienced,
and that's how to make the best use
of the tools for oversight
provided by big data.
Thanks.
(audience applauding)
(speaking faintly)
- Sorry, guys, we...
- So a quick little change
of the sequence here.
One of the people who is on the spotlight
couldn't join us in person,
but she actually submitted a video.
So I thought, maybe,
I'll put up the video.
(speaking faintly)
Hopefully, this will work.
- Hello, my name is Jenna Burrell and I'm
an associate professor at
the School of Information
at UC Berkeley.
I am here to tell you about a paper
I wrote this past year titled,
How the Machine Thinks:
Understanding Opacity in
Machine Learning Algorithms,
which was published in the journal,
Big Data & Society, earlier this year.
I wanted to just give some background
about the paper and try
to entice you to read it.
So I decided to write this
paper for a couple of reasons.
I was starting to see a
number of social scientists
participating in this really
interesting discussion
about the politics of algorithms.
My training's in sociology
but I have a background
in computer science, so I thought
it might be an interesting opportunity
just to kind of merge
those areas of expertise.
And, in particular, while the discussion
about algorithms is fairly broad,
I was really interested
in a particular class
of algorithms called
machine learning algorithms,
which were of kind of
increasing importance
in application in many domains.
One thing I'd noticed in this
sort of emerging discussion
about the politics of algorithms
was a sort of ambiguity
about what was at the source
of their opacity.
And I think, often, the
opacity of those algorithms
was really blamed on
corporate and state secrecy.
The assumption was that,
if you could read the code,
you could figure out how
things were being classified
the way they were and
determine whether or not
there was some problem of
bias or a lack of fairness
in those categorization practices.
And from what I knew about
machine learning algorithms,
I suspected that was not really entirely
a fair assessment of what
the source of opacity
in algorithms was really about.
So I wrote this paper
where I argued that opacity
could really be divided into three types.
The first type of opacity is, indeed,
problems of algorithms being proprietary
or corporations wanting
to protect and keep secret
how their algorithms are written
because that's what sort
of gives them an edge.
And often, that can be a cover for things
like bias in algorithms.
I think, if you wanna read about that,
certainly, Frank Pasquale's
book, The Black Box Society,
talks about that sort of thing at length.
Another type of opacity that seemed
to be at the source of what
many people were talking about
had to do with the fact that
reading and writing code
is a specialist skill and not
everybody has that capacity.
Some ways of addressing
that might be, you know,
getting more kids and young people
to learn how to code, or
including learning to code
in higher education for
students who are not
necessarily majoring in or
studying computer science.
But really, the third form of opacity
which I wanted to argue
was incredibly important
and significant to consider had to do
with a mismatch between
mathematical optimization
in high dimensionality,
which is what you really see
with things like machine
learning algorithms,
and the demands of human-scale reasoning
and styles of semantic interpretation.
So let me say a little bit more about
what that means.
And this gets to the
reason I titled this paper,
How the Machine Thinks,
because I wanted to draw
sort of a distinction
between how a machine might
mathematically optimize data
and how a human would
try and make sense of it,
and understand, and come to some sort
of semantic understanding
of what those classifications really mean.
So this centers around the
question of interpretation.
And, certainly, interpretation
is not just a concern
for people like me who consider ourselves
to be interpretivists.
Interpretability is actually something
that many computer scientists
are concerned about.
So if you have a machine
learning algorithm
which, basically,
takes a massive pile of data
and finds some patterns and
regularities in that data,
those patterns and
regularities are, potentially,
useful whether or not
you can understand them.
But, for many reasons, it
may be really important
for a human to understand
why an algorithm classified
something one way or the other.
And, to explore that question,
I decided to look at spam filtering.
And the reason I was
interested in spam filtering
was because I'd spent a lot of time
in West Africa, specifically, in Ghana.
A nearby country to Ghana is Nigeria,
and many people know
Nigeria for the email scams
that originate from that country.
And so, I really wondered after some time
in Ghana whether, if
you're sending an email
from Ghana or Nigeria,
if you have a fair shot
at keeping your emails
out of spam filters.
Are you more likely, if
you're sending an email
from Ghana or Nigeria,
or making a reference
in your email to Ghana or Nigeria,
are you more likely to
see your emails end up
in the spam folder?
For legitimate
emails, that's really, I think,
the important question.
So are Ghanaians and
Nigerians disadvantaged
in their ability to use
email to send messages
to people by virtue of
the fact that they're
from those countries?
And that to me, of course, would seem like
a big problem of fairness.
I won't tell you what
I found when I actually
ran some code and sort of
tested out this question,
but all of that is contained in my paper.
And I spent some time trying to describe
and explain how machine
learning algorithms
would work for a non-expert audience,
and to actually think about the design
of those algorithms, and
to think about the code.
And I think what you
will learn from reading
my paper is that there are limits
to the ways we can audit algorithms simply
by just reading the code,
and that there are some very
kind of hard inescapable
forms of opacity, specifically,
with these machine learning algorithms
that it's important to be aware of
and to consider.
I hope you'll read my
paper and I hope you enjoy
the Fairness, Accountability,
and Transparency workshop.
Thanks.
(audience applauding)
(speaking faintly)
- Alright, hi, I'm Kristian Lum
from the Human Rights Data Analysis Group
and I'm gonna be talking about some work
I've done with William Isaac
who is sitting right here
in that little striped blue,
striped shirt, purple shirt,
here in the front who is equally qualified
to answer any questions as I am.
So if anyone has any questions afterwards,
you can see me or you
can also see William.
So what I'm gonna be talking about today
is machine learning and, in particular,
machine learning as
applied to police records,
which is called predictive policing.
And, because I only have a
few minutes today to talk,
I'm going to assert a few
things that, normally,
I would spend a little
bit more time explaining
and trying to convince you are true.
But for machine learning
people in the audience,
maybe, you can just nod your head
and agree with me (speaking faintly).
For the non-machine learning
people in the audience,
maybe, you can just go
ahead and believe me
for a second.
(audience laughing)
Alright, so the first thing
I'm going to assert is,
if you take statistically biased data
and feed it into a machine
learning algorithm,
that algorithm will reproduce the biases
in the data that it was trained with.
I see a lot of nodding heads,
that this is something
that you're realizing.
(audience laughing)
You don't always have to trust me.
You can look around here, also.
Alright, the second fact is
that police records are biased.
And there's a little bit
of an unfortunate collision
of terminology here in
the sense that I'm talking
about statistically biased.
You might also think of them as biased
in a more colloquial sense,
but here, I'm just talking
about statistical bias,
meaning some crimes that
occur in the area of study,
so in the city, are more likely to appear
in the police records than others.
And this seems relatively straightforward.
One example of the way
in which this could occur
is areas where police patrol heavily,
crimes that occur in those
areas are more likely
to be observed simply because that's where
the police are looking.
So again, if you don't believe me,
you probably should, but
that (speaking faintly).
Alright, so the point of the paper
that William and I
wrote was that we wanted
to bring an example to show to people
to illustrate the point that,
if you apply machine learning algorithms
to police records, you can end up
with these feedback loops.
And this has been something that people
have been discussing for a while,
but we hadn't seen any really
concrete evidence of this,
and we hadn't seen any
examples on real data
with real algorithms that demonstrated
how this could happen.
So that's the goal of what
I'm going to show you now.
So on the upper left-hand corner,
we took some data from the
Oakland Police Department.
This is on openoakland.org.
And this shows the
number of crimes per bin.
So we binned up the space of Oakland.
Areas that are bright red or pink
show where they found a lot of crimes.
These are drug crimes, by the way,
so drug crime reports.
It's not just arrests.
It's crime reports of anything they found,
or that was reported to them,
or that they came to know about.
And you can see, there's
basically two locations
where the police are
finding a significant number
of drug crimes.
So if we zoom in on this up here,
this is West Oakland,
and this is the brightest spot right here.
And the other area is
right along down here
and this is kind of-ish along
International Boulevard.
That's the best description
of this location
that I could find.
Alright, you can compare
this to this map here,
and I've sort of done
the best that I could
to match up the line and the circle here
onto a demographic dot map that shows
the racial composition
of the city of Oakland.
So the way, I'm sorry,
these little blue dots
correspond to white people.
The green dots over here
and somewhat over here
correspond to black residents of Oakland.
Asian residents are represented by red dots.
And Hispanic people are
represented by orange dots.
And there should be a citation down there.
If there's not, I will be
tweeting this out again.
(speaking faintly)
And so, you can see that they're really
only finding crimes in, essentially,
the African-American neighborhoods,
or they're predominantly finding crimes
in the African-American neighborhoods
and the Hispanic neighborhoods.
And this is in contrast to, and again,
this is something I'm
just going to assert.
But people are doing drugs
everywhere in Oakland.
They're not just doing
drugs in these two places.
(audience laughing)
Again, you're just gonna
have to believe me,
but we explored that a little bit, too.
That's true.
Okay, and so, what we did was we applied
a predictive policing
algorithm to the data
from 2010 that I'm showing you here
to make a prediction for each day in 2011
to see where the police would've been sent
if you'd applied the
predictive policing algorithm
and sent the police to where
the algorithm tells you
to send them.
Those are the red squares on this map.
The black dots are
where they were actually
finding crimes (speaking faintly).
And so, if the algorithm
were actually making things
more representative, or less biased,
we would see red squares everywhere,
because, as I asserted, there are drug
crimes everywhere in Oakland.
Instead, what we find
is that the algorithm
just sends the police officers right back
to the locations that were already
really overrepresented in the data,
the areas that I would describe as,
possibly, over-policed in the data.
And so, in this sense,
the algorithm has actually reinforced
the bias that's in the data.
And if, when they go there,
they find a little bit
more crime than they
would've found anyway,
that bias is going to be amplified.
So even if the data
didn't begin as biased,
it will become biased
simply by applying this
sort of technique.
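A toy simulation of that mechanism (an editor's sketch only, not the authors' experiment, which applied a real predictive policing algorithm to the Oakland data): crime is equally likely everywhere, but patrols follow the records, and observations pile up wherever the records started out inflated.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = np.full(10, 0.3)   # drug crime equally likely in all 10 regions
counts = np.ones(10)
counts[0] = 5.0                # region 0 starts over-represented in records

for day in range(5000):
    # patrol where the records say crime is: the "prediction"
    region = rng.choice(10, p=counts / counts.sum())
    # crime is only observed where the police are looking
    counts[region] += rng.random() < true_rate[region]

print(counts / counts.sum())   # region 0 keeps an outsized share
```

Even though the underlying rates are identical, the initial hot spot never washes out, because new observations are generated in proportion to old ones.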
And so, I see my time is about up
so I'll just let this run for a second,
bask in the movie, I always like movies,
and thank you so much for your time.
It's been a pleasure to be up here.
It's such a great panel.
(audience applauding)
