- Welcome to Ethics Matter.
I'm Stephanie Sy and our
guest today is danah boyd,
a principal researcher at
Microsoft Research and the
founder and president
of Data & Society,
a research institute
focused on understanding
the role of data-driven
technology in our society,
technologies used by
companies like Facebook
and Google, and soon
perhaps your employer,
your government, perhaps
your police department.
She has degrees in
computer science as well
as a Ph.D.
in information, and has
written two books on the
intersection of
society, culture,
and internet technologies.
Danah is here to help us
understand the limitations
and the risks of big
data, algorithms,
and machine learning,
and hopefully, danah,
we will be able to define
some of these terms as we
get into the conversation.
Welcome to the Carnegie Council.
- Thanks for having me.
- Are we in the midst of
a technological upheaval
that is drastically
changing society?
Is that why you started
your research center?
- For me, it's not so much
that we are in the midst
of something that is changing,
it's more that there's
a moment where we're
suddenly paying attention
to a set of issues that
have actually been going
on for a very long time.
When you step back, you
actually can see patterns
over a longer period, but
we're in a moment where
everybody is paying attention.
It's a phenomenon, people
want to understand it.
And there's nothing like
that moment of phenomenon
for people to get
obsessed with hype,
imagine all of the
great things it will do,
and also simultaneously
be terrified.
So a lot of what I'm interested
in with Data & Society
is to make certain
we can ground it,
and sort of step back and
say: Okay, what is real?
What are the ramifications
in light of a whole set of
other social dynamics?
and really try to make
certain that we're more
informed in our approach to
a lot of these technologies.
- A phrase that I hear
thrown around a lot in the
last few years is big data,
and I'm sure that is
something your research
center looks into.
How big has big data gotten,
and how did we get here?
- I joke that big data
often has nothing to do
with bigness and rarely
has anything to do with
data, but it's in many
ways the mythology that if
we just collect more
information about more
people that we can
do magical things,
that we can solve
problems that have been
historically intractable.
And that's actually
where we get ourselves
into trouble.
There are a lot
of techniques and
technologies that are
actually doing data
analytics across
large swaths of data,
and some of the most
sophisticated have nothing
to do with people:
astronomy data,
for example, pretty amazing;
what we're seeing
in terms of genetic
analysis, unbelievable.
But a lot of what we talk
about when we talk about
big data is the idea that
companies like Facebook
have tremendous
information about you and
your practices and
what you're doing,
and so they're trying
to understand patterns.
So a lot of what it
becomes synonymous with is
the idea of prediction,
the idea that we could
just take this data and
predict something about you.
The question is: Should we
be doing those predictions?
Who is manipulating that data,
and what are the
ramifications there?
The other thing about big
data is that it has become
collectively synonymous
with artificial
intelligence, which
is our other term.
- We are going to get into
artificial intelligence,
but can you give us a
broad definition of what
you mean by big data?
You brought up some of the
ways data is collected,
but when we talk about big data,
what are we actually
referring to?
- From my perspective, big
data is actually about a
phenomenon; it's not
simply the collection of
large swaths of data.
It's about a set of
business practices,
a set of technologies, and
a set of beliefs of what
we can do with a huge
amount of information
about people and
their practices.
- Part of that is
algorithms and how big
data is used by algorithms
to make decisions.
- That's been a lot of the
interesting transition,
right, which is, one, the
phenomenon has been what
do we do with all
the analytics or the
information we have?
How do we analyze it?
Often we get into
conversations then about
machine learning.
Machine learning is usually
the next translation.
So at that moment we can
take all of this and not
just do run-of-the-mill
business analytics or
statistical processing, but say,
How do we actually analyze
this data for prediction?
A lot of machine learning
algorithms are to cluster
or to predict, to make
specific decision-making
processes available.
That actually is one of
the reasons why you have
to connect it to
artificial intelligence
because big data became almost
synonymous with Big Brother,
with big surveillance,
and so that became a term
that has been deprecated
by a lot of different
communities and been
replaced with
artificial intelligence,
where we actually
meant many of the same
things, large amounts
of data analytics,
the ability to do
sophisticated machine
learning, and more and more
advances in machine learning.
- The way I think most of
us typically confront and
use these technologies is
every time we go on Google
or Facebook.
What are other ways and
other examples of how big
data and machine learning
and algorithms are
impacting our lives today?
- Within a core
technology universe,
think of any time you are
given a recommendation:
What movies should you watch?
What things should
you purchase next?
What news articles
should you read next?
Those are all part
of this ecosystem.
But, of course, it goes
beyond the world of
information technologies.
It's also starting to
shape things like medicine.
How do we start to
understand your cancer?
We also see this in
environments like criminal
justice, which is where
it can actually be a much
more problematic environment.
- Let's stop there,
with criminal justice.
That is an area in which
algorithms are being
applied that some say
is ethically concerning.
Let's parse that out
a little bit more.
What types of risks are
there to using machine
learning in criminal justice?
- The biggest challenge
with criminal justice is
that the public does not
actually agree on the role
of criminal justice.
Is criminal justice
intended to punish somebody?
Is it intended to prevent
them from committing
future crimes?
Is it intended to rehabilitate?
What is the actual role of that?
That's part one, because
actually it starts to
shape the kinds of
data that we collect.
We collect different data
depending on what we view
the role of criminal
justice to be.
Next, we have a whole set
of biases in how we've
actually deployed criminal
justice practices,
how we've engaged with policing.
Our policing structures
have long been biased
along axes like race, gender,
income level, communities.
- In theory, wouldn't
making machines more part
of the process make
it more neutral,
make it less biased?
- It depends on what
data it's using.
The challenge here is what
most of those systems are
designed to do is to say,
Let me learn based on
what I've known in the
past to make decisions
about what should
happen in the future.
- Give me an example of that.
- For example, when
you're trying to do a
predictive-policing algorithm,
you're trying to say,
Where have there been
criminal activities in the past?
Let me send, then, law
enforcement to those sites
where there is a higher
likelihood of criminal behavior.
Where has there been
activity in the past?
It's the places where
police have chosen to
spend time; it's the people
that they've chosen to arrest.
- And that might be based
on their personal biases.
- Or a whole set
of other values.
For example, if you
look at drug arrests,
we actually know from the
drug data in the United States
that whites are far
more likely to consume
and to sell drugs.
Yet, when we look at the
arrest records of the
United States, we
overwhelmingly arrest
blacks for both consumption
and sales of drugs.
As a result, when you're
trying to predict who is
most likely to engage in
criminal activity around
drugs, your algorithms are
going to say, Oh, well,
actually it seems to be
mostly something that
occurs with black and
African American individuals.
That's not true.
That's based on flawed data.
That's the problem
in criminal justice.
Could we design a system
to be more responsible?
Certainly.
But it all depends on the data.
The problem with
machine-learning
algorithms or big data or
artificial intelligence is
that when the data is
flawed we are going to not
only pipe that flawed
bias through the system,
but we're going to amplify it.
The result is that we
increase the likelihood
that we are reproducing more
and more biases in our data.
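The feedback loop described here can be sketched in a few lines of code. This is a toy illustration with made-up numbers, not any real policing system: two groups offend at the same true rate, but biased patrol deployment produces skewed arrest records, and a hypothetical "hotspot" rule that reallocates patrols toward past arrests amplifies the skew.

```python
# Hypothetical numbers for illustration only: both groups offend at the
# same true rate, but historical policing observes one group far more.
TRUE_RATE = {"group_a": 0.10, "group_b": 0.10}

def arrests(patrol_share, population=10_000):
    """Recorded arrests = offenses committed while police are watching."""
    return {g: population * TRUE_RATE[g] * patrol_share[g]
            for g in patrol_share}

# Step 1: a naive model "learns" per-group offense rates from arrests.
patrol = {"group_a": 0.3, "group_b": 0.7}        # biased deployment
learned = {g: a / 10_000 for g, a in arrests(patrol).items()}
# learned is {'group_a': 0.03, 'group_b': 0.07}: skewed, though the
# true rates are identical. The flawed data is piped through.

# Step 2: a hotspot rule that concentrates patrols where arrests were
# highest (superlinear allocation) amplifies the skew each round.
for _ in range(5):
    a = arrests(patrol)
    weight = {g: a[g] ** 2 for g in a}           # double down on hotspots
    total = sum(weight.values())
    patrol = {g: weight[g] / total for g in weight}

# After a few rounds nearly all patrols target group_b, even though
# the underlying behavior never differed between the groups.
```

The superlinear allocation rule is an assumption made to show amplification; a strictly proportional rule would merely lock in the original bias rather than worsen it, which is still the core problem the interview describes.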
- How do companies like
Facebook and Google use
machine learning and
algorithms, for example,
to in their case optimize
their bottom line?
How do they account for
values such as democracy
and privacy and free speech?
- Take something
like a search engine.
That's probably the
easiest example to make
sense of.
When you put in a
search term like cats,
what you might want to
get out of it is the
Broadway show.
What I might want to get
out of it is pictures of
soft, fuzzy things.
Part of it is that the system
is trying to make a
good prediction of what,
based on what it knows about you,
you actually meant by
that very vague term.
The result is that the
data is used to start
personalizing the
search queries.
The result is that
you search for cats,
you get the Broadway
show because we all know
you love Broadway; I, who
have clearly watched way
too many cat videos, I'm
getting lots of fuzzy
animals, and that feels
all nice and fine.
But what happens when
I decide to figure out
about, say, a
political candidate?
I want to search for the
current mayoral candidates
in my home city.
What is the information
that I should receive?
I have a clear history
of watching a particular
segment of news.
Let's say I regularly
watch Fox News.
Should I receive the
Fox News link to the
information about that
candidate as the first thing?
Or should I receive,
for example,
a New York Times response to it?
The challenge with those
is those are two different
social views on a
political candidate.
What Google is trying to
do for its bottom line is
to try to give you the
information it believes
you want the most.
That's because it makes
certain that you come back
and are a return customer.
It fulfills your goals,
you are more likely to
spend time in its services
and therefore click on its
advertisements, et
cetera, et cetera.
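The personalization being described can be sketched as a tiny ranking function. This is a hypothetical toy, not Google's actual algorithm: two users issue the same ambiguous query, and each is ranked results weighted by their own click history.

```python
# Toy sketch of click-history personalization for an ambiguous query.
# The result titles and topic labels are invented for illustration.
from collections import Counter

RESULTS = {  # candidate results for the ambiguous query "cats"
    "CATS -- the Broadway musical": "broadway",
    "Compilation of fuzzy cat videos": "cat_videos",
}

def rank(results, click_history):
    """Order results by how often the user clicked each topic before."""
    clicks = Counter(click_history)
    return sorted(results,
                  key=lambda title: clicks[results[title]],
                  reverse=True)

broadway_fan = ["broadway"] * 8 + ["cat_videos"] * 1
video_fan = ["cat_videos"] * 9

# Same query, different first result for each user.
print(rank(RESULTS, broadway_fan)[0])
print(rank(RESULTS, video_fan)[0])
```

The same mechanism that makes a "cats" query feel helpful is what returns the Fox News link to one user and the New York Times link to another for a query about a political candidate.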
- This goes into that whole
idea of confirmation bias,
that what people want in
general is for their
views to be confirmed.
- And what they want is
to feel like they have
control over the information
that they're receiving.
So the result is that the
combination of the
perception that they
have control and the
perception that they're
getting what they want
is what makes them commit to
that particular technology.
This is the funny thing.
People actually want to be
given the information that
confirms their worldview.
They want the things that
actually make them feel
comfortable.
It's hard work to deal
with things that are
contradictory; it's hard work
to tease out information.
People generally want things
that are much more simple.
What's challenging for me
is that as a society we're
so obsessed with us
individually, our choices,
our opportunities, what
will let us feel good,
that we're not able to
think holistically about
what is healthy for society.
That is a challenge
at every level.
We live in an
individualistic society,
and even though we can use
technology to connect with
people, we use it to
magnify our relationships
with people that we like,
that we're interested in,
who share our values.
- There's the
magnification part,
and I also want to talk
about the manipulation part.
In this past election,
American intelligence
believes that there was
intervention by a foreign
power, specifically by Russia.
There is a sense that
there was a manipulation
of social media and
other search platforms.
The stakes are high in
all the ways you describe
them, but even to
the point that on a
geopolitical scale that's
how high the stakes are.
Was that a wake-up call?
- I think it's become a
wake-up call for many.
I think it's got a long history.
Let's just take the core
data architecture that is
built into our
Constitution: the census.
The census is what allows
us every 10 years to count
the population in the
United States and then to
make decisions about how we
apportion representation
and how we distribute
a lot of resources.
Since 1790 when we started
actually doing this,
people have manipulated
that data source.
They've manipulated it
for all sorts of political
gains, they've manipulated
the outputs of it for all
sorts of gerrymandering,
they've tried to mess with it.
Voting records?
No different.
We have a long history of
trying to mess with voter
registration.
That manipulation is not
just by external actors,
there's also manipulation
within our own state.
Nowhere is that clearer
than the history of Jim Crow
and what we've done
around a huge amount of
racism in the United States.
Here we are walking into a
2016 election with a long
history of every data
type being messed with for
economic gain, for
political ideology,
for fun and games, for
foreign adversarial attacks.
Of course people tried to
mess with this election,
they always have.
The question is, what was
different about this one,
and how did it play out?
- Okay.
What was different?
- For me, what I saw again
was that we started to see
technologies be part
of the equation,
and they started being
part of the equation on
multiple fronts.
On one hand, there was the
moment of using technology
to manipulate the media,
and that perhaps is the
one that is often
most challenging.
- How was the media manipulated?
- Any journalist knows
that you get phone calls
trying to get you to sell
their product effectively
or to tell their story
or their opinion from the
White House or whatever
variation of it.
Journalists have long
dealt with very powerful
actors trying to
manipulate them directly.
What they are less
familiar with is a world
of things that look
organic designed to
manipulate them.
Let's talk about some
concrete examples.
When you have
decentralized populations
who are running campaigns
to get content onto
Twitter to make it look natural,
to produce sock
puppets, basically fake
accounts on Twitter, to
then reach out to you as a
journalist and be
like, "Hey, you know,
"what's going on with
this Pizzagate thing?"
And all of a sudden, you
as a journalist are like,
What is happening?
Somebody in the public
has given me a tip.
I need to pay attention.
Except it wasn't just
somebody in the public;
it's somebody who is
intending to get a message
to you very much designed
to send you down a
particular track.
That's when we started to see
massive coordinated efforts.
These efforts had been
happening for social media
marketing for the
better part of a decade,
but we started to see it
really turn political.
The interesting thing is
the political coordination
of it, at least that I got
to witness, was, first,
not foreign actors, it was
people who were messing
with systems.
I watched this pattern
with young people for the
better part of 10 years.
- So it was trolls and people
who were just having fun?
- It started out that way.
Years ago there was
nothing funnier than to
get Oprah Winfrey to say
inappropriate things on TV.
It was great.
I watched teenagers build
these skills in order to
get Oprah to say
something ludicrous.
And they learned how to do this.
That's a skill that is
interesting when you start
to think of how it
can be mobilized.
Then we had a situation
about four or five years
ago where we have a lot
of very misogynistic
practices happening
through technology.
New techniques,
things like doxing,
the idea of finding
somebody's full
information so that you can
actually cause them harm.
An example of causing them
harm would be something
like swatting, which is the
idea that I would call up
911 and say that there's
a bomb in your house.
The result is that the
police would send out a
SWAT team (hence "swatting") to
your house and cordon it off,
looking for the bomb.
But it was a hoax, it
was not actually real.
These were things that
were done to start
attacking a group of
women in a whole set of
phenomena known as Gamergate.
These were moments when
these same networks started
to take a more problematic turn.
They started actually
doing things that did a
lot more harm to people.
These are the cornerstones
of a lot of groups who
began then trying to mess with
journalists for the election.
In the beginning, it
was pure spectacle.
It was hysterical to watch
during the Republican
primaries this weird candidate,
who for all intents and
purposes was a reality TV
show star, be such a fun
game to mess with because
you get the journalist
to obsess over him.
- It feels scary to
hear you talk about this
because it feels like
we have surrendered our
control entirely to these
anonymous people that have
figured out how to utilize
these technologies to
manipulate societies,
governments, democracy,
voters, journalists, every
aspect of society that we
could talk about that is
dependent now on social
media and online technologies.
- But that's been true
of every technology
throughout history.
- Has it?
- Yes.
That was the story of film.
Look at the story of film
and propaganda and the
anxieties in the 1930s
that we had because we
thought it was a fascist media.
We've had these turns and
we've had these moments
where we had wake-up calls.
What we're in the middle
of right now is a serious
wake-up call.
And the question is what
we're going to build in
response to it.
Also, are we going to be
able to address some of
the root problems that
are actually made visible
during these moments, root
problems of serious racism?
That is not new in this country,
but for so many people
the Obama years meant, Oh,
we're past that.
It's like, no.
We're not even close
to being past that.
Or these moments where we
actually have to deal with
destabilized identities.
We have a large number
of people in this
country-especially young
people-who don't feel
secure in who they are
or where they're going.
They are so ripe
for radicalization,
and that is extremely scary.
We, again, have seen
this throughout history.
How do we get ahead
of that and say: Whoa.
It's not just about who
is engaged currently in
horrible, nefarious,
racist acts,
but also who has the
potential to be where we
have a moment we can
actually turn them?
I think that's where we
should be starting to be
responsible about our actions.
When we think about
the morality of these
technologies, it's not
just about thinking about
the technologies, but
the people as they're
interfacing with them.
- I agree that we can
point to different
technologies throughout time,
even dating back to
the printing press,
as being sort of periods of,
I think you've called it
moral panic in your writings.
But that brings me to
artificial intelligence
and the new dimension and
the new risks and worries
that we're hearing
about with AI.
First of all, give me your
sixth-grader definition of
AI, and then let's talk
about how that maybe
changes the game a little bit.
- I think that what AI
currently means is not the
technical definition.
It's actually about a set
of business and social
processes where we're
going to take large
quantities of information,
and we're going to use it
to train a decision-making
algorithm to then produce
results that we then go
and use in different ways.
- Okay.
And eventually that
machine will be trained to
in some ways think on its own,
make decisions based on
huge amounts of data,
machine learning.
AI is sort of that next level.
- It's not that they think on
their own in the way that we as
humans think about thinking.
It's about going beyond
procedural decision making;
basically, it's training an
algorithm to design better
algorithms for the
broader system.
But the values are still
the whole way through.
The idea that the machines
will suddenly wake up and
start thinking, that
is not the case.
It's more that they will
no longer just do exactly
what they're told, they'll
be designed to iterate
themselves.
- But doesn't that
definition surrender part
of a human being's ability
to control that machine?
- Part of why we have
always designed machines
is to scale up our capacities.
I can count pretty high,
but a machine is going to
be able to count a lot
higher and a lot faster.
I can certainly
divide perfectly fine,
but a machine is going to be
able to divide a lot faster.
Those are those
moments of scale.
What you want is for
technologies to be
designed in ways that
actually allow us to do
things for which we simply
don't have the capacity.
Think about something
in a medical realm,
detection of cancer.
We have to use tools
to detect cancer.
We have used a ton of
tools throughout the
history of medicine.
We have the ability
to use more and more
sophisticated tools
based on data,
artificial intelligence systems,
to be able to detect
cancer faster,
and over the next 10
years we're going to see
phenomenal
advancements with this.
Those are the moments
where I get very excited
because that is leveling
up a capacity that we
don't have.
It's also about pattern
matching in other contexts.
I'll give you one.
I'm on the board of
Crisis Text Line,
which is this amazing
service where we counsel
young people and adults,
but primarily young
people through text messaging
with trained counselors.
We use a lot of
technologies to augment
the capabilities of
those counselors.
A counselor may have
had a hundred sessions,
they have experienced
those sessions,
and they use their past
knowledge to decide then
how to interact with
whoever is coming in their
text message stream.
But what does it mean to
use technology for that
counselor to learn from
the best practices of
thousands of counselors
and for the technology to
sit in a relationship to
her and say: Guess what?
I've seen this pattern before.
Maybe you want to ask if this
is a disordered eating issue.
And that actually is
that augmentation.
- That's terrific, and
obviously there are a lot
of positive uses of AI.
Let's talk about in the
context of our previous
conversations, again, that
idea that every time there
is a new technology,
society must reckon with
how it expresses its values,
and whether you feel like
artificial intelligence
presents yet another
challenge to what we've
already been talking about
here in the deployment of
algorithms and machine learning.
- I think it presents a
challenge for the same
reasons that I'm
talking about it
in these positive dimensions.
Just because it can scale
positive doesn't mean it
will only scale positive.
It will also scale negative.
How do we grapple with that?
Again, I like history because
it's a grounding force.
Watching the history of
scientists around nuclear
energy is a really
good reminder of this.
They saw the potential
for a much more
environmentally friendly
ability to achieve energy
at a significant level.
Of course, we also know
where that technology got
deployed in much
more horrifying ways.
- Are you optimistic that
there can be a regime that
can grapple with these
issues and hold the
different players to
account in ways that
we saw with nuclear technology?
Are you optimistic about that?
- Yes and no, and I say
that because I think we've
done a decent enough
job on nuclear.
We're still contending
with it massively.
We've done a lousy
job on climate.
We have the data on that,
and we can't get our
political processes
together to actually do
anything about it.
So I can see both possibilities.
I think there are a lot of
really good interventions
or efforts being made.
I think there are a lot
of attempts to build out
tools to understand what's
going on with the system.
The question for me is,
it's not actually about
the technology or
about the systems;
it's about us as agented
actors in the society,
and what will it take for
us to mobilize to do the
hard political work?
It's not clear to me.
I can see us getting there,
but I would have thought
we would be a lot further
on climate today than we are.
That's the challenge.
- Danah boyd, thank you so much.
Fascinating insights.
- Thank you.
(upbeat electronic music)
- [Announcer] For more
on this program and other
Carnegie Ethics
Studio productions,
visit carnegiecouncil.org.
There you can find video,
highlights, transcripts,
audio recordings, and
other multimedia resources
on Global Ethics.
This program was
made possible by
the Carnegie Ethics Studio
and viewers like you.
