Thanks, David, for the kind words, and thanks to all of you. It's beautiful outside, so here is what I'm going to talk to you about today: first, very briefly, how we think about social media as political scientists -- social media, social media data, and politics. Then what I'm going to spend
the bulk of the time doing is giving you some quick examples of the work we've done in the SMaPP lab on the relationship between social media and politics over the last few years. In particular, I'm going to talk to you about hate speech on Twitter, about who shared fake news on Facebook, and a little bit about Russian troll behavior in the 2016 U.S. elections. When we think about social
media data, when we think about social
media as social scientists, we think
about it in two different ways. We can think about social media as a variable -- in that sense, the question is how social media impacts politics.
So in the SMaPP lab, this is one of the
big questions. We're interested in like,
what is the impact of social media on
politics? What's the impact of using
social media on your likelihood to
participate in politics? How does having
leaders who use social media change
politics? How do regimes respond to online opposition? But we also think about social media as data, and
for those of us who are social
scientists, the rise of social media data
presents unprecedented opportunities to
study social and human behavior. That's
what we do, we try to understand human
behavior. As political scientists, we want to understand politics; others want to study the economy or society. But the rise of social media has transformed the amount of data that we have at our disposal. All of the research that
we produce in the SMaPP Lab comes out of a setup like a lab, even though we're in a political science department. Our organization looks more like a natural-sciences lab: we have PIs, postdocs, research engineers, PhD students doing research, and undergraduates who do research with our team and play a big role in it -- I'll come back to that at the very end. If you were following the 2016 U.S. elections, you might have
the 2016 U.S. elections you might have
seen headlines like this. You might have
seen things about "Hate speech seeps into
the U.S. mainstream campaign" and, in particular, claims like a "massive rise in hate speech on Twitter" during the presidential election. So we set out in the SMaPP lab to figure out: did this actually occur? Could we actually document it, beyond lots of different anecdotal observations? So the research question was: to what
extent did online hate speech and white
nationalist rhetoric on Twitter
increase over the course of Donald
Trump's 2016 campaign and in the
aftermath of his election? Now, we've got definitions: we define hate speech in a particular way, and we define white nationalist language in a particular way. And then we thought, okay,
so if these claims were all right -- "hate on the rise after Trump's election," "Donald Trump and the escalation of hate" -- what would we look at? What would we be able to see if we went into the real world? Well, suppose we could measure the prevalence of hate speech online over time. If we think about the x-axis here as time, and this line as Election Day, all of these arguments about hate speech increasing over the course of the election campaign would predict
something like this was going on: Trump declares, he uses different language, people get used to it, it gets legitimized, it kind of feeds on itself -- so if it really increases over the course of the campaign, it could go up steadily like this. Or maybe what people were talking about looked more like this: the election of Donald Trump legitimates using this kind of language, and there's a surge in hate speech after the election. Either way, if those claims were describing what was happening, we'd expect to see something like one of these patterns. So what did we do to
test this? Well, we gathered two data sets. One we call our political Twitter data: tweets that mentioned either Hillary Clinton or Hillary Clinton slogans, and tweets that mentioned either Donald Trump or Donald Trump slogans. We looked at a two-year period, from basically the date Trump declared in 2015 until the summer of 2017, and this gave us about 750 million tweets. We also, however, looked at a random sample of American Twitter users -- maybe there was something peculiar about the political data, but what about people tweeting in general? We had been following a random sample of 500,000 American Twitter users, collecting everything they were tweeting, and that gave us another 400 million tweets. All right, so then we had 1.2 billion tweets to look at over a two-year period.
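As a rough illustration of how tweets could be routed into those keyword-based collections -- the term lists below are hypothetical placeholders, not the SMaPP lab's actual tracking lists:

```python
# Hypothetical sketch of bucketing tweets into the "political Twitter"
# collections by keyword matching. Term lists are illustrative only.

CLINTON_TERMS = {"hillary clinton", "imwithher"}  # placeholder list
TRUMP_TERMS = {"donald trump", "maga"}            # placeholder list

def mentions(text, terms):
    """Return True if the tweet text contains any tracked term."""
    lowered = text.lower()
    return any(term in lowered for term in terms)

def bucket(text):
    """Assign a tweet to zero, one, or both keyword collections."""
    buckets = []
    if mentions(text, CLINTON_TERMS):
        buckets.append("clinton")
    if mentions(text, TRUMP_TERMS):
        buckets.append("trump")
    return buckets
```

At the scale of hundreds of millions of tweets, this kind of matching is typically done by passing the keyword lists to the platform's streaming API rather than filtering locally, but the logic is the same.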
Billion with a B. So how were we going to do this? We used two different methods:
the first was a dictionary-based method, where we took words that people had associated with hate speech. But it turns out that if you do that, you get an awful lot of false positives, because people will say things like "Trump has a chink in his armor" over foreign policy -- that word is included in the anti-Asian hate speech dictionary, but that's not a hate speech sentiment. Or you would have people saying things like "I can't believe I was called X" -- that's not hate speech, that's against hate speech. So we had to train machine learning models to remove the false positives. And then we came up with
a whole new method. We were worried that these dictionaries of hate speech had been built from offline speech -- maybe they weren't right for the way people talked about things online. So we used sophisticated machine learning models, similar to neural nets, to find a place where we knew there was hate speech online, which turned out to be certain corners of Reddit. For those of you who are familiar with Reddit, there were subreddits organized around these topics, and we were then able to measure how similar the Twitter data we saw each day was to that online data. The key was that both of these methods allowed us to measure change over time in the prevalence of hate speech, which is exactly what we needed to test these assertions. So this is the sort of raw data
that we have, everything pulled together: the Clinton data set, the Trump data set, and the random sample data set, shown as the proportion of tweets. We also did this all sorts of other ways -- with raw numbers of tweets and so on -- and it makes no difference: what you see is essentially flat.
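To make the dictionary-plus-classifier idea concrete, here is a toy sketch of my own -- not the lab's actual pipeline: a recall-oriented dictionary flag, a stand-in for the trained false-positive filter (hard-coding the talk's own "chink in his armor" example), and the daily proportion that gets plotted over time.

```python
from collections import defaultdict

# Toy sketch of the two-stage measurement -- illustrative only.
# Stage 1: recall-oriented dictionary match. Stage 2: stand-in for the
# trained classifier that removes false positives.

HATE_DICTIONARY = {"chink"}  # tiny placeholder; real lexicons are much larger

def dictionary_flag(text):
    """Stage 1: flag any tweet containing a dictionary term (produces
    many false positives, e.g. 'a chink in his armor')."""
    lowered = text.lower()
    return any(term in lowered for term in HATE_DICTIONARY)

def passes_classifier(text):
    """Stage 2: placeholder for the trained model. A single toy rule
    encodes the talk's own false-positive example."""
    return "armor" not in text.lower()

def daily_proportion(tweets):
    """tweets: iterable of (day, text) pairs. Returns {day: share of that
    day's tweets flagged as hate speech after both stages} -- the time
    series whose trend (or burstiness) is the quantity of interest."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in tweets:
        totals[day] += 1
        if dictionary_flag(text) and passes_classifier(text):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in totals}
```

The point of normalizing to a proportion rather than a raw count is that overall tweet volume changes over the two years; a flat proportion is what distinguishes "no systematic increase" from "more hate speech because there are simply more tweets."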
You don't really see big changes. You see a little burst in January and February of 2017, which turns out to be around the Muslim travel ban, and a lot of that is misogynistic speech aimed at Sally Yates, the acting attorney general, who was fired as part of that process. This is what it
looks like when you disaggregate it. Here we're breaking this out -- one of the more depressing things I've done in my career -- into anti-Asian hate speech, anti-Black hate speech, anti-immigrant speech, etc., and what you see is the pattern we found across all of this data, no matter how we looked at it: we do not see an upward trend like either of those predictions, we do not see a huge shift after the election; what we see is burstiness. So this, for example, is
anti-Muslim hate speech, and this is when the Bataclan attack took place in Paris: after this terrorist attack there was a surge of anti-Muslim hate speech, but then it comes right back down to the trend level. To be clear, I'm
not saying anything normatively good or
bad about this, right? You might think that the trend level is way too high, or you might think that it's not that high. But the point is that when we looked across all of these things, contrary to received wisdom, our analysis did not show a systematic increase in hate speech over the course of the campaign or after the election, nor did we find one for white nationalist rhetoric. Instead, what we found was burstiness. Now here's
another thing we got interested in testing. If you were, again, following the 2016 election closely, you may have discovered that Pope Francis had supposedly shocked the world by endorsing Donald Trump over the course of the election, or that WikiLeaks had supposedly confirmed that Hillary sold weapons to ISIS. This is what is now famously known as the fake news that emerged over the course of the 2016 election. Some of the admitted students
here might know that one response to fake news in the 2016 election has been a rise of digital literacy classes in high schools, on the assumption that it's maybe younger people who are sharing fake news. So we set out to see whether we could figure out, after the election, who was sharing fake news. We were very fortunate that we had in the field at the time of the election a three-wave panel survey, fielded in 2016 with 3,500 people, in which we included a Facebook app -- we built an app, in cooperation with Facebook, that allowed participants in our survey to share their Facebook data with us if they wanted. And here is the big finding: it turns out that people over age 65 on average shared seven times as much fake news as people the age of the class of 2023 here at NYU. So the big takeaway from
this is, you know, the kids are all right. We blame millennials for a lot of things, but -- and I'm a big fan of teaching digital literacy in high schools, I think it's incredibly important -- digital literacy in high schools is not going to be the policy solution that gets us out of this situation. The greatest
collection of data we've ever had in the history of the world for studying social and human behavior will be put to one purpose: maximizing revenues for giant multinational corporations. Most likely we have to assume it may get into the hands of security services as well, but it won't get into the hands of people like me, who have to go through IRB review, transparency requirements, and everything else we do with data. So one thing I want you to take out of this, when you go back home and you're hearing these sorts of privacy arguments, is that we do live in a world where this is a trade-off, right? At one extreme, I would never have been able to show you any of this today; we wouldn't have learned these things, and they wouldn't be in the public domain. Facebook might know it, but you wouldn't. So there's a trade-off between privacy and the public good, and I'm not telling you where you should fall on that, but I want people to think of it in that regard. So, thank you
very much. You can find more about the SMaPP lab at smappnyu.org, and I'm looking forward to your questions.
