>> Good evening ladies and gentlemen.
Before we start, we would like
to take a moment to thank
our amazing instructors
for being amazing mentors,
role models, and friends.
This summer, as Jake said, we set out to replicate
a paper that investigated racial bias
in police use of force.
So replication is very important.
To quote the Open Science Collaboration,
"reproducibility is a defining feature of science."
So a huge problem in research is that
many papers are not reproducible.
There are many reasons why a
paper cannot be reproduced.
For example, the code or datasets are not provided
by the author, which means you're not guaranteed
to be working with the same datasets.
The methods could be unclear or difficult to parse,
which means you don't know exactly
what to do with the data.
There could be p-hacking present, which is when
a researcher, either by accident or on purpose,
does questionable things to his or her data
to achieve a significant result to report.
We decided that if we're going
to do a replication study,
we should replicate a paper
that people truly care about.
So the paper we studied
is by Roland G. Fryer Jr.
He's an economist at
Harvard University.
His main findings were that in non-lethal uses of force,
blacks and Hispanics experienced significant racial bias
even when controls are present, but in lethal or more
extreme uses of force, there was no significant
racial bias present.
So as you can imagine
this paper generated a lot
of buzz and controversy,
not only within the academic community,
but also among the general public.
This is the New York Times and
this is USA Today
reporting on the paper.
This highlights why it's
important to evaluate the findings
of scientific papers,
because the public often takes
a scientist's word as fact.
One important thing to note is that our goal was not
to resolve the controversy surrounding the paper and
the issue as a whole, but rather just to check his
results. One way of looking at that is: if the result
doesn't replicate, is there even any controversy present?
So within the paper,
there are two publicly
available datasets.
The NYPD Stop, Question and Frisk dataset, or SQF
for short, which looks at police-civilian interactions
from the police perspective, and the Police-Public
Contact Survey, or PPCS for short, which covers
police-civilian interactions from the civilian point
of view. The key difference is that the Stop, Question
and Frisk dataset is exclusive to NYC, and the PPCS
dataset is a nationally representative survey.
The paper employed simple methods, counting methods
and logistic regression, and it's pretty detailed.
It clocks in at about 110 pages, and if you look
closely, the appendix is longer than the actual
paper itself.
But it appears that there's
an ideal scenario
present for replication.
But is there a systematic
way to approach this?
Well, here are
the three types of analysis.
There is reproduction, which is when the data is the
same as in the original paper, the code is provided
by the authors, and you get exactly the same numerical
results and thus the same conclusion. But Fryer did
not actually provide his code, so that makes
reproduction impossible. So our main focus is
replication, which is where we write our own code to
try to get the exact or approximate numerical results
and thus the same conclusion.
We also did some extension
which is where we use
datasets released after
the paper was published,
and try to achieve similar results
and observe a similar trend
so that we can reach
the same conclusion.
Our agenda was replicating the Stop, Question and
Frisk analysis, extending that analysis with maps and
census data, and also replicating the PPCS survey
analysis. Roy Mill will now introduce you to the
Stop and Frisk data analysis.
>> Thank you Emeka. Stop, Question and Frisk.
So how many of you know what a Stop, Question and
Frisk is? It's basically an NYPD program that was
created back in 2003 and is still ongoing. So what
is this program about? Well, if an officer suspects
that a civilian is related to criminal activity,
they might stop them, question them, and possibly
search them for weapons or contraband.
The data is released every year. It's available to
the public, but keep in mind that it is a one-sided
report. In fact, Fryer's paper addresses this issue
by focusing on use of force by the police and how it
varies by race. We found potential shortcomings in
the data at a high level. For example, the officer's
race is not specified in the dataset, and we only see
people who have been stopped in the past, so we don't
know about anyone else.
Now we're going to take a look at one of the
supposedly simple steps, which is downloading the
data, collecting it, and loading it into R. Pretty
simple. Well, not so simple. We ran into a lot of
challenges: mismatched columns, really weird naming
conventions, different numbers of columns, missing
columns, and inconsistent values.
I'm going to show you
a few examples in the next slide.
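For concreteness, here is a minimal sketch of what that
loading step can look like in R. The file names and the
harmonization rules here are hypothetical, not our actual code.

```r
library(dplyr)

# Read each year's file; naming conventions vary by year,
# so lower-case the column names before stacking.
years <- 2003:2013
sqf_list <- lapply(years, function(y) {
  df <- read.csv(sprintf("sqf_%d.csv", y), stringsAsFactors = FALSE)
  names(df) <- tolower(names(df))
  df$year <- y
  df
})

# bind_rows() fills columns missing in some years with NA,
# which is exactly where the mismatched-columns problem surfaces.
sqf <- bind_rows(sqf_list)
```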
For example, we took a look at race. We went exploring
and we were expecting to see white, black, black
Hispanic, white Hispanic, Asian, and American Indian.
But wait a minute, what's "male" doing there? A race?
Probably not. The next example is age. How many of you
know somebody who's older than 150? Well, I guess we
have immortal people even in New York City.
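The kind of sanity check that surfaces these values, and one
defensible way to clean them, sketched with assumed column names:

```r
# Illustrative checks; "race" and "age" are assumed column names.
table(sqf$race)    # an unexpected "male" level shows up here
summary(sqf$age)   # a maximum far above any plausible human age

# Keep only documented race codes and plausible ages.
valid_races <- c("white", "black", "black hispanic",
                 "white hispanic", "asian", "american indian")
sqf <- subset(sqf, race %in% valid_races & age >= 10 & age <= 100)
```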
We also found replication issues. For example, the
number of observations was different. Not crazy
different; in fact, we had about 2,000 more data
points than Fryer's paper. But we also had
variable-coding issues, like the one I just mentioned
for civilian race, where he decided to code black
Hispanics as black and only white Hispanics as
Hispanic. That's his choice, and if we took a
different approach, we might get a different result,
which you're about to see in this table.
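A sketch of that coding decision as we understood it from
the paper, again with hypothetical column names:

```r
# Fryer's scheme: black Hispanics are coded as black,
# and only white Hispanics as Hispanic.
sqf$race_coded <- ifelse(sqf$race == "black hispanic", "black",
                  ifelse(sqf$race == "white hispanic", "hispanic",
                         sqf$race))
```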
This table shows Fryer's results. He took the full
sample of the population, meaning people who were
stopped. Ten percent were white, 58 percent were
black, 25 percent Hispanic, and so on. Age is an
average, and everything is broken down by race,
so you can follow.
Now we're comparing our data to his data. As you can
see, mostly everything matches, with the exception of
the Hispanic group, which is a little bit lower.
Perhaps there was some filtering that he didn't
mention in his paper; that's one possibility.
Now let's take in the whole table, which has five
different categories. The category on top is the one
we just saw. Then there are more, like encounter
characteristics, civilian behavior, and alternative
outcomes. As you can see, everything matches with one
exception: daytime. He didn't actually specify
anything about how daytime was defined, so we went on
and tried different possibilities and got it to
match; not perfectly, but close.
We also went on and created a plot, since he didn't
include any visuals in his paper. As you can see
here, the x-axis has hands, pushed to the wall, and
so on. Hands is considered to be a low-intensity
force, and as you go along, you get to the most
intense forces, which are pepper spray and baton.
On the y-axis you have the proportion.
As you can see here, white civilians are 13 percent
likely to have hands used on them. However, if you
look at the Hispanic group and the black population,
you can see that the disparity is bigger: they have
around a 20 to 25 percent chance of having hands used
on them. However, as the intensity of the force
increases, we see that the disparity decreases.
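A plot like that boils down to a grouped mean per race and
force type. A sketch, assuming each force type is a 0/1
indicator column:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Proportion of stops in which each force type was used, by race.
force_cols <- c("hands", "wall", "handcuffs", "weapon_drawn", "pepper_baton")

plot_df <- sqf %>%
  group_by(race_coded) %>%
  summarise(across(all_of(force_cols), mean), .groups = "drop") %>%
  pivot_longer(-race_coded, names_to = "force", values_to = "proportion")

ggplot(plot_df, aes(factor(force, levels = force_cols),
                    proportion, fill = race_coded)) +
  geom_col(position = "dodge") +
  labs(x = "force type, low to high intensity", y = "proportion of stops")
```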
Now, I'm going to
pass the mic to Anna,
who's going to talk about the model.
>> Thank you, Roy. So Roy talked about empirical
averages and the differences we see in terms of
proportions. But are these differences just about
race? Maybe there are other variables that can
account for some of them. Maybe one group carries
more weapons than the other, and that could be the
main factor, with race being secondary.
In an ideal situation, we would have a controlled
experiment where all of the variables are held
constant and we change only the race. But that's not
possible in real life. We have to deal with real
people, real crime, real stop-and-frisks. So we were
restricted to the observational dataset that we
collected. What Fryer did was construct a model that
accounts for as much of the difference as possible
in this dataset.
So let's now look at the model. As you can see, the
main outcome of the model is use of force, so whether
force was used or not, and the variable of interest
is race. For a NoControl model, we would have race as
the only predictor and use of force as the outcome.
But we want to know how that changes after we add
different variables. Do other variables actually
account for some of the differences that we noticed?
So we add different controls
like CivilianDemographics,
EncounterCharacteristics,
CivilianBehavior, Precinct, and Year.
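In R, the two ends of that ladder look roughly like this.
The variable names are our guesses at the schema, not
Fryer's actual code:

```r
# NoControl model: race is the only predictor of any use of force.
no_control <- glm(any_force ~ race_coded, data = sqf, family = binomial)

# Full-control model: demographics, encounter characteristics,
# behavior, precinct, and year (illustrative names).
full_control <- glm(any_force ~ race_coded + age + gender +
                      daytime + indoors + civilian_behavior +
                      factor(precinct) + factor(year),
                    data = sqf, family = binomial)

exp(coef(full_control))  # exponentiated coefficients are odds ratios
```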
Now we want to know whether any difference remains or
not. For that, we have to look at the results, for
Fryer and for us.
So let's look at the tables.
So the table at the top,
that's Fryer's results and the one
at the bottom is our results.
So what's going on here? If we look at the first row
of Fryer's results, the first column says "White
Mean", and the row says NoControl. NoControl
basically means we are using race as the only
predictor. So the White Mean number here is saying
that 15.3 percent of white people who were stopped
had some force used on them. The columns that follow
White Mean are basically odds ratios for each race
relative to whites.
So let's focus on the second column, the black
population. What do we see here? This number is
saying that, compared to the white mean of 15.3
percent, the black population has 53.4 percent higher
odds of having any kind of force used on them.
One of the things that we wanted to see out of the
results is: does adding different controls actually
account for any of the differences?
If you look at the second column for the black
population, we see that after adding controls, the
odds ratios are actually decreasing. So the controls
are accounting for some of the differences, but the
ratio doesn't go down to one. That means they can't
account for all of the differences; there is still
some disparity.
Now we can compare Fryer's model to our model and see
the results. The numbers don't match exactly, but
they're similar; close enough. The important thing
here is that we see a similar trend: the same
downward trend in our results as in Fryer's results.
The odds ratio is going down, but it's not going down
to one, so there is still some difference.
One of the issues that we ran into was the sample
size. If you look at the rightmost column, sample
size, you'll see a lot of question marks. What is
that about? Fryer actually only reported one sample
size. The sample size is supposed to change as you
add controls, and that's what we found: when we added
controls, the sample size changed. But for Fryer it
was just one number, and we were not sure what it was
supposed to mean; was it the sample size with no
controls or with full controls? That's one of the
discrepancies we found.
So I've been talking about odds ratios, but what do
they actually mean? They're hard to talk about and
hard to interpret, and the way Fryer talks about them
in the paper is basically probabilities based on odds
ratios, which is not that intuitive. So we decided to
calculate the actual probabilities and see what the
differences are across races.
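One way to get those probabilities from a fitted glm is to
predict for an otherwise-identical stop while varying only
race. A sketch, reusing the model and hypothetical names
from before:

```r
template <- sqf[1, ]   # one stop as a template row
races <- c("white", "black", "hispanic")
probs <- sapply(races, function(r) {
  template$race_coded <- r
  predict(full_control, newdata = template, type = "response")
})
probs                            # predicted P(force) by race
probs["black"] / probs["white"]  # a ratio of probabilities, not of odds
```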
So here's a plot of
the difference in predicted
probability by race.
On the x-axis, we have
all the races and on the y-axis,
we have the predicted
probability of use of force.
Now this is a much clearer picture. We can see how
likely a black individual is to experience any force
when stopped.
We can also calculate the difference between a black
person and a white person having any kind of force
used on them. Our findings suggest that the
difference is actually 37.5 percent: black people are
37.5 percent more likely to have any force used on
them. But Fryer reports it to be 53 percent, which is
based on odds ratios. So that's another inconsistency
we found in our results.
That led us to evaluate the model and see how it is
actually performing. So how do we evaluate a logistic
regression model? The standard way is to plot the ROC
curve and calculate the area under the curve, the
AUC. Our dataset was not balanced, and AUC is a
balanced accuracy measure, so this was a good way for
us to evaluate the model.
The highest value for AUC is 100 percent, which would
mean that whenever we said force was being used,
force actually was being used. The lowest meaningful
value is 50 percent, which is like randomly guessing
whether force was used or not. We calculated the AUC
and found it was 68 percent, which is not great, but
not terrible, because it's above 50 percent. It's
closer to 50 than to 100, but it's not that terrible.
Still, this low AUC suggests that the model is not
really a good description of the reality of the
dataset.
So we did one more thing. We decided to take race out
of the model and see how that changes the AUC.
Surprisingly, there was no significant change; it
remained the same. This potentially implies that race
doesn't have that much predictive power in the model.
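A sketch of that check with the pROC package, again reusing
the earlier hypothetical models; it assumes the modeling
frame has no missing rows so predictions align with the
outcome:

```r
library(pROC)

# AUC of the full model (we found roughly 0.68)...
auc(sqf$any_force, predict(full_control, type = "response"))

# ...and of the same model refit without race; for us the
# AUC was essentially unchanged.
no_race <- update(full_control, . ~ . - race_coded)
auc(sqf$any_force, predict(no_race, type = "response"))
```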
So, the takeaways from our analysis: we were not able
to replicate the results exactly, but we recovered
the underlying patterns. We can say that the controls
don't account for the racial differences, not
completely. But the caveats are that the model's
performance is not ideal, and that removing race does
not impact the model much.
So now I'm going to hand the mic over to my colleague
Harpreet, who's going to talk about some maps based
on the SQF dataset.
>> So as the [inaudible]
previously mentioned,
we are trying to do two things,
replicate and extend.
I will be talking
about the extension.
We decided to visualize the stop-and-frisk data on
maps of New York City. To do that, we first
calculated the probability of any force being used on
a given race, conditioned on being stopped in a
particular precinct. So the map on the left
indicates, of all the white people who were stopped
in a particular precinct, what fraction of them were
subject to any force.
The same goes for the map on the right, but for all
the black civilians. If the probability of being
subject to any force is higher, the map gets a little
darker, and there is a slightly higher probability
for the black civilians. But this is still a little
unclear; we don't get a very clear idea of the
difference.
Therefore, what we decided to do was calculate how
much more likely black civilians were to be subjected
to any force in comparison to whites. To do so, we
divided the probability for black civilians by the
probability for white civilians.
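The statistic behind these maps is a grouped mean followed
by a ratio. A sketch with the same hypothetical columns as
before:

```r
library(dplyr)
library(tidyr)

# P(any force | stopped, race, precinct), then the black/white ratio.
precinct_rates <- sqf %>%
  group_by(precinct, race_coded) %>%
  summarise(p_force = mean(any_force), .groups = "drop") %>%
  pivot_wider(names_from = race_coded, values_from = p_force) %>%
  mutate(black_white_ratio = black / white)
```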
Sadly, the resulting map shows that black civilians
who were stopped were more likely to experience force
in almost every precinct. The only exception is
Astoria, which shows up as the green area there.
So we could see that, okay, there was a difference
between black and white civilians. Is there a
different trend for another race, say Hispanics? Yes:
we found that Hispanics were also more likely to be
subjected to any force in comparison to white
civilians. The disparity is not as large as the one
for black civilians, but we still see that the map is
darker.
So that was for any force used. But the dataset
actually had different levels of force. So now let's
look at how much force was used. Does the racial
difference depend on the intensity of the force? What
we did was split all the forces into two intensities:
low-intensity and high-intensity. We used these
intensities to calculate racial differences.
The map on the left shows the probability of
low-intensity force being used on a black person
stopped in a given precinct, divided by the
probability of low-intensity force being used on a
white person stopped in the same precinct. We do see
that blacks are more likely to be subjected to
low-intensity force compared to white civilians,
whereas on the map on the right, we see less racial
disparity between the two groups.
So when it comes to more high-intensity forces, there
is less of a difference. This is indicated by the
ratios: if there were no difference, the ratio would
be around one, and where there is a difference, the
ratio increases and the map gets darker.
So now that we've looked at who had force used on
them, Eder will be talking about who actually gets
stopped.
>> So as Harpreet said, Fryer's paper only discussed
who has force used on them by the police, but we also
wanted to look at an extension of his paper, another
aspect of police-civilian interactions: who are the
police actually stopping? And that might be very
different.
So we wanted to look at the stop-and-frisk data. We
can count how many people of each race were stopped,
but we figured that wouldn't be so representative of
the big picture, because we could have a situation
like this.
Let's say here are the people who
are being stopped by the police.
It looks like the purple people are twice as likely
to be stopped as the orange people; there are twice
as many of them being stopped.
But maybe the actual population
looks like this.
There are lots of
purple people and very
few of them are
actually being stopped,
and then maybe the orange people
are experiencing more police bias.
So we decided to bring
in the census data.
The US census data shows
the racial distribution of
people in New York City,
and then we could see how
the racial distribution of
the population compares to
the racial distribution of
the stops in that area.
So we started
downloading census data,
Harpreet and I, and
all was going fine,
and all of a sudden, we
just got this error one
day in the middle of the day
that the website was down.
The census website just wasn't working, there was
nothing we could do about it, and we couldn't go on
with our work.
So Harpreet was great. She went and called the Census
Bureau and demanded that they fix it.
She spoke with
three different agencies
and we got it back up and running,
and we were ready to start
working with our data.
So, the census data isn't split at the precinct
level.
because we're discussing police,
and that was one of
the most consistent ways that
location was reported in
the stop-and-frisk data.
So John Keefe, the founder of the WNYC Data News
team, had published a publicly available key that
maps census blocks to precincts. We used that to
convert our data, and we calculated for every
precinct how many people of each race there were.
So similar to the maps
Harpreet was showing,
we calculated stop rates for
each precinct for each race.
In other words, the number of people of that race who
were stopped in that precinct divided by the number
of people of that race who live in that precinct.
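A sketch of that computation, assuming a census_pop table
with columns precinct, race_coded, and residents, already
aggregated from census blocks to precincts using the key:

```r
library(dplyr)

# stop rate = stops of a race in a precinct / residents of that race there
stop_rates <- sqf %>%
  count(precinct, race_coded, name = "stops") %>%
  inner_join(census_pop, by = c("precinct", "race_coded")) %>%
  mutate(stop_rate = stops / residents)
```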
On the left we have for
the black population,
on the right for
the white population.
It's clear that the map for
the black population is
much darker indicating that
they're more likely to be stopped.
If you look at the scales, it's interesting: for most
precincts, the black stop rate is greater than one,
which indicates that the number of black people
stopped in many precincts was greater than the total
number of black people who live there, which is
really pretty extreme.
Those two gray areas on
both maps are areas that
we chose not to look at.
The one in the middle of Manhattan is Central Park.
We decided not to look at that because, according to
the census, it has a population of 25, which is like
a hundredth of any other precinct's population.
That's not a big enough sample size to make any
significant claims. The other precinct that we didn't
look at is in Staten Island, Precinct 121. It was
only created in 2013, so 2010 census data just wasn't
going to work with it.
As you might have predicted, we wanted to compare
these two maps more easily. So we decided to divide
the black stop rate by the white stop rate and
visualize that statistic.
That's what this looks like.
This is the black stop rate
divided by the white stop rate.
So just to discuss the scale for a minute: if the
black and white stop rates were equal, this value
would equal one, and the precinct on the map would be
a whitish color; not the white around the city, but
the whitish areas within New York City. If the black
stop rate was much greater than the white stop rate,
the area would be red, with darker red indicating
more disparity. If the stop rates were instead biased
against whites, the area would be green or blue.
So we see that, in general, it's hard to make out the
coloring on this projector, but the areas are very
red and orange, indicating that black people are more
likely to be stopped. The only exceptions are Central
Park, which as I said is not really meaningful, and
this precinct over here. You can't really see it in
this picture, but it's around the Jamaica area. We're
not sure why there was no disparity there.
Just to give an idea of how extreme the disparity can
be: in Precinct 1, which is around the Wall Street
and World Trade Center area, black people were 90
times as likely to be stopped as white people, using
these stop rates.
I just want to put that in perspective and point out
one caveat in the way we measured this. The census
population is not necessarily representative of who
spends time in an area; the people who spend their
days in an area aren't necessarily the people who
live there at night. So this might not be giving us a
perfect picture. Wall Street and Central Park are
areas like that for sure; the people there aren't
necessarily the people who live there. So that could
be skewing our numbers a little bit.
We wanted to try to fix that issue by looking at
commuting data, getting an idea of commuting patterns
in New York City, and calculating it that way. We
actually found data on the New York City Department
of Planning website about where people commute to and
from in New York City, but we couldn't really work
with it, because it was at the census tract level,
not the precinct level, and it didn't have any racial
data in it.
I'd like to turn the stage over to Cindy, who's going
to talk about another dataset that Fryer used in this
paper.
>> Thank you Eder. So now let's bring in a different
perspective. Tonight, I'm going to speak about the
PPCS data, which was briefly mentioned earlier in the
intro. Let's take a closer look at it. I would like
to start off with why Fryer decided to use this data
in his research paper and what it basically provides
us with. Unlike the stop-and-frisk data, the PPCS
data provides us with the civilians' perspective of
their interactions with the police. Fryer thought
this was important to include in the paper because he
thought it could reveal differences masked by police
misreporting.
So I bet you're probably wondering, what is this PPCS
data? Well, it's based on the Police-Public Contact
Survey, which is a nationally representative survey
of civilians aged 16 and older. It provides us with a
civilian's perspective of their encounter or
encounters with the police. This data is collected by
the Bureau of Justice Statistics. It was first
collected in 1996, and the latest year it was
collected was 2015; Fryer only used the years 1996 to
2011 in his research paper. Lastly, this data is
publicly available on the National Archive of
Criminal Justice Data's website, but you can also
find it on the Bureau of Justice Statistics website
as well.
So when we were looking at this data,
we noticed a few things that we
thought were a little
bit problematic.
So I would just like to
briefly mention what they are.
But in the next few slides,
I will definitely get
deeper into them.
So the first one is that people
fill out nonsensical answers.
The second one is that
people are less likely
to admit to their own shortcomings.
The last one is that
people might not remember
things as accurately
as we would assume.
So, people fill out nonsensical answers. There was a
question in the codebooks and in the surveys that the
civilians had to answer. This question was basically
asking: how many face-to-face contacts did you have
with a police officer in the last 12 months? As you
can see, there's a column that shows us their
responses, and for 12 months you would assume at most
one year's worth, 365 days. But people wrote down
some really crazy answers. As you can see, 600, 365,
350, which basically means these people were having
face-to-face contact with police officers at least
once every day, which is pretty bizarre.
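One way to surface and handle those answers, sketched with a
hypothetical column name; whether to cap or drop them is
itself a subjective cleaning decision:

```r
summary(ppcs$num_contacts)   # values like 600 appear in 12 months of data

# Keep answers of at most one contact per day (and missing values).
ppcs <- subset(ppcs, is.na(num_contacts) | num_contacts <= 365)
```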
So the next one is
that people are less
likely to admit to
their own shortcomings.
So there is also another question in the surveys and
in the codebooks that asked: at any time during this
contact with the police officer, did you try to
disobey or interfere? Did you try to get away? Did
you try to push, grab, or hit the police officer? Did
you try to resist being handcuffed, arrested, or
searched?
We assumed that the people who answered this question
would react something like this, which explains why
so many people do not report any of these things in
the data.
So besides those issues
that we had earlier,
we experienced some more challenges
with the data cleaning process.
We had issues with the PPCS data columns, because a
lot of the variables that Fryer used in his paper
were spread out across multiple columns. We also had
to make subjective decisions when categorizing these
columns. And some of the variables that were used in
the paper are actually missing.
One example I'd like to bring up about the spread-out
columns is the officer race column. Fryer used this
variable, but in each of the years it was spread out
differently. For one of the years, 2005, this
variable was split into two columns, which doesn't
sound too bad. But then you look at years like 2011,
where you had 12 different columns for this one
variable. So it was tedious to go through all of
these codebooks, find all these columns, and put them
together into one coding so that they would all
match.
So here we go. So you
can see so many columns.
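A sketch of how spread-out columns like these can be
collapsed, with hypothetical names; dplyr's coalesce()
takes the first non-missing value across columns:

```r
library(dplyr)

# 2011 spreads officer race across many columns (two shown here,
# twelve in reality).
ppcs_2011 <- ppcs_2011 %>%
  mutate(officer_race = coalesce(off_race_1, off_race_2))
```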
The next issue is subjective decisions. We definitely
did have to make a lot of subjective decisions.
There was one variable
which was called the type
of incident variable that
Fryer used in this paper.
This variable was actually really interesting to deal
with, because we had to decide how to label each of
the listed reasons for contact. Each reason had to be
labeled as a traffic stop, a street stop, or some
other type of stop. But Fryer wasn't really specific
as to which reasons counted as a traffic stop, which
as a street stop, and which as another stop. So we
basically had to go through each one of these reasons
for each year and try to figure out what he thought
was a traffic stop, and so on and so forth.
So as I mentioned earlier,
there was a lot of missing data.
So Fryer was not specific about
how the data was aggregated.
Some of the years are missing
variables that were used for
the regression and for a lot
of the summary statistics.
For example, 1996 and 2008 were missing income and
population size, 1999 was missing the
time-of-encounter variable, and 2005 was missing
civilian behavior. As a result, the missing variables
greatly affected the regression and our results.
So after merging all
of the years together,
we found our case counts,
but we ran into
a case count discrepancy.
So in the paper Fryer reported
a total sample of
426,000 observations.
But the total sample that we counted was 409,678
observations. After spending many hours trying many
ways to figure out what went wrong and fix this
discrepancy, we came up with a rather different kind
of solution: we decided to e-mail the Data Manager at
the National Archive of Criminal Justice Data.
Thankfully, she responded to us, and she basically
said that she was unsure why there is a case count
discrepancy. She also conducted a count herself and
reported that she got a number closer to ours. So
that is pretty strange, and we were pretty puzzled at
how Fryer reported more observations than the
organization that administers the survey. The mystery
of the extra cases still remains.
I would now like to
pass on the mic to
Naomi who will speak about
the summary statistics.
>> Thank you Cindy. So now I'm going to talk about
the summary statistics for the PPCS data. First,
let's start with this table. What is this table? It
shows the proportions that Fryer computed, based
mostly on the variables that belong to civilian
demographics. In the first row, he's computing the
proportions over the full sample. In the second row,
he's doing the same, but over just the white
population instead of the full sample. In the third,
he's doing it over the black population, and in the
fourth, over the Hispanic population.
As Cindy mentioned, the data is supposed to be
nationally representative, and this can be seen in
this table. For example, in the full sample column,
the proportion of white civilians surveyed was 77
percent, the proportion of black civilians was 10
percent, Hispanic civilians nine percent, and other
four percent.
These are just Fryer's statistics, so now I'm going
to show you our statistics and see how things match.
Here we have our statistics on the right-hand side,
and we see that pretty much everything matches.
However, there were some values that were not
completely the same; those are labeled in red. So we
do see some disparities, but they are not that big.
However, if we compare this part of the table to the
stop-and-frisk data, we obviously see more
discrepancies here.
Now I'm going to show you the whole table and see how
it goes. As you can see, we have way more variables
that did not match. Again, these were not really big
disparities, except for one or two. But if we compare
this to the stop-and-frisk data, we see that it's
very different. This brings us again to the
subjective decisions and how they might affect our
outcomes.
So I'm going to pick three variables, go over them,
and try to see what the problems were. The first
variable will be type of incident, the second will be
time of encounter, and the third will be the
alternative outcomes. Interestingly, the alternative
outcomes variable was not used in the regression for
the model; however, he did not specify why he made
that decision.
The first summary statistic is type of incident. Here
we have a visualization. As you can see, the x-axis
specifies which type of incident it was: a street
stop, a traffic stop, or another kind of stop at
which the encounter between the civilian and the
police occurred. The y-axis is just measuring the
proportion. For the street stop, our result falls at
practically the same point as Fryer's. However, the
other two points, the traffic stop and the other
stop, are very different, so there is a disparity
there.
Some problems that we encountered here, as I
mentioned, were that in the years 2011 and 1996 there
were many variables, so we were forced to make
decisions on our own about how to categorize them. So
despite following Fryer's coding schemes, we were not
able to replicate the same results.
The next variable is time of contact, for which we
also have a plot. The x-axis represents whether the
time of the encounter between a civilian and a police
officer was during the day or at night, and the
y-axis is just the proportion. Here, both of them are
different. One problem we encountered with this data
was that the year 1999 did not have any variable
related to this, so all the values for that year were
missing.
Another interesting finding was that in Fryer's
appendix, he coded this as six different variables,
three for day and three for night. However, the only
year actually coded that way was 2011; the other
years were only coded as day or night. And when we
went back to the summary statistics, he had them
calculated as just day and night. So we had to go
back and forth between our code and his paper and
keep changing it. We found problems similar to the
SQF data for the time-of-contact variable.
The third variable was alternative outcomes; more
specifically, the proportion of civilians who were
carrying any illegal items when they encountered the
police officer. In this case, we are measuring all of
that just by race. As you can see, the x-axis
specifies the race and the y-axis the proportion.
This was the largest discrepancy we encountered. As
you can see, the standard errors are more visible in
this case, and we found a very large difference in
proportions.
Something to take into account about why these
discrepancies happen is that subjective decisions are
a very real threat to replicability. Now, something
that Fryer didn't do but we still did: we created
this variable, force-related incidents. We are
basically separating groups by income, measuring the
type of force, and breaking it down by race.
Something interesting is that as income increases,
force decreases. However, one thing that persists is
that blacks are more likely to experience force than
whites. So now I would like to pass the microphone to
my teammate, Brenda, who is going to take over with
the regression model.
>> Thank you, Naomi. So Naomi just mentioned summary
statistics, which are empirical averages. But what if
there is more going on, like Anna mentioned? Well,
Fryer introduces another model.
In this model, we're predicting whether the civilian
reports use of force, and the outcome is either yes,
they do, or no, they don't. That's based on civilian
race, which is our variable of interest. Together
this is our baseline model with no controls.
What if it wasn't just race involve.
So first we added
civilian demographics,
that's like income, age, gender.
We then added civilian
behavior which as Cindy
mentioned is anything that
you would do to the officer,
we added contact
and officer care characteristics
which was the incident types,
time of contact was it day or night,
and the officer race,
and lastly, we added the year.
Now, in order to use the model, we had to filter the
data to include only people who had face-to-face
contact with a police officer, and that reduced our
number of observations from about 400,000 to about
60,000.
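The shrinking continues as controls are added, because glm()
silently drops any row with a missing value in a predictor.
A sketch of how to watch it happen, with hypothetical column
names:

```r
# Complete cases available to each model on the ladder; every added
# control that has missing years removes more rows.
sum(complete.cases(ppcs[, c("force", "race")]))
sum(complete.cases(ppcs[, c("force", "race", "income", "age", "gender")]))
sum(complete.cases(ppcs[, c("force", "race", "income", "age", "gender",
                            "incident_type", "daytime", "officer_race")]))
```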
So let's take a look at
the results of our model.
So here we have a comparison of
Fryer's results and our results.
We first started off with
no controls and then we
added civilian demographics,
encounter characteristics
as explained before.
But if we look at the side where we show the sample
size, Fryer starts off with a sample size of about
59,000, and we're not sure if that was with no
controls or with full controls. When we ran our
model, we started off with about 60,000, and as you
can see, by the time we got to encounter
characteristics, we had a quarter of the data left.
Overall, we can see that the numbers don't really
match. But looking at the black population, there is
still the same trend: they still have high odds
ratios that decrease but don't get down to one, so
the odds are still higher for blacks.
But like Anna mentioned, all these numbers, what do
they mean? It's hard for me to interpret, and I
imagine it's hard for a lot of you to interpret. So
let's look at it in a simpler way, with probability.
I created this plot with the probabilities.
As you can see, on the x-axis we have the civilian
race, and on the y-axis we have the predicted
probability of force. Whites have about a two percent
chance of experiencing force, and blacks about a 5.5
percent chance.
Fryer actually mentions in his paper that blacks are
3.5 times more likely to report use of force by
police in an interaction, and we're not sure if
that's odds or probability. But looking at this plot,
we came up with blacks being 2.7 times more likely
than whites. Whether it's odds or probability makes a
difference.
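The two readings really do give different numbers. Taking
our two predicted probabilities as a worked example:

```r
p_white <- 0.020
p_black <- 0.055

p_black / p_white   # ~2.75: the "times more likely" reading
(p_black / (1 - p_black)) / (p_white / (1 - p_white))   # ~2.85: odds ratio
```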
So let's take a look at the performance of our model.
We used the area under the curve, because that's more
of a balanced accuracy measure. Looking at this plot,
if the area under the curve were 100 percent, that
would be perfect; if it were 50 percent, that would
be equivalent to guessing. In our case, we have an
area under the curve of 78 percent, which is okay.
But now, what if we run the model again without race?
The area under the curve drops to 74 percent, and
that is a difference. So race maybe does play a role
in predicting use of force.
So we spoke a lot about the PPCS data, but what can
we take away from it? Fryer's results do not
replicate exactly, but we did find the underlying
pattern, which was that the controls don't account
for the racial differences. We didn't reach one on
the odds; the odds are still higher for blacks. Also,
there was a caveat that we had a lot of missing data,
which eliminated many observations, so the model
doesn't really capture everything that's going on.
With that, I'd like to conclude and talk about what
we can take away about research replication. We
pretty much had the most ideal scenario: public data,
simple methods, and extensive documentation. But it
was actually much harder than that. Originally, when
we got this project, we thought it would take us like
two days, no big deal, we can do this. It took us
four weeks.
Also, it's really important to note that people often
leave out important details, and that's where we get
into the issues mentioned before of subjective coding
and missing data. Another thing is that releasing
data and code is key for reproducibility, because
otherwise it takes a lot of effort to clean the data
and recreate it, which is what we had to spend a lot
of our time doing.
Finally, it's important to think critically about
easily misinterpreted results. As we mentioned
before, Fryer said that blacks are 53 percent more
likely to experience use of force. We still are not
sure if that's odds or probability. We tried to
simplify it so we could understand it, but again, we
have to think critically about these results.
I just want to go back to the point about releasing
data and code. We have a major contribution,
especially for all you researchers out there. We hope
that you're going to publish a lot of papers using
our clean dataset that we worked hours on. It's now
open to the public for anyone to use; it's the hard
work of all of the team members here, and it's
available on GitHub. Thank you. If anyone has any
questions, we're here to answer.
>> I have one thing before we get to the questions:
I got a heads-up that the building might run a fire
alarm test at 6:45, so we might have a very abrupt
end to the questioning. If that happens, they'll be
around afterwards. But have at it.
>> That might be too fast. Yes.
>> Did you guys have any contact with Fryer?
>> We wanted to. Trust
me, we wanted to.
>> We have an e-mail written out right now.
>> Have a draft [inaudible]
>> We have a draft, yeah.
>> We can [inaudible].
>> We were hoping he might
just randomly show up
here. Any other questions?
>> I have a more technical one as well. When you were
doing the division between low-intensity and
high-intensity force, before I forget, which speaker
was talking about that part of the presentation?
>> Yes?
>> How was the demarcation made? How did you guys
decide where you were going to categorize each force?
>> Yeah, that's actually interesting, because at
first we wondered the same thing, but actually that
was [inaudible] who did that. Each of the labels for
the different forces, being handcuffed, having a
weapon drawn at you, and so on, were all actually
part of the reporting process. That's how they were
labeled, and [inaudible] split them into the
intensity levels that way.
>> [inaudible].
>> Yes. Another reason we split the intensity levels
that way was that there were not a lot of data points
for the high-intensity levels, so in order to
visualize our data better and get a more significant
answer, we decided to go with what [inaudible] did.
Thank you.
>> Any other questions?
>> Let me add to that. Another reason we chose to do
what [inaudible] did is because there are a lot of
parts of the paper that correspond to different
levels of intensity, different levels of force that
was used. So we had to do a lot of cleaning, and we
worked on some of them; some of them turned out to be
not that significant, but we had to do it for the
overall process. That's why we did that split.
>> Yeah. You showed that in your replication you had
pretty much the same results regarding intense force,
with [inaudible] going almost to zero at the high
end. Do you have an interpretation of why that should
be the case?
>> For the Stop, Question and Frisk?
>> Yeah. You showed that for low intensity it was
quite different by race, but as we get to higher and
higher intensity, the difference became smaller and
smaller.
>> Yes. That was one of [inaudible]'s hypotheses.
[inaudible] said that when he analyzed the dataset,
he saw that for low-intensity forces there is more
disparity, but for high-intensity forces there's not
a lot of disparity. He concluded that it's because of
how police use force. In stop-and-frisk, it's much
more common to use hands, handcuffs, or pushing
against the wall than to use pepper spray or a baton;
those are really extreme. So a very small portion of
the stops that were made involved extreme use of
force.
>> Could it possibly be because in extreme uses of
force by the police, the situation clearly requires
it? In other words, it's not a question of being
subjective any longer?
>> Yeah, I think so.
That's a good point.
>> To add on to what he said, [inaudible] actually
does look closer into higher-intensity uses of force
through other datasets, but they were constructed
specifically for this paper and, as a result, are not
publicly available. Is there a question at the back?
No?
>> So my question was: I know you guys were mainly
trying to replicate [inaudible]'s results, but did
you experiment with any other models beyond logistic
regression to see if you could get better
predictability?
>> We mostly tried to replicate, so we used the same
logistic regressions. However, there were times in
[inaudible]'s paper where he separated by use of
force, so we ran a model based on just the
lower-intensity forces, or adding the
higher-intensity forces. But we mainly replicated
according to what he did. That said, we also tried
taking race out as a predictor and looked at the area
under the curve for both stop-and-frisk and for the
PPCS, so that was a little bit of reaching beyond the
paper, and we saw the differences. Any other
questions?
>> All right, thank you guys.
>> Thank you-all for coming.
