The first of our next two speakers is Melanie Martin.
She was a postdoctoral associate
in biological anthropology
at Yale.
She studies growth and nutrition
among indigenous populations
in Bolivia and Argentina
and registered her most recent project on the Open Science Framework.
Melanie holds a BA
from the University
of Puerto Rico and an MA and PhD
from UC Santa Barbara.
She holds a certificate in
college and university teaching
and has taught courses
in human variation,
evolutionary medicine,
and statistical methods
for the behavioral sciences.
Her co-presenter is Brett Beheim, a senior researcher at
the Max Planck Institute
for Evolutionary Anthropology.
He studies cultural and
technological change
using both large scale
economic data sets
and ethnographic field work
from small scale societies.
He has shared several
research projects on GitHub.
Professor Beheim holds a BS
from Emory University and a PhD
from the University of
California at Los Angeles.
Welcome both of you.
OK.
So thank you to Michelle
and the other organizers
and presenters.
I also want to
thank our colleagues
in the Yale department-- Eduardo Fernandez-Duque, Margaret Coralee, and Juan Pablo Perea-- who together with Brett and myself have formed a working group to discuss analytical methods in anthropology that
have relevance to questions
about transparency
and reproducibility in
observational research, which
we'll both be
talking about today.
So to start off,
I actually wanted
to explain some of the
research objectives
in quantitative anthropology,
because most people are
familiar with anthropology
as ethnographic studies
of foreign cultures, which are
very important and critical
to current anthropology.
But there are also
many researchers
who are studying cultural
and biological processes
in a systematic fashion,
studying human and non-human
primate evolution and
behavior, and then,
of course, fossil and
archaeological remains
from our ancestral
and more recent past.
And in studying these topics,
we use theory and methods
from a host of natural
and social sciences
along with quantitative
methods of data collection
and analysis.
So some of this
research lends itself
very easily to open,
transparent, and reproducible
methods.
For example, a recent study by Cody Ross, an anthropologist now at Max Planck, examined evidence of racial bias in police shootings using
a publicly available database
of US police shootings
that's actually crowdsourced
and maintained by Deadspin.
One of the key findings of this study was that racial bias
was more prevalent
in certain large
metropolitan areas,
but wasn't really correlated
with local crime rates.
Instead it seemed to be more
predicted by higher income
inequality in those areas.
He also found that on
average across counties
black and unarmed Americans
were 3 and 1/2 times
as likely to be shot by police
as white and unarmed Americans.
So in addition to the
importance of these findings,
this is a wonderful illustration
of open, reproducible research
in anthropology.
Not only does the raw data come from a publicly available database,
but Ross also made available
in the publication the cleaned
and compiled data
sets that he used
in the study and all of his code
for conducting the analysis.
So that said though,
most quantitative
anthropological
research is still
done by individual researchers
or small groups of researchers
working with data that
they collect often
in small remote populations.
So an example of that would be
a recent study from Brian Wood--
an anthropologist here
at Yale-- and colleagues
who looked at physical activity
and cardiovascular disease
risk among Hadza
hunter-gatherers in Tanzania.
And what they found was that
the Hadza are about 14 times as
likely to spend
their day engaged
in moderate to vigorous
physical activity as
compared to same-aged US subjects.
You can also see in this plot from that study-- the blue line shows males and the green line females-- that activity levels aren't lower with age.
In fact, some of
the older adults
had relatively higher
activity levels,
which is not a trend that
we typically see in the US.
They also found in
a separate study
that across ages in the Hadza
we see a very low prevalence
of hypertension and
other biomarkers
of cardiovascular disease
risk, such as LDL cholesterol,
high c-reactive protein,
and high triglycerides.
Similar results have been shown in other subsistence-scale indigenous populations and together
contribute to our growing
understanding that many
of the chronic diseases that
plague citizens in our country
and other westernized
nations, including
cardiovascular disease risk,
obesity, diabetes, and cancer--
these are really diseases
of modern civilization.
And we just don't see
them in populations
that still engage in
relatively ancestral behaviors
with high activity levels and
relatively low caloric intake.
These are some pictures from the
data collection in that study.
So you can see subjects
going about their daily lives
climbing trees to collect
honey, digging for tubers,
and just hanging out.
And all of these subjects
have heart rate monitors
and GPS monitors
strapped to them,
which the researchers
used to actually get
the data about their
physical activity.
And I illustrate
these really just
to show you that this data
is really hard to get.
Because first off,
they're working
in a remote area of Tanzania.
This is also only
possible because Dr. Wood
and his colleagues
have spent decades
building trust and understanding
with this population.
And on top of that,
they're collecting
electronic and biomarker
data in an area
without electricity or
internet or refrigeration.
And then on top of all those
logistical difficulties,
if you have ever
done field work,
you know that in the field,
everything goes wrong
and nothing goes to plan.
And you have to work
from plan B, C, D,
and change your original
design concept multiple times.
So for this reason,
anthropological research really
is often constrained
to very small data sets
or small populations with
one shot data collection
that also has to be very
flexible at the same time.
So I think this raises
questions for what
transparent and reproducible
research in anthropological
and other field
studies really is.
So to examine that,
we might start
by first acknowledging what
observational research can
and can't do.
So first off, really,
we don't usually
have experimental controls,
because these may be
impossible and often unethical.
Second, we are often working with a small sample size, or the sample size is actually the entire population, as in the case of working with an available skeletal sample.
Also, our populations are often specific to a certain time and place, and may be rapidly changing, as is happening with many subsistence-level cultures like the Hadza.
And in that sense, as anthropological researchers, we can never resample our populations, and our data can never be replicated.
OK.
Secondly, we might
really examine
what the goals of
observational research are.
And this has been
echoed earlier today too
in terms of acknowledging that very often, we are exploring and describing phenomena and generating new hypotheses from the patterns that we observe.
At the same time,
researchers might
be testing existing hypotheses
from existing theories,
but using proxies in
observational research
rather than
experimental controls.
That said, even though we lack
experimental controls to show
pure causality, we are
still interested in showing
empirical support
for these hypotheses
and establishing
robust associations,
as these findings then
can inform future research
and policy in certain cases.
OK.
So on top of acknowledging
these challenges and goals that
are sometimes specific
to observational
as opposed to experimental
research-- anybody familiar with the history of Margaret Mead's research might know there's a long history of questioning methods and transparency in anthropology.
So thinking about
all these things,
we have tried to identify
certain practices that
might promote transparency and
reproducibility in anthropology
and other observational studies.
So with that, I'm going
to hand the talk over
to my co-presenter, who is going
to discuss some of these areas,
again, that have been mentioned
today in terms of identifying
and using appropriate
statistical methods
for exploratory versus
confirmatory analysis
and emphasizing
reproducible methods rather
than replicability of results
through data management
and sharing, registering
analytical protocols, and then
internal incentives to
promote this behavior.
Yes.
Thank you.
As Melanie described,
in anthropology
to paraphrase Monty Python,
every data point is precious.
And often the information that we have about a particular site or historical location is possibly the only information that will ever be known about that particular population.
So we have a somewhat
unique situation.
But in anthropology, as in most of the physical and natural sciences, there's active
interest in the open science
and reproducibility
movement that I
think has kind of been
percolating up here.
So in terms of these
statistical aspects
of quantitative
anthropology, there's
a renewed interest
in developing methodologies
that are suited towards
observational studies
with small samples.
And often, this is predicated on the idea that the traditional methods in statistics were developed in situations that we can't really apply to the kinds of analyses and data sets we're using. Often, in the case of many frequentist statistics, they were developed in the context of strictly controlled treatment experimental studies, where we have the ability to model exactly how the analysis is going to happen before the data is collected.
In anthropological research--
observational research
in anthropology-- often
we forego the direct goal
of establishing causality
or treatment effects,
and focus instead on
trying to establish
reliable or robust associations
between measurements.
And there's also a small but, I think, very important difference between arguing for a causal mechanism and predicting a particular phenomenon without necessarily making direct claims about the causality.
This is, of course, the same idea when it comes to Bayesian modeling techniques applied to predicting outcomes.
So much like election predictions and Moneyball and the new wave of analyses that we see in social science, often this takes on tools drawing from Bayesian approaches, from information theory, and from machine learning methods as described, for example, in Hastie, Tibshirani, and Friedman.
These are all
techniques which are
coming into anthropological analyses
just as much as they are
into other social sciences.
One other aspect, which comes up quite a bit in anthropology especially, and which we may be talking about the most here, is the difference between exploratory and hypothesis-oriented research-- or, the way that Brian Nosek put it yesterday in his talk, thinking about looking at something prospectively as an exploratory approach or as a justification that scrutinizes an existing theory.
And this is another great
paper by Paul Rosenbaum,
which was written
specifically for psychology.
But I think it's applicable
to anthropology as well,
as a justification
for what we can
call nontraditional types
of papers, papers that
are oriented around
exploring data
sets in a way that could
be described negatively
as data dredging or HARKing-- hypothesizing after results are known-- both of which have relatively recently been identified as serious problems behind the non-reproducibility plague that seems to be happening under our noses.
But from an anthropological
point of view,
this is often all we can do.
We don't have necessarily
strong hypotheses
coming into a particular
field site or location.
And we struggle
with the framework
of a strict hypothesis
testing sort of idiom,
simply because in
many cases, it's
not appropriate, or
the analysis itself
motivates particular
relationships
between variables.
So how to navigate that
with the obvious problems
with things like harking
is something that I think
is worth talking about more.
I will mention one particular study that I was part of that came out this year, because I thought that it had a relatively novel approach, at least in anthropology, to the data analysis and the presentation.
And as Melanie mentioned,
sometimes the reproducibility
of the methods is possibly
the best we can do,
because we're dealing with field
populations or with historical
archaeological data, which
cannot be replicated.
We cannot do another study from
the same source and collect
more data from that source.
So this is a
particular project that
was spearheaded by Siobhan
Mattison at the University
of New Mexico collecting
data on parental investment
and decision making
in a minority
group in southwest China.
And the particular research question that she proposed was: is there evidence that women facultatively change their reproductive stopping behavior? That is, the number of children that they will have, conditional on the current sex composition of the children that they have.
So the way that we visualize
the particular results
of this model-- we
have two different sets
of communities within
this population--
the Moso of southwest China.
And these sort of
honeycomb shapes
here represent a sequence
of birth decision making,
depending on the existing
number of children
and how many children
are boys versus girls.
So starting at the top of
each one of the pyramids,
we imagine a woman
with no children
having a probability of
continuing reproduction--
the probability of having
another child-- the parity
progression ratio as it's
called in demography.
And the orange pyramid starts
at 93% and then the blue pyramid
88.9%.
I'll explain what the
colors mean in a second.
And then conditional on having
a particular child, either a boy
or girl, we moved
down the pyramid
into one of the
corresponding cells.
And we have another continuation probability-- the probability of continuing on to have another child.
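The parity progression ratio described here can be estimated directly from birth histories. The sketch below is a minimal illustration in Python with invented toy data-- it is not the study's actual analysis, which was done in R, and the function name and data format are hypothetical.

```python
# Minimal sketch of the parity progression ratio from the talk:
# P(having another child | current number of sons and daughters).
# The birth histories below are invented toy data, not the Moso data.
from collections import defaultdict

def parity_progression(histories):
    """Estimate the probability of continuing reproduction at each
    (sons, daughters) state from a list of completed birth sequences,
    e.g. ["F", "M"] is a family with a daughter and then a son."""
    continued = defaultdict(int)  # times a state was followed by a birth
    at_state = defaultdict(int)   # times a state was ever occupied
    for seq in histories:
        sons = daughters = 0
        for i in range(len(seq) + 1):
            state = (sons, daughters)
            at_state[state] += 1
            if i < len(seq):  # another birth follows this state
                continued[state] += 1
                if seq[i] == "M":
                    sons += 1
                else:
                    daughters += 1
    return {s: continued[s] / at_state[s] for s in at_state}

# Toy example: three completed birth histories.
ratios = parity_progression([["F", "F", "M"], ["M"], ["F", "M"]])
# ratios[(0, 0)] is the probability that a woman with no children goes
# on to have one; every toy mother did, so it is 1.0 here.
```

Stratifying the input histories by community (patrilineal versus matrilineal villages) and comparing the resulting ratios state by state would mirror the comparison shown in the two pyramids.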
The stratification here
between the patrilineal
and the matrilineal
is theoretically
interesting in this
particular study,
because we have reasons
to think that groups where
property and status and
titles are inherited
through the mother's
line might have
a different view of the marginal
value of having another child
if it's going to
be a boy or a girl
than groups on the patrilineal side of things.
And in this
particular society, we
have both kinship systems
taking place simultaneously
in different villages in
the same ethnic group.
So there's a lot of
similarities between the two,
except for a very
marked difference
in their kinship systems.
And indeed, as you can see by the colorations here, we do see a relatively strong implication that women with two daughters in the patrilineal group have a much higher chance of continuing on and having a third child than women with two daughters in the matrilineal group.
So the reason that I mention this particular study is that we have the data and all of the materials, including the R code that this was done in, available on GitHub, with the link down at the bottom there.
And the paradigm that we're working in here uses language from open source software environments, with which I think there are a lot of idiomatic similarities.
So this is a potential visualization of a research project done under a strict version control system, using Git, for example.
And I think there's
a lot of similarities
in this picture to what we
saw during Brian's talk.
Each of these nodes represents
a particular snapshot
of a project continuing
on from the initialization
of the project all the way to
publication or post publication
peer review feedback
on some kind
of potentially Amazon
comments type system,
like we heard
during Alan's talk.
And the contention that I have here is that this kind of framework, at least in terms of laying out how a project might develop, is compelling for us as anthropologists, because it implies there's a chronological sequence that's stored and that can be shared.
This particular visualization also has different branches. We could imagine branches that we're willing to make public-- maybe the blue line at the top there, the master branch-- and then other components of the development process that, for reasons of confidentiality or sensitivity, we don't have the ability to disclose or make available for public release.
And this is maybe the last point: in anthropology in particular, it's very often the case that anthropologists see themselves not only as researchers of a particular population, but also as advocates for that population.
There's usually a very
large power difference
between the groups
that we work with
and ourselves as researchers.
And so that informs the way
we look at data releases.
And I can say, just speaking
from my own ethnographic
experience with
anthropology, we generally
are very much data hoarders.
And part of the reason why is that we're afraid there are going to be unintended consequences of making too much information available about the groups we work with.
So in that case, in this particular project with the Moso, I felt that the solution, or the happy medium, was to make available strictly the variables and the data points that were used for the analysis, and nothing else-- and, to some extent, scrambled so that they couldn't be used to match up to individual people, without distorting any of the signals in the data analysis.
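That kind of minimal, scrambled release can be sketched as follows. This is a hypothetical illustration in Python, not the authors' actual procedure, and the field names and `deidentify` helper are invented for the example.

```python
# Hypothetical sketch of a minimal data release: keep only the
# variables used in the analysis, attach random anonymous codes, and
# shuffle row order so released records cannot easily be matched back
# to individuals. This is NOT the Moso project's actual procedure.
import random

def deidentify(records, keep_fields, seed=None):
    """records: list of dicts; keep_fields: analysis variables to keep.
    Returns shuffled copies containing only keep_fields plus a fresh
    anonymous id; all other fields (names, villages, ...) are dropped."""
    rng = random.Random(seed)
    slim = [{k: r[k] for k in keep_fields} for r in records]
    rng.shuffle(slim)  # break the link between row order and identity
    for new_id, row in enumerate(slim):
        row["id"] = "anon-%04d" % new_id
    return slim

# Invented example records; field names are illustrative only.
raw = [
    {"name": "A", "village": "X", "kinship": "matrilineal", "n_children": 2},
    {"name": "B", "village": "Y", "kinship": "patrilineal", "n_children": 3},
]
released = deidentify(raw, keep_fields=["kinship", "n_children"], seed=1)
# 'released' still supports the analysis (kinship, n_children) but
# carries no names or village identifiers.
```

Because the analysis variables themselves are untouched, any model fit to the released rows gives the same estimates as one fit to the raw data; only the identifying context is removed.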
So with that, I think we should mention two developments here that I think are very hopeful or insightful in anthropology. Mel is responsible for one of the first projects on the Open Science Framework in anthropology-- pre-registering an analysis she's doing right now on age at menarche.
And also, to again plug the badges, which are, of course, the other Open Science Framework innovation here.
This is something which
anthropologists here at Yale
have advocated for
in the International
Journal of Primatology.
And you can see
the citation here.
So yeah.
In conclusion, we
think that there's
a lot of scope for these
ideas in anthropology.
But there's some concerns or
qualifications that are maybe
unique to our particular field.
And thank you.
[APPLAUSE]
