- Hi, my name's Casey Canfield
and I'm an assistant professor
in Engineering Management
and Systems Engineering
at Missouri University of
Science and Technology.
I'm here to talk about
Framing Transparency
as an Ethical Responsibility
in PhD Data Science.
I teach a graduate-level course
called Advanced Engineering
Management Science.
This is a new course that I developed
when I joined the faculty in 2018.
And so far, I've taught it twice.
This course was largely inspired
by the applied data analysis course
I took from Alex Davis at
Carnegie Mellon University.
For the students, this
is their first exposure
to statistics, data science,
and programming in R.
It's a small graduate-level course
that's required for all engineering
management PhD students.
In the second week of class,
when they're still getting
their feet wet with R,
I introduce the concept of ethics
and frame the importance
of learning statistics
in the context of scientific integrity.
We talk about how, "Data science
doesn't just predict the
future, it causes the future,"
which is a quote from Cathy O'Neil,
who wrote Weapons of Math Destruction.
She talks about how the
most dangerous algorithms
are important, scalable, and secret.
I encourage my students to work on things
that are important and scalable
because that's how you make
the world a better place,
but it should not be secret.
So if we want to avoid secrets,
we want to be transparent.
As Richard Feynman put it,
"Scientific integrity entails
bending over backwards
to show how you might be wrong."
We want to document
everything and make it public.
I introduce the concept
of the nine circles of scientific hell.
Neuroskeptic drew this comic
inspired by XKCD and Dante's Inferno.
The levels range from
overselling in level two,
which is related to overstating
the results of our analysis,
to inventing data in level nine.
In class, we talk about
how there's this gray area
between unethical data
analysis and misconduct
that involves intentional deception.
We want to do ethical data analysis,
which involves not deceiving
our audience with statistics.
Sometimes, we have to worry
about accidentally deceiving ourselves.
There's lots of research about heuristics
and biases that influence
human perception.
For example, there's confirmation bias,
which is a tendency to focus on evidence
that is consistent with our expectations.
And hindsight bias, which is a tendency
to perceive events as predictable
after they have occurred.
These biases affect all
people, including scientists.
Before class, I ask students
to read Silberzahn et al.'s paper,
"Many Analysts, One Data Set."
I really like this paper
because it goes to great lengths
to show how there are many different ways
to approach a single data set,
and even the most conscientious
data analysts can disagree.
This emphasizes that
there's a structural problem
that can best be solved with transparency.
The graph shown here is
from their Nature paper
that summarizes this work.
Basically, they had 29
teams analyze the same data
to determine whether dark-skinned
players are more or less likely
to receive red cards
in professional soccer games.
Of the 29 teams, 20 found a statistically
significant correlation between
skin color and red cards.
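To make the analyst-degrees-of-freedom point concrete in R, the language the course uses, here is a minimal sketch on simulated data, not the Silberzahn data: two defensible specifications fit to the same data set yield different estimates and p-values for the same question. All variable names here are hypothetical.

```r
# Illustrative only: simulated data, not the red-card data set.
set.seed(42)
n <- 200
exposure   <- rnorm(n)                              # hypothetical predictor
confounder <- 0.5 * exposure + rnorm(n)             # correlated covariate
outcome    <- 0.15 * exposure + 0.6 * confounder + rnorm(n)

# Analyst A: simple model, no adjustment.
model_a <- lm(outcome ~ exposure)
# Analyst B: adjusts for the covariate.
model_b <- lm(outcome ~ exposure + confounder)

# Same data, same question, different reported effects.
p_a <- summary(model_a)$coefficients["exposure", "Pr(>|t|)"]
p_b <- summary(model_b)$coefficients["exposure", "Pr(>|t|)"]
```

Both choices are defensible, which is exactly why transparency about the choice matters more than hunting for the "right" model.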
While discussing this paper,
I highlight the implications
of drawing conclusions,
the effects of uncertainty,
and how this relates to how
they are going to do data science.
So, to address this problem
related to all the degrees
of freedom that a data analyst has,
we need structural solutions.
This isn't about getting
rid of the bad apples.
We talk about all the challenges
for reproducible science
and how there are opportunities
to shift the environment
in which we work to be more transparent.
For the purposes of the class,
we focus on developing skills
related to preregistering studies,
understanding methodology,
employing checklists
for reporting, and making our
code readable by other humans.
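As a small illustration of that last skill, making code readable by other humans, here is a hedged R sketch (hypothetical data and variable names) of the style we aim for: descriptive names and comments that record the pre-specified analysis decision, not just the mechanics.

```r
# Hypothetical data standing in for a real study file; in practice
# this would be read from a documented, version-controlled source.
set.seed(1)
trials <- data.frame(
  treatment     = rep(c(0, 1), each = 10),   # 0 = control, 1 = treated
  age           = round(runif(20, 20, 60)),  # age in years
  recovery_days = c(rnorm(10, mean = 12), rnorm(10, mean = 10))
)

# Pre-specified analysis (documented before seeing the data):
# does treatment predict recovery time, controlling for age?
recovery_model <- lm(recovery_days ~ treatment + age, data = trials)
summary(recovery_model)
```

The point is that a stranger, or your future self, can reconstruct what was decided and why without asking you.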
After class, I have the students
knit their first R notebook
and ask them to write a paragraph
about which of the nine
circles of scientific hell
sounds like the hardest to avoid and why.
Most students focus on overselling
and post-hoc storytelling.
I'd love to hear your ideas
about how to teach ethical data analysis.
I can be reached at canfieldci@mst.edu.
I've also included the
full references here.
Thanks.
