Good afternoon.
Hello and a warm welcome to the second
Lunch Hour Lecture this term.
Hello also to those
who are watching us online,
who can also ask questions,
just as you in the room can.
If you are online
and you want to ask questions,
then you can either go to Twitter
@UCLLHL
or the Slido site,
for which the hashtag is 8821.
Having done that, I'm very pleased
to introduce Professor David Shanks
from the Department of Psychology
and Language Sciences,
with his lecture entitled,
'Does Social Science tell the truth?'.
Thank you David.
Thank you very much.
Obviously, I'd like to be able
to stand here
and give an affirmative answer
to this question that I've posed
but I'm afraid the social sciences
have taken something of a battering
in the last few years.
We've come to learn
that many of the methods we use
may not be quite as robust
as we would like them to be.
We now have major ongoing efforts
to check the reproducibility
and reliability
of the kinds of things
we have taken for granted,
the findings that we have put faith in
and built our theories on.
This scrutiny of our methods is taking
place across all the social sciences,
in economics, in political science,
education research,
linguistics,
sociology and so on.
The problems
that I will be talking about
and probably the remedies as well,
are pretty general
across the social science spectrum.
I'm a psychologist, as you've heard,
and I'm going to be giving
a brief chronology
of some of the events
in the last few years
that have been particularly influential
on my thinking about these issues.
It is a little bit psychology oriented
but I do want to emphasise that these
concerns span all the social sciences.
The other thing to say is that
psychology, regrettably,
has really borne the brunt
of a lot of criticism.
We have been very badly tarnished
by some of the things discovered.
The good news is that that has motivated
many of my colleagues in the field
to be at the forefront of thinking about
remedies and methods
to create stronger social science
and that is what I want to do.
I will pick out some particularly salient
events from recent history,
tell you what the original event was,
and use them as case studies
to introduce some of the methods
that have been developed in the
intervening time, up to the current day,
methods which perhaps will help us build
a more solid stock of evidence,
so that somebody giving a talk
under this title in twenty years' time
will give a rather different answer.
Our story begins around
six or seven years ago, in 2010,
and I want to introduce you
to this lady in the top left,
whose name is Amy Cuddy.
She was a co-author in 2010
of a very attention-grabbing study.
You may have come across this notion
of power posing.
This was a study done by Dana Carney,
Amy Cuddy and Andy Yap,
published in a high-profile, prestigious
psychology journal.
Basically what they reported
in their study,
was that striking a power pose,
shoulders back, chest out,
that assertive pose,
for just a couple of minutes
could have striking effects.
Not only internally but also externally.
For example,
they measured testosterone levels
in their participants in this study
and they found
that striking this power pose,
as illustrated by these images
at the bottom,
elevated testosterone levels
compared to a control group where
participants didn't adopt that pose.
Of course, testosterone is associated
with aggression, assertiveness, status.
You can imagine if you were going
to go into a job interview
you might want to get a little boost
to your assertiveness
and an incredibly simple two-minute
intervention in which you take this pose
could have powerful effects.
They also found that
after taking a power pose
participants were much more willing
to accept risk, to take risky choices.
Well...
That became a well-discussed
and broadcast piece of research.
Picked up in all sorts of media outlets.
This is the image of Amy Cuddy,
who has built a big career
around the power posing literature
and she gives lectures on it
and has written a book on it.
This is the front of her TED talk.
You can see the number at the bottom,
it says 36 million.
That is how many people have viewed
her power posing video.
It is one of the all-time
most-viewed TED talks.
It has been incredibly influential.
You may recognise this individual.
He was ridiculed at the Conservative
party conference a couple of years ago,
for coming out and standing like this.
Everybody thought, 'What is he doing?'.
Maybe he had this idea
that by striking this power pose
he could not only exert an impression
of power on the audience
but also boost his own testosterone
and give a more assertive speech.
Not that it did him
a great deal of good.
Why am I telling you all this?
I am telling you all this
because it turns out
probably to be completely untrue
that power posing
can have these effects.
The story continues
to the present day.
This is a post on Dana Carney's website
from just last week,
perhaps two weeks ago.
She posted a statement saying that,
in the light of numerous failures
to reproduce these findings
over the intervening five or six years,
she no longer believes the effect is real.
"As such I do not believe
the 'power pose' effects are real."
Amy Cuddy, one of the co-authors,
has completely distanced herself
from that conclusion;
she still believes the effects are real.
There are many lessons
from this kind of example
and I could have given you
many examples of this,
of influential research that has
turned out to be difficult to reproduce.
The one lesson I want to draw from this
is around this obsession
that social scientists have
with this notion
of statistical significance.
We do our studies and at the end,
we come up with a thing called a P value
which many of you will be familiar with
but if you're not, the idea is that you
calculate a probability:
the probability of getting an effect
at least as large as the one you observed
if there really is no difference.
If the truth is that there's no effect
of power posing,
how likely is it
you would see a difference
between the control group and the
experimental group of the sort you saw.
Convention has it that one seeks to find
a probability that is less than 5%,
that is, below 0.05.
There is nothing magical about that 5%.
It has just become embedded in the way
that behavioural researchers,
that social researchers,
do their studies.
If you do your study and you find
the effect is statistically significant,
meaning the probability is less than 5%,
then glory awaits you.
You will be published in a high-prestige
journal like Psychological Science.
You will get promoted, you will get
the invitation to do a TED talk.
If your effect is not significant,
if the p-value comes out higher than 0.05,
what happens?
Probably what happens is your study
disappears into your filing cabinet
and never sees the light of day.
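To make that workflow concrete,
here is a minimal sketch in Python.
The numbers are simulated for illustration
and are not the Carney et al. data;
the group sizes, means and units are assumptions.
Two groups are compared with a t-test,
and the single resulting p-value
is then judged against the 0.05 convention.

import numpy as np
from scipy import stats

# Hypothetical, simulated data: a "power pose" group and a control group.
rng = np.random.default_rng(0)
power_pose = rng.normal(loc=62.0, scale=18.0, size=21)   # assumed scores
control = rng.normal(loc=58.0, scale=18.0, size=21)

# The conventional analysis boils down to a single p-value.
t, p = stats.ttest_ind(power_pose, control)
print(f"t = {t:.2f}, p = {p:.3f}")

# Convention: p < 0.05 is called "statistically significant" and tends to be
# published; anything above 0.05 tends to end up in the file drawer.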
One thing we have learnt
in the intervening period
during which these replication failures
have been accumulating
is that this obsession with P values
is a very poor way
of assessing the evidence
we are acquiring in our research.
Other methods have become popular
in the intervening period
and here I am just showing you
a screenshot of a piece of software
that has been developed
and is now freely available called JASP.
This is a statistical software package
that is free to download
and you can use this to do
conventional statistical testing
and it will give you a P value,
but what is more interesting is that
you can use it
to do completely different forms
of evaluation of evidence,
particularly around Bayesian methods.
Bayesian methods don't yield a P value.
P value has nothing to do with them.
They are methods
for aggregating evidence
and telling you
how much support you have
for your experimental hypothesis.
I think it is quite important
that these forms of software
are available freely
and that you do not need
to be a statistical expert.
You don't need a PhD in statistics
to be able to understand
what these tools do
and incorporate them in your research.
Another thing these
alternative statistical methods
illustrate very obviously
is that for many studies,
even though they might achieve
this magical 0.05 level,
a Bayesian approach will tell us that
the sample sizes are way too small
and this is epidemic
in social science research.
We study the phenomena
that we are interested in
with sample sizes in our experiments
that are much too small.
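As a flavour of the alternative,
here is a minimal sketch, not JASP itself,
that puts a rough Bayes factor
next to the p-value
for the same kind of two-group comparison,
using the standard BIC approximation.
The data are again simulated
and the group sizes are assumptions;
the point is only the shift
from a significance threshold
to a measure of relative evidence.

import numpy as np
from scipy import stats

# Simulated groups with a small assumed effect of 0.2 standard deviations.
rng = np.random.default_rng(1)
pose = rng.normal(0.2, 1.0, size=30)
control = rng.normal(0.0, 1.0, size=30)

# Conventional analysis: one p-value.
t, p = stats.ttest_ind(pose, control)

# Rough Bayes factor via the BIC approximation: compare a model with one
# common mean (H0) against a model with separate group means (H1).
def bic(residuals, n_mean_params, n):
    return n * np.log(np.mean(residuals ** 2)) + n_mean_params * np.log(n)

data = np.concatenate([pose, control])
n = data.size
resid_h0 = data - data.mean()
resid_h1 = np.concatenate([pose - pose.mean(), control - control.mean()])
bf10 = np.exp((bic(resid_h0, 1, n) - bic(resid_h1, 2, n)) / 2)

print(f"p = {p:.3f}, approximate BF10 = {bf10:.2f}")
# A BF10 close to 1 says the data barely favour either hypothesis: with
# samples this small, even a "significant" p-value carries little evidence.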
Well, not long after that work
came to prominence,
there was another development
which I think played quite a big role
in how social sciences
are reforming themselves.
I want to illustrate this by reference
to a website
that was created by Hal Pashler,
Bobbie Spellman and their colleagues,
around 2011.
I have emphasised that
one of the problems
with this magical 0.05 cut-off
is that if your study doesn't give
you a statistically significant result,
there is a strong tendency
to bury it in your filing cabinet.
The problem with that is that it means
things that do get published in journals
are actually not a proper representation
of the state of the world.
They are a biased sample
of the evidence.
We now take it for granted
that we live in a cloud-era,
where storing information and making
it publicly available on websites
is essentially free and trivial,
almost costless in terms of time
and effort as well.
Back in 2011, this was quite a novelty
and the idea behind this website was
that it simply operated as a repository
for researchers to upload brief reports
about any replication experiments
that they had undertaken.
The sorts of things they maybe wouldn't
even try to publish in a journal;
even if they did try,
the likely response would be that
the journal wouldn't be interested.
Rather than burying them
so that they are publicly invisible,
the website allows these reports
of the studies
to be publicly available for
other researchers
to see how much evidence there is
and how easy or difficult it has proved
to replicate a study.
Very quickly when this website
was created,
it was found that there were hotspots.
There would be particular findings
in the literature,
like the power pose finding,
where lots of people would be uploading
descriptions of their failed experiments.
Then of course that gains traction
and can make its way through ultimately
to correcting the literature.
This was 2011,
now we find that many journals
have actually created special sections
for routinely publishing
replication attempts.
There is recognition of this difficulty
of disseminating
failed replication studies
and trying to make that easier
through the normal publishing process.
Well 2011 was a bad year for psychology
in another respect,
which is that this individual,
you may recognise, Diederik Stapel,
was unmasked as a fraud,
as a data fabricator.
He lost his job as a consequence of that
and subsequently has had to retract
58 journal articles.
He had been making up data
for quite a long period of time.
This is all now on record
because he wrote an autobiography
which is freely available on the
internet
and is a fascinating read
for a variety of reasons.
We have it in his own words that he was
engaged in data fabrication.
Unfortunately, we have found
that he is not alone.
These are some other individuals who in
the last few years
have also been revealed
to be involved in data malpractice.
If you look at that and think
there is a gender asymmetry there,
there is now quite good evidence
that data malpractice is actually much
more prevalent amongst men than women,
even controlling for the different
proportions of males and females
in different scientific disciplines.
The reason that I am referring to this
is because in at least a couple
of these cases,
particularly around Lawrence Sanna
and Dirk Smeesters,
the evidence that they had been engaged
in data fabrication
came from statistical detective work
undertaken by people
who looked very carefully
at some of their
published research articles.
They found various things in those reports
that you would have expected
to be random but which looked non-random.
This eventually led to the unravelling of
the fabrication that had been going on.
Of course, that kind of data exploration
can occasionally be done
on the very small amount of data
that makes its way
into the published research article.
It would be much healthier
if all the raw data collected
in your research
was publicly available.
The actual behaviour of each participant
in your experiment,
in addition to the summary data
in the final publication.
Open data becomes a really important
aspect
of attempts to weed out data
irregularities
because of course you would
have to be a very brave person
to fabricate data at the level of
individual participants
and make it
publicly available
because it is very difficult
to fake randomness on a large scale.
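As a flavour of that statistical detective work,
here is a minimal sketch
with wholly invented numbers.
One simple check of this kind asks whether
the terminal digits of reported values
are uniformly distributed,
as they generally should be
in genuine measurements;
a chi-square goodness-of-fit test
can flag digits that look
suspiciously non-random.

import numpy as np
from scipy import stats

# Wholly invented "reported" values, for illustration only.
reported = np.array([4.12, 3.92, 4.32, 4.12, 3.72, 4.52, 4.12, 3.92,
                     4.32, 4.12, 4.52, 3.92, 4.12, 4.32, 3.72, 4.12])

# Take the last digit of each value and test it against a uniform
# distribution over 0-9.
last_digits = (np.round(reported * 100) % 10).astype(int)
counts = np.bincount(last_digits, minlength=10)
chi2, p = stats.chisquare(counts)

print(f"chi-square = {chi2:.1f}, p = {p:.2g}")
# A tiny p-value here is a red flag: these invented values all end in the
# same digit, something genuine noisy data would almost never do.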
Open data sounds good in principle,
but how achievable is it?
Almost every journal has a policy
that if you publish in the journal
you should make your data available.
We know that this doesn't work.
We know if you send off requests
to 100 random authors
of published articles,
asking for their data,
your success rate
will be something like 25%.
Authors come up with all sorts of
reasons
why they don't feel it's appropriate
to let you see their original data.
Some reasons may be perfectly valid,
if they have done a trial
of a particular treatment in medicine,
for example, where there might be worries
about anonymity,
but in the main,
it is difficult to get data
and you come up with a lot of 'the dog
ate my homework' type responses.
So what can we do?
What I want to tell you about now
is quite a nice psychology intervention,
which looks quite promising.
What I am showing here
is an article from a recent issue
of Psychological Science,
the actual content of this article
is not the point here.
I want to point your attention
to those three coloured symbols
appearing on the title page
of this published article.
These are badges.
The blue badge is awarded to authors of
a paper
if they make their data publicly
available
through an open or public repository
at the time of publication.
The orange badge is for making
the experimental materials available.
The red badge is for pre-registering the
study,
which means giving a full description of
what the study is going to entail
and how the data
are going to be analysed,
prior to any data collection
taking place.
You get these little badges.
You might think, 'How is that
going to affect anybody's behaviour?'
It is a trivial little symbol appearing
on the front page of an article.
In actual fact, there is reason
to believe that this kind of badging
operates as quite a successful
nudge to authors.
What is shown here is some data
from a study undertaken last year.
My PhD student Tom Hardwick
was part of this study.
They looked at the proportion
of articles in Psychological Science
that made their data openly available,
before and after the introduction
of the badge mechanism.
You can see that it shoots up
after the dotted line, which is when
the badges were introduced.
Along the bottom, in these grey bars,
are a series of comparator journals,
in the same sort of area,
which haven't introduced badges.
Of course that is not conclusive proof
but it is pretty suggestive
that you can get small nudges
to influence people's behaviour.
There is evidence that people want
to adopt good practices
in their research
and maybe they just need
a little bit of a nudge to do so.
Still around 2012, 2013,
we now turn to an intervention
by this individual,
whom perhaps some of you will recognise
as Nobel Prize winner Danny Kahneman.
Kahneman won the Nobel Prize
in Economics,
but let me make it clear
that he is a psychologist.
A card-carrying psychologist.
In 2012, when these issues
of replication difficulties
were beginning to get a lot of coverage,
Kahneman sent round an email
to a closed group of colleagues,
expressing grave concern
about the health
of some aspects of research
in psychology.
Particularly around
some aspects of social psychology.
When a Nobel Prize winner
expresses views
on the health of a field,
people tend to pay attention to that,
quite rightly.
They also paid attention
because Kahneman used
some quite evocative language.
This was a news item
reporting his email.
His email went completely viral, it got
circulated to more and more people
and eventually it became
a big news story in its own right.
He said, "I can see
a train wreck looming."
This is powerful and evocative language.
Particularly around these areas
in social psychology.
"Your field is now the poster child
for doubts
about the integrity
of psychological research."
Roughly at this time,
I myself started to become interested
in the kinds of research
that Kahneman was pointing to,
and this was really research about
how you can tweak people's behaviour,
how you can influence it with
very subtle or unconscious signals
that you provide to them.
It turned out there were examples of
this that were probably not reproducible,
despite their being very prominent
and figuring in textbooks and so on.
I want to tell you about
one example of this
that we have had a look at,
which leads into another recommendation
and another area in which research
has been trying to modify itself.
This is a book which summarised a lot of
research carried out
by Douglas Kenrick and Vladas
Griskevicius and their colleagues
all around the idea that the brain
is made up of a series of modules,
so you have a module
for gaining status for instance.
You have a module for mating.
These modules have a long
evolutionary history to them.
Subtle signals, unconscious signals
in your environment
can activate these modules
and affect your behaviour,
particularly around things
like taking risks.
This is the kind of research
that fell under this umbrella heading.
This is a study, I won't go into the
details, but it reported that
if you subtly suggest to women
that they might have a sexual rival
for their partner,
that apparently increases
their willingness
to consume dangerous dieting pills
or to go and get a suntan.
We were a little sceptical
about some of these findings.
This is one of the things
that we did in our research.
We looked at all of the studies that had
been conducted
on these mating effects
on risk-taking behaviour.
In this graph, each of the data points
is a study
that was conducted and published.
On the X axis is the size of the effect.
So you can see that
everything is greater than 0.
0 here would mean no effect being found
of this subtle, unconscious influence.
The further right the data points are,
the bigger the effect.
They vary. An effect of 1 or 1.5
is quite a large effect.
What also is plotted in the graph,
is the precision of the estimate
and this is basically measured
in terms of sample size.
Very large sample sizes, big experiments
comparing large groups,
are higher up on the Y axis
and small studies
down here lower on the Y axis.
There is clearly a strong relationship
between these two things.
What does that tell us?
Really, there should not
be a relationship.
There is no reason why the size of your
sample
should affect the estimate
you make of an effect.
One likely explanation for what is
happening here
is the publication bias effect I talked
about in terms of the file drawer.
If you do a small study,
you have to get a very big effect for it
to be statistically significant
and for it to be publishable
in a journal.
If you do an experiment
with small sample sizes,
if there is no true effect at all,
you are quite likely to get a
non-significant result
and the study disappears into your
filing cabinet and never gets published.
That is why you might expect to see data
points down here to the right.
If you do a very large study,
it can yield statistical significance,
even if the effect size it acquires
is quite small.
I hope you can see that it is not too
hard to imagine
that if you extrapolate up here
a little bit,
and do a study again
with quite a large sample,
you will probably get an effect
that is vanishingly small.
Indeed, we made many attempts to
replicate these experiments,
with a complete lack of success.
All our data points fell up here.
The conclusion
that I want to draw from this
is that we need methods,
as we have used here,
for analysing
entire sub-fields of research,
not just looking at individual studies
and asking whether they are reproducible
but methods called meta-analytic methods
where you look at a whole body
of research to assess its robustness.
This is a meta-analysis
in which we are pooling
lots of different studies.
We can't from this draw any conclusion
about individual studies
but we can say that the entire body
of research
is looking quite dubious
in the conclusions
that it might want to support.
Now we have many methods
being developed
in the meta-analysis field,
and these are just a few examples:
techniques being devised
that you can apply to a body of research
to ask whether it is healthy or not.
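One such technique can be sketched
in a few lines.
With hypothetical effect sizes
and standard errors standing in
for a published literature,
a simplified Egger-style regression asks
whether the smaller, less precise studies
systematically report the larger effects,
which is exactly the asymmetry
described above.

import numpy as np
from scipy import stats

# Hypothetical published studies: effect size (Cohen's d) and standard error.
effect_sizes = np.array([1.40, 1.10, 0.90, 0.80, 0.60, 0.50, 0.35, 0.20])
standard_errors = np.array([0.55, 0.50, 0.42, 0.38, 0.30, 0.26, 0.20, 0.15])

# In an unbiased literature, effect size should not depend on precision.
slope, intercept, r, p, _ = stats.linregress(standard_errors, effect_sizes)
print(f"slope = {slope:.2f}, r = {r:.2f}, p = {p:.4f}")

# A strong positive slope, with the least precise studies reporting the
# biggest effects, is the funnel-plot asymmetry that suggests publication bias.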
How do we go about
reforming social science?
I have suggested a few things
to look at:
alternatives to this magical
P value cliff,
much more visibility and
accessibility for replication studies,
open data, pre-registration,
badges and so on.
I hope that by developing
these sorts of methods
and embedding them in the field
and training young researchers
with these best practice methods,
we will be able to arrive in the future
at a slightly more robust
social science.
Thank you very much, David.
I enjoyed that.
Now it's time for questions.
It's open to you on the floor.
When you ask a question,
hang on for the microphone.
There's an online one down here.
Hello, David. Thank you very much.
This question is coming from
our Slido page,
and it is related to how you opened
talking about power posing.
The question is:
Do you think popular science
and social media have impacted
what is regarded as true?
That is a very interesting question.
Clearly, I think there are concerns
about the connection between academic
research
published in scholarly journals that
has gone through an editorial process
and then how that makes its way into the
media and how that gets picked up.
Examples like the autism
and MMR episode
highlight, beyond question,
the worries that can arise as a result
of getting that translation wrong.
I think there is responsibility
at both ends.
I think clearly academics
need to recognise
the responsibility they take on when
they give a message out into the world
particularly on something that could be
a matter of health
or life and death
to people.
But one does also worry sometimes
that the media and news outlets
just have
this uncritical desire
to absorb anything
that any scientist says,
and that can lead to concern as well.
There's a question just behind,
right where you are.
I'm just wondering what you think
about a different field, medicine,
where I think the same sort of thing,
I'm not working in the field,
but especially with regard to open data,
and this recent case
of the statin business,
where NICE have taken
some guidelines over,
but there is still a lot of data in that
that has not been released, I believe,
I follow a group called All Trials
to try and get open data,
but have you got any information?
Of course, in the medical field,
these issues around pre-registering
studies
and making the data openly available
have been part of the discussion
for a long time.
You have concerns about drug trials
for example
where drug companies are not
making failed trials publicly available.
In terms of open data, in medicine
and in some other areas as well,
one does have to be extremely careful.
There are all sorts of issues
about confidentiality and anonymity,
when a participant comes along
to take part in your study
whether it is a medical study
or a study of power posing,
they sign a consent form,
the study has been through
an ethical approval.
Did that ethical approval
explicitly request
that the participant would consent
to their data being made
publicly available?
If you are looking at a study
done five or ten years ago
and people are asking for the data
to be made available,
if the participants
did not consent to that,
it is quite difficult for researchers
to make the data available.
There are issues about anonymity.
If you are doing a study
on a particularly rare health condition,
if the data were made publicly
available, even anonymised,
would you be able to figure out
who a particular participant is?
These are quite tricky areas.
I would say that the UK research
councils and the government
have recently published a concordat
on open data
and it is quite nuanced in thinking
about some of these issues
but it does say, in many research areas,
there is huge scope
for more openness of data.
Thank you for that
and there is a question just here.
This is a completely
new topic to me entirely
but bearing in mind
what we've talked about,
particularly the last questions about
the use of data and responsibility,
what efforts do editors
of these very fine journals take
to inform readers,
particularly hysterical
mass media outlets,
to say that this was a very small sample
and what is more,
the authors of this study did not allow
their data to be open data
therefore, take it
with a massive pinch of salt
and indeed we are a bit embarrassed
to publish it
because it could all be a bit iffy,
with what we know about small samples.
The journals bear some culpability here.
I think you have journals
like Science and Nature.
We know there is good evidence
that research published in Science
is less likely to be true
than research published elsewhere.
For behavioural research.
These are commercial entities.
They are trying to justify
their high subscription rates.
One way that they do that is by
attention-grabbing research
on topics like power posing.
They could respond by not publishing
that kind of research.
If it is not open, if it does not have
big sample sizes and so on.
They are conflicted.
On the one hand,
wanting to sell subscriptions,
but on the other hand,
being gatekeepers of research
and we have a long way to go
to reconcile that.
There are some journals, like PLOS ONE,
that now demand open data.
There are some exceptions to that
which are reasonable,
but in the main
you have to publish your data.
That is a very positive step
for a journal to take.
The question that's over there.
What would you say about
using different statistical analyses
to work on the same data set?
Do you usually arrive
at different results,
especially when considering data
transformation and imputation?
You just look for
what is working sometimes.
Of course,
that is a very good question
and this is part of the problem,
in that the reason why many studies turn
out not to be reproducible
is because authors collected
small amounts of data
and they pummelled the data
until they achieved
a significant result,
they transformed them,
maybe they dropped participants
who they thought were outliers,
they did all of these things
after the fact,
after getting the data,
and that increases the likelihood
that you will get a false result.
So what is the remedy?
Partly the remedy is statistical.
To make it very clear what you have done
so that others can see the strengths
and weaknesses of what you have done
but also, if the data themselves
are there,
then others can try
other statistical techniques
to see whether you only get
a significant result
if you do the transformations
and so on.
Also, I briefly mentioned
pre-registration and that is important
because when you pre-register a study
you are tying your own hands.
You are saying how you are going
to analyse the data,
what is your sample size, when are you
going to end data collection and so on.
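That inflation of false positives
is easy to demonstrate by simulation.
The sketch below uses made-up
analysis choices:
it generates data with no true effect at all,
tries several post-hoc analyses,
and reports whichever one "works";
the false-positive rate climbs
well above the nominal 5%.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, hits = 5000, 0

for _ in range(n_sims):
    a, b = rng.normal(size=(2, 20))   # two groups, no true effect at all
    candidates = [
        stats.ttest_ind(a, b).pvalue,                                # raw data
        stats.ttest_ind(np.log(np.abs(a) + 1),
                        np.log(np.abs(b) + 1)).pvalue,               # transformed
        stats.ttest_ind(np.sort(a)[1:-1], np.sort(b)[1:-1]).pvalue,  # "outliers" dropped
        stats.mannwhitneyu(a, b).pvalue,                             # a different test
    ]
    if min(candidates) < 0.05:        # keep whichever analysis "worked"
        hits += 1

print(f"false-positive rate: {hits / n_sims:.1%}")   # well above the nominal 5%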
Question here.
Do you think there is anything to be
said for the idea
that for social sciences like psychology
there is a lot of pressure now to make
them as clear-cut
as the hard sciences
like physics or biology,
and that part of the problem is changing
the perception of social sciences
so they come to realise
there is never quickly going to be
a definitive answer
on how certain things affect
behaviour and things like that
and maybe it is partly to do with how
people perceive the social sciences
and how clear cut they are and can be?
That is an interesting question.
I would hope that for a psychological
study or a political science study
or a physics study,
what you do is done in a particular
situation,
a particular context,
with particular participants,
so to the extent that you can recreate
that,
we want the findings to be robust.
It is true that in social science
subjects
things probably do vary more
across contexts
and across time and so on
in ways things don't in physics.
Let me just make the point:
these problems are widespread
in the social sciences,
but I think they are perhaps a
bit wider than people recognise,
extending into physics and chemistry
and other subjects as well.
There are surveys that have been done
in which the vast majority
of practising chemists
estimate the reproducibility of research
in their field
to be way less than 100%,
something like 70 to 80%.
Now that is survey data,
but it strongly suggests that people
in those hard science fields
have had difficulty reproducing
findings that others have published.
Thank you very much,
I think that is the last question
we have time for.
In a world that is driven
by metrics and league tables,
we lean very heavily on the journals
to try and lead the way here,
beyond the badges technique.
Anyway, thank you very much indeed David
for a very interesting lecture.
