Reproducibility is the closeness of the agreement
between the results of measurements of the
same measurand carried out with the same methodology
described in the corresponding scientific
evidence (e.g. a publication in a peer-reviewed
journal). Reproducibility can also be applied
under changed conditions of measurement for
the same measurand to check that the results
are not an artefact of the measurement procedures.
A related concept is replicability, meaning
the ability to independently achieve non-identical
conclusions that are at least similar, when
differences in sampling, research procedures
and data analysis methods may exist. Reproducibility
and replicability together are among the main
principles of 'the scientific method', with
the concrete expressions of the ideal of such
a method varying considerably across research
disciplines and fields of study. The reproduced
measurement may be based on the raw data and
computer programs provided by researchers.
== About ==
The values obtained from distinct experimental
trials are said to be 'commensurate' if they
are obtained according to the same reproducible
experimental description and procedure. A
particular experimentally obtained value is
said to be reproducible if there is a high
degree of agreement between measurements or
observations conducted on replicate specimens
in different locations by different people—that
is, if the experimental value is found to
have a high precision. Both of these are key
features of reproducibility.
== History ==
The first to stress the importance of reproducibility
in science was the Irish chemist Robert Boyle,
in England in the 17th century. Boyle's air
pump was designed to generate and study vacuum,
which at the time was a very controversial
concept. Indeed, distinguished philosophers
such as René Descartes and Thomas Hobbes
denied the very possibility of a vacuum existing.
Historians of science Steven Shapin and
Simon Schaffer, in their 1985 book Leviathan
and the Air-Pump, describe the debate between
Boyle and Hobbes, ostensibly over the nature
of vacuum, as fundamentally an argument about
how useful knowledge should be gained. Boyle,
a pioneer of the experimental method, maintained
that the foundations of knowledge should be
constituted by experimentally produced facts,
which can be made believable to a scientific
community by their reproducibility. By repeating
the same experiment over and over again, Boyle
argued, the certainty of fact will emerge.
The air pump, which in the 17th century was
a complicated and expensive apparatus to build,
also led to one of the first documented disputes
over the reproducibility of a particular scientific
phenomenon. In the 1660s, the Dutch scientist
Christiaan Huygens built his own air pump
in Amsterdam, the first one outside the direct
management of Boyle and his assistant at the
time, Robert Hooke. Huygens reported an effect
he termed "anomalous suspension", in which
water appeared to levitate in a glass jar
inside his air pump (in fact suspended over
an air bubble), but Boyle and Hooke could
not replicate this phenomenon in their own
pumps. As Shapin and Schaffer describe, “it
became clear that unless the phenomenon could
be produced in England with one of the two
pumps available, then no one in England would
accept the claims Huygens had made, or his
competence in working the pump”. Huygens
was finally invited to England in 1663, and
under his personal guidance Hooke was able
to replicate anomalous suspension of water.
Following this, Huygens was elected a Foreign
Member of the Royal Society. However, Shapin
and Schaffer also note that “the accomplishment
of replication was dependent on contingent
acts of judgment. One cannot write down a
formula saying when replication was or was
not achieved”.
The philosopher of science
Karl Popper noted briefly in his famous 1934
book The Logic of Scientific Discovery that
“non-reproducible single occurrences are
of no significance to science”. The statistician
Ronald Fisher wrote in his 1935 book The Design
of Experiments, which set the foundations
for the modern scientific practice of hypothesis
testing and statistical significance, that
“we may say that a phenomenon is experimentally
demonstrable when we know how to conduct an
experiment which will rarely fail to give
us statistically significant results”. Such
assertions express a common dogma in modern
science that reproducibility is a necessary
condition (although not necessarily sufficient)
for establishing a scientific fact, and in
practice for establishing scientific authority
in any field of knowledge. However, as noted
above by Shapin and Schaffer, this dogma is
not formulated quantitatively, in the way that
statistical significance is, and therefore it
is not explicitly established how many times
a fact must be replicated to be considered
reproducible.
== Reproducible data ==
Reproducibility is one component of the precision
of a measurement or test method. The other
component is repeatability, which is the degree
of agreement of tests or measurements on replicate
specimens by the same observer in the same
laboratory. Both repeatability and reproducibility
are usually reported as a standard deviation.
A reproducibility limit is the value below
which the difference between two test results
obtained under reproducibility conditions
may be expected to occur with a probability
of approximately 0.95 (95%). Reproducibility
is determined from controlled interlaboratory
test programs or a measurement systems analysis.
Although they are often confused, there is an
important distinction between replicates and an
independent repetition of an experiment.
Replicates are performed within an experiment.
They do not and cannot provide independent
evidence of reproducibility. Rather, they serve as an internal
"check" on an experiment and should not be
shown as part of the experimental results
within a scientific publication. It is the
independent repetition of an experiment that
serves to underpin its reproducibility.
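The repeatability and reproducibility standard deviations and the reproducibility limit described in this section are typically estimated from a balanced interlaboratory study. The Python sketch below is one minimal way to do this, assuming the one-way analysis-of-variance estimates used in the ISO 5725 family of standards: every laboratory reports the same number n of replicate results on the same material, s_r (repeatability) comes from the within-laboratory scatter, and s_R (reproducibility) adds the between-laboratory variance to it. The function name `precision_from_interlab` and the example data are hypothetical. The limits use the factor 1.96·√2 ≈ 2.8, because the difference of two independent results that each have standard deviation s has standard deviation s√2, and 1.96 is the two-sided 95% point of the normal distribution.

```python
import math
from statistics import mean


def precision_from_interlab(results):
    """Estimate repeatability/reproducibility from a balanced interlab study.

    `results` is a list with one inner list per laboratory; every inner
    list must hold the same number n (>= 2) of replicate test results.
    This is a hypothetical helper, sketched after the ISO 5725 ANOVA approach.
    """
    p = len(results)      # number of laboratories
    n = len(results[0])   # replicates per laboratory
    if p < 2 or n < 2 or any(len(lab) != n for lab in results):
        raise ValueError("need a balanced design with >= 2 labs and >= 2 replicates")

    lab_means = [mean(lab) for lab in results]
    grand_mean = mean(lab_means)

    # Within-laboratory mean square: scatter of replicates around their lab mean.
    ss_within = sum((y - m) ** 2 for lab, m in zip(results, lab_means) for y in lab)
    ms_within = ss_within / (p * (n - 1))

    # Between-laboratory mean square: scatter of lab means around the grand mean.
    ms_between = n * sum((m - grand_mean) ** 2 for m in lab_means) / (p - 1)

    s_r2 = ms_within                                 # repeatability variance
    s_L2 = max(0.0, (ms_between - ms_within) / n)    # between-lab variance, clamped at 0
    s_r = math.sqrt(s_r2)
    s_R = math.sqrt(s_r2 + s_L2)                     # reproducibility standard deviation

    k = 1.96 * math.sqrt(2)  # ~2.8: 95% limit for the difference of two results
    return {"s_r": s_r, "s_R": s_R, "r": k * s_r, "R": k * s_R}
```

For the made-up data `[[10.0, 10.2], [10.4, 10.6], [9.9, 10.1]]` this gives s_r = √0.02 ≈ 0.141 and s_R = √0.08 ≈ 0.283, so the reproducibility limit is R = 0.784: two results obtained in different laboratories would be expected to differ by less than about 0.78 in 95% of cases. Clamping a negative between-laboratory estimate to zero is a common convention that keeps s_R from falling below s_r.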
== Reproducible research ==
The term reproducible research refers to the
idea that the ultimate product of academic
research is the paper together with the
laboratory notebooks and the full computational
environment used to produce its results, such
as the code and data, which can be used to
reproduce the results and to create new work
based on the research. Typical examples of
reproducible research comprise compendia of
data, code and text files, often organised
around an R Markdown source document or a
Jupyter notebook.
Psychology has seen a renewal
of internal concerns about irreproducible
results. A 2006 study showed that, of 141
authors of empirical articles published by
the American Psychological Association (APA),
103 (73%) did not share their data within a
six-month period. A follow-up study published
in 2015 found that 246 of 394 contacted authors
(62%) of papers in APA journals did not share
their data upon request. A 2012 paper suggested
that researchers should publish data along with
their works and released a dataset alongside
as a demonstration. In 2017, an article published
in Scientific Data suggested that this may not
be sufficient and that the whole analysis context
should be disclosed. In 2015, psychology became
the first discipline to conduct and publish an
open, registered empirical study of
reproducibility, called the Reproducibility
Project. A total of 270 researchers from around
the world collaborated to replicate 100 empirical
studies from three top psychology journals.
Fewer than half of the attempted replications
were successful.
For many years there have been initiatives to
improve reporting, and hence reproducibility, in
the medical literature, beginning with the
CONSORT initiative, which is now part of a wider
initiative, the EQUATOR Network. This group has recently
turned its attention to how better reporting
might reduce waste in research, especially
biomedical research.
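Publishing data together with the analysis context, as the 2012 and 2017 proposals in this section suggest, can be partly automated. The Python sketch below is a hypothetical example, not a standard format: it records a SHA-256 checksum of each input data file, the interpreter version, the platform and the random seed in a JSON manifest that could be shipped alongside the code and text of a research compendium. The function names and manifest fields are illustrative assumptions; real projects usually also pin package versions, for example with a lock file or a container image.

```python
import hashlib
import json
import platform
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files, seed):
    """Collect a minimal, hypothetical description of the analysis context."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "random_seed": seed,
        # A checksum lets a reader confirm they are analysing the same bytes.
        "data_sha256": {str(p): sha256_of(p) for p in data_files},
    }


def write_manifest(manifest, out_path):
    """Write the manifest as stable, human-readable JSON."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

A typical call would be `write_manifest(build_manifest(["data/raw.csv"], seed=2024), "manifest.json")`, where the path and seed are placeholders. Sorting the keys keeps the file byte-for-byte stable between runs, so differences in a later reproduction point at real changes in the environment or data.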
Reproducible research is key to new discoveries
in pharmacology. A Phase I discovery will
be followed by Phase II reproductions as a
drug develops towards commercial production.
In recent decades Phase II success has fallen
from 28% to 18%. A 2011 study found that 65%
of medical studies were inconsistent when
re-tested, and only 6% were completely reproducible.
In 2012, a study by Begley and Ellis was published
in Nature that reviewed a decade of research.
That study found that 47 out of 53 medical
research papers focused on cancer research
were irreproducible. The irreproducible studies
had a number of features in common, including
that studies were not performed by investigators
blinded to the experimental versus the control
arms, there was a failure to repeat experiments,
a lack of positive and negative controls,
failure to show all the data, inappropriate
use of statistical tests and use of reagents
that were not appropriately validated. John
P. A. Ioannidis writes, "While currently there
is unilateral emphasis on 'first' discoveries,
there should be as much emphasis on replication
of discoveries." The Nature study was itself
reproduced in the journal PLOS ONE, which
confirmed that a majority of cancer researchers
surveyed had been unable to reproduce a result.
In 2016, Nature conducted a survey of 1,576 researchers
who took a brief online questionnaire on reproducibility
in research. According to the survey, more
than 70% of researchers have tried and failed
to reproduce another scientist's experiments,
and more than half have failed to reproduce
their own experiments. "Although 52% of those
surveyed agree there is a significant 'crisis'
of reproducibility, less than 31% think failure
to reproduce published results means the result
is probably wrong, and most say they still
trust the published literature."
== Noteworthy irreproducible results ==
Hideyo Noguchi became famous for correctly
identifying the bacterial agent of syphilis,
but also claimed that he could culture this
agent in his laboratory. Nobody else has been
able to produce this latter result.
In March 1989, University of Utah chemists Stanley
Pons and Martin Fleischmann reported the production
of excess heat that could only be explained
by a nuclear process ("cold fusion"). The
report was astounding given the simplicity
of the equipment: it was essentially an electrolysis
cell containing heavy water and a palladium
cathode which rapidly absorbed the deuterium
produced during electrolysis. The news media
reported on the experiments widely, and it
was a front-page item in many newspapers around
the world (see science by press conference).
Over the next several months others tried
to replicate the experiment, but were unsuccessful.
Nikola Tesla claimed as early as 1899 to have used
a high frequency current to light gas-filled
lamps from over 25 miles (40 km) away without
using wires. In 1904 he built Wardenclyffe
Tower on Long Island to demonstrate means
to send and receive power without connecting
wires. The facility was never fully operational
and was not completed due to economic problems,
so no attempt to reproduce his first result
was ever carried out.
Other examples in which contrary evidence has
refuted the original claim:
* Stimulus-triggered acquisition of pluripotency, revealed to be the result of fraud
* GFAJ-1, a bacterium that could purportedly incorporate arsenic into its DNA in place of phosphorus
* MMR vaccine controversy – a study in The Lancet claiming the MMR vaccine caused autism was revealed to be fraudulent
* Schön scandal – semiconductor "breakthroughs" revealed to be fraudulent
* Power posing – a social psychology phenomenon that went viral after being the subject of a very popular TED talk, but could not be replicated in dozens of studies
== Stochastic processes ==
The reproducibility requirement cannot be
applied to individual samples of phenomena
which have a partially or totally non-deterministic
nature. However, it still applies to the probabilistic
description of such phenomena, with error
tolerance given by probability theory.
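A minimal Python sketch of this idea, under illustrative assumptions: each hypothetical "experiment" draws 10,000 noisy measurements from a normal distribution, and a different seed stands in for an independent realization. Two such runs produce different individual samples, so they cannot be compared value by value, but their means can be compared with a tolerance taken from probability theory. The difference of two independent sample means has standard error √(s_a²/n_a + s_b²/n_b), and a z-multiple of it (z = 3 by default here, an arbitrary choice) serves as the error tolerance. The function names are hypothetical. Reusing a seed with Python's `random.Random` reproduces the same samples within a given Python version, which is a separate, purely computational kind of reproducibility.

```python
import math
import random
from statistics import mean, stdev


def run_experiment(seed, n=10_000):
    """Simulate n hypothetical noisy measurements of a quantity whose true value is 0."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def means_agree(a, b, z=3.0):
    """Check whether two samples' means agree within z standard errors of their difference."""
    se_diff = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) <= z * se_diff


first = run_experiment(seed=1)
second = run_experiment(seed=2)
print(first[:3] == second[:3])          # False: individual samples differ between runs
print(first == run_experiment(seed=1))  # True: same seed, same samples
print(means_agree(first, second))       # the probabilistic description agrees with high probability
```

With z = 3, two correct runs are still expected to "disagree" in roughly 0.3% of comparisons; this is the error tolerance that probability theory attaches to the reproducibility claim, rather than a flaw in either run.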
