WELCOME TO THE 2019 ROBERT S.
GORDON LECTURE IN EPIDEMIOLOGY.
MY NAME IS STEPHANIE GEORGE, AND
I'M PLEASED TO REPRESENT
DR. DAVID MURRAY, OUR DIRECTOR,
AND INTRODUCE OUR SPEAKER.
BEFORE I DO THAT, I WANT TO
INVITE EVERYONE TO JOIN US AFTER
THE LECTURE FOR A RECEPTION IN
THE NIH LIBRARY DIRECTLY TO MY
LEFT.
THE ROBERT S. GORDON LECTURE IS
AWARDED EACH YEAR TO A SCIENTIST
WHO HAS MADE MAJOR CONTRIBUTIONS
IN THE AREA OF RESEARCH OR
TRAINING IN THE FIELD OF
EPIDEMIOLOGY OR CONDUCT OF
CLINICAL TRIALS.
THE AWARD WAS ESTABLISHED IN
TRIBUTE TO ROBERT S. GORDON JR.,
FOR HIS DEDICATION AND
CONTRIBUTIONS TO THE FIELD OF
EPIDEMIOLOGY, AND HIS
DISTINGUISHED SERVICE OF OVER 30
YEARS TO NIH, DURING WHICH TIME
HE SERVED IN NUMEROUS SENIOR
LEADERSHIP POSITIONS.
DR. JOHN IOANNIDIS COMES TO US
FROM STANFORD, WHERE HE IS
PROFESSOR OF MEDICINE AND OF
HEALTH RESEARCH AND POLICY, AND,
BY COURTESY, OF BIOMEDICAL DATA
SCIENCE AND OF STATISTICS.
HE'S ALSO DIRECTOR OF THE
META-RESEARCH INNOVATION CENTER
(METRICS), AND
DIRECTOR OF THE Ph.D. PROGRAM
IN EPIDEMIOLOGY AND CLINICAL 
RESEARCH.
DR. IOANNIDIS RECEIVED HIS M.D.
FROM THE NATIONAL UNIVERSITY OF
ATHENS IN 1990 AND A DOCTORATE
OF SCIENCE IN BIOPATHOLOGY FROM
THE SAME INSTITUTION, AND
TRAINED AT HARVARD AND TUFTS IN
INTERNAL MEDICINE AND INFECTIOUS
DISEASES. HIS AWARDS ARE NUMEROUS,
AS ARE THE DISCIPLINES HIS WORK
SPANS.
CHECK OUT HIS BIOGRAPHY ONLINE
AND I WILL NOW INTRODUCE OUR
SPEAKER.
DR. IOANNIDIS' PRESENTATION IS
"IN SCIENTIFIC METHOD WE DON'T
JUST TRUST, OR WHY REPLICATION
HAS MORE VALUE THAN DISCOVERY."
PLEASE JOIN ME IN WELCOMING OUR 
2019 ROBERT S. GORDON JR. AWARD 
RECIPIENT, DR. JOHN IOANNIDIS.
>> THANK YOU FOR THIS TREMENDOUS
HONOR, AND FOR ASKING ME TO GIVE
THIS LECTURE TODAY ON THIS
TOPIC.
IT'S ALWAYS A PLEASURE TO BE
BACK AT THE NIH, AND TO MEET
WITH COLLEAGUES AND TO MEET MORE
COLLEAGUES, VERY BRIGHT PEOPLE.
AND THIS IS A STELLAR
INSTITUTION.
VERY UNIQUE IN THE WORLD.
SO, IN SCIENTIFIC METHOD WE
DON'T JUST TRUST.
TRUST IS GREAT TO HAVE, BUT WE
NEED TO HAVE MORE THAN JUST
TRUST.
SCIENCE IS A FANTASTIC
ENTERPRISE.
VERY SUCCESSFUL.
ABOUT 180 MILLION PAPERS
FLOATING AROUND.
THIS IS JUST 20 MILLION OF THEM
FROM THE LAST 16 YEARS.
THEY CREATE A UNIVERSE, ALONG
WITH TWO MILLION PATENTS AND
200,000 DISCIPLINES OF SCIENCE.
I'M POINTING WITH AN ARROW HERE,
AND IT'S VERY TINY FONT, BUT LET
ME MAGNIFY THAT.
HI THERE, MY BEST PAPER IS A
SPECK OF DUST IN A SPECK OF DUST
IN A SPECK OF DUST SOMEWHERE
AROUND HERE.
[LAUGHTER]
NO SINGLE PAPER CAN EVER COMPETE
WITH ITS SURROUNDINGS OR WITH
SCIENCE AT LARGE.
SCIENCE IS A COMMUNAL EFFORT.
IT'S THAT WHOLE GALAXY THAT
MATTERS.
IT'S A LIVING GALAXY, AN
EVOLVING GALAXY, WITH PLENTY OF
DATA, WITH PLENTY OF HYPOTHESES,
WITH PLENTY OF INFERENCES.
AT THE SAME TIME, THAT GALAXY
ALSO HAS PLENTY OF EMPTY SPACE.
PLENTY OF DARK MATTER.
AND WHAT WOULD THAT DARK MATTER
BE?
IT WOULD BE ANALYSES THAT HAVE
NEVER BEEN PUBLISHED, DONE BUT
NOT PUBLISHED.
IT WOULD BE DATA THAT ARE
AVAILABLE BUT NOT REALLY
AVAILABLE.
THEY WERE ACCESSED MAYBE ONCE BY
SOMEONE BUT THEN WERE LOST.
IT WOULD ALSO BE SCIENCE THAT
WAS NOT DONE.
IT WOULD BE REPLICATIONS THAT
WERE NOT DONE BECAUSE THEY WERE
THOUGHT TO BE "ME TOO" RESEARCH,
AND LOST OPPORTUNITIES FOR
RESEARCH THAT NEVER FLOURISHED
BECAUSE IT WAS NOT FUNDED, NOT
ENOUGH RESOURCES TO DO IT.
HOW CAN WE LOOK AT THE EXISTING
MATTER AND TRY TO UNDERSTAND HOW
WE CAN EXPAND THAT UNIVERSE
FURTHER?
MUCH OF THE EMPHASIS HAS BEEN ON
MAKING DISCOVERIES.
BUT TO BE HONEST, I THINK THAT
DISCOVERY IS A BORING NUISANCE.
ALMOST EVERY SINGLE PAPER THAT
YOU WILL READ IN THE LITERATURE
WILL CLAIM THAT IT HAS SOME
NOVEL RESULTS; IN EVERY GRANT I
SUBMIT TO NIH, I PROMISE TO
FIND SOMETHING NOVEL, AND I KNOW
THAT PROBABLY I'M LYING,
STATISTICALLY SPEAKING.
[LAUGHTER]
THIS IS A TEXT MINING EXERCISE,
PROBING THE ENTIRE PubMed
FROM 1990 TO 2015.
96% OF THE BIOMEDICAL LITERATURE
CLAIMS STATISTICALLY SIGNIFICANT
RESULTS WHEN P-VALUES ARE
USED IN THE ABSTRACT OF A PAPER
OR IN THE FULL-TEXT PAPERS
AVAILABLE IN MEDLINE.
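A pass like the text-mining exercise described above might be sketched as follows; the regex and threshold logic are simplified illustrative assumptions, not the actual pipeline of the study.

```python
import re

# Hypothetical sketch: flag abstracts that report a "significant" p-value.
# The pattern and threshold logic are illustrative assumptions only.
P_VALUE = re.compile(r"[Pp]\s*(<|<=|=)\s*(0?\.\d+)")

def reports_significant_p(abstract: str, alpha: float = 0.05) -> bool:
    """Return True if any reported p-value is at or below alpha."""
    for op, value in P_VALUE.findall(abstract):
        if float(value) <= alpha:
            return True
    return False

abstracts = [
    "Treatment reduced mortality (p < 0.001).",
    "No difference was observed (p = 0.47).",
]
print([reports_significant_p(a) for a in abstracts])  # [True, False]
```

Run over millions of abstracts, the fraction of papers flagged this way is what the 96% figure summarizes.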
TRANSLATION PROCEEDS AT A
GLACIAL PACE DESPITE THESE
MILLIONS OF STATISTICALLY
SIGNIFICANT AND SEEMINGLY NOVEL
RESULTS.
WE DO MAKE PROGRESS BUT IT'S NOT
MILLIONS OF DISCOVERIES THAT
MOVE TO HAVE THAT TREMENDOUS
IMPACT ON MEDICAL CARE, ON
PATIENT OUTCOMES, ON HEALTH, OR
ON LIVING BETTER AND LONGER
LIVES.
SEVERAL YEARS AGO WE LOOKED AT
THE CRÈME DE LA CRÈME, HIGHLY
CITED CLINICAL RESEARCH
LANDMARKS, MILESTONES, AND WE
ASKED HOW LONG DID IT TAKE TO
GET THERE, TO HAVE ONE OF THE
MOST HIGHLY CITED PAPERS IN THE
HISTORY OF MEDICINE.
ON AVERAGE IT TOOK ABOUT 25 TO
30 YEARS; IN THE CASE OF NITRIC
OXIDE IT TOOK 200+ YEARS.
THERE ARE A FEW EXCEPTIONS WHERE
THINGS MATERIALIZE FASTER.
FOR EXAMPLE, I CONSIDER IT ONE
OF THE MOST EXCITING EXPERIENCES
IN MY CAREER WHEN, EARLY ON AS A
SCIENTIST, I WAS AT NIH INVOLVED
IN ACTG 320, A RANDOMIZED TRIAL
THAT SHOWED THAT WITH THERAPY WE
COULD DRAMATICALLY DECREASE THE
RISK OF DEATH AND DISEASE
PROGRESSION FOR HIV-INFECTED
PATIENTS.
THAT TRIAL WAS PUBLISHED WITHIN
LESS THAN FOUR YEARS FROM THE
TIME THAT PROTEASE
INHIBITORS HAD BEEN DEVELOPED,
BASED ON A CONCENTRATED AND
COMMUNAL EFFORT TO TRY TO DO
SOMETHING ABOUT HIV.
AND IT DID WORK.
SO, HOW CAN WE SHIFT FROM
STORIES WHERE IT TAKES 200 YEARS
OR 30 YEARS TO STORIES WHERE IT
TAKES FOUR YEARS?
AND ALSO HOW CAN WE SHIFT NOT
ONLY THE TIMING BUT ALSO
CREDIBILITY OF THESE EFFORTS?
BECAUSE IN THAT SLIDE HERE, I
ALSO HAVE SOME BLACK MILESTONES,
WHICH IS WHEN LARGER AND BETTER
CONTROLLED STUDIES WERE DONE
THAT SHOWED THAT THE CRÈME DE LA
CRÈME PAPER WAS EXAGGERATED IN
TERMS OF WHAT IT WAS PROMISING
TO HAVE FOUND.
I THINK THAT WE'RE INCENTIVIZING
A FAKE NARRATIVE.
THE NARRATIVE THAT IS DOMINANT
IS THAT WE HAVE AN OVERSUPPLY OF
MAJOR TRUE DISCOVERIES, AND I
THINK THAT THIS IS CURRENTLY
UNTENABLE.
I DON'T NEED TO REMIND YOU HOW
FEW NEW DRUGS GET LICENSED, EVEN
THOUGH WE'RE TRYING TO PUSH THAT
AGENDA AND WE HAVE THE BRIGHTEST
MINDS IN THE WORLD WORKING ON
SCIENCE, THE EFFORT THAT WE PUT
IS TREMENDOUS AND THE REAL
PROGRESS THAT YOU CAN MEASURE IN
ANY WAY YOU WANT IS FAR MORE
LIMITED.
NOW, THIS NARRATIVE COEXISTS
WITH SOME ADDITIONAL URGES.
ONE OF THEM IS MAKE HASTE.
YOU KNOW, RUSH.
PATIENTS ARE DYING.
LICENSE NEW DRUGS IMMEDIATELY.
AND I DO FEEL THAT PRESSURE.
AND I THINK THAT, YES, WE NEED
TO MAKE HASTE, BUT THIS DOESN'T
MEAN THAT WE SHOULD CUT CORNERS
IN TERMS OF THE EVIDENCE THAT WE
NEED TO ACCRUE.
SECOND URGE IS TO BE
METHODOLOGICALLY SLOPPY.
OH, JUST GET AN ANSWER RIGHT
AWAY.
WE HAVE ALL THESE HUGE COLLECTED
DATASETS.
NO NEED TO DO RANDOMIZED CONTROL
TRIALS, JUST RUN SOMETHING
THROUGH THE MILL OF YOUR
COMPUTER AND THAT'S GOING TO BE
ACCURATE.
AND VERY OFTEN, THIS IS NOT
ACCURATE.
AND THE THIRD URGE IS TO AVOID
REPLICATION: DON'T WASTE TIME
WITH REPLICATION,
THIS IS A "ME TOO" EFFORT,
WE NEED TO MOVE FORTH, TO MOVE
ON TO MORE DISCOVERIES.
NOW, THE VALUE OF DISCOVERY CAN
BE MODELED.
LET'S SAY R IS THE PRE-STUDY
ODDS OF A RESEARCH CLAIM BEING
TRUE, BF IS THE BAYES FACTOR
CONFERRED BY THE DISCOVERY DATA,
AND H IS THE RATIO OF THE HARM
OF NEGATIVE CONSEQUENCES FROM A
FALSE POSITIVE DISCOVERY CLAIM
VERSUS THE POSITIVE CONSEQUENCES
FROM A TRUE POSITIVE
DISCOVERY.
THEN THE VALUE OF THE DISCOVERY
PROCESS IS PROPORTIONAL TO TRUE
POSITIVES MINUS H TIMES FALSE
POSITIVES, OR IF YOU WORK
THROUGH THAT, IT IS PROPORTIONAL
TO R TIMES THE BAYES FACTOR
MINUS H.
AND OBVIOUSLY YOU CAN ADD A
CONSTANT: IF YOU HAVE TWO
SCIENTISTS, TWO THOUSAND
SCIENTISTS, OR TWO MILLION, YOU
CAN MULTIPLY THAT BY HOW MUCH
EFFORT IS GOING INTO THIS.
R AND H ARE RATHER FIELD
SPECIFIC.
IF I HAVE WASTED MY LIFE IN A
FIELD WHERE THERE'S NOTHING TO
DISCOVER, THAT'S VERY SAD.
BUT THAT FIELD JUST HAS NOTHING
REALLY USEFUL TO DISCOVER.
IF I WANT TO MAKE PROGRESS, I
JUST NEED TO CHANGE FIELDS.
SOMETIMES MAYBE THERE ARE THINGS
TO BE DISCOVERED, BUT THEY WOULD
REQUIRE A COMPLETELY DISRUPTIVE
APPROACH ABANDONING WHAT WE DO
AND JUST TAKING A COMPLETELY NEW
APPROACH TO TRYING TO ATTACK THE
SAME QUESTION.
H IS ALSO PRETTY MUCH FIELD
SPECIFIC.
SO, THE FOCUS SHOULD BE -- MUST
BE -- ON INCREASING THE BAYES
FACTOR.
THE OPTIONS FOR INCREASING THE
BAYES FACTOR ARE NUMEROUS AND
THEY DEPEND ON WHAT FIELD WE'RE
WORKING WITH, BUT TYPICALLY
RUNNING LARGER STUDIES, GETTING
MORE EVIDENCE, WOULD HELP.
AND ALSO ENSURING GREATER
PROTECTION FROM BIAS WOULD HELP,
WHICH MEANS OPTIMIZING DESIGN
AND THINKING ABOUT HOW WE SHOULD
GO ABOUT ANSWERING A QUESTION.
IN THE VERY SAME EQUATION IT'S
EASY TO GET NEGATIVE VALUES.
IF YOU WANT TO AVOID HAVING
NEGATIVE VALUES FROM DISCOVERY,
THEN ONE NEEDS A BAYES FACTOR TO
BE MORE THAN THE RATIO OF H OVER
R, AND OFTEN THIS IS VERY
DIFFICULT.
WHY IS THAT?
MOST ORIGINAL DISCOVERY
CLAIMS COME FROM SMALL
STUDIES, WHERE BIASES ARE VERY
COMMON, AND THEREFORE THE BAYES
FACTOR OFTEN IS EVEN LESS THAN
5, WHICH IN THE BAYESIAN WORLD
MEANS VERY, VERY LITTLE.
AND ALSO, MOST FIELDS CURRENTLY
ARE WORKING IN AREAS WHERE THE
PRE-STUDY ODDS ARE PRETTY LOW.
WE'RE ATTACKING MASSIVE SPACES
OF EXPLORATION WHERE THERE ARE
SIGNALS TO BE DISCOVERED, BUT
THE DENOMINATOR OF ALL THE
POTENTIAL THINGS THAT WE CAN
MEASURE AND ALL THE THINGS WE
CAN ANALYZE IS PROBABLY VERY
LARGE.
IF WE HAVE THAT COMBINATION,
MOST DISCOVERIES ARE OPERATING
IN A SPACE WHERE THEY HAVE
NEGATIVE SCIENTIFIC VALUE.
IT'S FAR MORE LIKELY THAT THEY
WILL BE CONFUSERS, THAT THEY
WILL GENERATE FALSE POSITIVES
THAT WOULD LEAD PEOPLE ASTRAY
AND WASTE MORE RESOURCES
DOWNSTREAM, BUILDING ON
SOMETHING THAT IS A FALSE
POSITIVE CLAIM, OR AN
EXAGGERATED CLAIM, RATHER THAN
THAT THEY WILL SAVE THE WORLD.
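The model just described can be put into a few lines; the numbers plugged in below are illustrative, chosen to match the talk's description of small, biased discovery studies in low-prior fields.

```python
# Sketch of the value-of-discovery model from the talk: the value of a
# discovery process is proportional to R*BF - H, where R is the pre-study
# odds, BF the Bayes factor of the discovery data, and H the harm of a
# false positive relative to the benefit of a true positive.
def discovery_value(R: float, BF: float, H: float) -> float:
    """Proportional value of a discovery process (positive = worthwhile)."""
    return R * BF - H

# Break-even: value is positive only when BF > H / R.
# Illustrative numbers: low pre-study odds (R = 0.01), a weak Bayes
# factor from a small biased study (BF = 3), and false positives as
# harmful as true positives are beneficial (H = 1).
print(discovery_value(R=0.01, BF=3, H=1))    # -0.97: negative value
print(discovery_value(R=0.01, BF=150, H=1))  # 0.5: a strong BF rescues it
```

With BF below 5 and R low, as the talk argues is typical, the value is negative unless the Bayes factor is raised dramatically.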
HOW DO WE SORT OUT WHERE TO GO
NEXT?
WE NEED REPLICATION.
WE NEED TO TAKE ALL OF THESE
TENTATIVE DISCOVERIES AND TRY TO
REPLICATE AND SEE WHAT STILL
SURVIVES DIFFERENT EFFORTS TO
REPRODUCE THESE RESULTS EITHER
EXACTLY THE SAME WAY OR WITH
DIFFERENT ANGLES OF
TRIANGULATION.
IS THAT NEW?
NOT NECESSARILY.
THIS IS PRETTY MUCH HOW SCIENCE
STARTED IN THE WESTERN WORLD.
YOU HAD TO REPLICATE AND
REPRODUCE FINDINGS IN FRONT OF
THE ROYAL ACADEMY AND PEOPLE
WERE WATCHING TO SEE WHETHER THE
APPARATUS WOULD WORK.
CURRENTLY, WE MOSTLY TRUST THAT
WHAT WAS DONE BEHIND CLOSED
DOORS OR BEHIND A CLOSED
COMPUTER IS SOMETHING THAT WE
CAN PUT TRUST IN.
WE DO HAVE EMPIRICAL STUDIES ON
FIELDS WHERE REPLICATION IS NOT
JUST CONSIDERED TO BE A "ME TOO"
TYPE OF EFFORT, AND ACTUALLY
REPLICATION PRACTICES ARE
COMMON.
AND THESE EMPIRICAL STUDIES
SUGGEST MOST OF THE INITIAL
CLAIMED STATISTICALLY
SIGNIFICANT EFFECTS ARE EITHER
FALSE POSITIVES OR SUBSTANTIALLY
EXAGGERATED.
ONE SUCH FIELD IS GENETICS.
GENETIC EPIDEMIOLOGY WENT
THROUGH AN IMMENSE
TRANSFORMATION OVER THE LAST 10
YEARS, MOVING FROM CANDIDATE
GENE STUDIES, WHERE PEOPLE HAD
TO COME UP WITH SINGLE
HYPOTHESES TO TEST, TO
GENOME-WIDE STUDIES TESTING IN
AN AGNOSTIC FASHION FOR
ASSOCIATION WITH PHENOTYPES.
ONE COULD GO BACK, BY DOING
THIS, AND ASSESS HOW OFTEN THE
PAPERS THAT WERE PUBLISHED IN
THE CANDIDATE GENE ERA WERE
SUCCESSFULLY REPLICATED; THE
REPLICATION RATE ON AVERAGE WAS
1.2%.
THIS MAY BE AN UNDERESTIMATE;
I'M WILLING TO TAKE THAT TO 5 TO
10% AT MOST, BUT MORE THAN 90%
OF THESE PAPERS THAT WERE
PUBLISHED FOR MANY, MANY YEARS,
OVER 20 YEARS, IN THE BEST OF
OUR JOURNALS WERE PROBABLY NOT
SAYING MUCH AND WERE PROBABLY
JUST FALSE POSITIVES.
HERE IS ANOTHER EVALUATION:
ANIMAL STUDIES. THERE ARE TENS
OF THOUSANDS OF ANIMAL STUDIES
BEING DONE, AND I THINK THEY ARE
TREMENDOUSLY IMPORTANT BECAUSE
ANIMAL RESEARCH IS AN ESSENTIAL
GATEWAY TO HUMAN CLINICAL
RESEARCH.
IF WE DECIDE TO MOVE DIRECTLY
FROM EARLY BENCH DISCOVERY TO
HUMANS, I THINK WE WILL BE
TESTING LOTS OF NOISE
INADVERTENTLY ON HUMANS AND WE
DON'T WANT TO DO THAT.
HOWEVER, YOU'RE ALL AWARE THAT
FOR MANY FIELDS WHERE WE HAVE
HUNDREDS IF NOT THOUSANDS OF
POTENTIAL LEADS FROM ANIMAL
STUDIES LIKE NEUROLOGICAL
DISEASE WE HARDLY HAVE ANY
SUCCESS WHEN IT COMES TO HUMANS.
FOR EXAMPLE, IN DEMENTIA OR
TREATMENT OF STROKE, WE HAVE
HUNDREDS OF INTERVENTIONS THAT
SEEM TO WORK IN ANIMAL MODELS,
BUT HARDLY ANY IN HUMANS.
ONE MIGHT SAY THAT THE
EXPLANATION FOR THAT IS ANIMALS
ARE VERY DIFFERENT FROM HUMANS.
I THINK THAT THERE ARE
DIFFERENCES.
I HOPE I'M A LITTLE BIT
DIFFERENT THAN A  MOUSE BUT MY
GENOME IS NOT THAT DIFFERENT.
I THINK MOST OF THE PROBLEM IS
NOT SO MUCH THE
DISSIMILARITY OF THE
EXPERIMENTAL SYSTEM AS THE WAY
RESEARCH IS DONE, IN WAYS THAT
LET BIAS CREEP IN.
THIS IS A SLIDE FROM A PAPER
WHERE WE LOOKED ACROSS
THE ENTIRE LITERATURE ON ANIMAL
STUDIES OF NEUROLOGICAL DISEASES
AND FOUND PROMINENT SIGNALS OF
EXCESS SIGNIFICANCE BIAS -- YOU
KNOW, PRESSURE TO DELIVER
STATISTICALLY SIGNIFICANT
RESULTS COULD BE DETECTED ACROSS
THAT FIELD -- AND ONCE WE
STARTED SORTING OUT DIFFERENT
PATTERNS OF BIAS, THERE WERE
VERY FEW PIECES OF EVIDENCE THAT
REMAINED PRETTY STRONG.
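The excess-significance signal described above can be sketched numerically; the chi-square form below follows the published excess-significance test of Ioannidis and Trikalinos, and all input numbers are illustrative.

```python
# Sketch of an excess-significance test: compare the observed number of
# statistically significant studies (O) with the number expected (E)
# given each study's power to detect the plausible underlying effect.
def excess_significance(powers: list[float], observed_sig: int):
    """Return (expected, chi-square A, excess flag at the 0.05 level)."""
    n = len(powers)
    expected = sum(powers)                             # E = sum of per-study powers
    diff = observed_sig - expected
    a = diff**2 / expected + diff**2 / (n - expected)  # 1-df chi-square statistic
    # 3.84 is the 5% critical value of a chi-square with 1 df.
    return expected, a, a > 3.84 and observed_sig > expected

# Illustrative numbers: 20 small studies, each ~20% powered, yet 12
# report significant results -- far more than the ~4 expected.
expected, a, excess = excess_significance([0.2] * 20, observed_sig=12)
print(round(expected, 1), round(a, 1), excess)  # 4.0 20.0 True
```

A strong excess like this is the footprint of pressure to deliver significant results, not of the underlying biology.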
IN PRE-CLINICAL RESEARCH, IN THE
LAST 8 YEARS WE HAVE SEEN RAPID
CHANGE IN OUR UNDERSTANDING
ABOUT REPLICATION AND
REPRODUCIBILITY.
THIS TIME, MOST OF THE LEADS
CAME FROM THE INDUSTRY.
AND THE INDUSTRY HAD EVERY RIGHT
TO FEEL UNCERTAIN ABOUT NOT
BEING ABLE TO REPRODUCE PAPERS
PUBLISHED IN TOP JOURNALS BY
ACADEMIC TEAMS.
FOR ACADEMIC INVESTIGATORS,
MAYBE THIS IS CURIOSITY, MAYBE
THIS IS A PAPER IN "NATURE,"
MAYBE IT WAS A WAY TO GAIN
TENURE.
FOR THE INDUSTRY, IT WAS AN
ISSUE OF SPENDING HALF A BILLION
DOLLARS ON SOMETHING THAT WOULD
LEAD NOWHERE.
SO WE HAD SEVERAL COMPANIES THAT
LAUNCHED REPRODUCIBILITY CHECKS
ON HIGH PROFILE HIGHLY CITED
PAPERS FROM ACADEMIC TEAMS,
PRETTY MUCH SUMMARIZING WHAT
THEY HAD ALREADY BEEN SEEING IN
LARGE SCALE.
MOST OF THE RESULTS COULD NOT BE
REPRODUCED IN THEIR HANDS.
REPRODUCIBILITY RATES RANGED
FROM 0% TO 25%.
ONE SUCH FAMOUS EXAMPLE WAS WITH
AMGEN; GLENN BEGLEY CONCLUDED
THAT THE FAILURE TO WIN THE WAR
ON CANCER HAS BEEN BLAMED ON
MANY FACTORS, BUT RECENTLY A NEW
CULPRIT EMERGED: TOO MANY BASIC
SCIENTIFIC DISCOVERIES ARE
WRONG.
WE'VE SEEN SIMILAR CHECKS ACROSS
VERY DIFFERENT SCIENTIFIC
DISCIPLINES.
PSYCHOLOGICAL SCIENCE HAS ALSO
GONE THROUGH A MAJOR
TRANSFORMATION.
THIS IS A PAPER THAT WAS
PUBLISHED THREE YEARS AGO IN "
SCIENCE," A COLLABORATION OF 273
PSYCHOLOGISTS AND THEIR TEAMS
TRYING TO REPRODUCE 100 OF THE
CRÈME DE LA CRÈME PAPERS FROM
TOP JOURNALS. YOU COULD READ THE
RESULTS IN DIFFERENT WAYS, BUT
THE SUMMARY, NO MATTER HOW YOU
LOOK AT IT, IS THAT ABOUT 2/3 OF
THE TIME THE ORIGINAL RESULT
COULD NOT BE REPRODUCED, AND
EFFECT SIZES WERE MUCH SMALLER
COMPARED TO THE ORIGINAL.
WHAT IF YOU ONLY READ "NATURE"
AND "SCIENCE"?
THIS IS ANOTHER REPRODUCIBILITY
CHECK, OF 21 PAPERS; ON AVERAGE
THE REPRODUCIBILITY EFFORT
REVEALED EFFECT SIZES ABOUT 50%
OF THE ORIGINAL.
AND IN MANY CASES, THERE WAS
NOTHING THERE. 
THE EFFECT WAS COMPLETELY IN THE
VICINITY OF THE NULL WITH TIGHT
CONFIDENCE INTERVALS.
DOES IT MEAN THAT THE
REPRODUCTION WAS CORRECT AND
THE ORIGINAL WAS WRONG?
IN THESE CASES PEOPLE
SYSTEMATICALLY TRIED TO FOLLOW
THE EXACT RECIPE OF THE ORIGINAL
STUDY, AND EVEN COMMUNICATED
WITH THE ORIGINAL INVESTIGATORS
TO MAKE SURE THEY HAD ALL THE
CRITICAL DETAILS.
THERE COULD BE MANY
EXPLANATIONS.
IT COULD BE THAT BOTH ARE
CORRECT, BUT FOR SOME REASON
THAT WE DON'T KNOW, THEY
DISAGREE.
IT COULD BE THAT BOTH ARE
WRONG BECAUSE, AGAIN, THERE ARE
SOME BIASES CREEPING IN
THAT WE'RE NOT FAMILIAR WITH.
OR IT COULD BE THAT ONE OF THEM
IS NOT CORRECT.
BUT CLEARLY, IF YOU HAVE A
SITUATION WHERE UNDER THE VERY
BEST EFFORTS TO REPRODUCE YOU
CANNOT GET SOMETHING TO WORK
AGAIN, ONE HAS TO WONDER WOULD
IT WORK IF WE WERE TO USE THAT
WIDELY IN REAL LIFE, IN
PATIENTS, IN COMMUNITIES, IN THE
REAL WORLD.
REPRODUCIBILITY EFFORTS CAN BE
TRICKY, AND THEY CAN ALSO BE
EMOTIONAL.
THEY CAN LEAD TO WHAT THEY CALL
THE REPRODUCIBILITY WARS.
THE REPRODUCIBILITY PROJECT ON
CANCER BIOLOGY, FOR EXAMPLE,
STARTED PUBLISHING A NUMBER OF
PAPERS REPORTING METICULOUS
EFFORTS TO REPRODUCE
HIGH-PROFILE CANCER BIOLOGY
PAPERS; EVERYTHING HAD BEEN
PRE-SPECIFIED IN PRE-REGISTERED
REPORTS.
RESULTS WERE VERY CLEAR.
BUT THEN MOST OF THE TIME IF THE
ORIGINAL WAS NOT REPRODUCED,
THE ORIGINAL INVESTIGATORS WOULD
FIGHT BACK, AND YOU END UP IN A
SITUATION WHERE YOU FEEL THAT
REPUTATION IS AT STAKE, THERE
ARE VERY FIERCE EMOTIONS ABOUT WHO
IS RIGHT, WHO IS WRONG, CAREERS
ARE THOUGHT TO BE AT STAKE, AND
INTERPRETATIONS CAN BE DIFFERENT
ON WHAT IS SUCCESSFUL AND WHAT
IS UNSUCCESSFUL.
WE BELIEVE PEOPLE DO CARE ABOUT
REPRODUCIBILITY.
AND THEY SHOULD CARE.
IT'S REALLY A CENTRAL PIECE OF
THE SCIENTIFIC METHOD.
HOWEVER, WHAT EXACTLY DO WE MEAN
BY REPRODUCIBILITY?
WHAT IS RESEARCH
REPRODUCIBILITY?
IF YOU LOOK ACROSS ALL 22 MAJOR
DISCIPLINES OF SCIENCE THERE'S A
RAPIDLY INCREASING USE OF THAT
TERMINOLOGY, IF YOU DO A TEXT
MINING EXERCISE LIKE WHAT I'M
SHOWING HERE, BUT PEOPLE MEAN
VERY DIFFERENT THINGS.
BASICALLY, YOU CAN SEPARATE
REPRODUCIBILITY INTO THREE MAIN
CLUSTERS, ONE IS REPRODUCIBILITY
OF METHODS WHICH IS ABILITY TO
UNDERSTAND OR TO REPEAT AS
EXACTLY AS POSSIBLE THE
EXPERIMENTAL AND COMPUTATIONAL
PROCEDURES -- SO AVAILABILITY OF
SOFTWARE, OF SCRIPTS, OF DATA,
AND THE ABILITY TO PUT THEM
TOGETHER AND GET THE SAME RESULT
FROM THE VERY SAME DATA.
REPRODUCIBILITY OF RESULTS WHICH
MEANS THAT WE'RE DOING YET
ANOTHER STUDY ON NEW
PARTICIPANTS, NEW SAMPLES, NEW
OBSERVATIONS, AND WE HOPE TO GET
A RESULT THAT IS CONSISTENT,
COMPATIBLE, IDEALLY AS CLOSE AS
POSSIBLE TO THE ORIGINAL.
AND REPRODUCIBILITY OF
INFERENCES WHICH MEANS THAT WE
HAVE ONE STUDY, A REPLICATION OR
MULTIPLE STUDIES OR BODY OF
EVIDENCE, AND I ASK PEOPLE IN
THE AUDIENCE, WHAT DO YOU
CONCLUDE OUT OF THIS, AND WE MAY
OR MAY NOT AGREE ABOUT WHAT
THESE DATA AND WHAT THESE
RESULTS MEAN.
REPRODUCIBILITY CAN BE AFFECTED
BY THE RECIPE OF RESEARCH
PRACTICES THAT WE APPLY.
AND YOU CAN THINK OF TWO
EXTREMES OF RESEARCH PRACTICES,
TWO STEREOTYPES.
OF COURSE THIS IS A STEREOTYPE;
IT DOESN'T MEAN I HAVE SOME
PARTICULAR RESEARCHER IN MIND
DOING THINGS WRONG, BUT IF YOU
THINK OF SMALL AND BIG DATA,
THIS IS WHAT IT OFTEN LOOKS
LIKE.
SO WITH SMALL DATA, WHICH IS
STILL THE MAJORITY IN MOST
SCIENTIFIC FIELDS, WE HAVE THE 
PROTOTYPE OF THE SOLO SILOED
INVESTIGATOR, SMALL TEAM, VERY 
LIMITED RESOURCES, LIMITED
FUNDING.
ONE NEEDS TO BE SUCCESSFUL.
YOU KNOW, THESE THREE OR FOUR
YEARS OF FUNDING END, AND YOU
NEED TO SAY I HAVE SOMETHING
MAJOR TO SAY AND SOMETHING MAJOR
THAT WOULD LEAD ME TO MY NEXT
GRANT.
ACTUALLY WE NEED TO GET GOING
PROBABLY NOT AFTER THREE YEARS
BUT AFTER THREE DAYS TO WRITE
THE NEXT GRANT.
AND HOW DO YOU DO THAT?
MUCH OF THE TIME THE RESULTS
WILL BE QUOTE/UNQUOTE NEGATIVE
OR UNIMPRESSIVE.
YOU NEED TO START EXPLORING,
SEARCHING, WHATEVER SPACE YOU
HAVE AVAILABLE, THERE WILL BE A
LOT OF CHERRY PICKING OF BEST
LOOKING RESULTS, A LOT OF POST
HOC INTERPRETATION; A P-VALUE OF
LESS THAN 0.05 IS CONSIDERED A
WIN, AND WITH P-HACKING IT'S
POSSIBLE TO GET THERE.
THERE'S NO REGISTRATION BECAUSE
THAT DECREASES THE DEGREES OF
FREEDOM, NO DATA SHARING BECAUSE
IT OFFERS AMMUNITION TO
COMPETITORS, AND NO REPLICATION
BECAUSE IF YOU TRY AND FAIL
YOU'VE INVESTED TWICE THE EFFORT
AND YOU'RE BACK TO SQUARE ZERO,
AND PEOPLE SAY YOU FOUND
NOTHING, SO THAT'S IT FOR YOU.
SOME OF THE WAYS TO IMPROVE THE
VALIDITY OF SMALL SAMPLE SIZE
RESEARCH ARE VERY EASY TO
IMPLEMENT.
THEY COST NOTHING.
THEY WOULD SAVE US FROM A LOT OF
TROUBLE.
HOWEVER, THEY ARE NOT
IMPLEMENTED.
FOR EXAMPLE, WE HAVE KNOWN ABOUT
EXPERIMENTER BIAS FOR OVER HALF
A CENTURY NOW.
ROSENTHAL PUBLISHED PAPERS MORE
THAN HALF A CENTURY AGO.
AND WE KNOW THAT IN ANIMAL
EXPERIMENTS OR IN OTHER IN VIVO
STUDIES, BUT ALSO IN VITRO, IT'S
GREAT FOR THE PEOPLE WHO READ
THE RESULTS TO BE BLINDED TO THE
EXPERIMENTAL CONDITIONS;
HOWEVER, THAT HAPPENS LESS THAN
10% OF THE TIME.
RANDOMIZATION IS NOT ABOUT
HUMANS HERE, IT'S ABOUT ANIMAL
WORK OR OTHER EXPERIMENTS.
IT COSTS NOTHING.
IT'S VERY EASY TO DO.
IT WOULD SAVE US FROM LOTS OF
TROUBLE FROM IMBALANCES THAT
WOULD OTHERWISE BE CONSCIOUSLY
OR SUBCONSCIOUSLY CREATED.
AGAIN, LESS THAN 30% OF THE TIME
EVEN IN THE MOST RECENT STUDIES
THIS IS IMPLEMENTED.
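The two costless safeguards just discussed -- randomized allocation and blinding of outcome assessors -- can be sketched in a few lines; the animal IDs, group labels, and coding scheme here are illustrative assumptions, not anything from the talk.

```python
import random

# Sketch of costless rigor for an animal experiment: randomize allocation
# to experimental conditions, and give the outcome assessor only opaque
# codes so they stay blinded to each animal's condition.
def randomize_and_blind(animal_ids, groups, seed=None):
    """Randomly assign animals to groups; return (allocation, blinded codes)."""
    rng = random.Random(seed)
    ids = list(animal_ids)
    rng.shuffle(ids)  # random order, so group assignment is random
    allocation = {a: groups[i % len(groups)] for i, a in enumerate(ids)}
    # The assessor only ever sees these codes, never the group labels.
    codes = {a: f"code-{rng.randrange(10**6):06d}" for a in allocation}
    return allocation, codes

allocation, codes = randomize_and_blind(range(12), ["treatment", "control"], seed=1)
print(sorted(set(allocation.values())))  # ['control', 'treatment']
```

The key-to-code mapping would be held by someone not involved in reading the results, and unblinded only after outcomes are recorded.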
WITH BIG DATA WE'RE CHALLENGING
THE STATUS QUO OF SMALL STUDY
RESEARCH, BUT IT DOESN'T
NECESSARILY MEAN THAT OUR ODDS
OF SUCCESS ARE BETTER.
IN THAT SITUATION, WHICH IS
BECOMING MORE COMMON WITH
ELECTRONIC HEALTH RECORDS, WITH
LARGE OMICS DATABASES, AND
OTHER SUCH EFFORTS, WE HAVE
EXTREMELY LARGE SAMPLE SIZES.
WE HAVE OVERPOWERED STUDIES THAT
ARE LIKELY TO GIVE SIGNALS NO
MATTER WHAT, EVEN IF NO SIGNALS
ARE WORTHWHILE DETECTING, YOU
WILL DETECT TONS OF THEM.
THIS MEANS, AGAIN, THERE NEEDS
TO BE A CHERRY PICKING PROCESS,
AND, AGAIN, A LOT IS DONE POST
HOC, MANY -- MOST OF THESE
DATABASES ARE NOT EVEN ASSEMBLED
FOR RESEARCH PURPOSES.
THEY HAPPEN TO ACCRUE OVERNIGHT.
I'M SLEEPING AND WAKE UP IN THE
MORNING, AND THERE ARE SO MANY
MORE PATIENTS THAT HAVE BEEN
ADDED TO THE ELECTRONIC HEALTH
RECORDS THAT ONE OR MORE
RESEARCHERS COULD TAP INTO.
STATISTICAL INFERENCE TOOLS ARE
A BIT DIFFERENT.
PEOPLE RECOGNIZE VERY QUICKLY
THAT IF THEY JUST USE A P-VALUE
OF LESS THAN 0.05, EVERYTHING
WILL BE STATISTICALLY
SIGNIFICANT, SO THERE'S A LITTLE
BIT MORE SOPHISTICATION MUCH
OF THE TIME IN THAT SPACE, BUT
VERY OFTEN YOU SEE IDIOSYNCRATIC
TOOLS WITHOUT CONSENSUS: PEOPLE
WORKING IN THE SAME OR SIMILAR
FIELDS USE DIFFERENT APPROACHES
AND DIFFERENT THRESHOLDS FOR
CLAIMING SUCCESS.
REGISTRATION REMAINS THE
EXCEPTION IN THAT SPACE TOO, AND
DATA SHARING DOES HAPPEN,
PROBABLY MORE OFTEN THAN IN
FIELDS THAT USE SMALL DATASETS,
BUT UNFORTUNATELY MUCH OF THE
TIME THERE'S NO UNDERSTANDING OF
WHAT IS BEING SHARED.
MOST OF THE TIME DATASETS ARE
DUMPED SOMEWHERE, WITH ACCESS
THAT SOMEONE COULD GET EASILY OR
WITH MORE DIFFICULTY, AND IF YOU
ASK, DOES ANYONE KNOW WHAT THE
DATA ARE AND WHAT THEY SHOW,
LITERALLY EVEN THE INVESTIGATOR
WHO GENERATED THE DATA
PROBABLY DOES NOT KNOW, BECAUSE
IT'S JUST A BIG BLACK BOX WHERE
IT IS VERY, VERY HARD TO
INTERPRET WHAT EXACTLY HAS BEEN
GENERATED, LET ALONE HOW VALID
IT MIGHT BE.
REPLICATION DOES HAPPEN IN SOME
FIELDS, AND I'VE SHOWN YOU SOME
EXAMPLES.
AND EVEN WHEN IT DOES NOT
HAPPEN, WE MAY HAVE A SITUATION
WHERE PEOPLE SAY I'M DOING A
STUDY THAT IS SOMEHOW DIFFERENT,
ACTUALLY THERE'S AN INCENTIVE TO
JUSTIFY I'M DOING A DIFFERENT
STUDY BUT YOU LOOK AT THE TWO
STUDIES AND SAY YOU'RE ASKING
EXACTLY THE SAME QUESTION BUT TO
GET IT PUBLISHED SOMEONE HAS TO
SAY IT IS DIFFERENT.
META-ANALYSIS CAN TAKE SIMILAR
STUDIES AND EXAMINE THEM IN
THEIR TOTALITY.
STUDIES ON THE SAME QUESTION
WITH PRETTY MUCH THE SAME
COMPARISONS WITH PRETTY MUCH THE
SAME INTERVENTIONS OR THE SAME
RISK FACTORS, OR THE SAME
HYPOTHESES BEING ADDRESSED, AND
GIVE US A SENSE OF HETEROGENEITY
BETWEEN THE DIFFERENT STUDIES
THAT ADDRESS FAIRLY SIMILAR
QUESTIONS.
HETEROGENEITY MAY BE GENUINE,
MUCH OF HETEROGENEITY MAY
REFLECT GENUINE DIFFERENCES
ACROSS THE STUDIES.
HOWEVER, IT MAY ALSO POINT OUT
DIFFERENCES THAT STEM FROM
BIASES THAT ARE DIFFERENTIALLY
EXPRESSED IN, OR DIFFERENTIALLY
AFFECT, DIFFERENT INVESTIGATIONS
ON THE SAME QUESTION.
IF THAT'S THE CASE, WE SHOULD BE
ABLE TO PICK UP SOME HINTS OF
THE PRESENCE OF A BIAS, BECAUSE
IT LEAVES A PATTERN OF A
PARTICULAR TYPE OF DIVERSITY, A
PARTICULAR TYPE OF
HETEROGENEITY, ACROSS THE
RESULTS THAT ARE BEING COMBINED
IN THE META-ANALYSIS.
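One standard way to quantify the between-study heterogeneity the speaker describes is Cochran's Q together with the I² statistic; the sketch below uses the usual fixed-effect inverse-variance weights, and all input numbers are illustrative.

```python
# Sketch of quantifying between-study heterogeneity in a meta-analysis
# with Cochran's Q and I^2 (fixed-effect weights = 1/variance).
def heterogeneity(effects, variances):
    """Return (Q, I2 percentage) for study effect estimates and variances."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of variability beyond what chance (df) would explain.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Five studies of the same question, with visibly discordant results.
q, i2 = heterogeneity([0.1, 0.5, 0.2, 0.9, -0.1], [0.04] * 5)
print(round(q, 1), round(i2, 1))  # 15.2 73.7
```

A high I² like this says the studies disagree more than sampling error allows; whether that reflects genuine differences or differential bias is exactly the question the pattern analysis tries to answer.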
SO THIS IS PRETTY MUCH WHAT WE
DID HERE.
WE PRE-SPECIFIED 17 PATTERNS OF
BIAS AND THOUGHT, WHAT WOULD THE
LITERATURE LOOK LIKE IF THESE
BIASES WERE PRESENT, AND FOR ALL
THE META-ANALYSES WE TRIED TO
ASK, DO WE SEE THESE PATTERNS.
MOST BIASES COULD BE SEEN IN
MOST FIELDS.
HOWEVER, SOME BIASES SEEMED TO
HAVE MORE COMMON HINTS OF
PRESENCE IN SOME FIELDS COMPARED
TO OTHERS.
FOR EXAMPLE, SMALL-STUDY
EFFECTS, WHICH IS A PATTERN
WHERE SMALL STUDIES GIVE MORE
PROMINENT, PERHAPS EXAGGERATED,
RESULTS COMPARED TO LARGER
STUDIES, WAS SEEN VERY
PROMINENTLY IN THE SOCIAL
SCIENCES, WAS SEEN PROMINENTLY
IN THE BIOMEDICAL SCIENCES, AND
WAS NOT REALLY SEEN, OR ONLY
VERY SOFT SIGNALS WERE SEEN, IN
THE PHYSICAL SCIENCES.
THE PHYSICAL SCIENCES ARE MUCH
MORE USED TO WORKING WITH
LARGE DATASETS, WITH COMMUNAL
SCIENCE, WITH CERN OR BIG
TELESCOPES SHARED AMONG THE
PHYSICISTS OR ASTROPHYSICISTS
WORKING ON A PARTICULAR DOMAIN
OF SCIENCE.
THE SAME APPLIED TO OTHER
DISCIPLINES; SOMETIMES ONE OR
ANOTHER FIELD HAD MORE OF THE
SIGNAL OF ONE BIAS OR ANOTHER,
BUT MOST BIASES WERE SEEN IN
MOST SCIENTIFIC FIELDS.
SOLUTIONS ARE INTEGRAL TO THE
PROCESS OF HOW WE DO SCIENCE,
AND YOU
CAN THINK OF REPRODUCIBILITY
PROBLEMS AS A CENTRAL CONCERN IN
EVERYDAY EXPERIMENTATION, IN 
EVERYDAY STUDY DESIGN, IN
EVERYDAY WAYS OF RUNNING
SCIENCE.
LOTS OF PEOPLE ARE THINKING
ABOUT THEM AND SOME VERY SMART
SOLUTIONS HAVE APPEARED, BUT
ALSO MANY OF THEM ARE VERY
SPECULATIVE.
THEY HAVE NO EMPIRICAL SUPPORT
TO TELL US THAT THEY WILL MAKE
THINGS BETTER RATHER THAN MAKE
THINGS WORSE BECAUSE OF
COLLATERAL DAMAGE THAT THEY MAY
CAUSE.
HERE ARE 12 FAMILIES OF
SOLUTIONS:
LARGE-SCALE COLLABORATION,
ADOPTION OF A REPLICATION
CULTURE AS A SINE QUA NON,
STANDARDIZATION OF ANALYSES AND
OF THRESHOLDS FOR CLAIMING
DISCOVERY OR SUCCESS,
IMPROVEMENT OF STUDY DESIGN
STANDARDS, IMPROVEMENTS IN THE
DISSEMINATION OF RESEARCH, AND
BETTER TRAINING OF THE
SCIENTIFIC WORKFORCE IN METHODS,
HOW THEY SHOULD BE APPLIED, AND
STATISTICAL LITERACY AND
NUMERACY.
I HOPE I'VE CONVINCED YOU BY NOW
THAT SOME OF THE MOST SUCCESSFUL
FIELDS IN TERMS OF CREDIBILITY
ARE THOSE THAT ADOPTED LARGE
SCALE COLLABORATION AND A
REPLICATION CULTURE.
THIS IS ONE EXAMPLE, MANHATTAN
PLOTS: GENETIC EPIDEMIOLOGY
TRANSFORMED INTO A FIELD WHERE
WE CAN SAFELY REPRODUCE SIGNALS.
I'M NOT DWELLING ON WHETHER THE
SIGNALS ARE USEFUL, WHETHER
THAT'S SOMETHING THAT YOU CAN
TAKE TO PATIENTS AND CHANGE
THEIR LIVES, BUT AT LEAST THEY
ARE THERE.
SOMETIMES THE SIGNALS MAY BE
TRUE BUT MAYBE NOT USEFUL.
THIS IS PERFECTLY FINE.
YOU KNOW, WE KNOW WHAT WE'RE
DEALING WITH.
WE KNOW WHAT IS THE TRUE
COMPLEXITY OF THE RESEARCH
QUESTIONS THAT WE'RE FACING.
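One reason genome-wide signals now reproduce safely is that the field converged on a shared, stringent discovery threshold plus mandatory independent replication. The arithmetic below is the standard genome-wide convention, not a figure from the talk.

```python
# The conventional genome-wide significance threshold: alpha = 0.05
# Bonferroni-corrected for roughly one million independent common
# variants across the genome.
alpha = 0.05
independent_tests = 1_000_000  # rough convention for common variants
genome_wide = alpha / independent_tests  # ~5e-8

# A claimed signal counts as credible only if it clears the same
# stringent bar, ideally again in independent cohorts, e.g.:
print(3e-9 < genome_wide, 1e-6 < genome_wide)  # True False
```

Contrast this with the candidate-gene era, where a single p < 0.05 in one small sample was enough to publish a claim.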
REGISTRATION CAN BE TRICKY IN
MANY SITUATIONS.
THE LAST THING I WOULD LIKE TO
SEE IS PEOPLE WHO ARE DOING VERY
INTERESTING EXPLORATORY RESEARCH
FEELING THAT THEY NEED TO SAY, I
HAVE REGISTERED MY STUDIES.
IF YOU WAKE UP AT 3:00 IN THE
MORNING WITH A FANCY IDEA, AND
YOU CANNOT GO BACK TO SLEEP AND
NEED TO RUN TO THE LAB TO TRY IT
OUT, AND YOU ARE JUST MESSING
AROUND RIGHT AND LEFT WITH
DIFFERENT POSSIBILITIES,
PROBABLY YOU DIDN'T WRITE A
PROTOCOL BEFORE YOU STARTED
DOING THAT, BUT SOMETHING
INTERESTING MAY COME OUT OF IT,
AND MAYBE YOU'RE THE NEXT
ALEXANDER FLEMING WITH
PENICILLIN.
HOW LIKELY IS THAT TO BE THE
CASE?
NOT VERY LIKELY, BUT THERE ARE
20 MILLION SCIENTISTS AUTHORING
SCIENTIFIC PAPERS.
SO A FEW AMONG THOSE 20 MILLION
WILL BE ALEXANDER FLEMING,
HOPEFULLY.
WHAT WE NEED TO DO IN THAT CASE
IS JUST SAY THAT THAT WAS
EXPLORATION; IT WAS MAD, WILD,
CRAZY, EXCITING, FASCINATING
EXPLORATION.
THAT'S WHAT CAME OUT.
NOW SOMEONE NEEDS TO REPRODUCE
IT IN A PROSPECTIVE
REPRODUCIBILITY EFFORT.
LEVEL 1 WOULD BE REGISTRATION OF
A DATASET.
I FEEL UNEASY WHEN I'M WORKING
IN A FIELD THAT I DON'T KNOW HOW
MANY DATASETS ARE OUT THERE THAT
COULD BE PROBED AND HOW MANY
DIFFERENT WAYS THEY CAN BE
PROBED.
REGISTERING A DATASET IS LIKE
REGISTERING A NUCLEAR ARSENAL.
SO I'M TELLING YOU THAT I HAVE
THIS BIG DATASET IN MY COMPUTER,
IT INCLUDES OBSERVATIONS ON TWO
MILLION PEOPLE, AND I HAVE 2,000
VARIABLES ON EACH ONE, WHICH
MEANS THAT TONIGHT IF I FEEL
DEPRESSED I CAN PRESS A BUTTON
ON SOME STATISTICAL SOFTWARE AND
RUN SO MANY BILLIONS OR
TRILLIONS OF CHI-SQUARE P-VALUES
AT YOU, SO IT'S ONE WAY TO
CONVEY THE BREADTH OF
POSSIBILITIES OF ANALYSES THAT
CAN BE DONE.
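The "nuclear arsenal" arithmetic can be checked directly: with 2,000 variables, as in the talk's example, there is one possible chi-square test per variable pair, and slicing the cohort into subgroups multiplies that count. The subgroup multiplier below is a hypothetical illustration.

```python
import math

# One chi-square test per pair of the dataset's 2,000 variables.
variables = 2_000
pairwise_tests = math.comb(variables, 2)
print(pairwise_tests)  # 1999000 possible pairwise tests

# Slice the two million people into subgroups (by age band, sex, site,
# ...) and the count multiplies into the billions.
subgroups = 1_000  # hypothetical number of subgroup slices
print(pairwise_tests * subgroups)  # 1999000000
```

Registering the dataset at least tells the community how large this arsenal of potential analyses is.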
LEVEL 2 WOULD BE REGISTRATION OF
A PROTOCOL.
IF A PROTOCOL EXISTS -- THERE'S
LOTS OF RESEARCH WHERE THERE'S
NO PROTOCOL.
SOMETIMES IT'S JUST EXPLORATION,
BUT MOST OF THE TIME A PROTOCOL
IS FEASIBLE.
TIME SPENT COMING UP WITH A
PROTOCOL IS ALWAYS WELL SPENT.
LEVEL 3 WOULD BE REGISTRATION OF
ANALYSIS PLAN IF THERE IS ONE.
SOMETIMES YOU HAVE TO SAY, THIS
IS THE ANALYSIS PLAN THAT I
COULD THINK OF AHEAD OF TIME,
AND THESE ARE SOME MODIFICATIONS
BECAUSE OF SOME PECULIARITIES I
HAD NOT ANTICIPATED.
LEVEL 4 WOULD BE REGISTRATION OF
BOTH ANALYSIS PLAN AND RAW DATA.
AND LEVEL 5 WOULD BE OPEN LIVE 
STREAMING, WHERE YOU ITERATIVELY
COMMUNICATE WITH THE SCIENTIFIC
COMMUNITY ABOUT WHAT ARE THE
EXPERIMENTS I'M PLANNING TO DO;
ONE RECEIVES FEEDBACK, THE PLANS
ARE REVISED, SHARED, DONE AGAIN
-- AN ITERATIVE OPEN PROCESS
WITH THE WHOLE COMMUNITY.
THIS IS PRETTY MUCH HOW THE
CLAIM BY NASA THAT THEY HAD
IDENTIFIED BACTERIA USING
ARSENIC INSTEAD OF PHOSPHORUS
FOR THE DNA BACKBONE, A PAPER IN
"SCIENCE," WAS REFUTED.
IN THE ABSENCE OF HAVING SOME
RULES IN THE SCIENCE GAME, IN
MANY FIELDS ANY RESULT CAN BE
OBTAINED.
WHAT YOU GET IS WHAT I CALL
EXTREME VIBRATION OF EFFECTS,
AND AT THE EXTREME YOU GET THE
JANUS PHENOMENON, NAMED AFTER
THE ROMAN GOD WHO COULD SEE IN
TWO OPPOSITE DIRECTIONS.
THESE ARE DATA FROM A NATIONAL
SURVEY, METICULOUSLY COLLECTED.
FOCUSING ON THE RIGHT PANEL --
THE HAZARD RATIO ON THE
HORIZONTAL AXIS AGAINST THE
P-VALUE -- THIS IS THE
ASSOCIATION OF ALPHA-TOCOPHEROL,
VITAMIN E, LEVELS WITH THE RISK
OF DEATH, WITH ONE MILLION
POINTS ON THE PLOT: ONE MILLION
RESULTS ONE COULD GET FROM THE
SAME DATASET, ON THE VERY SAME
QUESTION, JUST ANALYZING THE
DATA SLIGHTLY DIFFERENTLY.
FOR EXAMPLE, DEATH CAN BE
AFFECTED BY ZILLIONS OF FACTORS,
SO WHETHER YOU ACCOUNT FOR A
FACTOR OR NOT GIVES YOU TWO
CHOICES.
IF YOU HAVE 20 SUCH CHOICES TO
MAKE, THIS IS 2 TO THE 20th
POWER -- ABOUT ONE MILLION
DIFFERENT POSSIBILITIES OF
ANALYZING THE SAME DATA, ON THE
SAME DATASET, ON THE SAME
QUESTION.
70% OF THE TIME VITAMIN E
DECREASES RISK OF DEATH.
30% OF THE TIME, VITAMIN E
INCREASES RISK OF DEATH.
IF I HAVE A STRONG BELIEF ON
WHAT VITAMIN E SHOULD DO, I CAN
GET THAT RESULT.
NO MATTER WHAT.
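The combinatorics behind the vibration of effects can be sketched directly: each binary analytic choice doubles the number of model specifications, so roughly 20 such choices already give about a million ways to analyze the same question. Covariate names below are illustrative.

```python
from itertools import combinations

# Each binary analytic choice (adjust for a covariate or not) doubles
# the number of model specifications for the same question.
def n_specifications(n_binary_choices: int) -> int:
    return 2 ** n_binary_choices

print(n_specifications(19), n_specifications(20))  # 524288 1048576

# Enumerating the adjustment sets explicitly, for a 3-covariate toy case:
toy = ["age", "sex", "smoking"]
specs = [c for r in range(len(toy) + 1) for c in combinations(toy, r)]
print(len(specs))  # 8 adjustment sets = 2^3
```

Reporting only the single most favorable of these specifications, rather than the full distribution, is exactly the cherry-picking the talk warns about.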
THIS IS REALLY HAPPENING ON A
DAILY BASIS.
IT IS HAPPENING IN SOME OF THE
MOST INFLUENTIAL PAPERS THAT YOU
WILL SEE IN THE LITERATURE.
THIS IS A PAPER THAT LAST YEAR
WAS ONE OF THE 20 HIGHEST-IMPACT
PAPERS ACROSS ALL SCIENCE; IT
PRACTICALLY CONCLUDED THAT WITH
THREE CUPS OF COFFEE PER DAY
YOUR RISK OF DEATH DECREASES BY
17%.
IF THEY GIVE ME THE DATASET, I
COULD MAKE IT INCREASE BY 50% --
IT'S AN OPEN PLEDGE.
TRANSPARENCY: HOW CAN WE FIND
OUT WHETHER WE CAN EVEN TRUST
THE DATA?
THAT THERE'S NOT AN "A" MISSING
FROM "TRANSPARENCY," FOR
EXAMPLE.
WE NEED TO FIND WAYS TO MAKE
SHARING EASIER.
THIS IS A PIVOTAL STUDY, STUDY
329 -- SOUNDS LIKE A SUBMARINE,
BUT IT'S A RANDOMIZED TRIAL.
WHEN IT WAS DONE AND PUBLISHED
IN 2001 BY SMITHKLINE BEECHAM,
IT SHOWED THE DRUG WAS EFFECTIVE
AND SAFE FOR MAJOR DEPRESSION IN
ADOLESCENTS.
15 YEARS LATER, A REANALYSIS
CONCLUDED IT WAS NOT EFFECTIVE
AND NOT SAFE FOR MAJOR
DEPRESSION IN ADOLESCENTS.
HOW OFTEN DOES THAT HAPPEN?
A FEW YEARS AGO WE LOOKED AT
REANALYSES THAT HAD BEEN DONE ON
THE SAME CLINICAL QUESTION FROM
THE SAME CLINICAL TRIAL DATASET.
THESE ARE REANALYSES RATHER THAN
REPLICATIONS, BUT IF WE CANNOT
REANALYZE AND HAVE SOME
CONFIDENCE, WHY SHOULD WE EVEN
TAKE THE SECOND STEP OF DOING A
SEPARATE, INDEPENDENT
REPLICATION IN A NEW STUDY?
WE FOUND 37 SUCH REANALYSES AND
35% OF THE TIME THE CONCLUSION,
THE MAIN CONCLUSION OF THE
REANALYSIS WAS DIFFERENT
COMPARED TO THE CONCLUSION OF
THE ORIGINAL PAPER.
THIS TREATMENT SHOULD BE USED;
NO, THIS TREATMENT SHOULD NOT BE
USED; IN THIS SUBGROUP; NO, IN
THAT SUBGROUP.
WAS IT RESEARCH PARASITES WHO
HAD RUN THESE REANALYSES?
WAS IT ROGUE ANALYSTS TRYING TO
MAKE A CAREER OF THEMSELVES OR
PUT THE GREAT ORIGINAL
INVESTIGATORS TO SHAME?
ALMOST ALWAYS IT WAS THE SAME
INDIVIDUAL INVESTIGATORS WHO
REPUBLISHED, BUT IN A
PUBLICATION ENVIRONMENT WHERE,
IF THEY WERE TO SAY, I TRIED TO
REANALYZE MY DATA AND FOUND
EXACTLY THE SAME THING AND
REACHED THE SAME CONCLUSION,
SOMEONE WOULD TELL THEM, WHY DID
YOU WASTE YOUR TIME?
THIS IS DUPLICATE PUBLICATION.
SO, OUR INCENTIVE SYSTEM SELECTS
FOR CONFUSION, SELECTS FOR
DISCORDANT RESULTS, EVEN FOR
THINGS THAT ARE DONE BY THE SAME
INVESTIGATORS.
RECENTLY WE REVISITED THAT
PATTERN, LOOKING AT ANOTHER
EXTREME, WHERE REANALYSIS
ACTUALLY SHOULD HAVE BEEN THE
DEFAULT.
PLOS MEDICINE AND BMJ HAVE
POLICIES IN PLACE THAT IF YOU
PUBLISH A RANDOMIZED TRIAL, YOU
NEED TO MAKE NOT ONLY THE FULL
PROTOCOL BUT ALL THE DATA
AVAILABLE TO ANYONE WHO ASKS FOR
THEM.
ALONG WITH COLLEAGUES FROM MY
METRICS TEAM, WE INVITED
INVESTIGATORS WHO PUBLISHED
UNDER THIS POLICY TO SHARE THEIR
DATA WITH US, AND WE PROMISED WE
WOULD REANALYZE THEM AT NO COST,
FOR FREE.
IT'S LIKE GETTING AN INVITATION
FROM THE IRS.
[LAUGHTER]
BUT I'M REALLY GLAD THAT ALMOST
50% OF THESE INVESTIGATORS
ACTUALLY DID SEND US THEIR DATA,
AND THEY ALSO WERE VERY HELPFUL
WITH THAT AND WE REANALYZED AND
FOUND A FEW ERRORS BUT NOTHING
MAJOR.
THE CONCLUSIONS WOULD STILL
REMAIN THE SAME.
SO ONE EXTREME IS A HIGHLY
SELECTIVE ENVIRONMENT WHERE
PEOPLE ARE JUST TRYING TO
IMPRESS AND SHARE VERY LITTLE,
AND THE OTHER IS AN ENVIRONMENT
WHERE ALMOST 50% ARE WILLING TO
SHARE, THEY ARE VERY OPEN TO
HAVING THEIR DATA REANALYZED,
AND EVERYBODY LOOKS FINE.
ONE MIGHT ARGUE THAT MAYBE THE
PEOPLE WHO DID CONTRIBUTE DATA
TOOK AN EXTRA LOOK, AND IF THEY
FOUND MAJOR ERRORS, MADE SURE
THEY SENT US A VERSION THAT
WOULD BE COMPATIBLE WITH THE
PUBLISHED RESULTS.
I DON'T WANT TO BECOME PARANOID.
I BELIEVE THAT IF WE CREATE A
CULTURE OF SHARING, AND IF
PEOPLE ARE WELL TRAINED, MOST OF
THE RESULTS WE GET WILL
HOPEFULLY BE REANALYZABLE AND
REPRODUCIBLE AT THAT LEVEL.
SO HOW DO WE IMPROVE SHARING?
SHARING IS A CHALLENGE IN
ITSELF, AND WHEN WE TRIED TO
RETRIEVE THE DATASETS BEHIND THE
MOST HIGHLY CITED PAPERS IN
PSYCHOLOGY AND PSYCHIATRY, WE
MET WITH QUITE A LOT OF
RESISTANCE.
YOU KNOW, THESE TRIALISTS AND
THESE STUDIES WERE NOT BOUND BY
THE PLOS MEDICINE AND BMJ RULES,
SO THEY MIGHT OR MIGHT NOT SHARE
THEIR DATA WITH US.
WE REALIZED THAT IN SOME CASES
THEY HAD MADE THEIR DATA
AVAILABLE ALREADY, AND IN A FEW
MORE THEY WERE WILLING TO SHARE
THAT INFORMATION.
BUT VERY OFTEN THEY COULD NOT OR
DID NOT.
WHAT ARE THE REASONS FOR THAT?
THE MOST COMMON REASON WAS THAT
IT WAS OUTSIDE OF THE
RESEARCHERS' CONTROL.
I HAVE COME ACROSS MANY, MANY
SITUATIONS WHERE RESEARCHERS DID
NOT CONTROL THEIR OWN DATA.
SOMETIMES RESEARCHERS PUBLISHED
AS FIRST OR LAST AUTHORS OR
BOTH, AND THEY HAVE NEVER SEEN
THE DATA THEY PUT THEIR NAMES
ON.
IT'S REALLY SCARY.
THERE WERE LEGAL AND ETHICAL
CONCERNS: THE CONSENT DID NOT
ALLOW IT.
OR THE STUDY WAS DONE IN THE
'90s OR SO AND THE DATA NO
LONGER EXIST; THE CLASSIC
EXAMPLE, A FAMOUS QUOTE, IS THAT
THE DATA WERE EATEN BY TERMITES.
OR THE RESEARCHERS ARE STILL
USING THE DATA.
FOR HOW LONG? SIX MONTHS, TWO
YEARS, TWENTY YEARS?
ARE WE MAKING PROGRESS IN
SHARING?
WE ARE.
A FEW YEARS AGO, ALONG WITH
SHERI SCHULLY FROM NIH, WE
LOOKED AT REPRODUCIBILITY AND
TRANSPARENCY PRACTICES AND FOUND
HARDLY ANYTHING WAS SHARED
BETWEEN 2000 AND 2014.
SOME FIELDS LIKE GENOMICS ARE
DOING THIS, BUT ZOOMING OUT TO
THE BIG PICTURE, IT WAS VERY,
VERY UNCOMMON.
CONVERSELY, WHEN WE LOOKED AT
2015 TO 2017 THERE WAS REAL
PROGRESS.
AND IN SOME CASES THAT PROGRESS
CAN BE EXPLAINED BECAUSE
JOURNALS DID CHANGE POLICIES,
FOR EXAMPLE IN PSYCHOLOGY, WHERE
SOME SWITCHED TO ENCOURAGING
ROUTINE SHARING, AND YOU SEE A
RAPIDLY INCREASING RATE OF
SHARING OF DATASETS IN THE
PAPERS PUBLISHED IN THOSE
JOURNALS, BUT IN BIOMEDICINE I
THINK IT'S BEEN MORE OF A
DIVERSE MOVEMENT WHERE MULTIPLE
JOURNALS, FIELDS, INSTITUTIONS
ARE INCENTIVIZING OR
FACILITATING SHARING, GOING FROM
0% TO SOMETHING LIKE 25% BY
2017.
WE ALSO SEE SOME OTHER
CONCOMITANT CHANGES IN
TRANSPARENCY INDICATORS.
DISCLOSURE OF FUNDING, FOR
EXAMPLE, OR DISCLOSURE OF
CONFLICT OF INTEREST HAS GONE UP
OVER THE YEARS.
MOST PEOPLE STILL CLAIM THEY
HAVE SOMETHING NOVEL TO SAY;
IF YOU LOOK AT THE ABSTRACTS,
IT'S VERY PROMINENT.
BUT THERE ARE MORE PEOPLE WHO
SAY, I'M TRYING TO REPLICATE
SOMETHING, OR AT LEAST TRYING TO
DO SOMETHING NEW AND AT THE SAME
TIME REPLICATE EXISTING
KNOWLEDGE.
COMPUTATIONAL METHODS CAN
FACILITATE MUCH OF OUR
REPRODUCIBILITY QUEST.
A COUPLE OF YEARS AGO WE
PUBLISHED GUIDELINES ON WAYS
THAT JOURNALS, INVESTIGATORS,
AND INSTITUTIONS COULD MOVE
FORWARD IN IMPROVING
REPRODUCIBILITY FOR
COMPUTATIONAL METHODS, INCLUDING
SOFTWARE, SCRIPTS, AND LINKS TO
DATA.
VERY OFTEN YOU SEE A LINK, AND
YOU CLICK ON THAT LINK AND
THERE'S NOTHING THERE.
YOU GET AN ERROR SIGNAL.
SO THERE'S SOME VERY EASY
INTERVENTIONS THAT CAN IMPROVE
AVAILABILITY OF THESE FUNCTIONAL
LINKS, BUT THERE'S ALSO MORE
SOPHISTICATED ONES THAT CAN TAKE
US SUBSTANTIALLY FURTHER.
BETTER STATISTICS AND METHODS,
WE HAVE SEEN A TRANSFORMATION OF
RESEARCH OVER THE LAST SEVERAL
DECADES, AND DATA SCIENCE HAS
TAKEN A CENTRAL ROLE ACROSS
MULTIPLE SCIENTIFIC FIELDS.
IS THAT NEW?
WELL, SCIENCE HAS ALWAYS BEEN
ABOUT DATA.
BUT I THINK THAT WHILE IN THE
PAST IT MIGHT HAVE BEEN EASY TO
WORK WITH SMALL DATA SETS FOR
DESCRIPTIVE PURPOSES, CURRENTLY
YOU REALLY NEED A LICENSE TO
KILL.
A LICENSE TO ANALYZE.
AND MOST OF THE PAPERS THAT I
SEE ARE PROBABLY DONE BY
INVESTIGATORS WHO DON'T HAVE
THAT LICENSE TO ANALYZE.
IT'S VERY UNCOMMON TO SEE
TRANSPARENT STATISTICAL ANALYSIS
PLANS.
WE LAMENT THAT WE DON'T INVEST
MUCH IN TRAINING OUR
INVESTIGATORS IN STATISTICS AND
ISSUES OF DESIGN, AND THAT GOOD
STUDY DESIGNS ARE UNDERUTILIZED.
I MENTIONED THE VERY SIMPLE
PRACTICES OF RANDOMIZATION AND
BLINDING OF INVESTIGATORS, SO
UNDERUSED.
HOW DO WE DO THAT?
JUST USE A CHECKLIST, ADD A
LEVEL OF BUREAUCRACY, AND SAY
THAT FILLING IN THAT CHECKLIST
IS ENOUGH?
IF SOMEONE HAS DONE SOMETHING IN
A WAY THAT IS VERY SUBSTANDARD,
IF IT'S REALLY SILLY, HOW LIKELY
IS IT THAT IT WILL BE
ACKNOWLEDGED, RATHER THAN
SOMEONE JUST TICKING THE
CHECKLIST: YES, I'M OKAY, I WAS
USING THE RIGHT STATISTICAL
METHODS?
IT'S A CONSTANT TENSION.
THERE ARE MANY PLEAS TO TRY TO
CHANGE THE WAY WE MAKE
STATISTICAL INFERENCES, AND I'M
NOT GOING TO SPEND MUCH TIME ON
THEM BECAUSE EACH ONE OF THEM
HAS A DIFFERENT PHILOSOPHY
BEHIND IT.
ONE SUCH PLEA IS TO BECOME MORE
STRINGENT.
MANY FIELDS THAT HAVE IMPROVED
THEIR TRACK RECORD DID BECOME
MORE STRINGENT; GENETIC
EPIDEMIOLOGY MOVED FROM 0.05 TO
10 TO THE MINUS 9, AND THINGS
SEEM TO BE WORKING BETTER IN
TERMS OF REPRODUCIBILITY.
FOR FIELDS WITH P-VALUE
THRESHOLDS OF 0.05, YOU COULD
MOVE TO 0.005 FOR STATISTICAL
SIGNIFICANCE; THAT WOULD
IMMEDIATELY ELIMINATE PROBABLY A
LARGE SEGMENT OF THE NOISE.
BUT IT WOULD ALSO TAKE AWAY SOME
GENUINE SIGNAL.
SO THIS NEEDS TO BE BALANCED IN
EACH FIELD IN TERMS OF WHETHER
IT'S A GOOD IDEA OR NOT.
MOST FIELDS ARE STILL USING NULL
HYPOTHESIS SIGNIFICANCE TESTING,
WHICH IS NOT A GOOD CHOICE, FOR
EXAMPLE, FOR EVALUATING A
THERAPY OR FOR MINING ELECTRONIC
HEALTH RECORDS OR OTHER BIG
DATA.
WE NEED TO FIND STATISTICAL
METHODS THAT ARE FIT FOR
PURPOSE, AND VERY OFTEN THESE
MAY BE BAYESIAN, OR MAY BE FALSE
DISCOVERY RATE BASED.
WE DON'T INVEST IN TRAINING
INVESTIGATORS AND RETRAINING
THEM WITH CONTINUING EDUCATION
ON STATISTICAL METHODS AND
DESIGN ISSUES.
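THE FALSE-DISCOVERY-RATE APPROACH MENTIONED ABOVE CAN BE SKETCHED WITH THE BENJAMINI-HOCHBERG STEP-UP PROCEDURE; THE P-VALUES BELOW ARE MADE UP FOR ILLUSTRATION:

```python
def benjamini_hochberg(pvals, q=0.05):
    # Benjamini-Hochberg step-up: control the expected fraction of
    # false discoveries among the claims made, rather than the
    # per-test false positive rate.
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    m = len(pvals)
    keep = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            keep = rank  # largest rank passing its threshold wins
    return sorted(order[:keep])  # indices of the discoveries

pvals = [0.001, 0.008, 0.039, 0.041, 0.30, 0.62]
print(benjamini_hochberg(pvals, q=0.05))  # -> [0, 1]
```

NOTE THAT THE TWO P-VALUES JUST UNDER 0.05 ARE NOT DECLARED DISCOVERIES HERE, WHILE A PLAIN PER-TEST 0.05 CUTOFF WOULD HAVE ACCEPTED ALL FOUR.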
WE JUST TRY TO CATCH UP WITH THE
NEXT TECHNICAL TOOL THAT MAY BE
AVAILABLE, BUT NOT WITH THE CORE
OF THE SCIENTIFIC METHOD.
CONFLICTS OF INTEREST: I THINK
THERE IS IMPROVEMENT IN THE
TRANSPARENCY OF REPORTING
CONFLICTS OF INTEREST, BUT VERY
OFTEN I WONDER, IS TRANSPARENCY
ENOUGH?
CAN WE LEAVE THE GENERATION OF
ORIGINAL KNOWLEDGE AND
REPLICATION TO CONFLICTED
STAKEHOLDERS?
WHO ARE THE STAKEHOLDERS WHO
SHOULD BE RUNNING SENSITIVE
STUDIES LIKE RANDOMIZED TRIALS,
META-ANALYSES, COST-EFFECTIVENESS
ANALYSES, AND GUIDELINES?
NIH HAS BEEN SHIFTING AWAY FROM
MOST OF THOSE.
AND THERE'S EVIDENCE: IF YOU
LOOK AT TRIALS SUPPORTED BY NIH,
BEFORE REGISTRATION A GOOD
PROPORTION HAD STATISTICALLY
SIGNIFICANT RESULTS.
AFTER REGISTRATION, THAT HAPPENS
UNCOMMONLY.
SHOULD PUBLIC ENTITIES LIKE NIH
RESUME AND EXPAND THEIR ROLE
ABOUT SUPPORTING SENSITIVE
RESEARCH WHERE CONFLICTS REALLY
NEED TO BE AVOIDED THOROUGHLY IF
WE WANT FULL TRUST IN THEM?
SHOULD WE WAIT FOR PERFECTION?
NO.
SCIENCE WILL NEVER BE PERFECT.
IT'S THE BEST THING THAT HAS
HAPPENED TO HUMAN BEINGS, TO
HOMO SAPIENS, BUT IT'S A PROCESS
IN EVOLUTION.
WE NEED TO USE THE BEST SCIENCE
WE HAVE.
I HEAR, OKAY, LET'S GET RID OF
ALL THAT JUNK, THAT HORRIBLE
RESEARCH, AND WE KNOW NOTHING.
THAT'S NOT TRUE.
WE HAVE LIGHTS.
WE HAVE WONDERFUL AMPHITHEATERS,
THE PROJECTOR IS WORKING.
I CAN MOVE MY SLIDES.
LOTS OF THINGS HAVE HAPPENED,
AND I THINK WE NEED TO DEFEND
SCIENCE AND THERE WILL BE MANY
ANTI-SCIENCE VOICES TRYING TO
DISMANTLE THE SCIENTIFIC EFFORT.
THERE ARE TWO WAYS WE CAN GO
ABOUT IT.
ONE IS TO SAY SCIENCE IS
PERFECT, AND PROBABLY THAT'S NOT
GOING TO LEAVE US VERY MUCH ROOM
TO PLAY, BECAUSE VERY QUICKLY WE
WILL COLLIDE WITH THE SCIENTIFIC
METHOD ITSELF, WHICH SAYS THAT
VERY OFTEN WE MAY BE WRONG.
OR WE CAN SAY SCIENCE IS OUR
BEST SHOT, AND THIS IS WHAT WE
NEED TO DEFEND.
SOMETIMES OUR BEST SHOT WILL
HAVE LESS CREDIBILITY COMPARED
TO OTHER TIMES.
WE KNOW WITH HIGH CERTAINTY
ABOUT CLIMATE CHANGE, WE KNOW
WITH HIGH CERTAINTY THAT TOBACCO
IS GOING TO KILL PEOPLE.
WE KNOW FAR LESS ABOUT WHETHER
BROCCOLI IS GOING TO MAKE ME
LIVE LONGER.
SO, WE NEED TO BE TRANSPARENT
ABOUT WHAT WE KNOW AND DO NOT
KNOW.
WE NEED MORE RESEARCH ON
RESEARCH.
I'M CLEARLY BIASED BECAUSE THIS
IS WHERE I AM INVESTING MY
EFFORT, SO PROBABLY I'M JUST
ASKING FOR MORE FUNDING, BUT I
THINK WE NEED TO STUDY HOW
EXACTLY TO EVALUATE OUR RESEARCH
PRACTICES.
WE NEED TO FIND WAYS THAT YOU
CAN REFUTE EVERYTHING THAT I
TOLD YOU TODAY WITH MORE
EMPIRICAL DATA AND BETTER
SCIENCE.
WE NEED TO FIND HOW WE CAN BEST
PERFORM RESEARCH, COMMUNICATE
RESEARCH, VERIFY RESEARCH,
EVALUATE RESEARCH AND REWARD
RESEARCH.
WHAT SCIENTIFIC WORKFORCE AND
WHAT WORLD OF SCIENCE ARE WE
ENVISIONING? WE CAN MODEL THAT;
WE CAN MODEL SCIENCE IN 2030 OR
2040.
IS IT GOING TO BE ACCURATE?
WELL, I THINK THAT WE MAY WELL
BE WRONG.
BUT THIS IS ONE SUCH MODEL OF
SCIENCE IN THE FUTURE.
WE USED 11 EQUATIONS TO TRY TO
DESCRIBE A SIMPLIFIED UNIVERSE
OF SCIENCE WHERE YOU HAVE THREE
TYPES OF SCIENTISTS.
THE DILIGENT ARE THE MAJORITY.
YOU HAVE THE CARELESS COHORT OF
SCIENTISTS WHO MIGHT BE CUTTING
A FEW CORNERS, MAYBE NOT VERY
WELL TRAINED, MAY BE FOLLOWING
SUBOPTIMAL RESEARCH PRACTICES. 
HOW MANY ARE THERE?
THERE HAVE BEEN SURVEYS ABOUT
THAT: WHEN YOU ASK PEOPLE, ARE
YOU CUTTING CORNERS, THE ANSWER
IS USUALLY NO.
WHEN YOU ASK PEOPLE, DO YOU KNOW
OTHER SCIENTISTS IN YOUR
ENVIRONMENT WHO ARE CUTTING
CORNERS, THE ANSWER IS ALMOST
ALWAYS YES.
[LAUGHTER]
BUT LET'S SAY THESE ARE THE
MINORITY, AND THEN WE HAVE THE
UNETHICAL COHORT, CLEAR FRAUDS,
YOU KNOW, CREATING DATA THAT
DON'T EXIST.
FRAUD IS VERY UNCOMMON; WE'RE
TALKING LESS THAN 1%.
IF YOU INCENTIVIZE ALL THESE
COHORTS WITH THE SAME
INCENTIVES, WHERE IF YOU
DISCOVER SOMETHING, MAKE A
CLAIM, AND PUBLISH A PAPER IN
"NATURE" YOU WILL BE FINE, AND
YOU DON'T HAVE DIFFERENTIAL
INCENTIVES BASED ON
REPRODUCIBILITY, WHAT YOU'LL GET
IS THAT THE UNETHICAL AND
CARELESS COHORTS TAKE OVER.
THE REASON IS THAT THEY CAN GET
THERE FASTER, WITH FEWER
RESOURCES, BY CUTTING CORNERS.
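THE TAKEOVER DYNAMIC CAN BE SKETCHED WITH A TOY REPLICATOR-STYLE SIMULATION; THIS IS NOT THE 11-EQUATION MODEL FROM THE TALK, AND ALL THE RATES BELOW ARE MADE UP FOR ILLUSTRATION:

```python
def evolve(share, payoff, generations=30):
    # Replicator-style update: each cohort's share of the workforce
    # grows in proportion to its reward per generation.
    for _ in range(generations):
        fitness = {k: share[k] * payoff[k] for k in share}
        total = sum(fitness.values())
        share = {k: v / total for k, v in fitness.items()}
    return share

start = {"diligent": 0.90, "careless": 0.09, "unethical": 0.01}
pubs = {"diligent": 2.0, "careless": 5.0, "unethical": 8.0}

# Reward raw publication counts only: the corner-cutters compound
# their advantage and the diligent share collapses.
print(evolve(start, pubs))

# Discount rewards by how often each cohort's claims reproduce,
# and the diligent cohort dominates instead.
repro = {"diligent": 0.90, "careless": 0.30, "unethical": 0.05}
adjusted = {k: pubs[k] * repro[k] for k in pubs}
print(evolve(start, adjusted))
```

THE POINT OF THE SKETCH IS ONLY THE DIRECTION OF THE EFFECT: WITHOUT REPRODUCIBILITY-BASED REWARDS, SPEED WINS.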
SO WE NEED TO REENGINEER THE
REWARD SYSTEM.
I'LL LEAVE YOU WITH A COUPLE
SLIDES ON HOW WE DO THAT.
WE TRY TO INCENTIVIZE
PRODUCTIVITY, AND PRODUCTIVITY
IS WONDERFUL, BUT WE NEED TO
THINK ABOUT THE WHOLE
ELECTROCARDIOGRAM: PRODUCTIVITY,
QUALITY, REPRODUCIBILITY,
SHARING AND TRANSLATIONAL
IMPACT.
PQRST.
WE NEED TO FIND OPPORTUNITIES TO
CHANGE THE WAY WE DO SCIENCE IN
OUR EVERYDAY ENVIRONMENT.
IT NEEDS TO BE A GRASS ROOTS
MOVEMENT.
WE CANNOT JUST WAIT FOR SOME
KING OR QUEEN OF SCIENCE TO
IMPOSE WITH HIS OR HER AUTHORITY
WHAT EXACTLY SHOULD BE DONE.
I THINK IT'S SCIENTISTS WHO NEED
TO REALIZE WHAT MAKES THEIR
SCIENTIFIC WORK MORE CREDIBLE,
MORE REPRODUCIBLE, MORE
APPLICABLE.
WE NEED TO CONVINCE OTHER
STAKEHOLDERS.
SCIENTISTS WANT TO PUBLISH A LOT
AND WANT FUNDING, BUT THERE ARE
ALSO THE PRIVATE INVESTORS, THE
PRIVATE NOT-FOR-PROFIT FUNDERS,
EDITORS, PUBLISHERS, SOCIETIES,
UNIVERSITIES, RESEARCH
INSTITUTIONS, NON-SCIENTIFIC
STAFF, HOSPITALS, INSURANCE
COMPANIES, GOVERNMENTS, FEDERAL
AUTHORITIES, PEOPLE.
SOME OF THEM WANT TO SEE PAPERS,
OTHERS WANT TO SEE FUNDING.
SOME WANT TO SEE THINGS THAT
WORK, OTHERS WANT TO MAKE
PROFIT.
IT'S ALL FINE.
BUT IT NEEDS TO BE INTEGRATED TO
TRY TO GET THE BEST POSSIBLE
SCIENCE.
TO CONCLUDE, I THINK THAT THE
PRESUMED DOMINANCE OF ORIGINAL
DISCOVERY OVER REPLICATION IS AN
ANOMALY.
IT'S REALLY THE EXCEPTION.
ORIGINAL DISCOVERY CLAIMS
TYPICALLY HAVE SMALL OR NEGATIVE
VALUE, AND SCIENCE BECOMES
WORTHY MOSTLY BECAUSE OF
REPLICATION.
THERE IS SUBSTANTIAL ROOM FOR
IMPROVEMENT; THAT DOESN'T MEAN
SCIENCE IS NOT GOOD.
IT'S THE BEST THING THAT CAN
HAPPEN TO HUMANS, BUT WE CAN
MAKE IT BETTER.
THERE ARE MANY POSSIBLE
INTERVENTIONS THAT MAY IMPROVE
EFFICIENCY AND TRANSPARENCY, AND
OPENNESS AND SHARING ARE LIKELY
TO HELP, BUT THE DETAILS OF HOW
TO DO IT CAN BE IMPORTANT, AND
WE NEED TO FIND OUT HOW THOSE
DETAILS PLAY OUT IN DIFFERENT
SETTINGS.
A MILLION THANKS TO YOU FOR
LISTENING AND SPECIAL THANKS TO
A NUMBER OF COLLABORATORS THAT
HAVE JOINED FORCES WITH ME OVER
THE YEARS TO GENERATE SOME OF
THE EMPIRICAL EVIDENCE I SHARED
WITH YOU TODAY. 
THANK YOU.
[APPLAUSE] 
>> I'M ONE OF THE BIGGEST FANS YOU
HAVE.
MY LAB IS DEFINITELY READING
YOUR PAPERS.
I THINK IT'S ABSOLUTELY TRUE
THAT THE INCENTIVE SYSTEM IS
COMPLETELY WRONG.
SCIENCE, AS YOU SHOWED IT, IT'S
REALLY A SYSTEM, RIGHT?
AND IT WORKS BASED ON THE
INCENTIVES THAT ARE PUT INTO IT.
I WAS RECENTLY TENURED HERE.
IN TEN YEARS OF MY TENURE TRACK,
NOBODY EVER GAVE ME ANY CREDIT
FOR REPRODUCIBILITY.
ALMOST EVERY TIME WE PRESENT
DATA AT CONFERENCES, I THINK WE
ARE ENEMIES OF THE MAJORITY.
I MEAN, MY LAB PROBABLY HASN'T
PUBLISHED A STUDY IN THE PAST
FIVE YEARS WHERE WE DIDN'T
INCLUDE AN INDEPENDENT
VALIDATION COHORT.
AND WE USUALLY GET FROM
REVIEWERS, OH, THEY SHOULD HAVE
A SECOND INDEPENDENT VALIDATION
COHORT, RIGHT?
WHEN, YOU KNOW, 90% OF
EVERYTHING THAT IS PUBLISHED OR
PRESENTED DOESN'T HAVE EVEN THE
FIRST ONE, RIGHT?
SO UNLESS WE CHANGE THE
INCENTIVES, UNLESS WE USE, FOR
EXAMPLE, H FACTORS THAT FACTOR
IN WHAT KIND OF METHODOLOGICAL
ADVANCES WE HAVE USED, OR
CHECKLISTS OF WHAT WE DID AND
WHETHER OUR WORK IS
REPRODUCIBLE, IRRESPECTIVE OF
WHETHER IT'S POSITIVE OR
NEGATIVE, SCIENCE WILL CONTINUE
THE WAY IT'S GOING.
>> I SYMPATHIZE WITH WHAT YOU
DESCRIBE AND THINK THIS IS A 
FEELING MANY SCIENTISTS IN
DIFFERENT FIELDS CONVEY TO ME
ALL THE TIME BUT I THINK THERE
IS PROGRESS.
YOU HAVE FIELDS WITH A DOMINANT
VIEW THAT THIS IS IMPORTANT TO DO,
AND I THINK THAT THESE FIELDS
ARE LIKELY TO BE MORE SUCCESSFUL
IN THE LONG TERM, AND IT'S NOT
GOING TO HAPPEN OVERNIGHT.
I THINK THAT IF WE FOCUS ON
TRAINING YOUNGER SCIENTISTS AND
HOPEFULLY RETRAINING SOME OLDER
ONES ABOUT WHAT REALLY MATTERS
AND WHY WE'RE DOING THIS AND WHY
IT IS IMPORTANT TO DO GOOD
SCIENCE RATHER THAN SLOPPY
SCIENCE, I THINK THIS IS LIKE AN
EVERYDAY EFFORT, AN EVERYDAY
STRUGGLE, NOT ONE COURSE ONE
WOULD TAKE.
I HEAR THE QUESTION: IS THERE
ONE COURSE I CAN TAKE?
NO, THIS IS SCIENCE IN
CONTINUITY, YOUR EVERYDAY METHOD
AND APPLICATION, AND THERE'S NO
SINGLE COURSE THAT CAN REPLACE
YOUR EXPERIENCE AS A SCIENTIST.
>> THANK YOU FOR THAT TALK.
I AGREE IT'S GOING TO TAKE A
CULTURAL CHANGE TO MOVE FORWARD.
CONSIDERING WE'RE AT NIH AND
THERE MIGHT BE PROGRAM STAFF IN
THE AUDIENCE, DO YOU HAVE
SUGGESTIONS ON THE FUNDING SIDE,
AS A MAJOR FUNDER OF BIOMEDICAL
RESEARCH?
>> SUGGESTIONS APPLIED TO NIH:
NIH HAS TREMENDOUS POWER TO
FACILITATE THESE PRINCIPLES AND
HAS MOVED IN THAT DIRECTION ON
MANY FRONTS.
PROBABLY NOT ALL THE MOVEMENTS
HAVE BEEN EQUALLY
EVIDENCE-BASED, BUT THERE ARE
PEOPLE WHO ARE RECEPTIVE AND
WANT TO CHANGE THE WAY THINGS
ARE DONE.
CLEARLY, IF YOU HAVE NIH GIVING
PRIORITY TO INCENTIVE
STRUCTURES, TO OPENNESS AND
SHARING, AND FOCUSING ON GOOD
WORK AS OPPOSED TO JUST QUICK
AND SUCCESSFUL WORK, IT CAN HAVE
A TREMENDOUS IMPACT.
>> YES, THERE WAS THE
HIGH-PROFILE CASE OF THERANOS,
THE BIOTECH COMPANY;
STATISTICIANS CALLED THEM OUT
AND SAID IT WAS NOT GOING TO
WORK, AND THAT WAS DONE WITH
INCOMPLETE KNOWLEDGE OF THE
UNDERLYING DATA FROM THE
COMPANY.
SO THE QUESTION IS, WHAT DOES
GUT INSTINCT TELL YOU WITH
RESPECT TO HOW SOMETHING IS
GOING TO TURN OUT?
IT WOULD BE INTERESTING IF YOU
TOOK THE GUT INSTINCT AND SAID,
I THINK THIS IS GOING TO BE
PROFOUND, VERSUS THIS WILL TURN
OUT TO BE FAKE, REGARDLESS OF
THE UNDERLYING STATISTICAL
VALUE.
COULD YOU COMMENT ON THAT?
I THINK THAT WOULD BE REALLY
IMPORTANT IN TERMS OF BIOTECH
I.P.O.s.
>> TRANSPARENCY APPLIES JUST AS
WELL IN THE CASE OF BIOTECH AND
STARTUPS AND UNICORNS.
ACTUALLY, I THINK I WAS THE
FIRST TO WRITE A CRITICAL PAPER
ON THERANOS, IN JAMA, A YEAR
BEFORE JOHN CARREYROU STARTED
PUBLISHING HIS "WALL STREET
JOURNAL" INVESTIGATIONS.
I SENT THAT TO JAMA IN 2014; IT
WENT THROUGH
REVIEW AND LEGAL REVIEW BECAUSE
I WAS CHALLENGING THE
HIGHEST-VALUATION START-UP IN
THE COUNTRY, VISITED BY VICE
PRESIDENTS; ALL THE INFLUENTIAL
PEOPLE WERE ON THE ADVISORY
BOARD, AND EVERYBODY WAS SO
EXCITED.
I WAS SAYING THEY HAVE NO
EVIDENCE TO SUPPORT WHAT THEY
CLAIM.
I DON'T SEE ANY PAPER THAT THEY
HAVE PUBLISHED.
I SAID, I THINK THEIR VALUATION
IS $9 BILLION, BUT IT COULD BE
$9, ACTUALLY.
THE PAPER WAS EVENTUALLY
PUBLISHED IN JAMA.
I GOT A LOT OF PUSHBACK AT THAT
TIME.
I HEARD FROM THEIR GENERAL
COUNSEL ASKING ME TO RECANT,
SAYING THAT THEY WERE GETTING
FDA APPROVAL.
WHEN THEY DID GET THEIR FIRST
AND ONLY FDA APPROVAL, ALL OF
THAT STILL WAY BEFORE THE "WALL
STREET JOURNAL," I WAS TOLD
AGAIN TO RECANT AND TO WRITE AN
EDITORIAL WITH THEIR CEO SAYING
THAT I WAS WRONG.
AND I REMEMBER THE WASHINGTON
POST WRITING A STORY LIKE, THIS
INSANELY INFLUENTIAL STANFORD
PROFESSOR IS ASKING TOO MUCH OF
THERANOS, AND WHAT A SHAME, HE
DOESN'T UNDERSTAND HOW
INNOVATION HAPPENS.
PEOPLE NOW THINK THERANOS WAS AN
EXCEPTION: IT WAS A FRAUD, AND
EVERYBODY ELSE IS FINE.
I DON'T BELIEVE THAT.
A COUPLE OF MONTHS AGO WE
PUBLISHED A PAPER LOOKING AT
EVERY UNICORN IN THE HEALTH CARE
SPACE, AND THE MAJORITY OF THEM
LOOKED PRETTY MUCH LIKE THERANOS
IN TERMS OF THEIR TRANSPARENCY
AND THE AVAILABILITY OF
PUBLISHED, PEER-REVIEWED
SCIENCE.
SO I'M NOT SAYING IT'S FRAUD.
BUT UNLESS WE IMPROVE
TRANSPARENCY IN SCIENCE FOR
THESE ENTITIES, I THINK WE'RE
RUNNING THE RISK OF A THERANOS
DEJA VU, TIMES TWO, TIMES THREE,
TIMES FOUR IN THE NEAR FUTURE.
>> THANK YOU FOR THE TALK.
I'M IAN HUTCHINS, A DATA
SCIENTIST AT THE NIH AND SPEND A
LOT OF TIME DEVELOPING AND USING
RESEARCH ASSESSMENT METRICS FOR
PORTFOLIO ANALYSIS.
ONE OF THE THINGS I OBSERVED IS
THERE SEEMS TO BE A LOT OF
CULTURAL AND POLICY BARRIERS TO
REPRODUCIBILITY EFFORTS.
JOURNALS WILL OFTEN HAVE
POLICIES THAT THEY WON'T PUBLISH
REPLICATION STUDIES; EVEN PLOS
ONE UNTIL RECENTLY HAD SUCH A
POLICY.
I HAVEN'T LOOKED SPECIFICALLY,
BUT ONE IMAGINES THAT
APPLICATIONS FOR FUNDING THAT
FOCUS ON REPLICATING A PREVIOUS
STUDY DON'T FARE THAT WELL IN
THE SYSTEM.
HOW DO YOU THINK THE LOGJAM CAN
BEST BE BROKEN UP?
>> SO, I THINK WE NEED MORE
TRAINING AND MORE PEOPLE TO
UNDERSTAND WHAT IS AT STAKE
HERE.
I BELIEVE MANY JOURNALS HAVE
CHANGED THEIR STANCE OVER TIME
AND MANY ARE SYMPATHETIC; FIELDS
THAT HAVE ADOPTED REPLICATION
MASSIVELY, LIKE GENETICS, WOULD
NOT ALLOW YOU TO PUBLISH UNLESS
YOU REPLICATED.
IT'S A SINE QUA NON.
WE NEED TRAINING.
IN ONE INVESTIGATION WHERE
EDITORS WERE ASKED, ABOUT A
THIRD OF THEM COULD NOT EVEN
TELL WHETHER A STUDY WAS A
RANDOMIZED TRIAL.
METHODOLOGIC EXPERTISE IS
LACKING; WE NEED TO PUSH AND
EDUCATE PEOPLE TO BE MORE
KNOWLEDGEABLE ABOUT HOW THINGS
WORK IN THE WAY THAT WE DO
SCIENCE.
THINKING IN AN EVOLUTIONARY
MODE, THE BEST JOURNALS WILL BE
THE ONES THAT EVENTUALLY PUBLISH
THE BEST SCIENCE, AND THOSE WILL
SURVIVE.
I DON'T WANT TO THINK THE BEST
JOURNALS ARE GOING TO DISAPPEAR
AND WE'LL JUST GET MORE AND MORE
NOISE.
>> THANK YOU VERY MUCH.
>> WE NEED TO EXIT THE
AUDITORIUM FOR ANOTHER GROUP
COMING IN.
WE HAVE PEOPLE ONLINE TO ASK
QUESTIONS.
JOIN US IN THE NIH LIBRARY TO
ASK QUESTIONS THERE. 
THANK YOU.
[APPLAUSE] 
