Thank you all for coming out tonight and
giving me the opportunity to talk to a
somewhat different group of people from
the people I normally talk to about our
scientific results so I wanted to start
by as Bruce mentioned in his
introduction by talking a little bit
about the New York Times those of you
who read the New York Times science page
or the Washington Post or The Guardian
or look at National Geographic or any of
these other publications will have seen
headlines like this it's a very nice
article that came out about a year ago
by one of the best science writers in
the area of evolution Carl Zimmer about
reconstructing the Tree of Life and if
you were to click through and look at
that article you would see an image that
looks like this showing the evolutionary
relationships among 3,000 different
species sampled from across the globe
and if you look closely at this picture
you'll see that most of this tree is
made up of single-celled organisms
bacteria and primitive single celled
organisms called archaea
and the organisms that we know best
including animals, plants and fungi are
all down in this little tiny corner here
and the animals that we identify with
the most such as vertebrates and mammals
are so small that they don't even get a
label on this graph so some of these
pictures are quite remarkable you can
see from this picture that everything
traces back to a single universal
ancestor of all living things that would
have lived about 3.5 billion years ago
as best we can tell from the fossil
record and from genetic analysis if you
continue to look through these sorts of
popular publications you see a number of
different articles about evolution
dinosaur evolution, evolution of
influenza,
fruit fly evolution and the way that
natural selection influences fruit flies,
evolution of lice, co-evolution in mammals
and dinosaurs and of course many
articles about human evolution the
evolution of mountain populations and
their adaptations to high altitude the
population of Australia and of the
Americas and then of course many
articles about one of my favorite topics
Neanderthals and one of these articles
actually describes the paper that we
published a little more than a year ago
and I'm going to tell you about that
today now of course if you're less
selective about your sources you might
encounter some articles like this one
which happens also to be about our paper
I'm sorry to say this is an actual news
report in The Huffington Post talk about
a coarsening of public discourse it's
rather embarrassing and this is even
before the Trump era all right but what
I want to talk about today is how do we
actually know this stuff these findings
are obviously astonishing these stories
about human evolution, about dinosaurs,
about the tree of life but how can we
figure this sort of thing out from
modern day evidence the short answer I'm
going to give you is a term called
molecular phylogenetics
where phylogenetics comes from the term
phylogeny introduced by Ernst Haeckel
a German biologist in the mid 19th century
which essentially means genesis and
evolution of a phylum or a branch of
life and molecular refers to the
analysis of DNA sequences for the most
part these days but also protein
sequences the sequences of amino acids
that make up proteins RNA sequences and
other biomolecules and so in this talk
tonight I'm going to try to give you a
sense for what this field is how it
developed over the last 50 years or so
and then towards the end of the talk
what it can tell us about our own
ancestry and our relationship to Neanderthals
so like any good academic I'll
start by establishing my credibility
so I've been studying molecular
phylogenetics for a long time I got very
interested in this topic right after
college in the early 1990s when I was
working at Los Alamos National Labs
studying HIV and I discovered this whole
world of making use of computers to
reconstruct the past and became
fascinated by it over time we've
published papers describing the
evolutionary relationships among
cultivated plants we've described
processes by which bacteria transfer DNA
from one strain to another this is a
process called horizontal transfer
we've studied complex families of
genes that evolved by duplication and
loss as well as through speciation we've
studied small RNAs in fruit flies and
many many other topics but there's a
collection of core techniques that we've
used again and again throughout these
sorts of analyses and they rely on
modeling the evolution of DNA sequences
along the branches of an evolutionary
tree or a phylogeny and so I'm going to
try to tell you a little bit about how
this area came about and how it works so
no talk about phylogenetics would be
complete without this slide how many of
you have seen this picture all right quite a
few maybe 20% or so this is a bit of a
famous image it's claimed to be the
first phylogeny this is actually drawn
by Charles Darwin in his famous notebook
B about 1837 so that's 20 years before
the Origin of Species was even published
so he was doodling in his notebooks
pictures of evolutionary trees and he
realized as soon as he started to think
about these processes by which one
species would evolve from another that
this would give rise to a branching
structure where more primitive
organisms were at the base of the tree
and as you got closer to the tips you
approached the present day and you
would have a series of branching
operations that would lead to a kind of a
family tree among all living
species today Darwin was still quite
taken with this idea by the time he
published on the Origin of Species and
this is actually the single figure in
that book if you flip to the very back it's
not even numbered because there's only
one figure in the book it's mostly text
you go to the very back you'll see this
image here the single figure in the
Origin of Species and at times Darwin
spoke rather poetically about this image
of the tree he talked about how limbs
divided into great branches were
themselves once when the tree was small
budding twigs and so on and so forth he
wasn't the first one to think in terms
of trees you can see the image of the
tree you can see precursors of this idea
of the tree in the work of Linnaeus and
Lamarck and others but Darwin was the
first one to sort of unify this idea of
evolution with a tree and to realize
that it would imply that all life on
Earth was related by a single tree so
for many years biologists tried to build
trees but not having DNA sequences they
had to work with observable traits and
this was an area that became known
as cladistics biologists would identify
particular characters phenotypic
characters, morphological characters in
organisms and try to come up with
branching relationships that
would only require those characteristics
to emerge once so for example they
imagined that a vertebral column would
emerge once and that would separate a
lamprey from a lancelet and then jaws
would emerge once and they would
separate a tuna from a lamprey and so on
and so forth and in that way they were
able to get a pretty good idea of what
the Tree of Life might look like but of
course there were many difficult to
resolve evolutionary questions parts of
the tree that were difficult to work out
because there weren't good
characteristics physical characteristics
that separated one group of organisms
from another so the key development
that people tend to point back to in
the emergence of molecular phylogenetics
is an observation by Linus
Pauling and Emile
Zuckerkandl
in the early 60s who were studying the
hemoglobin protein and they were looking
at hemoglobin proteins that they had
sequenced from various different species
and they knew something about the
evolutionary relationships among these
species and about how long ago they must
have diverged based on the fossil record
and they noticed that the numbers of
differences in these amino acid
sequences were roughly proportional
to the estimated evolutionary time since
these species had diverged so they
introduced this idea of a molecular
clock of a clock that's ticking over
time laying down new mutations on
these amino acid sequences and those
sequences those mutations are
accumulating over time so that things
that are more distantly related have
more mutations between them and things
that are more closely related have fewer
mutations between them so their idea
looks something like this you would have
a gene in some ancestral species that
species would split through some sort of
speciation event into two daughter
species maybe one group of organisms in
the population would migrate to the
other side of the river and stop
interacting with the other subset of
that species and over time they would
diverge from one another into two
subspecies and then those subspecies would
begin to accumulate mutations separately
and so now if you were to compare a
protein from one of them with a protein
of another you might see that there were
two mutations unique to this one and two
mutations unique to this one but now as
time goes on more mutations would be
accumulated and perhaps additional
speciation events would occur and you'd
have more and more differences
accumulating between the proteins that
were present in these individual species
so now if you looked at the protein from
species B and species C they would only
differ at a few places but the proteins
for
B and C would differ from the protein
from species A in many more locations
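A minimal sketch of this counting, in Python. The three sequences are invented for illustration (they are not real hemoglobin data), and `count_differences` is a hypothetical helper name, not something from the original work.

```python
from itertools import combinations

# Hypothetical aligned protein fragments for three species (invented).
sequences = {
    "A": "MRTAYIAKQK",
    "B": "MKSAYLAKHR",
    "C": "MKSAYLAKQR",
}


def count_differences(seq1, seq2):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


# Number of differences for every pair of species.
distances = {
    (x, y): count_differences(sequences[x], sequences[y])
    for x, y in combinations(sorted(sequences), 2)
}

# The pair with the fewest differences is the candidate for closest relatives.
closest_pair = min(distances, key=distances.get)
```

With these made-up sequences B and C differ at one position while A differs from B at five and from C at four, so B and C would be grouped together with A branching off earlier.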
and in this way you could start to
imagine reconstructing an evolutionary
tree by counting up the numbers of
differences in these proteins that was
the core idea introduced by Zuckerkandl
and Pauling so if you look at modern-day
data for proteins for a particular
protein in this case the cytochrome C
protein from a number of different
organisms in this case we're focusing on
a number of mammals and you plot the
estimated number of years since those
two species diverged from a common
ancestor as estimated from the
fossil record against the number of
substitutions in this case they're going
to be DNA substitutions rather than
amino acid substitutions but the
principle is the same you would see over
time an approximately linear
relationship between those two
properties and in this way these
mutations can be thought of as a kind of
a clock that we can use to date the time
since things diverged and also to
reconstruct the shapes of the
evolutionary trees that describe their
relationships so Zuckerkandl and
Pauling's observations were really just
sort of empirical they just noticed this
property of proteins they didn't really
give a recipe for how to reconstruct the
phylogeny from this sort of data but a
few years later this became a very
active area of research and one of the
pioneers in this area was the Italian
human geneticist Luca Cavalli-Sforza who
collaborated closely with a British
statistician Anthony Edwards and they
came up with the first recipes for using
this sort of data to reconstruct a tree
that would show how closely different
organisms were related and how long ago
they might have diverged so over the
next 10 years or so there were a large
number of different types of techniques
proposed
for reconstructing these trees I want to
show you what one of them looks like
this turns out to be one of the most
intuitive and easy to understand and
also one of the most powerful techniques
for reconstructing evolutionary
relationships it's called parsimony
because it tries to find an evolutionary
history that minimizes the number of
changes required to explain the observed
data and I'll show you what I mean by
that as we go forward imagine we have
three species one, two and three and for
simplicity let's imagine we know their
ancestral sequence maybe we can infer it
by looking at a distantly
related relative and we want to find
what the evolutionary relationship is
among species one, two and three let's
focus first on just one variable
position in those sequences this is
known as a site in the literature so at
this particular position species one and
two have a C and species three has an A
and now we're going to consider that
there are three possible evolutionary
relationships among those three species
either one and two could be most closely
related with three as an out-group or
one and three could be most closely
related with two as an out-group
or two and three could be most closely
related with one as an out-group those
are the only possible relationships
among three species and now let's try to
imagine the minimal sequence of
mutations that could explain the
observed data if we assume that there's
an A at the root of the tree well if
there's an A at the root of the tree
then we can explain this data under the
first tree by just one mutation from an
A to a C along this branch leading to
species one and two does everybody see
that if there were a mutation there it
would lead to a shared C in species one
and two while species three would still
have an A if we try to explain the same
data using tree two or three we can
only do it with a minimum of two
mutations it requires two mutations to
explain this pattern under
tree two and it requires two under
tree three now that doesn't necessarily
mean tree two or tree three is
wrong there are going to be cases where
there are multiple mutations that happen
at a site but if we systematically see
across all positions in the data that
one tree is supported more than the
others that gives us a strong belief
that that must be the true evolutionary
relationship among those species and in
this case if we look at the other sites
sites 2, 3 and 4 and similarly try to
match them up with the tree I'm not
going to go through all the details if we
do that we find that actually
none of these sites strongly
supports one tree over the others all three trees
require five mutations across these
three sites to explain but if we had a
large number of sites we could add them
up and we could say what's the total
number of events required under each of
these trees in order to explain the
observed data in this case we get six
events under tree one, seven events under
tree two and seven events under tree
three so that gives us some confidence
that tree one is most consistent with
the data now in this case maybe not a
whole lot of confidence maybe this is
not the greatest example but in real
examples we would hope we would look at
hundreds or thousands of sites and see
many many dozens or many hundreds of
cases where you prefer
one tree over the others okay and in
practice that's what people do when they
analyze these data so here's an example
this is a famous example so one of
those cases that I mentioned where
morphological characters were difficult
to resolve a question of evolutionary
relationships is the case of the great
apes so in particular the question
of whether humans Homo sapiens are more
closely related to chimpanzees Pan
troglodytes or gorillas Gorilla gorilla
and this was a problem that plagued
taxonomists for many years because
there are many derived traits among all
three of those organisms and it wasn't
clear which two were more closely
related than the other so by the late
1980s Goodman and his group had obtained
quite a lot of sequence data for the
time
tens of thousands of DNA nucleotides
from each of these species this was in
the area of the beta globin gene and
they used those to perform this sort of
parsimony analysis that I just told you
about and what they found was that
the best tree was the one that I'm
showing here that groups humans and
chimps with gorillas as an out-group and
that tree required 383 different
substitution events nucleotide mutations
and they mapped those to the branches of
the tree and there are quite a few not a huge
number but a significant number that
support that grouping of humans and
chimps so these are mutations that are
shared by humans and chimps and not
shared by the other great apes right and
this gave them quite a lot of confidence
that this was the true evolutionary
relationship among these species another
one of my favorite examples also sort of
a classic in the phylogenetic
literature has to do with the cetaceans
whales, dolphins and porpoises so as many
of you know whales are mammals but it's
not obvious how they relate to other
mammals because they're morphologically
so distinct they're so highly diverged
from other mammals so this was a problem
that plagued taxonomists for many years
as well and a number of papers in the
late 90s most notably this one by a
Japanese group in 1999 obtained sequence
data from toothed whales and baleen
whales along with many other mammals and
they showed very clearly that the
closest relatives of whales were
hippopotamuses so this was quite
striking the whales, dolphins and
porpoises trace their ancestry to an
ancestor of hippopotamuses about 50
million years ago and this is something
that is now fairly well supported by the
fossil record as well it appears that
this evolutionary divergence
happened on the Indian
subcontinent actually started probably
with a terrestrial mammal
and some time later they made their
way into the ocean so the fact that
hippopotamuses are aquatic as well is
an example of convergent evolution
it's believed from the fossil record
that their ancestors were terrestrial
okay
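The parsimony counting described earlier for three species can be sketched in Python as follows. This is only an illustration under stated assumptions: site 1 follows the three-species example above (C, C, A with an ancestral A), while sites 2 to 4, the species labels and all function names are invented, and the ancestral base at each site is assumed to be known.

```python
BASES = "ACGT"

# The three possible rooted trees for three species, written as
# ((two sister species), out-group).
TREES = {
    "tree 1": (("sp1", "sp2"), "sp3"),
    "tree 2": (("sp1", "sp3"), "sp2"),
    "tree 3": (("sp2", "sp3"), "sp1"),
}


def site_cost(tree, site, root_base):
    """Minimum number of mutations needed to explain one site on one tree.

    The root base is treated as known. The only unknown is the base at the
    internal node ancestral to the two sister species, so all four are tried.
    """
    (left, right), outgroup = tree
    outgroup_cost = int(site[outgroup] != root_base)
    sister_cost = min(
        int(node != root_base) + int(node != site[left]) + int(node != site[right])
        for node in BASES
    )
    return outgroup_cost + sister_cost


def tree_costs(sites, root_bases):
    """Sum the per-site minimum mutation counts for each candidate tree."""
    return {
        name: sum(site_cost(tree, site, root) for site, root in zip(sites, root_bases))
        for name, tree in TREES.items()
    }


# Site 1 follows the example above; sites 2 to 4 are invented stand-ins
# chosen so that they do not favor any tree.
sites = [
    {"sp1": "C", "sp2": "C", "sp3": "A"},
    {"sp1": "G", "sp2": "T", "sp3": "T"},
    {"sp1": "A", "sp2": "G", "sp3": "C"},
    {"sp1": "T", "sp2": "T", "sp3": "T"},
]
root_bases = ["A", "T", "A", "G"]

costs = tree_costs(sites, root_bases)
best_tree = min(costs, key=costs.get)
```

With these made-up sites tree 1 needs 6 mutations and trees 2 and 3 need 7 each, so tree 1 is preferred, though only barely. Real analyses use hundreds or thousands of sites, many more species, and the ancestral states are not assumed known.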
another great figure from the early days
of molecular phylogenetics was a guy
named Allan Wilson and I want to focus on
Wilson in particular here because he was
especially interested in the evolution
of humans and of the great apes and he
was a very prolific author throughout
the 1960s 70s and 80s and one of the
pioneers in obtaining sequence data from
humans and other apes and finding out
the relationships among those
individuals he also is important in
that he trained a number of important
people in the field including Svante
Pääbo who's the person I'm going to tell
you about a little bit later
one of the pioneers in Neanderthal DNA
sequencing Allan Wilson also trained
Mary-Claire King who's the discoverer of
the BRCA1 breast cancer gene which
some of you might have heard of so he's
a very influential person in genetics
and evolution during this period
actually if you look closely you can see
in this picture here he's drawing
molecular clock pictures this is
cytochrome C, there's hemoglobin and
there are a few others he's
drawing these pictures like the one I
just showed you about how as time goes
on proteins diverge in a roughly linear
fashion so Wilson and his colleagues
Rebecca Cann and Mark Stoneking
published a very important paper also in
the late 80s this time in Nature this
was really the first large-scale study
of human evolution based on
mitochondrial DNA so they collected 147
samples from 147 different people from
around the world sequenced their
mitochondrial DNA and then built a big
evolutionary tree using parsimony
like the ones I just told you about
describing how those individuals were
related you can't see what I'm showing
you there so I'm going to zoom in a
little bit on a subset of these
individuals they looked at
multiple populations from
around the world
Africans, Asians, Australians,
Europeans
what they found was that most of these
non African groups such as the Europeans
formed clades they called clusters on
the trees that they were able to
reconstruct but the Africans almost
invariably fell outside of the variation
in these non African subgroups and
that very strongly suggested that
Africa was the original source of human
genetic diversity and that these various
groups had emerged out of Africa
sometime after the African diversity had
already been established and this is
well supported now as I'll show you by
many many subsequent studies in general
we see much greater evolutionary
diversity within Africa
than we see in these non-African
populations and these typically
represent subsets of the genetic
diversity that had been present in
Africa and then moved out possibly in
multiple colonizations you can see in
their abstract they actually mentioned
multiple origins for non-African
populations and we'll see later in my
talk that that's something that has
persisted to the day and something that
our work tends to support another
piece of this study was that they
obtained an estimated date for the
divergence of all of these populations
and they estimated it at about 200,000
years ago that turns out to be a date
that also holds up pretty well we'll
come back to that as the talk goes on so
this led to the terminology
mitochondrial Eve some of you may have
heard of this the idea is that all
people on Earth can trace their maternal
inheritance back to one woman who
lived in Africa about 200,000 years ago
and she would be mitochondrial Eve so I
neglected to tell you some of you may
know this but the mitochondrial
DNA is inherited from your mother
only it's a maternally inherited
molecule whereas most of your DNA is
inherited from both parents so when you
reconstruct the history of human
populations using mitochondrial DNA
you're reconstructing only your maternal
history so these results referred
only to that
all right so throughout the 1990s
people continued to work hard on these
phylogenetic methods for understanding
human populations and a particular
pioneer in this area was this guy Luca
Cavalli-Sforza who I mentioned earlier
as one of the pioneers of developing
phylogenetic methods by this time he was
at Stanford and he carried out a very
ambitious research program traveling
around the world obtaining samples from
people and studying them using
phylogenetic methods including
mitochondrial DNA, Y-chromosomal DNA and
DNA from the rest of the genome and he
also was a pioneer in comparing and
contrasting his genetic findings with
what could be found through the study of
linguistics and through the study of
cultures and so on and so forth
and he wrote a very important book I
think came out in 1994 that really
captured the state of the field at that
time I'm not going to go through these
individual papers but I'm going to
instead give you a summary of what
was known about human evolution around
2000 actually taken from a review
article by Cavalli-Sforza and his
colleague Mark Feldman from 2003 so at
this time roughly 15 years ago it was
essentially established to
their best guess using the data they had
available that anatomically modern
humans had emerged probably in East
Africa although
there were some that argued for South
Africa around 200,000 years ago and that
by about a hundred thousand years ago
these groups had begun to split and
spread out across the African continent
and give rise to the different African
populations that we see today
for example, the Bantu of
northern and western Africa and the Sān
of southern Africa and then by around
sixty or seventy thousand years ago one
or more waves of migration occurred off
of the African continent these early
humans began to populate the rest of the
world through several different paths
there was at least one southern
migration to the east at least one
northern migration to the east and at
least one migration to the west there are
quite early remains in Australia going
back as far as 60,000 years ago and
there are remains in China of
anatomically modern humans that also go
back 60,000 years so these are quite
early colonizations the evidence in
Europe was for a slightly later
colonization about 40,000 years ago and
then of course the population of the new
world was considerably later required
crossing the Bering Land Bridge probably
15 to 20 thousand years ago and again
this appears from subsequent work to
have occurred in multiple waves rather
than in one wave of colonization all
right so this was essentially what was
known at that time and then around 2008
or so this game really began to change
dramatically and it really changed
because of DNA sequencing technologies
so a new type of technology for
obtaining DNA sequences very very
cheaply and in very high volumes began
to emerge in the mid 2000s and it became
clear that we could start to obtain
complete genome sequences from
individuals across the globe and the
culmination of this effort was a project
called the 1000 Genomes Project
which has now obtained very high-quality
complete genome sequences
for several thousand humans from
multiple populations from across
the globe and as this became possible it
became clear that we no longer had to
restrict ourselves in these sorts of
studies to mitochondrial DNA or
Y-chromosomal DNA we could study
complete genome sequences for humans and
use those to try to understand our
evolutionary history all right
so that sounds good more data is usually
good but it turns out that in this case
more data leads to some significant
complications and I'm going to try to
give you a little bit of a sense for how
this problem becomes more difficult when
you look across the entire genome rather
than looking say just at the
mitochondrial genome or just at the Y
chromosome which are inherited as units
Y chromosomes paternally and
mitochondrial DNA maternally okay so one
issue is that we have two copies of
every chromosome so if you look at
one of my genes say my hemoglobin gene I
have a copy that I inherited from my
mother and a copy that I inherited from
my father and those copies have
different evolutionary histories in the
same way that my mother and my father
have different evolutionary histories so
if we look at a collection of individual
chromosomes from modern-day individuals
we're going to count backwards in time
so time 0 is the present day now we can
think of each individual as having two
tips in that tree right so the blue
individual has a tip here and a tip
there one is the maternal copy and one
is the paternal copy of the particular
gene that we're looking at same for the
green individual and same for the purple
individual we can then trace backwards
in time and build up a phylogeny
for all of those
individual chromosomes but it's no
longer at the level of individuals it's
now at the level of chromosomes okay so
that's one complication if I build an
evolutionary tree for a single gene in
the genome I have to keep track of the
fact that each individual has two copies
of that gene when it gets really
complicated is when we think about the
problem of recombination so some of you
might remember from your high school
biology class - it's okay if you don't - that
when your cells go through a
process called meiosis the process of
cell division that leads to sperm and
egg cells that the paternal and the
maternal chromosomes swap genetic
material with one another so if this is
the paternal and this is the maternal
chromosome they cross over and some
material from the maternal chromosome
ends up on the paternal chromosome and
vice versa
and that happens every generation on
every chromosome essentially what that
means is that over time the different
genes on a chromosome will have
different evolutionary histories if I
look at my hemoglobin gene it's going to
have one evolutionary history a
different one for my mother and for my
father but one evolutionary history for
each of those if I then go to my
cytochrome-C gene because it's in a
different location on the genome and
things have been shuffled by the process
of recombination it's going to have a
different evolutionary history so at
every position along the genome I'll
have a different tree describing the
relationships among the chromosomes at
that position now this turns out to be
good and bad it's bad in that it makes
things very complicated to study when I
try to reconstruct evolutionary trees
from population samples of humans I have
to deal with this nasty problem of the
tree changing as I go along the
chromosome but it's good in that I'm
actually sampling a much larger portion
of my ancestry remember with the Y
chromosome I'm only looking down one
lineage I'm looking at my father, my
father's father, my father's father's
father and so on I'm only looking down
one lineage of all my possible ancestors
similarly with the mitochondrial genome in
this case at every locus I'm sampling a
different set of
ancestors because things have been
swapped around in different ways by this
process of recombination so it
potentially gives me a lot more
information about my ancestry a lot more
information about how long ago different
populations might have diverged a lot
more information about gene flow between
populations as we'll see in a moment and
more information about how large
ancestral populations might have been so
let me talk a little bit about this
issue of gene flow because that's where
I'm trying to take you with this whole
study just like The Huffington Post said
so imagine that we have two completely
genetically isolated populations let's
say they live on separate islands
and they don't have any technology for
getting between the islands and they
diverged some number of generations ago
that we'll call tau now if I sample an
individual from each of those
populations at a single locus and I
trace them back then they're going to
find some common ancestor and that
common ancestor will vary from one
position along the genome to the next
because of historical recombination just
as I was telling you but if it's true
that those two populations have been
completely isolated genetically then it
has to be at least as old as tau right
when I find their common ancestry when I
trace back to their common ancestry to
their common ancestor it has to be in
this ancestral population before the two
were isolated from one another
however if there has been some gene flow
between those two populations if some
of these guys have been finding
rafts and sneaking over to these guys
right then I'm going to
have some places along the genome where
their common ancestry is younger than
the split between the two populations
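A minimal sketch of that reasoning in Python, assuming we somehow already knew the age of the common ancestor at each locus (in real data those ages have to be inferred and are uncertain). The numbers, the variable names and the function name are all hypothetical.

```python
def loci_younger_than_split(coalescence_times, tau):
    """Return indices of loci whose common ancestor is more recent than the split.

    Under complete isolation every coalescence time should be at least tau,
    so any locus returned here is a candidate signal of gene flow.
    """
    return [i for i, t in enumerate(coalescence_times) if t < tau]


# Hypothetical values in generations, invented for illustration.
tau = 2000
coalescence_times = [5200, 3100, 2600, 450, 8100, 2050, 900, 4300]

young_loci = loci_younger_than_split(coalescence_times, tau)
fraction_young = len(young_loci) / len(coalescence_times)
```

In this toy data set two of the eight loci (indices 3 and 6) have a common ancestor younger than the split, which is the kind of pattern that would not be expected if the two populations had been completely isolated.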
all right so if I look across the genome
at many different locations and I see
that most of the ancestry is old but
there's an occasional position along the
genome with very recent ancestry that's
a telltale sign of
gene flow between two populations right
and that is essentially the signal that
we look for when we study these ancient
interbreeding events okay all right now
I'm going to have to start to skip over
some details because the methods that we
actually use get fairly complicated but
I want to tell you at a high conceptual
level essentially what we're doing so my
group got interested in this problem
about seven or eight years ago and we
wanted to model this problem of
finding common ancestry along complete
genomes allowing for it to change for
the patterns of ancestry to change from
one position in the genome to the next
so we set it up in the following way we
collect DNA sequences for many locations
across the genome we have
one or more
representatives of several populations
we propose some branching relationship
among those populations we can try
several if we're not sure what it is but
sometimes we have enough information
from the fossil record that we have a
pretty good idea of what that
relationship is so for example if these
were Europeans and West Africans then
these might be South Africans we know
essentially from other studies about
their general relationship with one
another and then using the computer we
explore many many population trees
consistent with the data across the
genome and we adjust the parameters of
this model the time since these
populations diverged and the amounts of
gene flow between populations until they
best fit the data we do that by
exploring millions of these possible
genealogies across tens of thousands of
DNA sequences drawn from the genome and
we make use of techniques drawn from
statistical physics called Monte Carlo
techniques that let us in a principled
way explore this space of possible
genealogies and at the end of the day
the computer gives us a
model and it tells us
which model best fits the data and how
much confidence we have in the
individual parameters of that model all
right and some of these genealogies
will involve gene flow between
populations and others won't and we can
turn a knob there's a parameter that
describes how much of that gene flow
there is so we can test the possibility
of gene flow or the possibility of not
having gene flow okay so the reason we
were particularly interested in this is
we had some collaborators in about 2009
published in 2010 who obtained complete
genome sequences from some southern
African representatives in particular we
were interested in this complete genome
sequence for a representative
of this hunter-gatherer population from
the Kalahari Desert known as the Khoisan
or the Sān and the early work by
Cavalli-Sforza and others had shown from
mitochondrial DNA and Y-chromosomal DNA
that the Sān seemed to be a very early
branching group probably the earliest
branching group of all living
populations on Earth today but the data
was very sparse and it was limited
to paternal or maternal histories so we
set out to see whether we could figure
out how old this population was by using
these statistical sampling techniques
across the entire genome I think I
forgot to tell you the name of our
program the name of our program is G-PhoCS
which stands for Generalized Phylogenetic
Coalescent Sampler so we wanted to
apply G-PhoCS to these data and see what
we could say about how old the Sān were
so the way we did this was we took at
the time there were only a few complete
genome sequences for multiple
populations across the globe but
we had a Korean individual, a Han Chinese
individual, a European individual, a West
African Yoruban individual and a Sān
individual and we assumed
the following the tree that I'm showing
here this was based on Cavalli-Sforza
data and other data we could also test
alternative trees and make sure that
this was the one that fits the data best
and we allowed for gene flow between
some of these populations and then we
tried to see whether we could estimate
how old these splits were between the
different groups and we focused in
particular on two splits the split between
the Sān and the others that was the one
I mentioned the very old one that we're
most interested in and then the split
between the West African Yorubans and all
of the non-African populations and that
would be a proxy for the time when these
non-African groups migrated out of
Africa and colonized the rest of the
world that would give us a pretty good
estimate of when that colonization event
might have happened that's known as the
Out of Africa migration and what we came
up with after very careful
analysis for many many days were the
following estimates we estimated the age
of the Sān split to be about 200,000
years ago now that's really
pretty old so that's as old as Allan
Wilson's estimate of mitochondrial Eve
so the Sān according to our estimates go
back about as far as mitochondrial Eve
would go back that's actually not
surprising mitochondrial Eve is the
maternal ancestor but for
reasons I won't go into it's not too
surprising that the maternal ancestor
would be close to the divergence time of
that Sān split so that was
encouraging our estimate of the Out of
Africa event the African Eurasian
divergence the AE divergence was seventy to
eighty thousand years and that fit
fairly well
with archaeological findings in the
Middle East and with a number of other
arguments people had made on the basis
of both genetic and
archaeological evidence so we
were quite encouraged by these findings
but they did indicate that the Sān are
really quite an old population so note
that this time is about three times
as long ago as this time that meant the
divergence of this Sān group in southern
Africa was three times as old as the
split between the West Africans and the
Europeans it's a very old group there has
been some gene flow between the West
Africans and the South Africans and we
can detect that in our framework but
they've been remarkably isolated
probably because of this hunter-gatherer
lifestyle living in the desert and
their tendency not to mix with the
farming populations nearby I just wanted
to mention very briefly that there
was a recent study that came out just
last week this is not yet published in a
journal but it came out on Cold Spring
Harbor's preprint server known as bioRxiv
this is a group that analyzed
some similar data to the data we
analyzed but they combined it with some
ancient genomes some Iron Age farming
genomes and some Stone Age
hunter-gatherer genomes ranging between 300 and
2,000 years old so these were
remains that they dug up in South Africa
obtained DNA from these remains sequenced
that DNA and analyzed it together with
modern-day genomes for a number of
different populations and they actually
ran our program G-PhoCS on these data and
they also made use of their own method
which analyzes only pairs of
genomes together I don't want to go into
all the details of their study but
they're estimating that this
date for the split of the Sān which are
here and the other African populations
might be two hundred and sixty thousand
years old or even older than that I have
some questions about exactly how they
did the
analysis so we'll see how that holds up
when this paper is peer-reviewed but
it's reasonably consistent with ours and
it's not surprising that with
the acquisition of this ancient DNA the
date might get pushed back even farther
one other caveat I wanted to give here
without going into a lot of detail is
that this molecular clock I've been
telling you about is actually kind of a
fiction there actually isn't one
molecular clock there are many molecular
clocks the rate at which mutations occur
varies across human individuals and it
varies quite considerably between males
and females and because of
the different processes by which
sperm and egg cells are generated it's
age dependent in males and much less age
dependent in females what that means is
that old males who become parents make a
very disproportional contribution to the
numbers of mutations that occur in their
offspring that's one of the reasons why
you see a paternal age effect in
diseases like autism it's because of the
higher accumulation of mutations in the
sperm cells of older males anyway I
didn't want to go into all the details
here but I want to make the point that
when we try to calibrate these dates
when we try to use genetic data to
estimate how old populations are we're
using very crude averages over mutation
rates across humans and some of these
factors have probably changed over time
generation times may have changed the
ratio of male and female ages at the
time of reproduction is dependent on
the culture in which this
reproduction is occurring and so on and
so forth so that's one of the reasons
why there's a lot of uncertainty about
the precise dates that we get out of
these genetic analyses nonetheless we
can be fairly confident about ballpark
estimates okay in the few minutes that I
have left I want to start to talk a
little bit about Neanderthals and I want
to start by introducing you to
Svante Pääbo who is probably the
most famous person in the field of
Neanderthal genetics he's been here at
Cold Spring Harbor many times given many
talks almost always about
Neanderthal genetics not always but
almost always and Svante has been
fascinated with ancient DNA for decades
and has really dedicated most of his
career as a scientist to devising new
techniques for obtaining DNA from
ancient samples correcting errors in
that DNA and then analyzing that DNA to
tell us something about our history I
mentioned that he worked early in his
career with Allan Wilson at Berkeley
later on
he moved back to Europe and for a couple
of decades now I think he's had his own
institute in Leipzig, Germany a Max
Planck Institute where they do some of
the world's best work in this field of
ancient DNA so Svante had been
studying Neanderthal DNA for a number of
years and had some initial progress in
obtaining mitochondrial DNA from
Neanderthals but also some setbacks
there had been some high profile cases
where they had published what they
thought was Neanderthal DNA that turned
out to be contaminated by modern human
DNA it's very difficult to avoid that
sort of contamination and he went back
to the drawing board and came up with
more rigorous techniques for obtaining
DNA and then finally in 2010 his team
had a major breakthrough they were able
to obtain a so-called draft Neanderthal
DNA sequence for an entire genome now at
this point they were not able to
sequence to high coverage the genome of
a single individual they had to combine
DNA from three bones that were found in
a single cave in Croatia they compared
it with samples
that they had found in some other caves
across Europe but by combining this
information and being very careful about
DNA extraction and about sequencing and
about error correction they were able to
obtain a quite good draft quality genome
for a Neanderthal and then they set
about analyzing that genome and the big
story from this analysis was that there
appeared to be strong evidence that
Neanderthals and modern humans had
interbred probably about 60,000 years
ago I'm not going to go through all of
the evidence that they presented in
favor of this hypothesis but I want to
show you one finding that I think is
quite striking and fairly easy to
understand if you'll bear with me for a
moment
so for what we're showing here
we're going to take
two genome sequences a European genome
sequence and an African genome sequence
and we're going to compare them to the
newly sequenced Neanderthal genome
on the x-axis and to the human reference
genome on the y-axis now the human
reference genome is predominantly
composed of DNA from Europeans but it's
not the same European as the one we're
comparing so there's still going to be
quite a few differences between
the European genome that we're using as
a query and this human reference genome
and now what they do for this plot
is they normalize they standardize the
distances so they have an average of one
so there are some overall differences
between the European and the African and
how similar they are to these two
reference genomes but they're going to
get rid of that by adjusting them so
they have averages of 1 now what you see
when you look across the genome is that
both the European and the African mostly
have a positive slope here where they're
farther away from the
Neanderthal genome they're also farther
away from the human reference genome and
that just reflects the fact that the
clock ticks at different rates in
different places across the genome so
you're accumulating mutations at
different rates at different positions
across the genome and when the clock
ticks faster you tend to be more distant
both from the Neanderthal genome and
from the human reference and when it
ticks more slowly you tend to be closer
to both but look at this strange anomaly
down at the left-hand side in the
European genome so this is a collection
of sequences a small fraction of the
entire genome but a significant fraction
a collection of positions across the
genome that are very close to the
Neanderthal genome to the sequenced
Neanderthal genome and very far from the
human reference okay
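
A minimal sketch of that normalization step and of how the anomalous corner of the plot could be picked out, written in Python. The distances, the cutoffs of 0.5 and 1.5, and the function names are assumptions chosen for illustration. They are not values or methods from the published analysis.

```python
from statistics import mean


def normalize(distances):
    """Rescale a list of distances so that their average is 1."""
    m = mean(distances)
    return [d / m for d in distances]


def neanderthal_like(d_neanderthal, d_reference, low=0.5, high=1.5):
    """Return indices of regions that are unusually close to the
    Neanderthal genome (normalized distance below `low`) and
    unusually far from the human reference (above `high`).
    Both cutoffs are hypothetical."""
    x = normalize(d_neanderthal)
    y = normalize(d_reference)
    return [i for i, (a, b) in enumerate(zip(x, y))
            if a < low and b > high]


# Made-up per-region distances for one European query genome.
d_neand = [0.0020, 0.0030, 0.0025, 0.0004, 0.0021]
d_ref = [0.0008, 0.0012, 0.0010, 0.0020, 0.0010]

outliers = neanderthal_like(d_neand, d_ref)
print("candidate Neanderthal-like regions:", outliers)
```

Most regions rise and fall together on both axes because the mutation rate varies along the genome. The flagged regions are the ones that break that pattern.
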
sequences that look a lot like
Neanderthal sequences but are in a
European individual and don't look
anything like the reference genome
that's composed of a collection of
different people so these are sort of
anomalous sequences it's like alien DNA
embedded in this European genome that
looks a lot like Neanderthal sequences
and not like other European sequences
and it only appears in Europeans you
don't see it in Africans it's a very
strange observation and if you do this
plot with other populations from outside
of Africa such as East Asians or
Americans or Papua New Guineans you see
the same sort of pattern a small
fraction of sites that look a lot like
Neanderthal DNA in humans all right so
I'm not going to show you the other
analyses that they did but through a
whole series of analyses a large team of
researchers very convincingly showed
that the only plausible explanation for
this strange observation in non-African
genomes is that non-Africans
interbred with Neanderthals probably
about 60,000 years ago after they had
migrated off of the African continent we
know that it can't have happened in
Africa because we see
no sign of it among African populations
we also see no fossil record of
Neanderthals in Africa so
the Neanderthal range was predominantly
in Europe, the Middle East, and Western
Asia so it would make sense that this
band that migrated off of the African
continent would have encountered
Neanderthals somewhere in Eurasia and
the only way we can explain this strange
observation in a fraction of their
genome is if there was an interbreeding
event okay so I want to go on with the
story
so the next chapter in this story was the
discovery of a new cave so the sampling
of ancient DNA that Svante Pääbo and
his team were doing was very much
limited by the quality of the DNA they
were able to obtain from these
bone fragments they were analyzing many
of the bone fragments that they found
that appeared to be Neanderthal bone
fragments they couldn't extract any DNA
from and even the best ones were maybe
one or two percent Neanderthal DNA and
mostly bacterial DNA and contamination
from modern humans but then they found
this beautiful cave in Siberia in the
Altai Mountains called Denisova Cave and
they teamed up with some Russian
archaeologists and began to explore some
bones in that cave and found that they
were sorry here it is it's quite far to
the east of these European Neanderthal
findings probably on the eastern
side of the Neanderthal range but they
found some beautiful bones in this cave
that had astronomically higher
enrichments for Neanderthal DNA than
anything they had seen before so they
found in particular this one very tiny
finger bone this is the distal manual
phalanx so it's the tiny little
fingertip bone that had a
very good DNA sample and when they
obtained the DNA from this sample they
came up with the amazing finding that it
appeared not to be a Neanderthal
it appeared to be another type of archaic
hominin so it was closer to a
Neanderthal than it was to a modern
human but it was divergent enough from a
Neanderthal that it must have been
hundreds of thousands of years separated
from Neanderthals so they called that a
new subspecies or species the Denisovans
named after the cave and they also found
in the same cave a toe bone probably
from the fourth or fifth toe that was
very rich in Neanderthal DNA so these
two samples then became the source of
the next several years of analysis
of ancient DNA they both were high
enough quality that it was possible to
obtain very high-quality complete genome
sequences for a Denisovan and for
another Neanderthal from these two tiny
bones excuse me in this cave all right
so I can't go through all of the
findings from the analysis of
these two bones but I want to
show you a summary of what was known in
about 2013 after the analysis of the
complete genome sequences from these two
bones so first of all you see there are
two distinct groups the Denisovans and
the Neanderthals they are more closely
related to each other than either one is
to modern humans but they're pretty
distantly related to one another
they probably diverged hundreds of
thousands of years ago from one another
in addition there was now evidence for
several different gene flow events
there's the one that I just told you
about from a Neanderthal into these Out
of Africa populations represented by
this line here right here are Africans
and here are non Africans modern humans
that event must have happened somewhere
in the branch
leading to the non-Africans in addition
they found evidence of gene flow from
the Denisovan into modern humans as well
this evidence appears to be
confined to East Asia it's most strongly
observed in oceanic populations such as
Papua New Guineans but you see some
hints of it as well in Han Chinese and
Korean populations this appears to be
the result of a distinct interbreeding
event between these Denisovan
individuals and a group that was
probably on its way
migrating along the way to Southeast
Asia in addition there was some weak
signal indicating gene flow between the
Denisovans and the Neanderthals and then
perhaps most interestingly there was a
sign this remains a mystery something
that we're interested in working on in
my group there remains a sign of some as
yet unknown hominin possibly Homo
erectus which is a much earlier group
that is known to have lived in China
and across Eurasia that group has left
some segments in the Denisovan genome
that appear very strange relative to the
rest of the genome so there are short
segments in the Denisovan genome that
don't look like anything else that we've
sequenced essentially and it's possible
that that represents another
introgression event another
interbreeding event a very old one but
that remains an open question okay so
this is all background to the story that
I'm going to tell you about from my
group very briefly and this story
involved using this program that I just
told you about G-PhoCS to jointly analyze
all of the data that was available at
this time so we had now three
Neanderthal genome sequences the ones
from the first paper, the ones from the
second paper
and a partial genome that had not yet
been published from a cave in Spain we
had the Denisovan genome and then we had
a series of modern humans whose genomes
had been obtained I'm using the Yoruban
here as a placeholder but we analyzed
several of them together we put them
into this G-PhoCS program that samples
over all of the possible evolutionary
histories that could explain the data
and after some careful analysis we came
up with the following model so G-PhoCS
detected evidence of essentially all of
the gene flow events that I just told
you about so for example here's the gene
flow event from Neanderthals to the
Out of Africa populations here
are the gene flow events from Denisovans
to East Asians and Papua New Guineans
here is that mysterious archaic hominin
that might be Homo erectus introgression
here is the introgression between the
Neanderthals and the Denisovans
detected at quite low levels but in
addition we found another introgression
event and no matter how we did the
analysis no matter how careful we were
no matter how we subsetted the data we
couldn't get this one to go away and
this one is quite interesting it's going
in the opposite direction it suggests
some early modern human from before the
divergence of Europeans and Africans
left its imprint in the Neanderthal
genomes remember the event I told you
about earlier was in the opposite
direction it was Neanderthals leaving a
footprint in Out of Africa
human genomes this is humans leaving a
footprint in
Neanderthal genomes but it's
shared across all humans it's not
present just in the Out of Africa
populations it's you see the same signal
essentially symmetrically in all modern
humans so it must date to a time
before
the divergence of these human populations
and it appears only in this easternmost
Neanderthal genome the Altai Neanderthal
genome so this is really a mystery how
can we explain this observation well
here's our best guess at coming up with
a scenario that might describe it so
first of all if we think about the
human lineage about 600,000 years ago in
Africa the Neanderthals would
have branched off and they would have
migrated off of the African continent
this is very early sometime later around
200,000 years ago just before these
different African groups began to split
apart from one another the Sān and the
West Africans for example there must
have been a group that interbred with
Neanderthals now the question is where
could that have happened because we
don't think Neanderthals at that stage
lived on the African continent so it
suggests maybe there was an earlier
migration Out of Africa an interbreeding
event with Neanderthals perhaps in the
Middle East or east of the Caspian Sea
leading to that easternmost Neanderthal
lineage and then who knows what
happened to that group of modern humans
that group of early modern humans we
don't see any representatives of them
alive today but they could have been
absorbed by the Neanderthals they could
have died out completely or they could
have migrated back and become absorbed
by the other African populations we
don't know we just know that we see no
sign of them and then sometime later
going back to about 65,000 years or
so there would have been the main
migration Out of Africa the so-called
Out of Africa event and subsequently the
interbreeding event that had already
been discovered by Svante Pääbo and his
colleagues in the opposite direction
from Neanderthals into modern humans so
this was a subject of our paper a couple
of years ago there are a lot of
questions about exactly how this could
have happened but the genetic evidence
is very
strong that there was at least one
interbreeding event in the other
direction from early modern humans into
Neanderthals okay so I apologize for
going long I'm going to wrap up there
the main point I want to make is that we
can make use of
these classical molecular phylogenetic
techniques to study complete genome
sequences and reconstruct human history
it's computationally expensive requires
supercomputers and very sophisticated
computational models but we can do it
and we can come up with new discoveries
including these ancient interbreeding
events the other point I wanted to
make is that simultaneously modeling all
of the data gives us a lot of useful
information so most of the previous work
published by Svante Pääbo and others has
looked at subsets of the data in
isolation this finding that we were able
to publish a year ago was made possible
by the fact that we came up with a
single model that had to explain all of
the data together and we could only see
evidence of this early interbreeding
event in the opposite direction from
early modern humans into Neanderthals
after we had accounted for all of the
signals of the other migration events
it was only by building a holistic model
that described all of the data together
that we were able to discover that event
and as I mentioned we found the first
evidence of early modern human gene flow
into Neanderthals and that suggests a
likely possibility of an earlier
migration Out of Africa although we have
no other evidence to support that
finding other than the timing the
inferred timing of the event so finally
what's next well we're very interested
in understanding those sorts of phantom
introgression events in the Denisovan
genome those hints of some very early
introgression event possibly from Homo
erectus I have a student in my lab who's
working very hard on trying to build
models that can detect those early
events we're also very interested in
coming up with ways of
detecting specific introgressed segments
specific segments in the human genome
that have come from Neanderthals and
Denisovans and I didn't get a chance to
talk about it but it's very interesting
to think about the possible
disease-causing mutations that are out
there in modern human populations that
may have been inherited from
Neanderthals because these Neanderthals
had adapted to a different genetic
background they had adapted much
earlier to the climate and conditions of
northern Europe and Asia and those
mutations that they passed to modern
humans through this introgression event
some of them were probably advantageous
but some of them were probably
disease-causing mutations so there's a
lot of interest in trying to understand
which mutations now linked to disease
might have come into our populations
through these introgression events okay
I'm going to stop there I'd like to
thank all the members of my lab who have
contributed to this work as well as our
collaborators and a number of
funding sources over the years that have
allowed us to pursue these sorts of
questions thank you very much.
