(delightful music)
- [Narrator] We are the paradoxical ape.
Bipedal, naked, large-brained,
long the master of
fire, tools and language
but still trying to understand ourselves.
Aware that death is inevitable
yet filled with optimism.
We grow up slowly,
we hand down knowledge,
we empathize and deceive,
we shape the future from
our shared understanding
of the past.
CARTA brings together experts
from diverse disciplines
to exchange insights on who we are
and how we got here.
An exploration made possible
by the generosity of humans
like you.
(delightful music)
(techno music)
- Like many people here,
ever since I was a graduate
student in David Nelson's lab,
I've been interested in
the genetic basis of this,
which is what makes the
human brain so unique
compared to our closest
living relatives.
And I would argue, and it
has already been alluded to,
that geneticists have had
a problem for a long time
and this is partly
recognized by Allan Wilson
and Mary-Claire King,
is that the genomes of humans and chimps
are thought to be virtually identical
with 1.1, 1.2% genetic difference.
And as was already alluded to, in fact,
at the amino acid level of the proteins,
they estimated half; the number
is closer to about a third
of the proteins being identical
between the chimp and human.
Yet when you compare it to
other organisms, Drosophila,
you can compare it to amphibians,
even lizards that were
compared at that time
that Mary-Claire was working on,
there are species that
have way more difference
in their amino acid compositions
and yet look virtually identical
in behavior and morphology
and so this led to kind
of the speculation that,
it must be regulatory changes
that may make humans unique
or structural changes that
are actually important
in terms of shaping our DNA in ways
that couldn't be detected
at the level of amino acid differences.
I've been interested in one
very specific type of mutation
for a very long time,
duplicated sequences.
You might not think of
duplicated sequences as mutations
but they are.
They arise, like everything
else, as a mutation.
The initial duplication
is a mutational event.
They're important for genomes
for two very fundamental reasons.
Recognized for many years
long before molecular biology
was even a word.
Susumu Ohno recognized this
and even before him others did
that duplications are the primary force
by which new genes are
born within species.
Doesn't matter if you're a
chimp or a fly or a worm,
if you want to create a new
gene, there's ways to do it
that don't involve duplication
but the primary way
is to make an extra copy,
free it from a selective
constraint, new mutations occur,
new function.
The second actually goes back earlier
from guys named Sturtevant
and Bridges and Muller,
back into the 1930s, is this idea that
when you duplicate sequence
and you create two sequences
that are virtually identical,
you've actually made the genome unstable.
So there's a process called
unequal crossing over
that can lead to unequal
crossing over events
that lead to gains and losses of sequence
right where the duplications live.
So structurally dynamic, and
the potential to give birth
to new genes.
So I'm going to talk to
you a little bit today
about duplications, recent duplications,
things that have evolved over
the last 25 million years,
we call them segmental duplications.
There'll be duplications
between chromosomes,
inter-chromosomal duplications,
and duplications within a
chromosome, which we'll call
intra-chromosomal duplications.
And here's a map that I've often shown.
This is some of the early work that we did
as part of the genome project,
where we actually built
the first duplication map
of the human genome.
Anything that you see here in blue
is an intra-chromosomal duplication,
so that means it's duplicated
along the chromosome.
So these are your chromosomes
schematically represented,
the purple represent the
regions that still to this day
haven't been sequenced
despite what people claim
that the genome is finished.
These are centromeres and acrocentric
and some telomeric portions
of the chromosomes.
The blue represent the
intra-chromosomal duplications
that are greater than 95% identical
and greater than 10 kilobases in size.
So these are the biggest
events in our genome.
A couple things I want
you to get from this.
You can run the statistics
but you can see it by eye
that this distribution
is highly non-random.
Certain chromosomes,
chromosomes 7, 15, 16
are particularly rich in this.
The second point I want
you to get from this
is that if you look at the pattern,
you see that a lot of these
lines actually go over
what appears to be long distances.
This is called an interspersed
pattern of duplications.
That means duplications are
born but when they are born
they don't actually stay
close to one another
but they in fact distribute,
disperse from one another
by long distances.
In fact if you do this
measurement, ask the question,
what fraction of our duplicated sequences
are actually separated from their ancestor
by at least a megabase,
or are located on completely
different chromosomes,
the number is something like 60%.
So this is your interchromosomal
pattern overlaid.
This is actually highly non-random as well
but it's non-random in terms
of regions of the genome.
So near the ends of chromosomes
and near these centromeres
are where those predominantly live.
Alright so that's the pattern.
Why should you care?
Well the fact that you have
interspersed duplications
in your genome that are
separated by sometimes dozens,
if not many dozens of
genes, means that the genes
that live between those
duplicated sequences
are prone to be deleted
or duplicated themselves
because they are now inside
a region of instability
in our genome.
I'm not going to go into great detail
but there are now about 40 different genomic
disorders, as they are called,
of which half of them
are mediated by recurrent
deletions primarily
but sometimes duplications of
the sequences that are these
recently evolved duplicated
parts of our genome.
So this is an example of
one that we discovered.
It's called, and with others
we discovered this back-to-back-to-back,
Koolen-de Vries syndrome.
These kids have this region of the genome
only about a half a million
base pairs but it's deleted
and so their parents have
two copies of chromosome 17
but the child has inherited
one that actually is missing
about 500 kilobases of sequence
and that's because there are
human or great ape specific duplications
located at the boundaries
of this sequence.
Here's another one that
we discovered in 2008.
This is a very specific form of autism.
These children are born with
the exact same kind of problem,
it's a different portion of the genome
where there's duplications
that live right here
that cause this piece of DNA to be deleted
about three megabases of
sequence and about a dozen genes.
Every kid that's been
born that we've identified
at least thus far has a
form of autism characterized
by this frontal bossing of the forehead,
very characteristic facial feature.
So it's a very rare form of
autism in the human species
caused by this duplication architecture.
This is an example of one
which doesn't actually
have any clear facial morphology.
And this is a recurrent microdeletion on,
again, chromosome 15.
Once again mediated by recently
evolved duplicated sequences
these kids can either receive
kind of an inherited form
but more often they get a
de novo one, which means that
it generally happened
in one of their parents.
Instead of just having one disease,
these kids are at risk
for multiple diseases.
So it turns out that we
identified this associated
with intellectual disability
but relatively high functioning
later it was found to be
associated with autism
later it was found to be
associated with 1% of epilepsy,
so idiopathic generalized epilepsy
and papers from other labs showed that
it was an important risk factor
probably one of the biggest ones
for schizophrenia in the adult population.
This is our duplication architecture
and you could ask yourself the question,
well, why do we have this organization?
It turns out when we go to the
genomes of mice, rats, dogs,
cats, platypus, they don't
have this type of organization.
They keep their duplications
organized in clusters
and little pockets
without being dispersed.
So this dispersal of duplications
has created a bad design,
pardon the pun.
It actually makes our
genome fragile essentially
because of the presence of
these duplicate sequences
that have evolved over the
last 15, 20 million years
of evolution.
Humans, chimps, gorillas,
we all share this
although the exact patterns are different
between chimps and gorillas.
And to a lesser extent, species
like macaque and marmoset
have fewer of these compared
to that of the great apes.
So I'll say the answer, I
won't give you the answer
but you've actually
heard some of this today
but just remember that
these are not basically
gene poor regions of the genome.
There's about a thousand genes
if you believe your browsers,
that are out there that
are mapped in these areas
of the genome.
And Ohno argued that the primary force
by which these new genes
evolve is duplication.
So before I tell you some stories,
I want to tell you a couple other features
of these duplications.
One, is that their accumulation over time
has been non-random.
So work that was alluded to,
work that I did with my
former postdoc Tomàs Marquès,
we sequenced about a
hundred great ape genomes
to try to estimate which
duplications were fixed,
which duplications were polymorphic.
And this is the generally accepted phylogeny
and the thickness of the lines,
indicate roughly the proportion
of duplicated sequence
that has been fixed on any branch.
So the numbers are not that important but,
for those of you who are interested,
what this means is for every
base that has been fixed
as a result of single base pair mutation,
there have been 2.61
bases that have been fixed
in this branch as a result of duplication.
But the really important point, and I think
relevant to this audience, is that
there is a huge excess, very statistically significant.
Not in our branch but in
the common branch leading to
humans, chimps and gorillas.
This is where we see the biggest excess
of duplicated sequences and
almost all the duplications
that are causing disease in our species,
associated with delay and autism,
are mapping to duplications
that evolved here and evolved
right around the separation
of chimps, gorillas and humans.
Second point.
What I presented in terms of
the organization was too simple
so if you actually go and
actually look at the structures
of these duplications within a chromosome,
here's your chromosome 16
and these little numbers here
indicate the structures that
are indicated on the right here
anything that you see in color,
means that we've been able to determine
the evolutionary origin of the segment
so whenever you see the same
color, that means this came
from the same origin.
In this case on chromosome 16.
You get this picture of duplications
that are made up of different
pieces of the genome
that have stitched together
to build these complex mosaics
or modular structures
and then they actually,
some are very similar to one another
but some are actually very
different to one another.
But the really remarkable thing
and it's true for every
chromosome that's experienced
this burst of duplications, is that,
in those chromosomes you
see a specific sequence,
this is indicated here by the red,
we call it a core duplicon.
It is the place of the genome
where it seems to be the focal point
for the building up of these
more complex architectures.
This sequence is over-represented,
way more than you'd expect by chance
and it seems to actually be
involved somehow inherently
in this duplication
and this interspersed
duplication architecture.
Moreover, most of the
recurrent rearrangements
that are associated with
disease are mediated
by duplication blocks
that have these cores.
So we came up with a simple
hypothesis back in 2008.
Maybe the disadvantage
of this interspersed
duplication architecture
is offset by the emergence of
new genes with new functions
which override the actual
selective disadvantage of this
which is predisposing us to disease
or our children to disease
and maybe contributing
to the unique features
that make us human i.e. the
developing of the human brain.
So, is there any evidence?
So you've seen a version of this slide.
These were examples of
the youngest duplications
that evolved, and so
the genes are listed here
at the bottom.
This is the copy number that
we estimate in the genome
of multiple humans from
Asia, Europe and Africa
compared to chimp, orangutan and gorillas
shown here in gray and black.
So these are human
specific duplication events
and one of the things that
we and Jim Sikela noticed
early on is that there is
actually a noticeable enrichment,
it's borderline significant
because there's not that many genes,
of genes that have been implicated
in terms of brain development.
And what's really interesting
is if you actually look
at the chimpanzee specific
or the gorilla specific duplications,
you don't see these types of genes,
you see genes involved in immune response,
genes involved in drug detoxification
but you don't see these types of genes
actually being enriched.
So for example you've heard about SRGAP2
and ARHGAP11B at this meeting,
GTF2IRD is a transcription factor,
thought to be important in
terms of visual-spatial defects
associated with the
Williams syndrome disease.
GPRIN2 is a G protein-regulated inducer
of neurite outgrowth.
CHRFAM7A is a related nicotinic
acetylcholine receptor,
HYDIN is a gene that's
important in fluid flow
in terms of developing brain.
SMN1, survival motor neuron
protein, incredibly important
in terms of spinal muscular atrophy.
Two stories that you heard here,
in fact I think rise
above a just-so story.
The one you heard from
Franck about SRGAP2C.
So we were involved in
kind of characterizing
this duplication.
It turns out that the actual gene itself
wasn't in the human genome in 2012,
so we had to actually go
and clone and sequence it.
In 2012 it turns out
there was only one copy
in the human reference and
there are actually four
of which one is a clear pseudogene.
The ancestor produced a
daughter called SRGAP2B
about 3.2 million years ago
with a secondary duplication
leading to SRGAP2C
which is this duplicate truncated form
that antagonizes the function
that's thought to be important
in terms of altering spine development
as well as excitatory
and inhibitory synapses.
The other story you heard
was from Wieland Huttner.
We discovered this, at least
as a duplicated sequence,
back in the early 2000s and
actually characterized it
and reconstructed the
evolutionary history in 2014.
And this was a gene
that as you heard today
thought to be important in terms of
increasing number of basal
radial glial divisions
or also known as outer
subventricular cell divisions
that may be important in terms
of increasing neuronal count.
The common theme about these,
is that each of them are truncated
with respect to the parent
copies, they're not full-length.
These genes or these
duplications are also associated
with genomic instability.
So in the case of 15q13,
that's associated with
that form of schizophrenia
as well as the epilepsy
that I showed you earlier.
And in these particular
cases it looks as if
the duplication itself may have been,
the incomplete nature of it
may have been important for the
neofunctionalization of it,
for the actual evolution of new function.
So I'm going to end with
a story on this last one
which we just recently published
and we continue to characterize.
So this is this picture of
chromosome 16 I showed you before
and I'm going to zoom in
on these duplication blocks
that evolved over the
last few million years
in evolution of our genome.
And the reason that this
particular duplication pair
is so important, is that
recurrent rearrangements of it
actually result in the second
most common cause of autism
in the human species
that is of a deletion
of chromosome 16p11.2
and the 25 to 28 genes
that map between these.
So this is the second most
common cause of autism
genetically known in the human population.
Result of duplications
that evolved specifically
in the last few million
years in our species.
So I convinced a student
about four years ago to go
and characterize this
and do a comparative evolutionary study
which almost no one ever seems
to do at the genetic level
because they think the genomes are done.
So in a weak moment he agreed.
So the student was named Xander Nuttle
and this was just showing
you kind of our sequencing.
So the way he did it was
kind of old-fashioned.
He took large insert
clones and sequenced them
and reassembled because we
didn't trust the genomes
that were assembled.
And the way I'm going to
show you these pictures,
I'm just going to show you,
this is a portion of
chromosome 16 on the orangutan
where the little ticks represent genes.
So there's 48 genes that
are represented here.
The color represents
the duplicated sequence,
at least in my life, I
always put that in color
and then the actual
arrows here indicate the,
what we call synteny, so it's
the order of these segments
with respect to other mammalian species.
Other than the duplications
this order and these genes
is completely syntenic with mouse
which diverged about 80
to 90 million years ago
so we believe this is the ancestral state.
So then he repeated the
experiment by looking at gorillas,
chimps and humans.
So I'm going to show you
two chimps and two humans
or one human right now
for the exact same region
of the genome.
So this is the exact
same area of the genome.
It's 1.4
megabases or million base pairs
in orangutan and these
are the two chimp versions
of this particular portion
of chromosome 16 right here
and there's the human for comparison.
Now the colors remember,
represent the duplications
and the arrows represent the segments
and so the first thing
you should get from this
when you look at it you say,
"God that doesn't look
even close to the same."
And you'd be right because
this area of the genome
has essentially doubled in size
as a result of duplications
in the chimp and the human lineage,
those are all the color bars.
The interesting thing is,
is that the patterns of duplications
are almost completely different
between chimp and human.
So remember the different colors represent
different evolutionary
origins and you can see
that there are some
things that are similar,
these little red ticks
are those core duplicons
but by and large the structure
is radically different.
More interestingly, if you actually look
at the actual segments of the DNA itself
you can see that they're
completely ordered differently
between a chimp and a human.
These segments, these
six segments of 48 genes
have been shuffled around
in completely different combinations.
In fact my student
estimated parsimoniously
that you would need 13
large-scale structural changes
to actually convert a human to
a chimp structure over this.
So the idea that we're
99 percent identical in this region
of the genome actually has no meaning.
We are so radically different
but over a very focal
region of our genome.
Coming back to disease,
this is the area that
causes autism in our kids
and it's because we
have these duplications
in a direct orientation of
the same type on either side.
If you look at the chimp region five,
both haplotypes, they don't have this.
They don't have the
duplication architecture
that would predispose to the disease.
In other words, they are not predisposed
to developing this form of autism
because they don't actually
have the architecture
to predispose to the instability.
So what do humans look
like if you compare them?
Thankfully we look much
more similar to one another
so this is three different
human chromosomes
that are being compared.
But we do differ and we
differ over only one portion
and this is indicated here by
the orange and green arrows.
There's a region of
about a hundred kilobases
that is expanding and
contracting like accordions
on this region which predisposes to autism
but on either side we're
expanding and contracting
this hundred kilobase cassette.
And if you look really
carefully you'll see
that there are four genes
right over the area of change
and these are genes that are important
both in drug detoxification we have found
but also genes important in
terms of iron metabolism.
This gene called BOLA2.
This is a gene that's actually
been shown biochemically
both in vitro and in vivo
to be important in terms of
recruiting more iron into
a cell and helping it
to essentially become stable
in terms of the proteins
that it produces.
So how do humans vary if you
look at thousands of them?
This is 2500 humans
that we're comparing now
for copy number of this
hundred kilobase segment
which contains this BOLA2 gene.
And here's the interesting part.
You look at humans from all
the different continents
and they're quite variable
in terms of their copy
but all of them or I
should say none of them
go back to the ancestral state
as what you see in chimps,
gorillas and orangutans.
So every human has at least three,
most of us have five or
six of this duplication.
A few at three but none go back to two
as what you see in these species.
Here's where it gets really cool.
If you compare to Neanderthal and Denisova
which you've heard a little bit about,
they separated from us very recently
only about 400 to 500,000 years ago.
They look like chimps and gorillas
in terms of their copy number.
If we look at archaic hominins
or I should say archaic
humans not hominins,
archaic humans.
So this would be humans that
lived 30 to 40,000 years ago,
they look like us.
This is 400,000 years ago,
this is 50,000 years ago.
What's really remarkable is
that, when we estimate the time
and we can do this with
phylogenetic approaches,
we estimate the birth of
this duplication to be
about 282,000 years ago.
This is where most
paleontologists estimate the root
of the species Homo sapiens.
Of course the plus or
minus is 75,000 here.
(audience laughs)
One thing we can show
from basically doing,
using methods involving
evolutionary or phylogenetic methods
is that the rate at which
this expanded is too rapid
to be explained based on
strictly neutral evolution.
So this is not evolving
neutrally, and actually the fact
that we haven't gone back to
the actual ancestral state
is very unusual specifically
for a duplication
where copy number variation
is always occurring.
And copy number of this piece of DNA
actually correlates with expression.
So those of you who have more copies
actually have more of
this BOLA2 expressed.
We looked at this in terms
of expression differences
between chimp and human with Rusty.
So he had actually
published this early work
that he mentioned already looking at
induced pluripotent stem
cells in chimps, bonobos
and humans.
And even though it didn't
make his top 10 list,
it should've because this
gene was actually expressed
7.5-fold higher
on average in the chimp
or I should say the human
compared to the chimp.
And the reason when we went
back and looked at this
is because there was variance
in actually the copy
number estimates in human
because he looked at more than one human.
So we did these experiments with his group
and we could show that in
fact there is a big difference
between humans and chimps and bonobos
but the biggest difference
happens early in development,
what we consider induced
pluripotent stem cells.
There's still some difference
between these NPCs,
neuronal progenitor cells
but that's the biggest
difference that we see.
The one last point I'll
make is that in fact
when we actually look at
the actual gene itself
there is just a simple duplicate gene
which is entirely represented
but because these
duplications are often mosaic,
we actually have identified a gene
which looks like a complete
fusion of part of BOLA2
with another gene, a serine/threonine kinase,
which now makes a duplication,
a fusion duplication that
has an open reading frame.
This is in fact to my knowledge,
the only Homo sapiens-specific
gene that distinguishes chimp,
I should say humans, from
Neanderthals and Denisova
because Denisova and
Neanderthal do not have
this duplication therefore
do not have the fusion.
Coming back to the disease,
going back to the kids
that actually have autism
and mapping the breakpoints of those kids
where did those breakpoints
specifically occur,
96% of the children that we've looked at
that have this breakpoint
associated with autism,
their breakpoint maps to the
Homo sapiens-specific segment
that evolved on the order
of about 280,000 years ago.
And then this is just to show you
that if we look at those kids
that have low copy number
because those kids do
have lower copy than most,
we in fact find out that
in fact half of them
and the numbers are still small,
they have problems with
anemia and require
iron supplementation compared
to the rest of the kids
with this particular deletion
who still have high copies of
BOLA2, who do not have anemia
or have very low incidence of anemia.
So we think it's actually
relevant also to disease
in kids that are anemic with autism.
So in summary,
I hope I've convinced
you how cool our genome is
and how absolutely chaotic it is
with respect to duplications,
that we've had this burst of duplications
that has occurred on the order of
eight to 15 million years ago
before we separated as species
but as we were separating as species,
this wired us for disease.
So we actually have a lot
of additional, recurrent,
structural changes in our genome
as a result of this
duplication architecture.
But this has come with
benefits and we think that,
we hypothesize the core
duplicon hypothesis is that
the negative result of having this
interspersed architecture is offset
by the emergence of new genes.
We consider SRGAP2C and ARHGAP11B
to be some of our best
examples of these types
but we think there'll be others.
And we also think BOLA2 is
particularly interesting
because it's a Homo
sapiens-specific expansion
which we think will be
relevant for iron homeostasis
or improving our ability to recruit iron
and at the same time it's predisposing us
to the most common or second
most common cause of autism
in the human species.
So when I think about the
evolution of our genome
and this is the way I think about it,
I think of actually a balancing act
between both disease and evolution.
These are the folks that did all the work
and I will show you Seattle in the sun
because this is just to show you that
the sun does shine in Seattle.
Thank you.
(audience applauds)
(delightful music)
