John Martignetti:
Great, thank you very much. So, first of all,
thank you very much for the invitation to
speak. And even before starting, I actually
did want to preface just by two things. One,
clearly listening to the talks of the past
two days, you guys really are data users of
TCGA. I think you have to look at me more
of a data abuser. Yeah. So, that's kind of
one way to take this into account. The other
thing I'd like to say also, prefacing the
talk, is we're trying to use the TCGA data
in a very different way from most of the presentations,
and -- that we've spoken about, so clearly
the problem that we focused on is ovarian
and breast cancer. Clearly, I'm showing here
the number of cases, which all you guys know
-- number of cases, the number of deaths,
prevalence in the population.
And at Mount Sinai, what we've been doing,
really, focusing primarily on gyno/onc cancers,
we've set up a whole bioinformatic platform
and biorepository to take samples in
as they come from the OR, directly bring them
into the lab, start blood samples, start cell
lines, start animal models, collect ascites,
with the idea that, really, as we talk about
patients -- and clearly so many of the talks
are geared this way -- we really do think
about the time of patient's diagnosis, their
original surgery, chemotherapy, their progression,
their overall survival progression, but what
I wanted to talk about today was not this
paradigm, but really more this paradigm. Thinking
about risk genes. And so, clearly, you know,
being a pediatrician, being a geneticist,
my clinic thinks about risk much more frequently.
And so, what we wanted to do was really take
a different approach using the families that
we've collected -- I'll show you a few pedigrees
-- and a different approach in finding cancer
susceptibility genes. And so really, the rationale
for the approach is family-based studies to
identify ovarian and breast cancer susceptibility
genes.
So, as most people will know, family history
is the strongest single predictor of a woman's
chance of developing breast and/or ovarian
cancers. While BRCA1/2 mutations still represent
the strongest known genetic predictors, they
really are responsible for less than 50 percent
of all families. And I can tell you from the
clinical side, the clinicians are faced with
this every day where a patient comes in, knowing
that they have a strong familial risk for
developing -- Myriad testing, or whatever
testing now will come back as negative -- and
these women are really faced with really loneliness,
and just the confusion, what to do next, as
is the physician. So, any kind of a study
that can identify other susceptibility genes
would be a real help in the clinic.
So again, really, what I am talking about
is more of a patient-centric or family-centric
bias, but from a gene discovery standpoint,
things become more evident on the population
level, which are not evident when viewed in
isolation. And I say this again where having
families are always a great way to start doing
genetic studies, but the problem is then you
have a single family, and then determining
whether or not you have a rare private allele,
or is this something that is more prevalent
or frequent in the population becomes a little
tricky.
So, materials and methods. So, to start these
studies, we actually did this in collaboration
with two groups. Dr. Lilian Jara, who gave
us 70 families with three or more affecteds.
She's from the University of Chile. We also
included a number of families with male affecteds.
And Kunle Odunsi from Roswell Park, who gave
us 72 families with more than 72 affecteds.
As a preliminary test, we started by sequencing
21 exomes. We did this level of coverage,
and then we went through all of the data,
again, looking for potential susceptibility
genes. And here's a representative ovarian
cancer family. This is family 311. You see
that there are three affected women, the age
of onset: 48, 43, and 25. And these three
individuals, they all have parents or grandparents
that are affected; and they're clear, they're
all related. Here's a breast cancer family,
again, with the number of individuals -- these
were actually screened by Myriad as being
BRCA negative. These are other individuals
in the family. So, here's just the average
coverage per base for BRCA1/2, because again,
the idea that we started with is that we should
be looking for families that are BRCA1/2 wild
type to look for other susceptibility genes.
So, we tested the BRCA coverage. Indeed, one
of our families, which had been screened as
negative, actually turned out to be positive.
These are families that had males in them.
And, then to start the analysis -- and this
is where our collaboration with the GenePool
-- and there's a poster outside, poster 55,
that describes some of this work. So, you
can see, we took these three individuals,
and as we did the sequencing and we did the
analysis, we had a couple of filters that
we wanted to apply that rationally made sense
for gene discovery in families. So we looked
at the number of variants in genes, and then
the filters we started applying to the data
-- and this is actually kind of nice; the
GenePool allowed us to do this kind of alluvian
[spelled phonetically] filtering, so if we
ever wanted to go backward -- say, either
the filter was too tight, or it's something
that we wanted to change -- we could go back
and position. So, we've removed all the variants
with allele frequency of greater than 1 percent.
We kept variants that had either a high or
moderate impact. We kept variants that were
present in all samples. In other words, all
three individuals in this particular family
had to have the same variants if we believe
it to be a Mendelian trait. And then we kept
genes containing only one variant. So, in
other words, if there was a particular gene
that had seven variants in it, we excluded
that from the analysis.
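A minimal sketch of the four family filters just described, assuming each variant is a simple record. The field names (allele_freq, impact, carriers), the sample IDs, and the gene labels are hypothetical, and it assumes the one-variant-per-gene count is taken after the other three filters. It is not how GenePool itself implements the filtering.

```python
from collections import defaultdict

# Hypothetical variant records; names and values are illustrative only.
VARIANTS = [
    {"gene": "GENE_A", "variant": "v1", "allele_freq": 0.002,
     "impact": "HIGH", "carriers": {"S1", "S2", "S3"}},
    {"gene": "GENE_B", "variant": "v2", "allele_freq": 0.05,
     "impact": "HIGH", "carriers": {"S1", "S2", "S3"}},
    {"gene": "GENE_C", "variant": "v3", "allele_freq": 0.001,
     "impact": "LOW", "carriers": {"S1", "S2", "S3"}},
    {"gene": "GENE_D", "variant": "v4", "allele_freq": 0.001,
     "impact": "MODERATE", "carriers": {"S1", "S2"}},
    {"gene": "GENE_E", "variant": "v5", "allele_freq": 0.001,
     "impact": "MODERATE", "carriers": {"S1", "S2", "S3"}},
    {"gene": "GENE_E", "variant": "v6", "allele_freq": 0.003,
     "impact": "HIGH", "carriers": {"S1", "S2", "S3"}},
]


def filter_family_variants(variants, family_samples, max_af=0.01,
                           impacts=("HIGH", "MODERATE")):
    """Apply the four filters in the order given in the talk."""
    family = set(family_samples)
    # 1. Remove variants with allele frequency greater than 1 percent.
    kept = [v for v in variants if v["allele_freq"] <= max_af]
    # 2. Keep variants with high or moderate predicted impact.
    kept = [v for v in kept if v["impact"] in impacts]
    # 3. Keep variants present in every sequenced family member.
    kept = [v for v in kept if family <= v["carriers"]]
    # 4. Keep genes that contain exactly one remaining variant.
    by_gene = defaultdict(list)
    for v in kept:
        by_gene[v["gene"]].append(v)
    return {gene: vs[0] for gene, vs in by_gene.items() if len(vs) == 1}


candidates = filter_family_variants(VARIANTS, ["S1", "S2", "S3"])
```

Keeping each filter as a separate step mirrors the point made in the talk: any one threshold can be loosened or tightened and the cascade rerun.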
And so, starting from a large number,
we came down to a more reasonable number,
but still too many genes for us to do any
kind of a functional validation. And again,
we're coming from a different perspective
in the laboratory than having, you know, hundreds
of samples to look at. So what we did, we
actually then manually curated for the variants
that were likely to validate, the ones that
were unlikely to validate either by low coverage
or where they were in the gene. And doing
that, we ended up with 24 candidate genes.
So, these genes are shared between all three
individuals, all of them were validated by
Sanger sequencing. So, now we had a little
bit of a problem, right, because again, each
of these were equally likely to be the susceptibility
gene. And this is where I'm going to ask my
first apology from you, other than hopefully
going over time, which I won't do. But I cannot
include the names here, because it's being
broadcast. I can't show the names, but what
I would say to everyone here, please, if you
stop by poster 55, I am more than happy to
share the genes here, the candidate genes,
because I think that's important, but I just
can't show them on the slide here.
But what we did in collaboration with the
TCGA and Sandeep Sanga from Station X, was
able to get the TCGA data for ovarian cancer.
And what we asked was, not the mutation frequency
in the tumors, but the mutation frequency
in the germ lines of these individuals. We
had 240 samples that we could look at, and
we split those by BRCA status. So again, using
another filter, we essentially said which
of those sample sets we believed to have a
BRCA mutation, BRCA1 or 2, and which were wild
type. And then the filter, the last filter
we applied was, we looked at the fraction
of samples in each of the individual gene
candidates, and the chance that they had mutations
in them in the germ line. So, again, the idea
was, well, if you actually have a very high
fraction of samples that have lots of mutations
-- and again, it's probably not a highly conserved
gene -- whereas if you're on this end of the
tri-modal distribution, you're more likely
to have a functional or something that's being
preserved.
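The germ line filter described above can be sketched as follows. The cutoff (max_fraction), the record layout, and the choice to compute the fraction on the BRCA wild-type subset are all assumptions for illustration; the talk does not state the threshold that was used.

```python
def germline_mutation_fraction(gene, samples):
    """Fraction of samples with at least one germ line variant in gene."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if gene in s["germline_mutated_genes"])
    return hits / len(samples)


def refine_candidates(candidates, samples, max_fraction):
    """Drop candidate genes that are mutated in a large share of samples.

    A gene that varies in many germ lines is assumed to be poorly
    conserved and therefore a weaker susceptibility candidate.
    """
    wild_type = [s for s in samples if not s["brca_mutated"]]
    return [g for g in candidates
            if germline_mutation_fraction(g, wild_type) <= max_fraction]


# Hypothetical toy cohort; the real analysis used TCGA ovarian samples.
COHORT = [
    {"brca_mutated": True, "germline_mutated_genes": {"GENE_A"}},
    {"brca_mutated": False, "germline_mutated_genes": {"GENE_B"}},
    {"brca_mutated": False, "germline_mutated_genes": {"GENE_B", "GENE_C"}},
    {"brca_mutated": False, "germline_mutated_genes": {"GENE_B"}},
    {"brca_mutated": False, "germline_mutated_genes": set()},
]

refined = refine_candidates(["GENE_A", "GENE_B", "GENE_C"], COHORT,
                            max_fraction=0.5)
```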
And so, we went from 24 candidates by using
then the TCGA data down to eight candidates.
Again, using the wild type, or the BRCA wild
type in these 160 samples, we could look at
the number of samples that actually had mutations,
the same mutations as in our family, Family
311. So, five of the genes of these eight
genes actually had the same specific variant
in the TCGA germ line data set, and these
are present in about 1 to 2 percent of this
population. Two of these, when we then looked
at the 1000 Genomes database, were actually
increased; so the frequency was actually increased
in TCGA with a P value of -- relatively low,
.02. So, again, if we had more samples, again,
we could probably increase the confidence
in that.
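One way to test whether a variant is more frequent in the TCGA germ line set than in a reference population is a one-sided Fisher's exact test on a 2x2 table of carriers and non-carriers. The talk does not say which test produced the reported P value, so the choice of test is an assumption, and the counts in the usage line are hypothetical placeholders, not the study's data.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    a: carriers in the cohort        b: non-carriers in the cohort
    c: carriers in the reference     d: non-carriers in the reference
    Returns P(at least `a` cohort carriers) under the null hypothesis
    that carrier status is independent of group.
    """
    row1 = a + b          # cohort size
    col1 = a + c          # total carriers
    n = a + b + c + d     # all individuals
    denom = comb(n, row1)
    k_max = min(row1, col1)
    # Sum hypergeometric probabilities for k = a .. k_max carriers.
    numer = sum(comb(col1, k) * comb(n - col1, row1 - k)
                for k in range(a, k_max + 1))
    return numer / denom


# Hypothetical counts: 3 carriers among 160 cohort samples versus
# 10 carriers among 2500 reference individuals.
p_value = fisher_exact_greater(3, 157, 10, 2490)
```

With only a handful of carriers, adding or removing one individual moves the P value considerably, which is why more samples would raise confidence in the result.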
So, another parallel line of support was we
went back to that list of 24 genes that we
had here, and then we applied to it functional
impact of mutations -- we used MutationAssessor.
Boris Reva, who developed it -- was one of
the developers of the program, actually helped
us look at all of our 24 genes, and independently
of the TCGA data, went through and tried to
score the functional impact. So, seven of
the variants were assessed as functional.
One of them had a switch of function, which
is this one here; so this is actually the
DNA binding domain. It changed the proline
to a serine. Three of the genes actually had
involvement in cancer. When we looked at the
overlap of our eight genes by TCGA, and the
seven genes here, four of them actually overlapped
and stayed together in the final analysis.
So, we then looked at the Pan-Cancer. We took
all those 15 genes between the MutationAssessor
and the TCGA. We looked at the Pan-Cancer
to look at the distribution of mutations there.
We broke it down then into the ovarian cancer
data, so we could see one of these genes actually
had about 6 percent, which we think is too
high, looking at the function. Again, I'm
more than happy to share the gene name and
its function later. And a couple of these
actually had some interesting distributions
-- 0 percent here, just because it's below
the threshold of 1 percent. That one gene
that we showed here that has the three mutations
-- this is actually the gene here; these are
the mutations that are found. It's also mutated
in breast. Again, it's interesting that, like BRCA,
it has mutations in both breast and ovarian
cancer. And this is another paper that had
actually come out earlier in the year, February,
doing the same kind of analysis, taking
the germ line TCGA data from the ovarian cancer
group, and essentially going through functional
filters and coming up with a couple of new
candidates that kind of bundled into these
functional pathways of significance. What's
interesting, I think, is that our final list
of candidates don't overlap at all with these.
So I think that the value of having some families,
informative families, can still lead to new
gene identifications.
Here, just quickly running through this New
York breast cancer family, we've selected three of
the women for sequencing. We did the same
kind of flow through using the GenePool software.
We ended up with 22 genes. Again, we then
sequenced six of the individuals here. Of
those 22 genes, six of six women shared one
gene. Five of six shared the same mutation
in five of the genes. Then, there's a distribution
four, three, two, one. Again, given that this
is breast cancer, and the population prevalence
is, you know, one in eight women, one in seven
women, will develop breast cancer in their
lifetimes, again, we assume that some of these,
they could all be familial, or they could
even all be familial in a sporadic form. What
was interesting when we looked at one of the
top candidates, and we looked at the potential
clinical significance, and overall survival
in breast cancer, if you actually take BRCA1
and you plot the overall survival, this is
what the distribution looks like. If you take
one of those candidate genes -- again, more
than happy to share that -- we get a very
similar survival curve from there. So, again,
it's not functional proof, but it is interesting
that that happens.
So, in conclusion, we think we've identified
a number of high-interest candidate ovarian
and breast cancer susceptibility mutations
in genes, and this was taken either through
a personalized approach within families and
between families.
[sneezes]
Salud. So, the use of the germ line TCGA data
allowed refinement of the candidate list,
and really now the next step is validation.
And if you hear a little tremor in my voice,
it really is, one, because the functional
studies now are really going to take up a
lot of time. And so, you know, one of the
thoughts we had, again, is if there's anyone
that's interested, again, in looking through
more of the breast or the ovarian families,
if there's additional data to share, again,
we're more than willing to share, because
if we can trim down that list of candidates,
it will allow us to do more of the functional
studies, again, generating mice, generating
animal models, generating these mutations
in cell lines to try understanding the actual
impact on the biology.
So, with that being said, the acknowledgements,
I just wanted to, again, acknowledge all the
families that participated in these studies.
A number of individuals, Peter Dottino's my
collaborator at Mount Sinai, Slav Kendall,
who's the post-doc in the lab -- and I guess
in the TCGA, guys, you say the manuscript
committee, so I guess Slav is the manuscript
committee by himself. Boris Reva, who looked
at these with MutationAssessor, individuals
here, Roswell Park and Chile, and also Sandeep
Sanga from Station X, who helped design and
modify the software so we could do these studies.
With that being said, thank you very much
for your attention.
[applause]
Male Speaker:
Have you looked at the mutation, is there
any SNP associated with a specific mutation?
Then, you can use GWAS to validate your finding.
That would be much faster.
John Martignetti:
Yeah, so there are. So, a couple of genes
actually are rare; they do not have SNPs.
So, let's say, the eight that we pulled out,
I believe, three have no associated SNPs.
The others, again, in a European population,
less than 1 percent. Again, that was part
of our filter, but great question.
Peter Laird:
Okay, do we have the next speaker, Chai Bandlamudi?
Okay, here we go. Okay, the next talk is "Discovery
and Functional Characterization of Recurrent
Gene Fusions from Primary Tumor Transcriptomes
across 19 Human Cancers."
[end of transcript]
