Male Speaker:
So, this is the part where we wrap everything
up into a tidy bundle.
Male Speaker:
That should be easy.
Male Speaker:
Go ahead.
Male Speaker:
So, we’ve heard a lot over the last couple
of days, and we’re going to try to use this
time to see which of the separate pieces we
heard can be brought together further. What
we heard so far -- I think it has been very
interesting and very valuable. It is also
very expansive, and so we’re going to have
to be thinking about that as we go forward.
I just want to start by asking -- now that
you’ve heard the discussions of all three
major parts, are there any other points that
apply across all three? We heard the importance
of getting really accurate sequencing and
developing those capabilities, and how that’s
clearly important for just about everything
we’ve talked about in the long term today.
But I was wondering if anything else like
that has occurred to you now that you’ve
listened to all three parts. And the other
question I want to ask is similar: I’d like
to find out which parts, among all three,
are useful to bring together scientifically.
There may be none that cut across all three,
but is there anything that suggests where
the scientific interactions have to happen
between these three -- between the functional,
the clinical and the discovery parts? Some
are obvious between discovery and clinical,
and some are obvious between maybe discovery
and function, but is there anything else that
ties across all three? So, I’ll just throw
those out to start.
Male Speaker:
And if I could just add to that: this session
here is to start to think about going from
the strategic things that we’ve heard about
to more tactical things. So, as you think
about your answers, think about, you know,
what tactics you would use to implement some
of these strategies.
Male Speaker:
Eric, go ahead.
Eric Green:
So, I’m struck that a thread that binds
all three together -- it’s not a technical
comment, but the thread that binds all three
together is variant discovery, or gene discovery
for disease; basically, that’s the first
step, then taking that to mechanism and taking
that to the clinic to improve diagnostic rates.
Again, if you’re looking for threads that
tie all three together in a very clear way,
in a, you know, five-second elevator speech,
it just seems like discovery for health and
disease is what drives this.
Male Speaker:
And I also want to add -- sorry, I’ve got
you, Debbie [spelled phonetically], wait just
a second -- that also in bounds in this discussion
is this: if you imagine that you had three
general kinds of programs that did the kinds
of things we sort of do now, but with some
modification, an evolution of that, these
discussions are suggesting to us areas where
we might have to build in interactions rather
than having silos. So, with that said, Debbie.
Female Speaker:
I think the technology also binds them together.
Really, it came up in comparative, but putting
the W back in whole genome is really important.
Okay, I think that thinking about how to do
that -- and Heidi should get the credit for
that one, but the moniker is true. I mean,
I do think the technology binds it: being
able to dissect variation at different levels,
and how it gets implemented in diagnostics,
even prenatal diagnostics, is extremely important.
And so I think that ties together across the
whole array of applications. And I think that
something should always be in the mantra of
genomics -- that’s the reason it started --
Male Speaker:
And Jeff, did you have your hand up before?
Male Speaker:
Yeah, I don’t want to -- we should continue
this part of the discussion, but I want to
come back to this integration question, to
see whether people have really creative ideas
about how we can most effectively integrate,
rather than just add another fifteen conference
calls.
Male Speaker:
Yep, and there was a hand up over here. Yeah.
Male Speaker:
I was just going to say that another theme
I think we’ve been hearing is longitudinal
phenotype tracking among large numbers, if
not every one, of the people who are getting
sequenced for clinical reasons -- whether that’s
through a learning healthcare system or some
other model. I think that’s a theme we’ve
been hearing throughout the several days.
Male Speaker:
I just want to amplify that. I think the
longitudinal aspect of all that -- it’s
like what Robert said earlier, that some of
the most important things out of CSER haven’t
even emerged yet, no pun intended. So, I think
the longitudinal aspect is critical in all
of this.
Male Speaker:
Data renic [spelled phonetically] integration
is of course a perennial binding issue. We
have spent almost no time talking about
biocomputing and bioinformatics, and I’m confused
as to whether we’ve transcended that as
an item because it permeates everything, or
if I’ve just forgotten about it for now.
Male Speaker:
It’s too hard.
[laughter]
Male Speaker:
Yeah, it does need its own discussion,
but it has come up in key places, and I think
the data integration is part of that as well
-- unless you weren’t including data integration.
Male Speaker:
Actually, maybe we can use that for a second
or two to talk about tactics, then. So, you
know, what tactics should we be thinking about?
Everybody talks about data integration, but
think about all the different data types that
we’ve got. You know, we’ve heard maybe
a dozen different functional assays that have
been proposed by people, each of which produces
a different kind of beast of data. We’ve
heard about lots of different structural variation,
which I think is another topic, because we
haven’t figured out yet how to measure all
that structural variation. So, what strategies
or tactics do we have to integrate that? Ewan,
I think your hand is up.
Ewan Birney:
I don’t -- I mean, this is the thing -- I
think that it’s become so pervasive that
we have stopped talking about it, and that’s
why there are mathematicians here around the
table, and on one level that’s a good thing.
I remind people that there are two sides of
this. There’s what I sometimes describe
as the blue-collar side, which is making sure
your data is straight, keeping it straight,
keeping the metadata straight, making sure
it flows correctly, making sure people can
get access to it. That’s very often quite
engineering-heavy, and there I really feel
we keep coming back to the size of the teams.
I know that’s like the fourth time I’ve
said it, goddamn it. The teams need to be
sized at something like five engineers at
a time, rather than one or two people.
Then we have the much more sophisticated side,
what I sometimes describe as the white-collar
problem, and there I think it is both the
things associated with this big-data world
and making sure that NHGRI really is exploiting,
using and attracting the best people in this
area. That is very, very important, I think,
because there is a huge amount of problems
that sit on this side, and there is no magic
bullet. There’s no magic thing that says,
“Oh, well, if you only pulled out this method
it would work.” And so the most important
thing, I think, is to invest and fund in both
of these areas correctly, and then NHGRI has
a headache about coordinating with BD2K, with
other ICs, with medical informatics.
Male Speaker:
I don’t think it should be a headache to
coordinate with BD2K. I mean, we should relish
that as an opportunity, and it gives us much
better leverage. The thing about bioinformaticians
is that they’re capable of building things
that immediately apply across the institutes
at NIH, and it’s frustrating to have things
being replicated in different institutes in
an inconsistent way, so I would strongly encourage
that. And I would also exhort everyone to
remind themselves that this certain “white
collar” bioinformatics, as Ewan describes
it --
Male Speaker:
Can we not call it white collar?
Male Speaker:
Yeah, let’s not call it that, but bioinformatics
at that level --
Male Speaker:
It’s kind of a weird -- I don’t know if
bioinformaticians are part of the proletariat
or the bourgeoisie, but I probably would just
kind of --
Male Speaker:
Okay, rejecting your analogy -- but the deepest
form of bioinformatics is a research enterprise,
and it has to be supported as a research enterprise.
Male Speaker:
[inaudible]
Female Speaker:
Yeah, I think specific incentives for data
integration should be built in, in the form
of RFAs or something like that, because you
can’t necessarily expect different groups
to spontaneously integrate, or the data to
integrate, without specifically incentivizing
that portion.
Male Speaker:
So, this is an area where I worry that we
really have the wrong picture in our heads.
I think it’s good it’s been raised. I
agree with Ewan, and this is what I mean.
First of all, what we need in some cases is
not new methods and technologies; we have
sort of an organizational problem, okay, which
is that we don’t invest in interoperability.
You know, the white-collar, whatever -- the
high-level bioinformatics tools are all non-compatible.
Some tools get used a lot, but most people
write their own. There’s really no infrastructure
or platform or whatever that the field relies
upon.
Even things that have been successful, like
file formats. I often hear people say, “Well,
man, we have VCF.” The history of that,
as people probably know, is that there were
no file formats, so for the 1000 Genomes Project,
at Cold Spring Harbor, we sat people in a
room and said, “You have to come up with
a file format,” and then there was no governance
of it, no evolution, because it didn’t belong
to anyone or anything. That’s now been taken
on by the Global Alliance data working group,
you know, to move that forward. But we have
a problem, which is that the picture in our
heads is often to do projects. People write
a program, and then we’re surprised when
it’s not interoperable. And the solution
is not a monolithic approach -- it’s not
to have the big database in the sky or to
force everything together -- but we do need
to change how we do it.
And the last point I’d like to make focuses
more on how we learn from other areas where
it’s a virtuous cycle -- it’s not how
we work, but it is how they work. The other
point I want to make is that BD2K, at least
to date -- at least I think it was [unintelligible],
and I think it will change -- does not see
itself, as I understand it, as fixing this
organizational problem or taking on some of
the tasks that we talk about. At least to
date -- maybe it will change -- it has been
focused on fundamental data science. Okay,
that might be great -- Eric Green, who ran
it up until now, is nodding his head yes.
Okay, but -- so I worry a lot that the problems
we’re talking about are not generally problems
of fundamental data science. And then everyone
goes, “Oh, BD2K will take care of that.”
I think that’s highly unlikely as currently
configured. Maybe Phil will change it, because
he’s a great guy and he’s just starting,
but I worry that in the discussion it’s
“BD2K is going to take care of that,” and
there are no plans for BD2K to take care of
that, so no one is going to.
Male Speaker:
So, what strategy can NHGRI use to try to
engage and fix that problem?
Male Speaker:
And NSF won’t take care of it either, just
to add.
Male Speaker:
So, I mean, some of the things -- David may
want to comment, and others -- again, it’s
trying to figure out what limits people from
doing these things. Okay, some of it is incentives,
but incentives can be created by, you know
-- people said grants requiring people to
share. We see with dbGaP, for example, that
people are required to share, but for a lot
of architectural and regulatory and other
reasons it doesn’t actually flow all that
smoothly.
So one thing -- I don’t want to be a broken
record on this or suggest it’s the only
approach -- is trying to work on shared APIs,
trying to use the processes that people in
other fields have used to develop open APIs,
letting people iterate on them, and trying
to parse the problems so people can write
tools that are interoperable and plug-and-play,
et cetera.
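The plug-and-play idea being described can be sketched in a few lines of Python. This is a hypothetical illustration, not any project’s actual API -- all names here (`VariantSource`, `count_snvs`, the record fields) are invented: downstream tools are written against a shared interface rather than a particular backend, so any compliant data source can be swapped in.

```python
from abc import ABC, abstractmethod

class VariantSource(ABC):
    """Hypothetical shared API: any backend implementing this
    interface can be plugged into any downstream tool."""
    @abstractmethod
    def fetch(self, chrom: str, start: int, end: int) -> list:
        """Return variant records overlapping the half-open region."""

class InMemorySource(VariantSource):
    """One possible backend; a file- or network-backed source
    would implement the same interface."""
    def __init__(self, records):
        self.records = records

    def fetch(self, chrom, start, end):
        return [r for r in self.records
                if r["chrom"] == chrom and start <= r["pos"] < end]

def count_snvs(source: VariantSource, chrom, start, end):
    # Downstream tool: coded to the interface, not the backend,
    # so it interoperates with any compliant source.
    return sum(1 for r in source.fetch(chrom, start, end)
               if len(r["ref"]) == 1 and len(r["alt"]) == 1)

source = InMemorySource([
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "G"},
    {"chrom": "1", "pos": 200, "ref": "AT", "alt": "A"},
])
print(count_snvs(source, "1", 0, 1000))  # 1 (the deletion is not an SNV)
```

The point of the sketch is only the shape: once the interface is agreed on, tool writers and data providers can evolve independently.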
David might want to comment more. But there
are some things, and they are being worked
on. The question of what can NHGRI do? Well,
it could support some of those things. It
could encourage its grantees to actually use
such systems in some way, or measure whether
or not data is flowing, as opposed to just
asking, did you check a box? I don’t want
to make up on the fly what NHGRI can do, but
it’s not going to be to start with a new
monolithic approach; it’s going to have
to be figuring out that strategy.
Male Speaker:
Sort of a meaningful use for genomic data.
Male Speaker:
You know, I think that --
Male Speaker:
This is what the meaningful use standards
were supposed to do and didn’t do. So, we
actually need to look to other industries
or other places, because the whole point was
that APIs were supposed to be wrapped around
this part of the meaningful use standards,
and it didn’t happen.
Male Speaker:
Doing deep bioinformatics in the context of
one of the truly great challenges that we
have discussed at this meeting is a tremendous
opportunity. That’s really -- you know,
you have to embrace the bioinformaticists
as a key part of the team, fund them appropriately
to get their part of it done, and let them
be real team members on these great challenges.
It’s simultaneously working out the new
theory and the deep kinds of things they would
actually get credit for if they were in a
computer science department. You ought to
do that, but do it in the context of a grand
challenge problem that NHGRI has identified,
so you’re developing the APIs and working
on deployed implementations of them at the
same time, so they’re proven in the field
and we’re actually making great scientific
progress with them. That’s a huge opportunity,
so we just need to bring them into that. Give
them the opportunity.
Male Speaker:
Just a -- I want to echo what David says,
but also say, you know, the APIs are necessary
but nowhere near sufficient to start on this
right. Because if you think about, you know,
how Facebook or Twitter or Google work, right,
it’s an incredibly well-orchestrated and
designed and, you know, built set of systems,
in order for you to have the functionality
that you have in those settings, right -- and
the set of problems that we’re trying to
challenge --
Male Speaker:
-- I think that’s the wrong analogy, though.
I actually think the analogy is that we’re
trying to create the internet. No, no, I’m
actually totally serious. I think this is
actually one of the huge problems we have
as a field: we think that what we’re doing
is creating things that sit on top of an internet.
The internet allows data to flow, okay? And
once data flows, people can add layers of
increasing value on top of it -- and right
now data doesn’t flow. Data is isolated
in silos.
Male Speaker:
Right, so I think the reason why Facebook
isn’t too good an analogy is that Facebook
is a big closed system. That’s not what
we’re talking about. I think it’s a question
of wrapping metadata tags around pieces of
data, and what’s true about those sorts
of things is that they’re extensible. They
tend to be things that can be started lightly
and grown and grown. I do think it would be
worth getting sets of people from the tech
industry in; you know, Craig Mundie at Microsoft
talks an awful lot about this, about ways
that you build a system that starts light
and grows, rather than a Facebook kind of
thing.
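The “start light and grow” property of metadata tags can be made concrete with a small sketch. This is a hypothetical illustration (the record layout and tag names are invented, not any standard): a record carries an open-ended tag set, consumers read only the tags they understand, so new tags can be added later without breaking existing readers.

```python
def make_record(payload, **tags):
    # A data object plus an open-ended, extensible tag set.
    return {"payload": payload, "tags": dict(tags)}

def get_tag(record, key, default=None):
    # Tolerant reader: unknown or missing tags never cause failure,
    # which is what lets the tag vocabulary grow over time.
    return record["tags"].get(key, default)

rec = make_record("ACGT...", assay="WGS", sample="NA12878")

# Later, someone extends the vocabulary; old readers are unaffected.
rec["tags"]["consent_group"] = "GRU"

print(get_tag(rec, "assay"))          # WGS
print(get_tag(rec, "coverage", "?"))  # ? (tag not present yet)
```

The design choice here is the asymmetry: writers may add tags freely, while readers must ignore what they don’t recognize -- that is what makes the scheme incremental rather than monolithic.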
And I mean, there are a few of us in this
room, myself not one of them, who are deeply
experienced -- David Haussler knows a lot
about these things. But I think NHGRI could
benefit from finding a handful of distributed
tech advisors who do this routinely, and having
them come and look at what we’re doing and
say, yeah, we were in this position for certain
things and we got out of it this way. My point
about the meaningful use standards for the
Office of the National Coordinator for Health
Information Technology was that there was
a recommendation to do precisely this and
it wasn’t done, even though there was a
set of people who tried pushing to do it.
We could get them in to help us, because there’s
an evolutionary path -- it doesn’t have
to be perfect at all, it’s quite incremental,
but it would help.
Male Speaker:
I’m not disagreeing with you, but, you know,
in order to do the large-scale analysis, people
are going to want to build, you know, bigger
systems in order to put that, you know --
[talking simultaneously]
Male Speaker:
[inaudible] --
Male Speaker:
-- And that’s not going to -- you know,
so the extraction of meaning from the data
isn’t going to come from the API.
Male Speaker:
No, no, no what comes from the API --
Male Speaker:
[inaudible] standards --
Male Speaker:
-- is the ability for third parties -- you
know, the graduate student here or the postdoc
there -- to write a piece of code that plugs
in, and it unleashes great creativity because
there are APIs.
Male Speaker:
Carol --
Male Speaker:
That’s all.
Female Speaker:
So, I think the biggest challenge to integration
is not data formats or APIs but, going back
to the semantics of what we’re talking about,
phenotypes -- defining phenotypes in a way
that the data you share is meaningful. You
put a bunch of clinicians in a room and ask
them to define what schizophrenia is, and
you’re going to get about a thousand different
descriptions. So, how do you integrate data
on schizophrenia when the phenotype definitions
being used to collect the data in the first
place are so divergent?
So, I think semantics and metadata tags, as
Eric said -- I think that’s really, really
critical to data integration. That’s one,
and then the other thing is, how do we represent
uncertainty about the data we’re generating
so that it can be computed on? I think that’s
another critical issue that we haven’t really
addressed.
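Both points raised here -- shared phenotype semantics and computable uncertainty -- can be sketched together. This is a hypothetical schema, not any consortium’s actual format: each assertion pins its meaning to a shared ontology code (HP:0100753 is the Human Phenotype Ontology term for schizophrenia, HP:0001250 for seizure) rather than free text, and carries an explicit confidence that downstream code can filter on.

```python
# Invented record layout: ontology term + explicit confidence.
phenotype_calls = [
    {"sample": "S1", "term": "HP:0100753", "label": "schizophrenia", "confidence": 0.90},
    {"sample": "S2", "term": "HP:0100753", "label": "schizophrenia", "confidence": 0.40},
    {"sample": "S3", "term": "HP:0001250", "label": "seizure",       "confidence": 0.95},
]

def confident_cases(calls, term, threshold=0.8):
    # Integration happens on the ontology term, not on divergent
    # free-text descriptions, and the uncertainty is computable:
    # here we simply threshold it.
    return [c["sample"] for c in calls
            if c["term"] == term and c["confidence"] >= threshold]

print(confident_cases(phenotype_calls, "HP:0100753"))  # ['S1']
```

The free-text `label` is kept only for human readability; two sites with different wording for the same condition still integrate, because the join key is the term.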
Male Speaker:
And this is a great topic; we have a few more,
but let’s take two questions and move on
to another area.
Male Speaker:
Go ahead.
Male Speaker:
All right, so I just sort of wanted to second
what Carlos was saying. I mean, you can ask
what comes first: the API and the standards,
before tackling the question and the research,
or do you start by thinking about the question
and the research, and then think about the
types of interfaces and APIs you need? I would
argue the latter. I mean, I think to fundamentally
integrate the data and put it together, and
to get good people to work on it, you really
have to have clear research questions -- people
really thinking about interesting problems
-- and then they’ll build all the interfaces,
but I think you need that to motivate things.
Male Speaker:
Yeah, but is that what we’ve been doing
the last 25 years?
Male Speaker:
Yeah, but --
Male Speaker:
No, I’m serious -- I mean that in all seriousness;
at least for myself, I feel like --
Male Speaker:
-- I just think we’re going through the
same growing pains here. It’s been very
revealing to me, the conversations we’ve
had with the CERN data group, which has grown
up over the years around the LHC. What’s
interesting is that they went through a period
where the data stuff inside of CERN was sort
of hidden inside their projects, and they
didn’t understand why, and they didn’t
think about it deeply. It was only in the
1990s that they really said, “Wait a second,
we really have to treat the data stuff as
a separate part.”
And they made it -- I mean, it’s in a very
CERN way, and we don’t have to copy it,
but what they went through socially, I think,
is what we’re going through at this moment:
saying we need a data infrastructure -- data
APIs and back ends and all of this stuff --
that allows us to do good science on top of
things. In a certain sense, they viewed it
the same way they talked about the ring as
being a piece of infrastructure for physics,
on which you do great, clever physics, but
the making of the ring is an engineering --
Male Speaker:
And I think I --
Male Speaker:
-- and so I just want to say, when we’re
drawing -- I mean, I just think we’re growing
up as a data science. This is part of our
growing pains; there’s stuff that we should
carry forward from what we’ve done. So,
for example, data openness: compared to many
other sciences, we are naturally open in a
way many other sciences aren’t, including
high-energy physics, for example. So, that’s
a good thing to take from our history, but
there are some other things that I think we’ve
just got to leave behind -- like our own kooky
file formats; we just have to leave them behind.
And, you know, that’s not a revolution or
an evolution; we’ve just got to mature as
a data science.
Male Speaker:
And in particular, you know -- I agree entirely
with you, and I agree also with what you said
about the engineering teams: they can’t
be, you know, three to five people, because
we’ve tried that and what you end up with
is lots of things that don’t connect. I
do think it’s very informative -- and if
NHGRI hasn’t done it, they should; some
of us have done it in other settings -- to
talk to a bunch of people who work in tech,
or who have tried to get into genomics, because
they’ve all looked at it.
They think it’s an anomaly. They are just
perplexed and dumbfounded at how we do things
-- the utter absence of any standardized interfaces,
or of any group that comes together and tries
to do that. Like, when we set up the Global
Alliance -- not to keep going on about this
-- I can tell you what the Googles and Amazons
and all these people said was, “Oh, now
we might consider working in this field. One
of the things that stopped us in our tracks
is that you could talk to 10 different people
and hear 14 different things about how you
should do it, and that makes it a market failure.”
Okay, now, whether NHGRI can play an important
role -- they can’t do it alone, but, for
example, even just signaling that it will
support things. And also, one of the things
they tell us about standards is you can’t
have 16 sets of them.
And one of the things about our community
is that whenever you talk with anybody, someone
goes, “Yeah, I’m doing it,” and someone
else says, “We should have a meeting to
talk about how to do it, because I’m not
doing it.” And the problem is, if you have
many different sets of standards, then no
one from the outside can build anything and
have the hope that there will be a market.
Because in the absence of something to plug
into -- if every wall has a different plug,
no one can sell a toaster.
All right, so NHGRI could -- that’s what
we’re talking about, guys. We’re talking
about how no one can sell anything or build
anything because it doesn’t plug into anything
else. And so NHGRI could line up behind those
things, try to line up BD2K behind them, not
try to pick the winners, but set conditions
under which there can be a virtuous evolution
of a market based on what actually works.
Male Speaker:
So, I think it’s been a great discussion,
but there are a few other common themes that
maybe we should move on to. One of the other
common themes I think we heard, which crossed
all three boundaries, was this whole issue
that we don’t yet know how to capture all
the kinds of genetic variation that exist
in genomes. So, you know, we heard today about
the idea of a telomere-to-telomere sequence.
We heard about how to think about doing that
across evolutionary space, so that we understand
a little bit more about what’s conserved
and what’s not conserved. So, thinking about
that, what strategy or what tactic should
NHGRI be thinking about to help advance that
project of being able to capture all kinds
of variation in genomes?
Female Speaker:
Well, I was just going to say, to me that’s
a critical piece that goes across everything
we’ve talked about. One can certainly imagine
taking the Mendelian project, for example,
taking the families that really look the most
Mendelian but for which no alteration has
been found, and doing these absolute platinum
genomes on those families, once we’ve done
them on normals and know what’s missing.
Because I think, you know, in the commercial
space -- and in the clinical lab space -- we
compete on who’s got the best coverage,
but what are we covering, right? We don’t
even know what genomic regions we’re not
sequencing. So, I just think having incredibly
-- well, back to the whole, you know, the
W [spelled phonetically] in the whole genome:
that informs the Mendelian project, it informs
the CSER project, it cuts across all of them.
Male Speaker:
I can just say, from the 1000 Genomes structural
variation group, one of the things we’re
finding is that we do need to encourage input
from a lot of different technologies together,
and to integrate that data properly, and so
anything NHGRI can do to encourage that would
be great.
Male Speaker:
So what would that be though?
Male Speaker:
Well, in our case it’s, for example, incorporating
data from PacBio reads, integrating optical
mapping data, for example, PCR-free DNA libraries,
et cetera, and then coupling that with a huge
amount of validation using orthogonal approaches
as well. And we’re doing this actually in
a U41 setting, just as an FYI, so it has worked
as a very concerted effort.
Male Speaker:
So I want -- I’m sorry, Debbie -- I want
to pick up on something you said, because
I always think about the balance between what
can be done in a standalone technology development
program or center or grant, and what really
benefits from having integrated data types
in more of a center that can do a bunch of
things but may not focus on any particular
type of development. So, what is the right
way to balance that, or to handle that? Do
you absolutely need both, or is it --
Male Speaker:
Yeah, personally I do think you need both.
You need to encourage the development of the
new technologies, and then when those become
available, try to integrate them and test
them robustly in the larger-scale projects.
Male Speaker:
So, I guess another part of that question
is: in the next four or five years, we’re
not predictably looking at a magic-bullet
technology that’s going to do the end-to-end
chromosomal sequencing -- nobody really has
faith that that’s going to happen to the
extent needed.
Male Speaker:
I mean, let me add a little bit of perspective,
I guess, on this. We have spent quite a bit
of time with PacBio as one platform, and certainly
there are regions of the genome that we still
can’t assemble with PacBio technology. We’ve
looked at this specifically; we know where
those regions are. But the set of regions
we can’t access and assemble in a routine
fashion -- as opposed to with a targeted,
you know, mom-and-pop operation where we go
after each of the difficult regions -- has
really diminished significantly, and I’m
actually quite excited by the potential not
just of PacBio but of any of these long-read
technologies.
It’s something we’ve brought up over and
over again over the last 10 years: the importance
of long reads for comprehensively accessing
genetic variation still remains really high.
But I think if we had an increase from, let’s
say, 30-kilobase reads to 50- to 100-kilobase
reads, we really could be talking about --
I mean, I think we’re at a transformational
position right now in terms of general assembly,
but it would be another catapult, where NHGRI
could really invest in advancing, in helping
advance these technologies to get us to that
next level. I don’t think we’re 10 years
away. I don’t even think we’re five years
away. The technology exists now to increase
by two orders of magnitude [inaudible] --
[talking simultaneously]
Male Speaker:
-- [inaudible] we could get a gold genome
within three to five years, and we need to
do that.
Male Speaker:
But I understood that you were asking for
a platinum genome, and I was wondering about
the difference between the gold and Evan’s
90 percent --
Male Speaker:
Evan made a distinction here. Evan made a
good distinction, right? So the gold genome
is going to be good enough, without every
base of every centromere, to do a lot of stuff,
and we need to go after that aggressively;
we can get it in three to five years.
Male Speaker:
And the gold genome gets how many? What percentage
of your remaining insertions are you worried
about?
Male Speaker:
Oh, gets all.
Male Speaker:
Gets all.
Male Speaker:
So, what it gets is all the euchromatic
variants, irrespective of whether they’re
inversions, insertions or deletions, irrespective
of size or complexity -- that’s gold.
Male Speaker:
If Evan’s happy with it, we’ll all be
happy.
[laughter]
Male Speaker:
Jay.
Male Speaker:
So, telomere to telomere is, I mean, a technology
goal, and that will take some time. But, you
know, you said three to five years for a gold
genome. You know, as was kind of alluded to
this morning, with PacBio I think it’s reasonable
to do a gold genome now. It’s more expensive
than an Illumina genome, but it’s not intractable.
Male Speaker:
We’re talking about $10,000.
Male Speaker:
Let’s even say it’s $50,000, but I think,
you know -- I don’t think this paper has
come up yet today, but a recent paper from
Han Brunner and Joris Veltman, I think in
Nature, kind of illustrates how much we’re
missing in exomes -- something I don’t think
we have a real appreciation for until we sequence
a genome and see how much we’re actually
missing. And I don’t think we have a real
appreciation now, quantitatively, for how
much we’re missing even in the genomes that
we sequence. And, you know, doing 50 or 100
PacBio genomes would go a long way toward
serving that goal -- not waiting three years
to do it, but just going ahead and finding
out what happens. That’s something we can
do tomorrow, right? And it would also serve
the goal of supporting a second technology
other than Illumina.
Male Speaker:
I think this is a good thing to shoot at.
I think it’s a good thing to use program
announcements for, rather than huge efforts
to do it -- there are a lot of creative ideas
out there. PacBio’s a good one, but maybe
there are a bunch of other ones too. But right
now, in study sections, new assembly programs
don’t get reviewed well, because there’s
a sense on the part of reviewers that, well,
it’s a solved problem, or a boring problem,
or filling in some of this stuff is kind of
not so interesting.
I think if you merely had a program announcement,
you’d begin to get some R01s -- maybe not
to do 50, but “give me 10 good genomes,”
which at these prices you could do. I think
there’s a lot of creativity -- Hi-C to be
able to jump over things, other kinds of interesting
long-range technologies -- and it would be
good to get a bunch of creativity. What I
wouldn’t want to see is, you know, a single
monolithic large project to get 10 or 20 or
50 gold-standard genomes. What we actually
need is a way to really be able to turn the
crank and make them, and I think a program
announcement that signals the study sections
to fund some of this stuff could be a good
thing.
Ewan Birney:
So, just saying, you know, nanopore is another
technology here, and for a long time now they’ve
seen a change in signal as the strand goes
through, and it really does look like it’s
limited by sample prep. So the interesting
problem -- there are things about tuning up
the chemistry and the read-out system, but
in fact some of these challenges are going
to be about the upfront process of sample
prep and delivery to the pore, rather than
the actual process of reading it. So, that
telomere-to-telomere view might not be quite
so crazy, actually, if you can get the sample
prep sorted out on a five-year time scale
-- which would be kind of awesome, actually.
Male Speaker:
I mean, I think this discussion is a really
important one and I agree it’s not just
PacBio, it’s not just an either or, it’s
not like we wouldn’t do Illumina, we use
other technologies. But I think one thing
that has been touched on a couple times is,
what’s the best way to do this? My sense
is there’s still a great advantage to having
the large-scale genome centers be involved
in these activities largely because they’ve
been involved in a lot of large scale sequencing.
To generate, for example, you
know, 250 flow cells of data, or SMRT
cells from a PacBio, does take some time, takes
months, and there’s process involved, there’s
management, there’s annotation of sequences,
there’s algorithm developments, there’s
software developments. And I think, you know,
in terms of a mechanism of this if we’re
going to proceed in this direction is to involve
maybe small groups of individuals in conjunction
with large scale genome centers to really
pull this off because we do still need muscle.
It’s not as if this fits perfectly with
the seven characteristics, Adam, that you mentioned
in the beginning in terms of scale and consortia
and so on, but there’s really -- it
shouldn’t be too big, and it should be kind
of a pilot in some respects, but at the same
time you want to involve enough muscle and
enough expertise to get the job done.
Male Speaker:
So, this is something that came up in the
comparative genomics and evolution break out,
this whole concept of organizing this around,
I don’t know, an ENCODE-type consortium,
where you have technology development which
is aimed at the [unintelligible] genome, where
you have algorithm development which is
about assembly, alignment, algorithms, you
know, element discovery, and where you
also have production of “Let’s do a bunch
of seeds for, you know, all of these programs
to work on.” And these seeds can be a bunch
of different species, a bunch of different
populations, a bunch of different individuals,
which can serve as gold-standard references.
So, when you put all that together as a vision,
I think you will get buy-in from both the technology
people and the computational people, as well as the
references, both species and populations.
Male Speaker:
Just a little point there: if we keep the
genome centers as the data producers, we
have the Bermuda standards, which means data
get out there. They get out there and they
become accessible immediately, which will
then create a whole proliferation of new algorithm
development and software development. I think
there’s a real benefit to not having that
be localized to a group but to be distributed.
Male Speaker:
[inaudible]
Male Speaker:
Oh absolutely, in terms of data -- yeah, yeah.
Male Speaker:
[inaudible]
Male Speaker:
So, the quality is there yeah.
Male Speaker:
[inaudible]
Male Speaker:
David.
Male Speaker:
This is an important issue for the Mendelian
projects as well. I think there’s nothing
more frustrating than to have all the family
reagents in hand and still not solve the
family, and so we’re always wondering whether
it’s somewhere in regions that we can’t
access. And I would also urge, as Evan mentioned
this morning, that it would be great to have
this kind of resource for at least 50 different
individuals from around the world, because,
as the point was made earlier, we get samples
from all over the world. So, having
this kind of gold or platinum genome from
a lot of different populations -- at least
one representative of a lot of different populations
would be quite helpful I think.
Male Speaker:
So, can I transition slightly but build on
a point that was just made. So, I’m going
to refer to what I call the “Bustamante
matrix,” which we saw this morning. So,
what is the right balance? You know, this
raises the whole question: what is the
right balance of a large-scale project
versus a U54/U01-type mechanism versus an R01?
And so, getting the right balance of passionate
PIs whose necks are on the line, versus the
cost efficiency and production standards of
a large-scale center, with the U54s and the U01s
probably somewhere in between -- what’s
the right way to think about it, what’s
the right balance of the portfolio for, again,
thinking about strategy -- or, I’m sorry,
tactics going forward? Adam, you said you
thought it was about 25 to 30 percent R01s
right now?
Male Speaker:
[inaudible]
Male Speaker:
So, what -- and then factor into that the
use of program announcements that can actually
guide the direction. So, what are people’s
thoughts about what the right balance of that
is? Conversation killer.
Male Speaker:
So, I think on another level the NHGRI has
actually been doing this experiment for the
last decade or more, to try to figure out how
to do science at scale by mixing these various
types of things: the large-scale consortium
projects, the directed projects, the R01s.
And given the outputs of NHGRI-funded science,
I think the experiments have been arguably
largely successful, but that doesn’t mean
that what has been done in the past should
guide what’s done in the future.
So I think probably there’s no direct answer
to the question that you asked about what the
right balance here is, but the way forward
is to be responsive and to probably keep experimenting
between large projects and R01s and, as Ewan
described earlier, to know when -- or at least
be comfortable making a bet -- to take something
into a large-scale sort of consortium project
as it gets to the tipping point, so it can
be pushed into that. And obviously it
won’t be perfect, but I think it’s something
to keep experimenting with, and something that NHGRI
has been good at experimenting with and getting
good outcomes from over time.
Male Speaker:
At the risk of being accused of putting the
shoe on the other foot, I agree with the comment
that was just made, but as a follow-on, one
has to measure the outcome, and I’m wondering,
does NHGRI track this -- these experiments --
in a way saying, you know, if I have 20 percent
of this or 30 percent of that, what do I tend
to get out over the next few years?
Male Speaker:
What’s the metric?
Male Speaker:
Yeah.
Male Speaker:
It’s really hard to measure.
Male Speaker:
You have to -- it’s a hard metric but, you
know, having been on the other side of that
equation I’ve been asked to come up with
metrics.
Male Speaker:
And some -- it’s so context dependent. I
can think of a number of times when we’ve
had R01s that came in and were really stretching
the boundaries of what an R01 was, and actually
ended up being a U grant later on, because
that was most appropriate for that scale of
effort.
Male Speaker:
So, something that hasn’t been touched on,
that NHGRI has done very well, and that I want
to make sure to mention, is that NHGRI has provided
strong project support in the form of very
strong program officers who often could steer
and support a project without necessarily
controlling the funds. So take, for example,
the 1000 Genomes Project, which has been
a coalition of the willing with, like, really
almost no dedicated funding and not really a
strong governance model, and yet if you ask
why that project succeeded, there was
a coalition of funders who did support individual
activities, and then there was Lisa Brooks and
the team she worked with, who really were the glue
that held that project together.
And I’ve worked with many institutes and
like, you can have the mechanism that’s
supposedly a very collaborative mechanism
but where the inmates, you know, the sort
of the animals in the cage are clawing at
each other and no zookeeper is actually having
them go in the same direction, even though
the mechanism would suggest they’re all
going to work together. And then I’ve seen
others where no one knows where the funding
is coming from and yet everyone works together
really well, and that’s a testament to
good project leadership. And I think that’s
something that, in my experience -- no offense
to any other institutes around the table --
NHGRI does better than anyone. And so
what is the action item? Value, support,
recruit, and retain, you know, people who
are really strong --
program officers who don’t try to control
and dictate but do actually add value. And
you’ve done that well and you should keep
doing it.
Male Speaker:
Thank you, so I always wanted to be a zookeeper
when I was a kid.
[laughter]
Male Speaker:
Which animal? Which animal, Adam?
Male Speaker:
I won’t say which animal. I agree with you,
David, and actually, although it’s
not an overt consideration, my colleagues
and I do talk from time to time about
models that scale to allow that.
I will also add that although sometimes we
love to take credit for anything good that
happens, in fact the amount of
management things need, and the number of people,
very much depends on the individuals who are
involved and also on the institutes -- the expectations
of the institutes that may be substantial
partners in it. And I think those variables
are actually larger than what you said, but
both are really important, so thank you. Debbie.
Female Speaker:
I want to change topics again, is that okay?
Education, it hasn’t come up. I have to,
have to bring up education: we have to
train more people in genomics. We need to
continue that trend; we probably need to double
what we’re doing in education in genomics
as it goes mainstream, and I don’t see us
going there. I also think we need to change
the people at the table. I see lots of grey
here; I want to see lots of young in the future,
right? More young. It’s younger than it
was the last time we all met, and I have to say
I’m very happy. But I want to see more
young people at the table, too. So, I’ll
just start it there. Education is really important
and I know genome has always been interested
and supportive of it, but I think we have
to be even more supportive in this climate.
Male Speaker:
So, what’s the strategy to fix it?
Female Speaker:
I mean we need to increase training at all
levels. I mean, we don’t have enough training
grants, we don’t have enough slots on training
grants, we don’t have enough anything on
training grants; we’re not diversifying
as much as we should be on training grants.
Where we need to go to do that, I don’t
know, but we try -- I mean, genomics tries harder
than I think most groups do -- but we cannot
give up; we have to move forward and improve
this.
Female Speaker:
I always jump on this bandwagon whenever it
gets brought up, because I think it’s incredibly
important. Most training programs
today focus on bringing physicians into science
and funding their research. But for the reverse
of that -- you know, I have run a training
program in clinical molecular genetics for the
last seven years, to bring people in to do
clinical genomics. I now have applications;
I got 65 for one slot last cycle, and these
are incredibly talented, largely PhDs,
interested in moving into the clinical translational
space. They’re all excited,
young, and energetic, and I can take one or
two of them. And this is the same with the
other, you know, handful of programs across
the U.S. And all the programs complain
there is no source of funding for the PhD
training in the clinical space, the reverse
direction. So, I think this is an incredibly important,
you know, opportunity to harness the young
PhDs that can move into the space if we
can do something right.
Male Speaker:
So, I’ll get David and then I’ll get Jim,
but you know, one of the things this flies
in the face of is that there have been a lot
of really high-profile papers in the last year or so that
have talked about the fact that, you know,
maybe we’re training too many people. And
it may not be that we’re training too many people
in the area of genomics, but that we’re training
too many people in the area of life sciences.
So I just think, as we think about that, we’ve
got to think about this new environment where
this issue has been raised, I think, at the
highest levels of our sort of scientific communities.
So, David?
Male Speaker:
Yeah, I would also like to mention medical
students. We’ve had a lot of disparaging
remarks in the last two days about how physicians
can use -- what they are able to interpret
out of this data that we’re generating.
And going forward, we have to have a much
more sophisticated physician user base, I
would argue. If NHGRI could provide, oh I
don’t know, modules of information related
to clinical genomics that could be used or
co-opted by medical schools to educate their
students, I think that would be highly desirable.
It’s like making an investment
for the long term.
Male Speaker:
So I just want to amplify what Heidi and Debbie
said. I think it’s not just, you know, altruism.
The presence of trainees, and having a critical
mass of trainees, really vitalizes programs,
and I think propels success and tackling new
things. So I agree that as far as bang for
the buck in propelling progress, it’s probably
a really, really good one.
Male Speaker:
Mike?
Michael Boehnke:
And I’d just like to respond to Rex’s
comment. Yes, this comment is always made:
“Are we training too many people in genomics?”
We are not training too many people, and particularly
on the quantitative computational side, these
guys have lots and lots of job options. Many
of them don’t do postdocs. I mean, there
is a huge demand in this area. It is increasing,
it’s not decreasing, and if we look at what
data is going to be coming up over the next
period of time, who’s going to be analyzing
this data? Who’s going to be interpreting
and helping us go forward if we’re not training
enough people? And we’re not, absolutely
not, in the quantitative sciences.
Male Speaker:
I want to add my voice to that chorus and
also raise a concern about the current move
afoot on training grants, which is to use them
as mechanisms that are distributed across
as many institutions as possible. Right? So
there’s this sort of discussion about breaking
up training grants into smaller and smaller
slots, and this is certainly what you’ve
heard at NIGMS, and I just think it’s incredibly
bad thinking.
You know, the places that are most successful
in producing research should be the places
that are doing the -- it should be sort of
proportional to the training, right? Or the
places should come up with training programs
that, you know, are -- that have the right
kind of argument, and I don’t think that
that view is going to produce the best set
of trainees nationally.
Male Speaker:
Mike?
Male Speaker:
I’m sorry, one other comment, and I’m
strongly agreeing with Carlos. My wife has
had training grants with four and six people,
and ours is not huge. Michigan in genome science
is now 13; it’s been 10 -- I’m sorry, eight.
With eight, to 10, to 13, you can start to
build a critical mass; with four or six,
there’s just no way you can
do anything interdisciplinary and have any
kind of critical mass of people. We can’t
get too small; we can’t get too spread out;
we need our programs to be big enough that
these students are interacting not just with
a strong faculty, but a diverse set of fellow
trainees across a range of disciplines. It’s
incredibly important.
Male Speaker:
So what is the impediment for increasing the
amount -- the number of trainees, meaning
training grant slots? I mean, in terms of
the numbers --
Male Speaker:
It’s always money.
Male Speaker:
-- and dollars, it’s not huge amounts of
money in the grand scheme of things. And I’m
actually appalled by this, largely because
this is the first time my wife has been asked
to take on a training grant, and at the same
time has to reduce her number of slots by
half over a five-year window, which I think
is unfair, because we produce -- and others
do -- great trainees, and there’s a demand
for them. So I don’t understand why this
is such an issue, and why NHGRI, if they’ve
been good, couldn’t be better in this regard.
Male Speaker:
Having taken on some of Eichler’s
trainees, I’d like to vouch
for him. I think he does produce some very
good trainees.
[laughter]
Male Speaker:
So I think we’ve had a really good discussion
on a variety of areas -- actually, there are
probably a few more we could have covered, but
we’re running out of time. So are there
sort of any last minute, burning issues that
anyone would like to get on the table? Okay,
great. Thank you, Eric, I think you’re up.
[end of transcript]
