Hello, my name is Phillip Compeau, I'm the
Assistant Department Head for Education
in the Computational Biology Department
at Carnegie Mellon University and that's
certainly a mouthful but what it means
is in part that I work as the program
director for our computational biology
major and so I'd like to tell you a
little bit about that major today and a
lot of the fun things that we do with
students and kind of a big picture of
what computational biology is and I
choose the penguin metaphor here
because we're really the first
computational biology major that you can
do in a computer science school and for
that matter we were the second major in
the school of computer science at
Carnegie Mellon you could only major in
computer science for about 30 years here
if you wanted to study computation.
Starting a few years ago we realized
that we wanted to broaden what we offer
in the school of computer science and so
now we have a major in computational
biology - we were the second major that
you could complete and hopefully over
the coming years you'll start to see a
few more majors now that we have
Artificial Intelligence and
Human-Computer Interaction as well. I
borrowed the penguin analogy from
Randy Pausch, who was an instructor at
Carnegie Mellon and sadly passed away
too young but he had a great analogy of
a group of penguins, where one penguin
has to be the first to
jump off the iceberg into the water. So he
used this analogy to inspire his
students and it's an image you can still
see on the CMU campus today. There's a
bridge from the School of Computer Science
to the Drama Department that has a bunch
of panels and each panel has a single
penguin on it in his honor. So we're in a
sense the penguin to jump off and try
the waters to broaden what we do in the
School of Computer Science as well as to
offer a Computational Biology major to
computationally minded students, and
we're still looking
for the other penguins to join us
and jump off as we continue to recruit
students and so I want to talk about
three things today. I want to talk about,
first, what the heck is computational
biology? Since if you're watching this
there's a good chance that you
know a little about computational
biology and I think there's probably a
good chance
that this might be your first
introduction to computational biology so
if that's the case I want to spend a
little bit of time talking about what
are the big picture problems that we
study in computational biology as of
today? So here is a classic problem in
the field where we want to say, “what is
it that your DNA says about you?” or for
the species, “what is it that corresponds
to the DNA of a species?” So, for example,
you can consider lots of medical
problems that have to bear on this
and I'll discuss that very briefly in a
minute but most people don't realize
that this is ultimately a computational
problem so if for example we want to
know what is the DNA in your genome in
each one of your cells what are the
three billion letters of As, Cs, Gs and
Ts that make you a unique organism on
this planet.
Well, how we can do that is we can take a
sample of your cells, we can apply some
technique that will break your DNA up
into small fragments so they may be a
few hundred or a few thousand
nucleotides long, and then there has been
an enormous amount of investment in the
lab techniques that will allow us to
read these short fragments. So we still
can't read a very long fragment of DNA
on the order of millions of nucleotides
but we can read shorter fragments for
very technical reasons and so once we
have those letters -- the As, Cs, Gs and Ts --
once we know those sequencing reads we
want to then use the overlap because we
had multiple original copies of our
genome we want to assemble the genome
using those overlapping fragments so
here you can see that the orange
fragment overlaps heavily with the blue
fragment and it overlaps heavily with
the green fragment and now at the bottom
we can start assembling that genome
shown in black. The critical part is it
shows that just in one slide you see the
balance between laboratory techniques
and computational approaches. This
actually cost three billion dollars just
for the first human genome and
many more billions of dollars have been
spent over the
past two decades in terms of making this
process a lot more high fidelity, i.e.,
much more accurate, faster, and
cheaper as well so we're now at the
point where we can sequence a human
genome for on the order of about a
thousand dollars where it used to cost
three billion dollars so it's this
remarkable moonshot human achievement
and it boils down to being able to take
those overlapping fragments of DNA and
assemble our genome using that
overlapping information. So that's a
computational problem, and if you think
it's an easy problem then you're
unfortunately mistaken, because you have
to handle a lot of noise in the data you
have an enormous data set, you're going
to have potentially hundreds of millions
or over a billion of those sequencing
reads for certain organisms, and you have
to be able to manage this data and
figure out how to put this puzzle back
together in a natural way very
efficiently with a computer. So that is a
classic problem in computational biology
and I can kind of tweak this very
slightly and show you a more modern
problem in computational biology which
is say you do what we have done with our
students -- whether it's our high school
program that we run every summer for our
students or whether it’s students in our
laboratory course that is based off of
quantitative methods in lab biology
where we go out to Pittsburgh's three
rivers and we sample river water from a
lot of different locations in
Pittsburgh's three rivers over different
seasons and in that case when we scoop
out the water the DNA that's present in
that sample is going to correspond to a
lot of microscopic organisms especially
a lot of bacteria. So the word identical
goes away if I then applied this same
technique. What happens if I shatter the
genome into reads? Now it’s genomes, and
what happens then when I sequence those
reads what are the types of analyses
that I then carry out? How can I figure
out what all the organisms are that are
present in that sample? How do I
differentiate that from one sample to
another? Maybe they're at different
points in a river, maybe they're at
different points, different
time points throughout the seasons, and
this problem is far more general than
just river water. You can for example
see the same problem in many
many different clinical studies that are
looking at, what are the microbes that
are present in your body? So this is
another biological fact you might not
know. Probably over half of the cells in
your body are not yours, they're
bacterial. And there are many research
projects going on that are trying to
sample these from different locations in
the body, figure out how the bacteria are
different from one person to the next,
and then try and figure out how this
might be implicated in human diseases as
well so it's a fascinating research
problem that's still very modern and of
course a lot of computational people
will look at biology and say well once
we can figure out the blueprint or
the DNA of a living thing we've solved
biology. It's the source code of our
bodies, we can sequence the DNA, we just
have to figure out what the language of
DNA is. And the biologist then pushes
back a little bit and says, well, there's
a ton of feedback and external
processing, and if you want a good
comparison, look at the source code of
google.com and you ultimately see just a
few years of optimization by the
developers at Google. Well, DNA is a
process of optimization that's been
going on for millions and millions and
millions of years, four billion years, in
every living thing running in parallel,
all right, and so now the computational
person says, well, biology is impossible. I
wouldn't say that it's impossible, but
it's a very exciting very modern
discipline that is waiting for
computation in many cases to come and
continue to revolutionize it because the
revolution is very much ongoing. You
might say, well, why do we sequence
genomes? Well here's a great example, I
mentioned bacteria -- you might say I don't
think bacteria are that cool -- but here's
Darwin's notebook in 1837. It's the first
example of anyone drawing an
evolutionary tree, because he reckoned
that if the species are differentiating
based off of natural selection,
it goes to reason that they would branch
out over time as they differentiate. And
so the natural construct there is an
evolutionary tree. If you look at the
plot on the right, I call
that the second most important
evolutionary tree ever drawn after
Darwin. It's from a 2016 research paper
where they took the same gene in many
species -- and this is a simplified version
of the evolutionary tree -- but you take
the same gene in every species, produce
an evolutionary tree for it, and what you
see is that tiny green sliver down there
in the bottom right, is eukaryotes. That's
everything you have seen that's alive on
this planet, so most of what we think of
as life. You watch a nature show and
everything there is a eukaryote. You
go outside in your garden, it's a
eukaryote. All of us are eukaryotes.
Archaea make up a larger piece of
that tree, but by far the biggest, most
diverse, most interesting piece is bacteria,
and that kind of makes sense really
because they can live in a lot of
environments that we cannot. They can
produce a lot of things like antibiotics
that we don't produce, that we have to
borrow from them. And so you get these
types of revolutionary
rethinkings of biology if you're able to
move towards a data-driven science and
that's what biology is today and then
you might ask, how did they produce this
evolutionary tree? And you could consult
literally a thick textbook of different
algorithms that may need to be applied
in different circumstances and then you
really need to understand that they rely on
very advanced mathematics in some cases
or statistics in order to produce a
reliable evolutionary tree. So there's
computational biology again. You might
say, well, why do I care in terms of on a
human level? This is actually the first
person whose life was ever saved as a
result of genome sequencing -- a boy who
had I think dozens of surgeries.
Doctors couldn't figure out what was
wrong with him. As a result they
sequenced his genome, found that he was
defective for a given gene, and then gave
him essentially gene therapy for the
gene that he was lacking and it made him
better. So they found what was wrong with
his genome, he had a genetic disorder
that was very rare, and they were able to
give him therapy for exactly what he
needed. The hope is that this type of
thing will be much more commonplace in
21st century medicine that you might be
able to use this blueprint if we can
figure out what the source code really
means
as a weapon against disease that you
would have a map of what your future
medical history might be and so you
could use that with your doctor to start
trying to head off diseases that you
wouldn't know might be around the corner.
There's also exciting gene editing
technologies that rely on a lot of
understanding all this and computational
approaches. We're now at the point where
because there's a Dwayne Johnson movie
about gene editing I can assume that
many people have become familiar a
little bit with gene editing, but beyond
the headline-grabbing things like
“designer babies,” you also have very
practical reasons for why you would want
to edit genes. One would be, well, imagine
if we can start editing the genes of
crops to be higher yield, tastier,
healthier, and to be able to get those
crops into a supermarket
around the world for a low cost. We have
ten thousand people in the US who die
every year waiting on an organ
transplant in the hospital. Imagine what
we could do if, for example, you could
unhook a pig heart and hook it up into a
human. It might sound crazy, but that is
functionally possible. So we slaughter
millions of pigs, for example, every year
in the US alone and the heart will
function if you transplant a pig heart
into a human, but the problem is that
there are remnants of viruses that have
attacked the pigs over time and the
human immune system will recognize those
parts of the DNA as invaders and so you
have kind of an immune system response
against the heart. Now, what would happen
though if you just edited out all those
problematic parts of the pig genome and
made it look more human to the human
immune system? That's a very active area
of research, it sounds like science
fiction but a lot of people are hopeful
that it's actually going to be something
that we roll out, with the goal that at
some point in the 21st century no one
needs to die in a hospital waiting on an
organ transplant because they don't have
a donor. One more would be that we can
move beyond the DNA level all right, and
so computational biology has
grown far beyond just the study of DNA
the next thing that we might look at is
being able to understand well how is it
that the genes themselves are turned on
and off at different levels at different
time points with respect to different
stimuli? So, for example, all of your cells
have the same DNA but the genes
themselves get turned on and off like
this big sound board at different levels
depending on whether, for example, it's a
bladder cell or a T cell or an epidermal
cell or an endothelial cell etc., okay, so
you have all these different types of
cells and how is it that your liver
cells and your bladder cells and your
heart cells are different in terms of
how they turn these genes on and off?
Well, some statistical and machine
learning approaches, along with
recent research
technologies that allow us to
measure the sound board levels in a
single cell, are allowing us to
produce beautiful charts like what you
see here on the left. This is from the
Chan Zuckerberg Biohub, so that comes
from Facebook funding via Zuckerberg, where
they were able to again using stats and
machine learning differentiate different
cells in the mouse and see clustering
based off of understanding how these
levels of gene expression are turned up
and down in different cells. That's very
powerful and of course it required a lot
of biological knowledge, too, but it's a
very powerful thing to do because now
you can say, well, what do
those plots look like in a diseased
mouse? How do they compare to
humans? What about a healthy human versus
a diseased human, can we understand that
much better? Because we're still at the
beginning of getting this framework, at
CMU we have relatively recently gotten a
piece of a fifty five million dollar or
fifty four million dollar NIH grant
where we're working on
software for helping biologists in the
lab, after they've produced a
data set, be able to work with that data
set and understand the computational
analysis of it. Drug discovery
is another area that needs computation
very badly. So in computer science you
may have heard of Moore's Law. In drug
discovery over the past several decades
we've had Eroom’s Law, which is
Moore's Law backwards, and so it says
that per dollar spent you are
getting an exponentially decaying number
of drugs. So in terms of
inflation adjusted dollars we used to
spend about a billion dollars and we
would get about a hundred drugs out -- that
was in the 50s -- so if you imagine this
pipeline is putting money in and getting
drugs out on the other end after
clinical trials and regulation, now
we spend a billion dollars and we get on
average one drug or fewer. So we get one
percent for the same investment in
research and development in terms of
drugs falling out the other end of the
pipeline. There's hope that this is
getting, you know, a little bit better
with computational approaches, say, for
example, simulation. I'll show an
animation of ultimately what a lot of
understanding how drugs say interact
with potential protein targets boils
down to. So here's a classic problem in
computational biology where I may have a
protein that is represented as a strand
of amino acids from a twenty character
alphabet and then via some
process this strand forms into a
three-dimensional protein shape. Even if
you untangle the protein right or just
sequence the strand it will essentially
always form into that three-dimensional
shape. So there's a clear problem here
which is to understand how the order of
amino acids somehow determines this
final three-dimensional shape. If we can
understand that, we have a lot more
understanding of how things interact on
the molecular level. We can start
building better and better simulations,
say, for drug discovery and this is a
problem that a lot of people are working
on today. It's also a problem that is not
new. The Russian Academy of Sciences
Protein Institute has been working on
this for over 50 years. So it was founded
in 1967 and this is...
they call it fundamental research on the
protein problem. And so that means
figuring out how a protein is forming
into a three-dimensional shape and how
it’s interacting with its environment.
There's another problem area of
computational biology which is analyzing
medical images. We're now at about 20,000
petabytes of medical data
worldwide and most of it about 90% is
stored in images. So here's a
computational problem for you -- if I give
you a lot of these images, say they're
images of skin cells with lesions on
them, let's see if you can
train a computer to beat a doctor's
diagnosis of those skin cells. So there
are two examples of where this has been
applied in a really cool fashion -- one is
with skin cancer, and the other is with a
condition called diabetic retinopathy
where algorithms have now been developed
that can look at these images and
diagnose the condition of say a
cancerous skin lesion or diabetic
retinopathy better than a doctor. And so
we are hopeful that every condition may
start to see automated diagnosis that
will help us reduce medical costs and
improve patient health. There is one
caveat I should say there, which is that
if you ask one of the best
dermatologists in the US to classify a
skin lesion for example as benign,
cancerous, or growing unexpectedly, but
not necessarily cancerous -- so you have
three classes that you've got to put all
your images into -- a doctor gets that
right about two times out of three.
This is a part of why you have so
many biopsies, because the human eye can
be tricked by skin cancer. The algorithm,
you say, how accurate is it? Well the best,
the state-of-the-art algorithm I think
is currently about 70% accurate so it
does beat a doctor currently but it's
not beating a doctor to such an extent
that we're going to just turn everything
over to a computer. We still have so much
work to do in understanding the
underlying
biology and medicine and applying this
to real-world problems with computation.
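The accuracy comparison above comes down to simple counting, and a small Python sketch can make the numbers concrete. The `accuracy` function and the class label `atypical` (standing in for "growing unexpectedly, but not necessarily cancerous") are hypothetical names, the six labels are invented toy data rather than real diagnoses, and the two rates are just the rough figures quoted in the talk.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of items whose predicted class matches the true class."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Invented toy labels, used only to exercise the function (not real data).
truth     = ["benign", "cancerous", "atypical", "benign", "cancerous", "atypical"]
predicted = ["benign", "cancerous", "benign",   "benign", "atypical",  "atypical"]
toy_accuracy = accuracy(truth, predicted)  # 4 of the 6 labels match

# Rough rates quoted in the talk, not measured here.
doctor_rate = 2 / 3      # "about two times out of three"
algorithm_rate = 0.70    # "about 70% accurate"

# How many more lesions out of 1,000 the algorithm would classify correctly
# if both rates held exactly.
extra_correct_per_1000 = round((algorithm_rate - doctor_rate) * 1000)
print(toy_accuracy, extra_correct_per_1000)
```

Under those quoted rates the gap is only a few dozen lesions per thousand, which fits the point that the algorithm is ahead but not by enough to hand everything over to a computer.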
Cellular images are another area where
you might think everything has been
understood but we are still seeing
revolutionary developments in microscopy.
Here's for example something that two of
our alumni working at the Allen Cell
Institute in Seattle produced. They were
looking at bright field, very simplistic
images of cells. What's nice about these
is even though they're simplistic, when
you're just essentially shining a
light at the cells
and getting this black and white picture,
you don't have to
kill the cells so you're able to see the
cells in their living environment. Now
what are these colors? These colors are a
deep learning model that they trained,
that was able to infer different levels
of organization, whether it's seeing what
the DNA is or seeing where the cell
membrane is in these very simple images,
and they're even able to use this so
they can see this at different layers of
the cell and so this is amazing because
it's more really than what the
human eye can see -- it's bringing this out
and even inferring the 3D shape of these
cells to let researchers see what they
couldn't see before with this type of
microscopy, so it's an amazing
development to be able to see all that
hierarchy from such simple images, and
it's cool because our alums did that. The
next thing I would show you is,
well here's a hot take that somebody
fortunately had in 2017 in a short paper
that they published that I don't have to
propose, I can simply point to somebody
else where they say, well, biology used to
be this very fixed discipline and
everything was experimental and
everything had to be done in the lab, and
there was this random little offshoot
called computational biology, but that is
no longer the case today.
Biology is now fundamentally a data
science, and if you're working in the lab
you're probably generating these very
large data sets whether it's images or
DNA or protein analysis and it would be
very strange for you to not be well
versed in computational approaches for
analyzing it, as well as understanding
that we still have very many unsolved
problems in biology.
It's hard to see how we're going to
solve them without computation. So
it's an unbelievably exciting field to
be in.
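To make the genome assembly problem from the start of this part concrete, here is a toy sketch in Python of stitching reads back together by their overlaps. It is only an illustration under strong assumptions: the reads are error-free, the made-up genome has no repeats, and the function names `overlap` and `greedy_assemble` are hypothetical. The greedy merging shown here is a classroom simplification, not what production assemblers do with hundreds of millions of noisy reads.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the two reads with the largest overlap.

    Returns the longest sequence left once no overlaps remain.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return max(reads, key=len)


# Three overlapping reads from a made-up 15-letter "genome", given out of order.
toy_reads = ["CCTGAAGT", "ATGCGTAC", "GTACCTGA"]
print(greedy_assemble(toy_reads))  # ATGCGTACCTGAAGT
```

The reads can be passed in any order and the toy genome still comes back, because each pair overlaps in exactly one way. Real data breaks every one of those assumptions, with sequencing errors, repeats, and enormous read counts, and that is where the hard part of the problem lives.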
Now that's part one of my talk, and I
love telling students about kind of what
computational biology is because you
know what biology is thought of as and what
the frontier of biological research
actually is. There's such a big
difference there and it's so cool
getting to connect students to that. In
part two I will tell you a little bit
about our computational biology major
and ways of studying computational
biology at CMU.
So the first picture I will show, we've
got these honeycombs, so all of our
majors in the School of Computer
Science have these honeycombs. We
started these honeycombs with the
thinking that, well, computational biology
is sitting at the intersection of a lot
of different things, so we should have a
core of course work, and it should pull
from different cores of study. And some
of these cores of study are going
to be analogous to other courses that
other students at the School of Computer
Science take. So, for example, we need to
have a CS core in our major, and
that's very similar to the CS core
coursework that a CS major or an AI
major or a Human-Computer Interaction or
whatever SCS major would take. We also
have analogous mathematics and
statistics and of course we have the
same humanities requirements too so
that orange hexagon is the
general education where everyone has
certain humanities and arts requirements
as part of their degree to round out
their degree as well. So those three
cores are going to be very similar among
the majors if not identical. Our core is
going to pull coursework from those
areas so if you take a course like
computational genomics it requires our
students to have taken several
mathematics courses in turn including
probability and statistics. Students will
have typically taken one or two biology
courses at that point as well, and
students will be strong programmers and
will have taken all the way up to
Introduction to Machine Learning, and so
that course now is able to assume that
wealth of background from students and
so it's taught at a very high level. It's
a course that you know
with this depth of understanding of
statistics and machine learning and
computation you essentially will not
find at an undergrad level at any other
university, and so we're very proud of
being able to rely on how very strong
our students are mathematically
and computationally and leveraging the
fact that they also take a few biology
courses to understand the fundamentals
of biology that are going into the
problems that we solve. The next question
I get from a lot of people is, could I
major in another SCS major and
computational biology? And so the answer
there is yes. So students feel like they
want to try, you know, every possibility
on the menu of everything that we offer
and that's great, and so we try to find
ways to facilitate that. If you're
outside of the School of Computer Science
it's possible to transfer in, I'll say a
little more about doing a double major
inside of SCS momentarily. But we require
students who are coming from outside SCS
to have an overall we call it QPA, you
can think of it as GPA, a QPA of 3.0 or
better, and then we have six gauntlet
courses of a QPA of at least 3.6.
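As a rough check on what a 3.6 QPA over six courses means, here is a short Python sketch. It assumes a 4.0 scale (A = 4, B = 3, C = 2) and that all six courses carry equal weight; a real calculation would weight by course units, so treat this as an approximation. The names `qpa` and `meets_threshold` are made up for illustration.

```python
# Assumed grade points on a 4.0 scale (an assumption for this sketch).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0}


def qpa(grades):
    """Unweighted average of grade points, assuming equal-unit courses."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


def meets_threshold(grades, threshold=3.6):
    """True if the average grade points reach the threshold."""
    return qpa(grades) >= threshold


four_as_two_bs = ["A", "A", "A", "A", "B", "B"]
three_as_three_bs = ["A", "A", "A", "B", "B", "B"]

print(round(qpa(four_as_two_bs), 2), meets_threshold(four_as_two_bs))
print(round(qpa(three_as_three_bs), 2), meets_threshold(three_as_three_bs))
```

Under these assumptions, four A's and two B's average to about 3.67 and clear the bar, while three A's and three B's average to 3.5 and fall short.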
So here we have a mathematics
course that's taught in the CS
Department, three computer science
courses, a biology course, and then our
introductory computational biology
course and so essentially students need,
you know, four A's and two B's or
equivalent in those courses and these
are very very challenging courses and so
admission from outside of the School of
Computer Science isn't guaranteed. This
transfer is a very high bar
unfortunately because we have high
demand for all of our SCS majors. We do
have an additional major, though, and this
is open to all CMU students. It's also a
good fit for other majors in SCS and
so if you're inside of CMU and
you're an AI major for example and you
also want to do an additional major in
computational biology it's just a matter
of clicking a button and you can
sign up for that. If you're outside the
college, so you're a biology major or
you're in the engineering school or, heck,
you're a music or a drama major, you're
also welcome to do this
program, you just need a 3.0 QPA and a
few prerequisite courses to do that. So
you would keep your original major and
then you would take a few additional
courses or several additional courses in
order to complete our additional major
in computational biology. We also have
kind of lower requirements in terms of
not having to do an entire major, so if
you're outside the School of Computer
Science we have a minor available for
you, and if you're inside the School
of Computer Science we have a
concentration available,
so that's tacked on to your degree so
it's not as much as an additional major
but your degree would show
that you have a concentration in
computational biology, and that's one of
the family of concentrations that we
offer that are open to all of our SCS
students. Both the minor and the
concentration are five courses of
coursework. The next question that people
ask is, well, in terms of the nuts and
bolts, how is it that if I came in and I
wanted to see if I liked computational
biology, try it out, or maybe I know I
don't want to be a major or
concentration or a minor but I at least
want to take a course, what could I do?
And so we suggest that School of Computer Science students
take the same first
semester of coursework -- everybody has the
same wonderful first year advisor and
everybody is viewed as a School of Computer
Science student first. And so you'll have
the same fall coursework, you may place
out of a course or two here or there, and
then there will be replacement courses
so if you place out of a calculus course
then you can take a replacement, more
advanced mathematics course for example.
You may plug in a humanities course or a
science course there in your fall -- we
have very tailored advising, but this is
kind of the standard sequence
of courses we would suggest for students
in the fall. And then we ask all of our
students who are interested in comp bio,
well, take our Great Ideas in Comp Bio
course. It's a second semester i.e. first
year spring course. You won't find a
course like this anywhere else, where we
get students doing computational biology
as first-year undergraduates. That
doesn't exist at other universities and
it explores what are the great ideas
that have made us into
our own discipline? What has
revolutionized biology computationally?
And the things that I talked about
before -- evolutionary tree construction,
assembling genomes, metagenomics where we
try and figure out what are the species
present in a sample that may have
thousands of species and differentiating
RNA or gene soundboard levels between
two individuals -- those are the types of
problems and more we talk about in that
class, and we connect students to the
kind of the big work that has made us
our own field, so that they can get a
flavor of that in their first year. We also
suggest that students take our excellent Modern
Biology course at some point in their
first year. It doesn't have to be before
our comp bio course and so we would
suggest it as a spring course at the
latest, and then all of our students in
the School of Computer Science will declare
a major in March around the time of
spring break. And so regardless of how
you're thinking when you enter,
everybody has a chance to declare
whatever major they like in March around
spring break of the first year. So that's
earlier than at some universities where
major declarations may happen in say a
sophomore year, but at CMU where we're on
the college model, we know our students
are interested in something
computational and so we ask our
students to declare their major in the
spring of their first year. There is some
fluidity of that, so for example we've
had students go from CB to CS and vice
versa and it's not a complicated process
either so you're not locking yourself
into an eternal decision by majoring in
one major or another but typically
students get a good sense of what they
want to do in their first year and
stick to it so we're happy about that. If
you happen to be in the process of
applying currently or are interested in
applying to the School of Computer
Science and Computational Biology, we
have a checkbox that you can select on
the Common Application when you're doing
that application process. And you might
wonder, well, what are we looking for in
terms of our students? Students who
are great fits for us, you might think, do
I have to be an expert coder? And you
don't have to be an expert coder to be
successful in any of our majors in the
School of Computer Science, and we would
back that up as well for computational
biology. So if you have some
computational skills, you've been coding
since you were three or four years
old, fine, I'm sure that's a plus. But what
we're looking for are
students who are very strong in math
quantitatively and students who love
biology. So if you love biology and
you're strong at math and you want to
see a very modern presentation of life
sciences as a data science then we think
you're a great fit for us, even if you
haven't programmed before. You
can take an intro programming course
your first semester here, and about a
third of our students in the School of
Computer Science do and they all do
great.
Now for part three. So I've
talked to you about what the heck is
comp bio, I've talked to you a little
about, you know, what are the logistical
nuts and bolts of our degree? And part
three is what makes us a unique major
outside of the classroom. And the first
thing I would tell you (and there's me
in a pink shirt instead of a blue shirt)
is that we like to have fun with our
students. We have ice cream socials with
our students, and lunches, and cookouts, we
take our students to Kennywood amusement
park for Fright Nights in October, where
there's a Halloween theme so you get
roller coasters open but you also get
scary haunted houses and so on. We do
this in all seasons, so in the
wintertime even when it's cold we'll go
out and make the most of
of Pittsburgh winters and go to a local
park and do snow tubing with our
students. And so we love having fun with
our students first and foremost, we like
to have a relationship with them that's
more than just a classroom relationship.
Another thing that makes us really
unique, and if you're interested
in doing any kind of scientific research
I think this is a big selling point for
us, we guarantee research -- at least one
semester of undergrad research -- to all of
our majors, with the understanding that
some students may do more, say, for
example, if you're interested in doing an
honors thesis. We had so many
prospective students come to us and tell
us, well, how can I do a research
experience? It's going to be really
hard for me to do research with a
professor if I'm accepted to a
university where they may have a
thousand students in a given major, say,
in a Computer Science major, how am I
going to get connected to one of the
comparatively few faculty that are doing
research in an area that I want and actually
stand out? We have worked with our
faculty to make a pledge to all of our
students that everyone is able to do
research for credit. So that is something
we're really proud of. We're also proud
that our students have organized
themselves and formed a
student organization for students at CMU,
not just in the School of Computer
Science but across colleges for any
undergrad student who's interested in
computational biology.
This is a relatively new development at
least at the time of my recording this
and we really look forward to doing a
lot of cool things with them whether
it's, you know, student events or bringing
in external speakers or having kind of
hackathons as part of this student
society. We also love connecting
companies to our students. We
acknowledge that what companies do in
this area can be spread out across the
US. It can be a small startup that's
funded with a bunch of venture capital
funding or it could be part of a huge 50
billion dollar valuation pharmaceutical
company, alright, and so there's a big
breadth there in terms of the
organizations and the companies that are
doing work in the life sciences that's
heavily computational. So we've made huge
efforts as a department to reach out to
companies and get them connected into
what we do at CMU, coming in and having
panels for our students for example.
This was a panel of representatives from
Regeneron who talked to our
students about the kind of
cool scientific work that they do on a
day to day basis. These are the places
that over the past year and a half or so
we've had on campus or talking
remotely to our students about the
scientific work that they do. Some of
these places you might have heard of,
some of them you might
not have, in many cases because we
have a lot of new companies, it's a hot
area of research where we see companies
funded every day. We also love connecting
alums to our students. Even though when
this is being recorded we’re in the midst of a
pandemic, we have next week an alumni
session where alums are going to
remotely connect to our students and talk
about what scientific work they are
doing in their day to day lives. We try
to get a good balance of this -- that's
actually oversubscribed, so we try to do
this at least once per semester, and you
know, it's great for students at
all levels, whether it's an undergrad, a
master's, or a PhD student, to see the
types of things that people are out
there doing in industry or in research
organizations, especially because
it's growing so fast, so there are these
companies that alums wind up at that, you
know, have just started and are
really taking off. We felt so strongly
about this that we wound up building an
entire careers website to house
information on these organizations in
computational biology because no one had
done it, and so if you check out
careers.cbd.cmu.edu and you don't trust me that
there are a lot of different places that
do computational work, well go there and
look at genomics in the Bay Area and see
what comes up. Look at pharmaceuticals
across the US or in New York City, for
example, and see all the organizations
that come up. You can see there
information about educational programs,
whether it's a summer research
experience for undergraduates or whether
it's PhD programs in computational
biology, and you'll find about 250 or
more, because the resource is growing,
companies that do work in this area. We
probably wouldn't have launched this
entire website had we realized at the
start just how many places out there
have really cool, compelling, well-paid
positions for computational biologists.
After all, I always tell students, two
of Alphabet’s companies are comp bio. Yes,
Alphabet is famous for Google, but they
have grown and broadened what it is that
they do and two of those sub-companies
of Alphabet are directly connected to
computational biology. Verily does
healthcare and disease prevention
research, Calico is interested in another
offshoot of life sciences, which is
research into human longevity. This is a
moonshot, just like sequencing all the
letters in a human genome, being able to
read 3 billion letters in this
submicroscopic
text that corresponds to what it is to
be human.
That was a moonshot, that was a dream at
one point in time. Well, we have another
dream now, which is, how is it that we can
figure out what longevity really is? Are
there ways that we can slow it or
reverse it? And imagine how, you know, this
could shape research in the 21st century
or beyond in terms of figuring out the
key to human longevity. And I always tell
students, too, Google built a search engine,
and look at how valuable that was to be
able to quickly type in a query and get
a piece of information back, but how
valuable would it be to have a
technology that could increase human
health by on average 10 years? If
you're a student, how much would
your parents have paid for such a thing
when you were born in order to ensure
that you have a longer, healthier life?
All right, so there are a lot of
interesting parts of biology, and
everywhere you look in biology there's
computation. But I always tell students
you don't have to take my word for it.
I've got two very prominent people here
and grabbed quotes from them. Francis
Collins is the NIH director and was
integral in sequencing the first human
genome and they asked him, well, if you
were starting out in college today what
would you do? And Francis said, if I was a
senior in college, I wouldn't go the
route of being a lab biologist, which is
the route that he went. He would just
start with computation. He would start
his training as a computational
biologist and go that way. And they asked
Bill Gates, the Microsoft co-founder, what
would you do? And Bill said, well, energy
is one thing I'd be interested in, AI is
another thing, so there's a pointer at
our wonderful AI major, and the other, the
third thing is biosciences -- promising
fields where there is a lot of
computation involved and you can make a
huge impact. We're in many senses where I
feel like computer science was at a much
earlier age where there were these
exciting problems and people didn't even
realize all the exciting problems that
were there. So that's my talk, and
wherever you are,
I hope that this has been a little
informative, if you've managed to make
it to the end. I'm very easy to find,
so I would encourage you, if
you have any questions, don't hesitate to
reach out to me. I'd be happy to start a
conversation with you,
and otherwise I wish you the best of
luck in whatever you do.
