Artificial intelligence in drug discovery
is a relatively new field.
It's a very important field.
Today, we're speaking with one of the most
prominent voices in AI and drug discovery.
I'm Michael Krigsman.
I'm an industry analyst.
Thank you so much for watching.
Before we begin, please subscribe on YouTube
and subscribe to our newsletter.
You can do that right now.
Alex Zhavoronkov, he is the CEO of Insilico.
Tell us briefly about Insilico Medicine and
tell us the things that you're working on.
We are focused primarily on applying next-gen
AI techniques to drug discovery, biomarker
development, and also aging research.
We focus specifically on two machine learning
techniques.
It's generative adversarial networks and reinforcement
learning.
Those are the techniques in which we have the
most expertise in our field.
We use those techniques for two purposes.
One is identifying biological targets and
constructing biomarkers from multiple data
types; the other is generating new molecules, new
molecular structures with a specific set of
properties.
We were one of the first companies, possibly
the first one, to generate new molecules using
this new technique called generative adversarial
networks--it's kind of AI imagination--and
validate those molecules experimentally.
Give us some context.
What is the drug development pipeline?
Why is it so hard?
Let's talk about that.
Then we can shift to how AI makes that better,
makes it easier.
Drug discovery and drug development is a very
lengthy process.
It's also one of those processes where you've
got more failures than successes.
Actually, many more failures than successes.
It takes more than $2.6 billion to develop
a drug and bring it to the market to address
a specific disease.
That's after the molecule has been tested
in animals.
Also, there is a 92% failure rate after the
molecule has been tested in animals.
When it goes into humans, it fails 92% of
the time.
So, the process is not only lengthy, but also
risky.
Usually, the time it takes to discover and
develop a molecule is around a decade.
People who initiate the process are not always
there when the molecule launches.
The process is comprised of several steps.
The first one is hypothesis generation.
You come up with a hypothesis, a theory of
a certain disease and identify relevant targets.
You theorize about what kind of proteins are
implicated in a disease condition and what
proteins are causal.
Afterward, you go and develop either an antibody
or a small molecule for this protein target.
If you are developing a small molecule, you
usually start with screening large libraries
of compounds that might hit this particular
target and do all kinds of experiments to
see how well those small molecules bind to
this target.
Afterward, you select several hits.
You identify what kind of molecules fit best
for this protein target and start doing all
kinds of experiments on those molecules to
see if they work very well in the biological
system, in the disease-relevant assay, in
a mouse, in a dog, or other animals, and then
you file for IND with the FDA to get the molecule
into clinical trials.
After that process is complete, we are getting
into drug development and starting clinical
trials.
It starts with phase I, which is safety; phase
II, you test for efficacy; and phase III,
you test for both in a larger clinical setting,
in a larger population.
Then you might want to go for a phase IV or
start launching the product.
Mm-hmm.
And then, post-marketing research.
That process takes more than ten years, usually,
and fails 92% of the time.
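As a rough, back-of-the-envelope illustration of what the 92% failure rate quoted above implies, here is a minimal Python sketch. The function names are hypothetical, and the assumption that every candidate fails independently with the same probability is a simplification added for illustration, not something stated in the interview.

```python
def candidates_per_approval(failure_rate: float) -> float:
    """Expected number of clinical candidates needed for one approval.

    Assumes (for illustration only) that each candidate succeeds
    independently with probability 1 - failure_rate.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


def prob_at_least_one_success(failure_rate: float, n_candidates: int) -> float:
    """Probability that at least one of n independent candidates succeeds."""
    if n_candidates < 0:
        raise ValueError("n_candidates must be non-negative")
    return 1.0 - failure_rate ** n_candidates


if __name__ == "__main__":
    # With a 92% failure rate, roughly 12.5 candidates are needed per approval.
    print(round(candidates_per_approval(0.92), 1))
    # Chance that a portfolio of 10 candidates yields at least one approval.
    print(round(prob_at_least_one_success(0.92, 10), 2))
```

Under these simplifying assumptions, a company would need to take about a dozen molecules into humans to expect a single approval, which is one way to read the "more failures than successes" remark.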
With AI, you can really play in pretty much
every segment from early-stage drug discovery
where AI can assist you with a hypothesis
model and, essentially, pulling out the needles
from the haystack with a target ID, with small
molecule identification, with virtual screening,
with generation of novel molecules with specific
properties, with planning your clinical trial
design, and with enrollment in the clinical trial.
And then, also, for predicting the outcomes
of clinical trials.
Where does AI begin to shorten that process,
make that process better?
If you go to the very early steps of the pipeline
and start working on the hypothesis generation
and target identification, usually you have
multiple kinds of paths to pursue.
One path is to look at the literature and
identify promising areas that had been uncovered
by scientists in the past and were published
in peer-reviewed literature.
Ideally, these targets, those hypotheses were
not implicated in the disease that you are
looking at by somebody else.
AI can help you mine massive amounts of literature
and also other associated data types to identify
signals that a certain target might be implicated
in the disease.
We, at Insilico, usually start with grants
data.
We look at biomedical grants; we monitor
about $1.7 trillion worth of grant money over
the past 25 years.
Then we look at how those grants progress
into publications, into patents, into clinical
trials, and then into products on the market.
We follow the path from idea and money back to
money, that is, to money on the market.
We also look at how money becomes data.
So, usually, when the government is supporting
a certain study, the data needs to be deposited
in a public repository for other people to
replicate it and also for the common good.
We try to follow the money into data.
If the data is not there, we try to contact
the scientist and get the data from the scientist
and/or to encourage the scientist to put the
data into the public repository.
We start with text databases, but also link
this data to omics data.
Basically, everything that ends with "omics"
is called omics data, so transcriptomics,
genomics, metabolomics, you name it, so metagenomics.
We work primarily with gene expression data,
so we look at how the level of expression
of certain genes or entire networks changes
from, let's say, a healthy state to disease.
We deconvolute those changes, those signatures
of disease into individual targets, especially
causality models, and identify what kind of
proteins could be targeted with a small molecule.
Then we go back into the prior art in the
text and see if anybody has published anything
that strengthens our hypothesis.
It doesn't necessarily mean that our hypothesis
is wrong if the signal is not there in text
because sometimes the humans just couldn't
really associate a certain target with a disease
using older methods, but it gives us a little
bit more confidence to see that somebody already
touched on this challenge and on this target
before.
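To make the idea of comparing gene expression between a healthy state and a disease state concrete, here is a minimal Python sketch that ranks genes by log2 fold change. It is a deliberately simple stand-in, not the deconvolution or causality modeling described in the interview; the gene names, the toy values, the pseudocount, and the function names are all hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(healthy, disease, pseudocount=1.0):
    """log2 ratio of mean disease expression to mean healthy expression.

    The pseudocount (an illustrative choice) avoids division by zero
    for genes with no measured expression.
    """
    return math.log2((mean(disease) + pseudocount) / (mean(healthy) + pseudocount))


def rank_genes_by_change(expression):
    """Sort genes by absolute log2 fold change, largest change first.

    `expression` maps a gene name to (healthy_values, disease_values).
    """
    changes = {gene: log2_fold_change(h, d) for gene, (h, d) in expression.items()}
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy, made-up expression values purely for illustration.
toy_expression = {
    "GENE_A": ([7.0, 7.0], [31.0, 31.0]),    # strongly up in disease
    "GENE_B": ([15.0, 15.0], [15.0, 15.0]),  # unchanged
    "GENE_C": ([15.0, 15.0], [7.0, 7.0]),    # down in disease
}

if __name__ == "__main__":
    for gene, change in rank_genes_by_change(toy_expression):
        print(gene, change)
```

Genes at the top of such a ranking are only candidates; as described above, they would still need to be checked against prior literature and validated experimentally.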
Alex, is the key then at this point that the
various AI techniques that you're using enable
you to discern patterns in the data, those
signals, as you said, that otherwise you could
not pick out?
Is that the key issue here?
Yes, but, really, we are aggregating enormous
amounts of data that it is just not possible
to process using human intelligence.
We are also aggregating and grooming those
data types together.
Sometimes, those data types are completely
incompatible and it's impossible to just suture
them together using standard tools.
You really need to train deep neural networks
on several data types at the same time in
order for them to generalize and in order
for us to be able to extract relevant features
that are present in several data types at
the same time.
Some of the data types that we work with are
completely incomprehensible to the human mind,
to human intelligence.
Like, for example, gene expression or movement
or cardiovascular activity scanning or ultrasound,
for example.
We manage to bring those data types together
using AI and then identify relevant targets
that basically trigger a certain condition.
At Insilico, is your core competence in biology
and medicine or in developing the AI techniques?
Is it possible to even split those two?
In our case, we are good at both and we hire
competitively, internationally.
We actually hire through competitions where
we put very challenging tests out in order
for people to try and solve them very, very
quickly.
Those challenges are usually a combination
of developing an AI method plus solving a
complex biological or chemical problem.
However, when you're looking at really great
AI scientists, they are usually not great
in biology or great in chemistry.
They are good at math.
That is why some percentage of our company
are just great mathematicians who are developing
novel methods for bridging chemistry and biology
using deep learning, for example.
Part of the company is specifically focused
on applications of already existing techniques
like GANs and reinforcement learning to existing
problems in chemistry and biology.
Those people are usually on the applied side
and they know both chemistry and biology.
They can talk to the mathematicians and they
can do some basic research in AI as well.
Of course, we just have pure play biologists
and chemists who are also necessary in order
to validate some of the results of our AI.
That's why we have such a large, diverse,
and international team because you really
need to have those three areas covered: the
methods, the applications, and the validation.
We have an interesting question from Chris
Peterson on Twitter who says this; he says,
"Grid-based parallel Fortran programs are
still being used for some pharmacokinetic
and pharmacodynamic studies.
Do you see AI replacing the old school code,
enhancing it, or advancing in parallel?"
I think, currently, we need to advance in
parallel.
Of course, some of the old techniques and
some of the very primitive molecular dynamics are
still being used by really top experts in
drug discovery today.
But most of those methods are being significantly
accelerated by high-performance computing
and AI, so typical software that's been around
for a very long time, like Schrödinger, for
example.
The company has been around since '92.
These guys have made major breakthroughs in multiple
areas and kind of managed to advance older
algorithms to solve very complex problems.
I think that at Insilico, we try to reinvent
everything from scratch and we write our own
software.
But, of course, we know many of our collaborators
who would just like to take small pieces of
our big salami that we're developing and play
around with it today.
They might be using some more classical tools
that we cannot get around today.
Ideally, you need to have a seamless pipeline,
which identifies the targets, generates the
molecules, and runs those molecules through
a large number of simulations in one seamless
pipeline.
That's what we are building and that's our
holy grail.
But, of course, many companies, many groups
are trying to do the Lego game and try to
use multiple tools with varying outputs to
solve the same problem.
Why do you develop your own tools?
Yes, just because many of the methods that
we are using are so new that they are incompatible
with the older tools.
There are many groups that claim to do AI
but, essentially, what they are doing is
mechanic jobs: taking off-the-shelf software
and trying to bridge some gaps in pharma R&D
using those tools.
We don't do that.
We develop everything from scratch, so from
target ID to small molecule generation.
Now, we have spoken about using your techniques
to uncover potential candidates.
The next step is evaluating.
First, we have to uncover possibilities, and
you do that by aggregating all of this data
and then mining that data using the various
techniques.
Now you've done that.
How do you evaluate the candidates that you've
uncovered initially?
Usually, when you are left with a list of
protein targets for a specific disease and
you are trying to prioritize, you try to annotate
those proteins with as many scores as possible.
You are looking at whether this protein target
has ever been implicated in toxicity.
How is it connected with everything else?
In which tissue does it play a bigger role?
How does it interact with other proteins?
Is it druggable?
Is it druggable with a small molecule or with
an antibody?
Did anybody else touch it?
What is the patent space around the molecule?
Has anybody tried taking it into the clinic
with a small molecule or an antibody for a
specific disease?
There are many, many, many, many scoring functions
that you need to consider.
At the end, when you basically are left with
a very small set of targets, then you also
test them in a variety of biological systems
to see which one is more relevant for your
disease of interest.
I'll give you an example case study.
For example, we are very interested in fibrosis.
Fibrosis is not a very simple process to describe
and there are multiple types of fibrosis.
There is IPF, so idiopathic pulmonary fibrosis.
There is smoking-induced fibrosis in the lung.
There is aging-induced fibrosis in the lung.
We've identified more than 120 types of fibrosis
by comparing normal tissue to tissue afflicted
by a certain condition that is associated
with fibrosis.
We just recently did a case study where we
looked at the IPF, so pulmonary fibrosis,
identified the list of targets for this condition,
and our list was 50 targets.
We looked at when those targets are more active
and more disease-relevant, that is, at what stage of
the disease, because I think, if you kind of
catch it later or address it later when there
are just so many symptoms, you are going to
be treating the symptoms, not the cause.
In our case, we've identified a large list
of targets that are likely to be very relevant
early in the disease progression.
Then we looked at what targets are novel,
so we looked for novelty, so what targets
people did not focus on as much.
We don't want to focus on old targets.
Then we looked at what targets are druggable,
so where we could actually come up with a
small molecule from within the library or
we can generate a molecule from scratch.
Then we looked at what targets could be validated
in a specific set of assays for fibrosis.
Where is the impact of the AI techniques that
you're using in this?
Usually, it's for scoring.
You identify multiple scores for those targets.
In our case, the target is annotated with
more than 50 scores.
Whether it has been implicated in a certain
condition before, whether it interacts with
other proteins in a specific way, whether
it is likely to lead to toxicity.
The predictors that basically give you this
kind of score and probability that this target
is the most relevant one are built with
deep learning.
We developed them using machine learning.
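As an illustration of how many per-target scores might be combined into a single priority ranking, here is a minimal Python sketch. It is not Insilico's method: in the interview the scores come from trained predictors, whereas here they are hard-coded placeholders, and the target names, score names, weights, and the weighted-average rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """A candidate protein target with named scores in [0, 1], higher is better."""
    name: str
    scores: dict = field(default_factory=dict)


def priority(target: Target, weights: dict) -> float:
    """Weighted average of the scores listed in `weights`.

    A score missing from the target counts as 0.0 (a conservative,
    illustrative choice).
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(w * target.scores.get(key, 0.0) for key, w in weights.items())
    return weighted / total_weight


def rank_targets(targets, weights):
    """Return targets sorted from highest to lowest priority."""
    return sorted(targets, key=lambda t: priority(t, weights), reverse=True)


# Hypothetical weights and placeholder scores, for illustration only.
weights = {"novelty": 1.0, "druggability": 2.0, "safety": 2.0, "early_stage": 1.0}
targets = [
    Target("TARGET_X", {"novelty": 0.9, "druggability": 0.8, "safety": 0.7, "early_stage": 0.6}),
    Target("TARGET_Y", {"novelty": 0.2, "druggability": 0.9, "safety": 0.9, "early_stage": 0.5}),
    Target("TARGET_Z", {"novelty": 1.0, "druggability": 0.3, "safety": 0.4, "early_stage": 0.9}),
]

if __name__ == "__main__":
    for t in rank_targets(targets, weights):
        print(t.name, round(priority(t, weights), 3))
```

A weighted average is only one possible aggregation rule; hard filters (for example, dropping any target below a safety threshold) could be applied before ranking.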
We have another interesting question from
Twitter.
This is from Shreya Amin.
She says, "How does this type of research
that you've been describing using AI and the
process compare between academia and industry?"
Sure.
It's a very, very good question.
In the industry, in big pharma, people are
a little bit less adventurous.
They are trying to develop the various techniques
to really solve a problem and make incremental
changes.
It's not for publication purposes.
In academia, people are much more innovative
and adventurous.
Of course, they try to publish.
That's where the innovation comes from primarily.
We, at Insilico, we sit in between academia
and industry, so we publish at the rate of
about two research papers a month.
That is a lot, even for some of the academic
groups. We do it to prove the concept and
explain where we're going.
Academics, I think, are much more productive
nowadays, when it comes to developing new
methods and showing new directions.
However, there is a disconnect: really good
computer scientists that are developing novel
techniques that might be relevant for drug
discovery very often are so far away
from biology and chemistry that they put the
papers out and the paper is really good from the
machine learning perspective, but it's really,
really poor for real-world applications.
Very often, they realize that they overfitted
somewhere, or that they are getting a completely
irrelevant output, or input, only after
somebody tries it in biology
and chemistry.
Very often, and nowadays it's actually more
prevalent, a lot of people put papers on arXiv,
so in a repository, with a catchy title so
it goes viral and gets picked up by the browsers,
by Google, or by some news outlets.
They get recognition and PR for this work,
but then you try to replicate what they did
or even just read the paper carefully, and
you realize that it's not going to work in
the real world.
I think those kinds of papers and those kinds
of efforts, early efforts, by academic groups
specifically, without going through peer
review, also create a lot of skepticism in big
pharma.
People just don't think that many techniques
are relevant, applicable, or transformative
for their business.
Let's talk about the team construction aspect
because one of the things that you've mentioned
a couple of times is the importance of both
the machine learning capabilities as well
as the biology capabilities.
These are very specialized skills, and so
how do you construct teams that enable both
sides to work together and create something
that one or the other could not do alone?
That's another very good question.
In our case, that's one of the reasons why
we are growing so slowly.
We've been in business for 5 years now, but
we are still 66 people.
One of the reasons for this slow, organic
growth is because it takes time to really
integrate the AI scientists with biologists
and chemists.
It's very difficult to find people who are
good at both at the same time.
Usually, you are good at math or you are good
in chemistry or you really need to have some
good programming skills to be able to do an
API and properly combine your technology with
somebody else's.
We try to work in teams of three or four on
specific therapeutic projects where one person
is very good in chemistry or biology, one
person is good in AI, and another person is
good in just basic IT.
It's basically teams of three or four people.
On top of them, there is an infrastructure,
an organizational infrastructure that helps
manage those teams.
We also separated the pure play AI team from
everybody else, so they could work on the
methods without being brought into the applied
domain.
Getting this kind of talent who are willing
to really contribute to methods development
and develop novel algorithms, that is very,
very difficult.
Getting people who are good in application
of already developed methods, that is rather
easy.
Getting the two to work together, that is
very hard.
To do this we, again, try to pursue organic
growth and work on projects in small teams.
In fact, we have a question from Twitter on
this subject of your business model.
Chris Peterson is asking great questions.
Thanks so much, Chris.
He's asking, "Are you contracted to look for
specific therapies or are you developing molecules
from scratch and hoping to license them for
clinical trials through distribution?"
We've been in business for five years and
we have explored multiple business models.
As an AI company, you have to explore because
otherwise it's very, very difficult to scale
on one business model and it's also quite
risky.
We started as a service company, and we started
partnering with pharmaceutical companies,
with biotechnology companies and, also, venture
funds where we provided a service or provided
a system to them.
We learned the applications that people are
looking for and started developing our own
small molecules, discovering our own small
molecules and then licensing them.
Our current business model is actually very
simple and actually allows us to scale.
We work with venture capital firms that really
know the business of biotechnology and are
pursuing drug development and drug discovery.
They guide us into where we need to identify
targets and generate small molecules.
Then they form teams around those small molecules
and targets and let them do a little
bit more validation and development of those
target-molecule associations.
What we get, we get a small upfront payment
initially and then we get milestone payments
as the molecules progress through the various
steps of validation.
Then we get some royalties.
Usually, if you consider the biobucks, or the
future revenues that might come from the molecule,
those deals are very, very substantial, but
initial payment is rather small.
That is why we have another business that
is a software licensing business where we
license some of our software tools to others
to generate some revenue and ensure that we
are sustainable, consistent, and also get
some feedback on how well the software works
and whether we need to add more features.
Okay.
Another business model is that we do have
some joint ventures.
For example, a joint venture with a company
called Juvenescence.
They are developing the molecules that we
provide to them.
Okay, so you have a diverse range of things
that you're working on and trying that support
your business model efforts, essentially.
Correct.
But what we are mostly interested in is not
the immediate revenue.
In most of those licensing arrangements and
engagements, we get some data back.
We pretty much became one of the largest data
factories in the world, getting data back
from preclinical experiments.
That's interesting.
We have another question from Twitter.
This is from TrovatoChristian.
He is a biomedical engineer and he is a Ph.D.
student in computational biology in the Department
of Computer Science at Oxford.
By the way, I find it very interesting that
computational biology falls under the Department
of Computer Science rather than the Department
of Biology.
His question is, "Are there any examples of
drugs developed by AI only?"
At this point in time, there is no such example.
You always have a human in between.
I hope that in the very near future, we'll
be able to show that the pipeline where no
human was involved from target identification
to small molecule generation might be able
to churn out some of those molecules, some of
those promising molecules.
But at this point in time, the experiment
is king.
So, unless you can validate your techniques
experimentally, it won't really go forward.
I have never seen an example of a molecule,
even in mice at this point in time, that is
completely generated using AI.
What's the obstacle preventing using AI to
go from beginning to end?
Well, because of the failure rates in pharma,
in general.
There are very, very few success stories to
train on.
Those success stories are very, very diverse.
In some areas, it's easy to validate whether
your algorithm is producing some meaningful
output.
But, in many cases, you really need to go
and validate at every step of the way.
That is why, when you are building this salami
that is allowing you to go end-to-end, you
need to ensure that you validate every slice
of the salami and validate it internally,
but also validate it with external partners.
That's what we are trying to do as well.
Eventually, that data may be there, but it
sounds like it's just far too early at this
stage.
At this stage, nobody has tried to virtualize
drug discovery completely using AI and do
it seamlessly without human intervention.
In many areas, it's actually not possible
just because biology is so diverse and medicine
is so diverse that it's very, very difficult
to have a solution that would fit all.
That's why people are going primarily after
cancer, just because it's a little bit easier
to validate, and specific types of cancer,
like, for example, solid tumors where you can
do a xenograft and see if the tumor shrinks
in a mouse if you feed it, if you give it
a specific molecule.
There needs to be validation at every step
of the way and, at this point in time, those
end-to-end pipelines will work only in certain
therapeutic modalities.
Let me ask you another question from Twitter.
This is from Shreya Amin again, a great question,
an interesting one.
She says this; she says, "Using existing AI
techniques, which areas from the perspective
of types of drugs, diseases, conditions, and
so forth are closest to breakthroughs or have
made the most progress and what's most difficult?"
I'll give you an example that I am very, very
familiar with.
We've got some JAK inhibitors, so Janus kinase
inhibitors that are developed completely using
generative adversarial networks and reinforcement
learning.
I think those are kind of the most promising
techniques for de novo molecular design - period.
We're currently in mice with those, so we went
all the way from enzymatic assays to mice,
and showed that we can now achieve selectivity,
specificity with those molecules, and those
molecules have many other properties.
Those are pretty common techniques nowadays,
both the GAN that we used and the reinforcement
learning technique that we used.
It's not something super new, so we actually
switched our R&D in a slightly different direction.
Where is all of this going over the next--I
don't know--three, four years, two to four
years?
Let's not go out ten years.
Over the next few years, where is this going
to be?
I think that companies like ours are going
to put much more emphasis on their internal
R&D instead of collaborating with big pharma
because collaborating with big pharma is usually
a path to nowhere because it's either death
by pilot or they just ingest this expertise
internally and catch up.
But, at the same time, they are so bureaucratic
that it's very difficult to change and, at
the same time, at the CEO level, big pharma
companies are more focused on increasing sales
or buying other companies to increase sales
or to get late-stage clinical assets, so phase
two, phase three assets.
The internal R&D is actually not being viewed
as a huge priority and, regardless of what
they think, it's a fact.
Usually, it's kind of the 15% to 20% on the
income statement that needs to be there because,
otherwise, the investors are not going to
invest in the company.
But the productivity of this internal R&D
is usually very low.
I think that smaller biotechnology companies
that embrace AI and embrace virtualization
of drug discovery, they are going to be very
successful.
There are several cases that I admire in the
industry, like for example Nimbus Therapeutics.
These guys have managed to virtualize the entire
drug discovery and development process and
get some phase two assets to market and license
them.
As AI improves and starts solving more
problems in the pharmaceutical R&D pipeline,
so from hypothesis generation, target ID,
small molecule generation, prediction of the
various properties of the molecule in clinical
trials, and better stratification techniques,
I think that people who really understand
the process and can virtualize it will be
the winners.
So far, I know several companies that are
doing this, so some companies are working
with us.
Some are in stealth mode.
I think they are going to be winners going
forward.
When you talk about drug discovery in two
to three years, it's actually a very, very
short time.
In many other areas of human development,
if you ask me to plan five years ahead, I
won't be able to because things are changing
very quickly.
In pharma, that's not the case.
We really need to do the experiments and get
things right.
Do you want to just very briefly tell us about
the last research you did on either longevity
or smoking?
I know we're out of time, but just very briefly.
[Laughter] Sure.
We just published a very fun paper showing
that smoking accelerates aging.
One of the areas that we are focusing on is
age prediction using multiple data types,
so from pictures, blood tests, transcriptomic
data, proteomic data, microbiomic data.
We use this data to predict the person's age
reasonably accurately and we then look at
what kind of interventions or behavioral modifications,
what kind of lifestyles contribute to that
person looking younger or older.
We did this exercise in Canada.
We worked with the University of Lethbridge
and the government of Alberta to process a
large data set of smokers and nonsmokers of
varying ages looking only at anonymized blood
tests, just very, very few parameters from
a recent blood test.
First of all, we built a predictor of the
smoking status, so now I can, with reasonable
confidence, say whether you're smoking or
not by looking at a blood test but, also,
we showed that people who smoke, they look
older to the deep neural net trained on their
blood tests than nonsmokers.
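The comparison described here, where a model's predicted age is set against a person's actual age, can be sketched as an "age gap" averaged per group. The minimal Python sketch below assumes the predicted ages already exist (it does not train the deep neural net mentioned in the interview); the records are made-up numbers for illustration, not results from the study, and the function names are hypothetical.

```python
from statistics import mean


def age_gap(predicted_age: float, chronological_age: float) -> float:
    """Positive gap means the model sees the person as older than they are."""
    return predicted_age - chronological_age


def mean_gap_by_group(records):
    """Average age gap per group.

    `records` is an iterable of (group, chronological_age, predicted_age).
    """
    gaps = {}
    for group, chronological, predicted in records:
        gaps.setdefault(group, []).append(age_gap(predicted, chronological))
    return {group: mean(values) for group, values in gaps.items()}


# Made-up records purely for illustration; not data from the study.
toy_records = [
    ("smoker", 40, 46),
    ("smoker", 55, 58),
    ("nonsmoker", 40, 39),
    ("nonsmoker", 55, 56),
]

if __name__ == "__main__":
    print(mean_gap_by_group(toy_records))
```

In this toy example the smoker group has a larger average gap than the nonsmoker group, which is the shape of the finding described in the interview; a real analysis would also need to test whether the difference is statistically meaningful.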
Once we published, it actually went rather
viral and we got very positive feedback.
For example, my daughter is considering quitting
smoking just because she doesn't want to look
old.
People don't really care about their health,
but they really care about how they look.
If you don't want to look old, just quit smoking.
[Laughter] Okay.
Great advice.
Alex, thank you so much for taking time.
Everybody, please subscribe on YouTube.
Check out CXOTalk.com for lots of videos and
subscribe to our newsletter.
Have a great day, everybody.
Take care.
Bye-bye.
