Data science plays a key role in the selection
of influenza vaccines.
What may sound like an excerpt from a sci-fi
novel is, in fact, a real-life application
of modern data science techniques improving
lives today.
In this video, we’ll talk about viruses
and vaccines.
We’ll explore machine learning’s role
in the preparation of influenza vaccines and
the ways to visualize and analyze genome data
using data science techniques.
(These include maximum likelihood (ML) and
different substitution models.)
We’ll also mention platforms where you
can store and analyze gene data or even your
own genome if you’ve got it.
But first things first – let’s see what
viruses are and how they operate!
What are viruses?
Viruses are tiny infectious particles (not
cells) that can cause illness in different
organisms, like birds and mammals, including
humans.
In the case of influenza, there are two distinct
surface proteins, H and N: the virus uses the
H protein to enter host cells and the N protein
to release newly made virus particles so the
infection can spread.
Now, these proteins vary a bit in their structure,
so different versions of them are identified
by a number.
An example of that is the H3N2, which contains
the third variant of the H protein and second
variant of the N protein.
Both H3N2 and H1N1 are called subtypes of
Influenza.
And they’re also the two most common subtypes
to infect humans.
H3N2 is an important example of the flu virus.
Also known as the Hong Kong flu, it caused
a pandemic in 1968, resulting in over a million
deaths worldwide.
The virus was highly contagious and spread
quickly through the population, starting from
Asia and later reaching America, via returning
troops from Vietnam.
By the end of 1969, the virus had reached
parts of Africa and South America, as well.
And if you thought this was bad, hold on to
your hats!
There’s an even more dangerous influenza
subtype: H1N1.
H1N1 was responsible for the swine flu pandemic
of 2009, as well as the devastating Spanish
flu of 1918.
The 1918 pandemic was extremely lethal, resulting
in over 30 million deaths worldwide by some
estimates.
The reasons behind the high mortality of the
virus still remain a mystery.
While some scientists suggest an unusually
aggressive form of the virus was involved,
others claim it was the circumstances surrounding
the infection (overcrowded and unhygienic
camps during the war) that contributed to
the high death toll.
At this point, you’re probably thinking:
“If this virus can be so dangerous or potentially
lethal, how can we protect ourselves against
it?”
The short answer: influenza vaccines, commonly
known as flu shots.
So, what is a vaccine and how does it work?
Nowadays, vaccines can include forms of a
weakened virus, which our immune system can
learn to recognize and deactivate.
In the case of the influenza vaccine, it includes
some forms of H1N1 and H3N2 viruses we talked
about earlier.
Influenza vaccines are formulated annually.
But why does the vaccine need to change
each year?
The answer lies in two phenomena in genetics:
antigenic drift and antigenic shift.
Hold on, wait, what are those?
Let’s start with antigenic drift.
Imagine you have a group of people, stranded
on a raft in the sea.
Over time, the people on the raft slowly change
in appearance: they grow beards, their hair gets
longer, and they get more tanned.
In essence, they remain the same people but
slightly changed.
This is what antigenic drift means - slow
changes over time.
And what about an antigenic shift?
Now, if two people on the raft mix their genomes
(as none of the kids are calling it) and create
a progeny, a.k.a. a child, it will contain
a mixture of both their traits.
So, the antigenic shift is the exchange of
genetic material and the creation of a new
organism.
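If you’d like to see the difference in code, here is a minimal Python
sketch of both processes.
It’s a toy model, not a real simulation: the segment names "H" and "N",
the short RNA strings, and the functions drift and shift are all made up
for illustration.
Drift changes a few letters inside a gene, while shift swaps a whole gene
segment between two viruses.

```python
import random

BASES = "ACGU"  # RNA alphabet

def drift(segment, n_mutations, rng):
    """Toy antigenic drift: change n_mutations random positions to a different base."""
    seq = list(segment)
    for pos in rng.sample(range(len(seq)), n_mutations):
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)

def shift(virus_a, virus_b, segments_from_b):
    """Toy antigenic shift: copy virus_a but take the listed whole segments from virus_b."""
    return {name: (virus_b[name] if name in segments_from_b else seq)
            for name, seq in virus_a.items()}

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical two-segment "viruses" (real influenza genomes are far longer).
virus_a = {"H": "AUGGCUAAGCUU", "N": "AUGAAUCCAAAU"}
virus_b = {"H": "AUGCCGGAUCUA", "N": "AUGGGCUUACCA"}

drifted_h = drift(virus_a["H"], 2, rng)
reassorted = shift(virus_a, virus_b, {"N"})

# Count how many positions of the H segment changed through drift.
changed = sum(x != y for x, y in zip(virus_a["H"], drifted_h))
print(changed)                          # 2
print(reassorted["N"] == virus_b["N"])  # True
```

The drifted gene is still recognizably the same gene, while the
reassorted virus carries a combination of segments that neither parent had.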
Because of antigenic drift, influenza mutates
and changes quickly, making it difficult to
find a vaccine against all possible mutated
viruses.
Antigenic shift also causes the emergence
of new influenza subtypes, such as the H3N2
or H1N1 we talked about earlier.
So, when scientists decide which virus types
to include in the vaccine, they need to think
about how to make it most effective.
And that depends on how closely the vaccine
resembles the types of influenza viruses which
will dominate during the upcoming flu season.
This is where data science comes into play.
Based on existing data about former and current
virus spread and variants, scientists try
to model and predict the future behavior of
viruses, using machine learning algorithms.
To do that, they first need an appropriate
way to handle information about viruses, or
more precisely their genomes.
This is done via analysis of genetic data.
But what’s genetic data, exactly?
Genetic data includes the genome of organisms
or some parts of it.
It usually consists of DNA, represented in
the form of strings.
In the case of influenza, it consists of RNA,
which some viruses use as their genetic material
instead of DNA.
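To give a feel for what "represented in the form of strings" means, here
is a small Python sketch.
The two sequences are short, made-up RNA fragments (not real influenza
genes), and base_counts and p_distance are illustrative helper names.
The sketch counts the bases in a sequence and measures what fraction of
positions differ between two aligned sequences.

```python
from collections import Counter

def base_counts(seq):
    """Count how often each base occurs in a sequence string."""
    return Counter(seq)

def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)

# Hypothetical RNA fragments from two strains.
strain_1 = "AUGGCUAAGC"
strain_2 = "AUGGCAAAGC"

counts = base_counts(strain_1)
distance = p_distance(strain_1, strain_2)
print(dict(counts))
print(distance)  # 0.1 (one of ten positions differs)
```

A pairwise distance like this is the kind of number that tree-building
methods start from.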
Alright!
Once we have our genetic data, it’s time
to decide how to best visualize it.
Though there are many options, we’ll talk
about one in particular:
the staple phylogenetic tree.
Phylogenetic trees, also known as evolutionary
trees, represent the closeness of different
species in terms of their genetics.
Basically, they are a diagram showing the
evolutionary relationships between species.
In the case of influenza, such trees can be
used to visualize different strains of the
virus.
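As a rough illustration of how such a tree can be built, the sketch below
uses UPGMA, a simple average-linkage clustering method.
UPGMA is our choice for the example and isn’t named in this video; the
strain names and distances are invented.
The idea is to repeatedly merge the two closest groups of strains until
only one tree is left.

```python
from itertools import combinations

def upgma(distances):
    """Build a tree by average-linkage clustering (UPGMA).

    distances maps a pair of strain names (a, b) to their distance.
    Returns the tree as nested tuples of strain names.
    """
    # Make the distance lookup symmetric.
    d = {}
    for (a, b), value in distances.items():
        d[(a, b)] = d[(b, a)] = value

    # Each cluster is (subtree, list of member strains).
    names = sorted({name for pair in distances for name in pair})
    clusters = [(name, [name]) for name in names]

    def average_distance(c1, c2):
        total = sum(d[(x, y)] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Find the two closest clusters and merge them.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Hypothetical pairwise distances between four strains.
toy_distances = {
    ("A", "B"): 2, ("C", "D"): 4,
    ("A", "C"): 8, ("A", "D"): 8, ("B", "C"): 8, ("B", "D"): 8,
}
tree = upgma(toy_distances)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

Strains A and B end up as close relatives, as do C and D, and the two
pairs only join at the root.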
Let’s put all of this together and get to
the final point: prediction using data science.
Using information obtained from phylogenetic
trees combined with different machine learning
techniques, you can model future behavior
or spread of the Influenza virus.
One of the methods involves nonnegative least-squares
optimization, which measures distances between
branches of a phylogenetic tree.
It uses a bidirectional weighted phylogenetic
tree and determines sets of coding changes
on the surface of the H protein.
The model can then identify the antigenic
impact of different influenza strains.
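To illustrate the optimization step, here is a hedged Python sketch.
It assumes one common reading of the idea: every branch of the tree gets a
nonnegative weight, and the weights are chosen so that the branches on the
path between two strains add up to their measured distance.
The tree shape, the distance values, and the function name
nnls_coordinate_descent are all made up, and the solver is a plain
coordinate-descent routine rather than the method’s actual implementation.

```python
def nnls_coordinate_descent(A, b, sweeps=2000):
    """Minimise ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = [-value for value in b]  # residual = A x - b, starting from x = 0
    for _ in range(sweeps):
        for j in range(n_cols):
            column = [A[i][j] for i in range(n_rows)]
            norm_sq = sum(c * c for c in column)
            if norm_sq == 0:
                continue
            gradient = sum(c * r for c, r in zip(column, residual))
            # Exact minimiser along coordinate j, clipped at zero.
            new_value = max(0.0, x[j] - gradient / norm_sq)
            delta = new_value - x[j]
            if delta:
                residual = [r + c * delta for r, c in zip(residual, column)]
                x[j] = new_value
    return x

# Hypothetical tree ((A,B),(C,D)) with branches: to A, to B, to C, to D, internal.
# Each row marks which branches lie on the path between one pair of strains.
paths = [
    [1, 1, 0, 0, 0],  # A-B
    [0, 0, 1, 1, 0],  # C-D
    [1, 0, 1, 0, 1],  # A-C
    [1, 0, 0, 1, 1],  # A-D
    [0, 1, 1, 0, 1],  # B-C
    [0, 1, 0, 1, 1],  # B-D
]
measured = [3.0, 4.0, 4.0, 6.0, 5.0, 7.0]  # made-up pairwise distances

weights = nnls_coordinate_descent(paths, measured)
print([round(w, 3) for w in weights])
```

In practice you would likely reach for a library routine such as
scipy.optimize.nnls instead of writing the solver by hand.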
Another way to perform phylogenetic analyses
is to use the PAML package, which contains
programs for phylogenetic analyses of genetic
data using maximum likelihood (ML).
How is it done?
By taking a set of trees and evaluating their
log-likelihood values under different models.
These models hold some parameters fixed while
estimating others from the data.
This way they can incorporate the variety
of gene types in influenza strains and their
surface H protein.
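To show what a log-likelihood value looks like, here is a small Python
sketch.
It does not use PAML; it hand-codes the Jukes-Cantor (JC69) model, one of
the simplest substitution models, for just two aligned sequences.
The sequences are invented, and the function names are illustrative.
The sketch scores several candidate evolutionary distances and confirms
that the maximum likelihood estimate scores best.

```python
import math

def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences under JC69.

    d is the distance in expected substitutions per site (must be > 0).
    """
    n = len(seq_a)
    k = sum(x != y for x, y in zip(seq_a, seq_b))  # number of differing sites
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # a site keeps its base
    p_diff = 0.25 - 0.25 * decay   # a site changes to one specific other base
    # Each starting base has probability 1/4 under JC69.
    return n * math.log(0.25) + (n - k) * math.log(p_same) + k * math.log(p_diff)

def jc69_ml_distance(seq_a, seq_b):
    """Closed-form maximum likelihood distance under JC69."""
    p = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical aligned RNA fragments that differ at 2 of 20 sites.
seq_a = "AUGGCUAAGCUUGGCAUCGA"
seq_b = "AUGGCCAAGCUUGGCGUCGA"

d_hat = jc69_ml_distance(seq_a, seq_b)
for d in (0.05, d_hat, 0.3):
    print(round(d, 4), round(jc69_log_likelihood(seq_a, seq_b, d), 4))
```

Tools like PAML do the same kind of comparison on whole trees and with
richer models, but the principle is the same: the better a model and its
parameters explain the data, the higher the log-likelihood.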
Of course, there are other methods you can
use to make predictions in biology.
Our aim is to provide you with an overview
of two main ones, and we trust you can delve
into and explore other methods on your own
if you find this topic interesting.
And that pretty much brings it to a close.
We went all the way from learning about the
flu and how a virus works, through the biggest
flu pandemics and how vaccines work,
to antigenic shift and drift.
That was fun, right?
We discussed different types of biological
data and their visualization.
Finally, we learned how to make predictions
using different machine learning techniques.
But, before we go, let’s round off with
something about data science and its diverse
applications.
Data science is not just a tool used in the
IT domain or by large corporations.
It plays an important role in the life sciences,
and its medical and biological applications
are becoming more and more widespread.
In fact, big tech companies like Google and
Amazon started their own genome projects recently,
allowing users to store and analyze their
own genome on their respective cloud platforms.
Microsoft entered the field too, with the
release of Microsoft Genomics on their Azure
cloud.
So, if the big players are on it, it’s a
safe bet to assume that genomes and their
analytics using machine learning are definitely
worth looking into.
Ok, guys and gals.
I hope we managed to shed light on influenza
vaccines and the data science behind them.
If you enjoyed the content of our video, please
click the like button and share the story
with your friends!
And, if you’re curious to find out more
on the topic, you can follow the link to the
article in the description.
Thanks for watching!
