Hello, my name is Britt Glaunsinger.
I'm a virologist and a professor at the University of California, Berkeley and
an investigator of the Howard Hughes Medical Institute.
And what I'm going to be doing is presenting a lecture on the fundamental molecular virology of coronaviruses.
These are viruses that have been circulating in the human population and in animals for a long long time.
We know of seven human coronaviruses.
These are present in two of the four known
genera of coronaviruses, the alpha coronaviruses and the beta coronaviruses.
The four circulating strains of human coronavirus are shown boxed in red here.
There are two that we've known about for a long time called 229E and OC43.
The other two, NL63 and HKU1,
were actually discovered more recently, after the SARS epidemic, but are also thought to have long been
circulating in the human population rather than recently emerging.
These four circulating viruses are some of the causes of the common cold.
Probably 10 to 15 percent of common colds are caused by these viruses.
We also know of three coronaviruses, which have recently emerged into the human population
through species jumping or zoonotic transfer, and these, of course, are the original SARS coronavirus,
MERS coronavirus, and the newly emerged SARS coronavirus 2, which is the cause of COVID-19.
Each of these, like other alpha and beta coronaviruses, is thought to have a common ancestor in bat viruses.
And this is different from the gamma and the delta coronavirus genera,
which have common ancestors in birds.
SARS coronavirus and MERS coronavirus, as I mentioned, probably came from bats,
but it's thought that rather than directly jumping into humans from bats,
they entered into the human population first through one or more intermediate animal hosts.
For SARS the intermediate animal host, or at least the main one, is thought to be civet cats, and
for MERS coronavirus the intermediate host is probably dromedary camels.
The virus probably jumped from bats to these intermediate hosts, then underwent some rounds of replication in them,
and in doing so probably acquired mutations that allowed the virus to then
more easily transmit to the human population.
Now for transmission purposes, we don't know what the other
intermediate hosts may be, and we particularly don't know what the intermediate host, if there is one, is for
SARS coronavirus 2, the current pandemic strain. It's possible that it came directly from bats.
It's also possible that it went through one or more other animals before jumping into humans.
I should mention that these are certainly not the only coronaviruses in bats.
More than 500 coronaviruses have been identified in bats in China and
the estimates of unknown bat coronavirus diversity reach into the thousands,
indicating that these are probably massively under-sampled in the bat population.
And I think this is also important because it suggests
that this current pandemic strain of coronavirus 2 is unlikely to be the last we will see of coronaviruses.
In particular, we have now had three zoonotic jumps of highly pathogenic coronaviruses
into the human population in less than twenty years, and with the huge diversity of coronaviruses likely
circulating in bats, I think many
scientists and virologists are quite concerned that they will continue to jump in the future as well.
And so they are very worthy of continued study even after this current pandemic is finished.
It's worth comparing the other two highly pathogenic
zoonotic coronaviruses, SARS and MERS, to the current pandemic. SARS emerged in late
2002 and caused a little over 8000 cases and
774 deaths globally. This was an epidemic that lasted about a year.
It was mostly brought under control in 2003, and the last cases were seen
in a sort of laboratory-related outbreak in 2004. That epidemic has now ended.
MERS, on the other hand, has caused fewer cases,
2521 to date, with 866 total deaths.
This is actually the most pathogenic of the
viruses and the one with the highest death rate of about 34 percent. Unlike SARS, MERS infections
still periodically occur, and this is probably not through circulation in the human population.
This virus does not transmit particularly
well human to human but these new infections are thought to occur from occasional
recurrent spillovers from dromedary camels into the human population.
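The death rates quoted here are simple ratios of deaths to confirmed cases. As a minimal sketch of that arithmetic, the Python below recomputes them from the numbers given in this lecture. The function name `case_fatality_rate` is an illustrative assumption, and the SARS case count is rounded down to 8000 because the lecture only says "a little over 8000", so that figure is approximate.

```python
def case_fatality_rate(deaths, confirmed_cases):
    """Return deaths as a percentage of confirmed cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100.0 * deaths / confirmed_cases


# MERS: 866 deaths out of 2521 cases, as quoted in the lecture.
mers_cfr = case_fatality_rate(866, 2521)

# SARS: 774 deaths; 8000 cases is an assumed round number standing in
# for "a little over 8000", so the true value is slightly lower.
sars_cfr = case_fatality_rate(774, 8000)

print(f"MERS case fatality rate: {mers_cfr:.1f}%")
print(f"SARS case fatality rate (approximate): {sars_cfr:.1f}%")
```

With these inputs the MERS figure comes out at about 34 percent, matching the number quoted, and the SARS figure comes out just under 10 percent. Both are ratios against confirmed cases only, so they are sensitive to how many mild cases go undetected.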
So why is it then that the SARS epidemic was able to be brought under control within about a year
whereas we are clearly far from bringing the current coronavirus 2 pandemic under control?
There are several ideas that I've heard discussed on this, and I just want to bring up three here,
comparing SARS to the current COVID-19 pandemic.
First is that the spillover reservoir for SARS coronavirus, as I mentioned, was known.
This is primarily the civet cat, and these animals could then be culled to attempt to break the chain
so that there were not further transmissions from these animals into the human population.
For CoV-2, as I mentioned, the spillover reservoir is not known.
Second, for SARS CoV-1, most of the human transmission occurred in a hospital setting, and indeed
these hospital settings were hubs of transmission for that epidemic,
and so once this was recognized and the risk to the medical community was recognized,
personnel were able to implement barrier nursing in order to stop transmission of that virus.
This is unlike CoV-2, which has not just spread in a hospital setting;
in fact, there is widespread community transmission of this virus.
And then finally for SARS CoV-1, individuals infected with this virus
tended not to transmit until probably 24 to 36 hours after the onset of symptoms
and in general, there was a lack of asymptomatic cases as far as we know.
And so this is really important from a contact tracing perspective and the ability to stall or inhibit spread
of the virus within the population through effective contact tracing and other public health measures.
Unfortunately for CoV-2, the situation is very different,
in that there are possibly, and likely, abundant
asymptomatic cases (further screening will be needed in order to confirm that) and
certainly abundant mild cases,
which are furthering transmission in the human population.
So for these reasons and probably other reasons having to do with the molecular virology and epidemiology the
pandemic that we are experiencing now is very different from the one that we saw in
2003 with SARS and as of this morning is reaching nearly
400,000 total confirmed global cases.
This is not likely reflective of the actual number of cases.
These are just the confirmed cases through screening, but as we know there are currently limitations in testing,
so the actual number of cases is presumed to be much higher than this.
And as of this morning, March 24th, 2020, we were
reaching over 17,000 deaths: 17,252 deaths worldwide.
So as you can see from this graph, unfortunately most Western countries are on a very significant
coronavirus trajectory which is one of basically exponential growth.
While some countries have been able to slow down growth and limit spread, this is largely not the case for
the UK, Europe, and the United States in particular, whose curve on this graph displays clear
exponential spread, and so I think we can anticipate that the number of cases
is going to continue to grow exponentially in the future.
All right, we've heard a lot in the news and from experts about the epidemiology of this virus,
about the spread of this virus, the transmission, and the control,
and so that is not going to be the focus of this lecture today.
Instead, what I want to do is really drill down on the molecular virology of how it is that this virus enters and
replicates within cells in order to amplify itself.
And so I've broken down the lecture into four parts.
In the first part, I'm going to discuss how the virus is able to enter cells through
interactions between the spike protein and host receptors.
I'm then going to spend time talking about, once it deposits its genome into cells,
how the virus replicates that genome and gets its genes expressed,
and there are some very unusual and interesting features of coronavirus biology in this section.
I'm going to then move on to talking about some of the remarkable cell biological
changes that occur in an infected cell,
particularly involving membranes and the formation of what are called replication and transcription complexes
during coronavirus replication. And then in the end
I'm going to spend the last few minutes talking about immune interactions that this virus has
with, in particular, the innate immune system as these are likely drivers of pathogenesis of these viruses in
animal hosts and in humans.
Alright, starting with the structure of the viral particle and entry,
we know that coronavirus particles are pleomorphic.
That means they don't really have a defined structure. They've been looked at by
cryo-electron tomography to confirm this,
and they also have what's called a helical nucleocapsid. So looking at the structure of the virus
I'm showing you here on the left, the nucleocapsid, which is shown in the center in brown,
basically refers to the genome, which is a
30 kilobase genome, huge for an RNA virus, of RNA that is of
positive sense or plus sense.
When we say plus sense, that means it can be directly read by ribosomes in the cell.
That genome is coated with a protein called the nucleocapsid protein
that forms this sort of helical nucleocapsid.
That nucleocapsid-protected genome
basically is encased in a lipid envelope that is derived from the host cell.
Many viruses have a lipid envelope. In all cases, those lipids are taken from the host.
No virus is able to make its own lipids,
but many viruses make use of and steal host lipids for
their replication and sometimes for their morphogenesis.
And so that is the case for coronaviruses where you can see there is a lipid envelope,
which is studded with a number of viral proteins,
the most prominent of which is the spike protein shown in blue.
This is the one that of course gives coronaviruses their name, either for the corona-like
halo effect seen during a solar eclipse, which looks like this, or for the crown-like
appearance of these viruses under the electron microscope.
The spike protein, as we'll talk about in a minute, is critical for the viral entry process.
Additionally, in red is a membrane glycoprotein called the matrix protein.
This is the most abundant protein on the outside of the viral particle and its role is basically
to connect the membrane to the nucleocapsid. You can see in the inset there that
this is a transmembrane protein, but it has a significant C-terminal domain,
which makes contacts with the
nucleocapsid protein, and that's probably important for the morphogenesis phase of the viral life cycle,
when these virions are formed. Another minor envelope protein called E is present as well,
also thought to be important for formation of these viral particles at the end of the viral life cycle.
A little bit more about the spike protein: there have now been a few different
research papers published showing structural information for the coronavirus 2 spike protein.
I've pulled this from one of the papers, which is cited below, and what this is showing here
is a cryo-electron microscopy structure of the coronavirus 2 spike protein, overlaid with
the sequence conservation of related
spike proteins from other coronaviruses, which is basically plotted onto the SARS-CoV-2 spike structure.
And these are then color-coded based on their level of conservation across these related viruses.
And so what you'll notice from the spike, this spike is a trimeric protein.
What you'll notice is that there are sort of two domains.
There's this upper globular domain which is the receptor-binding domain.
This is the thing that engages the host cell receptor, and we'll talk about that on the next slide.
In this domain you'll see that there are many residues that are colored in sort of a teal color,
and this indicates that they are highly variable.
Indeed the receptor-binding domain in the spike protein is the most variable part of the coronavirus genome,
and this tends to be common for viruses in general.
This is a region of viruses that is under intense evolutionary pressure
because of interactions with the immune system.
The lower part of this spike protein is the part of the protein that possesses
the fusion machinery that is important for the entry process, and you'll notice this, in purple,
is much more conserved. That is also sort of a classic finding, that the fusion machinery tends to be
very conserved. And tucked in the center of the fusion machinery is this hydrophobic fusion peptide,
which is very important for being able to fuse the viral membrane with the host membrane so that the virus can
deposit its nucleocapsid payload into the cytoplasm of cells.
So what does this entry process look like?
Well, as I mentioned, the spike protein is the protein responsible for engaging a cellular receptor.
And this you can think of like a lock-and-key mechanism,
where the key is the viral glycoprotein and the lock is the cellular receptor.
Different viruses will use different cellular receptors
as a way of getting into cells. The receptor we know for both SARS-CoV-1 and
for CoV-2 is the same protein.
It's a cellular protein called angiotensin-converting enzyme 2 or ACE2.
And binding to that protein is important, but it is not enough. You need a second
event to happen, and that is a proteolytic cleavage event.
This is carried out by a cellular protease called TMPRSS2, and perhaps others, but that is the one
people have suggested is clearly involved in coronavirus 2 entry.
So what happens is that the spike protein interacts with the receptor.
This protease then comes and cleaves the spike protein.
Actually, there are at least two cleavage events that are known for
SARS coronavirus and probably CoV-2.
In the first of these cleavage events, the receptor-binding domain of
the spike protein is separated from the fusion domain, and the second cleavage
event, which is not shown here, is actually an activating
event that activates the fusogenic state of this protein.
And so that then allows subsequent entry, which for coronaviruses may occur
directly at the plasma membrane, may occur upon endocytosis, or may occur at both sites.
That really hasn't been resolved.
So the spike protein is really a classic
class 1 fusion protein and there are a number of viruses that have fusion proteins of this type.
The best characterized is the influenza hemagglutinin protein.
The Ebola virus fusion protein is also class 1. The HIV fusion protein is also class 1.
And so what I've outlined here on the bottom is the basic stages that are known to
underlie the fusion mediated by these class 1 fusion proteins.
So first, in the pre-fusion state, you can think of this as almost
sort of like a metastable state for the fusion protein.
And prior to the proteolytic event that triggers the fusion process,
this receptor binding subunit, which has not been cleaved off yet,
you can think of as basically clamping the fusion subunit and keeping it tucked away and inactive
until the virus has encountered the
appropriate host cell and it can be activated by these proteolytic cleavage events.
So protease cleavage that we talked about then causes the receptor binding subunit to move out of the way and
that unclamps the fusion subunit so that it can then form a pre-hairpin that is embedded
into the target membrane of the cell and this occurs through the fusion peptide.
The fusion peptide is usually a stretch of hydrophobic amino acids,
which means that it can be inserted into the membrane.
This pre-hairpin then starts to fold back, basically forming a six-helix bundle and
progressively pulling the cellular and viral membranes together to promote fusion. And the final post fusion
conformation in these class 1 fusion proteins is always a trimer of hairpins.
And by this mechanism then once the fusion has occurred the viral nucleocapsid with
the genome payload can be deposited directly into the cytoplasm of the cell.
Some early studies that have now emerged on SARS-CoV-2 indicate that there are some interesting
features that are different between its spike protein and that of the original
SARS-CoV-1. The first difference is that
scientists know from research with the spike protein of SARS-CoV-1 that there are basically
six critical amino acids within the receptor-binding domain that are necessary for
interaction with the ACE2 receptor, and interestingly five of those six residues are different
in SARS-CoV-2 than in SARS-CoV-1.
Nonetheless, CoV-2 is still able to quite efficiently interact with the ACE2 receptor.
The second notable difference is that
SARS-CoV-2 uniquely seems to have acquired a polybasic cleavage site.
This polybasic cleavage site is
interesting and important because it's predicted to enable cleavage by other cellular proteases
beyond the one that we talked about.
It may also enable efficient cleavage by the cellular protease TMPRSS2,
known as the canonical one that's been thought about for this virus.
And it is particularly important because
insertion of a polybasic site in other viruses
has been shown to increase transmissibility, particularly for pathogenic influenza viruses.
So it's going to be important to figure out whether the same is true for SARS-CoV-2.
Okay, so that covers entry and we're now going to move on and talk about what happens
to the viral genome once it has moved into the cytoplasm of the cell.
Well, the 2019 CoV-2 genome has been annotated and depending on the annotation that you look at,
it's thought to possess about 14 open reading frames,
encoding an estimated 27 or so proteins.
Now let's think about this for a minute because it's kind of remarkable.
Remember that the viral genome is a single
stretch of RNA that is incredibly long. It's 30 kilobases long.
But for any virus, and the same is true for coronaviruses, once that RNA is deposited into the cell,
the ability to translate or generate proteins from that RNA
requires the virus to basically follow the gene expression rules that are set by the host cell.
And for eukaryotes, translation is a process that's generally a monocistronic one, which means that
a ribosome comes and recognizes an RNA in the cell.
It will translate generally one open reading frame - one gene from that RNA
before recycling and falling off.
This is different than prokaryotes, which of course have multicistronic RNAs
where multiple proteins can be translated from the same RNA, not generally true in eukaryotes.
So, how is it that from one RNA then this virus is able to express 27 different proteins?
Well for coronaviruses, there are at least three solutions or three well-known solutions that the virus has
evolved in order to solve this problem of expressing many proteins
using the eukaryotic rules of gene expression and translation.
And we're going to talk in some detail about those.
The first is that, if you'll notice, a large portion of the genome is made up of a single open reading frame,
called open reading frame 1, which is separated into two sub-open reading frames, 1a and 1b.
This is a giant open reading frame that is basically translated into what's called a polyprotein.
It's a series of many proteins
fused together with no stop codons intervening them to generate one giant protein,
which is then proteolytically processed, and we'll see that on the next slide.
This protein is also generated in two different forms through the use of a
programmed ribosome frameshifting event, which we'll also talk about on the next slide.
That gets you translation of all of the open reading frame
encoded proteins in that portion of the genome, which are generally the nonstructural
proteins of the genome.
But it doesn't get you translation of all of the structural and other accessory proteins
which are found on the 3-prime half of the genome,
the 3-prime end of the genome. And for these to be made, the virus uses a very unusual strategy of
discontinuous transcription that produces something called subgenomic RNAs
and we'll discuss those as well.
So to start, remember that the virus has to first make proteins
that are going to be necessary for it to be able to copy its genome and
transcribe the rest of its genes, and for any RNA virus this requires an
RNA dependent RNA polymerase, or RdRP.
If you are a plus sense RNA virus, like coronaviruses are,
your incoming genome is basically recognized as a messenger RNA; it's ribosome ready.
So you don't have to package that RNA dependent RNA polymerase complex or protein in your virion,
because it can be directly translated from the genome, and that is what happens.
That's what is encoded by this giant open reading frame 1a or 1ab.
So this is made, as I mentioned, into a huge polyprotein. Within this polyprotein are two
proteases that the virus encodes, and the job of those proteases is basically to cleave this
giant polyprotein, as shown here on the left
in the lower pullout, into the individual proteins, which are going to have separate
functions for viral gene expression and replication.
And so you get proteolysis to generate many different proteins from one initially translated polyprotein.
Additionally, you'll notice that this polyprotein, as I mentioned,
is not just translated as one giant open reading frame to start.
There's a frameshifting event.
So a portion of the time, maybe 50 or 60 percent of the time,
the ribosome will read through ORF 1a, and there's a stop codon at the end of ORF 1a, so it will stop there.
However, the remaining percentage of the time the virus
enables the ribosome to actually read past that stop codon and continue translating to generate a
longer ORF 1ab fusion.
And that programmed translation read-through
is a frame shifting event that is governed by two properties of the genomic RNA.
The first is that right around that stop codon,
there's something called a slippery sequence, and this is shown in the RNA diagram on the right:
the sequence UUUAAAC.
When the ribosome lands on this site, it's known to have a propensity to
occasionally slip back out of frame.
Now the frequency with which that frame shifting event occurs can be
increased, and is increased, in these coronaviruses
because just downstream of that slippery sequence is what's called an RNA pseudoknot structure.
This is basically a highly stable RNA structure that
causes the ribosome to pause when it encounters it. The structure is thought to interact with the ribosome,
causing the ribosome to pause over the slippery sequence, which
increases the chances that it will slip back out of frame. If it slips back one nucleotide out of frame, that
stop codon at the end of ORF 1a is no longer read as a stop codon
and the ribosome can continue to translate through and generate the rest of the viral polyprotein.
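To make the frameshift concrete, here is a minimal Python sketch of a ribosome reading codons with and without a one-nucleotide slip back at the slippery site. Everything in it is an illustrative assumption: the toy RNA is invented, the function name `read_codons` is hypothetical, and the slippery heptamer is taken to be UUUAAAC. The sketch ignores the pseudoknot and the frequency of slipping; it only shows how stepping back one nucleotide takes the first stop codon out of frame.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
SLIPPERY = "UUUAAAC"  # assumed slippery heptamer

# Invented toy RNA: in the starting frame it reads
# AUG GCU UUA AAC GGG UAA (stop), followed by extra sequence.
TOY_RNA = "AUGGCUUUAAACGGGUAACCGAUUAG"


def read_codons(rna, slip=False):
    """Read codons from position 0 until a stop codon is reached.

    If slip is True, the ribosome steps back one nucleotide once,
    right after decoding the last codon of the slippery sequence.
    """
    site = rna.find(SLIPPERY)
    slip_point = site + len(SLIPPERY) if site != -1 else None
    codons, pos, slipped = [], 0, False
    while pos + 3 <= len(rna):
        codon = rna[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
        pos += 3
        if slip and not slipped and pos == slip_point:
            pos -= 1  # the -1 frameshift: one nucleotide is read twice
            slipped = True
    return codons


print("no slip:", read_codons(TOY_RNA))
print("slip   :", read_codons(TOY_RNA, slip=True))
```

Without the slip, reading stops at the first in-frame stop codon, which stands in for the end of ORF 1a. With the slip, that same stretch of RNA is read in a different frame, so the stop codon is skipped and translation continues to a later one, which stands in for the longer ORF 1ab product.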
Okay,
then that protein is processed as I mentioned, but how do you get
production of all of the rest of the proteins that are found on the 3-prime end of the viral genome?
The structural and accessory proteins.
These are made from basically a nested set of what are called subgenomic RNAs
that are all 3-prime coterminal.
So this is important if you think about it for how these are going to get their proteins expressed.
These are not polyproteins,
but by having this nested set of subgenomic RNAs,
what this enables the virus to do is have each of these genes on the 3-prime end of the genome
have a chance to be present as the 5-prime most open reading frame on a messenger RNA.
Let's think about it this way: if you are generating an RNA
where, for example, gene 2, which would represent
the spike protein, is at the 5-prime end,
the ribosomes are going to come and translate gene 2, and
everything downstream of it, based on the eukaryotic rules of translation, is basically going to be viewed as
UTR sequence - untranslated sequence.
So only gene 2 will get translated into protein 2, or spike in this case. The same thing is true
if you generate a transcript in which gene 3 now
has the chance of being the 5-prime open reading frame: that will get translated into protein,
and everything downstream will be untranslated sequence.
So every open reading frame at the 3-prime end of the viral genome has a chance to be the
5-prime open reading frame on a messenger RNA, allowing it to get translated.
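The nested-set idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene labels are the generic "gene 2", "gene 3" numbering used in this lecture (extended to two more hypothetical genes), and the function names are invented. Each subgenomic RNA is modeled as a list of genes that runs to the shared 3-prime end, and the only gene counted as translated is the 5-prime most one.

```python
# Generic labels for genes at the 3-prime end of the genome
# (hypothetical numbering; only "gene 2 = spike" comes from the lecture).
THREE_PRIME_GENES = ["gene 2", "gene 3", "gene 4", "gene 5"]


def nested_subgenomic_rnas(genes):
    """Return the nested, 3-prime coterminal set of subgenomic RNAs.

    Each RNA starts at a different gene and runs to the last gene.
    """
    return [genes[i:] for i in range(len(genes))]


def translated_gene(mrna_genes):
    """Under the one-ORF-per-mRNA rule, only the 5-prime most gene is read."""
    return mrna_genes[0]


for rna in nested_subgenomic_rnas(THREE_PRIME_GENES):
    untranslated = rna[1:]
    print(f"{' | '.join(rna):35s} translated: {translated_gene(rna)}, "
          f"treated as untranslated: {untranslated}")
```

Every RNA in the set ends with the same gene, which is what 3-prime coterminal means here, and across the whole set each gene gets exactly one turn at the 5-prime position.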
How this happens is
quite fascinating and involves another feature, which I hope you have noticed here on these
RNAs that I've drawn, and that is that in addition to being 3-prime coterminal, all of them have the exact same
sequence at the 5-prime end, and that sequence is the same one
found at the 5-prime end of the genomic RNA, called the leader or L sequence.
So how is it that you are able to get the same sequence, which is not
present within this 3-prime end of the genome,
fused to the ends of each of these subgenomic RNAs?
And basically the answer to that underlies how these are produced.
It involves a series of sequences called transcription regulatory sequences or TRSs.
At the junction between each of those genes encoded by the virus, as
well as at
the 5-prime end of the genomic RNA just downstream of the leader sequence,
which is denoted in red here,
are these conserved TRS sequences, these transcription regulatory sequences.
And so as the polymerase is coming and copying the genome,
it's going to reach these TRSs, which are at the 5-prime end of each of the genes.
There's a core highly conserved sequence within these TRSs.
This is called the core sequence or denoted as CS here in yellow.
And so once the polymerase gets to these TRSs and copies this core sequence,
it can either continue to copy or
it will now jump from that sequence, probably through a long-range RNA-RNA interaction, and
base pair with the same core sequence that is part of the TRS at the 5-prime end of the genomic
RNA, just downstream of the leader, and then the polymerase will continue to transcribe, thereby
capturing the leader sequence.
So this looks something like this, where the nascent RNA is shown in red. The RNA polymerase starts to copy.
You'll see that the TRSs are present just upstream of each of the genes in the virus.
As the polymerase gets to one of the TRSs,
it either can read through that TRS and go on to the next one or
it will jump and translocate basically to the TRS at the extreme 5-prime end of the genome and
finish its transcription to generate that fusion with the leader sequence.
So this is discontinuous transcription.
It allows for the generation then of a series of these subgenomic templates.
Remember, these are copied from the plus sense RNA genome.
So these are now negative sense or minus sense RNAs.
They're complements of the genome but not ribosome ready themselves.
For that the polymerase now has to go back and make copies of these minus sense subgenomic
messenger RNA templates to generate the actual
positive sense messenger RNAs that can be translated.
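The jump-and-copy logic can be sketched as a toy Python model. All sequences below are invented placeholders (the leader, the core sequence and the gene bodies are not real viral sequences), and the function names are hypothetical. The model assumes the polymerase always jumps at the body core sequence it is asked about, so it enumerates the possible products rather than simulating how often each one is made.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the opposite-sense strand, written 5-prime to 3-prime."""
    return rna.translate(COMPLEMENT)[::-1]


# Invented toy genome: leader, then a core sequence (CS) before each gene.
LEADER = "GAUUC"
CS = "ACGAAC"
GENOME = LEADER + CS + "GGGGGG" + CS + "AUGCCC" + CS + "AUGUUU"


def find_all(seq, motif):
    """Return every start index of motif in seq."""
    positions, start = [], seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def minus_strand_templates(genome, cs):
    """Model one jump per body CS: copy from the 3-prime end up to that CS,
    then jump to the leader CS and finish by copying the leader."""
    leader_cs, *body_cs = find_all(genome, cs)
    leader = genome[:leader_cs]
    return [reverse_complement(leader + genome[pos:]) for pos in body_cs]


def subgenomic_mrnas(genome, cs):
    """Copy each minus sense template back into a plus sense mRNA."""
    return [reverse_complement(t) for t in minus_strand_templates(genome, cs)]


for mrna in subgenomic_mrnas(GENOME, CS):
    print(mrna)
```

In this toy model every resulting messenger RNA starts with the same leader plus core sequence and ends with the same 3-prime sequence, while the first gene after the leader differs from one RNA to the next.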
It's worth noting that this mechanism of discontinuous transcription
means that there's a lot of polymerase jumping,
which probably facilitates what are known to be extraordinarily high
recombination rates within coronaviruses. I've heard estimates as high as about 25%.
Most RNA viruses and plus sense RNA viruses have vanishingly low levels of recombination,
and so this is a unique feature of coronaviruses, which may be interesting in regard to how they evolved,
and perhaps also how they are able to maintain such enormous genomes.
So this discontinuous transcription mechanism is quite complex and is orchestrated by a replicase
that includes the polymerase, but many other proteins as well. And this replicase complex
requires functional integration of the RNA polymerase, capping, and
proofreading activities, as well as other things. And so what I'm showing you here on the left is a structure of
basically what people think is the polymerase holoenzyme. This is made up of the nsp12
RNA dependent RNA polymerase itself
together with two other nonstructural proteins nsp7 and nsp8,
which are thought to help with processivity of the RdRP. As I mentioned,
this is thought of as perhaps the core holoenzyme of
the polymerase and it is believed to be able to initiate de novo primer independent RNA synthesis.
In addition the complex is
associated through protein-protein interactions with another nonstructural protein
called nsp14, which is a bifunctional protein
that has both capping activities and an exonuclease activity,
which turns out to be a real paradigm-shifting activity for
how scientists think about RNA virus evolution.
I'm going to spend some time talking about that
but first I should mention that it's not just these proteins mentioned above;
in fact, there are a variety of other
viral processing proteins and activities associated with the replicase complex,
not all of which are well understood biochemically, as well as an
undefined, or at least incompletely defined, set of cellular proteins
that may participate in its regulation as well.
So a very complicated replicase complex involved in
orchestrating this discontinuous transcription mechanism.
Right, back to this exonuclease that I mentioned as being part of the polymerase complex.
Turns out that the theoretical limit for how large an RNA virus
genome can be is about 30 kilobases. And this theoretical limit
comes from the observation that in RNA viruses,
which all have RNA dependent RNA polymerases
these RdRPs do not have proofreading capacities. This is different than polymerases in our own cells.
And this means that they are error-prone, and this error-prone capacity of RdRPs
underlies the massive
evolution that occurs during replication of many RNA viruses to generate things called quasispecies
and mutant swarms that are highly characteristic of infections like HIV and influenza.
And what it also means is that most viruses actually don't even come close
to that theoretical limit of 30 kilobases.
Most RNA viruses are well below 20 kilobases,
and most are on the order of, you know, perhaps 10 to 12 kilobases.
Now, there are viruses, like coronaviruses as I mentioned, and others in
a larger grouping of similar viruses called Nidovirales,
that have shockingly large
RNA genomes - 30 kilobases. We even know some that are now beyond 30 kilobases,
so even exceeding the theoretical threshold. And these viruses,
and only these viruses, not all of them
but many of them, have this exonuclease activity present,
and so this led to the idea that this exonuclease activity could actually be conferring a proofreading
function on the RdRP, which as I mentioned was a real paradigm shift in thinking about
how RNA dependent RNA polymerases
might actually be able to proofread.
So indeed in SARS coronavirus,
if ExoN, this exonuclease gene, is mutated
and then the number of substitutions or
mutations that occur during replication of this virus is measured, you can see here from this graph
that compared to the number of mutations that occur in the wild type virus,
there's more than a 20-fold jump in the mutational frequency in the virus lacking this ExoN activity.
So you can see this spread across the rest of the coronavirus genome
here, first focusing on the upper panel,
where in black are the lines showing the
frequency of mutation in the populations during infection with a wild type virus,
and the gray lines show the same thing during infection with the ExoN mutant virus, and you can
see that there's a significant increase in the number and distribution of mutations
that are acquired. Mutation of ExoN also renders these viruses
hypersusceptible to mutagens, as shown in the lower panel.
What was tested here is 5-fluorouracil, and so you can see, of course, that 5-FU treatment,
which is a mutagen, increases the mutational frequency of the wild type virus,
but further increases the mutational frequency of this ExoN mutant.
So it's also interesting, because you might expect that if this exonuclease activity was
what allows the viruses to reach these enormous
genome lengths, it would be absolutely essential for the virus.
And for some viruses it is: they cannot tolerate a mutation in ExoN. But SARS coronavirus
and some others can. In fact, while the mutants are attenuated,
they can evolve and adapt over multiple passages to stabilize
populations and actually prevent lethal mutagenesis.
And so these adaptive changes, which you might think of as sort of suppressor mutations
on the genome, would be expected to do things like increase the processivity perhaps of the
RNA-dependent RNA polymerase, and they may be doing other things as well.
So that I think is a really interesting concept to think about and in fact
in the murine betacoronavirus called MHV
an ExoN mutant there showed clear promise as a vaccine strategy at least when used in mice
because it was an attenuated strain, but subsequently allowed protection from a
challenge with a wild-type strain.
This nsp14, which is the exonuclease, is really a fascinating protein.
It's a bimodular protein that is composed of two different domains that basically
have two different activities. So that there's this
ExoN domain here, which is involved in proofreading and then
there's also a domain that's a methyltransferase domain thought to be involved in
messenger RNA capping reaction.
And these two domains are separated by a flexible hinge region, which probably allows the protein to
orient in different ways as these different functions are needed. And the ExoN works in
concert with another nonstructural protein called nsp10. Together these operate as a heterodimer
and they function in basically a mismatch repair mechanism.
So actually ExoN, this proofreading activity can efficiently excise ribavirin,
which is a chain terminator
that is commonly used as an antiviral against many different RNA viruses.
But it is known not to work against coronaviruses,
and that's because this proofreading activity can basically remove
that nucleoside analog and allow the virus to continue to replicate.
It's been shown with this mouse
coronavirus, MHV, that an ExoN knockout is
inhibited more efficiently than the wild-type virus by Remdesivir,
which is another nucleoside analog that's being explored extensively right now for its potential to block
CoV-2 replication.
And what that suggests is that ExoN probably reduces the incorporation of Remdesivir as well.
And so for that reason it's probably going to be beneficial to perhaps try
simultaneous targeting of both the RdRP with Remdesivir
and ExoN with some sort of a specific exoribonuclease inhibitor as well.
Alright, so
now having explained how the virus is able to replicate its genome
and get its genes expressed through this incredibly sophisticated
replicase complex, I'm now going to move on and talk about
where this happens in a cell because it turns out that the virus is able to form these very
intricate membrane
structures called replication and transcription complexes.
So, these are basically
interconnected double membrane vesicles where viral replication and transcription
can occur. And I'm showing you here some images from a reference that I've cited below that
come from cryo-electron
tomography of coronavirus-infected cells. And you can see on the far left
an EM image showing one of these classic double-membrane vesicles that are formed in infected cells.
And a more zoomed out image of that is shown in the center
where you can see that the cell basically now
contains many of these double-membrane vesicles,
and on the far right, what you're seeing is a 3D surface rendering from a cryo-electron
tomogram of these, where purple shows the inside of these membranes.
And many of them are actually interconnected in that the outer membrane sort of encapsulates
multiple of these vesicles at once.
These convoluted membranes are derived from the endoplasmic
reticulum, and as I mentioned, many of the double-membrane vesicles, it looks from these
tomography experiments, are actually interconnected by their outer membrane
and are part of an elaborate network that's contiguous with the rough endoplasmic reticulum.
Inside of these compartments is where viral replication and transcription is thought to occur. And so this
probably benefits the virus in multiple ways. First of all, by compartmentalizing,
they can protect their genome from potential attack by antiviral mechanisms or other
exonucleases or nucleases that might be present generally in the cytoplasm.
It also can help them concentrate the factors necessary to efficiently replicate and transcribe the viral genome.
Because these replication compartments, these RTCs, are essential for replication of the virus,
they have been discussed as potential antiviral targets, by trying to disrupt this membrane formation.
There's been a lot of work trying to explore how these are formed. And what is known is that there are integral membrane
proteins that are part of the replicase complex that are thought to function in vesicle biogenesis.
And the three replicase components that are predicted at least to have
transmembrane domains in them are nsp3, 4, and 6.
And these are thought to be directly involved in vesicle formation. In a study that I'm citing below here,
it's been shown that two of these, nsp4 and nsp3,
when expressed alone outside of the context of infection, are actually sufficient to drive
formation of these double-membrane vesicles,
and it's thought that this occurs by an interaction between the
luminal loops of these proteins that drives the membrane curvature and vesicle formation.
So there's also been
recent work to try to identify the components of the proteome, basically,
associated with these replication and transcription complexes.
And this has been studied with the mouse coronavirus, MHV
using a proximity labeling-based approach involving the biotin ligase BirA, which
was fused in the context of the virus to one of these replicase proteins, nsp2, known to localize
within these replication compartments. Through the addition of biotin, which could then be transferred to
proximal proteins, these proteins could then be purified and identified by mass spectrometry to define the
RTC proteome, basically. And then in this particular study that I've cited below, they took these
hits and did a targeted siRNA screen to figure out which of the components that are host factors are actually
necessary for viral replication, which are the proviral factors here.
And I want to note that they threw out hits that
compromised cell viability on their own,
so what these are are hits that decrease coronavirus replication,
but don't impact the viability of the cell.
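The filtering logic just described can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the gene names, the measurements, and the two cutoffs are all hypothetical placeholders, not data or thresholds from the cited study.

```python
from dataclasses import dataclass


@dataclass
class KnockdownResult:
    gene: str                    # host factor targeted by the siRNA
    relative_replication: float  # virus replication vs. control (1.0 = unchanged)
    relative_viability: float    # cell viability vs. control (1.0 = unchanged)


def proviral_hits(results, max_replication=0.5, min_viability=0.8):
    """Keep knockdowns that reduce virus replication without harming the cell.

    Both cutoffs are hypothetical values chosen for illustration.
    """
    return [
        r.gene
        for r in results
        if r.relative_replication <= max_replication
        and r.relative_viability >= min_viability
    ]


# Placeholder entries, not real screen data.
example_results = [
    KnockdownResult("HOST_A", 0.30, 0.95),  # less virus, healthy cells: kept
    KnockdownResult("HOST_B", 0.25, 0.40),  # less virus, but cells are sick: discarded
    KnockdownResult("HOST_C", 0.98, 1.00),  # no effect on virus: not a hit
]

print(proviral_hits(example_results))
```

The point of the viability filter is that a knockdown that simply kills or sickens the cell would also lower virus replication, so it cannot be read as a specific proviral factor.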
And what they noticed, of course, is that there are a number of things involved in
cellular transport and vesicle formation, which would be
expected and are interesting hits for future follow-up, as well as a number of catabolic processes.
There were several hits in the proteasome. That finding is kind of interesting, as it could provide a link to the described
coronavirus replication transcription complex protein nsp3,
which is thought to have deubiquitination activity.
And then, quite interestingly, some of the top hits were in translation machinery, these eIF3 components
of the translation complex. And they were able to use
fluorescence imaging of puromycin-labeled cells, which is basically a pulse-labeling way to detect nascent
translation. And this showed really pronounced enrichment of actively translating ribosomes
near these viral replication transcription
complexes, particularly early-to-mid infection indicating that the translation machinery, in addition to the
transcription machinery, is recruited near these membranous webs, basically.
Also, it was interesting to look at what are the viral proteins that are present
within these membrane complexes. And in pink are the
viral proteins that were significantly enriched and it makes a lot of sense because
most of these are the nonstructural proteins which are known to be involved in
replication and transcription, so they should be there.
It's also interesting to look at what wasn't there. And so for example, one of the proteins that was
not significantly enriched there is a nonstructural protein called nsp1.
Nsp1 is fascinating. It is a key coronavirus pathogenicity factor.
It's a host shutoff factor that basically restricts gene expression coming from the host cell
and it does this via a two pronged approach.
Nsp1 is able to interact directly with the 40S subunit of the ribosome and in this way
block translation of host RNAs and also mediate endonucleolytic cleavage of these RNAs
in a pretty widespread way,
leading to broad accelerated messenger RNA degradation in these cells.
And this benefits the virus, perhaps for at least two reasons.
One classic way of thinking about why host shut off benefits the virus,
is that it helps viruses shunt
gene expression machinery away from the host cell and towards viral needs.
The second reason, which has been directly demonstrated for
these coronavirus nsp1 proteins is that this is a general immune evasion tactic because by promoting
widespread RNA degradation, many of these RNAs are going to be things that are induced
as part of the interferon response and this helps the virus delay the interferon response.
Notably, for nsp1, it seems to be specific for cleaving host RNAs
because the leader sequence, that 5-prime leader sequence that we talked about
for subgenomic RNA synthesis and that's present on the genomic RNA, appears to protect
viral transcripts from nsp1-mediated cleavage.
And so this is a selective shut off of host, but not viral RNAs.
This activity in particular, the thought that it blocks an interferon response
is quite relevant for viral pathogenesis.
Indeed, it's been shown that if this nsp1 protein is mutated, and this is a mouse survival curve here,
you can see that while mice infected with a wild type virus
generally are dead about 6 days after infection,
in the absence of this key virulence factor, all of the mice survive the infection.
And so this mutation of this factor has also been something
that's been explored as a potential vaccine strategy.
Alright, beyond that nsp1 virulence factor, several of the other things that are not present in these
RTC complexes, if you look at these, are basically
assembly and virion proteins, things like the matrix protein, the envelope protein, the spike protein
and that makes a lot of sense because viral
morphogenesis or assembly is not happening in these RTCs. That's happening in sort of a discrete
presumed location, and so it sort of makes sense that they would not be part of these RTC complexes.
Additionally, not part of the RTC complexes are many accessory proteins.
And so what are accessory proteins? These proteins and genes are things in viruses in general that tend to be
specific to a particular viral species or a particular viral genus.
And frequently accessory genes are dispensable for viral replication in tissue culture cells
but play really important roles in the virus-host
interaction in an in vivo context with the animal or the human.
And so what I'm showing you here in this diagram
are the accessory proteins, which are labeled in blue, for a number of representative betacoronaviruses.
You can see that SARS-CoV-2 is included in this diagram in the center, compared directly with SARS coronavirus.
You can see that they, in fact, share a number of accessory proteins that look pretty similar,
but I think that it's going to be interesting to compare the differences
in the future, as there are, at least from sequence gazing, several notable SARS coronavirus 2 variations
in these accessory proteins, in particular in accessory proteins that are involved in interaction with
the innate immune response and perhaps countering the interferon response.
Some of those are listed here in this table:
accessory proteins 3a and 3b, open reading frame 6, and open reading frame 8.
Each of these has some notable differences.
I'm not going to go through the functions of all of these on the table,
though I will point out that even for SARS coronavirus and other coronaviruses
the functions of many of the accessory proteins are only partially worked out or
not yet established. This is going to be an important
area for research in the future.
Okay, so we've talked about the composition of these
replication transcription complexes which are formed from this elaborate ER-derived
network of vesicles.
And then, once the viral genomes are replicated within these, they need to assemble into new viral
particles, and this is called viral morphogenesis.
And assembly is basically driven first by
association of the nucleocapsid protein with the genomic RNA.
This assembles to form those helical nucleocapsids
that, remember, are found in the center of the viral particle.
Then these need to associate with the components of the viral membrane.
And so these are the spike protein, the matrix protein, the envelope protein.
These are all integral membrane proteins that are inserted into the endoplasmic reticulum,
and then the nucleocapsid, which is bound to the viral genome, then buds into these membranes,
perhaps in the ER-Golgi intermediate compartment, labeled here as ERGIC.
It's known that budding occurs in association with the Golgi,
and then these particles are probably glycosylated at particular sites and
released through a process that's like exocytosis,
out of the cells so that they can then go on and infect neighboring cells.
Okay
So that is the basic replication mechanisms of the virus and now in the last few minutes,
I want to turn to immune interactions of this virus.
First I want to point out, for SARS and MERS coronavirus (and we don't know yet the answer for CoV-2),
that an interesting feature of these viruses is that they induce very little, if any,
interferon in most cells. And this is illustrated in this image that I've shown here,
where you can see that the upper gel shows a signal for interferon beta
in control cells, and these are infected with a bunyavirus, which is a negative sense RNA virus,
which clearly induces interferon beta quite robustly as many RNA viruses do.
SARS coronavirus stands in stark contrast to that control,
in that you can see very little interferon beta signal that is induced.
And so why is this the case?
Well, we've touched on this a little bit already,
but just to hammer it home, there are a number of putative interferon
antagonists that have been identified in the SARS coronavirus genome,
several nonstructural proteins we talked about nsp1,
several accessory factors that I touched on, as well as the matrix protein and nucleoprotein,
which may be able to counteract this as well. So it appears that this virus, and perhaps CoV-2 as well, has really a
multi-pronged approach to dampen the early interferon response to the virus.
And this is thought to be really important for viral pathogenesis. And indeed SARS
pathogenesis has been shown to be linked to delayed type I interferon signaling
and subsequent immune toxicity, and so let's look at this first in terms of survival.
A graph is shown here for mouse experiments, where you can see that wild-type BALB/c mice,
when infected with coronavirus, tend to succumb to infection
by about 6 to 8 days post-infection.
However, if you infect mice that are lacking interferon signaling,
so they're lacking the Ifnar receptor, they're knockouts for that,
and you infect these mice with wild-type virus, none of these mice die.
That suggests that the interferon response ultimately is
linked to death of these mice upon coronavirus infection. And that's not because these
mice that lack the Ifnar protein are able to replicate the virus any differently
and that's shown here in this graph in panel D, where you can see that the
levels of replicating virus, as measured by plaque-forming units in the lung are
basically very similar between the wild-type mice and Ifnar knockout mice.
And so the hypothesis is that
the virus is able to replicate to high initial titers because of these accessory factors and other
multi-pronged approaches it has to delay the interferon response.
But then an interferon response comes on later and sort of at an inappropriate time
because it can no longer be used to stop the initial virus infection,
but what this response is doing
is driving aberrant recruitment of pathogenic inflammatory monocyte macrophages
and activation of the innate immune response leading to cytotoxicity. And so that's shown here
in these diagrams, where on the left you have an uninfected alveolus and
these cells upon acute coronavirus infection start to show
rapid virus replication because the virus is preventing the antiviral interferon response early.
This leads to inflammatory cell infiltration and release,
both probably from the infected cells as well as from these
infiltrating inflammatory cells, of proinflammatory cytokines and chemokines, and it is those
immune responses that are thought to lead to acute lung injury and acute respiratory distress syndrome,
so a clear immunopathology associated with these infections.
Finally, I want to note that it's been shown for SARS and also for the circulating human coronaviruses that
neutralizing antibody titers, which are shown on the graph here, and the memory B cell responses,
which are not shown here, are both short-lived in SARS-recovered patients. And so the black line
here shows a cohort of SARS patients that were monitored for neutralizing antibody
and you can see they do mount a robust
neutralizing antibody response; however, this response is not sustained, in that by
a couple of years after the initial infection their response is basically disappearing.
Now you see a couple of outlier patients shown in the green line and the orange line,
indicating that some people may be able to mount a sustained
protective response, but for most people infected with the virus, immunity probably wanes,
and I think that's going to be important in thinking about,
in particular, whether that is also the case for CoV-2
and what does that mean for continued circulation of this virus.
So I think there are a number of really important immunological questions that need to be answered for
CoV-2 right now that are going to really greatly inform the thinking
about how this virus causes pathogenesis and about control of the pandemic.
And I've just outlined a few of these here. For example, what does seroconversion
look like for CoV-2? How long do recovered individuals stay immune? And can they be reinfected?
What type of immunity will we get from vaccines?
And how does it compare to the infection response, which I've shown here?
We also really need more information about what's happening in the older population,
particularly in regards to their immune responses, immunology, and inflammation
that's happening in these patients. Because in part this will help scientists identify parallels
that should be looked for in animal models and these animal models themselves are in need of
significant development for CoV-2.
Okay, so with that I just want to end by listing some of
what in my opinion are some of the key open basic science questions about these viruses.
First for SARS-CoV-2, what is the role of the polybasic site in the spike protein
in CoV-2 transmission? Is this really a component that has helped speed up transmission of this virus?
What are the pathways involved in coronavirus-induced membrane remodeling,
and how do replication and transcription
complexes temporally and functionally coordinate the various stages of the viral life cycle?
What are the biochemical activities and roles of the various proteins that form this highly sophisticated
replication transcription complex? How do they coordinate replication and transcription
at different stages of viral life cycle?
How do these coronaviruses maintain such a large
genome and still have sufficient mutation rates for adaptation and trans-species movement,
which we know certainly occurs for these viruses?
What are the functions of the CoV-2 accessory proteins
and how do they impact the in vivo growth and virulence of the virus?
And will coronavirus 2-infected individuals or vaccinees mount protective long-term immune responses?
Okay, so with that I'm going to end
and I first want to acknowledge that I got a lot of assistance in collecting information and
slides for this talk from Professor Laurent Coscoy
as well as from members of my lab: Divya Nandakumar, Ella Hartenian, Michael Ly,
Azra Lari, Jessica Tuckers, and Allison Didychuk.
I would also like to mention that if you're not a virologist
but you're curious about how viruses and viral research have really informed
a lot of the basic understanding of molecular biology,
I've recorded an open-access iBio talk on that, the link to which is below,
and then most importantly I really want to thank
all of the coronavirus researchers
who generated all of the sets of information that I talked about today,
and who are playing really key roles in
the response to this current pandemic as well as all of the scientists and medical personnel
who are working really tirelessly to fight the pandemic, and we are seriously indebted to them.
So with that, thank you very much.
