Hello.
I'm Vivian Cheung.
I'm an investigator of the Howard Hughes Medical Institute and a professor of pediatric neurology
at the Life Sciences Institute at the University of Michigan.
In Part III of my talks, I'm going to discuss mechanisms that underlie RNA editing
and RNA-DNA sequence differences.
In the last two sections, I discussed how a DNA sequence can become a different RNA
transcript, where the RNA transcripts have different sequences, and these are then translated
into different protein isoforms.
For sure, this is a somewhat unexpected finding, because we certainly think that DNA
is the genetic blueprint that underlies our RNA and proteins.
So, in order for us to study these more deeply, we want to be able to understand how these
RNA editing sites, as well as RDD sites, arise.
We'll start with something that is easier and that is with the known editing examples,
and then go on to the RNA-DNA sequence differences that are certainly less well-characterized,
and we'll discuss when during transcription they occur, and how
we use model organisms to help us understand this process.
So, A-to-G editing. In 2012, when we first made the observation that there are
all 12 types of RNA-DNA sequence differences, we certainly knew that there are A-to-G editing
sites, but we didn't know that they were so abundant.
So, the number of sites we found was surprising. Even though there were some computational
analyses that hinted that there could be a large number of sites, we really wanted to
be able to show them experimentally.
So, in order for us to understand the A-to-G editing, what we did was to require that,
upon ADAR knockdown, the inosines in the transcripts either show a significant
decrease in level or are completely abolished.
Since inosines are functionally the same as guanosine, I refer to them interchangeably
as A-to-I or A-to-G editing.
So, we require that, upon ADAR knockdown, A-to-G editing levels significantly decrease.
In addition, we require the ADAR proteins to be found bound to those transcripts in
an ADAR RNA immunoprecipitation.
So, using these two fairly stringent criteria, we still found over 50,000 A-to-G editing
sites, of which 10,000 were known but 50,000 were not previously known to be A-to-G editing
sites.
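The two criteria just described can be written as a simple filter. This is only a minimal sketch, not the pipeline used in the study: the Site record, its field names, and the 50% relative-decrease cutoff are assumptions made for illustration, since the talk says "significant decrease" without giving a threshold. The three candidate sites are made-up values.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    level_control: float    # fraction of reads with G at the site, control cells
    level_knockdown: float  # same fraction after ADAR knockdown
    adar_bound: bool        # transcript recovered in the ADAR RNA IP


def passes_criteria(site, min_relative_drop=0.5):
    """Keep a site only if editing drops on knockdown AND ADAR binds the transcript.

    min_relative_drop is a hypothetical cutoff, not a value from the talk.
    """
    if site.level_control <= 0:
        return False
    drop = (site.level_control - site.level_knockdown) / site.level_control
    return drop >= min_relative_drop and site.adar_bound


# Made-up candidate sites, for illustration only.
candidates = [
    Site("site_1", 0.40, 0.05, True),   # editing drops and ADAR is bound
    Site("site_2", 0.40, 0.38, True),   # ADAR is bound but editing barely changes
    Site("site_3", 0.30, 0.00, False),  # editing abolished but no ADAR binding
]
kept = [s.name for s in candidates if passes_criteria(s)]
print(kept)
```

Requiring both conditions together is what makes the filter stringent: a site that fails either the knockdown test or the binding test is dropped.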
In addition, what we found was that, upon ADAR knockdown, only the A-to-G sites
decreased in level; all the other sites did not change.
So, this suggests that ADAR specifically mediates A-to-G editing.
So, now that we have such a large number of A-to-G editing sites, we can begin to characterize
them.
What we found is that there are a large number of transcripts that are hyperedited.
So, for example, in SMARCC1, there are over 200 of these A-to-G editing sites in
the transcript and, in Fanconi A, there are also over 100 of these editing sites.
And we also looked for sequence motifs around these A-to-G editing sites, as
others have shown that many of these A-to-G editing sites are found in Alu repeats, and
there's a specific sequence motif.
So, there are four nucleotides in RNA.
What we found was that, right before the A-to-I or A-to-G editing sites, there's usually a
depletion of G on the 5' side, and right after those A-to-G sites there's an enrichment of G
on the 3' side.
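A motif of this kind can be checked by tallying the bases immediately flanking each edited A. The sketch below is a minimal illustration, assuming each site is given as a short RNA window with the edited A at a known index; the function name and the four example windows are hypothetical and are not data from the study.

```python
from collections import Counter


def neighbor_counts(windows, site_index):
    """Count the bases immediately 5' (-1) and 3' (+1) of the edited A."""
    upstream, downstream = Counter(), Counter()
    for seq in windows:
        if seq[site_index] != "A":
            raise ValueError("expected an A at the editing site")
        upstream[seq[site_index - 1]] += 1
        downstream[seq[site_index + 1]] += 1
    return upstream, downstream


# Made-up 5-nt windows centered on the edited A (index 2); not real data.
windows = ["CUAGC", "UAAGU", "ACAGG", "UUACA"]
up, down = neighbor_counts(windows, 2)
print("5' neighbor:", dict(up))
print("3' neighbor:", dict(down))
```

With real sites, a low G count in the 5' tally and a high G count in the 3' tally, relative to the background base composition, would correspond to the depletion and enrichment described here.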
Having so many sites, one of the discoveries we made was that there are
many HuR binding sites close to these editing sites.
So, to be sure that that is correct, what we did was a HuR RNA IP as well as an ADAR
RNA IP, and asked whether the HuR RNA IP and the ADAR RNA IP pull out the same transcripts.
And, as shown here in this example, indeed ADAR and HuR are binding to the same RNA transcripts;
for example, the TMPO transcripts have both the ADAR and the HuR proteins on them.
So, one of the questions we can ask is whether there is a protein-protein interaction between
ADAR and HuR, or whether both ADAR and HuR bind to the same RNA transcript and that's why
we're seeing both proteins on the same transcript.
And, to answer this question, we dissolve away the RNA using RNase V1 and RNase
A, and ask if we still find HuR and ADAR colocalizing.
And our results show that, once we dissolve away the RNA, we no longer see these two proteins
together.
So, this suggests that they are bound to the same RNA and that's
why we find them in the same place.
So, our model, therefore, is that ADAR binds to Alu elements, and Alu elements basically
allow the RNA to fold, so that we have a stem-loop structure.
ADAR comes in and binds to the stem of the stem-loop, leaving a loop
in the RNA, and that loop is then recognized by HuR, which binds to it.
As a result, ADAR not only mediates A-to-G editing; upon the binding of
HuR, it also stabilizes the transcript.
Now, these are the two known mechanisms that I just described: APOBEC and ADAR.
So, what happened to the rest of the RDDs that we identified?
How they come about is not so simple to work out, even though
the question is just, are RNA sequences different from the DNA?
As I said, assembly is itself rather complicated.
But what we can now do is to take all these sequence reads and assemble them, but now
we have an anchor, because we now know where the A-to-G sites are and we have validated
those A-to-G sites experimentally by
ADAR knockdown and ADAR RNA IP.
So, shown here in this cartoon, let's say these are all the sequence reads
with an A-to-G editing site; we can now ask which of these reads also
have an RDD nearby on the same read.
This way it gives us some sort of an anchor.
And what we did was to take different samples from individuals. For example, from blood
samples we can extract the B cells and make B cell cultures; we have white blood cells
from blood samples; we can do skin biopsies and then from the skin biopsy we can make
induced pluripotent stem cells, and then derive other cell types from those
induced pluripotent stem cells.
From each of these samples, now, we can sequence the DNA and the corresponding RNA, and compare
them.
As I said, for the A-to-G sites, we have a lot of confidence because we can manipulate
the mediating protein.
We then take the sequence reads that have these A-to-G sites and look within the same
reads for other types of RDD.
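The anchoring idea can be sketched in a few lines: compare a read to the DNA it aligns to, and report other difference types only when the read also carries a validated A-to-G site. This is a simplified illustration under stated assumptions, not the analysis used in the study. It assumes ungapped alignment, a read written in the DNA alphabet, and a set of validated A-to-G positions; the function names and the toy sequences are hypothetical.

```python
def mismatches(read, reference, start):
    """Return (position, DNA base, RNA base) for each difference along the read."""
    out = []
    for offset, rna_base in enumerate(read):
        dna_base = reference[start + offset]
        if rna_base != dna_base:
            out.append((start + offset, dna_base, rna_base))
    return out


def rdds_linked_to_editing(read, reference, start, validated_a_to_g):
    """Other differences on a read that also carries a validated A-to-G site."""
    diffs = mismatches(read, reference, start)
    anchors = [d for d in diffs
               if d[0] in validated_a_to_g and d[1:] == ("A", "G")]
    if not anchors:
        return []  # no anchor on this read, so nothing is reported
    return [d for d in diffs if d not in anchors]


# Toy sequences, made up for illustration.
reference = "GGATCCATTGCA"
read = "GTCCGTCGC"  # aligned to reference positions 2..10
linked = rdds_linked_to_editing(read, reference, 2, {2, 6})
print(linked)
```

In this toy read the two A-to-G anchors sit at positions 2 and 6, and the one remaining difference is a T-to-C change on the same read.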
So, shown here is such an example, where in this one sequence read there are two A-to-G sites
and nearby is a T-to-C RDD.
What we see here is that, in leukocytes, the RDD level and editing level are very low, whereas
they're much higher in cultured B cells.
And these editing and RDD levels are actually independent of gene expression, so it is as if
editing and RDDs add a whole layer of complexity in the RNA, in addition to the
gene expression level.
So, next, I really want to think about, what are the mechanisms that mediate RDDs?
I already showed you that upon ADAR knockdown the only types that drop significantly are
A-to-G.
So, the ADAR protein, basically, only mediates A-to-G editing, not the other types.
So, knowing that the deaminases do not play a key role in RDD, we basically have to go
back to square one and ask, how do these RDDs come about?
So, we want to ask the question, when during transcription does RDD occur?
So, are RDDs formed during RNA synthesis, or do they form after the RNA has been made?
And, in order to answer this question, we collaborated with John Lis at Cornell University.
John's group invented two methods to study nascent RNA in a very careful manner.
These two methods are called GRO-seq and PRO-seq.
Essentially, these two methods combine nuclear run-on assays with deep sequencing.
So, what these two methods allow us to do is to identify nascent RNA.
So, we do an in vitro nuclear run-on assay, and only the RNAs that have an RNA polymerase
on them can be extended.
And these extensions are done with biotinylated bases.
So, not only are the nascent RNAs made in vitro, but the biotin also gives us a tag,
so we can pull out those nascent RNAs, and we know exactly where the RNA polymerase is,
because usually once we add that one base the RNA polymerase stalls.
So, the base is added and we know, right behind it, is the RNA polymerase.
And this allows us to ask, where in the nascent RNA strand are RDDs relative to the RNA polymerase?
So, what did we find?
We found that the newly made RNA, when it's still in the polymerase bubble,
is actually identical in sequence to the corresponding DNA; there are no RDDs.
And as the nascent RNA exits the polymerase bubble and is capped, at that point it still
is identical in sequence to the underlying DNA; there are no RDDs.
As the nascent RNA continues to move along, what we found was that we see RDD formation
abruptly at about 55 to 60 nucleotides outside of the polymerase bubble.
So, what we identified, therefore, is that RDDs are not generated during RNA synthesis,
but they are made a few seconds after the RNAs have exited the RNA polymerase complex.
And so, shown to you here are some of the actual results.
The y axis is the RDD frequency and on this graph, here,
I show you where the RNA polymerase is.
And you see an abrupt rise of RDD somewhere about 55 to 60 nucleotides outside of the
polymerase bubble.
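A plot like this comes down to binning sequenced bases by their distance from the polymerase and computing the mismatch frequency in each bin. The sketch below is a minimal illustration under stated assumptions: each observation is a pair of (distance from the polymerase in nucleotides, whether the RNA base differs from the DNA), the 10-nucleotide bin size is arbitrary, and the handful of observations are made up to show the mechanics. They are not results from the study.

```python
def rdd_frequency_by_distance(observations, bin_size=10):
    """Return {bin start: mismatches / bases} for each distance bin."""
    totals, hits = {}, {}
    for distance, is_mismatch in observations:
        b = (distance // bin_size) * bin_size
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + int(is_mismatch)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


def first_bin_above(freqs, threshold):
    """First distance bin whose RDD frequency exceeds the threshold, if any."""
    for b, f in freqs.items():
        if f > threshold:
            return b
    return None


# Made-up observations, for illustration only.
observations = [
    (5, False), (15, False), (25, False), (45, False),
    (58, True), (59, False), (62, True), (75, True), (78, False),
]
freqs = rdd_frequency_by_distance(observations)
print(freqs)
print(first_bin_above(freqs, 0.0))
```

An abrupt rise would show up as a run of bins at zero followed by bins with a clearly non-zero frequency.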
And this has nothing to do with the sequencing technology itself, because when we sequence
mRNA we see that RDDs are basically found all along the sequence read; we don't see
this kind of abrupt rise 55 or 60 nucleotides outside of the polymerase bubble.
So, we can validate these RDDs in nascent RNA using different methods; here I'm showing
you data from digital droplet PCR.
This is a G-to-A RDD: in the genomic DNA it's 100% G, so there's no
A, whereas in the nascent RNA we have about 15% A that's not encoded by the DNA.
And as this nascent RNA is made into mature mRNA, there is still A in those mature mRNAs
in the nucleus and also in the cytoplasm.
And we can also use digital droplet PCR to validate the sequencing results, and they always
match the sequencing results, showing that these are not just from sequencing errors.
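For readers unfamiliar with how a percentage like this comes out of droplet counts, here is a minimal sketch of the standard Poisson correction used in droplet-based digital PCR. It is a general illustration, not the analysis from the study: the function names are hypothetical, the droplet counts in the example are made up, and it assumes one probe for the A allele and one for the G allele read out over the same set of droplets.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean copies per droplet from the positive-droplet fraction."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1 - positive / total)


def fraction_a(positive_a, positive_g, total):
    """Fraction of template molecules carrying A rather than G."""
    lam_a = copies_per_droplet(positive_a, total)
    lam_g = copies_per_droplet(positive_g, total)
    return lam_a / (lam_a + lam_g)


# Made-up droplet counts, for illustration only.
print(fraction_a(0, 1000, 20000))    # a sample with no A-positive droplets
print(fraction_a(150, 850, 20000))   # a sample with a minority of A-positive droplets
```

The correction matters because a droplet can hold more than one template molecule, so the raw fraction of positive droplets slightly undercounts the true copy number.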
So, what have we found so far?
We know that the RNA polymerase does not catalyze RDD formation, that there are no modified
or unusual bases that are incorporated during synthesis that could explain these RDDs, and
that RDD formation occurs a few seconds after RNA synthesis.
So, what I'm showing you here are some of the genes that contain these RDDs, and many
of them play a role in RNA regulation and other types of nucleic acid metabolism.
And, as I said, the RNA polymerase does not catalyze RDD formation and modified bases
are not incorporated during synthesis to explain the RDD.
The question that therefore remains is: if it happens 60 to 100 nucleotides outside of the
bubble, what mediates that process?
But that is not so simple a question, because many things can happen to the RNA that could mediate
RDD formation.
So, we thought that maybe looking in human cells is too complicated.
So, we came up with a wishlist for studying RDDs.
We want an organism that allows us to understand the mechanisms that underlie RDDs, and on our
wishlist is: an organism with a small genome, so we can sequence it very deeply;
in order to map those sequence reads more easily, we want it to have a haploid genome
and also to have very few repeats; we also want to have a lot of genetic and biochemical
tools available for us to study the process genetically and biochemically.
So, I think, looking at this list, an organism easily emerged and that is yeast.
So, we decided to turn to studying
Saccharomyces cerevisiae for RDDs.
The genome is only 12.1 million bases, compared to about 3,000 million in human.
It's haploid, with very few repeats, and there are also very few introns, so it's very easy to
map the sequence reads, and there are gene deletion libraries for us to use
genetic approaches to study what underlies RDDs.
So, what we did was to take different strains of yeast and sequence the DNA very deeply; now,
with a small genome, we can sequence them to over 200x coverage.
We can also sequence the RNA much more deeply.
So, for each of these strains we sequenced the DNA and RNA deeply and then we looked for
RDDs.
And, again, as in humans, we found all 12 types of RDDs in the mRNA of yeast.
And here I'm just showing you the data from S288C, which is the reference strain
in yeast, but in other strains we also find a very similar distribution of RDDs, and the
frequency of RDDs in all strains is about 1 per 10,000, very similar to what we found
in human.
But what does it really translate to?
This means finding about 700 to 1,000 RDDs in one genome of yeast and, just to put it in
reference to, say, introns, there are about 281 genes in yeast with introns and there
are about 500 to 600 genes with RDDs.
So, there are certainly more genes with RDDs than genes with introns in the yeast genome.
So, we've turned to yeast in order to study the mechanisms, right?
So, we want to be sure the observation we made in human, which is that RDDs arise not
during RNA synthesis but soon after synthesis, also holds in yeast.
So, we carried out the same nascent RNA sequencing using the method that John Lis' group developed
called PRO-seq.
And we made exactly the same observation, that is, that RDDs are formed about 60 nucleotides
outside of the polymerase bubble.
I've shown you, here, in this slide that there's an abrupt rise of RDDs about 60 nucleotides
outside of the RNA polymerase bubble.
So, this gives us some places to look and ask, so what happens a few seconds after
synthesis?
We have to remember, at that time, the two DNA strands are still partially
unwound, and you have a newly made RNA strand still close by.
So, what happens is that the nascent RNA strand basically can re-hybridize with the DNA template,
forming a three-stranded nucleic acid structure often referred to as an R-loop.
So, we asked: if there are R-loops forming at about the same time as RDDs are
formed, could these R-loops be mediating RDDs?
But, the first thing we need to do is to ask, where are R-loops in yeast?
So, luckily there is an antibody called S9.6 that is very specific for these R-loops or
RNA-DNA structures.
So, we used this antibody, S9.6, to do an RNA-DNA IP and then sequenced the immunoprecipitated
products.
This allows us to identify where R-loops are across the genome.
So, in yeast, we found over 1500 R-loops.
The big question here is whether these R-loops are where RDDs are.
So, here's a metagene of where R-loops are across a standard gene, and we superimpose
on this, basically, where RDDs are.
And what we found was that the RDDs indeed follow a very similar distribution in genes
to the R-loops.
But that is just association.
What we're saying is that, well, RDDs are found where R-loops are; we really don't know
that the R-loops are contributing to RDD formation.
But since, in yeast, there is a very rich gene deletion library, we can turn to that
for help.
R-loops, since they are oftentimes trailing an RNA polymerase complex,
really have to be resolved before normal transcription can occur.
So, cells, whether in yeast or humans, have developed sophisticated ways to resolve these
R-loops.
One way that cells resolve these R-loops is by using an RNA-DNA helicase
to relax the structure.
Another way that the R-loops can be resolved is by digesting away the RNA strand using
ribonucleases, or cells can use topoisomerases to relax the DNA strand.
All these ways will allow R-loops to resolve.
So, we looked at yeast strains that have mutations in senataxin, which is that RNA-DNA helicase,
in ribonucleases, and in topoisomerases, to see what happens in these yeast mutants
that should not be able to resolve R-loops.
So, again here, we do a dot blot using the same antibody as before, that S9.6, that is
specific for R-loops.
Since these mutants have deficiency in genes that resolve R-loops, not surprisingly, they
have more R-loops.
So, now the question is, if these yeast mutants have more R-loops, what happened to their
RDDs?
We found that these mutants with more R-loops also have more RDDs, and this is in both types.
Regardless of genetic background, we see the same pattern.
And, in this case, for senataxin, the mutant is a temperature-sensitive strain, so when
we shift the temperature to 34 degrees, basically, the senataxin protein is no longer active,
and, correspondingly, we see more R-loops and more RDDs.
So, this suggests that R-loops and RDDs are indeed coupled.
So, what happens in human?
What happens to the RDDs in human: are they also sensitive to R-loop formation?
One way we approached this question is through a very unusual form of ALS that has a childhood
onset.
So, this juvenile form of ALS is an autosomal dominant form of amyotrophic lateral sclerosis.
It's due to a mutation in the gene senataxin.
You may remember from the last slide that senataxin is this helicase that resolves RNA-DNA
hybrids, or R-loops.
So, these patients with this unusual form of ALS have a mutation in senataxin, and it's
unusual in the sense that the onset is usually in the mid-teens, so it is
a very early-onset form of ALS.
But this form of ALS is also very indolent, so most of the patients have a full
lifespan.
It's also a progressive motor neuron disease, just like other ALS, and patients
present with weakness, with muscle wasting, and oftentimes they develop difficulty
walking in their 50s and 60s.
What distinguishes this unusual form of ALS from classic ALS is that patients do not develop
problems breathing or swallowing, so there is no bulbar involvement.
This disease, since it's due to a mutation in senataxin, gives us an opportunity to ask whether,
in human, R-loops and RDDs are also coupled.
To study R-loops in human cells, people oftentimes turn to this gene,
beta actin.
Other studies have already mapped out where the R-loops are formed in beta actin, and
they are shown here in this graph by the arrow and the numbers 3, 4, 5, and 6.
So, indeed, in normal individuals, shown here in a blue bar, we found R-loops in regions
3, 4, 5, and 6.
Correspondingly, in the patient, we see a significant decrease in R-loops.
So, this is very different from the yeast mutant that I showed you.
In the yeast mutant, it was a knockout of the senataxin, so there's no senataxin protein,
so you see more R-loops.
In the patient, the mutation is a gain-of-function mutation, so the RNA
helicase actually works better, so it resolves more R-loops.
As a result, patients actually have fewer R-loops.
But it gives us another opportunity to ask, what is the relationship
between R-loops and RDDs?
So, the patients have fewer R-loops; what happens to their RDDs?
And beta actin indeed is a perfect gene, because there's actually an RDD in the gene itself.
It's a G-to-A RDD.
So, now we can ask, in the patients, relative to the normal controls, who have a normal copy
of senataxin, what happens to this RDD?
And we see that, not only do the patients have fewer R-loops, they also have a lower RDD
level.
And that's not only for this G-to-A RDD, because when we look genome-wide we see that, overall,
they also have a lower RDD frequency, showing that R-loops indeed are coupled to RDD formation.
So, in conclusion, what we found was that RNA-DNA sequence differences, or RDDs, are
conserved from yeast to human, that RDDs are a way to diversify both the transcriptome
and the proteome, and that RDD formation is coupled to R-loops.
And I'd like to end by thanking all the individuals in my lab who did the work, in particular,
Isabel Wang, as well as collaborators from the University of Pennsylvania, from Cornell University,
from Johns Hopkins University, as well as the NIH.
