- [Voiceover] In this module
we'll discuss genome editing using
the CRISPR-Cas9 system in mammalian cells.
Traditional gene targeting
is technically challenging
and relies on the process
of homologous recombination.
Spontaneous homologous recombination
occurs at a very low frequency,
and thus is an intrinsically
inefficient process,
which has required the use
of antibiotic selection
and other tricks to isolate the rare cells
in which gene mutagenesis
has been successful.
Genome editing takes
advantage of new technologies
that let you introduce
double-strand breaks
anywhere you like in the genome.
By causing a double-strand break,
you can dramatically improve
the efficiency of mutagenesis,
whether you're simply
trying to knock out a gene,
or trying to knock in a specific DNA variant
or stretch of DNA.
Genome editing tools have
now been well validated
to work both in vitro,
in cells in culture,
as well as in vivo, in
organisms ranging from
fruit flies to zebrafish to mice,
to even non-human primates.
The cell has two major ways in which
it can repair double-strand breaks.
One method is non-homologous end joining.
It takes the two ends and
simply puts them back together.
But this is an error-prone process
that often results in the insertion
or deletion of nucleotides.
The other method by which
the cell can repair the break
is homology-directed repair.
Ordinarily, the cell will use a sister
chromatid or chromosome
as the repair template
via homologous recombination.
The repair template allows the area
of the double-strand break
to be cleanly replaced.
You can exploit the
homology-directed repair mechanism
by providing the cells
with large quantities
of a traditional double-strand
targeting vector.
Alternatively, you can provide a
single-strand DNA oligonucleotide
that matches the sequence around the site
of the double-strand break.
In either case, you can fool the cell
into inserting a mutation into the genome
by putting the mutation in the middle
of the repair template.
Here's a schematic of the
two repair mechanisms.
On the left, you can see
how homology-directed repair
allows you to perform
site-directed mutagenesis
and create a specific mutant cell line.
On the right, you can see how
non-homologous end joining
results in the introduction of a variety
of small indels into the genome,
generating a variety of mutant cell lines.
Over the past decade,
a number of different
genome editing tools have
emerged into widespread use,
including zinc finger nucleases,
meganucleases, and TALENs.
Each of the tools has its
advantages and disadvantages.
The most recent advance
is the CRISPR-Cas9 system,
which has created significant excitement
in the biomedical community because of
its efficacy and its ease of use.
The CRISPR-Cas9 system is based on
a recently-characterized
adaptive immune system
found in bacterial species,
and used by the bacteria
to protect against foreign DNA molecules.
The system comprises both
protein and RNA components.
The protein, called Cas9,
has a variety of functions.
It can act as a helicase and
unwind double-strand DNA.
It can recognize and bind
a particular DNA sequence,
and recognize and bind RNA sequences.
It can produce a
double-strand break in DNA.
In the simplified system
that's now being used
for genome editing in mammalian cells,
the RNA component is a
so-called guide RNA, or gRNA,
that's about 100 nucleotides in length.
This guide RNA is also
known as the CRISPR RNA.
Cas9 binds to this guide
RNA, which itself hybridizes
to one strand of double-strand DNA,
as indicated by the red oval.
Cas9 also binds to several
adjacent nucleotides in the DNA.
Thus, a triple complex of
protein, RNA, and DNA is formed.
The specificity of this complex is encoded
in the first 20 nucleotides
of the guide RNA,
indicated here in blue.
By changing this 20-nucleotide sequence,
one can change the DNA sequence
to which the protein-RNA
complex will bind.
Once bound, the complex will generate
a double-strand break in the DNA.
There are some clear advantages
to the CRISPR-Cas9 system.
The Cas9 protein is a fixed component.
It remains the same, regardless of which
DNA sequence you wish to target.
This contrasts with other
genome editing tools
like zinc finger nucleases and TALENs,
where new proteins must be produced
for each new DNA sequence
that's to be targeted.
With CRISPR-Cas9, it's the
RNA component that's changed.
In order to change the specificity
of the CRISPR-Cas9 complex,
all you need to do is change the first
20 to 21 nucleotides of the guide RNA.
Because all this requires is
very simple molecular biology,
it only takes a day of laboratory work
to create a new guide RNA.
Indeed, it's so straightforward
to make guide RNAs,
that you can make a large library
of guide RNAs all at once.
For example, a library that covers
all of the genes in the genome.
Another advantage of CRISPR-Cas9
is its multiplexing capacity.
If you wish to target two genes at once,
you can mix Cas9 with
two different guide RNAs
matching the two gene sequences.
CRISPR-Cas9 complexes will form and create
double-strand breaks in the
two genes simultaneously.
With the use of several
guide RNAs, you could
potentially target several
genes at the same time.
Here's a schematic
showing how genome editing
with the CRISPR-Cas9 system works.
If you want to knock out a gene,
you use a guide RNA whose
first 20 or so nucleotides
match a sequence in the
coding portion of the gene.
This DNA sequence is
known as the protospacer.
Of note, the protospacer must be adjacent
to a DNA sequence that
is known as the PAM,
highlighted here in red.
We'll learn more about
this in a few slides.
Cas9 and the guide RNA form a complex
on the protospacer in genomic DNA
and create a double-strand break.
One way the cell can repair the break
is non-homologous end joining.
It takes the two ends and
simply puts them back together.
But this is an error-prone process
that often results in the
introduction of indels,
which can result in a frameshift mutation
that prematurely truncates the protein.
If you mutate both alleles,
you can generate a full gene knockout.
No homologous recombination is needed.
No antibiotic selection is needed.
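
To make the frameshift point concrete, here is a minimal Python sketch. It assumes only that the reading frame is preserved when the net number of inserted or deleted bases is a multiple of three. The function name is illustrative and not taken from any library.

```python
def causes_frameshift(indel_size: int) -> bool:
    """Return True if a net insertion (+) or deletion (-) of this many
    bases shifts the reading frame, i.e. the size is not a multiple of 3."""
    return indel_size % 3 != 0


if __name__ == "__main__":
    for size in (-7, -3, -1, 1, 2, 3):
        label = "frameshift" if causes_frameshift(size) else "in-frame"
        print(f"indel of {size:+d} bp: {label}")
```

An in-frame indel can still disrupt the protein, so this check is only a first filter when screening clones.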
Let's say you, instead,
want to knock in a mutation.
Again, you design
your guide RNA to match
the desired site in the genome
and introduce a double-strand break.
The other way the cell
can repair the break
is homology-directed repair.
Along with Cas9 and the guide RNA,
you provide a double-strand DNA vector,
or a single-strand DNA oligonucleotide,
containing your mutation,
along with homology arms
to serve as the repair template.
At some frequency, the
cell will incorporate
the mutation into the genome.
Again, no antibiotic selection
or any other tricks are needed.
Let's highlight a couple of
common research applications
of the CRISPR-Cas9 system.
It's increasingly being used to generate
knockout and knock-in mice.
In vitro transcribed RNAs,
one a messenger RNA encoding Cas9,
the other the guide RNA,
are injected into
single-cell mouse embryos.
The intent is that mutagenesis occurs
at the target site in the
genome in some of these embryos.
The resulting blastocysts
are implanted into
surrogate mothers, and after
three weeks, pups are born.
These pups are then screened for
mutations at the target site.
This methodology works
with high efficiency,
in some cases approaching
100% mutagenesis rate.
The obvious advantages
are that knockout mice
can be generated without
ever needing to use
mouse embryonic stem cells,
and the process is much quicker
than the traditional approach
of making knockout mice.
Another common application entails the use
of human pluripotent stem cells,
whether human embryonic stem cells
or induced pluripotent stem cells,
to perform disease modeling.
One either starts with a
wild-type stem cell line
or with an induced
pluripotent stem cell line
bearing a patient-specific mutation.
CRISPR-Cas9 is used to either introduce
a disease-associated mutation,
or to correct the
patient-specific mutation.
In either case, the
result is the generation
of isogenic stem cell lines that have
the same genetic background,
epigenetic background,
and so forth.
These matched stem cell lines are then
differentiated into the
cell type of interest,
whether it's cardiac myocytes,
endothelial cells, neurons,
hepatocytes, and so forth.
In principle, any phenotypic difference
observed between the
differentiated cell lines
can be attributed to the disease mutation.
A significant advantage of
CRISPR-Cas9 is its efficiency.
However, the danger of using a tool
that's designed to cleave
the genome at a target site
is that it might also cleave
the genome at a different site
and cause so-called
off-target mutagenesis.
This phenomenon could potentially
confound one's experiments.
In general, off-target
effects are thought to be
most likely to occur
at sites in the genome
with sequence similarity
to the on-target site.
Accordingly, several web
servers have been developed
that allow you to enter
your on-target site
and search through the genome for
potential off-target
sites, with a small number
of mismatches to your on-target site.
This can be helpful in prioritizing among
several candidate guide
RNAs for a project,
as you may wish to choose the guide RNA
that seems to have the least potential
for off-target effects.
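
The kind of search these web servers perform can be illustrated with a toy Python sketch. This is not how any particular server works: it scans one strand of a made-up sequence, ignores the PAM, and uses a short 10-base target so the example stays readable. All names are illustrative.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_similar_sites(genome: str, on_target: str, max_mismatches: int = 3):
    """Return (position, site, mismatches) for every window of the genome
    string that differs from on_target at no more than max_mismatches bases."""
    genome, on_target = genome.upper(), on_target.upper()
    k = len(on_target)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        d = count_mismatches(window, on_target)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


if __name__ == "__main__":
    # Toy 10-base target and a made-up "genome"; a real search would use
    # the full 20-base site and a reference genome, on both strands.
    target = "ACGTACGTAC"
    genome = "TTTT" + target + "GGGG" + "ACGTTCGTAC" + "TTTT"
    for pos, site, d in find_similar_sites(genome, target, max_mismatches=1):
        print(pos, site, d)
```

A candidate guide RNA with fewer near-matches elsewhere in the genome would, by this crude measure, be the more attractive choice.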
A number of variants of
the CRISPR-Cas9 system
are now actively being used
in research applications.
Almost all of them are derived from
the naturally-occurring system found in
the bacterial species
Streptococcus pyogenes.
At least for now, the Strep pyogenes Cas9
and its associated gRNA architecture
are the standard in the field.
There has been extensive
work characterizing
its on-target and off-target effects.
CRISPR-Cas9 adapted from another species,
Staphylococcus aureus, has
recently been introduced.
One potential advantage
is that Staph aureus Cas9
is about three-quarters of the size
of Strep pyogenes Cas9.
Staph aureus Cas9 is just small enough
to fit into an adeno-associated
virus, or AAV vector,
along with the guide RNA.
This makes it possible to use CRISPR-Cas9
for a variety of in vivo
genome editing applications.
Initial studies suggest that
Staph aureus CRISPR-Cas9
can have similar on-target efficiency,
along with fewer off-target effects,
compared to Strep pyogenes CRISPR-Cas9.
Here's one system by which to introduce
Strep pyogenes Cas9 and a
guide RNA into mammalian cells.
You can express them from DNA plasmids.
The guide RNA can be
expressed from a plasmid
with a U6 promoter, as shown here.
Remember that the first 20
nucleotides of the guide RNA
can be changed so as to determine
the genomic DNA sequence to which
the CRISPR-Cas9 complex will bind.
The remainder of the guide RNA
remains exactly the same.
It's very easy to custom
design a guide RNA
to bind to a desired DNA sequence.
In this system, two complementary
single-strand DNA
oligonucleotides, or oligos,
are used to insert the
desired 20 nucleotides
into the plasmid in such a way
as to put them at the 5
prime end of the guide RNA.
A single ligation reaction
is all that's needed.
Conveniently, the Cas9
protein needs no alteration.
The same version of
the protein can be used
for targeting of any genomic DNA sequence.
In the plasmid shown
here, Strep pyogenes Cas9
is expressed using a
strong promoter called CAG.
The plasmid co-expresses a green
fluorescent protein or GFP,
which is convenient for marking cells
that are successfully expressing Cas9.
After the guide RNA plasmid is completed
with a single-ligation reaction,
the two plasmids can be
introduced into cells,
typically, by using the techniques
of transfection or electroporation.
Of note, the two-plasmid system shown here
is one of many different
systems that are available
to express Strep pyogenes
CRISPR-Cas9 in cells.
Here is an analogous system
by which to introduce Staph aureus Cas9
and a guide RNA into cells.
This system also uses two plasmids,
which are similar but not interchangeable
with the two plasmids
used for Strep pyogenes
that were shown on the last slide.
The Staph aureus guide RNA is different
from the Strep pyogenes guide RNA.
One difference is that
the protospacer length
for Staph aureus is 21 nucleotides,
rather than 20 nucleotides.
Here are some rules for designing
the Strep pyogenes CRISPR guide RNA.
First, the protospacer is
20 nucleotides in length,
so one must choose a
protospacer of that length
in genomic DNA.
Second, the protospacer must be positioned
just upstream of a 3-base pair element
that matches the sequence NGG,
which means any nucleotide
followed by two guanines.
This element is known as the
protospacer-adjacent motif, or PAM.
The PAM is directly recognized by Cas9.
Without the PAM, no complex can form.
Next, the 5 prime portion of the guide RNA
must match the protospacer.
It is this portion that hybridizes
to the complementary strand of DNA,
the mechanism by which
sequence recognition occurs.
Of note, because you're
using the U6 promoter,
there's a specific constraint.
The guide RNA must start with a guanine
in order for it to be transcribed.
Thus, you should add a G to the beginning
of the protospacer, making
it a 21-base sequence
that you're placing at the 5
prime end of the guide RNA.
The extra base at the very
beginning of the guide RNA
does not affect binding
of the complex to DNA.
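
The design rules above can be sketched in Python as follows. This is a minimal illustration that looks only at the strand you give it. The function names are hypothetical, and the input is assumed to contain only A, C, G, and T.

```python
import re


def find_protospacers(seq: str, length: int = 20):
    """Return (start, protospacer, pam) for each site on the given strand
    where a protospacer of the given length sits just upstream of NGG."""
    seq = seq.upper()
    sites = []
    # A lookahead reports every PAM, including overlapping ones.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        if pam_start >= length:
            start = pam_start - length
            sites.append((start, seq[start:pam_start], seq[pam_start:pam_start + 3]))
    return sites


def guide_5prime_sequence(protospacer: str) -> str:
    """Prepend the extra G needed for U6 transcription (20 -> 21 bases)."""
    return "G" + protospacer.upper()


if __name__ == "__main__":
    toy = "A" * 20 + "TGG"  # made-up sequence with a single NGG PAM
    for start, protospacer, pam in find_protospacers(toy):
        print(start, protospacer, pam, guide_5prime_sequence(protospacer))
```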
Here are some suggestions
for choosing a site
to target in the genome.
First, it's important to note
that the double-strand break
generated by Cas9 occurs three base pairs
upstream of the PAM in the position
indicated here by the red line.
When mutations occur by
non-homologous end joining,
they tend to occur
right at the break site.
It's also important to realize
that the CRISPR-Cas9 complex can form
on either strand of double-strand DNA.
You should always check for
protospacer-PAM combinations
on both strands in order
to find the optimal one.
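
Here is a hedged Python sketch of checking both strands and mapping the break position. It assumes the Strep pyogenes rules described so far: a 20-base protospacer, an NGG PAM, and a break three base pairs upstream of the PAM. The coordinate convention and function names are choices made for this example only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites_both_strands(seq: str, length: int = 20):
    """Return (strand, protospacer, pam, cut) for NGG sites on either strand.

    `cut` is a 0-based index into `seq`: the break falls between
    seq[cut - 1] and seq[cut], three base pairs from the PAM.
    """
    seq = seq.upper()
    n = len(seq)
    sites = []
    # Top strand: protospacer followed by NGG.
    for i in range(length, n - 2):
        if seq[i + 1:i + 3] == "GG":
            sites.append(("+", seq[i - length:i], seq[i:i + 3], i - 3))
    # Bottom strand: seen on the top strand as CCN followed by the
    # reverse complement of the protospacer.
    for i in range(0, n - length - 2):
        if seq[i:i + 2] == "CC":
            protospacer = reverse_complement(seq[i + 3:i + 3 + length])
            pam = reverse_complement(seq[i:i + 3])
            sites.append(("-", protospacer, pam, i + 6))
    return sites


if __name__ == "__main__":
    for site in find_sites_both_strands("CCA" + "A" * 20):
        print(site)
```

Reporting the cut in top-strand coordinates makes it easy to compare candidates from both strands against the position you want to change.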
In general, your goal should be
to choose a guide RNA that will position
the double-strand break
as close as possible
to the actual site at which you wish
to introduce a change in the DNA sequence,
whether it's an indel to knock out a gene,
or a variant you're trying to knock in.
When searching for well-positioned
protospacer-PAM combinations,
you may find several good ones.
You can then prioritize
among the candidates.
For example, you can
profile their possible
off-target binding sites
elsewhere in the genome
and choose the one that appears to be
most favorable in that respect.
Finally, if possible, it's best to avoid
protospacers that have lots
of guanines and cytosines,
or to put it another way, are GC-rich,
as this has been suggested
to increase the chance
of off-target effects.
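
GC content is simple to compute, so a short Python sketch may help when comparing candidates. No cutoff for "GC-rich" is given here, so this example only ranks protospacers from lowest to highest GC fraction. The function names are illustrative.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def rank_by_gc(protospacers):
    """Return the candidates sorted from least to most GC-rich."""
    return sorted(protospacers, key=gc_fraction)


if __name__ == "__main__":
    candidates = ["GGGGCCCCAT", "ATATATGCAT"]  # toy 10-base examples
    for p in rank_by_gc(candidates):
        print(p, round(gc_fraction(p), 2))
```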
The rules for designing
the Staph aureus guide RNA
are largely the same, with
a few critical distinctions.
The protospacer is 21
nucleotides in length,
rather than 20 nucleotides.
The protospacer must be positioned
upstream of a different PAM.
The Staph aureus PAM is more complex
than the Strep pyogenes PAM,
with the sequence NNGRR,
where R is a purine,
either guanine or adenine.
The optimal PAM is thought
to be slightly longer,
with the sequence NNGRRT.
As before, the 5 prime
portion of the guide RNA
must match the protospacer.
Because you're still
using the U6 promoter,
the guide RNA must start with a guanine
in order for it to be transcribed.
Thus, you should add a G
to the beginning of the protospacer,
making it a 22-base
sequence that you're placing
at the 5 prime end of the guide RNA.
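
The Staph aureus rules can be sketched the same way in Python. This example assumes a 21-base protospacer, an NNGRR PAM, a preference for a T immediately after it (NNGRRT), and an extra 5 prime G for the U6 promoter, as described above. It scans only the strand you provide, and the names are illustrative.

```python
import re

# NNGRR, where R is a purine (A or G); the lookahead allows overlapping hits.
SA_PAM = re.compile(r"(?=([ACGT]{2}G[AG][AG]))")


def find_sa_protospacers(seq: str, length: int = 21):
    """Return (protospacer, pam, has_preferred_T, guide_5prime) tuples."""
    seq = seq.upper()
    sites = []
    for match in SA_PAM.finditer(seq):
        pam_start = match.start()
        if pam_start < length:
            continue  # not enough sequence upstream for a full protospacer
        protospacer = seq[pam_start - length:pam_start]
        pam = match.group(1)
        # NNGRRT: is the base right after the 5-base PAM a T?
        has_preferred_t = seq[pam_start + 5:pam_start + 6] == "T"
        sites.append((protospacer, pam, has_preferred_t, "G" + protospacer))
    return sites


if __name__ == "__main__":
    for site in find_sa_protospacers("A" * 21 + "CTGAGT"):
        print(site)
```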
Here are some more general suggestions
for choosing a target site in the genome
for your project.
If you're trying to knock out a gene,
there's quite a bit of flexibility
with respect to target sites,
because all you need to do
is introduce a frameshift
early in the coding sequence of the gene.
The exact location is
usually not important.
Because it's ideal to make the truncated
protein product as short as possible,
you'll typically want to target a sequence
in the first exon that
contains coding sequence.
Sometimes, however, this may not work
if the gene in question has
alternative start sites,
or alternative splicing of exons.
It's always worth checking
in the UCSC Genome Browser
to see which transcripts of the gene
have been identified,
and if there is a lot of
heterogeneity among the transcripts
with differing start sites
or splicing patterns.
It's best to target the
earliest coding exon
that's shared by all of the transcripts.
If you're trying to knock in a variant,
your site selection will
be constrained by the need
to place the double-strand break
as close as possible to
the site of the variant,
ideally less than 10 base pairs away.
Keep in mind that when you're identifying
the site of a mutation, particularly one
that has been reported in the literature,
you'll need to use the complementary DNA
or cDNA sequence.
That is, a coding sequence in which
all of the introns have
been removed to do this.
However, when you're
designing the guide RNA,
you'll need to use the genomic sequence
surrounding the site.
If you use the cDNA sequence,
there's a chance that your site is near
an exon/intron junction,
and your protospacer
may inadvertently span across two exons.
Of course, this guide RNA will
fail to bind to the genome,
since it doesn't take into account
the presence of an intron in
the midst of the sequence.
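
A toy Python example shows why a protospacer taken from cDNA can fail. The exon and intron sequences below are invented for illustration, and the protospacers are shortened to 10 bases. The check simply asks whether the protospacer, or its reverse complement, occurs in the genomic sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def found_in_genome(protospacer: str, genomic: str) -> bool:
    """True if the protospacer matches either strand of the genomic sequence."""
    protospacer, genomic = protospacer.upper(), genomic.upper()
    return protospacer in genomic or reverse_complement(protospacer) in genomic


if __name__ == "__main__":
    # Made-up sequences: two exons separated by one intron.
    exon1, intron, exon2 = "ATGGCCAAGT", "GTAAGTTTTCAG", "CCTGAAGGTC"
    genomic = exon1 + intron + exon2
    cdna = exon1 + exon2

    within_exon = cdna[0:10]       # lies entirely inside exon 1
    spans_junction = cdna[5:15]    # crosses the exon 1 / exon 2 boundary
    print("within exon:", found_in_genome(within_exon, genomic))
    print("spans junction:", found_in_genome(spans_junction, genomic))
```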
Let's now consider an
example of CRISPR design.
Imagine that we're trying to make
a cellular model of the
cholesterol disorder
known as familial combined hypolipidemia.
The responsible gene is ANGPTL3,
with loss of function mutations
resulting in the disorder.
The most commonly found mutation
is the S17X nonsense mutation.
Here's our task:
To design a Strep pyogenes guide RNA
that will let us target
the site of this mutation
in a wild type cell.
This will potentially
allow us to do two things.
It will let us try to
knock in the specific
S17X mutation into the genome.
However, because the site is very close
to the beginning of the coding sequence,
we could also use this guide RNA
to try to simply knock out the gene
by introducing frameshift mutations.
Here's the start of the
coding sequence of ANGPTL3.
This sequence is taken from
the human genome sequence,
so we don't have to worry about
missing exon/intron junctions.
Highlighted in red is the site of
the S17X dinucleotide mutation,
which changes a TCC codon
into a TGA stop codon.
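
The codon change can be checked with a short Python sketch built on the standard genetic code. The helper name is illustrative. The table is generated from the usual TCAG ordering rather than typed out codon by codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" is a stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def describe_codon_change(ref: str, alt: str):
    """Return (ref_amino_acid, alt_amino_acid, number_of_bases_changed)."""
    ref, alt = ref.upper(), alt.upper()
    changed = sum(a != b for a, b in zip(ref, alt))
    return CODON_TABLE[ref], CODON_TABLE[alt], changed


if __name__ == "__main__":
    print(describe_codon_change("TCC", "TGA"))
```

For TCC to TGA, the function reports a serine replaced by a stop codon with two bases changed, which is why S17X is described as a dinucleotide nonsense mutation.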
To find a suitable protospacer,
we must first look for Strep pyogenes PAMs
matching the sequence NGG.
If you look in the vicinity
of the mutation here,
you'll see that there's no nearby NGG.
However, remember that you can design
guide RNAs that match
either strand of DNA.
So we can also look for PAMs
matching the sequence CCN,
which corresponds to NGG
on the opposite strand.
Here we find three CCN sequences
near the desired mutation site.
Let's consider each of
these PAMs one by one.
For the first one, because we're now
working off the opposite DNA strand,
the protospacer will extend
in the downstream direction.
The 20-base protospacer,
once you've determined
the reverse complement sequence, is shown.
If you map the location of
the double-strand break,
it'll be three base
pairs away from the PAM,
as indicated here by the red line.
The break will occur 10 base pairs
away from the site of the mutation,
which is okay, but not optimal.
The protospacer is not GC-rich,
so that's an advantage.
There's another important consideration
when choosing the protospacer,
and that's whether the
protospacer and/or PAM
overlap the site of the mutation.
In the example shown
here, the mutation site
falls within the protospacer,
which is an advantage.
Why is this an advantage?
Consider the following scenario
where the protospacer and PAM do not
overlap the site of the mutation.
CRISPR-Cas9 introduces
a double-strand break.
The desired knock-in
mutation is successfully
introduced into the genome by
homology-directed repair.
Because the protospacer and PAM
have not been changed in
the knock-in mutant allele,
the guide RNA is still a perfect match
for the genomic sequence, and CRISPR-Cas9
can go back and re-cleave the same DNA.
If an indel then occurs via
non-homologous end joining,
then the knock-in mutant
allele will be disrupted.
It's possible that the
experiment will ultimately
yield no clean knock-in alleles.
This scenario can be avoided,
or at least mitigated,
if the protospacer or PAM is
altered by the knock-in mutation,
resulting in a sequence mismatch.
Then it becomes less likely
that re-cleavage will occur.
It's worth noting that single
or even double mismatches
may not eliminate re-cleavage,
especially if the mismatches occur
near the end of the protospacer
that's far away from the PAM.
Mismatches near the PAM tend to have
more of an inhibitory effect.
Disruption of the PAM itself,
so that it no longer
matches the sequence NGG,
will almost certainly
eliminate re-cleavage.
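
One way to reason about re-cleavage risk is to compare the target site before and after the intended edit. The Python sketch below is an illustration only: it handles base substitutions, not insertions or deletions, and it reports facts about the edited site rather than predicting cleavage. Both inputs are the protospacer followed by the PAM, read on the strand that matches the guide. The function name is hypothetical.

```python
def recleavage_report(original_site: str, edited_site: str, pam_length: int = 3):
    """Compare a protospacer+PAM site before and after a knock-in edit.

    Returns (pam_still_ngg, mismatch_distances). Each distance counts
    protospacer bases from the PAM, so 1 is the base adjacent to the PAM
    and larger numbers are farther away.
    """
    original_site, edited_site = original_site.upper(), edited_site.upper()
    if len(original_site) != len(edited_site):
        raise ValueError("this sketch only handles substitutions")
    proto_len = len(original_site) - pam_length
    edited_pam = edited_site[proto_len:]
    pam_still_ngg = edited_pam[1:] == "GG"
    distances = [
        proto_len - i
        for i in range(proto_len)
        if original_site[i] != edited_site[i]
    ]
    return pam_still_ngg, distances


if __name__ == "__main__":
    site = "A" * 20 + "TGG"  # toy site
    print(recleavage_report(site, "A" * 20 + "TGA"))        # PAM disrupted
    print(recleavage_report(site, "C" + "A" * 19 + "TGG"))  # PAM-distal mismatch
```

An edit that leaves the PAM intact and only adds mismatches far from the PAM would be the case most likely to be re-cleaved.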
Here's the second possible protospacer.
The break will now
occur just one base pair
away from the mutation site.
The protospacer is not GC-rich,
which is an advantage.
The mutation site falls
within the protospacer,
which is an advantage.
Here's the third possible protospacer.
The break will occur a little further away
than the last one, four base pairs away.
The protospacer is not GC-rich,
which is an advantage.
While the protospacer would not
be affected by the mutation,
the PAM itself would be,
and that would essentially eliminate
the possibility of re-cleavage,
which is an advantage.
On paper, the second protospacer
appears to be the optimal one,
since it results in cleavage very close
to the mutation site.
However, the best way to choose
among the three candidates
may be to actually test
them in human cells
to empirically assess which has
the highest on-target efficiency in vitro.
If the second protospacer
shows much less activity
than the first and/or third protospacer,
then it may not be the
best choice after all.
Let's assume that the second protospacer
turns out to be the best choice.
The next step is to design
the oligonucleotides
that we can use to place
the protospacer sequence
into the plasmid that will express
the guide RNA in cells.
Recall that we have to
add an extra guanine,
shown in red, to the beginning of the
protospacer sequence, shown in blue.
We can use these template oligos
to design the oligos that
will specifically target
the site of the ANGPTL3 S17X mutation.
Note that the templates shown here
are displayed in such a way as to convey
how they'll hybridize to form a small
double-strand DNA insert that can be
ligated into the vector.
When the desired protospacer
is encoded into the oligos,
you see the result at
the bottom of the slide.
At this point, we can
simply purchase these oligos
from a vendor.
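
The oligo design step might be sketched in Python as below. The overhang sequences are an assumption: they depend entirely on the guide RNA plasmid and the restriction enzyme used to open it, and the defaults here are placeholders that should be replaced with the ones from your plasmid's template oligos. The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_cloning_oligos(protospacer: str,
                          top_overhang: str = "CACC",      # assumed, vector-specific
                          bottom_overhang: str = "AAAC"):  # assumed, vector-specific
    """Return (top_oligo, bottom_oligo), both written 5'->3'.

    The insert is the extra G followed by the protospacer. The bottom
    oligo carries the reverse complement of the insert so that the two
    oligos anneal into a small double-strand fragment with overhangs.
    """
    insert = "G" + protospacer.upper()
    top = top_overhang + insert
    bottom = bottom_overhang + reverse_complement(insert)
    return top, bottom


if __name__ == "__main__":
    top, bottom = design_cloning_oligos("A" * 20)  # toy protospacer
    print("top:   ", top)
    print("bottom:", bottom)
```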
If we were simply trying
to knock out the gene,
we'd be done, since all we'd need are
the guide RNA plasmid,
which we can now complete
with a single ligation step,
and the fixed Cas9 plasmid
that can be used as is.
However, if we wanted to knock in
the specific S17X mutation,
we'd need a repair template as well,
ideally a single-strand
DNA oligonucleotide.
It's worth emphasizing
that knocking in a mutation
relies on homology-directed repair.
However, even in the best-case scenario,
non-homologous end joining will occur
in parallel with homology-directed repair.
So even if you're adding
a single-strand DNA oligo
as a repair template,
you'll likely obtain a mix of cells,
some with the S17X mutation,
and others with indels at the
site of the desired mutation.
There's not yet a standard method
to enrich for the first
type, which we want,
and prevent the second
type from occurring,
which we may or may not want,
although such methods
are under development
and are starting to be
reported in the literature.
To design the oligo repair template,
you can simply take the desired mutation
and flank it with at least 40 nucleotides
of homology on both sides,
taken directly from the genomic sequence.
In this example, the
dinucleotide S17X mutation
is embedded in the middle of
a single-strand DNA oligo,
with 40 nucleotides of
homology on either side.
We can simply purchase
this oligo from a vendor.
In practice, we'd probably choose to use
even longer regions of homology,
as it's feasible to obtain oligos
that are up to 200 nucleotides in length,
and there's data to suggest that
longer homology arms will
increase the efficiency
of homology-directed repair.
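
Building the repair template is mostly string slicing, as this minimal Python sketch shows. It assumes you have the genomic sequence and the 0-based coordinates of the bases to replace. The function name, the coordinate convention, and the toy sequence are all illustrative.

```python
def build_repair_template(genomic: str, start: int, end: int,
                          replacement: str, arm: int = 40) -> str:
    """Replace genomic[start:end] with `replacement` and flank it with
    `arm` bases of unmodified genomic sequence on each side."""
    genomic = genomic.upper()
    if start < arm or end + arm > len(genomic):
        raise ValueError("not enough flanking sequence for the homology arms")
    left_arm = genomic[start - arm:start]
    right_arm = genomic[end:end + arm]
    return left_arm + replacement.upper() + right_arm


if __name__ == "__main__":
    # Toy locus: a TCC codon flanked by made-up sequence.
    locus = "A" * 40 + "TCC" + "G" * 40
    template = build_repair_template(locus, 40, 43, "TGA", arm=40)
    print(len(template), template)
```

Passing a larger `arm` value gives longer homology arms, up to whatever oligo length the vendor can synthesize.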
The final step is to develop a method
by which to screen for mutations
introduced at the target site.
The most straightforward way to do this
is to design PCR primers that will amplify
a region surrounding the
target site in the genome,
ideally with the target site located
in the middle of the amplicon.
The amplified PCR product can be used
to assess the overall mutagenesis rate
through the use of assays
that detect mismatches
among the DNA sequences
present in the PCR product.
The PCR product can also be subjected
to Sanger sequencing, or
next-generation sequencing,
in order to identify
the specific mutations
introduced at the target site.
