Hi.  I'm Steve Bell.
I'm a professor of biology at MIT
and an investigator of the Howard Hughes Medical Institute.
And what I'd like to tell you about today
is some of what we know about
the mechanisms of chromosomal DNA replication.
Now, this event is primarily mediated
by the function of complex
multi-enzyme machines called replisomes,
that include three DNA polymerases,
an RNA polymerase,
as well as a DNA helicase.
And together these enzymes
must act to accurately, completely,
and rapidly replicate the genomic DNA.
The rapidity can be
illustrated by the fact that
these replisomes can move at up to
1000 base pairs per second,
and also that entire genomes
can be duplicated in as little as three or four minutes.
The accuracy is about 1 mistake
in every 10^10 base pairs.
To put that into perspective,
that would mean that if you typed
60 words per minute
for 38 years,
continuously,
you would have a single typographical error in the document you made...
so that's pretty impressive.
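As a rough check on that typing analogy, here is a small sketch.
It assumes an average of 5 characters per word and typing around the clock;
both are illustrative assumptions, not figures from the talk.

```python
# Rough check of the typing analogy.
# CHARS_PER_WORD is an assumed average; it is not from the talk.
WORDS_PER_MINUTE = 60
CHARS_PER_WORD = 5                      # assumption for illustration
YEARS = 38
MINUTES_PER_YEAR = 60 * 24 * 365.25     # typing continuously, day and night

ERROR_RATE = 1e-10                      # 1 mistake per 10^10 base pairs (from the talk)

chars_typed = WORDS_PER_MINUTE * CHARS_PER_WORD * MINUTES_PER_YEAR * YEARS
expected_errors = chars_typed * ERROR_RATE

print(f"characters typed in {YEARS} years: {chars_typed:.2e}")
print(f"expected errors at replication accuracy: {expected_errors:.2f}")
```

Under these assumptions the total comes to roughly 6 x 10^9 characters,
so about one error in the whole document is the right order of magnitude.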
And finally, it's really important
that genomic replication be complete,
because at the end of genomic replication
comes cell division,
and you need a full copy of the genes
for both of the daughter cells,
so, if you do incomplete replication,
someone is not going to get their full complement of genes.
It's also worth noting that
when you try and segregate the chromosomes
into two cells
when they're not completely replicated,
this will lead to double-stranded breaks,
which can be both mutagenic
and, in some cases, lethal.
Now, it's sometimes hard to appreciate
how much DNA replication goes on
inside our bodies,
but the next slide
is sort of an attempt to make that clear to you.
So, many of you may know that
in each one of your cells,
there's approximately 2 meters of DNA,
and some of you may know that,
in your entire body,
there's approximately 150 million kilometers,
and that's enough DNA to go from the sun to the Earth.
Now, sticking with the astronomical sense,
the most remarkable number is
not how much is in your body right now,
but the amount that you will synthesize in your lifetime,
which is upwards of a light year of DNA.
And, if you think about that,
if you're not sort of up on your astronomical units,
that's 9.5 trillion kilometers of DNA.
And so the mere fact that I can stand here
and tell you about DNA replication,
and not be just
a bumbling mass of mutagenized cells,
is a testament to both the accuracy
and efficiency of this process.
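For anyone who wants to check the light-year figure, the conversion can be sketched as follows.
The 0.34 nm rise per base pair is the standard value for B-form DNA,
brought in here as an assumption to turn the length into a base-pair count.

```python
# Convert one light year to kilometers, then to an approximate number of base pairs.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458      # defined value of c
SECONDS_PER_YEAR = 365.25 * 24 * 3600      # Julian year
RISE_PER_BP_NM = 0.34                      # assumed rise per base pair (B-form DNA)

light_year_km = SPEED_OF_LIGHT_KM_PER_S * SECONDS_PER_YEAR
light_year_nm = light_year_km * 1e12       # 1 km = 10^12 nm
base_pairs = light_year_nm / RISE_PER_BP_NM

print(f"one light year: {light_year_km:.2e} km")
print(f"about {base_pairs:.1e} base pairs of DNA")
```

That works out to about 9.5 trillion kilometers,
or on the order of 10^25 base pairs synthesized over a lifetime.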
Okay, so I'm just going to show you an animation, here,
of the process of DNA replication
at the E. coli replication fork.
And what I want to do over the next few minutes
is to tell you about the various enzymes
that are working together in this animation
so that you understand
exactly what's happening during this process.
So, the first thing I want to do
is tell you a little bit of the ground rules.
So, DNA polymerases
always extend the 3' end
of a growing DNA chain,
and this can be either
extending the 3' end of a DNA chain
or, it turns out,
it can also extend the 3' end of an RNA chain.
And in each case,
it extends this by reading
an oppositely-oriented template strand,
and in fact the place at which DNA polymerases start DNA synthesis
is called a primer-template junction,
and I'll be using that term
throughout the next few minutes
as I describe the function of these enzymes.
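These ground rules can be sketched as a toy model.
The function name and the example sequences are hypothetical, chosen only for illustration,
and the model handles a DNA primer only, to keep it short.

```python
# Toy model of primer extension. All sequences are written 5'->3'.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_5to3: str, primer_5to3: str) -> str:
    """Extend the 3' end of a primer annealed at the 3' end of the template.

    The template is read 3'->5' while the new strand grows 5'->3',
    so the two strands are oppositely oriented.
    """
    if not primer_5to3:
        raise ValueError("no primer-template junction to start from")

    # Reading the template from its 3' end is the same as reversing the string.
    template_3to5 = template_5to3[::-1]

    # Check that the primer actually base-pairs with the template.
    for primer_base, template_base in zip(primer_5to3, template_3to5):
        if COMPLEMENT[template_base] != primer_base:
            raise ValueError("primer is not complementary to the template")

    # Add one complementary base at a time onto the primer's 3' end.
    new_strand = list(primer_5to3)
    for template_base in template_3to5[len(primer_5to3):]:
        new_strand.append(COMPLEMENT[template_base])
    return "".join(new_strand)


print(extend_primer("GATTACAGGC", "GCC"))
```

The result is the reverse complement of the template,
with the original primer sitting at its 5' end.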
Now, both strands of the DNA
are replicated simultaneously
during the replication of chromosomes,
and this is to reduce the amount
of single-stranded DNA,
which is much more prone to chromosome breakage
than is double-stranded DNA.
Now, there are two different types of DNA polymerases
acting at this time.
One is called a leading-strand DNA polymerase,
and it acts by extending
the leading strand DNA
towards the unreplicated DNA,
or in the same direction
as the overall DNA replication process.
So, this is very easy to understand.
It's going to follow right behind
the unreplicated,
double-stranded form of the DNA.
In contrast,
replication of the opposite strand
has to move in the opposite direction,
away from the direction of overall fork movement
or replication.
And so, in this instance,
the primers will be formed
and the polymerase will move
away from the overall unreplicated DNA,
and in the opposite direction of the fork movement.
Now, these two events are happening simultaneously
at the replication fork,
as is illustrated here,
and you can see that the lagging strand DNA polymerase
is moving in one direction,
the leading strand is moving in the opposite direction.
And when you finish an Okazaki fragment,
as these smaller fragments that are made
on the lagging strand are called,
you then reposition the polymerase
and start a new Okazaki fragment,
and, eventually, at the end of the replication process,
these primers that are used,
which I'll have more to say about in a moment,
have to be removed,
and the DNA linked together
to form a continuous strand,
unlike the leading strand,
where it is continuously synthesized.
Now, one property of DNA polymerases,
all DNA polymerases,
is that they cannot start a new DNA strand
by joining two deoxynucleotide triphosphates.
They have to have a primer,
in the form of a primer-template junction.
And so, in order to initiate the new strands
that are required for the replication process,
we need a different enzyme
called DNA primase.
And what DNA primase does is
it synthesizes RNA primers.
And this is because, unlike DNA polymerases,
RNA polymerases,
of which DNA primase is a specialized form,
can take two ribonucleotides
and initiate a new strand of RNA.
Okay.
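The division of labor between primase and polymerase can be sketched like this.
The function names, the 4-nucleotide primer length, and the template sequence are all hypothetical,
and lowercase letters are used here just to mark RNA.

```python
# Toy model: primase starts a strand from nothing, DNA polymerase cannot.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "u", "T": "a", "G": "c", "C": "g"}  # lowercase marks RNA


def primase(template_3to5: str, length: int = 4) -> str:
    """Start a new strand de novo by copying `length` template bases into RNA.

    The primer length is an illustrative assumption.
    """
    return "".join(RNA_COMPLEMENT[base] for base in template_3to5[:length])


def dna_polymerase(template_3to5: str, primer: str) -> str:
    """Extend an existing primer with DNA; refuse to start without one."""
    if not primer:
        raise ValueError("DNA polymerase needs a primer-template junction")
    extension = "".join(DNA_COMPLEMENT[base] for base in template_3to5[len(primer):])
    return primer + extension


template = "CGGACATTAG"                  # written 3'->5' so it reads left to right
primer = primase(template)               # short RNA primer
strand = dna_polymerase(template, primer)
print(primer, strand)
```

The finished strand starts with a short stretch of RNA followed by DNA,
which is why the primers have to be removed later.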
Now, importantly, once it does this
it can be extended by the DNA polymerase
-- it's basically forming a primer-template junction --
and one interesting property of the DNA primase in E. coli
is that it is stimulated to act
by interacting with another important protein
that acts at the replication fork,
called the DNA helicase.
Now, replicative DNA helicases
always come in the form of
hexameric, ring-shaped structures,
as you see, here, okay.
And these hexameric ring-shaped structures
will encircle one of the two strands of the DNA,
and they will then move
in an ATP binding
and ATP hydrolysis-dependent fashion,
in a defined direction
along this single-stranded DNA,
and by doing so they will
displace the other strand of DNA.
So, you can see that, here,
with the helicase unwinding the DNA.
Now, I've looped this,
so you'll see it a few times,
but I want to point out, also,
that the direction that a helicase moves
on its encircled strand
is a property of the helicase,
and in this particular case
I've illustrated the E. coli
replicative DNA helicase, called DnaB.
And its polarity, as this property is called,
is in the 5'-to-3' direction,
so you can see it's starting at the 5' end
and moving towards the 3' end of the DNA.
Now, we've talked about
a number of proteins that are involved at this point,
but one that is not an enzyme,
unlike the ones we've talked about thus far,
has a primary role of holding the two strands apart
after you unwind.
Because, of course, these two strands of DNA that I have over here
are in fact complementary to one another,
and could rapidly reanneal.
Now, this is prevented by two different events.
The first one is simple to understand
-- that is, that the leading strand DNA polymerase,
up here at the top,
follows almost directly behind the helicase.
And so, that single-stranded DNA
is very rapidly converted into double-stranded DNA,
and this prevents it from annealing
with the complementary lagging strand template.
Now, there's another concern, however,
which is that the lagging strand template
will anneal on itself,
and so there are a specific set of proteins
called single-stranded DNA binding proteins,
or SSBs,
that will bind the single-stranded region
of the lagging template,
and hold it in a single-stranded state,
preventing it from annealing to itself.
And what's important about this is
not only does it keep it from reannealing,
but when a DNA polymerase approaches
a region of single-stranded DNA bound by the SSB,
the SSB is readily displaced,
allowing the template that's left behind
to be readily replicated by the polymerase.
Okay.
So, we've talked about the leading and lagging polymerases,
but it turns out, at the replication fork,
they're part of a larger complex
called a holoenzyme, in particular,
called the DNA polymerase III holoenzyme,
which is a very specialized form of the DNA polymerase,
for acting at chromosomal DNA replication forks.
So, it's illustrated here,
and there are several parts to this complex.
So, first, there are three copies of
DNA polymerase III,
which is the third polymerase discovered by Arthur Kornberg and his colleagues
in their Nobel Prize-winning work investigating the enzymes involved
in DNA synthesis.
Now, in addition, there is
a second large protein complex,
a five-protein complex called a sliding DNA clamp loader,
as well as a sliding DNA clamp bound to it.
Now, all of the polymerases
are held to the sliding clamp loader
by a subunit that's present three times,
shown here in light blue,
called the τ subunit,
and I'll have more to say about τ,
as it plays a particularly important role in
coordinating the events at the replication fork.
But before that, I want to tell you a little bit about the sliding DNA clamp loader
and the sliding DNA clamp, and what their functions are.
So, we'll start with the sliding DNA clamp.
So, this is a ring-shaped multimeric protein
made up of either two or three identical subunits.
This is an illustration of the sliding DNA clamp
from S. cerevisiae, the budding yeast,
and you can see that whether we look at the one
from S. cerevisiae or a phage, T4,
or human cells, or E. coli,
they have very similar structures,
and you can see that in the overlap, here, okay?
And all of them, in this central
hole in the protein donut, per se,
have enough room to fit
double-stranded DNA.
And you can see that here in a crystal structure
of the yeast sliding DNA clamp
bound to double-stranded DNA.
Okay.
Now, what's the purpose of having it
surround double-stranded DNA?
Well, that's illustrated on the next slide.
So, these sliding DNA clamps
not only encircle double-stranded DNA,
but they also are able to
bind to the backside of a DNA polymerase,
holding it on the DNA,
particularly at a primer-template junction, okay?
And it turns out, because they do not
specifically interact with the double-stranded DNA,
they will follow along with the polymerase
as it synthesizes DNA.
Now, one property we haven't talked about
of DNA polymerases thus far
is a property called processivity,
and this, put simply,
is the number of base pairs
that are synthesized
each time a DNA polymerase
binds to a primer-template junction.
Now, it turns out, on their own,
polymerases are actually not particularly processive.
They'll typically do somewhere up to
about 100 base pairs
before they fall back off the DNA.
Now, what's important about this
is, while it stays on the DNA,
a DNA polymerase typically makes
one base pair per millisecond,
and so it's very efficient at doing that
if it stays on the template.
However, if it falls back off the template,
on average,
it's going to take a second to find a new primer-template junction,
rebind, and reinitiate synthesis.
So, what that means is that every time it falls off,
it's lost the chance to add 1000 base pairs
if it stayed on.
And what the sliding clamp does
is prevent that.
In fact, I usually refer to these as
personal trainers for DNA polymerases,
because once it starts,
if the polymerase decides,
oh, I'm tired, I want to fall off,
the sliding clamp holds it on the DNA
and puts it right back to the grindstone,
and starts this process again.
And, importantly, it will only hold the polymerase
while active DNA synthesis is occurring.
When it reaches the end of a template
and you have complete synthesis,
it's readily released from the sliding clamp
and eventually the sliding clamp
is also removed from the DNA.
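The cost of falling off can be put into numbers with a small sketch.
The rates come from the figures above;
the processivity used for the clamped polymerase is an assumed value for illustration only.

```python
# Average synthesis rate for a polymerase that repeatedly falls off and rebinds.
RATE_BP_PER_S = 1000.0    # one base pair per millisecond while bound
REBIND_TIME_S = 1.0       # average time to find a new primer-template junction


def effective_rate(processivity_bp: float) -> float:
    """Average bp/s over one cycle of binding, synthesizing, and rebinding."""
    time_synthesizing = processivity_bp / RATE_BP_PER_S
    return processivity_bp / (time_synthesizing + REBIND_TIME_S)


without_clamp = effective_rate(100)       # about 100 bp per binding event
with_clamp = effective_rate(100_000)      # assumed processivity with a sliding clamp

print(f"without clamp: {without_clamp:.0f} bp/s")
print(f"with clamp:    {with_clamp:.0f} bp/s")
```

In this simple model the unclamped polymerase averages only about 91 base pairs per second,
because it spends most of each cycle looking for a new junction.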
So, now that you know
the function of the sliding DNA clamp,
we need to talk about how it's put on
the primer-template junction
so that it can serve this function,
and this is the role of the so-called
sliding DNA clamp loaders.
So, these are five-subunit complexes
that use the energy of ATP binding and hydrolysis
to load sliding DNA clamps,
specifically at primer-template junctions.
Now, how does this work?
Well, the first step in this process
is the binding of ATP
by the sliding DNA clamp loader.
This changes its conformation
and makes it competent to bind both the sliding DNA clamp
and the primer-template junction.
When the sliding DNA clamp binds,
it changes its conformation
by opening up the interface
between two subunits,
creating a crack or an opening in the ring-shaped structure.
Importantly, this is big enough
to fit double-stranded DNA through,
and when double-stranded DNA
binds to the sliding DNA clamp loader,
it does so such that it is now encircled
by the sliding DNA clamp.
Importantly, only a DNA
that has a primer-template junction
can actually fit within this region.
Completely double-stranded DNA
can't bend enough to fit
in the sliding DNA clamp loading site.
Also important is the presence of
a 3' hydroxyl at the site of the ATP binding,
which stimulates the ability of ATP
to be hydrolyzed.
So, when ATP is hydrolyzed,
this causes the sliding clamp loader
to change conformations,
release the sliding clamp
and the primer-template junction DNA,
and this causes the sliding clamp
to now close again
around the double-stranded DNA portion
of the DNA,
and now it's ready
to recognize a DNA polymerase
and facilitate its processive DNA replication.
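The loading cycle just described can be sketched as a simple sequence of states.
The state names and the function are hypothetical labels for the steps,
not established terminology.

```python
from enum import Enum, auto
from typing import List


class LoaderState(Enum):
    FREE = auto()          # clamp loader with no ATP bound
    ATP_BOUND = auto()     # competent to bind the clamp and the DNA
    CLAMP_OPEN = auto()    # clamp bound, one subunit interface opened
    ON_JUNCTION = auto()   # primer-template junction threaded into the open clamp
    RELEASED = auto()      # ATP hydrolyzed, loader gone, clamp closed around the DNA


def load_clamp(has_primer_template_junction: bool) -> List[LoaderState]:
    """Return the states visited during one clamp-loading attempt."""
    states = [LoaderState.FREE]
    # Step 1: ATP binding changes the loader's conformation.
    states.append(LoaderState.ATP_BOUND)
    # Step 2: binding the clamp opens a gap in the ring.
    states.append(LoaderState.CLAMP_OPEN)
    if not has_primer_template_junction:
        # Fully double-stranded DNA does not fit the loading site, so the cycle stalls.
        return states
    # Step 3: the junction binds and the DNA is encircled by the open clamp.
    states.append(LoaderState.ON_JUNCTION)
    # Step 4: ATP hydrolysis releases the clamp and DNA, and the clamp closes.
    states.append(LoaderState.RELEASED)
    return states


print([state.name for state in load_clamp(True)])
print([state.name for state in load_clamp(False)])
```

Only the DNA with a primer-template junction reaches the final state,
which mirrors how loading is restricted to the right substrate.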
Now, at this point,
I've told you about a lot of different enzymes,
and I just want to tell you
a little bit about how they work together
at the replication fork.
So, the DNA polymerase III holoenzyme,
this large multi-enzyme complex,
actually does more than just synthesize DNA
and load the clamps
-- it also stimulates the DNA helicase.
And this is mediated, again,
by that same τ subunit
that is interacting with the DNA polymerase subunits,
and this plays an important role,
because if the DNA polymerase III,
either on the leading or lagging strand,
becomes stalled,
it will pull
the Pol III holoenzyme away from the helicase,
causing the helicase to slow down
during the time it takes for the polymerase
to restart synthesis.
So, for example, if it hits a lesion in the DNA
that has to be repaired,
the helicase won't run away
at the same rate
as it would if it were bound to the polymerase,
because the polymerase is now detached.
Once the polymerase can bypass that lesion,
or the lesion is repaired,
it can catch back up to the helicase
and the process can become very rapid, again.
I've already told you that primase activity is stimulated
by binding to the DNA helicase,
and in fact if you modulate
the level of interaction, or the affinity,
of the primase for the helicase,
you can actually change the rate at which
it primes new synthesis,
and so if priming is faster you'll make shorter Okazaki fragments,
and if it's slower, at lower affinity,
you'll make longer Okazaki fragments.
So, the rate of Okazaki fragment formation
is actually determined by this affinity.
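The link between priming frequency and fragment length can be sketched as a one-line relationship.
It assumes a constant fork speed, and the priming frequencies are made-up values for illustration.

```python
# Okazaki fragment length as a function of how often primase lays down a primer.
FORK_RATE_NT_PER_S = 1000.0   # E. coli fork speed quoted in this talk


def okazaki_fragment_length(priming_events_per_s: float) -> float:
    """Template unwound between two priming events, in nucleotides."""
    return FORK_RATE_NT_PER_S / priming_events_per_s


frequent_priming = okazaki_fragment_length(1.0)   # hypothetical: one primer per second
rare_priming = okazaki_fragment_length(0.5)       # hypothetical: one primer every 2 seconds

print(f"frequent priming: {frequent_priming:.0f} nt fragments")
print(f"rare priming:     {rare_priming:.0f} nt fragments")
```

Halving the priming frequency doubles the fragment length,
which is the sense in which the affinity sets the fragment size.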
Finally, DNA polymerase III, as I told you,
has its processivity dramatically stimulated
by the sliding clamp
and, in turn, also by the sliding clamp loader,
since that is required for loading it.
So, now that I've told you about the enzymes
and how they work together at the fork,
I'd like to take you through the events that are occurring
at a replication fork, one by one.
So, this is an illustration
of a replication fork
bound to the DNA polymerase III holoenzyme.
You can see the helicase, here,
with its unreplicated and unseparated DNA.
The top is the leading strand polymerase;
at the bottom is a lagging strand DNA polymerase.
There's also a third polymerase,
as I've explained,
which is, at this point in the reaction,
unengaged,
and you'll see how it becomes engaged as we move forward.
Now, you'll also note that
there's large regions of single-stranded DNA,
here and here,
that are bound to the single-stranded DNA binding protein.
And in fact, I've labeled this the trombone model
of bacterial DNA replication,
because this loop down here
actually gets bigger and smaller
depending on where you are in the replication process,
much as a trombone slide goes in and out
as you play different notes.
Now, I've shown you the SSB
that's bound to the single-stranded DNA, here,
but for the rest of the illustrations,
just to reduce the clutter,
I've removed that.
Now, you'll note that there's a large single-stranded region
adjacent to the helicase,
and this is actually the perfect substrate for the primase,
which will come in and synthesize
a short primer at this single-stranded DNA region.
And this is, again,
mediated by the affinity of the
DNA primase for the helicase.
Now, as soon as this is synthesized,
it's recognized by the sliding DNA clamp loader
as a primer-template junction.
And it then loads a sliding DNA clamp
onto the primer-template junction,
making it, now,
ready to be recognized
by the unengaged DNA polymerase.
So, what happens next is that
the polymerase binds the sliding clamp,
associates with the primer-template junction,
and begins to initiate a second Okazaki fragment.
Now, I want to point out that,
during the processes that I've been explaining,
the leading strand polymerase
has continued synthesizing,
as has the other lagging strand DNA polymerase.
And, in fact,
once that lagging strand polymerase
reaches the end of its single-stranded template
-- that is, the beginning of
the previous Okazaki fragment --
it will fall off the DNA, just as I explained,
because it no longer has template,
and it will become an unengaged DNA polymerase,
just as the one that is currently
making the second Okazaki fragment
was unengaged at the beginning of the reaction.
Okay.  So, we've taken you through this in slow motion.
Let's look at what it looks like in real time.
Okay.  So, this...
you should now be able to label
all these different subunits in this process.
So, you can see, in blue, here,
this is the DNA helicase, okay?
And it's unwinding the DNA, both strands,
and feeding it to two different DNA polymerases, here.
You can see the leading strand polymerase,
which is immediately
using the leading strand template
to make new double-stranded DNA,
and you can see its associated
sliding DNA clamp, here, in the green.
Now, you'll also notice that
you see the arrival of a primase,
okay, here it comes, right there,
and it lays down a primer,
which is immediately recognized by the sliding DNA clamp loader,
which puts a new sliding DNA clamp on.
And in this case,
because the leading strand already has a sliding DNA clamp,
this is immediately recognized
by the lagging strand DNA polymerase, shown again in purple,
and its associated sliding DNA clamp.
Okay.
Now, it may be a little hard to...
oh, actually, I should point out...
there's one obvious difference
between the model I showed you and this model,
which is that there's only two DNA polymerases,
and that's because at the time this animation was made,
it wasn't actually known that there were three,
instead of two,
DNA polymerases.
But that turns out to be very important,
because, as you'll notice,
there's a period of time,
each time a new primer is synthesized,
and the lagging strand DNA polymerase
has to recognize that new template,
where there's no synthesis occurring
on the lagging strand template.
Now, in some organisms, this would be just fine,
but in E. coli the replication fork
actually moves at 1000 base pairs per second,
and as you'll recall that's the absolute maximum rate
a DNA polymerase can go.
So, while the leading strand polymerase can handle that quite easily,
the lagging strand polymerase couldn't do it
if it had to come off and rebind all the time.
On the other hand, if you have a third DNA polymerase,
it becomes obvious how you can always have
at least one polymerase, and often two,
acting on the lagging strand,
allowing the overall fork movement
to go at 1000 base pairs per second,
which is what's observed in vivo.
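A rough way to see why a third polymerase helps is sketched below.
It is a deliberately simplified model: the 1500-nucleotide fragment length is an assumed value,
and it reuses the one-second rebinding time quoted earlier for a free polymerase,
which may overestimate the delay for a polymerase tethered to the holoenzyme.

```python
# Can the lagging strand keep up with the fork?
FORK_RATE = 1000.0          # bp/s, fork speed in E. coli
POLYMERASE_RATE = 1000.0    # bp/s, maximum rate of a single polymerase
RECYCLE_TIME_S = 1.0        # assumed time to release and engage a new primer
FRAGMENT_LENGTH = 1500.0    # assumed Okazaki fragment length in nucleotides


def lagging_capacity(n_polymerases: int) -> float:
    """Average lagging-strand synthesis rate with n polymerases taking turns."""
    cycle_time = FRAGMENT_LENGTH / POLYMERASE_RATE + RECYCLE_TIME_S
    return n_polymerases * FRAGMENT_LENGTH / cycle_time


one_lagging = lagging_capacity(1)   # two polymerases at the fork in total
two_lagging = lagging_capacity(2)   # three polymerases at the fork in total

print(f"one lagging polymerase:  {one_lagging:.0f} bp/s")
print(f"two lagging polymerases: {two_lagging:.0f} bp/s")
print(f"fork rate to match:      {FORK_RATE:.0f} bp/s")
```

With these assumptions a single lagging strand polymerase falls short of the fork rate,
while two taking turns can cover it.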
Now, sometimes it's hard to understand
how fast this really is,
so let me give you a little way to think about it
that's, I think, pretty impressive.
So, the double-stranded DNA
is 20 Angstroms wide,
but let's just imagine that it's 1 meter wide, okay?
Now, in that situation,
the nucleotides that would be incorporating
would be floating around the room you're in
at about the size of a textbook, okay?
And this replisome machine
would be about the size of a FedEx truck, okay?
The sliding DNA clamps would be the size of
very large wheels, okay?
And what's most impressive is that,
for a replication fork moving at 1000 base pairs per second,
that FedEx truck, at the scale of the DNA
being a meter wide,
would be moving at 375 miles per hour.
So, if you were standing in a classroom
with a double-helix tube
1 meter wide going across the classroom,
and let's say it's
50 feet across,
that replisome would come through,
spend about 0.1 second in the room,
and change that one tube of 1 meter into two tubes
before you would even really notice it.  Okay?
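The numbers in that analogy can be checked with a short calculation.
The 0.34 nm rise per base pair is the standard B-form value,
brought in here as an assumption since it is not stated above.

```python
# Scale the replication fork up so that the DNA double helix is 1 meter wide.
DNA_WIDTH_M = 20e-10        # 20 Angstroms
SCALED_WIDTH_M = 1.0
RISE_PER_BP_M = 0.34e-9     # assumed rise per base pair (B-form DNA)
FORK_RATE_BP_PER_S = 1000

scale = SCALED_WIDTH_M / DNA_WIDTH_M                   # magnification factor
real_speed_m_per_s = FORK_RATE_BP_PER_S * RISE_PER_BP_M
scaled_speed_m_per_s = real_speed_m_per_s * scale
scaled_speed_mph = scaled_speed_m_per_s * 3600 / 1609.344

room_width_m = 50 * 0.3048                             # 50 feet in meters
time_in_room_s = room_width_m / scaled_speed_m_per_s

print(f"scaled speed: {scaled_speed_mph:.0f} miles per hour")
print(f"time to cross a 50-foot room: {time_in_room_s:.2f} s")
```

This gives a speed within a few miles per hour of the figure quoted,
and a crossing time of a bit under a tenth of a second.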
So, these are really remarkable machines
that are accomplishing quite a bit at a time.
So, I want to end by
talking about a comparison
between the bacterial replication fork,
which I've told you about in great detail,
and what we know about the eukaryotic replication fork.
So, there are corresponding proteins
to all the proteins I talked about,
for the replication fork in bacteria,
found in eukaryotes.
However, it's a little bit more complicated there.
So, instead of the single
DNA polymerase III that's involved in replication at the
bacterial replication fork,
there are actually three different DNA polymerases
that work at the eukaryotic replication fork.
Pol δ is exclusively making
the lagging strand DNA.
DNA Pol ε is exclusively making
the leading strand DNA.
Now, the third polymerase
is actually involved in the priming event.
So, while in E. coli the primase is
called DnaG and it's a single polypeptide,
in eukaryotic cells
it's a complex between a primase,
actually, a two-subunit primase,
and a DNA polymerase called
DNA polymerase α.
And the primase synthesizes
the short RNA primers
and then immediately hands them off to the DNA polymerase α,
which then extends them for a brief period of time
before, in turn,
allowing them to be taken over,
either by the Pol δ or the Pol ε,
leading or lagging strand polymerases.
As I showed you before, there's both E. coli
and eukaryotic sliding clamps.
In E. coli it's called β;
in eukaryotic cells it's called proliferating cell nuclear antigen,
or PCNA,
and, not surprisingly,
this was originally identified
just because it was prominently present
in dividing or proliferating cells,
thus the name PCNA.
The sliding clamp loader is called
the τ complex for reasons that you understand, now,
in bacteria,
and it's called replication factor C, or RF-C,
in eukaryotic cells.
Again, the helicase is a little bit more complicated story
in eukaryotic cells versus bacterial cells.
It's a single subunit repeated six times,
the DnaB protein repeated six times,
in bacteria, but in eukaryotic cells the helicase,
the core of the helicase is the Mcm2-7 complex,
which is made of six different subunits
-- Mcm2, Mcm3, Mcm4, Mcm5, Mcm6, and Mcm7 --
that form a hexamer with one of each subunit in the hexamer,
so it's a heterohexameric protein.
And it's not very active on its own.
Instead, it's activated
by binding to two other proteins,
Cdc45 and GINS,
to form the so-called CMG complex,
which is the active replicative helicase.
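The correspondences covered so far can be collected into a small lookup table.
The dictionary layout and the role labels are just one hypothetical way to organize them.

```python
# Bacterial and eukaryotic counterparts at the replication fork, as named in this talk.
FORK_COMPONENTS = {
    "replicative polymerase": {
        "E. coli": "DNA polymerase III",
        "eukaryotes": "Pol ε (leading strand) and Pol δ (lagging strand)",
    },
    "primase": {
        "E. coli": "DnaG",
        "eukaryotes": "two-subunit primase with DNA polymerase α",
    },
    "sliding clamp": {
        "E. coli": "β",
        "eukaryotes": "PCNA",
    },
    "clamp loader": {
        "E. coli": "τ complex",
        "eukaryotes": "RF-C",
    },
    "helicase": {
        "E. coli": "DnaB",
        "eukaryotes": "Mcm2-7, active as the CMG complex with Cdc45 and GINS",
    },
}


def counterpart(role: str, organism: str) -> str:
    """Look up the protein that fills a given role in a given organism."""
    return FORK_COMPONENTS[role][organism]


for role, names in FORK_COMPONENTS.items():
    print(f"{role}: {names['E. coli']} / {names['eukaryotes']}")
```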
So, I've told you already that
there's three different DNA polymerases at the eukaryotic fork.
We know less about the interactions at the fork as well,
although we do know that Pol ε
is held at the replication fork
by interactions with the GINS protein.
In contrast,
neither Pol δ nor the eukaryotic clamp loader
is part of the replication fork,
so that's very different from the prokaryotic fork.
And this may be because
eukaryotic forks move at a much more leisurely pace
of 20-60 base pairs per second,
compared to the 1000 base pairs per second,
and so those lagging strand events
that have to be tightly coordinated in bacteria
can, quite likely,
occur by just binding out of solution,
instead of being tethered immediately
to the site of DNA replication,
because there's much more time for them to occur.
And, in fact, for lagging strand synthesis,
it probably occurs after the fork has gone by.
So, at this point I hope you understand
a lot more about how the replication fork works
and, if you stay tuned to my next presentation,
we'll talk about how you assemble these replisomes
at sites of initiation
and how that's regulated during the cell cycle.
So, I just want to thank the people involved in this process.
So, Sera Thornton
did most of the animations that you saw.
They were done in the MITx Biology office
as part of the development of a course
that Sera and Mary Ellen Wiltrout
helped me develop called 7.28x,
which is available on the MITx and edX platforms,
and is a full molecular biology course,
talking about all the elements of the central dogma.
In addition, the very high resolution animation
showing the E. coli replication fork in action
was done in the DNA Learning Center,
which is part of the Cold Spring Harbor Laboratory.
Finally, I want to thank
the Howard Hughes Medical Institute
and the National Institute of General Medical Sciences
for supporting my research.
