Today, I'm going to talk about the bacterium,
Escherichia coli, and synthetic biology. I'm going to focus
on transcription and its regulation. What I want to try
to convince you of in the next 20 minutes or so is that
by understanding the mechanism of transcription and
its regulation, we can lead on to the development of
switches and tools that can be useful in the exploitation
of Escherichia coli in synthetic biology.
I think it's important to understand that manipulating
transcription is a means to an end, but not an end
in itself. So let me start off by telling you a little
bit about Escherichia coli. A typical Escherichia coli
cell has a single chromosome that contains something
like 4500 genes, and these genes will be organized
into 3000 transcription units. The amazing thing is
that there's a single species of RNA polymerase
that copies those transcription units into RNA.
Now if you bust open a typical Escherichia coli
cell, you'll find 4000 or so molecules of RNA polymerase.
Now, at first sight, that looks as if there's plenty of RNA
polymerase to go around all the transcription units, but actually
that's an illusion, because it turns out that a small
number of transcription units receive
a lot of RNA polymerase. So what this means is that
there are a lot of transcription units that are actually
short of RNA polymerase. In other words,
in the cell, RNA polymerase is in short supply.
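To put rough numbers on that shortage, here is a minimal back-of-envelope sketch in Python. The 4000 polymerase molecules and 3000 transcription units are the approximate figures quoted above; the split (5% of units taking 80% of the polymerase) is a purely hypothetical assumption chosen for illustration, not a measured value.

```python
# Back-of-envelope: how much RNA polymerase is left for most transcription units?
RNAP_MOLECULES = 4000        # approximate figure quoted in the talk
TRANSCRIPTION_UNITS = 3000   # approximate figure quoted in the talk

# Hypothetical assumptions (NOT from the talk): a small set of heavily
# transcribed units captures most of the polymerase.
GREEDY_FRACTION_OF_UNITS = 0.05
GREEDY_SHARE_OF_RNAP = 0.80


def rnap_per_unit(greedy_fraction=GREEDY_FRACTION_OF_UNITS,
                  greedy_share=GREEDY_SHARE_OF_RNAP):
    """Return (polymerases per greedy unit, polymerases per remaining unit)."""
    greedy_units = TRANSCRIPTION_UNITS * greedy_fraction
    other_units = TRANSCRIPTION_UNITS - greedy_units
    greedy_rnap = RNAP_MOLECULES * greedy_share
    other_rnap = RNAP_MOLECULES - greedy_rnap
    return greedy_rnap / greedy_units, other_rnap / other_units


per_greedy, per_other = rnap_per_unit()
print(f"heavily transcribed units: {per_greedy:.1f} polymerases each")
print(f"all other units:           {per_other:.2f} polymerases each")
```

Under these assumed numbers, the majority of transcription units average well under one polymerase each, which is the sense in which polymerase is in short supply.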
So, Escherichia coli is very good at distributing its RNA
polymerase between different genes, according to the
need. Need being which gene needs to be expressed
at a particular instant. And the bottom line is
that certain genes get a lot of RNA polymerase and certain genes
don't get very much. And what I want to convince you of is
that by understanding these rules, that govern the
distribution of RNA polymerase, we can then subvert
them and create switches that of course can be used
to enable synthetic biology in Escherichia coli.
So, Escherichia coli is a very good tool or chassis, if you
wish, for doing this. Now one other thing about Escherichia
coli that makes it a good chassis is that, actually, there's no
such single thing as Escherichia coli. There are millions, if not
millions of millions of different sorts of Escherichia coli
swimming around in the world. Because, you see the
Escherichia coli that we use in the lab is just one
of millions and millions of strains. And if you look
at the sequence of these strains, what you see is an
enormous divergence in sequence. Actually, the number of
genes in common between the different strains is quite
small. So what this is telling us is over billions of years,
Escherichia coli has diverged and has picked up genes,
it's lost genes, and actually it's very, very good at picking up
genes, losing genes, and adapting. So of course, from a
synthetic biology point of view, this makes it the perfect
chassis, because what we want to work with is something
which is capable of receiving genes and losing genes,
and is going to be adaptable. Okay, so now we
need to just focus on the topic, which is the transcription
of DNA into RNA and what I'm going to do is, I'm going to show
you some models of this wonderful enzyme that does
this job. This enzyme's called RNA polymerase. It's a little
molecular machine consisting of different subunits,
the two major subunits are called beta and beta prime (B and B').
So they're colored in this board diagram as blue
and this rather gaudy pink color. Basically, these two
subunits form what's called a crab claw. There's a gap
between them, and basically what happens is the DNA
that's being transcribed is threaded through. There's a
rather more complicated picture of that here. And
basically, what this wonderful motor does is it motors along
the DNA, copying DNA sequences into RNA. This form of
the enzyme, by the way, is known as the core enzyme. It
consists of the two big subunits that form the active site of the
enzyme that actually does the job. And the two big subunits,
B and B', are held together by two alpha subunits, so these
are shown here in yellow and orange. And this is a highly
conserved structure that is found actually in most living
cells. Now, the problem with this structure is that while
it's good at making RNA, it is incapable of knowing where
to start, and starting transcription. Turns out in bacteria
starting transcription is really important, because by starting
transcription at specific positions, first of all, it ensures
that full-length messages are made, but also it allows
for regulation. So, the question is, how do bacteria
solve the problem of directing this RNA polymerase to
specific starts? And basically, bacteria solved this problem
by using a very, very simple tool. And this tool is called
the sigma subunit. So there's an extra subunit called sigma,
that evolved, whose job is really to guide RNA polymerase
to start at specific positions. Now, interestingly, bacteria
use sigma subunits; other organisms, eukaryotes, for example,
use a completely different method for solving the same problem.
And of course, what this is telling us is that bacteria,
during their evolution, branched off
from the eukaryotes a long time ago and basically
have used sigma subunits to drive regulation. So
let's take a look at a sigma subunit. Here's a very
very elementary diagram of a typical sigma subunit.
Most, not all, sigma subunits carry 4 independently
folding domains. And like in all proteins that contain
independently folding domains, the independently folding
domains each do an individual job. In the case of sigma
subunits, the domains recognize different bits of
DNA sequence at promoters. I remind you that promoters
are the sequences at the beginning of genes that specify
where transcripts start. And to cut a long story short,
the four independently folding domains are referred to as
domain 1, 2, 3, and 4. Domain 2 contacts something called
the -10 element, that targets promoters. Domain 4 contacts
something called the -35 element, that targets promoters.
Now let me show you a sketch which hopefully will make this
really clear for you. This is a sketch derived from some brilliant
structural biology done by the Seth Darst lab about 15
years ago. Basically, the Darst lab solved the high resolution
structure of a bacterial RNA polymerase molecule carrying
a sigma subunit. And what you can see, if you look really
hard, is you can see the crab claw, the B and B' subunits
are colored light blue and pink. And basically, some of the
domains of the sigma subunit show up in their structure.
And they're shown in gold in this structure, so we have
domain 2, domain 3, and domain 4. Actually in this structure,
domain 1 doesn't show up. And basically, this very, very
simple diagram that comes from a most amazing piece of work
shows how sigma subunits work. Basically, the different
domains of sigma are splayed across the surface of the
core enzyme and they provide a template for the recognition
of DNA. And if you look very closely, at this slide, you can
see how the three domains of sigma, shown here, 2, 3, and 4,
recognize three individual segments or elements of the
promoter. Now, there's a simpler way of looking at this
and this is just to show a diagram. So here's a schematic
diagram of what you just saw, grossly simplified, but basically
showing the sigma subunit in orange here. And the idea is
that the three domains of the sigma, 2, 3, and 4, contact
three different elements at the promoter. Now there's one
other point that I need to make here, and this turns out to be
really important in a moment. And this point concerns the
alpha subunits. Remember I told you that RNA polymerase
contains two alpha subunits, and I told you that the alpha
subunits are responsible for the assembly of the B and B'
subunits. Turns out that each alpha subunit actually
contains two domains, an N-terminal domain, a large
N-terminal domain, and a small C-terminal domain.
Turns out it's the large N-terminal domain that does the
holding together of the B and B'. And the small C-terminal
domains are shown here as these two cherry-like things,
each attached to its N-terminal domain by a line, which represents
a flexible linker. It turns out these two C-terminal domains
fold up into a structure that recognizes yet another
element at promoters, and this element is called the
UP-element. So all together, when RNA polymerase
recognizes a transcription start site, there are four
main interactions, three made with different elements
by the sigma factor, and one made by the two alpha
CTDs. And basically, together these elements drive
RNA polymerase to promoters and position RNA polymerase
so that it can begin transcription at specific positions.
One thing you need to know, just before I move on,
is that different combinations of these four elements
are found at different promoters. So not all promoters
have all four elements. And actually, the efficiency of
any particular promoter is determined by the combination
of the elements. And rather like you can make up a
pound or a dollar, I guess I should say, with various
small coins, you can make up a promoter with various
combinations of these four elements.
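The coin analogy can be sketched in a few lines of Python. This is only an illustration: the element names are the ones used in this talk (-10, -35, extended -10, UP), the numerical "values" of each element are hypothetical, and treating the contributions as simply additive is an assumption made for the sketch, not a claim about real promoters.

```python
# Hypothetical contribution of each promoter element to polymerase capture,
# in arbitrary units. These values are invented for illustration only.
ELEMENT_CONTRIBUTION = {
    "-10": 40,
    "-35": 30,
    "extended -10": 20,
    "UP": 10,
}


def promoter_strength(elements):
    """Sum the contributions of the elements present (additive assumption)."""
    return sum(ELEMENT_CONTRIBUTION[element] for element in elements)


# Two different "handfuls of coins" that add up to the same total.
promoter_a = {"-10", "-35"}
promoter_b = {"-10", "extended -10", "UP"}

print(promoter_strength(promoter_a))
print(promoter_strength(promoter_b))
```

The point of the sketch is only that different combinations of elements can reach a similar overall strength, and that a promoter missing elements falls short and is a natural target for an activator.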
Now, a moment ago, I told you that different transcription
units receive different amounts of RNA polymerase in
Escherichia coli, the question is why. So if we look at
the textbook, we'll see that there are three reasons. And
they're listed here. First one is what I just told you,
the different promoter sequence elements, the -10,
the -35, something called the extended -10, which is the region
between the -10 and the -35, and the UP elements
differ from one promoter to another. So according to
the precise sequence, a promoter is going to be able to
capture polymerase more or less efficiently. Second factor
is the sigma factor. Turns out that many bacteria don't just
contain one sigma factor, they contain multiple sigma factors.
Turns out that most E. coli strains contain seven sigma factors,
a major sigma factor which is called sigma 70, and six other
sigma factors. These six other sigma factors come into play
in response to certain stresses. And basically what they
do is these sigma factors capture the core enzyme and drive
it to promoters specified by domains 2, 3, and 4 of these
alternative sigma factors. I'm not going to say anything more
about sigma factors now, but you should be able to
see how by changing sigma factor, you can actually change
promoter specificity. This is a strategy used by many
bacteria to alter gene expression in response to external
cues. And of course, this is something that synthetic biologists
are going to be able to exploit in the future. And they are
particularly well placed to exploit it because
we know that sigma factors are made up of independently
folded domains. So it's not rocket science to see how you could
alter particular domains to alter promoter specificity.
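As a toy illustration of that domain-swapping idea, here is a minimal Python sketch. The sigma 70 consensus hexamers (TTGACA at -35, TATAAT at -10) are the standard textbook sequences; the alternative sigma factor, its target sequences, and the exact-match recognition rule are all hypothetical simplifications introduced only for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SigmaFactor:
    name: str
    domain2_target: str  # -10 element recognised by domain 2
    domain4_target: str  # -35 element recognised by domain 4

    def recognises(self, minus35: str, minus10: str) -> bool:
        # Crude assumption: recognition requires an exact match to both targets.
        return minus35 == self.domain4_target and minus10 == self.domain2_target


sigma70 = SigmaFactor("sigma70", domain2_target="TATAAT", domain4_target="TTGACA")

# Hypothetical alternative sigma factor with invented target sequences.
sigma_x = SigmaFactor("sigmaX (hypothetical)", "CCCCAT", "GGAACT")

# Swap only domain 4: the chimera keeps sigma70's -10 recognition
# but takes its -35 recognition from the alternative factor.
chimera = replace(sigma70, name="chimera", domain4_target=sigma_x.domain4_target)

print(sigma70.recognises("GGAACT", "TATAAT"))  # hybrid promoter, original factor
print(chimera.recognises("GGAACT", "TATAAT"))  # hybrid promoter, chimeric factor
```

Because each domain is modelled as an independent field, changing one changes the promoter specificity without touching the other, which is the property the independently folding domains make plausible.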
Now I'm not going to say anything more about sigma factors,
because I want to move on rapidly to the third mechanism
that drives the distribution of RNA polymerase between
different promoters, which is transcription factors.
And E. coli contains somewhere between 250 and 300
of these factors, or I should more accurately say most
E. coli strains contain this sort of number of transcription
factors. And these come in two flavors, activators
and repressors. So, I guess most of you will know that
repressors function by binding at active promoters, so
these are promoters that have good -10, -35, UP-elements.
Repressors function by binding to those promoters and
shutting down their expression. Activators do the inverse,
so activators interact with promoters that are defective
in some way, such that the promoter is not receiving
enough RNA polymerase. The job of the activator is
to reverse that and make sure the promoter receives
more RNA polymerase, thereby driving transcription.
Interestingly, these transcription factors
also contain domains. Most of them, not all of them,
contain what I call a "business" domain,
so that's the domain that actually binds to the promoter
and does the business of activation or repression.
And then most transcription factors contain another domain
which is often referred to as a "regulatory" domain, and that
ensures that the transcription factor responds to a particular
environmental cue. Now again, it's not rocket science
to see that by mixing and matching regulatory domains
with business domains, a synthetic biologist can create
a whole bunch of different transcription factors that
can do desired jobs. Now what I want to focus
on now, for the rest of the talk, is activators. What I
want to explain to you first is how
activators work. And then on the basis of that, I want to explain
to you how we can create new promoters that are regulated
by different combinations of activators. Now just for
completeness, I should say that you could do the same
with repressors, but for this talk I'm just going to focus
on activators. So number one question is, how do
activators work? Well, the start point is what I just told you
and that is activators function to recruit RNA polymerase
to promoters, where the different promoter elements are
insufficient to recruit enough RNA polymerase.
And a whole lot of studies done by many labs across the world
have actually shown the mechanism of action of activators
is quite simple. So most activators are dimers, they bind
just upstream of the target promoters, and they contain
a little patch shown here as a yellow spot. And this little
patch is called an activating region. And what's going to happen
is this little patch is going to interact with a part of RNA
polymerase directly, recruiting the RNA polymerase to that
promoter. Why does it need to do that? Well, because
the various promoter elements are insufficient to do that.
Now, of course it's easy to use Powerpoint to draw this, so
here we are. Here comes the RNA polymerase and this
tells us that many activators function by making an interaction
with the C-terminal domain of the RNA polymerase, alpha
subunit. Now, there's an interesting point that I must just
remark on here. Polymerase contains two alpha subunits,
and most activators contain two identical subunits, each
of which would have an activating region, but it turns out
that actually, in order to recruit RNA polymerase, you only
need one interaction. And this has been proved experimentally.
Now, another thing about this sort of activation, which we
call activation by recruitment, or activation by velcro, because
at the end of the day, the activating region is just like a little
velcro patch that hooks the RNA polymerase to the
promoter, is that this activation is crucially dependent on the
location of the activator upstream of the promoter.
So, you take a single promoter and you move the activator
around, you'll find some locations where it works and
some locations where it doesn't work. And interestingly, what
you find is that the locations where it works are normally
separated by 10-11 base pairs, in other words, one
turn of a helix. This is some data, very old data from my
lab, in which we took a promoter that was being activated
by a single activator and basically what we did was we
moved the gray boxes, the activator, to different locations.
So these locations are shown on the x-axis, the y-axis shows
the activity of the promoter. And what you see very, very clearly
is that there are some locations where it works, some where
it doesn't, and the distance between the locations where
it works corresponds to one turn of the helix. So the idea is
that in order for activation to take place, the activator
and the polymerase have to be lined up on the same
face of the helix. Now, of course, you could ask a very, very
interesting question here. Well, hang on a second, if
you move the activator by one base pair, you're twisting
the activator around the helix. Why can't the DNA
just twist back, or why can't the linker that joins the
C-terminal domain to the N-terminal domain
simply adjust? The thing is, that carries an energetic
penalty. You have to pay energy to do that, and the fact
that these spikes are so sharp is telling you that that
energetic penalty is too much to pay. And this brings me to a
really interesting experiment proposed just a couple of years ago
by a professor of biochemistry at Peking University,
Yiping Wang, and he suggested that if you could
increase the binding of the activator to the RNA polymerase
then maybe these peaks would broaden, maybe some of these
locations where there was no activation could become
locations where there is activation. So his idea was very,
very simple. And together, we did some of these experiments
and basically, rather than presenting you with all his data,
I've just presented you with some cartoons. So we start
at the top, this is sort of the promoter that I spoke about.
And I want you to imagine that the activator is misplaced,
say by one base pair, such that it doesn't work. What Yiping's
student did was introduce an UP-element, so this is shown
as the blue rectangle here, just downstream of the activator,
in the middle. That, of course, increases the binding
of alpha CTD to the DNA, and in this little ménage à trois
of the DNA, the alpha CTD, and the activator, the three
components bind cooperatively together. It turns out
that when you measure the activity of the promoter, this
promoter activity actually goes up. So the conclusion
of this is that if you beef up the binding, you actually allow
the activator to function at a location where it wouldn't
normally function. Now, in a moment, I hope you'll see
why this is important. Following that, we had a great
idea that complemented the Chinese experiment, and that
was hey, rather than putting an UP-element, why not
put another activator? So this is shown in the bottom
of the -- the bottom sketch here. And the little red
square, or little red rectangle, that's another activator,
and it turns out you can produce exactly the same effect
just by putting in another activator. So, essentially just by
working with these simple principles, we created a promoter
that is dependent on two activators. Now, why is this
important for synthetic biologists? Well, it's important
for the following reasons. Of course, it would be easy for
a synthetic biologist to take the information I just told you
and design a promoter that was triggered just by a single
activator, a new activator. That would be easy. But
it would be much smarter for the promoter to be
co-dependent upon two signals, rather than one, or
even three signals, because if you could do that you could
produce combinatorial regulation. And so this little experiment
here actually suggests a great idea, which is that one
could exploit the fact that RNA polymerase has two alpha
subunits, and that activators can function by independently
binding RNA polymerase to increase its recruitment
to promoters, in order to create switches
at which expression is actually dependent
upon two activators. The question is, has E. coli
thought of this already? Metaphorically, of course.
And the answer, of course, is yes. This is our typical activator
doing its stuff by interacting with alpha CTD. It turns out
that there are many, many examples of naturally
occurring promoters where a second activator works
by interacting with the second alpha subunit of RNA polymerase.
Now what I didn't tell you earlier was that there are dozens,
if not scores, if not hundreds, of activators that work like this.
So, in this example, I've just shown schematically with
the yellow and the red, but you -- actually, I think you can see
this mechanism could work with pretty much any activator
which played this game. And of course what this does is
opens the possibility to new combinations. These are just
some data to show that it really does work in the lab.
So in this experiment here, what we've done is we've
taken a promoter with a single activator anchored at one
position, and we've then moved the position of the second
activator on the DNA. So again, it's the same deal.
The x-axis denotes the position of the activator that
we're moving, the y-axis denotes the activity. What you can
see is there are locations where the second activator works
and locations where it doesn't work. And again, the
phenomenon is face of the helix dependent, in other words,
things have to be lined up on the same face of the DNA.
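The face-of-the-helix dependence can be sketched as a simple periodic function. This is a toy model, not the analysis behind the data described here: the 10.5 bp helical repeat is an assumed value within the 10-11 bp range mentioned earlier, and the functional form, `sharpness`, and `max_fold` are hypothetical parameters chosen for illustration.

```python
import math

HELICAL_REPEAT_BP = 10.5  # assumed base pairs per helical turn (talk says 10-11)


def activation(offset_bp, sharpness=20.0, max_fold=10.0):
    """Toy fold-activation for an activator moved offset_bp from a working site.

    sharpness and max_fold are hypothetical parameters. A high sharpness
    stands in for a large energetic penalty for twisting out of alignment.
    """
    phase = 2 * math.pi * offset_bp / HELICAL_REPEAT_BP
    misalignment = 1 - math.cos(phase)  # 0 on the same face, 2 on the opposite face
    return 1 + (max_fold - 1) * math.exp(-sharpness * misalignment)


for offset in range(0, 22):
    print(f"{offset:2d} bp  {activation(offset):5.2f}-fold")
```

In this sketch, lowering `sharpness` broadens the peaks so that a site displaced by one base pair still activates, which is one way to picture the effect of strengthening the binding with an UP-element or a second activator.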
Okay right, so this slide shows a few examples taken
from the literature and actually on here, there's one example
where Ann Hochschild's lab at Harvard Medical School actually
took two activators that normally don't talk to each other
and showed that they could function synergistically together.
So just to summarize, the idea is that most activators
in E. coli function via a similar mechanism,
and because they function by making contact with different
patches on the RNA polymerase, we can mix and match
different activators to produce new combinations.
But of course, in order for this to work, for the promoter
to be co-dependent on both activators present at the same
time, what you have to do is you have to stop the promoter
being activated just by a single activator. And in order to
do that, you play this little trick of misplacing one of
the activators. We call this the independent contact model
for transcription activation. Just to show you that I'm not
making this up, here is some data produced by a member
of my lab, Doug Browning. So what Doug has done here is
he's compared the activity of a test promoter with either
one activator or two activators. Now in both cases, you
see when you go from one activator to two activators, the
activity increases. That's the y-axis. And what you see is that with
the starting promoter, this is on the left here, the activation
is probably something like 2-fold, and that's because
the first activator is pretty good at activating. But if you
misplace the first activator, so now we move to the panel
on the right, with the first activator misplaced,
the ability of the activator to activate falls right down.
And now you get incredible synergy, I think it's 15-fold.
15-fold synergy. And you can play this game forever,
to tune your promoter to get the output that you want.
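One way to see why misplacing the first activator increases the synergy is a minimal recruitment sketch, in which each activator contact multiplies the effective affinity of RNA polymerase for the promoter and activity saturates with occupancy. The saturating form and every number below are hypothetical assumptions chosen so the qualitative trend is visible; they are not fitted to the data described above.

```python
def promoter_activity(basal, *contact_strengths):
    """Fractional polymerase occupancy when each contact multiplies affinity."""
    x = basal
    for strength in contact_strengths:
        x *= strength
    return x / (1 + x)


BASAL = 0.01        # hypothetical affinity of the weak promoter on its own
A_WELL_PLACED = 50  # hypothetical contact strength, first activator in register
A_MISPLACED = 2     # hypothetical contact strength, first activator misplaced
B = 20              # hypothetical contact strength, second activator


def fold_from_second_activator(first_contact):
    """Fold increase in activity on going from one activator to two."""
    one = promoter_activity(BASAL, first_contact)
    two = promoter_activity(BASAL, first_contact, B)
    return two / one


fold_well = fold_from_second_activator(A_WELL_PLACED)
fold_misplaced = fold_from_second_activator(A_MISPLACED)
print(f"first activator well placed: {fold_well:.1f}-fold")
print(f"first activator misplaced:   {fold_misplaced:.1f}-fold")
```

When the first activator already recruits polymerase efficiently, the promoter is close to saturation and the second activator adds little; when the first is weakened, the promoter depends on both contacts, and tuning the individual contact strengths tunes the output.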
So we call this activation by independent contacts, and
I just stress that one of the reasons it works is because
the RNA polymerase has two alpha subunits, and each alpha
subunit has a C-terminal domain. And I would just point out
these C-terminal domains are highly conserved throughout
most bacteria. Interestingly, they're not found in eukaryotes.
In eukaryotes, of the subunits that fulfill the
alpha function, one does have a C-terminal domain and
the other doesn't. Okay, so this is activation by independent
contacts. I'd just like to finish off by asking a question
which is, is there another way of doing it? And yeah, there is.
There is another way of doing it. Let me show you what that
way is. So, we're thinking about a promoter that's codependent
on two activators, let's call them A and B, okay?
So here are A and B, now A and B in this case, don't
bind to the DNA. Remember in the previous case, A
and B bound independently to the DNA. But in this case,
A and B have to interact together before they bind to the
DNA. And when they bind, that recruits our RNA polymerase.
So basically, this creates the same thing. This creates
codependence, you have codependence on A and B,
but this time, the codependence is due to the fact that
A has to bind to B, and B has to bind to A, before
the complex binds to the DNA and interacts with the RNA
polymerase. Now, it turns out that in bacteria, or perhaps I
should rephrase this, in the bacteria that have been studied
so far, because I'll remind you that so far, probably only
I think north of 1% of bacteria have been studied. So these
ground rules that I'm enunciating that come from E. coli
that apply to E. coli, might not apply to some other bacteria.
In fact, they probably don't, but we just don't know about it.
But in the cases that we've looked at so far, you very,
very rarely find this. You find independent binding of A and B,
and A and B making independent contacts. And you find
a lot of that, but you don't find much of this. And in fact,
I believe the last time I looked, I found just three examples
amongst thousands of examples. Of course the question is,
why is that? And I believe there is a simple explanation.
And it comes from the dynamic genomes of bacteria.
The fact that bacterial genomes are dynamic and changing
all the time. Remember, I told you at the beginning,
that there are millions and millions of
types of E. coli that have widely different
genes. Now, the point is, if you're going to fix coregulation
by A cooperatively binding to B, A has to have a surface that
binds to B, and B has to have a surface that binds to A.
So the two have to be committed to each other. And if
they're committed to each other, then they're fixed.
Okay? Now, of course, if A now wants to go and play with
C, it can't. Or D. Or E. Or F. So what I'm trying to get across
is that I believe that the independent binding model that
I just showed you, which we find very frequently at
coregulated promoters in E. coli, has been
favored in evolution over the cooperative binding model
simply because it allows far more flexibility. It allows far
more mix and match. Now interestingly, in eukaryotes,
this mechanism of cooperative binding is found very,
very commonly to explain codependent activation of genes.
And I think there's a basic divergence here in strategy.
Anyway, that's a hypothesis that nobody can prove
or disprove, but it's an interesting thought. And at least
it's an explanation that attempts to explain what we see.
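The flexibility argument can be made concrete with a rough count. This sketch assumes, as a deliberate oversimplification, that under independent contacts any two activators could in principle be paired at a promoter, while under cooperative binding each activator is committed to exactly one partner. The figure of 250 is the lower end of the range of transcription factors quoted earlier; treating all of them as activators is also an assumption.

```python
from math import comb


def pairings_independent(n_activators):
    """Any two activators can be combined at a promoter (assumed)."""
    return comb(n_activators, 2)


def pairings_committed(n_activators):
    """Each activator has one dedicated partner surface (assumed)."""
    return n_activators // 2


N = 250  # lower end of the 250-300 range quoted in the talk
print("independent contacts:", pairings_independent(N))
print("committed partners:  ", pairings_committed(N))
```

Under these assumptions the independent-contact scheme offers tens of thousands of possible pairings against about a hundred, which is the sense in which it leaves far more room for mix and match.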
And also, it gives synthetic biologists plenty of room
for them to play. And on that topic, I think I want to just finish
off by stressing this number, 250-300. There really are
a lot of transcription factors, and this is illustrated in this
wonderful diagram that's actually totally out of date
and very, very old, but it makes the point perfectly.
This was taken from a review by Julio Collado-Vides.
Collado-Vides runs a website in which he tries to collate
all the information on E. coli transcription factors and
what you see here is that there's a lot of cross-regulation.
Most regulators regulate more than one target, most targets
are regulated by more than one regulator. There are a lot
of regulators and hence, there is a lot of room for synthetic
biologists to play. Most of these regulators, not all,
I stress there are some exceptions, but most of these
regulators regulate by the simple mechanisms
that I showed you. So my ending point is that if you take this
and you take the possibility of mixing and matching different
domains, there really is a lot of evolutionary space
into which we can move. So I'd like to finish by thanking
you for your attention.
