I'm Chris Voigt, I'm a synthetic biologist in the biological
engineering department at MIT. And today I'm going to talk to you
about some of the work that we've been doing to create a
programming language for living bacteria. And so
when you think of programming, you might think about
building a piece of software for a computer or trying
to control a robot. But what we've done is actually
created software that allows somebody to write a
program that then gets compiled into a piece of DNA,
that gets put into cells and can be run in those cells.
The reason we want to do this is that computation
underlies everything that we see in biology. And if we really
want to exploit the products that biology can produce,
we have to be able to control what types of computation
cells are doing and how they're thinking and processing
their environment. So what really inspires us is all of the
things that we can get by engineering biology. So believe
it or not, this is already an enormous economy. So about
2% of the United States economy, or $350 billion a year,
comes from engineered biology. This is genetically modified
cells that produce a product of some sort. And we're surrounded
by these products in all sorts of consumer goods, from
the plastics that are in bottles and in your car, to precursors
to medicines, to the foods that we eat. But there's one thing that's
common across all of these different products. And that's
how simple they are. So if you look at the bottom of this slide,
you see that every one of these chemicals that we produce
is actually quite simple. It's a few carbons, an oxygen, hydrogen,
and so on. And it is really a very simple thing that we might
be able to get. What we want to be able to do is actually
fully unleash the potential of biology. And there are all sorts
of things that biology can do that we don't really know how to
access yet. So they can make chemicals that are impossibly
sophisticated for organic chemistry, or build entire organs
like the liver. It's a natural nanotechnology. So some cells
can build materials out of silicon or iron that are sophisticated
and really have a level of accuracy that extends well beyond
what we can do in the lab. And then finally, there are all sorts of
functions that we'd like to be able to access. So at the bottom
of this slide, I'm showing cells or bacteria that can associate
with a plant, in order to deliver fertilizer to that plant. And that
requires real sophistication in the way that the bacteria
is interacting with that host. And so the reason that we
can't access all of these yet as products of biotechnology
is really for two reasons. The first reason is that all of these
functions require many genes, and all of the products that we
currently get out of biology are maybe 2, 3, 4, 5 genes or something
like that. Whereas these types of products require control over
hundreds of genes or even thousands of genes. And then it's not just
enough to be able to control those genes, you have to tell each
gene when exactly to turn on: the timing, and also the location,
that is, the particular environmental conditions under which that particular gene
has to turn on. And so this is really what we're trying to address in our work,
where we wanted to create a language that allows
a biotechnology person, someone who's doing genetic
engineering, to go into a cell that they're trying to
build, and tell it exactly what genes to turn on and at what
times as part of building these products. So now if you're not
used to thinking about cells as doing computation,
they're actually doing computation all of the time.
They're thinking about and processing their environment,
they're figuring out where they are, they're figuring out what
genes need to be turned on to survive. And the way that they're
doing this computation is that they use a really large, sophisticated
regulatory network. And a regulatory network is just where you
have proteins, you have DNA, and you have RNA that are all
interacting with each other. And within these interactions,
you then get computation that arises. So what I'm showing
here is one of the first regulatory networks we figured out.
And really it's encoding a very simple decision that
a virus makes when it enters a bacterium. So it basically
has to decide, am I going to immediately kill this bacteria
and escape into the environment or am I going to hide out
in its genome to survive for a longer time? So this yes/no
decision gets encoded by these different interactions.
And so if you look at this plot, you'll see that there are circles
and these are proteins, and there are lines, and these are
interactions, and the horizontal lines are the DNA that they bind to.
And all together they work together to create this switch.
So at first, in the '70s and '80s, this was being discovered.
People started to use some language like switches and
logic and things like that. And then in the '90s, as these
networks got more and more sophisticated, you started to see
some of the language of electrical engineering being used
in describing these types of networks. So if you look carefully,
you can see that there are going to be some cases,
so for example, if you look up here, where you see these little
gates. And so these are symbols out of electrical engineering
that describe a logic operation that's being done. So those
red arrows are the signals that are going into that logic,
they get processed, and there's an output. And if you look at
the natural regulatory networks that are there in bacteria,
they get huge. So what I'm showing here in this hairball
is an interaction network for the bacterium E. coli.
Where all of the red circles are proteins, and the blue lines
are interactions. And it's all of these proteins and their
interactions and the way that they interact with DNA
that is allowing that bacterium to compute on its
environment. And so somehow, the cell's able to get hundreds
of regulators to work together. Now for about the last 10 or
15 years, a variety of researchers have started to create
what are known as synthetic circuits. And this is where
there's a circuit function that they may want to produce, like
an oscillator, and they figure out how to get the right
regulatory proteins to interact with each other in order to
create that oscillation that they want. And so for example,
I'm showing at the bottom of this slide, where a few of these
regulatory proteins are wired together to create an oscillator.
The green little images are bacteria that are oscillating,
and you can see in the graph the quantification of the
oscillations. And so a lot of these types of circuits have been
built that are oscillators or pulse-generators, or logic functions.
Just about anything that you see in electrical engineering
has been reproduced. But there's been a challenge, and that is
that all of these circuits have been limited to 2, 3, 4 regulatory
proteins. And we know that there's this possibility where the
bacteria has hundreds of interacting regulatory proteins. But we
haven't been able to bridge that gap, to go from simple
circuits to the level of sophistication that cells naturally
have. Which is what's ultimately going to be required
to control them. And on top of that, even building one of these
simple circuits requires years of effort and tends to be a very
high profile type thing to do. And so we wanted to both
try to build larger circuits, but also make it faster.
And so first, what are the things that are stopping
us from making this transition from a few simple regulatory
proteins to getting it across the entire network, to make things that are
at the scale of genomes? So first, one of the biggest problems is with
design. And this is where there simply haven't been enough
well-characterized regulatory proteins, where we understand
how they behave at such a quantitative level that we can predict
how they're going to work together as we combine them.
There's also a problem where software doesn't really exist
to help you with that process. And so researchers are
forced to manually put pieces of DNA together
in a word processor to try to figure out how to build one of these
circuits. And then finally, there's a lot of toxicity that can arise:
as you put these pieces of DNA into cells, they can start to hurt
them. So the second problem is just the physical construction
of building a big piece of DNA, it's actually quite challenging.
And putting all of those pieces together without creating
errors has been a big problem, but there has been a lot of
technology recently that helps us address this. And then finally,
going in and debugging one of these circuits is a challenge.
We're putting them into living cells that change based on their
environment and what stage of growth they are in. It's often
difficult to see how the circuit is performing, and you usually
have to look at dynamic measurements. So you can't just
take a picture of the cells and from that one picture,
you know that the circuit's working. You have to follow
it over time, and that can sometimes be a real challenge.
So throughout this talk, I'm going to describe how we've
gone in and addressed a number of these different challenges.
And so the paradigm that we decided to do was to really
go after creating a software program that would allow a
user to go in and in the same way that they're programming
a computer, they could program a cell. So the first step of this is
that the user would go in and write a textual program that then
the software compiles into a circuit diagram that's made up of
individual gates that are connected together in order to create
the circuit function that's desired. Then the software creates
the DNA sequence associated with that circuit function,
which then can be sent off and synthesized by a company.
It's then sent back to you, you put it into a cell, and then
the program that was written in the computer gets
run in the bacteria. And so there are two stages
to get this to work, the first of course is just getting all of
the software. And that was relatively straightforward.
One of the most challenging things was getting the gates,
the fundamental units of computation, to work so
robustly that you could put them together in different
configurations to allow a user to create any program that they wanted.
So if you're not used to thinking about how cells do computation,
this little animation here shows how we do it.
So if you have a sensor, this is as we define it, a piece of
DNA that allows a cell to respond to a signal and then
control the expression of the gene. And the way that this works
is that in the sensor, there's a piece of DNA that encodes a gene
that makes a protein that then can bind to this signal.
So in this case, it's a small molecule shown in red.
And when that protein binds to that molecule, it can then
bind to DNA at a promoter. And this causes a flux of
RNA polymerase that then turns on the gene that
this is connected to. And so if you have a presence of the
molecule, this causes a high flux of RNA polymerase. And in
the absence of that molecule, there's a low flux of RNA polymerase.
And so, in defining a sensor, we can then black box this and just
think about the different levels of signal that's present and the
different fluxes that this then causes. We can then define a
circuit similarly. Except that in a circuit, we have both the input
and the output as RNA polymerase fluxes. And so in this
case, we have a circuit that takes a high flux of RNA polymerase
as the input and then converts this to a low flux of RNA polymerase
as the output. And so this acts as an inverter, and you can see
the response function. And what's really key about this
design is that because both the inputs and the outputs are the
same, they're both RNA polymerase fluxes, it becomes very easy
to connect them to sensors and to each other.
And so for example, you could have a sensor that has
a signal that's an input and an RNA polymerase flux as
an output. And this can be connected to a circuit that has both
the input and the output as flux, this can then be connected to
another circuit, and this could be done ad nauseam until you're
sick of it. And then the last step is you take an RNA polymerase
flux as the input and convert this to a cellular response.
And the device that does that is the actuator.
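A minimal sketch of that composability, assuming each device can be treated as a black box that passes along a single number for RNA polymerase flux. The function names, flux levels, and threshold below are illustrative assumptions, not values from the talk.

```python
from typing import Callable, Sequence

Flux = float  # RNA polymerase flux in arbitrary units (assumed scale)


def sensor(signal_present: bool, low: Flux = 0.1, high: Flux = 5.0) -> Flux:
    """Signal in, RNA polymerase flux out."""
    return high if signal_present else low


def inverter(flux_in: Flux, threshold: Flux = 1.0,
             low: Flux = 0.1, high: Flux = 5.0) -> Flux:
    """Flux in, flux out: a high input gives a low output and vice versa."""
    return low if flux_in > threshold else high


def actuator(flux_in: Flux, threshold: Flux = 1.0) -> str:
    """Flux in, cellular response out."""
    return "response ON" if flux_in > threshold else "response OFF"


def chain(signal_present: bool, gates: Sequence[Callable[[Flux], Flux]]) -> str:
    """Sensor -> any number of flux-to-flux circuits -> actuator."""
    flux = sensor(signal_present)
    for gate in gates:
        flux = gate(flux)
    return actuator(flux)


print(chain(True, [inverter]))            # one inverter flips the sensor state
print(chain(True, [inverter, inverter]))  # two inverters restore it
```

Because every intermediate device takes a flux and returns a flux, the list of gates can be any length without changing the sensor or the actuator.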
And so to figure this out, we turned to some theory from
electrical engineering and something that had been recognized
in that field is that some logic functions are what are known as
Boolean Complete. And what this means is that anything that
you can imagine on the computer can be broken down into
these simple Boolean Complete logic gates without
any additional computational functions. It is one of the basic
principles that allows digital computing. And so one of these
functions is what's known as a NOR Gate. And a NOR Gate
is a two input, one output logic function. So here I'm showing the
electrical engineering diagram for a two input, one output
NOR function, and then this is the logic function that this produces.
So when both of the input signals are off, the output
is on. And if either is on, or if they're both on, then the output
is off. So believe it or not, that simple function is all that's
required to build anything that you see in the computer.
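To make that claim concrete, here is a small sketch that builds NOT, OR, AND, and XOR out of nothing but a two-input NOR function and checks them against Python's own Boolean operators. The helper names are just for illustration.

```python
from itertools import product


def nor(a: bool, b: bool) -> bool:
    """Output is on only when both inputs are off."""
    return not (a or b)


def not_(a: bool) -> bool:
    return nor(a, a)                      # tie both inputs together


def or_(a: bool, b: bool) -> bool:
    return not_(nor(a, b))                # invert the NOR


def and_(a: bool, b: bool) -> bool:
    return nor(not_(a), not_(b))          # De Morgan: a AND b = NOR(NOT a, NOT b)


def xor_(a: bool, b: bool) -> bool:
    return nor(nor(a, b), and_(a, b))     # neither "both off" nor "both on"


for a, b in product([False, True], repeat=2):
    print(a, b, "NOR:", nor(a, b), "AND:", and_(a, b), "XOR:", xor_(a, b))
```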
So now that's sometimes hard to wrap your head around.
So I like to use the example of the Apollo 11 mission
to the moon. These were very basic computers at the time.
And believe it or not, they were entirely based on interconnected
NOR gates. So it took 5600 NOR gates in order to create the
guidance systems that took those astronauts to the moon.
And so if we have 5600 NOR gates that we could put in a
bacteria, we could reproduce the Apollo guidance system
within a cell. So the trick then becomes how do we
create a logic function that's this NOR function, that
could be encoded in DNA. So to do this, we turned to
a basic logic function that had been done previously.
And this is what's known as a NOT gate. And a NOT gate
just does the opposite of whatever you tell it to do.
And so if the input is on, the output is off. And if the input
is off, the output is on. And so you can very easily build
a NOT gate in DNA. And the way that you do it is the following.
You just have a repressor, so this is a gene that produces a
protein that then turns off a promoter. And so then we have an input
that's a promoter, and so you have RNA polymerase flux
that's going into the gate, it then produces the repressor
protein, and this then turns off the output promoter. And so,
what you can see at the bottom of this slide is the response
function. So as you turn on the input promoter, it turns off
the output promoter. So we can then convert this very easily into a NOR function
just by having a second promoter upstream of this gate.
So now we have two input promoters connected to
each other in the DNA. And if either of those input promoters
is on, then you have RNA polymerase flux producing
the repressor protein, which then turns off the output.
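One way to picture this gate numerically is sketched below. It assumes the fluxes from the two input promoters simply add, and that the repressor shuts off the output promoter following a Hill-type curve. The talk does not give the form of the response function or any parameter values, so the equation and every number here are illustrative assumptions.

```python
def hill_repression(x, y_min=0.05, y_max=4.0, K=0.5, n=2.5):
    """Output promoter flux as a function of total input flux x.

    Assumed form: y = y_min + (y_max - y_min) * K^n / (K^n + x^n).
    All parameter values are made up for illustration.
    """
    return y_min + (y_max - y_min) * K**n / (K**n + x**n)


def nor_gate(flux_a, flux_b):
    """Two promoters in tandem: their fluxes are assumed to add
    before driving expression of the repressor."""
    return hill_repression(flux_a + flux_b)


LOW, HIGH = 0.05, 4.0  # assumed "off" and "on" input flux levels
for a in (LOW, HIGH):
    for b in (LOW, HIGH):
        print(f"inputs=({a}, {b}) -> output flux={nor_gate(a, b):.2f}")
```

With these assumed numbers the output stays high only when both inputs are low, which is the NOR behavior described here.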
And so if you look at the logic function that's produced,
you have the case where if both those promoters are off,
then the output is on, and it's only the case where
if either is on, or if they're both on, then you turn the gate
off. And so that's this basic NOR function, upon which you
can build any other circuit function that you want. Now what's
really critical about this design is that both the inputs
and the outputs of the gate are promoters. And so this means that
you can take the output promoter of one gate and feed that
as the input promoter to the next gate, and you can just
do that in series until you've built up the circuit function
that you want. But there's a problem with that, so if we go
back to the Apollo 11 circuit boards, which were based on NOR gates,
we found that all of these NOR gates are basically the same
gate architecture. And they're physically separated from each other
on the circuit board, so you could use the same design over and
over and over again, and just connect them because they're
physically separated. But a cell looks more like a burrito
where all of the biochemicals are pushed together and bumping
into each other. And so if you have 5600 NOR gates
based on 5600 repressor proteins, all of those proteins are
bouncing off of each other and bouncing off of everything else
that's in the cell. And this can create interference between
them. So one of the first things that we had to do was to find
repressor proteins that wouldn't interfere with each other or
with the cell. So to do this, we went in and we went into
genomic databases, and using some computational
programs, we found a number of different repressor proteins.
And then created synthetic promoters that responded to those
repressors. And so what I'm showing as the grid is
all of the different repressors and all of the promoters that
had been designed for them and how they interact with each other.
If it's a red square, that means they interact. And if it's
blue, that means they don't interact. So for example,
in this case here, if you go to the TetR protein, then it is
binding to its promoter, but then isn't interfering with any of the other
promoters in this system. And what you see then is that for this
set of repressors and promoters, there's a core set of about 16
that don't interfere with each other. And that is about 16
gates that you can use together as part of a circuit design.
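As a sketch of how a core non-interfering set can be read off a grid like that one, the snippet below searches for the largest group of repressors in which no repressor acts on another member's promoter. The cross-talk data is invented for illustration, and apart from TetR the names are hypothetical. The brute-force search is only meant for a small table.

```python
from itertools import combinations

# Hypothetical data. Each repressor is assumed to repress its own promoter.
# A pair (r, p) means repressor r also represses the promoter designed for p.
names = ["TetR", "R2", "R3", "R4", "R5"]
crosstalk = {("R2", "R3"), ("R4", "R2")}


def largest_orthogonal_set(names, crosstalk):
    """Return the largest subset with no cross-talk among its members.

    Tries subsets from largest to smallest, so the first hit is maximal.
    This is exponential in len(names) and only suitable for small grids.
    """
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            chosen = set(subset)
            if not any(r in chosen and p in chosen for r, p in crosstalk):
                return list(subset)
    return []


print(largest_orthogonal_set(names, crosstalk))
```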
So for each one of these non-interfering repressors and their
promoters, we then characterized the response function.
And that shows how the gate turns off as the input promoter
turns on. And that gives us the information that the computer can then
use to figure out how to put together the gates to build the desired
circuit. So now there are a lot of tricks that we had to use to
get this to work. And one of the big problems was that all of these
different gates were interfering with each other. And so if you use
the gate in one context, it wouldn't work when you tried to build it
in another circuit. And so what we had to do was we had to
create insulators that allowed these gates to be moved around
in a lot of different combinations, in order to build whatever
circuit function somebody may want. And so to do this we had to
insulate all of the underlying parts. We had to make it so that you
could interconnect promoters easily, and you could stop
the RNA polymerase flux from bleeding from one of the
gates into the other, which would then screw up the
computation. When we did this, we had to rebuild the gates
to make a set of insulated gates that could all be used
in a lot of different configurations. And this involved a lot
of legwork in figuring out those parts that could make each
one of these gates perform robustly without interfering with
all of the other gates in the library. Once we did that, we
had this set of underlying gates that worked well.
We then started to develop the software that would help
us to put them all together. So to do this, we turned to some
software languages that had been developed previously in
electrical engineering to help them build more sophisticated
circuit functions. And so we turned specifically to a language
known as Verilog, which has been around since the 1980s.
And this is how electronic chips are designed. It's a hardware
independent language, meaning that you can generically
describe a circuit function and you can compile it to an
Intel chip or an AMD chip or an FPGA or whatever you want.
And what we did is we went in and hacked it so that instead of
compiling a circuit to silicon, it would compile it to DNA.
And this is the scheme for how it works. A user goes
in, and this is a web-based program, and they can write the
circuit function that they want using Verilog commands.
Then once they hit compile, the program then figures out
the circuit diagram and the gates that are necessary in order to
create that circuit function. It then goes into our library of
insulated gates and figures out how to put all of these gates
together in such a way so that you can get a good circuit function.
It then strings them together as a linear piece of DNA, and the output of
the program is the DNA sequence, which these days can
be sent off to a DNA synthesis company. And then a few weeks later,
you get your program and you can put it in cells and test it out.
And so everything in the dashed box here is completely hidden from the user.
We wanted to make a compiler where you just write your program
and then out comes the DNA sequence that that program is
encoded in. And so this is a movie showing
how the software actually works. So you go into it and you
first have to set the sensors that you want. So you may
want your circuits to sense a small molecule or oxygen or
a metabolite or communication signal from another cell.
And then you can either use the ones we have or upload your own.
You then write the circuit function that you want using this Verilog
language. And so here we're just showing a very simple logic
function that's being written. And then you go in and there are a couple
things that you can specify. So you choose the organism that you're
compiling this program for, and in this case, we're looking at E. coli.
And the gate technology, which in this case, is based on our repressors.
And people can go in and upload other organisms and other gate
technologies. And then you have to select the outputs, whether
they're fluorescent reporters just to figure out if your circuit works,
or if you're trying to connect it to some function in the cell.
Then you can go back and the first thing that you have to do
is verify that the Verilog code works well. And then once it's
verified, you can then compile the program.
And so now as it's being run, the software is going and it's
taking those text commands for the circuit that's desired,
it's figuring out the circuit diagram that needs to be built, it's
taking all of the different gates in our library and assigning them
to those positions, and then it's building the DNA sequence.
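The assignment step can be sketched as a small search problem. The version below is not the actual software's algorithm: it assumes a tiny hypothetical library of three gates with Hill-type response functions, a fixed circuit diagram for A AND B, and a brute-force search that scores each assignment by how well the lowest ON output separates from the highest OFF output. All names and numbers are assumptions for illustration.

```python
from itertools import permutations, product

# Hypothetical gate library: name -> (y_min, y_max, K, n) of an assumed Hill response.
LIBRARY = {
    "G1": (0.05, 4.0, 0.5, 2.5),
    "G2": (0.03, 3.0, 0.3, 2.5),
    "G3": (0.20, 6.0, 2.0, 3.0),
}

# Circuit diagram for A AND B, written as NOR(NOT A, NOT B).
# Each entry is (gate position, wires feeding that gate), in evaluation order.
NETLIST = [("n1", ["A"]), ("n2", ["B"]), ("out", ["n1", "n2"])]

# Assumed sensor output fluxes for signal absent / present.
SENSOR_FLUX = {False: 0.05, True: 3.0}


def hill(x, y_min, y_max, K, n):
    return y_min + (y_max - y_min) * K**n / (K**n + x**n)


def simulate(assignment, inputs):
    """Propagate fluxes through the netlist for one combination of inputs."""
    wires = {name: SENSOR_FLUX[on] for name, on in inputs.items()}
    for position, sources in NETLIST:
        x = sum(wires[s] for s in sources)  # input fluxes are assumed to add
        wires[position] = hill(x, *LIBRARY[assignment[position]])
    return wires["out"]


def score(assignment):
    """Lowest ON output divided by highest OFF output (bigger is better)."""
    on_levels, off_levels = [], []
    for a, b in product([False, True], repeat=2):
        y = simulate(assignment, {"A": a, "B": b})
        (on_levels if (a and b) else off_levels).append(y)
    return min(on_levels) / max(off_levels)


def all_assignments():
    """Every way to place distinct library gates at the netlist positions."""
    positions = [p for p, _ in NETLIST]
    for gates in permutations(LIBRARY, len(positions)):
        yield dict(zip(positions, gates))


def best_assignment():
    return max(all_assignments(), key=score)


best = best_assignment()
print(best, round(score(best), 2))
```

Each library gate is used at most once per circuit, which reflects the point made earlier that the repressors must not interfere with each other. Exhaustive search is only practical for a handful of gates.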
So here we're looking at the results of the software.
It's telling us which gates are connected. And for all of the
different states of the circuit, which sets of repressors are
being produced. It's making sure that those response functions
for all of those gates cross each other nicely, and thereby
making a nice overall circuit function. And then finally, it actually
predicts the experiments. So this is showing the distributions that
an experimentalist would get if they ran a method called flow
cytometry. It allows you to look at tens of thousands of cells.
And then finally, this is the sequence that comes out. And so
this is in NCBI format, and this is the DNA sequence that
then encodes the program that was written a few seconds ago.
And so that could be sent off and synthesized. So once
we had this, we started to go in and design and build these
circuit functions. This is one of the first ones that we tried to
build, and this is what's known as a multiplexer. A multiplexer
is just a circuit that has 3 inputs, and one of those inputs
selects between the other two to figure out what the
output should be. So it's a 3 input, one output logic function.
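As a sketch of that logic function, here is a multiplexer written using only NOR and NOT operations and checked against its truth table. This is one possible gate layout, not necessarily the wiring the compiler produced, and the convention that the selector picks `a` when off and `b` when on is an assumption.

```python
from itertools import product


def nor(a: bool, b: bool) -> bool:
    return not (a or b)


def not_(a: bool) -> bool:
    return nor(a, a)


def mux(select: bool, a: bool, b: bool) -> bool:
    """Output follows a when select is off, and b when select is on."""
    pass_a = nor(select, not_(a))          # (NOT select) AND a
    pass_b = nor(not_(select), not_(b))    # select AND b
    return not_(nor(pass_a, pass_b))       # pass_a OR pass_b


# Print all 8 states of the 3 input, one output function.
for select, a, b in product([False, True], repeat=3):
    print(int(select), int(a), int(b), "->", int(mux(select, a, b)))
```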
What I'm showing is the program that was written, and then
everything that came out of the compiler once it
was run. And so the first thing that had to happen is that
it created the wiring diagram. It then figured out what repressors
need to be at what position of the circuit, and it created the linear
piece of DNA. And then what I'm showing over here is the data.
So we have each of the different states of the circuit.
So all of the different combinations of inputs. And then
in black, we have the experiments, which is showing the response
in tens of thousands of cells, and then the red distribution
is what was predicted by the program if it's low. And if
it's the blue distribution, that's what was predicted if it was
high. And so you can see in this case, we were able to
very accurately get a circuit function that we had written.
And so we did this for more complex circuits, so this is a
priority circuit. Basically, it assigns a priority to three inputs,
and then the three outputs are a determination of that priority.
So again, we wrote this circuit function and hit compile,
and then created the DNA sequence that was associated with that.
And so this time again, we have three outputs and three
inputs. And if you look across all of the states, there's very
close agreement between how the cells performed and how
they were predicted to do. And so this was really a revolutionary
idea in accelerating the way in which we build these circuits.
And so we started to go really fast. So once we had this
ability, whereas before it would take months or years
to design one of these circuits, we could really every time
push a button and get a new prediction and go out
and build it and test it. And so we did this for many,
many circuits. So we built almost a megabase of DNA
associated with roughly 60 different circuit functions that we
tested. And we found that all but 13 worked perfectly the first time.
And those 13 that failed, tended to fail in ways where we could figure
out what happened after the fact. And so this is now
a way where a researcher can have a circuit function
that they want, they can write it, compile it to a piece of DNA
that has a pretty good likelihood of working on the first try.
So then the next step is taking one of these circuit functions
and connecting it up to everything else in the cell. And so
the circuit function is giving the cell the computational ability
to perform some function, but then that gets connected to
sensors that feed into the circuits. And then the output of the
circuits are connected to a variety of different things in the cell that
you're trying to control. So in this example, we're sort of imagining
bacteria that can grow antibiotic cloth by knowing to turn on
silk proteins at certain times, secreting those proteins
out using a protein secretion device, and building the antibiotic
and loading that into the material, and then killing themselves
after it's all done. And you can take a multi-step function like
that and encode it into a cell and control it using a circuit like the ones
that we've built. So now that we've done this for simple model
organisms, we're extending it to a variety of different organisms.
So that we can go after different applications. We've built circuit
functions that work in a bacterium called Bacteroides, and this
lives in the gut. And we've shown that in mice, in which these
Bacteroides bacteria have colonized, by feeding the mice
different foods, we can trigger different circuitry in the
bacteria that's living in the gut and change how the bacteria
are eating food or producing a therapeutic effect as a way of
ultimately creating a human therapeutic. We're also moving
these types of circuits into bacteria that associate with
plants, so that they can be planted with agricultural crops
and then used in order to tell those plants different things that are
happening about their environment, and computing and thinking about that.
And we've even moved some of these basic circuit functions into
yeast, as part of fermentation processes. Or mammalian cells as
part of therapeutics. So with that, I'll conclude. There are a number of
people that participated in this research. Notably, Doug Densmore
is at the electrical engineering and computer science department
at Boston University. And he's an expert at Verilog and
design automation, which was critical for making this work.
There are a number of people, both in his lab and my lab, that have
been involved in this research, everything from building the gates and
insulating them, to building the software package that allows us to
put them together. So with that, thank you.
