By the time that Watson and Crick
figured out the structure of DNA,
you know, it was sort of obvious
that since the two strands were
complementary you could see how it
replicated. And they also could see
that somehow the information must be
encoded in the sequence of letters
down the strands of the DNA.
But it wasn't obvious what the code
was, how it was arranged, or
how it worked. In principle it
could be anything you could do with
four letters. As I pointed out
the other day, this is a kind of
four-letter alphabet.
And I think it's useful to think of
it this way, with A,
G, C and T, and to think of RNA as
also being a four-letter alphabet.
But proteins are actually a
20-letter alphabet because there are
20 different amino acids.
And one of the key things
the DNA had to do was to
encode the information
for making the proteins.
And there was a lot of work on
protein biosynthesis at the time.
And it looked pretty complicated.
People had found that RNA seemed to
be important.  Cells that were
making lots of protein had lots of
RNA in them.  And another thing they
noticed was that if you looked in
eukaryotic cells the DNA stayed in
the nucleus.  The proteins,
most of them, were out in the
cytoplasm.  And the evidence was
that they were made out in the
cytoplasm.  So somehow the
information had to get out of the
nucleus where the DNA was
and into the cytoplasm.
And biochemists were breaking cells
open and trying to make cellular
extracts that would synthesize
proteins.  And I think it's fair to
say at the time that it looked
extremely complicated.
And so thinking about how DNA
encoded information and got
translated into proteins was a very
complex issue.
But then actually there was a very
interesting development that had a
strong influence on Watson and Crick
and led to them,
Crick in particular,
getting a key insight into the
nature of this coding problem.
There was a physicist,
George Gamow, whom some of you may
know of. He helped develop the Big
Bang theory. A very strong
theoretical physicist.
And he wrote a letter to Watson and
Crick.  He thought he'd figured out
the basis of the genetic code.
His idea was that you had these
sequences of A, G, C, and T,
and everywhere two bases came
together there was a little hole
with a different shape.
The amino acids
would stick into these little holes.
And he had a theory showing that
you could encode the sequence of
proteins by having the side chains
of the amino acids stick into these
little holes along the DNA.
Now, there turned out to be a
number of problems with that.
It didn't take into account the
involvement of RNA,
for which there was quite a bit
of evidence.
And more importantly it didn't take
into account the structure of the
side chains of the amino acids,
which you guys have been exposed to.
But it had a very profound
influence on Watson and Crick.
They read this letter. They
immediately realized the idea was
wrong, and went out to lunch at a
pub, where they went over the amino
acids again. They had actually
thought there were 25 amino acids,
but they realized some of them were
special ones, modified only in
particular proteins, and that there
were really 20 amino acids found
universally in proteins in nature.
And what they,
Crick in particular,
realized was that instead of
thinking about protein synthesis
through the very complex extracts
and mixtures a biochemist would
work on, he could think about it at
a purely theoretical level,
basically at the level of these
alphabets: if you have a molecule
that has four letters and it's
going to encode proteins,
how does it do it?
Can I work out sort of the basis or
a possible theory for how that could
happen without actually knowing all
of the biochemical details?
So Crick made a couple of
simplifying assumptions.
One was that the DNA determined
only the linear sequence of amino
acids in a protein, and that all the
information about the
three-dimensional structure came from
the properties of the linear sequence
once it was made.
And I think you hopefully have
enough understanding of hydrophobic
and other sorts of interactions that
would cause a linear sequence of
amino acids to take a particular
conformation. And the other
assumption he made was that
the code must be universal.
And it would be hard to see how life
could have started if there wasn't
some kind of code that was universal
between organisms.
And if you start from those kinds
of considerations, then what you can
see is that you cannot just have a
one-to-one correspondence between a
letter in the nucleic acid alphabet
and a letter in the amino acid
alphabet. If A stood for valine that
would be fine, but you could only
code for four amino acids that way.
If you had one-letter words in
DNA, there are four possibilities,
so it could only make four. If
you had two-letter words, then
you'd have 16 possibilities,
still not enough for all the amino
acids. If you had three-letter
words, then you could make 64,
and in principle that would be all
you'd need. That doesn't rule out
that there could be five-, six-, or
seven-letter words.
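The counting argument can be checked directly. Here's a minimal Python sketch; the function names are made up for illustration, and it only counts possible words, assuming non-overlapping words of one fixed length.

```python
# Number of distinct "words" of length n in a four-letter alphabet is 4**n.
ALPHABET_SIZE = 4   # A, G, C, T in DNA (A, G, C, U in RNA)
AMINO_ACIDS = 20    # amino acids the code has to specify

def words_of_length(n: int) -> int:
    """Count the distinct nucleotide words of length n."""
    return ALPHABET_SIZE ** n

def shortest_sufficient_word_length(needed: int) -> int:
    """Smallest word length whose number of words covers `needed` meanings."""
    n = 1
    while words_of_length(n) < needed:
        n += 1
    return n

for n in (1, 2, 3):
    print(n, words_of_length(n))   # 1 4, then 2 16, then 3 64
print("shortest length for 20 amino acids:",
      shortest_sufficient_word_length(AMINO_ACIDS))   # 3
```

Three is only the minimum; as noted above, the counting alone can't rule out longer words.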
And if you think about this as they
were thinking about it at the time,
even if it were, let's say, a
three-letter word,
is it a code where you have one word,
then the next word,
then the next word? Or could the
words overlap? And
what about punctuation?
Another thing: if the sequence
is A, G, C, T, and so on,
there's a frame-of-reference
problem, because if I'm going to
read the letters in groups of three
and I start here, I'll get one word,
but if I start one letter over, the
next group of three won't be the
same. So somehow there would have
to be a starting point.
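The frame-of-reference problem is easy to see in code. This is a minimal sketch, assuming non-overlapping three-letter words with no punctuation; `read_in_frame` and the example sequence are hypothetical.

```python
def read_in_frame(seq: str, start: int, word_len: int = 3) -> list[str]:
    """Split seq into consecutive, non-overlapping words beginning at `start`.
    A partial word left over at the end is dropped."""
    return [seq[i:i + word_len] for i in range(start, len(seq) - word_len + 1, word_len)]

seq = "AGCTAGGCT"   # arbitrary example sequence
for frame in range(3):
    print(frame, read_in_frame(seq, frame))
# 0 ['AGC', 'TAG', 'GCT']
# 1 ['GCT', 'AGG']
# 2 ['CTA', 'GGC']
```

The same nine letters give three different sets of words depending only on where reading starts.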
So these are the sorts of
considerations they had to take
into account. And, in fact,
Francis Crick, Sydney Brenner, and
some other scientists worked out a
very elegant genetic experiment that
demonstrated that it was a
three-letter code.
I don't have the time to go
into it in this course, but
if you take a genetics course, it's
a very beautiful experiment.
The principle of the thing, which I
can show you rather easily, is this.
Suppose you're reading a message in
three-letter words, something like
THE CAT RAN OUT AND,
I don't know, ATE THE RAT, or
something like that.
Imagine these are all just
run together continuously,
not separated out; I've spaced them
out here so you can see they're
three-letter words.
If you lost one letter, everything
after it would turn into gibberish.
You'd get stuff that looked like
this.
And if you put one letter in you'd
have the same problem. But if you
were to take out three letters or
put in three letters, then,
even though there'd be a little mess
somewhere in here
(say I took out two more of these),
the rest of the message from then
on would make sense again.
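Here's the same sentence analogy as a small Python sketch. The message, the deletion positions, and the helper names are made up for illustration; it assumes the message is always read in three-letter words starting from the first letter.

```python
def words(text: str) -> list[str]:
    """Read run-together text as consecutive three-letter words from the first letter."""
    return [text[i:i + 3] for i in range(0, len(text) - 2, 3)]

def delete_positions(text: str, positions: set[int]) -> str:
    """Drop the letters at the given indices (indices refer to the unmutated text)."""
    return "".join(ch for i, ch in enumerate(text) if i not in positions)

message = "THECATRANOUTANDATETHERAT"
print(words(message))
# ['THE', 'CAT', 'RAN', 'OUT', 'AND', 'ATE', 'THE', 'RAT']
print(words(delete_positions(message, {4})))          # lose one letter
# ['THE', 'CTR', 'ANO', 'UTA', 'NDA', 'TET', 'HER']
print(words(delete_positions(message, {4, 7, 10})))   # lose three letters
# ['THE', 'CTR', 'NOT', 'AND', 'ATE', 'THE', 'RAT']
```

Losing one letter garbles everything after it, while losing three leaves only a local mess before AND ATE THE RAT reads correctly again.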
And they did this sort of experiment
genetically. They managed to figure
out that there were two kinds of
mutations they could get in a
particular way.
Some were putting in a letter.
Some were taking out a letter. They
didn't know at the time which class
was adding and which was deleting,
but they could tell the two classes
worked in opposite directions.
And then they found that if they
took three of one class,
like three that would each delete a
letter, and put them all together,
then things would more or less work.
Or if they put together three that
each stuck in an extra letter, then
everything would more or less work.
So there was a genetic proof of the
three-letter part of the code before
it was figured out exactly how the
code itself worked.
So going from this
theoretical insight into the code to
actually figuring out how proteins
were made, there was still quite a
lot that had to happen.
One piece was the concept of
messenger RNA.
As I said, there'd been quite a lot
of evidence that RNA was somehow
involved in protein synthesis
because cells that made a lot of
protein made a lot of RNA.
And it seemed to be in the right
sort of place in the cell for the
proteins to be made.
So the idea emerged that RNA was
somehow a carrier of information
from the DNA to the cytoplasm,
so it could serve as a template for
making proteins.
The idea was that the cell copied
the sequence of a portion
of the DNA (what we'd now think of
as a gene) into RNA, and the RNA
would go out into the cytoplasm,
the part of the cell outside the
nucleus. There it would serve
as a template for protein synthesis.
The thought was that if you had a
cell like this, with a nucleus
and the DNA in here,
and a piece of RNA were to go
out into the cytoplasm and have
those properties, it would be
functioning more or less as a
messenger, carrying the
genetic information from inside the
nucleus out into the cytoplasm.
And so people began to use the term
messenger RNA.
And so over here I'll put an mRNA to
indicate that.
Now, one thing you can also see is
that we've talked about the structure
of DNA and RNA, and they're
essentially the same, with one
difference. This is the nucleotide,
which is the fundamental building
block of DNA.
And if you recall, in DNA there's a
hydrogen on the 2' carbon of the
sugar, but in RNA there is an extra
hydroxyl there. (The carbons of the
sugar are numbered 1', 2', 3', 4',
and 5'.) So DNA, as you heard, is
deoxyribonucleic acid, because it's
missing this oxygen.
But other than that the backbones
are similar and the letters are
almost the same.
The A, the G and the C are exactly
the same bases in DNA and RNA.
The only difference is between T
and U.
So this is thymine, which is found
in DNA.
And this is uracil, which is found
in RNA. Thymine is just uracil with
an extra methyl group, and the base
pairing happens on a different part
of the molecule, so whether or not
you have that methyl group doesn't
really change the base pairing.
So this process of copying
information in DNA into
information in RNA was seen as
staying in essentially the same
language. It's like taking
somebody's word processor file and
writing it out longhand.
You'd be transcribing the
information, but it would be
essentially the same kind of
information in essentially the same
form. So this is known
as transcription.
Let me make one very brief aside.
Some of you may wonder why nature
did it this way. Why didn't it just
use uracil in DNA? I think we
understand pretty well why.
It has to do with cytosine, which
has this structure. This is C, which
is found in DNA. All of your DNA is
a chemical, and it can undergo
spontaneous kinds of damage.
In fact, in every one of our human
cells, about 10,000 times a day, a
base falls off completely, just
leaving the deoxyribose sitting
there, and the cell has to fix it
up. We have DNA repair systems that
do that. But another very common
kind of thing that happens is that
this NH2 group is lost, which is
called deamination. If a C happens
to deaminate in DNA, it
gives you a uracil.
And if that ever happens,
the cell is actually able to tell
that something went wrong because
uracil is not supposed to be in DNA
and there are repair systems that
constantly scan the DNA and take out
any uracils that are in there.
If DNA used uracil instead of
thymine, the cell wouldn't know
whether a uracil was there because
it was supposed to be part of the
sequence or because it had arisen
by deamination of a cytosine.
It's a minor point, but I think we do
have an understanding of why
there's thymine in DNA and uracil
in RNA. This isn't such a worry in
RNA. OK. But anyway.
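The logic of the deamination argument can be sketched as a toy model. Everything here is a simplification for illustration: the strands are plain strings, the partner strand is written base-by-base under the damaged one, the helper names are hypothetical, and the real repair pathways, which involve several enzymes, are reduced to one function.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def deaminate(strand: str, pos: int) -> str:
    """Toy damage: the C at `pos` loses its amino group and becomes U."""
    if strand[pos] != "C":
        raise ValueError("in this toy model only C deaminates (to U)")
    return strand[:pos] + "U" + strand[pos + 1:]

def scan_and_repair(strand: str, partner: str) -> str:
    """Replace every U using the base on the partner strand.
    `partner` is written position-by-position under `strand` (not reversed)."""
    fixed = []
    for base, opposite in zip(strand, partner):
        if base == "U":                          # U never belongs in DNA, so it must be damage
            fixed.append(COMPLEMENT[opposite])   # restore the base that pairs with the partner
        else:
            fixed.append(base)
    return "".join(fixed)

top = "ATGCCGTA"
bottom = "".join(COMPLEMENT[b] for b in top)   # TACGGCAT, aligned under top
damaged = deaminate(top, 3)
repaired = scan_and_repair(damaged, bottom)
print(damaged, repaired)   # ATGUCGTA ATGCCGTA
```

Because U never belongs in this toy DNA, the scan can flag it unambiguously; if U were a normal DNA letter, the same scan would have no way to tell damage from sequence.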
So there's still a really big
problem here, though,
that Watson and Crick and others
were grappling with.
And it has to do, as I say, with the
fact that the information is first
written in DNA and RNA as a
sequence of letters,
chemical letters if you will,
but there are only four letters in
the DNA alphabet and essentially the
same four letters in
the RNA alphabet.
However, the protein language has
a totally different alphabet, so
it's like translating from English
to Japanese or something like that.
Some really fundamental change had
to happen, because there was a real
conversion from one kind of language
to another. So this process is
known as translation: going from
information that's written using a
four-letter nucleic acid alphabet to
information that's written using a
20-letter amino acid alphabet.
And Crick on purely theoretical
grounds figured,
well, if you're going from one
language to another what do you need?
You need a translator.
And what's a translator?
A translator is someone who speaks
both languages.
So his idea was this. Let's say
this is the messenger RNA, and just
for clarity I've spaced out the
three-letter words so we can see
them. Each would be three letters,
like G-A-C or something like that,
in the RNA. There would be some kind
of translator, something that had a
particular amino acid at one end and
the complementary nucleotides at the
other end.
So it could, if you will,
read the genetic code that was
written in the RNA using the nucleic
acid alphabet,
but it would also be speaking the
amino acid language.
Got the idea? They used the words
adaptor or translator for this.
That was on basically theoretical
grounds: if you had to go from a
four-letter language to a 20-letter
language, you needed some kind of
translator or adaptor. Now,
at that same time that these
considerations were going on,
biochemists began to find a class of
small RNAs that had an amino acid
attached. And so there were
entities that had just the sort of
properties that Crick had envisioned
you'd need from theoretical
considerations.
These were given the name transfer
RNAs or tRNAs as they're usually
referred to now.
And I've told you that because RNA
has nucleic acid bases, if you have a
single strand of either RNA or DNA
without a complementary strand,
then any complementary sequences
within it can come together and
pair, just the same way that
complementary sequences come
together in DNA.
And in the case of tRNAs,
once their sequences were
determined, it turned out that they
folded up into a cloverleaf shape.
And the amino acid is attached up
at the 3 prime end of the chain up
here in what's known as the acceptor
part of the molecule.
And so that corresponds to this
part up here.  And here is what's
known as the anticodon.
Each of these three-letter words
in nucleic acid language is
called a codon. And something that
had a sequence complementary to a
codon was called an anticodon.
So if G-G-G is the codon, then C-C-C
would be the anticodon.
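Because the codon and anticodon pair antiparallel, an anticodon written 5' to 3' is the reverse complement of its codon. A minimal sketch, assuming only standard base pairing (A with U, G with C); `anticodon` is a hypothetical helper name.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon: str) -> str:
    """Anticodon (written 5'->3') for an mRNA codon (written 5'->3').
    The strands pair antiparallel, so complement each base and reverse."""
    return "".join(RNA_PAIR[b] for b in reversed(codon))

print(anticodon("GGG"))   # CCC
print(anticodon("GAC"))   # GUC
```

For G-G-G the reversal makes no difference, which is why C-C-C comes out directly; for G-A-C it gives G-U-C.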
Now, this is just a schematic, as
you can see. It shows where the
hydrogen bonds are that hold this
structure together. The first
crystal structure of a tRNA was
actually done by Alex Rich. He's in
the Biology Department at MIT,
and he was in the picture I showed
you talking to Matt Meselson.
And although we can't see this
terribly well (maybe you could hit
the lights here), the crystal
structure showed that the molecule
didn't look like the cloverleaf
drawn there. It had more of this
shape, and I'll show you this more
clearly in this picture. I showed
you part of this when I was showing
you how an RNA could fold up.
Say you copy the gene encoding a
tRNA, and the sequence here in
green is complementary to the
sequence here,
and the sequence here in blue
or purple is complementary
to the sequence here.
Then if you allow a single-stranded
RNA like this to fold up,
thermodynamically it will go to
the lower-energy state, which
involves making these
hydrogen bonds.
And I think you can sort of see the
cloverleaf. Here's one of the
leaves, another is down here, and
there are the others. It's a little
bit distorted here because I'm
going to continue now to show you
how this structure, once you get to
the cloverleaf, folds up further to
make other kinds of interactions.
It takes that shape, with the amino
acid going on at this end and the
anticodon down here. What's
happening now is they've added the
van der Waals surfaces so you can
see the three-dimensional shape.
The amino acid would be attached at
that end, and there is the anticodon
that would be able to recognize
the codon in the mRNA.
I mean the physical reality is
pretty close to this simple little
depiction here.
OK. So once this basic paradigm
had been straightened out, putting it
all together gave rise to this idea:
a portion of the information in DNA
would be copied into RNA, and that
RNA would go out into the
cytoplasm.
And then in the cytoplasm these
translators, the tRNAs would be able
to decode, read the nucleic acid
information and use that to
determine the linear order of amino
acids in a protein.
Crick, when he came up with this,
called it "the central dogma".
People still use this term for the
idea of information flowing from DNA
to RNA to protein, and it's still
used to this day.
There's actually a little twist to
that, because at the time Crick
proposed the term, he thought that
the word dogma meant "an idea for
which there is no reasonable
evidence."
But he was amused years later to
realize that a more accurate
definition of dogma is something
that a true believer cannot doubt.
So he had accidentally asserted that
he was right, but fortunately he was
right. Now,
the next big job in working this
out was to crack the code.
And it's fine to know that it's a
3-letter code and it's fine to know
it goes into RNA and then the tRNAs
translate it, but if you cannot
crack the code then you have no idea
what any of the information means.
It was sort of like before the
Rosetta Stone they could look at the
hieroglyphics in the Egyptian tombs
and they could see that it was a lot
of information and there were
symbols and so on,
but they didn't know what it meant
until finally they got something
that allowed them to relate it to a
language they did know and they were
able to work out the principles.
So scientists then had to
crack the code.
And there were two scientists who
played a really big role.
One was Marshall Nirenberg who was
at NIH and is,
in fact, still at NIH.
And the other was a scientist who's
on the same floor as me at MIT,
Gobind Khorana. And they used two
different approaches,
but between these two approaches the
genetic code was cracked.
What Nirenberg did was take a
protein-synthesizing extract that he
knew needed RNA in order to work.
That wasn't a surprise at this
point, because people were thinking
the RNA would be the message.
At that time the ability to
synthesize nucleic acids was
quite limited compared
to what we can do now,
and there were different ways of
making them. Sometimes you could do
it enzymatically.
What Nirenberg was able to make,
for example, was poly-U, an RNA that
was just UUUUUUU. Then he set up 20
reactions. In every reaction he put
some of this extract, the poly-U,
and 19 of the amino acids
unlabeled, plus one amino acid,
a different one in each reaction,
that carried a radiolabel.
So he ran these 20 reactions and
waited to see whether any of them
made labeled protein coded by the
poly-U.
What he ended up with was
polyphenylalanine.
As you may recall from when we were
talking about structures of amino
acids, there's the basic backbone,
and phenylalanine is the
one that has, if you will,
a benzene ring hanging off the end.
So what that meant was that UUU
must code for Phe,
or phenylalanine.
And if it's UUU in the RNA, the DNA
that encodes it must have the
sequence TTT on one strand and AAA
on the other. Since T base-pairs the
same way as uracil, one of the two
strands of the DNA is going to have
the same sequence as the RNA.
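The logic of the poly-U experiment can be mimicked in a few lines. This is a toy model: the codon table holds only UUU, the helper names are hypothetical, and "radiolabel shows up in the product" is reduced to checking which labeled amino acid ends up in the chain.

```python
# Only the codon this experiment needs; a tiny excerpt of the full genetic code.
CODON_EXCERPT = {"UUU": "Phe"}

STANDARD_AMINO_ACIDS = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
]

def translate(mrna: str, table: dict) -> list:
    """Read the message in consecutive three-letter codons from the first base."""
    return [table[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]

poly_u = "U" * 30
product_chain = translate(poly_u, CODON_EXCERPT)

# Twenty reactions: in each one a different amino acid carries the radiolabel.
labeled_products = [aa for aa in STANDARD_AMINO_ACIDS if aa in product_chain]
print(labeled_products)   # ['Phe']

# The DNA strand with the same sequence as the RNA (T for U), and its partner strand.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
same_as_rna = poly_u.replace("U", "T")
partner = "".join(DNA_PAIR[b] for b in same_as_rna)
print(same_as_rna[:3], partner[:3])   # TTT AAA
```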
Now, I'll just tell you one brief
little anecdote.
I heard Marshall Nirenberg at the
meeting they had to celebrate the
50th anniversary of the discovery of
the structure of DNA. And he posed something that
I'd never thought about in my years
of teaching this but might occur to
you guys if we put it
on a problem set.
You all know that this side chain
is basically benzene, which is why
we even refer to it as a benzene
ring, and benzene is a very organic
kind of solvent. So suppose we asked
on a problem set: if you've made
polyphenylalanine, would you expect
it to be soluble in water? Well,
this is very, very hydrophobic,
very, very water-hating. So your
answer would be correct
if you said no, I wouldn't expect
polyphenylalanine to be
soluble in water.
In fact, if it were in a protein
you'd expect it to be in the core,
where the hydrophobic, water-hating
parts go. So Marshall Nirenberg
said in his talk that
he had shown that he had
radioactive phenylalanine,
but he still had to prove chemically
that he had polyphenylalanine.
He wasn't much of a biochemist,
so he walked downstairs to another
lab at NIH, walked in the door, went
up to the first person he saw, and
asked, how do you solubilize
polyphenylalanine?
Just to make sure I got this right.
And the guy said, oh, you just take
33% hydrobromic acid and glacial
acetic acid, and it works.
So he went back upstairs and tried
it, and it turned out it did
dissolve in that.
And he went on and characterized it.
And he said he didn't learn until
about 15 or 20 years later that he
had just walked up to the only person
in the world who knew how to
solubilize polyphenylalanine.
By total coincidence, the guy he had
talked to had been working away
trying to figure out a way and had
come up with this odd mix of
hydrobromic acid and glacial acetic
acid. Of all the places in the world,
he had walked up to the one person
who knew, and got the answer.
So the other part of the story
involves Gobind Khorana,
whom I mentioned when I was telling
you initially about the Nobel
Laureates at MIT.
Gobind is a brilliant organic
chemist. He synthesized DNA.
At one point a whole issue of a
journal came out with nothing but
his lab's work on synthesizing DNA.
So he was good at nucleic acids.
One of the strategies they could use
chemically was to make a
dinucleotide like CA and then
polymerize it to make a piece of
RNA. So they could make an RNA that
had the sequence CA, CA, CA, CA, and
so on. And what you can see is that
there are two different codons in
that: one is CAC and the other is
ACA. He got this repeating sequence
because he was synthesizing the RNA
by polymerizing dinucleotides.
So in the same kinds of experiments
I was describing before, what they
found this RNA directed the
synthesis of was alternating
histidine and threonine.
From that experiment alone you
can't tell which is which. One of
the codons must be histidine and the
other must be threonine, so more
experiments were needed. What was
learned from those experiments was
that CAC corresponds to histidine
and ACA corresponds to threonine.
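Why a repeating CA message gives exactly two codons, whatever the starting point, can be checked like this. A sketch only: the helper name is made up, and the codon assignments are the ones stated above rather than something the code works out.

```python
def codons_in_frame(seq: str, frame: int) -> list[str]:
    """Consecutive codons starting at offset `frame` (0, 1, or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

poly_ca = "CA" * 9   # CACACA... (18 bases)
ASSIGNMENT = {"CAC": "His", "ACA": "Thr"}   # assignments established by later experiments

for frame in range(3):
    codons = codons_in_frame(poly_ca, frame)
    print(frame, codons, [ASSIGNMENT[c] for c in codons])
```

In every frame the codons alternate CAC, ACA, which is why the product alternated histidine and threonine, and why this experiment alone couldn't say which codon was which.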
So these kinds of experiments were
then put together to give what's
known as the genetic code: the
three-letter words in DNA that
encode the sequences of amino
acids in proteins.
It's usually displayed as a table,
and you read it this way. The column
over here is the first base in the
codon, across the top is the second
base, and down over here is the
third base. So take the codon for
histidine we were just looking at.
C is the first letter and A is the
second letter, so this is the box
we're going to be looking at.
And with C as the third letter, we
can see it encodes histidine. Or for
ACA, start back at A as the first
letter, then C, then A, and that's
threonine. But you can also see
something else here.
And that is that, because there are
64 possible three-letter words, the
code is what's known as degenerate.
That is, there are more words in the
genetic code than are needed to
specify the number of amino acids
that have to be coded.
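Reading the table by first, second, and third base amounts to simple index arithmetic. A minimal sketch, assuming the usual U, C, A, G ordering of rows and columns and one-letter amino acid codes (H for histidine, T for threonine, F for phenylalanine, M for methionine, * for stop); the `lookup` helper is hypothetical.

```python
BASES = "UCAG"   # row/column order used in the usual printed table
# Standard genetic code as one-letter amino acid codes, first base varying
# slowest and third base fastest; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base U
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)

def lookup(codon: str) -> str:
    """First base picks the block of rows, second base the column (box),
    third base the line within that box."""
    first, second, third = (BASES.index(b) for b in codon)
    return AMINO_ACIDS[16 * first + 4 * second + third]

print(lookup("CAC"), lookup("ACA"), lookup("UUU"), lookup("AUG"))   # H T F M
```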
So I just want to make a couple of
points about the genetic code.
It's degenerate.
There are 61 codons that correspond
to an amino acid, which means that
for some amino acids (threonine is
a good example) there's more than
one word in the genetic code.
There are three codons for which
there is no corresponding amino
acid, and those mean stop.
That makes sense, because if
you're reading down a piece of RNA,
at some point you'd have to end the
protein, and there are actually
three codons used for that purpose.
And although there's some small
variation on this in nature, there's
usually one amino acid used for
starting a protein,
and that's methionine.
Its codon is AUG, right there.
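The degeneracy can be counted straight from the table. A minimal sketch, assuming the standard genetic code written as one-letter amino acid codes in U, C, A, G order, with "*" for stop; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, first base slowest, third base fastest; "*" = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# product() varies the last position fastest, matching the string's order.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

per_amino_acid = Counter(aa for aa in CODE.values() if aa != "*")
sense_codons = sum(per_amino_acid.values())
stop_codons = sorted(c for c, aa in CODE.items() if aa == "*")
threonine_codons = sorted(c for c, aa in CODE.items() if aa == "T")
methionine_codons = [c for c, aa in CODE.items() if aa == "M"]

print(len(per_amino_acid), "amino acids")   # 20 amino acids
print(sense_codons, "sense codons")         # 61 sense codons
print("stops:", stop_codons)                # ['UAA', 'UAG', 'UGA']
print("threonine:", threonine_codons)       # ['ACA', 'ACC', 'ACG', 'ACU']
print("methionine:", methionine_codons)     # ['AUG']
```

Threonine's four codons differ only in the third base, which is the kind of redundancy degeneracy refers to.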
Now, some of this stuff probably
sounds like it's been around forever,
and that's certainly true of some of
the stuff you hear in your chemistry,
math and physics courses.
I just want to drive this home.
When I was an undergrad, Watson's
first book, Molecular Biology of
the Gene, had come out. That was
when I was your age, and I
realize that I look ancient but,
you know, at least I'm still here.
When I was an undergrad I had
Watson's book, and this was the
genetic code that was in the book,
the genetic code as of
May 1965. And you'll notice there
are gaps in here.
And all the things that are
underlined were things for which
there was a tentative assignment.
So although you may take this and
think that it's been knowledge
that's been around forever,
it wasn't even complete in the
textbook when I was
an undergrad.  OK.
So one of the things that's
important to think about is that
this nucleic acid code is the basis
of how proteins are encoded in the
DNA, but everything else has to be
there, too. The genetic code,
which is what we've been talking
about, is universal. But there are
other languages written in the DNA
that are not universal. One of them
was that little example I gave you
of an origin of replication.
E. coli starts DNA replication at
only one very particular point in its
chromosome, which has a particular
DNA sequence about 250 nucleotides
long. So you could think of that as
a language, a "start chromosome
replication" language.
It's only got one word in it, and
the word is 250 nucleotides long.
Another very important place is
where you make an RNA copy, that is,
where you do transcription of a
piece of DNA. I'll call this the
coding sequence: the sequence of
three-letter words that specify the
amino acids of the protein. If you
were going to make an RNA copy of
that, you would have to have
a sequence somewhere up here that
means start transcription, and one
at the end, some other sequence of
letters in the nucleic acid, that
means stop transcription.
The start signal is given the
technical term promoter. The stop
signal is referred to as a
terminator. We'll say more about
these.
One of the beauties of having this
system of making an RNA copy is
that it provides a beautiful point
of regulation, because the cell can
determine whether or not it's going
to make a particular protein by
whether or not it chooses to make
the RNA copy.
So having this RNA intermediate
and being able to control
transcription is a really important
part of the whole regulation that
makes life possible.
The transcription is carried out by
an enzyme that's known as RNA
polymerase.  And let me make one
more point.  These promoters and
terminators are not universal.
So when we talk about recombinant
DNA a little later in the course:
if I take a mouse gene and put it
in E. coli, then even though the
genetic code is the same, so the
same sequence of amino acids would
be specified, you won't get the RNA
made, because the sequences that say
start transcription and stop
transcription are different between
a mouse and a bacterium.
So you can see from first
principles that if you're doing
recombinant DNA and you want to
express the mouse protein in E.
coli, you have to fiddle
around with the sequences up here
and the sequences down there,
the parts that are not universal.
You guys with me?
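Here's a toy model of why a mouse gene isn't transcribed in E. coli. The signal sequences and the helper name are invented placeholders, not real promoters or terminators; real signals are longer, variable, and aren't recognized by exact string matching. The point is only that the same coding sequence gets copied or ignored depending on whether its start and stop signals match the host.

```python
from typing import Optional

# Invented placeholder signals for illustration only; NOT real promoter or
# terminator sequences for either organism.
HOST_SIGNALS = {
    "bacterium": {"start": "GGGAAA", "stop": "CCCGGG"},
    "mouse": {"start": "AAACCC", "stop": "GGGTTT"},
}

def transcribe_in_host(dna: str, host: str) -> Optional[str]:
    """RNA copy of the stretch between this host's start and stop signals,
    or None if the host's signals aren't found."""
    signals = HOST_SIGNALS[host]
    begin = dna.find(signals["start"])
    if begin == -1:
        return None                     # no recognizable start signal: no RNA
    begin += len(signals["start"])
    end = dna.find(signals["stop"], begin)
    if end == -1:
        return None                     # no recognizable stop signal
    return dna[begin:end].replace("T", "U")

coding = "ATGTTTCACACATAA"              # same coding sequence in every construct
mouse_gene = "AAACCC" + coding + "GGGTTT"
rewired_gene = "GGGAAA" + coding + "CCCGGG"

print(transcribe_in_host(mouse_gene, "mouse"))        # AUGUUUCACACAUAA
print(transcribe_in_host(mouse_gene, "bacterium"))    # None
print(transcribe_in_host(rewired_gene, "bacterium"))  # AUGUUUCACACAUAA
```

Swapping the flanking signals for the bacterial ones, while leaving the coding sequence untouched, is that fiddling around in cartoon form.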
OK. So what does an RNA polymerase
do? It recognizes this sequence,
and then it teases the strands apart
to make a little bubble like this.
So let's say ATAGCAT. Then the other
strand would be TATCGTA.
And then RNA polymerase,
unlike a DNA polymerase, can begin a
chain de novo.
Remember, an important thing about
DNA polymerases was that they had to
have a primer terminus to get
started; that's why each Okazaki
fragment needed its own primer.
So this is DNA. The top strand runs
5 prime to 3 prime, and the bottom
one 3 prime to 5 prime.
What an RNA polymerase uses as
substrates are ATP, GTP, CTP, and
UTP. No deoxy sugars here; of
course, this is RNA.
So it uses triphosphates just the
same way DNA polymerases do,
and it's able to start a chain
de novo.
And it synthesizes the RNA in the 5
prime to 3 prime direction,
the same direction in which DNA
polymerase makes a strand of DNA.
So it would copy here. It would put
in an A opposite a T, and then,
because it's RNA, a U opposite an A,
and then A, G, C, A, U, and so on.
So this right here is the beginning
of the RNA that's being synthesized
by the RNA polymerase.
This strand is known as the
transcribed strand.
And by default then that one is the
non-transcribed strand.
And what you can see is that it's
making an RNA with the same sequence
as the strand up here, except that
everywhere there's a T in the DNA
there's now a U.
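Here's a sketch of the copying step: pair each base of the transcribed strand in turn, and the RNA comes out with the same sequence as the non-transcribed strand, with T replaced by U. It assumes the transcribed strand is written 3' to 5' beneath the other strand, so the RNA grows 5' to 3' from left to right; the seven-base duplex and the helper name are just examples.

```python
# Pairing rule: template DNA base -> RNA base put in opposite it.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """RNA (5'->3') made by pairing each base of the transcribed strand in turn.
    The template is given 3'->5', so the RNA grows left to right, 5'->3'."""
    return "".join(TEMPLATE_TO_RNA[b] for b in template_3_to_5)

non_transcribed = "ATAGCAT"   # 5'->3' (example)
transcribed = "TATCGTA"       # 3'->5', paired base-by-base with the strand above
rna = transcribe(transcribed)
print(rna)                                       # AUAGCAU
print(rna == non_transcribed.replace("T", "U"))  # True
```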
So the final thing then is how this
information gets all put together to
make proteins.
And protein synthesis is done by an
amazing machine known as the
ribosome. It's made up of some
special large RNAs, called rRNAs,
and some proteins as well.
These make up the ribosome.
It also needs an mRNA and
the various tRNAs, each of which
carries the amino acid that's
appropriate to its anticodon.
Very briefly, and you can see this
in your textbook, here's what the
ribosome does. Let's say this is the
mRNA; I'm just going to take three
codons here. This mRNA threads
into the ribosome, and the ribosome
is able to recognize the first codon
and the second codon. Remember, of
course, there's no spacing like this
in the RNA. Then, in the context of
this large factory, it's able to
find the tRNA that has amino acid
one and the anticodon that
corresponds to this codon, and the
tRNA that has the next amino acid
attached and the matching anticodon.
So you can see what's happened.
It's been able to order the first
amino acid encoded by that codon and
put it physically right next to the
next amino acid coded here. And then
it catalyzes the formation of a
peptide bond.
What makes that happen is that
there's energy stored in the bond
that joins this amino acid to its
tRNA, and thermodynamically that
allows this bond formation to go.
Now you end up essentially with
this. What happens next is that
everything clicks over by one codon.
You could think of it as the whole
RNA shifting over one, so the one
that used to be here is now sticking
outside. Here's part of the
ribosome, and here's the next codon.
What we have here is the tRNA
that's got amino acid two joined to
amino acid one.
The next codon specifies the next
amino acid which is three.
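The elongation cycle can be sketched as a loop: match a tRNA to the codon by its anticodon, add its amino acid, shift by one codon, and release the chain at a stop codon. This is a toy model, not how a real ribosome works in detail: it assumes one tRNA for every sense codon, starts at the first AUG, uses one-letter amino acid codes, and all names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter codes, first base slowest; "*" = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon: str) -> str:
    """Anticodon (5'->3') that base-pairs with a codon (5'->3')."""
    return "".join(RNA_PAIR[b] for b in reversed(codon))

# Simplified tRNA pool: one charged tRNA per sense codon, keyed by anticodon.
# Stop codons get no tRNA. (Real cells use fewer tRNAs; not modeled here.)
TRNA_POOL = {anticodon(c): aa for c, aa in CODE.items() if aa != "*"}

def translate(mrna: str) -> str:
    """Toy ribosome: start at the first AUG, then move one codon at a time."""
    pos = mrna.find("AUG")
    if pos == -1:
        return ""                                      # no start codon, no protein
    peptide = []
    while pos + 3 <= len(mrna):
        codon = mrna[pos:pos + 3]
        amino_acid = TRNA_POOL.get(anticodon(codon))   # tRNA whose anticodon pairs with the codon
        if amino_acid is None:                         # stop codon: no tRNA, release the chain
            break
        peptide.append(amino_acid)                     # stands in for forming the peptide bond
        pos += 3                                       # shift over by one codon
    return "".join(peptide)

print(translate("GGAUGUUUCACACAUAAGG"))   # MFHT
```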
And the process is then able to go
on like that.  Now,
the crystal structure of the
ribosome was just recently solved.
And I guess we've got as many
lights out as we can do right now.
It's absolutely remarkable.
It's mostly RNA.  The gray stuff
and the blue stuff are two huge RNAs
that are all folded up in
3-dimensional space.
And these things that are sort of
stuck on the outside,
these purple things here or the dark
blue things here that sort of look
like cherries stuck on the outside
of a cake, those are proteins.
So most of this is RNA, big balls
of RNA with proteins kind of
decorating the outside.
The mRNA is a green thing that
snakes through.
There's the mRNA.
See it snaking through?
And maybe you can recognize the
tRNAs in the middle.
There's an orange one and a yellow
one.  Those correspond to the two
tRNAs I depicted here.
And I'm just going to see if I can
stop this.  There's a viewpoint I'd
like you to see when it comes around
again here in just a second.
I'll see if I can catch it there.
Right there.  Here's one of the
tRNAs in yellow.
And its end is right there.
And there's the other tRNA.
And its end is right there. So
this corresponds to the point at
which a peptide bond is going to be
formed, and something is going
to catalyze the formation of that
bond. Well, the next picture
shows what happens if you pull
that apart. And what you'll see is
that here's the end of one tRNA,
there's the end of the other,
and there's nothing near them except
RNA.
So RNA is actually catalyzing the
formation of the peptide bond.
Another way to say that would be
that the ribosome,
which is the protein synthesizing
factory, is a ribozyme.
Remember I said most of the
chemical reactions that need
catalysts are carried out by
proteins but there are a few that
are carried out by RNA where
RNA is the catalyst?
And remarkably, the formation of the
peptide bond, which is at the heart
of proteins, which are so important
for all life, is catalyzed by RNA.
If you look at what makes proteins,
what do you see? You see huge balls
of RNA, an mRNA threading through,
two tRNAs, and the enzyme activity,
the catalytic activity, is provided
by the RNA as well.
As I said, people think possibly
there was an RNA world that preceded
our present-day world with DNA,
RNA and protein.  And who knows?
But this sort of look at a ribosome
could at least make you see that
that's a plausible explanation that
RNA might have been running the show
for a while before anything else got
involved.  Anyway, we'll
see you on Friday then.
