And in fact, we can visualize
this even a little metaphorically.
So for instance, here is,
for instance, a mailbox.
And suppose that this is address 123.
What is in address 123?
Well it's a variable
of type int, called n,
looks like it's storing the number 50.
Right?
We saw these letters--
these numbers last week.
So here's the number 50, which is an
integer inside of this variable, today,
represented as a mailbox
instead of as a locker.
Well suppose that this mailbox over
here is not n but suppose this is p.
And it happens to be an address 456.
But who really cares?
If this variable p is a pointer to an
integer, namely that one over there,
when I open this door,
what am I going to find?
Well I'm hoping I find
the equivalent of-- we
picked these up at the Coop
earlier --the equivalent
of a conceptual pointer saying
the number n is over there.
But what specifically, at a lower
level, is actually inside this mailbox
if that variable n is at location 0x123?
What's probably inside this mailbox?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah,
the address, indeed, 123.
So it's sort of like a
treasure map if you will.
Oh, I have to go to
123 to get this value.
Oh, the integer in
question is indeed 50.
And that's the fundamental difference.
This is the int that happens to be
inside of this variable of type int.
This is the address that's a pointer
that's in this other variable, p,
but that is conceptually, simply
pointing from one variable
to another, thereby giving any
sort of conceptual breadcrumbs.
And we'll see-- frankly, in one
week --how amazingly powerful it is.
When you can have one piece
of memory pointing at another,
pointing at another,
pointing at another,
you can start to construct very
sophisticated data structures,
as they're called,
things like family trees,
and lists, and other data structures
that you might have heard of.
Or even if you haven't, these will
be the underpinnings next week
of all of today's fanciest
algorithms used by,
certainly the Googles,
and the Facebooks,
and the Microsofts of the world
to manage large data sets.
That's where we're going next
week, in terms of application.
So questions about that representation?
Yeah, in the middle.
AUDIENCE: Does that mean that your
memory has to be twice as big?
DAVID J. MALAN: Sorry
can you say it once more?
AUDIENCE: Is that to say your memory has
to be twice as big to store pointers?
DAVID J. MALAN: Ah,
really good question.
Is it the case that your
pointers need to be twice as big?
Not necessarily, just, this
is the way life is these days.
On most modern Macs and PCs, pointers
use 64 bits-- the equivalent of a long,
if you recall that brief
discussion in Week 1.
So I deliberately drew
my pointer on the screen
here as taking up 8 bytes or 64 bits.
I've deliberately drawn my integer
n as taking up 4 bytes or 32 bits.
That is convention these
days on modern hardware.
But it's not necessarily the case.
Frankly, I could not find a
bigger mailbox at Home Depot,
so we went with two identical
different colored ones.
So metaphor is imperfect.
All right.
So moving from this to something
more familiar now, if you will.
Recall that we've been talking
about strings for quite some time.
And in fact, most of the interesting
programs we've written thus far
involve maybe input from the
human and some form of text
that you are then manipulating.
But string we said in Week
1 is a bit of a white lie.
I mean, it is the training
wheels that I promised
we would start taking off today.
So let's consider what a string
actually is now in this new context.
So if we have a string like EMMA
here, declared in a variable
called s, and quote unquote, EMMA in all
caps, as we've done a couple of times
now.
What does this actually look
like inside of the computer?
Well somewhere in my computer's memory
there are four, nay, five bytes,
storing E-M-M-A, and then additionally,
that null terminating character that
demarcates where the
end of the string is.
This is just eight individual 0 bits.
So that's where EMMA might be
represented in the computer's memory.
But recall that the
variable in question was s.
That was my string.
And so that's why over
the past few weeks
any time you want to manipulate a
string, you use its name, like s.
And you can access bracket 0,
bracket 1, bracket 2, bracket 3,
to get at the individual characters
in that string like EMMA, E-M-M-A,
respectively.
But of course it's the case, especially
per today's revelation, that really,
all of those bytes have
their own addresses.
Right?
We're not going to care after
this week what those addresses are
but they certainly exist.
For instance, E might be at 0x123.
M might be at 0x124--
1 byte away --0x125, 0x126, 0x127.
They're deliberately 1 byte away
because remember a string is defined
by characters back-to-back-to-back.
So let's say for the sake of
discussion that EMMA name in memory
happens to start at 0x123.
Well, what then really
is that variable s?
Well, I dare say that s
is really just a pointer.
Right?
It can be a variable, depicted
here just as before, called s.
And it stores the value 0x123.
Why?
That's where Emma's name begins.
But of course, we don't really have
to care about this level of precision,
the actual numbers.
Let's just draw it as a picture.
s is, if you will, a pointer to
Emma's actual name in memory,
which might be down over here.
It might be over here.
It might be over here, depending
on where in the computer's memory
it ended up by chance.
But this arrow just suggests that
s is pointing to Emma, specifically
at the first letter in her name.
But that's sufficient though, right?
Because how-- if s stores the
beginning of Emma's name, 0x123.
And that's indeed where
the E is but we just
draw this pictorially with an arrow.
How does the computer
know where Emma's name
ends if all it's technically
remembering is the beginning?
AUDIENCE: The null
terminating character.
DAVID J. MALAN: The null
terminating character.
And we stipulated a couple of
weeks ago that that is important.
But now it's all the more
important because it turns out
that s, this thing we've
been calling a string,
has no familiarity with
MMA or the null terminator.
All s is pointing at
technically, as of today,
is the first letter in her name, which
happens to be in this story at 0x123.
But the computer is smart enough
to know that if you just point it
at the first letter in a
string, it can figure out
where the string ends by just looking--
as with a loop --for that
null terminating character.
So this is to say ultimately, that
there is no such thing as string.
And we'll see if this strikes a chord.
There is no such thing as a string.
This was a little white lie
we began telling in Week 1
just so that we could get interesting,
real work done, manipulating text.
But what is string most likely
implemented as would you say?
AUDIENCE: An array of characters.
DAVID J. MALAN: An array
of characters, yes.
But that was Week 1's definition.
What technically now, as
of today, must a string be?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Sorry, over here.
AUDIENCE: A pointer.
DAVID J. MALAN: A pointer.
Right? s, the variable
in which I was storing
Emma's name would seem to
manifest a pattern just
like we saw with the numbers
a moment ago, the number 50.
s seems to be storing the
address of the first character
in that sequence of characters.
And so indeed, it would
seem to be a string.
Well, how do we actually
connect these dots?
Well suppose that we
have this line of code
again where we had int n equals 50.
And then we had this other
line of code where we said,
go ahead and create a variable called
p and store in it the address of n.
That's where we left off earlier.
But it turns out that this thing
here is our data type from Week 1.
This thing here, int star, is
a new data type as of today.
The variable stores, not an
int, but the address of an int.
It turns out that something like
this line of code, with Emma's name,
is synonymous with char star.
Right?
If a star represents an address and char
represents the type of address being
pointed at, just as int star can
let you point at a value like n--
which stored 50 --so could a char star--
by that same logic --allow you to
store the address of and therefore
point at a character.
And of course, as you
said, from Week 1, a string
is just a sequence of characters.
So a string would seem to be just
the address of the first byte
in the sequence of characters.
And the last byte happens to be all 0s
by convention, to help us find the end.
So what then more
technically is a string
and what is the CS50
library that we're now going
to start taking off as training wheels?
Well last week we introduced
you to the notion of typedef,
where you can create your own customized
data type that does not exist in C
but does exist in your own program.
And we introduced this keyword, typedef.
We proposed last week that this was
useful because you could actually
declare a fancy structure
that encapsulates
multiple variables,
like name and number,
and then we called this data
structure, last week, a person.
That was the new data type we invented.
Well it turns out you can use
typedef in exactly the same way
even more simply than we did
last week by saying this.
If you say typedef char star string--
typedef means give me a new
data type, just for my own use.
Char star means the type of value is
going to be the address of a character.
And the name I want to give to that
data type is going to be string.
And so literally, this
line of code here, this
is one of the lines of code
in CS50 dot h-- the header
file you've been including
for several weeks,
where we are creating a
data type called string
to make it a synonym for char star.
So that if you will,
it's an abstraction,
a simplification on top of the
idea of a sequence of characters
being pointed at by an address.
Any questions?
And honestly, this is
why-- and maybe those sort
of blank stares --this is why
we introduced strings in Week 1
as being an actual type as
opposed to not existing at all.
Because who really cares
about addresses and pointers
and all of that when all
you want to do is like,
print, hello world, or
hello, so and so's name?
Yeah, question.
AUDIENCE: What other-- what
other functions are created--
major functions are created
by CS50 are not intrinsic to--
DAVID J. MALAN: Really good question.
We'll come back to this later today.
But other functions that are
defined in the CS50 library that
are training wheels that
come off today are getString,
getInt, getFloat, and the
other get functions as well.
But that's about it that we do for you.
Other questions?
Yeah.
AUDIENCE: Can you define
all of these words again?
Like, it's-- so string is like a
character pointer which points--
I was confused about that.
Can you repeat that?
DAVID J. MALAN: Sure.
A string, per this definition, is a
char star, as a programmer would say.
What does that mean?
A string is quite simply a variable that
contains the address of a character.
By our human convention, that
character might be the beginning
of a multi character sequence.
But that's what we
called strings in Week 1.
So a string is just the
address of a single character.
And we leave it to human convention
to know that the end of the string
will just be demarcated
by eight 0 bits, a.k.a.
the null terminator.
And this is the sense
in which-- especially
if you have some prior
programming experience
--that C is much more low level.
In Python, as you'll
soon see in a few weeks,
everything just works
so splendidly easily.
If you want a string,
you can have a string.
You don't have to worry about
any of these low level details.
But that's because Python
is built here, conceptually,
where C is built down here-- so to
speak --closer to the computer's memory.
But there's no magic.
If you want to string, fine.
Just remember where it starts,
remember where it ends.
And boom, you're done.
The star in the syntax today is just a
way of expressing those ideas in code.
So let's go ahead then and
experiment with this string,
just as we did a moment ago using
Emma's name now instead of an int.
So let me go ahead and
erase those lines earlier.
And let me go back to Week 1 style
stuff, where I just say string s
equals quote unquote, Emma.
And then of course, if I to print
this, I can simply say this as before.
So just as a quick safety check, let
me go ahead and make address again.
Whoops.
What did I do wrong?
Let me scroll up to the first--
of many it seems --errors.
Yeah.
AUDIENCE: You're using
string, [INAUDIBLE]
DAVID J. MALAN: Yeah, I
kind of shouldn't have taken
off all the training wheels just yet.
I'm still using string.
So let me go ahead and put
that back just for now.
That will give me access
to that typedef for string.
Let me recompile it as make address.
That worked.
So that was the solution, thank you.
And then address again.
We just see Emma.
So what can we now do that's
a little bit different here?
Well, one, you know
what I can actually do?
I can get rid of this-- the
solution a moment ago --and say,
I don't need string anymore.
I don't need those training wheels.
If s is going to represent
a string, technically, s
is just going to store the
address of the first character.
And it suffices actually,
just to write this.
So literally instead of
string, you write char star.
Technically, you don't need--
you can have extra space
to the left or right.
But most programmers write it just as
I have here, char star variable name.
That looks scarier now but
it's no different from what
we've been doing for weeks.
If I now do make address
without the CS50 library,
still works, because C knows
what I'm talking about.
And if I run address
now, I still see Emma.
But now I can start to play around.
Right?
If s is the address of a
character, what was the format code
I can use to print an address?
Not percent i, but--
AUDIENCE: Percent p.
DAVID J. MALAN: Percent p, a pointer.
So let me go ahead and
recompile this now.
Make address, that compiles too.
And when I run dot slash address,
I'm not going to see Emma now.
What should I see instead?
Some address, right?
I have no idea what it is.
It looks like Emma's name
is stored at 0x42A9F2,
whatever that number translates
to decimal, somewhere
in the computer's memory.
But it turns out then
too, what about this?
Let me go ahead and add
another line of code and say,
you know what, I'm really curious now.
What is the address of the
first letter in Emma's name?
How do I express in C, the
first letter only of Emma's name
if Emma is stored in s.
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: s bracket zero, right?
That would seem to be that.
But that is what?
That's a char. s bracket 0 is a char.
How do I get the address of s bracket 0?
AUDIENCE: Ampersand.
DAVID J. MALAN: Yeah, I
can just say ampersand.
Right?
So it's ugly looking
but that's fine for now.
Make address, enter.
Whoops.
It's uglier because I
forgot my semicolon.
Let me go ahead and make
address again, enter.
Seems to compile.
And when I run dot slash address
now, notice I get the same thing.
And this is because C
is taking me literally.
When you print out s, a string,
it's technically just the address
of the first character.
And indeed, I can corroborate
as much by running s
bracket zero then get the
address of the first character.
And they are indeed one in the same.
So a string is this sort of abstraction
on top of a bunch of characters.
But again, s is just an address.
And that's all we're emphasizing now.
And if I get really curious--
not that you would necessarily
do this in a real
program --what if I print
out a few more characters in Emma's
name, like s bracket 1, 2, and 3?
Let me go ahead, just out of curiosity
and make this program and dot slash
address.
Now notice what I see, is
again, s's address is at 42AB52.
The first character in
s is at the same thing,
by definition of what a string is.
And then notice what's
kind of neat-- if this is--
if-- for some definition of
neat --53, 54, 55 is noteworthy.
Why?
They're one byte apart.
So this whole time, whenever you
implemented Caesar, or substitution,
or some other cipher in
problem set two, anytime
you were manipulating individual
characters-- you didn't know it
--but you were just visiting
different mailboxes.
You were just visiting different
addresses in the computer's memory
in order to manipulate them somehow.
All right.
Can I do one last demo that's
a little arcane and then
we'll make things more-- more real?
All right.
So it turns out if all that's
going on underneath the hood
is just addresses, watch
what I can do here.
If I want to go ahead and print
out what is at the address s,
what will I find in memory
if I go to the address in s?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Sorry, a little louder.
AUDIENCE: The first letter.
DAVID J. MALAN: The first
letter in Emma's name, right?
If we can all agree-- even if
it's a little unfamiliar still
--that s is just the address of
a character, and I say, go to s,
what should I see specifically?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Probably
E in Emma Right?
If s is the address of the
first character of her name,
star s would mean go to that character.
So let me go ahead and
print that as a char.
So let me go ahead now and make
address dot slash address, enter.
There is the E because I can say, go
to that address and print what's there.
And I can actually do this for
all of her letters in her name.
Let me go ahead and print
out another one here.
So how do I get at the
second letter in Emma's name?
Previous-- normally, like last
week, we would have done this.
And that just magically gets you
to the second letter in her name.
But I can do it a little differently.
What if I go to s and
then, from where do I want
to go from s to get the second letter?
AUDIENCE: Plus one.
DAVID J. MALAN: Plus one, right?
I mean, maybe we can literally
just do arithmetic here.
If s is the address of her first letter,
it stands to reason that s plus 1
is the address of her second letter.
So make address now dot slash address.
And I should see EM.
And I can do this twice more maybe and
go ahead and do this and then this.
But this time add 2 and this time add
3, just doing some simple arithmetic.
Make address dot slash address, there is
Emma but in a much lower level detail.
So what is this bracket symbol?
In computer science, this is
what's called syntactic sugar.
It's kind of a silly name.
But it just refers to a handy feature
so that you, the programmer, can say,
s bracket 0 or bracket 1.
But what the computer is actually doing
underneath the hood-- the compiler,
Clang --it's actually converting all of
your uses of square brackets since Week
1 to this format here.
It's just doing arithmetic
underneath the hood.
Now you don't have to
do this moving forward.
But I point out this low level
detail just to give you a sense of,
there really is no magic.
When you say, go print
an address or go do this,
the computer is taking you literally.
Whew.
OK, that was a lot.
Yes, question.
AUDIENCE: So [INAUDIBLE]
DAVID J. MALAN: Star s would
mean go to the address in s.
AUDIENCE: So why for instance, if
you [INAUDIBLE] character [INAUDIBLE]
DAVID J. MALAN: Really good question.
Why, when you print out s, does it
print out the whole string and not
just the character?
That's what the printf
format code is doing for you.
When you tell printf to use percent
s, that has special meaning to printf.
And it knows to go to the first
address and not just print
the second-- the first char, but
print every character thereafter
until it sees what?
AUDIENCE: The null terminator.
DAVID J. MALAN: The null
terminating character.
So printf and percent s are special
and have been special since the Week 1.
They just know to do exactly
what you've described.
So pointer arithmetic, to be clear,
is just taking addresses and like,
doing arithmetic with them,
adding 1, adding 2, adding
3, or any other manipulation like that.
All right.
So [CHUCKLE] let's take
another stab at a meme here.
[CHUCKLE]
OK, a few of us.
All right.
All right, it's trying too hard.
All right.
So what then do we have
when it comes to strings?
Well, let's now try to
learn from these primitives
and actually trip over some mistakes
that we might otherwise make.
I'm going to go ahead
and open up a new file.
I'm going to go ahead and
call this one, compare.
So we'll save this as compare dot c.
And this will be reminiscent of
something we started doing last week.
And you've done this past week,
particularly for implementing
voting and comparing strings.
I'm going to go ahead and
make a quick program that
just compares two integers.
I'm going to put the training
wheels back on temporarily,
just so that we can get
some numbers from the user
pretty easily, including CS50
dot h and standard I/O dot h.
I'm going to do int
main void as my program.
I'm going to get an integer called
i and ask the human for that.
I'm going to get another integer
called j, ask the human for that.
And then I'm going to go ahead
and say if i equals equals j,
then go ahead and print with
printf that they're the same.
Else, if i does not equal j, I'm going
to go ahead quite simply and print out
different backslash n.
So if i equals equals
j, it should say, same.
Else, if it's different,
it should say different.
So let me go ahead and make
compare dot slash compare.
And I should see, hopefully, if I
type in say, 1, 2, they're different.
And if I instead do 1,
1, they're the same.
All right.
So it stands to reason that logically
this is pretty straightforward when
you want to compare things.
So instead of using numbers,
let me go ahead and change this.
Let me go ahead and do, say, string
s gets getString, just as before
but using getString instead
and ask the human for s.
Then give me another string, t, just
because it's alphabetically next.
And I'll ask the human for t.
And then I'm going to go ahead
and ask this question, if s equals
equals t, print same,
else, print different.
So now let me go ahead
and make compare again.
I'm going to go ahead and
type in dot slash compare.
We'll type in Emma.
We'll then type in Rodrigo.
And of course, it's
going to say different.
But if I instead run it again
and type in Emma and all right,
I'll type Emma again--
hmm, different.
Maybe it's a capitalization thing?
No.
But why as of today, are
they indeed different?
Last week we kind of waved our
hands and said, ah, they're arrays,
you have to do some stuff.
But why are they different?
AUDIENCE: They're stored
in different locations.
DAVID J. MALAN: Exactly, they're
stored in different locations.
So when you get a string with
getString and call it s, and then you
get another string with
t and call it t, you're
getting two different chunks of memory.
And yes, maybe the human has typed
the same thing into the keyboard,
but that doesn't necessarily
mean that they're going
to be stored in the exact same place.
In fact, what we really have here
is a picture not unlike this.
If I have a variable called s-- and I'm
just going to draw it as a box there
--and if I have a variable called t--
I'll draw it as another box
here --and I typed in Emma--
E-M-M-A --that's going to
give me somewhere in memory,
E-M-M-A backslash 0.
And I'll try it as an actual
array, albeit a little messily.
And then here, if I type
EMMA again in all caps,
it's going to end up--
thanks to getString,
at a different location in memory.
By nature of how getString works, it's
going to store anything you type in it.
And what's going to
get stored in s and t?
Well, for the sake of
discussion, let's suppose that
this chunk of memory
with the first input--
sorry --happens to be at 0x123.
And the second chunk of memory happens
to be at 0x456, just by chance.
Well, what am I
technically storing in s?
0x123.
And what am I storing in t?
0x456.
So when you say, is s equal equal t.
Is it?
Well, no.
You're literally
comparing 123 versus 456.
The computer is not going
to presumptuously go
to that address for you
unless you somehow tell it to.
Put another way, if I
instead draw these boxes,
not as actual numbers, what we really
have-- sorry --what we really have is
what we'll draw as an
arrow more generally,
just a pointer to that value.
Who really cares where the address is?
So this is why last week we
kind of waved our hand and said,
eh, you can't just compare two
strings because you probably
have to compare every character.
And that was true.
But what you're technically
comparing is indeed
the addresses of those two variables.
Any questions then on this here?
Yeah.
Sure, yes.
AUDIENCE: So you said
earlier that the, I
guess, the pointer, and
the actual thing it's
pointing are like kind of somewhere
in the memory not in a specific--
they're just somewhere, right?
DAVID J. MALAN: OK.
AUDIENCE: So do you need something
that points to the point--
how does the computer
know where the pointer is?
DAVID J. MALAN: Oh, how does the
computer know where these pointers are?
So that's a really good question.
And let's answer it right here.
All this time when you've been
calling getString to get a string,
you've probably been assigning it to a
variable like I have here on line six,
with string s.
But we know as of today that if we get
rid of the CS50 library, technically,
string is just synonymous
with char star.
And so both here and with t, do you
technically have char star, right?
It's just a find and replace if
we get rid of that training wheel.
Char star just means s is storing
the address of a character.
And char star t means t is storing
the address of a character.
Ergo, all this time since the Week 1 of
CS50, what type of value has getString
been returning, even though
we never described it as such?
What must getString be returning?
Yeah.
AUDIENCE: The index of the first letter.
DAVID J. MALAN: Not even the
index per se, but rather the--
AUDIENCE: It houses the memory of that.
DAVID J. MALAN: The address
of the first character.
So anytime you called getString,
getString code we wrote
is finding in your computer's
memory some free space,
enough bytes to fit whatever
the word was that got typed in.
getString then, if we
looked at its code,
is designed to return the address of
the first byte of that chunk of memory.
So getString, this whole time,
has been returning, if you will,
what's called a pointer.
But again, nuances that we didn't want
to get into in the very first week
certainly, of C programming.
All right.
Well, let's go ahead and make
this a little more concrete.
If I pull up this code, I
don't have to just check
if they're same or different, let
me just go ahead and print them out.
If I do percent p backslash n,
I can literally print out s.
And if I go ahead and print out the
same thing for t using percent p,
I can print out the value of t.
So let me go ahead and make compare.
Seems to compile OK.
And I don't know what the
addresses are in advance.
But let me go ahead and type
in, for instance, Emma and Emma.
So even though those strings
look the same notice,
it's a little subtle this time,
the first Emma's at 0xED76A0.
The second Emma's at 0xED76E0, which is
a few numbers away from the first Emma.
So that just corroborates
the instincts last week
that we can't just
compare them like that.
So what are the implications then?
Let's do one other example here.
Let me go ahead and
save this as copy dot C.
And let's try a very reasonable goal.
If I want to go ahead and get the
user's input and actually copy a string
and capitalize the string
from the user, let's see this.
So let me go ahead and give myself the
temporary training wheels again, just
so I can get a string from the human.
Let me go ahead and include standard
I/O dot h and then an int main void.
Let me do a simple
example, the goal of which
now, is to get a string from the
user and capitalize a copy thereof.
So I'm going to go ahead and do
string s gets getString and call it s,
as before.
I'm going to go ahead and
then do string t equals
s to make a copy of the variable.
And then I'm going to
go ahead and say what?
Let me go ahead and capitalize the copy.
And to capitalize the copy, I can
just change the first character
in t, so t bracket 0, to what?
I think we had toupper a while back.
Does this seem familiar?
You can call the toupper function.
And the toupper function,
if you don't recall,
you technically have
to use C type dot h.
This might be reminiscent
of the second c problem
set, where you might have used this in
Caesar, or substitution, or the like.
All right.
And now, let me go ahead and
print out these two strings.
Let me go ahead and print out s.
And let me go ahead and print out t.
So again, all I've done in this
program is get a string from the user,
copy that string, capitalize
the copy called t.
And let's just print
out the end results.
So let me go ahead and save the file.
Let me go ahead and make copy.
Seems to compile OK.
Let me go ahead and run copy.
And let me go ahead and type
in emma, in all lowercase,
deliberately, because I want to see
that t is capitalized but not s.
Hmm.
But somehow they're both capitalized.
Notice, that emma in all lowercase
ended up being both capitalized in s
and capitalized in t per
the two lines of output.
That's a bug?
Right?
I only capitalized t,
how did I accidentally
also capitalize s do you think?
Any thoughts?
Doesn't matter if I avert the
lights, I still can't see any hands.
OK, how about here in front?
Yeah.
AUDIENCE: So when you say t
equal s you have to [INAUDIBLE]
DAVID J. MALAN: Exactly.
When I say t equals s on this line, I
am getting a second variable called t.
And I am copying s.
But I'm copying s literally.
s as of today, is an address.
After all, string is the same thing
as char star for both s and t.
And so technically, all I'm
doing is copying an address.
So if I go back to my picture
from before, this time,
if I've gone ahead and typed in an
array of emma, with all lowercase--
e-m-m-a --and then a backslash 0,
somewhere in memory using getString,
and I've gone ahead initially and
stored that in a variable called s--
and I don't care about
the addresses anymore.
I'm just going to use arrows
now to depict it graphically.
When I created a second variable
called t and I set t equal to s,
that's like literally
copying the arrow that's in s
and storing it in t, which means t
is also pointing at the same thing.
Because again, if I didn't do
this hand wavy arrow notation,
I literally wrote out 0x123.
I would have just written
out 0x123 in both s and t.
So when, in my code, I
go ahead and say, you
know what, go to the first character in
t and then go ahead and uppercase it.
Guess what the first character in t is?
Well, it's this e.
But guess what the first character
in s is, literally that same e.
So this does not
suffice to copy a string
by just saying t equals s, as it has
up until now with every other variable.
Any time you've needed a temporary
variable or a copy of something
this worked.
Intuitively, what do we
have to do probably instead
to truly copy Emma into two
different places in memory?
Yeah.
AUDIENCE: Probably create a char or
create a variable exactly the same size
and copy each character individually.
DAVID J. MALAN: Nice.
So maybe we should give
ourselves a variable that
has more memory, the same
amount of memory being stored
for the original Emma, and
then copy the characters from s
into the space we've allocated for t.
And so we can actually do this.
Let me go ahead and get rid of
all but that first line, where
I've gotten s as before.
And I'm going to go ahead and do
this, I'm to say that t is a string--
but you know, we don't need
that training wheel anymore.
String, char star, even
though it looks uglier.
Let me go ahead and allocate
more memory for myself.
How do I do that?
Well, it turns out-- we've not used this
before --there's a C function called
malloc, for memory alloca.
And all it asks as input
is how many bytes you want.
So how many bytes do I want
for Emma to store her name?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: I heard 4, 5.
Why, 5?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: So we need the
null terminating character,
e-m-m-a and then backslash 0.
So that's 5.
So I could literally
hard code this here.
Of course, this feels a
little fragile because I'm
asking for any string via getString.
I don't know it's going to be Emma.
So you know what, let me go
ahead and ask a question?
Whatever the length is of
the human's input in s,
go ahead and add 1 to it
for the null character
and then allocate that many bytes.
So now my program's more dynamic.
And once I have this, well, how
can I go ahead and copy this?
Well, let me just do old school loop.
So for int I get 0, i is less than
the string length of s, i plus plus--
so this is just a standard for
loop iterating over a string
--and I think I can just do t
bracket i equals s bracket i in order
to copy the two strings.
There's a subtle bug and a
subtle inefficiency though.
Anyone want to critique how I've
gone about copying s into t?
Yeah.
AUDIENCE: [INAUDIBLE]
getString [INAUDIBLE]..
DAVID J. MALAN: Yeah.
This was inefficient.
We said a couple of weeks
ago this is bad design
to just keep asking the question,
what's the length the s?
What's the length of s?
So remember that we had a little
optimization a couple of weeks ago.
Let's just declare n to
equal the string length of s
and then do a condition
of i is less than n.
So we've improved the design there.
It's a little more efficient.
We're wasting less time.
There's still a subtle bug here.
How many byte-- yeah.
AUDIENCE: Aren't you not
copying the null terminator
DAVID J. MALAN: I'm not
copying the null terminator.
So every other time we've iterated
over a string, this has been correct.
Iterate up to the length but not
through the length of that string.
But I technically do
want to go one more step
this time, or equivalently,
one more step.
Because I also want to copy not just
e-m-m-a, which is str length 4--
e-m-m-a is 4 --I also want to do it
a fifth time for the null character.
So in this case, I'm
deliberately going one step
past where I usually want to go to
make sure I copy 5 bytes for Emma,
not just 4.
All right.
Let's go ahead now and capitalize Emma.
So t bracket 0 gets toupper of
Emma's first character in the copy.
And now let's go ahead and
print out both strings s
and t, just as before,
with percent s of t.
And let me make one
change, I use strlen now.
So I know I'm going to get
an error if I don't do this.
I need to use string dot h-- recall
--anytime you use string length.
So I'm going to go proactively add that.
So what's different?
This line is the same as before.
I'm getting a string from the user.
This line is the same as before.
I'm capitalizing the first letter.
And these two lines are the same.
I'm just printing out s and t.
So the new idea here is, with my
malloc, am I allocating as many bytes
as I need to store a copy of
Emma, and then with this for loop
am I actually doing the actual copy?
Let me go ahead and do make copy again.
Seems to run OK.
Run dot slash copy.
Type e-m-m-a in all lowercase.
And voila, now I've
capitalized t but not s.
Yeah?
AUDIENCE: When you use malloc, it's
just allocating number of bytes,
it doesn't matter where?
DAVID J. MALAN: It is just
allocating that many bytes for you.
It does not matter where.
You indeed should not care
where it is because you're just
being handed the address
and using C code,
can you just go there as you want.
All right.
Let's clean this up too.
Surely, people copy strings for years.
And in fact, we don't need
to do this for loop ourself.
It turns out we can simplify this
code a little bit by enhancing this
as follows.
It turns out, if you look in
the manual page for strings,
you can actually use
something called strcopy--
no-- without any vowels.
And you can copy into
t, the contents of s.
strcpy is a function written a
long time ago by some other human.
And they went ahead and implemented,
probably, that loop for us.
And it tightens up our code
here a little bit more.
AUDIENCE: Professor?
DAVID J. MALAN: Yeah.
AUDIENCE: What if I forgot to copy
in the null character at the end?
DAVID J. MALAN: Really good question.
What if you forgot to copy in
the null character at the end?
It is unclear what would happen.
If there just happened to be some
bits in that location in memory
from earlier-- from some
other part of your program
--and you try printing
out s and printing out t,
you might print out many more
characters than you actually
intended-- if there's no
backslash 0 actually there.
We'll see this more and more.
Anytime you don't initialize
the value of a variable,
it's what's called a
garbage value, which means
who knows what 0s and 1s are there.
You might get lucky and it's all 0s.
But most likely it's going to
print some garbage value instead.
All right.
Any questions on this?
Yeah.
AUDIENCE: Is the string length
function only in the CS50 library?
DAVID J. MALAN: Is the--
which function?
AUDIENCE: String length.
DAVID J. MALAN: Oh, strlen,
no, that's in string dot h.
That is a standard C thing.
AUDIENCE: OK.
If string length is a standard
function but strings are not--
DAVID J. MALAN: So what's
the dichotomy here then?
If strings don't exist--
as I've noted multiple times.
And yet, there's functions like
strcpy and strlen --what's going on?
C calls them char stars.
It is c that does not call them strings.
We, CS50, and the world in
general, calls addresses
of sequences of characters, strings.
So the only training wheel
here, really is the semantics.
We gave you a data type called string
so that in the first week of C and CS50,
you don't have to see or type
char star, which would arguably
be a lot more cryptic so early on.
It's arguably a bit cryptic today too.
Other questions?
All right, yeah.
AUDIENCE: So is char
star ID type [INAUDIBLE]
DAVID J. MALAN: Is--
say that once more.
AUDIENCE: Char star ID type [INAUDIBLE].
DAVID J. MALAN: Not all of them, but
any of them that take a string, yes.
In fact, any time you have seen
us or TF in CS50 say string,
you can literally, starting today,
change that expression to char star
and it will be one and the same.
Phew.
OK.
That was a lot.
Let's take our five minute
break here with cookies outside.
All right.
So we are back.
That was a lot.
Let me draw our attention to
what the newest feature was just
a moment ago, this notion of
malloc, memory allocation.
So recall that getString I claim
as of today, all this time,
it's just returning to you
the address of the string
that was gotten from the human.
malloc, similarly, has a return value.
And when you ask malloc for this many
bytes-- maybe it's five, for emma,
plus the null terminator,
malloc's purpose in life
is to return to you the address of
the first byte of that memory as well.
So memory alloc means, go get
me a chunk of memory somewhere,
hand me back a pointer there too.
And the onus is on me to
remember that address,
as I'm doing here, by storing it in t.
But it turns out, now that we're
taking the training wheels off,
unfortunately, we have to kind
of do a bit more work ourselves.
And there's actually a
latent bug in this program.
It turns out that I am
mal-allocating memory with this
but I'm never actually freeing it.
The opposite of malloc is a function
called free, whose purpose in life
is to hand back the memory
that you asked for so that you
have plenty of memory available
for other parts of your program
and so forth.
And long story short, if
you've ever-- on your Mac or PC
--been running a program
that maybe is a little bit
buggy --you might notice your computer
is getting slower, and slower,
or maybe it even runs
out of memory explicitly,
per some error message
--that might be quite
simply, that the programmer of
that program kept using mallc,
and malloc, and malloc to grow, and
grow, and grow their use of memory,
but they never got around to
freeing any of that memory.
So programs can run out of memory.
Your computer can run out of memory.
So it's good practice, therefore,
to free any memory you're not using.
However, how do you find this mistake?
So we've got one final
debugging tool for you.
This one's not CS50
specific like debug50.
This one is called Valgrind.
Unfortunately, it's not the easiest
thing to understand at first glance.
So I'm going to go ahead and do this.
I'm going to run Valgrind on this
program, dot slash copy, and hit Enter.
And unfort--
AUDIENCE: [INAUDIBLE]
[CHUCKLE]
[COUGH]
DAVID J. MALAN: Gotcha.
OK.
I'm going to go ahead and--
there we go.
AUDIENCE: [INAUDIBLE]
So what you missed was
a very scary message.
So I'm going to go ahead and
run Valgrind on dot slash copy.
We see this esoteric output up
top and then my prompt for s--
because it's the same program.
It's prompting me for a string --so I'm
going to give it emma, all lowercase,
and enter.
And you'll notice now, that
there's some summary going on here
but also some mention of error.
So heap summary-- we'll come back to
that in a bit --5 bytes in 1 blocks
are definitely lost
in loss record 1 of 2.
Leak summary, I've got 5
bytes leaking in 1 blocks.
I mean, this is one of these
programs in Linux-- the operating
system that we use, that's quite common
in industry too --I mean, my god.
There's so-- there's so many more
characters on the screen that
are actually enlightening for me.
Let's see if we can focus our
attention on what matters.
Memory leaking, bad.
So how do we go about chasing
down where memory is leaking?
Well, as before, we can use help50.
And in fact, help50 will analyze
the output of Valgrind-- it's still
going to prompt me first string.
So I'm going to again, type in
emma --it's going to look at that.
It's to ask for help.
And voila, highlighted in yellow, is
a message that we, help50, recognize.
And notice our advices, looks like
your program leaked 5 bytes of memory.
Did you forget to free memory
that you allocated via malloc.
Take a closer look at
line 10 of copy dot C.
Now once you've done this a couple
of times and made the same mistake,
you can probably scroll up and glean
for yourself where the error is.
We're not revealing any more information
than is right in front of you.
And in fact, you can see here, ah,
in main on copy dot C, line 10,
there's some kind of 5 bytes in
1 blocks are definitely lost.
So there's a lot of words there but it
does draw attention to the right place.
So let me go ahead and scroll
down, focus on line 10.
And indeed, line 10 is where
I allocated the memory.
So it turns out the solution
for this is quite simple.
Down here, I'm just going
to go ahead and free
t, the address of the chunk of
memory that malloc returned to me.
So I'm undoing the effects of allocating
memory by de-allocating memory.
So now let me go ahead and run copy.
And if I run copy, it's not going
to seem to run any differently.
It's still going to work correctly.
But now if I analyze it for mistakes
with Valgrind, so Valgrind of dot
slash copy--
I'm going to again type
in emma in all lowercase
and I cross my fingers
--that indeed now, leaked
summary, 0 bytes in 0 blocks.
So unfortunately, even when all is
well, it still spits out a mouthful.
But now I see no mention
of blocks that are actually
leaked, at least in the top part here.
And we'll see more of this
over the next couple of weeks
as we use it to chase down
more complicated bugs.
But it's just another tool
in the toolkit that allows
us to detect these kinds of errors.
Let me try one other thing actually.
This is a program that
I wrote in advance.
This one is called memory dot C. And as
always, these are all on the course's
website if you'd like to tinker after.
And it's a little pointless.
It's just meant for
demonstration purposes.
So here is a program.
And it's copied from this online manual
for Valgrind, the tool I just used.
So let's see what's going on.
Here I have main, at
the bottom of my code.
I copied it.
I didn't use a prototype.
I just copied what they did.
And see here, it calls a function
called f and then returns 0.
Well what does f do? f
is this random function
up here that takes no
inputs per the void.
And in English, how would you describe
what's happening in line 7 now--
that we've introduced malloc and stars--
or pointers?
What's this doing?
Yeah.
AUDIENCE: It's allocating enough
memory in [INAUDIBLE] for [INAUDIBLE]..
DAVID J. MALAN: Good.
Allocate enough memory
for 10 integers-- and then
let me add-- elaborate
on your words --and then
store the address of
that chunk of memory
in a pointer called x, if you will.
So sizeof is new.
But it literally does what it says.
If you say sizeof open
paren, close paren,
and then the name of a data type, it
will tell you that an int is 4 bytes.
It will tell you that a long is 8 bytes.
It will tell you that
a char is one byte.
It's just a dynamic
way of avoiding having
to memorize those kinds of things.
So this just means give
me 10 times the size
of an int, which happens to be 4 bytes.
So that means give me 10 times
4, or 40 bytes of memory.
That's effectively an array of
memory that I can store integers in.
And malloc, per its definition, is
going to return to me the address
of the first byte of that memory.
What is now scary about
line 8, relatively speaking?
What might worry you with line
8, which is buggy, unfortunately?
Yeah.
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Exactly.
I'm doing x bracket 10 and just
arbitrarily storing the number 0.
Why?
Just because.
But 10 does not exist.
Right?
If I have 10 int, it's bracket 0, 1,
2, 3, 4, 5, 6, 7, 8, 9, not bracket 10.
So this is an example of
overflowing a buffer, so to speak.
Anytime you're talking
about memory, any time
you're talking about an array of
memory-- which this effectively
is, 10 integers, room for
10 integers back to back
to back --if you go one step too far,
that's what's called a buffer overflow,
whereby the buffer is the array.
And in fact, this would
make it even more clear.
Suppose I tried to go
there, bracket 10,000.
That is definitely not among
the bytes of memory I allocated.
That's definitely going beyond
the boundaries of my array.
But so is it true that bracket
10 is one step too far.
So what's nice about Valgrind is this.
Let me go ahead and rerun Valgrind
after compiling this memory program--
whoops --in my source directory.
Let me go ahead and make memory.
All right.
It compiled OK.
Valgrind dot slash memory--
and unfortunately, we're going
to see some crazy arcane error
messages for a moment.
But let's see what it says.
Notice here, invalid write of size 4--
that sounds bad --and 40
bytes in one blocks are--
OK, they didn't really add
an if condition in Valgrind.
--40 bytes in 1 blocks--
plural --are definitely lost.
So let's fix the second of those first.
Why am I leaking 40 bytes exactly?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: I'm never freeing it.
So I think I can get away with just
doing this here, just free the memory
after I'm done using it--
even though I'm not really
using it for anything purposeful here.
So let me try this again.
Make memory, now let me do
Valgrind dot slash memory.
And-- OK, better.
I don't see 40 bytes lost anymore.
So that's good.
But I do still have this issue.
But here's where it's sometimes
useful to understand the various data
types and their sizes.
Invalid write of size 4.
Writing in a program just
means changing a value.
And it mentioned line 8 here.
In what sense is this an
invalid write of size 4?
Well, how big is an int?
Four bytes.
You're trying to change
it arbitrarily to 0.
But I could have made that
50 or any other number.
But I'm trying to touch
an int that should not
be within the memory I
have allocated for myself.
I asked for 40 bytes, or 10 ints, but
again, because arrays are zero indexed,
this is like going one
beyond the boundary.
So let me fix this and just arbitrarily
say, let's go touch that part of it.
Let me go here and do make memory.
Let me go ahead and do
Valgrind dot slash memory.
And now, arcane output aside, notice
that that error message went away too.
So this will be helpful over
the coming couple of weeks
as we continue to use C to
implement a number of programs
that now start to manipulate memory.
It's just a tool that
helps you spot errors
that certainly, your TF
might otherwise, or that
might be causing your program to
crash, or to freeze, or to segfault--
if you've seen that yourselves before.
All right.
So that's just a tool.
Let's go ahead and transition now
to some actual use cases here.
Recall from last week that it was
pretty useful to be able to swap values.
Right?
With bubble sort, with selection sort,
we needed to be able to exchange values
so that we could put things
into the right place.
Turns out this is
pretty straightforward.
Right?
And we can actually mimic
this in the real world.
We just have opportunity for one
volunteer today, one volunteer.
Can we get a little--
OK, over here.
Yeah.
What's your name?
FARRAH: Farrah.
DAVID J. MALAN: Sorry.
FARRAH: Farrah.
DAVID J. MALAN: Vera.
FARRAH: Farrah.
DAVID J. MALAN: Oh, here, come on up.
Then I can hear you up here.
OK, what's your name?
FARRAH: Farrah.
DAVID J. MALAN: Vera.
FARRAH: With an F.
DAVID J. MALAN: Fera.
FARRAH: Farrah.
DAVID J. MALAN: Farrah.
FARRAH: Yes.
DAVID J. MALAN: Farrah.
Yes, OK.
Good.
Come on up.
Still come up.
[CHUCKLE] Thank you.
Thank you.
[APPLAUSE]
[CHEERS]
OK, nice to meet you.
FARRAH: Hi, nice to meet you.
DAVID J. MALAN: Farrah.
FARRAH: Yes.
DAVID J. MALAN: OK.
So let's go ahead here.
Let me give you this so that
you can be mic'd as well.
OK, so the goal at hand is here, I
have two glasses of colored water.
So we have some purple here.
[WATER POURING]
OK.
And we've got some green here.
[WATER POURING]
And the only goal at hand is
to do a very simple operation
like we needed to do
quite a bit last week,
which is to swap two variables
just like we swap two numbers.
So if you could go ahead and
get the purple liquid in here
and the green liquid in here, go.
[CHUCKLE]
FARRAH: Is it OK if they overlap?
DAVID J. MALAN: Ideally, no.
We want to put only purple
here, and only green here,
and no temporary store.
[LAUGHTER]
FARRAH: Oh.
DAVID J. MALAN: OK,
but you're hesitating.
Why?
FARRAH: Because you told me they
couldn't touch [CHUCKLE] and--
DAVID J. MALAN: Well, you
can touch the glasses.
But you're hesitating to swap them, why?
[CLINK]
OK, that's just cheating.
[LAUGHTER]
[APPLAUSE]
[CHEERS]
OK, very clever.
Supposing you can't
just move things around
in memory, what if I gave
you a temporary variable.
FARRAH: OK.
DAVID J. MALAN: Does this help?
FARRAH: Yes.
DAVID J. MALAN: So how can we now get
purple in there and green in there?
[CHUCKLE]
FARRAH: Can I put purple in here first?
DAVID J. MALAN: Sure.
FARRAH: I'm going to spill it.
DAVID J. MALAN: It's OK.
[WATER POURING]
OK.
So purple goes into the
temporary, very nice.
[APPLAUSE]
FARRAH: Thank you.
DAVID J. MALAN: Green
goes into what was purple.
FARRAH: Yes
DAVID J. MALAN: OK, good.
And then purple goes in--
from the temporary variable
into the original green glass.
Now, a proper round of
applause if we could.
OK.
[APPLAUSE]
