>> [MUSIC PLAYING]
>> DOUG LLOYD: Pointers, here we are.
This is probably going to
be the most difficult topic
that we talk about in CS50.
And if you've read
anything about pointers
before, you might be a little bit
intimidated going into this video.
It's true that pointers
give you the ability
to screw up
pretty badly when you're
working with variables and data,
even causing your program to crash.
But they're actually really useful,
and they give us a really great way
to pass data back and
forth between functions,
one that we otherwise don't have.
>> And so what we really
want to do here is train
you to have good pointer discipline, so
that you can use pointers effectively
to make your programs that much better.
As I said, pointers give us a different
way to pass data between functions.
Now if you recall from
an earlier video, when
we were talking about
variable scope, I mentioned
that all the data that we pass between
functions in C is passed by value.
I may not have used that
term, but what I meant there
was that we are passing copies of data.
When we pass a variable to a function,
we're not actually passing the variable
to the function, right?
We're passing a copy of
that data to the function.
The function does what it will
and it calculates some value,
and maybe we use that value
when it gives it back.
>> There was one exception to
this rule of passing by value,
and we'll come back to what that
is a little later on in this video.
If we use pointers instead
of the variables
themselves or copies of the variables,
we can pass the variables around
between functions in a different way.
This means that if we make
a change in one function,
that change will actually take
effect in a different function.
Again, this is something that
we couldn't do previously,
and if you've ever tried to swap the
value of two variables in a function,
you've noticed this problem
sort of creeping up, right?
>> If we want to swap X and Y, and we
pass them to a function called swap,
inside of the function swap, the
variables do exchange values.
One becomes two, two becomes
one, but we don't actually
change anything in the original
function, in the caller,
because we can't; we're only
working with copies of them.
With pointers though, we can
actually pass X and Y to a function.
That function can do
something with them.
And those variables' values
can actually change.
So that's quite a change in
our ability to work with data.
>> Before we dive into
pointers, I think it's worth
taking a few minutes to
go back to basics here.
and have a look at how
computer memory works,
because these two subjects are
actually pretty interrelated.
As you probably know,
on your computer system
you have a hard drive or
perhaps a solid state drive,
some sort of file storage location.
It's usually somewhere in the
neighborhood of 250 gigabytes
to maybe a couple of terabytes now.
And it's where all of your
files ultimately live.
Even when your computer is shut
off, you can turn it back on
and you'll find your files are there
again when you reboot your system.
But disk drives, like a hard disk drive,
an HDD, or a solid state drive, an SSD,
are just storage space.
>> We can't actually do anything with
the data while it's sitting on a hard disk
or a solid state drive.
In order to actually change
data or move it around,
we have to move it to
RAM, random access memory.
Now RAM, you have a lot
less of in your computer.
You may have somewhere in the
neighborhood of 512 megabytes
if you have an older computer,
to maybe two, four, eight, 16,
possibly even a little
more, gigabytes of RAM.
So that's much smaller, but that's
where all of the volatile data exists.
That's where we can change things.
But when we turn our computer off,
all of the data in RAM is destroyed.
>> So that's why we need a hard disk
as the more permanent location for our data.
It would
be really bad if every time we
turned our computer off, every
file in our system was obliterated.
So we work inside of RAM.
And every time we're talking about
memory, pretty much, in CS50,
we're talking about RAM, not hard disk.
>> So when we move things into memory,
it takes up a certain amount of space.
All of the data types that
we've been working with
take up different
amounts of space in RAM.
So every time you create an integer
variable, four bytes of memory
are set aside in RAM so you
can work with that integer.
You can declare the integer,
change it, assign it
the value 10, increment it
by one, and so on and so on.
All that needs to happen in
RAM, and you get four bytes
to work with for every
integer that you create.
>> Every character you
create gets one byte.
That's just how much space is
needed to store a character.
Every float, a real
number, gets four bytes.
A double, a double
precision floating point
number, which lets you
keep more digits
after the decimal point
without losing precision,
takes up eight bytes of memory.
Long longs, really big integers,
also take up eight bytes of memory.
How many bytes of memory
do strings take up?
Well let's put a pin in that question
for now, but we'll come back to it.
>> So back to this idea of memory as
a big array of byte-sized cells.
That's really all it is:
just a huge array of cells,
just like any other array that
you're familiar with in C,
except every element is one byte wide.
And just like an array,
every element has an address.
Every element of an array
has an index, and we
can use that index to do so-called
random access on the array.
We don't have to start at
the beginning of the array,
iterate through every
single element thereof,
to find what we're looking for.
We can just say, I want to get to the
15th element or the 100th element.
And you can just pass in that number
and get the value you're looking for.
>> Similarly every location
in memory has an address.
So your memory might
look something like this.
Here's a very small chunk of
memory, this is 20 bytes of memory.
The first 20 bytes, because my
addresses there at the bottom
are 0, 1, 2, 3, and so
on all the way up to 19.
And when I declare variables and
when I start to work with them,
the system is going to set
aside some space for me
in this memory to work
with my variables.
So I might say, char c equals capital
H. And what's going to happen?
Well the system is going to
set aside for me one byte.
In this case it chose byte number
four, the byte at address four,
and it's going to store the
letter capital H in there for me.
If I then say int speedlimit
equals 65, it's
going to set aside four
bytes of memory for me.
And it's going to treat those
four bytes as a single unit
because what we're working
with is an integer here.
And it's going to store 65 in there.
>> Now already I'm kind of
telling you a bit of a lie,
right, because we know that
computers work in binary.
They don't understand
necessarily what a capital H is
or what a 65 is, they only
understand binary, zeros and ones.
And so actually what
we're storing in there
is not the letter H and the number 65,
but rather the binary representations
thereof, which look a
little something like this.
And in particular, in the
context of the integer variable,
it's not necessarily going to treat it
as one four-byte chunk;
it's actually going
to treat it as four one-byte chunks,
which might look something like this.
And even this isn't
entirely true either,
because of something called
endianness, which we're not
going to get into now, but
if you're curious about it,
you can read up on little
and big endianness.
But for the sake of this argument,
for the sake of this video,
let's just assume that is, in
fact, how the number 65 would
be represented in
memory on every system,
although it's not entirely true.
>> But let's actually just get
rid of binary entirely,
and just think about it as H
and 65; it's a lot easier
to think about it like
that as a human being.
All right, so it also seems maybe a
little random that my system
didn't give me bytes 5, 6, 7,
and 8 to store the integer.
There's a reason for that, too, which
we won't get into right now, but suffice
it to say that what the
computer is doing here,
not giving me memory that's
necessarily back to back,
is probably a good move on its part.
It does put things back to back, though,
if I now want to create a string
called surname, and I want
to put Lloyd in there.
Each letter of that is
going to require one
character, one byte of memory.
So if I could put Lloyd into my array
like this, I'm pretty good to go, right?
What's missing?
>> Remember that every string we work
with in C ends with backslash zero,
and we can't omit that here, either.
We need to set aside one byte
of memory to hold that so we
know when our string has ended.
So again, this arrangement
of the way things
appear in memory might
seem a little random,
but it actually is how
most systems are designed:
they tend to line things up on multiples
of four, for reasons again
that we don't need to
get into right now.
So suffice it to say that
after these three lines of code,
with my data starting at memory locations
4, 8, and 12,
this is what memory might look like.
>> And just to be particularly
pedantic here, when
we're talking about memory
addresses we usually
do so using hexadecimal notation.
So why don't we convert all of these
from decimal to hexadecimal notation,
just because that's generally
how we refer to memory.
So instead of being 0 through
19, what we have is 0x0
through 0x13.
Those are the 20 bytes of memory that we
have, or that we're looking at, in this image
right here.
>> So all of that being said, let's
step away from memory for a second
and back to pointers.
Here is the most important
thing to remember
as we start working with pointers.
A pointer is nothing
more than an address.
I'll say it again because
it's that important,
a pointer is nothing
more than an address.
Pointers are addresses to locations
in memory where variables live.
Knowing that, it hopefully becomes a
little bit easier to work with them.
Another thing I like
to do is to draw
diagrams visually representing what's
happening with various lines of code.
And we'll do this a couple
of times with pointers,
and when we talk about dynamic
memory allocation as well,
because I think that these diagrams
can be particularly helpful.
>> So if I say for example, int k
in my code, what is happening?
Well what's basically happening is
I'm getting memory set aside for me,
but I don't even like to
think about it like that, I
like to think about it like a box.
I have a box and it's
colored green because I
can put integers in green boxes.
If it was a character I
might have a blue box.
But I always say, if I'm creating
a box that can hold integers
that box is colored green.
And I take a permanent marker
and I write k on the side of it.
So I have a box called k,
into which I can put integers.
So when I say int k, that's
what happens in my head.
If I say k equals five, what am I doing?
Well, I'm putting five
in the box, right.
This is pretty straightforward: if
I say int k, create a box called k.
If I say k equals 5,
put five into the box.
Hopefully that's not too much of a leap.
Here's where things get a
little interesting, though.
If I say int *pk, well, even if I don't
know what this necessarily means,
it's clearly got something
to do with an integer.
So I'm going to color
this box green-ish,
I know it's got something
to do with an integer,
but it's not an integer itself,
because it's an int star.
There's something slightly
different about it.
So an integer's involved,
but otherwise it's
not too different from
what we were talking about.
It's a box, it's got a label,
it's wearing the label pk,
and it's capable of holding
int stars, whatever those are.
They have something to do
with integers, clearly.
Here's the last line, though.
If I say pk = &k, whoa,
what just happened, right?
So this random number, seemingly random
number, gets thrown into the box there.
All that is, is pk
gets the address of k.
So I'm taking where k lives in memory,
its address, the address of its bytes,
and that value is what I'm going
to put inside of my box called pk.
And because these things are
pointers, and because looking
at a number like 0x80C74820
is probably
not very meaningful,
when we visualize pointers,
we actually draw them as pointers, that is, as arrows.
pk gives us the information
we need to find k in memory.
So basically pk has an arrow in it.
And if we walk the length
of that arrow, imagine
it's something you can walk on, if we
walk along the length of the arrow,
at the very tip of that arrow, we
will find the location in memory
where k lives.
And that's really important
because once we know where k lives,
we can start to work with the data
inside of that memory location.
Though we're getting a teeny
bit ahead of ourselves for now.
>> So what is a pointer?
A pointer is a data item whose
value is a memory address.
That 0x80C74820 stuff
going on, that was a memory address.
That was a location in memory.
And the type of a pointer
describes the kind
of data you'll find at
that memory address.
So that's the int star part, right.
If I follow that arrow, it's
going to lead me to a location.
And that location, what I
will find there in my example,
is a green colored box.
It's an integer, that's what I
will find if I go to that address.
The data type of a
pointer describes what
you will find at that memory address.
So here's the really cool thing though.
Pointers allow us to pass
variables between functions.
And actually pass variables
and not pass copies of them.
Because if we know exactly where
in memory to find a variable,
we don't need to make a copy of
it, we can just go to that location
and work with that variable.
So in essence pointers sort
of make a computer environment
a lot more like the real world, right.
>> So here's an analogy.
Let's say that I have a notebook,
right, and it's full of notes.
And I would like you to update it.
You are a function that
updates notes, right.
In the way we've been
working so far, what
happens is you will take my notebook,
you'll go to the copy store,
you'll make a Xerox copy of
every page of the notebook.
You'll leave my notebook back
on my desk when you're done copying,
you'll go and cross out things in your
copy that are out of date or wrong,
and then you'll pass back to
me the stack of Xerox pages
that is a replica of my notebook with
the changes that you've made to it.
And at that point, it's up to me as
the calling function, as the caller,
to decide to take your notes and
integrate them back into my notebook.
So there's a lot of steps
involved here, right.
Like, wouldn't it be better
if I just said, hey, can you
update my notebook for
me, handed you my notebook,
and you took things and
literally crossed them out
and updated my notes in my notebook,
and then gave me my notebook back?
That's kind of what
pointers allow us to do,
they make this environment a lot
more like how we operate in reality.
>> All right, so that's what
a pointer is. Let's talk
about how pointers work in C, and
how we can start to work with them.
There's a very simple pointer
in C called the null pointer.
The null pointer points to nothing.
This probably seems like it's
actually not a very useful thing,
but as we'll see a
little later on, the fact
that this null pointer exists
really can come in handy.
Whenever you create a pointer,
ideally you set its value immediately.
An example of setting
its value immediately
was a couple of slides back,
where I said pk = &k,
meaning pk gets k's address;
we'll see how to code that shortly.
If you don't set its value to something
meaningful immediately,
you should always
set your pointer to point to null.
You should set it to point to nothing.
>> That's very different from
declaring a pointer,
leaving its value as it is,
and just assuming
it's null, because that's rarely true.
So you should always set
the value of a pointer
to null if you don't set its value
to something meaningful immediately.
In C code, the null pointer is written NULL.
You can check whether a pointer's value
is null using the equality operator
(==), just like you compare any integer
values or character values using (==).
NULL is a special sort of constant
value that you can use to test against.
So that was a very simple
pointer, the null pointer.
Another way to create
a pointer is to extract
the address of a variable
you've already created,
and you do this using the &,
or address-of, operator,
which we've already seen
in the first diagram example I showed.
So if x is a variable that we've
already created of type integer,
then &x is a pointer to an integer.
Remember, & extracts
the address of the thing on its right,
and since a pointer is just an address,
&x is a pointer
whose value is where in memory x lives.
It's x's address.
Let's take this one step
further and connect to something
I alluded to in a prior video.
If arr is an array of doubles, then
&arr[i] is a pointer
to a double.
OK.
If arr is an array of doubles,
then arr[i] is
the i-th element of that array,
and &arr[i] is where in
memory the i-th element of arr exists.
>> So what's the implication of this whole thing?
It's that an array's name
effectively acts as a pointer:
in most expressions, it's converted
to a pointer to the array's first element.
You've been working
with pointers all along
every time that you've used an array.
Remember the example
on variable scope?
Near the end of that video I presented
an example where we have a function
called set_int and a
function called set_array,
and your challenge was to determine
what values would be printed
at the end of the main program.
>> If you recall from that example,
or if you've watched the video,
you know that the call to
set_int effectively does nothing,
but the call to set_array does.
And I sort of glossed over why
that was the case at the time.
I just said, well it's an array, it's
special, you know, there's a reason.
The reason is that an array's
name behaves like a pointer,
and there's this special
square bracket syntax that
makes things a lot nicer to work with.
It makes the idea of a
pointer a lot less intimidating,
and that's why arrays are sort
of presented in that way.
But when you pass an array to a function,
what actually gets passed is a pointer
to its first element.
And that's why, when we
passed an array as an argument
to a function and changed it there,
the contents of the array
actually changed in both the callee
and the caller,
which for every other kind of
variable we saw was not the case.
So that's just something to keep in
mind when you're working with pointers:
the name of an
array is effectively a pointer
to the first element of that array.
>> OK, so now that we have all these
facts, let's keep going.
Why do we care about
where something lives?
Well like I said, it's pretty
useful to know where something lives
so you can go there and change it.
Work with it and actually
have the thing that you
want to do to that variable take effect,
and not take effect on some copy of it.
This is called dereferencing.
We go to the reference and
we change the value there.
So if we have a pointer and it's called
pc, and it points to a character,
then we can say *pc and *pc is the
name of what we'll find if we go
to the address pc.
What we'll find there is a character and
*pc is how we refer to the data at that
location.
So we could say something like
*pc = 'D',
and that means that whatever
was at memory address pc,
whatever character was previously
there, is now D.
>> So here we go again with
some weird C stuff, right.
So we've seen * previously as being
somehow part of the data type,
and now it's being used in
a slightly different context
to access the data at a location.
I know it's a little confusing, and
honestly, a big part of why
pointers have this mythology
around them as being so complex
is really a syntax problem.
But * is used in both contexts:
as part of the type name,
where we'll see one more quirk a little
later, and, right now, as the
dereference operator.
So it goes to the reference,
it accesses the data
at the location of the pointer, and
allows you to manipulate it at will.
>> Now this is very similar to
visiting your neighbor, right.
If you know where your
neighbor lives, you're
not hanging out with your neighbor.
You happen to
know where they live,
but that doesn't mean that by
virtue of having that knowledge
you are interacting with them.
If you want to interact with them,
you have to go to their house,
you have to go to where they live.
And once you do that,
then you can interact
with them just like you'd want to.
And similarly with variables,
you need to go to their address
if you want to interact with them;
you can't just know the address.
And the way you go to the address is
to use *, the dereference operator.
What do you think happens
if we try and dereference
a pointer whose value is null?
Recall that the null
pointer points to nothing.
So if you try to dereference
nothing, or go to the address nothing,
what do you think happens?
Well, if you guessed segmentation
fault, you'd be right.
If you try to dereference
a null pointer, on the systems we use
you'll almost certainly suffer a segmentation
fault. But wait,
didn't I tell you that
if you're not going
to set your
pointer to something meaningful,
you should set it to null?
I did, and actually the segmentation
fault is kind of a good behavior.
>> Have you ever declared a variable and
not assigned its value immediately?
So you just say int x; you don't
actually assign it to anything
and then later on in your code,
you print out the value of x,
having still not
assigned it to anything.
Frequently you'll get
zero, but sometimes you
might get some random number, and
you have no idea where it came from.
Similar things can
happen with pointers.
When you declare a pointer,
int *pk for example,
and you don't assign it a value,
you get four or eight bytes of memory,
whatever bytes the system
happens to set aside for it.
And there might have been
something already there that
is no longer needed by another
function, so you just have
whatever data was there.
>> What if you tried to dereference
that? There were
already bytes and information in
there, and that's now the address in your pointer.
If you try to dereference that pointer,
you might be messing with some memory
that you didn't intend
to mess with at all.
And in fact you could do
something really devastating,
like corrupt data that another
part of your program depends on,
or break another function,
or do something malicious that
you didn't intend to do at all.
It's probably better at the
end of the day for your program
to crash than for it to do
something that screws up
another part of your program;
that behavior is even
less ideal than just crashing.
And so that's why it's
actually a good habit
to get into to set your pointers
to null if you don't set them
to a meaningful value
immediately, a value that you know
and that you can safely dereference.
>> So let's come back now and take a look
at the overall syntax of the situation.
If I say int *p;, what have I just done?
What I've done is this.
I know the value of p is an address
because all pointers are just
addresses.
I can dereference p
using the * operator.
In this context here, at the very
top, recall that the * is part of the type:
int * is the data type.
But I can dereference
p using the * operator,
and if I do so, if I go to that address,
what will I find at that address?
I will find an integer.
So int *p is basically
saying, p is an address.
I can dereference p and if
I do, I will find an integer
at that memory location.
>> OK so I said there was another
annoying thing with stars
and here's where that
annoying thing with stars is.
Have you ever tried to declare
multiple variables of the same type
on the same line of code?
So for a second, pretend that the line,
the code I actually have there in green,
isn't there and it just says int x, y, z;.
What that would do is actually create
three integer variables for you,
one called x, one called
y, and one called z.
It's a way to do it without
having to split onto three lines.
>> Here's where stars get
annoying again though,
because in a declaration the * is part
of the type, but it attaches
to each individual variable name.
And so if I say int *px, py, pz, what I
actually get is a pointer to an integer
called px and two integers, py and pz.
And that's probably not what
we want; that's not good.
>> So if I want to create multiple pointers
of the same type on the same line,
what I actually need
to do is say int *pa, *pb, *pc.
Now, having just told you that,
you probably will never do this.
And that's probably a good thing, honestly,
because you might inadvertently
omit a star, something like that.
It's probably best to
declare pointers on individual lines,
but it's just another one
of those annoying syntax
things with stars that make
pointers so difficult to work with,
because it's this syntactic
mess you have to work through.
With practice it does
really become second nature.
I still make mistakes with it
after programming for 10 years,
so don't be upset if it happens
to you; it's pretty common, honestly.
It's really kind of
a flaw of the syntax.
>> OK so I kind of promised
that we would revisit
the concept of how large is a string.
Well, it turns out that with
string, we've really kind of
been lying to you the whole time.
There's no data type called
string, and in fact I
mentioned this in one of our
earliest videos on data types:
string was a data type that
was created for you in cs50.h.
You have to #include
cs50.h in order to use it.
>> Well, string is really just
an alias for something
called char *, a
pointer to a character.
And pointers, recall,
are just addresses.
So what is the size
in bytes of a string?
Well, it's four or eight.
And the reason I say four or
eight is because it actually
depends on the system. If you're using
CS50 IDE, the size of a char *
is eight bytes, because it's a 64-bit system:
every address in memory is 64 bits long.
If you're using the CS50 Appliance
or any other 32-bit machine,
and you've heard that term, 32-bit
machine, what is a 32-bit machine?
Well, it just means that every
address in memory is 32 bits long,
and 32 bits is four bytes.
So a char * is four or eight
bytes depending on your system.
And indeed a pointer to any data
type, since all pointers are just
addresses, is four or eight bytes.
So let's revisit this
diagram and let's conclude
this video with a little exercise here.
So here's the diagram we left off with
at the very beginning of the video.
So what happens now if I say *pk = 35?
What does it mean when I say *pk = 35?
Take a second.
*pk.
In this context, * is
the dereference operator.
So when the dereference
operator is used,
we go to the address pointed to
by pk, and we change what we find.
So *pk = 35 effectively
does this to the picture.
It has the same effect
as having said k = 35.
>> One more.
If I say int m, I create
a new variable called m.
A new box, it's a green box because
it's going to hold an integer,
and it's labeled m.
If I say m = 4, I put an
integer into that box.
If I say pk = &m, how does
this diagram change?
pk = &m: do you recall what the
& operator does or is called?
Remember that & followed by a variable name
is the address of that variable.
So what we're saying is
pk gets the address of m.
And so effectively what happens in the
diagram is that pk no longer points
to k, but points to m.
>> Again pointers are very
tricky to work with
and they take a lot of
practice, but because
of their ability to allow you
to pass data between functions
and actually have those
changes take effect,
getting your head around them
is really important.
It probably is the most complicated
topic we discuss in CS50,
but the value that you
get from using pointers
far outweighs the complications
that come from learning them.
So I wish you the best of
luck learning about pointers.
I'm Doug Lloyd, this is CS50.
