[MUSIC PLAYING]
DAVID MALAN: All right.
So this is CS50, and this is
lecture two, our continuation of C.
And for the next several weeks,
we're gonna keep using C.
But we're gonna focus less on the
language and the syntax, which we'll
get experience with over time
by way of the problem sets,
and more and more on the ideas
and more and more on the problems
that we can solve.
But before we forge
ahead with anything new,
let's take a quick look
back at where we left off
and what we'll sort of assume
for today's comfort level,
and ask any and all
questions along the way.
So in order to program last
time, we needed a tool.
And that tool was this
thing here, CS50 IDE.
If you haven't dived in already, you
probably will this weekend for problem
set one.
And this will be a web-based
programming environment
that's got all the requisite
tools you need in order
to write code, compile code, and
then, starting today, debug code
or find mistakes in code.
But this is requisite because it's
not sufficient to just write this.
What do we call this,
generally speaking?
Yeah, so this is source code.
So this is code.
When someone says, I write code,
they write stuff like this.
And this is, particularly, a
language called C. But, of course,
computers don't speak C.
And they don't speak Java,
and they don't speak Python or C++
or any of the languages with which
you're familiar.
They only understand what
at the end of the day?
Yeah, so binary.
So binary is, of course, zeros
and ones, otherwise known
as machine code in this context,
insofar as it is code.
It's instructions that implement
some problem-solving techniques.
But it's just zeros and ones
that computers understand.
So we needed a tool to get from
A to B. And that was called what?
Yeah, a compiler.
So a compiler, of
course, does this for us.
Source code is the input.
Compiler is the program
or really the algorithm,
albeit in the form of
a piece of software.
And then the output is
machine code, zeros and ones.
And for our purposes, we're not going to
worry about how we get from step A to B
per se.
We'll want to use the tool.
But this is another area unto
itself in computer science.
If you want to understand
how compilers work
and how humans got from literally zeros
and ones to something called assembly
code to higher-level languages,
you'll see a glimpse of that in CS50.
But it unto itself is a whole field
that might prove ultimately of interest.
But this, more mechanically,
is how we compiled code.
Clang is a compiler.
It stands for C language.
And it's just software that some
humans wrote some years ago.
And there are alternatives.
If you've ever used Visual
Studio in the Windows world
or GCC in the Linux and Unix world,
there's bunches of other compilers.
We just happened to use clang
since it's pretty popular.
And then that second command
is even stranger looking.
But it represents the act of doing what?
./a.out.
Yes, over here?
Yeah, running the program.
Exactly.
So ./a.out is a cryptic
way of running a program.
But it's like the textual equivalent
of double-clicking an icon.
And a.out is just like
the default name you
get, assembler output, when you compile
a program without specifying its name.
But we were able to specify a name.
If you introduce a technique
called command-line arguments,
you can be a little more precise.
clang -o for output,
then any word you want.
In this case, I went with "hello."
And then the name of the program or
the file that you want to compile.
But, this of course,
gets pretty tedious.
And, in fact, there's a missing step.
Sometimes when you want
to write a program,
it suffices to compile
it exactly as that.
But let me go ahead and do this.
Let me go ahead into CS50
IDE, and let me go ahead
and briefly do the following.
File, New.
And let me go ahead and
save this as "hello.c."
And just from memory, I'll quickly
recreate that same program.
int main void, and then we
had printf, hello, world.
And then just for good measure,
backslash n, which means
move the cursor to the next line.
And now I'm gonna go
ahead and save that.
If I go ahead now and run
clang hello.c, looks good.
And ./a.out.
That looks good, too.
But recall that we introduced some
other functions the other day,
as well, like get_string and get_int.
And we'll see bunches more before long.
And if I do that, notice that I
have to do a couple of things.
So if I want to do, like,
string name gets get string,
and then, quote, unquote
"name" to prompt the user,
recall that the left-hand
side says, hey, computer,
give me a variable that's
gonna store a string.
Call it name.
Could have called it anything.
I could have called it s.
But why might it be arguably better
to call my variable name instead
of string, or s, rather?
Yeah?
AUDIENCE: It's clearer.
DAVID MALAN: It's just clearer, right?
It might be a very
marginal, nit-picky detail.
But it's just clearer.
And when our programs get
bigger, it's just nicer
to be able to read words
and understand implicitly
what they mean without having to
think through what x or y or s
or whatever actually is.
On the right-hand side, meanwhile,
we had this function get_string,
whose purpose in life is to
go get a string from the user,
from his or her keyboard, prompting
them with a word like "name,"
and then return it, just as Sam
handed me back a slip of paper
with a name on it.
But there's a catch.
It's now no longer sufficient to just
adapt my code for this new approach.
And I had, again, to
change that second line.
I had to give a
placeholder, which is %s.
And if that's a little cryptic looking,
just kind of think of it like a Mad
Lib, if you're familiar with
those, where there's just, like,
a fill-in-the-blank here.
And all %s is doing is
saying, put a word there.
What word?
Well whatever comes after the comma,
whatever variable or value is there.
So that's good.
But a couple of things can go wrong.
And let me point those out
so that you don't perhaps
trip over it yourself on your own.
Let me go ahead and do "clang hello.c"
now that I've made these changes
and saved them.
Bunches of errors all of a
sudden, even though I've not
changed all that much code.
Again, rule of thumb
from last time should
be even if there's lots of error
messages, always look at the first one
first because the rest might just be
kind of a resulting cascade of errors,
only one of which is
important, which is the first.
Now, it says "use of
undeclared identifier string.
Did you mean stdin," which
is something else altogether.
I didn't.
I meant string.
And it turns out that string is a
feature of the so-called CS50 library.
So this is one of these training wheels
we're just gonna use for a few weeks
until we dive in underneath
the hood of strings, too.
But in order to use anything from CS50,
what did I need to add to my code, too?
AUDIENCE: Source code.
DAVID MALAN: Source code, yes.
But what else?
AUDIENCE: The library.
DAVID MALAN: The library.
And the library was the CS50 library.
And that means there's a file
somewhere on the computer, in the IDE,
called cs50.h, a so-called header
file-- more on those in a bit.
And so I did forget that detail.
But suppose you don't
recall that yourselves.
Well, you might recall from
one of our orientation sessions
that CS50 has tools with
which to help for this, too.
You don't have to turn to this
course's online discussion forum.
You don't have to go to
office hours, necessarily,
for error messages like this.
If you can't quite wrap your
mind around what's happening,
go ahead and do this instead.
Instead of just running
clang hello.c, do something
like help50 clang
hello.c, where help50 is
a CS50 specific command, sort of a
virtual teaching fellow, if you will.
And if we recognize your error
messages in yellow at the bottom,
we'll highlight the first one.
And then we'll try to give you
advice like you might get in person.
So "by undeclared
identifier, clang means
that you've used a
name, string, on line 5
of hello.c, which hasn't been defined.
Did you forget to include cs50.h,
in which string is defined,
atop your file?"
So we'll generally try to prompt
you with rhetorical questions that
hopefully are correct in leading
you toward the right solution.
So OK, that jogged my memory, at
least, even if I'm not yet 100%
comfortable with what these
lines really are doing.
And we'll tease that apart more today.
But it feels like I'm trying to do this.
But it turns out there's one other
gotcha here. clang hello.c, Enter.
Dang it.
Well, actually, this is
a net positive, right?
Fewer error messages, it would seem.
But now one we did not
see the other day--
"undefined reference to get_string."
So it's kind of similar in spirit.
Something is not understood.
But it's different wording, certainly.
And I'm not quite sure what that means.
But it turns out that
it's not sufficient
just when using most libraries to
use include cs50.h or other header
files, as we'll eventually see.
That just teaches the compiler
that something exists.
It was like briefly last time,
when we talked about prototypes,
where I put a little one-liner
that just said, by the way,
this function's eventually gonna exist.
That's what the header file's doing.
It's like a promise to
clang, this shall exist.
But there's a second step.
You actually have to feed
clang the zeros and ones that
implement the CS50 library that you,
of course, did not create yourself,
but we did pre-install in the IDE.
And there's a separate
way of doing that.
Rather than just do
clang hello.c, you have
to do what's called linking
your code against that library,
at least if it's a third-party library
that doesn't come with the computer.
It came from humans like us.
So this now says, hey, clang,
compile hello.c, but link it
against the CS50 library, which
means take my zeros and ones,
take CS50's zeros and ones,
combine them, and then give me
my actual program to run.
And so that's going to
be a key ingredient.
And help50 could guide
you toward that solution.
But if I hit Enter now,
now it seems to compile.
./a.out.
My name shall be David.
Enter.
And it says, "hello, David."
So again, don't get hung up,
ultimately, in office hours and problem
sets on these kinds of errors.
You're gonna hit these
bumps from the get-go.
But just realize-- look
for sort of familiar words.
Use things like help50.
Reach out to the course online.
And just get over those hurdles
because at the end of the day,
the interesting stuff's
gonna be the logic
of the programs and the actual
problems we're trying to solve.
So what is it that's actually, then,
going on here underneath the hood?
And frankly, this is very
quickly becoming tedious.
So how do I automate
some of these processes?
Because it's very easy to forget
this, and it's just very boring
to continue to type so many commands.
Well, recall that
there was a shortcut we
talked about the other day, which just
kind of hides all of these details.
You don't need to remember -o.
You now don't need to remember -lcs50.
What do you do instead
to make a program?
Yeah?
Yeah, so make, it's
not a compiler itself.
It's just kind of a helper program that
knows how to run a compiler for you.
So frankly, the simpler approach
is just to do that-- make hello.
And the reason it outputs so many
more words is just because we have,
in anticipation of teaching
the semester, sort of
preconfigured it with command-line
arguments, additional words,
that we expect you're
gonna need at some point.
And this just saves you the trouble of
having to futz around with the manual
to figure that kind of thing out.
But that's why it looks cryptic.
But notice this is really
the most important word--
hello followed by hello.c.
And those are your two ingredients.
All right.
So what's going on, then,
underneath the hood there?
Well, it turns out that even though
we can simplify the command structure,
it's actually doing quite a bit for us.
And this is the process of compiling.
But that was kind of an
oversimplification, or, put more
intelligently, like an abstraction.
There's actually quite a few steps that
go on underneath the hood, which
are called preprocessing,
compiling, assembling, and linking.
So let's do a quick dive-down here.
But then we'll abstract away, just
so that you've seen what's going on.
But henceforth, we can just take for
granted that all of this is happening.
So here is source code,
same program as the simplest
version we had a moment ago.
And ultimately, I need to
get this to machine code.
Well, let's see if we can't just
visualize how we get from point A to B
without completely abstracting it
away with just those big arrows.
So this is my source code.
And it turns out that
the very first step
of turning source code into
machine code in the world of C
is you first run what's
called a preprocessor.
You don't do this
explicitly, although you
could if you were really
low-level and interested in it.
But what the preprocessor
does, essentially,
is anytime there's a line of code
that starts with a pound sign,
or a hashtag these days, that's a
special command that gets, essentially,
replaced with the contents of
the file, at least in this case.
So somewhere in the IDE is a
file called, literally, "stdio.h."
And so #include means go get that file
and essentially copy and paste it here.
And so when you preprocess
your code, this yellow line
here becomes something like this.
And I'm doing "..." because
it's dozens if not
hundreds of lines long.
But there's one juicy line in
it which is the little clue
to clang that printf shall exist.
And that's why you need stdio.h.
So that's essentially, for our purposes
today, all the preprocessor does.
It does these kind of find
and replace style operations
so that now your file,
without you knowing it,
suddenly became much bigger because
it's got other lines of code
that someone else wrote.
And then your code remains
right there as it was.
But the next step after preprocessing
is something called compiling itself,
which technically, the
compiler, if we really
want to be nit-picky and look
at its formal definition,
is actually taking these yellow lines,
your source code and someone else's,
perhaps, and converting it into
something called assembly code.
And this is a language that
humans kind of sort of still
do, but back in the day
really did program in.
And in fact, if you have a computer
with an Intel CPU, a brain made
by Intel inside of your computer, there
was and still is a big user's manual
that tells programmers around the
world that this Intel CPU understands
the following instructions--
add, subtract, multiply, divide,
all the basics, and then things
like move numbers from here
to here, read numbers from
here to here, just move stuff
around in the computer's memory.
And so even though this
really looks cryptic even
to me, since I am by no means
an expert at assembly language,
certainly, all these years later,
you can see words that kind of sound
familiar, like "mov" suggests moving
a value from one location to another.
"sub" alludes to subtraction, so
subtracting one number from another.
And without really thinking
this through carefully,
I'm not really sure what's going on yet.
But I do see a familiar word
down there called "printf."
And so long story short, what the
computer or compiler specifically
has done is it's taken my more
user-friendly C code, converted it
to something that's a little closer
to what the machine understands.
But it's not there yet
because the machine only
understands zeros and ones.
So there's another
step called assembling.
And the assembling process
simply takes assembly code
and converts it to zeros and ones.
Now we're down to the zeros and ones.
And what's amazing-- if it's
interesting in the first place-- is
when you run clang and hit Enter, all
of this is just happening instantly.
And you're getting these
zeros and ones, this output.
But I've left room on the
other side because all we've done
is convert my code from source code
to assembly code to machine code.
What needs to now be merged in, so to
speak, for that "Hello World" program?
Yeah, so still need, like, stdio, the
standard I/O library that has printf.
So the next step is to take a
whole bunch of zeros and ones
from somewhere else on
the system, combine them
and then this is the file
containing a.out or hello,
whatever you called your program.
And that, ultimately, is what
the computer understands.
So that is a very low-level detail.
Thankfully, we learned
in the very first lecture
this notion of abstraction, which means
even as you dive in underneath the hood
and sort of understand
how we're building up,
now, henceforth-- literally every minute
hereafter-- that whole mouthful just
becomes compiling.
And indeed, that's what most
people in the programming business
refer to as compiling, is
all of those several steps.
But that's all that's happening.
Feels like magic, but it's
just one step after another
gets us closer to our goal.
Questions?
It's about as low-level as we'll get.
Yeah?
AUDIENCE: Why do you have to go through
the assembly code and then the machine
code?
Why not just go straight
to machine code?
DAVID MALAN: Good question.
Why do you have to go from one
step to another, like from source
code to assembly code to machine code?
You absolutely could.
It just happens to be the case that
there's lots of humans in the world
and lots of people working
on different projects.
And this notion of
layering your software
on top of someone else's
on top of others' allows
us to build more complex systems
much more cleanly, if you will.
And there's different types
of computers in the world.
There's Macs.
There's PCs, which
even though these days,
they're a lot more similar
underneath the hood,
literally, than they used to be back
in the day, there's different CPUs.
There's phones that have
very different CPUs.
And wouldn't it be nice if I could
write my programs in one language
and compile them into zeros and ones
that do work on a Mac and on a PC
and on an Android phone
and an iPhone and so forth?
And that's why by having these
sort of different layers,
one set of humans or one person can
implement the process of converting C
to assembly code.
Then someone else can take it to
the zeros and ones, in some sense.
Or even-- there's even
intermediate steps.
Compilers have front ends and back
ends and all of this complexity.
But it gives us advantages
because it means
we can sort of decide which types
of hardware to support more easily.
Really good question.
Other questions?
OK.
So with that said, let's
now consider any number
of ways in which things can go wrong.
It's easy for me, certainly,
to write "hello, world,"
and everything just kind of works.
And even when it doesn't, I
quickly know how to fix it.
And it's only from
experience and practice.
But let me just give you a teaser not
just of help50 but two other tools
that you'll see, particularly for the
problem sets, that will not necessarily
teach you how to write good
code-- good, efficient code.
That's where the humans come in and the
teaching fellows' feedback and sections
and office hours and more--
but at least to write correct
code that meets our specifications
and that's well-styled,
at least looks good.
But the third ingredient, recall,
besides correctness and style
is gonna be design,
which is something we'll
learn after practice and examples.
So with check50, this
is a tool that comes
in CS50 IDE, recall, if unfamiliar,
that allows me to essentially do this.
Let me whittle this
back down to my simplest
hello, world program like this.
I no longer need the CS50 library.
I can run make hello.
Seems to work.
And how do you go about
testing your programs if you've
written this for a problem set?
Well, the easiest and most
straightforward way, of course,
is just run it.
Looks like it's correct.
And it is.
And there's not too much that
can go wrong in this program.
But soon, you'll see,
with problem set one
and beyond, anytime
you start getting input
from the user where he or
she has to type their name
or a number or other things, you
can absolutely concoct scenarios
where something goes wrong.
But if you run a command in this case
like check50, we can do the following.
Let me go ahead and first
make a directory called--
let me go ahead and
do this and do mkdir--
for make directory-- hello.
And then we didn't see
this the other day.
And you'll see more of this
in today's super sections,
or classwide sections,
which will also be filmed.
I'm just gonna move this file
into a directory called hello.
So that's like on a Mac or PC
just dragging and dropping it.
But I'm doing it with my keyboard.
What's the command to change
into another directory? cd.
So that's like double-clicking
on a directory,
albeit with my keystrokes only.
And now I'm gonna go ahead and run this.
I can run make hello again.
Seems to work.
And I can run ./hello.
Seems to work.
But now let's see if CS50 agrees.
So check50.
And then I'm gonna type
"cs50/2017/fall/hello,"
which looks like a bunch
of folders, but it's not.
It's just a unique identifier that
has sort of some hierarchy to it.
You would only know to type
this by reading the problem set
specification online.
And what this is gonna do, if
you haven't seen it already,
is actually connect to CS50's server.
It's gonna authenticate
you, if you haven't already.
I'm gonna go ahead and
log in as student50.
And now hereafter it will
remember my password,
for at least some amount
of time, so you don't
have to type it in every darn time.
Then it's preparing.
It's uploading.
And what's happening
now is my "hello.c" file
is somewhere in the
cloud on CS50 servers.
We are running the checks, the
tests that the staff wrote.
And hopefully, I'm gonna see a
whole bunch of green smiley faces
that look a little
yellow on this projector,
but those are, in fact,
green smiley faces
instead of frowny faces, which
would suggest something is wrong.
So that's all good.
And don't be discouraged if
you see a few frowny faces
or a few flat, confused faces
if something else is awry.
But style50 does something different.
Right now, the style
of my code, I'd argue,
looks pretty good because it's kind of
hard to go wrong when it's this short.
But we'll see a way.
And if I instead run "style50 hello.c,"
just the name of the file I want
to check--
looks good, but consider
adding more comments.
And that's pretty compelling
because there's zero at the moment.
And so what kind of comments
might you want to add?
Well, in this program, it's not that
compelling to add that many comments
because the reality is this program's
so short it probably takes me
less time to read the
code than the comments.
But it's very common, as
you'll see in the examples
from lecture, to do
something like this--
"says hello to user," just a
quick one-line summary so that
when you're skimming the file or
looking at the code, OK, got it.
I know what this does.
And if I care to know how it does
that, then I can read the code.
And so that would be a comment.
And that will probably make
style50 happy in this case.
But what if I'm getting a little sloppy?
And I remember vaguely that I was in the
habit of hitting Tab or the space bar
in lecture.
But I can't be bothered to do that
when I'm working on my problem set.
I just want to get the
darn thing to work.
It's not uncommon for code to
eventually start to look like this,
even though this, too,
is a simple program.
Now, good style, as you'll
see and learn from practice,
dictates that just like in Scratch
there were those yellow puzzle
pieces that kind of hug the code,
similarly, inside of curly braces,
you really should be indenting.
And so if I go ahead and sort of forget
that and now run style50 on "hello.c,"
I'll see my code
outputted in the terminal
window, the bottom of the screen.
But green suggests hey, programmer,
add the following characters.
So green suggests add here.
And if I go ahead now and
reindent that by hitting
Tab-- specifically four spaces,
which is a human convention--
it should make it happy again.
We can go in the reverse
direction, though.
Suppose that I got a little
confused as to what I actually
am supposed to indent--
and you might even
see in textbooks and
some online resources,
some people write their code like this.
Let me go ahead now and
run style50 on this.
It's gonna print out my code.
And red in this case means
remove those characters
that you might not otherwise see.
So it's not always going to be perfect.
And especially when
the programs get long,
it might be a little nonobvious
what changes you have to make.
But just like with error
messages, start at the top.
Make one or few changes.
Save it and rerun it, and see
what the updated advice is.
And I can't stress this enough,
especially with problem set one
and any problem set
thereafter-- don't get
into the habit of
sitting down and trying
to bite off the entirety of a problem.
Odds are with Scratch, you didn't
sit down and write the whole thing
without once playing it or testing
it or adding features to it.
Don't get into that habit, then,
in C. Take steps and steps,
just as we've been doing
with these examples so far.
All right.
Any questions, then, on those tools?
And we'll come back in just a moment
to more sophisticated debugging
techniques.
All right.
So one of the problems that
we were distracted by earlier
is there's this old-school
game, "Super Mario
Bros.," wherein a character like this
jumps around the screen quite a bit.
And it's one from the
very first Nintendo game,
and there's lots of obstacles in the way
of Mario as he's running left and right
and jumping.
And some of these obstacles
can be represented
with fairly simple constructs like
bricks in this colorful world.
And we can approximate this just
by using characters on our screens,
as well.
So I actually poked around for
far too much time last night
looking at old "Super Mario Bros."
maps, which if I had them in,
like, the 1980s, would have made
"Super Mario Bros." a lot easier.
But people have captured all
of the imagery from this game.
And one snapshot from this
game was a screen like this.
So eventually, Mario's
supposed to run through this.
And he's supposed to bump his
head up against the question marks
and get coins and so forth.
But for now, I'm gonna really,
really simplify this and propose
that all I care about for
the sake of discussion
is this line of question marks.
How would a computer program,
whether in "Super Mario Bros."
or today here in Sanders, go about
printing a line of question marks
in a row like that?
Well let me go ahead
and open up CS50 IDE.
I'm gonna go ahead and
create a new file here.
And I'm just going to go ahead
and call this, say, "mario0.c"
because it's the first or the
zero version of this program.
And I just want to print,
like, four question marks.
So let me take a stab at this first.
So #include stdio.h, which
I think I need because why?
AUDIENCE: Printf?
DAVID MALAN: Yeah, I need printf.
I need to be able to
print the character.
So int main void is what comes next.
And we'll start to tease
apart why that is today.
And now I'm going to go
ahead and print out "????."
And then semi-colon.
All right.
Let me go ahead now and make mario0.
./mario0.
And I kind of sort of have a
very ugly textual representation
of a really fun-- at
least 1980s style-- game.
But there's a slight aesthetic bug.
And I made this same
mistake the other day.
How do I move my cursor
onto the next line?
Yeah, so backslash n.
So backslash is the one we're about
to type, and forward slash or slash
is what people would call
just the other direction.
So that's backslash n.
And that's a special escape
character, so to speak.
For now, just know that this
starts to confuse the computer
if you just literally hit Enter.
Now, your code's on two
lines, when really it's
just one idea or one function.
So humans decided some time ago, let's
just represent that special character
that you would otherwise
just hit on a keyboard.
So now if I rerun make mario0.
./mario0.
OK, now looks a little better.
But we know from Scratch that we
don't just need to do question mark,
question mark, question
mark, question mark,
especially if I want even more
coins to be available on the screen.
What's the right programming construct
to just give me more of these?
AUDIENCE: A for loop.
DAVID MALAN: Yeah,
like a for loop, right?
So let me go ahead and
tweak this a little bit.
Let me go ahead and in,
let's say, "mario1.c"--
so "mario1.c"-- I'm
going to instead do this.
So for int-- to give me an integer--
i equals 0 by default,
though it could be 1.
But programmers tend to use 0.
i is less than--
I'm not sure, so let's just put
a big blank there for a moment.
And then i plus plus, I remember,
being the way to increment.
And then inside of this loop,
I'm going to do printf "?"
semi-colon.
And now let's answer this question.
If this for loop, which, recall, has
a very methodical process to it--
it initializes, checks the
condition, does something,
increments, checks the condition,
repeat again and again.
What number should I put on
the otherwise blank line here?
AUDIENCE: Four.
DAVID MALAN: So four.
But if I'm counting from 0 to 4,
that feels like it's five numbers.
So three might get me
closer, but less than.
We have this relational operator.
Less than, could have been
greater than in other contexts.
So the less than actually saves us.
If I do four here, think about
logically what's happening.
i gets initialized to
0 for the first time.
And we get a question mark printed.
So that's the first time
I printed a question mark.
Then it gets plus plussed, and so
it becomes 1, which is less than 4.
I print another one.
i is now 2.
I do another.
i is now 3, and I do one more.
That's not consistent with the number of fingers
I'm holding up because I started at 0.
But once i becomes 3, and therefore I've
already printed my four question marks,
the next value i is gonna
take on is 4 itself.
Is 4 less than 4?
No, so I never get a chance
to print another question mark
or put up another finger.
And honestly, this is a waste
of intellectual capacity
to think through, OK, how many
numbers are between 0 and 4?
We could have-- like most of
us in this room just think--
could have just done i is
less than or equal to 4,
and that, too, would have worked.
This is even more clear, perhaps.
You start at 1, and you count
up to and through the number 4.
And that will give me
four fingers, as well.
Why do we start counting at 0?
It's kind of just because,
but more technically it's
because it's easy to start counting
with all 0 bits per our first lecture.
So it's just a habit.
And it's fine if you're
more comfortable this way.
But before long, get
into this habit just
because everyone else does it this way.
OK, so now let's go
ahead and print this out.
Make mario1.
./mario1.
Ah, still that bug.
OK, is this gonna fix this?
Why not?
Yeah, that's gonna do question mark,
new line, question mark, new line.
That's not right.
So what line number should the
backslash n really go on or between?
AUDIENCE: It should go
outside the for loop.
DAVID MALAN: OK, so it should
go outside the for loop.
So specifically-- I saw
a hand in back, too.
What line number?
Yeah.
AUDIENCE: Eight and nine?
DAVID MALAN: Yeah, so
between eight and nine.
There's no room there at the moment.
That's no big deal.
We'll just hit Enter, printf.
And I can certainly just
do a single backslash n,
even with no words to the left of it.
That, too, is OK.
Let me recompile this.
And honestly, if you get bored
retyping the same commands,
know that you can also hit
up and down on your keyboard,
and it will go through
your history, so to speak.
And that, too, over time
will start to save you time.
So there's make mario1.
Enter.
./mario1, or I could just
scroll back up as I did before.
And now I get those four question marks.
But now let's actually create this a
little more interestingly, as follows.
Let me go ahead and not just
hard-code 4 into this program.
Let me make one more version
of Mario, call it mario2.c,
and this time actually
get some user input.
How about I do int n
because n is like a number,
and it's just common
convention to call it that.
get_int, and then I can
say number, semi-colon.
And now instead of hard-coding
4, why don't I just put
n there, which I can certainly do?
So let me go ahead now
and run make mario2.
Uh-oh.
Error.
Yeah, I forgot cs50.h.
So I have to go back up here.
I'll just do a quick copy-paste,
and then change the word, cs50.h.
Now I'm gonna clear my screen.
And to clear your screen,
you can hit Control-L,
for instance, which will just keep
fewer characters on the screen for us.
Make mario2.
That worked.
I didn't need to worry about the -lcs50
because, again, make does that for me.
That's one of the features.
And now I can do ./mario2.
Number.
How many question marks do we want?
AUDIENCE: Seven.
DAVID MALAN: I heard seven first.
And now we have seven question marks.
And it's not necessarily
gonna look super pretty.
If I do 700, now I'm
gonna get a whole lot.
But look how quickly it did that for me.
And so we have this power now of loops.
So that's good, but you know what?
Let's see, what about this?
What about -50 question marks?
Is -50 an int?
AUDIENCE: Yes.
DAVID MALAN: It is.
So we will get it for you
via get_int. That's
not really logically what we want.
So think about it.
On line seven if n equals -50, how
many times will the for loop execute?
AUDIENCE: None.
DAVID MALAN: Why none?
AUDIENCE: Because 0's greater than.
DAVID MALAN: Yeah, because 0 is
in this case greater than -50.
So that condition never lets the
loop actually proceed logically.
So we're kind of OK.
Nothing seems to happen.
I get this sort of ugly blank line.
And maybe that's arguably a bug.
But at least it didn't freak out the
computer and just kind of print things
infinitely many times,
as could actually happen.
So let me go ahead and--
actually, at the risk
of losing control over my computer,
let's go ahead and change the logic.
Suppose I change the less
than to a greater than.
And we initialize n to -50.
And now, is 0 greater than -50?
AUDIENCE: Yes.
DAVID MALAN: Yes.
And it's gonna be that way for
a really long time, most likely,
even as you increment it.
And so let me go ahead and do make
mario2 and then hold my breath
and do -50.
And even the internet and the
computer can't really keep up.
And that's why you're just kind
of seeing it bursty like this.
We're sending thousands, tens of
thousands, millions, ultimately,
of question marks across the screen.
And that, too, you
might do accidentally.
And so just as I did, you can hit the
secret keystroke, which usually works,
which is Control-C for cancel.
And that will stop a program
in the window from running.
All right.
So I've gone ahead now
and implemented kind
of a very weak approximation of this.
So that's great.
Let's now take a step up and
consider not just this construct,
but if we fast forward in the game,
to this part of the screen, now
maybe we have a vertical block, as well.
And let's just consider for
a moment what about my code
needs to change if I want to print three
or maybe any number of vertical blocks?
Fundamentally, how do I
want to change the code?
How do I want to change the code?
Yeah.
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah, I just need a line
break, where I accidentally almost did
earlier.
But in this case, it
would be a good thing.
So let me go over there, and
let me just quickly make mario3
by starting at the same point.
So mario3.c.
And then let me go down here
and change this as follows.
Let's make this i is less than n,
the way it's supposed to be and just
so that if we upload these
later, I don't forget.
And now I'll do a hashtag just
because it looks more brick-like.
It's not one of those coin things.
And now I do this here.
I don't think I need this anymore.
So let me save that and run make mario3.
Seems to be OK compiling-wise.
mario3, number 3.
And I get three of those.
And now I could do maybe five of those.
That works, and so forth.
So there's still an opportunity
for improvement here.
In case I want to pester the
user to actually cooperate such
that if he or she types in
-50, I don't want to just quit.
I want to yell at them or
somehow give them feedback
and say, give me what I asked for.
How do I continue to pester
a user again and again
and again until he or she actually
gives me the value I want?
I'm sorry?
AUDIENCE: While.
DAVID MALAN: While.
So there's different looping constructs.
While-- and it turns out we
could use while or even for,
or there's another one, as well.
And let's consider how we might do this.
It turns out that when you want
to get user input from someone,
you could use for.
You could use while.
But you'll find that it's a little
annoying to use those constructs.
Let me just jump to the better way
first so that we see one other way.
It turns out if you in a program want
to do something at least one time,
and maybe some more times,
you could use for or while.
But it's actually a little more
straightforward to literally just do it
while something is true.
Now, this is just a placeholder.
Let me start to fill in some logic here.
So I want to do the following--
do the following while what?
If the user does not give
me a positive number,
I want to prompt him or her again.
The curly braces on lines seven and
nine at the moment connote exactly that.
Do this, do this, do this
while line 10 is true.
So what Boolean expression, if you will,
do I want to type in the parentheses
here on line 10 to express
the fact keep doing this
until the number is positive?
Yeah?
AUDIENCE: While n is greater than 0.
DAVID MALAN: So while n is
greater than 0, keep doing--
which one?
AUDIENCE: Less than.
DAVID MALAN: I heard less than.
OK, so let's rethink this.
So while n is less than 0,
ask the user for a number.
Ask the user for a number.
Ask the user for a number.
And you know what?
This is going to just
confuse the heck out of them.
Let's be even more
clear with our prompt.
Give me a positive number.
But keep prompting him or her until
we actually get a positive number.
Now, if we really want to be nit-picky,
it's actually not even less than.
We're so close.
AUDIENCE: Less than or equal to.
DAVID MALAN: It's less than or equal to.
Unfortunately, I don't really remember
having a key on my keyboard that's got,
like, an angled bracket
and then a line under it
like you might write in math class.
So there's a way to do that
nonetheless on your keyboard.
I actually just do them side by side.
This is less than or equal to.
This would be greater than or equal to.
And so now just get comfortable
reading these things left to right.
There's no special symbol like you
would have in a math book or a homework
assignment on paper.
So this, I think, says the right thing.
Do this while n is less than or equal
to 0, which is, of course, not positive.
And then down here, the rest of my
code, I think, can stay the same.
I just have a block of code up
here now that's doing something.
And you know what?
This is where comments
start to get useful.
Prompt user for a positive number.
And now down here, print
out that many bricks.
So it's kind of obvious if you just
read through the code what I just said.
But this helps you if you sort
of sleep on it and wake up,
and you want to remember,
why did I do this?
Why did I do that?
It helps the reader of your code,
a colleague, a teaching fellow,
and so forth.
That's how you kind of start
to add comments to your code.
Unfortunately, there's a bug,
and we're about to hit it.
So let me try.
Let me go ahead and make mario3.
Oh, my god.
More errors than I have
lines of code, it seems.
And this one's weird.
Error-- unused variable n.
And now let me dive in deeper
to these error messages
just so you start to
notice little clues.
So over here on the left
is, of course, the filename,
as you might have
noticed-- mario3.c. Then
there's a colon, and then a number,
and then a colon and another number.
Turns out this is just a very succinct
way of saying that in mario3.c
on line nine at character or
column, left to right, 13,
you've got a problem, at
least the compiler thinks.
So generally, the character
is kind of sort of helpful.
It's really the line number that draws
your attention to the right place.
Somehow, this is buggy.
And specifically, the bug is
that I have an unused variable n.
And then very inexplicably, on
line 11, now I have a use of n.
So it's unused here, but it's used here.
And somehow, the computer
doesn't like this.
Why might this be?
Yeah.
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah,
that's the trick here.
So it's a little different from
Scratch, where when you make a variable,
you can just use it anywhere you want.
In C and some other
languages, variables only
exist in what's called a certain scope.
And a scope you can generally
think of as just the most recently
opened and closed curly braces.
So what does that imply here?
Well, on line nine, I am on the
left-hand side declaring a variable.
Hey, computer, give me
a variable called n.
And it's gonna store an int.
That's the story we keep telling.
But the problem is I am doing that in
between lines 8 and 10, curly braces.
And I claimed today
that that means, kind of
like how Scratch's puzzle pieces hug
other pieces, in C, variables
are treated a little differently.
If you declare a variable in
here, it only exists in here,
and you can't use it
down here in your code.
And so this would kind
of seem to be a catch-22.
I need a variable.
And so I can declare it.
But I can't declare the variable
there if I want to use it later.
It doesn't really seem
to be a good situation.
So just logically, even if
you've never programmed before,
if the fundamental problem is
that this variable exists only
in that scope of the curly braces,
how intuitively could we solve this?
Yeah.
AUDIENCE: Move the curly brace?
DAVID MALAN: Remove the curly--
oh, move the curly braces.
Yeah, we could move the curly braces,
which is essentially the idea.
The catch is the do-while loop
really kind of needs them.
At least in general cases,
you need those curly braces.
But you know what?
It's been a while since I typed them.
But I do have another
pair of curly braces
that are sort of outside of, so
to speak, my inner curly braces.
So I have another scope
here that's essentially
the whole function called main.
So what if I somehow declare
my variable out there?
And indeed, I can.
I can go to, like, line
seven-- or even higher,
but generally you want to
keep it as close to where
you care about it as possible.
I can type int n.
And I don't think I want
to prompt the user here
because then I'm going to create the
same problem as before, where I'm just
prompting him or her once.
I want that prompt in a loop, again
and again and again, potentially.
So that's OK.
We've not seen this before,
but you can declare a variable,
and then do nothing with it yet.
Just say, hey, computer,
give me a variable.
I'll deal with this later,
just like in Scratch.
You declare a variable
if you did, and then you
deal with it later as you want.
Now, this would be a bug still.
I can't say, hey, computer,
give me a variable n.
And then, oh, by the way,
give me another variable n.
So all I have to do to fix this
issue is just don't redeclare it.
Just use it.
So line seven says, hey, computer,
give me a variable called
n that's going to store an int.
Line 10, same story as always
except it's slightly shorter.
On the left-hand side, it
says, here's my variable.
Right-hand side says, here's
a value we got from the user.
Put it from right to left.
And so now because n is declared
or created on line seven,
it exists within the scope of
these outermost curly braces.
And now I can use it
kind of anywhere I want,
including on line 12, which is great,
and, most importantly, on line 15.
So let me go ahead and save this
and do make mario3, hold my breath.
Good, it actually worked. ./mario3.
Positive number.
Nuh-uh.
I'm gonna give you -1.
OK.
I'm gonna give you zero.
All right, fine.
I'll give you 3.
And now it actually cooperates.
And so the do-while
construct is still a loop,
and Scratch doesn't really
have an analog of this.
But the do-while loop is still a loop,
but it does something at least once.
The difference fundamentally,
though, is this--
if I did this, like, while n
is less than or equal to 0,
if I change this to a while loop, which
we saw ever so briefly the other day
as just an analog of the forever
block in Scratch, if I do this,
there's kind of a logical problem.
Here's n being declared on line seven.
So we're avoiding the scope
issue this time from the get-go.
But line eight is saying while
n is less than or equal to zero.
But what is n at this point?
It's not yet defined.
And, in fact, as we'll
soon see in class,
it has some garbage value,
typically, some unknown value,
remnants of whatever the computer used
that RAM or memory for in the past.
So this is literally undefined
behavior, it would seem.
I don't know if the loop's
gonna execute or not
because I don't know what's in n.
So you could hack around
this, so to speak.
Hacking generally means
kind of sort of figuring out
a solution to a problem that
might not be the cleanest.
And OK, let me just initialize
this to, like, -1,000 because I
know that's less than or equal to zero.
So it's a hack in that it fixes
the logical problem because now
on line eight, is -1,000
less than or equal to 0?
It is.
So now my loop will
execute at least once,
and it will then change the value of n.
But what the heck is -1,000 coming from?
These kind of inelegant solutions
would be horrible, horrible design,
even though it logically gets
the job done and it's correct.
Bad, bad design.
And so that's why we started with a
better design, with a do-while loop.
But you'll find there's many
different ways to do things.
And you might not,
certainly, in problem set
one or two do things always the
right or best way the first time.
And that's OK because with
practice and experience,
you'll begin to see
patterns with which you
can solve these same kinds of problems.
Any questions on these
approximations of Mario?
Well, let me do one
last one, one last one
involving Mario, and kind of like this.
I spent way too much time
looking for parts of Mario
that kind of painted these pictures.
And I found this, these additional
bricks underground in the fire level.
And suppose I wanted to print,
like, a cube of hashtags,
so not just a horizontal line,
not just a vertical line,
but kind of sort of both together.
And indeed, you can think of
these bricks as exactly that.
It's like hashtag, hashtag, hashtag,
hashtag, hashtag, hashtag, hashtag,
hashtag, and so forth, kind of
like an old-school typewriter,
printing one line at a time.
And if you even remember
typewriters, you
can actually think of computers and
printf as behaving very similarly.
You can print something,
then do the backslash n.
Print something else,
do the backslash n.
So what is a square
like this on the screen?
Well, it's really just the
process of, like, painting
the screen, if you will, from
left to right, moving down,
left to right, moving down.
And how do we do this?
Well, what was the type
of code we used in order
to do something again
and again and again?
The for loop was the first one.
We could use other constructs, but I'm
going to go ahead and use the for loop
again.
Let me save this as mario4, as
our fourth and final example.
I'm gonna keep this code
up here because I still
want to prompt the user for some number
of blocks, a positive number at that.
And now I don't want to just do this,
but let me just see where I left off.
Let me go ahead and make mario4.
./mario4.
And let's do, like, a 5-by-5 block.
OK, that's not it.
That's just a column.
So I've got to do a little more.
Well, it turns out that
just like in Scratch,
I can take one idea and kind
of nest it inside of another.
Let me go ahead and do this.
How about inside of my for
loop that's going from i to n,
let me do another one
for int-- and I don't
want to reuse i because I feel
like if I use i in two places,
something's going to get messed up.
So I'm gonna go with the
next one alphabetically, j,
which is actually pretty common.
So int j gets 0.
j is less than n.
And j plus plus.
And now in here, I'm
gonna put that brick.
I think I need to get rid of this here.
And let's see now what happens.
So I've got a for loop
inside of a for loop.
If I go ahead and do make mario4,
Enter, code is compilable.
./mario4.
Let's type in 5.
Hmm.
I think that's actually, like,
25, if I really count it out.
That's not what I wanted.
I wanted a square.
So what's obviously
missing aesthetically?
A new line.
But I kind of thought that
doesn't go here, right?
Because if I do this--
just real quick teaser.
If I rerun mario4 after
making that change,
now I've just made the opposite problem.
So what needs to change?
Yeah.
Oh, just scratching?
OK.
What needs to change?
Yeah, over here.
AUDIENCE: Another line with
a printf and a backslash n.
DAVID MALAN: Yeah, and
what line would you propose
the printf with the backslash n?
AUDIENCE: 21?
DAVID MALAN: Sorry?
AUDIENCE: 21.
DAVID MALAN: 21.
So above or below it?
AUDIENCE: Above it.
DAVID MALAN: Above it.
OK, so let me go there.
So let me go ahead and
printf backslash n.
And now let's see.
So make mario4.
./mario4, Enter.
5.
Beautiful, beautiful.
It's not quite a square on the screen
because the hashtags are a little more
vertical than they are wide.
But that's OK.
We've built this sort of
approximation of that level, too.
And now, just for good measure,
let me just think about--
this is kind of an oversimplification,
print out that many bricks.
So print out this many rows
or columns on the outside?
And then in here, where we're going
with this, print out this many--
what should my comment
here be on the top?
Top one is rows?
And then down here,
this should be columns.
And it is because on the
outermost loop, you've got i.
And it's starting at 0, and
it's eventually going to 5.
But whenever i is 0,
at the beginning, it's
like the cursor is in the top left-hand
location by default on the screen.
And then you've got this nested
loop, which says, oh, by the way,
do the following five times.
What are you doing five times?
Hashtag, hashtag,
hashtag, hashtag, hashtag.
Then a new line.
Then i becomes 1.
So it's like moving over--
sorry.
Then i becomes 1, which means
you're now on the second row
because you've printed out one
of those newline characters.
So here, too, this is where comments
would be helpful because, frankly,
even I had to think about that.
And you don't want to
waste time thinking
about code you've already written.
Just give yourself the answer
to why you made past decisions
as in a case like this.
All right.
So suppose something's going wrong.
And, in fact, we already solved the
problem of, like, a lot of hashtags
going this way and a lot
of hashtags going that way.
But suppose you want to wrap your
mind around what your code is actually
doing.
It turns out that we have
two other tools we can use.
It turns out we have in the
CS50 library a function that's
almost identical to printf except we
called it eprintf for, like, error printf
just to help you see what's
going on inside of your code.
And you should use it as follows.
If you kind of want to wrap your
mind more clearly around what
your own code is doing or, for that
matter, even an example for class
that you downloaded, you can
add, certainly, your own lines
of this-- like, "hello there.
I'm at home playing with this code"
or something like that, right?
So something nonsensical,
but at least now
when you see that sentence on
the screen, you know on what line
the computer was executing your program.
So you can be a little
more methodical than this.
And with eprintf, notice
we can do the following.
I'm going to change this to just
eprintf, and it works the same.
And I'm going to go ahead and do this--
"about to prompt user for a number."
I just want to provide an explicit
note to myself, temporarily,
what should be happening here.
And let me see now what happens.
If I do make mario4.
OK. ./mario4.
Ah.
I get a little ugly output,
but it's just diagnostic.
It's temporary.
It says mario4.c on line 10 is
giving the following message--
"about to prompt user for a number."
That's just a note to self so that
I'm comfortable understanding the flow
or the structure of my program.
I can still interact with it.
Let's type in something like -1.
And what should I see next
on the screen if I type -1?
Yeah, another prompt.
"About to prompt user for number."
So it's just like a sanity check.
If you think something's
going to happen, tell yourself
that it should in your code, and make
sure you see what you expect to see.
And then once you're
sure your code is good,
then don't submit it with
this because this is not
correct per the specification.
You can just get rid
of it at that point.
But frankly, that gets
tedious very quickly.
Oh, and how do I kill my program
if I don't want to keep playing?
Control-C will terminate the program.
There's one other tool,
perhaps the most powerful.
And I can't stress this enough.
Get into the habit of using
this as needed early on.
Even if it takes you an extra 10
minutes, half hour to play with it,
it will save you, potentially, hours
over the course of the semester.
And that is a program called a debugger.
So a debugger is a
program that helps you
remove bugs or mistakes from a program.
And it works like this.
I'm going to go ahead
and recompile mario4.
And now I would normally run
it, of course, with ./mario4.
But suppose I have a bug, and I really
want to understand what's going on.
I'm going to do the following.
You'll notice that all
of my examples thus far
have line numbers in the so-called
gutter of the program, left-hand side.
And it turns out you can actually click
to the left of those numbers at, like,
this point here.
And you can put a red dot.
This shall be known as
what's called a breakpoint.
This is like a little stop sign,
only for yourself, that says,
hey, computer, pause my program
here, or really stop my program here,
like a stop sign, temporarily.
And let me, the human,
go at human speed,
not, like, billions of
things per second speed.
And by this, I mean the following.
I'm going to now run not mario4
but debug50 space mario4, which,
again, is a program we
wrote that invokes or starts
the IDE's built-in debugger.
So notice magically this
right-hand panel just popped out.
And it's actually always been there.
It's always said "Debugger," and it
just happened to open that window for me
automatically.
And let's see what's going on.
There's a lot of words, but we're
familiar with many of them already.
Notice that down here is
the word local variables.
And then there's kind of a table here.
And it's not very big because
I only have one local variable.
And at this point in the
story, my variable n happens--
I got lucky.
It has a default value,
it would seem, of 0.
I shouldn't rely on that.
But it's just so early in my
program that it seems safe--
well rather, it's so early
in my program that it
happened to have the value 0
in it for our purposes today.
And it's of type int.
But what's cool now is the following.
Now notice that my program is
effectively paused on line seven,
or, specifically, line 10, which
is the first interesting line.
That's why it's highlighted in yellow.
And what's cool here is this.
Up here in the top right, you have
a play button which will just say,
play the rest of my program.
Just let it go through without pausing.
Or, if I hover over
this thing, you can step
over this line, which means,
hey, computer, execute this line,
but at my human pace, just
one line of code at a time.
If you're really curious, you
can step into that line of code,
but more on that in just a moment.
Meanwhile, this is step out, which
is if we've actually dived in deeper.
So what do I mean by all of this?
So I'm currently paused
on line 10, which
was the first interesting line
of code in my program, so far
as the debugger is concerned.
I'm going to go ahead
at top right, and I'm
going to go ahead and click Step Over.
And notice my terminal window is
now prompting me for a number.
Why?
Well because I've stepped over the
get_int line, which means execute it.
So let me go ahead and
type in that number.
Let me go ahead and type in -50, Enter.
And keep an eye on the variable
on the right-hand side.
Notice now in the debugger, even without
printing it with printf or eprintf,
I can see that n has a value of -50.
It's just a sanity check, so to speak.
I can see what it is to be sure it's
consistent with my expectations.
All right.
That's not right, so let
me go ahead and step over.
And notice the yellow line
moved because it's looping.
You can literally see what
I keep doing with my hand.
Let me do it again.
OK, positive number.
I'm going to cooperate this time.
42, Enter.
Notice at the right-hand side,
the value n is indeed 42.
And notice the yellow
line, if I keep stepping,
is about to jump to the next
interesting line of code.
And if I keep doing
this, keep doing this,
watch what's about to happen in the
blue terminal window at the bottom.
There's the first hashtag.
There's the second hashtag.
So the sort of fake animation I did
the other day with just my slides,
and what I try to do verbally and
with my hand going back and forth,
you can now see much more methodically.
So even if it's a simple program,
and even if it's code you wrote,
you can really see step by step
what it is your program's doing.
And maybe it's not
doing what you expect.
And if it's not, you'll see it visually.
All right.
Now I'm just gonna go ahead and say,
OK, print the rest of the thing.
So I hit Play.
You see that the GDB, the GNU
Debugger, server is exiting.
It's just quitting.
And now I'm back at my prompt,
and the debugger goes away.
So do not undervalue
those particular tools.
So before we forge ahead, I thought
I'd introduce Abhishek here,
who you might have seen on the
internet just a couple months ago.
He kind of went viral.
He's a recent grad from NYU.
And he did this extraordinary thing.
He took a device called the Microsoft
Hololens, which is an augmented reality
device that puts sort of a goofy
looking screen in front of your eyes.
But then it projects images
in front of your eyes.
And it's really cool in that much
like an Android phone or an iPhone
these days, it knows where you
are in a three-dimensional space.
And what Abhishek actually did was he
went to a very three-dimensional space,
Central Park in Manhattan.
And he had before that spent
days recreating "Super Mario
Bros." in augmented reality by
recreating one of those maps to which I
alluded earlier.
And the end result-- and I'll
show you just a glimpse of it,
and we'll put it on the course's
website for you to see later in detail--
was this, which was pretty mind-blowing
and a wonderful application of computer
science to the real world, literally.
[VIDEO PLAYBACK]
- Hi.
I'm Abhishek.
And I recreated the iconic first
level of "Super Mario Bros."
as a first-person, life-size,
augmented-reality game
that I'm now going to play as Mario.
[MUSIC, "SUPER MARIO BROS.
THEME"]
[END PLAYBACK]
DAVID MALAN: Abhishek gave a tech
talk in CS50 a couple of months ago.
And the funniest part, if you really
look closely-- and it is Manhattan--
is some people look at him.
But a lot of New Yorkers don't
even look twice at what he's doing.
Let's go ahead here and
take a five-minute break.
And when we come back, we'll begin
to look at the world of cryptography.
So we are back.
And, of course, there are more
functions than just printf.
And we've seen a glimpse of
these by way of the CS50 library.
And there's many, many, many,
many more that come with C itself
and that other people around the
world have written over the years.
But implied in each of
these CS50 functions,
notice, are these key words
like string and int and float--
which we talked about
the other day, too--
long, and long long, and double,
as we saw the other day, too.
So it turns out that C, to be clear,
has what are called data types.
And we glimpsed this the other day.
Data types specify what type of data
you can put inside of a variable.
And that's what's different
from Scratch, too.
In C and a few other
languages, too, you have
to decide in advance as the
programmer what kind of data
are you going to put in this
variable so that the computer--
or, really, the compiler-- knows.
And so the compiler knows
how to deal with it for you.
Well, it turns out that if you
want to print these things out,
printf also comes with
certain format codes.
And we've seen %s for
strings and %i for integers.
And there's a bunch of others, too.
Perhaps the most common would be
these, just so that you've seen them--
%f for float.
We saw that the other day.
%lld for a long long decimal number.
That's one I often
have to look up myself.
And then there's even
more of those, too.
So just realize that as you're
getting input from the users,
whether for problem sets
or any other purposes,
realize that sometimes you have to
check the manual or the documentation,
so to speak, for functions
that you're using.
And so that you know where to
turn for those kinds of things,
let me just introduce
one thing real quick.
And you'll see more of this in super
sections and sections and beyond.
If you forget, for instance,
how certain functions work,
you can actually type the
following-- "man get_string,"
where man stands for manual.
And this is kind of an old-school
command on Unix and Linux computers
that have this text-based
keyboard environment.
And you'll see pretty much a
standard, structured user's manual
for the function in
which you're interested.
So if you forget what
we talked about in class
or you're not really sure
how else you can use it,
and the function is
something like get_string,
you can simply read about it here.
But sometimes, frankly, it's
going to look a little arcane.
I mean, we have not talked about what
some of these symbols mean-- the ...,
the word const, the asterisk that
I've highlighted on the screen.
So frankly, sometimes you will find
the man pages, as they're called--
the manual pages--
just confusing unto themselves,
which is a nasty situation to be in.
If you're already confused, and
the documentation's not helping,
you kind of need a third option.
And so if you go to
CS50's website, you'll
actually find that
there's a link to a tool
that the staff has created over
the years called CS50 Reference.
This is a more user-friendly
version of those same man
pages, where we've gone
through and sort of translated
the very arcane English into less
comfortable English, if you will.
So if over here I scroll
down to, say, printf--
or, rather, let me just search for it--
I can see printf here.
It's inside of this header
file, this h file on the system.
And now I can actually
read about it here.
And notice at top right, checked
is the Less Comfortable box,
which means, hey, show me the language
the TFs came up with as opposed
to the default language.
But it, too, is meant
to be a training wheel.
So if and when you're
ready to sort of take away
some of those simplifications,
you can uncheck that box
and now see the much more
verbose technical version
that you would actually
see in the real world.
So keep in mind those kinds
of things, too, especially
if it feels like we go through
things quickly in class,
which we do, and you need to lean on
something authoritative thereafter.
But let's tease apart
what actually a string is.
Let me go ahead and start
actually, with Stelios here.
So Stelios, one of our head TAs
in New Haven, has this name here.
And I've written it as
a string, S-T-E-L-I-O-S.
But I've kind of drawn
boxes, deliberately,
around his name to capture the fact
that this thing we call string,
like "Stelios," is actually
not really a string only.
It's really like an
abstraction for something
a little lower-level, which is
a character after a character
after a character, and so forth.
And so here, too, we see an
example of an abstraction.
It's not that much fun to call Stelios
S-T-E-L-I-O-S. We call him Stelios.
But we, in languages like C, would
call that construct a string or, more
technically, a sequence of characters.
But it's a string.
It's a nice abstraction.
It's a nice simplification.
But it turns out there's
an opportunity here
now to see how characters and
numbers interrelate in a computer
and see how powerful computer programs
and software are that we ourselves
can write.
But first, how do we access
individual characters in a name?
I can easily get Stelios's name using
the function get_string, as we've seen,
just like Sam did from the
audience the other day.
But how do I actually get at,
like, the S or the T or the E?
Or if maybe he makes a typo or maybe
he, like, doesn't type it very neatly,
how do I capitalize his name or
sort of clean up his user input
like websites today very commonly do?
Well, let me go ahead and
open up CS50 IDE again
and just do a pretty simple example
that this time involves strings.
Let me go ahead and create a new file.
And I'm going to call
this file string0.c.
And I'm going to go ahead now
and write a short program--
come on-- once I've lost
control over my terminal window.
Now I've lost control of my menu.
This is my own fault for--
oh, here we go.
Well, this is gonna look great.
Very inspiring here.
Where'd it go?
Oh, oh.
Here.
OK.
That's an example of bad
design, so we will fix that.
And now I see that I've
misspelled string as strig.
So we're just gonna--
no one on the internet will ever
know the following happened.
OK, so string0-- voila.
Here we go.
All right.
So string0.c, and I'm gonna whip up a
really quick program here as follows.
So int main void.
And now string s gets get_string.
And I'm just gonna ask for
the user's input in this way.
And now I'm going to go
ahead and print out--
how about just say the word output here.
And just to be nice and tidy,
let me put a couple of spaces
here in anticipation.
And now let me go ahead
and do this-- on line five,
my intention is to get,
like, Stelios's name from him
or whoever is playing this game.
But now I want to go ahead
and not just print out, like,
hello Stelios, and plug in his
value s, which we've been doing.
I want to do this one character at a time.
And doing something one at a
time kind of suggests a loop.
And indeed, I can do that.
So I'm going to do for int i
gets 0, i is less than however
long his name is, and then i plus plus.
And now I can introduce
one other trick that you
can kind of glimpse ever so quickly
from the screen I had up before.
It turns out that %c is the
placeholder for a character.
Perhaps no surprise.
But the catch is I only have access
to s, the whole thing, the string s.
But it turns out there's a
new piece of syntax here.
And as is kind of sort of
implied by our having used boxes
to flank Stelios's
letters of his name there,
turns out that the equivalent in C
is to kind of sort of do the same,
use a box of characters, by using the
square brackets, which you might not
often use on your keyboard.
On a US keyboard, they're
often just above the Enter key.
And here I can go
ahead and type in s[i].
And so to speak, this is going to
print the i'th character, if you will,
of Stelios's name.
So i is going to start at 0.
And I keep doing plus
plus, plus plus, plus plus.
And using the square bracket
notation, so to speak,
I can dive into the individual
letters in his name in this case.
So when I run this, what's
going to be the net effect?
Let me go ahead and make string0.
Huh.
OK, that is not valid C code,
however long his name is.
So I have a problem to solve here.
How do I actually get
the length of his name?
Well, I can kind of cheat.
OK, so one, two, three,
four, five, six, seven.
All right.
So we can just write
this program as follows--
7.
But this should rub you the wrong way.
Why is this not a good
solution to the problem?
Yeah?
AUDIENCE: Because it's not changeable.
DAVID MALAN: Exactly.
It's not changeable.
I have this dynamism of
get_string to get Stelios's name.
But seven is not going to be true of all
the humans who might use this program.
I need something dynamic.
Well, it turns out there
is a function for that.
I can call strlen, for string
length, pass in as input
the variable whose length
I want to get, and that
will return to me a number, which will
be, in this case, it would seem, 7.
But it's going to be dynamic.
So if I type in, like, David,
that should return 5, hopefully,
and any number for any number of
other humans engaging in this.
So let me go ahead now and try again.
Make string0.
A lot of errors.
And "use of undeclared
identifier string."
Wait a minute.
We've seen this before.
How did I solve this last time?
Yeah?
What's up above missing?
AUDIENCE: The libraries.
DAVID MALAN: Yeah, the
libraries or the header
files, so to speak, for the libraries.
So I need to include, I'm pretty
sure, at least stdio.h for printf.
I need to include cs50.h for get_string.
And we're almost there.
Let me see if that's enough.
Make string0.
Oh, implicitly declaring library
function strlen with type--
I don't really know what that is.
But there's kind of an
answer hinted there--
include the header
string.h and so forth.
So turns out this is true.
And there's different ways to know this.
If I actually go back to
reference.cs50.net and do strlen,
there's that function.
Let me go back to the less
comfortable-- whoops--
to the less comfortable version.
Notice that under synopsis of a
man page or reference.cs50.net
is always a quick summary
of how you use it.
So just the prototype
of the function that
gives you a sense of what it is-- size_t
is essentially equivalent to an int,
just saying the size of
something as a number.
But include string.h is
the ingredient I wanted.
So let me go ahead and copy that.
Let me go back to the IDE.
I'm gonna be a little
nit-picky, and I'm just
gonna keep things
alphabetical at the top.
But that's not strictly necessary.
It just makes it easier to skim
later on when the list gets long.
Make string0.
Seems good to go. ./string0.
Inputs.
Now I'm going to go ahead
and type in Stelios's name.
And I got his output, as well.
Now, that was a lot of unnecessary
work to print his name.
I could have just used %s.
But now I can make modifications.
What if I wanted to
print it one per line?
I can add that.
I can make the program again,
rerun it, and type in his name,
and now I get it one per line.
It's a little ugly.
Like, now it says output s.
But that's just an aesthetic bug.
I could go in and fix that.
But now I have control over
the individual characters
in his actual name.
So that would seem to be
progress in some form.
But if I now have access
to the individual letters,
we can kind of come full circle
from the very first lecture where
we talked about zeros and
ones, and then numbers,
and then letters, and now, in turn,
words, otherwise known as strings,
by way of a topic called typecasting.
Types, of course, are the types of
variables we've been talking about.
Casting means to convert
from one to the other.
And you might recall
from the first lecture
that capital A was the number
we know in decimal as 65
and whatever pattern of
zeros and ones that is.
Capital B is 66, and so forth.
So can I see that now
for the first time?
Well, it turns out I can.
Let me go back to the IDE.
And let me go ahead and create
a new program called ascii0.c.
And ASCII, again, is just the standard.
It's an acronym, American Standard
Code for Information Interchange,
which maps letters to numbers
and numbers to letters.
So let me go ahead now and
whip up a quick program.
Include stdio.h for printf.
Int main void.
And then let me go here
and do the following.
You know what?
I'm gonna just go ahead and print out,
let's say, string s gets get_string.
Let's just ask for someone's name.
And then let me go ahead and do
the following-- for int i gets 0,
i is less than the
length of that string--
learning from last time--
i plus plus.
So this is gonna iterate
over the whole string.
And now what I want to do is this.
Let me go ahead and
print out the following.
Let me print out the character
itself, and then a space,
and then how about an
integer, and then a new line.
And we'll see what this
does in just a moment.
I want to plug in values
for these placeholders.
So how do I get at the first character
of the name if the string is called s?
Yeah, so s[i] for the i'th character.
And that's gonna plug in, literally,
S-T-E-L if Stelios is the one playing
the game.
But now I put a comma to
plug in a second placeholder.
And %i-- you know what I'm gonna do?
I'm going to do int in
parentheses s[i] semi-colon.
So it looks a little cryptic.
But let me just remove
this for a moment.
This is just the same thing twice--
print the i'th character of the
name, i'th character of the name.
But in parentheses, I'm doing
what's called typecasting.
I'm taking whatever that is,
which is a char or character.
And I'm saying, parenthetically,
make this an int instead.
So if it's capital A, it becomes 65.
Capital B, it becomes 66, and so forth.
And if I now compile this
program after preemptively
fixing what would have been a
mistake by adding the header--
make ascii0.c. Whoops, sorry.
Oh, common mistake.
Nothing to be done.
I'm pretty sure there's
something to be done.
I need to compile it.
What did I do wrong?
Yeah, don't put the .c.
It's a little counterintuitive, but
when you want to make a program,
you type the name of the program,
not the name of the file.
Now, in-- oh, damn it.
I almost learned from my mistakes.
What am I missing now?
AUDIENCE: String.h.
DAVID MALAN: String.h.
All right.
Include string.h.
Save it.
OK, so let me make ascii0.
Good. ./ascii0.
And now, Stelios, Enter.
And now we see the ASCII
codes or the numbers that
correspond to the letters in his name.
They're pretty big numbers.
They're in the 100s now.
And that's because they're lowercase.
We've previously talked only about
capital A, capital B, and so forth.
But it turns out that the
lowercase letters also
have values associated with them,
like some of those here, as well.
And now it turns out now
that I know this, now I
can kind of do some low-level stuff that
we all take for granted on our phones
and websites like when you just
type in your name in all lowercase,
and the website just fixes it, or
if you type in your phone number
with parentheses, without
parentheses, with dashes, without,
the website just kind of fixes it and
cleans it up into some cleaner format.
We now have kind of the
low-level control to do this.
I won't type this one out manually
just because it's a little longer.
Let me go ahead and open it up.
And among today's examples in source2,
which is on the course's website,
is this example here--
capitalize0.
So let me make a little
more room for this.
It's a little longer.
But let's just focus on just
a couple lines at a time.
Here's the beginning of my program main.
Here is a line of code
where I call get_string with a before prompt.
I just say, give me the before string.
And then I claim, now print the string
after making some changes to it.
So what am I gonna do?
On line 11, I seem to have used
the same ideas a moment ago,
but with one change, actually.
I've done something a little different.
Line 11 is very similar to what
I've been doing to iterate over
the characters in a string.
But I did something
different, which is what?
What looks different now versus what
was on the screen a little bit ago?
Anyone a little farther back?
Yeah.
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah, it looks like
I'm declaring two variables all
in the same breath, so to speak.
I have my int i equals zero, and
we've done this a bunch of times.
But then I have a comma here
for the very first time.
n equals strlen of s.
But if you think about what
these building blocks are--
OK, the comma is new, but n is on the
left, so that's a variable, apparently.
It's probably an int because
the word int came before.
Equal sign is assignment
from right to left.
strlen is a function that
returns the length of a string.
So this would seem to be storing,
just to be clear, what number in n?
Yeah, the number of
characters in whatever
the string is the user typed
in, "Stelios" or "David"
or whatever the name is.
So then I have my condition.
Then there's the semi-colon,
which we have seen.
And then there's plus plus.
So I claim that this is,
in a sense, better design.
It's a little more complicated.
Like, I typed out more characters.
I added another variable.
But why might it be
smart and good design
to have used an extra variable,
using a little more space to keep
a number around, so as to then
simply compare i against n?
Why did I jump through these hoops?
What do you think?
AUDIENCE: Because then it
doesn't have to check what n
is each time it goes through the loop.
DAVID MALAN: Exactly.
AUDIENCE: [INAUDIBLE]
DAVID MALAN: It doesn't have to
check what the length of the string
is on every iteration because after
all, once I or Stelios or whoever types
in their name, it's not going to change.
It is D-A-V-I-D or Stelios's name or
Maria's name or whoever is playing
the game.
And so why would you keep
calling a function saying, oh,
by the way, what's the length?
By the way, what's the length?
What's the length?
Just remember it the first time because
odds are it takes a little bit of time
to do that computation and
actually figure out the length.
And so here, we've simply
kept that answer around in n
and can compare two variables.
Meanwhile, here's a pretty
big if-else construct.
But if we break it down into pieces,
it's doing something relatively simple.
On line 13, I am asking the question, if
the i'th character of s is greater than
or equal to a lowercase A, &&,
which we haven't seen before.
This means logically and.
So if it's greater than or equal
to a and less than or equal to z--
put another way, if it's
between a and z inclusive in
lowercase-- what am I doing on line
15, which is super weird-looking?
I'm first printing out %c,
which is my placeholder.
But then I'm printing out the result of
s[i] minus whatever lowercase A minus
capital A is?
I mean, this is just strange now.
But let me just point out one clue.
Turns out there's a pattern here.
And humans did this deliberately.
If you can do the arithmetic
quickly, how far apart
is lowercase A from capital A?
It's 32, right?
And you could just do the math.
97 minus 65-- oh, 32.
How about capital B versus lowercase B?
It's 32.
32.
It follows this pattern.
So this is to say-- and it's
sort of proof by example.
We're not even seeing all
the way to Z, but trust me.
32 is invariant across all of
the letters of the alphabet.
They're always 32 away.
And I could hard-code 32, but
that feels a little inelegant.
Why don't I instead
just arithmetically say
whatever the difference is
between lowercase A and capital A,
and that's all I'm saying
in parentheses here.
Whatever that numeric difference
is in the computer's representation
of my numbers, just subtract that
difference from the i'th character.
Now, what's nice is I kind of
sort of should do this first.
Like, I should cast the
character to an int.
But I don't need to be so explicit.
The computer knows that
characters are integers.
And the computer knows that
integers are characters.
There's this equivalence.
I don't need to be so
verbose as to even say that.
It just suffices to let
the computer figure it out
implicitly that in this context,
I'm doing arithmetic on numbers,
and then, in the context of printf, I'm
displaying that number as characters.
Nothing is happening.
You're just telling the
compiler what context in which
to treat these values,
numeric or characters.
So long story short, what effect
do these four lines of code
have on the characters
in the user's input?
What does it do?
AUDIENCE: [INAUDIBLE]
DAVID MALAN: It capitalizes it.
Right?
It capitalizes it.
And that might have been implied, too,
by the file name, in full disclosure.
But it's how you think about solving
the problem of capitalization.
Here's the string.
Home in on the individual characters.
Figure out if they're within
a range you want to deal with.
And if so, do some kind of mutation
of them to change from one value
to another.
I could have done this a horrible way.
I could have had, like, an
if-else-if-else-if-else--
I could have, like, 26 conditions
checking is the character A?
Is it B?
Is it C?
And if so, make it capital
A, capital B, capital C.
But that code would have
been like this big or bigger.
This is now a more algorithmic
way of solving the same problem.
And if it's not a lowercase letter
between A and Z, just print it out.
There's no work to be done.
Now, just so you've seen it, it doesn't
have to even be as verbose as this.
In capitalize1.c, which is available
also on the course's website,
I've made my code a
little better designed.
I'm now not reinventing as many wheels.
I'm standing on the shoulders
of smart programmers before me.
And I've clearly changed
at least one thing.
Instead of doing this manual process
of comparing against lowercase A
and lowercase Z, I'm
just punting and using
a function which, beautifully,
is called islower, which
just literally answers that question.
Because another data
type in C is not just
int and char and float and
string in CS50's library,
but there's also something
called a Boolean.
A Boolean, also named after
Boole, is similar in spirit
to a Boolean expression, true or false.
But a Boolean variable is literally
just the idea true or false.
And so islower you can think of
as returning a Boolean value.
It returns true or false, yes or no.
And the name of this function,
therefore, is very appropriate.
Is lower?
That's a yes-no or a
true-false question.
I don't know how it's implemented.
If I really care, I could
go to CS50 Reference,
or I could use the man
command on the IDE.
And I could actually check
how this thing works.
But I do need one takeaway.
To use this, I need to
use the ctype library.
So there's other libraries that we're
now just scratching the surface of.
And you would only know they exist
by reading documentation like that.
But you know what?
I can go even further.
You know what?
If some human years ago wrote a
function to check if something is lower,
what did he or she probably
do, as well, for me?
AUDIENCE: Uppercase?
DAVID MALAN: Yeah,
isupper also does exist.
Yeah.
So spoiler here.
So isupper exists.
But if they checked if it's lower or is
it upper, gonna just go out on a limb.
toupper?
Yeah.
So it turns out there's
a function called toupper
that converts a letter to uppercase.
And indeed, I can now leverage this
in my third version of this program
as follows.
capitalize2.c gets even better
designed still, if you will.
It's even shorter, fewer
lines of code, easier to read,
fewer opportunities for bugs.
How do I solve it now?
I still iterate over
each of the characters,
but I just blindly call toupper,
toupper, toupper on every character
because I read the documentation.
If you pass a character to
toupper that is already uppercase,
it just returns it.
Doesn't change it.
If you pass in a punctuation
symbol, it just passes it through.
But if you pass in a lowercase
letter, it capitalizes it for you.
And so I can now kind of implement--
I can lean on whoever
implemented that before me.
It could have been me.
I could have written my own
function called toupper.
But I don't need to because
in the world of programming,
there exists libraries of code that
other people have written for us
that we can leverage.
Any questions, then, on that?
Yeah.
AUDIENCE: So this method, you
wouldn't be able to [INAUDIBLE].
DAVID MALAN: This would
be all of them, yeah.
So if I only wanted to
capitalize Stelios's first--
the first letter of his name, I
probably wouldn't want the loop.
I would probably just want to capitalize
[0], specifically, of the letters.
But I'd want to make sure that his name
is at least one character long, lest he
just have hit, like, Enter
accidentally or maliciously.
Absolutely.
So let's just dive in to one
other detail here as follows.
Suppose that I want to actually
know what the length of a string is.
I know that there exists
this function called strlen.
But it turns out I can figure out
lengths of strings for myself, too.
Let me go ahead and write a
program called strlen itself.
But I'm not allowed in this
example to use string length.
I'm going to go ahead and
include the CS50 library.
Let me include stdio.h.
Let me go ahead and do int main void.
And now let me go ahead
here and do string--
bad style-- string s gets get_string.
Name.
And now let me go ahead
and do int n equals 0.
Just give me a variable, call
it n, set it equal to zero.
And then let me go ahead and while
I'm not at the end of the string--
also not valid code--
n plus plus.
I can use that plus plus trick that
we've seen before for i plus plus.
And then I'm going to go ahead and print
out whatever the value of that counter
is because I want in
my loop to just count
the number of characters
in Stelios's name
or whoever's name
actually ran the program.
And just to be clear, this is
what's called syntactic sugar, which
is a very sexy way of just saying this
is shorthand notation for doing this,
which is just more boring-looking.
This does the exact same thing.
It's just a more
succinct way of doing it.
And you'll see little features
of languages like this just
to save us humans keystrokes.
This, of course, is not
a solution to a problem.
How do I know I'm at
the end of the string?
Well, it turns out we need to break
the abstraction layer, so to speak,
of strings just a little bit.
So it turns out that in your computer,
we have this piece of hardware--
RAM.
And we saw this the other day.
And we talked a little bit about
the limitations of computers
and the finite amount of
memory that they have.
And if you think about all
of the chips on this device--
doesn't matter for today how this works.
But just know that there's
lots and lots of bytes that can
be stored in your computer's memory.
And you might have 1 billion bytes, 1
gigabyte, 2 billion bytes, 2 gigabytes.
But for our purposes today, just think
of this RAM inside of your computer
as just a long list of available
bytes-- lots of bits, zeroes and ones,
that you can change the values of.
And maybe it's kind of a
grid, so there's lots of bytes
horizontally, lots of bytes vertically.
We can kind of number them all
so that one of the bytes is 0,
and the other one way at the bottom
is, like, the 2 billionth byte.
So just assume that we can number all
of the bytes in our computer's memory.
Well, it turns out that when you
type in Stelios's name, it of course
ends with an S. But it would probably be
a stupid decision to just look for an S
when figuring out the
length of someone's name
because it's not gonna work on my name.
It's not gonna work on Maria's name or
any number of other people in the room.
So we don't know enough
yet about what's going
on inside of the computer's memory.
It turns out that if you think of
this grid now as your computer's RAM,
maybe top-left corner is byte zero.
The one next to it is byte one,
then byte two, then dot, dot, dot, to
byte two billion.
So I'm just arbitrarily depicting
it as a two-dimensional grid.
Turns out we need to know that
there's this special character.
What C does for us even
without our telling it to do,
it always puts a secret number at the
end of any string the human types in.
It's specifically
represented as backslash 0.
But that's just the special
way, like backslash n
is special, of saying that is
eight zero bits all together.
It's a special value, 0.
And so now that we have this so-called
sentinel value, if you will--
sentinel value means
this is just special.
The human can't really type this.
Like, I can't actually type all
zero bits easily on my keyboard
because honestly, even if you hit
the number zero, that is technically
the character 0 because it turns
out even numbers on your keyboard
map to different integers.
But more on that another time.
So 00000000 as bits are what that is.
And so if I write a program that
calls get_string multiple times,
and Stelios is the first
one to type in his name,
it might end up in
memory looking like this.
But then suppose one other person
types in their name, like Maria.
Her name is just going to fit
in the next available memory,
but also be null
terminated, so to speak.
The sentinel value is also called
null, N-U-L. But that's just all zeros.
And then if someone else
types in his or her name,
it's still going to fit in there.
So Zamyla, for instance--
it wraps around, but again, this
is an arbitrary artist's rendition
of my computer's memory.
Z-A-M-Y-L-A, backslash 0.
And I can keep typing in names
until I'm just out of memory.
At that point, the
program's going to crash,
or I'm gonna have an if condition
that says too many things in memory.
Something's gonna have
to stop at that point.
So what this means
for my implementation,
ultimately, is the following.
I can now go ahead here and change
this silly English to the following.
While the n'th character of
the string does not equal,
quote, unquote, backslash 0.
And I'm using single quotes this
time because recall from last time
that we use single quotes anytime
we talk about single characters.
We use double quotes any time
we're talking about strings.
And even though s is a string, s[n] is
the n'th or i'th-- doesn't matter what
letter we use-- the n'th character.
So that's a character.
And so we now need to use single quotes.
So this is really just doing the
following-- it's initializing n to 0.
And then it's looking in memory.
And it's saying, is this backslash 0?
If not, increment n by 1.
Is this backslash 0?
No.
No.
No.
No.
Damn it.
No.
No.
Yes.
And at that point, I have 7 fingers
up, or n is storing the value 7.
That's what my program
is going to print out.
So now we have a complete
program that counts
the number of characters in a string.
I don't need this program because
strlen exists as a function.
But it's now a capability
to which I have access.
Any questions, then,
on what a string really
is underneath the hood as
this sequence of characters
with a special null
character at the end?
Yeah.
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Ah, good question.
What about other data types, if I
can rephrase it, like ints and floats
and so forth?
Actually, strings are special.
If I scroll back to the list
of data types that C has,
for instance, most of
these are of fixed length.
And this is why the
compiler needs to know
what you're putting in them because
the compiler and the computer in turn
need to know is it one byte?
Is it two bytes?
Is it four?
Is it eight?
How many bytes should I look at?
Strings have no predetermined
length because, of course,
we don't know who's going
to type in their name.
But an int, turns out,
in most systems is always
going to be 32 bits or maybe
64 bits, or, equivalently,
four bytes or eight bytes because
there's a one to eight ratio.
A bool is often one byte.
It's a little wasteful.
Even though technically
you need one bit,
it's just easier to deal
with eight-bit increments.
Chars are, by definition,
eight bits or one byte.
So almost all of the data
types are a fixed length.
So you don't need to have
a special null character.
But strings you do.
Strings are special.
Other questions?
All right.
So what can we start to do with this?
Well, it turns out that this idea
of thinking about things that are
back-to-back-to-back-to-back as being
individually accessible is actually
a very powerful idea.
Because up until now, we've just
had this list of data types--
bool and float and char and int.
It's kind of a short list
of very primitive things.
But it turns out if you want to
write a program that doesn't just
keep asking for one name but asks for
two people's names or 10 people's names
or asks for, as you asked earlier,
the name or maybe their house
or their dorm or their phone
number or their email address--
a whole bunch of different values--
it would be nice to kind of
store multiple things together.
And one way you can store multiple
strings is you could call one string s.
You can call the next string t.
You could call the next string
whatever-- you could just come up
with arbitrary names for your strings.
But that's going to
very quickly devolve.
Imagine, like, what the
registrar uses here or at Yale
to actually keep track of students.
They don't have a computer program with
thousands of variables inside of it.
They probably have a
computer program for dealing
with course registrations with at
least one variable called students.
And inside of that students
variable can the registrar
fit one student, 10 students,
thousands of students.
It can kind of grow to fill the number
of values we actually care about.
And C isn't quite as powerful as that.
We'll need another language like Python
or JavaScript to really get dynamism.
But for now, we do have the ability
in C to represent multiple things
back to back to back to
back to back in memory.
So not just characters in strings.
We can borrow that idea from strings
and store, if we really want,
student, student, student,
student like multiple strings
back to back instead of
just individual characters.
And what that idea is
called is an array.
An array is a contiguous
chunk of memory,
something back to back to
back-- literally physically
next to each other, typically, in the
RAM that we've presented as hardware.
But it's not just character,
character, character.
Maybe it's int, int, int, int, int
or string, string, string, string,
string or, more generally, student,
student, student, student, student--
multiple things back to back to back.
And so now we can actually
give you a glimpse
of what this thing here is that
we keep typing sort of on faith.
Int main void literally
says that your main programs
that you're about to start
writing for pset one and beyond
will be returning an int, even
if you don't do it yourself.
They're going to return by
default 0, it turns out.
And we'll see before long why
this is useful for a main function
to return a value, even though we humans
will rarely, if ever, see that value.
But it is interesting
to note that main can
take input, and not input in the sense
of get_int and get_string and so forth.
You can actually provide
your program with input
at the so-called command line.
All this time, I've been
typing ./mario0, ./mario1,
and no words after that.
And yet we've shown you clang
already, the compiler itself,
which can take in, like,
-o and then -lcs50,
all of these additional key words
that somehow influence its behavior.
So wouldn't it be nice if I
could write a program where
I don't prompt the user
eventually for his or her name.
Let me just let them type
their name at the command line
and hit Enter once and be
done with it, just like clang
is just one long command,
and you're done with it.
There's no prompts.
Well, we can do this if
we change void to this.
And it's a mouthful, but there
is an alternative version of main
that does not just take zero arguments.
That's what the key word
void all this time has meant.
It just means main takes
no input by default.
You have to prompt the user explicitly
with get_int or get_string or whatever.
But there's an alternative
second version of main in C
that takes two inputs.
And you don't have to
provide them explicitly.
We'll see how to use this in a second.
Main can also be handed two inputs.
One is an int, and one
is an array of strings.
The int is the total number
of words that the human
has typed at their keyboard.
The argv, argument
vector, by convention,
though we could call it anything
we want, that is an array of words
that the user typed at the
prompt before hitting Enter.
And so this is useful
in the following way.
I'm going to go ahead and in today's
source code open up an example called
argv-- for argument vector--
0 as follows.
In argv0, there's not
all that much going on.
And if you at least kind of
take on faith the concept here,
you can perhaps infer what's going on.
So I've changed what main looks like
on line six, the signature of main,
so to speak.
And then I'm asking a question.
If argc equals equals 2, then
print out "Hello, something."
Otherwise, just print out
the hardcoded "hello, world."
So it looks like argv[1] is kind of
being treated like we were treating
strings a moment ago.
But this is the special
syntax that's new.
If you use square brackets
like this, like I've done,
with no numbers inside, that's like
telling the computer, hey, computer,
this variable argv is going to be
an array of some length of strings.
Why strings?
Because string is the word immediately
to the left-- string argv[].
Now, I don't know how the
strings are gonna get in there.
The computer's gonna do that for me.
But it gives me this capability.
Let me go ahead and compile this
program as follows-- make argv0.
./argv0.
Hello, world.
Uninteresting.
But if I now type in my name at the
prompt and hit Enter, now it's dynamic.
So what must this mean?
Even if the syntax is a
little new, we can kind of
infer now what this must be doing.
Argc happens to stand
for argument count.
So argc equaling two apparently
implies that the human typed two words
at the prompt-- the name of the program,
and then whatever else he or she typed.
Meanwhile, argv-- argument vector--
is the variable that you can use to go
get the first word or the second word
or, if there are more, the
third and fourth words.
In fact, if I kind of change this
manually, what should probably be,
by that logic, in argv[0]?
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah, the
name of the program, right?
So let me see.
So make argv0.
./argv0 David.
Hello-- OK.
I mean, it's stupid-looking,
but that's all I'm doing.
I could be a little bold and say what
is in the 100th location of this array
or list, as you can also think of it?
Make argv0.
./argv0 David.
Whoa.
That is bad.
And get used to this because it will
start to happen with greater frequency.
Segmentation fault is a very cryptic
way of saying you touched memory, RAM,
that you should not have.
And you can kind of
think of what this means.
So if argv[0]-- let me pull
up my picture of an array.
If my array looks like this, and argv[0]
is here, and that was safe to print,
and argv[1] is here.
That was safe to print.
It was my name.
And argv-- what did I do--
100, it's like way over here.
I don't know what's over here.
And indeed, touching
that memory was very bad.
The program crashed.
And segmentation fault is an allusion
to how computers lay out memory.
You've got like a segment of memory
here, a segment of memory here,
a segment of memory here.
Segmentation fault means you
touched a chunk of memory
that was not yours to use,
to change or to even view.
So I got lucky, though--
well, I didn't get lucky.
I could sometimes see garbage values.
Let me be a little more conservative.
Let me put [2], which is just
one past what I typed in.
It's sometimes undefined behavior.
I don't know what I'm gonna get.
Null.
So there's some funky
characters there or zeros there.
But now you're playing
with fire, so to speak.
These are logical bugs in my program.
But it is OK to check
if argc is two, then
it's OK to look at 0 and 1,
two things and only two things.
Any questions on that?
All right.
So where, in what domain, is
this kind of thing helpful?
And there's a couple more examples
of argv that you can look at online.
Turns out that in the
world of cryptography,
this stuff really starts
to get interesting.
So the world of cryptography is
all about scrambling information.
Maybe back in the day
in grade school, you
might have passed notes to a friend or
a crush that you had in the classroom.
And if you were really clever, or
your teacher was really adversarial,
you might have to encode your message
so that you're not just writing,
like, "I love you" or whatever.
But you instead change all
the A's to B's, and all
the B's to C's or hopefully something
a little more cryptic than that so that
the teacher can't just change all the
B's to A's and all the C's to B's.
But you kind of scrambled the words.
But you scrambled the words,
perhaps, in such a way
that it's reversible by the
recipients, the recipient
of your encrypted message.
So to encrypt information means to
convert it into some other format,
from what's called plaintext to
ciphertext, which sounds really cool,
and it's just the scrambled version.
But it's not random.
It's got to follow a
pattern or, if you will,
an algorithm so that he or she on the
other end can reverse the algorithm
and undo it.
Now, in the simple example
I proposed, A becomes
B. B becomes C. What is the secret
that you and your crush know?
It's probably just the number one.
He or she has to just know,
if you added 1 to the letters,
that they should subtract
1 from the letters.
And hopefully they know that if you hit
Z, you should probably wrap around to A
and not get into weird
punctuation or something like that.
So you can keep an
algorithm as simple as that.
So we can think of cryptography, really,
as just an example of problem-solving.
You want to send a message
from someone, yourself,
to someone else, maybe
over a very insecure medium
like passing a note through the room.
And you want only one person
to know how to access it.
That's like providing inputs,
and you want outputs--
your plaintext and your
ciphertext-- so that no one
can understand it except
you and the recipient.
So it turns out that cryptography--
there's different forms of it, but
perhaps the simplest looks like this.
There's two inputs, the plaintext,
the message you want to actually send,
and then the key, which might be
a number like 1 or 2 or 25 or 26.
And more than that's probably
silly because you're just
wrapping around the alphabet
even more, so to speak.
But the output is going to be
something called ciphertext.
And when your crush
receives this message,
he or she really just needs
to reverse the process.
They have to know the key.
Otherwise, they're going to be guessing
all day long what your message actually
was.
But so long as you know the secret
in advance, you can do this.
Now, of course, there's a gotcha.
You have to be on speaking terms
with this person you're crushing on
because he or she needs to know
what the key is in advance.
Otherwise, you're just sending
them nonsensical values.
So that's kind of, too, a catch-22.
In order to send a secret
message from A to B,
A and B need to be able to confer
in advance and agree on this secret.
But if you need to agree
in advance on a secret,
why don't you just use that time to
send the message directly to the person?
Right?
So there's this disconnect.
And we'll come back to this
before long because most of us
probably don't know someone
who works at, like, amazon.com.
And yet when I buy
something on Amazon, I've
been told all these
years that it's secure.
It's encrypted.
My credit card, my name,
and all of that are somehow
encrypted between me and Amazon in
Seattle or wherever their servers are.
But I don't know anyone there.
And yet somehow,
cryptography still works.
So this type of cryptography is just
one called secret-key cryptography.
But there's public-key
cryptography and yet other things.
And so what you'll find in
problem set two in particular
is you'll have an opportunity
to explore this world,
whereby you'll write software
that encrypts and then, hopefully,
decrypts information and even, if
you're among those more comfortable,
an opportunity to try
writing software that
takes passwords that are encrypted--
or, more properly, hashed, so to speak.
More on that before long--
and you try to crack those
passwords, actually figure out
what the passwords actually were.
And it all boils down to,
ultimately, in the context of C,
taking as input a
message, like a plaintext,
and somehow converting it to
ciphertext by manipulating
those individual characters, or, if
you're the recipient, vice versa.
And I like to show a clip from,
frankly, a film you can watch, like,
literally every hour on the hour
around the holidays, "A Christmas
Story," because it has an example of
a very simple form of cryptography.
If you ever saw this movie,
this is little Ralphie.
And he's really excited because
over months or whatever,
he saves up and sends in,
like, all of these, like,
cereal box covers or
something like that,
and gets back, finally,
this secret decoder ring.
And the secret decoder ring
is kind of a nice mental model
to have for the type of
cryptography I'm proposing here,
this sort of rotational idea--
A becomes B. B becomes
C. Because if you imagine
a ring that has another
ring on the outside,
you can kind of line up the A's
and Z's, so to speak, differently.
And that's what he was saving up for.
So I thought we'd take just
a moment to look at this clip
to inspire one of the problems ahead.
[VIDEO PLAYBACK]
- Be it known to all and sundry
that Ralph Parker is hereby
appointed a member of the Little
Orphan Annie secret circle
and is entitled to all the honors
and benefits occurring thereto.
- Signed Little Orphan Annie!
Countersigned Pierre Andre!
In ink!
Honors and benefits
already, at the age of nine.
- Let's go overboard!
- Come on.
Let's get on with it.
I don't need all that jazz
about smugglers and pirates.
- Listen tomorrow night for
the concluding adventure
of the black pirate ship.
Now it's time for Annie's secret message
for you members of the secret circle.
Remember, kids, only members
of Annie's secret circle
can decode Annie's secret message.
Remember, Annie is depending on you.
Set your pins to B2.
Here is the message.
12, 11--
- I am in.
My first secret meeting.
- --14, 11, 18, 16--
- Oh, Pierre was in great voice tonight.
I could tell that tonight's
message was really important.
- --3, 25.
That's a message from Annie herself.
Remember, don't tell anyone.
- 90 seconds later I'm in the only
room in the house where a boy of nine
could sit in privacy and decode.
Aha!
B!
I went to the next.
E. The first word is "be!"
S. It was coming easier now.
U.
- Aw, come on, Ralphie!
I got to go!
- I'll be right down, Ma!
Gee, whiz.
- T. O!
"Be sure to"-- be sure to what?
What was Little Orphan
Annie trying to say?
"Be sure to" what?
- Ralphie, Randy has got to go.
Will you please come out?
- All right, Ma!
I'll be right out!
- I was getting closer now.
The tension was terrible.
What was it?
The fate of the planet
may hang in the balance.
[KNOCKING]
- Ralph, Randy's got to go!
- I'll be right out,
for crying out loud!
- Gee, almost there!
My fingers flew.
My mind was a steel trap.
Every pore vibrated.
It was almost clear!
Yes!
Yes!
Yes!
Yes!
- "Be sure to drink your Ovaltine."
Ovaltine?
A crummy commercial?
Son of a bitch!
[END PLAYBACK]
DAVID MALAN: That's it for CS50.
We'll see you next time.
[APPLAUSE]
