[MUSIC PLAYING]
DAVID J. MALAN: All right.
This is CS50 and this is
the start of week two.
And you'll recall that over
the past couple of weeks,
we've been building up.
First initially from Scratch, the
graphical programming language
that we then, just last week,
translated to the equivalent program in C.
And of course, there's
a lot more syntax now.
It's entirely text but the ideas,
recall, were fundamentally the same.
The catch is that computers
don't understand this.
They only understand what language?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Zeros
and ones, or binary.
And so there's a requisite step in order
for us to get from this code to binary.
And what was that step or that
program or process called?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah, so compiling.
And of course, recall as
you've now experimented
with this past week that
to compile a program,
you can use clang, for C language.
And you can just say clang
and then the name of the file
that you want to compile.
And that outputs by default
a pretty oddly named program.
Just a.out.
Which stands for assembler output.
More on that in just a moment.
But recall too that you can
override that default behavior.
And you can actually say,
Output instead a program
called hello instead of just a.out.
But you can go one step further,
and you can actually use Make.
And Make itself is not a
compiler, it's a build utility.
But in layman's terms,
what does it do for us?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Compiles it.
And it essentially figures out
all of those otherwise cryptic
looking command line arguments.
Like -o something, and so forth.
So that the program
is built just the way
we want it without
our having to remember
those seemingly magical incantations.
And though that only works for
programs as simple as this.
In fact, some of you, with
the most recent problem set,
might have encountered compilation
errors that we actually
did not encounter deliberately in
class because Make was helping us out.
In fact, as soon as
you enhance a program
to actually take user input using
CS50's library by including cs50.h,
some of you might have realized
that all of a sudden the sandbox,
and more generally Clang,
didn't know what get_string was.
And frankly, Clang might not
even have known what a string was.
And that's because those two
are features of CS50's library
that you have to teach Clang about.
But it's not enough to teach Clang what
they look like, as by including cs50.h.
Turns out there's a missing
step that Make helps us solve
but that you too can just
solve manually if you want.
And by that I mean this, instead of
compiling a program with just Clang,
hello.c.
When you want to use CS50's
library, you actually
need to add this additional
command line argument.
Specifically at the end; it can't
go at the beginning like -o.
And -l stands for link.
And this is a way of telling Clang,
by the way when compiling my program,
please link in CS50's zeros
and ones that we the staff
wrote some weeks ago and
installed in the sandbox for you.
So you've got your
zeros and ones and then
you've got our zeros
and ones so to speak.
And -lcs50 says
to link them together.
So if you were getting some kind of
undefined reference error to get_string
or you didn't--
you weren't able to compile a program
that just used any of the get functions
from CS50's library.
Odds are, this simple change,
-lcs50, would have fixed it.
But of course, this isn't interesting
stuff to remember, let alone
remembering how to use -o
as well, at which point
the command gets really tedious to type.
So here comes Make again.
Make automates all of this for us.
And in fact, if you henceforth
start running Make and then
pay closer attention to the fairly
long line of output that it outputs,
you'll actually see
mention of -lcs50,
you'll see mention of even
-lm, which stands for math.
So if you're using
round, for instance, you
might have discovered
that round, too,
doesn't work out of the box
unless you use Make itself
or this more nuanced approach.
So this is all to say that
compiling is a bit of a white lie.
Like, yes you've been compiling and
you've been going from source code
to machine code.
But it turns out that there's been
a number of other steps happening
for you that we're going to
just slap some labels on today.
At the end of the day, we're
just breaking the abstraction.
So compiling is this abstraction
from source code to machine code.
Let's just kind of zoom
in briefly to appreciate
what it is that's going on in
hopes that it makes the code we're
compiling a little more understandable.
So step one of four, when it comes
to actually compiling a program
is called Pre-processing.
So recall that this program we
just looked at had a couple of
includes at the top of the file.
These are generally known
as pre-processor directives.
Not a particularly
interesting term but they're
demarcated by the hash at
the start of these lines.
That's a signal to Clang that these
things should be handled first.
Preprocessed.
Processed before everything else.
And in fact, the reason for this we
did discuss last week, inside of cs50.h
is what, for instance?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Specifically,
the declaration of get_string.
So there's some lines of code,
the prototype if you recall,
that one line of code that teaches
Clang what the inputs to get_string are
and what the outputs are.
The return type and the
arguments, so to speak.
And so when you have include
cs50.h at the top of the file, what
is happening when you first run Clang
during this so-called pre-processing
step, is Clang looks on the hard drive
for the file literally called cs50.h.
It grabs its contents and essentially
finds and replaces this line here.
So somewhere in cs50.h is a
line like this yellow one here
that says get_string, is a
function that returns a string.
And it takes as input, the
so-called argument, a string
that we'll call prompt.
Meanwhile, with include stdio.h.
What's the point of including that?
What is declared inside
of that file presumably?
Yeah?
AUDIENCE: It's the standard
inputs and outputs.
DAVID J. MALAN: Standard
inputs and outputs.
And more specifically,
what's an example thereof?
What function?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: So printf.
The other function we keep using.
So inside of stdio.h,
somewhere on the sandbox's hard drive
is similarly a line of code that
frankly looks a little more cryptic
but we'll come back
to this sort of thing
down the road, that says
printf is a function.
Happens to return an int, but
more on that another time.
Happens to take a char* format.
But more on that another time.
Indeed, this is one
of the reasons we hide
this detail early on because
there's some syntax that's
just a distraction for now.
But that's all that's going on.
The sharp include sign is just
finding and replacing the contents.
Plus dot, dot, dot, a bunch of
other things in those files as well.
So when we say
pre-processing, we just mean
that that's getting substituted in
so you don't have to copy and paste
this sort of thing manually yourself.
So "compiling" is a word that
actually has a well-defined meaning.
Once you've preprocessed your code, and
your code looks essentially like this,
unbeknownst to you, then comes
the actual compilation step.
And this code here gets
turned into this code here.
Now this is scary-looking,
and this is the sort of thing
that if you take a class
like CS61 at Harvard,
or, more generally, systems
programming, so to speak,
you might see something like this.
This is x86 64-bit
assembly instructions.
And the only thing interesting
about that claim for the moment
is that assembly--
I kind of alluded to that
earlier-- assembler output, a.out.
There's actually a relationship
here, but long story short, these
are the lower level
instructions that only the CPU,
the brain inside your
computer, actually understands.
Your CPU does not understand C. It
doesn't understand Python or C++
or Java or any language with
which you might be familiar.
It only understands this
cryptic-looking thing.
But frankly, from the looks of it, you
might glean that probably not so much
fun to program in this.
I mean, arguably, it's not that
much fun to program yet in C,
so this looks even more cryptic.
But that's OK.
C and lots of languages
are just these abstractions
on top of the lower level
stuff that the CPUs do actually
understand so that we don't
have to worry about it as much.
But if we highlight a few terms,
here you'll see some familiar things.
So main is mentioned in this
so-called assembly code.
You see mention of
get_string and printf,
so we're not losing information.
It's just being presented in really a
different language, assembly language.
Now you can glean, perhaps, from some
of the names of these instructions,
this is what Intel Inside means.
When Intel or any brand of
CPU understands instructions,
it means things like pushing and
moving and subtracting and calling.
These are all low
level verbs, functions,
if you will, but at
the level of the CPU.
But for more on that, you
can take entire courses.
But just to take the hood
off of this for today,
this is a step that's been happening
for us magically unbeknownst
to us, thanks to Clang.
So assembling-- now that you've got this
cryptic-looking code that we will never
see again-- we'll never
need to output again--
what do you do with it?
Well, you said earlier that computers
only understand zeros and ones,
so the third step is actually to convert
this assembly language to actual zeros
and ones that now look like this.
So the assembling step
happening, unbeknownst to you,
every time you run
Clang or, in turn, run
make, we're getting zeros and
ones out of the assembly code,
and we're getting the assembly
code out of your C code.
But here's the fourth and final step.
Recall that we need to link in
other people's zeros and ones.
If you're using printf
you didn't write that.
Someone else created those
zeros and ones, the patterns
that the computer understands.
You didn't create get_string.
We did, so you need access
to those zeros and ones
so that your program
can use them as well.
So linking, essentially, does this.
If you've written a program--
for instance, hello.c--
and it happens to use a
couple of other libraries,
files that other people
wrote of useful code
for you, like cs50.c,
which does exist somewhere,
and even stdio.c, which
does exist somewhere,
or technically, Standard
IO is such a big library,
they actually put printf in a
file specifically called printf.c.
But somewhere in the sandbox's hard
drive, in all of our Macs and PCs,
if they support compiling, are,
for instance, files like these.
But we've got to convert this to
zeros and ones, this, and this,
and then somehow combine them.
So pictorially, this just
looks a bit like this.
And this is all happening
automatically by Clang.
Hello.c, the code you
wrote, gets compiled
to assembly, which then gets assembled
into zeros and ones, so-called machine
code or object code.
Cs50.c-- we did this for you
before the semester started.
Printf was done way before
any of us started decades
ago and looks like this.
These are three separate files,
though, so the linking step literally
means, link all of these things
together, and combine the zeros
and ones from, like, three,
at least, separate files,
and just combine them
in such a way that now
the CPU knows how to use not just
your code but printf and get_string
and so forth.
So last week, we introduced
compiling as an abstraction,
if you will, and this is all that
we've really meant this whole time.
But now we've seen what's
going on underneath the hood,
and we can stipulate that
my CPU that looks physically
like this, albeit smaller
in a laptop or desktop,
knows how to deal with all of that.
So any questions on these four steps--
pre-processing, compiling,
assembling, linking?
But generally, now, we can just call
them compiling, as most people do.
Any questions?
Yeah.
AUDIENCE: How does the CPU
know that [INAUDIBLE] is there?
Is that [INAUDIBLE]?
DAVID J. MALAN: Not in
the pre-processing step,
so the question is,
how does the computer
know that printf is the
only function that's there?
Essentially, when
you're linking in code,
only the requisite zeros and
ones are typically linked in.
Sometimes you get more than you
actually need, if it's a big library,
but that's OK, too.
Those zeros and ones are
just never used by the CPU.
Good question.
Other questions?
OK, all right.
So now that we know this
is possible, let's start
to build our way back
up, because everyone here
probably knows now that
when writing in C, which
is kind of up here
conceptually, like, it
is not without its hurdles and
problems and bugs and mistakes.
So let's introduce a few techniques and
tools with which you can henceforth,
starting this week and beyond, try
to troubleshoot those problems yourself
rather than just trying to read through
the cryptic-looking error messages
or reach out for help to another human.
Let's see if software can actually
answer some of these questions for you.
So let me go ahead and do this.
Let me go ahead and
open up a sandbox here,
and I'm going to go ahead
and create a new file called
buggy0.c in which I will, this
time, deliberately introduce a bug.
I'm going to go ahead and
create my function called
main, which, again, is the default,
like when green flag is clicked.
And I'm going to go ahead and
say, printf, quote, unquote,
"Hello world\n."
All right.
Looks pretty good.
I'm going to go ahead and
compile buggy0, Enter,
and of course, I get a bunch
of error messages here.
Let me zoom in on them.
Fortunately, I only have two, but
remember, you have to, have to,
have to always scroll
up to look at the first,
because there might just be an annoying
cascading effect from one earlier
bug to the later.
So buggy0.c, line 5, is what this
means, character 5, so like 5 spaces in,
implicitly declaring library
function printf with dot, dot, dot.
So you're going to start to see
this pretty often if you make
this particular mistake or oversight.
Implicitly declaring
something means you forgot
to teach Clang that something exists.
And you probably know from experience,
perhaps now, what the solution is.
What's the first mistake I made here?
AUDIENCE: [INAUDIBLE].
DAVID J. MALAN: Yeah, I didn't
include the header file,
so to speak, for the library.
I'm missing, at the top of
the file, include stdio.h,
in which printf is declared.
But let's propose that you're not
quite sure how to get to that point,
and how can we get, actually,
some help with this?
Let me actually increase
the size of my terminal
here, and recall that just a
moment ago, I ran make buggy0,
which yielded the errors that I saw.
It turns out that
installed in the sandbox
is a command that we, the
staff, wrote called help50.
And this is just a program we
wrote that takes as input any error
messages that your code or
some program has outputted.
We kind of look for
familiar words and phrases,
just like a TF would in office hours,
and if we recognize some error message,
we're going to try to provide,
either rhetorically or explicitly,
some advice on how to handle it.
So if I go ahead and run this command
now, notice there's a bit more output.
I see exactly the same output in
white and green and red as before,
but down below is some yellow, which
comes specifically from help50.
And if I go ahead and
zoom in on this, you'll
see that the line of
output that we recognized
is this one, that same one
I verbally drew attention
to before-- buggy0.c, line 5, error,
implicitly declaring library function
printf, and so forth.
So here, without the background
highlighting, but still in yellow,
is our advice or a question a TF or
CA might ask you in office hours.
Well, did you forget to
include stdio.h in which printf
is declared atop your file?
And hopefully, our questions,
rhetorical or otherwise, are correct,
and that will get you further along.
So let's go ahead and try that advice.
So include stdio.h.
Now let me go ahead
and go back down here.
And if you don't like
clutter, you can type "clear,"
or hit Control+L in the terminal
window to keep cleaning it like I do.
If you want to go ahead now and run
make buggy0, Enter, fewer errors,
so that's progress, and not the same.
So this one's, perhaps, a little easier.
Reading the line, what
line of code is buggy here?
AUDIENCE: Forgot the semicolon.
DAVID J. MALAN: Yeah, so this is
now still on line 5, it turns out,
but for a different reason.
I seem to be missing a semi-colon.
But I could similarly ask
help50 for help with that
and hope that it recognizes my error.
So this, too, should start
being your first instinct.
If on first glance, you
don't really understand
what an error message is
doing, even though you've
scrolled to the very first one, like
literally ask this program for help
by rerunning the exact
same command you just
ran, but prefix it with
help50 and a space,
and that will run help50 for you.
Any questions on that process?
All right, let's take a
look at one other program,
for instance, that, this time, has
a different error involved in it.
So how about-- let me go ahead
and whip up a quick program here.
I'll call this buggy2.c for
consistency with some of the samples
we have online for you later.
And in this example, I'm going to
go ahead and write the correct thing
at first, stdio.h, and then I'm
going to have int main void, which
just gets my whole program started.
And then I'm going to have
a loop, and recall for--
[CLEARS THROAT] excuse me--
Mario or some other program,
you might have done something like int
i gets 0, i is less than or equal to--
let's do this 10 times, and then i++.
And all I want to do in this program is
print out that value of i, as I can do,
with the %i placeholder--
so a simple program.
Just want it to count from 0 to 10.
So let's go ahead and run
buggy2, or rather, I want to--
let's not print up--
rewind.
Let's go ahead and just
print out a hash symbol
and not spoil the solution this way.
So here, I go ahead
and make buggy2.
My goal is now I will stipulate to print
out just 10 hash symbols, one per line,
which is what I want to do here.
And now I'm going to go ahead and run
./buggy2, and I should see, hopefully,
10 hashes.
And I kind of spoiled this a little
bit, but what do I instead see?
Yeah, I think I see more than I expect.
And we can kind of zoom in here and
double check, so 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, ooh, 11.
11.
Now some of your eyes might already be
darting to what the solution should be,
but let's just propose
that it's not obvious.
And if it is actually not obvious,
all the better, so how might
you go about diagnosing this
kind of problem, short of just
reaching out and asking
a human for help.
This is not a problem
that help50 can help with,
because it's not an error message.
Your program is working.
It's just not outputting
what you wanted it to,
but it's not an error message from the
compiler with which help50 can help.
So you want to kind of get eyes
into what your program is doing,
and you want to understand, why
are you printing 11 when you really
are setting this up from 0 to 10?
Well, one of the most common
techniques in C or any language,
honestly, is to use printf for just
other purposes-- diagnostic purposes.
For instance, there's not
much going on in this program,
but I'd argue that it would
be interesting for me to know,
and therefore understand
my program, by just,
let's print out this value
of i on each iteration,
as by doing the line of
code that I earlier did,
and just say something
literally like, i is %i.
I'm going to remove this
ultimately, because it's
going to make my program
look a little silly,
but it's going to help me
understand what's going on.
Let me go ahead and recompile
buggy2, ./buggy2, and this time,
I see a lot more output.
But if I zoom in, now it's kind of--
now the computer is essentially
helping me understand what's going on.
When i is 0, here's one of them.
When i is 1, here's another.
i is 2, 3, 4, 5, 6, 7, 8,
9, and that looks good.
But if we scroll a little further,
it feels a little problematic
that i can also be 10.
So what's logically the
bug in this program?
AUDIENCE: [INAUDIBLE].
DAVID J. MALAN: Yeah.
I use less than or equal to, because
I kind of confuse the paradigm.
Like programmers tend to
start counting at zero,
apparently, but I want to do this
10 times, and in the human world,
if I want to do something 10 times,
I might count up to and including 10.
But you can't have it both ways.
You can't start at zero
and end at 10 if you
want to do something exactly 10 times.
So there's a couple
of possibilities here.
How might we fix this?
Yeah, so we could certainly
change it to less than.
What's another correct approach?
Yeah, so we could leave this alone
and just start counting at one,
and if you're not actually printing
the values in your actual program,
that might be perfectly reasonable, too.
It's just not conventional.
Get comfortable with, quickly, just
counting from zero, because that's just
what most everyone does these days.
But the technique here
is just use printf.
Like, when in doubt, literally use
printf on this line, on this line,
on this line.
Anywhere something interesting is
maybe going on in your program,
just use it to print out the
strings that are in your variables,
print out the integers that are in
your variables, or anything else.
And it allows you to
kind of see, so to speak,
what's going on inside
of your program, printf.
One last tool-- so it's not
uncommon, when writing code,
to maybe get a little sloppy early
on, especially when you're not
quite familiar with the patterns.
And for instance, if
I go ahead and do this
by deleting a whole bunch of whitespace,
even after fixing this mistake
by going from zero to 10,
is this program now correct,
if the goal is to print 10 hashes?
Yeah, I heard yes.
Why is it correct?
In what sense?
Yeah, exactly.
It still works.
It prints out the 10
hashes, one per line,
but it's poorly written
in the sense of style.
So recall that we tend to
evaluate, and the world
tends to think about code
in at least three ways.
One, the correctness-- does it
do what it's supposed to do,
like print 10 hashes?
And yes, it does, because all
I did was delete whitespace.
I didn't actually change or break
the code after making that fix.
Two is design, like how thoughtful,
how well-written is the code?
And frankly, it's kind of hard
to write this in too many ways,
because it's so few lines.
But you'll see over time,
as your programs grow,
the teaching fellows and staff
can provide you with feedback
on the design of your code.
But style is relatively easy.
And I've been teaching it mostly
by way of example, if you will,
because I've been very
methodically indenting my code
and making sure everything looks
very pretty, or at least pretty
to a trained eye.
But this, let's just
stipulate, is not pretty.
Like, left aligning everything
still works, not incorrect,
but it's poorly styled.
And what would be an
argument for not writing code
like this and, instead,
writing code the way
I did a moment ago, albeit
after fixing the bug?
Yeah.
AUDIENCE: It'll help you identify
each little subroutine that
goes through the thing, so
you know this section is here.
DAVID J. MALAN: Yeah.
AUDIENCE: [INAUDIBLE] next one,
so you know where everything is.
DAVID J. MALAN: Exactly.
Let me summarize this.
It allows you to see,
more visually, what
are the individual subroutines
or blocks of code doing
that are associated with each other?
Scratch is colorful, and it has
shapes, like the hugging shape
that a lot of the control
blocks make, to make
clear visually to the programmer
that this block encompasses others,
and, therefore, this repeat
block or this forever block
is doing these things
again and again and again.
That's the role that these curly
braces serve, and indentation
in this and in other
contexts just helps it
become more obvious to the
programmer what is inside of what
and what is happening where.
So this is just better
written, because you
can see that the code inside of main
is everything that's indented here.
The code that's inside the for loop
is everything that's indented here.
So it's just for us human
readers, teaching fellows
in the case of a course, or colleagues
in the case of the real world.
But suppose that you don't quite see
these patterns too readily initially.
That, too, is fine.
CS50 has on its website
what we call a style guide.
It's just a summary of
what your code should
look like when using certain
features of C-- loops,
conditions, variables,
functions, and so forth.
And it's linked on the course's website.
But there's also a tool
that you can use when
writing your code that'll help you
clean it up and make it consistent,
not just for the sake of making it
consistent with the style guide,
but just making your
own code more readable.
So for instance, if I go
ahead and run a command called
style50 on this program,
buggy2.c, and then hit Enter,
I'm going to see some
output that's colorful.
I see my own code in white,
and then I see, anywhere
I should have indented,
green spaces that
are sort of encouraging me to put
space, space, space, space here.
Put space, space, space, space here.
Put eight spaces here, four
spaces here, and so forth,
and then it's reminding me I
should add comments as well.
This is a short program--
doesn't necessarily
need a lot of commenting
to explain what's going on.
But just one //, like we
saw last week to explain,
maybe at the top of the file
or atop the block of code,
would make style50 happy as well.
So let's do that.
Let me go ahead and take its advice
and actually indent this with Tab,
this with Tab, this with Tab,
this with Tab, and this once more.
And you'll notice that on your keyboard,
even though you're hitting Tab,
it's actually converting it for you
to four spaces, which is very common,
so you don't have to hit
the spacebar four times.
Just get into the habit of using Tab.
And let me go ahead and
write a comment here.
"Print 10 hashes."
This way, my colleagues, my
teaching fellow, myself in a week
don't have to read my own code again
and figure out what it's doing.
I can read the comments
alone per the //.
If I run style50 again,
now it looks good.
It's in accordance with the style guide,
and it's just more prettily written,
so pretty printed would be
a term of art in programming
when your code looks good
and isn't just correct.
Any questions then?
Yeah.
AUDIENCE: I tried using
[INAUDIBLE] this past week
and it said I needed a new program.
DAVID J. MALAN: That's--
it wasn't enabled for the
first week of the class.
It's enabled as of right
now and henceforth.
Other questions?
No.
All right, so just to recap then, three
tools to have in the proverbial toolbox
now are help50 anytime you see an error
message that you don't understand,
whether it's with make or Clang
or, perhaps, something else.
Printf-- when you've
got a logical problem--
a bug in your program, and it's just
not working the way it's supposed to
or the way the problem set tells
you it should, and then style50
when you want to make sure that, does
my code look right in terms of style,
and is it as readable as possible?
And honestly, you'll find us
at office hours and the like
often encouraging you, hey,
before we answer this question,
can you please run style50 on your code?
Can you please clean up your code,
because it just makes our lives, too,
as other humans so much easier
when we can understand what's
going on without having to visually
figure out what parentheses and curly
braces line up.
And so do get into
that habit, because it
will save you time from having to waste
time parsing things visually yourself.
All right.
So there's not just CPUs in computers.
CPUs are the brains,
central processing unit,
and that's why we keep emphasizing the
instructions that computers understand.
There's also this, which
we saw last time, too.
This is an example of
what type of hardware?
AUDIENCE: RAM.
DAVID J. MALAN: RAM, or
Random Access Memory.
This is the type of memory
that laptops, desktops, servers
have that is used whenever you
run a program or open a file.
There's another type of memory called
hard drives or solid state drives,
which you're probably
familiar with as a consumer,
and that's just where your
files are stored permanently.
Your battery can die.
You can pull the plug from
your laptop or desktop,
and any files saved on a
hard drive are persistent.
They stay there because
of the technology
being used to implement that.
But RAM is more ephemeral.
RAM is powered only by electricity.
It's only used when the power
is on or the battery is charged,
and it's where your files and
programs live effectively when
you double click on them and open them.
So when you double click on
something like Microsoft Word,
it is copied from your hard drive
long term into this type of memory,
because this type of memory,
though smaller in capacity--
you don't have as many bytes of it--
but it is much, much, much, much faster.
Similarly, when you open a
document, or you go to a web page,
the contents of the file you're seeing
are stored in this type of hardware,
because even though you don't
have terribly many bytes of it,
it's just much, much, much, much faster.
And so this will be thematic in
computer science and in hardware.
You sort of have lots
of cheap, slow stuff,
like hard disk space, relatively
speaking, and you have a little less
of the more expensive but
faster stuff like RAM.
And you have just one, usually, CPU,
which is the really fast thing that
can do a billion things per second.
But it, too, is more expensive.
So there's four visible chips
on this thing, if you will.
And we won't get into the details
of how these things work, but let's
just zoom in on this one black
chip here and focus on it
as being representative
of some amount of memory.
Maybe it's one megabyte,
one million bytes.
Maybe it's even one gigabyte
these days, one billion bytes.
But this is to say that this chip
can be thought of as just having
a bunch of bytes in it.
This is not to scale.
You have many more bytes
than these, but let
me propose that you just
think of each of these squares
here as representing one byte.
So the very first byte of
memory I have access to is here.
Next one is here, and so forth.
And the fact that they wrap around
is just an artist's rendition.
These things you can think
of just virtually as going
left to right, not in any kind of grid,
but physically, they look like this.
So when you actually create a
variable in a program like C,
like you need a char.
A char tends to be one
byte or eight bits,
and so that means when you have a
variable of type char in a C program,
it goes, literally, physically
in one of these boxes,
inside of your computer's RAM.
So for instance, it might take
up this much space at top left.
If you have a bigger
type of data, so you
have an integer, which tends
to be four bytes or 32 bits,
you might need more than one square,
so the computer might give you access
to four squares instead.
And you have 32 bits spanning
that region of memory.
But honestly, I chose
those boxes arbitrarily.
They could be anywhere in that
chip or in any of the other chips.
It's up to the computer to just
remember where they are for you.
You don't need to remember that, per se.
But if we think about
this grid, it turns out
this is actually very valuable
that we have chunks of memory--
bytes, if you will--
that are back to back to back to back.
And in fact, there's a
word for this technique.
This is contiguous memory--
back to back to back to back to back.
And in general, in programming,
this is referred to as an array.
You might recall from Scratch,
if you use this feature,
it actually has things
called lists, which
are exactly that-- lists of values,
lists of words, lists of strings.
An array is just a contiguous
chunk of memory, such
that you can store something here,
something here, something here,
something here, and so forth.
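One way to see "back to back" for yourself, sketched under the assumption that you are willing to peek at addresses (syntax this lecture has not introduced yet): the hypothetical helper below measures how many bytes separate one element of an int array from the next.

```c
#include <stddef.h>

// Distance in bytes between array[i] and array[i + 1]
ptrdiff_t gap_in_bytes(const int *array, int i)
{
    // Casting to char * makes the subtraction count bytes rather than ints
    return (const char *) &array[i + 1] - (const char *) &array[i];
}
```

For any int array, the gap equals sizeof(int), which is to say there is no unused space between neighboring elements.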
So it turns out an array,
this super simple primitive,
is actually incredibly powerful.
Just being able to store
things in my computer's memory
back to back to back to back enables so
many possibilities, both design-wise,
like how well I can write my code, and
also how fast I can make my code run.
So let me go ahead and
take out an example.
Let me go ahead and open up, for
instance, a new file in a sandbox,
and we'll call this score0.
So let me go ahead and close this one,
create a new file called scores0.c.
And in this file, let's go ahead and
write a relatively simple program.
Let me go ahead and, as
usual, give myself access
to some helpful functions--
cs50.h and stdio.h.
And no need to copy all this
down verbatim, if you don't like.
Everything will have or is
already on the course's website.
Let me start my program as
usual with int main void.
And then let me write a program,
as this program's name implies,
that, like, asks the user for three
scores on recent problem sets,
quizzes, whatever, and then kind of
creates a very simple chart of them,
like a bar chart to kind of
help me visualize how well
or how poorly I did on something.
So if I want to get an
integer, no surprise,
we can use the get_int
function, and I can just
ask the user for their first score.
But I should probably do
something with this score,
and on the left hand side of
this, what do I typically put?
Yeah.
So int-- sure, score 1 equals
this, and then my semi-colon.
So you might not have had many
occasions to use ints just yet,
but get_int is in the CS50 library.
This is the so-called
prompt that the human
sees, and let me actually
fix my space, because I
want the human to see the
space after the colon.
But that's just an aesthetic detail.
And then when I get back this
value, its return value--
just like Aaron, last week,
handed me a piece of paper,
so does get_int hand me a virtual
piece of paper with a number
that I'm going to store in
a variable called Score 1.
And now just to be clear, what has
just happened effectively is this.
The moment you create a variable
of type int, which is four bytes,
literally, this is what
Clang or, more generally,
the computer has done for you.
That int that the human
typed in is stored literally
in four contiguous bytes back to
back to back, maybe here, maybe here,
but together.
So that's all that's going on
when you're actually using C.
So let me go back into
my code here, and now I
want to-- it's not
interesting to plot one score.
So let's go ahead and do another.
So int Score 2, get_int,
and I'll ask the user for Score 2,
semi-colon, and then let's get one
more, Score 3, get_int, call it Score 3,
semi-colon.
All right, so now let me go
ahead and generate a bar,
like a bar chart of this.
I'm going to use what
we'll call ASCII art.
ASCII, of course, is just text, recall--
very simple text in a computer.
And I can kind of make a bar chart
pretty simply by just printing out
like a bunch of hashes
horizontally, so a short bar
will represent a small number, and a
long bar will represent a big number.
So let me go ahead and say to the
user, all right, here's your Score 1.
I'm going to go ahead, then,
and say, for int i get 0,
i is less than Score 1, i++.
And now if I scroll down and
give myself a bit of room here,
let me go ahead and implement
just a simple print.
So go ahead and print out a hash, and
then when you're all done with that,
print out a new line at
the end of that loop.
And let's just pause there.
Just to recap, I've asked
the human for three scores.
I'm only doing something with one
of them at the moment, so in fact,
just as a quick check, let me delete
those so as to not get ahead of myself.
Let me do make score 0.
Cross my fingers.
OK, no errors.
Now let me go ahead and do ./score0,
and your first score on a pset this year
out of 100 has been?
OK, 100.
And good job.
So it's a really long bar,
and if we count those up,
hopefully, there's actually 100 bars.
And if we run it again and
say, eh, it didn't go so well.
I got a 50.
That's half as big a bar.
So it seems like we're on
our way correctness-wise.
So now let me go ahead
and get the other scores.
Well, I had them here a moment ago.
So let me go ahead and just, well,
copy, paste, and change this to two,
change this to three, change
this to three, this to three.
All right, I know how to print
bars clearly, so let me go ahead
and do this, and then do this,
and then fix the indentation.
I don't want to say Score 1 everywhere.
I want to say a Score 2, Score 2.
I mean you're probably being
rubbed the wrong way that this
is both tedious and sloppy, and why?
What am I doing poorly now design-wise?
AUDIENCE: Copying and pasting code.
DAVID J. MALAN: Like copy-pasting
is almost always bad, right?
There's redundancy
here, but that's fine.
Let's prioritize correctness,
at least, for now.
So let me go ahead and make Score 0.
All right, no mistakes-- ./score0.
And then Tab it.
Let me go ahead now and run--
OK, we got 100 the first time.
We got 50 the--
oh, that's a bug.
What did I do there?
See, this is what happens
when you copy-paste.
So let's fix this.
That should say Score 2, so
Control+C will quit a program.
Make score 0 will
recreate it. ./score0, Enter--
all right, here we go.
100, 50.
Let's split the difference--
75.
All right, so this is a simple
bar chart horizontally drawn
of each of my three scores, where this
is 100, this is 50, and this is 75.
But there's opportunities
for improvement here.
So one, it rubbed some
folks the wrong way
already that we were literally
copying and pasting code.
So where is one opportunity
for improvement here?
What should I do instead of copying
and pasting that code again and again?
What ingredient can you bring?
OK, so we can use a loop and actually
just do the same thing three times.
So let's try that.
Let me go ahead and do this.
So let's go ahead and
delete the copy-paste I did,
and let me go ahead and say, OK, well,
for int i get zero, i less than 3, i++.
Let me create a bracket.
I can highlight multiple
lines and hit Tab,
and they'll all indent for
me, which is convenient.
And can I do this now, for instance?
Say it a little louder.
AUDIENCE: If you [INAUDIBLE]
to a specific [INAUDIBLE].
DAVID J. MALAN: Yeah,
I'm a little worried.
As you're noting here, we're using on
line 13 here the same variable, so mm.
So it's good instincts,
but I feel like the fact
that this program,
unlike last week, we're
now collecting multiple pieces of data.
Loops are breaking down for us.
Yeah.
AUDIENCE: [INAUDIBLE] function
[INAUDIBLE] takes in--
like you can have it [INAUDIBLE].
DAVID J. MALAN: OK.
AUDIENCE: So like an input of how
many scores you wanted to enter.
DAVID J. MALAN: OK.
AUDIENCE: And then [INAUDIBLE].
DAVID J. MALAN: Yeah,
we can implement another
function that factors out
some of this functionality.
Any other thoughts?
AUDIENCE: Store your scores in an array.
DAVID J. MALAN: OK, so we could
also store our scores in an array.
So let's do these in
order then, in fact.
So loops are wonderful when you
want to do something again and again
and again, but the whole
purpose of a function,
fundamentally, is to factor
out common functionality.
And there might still be
a loop in the solution,
but the real fundamental problem
with what I was doing a moment ago
was I was copying and
pasting functionality--
shouldn't need to do that,
because in both C and Scratch,
we had the ability to
make our own functions.
So let's do that.
Let me undo my loop changes
here, just to get us
back to where we were a moment ago.
And let me go ahead and, instead,
clean this up a little bit.
Let me go ahead and
create a new function
down here that I'm going
to call, say, Chart, just
to create a chart for myself.
And it's going to take as input a score,
but I could call this anything I want.
It's void as its return type, because
I don't need it to hand me something
back.
Like I'm not getting a
string from the user.
I'm just printing a chart.
It's a so-called side effect or output.
Now I'm going to go ahead and
do my loop here for int i get 0,
i is less than--
how many hashes do I want to print if
I'm being passed in the user score?
Like, is this 3 here?
AUDIENCE: The score.
DAVID J. MALAN: The
score, so if I'm being
handed a number that's 0 to 100,
that's what I want to iterate over.
If my goal here, ultimately--
let me finish this thought, i++--
is to print out, inside this loop, one
hash per point in one's total score.
And just to keep things clean,
I'm going to go ahead and put
a new line at the very end of this.
But I think now, I factored out
a good amount of the redundancy.
It's not everything,
but I've at least now
given myself a function called Chart.
So up here, it looks like I can
kind of remove this loop, which
is what I factored out.
That's almost identical, except
the variable name was hardcoded.
And I think I could
now do chart like this,
and then I maybe could do a little
copy-paste, if that's OK, like if maybe
I can get away with just doing this,
and then say 2, and then say 3,
and then say 3, and then say 2.
So it's still copy-paste, but it's less.
And it looks better.
It literally fits on the screen, so it's
progress-- not perfect, but progress.
Better design, but not perfect.
So is this going to compile?
I'm going to have errors why?
AUDIENCE: Essentially, it's
[INAUDIBLE] the program [INAUDIBLE].
DAVID J. MALAN: OK.
Yeah.
AUDIENCE: We need to
declare a [INAUDIBLE].
DAVID J. MALAN: OK, good.
So let me induce the actual error, just
so we know what problem we're solving.
Let me go ahead and sort
of innocently go ahead
and compile Score 0 hoping
all is well, but of course,
it's not because of a
familiar error up here.
So notice, implicit declaration of
function chart is invalid in C99.
So again, implicit
declaration of function
just tends to mean Clang does not
know what you're talking about.
And you could run help50,
and it would probably
provide you with similar advice.
But the gist of this is that
chart is not a C function.
It doesn't come with C. I wrote it.
I just wrote it a little too late.
So one solution that we
didn't use last week
would be, OK, well, if you don't
know what chart is, let me just
go put it where you'll know about it.
And now run make score 0.
OK, problem solved.
So that fixes it, but we fixed
it in a different way last week.
And why might we want to stick
with last week's approach
and not just copy-paste
my function and put it
at the top instead of the bottom?
AUDIENCE: [INAUDIBLE].
DAVID J. MALAN: Yeah, I mean it's
kind of a minor concern at the moment,
because this is a pretty short program.
But I'm pushing the main part of
my program, literally called Main,
farther and farther down.
And the whole point of reading code
is to understand what it's doing.
So if I open this file, and I have to
scroll, scroll, scroll, scroll, scroll,
just looking for the main
function, it's just bad style.
It's just kind of nice, and
it's a good human convention.
Put the main code, the main function,
the "when green flag clicked" equivalent,
at the very top.
So C does offer us a solution here.
You just have to provide
it with a little hint.
Let me go ahead and cut this from here,
put it back down at the bottom here,
and then go ahead and copy-paste
only or retype only the value--
whoops-- the value of that first line,
which is its so-called prototype.
Give Clang enough information so that
it knows what arguments the function
takes, what its return type is,
and what its name is, semi-colon,
and that's the so-called
declaration or--
and then implement it with the curly
braces and all the logic down below.
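The layout just described might look roughly like this. It is a sketch, not the course's actual file: show_scores is a hypothetical stand-in for main, and it takes the three scores as arguments rather than prompting with get_int so that it compiles without the CS50 library.

```c
#include <stdio.h>

// Prototype: return type, name, and argument types, then a semicolon
void chart(int score);

// Stand-in for main; it may call chart because the prototype appears above
void show_scores(int score1, int score2, int score3)
{
    chart(score1);
    chart(score2);
    chart(score3);
}

// Definition: the same first line, now followed by curly braces and logic
void chart(int score)
{
    // Print one hash per point
    for (int i = 0; i < score; i++)
    {
        printf("#");
    }
    printf("\n");
}
```

Deleting the prototype line should bring back the implicit-declaration error, since the compiler reads the file from top to bottom.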
So let's go ahead and run this.
And if I scroll up here,
we'll see-- whoops.
We'll see make score 0.
All right, now we're
on our way, score 0.
Enter.
Score 1 is 100, 50, 75, and now we
seem to have some good functionality.
But there's still an opportunity,
I dare say, for improvement.
And I think the fundamental
problem is that I'm still
copy-pasting the little
stuff, but I think
the fundamental problem is that I
don't have the expressiveness to store
multiple values, unless I, in
advance, as the programmer,
give them all unique names, because if
I use the same variable for everything,
I couldn't collect all
three variables at the top,
and then iterate over all three at the
bottom, if I only have one variable.
So I do need three variables,
but this doesn't scale very well.
And who knows?
If I want to take in five scores,
10 scores, or more scores,
then I'm really copying
and pasting excessively.
So it turns out, indeed,
the answer is an array.
So an array, at the
end of the day, is just
a side effect of storing stuff in
memory back to back to back to back.
But what's powerful about this
reality of memory is the following.
I can go ahead here and in,
say, a new and more improved
version of this program, do this.
Let me go ahead and open this one, which
I wrote in advance, called scores2.c.
And in scores2.c, notice
we have the following code.
In my main function, I've got a new
feature and a new bit of syntax.
This line here that I've
highlighted says, hey, Clang,
give me a variable called
Scores of type integer,
but please give me three of them.
So the new syntax are
your square brackets,
and inside of which is the number
of variables you want of that type.
And you don't have to
give them unique names.
You literally call them
collectively, Scores,
and in English, I deliberately
chose a plural to connote as much.
This is an array of
values, not a single value.
What can I do next?
Well, here's my for loop for int
i get zero i is less than 3 i++,
and now I've solved that earlier
problem that was proposed.
Well, just put it in a loop.
Now I can, because now my variables are
not called Score 1, Score 2, Score 3,
which I literally had to hard code.
They're just called Scores, and
now that they're called Scores,
and I have this square bracket
notation, notice what I can do.
I can get an int, and I can say, give
me score%i, and plug in i plus 1.
I didn't want to say
"zero," because humans
don't count from zero in general.
So this is counting from one, two, and
three, but the computer is doing this.
So Scores is a variable.
Bracket, i, close bracket says
store the i-th value there.
So i-th is just non-English.
That means go to bracket
0, bracket 1, bracket 2.
So what this effectively means is on
the first iteration of the loop, when
i equals 0, this looks
like this, effectively.
When i then becomes 1 on the next
iteration, then you're doing this.
When i becomes 2 on the final
iteration, it looks like this.
When i becomes 3, well,
3 is not less than 3,
and so it doesn't execute again.
So by using i inside of these square
brackets, I am indexing into an array.
To index into an array means
go to a specific location,
the so-called i-th location,
but you start counting at zero.
Just to make this more
real, then, if you go back
to this picture of
your computer's memory,
this might, therefore,
be bracket i, bracket 1--
bracket 0, bracket 1, bracket 2, bracket
3, bracket 4, bracket 50, or wherever.
You can now, using square brackets,
get at any of these blocks of memory
to store values for you.
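A minimal sketch of that declaration and loop follows. It assumes a made-up fake_get_int that hands back canned scores in place of the CS50 library's get_int, and collect_scores is likewise a hypothetical stand-in for main.

```c
#include <stdio.h>

// Hypothetical stand-in for get_int: returns a fixed score for index i
int fake_get_int(int i)
{
    const int canned[] = {100, 50, 75};
    return canned[i];
}

// Stores three scores in one array, then returns the last one stored
int collect_scores(void)
{
    // Room for three ints, back to back, under one name
    int scores[3];

    for (int i = 0; i < 3; i++)
    {
        // Humans see Score 1, 2, 3
        printf("Score %i: ", i + 1);

        // The computer uses indices 0, 1, 2
        scores[i] = fake_get_int(i);
        printf("%i\n", scores[i]);
    }
    return scores[2];
}
```

Here scores[0], scores[1], and scores[2] play the roles that the separately named Score 1, Score 2, and Score 3 played before.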
Any questions on what we've just done?
All right, then on the flip side,
we can do the exact same thing.
Now when I print my scores, I can
similarly iterate from 0 to 3,
and then print out the
scores by passing to chart
the same value, the i-th score.
Again, the only new syntax here
is variable name, square bracket,
and then a number, like 0,
1, 2, or a variable like i,
and then my chart function
down here is exactly the same.
It has no idea an array is
even involved, because I'm just
passing in one score at a time.
Now it turns out there's still one
bad design decision in this program.
There's still some redundancy,
something that I keep typing again
and again and again.
Do any values jump out
at you as repeated?
AUDIENCE: The for loop.
DAVID J. MALAN: The for loop.
OK, so I've got the for
loop in multiple places.
Sure.
And what other value seems
to be in multiple places?
It's subtle.
Total number.
Yeah, 3.
Three is in a few places.
It's up here.
It's when I declare the array
and ask myself for three scores.
It's here when I'm iterating.
It's not here, because this
is a different iteration.
That's just for the hashes.
So in, ironically, three
places, have I written 3.
So what does this mean?
Well, suppose next year you
take more tests or whatever,
and you need more scores.
You open up your program, and all right,
now I've got five scores and five--
whoops, typo already-- five,
like this kind of pattern
where you're typing the
same thing again and again.
And now the onus is
on me, the programmer,
to remember to change the same
[? damn ?] value in multiple places--
bad, bad, bad design.
You're going to miss
one of those values.
Your program's going
to get more complex.
You're going to leave one at
3 and change the other to 5,
and logical errors are
eventually going to happen.
So how do we solve this?
The function's not the solution
here, because it's not functionality.
It's just a value.
Well, we could use a variable,
but a certain type of variable.
These numbers here--
5, 5, 5 or 3, 3, 3--
are what humans generally
refer to as magic numbers.
Like they're numbers, but
they're kind of magical,
because you just arbitrarily
hardcoded them in random places.
But a better convention would be, often
as a global variable, to do this--
int, let's call it "count," equals 3.
So declare a variable of type
int that is the number of things
you want, and then type that variable
name all throughout your code
so that later on, if you ever
want to change this program,
you change it-- whoops--
in one place, and you're
done after recompiling the program.
And actually, I should do
a little better than this.
It turns out that if you know you have
a variable that you're never going
to change, because it's
not supposed to change--
it's supposed to be a constant value--
C also has a special keyword called
const, where before the data type,
you say, const int, and then the name
and then the value, and this way,
the compiler, Clang, will
make sure that you, the human,
don't screw up and accidentally try
to change the count anywhere else.
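As a sketch of the constant in use, the hypothetical total_points below fills an array with made-up scores and adds them up, with COUNT as the only place the number 3 appears. One caveat: with a const int rather than a literal number, C treats the array as variable-length, which Clang and GCC accept but some other compilers may not.

```c
// The one place to change if the number of scores ever changes
const int COUNT = 3;

// Fills an array of COUNT made-up scores and returns their sum
int total_points(void)
{
    int scores[COUNT];

    // Made-up values: 10, 20, 30, ...
    for (int i = 0; i < COUNT; i++)
    {
        scores[i] = 10 * (i + 1);
    }

    int total = 0;
    for (int i = 0; i < COUNT; i++)
    {
        total += scores[i];
    }
    return total;
}
```

Writing COUNT = 5; anywhere inside a function would be rejected by the compiler, which is the protection const provides.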
There's one other thing notable.
I also capitalize this whole
thing for some reason--
human convention.
Anytime you capitalize all of
the letters in a variable name,
the convention is that
that means it's global.
That means it's defined way up top,
and you can use it anywhere, therefore,
because it's outside all curly braces.
But it's meant to imply and
remind you that this is special.
It's not just a so-called
local variable inside
of a function or inside
of a loop or the like.
Any questions on that?
Yeah.
AUDIENCE: What is [INAUDIBLE]?
Why do you have i plus 1?
DAVID J. MALAN: Oh,
why do I have i plus 1?
Let me run this program real quick.
Why do I have i plus 1 in this
line here, is the question.
So let me go ahead and
run make scores 2--
whoops-- in my directory.
Make scores 2 ./scores2, Enter.
I wanted just the human to see
Score 1 and Score 2 and Score 3.
I didn't want him or her to see Score 0,
Score 1, Score 2, because it just looks
lame to the human.
The computer needs to
think in terms of zeros.
My humans and my users do
not, so just an aesthetic.
Other questions.
Yeah.
AUDIENCE: [INAUDIBLE].
DAVID J. MALAN: Ah,
really good question.
And I actually thought
about this last night
when trying to craft this example.
Why don't I just combine
these two for loops,
because they're clearly iterating
an identical number of times?
Was this a hand or just a stretch?
No, stretch.
So this is actually deliberate.
If I combine these, what would
change logically in my program?
Yeah.
AUDIENCE: After every [INAUDIBLE]
input, you would [INAUDIBLE].
DAVID J. MALAN: Yeah, so after
every human input of a score,
I would see that user's
chart, the row of hashes.
Then I'd ask them for another value.
They'd see the chart, another
value, and they'd see the chart.
And that's fine, if that
is the design you want.
Totally acceptable.
Totally correct.
I wanted mine to look a little more
traditional with all of the bars
together, so I effectively had
to postpone printing the hashes.
And that's why I did have
a little bit of redundancy
by getting the user's input
here and then iterating again
to actually print the user's output
as a chart, so just a design decision.
Good question.
Other questions?
All right, so what does this look like?
Actually, you know what?
I can probably do a little better.
Let me open up one final example
involving scores and this thing
called an array.
In Scores 4 here, let
me go ahead and do this.
Now I've changed my chart
function to do a little bit more,
and you might recall from week 0
and 1, we had the cough function,
and we kept enhancing
it to do more and more,
like putting more and
more logic into it.
Notice this.
Chart function now takes a second
argument, which is kind of interesting.
It takes one argument,
which is a number,
and then the next argument
is an array of scores.
So long story short, if you
want to have a function that
takes as input an array,
you don't have to know
in advance how big that array is.
You should not, in fact, put a
number in between the square brackets
in this context.
But the thing is you do
need to know, at some point,
how many items are in the array.
If you've programmed in Java, took
AP CS, Java just gives you .length,
if you recall that feature of objects.
C does not have this.
Arrays do not have an inherent
length associated with them.
You have to tell everyone who
uses your array how long it is.
So even though you don't
do that syntactically here,
you literally just say, I expect
an argument called scores that
is an array per the square brackets.
You have to pass in, almost
always, a second variable
that is literally called
whatever you want,
but is the number of
things in that array,
because if the goal of
this function is just
to iterate over the number
of scores that are passed in,
and then iterate over the
number of points in that score
in order to print out the hashes,
you need to know this count.
So what does this function
do, just to be clear?
This iterates over the total
number of scores from 0 to count,
which is probably 3 or 5 or whatever.
This loop here, using j,
which is just a convention,
instead iterates from 0 to
whatever that i-th score is.
So this is what's convenient.
Now I've passed in the array, and I
can still get at individual values
just by using i, because I'm
on my i-th iteration here.
So you might recall this from Mario,
for instance, or any other example
in which you had nested loops--
just very conventional to use i
on the outside, j on the inside.
But again, the only point here is that
you can, indeed, pass around arrays,
even as arguments, which we'll
see why that's useful before long.
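A version of chart along those lines might be sketched as follows. The argument order (count first, then the array) follows the description above; the rest is a reconstruction rather than the course's exact file.

```c
#include <stdio.h>

// Prints one row of hashes per score; count says how many scores there are
void chart(int count, int scores[])
{
    // Outer loop: one iteration per score in the array
    for (int i = 0; i < count; i++)
    {
        // Inner loop: one hash per point in the i-th score
        for (int j = 0; j < scores[i]; j++)
        {
            printf("#");
        }
        printf("\n");
    }
}
```

Called as chart(3, scores) with scores holding 3, 1, and 2, it prints three rows of three, one, and two hashes. Nothing in the function could discover that 3 by itself, which is why the count is passed alongside the array.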
Any questions?
OK, so this was a lot, but we can
do so much more still with arrays.
It gets even more and more cool.
In fact, we'll see, in just a
bit, how arrays have actually
been with us since last week.
We just didn't quite realize it
under the hood, but let's go ahead
and take a breather, five minutes.
We'll come back and dive in.
All right.
So I know that was a
bit of a cliffhanger.
Where else could arrays
have actually been?
But, of course, this is how we
might depict it pictorially.
We called it an array, and it
turns out that last week, when
we introduced strings, strings,
sequences of characters,
are literally just an
array by another name.
A string is an array of chars, and
chars, of course, is another data type.
Now what are the actual
implications of this,
both in terms of representation,
like how a computer's representing
information, and then
fundamentally, programmatically,
what can we do when we
know all of our data
is so back to back to back or
so proximal to one another?
Well, it turns out that we can apply
this logic in a few different ways.
Let me go ahead and
open up, for instance,
an example here called String 0.
So in our code for today,
in our Source 2 folder,
let me go ahead and open up String
0, and this example looks like this.
Notice that we first, on line
9, get a string from the user.
Just say, input, please.
We store that value in a string, s,
and then we say, here comes the output.
And notice what I'm doing
in the following line.
I'm iterating over i from 0
to strlen, whatever that is.
And then in line 13, I'm printing
a character one at a time.
But notice the syntax I'm using,
which we didn't use last week.
If you have a string called
s, you can index into a string
just like it's an array, because
it, indeed, is underneath the hood.
So s bracket i, where
i starts at 0 and goes
up to whatever this value is is just
a way of getting character 0, then
character 1, then character
2, then character 3,
and so the end result is
actually going to look like this.
Let me go ahead and do, make string--
whoops-- make string 0.
Oops.
Not in the directory.
Make string 0, ./string0, Enter,
and I'll type in, say, Zamyla,
and the output now is
Z-A-M-Y-L-A. It's a little messy,
because I don't have a new line here, so
let me actually-- let's clean that up,
because this is unnecessarily sloppy.
So let me go ahead and
print out a new line.
Let me recompile with
make string 0, dot--
whoops-- ./string0.
Input shall be Zamyla,
Enter, and now Z-A-M-Y-L-A.
So why is that happening?
Well, if I scroll down on
this code, it seems that I am,
via this printf line here, just getting
the i-th character of the name in s,
and then printing out one
character at a time per the %c,
followed by a new line.
So you might guess, what is
this function here doing?
Strlen-- slightly abbreviated, but
you can, perhaps, glean what it means.
Yeah, so it's actually string length.
So it turns out there
is a function that comes
with C called strlen, and
humans back in the day
and to this day like to type as
few characters as possible.
And so strlen is string length,
and the way you use it is you
just need one more header file.
So there's another library,
the so-called string
library that gives you
string-related functions
beyond what CS50's library provides.
And so if you include
string.h, that gives you access
to another function called
strlen, and if you pass it
a variable containing a
string, it will pass you back
as a return value the
total number of characters.
So I typed in Z-A-M-Y-L-A, and so
that should be returning to me six,
thereby printing out the six
characters in Zamyla's name.
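The loop being described can be sketched like this. To stay free of the CS50 library, the hypothetical print_characters takes its input as plain C's const char * instead of getting a string from the user, and the (int) cast is only there to keep the comparison between two ints.

```c
#include <stdio.h>
#include <string.h>

// Prints s one character per line
void print_characters(const char *s)
{
    // strlen(s) is the number of characters in s
    for (int i = 0; i < (int) strlen(s); i++)
    {
        // s[i] is the i-th character, counting from 0
        printf("%c\n", s[i]);
    }
}
```

Called with "Zamyla", it prints six lines, one letter per line.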
Yeah.
AUDIENCE: [INAUDIBLE].
DAVID J. MALAN: Uh-huh.
AUDIENCE: [INAUDIBLE] useful to get
the individual digits [INAUDIBLE].
DAVID J. MALAN: Really good question.
In the credit problem of the problem
set, would this have been useful?
Yes, absolutely.
But recall that in the credit
pset, we encourage you to actually
take in the number as a long,
so as an integral value, which
thereby necessitated arithmetic.
But yes, if you had, instead, in
a problem involving credit card
numbers, gotten the human's input
as a long string of characters
and not as an actual number
like an int or a long,
then, yes, you could actually get
at those individual characters,
which probably would have made
things even easier, but that was deliberate.
Yeah.
AUDIENCE: [INAUDIBLE].
DAVID J. MALAN: Really good question.
If we're defining string in CS50,
are we redefining it in string?
No.
So string, even though
it's named string.h,
doesn't actually define
something called a string.
It just has string-related functions.
More on that soon.
Yeah.
AUDIENCE: [INAUDIBLE]
individual values [INAUDIBLE]?
DAVID J. MALAN: Ah,
really good question.
Could you edit the individual values?
So short answer, yes.
We could absolutely change values, and
we'll soon do that in another context.
Other questions?
All right, so turns out this
is correct, if my goal is
to print out all of the
characters in Zamyla's name,
but it's not the best design.
And this one's a little subtle, but
this is, again, what we mean by design.
And to a question that
came up during the break,
did we expect everyone to be writing
good style and good design last week?
No.
Up until today, like we've
introduced the notion of correctness
in both Scratch and in C
last week, but now we're
introducing these other
axes of quality of code
like design, how well-designed
it is, and how pretty
does it look in the context of style.
So expectations from here on out are meant
to be aligned with those characteristics,
but not in the past.
So there's a slight inefficiency here.
So on the first iteration of this
loop, I first initialize i to 0,
and then I check if i is less than the
length of the string, which hopefully,
it is, if it's Zamyla,
which is longer than 0.
Then I print the i-th character.
Then I increment i.
Then I check this condition.
Then I print the i-th character.
Then I increment i.
Then I check this
condition and so forth.
We looked at loops last week,
and you've used them, perhaps,
by now in problems.
What question am I redundantly
asking seemingly unnecessarily?
I have to check a
condition again and again,
because i is getting incremented.
But there's another
question that I
don't need to keep asking again
just to get the same answer.
AUDIENCE: What is the
length [? of the string? ?]
DAVID J. MALAN: Yeah,
there's this function call
in my loop of strlen s, which is fine.
This is correct.
I'm checking the length of the
string, but once I type in Zamyla,
her name is not changing in length.
I'm incrementing i, so I'm moving
in the string, if you will.
But the string itself,
Z-A-M-Y-L-A, is not changing.
So why am I asking the computer, again
and again, get me the strlen of s,
get me the strlen of s,
get me the strlen of s.
So I can actually fix this.
I can improve the design, because
that must take some amount of time.
Maybe it's fast, but it's still
a non-zero amount of time.
So you know what I could do?
I could do something like this--
int n get string length of s.
And now just do this.
This would be better design, because
now I'm only asking the question once
of the function.
I'm remembering or caching, if
you will, the answer, and then
I'm just using a variable.
And just comparing
variables is just faster
than comparing a variable against
a function, which has to be called,
which has to return a value,
which you can then compare.
But honestly, it doesn't
have to be this verbose.
We can actually be a
little elegant about this.
If you're using a loop,
a secret feature of loops
is that you can have commas
after declaring variables.
And you can actually do this and make
this even more elegant, if you will,
or more confusing-looking,
depending on your perspective.
But this now does the same thing
but declares n inside of the loop,
just like I'm declaring i, and
it's just a little tighter.
It's one fewer line of code.
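To make the difference visible, here is a sketch that counts how often the length is asked for. counted_strlen is a made-up wrapper around strlen, and both loops take plain C's const char * rather than a CS50 string; a real compiler may sometimes optimize repeated strlen calls away, so treat the counts as an illustration of the logic rather than a timing measurement.

```c
#include <stdio.h>
#include <string.h>

// How many times the length has been asked for
int calls = 0;

// Hypothetical wrapper around strlen that counts its own calls
int counted_strlen(const char *s)
{
    calls++;
    return strlen(s);
}

// Asks for the length every time the condition is checked
int calls_without_caching(const char *s)
{
    calls = 0;
    for (int i = 0; i < counted_strlen(s); i++)
    {
        printf("%c\n", s[i]);
    }
    return calls;
}

// Asks once, remembers the answer in n, then compares i against n
int calls_with_caching(const char *s)
{
    calls = 0;
    for (int i = 0, n = counted_strlen(s); i < n; i++)
    {
        printf("%c\n", s[i]);
    }
    return calls;
}
```

For "Zamyla", the first loop asks seven times (six checks that pass and one that fails), while the second asks exactly once.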
Any questions, then?
AUDIENCE: [INAUDIBLE].
DAVID J. MALAN: Good question.
In the way I've just done it, I cannot
reuse this outside of the curly braces.
The scope of i and n exists
only in this context right now.
The other way, yes.
I could have used it elsewhere.
AUDIENCE: What if you [INAUDIBLE] other
loops, and you also had [INAUDIBLE]?
DAVID J. MALAN: Absolutely.
AUDIENCE: Using different
letters of the alphabet,
you could just use n
and not be [INAUDIBLE].
DAVID J. MALAN: Correct.
If I want to use the length
of s again, absolutely.
I can declare the variable, as I
did earlier, outside of the loop,
so as to reuse it.
That's totally fine.
Yes.
And even i-- i exists only inside of
this loop, so if I have another loop,
I can reuse i, and it's a different
i, because these variables only
exist inside the for loop
in which they're declared.
So it turns out that these strings
don't have anything in them
other than character after
character after character.
And in fact, let me
go ahead here and draw
a picture of what's actually going on
underneath the hood of the computer
here.
So when I type in Zamyla's
name, I'm, of course,
doing something like Z-A-M-Y-L-A.
But where is that actually going?
Well, we know now that inside of
your computer is RAM or memory,
and you can think of it like a grid.
And honestly, I can think
of this whole screen
as just being in a different
orientation, a grid of memory.
So for instance, maybe we can divide
it into rows and columns like this, not
necessarily to scale, and
there's more rows and columns.
So on the screen here,
I'm just dividing things
into the individual bytes of
memory that we saw a moment ago.
And so, indeed, underneath the hood of
the computer is this layout of memory.
The compiler has somehow figured out
or the program has somehow figured out
where to put the z and where the a and
the m and the y and the l and the a,
but the key is that they're all
contiguous, back to back to back.
But the catch is if I'm typing other
words into my program or scores
into my program or any
data into my program,
it's going to end up elsewhere
in the computer's memory.
So how do you know where
Zamyla begins and where
Zamyla ends, so to speak, in memory?
Well, the variable, called
s, essentially is here.
There's some remembrance in
the computer of where s begins.
But there's no obvious way
to know where Zamyla ends,
unless we ourselves tell the computer.
So unbeknownst to us, any time a
computer is storing a string like
Z-A-M-Y-L-A, it turns out that it's
not using one, two, three, four, five,
six characters.
It's actually using seven secretly.
It's actually putting
a special character
of all zeros in the very last byte.
Every byte is eight bits, so it's
putting secretly eight zeros there,
or we can actually draw this
more conventionally as \0.
It's what's called the null character,
and it just means all zeros.
So the length of the
string, Zamyla, is six,
but how many bytes does it
apparently take up, just to be clear?
So it actually takes up seven.
And this is kind of a
secret implementation detail
that we don't really have to care
about, but eventually, we will,
because if we want to implement
certain functionality,
we're going to need to know
what is actually going on.
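To make the six-versus-seven point concrete, here is a small sketch, assuming plain char * in place of CS50's string type. The helper names bytes_used and last_byte are hypothetical.

```c
#include <string.h>

// Bytes a string occupies in memory: its visible characters
// plus one extra byte for the terminating null character.
size_t bytes_used(const char *s)
{
    return strlen(s) + 1;
}

// The byte just past the last visible character is always '\0'.
char last_byte(const char *s)
{
    return s[strlen(s)];
}
```

For "Zamyla", strlen reports 6, bytes_used reports 7, and last_byte returns '\0'.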
So for instance, let me
go ahead and do this.
Let me go ahead and create a
program called strlen itself.
So this is not a function but
a program called strlen.c.
Let me go ahead and include
the CS50 library at the top.
Let me go ahead and include stdio.h.
Let me go ahead and type out main
void, so all this is same as always.
And then let me go ahead and prompt
the user for, say, his or her name,
like so.
And then you know what?
Let me actually, this time,
not just print their name out,
because we've done that ad nauseam.
Let's just count the number
of letters in his or her name.
So how could we do that?
Well, we could just do this-- int
n gets strlen of s, and then say,
printf "The length of your name is %i."
And then we can plug
in n, because that's
the number we stored the length in.
But to use strlen, I have
to include what header file?
String.h, which is the
new one, so string.h.
And now if I type this all correctly,
make strlen, make strlen, good.
./strlen-- let's try it--
Zamyla.
Enter.
OK, the length of her name is six.
But what is strlen doing?
Well, strlen is just an abstraction
for us that someone else wrote,
and it's wonderfully convenient, but
you know, we don't strictly need it.
I can actually do this myself.
If I understand what
the computer is doing,
I can implement this same
functionality myself as follows.
I can declare a variable called
n and initialize it to 0,
and then you know what?
I'm going to go ahead and do this.
While s bracket n does
not equal all zeros,
but you don't write all zeros like this.
You literally do this--
that \0 to which I referred
earlier in single quotes.
That just means all zeros in the bytes.
And now I can go ahead and do n++.
If I'm familiar with what
this means, remember,
that this is just n equals n plus 1, but
it's just a little more compact to say,
n++.
And then I can print
out, the length of
your name is %i, plugging in n.
So why does this work?
It's a little funky-looking,
but this is just
demonstrating an understanding
of what's going on
underneath the proverbial hood.
If n is initialized to zero,
and I look at s bracket n,
well, that's like
looking at s bracket 0.
And if the string, s, is
Zamyla, what is s bracket 0?
Z. And then it does not equal \0.
It equals Z, obviously.
So we increment n.
So now n is 1.
Now n is 1.
So what is s bracket 1 in Zamyla's name?
A and so forth, and we get to
Z-A-M-Y-L-A, then all zeros,
the so-called null character, or \0.
That, of course, does equal
\0, so the loop stops,
thereby leaving the total count or
value of n at what it previously was,
which was 6.
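The loop just walked through can be sketched as a standalone function. This assumes plain char * in place of CS50's string type, and the name my_strlen is hypothetical; it is an illustration of the idea, not the library's actual implementation.

```c
// Count characters by advancing n until s[n] is the null character.
int my_strlen(const char *s)
{
    int n = 0;
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```

For "Zamyla" the loop stops when n is 6, because s[6] is the hidden \0.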
So that's it.
Like all underneath
the hood, all we have
is memory laid out like this,
top to bottom, left to right,
and yet all of the functionality
we've been using for a week now
and henceforth just boils down to
some relatively simple primitives,
and if you understand
those primitives, you
can do anything you want using
the computer, both computationally
code-wise, but also memory-wise.
We can actually see, in fact, some of
the stuff we looked at two weeks ago as
follows.
Let me go ahead and open up
an example called ascii0.
Recall that ASCII is the mapping between
letters and numbers in a computer.
And notice what this
program's going to do.
Make-- let me go into this folder.
Make ascii0, ./ascii0, Enter.
The string shall be,
let's say, Zamyla, Enter.
Well, it turns out that
if you actually look up
the ASCII code for Zamyla's name, capital Z
is 90, lowercase a is 97, m is 109,
and so forth.
There are those characters,
and actually, we
can play the same game we did last week.
If I do this again on "HI," there's
your 72, and there's your 73.
Where is this coming from?
Well, now that I know how to
manipulate individual characters in strings,
notice what I can do.
I can get a string from the
user, just as we always have.
I can iterate over the length of
that string, albeit inefficiently
using strlen here.
And then notice this new feature today.
I can now convert one data type
to another, because a char,
a character is just eight bits, but
presented in the context of characters.
A byte is also just eight bits that you
could treat as an integer, a number.
It's totally context-sensitive.
If you use Photoshop, it's a graphic.
If you use a text program,
it's a message and so forth.
So you can encode--
change the context.
So notice here, s bracket i is, of
course, the i-th character of Zamyla's
name, so Z or A or M or whatever.
But I can convert that i-th character to
an integer doing what's called casting.
You can literally, in
parentheses, specify the data type
you want to convert one
data type to, and then
store it in exactly that data type.
So s bracket i-- convert it to a number.
Then store it in an actual number
variable, so I can print out its value.
So %c-- this is show me the character.
Show me the letter as by plugging in
the character, and then the letter--
sorry, the character and the number
that I've just converted it to.
And you don't actually
even have to be explicit.
This is called explicit casting.
Technically, we can do
this implicitly, too.
And the computer knows that
numbers are characters,
and characters are a number.
You don't have to be
so pedantic and even do
the explicit casting in parentheses.
You can just do it implicitly with data
types, and honestly, at this point,
I don't even need the variable.
I can get rid of this, and
down here, I can literally just
print the same thing
twice, but tell printf
to print the first in the
context of a character
and the second in the context of an
int, just treating the exact same bits
differently.
That's implicit casting.
And it just demonstrates
what we did in week 0
when we claimed that
letters are numbers,
and numbers can also be colors, and
colors can be images, and so forth.
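A short sketch of explicit versus implicit conversion from char to int. The function names code_explicit, code_implicit, and describe are hypothetical, and describe writes into a caller-supplied buffer rather than printing, purely so the result is easy to inspect.

```c
#include <stdio.h>

// Explicit cast: name the target type in parentheses.
int code_explicit(char c)
{
    return (int) c;
}

// Implicit conversion: assigning a char to an int needs no cast.
int code_implicit(char c)
{
    int n = c;
    return n;
}

// Format the very same value twice: once as a character (%c)
// and once as an integer (%i).
void describe(char c, char *out, size_t size)
{
    snprintf(out, size, "%c %i", c, c);
}
```

Calling describe with 'H' fills the buffer with "H 72": the same bits, shown in two contexts.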
Is this a question?
AUDIENCE: Would've
been useful for credit.
DAVID J. MALAN: Also, yes.
It all comes back to credit.
Yeah.
Indeed.
Other questions?
No.
All right, so what else can we
actually do with this appreciation?
So super simple feature that all
of us surely take for granted,
if we even use it anymore these days.
Google Docs, Microsoft Word,
and such can automatically
capitalize words for you these days.
I mean your phone can do it nowadays.
They just sort of
AutoCorrect your messages.
Well, how is that actually working?
Well, once you know that a string
is just a bunch of characters
back to back to back, and you know
that these characters have numbers
representing them, and like capital A is
65, and lowercase A is 97, apparently,
and so forth, we can
leverage these patterns.
If I go ahead and open
up this other example
here called capitalize0,
notice what this program is
going to do for me first by running it.
make capitalize0, ./capitalize0.
Let me go ahead and type in Zamyla's
name just as before, but now
it's all capital.
So this is a little extreme.
Hopefully, your phone is not
capitalizing every letter,
but you can imagine it capitalizing
just the first, if you wanted it.
So how does this work?
Well, let me go ahead and
open up this example here.
And so what we did--
so here, I'm getting a string from
the user, just as we always do.
Then I'm saying, after, just to
kind of format the output nicely.
Here, I'm doing a loop pretty
efficiently from i equals 0 up
to the length of the string.
And now notice this neat
application of logic.
It's a little cryptic,
certainly, at first glance.
But whoops.
And now it's gone.
And what am I doing exactly
with these lines of code?
Well, with every iteration of this
loop, I'm asking the question,
is the i-th character of s,
so the current character,
is it greater than or equal to
lowercase A, and is it less than
or equal to lowercase Z?
Put another way, how do you say
that more colloquially in English?
Is it lowercase, literally.
But this is the more programmatic
way of expressing, is it lowercase?
All right, if it is,
go ahead and do this.
Now this is a little funky, but
print out a character, specifically
the i-th character, but subtract
from that lowercase letter whatever
the difference is between little A and
big A. Now where did that come from?
So it turns out--
OK, capital A is 65.
Lowercase A is 97.
So the difference between those is 32.
And that's true for B, so capital
B is 66, and lowercase B is 98.
Still 32, and it repeats
for the whole alphabet.
So I could just do this.
If I know that lowercase letters
have bigger numbers, like 97, 98,
and I know that uppercase letters
have lower numbers, like 65, 66,
I can just literally subtract
off 32 from my lowercase letters.
As you point out, it's
a lowercase letter.
Subtract 32, and that
gives us what result?
The capitalized version.
It uppercases things for us.
But honestly, this feels a little
hackish that, like, OK, yes,
I can do the math correctly,
but you know what?
It's better practice, generally,
to abstract this away.
Don't get into the weeds of counting
how many characters are away
from each other.
Math is cheap and easy in the computer.
Let it do the math for you by
subtracting whatever the value
of capital A is from the value of
lowercase a. Or we could just write 32.
Otherwise, go ahead and just
print the character unchanged.
So in this case, the A-M-Y-L-A
in Zamyla's name got uppercased,
and everything else,
the Z, got left alone,
just by understanding what's going on
with how the computer represents characters.
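A sketch of the arithmetic approach, assuming plain char * in place of CS50's string type. The name capitalize_math is hypothetical, and unlike the program shown in lecture, which prints each character, this version changes the string in place so the result can be checked afterward.

```c
// Uppercase a string in place using the fixed distance between
// lowercase and uppercase letters in ASCII ('a' - 'A' is 32).
void capitalize_math(char *s)
{
    for (int i = 0; s[i] != '\0'; i++)
    {
        // Is the i-th character lowercase?
        if (s[i] >= 'a' && s[i] <= 'z')
        {
            s[i] = s[i] - ('a' - 'A');
        }
        // Otherwise leave the character unchanged.
    }
}
```

Given a writable copy of "Zamyla", the function leaves the Z alone and turns the rest into "AMYLA".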
But honestly, God, I don't want
to keep writing code like this.
Like, I'm never going to get this.
I'm new to programming, perhaps.
I'm never going to get this sort of
sequence of all the cryptic symbols
together, and that's OK, because we can
actually implement this same program
a little more easily,
thanks to functions
and abstractions that
others have written for us.
So in this program,
turns out I can simplify
the questions I'm asking by literally
calling a function that says, is lower.
And there's another
one called, is upper,
and there's bunches of others
that just literally are called,
is something or other.
So is lower takes an argument
like the i-th character of s,
and it just returns a
bool-- true or false.
How is it implemented?
Well, honestly, if we looked at the
code that someone else wrote decades ago
for is upper, odds are-- or is lower--
odds are he or she wrote code
that looks almost like this.
But we don't need to worry
about that level of detail.
We can just use his or her
function, but how do we do that?
Turns out that this function--
and you would only know this
by having been told or Googling
or reading a reference--
is in a library called ctype.h.
And you need the header file
called ctype.h in order to use it.
And we'll almost always point you
to references and documentation
to explain that to you.
Toupper is another feature, right?
This math-- like, my god.
I just want to uppercase a letter.
I don't want to really keep thinking
about how far apart uppercase letters
are from lowercase.
Turns out that in the
ctype library, there's
another function called toupper that
literally does the exact same thing
in the previous program we wrote.
And so that, too, is OK.
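The same idea with the ctype.h functions doing the work might look like the sketch below. The name capitalize_ctype is hypothetical, and the cast to unsigned char is an extra precaution not mentioned in the lecture: the ctype functions expect a value representable as unsigned char.

```c
#include <ctype.h>

// Uppercase a string in place, asking islower instead of comparing
// against 'a' and 'z', and toupper instead of subtracting 32.
void capitalize_ctype(char *s)
{
    for (int i = 0; s[i] != '\0'; i++)
    {
        if (islower((unsigned char) s[i]))
        {
            s[i] = toupper((unsigned char) s[i]);
        }
    }
}
```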
But you know what?
This feels a little verbose.
It would be nice if I could
really tighten this program up.
So how does toupper work?
Well, it turns out some of you might
be familiar with CS50 Reference
Online, our web-based
app that we have that
helps you navigate
available functions in C.
Turns out that all of the
data for that application
comes from an older command
line program that comes in Linux
and comes in the sandbox
called man, for manual.
And anytime you type "man" at the
command prompt, and then the name
of a function you're
interested in, if it exists,
it will tell you a little
something about it.
So if I go to toupper, man toupper,
I get slightly cryptic documentation
here.
But notice, toupper and
some other functions
convert uppercase or lowercase.
That's the summary.
Notice that in the synopsis,
the man page, so to speak,
is telling me what header
file I have to include.
Notice that under Synopsis,
it's also telling me
what the signature or
prototype is of the function.
In other words, the documentation in
man, the Linux programmer's manual,
is very terse.
So it's not going to hold your hand
in this black and white format.
It's just going to
convey, well, implicitly,
you better put this on top of your file.
And by the way, this is
how you use the function.
It takes an argument called c,
returns a value of type int.
Why is it int?
Let me wave my hands at that.
It effectively returns a
character for our purposes today.
And if we scroll down, OK, description.
Ugh, I don't really want to read
all of this, but OK, here we go.
If c is a lowercase letter, toupper
returns its uppercase equivalent,
if an uppercase representation
exists in the current locale.
That just means if it's punctuation,
it's not going to do anything.
Otherwise, it returns c. And
that's kind of the key detail.
If I pass it lowercase A, it's
going to give me capital A,
but if I pass it capital A,
what's it going to give me?
AUDIENCE: Capital A.
DAVID J. MALAN: Also, capital A. It
returns the original character, c.
That's the only detail I cared about.
When in doubt, read the manual.
And it might be a
little cryptic, and this
is why CS50 Reference takes
somewhat cryptic documentation
and tries to simplify it into
more human-friendly terms.
But at the end of the day, these
are the authoritative answers.
And if I or one of the staff
don't know, we literally
pull up the man page or CS50 Reference
to answer these kinds of questions.
Now what's the implication?
I don't need any of this.
I can literally get rid of
the condition and just let
toupper do all of the
legwork, and now my program
is so much more compact than
the previous versions were,
because I've read the documentation.
I know what the function does, and I
can let toupper uppercase something
or just pass it through unchanged.
It's better design, because we're
writing fewer lines of code that
are just as clear, and so we can
now actually tighten things up.
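Because toupper returns non-lowercase characters unchanged, the condition can go away entirely. A minimal sketch, with the hypothetical name capitalize_short and a defensive cast to unsigned char that the lecture does not discuss:

```c
#include <ctype.h>

// toupper uppercases lowercase letters and passes everything
// else through unchanged, so no islower check is needed.
void capitalize_short(char *s)
{
    for (int i = 0; s[i] != '\0'; i++)
    {
        s[i] = toupper((unsigned char) s[i]);
    }
}
```

Letters that are already uppercase, spaces, and punctuation all come back as they went in.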
Any questions on this
particular approach?
All right.
So we're getting very low level.
Now let's make these things more
useful, because clearly, other people
have solved some of
these problems for us,
as by having these functions and the
ctype library and the string library.
What more is there?
Well, recall that every time
we run Clang, or even run make,
we're typing multiple words
at the command prompt.
You're typing make hello or
make mario, a second word,
or you're typing
clang -o hello hello.c,
like lots of words at the prompt.
Well, it turns out that all
this time, you're using, indeed,
command line arguments.
But in C, you can write programs that
also accept words and numbers when
the user runs the program.
Think back, after all.
When you ran Mario,
you did ./mario, Enter.
You couldn't type any
more words at the prompt.
When you did credit,
you did ./credit, Enter.
No more words at the prompt.
You used get string or get
long to get more input, but not
at the command line.
And it turns out that we
can, relatively simply, in C,
but it's a little
cryptic at first glance.
Let me go ahead and--
let me go ahead and, here, pull up this
signature here, which looks like this.
This is the function that we're all used
to by now for writing a main function.
And up until now, we've said void.
Main doesn't take any inputs,
and indeed, it just runs.
But it turns out if you change your
existing programs or future programs,
not to say void, but to
say, int argc, string argv,
it's a little cryptic at first glance.
But what's a recognizable symbol now?
Yeah, there's brackets here.
So it turns out that every
time you write a program,
if you don't just say void, you
actually enable this feature
by writing int argc, string argv.
You can actually tell
Clang, you know what?
I want this program to accept one or
more words or numbers after the name
of the program, so I can do
./hello David, or ./hello Zamyla.
I don't have to wait for the
program to be running to use get string.
And just as with the earlier example,
where you were able to chart an array,
main is defined as taking an array,
called argv for historical reasons--
argument vector.
Vector means array.
Argument vector, bracket, closed
bracket just means this is--
this contains one or more words,
each of which is a string.
Argc is argument count,
so this is the variable
that main gets access to that
tells it how many arguments,
how many strings are actually in argv.
So how can we use this in a useful way?
Well, let me go ahead here
and open up the sandbox.
And let me go ahead and create a new
file called, say, argv0, argv0.c--
again, argument vector, just
a list or array of arguments.
And let me go ahead and, as usual,
include cs50.h, include stdio.h,
and then int main not void,
but int argc, string argv--
argv-- open bracket, closed bracket.
And even if that doesn't come
naturally at first, it will eventually.
And I'm going to do this.
If the number of arguments
passed in equals 2,
then I'm going to go ahead and do
this-- printf, hello %s, comma,
and here in the past, I've
typed a variable name.
And I now actually have
access to a variable.
Go ahead and do argv bracket 1.
Else, if the user does not
type, apparently, two words,
let me go ahead and just by default,
say, hello world, as we always have.
Now why-- what is this doing,
and how is it doing it?
Well, let's quickly run it.
So make-- whoops.
Make argv0, ./argv0, Enter, Hello World.
But if I do Hello--
or dot-- the program
would be better named
if we called it Hello,
but Zamyla, Enter.
Hello Zamyla.
If I change it to David,
now I have access to David.
If I had David Malan, no.
It doesn't support that.
So what's going on?
If you change main in
any program you write to take
these two arguments, argc
and argv, of type int
and then an array of strings,
argc tells you how many words
were typed at the prompt.
So if the human typed
two words, I presume
the first word is the name of
the program, dot slash argv0,
the second word is presumably my
name, if he or she is actually
providing their name at the prompt.
And so I print out argv bracket 1.
Not 0 because that's the name of
the program, but argv bracket 1.
Else, down here, if the human doesn't
provide just Zamyla, or just David,
or just one word more generally, I
just print the default, "Hello world."
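The decision argv0 makes can be sketched as two small functions, assuming char *argv[] in place of CS50's string argv[]. The names name_to_greet and greet are hypothetical; in the lecture's program this logic sits directly inside main, which would simply call greet(argc, argv).

```c
#include <stdio.h>

// argv[0] is the program's name, so one extra word at the prompt
// means argc is 2 and the word itself is argv[1].
const char *name_to_greet(int argc, char *argv[])
{
    if (argc == 2)
    {
        return argv[1];
    }
    // Zero extra words, or more than one: fall back to the default.
    return "world";
}

// Intended to be called from main(int argc, char *argv[]).
void greet(int argc, char *argv[])
{
    printf("hello, %s\n", name_to_greet(argc, argv));
}
```

Running with Zamyla as the only extra word greets Zamyla; running with David Malan falls back to the default, since argc is then 3.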
But what's neat about this now is
notice that argv is an array of strings.
What is a string?
It's an array of characters.
And so let's enter just one last piece
of syntax that gets kind of powerful
here.
Let me go ahead and do this.
Let me go ahead and, in a
new file here, argv1.c.
Let me go ahead and paste this in.
Close this.
Let me go ahead and do this.
Rather than do this logical
checking, let me do this, for--
let's say for int i gets 0.
i is less than argc--
i++.
Let's go ahead and, one per
line, print out every word
that the human just
typed, just to reinforce
that this is indeed what's going on.
So argv bracket i, save.
make argv1, Enter.
And now let's go ahead
and run this program--
./argv1 David Malan.
OK, you see all three words.
If we change it to Zamyla,
we see just those two words.
If we change it to Zamyla
Chan, we see those three words.
So we clearly have access to
all of the words in the array,
but let's take this one step further.
Rather than just print out every word
in a string, let's go ahead and do this.
For int j gets 0,
n equals the string length of
the current argument, like this--
j is less than n, j++--
oops, oops, oops-- j++.
Now let me go ahead and print out not
the full string, but let me do-- oops,
oops-- let me go ahead
and print out this--
not a string, but a character, argv
bracket i bracket j, like this.
All right.
So what's going on?
One, this outer loop, and let's comment
it, iterate over strings in argv.
This inner loop, iterate
over chars in argv bracket i.
So the outer loop iterates over
all of the strings in argv.
And the inner loop, using a
different variable, starting at 0,
iterates over all of the
characters in the ith
argument, which itself is a string.
So we can call string length on it.
And then we do this up until n,
which is the length of that string.
And then we print out each character.
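The nested loops described above can be sketched as follows, assuming char *argv[] in place of CS50's string argv[]. The function name print_chars and its return value (a count of characters printed) are hypothetical additions for illustration; the lecture's version does this work directly in main.

```c
#include <stdio.h>
#include <string.h>

// Print every character of every argument on its own line and
// return how many characters were printed.
int print_chars(int argc, char *argv[])
{
    int printed = 0;

    // Iterate over strings in argv
    for (int i = 0; i < argc; i++)
    {
        // Iterate over chars in argv[i]
        for (int j = 0, n = strlen(argv[i]); j < n; j++)
        {
            printf("%c\n", argv[i][j]);
            printed++;
        }
    }
    return printed;
}
```

The first bracket picks a string out of the array; the second picks a character out of that string.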
So just to be clear-- when I compile
argv1 and it complains, at first glance,
that I'm implicitly declaring library
function strlen, what's almost always
the solution when you do this wrong?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah.
So I forgot this, so include
string.h and help50 would
help with that as well.
Let's recompile with make argv1.
All right.
When I run argv1, of, say, Zamyla
Chan, what am I going to see?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah.
Is that the right intuition?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: I'm going
to see Zamyla Chan, but--
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: One character on each
line, including the program's name.
So in fact, let me scroll this
up so it's a little bigger.
Enter.
OK, it's a little stupid, the program,
but it does confirm that using arrays,
I have access not only to
the words, but I can kind of
have the second dimension.
And within each word, I can
get at each character within.
And we do this, again, just by using
not just single square brackets,
but double.
And again, just break this
down into the first principles.
What is this first bracket?
This is the ith argument,
the ith string in the array.
And then if you take it
further, with bracket j,
that gives you the j-th
character inside of this.
Now, who cares about any of
this kind of functionality?
Well, let me scroll back and
propose one application here.
So recall that CS is really
just problem solving.
But suppose the problem
that you want to solve
is to actually pass a
secret message in class
or send someone a secret
for whatever reason.
Well, the input to that
problem is generally
called plaintext, a message you
want to send to that other person.
You ideally want ciphertext
to emerge from it,
which is enciphered and scrambled,
somehow encrypted information
so that anyone in the room, like the
teacher, can't just grab the note
and read what you're sending to your
secret crush or love across the room,
or in any other context as well.
But the problem is that if the
message you want to send, say,
is our old friend Hi!,
with an exclamation point,
you can encode it in certain
contexts as just 72, 73, 33.
And I daresay most classes on campus
if you wrote on a piece of paper 72,
73, 33, passed it through the room,
and whatever professor intercepts it,
they're not going to know
what you're saying anyway.
But this is not a good system.
This is not a cryptosystem.
Why?
It's not secure.
[INAUDIBLE]
[INTERPOSING VOICES]
DAVID J. MALAN: Yeah.
Anyone has access to
this, right, so long
as you attend like week 1
or 0 of CS50, or you just
have general familiarity with ASCII.
Like this is just a code.
I mean ASCII is a system
that maps letters to numbers.
And anyone else who
knows this code obviously
knows what your message is,
because it's not a unique secret
to you and the recipient.
So that's probably not the best idea.
Well, you can be a little
more sophisticated.
And this is back--
actually, a photograph
from World War I of a message that
was sent from Germany to Mexico
that was encoded in a very similar way.
It wasn't using ASCII.
The numbers, as you can
perhaps glean from the photo,
are actually much larger.
But in this system, in a militaristic
context, there was a code book.
So similar in spirit to
ASCII, where you have
a column of numbers and a column of
letters to which they correspond,
a codebook more generally has
like numbers, and then maybe
even letters or whole words
that they correspond to,
sometimes thousands of them, like
literally a really big book of codes.
And so long as only, in this context
the Germans and the recipients,
the Mexicans, had access
to that same book,
only they could encrypt and decrypt, or
rather encode and decode information.
Of course, in this
very specific context--
you can read more about
this in historical texts--
this was intercepted.
This message, seemingly
innocuous, though definitely
suspicious looking
with all these numbers,
so therefore not innocuous, the British,
in this case actually, intercepted it.
And thanks to a lot of
efforts and cryptanalysis,
the Bletchley Park style code
breaking, albeit further back,
were they able to figure out what
those numbers represented in words
and actually decode the message.
And in fact, here's a
photograph of some of the words
that were translated
from one to the other.
But more on that in any
online or textual references.
Turns out in this poem too
there was a similar code, right?
So apropos of being in Boston
here, you might recall this one.
"Listen my children, and you shall hear
of the midnight ride of Paul Revere.
On the 18th of April
in '75, hardly a man
is now alive who remembers
that famous day and year.
He said to his friend, if the
British march by land or sea
from the town tonight,
hang a lantern
aloft in the belfry arch of the
North Church tower as a signal light,
one if by land, and two if by sea.
And I on the opposite shore
will be ready to ride and spread
the alarm through every Middlesex
village and farm for the country folk
to be up and to arm."
So it turns out some of that is
not actually factually correct,
but the one if by land and
the two if by sea code was
sort of an example of a one-time code.
Because the revolutionaries in
the American Revolution kind of
decided secretly among themselves
literally that-- we will put up one
light at the top of a church if
the British are coming by land.
And we will instead use two if the
British are instead coming by sea.
Like that is a code.
And you could write it down in a
book, and thus you have a code book.
But of course, as soon as
someone figures out that pattern,
it's compromised.
And so code books tend not to
be the most robust mechanisms
for encoding information.
Instead, it's better to use
something more algorithmic.
And wonderfully, in computer
science, this black box is,
as we keep saying,
the home of algorithms.
And in general, encryption is a
problem with inputs and outputs,
but we just need one more input.
The input is what's generally
called the key, or a secret.
And a secret might just be a number.
So for instance, if I
wanted my secret to be 1,
because we'll keep the example simple,
but it could really be any number.
And indeed, we saw with the
photograph a moment ago,
the Germans used much larger numbers than
this, albeit in the context of codes.
Suppose that you now want to send
a more private message to someone
across the room in a
class that says, I love you.
How do you go about encoding that
in a way that isn't just using ASCII
and isn't just using
some simple code book?
Well, let me propose that now that we
understand how strings are represented,
right-- we're about to make love
really, really lame and geeky--
so now that you know how to
express strings computationally,
well, let's just start
representing "I love you" in ASCII.
So I is 73.
L is 76.
O-V-E Y-O-U. That's just ASCII.
Should not send it this
way, because anyone
who knows ASCII is going
to know what you're saying.
But what if I enciphered this message,
I performed an algorithm on it?
And at its simplest,
an algorithm can just
be math-- simple
arithmetic, as we've seen.
So you know, let me just
use my secret key of 1.
And let me make sure that my crush knows
that I am using a secret value of 1.
So he or she also knows
to expect that value.
And before I send my message, I'm
going to add 1 to every letter.
So 73 becomes 74.
76 becomes 77.
80, 87, 70, 90, 80, 86.
Now this could just
be sent in the clear.
But then, I could actually
send it as a textual message.
So let's convert it back to ASCII.
74 is now J. 77 is now M. 80 is now P.
And you can perhaps see the pattern.
This message was, I love you.
And now, all of the letters
are off by one, I think.
I became J. L became M.
O became P, and so forth.
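Adding the key to every letter can be sketched as below. The function name encipher is hypothetical, and two details go beyond what was said above: the sketch wraps from Z back around to A so results stay within the alphabet, and it assumes the key is not negative. Spaces and punctuation are passed through unchanged.

```c
#include <ctype.h>

// Shift each letter forward by key positions (key >= 0 assumed),
// wrapping from Z back to A. Non-letters are left alone.
void encipher(char *s, int key)
{
    for (int i = 0; s[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char) s[i];
        if (isupper(c))
        {
            s[i] = 'A' + (c - 'A' + key) % 26;
        }
        else if (islower(c))
        {
            s[i] = 'a' + (c - 'a' + key) % 26;
        }
    }
}
```

With a key of 1, a writable copy of "I LOVE YOU" becomes "J MPWF ZPV", matching the letters worked out by hand.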
So now the claim would
be, cryptographically, I'm
going to send this
message across the room.
And now no one who has a code book
is going to be able to solve this.
I can't just steal the
book and decode it,
because now the key is
only up here, so to speak.
It's just the number
1 that he or she and I
had to agree upon in
advance that we would
use for sending our secret messages.
So if someone captures this message,
teacher in the room or whoever,
how would they even go about
decoding this or decrypting it?
Are there any techniques
available to them?
I daresay we can kind of
chip away at this love note.
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: What's that?
Guess and check.
OK, we could try all--
there's still kind of some spacing.
So you know honestly, we could do
like kind of a cryptanalysis of it,
a frequency attack.
Like, I can't think of
too many words in English
that have a single letter in them.
So what does J probably represent?
[INTERPOSING VOICES]
DAVID J. MALAN: I, probably.
Maybe A, but probably I. And
there's not too many other options.
So we've attacked one part
of the message already.
I see a commonality.
There's two what in here?
Two Ps. And I don't necessarily
know that that maps to O, but I do
know it's the same character.
So if I kind of continue this thoughtful
process or this trial and error,
and I figure out, oh,
what if that's an O?
And then that's an O.
And then wait a minute.
They're passing from one to another.
Maybe this says, I love you.
Like you actually can,
with some probability,
decrypt a message by doing
this kind of analysis on it.
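The guess-and-check idea from the audience can also be automated: with only 25 possible shifts, a program can simply try them all and let a human spot the readable one. This is a sketch under stated assumptions, not a method from the lecture; the names decipher and try_all_keys are hypothetical, and only uppercase letters are shifted.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Undo a shift of key (0 <= key <= 25 assumed) on uppercase letters,
// wrapping from A back to Z. The out buffer must hold strlen(in) + 1 bytes.
void decipher(const char *in, char *out, int key)
{
    size_t n = strlen(in);
    for (size_t i = 0; i < n; i++)
    {
        if (isupper((unsigned char) in[i]))
        {
            out[i] = 'A' + (in[i] - 'A' - key + 26) % 26;
        }
        else
        {
            out[i] = in[i];
        }
    }
    out[n] = '\0';
}

// Print the message under every possible key.
void try_all_keys(const char *ciphertext)
{
    char guess[256];
    if (strlen(ciphertext) >= sizeof guess)
    {
        return; // too long for this sketch's fixed buffer
    }
    for (int key = 1; key < 26; key++)
    {
        decipher(ciphertext, guess, key);
        printf("key %2i: %s\n", key, guess);
    }
}
```

For "J MPWF ZPV", the line for key 1 reads I LOVE YOU, which is why so small a key space offers little real protection.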
It's at least more secure
than the code book,
because you're not compromised
if the book itself is stolen.
And you can change the key
every time, so long as you
and the recipient actually
agree on something.
But at least we now have
this mechanism in place.
So with just the understanding
of what you can do with strings,
you can actually now do really
interesting domain-specific things
to them.
And in fact, back in the day, Caesar,
back in militaristic times literally
used a cipher quite like this.
And frankly, when you're the
first one to use these ciphers,
they actually are kind of secure,
even if they're relatively simple.
But hopefully, not just using a
key of 1, maybe 2, or 13, or 25,
or something larger.
But this is an example
of a substitution cipher,
or a rotational cipher where
everything's kind of rotating--
A's becoming B, B's becoming
C. Or you can kind of
rotate it even further than that.
Well, let's take a look
at one last example here
of just one other final
primitive of a feature
today, before we then go back high
level to bring everything together.
It turns out that printing
out error messages
is not the only way to signal
that something has gone wrong.
There's a new keyword, a new use
of an old keyword in this example,
that's actually a convention
for signaling errors.
So this is an example called exit.c.
It apparently wants the human to do
what, if you infer from the code?
AUDIENCE: Exit [INAUDIBLE].
DAVID J. MALAN: Yes.
Say again?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Well, it
wants the-- well, what
does it want the human to do
implicitly, based on the printf's here?
How should I run this program?
Yeah?
AUDIENCE: [INAUDIBLE]
just apply [INAUDIBLE].
DAVID J. MALAN: Yeah.
So for whatever reason,
this program implicitly
wants me to write exactly
two words at the prompt.
Because if I don't, it's going to yell
at me, missing command line argument.
And then it's going to
return 1, whatever that is.
Otherwise, it's going to
say, Hello, such and such.
So if I actually run this program--
let me go back over
here and do make exit--
oops-- in my directory, make exit.
OK, dot slash exit, enter, I'm
missing a command line argument.
All right, let me put Zamyla's name.
Oh, Hello Zamyla.
Let me put Zamyla Chan.
Nope, missing command line argument.
It just wants the one.
So in this case here,
I'm seeing visually the error
message, but it turns out
the computer is also signaling to
me what the so-called exit code is.
So long story short, we've already
seen examples last week of how
you can have a function return a value.
And we saw how [? Erin ?]
came up on stage,
and she returned to me a piece
of paper with a string on it.
But it turns out that
main is a little special.
If main returns a value like 1
or 0, you can actually see that,
albeit in a kind of a non-obvious way.
If I run exit, and I run it
correctly with Zamyla as the name,
if I then type echo, dollar sign,
question mark, of all things,
enter, I will then see exactly what main
returned with, which in this case is 0.
Now, let me try and be uncooperative.
If I actually run just dot
slash exit, with no word,
I see, missing command line argument.
But if I do the same cryptic command,
echo, dollar sign, question mark,
I see that main exited with 1.
Now, why is this useful?
Well, as we start to write
more complicated programs,
it's going to be a convention
to exit from main by returning
a non-zero value, if
anything goes wrong.
0 happens to mean everything went well.
And in fact, in all
of the programs we've
written thus far, if you
don't mention return anything,
main automatically for you returns 0.
And it has been all this time.
It's just a feature, so you don't
have to bother typing it yourself.
But what's nice about this,
or what's real about this,
is if on your Mac or PC, if you've ever
gotten an annoying error message that
says, error negative 29, system error
has occurred, or something freezes,
but you very often see
numbers on the screen, maybe.
Like those error codes actually tend
to map to these kinds of values.
So when a human is writing
software and something goes wrong
and an error happens, they
typically return a value like this.
And the computer has access to it.
And this isn't all that useful
for the human running the program.
But as your programs
get more complex, we'll
see that this is actually quite
useful as a way of signaling
that something indeed went wrong.
Whew.
OK, that's a lot of syntax
wrapped in some loving context.
Any questions before we
look at one final domain?
No?
All right.
So it turns out that we can answer the
"who cares" question in yet another way
too.
It turns out-- let me go ahead and open
up an example of our array again here--
that arrays can actually now be used
to solve problems more algorithmically.
And this is where life
gets more interesting.
Like we were so incredibly
in the weeds today.
And as we move forward
in the class, we're
not going to spend so
much time on syntax,
and dollar signs, and question marks,
and square brackets, and the like.
That's not the interesting part.
The interesting part is when we
now have these fundamental building
blocks, like an array, with
which we can solve problems.
So it turns out that
an array, you know, you
can kind of think of it
as a series of lockers,
a series of lockers that might
look like this, inside of which
are values-- strings, or
numbers, or chars, or whatnot.
But the lockers are an apt metaphor
because a computer, unlike us humans,
can only see and do one thing at a time.
It can open one locker and look
inside, but it can't kind of
take a step back, like we humans
can, and look at all of the lockers,
even if all of the doors are open.
So it has to be a more
deliberate act than that.
So what are the actual implications?
Well, all this time--
we had that phone book
example in the first week,
and the efficiency of that algorithm, of
finding Mike Smith in this phone book,
all assumed what feature
of this phone book?
AUDIENCE: That it's
ordered alphabetically.
DAVID J. MALAN: That it
was ordered alphabetically.
And that was a huge plus, because
then I could go to the middle,
and I could go to the middle
of the middle, and so forth.
And that was an algorithmic possibility.
On our phones, if you
pull up your contacts,
you've got a list of first names, or
last names, all alphabetically sorted.
That is because, guess what
data structure or layout
your phone probably uses
to store your contacts?
It's an array of some sort, right?
It's just a list.
And it might be displayed
vertically, instead of horizontally,
as I've been drawing it today.
But it's just values that are back,
to back, to back, to back, to back,
that are actually sorted.
But how did they actually
get into that sorted order?
And how do you actually find values?
Well, let's consider what
this problem is actually
like for a computer, as follows.
Let me go ahead here.
Would a volunteer mind
joining us up here?
I can throw in a free stress ball.
OK, someone from the back?
OK, come on up here.
Come on.
What's your name?
ERIC: Eric.
DAVID J. MALAN: Aaron.
All right.
So Aaron's going to come on up.
And--
ERIC: Eric.
DAVID J. MALAN: I'm sorry?
Oh, Eric.
Nice to meet you.
All right.
Come on over here.
So Eric, now normally, I would
ask you to find the number 23.
But seeing as that's a little easy,
can you go ahead and just find us
the number 50 behind these doors,
or really these yellow lockers?
8?
Nope.
42?
Nope.
OK.
Pretty good.
That's three, three out of seven.
How did you get it so quickly?
ERIC: I guessed.
DAVID J. MALAN: OK, so he guessed.
Is that the best algorithm
that Eric could have used here?
ERIC: Probably not.
DAVID J. MALAN: Well, I don't know.
Yes?
No?
AUDIENCE: Yeah.
DAVID J. MALAN: Why?
Why yes?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: He has
no other information.
So yes, like that was
the best you can do.
But let me give you a
little more information.
You can stay here.
And let me go ahead and
reload the screen here.
And let me go ahead and pull
up a different set of doors.
And now suppose that, much like the
phone book, and much like the phones
are sorted, now these doors are sorted.
And find us the number 50.
All right.
So good.
What did you do that time?
AUDIENCE: Well, [INAUDIBLE].
It was 50 is 116.
So I just--
DAVID J. MALAN: Right.
So you jumped to the middle,
initially, and then to the right half.
And then technically-- so we're
technically off by 1, right?
Because like binary search would
have gone to the middle of the--
that's OK, but very well done to Eric.
Here, let me at least reinforce
this with a stress ball.
So thank you.
Very well done.
So with that additional
information, as you know,
Eric was able to do better because the
information was sorted on the screen.
But he only had one insight
to a locker at a time,
because only by revealing what's
inside can he actually see it.
So this seems to
suggest that once you do
have this additional information in
Eric's example, in your phone example,
in the phone book example, you open up
possibilities for much, much, much more
efficient algorithms.
But to get there, we've kind of been
deferring this whole time in class
how you actually sort these elements.
And if you wouldn't mind-- and this way,
we'll hopefully end on a more energized
note here because I know we've
been in the weeds for a while--
can we get like eight volunteers?
OK, so 1, 2, 3, 4-- how about
5, 6, 7, 8, come on down.
Oh, I'm sorry.
Did I completely overlook the front row?
OK.
All right, next time.
Next time.
Come on down.
Oh, and Colton, do you mind
meeting them over there instead?
All right.
Come on up.
What's your name?
[? CAHMY: ?] [? Cahmy. ?]
DAVID J. MALAN: [? Cahmy? ?] David.
Right over there.
What's your name?
MATT: Matt.
DAVID J. MALAN: Matt?
David.
[? JUHE: ?] [? Juhe. ?]
DAVID J. MALAN: [? Juhe? ?] David.
MAX: Max.
DAVID J. MALAN: Max, nice to meet you.
JAMES: James.
DAVID J. MALAN: James, nice to see you.
Here, I'll get more chairs.
What's your name?
PEYTON: Peyton.
DAVID J. MALAN: Peyton?
David.
And two more.
Actually, can I have you
come down to this end here?
What's your name?
ANDREA: Andrea.
DAVID J. MALAN: Andrea, nice to see you.
And your name?
[? PICCO: ?] [? Picco. ?]
DAVID J. MALAN: [? Picco, ?] David.
Nice to see you.
OK, Colton has a T-shirt for each
of you, very Harvard-esque here.
And each of these shirts, as you're
about to see, has a number on it.
And that number is--
well, go ahead put them
on, if you wouldn't mind.
OK, thank you so much.
So I daresay we've arranged our humans
much like the lockers in an array.
Like we have humans back,
to back, to back, to back.
But this is actually both a
blessing and a constraint,
because we only have eight chairs.
So there's really not much room here, so
we're confined to just this space here.
And I see we have a 4,
8, 5, 2, 3, 1, 6, 7.
So this is great.
Like they are unsorted.
By definition, it's pretty random.
So that's great.
So let's just start off like this.
Sort yourselves from 1 to 8, please.
OK.
All right.
Well, what algorithm was that?
[LAUGHTER]
AUDIENCE: Look around, figure it out.
DAVID J. MALAN: Look
around, figure it out.
OK, well--
MATT: Human ingenuity.
DAVID J. MALAN: Human ingenuity?
Very well done.
So can we-- well, what
was like a thought
going through any of your minds?
MATT: Find a chair and sit down.
DAVID J. MALAN: Find the chair--
find the right chair.
So go to a location.
Good.
So like an index location, right?
Arrays have indices, so to speak--
0, 1, 2, all the way up to 7.
And even though our shirts
are numbered from 1 to 8,
you can think in terms of 0 to 7.
So that was good.
Anyone else?
Other thoughts?
[? CAHMY: ?] I mean, this is
something we implicitly think of,
but no one told us that it
was ordered right to left.
Like we could have
done it left to right.
DAVID J. MALAN: OK.
Absolutely.
Could have gone from right to
left, instead of left to right.
But at least we all
agreed on this convention
too, so that was in your mind.
OK.
So good.
So we got this sorted.
Go ahead and re-randomize
yourself, if you could.
And what algorithm was this?
Just random awkwardness?
OK, so that's fine.
So it looks pretty random.
That will do.
Let's see if we can now
reduce the process of sorting
to something a little more algorithmic
so that, one, we can be sure
we're correct and not just kind of get
lucky that everyone kind of figured it
out and no one was
left out, and two, then
start to think about how
efficient it is, right?
Because if we've been gaining so
much efficiency for the phone book,
for our contacts, for
Eric coming up,
we really should have been
asking the whole time,
sure, you save time with binary
search and divide and conquer,
but how much did it cost
you to get to a point
where you can use binary
search and divide and conquer?
Because sorting, if it's super, super,
super expensive and time-consuming,
maybe it's a net negative.
And you might as well just
search the whole list,
rather than ever sort anything.
All right.
So let's see here.
6 and 5, I don't like this.
Why?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: 6 is
supposed to come after 5.
And so, can we fix this, please?
All right.
And then let's see.
OK, 6 and 1-- ugh,
don't really like this.
Yeah, can we fix this?
Very nice.
6 and 3, OK, you really got the
short end of the stick here.
So 6 and 3, could we fix this?
And 6-- yeah, OK.
Ooh, OK, 6 and 7-- good.
All right, so that's pretty good.
7 and 8, nice.
8 and 4, sorry.
Could we switch here?
All right.
And then 8 and 2?
OK, could we switch here?
OK.
And let me ask you a
somewhat rhetorical question.
OK, am I done?
OK, no.
Obviously not, but I did
fix some problems, right?
I fixed some transpositions,
numbers being out of order.
And in fact, I-- what's your name again?
[? CAHMY: ?] [? Cahmy. ?]
DAVID J. MALAN: [? Cahmy, ?] kind of
bubbled to the right here, if you will.
Like you were kind of farther
down, and now you're over here.
And like the smaller
numbers, kind of-- yeah 1.
Like, my god, like he kind
of bubbled his way this way.
So things are percolating,
in some sense.
And that's a good thing.
And so you know what?
Let me try to fix some
remaining problems.
So 1 and 5-- good.
Oh 3 and 5, could you switch?
5 and 6, OK.
6 and 7?
7 and 4, could you switch?
OK.
And 7 and 2, could you switch?
And now, I don't have to
speak with [? Cahmy ?] again,
because we know you're
in the right place.
So I actually don't have
to do quite as much work
this time, which is kind of nice.
But am I done?
No, obviously not.
But what's the pattern now?
Like what's the fundamental primitive?
If I just compare pairwise
humans and numbers,
I can slightly improve
the situation each time
by just swapping them, swapping them.
And each time now--
I'm sorry, [? Picco ?]
is in number 7's place.
I don't have to talk to him
anymore, because he's now bubbled
his way all the way up to the top.
So even though I'm doing the
same thing again and again,
and looping again and again
isn't always the best thing,
so long as you're looping fewer and
fewer times, I will eventually stop,
it would seem.
Because 6 is going to eventually
go in the right place, and then 5,
and then 4, and so forth.
So if we can just finish this algorithm.
Good.
Not good.
OK, 6 and 2, not good.
If you could swap?
OK, and what's your name again?
PEYTON: Peyton.
DAVID J. MALAN: Peyton is
now in the right place.
I have even less work now ahead of me.
So if I can just continue this process--
1 and 3, 3 and 5, 4 and
5, OK, and then 2 and 5.
And then, what's your name again?
MATT: Matt.
DAVID J. MALAN: Matt is
now in the right place.
Even less work.
We're almost there.
1 and 3, 3 and 4, 4 and
2, if you could swap.
OK, almost done.
And 1 and 3, 3 and 2, if you could swap.
Nice.
So this is interesting.
It would seem that-- you
know, in the first place,
I kind of compared
seven pairs of people.
And then the next time I went through,
I compared how many pairs of people
maximally?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Just six, right?
Because we were able to
leave [? Cahmy ?] out.
And then we were able to leave
[? Picco ?] out, and then Peyton.
And so the number of comparisons I
was doing was getting fewer and fewer.
So that feels pretty good.
But you know what?
Before we even analyze that, can
you just randomize yourselves again?
Any human algorithm is fine.
Let's try one other approach, because
this feels kind of non-obvious, right?
I was fixing things, but I had to
keep fixing things again and again.
Let me try to take a bigger
bite out of the problem
this time by just selecting
the smallest person.
OK, so your name again is?
[? JUHE: ?] [? Juhe. ?]
DAVID J. MALAN: [? Juhe, ?] number
2-- that's a pretty small number,
so I'm going to remember that
in sort of a mental variable.
4?
No, you're too big.
Too big.
Oh, what was your name again?
JAMES: James.
DAVID J. MALAN: James.
James is a 1.
That's pretty nice.
Let me keep checking.
OK, James, in my mental
variable is the smallest number.
I know I want him at the beginning.
So if you wouldn't mind coming with me.
And I'm sorry, we don't
have room for you anymore.
If you could just-- oh, you know what?
Could you all just shuffle down?
Well, hm, I don't know if I like that.
That's a lot of work, right?
Moving all these values,
let's not do that.
Let's not do that.
Number 2, could you
mind just going where--
where--
JAMES: It's James.
DAVID J. MALAN: --James was?
OK, so I've kind of made the
problem a little worse in that,
now, number 2 is farther
away from the goal.
But I could have gotten lucky,
and maybe she was number 7 or 8.
And so let me just claim that, on
average, just evicting the person
is going to kind of be
a wash and average out.
But now James is in the right place.
Done.
Now I have a problem that's of size 7.
So let me select the
next smallest person.
4 is the next smallest, not
8, not 5, not 7-- ooh, 2.
Not 3, 6.
OK, so you're back in the game.
All right, come on back.
And can we evict number 4?
And on this algorithm,
if you will, I just
iteratively select
the smallest person.
I'm not comparing everyone in quite the
same way and swapping them pairwise,
I'm doing some more
macroscopic swaps.
So now I'm going to look for
the next smallest, which is 3.
If you wouldn't mind
popping around here?
[? Cahmy, ?] we have to,
unfortunately, evict you,
but that works out to our favor.
Let me look for the next
smallest, which is 4.
OK, you're back in.
Come on down.
Swap with 5.
OK, now I'm looking for 5.
Hey, 5, there you are.
OK.
So go here.
OK, looking for 6.
Oh, 6, a little bit of a shuffle.
OK.
And now looking for 7.
Oh, 7, if you could go here.
But notice, I'm not going back.
And this is what's important.
Like my steps are getting
shorter and shorter.
My remaining steps are
getting shorter and shorter.
And now we've actually
sorted all of these humans.
So two fundamentally different ways,
but they're both comparative in nature,
because I'm comparing
these characters again,
and again, and again, and swapping
them if they're out of order.
Or at a higher level, going
through and swapping them again,
and again, and again.
But how many steps am
I taking each time?
Even though I was doing fewer and
fewer and I wasn't doubling back,
the first time, I was doing
like n minus 1 comparisons.
And then I went back here.
And in the first algorithm, I
kind of stopped going as far.
In the second algorithm, I
just didn't go back as far.
So it was just kind of a different
way of thinking of the problem.
But then I did what?
Like seven comparisons?
Then six, then five, then four,
then three, then two, then one.
It's getting smaller, but how
many comparisons is that total?
I've got like n people,
n being a number.
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: It's not
as bad as factorial.
We'd be here all day long.
But it is big.
It is big.
Let's go-- a round of applause,
if we could, for our volunteers.
You can keep the shirts, if
you'd like, as a souvenir.
[APPLAUSE]
Thank you, very much.
Let me see if we can't just kind of
quantify that-- thank you, so much--
and see how we actually
got to that point.
If I go ahead and pull up not our
lockers, but our answers here,
let me propose that what we just
did was essentially two algorithms.
One has the name bubble.
And I was kind of deliberately kind
of shoehorning the word in there.
Bubble sort is just that
comparative sort, pair by pair,
fixing tiny little mistakes.
But we needed to do it
again, and again, and again.
So those steps kind of add up, but
we can express them as pseudocode.
So in pseudocode-- and you can
write this any number of ways--
I might just do the following.
Just keep doing the following,
until there's no remaining swaps--
for i from 0 to n - 2, where
n is the total number of humans.
Going up to n - 2 means going from
that person to this person,
because I want to compare him or
her against the person next to them.
So I don't want to accidentally do this.
That's why it's n - 2 at the end here.
Then I want to go ahead and, if the
ith and (i + 1)th elements are out
of order, swap them.
So that's why I was asking our
human volunteers to exchange places.
And then just keep doing that,
until there's no one left to swap.
And by definition, everyone is in order.
Meanwhile, the second algorithm has the
conventional name of selection sort.
Selection sort is literally
just that, where you actually
select the smallest person, or number
of interest to you, intuitively,
again and again.
And the number keeps
getting bigger, but you
start ignoring the people who
you've already put into place.
So the problem, similarly, is
getting smaller and smaller.
Just like in bubble sort, it was
getting more and more sorted.
The pseudocode for selection
sort might look like this.
For i from 0 to n - 1,
so that's 0 in an array.
And this is n - 1.
Just keep looking for the smallest
element between those two chairs,
and then pull that person out.
And then just evict
whoever's there-- swap them,
but not necessarily adjacently,
just as far away as is necessary.
And in this way, I keep turning
my back on more and more people
because they are then in place.
So two different
framings of the problem,
but it turns out they're actually both
the same number of steps, give or take.
It turns out they're roughly
the same number of steps,
even though it's a different
way of thinking about it.
Because if I think about bubble sort,
the first iteration, for instance,
what just-- actually, well, let's
consider selection sort even.
In selection sort, how many
comparisons did I have to do?
Well, once I found my
smallest element, I
had to compare them
against everyone else.
So that's n - 1 comparisons
the first time.
So n - 1 on the board.
Then I can ignore them,
because they're behind me now.
So now I have how many
comparisons left out of n people?
n - 2, because I subtracted one.
Then again, n - 3, then n - 4, all the
way down to just one person remaining.
So I'll express that sort of
generally, mathematically, like this.
So n - 1 plus n - 2 plus whatever plus
one final comparison, whatever that is.
It turns out that if you actually
read the back of the math
book or your physics textbooks
where they have those little cheat
sheets as to what these recurrences are,
turns out that n - 1 plus n - 2 plus n - 3
and so forth can be
expressed more succinctly
as literally just n
times (n - 1) divided by 2.
And if you don't recall that, that's OK.
I always look these things up as well.
But that's true-- fact.
So what does that equal out to?
Well, it's like n squared minus
n, if you just multiply it out.
And then if you divide
by 2, then it's
n squared divided by 2 minus n over 2.
So that's the total number of steps.
And I could actually plug this in.
We could plug in 8, do the math, and
get the total number of comparisons
that I was verbally
kind of rattling off.
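Written out symbolically, the sum and its closed form look like this, followed by the case of the eight volunteers. The first line is the standard arithmetic-series identity; the second simply plugs in n = 8.

```latex
\begin{aligned}
(n-1) + (n-2) + \cdots + 1 &= \frac{n(n-1)}{2} = \frac{n^2}{2} - \frac{n}{2} \\
n = 8:\quad 7 + 6 + 5 + 4 + 3 + 2 + 1 &= \frac{8 \cdot 7}{2} = 28
\end{aligned}
```

So the seven, six, five, and so on comparisons counted off on stage add up to 28 in total.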
So is that a big deal?
Hm, it feels like it's on
the order of n squared.
And indeed, a computer
scientist, when assessing
the efficiency of an algorithm,
tends not to care too much
about the precise values.
All we're going to care
about is the biggest term.
What's the value in the
formula that you come up
with that just dominates the
other terms, so to speak,
that has the biggest effect, especially
as n is getting larger and larger?
Now, why is this?
Well, let's just do sort of
proof by example, if you will.
If this is the expression,
technically, but I
claim that, ugh, it's
close enough to say
on the order of, big O of n squared,
so to speak, let's use an example.
If there's a million people
on stage, and not just eight,
that math works out to
be like a million squared
divided by 2 steps minus a
million divided by 2, total.
So what does that
actually work out to be?
Well, that's 500 billion minus 500,000.
And what does that work out to be?
Well, that's 499 billion,
999 million, 500,000.
That feels pretty darn
close to like n squared.
I mean, that's a drop in the bucket
to subtract 500,000 from 500 billion.
So you know what?
Eh, it's on the order of n squared.
It's not precise, but it's in
that general order of magnitude,
so to speak.
And so this symbol, this
capital O, is literally a symbol
used in computer science
and in programming
to just kind of describe
with a wave of the hand,
but with some good intuition and arithmetic,
how fast or slow your algorithm is.
And it turns out there's different
ways to evaluate algorithms
with just different similar formulas.
n squared happens to be how much
time both bubble sort and selection
sort take.
If I literally count
up all of the work we
were doing on stage
with our volunteers, it
would be roughly n squared, 8
squared, or 64 steps, give or take,
for all of those humans.
And that would be notably off.
There's a good amount
of rounding error there.
But if we had a million
volunteers on stage,
then the rounding error
would be pretty negligible.
But we've actually seen some of
these other orders of magnitude,
so to speak, before.
For instance, when we counted someone,
or we searched for Mike Smith one page
at a time, we called
that a linear algorithm.
And that was big O of n.
So it's on the order of n steps.
It's 1,000.
Maybe it's 999.
Whatever.
It's on the order of n steps.
The [? twosies ?] approach was twice
as fast, recall-- two pages at a time.
But you know what?
That's still linear, right?
Like two pages at a time?
Let me just wait till next
year when my CPU is twice
as fast, because Intel and companies
keep speeding up computers.
The algorithm is fundamentally the same.
And indeed, if you think
back to the picture we drew,
the shapes of those curves
were indeed the same.
That first algorithm, finding Mike
one page at a time looked like this.
Second algorithm finding
him looked like this.
Only the third algorithm, the divide
and conquer, splitting the phone book
was a fundamentally different shape.
And so even though we didn't use
this fancy phrasing a couple of weeks
ago, these first algorithms, one page
at a time, two pages at a time, eh,
they're on the order of n.
Technically, yes, n
versus n divided by 2,
but we only care about the
dominating factor, the variable n.
We can throw away everything
in the denominator,
and we can throw away everything that's
smaller than the biggest term, which
in this case is just n.
And I alluded to this two weeks ago--
logarithmic.
Well, it turns out that any time you
divide something again, and again,
and again, you're leveraging
a logarithmic type function,
log base 2 technically.
But on the order of log
n is a common one as well.
The beautiful algorithms are these--
literally, one step, or technically
constant number of steps.
For instance, like what's an
algorithm that might be constant time?
Open phone book.
OK, one step.
Doesn't really matter
how many pages there are,
I'm just going to open the phone book.
And that doesn't vary
by number of pages.
That might be a constant
time algorithm, for instance.
So those are the lowest you can go.
And then there's somewhere
even in between here
that we might aspire to with
certain other algorithms.
So in fact, let's just
see if-- just a moment--
let's just see if we can do
this a little more succinctly.
Let's go ahead and use arrays in just
one final way, using merge sort.
So it turns out, using
an array, we can actually
do something pretty powerful,
so long as we allow ourselves
a couple of arrays.
So again, when we just had sorting
with bubble sort and selection sort,
we had just one array.
We had eight chairs
for our eight people.
But if I actually allowed myself
like 16 chairs, or even more,
and I allowed these
folks to move a bit more,
I could actually do even
better than that using arrays.
So here's some random numbers that we'll
just do visually, without any humans.
And they're in an array, back,
to back, to back, to back.
But if I allow myself
a second array, I'm
going to be able to shuffle these
things around and not just compare them,
because it was those comparisons and
all of my footsteps in front of them
that really started
to take a lot of time.
So here's my array.
You know what?
Just like the phone book-- that
phone book example got us pretty far
in the first week--
let me do half of the problem at a time
and then kind of combine my answer.
So here's an array--
4, 2, 7, 5, 6, 8, 3,
1-- randomly sorted.
Let me go ahead and
sort just half of this,
just like I searched for Mike initially
in just half of the phone book.
So 4, 2, 7, 5-- not sorted.
But you know what?
This feels like too big
of a problem, still.
Let me sort just the left
half of the left half.
OK, now it's a smaller problem.
You know what?
4 and 2, still out of order.
Let me just divide this list of two
into two tiny arrays, each of size 1.
So here's a mini-array of size 1,
and then another one of like size
7, but they're back
to back, so whatever.
But this array of size 1, is it sorted?
AUDIENCE: No.
DAVID J. MALAN: I'm sorry?
AUDIENCE: No.
DAVID J. MALAN: No?
If this array has just one
element and that element is 4--
AUDIENCE: There's only
one thing you can do.
DAVID J. MALAN: Yes, then
it is sorted, by definition.
All right, so done.
Making some progress.
Now, let me kind of mentally rewind.
Let me sort the right
half of that array.
So now I have another array of size 1.
Is this array sorted?
Yeah, kind of stupidly.
We don't really seem
to be doing anything.
We're just making claims.
But yes, this is sorted.
But now, this was the original half.
And this half is sorted.
This half is sorted.
What if I now just kind of
merge these sorted halves?
I've got two lists of size 1--
4 and 2.
And now if I have extra storage
space, if I had like extra benches,
I could do this a little better.
Why don't I go ahead and merge
these two as follows?
2 will go there.
4 will go there.
So now I've taken two sorted lists
and made one bigger, more sorted list
by just merging them together,
leveraging some additional space.
Now, let me mentally rewind.
How did I get to 4 and 2?
Well, I started with the left half,
then the left half of the left half.
Let me now do the right half
of the left half, if you will.
All right, let me divide this again.
7, list of size 1, is it sorted?
Yes, trivially.
5, is it sorted?
Yes.
7 and 5, let's go ahead
and merge them together.
5 is, of course, going to go here.
7, of course, is going to go here.
OK.
Now where do we go?
We originally sorted the left half.
Let's go sort the right-- oh, right.
Sorry.
Now, we have the left half.
And the right half of
the left half are sorted.
Let's go ahead and merge these.
We have two lists now of size 2--
2, 4 and 5, 7, both of which are sorted.
If I now merge 2, 4 and 5, 7,
which element should come first
in the new longer list, obviously?
2.
And then 4, then 5, and then 7.
That wasn't much of anything.
But OK, we're just using a
little more space in our array.
Now what comes next?
Now, let's do the right half.
Again, we started by taking the
whole problem, doing the left half,
the left half of the left half, the left
half of the left half of the left half.
And now we're going back
in time, if you will.
So let's divide this into two
halves, now the left half into two
halves still.
6 is sorted.
8 is sorted.
Now I have to merge them--
6, 8.
What comes next?
Right half-- 3 and 1.
Well, left half is sorted,
right half is sorted--
1 and 3.
All right, now how do I merge these?
6, 8, 1, 3, which element
should obviously come first?
1, then 3, then 6, then 8.
And then lastly, I have
two lists of size four.
Let me give myself a little
more space, one more array.
Now let me go ahead and put
1, and 2, and 3, and 4, and 5,
and 6, and 7, and 8.
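The whole walkthrough (sort the left half, sort the right half, merge the two) might look like this in C. It is a sketch under stated assumptions: the names `merge_sort` and `sort_range` are invented for this example, and instead of a fresh array per level as in the demo, it uses one caller-supplied scratch array and copies each merged run back.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: sort values[lo..hi) using scratch as extra space.
static void sort_range(int values[], int scratch[], size_t lo, size_t hi)
{
    // A list of size 0 or 1 is trivially sorted.
    if (hi - lo < 2)
    {
        return;
    }

    size_t mid = lo + (hi - lo) / 2;
    sort_range(values, scratch, lo, mid); // sort the left half
    sort_range(values, scratch, mid, hi); // sort the right half

    // Merge the two sorted halves into the scratch space.
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
    {
        scratch[k++] = (values[i] <= values[j]) ? values[i++] : values[j++];
    }
    while (i < mid)
    {
        scratch[k++] = values[i++];
    }
    while (j < hi)
    {
        scratch[k++] = values[j++];
    }

    // Copy the merged run back so the caller sees it in values.
    memcpy(values + lo, scratch + lo, (hi - lo) * sizeof(int));
}

// Sort n ints in values; scratch must have room for n ints.
void merge_sort(int values[], int scratch[], size_t n)
{
    sort_range(values, scratch, 0, n);
}
```

Run on the array from the demo, 4, 2, 7, 5, 6, 8, 3, 1, this leaves `values` holding 1 through 8 in order.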
What just happened?
Because it actually happened a lot
faster, even though we were doing this
all verbally.
Well notice, how many times did
each number change locations?
Literally three, right?
Like one, two, three, right?
It moved from the original array, to the
secondary array, to the tertiary array,
to the fourth array,
whatever that's called.
And then it was ultimately in place.
So each number had to move
one, two, three spots.
And then how many numbers are there?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Well, they were
already in the original array.
So how many times do they have to move?
Just one, two, three.
So how many total numbers
are there, just to be clear?
There's eight.
So 8 times 3.
So let's generalize this.
If there's n numbers,
and each time we moved
the numbers we did like half of
them, then half, then half, well,
how many times can you divide 8 by 2?
8 goes to 4.
4 goes to 2.
2 goes to 1.
And that's why we bottomed out
at one element, lists of size 1.
So it turns out whenever you divide
something by half, by half, by half,
what is that function or formula?
Not power, that's bad.
That's the other direction.
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: It's a logarithm.
So again, logarithm is just
a mathematical description
for any function in which you keep dividing
something again, and again, and again.
In half, in half, in half, in third,
in third, in third, whatever it is,
it just means division by the
same proportional amounts again,
and again, and again.
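As a small illustration of that repeated halving, here is a sketch in C that counts how many times a number can be divided by 2 before it reaches 1. The function name `halvings` is an assumption made for this example.

```c
// Hypothetical helper: count how many times n can be halved (using
// integer division) before it reaches 1. For powers of 2 this equals
// log base 2 of n.
unsigned int halvings(unsigned int n)
{
    unsigned int count = 0;
    while (n > 1)
    {
        n /= 2;
        count++;
    }
    return count;
}
```

For 8 it returns 3, matching 8 goes to 4, 4 goes to 2, 2 goes to 1.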
And so if we move the numbers
three times, or more generally log
of n times, which again just
means you divided n things again,
and again, and again,
you just call that log n.
And there's n numbers,
so n numbers moved
log n times, the total
arithmetic here in question
is one of those other values on
our little cheat sheet, which
looked like this.
In our other cheat sheet, recall that
we had formulas that looked like this,
not just n squared and n, and log n,
and 1, we have this one in the middle--
n times log n.
So again, we're kind
of jumping around here.
But again, each number
moves log n places.
There's n total numbers.
So n times log n is just,
by definition, n log n.
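Written out as arithmetic, assuming the base-2 logarithm since we halve at each step, the count for the eight numbers in the demo is:

```latex
\[
\underbrace{n}_{\text{numbers}} \times \underbrace{\log_2 n}_{\text{moves per number}}
= n \log_2 n,
\qquad
n = 8:\quad 8 \times \log_2 8 = 8 \times 3 = 24 .
\]
```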
But why is this sorted this way?
Well log n, recall from week
0 with the phone book example,
the green curve is definitely smaller
than n. n was the straight lines,
log n was the green curved one.
So this indeed belongs in between,
because this is n times n.
This is n.
This is n times
something smaller than n.
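To see that ordering with actual numbers, here is a rough sketch in C that computes the three step counts for a given n. The function names are assumptions made for this example, and the logarithm is taken base 2 and rounded down.

```c
// Hypothetical helper: floor of log base 2 of n, for n >= 1.
static unsigned long log2_floor(unsigned long n)
{
    unsigned long count = 0;
    while (n > 1)
    {
        n /= 2;
        count++;
    }
    return count;
}

// Rough step counts for the three running times on the cheat sheet.
unsigned long steps_n(unsigned long n)
{
    return n;
}

unsigned long steps_n_log_n(unsigned long n)
{
    return n * log2_floor(n);
}

unsigned long steps_n_squared(unsigned long n)
{
    return n * n;
}
```

For n = 8 these give 8, 24, and 64, so n log n lands between n and n squared, and the gap widens as n grows.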
So what's the actual implication?
Well, if we were to run
these algorithms side by side
and actually compare them
with something like this--
let me go ahead and compare these
algorithms using this demo here--
if I go ahead and hit play, we'll
see that the bars in this chart
are actually horizontal.
And the small bars
represent small numbers,
large bars represent large numbers.
And then each of these is going to
run a different algorithm-- selection
sort on the left, bubble
sort in the middle,
merge sort, as we'll now
call it, on the right.
And here's how long each of
them takes to sort those values.
Bubble's still going.
Selection's still going.
And so that's the appreciable
difference, albeit with a small demo,
between n squared and
something like n log n.
And so what have we done here?
We've really, really, really got into
the weeds of what arrays can actually
do for us and what the relationships
are with strings, because all of it
kind of reduces to just things being
back, to back, to back, to back.
But now we can kind
of come back, and we'll
continue along this
trajectory next time to be
able to talk at a much higher level
about what's actually going on.
And we can now take this
even further, by applying
other sort of forms of media to
these same kinds of questions.
And we'll conclude with a video that's
about 60 seconds long.
These bars are vertical,
instead of horizontal.
And what you'll see
here is a visualization
of various sorting algorithms,
among them selection sort, bubble
sort, and merge sort, and a whole
assortment of others, each of which
has even a different sound
to it because of the speed
and the pattern by which
it actually operates.
So let's take a quick look.
[VIDEO PLAYBACK]
[MUSIC PLAYING]
This is bubble sort.
And you can see how the larger elements
are indeed bubbling up to the top.
[? And you can kind of
hear the ?] periodicity,
or the cycle that it's going in.
And there's less, and less, and less,
and less work to do, until almost--
This is selection sort now.
So it starts off random, but we
keep selecting the smallest human
or, in this case, the shortest bar.
And you'll see here the bars
correlate with frequency, clearly.
So it's getting higher and
higher and taller and taller.
This is merge sort now which,
recall, does things in halves,
and then halves of halves,
and then merges those halves.
So we just did all the left
work, almost all the right work.
That one's very gratifying.
[LAUGHS]
This is something called gnome
sort, which is improving things.
Not quite perfectly, but it's
always making forward progress,
and then kind of doubling
back and cleaning things up.
[END PLAYBACK]
Whew.
That was a lot.
Let's call it a day.
I'll stick around for
one-on-one questions.
We'll see you next time.
[APPLAUSE]
