[MUSIC PLAYING]
DAVID J. MALAN: All right,
this is CS50 and this
is lecture 6 and as you
may recall, today we
begin to transition away from
this low-level world of C
and command line programming
into a domain that's probably
a little more familiar,
that of the web, and yet
all the ideas that we've been exploring
thus far like functions and loops
and conditions and so forth
are still going to be relevant.
It's just we're going to start using
slightly different syntax and the user
interface, or UI, is now going
to be your browser instead
of a black and white terminal window
with just a simple textual prompt,
but how did we get here?
Well, recall that we've
looked recently at structs
and what was nice about structs
in C was that we had the ability
to make our own custom data types
and to kind of encapsulate together
related data, and that
became pretty powerful
when it came time for forensics to
actually manipulate bitmap files
or JPEGs, and even though this
struct is way more complicated
than a student structure,
at the end of the day
it's just individual data types
that are all somehow interrelated,
and by putting them in a struct
you can move them all around
and copy them and save them all
together as you might have done as well,
but then most recently did we
introduce a somewhat fancier structure,
which is still the same idea.
It's got like one or
more things inside of it,
but now, more powerfully,
one of those things
had this star or asterisk, which gave
us, of course, a pointer or an address,
but what was so powerful about
this simple idea and this seemingly
simple symbol is that now we can kind
of stitch together in our computer's
memory any kind of structure we want.
It doesn't have to just be one entity.
It can somehow be linked
to another and you
can keep linking these structures
together as well, and this of course
was an improvement on perhaps our
simplest of data structures early on,
an array or a list,
but of course, as soon
as you have pointers can you
begin to link things together
until we got something like this
and perhaps now with the dictionary
implementation you yourself might be
exploring a linked list, a hash table,
a trie or some variant in between.
And then lastly, there was this
painting of a picture, whereby
this is your computer's memory
put a little more descriptively,
and this is germane only
insofar as your computer uses
different chunks of memory differently.
All of your function calls
end up using the stack.
All of your uses of
malloc and its cousins
end up using the heap, and then
of course there's this up here,
and what was the text segment,
which we didn't really dwell on?
What was the text segment all about?
Text-- you're being volunteered.
Yes, what's the text segment?
AUDIENCE: Files information?
DAVID J. MALAN: Files information, yeah.
Specifically, the 0s and 1s
that compose the actual program.
So when you compile your source
code, like hello.c, into 0s and 1s,
those end up getting stored
in this location in memory
while the program is running.
So long-term they're stored on disk or
your hard drive or whatever's inside
of the computer or the server so that
the files persist even when the power
goes off or you walk
away from the keyboard,
but as soon as you double click a
program on your Mac or PC or as soon
as you do ./hello or
some command like that,
those same 0s and 1s get loaded
into your computer's RAM,
the picture we keep showing, and that's
where they live while they're in use
by your Mac or PC or the actual server,
but thus far we've been running all
of these programs with something like
./hello or some similar command and
running them just in the so-called
terminal window.
But you are probably
most familiar, certainly
with more graphical apps
on your phones these days
and any time you visit a
web browser on your phone
or on your desktop or laptop, you're
still interacting with a program.
It's just that program is not
only running on your Mac or PC.
Your browser is, like Chrome
or Edge or Firefox or Safari
or whatever it is you use.
That's running on your
Mac or PC, but what
you're communicating with
is a program elsewhere,
somewhere else on the internet and
those programs are called web servers.
A web server is just a piece of
software that some human or humans wrote
and their purpose in life
is to serve web pages.
When you request the
homepage of Facebook,
there is a server out there,
a program someone wrote,
that essentially spits out the 0s and
1s that compose Facebook's homepage,
but nicely enough, those 0s and
1s are not written as 0s and 1s
by Facebook engineers.
They're actually written as
something a little more English-like,
a little more familiar, and it's
not even programming code, per se.
It's what's called markup language,
and we'll soon see that and more today.
So we've gone from compiling
your code and running it
like this to actually doing
that in a web-based environment,
but of course, when you're running your
own programs in CS50 IDE down here,
you're actually using
another piece of software
that fills the screen, CS50IDE, a.k.a.
Cloud9, which is the program essentially
running somewhere in the cloud.
And we'll start to make this
distinction and through examples
will the distinction among these
different types of software
begin to make sense, but where is
something like CS50IDE running?
Where is Facebook running?
Where is Google.com running?
Well, back in 1998, Google.com
was running on this.
This was Larry and Sergey, the founders
of Google's, very first implementation,
apparently, of their
first rack of servers.
So servers are generally stored
literally in a rack like this.
It's usually like 19
inches wide by convention
and you just stack computer on top
of computer on top of computer,
but things were very bare
bones back in the day of Google
and so there weren't
even plastic or metal
cases around a lot of their computers.
They were trying to minimize cooling,
minimize cost, presumably, and cram
as much hardware into that
footprint as they could,
and so you actually see a
lot of the wires and hardware
kind of sticking out, and this
is on display now out west.
Of course these days, fast forward
just a decade or two, and this
is one of Facebook's data centers
where it's the exact same idea,
but much fancier, much prettier,
much better lit servers,
but who serve, at the end of
the day, the exact same role.
There are bunches of servers
around the world that are just
sitting there waiting
for you on the internet
to make a request for a
homepage, for an email,
for any other type of
information so that it ends up
getting sent from server to client.
And in fact, if you've ever thought
about those words server and client,
which is probably the
lesser used of the two,
but a server-client relationship is what
you have when you go into a restaurant,
and you ask the waiter or
waitress for something to eat.
He or she brings something back
to you, thereby serving you,
the client, and the relationship on
the web is pretty much the same thing.
We are the clients.
Our browsers are the
clients and out there
are servers like these who are
serving up content and information,
such as on Facebook.
So let's consider how all
this data even gets to us.
So odds are, these days, if you want
to visit Facebook.com on your laptop
or desktop or phone without
using the app, you probably just
type in Facebook.com and hit Enter.
If you're a little older
school or literally older,
you might just actually type out
the entirety of www.Facebook.com.
Both work and there are technical
reasons for that related to the topic
we'll talk about today,
but both of these work just
because Facebook has
configured their website
to work in either of those addresses.
Now, as an aside, why are so many
websites therefore prefixed with "www"
if both of them actually work?
Like, why have both?
It seems just like
redundant to type "www."
if it's implied by Facebook.com.
Yeah, what do you think?
AUDIENCE: Is the "www" required?
DAVID J. MALAN: Is it required?
Nope, not required.
Not required.
Yeah?
AUDIENCE: Is it to identify that
it's part of the World Wide Web?
DAVID J. MALAN: Kind of, yeah.
It's to identify that it's
part of the World Wide Web
and no one really says
World Wide Web these days.
We of course just say web, but back in
the day and back in my day, frankly,
it wasn't obvious to
a lot of human beings
what Facebook.com might actually
even mean, irrespective of the fact
that it didn't exist at some point.
And so there was this sort
of signal to the world
whereby you just started prefixing
domain names with "www" just
to make super clear to
users, oh, this is a website!
This is one of those things
on the internet or the like,
and also back in the
day, there were also
different services that have
fallen into disuse these days,
like FTP was quite popular
and Gopher, we used it
when I was here and other such things.
And so "www" was just an arbitrary
prefix that just kind of said what it
is, but these days we humans pretty much
know what a .com is and .net and .edu,
but even that kind of road is
changing again because there's dozens,
hundreds of top-level domains.
It's not just .com and
.edu and others now.
I mean, there's hundreds
of these things out there
and so it might even be
non-obvious to this day.
So some people, therefore, go really
all the way in and type out http://
and then the address
that they want to visit,
but odds are most of us don't do this
because our browsers just help us out
and prefix that, but that's
where our focus will be today.
Like, this actually has
significance because it specifies
what protocol or language,
what convention your computer,
your laptop should use when
talking to that server's address,
and actually, if you want the
communications to be secure,
odds are you're typing, or your
browser is doing it for you,
adding an s there, denoting secure
or encrypted a la Caesar and Vigenere
from some weeks ago and
technically your browser
is also probably adding
a trailing slash even
if it's not shown to you, which denotes
you want the root of the server,
like the default homepage
or something else.
In fact, maybe you do
want something else.
You don't want just Facebook's
website, you want Mark's page,
and so you could specify /zuck
or whatever the username actually is.
So this is a very long way of
saying all this kind of stuff
that we type or autocomplete and
take for granted these days actually
has some very fundamental
meanings, all of which
make possible the entirety of the web.
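
As a rough illustration of those pieces, here is a minimal Python sketch that splits an address into its protocol, host, and path using the standard library's `urlsplit`. The function name `describe_url` is made up for this example, and treating an empty path as `/` mirrors the trailing slash the browser is described as adding.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Break a URL into the parts discussed above: protocol, host, and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # "http" or "https"
        "host": parts.hostname,      # e.g. "www.facebook.com"
        "path": parts.path or "/",   # an empty path means the root of the server
    }


print(describe_url("https://www.facebook.com/zuck"))
print(describe_url("http://www.facebook.com"))
```

The second call has no path at all, so the sketch reports `/`, the default page.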
So what actually goes on with HTTP
and what does that actually mean?
So HTTP is a protocol.
It is a set of conventions
that dictate how a computer
client, like a browser on your
Mac or PC, talks to a web server,
and it's a protocol in the sense
that it's not a language, per se.
It's really just a set
of conventions and so
like this is kind of an arbitrary
and awkward human convention.
Hello, I'm David.
AUDIENCE: I'm Kara.
DAVID J. MALAN: Kara, so Kara
and I just introduced ourselves.
I extended my hand and she
kind of knew instinctively
that it would be awkward not to
shake my hand
and so we exchanged
pleasantries and said hello.
So this is just kind of
a silly human convention
whereby we've agreed sort
of socially in advance
how to greet each other in that way.
So HTTP is pretty much the
same thing, but in this case
you're not actually physically
doing something like that.
You're kind of sending a
message from client to server.
You're putting a sort of handwritten
note into an envelope like this,
addressing it somehow and then sending
it off on the internet for Kara
or for Facebook.com or
Google.com to actually receive,
and then when Google or Facebook
or Kara receives that note,
reads and sees what I want,
the server or the human
responds in some according way.
So what then goes
inside of this envelope?
Well it turns out that
when a web browser,
like Chrome or Edge or Firefox,
Safari, makes a request,
the message they put inside of one
of those envelopes, albeit virtually,
is literally this text.
It's like if I had it written down on a
piece of paper literally GET / HTTP/1.1
Host: www.facebook.com
and then the "..."
just means there's other stuff in there,
but it's less fundamentally interesting
right now.
So what's this all mean?
GET is just a verb and it
kind of says what it means.
Go get something from the server.
HTTP/1.1 mentions the version of HTTP
that I am using or the human convention
that Kara and I were
actually implementing there,
and so 1.1 tends to be the one most
in use these days, and then /, again,
it's just like the default identifier
for the homepage of a website,
the default page that you see in
the absence of typing something like
/zuck or some other suffix.
Host: is the same thing as whatever's
on the outside of the envelope.
So if I'm sending a message
to www.Facebook.com,
I'm just making super clear inside of
the envelope which server should expect
this request just in case there
are multiple websites running
on the same physical server, which is
possible for economic and performance
reasons these days.
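
To make the contents of the envelope concrete, here is a minimal Python sketch that assembles the request text described above. The helper name `build_get_request` is hypothetical, and the sketch includes only the request line and the Host header; a real browser sends more headers, which are the "..." mentioned earlier.

```python
def build_get_request(host, path="/"):
    """Assemble the text of a bare-bones HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # verb, what we want, and the protocol version
        f"Host: {host}",         # which website on that server we mean
        "",                      # a blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)    # HTTP separates lines with carriage return + newline


print(build_get_request("www.facebook.com"))
print(build_get_request("www.facebook.com", "/zuck"))
```

Only the first line changes between the two requests: `/` versus `/zuck`.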
So alternatively, if I were trying
to visit Mark Zuckerberg's homepage,
the request in that envelope's
going to look almost the same,
but I'm going to be more precise.
/zuck instead of just /.
Meanwhile, if I'm requesting
something from Yale's homepage,
the request would look like
this, or from Harvard's web page,
the request would look
like this and so forth.
So once Harvard or Yale
or Facebook has actually
received the request in that
envelope, opened it up, looked at it,
how do they decide how to respond?
Well at the end of the
day, I'm probably expecting
to get back from the web
server some kind of, excuse me,
web page whereby I want to
see my news feed on Facebook
or I want to see the
search page on Google
or I want to see Harvard's
homepage, Yale's homepage
or whatever it actually is.
So there's a lot of information
probably packed into that envelope,
but there's also a
conventional, a standard,
response that looks literally like this.
So at the very top, for instance, of
the "letter" that comes back from Google
or Facebook is a message like this.
Got it.
I'm speaking HTTP version 1.1 also.
Everything is OK and 200.
We'll come back to
that in a second and then
the type of content
inside of the envelope,
if I keep digging
deeper into it, is going
to be text, but more
specifically, HTML, and we're
going to focus on that today too.
HTML, Hypertext Markup Language.
This is going to be the language in
which web pages themselves are written.
Then there's usually some
other stuff and way down there
is the actual contents of Yale's or
Harvard's or Facebook's homepage,
but let's zoom in on
this for just a moment.
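
As a sketch of how that response is laid out, the Python below splits a response into its first line, its header lines, and the content that follows. The sample text `raw` is invented for illustration and is far shorter than anything a real server sends; `split_response` is a hypothetical helper name.

```python
def split_response(raw):
    """Split an HTTP response into (version, code, reason, header lines, body)."""
    # A blank line separates the headers from the content of the page.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # The first line looks like "HTTP/1.1 200 OK".
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason, header_lines, body


# Made-up response, assuming only a single header for brevity.
raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
print(split_response(raw))
```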
200, odds are you've never seen or
cared to see this kind of number before,
but have you ever used the
web and requested a web page
and seen some number that for some
reason keeps popping up in your life?
AUDIENCE (IN UNISON): 404.
DAVID J. MALAN: Yeah, 404.
It's just kind of a weird thing
that many of us in the room
know 404 even if we're not
necessarily technophiles
and know what HTTP is, but it turns
out that in these envelopes coming back
from servers sometimes are
not just 200 OK, but instead--
dammit, typo.
This would be much more effective
if I said it's not this.
It's not found.
So inside of the envelope is 404
not found, which means exactly that.
The file was not found that
you were actually seeking.
You mistyped the URL,
the page was deleted.
Somewhere or other, there was
some kind of typographical error
and it turns out there's a lot
of these status codes in HTTP
and there are even more
than these, but these
are the ones we might
see the most commonly.
200 OK means all is indeed well.
404 means not found.
403 forbidden might be
if you've not logged in
or don't have the right access
in order to access some folder
or file on some website.
This is really bad and we'll get to
know this over the coming weeks as we
ourselves start implementing
code on a server.
500 internal server error, if you will,
shall be our new segmentation fault,
but hopefully not too frequently.
It means something is wrong
in the code on the server.
This was an April Fools' joke
back in 1998 I believe, yeah.
So April Fools', some humans decide it
would be funny to announce to the world
that there's yet another code,
which is 418 I'm a Teapot,
which kind of comes up from time to time
in actual code and then there's this
one--
301 Moved Permanently.
It's kind of a scary sounding
thing, as though a website just
kind of up and left and went elsewhere,
but it's a powerful mechanism
in the following way.
If a server inside of
one of these envelopes
responds with a response
like this, there
tends to be one other piece
of information at least.
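
For reference, the status codes mentioned so far fit in a small lookup table. This is only a sketch in Python: the table covers just the codes named in this lecture, and `describe` is a hypothetical helper, not part of any library.

```python
# Only the status codes mentioned in the lecture; HTTP defines many more.
STATUS_CODES = {
    200: "OK",
    301: "Moved Permanently",
    403: "Forbidden",
    404: "Not Found",
    418: "I'm a Teapot",
    500: "Internal Server Error",
}


def describe(code):
    """Return the code with its human-readable meaning, if we know it."""
    return f"{code} {STATUS_CODES.get(code, 'Unknown')}"


for code in (200, 404, 500):
    print(describe(code))
```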
So if I visit a website
like http://harvard.edu,
I might get back in the response
from Harvard's web server
this answer, 301 Moved Permanently.
Like where the heck did Harvard go?
Well you can see the location
based on this other line
and all of these things
collectively moving forward we're
just going to call HTTP headers.
Anytime you see a word
and a colon, that's
an HTTP header with a name
and a value, and the first
one's a little anomalous
in that there's no colon,
but that's the only
one without the colon.
So location colon
http://www.harvard.edu.
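
Here is a minimal Python sketch of reading those headers, assuming the response has already been split into lines. The two sample lines mirror the response described above; `parse_headers` is a hypothetical helper. Header names are lowercased because HTTP treats them as case-insensitive.

```python
def parse_headers(header_lines):
    """Turn lines like "Name: value" into a dictionary keyed by lowercase name."""
    headers = {}
    for line in header_lines:
        # Split on the first colon only, since values such as URLs contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


response_lines = [
    "HTTP/1.1 301 Moved Permanently",   # the first line has no colon-separated name
    "Location: http://www.harvard.edu",
]
headers = parse_headers(response_lines[1:])
print(headers["location"])
```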
Well what's going on?
Well, if I actually visit Harvard's
homepage exactly as follows,
let's take a look at what happens.
I'm going to go to
http://harvard.edu, Enter.
And notice there's a whole bunch of more
stuff happening on the screen thanks
to what's called autocomplete, which
is a feature of Chrome or my browser.
It has nothing to do
with the topic at hand.
This is just Chrome trying to be
helpful today as on your computer too
and suddenly, even though I tried
to go to http://harvard.edu,
where did I clearly end up?
HTTPS, so they added the s somehow
and what else has it added?
[VARIOUS ANSWERS FROM AUDIENCE]
DAVID J. MALAN: Yeah, the web.
The www prefix was added.
So this is not sort of all
that important to the user
like I got to my destination somehow
but the reason for that is as follows.
I'm going to go ahead and
open up, in the IDE actually,
just a terminal window here and I'm
going to use a new program called curl,
for connect to a URL:
curl http://harvard.edu, Enter.
And I get back some cryptic looking
things and that's actually HTML,
and we're going to come back
to this in just a moment
because it turns out there's two
parts to the messages coming back.
There's the headers and then there's the
content, and we're seeing the content.
So more on that in a bit.
I want to look a little higher up in
the response and literally just look
at the headers, and to do that-- and
you would only know this from reading
the documentation--
-I means show me just the
headers that are coming back.
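
For comparison, roughly the same headers-only request can be made from Python with the standard library's `http.client`. This is a sketch, not what curl does internally; `fetch_headers` and `format_headers` are hypothetical names, and the actual headers returned depend on the server at the time you ask.

```python
import http.client


def fetch_headers(host, path="/", timeout=10):
    """Send a HEAD request over plain HTTP and return (status, reason, headers)."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)   # HEAD asks for the headers without the content
        response = conn.getresponse()
        return response.status, response.reason, response.getheaders()
    finally:
        conn.close()


def format_headers(status, reason, headers):
    """Lay out the status and headers one per line, similar to curl's output."""
    lines = [f"{status} {reason}"]
    lines += [f"{name}: {value}" for name, value in headers]
    return "\n".join(lines)


# Example (requires a network connection):
#   print(format_headers(*fetch_headers("harvard.edu")))
```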
So here now we see the
headers coming back
and you'll see indeed we got
back a 301 Moved Permanently,
and then there's some other stuff
we haven't really focused on,
but at the bottom is something we have--
location, which says to the
browser go to this URL instead.
All right, so let me do that.
Let me save time and just copy paste
this and then do curl -I of this,
Enter, and pretend to be a browser
requesting that page now, but now
where are they trying to send me?
HTTPS.
So this suggests via some
mechanism, some human at Harvard
decided one, uh-uh.
We're not going to be
called like harvard.edu.
We shall be www.harvard.edu
for whatever reason
and then they also decided that if a
user visits us using HTTP, which is not
encrypted, not secure, we're going
to forcibly tell them to come back
via secure channel, and we won't
dwell today on how that's implemented,
but much like in Caesar or Vigenere
where there was a way to encrypt or scramble
information, browsers
can do that too and it's
implied by using the HTTPS
instead of just HTTP.
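
The chain of redirects just described can be sketched in Python without touching the network. The `RESPONSES` table below is a hypothetical stand-in for the server, simplified from the redirects seen in the demo, and `follow_redirects` is a made-up helper that does what a browser does: keep following Location until the answer is no longer a 301.

```python
# Hypothetical canned answers: URL -> (status code, Location header or None).
RESPONSES = {
    "http://harvard.edu": (301, "http://www.harvard.edu"),
    "http://www.harvard.edu": (301, "https://www.harvard.edu"),
    "https://www.harvard.edu": (200, None),
}


def follow_redirects(url, responses, max_hops=10):
    """Return every URL visited, starting at url and ending where the redirects stop."""
    hops = [url]
    for _ in range(max_hops):
        status, location = responses[url]
        if status != 301 or location is None:
            return hops          # not a redirect, so this is the final destination
        url = location
        hops.append(url)
    raise RuntimeError("too many redirects")


print(follow_redirects("http://harvard.edu", RESPONSES))
```

The result lists three URLs, which is to say two redirects before the 200.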
All right, so let's actually
visit this one more time.
Let me go ahead and
highlight that location.
curl -I of that address and now an
overwhelming amount of information
coming back, and that's why I kept
putting the ...'s, but the juicy stuff
is at the top.
Now everything is 200 OK and
indeed, if I run it without -I
so I see the contents
of the envelope, it's
like looking deeper
inside of the envelope,
now I actually see a lot more
content, which collectively
composes Harvard's
homepage, and it turns out
we can see this even in Chrome.
Let me go over to my browser again
and if you've not done this before,
it turns out that you can go
to your View menu, Developer,
and go to Developer Tools-- and we'll
do this in upcoming problem sets--
and I can go here and see a whole
bunch of features, only a couple
of which we might look at today.
Specifically, I'm going to
click on this Network tab.
So to be clear, Developer Tools in
Chrome still shows me the homepage,
but it kind of dedicates
part of the screen
to these special developer tools that
make it easy to understand and actually
create websites.
So eventually we'll
start using this ourselves,
but what's nice about the Network
tab is that you can sniff or monitor
all of the requests going
back and forth between browser
and server in the so-called envelopes.
So I'm going to hit a little
Clear symbol here first just
to get a clean slate.
I'm going to click preserve log so
I can actually see what's happening
and now I'm going to go ahead--
actually, I'm going to
go ahead and do this.
http://harvard.edu, so the sort of
incorrect version that I'm going
to expect the browser to fix for me.
I hit Enter.
A whole bunch of stuff is
flying across the screen
and in fact if we zoom
in on this, you can
see that just visiting
Harvard's home page
requires 85 envelopes it would seem
going back and forth with pieces
of the webpage and we'll see soon
what some of those pieces are,
but it's not just one file coming back.
It's bunches of files.
Maybe images, maybe fonts,
or some other things too,
but I'm going to scroll
up in this output
and now notice the story
that's been told here too.
So the very first request,
which I can hover over and see,
came back with a 301, which we
now know is Moved Permanently,
or it's a redirect.
Then if I hover over
the second one, you'll
see that it's a slightly more precise
URL, www, but still with HTTP.
So that got redirected and then lastly,
if we look at the third line here,
this is the one we
ultimately ended up at
and indeed it comes back 200, as do
bunches of other results thereafter,
and we'll see what those
200s actually mean.
Now, you can do a
little better than this
and it's perhaps fitting that our
friends down the road indeed did.
Let me go back to the IDE.
Let me go ahead and clear this and
instead of curling harvard.edu,
let me do http://yale.edu
and ask the question,
what would be a better approach--
knowing these ingredients that we now
have of how redirects work.
How could Harvard do better in terms
of getting the user to the address
that we intend them to be at?
Yeah.
AUDIENCE: By not forcing
like, two redirects?
DAVID J. MALAN: Yeah, by not
forcing two redirects, right?
Even if some of this
material is new, we've
long talked now about
correctness and design and style
and we've seen some messy style on
the screen and that's fine for now.
More on that later.
It seems to be correct
because it's working,
but it feels like it
could be better designed
because why make one request
then make another request just
to fix the first request then
make a third request just
to fix the second request?
Why not combine them?
And, as it turns out, someone down
the road had that same intuition
and so if we visit yale.edu with
just HTTP and without the www,
they, in one fell swoop,
actually redirect us
to the right place in this case.
So, with that said, it's
perhaps fitting that just
a few years, well,
some years ago now, you
might have tried to visit
this particular address,
and this is something I
can only do in Cambridge.
If I go ahead and open a new browser
and go to http:// shall we say
safetyschool.org and hit
Enter if you've never been.
Oh, interesting!
[STUDENTS LAUGH]
DAVID J. MALAN: And apologies
for those of you tuning in online
live from New Haven.
So how is this possibly working?
It's actually a very simple heuristic.
If instead of selecting Yale or
Harvard or any other address,
if I literally do like safetyschool.org,
we can wrap our mind around
what's going on underneath the hood.
safetyschool.org has moved permanently
to New Haven it would seem, but it's via
this very simple mechanism that someone
back in 2000 registered
this domain name,
and so actually as I was looking
this up in the history last night,
I was amused to find that whoever bought
the domain has been paying for this
domain name now for 17 years for this
joke annually, but it's well worth it,
but I think it would be--
[STUDENTS LAUGH]
DAVID J. MALAN: But I
think it's only fair now,
it's only fair if we take
a look at another one too.
It turns out that if you visit
harvardsucks.org, that one has also
redirected, this time to www.
So let's follow this little
breadcrumb. curl -I harvardsucks.org,
and this one's OK.
So that means something
lives at harvardsucks.org
and it does not as cleverly
redirect to harvard.edu,
but to introduce this,
let me actually introduce
a friend of ours who's now very
awkwardly visiting from New Haven
today.
Hi Natalie.
Do you want to come on up and
say hello for just a moment?
So this is Natalie, who is our head
of the class with Benedict Brown
and [? Anushri ?] and with
[? Staleos ?] in New Haven.
If you'd like to say a quick hello?
AUDIENCE: Hi, hi, everyone.
DAVID J. MALAN: So nice to have
you here today and as you know--
do you want to make mention of
what we're about to see here?
What happened back in 2004
just a few years later?
AUDIENCE: We did a
prank back, basically.
DAVID J. MALAN: OK, so perfect set-up.
Thank you very much.
Hello to Natalie.
Let me go ahead and hit
play on three minutes
that are kind of hard
to justify academically,
but it's perhaps one of the best
pranks that's ever been played.
Long story short, our
friends down the road
got together with a few of themselves
just before Harvard Yale, which
was to be at Harvard
that year and actually
mapped out using software,
a sort of grid system
that lined up with all of the
seats in the Harvard stadium,
whereby you assume that a human
each takes up some amount of space,
and then they used special
software to figure out
how they might spell something
out in the audience in a way that
would be readable to the opponents,
the Yalies, on the other side.
So if we could dim the
lights for this look back
at yesteryear and a
slight use of software.
[MUSIC PLAYING]
- All the way at the top.
- This is for you Yale.
We love you Yale.
- We're here to cheer for Harvard.
- Yeah!
Let's go Harvard!
- Yeah, Harvard!
- Take the top one and pass it down.
- It's not going to say
something like Yale sucks is it?
- It says Go Harvard.
- We're nice.
- You see that shit?
Look at them, they have the paper!
It's gonna happen!
It's actually gonna happen!
I can't [BLEEP] believe this.
- What do you think of Yale?
- They don't think good!
- It may be a complete
mess, I don't know.
- Does everyone have it?
Does everyone have their stuff?
- The probability that it's
gonna be legible is very small.
- It's gonna happen!
It's gonna happen!
- It's too complicated.
- Look, look at all the signs.
- I know but it's too complicated.
- Uh, what houses are you guys in?
That's not a real house.
- Ho-fo?
- Yeah.
You guys aren't from Harvard are you?
- No, fo-ho.
Pforzheimer!
- Yeah, but he said ho-fo.
- Let's just make sure everyone has it.
- Well she's probably drunk.
- Are all the cards distributed?
- Almost!
[APPLAUSE]
[CHEERING]
- Hold up your signs!
- They [BLEEP] did it!
[CROWD CHANTING "YOU SUCK!"]
- They [BLEEP] did it!
They [BLEEP] did it!
[CROWD CHANTING "YOU SUCK!"]
- What do you think of Yale sir?
- They suck!
- One more time!
- One more time!
- Oh and there it goes again!
[CROWD CHANTING "HARVARD SUCKS!"]
[END PLAYBACK]
DAVID J. MALAN: All
right, we've been talking
about what goes on
inside of this envelope,
but what goes on on the outside?
So when you hand off this envelope
from your laptop or your phone
to the internet, how does it
actually get to its destination?
Well you've probably heard this
acronym IP, or internet protocol,
and it turns out that every computer
on the internet and every phone
in this room and every laptop
in this room has a unique address.
That unique address is known as an IP
address and it's much like the address
of a building in the real world, like
the Science Center might be 1 Oxford
Street Cambridge, Mass 02138, USA.
Down the road is the CS building.
33 Oxford Street
Cambridge, Mass 02138, USA.
So those long strings
uniquely identify buildings
in the world for the
mail service and the like
and similarly do IP addresses uniquely
identify computers on the internet.
These addresses are much
more succinct though.
They're not long strings; they're instead
just numbers that have four parts
and each of those numbers within the
IP address is a value from 0 to 255.
So the lowest IP address is all
zeros and the biggest IP address
is all 255s with some constraints.
You can't quite use
all of those numbers.
So just as a sort of quick teaser,
if the smallest number is 0
and the biggest number for each of
these sections of the IP address is 255,
how many bits are being used
for each of those four numbers?
AUDIENCE: 8.
DAVID J. MALAN: Yeah, 8.
So remember like 8 bits gives you 2
times 2 times 2 times 2 times 2 times
2 times 2 times 2, which is 256, and
indeed we have 256 total values from 0
on up to 255.
So an IP address is 8 plus 8
plus 8 plus 8, or 32 bits total,
or, just to come really full circle
with week zero, if you have 32 bits,
roughly how high can you count?
Like what's 2 to the 32nd power?
Yeah, it's roughly 4 billion.
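
The arithmetic can be checked with a short Python sketch. The function `ip_to_int` is a hypothetical helper that packs the four parts of an address into one 32-bit number, 8 bits at a time.

```python
def ip_to_int(address):
    """Pack a dotted address like "140.247.0.1" into a single 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid address: {address}")
    value = 0
    for p in parts:
        value = (value << 8) | p   # make room for 8 more bits, then add this part
    return value


print(2 ** 8)                        # values per part: 256
print(2 ** 32)                       # total addresses: 4294967296
print(ip_to_int("0.0.0.0"))          # lowest address: 0
print(ip_to_int("255.255.255.255"))  # highest address: 4294967295
```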
So, long story short, the implication
of this very simple definition
is that apparently there can only be,
in this model, four billion computers,
phones, refrigerators, internet of
things devices on the internet at once
if they do all need an
IP address that's unique.
So I've been telling a
slight white lie in that they
don't have to all technically
be unique because there's
ways we can share
addresses, and it turns out
there's even bigger addresses these
days that aren't just 32 bits but 128
bits, which is just massive and daresay
unpronounceable how big that number is.
So we've gotten ahead of this issue, but
you'll find that in a lot of locations,
companies and internet service providers
like Comcast and Verizon and the like
and campuses like Harvard and Yale,
you can notice that they tend to follow
patterns, like many of the IP
addresses here at Harvard start with
140.247.something.something or 128.103.
Down the road in New
Haven, a lot of the IP
addresses there start
with 130.132 or 128.36,
which is not at all interesting to
the humans who are using these IP
addresses, but it is useful to
the servers or the devices that
are actually routing these
envelopes from one place to another.
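
As a sketch of why those patterns help, a program can tell from the first two parts of an address which campus it likely belongs to. The Python below uses the standard `ipaddress` module; writing each prefix as a /16 network is an assumption made here to express "starts with these two numbers," and the table is illustrative, not an authoritative list of either school's addresses.

```python
import ipaddress

# Assumption: "starts with 140.247" is written as the network 140.247.0.0/16, and so on.
CAMPUS_NETWORKS = {
    "Harvard": ["140.247.0.0/16", "128.103.0.0/16"],
    "Yale": ["130.132.0.0/16", "128.36.0.0/16"],
}


def guess_campus(address):
    """Return the campus whose prefix matches the address, or None if none does."""
    ip = ipaddress.ip_address(address)
    for campus, networks in CAMPUS_NETWORKS.items():
        if any(ip in ipaddress.ip_network(network) for network in networks):
            return campus
    return None


print(guess_campus("140.247.10.20"))
print(guess_campus("128.36.1.1"))
print(guess_campus("8.8.8.8"))
```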
Meanwhile, in our homes and even
sometimes on campus these days,
there are also what are called
private IP addresses, which
are numbers within
these ranges, and this
has been a solution so that when
you sign up for Verizon or Comcast
back home or your parents
do for internet service,
you technically only get one IP address
from your internet service provider.
That's what you're paying for per
month, but thanks to something
called network address translation
and other technologies,
you can actually give all
of your siblings and parents
and family members or roommates in the
household their own unique address.
It's just private in the sense that
no one else on the outside world
can access it unless you
initiate the connection.
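
A quick Python sketch can test whether an address is one of these private ones. It assumes the ranges in question are the three standard private ranges (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16); `is_private` is a hypothetical helper name.

```python
import ipaddress

# Assumption: the private ranges referred to are the three standard ones.
PRIVATE_RANGES = [
    ipaddress.ip_network(network)
    for network in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private(address):
    """Return True if the address falls inside one of the private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)


print(is_private("192.168.1.10"))  # a typical home address
print(is_private("140.247.0.1"))   # a public address
```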
So this is generally why
at home you can reach
any website you want, any service
on the internet that you want,
but you can't have like
random people necessarily
trying to get into your
laptop or your device at home
because there's a device, a home router,
that translates these private addresses
into otherwise public addresses,
but for now the takeaway really
is just that every computer on
the internet has an IP address,
and if you've ever poked around your
Mac, like under System Preferences,
you can actually see this.
So I've just pulled up a screenshot here
of a network control panel on Mac OS
and if you look roughly
there on your own Mac,
you should see that your
IP address is something.
It will completely vary by
person and by geography,
but you'll see your IP address there.
On Windows, at least Windows
10, you can see your IP address
under Settings here as highlighted here.
So this has a very different
address, but that's
just because this person was on
a different network all together.
So, where did these IP
addresses come from?
Well back in the day
someone would literally
come to your home to set up your
Comcast or your Verizon internet service
and he or she would like type in
these numbers into your Mac or PC
and then leave, and you would
have one computer on the internet.
These days it's a lot more dynamic.
You don't need someone coming by.
That certainly doesn't scale very
well because there's other protocols.
HTTP is this protocol we talked
about earlier about web pages,
but there's other protocols like Dynamic
Host Configuration Protocol, which
is a mouthful but it just means that our
Macs, our PCs, Android phones, iPhones
and the like, if they speak
this protocol, when you first
turn on your phone or boot
up your laptop it knows,
if it has support for this protocol,
to just announce to the internet,
hello world.
I'm awake.
What should my IP address be?
This is just kind of a broadcast message
and if Harvard or Yale or Comcast
or Verizon or wherever
you are in the world
has a DHCP server whose purpose in life
is just to listen for those hellos,
that server should respond using the
same protocol with your actual IP
address, and it figures out which one
to give you based on an available pool
of numbers typically.
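The idea of handing out addresses from an available pool can be sketched as a toy model. This is not the real DHCP wire protocol, only the bookkeeping idea; the class name, the device names, and the 192.168.1.0/29 network are all assumptions for illustration.

```python
import ipaddress


class AddressPool:
    """Toy model of a DHCP server's pool: hand out the next free address."""

    def __init__(self, network: str):
        # hosts() skips the network and broadcast addresses of the block.
        self._free = list(ipaddress.ip_network(network).hosts())
        self._leases = {}  # device name -> assigned address

    def request(self, device: str) -> str:
        """Answer a device's "what should my IP address be?" message."""
        if device not in self._leases:
            if not self._free:
                raise RuntimeError("address pool exhausted")
            self._leases[device] = self._free.pop(0)
        return str(self._leases[device])


pool = AddressPool("192.168.1.0/29")
print(pool.request("laptop"))
print(pool.request("phone"))
print(pool.request("laptop"))  # the same device keeps the same address
```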
So that's how you might
get this but there's
other things in these control panels.
In fact, if we look a little lower
on Windows, there's DNS servers too.
Domain Name System.
Another acronym and a bit of a mouthful,
but you can also see this on Mac OS too
if you actually click Advanced
and actually take a look.
Here, for instance, there's mention of
something else altogether, a router.
So there's lots of different
addresses going on here
and lots of different servers.
So how do these all piece together?
Well, DNS is an interesting
one in that it's
going to be the one that translates
domain names to IP addresses, right?
None of us ever probably visits
http:// and then a number, right?
Like, we visit facebook.com,
google.com or the like,
but that's because our computers know
how to translate one to the other.
So in fact if I do this command,
nslookup for name server look up
and then I type in something like
google.com, I'm asking the computer,
in this case, the IDE, what is
the IP address of google.com.
I know it as the human as
google.com, but the internet knows it
by its numeric unique address, and
it turns out Google has several,
and even this is a bit of a white
lie because they have thousands,
but the ones that my
computer is being told to use
is, for instance, this one or this
one or any of these other addresses.
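The same kind of lookup that nslookup performs can be approximated from Python by asking the operating system's resolver. This is a minimal sketch; the function name lookup is an assumption, and the addresses returned for any real domain will vary by network and over time.

```python
import socket


def lookup(hostname: str) -> list[str]:
    """Return the distinct IPv4 addresses the system resolver reports."""
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]  # sockaddr is (address, port) for IPv4
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling lookup("google.com") on a machine with internet access should return one or more addresses, much like the nslookup output described above.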
So let me see what
actually happens here.
If I highlight that address and open up
a browser and go to http:// and that IP
address and hit Enter, notice
it actually seemed to work.
Well, why is that?
It's a little hard to see
it in Chrome, but let's
go ahead and open up the Inspect tab
and go to Network just like before.
Let me click Preserve Log so
that it saves everything here,
and I could be using curl.
So the curl was just
the simpler version.
Now I'm using the more
familiar graphical version.
Let me go ahead and do that again and go
to http:// and that IP address and hit
Enter.
A whole bunch of stuff flew by
even just for Google's homepage,
but notice what happened.
On that very first-- whoops--
request, if I hover over it, I see
http:// and then the number that I
typed in, but it's a 301
because, what was the response?
We can actually see these responses.
Let me click on the status code
here, or the row, go to Headers
and notice here, if we zoom
in, we'll see that Google
responded with this location.
So someone at Google
just decided, OK, fine.
You figured out one of our IP addresses.
That's great, but we don't want
you to see that in the URL.
It's bad for branding.
We don't want you to
bookmark an IP address
because it might change later on.
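To make the 301 and its location header concrete, here is a sketch that pulls the status code and headers out of the text of a response. The sample response is illustrative, not captured from Google's servers, and the function name is an assumption.

```python
def parse_response_head(raw: str) -> tuple[int, dict[str, str]]:
    """Split the head of an HTTP response into a status code and headers."""
    lines = raw.strip().splitlines()
    status_code = int(lines[0].split()[1])  # "HTTP/1.1 301 Moved Permanently"
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


# Illustrative response text, written by hand for this example.
SAMPLE = """HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
"""

status, headers = parse_response_head(SAMPLE)
if status == 301:
    print("redirected to", headers["location"])
```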
So we're using the same
mechanisms as before,
but that's how we might do the
lookup and we can see the same thing
for any number of websites.
Here we go nslookup of harvard.edu
and we get back just a couple here.
If I do the same on Yale, I'm going
to get back different IP addresses.
Yale has even more in
this case and so this
is how the computer's figuring
out where to send the data.
So what goes on this
envelope then, it's going
to be not facebook.com,
harvard.edu or yale.edu,
it's actually going to be
the address like 1.2.3.4
or whatever the actual IP address is
of the server I'm trying to send to.
Now, of course, I expect a
response from the server.
I want to get back my
news feed or I want
to get back Harvard or Yale's homepage.
So what more should I probably
put on this virtual envelope,
just intuitively?
Yeah.
AUDIENCE: Your own IP address?
DAVID J. MALAN: What's that?
AUDIENCE: Your own IP address.
DAVID J. MALAN: My own IP address, yeah.
So just like in the human
world, just in case something
goes wrong with the post office, I
might put my own address, 5.6.7.8,
and actually put that on the envelope so
that if something goes wrong or, better
yet, if something goes right and
they're ready to give me a 200 OK,
it can actually come back to
me because they know from which
address this thing actually came from.
So who is it or what is it
that's doing all of this routing?
Well it turns out there's servers
on the internet called quite simply
routers, otherwise known as
gateways, which is just a synonym,
and they're kind of artistically
pictured here as just dots
across the world, and there's
hundreds, thousands, tens of thousands
of routers.
Odds are you yourself at home,
if you had internet access,
have at least one such router
and its purpose in life,
again, is to take data from inside your
household and send it to the internet,
and then any responses
you get, to send it
back to the appropriate
laptop or desktop or phone
or smart device that happens
to be in your own home.
And we can actually see this too.
Let me go ahead and in CS50
IDE, try one other command.
I'm going to go ahead and
type traceroute and I'm
going to trace the route,
say, to yale.edu from here,
or technically from the IDE.
So if I hit Enter here, we're
going to see a few lines of output,
and if you try this
at home, just realize
I've configured my IDE a little
differently to simplify the output.
So it looks like there's
five steps between Cambridge
and New Haven or technically
the IDE and New Haven,
but what are each of these steps?
Well between here and Yale, if we
continue that version of the story,
there are, it seems, five routers.
There are five computers that
have like lots of RAM, big CPUs
that can handle a lot
of internet traffic
that are figuring out how to
get my envelope from this origin
to this router, to this router, to
this router, to this anonymous router,
to this one.
Sometimes the routers are configured
not to answer these questions
from this program traceroute.
They sort of keep it
to themselves, and you
can see on the right of each of
these IP addresses some numbers.
So just take a guess, what do each
of these numbers represent, perhaps?
What's that?
No it's okay.
AUDIENCE: Milliseconds?
DAVID J. MALAN: Milliseconds, yep.
So milliseconds that are
measuring what do you think?
Time to go, or time to
reach that specific router.
So we can kind of infer--
and this is the kind of amazing thing.
To get me to New Haven
takes like two plus hours,
but to get an email, to get
an envelope with a message
takes like 10.597 milliseconds
to get data from here to there,
and then hopefully back if
it's a request for a page.
Let's do something a
little farther away.
So let's do like stanford.edu,
tracing the route here,
and already we can see that the
numbers are a little bit higher,
and that makes intuitive
sense in that Stanford's
a little farther away than New Haven
and it takes as many as 41 milliseconds
to reach that.
If I go even further and I
read like a company's news
like cnn.co.jp, where .jp is the top-level
domain for a lot of servers in Japan,
you can see a real uptick in just
how many milliseconds it takes,
and in fact, there's
something curious here.
Why does it take so much more time
to get from router number three
to router number four do you think?
AUDIENCE: The ocean.
DAVID J. MALAN: The ocean, yeah.
So there's a really big body of water in
between the US's west coast and Japan's
coast, which probably explains why
not just between three and four,
but really every router thereafter
is that many milliseconds away.
So these aren't cumulative.
We're measuring constantly
from here to there,
from here to slightly farther,
from here to slightly farther.
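Because each time is measured from the origin to that router, the place where the ocean sits shows up as the biggest increase between two consecutive lines. A small sketch with made-up numbers (these are hypothetical, not the values from the lecture's screen):

```python
# Hypothetical round-trip times in milliseconds, one per router (hop),
# each measured from the origin to that hop.
rtt_ms = [1.2, 3.5, 12.0, 160.4, 161.0, 163.7]


def biggest_jump(times: list[float]) -> tuple[int, float]:
    """Return (hop number, increase in ms) for the largest rise between hops."""
    jumps = [(times[i] - times[i - 1], i + 1) for i in range(1, len(times))]
    increase, hop = max(jumps)
    return hop, increase


hop, increase = biggest_jump(rtt_ms)
print(f"largest jump is on the way to router {hop}: about {increase:.1f} ms")
```

With these numbers the jump lands between router three and router four, and every later router stays roughly that far away, which is the pattern described above.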
So it makes sense that
once you cross that ocean,
that's kind of the total value
that you're actually going to see,
and it's fascinating really.
I mean, throughout
the entire world there
are not only wireless technologies
today, but very much wired technologies
and if we take just
a few seconds, we can
see this visualization of so many of the
transoceanic cables that have actually
been dropped by big ships that carry
many, many, many, many bits from one
coast to another.
[VIDEO PLAYBACK]
[MUSIC PLAYING]
[END PLAYBACK]
So, with all of those cables capable of
transmitting data all around the world,
it turns out there's
still one more problem.
Even if we want to do
something simple like
download an internet image
of a cat because there's
different types of servers out there.
There's my computer here like my laptop.
I'm running Mac OS or Windows.
There's all those servers
in Google's data center
and in their racks and
Facebook's and the like
and in between all of those
servers there are lots of routers,
but it turns out that those
servers in those racks at Google,
at Facebook, even at
Harvard and Yale, there
are servers that can do multiple
things because technically,
even though we humans tend to talk
about servers as being physical devices,
a server is, as we started
today, really just a program.
It is a piece of software that
someone wrote that, when run,
listens for requests on the internet
and responds to those requests,
generally by spitting out information,
text or 0s and 1s or, in some cases,
cats.
So upon receiving an
envelope, then, how is
it that a server knows whether it's a
request for a web page or it's an email
or it's a chat message or a voice
message or any number of other things?
It turns out we need one more piece of
information at least on this envelope.
It turns out that the
world has standardized
via another protocol called TCP,
Transmission Control Protocol, that you
need at least one other
number on these envelopes,
and that number corresponds
to the type of service
that you're trying to
access or the type of data
that you're trying to send or receive.
So, for instance, 22 is for
something called SSH, Secure Shell.
This is something that most CS majors
might use, but most people in the world
wouldn't use this because
it's entirely command line
and it allows you to connect
securely to some remote server
without using something like a browser,
but all of us generally do use browsers
and HTTP, it turns out, all this time
has had a unique number associated
with all of those requests.
80 is the number and if we visited
any URL starting with https,
turns out there was a
special number, 443,
that humans years ago decided would just
uniquely identify encrypted web
traffic requests and responses.
587 is used for Simple Mail Transfer
Protocol, which is for email.
Excuse me, 53 itself is used for DNS.
So if you ever send a
message to a server saying
what is the IP address
of google.com, you're
using number 53 to identify
whatever machine or software can
answer that type of question, and
so we can actually see this too.
If I go back to my IDE and I actually
do curl -I https://www.harvard.edu,
this of course, worked
before and it was 200 OK,
but it also will work if I more
precisely say specifically send this
request to TCP port, or number, 80 and--
damnit.
Oh, it's wrong because I made a
compelling pedagogical mistake.
So what did I do wrong?
AUDIENCE: Https.
DAVID J. MALAN: Yeah, so I kind
of screwed up my numbers here.
So I said https, but I meant to
say http if I'm using port 80
or, conversely, if I want to talk
to the secure port which is known,
I actually want to say 443,
and that one in fact works,
and I can do it again even in Chrome.
If I go up to my browser
and go to http://yale.edu:80
and let the redirects
happen, that too will work.
It's just browsers, to keep our minds
focused on the website we're actually
trying to visit and not distracted
by technical details like :80
or slashes or even
sometimes http itself,
just hide that from the URL bar.
It's all there.
It's all happening, but we humans
are getting a little more comfortable
with the internet over the years
so Chrome and other browsers
are just starting to hide some of these
lower-level implementation details.
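What the browser hides can be spelled out in a few lines. This sketch returns the port a URL targets, using the two defaults named in the lecture; the function name and the dictionary are assumptions for illustration, and urlsplit comes from Python's standard urllib.parse module.

```python
from urllib.parse import urlsplit

# Default TCP ports mentioned in the lecture.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a URL targets: the explicit one, else the default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://yale.edu"))
print(effective_port("http://yale.edu:80"))
print(effective_port("https://www.harvard.edu"))
```

The first two URLs name the same port, which is why a browser can drop the :80 without changing where the request goes.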
So that really means, when I actually
want to send a request to a web server,
I should really write
:80 on the envelope
to make clear that that's
going to a web server listening
on port 80 or maybe 443,
and then, you know what?
It turns out, and we won't
dwell too much on the details,
even my Mac or your PC also has its own
port number for all of these requests,
right?
And it would be pretty annoying if you
could only visit one website at a time
or you could use Gmail
or Skype but not both
at the same time, or Facebook
Messenger or Google Chat but only one
at the same time.
That would be pretty
limiting, especially when
we have all this computing power.
So it's also the case
that your own computer,
any time you send a
request on the internet,
chooses a random or pseudo-random
number to uniquely identify
the piece of software on your
computer that's waiting for the reply.
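You can watch your own computer pick such a number. In this sketch, binding to port 0 asks the operating system to choose any free port; the function name is an assumption, and the exact number will differ on every run.

```python
import socket


def pick_ephemeral_port() -> int:
    """Ask the operating system for any free local TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 means "choose one for me"
        return sock.getsockname()[1]


print(pick_ephemeral_port())
```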
So this might be not port 80.
This is going to be a
bigger number like 1025,
or some large-ish value all
the way up to 65,000, even,
or 32,000 that now uniquely
identifies the port on my computer,
and that's how your computer can
do multiple things at a time,
and when I get the response
those values are just flipped,
but there's one more piece.
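The "values are just flipped" idea can be sketched as a tiny data structure. The class and field names are assumptions; 1.2.3.4 and 5.6.7.8 are the placeholder addresses used in the lecture, and 50123 is an arbitrary client port picked for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """The addressing written on the outside of a packet."""

    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reply(self) -> "Envelope":
        """Build the response's envelope by swapping source and destination."""
        return Envelope(self.dst_ip, self.dst_port, self.src_ip, self.src_port)


request = Envelope("5.6.7.8", 50123, "1.2.3.4", 80)
response = request.reply()
print(response)
```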
Like cats can be pretty high
quality and videos certainly
take up a huge amount of data.
Netflix videos and any
streaming videos are taking up
a huge amount of
information and it would
be pretty annoying to your
neighbors if any time you
were watching a movie
on Netflix, you had
to be done watching the
movie in order for a neighbor
to also watch a video on
his or her computer as well.
So it turns out that what
computers also do thanks to IP
and TCP is, when they're used together,
they offer one more feature still.
It turns out that if I want to
download a picture of a cat,
and we have a nice
printed version here, I'm
not going to get the whole cat
in the one envelope most likely.
This cat or this video
file or whatever it is
is actually going to be divided
up into a few different pieces.
So this message might get chopped
or fragmented into four pieces.
Each of those four pieces now might
go in each of one of these envelopes
here, here, and then here
with the third and fourth,
and what's nice, though,
about TCP/IP is that it
provides at least two features for us.
One, IP ensures that every computer on
the internet that speaks this protocol
has an address.
So IP handles the getting of
the data to some destination.
TCP, the other half of
this, ensures or guarantees
with high probability delivery--
that the data actually gets there.
Because as you might have gleaned
from even the animation of all
of the transatlantic cables and all
of the interconnections among routers,
things can go wrong, right?
Routers, it turns out,
can get overloaded.
Their buffers can
overflow such that they
can't handle all of the traffic
coming into them and in fact,
if you try to watch Game of
Thrones, some episode on HBO
and you couldn't access it at some point
or [INAUDIBLE] or some tool like that.
If they're overloaded,
what does that mean?
It just means the server, or the
routers between us and the server,
are getting so many darn envelopes
that they just can't keep up
and can't hold onto them all at once,
and so sometimes packets do get,
so to speak, dropped, both
physically and also digitally,
and this means some packet is lost.
And so what's nice about the internet
is that when my computer here
talks to the nearest Harvard router
that may very well have antennas
in a room like this or an access point,
I might send off a packet here and here
and let's send this all the
way to the back if you could,
but these packets, as you
can see, don't necessarily
need to travel the same
path because-- what's
your name in the second row?
AUDIENCE: Monsi.
DAVID J. MALAN: Monsi.
So Monsi is getting a little busy.
So Kara, if you could
route to someone else.
This is literally the effect
that happens on the internet.
If one router, like Monsi,
gets a little bit busy
and her attention is elsewhere or just
has too many packets to deal with,
she won't even necessarily
drop it but maybe
their path will just
be routed around her,
and that's what's nice about having
this mesh network around the internet.
Now unfortunately, one of
those packets can get dropped
and in fact this is a perfect example.
If you want to drop it, drop it.
Uh-oh, a packet was dropped!
What TCP does for us is the following.
Once those envelopes reach
hopefully one specific person--
OK, you are the lucky winner.
Whoever, wants to-- how many do we have?
Two there?
Where did the third go?
That's OK.
TCP can handle multiple
packets being lost.
AUDIENCE: It's over there.
DAVID J. MALAN: Oh, and so packets also
don't take the shortest path sometimes
on the internet.
So what might happen?
So let's assume for
the sake of discussion
that those packets did make their way
to at least one of our audience members
here.
He or she, upon receiving
them, would also
see not just the origin address
and the destination address.
There would also be some notation,
like a memo line on the envelope saying
1 of 4, 2 of 4, 3 of 4, 4 of 4,
so that the recipient can infer
from that little hint whether or
not they received all 4 or just,
as in this case, a subset
thereof, and in that case,
assuming the computer speaks
TCP, it can simply say,
hey David, resend me packet
number 1 or packet number 3
or whichever were actually lost.
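The "1 of 4, 2 of 4" bookkeeping can be modeled in a few lines. This follows the lecture's simplified picture rather than real TCP, which numbers bytes and uses acknowledgments; the function names and the sample message are assumptions.

```python
def fragment(data: bytes, pieces: int) -> list[tuple[int, int, bytes]]:
    """Split non-empty data into packets: (sequence number, total, payload)."""
    size = -(-len(data) // pieces)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(n, len(chunks), chunk) for n, chunk in enumerate(chunks, start=1)]


def missing(received: list[tuple[int, int, bytes]]) -> list[int]:
    """Return the sequence numbers that never arrived."""
    total = received[0][1]
    seen = {n for n, _total, _payload in received}
    return [n for n in range(1, total + 1) if n not in seen]


def reassemble(received: list[tuple[int, int, bytes]]) -> bytes:
    """Put the payloads back in order, whatever order they arrived in."""
    return b"".join(payload for _n, _total, payload in sorted(received))


data = b"picture of a cat"
packets = fragment(data, 4)

# Packets 4 and 2 arrive, out of order; packets 1 and 3 were dropped.
arrived = [packets[3], packets[1]]
print("please resend:", missing(arrived))

# After the resend, everything can be put back together.
arrived += [packets[0], packets[2]]
print(reassemble(arrived))
```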
And so together all of this
happens at blazing speeds.
10 milliseconds to do all that
back and forth to New Haven,
let alone even faster here
on campus, but those really
are the basic principles
and building blocks
that are just getting our data
from one place to another.
Of course, the real
interesting stuff happens
when we dig deeper into this
envelope and look at the contents.
Not just the cat as in this case, but
the language, HTML and something else
called CSS which we'll
do shortly, but I thought
it might be fun, especially on the
heels of our look at forensics,
to take a look at just how
sort of presumptuous Hollywood
tends to be when presenting us
humans with technical details
that now you'll perhaps have an
even better eye for in addition
to the age-old "enhance" line.
[VIDEO PLAYBACK]
- It's a 32-bit IPv4 address.
- IP as in internet?
- Private network.
[? Tamia's ?] private network.
[STUDENTS LAUGHING]
- She's so amazing.
- Oh, Charlie.
- It's a mirror IP address.
She's letting us watch what
she's doing in real time.
[END PLAYBACK]
DAVID J. MALAN: OK, so
we'll hold it on this screen
here because one, a few of you
laughed when you saw the bogus IP
address because the number was what?
AUDIENCE: 275.
DAVID J. MALAN: 275, which
is too high and that one
we could forgive because you don't
want like random people pausing
their videos on the internet then
trying to hack into or get access
to that URL, but even funnier is
when the hacker is being described
as doing this on the screen
as part of their attack.
This is like the source code in
a language called Objective-C
for some kind of drawing
program, as suggested
by the use of crayons in
the code as a variable.
So let's pause there and when we come
back in five minutes, we'll take a look
at HTML itself.
All right, so we're back and we're
about to learn a new language.
Though this might feel like
a lot to do in just an hour,
this one's a markup language.
So it's not a programming
language, which
means you're not going to see loops.
You're not going to see functions.
You're not going to see conditions
or any of the kind of logic
that we have built into C and
into Scratch and eventually
Python and JavaScript.
You're instead going to see just
what are called tags, pieces
of English-like syntax that
just tell the browser what to do
and what to stop doing.
So we're going to see tags that say
start making this text centered.
Stop making this text centered.
Start making the text bold.
Stop making the text bold.
So these very deliberate
kind of statements
that we're going to express
using something that's code-like,
but it doesn't give you logical control.
So as such, there's a pretty
small language ahead of us
and a lot of what you'll do
when learning HTML is just
check an online reference or an
example online or look at the source
code of actual web pages to just
figure out how these things are done
and today, we will focus
on the fundamentals.
So this is perhaps one of
the simplest web pages you
can write in a language called HTML.
It's a text-based language.
All of the tags resemble
some English words
and there's a pattern to the kinds
of things that you might type.
First of all, if you're using the
very latest version of HTML, which
happens to be version 5,
it's been around for a while,
you simply start every web page with
this cryptic incantation at the top
here.
Open bracket, !doctype
html, closed bracket,
as those things are called.
Angled brackets, which you've probably
not had many occasions to type
on your keyboard, but
starting soon you will.
Then after that, they start a pattern.
So <html> and then all the way at the
bottom is what we'll call the opposite
of that tag.
If this is a start tag, this will be
an end tag, or if this is an open tag,
this will be a close tag, differing
only with this forward slash that's
inside of the tag.
So this says, hey browser,
here comes a web page.
This says, hey browser,
that's it for the web page.
Again, this sort of starting
and stopping mentality.
Meanwhile, inside of the web
page as denoted by the HTML tag,
there are two parts, a head and a body.
The head of a web page tends
to contain very little.
It's usually just like
the title bar in the tab
that we humans see when
you visit a website,
and the body is like 95%
of the contents
of the page, the actual viewport
or the rectangular region
that contains actual content.
What is that content?
Well here in the head
we have a title that's
going to be "hello, title"
just because and then
in the body of the web
page, this web page,
there's going to be "hello, body."
That's it.
That's HTML.
If you save this text in a
file, open it in a browser,
you will see a really lame web page
that says hello title and hello body,
but that's a web page using
HTML tags as they're called.
Anything in these angled
brackets are tags.
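Written out, the page being described looks like the string below. It is held in a Python string here only so that all of the examples stay in one language; the names HELLO_HTML and write_page are assumptions for illustration.

```python
from pathlib import Path

# The page described in the lecture, held as a Python string.
HELLO_HTML = """<!DOCTYPE html>

<html>
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""


def write_page(path: str) -> None:
    """Save the page to a file so that a browser can open it."""
    Path(path).write_text(HELLO_HTML, encoding="utf-8")
```

Calling write_page("hello.html") and opening the resulting file in a browser should show the title in the tab and the body text in the page.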
So I can actually see this
pretty clearly even on my Mac
and you could do this
on your PC as well.
I've opened up TextEdit and I've
configured it to be simpler than
the default, so know that I've
done a little something in advance,
but you could use notepad on Windows
or any number of other programs,
even Microsoft Word if you save it
in the right way or Google Docs,
but let me go ahead and just recreate
this as !DOCTYPE html, open bracket,
html, and just to kind
of remember to do things,
I'm going to tend to get ahead of
myself and sort of start and finish
the thought and then dive in inside.
Let me go ahead and do head
here, close head tag here,
and I'm indenting, just for good
measure, one, two, three, four tabs,
though so long as you're
consistent the browser will
be perfectly content, as will we.
hello, title, title, open bracket,
open bracket body, closed bracket body,
and then hello, body.
So that's it.
I've just typed out the
exact same thing as before.
Let me go ahead and save this
as not hello.txt or certainly
not hello.c but
hello.html by convention.
I'm going to hit Save.
Mac OS is kind of warning me that this
is text, not something called HTML,
but I know what I'm doing and
I'm going to say use HTML,
and now I have a file called hello.html,
and if I go to my desktop, here in fact
it is.
And if I double click on it, there, in
fact, is that pretty simple web page
and if I actually reveal
the tab, there it is.
Hello, title in the very top tab of
the page and once I get rid of that
do I see the body again.
So that's it for HTML at least
in terms of its basic structure,
but there are some other features
that we can take advantage of as well,
and let's actually tease these apart.
Notice, first of all, that
there is indeed this symmetry.
What is opened is almost always
closed as well in the opposite order.
Just as head here and
title here, and then
followed by body and then
the contents therein,
but because there is this
structure, you can actually
think about this in a relation
to the past couple of weeks
when we've talked about data structures.
I would argue that this
HTML on the left is
kind of equivalent to
this tree on the right,
and we didn't spend a huge amount
of time talking about trees,
and even when we did we used
them for algorithmic reasons
like a binary search tree to
search data pretty efficiently,
but if you think about it, here
is the document, which I'm just
drawing with this shape
here kind of arbitrarily
and it has one child like the
entire page as I'm drawing it,
which is the HTML tag here.
The HTML tag has two children,
so to speak, to borrow
our language from our data structures.
So head and body from left to right.
Head has a child called title and
then title has a child of some sort,
even though it's just raw text.
It's not another tag
with angled brackets,
just as body has its own
content there, just hello, body.
So that hierarchy and the deliberate
indentation, which is there just for us
humans-- the browser does
not care about whitespace--
lends itself to an
implementation in memory,
and so long story short, when
your browser receives an envelope,
inside of which are not just those
HTTP headers, outside of which
are not just the IP
address and TCP port,
but inside of which is a text
file containing HTML like that,
all the browser does is
load that file into memory,
read it top to bottom, left
to right and essentially build
a tree structure in memory so
that it knows how to represent it
underneath the hood, so to speak.
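Here is a rough sketch of that tree-building step, using the HTMLParser class from Python's standard library. It is a simplification, not how a real browser works: it assumes every start tag has a matching end tag, and the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a nested (name, children) tree from tags and text."""

    def __init__(self):
        super().__init__()
        self.root = ("document", [])
        self._stack = [self.root]  # the node being filled is on top

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self._stack[-1][1].append(node)  # become a child of the open node
        self._stack.append(node)

    def handle_endtag(self, tag):
        if len(self._stack) > 1:
            self._stack.pop()  # assumes tags close in the opposite order

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only runs, i.e. the indentation
            self._stack[-1][1].append(text)


def outline(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    if isinstance(node, str):
        return ["    " * depth + repr(node)]
    name, children = node
    lines = ["    " * depth + name]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


PAGE = (
    "<!DOCTYPE html><html><head><title>hello, title</title></head>"
    "<body>hello, body</body></html>"
)
builder = TreeBuilder()
builder.feed(PAGE)
print("\n".join(outline(builder.root)))
```

The printed outline has the document at the top, html beneath it, head and body as html's two children, and the raw text at the leaves.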
And in fact, you've
seen HTML all around you
even if you've just never looked
underneath the hood, as we say.
In fact, if I go to
like harvard.edu and let
the redirects happen in the usual way,
let me go ahead and inspect the page.
This is another way in
Chrome and in other browsers
to get at the developer tools.
You can control click or right click
on the web page and choose Inspect.
That opens up the same tab.
Previously, we used the network
panel, but if I click on Elements
you can actually see all of the
HTML that composes Harvard's page,
and it looks beautiful here.
It's nicely color-coded.
It's prettily indented.
I can dive in deeper
with all of these arrows,
but that's probably not
how the humans made it
because if I also right click or control
click and choose View Page Source,
and you can do this in
any browser as well,
here is the mess that actually
came back from Harvard's server.
This is HTML and my
god, like, it's a lot.
I see no indentation, so
style 0 here, but that's OK
because it's a browser reading it.
It's not a human in
this case and similarly,
if we visit something like yale.edu, and
let's go ahead and open up their page
source, it's similarly going to be
kind of overwhelming and a lot of it,
but rest assured that even though
these web pages might look really,
really sophisticated-- like, my god,
we've never written a C program with
500 plus lines of code--
a lot of this stuff is
generated, and in fact,
one of the challenges of pset7 and
pset8 when we explore web programming
is going to be not to write hundreds
of lines of HTML, which would just
get mind numbing quickly, but
to write a few lines of Python
or a few lines of JavaScript that
programmatically, like with loops,
generates all of the
structure of your web page.
So if it's like a web page of
photos like a Facebook photo album,
Facebook doesn't have people writing
out thousands of lines of HTML code
every time you upload a photo.
They have code in PHP
or some other language
that has a for loop that iterates
over all of the photos you've uploaded
and spits out the same HTML but
different image for each of the photos
you've uploaded, and that's where
web programming comes into play.
You're not writing the
HTML, you're generating it
by actually writing programs.
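A minimal sketch of generating HTML with a loop might look like this in Python. The function name and the photo file names are hypothetical; html.escape is from the standard library and keeps unusual characters in a URL from breaking the tag.

```python
from html import escape


def photo_album(urls: list[str]) -> str:
    """Return the HTML for a page with one <img> tag per photo URL."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "    <head>",
        "        <title>photos</title>",
        "    </head>",
        "    <body>",
    ]
    for url in urls:
        # Same HTML each time through the loop, but a different image.
        lines.append(f'        <img src="{escape(url, quote=True)}">')
    lines += ["    </body>", "</html>"]
    return "\n".join(lines)


print(photo_album(["cat1.jpg", "cat2.jpg", "cat3.jpg"]))
```

Three photos or three thousand, the program stays the same length; only the list grows.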
So today we set the
stage for that capability
but first we just need a
framework for actually doing this.
So rather than use,
now, my local Mac, which
is kind of lame because I can open the
web page but no one else in the world
can access it, and in fact,
if we do that again, you'll
notice here, if I double click on
hello.html and open the URL bar,
it's curiously clearly
not on the internet.
Like, it's not http, it's not
https, it's literally file://,
which just means it's a
file on my local computer.
So none of you could reach
that because of course
this user jharvard on my laptop
exists only on my local Mac.
So fortunately we have a web-based
IDE with which to put stuff
on the internet, but there's a catch.
The IDE itself, recall, is
a web application, right?
It's code that friends at Amazon
wrote and that we added to that runs
on a server somewhere and, as we'll
see, somewhat in your browser too,
but more on that when we
talk about JavaScript,
but CS50 IDE already has a URL like
https://cs50.io or https://ide.cs50.io
slash whatever your username is.
So we're already using port 80
or maybe 443 for the IDE itself.
So how in the world could you
write web pages in the IDE
and then serve them on the
internet if the IDE itself
is already using the standard port?
Well fortunately you can
write on the envelopes,
when trying to access your own web
pages, a hardcoded TCP port number.
It doesn't have to be 80,
it doesn't have to be 443.
Those are just the defaults.
If I want to actually
visit pages in my IDE,
I can just run a web server
on a different port number,
like 8080 by convention
or 8081, 8082.
Just a pretty big number that odds are
no one else is using on some system.
So let's see this as follows.
Let me go ahead and in the
IDE here create a new file.
I'm going to call it
hello.html and I'm just
going to go into that text
file, whoops, which I closed.
Let me go ahead and just
grab the code that we've
been using here, which is
right here, go back to the IDE,
paste it into the text file here,
click Save, and now I have in the IDE
a file called hello.html, and
indeed if I look at the file browser
and I look on the left-hand side,
there, in addition to the sample code,
is hello.html, but if I double
click this file it's not
very useful because it's going
to open the editor, which
is not like a web page.
It's the source code for my web page.
So I actually now need
to run a program that
serves this file just like Facebook
does, just like Google and Harvard
and Yale do, and I'm going to do this
literally by running http-server,
and I'm going to say on port 8080.
So -p in this particular
program means port
and I'm just going to say, hey CS50
IDE, start a program called http-server
whose purpose in life is to listen
for requests on the internet,
but specifically on that port number,
and serve up whatever requests come in.
So I've gone ahead and hit Enter here.
Starting up http-server.
It tells me the long URL
that this is available at.
Your URL will be a little
different with your username
and if I open this now in another tab,
it's a little cryptic at first glance.
I'm just seeing the index or contents
of my directory and in there is like
a secret .c9 for Cloud9 directory.
Don't delete that or change that.
That just has metadata
related to the IDE.
Source6 I downloaded earlier and you
can too from the course's web site,
but there's hello.html,
and on the left-hand side
here, you'll see some
cryptic looking permissions.
This has to do with who can read
and who can write your files,
but for today all I care
about is that the file exists.
So now, like a user on the internet,
I'm going to go to here, click on it,
and voila!
There is my actual web page.
So notice, the URLs are very similar.
Here I am on cs50.io
and here I am on cs50.io
even though your user names
will of course be different,
but the IDE is running
on the default port, 443.
I'm now temporarily
serving up my HTML files
using port 8080 just
because and so that's
how a server can do multiple
things and how you can do
multiple things on the server at once.
So let's do something else besides that.
Let me actually introduce a
few other fundamentals that
might be handy when writing HTML
and let's go ahead and do this.
Let me go ahead and create a
new file and we'll call this one
paragraphs.html, and let me go ahead and
just name this like paragraphs and down
here I'm going to have
some paragraphs of text,
and I don't really know what I want
to say so I'm going to Google some--
so standard Latin-like text.
Oh, I want like three paragraphs of
Latin-like text and so here we go.
Then there's a random
website that just generates
placeholder text in faux Latin.
So, Paste.
There are my three paragraphs.
I'll be a little nice
and tidy and indent them
so it looks at least
somewhat nicely styled.
Save the file and now let me go back
to the URL I was at a moment ago.
Now notice I have two files being
served by this HTTP server program.
Click paragraph-- oh.
OK, one, Chrome thinks
the page is in Latin.
[STUDENTS LAUGH]
Actually, soccer inferior
element estate planning time.
Tomorrow soss quiver before as the--
that does sound like the
Latin I learned years ago.
All right, so Show Original.
So the point is not to focus on
the Latin, but the apparent bug.
Like, what's it not doing that maybe
you thought it should a second ago?
AUDIENCE: No indentation.
DAVID J. MALAN: Yeah, there's no
indentation and also there's no what?
There's no break.
I mean this is one big
Latin-like paragraph.
It's not three.
Well this is simply because a browser
only does what you tell it to do.
Let me go ahead and shrink this
window and, as an aside, what you're
seeing here, all this mess in
the bottom terminal window,
as the http-server program is running,
it is logging all of the HTTP requests
that come in from browsers just so
you can kind of debug or diagnose,
but we're going to just
ignore that for now
and let this thing run down
here in the background.
But if I want paragraphs I need
to be a little more pedantic
and actually say, hey browser, make a
paragraph with what's called the p tag,
and let me go ahead now and indent
even though the indentation clearly
doesn't matter.
It's just to keep my code nice and tidy.
So, hey browser, start a paragraph.
Here's the text.
Hey browser, stop the paragraph.
Same thing here.
Let me go ahead and start a paragraph.
Then let me go ahead
and stop the paragraph.
Notice the IDE is trying to be helpful.
This is not helpful.
This is not a password, but it's
trying to autocomplete my thoughts.
That's fine.
I'm just going to ignore it.
Then let me go ahead and
close the paragraph and save.
So it's a little more verbose, but
anything in the tags the human is not
going to see, but when you reload the
page, as with command or control+R,
or if you go up here by
clicking the reload icon,
whatever it looks like in your browser.
Now I have three Latin-like paragraphs.
So it's a little more deliberate here.
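As a rough sketch of the markup just
described, the page might look like
the following. The shortened
Latin-like sentences are placeholder
text assumed here, not the text
pasted in the demo.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>paragraphs</title>
    </head>
    <body>
        <!-- Each p element starts and stops one paragraph -->
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        </p>
        <p>
            Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
        </p>
        <p>
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.
        </p>
    </body>
</html>
```

The indentation inside each paragraph
is only for the human reading the
source. The browser collapses it.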
So that's all fine and good, but
the web is kind of more interesting
when you can actually link to things.
So let's actually do that instead.
Let me go ahead and create a new
file called, let's say, link.html.
Go ahead and paste this here and
say we'll name the title link.
Let me get rid of all of this
just so I have some placeholder
and I can say something
like "Hello, world!
My favorite school is..."
and just to play it safe
today, "stanford.edu."
Save, reload, click
link.html and nothing.
So here too it looks like a domain
name and it certainly is, and frankly,
all of us now are probably conditioned
in tools like Slack and Gmail and other
tools and Facebook that just
kind of figure out that, oh,
if something looks like a
domain name, make it a link,
but that's because someone at Facebook,
someone at Google knows HTML and knows
how to use if conditions
and elses and just says, oh,
if a string that the human has typed
in looks like a domain name ending
in .edu, make it a link.
But how do you make it a link?
We can now do this manually.
It turns out you need an
anchor tag abbreviated as a
and then I'm going to close the
anchor tag at the end of the text
that I want to anchor a link
to, but this isn't enough.
I need to be ever so explicit as
to where I want this link to go,
and so it turns out HTML also
supports what are called attributes.
So tags are the things
in angled brackets.
Attributes are also inside
those angled brackets,
but they come after the tag's
name, and they're just going
to modify the behavior of the
tag, and it makes sense here
to need to modify the
behavior because 20,
30 years ago when HTML was
invented, we didn't make up
a tag that leads to stanford.edu.
We made up a more generic tag
that anchors to some destination,
and so here I can now do
www.stanford.edu, save the file,
and notice, this is like saying
to the browser, hey browser,
here comes a link or hyperlink
to Stanford's web site,
and then the end here it says hey
browser, that's it for the link,
and thankfully it's not super verbose.
You don't have to repeat
the attribute at the end.
You just repeat the
tag's name, otherwise
you'd be typing the same
thing again and again.
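A minimal sketch of that anchor tag
with its href attribute might look
like this. The explicit https://
scheme and trailing slash are
assumptions here.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>link</title>
    </head>
    <body>
        Hello, world! My favorite school is
        <!-- href says where the link goes; the close tag repeats only the tag's name -->
        <a href="https://www.stanford.edu/">stanford.edu</a>.
    </body>
</html>
```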
If I now go back here and reload the
page as with command or control+R,
now it becomes the familiar
blue underlined link,
and if I hover over that,
notice first it's super small.
You can see where the link
is actually going to lead,
and so if I click on this we'll
see Stanford's website and voila.
So now we've visited their page as well,
but there's an interesting side note
here, and if you want to kind of think
about things called phishing attacks
or frankly, Harvard once in a
while and Yale once in a while
will email out warnings like
"beware of this phishing attack."
P-H-I-S-H-I-N-G.
This is when people on
the internet generally
send you emails or some kind
of spam trying to trick you
into visiting a phony website to harvest
your usernames, passwords, credit card
numbers and whatnot, and honestly,
most of those phishing attacks
boil down to this
10-line example of HTML
because what's to stop me from
saying something like "Hello, world!
Confirm your password at..."
and then we'll say like
paypal.com and then
over here, I can change this
to like davidsphishingsite.com,
which hopefully doesn't exist.
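To make the mismatch concrete, here
is a sketch of such a page. The
destination below is a reserved
placeholder domain assumed for
illustration, not any real site.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>link</title>
    </head>
    <body>
        Hello, world! Confirm your password at
        <!-- The visible text names one site, but href points somewhere else -->
        <a href="https://phishing.example/">paypal.com</a>.
    </body>
</html>
```

The text between the tags and the
value of href are independent, which
is why the visible text alone says
nothing about the destination.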
One year I went to badplace.com and--
anyhow, so--
[STUDENTS LAUGHING]
Here I've gone ahead and saved the file,
reloaded, and the link is indeed blue,
but before I click on it,
only the most astute of users
is going to even bother
checking the bottom left hand
corner to see where they're
about to be whisked away to
and even most of us in
this room, myself included,
are not so paranoid
that we're constantly
checking those kinds of things.
Odds are, if I get an
email like this, oh
my god, my account's been compromised.
I've got to go confirm my password
for PayPal to protect my money.
You might very well
just follow the link,
but of course it can go anywhere you
want just via this very basic building
block, but this is just one
way you can vet actually
what's going on underneath the
hood, but of course the internet
is more interesting
than just text alone.
Let me go ahead and open up
an example that I whipped up
in advance here using image.html
and we'll see another tag here.
So here is another opportunity
to use an attribute
and one that's also not
necessarily visible to the user.
So here's an image tag.
Humans years ago decided to be succinct.
It's <img> for image, just
like it's just <a> for anchor.
The source, src, of which is
going to be that file, dan.jpeg,
which I downloaded in advance
from the URL up above,
and in fact, this is gray in the CS50
IDE because it's syntax highlighting it
just like in C. This is
what's a comment in HTML.
So if you want to make notes
to yourself or to viewers,
some sentence or like
a citation like this,
you can use an HTML comment by doing
<!-- and then --> and you can write anything
between those things-- for
the most part-- that you want.
So just as in C we have the //.
So here's the source
of this image and this
is like an alternative explanation of
it, alt. Why might this be compelling?
I want to show the image to a user.
Yeah?
AUDIENCE: Is it for like if
they hover their mouse over it,
they can see what's happening.
DAVID J. MALAN: Yeah,
so a couple of reasons.
If you hover over the image you can
actually see some descriptive text.
So like Handsome Dan
here, like Yale's mascot.
If the user has trouble
seeing or is blind,
you might need a screen
reader to actually tell you
what it is that's on
the screen, and it's not
obvious from dan.jpeg
what that could be,
but if you have this
alternative text, a computer
can recite verbally Handsome
Dan, which might then
jog the person's memory as to what
it is that's actually on the screen.
Or if you have a really
slow internet connection,
sometimes you'll see a
placeholder for an image
that just says what it is before
the image actually downloads.
So being mindful of
these kinds of things
will just make, ultimately,
your websites more accessible,
and indeed if I go to this one now
and go into my source6 directory
where we have even more examples
at our disposal and go to image.html,
here is their adorable Handsome
Dan as of this past year.
So there's an image.
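A sketch of what that page might
contain follows. The file name
dan.jpeg and the alt text come from
the walkthrough. The wording of the
comment is an assumption, since the
actual source URL is not given here.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- A note such as where the photo was downloaded from can go in a comment -->
        <img alt="Handsome Dan" src="dan.jpeg"/>
    </body>
</html>
```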
We can kind of do funky
things now with nesting.
So this is not all that interesting
because it doesn't go anywhere,
but I could just combine these ideas.
I could do a href = http://www.yale.edu
or, because I don't want the user
to bother getting redirected,
I could just proactively make
it secure because I know Yale
supports that per earlier,
and I can nest these tags like this.
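Nesting the two ideas might look
roughly like this, assuming the same
dan.jpeg file and the https version
of Yale's address.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- The image sits inside the anchor, so the image itself becomes the link -->
        <a href="https://www.yale.edu/">
            <img alt="Handsome Dan" src="dan.jpeg"/>
        </a>
    </body>
</html>
```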
Now if I go here, reload,
it still looks the same
but notice my cursor changes to like a
pointer, and if indeed I click on that,
now the image leads to Yale's web
site, but I skimmed over something.
One of these is not like the other.
What detail have I
kind of not mentioned?
Yeah.
AUDIENCE: The image file
closes within itself.
DAVID J. MALAN: Yeah, the image tag
kind of closes in and of itself,
and so there are some of
these anomalies within HTML
where there really isn't a notion
of, like, start doing something
and then eventually
stop doing something.
Like, an image is either
there or it's not.
Like, you can't kind of put
something in between it conceptually,
and so some of these tags in
HTML are what are called empty.
Like, they should not have
anything after the open tag
or before the close tag.
So if you wanted to be really sort
of precise you could say this,
but you should not put
anything where my cursor now
is because it would make no sense to
try to put something inside of an image,
but this is just kind of lame to
have this unnecessary verboseness.
So you can just put the slash in
there and technically in HTML5 you
don't even need the slash in
this case, but at least this way,
and I think for pedagogical purposes,
doing it, even for empty tags,
makes it clearer, visually,
that your tags are balanced.
So that's the only
anomaly there and then
there's bunches of others which we
can fly through really quickly here.
So if I go back to our examples
here, I whipped up headings.html.
So if you want to do
something like this if you're
writing like a book or a website
that has like chapters and sections
and subsections and so
forth, HTML lets you
easily format things as big and
bold, slightly smaller and bold,
slightly smaller and bold, and so
forth by using the h1 through h6 tags.
So if I go into headings, this
is how I made this web page.
I simply have h1, h2, h3, h4
opened and closed and that's it.
So any time you're reading
some kind of online text,
odds are they're using one or more
of these tags to format the page.
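For reference, a page using all six
heading levels might be sketched as
follows. The heading text is made up
for illustration.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>headings</title>
    </head>
    <body>
        <!-- h1 is the biggest heading; each level after it is a bit smaller -->
        <h1>One</h1>
        <h2>Two</h2>
        <h3>Three</h3>
        <h4>Four</h4>
        <h5>Five</h5>
        <h6>Six</h6>
    </body>
</html>
```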
If we look at another example in here,
we have something like list.html.
Lists are not uncommon on the internet,
you'll never believe number three,
and here's how you might do something
with a bulleted list by just marking up
three words-- foo, bar and baz--
and the HTML for this,
if I open up list.html,
simply looks a little more verbose
in that we need a parent element so
to speak, borrowing
our tree terminology,
but here we have an unordered
list, or ul, each of which
has one or more list
items, or li, each of which
open and close foo, bar and baz.
And if I really want it
numbered, I can also do this.
I can change unordered list to ordered
list, ol, reload and now the browser
figures out the numbering for me,
which is nice if you have lots of data
and you don't want to deal with
actually laying it out yourself.
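Here is a sketch showing both kinds
of list side by side, using the same
three words. Putting both lists on
one page is an assumption made for
comparison.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>list</title>
    </head>
    <body>
        <!-- ul is the parent; each li child is one bulleted item -->
        <ul>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ul>

        <!-- Changing ul to ol makes the browser number the items -->
        <ol>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ol>
    </body>
</html>
```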
Meanwhile, we can go one or two
steps further before we actually
get to something functional.
Here is kind of the
most complicated of all,
but it too just kind of
tells the browser what to do.
So before we look at the result, this
says, hey browser, here comes a table,
like tabular data.
Rows and columns like Excel
or Google Spreadsheets.
Hey browser, here comes
a table row, or tr.
Hey browser, within that row,
here comes some table data, a.k.a.
a cell or column.
Here comes another cell.
Here comes another cell.
So that's one, two,
three cells in a row.
Hey browser, here
comes three more cells.
Hey browser, here
comes three more cells.
Hey browser, here comes three
more cells and if we actually
render this in the browser, you can
see the layout of a sort of old school
phone pad on your phone.
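The table just described, four rows
of three cells each, might be
written like this. The exact symbols
in the last row are an assumption
based on a typical phone keypad.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>table</title>
    </head>
    <body>
        <table>
            <!-- Each tr is a row; each td is one cell in that row -->
            <tr>
                <td>1</td>
                <td>2</td>
                <td>3</td>
            </tr>
            <tr>
                <td>4</td>
                <td>5</td>
                <td>6</td>
            </tr>
            <tr>
                <td>7</td>
                <td>8</td>
                <td>9</td>
            </tr>
            <tr>
                <td>*</td>
                <td>0</td>
                <td>#</td>
            </tr>
        </table>
    </body>
</html>
```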
It's not very pretty, it's
not very well formatted,
but if we zoom in you really do see
that it is lined up in rows and columns
as I sort of verbally implied, but
this is all very kind of underwhelming.
Like, Google is cool
because you can go to it
and you can actually search for cats
and find lots of cats on the internet,
but how is it that this actually works?
So, aww, bad news today.
OK, so we'll just zoom in on this one.
OK, so let's try to focus
on the pedagogy here--
of cats-- as follows.
Let me go ahead and focus on really the
URL, which is kind of long and cryptic,
but let me just throw away honestly
anything that kind of looks confusing
or I don't understand.
I have no idea what source means
so I'm going to get rid of that.
I have no idea what
the rest of this means.
I'm going to get rid of that and I'm
going to try to distill-- granted,
with some foresight because I
knew how Google works here--
I changed the URL to something
much, much, much simpler.
Cats, where it's
www.google.com/search?q=cats.
It seems that, somehow or
other, Google's behavior
is controlled by information
that's conveyed in the URL,
and it's not just that I'm searching.
It's that I'm searching for cats.
So in fact, on a whim, I'm going
to search for dogs instead and hit
Enter, and indeed a few things change.
We have all these dog images
appear here on the right.
We have the text
pre-populated up here and we
can search for any
number of other things
here, like Harvard
Yale prank 2004, Enter,
and there you have a Wikipedia
article on the video we saw earlier.
So it seems that you can
parameterize the behavior of Google
just by understanding
how this URL works.
So here is kind of the path
that's being requested,
the file or folder or whatever that is.
A question mark says, hey
browser, or hey server,
rather, here come some HTTP parameters.
Some inputs from a human who's either
filled out a form or apparently
is kind of hacking the URL bar here,
and then the name of the parameter
comes next. q, meaning query, and this
is what Larry and Sergey decided years
ago for their search
box, an equals sign,
and then whatever it
is the human typed in.
Now it got a little funky here quickly.
Now you see %20.
That is the web's way
of encoding a space so
that it's not a physical space,
it's all one contiguous string.
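As an illustration of those pieces of
a URL, here is a hypothetical link
that carries the same query. Writing
it as an anchor tag is an assumption
made to keep the example in HTML.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>query</title>
    </head>
    <body>
        <!-- /search is the path, ? starts the parameters, q is the name,
             and %20 stands in for each space in the value -->
        <a href="https://www.google.com/search?q=Harvard%20Yale%20prank%202004">
            Harvard Yale prank 2004
        </a>
    </body>
</html>
```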
So it's just one contiguous string for
the server to actually look at or read,
and so why is this useful?
Well it turns out I can
leverage this information
and kind of implement my
own Google pretty easily.
Let me go ahead and go into search.html,
one of the other examples I whipped up,
and you'll see another tag altogether.
Inside of the body of this
page is an HTML form tag,
and the form tag takes a
couple of attributes I know.
One is action, which
is the URL to which you
want to send the form's
information, and the other
is the method that you want to use.
Now it's a little inconsistently
lowercased here just because,
but we did see that verb before.
Where?
Where did we see this verb?
This was like the somewhat arcane
message that was going, supposedly,
inside one of these envelopes when
we said GET in all caps, as in GET / HTTP/1.1
and so forth.
So it seems that if you
want, as the web developer,
to create an HTML form that has text
boxes and maybe checkboxes and dropdown
menus and so forth that submits its
information when the user clicks Enter
or a button to this address, and you
want it to go inside of a virtual
envelope using that GET verb, you
literally just say method=GET.
And then down here I seem to have
two inputs, one of whose names
is q, the type of which is a
text box, and the other of which
is a submit type, whatever that
is, the value of which is search.
Now you would only know what these
things mean by seeing them demoed
or looking at some online reference, but
if we pull this up to see the results
we have a super simple--
and I'll zoom in--
very, very simple
version of Google, right?
It doesn't even have the logo, but it does
have, I claim, all of the functionality
because watch what happens if I type in,
for instance, whoops, birds and click
Search.
Oh my god, I implemented Google
with just like 15 lines of code,
but not really, right?
Like, I've implemented
the front end of Google,
which-- I've got to start Googling
these things in advance.
OK, uh, these are very sad stories.
[STUDENTS LAUGH AT MORBID NEWS
 HEADLINES]
DAVID J. MALAN: OK, so the point though
is, the point-- look up, look up.
The point is that the
URL is what I generated.
So using those HTML tags coupled
with the human's cooperation
and actually clicking
a button did I then
generate this URL, whisk
the user away from the IDE
to google.com, where Google
is handling the back end,
like all of the hard work,
actually checking their database,
rendering the HTML, but
I made the front end,
the user interface via which you can
actually interact with Google's search
engine there.
And it boils down to just
these basic heuristics,
but of course this is a pretty
ugly search engine, right?
Black and white text box, a
gray button and that's it.
Like, even Google, simple though it
is, has a little bit of style and color
to it and things are centered
and kind of spaced differently.
So there's an art to this
ultimately and indeed
being a web designer in
itself is a profession
and in fact, you'll find in
industry that some people are
good at front end design.
Some people are bad at it.
I'm among the worse ones.
Like, my web pages look like that
search box just a moment ago,
but some people really prefer the
non-graphical stuff, the back-end,
the database stuff, and indeed one of
the takeaways over the next few weeks
will be for you to figure out for
yourselves if you like any of this
at all certainly, but also
like what your preferences are.
And you might hear terms
in industry these days
like front-end developer,
back-end developer.
That just means do you work on what
the user sees in their browser or app
or do you work on the back-end,
the database stuff that's
really important and
sometimes quite difficult,
but that the user doesn't
interact with directly.
Or are you a full-stack
developer, which means you just
do all of this, which
all of you from CS50
are effectively, albeit after just
one or so semesters of background.
So how do we start, though,
to make things prettier?
Well it turns out that HTML, for the
most part, is just a markup language.
It's for structuring a web page
and semantically tagging things,
and by semantically
tagging things I mean
like, hey browser, here's
the head of my page
and that's a concept, semantically.
Hey browser, here's the body
of my page, and that too
is a concept, semantically.
I didn't say anything about bold
facing or font size or colors
or all this stuff that's important
for a good user experience, or UX,
but that can be decoupled
from HTML, and in fact,
one of the challenges as you
learn HTML for the first time
is to try to make your way through
various online resources and references
that will sometimes combine these ideas.
So, again, today we'll focus not just
on correctness, getting things to work,
but design as well.
So here, for instance,
is a super simple web
page for someone named
John Harvard that has
a header and a main part and a footer,
and header is distinct from head.
It's sort of poorly named here.
Head of the web page is just the tab
bar and other such things up top,
but semantically you might have
a page with like three parts.
Like the header, like the title
on the body of the page itself,
like the main part where
the actual contents
are, and then a footer like a copyright
symbol or something like that.
So this might be a general
division of a page,
but notice I've styled
it a little differently.
Let me go ahead and open this up in
a browser as I did just a moment ago
and go to, sorry, I'm going back
through my entire internet history here.
Let's go ahead and open this up
just as we did before at this URL
so that we can go ahead
and open up CSS0.html.
Notice that, oh, this is already
marginally better than the pages
we've looked at before if only because
it's centered, which is a step forward
from everything just being left.
The first line is a little bigger.
The second line is kind of medium
and the bottom line is the smallest.
So there's a little bit of style
here, but not all that much.
So how did I actually do this?
Well take a look at the code here.
I have added, now, a style
attribute to several of my tags.
So the header, the main
and the footer really
aren't styled in any specific way.
They're just a way of
telling the browser this
is the important stuff
for the title, this
is the important stuff
for the main part,
this is the important
stuff for the footer,
but the stylization or aesthetics
come from this yellow text
here, thanks to the IDE
syntax highlighting it,
and notice this text
follows a different pattern.
Up until now, we've been using
angled brackets and words
and equals signs and quotes.
Now, inside of those quotes,
we also have another pattern
when you're using this second
of two languages today, CSS.
font-size: large is the stylization for
this particular element's content.
text-align should be center.
These are two CSS properties.
CSS, cascading style sheets, and
we'll see what that means in a moment,
but this is just how you configure
the style of those elements,
and indeed that's why one is a little
bigger and then a little smaller
and then even smaller because, notice,
I did font-size: large, font-size: medium,
font-size: small.
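A sketch of that first version, with
a style attribute on each of the
three tags, might look like this.
The text inside each element is
assumed for illustration.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>css0</title>
    </head>
    <body>
        <!-- Each style attribute holds CSS properties as name: value pairs -->
        <header style="font-size: large; text-align: center;">
            John Harvard
        </header>
        <main style="font-size: medium; text-align: center;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small; text-align: center;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```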
All right, but as we've often done,
let's iteratively improve upon this.
Even if you've never
seen HTML or CSS before,
there's some poor design
manifest in this simple example.
What might you say seems wrong or
seems a little copy paste-like?
Yeah.
AUDIENCE: They're all
centered [INAUDIBLE]..
DAVID J. MALAN: Yeah,
they're all centered
and I literally like copied and pasted
that CSS property, its key value
pair, its name and
value, again and again
and again, but remember
the hierarchy of HTML
and the DOM, Document Object Model,
the tree we drew a little bit ago.
All of these elements--
header, main, and footer--
have a parent element called what?
AUDIENCE: Body.
DAVID J. MALAN: Yeah, body.
So one level higher,
which is indented this way
or in the tree is higher up in that
family tree-like drawing, all of these
are children of body.
So why don't I just move or
factor out text-align: center
into the elements above it?
And herein lies the cascading of CSS.
Cascading style sheets means that
if you have a property up here,
it will cascade down to all of the
children and descendants below it
and it means another thing, too.
You can even override
these properties somehow,
but we'll see that before long.
So if I go ahead now
and open up CSS1.html,
notice that I did
exactly that improvement.
The code's a little tighter now.
It's fewer characters,
easier to maintain
because now if I want to change
it to left or right or center,
I change it one place, not three.
And so this is kind of consistent with
some of our design takeaways from C
and indeed, if I visit this page,
CSS1.html, it looks the same,
but it's better design
underneath the hood.
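Factoring the shared property out to
the parent might be sketched as
follows, again with assumed text
inside each element.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>css1</title>
    </head>
    <!-- text-align is set once here and cascades down to the three children -->
    <body style="text-align: center;">
        <header style="font-size: large;">
            John Harvard
        </header>
        <main style="font-size: medium;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```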
But we can do a little better still.
If I open up CSS2.html,
notice that I've done this.
I rather like this design now
because it's even more succinct.
I'm not using the style
attribute anymore.
I'm using a different
attribute called class,
and class is kind of a way to define--
much like a struct in C lets you define
your own data types, a class in CSS
allows you to define a name for
a whole bunch of properties,
and so here I just said let's call
this class large, medium, and small,
and I don't know what
those mean, and frankly I
might be working with a friend
who's much better at design
than I am so I'm going to let him or
her actually define these meanings.
I'm just going to kind of tag
things in this way semantically,
but if we scroll up in this file, you'll
see that for now I have no such friend,
and so I implemented it myself,
and here's, for the first time,
one other thing in the head of the page.
Up until now, we've just had
the title, but it turns out
you can have a style tag.
Not just an attribute, but
a style tag inside of which,
it's a little cryptic at first glance,
but there's some pattern here, clearly.
You have all of those properties,
but the new syntax here
is that if you want to define
a word called centered,
you literally do a period
and then the word centered.
If you want a word like
large, you say .large.
So it's similar in spirit, though not
quite the same as like typedef in C,
but you say .centered,
.large, .medium, .small.
You use our old friends curly braces,
which we will only see in CSS,
and this just defines
one or more properties
to be associated with that new keyword.
And so, if we scroll
down here to the bottom,
you'll see that I centered the body.
I made large the header, medium
the main, and small the footer,
and the result is going
to be exactly the same.
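The class-based version might look
roughly like this. The class names
follow the ones mentioned above. The
element text is assumed.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <style>

            /* A leading period defines a class with that name */
            .centered
            {
                text-align: center;
            }

            .large
            {
                font-size: large;
            }

            .medium
            {
                font-size: medium;
            }

            .small
            {
                font-size: small;
            }

        </style>
        <title>css2</title>
    </head>
    <!-- The class attribute applies the properties defined under that name -->
    <body class="centered">
        <header class="large">
            John Harvard
        </header>
        <main class="medium">
            Welcome to my home page!
        </main>
        <footer class="small">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```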
Very underwhelming, but
again, marginally better
design because now we are just one
step away from really improving this.
If I do finally have
that friend, it's not
going to be very easy to
collaborate, ultimately,
if we're both working on the
same file and moreover, it
seems unnecessary to
introduce these semantics.
Like, why do I have to have tags
like header and main and footer
and classes called large and
medium and small and centered?
Like, why don't I leverage the
names of these tags themselves?
And this is where HTML
can be pretty powerful.
Notice I've simplified
some of my CSS up top.
I've dropped the period,
which was like typedef.
Like, give me something called large,
give me something called medium.
Now I'm just saying literally a word,
but those words are identical to what?
AUDIENCE: The tags.
DAVID J. MALAN: The tags themselves.
So preexisting tags, if I just
mention them by name without a period,
which gives me a new name--
I just mention the body, the
header, the main and footer,
and then, inside of the curly
braces, define my properties,
now I can just stylize the actual
tags as they exist in my page,
and this now looks like really
readable, maintainable HTML.
There are no aesthetics associated
with the markup language here,
but rather there's useful tag
names that come with HTML--
you can't just make up your own tags.
They're in, sort of, the documentation,
but now it's just much more readable,
and this might look different on my
phone or your phone or your laptop,
but my friend who's good at
stylization can figure out
how to style all of these things, and
better yet, he or she doesn't even
need my file.
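The version that styles the tags by
name, with no classes at all, might
be sketched like this. As before,
the element text is assumed.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <style>

            /* No period: each name refers to an existing tag */
            body
            {
                text-align: center;
            }

            header
            {
                font-size: large;
            }

            main
            {
                font-size: medium;
            }

            footer
            {
                font-size: small;
            }

        </style>
        <title>css3</title>
    </head>
    <!-- The markup below carries no style or class attributes at all -->
    <body>
        <header>
            John Harvard
        </header>
        <main>
            Welcome to my home page!
        </main>
        <footer>
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```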
In the fifth example here,
notice that's it for the page.
We've gotten rid of the big style tag
and replaced it apparently with what?
AUDIENCE: Href, a link?
DAVID J. MALAN: Yeah, link href,
which is a horrible, horrible name
because it's not like a link
in the page and hyperreference
was already used for a link in a page,
but this is what we're stuck with.
This just says, hey browser,
include this CSS file
that is elsewhere on the server.
The name of this file
is arbitrarily CSS4.css
because this is our fifth
example here-- zero index.
The relationship of
this file to this page
is that it's a style sheet, which is
just a list of aesthetics or properties
that should characterize its layout
and indeed, if I open up CSS4.css,
I just copied and pasted
everything in there,
but this is nice now in
principle, even though we're just
creating work for ourselves
today, because now I
can share this file with someone else.
He or she can work on it on their own.
Then we can merge our work together
because my work's in the HTML file.
Their work's in the CSS file.
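A sketch of the HTML side of that
split follows. The lowercase file
name css4.css is an assumption about
how the file is spelled on disk, and
the element text is assumed too.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <!-- href names the CSS file; rel says that file is a style sheet.
             The rules for body, header, main, and footer live in that
             separate file rather than in a style tag here. -->
        <link href="css4.css" rel="stylesheet"/>
        <title>css4</title>
    </head>
    <body>
        <header>
            John Harvard
        </header>
        <main>
            Welcome to my home page!
        </main>
        <footer>
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```

Any other page that includes the same
link tag picks up the same rules, so
a change to the one CSS file shows up
on every page that links to it.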
Better still, if we're making a whole
website that has a dozen pages or 100
pages, consider this.
Just like in a C header
file, I can include bitmap.h
in all sorts of programs.
Similarly can I include
CSS4.css in all of my web pages.
So if I want to change the
font size or the layout
or whatever in all of my website
all at once, I change in one place,
not in every darn web page that might
have been created by me or by someone
else, and so there's just that
maintainability to it too,
but we can do even better than
that because even the CSS we're
looking at here is only so
good, and what's really nice
is if we go to Bootstrap-- let
Google tell me where to go.
We're safe.
OK, so Bootstrap is a library--
formerly from Twitter, now
a much larger community-- that's
a whole bunch of CSS libraries.
So just as in C, we have code and
functions that other people wrote.
So in the world of web
development do we have
code that other people wrote and we
use that for JavaScript and Python,
but even for aesthetics are
there sites like Bootstrap
and other popular things that
allow us to make our sites prettier
and build them more quickly
without having to reinvent wheels.
So for instance, if I go down to let's
say Content and I go to Typography
and skim through here, you'll
indeed see like h1, h2 and h3,
but if you want things even bigger than
that there's like a display heading.
There's this fancy version,
which has a fancy display heading
with some faded secondary text.
So pretty marginal, but I don't have to
figure out how to do that now myself.
If I want to actually have tables,
I can do much prettier tables
than I did with my little old
school phone pad a moment ago.
Like I can make things different colors.
I can shade the columns like this and
in fact, you can do even fancier things.
If I go ahead and open
up a web page and go
to our big board for
speller.cs50.net, you'll
see that this is a pretty good
looking table as tables go.
Certainly much better than
the one before, but that's
because we're using
the Bootstrap library,
and even more compelling
than the aesthetics is this:
suppose that you visit
speller.cs50.net on your phone,
it starts to get pretty ugly
once your window gets smaller,
but notice stuff can
just disappear magically
when you're on a mobile
device or, in this case,
simulating it by using just
a smaller browser window.
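
One way content can vanish on small screens is a CSS media query. This is only a minimal sketch of the general technique, not necessarily how speller.cs50.net or Bootstrap does it. The class name optional, the 600px breakpoint, and the table contents are all made up for illustration.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- ask mobile browsers to use the device's real width -->
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <style>
            /* applies only when the window is 600px wide or narrower */
            @media (max-width: 600px)
            {
                /* hypothetical class: hide the less important column */
                .optional
                {
                    display: none;
                }
            }
        </style>
        <title>responsive</title>
    </head>
    <body>
        <table>
            <tr>
                <th>Name</th>
                <th>Time</th>
                <th class="optional">Memory</th>
            </tr>
            <tr>
                <td>placeholder name</td>
                <td>placeholder time</td>
                <td class="optional">placeholder memory</td>
            </tr>
        </table>
    </body>
</html>
```

Dragging the browser window narrower than the breakpoint should make the third column disappear, and widening it should bring the column back.
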
So using CSS and the aesthetic
power that it provides,
we can also dynamically change our
files to just render differently
on different devices, and then
lastly, let me open up, for instance,
this under Components.
This is where the really juicy stuff is.
If you want fancy alerts to yell at
the user or say everything is OK,
you get nice little
colored boxes like this.
The forms are much prettier.
I mean, already this looks much
more like the web you and I use
and not the mess of a form
that I created a moment ago
and long story short,
just like in C it's
pretty easy to include these things
in your own site, so can I do this.
Let me go ahead and open up
form0.html, and this is literally
an approximation of the very
first web application I made,
even before web application
was a phrase, in 1997.
I had taken CS50 and CS51.
I hadn't learned web stuff at the time.
I just kind of taught
it to myself and learned
from some friends and
the first thing I did
was build an interactive website
via which first years could register
for intramural sports because literally
that year in 1996 it was paper-based.
You'd walk across the yard, open
up Wigglesworth, one of the dorms,
slide a piece of paper--
old school-- under the door
and you were registered for a sport.
We could do better even in 1997,
and so we did it with the web,
and so this form0 back in the day looked
a little something ugly like this,
but there's a text box where
you could type in your name
and then there's the dorm
where you could select Matthews.
So I could actually do David Malan
and Matthews and then click Register,
but we don't yet have the
ability to make back ends.
So this form goes nowhere
for today, but you at least
get these kinds of aesthetics, which
are kind of 1997 aesthetics, literally.
But if we go into this
other example, form1.html,
it looks prettier now.
It's maybe a little big in retrospect,
looking at the display font,
but all I've done is now use this
Bootstrap library, and notice,
it's a little hard to see
on the projector here,
but everything's kind
of like nicely outlined.
There's like Mark Zuckerberg
sample text there which
we can override by actually typing
in our own email address here.
We have a prettier looking box, a
prettier looking button, and that's
just because if we
open up, as down here,
form1.html, notice that
in addition to my HTML
down below and in addition
to a couple of other things
that I've added to make things
more mobile-friendly in particular,
I just added this.
I read the documentation
on getbootstrap.com
and I went ahead and added
Bootstrap's library to my own code
in order to have access
to its actual features,
and then down here, it's a little
overwhelming at first glance,
but I just followed the directions.
There's something called div in
HTML for a division of the page.
It means give me this
invisible rectangular region.
The class I associated with
it is called form-group.
I didn't make this word up.
This comes from Bootstrap.
I just did what they told me to do.
I then have a label, which
makes things more accessible
and you can click in different places.
I have another class here
but long story short,
I just read the documentation
because I know what tags are,
I know what attributes are.
I know a little bit of CSS
now and I know how HTTP works,
and so really I have enough building
blocks in order to work on this myself.
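
Here is a hedged sketch of what a Bootstrap-styled form along these lines might look like. The class names form-group, form-control, and btn btn-primary are the ones in Bootstrap 4's documentation, and other versions may differ. The path bootstrap.min.css is a placeholder, and the field names and placeholder text are assumptions rather than the actual contents of form1.html.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- makes the page scale sensibly on mobile devices -->
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <!-- placeholder path: copy the real link tag from the docs on getbootstrap.com -->
        <link href="bootstrap.min.css" rel="stylesheet">
        <title>form1</title>
    </head>
    <body>
        <!-- no action attribute: there is no back end to submit to yet -->
        <form>
            <!-- div is an invisible rectangular region; the class comes from Bootstrap -->
            <div class="form-group">
                <!-- for matches the input's id, so clicking the label focuses the box -->
                <label for="email">Email address</label>
                <input class="form-control" id="email" name="email" placeholder="you@example.com" type="email">
            </div>
            <div class="form-group">
                <label for="dorm">Dorm</label>
                <select class="form-control" id="dorm" name="dorm">
                    <option value="Matthews">Matthews</option>
                    <option value="Wigglesworth">Wigglesworth</option>
                </select>
            </div>
            <button class="btn btn-primary" type="submit">Register</button>
        </form>
    </body>
</html>
```
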
So that then is CSS and there's one
last detail I thought I'd show us here.
In all of these John Harvard
examples, as in just a moment ago,
we had something like
this at the very bottom.
This &#169;.
What was that rendering as, if
you notice, in the web page?
AUDIENCE: Copyright.
DAVID J. MALAN: Yeah,
the copyright symbol.
There is, on my US keyboard,
no copyright symbol.
So you need kind of a
pattern of characters
with which to represent those in HTML.
So just like we have \n and other
special escape characters in C,
you have what are called HTML entities
in HTML that you would only know from
reading the documentation, but
that's the copyright symbol,
but I thought it was rather timely to
point that out because just yesterday
or this morning, Apple announced that
with the very new version of iOS that
you can soon download, they added even
more damn Emojis to the Emoji character
set.
So these are certainly
in vogue these days
and not only do we see, now, a way to
represent special characters that you
couldn't otherwise type using
HTML, it turns out all this time
that Emojis are actually
just characters, chars,
but they're not 8 bits.
Recall that C as we've
been using it uses
ASCII, which uses only 7 or 8
bits total and Emojis, my god.
There's so many of them right
now and we need more than 8 bits
to represent them, and thus was
born something called Unicode.
Well, that is not why
Unicode was invented,
but this is what Unicode is now being
used for because these emojis are
simply like ASCII characters but
multiple bytes, generally two bytes,
maybe three bytes, and in
fact, if you go on unicode.org,
you can see that the number in hex
1F600 represents the grinning face,
which happens to be implemented
differently by different companies
on different devices,
but if in closing here,
I open up this same file and I change
this to 1F600 in hex, 1-F-6-0-0, save,
and I go back to my browser
and I go back to CSS0,
now we have a very
happy web page for you.
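
To make that last step concrete, here is a small sketch of both kinds of numeric entity. The page content is made up for illustration. One detail worth noting is that a hexadecimal code point needs an x after the #, which tells the browser the digits are hex rather than decimal.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>entities</title>
    </head>
    <body>
        <!-- decimal entity: 169 is the code point of the copyright sign -->
        <footer>Copyright &#169; John Harvard</footer>
        <!-- hexadecimal entity: the x prefix marks 1F600, the grinning face, as hex -->
        <p>&#x1F600;</p>
    </body>
</html>
```
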
So that's it for today.
I'll stick around for questions
and we'll see you next time.
