[MUSIC PLAYING]
DAVID J. MALAN: This is CS50.
And today, we transition from the
world of C and, with it, pointers
and some of the struggles that you
might have felt over the past few weeks
to a more familiar world,
that of web programming,
of using web browsers and mobile
devices and laptops and desktops
and creating more graphical
and more interactive experiences
than our traditional command-line
terminals have allowed.
And we'll see, though, along the way
that a lot of the ideas that we've
been exploring over the past few weeks
are still going to remain with us.
And we're going to see
them in different ways.
We're going to see them in the form
of other languages and other syntax.
But the ideas will remain
quite reminiscent of what
we did back in week 0.
So TCP/IP is perhaps
the most technical way
and the most low-level way we can
quickly make the web uninteresting.
But you've probably, at least,
seen this acronym somewhere, maybe
on your Mac, your PC, some
setting maybe once upon a time.
And this, actually, just
refers to a protocol
or, really, a pair of
protocols, languages of sorts
that computers speak in
order to transmit information
from one computer to another.
And this is what makes most
of the internet today work.
The fact that you can pull
up your laptop and desktop
and talk to any computer on the
internet is because of these protocols,
conventions that humans decided
shall exist some years ago.
And they just dictate how
computers intercommunicate.
But let's make it a lot more familiar.
In our human world, you've probably, at
some point, sent or received a letter.
These days, it's
perhaps more electronic.
But, at least, you've
gotten one such letter
from probably a human, maybe
a grandparent or the like,
or sent something yourself.
But before you can actually send
that message to the recipient
and put it through the US mail or
the international mail services,
what needs to go on the envelope?
AUDIENCE: Address.
DAVID J. MALAN: Yeah--
so some kind of address.
And what does an address consist of?
AUDIENCE: Name.
DAVID J. MALAN: Name.
AUDIENCE: Where they are.
DAVID J. MALAN: Where they are.
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: So where they are might
include a street address and a city,
a state, a ZIP code in the US,
or a postal code, more generally,
and the country, if you
really want to be specific.
And so all of that goes on
the front of the envelope,
generally in the center of the envelope.
And then what often goes on the top
left-hand corner in most countries?
AUDIENCE: The return.
DAVID J. MALAN: Yeah.
So the return address-- so
that if something goes wrong,
albeit infrequently, that letter
can get-- make its way back to you,
and also the recipient knows just
immediately who actually sent them
the note.
So that is enough information
to get a letter from point A
to point B because these
addresses, these postal addresses
in our human world, uniquely identify
houses or buildings or people,
in some sense, in the world.
So right now, we're at 45 Quincy Street,
Cambridge, Massachusetts, 02138, USA.
That is probably enough
specificity for anyone in the world
to mail us a postcard saying
"Hello world" in written form
and get it to this building.
Meanwhile, if we wanted to send
something to the Science Center,
1 Oxford Street, Cambridge, Mass,
02138, USA, that's its unique address.
So it stands to reason that computers,
including our own Macs and PCs
and Android phones and
iPhones and the like,
all have unique addresses, as
well, because, after all, they
want to communicate.
And they need to get bits, zeros
and ones, from point A to point B.
But they're not quite as verbose
as those kinds of addresses.
Computers have what you
probably know as IP addresses,
Internet Protocol addresses.
And this just means that
humans decided years ago
that every computer in
the internet is going
to have a unique number identifying it.
And that number is generally of the form
something dot something dot something
dot something.
And, as it turns out, each of
these somethings between the dots
is a number from 0 to 255.
And now, after all these
weeks of CS50, your mind
can probably jump to a quick answer.
How many bits must each of
these numbers be taking up
if the range is from 0 to 255?
Eight.
So eight-- and why is that eight?
So 256 has been a recurring theme.
And if you don't recall, that's fine.
But yes, this is eight bits, eight
bits, eight bits, eight bits,
which means the numbers that we humans
use to uniquely identify our computers
on the internet are 32 bits in total.
Well, there's probably another
number that can roughly come to mind.
If you've got 32 bits, how high can you
count, roughly speaking, from 0 to--
I heard a murmur--
AUDIENCE: Four billion.
DAVID J. MALAN: Four billion.
So it's roughly four billion.
And we brought that up in week 0
with a four billion-page phone book,
imagining that.
So four billion is roughly what
you can count up to with 32 bits.
So that means there can be four
billion computers, devices, or anything
on the internet, uniquely
identified-- small white
lie because that's actually not quite
enough these days with all the devices
and all the humans in the world.
But we found workarounds for that.
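The arithmetic behind those figures can be checked directly. What follows is a minimal sketch in Python (the choice of language is an assumption here, made only because Python comes up later in the course); it uses nothing beyond integer exponentiation.

```python
# Each "something" between the dots is 8 bits wide, and there are four of them.
BITS_PER_PART = 8
PARTS = 4

values_per_part = 2 ** BITS_PER_PART   # how many values fit in 8 bits (0 through 255)
total_bits = BITS_PER_PART * PARTS     # size of a whole IPv4 address
total_addresses = 2 ** total_bits      # how many distinct addresses 32 bits allow

print(values_per_part)    # 256
print(total_bits)         # 32
print(total_addresses)    # 4294967296, i.e. roughly four billion
```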
Question?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: But only
half of them at the time.
No.
So yes, if by 2023 or whatever
year humans are projected
to be almost entirely
online, and there's
some-- billions and billions
of people, eight billion or so,
then that's a problem for this system.
Thankfully, as long ago as 20
years ago did people realize,
mathematically, this was
going to be a problem.
And so there's actually a newer
version of IP, Internet Protocol.
This is version 4 we're
talking about, which is still
pretty omnipresent in the world.
Version 6 actually uses not 32 bits,
but 128 bits, which is massive.
And I can't even pronounce
how big of a number that is.
So we're thinking about it.
And the biggest companies
of the world have already
transitioned to using bigger addresses
rather than these 32-bit addresses.
But these are still pretty common in
almost any device you might own or see
on campus or elsewhere.
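To make the 32-bit versus 128-bit difference concrete, here is a small sketch using Python's standard `ipaddress` module. Both addresses below are placeholders chosen for illustration (the IPv6 one comes from the range reserved for documentation), not addresses mentioned in the lecture.

```python
import ipaddress

v4 = ipaddress.ip_address("1.2.3.4")       # placeholder IPv4 address
v6 = ipaddress.ip_address("2001:db8::1")   # placeholder IPv6 address

print(v4.version, v4.max_prefixlen)   # 4 32
print(v6.version, v6.max_prefixlen)   # 6 128

# An IPv4 address is one 32-bit number, conventionally written as four bytes.
print(int(v4))            # 16909060
print(list(v4.packed))    # [1, 2, 3, 4]

# How many distinct addresses each version can represent.
v4_count = 2 ** v4.max_prefixlen
v6_count = 2 ** v6.max_prefixlen
print(v4_count)
print(v6_count)
```

The second count is a 39-digit number, which is why running out of version 6 addresses is not a practical concern.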
So if you have a unique
address, that's enough to put
on the front of the envelope.
And it turns out that if you're
sending an email or a chat message
or whatever, you, too-- your Mac,
PC, or phone-- has an IP address.
So that's enough to put in the top
left-hand corner, conceptually.
But you need one more
piece of information.
It turns out that on the internet,
there are servers, computers,
that are just constantly
listening for people to connect
to them, like us, checking our
email and visiting Facebook
and Gmail and other such websites.
And those servers, though,
can do multiple things.
Google has lots of businesses.
They give you email and web
services and video conferencing
and lots of other
internet-based services.
And so humans also decided,
years ago, to identify
all of these possible internet
services with just unique numbers--
names also, but also unique numbers.
And it turns out that
humans decided years ago
that when you visit a website, there's
one more piece of information that's
got to go on this envelope, not
just the server's IP address
that you're trying to connect
to, but also the number 80
because 80 equals HTTP, an acronym
you're surely familiar with by now.
And that just denotes
this is a web request.
If, instead, it said something like
25, that's SMTP, which is email.
So that might mean inside
of this virtual envelope
is actually an email message
going to Gmail or the like.
And there's bunches more numbers.
But the point is that there are
numbers that uniquely identify.
So when Google gets a virtual envelope,
just a whole bunch of bits, zeros
and ones, that, in some way, has an
IP address on it as the destination,
it also knows, oh, is this an email
or is this a video conference message
or is this a chat message
or something else.
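One way to picture how a server tells services apart is a simple lookup from port number to service. The sketch below is only an illustration: the table holds just the two numbers mentioned so far, and the function name `describe_port` is made up for this example.

```python
# The two port numbers mentioned above; real systems know many more.
WELL_KNOWN_PORTS = {
    25: "SMTP (email)",
    80: "HTTP (web)",
}


def describe_port(port: int) -> str:
    """Return the service conventionally associated with a port number."""
    return WELL_KNOWN_PORTS.get(port, "unknown service")


print(describe_port(80))    # HTTP (web)
print(describe_port(25))    # SMTP (email)
print(describe_port(9999))  # unknown service
```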
So just to make this
more real then, if I'm
going to go ahead and write
this down, the IP address to which
I'm sending something might be 1.2.3.4.
Generally, then, I'm going
to send it to, say, port 80.
Maybe my IP address is 5.6.7.8.
And so an envelope--
I'll be at [INAUDIBLE]-- and it's
really just going to have those pieces
of information-- the
destination address, colon,
and then the number of the service
you care about, HTTP or whatever,
and then your own IP address,
and more information.
But the point is both sender
and recipient addresses--
that's enough to get data from one
computer in the world to another.
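The envelope metaphor can be written down as a tiny data structure. This is a toy model only: the class name `Envelope` and its field names are invented for illustration, and real packets carry many more fields than these.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    """A toy stand-in for a packet, holding only the pieces described above."""
    destination_ip: str
    destination_port: int
    source_ip: str
    contents: bytes

    def front(self) -> str:
        # Destination address, a colon, then the number of the service.
        return f"{self.destination_ip}:{self.destination_port}"


# The example addresses from the lecture, with placeholder contents.
envelope = Envelope("1.2.3.4", 80, "5.6.7.8", b"...")
print(envelope.front())     # 1.2.3.4:80
print(envelope.source_ip)   # 5.6.7.8
```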
And there's so much more complexity.
This is a whole field in
computer science of networking,
if you like this kind of stuff.
But that's how, in a nutshell, the
internet gets data from point A
to point B. And this
envelope just represents
a whole bunch of zeros and ones.
But what's inside of that envelope?
And that's where we'll focus
today and in the weeks to come.
It's actually content.
It's the email you care about
or the web page you care about.
And how do we actually decide
what server we're connecting to?
Well, typically, you might go to
a so-called URL, Uniform Resource
Locator.
A URL is just the address of a server.
And that's going to be the-- really,
the ultimate recipient of that envelope
that we're trying to send.
But this, of course,
is not an IP address.
This does not follow the
pattern something dot
something dot something dot something.
So if all of us humans are
constantly typing stuff like this
into our browsers, yet
the whole story just
told is about numbers and port
numbers and low-level stuff,
where's the connection?
Does anyone already know
how you get from typing this
to a bunch of zeros and ones that
are somehow addressed with numbers?
DNS, I heard.
What's DNS?
Yeah.
So it turns out there's a technology
in the world-- domain name system,
in fact.
And DNS, Domain Name System, is just
a type of service on the internet
that Harvard maintains
and Yale maintains,
and Comcast and Verizon and
a lot of the big players
in the world, whose
purpose in life is to run
servers that convert what are
called domain names to IP addresses,
and vice versa, so that when we humans
type in www.example.com into a browser,
it's our Mac or PC or phone that
contacts a local server, a DNS server,
on the local campus or university
or apartment or whatever,
asks what is the IP address
for www.example.com.
And then what your Mac
or PC or phone does
is it writes that
address on the envelope.
But it puts a request for a specific
web page inside of the envelope.
And when you get back a
response from that server,
it's going to be your address
that's on the front of the envelope.
And inside of the
envelope is going to be
the web page or the
email or the chat message
or whatever it is you were
trying to actually access.
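A program can ask the same question your browser asks. The sketch below uses `socket.getaddrinfo` from Python's standard library, which hands the name to the operating system's resolver (and so, usually, to a DNS server). The helper name `resolve` is an assumption made for this example.

```python
import socket


def resolve(hostname: str) -> list:
    """Return the IPv4 addresses the system's resolver reports for a name."""
    results = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr),
    # and for IPv4 the sockaddr is an (ip, port) pair.
    return sorted({sockaddr[0] for *_, sockaddr in results})


# A numeric address needs no DNS lookup, so it comes back unchanged.
print(resolve("127.0.0.1"))   # ['127.0.0.1']
```

Calling `resolve("www.example.com")` would need a network connection, and the answer can differ from place to place and over time, so no output is shown for it here.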
So let's tease this apart
into some of its components.
First of all, this thing
here highlighted in yellow
is officially the domain name.
You've probably all
used this term before.
It's usually something dot something.
"Com" typically refers to commerce
or commercial, although anyone,
for any purpose, can use .com.
Back in the day, very popular were
.com, .net, .org, .edu, .gov, .mil.
And these were all very
US-centric because it
tended to be the United
States that really kicked off
this use of the internet and DNS.
But now it's certainly spread globally.
And so there's hundreds now of what
are called TLDs, Top-Level Domains.
They tend to be three or more
characters if they denote a word.
And they tend to be two characters
if they denote a country,
like US is United
States, JP is Japan, UK--
United Kingdom, and so forth.
Those are just country codes
that do the same thing.
But what's this at the front?
World Wide Web, or www, here, more
generally, is an example of what,
technically speaking?
What is this?
What does this mean?
Yeah?
AUDIENCE: Subdomain.
DAVID J. MALAN: It's a subdomain--
is one way of thinking about it.
In fact, all of you, many
of you here, probably
have email addresses of the form
college.harvard.edu or g.harvard.edu
or the like.
Those are subdomains.
Harvard's such a big
place that they actually
put everyone in different categories of
domains, otherwise known as subdomains.
And that might be a word or a phrase
that comes before the domain name here.
But it can also just mean
the name of a server.
So if example.com is the company or
business whose website you're trying
to visit, their domain is example.com.
And they bought that
domain name some years ago.
And they spent a few dollars every year,
probably, renewing the fee for that.
And they have at least one
server whose name is www.
And that exists within their domain.
They might have dozens or
hundreds or just one server.
Each of them can have a name.
So this is generally
called the hostname.
So when it's an email address,
it often implies a subdomain,
like a category of addresses.
But when it's in a URL like this,
it means probably a specific machine
or a specific set of
machines-- conventionally,
the web servers that the company runs--
doesn't have to be called www.
For historical purposes, MIT
tends to use web.mit.edu.
But almost everyone else in the
world uses www or nothing at all.
It's not required.
You can actually just visit many
websites without visiting any hostname.
And it just works, as well, thanks
to DNS giving you the IP address.
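The pieces of a URL described so far can be pulled apart with `urllib.parse.urlsplit` from Python's standard library. The function `describe_url` and its dictionary keys are made up for this sketch, and the split into "domain" and "host" is deliberately naive: it assumes the domain is always the last two labels, which is not true for names such as those ending in `.co.uk`.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into the parts discussed above (naive two-label domain)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "scheme": parts.scheme,
        "hostname": parts.hostname,
        "top_level_domain": labels[-1],
        "domain": ".".join(labels[-2:]),
        "host_or_subdomain": ".".join(labels[:-2]),  # empty if there is none
        "path": parts.path or "/",                   # no path means the default page
    }


info = describe_url("https://www.example.com/")
print(info["scheme"])             # https
print(info["host_or_subdomain"])  # www
print(info["domain"])             # example.com
print(info["top_level_domain"])   # com
print(info["path"])               # /
```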
But what about the file
you're actually requesting?
What does it actually
mean to visit this URL?
Well, on many servers, this
implicitly means, hey, web server,
give me a file, just a text
file, called index.html.
That's the name of
the file, a text file,
that you could create with CS50
IDE or even Notepad or TextEdit
on your own Mac or PC that
contains a language called HTML.
And we'll take a look at
that language in just a bit.
And some of you might
have seen it before.
But the language in which web
pages are written is HTML.
And we'll give you the building
blocks, conceptually and practically,
for that today.
You'll use it over the coming
weeks in many different contexts.
But we'll use it, ultimately, to
create the contents of websites.
But today, we'll focus
first on this, HTTP.
Anyone know what that stands for?
Yeah?
AUDIENCE: HyperText.
DAVID J. MALAN: Yeah.
HyperText Transfer Protocol.
And honestly, in most
of technology, it's
not so much what the acronyms
represent that's all that important,
but, really, what the technology does.
And in this case, HyperText
Transfer Protocol--
we'll see hypertext in a moment.
That's another way of saying HTML.
Transfer Protocol-- P for
Protocol-- that's another buzzword.
So protocols are not
programming languages, per se.
They are conventions.
And we humans have conventions, too.
For instance, if I were to meet
someone for the first time,
I probably wouldn't stand on stage
and lean down like this to do it.
But I might say, hi, I'm David.
AUDIENCE: Hi.
I'm Stephan.
DAVID J. MALAN: Stephan,
nice to meet you.
And we have this weird handshake
that was aborted prematurely there--
that we have this weird convention--
us humans, at least in the US,
of greeting someone with a handshake.
And Stephan just knew to
do that, however awkwardly.
And then he disengaged because
the transaction was complete.
And that's not unlike
what a web server does.
When you request a web page,
you're sending a request to someone
as though you're extending your hand.
You're expecting something in return.
But in the case of a
computer, of course,
it's like the web page itself coming
back in an envelope from point B
to point A.
So that's what a protocol is.
We just have been
programmed to know what
to do when we want to request
a greeting or information
and get something back in return.
It's like a client-server
relationship in a restaurant.
A customer requests
something off the menu.
The server, the waiter or
waitress, brings it to them
and, thus, completes
that transaction as well.
And that's what the internet is, too--
clients and servers, browsers
and servers, computers
and other computers, ultimately.
So with that relationship
in mind, let's take a look
at what's actually
inside of this envelope.
In the case of Stephan's and my
greeting, it was more visual.
But in the case of a computer, it's
going to be more textual, literally.
So inside of the envelope,
the virtual envelope,
so to speak, that your
browser sends to a server
when trying to request
a web page, is actually
a message that looks like this.
Thankfully, it's not terribly
cryptic, although the dot, dot, dot
implies there's more contents
inside of the envelope.
But the keyword here
literally is GET, a verb.
And there's other verbs
that the browser can use.
And this one literally means,
get me the following home page.
What home page do you want to get?
Well, the default one.
This forward slash, as it's called,
just represents the default web page
on a website.
And in many cases, that implicitly means
an actual file called index.html, just
a convention.
It can be called other
things or not exist at all.
But in many cases,
that means, implicitly,
get me a file called index.html.
And we'll see what that
looks like in a moment.
HTTP/1.1 just means, hey,
Stephan, I speak HTTP version 1.1.
Hopefully, you do as well.
There can be other and newer and
older versions of the same thing.
Notice down here, though--
whoops-- notice down here, though,
that the hostname is
also in this envelope
because it turns out that web servers
can do multiple things at once.
And they can serve multiple domains.
You don't need your own personal
unique server to serve a website.
You can have tens, hundreds,
thousands of different websites
all on the same server.
And if any of you ever paid for your own
domain name or your own personal home
page or the like, you are
probably paying someone
for shared space on one
server or more servers,
not for your own personal dedicated one.
But again, this might implicitly
mean the same thing as this.
Give me index.html.
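The request just described is plain text, so it can be assembled with ordinary string handling. This is a minimal sketch with an invented helper name, `build_request`; it includes only the request line and the Host header, whereas a real browser sends many more headers (the "dot, dot, dot").

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Assemble a bare-bones HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",   # verb, what to get, and the version spoken
        f"Host: {host}",          # which of the server's websites is wanted
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_request("www.example.com")
print(request.decode("ascii"))
```

HTTP separates lines with a carriage return plus a newline, which is why the pieces are joined with `"\r\n"` rather than a bare newline.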
So what is it that actually
comes back from the server?
The server, hopefully, responds
with a message that looks like this.
It responds with confirmation of the
version of the protocol it speaks.
That's like Stephan saying,
yes, I speak HTTP 1.1 as well.
200 is a numeric code that
signifies literally OK.
All is well.
I understood you.
Here is the information you requested.
And Content-Type, below it, is
a more technical way of saying,
the type of content I'm handing
back to you in my own envelope
from point B to point A,
or from Stephan to me,
is in a language called HTML
that happens to be text.
Why does it look like this?
Humans, years ago,
just decided that this
would be the sequence of
characters that computers literally
send to communicate that information.
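Reading the reply is the mirror image of writing the request. The sketch below parses the first lines of a response into a version, a numeric status code, a reason phrase, and a dictionary of headers. The function name `parse_response_head` is invented for this example, and the sample bytes are a hand-written response rather than one captured from a real server.

```python
def parse_response_head(raw: bytes):
    """Split the start of an HTTP response into version, code, reason, headers."""
    head, _, _body = raw.partition(b"\r\n\r\n")   # headers end at the blank line
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return version, int(code), reason, headers


raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
version, code, reason, headers = parse_response_head(raw)
print(version)                   # HTTP/1.1
print(code, reason)              # 200 OK
print(headers["content-type"])   # text/html
```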
So let's actually try this in one case,
maybe, for instance, with harvard.edu,
and see what actually happens
to see what else we might see.
So let me go ahead and open
up Chrome, or any browser,
for that matter, that supports some
kind of debugging and diagnostics.
And I'm going to do this.
And you can access this
in different places.
I'm going to go up to View,
Developer, and View Developer Tools.
This is something that
comes with Chrome.
You sometimes have to enable it
in Safari and other browsers.
But almost every browser these
days has this capability.
And you'll notice that this just
opened up a whole bunch of tabs
at the bottom of my
screen here that I'm going
to be able to use to
actually explore what is--
did I kick something else?
Apologies.
It's back-- won't step on there.
So what is this going to allow us to do?
Well, notice there's a
lot of features here.
It's overwhelming at first glance.
But there's a tab here called Network.
And it turns out that one of the
features Chrome gives to developers,
which you now all are--
is software developers--
is the ability to see what's going
on underneath the hood of a browser,
to see what is inside of
these virtual envelopes
that your browser has all
those years been sending
from itself to servers elsewhere.
So I'm going to go ahead and do this.
I'm going to go ahead and actually
visit http://harvard.edu and hit Enter.
And you'll see a whole
bunch of stuff happens,
including the web page appearing
at the top of the screen.
I'm going to ignore all of
this stuff at the bottom
except for the very, very first request.
If I zoom in on this, notice
that highlighted in blue
here is the very first
request, harvard.edu.
And if I click on that, I'm going to
see a little more information at right.
And if I go scroll
down to what are called
request headers, the lines of
text that were inside the message
that my browser sent,
this is literally what
my browser sent inside the
envelope, unbeknownst to me,
when I visited harvard.edu.
Thankfully, it confirms my
prediction earlier, GET / HTTP/1.1,
because I requested
harvard.edu's home page.
Host is harvard.edu.
Then there's the dot, dot, dot, the
stuff that we don't particularly
care about today.
But let me go ahead and
look at the response.
So this was my request.
This was my hand going out to Stephan.
Let's see what his or
the server's response
is by scrolling up to this,
which is called response headers.
Harvard's server, fortunately,
does speak the same protocol
as me, 1.1 of HTTP.
But apparently, Harvard
moved permanently.
What does that mean?
I went to http://harvard.edu, not there.
Where is it?
Well, there's a little
more information here.
There's a lot of dot, dot, dot,
things we don't care about.
But if we focus on one
that-- oh, location--
where is Harvard now, apparently?
Yeah, say--
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah.
It looks like Harvard "moved"
permanently from http://harvard.edu to,
and let me highlight it,
https://www.harvard.edu,
with two notable changes.
One, there's the www.
And two, there's also what
that might catch your eye?
S, which most of you probably
know these days means secure,
and which implies encryption in
the spirit of Caesar and Vigenere,
but much more secure than
those simple ciphers.
The information is somehow
scrambled now when I'm communicating
between myself and harvard.edu.
So there's two decisions there.
Harvard has decided that they
want to allow and, indeed, require
users to visit their website
securely so that no one--
no company, no government,
no family members--
can necessarily see what is being
requested of Harvard's website
because that is scrambled
information, much like using something
like Caesar or Vigenere.
And Harvard also, probably
for branding reasons,
but also partly for
technical reasons, decided,
we want you to think of our
website as www.harvard.edu.
And it's a mix of marketing and
technical for a few different reasons,
one of which is www we humans
just all know means website.
And if you see harvard.edu--
this is less true these days--
might not necessarily imply as
obviously that this is a website's URL.
Frankly, not too many years ago, even
advertisements and TV ads and printed
ads and the like would even show http://
to really make clear to viewers that
this is a web address.
But gradually, as more and
more people get on the internet
and understand technology
and URLs and the like,
we can just start dropping the stuff
that is unnecessary clutter because all
of us now know intuitively,
oh, harvard.edu-- it's
probably a web address that I
can just type into a browser.
And the browser or the server
will finish my thought for me
and actually prepend the secure
URL or the www or the like.
So we still haven't actually
found Harvard, it seems.
So let's do this instead.
Let me go ahead and zoom out
and visit a different URL.
Let me go ahead and, again, go
to View, Developer, Developer
Tools, Network Tab.
And now let me visit that more verbose
URL, more precise URL, and hit Enter.
Again, a whole bunch of
stuff gets requested--
more on that some other time.
But now, if I click on
the first such request
and look at my response
headers, you'll actually
see, albeit in a different format now,
that the status of this request is 200,
which, recall, meant--
AUDIENCE: OK.
DAVID J. MALAN: OK.
OK.
So now these are two numbers
that, honestly, you've
probably not really seen or cared
all that much about, 200 and 301.
But odds are you've seen at least
one other number when visiting URLs.
For instance, besides actually seeing
200 and 301, you've probably seen 404.
Now, it apparently refers to Not Found.
But more in real terms,
what does that mean?
How do you induce that error?
AUDIENCE: The site doesn't exist.
DAVID J. MALAN: The site doesn't exist.
You mistyped a URL.
The web page doesn't exist.
A system administrator just changed the
name on something or it's an old URL.
Any number of reasons can mean
that the file was not found.
That file might have been
index.html or any other URL.
But all this time when you visited
a website and you've seen 404,
it's not clear, frankly, why servers
have been bothering to tell us 404.
Most people don't need
that level of information.
But it derives from that HTTP
response, that first line
of text inside the envelope coming
back from Stephan or the web server,
more generally, that
says 404, Not Found.
And that means the user
probably did something wrong
or the data has simply
disappeared from the server.
And there's so many more
of these things as well.
And in fact, you might get
responses, like we just
did from Harvard, supporting not
just 1.1, but version 2 of HTTP.
So just realize if you tinker
with your own Mac or PC,
the messages might
look a little different
based on your browser and the website.
And that's just because
things are evolving over time.
And versions are changing.
But there's so many others of these.
And this is just a
short, abbreviated list.
200 and 301 we saw.
404 you yourselves have probably seen.
401 and 403 generally refer
to you haven't logged in
or you're just not authorized
to access information
because it doesn't belong
to you, for instance.
500 you're all going to
experience before long--
that 500 is Internal
Server Error, which is not
so much the server's error
as your fault and my fault
when we've written buggy code.
So in the weeks to come,
not this week, but when
we start writing Python code
and SQL to talk to databases,
we're all going to
screw up at some point.
And a browser will often see
a 500 error from a server
if, indeed, there's a problem with code.
418 doesn't actually exist.
This was an April Fools' joke,
I think, in, like, 1998,
where some people with
a lot of free time
wrote up a whole formal specification
for an HTTP status code, a 418,
I am a teapot.
And it still kind of exists
in lore, internet lore.
So those are just some of
the numbers you might see.
But they're not all that technical if
you just know where to look for them
and you know, as a developer
now, what they signify for you.
Yeah?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Good question.
What's the difference
between 200 OK and 302 Found?
So 302, if you read
into the documentation,
would actually tell you
that this also induces
a redirect, whereby, just like 301,
when the browser gets a 301 or a 302,
the browser should be redirected to
the new URL that we saw in the header,
so to speak, called location,
colon, whatever it was.
The difference is that
Moved Permanently means
that the browser should remember
that this redirection is happening
and stop bothering the server
with the same original request.
Just remember what the new URL is.
302 means found it,
but don't rely on this.
Keep asking me again and again.
So it's just a performance
optimization so you
don't annoy the server unnecessarily
in the case of 301s, which just
costs time and money, in some sense.
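The difference between the two redirects can be modeled in a few lines. This is a toy model and not how any particular browser is implemented: the names `permanent_redirects`, `next_url`, and `url_to_request` are invented, and the second URL pair below is a made-up example.

```python
permanent_redirects = {}   # remembered 301s: old URL -> new URL


def next_url(url: str, status: int, headers: dict) -> str:
    """Decide where to go after a response, remembering only permanent moves."""
    location = headers.get("Location")
    if status == 301 and location:
        permanent_redirects[url] = location   # remember it and stop asking
        return location
    if status == 302 and location:
        return location                       # follow it, but ask again next time
    return url


def url_to_request(url: str) -> str:
    """Skip straight to the new URL if a 301 was seen for this one before."""
    return permanent_redirects.get(url, url)


next_url("http://harvard.edu", 301, {"Location": "https://www.harvard.edu"})
next_url("http://example.com/old", 302, {"Location": "http://example.com/new"})

print(url_to_request("http://harvard.edu"))       # https://www.harvard.edu
print(url_to_request("http://example.com/old"))   # http://example.com/old
```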
So you might have heard
about this before--
can only get away with this
in Cambridge, not so much New Haven.
Has anyone ever visited
safetyschool.org?
AUDIENCE: Hey.
DAVID J. MALAN: You're welcome
to on your laptop or your phone.
So some very clever Harvard students,
I think, years ago bought this domain.
Frankly, they've probably
been paying, like, $10 or more
per year ever since just
to keep this joke alive.
But it's wonderfully illustrative
because if we go back
to Chrome or any browser--
and let me go ahead and open up a
browser tab and go to safetyschool.org,
Enter.
Oh, interesting.
Where did I get redirected?
AUDIENCE: Hey.
DAVID J. MALAN: Hey.
So the more interesting question
for us is, how are they doing that?
Well, let me go back
into the IDE for a--
or actually, let me go into my
browser and open up a new tab--
View, Developer, Developer Tools.
Look at the Network tab.
And now let me go ahead--
whoops-- let me go ahead and
visit http://safetyschool.org.
Enter.
Scroll back up to the top,
where I see the first request.
And you can see, more technically,
if this doesn't take the fun out
of the joke, all these
Harvard students did
years ago was configure this
domain name to return a 301,
Moved Permanently to Yale University.
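Configuring a server to answer every request with a redirect takes very little code. The sketch below uses Python's standard `http.server` module and is not how that site is actually set up, which the lecture does not show. The `TARGET` URL is a placeholder, and the names `RedirectHandler` and `demo` are invented. To keep it self-contained, `demo` serves a single request on a free local port and fetches it with `http.client`, which reports a redirect rather than following it.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

TARGET = "https://www.example.com/"   # placeholder destination for the redirect


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET with a permanent redirect and an empty body.
        self.send_response(301)
        self.send_header("Location", TARGET)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo():
    """Serve one request locally and return (status, reason, location)."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)   # port 0 = any free port
    server.timeout = 5                                        # don't wait forever
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.reason, response.getheader("Location"))
        response.read()
        conn.close()
    finally:
        worker.join()
        server.server_close()
    return result


status, reason, location = demo()
print(status, reason)   # 301 Moved Permanently
print(location)         # https://www.example.com/
```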
Now, it's only fair, especially
since the Yale students are watching
this live right now from New Haven--
let's take a look at one other
site called harvardsucks.org.
So this domain, too, does exist.
Let me clear that screen and
go to http://harvardsucks.org.
Enter.
And this is an actual website.
So not only did these enterprising
Yale students buy the domain name,
they've also been hosting
the website for years since.
There's a wonderful YouTube video there
that actually speaks to a very fun hack
that they did some years ago at
Harvard-Yale, the football game.
But you can see here, oh, that--
so there's a minor one.
So harvardsucks.org actually now
lives at www.harvardsucks.org.
But then you actually stay there.
And so I encourage you to go to this
site, as well as the other, for all
your Harvard and Yale shopping needs.
So that is HTTP.
HTTP is the protocol, the set
of conventions, that browsers
use when talking to web servers.
And it's the protocol
that governs how those web
servers respond to the browsers.
We've quantized this in the form of
these virtual envelopes, which is just
a physical incarnation of the zeros
and ones that are technically going
back and forth across the internet.
But it's embodied in my handshake
with Stephan, what's really happening.
I initiate.
He responds.
And it's like a client-server
type relationship.
So how do you actually
now do creative work?
How do you make yale.edu?
How do you make harvardsucks.org?
How do you make CS50's own
website or Google or Facebook?
Well, what really matters now
is what's deeper inside
of that envelope.
In addition to these headers,
this textual information,
like 200 OK or 301 Moved
Permanently, there's
another language embedded inside
of that envelope, deeper down,
called HTML, HyperText Markup Language.
This is the language, which is also
text, in which web pages are written.
And so if you've ever visited
a website on the internet,
and I just noticed that Erin is doing
that on repeat, isn't she-- what
you're looking at is a
browser's rendering of HTML.
So HTML is just text.
And we're going to see it in a moment.
The browser reads that text top to
bottom, left to right, much like Clang
reads your C code top to
bottom, left to right.
But rather than convert your text to
zeros and ones, what a browser does
is interpret it line by line by line.
And it does what you say.
So if you say, hey, browser,
put Erin's photo on the screen,
it is going to do that.
If you say, hey, browser, write the
word "staff" in big black text,
the browser's going to do that.
If you tell the browser to lay out
a whole menu, it's going to do that.
And we'll see, in just a moment,
how you convey those terms.
HTML is not a programming language.
It is, indeed, a markup language,
which means it just lays things
out structurally and aesthetically.
So the website here that we're looking
at has a bunch of images, all of which
are what are called
animated GIFs, which are
very much in vogue these days on Reddit
and phones and iMessage and the like.
But those are just images, files,
that are actually being transferred
from CS50's server to your browser.
But if I go up to View, Developer,
and now View Source, and you can--
could have been doing
this all these years--
you can actually see the so-called
HTML that drives CS50's website.
So this is all of the
HTML, and I'm deliberately
scrolling fast through it, that
implements that CS50 staff page.
And if we scroll all
the way to the bottom,
you'll see that 1,008 lines
later is the web page done.
But it's just text.
And, in fact, let me scroll back
up to the top and just point
out a few salient details.
You'll see familiar
patterns in the examples
we're about to start looking at.
The very first line probably
is that, DOCTYPE HTML, which
is like a little hint to
the browser that says,
quite explicitly, hey, browser, the
document type you're about to see
is indeed HTML.
But the rest of the web page
follows a structural pattern.
And you'll see that it's
already nicely indented,
even though some of these lines
are a little long and are wrapping.
But you'll see this convention, an open
bracket, which is an angled bracket,
like a less than sign, the keyword
html, maybe some pattern like this,
lang equals en-us--
this sounds like language--
a US English, maybe--
more on that in a bit-- and then this
close bracket, or a greater than sign,
that completes the thought.
Then inside of that HTML tag, so
to speak, indented beneath it,
is this, the head of the web page.
The head of the web page is something
that you mostly can't see.
It generally refers to the tab
at the top of the page and just
invisible information.
And if I scroll down further, we'll
see, really, the guts of the web page,
which are in the so-called
body of the web page.
So these things that I've
just been highlighting,
albeit in a very big context
of a big, 1,000-line web page,
are just called HTML tags.
HTML is a tag-based language,
a markup-based language,
where you just say what you want to
appear where you want it to appear.
So what does that actually mean?
Well, let's take a look
at a simpler example
in the form of this slide, which
is perhaps the simplest web page
that you can make, this one here.
This is perhaps the simplest
correct, syntactically correct, web
page you can write that's saying, hey,
browser, the type of document is HTML.
Hey, browser, here's the
start of my HTML page.
Hey, browser, here's
the head of my web page.
Hey, browser, here comes
the title of my web page.
Hey, browser, the title of this page
shall be, for the sake of discussion,
"hello, title."
But you could say literally
anything there that you want.
But now things get interesting.
And some of you have certainly seen
HTML before, and some of you haven't.
But you can probably
just infer, even if you
haven't seen HTML, what this tag
is doing because it looks the same,
but yet a little different.
So if this is saying, hey,
browser, here comes the title,
what is this probably
saying, intuitively?
AUDIENCE: Just ends.
DAVID J. MALAN: Yeah.
That's it for the title.
Hey, browser, that's it for the title.
So you might call this a
start tag and this an end tag,
or an open tag and a close tag.
Think about it however you want.
But in HTML, there's
generally this nice symmetry.
Once you start something,
you eventually finish it.
And you do it in the right order.
So you start tags in one order.
And then you close them in reverse order
so that everything is nicely symmetric.
And indeed, the
indentation, just like in C,
technically doesn't matter at all.
You could have a really, really ugly
web page with no whitespace whatsoever.
And it would still work fine for the
browser because it doesn't care--
just much harder for us humans to read.
So this convention is to
indent, just like in C,
just so it's more clear what
the hierarchy or the nesting
is, so to speak.
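To illustrate that whitespace is optional, here is a sketch of a small page written with no indentation or line breaks at all. The title and body text are placeholders, and the lang attribute is an assumption. A browser should render it the same as a neatly indented version.

```html
<!DOCTYPE html><html lang="en"><head><title>hello, title</title></head><body>hello, body</body></html>
```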
This line here means, hey,
browser, that's it for the head.
It's another close tag.
Hey, browser, here comes
the body of the page.
So much like head here, body
here, most of the page's content
is, indeed, in the body of the web page.
That's what you, the
humans, actually see.
And mostly in the head, we'll
just see things like the title
and just a couple of other
things in a little bit.
The message inside this web page
is apparently, "hello, body,"
then close body, close html.
And that's it.
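Written out, the page just described looks roughly like this. The lang attribute is an assumption, included because the html tag on the staff page carried one. The rest follows the walkthrough.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <!-- The title appears in the browser tab, not in the page itself -->
        <title>hello, title</title>
    </head>
    <body>
        <!-- The body holds what visitors actually see -->
        hello, body
    </body>
</html>
```

Note the symmetry: tags opened in the order html, head, title are closed in the reverse order.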
So when I said earlier that
inside of these envelopes
is just a whole bunch of
text, all I meant was this.
This is what's inside of
this envelope just below
the protocol information, the HTTP
information, that just said 200 OK
or any of those other messages.
So when the browser receives
this envelope, it opens it up.
It reads it top to
bottom, left to right.
And then it literally interprets
that file top to bottom,
doing exactly what you tell it to do.
So how do we go about
actually doing this?
You can write HTML in any text program.
You can write it in TextEdit
on a Mac, in Notepad on a PC.
You can, technically, use
Microsoft Word or Google Docs.
But that's out of context and bad.
Those give you features you don't want.
But you generally want a text editor.
And we, of course, have a
text editor in CS50 IDE.
So let me actually go there.
I'm going to go into CS50 IDE.
And I'm going to go up to File, New.
And I'm going to go and preemptively
just save the file with the only file
name I remember from earlier,
which was index.html.
Just like C programs end in
files called something .c,
HTML files often end in .html,
sometimes .htm, but often .html.
So let me go ahead and click Save there.
And now I'm going to go ahead and
type exactly that same code--
so open bracket, exclamation point.
And that's the only
exclamation point we'll expect.
The first line is, unfortunately, a
little different from all the others.
Then I'm going to do open
bracket, html, close bracket.
And you'll notice that, just like with
C, the IDE tries to be a little helpful
and finish your thought.
So it already closed the tag for me.
Now it's just on me to hit
Enter to move it into place.
Now I'm going to-- what
came next inside the--
uh-oh.
What came next?
The head-- so open bracket,
head, close bracket.
Inside of head was--
yeah, title.
And then I think it just
said, "hello, title,"
though I could call
that anything I want.
Then below the head, but inside
the html tag still, was my body.
So let me type that here.
And I think I said, "hello, body."
So-- bdoy, boday.
OK, body-- save.
So now I have a text file in the IDE.
It seems to match up with what we
showed as a canonical page before.
Now we need to load it in a browser.
And this is a little
paradoxical because I'm,
obviously, writing
this text in a browser,
and yet I need the browser to read it.
So this is just because the IDE,
Integrated Development Environment,
that we've been using
is, itself, web-based.
That's just an incidental detail.
The fact that I have written this code
in a file now is what's important.
It could be in the cloud as it is.
It could be on my Mac.
It could be on my PC.
It could be on any other
server on the internet.
The point is I need to
access this file somehow.
And so it turns out that
we're not going to compile it.
There are no zeros and
ones involved anymore.
There is no machine code.
We're going to leave it just like this.
HTML is interpreted, literally,
line by line, top to bottom--
no zeros and ones needed.
But I am going to need to run my
own web server, not the IDE itself.
I want to run, as the
developer, my own web server.
What is a web server?
It's like Stephan.
It's just a program sitting
there, waiting and waiting
and waiting for something to happen.
And that something is, presumably, a
request from a browser, at which point
it will respond with a handshake or,
more specifically, with this file.
So how do I do this?
Well, in the IDE, we actually include
a free program called http-server.
All of the software in CS50
IDE is free and open source.
So we've simply chosen some of the
most popular packages, one of which
is called, literally, http-server.
And if I go ahead and hit Enter,
you'll see somewhat cryptic information
at first.
But let's see.
It's starting up the http-server.
It's serving dot slash.
Well, what does dot mean?
This folder.
So just serve up the contents of
this current folder that I'm in.
Now it's saying it's
available on this URL.
And this URL's going to
vary by who is running this.
If you're running it, you're
going to see a different URL.
But what is interesting is the number--
turns out that, because this is
my own little personal web server,
it's not using port 80, which I
claimed earlier was the default.
It's using a different convention, 8080.
8080 is just a human convention.
It's not standardized in the same way.
But this way, I can serve
files separate from the IDE
because the IDE itself is
actually listening on port 80,
or, technically, 443,
because it's using HTTPS.
And I don't want to confuse my
files with CS50 IDE's own files,
the actual user interface
that you're all familiar with.
So, just like Stephan can hear from--
say hello to multiple people and Google
servers can handle multiple services,
so can my own IDE listen on
multiple ports, as they're called--
80, 25, 443, or, in this case, 8080.
So what does this all mean?
I'm going to go ahead and
literally click on this URL,
open it in another tab on my browser,
and you'll see somewhat cryptic output.
But this is just a succinct way of
saying, here is the index, the listing,
of slash, which is now the
default area of my website.
I've got two entries: a folder, source 5,
which is on the course's website--
it's all of today's files in case we
want to look them up without writing
them from scratch--
and then the file I just
created, index.html.
So if I go ahead now and click on
index.html, there we have it-- hello,
body.
And we don't see the tab just
because I full-screened Chrome.
But if I actually remove
that full screening
and zoom up to the top of the
tab, you see "hello, title" there.
And if I go back into this
file, meanwhile, and I say,
"hello, body, nice to meet
you"-- this one got weird--
now I'm going to go
ahead and click reload.
And now you see this.
Let's go ahead and take
a five-minute break
sooner, rather than later, so that
we can address the projector issue.
And we'll be right back.
So to recap, there are more tags than
just html and head and title and body.
There's things that give us
images and sounds, certainly,
and many, many, many other things.
So let's take a look more manually
at just one or two other examples
and then get a sense of the whole
menu of tags that might be available.
Let me go ahead and
create a new file now.
And I'll go ahead and
call this image.html.
And in anticipation of
making a demonstration now
that has an image, to
save time, I'm just
going to go ahead and paste the
contents of the previous file.
But I'm going to go ahead and
get rid of the body this time
and start to actually
embed an image in here.
Now, in advance, I've downloaded an
image of Yale's own bulldog, Handsome
Dan, in a file called dan.jpeg.
And I've uploaded it to
the IDE in the same folder
that index.html is in and
now that image.html is in.
And you can include an
image by using an img tag.
But you have to specify to the
browser what the image you actually
want to embed is.
And so to do this, as you
may know, we have attributes.
So just like the html tag, as we saw
earlier and can now see in the example
here, has a language attribute
specifying English as the default
language for this page to help things
like Google Translate and the like,
so does the image tag get modified
by this attribute called source.
It's just src and img because
those are more succinct
representations of "image" and
"source"-- saves us some keystrokes.
And now I can type in here dan.jpeg.
And then, just for good measure--
well, rather, I can then close the
tag using the corresponding angle
bracket, the greater than sign.
But whereas all of the
other tags thus far
have a notion of starting and
stopping or opening and closing,
the image tag doesn't because the
image is either there or it's not.
There's really no conceptual
notion of starting an image
and then eventually stopping an image.
But let's add one other detail.
It turns out that there's
yet other attributes.
So you can have zero or more on any tag.
For folks who have trouble
seeing content on web pages
and, indeed, rely on
tools like screen readers,
there's actually attributes that
can help in cases like that--
turns out there's an alternative text attribute,
or alt, where you can actually say,
"photo of Handsome Dan," which is a
textual description of whatever it
is you're embedding in the web page.
This way, someone who's
not sighted but who
has a screen reader that
can read that to them
can actually understand what
it is that's on the web page.
So most folks wouldn't see that
unless you actually hover over it
or have it spoken to you.
So let me go ahead and save this file,
go back to the index of the web server
that I ran earlier with
http-server, and now click on image.
And voila.
You'll see dan.jpeg
embedded in the web page.
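A sketch of what image.html might contain at this point. The title text is an assumption. The file name dan.jpeg and the alt text come from the walkthrough, and the file is assumed to sit in the same folder as the page.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- img has no end tag; src names the file, alt describes it in text -->
        <img alt="photo of Handsome Dan" src="dan.jpeg">
    </body>
</html>
```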
Of course, this web page doesn't
actually do all that much yet.
And so suppose we actually wanted
to link to one page or another.
Well, we can do that as well.
Let me go back to the IDE, copy this
same code, just as a starting point,
create a new file called link.html.
And then in this file, we'll
start with the same contents.
But let me get rid of
that body and simply say,
for instance-- let's have
people visit Harvard.
So I could say visit https,
for secure, www.harvard.edu/,
or maybe even without the slash-- it
doesn't matter for the default page--
period.
Let me save this.
Let me go back to the
index of the web server,
reload so that I can see the new
file, link.html, that I created,
and now click link.html.
And voila.
So it's a URL visually.
But it's not actually clickable.
But that's because the browser's only
going to do what you told it to do.
And all I've implicitly told it to
do is display this black text here.
If I actually want to make it
interactive, I need another tag.
Well, it turns out in HTML, there's an
anchor tag, somewhat cryptically named.
And it's also succinctly
written as a, for anchor.
And with the anchor tag, you can
anchor at this point in the page
a link, or a hyper-reference, as it
was once called, to that specific URL.
So that attribute, by convention,
is called href, hyper-reference.
That is the destination
to which you want to link.
I can now close that tag.
But I now need to tell the
user where they're going.
So I could just say Harvard, for
instance, and put my period out there.
Save the file.
Go back to the tab here.
Click Reload.
And now you'll see the dichotomy.
I'm seeing one thing, Harvard.
But if you hover over it,
and it's super small here,
you can actually see, as a safety
check, in the bottom left-hand corner,
typically, the URL that
you'll actually be led to.
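The body of link.html would then look something like the following sketch. The href value is where the link goes, and the text between the open and close tags is what the visitor sees.

```html
<body>
    <!-- href is the destination; "Harvard" is the clickable text -->
    Visit <a href="https://www.harvard.edu/">Harvard</a>.
</body>
```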
Now, as an aside, with this very,
very simple feature of HTML,
you can actually
socially engineer people,
as is commonly done with phishing
attacks, P-H-I-S-H-I-N-G.
If you've ever gotten some spam, either
in your inbox or your spam folder,
odds are someone's tried to ask
you for your username and password
or for your money or
for your PayPal account.
PayPal is especially
a common target here.
But you can see how you can
very easily, unfortunately,
trick and mislead people,
especially if they don't necessarily
understand some of these fundamentals.
Let me go back here, for
instance, and say here--
well, there's nothing stopping me from
doing this little mischievous trick.
I can change the href to
Yale, but the text to Harvard,
thereby tricking someone.
Ha ha.
You're actually going to
Yale's website instead.
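As a sketch of that trick, the visible text and the destination simply disagree. Nothing in HTML requires them to match, and the exact Yale URL used here is an assumption.

```html
<body>
    <!-- The text says Harvard, but the href leads to Yale -->
    Visit <a href="https://www.yale.edu/">Harvard</a>.
</body>
```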
But more maliciously, and in
these phishing emails or spams
that you might have been getting
over the past several years,
you could imagine typing anything
you want here, like paypal.com.
And then here could be
www.SomeMaliciousWebsiteThatWantsYourMoney--
hopefully, that does not exist-- .com.
Save.
Reload the page.
And honestly, most
people, myself included,
are not going to always paranoically
check where I'm actually going.
I'm just going to click on a link.
And voila.
You might not notice the URL
bar changing because you're
being whisked away to some website.
And honestly, it's not all
that hard to recreate websites.
In fact, just to really hammer this
point home, let me go to paypal.com.
And using today's
primitives, notice that you
can go to View, Developer, View Source.
This is the HTML implementing
PayPal's website--
looks good.
Let me copy and paste that into,
say, a new file called paypal.html.
Let me save that here.
Now let me go back to my web
server, reload, open paypal.html.
And voila.
I have made PayPal.
So it's not even that hard to mimic
where people think they are going.
Now, intellectual property issues
aside, that I just copied and pasted
someone else's website,
this is clearly not
fully operational because I
don't have access to their database
and their code on the server and all of
the intellectual property and business
logic, so to speak, that
actually makes PayPal what it is.
But HTML, the point is, is purely
openly accessible by anyone.
It's not encrypted.
It's not zeros and ones.
But it tends to be so aesthetic and
structural in nature that that's not
really the juicy stuff in a business.
But this technique can
certainly be abused in this way.
So moving forward, just be more
mindful of this because most emails
you get these days via
Gmail or any tool
are themselves implemented in HTML.
Even when you're typing
out a Gmail message
and have never even thought
about HTML, that email
is actually being sent
underneath the hood as HTML.
Why-- well, if you've ever used a
bulleted list or a numbered list,
if you do boldfacing or italics or any
of those aesthetic features in Gmail
or other programs, those
are implemented as HTML,
but just using nice,
user-friendly interfaces.
So you can just click icons.
You don't have to think about open
bracket, something, close bracket.
But we could do that.
For instance, if we go ahead and
look at a few other examples--
let me go ahead here and actually go
back to our very first one, index.html.
And suppose I just want to
really draw attention to "hello."
I can actually use the strong tag,
which implies bold, typically.
Save that.
Let me go back to the web server
that I had open a moment ago.
Click on index.html after reloading it.
And now it's a little
subtle because it's small.
But you can probably see that
"hello" is indeed boldfaced now.
So if you've ever clicked the B icon
in Gmail, that's all it's doing.
Underneath the hood, Gmail is
taking your word, hello, and
secretly putting open bracket, strong,
close bracket, and then the opposite,
the close tag, after it.
And that's what it's sending to
the recipient of that message.
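In index.html, that change might look like this minimal sketch, with only the word "hello" wrapped in the strong tag:

```html
<body>
    <!-- strong marks "hello" as important; browsers typically show it in bold -->
    <strong>hello</strong>, body
</body>
```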
So what else can you do?
Well, let me go ahead and do this.
Let me go ahead and open up, say, a
few files that I created in advance.
One is called paragraphs.html.
And let me point this out first.
So in paragraphs, I just have
three paragraphs of Latin text.
And they are rendered,
for instance, as follows.
If I go into source 5 and
I go into paragraphs.html--
looks nice-- don't know what it says.
And, in fact, it's
pretty much gibberish.
But it's nice, three nice paragraphs.
But notice how pedantic HTML is.
I actually had to use another tag
to achieve those paragraphs, even.
If I only had, very reasonably,
written these three paragraphs
like you might in Google
Docs or Microsoft Word,
it's just three paragraphs.
Indent each.
Hit Enter, Enter in
between them-- looks good.
It's wrapping because it's a really
long paragraph off to the right.
But that's fine.
And I save this.
And I go to paragraphs and reload.
Notice that it all bunches together.
Intuitively, why is
that happening, though?
What's the logic behind this bug
now, albeit an aesthetic bug?
Yeah?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah.
Those additional spaces are
not being accounted for.
They're just being
pushed together because
even though HTML does
respect one space--
otherwise, everything would
be completely smushed--
it ignores multiple spaces,
whether it's new lines or tabs
or multiple hits of the space bar.
And it only does, ultimately,
what you tell it to do.
So unless you explicitly, with tags
in HTML, say, give me a new paragraph,
that's it for this paragraph,
give me a new paragraph,
else that's-- now that's
it for the paragraph,
it's just going to
clump them all together,
maybe separating with a single space,
which is clearly not the effect we
want.
So just remember that HTML is really
nit-picky when it comes to that.
And much like in C, your code won't
compile if it's not quite right.
In HTML, it will display.
But it's not going to
display quite right--
is the key there.
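A sketch of the fix, with short placeholder Latin standing in for the real paragraphs. Each paragraph gets its own open and close p tag, and without those tags the browser collapses the blank lines between them into single spaces.

```html
<body>
    <p>
        Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    </p>
    <p>
        Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
    </p>
    <p>
        Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.
    </p>
</body>
```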
Well, what other features
does this HTML have?
The reality is-- we'll give you a
general conceptual overview of HTML
today.
We'll give you a taste
of some of the tags.
But the reality is this,
too, is the sort of language
that you can really learn
by doing and by looking
at online references or texts that
actually summarize the various tags.
But let's look at least a few more.
Let me go into now headings.html.
And you'll see this--
turns out that there are tags
called h1, h2, h3, h4, h5, h6.
These are very commonly
used on websites that
have different headings, like big
and bold, a little smaller and bold,
a little smaller and bold to do,
like, chapter and section headings.
CS50's website is very hierarchical.
If you look through the syllabus,
you'll see lots of different font sizes
and boldfacing and the like.
That derives from our using
these built-in heading tags.
If I go ahead and open this in my
browser, we will see the effect.
By default, h1 is big and bold.
H2 is big, but not as big and bold.
H3 is a little smaller.
H4, 5, and 6--
and this follows the paradigm
in academic papers and books
that have chapters and sections
and subsections and the like.
You just get this feature
for free from HTML.
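A file like headings.html could be as simple as the sketch below. The heading text is made up for illustration. By default, each level renders a little smaller than the one before it.

```html
<body>
    <h1>One</h1>
    <h2>Two</h2>
    <h3>Three</h3>
    <h4>Four</h4>
    <h5>Five</h5>
    <h6>Six</h6>
</body>
```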
Well, what else is there?
Well, if you actually
have tabular data, things
you want to lay out in
rows and columns, well, it
turns out that HTML supports tables.
Let's glimpse at this, too.
And if I go into table.html, in
my browser, we'll see this effect.
It's not all that interesting.
I kind of mimic the idea of a phone
pad, where these numbers are lining up
in columns and in rows.
But invisibly, this thing is
actually laid out with tags.
If I go to the IDE
and look down in here,
you'll see some copy-paste of before--
html, head, and body.
But then notice here.
Hey, browser, here comes a table.
And you see, albeit surrounded by
unfamiliar tags, probably, 1, 2, 3, 4,
5, 6, 7, 8, 9, and then
the symbols down there.
So let's just infer, because the
reality is much of your learning of HTML
and soon another language, we'll
see-- it will just be indirectly.
If you're curious as to how some web
page is implementing some feature,
you actually look at its source code.
And you infer, by example,
how you could do the same.
So take a guess.
If this tag, effectively, says,
hey, browser here comes the table,
this tag here, even if you've never
seen HTML, probably means table row.
Hey, browser, here
comes a row in my table.
This one's less obvious.
But td, td, td stands for
table data or table cell.
So, hey, browser, here comes a
cell, another cell, another cell,
three of them in total.
Hey, browser, that's it for this row.
And then repeat the pattern.
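Putting that pattern together, a phone-pad table might be written as in this sketch. The symbols in the last row are assumed to be the usual star, zero, and pound keys.

```html
<body>
    <table>
        <!-- Each tr is one row; each td is one cell in that row -->
        <tr>
            <td>1</td>
            <td>2</td>
            <td>3</td>
        </tr>
        <tr>
            <td>4</td>
            <td>5</td>
            <td>6</td>
        </tr>
        <tr>
            <td>7</td>
            <td>8</td>
            <td>9</td>
        </tr>
        <tr>
            <td>*</td>
            <td>0</td>
            <td>#</td>
        </tr>
    </table>
</body>
```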
So here's where HTML just gets
a little mundane after a while.
Once you see the name of the tag
and once you know what attributes,
if any, it supports, you
just follow this pattern.
That's it for HTML.
There's start tags.
There's end tags.
And sometimes, there aren't even
end tags, if they're not needed.
And there's attributes.
And that's HTML.
Now, if you want to be sure
that your code is correct,
you have a few options.
Let me actually go ahead and open up,
for instance, hello.html from earlier,
just so I have a simple example--
or index.html from earlier.
Let me go to validator.w3.org--
turns out there's tools out there that
will just help give you feedback on
whether or not your HTML
is valid, is correct.
And this is useful because sometimes,
it might look OK to you on Chrome.
But honestly, if your friend or family
member visits the exact same page
on Edge or IE or Safari
or Firefox, it might not
look the same because the companies
that make those browsers sometimes
disagree on how to render HTML.
And so if it's not 100%
correct, you're only
incurring more risk that something
might render incorrectly.
I went ahead and clicked Check
after pasting my code in.
And this is good--
document checking complete,
no errors or warnings to show.
So when it comes time for Pset5
and you're dabbling with HTML,
know that there are tools
out there, this one included,
and we'll point you at it in the
spec, that just help give you
feedback on whether something is broken
so that you can, with more confidence,
know that it's going to work OK.
Well, let's make something a
little more interesting now.
Let's re-implement Google, and not
by this little copy-paste trick,
where we just copy their
HTML and use it ourselves.
Let's actually now make a user
interface that uses Google, in some way.
So Google, of course,
in all of its forms,
ultimately has a text box into
which you can type information.
And if I go ahead and
do this, it turns out
that Google is generally going
to redirect me to a certain URL.
If I search for "cats"
and hit Enter, notice
I got redirected to a
pretty cryptic-looking URL.
There's a lot of metadata in there.
There's a lot of advertising
information these days and all that.
But it turns out, and I know
this just from experience,
I could distill this URL into this.
And it will still work.
So let me go ahead and hit Enter.
Whoops.
Let me go ahead and hit Enter after
simplifying this to question mark q
equals cats.
Enter.
And indeed, I get the
same page of cats back.
So what's going on?
So the URL itself is
not all that remarkable.
We've seen www before.
You've certainly used google.com before.
This means it's secure.
It's speaking HTTPS.
All of this now is old hat.
It's not requesting index.html
because Google is dynamic.
The content is constantly changing.
There's not some human whose job it is
to update Google's home page every day
with HTML.
So they, instead, have a
piece of software running,
written in Python or C++ or Java or who
knows underneath the hood that is just
listening at this address.
So it doesn't have to be text
files that humans created.
It can actually be a program.
This one is called Search.
And in just a week or
two's time, you, too,
will write programs in a language called
Python that can do the same thing.
But for now, we'll let
Google do the heavy lifting.
And notice the question mark.
If you ever see a question mark in
a URL, this means to the browser,
here comes some user input,
something that the user probably
typed into the form, just like
I did "cats" a moment ago.
And then you're going
to see something equals
something, which indicates
what the human typed in.
Now, just because Larry and
Sergey, some 20 years ago,
decided with google.com that this
text box that we saw a moment ago,
the big box that's now positioned here--
they decided years ago that
the name for that text box
is going to be q for query--
but you can call it anything you want.
"Cats" is, obviously, what I typed in.
The equal sign is just
associating the two together.
So this URL just means to Google,
hey, Google, run the search program,
passing in a user input name of
q whose value shall be "cats."
And that is how Google knows what
to search for, for any of us.
And frankly, I can search
for "dogs," not even just
by typing the word "dogs" in here.
I can be a little more precise
and type it into this query
because I now know Google's URL format.
And voila.
Now I get search results
for "dogs" instead.
But that's it.
That's the basic building block
that's been happening all this time.
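Because the query lives in the URL itself, even an ordinary link can carry it. As a small sketch, this anchor should lead to the same results as typing "dogs" into the search box:

```html
<body>
    <!-- Everything after the question mark is the user input: q equals dogs -->
    <a href="https://www.google.com/search?q=dogs">search for dogs</a>
</body>
```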
And even though the URL a moment
ago was longer and uglier,
that was just uninteresting detail.
It's not the core business that
the search is actually providing.
So what does this mean?
I can actually now make my
own user interface for Google
by using a few new tags as well.
Let me go ahead and copy
this, as a starting point.
Let me go ahead and create a
new file called search.html.
Just to save time, I'll
type that in there.
And I'll call this search.
And I'm going to get
rid of the "hello" body.
So I just have a starting point.
That's just the same HTML I'm
copying and pasting every time.
Well, it turns out in
HTML, there is a tag
called form that will give
you a form for user input.
And it turns out that inside of a form,
you can have different tags as well--
specifically, an input.
And inputs have names.
So I can say name equals "q" to mimic
Larry and Sergey's decision years ago,
the founders of Google.
The type of this input is text.
So it's not a button or a check
box or something like that.
Those exist, too.
It's just text.
And then I want a Submit button.
And I just know, from
having done this before,
that I can get a Submit button
by doing type equals submit.
And then the value of that
is going to be Search,
which is the word I'm
going to see on the screen.
You would only know
this by having seen it
by someone else doing it,
looking at someone's source code,
reading an online tutorial.
It's not necessarily obvious.
But the pattern is the same-- tag
name, attribute equals something,
attribute equals
something, and so forth.
Well, now let me go ahead and
save this, go into the web server,
and reload the index.
So there's my search.html.
And it's not quite as
pretty as Google's.
Let me zoom in so it's bigger.
But I do have a text box.
And I have a button
whose label is Search.
But I don't know yet where to send it.
I need one more attribute or two here.
It turns out that I want this form
to take the action of sending this
information to www.google.com/search,
the search program on Google's server.
But I want it to use that
special verb we saw a moment ago.
And again, this was
deeper in the envelope.
The method I wanted to use is
get, in lowercase in this case--
so a little low-level and technical now.
But this just means that's the verb
you should use inside the envelope
to get the web page.
But that's it.
I've told the web page
the action you should take
is submit this form to this URL
using get, the method we saw earlier.
Submit a parameter,
as it's called, called
q, with whatever the human typed in.
And then have it give
us a Search button here.
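A sketch of search.html with those pieces in place. The title text is an assumption. The action, method, input name, and button label follow the description above.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>search</title>
    </head>
    <body>
        <!-- action is where the form is sent; method is the HTTP verb to use -->
        <form action="https://www.google.com/search" method="get">
            <!-- The name "q" becomes the key in the URL: ?q=whatever-was-typed -->
            <input name="q" type="text">
            <!-- value is the label shown on the button -->
            <input type="submit" value="Search">
        </form>
    </body>
</html>
```

The submit button has no name attribute, so only q ends up in the URL.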
So let me save this, go
back to my page, reload.
And now let's go ahead and search for
"mice" this time and click Search.
And voila.
There we have a whole lot
of mice search results.
But why, is the question?
Well, all I've done is,
using HTML and an HTML form,
is I've generated the
prescribed format of a URL,
calling Google's Search program
with an input of q equals mice.
And now, as an aside, if
I did take more inputs,
they would be something like this--
something equals value ampersand
something equals value.
Ampersands just separate
these key-value pairs
if you have multiple inputs on the page.
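To make the ampersand concrete, here is a sketch of a form with two named inputs. The server example.com and the second input name, lang, are hypothetical and used only for illustration.

```html
<body>
    <form action="https://example.com/search" method="get">
        <input name="q" type="text">
        <input name="lang" type="text">
        <input type="submit" value="Search">
    </form>
    <!--
        Typing cats and en, then clicking Search, should request a URL of the form
        https://example.com/search?q=cats&lang=en
    -->
</body>
```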
But the principle is
ultimately the same.
So it's pretty powerful.
I've not implemented Google, per se.
I've implemented the front
end, the user interface.
And in the future, we can maybe start
to work on the logic behind the scenes.
So any questions then on HTTP and
now the convergence with HTML?
You feel comfy with HTML, because we're
about to move on to another language?
Yeah?
So all of my examples have looked
ugly thus far, except for PayPal.
That looked pretty nice.
But I just copied and pasted it.
So how do we begin to style our
websites in a more compelling way?
HTML, at the end of the day, is mostly
used for structure of a web page,
just laying out the data that you care
about, the words that you care about,
the images that you care about.
But the aesthetics, that
last mile, so to speak,
of the really pretty colors and the
right font sizes and positioning
things exactly where you want them--
that is the job of another language
called CSS, Cascading Style Sheets.
This, too-- not a programming language.
It's entirely aesthetic in its nature.
So let's go ahead and
take a look at an example.
Let me go ahead and open up
the same web server as before,
open up an example I saw early--
that I made earlier called css0.html.
Suppose that this is the home page
that I want to create for John Harvard.
And notice I've got his name,
big and bold, at the top.
And I've got a slightly smaller font in
the middle and a slightly smaller font
below it.
But these are just minor
font size differences.
It's all centered in the page here.
How would I actually make this website?
Well, let me go ahead and
go into a new file here.
I'll call it css0.html.
Let me go ahead and paste my
starting point, as before.
And I'll call this css0.
And then in the body
of this page is where
I'm going to go ahead
and lay out that content.
So as I recall, I had John Harvard.
And then below that, it was
"Welcome to my home page!
Copyright," and funky symbol--
so I'll just do that for now--
"John Harvard."
Save.
So that's css0.html.
Let me go ahead and reload
it back from my server.
And voila.
So what's wrong, aesthetically?
It's, obviously, all on one line.
But why?
How do I fix this, as before?
Yeah?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah.
So I could add the paragraph
tags, just to put these
all on individual paragraphs.
And the IDE sometimes
can be a little annoying
because now I'm going in
retroactively and adding this stuff.
So it's trying to be helpful.
But then I have to delete it.
So sometimes, this autocomplete
can get in the way.
But it's an easy enough fix-- open p.
Let me move this over here
and move this over here.
Save.
Go back to the browser.
It's not going to change on its own.
I need to click Reload.
And now-- better.
It's a little ugly-- more
whitespace than I want.
But it's closer, certainly.
Let's clean up that copyright symbol.
It turns out there's some keys you
just can't type on your keyboard.
You could certainly
copy-paste it from elsewhere.
But HTML, as an aside, supports
what are called entities.
And these are numeric
codes that are sometimes
written in hexadecimal, sometimes
written in decimal, depending
on your preference.
And it's just a weird number
that represents a symbol
you couldn't, otherwise, type.
Watch as I reload now.
So what happens to
that copyright symbol?
Now it's the one you might expect--
so minor detail.
It's not all that interesting.
But those do exist, as
well, for aesthetics.
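To make the decimal and hexadecimal forms concrete, here is a small sketch in JavaScript. The function name decodeEntity is made up for illustration, and the specific codes (169 in decimal, A9 in hexadecimal, both meaning the copyright symbol) are not stated in the lecture; the browser does this decoding for you when it reads the HTML.

```javascript
// Turn a numeric entity such as "&#169;" or "&#xA9;" into its character.
// decodeEntity is a hypothetical helper, not a built-in function.
function decodeEntity(entity) {
    const match = /^&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));$/.exec(entity);
    if (match === null) {
        return entity; // not a numeric entity, so leave it alone
    }
    // Group 1 holds hexadecimal digits, group 2 holds decimal digits.
    const codePoint = match[1] !== undefined
        ? parseInt(match[1], 16)
        : parseInt(match[2], 10);
    return String.fromCodePoint(codePoint);
}

console.log(decodeEntity("&#169;"));
console.log(decodeEntity("&#xA9;"));
```

Both calls should give back the same copyright symbol, since 169 and A9 are the same number written in two bases.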
But this isn't quite what I want.
And here is where CSS comes in.
I can lay out the
structure of this page.
Yes, I have my three
separate paragraphs.
But they're not centered.
Their font sizes are all the same.
And there's weird gaps there.
This is where CSS can help.
So let me introduce a
few new tags instead.
These aren't strictly paragraphs.
It's not sentences
and sentences of text.
This is kind of like
the header of my page.
So let me actually
rename this to header.
This is maybe the main part of my page.
So let me rename this to main.
And this is like the footer
of my page, I would claim.
Now, it's a super simple website.
But these tags exist.
And in the most recent
version of HTML called HTML5,
the world has started moving away from
generic tags, like p for paragraph,
to more semantic tags that are a
little more descriptive that say,
hey, browser, here's the
header of my page, annoyingly,
not to be confused with the head of
your page, which is, like, the title.
And, hey, browser, here's
the main part of my page.
Here's the footer of my page.
And we'll see why this
is useful, if only
because it describes my page
a little more compellingly.
But it turns out that any
HTML tag can have a style
attribute, which we've not seen before.
And if I want to alter the font size of
this tag, I can say, make this large.
And down here, I can say, style
equals font-size, let's say, medium.
And then down here, I can say
style equals font-size small.
And let me save that, go
back to the browser, reload.
And it's not centered yet.
But now it's kind of big, medium--
large, medium, and small, which
is what I intended the first time.
So how can I actually add centering?
Well, it turns out
inside of these quotes,
you can use semicolons to
separate multiple ideas.
If I put a semicolon here, I
can now say, text-align center.
And let me go ahead and copy
and paste that here and here.
Save.
And notice the pattern.
There's a keyword, a
colon, and then a value.
A semicolon separates it.
Then there's a keyword,
a colon, and a value.
That's the same pattern
we're going to see.
If I go back to the browser,
reload now, now we're on our way.
Now it looks more like what
I intended it to look like.
It took a little more effort.
But thanks to CSS, I was able to do it.
So what I've highlighted here and
what the IDE has highlighted in green
is what are called CSS properties,
Cascading Style Sheets.
CSS lets you deal with things
like centering and font sizes
and colors and positioning and all
the aesthetics I alluded to earlier.
And you just have to know
what these key values are.
Honestly, I don't know
all of them, certainly.
I always Google when I
want to know how could I
do something with this type of tag.
That's because there's a lot
of online free references
that just show you this.
But they all follow the same pattern--
key, colon, value-- maybe semicolon--
key, colon, value, and so forth.
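To see that pattern spelled out, here is a minimal sketch that splits the value of a style attribute on semicolons and colons. The function name parseStyle is made up for illustration, and real browsers handle many more cases than this.

```javascript
// Split a style attribute's value into its key: value pairs.
// parseStyle is a hypothetical helper, not part of any browser API.
function parseStyle(style) {
    const properties = {};
    for (const declaration of style.split(";")) {
        const colon = declaration.indexOf(":");
        // Skip empty pieces (after a trailing semicolon) and malformed ones.
        if (colon === -1) {
            continue;
        }
        const key = declaration.slice(0, colon).trim();
        const value = declaration.slice(colon + 1).trim();
        properties[key] = value;
    }
    return properties;
}

// The kind of style attribute used on the header in this example.
const header = parseStyle("font-size: large; text-align: center;");
console.log(header);
```

The result has two entries, one for font-size and one for text-align.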
But even if you've never
written HTML before,
you could probably argue
that I am not making--
designing this very well.
In C, too, you might have found fault
any time my instinct was to copy-paste.
What is redundant in this example?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah.
I'm centering all three, which
honestly, it just looks a little stupid.
It literally was copied and pasted.
And that should always
rub you the wrong way.
So Cascading Style Sheets--
the first C in Cascading
Style Sheets, or the only C
in Cascading Style Sheets,
stands for Cascading,
which implies a hierarchy to it, too.
So let me, actually, make a new example.
Let me call this css1.html.
Let me paste that same exact code.
But it occurs to me that header and main
and footer are all children of body,
if you will.
They're indented inside.
And you can-- you actually
can use family tree references
in the context of HTML,
where header is a child
of body insofar as it's inside of
there, tucked, indented, inside of it.
So if these all have the
same parent, so to speak,
let me actually erase
this from all three tags.
And let me actually apply it
to the parent tag, saying,
style equals text-align
center because cascading
style sheets, indeed, cascade.
So if you apply one property, like
aligning in the center, to the parent,
it's going to cascade down on all
of the children nested inside.
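One way to picture that parent-to-child behavior is a toy tree in which each node looks to its ancestors for any property it lacks. This is only a sketch: the objects and the lookup function are made up for illustration, and real browsers pass down only certain properties (text-align is one of them), whereas this version treats every property that way.

```javascript
// A tiny model of the page: each node knows its parent and its own style.
const body = { tag: "body", parent: null, style: { "text-align": "center" } };
const header = { tag: "header", parent: body, style: { "font-size": "large" } };
const main = { tag: "main", parent: body, style: { "font-size": "medium" } };
const footer = { tag: "footer", parent: body, style: { "font-size": "small" } };

// Look for a property on the node itself, then on each ancestor in turn.
function lookup(node, property) {
    for (let current = node; current !== null; current = current.parent) {
        if (property in current.style) {
            return current.style[property];
        }
    }
    return undefined;
}

console.log(lookup(header, "text-align")); // found on body
console.log(lookup(footer, "font-size"));  // found on footer itself
```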
So let me go ahead and save
this, go back to the listing,
and open up css1.html.
And voila-- no aesthetic difference.
But it's just better designed,
like 5 out of 5 for design
now, but not necessarily because
this is a little ugly, honestly.
And we've not had occasion
to do this yet in C
because we only had one
language in C. It, generally,
is frowned upon to combine one language,
like CSS, with another, like HTML.
And they might look very similar.
And they're all in the same context.
But this gets annoying.
And especially in the
real world, some people
might be better with
aesthetics than others.
Clearly, from my examples,
I'm not among those people.
And so I might want to work with
a colleague or a friend who's
much better at design and
colors and fonts than I am.
And so I might want them to
work independently of me.
I'll work on the structure of the web
page or, if you will, my final project,
and let them actually contribute
more of the aesthetics.
So how can we begin to
decouple these things?
Much like in C, we, at
least, had header files.
We could factor out commonalities.
Well, it turns out we can do this
a little differently from before.
Let me go ahead and open up an example
2 that I made earlier called css2.html.
And let's scroll through
this for just a moment.
Notice now that in the body of this web
page, I've introduced a different tag--
rather, a different
attribute called "class."
So it turns out that you don't have
to just copy and paste or type out
manually all of these nit-picky
font size changes and text alignment
changes.
You can give them more
descriptive names.
And arguably, it's a lot more
readable to me and my partner
to read the word "centered" and
"large" and "medium" and "small"
and not see all the stupid colons and
the semicolons and the distractions.
That's the stuff that's not interesting
when writing any sort of code.
So where did these words come from--
centered, large, medium, and small?
Well, notice that they're all
values of a class attribute, which
is-- allows for customization.
Let me scroll up to the
head of my web page.
And you'll see, and it's mostly
whitespace because I just
kept hitting Enter to clean it up--
notice that inside of my html
tag is, as before, my head tag.
If I scroll down, there's
also still a title tag.
But there's a new tag that I
alluded to earlier among the few
you can put up there called "style."
You can factor out to
the top of your page
all of the stylizations
that you care about.
And you can do it as follows.
Notice here that I've literally written
the word "centered" with a dot in front
of it, the word "large"
with a dot in front of it,
the word "medium" with a
dot, "small" with a dot.
Those define classes.
So CSS lets you define
your own collections
of configuration properties.
And you can give them names,
just so it's a little more
descriptive and user-friendly.
So you can define class,
class, class, class.
And then inside the curly braces, which
I've lined up here, just like in C,
you can have one property, two
properties, 100 properties.
But you can keep them nice and
orderly, away from all of your HTML, so
that someone else can work on them
or just you can keep the aesthetics
separate from the contents of your page.
It's the notion of
separation of concerns.
Keep the data separate from
the presentation thereof.
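As a sketch of how named classes stand in for collections of properties, the rules can be modeled as a lookup table. The property values here are assumed from the earlier inline styles, and the function propertiesFor is made up for illustration; it also ignores how real CSS resolves conflicts between rules.

```javascript
// Rules as they might appear in the style tag, keyed by class selector.
// The values are assumptions based on the inline styles used earlier.
const rules = {
    ".centered": { "text-align": "center" },
    ".large": { "font-size": "large" },
    ".medium": { "font-size": "medium" },
    ".small": { "font-size": "small" }
};

// Collect the properties for a class attribute such as "centered large".
function propertiesFor(classAttribute) {
    const properties = {};
    for (const name of classAttribute.split(/\s+/)) {
        if (name === "") {
            continue;
        }
        // A class with no matching rule contributes nothing.
        Object.assign(properties, rules["." + name]);
    }
    return properties;
}

console.log(propertiesFor("centered"));
console.log(propertiesFor("centered large"));
```

The HTML only has to mention the names; what those names mean is written once, in one place.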
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Is there a library
you can use that's done this for you?
Yes.
And we'll see a little teaser
for that in just a bit.
So where am I using
these words, to be clear?
Here, I'm saying give me
a class called centered,
a class called large, medium,
and small, each of which
has these respective
properties associated with them.
And then down here, I
can just use those words.
And I don't have to
get into the business
of the semicolons, curly braces,
and all of that in my actual HTML.
But it turns out I can do
this even more fancily.
Let me open up css3.html,
another example.
In this case, notice what I've done.
Now my code is really getting
pretty, relatively speaking,
or from one person's perspective.
Now I don't have any attributes.
This is just tighter.
I'm using fewer characters,
fewer words, fewer lines of code.
This is just, generally, a good thing.
It's less work.
It's less to maintain, fewer
opportunities for mistakes.
But I've gotten rid of, it
seems, all of the aesthetics,
but not necessarily, because
CSS, this second language,
also lets you apply properties
not to tags by way of classes,
but to the actual tags themselves.
So if you only have one body, it is safe
to say, OK, CSS, apply to the body tag
this or these properties.
Hey, browser, apply to the header
tag this or these properties--
to the main tag, the
footer tag, and so forth.
So I don't even need to complicate
my world with small, medium, large,
and so forth.
I can just apply those
properties at the top of my file
to the respective tag
names, whatever they are.
And I could use the p tag.
I could use the image tag,
the a tag, any of those.
I can style them in different ways.
In fact, if you wondered or started to
wonder how could you resize an image,
you can apply CSS to
the image tag and say,
make it this many pixels or this
many pixels, or something like that.
Yeah?
AUDIENCE: Is it bad design to
then keep pushing [INAUDIBLE]
DAVID J. MALAN: Yes.
Is it not bad design to just
keep adding more stuff to the top
and pushing your actual
content down and down
and down and just bloating the file?
Yes-- which is a wonderful segue to our
fourth and final example here, which
is css4.html.
This example-- let me just zoom out.
That's it.
This css4.html has even fewer lines
of code and, indeed, no CSS in it
whatsoever.
This is just the website I care about,
the words and the data I care about.
All of the aesthetic
stuff, while important,
is relegated to a separate file that you
can probably infer is called css4.css.
Unfortunately, and this was a stupid
design decision by humans years ago,
the way you include CSS
from a separate file
is, paradoxically, to use
a link tag, not the a tag,
which probably should have
been called the link tag.
But you have a relationship
of style sheet.
So sometimes, humans
make poor decisions.
This is one of them, I would say.
But if you just copy-paste and
trust that this means, hey, browser,
open up this file and use those
features from the file in this file,
it's similar, in spirit, to
C's hash include mechanism.
It just looks a little different.
So what's in that file?
Well, you can probably
guess, if I go into css4.css,
it's just that same content.
But I factored it out, as you noted--
wasn't the best design to keep it
all together.
So I can simply put it there instead.
Any questions?
Yeah?
AUDIENCE: In the other one, the
fourth perfect one, the best one,
what does "stylesheet" do?
DAVID J. MALAN: Good question.
What does stylesheet do in this example?
Short answer is that just
makes clear to the browser
that the relationship between this
file, css4.css, and this file,
which is the HTML file, is
that of a "style sheet."
So CSS, Cascading Style
Sheets-- it's a lot of words
just to convey the idea of aesthetics.
But that is your style sheet, literally.
It's an actual file that ends in .css
that should be applied to this HTML.
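For reference, here is a sketch of what a page like css4.html might contain, held in a JavaScript function so the markup can be printed and checked. The exact markup of the lecture's file is not shown in this transcript, so the layout, the function name page, and the entity code for the copyright symbol are assumptions.

```javascript
// Build a page whose styling lives in a separate file.
// page is a hypothetical helper; the markup is an assumed reconstruction.
function page(title, stylesheet) {
    return [
        "<!DOCTYPE html>",
        "<html>",
        "    <head>",
        `        <link href="${stylesheet}" rel="stylesheet">`,
        `        <title>${title}</title>`,
        "    </head>",
        "    <body>",
        "        <header>John Harvard</header>",
        "        <main>Welcome to my home page!</main>",
        "        <footer>Copyright &#169; John Harvard</footer>",
        "    </body>",
        "</html>"
    ].join("\n");
}

console.log(page("css4", "css4.css"));
```

The link tag's href says which file to open, and rel says what that file is to this page, namely its style sheet.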
Yeah?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Better design why?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: It's
a really good question.
So to summarize, is it-- isn't it--
wouldn't that be better design, to have
one file with your HTML and your CSS,
rather than two because
things can get misplaced?
Now they're decoupled.
There's not the same inherent link.
Maybe, honestly.
That is a reasonable concern.
Reasonable people will disagree.
Generally, I would say
that the programming
world has decided that separation
of concerns is a good thing.
So keep your HTML in one file,
your CSS in another file.
Keep them in the same folder.
And, frankly, if you go losing your
files in a folder all the time,
the problem is probably a
human problem, not a technical one.
But you make a good point, too.
And you could argue, quite credibly,
that you're just over-engineering this
now.
I like it better all together.
And you'll see in CS50's website
and Facebook and Google and others--
sometimes, you do see
CSS together with HTML
because humans decided
this does make more sense.
But there are these mechanisms in
place to facilitate collaboration,
to facilitate separation, so that you
can keep things a little more organized
in separate files.
Any questions then?
So to recap where we're at,
because this is a lot quickly,
HTTP is this protocol via which
you can just exchange information
from A to B and B to A. HTML is
the language in which web pages are
written, that structures the web
page and actually has your data.
And CSS lets you fine-tune it.
Now, I didn't fine-tune
it all that much.
I just centered it and
changed the font size.
But honestly, we can very quickly
get into the weeds of colors
and positioning and all of that.
But that we'll do in sections and
in Psets and in googling and looking
at online references
that we'll point you
to because it just all follows the
same patterns of tags with attributes
and then CSS properties.
So even though you've not seen the
whole vocabulary of CSS and HTML,
you have seen the entire structure,
the fundamental concepts.
So let's introduce then one
final piece of the puzzle
and bring back to bear
some of our programming
capabilities of the past several weeks.
So it turns out that in
the world of HTML and CSS,
you can actually introduce a
programming language, as well,
to make your websites even more dynamic
using something called JavaScript.
Many of you have taken
APCS and know Java--
no relation.
JavaScript was just a
marketing decision to them--
call it something similar to
an already popular language.
So JavaScript is a language
used in browsers, typically,
to give you more control
over the users' experience.
For instance, when you visit Gmail
these days and you get a new mail,
it just appears magically
as a new row in your inbox.
You don't have to reload or keep
clicking Refresh to see your new mail.
It just appears magically.
When you're using Google
Maps or something,
you can just click and drag
and see more of the map.
Back in my day, you had
to click a right arrow
to go this way, a left
arrow to go that way.
And the whole web page
would actually reload.
But JavaScript gives you logic
and programming capabilities
in your users' Macs and PCs and phones
that gets executed not on your server,
but on their browser, which means you
can do many more things by running code
on their computers.
So what does this actually mean?
Well, in JavaScript,
fortunately, we have
a language that's super similar
to C. But it's interpreted top
to bottom, left to right.
The browser just reads the instructions
in JavaScript and just does them.
There's no compilation for you.
There's no zeros and ones.
And so in that sense,
it's just easier than C.
Also, it has no pointers, which
also makes it easier than C.
But it gives us the
ability to alter a web page
once it's been delivered to a user.
And we'll see what we can
actually do with that capability.
But first, let's compare and contrast.
You'll recall a few weeks ago,
in week 1, when we introduced C,
we pulled up some Scratch and we
pulled up some C, just to show
that the ideas are still the same.
Let's do the same real quick here.
So we went from Scratch to C. Let's
now go to JavaScript with variables.
So in C, if you wanted to set
a counter to 0 a la Scratch,
you would literally say
counter equals 0, semicolon.
But you would have the
data type to the left.
In JavaScript, the code
is almost the same.
But you actually don't
specify data types.
You, the programmer, don't worry
about ints or floats or strings
or all of that.
You do define the variable.
And the keyword to use, though
there's several options that
do slightly different things, is let.
And the thinking is let the counter
equal 0, please, if you will.
But you don't specify the type,
even though JavaScript supports
numbers and strings, and so forth.
You just don't have to care
about them as much anymore.
Suppose you want to update a variable.
In Scratch, you would just
change the counter by one.
In C, you would do counter
equals counter plus 1, semicolon.
In JavaScript, you would
do the exact same thing.
Code is identical.
In C, you could also
write this more succinctly
as counter plus equals 1,
semicolon, if you recall.
If you don't, that's fine.
This is just shorthand notation.
In JavaScript-- same exact thing.
In C, you could also
do counter plus, plus,
semicolon to increment the value--
in JavaScript, same.
So this is what's nice about JavaScript.
You already know much
of it just by nature
of having spent so many
weeks in the weeds with C.
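Written out, the variable examples just described look like this in JavaScript. It is a minimal sketch of the syntax, nothing more.

```javascript
// No data type on the left, just the keyword let.
let counter = 0;

// Three ways to add 1, each written exactly as it would be in C.
counter = counter + 1;
counter += 1;
counter++;

console.log(counter);
```

After the three updates, counter holds 3.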
Suppose you had an if condition,
like this-- is if x is less than y.
In C, we would write
it like this at right.
JavaScript syntax is the same.
If you had an if-else,
syntax is the same.
If, else if, else--
syntax is the same.
If you want a forever loop,
syntax is the same, while true.
If you want a for loop,
syntax is almost the same.
Let needs to be used instead.
So this is C because it says
int i equals 0, and so forth.
That's a data type.
JavaScript-- I just
claim doesn't worry--
you don't need to worry
about those data types.
So in JavaScript, you would
instead say "let" instead.
But otherwise, the syntax is the same.
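Here is a short sketch of those conditions and that for loop in JavaScript. The function name compare, the messages, and the loop bound of 3 are made up for illustration.

```javascript
// if, else if, else: the same shape as in C.
function compare(x, y) {
    if (x < y) {
        return "x is less than y";
    } else if (x > y) {
        return "x is greater than y";
    } else {
        return "x is equal to y";
    }
}

// The same for loop as in C, but with let where C has int.
const seen = [];
for (let i = 0; i < 3; i++) {
    seen.push(i);
}

console.log(compare(1, 2));
console.log(seen);
```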
So that's a nice starting
point because there's
nothing new to learn syntactically.
We just need to apply the same
logic that we saw in week 0 and 1
and since to HTML.
So if this is a representative
web page, albeit super simple--
this is the one I brought up earlier--
how can we now start
thinking about this web page
in a way that is conducive
to programming it
and actually changing it dynamically?
Well, let me propose that you think
of this same web page as just a tree.
And we introduced trees just a week
ago, albeit in the context of C.
And frankly, in C, they're
a headache because you
have to wire things together using
pointers and nodes and all of that.
Don't worry about that now.
It's the browser's job to build
this in memory or RAM for you.
And indeed, when I keep saying
that a browser, upon receiving
an envelope with HTML, reads it
top to bottom, left to right,
I haven't said what it does with it.
What it essentially does with it
is it creates this data structure
in memory for you.
And it is Chrome or Edge or Firefox
or whatever browser you're using that
itself is written in probably C
or C++ or some other language.
Some other human at those
companies wrote the code
that builds all of the
pointers and/or whatever is
used to build this structure in memory.
But this is what the browser has
in mind once it's read your HTML.
And now that it's a data
structure in memory,
you can make changes to
it, just like last week,
we were inserting humans into our linked
list, changing the data structure.
The browser can add more nodes or
more tags to the page, dynamically.
So if you run with
this in your mind, when
you get a new email in
Gmail, what is happening?
Well, the web page, when
you first load it in Gmail,
has a whole bunch of td tags, probably,
or tr tags, rather, for table row--
table row, table row-- each of
which represents an email, perhaps.
When you get a new email, the browser
is probably just adding another tr node
to this tree because
notice the words here.
Html lines up with this tag.
Head lines up with this tag.
Body lines up with this tag.
So it stands to reason that when
you get another row in your inbox
with another email, someone is
just adding a node to that tree.
And that someone is
JavaScript, the language
in which you can control the users'
browser even after they've loaded
your web page for the first time.
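To make the tree idea concrete, here is a sketch that models a page with plain JavaScript objects and adds one more row to it. Everything here (the node and addRow functions, the shape of the inbox) is an assumption for illustration. In an actual browser you would work with the tree it already built, using calls like document.createElement and appendChild, rather than building your own.

```javascript
// A node is just a tag name plus a list of child nodes.
function node(tag, children = []) {
    return { tag, children };
}

// A hypothetical inbox: a table with one row per email.
const table = node("table", [node("tr"), node("tr")]);
const page = node("html", [
    node("head", [node("title")]),
    node("body", [table])
]);

// A new email arrives, so add one more row to the tree.
function addRow(tableNode) {
    tableNode.children.push(node("tr"));
}

addRow(table);
console.log(table.children.length);
```

The table started with two rows and ends with three, without the rest of the tree being rebuilt.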
So what can we actually do with this?
Let's start simple, as follows.
Let me go ahead and just whip up, really
quickly, a file called hello0.html.
And we'll do it, as before,
with our DOCTYPE html--
my html tag here, my head tag here.
My title here will be hello0.
And notice I've been moving
these to separate lines.
You don't strictly need to do
that-- just to keep the hierarchy.
The whitespace, again, doesn't matter.
But I'll be consistent there.
And in my body here, I'll say this
time just "hello, world" by default.
So that's a pretty
simple web page as well.
Let's, actually, now
make it interactive.
All of my web pages thus far
have been static content,
except for the Google one.
But even that wasn't so much interactive
as it was the moment I hit Submit,
it made the problem Google's
problem to deal with.
Let's keep the user with me this time.
Let me go ahead and do this.
Let me get rid of this form here.
Let me create a new file now
called hello1 as my next version.
And let me go ahead and
paste that same code.
But this time, let me have the
browser be a little interactive.
Let me go ahead and have a form here
because what I want is a text box--
type equals text.
I'm not going to bother
giving it a name yet.
And let me have another one
called type equals submit.
Save.
And let me go ahead and open up
my server so I can see this file.
This, I said, was what-- hello1.html.
So it's just a simple form.
But there's no connection
to Google this time.
Let me start to use
this form interactively
because if I have the
ability to program,
I bet I could take the users'
input and do something with it.
So how do I do this?
Well, let me propose first
that I want the human to type
their name into this form.
And then when they
click Submit, I want it
to say "hello, David" or "hello,
Veronica" or "hello, Brian,"
whatever the name actually is,
like some of our C examples.
So you know what?
Let me write that function first.
It turns out that in the
head of your web page,
you can have not just the
title and not just style,
but also a tag called script
for JavaScript, for instance.
And in this tag, I can
actually write code.
And there's something a little
different in JavaScript.
Instead of writing void greet
as the name of my function
and then writing the body
of my function here and then
saying void here, for instance,
JavaScript's a little looser.
If you don't want to
take any arguments, just
don't mention them-- no mention of void.
If you don't have a--
and actually, don't even
mention a return type.
Just call it a function--
so slight difference
from C. It's a little lazier.
You don't worry about input types.
You don't worry about output types.
You just say, give me a
function called greet.
Well, what do I want
this function to do?
Turns out in JavaScript,
there's a function called alert
that's just going to pop up a
window that says something in it.
And I can pass, as an argument
to this JavaScript function,
whatever it is I want it to say.
So let's go ahead and say
"hello, world," semicolon.
It's almost identical to C,
again, except that I'm saying
function instead of a return type.
And alert, apparently, exists.
And there's no sharp include or any
of that that we typically had in C.
It's just literally in
my browser right now.
So let me go ahead and save that
and go down to the form tag here.
And it turns out, on
the form tag, there's
a special attribute called onsubmit.
And as the word implies,
it says when the form
is submitted, on the submission of this
form, go ahead and execute this, greet.
So I can actually tell the browser,
on submission of this form,
to call a function that I wrote.
And now let me just
preemptively write return
false for reasons we'll
come back to in a moment,
just to make sure this actually works.
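Put together, the function and the form described so far might look like the sketch below. The helper show is an addition for illustration only: it calls alert when a browser provides one and falls back to console.log otherwise, so the code also runs outside a browser. The form markup is an assumed reconstruction kept in a string.

```javascript
// Show a message: alert in a browser, console.log anywhere else.
// show is a hypothetical helper, not part of the lecture's code.
function show(message) {
    if (typeof alert === "function") {
        alert(message);
    } else {
        console.log(message);
    }
}

// No return type and no void, just the keyword function.
function greet() {
    show("hello, world");
}

// The form that would call greet on submission (assumed markup).
const form = `
<form onsubmit="greet(); return false;">
    <input type="text">
    <input type="submit">
</form>`;

greet();
```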
Now let me go ahead and save this,
go to hello1.html, open that up.
And let me just change the
title, for consistency--
so hello1.html.
And let me go ahead
and say David, Submit--
hello, world-- not really sure what
the point of typing my name was.
But it, at least, seems
to work as programmed.
But obviously, where I'm going with
this is I want to display my name.
So when the human has typed in their
name to the box and clicked Submit,
that's triggering a
submission of the form.
But wait.
When the form is submitted,
I'm calling greet.
So it sounds like it's
greet's job to figure out what
the word is that the human typed in.
So how can I do this?
It's a little cryptic.
And this is where now it
becomes JavaScript-specific
and not C. Let me go ahead and
define a variable called name.
And let me use this fancy
technique, document.querySelector.
And then in here, I'm
going to need to specify
what node in the tree I want to select.
So I'm actually getting ahead of myself.
Let's look at the HTML.
At the moment, I've got a form tag
and two input tags, neither of which
has a name.
And I could fix that.
But let me actually do
a different technique.
HTML also supports unique identifiers.
And you can give them
literally that, unique IDs.
You can call it whatever
you want-- foobar, baz, xyz.
I'm going to make it more
descriptive and call it
ID equals name because
what I can now do up here
in querySelector is
actually specify what
it is I want to select from the tree.
That tree is called a DOM, or
Document Object Model, verbosely.
And I need to do one
last thing-- turns out,
and you would only know
this from experience,
that if "name" is the unique
identifier of an element
and not the name of a tag, I actually
need to prefix it with a hash,
unrelated to C's hash.
But otherwise, this
function, querySelector,
is going to think that
there's a tag called "name."
So this means an ID
whose value is "name."
It's a bit of a mouthful.
But here we go.
Once I select that node from the tree,
I want to get its value and set it--
I want to get its value, semicolon.
What is going on?
First, recall from this tree here
that whenever the browser loads HTML,
it has some HTML.
It builds a tree structure therein.
Each of those nodes is selectable via
this function called querySelector.
What is document?
Well, it turns out in
JavaScript, there's
this special global
variable called document
that refers to the whole
document, the whole web page.
Built into that is a function
called querySelector.
That dot notation is reminiscent
of C's struct syntax.
So you can think of document as a
struct that represents the whole page.
Inside of it is a
function, not just data,
but a function, called querySelector.
You're going to see this all over
the place in JavaScript, dots,
because people-- the JavaScript
world is much more voluminous than C.
So there's lots of functions inside
of other containers or structures.
So with that said, this is
just saying, hey, browser,
let me have a variable
called name and store
the value of the node that has
a unique identifier of name
and get that by using
this function, select it.
That grabs the rectangle
from the picture
and gives me access to the
value that the human typed in.
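Here is that selection step as a sketch. In a browser you would pass the built-in document. The fakeDocument object below is a stand-in invented for illustration, so the code can run where no browser exists, and readName is likewise a made-up name.

```javascript
// Select the element whose id is "name" and read what was typed into it.
function readName(doc) {
    let name = doc.querySelector("#name").value;
    return name;
}

// A stand-in for the browser's document (an assumption for illustration).
// It answers only the one selector used above.
const fakeDocument = {
    querySelector(selector) {
        return selector === "#name" ? { value: "David" } : null;
    }
};

console.log(readName(fakeDocument));
```

Without the hash, the selector "name" would look for a tag called name, which the stand-in (like a real page here) does not have.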
Now, I'm not done with this.
I need to actually display that value.
And it's not going to
be correct to do this.
Otherwise, I'm just going
to see "hello, name."
So there's not this convention,
which we had in C. There's
another way to do this.
But I'm going to go ahead
and do it as follows.
I'm just going to use concatenation.
So this is not possible
in C. But in JavaScript,
if you have a string on the
left and a string on the right,
using plus will not add them
together, which would make no sense.
It will concatenate them,
like glue one to the other.
In C, how would you do this?
It is an utter nightmare.
In C, how would you do this?
This would be an array of
characters on the left that
has a null character at the end.
This would be another
array of characters
on the right with a null
character at the end.
Neither is big enough to
fit the other as well.
So you'd have to allocate
a new array of characters,
copy these in, get rid of the backslash
0, copy these in, keep the backslash 0,
throw those away.
And then you have concatenated strings.
That is so many damn
steps in C. And this
is why no one likes programming in C.
And you don't have to do it anymore.
In JavaScript, just
use the plus operator.
That does all of that for you.
But hopefully, you do have an
underlying appreciation of what the plus
operator is actually
doing underneath the hood
because the computer is
still doing the same work.
The difference is this week onward,
we, the human, do less of that work
ourselves.
So plus is an abstraction
for all of that complexity.
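As a small sketch of what plus does with strings, compare these lines. The function name greeting is made up for illustration.

```javascript
// With a string on each side, + glues the two together.
function greeting(name) {
    return "hello, " + name;
}

console.log(greeting("David"));

// Even strings that look like numbers are concatenated, not added.
const glued = "1" + "2";
console.log(glued);
```

The first line printed is hello, David, and the second is the two-character string 12 rather than the number 3.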
So if I didn't mess this up,
let me go ahead and save now.
I'll go to the browser, reload,
and type in my name, David.
Submit.
And there we have it--
hello, David.
Let's do one more test.
We'll try, say, Veronica.
Submit.
And voila.
You'll notice that it's trying
to be helpful now, my browser.
If I start D, then it
sees autocomplete, or V--
well, forgot about Veronica, apparently.
Veronica-- let's see if we reload.
V-- that's weird.
Don't tell Veronica Chrome
doesn't remember her.
But we can turn that
feature off-- is the point--
by actually doing things like this.
And you would know this
from the online manual.
Autocomplete equals off
turns off that feature.
Autofocus also does something handy.
If you've ever been to a web page
and you can just start typing,
Chrome on macOS highlights it in blue.
That just means give focus.
Put the cursor there.
If you don't have that, the
web page starts like this.
And we've all visited websites, and
I think my.harvard's among them,
where you have to
stupidly click there just
to start interacting with the page.
That is not necessary.
That's bad programming.
Just using the tags can
fix that kind of thing.
Questions?
AUDIENCE: What if we have
two IDs with the same name?
DAVID J. MALAN: What if we have
two IDs with the same name?
You should not.
That is human error.
An ID, by definition, must be unique.
And if you have two by the
same name, the human messed up.
And what it does--
I don't know what the behavior is.
It's probably unofficially not
documented or maybe it picks the first.
Maybe it picks the last.
I don't know.
But you shouldn't rely on it, anyway.
Good question.
Good corner case.
Other questions?
Let me jump ahead to one example.
And then we'll come back to
a fancier version of this.
Let me open up a program
that's in today's source 5
directory called background.html.
It's got some familiar letters, which
probably stand for red, green, blue,
probably.
These are three buttons.
And we've seen buttons.
We saw the Search button and the Submit
buttons that I've created before.
But using JavaScript, I can
do fun things like this.
If I click on R, the
web page just changed.
G, B, R, G, B-- this is now interactive.
If you were just writing HTML and CSS,
you'd have to pick one of those colors
and stick with it.
But with JavaScript, you can respond.
And that's because a
browser has lots and lots
of events happening all the time.
Events include clicks or mice moving
or dragging or, in a mobile device,
touching.
So there's lots of things that a
human can be doing with a web browser.
And you can write code that responds
to all of those kinds of events.
And so let me actually go ahead
and open up background.html
and show how this is working.
So for the most part,
it's just HTML at first.
Here's the html tag, the head tag,
the body tag, and three new tags.
This is another way of creating buttons.
And again, this isn't interesting.
You learn this in the
online reference or manual.
And it just tells you,
here's how to use a button.
It follows the same paradigm--
tag name, attribute equals value.
The label is just going
to be R, G, and B.
And now this is where things get
a little scary-looking at first.
But that's it.
There's just lines of code
here inside of the web page.
Now, let's walk through
this line by line,
even though it's a
little verbose at first.
So this first line here
says, hey, browser,
give me a variable called body.
And store, in that variable, the node--
the rectangle, so to speak--
that has the name body.
So that is, pluck that
rectangle out of the picture
so that I have direct access to it.
Why-- because I'm going to
manipulate it in just a moment.
This is the scariest the
JavaScript will look for now.
Document.querySelector
hash red-- could someone
translate that into just English?
What's that doing for me?
AUDIENCE: Giving the ID
of red that you just--
DAVID J. MALAN: Yeah.
Be a little more verbose.
Someone else?
Hey, browser, select for me the
node whose unique ID is red.
That's fine.
Give me access to that node,
the structure in memory.
And this is where it's a little weird.
So it turns out that every tag
in a web page or node in a tree--
the DOM tree, so to speak--
Document Object Model-- can have
event listeners associated with it.
And you would only know
this from the documentation.
But if you literally say,
go into this structure,
this node, that
represents the red button
and get its onclick value,
what's cool with JavaScript,
even though the syntax is
a little scary-looking,
is you can associate a
function with that event.
So this is saying, hey, browser,
when the red button is clicked on,
call the following function.
And what's new in JavaScript here is
that this function, at the moment,
has no name, which is weird.
You could technically do this in C.
But we always gave our functions names.
But you don't really need
to give a function a name
if you don't need to
mention it ever again.
And the detail that's
happening here for us is this.
This says, hey, browser, on
click, call this function.
What does that mean in real terms?
Hey, browser, call all
of the lines of code
in between this open curly brace
and this close curly brace.
So even if you're not
comfy with the syntax,
it just literally means execute
the following lines of code
when this button is clicked.
This is what's known as an anonymous
function insofar as it has no name.
It's just function,
open paren, close paren.
So you can probably infer what
it's doing on this line here.
Let me highlight this line in blue.
It's a little cryptic.
And again, I promise that you're
going to see lots of these dots.
But this is saying, hey,
browser, modify the body,
or specifically, the style of
the body, and specifically,
the background color of the style
of the body, to be, of course, red.
And the rest of the code is copy-paste
for now for green and blue as well.
So what is happening?
Every time you click on
one of those buttons--
R or G or B--
literally, this line of code is getting
executed that I've just highlighted
or this line of code is getting
executed or this line of code
is getting executed.
So even though the syntax
is, yes, admittedly, way
more complicated than we've seen thus
far, the idea is relatively simple.
Select the button.
Tell it, on clicking,
to call this function.
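The walkthrough above can be sketched roughly as
follows. The file itself is not reproduced in this
transcript, so the exact layout is an assumption; the
selectors 'body', '#red', '#green', and '#blue' and
the backgroundColor property follow the description.
The wrapper function wireColorButtons is hypothetical,
added only so the same lines can be tried outside a
browser with stand-in objects.

```javascript
// Hypothetical wrapper: takes a document-like object so the logic
// can run in a browser or against simple stand-ins.
function wireColorButtons(doc) {
    // Pluck the body node out of the tree so it can be modified.
    const body = doc.querySelector('body');

    // When the button whose id is "red" is clicked, call this
    // anonymous function.
    doc.querySelector('#red').onclick = function () {
        body.style.backgroundColor = 'red';
    };

    // Same idea, copied for green and blue.
    doc.querySelector('#green').onclick = function () {
        body.style.backgroundColor = 'green';
    };
    doc.querySelector('#blue').onclick = function () {
        body.style.backgroundColor = 'blue';
    };
}

// In a browser, run this once the three buttons exist on the page.
if (typeof document !== 'undefined' && document.querySelector('#red')) {
    wireColorButtons(document);
}
```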
And it's fine early on if
you just copy and paste this.
And for Pset5, you won't
have to use any of this code.
This is a preemptive
look at what you can
do with an eye toward fancier features,
like final projects and beyond.
Any questions then on
this background example?
Yeah?
AUDIENCE: Why did we use the pound
symbol for red, green, blue, and not
for body?
DAVID J. MALAN: Good question.
Why do we use the pound symbol for
red, green, and blue, but not for body?
If you look at the HTML,
you'll see the following.
Body is, apparently, the name of a tag.
So that's why we just selected "body"
with that line of code around here.
However, red, green, and blue
are not the names of tags.
They are the unique identifiers,
values that I just came up with.
I could have called it x, y, z.
But I chose more descriptive terms.
So whenever you want
to reference or select
a node that has an identifier,
you use the hash instead.
That's all.
These are just human conventions
that are non-obvious unless you
were told what they all mean.
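To make the convention concrete, here is a toy
matcher for just the two cases discussed: a bare
word is compared against the tag name, and a word
with a leading hash is compared against the id. The
function matchesSimpleSelector is hypothetical and
far simpler than what a browser's querySelector
really supports; it is only a sketch of the rule.

```javascript
// Hypothetical illustration of the two selector forms from the
// lecture. Real selectors support many more forms than this.
function matchesSimpleSelector(node, selector) {
    if (selector.startsWith('#')) {
        // "#red" means: the node whose unique id is "red".
        return node.id === selector.slice(1);
    }
    // "body" means: the node whose tag name is "body".
    return node.tagName.toLowerCase() === selector.toLowerCase();
}

// Stand-ins for two nodes in the page.
const bodyNode = { tagName: 'BODY', id: '' };
const redButton = { tagName: 'BUTTON', id: 'red' };

console.log(matchesSimpleSelector(bodyNode, 'body'));
console.log(matchesSimpleSelector(redButton, '#red'));
console.log(matchesSimpleSelector(redButton, 'red'));
```

The last call is false because, without the hash,
"red" is treated as a tag name, and there is no
tag called red.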
Let's try one other
example with JavaScript.
It's not uncommon on news
websites to have the ability
to change the font size,
which you can, actually,
do on your Mac and PC sometimes
using keyboard shortcuts.
But sometimes, it's built
into the web page itself.
Let me go into, for instance, size.html.
And here's some Latin
text or Latin-like text.
And notice that it has
a little select menu.
Normally, when you have a select
menu, you select something.
And then you click Submit.
And then the server deals with it.
The information goes somewhere.
But you don't need to do that.
You can actually make little menus
interactive, just like text boxes.
Suppose I want to make
this text a little smaller.
I can do that.
I can choose extra small.
I can do extra-extra small or
I can do extra-extra large.
And so what's going on here?
Well, just like there are
click events in a browser,
there are also change
events or selection events.
Just anything that can happen on
the web page you can listen for.
So let's take a look at
this code, for instance.
We've not seen this tag before.
But we have seen paragraph.
And there's a paragraph of Latin.
And then there's a select tag,
which gives you a select menu.
A dropdown menu is called
a select menu in HTML.
And here's how you have
all of the options.
Now, there is a bit of duality here.
There's what the human sees, which is
between the open tag and close tag.
And then there's this value, which
the computer sees-- but more on that
another time, when we get to Python.
But this just gives me that
whole menu of size options.
And if I scroll down now, notice
I have a script tag down here.
And in this script tag, I have
document.querySelector "select"
because I want to select the name,
the tag whose name is select.
And then there's this event, onchange.
And you'd only know this
from the documentation.
But like onsubmit, onchange is
called any time you change that menu.
What function should get called?
Well, this one here, which
is anonymous in the sense
that it has no name.
And go ahead and do this.
Select from the document the body tag.
Get access to its style.
And change its font size to, and
this is funky here, this.value.
So what did I do here?
Let me do this, no pun intended.
This refers to whatever
element in the web page
induced this function to be called.
So this is-- you can think of as a
variable, a special variable, that
always refers to whatever
element you are listening to.
And so this.value just saves me some
keystrokes because I don't need
to use document.querySelector again
to get at this select menu.
But we'll see this again,
perhaps, down the road.
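A minimal sketch of the script just described,
assuming the page has a single select menu whose
option values are CSS font-size keywords such as
xx-small or xx-large. The wrapper wireSizeMenu is
hypothetical, added so the handler can be exercised
with stand-in objects as well as in a browser.

```javascript
// Hypothetical wrapper around the lines described in the lecture.
function wireSizeMenu(doc) {
    // Any time the menu changes, call this anonymous function.
    doc.querySelector('select').onchange = function () {
        // "this" is the select element whose change event fired,
        // so this.value is the value of the chosen option.
        doc.querySelector('body').style.fontSize = this.value;
    };
}

// In a browser, run this once the select menu exists on the page.
if (typeof document !== 'undefined' && document.querySelector('select')) {
    wireSizeMenu(document);
}
```

The handler is written with the function keyword
rather than as an arrow function, since arrow
functions do not get their own this.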
Questions?
And let me point out
one thing that's stupid.
This here, fontSize,
looks different from CSS.
In CSS, what did we call this?
Do you remember?
We did font size small, medium, large.
It was font-size.
So this was left hand
not talking to right hand
when these things were invented.
It turns out that dash in
JavaScript means what, maybe?
Minus or subtraction.
And so this syntax just breaks
in the context of JavaScript.
So what did humans do?
They decided that any time you
have a CSS property that's word,
dash, something, get rid of the dash.
Capitalize the next word.
And that's now the mapping in
JavaScript-- so just a simple heuristic
there that you can perhaps keep in mind.
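That heuristic is small enough to write down as
code. The function name cssToJsProperty is
hypothetical, and the sketch only covers the
word-dash-word pattern described here.

```javascript
// Hypothetical helper: get rid of each dash and capitalize the
// letter that follows it.
function cssToJsProperty(cssName) {
    return cssName.replace(/-([a-z])/g, function (match, letter) {
        return letter.toUpperCase();
    });
}

console.log(cssToJsProperty('font-size'));        // fontSize
console.log(cssToJsProperty('background-color')); // backgroundColor
```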
Let's take a look, perhaps,
at one final value--
oh, how about two final values?
Let's go ahead and do
this with blink.html.
So back in the day, when the
web was first being invented
and HTML was in its infancy,
there was a wonderful tag
that was probably on my
own personal home page
called blink that literally did that.
You could have a tag that was open
bracket, B-L-I-N-K, close bracket,
put some words, then close the tag, and
then your web page would just do this
to all visitors, which humans
eventually realized, well,
this is dumb and really annoying to
look at-- bad user experience, or UX.
And so they took it away.
It's one of the few tags, I think,
from HTML that was actually removed
by committee, as opposed to added.
There was also marquee at the time, too,
that-- like a theater sign would just
scroll words across your page.
So you've probably seen websites like
this that recreate them in some way.
But you can do this with JavaScript.
Think about this logically.
We know how, in code, we can
change the style of an element.
We've not seen how to do this yet.
But you can make an element
show or hide, show or hide.
Turns out in JavaScript,
you can use a timer.
You have access to a clock.
And you could actually write
code that says, you know what?
Every half-second, call this function.
Call this function.
Call this function.
Call this function.
And what that function does is
it changes the style of the page
to hide or show, hide or show.
Now, this used to be
built into browsers.
But now you can recreate it
with something like that.
And I'll wave my hand
at what the code is.
But that's just one feature there.
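Since the code is only waved at here, the following
is one possible sketch rather than the contents of
blink.html. It assumes the hiding and showing is
done by switching the visibility style between
hidden and visible, and that the whole body blinks;
the name makeBlinker is hypothetical. setInterval
is the browser's built-in timer, and 500
milliseconds is the half-second mentioned above.

```javascript
// Hypothetical helper: returns a function that hides the element
// if it is showing and shows it if it is hidden.
function makeBlinker(element) {
    return function () {
        if (element.style.visibility === 'hidden') {
            element.style.visibility = 'visible';
        } else {
            element.style.visibility = 'hidden';
        }
    };
}

// In a browser, call the blinker every half-second.
if (typeof document !== 'undefined') {
    setInterval(makeBlinker(document.querySelector('body')), 500);
}
```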
Let's look at one final example,
though, that's a little creepy.
Here's the code first.
And this is called geolocation.
This is all the rage now
with apps like Uber and Waze
and Find My Friends on
iPhone and the like.
Here is relatively little
code that will figure out
where your user is in the world.
Now, it's a bit of a mouthful here.
But it's mostly just a file of
HTML with a script tag.
But there's this other
special global variable.
And we won't use this much.
And indeed, you might not ever use it
if you don't care about this feature.
But it's called navigator,
for historical reasons.
And navigator has a
feature called geolocation.
And geolocation, which stands
for locate people geographically,
has a function called
getCurrentPosition.
And for reasons we
won't really get into,
it takes a function as an argument.
This is a very common
JavaScript paradigm,
but more on this toward
final projects, perhaps.
This line of code is going to
write to the document the user's
latitude and, if we scroll to
the right, their longitude.
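A minimal sketch of that call, assuming the callback
receives a position object with coords.latitude and
coords.longitude, as the browser's geolocation
feature provides. The helper names formatPosition
and showLocation, and the comma used as a separator,
are assumptions for illustration; the file shown in
lecture may differ.

```javascript
// Hypothetical helper: turn a position into "latitude, longitude".
function formatPosition(position) {
    return position.coords.latitude + ', ' + position.coords.longitude;
}

// Hypothetical wrapper: takes navigator-like and document-like
// objects so it can be tried with stand-ins outside a browser.
function showLocation(nav, doc) {
    // getCurrentPosition takes a function as its argument and
    // calls it later, once the location is known.
    nav.geolocation.getCurrentPosition(function (position) {
        doc.write(formatPosition(position));
    });
}

// In a browser, this would be called as:
// showLocation(navigator, document);
```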
So this is where it gets creepy.
So if you were to use this code in
your websites and a user were to visit,
like I will now, and they click
the link, they will be prompted,
do you want the website
to know your location?
Sometimes, you might say yes.
Sometimes, you might say no.
Frankly, most of us probably
just click Allow instinctively
without really thinking about this.
But there's where I am, apparently.
Let me go ahead and highlight that.
Let me go to maps.google.com because
whatever website you just visited,
whether it's Facebook or CNN or--
a lot of news websites
want to know where you are.
If you go to like, what, fandango.com
or the like for movie tickets,
they might want to know where you are.
Well, you're giving them
very precise information.
If I go ahead and search for
these GPS coordinates on Google,
that's not where I am.
What the hell?
[LAUGHTER]
Why are we in Oklahoma?
[LAUGHTER]
I don't understand what's going on.
This was not part of the demonstration.
This was going to be the big climax.
Let's turn off the
wired internet in here.
And apparently, we're going
through Oklahoma today.
Let's turn on the Wi-Fi,
which will just give me
a different IP address, which
is a wonderful way to tie
the start of the lecture together.
If I wait a second, it should go green.
Come on-- no IP address.
Now these words might
make a little more sense.
Come on.
Give me an IP address.
Come on.
Harvard-- there we go.
There's my IP address.
Let's reload.
[LAUGHTER]
We'll email the IT
people about this later.
But all of my internet--
what this means is my--
no, this is really weird.
We have a lot of footage to
cut out of today's video.
So what this does is,
with low probability,
tell you where your users are in
terms of latitude and longitude
so that you could geolocate them, figure
out what the local movie theaters are
or what the starting
times of stores are,
give them directions to
places, and the like.
And while that was supposed to
be the big climactic finish,
apparently, none of this works.
Today was completely wrong.
We're in Oklahoma.
But let's end here today.
I'll stick around for questions.
We'll see you next time.
