BRIAN YU: Let's begin
by thinking about how
it is that computers and other
devices communicate with one another
over the internet.
Presumably when this
happens, one device is
sending some sort of message over
the internet to another device.
And that other device is responding
back with some sort of message.
And computers and other
devices all across the internet
are performing this process,
sending and receiving messages,
whether those messages are emails or
web pages or chat messages or something
else. But how does this all happen?
Well, it turns out that computers
and other devices on the internet
have standardized on a set
of protocols, some basic rules
to follow that govern how it is that
these devices should be communicating
with one another.
And one of these protocols
is called TCP/IP.
TCP stands for transmission
control protocol,
and IP stands for Internet Protocol.
And these are the
protocols that determine
how it is that computers are able
to communicate with one another
over the internet.
How does this work?
Well, you can think of it as analogous
to sending a letter in an envelope,
for example.
We have an envelope here that we want
to send from one computer to another.
Or you can think more physically
as sending from like, one address
to another.
What information needs to
go on that envelope for us
to be able to know who
we're sending information to
and to make sure that envelope
can get to the right place?
Well, on a real envelope, you need the
name and the address of the recipient.
And you'll also put on the envelope
the sender and the sender's address.
So when you're
sending physical mail,
you include the address of the person
who you want to receive the message,
as well as the address of the
person sending the message.
On the internet it's very similar,
except instead of physical addresses,
computers and other
devices on the internet
have IP addresses--
internet protocol addresses.
And these often take the form of
some number, dot some other number,
dot some other number, dot some
other number-- four numbers separated
by dots.
And so you might imagine
that on this digital envelope
that we're sending from one
computer on the internet
to another computer on the internet,
we might have on the face of it
1.2.3.4, the IP address
of the computer that we
want to be receiving this message.
And then we include our own IP
address, the IP address of the sender.
In this case, 5.6.7.8.
Of course, this envelope contains
more information than just this.
But at minimum, it
definitely needs some address
that we're sending information
to and some address
that we're sending information from.
These four numbers that make
up the IP address, #.#.#.#,
all range from 0 to 255, so
from 0.0.0.0 to 255.255.255.255.
And if that number 255
sounds familiar, that's because in binary
the number 255, as you might
recall, is really just eight ones
one after another, which means each of
these numbers inside of an IP address
is represented by eight bits of
information from all eight zeros
to all eight ones.
There are four numbers.
So four numbers with
eight bits each means
that in total, we have 32 bits with
which to work with IP addresses.
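To make the arithmetic concrete, here is a minimal Python sketch of how
four numbers of eight bits each fit into one 32-bit value. The function
names ipv4_to_int and to_bits are made up for illustration; they are not
part of any standard tool.

```python
def ipv4_to_int(address: str) -> int:
    """Pack a dotted IPv4 address like 1.2.3.4 into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        # Shift the bits collected so far left by 8, then add the next 8 bits.
        value = (value << 8) | octet
    return value


def to_bits(address: str) -> str:
    """Show each of the four numbers as its eight binary digits."""
    return ".".join(f"{int(part):08b}" for part in address.split("."))


print(bin(255))                        # eight ones in a row
print(to_bits("1.2.3.4"))              # four groups of eight bits
print(ipv4_to_int("255.255.255.255"))  # the largest 32-bit value
```

Since 255.255.255.255 is all 32 bits set to one, the last line prints
2 to the 32nd power minus 1.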
And if you recall, 32 bits means we
can count as high as about 4 billion,
which means there can only ever
be 4 billion addresses that
are available on the internet.
4 billion devices that are
connected to the internet that
have addresses to which we
can send information or that
have addresses that can be the
senders of information themselves.
And while 4 billion feels like it's
a pretty big number-- the number
that you can represent using 32 bits
worth of information-- in practice now
with the internet so ubiquitous and
with billions of people, many of whom
have computers but not just computers,
but also phones and other devices
that are all connected
to the internet, we're
fast running up against
this 32-bit limit.
And it's for that reason that,
while this notation of #.#.#.#,
each of which is 8
bits, is known as IPv4,
version 4 of the Internet Protocol
using 32-bit addresses,
we now have a newer protocol called
IPv6 that uses 128-bit addresses instead
of 32-bit addresses, allowing for
far, far more addresses that can be
available on the internet.
IPv4 is still pretty common.
But many devices now are
transitioning to IPv6,
as the addresses that they use in
order to communicate with one another
over the internet.
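As a rough sketch of the difference in scale, Python's standard ipaddress
module can count how many addresses each version allows. The sample IPv6
address 2001:db8::1 is only a placeholder from the range reserved for
documentation.

```python
import ipaddress

# The network that covers every possible address in each version.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total)  # 2**32, a little over 4 billion
print(ipv6_total)  # 2**128, vastly larger

# The same module can tell which version a given address belongs to.
print(ipaddress.ip_address("1.2.3.4").version)
print(ipaddress.ip_address("2001:db8::1").version)
```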
But what other information
do we need in addition
to just the address of the person
that we're sending information to?
You might imagine that if a
computer has some IP address that
allows other people to send
them messages over the internet,
that computer might be receiving a whole
bunch of different types of messages.
That computer might be
receiving emails and packets
that are traveling over the internet.
That computer might be
receiving web pages.
It might be receiving file
transfers from other devices.
And the computer needs
some way of knowing
how to distinguish between all
these different types of packets
of information.
Is this packet of information an
email, or is this packet of information
a web page?
This is important relevant information
that this device needs to know about.
So in order to solve
that problem, we assign
each of the different services
of information-- things
like email or web pages or
file transfers-- a number.
And that number is called a port number.
And so some common port numbers are 21
for FTP, a file transfer protocol that
allows you to transfer
files over the internet,
25 for SMTP commonly used for
email, and 80 for HTTP, which
you might be familiar with,
which is for sending messages
over the internet in the form
of web pages, for example.
And there are many other
ports that are used as well.
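One way to picture this is as a small lookup table. The sketch below only
contains the three ports mentioned here; the names WELL_KNOWN_PORTS and
describe_port are invented for this example.

```python
# Only the port numbers mentioned above; many others exist.
WELL_KNOWN_PORTS = {"FTP": 21, "SMTP": 25, "HTTP": 80}

# Reverse the table so a port number can be turned back into a service name.
SERVICE_BY_PORT = {port: name for name, port in WELL_KNOWN_PORTS.items()}


def describe_port(port: int) -> str:
    """Return the service name for a port, or 'unknown' if it is not listed."""
    return SERVICE_BY_PORT.get(port, "unknown")


for port in (21, 25, 80, 12345):
    print(port, describe_port(port))
```

Python's standard library also offers socket.getservbyname, which consults
the operating system's own table of services, so its answers depend on the
machine it runs on.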
So what really goes
on the envelope, then,
is not just an IP address
of the destination
of who you want to be sending
a message over the internet
to but also a port number.
So it might look something like this--
1.2.3.4 is the IP address of who it
is that you want to send a message to,
colon, and then the port
number 80, in this case,
indicating HTTP, which
means we're sending a web
page from one computer on the internet
to another computer on the internet.
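Here is a minimal sketch of pulling those two pieces back apart. The
function name parse_endpoint is hypothetical, and the sketch assumes an
IPv4 address; the 0 to 65535 range reflects the fact that port numbers are
16 bits long.

```python
import ipaddress
from typing import Tuple


def parse_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split text like 1.2.3.4:80 into an IPv4 address and a port number."""
    host, separator, port_text = endpoint.rpartition(":")
    if not separator:
        raise ValueError("expected ADDRESS:PORT")
    # Raises a ValueError subclass if host is not a valid IPv4 address.
    ipaddress.IPv4Address(host)
    port = int(port_text)
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


address, port = parse_endpoint("1.2.3.4:80")
print(address, port)
```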
But these IP addresses are not
what you and I would normally
interact with when we're typing
in URLs into our web browser
in order to try to access a web page.
Indeed, we're not really
typing IP addresses.
You more commonly are
probably typing a URL--
something like this--
http://www.example.com
that tries to specify which website
it is that you want to visit.
But if we've just established that
all these devices on the internet
are identified by their IP address and
not by the URL that looks like this,
how is it that our web browser knows
that when we go to www.example.com,
what IP address should the
computer be trying to connect to?
In order to solve that, we use what's
called DNS or the domain name system.
And what DNS really is
is just a mapping between URLs--
things like google.com or
harvard.edu and yale.edu--
and their corresponding IP addresses.
It would be pretty annoying if every
time you wanted to visit a website,
you needed to know the IP address of
the server on which that website was
being served from in order to
type in those exact numbers.
It's much easier to remember something
like harvard.edu or google.com.
And so DNS is a bunch of servers--
these DNS servers that
exist on the internet that
know for any particular URL what
IP address it corresponds to.
And so when you type something like
google.com into your web browser,
your web browser can
check with the DNS server
and say, what is the IP
address of google.com?
And once it knows the IP
address, then your computer
is able to communicate and
say, let me go to google.com
and request google.com's
web page from there.
So that's DNS,
this way of taking these
URLs and translating them
into the corresponding
IP addresses so that we
can take a URL, like
http://www.example.com
and figure out where on the
internet is the server that
is going to serve us that web page.
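The lookup can be sketched in Python roughly as follows. What actually
gets looked up is the host name inside the URL. TOY_DNS_TABLE and
toy_lookup are made-up stand-ins for a DNS server, and the 192.0.2.x
addresses are placeholders from a range reserved for documentation, not
the real addresses of these sites.

```python
import socket
from typing import Callable
from urllib.parse import urlsplit

# Hypothetical stand-in for what a DNS server knows (placeholder addresses).
TOY_DNS_TABLE = {
    "www.example.com": "192.0.2.10",
    "example.org": "192.0.2.20",
}


def toy_lookup(hostname: str) -> str:
    """Answer from the toy table instead of asking a real DNS server."""
    try:
        return TOY_DNS_TABLE[hostname.lower()]
    except KeyError:
        raise LookupError(f"no address known for {hostname!r}") from None


def host_from_url(url: str) -> str:
    """Pull the host name out of a URL such as http://www.example.com."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host name in {url!r}")
    return host


def resolve(hostname: str,
            lookup: Callable[[str], str] = socket.gethostbyname) -> str:
    """Translate a host name into an IP address using the given lookup."""
    return lookup(hostname)


host = host_from_url("http://www.example.com")
print(host, "->", resolve(host, lookup=toy_lookup))
```

Calling resolve(host) without the lookup argument would use
socket.gethostbyname, which asks the system's real resolver and so needs a
network connection.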
But let's now take a closer look
at this first part of the URL, http.
And HTTP, as it turns out,
is yet another protocol
that exists on the internet.
In this case standing for
hypertext transfer protocol.
Hypertext here refers to
HTML, a markup language which
we're going to see shortly.
But what HTTP is all
about really is what's
inside of each of those envelopes that
when you send an envelope from one
device on the internet
to another device,
sure, on the outside it's labeled
by the IP address of the person
you're trying to communicate with and
your own IP address of the sender.
But what's inside of the envelope?
What is the content of my
request to a web server when
I'm trying to get a web page?
And what is the content
of that response that
comes back when I am trying to get
information back from a particular web
server?
Well, when I make a request, it might
look a little something like this.
So inside of that envelope,
we might see content
like this where the first
word you see here is GET.
This is what we call a request method.
And in this case it just means
I'm trying to get a web page.
Next up is this slash.
This is the second part, specifying which
particular resource on the website
that I'm connecting to
I want to get back.
And so many websites, like
google.com, for example,
have a /search or a /settings or any
number of other pages that I might want
to try to access.
Some may also have images
that I might want to get.
And so this slash in
this case is specifying
that I just want the root
of the website-- whatever
I would get to if I went to
in this case, www.example.com.
But as you might imagine, I might go to
www.example.com/ something else to get
a particular page.
And that would go in the
second parameter here.
Following that is HTTP/1.1.
This is specifying which
version of the HTTP protocol
that I'm using in order to
communicate with this host.
In this case, I'm using version
1.1, which is quite common.
Nowadays version two
is also pretty common.
But you'll still see HTTP
version 1.1 around quite a bit.
And beneath that I'm connecting
to a particular host--
something like www.example.com,
for example, as the host
that I want to communicate with.
It's possible that the web server
that I'm communicating with
is actually holding
multiple hosts altogether
and is balancing between them.
And so in my request, I need to specify
what host do I want to connect to?
In this case, www.example.com.
And there's more information that
comes in the request other than that.
But the key idea here
is that I am trying
to get a page from a particular host.
I specify what page it is that I want
to access in addition to specifying
which version of the HTTP protocol
I'm using in order to try and make
this request to the website.
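Put together, the pieces described above (the method, the path, the
version, and the host) can be assembled with a few lines of Python. This
is only a sketch: build_get_request is a made-up helper, and a real
browser sends many more headers than these two lines.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Assemble the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request method, resource, protocol version
        f"Host: {host}",         # which host on the server we want
        "",                      # a blank line marks the end of the headers
        "",
    ]
    # HTTP separates lines with a carriage return plus a line feed.
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("www.example.com").decode("ascii"))
print(build_get_request("www.example.com", "/search").decode("ascii"))
```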
What then comes back to me when
www.example.com receives my request
and wants to send something
back to me, the person who
requested this page in
the first place? Well,
the response might look
something like this.
Again, starting with HTTP/1.1, the
version of the HTTP protocol that is
being used in order to
communicate information.
Next up is this number 200.
This is what we would
call a status code.
Every time HTTP gives
you a response, it's
going to come along with
some code that specifies
how the response was resolved.
And 200 just means, as the
word immediately following it
would describe, everything was OK.
We were able to successfully give
you back some sort of response.
What type of response came back?
Well, that comes on the
second line of the response.
Content-Type: text/html means
the response that came back to me
is some HTML, some
markup that's ultimately
going to represent a web page.
But more on HTML in just a moment.
The response has more information
again than just that, likely
the actual content of the
HTML that's coming back to me.
But in short, this
response is specifying
the version of the HTTP protocol,
the status code that came back,
200, meaning OK, and then the content
type-- what type of information
is coming back to me.
In this case, it was HTML.
But it very well might have
been a text file or an image
or any other information that might
have been transferred over the internet.
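Going the other way, a response can be picked apart into the same three
pieces: version, status code, and content type. The sketch below uses a
made-up helper, parse_response_head, and a hand-written sample response
rather than one captured from a real server.

```python
from typing import Dict, Tuple


def parse_response_head(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a raw HTTP response into version, status, reason, headers, body."""
    # A blank line separates the status line and headers from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are not case-sensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers, body


sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"\r\n"
          b"<html>...</html>")
version, status, reason, headers, body = parse_response_head(sample)
print(version, status, reason, headers["content-type"])
```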
This status code 200, meaning OK, maybe
isn't something you've seen before.
But certainly if you've
used the internet,
there are other status codes that
you're probably more familiar with.
For example, 404 might be pretty
common which means not found.
If you try to request a
page that just isn't there,
HTTP says that the server is going
to respond with a status code of 404,
meaning that it wasn't found.
And there are other
status codes as well.
Status codes like 301, which
means moved permanently.
In other words, this
page has moved somewhere.
And so you'll often be
redirected from one page
to another page when you
receive a status code of 301.
403 is a status code meaning forbidden.
For example, if you try to access a page
that you don't have permission to see
or that you need to log
in to see first, you'll
very often see a status
code of 403, meaning
you don't have permission to be able
to see this particular resource.
And status code 500 generally
means an internal server error.
Whoever programmed the server
that is accepting these requests
and responding with information
likely has a bug in their code,
for example, that might result
in an error that occurs.
And so that's when a status code
of 500 gets returned as well.
And there are many other status codes.
But these are just some
of the more popular ones.
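Python's standard library happens to include a table of these codes, so a
short sketch can print the names of the ones mentioned here. The helper
name describe_status is invented for the example.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the standard phrase for a status code, or 'Unknown'."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the statuses the library knows about.
        return "Unknown"


for code in (200, 301, 403, 404, 500):
    print(code, describe_status(code))
```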
And status code 200 is what comes back
all the time when you request a page,
and you're able to view it successfully.
The web browser usually doesn't
show you the status code number 200
just because everything went
well, so there wasn't a reason
to show you anything in addition.
But as we'll see in just a
moment, we can take a look
at what's actually
happening in the network--
what messages are being sent--
what responses are coming back.
And in those responses, we'll
often see a status code of 200
which will indicate to us
that everything was indeed OK.
So let's take a look at a
real example and actually
try to open up a web
browser and see what
happens when we make a web
request when we type in a URL
and try and visit a web page.
I'll go ahead and open up Google
Chrome, although this feature is
available in other web browsers as well.
I'll go up into the View menu.
And I'll go to Developer
and then Developer Tools.
And then over here on
the right-hand side,
I'm going to go to the Network tab,
which is going to allow me to monitor
all of the network traffic--
all of the requests and responses
that are coming from my web browser
and being sent back to my web browser.
So we go to the URL bar.
And let me just visit a website,
like google.com, for example.
And we'll take a look at what
happens after this web page loads.
All right, so once
google.com loads, you'll
notice here in the Network tab
a lot of things have shown up.
But I'm going to ignore most of that
and just scroll up to the very top
where I'll notice that I had an
initial request for google.com which
is what I typed in.
And I'll click on that just to
take a look at some more details.
Here I see the headers.
And if I scroll down, I can take a
look at the Request Headers, which
was the information that
I sent to Google's servers
when I tried to request this web page.
And I can also see above
that the Response Headers,
the information that
came back to me when
Google responded with this web page.
I'll go ahead and click the
View Source button here just
to see what was inside of this request.
And you'll notice right
here we see HTTP/1.1,
meaning version 1.1
of the HTTP protocol.
And the response code that
seems to have come back to me
is 301, moved permanently-- one
we've seen before that means
I've been redirected to somewhere else.
Well, I typed in google.com.
So where have I been redirected to?
Well, if we look immediately
below it on the second line,
we see location: and then
http://www.google.com.
So it seems that whenever
I try to visit google.com,
Google is redirecting me
to www.google.com instead.
And this is a fairly common convention
where www stands for World Wide Web.
And many web applications
are hosting their website
on www.something, for example, although
strictly speaking, that isn't required.
So I've been redirected
to www.google.com.
And we can see that here
I have www.google.com.
This was the next request
that my web browser
made because it saw that I'd
been redirected somewhere else.
Here, 307, Internal Redirect just
means I've been redirected again.
Where have I been redirected to?
Well, it looks like here, I've been
redirected to https://www.google.com.
So https, the s standing
for secure, just
means that Google wants me to
connect to their website securely.
And indeed, more and
more websites nowadays
are using https, a version
of the protocol that
allows for information to be encrypted
when it's transferred between one
computer and another over the
internet just to make sure
that that connection is more secure.
And so now, I'll go to the next
request that my computer made.
Here, the URL that I was requesting
is now https://www.google.com.
And now, finally, the status code
is 200, meaning everything was OK.
So what happened here?
I typed google.com into my web browser.
And I got a 301 response code,
meaning that I'd been redirected
to somewhere else, www.google.com.
When I went there, I made
a request to that URL
and got another response,
which was that I'd been
redirected to https://www.google.com.
And when I made a request to
that URL, now and only now
do I see that I get a status code of
200 meaning that everything was OK.
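That chain of requests can be sketched as a small loop. Nothing here
touches the network: DEMO_RESPONSES is a hand-written table that mirrors
the three responses seen in this demo, http://google.com is assumed as the
starting URL for what was typed, and follow_redirects is a made-up helper.

```python
from typing import Callable, List, Optional, Tuple

# (status code, Location header or None), copied by hand from the demo.
DEMO_RESPONSES = {
    "http://google.com": (301, "http://www.google.com"),
    "http://www.google.com": (307, "https://www.google.com"),
    "https://www.google.com": (200, None),
}

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url: str,
                     fetch: Callable[[str], Tuple[int, Optional[str]]],
                     max_hops: int = 10) -> List[Tuple[str, int]]:
    """Request a URL, following redirects, and record each step taken."""
    trail = []
    for _ in range(max_hops):
        status, location = fetch(url)
        trail.append((url, status))
        if status in REDIRECT_CODES and location:
            url = location  # the browser makes a new request to this URL
        else:
            return trail
    raise RuntimeError("too many redirects")


for url, status in follow_redirects("http://google.com", DEMO_RESPONSES.__getitem__):
    print(status, url)
```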
And a lot of other
resources have come back,
as well. When you
load Google's web page,
it's likely loading other scripts
or other images or other fonts,
for example, that all need to be loaded
in order to display this web page.
But the key here is that anytime you
make a request over the internet,
it's using this HTTP
protocol that allows me
to make a request to a particular URL.
And when the response
comes back, I can actually
see what's going on using
the Network tab of Google
Chrome or similar features
for other web browsers
that you might be using in order to
get an understanding for what exactly
is coming back from this web server.
And as we'll see later
on in this track, we'll
start to build web
applications of our own
where we're writing these
web servers and deciding
what status code should be returned
and what content should
be returned back to the user
when they make a request to
slash or to slash something
else or to any other page that
might exist on our web application.
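As a preview, here is a minimal sketch of such a server using Python's
built-in http.server module. It is not necessarily the approach taken
later in this track; the PAGES table, the route function, and the port
number 8000 are all assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Hypothetical pages for this sketch: path -> HTML content.
PAGES = {
    "/": "<h1>Home</h1>",
    "/hello": "<h1>Hello!</h1>",
}


def route(path: str) -> Tuple[int, str]:
    """Decide which status code and content to return for a path."""
    if path in PAGES:
        return 200, PAGES[path]
    return 404, "<h1>Not Found</h1>"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, html = route(self.path)
        body = html.encode("utf-8")
        self.send_response(status)  # writes the status line, e.g. 200 OK
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the server on this machine only; runs until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling serve() and then visiting http://127.0.0.1:8000/ in a browser
should show the home page, and any path not in PAGES should come back with
a 404.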
So ultimately, there
are a lot of protocols
at play here in determining how it
is that devices on the internet,
be that your computer
or phones or tablets,
are able to communicate with
other devices on the internet.
We saw TCP and IP, the
transmission control protocol
and the internet protocol
that are determining things
like how it is that information
is sent from one device
to another-- what goes on the
outside of that envelope-- things
like the IP address, the address of
who it is that you're trying to send
information to in addition to the
port number that determines what kind
of service you're trying to communicate with,
whether that be email
or whether that be a web
page or whether that's a file
that you're trying to send from
one person to another person.
And then we saw HTTP, the
hypertext transfer protocol,
which is the protocol
that determines what's
actually inside of that envelope.
What is the information that
I am sending to a web server
when I'm trying to request a web page?
And then what comes back
to me in that response--
the version of the
protocol, the status code,
and then the actual content
of what comes back to me.
So next, we'll take a look
at that actual content
and take a look at how
to build web pages that
can be sent using the HTTP
protocol and ultimately
viewed inside of your web browser.
