[MUSIC PLAYING]
DAVID MALAN: So odds are you're
on the internet these days,
but what does that actually mean?
And indeed, this internet that we use
very often these days for messaging,
for email, for browsing the
web and other services still,
there's a whole infrastructure that
underlies it that is increasingly
powering new ideas, new start
ups, new companies, new businesses
as well as new forms of
communication among humans.
And yet, like most every
topic we've explored,
you'll realize that
while it's very complex,
perhaps, up here, or
certainly seems complex
up here, if we begin with some of
the fundamentals and then layer
and layer and layer on top of
those, do we pretty quickly get
back to today's technology but with
a much better understanding of what's
going on from the ground up.
So here is a bit of alphabet soup.
Odds are you might have seen
one or more of these acronyms
to date, IP, DHCP, DNS, TCP,
UDP, ICMP, and so many more.
These are all examples of
something called protocols,
where protocols are
kind of like languages
that computers speak with one another.
They're not programming
languages so they're not
used by humans to make computers do
things or follow instructions per se.
A protocol is really
a set of conventions
that two computers or
two computer programs
might use when intercommunicating.
And so what's an example of
a protocol in the real world?
Well, we humans have some
silly protocols, one of which
here is, culturally, when you
meet someone to extend your hand
and then he or she
presumably extends their hand
and you do this for
who knows what reason.
And now you've sort of completed
that social transaction.
But it's a protocol in the sense
that when I extend my hand,
most any polite other person
knows that they're probably
supposed to extend their hand
as well, embrace for a moment,
and then complete.
And the protocol says, too, you
probably don't do this for terribly long.
And so there's these rules
of thumb or actual rules
that you follow when
implementing protocols.
And so computers, great as
they are at following rules,
very often use protocols
when they intercommunicate,
in order to get data from
one place to another.
So let's tell exactly that story.
If you're on the internet,
right now, on the internet,
what does that actually
mean and how can it
help us solve problems,
ultimately, having access
to this inter-networked infrastructure?
Well, let's consider what happens
when I first visit my favorite web
page, for instance.
If I go ahead and visit something like
Facebook.com, I go ahead and log in
and I'm immediately
presented with my news feed.
Or maybe your favorite website
is Gmail or your favorite website
is Bing or maybe your favorite
website is any number of other places
you might go on the web, all of which
take in as input a request from you
and produce, ultimately, output,
the screen that you ultimately see.
But how does that data get
from one location to another?
Let's begin to draw a picture, perhaps.
And this picture might be
representative of your own home network
or maybe your campus network
or maybe your office network.
But generally speaking,
you are on the internet
maybe with your phone or your
laptop or your desktop device,
and we'll just depict that as
this sort of abstract laptop here.
So that laptop somehow wants to
communicate with a web server
elsewhere, Facebook,
Google, Bing, whatever.
And we're just going to represent
that as way over here in the picture
in a really big corporate
office building, perhaps.
And inside of that
building are the servers
that compose that particular web site.
But how do I get data from that server,
which, if it's Google or somewhere else
might be all the way in California
or halfway across the world
and back to my laptop?
Well, somehow I have to be
able to send messages to it
and receive messages from it.
And of course in between me
and this resulting website
is what we'll generally
call the internet.
It's kind of conveniently
drawn as a cloud
here, which is another
semi-technical term that's
come into vogue in recent years.
And the cloud really just refers
to internet services these days.
It's not a technical term unto itself.
It's just a sexier term than saying,
my business is on the internet.
Oversimplification, and we'll
come back to that before long.
But you can assume here
that the internet is somehow
this delivery mechanism.
It somehow gets data from
point A to point B and back.
But how does that work?
If my data's coming in as
input and it's reaching,
eventually, its destination
and then a response
is coming back in this direction, what's
actually going on underneath the hood
there, especially since, in the
story at hand, all I've typed
is something like Facebook.com,
Gmail.com, or the like?
Well, it turns out that your computer
these days, when you first turn it on
and you connect to the Wi-Fi in a room
or you connect with an ethernet cable
to the wired network, your
computer receives some information
automatically.
Your computer speaks a protocol
called DHCP, typically, Dynamic Host
Configuration Protocol.
But in most of these cases, the
acronym isn't really what's important,
certainly, it's what the
protocol itself does.
And in this case, this Dynamic
Host Configuration Protocol
dynamically configures hosts
via a protocol, if you will.
So what does this mean?
Essentially DHCP says this,
when you turn on your computer
or you take out your
phone for the first time
and you're connected on Wi-Fi or to a
wired network, it says, hello, world.
I am alive.
I would like to be given an
address so that I can communicate
with other computers on the internet.
It's not quite that verbose,
perhaps, but it is a question.
Hey, computers around me,
please give me an address.
And what it gives you is what's called
an IP address, Internet Protocol.
So just as in the real world where
physical buildings have historically
been uniquely addressed
with postal addresses
like Harvard's computer science
building is at 33 Oxford Street
Cambridge, Massachusetts, USA.
02138 is the more
precise zip code as well.
That uniquely identifies
that building in the world.
So does my computer need
an address, and it's not
going to be some free form
address like that in words.
It's actually going to
be a numeric address.
Specifically, I'm going to get an IP
address of the form number dot number
dot number dot number, so four
numbers separated by dots.
Each of those four numbers
happens to be a byte long
or eight bits, so each of these
numbers, therefore is between 0 and 255,
and so this means, long story short,
that the total address is 32 bits--
8 plus 8 plus 8 plus 8-- and that means
there are roughly four billion possible addresses
in the world.
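To make that arithmetic concrete, here is a small Python sketch using the standard ipaddress module. The address 1.2.3.4 is an arbitrary example, not any particular machine's.

```python
import ipaddress

# Four numbers, 8 bits each, gives a 32-bit address.
BITS_PER_NUMBER = 8
total_bits = 4 * BITS_PER_NUMBER          # 8 + 8 + 8 + 8
possible_addresses = 2 ** total_bits      # roughly four billion

# A dotted address is a friendlier way of writing one 32-bit number.
addr = ipaddress.IPv4Address("1.2.3.4")
as_integer = int(addr)                    # 1*2**24 + 2*2**16 + 3*2**8 + 4
octets = [int(part) for part in str(addr).split(".")]

print(total_bits, possible_addresses, as_integer, octets)
```

Each of the four numbers fits in one byte, which is why none of them can exceed 255.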
And that's great because people have got
a lot of computers and a lot of laptops
and a lot of desktops
and servers these days.
But it turns out we're
actually running out
because we have so many such devices.
So there's a newer version
of IP that's increasingly
being used called IP version 6.
We're talking here about IP version
4, since it's so omnipresent.
And IP version 6, just so you know,
uses 128 bits for its addresses,
way more than 32, so we'll
be good to go for some time.
But DHCP gives me this address, an
IP address of the form something
dot something dot
something dot something.
And the purpose of this address is
to help my data get from point A
to point B. And indeed,
anytime my computer sends
a request on the
internet like, Facebook,
please show me my news feed, or
Gmail, please show me my inbox,
my computer has to use that IP address.
So much like if sending a
letter in the real world,
you might have an
otherwise blank envelope
and you might want to send a message
to someone else in the world,
you might write their physical address.
But in the computer world, we
might write something like 1.2.3.4
in the to field, assuming that
this is the IP address to which we
want to send this data.
Meanwhile, my from
address might be 5.6.7.8,
so I'll write it in the
top left hand corner
by convention, whereby that
indicates to the whole internet this
is where this request came from.
Now, I know my origin address, the
source address here at top left
because DHCP told me.
How do I know one, two, three, four?
How do I know the IP address of
Facebook.com or Gmail.com, right?
We don't live in the
world of 800 numbers
anymore, where you dial 1-800 something,
something, something, something,
something and you have to advertise
your phone number, per se.
We don't necessarily live only
in the world of 1-800-COLLECT
any more where we had these mnemonics
where you had letters mapping
to numbers just to help remember it.
We went full in on
this idea of mnemonics
such that now we have Facebook.com
and Gmail.com and no numbers
whatsoever for us humans to remember.
So thankfully, it turns out there's
another system in this world,
another acronym, if you will, a new one
now, called DNS, Domain Name System.
So there are also in the world,
not just DHCP servers that
hand people IP addresses
from their local network,
there's also DNS servers
whose purpose in life
is to convert domain names
to IP addresses and vice
versa and a few other features as well.
Now, what does that mean?
That means that when
my Mac or my PC sees
little old me, the human,
typing Facebook.com
or Gmail.com, my laptop contacts
a nearby DNS server and says,
hey, my human has asked
me for Facebook.com.
What is its IP address?
And DNS server's purpose in life
is to answer that question and say,
oh, Facebook.com, it's 1.2.3.4.
Use that address instead.
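As a rough sketch of that question and answer in Python: the table below is a made-up stand-in for a DNS server's records, and 1.2.3.4 is the placeholder address from this story, not Facebook's real one. The function real_lookup uses the standard socket.gethostbyname call, which asks whatever resolver the operating system is configured with and so needs network access.

```python
import socket

# Hypothetical table standing in for a DNS server's records.
# 1.2.3.4 is the placeholder from the story, not a real answer.
TOY_RECORDS = {"facebook.com": "1.2.3.4"}

def toy_lookup(name, records=TOY_RECORDS):
    """Answer 'what is this name's IP address?' from the toy table."""
    return records.get(name.lower())

def real_lookup(name):
    """Ask the operating system's configured resolver (needs network access)."""
    return socket.gethostbyname(name)

print(toy_lookup("Facebook.com"))
```

Domain names are case-insensitive, which is why the toy lookup lowercases the name before searching.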
Now, thankfully, my
computer can now write
that number on its virtual
envelope, so to speak, and then
pass that envelope out to the internet.
And because of these numeric addresses,
it will be properly, hopefully,
routed across the internet
to its destination.
Because it turns out inside
of the internet here,
interconnecting everything
in between point
A and B are things called
routers or gateways.
And I could draw this picture
in any number of ways.
But the point is that it's
just so darn interconnected.
And indeed, there might be
even more pathways still
or maybe even fewer pathways.
Indeed, on the internet,
there's often multiple ways
for data to get from one point to
another, some shorter, some longer.
But there's this
resilience, this redundancy,
and this was a feature back in
the day, especially in so far
as the internet had
militaristic origins.
It was meant to be redundant so as
to withstand failures of one or more
of these nodes, these
dots in the picture.
Now, each of these dots
is just a server, really,
a special server called a router
or gateway, whose purpose in life
is to do exactly that, to route data.
Upon receiving a virtual
envelope like that one,
it looks at the to address, realizes,
oh, this is destined for 1.2.3.4.
I know that that address
is over this way.
Meanwhile, if it gets another
envelope from someone else,
it might say, oh, this
is some other address.
It's going to go this way.
And so routers have
multiple cables or they
have multiple virtual
network connections elsewhere
or wireless connections, any
number of possible connections
might they have to other routers.
And so it can route it to
its next hop, so to speak.
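Here is one way to sketch a router's decision in Python. The forwarding table, its network ranges, and the link names are all invented for illustration, and the rule of picking the most specific matching network (longest-prefix matching) is a detail added here, not something described in this lecture.

```python
import ipaddress

# Hypothetical forwarding table: destination network -> outgoing link.
# Network ranges and link names are invented for illustration.
FORWARDING_TABLE = {
    ipaddress.ip_network("1.2.0.0/16"): "link-west",
    ipaddress.ip_network("5.6.7.0/24"): "link-east",
    ipaddress.ip_network("0.0.0.0/0"): "link-default",  # everything else
}

def next_hop(destination):
    """Pick the most specific network that contains the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [net for net in FORWARDING_TABLE if dest in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]

print(next_hop("1.2.3.4"), next_hop("9.9.9.9"))
```

The router never needs to know the whole path. It only decides which neighbor gets the envelope next.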
And generally on the
internet, within 30 hops,
within 30 transmissions
from router, router, router
will your data get from
one point to another.
And it might not follow
the same path each time
but it will traverse
this so-called internet.
And so that's kind of
what the internet is.
It's this collection of routers
and it's this collection
of networks, a network of
networks that is incredibly
interconnected in different ways.
So DHCP gives me an IP address.
So I have a unique IP address.
DHCP, it turns out, also
tells me what the IP address
is of my local DNS server so I know whom
to ask to convert domain names to IP
addresses.
But once I have that, I
can now use a protocol
called TCP to send my data reliably,
typically, from one point to another.
So whereas IP is responsible
for a few things,
one of its most important
functions is this notion
of addressing and standardizing
how things are addressed.
But TCP, one of its
most salient features
is to guarantee, with high
probability, delivery.
And what I mean by that
is that bad stuff can
happen in the middle of the internet.
These routers can get really busy.
They can get really
congested and overloaded.
And so routers might--
well, virtually drop packets.
They might receive so many
packets at once they just
can't, like a human, deal with
it all at one time because they
have a finite amount of memory or RAM
or disk space and so they drop them,
so to speak.
They just delete them and they
expect the senders to resend them.
TCP is a protocol, another
agreement between computers,
that if the receiving computer realizes,
hmm, I got some of your packets
but not all of them, TCP mandates,
much like our human handshake,
that something next should happen.
TCP says, my laptop should
retransmit that virtual envelope.
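The resend idea can be sketched in a few lines of Python. This is a deliberate simplification: real TCP relies on acknowledgments, timers, and sequence numbers, and the lossy channel here is a hypothetical stand-in for a congested router.

```python
def send_reliably(packet, channel, max_attempts=5):
    """Resend until the channel reports delivery; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        if channel(packet):
            return attempt
    raise RuntimeError("gave up after %d attempts" % max_attempts)

def make_lossy_channel(drops):
    """Hypothetical channel that drops the first `drops` packets it sees."""
    state = {"remaining": drops}

    def channel(packet):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return False   # dropped by an overloaded router
        return True        # delivered and acknowledged

    return channel

print(send_reliably("envelope", make_lossy_channel(2)))
```

The attempt limit is why the guarantee is only "with high probability": a sender cannot keep trying forever.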
But TCP allows us to do
something more than guarantee
with high probability delivery of data.
It also allows us to multiplex
among services, or put more simply,
it allows a server to receive
different types of data
for different types of
services, for instance,
web services on the server, email
services, chat services and the like.
And so it turns out that on this virtual
envelope that gets sent from a computer
to a server, it's
actually not sufficient
for there to be the return address
and the IP address of the destination.
I also need to specify
what type of information
is inside this envelope, or
equivalently, what kind of service
I'm trying to contact.
And I could do this by specifying in
words what's inside this envelope.
Maybe it's something
like HTTP, the prefix
that you're familiar with from the web.
Maybe it's an email.
Maybe it's a chat message or the like.
But if it is, in fact,
something like HTTP,
turns out the convention is not
to use words but to use numbers.
And so in fact, I need to put
one other piece of information
on this envelope, which is a
so-called port number, a TCP port
number, which is numerically printed
after a colon on a virtual envelope
like this.
And in this case I wrote 80
because 80 happens to be,
by human convention, the number we
humans agreed on some years ago,
identifies web services on servers.
But this means that if the
server I'm sending this to,
1.2.3.4, actually has other
services on it like a chat server
and an email server and the like,
this won't get confused with an email
that I or someone else am sending
to the server or a chat message.
The server will know
upon receipt of this,
oh, this is a request for a web page.
Let me send this virtual
envelope to the web server.
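A minimal Python sketch of that hand-off by port number follows. The two handler functions are hypothetical stand-ins for separate server programs. Port 80 for the web is the number mentioned here; port 25, the well-known port for SMTP email, is added for contrast.

```python
# Hypothetical handlers; a real server would run separate programs.
def web_server(payload):
    return "web server got: " + payload

def mail_server(payload):
    return "mail server got: " + payload

# Port 80 for the web is from the lecture; 25 is the well-known SMTP port.
SERVICES = {80: web_server, 25: mail_server}

def deliver(destination, payload):
    """Split '1.2.3.4:80' into address and port, then hand off by port."""
    address, port_text = destination.rsplit(":", 1)
    handler = SERVICES.get(int(port_text))
    if handler is None:
        return "no service listening on port " + port_text
    return handler(payload)

print(deliver("1.2.3.4:80", "GET /"))
```

The same machine at the same IP address can run many services, and the number after the colon is all it takes to tell them apart.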
But TCP isn't the only such protocol.
There is something called UDP, which
is common in some circles as well.
UDP works a little
differently, in so far
as its feature is to
not guarantee delivery.
If some data gets lost, packets
get dropped, so to speak,
for whatever reasons, malfunction,
technical difficulties,
routers are overloaded,
UDP says, our protocol
shall be not to retransmit that data.
And that's a strange thing,
because it sounds worse.
And yet, this protocol's been around
for quite some time, still used,
quite appropriate in some contexts.
But what context would you actually
want to just forge ahead, irrespective
of getting complete information?
Well, the go-to here is something like
videoconferencing or audio conferencing
or live TV on the internet, watching
a game like a football game,
for instance.
If you want to watch
it in real time, you
might prefer that the
stream, the bits that
are coming from the NFL or wherever
to your computer don't actually buffer,
don't actually stall.
You would rather miss a second
so that at least you stay
current in real time with that game,
or video conferencing even more so.
It'd kind of be annoying if you have
a bad connection or some packets
get dropped and you just have to
wait and wait for the person's voice
or image to be retransmitted.
You'd rather just say, what did you say?
Could you repeat yourself?
Say again?
You can just use human protocols
to deal with that, too.
So sometimes you want live streaming
applications for whatever purpose
and you want the data
just to keep coming.
As much of it as can
make it through is great.
But you don't necessarily
want it to be resent.
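That keep-going behavior can be simulated in Python without any networking at all. The frame names and the choice of which frame gets lost are invented. A real program would send datagrams through a socket created with socket.SOCK_DGRAM.

```python
def play_live(frames_sent, lost):
    """UDP-style playback: play whatever arrives, never wait for a resend."""
    return [frame for i, frame in enumerate(frames_sent) if i not in lost]

# Invented frames from a live game; the frame at index 2 is dropped in transit.
frames = ["kickoff", "pass", "tackle", "touchdown"]
played = play_live(frames, lost={2})
print(played)
```

The viewer misses one moment but stays current with the game, which is the trade-off UDP makes.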
So data is going from
one point to another,
but how long does all this take?
My god, this is kind of a long
story just to get data there.
Well, let's do an experiment.
Let's go ahead and pull up a program
that uses a different protocol
altogether, ICMP.
And there's other protocols, still.
This one's a little
more technical but it's
wonderfully revealing in a few ways.
I'm on my Mac here in the
so-called terminal window,
and you can pull up something similar
on Windows and other operating systems
as well.
And what I'm going to
do is literally trace
the route between my laptop
here and some foreign server,
for instance, one on the west coast
of the US, Berkeley's web server.
So let me do that, traceroute,
www.berkeley.edu, enter.
And curiously, we start to see a whole
bunch of lines of output, most of them
numerical.
And indeed, notice that each
of these is an IP address.
But what is it an IP address of?
Well, we have 18 of these between
me and Berkeley, apparently.
Turns out those represent routers
between me and Berkeley, California.
Each of them has an IP
address and each of them
has a measurement of how long it took my
data to get from my Mac to that router.
It's highly variable.
Notice, it's kind of all over the place.
In fact, this is just weird.
This took 3,000 milliseconds
or three seconds,
so I'm guessing that
that router in row eight
was congested for some reason, some
kind of network issue there temporarily,
but then my data actually went through.
And it's not cumulative.
These are individual tests from
my Mac to each of these routers
iteratively, one at a time.
And you can kind of
get an aggregate sense
of how long it takes, therefore,
for data to get from the east coast
to the west coast.
If we look at some of the later
numbers, they're kind of variable
but they seem to be
around 75 milliseconds.
So this is kind of extraordinary.
If you want to fly from Boston,
Massachusetts to San Francisco,
it's going to take you
five, six, seven hours.
You want to send an
email or send a packet,
it's going to take you 75 milliseconds.
That's astonishing, how
quickly the data can transmit.
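For a sense of scale, the comparison can be worked out in Python. Six hours is taken here as the middle of the five-to-seven-hour estimate, and 75 milliseconds is the round figure from the traceroute.

```python
flight_hours = 6                       # middle of the five-to-seven-hour estimate
flight_ms = flight_hours * 60 * 60 * 1000
packet_ms = 75                         # the round figure from the traceroute

speedup = flight_ms / packet_ms
print(f"{flight_ms:,} ms vs {packet_ms} ms: about {speedup:,.0f} times faster")
```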
Now, notice this is not
all that enlightening
knowing these IP addresses.
But eventually, some of
them have domain names,
just because the humans
controlling those routers decided,
we're going to give these routers
actual names, domain names,
as opposed to just having IP addresses.
And you can often, but not always, infer
from the domain names where they are.
So I'm going to guess
that at least row 11
here, I don't know what
XE7000.rtsw is, but losa.net, Los
Angeles in California.
I'm guessing my data kind of came
into Southern California first.
But then notice what happens next.
A couple of nameless servers,
LAX, so maybe that's the airport.
Indeed, routers, for
historical reasons, tend
to be named after a
nearby airport codes.
I'm not sure what this
next one is here but I do
recognize Oakland and UCB, UC Berkeley.
So I'm guessing one of the next routers
is actually in Oakland or near Oakland.
And so that's a pretty long cable
or interconnection essentially
between LA and Berkeley.
But the result, ultimately,
is that my data makes its way
to Berkeley, this time via this path.
If I ran it again now
or in a day or a week,
the path might be a
little different based
on congestion and interconnectivity,
but the data actually gets there.
And cutely enough, it looks like
Berkeley's web server is called Cal web
farm prod-- for production--
ist.berkeley.edu.
75 milliseconds only.
But what about this, what
if we don't stop at the edge
as we do at the edge of this
continent but keep going?
What's going to happen?
Well, let me try to trace the
route to, say, www.cnn.co.jp,
the domain name for
what I presume is going
to be the Japanese version
of CNN's web site in Japan.
Here, too, we have a bunch of nameless
servers just with IP addresses.
Gets through them pretty quickly.
We seem to have some lulls sometimes.
This program won't-- sometimes the
routers won't respond to these queries
so they remain, essentially, anonymous.
But now this is quite interesting.
Oh, my god.
We went from routers 12, 13, 14,
15 taking about 63 milliseconds,
give or take, to 193 milliseconds,
which isn't a blip because it
stays around that
value, 180 milliseconds,
160 milliseconds, 177 milliseconds.
That's a big jump of 100-some
milliseconds just between routers 15
and 16.
Why might that be?
What could be between routers 15 and 16?
Well, if you know your geography, it
might very well be the Pacific Ocean.
There's quite a bit of
distance, there's quite a bit
of cabling that actually connects the
west coast of the country to Japan
and other areas in Asia and beyond,
and that's what's pretty amazing.
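Spotting that jump can be automated. In this Python sketch, the times are the approximate readings mentioned here. Treating routers 12 through 15 as exactly 63 milliseconds each, and numbering the four later readings 16 through 19, are assumptions made for illustration.

```python
# Hop numbers 12-15 and the times are from the lecture (approximate);
# numbering the later four readings 16-19 is an assumption.
rtt_ms = {12: 63, 13: 63, 14: 63, 15: 63, 16: 193, 17: 180, 18: 160, 19: 177}

def biggest_jump(times):
    """Return (from_hop, to_hop, increase_ms) for the largest rise."""
    hops = sorted(times)
    pairs = zip(hops, hops[1:])
    a, b = max(pairs, key=lambda p: times[p[1]] - times[p[0]])
    return a, b, times[b] - times[a]

print(biggest_jump(rtt_ms))
```

Because the later readings stay high instead of falling back, the jump looks like added distance and not a momentary blip.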
Not only is there interconnectivity
on the internet these days via cabling
and via Wi-Fi signals and
via satellite signals,
via microwave signals and the like, you
have so many different ways for data
to be transmitted.
And it's absolutely astonishing
and exciting, dare I say,
just how interconnected
the world now is.
In fact, thanks to this animation
online, let's take a look
and appreciate just how extensive
this network actually is.
[MUSIC PLAYING]
All right, so let's actually solve
a problem now with this internet.
All right, the internet, as you
probably heard, is filled with cats.
And yet, these cat
images can be pretty big.
And indeed, bigger,
still, than images are
things like video files
from Netflix and the like.
And so there's huge amounts
of traffic transmitting
over those kinds of interconnections.
So how do we ensure, at
least with high probability,
that data can actually get through?
How can we ensure that there's some
form of fairness, if not net neutrality,
so that my data can
get to its destination
just as readily as your
data can get there?
Well, sometimes it's
opportune to actually take
big packets of information
and chop them up.
So indeed, what a computer will
often do, thanks to TCP/IP,
the combination of these
protocols, is we'll
take large files and large
images, in this case,
tear them up into, say, roughly--
oops-- equal sized parts like this here
and then tear it down even further,
perhaps, to get it into a
smaller byte-sized piece
and then send not only one packet
of information over the internet.
But instead, put one piece of
information in this packet here.
Put one other piece of
information in this packet here,
whose addressing, both to
and from, is identical.
And then do the same thing
for the two other pieces
so that ultimately we have
four packets, each of which
contains one portion, one
quarter, in this case,
of the resulting message, all of which
are destined for the same destination.
But the problem to be solved, now, is
what do you do with this information?
If I have four seemingly
identical envelopes
but inside of which are
disparate pieces of information
that somehow need to be reassembled--
let's put on our proverbial
engineering hats--
how do you solve this problem?
Is this sufficient information
on the envelopes so
that if I send this out on the
internet toward Berkeley or Stanford
or Facebook or wherever, how does that
recipient know what to do with it?
What would you, the
human, do if you have
not virtual but physical envelopes?
Well, here, too, and
here's an opportunity
really to bring to bear human
intuition to a problem that
seems fairly technical and well beyond
one's own technical understanding.
And yet, it really is just a technical
manifestation of a real world problem.
I need to keep these in order somehow.
So you know what?
I'm going to say something like one
of four on the first one, like this.
The next one, I'm going to say two
of four on the next one, like this.
And then I'm going to say three
of four and then on the next one
here, I'm going to put four of four.
And what's the takeaway, now?
Now, whoever is the recipient
of these several envelopes
as I send them out on the
internet-- and indeed,
they don't have to follow the same path.
One can go this way.
One can be routed that way.
Another can go to this router.
Another can go to that router.
Because they're all addressed
and because all of these routers
are somehow interconnected,
all four of those packets
will hopefully get to their destination.
But if they don't,
the recipient can look
at that additional detail I wrote on the
envelope and see, oh, I got part one.
I got part two.
I got part three.
But where is part four of four?
It didn't arrive because of congestion.
Literally got dropped on
the floor or not picked up.
So the computer, who's
supposed to be receiving
that data, thanks to TCP recall,
can say, hey, please send me again
packet four of four.
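The one-of-four labeling can be sketched in Python as follows. The function names and the sample message are hypothetical, and real TCP numbers bytes in a stream instead of writing "n of four" on each packet, so treat this as the envelope analogy in code, not the actual protocol.

```python
def split_into_packets(data, count=4):
    """Chop data into `count` roughly equal parts labeled (n, count, chunk)."""
    size = -(-len(data) // count)  # ceiling division
    chunks = [data[i * size:(i + 1) * size] for i in range(count)]
    return [(n, count, chunk) for n, chunk in enumerate(chunks, start=1)]

def missing_parts(received):
    """List the part numbers that never arrived."""
    total = received[0][1]
    seen = {n for n, _, _ in received}
    return [n for n in range(1, total + 1) if n not in seen]

def reassemble(received):
    """Sort by part number, whatever order the packets arrived in."""
    return b"".join(chunk for _, _, chunk in sorted(received))

packets = split_into_packets(b"cat picture data")
arrived = [packets[2], packets[0], packets[1]]   # out of order, part 4 lost
print(missing_parts(arrived))                    # which part to ask for again
arrived.append(packets[3])                       # the resent packet shows up
print(reassemble(arrived))
```

Because every packet carries its own number, the order of arrival and the path each one took stop mattering.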
And so as technical as
the internet might seem,
it really, again, is just some
fairly intuitive solutions
to problems like this, albeit translated
to more technical contexts, more
technical protocols, and
more technical languages.
But let's look at some
more user-facing protocols.
The ones we've discussed thus far
are fairly low level, if you will.
And indeed, there's this
whole internet hierarchy
of protocols layered on
protocols layered on protocols
so that what we humans really tend to
care about, if we're not the engineers
but we're really the software developers
and we're the users of applications,
we care about application
layer protocols that
sit right between the human and all
of those lower level protocols.
For instance, these, at least one of
which has got to jump out at you, HTTP.
Odds are you've seen this.
Odds are you've typed
this, though decreasingly
do you have to still type it because
browsers will just add it for you,
HTTP.
The secure or encrypted version,
HTTPS, IMAP for inbound email,
SMTP for outbound email, SFTP for Secure
File Transfer, SSH for Secure Shell,
an encrypted textual channel
between two computers, and many more.
But HTTP, let's focus on that one
because that is Hypertext Transfer
Protocol.
Or HTTPS, the same
but the S stands for--
not savings-- secure, so it's
actually encrypted in this case.
So what does this actually mean?
Well, at the end of the
day, HTTP is a protocol
that governs what kinds of messages
go inside of those envelopes
that I've been preparing for the
internet, what kinds of messages
go inside of those envelopes.
And it turns out the simplest
message that a computer sends
through this whole internet,
ultimately, inside of a virtual envelope
is quite often, thanks to HTTP,
inside of this virtual envelope,
if I'm trying to request
a cat from the internet,
might literally be a
message like this, get me,
for instance, slash cat.jpg, for JPEG.
And maybe some additional
text after that,
maybe some additional text below
that, but at the end of the day inside
the virtual envelope, if I am on the
internet and I'm going on Google Images
and I want to find a picture of
a cat, inside of my envelope,
if I am a web browser speaking HTTP is
going to literally be a textual message
that says GET /cat.jpg, if I know that's
where the image is on some server.
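Written out in Python, the text inside that envelope might look like the sketch below. The host www.example.com is a placeholder, and the Host line is one example of the "additional text" that typically follows the first line in HTTP version 1.1.

```python
def build_get_request(path, host):
    """Compose the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # the "get me /cat.jpg" line
        f"Host: {host}",          # which site on that server is wanted
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)

# www.example.com is a placeholder host, not a real image location.
request = build_get_request("/cat.jpg", "www.example.com")
print(request)
```

Nothing about the request is binary or mysterious. It is plain text that a human could type by hand.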
The response is going to be
what was just inside of those
four envelopes back
from the server to me,
chopped up maybe into multiple
pieces but in a way where I can then
realize, oh, wait a minute,
you sent me only three of four.
Please send me the fourth one.
So it works in both ways, whether
it's me sending a cat to someone
or receiving a cat from someone.
This protocol, HTTP, governs
how the messages are formatted
and what language, so to speak, is
spoken between web browser and server.
So indeed, HTTP is
entirely about having a web
browser communicate with a server.
And we can see this in action.
I'm going to go ahead and pull up
a so-called terminal window again,
this textual command
prompt on my computer.
And I'm going to
pretend to be a browser.
So I'm not going to just trace
the route between point A
and point B. I'm actually
going to request a web
page as though I am Chrome
or Edge or Firefox or Safari
or whatever your favorite browser is.
But of course, as before,
all I know is that I
want to visit my favorite web
site, Facebook.com, for instance.
But I don't know its
IP address necessarily,
so let's go through that step.
How do I look up its IP address?
Well, my Mac already has an
IP address because of DHCP.
I'm already powered up.
I'm already connected to
the Wi-Fi here on campus,
and so I already have my
own IP address, and I also
have the IP address of a DNS server.
So my Mac just knows that.
But I can use that capability
now to look up the IP address
for the name, Facebook, and I'm
going to do that as follows,
nslookup, for name server lookup.
And I'm going to go ahead and
type in www.facebook.com, enter.
And interestingly, we get back
this somewhat cryptic response
but let's make some sense of it.
So it looks like the server that this
response came back from is 10.0.0.2,
which happens to be a private IP address
here on campus that you might have
in your own company or
university or even home network.
Then a non-authoritative answer
is this, www.facebook.com,
whose canonical name is, curiously,
star-mini.c10r.facebook.com.
Well, it turns out that
companies like Facebook
absolutely have many, many,
many different web servers,
and they might not necessarily
have just one IP address.
But we might just be
seeing one IP address
depending on where I am in the
world and depending on how Facebook
has configured its infrastructure.
The takeaway, then, is that apparently
so far as my Mac is concerned,
www.facebook.com is an alias
for or a synonym for this
longer less well marketed
domain name here.
But what we really care about, if
I'm about to pretend to be a browser,
is this IP address.
Facebook's IP address is
apparently 31.13.65.36.
And I can see this, in fact.
Let me go into Google Chrome,
or any browser for that matter,
and go to http://31.13.65.36, enter.
And voila, I made it to Facebook.
Now of course no one
in their right mind is
going to advertise the IP
address as 31.13.65.36.
No one would remember that.
We're not in the age of phone numbers
on the side of billboards anymore.
Now we have the Domain Name System,
or DNS, which does this conversion for us.
But now that I know that IP
address, I can use this information
and pretend to be a browser and
not just see the response in Chrome
as we just did, but I can
see it in my textual window
so I can look inside the envelope.
Indeed, this terminal window
is going to let me pretend to--
well, actually send a message as though
I'm a browser, pretending to be one.
But it's going to let me see inside
of the response that comes back.
Here's what I'm going to do.
I'm going to go ahead
and type in cURL dash I,
and I'm going to go ahead and type
http:// and then that IP address
and I'm going to hit enter.
And notice, uh-oh, Facebook
has moved permanently.
But this is a good thing.
To where has Facebook moved?
Well, apparently we've
gotten back a response,
via version 1.1 of HTTP, that
Facebook, per this status code, so
to speak, has moved permanently.
Has moved permanently, which sounds
scary, but where has it moved to?
Oh, they don't want people visiting
their IP address, even though it works.
They want to redirect people, so
to speak, to their domain name.
So we seem to be kind of in a cyclical
situation here where, wait a minute,
I thought I had to convert my
domain name to an IP address.
And indeed, I do, but it
turns out cURL is pretending
to be a text-based
browser here, effectively,
and it is already going to do this
DNS lookup for me so this is OK.
I'm going to go ahead now and do cURL
dash I, http://www.facebook.com, enter.
Oh, my god.
Facebook moved again.
But where did they move this time?
Well, it seems that
Facebook would prefer
that we visit
https://www.facebook.com, which
is the secure, the encrypted version.
OK, I can oblige.
Let's go ahead and do that.
cURL dash I of the HTTPS version, which
I've just pasted in, enter, and voila.
Now, this looks overwhelming, but what's
really important is this message here.
It turns out everything is OK.
And indeed, what's come
back from the server
is a virtual envelope, inside of
which is this message here saying,
hey, no big deal.
Everything is OK.
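The three requests just made by hand form a chain: each redirect names the next URL to try until the server finally answers OK. Here is a small Python sketch of that loop. The `RESPONSES` table is a hypothetical stand-in for the server, and the exact codes and target URLs in it are assumptions, so nothing here touches the network.

```python
# Hypothetical table standing in for the server:
# URL -> (status code, redirect target or None). Values are assumed.
RESPONSES = {
    "http://31.13.65.36": (301, "http://www.facebook.com/"),
    "http://www.facebook.com/": (301, "https://www.facebook.com/"),
    "https://www.facebook.com/": (200, None),
}

REDIRECT_CODES = (301, 302, 307, 308)


def follow_redirects(url, responses, max_hops=5):
    """Return every URL visited, ending at the first non-redirect answer."""
    visited = [url]
    for _ in range(max_hops):
        status, target = responses[url]
        if status not in REDIRECT_CODES or target is None:
            return visited
        url = target
        visited.append(url)
    raise RuntimeError("too many redirects")


for step in follow_redirects("http://31.13.65.36", RESPONSES):
    print(step)
```

The `max_hops` limit is a design choice in this sketch: without it, two pages that redirect to each other would keep the loop going forever.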
And you never see this number
when you visit web pages,
unless you're a software developer
and you know what tools to use.
Instead, some of us out there,
some of us normal humans
occasionally see a different
number, maybe the one number
you associate with the web.
Let me simulate it as follows.
Let me go ahead and request
this completely bogus page.
Hopefully that's not actually
someone's user name and hit enter.
Scroll back up a bit.
What do you notice this time?
If you've ever wondered
what 404 means, it
is the numeric code inside of a virtual
envelope coming back from a server
when you have requested
some nonsensical URL because
of a typographical
error or just nonsense
that I typed that's now having the
server tell you, uh-uh, not found, 404.
So this is just a special numeric code.
And this is common in
programming to have
numbers correspond to different
types of things that can go wrong
or, better yet, that can go
well, as in the case of 200 OK.
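For a quick look at how those numbers map to meanings, Python's standard library ships a table of HTTP status codes. The `describe` helper below is a hypothetical name used only for this sketch.

```python
from http import HTTPStatus


def describe(code):
    """Return the standard reason phrase for a numeric HTTP status code."""
    return HTTPStatus(code).phrase


# The three codes seen so far in this lecture.
for code in (200, 301, 404):
    print(code, describe(code))
```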
Now, all of this stuff
is called HTTP headers.
So I was oversimplifying
earlier when I said
HTTP is just this handshake of sorts
between a browser and a server where you say,
get me a cat picture and then you get
back the response as per those four
envelopes.
There are more headers.
There are more key-value pairs, words
with colons, words with colons,
words with colons, and then
values to the right of those.
And that is just additional metadata,
more information from the server that
tells you a little something about it.
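Since headers are just key-value pairs, they fit naturally into a dictionary. The Python below is a minimal sketch of that idea. The `SAMPLE_HEADERS` text is made up for illustration and is not what any particular server returns.

```python
# Made-up headers for illustration; real servers send many more.
SAMPLE_HEADERS = """\
Content-Type: text/html; charset="utf-8"
Date: Mon, 01 Jan 2018 00:00:00 GMT
Connection: keep-alive
"""


def parse_headers(text):
    """Turn 'Name: value' lines into a dict keyed by lowercase name."""
    headers = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at the first colon only; values may contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


for name, value in parse_headers(SAMPLE_HEADERS).items():
    print(name, "=>", value)
```

Header names are lowercased here because HTTP treats them as case-insensitive. This simple version keeps only the last value if a name repeats.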
But if I instead run that
same command one final time,
this time doing cURL and then specifying
not dash I but just the URL itself
and hit enter, this
craziness comes back.
And this looks like a whole lot of
code in a programming language
called JavaScript, or a big JSON object.
And my god, look how much data
came back from the server.
But notice, I'm starting
to see some structure.
Open bracket div and
the word label here.
And if I go up here, input here.
And indeed, what you are seeing
is a language called HTML.
Inside of the virtual envelope, if
you're requesting not a cat image
but a web page that has your news feed
or your inbox from Gmail or your search
results from Google is
a language called HTML.
And HTML's not a programming language.
And indeed, it's not as
cryptic looking as this.
Facebook is being
very efficient when
it comes to showing me
this information and just
getting rid of as much
formatting as they
can to save space, to save on internet
bandwidth or transmission thereof.
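To see why stripping formatting saves space, here is a deliberately naive Python sketch. The `FORMATTED` snippet is invented for the example, and `minify` is a hypothetical helper, not how Facebook actually compresses its pages.

```python
# Invented snippet; indentation and line breaks are only for human readers.
FORMATTED = """\
<div>
    <label>Email</label>
    <input>
</div>
"""


def minify(html):
    """Drop line breaks and indentation. Naive: ignores <pre> and the like."""
    return "".join(line.strip() for line in html.splitlines())


compact = minify(FORMATTED)
print(compact)
print(len(FORMATTED), "characters before,", len(compact), "after")
```

Real tools are more careful than this, because in some places whitespace between elements does affect what the browser displays.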
But it's a language that comes
back in this virtual envelope
that a browser knows how to display.
It's a markup language
in the sense that it's
going to tell the browser what to show
on the screen, where to show the cat,
where to put words, whether to make
those words big or bold or italics
or centered or any
number of other things.
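As a small sketch of markup being read the way a browser might begin to read it, the Python below walks through a tiny page and collects its tags and text. The `PAGE` string and the `TagCollector` class are assumptions made up for this example.

```python
from html.parser import HTMLParser

# A tiny made-up page: a title plus one bold phrase.
PAGE = (
    "<html><head><title>hello</title></head>"
    "<body><b>hello, world</b></body></html>"
)


class TagCollector(HTMLParser):
    """Record each opening tag and each piece of visible text, in order."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


collector = TagCollector()
collector.feed(PAGE)
collector.close()
print(collector.tags)
print(collector.text)
```

The tags are the instructions, such as `b` for bold, and the text is what actually appears on the screen.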
And indeed, what you are seeing is this.
This is www.facebook.com graphically,
as we see it in the browser.
Underneath the hood is that black and
white seemingly nonsensical Greek,
if you will, that at first
glance, there's no way most of us
would understand it.
But that's because we're
looking at it here.
We need to dive in a
little deeper, take a look
at what HTML is, how
it's actually structured,
make the simplest of web pages, a
hello world of web pages, if you will.
And then we can build back up
to this point and realize
exactly what composes
pages like Facebook and Gmail
and Google and Bing and others.
Because at that point, we'll
have understood not only
how the internet works,
but how you can use it
as a delivery vehicle for your ideas,
for your programs, for your products,
for your companies and more and actually
deliver information and deliver cats
and much more to your
users on this internet.
