[MUSIC PLAYING]
SPEAKER: All right.
This is CS50, and this is week nine.
You'll recall that last time, we took
a look back at such things as Scratch.
Because recall that when
we introduced Scratch,
we introduced a whole
number of programming
constructs-- loops and conditions
and functions and variables and more.
And then just one week later did we
transition from that world of Scratch
to C, where the syntax was much
more cryptic looking, certainly
at first glance, and
perhaps a little bit still.
But the ideas were ultimately the same.
And now in week nine as we transition
from the world of C to Python,
you'll find that that same
finding is the case, whereby
we are using the same ideas.
We're leveraging the same concepts.
But we have to translate it now
to a slightly different domain
and a slightly different syntax.
But things really start to
get interesting this week
and beyond, especially as
we build on our ability
to write Python code, our ability
to serve up web applications.
Because now we can begin to
leverage so much more functionality
than comes with C, than
comes with the CS50 library,
because there's such a large
community of software developers
who have created some really
amazing things that we can try out.
So let's look further today at
Python, and let's look further
at the applications for
a language like this.
You'll recall that perhaps the simplest
program we could have written last time
was actually just one
line-- print Hello World.
But you'll recall as you began to
dive into some of this past week's
challenges, you might
have needed or wanted
to actually start wrapping code like
that in a method or function like main,
and then calling it by default
with this magical incantation here.
But this was the building block
toward which we were starting
to develop more interesting programs.
But now we're going to really context
switch from a command line environment
over to a web-based environment.
And the world has been writing web-based
applications for quite some time.
Even I 20 years ago recall made that
Frosh IMs-- freshman intramural sports
website.
And even since then have the languages
changed and the paradigms changed,
and we humans have learned a lot about
programming for web-based applications.
And this, for instance, is one
design pattern or one architecture
that has arisen,
whereby MVC refers again
to this pattern, whereby you put your
intelligence, your so-called business
logic in your controllers.
This is one or more files
that has all the conditions,
a lot of the functionality calls,
and actually does something
with your program.
Then you have the views, which
are often templates or files that
render information that you
might have dynamically generated
or input from users.
And then model.
And today, we start to dwell on the
M in MVC, model, because last week,
we didn't really leverage
much, if any, of a model.
But this week we're finally going
to demand of ourselves the ability
to save data, retrieve data,
search data, delete data, and more.
And we really haven't had this
capability, other than very simple CSV
files, for instance
back in the day of C,
and even last week when we
dabbled with those in Python.
And you'll recall, last
time we introduced this.
Flask is what's called
a micro framework.
So a bunch of files, a bunch of
code that a community of people
have written that just make it easier
to make web-based applications.
It's absolutely not required.
In fact, one of the sample programs
among last week's distribution code
if you like to go back and play
was a program called serve.py
that doesn't use any of this.
It just uses built-in
Python functionality.
But you'll find that
it's pretty cryptic.
It's pretty heavyweight in order
just to do something simple.
And so things like Flask have
come around that just make it easy
and dare say more pleasurable
to write web-based applications,
because people have realized in writing
web app after web app after web app
that they're just repeating
themselves, or borrowing code
they've written in the past.
And so in frameworks, you have solutions
typically to very common problems.
So we'll use this set of
solutions to the development
of a web-based application.
And for instance, the
simplest Flask application
that's also available from
last week's source code
might be something like this,
whereby the top of your file,
you don't import like the CS50 library,
you instead import someone else's
library, specifically from the
Flask framework import-- what's
called the class, recall,
called Flask-- capital F.
And then also a function or
method called Render Template.
And we used both of those as follows.
Last week, we instantiated
a Flask-based application
by essentially passing
in this special reference
to the current file, the
name of the current file,
and then allowing the
framework to do its thing
and give us back this very special,
very powerful object called app--
though we could have called
it anything-- that gives us
access to some useful functionality.
For instance, the most
useful functionality
initially was just this-- @app.
And this is just syntax for what's
called a decorator, and more
and that some other time, or more on
that in Flask's own documentation.
But essentially, this
line here-- @app.route,
says that hey server, any time you see
an HTTP request for slash, the default
web page typically of a website, go
ahead and call the following function
that's immediately below it.
That function I've called
Index, mostly by convention.
But I could've called
it anything I want.
And all it did last week in this
example was render a template.
In this case, index.html.
Now, that could be raw HTML recall.
But oftentimes, you use something
called a templating language.
And indeed, we introduced a little
bit of Ginga last time, which
is a Python-based templating
language that we'll see just
makes it easier to generate
HTML without having
to write HTML inside
of our actual Python
code, which tends to be frowned upon.
So let's take a look back at
one of those examples, which
I've renamed from Frosh IMs
last week to Frosh IMs zero.
And recall that we had
the following files.
We had a templates
directory, inside of which
was failure, index, layout, and success.
Kind of a lot of complexity
for a pretty simple program,
but we'll see what each
of those does again.
And then application.
And this was the so-called controller
code that actually did something
interesting last week.
Now, what was that?
Well, it's a pretty small program.
As before, I've imported
from the Flask framework
a whole bunch of symbols here.
I'm instantiating my application here.
This is copy-paste from
our simplest of examples,
whereby if the user just visits Slash,
I want to show him or her index.html.
But then it got interesting.
This really was the first
time we had the capability
in a web-based environment to
respond dynamically to user's inputs
based on whatever they
typed into a web form.
In fact, if you think back
a few weeks in week six
when we first introduced
the web and HTTP and TCP/IP
and making HTML-based web pages,
you recall that all we did
was implement Google's front
end, and an ugly one at that.
But just the HTML form, that
if you click the Submit button,
it submits to Slash
search on google.com,
because we pretty much
deferred completely to them,
lacking at the time a backend
and lacking at the time
even a language in which we
could implement our own backend,
a web server that actually
responds to those requests.
But in Frosh IMs Zero,
we have the ability
to have our own route,
in this case called
Slash Register that I've
specified isn't even
going to respond to HTTP Get requests,
but rather Post requests, which
typically mean a form submission
is sending one or more fields
that you don't necessarily want to end
up in the browser's URL or history.
I'm calling my function Register,
and this would be a good convention.
Just make sure your function here
lines up with what the route is there.
And then I'm checking
a couple of conditions.
If that request's form-- that
is, all of the parameters
that were submitted via HTTP
Post-- has a name field--
so a text field, for
instance, called Name--
and that equals nothing-- quote
unquote, the so-called empty string--
or, that request's form's dorm key
has a value of quote unquote, which
is to say that if the user, myself last
week, did not give my name or my dorm,
then go ahead and return the
template called Failure.html.
And we'll take a look
back at that in a moment.
Otherwise, render template
Success.html if all in fact goes well.
Now, if we take a look at Failure.html,
it didn't do all that much.
It extended Layout.html.
It declared a title of Failure.
And then it declared a body of You
must provide your name and dorm
as sort of an admonishment to the user.
If I look at Success,
meanwhile, it's pretty similar,
but the text is different.
But this is the problem
we address this week.
You are registered, well not really.
Because recall-- and you could have
seen a moment ago-- we did nothing
with the user's name or dorm
or any other information,
we just pretended to
actually register them.
And then what did the
form itself look like?
Well, this page here is mostly HTML.
But again, notice that
even this page at the top
defines a body block and a title block,
because it's extending Layout.html.
So this is the Ginga
stuff that I referred
to earlier, the templating language.
And if we finally go
into Layout.html, now you
see the basic framework for every
page in this web-based application.
It's a pretty small application
to be fair, but it does
have at least three distinct pages--
Index, Failure, Success-- all of which
are identical to the file, except that
they each have a title and a custom
body dynamically embedded thanks
to how we're using templates.
So this is a nice way of not having
to copy and paste all of that code
into every file, thereby making
it a pain to update anything,
to add CSS or JavaScript
files, or generally any
of the overall structure of the page.
We can factor that all
out to Layout.html.
Of course we just threw
all this information away.
that name, that dorm, every student
who's been registering by this app,
we're forgetting about.
And so at the tail end of last week,
we introduced this solution recall,
whereby in Frosh IMs
One now, I'll call it--
or rather today let's redress that
by borrowing an idea from last week
that we didn't incorporate
into Frosh IMs as follows.
In Application.py for this next
version called Frosh IMs One,
what do I seem to be doing differently?
File's almost identical to before.
It's a little longer this
route, because what I've grabbed
is some of the code from last
time, whereby we wrote out to a CSV
file-- Comma Separated Value--
those student structures.
We did this way back when in C. We then
ported the student struct to a student
class last week, and now
I've borrowed that same code
for saving those students to disk, so to
speak, by embedding it into this route.
So what's going on?
So if again the name
or the dorm are blank,
I go ahead and just return the template
called Failure.html and we bail out.
Otherwise, I declare a
variable called File.
I open a la C's fopen
function, registrants.csv,
which I'm just calling
this just because it's
going to contain my registrants
in a comma separated values file.
And then quote unquote a.
And you might not recall this or might
not have seen this, but quote unquote a
is for appending.
If I instead and accidentally
did w for writing,
you might think that's correct.
But every time you use quote unquote w,
you're going to write out a new file.
Which is to say, if
100 students registered
for some intramural sport via this web
app, and I was using quote unquote w,
I would actually keep
clobbering or overwriting
the file, such that only the
most recently registered student
would appear in this file.
So we want quote unquote a for append.
Then I declare a variable called
writer, and I'm using this CSV module
that we get for free
from Python that's going
to allow me to create what we'll a
writer, passing in that file as input.
Then I'm going to call this special
method Write Row, whose purpose in life
is to take a tuple,
which in this case has
two elements, the name that was passed
in via the form as well as the dorm.
And then I'm going to go
ahead and close the file.
So Write Row is now
the function that takes
care of all the complexity-- it's not
all that much complexity-- of printing
out name, comma, dorm,
name, comma, dorm.
And just in case anything has a
quote in it, like a someone O'Leary,
or a name that has an apostrophe or
some other punctuation symbol that
might otherwise confuse a
program reading this text file,
which itself might contain quotes,
this kind of function Write Row
will take care of those
kind of details, as well as
commas that might correctly or
incorrectly be in the values like name
and dorm that I'm providing.
And then we print out the success page.
But this isn't all that
useful of a file format.
A CSV file is nice in
that you can download it,
you can open it in Excel or Apple
Numbers or similar programs.
You can import it into
Google Spreadsheets.
But it's not really a database.
You can store data in it, but if
you want to read any of that data,
or change any of that
data, you pretty much
have to do what we've been doing in
C, which is open it with Open or fopen
or whatever, iterate over the
lines, maybe parse them or read them
into separate variables or into
an array or a list or whatever,
and then you can use binary
search or linear search
or whatever you want
to actually find data,
maybe to then change data,
and then save it all back out.
But this is just tedious.
To have to do all that
work simply to save data
isn't all that much fun for programming,
and it also doesn't scale very well.
As soon as you have some good success
with some web-based application or even
some mobile application, it'd
be nice if your code were
as efficient and as fast as possible.
And wouldn't it be
nice if we could stand
on the shoulders of others who have
had similar problems of storing data
efficiently so that we could learn
from them as well and leverage
some of their work?
So thus was born SQL--
Structured Query Language--
which can be used in any
number of environments.
So SQL is just a language
that we will now introduce.
And it's a programming
language, though it's not
going to be as-- we're not going to use
it as richly as we have C or Python.
We're going to use it for a number
of fairly basic, but nonetheless very
powerful operations.
And you can store data using this
world of SQL in any number of ways.
You can use things like MySQL--
which is a very popular database
server that Facebook started with and
still uses for some of its purposes--
PostgreSQL, Microsoft Access, or Oracle,
or any number of third party products
that you might have
heard of that are super
popular for web-based and
business applications.
And then there's also a
format that's even simpler
but gives us all of the same
capabilities called SQLite.
Whereas MySQL and Postgres and Oracle
and Microsoft Access and the like
typically require that you run some
special program, a server, that's
listening for requests
and responding to requests
and often has usernames and
passwords, if you want something
simpler because you're making a fairly
small scale website that might only
have hundreds or thousands or
tens of thousands of users,
and you're OK with all of your
data living on the same server
that your code lives on,
maybe in the same folder,
you can use something called SQLite,
which allows us to use the language
SQL that we're about to see,
but it doesn't require the
complexity of configuring
a whole server or a whole set of tools.
You can just store your data right there
in a .db or .sqlite or whatever file
right in your same directory.
But how are we going to store that data?
Well, many of you are probably
familiar with tools like this-- Google
Spreadsheets or Excel or Numbers,
which allow you to store data
in generally rows and columns.
And if you're looking at
a spreadsheet like this
and you're storing people's names and
dorms and email addresses and phone
numbers, generally we humans
use the top row for headers,
and we'll put quote unquote name,
quote unquote dorm, quote unquote email
or phone or whatever.
And then every row below that first
row represents in this case a student,
a record in our spreadsheet.
So if I were to do this in reality--
let me go ahead and do just that.
Name and dorm and maybe email and
phone, and any number of other fields.
And I'm going to go ahead
and really be tidy here.
And you can do these silly
aesthetics in something
like Google Spreadsheets
and Excel and Numbers.
But here's where I would do something
like David and Matthews and Rob
and Thayer, and dot dot dot, and we
can fill in the rest of these rows,
and we can just keep growing
and growing and growing.
And frankly, most of these
programs are smart enough
that even though I think
Google Spreadsheets now
goes to a default of 1,000, when
you export that file or save it,
they're only going to save
rows presumably that actually
have data up until that point.
And then you can have
any number of columns.
You get A through I think Z
probably by default. So 26 columns
even if you need fewer, but
you can create more as well.
So this is nice.
This itself is structured data.
You have meta data, like name, dorm,
email, phone, keys, if you will.
And we've used that expression
before in various CS contexts.
And then you have values, and those
values line up with those keys.
And in fact, now if
you think back to some
of the features we've
seen already in Python,
a dictionary is a very popular
and very useful data structure.
In Python, a list or an
array is another such one.
And so really, if you think about what
a spreadsheet is, it's kind of really
just a list of rows, each
of which is a dictionary.
And what do I mean by that?
Well, within every row, whether it's
two or three or 1,000, you have columns.
And those columns have
keys or names to them.
Name, dorm, email, phone.
And recall that a dict
or dictionary in Python
is just a collection of key value pairs.
So if this is just a list of
values, and each of those values
is organized horizontally essentially
as dictionaries, key value pairs,
where row three's name is Rob, row
three's dorm is Thayer, and so forth.
Well it seems that we could
map this kind of format
very cleanly to something like Python.
Or C for that matter, but Python
would be a lot more convenient.
And sometimes, one sheet is not enough.
So sometimes you might have gone
down here to the bottom corner.
You might have created
something like sheet two.
And maybe here, instead of just storing
students, you might store professor.
So the professor's name
and office and course
and any number of other
pieces of data that you
want to store about him or her.
And we could store professors.
And I'll even rename this
from sheet two to Professors.
And in sheet one, I'm going to go
here and rename this to Students.
And so we can really kind of
organize our data cleanly.
But that's about it.
Even though Google
Spreadsheets and Excel
do have some programming
functionality built
in with macros or
similar functions, it's
not all that easy to query the data.
It's not all that easy to
integrate this kind of file
into a program you're writing.
And it's definitely not
necessarily efficient.
Because you've not really told
Google Spreadsheets anything
about how much data you're
going to be putting in here,
what the type of that data is
beyond maybe the formatting thereof.
And so there are some
opportunities for better design.
And thus came along SQL
databases that give us
not only the ability to store
things in rows and columns,
but also the ability to create data
or create structures in memory,
and insert data, select data,
and update data, and delete data.
In fact, a SQL database is
really a specific instance
of what's generally known as a
relational database, a database that
allows you to maintain
relations among pieces of data,
generally spanning sheets-- or as
we're going to start calling them,
tables-- that adheres
to a silly acronym,
CRUD, the ability to Create,
Read, Update and Delete data.
And those verbs map to such
keywords or commands as these
as we're about to see.
So what does this actually mean?
Well, let me go ahead and do this.
Let me go ahead and
within CS50 IDE, I'm going
to open up a special
web-based program that's
simply going to give me the ability
to create databases and create fields
therein.
I'm going to go ahead and create
a database called Lecture.db.
And I'm going to go ahead and open
now a web-based tool via which
I can administer this
database called Lecture.
And you're going to see that
I have a tab called Structure,
which is where I'm going to
be able to define sheets,
or tables as we'll call them.
I can execute raw SQL commands, with
which we'll now start to get familiar.
And later on, I can even
export or import data as well.
But let's focus on
these fields down here.
So this tool is one of
any number of dozens
of tools that might exist that allow
you to create and administer databases
using SQL.
I'm going to go ahead here
and call my table Registrants.
So this is like creating a new
sheet in Google Spreadsheets.
And the number fields--
for now, I'm just
going to go ahead and
have two, Name and Dorm.
And now notice I get a little HTML
form here, whose purpose in life
is to make things a
little easier for me.
So I'm going to type in Name
for one field, Dorm for another.
And you'll notice that
SQL databases now allow
me to choose any number of data types.
So we're sort of going
back and forth here
in between the world of strongly typed
languages, weakly typed or loosely
typed languages.
Here it can actually
matter for performance,
and I'm going to indeed tell
my database what kind of data
I'm generally trying to store in
its columns and each of its rows.
So my options with SQLite, which is
one incarnation of the SQL language,
is Integer-- which means
what it says-- Real--
which is like a real or
floating point number
as we've seen that generally can store
up to 15 or so digits of precision.
But even then, if it can't, it's going
to use a text field automatically
instead, because text is really
just a string, otherwise known
in some environments as var char
or variable length characters.
Blob is just going to be binary data.
And this means we can store things
that are just zeros and ones that
aren't ASCII or Unicode text.
Numeric is a little more flexible.
If I want to store a dollar
amount with dollars and cents.
If I want to store maybe an
integer, maybe a real number,
I can use numeric, and just let the
database-- SQLite in this case-- figure
out what actual data type or
affinity, so to speak, to use.
Booleans is just going
to be a zero or one.
But that's really just a convenience.
It's actually still going to use a full
byte or 8 bits just to store a zero
or to store a one,
really just an integer.
And then date/time, which underneath
the hood can use any number of formats,
whether it's text or integers or reals.
I'll defer to the documentation.
But that's going to allow
me to in a standard way
store things like dates and
times so that we can actually
record when stuff is
happening, when someone
registered for freshman
intramural sports,
and even do useful arithmetic
operations on dates and times.
So what should Name be?
Given all that, there aren't
really too many options.
Indeed, my instincts
are to go with text.
So I'm going to do that
for Name and for Dorm.
And then I'm being prompted by this
web form for a few different questions.
So primary key.
It turns out that among the
features of a SQL database
is the ability to specify
that this field-- this column
shall be my primary key, the field
whose values uniquely identify
all of the rows in my database.
Now, it's not going to be relevant
just yet for the following reason.
Right now, it's quite possible
that two Davids live in Matthews,
or two Robs live in Thayer.
And so if I said that Name
or Dorm were my primary key,
that would mean I can have only one
David or one Rob or one Matthews
or one Thayer in my database.
So for now, we're going
to leave this alone.
That would seem to be a bad thing to do.
Auto increment isn't going to be
applicable, because you can't auto
incremented a string or a text field.
That's going to be germane to integers.
And not null, which is my problem that
this field cannot or shall not be null.
So I will go ahead and do this and just
promise that every student in Frosh IMs
is going to have a name and a dorm.
Meanwhile, you can
specify a default value.
I'm not going to bother doing that
here, but you can see from the drop down
that you can specify a
certain value, like null,
or you can specify the current date
or time, if that's actually germane.
But I don't want to give
people generic names.
And I don't want to just assume that
people live in some default dorm.
So I'm going to leave
that alone and expect
that the user of this
database table is always
going to give me a name and a dorm.
So let me go ahead and click Create.
And what you'll see is this.
phpLiteAdmin is really
just a handy web-based tool
for executing SQL statements.
So I don't necessarily have to
remember all of the syntax up front.
I can kind of learn
from this actual tool.
And what phpLiteAdmin has
really done for me is this.
It has in that file called
Lecture.db executed this statement,
Create Table quote unquote
registrants, Name is text not null,
Dorm is text not null, and that's it.
So phpLiteAdmin is just saving
me, at least for today's purposes,
from having to remember
exactly all that syntax.
But after practice, it
becomes pretty familiar.
And certainly with Google,
you can fill in any blanks.
But what does this now mean?
If I click on Return here
and go back to not just
the lecture where I was, but I click
now on Registrants, my actual table,
you'll see a bunch more tabs.
The top one of which is
Browse-- but this table
is empty, because I've not inserted me
or Rob or anyone else for that matter.
But if I click on Structure, you can
now see in a web-based environment
a little reminder as to what this
table looks like underneath the hood.
It's more esoteric looking than
something like Google Spreadsheets,
where you just have columns and rows.
Here we're being more precise.
I have two fields, Name and Dorm.
Each is of type text.
Neither can be null.
But neither has a default value,
and neither is a primary key.
But via the Edit link here and Delete,
I can actually change those definitions,
and I can even add more fields
if I think of things later.
And so this is allowing
me to create really
a table of information whereby these
are my columns or fields Name and Dorm.
So let's go ahead and insert
for instance me in here.
So I'm going to go ahead and insert
David, who is in from Matthews,
and click Insert.
And notice what phpLiteAdmin did.
It simply executed this SQL command.
Insert into Registrants, Name,Dorm.
So it's specifying what the two
fields are that I want to insert into.
And what values do I want to put?
David,Matthews, not to be
confused with Dave Matthews.
So now, if I click Return and
go back to the Browse tab,
notice that we see Davids and Matthews.
Meanwhile, if I click Insert Here, and
I type something like Rob and Thayer,
click Insert, a very similar
SQL statement was executed.
But this one customized for Rob.
And now Return and see Browse,
both of us are in there.
But now let me start to take
those training wheels off.
It's not all that interesting
just to click around in a web GUI.
Let me actually learn
something from this.
And no, let me go ahead
and type my own SQL now.
Insert into Registrants.
And if I really want to
be proper, I can quote it.
But so long as you don't
have any special characters,
quoting symbols is not strictly
necessary in this case.
What fields do I want to insert into?
Name and Dorm.
And what values do I want to insert?
Raw values.
Textual values that are not
special keywords or field names.
I do need to quote with
single quotes here in SQLite.
Let's go ahead and put [INAUDIBLE]
and Currier for Dorm or House there.
I can put a semicolon.
It's OK if I omit it, but if
I want to execute multiple SQL
statements at once, I will
need the semicolon again.
Let me go ahead and click Go.
And nicely enough, notice what happened.
One row was affected.
It took only 0.01 seconds,
so it's pretty darn fast.
This is just a reminder
of what it is I executed.
And if I go back now to Browse, I can
actually see three rows in the table.
All right, so now things are
getting a little more interesting.
What if I want to search for things?
I'm not even going to use
this training wheel here.
I could use the Search tab and
type in some names or some dorms
that I want to search on.
But that's not the goal here.
The goal here is not to teach
phpLiteAdmin or this web-based tool.
It's just to use pretty quickly
this kind of web-based environment
to create the schema for our
database, the design of the database,
our choices of tables and columns
and perhaps some of the initial data,
and then use that
database in actual code.
So that's where we're going.
We're going to move away from
this web-based environment
and use actual Python
code, but first let's
see some of those other instructions.
Turns out, per this little summary
here, that we have the ability
to create-- which we already did
using Create Table-- to instert--
which I just did.
But what about select,
update, and delete?
Let's not use the web GUI per se.
Let's just go ahead and
start typing some raw SQL.
I'm going to go ahead
and do exactly this.
Select let's say star from Registrants.
And now as an aside,
stylistically it's not strictly
necessary to write select or from or
all of the SQL key words in all caps.
Tends to make code I think
a little more readable.
But I could still just say
select star from Registrants.
But I find it nice to distinguish what
are built in keywords from like what
are my field names and table names.
So select star-- star
meaning everything.
Let's click Go.
And now what I see here is
this web-based representation
of two columns with three
rows that have come back.
We're going to start
calling this a result set.
A result set containing
three rows have come back.
And that's going to map
to code pretty nicely.
Now, let's go ahead and do this.
Suppose that Zamyla moves
to a different dormitory.
And let's go ahead and type
this-- update registrants,
set Dorm equal to let's say Grays
where Name equals quote unquote Zamyla.
And this is what's
particularly nice about SQL.
Even though the syntax is a bit
new and some of the key words
are certainly new, you kind
of just say what you mean.
So update the Registrants table,
setting the Dorm field or column
equal to Grays.
Not for everyone, only where the
name field or column equals Zamyla.
So let me go ahead and click Go here.
All right, one row is affected up here.
And notice, a reminder
of what I just executed.
Now if I click on Browse just
to see what the data looks like,
indeed Zamyla has relocated to a
different dorm or building altogether.
If I screwed up, though, notice
what could quickly happen.
Suppose I did this, Update,
Registrants, set Dorm equal to Grays,
and I left off the
so-called predicate, the
Where clause that I did a
moment ago and click Go.
Now three rows were
affected as indicated here.
And if I click Browse now,
notice that oops, all three of us
now live in Grays.
So I can fix this manually
if I really wanted to
without executing another SQL query.
But this is just meant
to be a user-friendly way
of quickly and dirtily
editing or creating some data.
Really, we're going to start
using those SQL commands only.
What about Delete?
Suppose that Rob has
graduated, and he's moving out.
Let me go ahead then and do Delete from
registrants, where Name equals Rob.
Go ahead and click Go.
One row effected.
And if I go and click Browse
now, only Zamyla and I remain.
All right, so those are
just a few such keywords.
But what's nice is that
again, it's pretty expressive.
So to recap here, how
did I create the table?
I literally said Create
Table Registrants.
Though I didn't type this at
the time, but this was just the
SQL that that web tool generated for me.
Spoiler alert, don't
look at that just yet.
We're going to come back
to that in a moment.
And I specified that the Name
field is going to be of type text
and Dorm is going to be
of type text as well.
Down here I then inserted
something like David from Matthews,
specifying name and dorm.
And notice, these lines
wrapped onto two lines,
but in general, this could just
be a really long query as well.
Select Star from Registrants,
Update registrants,
Delete from registrants where--
I get another little spoiler.
And it's a spoiler
because there is actually
a flaw in my database table's design.
My Registrants table was
not so well thought through.
I indeed only care about storing
names and dorms, at least right now.
Maybe later, emails and phone numbers.
But what could go wrong?
When I deleted Rob, something
could have gone wrong a moment ago.
When I updated Zamyla,
something could have gone wrong.
What if there are two
Zamylas at Harvard?
What if there are two Robs at Harvard?
Well, each of those queries didn't seem
to specify which Rob or which Zamyla.
It just said where the name
is Rob or the name is Zamyla.
So how do we distinguish the two?
Well, maybe I should have
forced undergrads or students
to register with their
full names, Zamyla Chan.
OK, so that would
decrease the probability
that we're going to have
multiples Zamyla Chans at Harvard.
But still there could
be that probability.
Same with Rob Bowden, or really any
name, whether common or uncommon.
So that doesn't feel very robust.
We could look for a Zamyla that
specifically lived in Korea,
but maybe on the off
chance there's two of them
there, that's going to create problems.
And we might update or delete
more rows than we intend.
So this is why we humans
in the US for instance
have unique identifiers,
for better or for worse.
Things like Social Security numbers.
Or here at Harvard we have Harvard
University IDs-- HUID numbers-- or at
Yale MIT IDs.
Or where you are in the world, odds
are you have in your wallet or at home
or somewhere in your
life, a unique identifier.
And you might not even know
what those identifiers are.
In fact, anytime you've
registered for an account
on some website like Facebook or
Google or any number of others,
underneath the hood, even if
those companies aren't telling you
what your unique ID is, they know you
by more than just your name and dorm
or email address or phone number.
Odds are, they have assigned
you some unique but very simple
piece of data, like an integer, so
that Zamyla might be user number three.
Rob might be two.
David might be one.
Simply because that's the order in
reverse in which I inserted them
into the table.
So let me go ahead
actually and fix this.
Let me go back into my database.
And you know what?
I'm going to be pretty bold here.
I'm going to go ahead and in
my table drop the whole thing.
So beware ever executing
the drop command,
because it will literally drop
all the data in your table.
But I'm going to confirm that, I'm
going to start over as follows.
I'm going to create a new
table called Registrants,
this time with three fields.
And in addition this
time to Name which will
be text and Dorm which will be text,
this time I'm going to think ahead.
And I'm going to give
everyone a unique ID.
Convention is to call it
literally id in all lower case.
And indeed, field name should
generally not have spaces, not
have funky punctuation.
Whether you use so-called camel case
with alternating capital letters
or snake case with underscores,
it's generally up to you.
But using underscores tends
to be the convention here,
although it's not yet to remain.
But I'm going to make
this id field an integer,
and I'm also going to
make it my primary key.
So that there shall be no two
registrants in this Frosh IMs program
that have the same primary key, the
same number uniquely identifying them.
And you know, a better
yet-- and this is one
of the features you get for free
with many relational databases--
auto increment.
I the programmer don't
care what number Zamyla is.
I don't care what
number David or Rob is.
I just want them to have
a number, whatever it is.
But I don't want them to be the same.
So let me just let my database
auto increment this ID.
Give the first user one, the second
user two, the third user three,
and so forth.
And by nature, this cannot be null.
They need to be uniquely identified.
And default value here doesn't
matter, because the database
is going to take care of that.
Let me go ahead now and click Create.
And now you'll see the
create table that I just
spoiled a moment ago on the slide.
This now creates a table
again called Registrants,
because I dropped or
deleted the old one.
But notice how much detail
we have specified what id is.
It is not only an integer, it is
also a primary key that supports auto
increment and it cannot be null.
Meanwhile, Name, Dorm, I don't
particularly care this time.
Though I could specify like before,
I don't want those to be null either.
So again, that's just going
to be a design decision.
But let me see how this affects things.
Now let me go into MySQL tab, where
I can just execute some raw SQL.
And soon we're going to be executing
SQL from within Python code.
Let me go ahead and insert into
Registrants a name and a dorm.
And notice, I don't care
about specifying the ID.
That's not interesting to me.
All I care about is the values
like David and Matthews.
And if I now click Go, notice that's
all that the database executes.
It doesn't go in and
secretly change my query,
but it does notice that I've omitted
explicit mention of that id field.
And if I click Browse, now notice,
David for Matthews has an id of one.
Let's go ahead and do two other
inserts, again via raw SQL.
Let me go ahead here and
do insert into Registrants.
And you'll notice there's a nice
little history in this web tool of all
the previous queries I've executed.
So in case you want to click and save
yourself some typing, you can do that.
But for now I'm just going to go ahead
and insert Rob as before from Thayer.
And for good measure at
the same time, Registrants,
Name Dorm values Zamyla from Currier.
And notice the semicolon, so
that hopefully both of these
should go through as separate queries.
Indeed they did.
One row affected.
One row affected.
And if I click Browse, now Rob is
number two, Zamyla is number three.
And this is just going to continue on
until we maybe run out of integers.
But SQLite will support as many as
8 byte integers, or 64-bit values.
So that's not going to
happen for quite some time.
So now notice what
I'm protected against.
There might be a whole bunch of
Robs at Harvard it's fair to say.
But now if I do a
Delete from Registrants
where name equals Rob
and id equals two--
and I don't have to quote the two,
because it is indeed a number--
this is actually more
information than I need.
Indeed, because id is a primary key,
I don't need to mention Rob's name.
If I already know his id, I can
just say Delete from Registrants
where id equals two, click Go.
One row affected.
And goodbye to Rob.
And what's nice now about SQLite
is suppose I do one other insert.
Let me go ahead and Insert into
Registrants a name and a dorm
with values of Jason and Kirkland.
And click Go.
Is Jason going to now be the new number
two, or is he going to be number four?
Let's take a look.
If we click Browse
here, he's indeed four.
So one of the features you get
by using the auto increment field
is that the database is going
to ensure that just in case
you have remnants of Rob
somewhere else, or really
his id number maybe in some other table
or in your logs or some piece of data
you care about, we're going to make
sure that Jason has not only a unique id
now, but a unique id in perpetuity.
If you don't use unique id--
or rather auto increments,
SQLite will actually try
to reuse numbers just
for efficiency, which can actually
break things down the road later.
All right, so things
are getting interesting.
We have the ability now to insert
data, delete data, update data,
and then select it.
What more can we actually do?
Well, it turns out that SQL tends to
come with not only various data types,
but also various functions.
It turns out to save ourselves trouble,
there are functions like date functions
and times and date find times.
So for instance, I want to
execute a query like select star
from registrants.
But no, I don't want to
select all the registrants.
Maybe I want to select all
of the registrants this year.
I could say something like,
select star from Registrants
where date of a particular field is
greater than or equal to January 1st,
2016 or 2017, or whatever
the year happens to be.
So these are functions that can
do that kind of thinking for me.
But SQL also supports a number
of even more compelling features.
We saw a moment ago primary
key, which again specifies
that this one and only field shall
uniquely identify all of the rows
in my table, I guarantee it.
There's other types of
indexes you can use.
So this is the special type of index.
And essentially what you're getting
is underneath the hood the database,
whether it's SQLite or MySQL or
Postgres or Oracle or whatever,
is generally leveraging some of that
week five, week four, week six material
from CS50 in the fancier data
structures, especially the tree data
structures.
And if we tell the database
in advance, hey, this field's
going to be my primary
key, or this field
is going to be an index more generally,
whoever implemented the database
software itself probably built into it
the ability to store that field using
a fancy tree or some other
data structure that ultimately
is going to expedite
selecting and updating
and inserting and deleting data.
And indeed, we can say exactly that.
If you know in advance that you're going
to be searching on a certain field all
the time-- not just the id field.
Maybe your table has an email field or
maybe a zip code field or some other
field that users might want to
search on, rather than just store it
in a column by defining
a field in your database,
and then really just letting the
database use linear search to find
everyone in 02138 or 90210
or whatever zip code,
you can actually tell your database
in advance, index this zip code field,
index this email field.
Because I know I'm going to be
searching on that specifically using
that special Where keyword in SQL.
And let the database give you
better performance than something
like linear search alone.
Meanwhile, you can also specify, I'm
not going to just search on this field.
I don't care just
about it's performance.
You know what?
I can promise you right now
this field shall be unique.
It might be an integer
like a primary key.
But if there's some other
field in a database that
in advance is going to
be unique by design,
you can tell the database as much.
And the database too will optimize
queries involving that field as well.
So among the fields we discussed today--
name, dorm, phone, email, zip code,
you can probably think
of bunches of others--
which of those might be
candidate to be unique
and to be indexed as
unique by your database?
Name already feels like a bad idea.
We want to be able to have multiple
Zamylas, multiple Robs, multiple
of any name in the world.
Probably want multiple people
to live in the same dorm,
so that there can't be just one person
from Currier, Matthews, or Thayer,
wherever.
What about email?
Email's kind of an
interesting candidate, right?
Assuming you're OK with banning users--
maybe a couple of people in your life
who happen to use the same email
address for whatever reason.
So long as you're comfortable
assuming that every human in the world
is going to have or must have
his or her own email address,
you could proactively
say to your database,
OK, the email field or the
email column shall be unique.
I don't want a user with the same email
address to accidentally register twice,
let alone three times or more.
And even if two people think they have
the same email address maybe because
of a typo, I don't want him
or her to be able to register
if someone with that address
has already been registered.
Phone number maybe works
OK in the world of mobiles,
but if people still
have landline phones,
you probably don't want to
enforce uniqueness there.
But what else might you want to impose?
Not null.
If you want to ensure that everyone
in your database has a name,
there's nothing stopping them inherently
from typing like ASDF or whatever
into your database.
You might have to protect
against bogus data.
But you can at least
ensure that the user
has to give you some kind of value.
And then foreign key.
And we'll come back to
this, but there's a way
of specifying that the data in one
spreadsheet or in our world now,
one table, is somehow related
to or identical to the data
in another spreadsheet or again a table.
But let's come back to some
of these features in a moment.
What else do we get from SQL?
We also get the ability
to join tables together.
Now what does this mean?
Well, let's go ahead and let's go back
to the spreadsheet for just a moment.
And let's not worry about students and
professors anymore, but let's go ahead
and think of this as,
for instance, users.
I'm going to rename my quick and
dirty spreadsheet here Users.
And what do I want every
user in my world to have?
Well, I want everyone
to have some unique ID,
and I'll let my database
ultimately assign that.
I want everyone to have a name.
Maybe let's say a mailing address,
a phone number, an email address,
and there could be bunches
of other fields as well.
What else might someone have?
That's probably enough
for a customer database.
And let's go ahead now and consider
what these values might look like.
So my very first customer, who
buys my first widget or whatever,
might be someone like Alice.
And she lives at 1 Oxford Street,
Cambridge, Mass, 02138, USA.
Let's go ahead and make this
column wider just so we can see it.
Her phone number shall be
617-495-5000, and her email address
shall be alice@example.com.
Let me make this field
a little wider as well.
Then Bob comes along, and
he buys something too.
He happens to live down the street
at 33 Oxford Street, Cambridge, Mass
82138, USA.
And his number is 617-495-9000.
And he is just bob@example.com.
Then there's someone
new altogether, Charlie.
He lives at 51 Prospect Street
in New Haven, Connecticut,
and his zip code is 06511, USA.
And his number will be-- he
doesn't want to cooperate,
so he's just going to give
us some bogus number there.
But he'll be also charlie@example.com.
And this table now-- if you
think of the spreadsheet
really as just a
database table of users,
could be dozens or hundreds or
even thousands of rows long.
So let's now consider
what data types I should
be using if I want to migrate this
spreadsheet into an actual SQL
database.
So the id field, which frankly
you usually get sort of for free
in a spreadsheet program, because it
just numbers all the rows for you.
But if you resort your
data, those numbers
do not followed the original rows.
And so giving our data, ultimately
we'll see their own numbers as good.
Because those should
remain with the data.
Alice should always now be one,
Bob two, and Charlie three.
So name.
Let me go ahead and consider
what should the fields here be?
Well, just for the sake of
discussion, let me go ahead
and annotate right on the screen here.
It stands to reason that
this should be an integer.
Name should probably be text.
Address should probably be text.
Phone, I mean it could be an integer.
It's definitely not a real.
Numeric is not right, because it
could then become real somehow.
But is integer right?
I'm a little worried here.
Like especially if it's
an international customer,
I'm kind of biased at the
moment toward US users.
But suppose someone typed in their
actual calling code or country code
and they did something like 011
or 001 for the US and then 617.
If I called this field
an integer, and then let
the user type in their actual number
like this, if it's an integer,
those leading zeros mean
nothing mathematically.
So my database is probably
going to throw them away.
And I might not want that to happen.
We get lucky in that this still
could work as a phone number,
but I really shouldn't be
throwing away user's data.
So you know what, I'm going to
go ahead and just call this text.
Whatever the user gives me, I
shall store from him or her.
And then finally email can be text.
And now let's consider how
we might index these columns.
We'll come back to id in
a moment, because SQL's
going to give us that
automatically in our database
we'll see by using auto increment again.
But it will be an integer.
Name, I could-- I don't
want to make it unique,
because I want to have multiple
Alices and multiple Bobs.
I could make it an index.
If for whatever reason the website
or the application I'm making
wants to make it easy for
users to search on names,
I could proactively say, OK,
database index the name column,
because it will speed up my searches to
be something better than linear search.
Address, same thing.
If I want to use auto complete
or some kind of search feature.
So I can search over address.
Maybe I want to do that.
Phone number, maybe that could be
unique, if I'm expecting only mobiles.
But that could be risky.
Email is the only one I might claim a
unique constraint is pretty reasonable,
especially if your
website-- as is commonly
the case these days-- is going
to expect users to log in,
certainly not with their id.
And I see no mention of username.
If they're going to use their
email address as their login,
then you better only
allow any email address
to be registered once and only once.
But you know what I'm not liking?
And let me make this column even wider.
What's kind of dumb about
this design at the moment?
You know, before we even
get ahead of ourselves
and start moving this into our
SQL database, what's dumb here?
I feel like some redundancy, right?
Like 1 Oxford Street,
that's special and unique.
33 Oxford Street, that's
special and unique,
even though there could be multiple
people living or working there.
51 Prospect Street, same thing there.
But like Cambridge, Mass 02138 USA.
Cambridge, Mass 02138 USA.
Like why for every
resident of Cambridge,
Massachusetts am I storing
Cambridge, Mass 08138 USA.
Cambridge, Mass 02138 USA.
This would seem redundant.
I mean it's not likely that Cambridge
is going to change its name.
I mean, once upon a time it was
Newtown, now it's Cambridge.
But that's not likely to
happen again anytime soon.
So it's not so much anticipating change,
but just look at all these darn bytes
that I'm storing redundantly.
Which of those bytes
though should hopefully
be sufficient to identify
where someone lives or works?
At least in the US, we tend to use
our postal codes pretty readily.
So you know what?
Let me go ahead and create
a field called zip code.
Why don't I just store 02138 here.
And oh, stupid Google
Spreadsheets, notice what it did.
It is assuming, because something looks
like an integer, that it is an integer.
So this is a stupid artifact
of using a spreadsheet program.
Let me change that to
plaintext and retype it.
Now Google will respect my input.
But the point is all the more clear,
we had better in our SQL database
call that text and not an integer.
And now you know what?
I can make my address really
just a street address.
And I can get rid of this, and
oh, I shouldn't get rid of this.
But I should in this column do 06511,
and then here, get rid of all of that.
Damn Google again.
Let's go ahead and
change that to plaintext.
05611, and change this to 02138.
So it's a little cleaner now.
There's a little bit of duplication.
I now have 02138 twice in my database--
or in my spreadsheet in this case.
But at least it's a lot,
lot, lot less redundant.
But I need to recover that information.
This is just my Users
spreadsheet-- my Users table.
You know what?
I'm going to go ahead and create another
sheet with Google Spreadsheets here.
I'm going to go ahead
and call this Zip Code.
I'm going to call this
City, State, Country.
Although Country makes things
a little more complicated,
because zip codes aren't going
to be the same format everywhere.
But for now, 02138 is
going to be-- dammit.
Let's change this whole column
this time to plain text.
02137, Cambridge, Massachusetts, USA.
And now for New Haven,
let's do the same thing.
Paste that, New Haven, Connecticut, USA.
And now rather than just
call the Sheet Three,
let me more intelligently
call this like Zip Codes.
And you know what?
I'm not loving how many bytes
I'm using to store Zip Code.
Why don't I learn from lessons
past and give this spreadsheet
or this column or rather
this table its own ID,
and just arbitrarily assume that
someone will number those for me.
So now I can whittle this
down to just numbers where
this should be an integer now.
And in my Zip Codes table, let me make
room for just these little annotations.
This shall be an integer.
Zip code we've already
learned had better be text.
This should be text.
This should be text.
This should be text as well.
All right, so I have taken a
very simple idea, spreadsheet
with all of my customers, and I seem
to have really over-engineered it.
I've made something simple more complex.
But why?
I mean, it's not compelling when
I have just three customers.
It took me more time
to describe what I was
doing than just do it the original
way, albeit with the redundancy.
But if you do start to have tens of
rows or hundreds of rows or thousands
or tens of thousands, having
all of that duplicated
data just doesn't make sense.
You're throwing away disk space.
You're throwing away potential
performance and running time.
You're throwing away money, if you have
to buy more disk space for your server.
And so what I've done here is what we
would call normalizing my database.
Factoring out the commonalities that
can be uniquely identified more simply,
with this case a zip
code, or even more simply,
via some arbitrary but some
consistent number that I impose.
So let's go ahead now and
port this over to my database.
I'm going to go ahead and use
phpLiteAdmin, just because it
makes things nice and easy here.
I'm going to go ahead
and drop my Registrants
table, because that story is over.
And now let me go ahead and
create a couple of tables.
The first of which I'm going to go
ahead and call my-- what did I call it?
Users.
And number of fields, I'm going to
need one, two, three, four, five,
six fields.
So let's go ahead and do that.
So six fields, and those are going to be
id, name, address, and what did we say?
Zip code, and phone, and email.
And this shall be an integer.
This shall be text.
This shall be text.
This shall be text.
This shall be text.
This shall be text.
But id shall be a primary key.
I want it to auto increment.
And you know what?
I don't want names to be null.
I'm OK with addresses
being null if they don't
want to cooperate or zip code being
null if they don't want to cooperate.
Phone, email should not be null.
So I'm going to go ahead
now and click Create.
This is the query-- somewhat
longer now-- that was executed.
And now let me go ahead
and create another table
by clicking on the name of my database
again, creating a new table called
Zip Codes.
Number of fields here is going to
be one, two, three, four, five.
So five fields, one of which is
id, one of which is the zip code.
Then city, then state, then country.
Text, text, text, text, integer.
This shall be primary key.
This also shall be auto incremented.
Zip code cannot be null.
And everything else
I'm OK with being null.
So I can also not spell country.
Create.
And so now, let's toss in some data.
Let me go ahead and insert really
quickly, I'm going to use the web form,
zip code of 02138, Cambridge,
Massachusetts, USA.
Insert.
Let me go ahead and
insert New Haven as well.
So 06511.
05611, yep.
05611.
New Haven, Connecticut, USA.
And now let me just give
myself a few customers.
Let me go in to my Users table,
insert David-- or no, who did we have?
We'll copy and paste our actual users.
Alice, so we had Alice from 1
Oxford Street in zip code now.
But this zip code-- oh, interesting.
I goofed.
I went a little too fast.
So let me abort.
Let me go to Structure.
Zip code I said was
text, but you know what?
I'm going to change that to integer.
All right, and now let me go
over to the Insert tab again.
Let's put Alice from 1
Oxford Street at location 1,
with her phone number,
which was for 495-5000.
And you can do this.
Even though this looks
even more like an integer,
a common mistake I
think in a database is
to just blindly throw in
whatever the user typed in,
even if he or she used
parentheses or pluses or dashes
or any number of other
punctuation symbols.
There's no reason for
us to store all of that.
In fact, if we want to pretty
things up and throw away
some of the inconsistencies,
I could just
use code-- Python, ultimately--
throw away all punctuation symbols,
and then just store 10 characters
in the case of a US number,
or whatever pattern I care about.
Or I could proactively with
Python even pretty this
up by just putting the dashes
where I want them to be,
not where the user necessarily put them.
And then Alice was alice@example.com.
And now let me really quickly go
ahead and insert Bob, who was from 33
Oxford Street.
He's also in zip code 1 now.
His phone number was 617-495-9000.
And I'll go ahead and pretend
like I'm formatting this in code.
And then he was bob@example.com.
And then lastly, let's go
ahead and insert Charlie
from 51 Prospect Street in New
Haven, whose id I now know to be 2.
617-000.
He was the one that didn't cooperate.
And charlie@example.com.
All right, so what does this now mean?
In my database called Lecture,
I now have two tables--
Users and Zip Codes, each of which has
some number of fields inside of them.
But this is not all that
useful to me anymore,
because now if I go ahead and select
my users with a query like this,
select star from users,
go, I get back this.
But what the heck is zip
code 1 or zip code 2?
Now OK, I can figure this out.
If I know that I'm looking for Alice,
and I see that she's in zip code 1,
I could do another query.
All right, let me go
back to that SQL tab.
Let me go ahead and do select star
from Zip Codes where id equals 1.
Because I saw that Alice's zip
code was uniquely identified by 1.
Oh, all right.
So Alice lives in zip code 1.
Yeah, that's 02138 Cambridge, Mass, USA.
OK, got that.
But now I've used two SQL queries.
The data is still like-- some of it's
over here, some of it's over here.
Now I've got to somehow
combine it in code.
And you can do that, but SQL is
much more powerful than that.
It has other keywords like
this, where I can have
the database do the thinking for me.
Let me do this.
Select star from users, join Zip Codes
on users.zipCode equals zipCodes.id.
Take a moment to think, and
ignore the red squigglies.
That just my browser thinking I'm
being grammatically bad, verbally too.
Select star from users, join zip code.
So I'm telling the database, go
ahead and join these two tables.
How?
Well if you think of one table
as this hand and one table
as this hand, what's
nice is that each of them
has a field that's inside of the other.
This table might have the id field.
This table has the zip code field.
Wouldn't it be nice if I could
somehow stitch those together,
lining up those zip codes,
whereby in this table, Users--
I called it zip code-- and then in
my Zip Codes table I called it id.
Let's see what happens.
Let me go ahead and click Go.
And amazingly, look at what I get back.
My result set this time
contains everything.
I get Alice, Bob, and Charlie, each
of whom lives at these addresses.
But look at their zip codes now.
They've been filled in
with the actual values,
plus the cities and the
states and the country.
So this is where you
really now start to scratch
the surface of the
capabilities of something
like SQL, because it can actually
combine data in this way.
And this was a pretty simple query.
But now I can sort of
exercise good design.
I can keep my data very cleanly
structured and normalized, whereby
I factored out all of the redundancies.
And yet I can programmatically
reconstruct that data
and ensure that I can get
back everything I care about.
Moreover, I can further optimize things.
If I go into Users for instance
and I go under Structure,
notice down here there's another field.
Create index on one column or more.
Let me go ahead and create
that index on my Users table.
I'm going to call it
Arbitrarily Email, and I'm
going to say duplicate
values are not allowed.
I'm not going to bother
with a where clause here,
but I'm going to put this constraint--
this unique index on my email field,
and now click Create.
And notice what got executed.
Earlier we used it to create a table.
Here I'm using it to create
a unique index called Emait--
though I could've called it
whatever-- on the Users table
using specifically this field.
So this now is my way of ensuring
that the following can't happen.
Notice that in my Users table I've
got Alice with alice@example.com.
suppose that someone else named Alice,
also with that same email address--
or who thinks her email
address is the same,
because of whatever typographical
error-- comes along,
and I try to register them
for my site as follows.
Insert into Users.
Let's see, we have name, what
are the other fields now?
Name, address, zip
code, phone, and email.
The values as follows Alice number
two, so her name's not distinct.
Her address is going to be 1 Main
Street, so her address is not the same.
Zip code, she'll go ahead and live
in New Haven, for instance, so 2.
Phone we don't need to worry about,
so she'll be another fake phone
number for today's purposes.
But suppose that she also thinks
her address is alice@example.com
Or heck, maybe this
is the same Alice who
forgot she has an account on our
website-- maybe more likely--
and is trying to reregister
with the same address.
What's going to happen?
If I didn't make any typos
here, let's click Go.
Interesting.
The database stopped me from doing this.
Error, unique constraint
failed, users.email.
And this is common
syntax in the SQL world.
tableName.fieldName.
So insert into Users,
whatever I just typed
failed, because the unique constraint.
And indeed, we can see this.
Let me go to Browse and look.
There is no second Alice.
So we could do this
in code, as we'll soon
see even more in the world of Python.
You could check by writing your
Python code to see, wait a minute,
someone's trying to register
as alice@example.com.
Let me quick select all my users
from my database, look for Alice,
and if I already have
Alice at example.com,
I'll just say, no, you can't register.
Much like I said no you
can't register for Frash IMs
if you don't have a
name or a dorm inputed.
But my database can do that for me.
And this is a nice wall
between me and my database,
a nice wall between
the software developers
and even the database
administrators so that you
don't have to worry
about data accidentally
getting into your website or your
database that shouldn't actually
be there.
All right, we have all of
these building blocks now.
We have Select and Insert and
Delete and Update and now Join.
What more is there?
Well, there are some more keywords.
But let's first use the ones we have.
But this time, not just to play
around via the web-based tool.
Let me actually show
you one other thing.
So let me go in to CS50 IDE and take
a look at my source 9 directory,
and you'll see these four
folders, two of which
you're about to look at in a moment.
And you'll also see
lecture.db, a file that I've
been using to store all of my
SQLite data, including those tables.
Well, it turns out that we can actually
use a command line client in order
to see the same data.
Let me go ahead and do that by
typing SQLite 3 for version 3.
And then let me go ahead and type in
the name of this file, lecture.db Enter.
And now you'll see a very simple and
perhaps a little cryptic command line
interface.
But thankfully if you
type as it says .help,
you'll see everything that you can
possibly do with this command line
program.
But we're going to go
ahead and keep it simple.
I'm going to go ahead
and just say .tables,
which is SQLite's way of saying show
me what tables are inside of this file
called lecture.db.
Now here I have users.
Let me go ahead and see what
is the schema of these tables.
And you'll see the SQL commands with
which the Users table and the Zip Code
tables were created.
But more interesting than that is this.
At the command line, this
is just a SQL client.
I've been using the web-based GUI,
phpLiteAdmin for the past few examples.
But we can just do this
at the command line too.
Let me go ahead and select
star from users semicolon,
and now you see in purely
textual form the exact same data.
Here's the ids, my users name
addresses, zip codes, phone, emails.
I can see the same thing
from my zip code fields.
Let me go ahead and do
select star from Zip Codes.
And there we see the same data.
So using the command line too, you
don't have to even use phpLiteAdmin.
It just tends to be a
little more user friendly.
You can also see the same
data in the same file.
And now that I've done
this purely manually,
let's now transition to
doing this in Python code.
But first, let me make mention
of one detail in the tables
that we had created.
Back here in users,
you'll recall that we
had the structure of
having an id and a name
and the address, zip
code, phone, and email.
What you'll actually find is
that the world has generally
standardized how you define what are
called foreign keys in these tables.
So specifically, which of
the fields in this table
are kind of foreign to this table?
That one of zip code.
Originally, that zip code
was a text field, 02138,
and any number of other
zip codes as well.
But then I changed it to an integer,
because it really in this table
is a foreign key.
Because those same
numbers 1, 2, and onward
are primary keys in that
other table called Zip Codes.
And so what would actually
be more typical here
is that we would often name this not
Zip Code, which is a little ambiguous.
But just as a human
convention, let me propose
that we clean up this
design a little bit
and call this Zip Code ID to make
super clear that this is indeed an ID.
It's not this table's ID, it's
an ID from the Zip Codes table.
Plural would be the convention, and
singular would be the convention here.
And indeed, we can now use the
same kind of statements in code.
But it turns out with one of
our concluding examples today,
we'll see while it's advantageous
to adhere to certain conventions,
and frankly even this might even be
typically higher in the table as an id
and not just buried there in the middle.
But that's a more minor detail.
All right, now let's return to CS50 IDE
and build upon those previous Frosh IMs
example.
But now weave in this
new feature that we have,
which is that ability to store data
and retrieve data from a database.
So recall that this
was Frosh IMs 1, which
was an improvement upon Frosh IM 0
in that at least Frosh IMs 1 actually
kind of sort of registered students.
It used registrants.csv, and it wrote
out the data to a simple text file.
But with text files, CSV files,
even though you could double
click them and open them again
in Numbers, Excel, and port
them into Google Spreadsheets, we don't
have the same expressive capabilities
as we do with a language like SQL
to select, insert, update, delete.
They're just text files.
With SQLite though, a binary
file, and the SQLite 3 program,
and we'll soon see Python
code, we can actually
execute more sophisticated queries
than the CSV format allowed.
So how are we going to do this?
Let me propose now in Frosh IMs
2, we create a new application.py
that's going to behave as follows.
Let's go ahead and
propose that I'm going
to import as before some
of the Flask functionality.
So from Flask import capital Flask,
which is the class that we're using
is the application itself.
Render template, so I can render HTML.
Redirect, so that if I
want to redirect the user,
I can do this by an
HTTP location redirect.
Request, so that I can
actually get at form data.
And we'll see URL4, which you
might recall from past problems.
But you know what?
I'm also going to import
from CS50 our SQL wrapper.
So inside of the CS50
library recall for Python
is getInt and getString
and getChar and getFloat.
But inside there also is a
whole library called SQL.
So the CS50 SQL library that
doesn't do all that much,
but it does offer us an
execute function that's
going to allow us more
simply than might otherwise
be possible to execute
SQL statements inside
of our Python code against
Lecture.db on FroshIMs.db
or whatever the file may be.
And I'm going to do that as follows.
First I'm going to instantiate
my Flask application, as always.
And now I'm going to instantiate
a database connection essentially
as follows.
SQL quote unquote SQLite
colon, slash, slash, slash.
So notice the third slash in this case.
FroshIMs2.db.
So this will be Frosh IMs version 2.
And that's it.
So I'm telling the CS50 library to
use SQLite and to open up ultimately
the Frosh IMs 2 database.
Now I'm going to go ahead and have a
simple route as before for just slash.
I'm going to define a function
called index as before,
and this one's super simple.
Just go ahead and return
render template of index.html.
And now let's reimplement
register as follows.
So app, route, slash register.
But in this case, you know what?
I want to support post.
I don't want students'
information ending up in the URL
and like the computer lab's history
or the roommate's browser or whatever.
So let's go ahead and specify
to Flasks that you know what?
Only support this list of
methods, specifically just post.
Let me go ahead now and define
a method called Register.
That's going to map to this route.
And as before, if request.form quote
unquote name equals equals nothing,
or request.form quote unquote
dorm equals equals nothing,
then go ahead and return failure, and
return render template of quote unquote
failure.html.
So the website is
almost exactly the same
now, except we have preemptively added
this database capability to our code
that we're now going to use.
Otherwise, if we're
not in failure, let me
go ahead and execute the following SQL.
Inserts into registrants a name and
a dorm with the values of-- we'll
come back to this in a moment.
Colon name colon dorm.
I could call these
anything, but a convention
in a lot of SQL
libraries-- a lot of code,
whether it's Python or PHP
or Ruby or other languages--
is to have these placeholder values with
a colon and then a word that's probably
identical to the field
name, but much like in C,
percent s has been placeholders,
much like in Python open curly
brace close curly brace
has been placeholders.
We've got one more placeholder
convention, which is colon, and then
the name of a symbol.
But for the most part,
they can almost always
be identical to what
the field names are.
And now, I need to plug these values in.
So I'm going to go ahead and plug in
for the first name equals request.form
name.
And dorm is going to
equal request.form dorm.
So again, this is a Python
thing meets SQL thing.
In our SQL library, as implemented
by the CS50 execute function,
we have this SQL statement.
Insert into Registrants name, dorm,
values, colon name colon dorm.
CS50's execute method, like
many libraries out there,
will recognize any words that you
have written in a SQL statement that
begin with a colon,
and will then proceed
to substitute in any of the named
parameters you provide thereafter.
So I'm saying, hey, library, give
plugin a name of request.form name.
So whatever came from
the user's HTTP request.
And for dorm, plug in whatever
came in the dorm field as well.
So those were from the HTML forms
that the user submitted in order
to register for this sport.
After they've done that,
let's go ahead and do this.
Return, render, template, success.
And this time I'm going to change
the success message to actually be,
yes, you really did indeed
register, or the spirit thereof.
So let me do this.
Let me go into this
Frosh IMs 2 directory.
And as you might have
done for recent problems,
let me go ahead and run Flask run.
And we have since tweaked
the configuration of the IDE
so that you no longer have to specify
a host of 0.0.0.0 or port of 8080.
Those will just be assumed by default.
You can override them if need be.
Now, I believe my
application is running.
So let me go over to my web server.
And indeed, this is the
form we saw last week.
Super simple, super ugly, super HTML.
Looks like HTML1, actually HTML5.
But let me go ahead now and
prepare a database for this.
In another terminal window,
I'm going to go ahead.
And in my source 9 directory
as well, open up FroshIMs2.db.
Which if I open it up in phpLiteAdmin
already looks like this, Frosh IMs 2.
But there's no table in there yet.
So I care only for now
about names and dorms.
But I've learned my lesson.
I'm going to care about IDs too.
I'm going to go ahead and create
registrants with three fields,
much like we conjectured earlier.
An ID, a name, and a
dorm, first of which
shall be an integer,
primary key, auto increment.
Next of which will be text,
dorm of which will be text.
I don't want any of those to be null.
They don't need to be auto
incremented or primary keys.
So let me go ahead and create.
If I return now to my Browse
Structure, the table is empty.
Now let's go to the
web-based application.
Here we go.
Let me go ahead and have
a new name altogether.
Maria wants to register.
And Maria is from Stoughton.
Register.
You are registered.
Really?
So we've not invested heavily in the
aesthetics of this website just yet,
but really I have been registered.
If I go back to phpLiteAdmin, click on
Browse now, notice Maria from Stoughton
is actually now in my database.
Wait a minute, maybe
I'm kind of cheating.
Maybe she was already there somehow.
Let's do this again.
Let me go ahead and hit back.
Let me go ahead here now and say Andy.
And let's pretend Andy is
visiting someone in Weld.
And click Register.
You're registered really.
Let's go back to phpLiteAdmin.
Click Browse.
And now Andy is in there as well.
So now we're not just storing things
on disk so to speak with CSV files.
Now we're actually interacting
with the SQL database.
And we did so by using CS50's execute
method inside of its SQL database
library.
But notice how relatively simple it was.
We simply used the
same syntax with which
we've been playing around, either
the command line or phpLiteAdmin,
but now doing it an actual Python code.
So now we literally have the ability
programmatically to create data,
to update data, delete data, or select
data, ultimately, from a database.
So let's do exactly that.
Let's take things up a notch and
do even more than we did last time.
You know what?
Let me go ahead now and create
another route for slash registrants.
Suppose I want to make
available a web page that shows
me everyone who has in fact registered.
So this is a feature we haven't
even had yet, though we could have.
At least with CSVs, we would have had
to open up the CSV and iterate over it.
Now we'll do that even more simply
with the registrants method here
that has rows gets db.execute,
select star from registrants.
So I know from our
earlier experimentation,
select star means get everything.
From registrants means from that table.
db.execute is just the CS50 method
that's going to execute the SQL
and return to you-- what?
I didn't care about a return value
earlier, although technically, I
could have gotten back a value, as
you'll see in the documentation.
But selecting star from registrants.
What do I want back?
Well earlier, I proposed that a
database table, like a spreadsheet,
really is just a list of dictionaries.
And indeed, that's exactly what the
CS50 execute method gives you back.
It will return to you if you've
used a select a Python list each
of whose elements is a
Python dict object, which
means you have access,
which each of those
rows to the field name or column name,
the so-called key in a dictionary,
and the value, the cell in that table.
So now that I have these rows,
what can I actually do with them?
Well, I'm going to go ahead and render
a new template, registrants.html,
and I'm going to pass
in as you might have
for a past problem all of my registrants
by passing in all of these rows.
So it turns out, templates
can be parametrized such
that I don't just have to spit out
some hard-coded registrants.html file.
I can pass in a key of registrants
or call whatever I want,
and the value that I just
got back from the database.
So that I'm now handing to my templating
language, Ginga all of these rows.
So that suggests that in my template,
my so-called view more generally in MVC,
my view can iterate over those rows and
spit out every one of my registrants.
Let's go ahead and do this.
Let me go ahead into registrants.html,
which will be a new file.
So let me create a new file
here called registrants.html.
And let me go ahead and
make sure that as are others
extends layout.html so that it
looks just like everything else.
And then let me go ahead and just
give it a block title up here,
so that its 2 is consistent.
This will be called registrants.
And now I do nblock.
So again, this is the
Ginga templating language.
This is not a Python
per se, not SQL per se.
It's just really for rendering a
viewer or the aesthetics of my site's.
Block body.
And now in here is going to go nblock.
So the question is, how
in this new template file
do I spit out like a list
of registered students?
Well, list.
I know from some HTML
back from week six,
I can give myself an unordered list.
So just a bulleted list like that.
And now I know I can spit
out list items to put values
next to those bulleted lists.
But I don't know how many
list items to output yet.
But wait a minute,
passed into this template
was a key called registrants
whose value is rows,
the list I got back from my database.
So you know what?
It turns out that in
Ginga, you can execute
essentially Python code that looks like
this for registrant and registrants.
And then down here, let me do N 4.
So slightly new syntax here.
So no colon here we're doing
4, and the opposite of it I N4.
I can now do list item.
And next to that list
item, you know what?
I'm going to do it and we've
seen this syntax before.
Registrant.name from registrant.dorm.
OK, this looks very
strange so what's going on?
First line just means we're
inheriting from layout.html.
So the whole page is going to just plug
in values into that template, namely
a title and a body.
Title's uninteresting.
It's just going to be registrants.
Body gets interesting, but notice
it's not hard-coded HTML anymore.
We've got an open UL tag and a close UL.
So this means hey browser, here
comes the start of unordered,
the end of an unordered list.
But notice that my template or more
generally the view in my software
is now going to use a
Python-like loop here.
But again, no colon.
You have these special tags with the
curly braces and the percent signs.
And four is the opposite
and now notice what
I want to do inside of this
loop I want to output literally
open bracket LI close bracket.
And then eventually, open
bracket slash LI closed bracket.
And then dynamically, on
each iteration of this loop,
I want to output a registrant's
name and a registrant's dorm
and just grammatically say from
in-between, so it's David from Matthews
and some Zamyla from Currior
and Rob from Thayer and so.
And the only thing that's
passed in is this thing here.
Remember the named arguments,
the named parameters that I
passed into my render template method.
It was registrants equals rows.
This is really equivalent to
iterating over the rows that
came back from my database.
In fact, if you want to
be even more l I don't
have to call registrants I
could just say rows equals rows.
So I'm literally passing in my database
rows, and then in my template here,
I could do for row in rows
row.name, registrant, row.dorm.
Just semantically, I thought it would
be a little more intuitive if I actually
say what these things are and not
just generically refer to them
as rows from my table.
All l so let's see this now.
Here that web form.
Let me now go to slash
registrants, enter.
And oh my god, a bulleted list.
This may be-- I want to
prove that this is working.
Let me go back here.
Let me go ahead and
register Stelios now,
who is currently living
in Canaday, register.
OK, Let me go to slash
registrants again.
Reload.
Stelios from Canaday.
Very interesting.
But this isn't where we need to stop.
Let me add one other feature.
Lets get that Stelios out of there.
I suppose that it wasn't
quite making it on the team
here so you know what, we
want to have one other route.
Let's have an unregisterred function
if Stelios wants to bow out after all
and focus on something else.
So apt out route slash unregister.
You know what?
This is going to support
multiple methods.
So by default, it's just get.
But I want it to be not
just get but get and post.
And then in here I'm going to have
def unregister as the name of my method.
Again, it could be anything.
But you should be
consistent, I would say.
And now I'm going to
have two conditions.
So this slash unregister
page is visited via gets.
You just go to this URL l.
Then I just want to see a
same list of registrants
but with some kind of form by which
I can delete them from my database,
by which they or I can unregister them.
Meanwhile, I want to
go ahead and if post,
I want to acually do the deletion.
So let me try this.
If request.method equals equals
gets, then Rose get's d.b.
Execute, select star from registrants.
And then I'm going to go ahead and
return the templates, render template,
unregister.hteml.
And pass in those
registrants as those rows.
Actually, c with.
Lith requests that method
equals equals post.
What I want to do now, I want to do
if request.form, quote unquote "ID."
So if there is, in fact, an ID passed
in-- more on that in a moment--
I want to go ahead and execute--
hmm, where is this going?
Let me go ahead and
see the template first.
I think we need to see this.
Let me go into froshims2.
And I whipped this one up in advance.
unregister.html-- let's
go ahead and do this.
So on registrants.html, I just
spit out an unordered list
of all the registrants.
In unregister.html, I'm going
to add a little bit more.
Inside of each of those
list items, I'm going
to have a forms input field-- input
element-- whose name is going to be ID.
I've just hard-coded
that, because this is
going to represent the ID of the
student I want to unregister.
The type of this input
is going to be radio.
So these are mutually exclusive circles
that you can essentially toggle.
Value of this is going to be whatever
this current registrant's ID is.
And then I'm going to go ahead
and put this registrant's name
and dorm just as before.
So we'll see what this
looks like in a moment.
But notice I've also added
two other lines up here.
I've specified I'm inside of a form,
the URL for which is unregister.
So we've seen this before. url_for() is
just a function that comes with Flask
that helps you dynamically figure
out what the actual URL should be
for the method called unregister().
So that's why, if you adopt
some nice naming conventions,
you can use tricks like this, and
Flask will just figure out to what URL
this form should be submitted.
And the method I'm going
to use is going to be POST.
And then down here, notice
I have a Submit button.
Type equals submit.
Value equals unregister.
So what does this look like?
Well, let's see that.
If I go ahead and visit /unregister, I
see an ordered list, just like before.
And if I played with CSS, I could
even get rid of these bullets
altogether, each of which has a
radio button next to it, one of which
I can click in order to delete
this user from my database.
But how does this work?
Let me go ahead in Chrome
now, and inspect, and look
under elements at one such row.
And notice the HTML that Python,
and in turn, my Jinja template
has dynamically output it for me.
Here's my page's HTML.
If I expand this list item, I see input
name equals ID, and I hard-coded that.
Type equals radio, I hard-coded that.
Value equals-- this was registrant.id.
So in this Jinja for loop where I'm
spitting out one registrant at a time,
this is where I was
dynamically spinning out
Maria, and Andi, and now Stelios' ID.
So if I expand, the LIs above will
see that, ahh, Andi has a value of 2,
because her ID is 2.
And Maria has an ID of
1, because her ID was 1.
Meanwhile, Stelios from
Canada is just text.
So if I select him and
then click this button,
we need to be able to handle that.
So what do I want to do?
Let me go back into CS50
IDE, into application.py.
And here we go.
If the form was submitted via POST,
as by clicking that Submit button,
and if there is indeed
an ID in the form--
so the user actually did select
one of those radio buttons,
and therefore, there's some actual
work to do-- DELETE FROM registrants
WHERE ID equals colon ID, where,
again, colon ID is just my placeholder.
The value I want to plug in for ID
is request.form quote unquote "ID."
Where did that come from?
Again, if we go back to my HTML,
notice that these radio buttons are all
called "ID."
And because they are mutually exclusive
by definition of a radio button,
only one ID will be submitted if
one of these boxes is checked,
one of these circles is checked.
And that value submitted
will be 1, 2, or 3.
So to confirm, here's
our page with unregister,
and all three students
are still in there.
Here is /registrants.
All three students are there.
Although this is
uneditable, let's go ahead
and unregister Stelios
by clicking on Register.
And it would seem that he's indeed
gone from the bulleted list.
Let's go ahead and check phpLiteAdmin.
Let's go ahead and click Browse.
And Stelios is now gone.
So this is where everything's finally
starting to come together, right?
In week 6, we talked about
HTTP, and parameters,
and how HTML and CSS worked.
But it was largely
static and hard-coded,
and we were just playing
around with fake Google
and implementing our own front end.
But now I'm using forms
again, and using them not only
to create an interactive UI--
user interface-- for users,
I'm also now implementing
the back end, the server,
the routes that are capable of
getting those HTTP parameters'
values as with request.form-- or other
mechanisms if they come in via GET,
a different syntax altogether.
I can get those values and
then do something with them
by combining those values with SQL code.
And so the last line
here that I actually
should have included for
good measure-- I simply
ran the code that I had pre-written
in advance so that I didn't mess up.
Let me go ahead and do this.
This is the one line that
was technically there
when I just ran that code even though
you didn't see it until just now.
After all this, I have decided,
just to keep the UI pretty simple,
after trying to delete someone,
just go ahead and redirect,
not to the unregister page
again, but to the registrants.
And this is why, after
deleting Stelios, I immediately
saw a new bulleted list
with just Marie and Andi,
because I redirected the user to
the route for registrants, a.k.a.
/registrants.
So again, this is where
everything's coming together.
And it's a lot to absorb all at once.
Because, my god, we had HTML and CSS.
Then we introduced Python.
Now we introduced SQL.
Then, we have Jinja and
the templating language.
And now all of this comes together.
But again, if you go
back in diving into all
of this, first principles and the
definitions of each of these, HTML,
it's just the markup language that
lets us lay out and format a web page.
HTTP is just the
language-- or the protocol,
technically-- via which web browsers
and servers intercommunicate.
Python, of course, is a
higher-level language.
It's an alternative to C.
And it seems to come with a whole
bunch of useful functionality
that, thanks to frameworks, or
micro-frameworks like Flask,
make it relatively easy to get real work
done with relatively few lines of code.
There's a learning curve, to be sure.
But you know, this is kind of
impressive, that with just a dozen
or two lines of code, I've
implemented the beginnings
of a web-based application
by which students
can register for sports,
unregister for sports,
see who's registered for sports.
You know, I might just need to add
some logins, and a few other features,
and definitely, some
prettier aesthetics.
But that's a lot of functionality
packed into just a few lines.
And now that we have SQL today and we
have a function like db.execute() that
allows me to execute SQL
inside of my Python code,
now we have the ability to store data
long-term, to access it, search it.
And we're just using small data sets.
I could store thousands,
tens of thousands of rows
and use these same
principles, especially
for data science applications,
analytics, analyzing corpuses of text.
So many possible applications now.
And you know what, if we will
go just one level deeper,
it turns out that while having
programming chops with SQL,
and knowing SELECT, and INSERT,
and DELETE, and JOIN, and CREATE,
and all of the various keywords
we've started to play with today
and scratched the surface of--
super useful and super powerful
when you want to analyze your
own data, or your own company's
data, or your own thesis'
data, or the like--
it turns out that eventually, you
even outgrow that level of interest
typically.
And an additional layer of
abstraction is often helpful.
And so another feature
you get with frameworks
like Flask is what are called
models-- literally, models.
So you recall that we've
been talking about,
in general, MVC, whereby we have
models, and views, and controllers.
Well, in this model, we
have been interacting
with our data via low-level SQL.
It's new, for sure, today.
But it turns out that once
you get comfortable with SQL,
and so long as you adopt
certain conventions of giving
all of your tables an ID field--
and if you have foreign keys,
it's something, underscore,
ID-- where you generally
adhere to certain conventions and
adhere to a framework's requirements,
it turns out you can
do things like this.
In froshims3, we have
essentially the same files,
except that we've changed
the application.py
to be a little bit different.
We're not using any of
CS50's package or module now,
so we're not using db.execute().
And in fact, you don't technically
need to use that to access SQL.
Our single-function execute
just makes it much easier
to get back lists of
dictionaries as opposed
to executing multiple lines of code
as you could do with a library called
SQLAlchemy or Postgres' own--
or rather, SQLite's own-- driver
in the world of Python.
But I'm going to add a few lines
here using another library called
Flask-SQLAlchemy, which
was pre-installed, or will
be pre-installed, for you in CS50 IDE.
There's a few lines that I
had to copy and paste earlier,
from the documentation,
to get it to work right.
But notice, this is,
perhaps, the familiar line.
Instead of froshims2,
it's now forshims3.db.
But notice, at the end,
I get another db object.
It's not CS50's.
It now belongs to the
author-- or it was created
by the author-- of this library.
But it turns out you can do some
pretty cool things as follows.
Recall, from last week, that
Python supports classes much like C
supports structures.
And inside of classes can go
properties, or pieces of data,
as well as methods, or functions.
So it turns out that we
can define, using Flask
and using libraries or frameworks like
it, what's called an Object Relational
Mapper, or ORM.
And this is just a fancy
way of saying, if you
don't want to think about
your data as rows and columns,
which we have been all
of today, ultimately,
whether in spreadsheet
form or database form,
you'd really like to think of a
registrant for a freshman intramural
sport as an entity, as an object of
some class, well, you can do that.
You can declare a
class called registrant
and have it extend the db.Model.
So this is another class that comes with
Flask that we are now inheriting from,
so to speak.
So this is truly now object
oriented programming.
We are specifying, via __tablename,
that the SQL table to which this class
should map is going to
be called registrants.
And ultimately, this class,
registrants, when it is instantiated
is going to give me an object that
represents a row in that table.
And this object is going to
have an ID, a name, and a dorm,
as we've been doing.
And notice here, using SQLAlchemy--
so using this library from Python--
I am declaring a column called ID
that's going to be of type integer.
And yes, it's true that it's the primary
key, whereas name and dorm are just
going to be text.
Because what you can do with
SQLAlchemy, and with ORMs,
more generally, is you can specify,
in Python code, or whatever language,
what your database looks like
and what your data therein
looks like without actually writing
raw SQL queries so to speak.
So even though we've just
introduced SQL syntax,
you can eventually take off
that layer altogether and build
on top of it using objects like this.
And now, as an aside, this is a
constructor or an initialization method
that you might recall
from past problems.
But let's focus, a
little later in the code,
on why this is actually compelling.
If we scroll down later in the
code, like, to my register route,
there is no SQL in this
implementation of froshims3.
3
The first few lines are the same.
Indeed, only once we get
down to this line here
do we have registrant as a variable,
registrant as the class name.
And we're passing in,
apparently, the name and the dorm
that came from the HTTP request.
And we're passing those
into the registrant class.
If you recall how classes worked,
if they have an init() method,
you can pass in some default
values, name and dorm in this case.
And that's how we are creating a
registrant object of type registrant.
And now notice what we can
do here, db.sessios.add().
So this adds to my database
sessions, so to speak,
another feature you get from
this particular library.
We're going to add that registrant.
And then wonderfully, we're
going to commit that registrant.
In other words, we have created a
variable in memory-- specifically,
called registrant-- who's type, who's
data type, is Registrant, capital R.
And that's simply a class,
inside of which is name and dorm.
There's no ID yet.
But what's really cool about this ORM is
that when you call add and then commit,
that's like putting it in a
database and then hitting Save.
And the database, because
of the code we find up here,
is going to automatically
insert that ID for us.
And meanwhile, this object now,
registrant, is actually going to have,
inside of it, the ID that
was stored in the database.
So I don't have to
worry about insertions.
I don't have to worry about
any updates or deletes.
I can interact with my data now,
completely at a higher level, in Python
alone and leave it to
the ORM-- SQLAlchemy
here-- to actually do the
creation of the SQL statements.
And if you look later in here,
if you'd like to play around,
you'll see that we've rewritten
registrants and unregister as well
to use SQLAlchemy as
opposed to raw SQL queries.
And for instance, here is how
you can get all the registrants
in your database if using an ORM.
Instead of doing SELECT* FROM
registrants, you can just say,
hey registrant class, give me
a query for all of my data.
And what you get back is a whole
bunch of rows, which, as before,
we can pass into our template.
Down here, meanwhile, we can
request all of those rows
again and pass them into that template.
Or here, notice what we can do.
If we've been passed
the ID of a registrant
like Stelios', we can say, hey
registrant class, give me a query,
but filter that query so that the
registrant ID I care about equals
the one I was passed via HTTP.
Oh, and by the way, once you
find that in my database,
call the dot delete method to just
get rid of Stelios from the database.
So again, we might not
necessarily be solving a problem
or scratching an itch just yet,
especially since SQL itself
is, odds are, quite new to you.
And so we've already solved
the problem in one way.
Here's another way to solve it.
But realize that, eventually,
it's fair to say that you find
writing SQL queries sometimes tedious.
Though frankly, you'll get a lot
more control, and potentially,
more performance out of them
if writing them yourself.
And so having the best of both
worlds is perhaps the best takeaway
here-- actually understanding
what's going on underneath the hood,
as was the entire point of our
spending so much time in C,
and understanding how you
can execute SQL queries,
but realizing, down the road, especially
if you find that, wow, it's really
getting a little slow to write
all these low level SQL queries,
wouldn't it be nice to just
create my database schema
as I did earlier with phpLiteAdmin
or at the command line
and then let my library
code figure out how
to get data in and out for me,
albeit perhaps at a performance
penalty, that's a nice place to get to.
So again, even now that
we're in week 9, and we've
abstracted so far away
from week 0s and 1s,
is there still this progression and
this onward march of abstraction
as the world gets more and more familiar
with solving problems and starts
to realize best designs for doing so?
But it's not all fine,
and good, and safe.
In fact, let's make note of perhaps one
of the most tragically common mistakes
people make when using SQL.
And indeed, one of the
reasons to use things
like libraries like CS50's library
or even higher-level, fancier
libraries like SQLAlchemy, is
to avoid these kinds of threats.
And yet many people and
many sites still suffer
from what are called SQL
injection attacks, for instance.
So what does this mean?
Well, probably, a few times a week,
you log in with your Yale NetID
or with your HarvardKey, which gives
you forms like this or like this.
And ultimately, you're providing.
simply, a username and a password.
But suppose that, for
the sake of discussion,
the HarvardKey system were implemented
on the back end with Python using SQL.
And how, then, do we implement logins?
Well, when I give Harvard or
Yale my login name or my NetID
and then my password,
well, what are they doing?
They probably have,
each of them, a users
table if they're using SQL, whether
it's a SQLite, or Oracle, or MySQL,
or Postgres, or whatever.
And they probably have written code
somewhere in that login site that says
SELECT* FROM users WHERE username equals
whatever they typed in AND password
equals whatever they typed in.
And if that gives you back a row
representing David, for instance,
then you know that he or
she has logged in correctly,
because you wouldn't have found the row
if the username and password weren't
in the database.
Now, it turns out fancier things
are done with the password.
You probably don't want to store
users' passwords in plain text,
so to speak, borrowing
language from week 2.
Rather, you want to store ciphertext
or some kind of hashed value
so that even if, in the worst case,
your database is compromised or stolen,
no one in the real world actually
sees your users' passwords but only
some cryptic-looking hashes
thereof, which at least raises
the bar to exploiting your account.
But if they're using SQL-- this is what
worries me-- they are, at some point,
taking what I, a human, typed
in-- hopefully a good guy,
but could be a bad guy,
typed in-- to their website
and plugging it into a SQL query.
Because they're not just going to
necessarily do SELECT* FROM users.
They might actually say, SELECT*
FROM users WHERE username equals such
and such AND password
equals such and such,
or at least one of those predicates.
So what if you've done
things poorly in code?
And suppose that Harvard had
implemented it in such a way
that this simple-looking, stupid-looking
query is actually a really big threat?
So I've temporarily
turned off the bullets
that you would normally
see in a password form.
This is not hard.
It's just an HTML thing.
And suppose that my email addresses
is me@examplemailprovider.com.
And suppose, for my password,
I don't type in 12345,
or whatever my actual password is.
I type in, cryptically, ' OR ''1='1.
Why this?
And there's an infinite
number of things I
could type if I am aggressively trying
to hack into Harvard or Yale's website.
But notice this feels like
it's part of a logical query.
It turns out SQL has OR.
We haven't seen that before,
but like Python, it literally
has the keyword "or."
And I am assuming, in this
case, per some of the examples,
that maybe the programmer
at Harvard or Yale
has single quotes in his or her code.
And maybe they are just foolishly, and
very riskily, plugging in what I type
in between those quotes.
So notice, this is kind
of the end of a quote.
This is the beginning of a quote, but
I've not finished the thought there.
So let's see what happens.
Suppose that the code on the
back end at Harvard or Yale--
we're in Python-- this.
So somewhere in their files, username
variable gets request.form username.
Password gets request.form password.
So just two variables called
username and password,
just so I have them handy.
Maybe they're using
CS50's library function.
db.execute( SELECT* FROM users-- and
this happens to wrap on two lines,
but ignore the extra space-- WHERE
username equals {} AND password equals
{}).
So just like we did last week when
printing things in a formatted fashion,
this was like Python's
equivalent of print diff
using the format method of the string
class passing in username and password.
But this worries me.
Because if the programmer at Harvard
or Yale has literally given me these
placeholders of '{}', whatever I typed
in as my username and my password is
literally going to go there and there.
But what if-- oop, oop, oop, oop, typo.
The whole story breaks without this.
And there-- but what if the user
types in that cryptic-looking string?
Well, then, what ends
up happening is this.
And it's red because red is bad.
SELECT* star FROM users WHERE
username name equals quote, unquote,
me@examplemailprovider.com-- no
big deal-- AND password equals--
this is interesting.
Previously, there was a single
quote and a single quote,
and then the curly braces in between.
But underlined here is what
I, the adversary, typed in.
Cryptically, 'OR'1'=1', that's it.
Notice where the
underlining starts and ends.
But I seem to have maliciously
finished that programmer's
thought at Harvard or Yale.
I've not finished it in
the way they intended.
But this is syntactically and
semantically correct, if a bit strange.
I am essentially saying, if the
password equals nothing or 1 equals 1,
because I've plugged in those
characters in such a way
that they still make a WHERE
clause syntactically accurate.
1 equals 1 always.
So this is like saying, SELECT*
FROM users WHERE username equals
me@examplemailprovider and
whatever else is the case.
Like, that is always
true, that 1 equals 1.
So it doesn't even matter
what the user's password is
or that I didn't even type it in.
I just did quote, unquote.
All right, well, what if maybe--
OK, so format may be bad.
Curly braces, like we've been doing
in Python for string formatting,
maybe that's bad.
Python also has, like
Java and JavaScript,
the ability to concatenate
things together, in this case,
using the plus operator.
So what if I just go old school
and concatenate my strings together
like this?
Maybe that's better, noticing single
quote here, single quote here.
And the double quotes are
just stopping the string
while we do this
concatenation there and there.
It's the exact same problem.
So those instincts don't
serve us well at all.
In the end, we still
get, username equals
whatever I typed in AND password equals
this WHERE 1, indeed, again, equals 1.
So no matter what, this query is
going to return a row if that email
address is in there, probably resulting
in the Harvard or Yale code logging
this malicious user in.
So there's got to be a better way.
And indeed, the reason to
use libraries in general,
whether it's CS50's for
the next couple of weeks,
or in the real world, any
of dozens of SQL libraries,
is that other people before you
have thought through these threats,
have written the requisite
code, which isn't all that much,
to notice when there's
dangerous queries being injected
and escape them properly.
So here's how we would do this
with the CS50 db.execute() method.
SELECT* FROM users WHERE username equals
colon username AND password equals
colon password-- sorry
for the type there.
I'll fix that.
Bad with spacing today--
AND password equals
colon password, unquote, passing
in username, passing in password.
And again, you don't
repeat the colons here.
But the key and the key in those
named parameters lines up with the key
and the key that do have the colons.
And the way the CS50 library
works-- and there's not much
going on in the CS50 library.
There's not much of a
training wheel there--
we essentially are simply wrapping
that other library I referred to,
SQLAlchemy, which gives you not only
the ORM feature, the Object Relational
Mapper feature, where you can create
your own classes like registrant,
they also let you, in that
library, execute raw SQL.
And they also take care
of all of this escaping.
So we, the CS50 library, are
really just taking your queries,
passing them to the SQLAlchemy library
via raw SQL, getting back the results,
and just neatening them up,
and returning to you only
a list with dictionaries
inside without expecting
you to execute multiple lines of code.
And now, in green, is the
expected solution here.
Because of the way the CS50 library
works, and in turn, the SQLAlchemy
library works, even if an
adversary types in funky syntax
like those single quotes trying to
trick your database or your Python code
into executing something malicious,
notice what the library has done.
These backslashes were not in
the user's adversarial input.
They were not in the previous
red problematic examples.
What the CS50 execute()
method does for you,
what the SQLAlchemy library does for
you is looks at user input that's been
plugged in for named parameters
and says, whoa-ho, wait a minute.
If there's a single quote in there,
let me escape it with a backslash
so that the only actual single
quotes are the outermost ones.
And only if the user's password is
that thing there that's underlined will
they actually get in.
And most likely, it's not going
to be something as crazy as that.
But in light of all this, do
you perhaps now appreciate
what this particular person, perhaps
with a little too much free time,
was trying to do?
So parked in a parking lot here
is this fellow's plate here.
And this person had taken the time to
print out something a little curious.
Let's enhance.
What the heck is going on there?
Well, it turns out that while we've been
focusing on the logic of 1 equaling 1,
the real takeaway of
SQL injection attacks
is that if you can somehow trick a
database into executing a line of code
that you have written-- previously,
the only line was, quote, unquote,
1 equals 1 or what not.
But suppose that that
adversary-- suppose
I had included a
semi-colon in my username
or a semi-colon in my password.
And you, the programmer, naively
trusted what I was typing in.
And you simply executed whatever I typed
into my username or my password field,
allowing me to have a
semi-colon in there.
And that semi-colon, as you know, ends
one SQL statement and begins a new one.
Suppose, god forbid, like
this person here tried
to finish his or her license
plate in a SQL-like way,
with a quote here, and commas,
whatever those mean, and a semi-colon,
but then also included a valid SQL
command like DROP DATABASE TABLE, which
is the only thing I
cautioned about, earlier,
being especially bad-- this person was
apparently trying to use those traffic
cameras, which, he or
she surmised, might
have been using SQL as the back
end and storing people's license
plates in that back end and,
correctly or incorrectly,
was hoping that, upon
a camera seeing this,
using Optical Character Recognition,
OCR, converting this into text,
passing that text into a database
that might not be scrubbing
or sanitizing its
inputs as we've proposed
with CS50's library or SQLAlchemy--
was hoping-- that maybe that back end
database was poorly
implemented enough to trust
what was passed in so that after
logging this person for speeding
or whatever, actually dropped
the entire database, covering
his or her tracks entirely.
So all this and more is ahead of us
as we continue to build, and build,
and build on top of lessons past.
And next week, when we introduce one
final language, JavaScript, a language
that you can not only
use on the server side
but also, and especially,
on the client side
to create all the more of
an interactive experience
and all the more of a
compelling user experience
for users using a bit of Python,
and SQL, and HTML, and CSS,
and soon, now, JavaScript,
we'll see you then.
[MUSIC PLAYING]
SPEAKER 1: Rosebud-- yeah, actually, now
that you mention it, one time, I went
into his office to look for some forms.
[KNOCKING ON DOOR]
SPEAKER 2: Bud?
David always said "pal."
