[MUSIC PLAYING]
BRIAN YU: OK welcome back
everyone to web programming
with Python and JavaScript.
So, last week, at the end
of last week's lecture,
we talked a little bit about how we
might design a web application that's
designed to store data.
So we looked at an example
of a note taking application,
where our goal was to be able
to have users type in notes
and save those notes so that they
could have a list of all the notes
that they have saved.
And we found a solution
that works just by storing
the notes inside of a list of strings
in Python, in our flossed application.
But it wasn't great, because if we
were to ever shut down the web server
and restart it again, we would suddenly
lose access to all of that data.
All that data would be lost.
So what we're going to begin to
talk about today is databases.
And in particular, trying
to find some way to use
databases to make it easier for
our web applications to store
and manipulate and use data ultimately.
And the types of databases that we're
going to be talking about today,
are relational databases.
Which you can think of as effectively
storing data inside of a table.
So the example I keep
coming back to today,
is thinking about trying
to design a database that's
going to potentially
be used by an airline,
for example, to store
flight information.
And so you might imagine that if
an airline wanted to store data
about flights, in the
simplest of senses,
they might have a table that
looks something like this.
Where we have for each
one of these rows,
each row represents
an individual flight,
and each flight has an origin, New York,
Shanghai, Istanbul, somewhere else.
A destination, and a duration
in some units, maybe minutes.
So we might imagine
that you could design
a spreadsheet, for instance, that
would have three columns, origin,
destination, and duration.
And that you have one row
for each individual flight.
And each one of those individual flights
would have an origin, destination,
and duration, respectively.
So that's what we're
going to be trying to do.
Designing these sorts of tables, but
we're going to do so using technology
called SQL, or S-Q-L.
Structured query language,
which is a language designed to allow us
to very easily interact with databases.
Interact with tables of data that
have rows and columns, for instance.
And in particular, in
this class, we're going
to be using a particular version
of SQL called PostgreSQL.
Although there are a bunch
of different versions of SQL
that have slightly different features
and slightly different aspects
to their languages and
the way that they work,
but they're all very, very similar.
So once you know how
to interact with one,
it's very easy to see
the parallels in others
because the syntax ultimately
is very, very similar
from one language to the next.
So we'll go ahead and dive
right in and start just
by thinking of the first
thing that we would
want to do once we have a SQL database,
which is going to be creating a table.
So when we're creating
a table we're going
to need to decide what sort of
data that table is going to hold.
So we saw before in
the table that I showed
before that we had an origin column,
a destination column, and a duration
column for keeping track of different
information about a particular flight.
So all of that information
needs to have a particular type.
And so what we see here
are a couple examples
of the types of data that
a SQL database might have.
So data might be in
the form of an integer,
or a decimal, which can
have a decimal point,
serial is a common data type that
you'll see in SQL or in PostgreSQL
specifically.
A serial data type is
basically just like an integer,
except it counts automatically.
So anytime you add a new thing or
a new record into your database,
it will automatically
increase its value.
So this will be common If I want
to give each flight a flight
number, for instance.
Where I want the first flight
that I put into my database
to have flight number one, the second
flight to be flight number two,
third flight number three, and so on.
Serial is a useful data type that
will just automatically increase
its value one at a time.
Varchar is another
popular data type in SQL.
Varchar stands for variable
length of characters.
So if I wanted to store a string, just
some sort of text in a SQL database,
varchar is generally the data
type I would use for that.
And there are a bunch of
other data types as well.
Data types for storing time
stamps for a date and a time,
a Boolean value for true or
false, or an enum, which is just
a way of saying one of a finite
number of discrete possible values
then I might have.
We'll focus mostly on these top four
here though for the purposes of today.
So now that we have
these data types, we need
to use them in order to actually create
a database table that we're ultimately
going to use.
So the first thing we'll
look at is the syntax
for how we would go about creating
a table in Postgres, or in SQL
more generally.
So the syntax for doing that, we'll
notice we'll find in create dot SQL,
and this is an example of SQL syntax.
Of text that we're going to
be feeding into a database,
and that database is
going to interpret the SQL
commands, and use it in
order to create a table.
So what's going on here.
Online one, I said create table flights.
And so that's telling the database
to create a new table, just
a set of rows and columns, and
we're going to name it flights.
And then when I create the table, I
need to tell the database in advance
what types of information my
flights table is going to contain.
And so you'll see that
information, one on each row.
The first thing that each table
is going to contain is an ID.
This is a very common paradigm
when designing databases.
We want it to be easy to
reference individual flights.
To be able to say, get
flight number 28, and give me
some details about
that particular flight.
And it's much easier to
say get flight number 28,
than it is to say get me that flight
that goes from Moscow to New York,
for instance.
And so having an ID on every single
table that makes it easy to say,
this is the way that I want to reference
flights, is often very practical.
And so this ID is going
to be of type serial.
Recall that serial is just an integer
that will automatically count.
The first flight will be flight
one and the second flight
will be flight two, et cetera.
And then primary key just means
this is the primary way via which
I'm going to reference a flight.
That every flight is
going to have a unique ID.
And therefore, if I
tell you ID number 28,
that will map to one
and only one flight.
Then on line three, I'm starting
to define the other columns
that I care about.
What else goes into a flight?
Well, it needs an
origin, which for now I'm
just going to represent
to the Varchar, just text.
Just the name of the place from
which the flight is departing.
And I can additionally specify
other properties of this column.
In particular, I'm saying not null.
So SQL databases support the idea that
a column may or may not have a value.
And so in some cases, you might want
it to be the case that a column doesn't
necessarily need to have a value, but
can have a value if you wanted to.
But in this particular
case, if there is a flight.
It needs to have an origin.
And I don't want there to
be a flight in my flights
table unless it does have an origin or
city from which it's flying out from.
And so I specified not null to mean
that the origin column cannot be empty.
And if you try to add a flight
where the origin column is empty,
the database should reject that.
So you can begin to enforce constraints
on what the database will or will not
allow by adding these
additional constraints
after the name of the column
and the type of the column.
Likewise, on line four, I'm defining a
destination, which is also just going
to be a varchar for now, just text.
And I'm saying that this is not null.
And then finally, I'm specifying a
duration, this will just be in minutes.
Which is going to be an
integer, and also not null.
Every flight needs to have
a duration for how long
that flight is going to last.
And so this is the syntax for how
we would go about creating a table.
So once I have a SQL database up
and running, and generally speaking,
in order to do this
in Postgres at least,
you'd need to start
up a Postgres server.
So you can start a server up locally
on your own computer, which is
what I'll be doing in today's lecture.
But you can also find PostgreSQL
databases that are hosted online.
In particular, for the next project
we all will work on, what you'll do,
is you'll be interacting
with a Postgres database,
and that database will
be living on hiroki,
which is an online platform for
hosting web sites and databases.
And you'll be able to
connect to that database
remotely using your computer to
connect to some database that's online.
But for now I'll just
connect to the database
that's local to my own computer.
So I'll go ahead and
type, go into psql, it's
just a command that that lets
me type in PostgreSQL commands.
And I'm saying lecture 3 because that's
the name of this particular database.
In reality, if you
wanted to use the Psql
command with a database that
exists somewhere on the internet,
that database would have a URL.
And information about
where to get that URL
will be in the project instructions.
You can paste that URL in
after the word Psql in order
to access that database as well.
So now I want to create the
table inside of this database.
So I'll go ahead and
copy this text, this
create table flights with all of
those individual columns listed one
after the other.
And I'll paste it in here.
And now that table is created.
So after that semi-colon, that
means the end of the command,
Postgres said back to
me create table, meaning
that it successfully acknowledged the
fact that I wanted to create the table,
and it created the table for me.
And I can verify that
if I type backslash
d, which just shows me all of the
different parts of my database
that exist right now.
And I can see that I
have this table called
flights that exists right
now inside of my database
that I have now just created.
So that was how we might go about
creating a table in Postgres.
And we talked about the
various possible constraints
that we can place when
we want to create a table
and we want to define what
the individual columns do.
So I can specify that
a column is not null,
meaning that it needs to have a value.
I can specify that a
column must be unique.
If I don't want two people to share
the same value in a particular column,
I can enforce that in
the database by saying,
this is going to be a unique column.
So for instance, if you were
managing a database of users
and every user has a user name,
you might want that user name field
to be a unique column because you
don't want two different users to have
the same username, for example.
Primary key is just the single
way by which you're primarily
going to reference a table.
Usually, that's going to be
the ID, just the integer that's
going to keep counting every time that
keeps track of which individual row
you're referring to.
You can give columns a default value.
So if most of the time your column
is going to have a value of 0,
you might say this
column default 0 to mean,
if I don't tell you what to put in
the column, just put in a 0 by default
and only put something else
in if I tell you to do that.
So that can often save time.
And you can also add
additional constraints.
Check is another constraint
that you can use to say,
I only want to allow values that
are less than 50 or less than 100,
for example.
And you can begin to add more
constraints on those databases as well.
Questions about how to go about creating
tables or what we were thinking about
in terms of deciding what
types each column should have
and how that process works?
OK.
So we've created a table, and
we've created this flight's table.
But right now that table is empty.
It's got no rows in it, so it's not
actually storing any data just yet.
So let's go ahead and fix that.
The next thing we're
going to talk about is
how to go about inserting
data into a data table.
So this is what the syntax
for that might look like.
So if I wanted to take a flight and
insert a flight into my flights table,
this would be the syntax.
First I say insert into, which
is the SQL syntax to say,
I want to add data to one of my tables.
After that I specify the name of
the table that I want to add to.
So in this case, I'm going
to add to the flights table.
In the previous example, we
created a table called flights,
now I'm saying I want to insert
into this flight's table some data.
What comes next, and this could all be
on one line, but for the sake of space
it's broken out across a couple lines.
I've specified in
parentheses and separated
by commas, the names of the columns
that I want to add information for.
So I'm going to add an origin,
a destination and a duration
for this new row that I'm adding.
And you don't necessarily
always need to add something
for every single possible row.
Note that in this case,
I didn't add in ID
for example, which is the fourth
column in the flights table,
because ID is going to
automatically increment on its own.
We defined it as a serial
type just a moment ago.
And if you had a column that allowed
no values, was allowed to be empty,
you wouldn't need to specify it inside
of the insert if you didn't want to.
Or if it had a default value,
and you didn't specify it here,
it would just go to the
default value since you
didn't tell it what value to take on.
But in this case, I'm going to add
a new flight to my flights table.
I want to give it an origin,
a destination and a duration.
And now I need to tell
the database all right,
here are the columns that I'm going
to add for this particular row.
What should their values be?
And so we say values.
And then in parentheses, again, in the
same order that I defined my columns,
first I'm going to specify my origin,
New York, then the destination, London,
and then the duration,
415 minutes in this case.
And so this would be the syntax
for how to go about inserting
a record into that database.
Let's take a look at
what that would actually
look like if I were to go over here
and go back into this database.
I would say something to be
effect of, insert into flights.
And then I would specify origin,
destination, and duration.
And then these values, I want to specify
New York, and London and 415 minutes.
So semi-colon, I press
Return, SQL says that I've now
inserted that row into the database.
And so now I've added
a row to that table.
We haven't yet looked at how to see
all that data in the table just yet,
but I have inserted
that row into the table.
And just for the sake of adding
some sample data into that table,
I have here insert dot
SQL, which just contains
a whole bunch of other insert queries.
So these are all identical queries,
each one just has insert into flights,
here are the individual
values I want to insert,
and so I'm going to go ahead and
copy these and paste them in here.
So that now I've got a
couple of different flights
that are inside of this database.
So that was inserting
values into the database.
But next up is figuring out once
I have values in the database,
how do I look at them?
How do I see what's going on?
And the way to do that, is via
what's called a select query.
And so we've looked at create table,
which creates a new table, insert into,
which inserts a row into the database.
So writing to the database.
And select is a query that is
meant for reading from a database.
I already have rows in the database, and
now I care about accessing those rows.
Figuring out what they are,
under certain conditions,
and that's when I would go
about using a SELECT query.
So what does a select query look like?
Well the simplest select query that I
can run is a select star from flights.
Select star, star just means select
everything, meaning all columns.
And then from just means what
table do I want to select from,
because I might have multiple tables.
I ran Create Table once
to create a flights table,
but I could just as well have created
multiple other tables that are
stored as part of the same database.
So I'm saying select star from
flights, meeting select everything
from my flights table,
and when I press Return
and run that query, when I get back is
all of the data that is in my table.
And It looks very similar to the table
that we showed at the very beginning.
But this is SQL telling me that OK,
here are my six flights that I currently
have, each one has an ID, each
one has an origin, a destination
and a duration.
I only ever added the origin and
the destination and the duration.
But in particular, SQL has
automatically added the ID for me,
since I specified the
ID is the serial column.
And so this gives me
access to all of that data
just by running that select query.
Questions about inserts or selects
in terms of what we've seen so far?
All right.
So let's look at some of the other
types of select queries that we can run.
So select star from flights is
the first query that we saw.
So if our data looks
something like this,
with six different flights, each
one with an ID, origin, destination
and duration.
Select star from flight is going
to give me back all of that data.
Where this light blue just
means all of this data
is information that
you'll get back, and you
saw when I ran that query a moment
ago, this is exactly what happened.
All of this data was
displayed back to me
and handed back to me from the database.
But oftentimes, when I want to query
for data inside of my web application,
it's not the case that I need
all of the data all at once.
In fact, that'd be probably
quite a lot of data,
especially if your web
application starts getting big
and starts getting a lot of data in it.
So what I'd likely want to do is only
select the things that I care about.
So select star from flight,
means select everything
across all of the columns of the
table get me that information.
I could also do a query that
looks something like this.
Select origin comma
destination from flights.
Where I've replaced the star
meaning select everything
with just two columns,
origin and destination.
And so if I were to
run that query instead,
then the only data
that's handed back to me
are the contents of the origin
and the destination columns.
So I don't care about what
the ID of the flight is.
I don't care how long
the flight is, I just
care where am I flying
from where am I flying to,
and I'll get back that
data for all of the flights
that are inside of my table.
And so it's easy and SQL to
manipulate the data in the table
to get back only the
information that you want, just
by specifying in the
query what information it
is that you actually care about.
If you only care about two columns,
no need to select everything,
you only need to select the columns
that you actually want to get back.
There are other ways to
modify your queries as well.
So select star from flight, star
again means select all of the columns.
But I can add this WHERE clause.
And the WHERE clause
is going to allow me to
instead of restricting
what columns I'm selecting,
restrict what rows I'm selecting.
So normally, select
star from flights would
give me back all of the
rows inside of my database.
But chances are, if my
database starts getting big
and I have a lot of different
rows inside of my table,
I don't want to necessarily select every
single row every time I make a query.
I only care about selecting
particular rows that I care about
in this particular instance.
And so if I say select star
from flights where ID equals 3,
as you might reasonably
guess, what that's going to do
is instead of giving me back all
six of the rows inside of my table,
it's only going to
take row number three.
And this is going to
be the only row that's
handed back to me from the database.
Questions about that
WHERE clause so far?
That WHERE clause doesn't
have to be just a number.
It could be of any type as well.
So I could say, select star from
flights were origin equals New York.
And that's going to go
through the database,
find me any flight that's
flying out of New York,
and only hand me back that information.
So I ignore all the other flights
that aren't coming from New York,
and I'm only getting these two.
And you can imagine that being
practical if you're actually
running a airline's web
application, and someone
is looking for flights out of
New York, rather than query
for every single flight
and show all of that,
you would only want to query for
the flights that are flying out
of New York, for instance, and only
give back that data back to the user.
So you can also, in addition
to specifying equality,
use the WHERE clause to
specify an inequality.
So in this case, select
star from flights
where duration is greater than 500.
What that's ultimately going
to do for me is to say,
I want to select from flights,
but only where this duration
column is greater than 500.
Giving me back only the
longer of the flights that
happened to be part of
this database so far.
And so those inequalities
are supported as well.
And I can also use Boolean
logic, ands and ors,
to manipulate my queries as well.
So in this case, I've said, select star
from flights where destination equals
Paris and duration is greater than 500.
And as you might reasonably imagine,
the only row that's going to return here
is row number two.
Because the destination is Paris,
and the duration is greater than 500.
And even though there are other
flights that are duration over 500,
or have a destination of
Paris, those aren't selected.
Because I only want to select
ones that need both criteria.
And just as we can use an and inside
of a Boolean expression in SQL,
we can just as easily use an
or inside of that expression.
So I can say select star from
flights where destination equals
Paris, or duration is greater than 500.
And now all four of these are
going to be handed back to me.
This one, because it satisfies
both of the criteria,
row number three because the
duration is greater than 500.
And then rows four and
five because I have
it such that they both have
a destination of Paris.
And so there's a variety of
different types of SQL queries
that I can run in order to do
a variety of different things.
Questions on those so far?
OK.
Let's take a look at a couple of other
types of queries that we can run,
and a couple of others of these
examples of select queries.
So I'll go back into the database.
And we'll do a couple of examples
from ones we've seen before.
So select star from flights
gave me back everything.
If I only care about maybe the
destination and the duration,
I could say select destination
comma duration from flights.
And that gives me back only
those two columns, for instance.
And then I can begin to restrict it.
Select star from flights where
destination equals Paris.
And that gives me only the rows
whose destination is Paris,
and ignores everything else.
But where SQL starts to
get even more powerful,
is when I can start
using functions to begin
to specify with even more precision
what it is that I actually care about.
So SQL, for example, has an
average function, called AVG.
And so maybe I care about knowing
what is the average length
of a flight in my database?
So I could say, select the
average duration from flights.
Where AVG is my an average function
and being the average of the duration.
And what I get is 501 point 6667.
Which happens to be the average
duration of a flight in my flights
table right now.
And this is still just a select query.
It just happens to have a function
that's modifying one of the columns.
I'm modifying the Duration column.
But I could add a WHERE
clause to this, for example.
I could say select the average
duration from the flights
where the origin is New York.
So that will only give me the average
length of a flight from New York,
for instance.
And so you can begin
to take these pieces
and begin to put them together in order
to construct more interesting queries.
Another particular popular
function is the count function,
which just counts how many rows
that are returned from the database.
So I could say, for instance, select
COUNT star from flights to mean select
all the rows from flights-- that's with
select star from flights would do--
but instead of giving
back all the rows, just
give me the count of the number of rows.
Tell me how many rows
there are in my table.
And that will give me back six.
But I could equivalently
or relatedly, try
to say select COUNT star from
flights where origin equals New York.
And that will give me back two.
Because there were only two flights
that are flying out from New York.
And likewise if I were to type
in flights from origin of Moscow,
there is only one of those flights.
And so counts can be used
in order to figure out
how many entries inside of my table
have this particular property.
And so that can often be
very, very useful as well.
Questions about any of those
things that we've seen so far?
Other popular functions
like Min and max which
can be used if I want to
select the minimum duration
from my list of flights.
OK, the shortest flight is 245 minutes.
If I'm curious as to which
flight is that one that
has a duration of 245 minutes,
I could say select star
from flights where duration equals 245.
And that will tell me OK, that flight
is the flight from Moscow to Paris.
That's the flight with ID number five.
And so that database insertion.
A couple of other things you can do.
You can also specify
using the in keyword.
A range of possible values.
So in this case, I'm saying
select star from flights
where originated in New York and Lima.
And the result of that is that we get
flights that are from either New York
or Lima that get selected
as a result. And likewise, I
can start to do string matching.
Something like this.
Which is a situation in which I select
star from flights where origin like
%a%.
So common nowadays are things like
auto complete, or where you can
type in something partial
into a search query,
and you'll get back the full thing.
This is doing something very similar.
Select star from flights
where origin like %a%.
These percents stand in as
placeholders effectively where
they can represent any text.
And so select star from
flights where origin like %a%,
says select any rows from flights
where the origin has an A in it.
Right.
It has some amount of text,
possibly none, followed by an A,
followed by some amount of text.
And the result of that is that all
of the things that have an A in it
are going to be returned
back to the user.
And you can imagine this being helpful
if you have a database of information
that you want a user to search through.
And they might search for ultimately
a sub-string of the actual text that's
contained inside of the column.
But you can use that information using
a like query as opposed to origin
equals something, to say
find me all the things that
match that thing that the user
typed in, even if it's only
a sub-string of the total thing.
And so that can be useful as well.
And so we looked at some of these
functions, sum and count and Min
and max and average that
can be used as well.
All of these are just different
ways of selecting data.
Of specifying which
columns we want to select,
of specifying which
rows we want to select,
and restricting what rows come back.
And ultimately that
is insertion of data.
Question?
AUDIENCE: So the last
query that you executed,
I guess my question is can you execute--
not a query, I mean a function.
Can you execute a function
in the where clause?
So the last query that you executed,
you did it in two different queries.
You got the min and then you looked at
the value and put it in a second query.
Could you have done that in one query?
BRIAN YU: Good question.
So the question is
about whether we could
consolidate some of these multiple
queries into a single query.
We can, and will take a look at
ways to do that in just a moment.
So that was selection of
data, and how we would
go about reading data from a database.
But what happens if we need to
change data that's already there.
Right.
So that might be the
other common use case.
That I have data, I know how to
insert data into the database.
I know how to read data from a database.
What if that data needs
to be updated or changed?
So that is where the update
query is going to come in.
Where that is going to allow us to
take data inside of our database
and modify it in some way.
And here is what that
query might look like.
So the query will start with the word
update, which is just the SQL key word
to say I want to update some
data inside the database.
Following that is the name of the table
that I want to update, in this case
the flights table.
Then I say, set duration
equals 430, I'm setting
the duration of some row to
430, or possibly multiple rows.
And where is telling me
which rows I want to update.
So in this case, I'm saying,
update the flights table,
set the duration to be 430 minutes,
but only do that for columns
where the origin is New York and
where the destination is London.
So any other columns that
don't meet that where query,
are not going to be set to 430.
If I omitted these two
rows altogether, and just
said update flights set
duration equals 430,
what that would have the effect of
doing is changing every single row
in my table to have a duration of 430.
Which is likely not what you
want in this particular case.
So the WHERE clause is very
important for specifying
these are the only rows that I
actually care about modifying,
that I actually care about editing,
as I go about updating that table.
Questions about the update query?
Yeah.
AUDIENCE: Is there a control V?
BRIAN YU: Is there a control V. SQL
does not have a Control V. However,
one common paradigm you'll see
as databases start to get bigger,
is the idea that you always want
to keep database backups around.
You always want to take a database
and have some second copy of it,
such that if something goes
wrong on the main database
you're using you can always restore it.
And if you begin to use databases
in production environments,
Amazon Web Services, for example,
offers as one of its services
a relational database just like this.
And it offers built in ways to help
you keep backups of your database
such that if you mess
something up like this
you can have some way of going
back to what you had before.
Good question.
So we've been able to
add data into our tables,
we've been able to query
for data from our tables,
we've been able to update
data in our tables.
The last thing that we
might reasonably want to do
is delete data from our tables.
Take a row and just
get rid of it entirely.
And the key word in the SQL for that,
is quite logically just called delete.
And that might look something like this.
Delete from countries where
destination equals Tokyo.
That is going to, sorry,
this should be flights.
Sorry, delete from flights
where destination equals Tokyo.
That will go through the table
deleting all of the flights that
have that particular destination.
So questions on creating data or
updating data or deleting data
from our database?
AUDIENCE: Let's say I
delete one row and then add
a new row what would happen to ID?
BRIAN YU: Good question.
So the question is, I delete one
row, and then I add another row.
What's going to happen to the IDs?
So maybe I have IDs one through six, I
delete the thing with ID number three
and then I add a new thing.
Is it going to be number
three or number seven?
The answer is that generally
it will be number seven.
That the numbers just keep adding
up and they don't fill in the blanks
from before.
As we'll see in a moment,
there's good reason for that,
and the reason has to do with the
fact that because we will oftentimes
want to represent relationships
between tables, which we'll soon see,
it can often be very, very
helpful to be able to know
that if I had the thing with ID
three, nothing else will ever
have that ID again.
Everything else will always have
a later ID, even if I go back
and delete that data.
So yes, it will just keep counting
even if I delete older things that
existed in the database before.
So other clauses that could be useful.
And here we'll get into answering
your question about how you might be
able to group multiple things together.
So let's say I wanted to
select all of my flights.
Select star from flights.
And right now I'm
getting back six results.
But you might reasonably
imagine that I don't
want to get back all of the results,
especially if the table is large.
I only want to get back a couple.
And so there is a keyword
called limit which
means I only want to get back
a certain number of results.
So select star from flights limit two,
is going to mean get all the flights,
but only give me back
a maximum of two rows.
And so I only get back
two rows from there.
But of course this is
probably more useful
if this data is ordered in some way.
Right?
I don't really care about getting
the flight with ID one or ID two,
but I might care about, for
instance, getting the two shortest
flights in my table, for example.
And so I would want some way of sorting
this table by a particular column.
And the way to do that is
via the order by keyword,
which is one that will prove
quite useful if you ever trying
to get a sorted list of your data
and only extract out parts of that.
I can say select star from
flights, order by duration,
and I'll say ASC for
ascending, although strictly
speaking that's optional in this case.
And what I get now, is rather than
all of the flights in the order
that I inserted them like
they were in before, now I
get them in order of duration.
With the shortest flight at the
top Moscow to Paris 245 minutes,
and the longest one, Shanghai to Paris
760 minutes all the way at the bottom.
And so if I only wanted to get the
three shortest flights in at my table,
I might say select star
from flights, ordered
by duration ascending, limit three.
And that only gets me the three
shortest flights in my database.
And just as there is a key
word ASC for ascending order.
There's also DESC for
descending order, that
gets me all the flights
starting with the longest one
and going down until the shortest.
Questions about any of that?
Yeah.
AUDIENCE: Can we use the roll back
commit to give us the transaction?
BRIAN YU: Good question.
Can you use roll back and commit.
Yes.
We'll talk about how you would go about
committing changes a little later.
But yes, you can rollback
changes if you make changes
and realize they aren't in the
changes that you wanted to make.
That's similar to the
control V type of feature,
but you have to tell the database that
you are starting a quote "transaction",
and then start adding
to that transaction.
We'll talk more about
that in just a moment.
AUDIENCE: Is there any tool that
I could use for this [INAUDIBLE]
SQL like a [INAUDIBLE]?
BRIAN YU: Good question.
Is there a tool for
interacting with the database?
Yeah, great question.
So there are a couple of
different ways that you can
go about interacting with the database.
One is via the command
line, like we're doing now,
where I'm just typing
in SQL commands directly
into PSQL, which is a program that
lets me run commands in the database.
One that we're going to
provide for you in this course,
which will prove quite useful to
you as you work on your project,
is called Adminer.cs50.net.
So Adminer is a third
party database service
that lets you interact
with a database online.
And so what you'll do once you get
a database for the next project,
is you'll likely want to
go to Adminer.cs50.net,
you'll paste in the credentials
for accessing your database,
which you'll get once you
start working on the project,
and from there, there are tabs from
which you can run SQL commands,
from which you can look at all the
data in a nice graphical interface.
And so that'll just make it a little
more convenient to look at that data
as well.
So you can either do it through
Adminer or an interface like Adminer,
there are a whole bunch of
services like this out there.
Or you can do it via the command line.
Or as we'll see later
today, we'll find ways
of using our Python code to be able
to interact with our database such
that we can have a flask application
that itself is running SQL commands
and interacting with
the database as well.
But we see all that in just a moment.
Good question.
So couple of other things
I want to say about how
you can go about selecting data.
We were talking about how
to go about performing
what might ordinarily
take multiple queries,
but combining them into a single query.
Let's say that I wanted to find out what
are the most popular places from which
people are flying.
So I say select star from flights, I see
you know, New York, Shanghai, Istanbul.
What are the most common
origins of flights?
Well I can run a query like this.
Select origin and count star.
Remember, that origin is just
going to select the origin column.
Count star is going to select the
count of the number of rows that are
being returned from my flights table.
And group them by origin.
And so what that's going
to say is, group by
is going to say, take all
of the rows and put them
in groups based on their origin.
So all the flights that are
from Moscow go in one group.
All the flights from New
York go in another group.
And then count star is going
to say, well count them up.
How many of them are there from
each of these individual locations?
And so select origin, count start
from flights, group by origin,
give me back something that
looks a little like this.
I have a list of all of
the different origins.
I have five different origins from
which flights are coming from.
And then I have this count
column which is telling me
how many flights have
that particular origin.
So in this case, New York has
two and all the rest have one.
And so what if I wanted to
select only the origins that have
more than one flight going out of it?
I want to select the popular places
from which people are flying,
and my definition of
popular, in this case,
is it's going to be a flight
that has, or a location that
has multiple flights coming from it.
So I can say select origin,
count star from flights
grouped by origin, same as before.
But after a group by, you can optionally
specify what's called a HAVING clause.
Which is similar to a WHERE clause,
except it follows the group by.
So I group by origin, but I only want
to select those origins that have what
particular property?
Well, I want to select the ones that
have count star greater than one.
So to break down this query.
It's a little bit long.
I'm selecting the origin, this is
going to be a list of locations.
Count star, this is how many flights
are from that location from my flights
table.
Grouping them by origin, this is how I
get all the New York flights together,
for example, and all the
Lima flights together.
And then having count star greater
than one is going to mean I only
want to get the origins that
have a count of greater than one.
And so the only thing that
I get back from that query
is New York, which in this case
has two flights coming out of it.
So that would be high might do a
more complicated query to get back
particular information
that I care about.
Questions about how that worked?
OK.
Next up, we're going to talk a
little bit about foreign keys.
So a foreign key is going to be a
method that we're going to use in order
to connect multiple tables together.
And so SQL is often called
a relational database.
And it's called a relational
database because one thing that
makes it quite powerful is the ability
to take multiple different tables
and relate them together in some way.
So you imagine that I
have a flight table,
for example, that's got an
ID, origin, destination,
duration just like we had before.
But you might imagine that as
my airline application gets
a little more complicated and
I add more features to it,
I might start to need more
out of this application.
I might, for example, want
to keep track not only
of the name of the
place I'm flying from,
but I might care about the airport
code of that particular location.
So I've just picked out a
representative airport code
from each of these individual cities.
And so New York I've picked JFK,
and each of these different origins
has a origin code, and each
destination has a destination code
to tell me which specific code
represents the airline that I'm
flying out of or flying into.
But now as it starts to
get more complicated,
you can start to see some of the
potential design improvements
that we could make to this table.
In particular, notice that I've
repeated a whole bunch of data.
For instance, I specified here,
New York, JFK, and here also
New York and JFK.
And therefore there's a
lot of redundancy here.
And it might be nice if I could
treat this information as separate.
If I could say, I want to keep track
of locations as its own entity.
And then I want to keep track of flights
that are somehow related to locations.
So I keep track of a table
of locations, or places,
and then I keep track of a table
of flights where my flights somehow
know about that table that contains
New York and Shanghai, et cetera.
So what I might do is define
a table called locations,
that looks something like this.
Where each one has an ID, each one
has a code, and each one has a name.
This is just like the
stuff we've seen before,
except now I have a
table exclusively just
for representing individual locations.
And then what I would do
in my flights table is,
instead of having an origin
column and a destination column,
I'll have an origin ID column,
and a destination ID column.
These columns technically
could be called anything.
But by convention, it's
something underscore ID.
To mean this is the ID of
the origin of this flight.
And which flight or what
location has ID number one?
I could go back to this locations table.
Here's the thing of
ID one, OK this flight
must be flying out of JFK in New York.
I go back to my flights table,
where is it flying into?
Its destination is four,
destination ID four.
So I go back to the locations table,
here is the thing with ID four,
that's London.
So you can begin to use these
numbers just to represent locations.
That way I don't need to
have New York, JFK, New
York, JFK repeated over and
over again inside of my table,
I just need to have an individual
number to represent here
is the location that I'm flying out of.
Now this might seem harder to
read, in particular for us humans.
Looking at this, this is
perhaps even less clear
than the table was before
if I want to figure out
like where is this flight flying
from and where is it flying to.
Now I need to take this number three
and look it up in a different table.
But SQL databases are actually
quite good at dealing with this.
And this is where they
really excel and thrive,
is having these
relationships between tables
and being able to connect them
in a way that's meaningful.
Questions about this general idea
of what we're trying to go for here?
And you can imagine extensions
that we could make to this.
If we wanted to keep track of
passengers on individual flights?
Well then I might have
something that looks like this.
Right, each passenger also has an ID.
Each passenger has a name.
And each passenger has a flight
ID, namely the ID of the flight
that they are a passenger on.
And so I've got two people
registered for flight number one,
two people registered flight
number two so on and so forth.
Questions about what
we're going for, yeah?
AUDIENCE: So you do this relationship
to like save space for example?
BRIAN YU: It can be used.
So the question is why would
we be using this relationship?
Certainly, we can be using it to save
space, because now I don't need to,
if I have long pieces
of text I don't need
to have them appearing multiple
times throughout the table.
I can just have it appear in
one row, and then reference it
by a number in some other table.
And the other advantage is that
it helps to keep things organized.
In particular with passengers.
You can imagine, if we only had this
flights table as we had it before,
where all of the flights had
origins and destinations,
it would be pretty tough now to in this
same table, keep track of passengers.
Right, there's no easy way to add
like a column for a passenger,
because it might be multiple
passengers, for instance.
And so it makes a lot more
intuitive sense to say, you know,
flights are one type of entity, we'll
keep track of that in one table.
Passengers are a whole
different entity, let's keep
track of that in a different table.
But there's relationships
between the tables.
Each passenger is
associated with one flight.
And so that's what the flight
ID on the passengers table
is ultimately going to be used for.
Good question.
So let's take a look at
how that might actually
work in terms of how we would
go about creating foreign keys.
And a foreign key it's just
a fancy way of saying we're
referencing the key
from a different table.
So my flight ID column is referencing
the primary key of that flights table.
So what I might do is create a
table, and I'll call it passengers.
And my passengers table
will have an ID, which
will be a serial primary key,
just like the other table.
It will have a name for
who the passenger is,
and that should be text
Varchar, not null in this case.
And then finally, a
flight ID, which is going
to be an integer that
references the flights table.
In particular, flight ID is
going to be an integer that
is going to reference the ID of
whatever is in the flights table.
So I'm linking these two
tables together effectively.
And so I've created that table as well.
So I'm going to go ahead and I have
a couple of these insert lines.
These insert lines are just going to add
a couple of passengers into my table.
So notice I'm inserting into passengers,
inserting a name and a flight ID,
and their values are just their
name and the ID of the flight
that I'm adding them to.
So I'm going to insert all those.
And now if I select
star from passengers,
you'll notice that I have
seven rows, each one of them
is an individual passenger, and
each one is an individual flight ID.
And so I can see, OK, let's say I
wanted to know what flight is Alice on?
So I could say select star from
passengers where name equals Alice.
Then I get, OK, Alice is
on flight ID number one.
And so now I want to say, OK,
what's flight ID number one.
Select star from flights
where ID equals one.OK,
Alice is on the flight going
from New York to London.
And that would be my answer.
But in order to do this,
I had to do two queries.
Right, I first queried
the passengers table,
to say what flight number is Alice on?
And then I queried the flights
table to say, all right,
which flight has ID number one?
And it would be nice if I
didn't need to do two queries.
If I could join these two queries
into one, and in fact in SQL you can,
using a special type of
syntax called a join.
And joins are very useful once you
start dealing with multiple tables.
Because when a join
allows you to do is take
two different tables that are
related in some way, and group
them together in one when
you try to select them.
So what might that look like?
Well what I'm going to do, is I want
to select now from my passengers table
and my flights table simultaneously.
And in particular, I want to select from
those tables under certain conditions.
I want to select maybe
the origin of the flight
and the destination of the flight and
the name of the person that's flying.
So origin and destination are
both columns of my flights table.
Name is a column of my
passengers table, these
are two different tables that I'm
selecting information from now.
So I'm going to say, OK, let's
select this from my flights table.
But I want to join this
with my passengers table.
So I'm selecting the origin,
destination, and name,
those are the columns I care
about from the flights table.
Joining that with the passengers table.
But the last step is telling my query
how these two tables are related.
I need to tell my table, in
particular, my query in particular,
what do I want to join them on?
What is the relationship
between these two tables?
And in particular, the relationship
here, is that passengers--
and I'm going onto a new line, but
this could theoretically be one line--
passengers dot flight ID
equals the flights dot ID.
So I'm joining the flights
and the passengers table,
and saying join these
two tables together,
and here is how they're related.
If you take the passengers table
and get the flight ID column,
that should match the ID
column of the flights table.
And so I press Return, and
this is what I get back.
I get back Alice on a
flight from New York
to London, Bob on the flight
from New York to London as well.
And so I get back all of these
rows from two different tables that
have now been matched up, that have been
grouped together into a single table.
And so if I wanted to find out what
flight is Alice on, for example,
I might reasonably do the
exact same thing as before.
Select origin, destination, name from
flights, join passengers on passengers
dot flight ID equals flights
dot ID, where name equals Alice.
And that will tell me that Alice is
on the flight from New York to London.
Questions about that joins
syntax and how that worked?
So it was how we were taking two
tables and combining them together.
Yeah.
Yeah.
AUDIENCE: I actually have
back from the reference,
when you say reference when you're
defining the tables the passengers
table.
When you insert it, if you
put some ID that was illegal.
Would the row not go in?
BRIAN YU: Good question.
So what is this
references key word doing?
What is it specifically doing.
And yeah, it helps, what
it ultimately helps to do,
is it helps to enforce constraints to
make sure that I can't do something
wrong, for instance.
So if I try to-- so Alice we just
saw was on flight number one, right?
So if I now tried to delete from
flights where ID equals one,
what I'm going to get back is an error.
And the error that I get back is
that key ID one is still referenced
from table passengers, for instance.
So something in the passengers table
is referencing flight ID number one,
and therefore I can't
delete the flight that
has ID number one
because that would result
in a violation of that constraint.
So the reference of
this key word is used
to help us enforce
that type of constraint
so that we can't do something wrong.
If I try to delete a flight while there
are still passengers registered for it,
the database will physically stop
me from being able to do that.
Good question.
Yeah.
AUDIENCE: Is the connection
between flight ID and ID
sort of implicit in MySQL?
BRIAN YU: So, OK, the question is,
is the relationship between flight ID
and ID implicit?
So generally speaking, when
we use the references keyword,
we are referencing the primary
key of some other table.
So because in the flights
table I said that the ID was
the primary key, the main thing that I'm
going to be referencing that table by,
it assumes that yes, I mean the
ID column of the flights table
is the thing to reference.
There are ways to specify
a different column
that you want to reference instead.
But generally speaking,
the references keyword
is almost always used with referencing
an ID of a different table.
Good question.
So what we saw just now
was an example of a join.
And the way that that join worked
was it took two different tables
and it, based on a particular
condition, matching up
the flight ID of the passenger
with the ID of the flight,
was able to find all the matching
rows and give me them back.
But what you noticed in that
query, was that if I go back to it,
what I get back or only the matches.
In other words, I get back only
the origin and the destination
and the name where there was a match
between the name of the passenger
and the flight itself.
But there might be flights that
have no passengers, for example.
And so there are
different types of joins.
The join we just did was
called an inner join.
And in SQL, when you
just specify the word
join without specifying
what type of join,
it assumes that you mean an inner join.
In other words, only get
the things that match.
You also have the option of
doing what's called a left join.
So what left join is
going to do, is it's
going to take the table on the left--
in this case the flights table--
and it's going to make sure that
all of the rows in the flights table
are included in the final result,
even if they don't have a match.
So if I left to join flights and
passengers together, what I get back
is this.
Which is the first seven
rows are the same as before,
it's the result of that direct
matching, but I also get back
the flights that didn't have a match.
In particular, Moscow to Paris
doesn't have any passengers on it,
Istanbul to Tokyo doesn't
have any passengers on it,
but I still get those rows back as well.
And likewise there's an equivalent right
join that selects all of the things
from the right table, even if there
are no matches on the left table.
Where the left table is just
the one that comes first.
So those are different
types of joins and those
can be used in order to help to combine
data coming from different places.
Couple other things we'll say about
SQL, and then I'll take a brief break.
So joins.
We talked about
different types of joins.
Another thing that's
frequent in databases,
especially as databases
get large, are indexes.
And when an index is, is you can think
of an index like an index in a book,
for example.
It's an easy way that if
I'm looking for something
I can quickly reference something else.
And so what you can do in SQL is
add an index to a particular column.
To say that if I'm going
to be referencing flights
by their origin a lot,
it might be helpful
if I had an index of the
origin of all of my flights.
Some way that I could very
quickly say, you know,
if I want to look up all
the flights from New York,
I just go to New York in my index and
look up all the flights from New York.
And make that faster.
So I can create an index
on an individual column,
and the syntax for that
is just create index,
followed by the name of the index--
which can be an arbitrary name--
and then what table and what column
you want to create an index on.
And what that will do
is it will speed up
any time you try to select from that
table, based on that particular column.
So if we have this way
of creating an index,
that just makes it easy to look up
a particular value from a column
and go to it immediately--
or go to it much more quickly than
you could otherwise-- why wouldn't we
just create an index on every column?
I mean, can anyone think of
reasons why you might not
do that just every column?
Yeah.
AUDIENCE: You said creating
a new table under the hood,
with the [INAUDIBLE] being the
column that you mentioned, would it
take a lot space maybe?
BRIAN YU: OK, great.
So space is one potential concern.
That creating the index
does, in fact, take up space.
And the other potential concern is time.
That while creating an
index on the origin column,
for example, of my flights
table, might make it faster
to look up a flight by its origin.
If I want to look up New York,
all flights from New York,
I can easily look that up.
What it will do is
actually slow down things
like updating data or inserting data.
Because If I now insert
a flight from New York,
for instance, it's not
as simple as before.
I can't just insert the
flight into the table.
I need to insert the flight into the
table, and then go into the index
and update the index so that
if I look for New York again,
I'll be able to find it more easily.
So it's a trade there and
just one to be mindful of.
That as data gets big, it can often be
helpful to add an index to your table
in order to make it faster
to look up that data.
But the trade off, the cost
there, is A space, but B,
some time in terms of the time
it takes to update that data
or insert that data as well.
One final thing we'll take a look
at before we take a brief break
are nested queries.
When we try to combine multiple
queries together into one,
beyond just what we've seen before.
So we saw before we had a flights
table with each column, ID, origin,
destination, and duration.
We had a passengers column where each
passenger has a name and a flight
that they're associated with.
So if we look at a
query like this, this is
a relatively on the more sophisticated
side of queries that we've seen so far.
Select flight ID from passengers.
Group by flight ID having
count star greater than one.
Any guesses as to what
that query is going to do?
It's all syntax we've seen before,
but take a moment to look at it
think about it.
What is this query going
to do when I actually
try to run this on the passengers
table which is right here?
So each passenger has an
ID, a name, and a flight ID.
Thoughts on what will happen, yeah?
AUDIENCE: [INAUDIBLE] select
passengers with more than one flight.
BRIAN YU: Good thoughts.
Selecting passengers
with more than one flight
it's very close, although not quite.
Other ideas?
AUDIENCE: When the-- they want
flights that have multiple numbers.
BRIAN YU: Exactly.
So what we're doing, since
we're grouping by flight ID,
we're saying take all the passengers
that have flight ID number
one, put them here.
Take all the flight passengers that
have flight ID two, put them here.
Group them based on their
flight, and then count them up.
And so in particular,
here, we're going to only
be selecting flight IDs when that flight
has more than one passenger on it.
And so the result that I'll get is a
table that looks something like that,
because flight IDs one, two and
six are the only flights that
have more than one passenger on it.
But what if then, so now
I've created this syntax,
select flight ID from
passengers grouped by flight ID
having counts star greater than one.
Which is this long syntax that
means get me all of the flight IDs
for flights that have
multiple passengers.
What if now I wanted to select
all of the flights themselves
that have multiple passengers?
I want to select from
my flights table, but I
want to select from the flights
table only on the flights that
have more than one passenger.
Well, I can begin to nest these
queries within each other.
And this is where SQL starts to
get a little more complicated.
I can say, well, if this is the query
that gets me all of the flight IDs
that have multiple passengers,
well let's try this query instead.
These two lines are the same
but wrapped in parentheses.
But now I'm going to select star from
flights where the ID of the flight
is in the result of this query.
So this query, these bottom
two lines in parentheses, those
were the lines that were
saying get me all the flight
IDs that have multiple passengers.
And this top line up here
is saying select everything
from flights, all the
columns from flights,
as long as the ID of the flight
is in the result of this query.
Where this is the result of the query.
So what's going to happen is if I
select flight star from flights where
ID is in the result of this
query, these are the rows
that get selected, one two and six,
because those are the flights that
have multiple passengers on them.
So if this starts to
get confusing, I usually
recommend just breaking it
down into one query at a time.
First take a look at these two lines and
make sure you get a sense for why is it
that these two lines are giving me this
table, the one that has just the flight
IDs with multiple passengers on them.
And once you have that in mind,
if the result of this is just one,
two, six, then this big
query is really just saying,
select star from flights where
ID is in one, two, and six.
Where the ID is one of those three,
where we got those three based
on running the second query.
And so SQL allows you to
nest these queries together
to begin to do more complicated things.
To combine multiple queries
into one, to get really
just the data that you care about by
relating multiple tables together.
We'll take a quick break, and when we
come back, we'll talk a little bit more
about SQL and dive in to how we
actually use this inside of a web
application using flask.
OK, welcome back.
So right now what we'll do is we'll take
a step back from the syntax of the SQL.
We looked at how we will go about
creating tables and inserting rows
into tables.
Selecting rows from tables using a
whole variety of different syntaxes,
including joining
multiple tables together.
And we'll talk about a couple
of security considerations that
should be going through
your head if you think
about how to interact with databases.
And after that, we'll
dive into actually using
SQL to build them into the
flask with applications
that we started building
last week, and see how can
we can use SQL to leverage them
to be quite a bit more powerful.
So the first thing that we want
to talk about is SQL injection.
And so the context for this is imagine
that you're running some web site,
and that web site
allows users to log in.
You've got some form
on your website that
has a user name field and a password
field, as many web sites do,
that just let the user type
in the username and password
and log themselves in.
Now start to think, once a user submits
that form, what sort of SQL command
might you be running in order to verify
whether or not that user is actually
successfully liking it?
How might you go about doing that?
If you had a, imagine you had a table
that had at all the users in it.
That had usernames and passwords
for instance, what type of query,
and what would it look like?
Yeah, it's going to be
some kind of select query,
where I'm probably
selecting from a database.
In fact, it might look
something like this.
Select star from users
where username is-- and I've
highlighted this in yellow to mean
this would be just substituted
in for whatever they type in--
username is their username,
and password is password.
And so you might imagine that if
the user typed in their username
and password correctly,
well then there's
probably going to be a row somewhere in
that table that has a matching username
and has a matching
password, and therefore
the select will return that row.
And if the user didn't
input valid credentials,
either because the user name
didn't exist or the user name did
exist but they typed in the wrong
password, then the result of this query
would be that no rows got returned.
But of course, in reality,
the query probably
doesn't look exactly like this.
For security reasons,
most databases will not
store the actual password
of a user and will instead
store a hash of the password.
But more on that a little later.
But you imagine that the query is going
to look something along these lines.
So if a user comes along, they type in
username of Alice password is 12345.
Then the query that ultimately gets run
is going to look something like this.
Where username and password are
substituted in for Alice and 12345.
So I'm selecting from the users
table where the user name is Alice
and the password is 12345.
So if Alice is indeed in the
database and her password is 12345,
then this select query
will return that row.
And if either of those things are
wrong then it won't return a row.
But now imagine what happens if a hacker
is trying to hack into my website.
Is trying to do something
malicious and guesses
that maybe the SQL code that I'm
using looks something like this.
Maybe I'm running a SQL
command, like select star
from users where user name is something
and password is something else.
And a clever hacker might be
able to do something like this.
Where in the user name
they type in hacker,
or whatever user they want to
get access to, and the password
they do something strange.
They type one single
quote or single quote
one single quote equal
a single quote one.
Now this seems sort of strange.
Why would anyone ever type
this in as the password?
But let's go back to
the SQL query and look
at what actually happens now when we
try and take this username and password
and try and run it through our database.
Well, what will happen is we select
star from users where username is
the username and password is password.
And if we plug-in
hacker as the username,
and one single quote or single
quote one single quote equals one.
Well, the result of this
as interpreted by SQL
might reasonably be, select star
from users where username is hacker,
or whatever username the hacker
is trying to get access to.
And the password is one or the
string one equals the string one.
Well of course the string one equals the
string one, those are the same thing.
So it doesn't even
matter what this password
is, it's still going to return back
to the user with that matching row
even if the user didn't
get the right password.
Which in this case they didn't.
And so this is an example of
what we call SQL injection.
What happens if we're not careful
about the way that we run our queries
and we let some user potentially
input arbitrary SQL commands.
They can potentially take over
what secret code is running just
by allowing the user
to type something in
and using that input directly
inside of a SQL query.
And so in this example
SQL injection can be
used to get access to
someone else's user account.
But you might imagine an even
more malicious situation.
You could imagine that we
talked before about how
if you do like an update something
from the particular table,
and you don't specify
what's being updated.
It could just update all
of the rows in the table,
or likewise you can delete
all the rows of a table
or delete a table altogether.
If you let users just
type something in and you
take whatever they type in to
a form and run it as SQL code,
then a user could theoretically type in
whatever SQL they want and manipulate
your data however they like.
Delete your data however they like.
So this is a pretty
significant security concern
that you want to be thinking about.
And so what's the solution to this?
Well, in SQL you can do what's
called escape particular characters.
So if the password actually has
a single quotation mark in it,
you don't want it to be interpreted
as the end of the password itself.
You want it to be interpreted as
the single quotation mark character.
And in SQL you can do that by, I
think it's doing two single quotation
marks in a row back
to back, just to mean
this should be an actual
single quotation mark, and not
the end of the password field.
But ultimately, the long
story short of this,
is that we want to
come up with ways that
allow us to avoid this type of problem,
rather than just directly substituting
user input into the query.
We want to be intelligent
about it, and make
sure that we are escaping
or sanitizing our input
as it might be called in
order to make sure that we
don't run into situations like this.
And so in a moment, when we transition
into looking at actual Python code
and how we can use that Python
code to be able to run SQL queries,
this is something we're
going to be keeping in mind.
Because if the user is
ever typing in information,
we want to be careful not to just
plug it directly into the query,
we want to make sure that we don't
run the risk of SQL injection there.
But questions about SQL injection
and why this has the potential
to lead to a security risk?
OK.
One other thing related
to SQL that's worth
talking about, especially with security,
is the idea of race conditions.
And so a race condition comes about
when there's a possibility that you've
got a database somewhere
that's potentially accessed
by multiple different
people, and you run
the possibility of things
going wrong if people
try to do things at the same time, and
operations execute in a funky order.
And so what do I mean by this?
Well, imagine that I've got a database
that represents a bank, for instance.
Where I've got a bank database where
every row is an individual person who
has an account at my bank, and we have a
column representing how much money they
currently have in the bank.
So imagine that I've got
some data in my bank account,
and the data is storing that right now.
This particular user has
$100 in their bank account.
And the person goes
up to an ATM somewhere
and they try to withdraw $100.
Well if you are running
the bank's database,
and the user tries to
withdraw $100, what's
probably the first query you
should run on this database?
Doesn't have to be a precise
query, but what would you
want to know from the database if
a user says please withdraw $100?
AUDIENCE: Do you have $100?
BRIAN YU: Do you have $100?
Great question.
So what might that look like
in terms of a SQL query?
SELECT balance from bank
where user ID equals one.
Here I'm assuming I've
got a table called bank
that's got a column called
user ID, representing the user,
and a column called balance.
So imagine that I have that.
So imagine that this current
user ID is user ID number one,
and I want to select a balance from the
bank to make sure that whatever balance
I get back it better be
at least $100, otherwise
I don't want to let you
withdraw $100 from the bank.
So I run the query, and right now
in the bank account I have $100.
So the SQL query gives me
back 100, we're all set.
Next thing that happens is
that I actually get the $100.
But what query needs to
run now on this table?
AUDIENCE: Updating.
BRIAN YU: Updating, exactly.
So I want to update the bank.
Setting the balance equal
to balance minus 100
where the user ID is equal to one.
So I update the bank balance, I
want to decrease the balance by 100.
That's what the set is doing.
And I don't want to decrease
everyone's balance by 100,
I only want to decrease by
100 for user ID number one.
And so I get the $100 back and my bank
account balance goes down to zero.
And that interaction seems fine.
I selected to make
sure I had $100, then I
updated the bank to make sure
that it reflects my new balance.
But what now happens if two
people, for some reason,
are going to different ATMs,
but they both have my ATM card.
They're both using the same ATM card
and they're going to a different ATMs.
And they're doing this at the same time.
And they're both trying
to withdraw $100.
What happens?
So you might imagine that if they
both press withdraw at the same time,
same thing will happen on the first ATM.
right, I need to make sure
that I actually have $100.
Select balance from bank
where user ID is equal to one.
Do I actually have $100?
Well Yeah, I've got $100.
But then person over here on ATM
number two is doing the same thing.
They're running the same query.
They're also selecting a balance from
the bank where user ID equals one,
and maybe this is happening
a split millisecond
after that first select query happened.
But it doesn't matter,
there's still $100 in the bank
and so this still gets me back 100.
So what happens now over
here on side number one?
Maybe right after this happens
and right after that happens, now
we're running our update.
We update the bank, set the
balance equal to balance minus 100
where user ID is equal
to one, same as before.
And same as before, I get the $100
back and my bank account balance
goes down to zero.
But then the problem comes
in over on this side.
Because I've just, we decided that the
way to withdraw money from the bank
was a 2-step process, right?
Number one, select.
Make sure you've got $100.
And if you do, then fine,
update in order to subtract $100
and then give the user $100.
From the perspective of ATM number two,
I've just selected and I have $100.
So we're all set.
ATM number two thinks it's
totally fine to withdraw.
ATM number two runs update the
bank, decrease the balance by $100
where the user ID is number one.
ATM number two gets back $100 and the
bank account goes down to negative 100.
But now from a bank account with $100,
I've managed to extract $200 from it
by taking advantage of race conditions.
By running things simultaneously.
And even though the logic of
select, make sure I have the money
and update feels like
it's the right logic.
When you consider what happens
when there's concurrency,
when multiple things are
happening at the same time.
These are the sorts of concerns
that begin to take place.
So what's the solution to this?
How do we solve the problem of
dealing with multiple people that
could be using the
database at the same time
and trying to prevent
something like this
where someone can get $200 from the ATM
when they only have $100 in the bank?
AUDIENCE: Transactions?
BRIAN YU: Transactions.
Exactly.
And so the idea of a
transaction is that I
want to, or the idea of its transactions
and the idea of locking the database.
That I want to lock the
database first, to make sure
nobody else can touch the database
while I'm running a transaction.
A sequence of SQL commands.
And so that way, I can say, all
right, I'm starting a transaction now.
Nobody else touch the database.
Let me select the balance and
then let me update the bank,
and now that I'm done with the
transaction now you, ATM number two,
feel free to go ahead.
And then everything's fine.
Because when ATM number two gets a
chance to go to the bank account,
it's zero.
And so the select query is going
to return a zero instead of 100.
And so the syntax for that in SQL
transactions are, if you type begin,
followed by semi-colon,
that begins the transaction.
And then all of the
commands that you run
are executed effectively
together as a unit.
And then at the end
of the transaction you
say commit, to say
these are the commands I
want to run, go ahead and run them.
And so that is a way that we can
begin to group these things together.
So the idea of race conditions,
the idea of SQL injection, these
are just good security
considerations to be
taking in mind as we start to approach
designing databases that are ultimately
going to go into our web applications.
But now how do we actually
take these databases
and put them into our web applications?
Last week, if you recall,
we explored flask,
which was a micro
framework written in Python
that let us pretty easily get a web
application up and running in Python.
That had a whole bunch
of routes that we were
able to use in order to direct
users to different pages,
depending on what site they visited.
What we'd like to do now is bridge
the gap between Python and SQL,
to allow our Python code to be able
to run SQL commands on databases.
And so what we'll be using for that is
a popular library called SQL alchemy.
SQL alchemy is a Python library that
is used to connect Python and SQL.
To connect these two worlds
and to allow Python code
to be able to run SQL queries.
And so we'll take a
look at that right now.
So let's imagine that I want
to run, and for now we'll
start with just command line programs.
Programs written in Python that
don't yet live on the internet,
but in just a moment we'll transition
to looking at web applications as well.
But imagine for now I just
want a Python program that
prints out all of the flights that
are currently in my flights table.
What might that code look like?
Well, let's go ahead
and look at list.py.
as an example of this.
And so what we have up here
are a couple of import lines.
So import OS is just importing
an operating system library
is going to have a useful feature
that we'll see in a moment.
And then I'm importing a bunch
of names from SQL alchemy.
Which is the library that we're going
to be using to interact with databases.
Create engine, scopes
session, and session maker.
I'll briefly describe
what these are doing,
but don't worry about
them all that much.
What's happening effectively is on line
six, I'm creating a database engine.
The engine is, just think of it as
an object created by SQL alchemy--
this third party library that
we're going to be using--
that is used to manage
connections to the database.
The engine is going to take
care of, for us, the process
of talking to the
database and making sure
that Python is able to send
commands to the database
and get results back from the database.
Instead of us needing to
worry about that database,
the engine is going to
take care of that for us.
So on line six I say create engine.
And then here, I have
os.getenv database URL.
Last week, if you
recall, we talked briefly
about the idea of environment variables.
Variables that exist inside
of my terminal environment.
We saw that for a setting
like flask app or flask debug.
The environment variable
in this case is just
going to be some variable
that's set in my terminal.
The same place where I'm running
flask or running this program.
And in this case, I'm specifying in an
environment variable the database URL.
The URL of where my database lives.
In my case, that URL is
local host, my own computer.
But if you've got a database that's
living somewhere else on the internet,
that might actually be a URL.
And when you start working
on the next project,
and start working with SQL alchemy,
we'll give you one of these URLs
that you can use as
your database online.
On line seven now, what we're doing
is we're creating a scope session.
And so this is less relevant
for this particular example,
but you imagine that once we take
our web application to the internet
and we have multiple people
that are simultaneously
trying to use our
website at the same time,
we want to make sure that the stuff
that person A is doing with the database
is kept separate from the stuff that
person B is doing with the database.
And so creating different
sessions for different people
is just a way of managing that.
But ultimately now we have this object
called d-b which is going to be the one
that we're really going to care about.
Because that's what's going to
allow us to run SQL commands.
And this code will all be
included for you in the projects,
you won't need to worry
about rewriting that.
So what's happening in the
main function of my code now?
Here's what's happening.
On line ten, I say flight.
I want a query for all of my
flights, so how do I do that?
I say db.execute.
And then in quotation marks.
I can specify whatever
SQL command I want to run.
We just talked about SQL
syntax, here, just as a string,
I'm going to feed in SQL
syntax into db.execute to say
I want to run all of this SQL code.
In particular, I'm selecting origin,
destination, and duration from flights.
Those are the three columns I want
to select from my flights table.
I want to select them all.
And then at the end I'm doing this
fetch all query, which is just to say,
run the query and get
me all of the results.
There is likewise a fetch one function
that just gets me back one row,
of I only care about one.
But in this case I want to
get back all of the flights.
So I'm saying db.execute.
Here is the SQL command I
want to run, the select query
we saw just a moment ago.
And then fetch all says
get me back everything.
So now, I have inside of
this variable flights,
a list of all of the individual rows
that came back from that SQL query.
And so now on line 11
I'm running a loop.
For flight in flights, I'm going
to print out this formatted string.
Flight dot origin,
that's going to get me
the origin of this individual flight.
Because I can use each
individual flight, each row,
and to get in a particular
column, I use the dot notation.
So flight is the name of one
of those individual rows.
To get just the origin of that
row, I use flight dot origin.
So I'm printing out flight dot
origin to flight dot destination.
And how long is it?
Flight dot duration minutes.
And that's all that's happening.
I first run a query to say select me
all the data from the flights table.
Then I run a loop for each
individual flight that I get back,
printout that result. And so now if I go
ahead and run list.py, what I get back
is one per line.
New York to London 415 minutes Shanghai
to Paris 760, so on and so forth.
Giving me all of the data coming
back from those individual flights.
Questions about the syntax
of how that code worked?
That was me using
queries to select, yeah.
AUDIENCE: Does it
always return as a list?
[INAUDIBLE]
BRIAN YU: Does it
always return as a list?
If I use this fetch all command
that is how I get it back as a list.
Fetch all says, get me back
all the results from the query
and give it back to me as a list
of all of these individual results.
Good question.
So that was selecting
data from my database.
One thing we might also want to do
using Python code is importing data.
Inserting data into the database.
Because if you remember
from before, when
I wanted to insert flights
into my flights table,
I had to use syntax like this.
I had a whole bunch of these
insert queries, each one of which
was an origin, destination, value.
Here are the values of
the things to insert.
But generally speaking, if I have
a whole bunch of flights to insert,
it's not going to be nicely formatted
like this, already in SQL query form
ready to be inserted into the database.
Likely, it's just going to
look something like this.
This is flights dot CSV.
CSV stands for comma separated value,
just an easy way to represent data.
And I'll likely have data
that looks like this.
That's just one row for each flight.
And instead of it being formatted
nicely as insert into something,
it's just the origin comma
the destination comma
and number of minutes.
And that's all I have to deal with.
So if I just give this to
SQL, SQL doesn't immediately
know how to understand
the CSV, that you can
tell it to try to import it from a CSV.
So what I might want to do is write
Python code that reads this CSV file,
and takes each row and
inserts it into my table.
So what might that look like?
Let's go into import.py and
look at what's happening here.
These first couple of
lines are really the same.
But here is the interesting activity.
And again, this code
will be made available
after, so if you're interested
in importing data into a table
as well-- which you'll need
to do for project one--
you can use this as a reference or a
guide for how to go about doing that.
So on line 11, I'm opening
up this flights.CSV file.
That's the file that contains all of
my flight data so I'm opening it up
and I'm calling it f.
And Python has a built in module
called CSV specifically meant
for reading and writing CSV files.
And so I'm going to use CSV
dot reader, which I only
know about because I've looked up
CSV reader's documentation online.
And I'm saying, I want to
read file f as a CSV file.
And now line 13 what I'm saying,
is I'm looping over this reader.
I'm looping over every single line
that the reader is going to read.
And for each one the three columns
are origin, destination, and duration.
So I loop over the reader, and I'm going
to for each row, call the first column
the origin, call the
second column destination,
call the third column duration.
And for each one of those
rows what do I want to do?
Well I want to do a db.execute.
But instead of a select
like I did last time,
I want my db.execute to be an insertion.
And so what is this insertion look like?
I'm going to insert into flights
origin, destination, duration.
Values.
But notice that when I'm at the
time that I'm writing this code,
before I look at flights
dot CSV, I don't yet
know what values I want to
insert into my database.
I don't know if the flights coming
from Lima or Shanghai or wherever.
All I know is that I want
to insert some origin,
some destinations, some duration.
And I want to fill in
those blanks later.
And so SQL alchemy supports the idea
of placeholders for this purpose.
Where I can say values and
colon origin, means that this
is a placeholder for the origin.
Meaning something is
going to go in that spot,
but I don't yet know what it is at
the time that I'm writing the code,
I only know when I run
the code and actually read
that flights dot CSV file.
So colon origin means this is
a placeholder for the origin.
Colon destination means this is a
placeholder for the destination,
and colon duration means this is
a placeholder for the duration.
And then to fill in those placeholders,
what I do immediately after it,
this could be one line but I've broken
it up into two just for clarity.
After a comma, I'm providing a Python
dictionary, which is this right here,
that tells the query what to fill in to
each of those individual placeholders.
So in the origin
placeholder, fill in origin.
Where this origin is this variable over
here, that I read from a row of my CSV.
And likewise, I'm filling
in the destination
placeholder with this destination.
And the duration placeholder with
whatever the variable duration is.
And if it makes it clear, I can just
call this like O and destination
and duration, just to
signify that these are just
variable names, the
things after the colons.
They're whatever variables I'm using.
But these things in quotation marks.
These are the placeholder
values that I'm filling in for.
This origin, destination, and
duration, correlate with this origin,
destination, and duration.
Those are the placeholders.
And then immediately
after it, I specify,
all right, now fill things
into the placeholder spots.
And then afterwards I'm
just printing a line
that says, I added a flight from
origin to destination lasting duration
minutes, just so I can keep
track of that for myself.
And then at the end, SQL
alchemy automatically does
this kind of transaction thing for me.
It automatically says, you
know what, let's assume
you want to group all
your actions together.
And then at the end you better just
say you want to commit those changes.
And so at the end then you need to
tell my database db.commit, meaning
save the changes that I just made.
I didn't need to do that
last time, because I was just
reading data from the database, I
didn't actually update anything.
But now that I'm inserting
rows into the database,
SQL alchemy wants me to at the end
of all my inserts say, all right,
now I'm done.
Commit these changes, actually run them.
This is similar to the commit
that we saw a moment ago when
we were talking about race conditions.
So let's try and run that now.
If I try and, so right now
if I run Python list.py,
that gives me the six flights that I've
had inside my database this whole time.
But now if I run Python import,py,
what that's going to do is it tells me,
all right, I've added all
these individual flights.
Paris to New York lasting 540, Tokyo
to Shanghai lasting 185 minutes.
And these are all the flights
from this flights.CSV file.
Where each row corresponding to
a flight that I wanted to insert.
So now if I run Python list.py again,
what I get is a longer list of flights.
Because now I've added all these
additional flights to the database.
Questions about that import process
or how that works from the CSV file?
Yeah.
AUDIENCE: Is this synchronous?
Is SQL synchronous or asynchronous?
BRIAN YU: Good question.
Is SQL synchronous or asynchronous?
In what context do you mean?
AUDIENCE: Like if you
did this DB commit,
you didn't do the transactional
strategy with other things.
If you a really long SQL
query and then someone
did a really short one,
if you start that query,
will no one else be able
to access your database?
BRIAN YU: Oh good question.
So what happens if you are, you
have a lock on the database.
You're running things but someone
else is running something short.
Generally speaking, if
you've locked the database,
nobody else can get access to it.
So they cannot get in there while
you're in the middle of running those
individual queries.
Repeat that.
AUDIENCE: If you don't lock it?
BRIAN YU: If you don't
lock it then there
is no guarantee on what
order things are happening.
And so something else
could theoretically
come in and run commands as well.
Yeah.
AUDIENCE: So once you do the dot
execute, does that lock the database?
BRIAN YU: Good question.
So what happens after I run
db.execute relative to db.commit?
I don't actually start running these
commands until the db.commit happens.
What happens is that effectively
SQL alchemy is keeping track
of all the changes I want to make.
And it's not until I run
db.commit that we actually
go ahead and run the actual queries
in the database to make those changes.
Good question.
So this is the code that
will allow me to import.
The result of that is that I have
imported all these flights here.
And so now let's look at something
a little more complicated.
Let's look at an example
where I want to do
a combination of multiple
different types of queries
all happening within
the same Python program.
We'll just look at this briefly.
This is passengers.py.
And what this is going
to do is it's going
to give me a text based way of looking
up what passengers are on a flight.
And so I'll just briefly go over at
a high level what's going on here.
And then you can look at
this in more detail later.
But effectively, what's happening,
is that in my main function
the first thing that I do
is execute this command.
Select ID, origin, destination,
and duration from flights,
meaning get me all the columns
from the flights table.
Save it inside of this
flight's variable, this list.
Then loop over that list.
And for each row in my table,
for each element in the list,
go ahead and print out dot flight data.
Flight ID, origin to destination,
lasting duration minutes.
So that will do much like
we did in the list example,
just print out one thing per line.
But now let's imagine
we want a program that
lets a user say, all right,
you know flight number
five, tell me who's on it.
So what happens next is I
can prompt the user for input
using that input function, and
convert that input to an integer.
Just so it's a number.
And save that result as
a flight ID variable.
So I'm just going to ask the
user type in a flight ID.
I type in flight ID 28, or whatever.
And that saves it inside
the flight ID variable.
And now I'm going to run
another execute query.
Which is that I'm going to
select origin, destination,
duration from flights, where ID
is colon ID, this placeholder.
Where that placeholder is going
to be filled in by flight ID.
And here I'm just fetching one thing.
And so the long story short of
what that's doing, is it's saying,
all right, whatever
number you typed in, let's
just query the flights table again
and make sure that there actually
is a flight with that ID.
Right?
If I type in flight ID 255,
and there is no flight 255,
then this fetch one is going to
return back nothing, or none.
Because no flight matched that query.
And so if the flight
is none, if there was
no flight, if it was an invalid flight,
it'll print error no such flight.
And just return and quit.
But otherwise, if the flight
was a valid flight ID, then
I'm going to get the passengers.
So I run a db.execute
that says, select name
from passengers where flight ID
equals whatever the flight ID is,
and get me all of those rows.
So get me the names
of all the passengers.
And then for each passenger
in that list of passengers,
I'm going to print out
that passenger's name.
And if there were no passengers,
I print out no passengers.
So what does that look like?
I run Python passengers.py,
it prints out all my flights
giving me their ID, their origin,
their destination, and how long.
And if I want to know who is on flight
number two, I type in flight two,
and then it says, OK, here are
the passengers on flight 2.
Charlie and Dave.
By running those individual queries.
So first it queried for all the
flights, then I type in a flight ID,
and based on that ID it queried
for the passengers table,
getting me all the passengers
that are on that flight as well.
Questions about how we did
that generally speaking?
OK, so we've gotten a
sense for a SQL alchemy
and how we can use it to actually
run SQL commands in Python code.
Now let's make the next
leap and say, all right,
how do we take that and merge it in
with one of our flask applications,
so that our web applications
that are living on the internet
that people can go to
where URL to and look for.
How can those be using
SQL databases as well?
So the example we'll take a
look at here is airline zero.
And this is a flask application.
So what we do, is we have a flask
application just like before,
but we also are setting
up a SQL alchemy engine.
And the SQL alchemy session called
db that we're going to be using.
And what's happening
inside of application.py?
Well in my index function,
this default route,
recall that this is
just what happens when
the user goes to the
slash route on my website
and just wants to see the website.
It runs this function.
And here's what happens
in my index function.
First, I query for all the flights.
I say db.execute, select star
from flights dot fetch all,
just get me all of the flights.
And then render the template index.html
where flights is equal to flights.
So I'm going to render index.html
passing in that list of flights.
Because I want my web page to
display those flights, because maybe
I'm making a website that lets someone
register for a flight, for example.
So what's happening in index.html?
Well the first thing to
look at is layout.html.
This is exactly the same as
layout,html from last week.
Just a simple layout
for an HTML website,
where I've got a spot to
put in a title for the page,
and a spot to put in a body
for the page to have as well.
So those are the places where
I can just insert content.
And now let's look at index.html.
What's happening here?
First thing that's happening on line
one, is I'm extending layout.html.
Same as last week,
layout.html is this HTML file
that is the template from which
I want to base this template on.
It wants to look exactly
the same, except I
might fill in something
for the title, and I
might fill in something for the body.
So inside of index.html, now I say, as
the title just make the title flights.
And then as the body, here's
what I want to happen.
There'd be a heading of the top
that says Book a flight and a form.
A form is just going to be a place
where users can fill in information.
When a user submits the form,
let's take them to the book URL.
And we'll take a look at
book in just a moment.
But this is going to be where
I'm going to book a flight.
And the method for that
request is going to be post.
Because I'm submitting data, and
oftentimes when I'm submitting data
I'll submit data via post.
Then the key elements inside
this form, I've into a couple
divs just for organizational purposes.
But the key here is that
I have a select here,
and a select is just a drop down menu.
A dropdown where you click and you
can pick from one of multiple options.
And so I'm going to select class
equals form control is just bootstraps
way of styling.
Recall the bootstrap, which we
looked at back in lecture one,
is a CSS library that
makes our dropdown boxes
and buttons and menus look a little bit
nicer and look a little more modern.
So form control is just a way
of telling bootstrap this is
something I want you to format nicely.
And I'm giving it a name, flight ID.
And I give this select dropdown a name,
because later on in my Python code
I'll want to reference it.
And when I reference it I'll
want to reference it by its name,
in this case flight ID.
What's happening next
on line 14, I'm going
to loop through all of the flights.
Recall that in application.py,
when I rendered index.html,
I passed in flights as a variable.
Meeting flights is going
to be a variable that
exists inside the index.html And
so inside of an x or index.html I
can loop over them in a for loop.
For flight in flights, for each
one of those individual flights,
here's what I want.
I want an option in the
drop down menu where
the value of that option the value
it will take on if I pick it,
is just the ID of the flight.
But the text that's
displayed in the option,
is going to be flight,origin
to flight.destination.
And recall from last week
that the double curly braces
is a way of me signifying
in this templating language,
I want to plug-in a value here.
Flight is the name of
that row of the flight
that I want to create an option for, and
I use dot to get at a specific column.
So curly brace curly brace
flight.origin, means take the flight,
get its origin, and paste that
into the HTML template right there.
And I'm going to do that for each
flight in my list of flights.
After that drop down
I have another input.
This input is going to be
called Name, and it's just
going to be a place for the
passenger to fill in their name.
And then at the bottom I have a
button that's just going to say,
book a flight.
So that's all.
What's that going to look like
now if I go into airline zero
and run a flask run to
run this web application?
It's running at this URL.
And so if I go to this URL now,
what I see is book a flight,
that big heading at the top.
And then I see this dropdown.
Where this dropdown is populated
by the data from my database.
When I click on it, and I see
all of the individual flights
that are inside my database.
Because I queried for all of them,
and then I looped over each one
and made each one an option
inside of my web page.
And if I look at the page
source of this web page,
I can see that I've got right here,
option value one, New York to London,
option value two,
Shanghai to Paris, where
the value is the ID of that flight.
And then the text of it is just
the origin to the destination.
It looped over each one of
those individual flights,
filling in to those placeholder values
the values that I actually cared about.
And so if I wanted to book a flight,
I'd go ahead and click on the flight
that I want to book,
maybe New York to Paris.
I type in my name, Brian,
and I press book flight.
And it's a success you've
booked your flight.
In fact, if I check that passengers
file that I created before, and I check,
you know, here is the New York to
Paris flight, flight ID number four.
If I type flight ID
number four, who is on it,
I now see that I am now
on that flight to Paris.
So how did that happen?
How did that book route work?
Because what happened is I
submitted a form and suddenly
it said success, what was going
on behind the scenes there?
Let's take a look at that.
So for that we're going to need
to go back to Application.py.
Because if you recall from index.html,
when this form gets submitted,
what happens is I submit
a post request to the URL
for the function called book.
And so let's look at the
function called book now.
I have a function called book,
the route is just slash book,
and it only accepts post requests.
So what am I going to do here?
The first thing that I'm going
to do is use a request.form.get,
which is my way of
extracting information
from the form the user submitted.
Request.form.get name, says
take the thing in the HTML form
that had a name of name, and
just get me that information.
So I get the name and save
that in a variable called name.
Now, this is a special syntax
in Python called try and except.
This is what I do when there is a
possibility that something could
go wrong, that there might be an error.
But if there's an error, I don't
want Python to just throw an error
and have my whole website crash.
I want to handle that error
and deal with it in some way.
So what am I trying to do here?
Well I'm saying take request.form.get
flight ID, whatever that flight ID was.
And recall that in index.html
this select dropdown menu,
I gave a name of flight ID.
So that's how application.py knows
this is the select Dropbox that I'm
referring to.
So get that result and try
to convert it to an integer.
If there's a value error, for
some reason, in other words,
if for some reason someone tried to
book a flight not to flight ID one, two,
three, or four, but to flight
ID Foo, or just some word,
something that's not an integer.
Rather than have my whole website
crash, which would probably be bad,
let's return something nice.
Let's return error.html
saying, invalid flight number.
Where the message is
invalid flight number.
And all error.html is, is nice error
page that also extends layout.html.
The title just says error,
there's a heading that says error,
and then I fill in using
those double quotes whenever
the message happened to be.
And so for a generic way to
represent an error message,
I can create this error.html, and if
I want to represent any error now,
I just need to return the
template error.html saying here
is the message to fill
in to the error page.
In this case, invalid
flight number, for instance.
So now I have the name,
now I have the flight ID,
and now all that I need to do
is run a couple SQL queries.
So if you tell me that I want
Alice to book flight number five,
the first thing I'd probably want
to do is make sure flight number
five actually exists.
Make sure flight number
five is a real flight.
So let's make sure the flight exists.
If db.execute.
So I'm running another SQL query.
SELECT star from flights where
ID is plug-in an ID here,
what ID do I plug-in?
I want to plug-in flight ID, that
variable that I extracted earlier.
Dot row count is a special
SQL alchemy feature
that just says, you know, how
many rows did you get back?
How many rows were
returned from this query?
And so dot row count equals equals
zero, it's just going to say, all right.
Well that means no flight, zero
rows had an ID of this flight ID.
So what should I do if that's the
case, if there were no matching rows?
Well let's also again,
return error.html.
This time saying message no
such flight with that ID.
There is no flight that has that
ID, sorry you can't register for it.
Now, finally, we've made sure
that the flight ID is an integer.
We've made sure that flight ID exists.
Last step is to run that insert query.
To say, we've got a person, we've got a
flight, let's put them into the table.
So I run db.execute.
Insert into the passengers table.
What columns do I want to insert?
Well, I want to insert their name
and I went into their flight ID.
What values?
Well I didn't know yet at the time
of my writing the web application.
I don't know who is going to
register for a flight, what
their name is going to be, what
flight they're going to register for.
I just know they're going
to register for something.
And so I put these
placeholders here, and then
I substituted in name for name,
and flight ID for flight ID.
And when I do this placeholder
syntax, as a good note,
SQL alchemy will take care
of for me automatically
the problem of escaping the characters
that might cause race conditions.
So, so long as I'm using
this placeholder syntax,
and then using this dictionary
to say plug-in these values,
SQL alchemy will take care of avoiding
those race conditions by escaping
any potentially dangerous characters.
As opposed to, if I just did
values plus names to just join
multiple strings together, that would
not be safe against race conditions.
Because if there were a single quote and
some other potentially malicious syntax
in the name, plugging it right
into the query just like that
would result in me running that query.
So you want to be careful never
to directly run potential user
input into the database unless you're
either sanitizing the input yourself,
or you are using this placeholder
syntax in SQL alchemy get stuff.
So we insert the passenger
into the database.
We commit those changes to say, yes,
this is the change I want to make.
And then I return the
template success.html.
And all that success.html
is, is another template
that extends from the same layout,
the title is success, and in the body
I say success.
You have successfully
booked your flight.
And the result of that is that if I
now go back to my web application,
and I say, you know what, let's take
this Hong Kong to Shanghai flight.
And I type in my name
and I book the flight.
It'll say success, I've
booked my flight, and I
I'm now registered for that flight.
So that was a lot of
things going on at once.
We were talking about
HTML, but also talking
about flask and how we go about
passing values into templates
and then using SQL alchemy to be
able to run database commands.
What questions do we have about that?
And how any of those
moving pieces fit together.
Yeah.
AUDIENCE: Should we be using a
try catch for the db.commit too?
BRIAN YU: Good question.
Should you be using a
try catch for db.commit?
Certainly that's not a bad idea.
In fact, it's probably a good
idea from a design perspective.
To put a try catch in the db.commit.
Because if, for some
reason, something goes wrong
as I try to run my SQL commands
and there's a problem that gets
encountered, and that
will throw an error.
Rather than have my website crash
it'd probably be nice if, instead,
it just gave me some sort of nice
error message in response to.
Another thing that you could
also do to deal with that,
is have a global error handler
for your flask application.
So flask supports the idea of,
if there is an internal server
error inside the application, then
return some specific template.
And so if you've ever run
into internal server error
on a website, that's likely
because some part of the code,
there was some error in the code.
Something the programmer
didn't anticipate
that resulted in something going wrong.
And you can have flask say,
you know, if there is an error,
there is an internal server
error, let's handle that
by rendering a specific error page.
So you can do that as well.
But certainly, you could put the
commit inside of a try catch too.
Other questions?
AUDIENCE: Well, you should have a
try catch for the name too, right?
BRIAN YU: You, so, good question.
Do you need a try catch for the name?
You probably don't
need one for the name.
Because even if the user doesn't
type in a name, the result of this
will still just be, it'll just be empty.
So maybe you do want
to handle that case.
Maybe I want to say, if name is
none, or if name is the empty string
if they literally just
typed in nothing, maybe you
want to not let someone register for
a flight if they don't have a name,
for instance.
So that can be something you do as well.
OK.
We'll take a look at one last
example before wrapping up for now.
We'll look at extending
this web application
to allow some additional features.
In particular, allowing
me to see who is currently
registered for a given flight.
Which in reality, is probably
not information you want
made public if you're
building an airline website,
but for now we're just going
to use it as an example.
So before we created a
Python application that
generated a list of all the passengers.
Let's do the same thing with a
flask web application as well.
So let's go ahead and take
a look at airline one.
So inside of airline one, the
index function is the same.
We're just going to select the flights,
display it for people to register them.
Book a flight is the same way as well.
Works exactly the same.
But I've added a couple
of routes here as well.
So let's take a look at this route
real quick and see what it does.
This is slash flights.
So if I go to slash flights on my
website, here's what's going to happen.
I'm going to list all the flights.
And how do I do that?
Well first, I select star from flights,
just select everything from my flights
table.
Fetch all the rows, call that flights.
And now return this
flights.html template
passing in that flights variable.
So what's happening in flights.html?
Well, the title is going
to be flights again.
It's extending from
the same layout file.
I'm using that same layout
file for all my HTML
files to avoid the need
to redundantly repeat
that HTML over and over and over again.
Then inside the body.
I have this heading that says
all flights, and then a UL.
Recalling that UL stands
for un-ordered list.
And I'm going to have an unordered
bulleted list of items, where
each item is just going to be a flight.
So I'm going to loop.
For each flight in my list of
flights, create an LI, a list item.
A bulleted item in my list,
where that item is an AH ref,
recalling that AH ref is a link.
So I'm going to have a
bulleted list of flights,
which just tells me the flight.
And clicking on each
any individual flight,
will link me to, presumably, another
page with details about that flight.
The common paradigm that happens
when you've got a bunch of data,
but you want to click on something
to go see more information
about it in more detail.
And so what's going to happen here?
Well, the link is going
to be itself just labeled
as flight.origin to flight.destination
just like we've been using before.
But where am I going to link to?
If we look back to Application.py,
I have this other route
that's going to be used not
for accessing all flights,
but for accessing data
about one specific flight.
And the route I'm using
for that is slash flights
slash an integer called flight ID.
This is a placeholder that I'm
putting into this route that
could be substituted for any integer.
Recall that in last week's
lecture, we talked about how we
could use this to say hello to anyone.
Where you could do
slash David, slash Brian
slash Maria to say hello to a
variety of different people.
Just based on what gets
passed into the URL.
This is another common
use case, where if I
want to access a URL
for a particular flight,
it would be unreasonable and
impractical to create one separate route
for each individual flight.
I wouldn't even know how many to create.
Instead, I just create a generic one.
Slash flights, slash
an integer flight ID
that is going to represent
accessing slash flight slash two
for the second flight.
Or slash flight slash
28 for the 28th flight.
And inside this flight
function, this is where
all the code for generating details
about that flight is going to be.
So what does that mean for where I'm
linking to in my flights.html template?
Well, I want to link
to the URL for flight,
where flight is the
name of that function.
But I can't just give flight
anymore, because this URL
is a flight function that
also has this flight ID
parameter that's passed in by the URL.
So if I just say, take me to the flight
function, flask is going to complain
and say, wait a minute, I don't
know which flight you actually
care about going to.
So in my URL for I need to tell flask
what flight I actually care about.
And so that's what this is.
URL for flight, comma,
flight ID equals flight.ID.
So flight ID, that is
the name of that variable
that we were using just a
moment ago in application.py.
And flight.ID, well, flight
is this individual row
that I got back from the
flights table from that query.
And to access the ID column of that row,
once again were using the dot notation.
To say flight.ID, is the ID
the thing I want to go to.
And so using this syntax I can say,
take me to that specific flight page.
Now what happens on that
specific flight page?
Well the first thing I do is
make sure the flight exists.
Right.
Select star from flights where ID
is that ID, fetching that one row.
If that row is none,
if no row gets returned
when I try and select
that flight, well there
is no such flight with that number.
So I better return an error
message that says sorry,
no such flight by that number.
But if that flight does exist,
now I want to get the passengers.
So I'm selecting names from
that passengers list where
flight ID is equal to
plugging in that flight ID
there, and getting all the rows.
And I save that in a
variable called passengers.
And now on line 54 I say render template
flight.html, passing in both the flight
that I got back from when I
queried for the flight up here.
Select star from flights.
And also passing in all the passengers.
So now my flight.html knows
information about the flight,
its destination, its duration.
And also its passenger list.
So inside of flight.html,
what's happening here?
It extends layout.html again,
it's got a title of flight.
And in the body I have a heading
that says flight details.
I've got this unordered
list just of information.
Origin is flight.origin,
destination is flight.destination,
duration is flight.duration minutes.
So that's nothing too crazy,
just using that double curly
braces to say plug-in a value here.
And flight was a variable
that I passed in from Python.
Then down here, I have another
heading that says passengers.
I have another unordered list.
And for each passenger
in my list of passengers,
I'm going to create a list item
that is just that passenger's name.
I'm going to plug-in to a bulleted list.
And this else condition
in gender two just says,
the else condition specifically
in this templating language
means that if the passenger list was
empty, if the for loop never ran,
run this code.
In this case, just saying
there were no passengers.
So what happens if I cd into
airline one and flask run this code.
Actually it's already in
use because I'm already
running a flask application over here.
I run flask run again.
We're running on this URL.
So I go here.
This is the book page.
But recall, if I want to go to
where I can see all the flights,
I go to the slash flights route.
So if I now go to slash flights, what I
get is a heading that says all flights.
And then in this
bulleted list, one flight
per line, where each one is going
to be a link to that detail page.
So right now I'm at
this URL slash flights.
But if I were to click on New York
to London, for example, and go there,
now notice my URLs changed.
Now I'm at slash flights slash one.
Because clicking on that
link took me to this route.
Slash flights slash
the integer flight ID.
And I passed in which
flight ID I actually wanted.
so going to slash flights slash
one, tells me here is the flight,
its origin is New York, its destination
is London, is lasts 415 minutes,
and Alice and Bob are the passengers
that are registered for that right now.
If I instead just go to
the URL and change flight
slash one to flight
slash two, for instance.
Now I'm looking at this next fight.
Origin Shanghai, destination
Paris, 760 minutes, the passengers
are Charlie and Dave.
And if I try going to a
flight that doesn't exist,
you know flight 28 for example, when
I don't have a flight number 28.
Now I get an error that
just says no such flight.
And so I've been able to create
arbitrarily many of these URLs
just by using this placeholder syntax,
wherein I can substitute any arbitrary
flight ID into this particular route.
So questions about how
that worked and how
I was able to create detail pages
for all of these individual flights
just by using those routes and querying
for the right data from the database?
So all of this and more
is stuff that you're
going to get to try as you begin
to approach project one, which
will be released later today.
And project one is all about
building a flask application
and using SQL commands in
order to query databases,
in order to read data and write
to data and interact with data.
Building pages sort of like this that
have detailed views for certain items
and so on and so forth.
And so more information on
that to come with project one.
But for now that's SQL.
And next week we'll dive
into even more depth
about how to interact with databases.
And in particular, how to
go about using and creating
APIs, or application
programming interfaces,
that let us interact with data in
even more useful and programmatic way.
So more on that to come.
But for now thank you all.
