[Week 8, Continued]
[David J. Malan] [Harvard University]
[This is CS50.] [CS50.TV]
This is CS50, so this is the end of Week 8 here.
We of course had a bit of a hurricane earlier this week,
so right now it is really just you and me in this lecture hall,
but today we continue our conversation about PHP
and about web programming more generally, and we also introduce the idea
of databases, particularly one called MySQL, which is quite popular these days,
in large part because of its scalability as well as because its being free and open source.
But first, a look at where we left off last time.
Recall that we were looking at the several Frosh IMs examples,
and this was the hideous form that I came up with some 15+ years ago
in order to have students on campus register for freshman intramural sports 
without actually having to trek anymore across the yard to Wigglesworth 
to slide a physical piece of paper underneath some proctor's door.
Instead we moved everything online, but to do that we needed to make use
of a few technologies, so one, we needed HTML, hypertext markup language,
which again is this markup language with which you make web pages structurally.
Using a bit of CSS these days, cascading style sheets,
whereby we use stylizations of the web page using a slightly different syntax,
whereas the HTML was all about the structure thereof.
We also need to introduce a web programming language.
In this case, we'll use PHP, and PHP is going to allow us
to dynamically output content as well as do programmatic things like
sending emails, as was the case on the note we left last week.
Recall that the code for this was in 2 parts.
One, we had froshims3.php, 
and this was largely markup with an HTML form inside of it,
a tiny bit of CSS up here in the style attributes
so that the form itself would be centered on the page, but beyond that
we had some representative form inputs, a text field, a checkbox,
some radio buttons, a select menu, and a submit button.
And via this form, we submitted to a file that was apparently called register3.php,
which itself looked a little something like this.
Now, most of the code in register3.php, recall, was all about email.
It did a little bit of validation of the form that was submitted to make sure
that the fields were actually provided that were expected.
Then we called some PHP functions using slightly new syntax,
even though it's borrowed from C.
This arrow operator allows us to make use of something called object-oriented programming.
We won't go into that in any detail here, but know for now
it's a way of having functions associated with objects,
which are a special type of structure, as we saw in C.
But for now, just take on faith that this is the correct syntax to use
when using a library like this PHPMailer library.
And then by the end of this file we had dynamically generated an email
that got sent to my jharvard@cs50.net account
from my jharvard@cs50.net account,
and we informed the user accordingly that they had been registered for this sport.
That is pretty much what the Frosh IMs site did all those years ago
when I implemented it, granted, in a different language,
but it shows you perhaps the power that you have 
now that you can express yourself not only programmatically 
at a low level in a language like C but at a much higher level
with these very real world applications like email to actually solve 
some real world problems.
Now, of course, even though I use this script to generate some
emails dynamically from jharvard@cs50.net, which is indeed an account
that I have access to, do be quite careful to send
mail only from accounts that are actually your own, 
lest things get you in a bit of hot water in life.
With that said, let's now transition to solving a different problem altogether,
that of retaining states.
Now, what does this actually mean?
HTTP, this hypertext transfer protocol, 
is actually a stateless protocol, and what this means is that
when you pull up something like Google.com and then hit enter
usually your browser has some kind of spinning icon that then
results in some web page being downloaded,
and then that little icon stops spinning, and that indeed suggests
that HTTP has completed some kind of connection to the server and that's it.
HTTP is stateless in the sense that it doesn't maintain
a persistent connection to the server in the same way Skype does
or Gchat does because with HTTP
the assumption is that once you've fetched a web page that's it.
Now, in reality these days on sites like Facebook and Google Maps
and Twitter and the like there's a lot more dynamism  whereby 
even after that icon stops spinning you can in fact get more updates
from the server, more tweets, more status updates on Facebook and the like.
But even that is using a technique that we'll talk about in a week or two
known as Ajax using a language called JavaScript,
but at the end of the day, HTTP is still stateless.
And yet if you want to somehow remember things about a user
even after they've disconnected from your server
PHP does afford you a means of doing this
because, as we saw last time, PHP has a number of superglobals, 
and a superglobal is, again, a special global variable
that's handed to you by the web server and by PHP itself.
You don't have to do anything to put values in it, 
and among the superglobals we've seen thus far are get and post,
which is where form fields are put automatically for you,
as well as a couple of others that we haven't seen yet.
Inside of $_server are some special variables related to the server itself.
What's the IP address, what protocol, HTTP or HTTPS did you use,
what request method did you use and the like, so there's some interesting, 
juicy details about the server, and in fact, the user in there as well.
There's $_cookie, which is where these things called cookies are stored.
We won't spend time on cookies themselves today,
but know for now that a cookie is just a small piece of information
that a web server can plant on a web browser
and in turn its RAM or its computer's hard drive 
to store information about a user, for instance, their user name
so that they don't have to type it every time they log in or some
unique number or identifier for that user 
so that you don't have to pester them with the same kinds of questions about
preferences in the future, but most of interest
right now is $_session.
This superglobal, which, like the others, is handed to you automatically by PHP
when you're writing PHP-based websites
can store anything you want, strings, integers,
floating points, values, arrays, objects, 
really anything that you want, and it allows you to store it in such a way
that even if the user visits you now and then
comes back a minute from now or 5 minutes from now because 
they take their time before clicking some other link
PHP will ensure that whatever you put in that session superglobal
a minute or 5 minutes ago will still be there when the user returns.
And underneath the hood this superglobal is implemented by way of
those things called cookies, but for now, it's just an abstraction
whereby it's sort of the programmatic equivalent of a shopping cart.
Whatever you, the programmer, put in that 
superglobal associative array will be there some number of minutes later
until you delete it or until the user quits his or her browser altogether.
Let's take a look at an example of how this thing is actually used.
In counter.php among today's pieces of code
we have the following line.
At the start of this file we have a bunch of blue comments, which are uninteresting for now.
But in line 13 we have a new line, 
session_start, and that actually does exactly what it says.
It starts sessions.
It enables you to use that big superglobal $_session, and it's as simple as that.
Now, if we proceed to look at line 16, let's try to figure out what this web page is going to do.
If (isset ($_SESSION ["counter"]) then go ahead
and store in the counter variable, lowercase counter,
$_SESSION ["counter"].
This seems to be declaring a local variable called counter
inside of which it's putting a copy of whatever is inside of the superglobal
called session at the location "counter."
Else, apparently, this little local variable counter, is initialized to 0.
But then a few lines later in 26 notice that the session's copy of counter,
its key, has a new value assigned which is its current value plus 1.
In short, this file seems to be updating
a counter that's stored inside of the session superglobal
by incrementing it by 1, but it first retains a copy of the previous value
by storing it in a local variable called $counter,
and then down here let's see what remains.
It turns out it's pretty much just HTML.
At the bottom of this page we see in line 37 that I have visited this site
counter number of times, so there's a couple interesting features here.
One, this is clearly a variable, but it doesn't suffice to just put 
$counter in the body of your HTML because of course
if it's just there among your HTML PHP is going to assume that's just HTML.
You literally want $counter to be printed on the screen.
But instead by dropping into PHP mode
with this piece of syntax we can dynamically insert a value here 
very similar in spirit to what we did last time with
inserting values into strings.
In fact, this is just a shorthand notation for saying something like this literally,
print ($counter) or even something like printf (%s, counter),
or even, as you may have seen online or in textbooks,
there's a function in PHP called echo 
which does the same thing, and all of those are just longer winded ways
of saying <?=.
In this one case you do not need to put
the word PHP after the question mark.
This is shorthand notation for, again, what we just saw a moment ago
which is echoing some value.
Let's see what the end result of this actually is.
Let me go over into our counter.php file,
and we'll see that David just made a mistake by playing with the code there.
Let's go fix whatever he screwed up,
and the error seems to be there, gone, on line 37.
According to the top of this page I have visited this site 0 times.
Well, let's go ahead now, and at the top of the browser click on the
reload icon, and I click reload,
and now I've visited the site 1 time, 2, 3, 4, 5, 6, 7, 8.
And indeed, if we look at the source of this page the actual source code is changing,
and notice the complete absence of any PHP, and that's because 
PHP code is evaluated or interpreted server side,
and so that means that the output of the PHP script is what's ultimately sent to the browser,
which in this case is some raw HTML and some raw text.
What's going on here?
Well, with relatively few lines of code I am able to store
persistently over the course of several seconds, or if we waited long enough,
minutes, even hours, some value in a way that makes HTTP
seem stateful as though we've retained 
this connection to the server, and it's just remembering what I told it last time,
but in reality there's a whole bunch of complexity going on underneath the hood
involving cookies that's enabling PHP to give me this illusion 
of this shopping cart-like feature.
For now, a trivial example where we're just storing an integer,
but that feature will come back to be of great value
when we start talking about more complex projects, 
among them problem set 7.
This is your very last problem set in CS50.
I know, it's so sad, but what you'll find is that we're going to conclude
this part of the semester by actually transitioning 
from the context of C certainly to the context of PHP
but while using some of the very same fundamentals 
we've talked about for some time. 
The objective with pset 7 is to implement CS50 Finance,
which is your own version of Yahoo Finance or Google Finance
or even Etrade.com whereby you have the ability to 
look up stock prices for given symbols, but even more than that
you have the ability to "buy" and "sell" stocks
that are traded on various stock exchanges because as this home page 
here suggests, which is really the extent to which we've begun
the problem set for you, you have a login form that's asking for a user name and a password.
It has a submit button, but thereafter, as we'll eventually see,
there is nothing really going on underneath the hood yet because it
remains for you to implement the ability to register new users,
the ability to buy stocks, to sell stocks,
to actually look up current stock prices.
And indeed, this will be as real world as possible because we did
include a bit of code that will allow you with a single function 
to query Yahoo Finance, which wonderfully makes available free data
for looking up stock prices based on the stock symbol or ticker symbol,
and you get back the current stock price of the day.
The data you're actually seeing in this particular pset will be
about as real world as it can get so that you're actually interfacing
with real world stocks, real world prices,
and we shall see how much money you can make perhaps
over the next several days of playing with your own problem set.
But let's first set the stage for how to design something that's certainly more complicated
than counter.php, that's more complicated than any of the Frosh IMs examples thus far,
and let's try to introduce a few paradigms here that allow us
both for pset 7 and maybe for your final project if you do something web based
to keep your code well organized, to keep yourself sane,
and to take a step toward collaborating, whether in CS50's final project
or beyond if you continue to program something in the future.
There's this general design paradigm 
in computer science and in software development more generally
known as MVC, model view controller,
and this is a silly acronym that describes a very nice idea, 
which is the separation of different aspects of a program,
specifically keeping separate the logic or the business logic of a website
so that anything that involves things like 
calling functions and querying databases and the like
happens not among your HTML
but rather in separate files, and indeed, there's this one file 
typically that you have called the controller
that's really the brains behind the operation, and we'll see an example 
of this in just a moment.
There's a model which is programming code
that does the talking to your databases, that talks to Yahoo Finance and the like,
and then there's the V in MVC, the views,
all of the stuff that's related to aesthetics, the files that actually contain
your HTML, maybe your CSS and the like.
The idea here, as this picture suggests, is that the controller
is the file, as we'll soon see and as you'll particularly see in pset 7,
that the world talks to via their web browsers.
That is the file that gets visited on the public internet,
but the controller talks to potentially a model,
which is one or more other files that contain code related to data,
code related to databases and the like, and then it talks to 
the controller one or more other files known as views,
which are the aesthetics of a web page, templates of sorts,
that might take some data as input, but at the end of the day
the only logic inside of a view should be the rendering of that data,
iterating over a loop and actually spitting out some 
HTML-based rendition thereof or even something like a PDF.
What's nice about MVC is that you can have different views
based on the type of device, based on the type of file format that you actually 
want to show to the user.
Let's take a look at a few progressively more complex and well designed examples
by starting first with version 0 here.
Let me go ahead and open up in our MVC directory today
a file called index.php in directory 0.
Notice this is a super simple and very underwhelming website
that's sort of version 0 of a homepage for CS50,
and notice how we have a link to Lectures, we have a link to Syllabus, 
and if I follow the link to Lectures notice that the URL
up top is going to change to lectures.php.
If I then follow the link to Week 1 notice that the URL changes to week1.php.
There seems to be a fairly simple hierarchical structure here.
Let's take a quick look underneath the hood at how this is laid out,
and indeed, if I look at index.php it's pretty simple.
In fact, even though I called this a PHP file there's no actual programming code.
There's a comment that I wrote here in PHP just so the user doesn't end up seeing it. 
Of course, as before, anything that's in between PHP tags 
gets interpreted, even if it's a comment, and to interpret a comment
means just to throw it away at the end of the day and not actually 
send it to the browser, so everything in here is just aesthetics. 
If I open up similarly lectures.php this too is just a hard coded file.
It happens to be called something .php, 
but it really is just .html, and week1.php, week2.php
similarly are just markup, so there's a bunch of shortcomings of this design. 
One, it's a huge amount of copy/paste.
Even though the only thing that's changing among these files is the unordered list,
the li tags, I nonetheless have doc type, HTML, head, 
title, close body, close HTML and more
in every single file, which means that if I ever want to restructure
this web page or restylize it I have to go in and change
all these files manually or with some massive find and replace.
Let's take a step toward a smarter, longer thinking design in version 1 here
whereby as per the read me that we've included so that you can play along with
these more leisurely at home notice that we have here
a summary of the files in version 1 of this site, 
and it seems that I've taken it upon myself to factor out 
some common code, header.php and footer.php.
Well, let's take a look at what's inside the first of those.
Header.php looks familiar, 
but notice where does it get cut off?
Right after line 19, so that's everything that was common
from the files index.php, lectures.php,
week1 and week2.php from the previous example.
What I did was copy and cut everything that was common to all those files,
put it in a separate header file, and similarly in footer.php
did I apply the same principle whereby the only interesting lines
in footer.php are these two, close body and close HTML.
But what this means now is that in the new version 
of index.php notice how much simpler it can get.
Granted, a bit more cryptic looking, a little less intuitive
to follow top to bottom, but my God, all of that redundancy is now gone.
We require using a PHP function literally called require up top,
which is very reminiscent, recall, of C's #include mechanism. 
We require header.php at the top.
We require footer.php at the bottom, and the only thing that is different 
or special about this file is the content that's meant to be unique to it.
If I then go into, say, lectures.php, the same principle applies.
Again, some comments up top, but then I require header, require footer,
and in between it's only the content that's actually changed.
And if we looked into Week 1 and Week 2 we'd see
that the same principle had been applied there.
Well, we're not quite done there.
Let's take a look at version 2, which has a similar structure,
but notice now I've introduced something else.
In line 10 I've introduced helpers.php,
which apparently contains helper functions.
A helper function is generally a relatively short function 
that you write to help you out in various places,
and let's see what's inside of helpers.php.
In this case, it looks like it has 2 functions.
Recall from the other day with our cube example
you can define your own functions in PHP, and what I've done now is I've 
defined functions called render footer and render header,
the first of which takes a parameter called data,
whose default value is an empty array, as suggested there,
and we can actually write this even more succinctly in the latest version of PHP
by saying open square bracket, closed square bracket.
That means an empty array of size 0 but nonetheless an array. 
This extract function is a little special in that
what it does is it takes as its argument an associative array
that has 0 or more key value pairs, and if you have a key of foo
and a value of bar the extract function 
creates a situation in which now, as of line 11,
you have a local variable called $foo whose value is bar.
And if you had more keys and values in the data array,
similarly would they be extracted into the local scope
or name space so that footer.php and 
the same idea down here so that header.php
have access to those variables.
In fact, let me open up again header.php
and draw attention to now what it looks like in this version.
Rather than hard coding CS50 as the title for every single page
notice the dynamism that's possible now.
In line 5 I'm echoing a title variable,
but first I'm passing that title variable to a function called htmlspecialchars.
A stupid name for a function, long as it is, but it really does what it says.
It ensures that any special characters
in the string that's been passed in are properly escaped HTML.
This is actually a way of avoiding something called a cross site scripting attack
whereby someone can maliciously or accidentally
inject their own HTML into your website
by pasting into some form, for instance, 
something that you weren't quite expecting, particularly JavaScript code,
as we'll talk about in a week or two's time.
This now header.php, it's a view
in the sense that it allows you to view aesthetically the contents of some data set.
But more specifically, it's a template.
This is sort of a blueprint now of what we want the header of every page to look like,
but there's some dynamism in that we want the title to be dynamically inserted
based on the title variable
that was extracted when we called, again,
the render header function.
Now, if we looked at render footer, there's actually not much use of that right now
because in footer.php there's no dynamism whatsoever.
There could be, but at the moment it's a hard coded list of 2 tags, 
but the same idea applies, so that actually suggests why 
did we waste time having a render header and a render footer function?
Let me go instead now into version 3,
and in version 3 in helpers I decided to simplify it even more.
Let me have one render function.
Let me have it take another argument, this time called template,
which is meant to be the name of a template,
and then I will concatenate presumptuously .php to that variable's value,
and then if it exists foo.php, bar.php or header.php and footer.php,
then I'm going to go ahead and extract the variable data
and then require that path.
In other words, to use this now, if I open up index.php
notice that I don't call render header anymore.
I just call render, but I pass in a quoted value of header 
to make clear which template I actually want to load.
Then over here notice what I'm doing.
I'm passing in dynamically a key of title,
a value of CS50, and this too, as we saw before,
could be made more succinct in the latest version of PHP
where I can replace the array function with square brackets,
which I propose is even more readable and certainly 
a little easier to type.
And of course, with the render footer call at the bottom,
we don't bother passing in a second argument at all, no associative array,
because there's nothing dynamic inside of that footer.
It's just some close tags for HTML.
Good, we're taking steps towards really cleaning things up here,
but let me open up 2 final examples.
This one, number 4, notice that I've made a conscious decision now
to improve upon the previous example by finally using some hierarchy to my files.
Notice that in this summary, in this read me, I've introduced
an includes directory and a templates directory
whose contents are going to be the things I want to include
and the templates that I want to render, respectively.
This is really me being anal and trying to keep things tidy, 
keep related files together, but the end result 
is that we now have a slightly tidier setup, but we have to remember now
in, for instance, index.php 
when we require the file helpers.php
we have to now require it via includes/helpers.php
rather than just saying helpers.php because now it's actually in a sub directory.
Now, as an aside, you'll see in these examples and some others
functions like require, require once. 
There's actually a function itself called include, and they all have slightly different behavior.
Here I say require once to make super clear that I only want those 
helpers included in my project once.
But if I'm careful and if I'm actually thinking through my logic properly
it should suffice too just to say require up top
so long as I myself don't accidentally require that same file elsewhere.
In fact, this is a slightly more efficient way of doing things then using 
require once, so I'll trim it down to just require.
Let's take one step further.
This last example now, version 5, has an even cleaner folder hierarchy.
Notice what I've done here per the read me in this final version
is now I have my HTML directory, which I've had all this time,
but inside of there now is only index.php, lectures.php, 
week1.php and week2.php.
The includes directory now lives alongside of the HTML directory,
so at the same level as a sibling, so to speak.
So does the templates folder. 
The key takeaway here is I've introduced a little bit more structure,
but the key feature now is that only the files 
that need to be web accessible, publicly addressable 
by a URL on the public internet are in my HTML directory.
Meanwhile, other files, helpers.php, footer.php,
header.php, that are arguably maybe more sensitive,
maybe helpers actually has some user names and passwords or some intellectual
property of mine, functions I really don't want the world to see, even if accidentally.
It's good practice to keep out of the public HTML directory
any files that do not need themselves to be public.
All you have to do in this case when looking at, for instance,
the HTML directory's index.php file,
notice we just have to be a little more careful when requiring
or require once this file.
I need to first do .. to go to the parent directory,
then do /includes/helpers.php
to dive back down in to get the file that I care about.
Any questions then on MVC
or this relatively simple incarnation thereof?
And let me make clear that we focused quite a bit on the
V here, the views and the factoring out of these templates.
We've not really distinguished M from C just yet.
In fact, there really is no M here, and even our C,
controller, isn't really doing all that much, but you'll get much more
familiar with both of those 2 letters from MVC,
or rather, you'll get much more familiar with the C
in MVC for problem set 7, so there's more of that on the horizon.
Questions?
There's actually no one here.
Okay, let's now move on to the second and final topic for today.
That is the introduction of a database.
Up until this point we've had a few ways of storing data.
We've used variables.
Back in our C file, I/O discussion we started using text files
and using files like fprintf, and then we even started
talking about CSV files a little bit, comma separated values, 
so all of these allowed us to have data stored
either non-persistently or persistently.
But even CSVs aren't really conducive to searching
and inserting and deleting.
It's really just a stupid text file separated by commas
row by row by row by row, so if you want to 
search that file the best you can do really is linear search.
You have to start at the top of the file, read the whole thing in, 
and look for some value of interest.
If you want to insert into it you have to do the same thing,
iterating over it and inserting in a particular place,
and in fact, you have to do all of the searching logic yourself.
You can't do clever pattern matching on a CSV file unless you yourself write the code.
You can't do filtration of a CSV file 
unless you yourself write the code.
Wouldn't it be nice if someone else put in all of the effort
to actually make searching easy and insertion easy
and deletion and updating and so forth?
That's exactly what a database is.
SQL, structured query language, is yet another language
that we're introducing here today, but this too is fairly accessible, 
and what we're really going to do is just pluck out of it some of the most salient
characteristics so that for pset 7, and if you do something web based,
your final project, you have the ability to express yourself
in terms of data queries.
You have the ability to store a little or a lot of data
in a much more structured way that will at the end of the day
make your life easier because with SQL you can express yourself
much more precisely, much more methodically in order to 
get back some subset of data from a larger corpus of data.
You can think of a database, in this case, an SQL database, really like Excel
or Numbers where it's a spreadsheet,
or maybe multiple spreadsheets, and a spreadsheet, of course,
has rows and columns, and that's because 
SQL databases are relational, relational in the sense
that they store data in terms of these tables,
rows and columns.
They are higher performing than something like a spreadsheet,
and a spreadsheet is meant to be used by a human. 
A database is meant to be used by a programmer
writing code against it, so the incarnation of a database
is going to be either command line.
One of the most popular relational databases out there is, again, MySQL,
which is wonderfully free, very high performing, and this is what 
Facebook used very early on and to some extent still today
to store a lot of its data, and we'll see in a moment
that using relatively simple commands
we can select data, insert data, update data,
delete data and the like, but thankfully, there's a more user-friendly interface
than just typing at a black and white prompt here.
We'll use for pset 7 and beyond a free tool called phpMyAdmin.
The name is coincidental.
The tool happens to be implemented in PHP,
but that's fundamentally irrelevant.
What's useful about phpMyAdmin is that it's a web-based utility.
We've pre-installed it in the appliance for you,
and with it you can create tables in a database, 
you can insert data, delete data, and generally see
your data in a fairly user-friendly environment.
Your users are not going to use phpMyAdmin.
This is really just an administrative or developer's tool
with which to see and poke around your data and figure out how to structure it,
much like you yourself might use Excel or Numbers,
but it's going to be a great way of visualizing what's going on underneath the hood
so that you can focus on the interesting problem solving and not so much
on the arcane commands.
Let's take a look at an example of data that might be stored tabularly 
in a relational database.
Here's one such example.
Now, unfortunately, phpMyAdmin erred on the side of throwing way too many words
and graphics at you, but if you hone in just on the
ID column, the user name column, and the hash column,
this is effectively a spreadsheet, but it happens to be a snippet
of a table inside of the appliance
using a file that we provide you with in problem set 7.
In particular, we give you a file that represents
a user's table, so a spreadsheet containing users with 3 columns,
one of which is a unique ID starting at 1 and being incremented thereafter.
The second column is a user name, and those of you who did Hacker,
the Hacker edition for pset 2, might recognize some of these user names at least.
On the right-hand side are passwords, but they're not literal passwords.
They're hashes thereof, so it turns out
that storing passwords in a database is a really bad idea.
You've all probably read at some point of some website
or some company's database being compromised, and then you have to 
change your password, you need to get refunds on things
because some bad guy actually broke into your account as a result.
Storing passwords in clear text, unencrypted in a database
is completely asinine, and yet it's greatly amusing
then to read about some very well known companies
sometimes in the press whose databases are compromised,
and that part is not funny, but the fact that the databases contain unencrypted
passwords is ridiculous because literally with one line of code
you can protect against that particular threat, and that's what we've done here.
Even for our fake little CS50 Finance version 
we are encrypting passwords just for good measure, and the fact that
all of these passwords start with $1$ is just a convention. 
That just means they're encrypted or really hashed,
which is like a one-way encryption function 
whereby you can't reverse its effects with something called MD5.
The fact that 50 is after that means that a salt value
of 50 was used for hashing all of these passwords except for one.
Mine, of course, as you can see there, HA, 
was using a different salt, so those of you who got slightly tripped up
maybe in Hacker 2, that might have been the result of our having used
a different hash than the others because my password is actually the same
as some other user up there.
In fact, if you've been waiting all these weeks to find out
what those passwords were here were the passwords that you were challenged
to crack in the Hacker edition of problem set 2, so none too tricky.
In fact, malan's was the same as jharvard,
but if we go back they looked different.
Focus on jharvard in crimson because they were salted differently.
The algorithm was perturbed in a way that the
hash value, the encrypted value looks a little different 
because the inputs were slightly different, but the password underneath the hood
was still ultimately crimson.
Now, who cares about this?
Well, we are providing you with the sample users, the sample user names 
and hashes of their passwords so that you actually have some
customers for CS50 Finance when you first get off the ground with your code.
You'll have to implement more tables inside of MySQL, inside of the database.
You'll have to create more spreadsheets, effectively, but we decided to give you this one
to get you started, and you'll see that the problem set specification
walks you through the process of importing this table
and also explaining what some of the characteristics are, 
and you'll also see that we provide you with the code
to handle the hashing or the encryption of these passwords,
so you don't need to worry too much about what MD5 or the like is actually all about.
So, SQL, structured query language.
This is, quite simply, the language we're about to start using in pset 7
and perhaps beyond to request data from some database.
The data is, again, stored tabularly in these relational tables, columns, and rows,
but using some relatively simple syntax like delete,
insert, update and select can we do exactly that.
We can delete from the database, insert,
update data, as well as select, that is, retrieve data from the database.
How do we go about doing this?
Let me go ahead into the appliance.
Let me pull up http://localhost,
which, again, is the local appliance itself.
That's its default nickname.
And let me go to /phpmyadmin.
This happens to be a special URL that the appliance is preconfigured to understand
that immediately prompts me for a user name and password.
As usual, I'm going to type jharvard and crimson,
but realize that's the administrator account on the computer.
It's just a coincidence that there's also a jharvard registered for CS50 Finance.
Jharvard, crimson, enter gives me the user interface that we saw
a glimpse of a moment ago, and it's a little overwhelming at first,
but rest assured, you're never going to have to click most of the links in this tool.
You'll end up using a small subset that are super helpful,
the first of which is databases up here.
If I go up to databases, notice that I'm prompted to create a database.
This is like creating a new Excel file, effectively.
I'm going to go ahead and call this lecture, and I'm just going to ignore the field there, collation.
It has to do with the representation of data therein, 
and I'm going to click create, and now notice as I let go of create
on the left-hand side where it says no databases
I should soon see the lecture database.
If I now click at the left-hand side, the lecture database,
notice my tabs change a little bit.
I've got structure, SQL, export, import and some other things.
Structure is pretty much blank.
No tables found in database, as it says here,
so let's create a table, and let's go ahead and create a table
like students, and how many columns do we want?
Let's keep this simple, and let's record for every student 
an ID number, a name, and an email address.
We'll keep it simple like that, so 3 columns, go.
The form that you see here is now a little bit messy and overwhelming,
but we just have to go through it row by row, so really quickly let's give
the first column in this database a name of ID for the unique identifier.
It will be an integer. I can actually ignore length and values.
An int is going to be 32 bits no matter what you type in there, so let's leave it blank.
Default value, I could make it null, as defined. 
I'm going to leave that alone. Let's not worry about default values.
Let's scroll over here to the right, attributes.
This is interesting. 
Let's go ahead and somewhat arbitrarily say that IDs must be unsigned. 
Let's not waste any negative numbers. 
Let's go 0 to 4 billion, give or take,
and then let's not touch any of these fields just yet over there,
but then let me type in name down here, 
and then the other was email, so the catch is email
and name, obviously not integers, so let's change these to a different field.
It turns out varchar, variable length char, 
is like a string in an SQL database
but a variable length, and you actually have to tell it in advance the
maximum length of the string, so I'm going to somewhat arbitrarily
by convention type 255 characters.
I could totally say 32. I could say 1,000.
You kind of need to decide for yourself based on your demographics what is 
the longest student's name and go with that number or a little bigger,
but what's nice about a varchar is it's not going to waste 
255 bytes on every student's name.
If it's D-A-V-I-D it's not going to use a whole 255 bytes,
but that's an upper bound, so I'll go with 255 just by convention,
but we could debate that to be some lower value, and for email address
just to be consistent 255, but again, we could have the same debate.
But I'm going to do one other thing over here on the right-hand side.
What's powerful about a database is that it can do a lot of heavy lifting
or complex work for you.
In particular, I really don't care what my student's ID numbers are.
It's just meant to be a unique identifier in a database
so I have a 32-bit succinct representation of that student
so that I have some way of uniquely identifying them 
lest there be 2 Davids, for instance, in a class.
In fact, I'm going to check this AI box, auto increment,
so that the database, MySQL, figures out 
what every newly inserted student's ID is going to be. 
I don't even have to care about that in my code,
and I'm also going to choose something under the index menu.
The index drop down here has primary, unique, 
index and full text.
You can maybe guess what a couple of these things are, 
but it turns out in relational databases
you the programmer or the database administrator get to preemptively 
give hints to the database as to what fields
in a table are a little special.
For instance, in this case I'm going to say that ID
is going to be a primary index, otherwise known as a primary key.
What this means by definition is that ID henceforth 
will uniquely identify students in this table.
No student will have the same ID because I'm imposing this constraint or this index.
Moreover, what this is going to do for me is it's going to tell 
MySQL that ID is special.
I care particularly about ID, so go ahead and do your fancy data structure's magic,
build up some kind of tree.
Typically it's something called the B-tree, which we didn't look at weeks ago,
but it's another such data structure similar in spirit to the binary trees
and tries that we looked at, but it's going to say
to the database this field is so important that I probably 
want to be able to search on it, go ahead and build up some fancy
data structure in memory to expedite searches so that ideally 
they're constant time or at least as close to that as possible
so that it doesn't devolve into linear search, which is not going to be
the most high performing approach.
By contrast, email address could have been a primary key.
In theory, everyone's email address is unique, unless you're sharing some account,
but it's generally not good to use something like a string
as a primary key because if its purpose in life is to uniquely identify
rows in your table there's no reason to use 255 bytes maximally 
to uniquely identify someone if you can get away with just 4 bytes
or a 32-bit int.
In general, a primary key should be short and succinct 
and ideally something like an integer or a big int, which happens to be 64 bits.
But an email address should be unique, and one of the features of a database too
is to enforce uniqueness for me.
By selecting unique here next to email, even though email itself
is scrolled off the screen, I'm saying to the database
don't trust me.
Don't let me insert into the database
the same email address twice, even if I'm an idiot and am not
very good with my ifs and else ifs and actual PHP code
and I accidentally let the user register with an existing email address
the database is yet another level of defense for correctness
to ensure that that duplicate email address doesn't end up in the table.
Now, by contrast, for name you probably don't want to make that unique
because then there could never be 2 Davids or 2 Mike Smiths, for instance,
in your database, so that one we'll just leave alone.
I'm going to go ahead and click save at bottom right,
and everything looks good, but notice up here
this is a part that for now we won't spend too much time on
because the syntax is a little complex, and we don't have to create tables 
all that often, but SQL itself is a language,
the syntax for which is right here that I've highlighted.
What phpMyAdmin really does is it creates a web-based GUI for you
with which you can save time and not have to manually type out 
a fairly long SQL query like that.
In other words, if you wanted to manually create this table,
either at that black and white prompt or even in phpMyAdmin
by using this other tab, this SQL tab where you can type in any SQL queries
you want, frankly, this would have taken me a minute
to actually remember the entire syntax, and even then I probably would have
made some typos, so this tool is useful for things like that, and it's also instructive.
You can begin to infer what the syntax is 
just by the nice color coding that phpMyAdmin is adding
for our visual convenience.
But now let's do this instead. 
Let me go to the insert tab at top, and let me go ahead and insert for instance
an ID of let's say—actually I don't care.
It's going to auto increment. I'm going to let the database deal with this.
But I shall be David, and my email should be malan@harvard.edu.
Let's go ahead here and put in Mike Smith as another one.
I'll give myself a last name as well,
and we'll have him be smith@example.com,
and then where do I go next?
Well, it looks like go is the button to click, and voila.
Notice at the top 2 rows inserted.
This is the actual SQL query.
That's the phpMyAdmin tool executed for me,
but the end result, notice, if I now go to the browse tab,
is to see 2 rows in this table, very reminiscent aesthetically
of the table we saw earlier for our users from pset 7,
one of whom is David Malan, one of whom now is Mike Smith.
But just to be clear, I don't need to use phpMyAdmin, 
and indeed, you're going to soon be writing code for pset 7
that automates the process of adding rows, deleting rows, updating rows and the like,
so let me instead go to the SQL tab up here
and type in select * from students where 
email = "malan@harvard.edu."
In other words suppose now you have 
some HTML form, and the user types in their email address, among other fields,
and the goal now is in PHP on the back end code
to actually look up that user's other details.
What is your full name? What is your ID number?
You could write a SQL query like this, select * from students
where email = "malan@harvard.edu."
And if I then click go, notice that I should, and indeed I do, get back just one row.
Mike is omitted from this result set, as the collection of rows
is generally called, because he doesn't have the same email address as me.
Now, again, here for pset 7 you'll use phpMyAdmin as an administrative tool
and a pedagogical tool to learn your way
around the world of SQL, but at the end of the day
you're going to be writing these queries inside of actual PHP code,
and so stay tuned in Zamyla's walkthrough in particular
where you'll get a tour of the distribution code for this problem set
where we've given you not only the aesthetics for the login page
and the nice sexy logo that says CS50 Finance, but we've also given you 
a bunch of functions that will make your life a bit easier.
We've also written part of the pset for you,
the login portion of it in particular, to give you a sense of a representative design
that actually uses a controller, for instance, index.php, 
login.php and the like, and then you'll see the pset also has a templates directory
that has all of your views, all of the aesthetics.
And so the overall workflow in pset 7 is going to be that
your users visit a controller via a URL in a browser.
That controller contains PHP code that you wrote, and inside of your PHP code
might be some lines of SQL nested in between double quotes
and passed to a function we wrote called query
that will help you talk to the database without using something like
an administrative tool like phpMyAdmin.
You'll be able to write SQL statements in PHP code
and get back a PHP array of the result set, 
of the rows that actually match that query.
And similarly will you be able to do inserts or deletes
or updates or the like, the syntax for which is fairly similar,
and you'll see from some online references, from the distribution code
and from the pset pack itself exactly how to go about doing that.
Realize ultimately we're really just scratching the surface of SQL
and of MySQL, but the power of it really is that it frees you 
to focus on the problems you want to solve, the use cases you want to implement
without having to worry quite as much, at least early on,
about where and how to store and search your database,
and this is quite literally where Facebook itself got its start
using MySQL and then using more MySQL servers and then more MySQL servers
before long until they then had to really start thinking hard about how to 
store data, how to store things even more efficiently,
so even though we'll take for granted the fact that indexes and unique constraints
and so forth just work there's a very interesting conversation
that this can all eventually lead to, so realize that we're just scratching the surface
of what could eventually for you or your projects become quite a bit of big data.
With that said, let's end here, and we'll see you next week.
[CS50.TV]
