>>Now we're going to combine what we did
with the splitting of strings
with the accessing of data
in a comma delimited file,
or comma separated file.
So, what we'll be doing is using
that Nike 2 file that you created.
I recommend that you have the Nike 2 file
stored in the same folder,
or the same directory,
that you want to store this program
that we're working on in.
If you do not, then you need to include
the whole path to the file
as you're doing the program.
But I'm going to store it in the same folder
so I don't have to worry about that.
I'm going to call this
program data mining one,
and what we will do in this, is we will
open a data file, and
we will get individual
lines of data out of the file.
Now it's going to come
out as a whole line,
as we did when we first were accessing
and looking at the data
in the notepad document;
it will be similar to that.
So I'm going to set up my main,
and access it there, alright.
So, first thing I need to do is
let the computer know which file it is
I want to open, what I'm
going to call that file
for the purpose of like
using a variable name,
for the purpose of accessing it here,
during the program, and
how I want to open it.
Do I want to open it to read from it,
to write to it, or to append to it?
And of course I want to open it to read.
So I'm just going to give
this a variable name my file.
So as I write my program here,
whenever I'm wanting to access
that file that is stored
out on my computer,
I'll just refer to it as my file,
rather than as anything else.
I can use the open command,
and inside the open
I need to have two specific things:
I have to have the name of the file,
which is Nike 2 dot CSV, then a comma,
and then how I want to open it.
I want to open it for reading.
That's called the mode.
So what this does is it opens this file
and gives it the variable name
so that whenever I'm
accessing it, that file,
in the program, I'm going to be
referring to it as my file,
rather than as Nike 2 dot CSV.
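
Here is a minimal sketch of that open step. The exact spelling of the file name is not given here, so the sketch uses a hypothetical stand-in file called nike_sample.csv. It writes a few made-up rows first so that it runs on its own without touching your real file. The column names and values are invented for illustration.

```python
# Made-up stand-in data (hypothetical file name and values) so the sketch
# runs on its own; in the lesson you would open your own Nike 2 CSV file.
with open("nike_sample.csv", "w") as sample:
    sample.write("Date,Close\n11/21/2016,51\n11/22/2016,52\n11/24/2016,59\n")

# First argument: the file name. Second argument: the mode, "r" for reading.
my_file = open("nike_sample.csv", "r")
print(my_file.name)  # the file on disk that my_file refers to
print(my_file.mode)  # the mode it was opened in
my_file.close()      # closing when done is good practice
```

If the file lives in a different folder than the program, the first argument would need to be the whole path instead of just the name.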
Alright, so, the first thing I want to do
is get all the data in from the file.
Right now, all I've done is said
that this is the file I wanna use,
but I haven't gotten any of the data in.
So to get the data in, I'm going to,
let's call a variable list of lines,
and I'm going to read from my file,
using the read command,
and I'm going to split it
by line, so use splitlines.
What this will do, is it
will take that data file,
let me see if I still have it here, yes,
so it's going to bring in
this first line of data
and then it will bring
in the next line of data,
and so on, right?
Each row of data in the spreadsheet
is going to come in as a separate line.
This is all coming in as
one massive thing now,
but it's going to be
separated out by lines.
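
As a rough sketch, reading the whole file and splitting it by line could look like this. The file name nike_sample.csv and its contents are made up for illustration, and the variable names my_file and list_of_lines are written with underscores because Python names cannot contain spaces.

```python
# Made-up stand-in data (hypothetical) so the sketch runs on its own.
with open("nike_sample.csv", "w") as sample:
    sample.write("Date,Close\n11/21/2016,51\n11/22/2016,52\n11/24/2016,59\n")

my_file = open("nike_sample.csv", "r")

# read() returns the whole file as one big string;
# splitlines() cuts that string into a list with one string per row.
list_of_lines = my_file.read().splitlines()
my_file.close()

print(len(list_of_lines))  # how many rows came in, headings included
print(list_of_lines)       # the full list of row strings
```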
So if I wanted to now, I
could print out values.
Similarly to how when,
in the presentation,
let's see, two videos ago, when we did
the splitting strings, remember when
we looked at that
string, then we split it,
and we could have element zero,
element one, element two, for each
of the words in that string?
Well here we have a data file,
and what this splitting is doing
it's reading this whole thing in,
but it's going to take each
line as a separate thing.
So this is element zero, element one,
element two, and so on, and we'd go
down to the end, however
many elements there are.
And each of those is
going to be stored as,
in essence, a string in a list of strings,
which I'm calling list of lines.
So now, just as I could with the string,
when we split the string and
I could access the first word
by printing out element zero,
I can do the same thing here.
I can print out element
zero of my list of lines,
and it should give me that first line.
Let me save this data mining one dot PY
and I will run it, whoops, list of line?
Oh well yes, that's because
it should be list of lines.
Program's only as good
as the programmer, right?
Typos, user error.
Okay, so it did.
It brought in that first
line that's in the data file,
which is this very first line.
Now, it doesn't have data in it, right,
it only has the column headings,
but nonetheless it
brought the first one in.
If I wanted to bring in the next line,
which is the first line of data,
all I have to do is bring in element one,
which will be the next
line, as it split those up.
So now I have not only
the column headings,
but now I have the values.
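
A small sketch of those two prints. To keep it short, the list below is typed in directly with made-up values. In the real program it would come from reading the file and calling splitlines.

```python
# Hypothetical rows, standing in for what read().splitlines() would return.
list_of_lines = ["Date,Close", "11/21/2016,51", "11/22/2016,52", "11/24/2016,59"]

print(list_of_lines[0])  # element zero: the column headings
print(list_of_lines[1])  # element one: the first row of actual data
```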
Now notice these are not
in any kind of formatting.
I haven't separated out into columns,
there are no spaces or tabs or anything,
because this is just read in as text.
Just as a string, it's reading in
exactly what it sees in this data file.
What we'll be doing as we go into this
is we'll be separating
it out by the commas,
and then the dates we'll actually
be separating those out by the slashes.
But this is to get us going.
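
As a preview only, and assuming a row looks something like the made-up one below (the real column layout and date format may differ), splitting by commas and then by slashes might be sketched as:

```python
# Hypothetical row of data; the actual file's columns may differ.
line = "11/24/2016,59"

fields = line.split(",")           # separate the row at the commas
date_parts = fields[0].split("/")  # separate the date at the slashes

print(fields)      # ['11/24/2016', '59']
print(date_parts)  # ['11', '24', '2016']
```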
If I wanted to print out
the very last of the lines,
the rows of data in my
file, I could print out
list of lines and then think a moment:
How am I going to get the very last one?
Well, I'm going to use len, because len
will tell me how long list of lines is.
But what I have to remember is
that line zero is the
first line, so now if my,
let's say my list has
ten lines of data in it.
So the length is going to be
ten, and I start with zero,
I don't want it to try to
access the tenth element
of that list, because it doesn't exist.
That's too far.
I need to be sure that
I subtract one from this
so that it doesn't go out of bounds
of the range of the list.
And let's run this, and
sure enough it works fine.
Notice what happens if I
leave off this minus one:
I will get an error.
Out of range, the index out of range.
It went, tried to go up to
ten, but there was no ten.
Or whatever the equivalent,
whatever the length of lines is.
So I need to have that minus one in there,
so that it doesn't go too far.
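
A sketch of getting the last line with len, and of what goes wrong without the minus one. The list is typed in with made-up values so the example runs on its own.

```python
# Hypothetical rows, standing in for what read().splitlines() would return.
list_of_lines = ["Date,Close", "11/21/2016,51", "11/22/2016,52", "11/24/2016,59"]

# Indexes start at zero, so the last valid index is the length minus one.
last_index = len(list_of_lines) - 1
print(list_of_lines[last_index])  # the last row of data

# Leaving off the minus one asks for an element that does not exist.
try:
    print(list_of_lines[len(list_of_lines)])
except IndexError as error:
    print("IndexError:", error)  # IndexError: list index out of range
```

Python also accepts list_of_lines[-1] as a shorthand for the last element, which gives the same result here.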
If I wanted to print out
all of the lines one by one,
I could just set up a little for-loop
for I in range zero through
len of list of lines,
print list of lines sub I.
And that will print out
all the data that is,
whoops, oh for I in range, okay,
so that's printing out all
the data that's in the file.
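
The loop just described could be sketched like this, again with a made-up list standing in for the file's contents:

```python
# Hypothetical rows, standing in for what read().splitlines() would return.
list_of_lines = ["Date,Close", "11/21/2016,51", "11/22/2016,52", "11/24/2016,59"]

# range(0, len(...)) produces 0, 1, 2, ... up to the last valid index,
# so no minus one is needed here.
for i in range(0, len(list_of_lines)):
    print(list_of_lines[i])
```

Writing "for line in list_of_lines:" and printing line would print the same thing without using an index.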
Clearly that's not what
I want to have happen
in my program, but I could do
that and I can see it, right?
So pause the video, type this up,
make sure you can open your
file, it can read from the file,
it splits it up into individual lines.
We printed out three separate lines here:
the first one that had
just the column titles,
then the first line of data, and then
the last line of data from the data file.
And then we also had a loop here,
to print all the values out.
By the way, it wouldn't hurt, here,
let me comment this out and run this again
so you see just the two lines of data.
Here I'm assuming that this
is the last line of data
in my data file, but
just to check to be sure,
I should look at my
data file and make sure
that that is indeed the last line of data.
So 11/24, 59, yes, so it matches up.
Alright, just as a double-check,
to make sure I'm getting good data,
processing the right thing.
Alright, so pause the video, type this up,
make sure you can get it to work.
This extra little loop
here to get it to print
is optional, but it's just
nice to know how to do it.
Let me know if you have questions.
