>>Now that we know how to open a file, how to bring all the lines of the text in, how to separate those lines into their parts, and even how to split those parts further by the commas and the slashes, this time we can actually do some mining of the data.
Before we start writing the code, though, I'd like to walk through the algorithm for what we're going to do. I don't know that I've shown you this before, but you can use three apostrophes as a block comment. Everything between the opening three apostrophes and the closing three apostrophes is treated as a comment. Strictly speaking, it's a string that Python reads and then ignores rather than a true comment, so it still has to follow the indentation rules, but it works fine in Python 3.7, so I'm going to use it so you can see how it works. Otherwise I'd have to put the hashtag (or number sign, pound, or sharp, whatever you want to call that symbol) in front of each line.
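Here is roughly what the two styles look like side by side in a script. The algorithm text inside the block is only placeholder wording; the point is that Python skips over a bare triple-quoted string, so the code after it runs as usual, while the hash style needs a symbol on every line.

```python
'''
Algorithm (placeholder wording):
  set up variables
  loop through the data lines
  compute and display the average
'''

# The same notes written with hash symbols
# need a # at the start of every line.

message = "The triple-quoted block above was skipped."
print(message)
```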
So here is the algorithm for this program. The program is going to go into the data file, read all of the adjusted closing prices, and add them up. That's the big picture. In the next video we'll divide that up into months, so it sums up each month individually. For right now, all we want to do is add up all the adjusted closing prices and compute the average. So we need to keep track of the sum and of a counter, so that we can compute the average.
So the first thing we'll do is set up the variables, and those will be sum and counter. Then we go through the file line by line and process each line. We set up a loop starting with the first line of data, which is the second line of the data file, right? Inside the loop we read one line of data from list_of_lines, separate it into components, get the adjusted closing price, add the adjusted close to the running total, sum, and add one to the counter. Once we've gone through the whole file, we compute the average, and then we display the sum, the counter, and the average.
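A minimal sketch of this algorithm as a Python function, assuming `list_of_lines` holds the file's lines with the header first, and that the adjusted close is element 6 of each line, as in this data file. The function name, the `column` parameter, and the sample lines are made up for illustration. It uses `total` instead of `sum` so the built-in `sum()` function isn't hidden.

```python
def sum_count_average(list_of_lines, column=6):
    """Skip the header, add up one column, count the rows, and average them."""
    # Set up variables
    total = 0.0
    counter = 0

    # Loop from the first line of data (index 1), skipping the column titles
    for i in range(1, len(list_of_lines)):
        one_line = list_of_lines[i]              # read one line of data
        line_items = one_line.split(",")         # separate it into components
        adj_close = float(line_items[column])    # get the adjusted close as a number
        total += adj_close                       # add it to the running total
        counter += 1                             # add one to the counter

    # After the loop: compute the average (assumes at least one data line)
    average = total / counter
    return total, counter, average


# Made-up sample lines; only the seventh field (index 6) matters here
sample = [
    "Date,A,B,C,D,E,Adj Close",
    "2019-01-02,0,0,0,0,0,10.0",
    "2019-01-03,0,0,0,0,0,20.0",
]
total, counter, average = sum_count_average(sample)
print("Sum =", total)
print("Counter =", counter)
print("Average =", average)
```

With the sample above this should display a sum of 30.0, a counter of 2, and an average of 15.0.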
So this is not much different from a program you might have written to add up a bunch of numbers that the user types at the keyboard. The only difference is that we're getting the data, the numbers, out of a file instead of directly from the user.
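To make that comparison concrete, here is a small sketch (the `add_up` name is hypothetical) where the same loop works whether the text comes from `input()` or from a file's lines. The keyboard version is left commented out because it waits for typing.

```python
def add_up(values):
    """Sum numbers that arrive as text, whether typed by a user or read from a file."""
    total = 0.0
    counter = 0
    for text in values:
        total += float(text)   # float() ignores surrounding whitespace such as "\n"
        counter += 1
    return total, counter

# Keyboard version (interactive, so commented out here):
# typed = [input("Number: ") for _ in range(3)]
# print(add_up(typed))

# File-style version: lines read from a file still end with "\n"
from_file = ["1.5\n", "2.5\n", "3.0\n"]
print(add_up(from_file))
```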
So now I'll go through and start coding this, following the algorithm I've set up. The first thing I'll do is set up the variables, starting with sum = 0 and counter = 0. That's pretty easy. Then I want to set up the loop to process the data. I'm going to use a for loop, for i in range, and I'm going to start at 1 because I don't want to process that very first line with the column titles. So it goes from 1 up to the length of list_of_lines, which takes it all the way down to the bottom of the file, and we'll specify that we're going up by one: for i in range(1, len(list_of_lines), 1). As you know, that last 1 is optional; it just makes clear that we're going up one at a time.
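A quick way to see which indices that loop visits, using a made-up four-line list. `range` stops just before its end value, so the last index is `len(list_of_lines) - 1`, which is the last line.

```python
list_of_lines = ["header", "row 1", "row 2", "row 3"]

# Start at 1 to skip the header; stop before len(list_of_lines); step by 1
indices = list(range(1, len(list_of_lines), 1))
print(indices)

# The step of 1 is the default, so leaving it out gives the same range
print(list(range(1, len(list_of_lines))) == indices)
```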
Now that I'm going through each line separately, I'm going to follow pretty much the same pattern I had in the last program. So I'll have one_line = list_of_lines.split(). That's meant to take the first line and split it, but I also need to tell it how to split it, right? I need to split it by the commas.
At this point it wouldn't hurt to go ahead and test this.
(ding)
Oops, unexpected indent. Oh, I didn't have these at the same indentation. Wait a minute. That's at line eight, column four. Column four, huh. So perhaps it doesn't like where these comments are indented. Once they line up with the code around them, it's clear they're not at another indent level. Oh, okay.
Next I come across another error: the list object has no attribute split. So what had I done here? I split my lines, and then I wanted to take one line... Let me back up. one_line needs to be list_of_lines[i], and then I can split that. I left out a step, didn't I? So then I can have my line_items = one_line.split(","), separated by the commas.
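Here is a small sketch of both versions with made-up lines: calling `split` on the whole list raises the `AttributeError` seen above, while picking out one line (a string) first and then splitting it works. Note that the last item still carries the line's `"\n"`; `float()` tolerates that, and `.strip()` would remove it if needed.

```python
list_of_lines = [
    "Date,Open,Close\n",
    "2019-01-02,1.00,2.00\n",   # made-up values
]

# Calling split on the whole list fails: a list has no split method
try:
    list_of_lines.split(",")
except AttributeError as err:
    print("Error:", err)

# The missing step: pick out one line (a string) first, then split it
i = 1
one_line = list_of_lines[i]
line_items = one_line.split(",")
print(line_items)
```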
And then what I want to print out here is line_items. Now, because I know my data file has many, many lines of data in it, I'm going to change the loop for right now and put 5 where len(list_of_lines) was, just so I see a few lines rather than all of them. And there we have the five lines. Well, four lines: it goes from 1 up to 5, so it's four lines.
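The temporary test loop can be sketched like this with a made-up list of eleven lines: because `range(1, 5)` stops before 5, it visits indices 1 through 4, which is four lines.

```python
list_of_lines = ["header"] + [f"row {n}" for n in range(1, 11)]

# Temporary test loop: stop at 5 instead of len(list_of_lines)
shown = []
for i in range(1, 5):
    shown.append(list_of_lines[i])
    print(list_of_lines[i])

print(len(shown))   # range(1, 5) yields 1, 2, 3, 4
```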
So that works. At this point I can put len(list_of_lines) back in place of the 5, so that I would go through all of the lines. But I don't want to print them all; that would go on and on and on. I only did that for testing. Right. So now I have those line items. Remember that in the data file we know that this is element zero, then one, two, three, four, five, six. Element six of an individual line is the number we want.
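Counting positions from zero is easy to get wrong, so printing each item with its index helps. This sketch uses a made-up line whose placeholder fields only stand in for the real columns; it assumes, as in this data file, that the wanted value sits at index 6.

```python
# Made-up line; assumes the adjusted close is the seventh field (index 6)
one_line = "2019-01-02,a,b,c,d,e,12.34,f"
line_items = one_line.split(",")

for position, item in enumerate(line_items):
    print(position, item)

print(line_items[6])   # still a string at this point, not a number
```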
So let me find the right one here. I can take that value out and add it onto my sum with sum += line_items[6], and I know that's the right element. Now, this is going to create a problem, and you'll see when I try to run it that I get an error message: unsupported operand type(s) for +=: 'int' and 'str'. What's happening is that the data brought in from the text file, the CSV file, is a string. It's characters, not numbers, so we can't add it to a number. I know the data here are floating-point numbers, with decimal values, so in my code I just need to do the conversion. So here I'll make this sum += float(line_items[6]).
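A small sketch of the error and the fix, using made-up values: adding a string to an integer raises `TypeError`, and converting with `float()` first makes the addition work.

```python
line_items = ["2019-01-02", "12.34"]   # made-up fields; split() always gives strings
total = 0

try:
    total += line_items[1]   # int + str is not allowed
except TypeError as err:
    print("Error:", err)     # unsupported operand type(s) for +=: 'int' and 'str'

total += float(line_items[1])   # convert the text to a floating-point number first
print(total)
```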
And then, after the loop is done, let's print out the sum with print("Sum =", sum), and run just that much. Ooh, and here I have some big number for the sum. Well, it would be really nice to know whether that's right, wouldn't it? It went through, and maybe it did what it was supposed to do. To check, in my Excel spreadsheet I can just scroll down to the very bottom and go to the row after the last row of data. Let me make this a little bigger. I choose the SUM function, which is this one, and it automatically picks up all the numbers in the column above it. I come out with $12,432.18, and in fact 12,432.18 is exactly what I came up with from my program.
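When checking a program's sum against a spreadsheet, it helps to know that floating-point addition can leave tiny differences in the last digits, so an exact `==` comparison can fail even when the result is effectively right. A small sketch with made-up values, using the standard library's `math.isclose` to compare within a tolerance:

```python
import math

values = [0.1, 0.2, 0.3]   # made-up data
total = 0.0
for v in values:
    total += v

print(total)                      # 0.6000000000000001 (floating-point rounding)
print(total == 0.6)               # False
print(math.isclose(total, 0.6))   # True: equal within a small relative tolerance
```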
So my program has gone through, pulled the adjusted closing prices column out of the data file, and added those values together, and I have the sum. What I haven't done yet is print out the average, as I said we would. To do the average I need the counter. So every time I go through the loop, I add one onto the counter with counter += 1. After the loop is done, average = sum / counter, and then I can print("Average =", average).
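A sketch of the counter and average steps with made-up prices (still strings, as they would come from the file). The `if counter > 0` guard is an addition not in the lesson's program; it avoids a `ZeroDivisionError` if a file turns out to have no data lines.

```python
prices = ["10.0", "20.0", "30.0"]   # made-up adjusted closes, as strings

total = 0.0
counter = 0
for text in prices:
    total += float(text)
    counter += 1          # one more value processed

# Guard against an empty file so we never divide by zero
if counter > 0:
    average = total / counter
    print("Average =", average)
else:
    print("No data lines found")
```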
And let's see if that works. That's probably right, but again, I can go out to my Excel spreadsheet. Here I have my sum. I can also use the AVERAGE function; I just want to make sure it doesn't include that last number, the sum, so I'll take that one out. And I get 54.76732. Whoops, let me look at the output. 54.76732, yep, it came out right.
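Besides the spreadsheet, Python itself offers a quick cross-check: the built-in `sum()` and `len()`, or `statistics.mean` from the standard library, should agree with the hand-written loop. A sketch with made-up values that have exact binary representations, so the comparisons are exact:

```python
import statistics

adj_closes = [10.5, 20.25, 30.0]   # made-up values, already converted to float

# The hand-written loop from the lesson, using total instead of sum
loop_total = 0.0
for value in adj_closes:
    loop_total += value
loop_average = loop_total / len(adj_closes)

# Cross-check against the built-ins and the statistics module
print(loop_total, sum(adj_closes))
print(loop_average, statistics.mean(adj_closes))
```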
So my program works, hooray. I have mined the data out of that file, the comma-separated file that Excel shows as a spreadsheet. Hooray.
By the way, you'll notice that the number comes out with a long string of digits after the decimal point. If I want it to show only two places after the decimal, I can just add the round function. In the round function you put the variable you want to round and then the number of places you want to round it to, so here it's round(average, 2). Because these are floating-point values, that means two places after the decimal point. There are several different ways of doing this; this is one of the simplest. So then we have nicer-looking values.
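Two of those ways side by side, using a made-up average: `round()` returns a new, rounded number, while an f-string with the `.2f` format only changes how the value is displayed and leaves the variable untouched.

```python
average = 54.767321   # made-up value with many decimal places

rounded = round(average, 2)         # a new float, rounded to two places
print("Average =", rounded)
print(f"Average = {average:.2f}")   # formats for display; average itself is unchanged
```

One quirk worth knowing: because floats are stored in binary, `round(2.675, 2)` gives `2.67` rather than `2.68`, which the Python documentation describes as expected behavior.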
Okay. So we followed our algorithm, which I have over here as well, and went through it step by step, doing everything we needed to do. Validating the results meant looking at the spreadsheet and making sure our program came up with the same results as the data file.
Now, once we have written this program for one data file, we could apply it to any number of data files. What we would have to change is which element of an individual line the value is being pulled from.
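One way to reuse the program for other files is to make the element position a parameter. This is a minimal sketch under assumptions: the function names are hypothetical, each file has one header line, values are comma-separated, and returning `0.0` for a file with no data lines is an arbitrary choice to avoid dividing by zero. Only the pure function is exercised by the sample; the file wrapper just shows where `open()` would fit.

```python
def column_average(list_of_lines, column):
    """Average one numeric column of CSV text lines, skipping the header line.

    column is the zero-based position of the value within each line.
    """
    total = 0.0
    counter = 0
    for i in range(1, len(list_of_lines)):
        line_items = list_of_lines[i].split(",")
        total += float(line_items[column])
        counter += 1
    return total / counter if counter else 0.0


def column_average_in_file(filename, column):
    """Hypothetical wrapper: read a data file and average one of its columns."""
    with open(filename) as data_file:
        list_of_lines = data_file.readlines()
    return column_average(list_of_lines, column)


# Made-up data with a different layout from the stock file
sample = ["Name,Score,Weight\n", "a,90,1.5\n", "b,80,2.5\n"]
print(column_average(sample, 1))
print(column_average(sample, 2))
```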
You have to be familiar with your data to know what kinds of values it holds. Are they floating-point values? Are they integers? Are they strings? If they're strings that don't represent numbers, you won't be able to do any math on them, of course.
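If you're not sure what a field holds, one option is to try the conversions in order. The `convert` helper below is hypothetical, not part of the lesson's program: it tries `int`, then `float`, and otherwise leaves the text as a string, relying on the fact that both built-ins raise `ValueError` for text they can't parse.

```python
def convert(text):
    """Try int, then float; fall back to leaving the value as a string."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text


for item in ["42", "54.77", "2019-01-02"]:
    value = convert(item)
    print(repr(value), type(value).__name__)
```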
So there are lots of things to consider.
But this is a good start.
So type this up, make sure it runs.
If you run into problems, let me know.
