Hello world
It's Siraj
And how risky is your credit? That is the question we're answering today by looking at this credit risk data set. It's a German credit risk data set, and we want to know: based on your employment history, your family history, your income, are you at risk of not paying back your loan?
Usually we have people who do this, and it takes a long time, and there are a bunch of biases built into the system. You know, you meet the person, whether you're an insurance agent or you work for some bank, and you're trying to assess whether or not this person deserves a loan. As humans, we have a lot of biases, and these biases don't necessarily add real value to the decision of whether or not this person deserves a loan, right? So the way to fix that is to let machines do the job, because machines can find relations in data that we can't.
So the data set we're going to look at is a bunch of financial attributes of somebody, like the status of their existing checking account. This is a German data set, by the way, and I found it on the UCI website. That is just a great website for finding data sets, so definitely check it out if you haven't already.
We're going to look at their credit history, the duration of payments they've made, the purchases they've made (cars, furniture), and all of these things are features, right? They're all features, and we can use them to assess whether or not this person is at risk of not paying back their loans. This kind of assessment is used in a bunch of different fields: insurance, finance, even whether or not to rent a house to somebody, where the landlord assesses whether you can pay it back. Their savings account, whether or not they're employed: all of these are features, and the label is the credit risk. This is a history of credit risk, so each person has already been assessed, based on all these features, as at risk or not at risk. That's the mapping we're going to learn. It's a binary mapping, and the way we're going to learn it is by building a random forest. That's really the goal here: to learn about random forests.
So basically, what it will look like is something like this picture we're looking at right now. Eventually, once we've built this random forest, which consists of several of what are called decision trees, we'll feed it a new data point, and it will iteratively ask a series of questions, like: is their checking account balance above 200 or below 200? Based on that answer, it will ask another question. If the answer is no, it might ask: what's the length of their current employment? And then: are they creditable or not creditable? So it just keeps going down this chain of decisions. That's a decision tree, okay?
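To make that concrete, here's a toy sketch of a single decision path written as nested if/else logic. The features and thresholds here are made up for illustration; the real tree will learn its own splits from the data.

```python
# A hypothetical, hand-written decision path, for illustration only.
# The actual tree we build will learn its features and thresholds.
def toy_credit_decision(checking_balance, years_employed):
    if checking_balance > 200:
        return "creditable"
    if years_employed >= 4:
        return "creditable"
    return "not creditable"

print(toy_credit_decision(checking_balance=150, years_employed=2))  # not creditable
```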
So that's what we're going to build: a random forest.
So what is a random forest? Well, a random forest is a collection of decision trees. So let's talk about what a decision tree is. A decision tree is actually a relatively simple concept, which is why I haven't covered it on its own in this series so far. I'm going straight into random forests, because decision trees are easy stuff and we want to get right into the random forest part, right? But let's go over decision trees really quickly. A decision tree is basically a set of decisions on how to classify something. The technical name for this is the classification and regression tree, or CART, and it was introduced by a statistician named Leo Breiman and his colleagues a few decades ago. As the name suggests, it can be used for both classification and regression, but we're going to use it for classification, because that's what we're trying to do: classify whether or not someone is creditable.
Okay, so how does this thing work? Well, you have a set of features. Let's say the features are the temperature, the wind speed, and the air pressure, and based on these three features we want to classify whether or not it's rainy. What the decision tree we build is going to do is ask a series of questions about those features, and the tree itself is built recursively (you'll see what I mean once we look at the code), until it classifies the data point as raining or not raining. So the real question is: how do we build this thing?
How do we build the optimal tree, the one that asks about the right threshold values? How does it know that the temperature should be greater than or less than 70 degrees Fahrenheit, and then, based on that answer, how does it know that the wind speed should be greater than 4.5 specifically? Where are these magic numbers coming from? Well, they're coming from the Gini index. That's what it's called: the Gini index. Not genie in a bottle; it's named after the Italian statistician Corrado Gini. Anyway, let's talk about the Gini index. The Gini index is the loss function here.
But the difference between this and what we've done before with gradient descent or Newton's method is that there's no convex optimization happening here. We're not trying to find the minimum of some convex function; there is no error surface we're descending. The Gini index is a cost function that works differently, and here's how it works. Basically, for every single feature in our data set, we want to find that ideal threshold value, the ideal value for a feature to split on. I'm going to explain this once, and then explain it again when we get into the code. So what do I mean by that? Here's how it works. Check this out.
We've got a data set with a bunch of features, say ten features. Let's say one of them is the income, and I'm going to use USD for this example. It could be anywhere from $10,000 a year to a million dollars a year. So what we're going to do is iterate through every single data point for that feature, and for each candidate value we'll compute the Gini index. For each group the split creates, the Gini index is the sum over the classes of p times (1 minus p), where p is the proportion of rows in that group belonging to that class; equivalently, it's 1 minus the sum of the squared class proportions. That comes out to some single scalar value. So basically, we start from data point zero and go up to data point N, where N is the number of data points, and we compute the index for each of these data points for a specific feature. Say the first data point is 10,000: we'll compute a Gini index for that data point, for that value, for that amount of income. And it goes on a scale. A Gini score of 0 is the ideal case. A Gini score of 0 means that for that given value of that specific feature, all the data points from one class fall on one side of that threshold value, and all the data points from the other class fall on the other side. The worst case (0.5, for a two-class problem) means the two classes are evenly mixed on either side of that value, and that's not what we want.
So we just compute the Gini index for every single data point, for every single feature. For income, say, we'll start with 10,000, compare every other data point to 10,000 to see whether it lies on the left or the right, whether it's greater or less than, and then compute the Gini index from that, using the formula above. We do that for every single data point, and so what's going to happen is we'll have a collection of Gini indices, a whole set of them. Then we'll pick the one that is the lowest, and the lowest one is the one where the data points are least mixed: most of the data points from class A are on one side, and most of the data points from the other class are on the opposite side. You see what I'm saying? And by side, I mean greater than or less than the value.
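Just to illustrate the scale with my own toy numbers (the per-group score is the sum of p * (1 - p) over the classes):

```python
# A pure group: p = 1.0 for one class, 0.0 for the other.
print(1.0 * (1 - 1.0) + 0.0 * (1 - 0.0))  # 0.0 -> the ideal score
# An evenly mixed group: p = 0.5 for each class.
print(0.5 * (1 - 0.5) + 0.5 * (1 - 0.5))  # 0.5 -> the worst case for two classes
```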
So the worst case is when the two classes are evenly mixed around the split. We don't want that. We want them to be all on one side and all on the other side, so that when we get a new data point, it plops down right into a little bucket with all the rest of its related data points. So that's the Gini index, and there are different measures of loss when it comes to arithmetic-based machine learning models like this one, as opposed to matrix-operation-based machine learning models like we see with neural networks.
Okay, so that's the Gini score, or Gini index, whatever you want to call it. So how do we build this decision tree? Well, there are two parts. First we construct the tree, which is a recursive process, as you'll see. And once we've constructed the tree, we prune it: we identify and remove the irrelevant branches that might fit outliers, to increase classification accuracy. So wait a second, you might be asking: why are we building a random forest in the first place?
Why can't we just build a decision tree alone? Well, what happens if you just build a decision tree? That's not fun. No, there's a better answer: if you just build a single decision tree, it can overfit, and that is a big problem with decision trees. The decision tree gets overfit to the data. It's like someone memorizing an eye chart: it's not that they can actually see it properly with one eye closed, they've just memorized the position of everything. In that same way, we don't want our model to overfit to our data. The way to prevent that is to create a bunch of decision trees on random subsamples of the data. So we'll define some set of subsamples, and for each of these subsamples we'll create a decision tree.
Then, once we have a bunch of decision trees that we've trained (and by trained, I mean we've computed the Gini index for all of the features and then recursively built the tree), we can combine their predictions, which I'll get to in a second. Each tree is a binary tree, by the way; I didn't mention that decision trees are binary trees. Each node has a left child, a right child, or no children, in which case it's a leaf, the last node. So if you haven't reviewed binary trees: we're essentially building a binary tree right now. But if you want to learn more about data structures and algorithms, or if you're curious whether you should know data structures and algorithms for machine learning, the answer is yes, for two reasons. One, just for logic's sake: you need to know how data is stored, because machine learning isn't just about matrix operations. It's also about storing data, right? Serializing and storing data in the most efficient way possible, and retrieving it. And two, if you want to build algorithms, you've got to have your basic data structures and algorithms knowledge intact, okay?
I just wanted to say that. Back to this. The way random forests work is: once we have a new data point, we run it through each of the decision trees we generated. They'll all make a prediction, and then we take a majority vote. So each tree casts a vote, and whatever class wins the majority vote is the class we predict. What this does is give us higher accuracy than just using a decision tree alone.
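Conceptually, a minimal sketch of that ensemble idea might look like this (these are my own hypothetical helper names; the real functions from the code come later in the video):

```python
from collections import Counter
from random import randrange

# Bootstrap: draw a random subsample (with replacement) to train each tree on.
def bootstrap_sample(dataset, ratio=1.0):
    n = round(len(dataset) * ratio)
    return [dataset[randrange(len(dataset))] for _ in range(n)]

# Majority vote: every tree makes a prediction, the most common one wins.
def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]

print(majority_vote([1, 2, 1]))  # 1, because two of the three trees voted for class 1
```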
Let me also say that random forests are one of the most used machine learning techniques out there, because they can be used for both classification and regression, and therein lies, what, 90-plus percent of problems, right? They also work well for very small data sets, which we tend to have a lot of. So random forests are just used a lot. They're used so much that, how can I say this, they're used so much that Josh Gordon, the Google guy, his name on Twitter is "random forests". So they are very useful. Hi, Josh, if you're watching this. Okay, back to this.
Okay, so we're training on subsets of the data, one subset per tree, and that is our random forest. It's a forest because it consists of trees, as you probably guessed. And if you create a giant random forest, do you get Lord of the Rings, Rivendell style? No, you don't. But the bigger the better, generally: you'll see at the end that the more trees we add, the better our accuracy score gets. Each of our nodes is going to represent a feature split, right? What's the color, green or red? Green, okay. What's the size, small or big? Big, okay: that fruit is a watermelon. We just recursively do that. Are there other good examples of this, you might be asking? And the answer is yes, of course there are, like stock price prediction and classification. I've got two great examples here, definitely check them out. The documentation is pretty sparse, but the code itself isn't using any libraries, so definitely check it out.
All right, so now let's go into the order of functions that we're going to follow. We're not going to have time to write every single function (we're not using any libraries), but we will write the two most important functions, split and get_split, which carry the majority of our logic. That's going to be about 40 lines of code, but for the rest of it, this is the order of functions I'm going to follow, the chain of functions. So let's get right down into what this chain of functions looks like. First of all, let's look at our dependencies. I'm going to import seed from random, and seed is going to control the pseudo-random numbers we generate. This is useful for debugging; you want to do it any time you have random numbers and you want to debug your code, in production or otherwise. It's always great to set a seed so that the random numbers that are generated start from the same point every time. That's just great for reproducibility of results. I'm also going to import randrange, which returns a randomly selected element from a range of numbers.
And csv, because our data set is a CSV file. So let's open our data set and see what it looks like. It's all numeric data, right here: all of it is numeric, and at the end, the label is either 2 or 1. It's a binary label, either 2 or 1, and the rest are the 15 or so features. We're going to use every single one of them, no feature selection. And it's all arithmetic, so we're only importing the math library. We're not even importing numpy.
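So the top of the file looks something like this (a minimal sketch of the dependencies just described):

```python
from random import seed       # fix the pseudo-random sequence, for reproducibility
from random import randrange  # pick random indices for subsampling and splits
from csv import reader        # parse the credit risk CSV file
from math import sqrt         # used when choosing how many features to try per split
```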
Okay, so let's look at this thing. We've got some really basic data loader functions here. load_csv initializes the data set as a list, opens the file as a readable file, initializes the CSV reader, and then, for every row in the file, appends it to the data set matrix. It's a 2D matrix, and we return it, so we have an in-memory version of our data. We know that part; that's general to all machine learning, really, whenever you're reading a CSV file.
What else do we have here? We have two more helper functions: one to convert a string column to a float, and one to convert a string class column to an integer. Those are for when we have string values. In this case we don't, we have numerical values, so we don't strictly need them, okay?
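Here's roughly what those loader helpers look like (a sketch along the lines of the code in the repo):

```python
# Load a CSV file into a list of rows (a 2D list).
def load_csv(filename):
    dataset = []
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if row:  # skip empty lines
                dataset.append(row)
    return dataset

# Convert one column from string to float, in place.
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Convert the class column from strings to integer labels, in place.
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    lookup = {value: i for i, value in enumerate(set(class_values))}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup
```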
So let's go into this order that I was talking about, the order of the algorithm. The first thing we want to look at is this main code here. We start off with the seed, so that we always start with the same random numbers. We load up our data set, that CSV file, and then we convert our strings to integers (we don't strictly need to do that here). Then we say: okay, how many folds do we want to have? Folds means subsamples of the data, and we want 5 folds. What is the max depth? Depth means how many levels of the tree we want to create, so we're going to say max 10 levels. These are our hyperparameters; we can tune them, make them more or less, and we'll have different results. They're kind of like the number of neurons in a neural network, right? Then we say: what's the minimum size for each of those nodes? How many features do we have? We'll count those as well. And then what we're going to do is create three different random forests: one with just one tree (so it's really a decision tree), then one with five trees, and then one with ten trees. And then we'll assess how good each of these random forests is by measuring the accuracy score for each number of trees.
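Put together, the main block might look roughly like this. The filename is a placeholder, and I'm using the square root of the feature count for n_features, a common default for random forests; evaluate_algorithm and random_forest are the functions we'll walk through below.

```python
seed(1)  # reproducible randomness

# Load and prepare the data ('german_credit.csv' is a hypothetical filename).
dataset = load_csv('german_credit.csv')
for i in range(len(dataset[0]) - 1):
    str_column_to_float(dataset, i)
str_column_to_int(dataset, len(dataset[0]) - 1)  # the 1/2 label column

# Hyperparameters: 5 folds, trees at most 10 levels deep.
n_folds = 5
max_depth = 10
min_size = 1
sample_size = 1.0
n_features = int(sqrt(len(dataset[0]) - 1))  # features tried at each split

# Try forests of 1, 5, and 10 trees and compare their accuracy.
for n_trees in [1, 5, 10]:
    scores = evaluate_algorithm(dataset, random_forest, n_folds, max_depth,
                                min_size, sample_size, n_trees, n_features)
    print('Trees: %d' % n_trees)
    print('Scores: %s' % scores)
    print('Mean accuracy: %.3f%%' % (sum(scores) / float(len(scores))))
```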
That's what we're going to do. So notice that the big boy right here is this evaluate_algorithm. That's the main function we're going to use to train and score our model. We're going to give evaluate_algorithm our data set, the random forest model that we built, the number of folds (the subsamples of the data), how big we want the tree to be, the min size, the sample size, the number of trees, and the number of features that we've counted. So let's look at what this evaluate_algorithm function looks like, because that's really the big one; we want to see what is going on in this big function right here. So what I'll do is go right here, okay.
So what it's doing is this. Okay, ready? Let's look at it. For a given data set and a given algorithm (the random forest algorithm that we're going to feed it), the folds are the subsamples used to train and validate our models. So we'll split our data into training and validation sets by the number of folds. What do I mean by that? Well, let's look at that cross_validation_split method. Where is it? It's right here. We basically want to split the data into k folds: the original sample is randomly partitioned into k equal-sized subsamples, and then, of those k subsamples, a single subsample is retained as the validation data and the rest are used as training data. It's splitting the data so that k minus 1 subsamples are used for training, and there's one subsample left over for validation.
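A minimal sketch of that split, along the lines of the version in the repo:

```python
# Partition the dataset into n_folds random, roughly equal-sized chunks.
def cross_validation_split(dataset, n_folds):
    dataset_split = []
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for _ in range(n_folds):
        fold = []
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))  # sample without replacement
        dataset_split.append(fold)
    return dataset_split
```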
That's it, okay. So back to this; we talked about that function. Once we've split our data into those folds, we're going to score each of them, because we're evaluating our random forest algorithm. We say: for each of the folds of our data, create a copy of the data with that fold removed, and initialize a test set. This algorithm really does two things: it trains our model on the training data, and then it tests it on the testing data. That is, it makes predictions on the testing data, ignoring the labels. So we add each row in the given fold to the test set, so that we have test samples as well. Then we get the predicted labels: we use the random forest algorithm (that's the next thing we're going to look at), and it returns all the predicted labels from our training and testing sets. Then we get the actual labels. And once we have the predicted labels and the actual labels, we can compare the two via this accuracy metric. The accuracy metric is a scalar value, and it is how we assess the validity of every random forest that we've built.
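Here's roughly what evaluate_algorithm looks like (a sketch; accuracy_metric is defined just below, and *args stands for the forest's hyperparameters):

```python
# Train and score an algorithm using k-fold cross-validation.
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
    folds = cross_validation_split(dataset, n_folds)
    scores = []
    for fold in folds:
        train_set = list(folds)
        train_set.remove(fold)          # train on the other k-1 folds
        train_set = sum(train_set, [])  # flatten the list of folds into rows
        test_set = []
        for row in fold:
            row_copy = list(row)
            row_copy[-1] = None         # hide the label from the model
            test_set.append(row_copy)
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        scores.append(accuracy_metric(actual, predicted))
    return scores
```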
And the accuracy metric, to go into it, is really simple. For each label, if the actual label matches the predicted label, we add one to a correct counter, and then we just calculate the percentage of predictions that were correct, which is the number correct divided by the total number of actual labels, times 100. Really simple. Like I said, it's all arithmetic: it's all plus, minus, multiply, and divide. There's no linear algebra here; it's all algebra. But despite how simple this model is, it is quite powerful, which is why it's awesome.
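Sketched out, that metric is just:

```python
# Percentage of predictions that match the true labels.
def accuracy_metric(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0
```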
Okay, so that's our evaluate_algorithm function. So let's keep going down, right? We're going down the chain. (If you got that reference, cool; if you didn't, it's from SpongeBob. Back to this.) So we're evaluating our algorithm here.
So what does this algorithm function even do? What is this, right? Well, this algorithm function is our random forest. That is what it is. We say: for the number of trees we want, compute a subsample of the data, and for that subsample, build a specific decision tree. Once we've built that tree, we add it to our list of trees, and then we make predictions based on all of those trees and return the list of predictions. Seems simple enough, right?
So what is this bagging_predict? Notice that we just keep going down the chain, right? This is where the list of trees, each one responsible for making a prediction, comes together: it combines the predictions from each decision tree and selects the most common prediction, the one that comes up the most, the label whose count is greater than the rest. And there are only two labels, because this is binary classification. You can also do multi-class classification, but we're not going to talk about that right now. Okay, so we've talked about bagging_predict.
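Sketched out, those two functions might look like this (build_tree, subsample, and predict are the functions we cover next):

```python
# Combine the trees' votes for one row: the most common prediction wins.
def bagging_predict(trees, row):
    predictions = [predict(tree, row) for tree in trees]
    return max(set(predictions), key=predictions.count)

# The random forest algorithm itself: one tree per random subsample,
# then a majority vote across all trees for every test row.
def random_forest(train, test, max_depth, min_size, sample_size,
                  n_trees, n_features):
    trees = []
    for _ in range(n_trees):
        sample = subsample(train, sample_size)
        tree = build_tree(sample, max_depth, min_size, n_features)
        trees.append(tree)
    return [bagging_predict(trees, row) for row in test]
```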
So what's next? subsample, right? What is this subsample function? How are we subsampling; how are we choosing which rows go into each sample? That's the question. Well, the answer is: we're creating a random subsample, and this is where our randomness comes in. We pick random indices, in a random range over the number of samples in the data, add those rows to a sample list, and then return it. So the sample list contains a random subset of the rows from our data set.
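A sketch of that function (note it samples with replacement, which is what makes this bagging):

```python
# Build a random subsample of the training data, with replacement.
def subsample(dataset, ratio):
    sample = []
    n_sample = round(len(dataset) * ratio)
    while len(sample) < n_sample:
        index = randrange(len(dataset))
        sample.append(dataset[index])
    return sample
```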
And that's it for subsampling. So now, where were we? We talked about subsampling; now let's see how the tree itself is built. We give this function a subsample, the depth and size limits of the tree (those are hyperparameters too), as well as the number of features, and we expect it to build the tree. So let's look at how this function works. How is it building the tree itself? Well, inside this function, notice that it first uses this method get_split, which we're going to code, and which is where the meat of the code goes. Building a tree involves creating the root node, and get_split is what outputs that root node, that first node. Then we call the split function, which calls itself recursively to build out the whole tree. So once we've got that root node, we call split recursively, and it just keeps building the tree by calling itself. In case you haven't heard of recursion, it's when a function calls itself. It's like Inception, except it's recursion. Wow, I never actually made that reference until I just said it: Inception is recursion, a dream inside a dream.
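A sketch of that top-level construction:

```python
# Build a single decision tree: find the best root split, then grow
# the rest of the tree recursively from there.
def build_tree(train, max_depth, min_size, n_features):
    root = get_split(train, n_features)
    split(root, max_depth, min_size, n_features, 1)  # depth starts at 1
    return root
```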
Whoa, okay, back to this. Right, so where were we? We were at split and get_split, so let's write out get_split. get_split is the first one we want to write, and it's going to select the best split point in a data set. That's the key question: how do we know where to split our data, at each of the many decisions we make in our random forest? The answer is: we'll have to compute it, and this is an exhaustive, greedy algorithm. That means it's just going to go through every single combination it can. There are no heuristics here, no educated guesses. It's going to go through every single data point to compute that split. So let's look at what this looks like.
Well, first of all we've got to give it our data set, of course, and we want to give it the number of features, because for each of those features we're going to compute candidate splits. Given the data set, we've got to check every value of each attribute as a candidate split. So what we're going to do is get all of the class values; that set of class values is going to be a list, and it comes from every single data point. All the rows in our data set are our data points, right? We know that. Then we want to track the best index, the best value, the best score, and the best groups, so we initialize all of these as really big numbers, and they get updated as we go.
So check this out. The Gini search essentially gives us two things. It gives us the index of some feature (the column) whose value is the optimal value to split the data on, and it gives us that value. You see what I'm saying? For income, say, if 30,000 is the best splitting value, the value where the classes are most separated, then that becomes the decision node: we can route everything based on 30,000. It's going to give us the column index of the income feature in the data set, as well as the value itself, 30,000. So that's the pair the Gini search gives us: the index and the value. It will also give us a score and the groups. The groups are the two subsets of rows on either side of the split, and the score is a measure of how good the split is.
Next we initialize our features list, and then we say: while the number of chosen features is less than n_features (it starts at zero and grows as we add features), pick some random index, some random feature column in our data set, to append to our features list. So we say: if the index is not already in features (which at first it won't be), then append it. Eventually we'll have appended n_features distinct random indices to the list of features that we initialized as empty.
Once we've done that, we say: for every feature index in that list, let's go through every single row in the data set. We're computing groups here, right? We're computing the groups to split our data into: we take the row's value for that feature as the candidate threshold, split the data on it, and compute the Gini index for the resulting groups. That's the point where we compute our Gini index for the current grouping of the data. So we've picked a feature, and for that feature we're going to go through every row, compute the Gini index for all of those candidate values, and pick the split whose Gini index is the smallest. And once we've picked the smallest Gini index, that gives us the index and value with which to build out that node of the decision tree.
Okay, so we've computed that, and now we say: if this Gini index is less than the best score so far, we update those variables to the new values: the score, the value, the index, and the groups. We're going to use a dictionary to store the node in the decision tree, by name. So we return the index, as well as the value, and the groups, because we've computed all of those.
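Put together, here's a sketch of get_split, along with the small test_split helper it relies on (gini_index is the scoring function we'll sketch a bit further down):

```python
# Split a dataset into two groups based on a feature index and a value.
def test_split(index, value, dataset):
    left, right = [], []
    for row in dataset:
        if row[index] < value:
            left.append(row)
        else:
            right.append(row)
    return left, right

# Exhaustive, greedy search over a random subset of features for the
# candidate split with the lowest Gini index.
def get_split(dataset, n_features):
    class_values = list(set(row[-1] for row in dataset))
    b_index, b_value, b_score, b_groups = 999, 999, 999, None
    features = []
    while len(features) < n_features:
        index = randrange(len(dataset[0]) - 1)
        if index not in features:
            features.append(index)
    for index in features:
        for row in dataset:
            groups = test_split(index, row[index], dataset)
            gini = gini_index(groups, class_values)
            if gini < b_score:  # smaller Gini means a purer split
                b_index, b_value, b_score, b_groups = index, row[index], gini, groups
    return {'index': b_index, 'value': b_value, 'groups': b_groups}
```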
So that function gives us the best split, including for the root node, and once we have the root node, we can actually perform the splitting down the best splitting point. So now we want to recursively compute the splitting itself. That's where our split function comes in: given that root node, how do we build a tree such that it's split along the ideal lines? Okay, so let's write out this split function.
So basically, this is the binary tree part; if you've created a binary tree before, it's exactly the same. Given some node, we retrieve the left and right groups of rows for that node, and then we delete the group data from the original node, since we no longer need it. Once we've done that, we check if either the left or the right group is empty, and if so, we create a terminal node using the records that we do have. Let's look at what a terminal node is, by the way: we select a class value for a group of rows by returning the most common output value in that list of rows. What is the most common output value in that list of rows? That is the most common class, so that's what we're doing: we select the most common class.
Okay, so that's the first part. Then we check whether we've reached our maximum depth. That depth is our hyperparameter, a threshold for how large we want our tree to be, so we check if we've reached that point: if the depth is greater than or equal to the max depth, then we create a terminal node. That's what that part is saying. So to recap the sequence: first, the two groups of data split by the node are retrieved and stored in the left and right variables, and then we delete that data from the node. Then we check if either the left or the right group of rows is empty, and if so, we create a terminal node using the records we already have. The terminal node, by the way, is where we just select the most common class value; that's the output class, the prediction itself, the end point. Then we check whether we've reached our maximum depth, and if so, we create a terminal node. And lastly, if a group of rows is too small, we create a terminal node.
Otherwise, we add the left child in a depth-first fashion until the bottom of the tree is reached on that branch. Then we do the same for the right child: the right side is processed in the same way. And then we ride back up the recursion as the tree is constructed, all the way back to the root, okay?
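Here's a sketch of to_terminal and the recursive split, following that exact sequence of checks:

```python
# A terminal (leaf) node: just predict the most common class in the group.
def to_terminal(group):
    outcomes = [row[-1] for row in group]
    return max(set(outcomes), key=outcomes.count)

# Grow the tree: create children for a node, or make it terminal.
def split(node, max_depth, min_size, n_features, depth):
    left, right = node['groups']
    del(node['groups'])
    # If one side is empty, both children become the same terminal node.
    if not left or not right:
        node['left'] = node['right'] = to_terminal(left + right)
        return
    # If we've hit the maximum depth, stop growing this branch.
    if depth >= max_depth:
        node['left'], node['right'] = to_terminal(left), to_terminal(right)
        return
    # Left child: terminal if too small, otherwise recurse deeper.
    if len(left) <= min_size:
        node['left'] = to_terminal(left)
    else:
        node['left'] = get_split(left, n_features)
        split(node['left'], max_depth, min_size, n_features, depth + 1)
    # Right child, processed in the same way.
    if len(right) <= min_size:
        node['right'] = to_terminal(right)
    else:
        node['right'] = get_split(right, n_features)
        split(node['right'], max_depth, min_size, n_features, depth + 1)
```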
So, get_split: notice how get_split is being called here over and over again. Two more functions and then we're good with this. The first is the gini_index function, and the Gini index, like I said, is that formula from up above. This is the Gini index, or Gini score, whatever you want to call it. The gini_index function scores a split of the data set involving one input feature and one value for that feature. Remember, what the Gini search gives us is that pair, the value and the column index of some feature for our data points, and that's the boundary from which we can split data based on that feature in the future.
The way we compute it: the score starts off at zero (it's some scalar value), and we compute it over all of the data points in the split's two groups. For each class value that we have (and we only have two classes, creditworthy or not creditworthy), we take each group of rows, compute the proportion p of rows in that group belonging to that class, and then compute p times (1 minus p). And we add all of those up, because the score is the sum of all of those values; that's where the sigma notation comes in. And we return that as the Gini score, okay?
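Sketched as code (weighting each group's score by its relative size, the way the updated version of this tutorial code does; the exact code in the video may differ slightly):

```python
# Gini index for a candidate split: 0.0 is a perfect separation,
# 0.5 is the worst case for a two-class problem.
def gini_index(groups, classes):
    n_instances = float(sum(len(group) for group in groups))
    gini = 0.0
    for group in groups:
        size = float(len(group))
        if size == 0:
            continue  # avoid dividing by zero on an empty group
        score = 0.0
        for class_val in classes:
            p = [row[-1] for row in group].count(class_val) / size
            score += p * (1.0 - p)
        gini += score * (size / n_instances)  # weight by group size
    return gini
```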
We compute that for all of the candidate splits of our data. So the last function to show you is the predict function, and the predict function is right here. Whenever we're actually making predictions, this is how it works: it navigates down the tree, asking questions like, is this person employed or not? Did this person go to school? A bunch of questions based on each of the features that we have. predict is recursive, in that the node is always changing for a given row: the next node could be the left child or the right child. Depending on whether the value for some feature of the data point is less than or greater than the node's threshold value (the one we computed using the Gini index), it updates the node and uses that as the new argument to run predict again. Eventually, once it reaches the terminal node, the last node, it returns the label. And that's for one decision tree. Because we have a random forest, it computes that for every single decision tree, we count up the votes, and we use the one that is the majority vote. That is our prediction.
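A sketch of that recursive descent:

```python
# Walk a row down the tree until we hit a leaf, then return its label.
def predict(node, row):
    if row[node['index']] < node['value']:
        if isinstance(node['left'], dict):   # still an internal node, keep going
            return predict(node['left'], row)
        return node['left']                  # a terminal node: the class label
    else:
        if isinstance(node['right'], dict):
            return predict(node['right'], row)
        return node['right']
```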
So then, if we test our code, we'll notice that we've got our accuracy scores here, and the accuracy is improving every time. We tried it for three different random forests: we tried it with one decision tree, we tried it with five decision trees, and we tried it with ten decision trees, and every time, the accuracy score improved. So what this means is that if we gave it a hundred-tree random forest, or a thousand-tree forest, it would do really, really well, okay? And then we'd be able to predict whether or not someone is creditworthy. If you made it to the end of this, I'm very happy, so thank you. And that's all. Please subscribe for more programming videos, and for now, I've got to do something random. So thanks for watching.
