hey guys this is Aayushi from Edureka and in today's session we will be discussing an
algorithm that helps us analyze past
trends and lets us forecast what is to
unfold next so this algorithm is
time series analysis now let's quickly
jump onto our agenda and let's see what
all we are going to cover in today's
training so we'll start off this session
by understanding why do we need time
series analysis and then we'll
understand what exactly it is now once
we are clear with time series we'll then see
the different components that we need to
take care of while we apply time series
then we'll also discuss when should you
not use time series analysis or what are
the cases when you should not apply time
series analysis moving ahead in the
session we'll also discuss what
stationarity is and which tests
are used to check the
stationarity of the data next we'll be
discussing the ARIMA model now the ARIMA
model is one of the most popular models
in time series so we'll have a
discussion on that and we'll finally go
ahead with the demo part wherein we'll
implement all these things and help you
guys to forecast the future as well so I
hope you guys are clear with the agenda
so kindly drop me a quick confirmation
or you can just write it down in your
chat box so that I can proceed all right
Monica says yes okay and some of you gave
me a thumbs up Naman Shivani
all right since you guys are clear so
let's begin with the very first topic
that is why should you use time series
analysis so first of all in time series
analysis you just have one variable that
is time now you must have seen there are
a lot of algorithms out there then why do
we need one more algorithm that is time
series so let me explain this to you with
an example now let's take an example of
supervised learning so under
supervised learning we have linear
regression or logistic regression so there we have
an independent variable and we have a
dependent variable so there what we do
we deduce a function or you can say a
mapping function of how one variable is
related to another and then we can go
ahead with analysis part but in time
series analysis you just have one
variable that is time so for example you
own a coffee shop it's quite a
successful coffee shop in the town so
what do you do you try to see how many
cups of coffee you sell every
month and for that you add
up all your coffee sales now
let's say you started this coffee shop
in the first month that is the January
so what you'll do you record the data
month wise and then you'll sum it up so
you will have all the data till the
present month but what if you want to
know the sales for the next month or the
next year
now imagine guys you just have one
variable that is sales and you need to
predict that variable in accordance with
time so in such cases where you just
have time and you need to predict the
other variable you need time series
analysis now that we know why we need time
series analysis let's move ahead and
understand what exactly time series is
so time series is a set of observations
or you can say data points which are
taken at a specified time now over here
at your x-axis you have the time and on
the y-axis you have the magnitude of the
data so if you try to plot a time series
on the x-axis you will always get
the time which is divided into equal
intervals so you cannot create a time series
if one data point is at week level and the
others are at a different level the intervals should be
equal let's say a day a week a month
a year a decade or a century so that is
the one constant thing that a time series
requires now let us see the importance of
time series analysis now first and
foremost is business forecasting because
your past defines what is going to
happen in the future so let's say you'll be
seeing a lot of traders on the Sensex
who are trying to predict what will be
the price of a stock tomorrow
so that is nothing but business
forecasting you also see a lot of
retailers who try to know how many
goods they are going to sell
the next day so all of this can be
achieved with time series analysis now
this is not just limited to one domain
like retail or finance but it is
applicable almost everywhere now it
also helps us to analyze past behavior
so here you can analyze in which month
the sales went up or when there was a dip so
here you can easily understand your past
data so with every dip and a peak there
is a business reason attached to it so
you can understand this with respect to
time for example some festival is there
and you're selling chocolates so your
sales will increase during a festival so
you need to think about the seasonality
part also now don't worry guys we'll be
having a complete discussion on
seasonality as well so now coming back it
also helps you to plan the future
operations
so you can analyze the past and then you
can forecast your future using this
algorithm that is time series analysis
now apart from all these we can also
evaluate current accomplishments so this
means you can determine
which goals you have met in the current
scenario let's say you have predicted
okay I am going to sell around a hundred
chocolates in a day but did you
actually do that so all of this can be
analyzed using time series analysis
moving ahead let us see the different
components of time series now most
time series have trend seasonality and
irregularity associated with them and
some of them have cyclic patterns
as well but it is not compulsory that every
pattern has to be present so let us
discuss each one of them in detail now
the first is trend so trend is nothing
but a movement to relatively higher or
lower values over a long period of time
so when the time series shows a
general pattern that is upward we call
it an uptrend and if the trend exhibits
a lower pattern that is downward we call
it a downtrend and if there is no
trend we call it a horizontal trend
or you can say a stationary trend so
now let me explain this better with an
example so there is a new Township that
has been constructed okay and people are
going to come and live over there so
what happens a hardware guy comes up and
opens up a shop there so the people who are
moving in will definitely buy stuff
from there now once all these houses are
settled or occupied the
need for hardware reduces so the trend
may go down so let's say the sales were
up in the first year and by another
year or maybe in six months they have gone
down so that is the trend guys so for
some amount of time the sales were high and
then they went down but this is not a
repeating pattern it is not something that
happens year on year a trend is
something that happens for some time and
then it disappears then we have
seasonality so here seasonality is
basically upward or downward swings but
this is quite different it's a repeating
pattern within a fixed time period so
for example Christmas happens every year
on the 25th of December let's say you're in the
business of chocolates so year on
year chocolates are sold more
in the last week of December now this is
because Christmas is there and you've
been able to see this across years
that is from the past two years four years
six years ten years and so on so it's a
repeating pattern within a fixed time
period while in trend that is not the
case now let me take another example
let's say ice cream this time so ice
cream sales will go comparatively higher
in summers rather than in winter so that
is again a seasonality
then we have irregularity which is also
called noise so these variations are erratic in
nature or you can say unsystematic this is
also called the residual so this happens
basically for a short duration and is non
repeating so here let me give you an
example so let's say there is a natural
disaster let's say there is a flood in
your town out of nowhere in one year now
a lot of people are buying medicines and
ointments for relief but after some time
when everything is settled the sales
of those ointments go down so this
is something that no one could have
predicted it happens
erratically you don't know how many
sales are going to happen so
you cannot foresee the event
that the flood is happening okay so this
is some random variation so this is what
irregularity is now moving ahead we have
cyclic patterns so cyclic is basically repeating
up and down movements so this means they
can go on for more than a year and they
don't have a fixed period so they can
happen anytime let's say in two years
then in the fourth year then maybe in six
months so they keep on repeating and
they are much harder to predict now
moving ahead let's discuss when not to
apply time series analysis so first of
all you cannot use time series analysis
when the values are constant so let me
take the same coffee example over here
so let's say the number of
coffees sold in the previous month was 500
then this month also the sales number is
almost the same that is 500 and you want to
predict the number of sales in the next
month now in such cases where the values
are constant as in our case the number
of sales was 500 in the previous month
and then in this month also we have the
same number and now we want to predict
it for the next month so in such cases
where the values are constant time
series cannot be applied similarly if
you have values in the form of functions
let's say you have sine of X or cos of X
so for example in this case you have X
value and you can get the value by just
putting it in the function so there is
no point of applying time series
analysis where you can calculate the
values by just using a function now you
can apply time series to these as well
but again there is no point in applying
it if you already have a formula or
the values are just constant so these
are the cases when you should not apply
time series moving ahead let us see
what stationarity is so no matter
how much you try to avoid the
stationarity part guys it will
always be there in time series so here
time series requires the data to be
stationary so for any kind of statistical
model that we apply on a time series the
data should be stationary so let's
understand what exactly it is now most
of the models work on the assumption
that time series is stationary
now if the time series has a particular
behavior over time there is a very high
probability that
it will follow the same in the future also
the theories and formulas that are
related to stationary series are more
mature and easier to implement as
compared to non-stationary series now
there are two major reasons behind the
non-stationarity of a time series so first
is trend which is basically a varying
mean over time secondly we have
seasonality so this is the variation at
specific time frames but did you guys
get the answer to this question what
exactly is stationarity or how exactly
stationarity is defined so stationarity
basically has very strict criteria the
first one is it should have a constant
mean now here the mean should be
constant over time secondly
we have constant variance so again the
variance should be equal at different
time intervals and thirdly we have an auto
covariance that does not depend on time
so for those of you who don't know what
mean is I'll not go into the details but
I'll just explain it in a nutshell so
mean is basically the average then
variance measures the spread around the
mean so the spread of the points around the
mean should be the same in every time interval and then we have
auto covariance that should not depend
on time so
for example let's say you're standing at
time t okay and your previous time
periods are t minus 1 and t minus 2 let's
say there are two previous time periods
so the relationship between the values at t minus 2 t minus 1
and t should depend only on the lag between them
and not on
where you are standing in time
so that is nothing but auto covariance
so when these three conditions are met
then we can say a series is stationary
and then we can apply time series
analysis over it now to check
stationarity in python we have two
popular tests now the first is rolling
statistics and the second is ADCF that is
the augmented Dickey-Fuller test now
in rolling statistics we can plot the
moving average or you can say moving
variance and see if it varies with time
now by moving average or variance I mean
that at any instance t we
take the average or variance of a time
window let's say for
the last year that is for the last 12 months
and also guys this is more
of a visual technique so you cannot
deploy this kind of stuff in production
but it is quite useful for POC
purposes then we have ADCF or you can
say the augmented Dickey-Fuller test
so Dickey-Fuller
is another statistical test
for checking stationarity now here you
have the null hypothesis which is time
series is non-stationary and once you
perform this test you will get a result
which comprises a test statistic and
some critical values for different
confidence levels now here
if the test statistic is less than
the critical value we can reject the
null hypothesis and say that the series
is stationary so don't worry guys I will
be explaining this again when we go to the
demo part but I hope you guys are clear
with what exactly stationarity is and how we
can check the stationarity all right so
now let me just move on to my next topic
so now I will discuss what exactly is
the ARIMA model now ARIMA is one of the best
models to work with time series data so
this is basically the combination of two
models that is AR plus MA and it's quite
powerful guys so once you combine both
of these models you get the ARIMA model
now your AR model stands for the auto
regressive part and the MA model stands for
moving average so AR is a separate model
MA is a separate model and what binds them
together is the integration part that is
indicated by I so AR is nothing but the
correlation between the previous time
periods and the current one so what does this
mean now let's take this into
consideration that you are standing at a
time period t and there are previous
time periods like t minus 1 t minus 2 t
minus 3 now if you find any correlation
between t minus 3 and t that is nothing
but the auto regressive part now as I
told you earlier there is always
some kind of noise or irregularity
attached to a time series so we need to
figure out that noise in fact we need to
average that out now whenever we try to
average it out the crests and troughs
present in that noise smoothen out and we
can have an averaged forecast of that noise
you can actually never predict when the
next customer is going to come in and
buy a hundred items at once so we try to
smoothen it out by taking its average and that is the moving average part now
the ARIMA model has three parameters it has
p
it has q and it has d so p basically refers
to your auto regressive lags then q
stands for moving average and d is the
order of differencing so we have one
parameter for each of the parts so if
we difference the series just
once then the value of d would be one
if we difference it twice
then we have the value d equal to
two so that is how we can determine these
values p q and d and each of them has a
different method to it so if you want to
determine the value of p you will be using
a PACF graph that is nothing but a
partial autocorrelation graph then to
determine the q value we need to plot an ACF plot
that is an autocorrelation plot and for d I
have already told you that to make the data
stationary we use some kind of
differencing so the order of
differencing defines the value of d so I
guess that is enough of the theory part so now let's
quickly jump onto the demo and let's see
how you can implement all of these
things so now we'll have a look at a
demo and we'll forecast the future so here
we have a problem statement where there is an
airline which has the data of passengers
across months so here what you need to
do you need to build a forecast to
determine how many passengers
are going to board this airline at the
month level in the future so here we
have month or you can say dates so here
we have dates from 1949 till 1960 and we
have the number of passengers traveling
per month so now we have this kind of
data and we need to analyze what will be
the number of passengers if you have to
predict it for the next ten years so now let
me just go to my Jupyter notebook and this
is how my predictions look so guys
this is my Jupyter notebook where I have
the code and we'll be implementing all
the things that we have discussed till
now so first of all we'll be importing
all the necessary libraries so here we
have imported numpy then we have
imported pandas for data analysis part
and you can say data processing then we
have imported matplotlib for data
visualization creating plots and all
those things then in order to use
matplotlib we have also written
%matplotlib inline for the
Jupyter notebook so that a particular
plot does not open in a new window everything
will be there in your Jupyter notebook
itself and then I have just defined the
figure size so now let me just run this
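
A minimal sketch of the import cell described above. The figure size of 10 by 6 inches is an assumption, since the session only says that a size was defined, and the `%matplotlib inline` magic is left as a comment because it only works inside a Jupyter notebook.

```python
import numpy as np               # numerical arrays
import pandas as pd              # data loading and processing
import matplotlib.pyplot as plt  # plotting
from matplotlib import rcParams

# In a Jupyter notebook, uncomment the next line so plots render inside the notebook:
# %matplotlib inline

# Default size for every figure (assumed value, not taken from the session).
rcParams["figure.figsize"] = (10, 6)
print(rcParams["figure.figsize"])
```
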
next what I've done I have imported my
air passengers data using pandas so we
have a function read_csv in pandas
which is aliased as pd so we have
stored this in a variable dataset
and then what we have done we have just
parsed those strings into a datetime
format so here we have set our data
month wise so using pandas we have a
function to_datetime so over here you
can specify the month column and then you can
just set this as your index so here you
have the index variable as month next what I
have done I have imported datetime and
then I have just printed the top five
values so now let me just run this
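
A rough sketch of the loading step, assuming a CSV with one month column and one passenger-count column. The column names `Month` and `Passengers` and the few sample rows are illustrative stand-ins. In the notebook you would pass the path of the actual file to `pd.read_csv` instead of the in-memory text used here.

```python
import io
import pandas as pd

# Stand-in for the CSV file (assumed layout: a month column and a passenger count).
csv_text = """Month,Passengers
1949-01,112
1949-02,118
1949-03,132
1949-04,129
1949-05,121
1949-06,135
"""

dataset = pd.read_csv(io.StringIO(csv_text))
# Parse the month strings into datetime values.
dataset["Month"] = pd.to_datetime(dataset["Month"], format="%Y-%m")
# Use the month as the index so the data is organised month-wise.
indexed_dataset = dataset.set_index("Month")
print(indexed_dataset.head(5))
```
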
this is how my data looks I have
month as my index and then I have
number of passengers as my second
column so this data I have already shown
you in the presentation where I have the
data from 1949 until 1960 so I have just
printed the head of it so now let me
print the tail so let's say I want to
know the last five data entries so here
we have data till 1960 and we have the
number of passengers next what we have
done we have simply plotted a graph
between them so guys in time series we
have date and we have another variable
so here my other variable is number of
air passengers so here we have date on
my x-axis and number of passengers on my
y-axis and then we have simply plotted that
graph so now let me just run this so
this is how your data look like so here
if you notice you have a trend so our
next step is to check the stationarity
so I'll give you 10 seconds guys to
think whether this data is stationary or
not
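
The plotting step can be sketched as follows. The series here is synthetic (a straight-line trend plus a yearly swing) and only stands in for the passenger counts, so the shape is illustrative rather than the real data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic monthly series standing in for the passenger counts (not the real data):
# an upward trend plus a yearly swing, January 1949 to December 1960.
months = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
passengers = pd.Series(100 + 2.5 * t + 20 * np.sin(2 * np.pi * t / 12), index=months)

fig, ax = plt.subplots()
ax.plot(passengers.index, passengers.values)
ax.set_xlabel("Date")
ax.set_ylabel("Number of air passengers")
# In a notebook with %matplotlib inline the figure renders below the cell;
# in a plain script, call plt.show() to display it.
```
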
so just think and give me a reply
whether this data is stationary or not
right Shivani so this data is
non-stationary so here you can see the
trend is going up so let's say if you
want to calculate the mean at 1951 so
here your mean will lie somewhat over
here and let's say we want to calculate
the mean of this year that is 1960 so
here your mean will be somewhere here so
here you can see that you have an upward
trend and the mean is not constant so
this tells you your data is not
stationary so now I have told you guys
that there are two tests which basically
help you in checking the stationarity of the
data so here we have rolling statistics
as well as ADCF let us go
through each one of them so I will be
first going to the rolling statistics so
here we have the rolling mean and the
rolling standard deviation so here as
you can see we have a window of 12 that
is nothing but a window of 12 months
so let's say we start at Jan of 1949 then the
value placed at Dec 1949 is the average of
the 12 months of 1949 and the value at Jan 1950
is the average of Feb 1949 to Jan 1950 so this gives you the
rolling mean at a yearly level and you
have to do the same with the standard
deviation as well so in pandas to
calculate the mean and standard deviation
you have the functions .mean() and
.std() so these will automatically
calculate the mean and standard deviation so
now let me just run this
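
A small sketch of the rolling statistics with a 12-month window. The series is synthetic and the variable names `rolmean` and `rolstd` are illustrative choices.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the passenger counts (not the real data).
months = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
passengers = pd.Series(100 + 2.5 * t + 20 * np.sin(2 * np.pi * t / 12), index=months)

# Each value is computed over the current month and the 11 months before it.
rolmean = passengers.rolling(window=12).mean()
rolstd = passengers.rolling(window=12).std()

# The first 11 rows are NaN because a full 12-month window is not yet available.
print(rolmean.head(13))
print(rolstd.head(13))
```
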
so here if you notice your first 11
rows are NaN that is not a number now this
is because a window of 12 needs 12 values so
the average of the first 12 months is
given at the 12th row and
similarly you can do the same for
the next ones next if you just scroll a
little bit you see it's a long data set
and you have the same result for
standard deviation as well so it's the
same procedure guys the value has been
calculated and then just given over here so
you must be having a question why the first
11 values are NaN here so over here we
have just given a window of 12 if you
have data at a day level then your
window size would be 365 so here my data
is at a monthly level so the window is
12 months
so I hope you get
the reason why I am giving the window as
12 and why we are calculating the mean and
standard deviation then what we have
done we have simply plotted this rolling
statistics so here we have the
original data which is plotted in
the color blue then we have the mean
data so here we have just plotted the
mean that we have just calculated
above and we have given the color
red to it similarly we have plotted the
same for standard deviation and we have
given the color black to it after that we
have just given a legend we have given a
title to it and now let me just run this
code so over here you can see we
have a plot somewhat like this so this
blue line is my original data and as you
can see I have my mean in red and I have
my rolling standard deviation in black
so over here you can conclude that
your mean and even your standard
deviation are not constant so our data is
not stationary so guys this
rolling statistics method is again a
visual technique so here we have already
concluded that this is not a stationary
data set now let me perform the
Dickey-Fuller test as well so to perform the
Dickey-Fuller test in python you have to
write from statsmodels.tsa.stattools
import adfuller now this is the
function which has been provided for the
Dickey-Fuller test so here I have the
function that is adfuller I have passed
the data set into it which is the number
of passengers and then I have just given
autolag which is equal to AIC now
AIC is basically the Akaike information
criterion now what does this AIC mean
so AIC is a score that weighs how closely a
model fits the actual values against
how many lags it uses and here it is used
to pick the number of lags automatically
so don't just worry about
this guys for now just think about this
as a metric and see what happens when we
just run this particular test so when we
run this we'll have values for the test
statistic the p-value the number of
lags that have been used and the number of
observations used and then we have
printed the values in a loop so now let
me just run the cell as well
so this for statement will basically
print all the values now I have a test
statistic value a p-value the number of lags
used the number of observations and we have
critical values at different percentages
so to reject the null hypothesis
your p-value should be small so
here we have a very large value that is
0.9 whereas it should be below
0.05 also
the critical value should be more
than the test statistic so here we
cannot reject the null hypothesis and we
can say that the data is not stationary
so with the results of Dickey-Fuller
as well we got to know that the data is
not stationary then what we'll do we'll
estimate the trend so here what we have
done we have taken a log of the indexed
data set so the indexed data set is nothing
but the data set which has time as its index
that is the data which has been set
month wise so here we have just taken
a log and let me just run this for you
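
A short sketch of the log transform on a synthetic stand-in series. The name `indexed_log` is an illustrative choice. The point is that the scale of the values shrinks while the ordering of the points, and therefore the shape of the trend, is preserved.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the passenger counts (not the real data).
months = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
passengers = pd.Series(100 + 2.5 * t + 20 * np.sin(2 * np.pi * t / 12), index=months)

# Natural log of every value; the index (the months) is unchanged.
indexed_log = np.log(passengers)

print("original range:", round(passengers.min(), 1), "to", round(passengers.max(), 1))
print("log range:     ", round(indexed_log.min(), 2), "to", round(indexed_log.max(), 2))
```
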
now if you see here the numbers on your
y-axis have changed because the scale
itself has changed as we have taken the
log but here your trend remains the
same whereas the values of y have
changed next let us calculate the moving
average with the same window but keep in
mind guys that this time we'll be taking
it on the log time series so again
we'll be having the window as 12
that is nothing but twelve months
and then we'll be just plotting the
graph with the log time series so here
the data is already in the log form so now
let me just print it
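
A minimal sketch of the 12-month moving average on the log series, again on synthetic stand-in data. Plotting is left out here; the first and last valid averages are printed instead to show whether the mean still moves with time.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the passenger counts (not the real data).
months = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
passengers = pd.Series(100 + 2.5 * t + 20 * np.sin(2 * np.pi * t / 12), index=months)

indexed_log = np.log(passengers)
# Same 12-month window as before, now applied to the log series.
moving_average = indexed_log.rolling(window=12).mean()

first_valid = moving_average.dropna().iloc[0]
last_valid = moving_average.iloc[-1]
print("first 12-month average:", round(first_valid, 3))
print("last 12-month average: ", round(last_valid, 3))
```
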
so here you can conclude that the mean is
not stationary but it is quite a bit better
than the previous one but again it is
not stationary because it's moving with
time and this trend is again an
upward trend so we can say that the data
is again not stationary next what we'll
do we'll get the difference between the
moving average and the actual number of
passengers so we have the mean and the
actual time series now why
are we doing this now the reason is that
unless we perform all these
transformations we will not get the time
series as stationary so now you must be
having a question as to whether this is the
standard way to make a time series
stationary no it's not guys because it
depends on your time series as to how
you can make it stationary like
sometimes you have to take a log sometimes
you might want to take a square root of it
sometimes a cube root so it all depends
on the data and what it holds so here we're
going with the log scale so we are going to
take the moving average and then subtract it from the log series
so here we have the log scale and we
have the moving average and then we have
just printed the head of it that is the
top 12 values then what we have done we
have just removed the NaN values so
that is done by just typing dropna and
in the brackets you can write inplace equal to
True and then just print the head of it
so now let me just run this so here we
have the month and we have the number of
passengers so here we have the numbers
which are basically the difference then
moving ahead I have purposely put the
full code of this ADCF test so ADCF
is the augmented Dickey-Fuller test so above
I have just applied a simple adfuller
call but this is the whole code guys
so you have to perform this whenever you
have to determine whether a time series is
stationary or not so here I have defined
a function which is test_stationarity and
I have performed both the tests I have
determined the rolling statistics as well as
performed the Dickey-Fuller test so over
here I have used the window as 12 and then
I have plotted the rolling statistics and I
have performed the Dickey-Fuller test
as well so let me just run this and I'll
just run the function as well so now if
you see you have the original data as the
blue line then you have the standard
deviation as the black line and you have the
rolling mean as the red line so here you can
visually notice that there is no such
trend or you can say it is much better
than what we used to see earlier so here
we have the rolling standard deviation and
we have the rolling mean
now let me see the ADCF results as
well so here if you notice your p-value
is relatively small in the earlier case we
had 0.9 something and here you have a
p-value of 0.02 now if you notice your
critical values and your test statistic
are almost equal which basically
helps you to determine whether your data
is stationary or not so I hope by now
you got the idea of the
Dickey-Fuller test and the rolling
statistics test as to how you can
determine whether the data is stationary
or not next what I have done I've
calculated the weighted average of the time
series now why have I done this because
we need to see the trend that is present
inside the time series so that is why
I have calculated the weighted average
of the time series so now let me just run
this and you'll get to know why I'm
talking about this so as you can see
here as the time series is progressing
the average is also progressing towards
the higher side so here your trend is
upward and keeps on increasing with
respect to time moving ahead let's see
another transformation where we have the
log scale and then we'll subtract the
weighted average from it so in the
previous scenario we subtracted the
simple mean but in this one we'll be using the
weighted mean and then we'll check for
stationarity so here we have just
subtracted them and then passed the
variable to the test_stationarity
function that we have just defined
above so over here it will go through
both of the tests and then it will
display the results so over here I'll
just run the cell so over here you can
notice that your standard deviation is
quite flat it is not moving here and
there and in fact you can also say that
this doesn't have any trend also if you
notice the rolling mean it is quite a bit
better than the previous one now let me
go see the results of the ADCF test as well
so over here you have a very small value
of p that is p is equal to 0.005 so
your time series is again stationary so
here you can use either of these
transformations and then check whether your
data is stationary or not so now we know
that the data is stationary now what we'll
do we'll shift the values in the time
series so that we can use it in
forecasting so what we have done earlier
we have subtracted the value of the mean
from the actual value now what we'll do
we'll use the function called shift to
shift all of those values so here let me
just run this plot so this is how the
plot looks now here we have taken a
lag of one so here we have just shifted
the values by 1 and subtracted or you can say
differenced your time series once so why
is that if you remember I talked about the
ARIMA model so the ARIMA model has three
parts in it that is the AR model which
stands for auto regressive then we have the
MA model that is for moving average and
I is for the integration so the ARIMA model
basically takes three parameters and d
there stands for the integration part or
you can say how many times you have
differenced the time series so here
the value of d becomes one now next what I
have done I have simply dropped the NaN
values so here if you just run this code
you will see that the output is quite flat
so here as per the
augmented Dickey-Fuller test
the null hypothesis is
rejected and hence we can say that your
time series is stationary now so here
you can see that you again have blue as
the original data you have red as your
rolling mean and you have black as your
standard deviation so visually also we
see that there is no trend present it is
quite flat so here we can say that your
time series is stationary now let us see
the components of the time series so here
you first need to write from
statsmodels.tsa.seasonal import seasonal_decompose
so here seasonal_decompose
separates three components
that is trend seasonal and residual so
here what we have done we have simply
plotted these graphs and let us see how
all these graphs look
let me just run this
so this is how your output looks
this is my original data in which we saw
that there was a trend so this is my
trend line so this is going upward
and you can say it's quite linear in
nature along with that we have
seasonality also present at a high scale
so we have a seasonality graph over here
and then we have the residuals as well
so residuals are nothing guys but the
irregularities that are present in your
data so they do not have any shape or
size and you cannot find out what is
going to happen next so they are quite
irregular in nature now what we are going
to do we'll check whether the noise is
stationary or not so over here we take the
residual and we'll save it in a variable
that is decomposed log data and again I
just pass it to the same function
that we have created above which is
test_stationarity and inside the
test_stationarity function we have two tests that
is rolling statistics and the ADCF test so
now let me just run this cell and this
is how your graph looks so looking
at the output visually you can say that
this is not stationary that is why we
have to have your moving average
parameter in place so that it smoothens
it out to predict what will happen
next
now we know the value of D but how can
you know the value of P and Q that is
the value of autoregressive lags and the
value of moving average so here as I
told you guys we need to plot the ACF graph
and the PACF graph so in order to calculate
the value of P we need to plot the PACF
graph and in order to calculate the
value of Q we need to plot the ACF
graph so ACF here basically refers to the
autocorrelation graph and PACF
stands for partial autocorrelation graph
so in Python we first need to import
these two functions that is from
statsmodels.tsa.stattools import
acf and pacf then using these functions
acf and pacf we pass in the data
set and we have preferred a method that
is OLS so there are various methods but
we usually prefer OLS where OLS is the
ordinary least squares method then what
we have done we have simply plotted the ACF
graph and we have plotted the PACF graph
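To make it a little more concrete what the autocorrelation graph is measuring, here is a minimal pure-Python sketch of the sample autocorrelation function together with the approximate 95% band that is usually drawn on such plots. The helper name `sample_acf` and the toy series are assumptions for this example; it is not the statsmodels `acf` function, and it does not cover the partial autocorrelation.

```python
from math import sqrt


def sample_acf(series, nlags):
    """Sample autocorrelation for lags 0..nlags (illustrative helper)."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    denom = sum(d * d for d in dev)
    # Lag-k autocorrelation: covariance between the series and itself
    # shifted by k steps, divided by the lag-0 covariance.
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(nlags + 1)]


# Toy data (made up): a pattern that repeats every 4 steps.
series = [1.0, 0.0, -1.0, 0.0] * 10

acf_values = sample_acf(series, nlags=8)

# Approximate 95% band for a white-noise series of this length.
band = 1.96 / sqrt(len(series))

for lag, value in enumerate(acf_values):
    marker = "inside band" if abs(value) <= band else "outside band"
    print(lag, round(value, 3), marker)
```

Lags whose bar stays inside the band are not distinguishable from zero, which is the sense in which a graph "drops to zero" at some lag.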
so now let me just run this and let's
determine how you can calculate the P value
and Q value so guys this is my
autocorrelation graph and this is my
partial autocorrelation graph now in
order to calculate the P and Q values
you need to check what is the value
where the graph cuts off
or you can say drops to zero for the
first time so if you look closely
it touches the confidence level
over here so here if you see your
P value is almost around two and similarly
if you look at this graph you see that
it cuts off over here or drops to zero
over here and then the value of Q also
becomes two so this is how you can
calculate the value of P and Q using the PACF
graph and the ACF graph next we have
the value of P we have the value of
Q and we have the value of D so what we
can do we can simply substitute these
values in the ARIMA model so here what I
have done I first imported the model
ARIMA and then using the function ARIMA
I have the order listed over here so I
have P value as 2 I have differenced it
once so my D value becomes 1 and my Q value
is again 2 so here I have just plotted
the graph and then calculated the RSS
which is the residual sum of squares so
here let me just run this
so here you can see the residual sum of
squares is quite good that is one point
zero two so here you have plugged in the
values of P Q and D as two two and one
now you can also play around with these
P and Q parameters now let's say I want
to change the parameters to two one zero
so if I do that let me just run this
again so here if you see once I have
just changed the value to two one zero
my RSS score has increased so the
greater the RSS the worse it is
for you now let me again change it to
zero one two now in that case also my
RSS has increased to one point four
so here you need to take care of the RSS
part so the greater the RSS the worse it
is for you so here we'll just revert
back to 2 1 2 wherein we have the value
of P as 2 Q as 2 and we have taken only
one difference so the value of D would be 1
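The RSS comparison described above boils down to summing squared gaps between fitted and actual values and preferring the candidate with the smaller sum. Here is a minimal sketch, assuming a hypothetical helper `residual_sum_of_squares` and made-up numbers; the values are not the ones from the session's notebook.

```python
def residual_sum_of_squares(fitted, actual):
    """Sum of squared differences between fitted and actual values."""
    if len(fitted) != len(actual):
        raise ValueError("fitted and actual must have the same length")
    return sum((f - a) ** 2 for f, a in zip(fitted, actual))


# Made-up numbers purely for illustration.
actual = [0.1, -0.2, 0.3, 0.0]
candidates = {
    "candidate_a": [0.1, -0.1, 0.2, 0.0],  # tracks the data closely
    "candidate_b": [0.0, 0.0, 0.0, 0.0],   # ignores the data
}

scores = {name: residual_sum_of_squares(fit, actual)
          for name, fit in candidates.items()}

# The lower the RSS, the closer the fitted values are to the data.
best = min(scores, key=scores.get)
print(scores, best)
```

The same comparison is what drives the choice between the different order settings: each setting gives a set of fitted values, and the one with the lowest RSS is kept.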
now let's take the moving average model
into consideration so there the P value is 0
and the order is 0 1 2
next for the AR model what you can do you
have to use 2 1 0 wherein you have the
value of Q as 0 so here I have 2 1 0 and
let me just run it for you
so here you can see that the RSS has
again reached 1.5 now we have seen that
with respect to AR that is your auto
regressive part your RSS is 1.5 now
if I again go to MA wherein I have the
values as 0 1 2 the RSS score is 1.4 so
here we conclude that with respect to
the auto regressive part we have the RSS as
1.5 with respect to moving average we
have the RSS as 1.4 and if we combine
both of them and make an ARIMA out of it
that is this part that is 2 1 2 we have
a much lower RSS so let me just run this as
well so here when I substitute the
values as 2 1 2 that is P and Q value is
equal to 2 and D we have taken as 1 so
here your ARIMA model gives you RSS of 1
point 0 2 which is quite good next what
we'll do let's fit them in a combined
model that is ARIMA so here we have seen
that with respect to AR we have RSS as
1.5 with respect to MA that is moving
average we have RSS as 1.4 and when we
apply the combined model that is ARIMA
the RSS or you can say the residual sum
of squares has dropped to 1 point 0 2
so here let's do some fitting on the
time series with what data we have so here
we have just converted the fitted values
into a series format and then we have
just printed the head of it so now let
me just run this so over here we have
the month as well as the predictions
over here next what we'll do we'll find
the cumulative sum
and then we're going to have
the predictions done for the fitted
values so now to calculate the cumulative sum
we have the function called cumsum
and then again we have just printed the
head so this is my result and finally
we're going to have the predictions done
for the fitted values and then we have
just printed the head of it so now let
me just run this next you should also keep in
mind that after performing these
transformations we also need the
exponential of the whole data so that it
comes back to the original form from
where we had started so
in order to know the values in that form
you need to take the exponent of it so
these are the three steps which are very
important for data transformation so
we'll be finding the cumulative sum we'll
do the predictions and we'll
also calculate the exponent of it so as
to get your data in your original format
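The three steps just listed (cumulative sum, predictions on the log scale, exponent) can be sketched in plain Python like this. It assumes, as in the session, that the model was fitted on first differences of the log of the data; the helper name `back_transform` and the toy numbers are made up for illustration.

```python
from itertools import accumulate
from math import exp, log


def back_transform(first_log_value, fitted_log_diffs):
    """Undo differencing and the log transform (illustrative helper)."""
    # Step 1: cumulative sum turns fitted differences back into total change.
    cumulative = list(accumulate(fitted_log_diffs))
    # Step 2: add the change to the first log value to get log-scale predictions.
    log_predictions = [first_log_value] + [first_log_value + c for c in cumulative]
    # Step 3: the exponent brings the values back to the original scale.
    return [exp(v) for v in log_predictions]


# Toy data (made up): pretend the fitted differences match the true ones exactly.
original = [100.0, 110.0, 121.0, 133.1]
log_series = [log(x) for x in original]
log_diffs = [b - a for a, b in zip(log_series, log_series[1:])]

recovered = back_transform(log_series[0], log_diffs)
print([round(x, 6) for x in recovered])
```

With perfect fitted differences the original values come back exactly; with a real model the recovered curve follows the shape of the data but the magnitude can drift, because every error in a fitted difference is carried forward by the cumulative sum.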
now after that we just plot the actual
values against how our model has fitted so
now let me just run this so you can see
that the orange line is basically the
model that we have fitted and here you
can see that only the magnitude is varying
whereas the shape has been properly
captured by the ARIMA model now how we
can do predictions guys now there is a
function in Python that is predict now
before predicting the values let me
first see in my data how many rows
are there so this is my
data set name so now let me just run
this so here we have the data set from
1949 we have the number of passengers it
will go on to 1960 and we have 144 rows
into one column so here we got to know
that we have 144 rows so what if I want
to predict it for the next ten years so what
will be my prediction now here you have
to see how many data
points you would want so let's say
you want to predict it for ten years so the
number of data points would be 120 that
is 12 into 10 so here if you want to
predict it for 10 years you have 120 so
using the plot_predict function I
can actually predict the future so here
using this function I'll give the first
index of the time series and then the
number of data points you want the time
series for so
I have 144 rows plus 120 because I
want it for 10 years so 144 plus 120 is
equal to 264 so I'll write it over here
now let me just comment this for now and
let me just run it
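The index arithmetic behind that call can be written out as a short sketch. The 144 rows, the monthly frequency and the ten-year horizon come from the session; the assumption that the first observation is January 1949 and the helper name `add_months` are mine, added only to show which calendar months the forecast would cover.

```python
from datetime import date


def add_months(d, months):
    """Return the first day of the month that is `months` after d (illustrative helper)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


n_observed = 144           # rows in the data set
years_ahead = 10
steps = 12 * years_ahead   # monthly data, so 12 points per year
end_index = n_observed + steps

first_month = date(1949, 1, 1)  # assumption: the series starts in January 1949
last_observed = add_months(first_month, n_observed - 1)
forecast_months = [add_months(last_observed, k) for k in range(1, steps + 1)]

print(steps, end_index)
print(last_observed, forecast_months[0], forecast_months[-1])
```

For a different horizon only `years_ahead` changes; for example six months ahead would simply mean 6 steps and an end index of 150.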
so over here if you can see my blue line
is the forecasted value and this gray
part is your confidence interval so
however you do the
forecasting the values are expected to stay
within this confidence interval so this is how you
can see that for the next ten years you
have the prediction somewhat like this
so this is how you can do prediction and
if you don't want to see the graph you
can actually get the data points so
here I want the prediction for ten years
so I have just typed in the steps that is
equal to 120 and you get the result in
an array format so that is how you can
perform a lot of operations with this
data and predict it for let's say six
months 12 months next year 10 years and
it's totally up to you guys whatever
topics that I've covered I hope these
are clear to you so now let me just go
back to the presentation and let's see
what all we are left with so here we
have just built a model wherein we have
forecasted the demand for the next 10
years so in the data set we have the date
on a monthly basis and we have the
number of passengers so that's all for
today guys now let me just recap what
all we have covered till now so we
started off by discussing what exactly is
time series and we've also gone through
the various components that are trend
seasonality cyclicity and irregularity then
we have understood what is stationarity
and what are the different tests to check
the stationarity of the data then we
discussed one of the best models which
is used in time series analysis that
is the ARIMA model so here we have
understood that the ARIMA model is a
combination of three models that is the
AR model which stands for auto
regression we have MA for moving average
and I is for the integration part and
then we have implemented all these
things and we have forecasted the data
for the next ten years so I hope you
guys are clear with whatever concepts
that I have taught in the session so do
you guys have any questions or any
doubts with respect to any of the topics
that I have discussed till now all right
so I don't see any doubts over here all
right no problem guys this takes time so
just go home just practice just go
through the code again practice as much
as you can and in case you have any
doubt or any error you can always come
back to me or you can simply ask me in
my next session I hope you guys found
the session informative well thank you
so much bye-bye
I hope you have enjoyed listening to
this video please be kind enough to like
it and you can comment any of your
doubts and queries and
we will reply to them at the earliest do
look out for more videos in our playlist
and subscribe to the Edureka channel to
learn more happy learning
