Welcome to this video lesson on Data Modelling.
After completing this video, you will be able to
define data modelling and its importance,
model solution steps to the problem statement on EPL,
and predict the winner of the current EPL season.
In reference to Data Science,
modelling means formulating every step
and gathering the techniques required
to achieve the solution.
Not all the calculations can be performed
at once or in parallel.
You need to list the flow of the calculations,
which is nothing but the modelling steps to the solution.
The second important factor
is how to perform the calculations.
There are various techniques
under Statistics and Machine Learning
that you can choose based on the requirement.
For EPL data, we chose
to use statistical techniques
to predict the winner in the current season.
Based on the data analysis
and the patterns and insights drawn,
each data scientist may formulate
different steps to solve the problem.
For illustration purposes,
we can formulate steps as shown here.
Let’s get started with the first step.
From the data of the last three seasons,
we have learnt that the winner of the EPL
changes, but the top 6 clubs remain the same.
These clubs are Arsenal, Chelsea, Liverpool,
Manchester City, Manchester United
and Tottenham Hotspur.
The winner this season is very likely
to be from these top 6 teams.
We will perform all our modelling on these six clubs.
As we know, in football
a player can be an Attacker, Defender,
Midfielder or a Goalkeeper.
Each of these positions requires
a different set of physical skills.
A player is assigned a score based on
his physical skills and
the position he is playing in.
The scores are given on a scale
of 0-100 as shown here.
For example, a goalkeeper requires skills
such as diving, handling, positioning, reflexes, etc.
Similarly, for other positions,
skills are shown on the screen.
Each of these skills is assigned a weight.
The various skills and their weights
are collected from EA Sports.
We have also collected the players’ names
of the top six clubs for the current season
from the official Premier League website.
Let’s take the goalkeeper from Manchester City, Ederson.
His ratings for goalkeeping skills are as follows.
You can calculate the overall rating
of Ederson by multiplying the ratings
with their respective weights
and taking the sum, which gives Ederson
an overall rating of 84.08.
Similarly, this is done for all the players.
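The weighted sum described here can be sketched in Python. This is only a rough illustration: the function name `overall_rating` is made up for this example, and the ratings and weights below are hypothetical placeholders, not Ederson's actual ratings or the EA Sports weights shown on screen.

```python
def overall_rating(ratings, weights):
    """Return the weighted sum of skill ratings (each rating on a 0-100 scale).

    Assumption: the weights for a position sum to 1, so the result
    stays on the same 0-100 scale as the individual ratings.
    """
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same skills")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Multiply each rating by its weight and add the products up.
    return sum(ratings[skill] * weights[skill] for skill in ratings)


# Hypothetical goalkeeper values, for illustration only.
example_ratings = {"diving": 80, "handling": 85, "positioning": 90, "reflexes": 75}
example_weights = {"diving": 0.3, "handling": 0.2, "positioning": 0.2, "reflexes": 0.3}

example_overall = overall_rating(example_ratings, example_weights)
print(round(example_overall, 2))
```

The same function can be reused for attackers, midfielders and defenders by passing in the skills and weights for that position.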
By now we have the overall ratings
of all players in each club.
To calculate the rating of the club
we need to consider the overall ratings
of the 11 team members.
We have assumed 3 attackers, 4 Midfielders,
3 defenders and 1 goalkeeper for each team,
and selected the players based on playing time
in the previous season.
Based on this selection, the ratings
for Arsenal are as shown.
To calculate the club rating, you can take the average
of the ‘Rating’ column.
This gives Arsenal an overall rating of 80.01.
Similarly, the overall rating
of the other 5 clubs is calculated.
You can refer to the calculations
in the upcoming Excel document.
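If you prefer code to a spreadsheet, the club rating step can be sketched in Python as follows. The field names `position`, `minutes` and `rating`, and the function name `club_rating`, are assumptions made for this sketch; the formation of 3 attackers, 4 midfielders, 3 defenders and 1 goalkeeper is the one assumed in this lesson.

```python
from statistics import mean

# Formation assumed in the lesson: 3 + 4 + 3 + 1 = 11 players.
FORMATION = {"Attacker": 3, "Midfielder": 4, "Defender": 3, "Goalkeeper": 1}


def club_rating(players):
    """Average the overall ratings of the 11 selected players.

    `players` is a list of dicts with the hypothetical keys
    'position', 'minutes' (playing time last season) and 'rating'.
    """
    selected = []
    for position, count in FORMATION.items():
        candidates = [p for p in players if p["position"] == position]
        if len(candidates) < count:
            raise ValueError(f"not enough players for position {position}")
        # Pick the players with the most playing time in this position.
        candidates.sort(key=lambda p: p["minutes"], reverse=True)
        selected.extend(candidates[:count])
    return mean(p["rating"] for p in selected)
```

Calling `club_rating` once per club, with that club's squad list, gives the six club ratings used in the next step.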
In the next step, we rank the performance
of the teams based on the number
of wins and the goal difference.
Giving 80% weight to the players’ skills
and 20% weight to the past performance of the clubs,
we get the final ranking of the top 6 clubs.
Based on these calculations,
Tottenham Hotspur is likely to win this season.
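One possible way to combine the two factors is sketched below in Python. The lesson does not say how the performance ranking is turned into a number, so this sketch assumes the rank is scaled linearly to a 0-100 score (best club 100, worst club 0) before the 80/20 weighting is applied. The function names and the club data are hypothetical and do not reproduce the lesson's actual figures.

```python
def performance_scores(records):
    """Map {club: (wins, goal_difference)} to a 0-100 performance score.

    Assumption: clubs are ordered by wins, then by goal difference,
    and the rank is scaled linearly so the best club gets 100.
    """
    ordered = sorted(records, key=lambda club: records[club], reverse=True)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 100.0}
    return {club: 100.0 * (n - 1 - i) / (n - 1) for i, club in enumerate(ordered)}


def final_ranking(skill_ratings, records, w_skill=0.8, w_perf=0.2):
    """Rank clubs by 80% player skill rating plus 20% past performance."""
    perf = performance_scores(records)
    combined = {
        club: w_skill * skill_ratings[club] + w_perf * perf[club]
        for club in skill_ratings
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical clubs and numbers, for illustration only.
skill = {"Club A": 80.0, "Club B": 82.0, "Club C": 78.0}
past = {"Club A": (25, 40), "Club B": (20, 30), "Club C": (25, 35)}

for club, score in final_ranking(skill, past):
    print(club, round(score, 2))
```

A different way of scoring past performance would change the final order, so the scaling choice is worth stating alongside the result.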
In the next section, you will learn
how to present and communicate your analysis
using visualisation methods.
