Okay guys, a very good evening. We will start with the beginner's guide to Kaggle.
In this session the agenda is: an introduction to what Kaggle is, what the typical data science problems are, how to get started with Kaggle, tips for participating in and enjoying Kaggle competitions, and a summary with the key takeaways.
Kaggle is a product from Google; it is Google's competition platform.
There is always the question: do I have the necessary skills to take part in a Kaggle competition? Do I need a specific qualification to take part in data science and analytics competitions like Kaggle? And it is not just Kaggle; there are a lot of other competitions in the industry as well. Kaggle is the most popular because it is backed by Google and it has a very big community.
Okay, so many of you have probably come across this question: what are the necessary skills to take part in a Kaggle competition? We will look into it little by little. People generally have a fear because of the level of difficulty it offers. But you have to take it up first. It is like going to a swimming class where, in the first place, you have a lot of fear of the water.
But until the moment you step into the water, you cannot swim. It is the same thing here: you have to get into it, and before getting in there is a set of protocols that you have to follow. It is the same philosophy.
I would say Kaggle is a home of data science. It provides a global platform for company competitions, for competitors, and for customer solutions. There are a lot of companies which say, "Please solve our problem and take the prize money." There is a job board as well. People look at your participation: what rank you reached, which kinds of problems you took part in, and which level of difficulty you took up. Based on that, people will find you as well.
You have recruiters there, you have product companies giving you their problems to solve, you have competitions, you have competitors, and a lot more to do. And there is a caption on Kaggle: these competitions not only make you think out of the box but also offer handsome prize money. The Netflix Prize, for example, was a million dollars. So there is very good prize money, but it is not about winning the money or not; it is all about participating and how well you build the model.
Still, a lot of data scientists hesitate to participate in Kaggle competitions.
The reason is that they belittle their own level of skills, knowledge, and the techniques they have acquired. Others, irrespective of their skill set, choose the problem offering the highest prize money, and they fail to match their skill set to the difficulty level of the problem: low, medium, or high. There is no specific qualification that you must have.
Kaggle is at kaggle.com; if you have some time, please go ahead and create an account. Kaggle does not provide any information that helps people choose the most appropriate problem.
It cannot do that; it is impossible, because the problems come from various sources. You cannot say that this particular skill set is what is required to solve this problem; there is no such thing. So it becomes a very difficult task to decide which kind of problems to go for, especially for beginners and intermediates. If you are at an advanced stage, you know what kind of data sets you deal with and how to wrangle the data. But in the beginning you don't have that, and Kaggle does not list prerequisites or skill sets required for any particular data set. So as a beginner you always have this problem to think about. But you have to start first.
I'm just showing some examples. The Passenger Screening Algorithm Challenge comes from the US government's Department of Homeland Security, for its threat recognition algorithms. 89 teams are there, and the prize is not a hundred thousand, it is one and a half million dollars. You also see Zillow's home value prediction (the Zestimate), from a home and real estate company. You have Amazon, and you have Instacart, which goes with MBA, market basket analysis: which products will an Instacart consumer purchase again? If you go ahead and build a model, and they like your model, and it comes with high accuracy, there you go, you get $25,000. They give prizes for at least the first through seventh places.
When I say a team: people form a team across the globe. One will be good at math, one will be good at coding, one will be strong in algorithm concepts, and one will be good at model building and increasing the accuracy. So it is a team with different skill sets. Everyone knows data science in depth and everyone knows machine learning, but some will be very good at exploratory data analysis, some very strong in coding, and some very strong at math. You can form a team like that too and participate in Kaggle as well.
So let's see how it goes. There are a lot of Kaggle problems, starting from Titanic: Digit Recognizer, Bag of Words (an NLP problem), crime classification, taxi trajectory prediction, Facebook Recruiting, and a lot more. So, to start with: how do I even start?
Will I be up against teams of experienced PhD researchers? You have a lot of questions, right?
The first Kaggle problem you should take up is something simple to approach, like Taxi Trajectory Prediction. The reason is that the problem has a complex data set, which includes a JSON format in one of the columns that tells you the set of coordinates the taxi has visited. Please understand that data acquisition is an entire concept by itself: you have to acquire the data, and the data could be in any form: XML, HTML, JSON, video, audio, anything.
If you are able to break this down, whatever the problem statement is, you can get some initial estimate of the target destination you are looking for, or even the time. Hence you can always use your coding strength to find your value in this industry. Always make sure you have a coding background: you should have very good coding basics so that you understand which library to use and to what level you can get into it. Even if the competition is closed, that is fine: the data sets are there, the approaches are there, so just start working on it.
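As a rough illustration of breaking down a JSON column of visited coordinates, here is a minimal sketch using only the standard library. The column names `TRIP_ID` and `POLYLINE` and the coordinate values are assumptions made for this example, not details taken from the talk.

```python
import json

# Hypothetical rows: each trip stores its GPS trace as a JSON string of
# [longitude, latitude] pairs. Column names and values are assumptions.
rows = [
    {"TRIP_ID": "t1", "POLYLINE": "[[-8.61, 41.14], [-8.62, 41.15], [-8.63, 41.16]]"},
    {"TRIP_ID": "t2", "POLYLINE": "[]"},
]

def summarize_trip(row):
    """Decode the JSON column and pull out simple, model-free estimates."""
    points = json.loads(row["POLYLINE"])
    if not points:
        return {"trip_id": row["TRIP_ID"], "n_points": 0, "start": None, "end": None}
    return {
        "trip_id": row["TRIP_ID"],
        "n_points": len(points),
        "start": tuple(points[0]),
        "end": tuple(points[-1]),  # last visited coordinate so far
    }

summaries = [summarize_trip(r) for r in rows]
for s in summaries:
    print(s)
```

The point of the sketch is that plain coding (decoding, counting, taking the first and last point) already gives an initial estimate before any model is involved.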
Take it step by step. Then you go to, for example, the Titanic disaster data set, where you can understand how to handle complex data sets and how to make some predictions. Then you might go with a very hard one like Facebook Recruiting. This will show you how domain understanding can help you get the best out of machine learning. Domain understanding is really important.
So first the coding, second the complex data sets, and third your domain expertise. Once you have all three pieces in place, you are good to try any problem on Kaggle. If you have these three, I would say there is nothing wrong in participating in Kaggle.
Kaggle versus typical data science. Since Kaggle comes with prize money, problems must be difficult, solutions must be new, and performance must be relative. That means you should have a novel architecture which has not been done before, or else why would they give a hundred thousand dollars or one million dollars? A million dollars is almost 7 crore rupees.
The problem will always be difficult. Please try to understand that if someone has come up with 87.075 percent accuracy on a cancer detection data set, even 0.01 makes a lot of difference there. It is not that easy to get into. Even 0.01 or 0.02 matters a lot there, so you have to be very careful.
The solution and the accuracy you achieve should be novel. When I say novel, I mean new: the architecture should be new. I'm not asking you to invent something totally different, but to take an approach that comes with a lot of knowledge about which algorithm to use. Usually people go with XGBoost, AdaBoost, or CatBoost, which boost up the accuracy. Accuracy plays a very important role here: the person who gets the maximum accuracy with a very good model is the one who wins. And you can always compare who is performing better.
"I have a question."
Yes, you can have it. You can just contact them and I will ask them to share it with you, no issues. You can contact us.
Then, in typical data science you have a lot of support. A Kaggle competition comes with a time frame, say about 10 days, 15 days, sometimes even 20 days. It comes with a lot of rules and regulations, and it gives importance to a high level of accuracy. But in the real world there are a lot of other factors. You have to find a solution which can be used in the industry and deployed as well. You are not just finding something and leaving it as such. It is not just research work where you do the research and keep it; you have to take it into production. Most of the research work happening today goes to production. And your performance should also be really good.
Take the previous benchmark on this cancer detection data set. Say we go with a normal classification using a convolutional neural network, and someone has reached a benchmark accuracy of 86.075 percent. Even if you achieve 86.076, that is a very big achievement. That is specific to the data set; it is not about going from 86 to 87, which is really, really hard. If the highest benchmark accuracy is 86.075, achieving 86.076 is already a very big achievement on the cancer detection data set.
As you see, Kaggle competitions encourage you to squeeze out every last drop of performance, while typical data science encourages efficiency and maximizing business impact. This is perfectly true. It is not that Kaggle is not bothered about efficiency and business impact; the reason is that they want very high accuracy, and they have people to take care of deployment and everything else. But in real-world industry that is not the case.
You cannot just overfit the model, get 95% accuracy, and deploy; that is not possible at all. You have to tune the hyperparameters, check the accuracy, build a lot of other models, compare the accuracy, and then decide whether this model can be taken further into production or not. How do you choose a model for deployment? There are a lot of things to take into consideration.
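A toy sketch of why a high training accuracy alone says little: the "model" below simply memorises its training rows. The data (random labels attached to row ids) is invented for this illustration, so there is nothing real to learn.

```python
import random

random.seed(0)

# Hypothetical data: the "feature" is just a row id and the label is random.
data = [(i, random.randint(0, 1)) for i in range(2000)]
train, holdout = data[:1000], data[1000:]

# An overfit "model": memorise every training label exactly.
memory = {x: y for x, y in train}
train_labels = [y for _, y in train]
majority = 1 if sum(train_labels) * 2 >= len(train_labels) else 0

def predict(x):
    # Fall back to the majority class for rows never seen in training.
    return memory.get(x, majority)

def accuracy(rows):
    return sum(1 for x, y in rows if predict(x) == y) / len(rows)

train_acc = accuracy(train)
holdout_acc = accuracy(holdout)
print(f"train accuracy:   {train_acc:.2f}")
print(f"holdout accuracy: {holdout_acc:.2f}")
```

The memoriser is perfect on the rows it has seen and close to a coin flip on rows it has not, which is the gap you have to check before taking any model toward production.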
Is Kaggle worth it? Yes, I would say. Nowadays recruiters are not looking at Naukri or Monster; they look for people on LinkedIn, GitHub, and Kaggle. You should definitely have these accounts. Data science is a very vast industry. Instead of telling them, you have to show them what you have done; that is what the industry requires. If you have a GitHub repo, for example, you put all your data sets and your code there, with a README file that gives the overall objective of your model, and then your accuracy and your results. That's how it works.
You have to showcase your skills in a way that shows you have the ability to do it. You have to prove first that you have done it, and that is a kind of icebreaker for recruitment. There are a lot of people who got recruited just through Kaggle. Probably you will not be in the first 10 ranks; you might be 22nd or 23rd. But the approach you have taken, the algorithm you have chosen, the path you have gone with, the way you did EDA: everything matters a lot. There are a lot of companies who recruit directly from Kaggle as well.
Even if you showcase in your resume that you have participated in Kaggle competitions, that you reached this level, your ranking was this, and your accuracy was this, people are really happy to understand much more about the projects you have done. It is practical work: you are dealing with a real-world data set on a public forum, and you prove that you have the skills to participate in the first place. So this is really good for beginners as well as for intermediates.
As you see here: each competition is self-contained, and practice is practice. The only way to learn data science is to do data science; there is no other way. You learn the theory and you start doing data science. It is not just learning and coding: you have to deal with a variety of data sets on your own. You can create self-prepared projects and showcase your skills: this is the data set I have chosen, this is the path I have taken, this is my GitHub repo showing my code. You can just give the name of the project and the GitHub repo. Recruiters are very happy to check the GitHub repo and get back to you about the approach you have taken. This is an industry where you have to prove yourself even to enter the interview process.
In my opinion, no one is bothered about certification. Please understand that. No one is bothered about your previous background either. If you are able to prove yourself, there you go, you have an option. Recruiters want someone who has already proved that they can do something. That is what the industry requires.
For Kaggle versus typical data science competitions, these are all Kaggle winner interviews. You have to read a lot of blogs. The people who won the competition write about what they went through, and that really gives a lot of insight. You can even write a blog yourself after participating; it will help a lot of others. It will improve your storytelling: what kind of data sets you were looking for, what kind of competition you entered, who your teammates were, how you found them, how you dealt with the EDA, and how you went through the entire procedure until the end. This will really help others as well as you to get an in-depth understanding of data science problems and of Kaggle competitions.
As I said before, you have to start by picking up a programming language in the first place, and I would always prefer Python. I'm not saying R is not good, but compared with Python, R has a little less importance because of the libraries Python has. One more thing: R is not as intuitive as Python. Python is very intuitive; you can understand the code as well as the logic much better than in R. One of the biggest advantages of Python is that anyone who can read, write, and speak English can learn Python. I'm not saying you can do it in five days or 10 days.
I'm saying it takes some time, but please make sure that you are doing it well. In the previous session I was talking about Python and what Python is used for. Get the basics strong. It doesn't matter even if you spend two to three hours more; nothing in this world is going to change in those two hours. No one is running a race here. Data science and machine learning is not a sprint where you can finish in a month and get into a job. It's a marathon; it's long-term learning. The more you learn and the more you practice, the better you will be. People need someone who has a very good understanding of the concepts, good coding knowledge, and who has already proved themselves in dealing with a large number and a variety of data sets.
You might be from any background. If you are from a finance background and you are looking only for finance data sets, go with stock predictions. There are a lot of prediction problems using credit cards and banks, such as whether a loan can be approved or not. Pick those related data sets. I have taken a session on the data sets that are available: UCI, Kaggle, GitHub repos. There are a lot of places where you can take a real-world data set, play with it, create your own repository, and make the projects look good. Then start doing it.
But for picking a programming language, I would always start with Python. I strongly recommend Python because it is a high-level, object-oriented, interpreted programming language, and it gives a lot of options with its libraries. So start with Python; that is a big advantage for you.
EDA: we call it exploratory data analysis. Exploratory data analysis is nothing but how you deal with the data in the first place. It is not that anyone can deal with the data: unless and until you have a strong understanding of the variables and of your data set, you cannot explore anything. EDA is a very, very important part, and it takes most of the time. For example, if you have to build a model and it takes three hours, you will be spending close to two hours to two hours and 15 minutes on EDA alone.
There are no shortcuts. Please understand that there are no shortcuts for data exploration. If you are in a state of mind that machine learning can sail you away from every data storm, trust me, it won't. You have to get into the basics. After some point of time you will realize that you are struggling to improve the model's accuracy, and the reason is that you don't have a strong background in statistics, probability, and coding. Get the basics clear. In whatever situation, data exploration techniques will come to your rescue if you want to improve accuracy or build a better model. I can say that confidently because I have been through a lot of situations where EDA helped me a lot in building a model that reduces my errors and improves my accuracy and performance.
It is about a four- or five-step process. First you do data exploration and preparation. Second, you treat your missing values: why the data has missing values, and the methods to treat them. The third is the techniques of outlier detection. What is an outlier in the first place? Say I have a data set of incomes where 90% of the data lies between two and five lakhs, but the remaining 10% is far outside that range. Those are my outliers. How do I treat them? What are the types of outliers, what causes outliers, what is the impact of the outliers on my data set, how do I detect an outlier, and how do I remove it? All this comes under outlier detection.
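One common way to flag outliers like the ones in the income example is the interquartile range (IQR) rule of thumb. The talk does not name a specific method, so this is a sketch of one option, and the income values (in lakhs) are invented.

```python
import statistics

# Hypothetical annual incomes in lakhs: most fall between 2 and 5.
incomes = [2.0, 2.5, 3.0, 3.2, 3.5, 4.0, 4.2, 4.5, 5.0, 40.0]

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences using the k * IQR rule of thumb."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = iqr_bounds(incomes)
outliers = [v for v in incomes if v < lower or v > upper]
cleaned = [v for v in incomes if lower <= v <= upper]
print("outliers:", outliers)
print("kept:", len(cleaned), "of", len(incomes))
```

Whether a flagged value should be removed, capped, or kept is a judgment about the data set; the rule only tells you where to look.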
Then there is something called feature engineering. It's an art. What are my features saying? What is my variable telling me? What are the common methods of variable transformation? What is feature or variable creation, and what are the benefits? You have to explore in detail. If you do that, these four steps will help you a lot in the long run.
At the same time, please make sure that you are very strong in categorical and continuous variables, the Z-test, the t-test, the chi-square test, analysis of variance (ANOVA), and analysis of covariance (ANCOVA). The missing value treatment also has a lot of math in it. If you understand all this math and then the process, that will really help you in EDA. And trust me, EDA is not easy. EDA is not at all easy.
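As a small example of the math behind missing value treatment, here is a sketch of mean and median imputation with the standard library. The age values are invented, and these two strategies are only the simplest of the available methods.

```python
import statistics

# Hypothetical ages with missing entries recorded as None.
ages = [22, 38, None, 35, 54, None, 27, 80]

def impute(values, strategy="median"):
    """Replace None with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mean":
        fill = statistics.mean(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

median_filled = impute(ages, "median")
mean_filled = impute(ages, "mean")
print("median fill:", median_filled[2])
print("mean fill:  ", round(mean_filled[2], 2))
```

The single large value (80) pulls the mean up more than the median, which is one reason to look at the distribution before choosing how to fill the gaps.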
In the same way, when you come to feature engineering, you are trying to extract more information from existing data. When you torture the data, you get more information; the more you torture the data, the better the information will be. Feature engineering is the science of extracting more information. You are not adding any data in the first place; you are making the data you already have more useful. I have data, and I cannot add any: X is my independent variable, my input variable, and I cannot change it. But at the same time I have to make sure I take more information out of it. That is the core principle of feature engineering. Let's take an example.
Let's say you are trying to predict footfall in a shopping mall based on dates. If you use the dates directly, you may not be able to extract meaningful insights from the data. This is because footfall is less affected by the day of the month than by the day of the week. The information about the day of the week is implicit in your data; you need to bring it out to make your model better.
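The footfall example can be sketched as follows. The dates and visitor counts are invented for illustration; the only point is that the day of the week is derived from the date, not added from outside.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical daily footfall counts for a mall (all numbers are made up).
footfall = {
    date(2024, 1, 5): 900,    # Friday
    date(2024, 1, 6): 2100,   # Saturday
    date(2024, 1, 7): 2300,   # Sunday
    date(2024, 1, 8): 800,    # Monday
    date(2024, 1, 13): 2000,  # Saturday
    date(2024, 1, 15): 850,   # Monday
}

def date_features(d):
    """Bring out information that is implicit in a raw date."""
    return {
        "day_of_month": d.day,
        "day_of_week": DAY_NAMES[d.weekday()],  # weekday(): Monday is 0, Sunday is 6
        "is_weekend": d.weekday() >= 5,
    }

counts_by_day = defaultdict(list)
for d, count in footfall.items():
    counts_by_day[date_features(d)["day_of_week"]].append(count)

avg_by_day = {day: sum(c) / len(c) for day, c in counts_by_day.items()}
for day, avg in avg_by_day.items():
    print(f"{day}: {avg:.0f}")
```

In this made-up sample the weekend averages sit far above the weekday ones, a pattern the raw date alone would not expose to a model.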
These kinds of exercises bring you a lot of information from the data, and extracting that information is what feature engineering is all about. You transform variables and you create new variables or features; there are a lot of steps in it. The data often has a complex, nonlinear relationship, and you are turning it into a linear relationship, for example by taking a logarithm. Variable transformation is really important also because of scale: you cannot multiply kilograms with miles, or litres with someone's height. There are a lot of methods used to transform variables: square root, cube root, logarithm, binning, reciprocal. You have to understand when to take a logarithm, when to take a square or cube root, and when to do binning. All of these make sure that you are bringing the data onto a common scale.
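The transformation methods listed above can be sketched in a few lines. The input values, the bin edges, and the bin labels are assumptions chosen for the example.

```python
import math

# Hypothetical right-skewed values (for example, incomes).
values = [1, 10, 100, 1000, 10000]

log_values = [math.log10(v) for v in values]   # compresses large values strongly
sqrt_values = [math.sqrt(v) for v in values]   # milder compression
reciprocal_values = [1 / v for v in values]    # only valid for non-zero values

def bin_value(v, edges, labels):
    """Binning: return the label of the first edge that v does not exceed."""
    for edge, label in zip(edges, labels):
        if v <= edge:
            return label
    return labels[-1]

edges = [10, 1000]                  # assumed cut points
labels = ["low", "medium", "high"]  # one more label than edges
binned = [bin_value(v, edges, labels) for v in values]

print([round(x, 2) for x in log_values])
print(binned)
```

Which transformation fits depends on the variable: a logarithm suits values spanning several orders of magnitude, while binning trades precision for a small set of categories.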
Then comes feature or variable creation: you generate new variables or features based on the existing variables. Generating does not mean adding data from outside; you are generating something from what already exists. For example, say we have a date (day, month, and year) as an input variable in a data set. We can generate new variables like day, month, year, week, and weekday, which may help in better understanding the relationship between the variables. Are you understanding these concepts? Is it something very tough to understand?
"I'm not getting it. Can you please explain where to use this?"
You will be using it while you are dealing with the data set.
"I'm sorry?"
Are there any doubts here, guys, before we proceed further? You use it with almost all the data types that you have. Yes, you have to use it.
"Thank you."
What happens here, as I said before, is that we have a date with day, month, and year as the input variable, and we make new variables from it: day separately, month separately, year, week, and weekday. It is the same data, but you are finding a lot more information in it, and that helps you find the relationship between your target variable and the input variable in a much better way. That's all you need.
One more thing: the quality and the effort that you put into data exploration differentiates a good data scientist from an average data scientist. There is no such thing as a bad one. A person who spends a lot of time on EDA will be the better one.
You can train, test, and check the accuracy in two or three lines of code in Python; there is nothing to it. You have scikit-learn and a lot of algorithms: support vector machines, random forests, decision trees, anything. The algorithm is in the back end, and scikit-learn is there to support it. But when it comes to EDA you are on your own. Say the entire process takes three hours (I'm just giving an example, guys; it is not really just three hours). Of those three hours, you will spend nothing less than two hours on EDA alone. So make sure that you are very strong in the statistical concepts.
Next: before coming to Kaggle, I recommend that you train a model on something easier and more manageable.
Don't get into gigabytes and gigabytes of data. Go with just 20 MB or 50 MB, or even kilobytes of data, and try to wrangle it. The more you wrangle small data sets, the better you will know what to do and what not to do when you deal with large ones. There is always a split: as a rule of thumb it is 80/20. 80% of your data set goes for training and 20% goes for testing.
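The 80/20 rule of thumb can be sketched with the standard library alone. scikit-learn ships its own `train_test_split`; the function below is a hand-rolled stand-in with the same name so the example runs without extra packages, and the data and the threshold "model" are invented for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out the last test_fraction of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data set: (feature, label) pairs, label is 1 when the feature is >= 50.
rows = [(x, int(x >= 50)) for x in range(100)]
train, test = train_test_split(rows)

# A deliberately trivial "model": a threshold halfway between the class means in training.
mean_0 = sum(x for x, y in train if y == 0) / sum(1 for _, y in train if y == 0)
mean_1 = sum(x for x, y in train if y == 1) / sum(1 for _, y in train if y == 1)
threshold = (mean_0 + mean_1) / 2

test_acc = accuracy([y for _, y in test], [int(x >= threshold) for x, _ in test])
print(len(train), len(test), f"{test_acc:.2f}")
```

The model only ever sees the training rows, and the accuracy is reported on the held-out 20%, which is the whole purpose of the split.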
We cannot delve deep into cross-validation, overfitting, and performance metrics here, but you have to make sure these terms are well understood.
Now, getting started with a competition: how do I get started with a Kaggle competition in the first place? It is not that easy to get into a Kaggle competition. I'm not trying to scare you, but at the same time you have to understand that it is not everyone's cup of tea unless you have strong domain expertise, very good coding skills, and an understanding of the algorithms.
A lot of competitions are sponsored by companies, organizations, and many times by governments, and those have the largest prizes as well. There are also entry-level or beginner-level competitions, and intermediate ones. Choose Jupyter notebooks, go with Python, try to use the pandas and NumPy libraries, and use the Anaconda distribution. These will make your job easier to start with.
The first thing to do today is to join: create a user account on Kaggle. It is the world's largest online community of people working in AI, machine learning, and data science. Please understand that it is a very large community, owned by Google.
Kaggle's Titanic survival competition is the one any Kaggle newcomer should start with. If you start, please start with the Titanic survivors data set. It is always open; it never closes. It has a leaderboard which is periodically cleaned up, and it is straightforward to follow and easy to understand. There are rules and regulations you can read. Whether a competition is over or not is not a problem: try to wrangle the data, download it and learn, just to learn. If you have any queries, there is Stack Overflow to get you answers.
And if you are an absolute beginner at everything, please do not rely on self-learning alone; it does not work well in data science. Get some expert advice. I'm not advocating any specific provider, but please get expert advice: join some online or classroom sessions for a couple of weeks or months, get the hang of it, and then work on your own. That would be a really good start.
Self-learning is not that easy because the data science domain is very vast and you will get lost in the middle. I'm serious about it: you will get lost in the middle unless you are already a data engineer, a data analyst, or a business analyst. Data science is not that easy to pick up through self-learning.
Also, when you are dealing with this kind of data set, the training set gives you examples, a listing of passengers, with a lot of variables that you can understand. You can create a working folder and save everything there. It is a step-by-step procedure: you import your libraries, you load your training and testing data sets, you do EDA, you find the correlated variables, you prepare a submission, and then you make the submission on Kaggle. There are a lot of things that happen on Kaggle.
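The step-by-step procedure above can be sketched end to end on a tiny stand-in for the Titanic files. The column names follow the competition's CSV files, but the rows are invented, and the rule-based "model" is only a baseline chosen for this illustration.

```python
import csv
import io

# Tiny stand-ins for train.csv and test.csv (values are invented).
TRAIN_CSV = """PassengerId,Survived,Sex,Age
1,0,male,22
2,1,female,38
3,1,female,26
4,0,male,35
"""
TEST_CSV = """PassengerId,Sex,Age
5,male,30
6,female,19
"""

# 1. Load the training and test sets.
train = list(csv.DictReader(io.StringIO(TRAIN_CSV)))
test = list(csv.DictReader(io.StringIO(TEST_CSV)))

# 2. A very small piece of EDA: survival rate per category of one variable.
def survival_rate_by(rows, column):
    totals, survived = {}, {}
    for r in rows:
        key = r[column]
        totals[key] = totals.get(key, 0) + 1
        survived[key] = survived.get(key, 0) + int(r["Survived"])
    return {k: survived[k] / totals[k] for k in totals}

rates = survival_rate_by(train, "Sex")

# 3. Baseline rule: predict survival for groups whose training rate exceeds 50%.
def predict(row):
    return 1 if rates.get(row["Sex"], 0.0) > 0.5 else 0

# 4. Write the submission: a header plus one row per test passenger.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["PassengerId", "Survived"])
for r in test:
    writer.writerow([r["PassengerId"], predict(r)])
submission = out.getvalue()
print(submission)
```

With the real files you would read `train.csv` and `test.csv` from your working folder and write the submission to a CSV file instead of an in-memory buffer.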
is to choose a data set that everyone
starts with. To my knowledge, everyone
starts with the Titanic data set,
because it teaches you a lot of things.
It doesn't just teach you features and
EDA; it teaches a lot of things, and you
can actually get a lot of knowledge as
well. As I said before, research and
recruitment competitions are always
there. These are sponsored by companies
who want to hire data scientists. These
are still relatively uncommon, but to my
knowledge recruitment itself is common:
I have seen a lot of people getting
recruited from Kaggle. Okay, so here,
tackling the
Getting Started competition: you can see
that for these problems six thousand,
almost seven thousand, teams have taken
part, and it was posted almost three
years ago. You can just click on it. You
have the About section of the
competition, the rules and regulations,
the kernels, everything. You can just
wrangle through that, and then you will
be able to understand it. Okay.
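
The step-by-step procedure described above (import your libraries, load the training and test data, do some EDA, look for related columns, write a submission) could be sketched roughly like this. This is only a minimal sketch and not the speaker's code: the rows are made up for illustration in the shape of the Titanic files, only the Python standard library is used, a per-group survival rate stands in for the "correlated data" step, and the helper names (load_rows, survival_rate_by, make_submission, write_submission) are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the shape of the Titanic files (NOT real data).
TRAIN_CSV = """PassengerId,Survived,Sex,Age
1,0,male,22
2,1,female,38
3,1,female,26
4,0,male,35
5,1,male,4
6,0,male,54
"""
TEST_CSV = """PassengerId,Sex,Age
7,female,30
8,male,40
"""


def load_rows(text):
    """Step 2: load a CSV (here from a string) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def survival_rate_by(rows, column):
    """Steps 3-4: a tiny piece of EDA, survival rate per value of a column."""
    counts = defaultdict(lambda: [0, 0])  # value -> [survived, total]
    for row in rows:
        counts[row[column]][0] += int(row["Survived"])
        counts[row[column]][1] += 1
    return {value: s / t for value, (s, t) in counts.items()}


def make_submission(test_rows, rates, column):
    """Predict 1 when the group's training survival rate is above 0.5."""
    return [
        (row["PassengerId"], int(rates.get(row[column], 0.0) > 0.5))
        for row in test_rows
    ]


def write_submission(predictions):
    """Step 5: format predictions as PassengerId,Survived CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    writer.writerows(predictions)
    return out.getvalue()


train = load_rows(TRAIN_CSV)
test = load_rows(TEST_CSV)
rates = survival_rate_by(train, "Sex")
submission = write_submission(make_submission(test, rates, "Sex"))
print(rates)
print(submission)
```

With real competition files you would read train.csv and test.csv from disk instead of strings, and most people would reach for a library such as pandas rather than the csv module; the order of the steps stays the same.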
And even though there are many ways to
learn and practice applied machine
learning, Kaggle is a very good way to
learn. It's a kind of fun. It's not core
machine learning, it's applied: when you
apply your skills to real-world
problems, that is applied. Okay. And one
more thing: here the problems are well
defined and all the available data is
provided directly to you. That is a very
big advantage. Elsewhere you cannot
easily get a direct problem statement
that comes with a data set. Here you
have a data set and a problem statement
which is clearly understood, and it is
really harder to fool yourself with a
bad test setup. Okay.
There is a lot of truth in both the
public and the private leaderboards.
Moreover, you have the discussion board
and sharing around each competition, so
you can learn from each other and you
can also contribute. It's a very big
community. Okay.
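
On the public and private leaderboards mentioned above: a submission is scored on one part of the hidden test labels while the competition runs (public), and on the remaining part for the final standings (private). A minimal sketch of that idea follows. The labels, the submission and the id split are all made up for illustration, and the accuracy helper is a hypothetical name, not Kaggle's own scoring code.

```python
def accuracy(predictions, truth, ids):
    """Fraction of the given row ids where the prediction matches the truth."""
    ids = list(ids)
    return sum(predictions[i] == truth[i] for i in ids) / len(ids)


# Hypothetical hidden labels and one submission, keyed by row id.
truth = {1: 0, 2: 1, 3: 1, 4: 0, 5: 1, 6: 0}
submission = {1: 0, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1}

public_ids = [1, 2, 3]   # scored and shown during the competition
private_ids = [4, 5, 6]  # scored only for the final standings

public_score = accuracy(submission, truth, public_ids)
private_score = accuracy(submission, truth, private_ids)
print(f"public: {public_score:.2f}, private: {private_score:.2f}")
```

In this toy case the submission looks perfect on the public part and much worse on the private part, which is why tuning only against the public score is a way of fooling yourself.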
And using this you can create a good
portfolio of projects on difficult
real-world data sets that can really
demonstrate and showcase your skills.
It's something like this: compete to
maximize learnings, not earnings. Yes,
it's true. Don't bother about the prize
money in the beginning; just try to
learn as much as possible. Pick a
platform, practice on standard data
sets, practice on Kaggle problems and
then compete on Kaggle. Okay. The
process is very easy to describe, to my
knowledge, but very difficult to
implement. I do understand that. It will
take a lot of time and effort. It is
going to be really hard work.
You know, I don't want to give a false
perception that you do one night of
study and the next day you participate
in Kaggle. It doesn't work that way.
It's a systematic procedure and it's
hard work. It is not easy. Okay, I can't
say it is very, very difficult, but it
is definitely not easy to get into.
But please understand it will pay off,
and if you do everything in a
methodical way, with a kind of business
strategy, and stick to it,
you will be world class, or at least a
very good, above-average machine
learning practitioner. No one can say
"I'm a machine learning engineer"; to
my knowledge everyone practices machine
learning. You can practice machine
learning; you cannot master machine
learning just like that. Even people
with two or three PhDs feel that they
are just scratching the surface. So you
will be a very good machine learning
practitioner, because every data set is
totally different, every algorithm is
totally different, and you reach an
accuracy you could not even imagine, as
if you had made something new. Okay. So
you can just keep on practicing, and if
you go through all of those you can be a
very good practitioner. Okay. So my
thing is:
tips for enjoying Kaggle. Set
incremental goals. If you have ever
played an addictive video game, you'll
know the power of incremental goals.
Okay. You don't have to be a teenage
player with an Xbox, but make sure you
spend a lot of time understanding the
data, how to solve the problem and
everything. Leave out the prize money,
leave out what else is happening, and
just concentrate on your learning path.
That would be really good. Okay, so,
these are the steps: make a submission,
and once you have completed that and
are making submissions, score in the
top 50%, then the top 10%. This
complete journey would take a couple of
months at least. It may not happen in a
month or two: some people will do it in
six months, some people will take two
years, so it depends. Okay. The more you
try, the better you get, and having a
platform where you can learn is a good
thing in itself. But before coming into
Kaggle you have to learn the basics very
strongly; then you can do that.
Okay.
Yes, that is again a tip. There is
something called a kernel. Kernels are
short scripts that explore a concept,
showcase a technique or even share a
solution. You'll have the solution
there; to my knowledge, kernels share
all the solutions. You can even take it
up from there.
Okay.
Reviewing popular kernels can spark more
ideas, because you can understand what
kind of approach these people have
already taken: XGBoost, AdaBoost, a
decision tree. What kind of accuracy did
they get? What is the idea they used? Is
there any way you can deviate and do
something new? Yes, there always is.
Okay. And please try to practice on
standard data sets. Once you pick a
platform like Python, you need to get
very good at using real data sets.
Before coming into Kaggle, I recommend
trying standard machine learning
problems from the UCI repository, the
University of California, Irvine Machine
Learning Repository, or something
similar, but UCI is really good. Okay.
And whenever you get into a data set,
please treat it as a mini competition:
split it into a training set and a
held-out test set, and then keep your
own leaderboard of results. There are a
lot of procedures to go through; just
take it step by step. Okay. And also, before
getting into a new problem, practice
old Kaggle problems and see how people
have achieved their results. Read blogs.
That will really help you in selecting a
variety of different problem types that
you want to learn, and in applying new
and different techniques. That will
make you a better practitioner.
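
The "mini competition" idea from a moment ago (split the data into a training set and a held-out test set, then keep your own leaderboard) could look something like the sketch below. It is an illustration under stated assumptions, not a recommended model: the data is synthetic, only the standard library is used, and the function names (train_holdout_split, fit_majority, fit_threshold) are hypothetical.

```python
import random


def train_holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and cut off a held-out part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def accuracy(predict, rows):
    """Fraction of (feature, label) rows that predict() gets right."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def fit_majority(train_rows):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train_rows]
    majority = max(sorted(set(labels)), key=labels.count)
    return lambda x: majority


def fit_threshold(train_rows):
    """Pick the feature cut-off that scores best on the training rows."""
    candidates = sorted({x for x, _ in train_rows})
    best = max(
        candidates,
        key=lambda t: accuracy(lambda x: int(x >= t), train_rows),
    )
    return lambda x: int(x >= best)


# Synthetic data, made up for illustration: label is 1 when feature >= 5.
rows = [(x, int(x >= 5)) for x in range(20)]
train, holdout = train_holdout_split(rows)

# A personal "leaderboard": every approach is scored on held-out rows only.
leaderboard = {
    "majority baseline": accuracy(fit_majority(train), holdout),
    "threshold rule": accuracy(fit_threshold(train), holdout),
}
for name, score in sorted(leaderboard.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

The point of the design is that models are fitted only on the training rows and compared only on the held-out rows, so each new idea gets an honest score before you try it on a real competition.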
Yes, as I said before, and I always say
it in the classes: there is no such
thing as a silly question, a hard
question or an easy question. A question
is always a question; it doesn't matter
what level. We are here to learn, share
knowledge and contribute, so don't worry
about that. Just ask questions, and over
a period of time you will be fine with it.
Yes: work solo to develop core skills.
Once you understand all this, please
understand that no one is going to help
you when you are doing an EDA; you are
on your own. Learn from experts, take
help, learn your basics strongly, and
then, when you are dealing with the
data set, work solo. You have Stack
Overflow, which helps answer your
queries. In case you are stuck for some
time, say about a day or two, and you
cannot get any support, then reach out
to someone. Okay. But if you start
taking help from others, you may get
through that particular problem, but
then you will start seeking someone's
help every time.
Data science doesn't work that way.
They need someone who can independently
take it from start to end. No one is
going to support you, trust me on that.
Okay. So, on teaming up in future
competitions: yes, you can. If you have
four or five friends, and you feel that
you have acquired sufficient knowledge
and you just want to improve, please go
ahead and start. But it will take some
time. Don't rush, because when you get
in and are not able to make anything out
of it, you might demotivate yourself. So
before getting into it: explore,
explore, explore; practice, practice,
practice. Those are the only two things
I can say, so that you can get into
that mode. Okay.
Yes, as I said before, Kaggle is always
a stepping stone for your success. Okay,
so don't worry about that; you can
always get into it.
Okay, so it will help you to boost your
career and showcase your skills. It has
a lot of advantages. Leave the prize
money for right now; you want to learn.
Okay. There are people from Russia and
Ukraine who are committed long-time
Kagglers; their only job is to make
money over there. They are very strong
in math, mostly with PhDs in math, and
they have very strong Pythonic
concepts. We are not going to be
long-time Kagglers; we just want to
learn, that's it. Keep it that way: just
learn from Kaggle, which will help you.
Okay. As I said before, don't worry
about the prize, don't worry about the
prize money. Just leave it and try to
learn as much as you can. Okay. As I
said before, these are the five major
key takeaways, the summary of this
session: pick a programming language,
Python; learn the basics of EDA; train
your first machine learning model, like
this Titanic data set; tackle the
Getting Started competition; and then
compete to maximize learnings, not
earnings. Okay.
So all the best, guys. Just go ahead
with it. If you have any doubts, just
let me know.
It's my pleasure, Ragini, no issues. I
would like to get feedback from all of
you. If you could, post it in the chat:
is there any scope for improvement? How
did you find this session? Was it
informative? Do you want more sessions,
or do you want any kind of boot camp to
be conducted? Just share your ideas so
that we can try to take them up as much
as possible and try to help you guys.
