Hey guys, it's Navjyotsinh Jadeja here
again, and in today's lecture on data
mining we're gonna talk about the KDD
process. If you're a student of data
mining, you know how important the KDD
process is: it's the heart of the data
mining subject.
Right, so what are we talking about, and
what is KDD? A quick introduction: it is
knowledge discovery from data, or, as
some books refer to it, knowledge
discovery in databases. So what are we
actually trying to extract? We are
extracting interesting patterns, or
what you may call knowledge, from a
very large amount of data. This large
amount of data comes from multiple
sources, as we also discussed in the
data warehouse lecture. But
what should be the qualities, the
characteristics, of the patterns we are
trying to extract? They should be
non-trivial, they should be implicit,
and they should be previously unknown,
so each should be a new pattern, right?
And they should be something that is
potentially useful. But now the question
arises: if you are actually doing
knowledge mining, if you are seeking
knowledge from the data, how is that data
mining? Guys, we will discuss that
further in this lecture and in the
upcoming videos too, so stay tuned. So
let us understand the different names
by which KDD, data mining, or even
knowledge mining is known. One name, as
we saw, is knowledge discovery in
databases. It is also known as knowledge
extraction. Certain companies, the older
ones, still refer to it as pattern
analysis. Some people refer to it as
information harvesting. And the latest
buzzword is business
intelligence, right? So data mining is
actually everywhere, from a simple
Google search to Facebook recommending
school friends you have not met for ten
years. How is that happening? That is
data mining, guys. So let us understand
what the KDD process is all about and
what happens
in data mining. This is a typical
diagram; I have taken it from the Han
and Kamber book, which is considered
the Bible of data mining and is what I
used in my college days. Let's break
this diagram into simple steps for you
to understand. Before you go further
into this, here is an analogy which I'm
going to share with you right now, one
which I use in my classes when I teach.
If you remember this, I'm sure you can
easily remember how to write the
knowledge discovery in databases, or
KDD, process in the examination: relate
the data mining process to cooking a
fish. You might wonder how the two
connect, so let's do it. What do we need
if we actually want to cook the fish? We
need to have the fish. That fish is
nothing but the data you have here,
which comes from different databases or
from the collective database known as
the data warehouse. Now I'm going back here.
Once you have the data, which is your
fish, what do we need to do? We need to
clean it. In this step we remove the
noise and the inconsistencies, and
there are different techniques for
that. So, going by the analogy again,
we have to clean the fish.
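
To make the cleaning step concrete, here is a minimal Python sketch. The records, the field names, and the 0 to 120 age range are all made up for illustration; real cleaning would use whichever techniques fit your own data.

```python
# Hypothetical patient records; every field name and value is invented.
raw_records = [
    {"id": 1, "age": 34, "city": "Rajkot"},
    {"id": 2, "age": -5, "city": "rajkot "},    # noise: impossible age
    {"id": 3, "age": None, "city": "RAJKOT"},   # noise: missing age
    {"id": 4, "age": 61, "city": " ahmedabad"},  # inconsistent spelling
]


def clean(records):
    """Drop noisy rows and make the city labels consistent."""
    cleaned = []
    for row in records:
        age = row["age"]
        # Noise: skip rows whose age is missing or outside an assumed range.
        if age is None or not (0 <= age <= 120):
            continue
        # Inconsistency: normalise spacing and capitalisation of the city.
        cleaned.append({**row, "city": row["city"].strip().title()})
    return cleaned


cleaned_records = clean(raw_records)
for row in cleaned_records:
    print(row)
```

Only the rows with a plausible age survive, and the surviving city names end up spelled the same way.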
Right, then what do we do? We actually do
data integration. What is data
integration in terms of data
mining? Here we are combining
multiple different sources
into one combined form, right? Remember, in
the analogy, the fish needs to be served
with lots of masalas, right? Next up, what
do we do? Data selection. Going by the
fish example, we select the good part
of the fish: we remove the bones, we
remove the unwanted material. In
data mining, we keep the data relevant to
the analysis task. So if we are actually
doing an analysis which requires the
number of patients who were infected
through community transmission of
COVID-19,
I will not be focusing on their
financial backgrounds. Why? Because that
is not relevant. So here I select the
relevant data. Then I perform
transformation. Going by the analogy, what
do we do? We actually cut, chop, and put it
in with the ingredients, right? In the same
way, in data transformation in KDD, the
data is transformed into a form which is
more suitable for the tools or the
system to which we are going to supply
the data. The next step is data mining, which
is our cooking. Basically, we apply
different data mining techniques, such as
supervised learning, unsupervised
learning, etc., and that actually
extracts some data patterns. Now the fish
is ready to eat, but what do we do? Should
we serve it directly to the guests?
No, we evaluate, we taste. Just like your
mother does in the kitchen, we take a
pinch and taste it. That is what we do
here: we do the pattern evaluation. We
check if the patterns are helpful, we
check if the patterns are appropriate or
not. Similarly, once that is done, we
garnish the plate; we garnish
the fish in a way that guests will like.
In the same way, knowledge presentation is
the step where we serve the results in a way
that the business leaders, the people who
are supposed to take decisions, can
understand easily: a form
which the business
decision makers understand, not only the
scientists. So this is the whole KDD
process. Again, a quick overview before we
finish: we start with data cleaning and
we end with knowledge presentation. So
from cleaning we go into
integration, then selection,
transformation, then mining, evaluation,
and presentation. And remember the fish
example, guys. If you like our effort, if you
like our trick, please like, share, and
subscribe to our channel. Thank you so much.
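
The steps recapped above, from integration through presentation, can be sketched end to end in Python. Everything here is an assumption for illustration: the two sources, the field names, the age cut-off of 60, the 30% support threshold, and especially the mining step, which is plain frequency counting standing in for real supervised or unsupervised techniques.

```python
from collections import Counter

# Two hypothetical sources; all names and values are invented.
hospital = [
    {"pid": 1, "age": 23, "transmission": "community"},
    {"pid": 2, "age": 67, "transmission": "travel"},
    {"pid": 3, "age": 71, "transmission": "community"},
    {"pid": 4, "age": 65, "transmission": "community"},
    {"pid": 5, "age": 30, "transmission": "travel"},
]
finance = {1: 42000, 2: 58000, 3: 31000, 4: 77000, 5: 50000}  # pid -> income


def integrate(hospital_rows, income_by_pid):
    """Data integration: combine both sources into one record per patient."""
    return [{**r, "income": income_by_pid.get(r["pid"])} for r in hospital_rows]


def select(rows):
    """Data selection: keep only the fields relevant to the analysis task."""
    return [{"age": r["age"], "transmission": r["transmission"]} for r in rows]


def transform(rows):
    """Data transformation: turn raw ages into coarse groups that can be counted."""
    def age_group(age):
        return "60+" if age >= 60 else "under 60"
    return [(age_group(r["age"]), r["transmission"]) for r in rows]


def mine(items):
    """Data mining (stand-in): count each (age group, transmission) pair."""
    return Counter(items)


def evaluate(patterns, total, min_support=0.3):
    """Pattern evaluation: keep patterns whose share of records meets min_support."""
    return {p: n / total for p, n in patterns.items() if n / total >= min_support}


def present(patterns):
    """Knowledge presentation: phrase each pattern as a plain sentence."""
    return [
        f"{support:.0%} of patients are {group} with {kind} transmission"
        for (group, kind), support in patterns.items()
    ]


integrated = integrate(hospital, finance)
selected = select(integrated)
transformed = transform(selected)
patterns = mine(transformed)
useful = evaluate(patterns, total=len(transformed))
report = present(useful)
for line in report:
    print(line)
```

Notice that the income field is joined in during integration and then dropped during selection, just like the financial background in the lecture's example, and that the final report is a sentence a decision maker can read rather than a raw count.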
