Hello, I will walk you through the chapter on data mining.
The learning outcomes for this chapter are:
Describe data mining as an enabling technology for business analytics.
Discuss the objectives and benefits of data mining.
Identify a wide range of applications of data mining.
Describe the standardized data mining processes.
List the steps involved in data processing for data mining.
Discuss different methods and algorithms for data mining.
Identify the existing data mining software tools.
Explore the privacy issues, pitfalls, and myths of data mining.
I will start with the definition of data mining. Data mining is the process of analyzing data to extract information not offered by the raw data alone. Data mining can begin at a summary information level and progress through increasing levels of detail.
To perform data mining, we need data mining tools. Data mining tools use a variety of techniques, such as statistics, decision trees, and neural networks, to find patterns and relationships in large volumes of information, and to infer rules from them that predict future behavior and guide decision making. Some examples of data mining tools are query tools, reporting tools, multidimensional analysis tools, statistical tools, and intelligent agents.
Next, we will talk about the data in data mining. Data is a collection of facts, usually obtained as a result of experiences, observations, or experiments.
Broadly, data can be of two types: categorical and numerical. Categorical data divides a variable into specific groups, and numerical data represents the numeric values of specific variables. Some examples of categorical data are race, sex, age group, and education level. Some examples of numerical data are age, number of children, and total household income.
Categorical data can be of two types: nominal data and ordinal data. An example of nominal data is marital status: single, married, or divorced. Nominal data may have two possible values, such as yes/no, true/false, or good/bad, or multiple values, such as blue, green, and red, or White, Black, and Asian.
Ordinal data represents a rank order in the data. Examples are credit score, which can be low, medium, or high, and education level, which can be high school, college, graduate school, and so on.
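Numerical data can be of two types: interval data and ratio data. Interval data includes variables measured on interval scales, for example temperature on the Celsius scale, where differences are meaningful but zero is an arbitrary point. Ratio data has a true zero, so ratios are meaningful; it is common in the sciences, for example mass, time, and energy.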
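To make the four data types concrete, here is a small Python sketch. The variable names and sample values are illustrative assumptions, not taken from the chapter.

```python
from collections import Counter

# Nominal: labels with no order; counting and equality checks make sense.
marital_status = ["single", "married", "divorced", "married"]  # made-up sample
status_counts = Counter(marital_status)

# Ordinal: labels with a rank order; comparisons make sense, arithmetic does not.
CREDIT_RANK = {"low": 0, "medium": 1, "high": 2}  # assumed ordering

def higher_credit(a, b):
    """True if credit level `a` ranks above credit level `b`."""
    return CREDIT_RANK[a] > CREDIT_RANK[b]

# Interval: differences are meaningful, ratios are not,
# because 0 degrees Celsius is an arbitrary point, not "no temperature".
temp_diff_c = 20.0 - 10.0

# Ratio: a true zero exists, so ratios are meaningful.
mass_ratio = 20.0 / 10.0  # 20 kg is twice 10 kg
```

The practical point is that the data type decides which operations are valid: you can average household income, but averaging marital status codes would be meaningless.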
Data mining capabilities include association, sequencing, classification, clustering, and forecasting.
Let's start with association. Association shows us how two variables go together. It determines the degree to which the variables are related, and the nature and frequency of these relationships in the information.
Let's take the example of a market basket. When we purchase coffee, we also buy bread 35 percent of the time. With chips, we buy soft drinks 65 percent of the time, and with a promotion, 85 percent of the time. Such information is helpful for deciding on the store layout, item bundling, discounts, and promotions. If you know buying behavior, you can predict future behavior by identifying affinities among customers' choices of products and services.
Market basket analysis is frequently used for developing marketing campaigns for cross-selling of products and services in industries such as banking, insurance, and finance, and for inventory control, shelf product placement, and so on.
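A percentage like "with chips we buy soft drinks 65 percent of the time" is what association analysis usually calls the confidence of a rule. Here is a minimal Python sketch of how it could be computed. The baskets are made-up data and the function names are my own, so the result will not match the lecture's figures.

```python
def count_containing(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    n_antecedent = count_containing(baskets, antecedent)
    if n_antecedent == 0:
        return 0.0  # the rule never fires in this data
    n_both = count_containing(baskets, set(antecedent) | set(consequent))
    return n_both / n_antecedent

# Hypothetical transactions, one set of items per basket.
baskets = [
    {"chips", "soft drink"},
    {"chips", "soft drink", "bread"},
    {"chips"},
    {"coffee", "bread"},
    {"coffee"},
]

chips_to_drinks = confidence(baskets, {"chips"}, {"soft drink"})
coffee_to_bread = confidence(baskets, {"coffee"}, {"bread"})
```

In this toy data, two of the three chips baskets also contain a soft drink, and one of the two coffee baskets also contains bread.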
Classification refers to the prediction of a target variable that is categorical in nature: high versus low risk, purchase versus non-purchase, flawed versus non-flawed. It is used heavily in banking. Credit card and telephone companies use it to detect the characteristics of customers who are likely to leave. Managers can then run special campaigns to retain such customers.
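As a rough illustration of classification, the sketch below learns a single yes/no rule from one numeric feature, which is the simplest possible decision tree (one split). The feature, the data, and the function names are hypothetical. Real tools consider many features and many splits.

```python
def fit_stump(values, labels):
    """Pick the threshold on one numeric feature that misclassifies the fewest
    training examples, predicting True (customer leaves) when value > threshold."""
    best_threshold, best_errors = None, len(labels) + 1
    for threshold in sorted(set(values)):
        errors = sum((v > threshold) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def predict(threshold, value):
    """Apply the learned rule to a new customer."""
    return value > threshold

# Hypothetical history: complaints filed last year, and whether the customer left.
complaints = [0, 1, 1, 2, 4, 5, 6]
left = [False, False, False, False, True, True, True]

threshold = fit_stump(complaints, left)
likely_to_leave = predict(threshold, 5)
```

The target here is categorical (leaves or stays), which is what makes this a classification problem and not a forecasting one.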
Clustering: the purpose here is to group objects in such a way that objects belonging to the same cluster are similar and objects belonging to different clusters are dissimilar. It is used for market segmentation and in customer relationship management to identify customers with similar behavioral traits.
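One common way to form such groups is the k-means algorithm. The lecture does not name a specific clustering method, so treat this as one possible sketch. It clusters customers on a single made-up feature, monthly spend, starting from two assumed initial centers.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on one numeric feature, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 90, 95, 88]
centers, clusters = kmeans_1d(spend, centers=[0, 100])
```

With this data the low spenders and the high spenders end up in separate clusters, which is the kind of segment a marketing team could then target differently.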
Then sequencing. Sequencing is about events linked over time. For example, when a home is purchased, a refrigerator will be purchased within two weeks 65 percent of the time.
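A statement like the home and refrigerator one can be checked by counting how often the second event follows the first within a time window. Here is a minimal Python sketch. The customers, dates, and the `follow_rate` helper are hypothetical.

```python
from datetime import date, timedelta

def follow_rate(events, first, then, window):
    """Share of customers whose `first` event is followed by a `then` event
    within `window`. `events` maps customer -> {event name: date}."""
    with_first = [e for e in events.values() if first in e]
    if not with_first:
        return 0.0
    followed = sum(
        1 for e in with_first
        if then in e and timedelta(0) <= e[then] - e[first] <= window
    )
    return followed / len(with_first)

# Made-up purchase histories.
events = {
    "A": {"home": date(2024, 3, 1), "refrigerator": date(2024, 3, 9)},
    "B": {"home": date(2024, 4, 10), "refrigerator": date(2024, 6, 1)},
    "C": {"home": date(2024, 5, 5)},
    "D": {"refrigerator": date(2024, 5, 20)},
}

rate = follow_rate(events, "home", "refrigerator", timedelta(weeks=2))
```

Only customer A bought a refrigerator within two weeks of buying a home, so one of the three home buyers counts toward the rate.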
Going back to clustering, it is also heavily used in medicine. IBM and the Mayo Clinic unearthed hidden patterns in medical records, discovering that infant leukemia has three distinct clusters, each of which benefits from tailored treatments. IBM Life Sciences is trying to do the same for cancer patients, mining their records for clustering patterns.
Finally, forecasting: predictions about the future made on the basis of time series information, such as web visits per hour, sales per month, or calls per day. Product, investment, and staffing decisions are made using forecasting models.
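One of the simplest forecasting models is a moving average, which predicts the next value as the mean of the most recent ones. The lecture does not say which model to use, so this is only an illustrative sketch with made-up monthly sales.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    return sum(series[-window:]) / window

# Hypothetical sales per month.
monthly_sales = [100, 110, 120, 130, 125, 135]

# Uses the last three months: 130, 125, and 135.
forecast = moving_average_forecast(monthly_sales)
```

A larger window smooths out noise but reacts more slowly to a real change in the trend.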
Data mining is very heavily used in several areas. Here are some examples of data mining applications.
In customer relationship management, data mining is used to maximize the return on marketing campaigns, improve customer retention, maximize customer value, and identify and treat the most valued customers.
In banking and other financial services, data mining can be used to automate the loan application process, detect fraudulent transactions, maximize customer value, and optimize cash reserves with forecasting.
In retailing and logistics, it can be used to optimize inventory levels at different locations, improve the store layout and sales promotions, optimize logistics by predicting seasonal effects, and minimize losses due to limited shelf life.
In manufacturing and maintenance, it can be used to predict and prevent machinery failures, identify anomalies in production systems, optimize the use of manufacturing capacity, and discover novel patterns to improve product quality.
In brokerage and securities trading, it can be used to predict changes in certain bond prices, forecast the direction of stock fluctuations, assess the effect of events on market movements, and identify and prevent fraudulent activities in trading.
In insurance, it can be used to forecast claim costs for better business planning, determine optimal rate plans, optimize marketing to specific customers, and identify and prevent fraudulent claim activities.
Other data mining applications are found in computer hardware and software, science and engineering, government and defense, homeland security and law enforcement, the travel industry, healthcare, medicine, the entertainment industry, sports, and so on.
So we can see that the data mining capabilities of sequencing, association, classification, clustering, and forecasting have made it popular in several industries, where it is heavily used.
Lastly, some data mining software. Commercially available products include IBM SPSS Modeler, SAS Enterprise Miner, IBM Intelligent Miner, and StatSoft Statistica Data Miner. There are also free and open source options, such as RapidMiner and R.
This is the essence of the data mining chapter. We will have assignments, and we will use other software, including data visualization software, to work on hands-on data mining applications.
Thank you.
