I am here to introduce the course Data Science
for Engineers. My name is Professor Raghunathan.
I am with the Department of Chemical Engineering
at IIT Madras, and I am going to co-teach
this course with Professor Shankar Narasimhan,
also from the same department. Let me start
by saying what data science is. Data science
is the science of analyzing raw data using statistics
and machine learning techniques so that you
can get information out of this data. Why
is there so much interest in data science?
The interest is mainly because industries
can make better decisions with it, and data science
is useful in a variety of application
scenarios.
However, doing data science requires the process
of inspecting, cleaning, transforming,
modeling, analyzing, and interpreting raw data.
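The steps just listed can be sketched in a few lines. This is only a minimal illustration in Python, not something from the lecture: the temperature readings, the 9999.0 error code, and the valid range of 0 to 200 are all made-up assumptions, and the "model" is just the mean of the cleaned readings.

```python
from statistics import mean

# Hypothetical raw sensor readings (assumed to be temperatures in deg C).
# None marks a missing value; 9999.0 is an assumed sensor error code.
raw = [71.2, 70.8, None, 71.5, 9999.0, 70.9, 71.1]


def is_valid(x):
    """A reading is valid if it is present and inside an assumed plausible range."""
    return x is not None and 0.0 <= x <= 200.0


# Inspect: count how many readings are missing or implausible.
n_bad = sum(1 for x in raw if not is_valid(x))

# Clean: keep only the valid readings.
clean = [x for x in raw if is_valid(x)]

# Transform: express each reading as a deviation from the average.
avg = mean(clean)
centered = [x - avg for x in clean]

# Model and analyze: the simplest possible model is a constant (the mean);
# the largest deviation from it is a crude measure of spread.
max_dev = max(abs(d) for d in centered)

# Interpret: report the findings in plain terms.
print(f"dropped {n_bad} bad readings, kept {len(clean)}")
print(f"average = {avg:.2f}, largest deviation = {max_dev:.2f}")
```

Real problems need far more careful choices at each step, but the order of the steps stays the same.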
We also frequently hear the term big data.
What does big data mean? Big data literally
means large amounts of data. When you have
lots of data, there is this notion that if
you had algorithms which are very powerful
and sophisticated, you could use this data
with these algorithms to make inferences
that were not possible before with
smaller data sets.
So extremely large data sets may be analyzed
computationally to reveal patterns, trends,
and associations which are not transparent
or easy to identify. So there are these two
concepts: one is big data, the other is
data science, which speaks to the algorithms that
can be used with big data. Now you might ask,
why does this all matter? Why is everyone interested
in big data now? As you can see, big data is
everywhere. Every time you go to the web and
do something, that data is collected. Every
time you buy something from one of the
web retailers, your data is collected. Whenever
you go to a store, data is collected at the point
of sale. When you do bank transactions, that
data is there. When you go to social networks
such as Facebook and Twitter, that data is collected.
Now these are more social data, but the same
thing is starting to happen with real engineering
plants. Real-time data is collected from plants
all over the world. Not only this, if you
are doing much more sophisticated simulations,
such as molecular simulations, which generate tons
of data, that is also collected and stored.
So the point is that there is so much data
of such variety that is available now. The question
is, can all this data be used to do something
useful? That is what big data and data science
are all about. You might ask how much data is
big data. For example, Google processes twenty
petabytes per day; this is a two thousand eight
statistic. Similarly, you can see the kind
of statistics for Facebook, eBay, and so
on.
Now that we have talked about big data and
data science, we might ask this question: why
does all of this matter to engineers? It looks
like most of this is being bandied about
and talked about in a social context. The
important thing to note is that with engineering
plants of the past, you did not have lots of
data. But with this new notion of the Internet
of Things, you have information from
anywhere and information from anything. Not
only do you have information from anywhere
and anything, this information is also accessible,
queryable, and organized. So this gives a hope
that you could actually uncover interactions
and patterns that were not possible to derive
before.
Now this is the context for engineering plants,
where lots of data is there. So we have big
data in process industries, manufacturing industries,
and so on. So again, what you need to do is
be able to use this data with data science
algorithms for deriving useful information
about the state of the equipment, the process, and
so on.
Now the excitement is that there are
algorithms for all kinds of problems that
you can think about in data science, and some
of these algorithms are self-learning and
so on. So there is a lot of excitement about
how to use these algorithms from an industrial
perspective.
So from a purely engineering viewpoint,
you could think of using this data and data
science for productivity, profitability, safety,
and so on. Take the simple example of an
industrial accident. If an industrial accident
occurred thirty years ago, that information
would not have been transmitted readily to people
around that region. But in this day and age,
if anything happens, this information gets
transmitted in no time. So you can imagine
the kind of safety that you can provide with
information.
So it is important for engineers to understand
the notion of big data, understand the Internet
of Things, and understand data science, so that
we can make our plants more productive, more
profitable, and safer. So what is this course
going to cover? It is going to tell you that
collecting data does not mean discovering.
So just having data does not matter; what you
should be able to do is
use this data to lead to value propositions.
This art or science of converting collected
data to useful knowledge is what is called
data science. Whenever we talk about data
science, there are these three aspects that
we need to keep in mind. One is the domain
knowledge: what domain are we using this data
science for? That is the reason why data
science is important for mechanical engineers,
chemical engineers, aeronautical engineers,
and so on.
However, there are a lot of things that you can
do without the domain knowledge, in terms of
the algorithms themselves. So this course is going
to be largely domain agnostic, and it is going
to introduce algorithms for all kinds of engineers
to use. In terms of the algorithms themselves,
the two things that we are really going to
spend time on in this course are the statistics
and statistical techniques that underlie
these algorithms, and the machine learning
techniques that are part of data science. There
is also the software aspect of programming
these data science algorithms, so we are
going to use a particular software to demonstrate
these ideas in data science.
So if you were to look at the objectives of
the course, this is going to be a first,
simple course on data science. We want to introduce
participants to a programming language that
they can use with this course. We will
introduce the participants to the mathematical
fundamentals from a data-science-focused
perspective. Some of these you would have
seen in your core courses in your colleges,
but what we are going to do is
take the same material and approach it
from a data science perspective, and then tell
you why what you have learned is important
from a data science viewpoint.
Now data science means many things to many
people, so we are going to give you a framework
for understanding data science problems. This
is what we call a typology of data science
problems. Again, there are many data science
problems solved in different ways, so we would
like to give you a conceptual framework for
solving data science problems, so that you
can apply the same kind of thought process
to multiple problems. We will teach commonly
used data science algorithms and give you
a practical orientation at the end through
a case study, which you have to program and
solve.
I also want to emphasize that before this
course starts, we are going to give you
pre-course material on R, which is going to be
the programming platform for this course, and
we would expect you to have worked through that
prior to attending this course. This is a more
detailed outline of the course. There is
a section on the programming platform
that we are going to use. In terms of fundamentals,
we are going to have two major blocks: one
is on linear algebra and how it is useful
for data science; the second block is statistics
and how it is useful for data science. Then
we are going to talk about the typology of
data science problems and provide a solution
framework. Most data science problems can
be categorized as either function approximation
problems or classification problems. We are
going to teach regression as a framework for
solving function approximation problems, and
we are going to show you some well-known techniques
for solving classification problems.
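To make the two problem types concrete, here is a minimal sketch with made-up numbers. The course itself uses R; this sketch is written in Python purely for illustration. The straight-line least squares fit stands in for regression, and the nearest-centroid rule is just one simple classification technique chosen here as an assumption, since the lecture does not name specific ones. The function names `fit_line` and `nearest_centroid` and all data values are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


def nearest_centroid(train, point):
    """Return the label whose class mean is closest to the given point.

    train maps each label to a list of one-dimensional training values.
    """
    centroids = {label: mean(values) for label, values in train.items()}
    return min(centroids, key=lambda label: abs(centroids[label] - point))


# Function approximation: made-up data lying on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(f"fitted line: y = {a:.2f} + {b:.2f} x")

# Classification: made-up vibration levels for two equipment states.
train = {"normal": [1.0, 1.2, 0.9], "faulty": [3.0, 3.3, 2.8]}
label = nearest_centroid(train, 2.9)
print(f"a reading of 2.9 is classified as {label}")
```

The difference to notice is the output: the regression returns a number for any new input, while the classifier returns one of a fixed set of labels.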
Now that is the outline of the course. Once
you are done with this course, what do we expect
the outcomes to be? In other
words, what can you do if you do this course
well? First, you will be able to describe a
flow process for any data science problem
that you come across. You will be able to classify
data science problems into a standard typology.
You will be able to use R as a programming
platform to solve data science problems. Once
you program and get some results,
you will be able to correlate the results
that you get to the solution approach that
you followed, which will help you assess the
solution approach, and in cases where you are
not happy with the solution approach, you will
be able to make modifications as required.
Finally, this course is the first fundamental
course on data science, which will be a prerequisite
for a more advanced machine learning course
that will follow this course.
Now from a purely job viewpoint, for students
who wish to take this course, I just
want to point out that data science is
not just a buzzword. There is going to be a lot of demand
for this. The demand is supposed to grow thirty
percent year on year in India, and by the end
of twenty twenty, people expect about
two hundred and twenty thousand jobs to be available
in this area of data science. These are
interesting jobs, well-paying jobs, so it is
a good idea for engineers to be geared up
for taking advantage of this opportunity
that is going to show up, or has already shown
up.
Thank you. I hope to see you taking this
course next year.
