Welcome to the Data Professor 
YouTube channel.
On this channel we cover
big data and data science.
(What content do we have?)
Explainer
videos, practical tutorials and Q&A sessions.
So before we begin maybe a
little bit about myself.
I've been working in data science for the past 15
years, since I was a PhD student.
So much of the work that I have been
involved with is the application of data
science to biology, chemistry and
medicine in order to understand
the underlying basis of diseases: how
they occur and how to target specific
proteins in order to modulate their
activity so that we can cure the disease.
In effect, we try to find novel
drugs, and we use machine learning to
understand the specific features of
compounds and small molecules and how
they interact with the
target protein to modulate its
function, which will
subsequently cure the disease.
(How did I start data science?)
So the data mining tool that I first started
out with is a program called WEKA,
developed by the University of Waikato.
This software is a GUI program
where we can click various buttons in
order to import data, perform feature
selection, normalize the data, remove
missing data and also perform the
learning process. So some of the
algorithms that we used were decision
trees, linear regression, artificial
neural networks and support vector machines.
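As a rough illustration of what those button clicks correspond to in code, here is a minimal Python sketch of the same preparation steps: importing data, removing missing data, a very simple feature selection, and normalization. The toy data set, the column names and the function names are all hypothetical and are not taken from WEKA or from the lab's own workflows. The learning step is left out to keep the sketch short.

```python
import csv
import io
import statistics

# Hypothetical toy data: two descriptors, one constant column, one label.
RAW_CSV = """mol_weight,logp,constant,active
180.2,1.2,1.0,0
,2.5,1.0,1
320.4,3.1,1.0,1
150.1,0.4,1.0,0
410.9,4.0,1.0,1
"""


def load_rows(text):
    """Import data: parse CSV text into a list of dicts (values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_missing(rows):
    """Remove missing data: keep only rows where every field is non-empty."""
    return [r for r in rows if all(v.strip() for v in r.values())]


def select_features(rows, candidates):
    """Feature selection (simplest possible form): drop zero-variance columns."""
    return [c for c in candidates
            if statistics.pvariance([float(r[c]) for r in rows]) > 0]


def min_max_normalize(rows, features):
    """Normalize: rescale each selected feature to the range [0, 1].

    Assumes zero-variance features were already removed, so hi > lo.
    """
    out = [dict(r) for r in rows]
    for f in features:
        values = [float(r[f]) for r in rows]
        lo, hi = min(values), max(values)
        for r_out, v in zip(out, values):
            r_out[f] = (v - lo) / (hi - lo)
    return out


rows = drop_missing(load_rows(RAW_CSV))
features = select_features(rows, ["mol_weight", "logp", "constant"])
clean = min_max_normalize(rows, features)
```

Each function stands in for one step that would otherwise be a manual click, which is what makes the whole sequence repeatable.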
So we initially started out as
users of the GUI software and over time
we became aware of some
of its limitations and hurdles. We also
noticed that running the data
mining software takes a lot of time,
specifically when we want to optimize
the parameters, because we have to modify the
numbers in the program a couple of times.
So after a couple of years of using the
program we felt that we needed to develop
our own software or our own workflow.
So the natural next step would be to move
to a programming language such as Python or R.
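To make the parameter optimization point concrete, here is a minimal Python sketch of a parameter sweep: instead of changing a number by hand and rerunning, a loop tries every candidate value and keeps the best one. It uses a tiny k-nearest-neighbors classifier scored by leave-one-out accuracy, chosen only because it is short; it is not one of the algorithms mentioned above. The toy data and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: (descriptor value, class label).
DATA = [(1.0, 0), (1.4, 0), (1.9, 0), (2.1, 1), (2.4, 0),
        (2.6, 1), (3.0, 1), (3.3, 1), (3.9, 1), (1.7, 1)]


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Hold out each point in turn, predict it from the rest, and average."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += knn_predict(train, x, k) == label
    return hits / len(data)


def sweep(data, candidates):
    """Score every candidate k and return (best_k, scores)."""
    scores = {k: leave_one_out_accuracy(data, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


best_k, scores = sweep(DATA, [1, 3, 5])
print(best_k, scores)
```

Adding more candidate values only means extending the list passed to `sweep`, rather than repeating the manual edit-and-rerun cycle.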
(Applying programming in our data science work)
So our research lab uses R and
Python to create data mining workflows,
and we try to make these workflows
reproducible. We share the code and data
sets that we have compiled on
GitHub so that interested individuals
can download the data and the code and
reproduce the models.
(Learning to program)
So at first it was a
bit difficult, but over time, as we
tried to solve problems bit by bit and modify
the code, we gradually grasped the concepts
of programming and coding. So I pretty
much tried everything, from reading books
and online tutorials to Stack Overflow and asking
colleagues. But as a biologist
learning to code, I had never been exposed to
the concepts and basic theories
such as algorithms, data structures and
all that. So what I discovered was that
if I set my own problem that I want to
solve, one that is relevant to the research
that I'm doing, then I can find
similar solutions step by step on Stack
Overflow and try to solve
it: break down the problem into
small steps, and for each of the
small steps try to create the code with
reference to Stack Overflow and also
to books and tutorials. So the thing
is, I did not learn directly from the
examples set out in the book or tutorial;
instead I used my own research problem
as the basis for my learning. So for
example, I have this research problem
that I want to solve, so I find tutorials,
Stack Overflow answers and also specific
sections of books in order to solve that
specific problem. So doing this repeatedly over
and over again, I started to realize
that coding was not that tough and that it is
something attainable. With
more problems that I had solved and that
I could solve, I started to gain motivation.
I started to have the sense of
satisfaction that I had created
something on my own
that is able to solve the research
problem that I have, and being
able to solve that with coding really
saved a lot of time. So what I had been
doing manually, e.g. in Excel or a text
editor, could be done with code
maybe a hundred-fold or a
thousand-fold faster. So what used to take
maybe six months will take a couple of
minutes in R or Python code. So if a
biologist like me can learn programming
then I believe that everyone can learn programming.
So with determination and practice,
anyone can code.
(Programming = Super Powers)
Being able to code is kind of like having a superpower.
If you can solve problems that used to take you
six months or one year in a couple of
minutes or hours,
then you see what I mean.
(See you in the next one!)
