Welcome back to the Data Professor
YouTube channel. If you're new here, my name
is Chanin Nantasenamat, and on this
channel we cover concepts and
tutorials about data science, so if
you're into this kind of content, please
consider subscribing. For today's
episode, we're going to cover how
to become a data scientist in 2020.
If you're wondering about the path that is
required to become a data scientist,
whether you're starting out or simply
interested in the field, what path should
you take? For example, should you have a
computer science degree in order to
become a data scientist, or, if you have a
non-technical background, could you also
make that kind of transition to the
field? Based on my own experience, I am
NOT a computer scientist. As you may
recall from my very first episode, my
first undergraduate degree is in
biological science, and because of my
interest in computers and data analytics
I have been self-studying in order to
learn the concepts that are necessary
for doing data science. If you think
about it, data science is a
multidisciplinary field that encompasses
several disciplines, such as informatics,
computer science (of course), statistics,
science, mathematics, data
visualization and, most importantly,
problem-solving. If you search LinkedIn
for data scientists, you will notice that
they hold a variety of first degrees, so
the backgrounds are quite diverse, and
you will see that many are not computer
science graduates. However, if you have a
computer science degree you may have an
advantage. If you don't have a computer
science degree, that's okay. The
passion to learn about the field and to
apply data science to whatever field you
are in is, I think, what it takes to
become a data scientist. There is quite a
lot of big data in the field of biology.
I made the transition to become a data
scientist by applying data science to
analyze big data in biology and
biomedical sciences in order to discover
new drugs, to understand the mechanism of
action of drugs, and also to create
diagnostic tools that can assist
clinicians and health professionals in
diagnosing patients for a particular
disease of interest.

The second
question that you may be wondering about
is the amount of time that is needed to
become a data scientist. If you have a
computer science degree, the time to
become a data scientist would not be so
long, because you already have the
fundamentals, you already have the
technical background, and you already
know how to program. That will make your
transition much quicker. Let's say you
are a web developer and you're going to
learn R or Python: you will be in a
better position to learn either or both
languages than a non-technical
person coming from, say, biology. For a
biology major, the time that it takes to
learn R or Python might be longer; it
really depends on the background of the
individual person. However, I believe that
if you have the mindset and the passion
to learn, that is all that is necessary.
I read somewhere that if you spend, say,
10,000 hours, you will be able to master
any skill or body of knowledge. Let's
say that you spend about two hours a day
learning the concepts of
programming and data science: I
believe that within a year or two you
will be able to learn enough to become a
data scientist, provided that you also
practice.

The next question that you
may have is: do you need to learn how to
program to become a data scientist? It
really depends; the answer is both yes
and no. Before I learned how to program,
I used a program called WEKA. It is
point-and-click graphical user interface
software that allowed me to analyze the
data that I had compiled during the
course of my PhD study. Over time I began
to notice that analyzing the data via the
point-and-click interface was not so
efficient. It required manual time in
which I had to physically use the mouse
to click through the program, to import
the data, to specify the input
parameters, to initiate the training of
the model, to collect the results, to put
them into Excel, and to combine them, and
all of this was quite tedious. I remember
that during the course of my PhD study I
used maybe 40 to 50 computers at the same
time. On each computer I would run some
simulation, and then I would manually
collect the data from each of the 40
computers, pull it together, and
analyze it in Microsoft Excel.
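
To illustrate how a little programming could replace that manual collection step, here is a minimal Python sketch. It is not the workflow used in the video: it assumes, hypothetically, that each computer saved its results as a CSV file with the same columns into one shared folder, and the function name `combine_results` is made up for this example.

```python
import csv
from pathlib import Path


def combine_results(results_dir: str, output_file: str) -> int:
    """Concatenate every per-machine CSV in results_dir into one file.

    Assumes (hypothetically) that all CSV files share the same columns
    and that output_file lives outside results_dir.
    Returns the number of data rows written.
    """
    rows = []
    # sorted() keeps the order of the input files deterministic.
    for path in sorted(Path(results_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Record which file (i.e. which machine) the row came from.
                row["source"] = path.stem
                rows.append(row)

    if not rows:
        return 0

    # Write one combined table that can be opened directly in Excel.
    with open(output_file, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `combine_results("results", "combined.csv")` would then gather the output of all 40 machines in one pass instead of 40 rounds of copy and paste.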
Is it possible to analyze data without
programming? Yes: you might become a data
analyst, or a bioinformatician if you're
in the field of biology, and it's
possible to become a junior data
scientist. However, in order to be
efficient, knowing how to program even
just a little bit will greatly speed up
your workflow. So yes, I would recommend
learning how to program. If I could turn
back the clock to the time when I was
doing my PhD, which was about 15 years
ago, I would indeed want to learn how to
program, because that would have greatly
increased the speed at which I carried
out the project.

Now the very
important question is: which programming
language should you learn? If you've been
googling or watching videos on YouTube,
you may have come across two languages
that are very popular for data science,
the first being R and the second being
Python. There's a debate about whether
to learn R or Python, and this really
depends on your own personal
preference or on your mentor. Personally,
I learned Python first, not because of
any particular feature of the language
itself. The decision to learn Python
was rather due to the fact that one of
my colleagues already knew how to use
Python, and we were working together on a
research project in which he coded
in Python. At the time I had no
programming experience, so he recommended
a Python book, and I looked at his
source code for the work that we
published together. We generated
artificial data sets for bacterial
identification, and we published that
work as a paper in the EXCLI Journal.
Later I had another master's student, and
over the course of his master's degree I
learned R as a new language to do data
science and also to help tweak his
code.
So really, the choice of language, R or
Python, depends on your own personal
preference. If you have a mentor who
knows R, go ahead and learn R; if you
have a mentor who knows Python, go ahead
and choose Python. If you have no mentor,
then follow this channel and I can be
your virtual mentor: you can post your
questions in the comments section down
below.

The next step
to becoming a data scientist is to
become familiar with the standard
library of Python or the standard
packages and modules of R. You have to
know which packages or libraries are
available in R or Python for you to
wrangle your data, to pre-process your
data, and to create your prediction
models.
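
As a rough illustration of the wrangling and pre-processing steps, here is a minimal Python sketch that assumes the pandas library is installed. The table contents, the column names (`compound_id`, `mol_weight`, `active`) and the function name `build_dataset` are all made up for this example and do not come from the video.

```python
import pandas as pd


def build_dataset() -> pd.DataFrame:
    """Merge two small hypothetical tables and pre-process the result."""
    # Hypothetical compound descriptors (made-up values for illustration).
    descriptors = pd.DataFrame(
        {
            "compound_id": ["c1", "c2", "c3"],
            "mol_weight": [180.2, 151.2, None],  # one missing value
        }
    )
    # Hypothetical activity labels for the same compounds.
    labels = pd.DataFrame(
        {
            "compound_id": ["c1", "c2", "c3"],
            "active": [1, 0, 1],
        }
    )

    # Wrangling: join the two tables on their shared key column.
    merged = descriptors.merge(labels, on="compound_id", how="inner")

    # Pre-processing: fill the missing weight with the column mean.
    mean_weight = merged["mol_weight"].mean()
    merged["mol_weight"] = merged["mol_weight"].fillna(mean_weight)
    return merged


if __name__ == "__main__":
    print(build_dataset())
```

The resulting table has one row per compound with no missing values, which is the shape a prediction model would typically expect as input.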
For example, if you want to deal with
data frames in Python you would use
pandas, so you have to learn how pandas
lets you merge different data frames
together; in R it is dplyr, along with
R's built-in data frame functions. If
you want to create graphs, in
Python you would use matplotlib and
seaborn, and in R you would use the base
R plot function or ggplot2. In order to
do machine learning or build prediction
models in Python you would use
scikit-learn, or Keras or
TensorFlow for your deep learning
models; in R you would use caret and
rattle, and also TensorFlow and Keras as
well. One particular package that
I really like is called Shiny. This
R package allows me to create web
applications that are data-driven, and we
have published several papers in
collaboration with one of our colleagues
at the University.

And the most important
point of becoming a data scientist is
that you have to persevere and you have
to try hard. This journey won't be easy,
but you have to put in the effort, and
most important of all, you have to code:
you have to do data science projects.
On this channel we're going to have a
series of data science projects in R,
and if you would like to learn and
follow along with this journey, please
subscribe to the channel. Every week
we will have tutorials on how you can
create your own data science project. We will
also cover basic concepts in R
programming in our R Programming 101
series, and we'll also have some concepts
about data science in the Data Science
101 series, so we have a lot in store for
you. The coming year, 2020, has a lot of
hope for you guys if you want to become a
data scientist. As DJ Patil has
mentioned, data scientist is the sexiest
job of the 21st century, so if you want
to become a data scientist, please do so,
and I hope that this channel will help
you to embark on this very important
journey and new milestone of your career.

I have
been working as a data scientist, a
biomedical data scientist, for the past
15 to 16 years
if I include my PhD studies, and now I'm
an Associate Professor of Bioinformatics.
My journey so far in data science has
been phenomenal. I have learned a
lot of things, I have met a lot of
talented people, and I have the
opportunity to do, and get paid for, what
I really like. So I'm never bored of this
job: there's always new data to be
collected and analyzed, and with our
ongoing collaborations there's really a
continuous inflow of new data and new
opportunities to learn about biology, so
it is a very, very fulfilling job. If I
could turn back the clock and decide
again what I would like to do, I would
stick to the same path and become a data
scientist.

Okay, so thank you for watching. Please
like, subscribe and share, and I'll
see you in the next one, but in the
meantime, please check out these videos!
