Hey, Jonathan here. In this video I'm going to be talking about R vs. Python for data analytics and data science. I thought it was important to make this video because there's been a lot of great information out there, but I think it's been a little incomplete and a little imbalanced about which tools should actually be used. If you're new to this channel and you're keen to learn the latest tips, tricks and tools for working more effectively with data, please hit the subscribe button for weekly videos.
Now, R and Python are both fantastic tools. In some circumstances R is a much better tool, and in others Python is a much better tool, so the question is: which is the right tool for you? I think this is important because, at the end of the day, we need to make data more accessible to more people, not just PhDs and not just software engineers. The world is becoming data-driven, so beyond the R vs. Python question itself, it's important that we make data analytics and data science accessible to more people.

I wanted to start off by saying that the rise of R and Python has been absolutely fantastic. They are both open-source tools, which means they're free, and they have been massively extended through tens of thousands of libraries that do a ton of work for you automatically, which is incredible. Compare this to what you used to need: tools like MATLAB, SAS and SPSS, which were great but massively expensive. Besides the initial cost of the tool itself, much of the power of these tools comes from additional libraries, and getting any of those meant additional license costs as well. That made it really difficult to get anything done at any pace if you needed cost approvals and that sort of thing. With R and Python, you type in a short command and within seconds you've got your library installed and you're ready to go. So that's a key benefit of both being open source.

Now let's go back and talk a bit about the fundamentals. Python is a general-purpose programming language. It's been around for longer than R and it's used for a lot more things, so in that respect Python is a lot more popular in the wider world beyond data science. You can use it for everything from running a server and maintaining scripts to programming your Raspberry Pi to control smart lights or managing your cloud computing environments, as well as, these days, working with data using the pandas library. So Python is a very powerful tool, and it's got about 70,000 libraries or so that cover all of these different things.
R, on the other hand, is a data-specific programming language, which means it's very nice for working with data. It's only got about 10,000 libraries, but instead of being spread across all sorts of different functionality, those 10,000 libraries are very much focused on the data analyst and data scientist process and workflow. So keep in mind that if you are specifically working in data, there are some very nice R libraries that Python maybe doesn't really have right now.
Let me give you an example of that. One of the things people say about Python is that you can take your analysis and turn it into a web application using things like Flask and Django. These are really popular, well-established Python frameworks for building web applications. The thing to keep in mind is that in order to use them you also need to be a web developer: you need to know Flask or Django, and you also need to know some HTML and JavaScript, asynchronous programming concepts, some reactive programming, all that kind of stuff.
Now, if you have these skills, fantastic, that's great, and it's a nice thing to add to your CV. But even if you do have these skills, it takes a fair bit of work to actually build these things out, and it's almost like another job in itself. With R, on the other hand, a lot of people don't realize that you can build web applications as well, including reactive web applications, and you can do it very quickly, much faster than in Python, using things like Shiny and flexdashboard. The cool thing about this is that your focus as a data analyst or data scientist is to do your research and then quickly present it, without needing to be a web developer or depending on one to get your work out. This is where R is really powerful: you can basically write R Markdown documents with a little bit of your R code embedded and build reactive web applications with no HTML, no JavaScript and no learning Flask or anything like that, and it is unbelievably fast. It's not as flexible, and you're not going to get a job as a web developer using R Shiny and flexdashboard, but you are going to get your work out a hell of a lot quicker. So even if you're into things like rapid prototyping, R can be a really good option.
Another thing people often say is that Python is the language for machine learning. The thing is, R had machine learning libraries well before Python, but because they were written by lots of different people, the ecosystem was quite fragmented. So when Python came out with scikit-learn, it was very powerful because it created a single, cohesive way of accessing a lot of machine learning models. Since then, Python has taken the front seat in terms of newly developed machine learning models and access to the different APIs on cloud computing environments, which is an important thing to take into consideration, especially since machine learning is such a big thing right now. But the R community is not sitting still on this either: they are rapidly developing and porting these libraries over as well. You still have access to things like caret and TensorFlow, and to automated machine learning tools like H2O, so there are still a lot of options in R. Typically, though, R is slightly behind Python in this respect, so that's something to keep in mind.
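To make the "single cohesive way" point concrete, here's a small sketch. It does not use scikit-learn itself: `MeanRegressor`, `NearestNeighbourRegressor` and `evaluate` are hypothetical toy names written in plain Python that follow the fit/predict convention scikit-learn popularised, so one evaluation function works unchanged with either model.

```python
from statistics import mean


class MeanRegressor:
    """Hypothetical toy baseline: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self  # returning self allows model.fit(...).predict(...) chaining

    def predict(self, X):
        return [self.mean_ for _ in X]


class NearestNeighbourRegressor:
    """Hypothetical toy 1-nearest-neighbour model for one-dimensional inputs."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            # Pick the target of the training point closest to x.
            closest = min(range(len(self.X_)), key=lambda i: abs(self.X_[i] - x))
            predictions.append(self.y_[closest])
        return predictions


def evaluate(model, X_train, y_train, X_test, y_test):
    """Mean squared error; works for any object with fit/predict methods."""
    predictions = model.fit(X_train, y_train).predict(X_test)
    return mean((p - t) ** 2 for p, t in zip(predictions, y_test))


X_train, y_train = [1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0]
X_test, y_test = [1.1, 3.9], [2.0, 8.0]

for model in (MeanRegressor(), NearestNeighbourRegressor()):
    print(type(model).__name__, evaluate(model, X_train, y_train, X_test, y_test))
# MeanRegressor 9.0
# NearestNeighbourRegressor 0.0
```

Swapping in a different model only changes the object you construct; that is roughly the convenience a shared estimator interface brought to Python.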
I just mentioned cloud computing, so let's talk about that. One of the big things in cloud computing right now is serverless programming, which I'll probably talk about in another video sometime, and a lot of it uses Python to build these functions, which is powerful.
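As a rough illustration of what one of these serverless functions looks like in Python: AWS Lambda calls a Python handler function with an event and a context object. The handler name `lambda_handler` and the event shape (an API Gateway-style request whose JSON body contains a list under `"values"`) are assumptions for this sketch.

```python
import json


def lambda_handler(event, context):
    """Hypothetical Lambda handler; Lambda invokes it as handler(event, context)."""
    # Assumes an API Gateway-style event whose "body" is a JSON string.
    body = json.loads(event.get("body") or "{}")
    values = body.get("values", [])
    result = {"count": len(values), "total": sum(values)}
    # API Gateway proxy integrations expect a statusCode and a string body.
    return {"statusCode": 200, "body": json.dumps(result)}


# Local check: call the function directly with a fake event; no AWS account needed.
fake_event = {"body": json.dumps({"values": [3, 4, 5]})}
print(lambda_handler(fake_event, None))
# {'statusCode': 200, 'body': '{"count": 3, "total": 12}'}
```

Because the handler is just a function, you can test it locally with a fake event before deploying it.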
A lot of the API access is also through Python, so Python is slightly better in this respect, especially on AWS, which is the key player in the cloud computing space. There, Python is really the default choice.
But you've got other players as well. Microsoft Azure, for instance, is pretty much the second-place player, and it's still a massive contender in the cloud computing space. It does a ton of really good things: a lot of people don't realize that Microsoft Azure has more pre-built machine learning models than AWS does, so that's something to keep in mind. Microsoft also has much better support for R, since it supports both R and Python, knowing that these are the two big languages for data analytics and data science. You've got things like Microsoft ML Server, which will take R or Python code and expose it through a REST API that you can then run your applications off, so that's really strong. Microsoft also has a no-code machine learning builder where you can drag and drop processes onto a workflow to build out machine learning models, and that also integrates with R and Python, so again there's better R support there. So again, you've got some options, which leads me on to the next point: putting stuff into production.
A lot of the time people say that if you want to put something into production, you should just use Python, since Python is the language you use to put things into production. But again, that's not necessarily the case.
Effectively, Python is a language that has been used a lot more by software developers and software engineers, so they're much more familiar with the whole process and workflow of getting stuff into production. R, by contrast, has typically been used by researchers, mathematicians and institutions who are not necessarily software developers, and because of that they're often not familiar with production workflows. So people think R is just a bunch of scripts that aren't really suitable for production, but it's actually possible to containerize R functions in Docker containers, expose your functions via REST APIs, and all that kind of stuff.
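The "expose your functions via a REST API" pattern looks much the same in either language. Here is a minimal Python sketch using only the standard library's WSGI support; `predict()` is a hypothetical stand-in for a real model, and the `/predict` route and JSON shape are assumptions. (In R, packages such as plumber fill a similar role.)

```python
import json
from wsgiref.simple_server import make_server


def predict(values):
    """Hypothetical stand-in for a trained model: returns the mean of the inputs."""
    return sum(values) / len(values) if values else None


def app(environ, start_response):
    """Minimal WSGI app that exposes predict() as POST /predict with JSON in and out."""
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    body = json.dumps({"prediction": predict(payload.get("values", []))}).encode()
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(port=8000):
    """Run wsgiref's development server; not called automatically."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts a local server on port 8000; in a Docker image, the start command would typically run a script that calls it. A real deployment would normally use a production WSGI server rather than wsgiref's development server.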
So there are actually a lot of options for using either language in production; it's really more about how you use the language than the language itself.

The next point I wanted to cover is documentation and help, and basically how easy each language is to learn. As mentioned, Python is typically already used by software developers, so if you come from a software development background, Python will potentially be easier to pick up.
R, on the other hand, I'm going to say is easier for working with data than Python, partly because it was built natively as a language specifically for data analytics. The design considerations for the two languages also differ. Python was designed to be an easy-to-read, easy-to-understand language. R was designed to be very forgiving, which means you can write code in lots of different ways and have it still work, and when you're getting started that actually makes a big difference and helps a lot to get up and running. On the flip side, people sometimes complain that R code is just so messy, because instead of having one way to do something you may have a dozen. If you're learning R now, I suggest learning the tidyverse as a really good standard for using R, but even if you don't strictly adhere to it, your code will probably still work, and that is kind of nice.

In terms of help files and documentation, at least for the data side of things, I've found the R documentation and help files to be much better. Part of this is that R's documentation conventions put a lot of emphasis on runnable examples: help pages have an examples section, and many packages also ship vignettes, so you get code samples that actually run, which really helps a lot. A lot of the time with Python libraries, the documentation tells you what all the different functions are but has no examples to help you along, which means you may have to go to Google, Stack Overflow or wherever to figure things out. With R, you hit F1 right there on the function and you've got all the documentation and code samples, which you can basically copy and paste and start using. They have lots of nice little examples, and this really helps you get up and running really quickly.
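For comparison, Python's standard library does have a built-in way to ship examples that actually run: the `doctest` module, which executes the interactive examples written in a docstring and checks their output. How consistently third-party libraries use it varies; `percent_change` below is a hypothetical function for illustration.

```python
import doctest


def percent_change(old, new):
    """Return the percentage change from old to new.

    The examples below are executable documentation: doctest runs them
    and compares the results with the expected output shown.

    >>> percent_change(50, 75)
    50.0
    >>> percent_change(200, 150)
    -25.0
    """
    return (new - old) / old * 100


if __name__ == "__main__":
    # Prints nothing when every example passes; use testmod(verbose=True) to see each check.
    doctest.testmod()
```

Running the file as a script checks both examples, so the documentation and the code can't silently drift apart.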
So that's an important thing to take into consideration as well. The other thing I want to talk about is BI solutions. As mentioned before, Python has much better integration with things like cloud computing and better support for a lot of different APIs, which is also important, but R has better support in BI tools. A lot of the good BI tools, like Tableau and Power BI, that many businesses already use for their data analysis have built-in R integration, and at least in my experience the Python integration hasn't been as well established. So R is integrated better in different kinds of ways.
So, which is right for you? I think the most important thing is what the rest of your team is using: if your team is using R, or they're using Python, you should really just go off and learn that. Other than that, Python is generally going to be better for software developers and software development teams, whereas R is going to be better for researchers and analysts. I'd say if you are more business-facing and not sitting in a software development department, you should really consider R as a tool, because you can build everything end to end very quickly without needing to learn a lot of other skills as well.
Again, with web development, you can build a web application within minutes in R without HTML, JavaScript or Flask. You won't get as much control, but you will be able to churn out something useful very quickly.
One other thing I forgot to mention is that languages like SQL are very useful and very important for accessing data from databases. SQL is a good language to learn, and it's not too difficult, but if you learn R you don't necessarily need to learn SQL either, because R has database connectors that let you write your regular R code and have it run against databases. That's really nice, because again you've got one language that does pretty much everything in your workflow. Python overall does more things, but in practice it's mixed in with HTML, JavaScript, SQL and all these different things. Those are great skills to learn and know, but if your focus is as a data analyst or data scientist, you may not necessarily want to learn them, or, even if you do, you may not want to spend time on them or depend on other people with those skills to get your work out.
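For contrast, here is what querying a database looks like in plain Python with the built-in `sqlite3` module: the query itself is still written in SQL, even though Python passes in the parameter and receives the results. The `sales` table and its values are made up for this sketch. On the R side, the capability described above is what the dbplyr package provides, translating dplyr code into SQL behind the scenes.

```python
import sqlite3

# In-memory database with a hypothetical sales table, so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0), ("south", 20.0)],
)

# With the plain DB-API, the aggregation is written in SQL; the database does
# the grouping and filtering, and Python only receives the summary rows.
min_total = 120
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) >= ? ORDER BY region",
    (min_total,),
).fetchall()
print(rows)  # [('north', 150.0)]
conn.close()
```

The `?` placeholder lets the driver bind the threshold safely instead of pasting it into the SQL string.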
If you found this helpful, please give it a thumbs up. Part of my mission is to help more people from different backgrounds and starting points get involved in data, so if you want access to some free training, head over to my website:
www.DataStrategyWithJonathan.com
