Python and R are the two most commonly used
languages in data science, and
nowadays most freshers get confused about
whether they should use R or Python to kick-start
their career in the data science
domain.
Hey Guys! This is Shubham from Intellipaat,
and in this video, I am gonna tell you the
long and the short of both of these languages.
So, without wasting more time, let's get
started.
I am gonna start off with their basic definitions,
beginning with R.
R is a programming language made by statisticians
and data miners for statistical analysis and
graphics, and it is supported by the R Foundation for
Statistical Computing.
R also provides high-quality graphics,
and it also has some popular libraries that
help with reporting and interactive analysis, such as R Markdown
and Shiny. Python, on the other hand, is a fully-fledged,
object-oriented, high-level programming language
made by programmers and developers for general-purpose
programming.
Python is widely used in applications such as GUIs,
games, graphic design tools, web applications,
and many more.
So, we can say that R's functionality was
developed with a statistician's mindset, which gives
it field-specific advantages,
while Python is often praised for being a
general-purpose language with easy-to-understand
syntax.
Let us start with the first factor, that is,
speed.
When it comes to speed, Python is faster
than R only up to about 1,000 iterations. Beyond
1,000 iterations, R code that uses the lapply
function instead of an explicit loop speeds up, and in that
case, R becomes faster than Python.
So, both have their own advantages.
Right?
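The speed comparison above is about R, and the exact crossover point depends heavily on the benchmark used. As a rough Python analogue, the sketch below uses the standard-library `timeit` module to time an explicit for-loop against `map()`, which is roughly Python's counterpart to R's `lapply`. The function names are hypothetical, and no timings are quoted because they vary by machine and Python version.

```python
import timeit


def square_loop(n):
    """Square 0..n-1 with an explicit for-loop, appending one item at a time."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def square_map(n):
    """Square 0..n-1 with map(), a rough Python analogue of R's lapply()."""
    return list(map(lambda i: i * i, range(n)))


# Both approaches must produce the same result before timing them means anything.
assert square_loop(1000) == square_map(1000)

for n in (1_000, 100_000):
    # Run each version 10 times and report the total elapsed seconds.
    loop_t = timeit.timeit(lambda: square_loop(n), number=10)
    map_t = timeit.timeit(lambda: square_map(n), number=10)
    print(f"n={n}: loop {loop_t:.4f}s, map {map_t:.4f}s")
```

Checking that both versions agree before timing them keeps the benchmark honest: a faster function that returns something different is not a fair comparison.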
Moving forward to the next point, that
is, code and syntax.
In this topic, I am gonna give you a brief overview
of variable declaration, data handling
capacity with scatterplot visualization,
and the ClusPlot graphics.
Starting off with Variable Declaration.
Let's take the case of a string here. R
uses an implementation similar to that of
the S programming language, which uses arrow
signs to assign a value to a variable.
These arrows can point from right
to left or from left to right, indicating which
variable receives the value, whereas Python uses an
assignment operator (=) to initialize variables.
(R also accepts = for assignment, but the arrow is the traditional style.)
Basically, R developers thought that it would
be better to show the direction of assignment
rather than just using an assignment operator,
which could confuse a new programmer
about which variable is being assigned.
Next is the data handling capability. Here,
I am gonna show you the case of scatterplots,
through which you will see the visualizations in
R and Python.
These are the pieces of code in R and Python,
and after running this code, you will get
very similar plot results in both
cases. If you check the code here, it
shows how the R data science ecosystem has
many smaller packages like GGally, which is
a package that extends ggplot2,
the most-used R plotting package, whereas
in Python, matplotlib is the primary plotting
package, and seaborn is a widely used layer
over matplotlib.
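The on-screen code from the video is not reproduced in this transcript. A minimal matplotlib sketch of a scatter plot, assuming matplotlib is installed and using made-up toy data, could look like this:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy data; the dataset used in the video is not known.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

fig, ax = plt.subplots()
ax.scatter(x, y)  # one marker per (x, y) pair
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter plot with matplotlib")
fig.savefig("scatter.png")  # write the chart to a PNG file
```

seaborn's `sns.scatterplot(x=x, y=y)` draws a similar chart on top of matplotlib, and `sns.pairplot` is roughly Python's counterpart to GGally's `ggpairs` scatterplot matrix.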
So, guys, these are the plot results that
I was talking about. You can see that the
graph results for both R and Python are similar;
the only difference is how they look.
So guys, based on these points and plot results,
we can conclude that R has many packages supporting
different ways of doing things, whereas
there is usually one way to do something in
Python.
Moving on to the next point, that is, graphics.
Here we will take the case of ClusPlots.
So guys, as we already discussed, R was
basically built for statistical analysis,
so it has many specialized libraries for plotting.
This is the reason R comes up with beautiful
charts and graphs, whereas Python's main
agenda was not statistical analysis, so
in the early stages of Python, packages for
data analysis were an issue, but this has improved
a lot.
Here is the plot result.
As you know, a picture is worth
a thousand words.
Here you can see for yourself that R comes
up with beautiful graphical representations.
So here we can say that R is handy when it
comes to graphics.
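R's `clusplot` function (from the cluster package) plots the observations of a clustering on the first two principal components. Python has no single built-in equivalent; one common approximation, assuming scikit-learn and matplotlib are installed, is to cluster with k-means and project the data with PCA, as sketched below on the iris dataset bundled with scikit-learn. Unlike `clusplot`, this sketch does not draw the cluster ellipses.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples x 4 numeric features

# Partition the samples into 3 clusters; random_state makes the result repeatable.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Project the 4-D data onto its first two principal components for plotting.
coords = PCA(n_components=2).fit_transform(X)

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis")  # color by cluster
ax.set_xlabel("Component 1")
ax.set_ylabel("Component 2")
ax.set_title("K-means clusters on the first two principal components")
fig.savefig("clusters.png")
```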
Our next point of attention is deep learning,
which is today's trend. As you all know,
the majority of companies are working
on artificial intelligence, and deep learning
is a major part of artificial intelligence.
So, when it comes to deep learning, Python
is more versatile than R, as it provides more
deep learning features, whereas R is new
to deep learning.
R has newly added APIs like keras and kerasR,
which are interfaces to the Keras library written in Python.
Right?
So now this question might be floating somewhere
in your mind: why Keras? Actually, Keras
in Python has the capability to run on top of
strong backends like TensorFlow,
Theano, or Microsoft's CNTK.
So we can say that Python has a greater advantage
here.
Till now, we have seen that both are useful
in their own ways.
Now if we look at the ease of learning
point:
Python is easy to start with, as its syntax
follows a standardized format, i.e., people
find it easy to read. It reads almost like
English. R, on the other hand, has
a less standardized syntax. It is quite hard
to learn compared to Python, and beginners
may find this a hurdle at the start.
Research over the past few years shows that the percentage
of people switching from R to Python is higher
than the percentage switching from Python to R.
Let's say, if 10% of people are switching from
Python to R, then 20% are switching from R
to Python, which is twice as many.
Next, we are gonna look at the trends,
community support, and jobs.
Before 2016, R was more in use, but here we
can see that from 2016 onward, Python has been trending,
so it's more popular than R.
And because of its popularity, it has good overall
support for general-purpose programming.
Well, if we talk about community support,
then Python's and R's support are almost
similar. Python's support is found on
mailing lists, user-contributed code and documentation,
and Stack Overflow. Basically, it has more adoption
among developers and programmers.
R's support is also found on
mailing lists, user-contributed documentation,
and active Stack Overflow members. Basically,
R has more adoption among researchers, data
scientists, and statisticians.
Now if we talk about job trends, let's
check the Google Job Trends graph right here.
These are the job postings for R and Python
over the past 12 months worldwide, where Python
is asked for more than R. How is that
possible? Because of its popularity and its
need in the current industry. Python
is a more versatile, all-rounder programming
language that can be used for the majority of
purposes, such as web and application development,
game development, artificial intelligence,
data science, statistical analysis, etc., whereas
R is used mainly among statisticians and
data miners for developing statistical software
and for data analysis.
This clearly shows that there are more
jobs for Python than for R.
Now let's move forward!
So, which one should you choose for data science, R
or Python?
Guys, this is the most frequently asked question among
learners in this domain.
I would suggest using both if you have the
choice.
They complement each other gracefully and will
make your life better if you leverage their
strengths and avoid their weaknesses.
Everything has its own pros as well as cons,
and so do R and Python.
If we talk about the pros of R, well,
R is great for prototyping and for statistical
analysis.
It has a huge set of libraries available
for different types of statistical analysis.
The RStudio IDE is also definitely a big plus,
as it eases most of the tedious tasks and
speeds up your workflow.
Talking about its cons, well,
the syntax can be obscure sometimes,
and it is harder to integrate into a production
workflow.
In my opinion, it is better suited for "consultancy-type"
tasks.
Also, the libraries' documentation isn't always user-friendly.
Talking about the pros of Python,
Python is great for scripting and automating
your different data mining pipelines. It is
the de facto scripting language nowadays,
and it also integrates easily into a production
workflow.
Besides, it can be used across different parts
of your software engineering team (for example,
back-end and cloud architecture).
The scikit-learn library in Python is awesome
for machine-learning tasks.
IPython (and its notebook) is also a powerful
tool for exploratory analysis and presentations.
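To illustrate why scikit-learn is popular for machine-learning tasks, here is a minimal sketch, assuming scikit-learn is installed, that trains a logistic-regression classifier on the iris dataset bundled with the library. The same `fit`/`score` pattern applies to most scikit-learn estimators, so swapping in a different model usually changes only one line.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Features (150 x 4) and class labels (3 iris species).
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000)  # extra iterations so the solver converges
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # fraction of test samples classified correctly
print(f"Test accuracy: {accuracy:.2f}")
```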
Talking of its cons,
Python isn't as thorough for statistical
analysis as R, but it has come a long way
in recent years.
In my opinion, the learning curve is steeper
than R's, since you can do much more with Python.
To conclude,
I'd like to say that you can use both R and Python.
Learn how they interoperate.
Start with one and then add the other to your
workflow. It only adds another skill set to
your resume, which comes as an added bonus
for your career, doesn't it?
So, guys, now it's time to wrap up.
Thank you so much for watching this session.
I'd love to hear from you guys: which
one, according to you, is better, and why?
Please reply to us in the comment section
below.
See you again!
