R is a programming language and software environment for
statistical analysis, graphical representation and reporting. R was created by
Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and
is currently developed by the R Core Team.
The core of R is an interpreted computer language which allows
branching and looping as well as modular programming using functions. For
efficiency, R can also call procedures written in C, C++, .NET, Python or
Fortran.
R is freely available under the GNU General Public License, and
pre-compiled binary versions are provided for various operating systems such as
Linux, Windows and macOS.
R is free software distributed under a GNU-style copyleft, and is an
official part of the GNU project, where it is known as GNU S.
Evolution of R
R was initially written by Ross Ihaka and Robert
Gentleman at the Department of Statistics of the University of
Auckland in Auckland, New Zealand. R made its first appearance in 1993.
· A large group of individuals
has contributed to R by sending code and bug reports.
· Since mid-1997 there has
been a core group (the "R Core Team") who can modify the R source
code archive.
Features of R
As stated earlier, R is a programming language and software
environment for statistical analysis, graphical representation and reporting.
The following are the important features of R:
· R is a well-developed, simple and effective programming language which
includes conditionals, loops, user-defined recursive functions and input and
output facilities.
· R has an effective data handling and storage facility.
· R provides a suite of operators for calculations on arrays, lists, vectors
and matrices.
· R provides a large, coherent and integrated collection of tools for data
analysis.
· R provides graphical facilities for data analysis and display, either
directly on screen or printed on paper.
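A few of the features listed above can be sketched in a handful of lines. This is only an illustration, not part of any official R material: the function name factorial_rec and the sample values are made up for the example, while for, matrix() and %*% are standard parts of base R.

```r
# A user-defined recursive function that uses a conditional.
# (factorial_rec is a hypothetical name chosen for this example.)
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A for loop that accumulates the sum of 1 to 5.
total <- 0
for (i in 1:5) {
  total <- total + i
}

# Arithmetic operators work on whole vectors, element by element.
v <- c(1, 2, 3)
doubled <- v * 2

# matrix() fills column by column; %*% is the matrix product.
m <- matrix(1:4, nrow = 2)
m_squared <- m %*% m

print(factorial_rec(5))
print(total)
print(doubled)
print(m_squared)
```

Pasting these lines into an R console shows each result directly, since R evaluates code interactively.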
In short, R is one of the world’s most widely used languages for statistical
programming. It is a leading choice among data scientists and is supported by a
vibrant and talented community of contributors. R is taught in universities and
deployed in mission-critical business applications. This tutorial will teach
you R programming along with suitable examples in simple and easy steps.
Why use R for statistical computing and graphics?
· R is open-source and free!
R is free to download as it is licensed under the terms of the GNU General
Public License. You can look at the source to see what’s happening under the
hood. Many R packages are also released under open-source licenses, so they can
often be used in commercial applications too, although it is worth checking
each package’s license first.
· R is popular, and its popularity is growing
IEEE Spectrum publishes a ranking of the most popular programming languages
each year. R was ranked 5th in 2016, up from 6th in 2015. It is a big deal for a
domain-specific language like R to be more popular than a general-purpose
language like C#. This shows the increasing interest not only in R as a
programming language, but also in fields like Data Science and Machine
Learning where R is commonly used.
· R runs on all major platforms
You can find distributions of R for all popular platforms: Windows, Linux and
macOS. R code that you write on one platform can usually be ported to another
with little or no change. Cross-platform interoperability is an important
feature to have in today’s computing world; even Microsoft has made its .NET
platform available across operating systems after realizing the benefits of
technology that runs on all systems.
· Learning R can improve your chances of getting a job
According to the Data Science Salary Survey conducted by O’Reilly Media in
2014, data scientists are paid a median of $98,000 worldwide. The figure is
higher in the US, at around $144,000. Of course, knowing how to write R
programs won’t get you a job straight away; a data scientist has to juggle a
lot of tools to do their work. But even if you are applying for a software
developer position, R programming experience can make you stand out from the
crowd.
· R is being used by the biggest tech giants
Adoption by tech giants is always a sign of a programming language’s potential.
Today’s companies don’t make their decisions on a whim. Every major decision
has to be backed by a concrete analysis of data.
Companies Using R
R is the right mix of simplicity and power, and companies all over the world
use it to make calculated decisions. Here are a few ways industry stalwarts are
using R and contributing to the R ecosystem.
Company | Application/Contribution
Twitter | Monitor user experience
Ford | Analyze social media to support design decisions for their cars
New York Times | Infographics, data journalism
Microsoft | Released Microsoft R Open, an enhanced R distribution, and Microsoft R Server after acquiring Revolution Analytics in 2015
Human Rights Data Analysis Group | Measure the impact of war
Google | Created the R style guide for the R user community inside Google
While using
R, you can rest assured that you are standing on the shoulders of giants.
Is R programming an easy language to learn?
This is a difficult question to answer.
Many researchers learn R as their first language in order to meet their data
analysis needs.
That’s the power of R programming: it is simple enough to learn as you go.
All you need is data and a clear intent to draw a conclusion from analyzing
that data.
In fact, R is an implementation of the S programming language, which was
designed so that users could start by exploring data interactively and
gradually move into programming.
However, programmers who come from a
Python, PHP or Java background might find R quirky and confusing at first. The
syntax that R uses is a bit different from other common programming languages.
While R does have all the capabilities of a
programming language, you will not find yourself writing a lot of if conditions
or loops while writing code in the R language. Instead, data structures
like vectors, lists, data frames and matrices allow
you to perform transformations on data in bulk.
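To make the point about bulk transformations concrete, here is a minimal sketch that does the same conversion twice, once with an explicit loop and once on the whole vector. The temperature values and variable names are hypothetical, chosen only for illustration.

```r
# Hypothetical sample data: temperatures in degrees Celsius.
celsius <- c(-5, 0, 12.5, 30)

# Loop version: convert one element at a time.
fahrenheit_loop <- numeric(length(celsius))
for (i in seq_along(celsius)) {
  fahrenheit_loop[i] <- celsius[i] * 9 / 5 + 32
}

# Vectorized version: the same arithmetic applied to the whole vector at once.
fahrenheit <- celsius * 9 / 5 + 32

# A logical condition used as an index takes the place of an if inside a loop.
above_freezing <- celsius[celsius > 0]

# Data frames can be filtered the same way, by rows.
readings <- data.frame(celsius = celsius, fahrenheit = fahrenheit)
warm <- readings[readings$celsius > 10, ]

print(fahrenheit)
print(above_freezing)
print(warm)
```

Both versions produce the same numbers. The vectorized form is simply shorter and is the style most R code tends to use.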
Applications of R Programming in Real World
- Data Science
Harvard Business Review named data scientist the “sexiest job of the 21st century”. Glassdoor named it the “best job of the year” for 2016. With the advent of IoT devices creating terabytes and terabytes of data that can be used to make better decisions, data science is a field that has no other way to go but up. Simply explained, a data scientist is a statistician with an extra asset: computer programming skills. Programming languages like R give a data scientist superpowers that allow them to collect data in real time, perform statistical and predictive analysis, create visualizations and communicate actionable results to stakeholders. Most courses on data science include R in their curriculum because it is one of the data scientist’s favorite tools.
- Statistical computing
R is one of the most popular programming languages among statisticians. In fact, it was initially built by statisticians for statisticians. It has a rich package repository with more than 9,100 packages at the time of writing, covering a very wide range of statistical functions. R’s expressive syntax allows researchers, even those from non-computer-science backgrounds, to quickly import, clean and analyze data from various data sources.
R also has charting capabilities, which means you can plot your data and create interesting visualizations from any dataset.
- Machine Learning
R has found a lot of use in predictive analytics and machine learning. It has various packages for common ML tasks like linear and non-linear regression, decision trees, linear and non-linear classification and many more. Everyone from machine learning enthusiasts to researchers uses R to implement machine learning algorithms in fields like finance, genetics research, retail, marketing and health care.
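As a small illustration of the regression tasks mentioned above, the sketch below fits a simple linear model with base R's lm() function. It assumes the mtcars dataset that ships with R, and the choice of variables (fuel economy mpg explained by weight wt) is just an example, not a recommended analysis.

```r
# mtcars is a small example dataset bundled with base R.
# Fit a simple linear regression of fuel economy (mpg) on weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)

# Extract the fitted intercept and slope.
coefs <- coef(fit)

# Predict mpg for a hypothetical car with wt = 3 (wt is in units of 1000 lbs).
predicted <- predict(fit, newdata = data.frame(wt = 3))

print(coefs)
print(predicted)
```

Calling summary(fit) prints standard errors and fit statistics for the same model.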
Alternatives to R programming
R is not the only language that you can use
for statistical computing and graphics. Some of the popular alternatives to R
programming are:
· Python – Popular general-purpose language
Python is a very powerful high-level, object-oriented programming
language with an easy-to-use and simple syntax.
Python is extremely popular among data
scientists and researchers. Many of the packages in R have equivalent libraries
in Python as well.
While R is the first choice of
statisticians and mathematicians, professional programmers prefer implementing
new algorithms in a programming language they already know.
The choice between R and Python also depends
on what you are trying to accomplish with your code. If you are trying to
analyze a dataset and present the findings in a research paper, then R is
probably a better choice. But if you are writing a data analysis program that
runs in a distributed system and interacts with lots of other components, it
may be preferable to work with Python.
· SAS (Statistical Analysis System)
SAS is a powerful software suite that has long been the first choice of private
enterprise for analytics. Its GUI and comprehensive documentation,
coupled with reliable technical support, make it a very good tool for companies.
While R is very widely used in
academia and research, SAS is extremely popular in commercial analytics. But R
and Python are gaining momentum in the enterprise space, and companies are also
trying to move towards open-source technologies. Time will tell whether SAS
will continue its dominance or R and Python will take over.
· SPSS – Software package for statistical analysis
SPSS is another popular statistical tool. It is
used most commonly in the social sciences and is considered the easiest to
learn among enterprise statistical tools.
SPSS is loved by non-statisticians because
it is similar to Excel, so those who are already familiar with Excel will find
SPSS very easy to use.
Like SAS, SPSS has one notable downside: it is
expensive. SPSS was acquired by IBM in 2009 for a reported $1.2 billion.