R vs. Python: Which is a better programming language for data science?
- by 7wData
The Python vs. R debate rages on in the data science community. Here's how the two coding languages match up.
Python vs. R is a common debate among data scientists, as both languages are useful for data work and among the most frequently mentioned skills in job postings for data science positions. Each language offers different advantages and disadvantages for data science work, and should be chosen depending on the work you are doing.
To help data scientists select the right language, Norm Matloff, a professor of computer science at the University of California, Davis, wrote a GitHub post aiming to shed some light on the debate.
Matloff compared R and Python across the following 10 domains to determine which programming language was the better choice:
While this is subjective, Python greatly reduces the use of parentheses and braces when coding, making it sleeker, Matloff wrote in the post.
While data scientists working with Python must learn a lot of material to get started, including NumPy, Pandas and matplotlib, matrix types and basic graphics are already built into base R, Matloff wrote.
With R, "the novice can be doing simple data analyses within minutes," he added. "Python libraries can be tricky to configure, even for the systems-savvy, while most R packages run right out of the box."
The Python Package Index (PyPI) has over 183,000 packages, while the Comprehensive R Archive Network (CRAN) has over 12,000. However, PyPI is rather thin on data science, Matloff wrote.
"For example, I once needed code to do fast calculation of nearest-neighbors of a given data point. (NOT code using that to do classification.)" Matloff wrote. "I was able to immediately find not one but two packages to do this. By contrast, just now I tried to find nearest-neighbor code for Python and at least with my cursory search, came up empty-handed; there was just one implementation that described itself as simple and straightforward, nothing fast."
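To make the task concrete, here is a minimal sketch of a nearest-neighbor query in Python, using only the standard library. It is a brute-force illustration of what "finding the nearest neighbors of a given data point" means, not the fast implementation Matloff was looking for; the function name `nearest_neighbors` and the sample points are hypothetical.

```python
import heapq
import math


def nearest_neighbors(points, query, k=1):
    """Return the k points closest to `query` by Euclidean distance.

    Brute force: every point is compared against the query, so each
    lookup costs time proportional to the number of points.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


# Hypothetical sample data, for illustration only.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
print(nearest_neighbors(points, (1.2, 0.9), k=2))
```

Note that this only returns the neighbors themselves; it does no classification. A fast version would typically replace the linear scan with a spatial index such as a k-d tree.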
When you search the following terms on PyPI, nothing comes up, Matloff added: log-linear model; Poisson regression; instrumental variables; spatial data; familywise error rate.
Python's massive growth in recent years is partially fueled by the rise of machine learning and artificial intelligence (AI). While Python offers a number of finely tuned libraries for image recognition, such as AlexNet, R versions can easily be developed as well, Matloff wrote.
"The Python libraries' power comes from setting certain image-smoothing ops, which easily could be implemented in R's Keras wrapper, and for that matter, a pure-R version of TensorFlow could be developed," Matloff wrote.