When picking a language for a new data science project, developers often debate whether Python or R is better suited to the task. R was designed specifically for data analysis, so it has many useful features built in. Python is a general-purpose language, but its many data-centric libraries make it a suitable choice as well.
R has long been the go-to language for data analysis, but that has begun to change in recent years as more people recognize Python's potential.
According to a 2020 survey from recruiting company Burtch Works, the five-year trend in language preference among data scientists shows R falling while Python rises. In 2016, 20% of data scientists preferred Python and 43% preferred R; in 2020, 47% preferred Python and only 29% preferred R.
Even though Python is rapidly gaining popularity, it’s not possible to say that one language is definitively better than the other; both have their pros and cons. Carefully weighing them against your project’s specific needs can make it easier to decide which language to use. Eric McGee, a senior network engineer at TRG Datacenters, commented:
“The solution, like any other problem, is largely dependent on the problem’s criteria, and there is no correct response to this issue other than ‘it depends.’ Both of these languages are extremely powerful, and regardless of which one you invest your time in, there is no wrong answer if you want a long-term career in data science; learning either of these two languages will pay you in the future one way or another, so instead of getting stuck in analysis paralysis, just pick one and get to work. The bulk of data science problems can be solved with any of these languages, and the rest is a matter of technique, team capabilities, and available resources, all of which are mostly independent of the language.”
Python pros
One advantage of Python is that it’s easy to use, which is why it’s often recommended as a first language for people learning to program. Veronica Miller, a cybersecurity expert at VPNoverview, said:
“Python, as a general-purpose programming language, appears to be a better choice if you want to start into programming in general and want something that can be utilized in various fields of software development, such as web development.”
Because it is a general-purpose language, Python might be a better option if you need to build APIs that expose data models or want your code to interact with other software, McGee added. Python also supports several programming paradigms, including object-oriented and procedural programming, according to Miller.
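To make the point about paradigms concrete, here is a minimal sketch that computes the same summary of a small list of numbers in two styles. The names `summarize` and `Sample` are hypothetical, chosen only for illustration; the only library call used is `statistics.fmean` from Python's standard library (available since Python 3.8).

```python
from statistics import fmean


# Procedural style: a plain function that takes data in and returns a result.
def summarize(values):
    """Return (count, mean) for a non-empty list of numbers."""
    if not values:
        raise ValueError("summarize() needs at least one value")
    total = 0.0
    for v in values:
        total += v
    return len(values), total / len(values)


# Object-oriented style: the data and the operations on it live in one class.
class Sample:
    def __init__(self, values):
        self.values = list(values)

    def count(self):
        return len(self.values)

    def mean(self):
        # fmean raises StatisticsError on an empty sample
        return fmean(self.values)


readings = [2.0, 4.0, 6.0]
print(summarize(readings))            # (3, 4.0)
sample = Sample(readings)
print(sample.count(), sample.mean())  # 3 4.0
```

Both versions produce the same answer; the choice between them is mostly about how a team prefers to organize code as a project grows, and Python lets the two styles coexist in the same codebase.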
It also benefits from a large number of data science packages and libraries, such as TensorFlow, pandas, Keras, NumPy, and PyTorch, Miller explained.
R pros
R has many advantages of its own. It was designed for data analysis, so it has advanced analytical capabilities built in. Francesco Tisiot, developer advocate at database-as-a-service company Aiven, commented:
“R is specifically related to statistics, with most of the statistical algorithms having their first release in R and it is used in related introductory courses. This makes R a good fit for exploratory data analysis with a very low barrier to go from data to insights, creating stunning reports, dashboards or APIs.”
Like Python, R has a large collection of tools and packages that extend its core functionality, and it offers strong capabilities for building dashboards and visualizations, Miller explained. In addition, because R is a procedural language, developers can break large problems into smaller segments, which makes problem-solving easier, according to Miller.
Python cons
One drawback of Python is that many popular R libraries for statistical analysis aren’t available for it, Miller said. Another, according to Yan, is that Python can consume a lot of memory.
R cons
According to Miller, R’s main disadvantages are that it can be difficult to understand, it can be slow if not used properly and is slower than Python, and it lacks sufficient documentation. Yan agreed that R can be difficult to learn and implement, adding that it lacks robust security features.