R’s increasing popularity. Should we care?

Some people will say ‘you have to learn R if you want to get a job doing statistics/data science’. I say bullshit: if you want to be any good, beyond getting a job coding in R today, you have to learn statistics and learn to work in a variety of languages.

R4stats has a recent post discussing the increasing popularity of R against other statistical software, using citation counts in Google Scholar. It is a flawed methodology, at least as flawed as other methodologies used to measure language popularity. Nevertheless, I think it is hard to argue against the general trend: R is becoming more popular. There is a deluge of books looking at R from every angle, thousands of packages and many job openings asking for R experience, which prompts the following question:

Should you/I/we care?

First answer: no. I try to use the best tool for the job, which often happens to be R but can also be Python, SAS or Fortran. It is nice to be able to use the same tool, say R, across a range of problems, but there are occasions when it feels like using Excel for statistics: one can do it, but one knows that it isn’t a great idea. I know good statisticians who prefer R, SAS or Genstat; the tool doesn’t make you good, in the same way that I could buy a Rickenbacker 4001 and still not play like Geddy Lee.

Second answer: yes. Popularity attracts good people, who develop good packages, making new techniques available first in R. This doesn’t matter if you are into plain vanilla analyses (there is nothing wrong with this, by the way). Popularity + open source means that the system has been ported to a diversity of computer systems. Need R on a supercomputer? Done. R on a Mac? Done. R for your strange operating system, for which there are C and Fortran compilers? Download it and compile it. Done. There is also the ‘I’m not crazy’ aspect: other people take the software seriously.

Gratuitous picture: Firescapes II, night illuminated by bonfire (Photo: Luis).

I find people learning R because of ‘job opportunities’ irritating, in the same way as people who learn JavaScript or Java only to get a job. Give me, any time, people who learn R (or any other language, for that matter) because they want to, because they are curious, because they want to share their discoveries with other people. Again, it is the difference between someone competent and someone great at what they do; great analysts are very much better than competent ones.

In the comments for the R4stats post there is a reference to R fanboys. Are R fanboys worse than fanboys of other statistical systems? In some respects the answer is yes, because many R users are also open source & open science supporters. Personally, I support both concepts, although I’m not dogmatic about them: I do buy some proprietary software and often can’t provide every detail about my work (commercially sensitive results). Maybe we are looking for a deeper change: we want to democratize statistics. We push for R not necessarily because it is intrinsically a better language, but because we can envision many people doing statistics to better understand the world around us and R is free. Anyway, I would prefer you call me a Python fanboy with split R personality.

9 thoughts on “R’s increasing popularity. Should we care?”

  • 2012/05/18 at 1:22 am

    Well said! I first became interested in tracking software popularity not because of R but because of Stata. Its use was growing on our campus and I wanted to know how it was doing overall. As I gathered information about Stata growth, I realized that R might be growing more quickly. R didn’t yet have much visibility on our campus but due to its growth overall, I learned it and was in a good position to support something that quickly became more popular than Stata here at UT. I expect that one day my graphs at http://r4stats.com/articles/popularity/ will show a new package whose rapid growth will indicate that it is worth investigating. I plan on adding AdviseStats to it in the coming year (http://adviseanalytics.com/).

    I definitely agree that the democratization of statistics is the most interesting thing that R provides. With computers cheaper than ever, powerful research tools available for free and a tripling or quadrupling of the number of scientists graduating worldwide, exciting times lie ahead.

    Cheers,
    Bob Muenchen (r4stats)

  • 2012/05/18 at 7:43 am

    While I would like to agree, not so much. The reason is that, modulo insider access, getting a job/position requires meeting job requirements, and I’ve not seen any for quant work (Wall Street or otherwise) that didn’t specify the stat tool to use. In my primary field of relational databases, knowing the relational model and SQL’s attempt to be relational doesn’t help. One needs to know the admin and built-in procedural language (they all have different ones) of a specific database: DB2, Oracle, etc. If it’s a SAS shop, knowing R won’t get you to the “decision maker” stage in interviewing.

    And, not to be too dour, but democratizing statistics generally leads to propaganda; figures don’t lie but liars figure. That sort of thing. Kind of like the WWW/interTubes. It started out as a vehicle for the educated to share data and knowledge, but it wanted to be popular, so has become democratic: social networking and porn for the feeble minded.

  • 2012/05/18 at 8:20 am

    Dear Luis,

    I enjoyed your post. I agree that clinging to one language is short-sighted (as much as I do not like that observation, I accept it).
    However, the reason for my comment is to compliment you on your photos – they are beautiful :)

    Cheers,
    Tal

    • 2012/05/18 at 11:36 am

      Thanks for that. I do enjoy taking pictures and I think they add some interest to the posts (and look nice in the preview in r-bloggers).

  • 2012/05/18 at 8:49 am

    Good post. I blame a lot of this “you have to know R” on this new hype surrounding Data Science and Big Data. “Data Science” and “Big Data” are not a fad, but the hype surrounding them is a fad, and sets forward this bizarre notion that everyone needs to use the same tools. R and Hadoop seem to be most commonly cited.

    R is flexible and is “kind of” similar to a programming language. I think the mantra should be “you need to learn how to program” in a modern language. That is not true for many of the long-time fields like finance, psychology etc., but for the newer technical fields it is.

    I personally prefer Python and R over the other tools, but there are definitely fields that still require SAS, Stata, SPSS etc. and for a reason.

    • 2012/05/18 at 11:39 am

      Programming and statistics are the basic skills; once one works long enough and in several environments there is an appreciation of the importance of picking up new languages quickly.

  • 2012/05/18 at 9:26 am

    R fans are important because commercial software has something R doesn’t: a marketing department.

  • 2012/05/19 at 5:14 am

    I once played a Rickenbacker 4001 and – believe me – it was fantastic. Super sound, perfect handling and really, really beautiful. A real legend. Oh, and I like R, too. :-)

    • 2012/05/19 at 9:38 am

      Lucky you! I’ve never had a chance to play one.
