Evolving notes, images and sounds by Luis Apiolaza

Teaching with R: the tools

I bought an Android phone, nothing fancy, just my first foray into the smartphone world, which is a big change coming from the dumb phone world(*). Everything is different and I am back to being a newbie; this is what many students feel the first time they are exposed to R. However, before getting into software, I find it useful to think of teaching from several points of view, considering that there are several use cases:

  1. A few of the students will get into statistics and heavy duty coding; they will be highly motivated and take several stats courses. It is in their best interest to learn R (and other software) very well.
  2. Some, but not many, of the students will use statistics once in a while. A point and click GUI may help them remember commands.
  3. Most students will have to consume statistics, read reports, request information from colleagues and act as responsible, statistically literate citizens. I think that concepts (rather than software) are by far the most relevant element for this group.

The first group requires access to all the power in R and, very likely, has at least a passing idea of coding in other languages. The second and third groups are occasional users, who will tend to forget the language and notation and will most likely need a system with menus.

At this point, some of the readers may be tempted to say that everyone (that is, groups 1 to 3) should learn to write R code and that they just need a good text editor (say Emacs). This may come as a surprise, but normal people not only do not use text editors, they don’t even know what they are. You could be tempted to say that having a good editor would also let them write in LaTeX (or XeLaTeX), which is an excellent way to ‘future proof’ your documents. Please let me repeat this: normal people do not write in LaTeX, they use Word or something equivalent.

But, but. Yes, I know, we are not normal people.

What are the problems?

When working on my Ph.D. I had the ‘brilliant’ (a.k.a. masochistic) idea of using different languages for each part of my project: Fortran 90 (Lahey), ASReml, Python (ActiveState), Matlab and Mathematica. One thing that I experienced was that working with a scripting language integrated with a good IDE (e.g. ActiveState Python or Matlab) was much more conducive to learning than a separate console and text editor. I still have fond memories of learning and using Python. This meandering description brings me back to what we should use for teaching.

Let’s be honest, the stock R IDE that one gets with the initial download is spartan if you are in OS X and plain sucky if you are in Windows. Given that working with the console plus a text editor (Emacs, Vim, TextMate, etc.) is an uncomfortable learning experience (at least in my opinion), there is a nice niche for IDEs like RStudio, which integrate editor, data manager, graphs, etc., particularly if they are cross-platform. Why is it that RStudio is not included as the default R IDE? (Incidentally, I have never used the Revolution R Productivity Environment, which is Windows only but looks quite cool.)

Today I am tempted to recommend moving the whole course to RStudio, which means installing it on an awful lot of computers at the university. One of the issues that stops me is that it introduces another layer of abstraction on top of R. We have the plain-vanilla console, then the normal installation and, on top, RStudio. On the other hand, we are already introducing an extra level with R Commander.

At this point we reach the point-and-click GUI. The last two years we have used R Commander, which has helped, but I have never felt entirely comfortable with it. This year I had a chat with some students who had used SPSS before and, after the initial shock, they seemed to cope with R Commander. In a previous post someone suggested Deducer, which I hope to check before the end of this year. I am always on the lookout for a good and easy interface for students who fall in the second and third cases (see above). It would be nice to have a series of interfaces that look like Jeroen Ooms’s prototypes. Please let me know if you have any suggestions.

(*)This is not strictly true, as I had a Nokia E72, which was a dumb phone with a lot of buttons pretending to be a smartphone.

(**)This post should be read as me thinking aloud and reflecting today’s impressions, which are continually evolving. I am not yet truly comfortable with any GUI in the R world, and still feel that SPSS, Splus or Genstat (for example) provide a nicer flow on the GUI front.

24 Comments

  1. Jason

    As someone between student 1 and 2 who is now learning to use R at work, I would recommend sticking with R Commander even though I use RStudio.

    Here's why:
    R Commander allows people who are familiar with statistics or data analysis to discover how things work with the R syntax. Although I am a huge proponent of learning how to code and I think R Studio is better for that purpose, discovery is what will pique someone's interest and drive them to do more complex analyses and "play" with the tool.

    In later courses (beyond the first level), I would want to introduce a tool like R Studio, because I think ultimately anyone who gets slightly more advanced needs to learn how to code to ensure easily replicated results and to learn how to write and design good software (which is a cross-functional skill that I think all students/analysts need to have some experience with).

    • Luis

      It is an option, although there is the issue of sometimes needing to use code anyway to conduct some of the data manipulation and analyses.

  2. Ian Fellows

    Please do give Deducer a try and let me know what you think. I definitely coded it with 1 and 2 in mind. The GUI should assist 2 in their analyses, and stay out of the way of 1 when they just want to code.

    • Luis

      Hi Ian,

      I'm certainly planning to give Deducer a try once life is a bit less hectic (writing this now in Wellington Airport, waiting for a delayed flight).

      Cheers,

      Luis

  3. xingmowang

    The last sentence from the above brings a question and that is whether the ability to use R is a must-have, good-to-have or just icing on a cake for an analyst at present and in the future.

    • Luis

      In my opinion, good analysts must be polyglots, able to pick up new languages over time. Today R is a very good bet because it has achieved a critical mass where many, if not most, people developing new analytical techniques will implement them as R packages. We also have a free language that has dramatically lowered the entry barriers to high quality statistics and graphics for many people in the world.

      However, programming languages come and go, and there is no guarantee of permanent supremacy (ask SAS). In addition, we often need to integrate our analytical tools with other services, so it pays to know more than one language.

      These comments refer mostly to the first group (heavy duty users).

  4. Tony Hirst

    "I am tempted to recommend moving the whole course to RStudio, which means installing it in an awful lot of computers at the university"

    I think RStudio can be run as a service (though I haven't tried it), which means you could host it centrally and allow students to access it via a browser: http://www.rstudio.org/docs/server/getting_starte

    The RStudio manipulate functions ( http://rstudio.org/docs/advanced/manipulate ) also suggest to me that we might be able to use RStudio to deliver packaged, interactive analyses around teaching data sets, for example, and then allow students to extend the underlying code, modify it etc?

    Disclaimer: I'm not a statistician, but I have been exploring how we can use R for creating statistical graphics/visualisations around open data. RStudio is the only way I've interacted with R, except for this online GUI wrapper for creating ggplot2 graphics [ http://www.yeroon.net/ggplot2/ ] , which might provide another way in for students? Eg using the online tool to generate some R code, then taking that into eg RStudio and working it up a little more?

    • Luis

      Thanks Tony. I've certainly not made up my mind yet, and I'm very keen on hearing what other people are using (or thinking of using) for teaching stats. I don't think it is a big deal to get students to install R on their computers (at least most of them seem to cope well with the process AND like having access to the software outside the university environment), so I am not looking at this stage for a server version of R. The possibility of packaging interactive analyses sounds a lot more interesting though.

  5. Mischa

    Another flavour of GUI is rkward. A very nice GUI, also for statistics and general R development and learning. It has features such as text completion and showing the arguments of functions. It also gives a better overview and is less crowded than Rstudio.
    It is native on *nix and an installation bundle is available for Windows: [ http://sourceforge.net/projects/rkwardextras/file… ].

    • Luis

      Hi Mischa, RKward looks nice but I need a solid GUI for the main operating systems that are used by the students: Windows and OS X. A good Linux version and an experimental Windows one won't cut it.

  6. alex shenkin

    I'm confused: can you use R Commander *with* R Studio, or is it R Commander *or* R Studio?

    • Luis

      Alex, in fact it is possible to use both at the same time. What I am wondering about is what the best options are for teaching purposes, given that there are several types of students.

  7. edivimo

    I'm an agricultural researcher, and I think I belong to the second group: I'll use statistics for experimental design and results analysis every six months. I use RStudio and you're right that I forget how to do the analysis. But I think that it is better to learn better programming practices in my R scripts (commenting, good variable names, etc.) than to use the point and click GUI. I am no expert in the many R GUIs, but the subsetting and manipulation of data done with R functions is quite difficult to implement in an R GUI.
    For example: I have data with counts of acari on the leaves of plants (632 observations), but the problem is that I need the total acari per plant, and then the mean for each experimental unit.
    Solution: new data.frames built with the function "aggregate", something that I did about 6 months ago but don't remember how, hidden in a long forgotten file in another unrelated R script. Two "aggregates" (and two frantic file searches) later, the data are ready for analysis.
    I think people from the second group need a way to remember those little details.
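
    The two-step aggregation described in this comment can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data frame `mites` and its columns (`unit`, `plant`, `count`) are hypothetical stand-ins, not the commenter's actual data or script.

    ```r
    # Hypothetical data: mite counts per leaf, with several leaves per plant
    # and several plants per experimental unit (all names are assumptions).
    mites <- data.frame(
      unit  = c("A", "A", "A", "A", "B", "B", "B", "B"),
      plant = c(1, 1, 2, 2, 3, 3, 4, 4),
      count = c(2, 3, 0, 1, 5, 5, 4, 2)
    )

    # Step 1: total mites per plant (sum over the leaves of each plant)
    per_plant <- aggregate(count ~ unit + plant, data = mites, FUN = sum)

    # Step 2: mean of the plant totals within each experimental unit
    per_unit <- aggregate(count ~ unit, data = per_plant, FUN = mean)

    print(per_unit)
    ```

    Keeping a short commented script like this next to the data is one way to address the "remembering the little details" problem raised above.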

    • Luis

      I concur that data manipulation is something that one can rarely do properly through menus; I have very similar problems dealing with plot data that have to be manipulated before analyses, so we end up using R Commander but also having to write some code to prepare the data.

      • edivimo

        First, sorry for the double post. It's all the fault of my awful third-world-internet-connection.
        Second, I am beginning to learn about mixed-effects models in R. The two books I use emphasize numeric and graphical analysis of models, another aspect that I find difficult to implement in a point and click GUI.
        In conclusion, it's going to be painful to learn R (and re-learn it after a while), but it pays.

  8. Roger

    I'd also recommend Deducer. I've started using it with my undergraduate Psychology students and graduate HCI students. The interface is much better than Rcommander (which is ugly and kludgy).

    Cheers,
    Roger

    • Luis

      I will spend some time at the end of January having a look at Deducer. My current holiday mood (summer down under) makes doing any work on R difficult.

  9. Xiao H.

    I have both the regular R-console and Rstudio on my computer. I usually have both on when I work on different things. This makes things a little less cluttered.

    However, one problem I have with Rstudio is that it seems a bit slow and chunky. Of course, that could be a result of the computer I have (MacBook Air 2011 ). But even doing simple stuff seems to be slow – in comparison to the R Console. I wonder if you’ve ever encountered a similar issue with Rstudio.

    • Luis

      Hi Xiao,

      I use RStudio for day to day work without problems (both on a 15″ Macbook Pro 2009 and a 27″ iMac 2010). However, if I need to do very big runs I sometimes use the Terminal version on the iMac and close every other program on the computer, so as to release as many resources as possible.

  10. R Studio has a server distribution which allows: 1/ a unique installation of R and R studio, 2/ universal access via web (compatibilities to be tested), 3/ rather nice IDE.

    I’m trying to push a computer illiterate crowd over to greener pastures of R, and for now I’m on the fence on R Studio vs. R commander. R Studio has the edge because of the version control integration (but since my computer illiterate colleagues don’t know how to spell version control, I doubt that will be the selling point).

    I was expecting R Studio to be a smoother click-here get-it-done IDE, but my first dry run resulted in a resounding failure: I ended up using the command line console to do most stuff, enough to discourage the most motivated of my colleagues from going at it solo.

    The server solution is a good step forward, but for the point-and-click we’re not there yet. The learning curve of R is still too steep for the crowd.

    My 2¢ worth.

    • Luis

      We recently installed RStudio on around 100 computers in a lab and moved all (most?) students from the basic R version. I’ve been trying to wean students from R Commander, but some of them certainly feel better in that environment.

      These days RStudio (not the web version) is my default R experience and I’m very happy with it. Having said that, I think there are groups of people that are better off using point and click software (plenty of options in that department) and efforts of developing R in that direction may hit diminishing returns rather soon.

      • Point-and-click for the statistical analysis in R might prove futile, but loading data, data manipulation (putting variables in the correct class, or changing factor level values) and a certain amount of graphic setup could certainly gain from some GUI assistance. The current arcane CLI for setting graphics up is a real putdown for most beginners. Yes, you want to be able to blow your mind out with an impressive data analysis; your boss just wants a great plot to show in the next council meeting. 😉
        I’m not advocating WYSIWYG for R, far from it. But in some areas, a nice GUI could ease the overall task A LOT.
        Cheers

  11. blackie

    I have had a nice time using the rattle GUI. Seems more professional than Deducer, more polished or something.

    • Luis

      It looks good, I should give it a try.

