Flotsam 13: early July links

Man flu kept me at home today, so I decided to do something ‘useful’ and go for a linkathon:

Sometimes people are truthful and cruel. Here Gappy, on a mission, goes for the jugular:

Over and out.

Late-April flotsam

It has been a month and a half since I compiled a list of statistical/programming internet flotsam and jetsam.

  • Via Lambda The Ultimate: Evaluating the Design of the R Language: Objects and Functions For Data Analysis (PDF). A very detailed evaluation of the design and performance of R. HT: Christophe Lalanne. If you are into statistical genetics and on Twitter, Christophe is the man to follow.
  • Attributed to John Tukey, “without assumptions there can be no conclusions” is an extremely important point, which came to mind while listening to the fascinating interview with Richard Burkhauser on changes in income for the middle class in the USA. Changing the definition of the unit of analysis may give a completely different result. By the way, does anyone have a first-hand reference for Tukey’s quote?
  • Nature News publishes RNA studies under fire: High-profile results challenged over statistical analysis of sequence data. I expect to see this happening more often once researchers get used to uploading the data and code for their papers.
  • Bob O’Hara writes on Why simple models are better, which is not positive towards the machine learning crowd.
  • A Matlab Programmer’s Take On Julia, and a Python developer interacts with Julia developers. Not everything is smooth. HT: Mike Croucher.
  • Dear NASA: No More Rainbow Color Scales, Please. HT: Mike Dickinson. Important: this applies to R graphs too.
  • Rafael Maia asks “are programmers trying on purpose to come up with names for their languages that make it hard to google for info?” These are the suggestions if one searches Google for Julia:

    Unhelpful search suggestions.

  • I suggest creating a language called Bieber and searching for dimension Bieber, loop Bieber and regression Bieber.

That’s all folks.

Early-March flotsam

It has been a strange ten days since we unexpectedly entered grant-writing mode. I was planning to work on this near the end of the year, but a likely change in funding agency priorities requires applying in a few weeks; unfortunately, this means that it is all happening while I am teaching.

  • As usual, I got involved in a project that is strange territory for me: it will require semantic analysis of international treaties. I will start by looking at Latent Semantic Analysis using lsa in R and gensim in Python. I’ll have to retrieve documents from the web and process them in quite a few ways.
  • The success of some of Hadley Wickham’s packages got me thinking about underlying design issues in R that make functions so hard for users to master. Don’t get me wrong, I still think that R is great, but why do so many people struggle to understand parts of the core functionality? A quick web search will show, for example, an incredible amount of confusion on how to use the apply family of functions. The management of dates and strings is also a sore point. I perfectly understand the need for, and even the desirability of, new packages that extend the functionality of R. However, that is a different kettle of fish; here we are talking about making sane design choices so there is no need to repackage basic functionality to make it usable.
  • Talking about failures, Andrew Gelman mentions the sempiternal problem of designers Turn(ing) a Boring Bar Graph into a 3D Masterpiece or, as a commenter put it, “Turn(ing) a Boring Bar Graph into a 3D Pile of Steaming Crap”. While it is always easy to laugh at designers, we should remember that the abundance of 3D piles[...] also reflects our failure to make a clear case for good data presentation. Well, that and the spawn of evil Microsoft Excel and PowerPoint.
  • Gratuitous picture of a vegetarian friend: caterpillar of Emperor gum moth (Photo: Luis).

  • Beware if you are going out for dinner with vegetarian friends. Besides vegetarians having an inordinate influence on the choice of restaurant, you may end up subsidizing their meals. HT: @EricCrampton.
  • A cool collection of movie snippets that display mathematics. HT: @Freakonometrics.
  • http://numfocus.org, a foundation for supporting scientific computing in Python. HT: @teoliphant.
  • Andrew Gelman again, this time pointing to Thaddeus Tarpey’s presentation All models are right… most are useless (PDF), which focuses on the positive aspects of model approximation.
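At its core, Latent Semantic Analysis is a truncated singular value decomposition of a term-document matrix: documents that use similar terms end up close together in a low-dimensional "concept" space. The sketch below is not how lsa or gensim do it internally; it is a minimal illustration using plain NumPy, raw term counts (no tf-idf weighting, no stop-word removal) and made-up toy "treaty" snippets. The helper names `term_document_matrix`, `lsa` and `cosine` are hypothetical.

```python
import numpy as np


def term_document_matrix(docs):
    """Raw term counts: rows are terms, columns are documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[index[w], j] += 1
    return vocab, X


def lsa(X, k):
    """Rank-k LSA via truncated SVD, X ~ U_k S_k V_k^T.

    Returns one row per document: its coordinates in the k-dimensional
    latent space, i.e. the rows of V_k S_k.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T


def cosine(a, b):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy documents (hypothetical): two about trade, two about fisheries
docs = [
    "trade agreement tariff reduction",
    "tariff reduction trade partners",
    "fisheries quota ocean conservation",
    "ocean conservation fisheries protection",
]
vocab, X = term_document_matrix(docs)
D = lsa(X, k=2)
print(cosine(D[0], D[1]), cosine(D[0], D[2]))
```

With these toy documents the two trade snippets land on the same latent direction (cosine close to 1) and are orthogonal to the fisheries ones (cosine close to 0). Real treaties would need tokenisation, stop-word removal and term weighting before the SVD, which is the kind of processing packages such as lsa and gensim help with.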

And that’s all folks.

Mid-February flotsam

This coming Monday we start the first semester in Canterbury (and in New Zealand for that matter). We are all looking forward to an earthquake-free year; more realistically, I’d be happy with low magnitude aftershocks.

  • The Wall Street Journal reports that more pediatricians are ‘firing’ patients who refuse vaccination. I wonder whether some practices will end up concentrating ‘vaccine refusers’.
  • I am collaborating with a researcher in Electrical Engineering; he and his students develop very cool tools for us (see an example in this previous post on dealing with autocorrelation in mixed models). They use Python to control the tools, extract data and do some basic processing (isn’t that cool?). Python + SciPy have moved a long way towards creating a nice environment for scientific computing; however, in my opinion setting up an R environment is way easier than dealing with all the versions of Python, wxPython, etc. In the end I went for the 32-bit version on OS X because 64-bit is not supported (wxPython targets Carbon, meh).
  • Why not publish your data too? HT: @chlalanne. I’ll be uploading datasets that I’ve used in my papers (at least the ones that do not contain industry/confidential data).
  • Timandra Harkness asks Have you been seduced by statistics? in Significance Journal (free access). In the same issue Matt Briggs asks another important question Why do statisticians answer silly questions that no one ever asks?
  • How companies learn your secrets: the creepy side of statistics/machine learning.
  • Feeling all smug in a discussion in Engineering: people were struggling with software licensing (Office, Matlab, etc.). I pointed out that, using R, we had no licensing issues keeping up with the bleeding edge when teaching. Take that!
  • In the nothing-to-do-with-stats category, Retronaut displays a beautiful collection of Soviet space propaganda posters. HT: can’t remember.
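When juggling 32-bit and 64-bit builds (as with wxPython on OS X above), it helps to confirm which interpreter is actually running. A minimal check using only the standard library, written for Python 3; `python_bitness` is a hypothetical helper name. The Python documentation for `platform.architecture` notes that on OS X universal (multi-architecture) binaries it can be misleading, and recommends testing `sys.maxsize > 2**32` instead.

```python
import struct
import sys


def python_bitness():
    """Return 32 or 64: the pointer width of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


# Check recommended in the Python docs; robust to universal (multi-arch) binaries
is_64bits = sys.maxsize > 2**32

print(f"Python {sys.version.split()[0]}: {python_bitness()}-bit, 64-bit={is_64bits}")
```

Both checks should agree; if they report 64-bit while a package only supports 32-bit, you are running the wrong build.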

Supermarkets in New Zealand: are they creepy too? Checkouts at Pak N Save, Christchurch (Photo: Luis).

I’ve been running analyses for two or three papers, mostly using R + asreml-R + ggplot2 + plyr. I hope to write one of them using XeLaTeX (I’ll be the sole author), while for the other(s) I am condemned to MS Word (!). I am also thinking of journals to send the papers to, with copyright as liberal as possible (hard in forestry).

P.S. Flotsam posts compile small bits of information, twitter favorites, shared links, etc.

IPython

I installed the latest version (0.12) of IPython from source on my Mac, but forgot that I had a previous version (0.10) installed using easy_install. When trying to run ipython I kept getting the error: No module named terminal.ipapp.

Fixing it. I ran easy_install -m ipython in Terminal, so Python would stop looking for the old package, as explained here. Then I navigated to /Library/Python/2.6/site-packages and used rm -r ipython-0.10.1-py2.6.egg. Now everything works fine.
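A stale egg shadowing a fresh install is easy to diagnose before deleting anything. The sketch below is a minimal, standard-library-only helper written for Python 3 (not the Python 2.6 of the post). `find_eggs` and `where_imported_from` are hypothetical names, not part of easy_install or IPython, and the egg pattern assumes the setuptools naming seen above (`<name>-<version>-py<X.Y>.egg`).

```python
import importlib.util
from pathlib import Path


def find_eggs(site_packages, package):
    """List egg files/directories for `package` in a site-packages directory.

    Assumes setuptools-style names such as ipython-0.10.1-py2.6.egg.
    """
    return sorted(p.name for p in Path(site_packages).glob(f"{package}-*.egg"))


def where_imported_from(module):
    """Return the file Python would load for a top-level `module`, or None."""
    spec = importlib.util.find_spec(module)
    return spec.origin if spec else None
```

For example, `find_eggs("/Library/Python/2.6/site-packages", "ipython")` would list any leftover eggs, and `where_imported_from("IPython")` shows whether the interpreter still picks up the old copy instead of the one installed from source.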