Early-March flotsam

These last ten days have been strange, since we unexpectedly entered grant-writing mode. I was looking forward to working on this near the end of the year, but a likely change in funding agency priorities means we have to apply within a few weeks; unfortunately, all this is happening while I am teaching.

  • As usual, I got involved in a project that is unusual for me, one that will require semantic analysis of international treaties. I will start by having a look at Latent Semantic Analysis using lsa in R and gensim in Python. I’ll have to retrieve documents from the web and process them in quite a few ways.
  • The success of some of Hadley Wickham’s packages got me thinking about underlying design issues in R that make functions so hard for users to master. Don’t get me wrong, I still think that R is great, but why is it so hard to understand parts of the core functionality? A quick web search will show, for example, an incredible amount of confusion about how to use the apply family of functions. The management of dates and strings is also a sore point. I perfectly understand the need for, and even the desirability of, new packages that extend the functionality of R. However, this is a different kettle of fish: we are talking about making sane design choices, so that there is no need to repackage basic functionality to make it usable.
  • Speaking of failures, Andrew Gelman mentions the sempiternal problem of designers “Turn(ing) a Boring Bar Graph into a 3D Masterpiece” or, as a commenter put it, “Turn(ing) a Boring Bar Graph into a 3D Pile of Steaming Crap”. While it is always easy to laugh at designers, we should remember that the abundance of 3D piles[…] also reflects our failure to make a clear case for good data presentation. Well, that and the spawn of evil Microsoft Excel and PowerPoint.
  • Gratuitous picture of a vegetarian friend: an Emperor gum moth caterpillar (Photo: Luis).

  • Beware if you are going out for dinner with vegetarian friends: besides vegetarians having an inordinate influence on the choice of restaurant, you may end up subsidizing their meals. HT: @EricCrampton.
  • A cool collection of movie snippets that display mathematics. HT: @Freakonometrics.
  • A foundation for supporting scientific computing in Python. HT: @teoliphant.
  • Andrew Gelman again, this time pointing to Thaddeus Tarpey’s presentation “All models are right… most are useless” (PDF), which focuses on the positive aspects of model approximation.
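For the treaty project, a toy sketch of what Latent Semantic Analysis does under the hood may help: build a term-document count matrix, take a truncated singular value decomposition, and compare documents in the reduced space. This assumes plain NumPy rather than lsa or gensim, and the four “documents” are made up for illustration; they are not treaty text.

```python
import numpy as np

# Hypothetical toy corpus standing in for real documents (not actual treaty text)
docs = [
    "trade tariff agreement between parties",
    "tariff reduction trade goods",
    "environmental protection forest conservation",
    "forest conservation carbon emissions",
]

# Term-document count matrix: rows are terms, columns are documents
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[index[w], j] += 1

# Truncated SVD: keep only the k largest singular values (latent "topics")
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional vector per document

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pairwise document similarities in the latent space
sims = np.array([[cosine(a, b) for b in doc_vectors] for a in doc_vectors])
print(np.round(sims, 2))
```

In this toy case the two trade documents land on one latent axis and the two forest documents on the other, so similarity is close to 1 within each pair and close to 0 across pairs. Real work would add weighting such as tf-idf and many more dimensions; gensim’s `LsiModel` is designed for doing this kind of decomposition on large corpora.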

And that’s all folks.

Mid-February flotsam

This coming Monday we start the first semester in Canterbury (and in New Zealand, for that matter). We are all looking forward to an earthquake-free year; more realistically, I’d be happy with low-magnitude aftershocks.

  • The Wall Street Journal reports that more pediatricians are ‘firing’ patients who refuse vaccines. I wonder whether some practices will end up clustering ‘vaccine refusers’.
  • I am collaborating with a researcher in Electrical Engineering whose group (he and his students) develops very cool tools for us (see an example in this previous post on dealing with autocorrelation in mixed models). They use Python to control the tools, extract data and do some basic processing (isn’t that cool?). Python + SciPy have moved a long way towards creating a nice environment for scientific computing; however, in my opinion setting up an R environment is way easier than dealing with all the versions of Python, wxPython, etc. In the end I went for the 32-bit version in OS X because 64-bit is not supported (wxPython targets Carbon, meh).
  • Why not publish your data too? HT: @chlalanne. I’ll be uploading datasets that I’ve used in my papers (at least the ones that do not contain industry/confidential data).
  • Timandra Harkness asks “Have you been seduced by statistics?” in Significance Journal (free access). In the same issue, Matt Briggs asks another important question: “Why do statisticians answer silly questions that no one ever asks?”
  • How companies learn your secrets: the creepy side of statistics/machine learning.
  • Feeling all smug in a discussion in Engineering: people were struggling with software licensing (Office, Matlab, etc.). I pointed out that, with R, we didn’t have any licensing issues keeping us from being at the bleeding edge when teaching. Take that!
  • In the nothing-to-do-with-stats category, Retronaut displays a beautiful collection of Soviet space propaganda posters. HT: can’t remember.

Supermarkets in New Zealand: are they creepy too? Checkouts at Pak N Save, Christchurch (Photo: Luis).

I’ve been running analyses for two or three papers, mostly using R + asreml-R + ggplot2 + plyr. I hope to write one of them using XeLaTeX (I’ll be the sole author), while for the other(s) I am condemned to MS Word (!). I am thinking of journals to send the papers to, with copyright as liberal as possible (hard in forestry).

Early-February flotsam

Mike Croucher at Walking Randomly points out an interesting difference in how several mathematical packages evaluate the simple expression 2^3^4, which comes down to how repeated exponentiation is grouped (its associativity). It is pretty much a divide between Matlab and Excel (does the latter qualify as mathematical software?) on one side, with result 4096 (or (2^3)^4), and Mathematica, R and Python on the other, resulting in 2417851639229258349412352 (or 2^(3^4)). Remember your parentheses…
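A quick check of both groupings in Python, where the power operator is spelled `**` and groups from the right. One trap worth flagging: in Python `^` is bitwise XOR, not exponentiation, so typing the expression literally gives a third, unrelated answer.

```python
# Python's ** is right-associative: 2 ** 3 ** 4 is parsed as 2 ** (3 ** 4) = 2 ** 81
right = 2 ** 3 ** 4
# Explicit parentheses reproduce the left-to-right grouping of Matlab and Excel
left = (2 ** 3) ** 4

# ^ is bitwise XOR in Python (left-associative): (2 XOR 3) XOR 4 = 1 XOR 4 = 5
xor = 2 ^ 3 ^ 4

print(right)  # 2417851639229258349412352
print(left)   # 4096
print(xor)    # 5
```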

Corey Chivers, aka Bayesian Biologist, uses R to help students understand the Monty Hall problem. I think much of the difficulty in grokking it stems from a convenient distraction: opening doors. The problem could be reframed as: i) you pick a door (so your probability of winning the prize is 1/3) and Monty gets the other two doors (probability of winning 2/3); ii) Monty is offering to swap all his doors for yours, so switching increases your probability of winning; iii) Monty will never open a winning door to entice the switch, so we can forget about the doors he opens.

To make the point clearer, let’s imagine now that instead of 3 doors the game has 10 doors. You pick one (probability of winning 1/10) and Monty keeps 9 (probability of winning 9/10). Would you switch one door for nine? Of course! The fact that Monty will open 8 non-winning doors rather than all of his doors does not make a difference in the deal.
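The reframing can be checked by simulation. The sketch below uses Python rather than the R in Corey Chivers’ post; it plays many rounds with n doors, and Monty opens all but one of his doors, never revealing the prize. The function names, number of trials and seed are arbitrary choices for illustration.

```python
import random

def play(n_doors, switch, rng):
    """Play one round of a generalized Monty Hall game; return True on a win."""
    prize = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    # Monty holds every door the contestant did not pick
    monty_doors = [d for d in range(n_doors) if d != pick]
    # Monty opens all but one of his doors and never opens the prize door:
    # if he holds the prize, that door stays closed; otherwise he leaves
    # a random (losing) door closed.
    if prize in monty_doors:
        remaining = prize
    else:
        remaining = rng.choice(monty_doors)
    final = remaining if switch else pick
    return final == prize

def win_rate(n_doors, switch, trials=100_000, seed=42):
    """Estimate the probability of winning by repeated play."""
    rng = random.Random(seed)
    wins = sum(play(n_doors, switch, rng) for _ in range(trials))
    return wins / trials

for n in (3, 10):
    print(n, "doors: stay", win_rate(n, False), "switch", win_rate(n, True))
```

Staying should win about 1/n of the time and switching about (n-1)/n: roughly 1/3 versus 2/3 with three doors, and 1/10 versus 9/10 with ten. Swapping one door for nine is exactly the deal on offer, however many losing doors Monty opens first.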

Pierre Lemieux reminds us that “a dishonest statistician is an outliar”.

If you want to make dulce de leche using condensed milk but lack a pressure cooker, use an autoclave for 50 to 60 minutes. HT: Heidi Smith. Geeky, and one needs an autoclave worth thousands of dollars, but that’s what universities are for.

Lesser and Pearl inform us that there are at least 20 modalities for making statistics fun in “Functional Fun in Statistics Teaching: Resources, Research and Recommendations”. HT: Chelsea Heaven. I’ve used music, videos, cartoons, jokes, striking examples using body parts, quotations, food, juggling, etc.

An old review of Buddhism without Beliefs: A Contemporary Guide to Awakening by Stephen Batchelor. I can’t see any statistical angle, but I liked that book.

P.S. Awesome video by OK Go. HT: Eric Crampton.

Back to quantitative genetics!

Mid-January flotsam: teaching edition

I was thinking about new material that I will use for teaching this coming semester (starting the third week of February) and suddenly compiled the following list of links:

