An opinion piece on Calculus and statistics by Daniel Kaplan, about teaching a different version of the typical introductory calculus course, one that is useful for statistics. He goes as far as teaching calculus using R. There is more information in Project MOSAIC.
Biased and Inefficient, Thomas Lumley’s personal statistics blog (he insists that posting 75% of Statschat is not enough to qualify as personal). You may know Thomas from the survey package (or a few others).
If you are a postgrad student in New Zealand, you can apply for a NeSI (New Zealand eScience Infrastructure) postgraduate allocation to access high-performance computing facilities.
It has been a month and a half since I compiled a list of statistical/programming internet flotsam and jetsam.
Via Lambda The Ultimate: Evaluating the Design of the R Language: Objects and Functions For Data Analysis (PDF). A very detailed evaluation of the design and performance of R. HT: Christophe Lalanne. If you are in statistical genetics and on Twitter, Christophe is the man to follow.
Attributed to John Tukey, “without assumptions there can be no conclusions” is an extremely important point, which comes to mind when listening to the fascinating interview with Richard Burkhauser on the changes in income for the middle class in the USA. Changes to the definition of the unit of analysis may give a completely different result. By the way, does someone have a first-hand reference to Tukey’s quote?
Nature news publishes RNA studies under fire: High-profile results challenged over statistical analysis of sequence data. I expect to see this happening more often once researchers get used to uploading the data and code for their papers.
It has been a strange last ten days since we unexpectedly entered grant writing mode. I was looking forward to working on this issue near the end of the year, but a likely change in funding agency priorities requires applying in a few weeks; unfortunately, it means that all this is happening at the same time I am teaching.
As usual I got involved in a strange, for me, project which will require semantic analysis of international treaties. I will start having a look at Latent Semantic Analysis using lsa in R and gensim in Python. I’ll have to retrieve documents from the web and process them in quite a few ways.
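As a rough idea of the processing involved, the input to Latent Semantic Analysis is a term-document count matrix. The sketch below builds one using only the Python standard library; the function names and the two toy "treaty" sentences are invented for illustration, and the decomposition step itself would still be left to lsa or gensim.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term i in document j."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # Sorted union of all terms seen in any document.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for terms that a document does not contain.
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


# Hypothetical toy documents, not real treaty text.
docs = [
    "The parties agree to reduce emissions",
    "The parties agree to protect forests",
]
vocab, tdm = term_document_matrix(docs)
for term, row in zip(vocab, tdm):
    print(term, row)
```

A real pipeline would also need stop-word removal, stemming and some term weighting (for example tf-idf) before the decomposition; none of that is attempted here.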
The success of some of Hadley Wickham’s packages got me thinking about underlying design issues in R that make functions so hard to master for users. Don’t get me wrong, I still think that R is great, but why is it so hard to understand parts of the core functionality? A quick web search will highlight that there is, for example, an incredible amount of confusion about how to use the apply family of functions. The management of dates and strings is also a sore point. I perfectly understand the need for, and even the desirability of, having new packages that extend the functionality of R. However, this is another kettle of fish; we are talking about making sane design choices so there is no need to repackage basic functionality to make it usable.
Talking about failures, Andrew Gelman mentions the sempiternal problem of designers Turn(ing) a Boring Bar Graph into a 3D Masterpiece or, as a commenter put it, “Turn(ing) a Boring Bar Graph into a 3D Pile of Steaming Crap”. While it is always easy to have a laugh at the designers’ expense, we should remember that the abundance of 3D piles[…] also reflects our failure to make the case for good data presentation clearly. Well, that and the spawn of evil Microsoft Excel and PowerPoint.
Beware if you are going out for dinner with vegetarian friends. Besides vegetarians having an inordinate influence on the choice of restaurant, you may end up subsidizing their meals. HT: @EricCrampton.
This coming Monday we start the first semester in Canterbury (and in New Zealand for that matter). We are all looking forward to an earthquake-free year; more realistically, I’d be happy with low magnitude aftershocks.
The Wall Street Journal reports that more pediatricians are ‘firing’ patients that refuse to use vaccines. I’m wondering about practices that will cluster with ‘vaccine refusers’.
I am collaborating with a researcher in Electrical Engineering; he and his students develop very cool tools for us (see example in this previous post on dealing with autocorrelation in mixed models). They use Python to control the tools, extract data and do some basic processing (isn’t that cool?). Python + Scipy have moved a lot towards creating a nice environment for scientific computing; however, in my opinion setting up an R environment is way easier than dealing with all the versions of Python, wxpython, etc. In the end I went for the 32-bit version in OS X because 64-bit is not supported (wxpython targets carbon, meh).
Feeling all smug in a discussion in Engineering: people struggling with software licensing (Office, Matlab, etc). I pointed out that with R we didn’t have any licensing issues stopping us from being at the bleeding edge when teaching. Take that!
I’ve been running analyses for two to three papers, mostly using R + asreml-R + ggplot2 + plyr. I hope to write one of them using XeLaTeX (I’ll be the sole author), while in the other(s) I am condemned to MS Word (!). Thinking of journals to send the papers to, with the most liberal copyright possible (hard in forestry).
P.S. Flotsam posts compile small bits of information, Twitter favorites, shared links, etc.
I installed the latest version (0.12) of IPython from source on my Mac, but forgot that I had a previous version (0.10) installed using easy_install. When trying to run ipython I kept getting the error: No module named terminal.ipapp.
Fixing it. I ran easy_install -m ipython in Terminal, so Python doesn’t continue looking for the old package, as explained here. Then I was able to navigate to /Library/Python/2.6/site-packages and then use rm -r ipython-0.10.1-py2.6.egg. Now everything works fine.
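When two copies of a package are installed, it helps to check which one Python actually picks up before deleting anything. The sketch below is an addition, not part of the original fix: it assumes a modern Python 3 (importlib.util.find_spec does not exist in the Python 2.6 used above), and module_location is a hypothetical helper name.

```python
import importlib.util


def module_location(name):
    """Return the file Python would import the top-level module `name` from.

    Returns None if the module cannot be found on the current sys.path.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# A standard-library module, to show the shape of the answer.
print(module_location("json"))
# The package from this post; prints None if IPython is not installed.
print(module_location("IPython"))
```

If the path printed for IPython points into an old .egg directory, that stale copy is the one shadowing the new install.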