My setup

Yesterday I accidentally started a dialogue on Twitter with the dude running The Setup. Tonight I decided to procrastinate in my hotel room (for work in Rotovegas) by writing up my own Luis Uses This:

Since 2005 I’ve been using Apple computers as my main machines. They tend to be well built and keep on running without rebooting for a while, and I ssh to a Unix box from them when I need extra oomph. At the moment I have a 2009 15″ MacBook Pro and a 2010 27″ iMac; both are pretty much default configurations, except for extra RAM, and they are still running Snow Leopard. I have never liked Apple mice, so I bought a Logitech mouse for the iMac. I use a generic Android phone, because I’m too cheap to spend money on an iPhone. I don’t have an iPad either, because I don’t have a proper use for it and I dislike lugging around gear for the sake of it.

I’m not ‘addicted’ to any software. However, I do use some programs frequently: R for data analysis/scripting (often with RStudio as a frontend), asreml for quantitative genetics, Python for scripting/scraping, XeLaTeX for writing lecture notes and solo papers (because my second surname is Zúñiga), MS Word for writing anything that requires collaborating with other people, and Keynote for presentations (but sometimes I have to use PowerPoint). I check my university email with Thunderbird or Entourage. My browsing is mostly done using Chrome, but when paranoid I use Tor + Firefox + Vidalia. I use a bunch of other programs but not often enough to deserve a mention. If you think about it, Keynote is the only format that defeats my platform agnosticism (I could still write Word documents using OpenOffice or similar). I almost forgot! I do rely on Dropbox to keep computers synced.

I keep on changing text editors: I don’t understand how people can cope with Emacs and am uncomfortably writing this post using Vim (which is awful as well). I own a copy of TextMate but I feel annoyed by the author’s abandonment, so I’m at a point where I tend to use software-specific editors: R – RStudio, XeLaTeX – TeXShop, etc.

If I weren’t allowed to use a Mac at work I’d probably move to Linux; the major hassle would be converting Keynote presentations to something else. I could live with Windows, but I would start with a totally clean install, because I find the pre-installed software very unhelpful. These days I think that I’ve been unconsciously preparing myself for the impermanence of software, so if I need to learn a new stats package or new editor that is ‘just fine’: software agnostic Buddhism.

Non-computer-wise I’m permanently dissatisfied with my bag/backpack: I haven’t found a nice overnight trip bag that is designed for walking around carrying a laptop. (Did I mention that I like to walk?) Most of them are dorky or plain useless and my current theory is that the solution is getting a smaller (say 11-13″) laptop. Because the university depreciates laptops over 4 years I still have to wait a year to test the theory.

I tend to doodle when thinking or preparing a talk. I prefer to write on unlined photocopy paper with a pen or pencil. A fountain pen is nice, but a $0.20 pen will do too; it has the advantage of being (i) cheap and (ii) available everywhere.

I like to take pictures and muck around with shutter speeds and apertures, which doesn’t mean that I’m any good at it. I use a Nikon Coolpix P7100 camera, but I’m sure that a Canon G12 would do the job as well. It is the smallest camera that gives me the degree of control I like. I process the pictures in Lightroom, which is just OK, but, again, it sort of fits my platform agnosticism.

I’m slowly moving to ebooks, for which I use a Sony Reader (which I got for free) that I manage using Calibre. I keep wireless disabled and non-configured: it is only for reading books and I often use the dictionary feature while reading (I’m always surprised by the large number of English words).

What would be your dream setup?

This would be a ‘sabbatical package’ where I would spend 6 months living in another (earthquake-proof) country, near the ocean, with my family, good food, a light notebook with a week’s worth of battery life, decent internet connection and the chance to catch up with my research subject.

P.S. 2012-04-19. I came across this post on 37signals discussing a switch from OS X to Ubuntu. I think that there is a class of use cases (e.g. web development, scientific programming) where moving from one to the other should be relatively painless.

Eucalypt earthquake memories

This week was the first anniversary of the February 22nd earthquake in Christchurch. Between that and the first week of lectures it has been hard to find time to write much about data analysis. Thus, if I owe you some analyses, some code or some text be patient, please. I’ll be back soon(ish).

I have never been much of a downtown person, so it is no surprise that I haven’t been in Hagley Park for a while. Today (Sunday in my part of the planet) it was a nice treat to spend some time there and see the interaction between city recovery and the Botanic Gardens, particularly in a Eucalyptus delegatensis surrounded by messages.

Earthquake messages around Eucalyptus delegatensis, Christchurch Botanic Gardens (Photo: Luis).
Detail of messages (Photo: Luis).
Trees can carry a lot of sadness and hope (Photo: Luis).
Up the eucalypt tree (Photo: Luis).

Time for a pause, time to look forward.

P.S. I wanted to attend the Christchurch PechaKucha tonight, but I ran out of time at the end. Next time.

Academic publication boycott

Over the last few weeks a number of researchers have been calling for, or supporting, a boycott against Elsevier; for example, Scientific Community to Elsevier: Drop Dead, Elsevier—my part in its downfall or, more generally, Should you boycott academic publishers?

What metrics are used to compare Elsevier to other publishers? It is common to refer to cost-per-article; for example, in my area Forest Ecology and Management (one of the most popular general Forestry Journals) charges USD 31.50 per article but Tree Genetics and Genomes (published by Springer Verlag) costs EUR 34.95 (roughly USD 46). Nevertheless, researchers affiliated to universities or research institutes rarely pay per article; instead, our libraries have institution-wide subscriptions. Before the great consolidation drive we would have access to individual journal subscription prices (sometimes reaching thousands of dollars per year, each of them). Now libraries buy bundles from a given publisher (e.g. Elsevier, Springer, Blackwell, Wiley, etc) so it is very hard to get a feeling of the actual cost of a single journal. With this consideration, I am not sure if Elsevier ‘deserves’ being singled out in this mess; at least not any more than Springer or Blackwell, or… a number of other publishers.

Elsevier? No, just Gaahl Gorgoroth

What we do know is that most of the work is done and paid for by scientists (and society in general) rather than journals. Researchers do research and our salaries and research expenses are many times paid for (at least partially if not completely) by public funding. We also act as referees for publications and a subset of us are part of editorial boards of journals. We do use some journal facilities; for example, an electronic submission system (for which there are free alternatives) and someone will ‘produce’ the papers in electronic format, which would be a small(ish) problem if everyone used LaTeX.

If we go back some years, many scientific societies used to run their own journals (often scraping by or directly running them at a loss). Then big publishers came ‘to the rescue’ offering economies of scale and an opportunity to make a buck. There is nothing wrong with the existence of publishers facilitating the publication process; but when combined with the distortions in the publication process (see below) publishers have achieved tremendous power. At the same time, publishers have hiked prices and moved a large part of their operations to cheaper countries (e.g. India, Indonesia, etc) leaving us researchers struggling to pay for the subscriptions to read our own work. Not only that, but copyright restrictions in many journals do not allow us to make our work available to the people who paid for the research: you, the taxpayer.

Today scientific societies could run their own journals and completely drop the printed version, so we could have cheaper journals while societies wouldn’t go belly up moving paper across continents. Some questions: would scientific societies be willing to change? If that’s the case, could they change their contractual arrangements with publishers?

Why do we play the game?

The most important part of the problem is that we (the researchers) are willing to participate in the publication process with the current set of rules. Why do we do it? At the end of the day, many of us play the journal publication game because it has been subverted from dissemination of important research results to signaling researcher value. University and research institute managers need to have a way to evaluate their researchers, managing tenures, promotions, etc. Rather than going for actually doing a proper evaluation (difficult, expensive and subjective), they go for an easy one (subjective as well): number of publications in ‘good’ journals. If I want to get promoted or taken seriously in funding applications I have to publish in journals.

I think it is easy to see that I enjoy openly communicating what I have learned (for example in this blog and on my main site). I would rather spend more time doing this than writing ‘proper’ papers, but of course this is rarely considered important in my evaluations.

If you already are a top-of-the-scale, tenured professor it is very easy to say ‘I don’t want to play the game anymore’. If you are a newcomer to the game, trying to establish yourself in these times of PhD gluts and very few available research positions, all incentives line up to play the game.

This is only part of the problem

The questioning does not stop at the publication process. Instead, the value of the peer review process is also under scrutiny. Then we enter into open science: beyond having access to publications, how much can we trust the results? We have discussions on open access data even when it is in closed journals. And on, and on.

We have moved from a situation of scarcity, where publishing was expensive, the tools to analyze our data were expensive and making data available was painfully difficult to a time when all that is trivially easy. I can collect some data, upload it to my site, rely on the democratization of statistics, write it up and create a PDF or HTML version by pressing a button. We would like to have feedback: relatively easy if the publication is interesting. We want an idea of reliability or trust: we could have, for example, some within-organization peer reviewing. Remember though that peer reviewing is not a panacea. We want to have an idea of community standing, which would be the number of people referring to that document (paper, blog post, wiki, whatever).

Maybe the most important thing is that we are trying to carry on with ‘traditional’ practices that do not extend beyond, say, 100 years. We do not need to do so if we are open to a more fluid environment in publication, analytics and data sharing. Better, we wouldn’t need to continue if we stopped putting so much weight on traditional publication avenues when evaluating researchers.

Is Elsevier evil? I don’t think so; or, at least, it doesn’t seem to be significantly worse than other publishers. Have we vested too much power in Elsevier and other publishers? You bet! At the very least we should get back to saner copyright practices, where the authors retain copyright and provide a non-exclusive license to the publishers. Publishers will still make money but everyone will be able to freely access our research results because, you know, they already pay for the research.

Disclaimer: I have published in journals managed by Elsevier and Springer. I currently have articles under review for both publishers.

P.S. Gaahl Gorgoroth image from Wikipedia.

P.S.2 The cost of knowledge is keeping track of academics taking a stand against Elsevier; 1503 of them at 12:32 NZST 2012-01-30. HT: Arthur Charpentier.

P.S.3 2012-01-31 NZST I would love to know what other big publishers are thinking.

P.S.4 2012-02-01 NZST Research Works Act: are you kidding me?

The Research Works Act (RWA) bill (H.R.3699) introduced to the US Congress on 16 December 2011 proposes that:

No Federal agency may adopt, implement, maintain, continue, or otherwise engage in any policy, program, or other activity that–

(1) causes, permits, or authorizes network dissemination of any private-sector research work without the prior consent of the publisher of such work; or

(2) requires that any actual or prospective author, or the employer of such an actual or prospective author, assent to network dissemination of a private-sector research work.

The idea of calling researchers’ work, funded by government and edited by their peers (probably at least partially paid by government funds), private-sector research work because a publishing company applied whatever document template they use on top of the original manuscript is obscene. By the way, Richard Poynder has a post that lists a number of publishers that have publicly disavowed the RWA.

P.S.5 2012-02-02 16:38 NZST Doron Zeilberger points to the obvious corollary: we don’t need journals for research dissemination anymore (although we still do for signaling). Therefore if one is keen on boycotts it should affect all publishers. Academics are stuck with last century’s publication model.

P.S.6 2012-10-19 15:18 NZST I have some comments on publication incentives.

Hiking

Stitched pictures of the view, courtesy of Hugin and Skitch (Photo: Luis).

Traveling with friends and family, view from Green Lake towards Lake Tarawera, North Island, New Zealand. Time for walks, nice meals and setting R aside (although I have a few drafts for the blog soon to be published). Now back in the South Island we prepare for a great weekend with more walks, food and friends. Merry Christmas.

On R, bloggers, politics, sex, alcohol and rock & roll

Yesterday morning at 7 am I was outside walking the dog before getting a taxi to go to the airport to catch a plane to travel from Christchurch to Blenheim (now I can breathe after reading without a pause). It was raining cats and dogs while I was walking doggyo, thinking of a post idea for Quantum Forest; something that I could work on without a computer. Then I remembered that I told Tal Galili that I would ‘mention r-bloggers’ in a future post. Well, Tal, this is it.

I started this blog on 4th October and I thought ‘I could write a few things and see how it goes. I may even get 10 people a day reading this blog’. I reached that number almost immediately and I was thinking that, if I kept going at it, I could eventually reach 50 people. Then I came across R-bloggers and, after some hesitation, I submitted this blog to Tal’s web site. I jumped to over 200 people a day visiting the blog (see below) and even got comments! I repeat: I was writing about mixed models and got comments!

As I point out in the ‘About’ page, this is not an ‘R blog’ but a blog about statistics and data analysis in general, that mostly uses R as a vehicle to express ideas. There will be some Python (my favorite language) and, maybe, other tools; however, the ideas are more important than the syntax. Nevertheless, R has democratized the practice of statistics, as well as facilitated the production of some very interesting visuals. One thing was clear to me: I did not want to write about ‘data visualization’, which is receiving a lot of (I dare say too much) attention in the R world. Most infographics produce the same reaction in me as choirs and mimes; which is to say, they bore me to tears (apologies if you are a choir-singer mime in your spare time). I want to write a bit about analysis and models and ‘bread and butter’ issues, because I think that they are often ignored by people chasing the latest smoke and mirrors.

I have to say that I am learning a lot from comments, particularly people suggesting packages that I didn’t even know existed. I am also learning about spam-comments and I wish there was a horrible place where spammers would go and suffer for eternity (together with pedophiles, torturers and other not-so-nice seedy characters). I am thankful to Tal mostly for putting in the work to create a repository of R blogs; when I have some free time I often wander around looking for some interesting explanation to an R problem that is bugging me. As Tolkien said: ‘Not all those who wander are lost’.

Politics

Most of my professional life I have dealt with, let’s say, ‘agricultural’ statistics; that is, designed experiments, linear mixed models (mostly frequentist, but sometimes Bayesian), often with pedigrees. One of my pet peeves (and somewhat political issue) is that many statisticians—particularly in mathematics departments—tend to look down on ‘bread and butter’ work. They seem to forget that experiments and breeding/genetics have been the basis of many theoretical developments and that we deal with heavily multivariate data, longitudinal data, spatial data, models with sometimes hundreds of covariance components or, if you are into genomics, models with tens of thousands of random effects.

There is also the politics of software, where in the R community there is a strong bias against any package that is not free (sensu both speech and beer). At some level I share the ideal of having free access to tools, which makes the practice of statistics/data analysis available to a large number of people. At another level, it seems counterproductive to work with substandard tools and to go for lesser models only because ‘package X can’t deal with the model that I would like to fit’. This is just another example of software defining our field, which was (and still is) a common accusation pointed towards SAS. In contrast, in this blog you will see several references to ASReml-R, a commercial mixed models package, which I tend to use because (as far as I know) there is nothing at the moment that comes close to its functionality. If I pay for my computer running MS Windows or OS X (far from being paradigms of openness), how can I justify being dogmatic about not using a commercial R package (particularly if I can access it via academic or nonprofit pricing)?

Sex and alcohol

As I pointed out above, most of my work has dealt with forestry/agriculture. Nevertheless, this year I have become more interested in the relationship between statistics and public policy issues; for example, minimum wage and unemployment, which I covered as a simple example in this blog. I have to thank my colleague Eric Crampton (in Canterbury’s Department of Economics) for igniting my interest in the use and misuse of statistics, and their interaction with economics, to justify all sorts of restrictions and interventions in society. There are very interesting datasets in this area available from, for example, Statistics New Zealand that could be analyzed and would make very interesting case studies for teaching stats. There is a quote by H.G. Wells that comes to mind:

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.

Rock & roll? This is getting too long, so I will leave music for another time.

Upgrading R (and packages)

I tend not to upgrade R very often (running from 6 months to 1 year behind in version numbers) because I have to reinstall all packages: a real pain. A quick search shows that people have managed to come up with good solutions to this problem, as presented in this stackoverflow thread. I used that code on my Mac.
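A minimal sketch of the general idea (not necessarily the exact code from the thread): save the names of the installed packages before upgrading, then reinstall whatever is missing once the new version of R is running. The function names and the file `~/old_packages.rds` are made up for illustration.

```r
# Hypothetical helper: store the names of all installed packages in a file.
save_package_list <- function(file) {
  saveRDS(rownames(installed.packages()), file)
}

# Hypothetical helper: which previously installed packages are missing now?
packages_to_reinstall <- function(old_pkgs,
                                  current_pkgs = rownames(installed.packages())) {
  setdiff(old_pkgs, current_pkgs)
}

# Hypothetical helper: reinstall the missing packages from the default repos.
# Packages that do not live there (e.g. commercial ones) will only raise
# warnings and have to be dealt with by hand.
restore_packages <- function(file) {
  missing <- packages_to_reinstall(readRDS(file))
  if (length(missing) > 0) install.packages(missing)
  invisible(missing)
}

# Usage (the file name is an arbitrary choice for this sketch):
# In the old R version:  save_package_list("~/old_packages.rds")
# In the new R version:  restore_packages("~/old_packages.rds")
```

Keeping the list in a file, rather than pointing the new R at the old library folder, means the approach should still work if the old version has already been removed.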

Of all installed packages, I only had issues with 5 of them, which required installation from their respective websites: Acinonyx, INLA (and AnimalINLA) and asreml. Package graph is now available from bioconductor.org. INLA can be installed really easily from inside R, while I did not bother downloading asreml again and just copied the folder from ~/Library/R/OldVersion/library/asreml to ~/Library/R/CurrentVersion/library/asreml.
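Those two manual steps could look roughly like the sketch below. It assumes the repository URL that the INLA project currently documents (it may differ from the one available when this was written, so check before use); `install_inla` and `copy_package` are hypothetical helpers, and the library paths are the same placeholders used above.

```r
# Assumption: repository URL as documented by the INLA project.
inla_repo <- c(INLA = "https://inla.r-inla-download.org/R/stable")

# Hypothetical helper: install INLA from its own repository plus the usual ones.
install_inla <- function() {
  install.packages("INLA",
                   repos = c(getOption("repos"), inla_repo),
                   dependencies = TRUE)
}

# Hypothetical helper: copy an installed package folder from one library
# to another (new_lib must already exist).
copy_package <- function(pkg, old_lib, new_lib) {
  file.copy(file.path(old_lib, pkg), new_lib, recursive = TRUE)
}

# Usage (paths are placeholders, not real version folders):
# install_inla()
# copy_package("asreml",
#              "~/Library/R/OldVersion/library",
#              "~/Library/R/CurrentVersion/library")
```

Copying a package folder across versions only works as long as the package does not need to be rebuilt for the new R; if it fails to load, reinstalling from the vendor is the safer route.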

Overall, it was a good upgrade experience, so thanks to the stackoverflow crowd for so many ideas on how to make R even nicer than it is.

P.S. 20100-10-14 Similar instructions, but including compiling R and installing bioconductor.

A shoebox for data analysis

Recidivism. That’s my situation concerning this posting flotsam in/on/to the ether. I’ve tried before and, often, will change priorities after a few posts; I rationalize this process thinking that I’m cured and have moved on to real life.

This time may not be different but, simultaneously, it will be more focused: just a shoebox for the little pieces of code or ideas that I use when running analyses.