## Quantum Forest

### notes in a shoebox

#### Category: meta (page 1 of 2)

Travel is part of life and if you have pets, finding appropriate boarding for them is a must. This is my explanation for why you should not entrust your dog to Glenstar Kennels, in Canterbury, New Zealand.

At the end of 2016 I had work approval for a two-month trip overseas. Normally I would book accommodation for my dog at the SPCA boarding kennels (as I did when we had two months of repairs to our house following Christchurch’s earthquake). However, as this trip included Christmas/New Year, it was impossible to find a vacancy there. I was happy to find a spot for my dog at Glenstar Kennels spanning the whole end-of-year period.

After 2 months travelling I had a sad surprise when I went to pick up my dog. He was almost 5 kg overweight, which for a 27 kg dog is a weight gain of close to 20% in 2 months. As an illustration, imagine if you weighed 75 kg and gained close to 15 kg in only 2 months.

I immediately wrote to the owner of Glenstar Kennels, who stated that “Whilst I do agree he has put on weight had he fed a cheap food and lost weight in the kennels I feel you would be more upset”. Well, a dog becoming overweight is not any better than a dog losing weight! Both situations lead to reduced animal lifespan. While I believe the quality of the food provided was probably appropriate, the combination of physical activity and food quantity was clearly inappropriate.

The New Zealand Animal Welfare Act 1999 and the Code of Welfare for Dogs 2010 (administered by the Ministry for Primary Industries) establish a series of minimum standards for dog care:

1. Dogs need a balanced daily diet in quantities that meet their requirements for health and welfare and to maintain their ideal bodyweight.
2. The amount of food offered needs to be increased if a dog is losing condition, or decreased if it is becoming overweight.
3. The code of welfare for dogs applies in all situations, including temporary housing such as shelters, doggy daycares or day boarding facilities, and kennels.

According to the schedule sent to me by Glenstar Kennels, there were 12 hours of contact a day when someone had access to my dog and 65 days to figure out that he was becoming overweight and take appropriate action. The photo below shows my dog’s increased girth as I tried to fit his harness, set to his size when I dropped him off at Glenstar’s facility on 9th December, around his size on 12th February (after gaining 5 kg). There is a dramatic difference, well explained by the term negligence.

Poor doggio showing his change of girth after two months in Glenstar Kennels.

I am very unhappy with the level of care provided by Glenstar Kennels and the lack of a satisfactory reply to my written complaints. After 2 months away I had to take my dog to his usual veterinarian to discuss his excess weight, with the associated cost, so I could bring him back to good health. Dog and I have been doing our usual daily walks (as we did before the trip), and I have been very careful about his nutrition, so he can slowly go back to his optimal 27 kg.

Unfortunately, there is no compulsory regulatory body for dog boarding kennels that could enforce the Code of Welfare for Dogs. However, I feel that I have to write this review and make Glenstar Kennels’ negligence public, so other dog owners (and potential customers) are aware of their extremely poor service.

P.S. The owners of Glenstar Kennels also have another company: Star Pet Travel for pet relocations. They use Glenstar Kennels for their temporary accommodation. I wouldn’t use them either.

Almost 3 years ago I posted my computer setup following the model introduced by The Setup. A few things have changed in the meantime and this time is as good as any for updating the list.

### Hardware

Computers: I have been using a 13″ MacBook Air for 2 years now, with a 256GB SSD and 8GB of RAM. At the beginning it was strange moving from a 3.5-year-old 15″ MacBook to a new computer with the same disk and RAM capacity. Soon the differences became more apparent: being 1.2 kg lighter and having much longer battery life made lugging the computer around easier. I didn’t miss (much) the larger screen. The biggest constraint has been disk space; now I only have 25 to 30GB available on disk, which involves some juggling on my part. My next computer should have a minimum of 512GB of storage, especially when considering the size of photographs. I also have a 2009 iMac 27″ which keeps on going; at most I’ll go for extra RAM this year but it’s OK as it stands.

Phone: I use a Samsung S4 (courtesy of my job), which is good enough although I have to turn it off every few days or some errant process will consume battery like there is no tomorrow. It meets most of my requirements, except I find the camera disappointing. Basic apps: email, calendar, twitter, runkeeper, pocket casts (paid), 1weather (paid), camscanner (paid), kindle & evernote.

Bag: For the last year and a half or so I have been using an Osprey Flapjack backpack, which is OK for walking with a laptop over short distances. However, it has poor back ventilation, making walking for fun (yes, I do that sometimes) and cycling uncomfortable. I’m considering buying an Ortlieb Downtown pannier for cycling to work, instead of my current crappy panniers.

Photo/sound: I still use my Nikon P7100, which is a point-and-shoot with manual features too. I take fewer pictures than I would like, but that is not the camera’s fault. Sometimes I carry a Sony PCM-M10 digital recorder, which does a pretty good job in general.

### Software

I continue to believe in the impermanence of software and the need to stay operating system agnostic as much as possible.

Statistics: plain R for quickies, plain R + RStudio for bigger problems, plain R + RStudio + ASReml for quantitative genetics. SAS the odd time for historical reasons.

Presentations: back to PowerPoint after several years of Keynote. Main reason: Keynote is horrible at supporting presentations in older versions, which is death by a thousand cuts when preparing lectures. Secondary reason: the updates have worsened Keynote.

Writing: most journal articles in Word, because most of my coauthors use it; short bursts of writing/quickies go to a text editor. I keep on changing editors, but I tend to return to TextMate 2, which has received some TLC since it was open sourced. I keep up some lecture and lab notes in LaTeX but every time I update them I wonder if it’s worth the trouble: a combination of cargo cult and Stockholm syndrome.

Photos: an old version of Adobe Lightroom for photo management, Skitch for quick image manipulation. Not completely happy with the latter, but haven’t found a good substitute.

Email: I dislike Outlook and can put up with Thunderbird, so Thunderbird it is. I can’t understand people who say I’ve kept all my email for the last 20 years, so every few years I have a catastrophic email cleansing and messages disappear. Note to self, organize an email implosion for 2015.

Browser: jumping between Firefox and Safari depending on my mood. Add-ons: Adblock Plus to make the internet free of ads.

Keeping things in sync: Dropbox.

All this software works well (or is at least palatable) on both Mac and Windows (keeping up with my agnosticism); some of it (Thunderbird, R, RStudio) also works on Linux. Some days I’m tempted to use OpenOffice to reduce operating system dependencies but, let’s be honest, OpenOffice is still clunky as hell.

There are loads of other programs on my computers, but I don’t use them often enough to mention them.

I have been writing on the internet on and off (perhaps mostly off) for nearly 20 years, including various blog stints since July 2003. This is my fifth or sixth iteration of a blog and I figured out that one element that makes it difficult to keep going in its current form is how skewed the sampling of topics I cover is. I mean all this quantitative, coding, etc. is like looking through a prism that only lets through a tiny portion of life.

Prism used to set ‘prism plots’ in forest inventory, where the distance to the tree and its size determines if it is inside the plot (Photo: Luis, click to enlarge).

I am loosening my mental definition of what should be on this site because, as much as I like programming and numbers, it becomes tiring to always be switched on for those topics. Sometimes this change will go unnoticed, while at other times it will represent a big departure from what is (or used to be) the core of this blog’s content.

I am hoping to try different topics (perhaps more common in a previous blog incarnation), angles and media. We will see how it works out.

This week I’ve been feeling tired of excessive fanaticism (or zealotry) about open source software (OSS) and R in general. I do use a fair amount of OSS and pushed for the adoption of R in our courses; in fact, I do think OSS is a Good Thing™. I do not like, however, the constant yabbering on why using exclusively OSS in science is a good idea and the reduction of science to repeatability and computability (both of which I covered in my previous post). I also dislike the snobbery of ‘you shall use R and not Excel at all, because the latter is evil’ (going back ages).

We often have several experiments running during the year and most of the time we do not bother setting up a data base to keep data. Doing that would essentially mean that I would have to do it, and I have a few things more important to do. Therefore, many data sets end up in… (drum roll here) Microsoft Excel.

How should a researcher set up data in Excel? Rather than reinventing the wheel, I’ll use a(n) (im)perfect diagram that I found years ago in a Genstat manual.

Suggested sane data setup in a spreadsheet.

I like it because:

• It makes clear how to set up the experimental and/or sampling structure; one can handle any design with enough columns.
• It also manages any number of traits assessed in the experimental units.
• It contains metadata in the first few rows, which can be easily skipped when reading the file. I normally convert Excel files to text and then I skip the first few lines (using skip in R or firstobs in SAS).
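The skipping step in the last point can be sketched in R as follows. This is only an illustration under assumptions: the file contents, the number of metadata rows (three) and the column names are all made up, and the file is written to a temporary location so the example runs on its own.

```r
# Hypothetical text export of a spreadsheet: three metadata rows,
# then a header row, then the data (all values invented).
lines <- c(
  "Trial: example progeny trial",
  "Assessed: 2012-03-01",
  "Units: height in m, dbh in mm",
  "block,plot,tree,height,dbh",
  "1,1,1,12.3,180",
  "1,1,2,11.8,172",
  "2,1,1,13.1,191"
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# skip = 3 ignores the metadata rows; the next row is read as the header
trial <- read.csv(path, skip = 3, header = TRUE)
str(trial)
```

The number passed to `skip` has to match the number of metadata rows in the actual file, so it helps to keep that number constant across data sets.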

People doing data analysis often start convulsing at the mention of Excel; personally, I deeply dislike it for analyses but it makes data entry very easy, and even a monkey can understand how to use it (I’ve seen them typing, I swear). The secret for sane use is to use Excel only for data entry; any data manipulation (subsetting, merging, derived variables, etc.) or analysis is done in statistical software (I use either R or SAS for general statistics, ASReml for quantitative genetics).
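As a rough sketch of what ‘manipulation happens outside the spreadsheet’ can look like, the following base R code does a derived variable, a merge and a subset. The data frames, column names and values are invented for illustration; in practice they would come from reading the exported files.

```r
# Hypothetical tree-level data, as it might arrive from data entry
trees <- data.frame(
  plot   = c(1, 1, 2, 2),
  tree   = c(1, 2, 1, 2),
  height = c(12.3, 11.8, 13.1, 9.4),  # m
  dbh    = c(180, 172, 191, 140)      # mm
)

# Hypothetical plot-level information kept in a separate sheet
plots <- data.frame(plot = c(1, 2), treatment = c("control", "thinned"))

# Derived variable: basal area in square metres from dbh in mm
trees$basal.area <- pi * (trees$dbh / 2000)^2

# Merging: attach the treatment to each tree using the plot number
trees <- merge(trees, plots, by = "plot")

# Subsetting: keep only trees taller than 10 m
tall <- subset(trees, height > 10)
tall
```

Because every step is code, the spreadsheet stays untouched as the raw record and the whole manipulation can be rerun whenever the data change.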

It is far from a perfect solution but it fits in the realm of the possible and, considering all my work responsibilities, it’s a reasonable use of my time. Would it be possible that someone makes a weird change in the spreadsheet? Yes. Could you fart while moving the mouse and create a non-obvious side effect? Yes, I guess so. Will it make your life easier, and make it possible to complete your research projects? Yes sir!

P.S. One could even save data using a text-based format (e.g. csv, tab-delimited) and use Excel only as a front-end for data entry. Other spreadsheets are of course equally useful.

P.S.2. Some of my data are machine-generated (e.g. by acoustic scanners and NIR spectroscopy) and get dumped by the machine in a separate text file for each sample (usually very wide; for example 2000 columns). I never put them in Excel, but read them directly (a directory-full of them) into R for manipulation and analysis.
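Reading a directory-full of such files could look something like the sketch below. It is an assumption-heavy illustration, not the actual workflow: the file names, the one-row-per-file layout and the 20 columns (instead of 2000) are invented, and the files are first created in a temporary directory so the code runs on its own.

```r
# Create a hypothetical directory with one wide text file per sample
scan.dir <- tempfile("scans")
dir.create(scan.dir)
set.seed(1)
for (id in c("s001", "s002", "s003")) {
  spectrum <- matrix(round(runif(20), 3), nrow = 1)
  write.table(spectrum, file.path(scan.dir, paste0(id, ".txt")),
              row.names = FALSE, col.names = FALSE)
}

# List every text file and read each one as a single row of values
files <- list.files(scan.dir, pattern = "\\.txt$", full.names = TRUE)
spectra <- do.call(rbind, lapply(files, function(f) {
  values <- scan(f, quiet = TRUE)
  # Keep the sample identifier (taken from the file name) next to the values
  data.frame(sample = sub("\\.txt$", "", basename(f)), t(values))
}))
dim(spectra)
```

The result has one row per sample and one column per reading, plus the identifier, which is a convenient shape for merging with the spreadsheet-entered data.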

As an interesting aside, the post A summary of the evidence that most published research is false provides a good summary for the need to freak out about repeatability.

Every so often I get bored writing about statistical analyses, software and torturing data and spend time in alternative creative endeavors: taking and processing pictures, writing short stories or exploring new research topics. The first is, mostly, covered in 500px, I keep the stories private and I’m just starting to play with bioacoustics.

While I’ve been away from this blog came the Google Reader debacle; Google announced that Reader will be canned mid-year, probably because they want to move everyone towards Google+. Let’s be straightforward, it is not the end of the world but a (relatively) minor annoyance. The main consequence is that this decision led me to reevaluate my relationship with Google services and the result is that I’m replacing most services, particularly those where what I consider private information is stored.

My work email (Luis.Apiolaza@canterbury.ac.nz) stayed the same while I moved my Google calendar back to my work’s Exchange server. I set up my personal email address in one of my domains, served by Zoho. There are no ads in this account. I opted for this to avoid worrying about maintaining email servers, spam filtering, etc. I’ll see how it works, but if it doesn’t, I will swap it for another service: my email address will stay the same.

I exported my RSS subscriptions from Google Reader and put them in Vienna. I tried some of the online alternatives, like Feedly, but didn’t like them.

I barely use Google Docs, so it won’t be a big deal to move away from them. I deleted my Google+ account, no big loss. I’m keeping my Gmail account for a little while as I transition my registration with various services. Nevertheless, the most difficult services to replace are Search, Maps and Scholar, which I’m now using without being logged in to Google. I’m testing Duck Duck Go for search (kind of OK), while I’m sticking to Maps and, particularly, to Scholar. Funnily enough I have access to Web of Science and Scopus (two well-known academic search services) through the university, yet I will often prefer to look in Scholar, which is easier to use, more responsive and has much better coverage of the literature, particularly of conferences and reports.

Google didn’t remove a small service. It did remove my confidence in their whole ecosystem.

Finding our way in the darkness (Photo: Luis, click to enlarge).

First go and read An R wish list for 2012. None of the wishes came true in 2012. Fix the R website? No, it is the same this year. In fact, it is the same as in 2005. Easy to find help? Sorry, next year. Consistency and sane defaults? Coming soon to a theater near you (one day). Thus my wish list for 2012 is, very handily, still the wish list for 2013.

## R as social software

The strength of R is not the software itself, but the community surrounding the software. Put another way, there are several languages that could offer the core functionality, but the whole ‘ecosystem’, that’s another thing. Softening @gappy3000’s comment: innovation is (mostly) happening outside the core.

This prompts some questions: Why isn’t ggplot2 or plyr in the default download? I don’t know if some people realize that ggplot2 is now one of the main attractions for R as data visualization language. Why isn’t Hadley’s name in this page? (Sorry I’m picking on him, first name that came to mind). How come there is not one woman in that page? I’m not saying there is an evil plan, but I’m wondering if (and how) the site and core reflect the R community and the diversity of interests (and uses). I’m also wondering what is the process to express these questions beyond a blog post. Perhaps in the developers email list?

I think that, in summary, my R wish for 2013 is that ‘The R project’—whoever that is—recognizes that the project is much more than the core download. I wish the list of contributors goes beyond the fairly small number of people with writing access to the source. I’d include those who write packages, those who explain, those who market and, yes, those who sell R. Finally, I wish all readers of Quantum Forest a great 2013.

Entry point to the R world. Same as ever.

P.S. Just in case, no, I’m not suggesting to be included in any list.

End-of-year posts are corny but, what the heck, I think I can let myself delve into corniness once a year. The following code gives a snapshot of what R was for me in 2012, and how.

So one can query this over-the-top structure with code like R.2012[[3]]$didnt.use.at.all to learn [1] "Emacs", but you already knew that, didn’t you? Despite all my complaints, monologuing about other languages and overall frustration, R has served me well. It’s just that I’d be disappointed if I were still using it a lot in ten years’ time.

Gratuitous picture: building blocks for research (Photo: Luis, click to enlarge).

Of course there was a lot more than R and stats this year. For example, the blogs I read most often have nothing to do with either topic: Isomorphismes (can’t define it), The music of sound (sound design), Offsetting behaviour (economics/politics in NZ). In fact, I need to read about a broad range of topics to feel human.

P.S. Incidentally, my favorite R function this year was subset(); I’ve been subsetting like there is no tomorrow. By the way, you are welcome to browse around the blog and subset whatever you like.

A post on high-dimensional arrays by @isomorphisms reminded me of APL and, more generally, of matrix languages, which took me back to inquisitive computing: computing not in the sense of software engineering, or databases, or formats, but of learning by poking problems through a computer.

I like languages not because I can get a job by using one, but because I can think thoughts and express ideas through them. The way we think about a problem is somehow molded by the tools we use: if we have loops, loops we use, and if we have a terse matrix notation (see my previous post on Matrix Algebra Useful for Statistics), we may use that.

I used APL fairly briefly but I was impressed by some superficial aspects (hey, that’s a weird set of characters that needs a keyboard overlay) and some deeper ones (this is an actual language, cool PDF paper). The APL revolution didn’t happen, at least not directly, but it had an influence over several other languages (including R).

Somehow as a group we took a different path from ‘Expository programming’, but I think that we have to recover at least part of that ethos, programming for understanding the world.

While many times I struggle with R frustrations, it is now my primary language for inquisitive computing, although sometimes I dive into something else. I like Mathematica, but can access it only while plugged in to the university network (license limits). Python is turning into a great scientific computing environment (although still with a feeling of sellotape holding it together), while J is like APL without the Klingon keyboard.

If anything, dealing with other ways of doing things leads to a better understanding of one’s primary language. Idioms that seem natural acquire a new sense of weirdness when compared to other languages. R’s basic functionality gives an excellent starting point for inquisitive computing, but don’t forget other languages that can enrich the way we look at problems.

I am curious about people’s favorite inquisitive languages.

Gratuitous picture: inquisition, why do bloody trees grow like this? (Photo: Luis, click to enlarge).

This post is tangential to R, although R has a fair share of the issues I mention here, which include research reproducibility, open source, paying for software, multiple languages, salt and pepper.

There is an increasing interest in the reproducibility of research. In many topics we face multiple, often conflicting claims and as researchers we value the ability to evaluate those claims, including repeating/reproducing research results. While I share the interest in reproducibility, sometimes I feel we are obsessing too much over only part of the research process: statistical analysis. Even here, many people focus not on the models per se, but only on the code for the analysis, which should only use tools that are free of charge.

There has been enormous progress in the R world on literate programming, where the combination of RStudio + Markdown + knitr has made analyzing data and documenting the process almost enjoyable. Nevertheless, and here is the BUT coming, there is a large difference between making the code repeatable and making research reproducible.

As an example, currently I am working on a project that relies on two trials, which have taken a decade to grow. We took a few hundred increment cores from a sample of trees and processed them using a densitometer, an X-ray diffractometer and a few other lab toys. By now you get the idea: actually replicating the research may take you quite a few resources before you even start to play with free software. At that point, of course, I want to be able to get the most out of my data, which means that I won’t settle for a half-assed model because the software is not able to fit it. If you think about it, spending a couple of grand on software (say ASReml and Mathematica licenses) doesn’t sound outrageous at all. Furthermore, reproducing this piece of research would require: a decade, access to genetic material and lab toys. I’ll give you the code for free, but I can’t give you ten years or $0.25 million…

In addition, the research process may require linking disparate sources of data for which other languages (e.g. Python) may be more appropriate. Some times R is the perfect tool for the job, while other times I feel like we have reached peak VBS (Visual Basic Syndrome) in R: people want to use it for everything, even when it’s a bad idea.

In summary,

• research is much more than a few lines of R (although they are very important),
• even when considering data collection and analysis it is a good idea to know more than a single language/software, because it broadens analytical options,
• I prefer free (freedom+beer) software for research; however, I rely on non-free, commercial software for part of my work because it happens to be the best option for specific analyses.

Disclaimer: my primary analysis language is R and I often use lme4, MCMCglmm and INLA (all free). However, many (if not most) of my analyses that use genetic information rely on ASReml (paid, not open source). I’ve used Mathematica, Matlab, Stata and SAS for specific applications with reasonably priced academic licenses.

Gratuitous picture: 3000 trees leaning in a foggy Christchurch day (Photo: Luis, click to enlarge).

(This post continues discussing issues I described back in January in Academic publication boycott)

Some weeks ago I received a couple of emails the same day: one asking me to submit a paper to an open access journal, while the other one was inviting me to be the editor of a ‘special issue’ of my choice for another journal. I hadn’t heard before of either of the two publications, which follow pretty much the same model: submit a paper for $600 and, if they like it, it will be published. However, the special issue email had this ‘buy your way in’ feeling: find ten contributors (i.e. $6,000) and you get to be an editor. Now, there is nothing wrong per se with open access journals; some of my favorite ones (e.g. PLoS ONE) follow that model. However, I was surprised by the increasing number of new journals that look at filling the gap for ‘I need to publish soon, somewhere’. Surprised until one remembers the incentives at play in academic environments.

If I, or most academics for that matter, want to apply for academic promotion I have to show that I’m a good guy that has a ‘teaching philosophy’ and that my work is good enough to get published in journals; hopefully in lots of them. The first part is a pain, but most people can write something along the lines ‘I’m passionate about teaching and enjoy creating a challenging environment for students…’ without puking. The second part is trickier because one has to really have the papers in actual journals.

Personally, I would be happier with only having the odd ‘formal’ publication. The first time (OK, few times) I saw my name in a properly typeset paper was very exciting, but it gets old after a while. These days, however, I would prefer to just upload my work to a website, saying here you have some ideas and code, play with it. If you like it great, if not well, next time I hope it’ll be better. Nevertheless, this doesn’t count as proper publication, because it isn’t peer reviewed, independently of the number of comments the post may get. PLoS ONE counts, but it’s still a journal and I (and many other researchers) work in many things that are too small for a paper, but cool enough to share. The problem: there is little or no credit for sharing so Quantum Forest is mostly a ‘labor of love’, which counts bugger all for anything else.

These days as a researcher I often learn more from other people’s blogs and quick idea exchanges (for example through Twitter) than via formal publication. I enjoy sharing analysis, ideas and code in this blog. So what’s the point of so many papers in so many journals? I guess that many times we are just ‘ticking the box’ for promotions purposes. In addition, the idea of facing referees’ or editors’ comments like ‘it would be a good idea that you cite the following papers…’ puts me off. And what about authorship arrangements? We have moved from papers with 2-3 authors to enough authors to have a football team (with reserves and everything). Some research groups also run arrangements where ‘I scratch your back (include you as a coauthor) and you scratch mine (include me in your papers)’. We break ideas into little pieces that count for many papers, etc.

Another related issue is the cost of publication (and the barriers it imposes on readership). You see, we referee papers for journals for free (as in for zero money) and tell ourselves that we are doing a professional service to uphold the high standards of whatever research branch we belong to. Then we spend a fortune from our library budget to subscribe to the same journals for which we reviewed the papers (for free, remember?). It is not a great deal, as many reasonable people have pointed out; I added a few comments in academic publication boycott.

So, what do we need? We need promotion committees to reduce the weight on publication. We need to move away from impact factor. We can and need to communicate in other ways: scientific papers will not go away, but their importance should be reduced.

Some times the way forward is unclear. Incense doesn’t hurt (Photo: Luis).

Making an effort to prepare interesting lectures doesn’t hurt either.

These days it is fairly common for editors to ‘suggest’ that we include additional references in our manuscripts, which just happen to be to papers in the same journal, hoping to inflate the impact factor of the journal. Referees tend to suggest their own papers (sometimes useful, many times not). Lame, isn’t it?

PS. 2012-10-19 15:27 NZST. You also have to remember that just because something was published does not mean it is actually correct: outrageously funny example (via Arthur Charpentier). Yep, through Twitter.