Archive for the ‘research’ Category

This post is tangential to R, although R has a fair share of the issues I mention here, which include research reproducibility, open source, paying for software, multiple languages, salt and pepper.

There is an increasing interest in the reproducibility of research. In many topics we face multiple, often conflicting claims and as researchers we value the ability to evaluate those claims, including repeating/reproducing research results. While I share the interest in reproducibility, sometimes I feel we are obsessing too much over only part of the research process: statistical analysis. Even here, many people focus not on the models per se, but only on the code for the analysis, which should only use tools that are free of charge.

There has been enormous progress in the R world on literate programming, where the combination of RStudio + Markdown + knitr has made analyzing data and documenting the process almost enjoyable. Nevertheless, and here is the BUT coming, there is a large difference between making the code repeatable and making research reproducible.

As an example, I am currently working on a project that relies on two trials, which have taken a decade to grow. We took a few hundred increment cores from a sample of trees and processed them using a densitometer, an X-ray diffractometer and a few other lab toys. By now you get the idea: actually replicating the research may take quite a few resources before you even start to play with free software. At that point, of course, I want to be able to get the most out of my data, which means that I won’t settle for a half-assed model because the software is not able to fit it. If you think about it, spending a couple of grand on software (say ASReml and Mathematica licenses) doesn’t sound outrageous at all. Furthermore, reproducing this piece of research would require a decade, access to genetic material and lab toys. I’ll give you the code for free, but I can’t give you ten years or $0.25 million…

In addition, the research process may require linking disparate sources of data for which other languages (e.g. Python) may be more appropriate. Sometimes R is the perfect tool for the job, while other times I feel like we have reached peak VBS (Visual Basic Syndrome) in R: people want to use it for everything, even when it’s a bad idea.

In summary,

• research is much more than a few lines of R (although they are very important),
• even when considering data collection and analysis it is a good idea to know more than a single language/software, because it broadens analytical options,
• I prefer free (freedom+beer) software for research; however, I rely on non-free, commercial software for part of my work because it happens to be the best option for specific analyses.

Disclaimer: my primary analysis language is R and I often use lme4, MCMCglmm and INLA (all free). However, many (if not most) of my analyses that use genetic information rely on ASReml (paid, not open source). I’ve used Mathematica, Matlab, Stata and SAS for specific applications with reasonably priced academic licenses.

Gratuitous picture: 3,000 trees leaning on a foggy Christchurch day (Photo: Luis).

(This post continues discussing issues I described back in January in Academic publication boycott)

Some weeks ago I received a couple of emails the same day: one asking me to submit a paper to an open access journal, while the other one was inviting me to be the editor of a ‘special issue’ of my choice for another journal. I hadn’t heard of either publication before, and both follow pretty much the same model: submit a paper for $600 and, if they like it, it will be published. However, the special issue email had this ‘buy your way in’ feeling: find ten contributors (i.e. $6,000) and you get to be an editor. Now, there is nothing wrong per se with open access journals; some of my favorite ones (e.g. PLoS ONE) follow that model. However, I was surprised by the increasing number of new journals that look at filling the gap for ‘I need to publish soon, somewhere’. Surprised until one remembers the incentives at play in academic environments.

If I, or most academics for that matter, want to apply for academic promotion I have to show that I’m a good guy who has a ‘teaching philosophy’ and that my work is good enough to get published in journals; hopefully in lots of them. The first part is a pain, but most people can write something along the lines of ‘I’m passionate about teaching and enjoy creating a challenging environment for students…’ without puking. The second part is trickier because one really has to have the papers in actual journals.

Personally, I would be happier with only having the odd ‘formal’ publication. The first time (OK, few times) I saw my name in a properly typeset paper was very exciting, but it gets old after a while. These days, however, I would prefer to just upload my work to a website, saying: here you have some ideas and code, play with them. If you like it, great; if not, well, next time I hope it’ll be better. Nevertheless, this doesn’t count as proper publication, because it isn’t peer reviewed, independently of the number of comments the post may get. PLoS ONE counts, but it’s still a journal and I (and many other researchers) work on many things that are too small for a paper, but cool enough to share. The problem: there is little or no credit for sharing, so Quantum Forest is mostly a ‘labor of love’, which counts bugger all for anything else.

These days as a researcher I often learn more from other people’s blogs and quick idea exchanges (for example through Twitter) than via formal publication. I enjoy sharing analysis, ideas and code in this blog. So what’s the point of so many papers in so many journals? I guess that many times we are just ‘ticking the box’ for promotion purposes. In addition, the idea of facing referees’ or editors’ comments like ‘it would be a good idea that you cite the following papers…’ puts me off. And what about authorship arrangements? We have moved from papers with 2-3 authors to enough authors to field a football team (with reserves and everything). Some research groups also run arrangements where ‘I scratch your back (include you as a coauthor) and you scratch mine (include me in your papers)’. We break ideas into little pieces that count for many papers, etc.

Another related issue is the cost of publication (and the barriers it imposes on readership). You see, we referee papers for journals for free (as in for zero money) and tell ourselves that we are doing a professional service to uphold the high standards of whatever research branch we belong to. Then we spend a fortune from our library budget to subscribe to the same journals for which we reviewed the papers (for free, remember?). It is not a great deal, as many reasonable people have pointed out; I added a few comments in academic publication boycott.

So, what do we need? We need promotion committees to reduce the weight on publication. We need to move away from impact factor. We can and need to communicate in other ways: scientific papers will not go away, but their importance should be reduced.

Sometimes the way forward is unclear. Incense doesn’t hurt (Photo: Luis).

Making an effort to prepare interesting lectures doesn’t hurt either.
These days it is fairly common for editors to ‘suggest’ including additional references in our manuscripts, which just happen to be to papers in the same journal, hoping to inflate the impact factor of the journal. Referees tend to suggest their own papers (sometimes useful, many times not). Lame, isn’t it?

PS. 2012-10-19 15:27 NZST. You also have to remember that just because something was published does not mean it is actually correct: outrageously funny example (via Arthur Charpentier). Yep, through Twitter.

The media in New Zealand briefly covered the destruction of a trial with genetically modified pines (Pinus radiata D. Don, common names radiata pine and Monterey pine) near Rotorua. This is not the first time that Luddites have destroyed a trial, ignoring that these trials have been established following regulations from the Environmental Protection Authority. Most people have discussed this pseudo-religious vandalism either from the wasted resources (money and, more importantly, time, delays in publication for scientists, etc.) or from the criminal activity point of view.

I will discuss something slightly different: when would we plant genetically modified trees?

Some background first

In New Zealand, plantations of forest trees are established by the private sector (mostly forest companies and small growers, usually farmers). Most of the stock planted in the country has some degree of (traditional) breeding, and it ranges from seed mixes with a large number of parents to the deployment of genetically identical clones. The higher the degree of improvement, the more likely it is that tree deployment involves a small number of highly selected genotypes. Overall, most tree plantations are based on open-pollinated seed with a modest degree of genetic improvement, which is much more genetically diverse than most agricultural crops. In contrast, agricultural crops tend to deploy named clonal varieties, which is what we buy in supermarkets: Gold kiwifruit, Gala apples, Nadine potatoes, etc.

Stating the obvious, tree and agricultural growers will pay more for genetic material if they have the expectation that the seeds, cuttings, tubers, etc are going to provide higher quantity and/or quality of products which will pay for the extra expense. Here we can see a big difference between people growing trees and annual/short rotation crops: there is a large lag between tree establishment and income coming from the trees, which means that when one runs a discounted cash flow analysis to estimate profitability:

1. Income is in the distant future (say 25-30 years) and is heavily discounted.
2. Establishment costs, which include buying the genetic material, are not discounted because they happen right now.

Unsurprisingly, growers want to reduce establishment costs as much as they can, and the cost of trees is an important component of those costs. This means that most people planting trees will go for cheaper trees with a low level of genetic improvement (often seedlings), unless they are convinced that they can recover the extra expense with more improved trees (usually clones, which cost at least double what seedlings do).
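As a rough illustration of the discounting argument, here is a minimal Python sketch. All figures (the 8% discount rate, the 28-year rotation, the establishment costs and the harvest incomes) are made-up assumptions chosen only to show the mechanics; they are not data from any grower.

```python
def present_value(amount, rate, years):
    """Discount an amount received `years` from now back to today."""
    return amount / (1 + rate) ** years


def net_present_value(establishment_cost, harvest_income, rate, rotation_years):
    """Costs are paid today (undiscounted); income arrives at harvest."""
    return present_value(harvest_income, rate, rotation_years) - establishment_cost


# Hypothetical per-hectare figures (assumptions, not real data).
RATE = 0.08      # annual discount rate
ROTATION = 28    # years between planting and harvest

# Clones cost double at planting and, in this example, return 15% more at harvest.
seedlings = net_present_value(1000.0, 40000.0, RATE, ROTATION)
clones = net_present_value(2000.0, 46000.0, RATE, ROTATION)

# Extra harvest income needed to repay each extra dollar spent at planting.
breakeven_multiplier = (1 + RATE) ** ROTATION

print(f"NPV with seedlings: {seedlings:,.0f}")
print(f"NPV with clones:    {clones:,.0f}")
print(f"Each extra $1 at planting must return ${breakeven_multiplier:.2f} at harvest")
```

Under these assumed numbers the clones end up with the lower net present value even though they produce more income, because every extra dollar spent at establishment has to be repaid several times over at harvest.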

What’s the relationship with genetic modification?

Modification of any organism is an expensive process, which means that:

1. One would only modify individuals with an outstanding genetic background; i.e. start with a good genotype to end up with a great one.
2. Successful modifications will be clonally propagated to scale up the modification, driving down unit cost.

Thus, we have a combination of very good genotypes plus clonal propagation plus no discounting, which would make establishment costs very high (although not impossible). There is a second element that, at least for now, would delay adoption. Most large forest growers will have some type of product certification, which establishes that the grower is using good forestry, environmental and social practices. Think of it as a sticker that says the producer of this piece of wood is a good guy, so please feel confident about buying this product; that is, this sticker is part of a marketing strategy. Currently some forest certification organizations do not accept the use of genetically modified organisms (e.g. Forest Stewardship Council, PDF of GMO policy).

This does not mean that it is not financially possible to plant genetically modified trees. For one, modification costs would fall with economies of scale (as for most biotechnologies), and one of the reasons we don’t have these economies is the political pressure by almost-religious zealots against GMO, which makes people scared about being first to plant GM trees/plants. Another option is to change the GMO policy of some certification agencies or to rely on other certification organizations that do accept GMOs. Each individual forest company would have to evaluate the trade-offs of the certification decision, as they do not work as a block.

A simple scenario

Roughly 80% of the forest plantations in New Zealand are radiata pine. Now imagine that we face a very destructive pest or disease that has the potential to severely damage the survival/growth of the trees. I know that it would take us a long time (decades?) to breed trees resistant to this problem. I also know that the GM crowd could insert several disease resistance genes and silence flowering, so we don’t have reproduction of modified trees. Would you support the use of genetic modification to save one of the largest industries of the country? I would.

However, before using the technology I would like to have access to data from trials growing in New Zealand conditions. The destruction of trials makes it extremely difficult to make informed decisions and this is the worst crime. These people are not just destroying trees but damaging our ability to properly make decisions as a society, evaluating the pros and cons of our activities.

P.S. These are just my personal musings about the subject and do not represent the views of the forest companies, the university or anyone else. I do not work on genetic modification, but I am a quantitative geneticist & tree breeder.
P.S.2. While I do not work on genetic modification—so I’d struggle to call that crowd ‘colleagues’—I support researchers on that topic in their effort to properly evaluate the performance of genetically modified trees.

The last few weeks there has been a number of researchers calling for, or supporting, a boycott against Elsevier; for example, Scientific Community to Elsevier: Drop Dead, Elsevier—my part in its downfall or, more general, Should you boycott academic publishers?

What metrics are used to compare Elsevier to other publishers? It is common to refer to cost-per-article; for example, in my area Forest Ecology and Management (one of the most popular general Forestry Journals) charges USD 31.50 per article but Tree Genetics and Genomes (published by Springer Verlag) costs EUR 34.95 (roughly USD 46). Nevertheless, researchers affiliated to universities or research institutes rarely pay per article; instead, our libraries have institution-wide subscriptions. Before the great consolidation drive we would have access to individual journal subscription prices (sometimes reaching thousands of dollars per year, each of them). Now libraries buy bundles from a given publisher (e.g. Elsevier, Springer, Blackwell, Wiley, etc) so it is very hard to get a feeling of the actual cost of a single journal. With this consideration, I am not sure if Elsevier ‘deserves’ being singled out in this mess; at least not any more than Springer or Blackwell, or… a number of other publishers.

Elsevier? No, just Gaahl Gorgoroth

What we do know is that most of the work is done and paid for by scientists (and society in general) rather than journals. Researchers do research and our salaries and research expenses are many times paid for (at least partially if not completely) by public funding. We also act as referees for publications and a subset of us are part of editorial boards of journals. We do use some journal facilities; for example, an electronic submission system (for which there are free alternatives) and someone will ‘produce’ the papers in electronic format, which would be a small(ish) problem if everyone used LaTeX.

If we go back some years, many scientific societies used to run their own journals (many times scraping by or directly running them at a loss). Then big publishers came ‘to the rescue’ offering economies of scale and an opportunity to make a buck. There is nothing wrong with the existence of publishers facilitating the publication process; but when combined with the distortions in the publication process (see below) publishers have achieved tremendous power. At the same time, publishers have hiked prices and moved a large part of their operations to cheaper countries (e.g. India, Indonesia, etc) leaving us researchers struggling to pay for the subscriptions to read our own work. Not only that, but copyright restrictions in many journals do not allow us to make our work available to the people who paid for the research: you, the tax payer.

Today scientific societies could run their own journals and completely drop the printed version, so we could have cheaper journals while societies wouldn’t go belly up moving paper across continents. Some questions: would scientific societies be willing to change? If that’s the case, could they change their contractual arrangements with publishers?

Why do we play the game?

The most important part of the problem is that we (the researchers) are willing to participate in the publication process with the current set of rules. Why do we do it? At the end of the day, many of us play the journal publication game because it has been subverted from dissemination of important research results to signaling researcher value. University and research institute managers need to have a way to evaluate their researchers, managing tenures, promotions, etc. Rather than going for actually doing a proper evaluation (difficult, expensive and subjective), they go for an easy one (subjective as well): number of publications in ‘good’ journals. If I want to get promoted or taken seriously in funding applications I have to publish in journals.

I think it is easy to see that I enjoy openly communicating what I have learned (for example in this blog and on my main site). I would rather spend more time doing this than writing ‘proper’ papers, but of course this is rarely considered important in my evaluations.

If you already are a top-of-the-scale, tenured professor it is very easy to say ‘I don’t want to play the game anymore’. If you are a newcomer to the game, trying to establish yourself in these times of PhD gluts and very few available research positions, all incentives line up to play the game.

This is only part of the problem

The questioning does not stop at the publication process. The value of the peer review process is also under scrutiny. Then we enter into open science: beyond having access to publications, how much can we trust the results? We have discussions on open access data even when it is in closed journals. And on, and on.

We have moved from a situation of scarcity, where publishing was expensive, the tools to analyze our data were expensive and making data available was painfully difficult to a time when all that is trivially easy. I can collect some data, upload it to my site, rely on the democratization of statistics, write it up and create a PDF or HTML version by pressing a button. We would like to have feedback: relatively easy if the publication is interesting. We want an idea of reliability or trust: we could have, for example, some within-organization peer reviewing. Remember though that peer reviewing is not a panacea. We want to have an idea of community standing, which would be the number of people referring to that document (paper, blog post, wiki, whatever).

Maybe the most important thing is that we are trying to carry on with ‘traditional’ practices that do not extend beyond, say, 100 years. We do not need to do so if we are open to a more fluid environment in publication, analytics and data sharing. Better, we wouldn’t need to continue if we stopped putting so much weight on traditional publication avenues when evaluating researchers.

Is Elsevier evil? I don’t think so; or, at least, it doesn’t seem to be significantly worse than other publishers. Have we vested too much power in Elsevier and other publishers? You bet! At the very least we should get back to saner copyright practices, where the authors retain copyright and provide a non-exclusive license to the publishers. Publishers will still make money but everyone will be able to freely access our research results because, you know, they already pay for the research.

Disclaimer: I have published in journals managed by Elsevier and Springer. I currently have articles under review for both publishers.

P.S. Gaahl Gorgoroth image from Wikipedia.

P.S.2 The cost of knowledge is keeping track of academics taking a stand against Elsevier; 1503 of them at 12:32 NZST 2012-01-30. HT: Arthur Charpentier.

P.S.3 2012-01-31 NZST I would love to know what other big publishers are thinking.

P.S.4 2012-02-01 NZST Research Works Act: are you kidding me?

The Research Works Act (RWA) bill (H.R.3699) introduced to the US Congress on 16 December 2011 proposes that:

No Federal agency may adopt, implement, maintain, continue, or otherwise engage in any policy, program, or other activity that–

(1) causes, permits, or authorizes network dissemination of any private-sector research work without the prior consent of the publisher of such work; or

(2) requires that any actual or prospective author, or the employer of such an actual or prospective author, assent to network dissemination of a private-sector research work.

The idea of calling researchers’ work funded by government, edited by their peers (probably at least partially funded by government funds) private-sector research work because a publishing company applied whatever document template they use on top of the original manuscript is obscene. By the way, Richard Poynder has a post that lists a number of publishers that have publicly disavowed the RWA.

P.S.5 2012-02-02 16:38 NZST Doron Zeilberger points to the obvious corollary: we don’t need journals for research dissemination anymore (although we still do for signaling). Therefore if one is keen on boycotts they should affect all publishers. Academics are stuck with last century’s publication model.

P.S.6 2012-10-19 15:18 NZST I have some comments on publication incentives.