## Quantum Forest

### notes in a shoebox

In a previous post I summarily described our options for (generalized to varying degrees) linear mixed models from a frequentist point of view: nlme, order lme4 and ASReml-R, info followed by a quick example for a split-plot experiment.

But who is really happy with a toy example? We can show a slightly more complicated example assuming that we have a simple situation in breeding: a number of half-sib trials (so we have progeny that share one parent in common), each of them established following a randomized complete block design, analyzed using a ‘family model’. That is, the response variable (dbh: tree stem diameter assessed at breast height—1.3m from ground level) can be expressed as a function of an overall mean, fixed site effects, random block effects (within each site), random family effects and a random site-family interaction. The latter provides an indication of genotype by environment interaction.

First I should make clear that this model has several drawbacks:

• It assumes that we have homogeneous variances for all trials, as well as identical correlation between trials. That is, that we are assuming compound symmetry, which is very rarely the case in multi-environment progeny trials.
• It also assumes that all families are unrelated to each other, which in this case I know it is not true, as a quick look at the pedigree will show that several mothers are related.
• We assume only one type of families (half-sibs), which is true just because I got rid of all the full-sib families and clonal material, to keep things simple in this example.

Just to make the point, a quick look at the data using ggplot2—with some jittering and alpha transparency to better display a large number of points—shows differences in both mean and variance among sites.

Anyway, let’s keep on going with this simplistic model. In my computer, a two year old iMac, ASReml-R takes roughly 9 seconds, while lme4 takes around 65 seconds, obtaining similar results.

The residual matrix of this model is a, fairly large, diagonal matrix (residual variance times an identity matrix). At this point we can relax this assumption, adding a bit more complexity to the model so we can highlight some syntax. Residuals in one trial should have nothing to do with residuals in another trial, which could be hundreds of kilometers away. I will then allow for a new matrix of residuals, which is the direct sum of trial-specific diagonal matrices. In ASReml-R we can do so by specifying a diagonal matrix at each trial with `rcov = ~ at(Trial):units`:

There is a big improvement of log-Likelihood from m2 (-744452.1) to m2b (-738011.6) for 30 additional variances. At this stage, we can also start thinking of heterogeneous variances for blocks, with a small change to the code:

At this point we could do extra modeling at the site level by including any other experimental design features, allowing for spatially correlated residuals, etc. I will cover some of these issues in future posts, as this one is getting a bit too long. However, we could gain even more by expressing the problem in a multivariate fashion, where the performance of Mothers in each trial would be considered a different trait. This would push us towards some large problems, which require modeling covariance structures so we have a change of achieving convergence during our lifetime.

Disclaimer: I am emphasizing the use of ASReml-R because 1.- I am most familiar with it than with nlme or lme4 (although I tend to use nlme for small projects), and 2.- There are plenty of examples for the other packages but not many for ASReml in the R world. I know some of the people involved in the development of ASReml-R but otherwise I have no commercial links with VSN (the guys that market ASReml and Genstat).

There are other packages that I have not had the chance to investigate yet, like hglm.

1. Kevin Wright

2011/10/19 at 8:48 am

Luis, why the "disclaimer"? Why not a "proclaimer" along the lines of:

ASREML is the only software that can fit mixed models to _large_ data with _complex variance_ structures.

Maybe lme4 or SAS' HPMIXED will support complex variance structures someday, but not yet as far as I know.

2. Luis

2011/10/19 at 6:12 pm

Hi Kevin,

I do agree with your opinion of ASReml, but I do not want to be pedantic about that. In addition, there are interesting parts of other packages (e.g. MCMCglmmm, INLA and lme4) that I really appreciate. However, I have not come across yet with a package more powerful than ASReml-R for the type of work we do.