Bivariate linear mixed models using ASReml-R with multiple cores

A while ago I wanted to run a quantitative genetic analysis where the performance of genotypes in each site was considered as a different trait. If you think about it, with 70 sites and thousands of genotypes one is trying to fit a 70×70 additive genetic covariance matrix, which requires 70*69/2 = 2,415 covariance components. Besides requiring huge amounts of memory and being subject to all sort of estimation problems there were all sort of connectedness issues that precluded the use of Factor Analytic models to model the covariance matrix. The best next thing was to run over 2,000 bivariate analyses to build a large genetic correlation matrix (which has all sort of issues, I know). This meant leaving the computer running for over a week.

In another unrelated post, Kevin asked if I have ever considered using ASReml-R to run in parallel using a computer with multiple cores. Short answer, I haven’t, but there is always a first time.

require(asreml) # Multivariate mixed models
require(doMC)   # To access multiple cores

registerDoMC()  # to start a "parallel backend"

# Read basic density data
setwd('~/Dropbox/quantumforest')
# Phenotypes
dat = read.table('density09.txt', header = TRUE)
dat$FCLN = factor(dat$FCLN)
dat$MCLN = factor(dat$MCLN)
dat$REP = factor(dat$REP)
dat$SETS = factor(dat$SETS)
dat$FAM = factor(dat$FAM)
dat$CP = factor(dat$CP)
dat$Tree_id = factor(dat$Tree_id)
head(dat)

# Pedigree
ped = read.table('pedigree.txt', header = TRUE)
names(ped)[1] = 'Tree_id'
ped$Tree_id = factor(ped$Tree_id)
ped$Mother = factor(ped$Mother)
ped$Father = factor(ped$Father)

# Inverse of the numerator relationship matrix
Ainv = asreml.Ainverse(ped)$ginv

# Wrap call to a generic bivariate analysis
# (This one uses the same model for all sites
#  but it doesn't need to be so)
bivar = function(trial1, trial2) {
  t2 = dat[dat$Trial_id == trial1 | dat$Trial_id == trial2,]
  t2$den1 = ifelse(t2$Trial_id == trial1, t2$DEN, NA)
  t2$den2 = ifelse(t2$Trial_id == trial2, t2$DEN, NA)
  t2$Trial_id = t2$Trial_id[drop = TRUE]

  # Bivariate run
  # Fits overall mean, random correlated additive effects with
  # heterogeneous variances and diagonal matrix for heterogeneous
  # residuals
  den.bi =  asreml(cbind(den1,den2) ~ trait,
                   random = ~ ped(Tree_id):corgh(trait),
                   data = t2,
                   ginverse = list(Tree_id = Ainv),
                   rcov = ~ units:diag(trait),
                   workspace=64e06, maxiter=30)

  # Returns only log-Likelihood for this example
  return(summary(den.bi)$loglik)
}

# Run the bivariate example in parallel for two pairs of sites
# FR38_1 with FR38_2, FR38_1 with FR38_3
foreach(trial1=rep("FR38_1",2), trial2=c("FR38_2", "FR38_3")) %dopar%
        bivar(trial1, trial2)

The code runs in my laptop (only two cores) but I still have to test its performance in my desktop (4 cores) and see if it really makes a difference in time versus running the analyses sequentially. Initially my mistake was using the multicore package (require(multicore)) directly, which doesn’t start the ‘parallel backend’ and ran sequentially. Using require(doMC) loads multicore but takes care of starting the ‘parallel backend’.

Gratuitous picture: Trees at 8 pm illuminated by bonfire and full moon (Photo: Luis).

P.S. Thanks to Etienne Laliberté, who sent me an email in early 2010 pointing out that I had to use doMC. One of the few exceptions in the pile of useless emails I have archived.
P.S.2. If you want to learn more about this type of models I recommend two books: Mrode’s Linear Models for the Prediction of Animal Breeding Values, which covers multivariate evaluation with lots of gory details, and Lynch and Walsh’s Genetics and Analysis of Quantitative Traits, which is the closest thing to the bible in quantitative genetics.
P.S.3. 2012-05-08. I did run the code in my desktop (iMac with 4 cores), making a couple of small changes in the code to have a fairer comparison between using %dopar% (for parallel code) and %par% (for sequential execution).

  1. Added trace = FALSE to the asreml() call, to avoid printing partial iteration details to the screen, which happened when using %do%.
  2. Used 4 pairs of sites, so I could utilize all cores in my desktop.

Execution time for the parallel version—measured with Sys.time()—was, roughly, 1/(number of cores) the time required by the sequential version.

On the (statistical) road, workshops and R

Things have been a bit quiet at Quantum Forest during the last ten days. Last Monday (Sunday for most readers) I flew to Australia to attend a couple of one-day workshops; one on spatial analysis (in Sydney) and another one on modern applications of linear mixed models (in Wollongong). This will be followed by attending The International Biometric Society Australasian Region Conference in Kiama.

I would like to comment on the workshops to look for commonalities and differences. First, both workshops heavily relied on R, supporting the idea that if you want to reach a lot of people and get them using your ideas, R is pretty much the vehicle to do so. It is almost trivial to get people to install R and RStudio before the workshop so they are ready to go. “Almost” because you have to count on someone having a bizarre software configuration or draconian security policies for their computer.

The workshop on Spatial Analysis also used WinBUGS, which with all respect to the developers, is a clunky piece of software. Calling it from R or using JAGS from R seems to me a much more sensible way of using a Bayesian approach while maintaining access to the full power of R. The workshop on linear mixed models relied on asreml-R; if you haven’t tried it, please give it a go (it is a free license for academic/research use). There were applications on multi-stage experiments, composite samples and high dimensional data (molecular information). In addition, there was an initial session on optimal design of experiments.

In my opinion, the second workshop (modern applications…) was much more successful than the first one (spatial analysis…) for a few reasons:

  • One has to limit the material to cover in a one-day workshop; if you want to cover a lot consider three days so people can digest all the material.
  • One has to avoid the “split-personality” approach to presentations; having very-basic and super-hard but nothing in the middle is not a great idea (IMHO). Pick a starting point and gradually move people from there.
  • Limit the introduction of new software. One software per day seems to be a good rule of thumb.

Something bizarre (for me, at least) was the difference between the audiences. In Sydney the crowd was a lot younger, with many trainees in biostatistics coming mostly from health research. They had little exposure to R and seemed to come from a mostly SAS shop. The crowd in Wollongong had people with a lot of experience (OK, oldish) both in statistics and R. I was expecting young people to be more conversant in R.

Tomorrow we will drive down to Kiama, sort out registration and then go to the welcome BBQ. Funny thing is that this is my first statistics conference; as I mentioned in the About page of this blog, I am a simple forester. :-)

Surviving a binomial mixed model

A few years ago we had this really cool idea: we had to establish a trial to understand wood quality in context. Sort of following the saying “we don’t know who discovered water, but we are sure that it wasn’t a fish” (attributed to Marshall McLuhan). By now you are thinking WTF is this guy talking about? But the idea was simple; let’s put a trial that had the species we wanted to study (Pinus radiata, a gymnosperm) and an angiosperm (Eucalyptus nitens if you wish to know) to provide the contrast, as they are supposed to have vastly different types of wood. From space the trial looked like this:

The reason you can clearly see the pines but not the eucalypts is because the latter were dying like crazy over a summer drought (45% mortality in one month). And here we get to the analytical part: we will have a look only at the eucalypts where the response variable can’t get any clearer, trees were either totally dead or alive. The experiment followed a randomized complete block design, with 50 open-pollinated families in 48 blocks. The original idea was to harvest 12 blocks each year but—for obvious reasons—we canned this part of the experiment after the first year.

The following code shows the analysis in asreml-R, lme4 and MCMCglmm:

load('~/Dropbox/euc.Rdata')

library(asreml)
sasreml = asreml(surv ~ 1, random = ~ Fami + Block,
                 data = euc,
                 family = asreml.binomial(link = 'logit'))
summary(sasreml)$varcomp

#                      gamma component  std.error  z.ratio
#Fami!Fami.var     0.5704205 0.5704205 0.14348068 3.975591
#Block!Block.var   0.1298339 0.1298339 0.04893254 2.653324
#R!variance        1.0000000 1.0000000         NA       NA

#                 constraint
#Fami!Fami.var      Positive
#Block!Block.var    Positive
#R!variance            Fixed

# Quick look at heritability
varFami = summary(sasreml)$varcomp[1, 2]
varRep = summary(sasreml)$varcomp[2, 2]
h2 = 4*varFami/(varFami + varRep + 3.29)
h2
#[1] 0.5718137

library(lme4)
slme4 = lmer(surv ~ 1 + (1|Fami) + (1|Block),
             data = euc,
             family = binomial(link = 'logit'))

summary(slme4)

#Generalized linear mixed model fit by the Laplace approximation
#Formula: surv ~ 1 + (1 | Fami) + (1 | Block)
#   Data: euc
#  AIC  BIC logLik deviance
# 2725 2742  -1360     2719
#Random effects:
# Groups   Name        Variance Std.Dev.
# Fami     (Intercept) 0.60941  0.78065
# Block    (Intercept) 0.13796  0.37143
#Number of obs: 2090, groups: Fami, 51; Block, 48
#
#Fixed effects:
#            Estimate Std. Error z value Pr(>|z|)
#(Intercept)   0.2970     0.1315   2.259   0.0239 *

# Quick look at heritability
varFami = VarCorr(slme4)$Fami[1]
varRep = VarCorr(slme4)$Block[1]
h2 = 4*varFami/(varFami + varRep + 3.29)
h2
#[1] 0.6037697

# And let's play to be Bayesians!
library(MCMCglmm)
pr = list(R = list(V = 1, n = 0, fix = 1),
          G = list(G1 = list(V = 1, n = 0.002),
                   G2 = list(V = 1, n = 0.002)))

sb <- MCMCglmm(surv ~ 1,
               random = ~ Fami + Block,
               family = 'categorical',
               data = euc,
               prior = pr,
               verbose = FALSE,
               pr = TRUE,
               burnin = 10000,
               nitt = 100000,
               thin = 10)

plot(sb$VCV)

You may be wondering Where does the 3.29 in the heritability formula comes from? Well, that’s the variance of the link function that, in the case of the logit link is pi*pi/3. In the case of MCMCglmm we can estimate the degree of genetic control quite easily, remembering that we have half-siblings (open-pollinated plants):

# Heritability
h2 = 4*sb$VCV[, 'Fami']/(sb$VCV[, 'Fami'] +
                         sb$VCV[, 'Block'] + 3.29 + 1)
posterior.mode(h2)
#     var1
#0.6476185 

HPDinterval(h2)
#         lower     upper
#var1 0.4056492 0.9698148
#attr(,"Probability")
#[1] 0.95

plot(h2)

By the way, it is good to remember that we need to back-transform the estimated effects to probabilities, with very simple code:

# Getting mode and credible interval for solutions
inv.logit(posterior.mode(sb$Sol))
inv.logit(HPDinterval(sb$Sol, 0.95))

Even if one of your trials is trashed there is a silver lining: it is possible to have a look at survival.

Coming out of the (Bayesian) closet: multivariate version

This week I’m facing my—and many other lecturers’—least favorite part of teaching: grading exams. In a supreme act of procrastination I will continue the previous post, and the antepenultimate one, showing the code for a bivariate analysis of a randomized complete block design.

Just to recap, the results from the REML multivariate analysis (that used ASReml-R) was the following:

library(asreml)

m4 = asreml(cbind(bden, veloc) ~ trait,
            random = ~ us(trait):Block +  us(trait):Family, data = a,
            rcov = ~ units:us(trait))

summary(m4)$varcomp

#                                      gamma    component    std.error
#trait:Block!trait.bden:bden    1.628812e+02 1.628812e+02 7.854123e+01
#trait:Block!trait.veloc:bden   1.960789e-01 1.960789e-01 2.273473e-01
#trait:Block!trait.veloc:veloc  2.185595e-03 2.185595e-03 1.205128e-03
#trait:Family!trait.bden:bden   8.248391e+01 8.248391e+01 2.932427e+01
#trait:Family!trait.veloc:bden  1.594152e-01 1.594152e-01 1.138992e-01
#trait:Family!trait.veloc:veloc 2.264225e-03 2.264225e-03 8.188618e-04
#R!variance                     1.000000e+00 1.000000e+00           NA
#R!trait.bden:bden              5.460010e+02 5.460010e+02 3.712833e+01
#R!trait.veloc:bden             6.028132e-01 6.028132e-01 1.387624e-01
#R!trait.veloc:veloc            1.710482e-02 1.710482e-02 9.820673e-04
#                                  z.ratio constraint
#trait:Block!trait.bden:bden     2.0738303   Positive
#trait:Block!trait.veloc:bden    0.8624639   Positive
#trait:Block!trait.veloc:veloc   1.8135789   Positive
#trait:Family!trait.bden:bden    2.8128203   Positive
#trait:Family!trait.veloc:bden   1.3996166   Positive
#trait:Family!trait.veloc:veloc  2.7650886   Positive
#R!variance                             NA      Fixed
#R!trait.bden:bden              14.7057812   Positive
#R!trait.veloc:bden              4.3442117   Positive
#R!trait.veloc:veloc            17.4171524   Positive

The corresponding MCMCglmm code is not that different from ASReml-R, after which it is modeled anyway. Following the recommendations of the MCMCglmm Course Notes (included with the package), the priors have been expanded to diagonal matrices with degree of belief equal to the number of traits. The general intercept is dropped (-1) so the trait keyword represents trait means. We are fitting unstructured (us(trait)) covariance matrices for both Block and Family, as well as an unstructured covariance matrix for the residuals. Finally, both traits follow a gaussian distribution:

library(MCMCglmm)
bp = list(R = list(V = diag(c(0.007, 260)), n = 2),
          G = list(G1 = list(V = diag(c(0.007, 260)), n = 2),
                   G2 = list(V = diag(c(0.007, 260)), n = 2)))

bmod = MCMCglmm(cbind(veloc, bden) ~ trait - 1,
                random = ~ us(trait):Block + us(trait):Family,
                rcov = ~ us(trait):units,
                family = c('gaussian', 'gaussian'),
                data = a,
                prior = bp,
                verbose = FALSE,
                pr = TRUE,
                burnin = 10000,
                nitt = 20000,
                thin = 10)

Further manipulation of the posterior distributions requires having an idea of the names used to store the results. Following that, we can build an estimate of the genetic correlation between the traits (Family covariance between traits divided by the square root of the product of the Family variances). Incidentally, it wouldn’t be a bad idea to run a much longer chain for this model, so the plot of the posterior for the correlation looks better, but I’m short of time:

dimnames(bmod$VCV)

rg = bmod$VCV[,'veloc:bden.Family']/sqrt(bmod$VCV[,'veloc:veloc.Family'] *
                                  bmod$VCV[,'bden:bden.Family'])

plot(rg)

posterior.mode(rg)
#     var1
#0.2221953 

HPDinterval(rg, prob = 0.95)
#         lower     upper
#var1 -0.132996 0.5764006
#attr(,"Probability")
#[1] 0.95

And that’s it! Time to congratulate Jarrod Hadfield for developing this package.

Coming out of the (Bayesian) closet

Until today all the posts in this blog have used a frequentist view of the world. I have a confession to make: I have an ecumenical view of statistics and I do sometimes use Bayesian approaches in data analyses. This is not quite one of those “the truth will set you free” moments, but I’ll show that one could almost painlessly repeat some of the analyses I presented before using MCMC.

MCMCglmm is a very neat package that—as its rather complicated em cee em cee gee el em em acronym implies—implements MCMC for generalized linear mixed models. We’ll skip that the frequentist fixed vs random effects distinction gets blurry in Bayesian models and still use the f… and r… terms. I’ll first repeat the code for a Randomized Complete Block design with Family effects (so we have two random factors) using both lme4 and ASReml-R and add the MCMCglmm counterpart:

library(lme4)
m1 = lmer(bden ~ 1 + (1|Block) + (1|Family),
                 data = trees)

summary(m1)

#Linear mixed model fit by REML
#Formula: bden ~ (1 | Block) + (1 | Family)
#   Data: a
#  AIC  BIC logLik deviance REMLdev
# 4572 4589  -2282     4569    4564
#Random effects:
# Groups   Name        Variance Std.Dev.
# Family   (Intercept)  82.803   9.0996
# Block    (Intercept) 162.743  12.7571
# Residual             545.980  23.3662
#Number of obs: 492, groups: Family, 50; Block, 11

#Fixed effects:
#            Estimate Std. Error t value
#(Intercept)  306.306      4.197   72.97

library(asreml)
m2 = asreml(bden ~ 1, random = ~ Block + Family,
                      data = trees)

summary(m2)$varcomp
#                      gamma component std.error   z.ratio
#Block!Block.var   0.2980766 162.74383  78.49271  2.073362
#Family!Family.var 0.1516591  82.80282  29.47153  2.809587
#R!variance        1.0000000 545.97983  37.18323 14.683496

#                 constraint
#Block!Block.var    Positive
#Family!Family.var  Positive
#R!variance         Positive

m2$coeff$fixed
#(Intercept)
#    306.306

We had already established that the results obtained from lme4 and ASReml-R were pretty much the same, at least for relatively simple models where we can use both packages (as their functionality diverges later for more complex models). This example is no exception and we quickly move to fitting the same model using MCMCglmm:

library(MCMCglmm)
priors = list(R = list(V = 260, n = 0.002),
              G = list(G1 = list(V = 260, n = 0.002),
                       G2 = list(V = 260, n = 0.002)))

m4 = MCMCglmm(bden ~ 1,
              random = ~ Block + Family,
              family = 'gaussian',
              data = a,
              prior = priors,
              verbose = FALSE,
              pr = TRUE,
              burnin = 10000,
              nitt = 20000,
              thin = 10)

plot(mcmc.list(m4$VCV))

autocorr(m4$VCV)

posterior.mode(m4$VCV)
#    Block    Family     units
#126.66633  72.97771 542.42237 

HPDinterval(m4$VCV)
#           lower    upper
#Block   33.12823 431.0233
#Family  26.34490 146.6648
#units  479.24201 627.7724

The first difference is that we have to specify priors for the coefficients that we would like to estimate (by default fixed effects, the overall intercept for our example, start with a zero mean and very large variance: 106). The phenotypic variance for our response is around 780, which I split into equal parts for Block, Family and Residuals. For each random effect we have provided our prior for the variance (V) and a degree of belief on our prior (n).

In addition to the model equation, name of the data set and prior information we need to specify things like the number of iterations in the chain (nitt), how many we are discarding for the initial burnin period (burnin), and how many values we are keeping (thin, every ten). Besides the pretty plot of the posterior distributions (see previous figure) they can be summarized using the posterior mode and high probability densities.

One of the neat things we can do is to painlessly create functions of variance components and get their posterior mode and credible interval. For example, the heritability (or degree of additive genetic control) can be estimated in this trial with full-sib families using the following formula:

\hat{h^2} = \frac{2 \sigma_F^2}{\sigma_F^2 + \sigma_B^2 + \sigma_R^2}

h2 <- 2 * m4$VCV[, 'Family']/(m4$VCV[, 'Family'] +
          m4$VCV[, 'Block'] + m4$VCV[, 'units'])
plot(h2)

posterior.mode(h2)
#0.1887017

HPDinterval(h2)
#          lower     upper
#var1 0.05951232 0.3414216
#attr(,"Probability")
#[1] 0.95

There are some differences on the final results between ASReml-R/lme4 and MCMCglmm; however, the gammas (ratios of variance component/error variance) for the posterior distributions are very similar, and the estimated heritabilities are almost identical (~0.19 vs ~0.21). Overall, MCMCglmm is a very interesting package that covers a lot of ground. It pays to read the reference manual and vignettes, which go into a lot of detail on how to specify different types of models. I will present the MCMCglmm bivariate analyses in another post.

P.S. There are several other ways that we could fit this model in R using a Bayesian approach: it is possible to call WinBugs or JAGS (in Linux and OS X) from R, or we could have used INLA. More on this in future posts.

Multivariate linear mixed models: livin’ la vida loca

I swear there was a point in writing an introduction to covariance structures: now we can start joining all sort of analyses using very similar notation. In a previous post I described simple (even simplistic) models for a single response variable (or ‘trait’ in quantitative geneticist speak). The R code in three R packages (asreml-R, lme4 and nlme) was quite similar and we were happy-clappy with the consistency of results across packages. The curse of the analyst/statistician/guy who dabbles in analyses is the idea that we can always fit a nicer model and—as both Bob the Builder and Obama like to say—yes, we can.

Let’s assume that we have a single trial where we have assessed our experimental units (trees in my case) for more than one response variable; for example, we measured the trees for acoustic velocity (related to stiffness) and basic density. If you don’t like trees, think of having height and body weight for people, or whatever picks your fancy. Anyway, we have our traditional randomized complete block design with random blocks, random families and that’s it. Rhetorical question: Can we simultaneously run an analysis for all responses?

setwd('~/Dropbox/quantumforest')
library(asreml)
canty = read.csv('canty_trials.csv', header = TRUE)
summary(canty)

m1 = asreml(bden ~ 1, random = ~ Block + Family, data = canty)
summary(m1)$varcomp

                      gamma component std.error   z.ratio constraint
Block!Block.var   0.2980766 162.74383  78.49271  2.073362   Positive
Family!Family.var 0.1516591  82.80282  29.47153  2.809587   Positive
R!variance        1.0000000 545.97983  37.18323 14.683496   Positive

m2 = asreml(veloc ~ 1, random = ~ Block + Family, data = canty)
summary(m2)$varcomp

                      gamma   component    std.error   z.ratio constraint
Block!Block.var   0.1255846 0.002186295 0.0011774906  1.856741   Positive
Family!Family.var 0.1290489 0.002246605 0.0008311341  2.703059   Positive
R!variance        1.0000000 0.017408946 0.0012004136 14.502456   Positive

Up to this point we are using the same old code, and remember that we could fit the same model using lme4, so what’s the point of this post? Well, we can now move to fit a multivariate model, where we have two responses at the same time (incidentally, below we have a plot of the two response variables, showing a correlation of ~0.2).

We can first refit the model as a multivariate analysis, assuming block-diagonal covariance matrices. The notation now includes:

  • The use of cbind() to specify the response matrix,
  • the reserved keyword trait, which creates a vector to hold the overall mean for each response,
  • at(trait), which asks ASReml-R to fit an effect (e.g. Block) at each trait, by default using a diagonal covariance matrix (σ2 I). We could also use diag(trait) for the same effect,
  • rcov = ~ units:diag(trait) specifies a different diagonal matrix for the residuals (units) of each trait.
m3 = asreml(cbind(bden, veloc) ~ trait,
            random = ~ at(trait):Block +  at(trait):Family, data = canty,
            rcov = ~ units:diag(trait))

summary(m3)$varcomp
                                          gamma    component    std.error
at(trait, bden):Block!Block.var    1.627438e+02 1.627438e+02 78.492736507
at(trait, veloc):Block!Block.var   2.186295e-03 2.186295e-03  0.001177495
at(trait, bden):Family!Family.var  8.280282e+01 8.280282e+01 29.471507439
at(trait, veloc):Family!Family.var 2.246605e-03 2.246605e-03  0.000831134
R!variance                         1.000000e+00 1.000000e+00           NA
R!trait.bden.var                   5.459799e+02 5.459799e+02 37.183234014
R!trait.veloc.var                  1.740894e-02 1.740894e-02  0.001200414
                                     z.ratio constraint
at(trait, bden):Block!Block.var     2.073362   Positive
at(trait, veloc):Block!Block.var    1.856733   Positive
at(trait, bden):Family!Family.var   2.809589   Positive
at(trait, veloc):Family!Family.var  2.703059   Positive
R!variance                                NA      Fixed
R!trait.bden.var                   14.683496   Positive
R!trait.veloc.var                  14.502455   Positive

Initially, you may not notice that the results are identical, as there is a distracting change to scientific notation for the variance components. A closer inspection shows that we have obtained the same results for both traits, but did we gain anything? Not really, as we took the defaults for covariance components (direct sum of diagonal matrices, which assumes uncorrelated traits); however, we can do better and actually tell ASReml-R to fit the correlation between traits for block and family effects as well as for residuals.

m4 = asreml(cbind(bden, veloc) ~ trait,
            random = ~ us(trait):Block +  us(trait):Family, data = a,
            rcov = ~ units:us(trait))

summary(m4)$varcomp

                                      gamma    component    std.error
trait:Block!trait.bden:bden    1.628812e+02 1.628812e+02 7.854123e+01
trait:Block!trait.veloc:bden   1.960789e-01 1.960789e-01 2.273473e-01
trait:Block!trait.veloc:veloc  2.185595e-03 2.185595e-03 1.205128e-03
trait:Family!trait.bden:bden   8.248391e+01 8.248391e+01 2.932427e+01
trait:Family!trait.veloc:bden  1.594152e-01 1.594152e-01 1.138992e-01
trait:Family!trait.veloc:veloc 2.264225e-03 2.264225e-03 8.188618e-04
R!variance                     1.000000e+00 1.000000e+00           NA
R!trait.bden:bden              5.460010e+02 5.460010e+02 3.712833e+01
R!trait.veloc:bden             6.028132e-01 6.028132e-01 1.387624e-01
R!trait.veloc:veloc            1.710482e-02 1.710482e-02 9.820673e-04
                                  z.ratio constraint
trait:Block!trait.bden:bden     2.0738303   Positive
trait:Block!trait.veloc:bden    0.8624639   Positive
trait:Block!trait.veloc:veloc   1.8135789   Positive
trait:Family!trait.bden:bden    2.8128203   Positive
trait:Family!trait.veloc:bden   1.3996166   Positive
trait:Family!trait.veloc:veloc  2.7650886   Positive
R!variance                             NA      Fixed
R!trait.bden:bden              14.7057812   Positive
R!trait.veloc:bden              4.3442117   Positive
R!trait.veloc:veloc            17.4171524   Positive

Moving from model 3 to model 4 we are adding three covariance components (one for each family, block and residuals) and improving log-likelihood by 8.5. A quick look at the output from m4 would indicate that most of that gain is coming from allowing for the covariance of residuals for the two traits, as the covariances for family and, particularly, block are more modest:

summary(m3)$loglik
[1] -1133.312

summary(m4)$loglik
[1] -1124.781

An alternative parameterization for model 4 would be to use a correlation matrix with heterogeneous variances (corgh(trait) instead of us(trait)), which would model the correlation and the variances instead of the covariance and the variances. This approach seems to be more numerically stable sometimes.

As an aside, we can estimate the between traits correlation for Blocks (probably not that interesting) and the one for Family (much more interesting, as it is an estimate of the genetic correlation between traits: 1.594152e-01 / sqrt(8.248391e + 01*2.264225e-03) = 0.37).

Spatial correlation in designed experiments

Last Wednesday I had a meeting with the folks of the New Zealand Drylands Forest Initiative in Blenheim. In addition to sitting in a conference room and having nice sandwiches we went to visit one of our progeny trials at Cravens. Plantation forestry trials are usually laid out following a rectangular lattice defined by rows and columns. The trial follows an incomplete block design with 148 blocks and is testing 60 Eucalyptus bosistoana families. A quick look at survival shows an interesting trend: the bottom of the trial was much more affected by frost than the top.

setwd('~/Dropbox/quantumforest')
library(ggplot2) # for qplot
library(asreml)

load('cravens.Rdata')

qplot(col, row, fill = Surv, geom='tile', data = cravens)

We have the trial assessed for tree height (ht) at 2 years of age, where a plot also shows some spatial trends, with better growth on the top-left corner; this is the trait that I will analyze here.

Tree height can be expressed as a function of an overall constant, random block effects, random family effects and a normally distributed residual (this is our base model, m1). I will then take into account the position of the trees (expressed as rows and columns within the trial) to fit spatially correlated residuals (m2)—using separable rows and columns processes—to finally add a spatially independent residual to the mix (m3).

# Base model (non-spatial)
m1 = asreml(ht ~ 1, random = ~ Block + Family, data = cravens)
summary(m1)$loglik
[1] -8834.071

summary(m1)$varcomp

                       gamma component std.error   z.ratio constraint
Block!Block.var   0.52606739 1227.4058 168.84681  7.269345   Positive
Family!Family.var 0.06257139  145.9898  42.20675  3.458921   Positive
R!variance        1.00000000 2333.1722  78.32733 29.787459   Positive

m1 represents a traditional family model with only the overall constant as the only fixed effect and a diagonal residual matrix (identity times the residual variance). In m2 I am modeling the R matrix as the Kronecker product of two separable autoregressive processes (one in rows and one in columns) times a spatial residual.

# Spatial model (without non-spatial residual)
m2 = asreml(ht ~ 1, random = ~ Block + Family,
            rcov = ~ ar1(row):ar1(col),
            data = cravens[order(cravens$row, cravens$col),])
summary(m2)$loglik
[1] -8782.112

summary(m2)$varcomp

                       gamma    component    std.error   z.ratio
Block!Block.var   0.42232303 1025.9588216 157.00799442  6.534437
Family!Family.var 0.05848639  142.0822874  40.20346956  3.534080
R!variance        1.00000000 2429.3224308  88.83064209 27.347798
R!row.cor         0.09915320    0.0991532   0.02981808  3.325271
R!col.cor         0.28044024    0.2804402   0.02605972 10.761445

                      constraint
Block!Block.var         Positive
Family!Family.var       Positive
R!variance              Positive
R!row.cor          Unconstrained
R!col.cor          Unconstrained

Adding two parameters to the model results in improving the log-likelihood from -8834.071 to -8782.112, a difference of 51.959, but with what appears to be a low correlation in both directions. In m3 I am adding an spatially independent residual (using the keyword units), improving log-likelihood from -8782.112 to -8660.411: not too shabby.

m3 = asreml(ht ~ 1, random = ~ Block + Family + units,
            rcov = ~ ar1(row):ar1(col),
            data = cravens[order(cravens$row, cravens$col),])
summary(m3)$loglik
[1] -8660.411

summary(m3)$varcomp

                         gamma    component    std.error    z.ratio
Block!Block.var   3.069864e-07 4.155431e-04 6.676718e-05   6.223763
Family!Family.var 1.205382e-01 1.631630e+02 4.327885e+01   3.770040
units!units.var   1.354166e+00 1.833027e+03 7.085134e+01  25.871456
R!variance        1.000000e+00 1.353621e+03 2.174923e+02   6.223763
R!row.cor         7.814065e-01 7.814065e-01 4.355976e-02  17.938724
R!col.cor         9.529984e-01 9.529984e-01 9.100529e-03 104.719017
                     constraint
Block!Block.var        Boundary
Family!Family.var      Positive
units!units.var        Positive
R!variance             Positive
R!row.cor         Unconstrained
R!col.cor         Unconstrained

Allowing for the independent residual (in addition to the spatial one) has permitted higher autocorrelations (particularly in columns), while the Block variance has gone very close to zero. Most of the Block variation is being captured now through the residual matrix (in the rows and columns autoregressive correlations), but we are going to keep it in the model, as it represents a restriction to the randomization process. In addition to log-likelihood (or AIC) we can get graphical descriptions of the fit by plotting the empirical semi-variogram as in plot(variogram(m3)):

Large applications of linear mixed models

In a previous post I summarily described our options for (generalized to varying degrees) linear mixed models from a frequentist point of view: nlme, lme4 and ASReml-R, followed by a quick example for a split-plot experiment.

But who is really happy with a toy example? We can show a slightly more complicated example assuming that we have a simple situation in breeding: a number of half-sib trials (so we have progeny that share one parent in common), each of them established following a randomized complete block design, analyzed using a ‘family model’. That is, the response variable (dbh: tree stem diameter assessed at breast height—1.3m from ground level) can be expressed as a function of an overall mean, fixed site effects, random block effects (within each site), random family effects and a random site-family interaction. The latter provides an indication of genotype by environment interaction.

setwd('~/Dropbox/quantumforest')

trees = read.table('treedbh.txt', header=TRUE)

# Removing all full-siblings and dropping Trial levels
# that are not used anymore
trees = subset(trees, Father == 0)
trees$Trial = trees$Trial[drop = TRUE]

# Which let me with 189,976 trees in 31 trials
xtabs(~Trial, data = trees)

# First run the analysis with lme4
library(lme4)
m1 = lmer(dbh ~ Trial + (1|Trial:Block) + (1|Mother) +
                (1|Trial:Mother), data = trees)

# Now run the analysis with ASReml-R
library(asreml)
m2 = asreml(dbh ~ Trial, random = ~Trial:Block + Mother +
                  Trial:Mother, data = trees)

First I should make clear that this model has several drawbacks:

  • It assumes that we have homogeneous variances for all trials, as well as identical correlation between trials. That is, that we are assuming compound symmetry, which is very rarely the case in multi-environment progeny trials.
  • It also assumes that all families are unrelated to each other, which in this case I know it is not true, as a quick look at the pedigree will show that several mothers are related.
  • We assume only one type of families (half-sibs), which is true just because I got rid of all the full-sib families and clonal material, to keep things simple in this example.

Just to make the point, a quick look at the data using ggplot2—with some jittering and alpha transparency to better display a large number of points—shows differences in both mean and variance among sites.

library(ggplot2)
ggplot(aes(x=jitter(as.numeric(Trial)), y=dbh), data = trees) +
      geom_point(colour=alpha('black', 0.05)) +
      scale_x_continuous('Trial')

Anyway, let’s keep on going with this simplistic model. In my computer, a two year old iMac, ASReml-R takes roughly 9 seconds, while lme4 takes around 65 seconds, obtaining similar results.

# First lme4 (and rounding off the variances)
summary(m1)

Random effects:
 Groups       Name        Variance Std.Dev.
 Trial:Mother (Intercept)  26.457   5.1436
 Mother       (Intercept)  32.817   5.7286
 Trial:Block  (Intercept)  77.390   8.7972
 Residual                 892.391  29.8729 

# And now ASReml-R
# Round off part of the variance components table
round(summary(m2)$varcomp[,1:4], 3)
                       gamma component std.error z.ratio
Trial:Block!Trial.var  0.087    77.399     4.598  16.832
Mother!Mother.var      0.037    32.819     1.641  20.002
Trial:Mother!Trial.var 0.030    26.459     1.241  21.328
R!variance             1.000   892.389     2.958 301.672

The residual matrix of this model is a, fairly large, diagonal matrix (residual variance times an identity matrix). At this point we can relax this assumption, adding a bit more complexity to the model so we can highlight some syntax. Residuals in one trial should have nothing to do with residuals in another trial, which could be hundreds of kilometers away. I will then allow for a new matrix of residuals, which is the direct sum of trial-specific diagonal matrices. In ASReml-R we can do so by specifying a diagonal matrix at each trial with rcov = ~ at(Trial):units:

m2b =  asreml(dbh ~ Trial, random = ~Trial:Block + Mother +
                    Trial:Mother, data = trees,
                    rcov = ~ at(Trial):units)

# Again, extracting and rounding variance components
round(summary(m2b)$varcomp[,1:4], 3)
                          gamma component std.error z.ratio
Trial:Block!Trial.var    77.650    77.650     4.602  16.874
Mother!Mother.var        30.241    30.241     1.512  20.006
Trial:Mother!Trial.var   22.435    22.435     1.118  20.065
Trial_1!variance       1176.893  1176.893    18.798  62.606
Trial_2!variance       1093.409  1093.409    13.946  78.403
Trial_3!variance        983.924   983.924    12.061  81.581
...
Trial_29!variance      2104.867  2104.867    55.821  37.708
Trial_30!variance       520.932   520.932    16.372  31.819
Trial_31!variance       936.785   936.785    31.211  30.015

There is a big improvement of log-Likelihood from m2 (-744452.1) to m2b (-738011.6) for 30 additional variances. At this stage, we can also start thinking of heterogeneous variances for blocks, with a small change to the code:

m2c =  asreml(dbh ~ Trial, random = ~at(Trial):Block + Mother +
                    Trial:Mother, data = wood,
                    rcov = ~ at(Trial):units)
round(summary(m2c)$varcomp[,1:4], 3)

# Which adds another 30 variances (one for each Trial)
                                 gamma component std.error z.ratio
at(Trial, 1):Block!Block.var     2.473     2.473     2.268   1.091
at(Trial, 2):Block!Block.var    95.911    95.911    68.124   1.408
at(Trial, 3):Block!Block.var     1.147     1.147     1.064   1.079
...
Mother!Mother.var               30.243    30.243     1.512  20.008
Trial:Mother!Trial.var          22.428    22.428     1.118  20.062
Trial_1!variance              1176.891  1176.891    18.798  62.607
Trial_2!variance              1093.415  1093.415    13.946  78.403
Trial_3!variance               983.926   983.926    12.061  81.581
...

At this point we could do extra modeling at the site level by including any other experimental design features, allowing for spatially correlated residuals, etc. I will cover some of these issues in future posts, as this one is getting a bit too long. However, we could gain even more by expressing the problem in a multivariate fashion, where the performance of Mothers in each trial would be considered a different trait. This would push us towards some large problems, which require modeling covariance structures so we have a change of achieving convergence during our lifetime.

Disclaimer: I am emphasizing the use of ASReml-R because 1.- I am most familiar with it than with nlme or lme4 (although I tend to use nlme for small projects), and 2.- There are plenty of examples for the other packages but not many for ASReml in the R world. I know some of the people involved in the development of ASReml-R but otherwise I have no commercial links with VSN (the guys that market ASReml and Genstat).

There are other packages that I have not had the chance to investigate yet, like hglm.

Linear mixed models in R

A substantial part of my job has little to do with statistics; nevertheless, a large proportion of the statistical side of things relates to applications of linear mixed models. The bulk of my use of mixed models relates to the analysis of experiments that have a genetic structure.

A brief history of time

At the beginning (1992-1995) I would use SAS (first proc glm, later proc mixed), but things started getting painfully slow and limiting if one wanted to move into animal model BLUP. At that time (1995-1996), I moved to DFREML (by Karen Meyer, now replaced by WOMBAT) and AIREML (by Dave Johnson, now defunct—the program I mean), which were designed for the analysis of animal breeding progeny trials, so it was a hassle to deal with experimental design features. At the end of 1996 (or was it the beginning of 1997?) I started playing with ASReml (programed by Arthur Gilmour mostly based on theoretical work by Robin Thompson and Brian Cullis). I was still using SAS for data preparation, but all my analyses went through ASReml (for which I wrote the cookbook), which was orders of magnitude faster than SAS (and could deal with much bigger problems). Around 1999, I started playing with R (prompted by a suggestion from Rod Ball), but I didn’t really use R/S+ often enough until 2003. At the end of 2005 I started using OS X and quickly realized that using a virtual machine or dual booting was not really worth it, so I dropped SAS and totally relied on R in 2009.

Options

As for many other problems, there are several packages in R that let you deal with linear mixed models from a frequentist (REML) point of view. I will only mention nlme (Non-Linear Mixed Effects), lme4 (Linear Mixed Effects) and asreml (average spatial reml). There are also several options for Bayesian approaches, but that will be another post.

nlme is the most mature one and comes by default with any R installation. In addition to fitting hierarchical generalized linear mixed models it also allows fitting non-linear mixed models following a Gaussian distribution (my explanation wasn’t very clear, thanks to ucfagls below for pointing this out). Its main advantages are, in my humble opinion, the ability to fit fairly complex hierarchical models using linear or non-linear approaches, a good variety of variance and correlation structures, and access to several distributions and link functions for generalized models. In my opinion, its main drawbacks are i- fitting cross-classified random factors is a pain, ii- it can be slow and may struggle with lots of data, iii- it does not deal with pedigrees by default and iv- it does not deal with multivariate data.

lme4 is a project led by Douglas Bates (one of the co-authors of nlme), looking at modernizing the code and making room for trying new ideas. On the positive side, it seems to be a bit faster than nlme and it deals a lot better with cross-classified random factors. Drawbacks: similar to nlme’s, but dropping point i- and adding that it doesn’t deal with covariance and correlation structures yet. It is possible to fit pedigrees using the pedigreemm package, but I find the combination a bit flimsy.

ASReml-R is, unsurprisingly, an R package interface to ASReml. On the plus side it i- deals well with cross-classified random effects, ii- copes very well with pedigrees, iii- can work with fairly large datasets, iv-can run multivariate analyses and v- covers a large number of covariance and correlation structures. Main drawbacks are i- limited functionality for non-Gaussian responses, ii- it does not cover non-linear models and iii- it is non-free (as in beer and speech). The last drawback is relative; it is possible to freely use asreml for academic purposes (and there is also a version for developing countries). Besides researchers, the main users of ASReml/ASReml-R are breeding companies.

All these three packages are available for Windows, Linux and OS X.

A (very) simple example

I will use a traditional dataset to show examples of the notation for the three packages: Yates’ variety and nitrogen split-plot experiment. We can get the dataset from the MASS package, after which it is a good idea to rename the variables using meaningful names. In addition, I will follow Bill Venables’s excellent advice and create additional variables for main plot and subplots, as it is confusing to use the same factor for two purposes (e.g. variety as treatment and main plot). Incidentally, if you haven’t read Bill’s post go and read it; it is one of the best explanations I have ever seen for a split-plot analysis.

library(MASS)
data(oats)
names(oats) = c('block', 'variety', 'nitrogen', 'yield')
oats$mainplot = oats$variety
oats$subplot = oats$nitrogen

summary(oats)
 block           variety     nitrogen      yield              mainplot
 I  :12   Golden.rain:24   0.0cwt:18   Min.   : 53.0   Golden.rain:24
 II :12   Marvellous :24   0.2cwt:18   1st Qu.: 86.0   Marvellous :24
 III:12   Victory    :24   0.4cwt:18   Median :102.5   Victory    :24
 IV :12                    0.6cwt:18   Mean   :104.0
 V  :12                                3rd Qu.:121.2
 VI :12                                Max.   :174.0
   subplot
 0.0cwt:18
 0.2cwt:18
 0.4cwt:18
 0.6cwt:18

The nlme code for this analysis is fairly simple: response on the left-hand side of the tilde, followed by the fixed effects (variety, nitrogen and their interaction). Then there is the specification of the random effects (which also uses a tilde) and the data set containing all the data. Notice that 1|block/mainplot is fitting block and mainplot within block. There is no reference to subplot as there is a single assessment for each subplot, which ends up being used at the residual level.

library(nlme)
m1.nlme = lme(yield ~ variety*nitrogen,
                      random = ~ 1|block/mainplot,
                      data = oats)

summary(m1.nlme)

Linear mixed-effects model fit by REML
 Data: oats
       AIC      BIC    logLik
  559.0285 590.4437 -264.5143

Random effects:
 Formula: ~1 | block
        (Intercept)
StdDev:    14.64496

 Formula: ~1 | mainplot %in% block
        (Intercept) Residual
StdDev:    10.29863 13.30727

Fixed effects: yield ~ variety * nitrogen
                                    Value Std.Error DF   t-value p-value
(Intercept)                      80.00000  9.106958 45  8.784492  0.0000
varietyMarvellous                 6.66667  9.715028 10  0.686222  0.5082
varietyVictory                   -8.50000  9.715028 10 -0.874933  0.4021
nitrogen0.2cwt                   18.50000  7.682957 45  2.407927  0.0202
nitrogen0.4cwt                   34.66667  7.682957 45  4.512152  0.0000
nitrogen0.6cwt                   44.83333  7.682957 45  5.835427  0.0000
varietyMarvellous:nitrogen0.2cwt  3.33333 10.865342 45  0.306786  0.7604
varietyVictory:nitrogen0.2cwt    -0.33333 10.865342 45 -0.030679  0.9757
varietyMarvellous:nitrogen0.4cwt -4.16667 10.865342 45 -0.383482  0.7032
varietyVictory:nitrogen0.4cwt     4.66667 10.865342 45  0.429500  0.6696
varietyMarvellous:nitrogen0.6cwt -4.66667 10.865342 45 -0.429500  0.6696
varietyVictory:nitrogen0.6cwt     2.16667 10.865342 45  0.199411  0.8428

anova(m1.nlme)

                 numDF denDF   F-value p-value
(Intercept)          1    45 245.14299  &lt;.0001
variety              2    10   1.48534  0.2724
nitrogen             3    45  37.68562  &lt;.0001
variety:nitrogen     6    45   0.30282  0.9322

The syntax for lme4 is not that dissimilar, with random effects specified using a (1|something here) syntax. One difference between the two packages is that nlme reports standard deviations instead of variances for the random effects.

library(lme4)
m1.lme4 = lmer(yield ~ variety*nitrogen + (1|block/mainplot),
                       data = oats)

summary(m1.lme4)

Linear mixed model fit by REML
Formula: yield ~ variety * nitrogen + (1 | block/mainplot)
   Data: oats
 AIC   BIC logLik deviance REMLdev
 559 593.2 -264.5    595.9     529
Random effects:
 Groups         Name        Variance Std.Dev.
 mainplot:block (Intercept) 106.06   10.299
 block          (Intercept) 214.48   14.645
 Residual                   177.08   13.307
Number of obs: 72, groups: mainplot:block, 18; block, 6

Fixed effects:
                                 Estimate Std. Error t value
(Intercept)                       80.0000     9.1064   8.785
varietyMarvellous                  6.6667     9.7150   0.686
varietyVictory                    -8.5000     9.7150  -0.875
nitrogen0.2cwt                    18.5000     7.6830   2.408
nitrogen0.4cwt                    34.6667     7.6830   4.512
nitrogen0.6cwt                    44.8333     7.6830   5.835
varietyMarvellous:nitrogen0.2cwt   3.3333    10.8653   0.307
varietyVictory:nitrogen0.2cwt     -0.3333    10.8653  -0.031
varietyMarvellous:nitrogen0.4cwt  -4.1667    10.8653  -0.383
varietyVictory:nitrogen0.4cwt      4.6667    10.8653   0.430
varietyMarvellous:nitrogen0.6cwt  -4.6667    10.8653  -0.430
varietyVictory:nitrogen0.6cwt      2.1667    10.8653   0.199

anova(m1.lme4)

Analysis of Variance Table
                 Df  Sum Sq Mean Sq F value
variety           2   526.1   263.0  1.4853
nitrogen          3 20020.5  6673.5 37.6856
variety:nitrogen  6   321.7    53.6  0.3028

For this type of problem, the notation for asreml is also very similar, particularly when compared to nlme.

library(asreml)
m1.asreml = asreml(yield ~ variety*nitrogen,
                           random = ~ block/mainplot,
                           data = oats)

summary(m1.asreml)$varcomp

                             gamma component std.error  z.ratio constraint
block!block.var          1.2111647  214.4771 168.83404 1.270343   Positive
block:mainplot!block.var 0.5989373  106.0618  67.87553 1.562593   Positive
R!variance               1.0000000  177.0833  37.33244 4.743416   Positive

wald(m1.asreml, denDF = 'algebraic')

$Wald
                 Df denDF    F.inc           Pr
(Intercept)       1     5 245.1000 1.931825e-05
variety           2    10   1.4850 2.723869e-01
nitrogen          3    45  37.6900 2.457710e-12
variety:nitrogen  6    45   0.3028 9.321988e-01

$stratumVariances
               df  Variance block block:mainplot R!variance
block           5 3175.0556    12              4          1
block:mainplot 10  601.3306     0              4          1
R!variance     45  177.0833     0              0          1

In this simple example one pretty much gets the same results, independently of the package used (which is certainly comforting). I will soon cover another simple model, but with much larger dataset, to highlight some performance differences between the packages.

Maximum likelihood

This post is one of those ‘explain to myself how things work’ documents, which are not necessarily completely correct but are close enough to facilitate understanding.

Background

Let’s assume that we are working with a fairly simple linear model, where we only have a response variable (say tree stem diameter in cm). If we want to ‘guess’ the diameter for a tree (yi) our best bet is the average (μ) and we will have a residual (εi). The model equation then looks like:

y_i = \mu + \varepsilon_i

We want to estimate both the overall mean (μ) and the error variance (\sigma_{\varepsilon}^2), for which we could use least squares. However, we will use an alternative method (maximum likelihood) because that is the point of this post. A likelihood function expresses the probability of obtaining the observed sample from a population given a set of model parameters. Thus, it answers the question What is the probability of observing the current dataset, when I assume a given set of parameters for my linear model? To do so, we have to assume a distribution, say normal, for which the probability density function (p.d.f) is:

p.d.f. (y) = \frac{1} {\sqrt{2 \pi} \sigma} e^{-1/2 \frac{(y- \mu)^2} {\sigma^2}}

with n independent samples the likelihood (L) is:

L = \prod_{i=1}^n \frac{1} {\sqrt{2 \pi} \sigma} e^{-1/2 \frac{(y- \mu)^2} {\sigma^2}}

where ∏ is a multiplication operator, analogous to the summation operator ∑. Maximizing L or its natural logarithm (LogL) is equivalent, then:

LogL = \sum_{i=1}^n \log\left ( \frac{1} {\sqrt{2 \pi} \sigma} e^{-1/2 \frac{(y- \mu)^2} {\sigma^2}} \right )

LogL= \sum_{i=1}^n \left [ \log\left ( \frac{1} {\sqrt{2 \pi} \sigma} \right ) + \log\left ( e^{-1/2 \frac{(y- \mu)^2} {\sigma^2}}  \right ) \right ]

LogL= \sum_{i=1}^n \log(2 \pi)^{-1/2} + \sum_{i=1}^n \log\left (\frac{1} {\sigma}\right ) + \sum_{i=1}^n \left ( -1/2 \frac{(y- \mu)^2} {\sigma^2} \right )

LogL= -\frac{n} {2} \log(2 \pi) - n \log(\sigma) - \frac{1} {2 \sigma^2} \sum_{i=1}^n ( y - \mu)^2

Considering only μ, LogL is maximum when \sum_{i=1}^n ( y - \mu)^2 is a minimum, i.e. for:

\mu = \frac{\sum_{i=1}^n y_i}{n}

Now considering σ:

\frac{\delta LogL}{\delta \sigma} = -\frac{n}{\sigma} - (-2) \frac{1}{2} \sum_{i=1}^n (y - \mu)^2 \sigma^{-3}
\frac{\delta LogL}{\delta \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n (y - \mu)^2

and setting the last expression equal to 0:

\frac{n}{\sigma} = \frac{1}{\sigma^3} \sum_{i=1}^n (y - \mu)^2
\sigma^2 = \frac{\sum_{i=1}^n (y - \mu)^2}{n}

We can obtain an estimate of the error variance (\hat{\sigma}, notice the hat) by replacing μ by our previous estimate:
\hat{\sigma}^2 = \frac{\sum_{i=1}^n (y - \bar{y})^2}{n}

which is biased, because the denominator is n rather than (n – 1) (the typical denominator for sample variance). This bias arises because maximum likelihood estimates do not take into account the loss of degrees of freedom when estimating fixed effects.

Playing in R with an example

We have data for stem diameters (in mm) for twelve 10 year-old radiata pine (Pinus radiata D. Don) trees:

diams = c(150, 124, 100, 107, 170, 144,
          113, 108, 92, 129, 123, 118)

# Obtain typical mean and standard deviation
mean(diams)

[1] 123.1667

sd(diams)
[1] 22.38438

# A simple function to calculate likelihood.
# Values quickly become very small
like = function(data, mu, sigma) {
  like = 1
  for(obs in data){
    like = like * 1/(sqrt(2*pi)*sigma) *
                     exp(-1/2 * (obs - mu)^2/(sigma^2))
  }
  return(like)
}

# For example, likelihood of data if mu=120 and sigma=20
like(diams, 120, 20)
[1] 3.476384e-24

# A simple function to calculate log-likelihood.
# Values will be easier to manage
loglike = function(data, mu, sigma) {
  loglike = 0
    for(obs in data){
      loglike = loglike +
                log(1/(sqrt(2*pi)*sigma) *
                       exp(-1/2 * (obs - mu)^2/(sigma^2)))
    }
    return(loglike)
}

# Example, log-likelihood of data if mu=120 and sigma=20
loglike(diams, 122, 20)
[1] -53.88605

# Let's try some combinations of parameters and
# plot the results
params = expand.grid(mu = seq(50, 200, 1),
                     sigma = seq(10, 40, 1))
params$logL = with(params, loglike(diams, mu, sigma))
summary(params)

library(lattice)
contourplot(logL ~ mu*sigma, data = params, cuts = 20)

It is difficult to see the maximum likelihood in this plot, so we will zoom-in by generating a smaller grid around the typical mean and standard deviation estimates.

# Zooming in
params = expand.grid(mu = seq(120, 126, 0.01),
                     sigma = seq(18, 25, 0.1))
params$logL = with(params, loglike(diams, mu, sigma))
summary(params)

contourplot(logL ~ mu*sigma, data = params, cuts = 10)

We can now check what would be actual results using any of the functions to fit models using maximum likelihood, for example gls:

library(nlme)
m1 = gls(diams ~ 1, method='ML')
summary(m1)

Generalized least squares fit by maximum likelihood
  Model: diams ~ 1
  Data: NULL
       AIC      BIC    logLik
  111.6111 112.5809 -53.80556

Coefficients:
               Value Std.Error  t-value p-value
(Intercept) 123.1667  6.461815 19.06069       0

Residual standard error: 21.43142
Degrees of freedom: 12 total; 11 residual

In real life, software will use an iterative process to find the combination of parameters that maximizes the log-likelihood value. If we go back to our function and use loglike(diams, 123.1667, 21.43142) we will obtain -53.80556; exactly the same value calculated by gls. In addition, we can see that the estimated standard deviation (21.43) is slightly lower than the one calculated by the function sd() (22.38), because our biased estimate divides the sum of squared deviations by n rather than by n-1.

P.S. The code for the likelihood and log-likelihood functions is far from being optimal (the loops could be vectorized). However, the loops are easier to understand for many people.