Coming out of the (Bayesian) closet

2011-11-04 / Luis

Until today all the posts in this blog have used a frequentist view of the world. I have a confession to make: I have an ecumenical view of statistics and I do sometimes use Bayesian approaches in data analyses. This is not quite one of those “the truth will set you free” moments, but I’ll show that one could almost painlessly repeat some of the analyses I presented before using MCMC.

MCMCglmm is a very neat package that—as its rather complicated em cee em cee gee el em em acronym implies—implements MCMC for generalized linear mixed models. We’ll skip that the frequentist fixed vs random effects distinction gets blurry in Bayesian models and still use the f… and r… terms. I’ll first repeat the code for a Randomized Complete Block design with Family effects (so we have two random factors) using both lme4 and ASReml-R and add the MCMCglmm counterpart:

library(lme4)
m1 <- lmer(bden ~ 1 + (1|Block) + (1|Family),
           data = trees)

summary(m1)

#Linear mixed model fit by REML
#Formula: bden ~ (1 | Block) + (1 | Family)
#   Data: a
#  AIC  BIC logLik deviance REMLdev
# 4572 4589  -2282     4569    4564
#Random effects:
# Groups   Name        Variance Std.Dev.
# Family   (Intercept)  82.803   9.0996
# Block    (Intercept) 162.743  12.7571
# Residual             545.980  23.3662
#Number of obs: 492, groups: Family, 50; Block, 11

#Fixed effects:
#            Estimate Std. Error t value
#(Intercept)  306.306      4.197   72.97


library(asreml)
m2 <- asreml(bden ~ 1, random = ~ Block + Family,
             data = trees)

summary(m2)$varcomp
#                      gamma component std.error   z.ratio
#Block!Block.var   0.2980766 162.74383  78.49271  2.073362
#Family!Family.var 0.1516591  82.80282  29.47153  2.809587
#R!variance        1.0000000 545.97983  37.18323 14.683496

#                 constraint
#Block!Block.var    Positive
#Family!Family.var  Positive
#R!variance         Positive


m2$coeff$fixed
#(Intercept)
#    306.306

We had already established that the results obtained from lme4 and ASReml-R were pretty much the same, at least for relatively simple models where we can use both packages (as their functionality diverges later for more complex models). This example is no exception and we quickly move to fitting the same model using MCMCglmm:

library(MCMCglmm)
priors <- list(R = list(V = 260, n = 0.002),
               G = list(G1 = list(V = 260, n = 0.002),
               G2 = list(V = 260, n = 0.002)))

m4 <- MCMCglmm(bden ~ 1,
               random = ~ Block + Family,
               family = 'gaussian',
               data = a,
               prior = priors,
               verbose = FALSE,
               pr = TRUE,
               burnin = 10000,
               nitt = 20000,
               thin = 10)

plot(mcmc.list(m4$VCV))

autocorr(m4$VCV)

posterior.mode(m4$VCV)
#    Block    Family     units
#126.66633  72.97771 542.42237

HPDinterval(m4$VCV)
#           lower    upper
#Block   33.12823 431.0233
#Family  26.34490 146.6648
#units  479.24201 627.7724

The first difference is that we have to specify priors for the coefficients that we would like to estimate (by default fixed effects, the overall intercept for our example, start with a zero mean and very large variance: 10⁶). The phenotypic variance for our response is around 780, which I split into equal parts for Block, Family and Residuals. For each random effect we have provided our prior for the variance (V) and a degree of belief on our prior (n).

In addition to the model equation, name of the data set and prior information we need to specify things like the number of iterations in the chain (nitt), how many we are discarding for the initial burnin period (burnin), and how many values we are keeping (thin, every ten). Besides the pretty plot of the posterior distributions (see previous figure) they can be summarized using the posterior mode and high probability densities.

One of the neat things we can do is to painlessly create functions of variance components and get their posterior mode and credible interval. For example, the heritability (or degree of additive genetic control) can be estimated in this trial with full-sib families using the following formula: \( \hat{h^2} = \frac{2 \sigma_F^2}{\sigma_F^2 + \sigma_B^2 + \sigma_R^2} \)

h2 <- 2 * m4$VCV[, 'Family']/(m4$VCV[, 'Family'] +
          m4$VCV[, 'Block'] + m4$VCV[, 'units'])
plot(h2)

posterior.mode(h2)
#0.1887017

HPDinterval(h2)
#          lower     upper
#var1 0.05951232 0.3414216
#attr(,"Probability")
#[1] 0.95

There are some differences on the final results between ASReml-R/lme4 and MCMCglmm; however, the gammas (ratios of variance component/error variance) for the posterior distributions are very similar, and the estimated heritabilities are almost identical (~0.19 vs ~0.21). Overall, MCMCglmm is a very interesting package that covers a lot of ground. It pays to read the reference manual and vignettes, which go into a lot of detail on how to specify different types of models. I will present the MCMCglmm bivariate analyses in another post.

P.S. There are several other ways that we could fit this model in R using a Bayesian approach: it is possible to call WinBugs or JAGS (in Linux and OS X) from R, or we could have used INLA. More on this in future posts.

asreml, bayesian, linear models, lme4, MCMCglmm, rblogs

9 Comments

Ankur
2011-11-04 at 20:25

Hello ! I stumbled onto your blog since it was recommended on inside-r ! I absolutely love most of the examples that you've provided from the simple to the most complex! Being a newbie to this fascinating world of data mining and stats, I'm trying to get a grasp on Bayesian stats including MCMC and their applications. Is there an online resource you can recommend where I could get some further grounding in this subject ? I'm sure you might get these questions a lot !! But I really do need some guidance !! Thank you.
- Luis (Post author)
  2011-11-04 at 22:46
  
  Hi Ankur,
  
  The short answer is I don't know; I've struggled to find something readable (either online or offline). My approach to learning is to grab one of my problems and start there trying to apply the new techniques. I find most material on Bayesian stats very hard to digest and the only book that I've enjoyed reading is Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman and Jennifer Hill. The nice thing is that they start with a 'traditional approach' and then they move to Bayesian stats.
  - Ankur
    2011-11-05 at 00:19
    
    Thanks very much! I'll definitely give it a go ! It's tough getting access to some of these books which is why I usually tend to leand towards online resources. Hope to see some more posts on Bayesian stats !!
Antonio Pedro Ramos
2011-11-07 at 16:47

Nice post! Does this package handle (well) longitudinal structures? I've done some stuff on Jags/Winbugs myself byt I have been strugling to find good R libraries on longitudinal analysis within the bayesian world. Cheers, Antonio.
- Luis (Post author)
  2011-11-07 at 22:32
  
  Hi (Hola) Antonio,
  
  I think the answer is "some" structures. There are several possible models that could be used for longitudinal analyses (for example, a bit over ten years ago I tried these ones in ASReml—PDF paper 121 KB—in addition to random splines). As far as I know, MCMCglmm can fit only some of those.
  
  As soon as I have some free time (ha!) I will expand some examples for MCMCglmm and INLA (another rather cool Bayesian package). At the moment life has been overtaken by marking exams.
  
  Finally, I'm quite positive about the progress made by Bayesian packages in R during the last two years, so I expect to see a marked improvement on this area.
LarsR
2011-11-08 at 20:30

There is a report on the R-inla website comparing INLA with MCMCglmm for an animal model that I can recommend. See http://www.math.ntnu.no/preprint/statistics/2011/… . Further comparisons on more complex pedigrees might be needed in the future though. It would be nice to have a comparison of the performance forAnimalINLA with other packages such as lme4 and hglm.
- Luis (Post author)
  2011-11-08 at 21:50
  
  Hi Lars,
  
  I had a quick look at that report a couple of months ago and I am running some comparisons for animal models applied to my own data sets (using asreml vs MCMCglmm vs INLA) to see how they fare against each other. I haven't had the time to even explain animal models in this blog but, eventually, I'll get around that.
LarsR
2011-11-08 at 20:31

I would also like to point out that classical frequentist and Bayesians are not the only schools of statistics useful for animal models.
Classical frequentist inference uses marginal likelihood (random effects integrated out) that includes fixed parameters and only the observations are treated as random.
Bayesian inference is a probabilistic framework that combines likelihood and prior information, and treats all parameters and observations as random.
A third important school is the one based on the Extended Likelihood Principle. Assuming simple statistical principles Bjørnstad (1996) showed that all information in the data about the random and fixed effects is included in a joint likelihood including three components: fixed parameters, unobserved random effects, and observations as random. Lee and Nelder's (1996) h-likelihood is an implementation of the Extended Likelihood Principle. The hglm package (that I have developed together with colleagues) is based on h-likelihood theory and allows fitting of animal models.
- Luis (Post author)
  2011-11-08 at 21:54
  
  We'll get to that! I have to acknowledge that I'm unfamiliar with h-likelihood but, once I'm done with marking, I am hoping to have a look at hglm.