Quantum Forest

notes in a shoebox

Overlay of design matrices in genetic analysis

I’ve ignored my quantitative geneticist side of things for a while (at least in this blog) so this time I’ll cover some code I was exchanging with a couple of colleagues who work for other organizations.

It is common to use diallel mating designs in plant and tree breeding, where a small number of parents acts as both males and females. For example, with 5 parents we can have 25 crosses, including reciprocals and selfing (crossing an individual with itself). Decades ago this mating design was tricky to fit and, considering an experimental layout with randomized complete blocks, one would have something like y = mu + blocks + dads + mums + cross + error. In this model dads and mums were estimating a fraction of the additive genetic variance. With the advent of animal model BLUP, was possible to fit something like y = mu + blocks + individual (using a pedigree) + cross + error. Another less computationally demanding alternative (at least with unrelated parents) is to fit a parental model, overlaying the design matrices for parents with something like this y = mu + blocks + (dad + mum) + cross + error.

The following code simulate data for a a diallel trial with three replicates and runs a parental model with ASReml. Later I expand the analysis building the matrices by hand.

How is the matrix overlay working? We can replicate the calculations used by ASReml by building the matrices from scratch and reusing the variance components, so we avoid the nastiness of writing code for residual maximum likelihood. Once I build the basic matrices I use the code from my How does a linear mixed model look like? post.

The overlay matrix Zparents is double the actual value we should use:

Repeating the analyses ‘by hand’ using Zparents2 to build the Z matrix results in the same overall mean and block effects, but the predicted breeding values for parents when using Zparents are 0.7 of the predicted breeding values for parents when using Zparents2. I may need to refit the model and obtain new variance components for parents when working with the correct overlaid matrix.

Gratuitous picture: walking in Quail Island (Photo: Luis).

8 Comments

  1. I have question, hopefully you can help, how to estimate the breeding values for some lines using Asreml-R, given that we have the kinship matrix ( from molecular markers) and the lines were evaluated in single replicate over number of environments.
    in other words how to include the covariance among environment as well as the the covariance among lines.
    please refer to this example:
    dat <- data.frame(env = c(10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,10,11,12,13,14,15,16,17,18,
    19,20,21,22,23,24,25,26,5,6,7,8,9,10,11,12,13,14,15,16,1,2,3,4,5,6,7,8,9,10,11,
    12,13,14,15,16,17,26,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18),
    gen = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,
    2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,
    4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5),
    y = c(63,72,69,55,52,67,63,55,66,57,69,73,68,64,67,70,70,64,82,76,53,59,62,74,54,66,54,
    64,69,66,79,66,66,65,75,72,55,52,79,72,83,78,40,54,65,73,66,59,67,55,85,77,51,
    49,80,65,88,86,63,53,68,63,52,60,49,40,57,53,74,71,46,44,62,51,72,70,60,51,65,59,51,66))

    • Luis

      2012/10/13 at 12:59 pm

      Your question has nothing to do with this post. Anyway, there should be zero environmental covariance (as they are different sites) and you could model the genetic covariance matrix using a multivariate approach as I explained here. I recommend that you go to the asreml forum for more help.

      • Hi:
        thank you for your reply —- and i am so sorry for asking irrelevant question to your post. however it was not clear for me were to post my question. i would like to mention also that i never used ASreml before, so excuse me if my question was a little stupid.

        regarding your reply:
        let me make my point more clear
        in plant breeding we are trying to use many locations to evaluate hundreds (if not thousands of lines). in some cases we cannot plant all the lines in each location, but we have some lines overlap from one location to anther. In the same time we can estimate the covariance matrix among all lines using molecular marker data. here is what i have in mind

        asreml(yield ~ Site,
        random =~fa(Site):ped(Genotype) + fa(Site):ide(Genotype),
        rcov =~at(Site):ar1(Column):ar1(Row), ## covariance structure
        ginverse =~list(Genotype=ainv), data=qb)
        is that make sense:
        i will appreciate your comments.

        • Luis

          2012/10/18 at 1:32 pm

          In addition to my previous answer, which included “I recommend that you go to the asreml forum for more help”, I would also like to point 5 in this page.

  2. Odilon Morais

    2014/03/15 at 6:05 pm

    Dear Luiz,

    I am conducting a diallel analysis, and would to employ the mixed models based on Bayesian approach with MCMCglmm package.
    But I’m struggling to define the model because of the sources of variation dad and mum, which must be defined in the model together so that a single variance component is obtained for both (dad and mum), as you defined in the your model, observed in the model m1 detailed below.

    # Fitting the model with ASReml
    library(asreml)
    m1 <- asreml(yield ~ 1,
    random = ~ block + dad + and(mum) + mate,
    data = trial)
    summary(m1)
    # gamma component std.error z.ratio
    #block!block.var 1.299110e-02 3.861892e-01 1.588423e+00 0.2431274
    #dad!dad.var 2.101417e-01 6.246930e+00 5.120745e+00 1.2199259
    #mate!mate.var 4.589938e-07 1.364461e-05 2.340032e-06 5.8309519
    #R!variance 1.000000e+00 2.972722e+01 5.098177e+00 5.8309519

    But I did not get information about this possibility for MCMCglmm package in R. It would be possible to set the mcmc model for MCMCglmm also detailed below, in the same way that is set to m1 model, the ASReml?

    # Fitting the model with MCMCglmm
    library(MCMCglmm)
    prior.mcmc <- list(R = list(V = 1e-16, nu = -2),
    G = list(G1 = list(V = 1e-16, nu = -2),
    G2 = list(V = 1e-16, nu = -2),
    G3 = list(V = 1e-16, nu = -2)))

    mcmc = MCMCglmm(yield ~ 1,
    random = ~ block + dad + mum + mate,
    family = 'gaussian',
    data = trial,
    prior = prior.mcmc,
    verbose = FALSE,
    pr = TRUE,
    burnin = 10000,
    nitt = 20000,
    thin = 10)
    summary(mcmc)

    Already very grateful for your attention!

    • Luis

      2014/03/15 at 7:27 pm

      Hi Odilon,

      Luis (with s) here. While in this post I discuss the issue of overlapping matrices in the context of a diallel cross, I also point out that “With the advent of animal model BLUP, was possible to fit something like y = mu + blocks + individual (using a pedigree) + cross + error”. Fitting separate mums and dads predates the use of animal models, it is old-fashioned (we’re talking 30-40 years old) and much more complicated. I have code for the analysis of diallels in asreml here. In the case of MCMCglmm there are a few differences, starting with the pedigree terms HAS to be called animal. You could set up something like:

      prior = list(R = list(V = 0.007, n = 0),
      G = list(G1 = list(V = 0.002, n = 0), G2 = list(V = 0.001, n = 0), G3 = list(V = 0.001, n = 0))))

      # I use high thinning to avoid autocorrelation with animal model
      nvel.bayes <- MCMCglmm(nvel ~ 1, random = ~ animal + Block + Family, family = ‘gaussian’, data = ncs, pedigree = ped, prior = prior, verbose = FALSE, pr = TRUE, burnin = 10000, nitt = 200000, thin = 200) ped is a data frame with the pedigree, with columns animal, Mum, Dad. This should fit the additive and family effects in a randomized complete block design. Reciprocal effects are home work. ๐Ÿ˜‰

      • Odilon Morais

        2014/03/28 at 8:00 pm

        Hi Luis,

        Thank you very much by the answer.

        Excuse the ignorance on the subject, but I am now initiating studies on Bayesian inference. I’d like to ask a question:
        I want to define a priori information, with normal distribution for the data, in MCMCglmm package. So, what are the values โ€‹โ€‹that should be assigned to “V” ans “n” in argument “list (V = …. n …. =)” ??

        Already thank you very much!!

  3. Nice post. If you use set.seed(), your example will be reproducible.

    Question: Are the calculated breeding values estimates of the u.g effects?

Leave a Reply

© 2015 Quantum Forest

Theme by Anders NorenUp ↑