Quantum Forest

notes in a shoebox

R pitfall #1: check data structure

A common problem when running a simple (or not so simple) analysis is forgetting that the levels of a factor have been coded using integers. R doesn’t know that this variable is supposed to be a factor, so when fitting, for example, something as simple as a one-way ANOVA (using lm()), the variable will be used as a covariate rather than as a factor.

There is a series of steps that I follow to make sure that I am using the right variables (and types) when running a series of analyses. I always define the working directory (using setwd()), so I know where the files that I am reading from and writing to are.

After reading a dataset I will have a look at the first and last few observations (using head() and tail(), which by default show 6 observations). This gives you an idea of what the dataset looks like, but it doesn’t confirm the structure (for example, which variables are factors). The function str() provides a good overview of variable types, and together with summary() one gets an idea of ranges, numbers of observations and missing values.
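A minimal sketch of that sequence, assuming a hypothetical file `apophenia.csv` with an integer-coded `treatment` column and a numeric `yield` column (the file name, column names and values are made up for illustration). To keep the example self-contained it first writes the file to a temporary directory.

```r
# Hypothetical example data: 'treatment' is a factor coded as integers 1-3
setwd(tempdir())                      # define the working directory first
write.csv(data.frame(treatment = rep(1:3, each = 4),
                     yield = c(10.1, 9.8, 10.4, 10.0,
                               12.2, 11.9, 12.5, 12.1,
                               11.0, 10.7, NA, 11.2)),
          'apophenia.csv', row.names = FALSE)

apo <- read.csv('apophenia.csv', header = TRUE)

head(apo)      # first 6 observations
tail(apo)      # last 6 observations
str(apo)       # variable types: treatment is read as int, not Factor
summary(apo)   # ranges and missing values for each variable

# Recode explicitly once you know the variable is meant to be a factor
apo$treatment <- factor(apo$treatment)
str(apo)       # treatment is now a Factor with 3 levels
```

Recoding right after reading the data, rather than inside each model formula, means every later analysis sees the variable with the intended type.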

These checks should help you avoid the ‘fitting factors as covariates’ pitfall; even so, always check the degrees of freedom of the ANOVA table just in case.
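As a sketch of the degrees-of-freedom check, assuming simulated data with a hypothetical three-level `treatment` coded as integers: fitted as is, the variable takes 1 degree of freedom (a regression slope); fitted as a factor, it takes one fewer than the number of levels.

```r
set.seed(42)  # simulated data, for illustration only
dat <- data.frame(treatment = rep(1:3, each = 10))
dat$yield <- 10 + c(0, 2, 1)[dat$treatment] + rnorm(30)

# Integer-coded: treatment is fitted as a covariate
m_cov <- lm(yield ~ treatment, data = dat)
anova(m_cov)   # treatment row has Df = 1

# Recoded as a factor: one-way ANOVA
m_fac <- lm(yield ~ factor(treatment), data = dat)
anova(m_fac)   # treatment row has Df = 2 (number of levels - 1)
```

If the treatment row shows a single degree of freedom for a variable with more than two levels, it was almost certainly fitted as a covariate.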

5 Comments

  1. Kevin Wright

    2011/10/17 at 4:09 pm

    I have a similar sequence of steps, plus one more:

    library(Hmsic)
    describe(apo)

    • Kevin Wright

      2011/10/18 at 8:22 am

      Small typo. In case it is not obvious, here is the correct code:

      library(Hmisc)
      describe(apo)

  2. Luis

    2011/10/17 at 4:15 pm

    Hi Kevin, Nice to hear from you and thanks for the tip.

  3. matthew gushta

    2011/10/27 at 2:50 pm

    similar to kevin, though i prefer this:
    library(psych)
    describe(apo, skew=F)

    also, as a native windows user who copy-pastes directories, i find it easier to add a slash than reverse direction:
    setwd('c:Documentsapophenia')

    • Luis

      2011/10/27 at 3:59 pm

      Thanks Matthew. I used to do the double backslash, but retrained muscle memory to single slash (in OS X) within a week in early 2006. Part of your code was eaten by the commenting system: setwd('c:Documentsapophenia')


© 2017 Quantum Forest
