Category: pitfalls

Gratuitous image: Tree spread on metal frame to provide shade in a plaza, Lisbon, Portugal. Some days I would love to have a coffee there without computer, just watching the world pass by. (Photo: Luis).

Careless comparison bites back (again)

2012-08-07 / Luis

When running stats labs I like to allocate a slightly different subset of data to each student, which acts as an incentive for people to do their own work (rather than copying the same results from a fellow student). We also need to be able to replicate the results when marking, so we need a record of exactly which observations were dropped to create a particular data set. I have done this in a variety of ways, but this time I opted for code that looked like:

setwd('~/Dropbox/teaching/stat202-2012')

biom <- read.csv('biom2012.csv', header = TRUE)
drops <- read.csv('lab4-dels.csv', header = TRUE)

# Use your OWN student code here
my.drop <- subset(drops, student.code == 'mjl159')
my.data <- subset(biom, !(id %in% my.drop))
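One thing worth checking in code like this: subset() on a data frame returns a data frame, so my.drop is a whole table rather than a vector of ids. Below is a minimal sketch of comparing against the id column explicitly. The toy data are made up, and it assumes the drops file has an id column (the real file layouts are not shown here).

```r
# Toy stand-ins for biom2012.csv and lab4-dels.csv (hypothetical contents)
biom <- data.frame(id = 1:10, height = seq(10, 55, by = 5))
drops <- data.frame(student.code = c('mjl159', 'mjl159', 'mjl159', 'abc123'),
                    id = c(2, 5, 9, 4))

my.drop <- subset(drops, student.code == 'mjl159')
str(my.drop)   # a data frame with two columns, not a vector of ids

# Compare against the id column, not the whole data frame
my.data <- subset(biom, !(id %in% my.drop$id))
nrow(my.data)  # 10 observations minus 3 drops
```

Counting rows before and after the subset is a cheap way to confirm that the expected number of observations was actually dropped.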

R pitfall #3: friggin’ factors

2011-12-15 / Luis

I received an email from one of my students expressing deep frustration with a seemingly simple problem. He had a factor containing the names of potato lines and wanted to set some levels to NA. Here is his code, with simple letters standing in for the names, and the result that baffled him:

lines <- factor(LETTERS)
lines
# [1] A B C D E F G H...
# Levels: A B C D E F G H...

linesNA <- ifelse(lines %in% c('C', 'G', 'P'), NA, lines)
linesNA
#  [1]  1  2 NA  4  5  6 NA  8...
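What happens here is that ifelse() does not keep the factor attributes, so the result holds the underlying integer codes instead of the level labels. Here is a sketch of one possible alternative that keeps the factor by assigning NA through a logical index (not necessarily the fix the student ended up using):

```r
lines <- factor(LETTERS)

# Copy the factor, then set the unwanted entries to NA by logical index
linesNA <- lines
linesNA[linesNA %in% c('C', 'G', 'P')] <- NA

# Optionally remove the now-unused levels
linesNA <- droplevels(linesNA)

is.factor(linesNA)    # still a factor
sum(is.na(linesNA))   # three entries set to NA
nlevels(linesNA)      # 26 letters minus the 3 dropped levels
```

Without droplevels() the levels C, G and P stay in the factor even though no observation uses them, which can matter later when tabulating or fitting models.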

R pitfall #1: check data structure

2011-10-05 / Luis

A common problem when running a simple (or not so simple) analysis is forgetting that the levels of a factor have been coded using integers. R doesn’t know that this variable is supposed to be a factor, so when fitting something as simple as a one-way ANOVA (using lm()) it will treat the variable as a covariate rather than as a factor.
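The difference is easy to see in a small simulated example. The sketch below uses made-up data (the names trial, treatment and yield are hypothetical) and fits the same model twice, once with the integer-coded variable and once after converting it to a factor:

```r
set.seed(202)  # arbitrary seed, only for reproducibility

# Hypothetical data: three treatments coded as 1, 2, 3
trial <- data.frame(treatment = rep(1:3, each = 5),
                    yield = rnorm(15, mean = 20, sd = 2))

str(trial)  # treatment shows up as int, not Factor

# Treatment as a covariate: intercept plus a single slope
fit.cov <- lm(yield ~ treatment, data = trial)

# Treatment as a factor: intercept plus one effect per extra level
trial$treatment.f <- factor(trial$treatment)
fit.fac <- lm(yield ~ treatment.f, data = trial)

length(coef(fit.cov))  # 2 coefficients
length(coef(fit.fac))  # 3 coefficients
anova(fit.fac)         # treatment.f uses 2 degrees of freedom
```

A quick look at str() and at the degrees of freedom in the ANOVA table is usually enough to spot a factor that is being treated as a number.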

There is a series of steps that I follow to make sure that I am using the right variables (and types) when running a series of analyses. I always define the working directory (using setwd()), so I know where the files I am reading from and writing to live.