When running stats labs I like to allocate a slightly different subset of data to each student, which acts as an incentive for people to do their own work (rather than copying the same results from a fellow student). We also need to be able to replicate the results when marking, so we need a… Continue reading Careless comparison bites back (again)

# Category: pitfalls

## R pitfall #3: friggin’ factors

I received an email from one of my students expressing deep frustration with a seemingly simple problem. He had a factor containing names of potato lines and wanted to set some levels to NA. Using simple letters as example names, he was baffled by the result of the following code:

```r
lines = factor(LETTERS)
lines
# [1] A B C D E F G H...
# Levels: A B C D E F G H...

linesNA = ifelse(lines %in% c('C', 'G', 'P'), NA, lines)
linesNA
# [1] 1 2 NA 4 5 6 NA 8...
```

The factor has been… Continue reading R pitfall #3: friggin’ factors
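The excerpt stops before the explanation, so what follows is only a minimal sketch of one common reading of this behaviour, not necessarily the fix given in the full post. It assumes the student wanted a factor back, with the unwanted levels removed; the names `drop`, `codes`, `linesNA` and `linesNA2` are illustrative.

```r
lines <- factor(LETTERS)
drop <- c("C", "G", "P")   # hypothetical levels to blank out

# ifelse() does not keep the factor attributes, so what comes back
# is the vector of underlying integer codes rather than a factor
codes <- ifelse(lines %in% drop, NA, lines)
class(codes)

# Option 1: assign NA by logical index, then discard the unused levels
linesNA <- lines
linesNA[linesNA %in% drop] <- NA
linesNA <- droplevels(linesNA)

# Option 2: work on the labels as character, then rebuild the factor
linesNA2 <- factor(ifelse(lines %in% drop, NA, as.character(lines)))

levels(linesNA)
```

Either option keeps the labels. Option 1 leaves the original level order untouched, while Option 2 rebuilds the levels from scratch (alphabetically by default).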

## R pitfall #1: check data structure

A common problem when running a simple (or not so simple) analysis is forgetting that the levels of a factor have been coded using integers. R doesn't know that this variable is supposed to be a factor and, when fitting something as simple as a one-way ANOVA (using lm()), the variable will be… Continue reading R pitfall #1: check data structure
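As a rough illustration of the pitfall described above, the sketch below uses made-up data (the data frame `dat` and its columns `treat` and `y` are assumptions, not taken from the post). It fits the same model with the integer-coded variable left as a number and then converted to a factor.

```r
set.seed(1)

# Hypothetical data: three treatments coded 1, 2, 3
dat <- data.frame(treat = rep(1:3, each = 5),
                  y = rnorm(15))

# Checking the structure first shows that treat is stored as an integer
str(dat)

# Left as a number, treat enters the model as a single slope
m_num <- lm(y ~ treat, data = dat)

# Converted to a factor, treat gets one parameter per extra level
m_fac <- lm(y ~ factor(treat), data = dat)

# Compare the degrees of freedom used by the treatment term
anova(m_num)$Df
anova(m_fac)$Df
```

With three treatments, the numeric version spends one degree of freedom on the term and the factor version spends two, which is a quick way to spot the mistake in an ANOVA table.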