# Category: rblogs

I started following the debate on differential minimum wage for youth (15-19 year old) and adults in New Zealand. Eric Crampton has written a nice series of blog posts, making the data from Statistics New Zealand available. I will use the nzunemployment.csv data file (with quarterly data from March 1986 to June 2011) and show an example of multiple linear regression with autocorrelated residuals in R.

A first approach could be to ignore autocorrelation and fit a linear model that attempts to predict youth unemployment with two explanatory variables: adult unemployment (continuous) and minimum wage rules (categorical: equal or different). This can be done using:

A common problem when running a simple (or not so simple) analysis is forgetting that the levels of a factor has been coded using integers. R doesn’t know that this variable is supposed to be a factor and when fitting, for example, something as simple as a one-way anova (using `lm()`) the variable will be used as a covariate rather than as a factor.

There is a series of steps that I follow to make sure that I am using the right variables (and types) when running a series of analyses. I always define the working directory (using `setwd()`), so I know where the files that I am reading from and writing to are.

There are circumstances when one wants to generate all possible combinations of levels for two factors. For example, factor one with levels ‘A’, ‘B’ and ‘C’, and factor two with levels ‘D’, ‘E’, ‘F’. The function `expand.grid()` comes very handy here:

Omitting the variable names (factor1 and factor 2) will automatically name the variables as Var1 and Var2. Of course we do not have to use letters for the factor levels; if you have defined a couple of factors (say Fertilizer and Irrigation) you can use `levels(Fertilizer)` and `levels(Irrigation)` instead of LETTERS…

When processing data it is common to test if an observation belongs to a set. Let’s suppose that we want to see if the sample code belongs to a set that includes A, B, C and D. In R it is easy to write something like:

Now, what happens if what we want are the observations that are not in that set? Simple, we use the negation operator (!) as in:

In summary, surround the condition by !().

Page 21 of 21