This is simple example code to display side-by-side lattice plots or ggplot2 plots, using the mtcars dataset that comes with any R installation. We will display a scatterplot of miles per US gallon (mpg) against car weight (wt) next to another scatterplot of the same data, but colored by number of engine cylinders (cyl, treated as a factor) and with a smooth line added (via the type option).
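A minimal sketch of the idea follows; it is an illustration, not necessarily the code from the full post. The lattice part uses the position and more arguments of print() for trellis objects. The ggplot2 part assumes that the ggplot2 and gridExtra packages are installed (gridExtra is my choice for the layout, not something named in the post), so it is skipped when they are missing.

```r
library(lattice)

# Left plot: plain scatterplot of mpg against wt
p1 <- xyplot(mpg ~ wt, data = mtcars,
             xlab = "Weight", ylab = "Miles per US gallon")

# Right plot: same data, one color per number of cylinders,
# with points ("p") and a loess smooth ("smooth") for each group
p2 <- xyplot(mpg ~ wt, groups = factor(cyl), data = mtcars,
             type = c("p", "smooth"), auto.key = TRUE,
             xlab = "Weight", ylab = "Miles per US gallon")

# position = c(xmin, ymin, xmax, ymax) as fractions of the device;
# more = TRUE keeps the page open for the second plot
print(p1, position = c(0, 0, 0.5, 1), more = TRUE)
print(p2, position = c(0.5, 0, 1, 1))

# ggplot2 version, only run if both (assumed) packages are available
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("gridExtra", quietly = TRUE)) {
  library(ggplot2)
  g1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point()
  g2 <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
    geom_point() +
    geom_smooth(method = "loess", se = FALSE)
  gridExtra::grid.arrange(g1, g2, ncol = 2)
}
```

With so few cars per cylinder group, the loess fits may produce warnings; they do not stop the plots from being drawn.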
Continue reading
I tend not to upgrade R very often (I usually run 6 months to 1 year behind in version numbers) because I have to reinstall all packages: a real pain. A quick search shows that people have come up with good solutions to this problem, as presented in this Stack Overflow thread. I used the code on my Mac:
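The code from that thread is not reproduced in this excerpt. As a rough sketch of one common approach (an assumption on my part, not necessarily what the thread suggests), one can save the names of the installed packages before upgrading and compare them with what is installed afterwards. The function names and the file location below are hypothetical.

```r
# Save the names of all currently installed packages (run BEFORE upgrading)
save_package_list <- function(path) {
  pkgs <- rownames(installed.packages())
  saveRDS(pkgs, path)
  invisible(pkgs)
}

# Return the saved packages that are not installed now (run AFTER upgrading)
missing_packages <- function(path) {
  old_pkgs <- readRDS(path)
  setdiff(old_pkgs, rownames(installed.packages()))
}

# Example with a temporary file; for a real upgrade, use a location
# that survives the upgrade, such as a file in your home directory
pkg_file <- tempfile(fileext = ".rds")
save_package_list(pkg_file)
to_install <- missing_packages(pkg_file)

# After a real upgrade this vector would list what needs reinstalling:
# if (length(to_install) > 0) install.packages(to_install)
```

The install.packages() call is left commented out so the sketch has no side effects when run as is.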
Continue reading
I was talking with one of my colleagues about doing some text analysis (which, by the way, I have never done before), for which the first issue is getting text into R. Not just any text, but files that can be accessed through the internet. In summary, we need to access an HTML file, parse it so we can access specific content, and then remove the HTML tags. Finally, we may want to replace some text (the end-of-line character, \n, for example) before continuing to process the files.
The package XML has the necessary functionality to deal with HTML, while the rest is done using a few standard R functions.
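The steps above can be sketched as follows. This is an illustration under my own assumptions, not the code from the full post: the helper name extract_text, the XPath expression and the sample HTML are all made up, and the XML package must be installed. The sketch parses an inline HTML string so that it runs without network access; htmlParse() also accepts a file name or URL.

```r
# Hypothetical helper: parse HTML, pull out the text of the nodes
# matching an XPath expression, and replace line breaks with spaces
extract_text <- function(source, xpath = "//p", as_text = FALSE) {
  doc <- XML::htmlParse(source, asText = as_text)
  # xmlValue() returns the text content of a node, without the tags
  txt <- XML::xpathSApply(doc, xpath, XML::xmlValue)
  # Replace each end of line (and surrounding whitespace) with one space
  gsub("\\s*\n\\s*", " ", txt)
}

# Made-up sample document, used instead of a real web page
sample_html <- paste0(
  "<html><body><h1>Title</h1>",
  "<p>First line\nsecond line</p>",
  "<p>Another <b>bold</b> paragraph</p>",
  "</body></html>"
)

if (requireNamespace("XML", quietly = TRUE)) {
  paragraphs <- extract_text(sample_html, as_text = TRUE)
  print(paragraphs)
}
```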
Continue reading
There are times when we need to write a function that makes changes to a generic data frame that is passed as an argument. Let’s say, for example, that we want to write a function that converts to factor any variable whose name starts with a capital letter. There are a few issues involved in this problem, including:
- Obtaining a text version of the name of the dataset (using the substitute() function).
- Looping over the variable names and checking if they start with a capital letter (comparing with the LETTERS vector of constants).
- Generating the plain text version of the factor conversion, gluing together the dataset and variable names (using paste()).
- Parsing the plain text version of the code to R code (using parse()) and evaluating it (using eval()). This evaluation has to be done in the parent environment or we will lose any transformation when we leave the function, which is the reason for the envir specification.
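The steps in the list can be sketched as below. This is a minimal illustration rather than the function from the full post: the name capitals_to_factor and the toy data frame are hypothetical, and the sketch assumes that the dataset and its variables have syntactically valid names.

```r
# Hypothetical function: convert to factor every variable whose name
# starts with a capital letter, modifying the data frame in the caller
capitals_to_factor <- function(dataset) {
  # Text version of the name of the data frame passed as argument
  data_name <- deparse(substitute(dataset))

  for (var_name in names(dataset)) {
    # Does the first character belong to the LETTERS constants?
    if (substr(var_name, 1, 1) %in% LETTERS) {
      # Build, e.g., "cars$Cyl <- factor(cars$Cyl)" as plain text
      code <- paste(data_name, "$", var_name, " <- factor(",
                    data_name, "$", var_name, ")", sep = "")
      # Parse the text and evaluate it in the parent environment,
      # so the change survives after the function returns
      eval(parse(text = code), envir = parent.frame())
    }
  }
  invisible(NULL)
}

# Toy data: Cyl starts with a capital letter, mpg does not
cars <- data.frame(Cyl = c(4, 6, 8, 4), mpg = c(30, 21, 15, 28))
capitals_to_factor(cars)
str(cars)
```

Modifying an object in the caller's environment is unusual in R; returning the modified data frame and assigning it explicitly is a common alternative when side effects are not wanted.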
Once one starts writing more R code, the need for consistency increases, as it makes larger projects easier to manage and maintain. There are several style guides or suggestions for R; for example, Andrew Gelman’s, Hadley Wickham’s, Bioconductor’s and this one. My own style is closest to Google’s R style guide, which contains some helpful suggestions. I use something similar but:
Continue reading