I’m the first to acknowledge that most of my code could run faster. The truth of the matter is that, in essence, I write ‘quickies': code that will run once or twice, so there is no incentive to spend days or hours in shaving seconds of a computation. Most analyses of research data fall in to this approach: read data-clean data-fit model-check model-be happy-write article-(perhaps) make data and code available-move on with life.

One of the reasons why my code doesn’t run faster or uses less memory is the trade-off between the cost of my time (very high) compared to the cost of more memory or faster processors (very cheap) and the gains of shaving a few seconds or minutes of computer time, which tend to be fairly little.

In R vectorization is faster than working with each vector element, although it implies allocating memory for whole vectors and matrices, which for large-enough problems may become prohibitively expensive. On the other hand, not vectorizing some operations may turn your problem into an excruciatingly slow exercise and, for example in large simulations, in practice intractable in a useful timeframe.

Dealing with vectors and matrices is like dealing with many chairs simultaneously. Some patterns and operations are easier and faster than others. (Photo: Luis, click to enlarge).

John Muschelli wrote an interesting post reimplementing 2×2 frequency tables for a highly specific use, comparing the results of image processing algorithms. In John’s case, there are two greater-than-9-million-long-elements logical vectors, and when comparing vectors for *many* images the process becomes very slow. He explains part of his rationale in his blog (go and read it), but say that his solution can be expressed like this:

# I don't have any image data, but I'll simulate a couple
# of 10 million long logical vectors (just to round things)
set.seed(2014)
manual = sample(c(TRUE, FALSE), 10E6, replace = TRUE)
auto = sample(c(TRUE, FALSE), 10E6, replace = TRUE)
logical.tab = function(x, y) {
tt = sum(x & y)
tf = sum(x & !y)
ft = sum(!x & y)
ff = sum(!x & !y)
return(matrix(c(ff, tf, ft, tt), 2, 2))
}
logical.tab(manual, auto) |

which uses 1/4 of the time used by `table(manual, auto)`

. Mission accomplished! However, if I stopped here this blog post would not make much sense, simply rehashing (pun intended) John’s code. The point is to explain what is going on and, perhaps, to find even faster ways of performing the calculations. As a start, we have to be aware that the calculations in `logical.tab()`

rely on logical (boolean) operations and on the coercion of logical vectors to numerical values, as stated in the documentation:

Logical vectors are coerced to integer vectors in contexts where a numerical value is required, with TRUE being mapped to 1L, FALSE to 0L and NA to NA_integer_.

In R Logical operations can be slower than mathematical ones, a consideration that may guide us to think of the problem in a slightly different way. For example, take the difference between the vectors (`dif = x - y`

), so both `TRUE-TRUE`

(1-1) and `FALSE-FALSE`

(0-0) are 0, while `TRUE - FALSE`

(1-0) is 1 and `FALSE - TRUE`

(0-1) is -1. Therefore:

- the sum of positive values (
`sum(dif > 0)`

) is the frequency of `TRUE & FALSE`

,
- while the sum of negative values (
`sum(dif < 0)`

) is the frequency of `FALSE & TRUE`

.

The values for `TRUE & TRUE`

can be obtained by adding up the element-wise multiplication of the two vectors, as `TRUE*TRUE`

(1*1) is the only product that's different from zero. A vectorized way of performing this operation would be to use `t(x) %*% y`

; however, for large vectors the implementation `crossprod(x, y)`

is faster. The values for `FALSE & FALSE`

are simply the difference of the length of the vectors and the previously calculated frequencies `length(dif) - tt - tf - ft`

. Putting it all together:

basic.tab = function(x, y) {
dif = x - y
tf = sum(dif > 0)
ft = sum(dif < 0)
tt = crossprod(x, y)
ff = length(dif) - tt - tf - ft
return(c(tf, ft, tt, ff))
} |

This code takes 1/20 of the time taken by `table(x, y)`

. An interesting result is that the crossproduct `crossprod(x, y)`

can also be expressed as `sum(x * y)`

, which I didn't expect to be faster but, hey, it is. So we can express the code as:

basic.tab2 = function(x, y) {
dif = x - y
tf = sum(dif > 0)
ft = sum(dif < 0)
tt = sum(x*y)
ff = length(dif) - tt - tf - ft
return(c(tf, ft, tt, ff))
} |

to get roughly 1/22 of the time. The cost of logical operations is easier to see if we isolate particular calculations in logical.tab and basic.tab; for example,

tf1 = function(x, y) {
tf = sum(x & !y)
} |

is slower than

tf2 = function(x, y) {
dif = x - y
tf = sum(dif > 0)
} |

This also got me thinking of the cost of coercion: Would it take long? In fact, coercing logical vectors to numeric has little (at least I couldn't see from a practical viewpoint) if any cost. In some cases relying on logical vectors converted using `as.numeric()`

seems to be detrimental on terms of speed.

As I mentioned at the beginning, vectorization uses plenty of memory, so if we were constrained in that front and we wanted to do a single pass on the data we could write an explicit loop:

loopy.tab = function(x, y) {
tt = 0; tf = 0; ft = 0; ff = 0
for(i in seq_along(x)) {
if(x[i] == TRUE & y[i] == TRUE)
tt = tt + 1
else
if(x[i] == TRUE & y[i] == FALSE)
tf = tf + 1
else
if(x[i] == FALSE & y[i] == TRUE)
ft = ft + 1
else
ff = ff + 1
}
return(matrix(c(ff, tf, ft, tt), 2, 2))
} |

The loopy.tab does only one pass over the vectors and should use less memory as it doesn't need to create those huge 10M elements vectors all the time (at least it would if we used a proper iterator in the loop instead of iterating on a vector of length 10M (that's 40 MB big in this case). The iterators package may help here). We save room/memory at the expense of speed, as `loopy.tab`

is ten times slower than the original `table()`

function. Of course one could run it a lot faster if implemented in another language like C++ or Fortran and here Rcpp or Rcpp11 would come handy (updated! See below).

This is only a not-so-short way of reminding myself what's going on when trading-off memory, execution speed and my personal time. Thus, I am not saying that any of these functions is the best possible solution, but playing with ideas and constraints that one often considers when writing code. Incidentally, an easy way to compare the execution time of these functions is using the `microbenchmark`

library. For example:

library(microbenchmark)
microbenchmark(logical.tab(manual, auto), basic.tab2(manual, auto), times = 1000) |

will spit numbers when running a couple of functions 1,000 times with results that apply to your specific system.

PS 2014-07-12 11:08 NZST Hadley Wickham suggests using `tabulate(manual + auto * 2 + 1, 4)`

as a fast alternative. Nevertheless, I would like to point out that i- the initial tabulation problem is only an excuse to discuss a subset of performance issues when working with R and ii- this post is more about the journey than the destination.

PS 2014-07-13 20:32 NZST Links to two posts related to this one:

- Wiekvoet's work which i- relies on marginals and ii- reduces the use of logical operators even more than my
`basic.tab2()`

, simultaneously taking around 1/4 of the time.
- Yin Zhu's post describing vectors is a handy reminder of the basic building blocks used in this post.

### Updated with Rcpp goodness!

PS 2014-07-13 21:45 NZST I browsed the Rcpp Gallery and Hadley Wickham's Rcpp tutorial and quickly converted `loopy.tab()`

to C++. Being a bit of an oldie I've used Fortran (90, not *that* old) before but never C++, so the following code is probably not very idiomatic.

library(Rcpp)
cppFunction('NumericVector loopyCpp(LogicalVector x, LogicalVector y) {
int niter = x.size();
int tt = 0, tf = 0, ft = 0, ff = 0;
NumericVector tab(4);
for(int i = 0; i < niter; i++) {
if(x[i] == TRUE && y[i] == TRUE)
tt++;
else
if(x[i] == TRUE && y[i] == FALSE)
tf++;
else
if(x[i] == FALSE && y[i] == TRUE)
ft++;
else
ff++;
}
tab[0] = ff; tab[1] = tf; tab[2] = ft; tab[3] = tt;
return tab;
}'
)
loopyCpp(manual, auto) |

Loops and all it runs roughly twice as fast as `basic.tab2()`

but it should also use much less memory.