The beauty of code vectorisation

Metal sunflower
Garden wind sculpture

I came across this problem in Twitter:

Description of the problem by Fermat’s Library.

The basic pieces of the problem are:

  • We need to generate pseudorandom numbers with an identical distribution, add them up until they go over 1 and report back how many numbers we needed to achieve that condition.
  • Well, do the above “quite a few times” and take the average, which should converge to the number e (2.7182…).

In most traditional languages, the first step would look more or less like this:

or by setting a “big enough” predefined number of iterations:

We would need to define another loop to run the above code “quite a few times”, which for ten million times (10^7) could look like:

However, we are working with R, which happens to be an array-based (or matrix-based, or vector-based) language and it has functionality for data processing. A first, rough approach could be:

It can be much shorter if, rather than thinking of loops, we think of vectors:

mean(replicate(10^7, min(which(cumsum(runif(100)) > 1, TRUE))))

The code works the following way:

  1. Generate 100 uniformly distributed random numbers (0, 1): runif(100)
  2. Calculate the cumulative sum of the 100 numbers: cumsum()
  3. Check if the cumulative sum is greater than 1: which(cumsum() > 1, TRUE)
  4. Find the minimum position that meets the criterion: min()
  5. Replicate that process ten million times: replicate(10^7, )
  6. Take the average of all the positions: mean()

Half a tweet to solve the problem. Look, ma. No explicit loops!

A half-tweet long code solution.

Leave a comment

Your email address will not be published.