notes in a shoebox

Quantum Forest

Category: rblogs Page 1 of 21

Creating an n x n autocorrelation matrix

Between covid-19 news and announcements of imminent Russia-Ukraine wars I needed a bit of a distraction. Sooo, here it is how to create an n x n autocorrelation matrix based on a correlation rho, with a simple 5 x 5 example in R:

This produces the following output:

How does this work? Starting from defining n as 5, and the basic rho correlation as 0.9, then mat <- diag(n) creates a diagonal matrix—a square matrix with ones in the diagonal and zeros elsewhere—with 5 rows and 5 columns.

Metal sunflower

The beauty of code vectorisation

I came across this problem in Twitter:

Description of the problem by Fermat’s Library.

The basic pieces of the problem are:

  • We need to generate pseudorandom numbers with an identical distribution, add them up until they go over 1 and report back how many numbers we needed to achieve that condition.
  • Well, do the above “quite a few times” and take the average, which should converge to the number e (2.7182…).
Screenshot of problem 1, Advent of Code 2021

Recreational programming

I think programming, aka coding, is a fun activity. We are solving a problem subject to a set of constraints that can be time, memory, quantity of code, language, etc. Besides being a part of my work, coding is also a good distraction when doing it for the sake of it.

In this type of situation I like to set myself additional constraints. For example, I try to use only base R or a very minimal set of features. Moreover, I prefer vectorised code: there are no explicit loops in the code. For toy problems this restriction doesn’t make much of a difference in execution time, but in real problems R (and Matlab) run much more slowly with non-vectorised code.

Implementing a model as an R package

In our research group we often have people creating statistical models that end up in publications but, most of the time, the practical implementation of those models is lacking. I mean, we have a bunch of barely functioning code that is very difficult to use in a reliable way in operations of the breeding programs. I was very keen on continue using one of the models in our research, enough to rewrite and document the model fitting, and then create another package for using the.model in operations.

Unfortunately, neither the data nor the model are mine to give away, so I can’t share them (yet). But I hope these notes will help you in you are in the same boat and need to use your models (or ‘you’ are in fact future me, who tend to forget how or why I wrote code in a specific way).

Reading a folder with many small files

One of the tools we use in our research is NIR (Near-Infrared Spectroscopy), which we apply to thousands of samples to predict their chemical composition. Each NIR spectrum is contained in a CSV text file with two numerical columns: wavelength and reflectance. All files have the same number of rows (1296 in our case), which corresponds to the number of wavelengths assessed by the spectrometer. One last thing: the sample ID is encoded in the file name.

As an example, file A1-4-999-H-L.0000.csv’s contents look like:

Page 1 of 21

Powered by WordPress & Theme by Anders Norén