Evolving notes, images and sounds by Luis Apiolaza

Category: stats (Page 6 of 8)

R, academia and the democratization of statistics

I am not a statistician but I use statistics, teach statistics and write about applications of statistics in biological problems.

Last week I was in this biostatistics conference, talking with a Ph.D. student who was surprised about this situation because I didn’t have any statistical training. I corrected “any formal training”. On the first day one of the invited speakers was musing about the growing number of “amateurs” using statistics—many times wrongly—and about what biostatisticians could offer as professional value-adding. Yes, he was talking about people like me spoiling the party.
Continue reading

On the (statistical) road, workshops and R

Things have been a bit quiet at Quantum Forest during the last ten days. Last Monday (Sunday for most readers) I flew to Australia to attend a couple of one-day workshops; one on spatial analysis (in Sydney) and another one on modern applications of linear mixed models (in Wollongong). This will be followed by attending The International Biometric Society Australasian Region Conference in Kiama.

I would like to comment on the workshops to look for commonalities and differences. First, both workshops heavily relied on R, supporting the idea that if you want to reach a lot of people and get them using your ideas, R is pretty much the vehicle to do so. It is almost trivial to get people to install R and RStudio before the workshop so they are ready to go. “Almost” because you have to count on someone having a bizarre software configuration or draconian security policies for their computer.
Continue reading

No one would ever conceive

I believe that no one who is familiar, either with mathematical advances in other fields, or with the range of special biological conditions to be considered, would ever conceive that everything could be summed up in a single mathematical formula, however complex.

R.A. Fisher (1932) quoted in the preface to Foundations of Mathematical Genetics by A.W.F. Edwards (1976).

Do we need to deal with ‘big data’ in R?

David Smith at the Revolutions blog posted a nice presentation on “big data” (oh, how I dislike that term). It is a nice piece of work and the Revolution guys managed to process a large amount of records, starting with a download of 70GB and ending up with a series of linear regressions.

I’ve spent the last two weeks traveling (including a visit to the trial below) and finishing marking for the semester, which has somewhat affected my perception on dealing with large amounts of data. The thing is that dealing with hotel internet caps (100MB) or even with my lowly home connection monthly cap (5GB) does get one thinking… Would I spend several months of internet connection just downloading data so I could graph and plot some regression lines for 110 data points? Or does it make sense to run a linear regression with two predictors using 100 million records?

Continue reading

On “true” models

Before starting the description of the probability distributions, we want to impose on the reader the essential feature that a model is an interpretation of a real phenomenon that fits its characteristics to some degree of approximation rather than an explanation that would require the model to be “true”. In short, there is no such thing as a “true model”, even though some models are more appropriate than others!

Jean-Michel Marin and Christian P. Robert in Bayesian Core: a practical approach to computational Bayesian statistics.
« Older posts Newer posts »

© 2024 Palimpsest

Theme by Anders NorenUp ↑