Categories
flotsam python r rblogs stats

Flotsam 13: early July links

Man flu kept me at home today, so I decided to do something ‘useful’ and go for a linkathon:

Sometimes people are truthful and cruel. Here Gappy on a mission goes for the jugular:

https://twitter.com/gappy3000/status/354063247814561792

Over and out.

Categories
flotsam rblogs

Flotsam 12: early June linkathon

A list of interesting R/Stats quickies to keep the mind distracted:

  • A long draft of Advanced Data Analysis from an Elementary Point of View by Cosma Shalizi, in which he uses R to drive the message home. Not your average elementary point of view.
  • Good notes by Frank Davenport on getting started with R using data from a Geographic Information System (GIS). Read them to get a general idea of how things fit together.
  • If you are into maps, Omnia sunt Communia! provides many good tips on producing them using R.
  • Mark James Adams reminded us that Prediction ≠ Understanding, probably inspired by Dan Gianola’s course on Whole Genome Prediction. Gianola is a monster of Bayesian applications to genetic evaluation.
  • If you are into data/learning visualization you have to watch Bret Victor’s presentation on Media for thinking the unthinkable. He is so far ahead of what we normally do that it is embarrassing.
  • I follow mathematician Atabey Kaygun on Twitter and since yesterday I’ve been avidly reading his coverage of the protests in Turkey. Surely there are more important things going on in the world than the latest R gossip.

I’m marking too many assignments right now to have enough time to write something more substantial. I can see the light at the end of the tunnel though.

Categories
books flotsam rblogs sas

Flotsam 11: mostly on books

‘No estaba muerto, andaba de parranda’, as the song says. Although rather than partying, it has mostly been reading, taking pictures and trying to learn how to record sounds. Here are some things I’ve come across lately.

I can’t remember if I’ve recommended Matloff’s The Art of R Programming before; if I haven’t, go and read the book for a good exposition of the language. Matloff also has an open book (as in free PDF, 3.5MB) entitled ‘From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science’. The download link is near the end of the page. He states that the reader ‘must know calculus, basic matrix algebra, and have some minimal skill in programming’, which incidentally is the bare minimum for someone who wants to get a good handle on stats. In my case I learned calculus partly with Piskunov’s book (I’m a sucker for Soviet books, free DjVu), matrix algebra with Searle’s book and programming with… that’s another story.

I’ve ordered a couple of books from CRC Press, which I hope to receive soon (it depends on how long it takes for the parcel to arrive in the middle of nowhere):

  • Stroup’s Generalized Linear Mixed Models: Modern Concepts, Methods and Applications, which according to the blurb comes ‘with numerous examples using SAS PROC GLIMMIX’. You may be wondering: why is he reading a book that includes SAS as a selling point? Well, SAS is a very good statistical system that still has a fairly broad installed base. However, the real selling point is that I’ve read some explanations of mixed models written by Stroup, and he has a superb understanding of the topic. I’m really looking forward to putting my paws on this book.
  • Lunn et al.’s The BUGS Book: A Practical Introduction to Bayesian Analysis. I don’t use BUGS but occasionally use JAGS, and one of the things that irks me about programs like BUGS, JAGS or INLA is that they follow the ‘here is a bunch of examples’ approach to documentation. This book is supposed to provide a much more detailed account of the ins and outs of fitting models and to work as a proper manual. Or at least that’s what I’m hoping to find in it.

Finally, a link to a fairly long (and somewhat old) list of R tips and the acknowledgements of a PhD thesis that will make you smile (via Arthur Charpentier).

Gratuitous picture: frozen fence (Photo: Luis).

‘He was not dead, he was out partying’.

Categories
flotsam python

Pythonic links

Before I forget: a few links about getting started with Python for scientific projects:

Now if we had a great Python library for linear mixed models life would be easier.

Categories
flotsam photos rblogs

Mid-September flotsam

This is one of those times of the year: struggling to keep my head above water, roughly one month before the last lecture of the semester. On top of that, I’m trying to squeeze trips, meetings and presentations in between while dealing with man flu.

Gratuitous picture: looking for peace in Japan (Photo: Luis).
Categories
flotsam photos rblogs stats

Mid-August flotsam

I’ve reached the mid-semester point, with quite a few new lectures to prepare. Nothing extremely complicated but, as always, the tricky part is finding a way to make it meaningful and memorable. Sometimes, and this is one of those times, I sound like a broken record, but I’m a bit obsessive about helping people to ‘get’ a topic.

Gratuitous picture: Lola, Lisbon, Portugal (Photo: Luis).
Categories
flotsam photos rblogs

Early August flotsam

Back to teaching a couple of subjects, and with it the constant challenge of finding enough common ground with students so one can push/pull them to the other side of new concepts. We are not talking about complex hierarchical models using mixed models or Bayesian approaches, but about multiple linear regression or similar. What do students actually learn in first-year stats…?

  • I’m enjoying reading Machine Learning for Hackers by Drew Conway and John Myles White. There isn’t a lot of new stuff for me in the book (although working with text is not something I usually do), but I have chosen to read it with newbie eyes. I’m (repeating myself) looking for enough common ground with students so one can push/pull them to the other side of new concepts and, let’s face it, I was 20 quite a few years ago.
  • An observation from teaching a lab for STAT202, in which many students are using R for the first time: do you remember your first steps in S+/R? Some students see the light quickly, while others struggle to get their heads around giving commands to a computer (without clicking on icons).
  • Videos and screencasts on using IPython via Vince Buffalo.
  • This tweet by @isomorphisms resonated with me: ‘Someday I hope to be reading more Penguin Classics than John Wileys & Springer Verlags’.
  • Tom points to an explanation of ‘What really shoots out of spiderman’s modified forelimbs, and why this causes such consternation’.
  • I have to convince the College IT guys to install RStudio on a few hundred computers. RStudio keeps getting better, which makes it obscene to subject students to the naked R for Windows installation without syntax highlighting.
  • Finally, reasons why men should not write advice columns via Arthur Charpentier.
Derelict house in Sintra, Portugal (Photo: Luis).
Categories
flotsam rblogs

End of May flotsam

The end is near! At least the semester is coming to an end, so students have crazy expectations like getting marks back for assignments, and administrators want to see exam scripts. Sigh! What has been happening meanwhile in Quantum Forest?

  • Tom cracked me up with “…my data is so fucking patchy. I’m zipoissoning the place up like a motherfucker, or something”. I probably need to embark on some zipoissoning (fitting zero-inflated Poisson models), and he was kind enough to send me some links.
  • People keep on kicking this guy called “p-value” while he is still unconscious on the floor. Bob O’Hara declares that p-values are evil. Not funny! John Cook reminds us that “The language of science is the language of probability, and not of p-values” (Luis Pericchi). Actually, these days the language of Science is English, or whatever passes for English in a press release.
  • Discussion with Mark about the canonical pronunciation for MCMCglmm: mac-mac-glim, em-see-em-see-glim or Luis’s dumb em-see-em-see-gee-el-em-em. We need a press release from Jarrod Hadfield to clear the air!
  • RStudio now supports knitr; I’m looking forward to being able to send email from it. Wait, then it would be like a pretty Emacs.
Unfortunately named Fiat dealer in Southern Brazil. Ideal if you want to zipoisson your way around. Locals told me that it was a German surname, pronounced Fook. Mmh. (Photo: Luis)
  • Did you know? There is life beyond R. Pandas keeps on growing (if Python is your thing). Douglas Bates keeps on digging Julia. I ‘discovered’ Bartosz Milewski’s blog, which I enjoy reading although I understand only a small fraction of what he’s talking about. I came across Bartosz while looking for information on using supercomputers.
  • Data points: “How do you know you have an ageing economy? Adult nappy sales are more than kids’ nappy sales. That’s Japan now.” tweeted Bernard. “Crap!” was my reaction (nappy = diaper for US readers).
  • Feeling frustrated using R? Just go for some language Schadenfreude at Abandon Matlab.
  • Going to Auckland? Our agent Mike has just the place to go: “La Voie Française (875 Dom Rd) is worth a trip. Great baguettes, $2 flaky croissants, queue out the door”.
  • Still shaking in Christchurch. Last Monday I was teaching while we had a 5.2 magnitude quake; we kept on going with the lecture.
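‘Zipoissoning’ refers to zero-inflated Poisson (ZIP) models. In these, each count is either a structural zero (with probability π) or a draw from a Poisson(λ) process, so P(Y = 0) = π + (1 − π)e^(−λ). A minimal base-R sketch follows. It simulates ZIP data with made-up values (π = 0.4, λ = 3, chosen only for illustration) and recovers them by maximum likelihood with optim(). In real work one would more likely reach for a dedicated function such as zeroinfl() in the pscl package.

```r
set.seed(123)
n <- 5000
pi.true <- 0.4       # hypothetical probability of a structural zero
lambda.true <- 3     # hypothetical Poisson mean of the count process

# Simulate: structural zeros mixed with ordinary Poisson counts
structural.zero <- rbinom(n, 1, pi.true) == 1
y <- ifelse(structural.zero, 0, rpois(n, lambda.true))

# Negative log-likelihood of the ZIP model; parameters live on
# unconstrained scales (logit for pi, log for lambda)
zip.nll <- function(par, y) {
  p <- plogis(par[1])
  lambda <- exp(par[2])
  ll.zero <- log(p + (1 - p) * exp(-lambda))          # zeros from either source
  ll.pos <- log(1 - p) + dpois(y, lambda, log = TRUE) # positive counts
  -sum(ifelse(y == 0, ll.zero, ll.pos))
}

fit <- optim(c(0, 0), zip.nll, y = y)
p.hat <- plogis(fit$par[1])
lambda.hat <- exp(fit$par[2])
cat('Estimated zero-inflation:', round(p.hat, 3), '\n')
cat('Estimated Poisson mean:  ', round(lambda.hat, 3), '\n')

# Excess zeros: observed share versus a plain Poisson with the same mean
cat('Observed zeros:', mean(y == 0), ' Poisson-implied zeros:', dpois(0, mean(y)), '\n')
```

The last line shows why a plain Poisson model struggles with patchy data: it predicts far fewer zeros than were observed.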

And that’s my view of the month from down under Christchurch.

Categories
flotsam julia python r rblogs

Late-April flotsam

It has been a month and a half since I compiled a list of statistical/programming internet flotsam and jetsam.

  • Via Lambda The Ultimate: Evaluating the Design of the R Language: Objects and Functions For Data Analysis (PDF). A very detailed evaluation of the design and performance of R. HT: Christophe Lalanne. If you are into statistical genetics and on Twitter, Christophe is the man to follow.
  • Attributed to John Tukey, “without assumptions there can be no conclusions” is an extremely important point, which comes to mind when listening to the fascinating interview with Richard Burkhauser on changes in income for the middle class in the USA. Changing the definition of the unit of analysis may give a completely different result. By the way, does anyone have a first-hand reference for Tukey’s quote?
  • Nature News publishes RNA studies under fire: High-profile results challenged over statistical analysis of sequence data. I expect to see this happening more often once researchers get used to uploading the data and code for their papers.
  • Bob O’Hara writes on Why simple models are better, which is not positive towards the machine learning crowd.
  • A Matlab Programmer’s Take On Julia, and a Python developer interacts with Julia developers. Not everything is smooth. HT: Mike Croucher.
  • Dear NASA: No More Rainbow Color Scales, Please. HT: Mike Dickinson. Important: this applies to R graphs too.
  • Rafael Maia asks “are programmers trying on purpose to come up with names for their languages that make it hard to google for info?” These are the suggestions if one searches Google for Julia:

    Unhelpful search suggestions.
  • I suggest creating a language called Bieber and searching for dimension Bieber, loop Bieber and regression Bieber.
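Since the plea against rainbow scales applies to R graphs too, here is a small sketch comparing rainbow() with a sequential palette from hcl.colors(). hcl.colors() has shipped with base R since version 3.6.0. The example uses the built-in volcano elevation data. The "viridis" palette is just one reasonable choice; hcl.pals() lists the other palettes available.

```r
n.col <- 100
rainbow.cols <- rainbow(n.col)                         # hue wheel, uneven lightness
viridis.cols <- hcl.colors(n.col, palette = "viridis") # sequential HCL palette

# Side-by-side comparison on the same data
old.par <- par(mfrow = c(1, 2))
image(volcano, col = rainbow.cols, main = "rainbow()")
image(volcano, col = viridis.cols, main = "hcl.colors(viridis)")
par(old.par)
```

In the rainbow panel, the eye picks up artificial boundaries where hues change. In the viridis panel, lightness follows elevation.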

That’s all folks.

Categories
flotsam python r rblogs

Early-March flotsam

It has been a strange ten days since we unexpectedly entered grant-writing mode. I was looking forward to working on this near the end of the year, but a likely change in funding agency priorities means applying within a few weeks. Unfortunately, all of this is happening while I am teaching.

  • As usual, I got involved in a project that is strange for me: it will require semantic analysis of international treaties. I will start by having a look at Latent Semantic Analysis using lsa in R and gensim in Python. I’ll have to retrieve documents from the web and process them in quite a few ways.
  • The success of some of Hadley Wickham’s packages got me thinking about underlying design issues in R that make functions so hard for users to master. Don’t get me wrong, I still think that R is great, but why is part of the core functionality so hard to understand? A quick web search will show, for example, an incredible amount of confusion on how to use the apply family of functions. The management of dates and strings is also a sore point. I perfectly understand the need for, and even the desirability of, new packages that extend the functionality of R. However, this is another kettle of fish: we are talking about making sane design choices, so there is no need to repackage basic functionality to make it usable.
  • Talking about failures, Andrew Gelman mentions the sempiternal problem of designers Turn(ing) a Boring Bar Graph into a 3D Masterpiece or, as a commenter put it, “Turn(ing) a Boring Bar Graph into a 3D Pile of Steaming Crap”. While it is always easy to have a laugh at designers, we should remember that the abundance of 3D piles[…] also reflects our failure to make the case for good data presentation clear. Well, that and the spawn of evil Microsoft Excel and PowerPoint.
  • Gratuitous picture of a vegetarian friend: caterpillar of Emperor gum moth (Photo: Luis).
  • Beware if you are going out for dinner with vegetarian friends. Besides vegetarians having an inordinate influence on the choice of restaurant you may end up subsidizing their meals. HT: @EricCrampton.
  • A cool collection of movie snippets that display mathematics. HT: @Freakonometrics.
  • http://numfocus.org, a foundation for supporting scientific computing in Python. HT: @teoliphant.
  • Andrew Gelman again, this time pointing to Thaddeus Tarpey’s presentation All models are right… most are useless (PDF), focusing on the positive aspects of model approximation.
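To give a flavour of what Latent Semantic Analysis does, here is a toy sketch in base R. It deliberately avoids the lsa package so as not to guess at its API. The steps are: build a term-document matrix, take a truncated singular value decomposition (SVD), and compare documents by cosine similarity in the reduced ‘concept’ space. The four ‘treaties’ are made-up one-liners. The sketch also skips the stop-word removal and term weighting (e.g. tf-idf) that real LSA usually applies.

```r
# Hypothetical mini corpus standing in for treaty texts
docs <- c(
  treaty1 = "fishing rights in territorial waters",
  treaty2 = "territorial waters and fishing quotas",
  treaty3 = "trade tariffs on imported goods",
  treaty4 = "tariffs and quotas on goods trade"
)

# Tokenise: lower-case and split on anything that is not a letter
tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"), function(w) w[nchar(w) > 0])
names(tokens) <- names(docs)
vocab <- sort(unique(unlist(tokens)))

# Term-document matrix of raw counts: terms in rows, documents in columns
tdm <- sapply(tokens, function(w) as.vector(table(factor(w, levels = vocab))))
rownames(tdm) <- vocab

# Truncated SVD: keep k latent 'concepts'
k <- 2
s <- svd(tdm)
doc.space <- s$v[, 1:k] %*% diag(s$d[1:k], nrow = k)
rownames(doc.space) <- colnames(tdm)

# Cosine similarity between documents in the latent space
unit.rows <- doc.space / sqrt(rowSums(doc.space^2))
sim <- tcrossprod(unit.rows)
print(round(sim, 2))
```

The fishing treaties should end up close to each other and far from the trade ones, even though treaty2 and treaty4 share a couple of words.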
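On the apply-family confusion, here is a quick tour of what each base R member takes and returns, using made-up toy data:

```r
m <- matrix(1:6, nrow = 2)          # 2 x 3 matrix, filled column by column
row.sums <- apply(m, 1, sum)        # MARGIN = 1 works over rows:    9 12
col.sums <- apply(m, 2, sum)        # MARGIN = 2 works over columns: 3 7 11

x <- list(a = 1:3, b = 4:6)
means.list <- lapply(x, mean)               # always returns a list
means.vec <- sapply(x, mean)                # simplifies to a named vector when it can
means.safe <- vapply(x, mean, numeric(1))   # like sapply, but the output type is declared

scores <- c(10, 20, 30, 40)
groups <- c("g1", "g2", "g1", "g2")
group.sums <- tapply(scores, groups, sum)   # one result per group: g1 40, g2 60

# Element-wise over several vectors at once: 5 7 9
pair.sums <- mapply(function(a, b) a + b, 1:3, 4:6)
```

Much of the confusion comes from sapply() changing its return type depending on the results. vapply() is the safer choice inside functions.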

And that’s all folks.

Categories
flotsam photos python r rblogs

Mid-February flotsam

This coming Monday we start the first semester in Canterbury (and in New Zealand for that matter). We are all looking forward to an earthquake-free year; more realistically, I’d be happy with low magnitude aftershocks.

  • The Wall Street Journal reports that more pediatricians are ‘firing’ patients that refuse to use vaccines. I’m wondering about practices that will cluster with ‘vaccine refusers’.
  • I am collaborating with a researcher in Electrical Engineering; he and his students develop very cool tools for us (see an example in this previous post on dealing with autocorrelation in mixed models). They use Python to control the tools, extract data and do some basic processing (isn’t that cool?). Python + SciPy have moved a long way towards a nice environment for scientific computing; however, in my opinion setting up an R environment is way easier than dealing with all the versions of Python, wxPython, etc. In the end I went for the 32-bit version in OS X because 64-bit is not supported (wxPython targets Carbon, meh).
  • Why not publish your data too? HT: @chlalanne. I’ll be uploading datasets that I’ve used in my papers (at least the ones that do not contain industry/confidential data).
  • Timandra Harkness asks Have you been seduced by statistics? in Significance Journal (free access). In the same issue Matt Briggs asks another important question Why do statisticians answer silly questions that no one ever asks?
  • How companies learn your secrets: the creepy side of statistics/machine learning.
  • Feeling all smug in a discussion in Engineering: people struggling with software licensing (Office, Matlab, etc). I pointed out that, with R, we had no licensing issues keeping us from being at the bleeding edge when teaching. Take that!
  • In the nothing-to-do-with-stats category, Retronaut displays a beautiful collection of Soviet space propaganda posters. HT: can’t remember.
Supermarkets in New Zealand: are they creepy too? Checkouts at Pak N Save, Christchurch (Photo: Luis).

I’ve been running analyses for two to three papers, mostly using R + asreml-R + ggplot2 + plyr. I hope to write one of them using XeLaTeX (I’ll be the sole author), while for the other(s) I am condemned to MS Word (!). I’m thinking of journals to send the papers to, with the most liberal copyright possible (hard in forestry).

P.S. Flotsam posts compile small bits of information, twitter favorites, shared links, etc.

Categories
flotsam r rblogs teaching

Early-February flotsam

Mike Croucher at Walking Randomly points out an interesting difference in how several mathematical packages evaluate the simple operation 2^3^4, which comes down to the associativity of the exponent operator. It is pretty much a divide between two camps. On one side are Matlab and Excel (does the latter qualify as mathematical software?), which group left to right and return 4096, i.e. (2^3)^4. On the other are Mathematica, R and Python (where the operator is written 2**3**4), which group right to left and return 2417851639229258349412352, i.e. 2^(3^4). Remember your parentheses…
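A quick check in R, where ^ is right-associative. Note that R prints the large result in floating point (2.417852e+24) rather than as the exact integer; 2^81 is a power of two, so the double itself is exact.

```r
right.assoc <- 2^3^4      # parsed as 2^(3^4) = 2^81
left.assoc <- (2^3)^4     # explicit parentheses: 8^4
print(right.assoc)        # [1] 2.417852e+24
print(left.assoc)         # [1] 4096
identical(right.assoc, 2^(3^4))
```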

Corey Chivers, aka Bayesian Biologist, uses R to help students understand the Monty Hall problem. I think a large part of the difficulty in grokking it stems from a convenient distraction: opening doors. The problem could be reframed as follows. (i) You pick a door (so your probability of winning the prize is 1/3) and Monty gets the other two doors (probability of winning 2/3). (ii) Monty is offering to swap all his doors for yours, so switching increases the probability of winning. (iii) Monty will never open a winning door to entice the switch, so we should forget about the doors he opens.

To make the point clearer, let’s imagine now that instead of 3 doors the game has 10 doors. You pick one (probability of winning 1/10) and Monty keeps 9 (probability of winning 9/10). Would you switch one door for nine? Of course! The fact that Monty will open 8 non-winning doors rather than all of his doors does not make a difference in the deal.

[sourcecode language="R"]
# Number of games and doors
n.games = 10000
n.doors = 10

# Assign prize to door for each game. Remember:
# Monty keeps all doors not chosen by player
prize.door = sample(n.doors, n.games, replace = TRUE)
player.door = sample(n.doors, n.games, replace = TRUE)

# The player wins without switching only when
# prize.door and player.door are the same
are.same = prize.door == player.door
cat('Probability of winning by not switching', mean(are.same), '\n')
cat('Probability of winning by switching', mean(!are.same), '\n')
[/sourcecode]
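The vectorised simulation above never makes Monty open any doors. As a sanity check of the claim that opening eight losing doors does not change the deal, here is a deliberately loop-based sketch. Monty opens every door except the player’s pick and one other, and he always leaves the prize door closed when he holds it. Staying should win about 1/n of the time and switching about (n − 1)/n, i.e. roughly 0.1 and 0.9 for ten doors.

```r
set.seed(1)
n.games <- 10000
n.doors <- 10
wins.stay <- 0
wins.switch <- 0

for (g in seq_len(n.games)) {
  prize <- sample.int(n.doors, 1)
  pick <- sample.int(n.doors, 1)
  monty.doors <- setdiff(seq_len(n.doors), pick)
  # Monty opens all his doors but one and never reveals the prize:
  # if he holds the prize, that door stays closed; otherwise he
  # leaves a random losing door closed
  if (prize %in% monty.doors) {
    closed <- prize
  } else {
    closed <- monty.doors[sample.int(length(monty.doors), 1)]
  }
  wins.stay <- wins.stay + (pick == prize)
  wins.switch <- wins.switch + (closed == prize)
}

cat('Win rate staying:  ', wins.stay / n.games, '\n')
cat('Win rate switching:', wins.switch / n.games, '\n')
```

Exactly one of the two strategies wins each game, so the two rates always add up to one.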

Gratuitous picture: fish in New Brighton pier (Photo: Luis).

Pierre Lemieux reminds us that “a dishonest statistician is an outliar”.

If you want to make dulce de leche from condensed milk but lack a pressure cooker, use an autoclave for 50 to 60 minutes. HT: Heidi Smith. Geeky, and one needs an autoclave worth thousands of dollars, but that’s what universities are for.

Lesser and Pearl inform us that there are at least 20 modalities for making statistics fun in “Functional Fun in Statistics Teaching: Resources, Research and Recommendations”. HT: Chelsea Heaven. I’ve used music, videos, cartoons, jokes, striking examples using body parts, quotations, food, juggling, etc.

An old review of Buddhism without Beliefs: A Contemporary Guide to Awakening by Stephen Batchelor. I can’t see any statistical angle, but I liked that book.

P.S. Awesome video by OK Go HT: Eric Crampton.

Back to quantitative genetics!