This AIC looks way more fun than the other AIC for (soft toy) model selection.
Category: stats
There is no logical warrant for considering an event known to occur in a given hypothesis, even if infrequently, as disproving the hypothesis.
Joseph Berkson in “Tests of significance considered as evidence”. 1942. Journal of the American Statistical Association 37: 325-335.
Over at the birdsite dumpster fire, Emily Harvey was asking:
do you know of any good guidelines/advice for what one should do to sense check and make sure they understand any data before using it?
I replied the following:
Typically, I am either very familiar with the type of data and its variables (if it is one of my trials) or I chat/email multiple times with the owner of the dataset(s), so I can check that:
- units and recorded values match. If units are mm, for example, the magnitudes should make sense in mm.
- the order of assessments matches the experimental/sampling design: people often get lost in trials or when doing data collection, recording the wrong sampling-unit codes.
- dates are OK. I prefer ISO format (2023-04-07); either way, dates are often a problem when dealing with Excel data.
- if we are using environmental data, that it matches my expectations for the site. I have found a few weather-station problems this way, where rainfall was too low because of a sensor failure.
- the relationships between variables are OK. Examples of problems: trees that are too tall and skinny, or too fat and short, are suspicious (unless broken, etc.); diameter under bark should be smaller than diameter over bark; and so on.
- levels of factors match the planned levels (typically spelling mistakes create extra levels). Same issue with locality names.
- map coverage/orientation is OK (sometimes maps are sideways). Am I using the right projection?
- joins retain the appropriate number of rows (I mean table joins using merge or left_join in R, etc).
- Missing values! Are NAs coded correctly, or as zeros or negative numbers? Are they missing “at random”?
- If longitudinal data: are older observations larger (or do we get shrinking trees?)
- etc
Of course these questions are dataset dependent and need to be adapted to each separate situation. Finally: Do results make any sense?
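Several of the checks above are mechanical enough to script. The post mentions R (merge, left_join), but here is a minimal sketch in plain Python instead; the dataset, column names, plausible ranges, and factor levels are all hypothetical, invented for illustration:

```python
# Sketches of routine data sanity checks on a tiny, made-up tree dataset.
# Column names, plausible ranges, and factor levels are all hypothetical.

records = [
    {"plot": "A", "species": "E. nitens", "height_m": 12.3, "dbh_mm": 180, "rain_mm": 45.0},
    {"plot": "B", "species": "E. nitens", "height_m": 2.1,  "dbh_mm": 350, "rain_mm": 0.0},
    {"plot": "A", "species": "E. Nitens", "height_m": 15.0, "dbh_mm": 210, "rain_mm": -1.0},
]

# Units and magnitudes: heights recorded in metres should sit in a plausible range.
odd_heights = [r for r in records if not (0.5 <= r["height_m"] <= 90)]

# Factor levels: spelling mistakes create extra, unplanned levels.
planned_species = {"E. nitens"}
extra_levels = {r["species"] for r in records} - planned_species

# Relationships between variables: very thick but very short trees are suspicious.
suspicious = [r for r in records if r["dbh_mm"] > 300 and r["height_m"] < 5]

# Missing values coded as impossible numbers (here, negative rainfall).
bad_rain = [r for r in records if r["rain_mm"] < 0]

# Joins keep the expected number of rows: this left-join-like lookup
# should neither drop nor duplicate records.
site_aspect = {"A": "north", "B": "south"}
joined = [{**r, "aspect": site_aspect.get(r["plot"])} for r in records]

print(len(odd_heights), sorted(extra_levels), len(suspicious), len(bad_rain), len(joined))
```

In a real workflow each of these lists would be inspected (or asserted empty) before any analysis; the point is that most of the checklist can be expressed as simple, repeatable predicates rather than one-off eyeballing.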
Null hypotheses of no difference are usually known to be false before the data are collected … when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science
Savage (1957), cited by Nelder (1999) in “From Statistics to Statistical Science”. The Statistician 48(2): 257-269.
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
John W. Tukey in “Sunset Salvo”. 1986. The American Statistician 40(1): 72-76.