I was attending a course of Bayesian Statistics where this problem showed up:

There is a number of individuals, say 12, who take a pass/fail test 15 times. For each individual we have recorded the number of passes, which can go from 0 to 15. Because of confidentiality issues, we are presented with rounded-to-the-closest-multiple-of-3 data (\(\mathbf{R}\)). We are interested on estimating \(\theta\) of the Binomial distribution behind the data.

Rounding is probabilistic, with probability 2/3 if you are one count away from a multiple of 3 and probability 1/3 if the count is you are two counts away. Multiples of 3 are not rounded.

We can use Gibbs sampling to alternate between sampling the posterior for the unrounded \(\mathbf{Y}\) and \(\theta\). In the case of \(\mathbf{Y}\) I used:

While for \(\theta\) we are assuming a vague \(\mbox{Beta}(\alpha, \beta)\), with \(\alpha\) and \(\beta\) equal to 1, as prior density function for \(\theta\), so the posterior density is a \(\mbox{Beta}(\alpha + \sum Y_i, \beta + 12*15 – \sum Y_i)\).

I then implemented the sampler as:

And plotted the results as:

Posterior density for Binomials's theta.

Posterior density for [latex]\theta[/latex].

Posterior mass for each rounded observation.

Posterior mass for each rounded observation.

I thought it was a nice, cute example of simultaneously estimating a latent variable and, based on that, estimating the parameter behind it.