There’s an argument over at Andrew Gelman’s blog about the proper way to design a variance prior in a hierarchical normal model (here and here). Since this is more or less my go-to approach to meta-analysis (e.g. Karr, Areshenkoff, Duggan, and Garcia-Barrera, 2014; Smart, Karr, Areshenkoff, Rabin, Hudon, …, Hampel, 2017), and since I’ve used both kinds of prior, this is a pretty important issue to me, so I thought I’d play around with it, though I haven’t had much time to interpret the results.
The model is a common one in meta-analysis. As a simple example, say we observe Cohen's d effect sizes $d_1, \dots, d_k$, with associated sampling variances $\sigma_1^2, \dots, \sigma_k^2$, given by

$$\sigma_i^2 = \frac{n_c + n_t}{n_c n_t} + \frac{d_i^2}{2(n_c + n_t)},$$

where $n_c$ and $n_t$ are the sample sizes of the control and treatment groups, respectively. The model is then

$$d_i \sim \mathrm{N}(\theta_i, \sigma_i^2), \qquad \theta_i \sim \mathrm{N}(\mu, \tau^2).$$
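The sampling variance used here is the standard large-sample formula for Cohen's d, which is easy to encode directly (a minimal sketch; the function name is my own):

```python
# Large-sample variance of Cohen's d for a two-group comparison,
# with n_c control and n_t treatment subjects.
def cohens_d_variance(d, n_c, n_t):
    return (n_c + n_t) / (n_c * n_t) + d**2 / (2 * (n_c + n_t))

# Example: d = 0.5 with 20 subjects per group.
var_d = cohens_d_variance(0.5, 20, 20)
print(var_d)  # 0.103125
```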
This is essentially what is usually called a random-effects meta-analysis, which is made clearer if we marginalize over the study effects $\theta_i$ to get

$$d_i \sim \mathrm{N}(\mu, \sigma_i^2 + \tau^2).$$
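The marginal form is easy to sanity-check by simulation: drawing study effects and then observed effects should give a marginal variance of $\sigma^2 + \tau^2$ (a quick numpy sketch with made-up parameter values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, tau, sigma = 0.3, 0.2, 0.4

# Draw study-level effects, then observed effects around them.
theta = rng.normal(mu, tau, size=1_000_000)
d = rng.normal(theta, sigma)

# Marginal variance should be sigma^2 + tau^2 = 0.16 + 0.04 = 0.20.
print(d.var())
```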
It’s usually better to fit this model in a Bayesian way, for three reasons: it gives us better effect estimates at the study level; it makes it easier to extend the model to more complex designs; and, finally, when the number of observations is small, $\tau$ is generally only weakly identified, and so the model needs a fair bit of regularization, which is easiest to provide with a strong prior.
The main issue with $\tau$ is that variance parameters for latent effects often have likelihoods which butt up against zero, especially when the number of effects is small. This means that extremal estimates (like MLEs or MAPs) can often be exactly zero, which causes problems for the study-level effects. It also causes problems for MCMC in the region of $\tau = 0$, often spitting out divergences in Stan, or passing silently in samplers like JAGS that don’t report such problems. For this reason, it’s been suggested to use a boundary avoiding prior: one which places zero density at $\tau = 0$. Something like a gamma distribution, as opposed to a half-normal or half-Cauchy.
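The boundary behaviour is easy to see by evaluating the two densities at zero: the half-Cauchy is strictly positive there, while a gamma with shape greater than one vanishes. A small sketch with the densities coded by hand (stdlib only, my own helper names):

```python
import math

def half_cauchy_pdf(x, scale):
    # Density of |Cauchy(0, scale)| for x >= 0.
    return 2.0 / (math.pi * scale * (1 + (x / scale) ** 2))

def gamma_pdf(x, shape, rate):
    # Gamma density in the shape-rate parametrization.
    return rate**shape * x**(shape - 1) * math.exp(-rate * x) / math.gamma(shape)

# Half-Cauchy puts positive density at the boundary...
print(half_cauchy_pdf(0.0, scale=1.0))      # 2/pi, about 0.64
# ...while a gamma with shape > 1 is exactly zero there.
print(gamma_pdf(0.0, shape=2.0, rate=4.0))  # 0.0
```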
I simulated some data from the model as follows:
- Set parameters $\mu$ and $\tau$, the number of studies $k$, and the per-group study sample size $n$.
- Draw study effects $\theta_i \sim \mathrm{N}(\mu, \tau^2)$, for $i = 1, \dots, k$.
- For $i = 1, \dots, k$, compute $d_i = \theta_i + e_i$, where $e_i \sim \mathrm{N}(0, \sigma_i^2)$.
- Fit the model to $d_1, \dots, d_k$.
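The data-generating steps above can be sketched in numpy (the parameter values here are placeholders, not the ones used for the actual simulation):

```python
import numpy as np

def simulate_meta(mu, tau, k, n, rng):
    """Simulate k Cohen's d effect sizes with n subjects per group."""
    theta = rng.normal(mu, tau, size=k)          # study-level true effects
    # Sampling variances from the Cohen's d formula, using the true
    # effects in the variance term for simplicity.
    var_d = (n + n) / (n * n) + theta**2 / (2 * (n + n))
    d = rng.normal(theta, np.sqrt(var_d))        # observed effect sizes
    return d, var_d

rng = np.random.default_rng(1)
d, var_d = simulate_meta(mu=0.3, tau=0.2, k=10, n=25, rng=rng)
```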
For the simulation, I fixed $\mu$ and $\tau$, and varied the number of studies $k$ and the per-group sample size $n$ over a grid. For the priors, I use either a half-Cauchy prior with a range of scales, or a gamma prior with shape $\alpha$ and rate $(\alpha - 1)/\tau$ for $\tau$, as suggested by Daniel Lakeland, which has its mode at the true $\tau$, and becomes more tightly concentrated as $\alpha$ increases. This isn’t entirely fair, since we don’t know the true $\tau$ in practice, but I haven’t had time to run a more thorough simulation yet.
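If the gamma prior is parametrized by shape $\alpha$ and rate $(\alpha - 1)/\tau$ (my reading of the suggestion: a family whose mode sits at the true $\tau$), its mode and tightening behaviour are easy to verify numerically:

```python
import math

tau = 0.3  # placeholder "true" value
for alpha in [2.0, 5.0, 20.0]:
    rate = (alpha - 1) / tau
    mode = (alpha - 1) / rate        # mode of Gamma(shape=alpha, rate)
    sd = math.sqrt(alpha) / rate     # standard deviation
    print(f"alpha={alpha}: mode={mode:.3f}, sd={sd:.3f}")
# The mode stays at tau = 0.3 while the sd shrinks as alpha grows.
```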
For each parameter combination, I simulated 50 datasets and fit a non-centered parametrization of the model in Stan. For each fit, I recorded the number of divergent transitions, the posterior mean and quantiles of $\tau$, and the RMSE of the posterior mean study effects $\hat{\theta}_i$.
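For reference, the non-centered parametrization replaces $\theta_i \sim \mathrm{N}(\mu, \tau^2)$ with $\theta_i = \mu + \tau z_i$, $z_i \sim \mathrm{N}(0, 1)$, which moves the sampler away from the funnel geometry near $\tau = 0$. A quick numpy check that the two parametrizations define the same distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, tau, n = 0.3, 0.2, 1_000_000

z = rng.standard_normal(n)
theta_nc = mu + tau * z             # non-centered draws
theta_c = rng.normal(mu, tau, n)    # centered draws

# Same distribution: means and standard deviations should match.
print(theta_nc.mean(), theta_c.mean())
print(theta_nc.std(), theta_c.std())
```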
The results for the Cauchy prior are:
The results for the gamma prior are:
Karr, J. E., Areshenkoff, C. N., Duggan, E. C., & Garcia-Barrera, M. A. (2014). Blast-related mild traumatic brain injury: a Bayesian random-effects meta-analysis on the cognitive outcomes of concussion among military personnel. Neuropsychology review, 24(4), 428-444.
Smart, C. M., Karr, J. E., Areshenkoff, C. N., Rabin, L. A., Hudon, C., Gates, N., … & Hampel, H. (2017). Non-pharmacologic interventions for older adults with subjective cognitive decline: systematic review, meta-analysis, and preliminary recommendations. Neuropsychology review, 27(3), 245-257.