Priors for variance parameters in hierarchical models

There’s an argument over at Andrew Gelman’s blog about the proper way to design a variance prior in a hierarchical normal model (here and here). This is more or less my go-to approach to meta-analysis (e.g. Karr, Areshenkoff, Duggan, and Garcia-Barrera, 2014; Smart, Karr, Areshenkoff, Rabin, Hudon, …, Hampel, 2017), and I’ve used both kinds of prior, so the issue matters to me. I thought I’d play around with it, though I haven’t had much time to interpret the results.

The model

The model is a common one in meta-analysis. As a simple example, say we observe N Cohen’s d effect sizes y_1,y_2,\dots,y_N, with associated sampling variances v_1^2,v_2^2,\dots,v_N^2, each given by

    \[           v^2 = \frac{n_c + n_t}{n_c n_t} + \frac{y^2}{2(n_c + n_t - 2)}      \]

where n_c and n_t are the sample sizes of the control and treatment groups, respectively. The model is then

    \begin{align*}           y_i|\theta_i &\sim \mathrm{N}(\theta_i,v_i^2) \\           \theta_i|\mu,\tau &\sim \mathrm{N}(\mu,\tau^2)      \end{align*}

This is essentially what is usually called a random-effects meta-analysis, which becomes clearer if we marginalize over the study effects to get

    \[ y_i|\mu,\tau \sim \mathrm{N}(\mu, \tau^2 + v_i^2). \]
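As a rough sketch of the two formulas above, the following Python computes the sampling variance of a Cohen’s d and the marginal log-likelihood of (\mu,\tau). The function names and the three-study dataset are made up for illustration, and only the standard library is used.

```python
import math

def cohen_d_variance(d, n_c, n_t):
    """Approximate sampling variance of Cohen's d, per the formula above."""
    return (n_c + n_t) / (n_c * n_t) + d**2 / (2 * (n_c + n_t - 2))

def marginal_loglik(mu, tau, y, v2):
    """Log-likelihood of (mu, tau) after integrating out the study effects:
    y_i ~ N(mu, tau^2 + v_i^2), independently across studies."""
    total = 0.0
    for y_i, v2_i in zip(y, v2):
        s2 = tau**2 + v2_i  # marginal variance of y_i
        total += -0.5 * (math.log(2 * math.pi * s2) + (y_i - mu) ** 2 / s2)
    return total

# Hypothetical data: three studies with 10 participants per arm.
y = [0.1, 0.4, -0.2]
v2 = [cohen_d_variance(d, 10, 10) for d in y]
print(marginal_loglik(0.0, 0.25, y, v2))
```

Note that the marginal likelihood is perfectly finite at \tau = 0, since the v_i^2 keep the variance positive; nothing in the likelihood itself pushes \tau away from the boundary.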

It’s usually better to fit this model in a Bayesian way, for three reasons: it gives us better effect estimates at the study level; it makes it easier to extend the model to more complex designs; and, finally, when the number of observations is small, \tau is generally only weakly identified, and so the model needs a fair bit of regularization, which is easiest with a strong prior.

The main issue with \tau is that variance parameters for latent effects often have likelihoods which butt up against zero, especially when the number of effects is small. This means that extremal estimates (like MLEs or MAPs) can often be exactly zero, which causes problems for the study-level effects. It also causes problems for MCMC in the region of \tau = 0, often producing divergent transitions in Stan, or failing silently in samplers like JAGS that don’t report such diagnostics. For this reason, it’s been suggested to use a boundary-avoiding prior: one whose density is zero at \tau = 0, such as a gamma distribution (with shape greater than 1), as opposed to a half-normal or half-Cauchy, whose densities are positive at zero.
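To see why an estimate of exactly zero hurts the study-level effects: conditional on \mu and \tau, the posterior mean of \theta_i is a precision-weighted average of y_i and \mu (the standard normal-normal result). A small sketch, with a function name and numbers of my own choosing, shows the weight on y_i vanishing as \tau goes to zero, so that every study is pooled completely to \mu:

```python
def conditional_theta_mean(y_i, v2_i, mu, tau):
    """E[theta_i | y_i, mu, tau] in the normal-normal model."""
    w = tau**2 / (tau**2 + v2_i)  # weight placed on the observed effect
    return w * y_i + (1 - w) * mu

# Hypothetical study: observed d = 0.8 with sampling variance 0.2, and mu = 0.
for tau in (0.0, 0.1, 0.25, 1.0):
    print(tau, conditional_theta_mean(0.8, 0.2, 0.0, tau))
```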


I simulated some data from the model as follows:

  1. Set parameters (\mu,\tau), number of studies N, and study sample size(s) (n_c,n_t)_i.
  2. Draw (\theta_1,\dots,\theta_N) \sim \mathrm{N}(\mu,\tau^2).
  3. For i = 1,\dots,N, compute y_i = \theta_i + e_i, where e_i \sim \mathrm{N}(0,v_i^2).
  4. Fit model to (y_1,v_1,\dots,y_N,v_N).
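Steps 1 to 3 can be sketched in a few lines of Python. This is not the code I actually ran, just a minimal illustration using the standard library, reading \tau and v_i as standard deviations, consistent with the model definition. One assumption to flag: the variance formula depends on the observed effect, which hasn’t been drawn yet at step 3, so here the true effect \theta_i stands in for it.

```python
import math
import random

def simulate_dataset(mu, tau, n_per_group, rng):
    """Simulate one meta-analysis dataset following steps 1-3.

    n_per_group: per-arm sample size for each study (n_c = n_t).
    Returns (y, v2, theta): observed effects, sampling variances, true effects.
    """
    y, v2, theta = [], [], []
    for n in n_per_group:
        th = rng.gauss(mu, tau)  # second argument is the SD, not the variance
        # Assumption: the true effect stands in for d in the variance formula,
        # since y_i has not been drawn yet.
        var = (n + n) / (n * n) + th**2 / (2 * (n + n - 2))
        y.append(th + rng.gauss(0.0, math.sqrt(var)))
        v2.append(var)
        theta.append(th)
    return y, v2, theta

rng = random.Random(1)
sizes = [rng.randint(10, 20) for _ in range(10)]  # N = 10 studies
y, v2, theta = simulate_dataset(mu=0.0, tau=0.25, n_per_group=sizes, rng=rng)
```

The returned y and v2 are what would be passed to the model in step 4; theta is kept only so that the error of the estimated study effects can be computed afterwards.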

Here, I set \mu = 0, \tau = .25, and varied N in the range (10,30) and n_c=n_t in the range (5,20).

For the simulation, we set \mu = 0 and n_{c,i} = n_{t,i} \sim \mathrm{Uniform}(10,20), and vary \tau and N. For the priors, I use a half-Cauchy prior with scales \sigma \in \{.1,.4,.7,1\}, or a gamma prior with shape k and rate (k-1)/.25 for k \in \{5,10,15,20\}, as suggested by Daniel Lakeland, which has a mode at .25 and becomes more tightly concentrated as k increases. This isn’t entirely fair, since we don’t know the true \tau in practice, but I haven’t had time to run a more thorough simulation yet.
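As a quick check on the two prior families, here is a small standard-library sketch. It reads (k,(k-1)/.25) as shape and rate, which is the reading under which the mode (shape - 1)/rate comes out at .25; the function names are mine. It prints the mode, standard deviation, and density at zero for each gamma prior, and the density at zero for each half-Cauchy.

```python
import math

def gamma_pdf(x, shape, rate):
    """Density of a gamma(shape, rate) distribution at x >= 0."""
    if x == 0.0:
        if shape > 1:
            return 0.0
        if shape == 1:
            return rate
        return math.inf
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1) * math.log(x) - rate * x)
    return math.exp(log_pdf)

def half_cauchy_pdf(x, scale):
    """Density of a half-Cauchy(0, scale) distribution at x >= 0."""
    return 2.0 / (math.pi * scale * (1.0 + (x / scale) ** 2))

for k in (5, 10, 15, 20):
    rate = (k - 1) / 0.25
    mode = (k - 1) / rate       # mode of a gamma with shape > 1
    sd = math.sqrt(k) / rate    # shrinks as k grows
    print("gamma", k, mode, sd, gamma_pdf(0.0, k, rate))

for scale in (0.1, 0.4, 0.7, 1.0):
    print("half-Cauchy", scale, half_cauchy_pdf(0.0, scale))
```

The gamma densities are all exactly zero at \tau = 0, while the half-Cauchy density is largest there, and more so the smaller the scale.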

For each parameter combination, I simulated 50 datasets and fit a non-centered parametrization of the model in Stan. For each fit, I recorded the number of divergent transitions, the posterior mean and .025 quantile of \tau, and the RMSE of the posterior mean study effects \hat{\theta}_i.
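For concreteness, here is a hedged sketch of how those summaries could be computed from posterior draws, in plain Python rather than whatever interface is used to call Stan. The helper names and the layout of the draws (one list of study effects per posterior draw) are assumptions for illustration. In the non-centered parametrization the sampler works with standardized effects \eta_i \sim \mathrm{N}(0,1) and recovers \theta_i = \mu + \tau\eta_i, which is what the first helper does. The number of divergent transitions comes from the sampler’s own diagnostics and isn’t computed here.

```python
import math

def noncentered_theta(mu, tau, eta):
    """Recover study effects from standardized effects: theta_i = mu + tau * eta_i."""
    return [mu + tau * e for e in eta]

def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])

def summarize_fit(tau_draws, theta_draws, theta_true):
    """Posterior mean and .025 quantile of tau, and RMSE of the posterior
    mean study effects against the true (simulated) effects.

    tau_draws: list of S draws; theta_draws: S lists, each of N study effects.
    """
    n_draws = len(tau_draws)
    n_studies = len(theta_true)
    tau_mean = sum(tau_draws) / n_draws
    tau_q025 = quantile(sorted(tau_draws), 0.025)
    theta_hat = [sum(d[i] for d in theta_draws) / n_draws for i in range(n_studies)]
    rmse = math.sqrt(
        sum((h - t) ** 2 for h, t in zip(theta_hat, theta_true)) / n_studies
    )
    return tau_mean, tau_q025, rmse
```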

The results for the Cauchy prior are:

The results for the gamma prior are:


Karr, J. E., Areshenkoff, C. N., Duggan, E. C., & Garcia-Barrera, M. A. (2014). Blast-related mild traumatic brain injury: A Bayesian random-effects meta-analysis on the cognitive outcomes of concussion among military personnel. Neuropsychology Review, 24(4), 428-444.

Smart, C. M., Karr, J. E., Areshenkoff, C. N., Rabin, L. A., Hudon, C., Gates, N., … & Hampel, H. (2017). Non-pharmacologic interventions for older adults with subjective cognitive decline: Systematic review, meta-analysis, and preliminary recommendations. Neuropsychology Review, 27(3), 245-257.