Doing meta-analysis II

I recently collaborated on a meta-analysis investigating the effects of blast-related (i.e. *BOOM*) mild traumatic brain injury (mTBI) on cognitive performance (Karr et al., 2014). Each of the eight included studies used control and mTBI groups and reported means and standard deviations for the outcome measures, so we used a Cohen’s d effect size:

    \[ d = \frac{\bar{x}_1 - \bar{x}_2}{s} \]

    \[ s = \sqrt{\frac{ (n_1 - 1)s^2_1 + (n_2 - 1)s^2_2 }{n_1 + n_2 - 2}} \]

Here s is the pooled standard deviation of the two groups, and the variance of d is

    \[ \text{Var}(d) = \left ( \frac{n_1 + n_2}{n_1n_2} + \frac{d^2}{2(n_1 + n_2 - 2)} \right ) \left ( \frac{n_1 + n_2}{n_1 + n_2 - 2} \right ) \]
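As a rough sketch, the formulas above translate almost line for line into code. The function name `cohens_d` and the input numbers are made up for illustration; they are not taken from any of the included studies.

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Return (d, Var(d)) using the pooled SD and the variance formula above."""
    # Pooled variance s^2 of the two groups.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    d = (mean1 - mean2) / math.sqrt(pooled_var)
    n = n1 + n2
    var_d = (n / (n1 * n2) + d**2 / (2 * (n - 2))) * (n / (n - 2))
    return d, var_d

# Hypothetical summary statistics for one outcome measure.
d, var_d = cohens_d(mean1=45.0, mean2=50.0, sd1=10.0, sd2=10.0, n1=20, n2=20)
print(f"d = {d:.3f}, Var(d) = {var_d:.4f}")
```

Note that Var(d) depends on both the group sizes and the size of d itself, which is why effect sizes from the same study can have different variances.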

Simple enough, except that each study reported multiple outcome measures (e.g. several different neuropsychological tests). We could just average together all of the effect sizes within each study to get an estimate of the study effect, but a better way is to explicitly incorporate the within-study variability into the model.

The model is just normals stacked on normals. Let y_{ij} be the j-th effect size from the i-th study. Different effect sizes, even within the same study, have different variances (they have different magnitudes, and different outcome measures might have different n's due to missing data, etc.). We therefore assume that each observed effect size estimates a true effect size for that outcome measure, so that

    \[ y_{ij} \sim \mathrm{N}(\mu_{\text{effect}_{ij}}, \text{Var}(y_{ij})) \]

Moreover, we claim that all of the true effect sizes in a study are drawn from a study level distribution of effect sizes, so that

    \[ \mu_{\text{effect}_{ij}} \sim \mathrm{N}(\mu_{\text{study}_i}, \sigma^2_i) \]

and we assume that each of the study-level true effects is drawn from a common distribution

    \[ \mu_{\text{study}_i} \sim \mathrm{N}(\mu, \tau^2) \]

More conceptually, the first step in the model is to use the multiple outcome measures within a study to estimate a true effect for that study. The problem is that some effect sizes have greater variance than others, and we want the effects with higher precision to be given more weight. Then, we want to use the study level effects to estimate the “true” global effect (i.e. the true effect of blast injury on cognitive performance), and again we want those studies that were estimated with the greatest precision to be given greater weight.
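The simplest version of "more precise estimates get more weight" is an inverse-variance weighted mean. The sketch below is only meant to show the intuition: it ignores \sigma^2_i and \tau^2, which the full model adds to the variances before weighting, and the function name and numbers are invented for illustration.

```python
def precision_weighted_mean(effects, variances):
    """Inverse-variance weighted mean and the variance of that mean."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return mean, 1.0 / total

# Two hypothetical effect sizes: the first is estimated far more precisely.
effects = [-0.2, -0.8]
variances = [0.05, 0.20]
mean, var = precision_weighted_mean(effects, variances)
print(f"weighted mean = {mean:.2f}, plain mean = {sum(effects) / len(effects):.2f}")
```

The weighted mean is pulled toward the precise estimate (-0.2), while the plain average treats both effect sizes as equally trustworthy.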

An alternate way to write the model, if you’re a fan of regression, is to note that y_{ij} approximates \mu_{\text{effect}_{ij}} with error, and so

    \[y_{ij} = \mu_{\text{effect}_{ij}} + \epsilon_{\text{effect}_{ij}}\]

but \mu_{\text{effect}_{ij}} approximates \mu_{\text{study}_i}, and so on, and so our random effects model can be written

    \[y_{ij} = \mu + \epsilon_{\text{global}_i} + \epsilon_{\text{study}_{ij}} + \epsilon_{\text{effect}_{ij}}\]


    \[ \begin{aligned} \epsilon_{\text{global}_i} &\sim \mathrm{N}(0,\tau^2) \\ \epsilon_{\text{study}_{ij}} &\sim \mathrm{N}(0,\sigma_i^2) \\ \epsilon_{\text{effect}_{ij}} &\sim \mathrm{N}(0,\mathrm{Var}(y_{ij})) \end{aligned} \]

which makes it clear that all we’re really doing is modelling different sources of variation.
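One way to convince yourself of this is to generate fake data by walking down the three levels. This is a minimal sketch under my own naming (`simulate`, `sigma2`, `var_y` are hypothetical), with all parameter values invented; note that `random.gauss` takes a standard deviation, so the variances are square-rooted.

```python
import random

def simulate(mu, tau2, sigma2, var_y, seed=0):
    """Draw effect sizes from the three-level model.

    sigma2[i] is the variance of true effects within study i;
    var_y[i][j] is the sampling variance of effect j in study i.
    """
    rng = random.Random(seed)
    data = []
    for i, study_vars in enumerate(var_y):
        # Study-level true effect: mu + epsilon_global.
        mu_study = rng.gauss(mu, tau2 ** 0.5)
        row = []
        for v in study_vars:
            # Outcome-level true effect: adds epsilon_study.
            mu_effect = rng.gauss(mu_study, sigma2[i] ** 0.5)
            # Observed effect size: adds epsilon_effect.
            row.append(rng.gauss(mu_effect, v ** 0.5))
        data.append(row)
    return data

# Three hypothetical studies reporting 2, 3 and 1 effect sizes.
fake = simulate(mu=-0.3, tau2=0.05, sigma2=[0.02, 0.02, 0.02],
                var_y=[[0.10, 0.12], [0.08, 0.09, 0.11], [0.15]])
print(fake)
```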

A potential problem is that, with some studies reporting only a small number of effect sizes, it might be difficult to estimate the study level variance \sigma^2_i. Some possible solutions to this problem:

  • Let n_i be the number of effect sizes reported by study i. We can set \sigma^2_i = \sigma^2/n_i, so that the problem reduces to estimating a single study level variance parameter \sigma^2, which then assigns a variance to each study according to the number of effect sizes it reports (i.e. studies reporting more effect sizes have a lower variance). This sidesteps the issue of estimating a variance parameter from only a few effect sizes by using data from all of the studies at once.
  • Alternately, we can estimate the variance of each study independently, but incorporate extra information into the model which keeps the \sigma^2_i's at a reasonable level (I like this approach better). This assumes that we know what constitutes “a reasonable level”. Fortunately, we’re dealing with effect sizes which we know from experience to be fairly moderate (i.e. less than 1 in absolute value), so the variance is certainly going to be less than 1 or so. To enforce this, we stick a half-Cauchy prior, \text{Cauchy}^+(0,1), on the study level variance parameters. The Cauchy is a t distribution with 1 degree of freedom; it’s a fat-tailed distribution with a peak that tapers off fairly quickly as we move away from zero, and so helps to keep the variance from getting too large when we’re estimating with a small number of effect sizes.
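To get a feel for how much the half-Cauchy prior actually constrains a variance parameter, it helps to look at how much prior mass sits below a few values. The sketch below uses the closed-form density and CDF of a half-Cauchy with location 0; the helper names are my own.

```python
import math

def half_cauchy_pdf(x, scale=1.0):
    """Density of a half-Cauchy with location 0, defined for x >= 0."""
    if x < 0:
        return 0.0
    return 2.0 / (math.pi * scale * (1.0 + (x / scale) ** 2))

def half_cauchy_cdf(x, scale=1.0):
    """P(X <= x) for a half-Cauchy with location 0."""
    if x < 0:
        return 0.0
    return (2.0 / math.pi) * math.atan(x / scale)

for x in (0.5, 1.0, 5.0, 10.0):
    print(f"P(variance <= {x:>4}) = {half_cauchy_cdf(x):.3f}")
```

With a scale of 1 the median of the prior is exactly 1 (because arctan(1) = \pi/4), so half of the prior mass sits below 1 while the fat tail still leaves room for larger values if the data insist on them.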

With that issue settled, the only thing left is to stick the priors on the top-level parameters \mu and \tau^2. We have good reason to believe that \mu will be less than zero (i.e. blast-related concussion will reduce cognitive performance), but we really don’t have a good idea of the magnitude of the difference, and there’s not a huge amount of literature to give an estimate (well, the whole point of the meta-analysis is to get the estimate), so we just use a relatively flat \mathrm{N}(0, 10^4) prior on \mu. Since there are few studies, we stick another \mathrm{Cauchy}^+(0,1) prior on \tau^2.
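Putting the pieces together, here is a sketch of the unnormalized log posterior for the model with per-study variances. It is not the code used for the paper. To keep it short I integrate out the outcome-level true effects, which gives y_{ij} \sim \mathrm{N}(\mu_{\text{study}_i}, \sigma^2_i + \text{Var}(y_{ij})), and I read the second argument of \mathrm{N}(0, 10^4) as a variance, matching the notation used elsewhere in this post. All function and argument names are hypothetical.

```python
import math

def normal_logpdf(x, mean, var):
    """Log density of N(mean, var), parameterized by the variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def half_cauchy_logpdf(x, scale=1.0):
    """Log density of a half-Cauchy with location 0 (x must be positive)."""
    return math.log(2.0 / (math.pi * scale)) - math.log1p((x / scale) ** 2)

def log_joint(mu, tau2, mu_study, sigma2, y, var_y):
    """Unnormalized log posterior; y[i][j] and var_y[i][j] index effect j of study i."""
    if tau2 <= 0 or any(s <= 0 for s in sigma2):
        return -math.inf  # variances must be positive
    # Top-level priors (assumption: 10^4 is a variance, not a standard deviation).
    lp = normal_logpdf(mu, 0.0, 1e4) + half_cauchy_logpdf(tau2)
    for i, (effects, variances) in enumerate(zip(y, var_y)):
        lp += half_cauchy_logpdf(sigma2[i])          # prior on sigma_i^2
        lp += normal_logpdf(mu_study[i], mu, tau2)   # study-level true effect
        for y_ij, v_ij in zip(effects, variances):
            # Outcome-level true effect integrated out.
            lp += normal_logpdf(y_ij, mu_study[i], sigma2[i] + v_ij)
    return lp

# Hypothetical data: one study reporting two effect sizes.
y, var_y = [[-0.5, -0.4]], [[0.1, 0.1]]
print(log_joint(mu=-0.4, tau2=0.1, mu_study=[-0.45], sigma2=[0.1], y=y, var_y=var_y))
```

A function like this could be handed to a general-purpose sampler or optimizer; in practice a probabilistic programming language does the same bookkeeping for you.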

Just to be sure that our fairly harsh prior on \tau^2 and our solution to the problem above didn’t unduly influence our results, we refit the model several times with harsher or more lenient priors on the \sigma^2_i's and \tau^2. The results were pretty much the same each time, which suggests that the data are pretty clear about the true effect of mTBI. Note that the above model is slightly different from the version that made it into the paper (though it arrives at exactly the same conclusion), since all models are easily improved in retrospect.


Karr, J. E., Areshenkoff, C. N., Duggan, E. C., & Garcia-Barrera, M. A. (2014). Blast-related mild traumatic brain injury: a Bayesian random-effects meta-analysis on the cognitive outcomes of concussion among military personnel. Neuropsychology Review, 24(4), 428-444.