Doing meta-analysis I

Meta-analysis is, broadly, a set of statistical models for combining the results of several research studies in order to summarize the literature, or to estimate an effect more precisely than any single study can. The actual process of conducting a meta-analysis involves lots of non-statistical labour, like searching through the literature and evaluating studies for eligibility, none of which is very interesting, and so I won’t talk about it.

Meta-analysis can be used to summarize all kinds of analyses, but the simplest by far involve effect sizes. In this case, we have a collection of studies, each of which reports an effect size (or information from which an effect size can be calculated) for some experimental manipulation (e.g. the effect of a drug treatment on recovery time, or the effect of working memory load in a speeded response task), and we want to use these effect sizes to estimate the true effect of the manipulation.

The easy case: fixed-effects

Take the problem of estimating the mean of a normal distribution \mathrm{N}(\mu,\sigma^2). If we draw a random sample \{x_1, \dots, x_n\}, then the best estimate of \mu is the sample mean

    \[ \hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i \]

So, if we assume that the study effect sizes are approximately normally distributed, then we might naively think that their unweighted mean would give the best estimate of the true effect size. The problem is that not all studies are created equal, and study-level differences (like sample size) mean that some effect sizes are estimated more accurately than others. This means that we have to take into account the precision (the inverse of the variance) of an effect size when doing our estimation: an effect size reported with very high precision should be given greater importance than one reported with very low precision.

As an analogy, suppose that we take our random sample and add noise to each of the observations, so that each x_i becomes x_i + \epsilon_i, where \epsilon_i \sim \mathrm{N}(0,\sigma_i^2) (i.e. we sample with error). Then, when \sigma^2_i is high, we’re less certain about the true value of x_i, and so x_i should be given less weight when estimating \mu. The usual approach in this case is to take a weighted mean, where each observation is weighted by its precision, so that observations with small error are given the most weight. The estimate is then

    \[ \hat{\mu}_w = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i} \]

where w_i = \frac{1}{\sigma^2_i}. In meta-analysis, this is called a fixed-effects model. Formally, let \{y_1, \dots, y_k\} be effect sizes, where y_i has variance \sigma^2_i. Then the fixed-effects model states that

    \[ y_i \sim \mathrm{N}(\mu, \sigma^2_i) \]

for all i, with \mu estimated (usually) by the weighted average described above. The model assumes that every study estimates the same true effect, which is usually an unrealistic assumption. Moreover, the estimate can often be dominated by a single study with a large sample size.
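As a rough illustration of the weighted average, here is a minimal Python sketch. The function name `fixed_effects` and the three effect sizes and variances are made up for the example. The standard error it returns, the square root of 1 over the sum of the weights, is the standard result for an inverse-variance weighted mean and is not derived in this post.

```python
import math

def fixed_effects(effects, variances):
    """Inverse-variance weighted mean of the effect sizes, plus its standard error."""
    if len(effects) != len(variances):
        raise ValueError("need one variance per effect size")
    weights = [1.0 / v for v in variances]          # w_i = 1 / sigma_i^2
    total = sum(weights)
    mu_hat = sum(w * y for w, y in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)                     # standard error of the weighted mean
    return mu_hat, se

# Hypothetical data: the first study is four times as precise as the others.
effects = [0.10, 0.30, 0.50]
variances = [0.01, 0.04, 0.04]

mu_hat, se = fixed_effects(effects, variances)
unweighted = sum(effects) / len(effects)
print(mu_hat, se, unweighted)
```

With these numbers the weighted estimate is about 0.2, against an unweighted mean of about 0.3: the single precise study pulls the estimate most of the way toward its own value, which is the dominance problem mentioned above.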


The harder case: random-effects

It usually happens that different studies use different outcome measures, or sample from different populations, and so we can’t assume that every study estimates the same true effect. A simple solution is to assume that each study estimates a different true effect size, but that the true effect sizes are related (i.e. come from the same distribution). This kind of hierarchical model is called a random-effects model, and it can be carried out in a number of ways. The simplest version looks like this:

Let \{y_1, \dots, y_k\} be effect sizes, where y_i has variance \sigma^2_i. In the fixed-effects model, we assumed that each y_i estimates the same true effect \mu, and the meta-analysis involves the weighted mean described above. This time, we assume that each y_i estimates a study level true effect size \mu_{\text{study}_i}, where the study level true effects are themselves normally distributed, so that

    \[\mu_{\text{study}_i} \sim \mathrm{N}(\mu, \tau^2)\]
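One way to get a feel for this two-level model is to simulate from it. The sketch below is only an illustration: the function name `simulate_random_effects` and all parameter values are hypothetical, and it assumes that each observed effect size is normally distributed around its study-level true effect. Note that it takes standard deviations (tau and sigma_i) rather than variances.

```python
import random

def simulate_random_effects(mu, tau, sigmas, seed=0):
    """Draw one observed effect size per study from the two-level model.

    mu     : mean of the distribution of true effects
    tau    : standard deviation of the true effects across studies
    sigmas : within-study standard deviations, one per study
    """
    rng = random.Random(seed)
    observed = []
    for sigma_i in sigmas:
        mu_study = rng.gauss(mu, tau)        # study-level true effect
        y_i = rng.gauss(mu_study, sigma_i)   # observed effect size for that study
        observed.append(y_i)
    return observed

# Four hypothetical studies with different precisions.
ys = simulate_random_effects(mu=0.3, tau=0.2, sigmas=[0.1, 0.2, 0.2, 0.4])

# With no variability at either level, every study reports mu itself.
no_spread = simulate_random_effects(mu=0.3, tau=0.0, sigmas=[0.0, 0.0, 0.0])
print(ys, no_spread)
```

Setting tau to zero while keeping the sigmas positive recovers the fixed-effects situation, where every study estimates the same true effect.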

We now have variability at two levels: the study-level variance \sigma^2_i, as well as \tau^2, the variance of the true effect sizes. The actual estimation of \tau^2 is fairly involved (see e.g. Borenstein et al. 2011), but once an estimate \hat{\tau}^2 is available, the most common approach is to adapt the fixed-effects weights so that

    \[w_i = \frac{1}{\sigma^2_i + \hat{\tau}^2}\]
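To make the adjusted weights concrete, the sketch below plugs in one widely used moment-based estimate of \tau^2, the DerSimonian-Laird estimator. That choice is an assumption on my part for the sake of the example: the post does not say which estimator it has in mind, and others exist. The function names and data are hypothetical.

```python
def dersimonian_laird_tau2(effects, variances):
    """Moment-based (DerSimonian-Laird) estimate of the between-study variance."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies and one variance per effect")
    w = [1.0 / v for v in variances]                 # fixed-effects weights
    sum_w = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Q: weighted squared deviations from the fixed-effects estimate
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    return max(0.0, (q - (k - 1)) / c)               # truncate negative estimates at zero

def random_effects(effects, variances):
    """Weighted mean using w_i = 1 / (sigma_i^2 + tau_hat^2)."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    w = [1.0 / (v + tau2) for v in variances]
    mu_hat = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return mu_hat, tau2

# Hypothetical data: one precise study and two noisier ones.
effects = [0.10, 0.30, 0.50]
variances = [0.01, 0.04, 0.04]
mu_re, tau2 = random_effects(effects, variances)
print(mu_re, tau2)   # tau2 is about 0.02 and mu_re about 0.25 for these numbers
```

For comparison, the fixed-effects weights (\tau^2 = 0) give about 0.2 on the same data, so the adjustment moves the estimate toward the unweighted mean of 0.3.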

In practice, adding \hat{\tau}^2 to every study's variance does nothing beyond increasing the relative weight given to effect sizes with high variance. When \tau^2 is very high relative to the study-level variances, the weights become essentially equal and the whole process reduces to an ordinary unweighted mean. This is actually a serious problem with the method, and I outline a better approach to random-effects modelling in part II.
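The flattening of the weights is easy to see numerically. The following sketch uses made-up variances and a hypothetical helper, `relative_weights`, to print each study's share of the total weight for a few values of \tau^2.

```python
def relative_weights(variances, tau2):
    """Each study's share of the total weight under w_i = 1 / (sigma_i^2 + tau^2)."""
    raw = [1.0 / (v + tau2) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical within-study variances: the first study is the most precise.
variances = [0.01, 0.04, 0.04]

shares_by_tau2 = {}
for tau2 in (0.0, 0.02, 10.0):
    shares_by_tau2[tau2] = relative_weights(variances, tau2)
    print(tau2, [round(s, 3) for s in shares_by_tau2[tau2]])
```

At \tau^2 = 0 the precise study carries about two thirds of the weight. At \tau^2 = 10, far larger than any of the within-study variances, each study carries close to one third, and the weighted mean is practically the plain average.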


Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2011). Introduction to meta-analysis. John Wiley & Sons.