Mixture modelling for reaction time distributions

I’m swimming in reaction time data at the moment. My usual approach to analyzing reaction times in cognitive psychology is through some sort of normal random effects model of the log or inverse RT. This is really only good for picking up effects on the mean of the RT distribution, but we’re increasingly finding evidence that certain manipulations preferentially affect the upper or lower quantiles, and so we really need a more fine-grained analysis than the usual “mean differences” approach.

Diffusion models (see e.g. Voss, Nagler & Lerche, 2013) are an extremely successful approach to the analysis of reaction time data, but they make specific assumptions about the cognitive processes underlying responses that aren’t really compatible with the kinds of conflict tasks that we use, so they can’t help us here. A second-best approach is to model the RT distribution directly, while avoiding any assumptions about the processes that generated the distribution.

The ex-Gaussian distribution

Theoretically, the ex-Gaussian distribution is the convolution of a normal distribution with an exponential distribution (equivalently, an ex-Gaussian random variable is the sum of a normal random variable and an exponential random variable), and is known to provide a very good fit to many kinds of empirical reaction time data. It is a three-parameter distribution: \mu and \sigma^2 control the mean and variance of the normal component, and \lambda controls the rate of the exponential component (and hence the right skew of the distribution). Note that by convention it is often parametrized with \tau = \frac{1}{\lambda}, the mean of the exponential component. A few examples:
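To make the "normal plus exponential" definition concrete, here's a quick simulation sketch (NumPy, with illustrative parameter values): the sum of the two components has mean \mu + \tau and variance \sigma^2 + \tau^2.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, tau = 0.5, 0.1, 0.3  # illustrative values; tau = 1/lambda

# An ex-Gaussian draw is a normal draw plus an independent exponential draw
x = rng.normal(mu, sigma, 100_000) + rng.exponential(tau, 100_000)

print(x.mean())  # close to mu + tau = 0.8
print(x.var())   # close to sigma**2 + tau**2 = 0.1
```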
If you care, its density function is given by

    \[ f(x; \mu, \sigma, \lambda) = \frac{\lambda}{2} e^{\frac{\lambda}{2}(2\mu + \lambda \sigma^2 - 2x)} \mathrm{erfc}\left( \frac{\mu + \lambda \sigma^2 - x}{\sqrt{2}\sigma} \right) \]
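This density can be sanity-checked against SciPy's `exponnorm`, which implements the same distribution under a different parametrization (shape K = \tau/\sigma = 1/(\sigma\lambda)); the parameter values below are arbitrary:

```python
import math
from scipy.stats import exponnorm

def exgauss_pdf(x, mu, sigma, lam):
    """Ex-Gaussian density, written directly from the formula above."""
    return (lam / 2.0) \
        * math.exp((lam / 2.0) * (2 * mu + lam * sigma**2 - 2 * x)) \
        * math.erfc((mu + lam * sigma**2 - x) / (math.sqrt(2) * sigma))

# scipy parametrizes the same distribution with shape K = 1 / (sigma * lambda)
mu, sigma, lam = 0.5, 0.1, 1 / 0.3
for x in (0.4, 0.7, 1.5):
    assert abs(exgauss_pdf(x, mu, sigma, lam)
               - exponnorm.pdf(x, 1 / (sigma * lam), loc=mu, scale=sigma)) < 1e-9
```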

The ex-Gaussian distribution can be fit straightforwardly by maximum likelihood, but it is known to be extremely sensitive to outliers (which usually means extremely long RTs) and to require a very large sample. Outlier removal is tricky here, since the RT distribution is potentially very long-tailed, and so any approach that involves simply throwing away “extreme” observations is likely to throw away useful information.

Dealing with outliers by mixture modelling

An alternative to identifying specific observations as outliers is to assume instead that response times might be drawn from one of two distributions: an ex-Gaussian distribution of true responses, and a distribution of “contaminant” responses which are assumed to have been generated by some process other than normal decision making (the participant was distracted, they anticipated the stimulus and responded early, etc.). These contaminant responses might be fast or slow, so for the sake of simplicity we assume that they are uniformly distributed (i.e. they can occur with any speed). The result looks something like this:
This is called a mixture model. We assume that the observed distribution of RTs can be written as a weighted mixture of an ex-Gaussian and a uniform distribution. Intuitively, we can think of it as saying that an individual response comes from an \mathrm{exGaussian}(\mu, \sigma, \tau) distribution with probability \alpha_1, and a uniform distribution with probability \alpha_2 = 1 - \alpha_1. Our model can then be written

    \[ p(rt) = \alpha_1 \, \mathrm{exGaussian}(rt \mid \mu, \sigma, \tau) + \alpha_2 \, \mathrm{Uniform}(rt \mid L, U) \]

where L is the smallest observed reaction time, and U is the largest. This lets us fit an ex-Gaussian distribution to the data, while allowing for the fact that a few observations may not “fit” into the ex-Gaussian model.
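Under assumed parameter values, this mixture density is easy to write down and sanity-check numerically (it should integrate to one); the values below are illustrative, not estimates:

```python
from scipy.stats import exponnorm
from scipy.integrate import quad

# illustrative parameter values
mu, sigma, tau = 0.5, 0.1, 0.3
L, U = 0.0, 4.0          # in practice: min and max of the observed RTs
a1, a2 = 0.95, 0.05      # mixture weights, a1 + a2 = 1

def mixture_pdf(x):
    exg = exponnorm.pdf(x, tau / sigma, loc=mu, scale=sigma)
    unif = 1.0 / (U - L) if L <= x <= U else 0.0
    return a1 * exg + a2 * unif

# integrate over a range wide enough to capture essentially all the mass
total, _ = quad(mixture_pdf, -1.0, 10.0, points=[L, U])
print(total)  # ~ 1
```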

Fitting the model

The downloads section includes Stan and R code for fitting the mixture model to a single participant. It also includes a file for fitting the model to simulated response times with outliers. I’m currently working on a hierarchical version of the model, to accommodate multiple participants, which will be uploaded when completed.

I’ve never actually seen this kind of model done before in a Bayesian way, so the priors should probably be played around with. The upper and lower limits for the uniform distribution of contaminants are not estimated; we simply define them to be the minimum and maximum of the data. All response times are assumed to be measured in seconds, so we choose priors that allow for a reasonable range of values:

(1)   \begin{align*}
	\mu &\sim \mathrm{N}(0, 2) \\
	\sigma^2 &\sim \mathrm{Cauchy}(0, \tfrac{1}{2}) \\
	\lambda &\sim \mathrm{Cauchy}(0, 2)
\end{align*}

The mixture weights (\alpha_1,\alpha_2) are trickier. I originally didn’t bother specifying a prior at all, but the model kept finding mixture weights on the order of .3 for the contaminant distribution, which just seems unrealistic. I ended up sticking a fairly harsh beta prior

    \[ \alpha_2 \sim \mathrm{Beta}(0.1, 0.9) \]

on the contaminant distribution weight to force it down to a more reasonable level (below .1). An alternative is to not estimate the mixture weights at all, and just fix them at something like (\alpha_1, \alpha_2) = (0.95, 0.05), but I wanted to allow them to vary in the future hierarchical model, so I settled for a strongly informative prior.
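For readers without Stan handy, here is a rough MAP (posterior mode) sketch of the same mixture posterior in Python. This is not the Stan code from the downloads page: the data are simulated, and for simplicity the half-Cauchy prior is placed on \sigma rather than \sigma^2.

```python
import numpy as np
from scipy.stats import exponnorm, norm, halfcauchy, beta
from scipy.optimize import minimize

rng = np.random.default_rng(1)
# simulated data: 100 ex-Gaussian RTs plus 10 uniform contaminants
rt = np.concatenate([rng.normal(0.5, 0.1, 100) + rng.exponential(0.3, 100),
                     rng.uniform(0.0, 4.0, 10)])
L, U = rt.min(), rt.max()

def neg_log_post(theta):
    mu, sigma, lam, a2 = theta
    # mixture likelihood: ex-Gaussian (scipy shape K = 1/(sigma*lambda)) + uniform
    lik = (1 - a2) * exponnorm.pdf(rt, 1 / (sigma * lam), loc=mu, scale=sigma) \
        + a2 / (U - L)
    lp = (norm.logpdf(mu, 0, 2)                  # mu ~ N(0, 2)
          + halfcauchy.logpdf(sigma, scale=0.5)  # half-Cauchy on sigma (simplification)
          + halfcauchy.logpdf(lam, scale=2)      # lambda ~ half-Cauchy(0, 2)
          + beta.logpdf(a2, 0.1, 0.9))           # alpha_2 ~ Beta(0.1, 0.9)
    return -(np.log(lik).sum() + lp)

fit = minimize(neg_log_post, x0=[0.5, 0.2, 2.0, 0.05], method="L-BFGS-B",
               bounds=[(None, None), (1e-3, None), (1e-3, None), (1e-3, 0.5)])
mu_hat, sigma_hat, lam_hat, a2_hat = fit.x
```

This only gives a point estimate rather than the full posterior, but it's a quick way to check that the model and priors behave sensibly before reaching for MCMC.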


I’ve drawn 100 observations from an ex-Gaussian distribution with parameters \mu = .5, \sigma = .1, and \tau = .3, along with 10 uniformly distributed contaminant responses between 0 and 4 seconds. The simulated data look like this:
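A dataset matching that description can be generated in a few lines (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# 100 "true" RTs from an ex-Gaussian with mu = .5, sigma = .1, tau = .3 ...
true_rt = rng.normal(0.5, 0.1, 100) + rng.exponential(0.3, 100)
# ... plus 10 uniform contaminants between 0 and 4 seconds
contaminants = rng.uniform(0.0, 4.0, 10)
rt = np.concatenate([true_rt, contaminants])
```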
Normally, the rightmost observations would be tagged as outliers and deleted, but let’s see how the model fares. Posterior distributions for the model parameters are plotted below
The estimates of \mu and \sigma are close to dead-on. The model seems to overestimate \tau (the length of the tail), which may suggest that the contaminants are still exerting some influence, but it may also just be a property of estimating the ex-Gaussian itself: maximum likelihood is known to overestimate \tau in the absence of a very large sample, so that’s probably what’s happening here. I don’t think there’s any way of fixing this with the prior, since large values of \tau are perfectly plausible in empirical data. I’m hoping that the shrinkage that comes from fitting a hierarchical version to many participants will improve the estimate a little. Code is on the downloads page.


Voss, A., Nagler, M., & Lerche, V. (2013). Diffusion models in experimental psychology: A practical introduction. Experimental Psychology, 60(6), 385.