Quantile models with random effects

More experimenting with the quantile regression model described here. Good regression models are multi-level, so I’ve been playing around at validating a quantile model with random effects.

The data

In the validation of the simple quantile regression model, we used simulated reaction time data from an ex-Gaussian distribution. I thought it would be more fun (and more informative) this time around to use more realistic data, and so we simulate response times following the pattern of results observed by Voss, Rutherman, & Voss (2004) in their validation of the diffusion model of binary decision making. The experiment is a forced two-choice design with the following conditions:

  1. A baseline condition in which participants respond with a keypress to indicate the color of a square presented at fixation.
  2. An accuracy condition in which participant are told not to respond until they are absolutely sure of their answer.
  3. A difficult condition in which the color of the square is muted.
  4. A handicap condition in which the hand begins further away from the response pad.

The authors observed increased boundary separation in the accuracy condition, decreased drift rate in the difficult condition, and increased non-decision time in the handicap condition. Recall that the basic diffusion model has a boundary separation parameter \alpha, a non-decision time parameter \tau, a bias parameter \beta, and a drift rate parameter \delta. Using the parameter values obtained by the authors in each condition, and letting y be a reaction time, we draw y from a Wiener distribution so that

    \begin{align*}           y &\sim \mathrm{Wiener}(\alpha, \tau, \beta, \delta) \\           \alpha &= 1.2 + 0.2x_\text{acc} + 0.0x_\text{diff} + 0.0x_\text{hand} \\           \tau &= 0.5 + 0.0x_\text{acc} + 0.0x_\text{diff} + 0.2x_\text{hand} \\           \beta &= 0.5 \\           \delta &= 0.9 + 0.0x_\text{acc} - 0.4x_\text{diff} + 0.1x_\text{hand}      \end{align*}

where x_\text{acc}, x_\text{diff}, and x_\text{hand} are indicator variables for the accuracy, difficult, and handicap conditions, respectively. Note that we assume no bias in any condition, and, for simplicity, we do not allow model parameters to vary across trials. Parameters for each participant are generated by adding \mathrm{N}(0, .03) noise to the above coefficients. We perform 3 separate simulations, each using 40 participants and either 40, 200, or 500 observations per condition. For each simulation, we fit both an unpooled and a partially pooled model. Following standard practice in psychology, we examine only “correct” responses generated by the diffusion model (those in which the response was at the upper boundary).

The model

Let {\bf y} be a vector of outcomes and let {\bf X} be a design matrix. Then a linear model for the q‘th quantile has log-likelihood

    \[           \log{\mathcal{L}(\beta)} = n\log(q) + n\log{(1-q)} - \sum_{i=1}^n \rho_q(y_i -                  {\bf x}_i \beta)      \]


    \[           \rho_q(x) = \frac{|x| + (2q-1)}{2}      \]

which is derived by assuming that y_i follows an asymmetric Laplace distribution (ALD), with density

    \[           f_q(u) = q(1-q)e^{-\rho_q(u)}      \]

It’s not exactly obvious how the parameters of a quantile model should vary, so we just assume that they are normally distributed. If y_{ij} is the j‘th observation from the i‘th participant, then our model for the q‘th quantile is

    \begin{align*}           y_{ij} - {\bf x}_{ij}\beta_i &\sim \mathrm{ALD}(q) \\           \beta_i &\sim \mathrm{N}(\mu, \Sigma)      \end{align*}

where, for simplicity, we assume that \Sigma is diagonal. We put weakly informative, independent normal priors on the components of \mu, and standard Cauchy priors on the diagonal entries of \Sigma. For the unpooled model, each group is estimated independently.

In Stan, the model looks like this:

functions {
    real q_loss(real q, real u){
        return .5 * (fabs(u) + (2*q - 1)*u);
    real log_ald(real q, real u){
        return log(q) + log(1-q) - q_loss(q, u);

data {
    int N;                 // Number of observation
    int M;                 // Number of predictors
    int K;                 // Number of groups
    real q;                // Quantile
    vector[N] y;           // Response variable
    vector[M] X[N];        // Predictors
    int grp[N];            // Group indicator

parameters {
    vector[M] beta_dummy[K];
    vector[M] mu;
    vector[M] sigma;

transformed parameters {
    vector[M] beta[K];
    for (i in 1:K){
        beta[i] <- mu + sigma .* beta_dummy[i];

model {
    // Likelihood
    for (i in 1:N){
                           y[i] - dot_product(beta[grp[i]], X[i])));
    // Priors
    for (i in 1:K){
        beta_dummy[i] ~ normal(0,1);
    for (i in 1:M){
        square(sigma[i]) ~ cauchy(0,1);
    mu ~ normal(0,5);


40 observations per group

For each condition, we plot the pooled and unpooled group level estimates against the true values. The dashed green lines are the true top-level means.


It seems like there is very little information in 40 observations about the true .75 quantile, and so the pooled model shrinks everything to the mean. This gives a pretty precise estimate of the mean, but essentially erases any group level differences. The unpooled estimates are nearly worthless, so 40 observations would seem to be enough to estimate the mean (assuming you have enough groups), but there's really no way to generate precise estimates of group level quantiles. Below, we plot the pooled estimates against the unpooled estimates for each condition. The red lines are least-squares fits.


200 observations per group



500 observations per group




In all cases, partial pooling decreases the mean squared-error of both the mean and group level quantiles estimates. Pooling gives better mean estimates in all cases, obviously, but the pooled group level estimates are essentially worthless for smaller sample sizes (despite smaller error), since the shrinkage is so severe. Since we're interested in reaction times, this isn't so bad, since it would be unusual to have only 40 observations per participant per condition, but it's definitely not worth it with less than, say, 200. The problem would probably get worse with larger quantiles, but it would also probably be easier with normally distributed data (in the above examples, we're estimating quantiles in the upper tails of skewed distributions).


Voss, A., Rothermund, K., & Voss, J. (2004). Interpreting the parameters of the diffusion model: An empirical validation. Memory & Cognition, 32(7), 1206-1220.