Experimenting with IGT factorization

The Iowa Gambling Task has always been one of my favorite tasks, if only because it’s so much fun to analyze. In the standard version of the task, subjects select a card on each trial from one of four decks (A,B,C,D), resulting in the gain or loss of some amount of money. The probabilities and win/loss magnitudes vary depending on the deck, and the subject must learn which decks are better/worse in order to maximize their winnings. Conventionally, decks A and B have a negative expected value (the “bad decks”), while C and D are positive (the “good decks”). Further, decks B and D are “high reward frequency” decks, which pay off more frequently than decks A and C.

The task is usually played out over 100 trials, and results in a string of deck selections and associated payoffs. Usually, only a few summary statistics are reported (e.g. total winnings, number of selections from the good decks, etc), although it has become popular to fit reinforcement learning models to the task to try and tease out different cognitive processes contributing to task performance. I generally don’t like these kinds of models, since they’re difficult to estimate, their reliability is terrible, and essentially nothing is known about the model likelihood function or the statistical properties of the estimates. So I’m always interested in new methods for analyzing IGT data.

A common way to visualize performance on the IGT, and to assess the predictive accuracy of reinforcement learning models, is to separate the trials into blocks and compute the proportion of selections from each deck within each block. The result is a matrix \mathbf{X} where the j,k'th entry is the proportion of selections from deck j in block k. Since I’ve been playing around with a lot of matrix-valued data in my motion tracking and fMRI work, I thought it might be interesting to analyze the deck selection matrices directly.

To test out the model, I used the data provided by Steingroever et al. (2015) — the result of a collaboration between several labs. I picked those data sets which used payoff schedule 2 and which recorded at least 100 trials (for those studies which used more than 100 trials, I used only the first 100). In total, I was left with 330 subjects. For each subject i, I divided the task into 5 blocks of 20 trials and computed the deck selection matrix \mathbf{X}_i. The matrices for each participant were then assembled into a 3-tensor \mathcal{X} with modes deck, block, and subject, respectively.
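As a rough sketch of this preprocessing step, the snippet below builds a 4 x 5 deck selection matrix per subject and stacks them into a deck x block x subject array. The function name `deck_selection_matrix`, the 0 to 3 deck coding, and the random choices are all assumptions made for illustration; the real data would come from the Steingroever et al. files.

```python
import numpy as np

def deck_selection_matrix(choices, n_decks=4, n_blocks=5, block_size=20):
    """Proportion of selections from each deck (rows) within each block (columns)."""
    # Keep only the first n_blocks * block_size trials (100 by default)
    choices = np.asarray(choices)[: n_blocks * block_size]
    blocks = choices.reshape(n_blocks, block_size)
    X = np.zeros((n_decks, n_blocks))
    for k in range(n_blocks):
        # Count selections of each deck in block k, then convert to proportions
        X[:, k] = np.bincount(blocks[k], minlength=n_decks) / block_size
    return X

rng = np.random.default_rng(0)
# Synthetic stand-in for real data: 3 hypothetical subjects, 100 trials each,
# with decks A-D coded as 0-3 (an assumed coding)
all_choices = rng.integers(0, 4, size=(3, 100))

# Stack the per-subject matrices along a third (subject) mode
tensor = np.stack([deck_selection_matrix(c) for c in all_choices], axis=-1)
print(tensor.shape)  # (4, 5, 3): deck x block x subject
```

Each column of a subject's matrix sums to 1, since every trial in a block goes to exactly one deck.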


Lately, I’ve been having fun playing around with the population value decomposition described by Crainiceanu et al. (2012), which can be thought of almost like a hierarchical singular value decomposition (and which, as Lock et al. point out, is closely related to Tucker decomposition and other tensor decomposition techniques). Given a set of matrices \mathbf{X}_i, thought to be independent realizations of some common process, we want to decompose \mathbf{X}_i as

    \[           \mathbf{X}_i = \mathbf{P}\mathbf{V}_i\mathbf{D}      \]

where \mathbf{P} and \mathbf{D} are sets of row and column factors, common to all observations, and \mathbf{V}_i (ideally, smaller than \mathbf{X}_i) describes the contributions of these factors to the i'th observation. The matrices \mathbf{P} and \mathbf{D} are obtained by first performing a singular value decomposition on each \mathbf{X}_i

    \[           \mathbf{X}_i = \mathbf{U}_i \Sigma_i \mathbf{W}_i^\top      \]

and then applying PCA to the total set of factors in all the \mathbf{U}_is and \mathbf{W}_is. The result is a set of components which are common to all observations.
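The two-stage procedure can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `pvd` and its parameter names are my own, the random input is a placeholder, and the "PCA" step is done as an uncentered SVD of the stacked singular vectors, which is an assumption about how that step is carried out.

```python
import numpy as np

def pvd(X, rank_row=2, rank_col=2, n_row_factors=3, n_col_factors=4):
    """Sketch of a population value decomposition.

    X has shape (rows, cols, subjects). Returns P, V, D such that
    X[:, :, i] is approximated by P @ V[i] @ D.
    """
    n = X.shape[2]
    U_all, W_all = [], []
    for i in range(n):
        # Stage 1: subject-level SVD, keeping the leading singular vectors
        U, s, Wt = np.linalg.svd(X[:, :, i], full_matrices=False)
        U_all.append(U[:, :rank_row])
        W_all.append(Wt[:rank_col, :].T)
    U_stack = np.hstack(U_all)  # rows x (n * rank_row)
    W_stack = np.hstack(W_all)  # cols x (n * rank_col)
    # Stage 2: population-level factors from the stacked vectors
    # (uncentered SVD used here as the PCA step; an assumption)
    P = np.linalg.svd(U_stack, full_matrices=False)[0][:, :n_row_factors]
    D = np.linalg.svd(W_stack, full_matrices=False)[0][:, :n_col_factors].T
    # Subject-specific weights: project each matrix onto the common factors
    V = np.stack([P.T @ X[:, :, i] @ D.T for i in range(n)])
    return P, V, D

rng = np.random.default_rng(0)
X = rng.random((4, 5, 10))  # placeholder data: 4 rows, 5 columns, 10 subjects
P, V, D = pvd(X)
print(P.shape, V.shape, D.shape)  # (4, 3) (10, 3, 4) (4, 5)
```

Because the columns of P and the rows of D are orthonormal, each V[i] is just the least-squares projection of the subject's matrix onto the common row and column factors.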

This technique can easily be extended to incorporate constraints like sparsity or non-negativity, though none of these constraints have outperformed pure SVD for this particular data set.


Before fitting the model, I centered the data along the first and second modes. Selecting the right number of components is harder than for matrix factorization methods like PCA, since there are more parameters to tune (the row and column ranks of each subject's SVD, and the number of deck and time factors). Moreover, if the tensor is very noisy, then a good decomposition may not explain a large proportion of the variance anyway. In the end, I extracted 3 deck and 4 time factors, and fixed the row and column ranks of the subject-level SVDs to 2 and 2.
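"Centering along a mode" can be read in more than one way. The sketch below shows one plausible reading, in which the mean over decks and then the mean over blocks is subtracted within each subject (double centering). The helper name `center_modes` and the random input are assumptions, and this may not match the exact preprocessing used for the results here.

```python
import numpy as np

def center_modes(X, modes=(0, 1)):
    """Subtract the mean along each listed mode in turn (one reading of mode centering)."""
    Xc = np.asarray(X, dtype=float).copy()
    for m in modes:
        # Remove the average over mode m, leaving the other modes untouched
        Xc -= Xc.mean(axis=m, keepdims=True)
    return Xc

rng = np.random.default_rng(0)
tensor = rng.random((4, 5, 10))  # placeholder deck x block x subject array
centered = center_modes(tensor)

# After centering, the means over decks and over blocks are both zero
print(np.allclose(centered.mean(axis=0), 0), np.allclose(centered.mean(axis=1), 0))
```

Centering along the second mode does not undo the first, because the block means of a deck-centered array are themselves deck-centered.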

The matrix \mathbf{P} contains a set of deck factors, representing characteristic sets of deck preferences, which are plotted below. Note, for example, that factor 1 represents a strong preference for deck B, which is often observed in empirical data, while factor 2 displays a preference for decks B and D (the high reward frequency decks).

The matrix \mathbf{D} contains a set of time course factors, representing characteristic changes in preferences over the five blocks, which are plotted below. For example, the first factor represents the reversal of a deck preference over the course of the five blocks.

For each subject k, the entries of the matrix \mathbf{V}_k describe the interactions between the deck and time factors. Specifically, the i,j'th entry describes the extent to which deck factor i following time course j contributes to the subject's performance. This interaction can be visualized as the outer product of the i'th deck factor and the j'th time factor, which is plotted below for the first deck and time factors (entry [1,1]).
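To make the outer-product interpretation concrete, here is a small sketch. The factor values are invented for illustration (they are not the fitted factors), and the random P, V and D are placeholders used only to check the identity that the product of the three matrices equals a weighted sum of deck-by-time outer products.

```python
import numpy as np

# Hypothetical factors, made up for illustration (not the fitted values)
deck_factor = np.array([-0.3, 0.8, -0.3, -0.2])      # decks A-D: loads mostly on deck B
time_factor = np.array([0.7, 0.3, 0.0, -0.3, -0.6])  # blocks 1-5: sign flips over time

# The pattern a single entry of V weights: a 4 x 5 deck-by-block matrix
pattern = np.outer(deck_factor, time_factor)
print(pattern.shape)  # (4, 5)

# More generally, P @ V @ D is the sum of these patterns weighted by the entries of V
rng = np.random.default_rng(0)
P = rng.normal(size=(4, 3))  # placeholder deck factors (columns)
D = rng.normal(size=(4, 5))  # placeholder time factors (rows)
V = rng.normal(size=(3, 4))  # placeholder subject weights
recon = sum(V[i, j] * np.outer(P[:, i], D[j, :])
            for i in range(3) for j in range(4))
print(np.allclose(recon, P @ V @ D))  # True
```

In the made-up pattern, the deck B row is positive in the first block and negative in the last, which is the kind of preference reversal a large positive weight on that entry would describe.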

Entry [1,1] thus describes an initial preference for deck B, which quickly reverses over the course of the task. A few of the entries are plotted below for the four studies.

The entries in \mathbf{V}_i can then easily be used as features for clustering/classification, or can be further factorized using PCA or whatever. In any case, my experience playing around with a few data sets is that the factors tend to be very interpretable, and generally correspond to known behaviours (e.g. preference for high-reward frequency decks, bias towards deck B, etc).
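As a sketch of the feature-extraction idea, each subject's 3 x 4 weight matrix can be flattened into a 12-dimensional vector and passed to PCA, done here with a plain SVD. The array `V` below is random placeholder data standing in for fitted weights, and the variable names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder for the fitted weights: 330 subjects, 3 deck factors x 4 time factors
V = rng.normal(size=(330, 3, 4))

# One feature vector per subject: the flattened entries of V_i
features = V.reshape(len(V), -1)  # 330 x 12

# PCA via SVD of the column-centered feature matrix
centered = features - features.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U[:, :2] * s[:2]          # subject scores on the first two components
explained = s**2 / np.sum(s**2)    # proportion of variance per component

print(features.shape, scores.shape)  # (330, 12) (330, 2)
```

The same `features` array could be handed to any clustering or classification routine instead.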


Crainiceanu, C. M., Caffo, B. S., Luo, S., Zipunnikov, V. M., & Punjabi, N. M. (2012). Population value decomposition, a framework for the analysis of image populations. Journal of the American Statistical Association.

Steingroever, H., Fridberg, D., Horstmann, A., Kjome, K., Kumari, V., Lane, S. D., … & Stout, J. (2015). Data from 617 healthy participants performing the Iowa gambling task: A “many labs” collaboration. Journal of Open Psychology Data, 3(1).

Lock, E. F., Nobel, A. B., & Marron, J. S. (2011). Journal of the American Statistical Association, 106(495).