Factorizing motion trajectories

It’s difficult to analyze trajectory data, since the “object” of analysis (the trajectory) can’t easily be summarized by a single value without throwing away large amounts of information, unless you already know what you’re looking for (e.g. if you’re looking for deviations from a straight line, then you might summarize each trajectory by summing over the curvature at each point). Hehmen et al. (2014) discuss using PCA as an exploratory technique in order to find interesting components that might index different cognitive processes involved in the movement, but PCA isn’t really well suited to decomposing trajectories, since (1) it ignores most of the structure in the data and (2) the trajectories tend to be sampled at a high enough frequency that the number of “variables” (time points) is usually much larger than the number of observations. It’s not really obvious what a “good” factor model should look like, but it’s easy enough to make a few improvements.

Simulating trajectories

I generated 50 observations in each of two groups. Each observation is a linear combination of two components, plus some autocorrelated noise generated as an AR(1) process (see below)
Each group has a greater weight given to one of the two components, giving something that looks like this:
The simulated data are far cleaner than anything we see in practice, but the method is still very sensitive to noise, so I thought I’d start with something easy while I finetune the model.

Matrix factorization for trajectory data

Arrange the data into a matrix \bf{Y}_{m,n} where each row is a trajectory. We want to decompose the trajectories into linear combinations of p simpler trajectory “factors” that are, hopefully, easier to interpret, and may help to distinguish between groups of trials or participants. This is the same as writing

    \[ \bf{Y} = \bf{A}_{m,p}\bf{X}_{p,n} \]

where each row of \bf{X} is a “trajectory factor”, and the corresponding column of \bf{A} gives the loading of that factor for each trajectory. We fit the model by minimizing the squared error

    \[ F(\bf{A},\bf{X}) = \|\bf{Y - AX}\|^2_F \]

plus whatever other constraints we want on the solution. Since the factors are supposed to be trajectories, we would expect them to be smooth. We can force the model to generate smooth factors penalizing the derivatives of the factors, since a large (in absolute value) derivative at a point indicates a rapid change in location at that point. Our trajectories are sampled discretely, so the derivative at a point can be approximated by the difference between that point and the previous point, which can be computed “automatically” for each factor by multiplying \bf{X} by a difference matrix

    \[ \bf{\Gamma}_{n,n} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ -1 & 1 & \cdots & 0 \\ 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ 0 & 0 & \cdots & -1 \end{pmatrix} \]

The magnitude of the derivative is then \|\bf{X\Gamma}\|_F. We also include a Tikhonov-style penalty on \bf{A} and \bf{X} which helps the fitting process when m \ll n, and has the effect of shrinking the estimates towards zero. The final objective function is

    \[ F(\bf{A},\bf{X}) =      \|\bf{Y - AX}\|^2_F + \lambda_1(\|\bf{A}\|_F^2 + \|\bf{X}\|_F^2) + \lambda_2\|\bf{X\Gamma}\|_F^2 \]

where \lambda_1 controls the degree of shrinkage and \lambda_2 controls the smoothness of the factors. Unfortunately, F is not convex, but I’ve had good success initializing \bf{A} and \bf{X} by singular value decomposition and then running gradient descent, where

    \begin{align*} \frac{\partial F}{\partial \bf{A}} &= \bf{AXX^\top - YX^\top} + \lambda_1\bf{A} \\ \frac{\partial F}{\partial \bf{X}} &= \bf{A^\top AX - A^\top Y} + \lambda_1\bf{X} + \lambda_2 \bf{X\Gamma \Gamma^\top} \end{align*}

PCA suggested that five factors were enough to explain most of the variability in the data, so I fit the model with 5 factors and a high degree of smoothing. Of the five factors, 2 of them were useful for discriminating between groups, and I’ve plotted them (and the mean loading per group) below.
Not bad. It does a good job of identifying the underlying components (although component 1 looks a little weird, with the decrease towards the end the trajectory). The problem is that when the data aren’t so nice, for instance when there is greater mixing between the components (so that a trajectory might contain an initial bump and a slow increase), the model tends to find factors that blend the different components together.

Alternately, we can prespecify the matrix \bf{A} using whatever variables we want to include in the model. In our case, we only have group membership, so we can hold

    \[      \bf{A} =           \begin{pmatrix}                1 & 0\\                1 & 0\\                \vdots & \vdots\\                0 & 1 \\                0 & 1           \end{pmatrix} \]

fixed and estimate \bf{X} as a sort of path-regression model, which does a little bit of a better job:


Hehman, E., Stolier, R. M., & Freeman, J. B. (2014). Advanced mouse-tracking analytic techniques for enhancing psychological science. Group Processes & Intergroup Relations, 1368430214538325.