One major research program in our lab aims to understand how the motor system makes use of task-related and perceptual information when planning and executing movement. For example, a right-handed “reach and grasp” action is executed more quickly when preceded by an image of a right-handed beer mug than when preceded by a left-handed one, but no such effect appears for a right-handed button press. This means that the motor system is selective in the information it uses when planning an action: the affordance of an object isn’t important when planning a button press, but it is important when planning a reach and grasp action. The question is, are these effects confined to the planning stage of the movement, or do they persist into the execution of the movement itself? The answer, it turns out, is the latter.

Several labs have found evidence for parallel programming of multiple actions in the case where there is uncertainty about the action that will ultimately have to be executed (see e.g. [1]). We’ve found evidence for similar effects when participants hold an object affording a particular grasp in memory. For example, participants are slower to rotate their hand to the correct position to perform a vertical grasp when holding in memory an object affording a horizontal grasp. It’s not entirely clear if the same mechanism gives rise to both kinds of effects, but it’s possible that our effects are the result of some kind of intentional weighting of one of the alternative action plans — the motor system programs each potential action and ultimately performs the cued action, but attention to an object affording one of the alternative actions causes that particular alternative to be weighted more heavily.

We can reliably detect these effects by examining the position and rotation of the hand directly, but lately I’ve been experimenting with techniques to detect and quantify them more “automatically”. One way of doing this is to try to create some kind of hand posture analog of an “Eigenface” ([2]).


I’ll preface this section by noting that the experiment has so far been a failure, in that the effects (which can be reliably detected using more direct measurements and averaging over subjects) seem to be too small to be detected on a trial-by-trial basis.

The experiment started with my attempt to get some kind of “intrinsic” characterization of the shape of the hand, independent of the location of the hand in space, or the rotation of the hand as a whole. In other words, we want a description of where the parts of the hand are relative to each other. Our equipment isn’t really designed to do this optimally, but we can improvise. Our setup involves five electromagnetic sensors placed on the tips of the thumb, index, and middle fingers, the back of the hand, and the wrist. Each sensor outputs its position and rotation (in the form of a quaternion) relative to baseline at 60 Hz. We can thus track the precise position and rotation of the hand as it performs a particular action. We’ll take a look at a dataset from a participant performing both precision grasps (think “picking up a pencil”) and power grasps (think “picking up a beer mug”).
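To make the idea of a location- and rotation-independent description concrete, here is a minimal sketch of one generic way to get it: express a fingertip sensor's position in the coordinate frame of the back-of-hand sensor. This is not the approach used in the rest of this post (which works with approximate joint angles), and the function names, the (w, x, y, z) quaternion ordering, and the convention that the quaternion rotates hand-frame vectors into lab-frame vectors are all assumptions made for illustration.

```python
import numpy as np

def quat_to_matrix(q):
    """Rotation matrix for a quaternion given as (w, x, y, z) (assumed ordering)."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # normalize to guard against drift
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def to_hand_frame(p_sensor, p_hand, q_hand):
    """Position of a sensor expressed in the back-of-hand sensor's frame.

    Subtracting p_hand removes the hand's location; applying the inverse
    (transpose) of the hand's rotation removes its overall orientation.
    """
    R = quat_to_matrix(q_hand)
    offset = np.asarray(p_sensor, dtype=float) - np.asarray(p_hand, dtype=float)
    return R.T @ offset
```

Applied to every sample, this would give fingertip coordinates that do not change when the whole hand is translated or rotated, which is the property the joint-angle description is also after.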

The first step in TensorGrasp is to characterize the shape of the hand by its (approximate) joint angles. We don’t have enough sensors to calculate these properly, but we can use the following angles as approximations


On each trial, we isolate the movement phase, resample it to 100 time points, smooth, and calculate the five joint angles at each sample. A participant then contributes a 5 \times 100 \times T tensor \mathcal{X}, where T is the number of trials. Before proceeding, we center each joint angle by subtracting the mean of that joint angle (over all trials) at each time point. Note that there are certain statistical difficulties involved in working with angles, since they lie on the circle and not in \mathbb{R}. In this case, the angles are concentrated in one quadrant of the circle, and so we just pick a suitable coordinate chart and treat them as if they were ordinary real numbers.
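The resampling and centering steps can be sketched as follows. This is a minimal illustration, not the code used for the analysis: it assumes each trial arrives as a 5 x n_samples array of joint angles (with n_samples varying across trials), uses plain linear interpolation, and leaves out the smoothing step. The names `resample` and `build_centered_tensor` are hypothetical.

```python
import numpy as np

def resample(angles, n_points=100):
    """Linearly resample a (5, n_samples) array of joint angles to n_points."""
    angles = np.asarray(angles, dtype=float)
    old_t = np.linspace(0.0, 1.0, angles.shape[1])  # normalized movement time
    new_t = np.linspace(0.0, 1.0, n_points)
    return np.vstack([np.interp(new_t, old_t, row) for row in angles])

def build_centered_tensor(trials, n_points=100):
    """Stack resampled trials into a 5 x n_points x T tensor and center it."""
    X = np.stack([resample(trial, n_points) for trial in trials], axis=2)
    # Subtract the across-trial mean of each joint angle at each time point.
    return X - X.mean(axis=2, keepdims=True)
```

Treating the angles as ordinary real numbers here relies on the point made above: they stay within one quadrant, so no wrap-around handling is attempted.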

Let \textbf{X}_t be the tensor slice corresponding to trial t. Borrowing from Crainiceanu et al.’s work with image classification in [3], the goal, by analogy with singular value decomposition, is to decompose \textbf{X}_t as

    \[           \textbf{X}_t = \textbf{P}\textbf{V}_t\textbf{D}      \]

where \textbf{P} \in \mathbb{R}^{5 \times a}, \textbf{D} \in \mathbb{R}^{b \times 100}, and \textbf{V}_t \in \mathbb{R}^{a \times b} (which, unlike the middle factor in SVD, is not necessarily diagonal). Intuitively, we decompose the population of trials into a set of factors describing characteristic joint configurations (in \textbf{P}) and characteristic time-courses (in \textbf{D}), and describe each trial as an interaction between these factors, specified by \textbf{V}_t. Each element of the tensor is thus written

    \[           \mathcal{X}_{i,j,t} = \sum_{p=1}^{a} \sum_{q=1}^{b} P_{i,p} \, (\textbf{V}_t)_{p,q} \, D_{q,j}      \]
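As a rough illustration of the decomposition, the sketch below estimates orthonormal factors from the two unfoldings of the tensor and then projects each trial onto them. This is a simplified variant in the spirit of [3], not necessarily the estimation procedure behind the results reported here; the function names `fit_factors` and `reconstruct` are hypothetical, and the input is assumed to be the centered 5 x 100 x T tensor.

```python
import numpy as np

def fit_factors(X, a=3, b=5):
    """Estimate P (5 x a), V (a x b x T) and D (b x 100) from X (5 x 100 x T)."""
    n_joint, n_time, _ = X.shape
    # Joint factors: leading left singular vectors of the joints x (time, trial) unfolding.
    U, _, _ = np.linalg.svd(X.reshape(n_joint, -1), full_matrices=False)
    P = U[:, :a]
    # Time factors: leading left singular vectors of the time x (joint, trial) unfolding.
    W, _, _ = np.linalg.svd(np.moveaxis(X, 1, 0).reshape(n_time, -1), full_matrices=False)
    D = W[:, :b].T
    # With orthonormal P and D, each trial's core matrix is V_t = P^T X_t D^T.
    V = np.einsum('ip,ijt,qj->pqt', P, X, D)
    return P, V, D

def reconstruct(P, V, D):
    """Rebuild the tensor: X[i, j, t] = sum over p, q of P[i, p] * V[p, q, t] * D[q, j]."""
    return np.einsum('ip,pqt,qj->ijt', P, V, D)
```

The `einsum` call in `reconstruct` is a direct transcription of the element-wise formula, and comparing `reconstruct(P, V, D)` with the original tensor gives a quick check of how much is lost for a given choice of a and b.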

Some Results

Some playing around suggested that 3 grasp factors (a = 3) and 5 time factors (b = 5) were enough to give a reasonably well performing and interpretable model, so we’ll go with that for now.

The estimated matrix of grasp factors \textbf{P} is

           [,1]       [,2]       [,3]
[1,] -0.2891312 -0.5928156 -0.1074127
[2,] -0.3715419 -0.6255056 -0.1622750
[3,]  0.3624156 -0.3873869  0.8445203
[4,] -0.5828417  0.2354606  0.3101996
[5,] -0.5543557  0.2276006  0.3907570

There is always a danger that interpreting factors devolves into ad hoc palm reading, but we can make some effort to understand what these factors mean. The second factor (column 2) distinguishes between the closing of the hand and the coming together of the thumb and fingers. In this sense, it seems to encode either a pinch or a power grasp. The first factor is more difficult to interpret, since it seems to single out the middle finger, but otherwise it seems to involve the closing of the hand as a whole, which is common to all responses in the task.

The five time course factors (rows of \textbf{D}) are plotted below, though they lack the interpretability of the factors in \textbf{P}.
Other than noting that one factor seems to decay over the course of the grasp, and another appears only briefly and then disappears again, there isn’t much that can be read off of this plot.

More interesting are the matrices \textbf{V}_t, which describe the factor composition of the grasp on each trial. As an example:

            [,1]      [,2]       [,3]      [,4]     [,5]
[1,]   0.7965091 -2.134727 -0.5806201 -4.817485 2.830758
[2,]  -9.9339591  7.247757  8.0021267 -7.664726 2.925099
[3,] -10.2770774 -1.087876  3.2454525  1.453340 1.332886

The entries of the matrix can be used to characterize particular features of the grasp. For example, here are some scatterplots of 15 features (row-wise), with power grasp in green and precision in blue.
The first feature (relating the first grasp factor to the first time course factor) is almost completely sufficient to characterize the grasp as either power or precision.
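For concreteness, here is a small sketch of how the 15 features can be read off a single \textbf{V}_t, using the example matrix shown above. The helper names are hypothetical; the only assumption is the row-wise ordering described in the text.

```python
import numpy as np

# Example V_t from above: rows are grasp factors, columns are time course factors.
V_t = np.array([
    [  0.7965091, -2.134727, -0.5806201, -4.817485, 2.830758],
    [ -9.9339591,  7.247757,  8.0021267, -7.664726, 2.925099],
    [-10.2770774, -1.087876,  3.2454525,  1.453340, 1.332886],
])

def trial_features(V_t):
    """Flatten an a x b matrix row-wise into a feature vector of length a * b."""
    return np.asarray(V_t, dtype=float).reshape(-1)  # C order is row-wise

def feature_factors(k, b=5):
    """Map a 1-based feature number to its (grasp factor, time factor) pair."""
    return (k - 1) // b + 1, (k - 1) % b + 1

features = trial_features(V_t)
```

Under this ordering, feature 1 is the entry pairing grasp factor 1 with time course factor 1, and feature 6 is the first entry of the second row.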

An interesting use for this technique would be to detect latent features of a particular grasp type on the performance of another grasp. For example, we already know that forcing a subject to attend to an object affording one grasp causes changes in the trajectory of the hand when performing a different grasp. If one could isolate a feature corresponding to, say, a precision grasp, could a higher level of this feature be detected in a power grasp primed by a precision object? So far, no. I suspect that this is because an individual grasp is too noisy to encode subtle information about prime condition.

Oh well. Back to the drawing board.


[1] Gallivan, J. P., Logan, L., Wolpert, D. M., & Flanagan, J. R. (2016). Parallel specification of competing sensorimotor control policies for alternative action options. Nature Neuroscience.

[2] Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71-86.

[3] Crainiceanu, C. M., Caffo, B. S., Luo, S., Zipunnikov, V. M., & Punjabi, N. M. (2012). Population value decomposition, a framework for the analysis of image populations. Journal of the American Statistical Association.