Other posts in this series:

- Part I: Simulating some data

### Background

Recall our setup: We have two groups of subjects performing a task, for which we have simulated BOLD data from seven regions. The functional connectivity of these regions is given by a graph, which we want to infer from the BOLD signal alone. One group maintains the same network configuration over all 600 TRs, and the other switches to a new configuration midway through the task (the exact time varies across subjects). Our goal is threefold: to identify the two distinct groups of subjects, to characterize the underlying network structure(s), and to identify the timing of the network reconfiguration. Our general approach will look something like this:

- Segment the data into distinct time periods. We do this by sliding a moving window across the data in order to produce a sequence of (maybe overlapping) chunks of BOLD signal.
- Estimate the pairwise functional connectivity between regions. Intuitively, regions with similar BOLD time series are assumed to be connected, while those with dissimilar time series are not. This will give a sequence of similarity matrices for each subject, quantifying the evolution of functional connectivity over the course of the task.
- Use the similarity matrices to construct graphs describing the underlying network structure.

This post concerns the first two of these steps. Segmentation of the data is actually a rather difficult issue, since smaller time windows produce very noisy estimates of functional connectivity, but larger windows risk overlapping distinct network states. In our case of a two second TR, a window width of e.g. 50 time points gives fairly short time series, and thus noisy estimates, but still corresponds to almost two minutes of recording time. Depending on the task, it may not be safe to assume that the underlying network is static for a full two minutes. We'll ignore this issue for now, since it's not really even solvable, but it remains a major issue for functional connectivity work based on fMRI data. It also gives rise to certain statistical and computational problems, which I'll point out later. I've chosen more or less arbitrarily to use a sliding window of 50 TRs, so that the first window covers TRs 1–50, the second TRs 51–100, etc.
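The windowing step above can be sketched in a few lines of numpy. The function name and array shapes here are illustrative (the post doesn't show its own code), but the layout matches the setup: 7 regions, 600 TRs, 50-TR windows.

```python
import numpy as np

# A minimal sketch of the segmentation step; `segment` is a hypothetical
# helper, not a function from the post or any library.
def segment(bold, width=50):
    """Split a (regions, TRs) array into consecutive (regions, width) windows."""
    p, T = bold.shape
    return [bold[:, start:start + width] for start in range(0, T - width + 1, width)]

bold = np.random.randn(7, 600)          # 7 regions, 600 TRs of fake BOLD signal
windows = segment(bold)
print(len(windows), windows[0].shape)   # 12 windows, each of shape (7, 50)
```

An overlapping sliding window would just use a smaller step in the `range` call; with a step equal to the width, the windows tile the run into the 12 disjoint bins used below.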

The general procedure for estimating functional connectivity looks like this: Suppose that we have $p$ regions measured at $T$ time points, giving a $p \times T$ matrix $X$, where each row $x_i$ of $X$ is a time series of BOLD activations. If regions $i$ and $j$ are functionally connected (that is, they are transmitting information), we would expect the time series $x_i$ and $x_j$ to be similar in some way. We quantify the similarity with a function $\langle x_i, x_j \rangle$, where the value of $\langle x_i, x_j \rangle$ is called the *functional connectivity* (FC) of regions $i$ and $j$. The inner product notation is not standard (FC measures are not inner products in general), but I use it to be suggestive, since I plan to play around with kernel methods in a later post, and because my measure of FC in this project is the covariance, which *is* an inner product. Computing all pairwise similarities gives a $p \times p$ functional connectivity matrix $\Sigma$, which is the fundamental object of functional connectivity analysis.
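With covariance as the similarity measure, the whole FC matrix falls out of a single `np.cov` call on a window of data (the random window here is just a stand-in for real BOLD signal):

```python
import numpy as np

# Sketch: build a functional connectivity matrix from one (p, T) window,
# using covariance as the similarity s(x_i, x_j).
rng = np.random.default_rng(0)
window = rng.standard_normal((7, 50))   # p = 7 regions, T = 50 TRs

fc = np.cov(window)           # p x p matrix; fc[i, j] = cov(x_i, x_j)
print(fc.shape)               # (7, 7)
assert np.allclose(fc, fc.T)  # similarity matrices are symmetric
```

`np.cov` treats each row as a variable and each column as an observation, which matches the regions-by-TRs layout used here.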

Wang et al. (2014) have done some simulation work comparing a variety of different FC measures, though it’s not clear that any one measure is obviously superior to the rest. For this example, my measure of FC will be the covariance between the observed time series, for reasons that I’ll explain below.

### A digression on high-dimensional covariance estimation

Functional connectivity analysis is fundamentally nothing more than the estimation of a similarity matrix across voxels or brain regions. In the case where $\langle \cdot, \cdot \rangle$ is the observed correlation/covariance, the FC matrix is nothing more than the sample correlation/covariance matrix. This poses a severe problem in the case of fMRI research, in that the sample correlation matrix is an extremely poorly behaved estimator in high dimensional settings — especially when the sample size is small relative to the number of free parameters. To see why, consider what the covariance matrix actually is.

Matrices, including the covariance matrix, are functions on vector spaces, so it makes sense to ask what a covariance matrix does to a vector space, geometrically. The answer turns out to be very simple: Let $Z$ be multivariate normally distributed with zero mean and identity covariance matrix (i.e. a “standard” multivariate normal distribution), and let $\Sigma$ be a covariance matrix. Covariance matrices have nice properties as far as matrices go; in particular, they have a unique non-negative square root. Taking $\Sigma^{1/2}$ to be the root of $\Sigma$, $\Sigma^{1/2} Z$ is multivariate normal with zero mean and covariance matrix $\Sigma$. In other words, the covariance matrix stretches and squeezes a standard normal distribution so that it has a particular covariance structure, where the directions of stretching and squeezing are given by the eigenvectors of $\Sigma$, and the magnitudes are given by the eigenvalues (hence their role in principal component analysis).
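This stretch-and-squeeze picture is easy to verify numerically. The sketch below builds the square root $\Sigma^{1/2}$ from the eigendecomposition of an illustrative $2 \times 2$ covariance matrix and checks that transforming standard normal draws reproduces $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(1)
# An illustrative target covariance matrix (not from the simulation).
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Unique non-negative square root via the eigendecomposition:
# sigma = V diag(vals) V^T, so sigma^(1/2) = V diag(sqrt(vals)) V^T.
vals, vecs = np.linalg.eigh(sigma)
root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
assert np.allclose(root @ root, sigma)

# Stretch/squeeze standard normal samples: cov(root @ z) should be ~ sigma.
z = rng.standard_normal((2, 100_000))
x = root @ z
print(np.cov(x).round(2))   # close to sigma
```

The eigenvectors of `sigma` give the axes along which `z` gets stretched, and the square roots of the eigenvalues give the stretch factors.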

This geometric intuition places strong constraints on the structure of $\Sigma$. In particular, $\Sigma$ must be positive semi-definite, which means that its eigenvalues must be non-negative. When any of the eigenvalues are zero, the distribution is called degenerate, and lies in some lower dimensional linear subspace of the ambient space. In the language of dimension reduction or manifold estimation, data from a non-degenerate $p$-variate normal distribution is always truly $p$-dimensional; it never lies on some $k < p$ dimensional linear submanifold of $\mathbb{R}^p$, as it does in the degenerate case.

Since the covariance matrix as a function determines the global structure of a multivariate normal distribution, it is generally more important to estimate $\Sigma$ accurately *as a function* than it is to estimate any individual covariance or correlation. More specifically, we want accurate estimates of its eigenvectors and eigenvalues. Consider $n$ independent observations from a $p$-variate normal distribution with covariance $\Sigma$, and let $S$ be the sample covariance matrix (i.e. the matrix of pairwise sample covariances). The eigenvalues of $S$ are strongly connected to the ratio $p/n$, which is small (less than 1) when the number of observations exceeds the number of variables, and large (greater than 1) when the number of variables exceeds the number of observations. When $p > n$, the smallest eigenvalue of $S$ becomes equal to zero, and so $S$ ceases to be a valid (non-degenerate) covariance matrix. As the ratio increases, more and more eigenvalues go to zero, and the largest eigenvalues begin to increase. What this means, geometrically, is that when the number of observations is relatively small compared to the number of variables, the sample covariance matrix flattens the distribution to nothing in some directions while exaggerating its spread in others.
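The eigenvalue collapse is easy to see in simulation. Here the true covariance is the identity, so every eigenvalue should be 1; with $p = 100$ variables and only $n = 50$ observations, the sample covariance instead has a block of eigenvalues pinned at zero and a largest eigenvalue far above 1 (the dimensions are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 100, 50                       # more variables than observations
x = rng.standard_normal((p, n))      # true covariance is the identity

s = np.cov(x)                        # p x p sample covariance, rank at most n - 1
eigs = np.linalg.eigvalsh(s)
print(eigs.min(), eigs.max())
# smallest eigenvalues are (numerically) zero; the largest is well above 1
```

Since `s` has rank at most $n - 1 = 49$, at least 51 of its 100 eigenvalues are exactly zero, even though every true eigenvalue is 1.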

So, even though we may have a large enough sample to estimate any specific pairwise covariance accurately, we get a terrible estimate of the large-scale covariance structure. Add to this the fact that covariance estimates are *really noisy*, and the result is that sample covariance/correlation matrices are just terrible estimators in general, at least in high dimensional settings (for a more thorough overview, see something like Pourahmadi, 2013).

This problem affects functional connectivity analyses very strongly for two reasons. First, whole brain functional connectivity analyses generally involve atlases with hundreds of brain areas, so that $p$ is often very large. Second, functional connectivity changes dynamically, meaning that it is often necessary to calculate FC over a relatively narrow time window; certainly, a few hundred TRs at most. Further, these TRs are not *independent* observations; they are highly autocorrelated, which means that $n$ TRs contain substantially less than $n$ independent observations' worth of information about $\Sigma$. And, of course, the BOLD signal is very noisy. This makes accurate estimation of $\Sigma$ from BOLD data almost impossible (a fact that is almost completely ignored in the fMRI literature).

### Procedure

So, why use correlation? Because other measures of FC, like Granger causality or mutual information, are just as noisy, and there's not much reason to believe that estimation of large scale FC structure is any easier with these measures. What correlation buys you is a very large literature. Correlation and covariance estimation in high dimensions is an important enough problem that there have been decades of research on how to do it effectively. In particular, it's very easy to penalize covariance estimates. This has a dual function: First, regularization lets us estimate legitimate (positive definite) covariance matrices even when $p > n$, and second, sparsity penalties let us identify functionally (conditionally) independent regions (those with no edge connecting them) without having to decide on an arbitrary cutoff, which is the way FC is usually done in the literature.

For my own work, I've been using the Glasso (graphical lasso) estimator developed by Friedman, Hastie, & Tibshirani (2008), tailored for the estimation of large graphs from high-dimensional covariance matrices. Assuming that the true covariance structure is given by a sparse graph, where conditionally independent variables are *not* connected by an edge, their technique uses an $\ell_1$ penalty (the lasso) to obtain a sparse estimate of the precision (inverse covariance) matrix. The penalty is estimated from the data, essentially by cross-validation, eliminating the need to manually select an arbitrary cutoff. If I'm interested in the covariance itself, rather than just the graph, then I usually use some kind of shrinkage as well. In this case, since we have only 7 nodes, this isn't really necessary, so I'll just use the sample covariance.
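As a concrete sketch of the Glasso step, here is scikit-learn's implementation with the penalty chosen by cross-validation. The post doesn't name a package, so the use of `GraphicalLassoCV` is an assumption, and the tridiagonal precision matrix below is an illustrative ground truth, not the simulated network from Part I:

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(3)
# Illustrative sparse ground truth: a chain graph on 7 nodes, encoded as a
# tridiagonal precision matrix (zeros = conditional independence = no edge).
true_prec = np.eye(7) + np.diag(0.4 * np.ones(6), 1) + np.diag(0.4 * np.ones(6), -1)
true_cov = np.linalg.inv(true_prec)
x = rng.multivariate_normal(np.zeros(7), true_cov, size=500)  # (n_samples, p)

model = GraphicalLassoCV().fit(x)   # l1 penalty selected by cross-validation
# Nonzero off-diagonal precision entries correspond to graph edges.
adjacency = (np.abs(model.precision_) > 1e-8) & ~np.eye(7, dtype=bool)
print(adjacency.astype(int))
```

The sparsity pattern of `model.precision_` is the estimated graph; no arbitrary correlation threshold is involved.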

To summarize, our dataset now consists of

- Raw BOLD data for 40 subjects, comprising 600 TRs for seven brain regions.
- Covariance matrices estimated in 50 TR bins, for a total of 12 per subject. In this case, because the number of TRs far exceeds the number of regions, we use the sample covariance for simplicity.
- Graphs estimated in 50 TR bins, for a total of 12 per subject. The adjacency matrices were estimated using the Glasso estimates, which involve estimating the precision matrix with a sparsity penalty. Zero terms in the precision matrix indicate conditionally independent brain regions, corresponding to vertices which are *not* connected by an edge. The penalty term is selected for each graph by cross-validation.
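Putting the first two items together, the per-subject covariance sequence can be assembled in one pass (array names and the random stand-in data are hypothetical; the real BOLD comes from Part I's simulation):

```python
import numpy as np

rng = np.random.default_rng(4)
n_subjects, p, n_trs, width = 40, 7, 600, 50

bold = rng.standard_normal((n_subjects, p, n_trs))     # raw BOLD stand-in
covs = np.stack([
    np.stack([np.cov(subj[:, t:t + width])             # one 7x7 matrix
              for t in range(0, n_trs, width)])        # per 50-TR bin
    for subj in bold
])
print(covs.shape)   # (40, 12, 7, 7): 12 covariance matrices per subject
```

Feeding each of these 12 windows through the Glasso step then yields the corresponding sequence of 12 graphs per subject.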

In later posts, I'll experiment with different approaches to functional connectivity analysis making use of all three of these objects.

### References

Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. *Biostatistics*, 9(3), 432-441.

Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. *The annals of statistics*, 1436-1462.

Pourahmadi, M. (2013). High-dimensional covariance estimation: with high-dimensional data (Vol. 882). John Wiley & Sons.

Wang, H. E., Bénar, C. G., Quilichini, P. P., Friston, K. J., Jirsa, V. K., & Bernard, C. (2014). A systematic framework for functional connectivity measures. *Frontiers in neuroscience*, 8, 405.