I’ve compiled a list of material for basic topics in mathematics and statistics that I’ve found useful over the years, or for topics that are particularly common in psychology or cognitive science. I am, admittedly, biased, since my background is in mathematics, but lacking anything described here will almost certainly leave you having to “trust the defaults” at some point, which is a sin. I’ve tried to recommend free and open source material whenever possible, though in some cases it’s impossible.
- Single variable differential and integral calculus. Used absolutely everywhere. I don’t know of any good introductory textbooks that aren’t geared towards students in pure math, but the course offered by David Jerison at MIT OCW is very good.
- A good introduction to linear algebra, which means computations with vectors and matrices and some general theory about vector spaces and linear transformations. This is necessary for almost everything in statistics and machine learning. Gilbert Strang’s course at MIT OCW is great, but a bit light on the theory, so maybe follow it up with something like Shilov’s Linear Algebra.
- Multivariable calculus at least covering partial derivatives and multiple integration, since this is necessary to do any kind of statistics. A bit of familiarity with vector calculus would be even better, since this is necessary for optimization, dynamical systems, etc. Again, I like Denis Auroux’s course at MIT OCW.
- A little bit of combinatorics. Counting problems pop up almost everywhere in statistics, so it’s important to be at least a little familiar with combinatorial problems. Oscar Levin has written a good open source textbook here.
- A good introduction to differential equations and dynamical systems. The best and most accessible intro textbook is Strogatz’ Nonlinear Dynamics and Chaos. For something slightly more technical, Arnold’s Ordinary Differential Equations is one of the best textbooks ever written.
- A good introduction to stochastic processes. Lots of statistical models are stochastic processes, and there is plenty of stochastic modeling in neuroscience (e.g. modeling spike trains by Poisson processes). Something like Ross’ Introduction to Probability Models would work, though there may be some better textbooks out there.
- A good course in optimization. At some point, you’ll want to fit a model and there will be no software to fit it for you. Then you’ll have to fit it yourself, which means either working with a prebuilt optimizer, or building one yourself. In either case, you should have at least some familiarity with basic optimization. A few good sources:
- The Matrix Cookbook by Petersen and Pedersen
An assortment of useful matrix results and identities. The section on matrix derivatives is especially useful, since this pops up all the time when calculating gradients.
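To make the link between matrix derivatives and model fitting concrete, here is a minimal sketch of fitting a linear model by hand. It assumes a plain squared-error loss and uses the standard identity that the gradient of ‖Xb − y‖² with respect to b is 2Xᵀ(Xb − y). The function names, the toy data, the step size, and the iteration count are all made up for illustration; a real problem would normally use a tested optimizer rather than this loop.

```python
# Minimal sketch: fit y ~ X b by gradient descent on the squared error.
# Relies on the matrix-derivative identity
#   d/db ||X b - y||^2 = 2 X^T (X b - y).
# Standard library only; every name below is illustrative.

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def gradient(X, y, b):
    """Gradient of ||X b - y||^2 with respect to b."""
    residual = [pred - obs for pred, obs in zip(matvec(X, b), y)]
    return [2.0 * g for g in matvec(transpose(X), residual)]

def fit_least_squares(X, y, step=0.01, n_iter=5000):
    """Plain gradient descent with a fixed step size (an assumption)."""
    b = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = gradient(X, y, b)
        b = [bi - step * gi for bi, gi in zip(b, g)]
    return b

# Toy, noiseless data generated from y = 1 + 2 x.
# The first column of X is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
b_hat = fit_least_squares(X, y)
print(b_hat)  # should be very close to [1.0, 2.0]
```

The fixed step size only works here because the toy problem is small and well scaled; choosing step sizes and stopping rules sensibly is exactly the kind of thing an optimization course covers.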
Enough to get a general familiarity with statistical theory and estimation; lots of comfort with linear modeling (which is the first line of attack for almost any problem); a good survey of multivariate techniques; and, finally, a thorough introduction to Bayesian methods, which is how you should be implementing the models described in the earlier books.
- At least one rigorous course in mathematical statistics, using something like Introduction to Probability and Mathematical Statistics by Bain and Engelhardt. This assumes a good level of comfort with multivariable calculus, and covers the background in probability, distribution theory, and estimation theory that is necessary for almost every kind of statistics. For even more background, Casella and Berger’s Statistical Inference is probably the most thorough reference for classical (non-Bayesian) statistics.
- Data Analysis Using Regression and Multilevel/Hierarchical Models — Gelman and Hill
This book specifically. Linear models are the first line of attack for almost any problem, so you need a thorough knowledge of linear modeling, and this is the best book on applied linear modeling there is.
- Bayesian Data Analysis — Gelman et al.
Again, this book specifically. Most research questions are Bayesian questions (e.g. “How much does X affect Y?”), and the ones that aren’t probably aren’t the right questions anyway. Beyond the Bayesian vs. frequentist issue, studying Bayesian statistics is important for two reasons:
- Priors allow you to regularize. Even if you don’t like the idea of incorporating a prior into a model, they allow models to be constrained in a simple and rigorous way. Many interesting models are difficult to fit “out of the box” because the data do not contain very much information about their parameters. This is especially true for high-dimensional data, where estimating the covariance structure is almost impossible without huge samples. This uncertainty bleeds over into the other parameters of the model and makes reasoning about the model difficult. Bayesian statistics allows these parameters to be constrained, which generally gives far better results. Some models (e.g. mixed-effects models, meta-analysis) should really only ever be fit in a Bayesian way.
- Bayesian statistics encourages model building. If you learned statistics from an introductory course outside of the statistics department, you’re probably used to thinking of things like ANOVA and regression as isolated techniques that you apply in very specific circumstances (e.g. “if I have continuous data I do regression”, “if I have groups I use ANOVA”). Bayesian statistics encourages you to think more generally, and to construct models from scratch depending on the structure of your data and your research question. This is useful regardless of whether or not you actually use Bayesian methods in practice.
There are plenty of introductory books on Bayesian statistics for the social/biological sciences, but if you have the background provided by the previous two books, Gelman’s book is easily the best. I’ve also heard great things about Statistical Rethinking: A Bayesian Course with Examples in R and Stan by McElreath.
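As a small illustration of how a prior regularizes, here is a sketch of the simplest conjugate case: estimating the mean of normally distributed data with known standard deviation under a normal prior. The posterior mean is a precision-weighted average of the prior mean and the sample mean, so a tighter prior pulls the estimate harder toward the prior mean. The function name, the data, and the prior settings are invented for the example, and the known-variance assumption is a simplification.

```python
# Minimal sketch: normal prior + normal likelihood (known sigma).
# The posterior mean is a precision-weighted average of prior and data.
from statistics import mean

def posterior_normal_mean(data, sigma, prior_mean, prior_sd):
    """Return (posterior mean, posterior sd) for the unknown mean."""
    n = len(data)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * mean(data)) / post_precision
    post_sd = post_precision ** -0.5
    return post_mean, post_sd

data = [2.1, 3.4, 2.9]  # made-up observations, sample mean 2.8

# A very wide prior barely changes the estimate...
weak = posterior_normal_mean(data, sigma=1.0, prior_mean=0.0, prior_sd=100.0)
# ...while a tight prior shrinks it toward the prior mean of 0.
strong = posterior_normal_mean(data, sigma=1.0, prior_mean=0.0, prior_sd=0.5)

print(weak)
print(strong)
```

With only three observations the tight prior dominates; as the sample grows, the data precision grows with it and the prior matters less. The same logic carries over, with more algebra, to constraining covariance matrices and hierarchical variance parameters.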
- Multivariate Data Analysis
It’s hard to recommend a good book on multivariate statistics, since most introductory books focus on multivariate ANOVA and hypothesis testing, which are essentially useless, or act as encyclopedias of multivariate techniques without really explaining any of them. To my mind, there are four major lessons that someone should get out of a course on multivariate analysis (starting with the most important):
- Inference is harder in high dimensions. The sample sizes needed to properly estimate a normal distribution grow very quickly with the dimension of the data. If you measure 16 variables in a behavioral experiment, you don’t have enough data to estimate all of the covariances between them. This leaves two options: use strong priors or regularization if you want to use interpretable parametric models, or try a more general machine learning approach at the expense of interpretability.
- Data visualization is important — important enough that you should find a good textbook on data visualization and read all of it. Visualizing high-dimensional data is hard, which means the data usually need to be simplified through some kind of dimensionality reduction. In fact, the first step in any multivariate problem should be to reduce and plot the data in as many ways as you can think of.
- Matrix factorization. All linear multivariate techniques are just different kinds of matrix factorization. This includes the general linear model, PCA, ICA, factor analysis, k-means, and just about everything else. Don’t bother memorizing the details of a dozen different techniques, just understand the concept of matrix factorization and set whatever constraints make sense for your problem.
- There is never. ever. ever. any reason to do a dozen or a hundred or a thousand hypothesis tests. Significance testing has terrible properties as a filter on high-dimensional datasets, and it is almost always better to replace large numbers of tests with a single model. This could be as simple as a regression model or an ANOVA, or something more complex like a hierarchical model encompassing a large number of smaller comparisons. The mass-univariate approach is never optimal; it’s just something researchers do when they don’t have the knowledge or ability to fit better models. If you find yourself doing a dozen or a hundred or a thousand t-tests, you’re almost certainly making a mistake. There may be exceptions, but I can’t think of any. EEG is not an exception. fMRI is not an exception.
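Two of the points in the list above can be put into rough numbers. The sketch below counts the free parameters in a covariance matrix, and computes the chance of at least one false positive across many tests under the simplifying assumptions that the tests are independent and every null hypothesis is true. Both function names are made up for the example, and the second calculation captures only one facet of why mass testing behaves badly.

```python
# Minimal sketch: two back-of-the-envelope calculations.

def n_covariance_parameters(p):
    """Free parameters in a p x p covariance matrix:
    p variances plus p * (p - 1) / 2 distinct covariances."""
    return p * (p + 1) // 2

def familywise_error_rate(m, alpha=0.05):
    """P(at least one false positive) over m tests, assuming the tests
    are independent and all null hypotheses are true."""
    return 1.0 - (1.0 - alpha) ** m

# 16 measured variables already imply 136 variances and covariances.
print(n_covariance_parameters(16))  # 136

# The chance of at least one spurious "hit" climbs quickly with m.
for m in (12, 100, 1000):
    print(m, round(familywise_error_rate(m), 3))
```

Corrections for multiple comparisons control this error rate but pay for it in power, which is part of the case for fitting a single model instead.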
- I find myself taking a machine learning approach more and more, especially in very high-dimensional problems where I don’t have a specific model in mind. Even if you don’t do ML yourself, it’s becoming common enough in most fields that everyone should have a basic understanding. For a fairly comprehensive and rigorous survey:
- Gelman and Hill contains almost everything you need to know about applying generalized linear models, but if you find yourself needing a more technical understanding, then Dobson’s An Introduction to Generalized Linear Models is worth a read.
Some resources for software I use.
For data analysis, I use R most of the time. R is a free and open-source programming language designed primarily for statistics. Though SPSS is more widely used in psychology, it has limited to non-existent support for anything except the most routine analyses, generates ugly graphs, hides the inner workings of statistics from the user, is expensive, and is not open-source. Excel has all of the same problems, and is also extremely sensitive to rounding error and other numerical problems.
If you’re used to software like SPSS, there are several packages like R Commander that provide R with an easy-to-use graphical interface. For learning to program in R, Coursera has what looks like a fairly complete introduction.
Matlab (or Octave)
Matlab is a programming environment designed largely for linear algebra, although it has support for statistics and data analysis. Its statistical library is not as well developed as R’s, and it does not handle data input/output as well, so I recommend R for most purposes. Matlab has an open-source alternative in GNU Octave (which I sometimes use), which mimics most of Matlab’s basic functionality. I program most of my experiments in Psychtoolbox, which is a software suite for Matlab/Octave that allows them to do stimulus presentation. This can also be done in PsychoPy, which is better but has compatibility issues with certain graphics cards, making the code slightly less portable. For an introduction to Psychtoolbox see e.g.
I do all my writing in LaTeX, an open-source typesetting language popularly used in mathematics, and almost essential for anyone who writes about quantitative subjects. Besides being beautiful, LaTeX makes typesetting equations and other math almost trivially easy, as opposed to the tedious difficulty of the equation editor in Microsoft Word. See here for why you should use it, and here for how to get started. Using Beamer, LaTeX can also do posters and slides. All of the math on this website is typeset in LaTeX.
The most popular resources for fitting Bayesian models are, to my knowledge, BUGS, JAGS, and Stan. I use Stan for all of my models, since it’s generally faster and samples more efficiently than either BUGS or JAGS, but R can interface with all three of them, so they can all be incorporated into the usual data-analysis workflow fairly easily.