 Close Statistical
Parametric
Mapping

Home By members and collaborators of the Wellcome Centre for Human Neuroimaging

# New Functionality

Many new components of SPM2 rely on an Expectation Maximisation (EM) algorithm that is used to estimate restricted maximum likelihood (ReML) estimates of various hyperparameters or variance components. This enables a number of useful things. For example, non-sphericity can be estimated in single-level observation models and, in hierarchical observation models, one can adopt a parametric empirical Bayesian (PEB) approach to parameter estimation and inference. Furthermore, the EM algorithm allows fully Bayesian estimation for any general linear model under Gaussian assumptions. Examples of all these applications will be found in SPM2.

## Non-sphericity and ML estimation

Sphericity refers to the assumption of identically and independently distributed errors. Departures from this assumption can either be in terms of non-identical distributions (e.g. heterogeneity of variance among conditions or groups, known as heteroscedasticy). Departure from independence implies correlations amongst the errors that may be induced by observing different things in the same subject. The two most pertinent sources of non-sphericity in neuroimaging are heteroscedasticy (e.g. when entering the coefficients of different basis functions into a second-level analysis), and serial correlations in fMRI. It is important to estimate non-sphericity for two reasons: (i) Proper estimates of the co-variances among the error terms are needed to construct valid statistics. In this instance non-sphericity enters twice, first in the estimate of the variance component hyperparameters (e.g. error variance) and second in the computation of the degrees of freedom. The effective or adjusted degrees of freedom relies upon the Satterthwaite approximation as described in Worsley & Friston (1995) and is commonly employed in things like the Greenhouse-Geisser correction. (ii) The second reason why non-sphericity estimation is important is that it enables multivariate analyses to be implemented within a univariate framework. For example, it is entirely possible to combine PET and fMRI data within the same statistical model allowing for different error variances and serial correlations between the two modalities. The parameter estimates that are obtained from a multivariate analysis and the equivalent univariate analysis are identical. The only point of departure is at the point of inference. In the univariate framework this is based upon an F ratio using the Satterthwaite approximation, whereas multivariate inference uses a slightly different statistic and distributional approximation (usually one based upon Wilk's Lambda). The critical thing here is that multivariate analyses can now proceed within the univariate framework provided by SPM2.

By default SPM2 uses WLS to provide ML estimators based on the non-sphericity. In this case the weighting 'whitens' the errors rendering them i.i.d. The effective degrees of freedom then revert to the classical degrees of freedom and the Satterthwaite approximation becomes exact. It is possible the use any WLS estimate (SPM2 will automatically compute the effective degrees of freedom) but this is not provided as an option in the user interface). This departure from SPM99 has been enables by highly precise non-sphericity estimates that obtain by pooling over voxels using ReML.

In addition to finessing the computation of the t statistics and effective degrees of freedom the non-sphericity estimates are also used to provide conditional estimates in the Bayesian analysis stream (see below).

### User specification

For PET, you are requested to specify whether or not the errors are identically and independently distributed over the different levels of each factor. In any experimental design there will be one factor whose different levels are assumed to be replications. This is necessary in order to pool the sum of squared residuals over these levels to estimate the error variance hyperparameter. By default, SPM2 assumes that this factor is the one with the greatest number of levels. A common use of this non-sphericity option would be in the second-level analysis of basis function coefficients from a first-level fMRI experiment. Clearly, the coefficients for different basis functions will have different scaling and error variance properties and will not be identically distributed. Furthermore, they may not be independent because the error terms from any two basis functions maybe coupled because they were observed in the same subject. Using the non-sphericity option allows one to take up multiple contrasts from the same subject to a second level, to emulate a mixed or random effects analysis. In fMRI the key non-sphericity is attributable to serial correlations in the fMRI time series. SPM2 assumes that an auto-regressive process with white noise can model these. Critical to this assumption is the decomposition of fMRI time-series into three components. Induced experimental effects embodied in the design matrix, fixed deterministic drifts in the confounds (that are used in high pass filtering) and, finally, a stochastic error term that shows short-term correlations conforming to an AR(1) process. The AR(1) characterisation is only appropriate when long-term correlations due to drift have been modelled deterministically through high-pass filtering or, equivalently by inclusion in the design matrix as nuisance variables.

If non-sphericity is modelled a weighted least squares (WLS) will be used during estimation to render the residuals i.i.d. This WLS estimator then becomes a ML estimator, which is the most efficient of linear unbiased estimators. To over-ride this default WLS simply specify the required weighting matrix before estimation (this matrix is in SPM.xX.W, see below).

### Software implementation

The hyperparameters or coefficients governing the different variance components of the error co-variances are estimated using ReML. The form of the variance components is described by covariance basis in the non-sphericity sub-field of the SPM structure (SPM.xVi.Vi, see below). In SPM2 a multiple hyperparameter problem is reduced to a single hyperparameter problem by factorising the non-sphericity into a single voxel-specific hyperparameter and the remaining hyperparameters that are invariant over voxels. By reformulating a multiple hyperparameter problem like this one can employ standard least squares estimators and classical statistics in the usual way. The voxel invariant hyperparameters are estimated at voxels that survive the default F test for all effects of interest (assuming sphericity) during the estimation stage in SPM. This entails a first pass of the data to identify the voxels that contribute to the ReML hyperparameter estimation. A second pass then uses the non-sphericity V (in SPM.xVi.V) to compute the appropriate weighting matrix (in SPM.xX.W) such that WW' = inv(V). [If W is already specified only a single estimation pass is required.]

## Bayesian estimation and inference

Having performed a weighted least squares estimation one can optionally revisit all the 'within mask' voxels to obtain conditional or posterior estimates. Conditional estimates are the most likely parameters estimates given the data. This is in contrast to the least squares maximum likelihood estimators that are the estimates that render the data most likely. The critical distinction between the two estimators relies upon prior information about the parameters. Effectively, under Gaussian assumptions and the sorts of hierarchical models used in SPM2, the conditional estimators are shrinkage estimators. This means that the estimates shrink toward their prior expectation that is typically zero. The degree of shrinkage depends upon the relative variance of the errors and the prior distribution. Although the likelihood estimator is the most accurate from the point of view of any one voxel, the conditional estimators minimize the equivalent cost function over the voxels analysed. Currently, in SPM2, these voxels are all 'in mask' or intracranial voxels. In short, the conditional estimators will not be the best for each voxel but will, on average the best over all voxels. The conditional estimators provided in SPM2 rely upon an empirical Bayesian approach, where the priors are estimated from the data. Empirical Bayes rest on a hierarchical observation model. The one employed by SPM2 is perhaps, the simplest and harnesses the fact that imaging time-series implicitly look for the same effect over many voxels. Consequently, a second level, in the observation hierarchy, is provided by voxel to voxel variation in the parameter estimates that is used as the prior variance. Put simply, the priors are estimated by partitioning the observed variance at each and every voxel into a voxel-specific error term and a voxel wide component generated by voxel to voxel variations in the parameter estimates. Giving the prior variance, one is in a position to work out the posterior or conditional distribution of the parameter estimates. In turn, this allows the computation of the posterior probability that the estimate exceeds some specified threshold. These posterior probabilities constitute a posterior probability map (PPM) that is a summary of the posterior density specific to the specified threshold.

Having performed the usual least squares estimation simply select Bayesian estimators and the appropriate SPM.mat file. Additional Beta images will be created that are prefixed with a C to denote conditional estimates. When proceeding to inference, if the contrast specified is a t contrast (and conditional estimates are available) you will be asked whether the inference should be Bayesian or Classical. If you select Bayesian you will then be prompted for a threshold to form the posterior probability maps. By default this is one standard deviation of the prior distribution. The resulting PPM is treated in exactly the same way as the SPM, with the exception of P value adjustments that are required only in a classical context. Inferences based upon PPMs will generally have higher face validity, in that they refer to activation effects or differences among cohorts that exceed a meaningful size. Furthermore, by thresholding the PPM appropriately one can infer that activation did not occur. This is important in terms of characterising treatment effects or indeed true functional segregation.

### Software implementation

The expected variance components over voxels attributable to error and parameter variation at the second level (i.e. voxels) are estimated using ReML overall intracranial voxels. `spm_spm_Bayes.m` then revisits each voxel in turn to compute conditional parameter estimates and voxel-specific hyper-parameters. The conditional variance is computed using a first-order Taylor expansion about the expectation of the hyperparameters over voxels.

## Inference based on False Discovery rate

In addition to the Gaussian field correction 'adjusted' p values are now provided based on FDR. For a given threshold on n SPM, the False Discovery Rate is the proportion of supra-threshold voxels which are false positives.

Recall that the thresholding of each voxel consists of a hypothesis test, where the null hypothesis is rejected if the statistic is larger than threshold. In this terminology, the FDR is the proportion of rejected tests where the null hypothesis is actually true. A FDR procedure produces a threshold that controls the expected FDR at or below q. In comparison, a traditional multiple comparisons procedure (e.g. Bonferroni or random field correction) controls Family-wise Error rate (FWE) at or below alpha. FWER is the chance of one or more false positives anywhere (not just among supra-threshold voxels). If there is truly no signal in the image anywhere, then a FDR procedure controls FWER, just as Bonferroni and random field methods do. (Precisely, controlling E(FDR) yields weak control of FWE). If there is some signal in the image, a FDR method will be more powerful than a traditional method.

For careful definition of FDR-adjusted p-values (and distinction between corrected and adjusted p-values) see Yekutieli & Benjamini (1999).

## Dynamic Causal Modelling

Dynamic causal modelling is the final option in the data analysis stream and enables inferences about inter-regional coupling. It is based upon a generalised convolution model of how experimentally manipulated inputs evoked changes in particular cortical areas and how these changes induced changes elsewhere in the brain. The underlying model is an input-state-output dynamical model that incorporates the hemodynamic model described in the literature. Critically, the most important [coupling] parameters of this model are connection strengths among pre-selected volumes of interest. Conditional estimates of these connection strengths are derived using a posterior mode analysis that rests upon exactly the same EM algorithm used in the previous sections. In other words, the algorithm performs a gradient assent on the log posterior probability of the connection strengths given the data. The utility of dynamic causal modelling is that it enables inferences about connection strengths and the modulation of connection strengths by experimentally defined effects. For example, one can make inferences about the existence of effective connections from posterior parietal cortex to V5 and how attention increases or decreases this effective connectivity. Interestingly, if one took the dynamic causal model, focussed on a single region and linearised it through a first-order Taylor expansion, one would end up with exactly the same convolution model used in conventional analyses with a hemodynamic response function and various partial derivatives. This means that the standard approach to fMRI time-series is a special case of a dynamic causal model. In contradistinction to previous approaches to effective connectivity, dynamic causal modelling views all measured brain responses as evoked by experimental design. The only noise or stochastic component is observation error. This contrasts with structural equation modelling, and other variants, where it is assumes that the dynamics are driven by unobservable noise. We regard dynamic causal modelling (DCM) as a much more plausible approach to effective connectivity given fMRI data from designed experiments.

### User specification

To specify s DCM, first extract the required volumes of interest using VOI in the results section. These are stored in VOI.mat files. You will be requested to select from these files. In addition you will be asked to specify the experimental inputs or causes that are exactly the same as those used to construct the original design matrix. Once the program knows the number of regions, and experimental inputs it has to accommodate, it will ask you to specify the connectivity structure, first in terms of intrinsic connections and then in terms of those that are modulated by each input. Once the specification is complete the model parameters will be estimated and can be reviewed using the DCM results section. Inferences for DCM are based upon posterior probabilities and require no correction for multiple comparisons.

### Software implementation

The identification of a DCM uses a generic set of routines for non-linear system identification under Gaussian assumptions. Essentially, these can be regarded as Gauss-Newton assent upon the log of the posterior probability. This provides the maximum aposteriori or posterior mode and can therefore be regarded as a posterior mode analysis using a Gaussian approximation to the posterior density. The posterior probabilities are based upon fully Bayesian priors that, in turn, reflect natural constraints upon the system that ensure it does not exponentially diverge. Priors on the biophysical parameters of the hemodynamic response per se were determined using previous empirical studies. See `spm_dcm_ui.m` and `spm_nlsi.m` for operational details.

## Hemodynamic modelling

This, from a technical point of view, is exactly the same as DCM but is restricted to a single voxel or region. In this instance the interesting parameters are the connection of the experimentally designed inputs or causes to flow-inducing state variables of the hemodynamic model. The posterior distributions of these connections or efficacies are provided to enable inferences about the ability of a particular input or compound of causes to evoke a response.

### User specification

Within the results section, having placed the cursor over the volume of interest, you will be asked to specify which experimental causes you want to enter into the model. The computation then proceeds automatically and a separate figures window will show the results.

### Software implementation

This is a streamlined version of the DCM. See `spm_hdm_ui.m` for details.

## Hemodynamic deconvolution

There is an option in the main menu to form PPIs from fMRI data. PPIs are psychophysiological or physiophysiological interaction terms that are usually entered into a linear model as bilinear terms. Because the interaction takes place at a neuronal level (as opposed to hemodynamic it is necessary to deconvolve the observed fMRI time-series to estimate the underlying neuronal responses. This deconvolution uses the same EM procedure and hierarchical principles used by the Bayesian analyses above. It can be regarded as Weiner filtering using empirically determined priors on the frequency structure of the neuronal signal.

## Bias Correction

A module for bias correcting MR images has been incorporated into SPM2. Applying bias correction to MR images may allow more accurate spatial registration (or other processing) to be achieved with images corrupted by smooth intensity variations. The correction model is non-parametric, and is based on minimising a function related to the entropy of the image intensity histogram. A number of approaches already exist that minimise the entropy of the intensity histogram of the log-transformed image. This approach is slightly different in that the cost function is based on histograms of the original intensities, but includes a correction so that the solution is not biased towards a uniform scaling of all zeros.

## fMRI Unwarping

In addition to realigning (transforming the images according to a rigid body model) the time series there is the option of modelling the residual movement related variance in an EPI (fMRI) time series that can be explained by a model for suceptibility-by-movement interactions. This is given by the "Realign & Unwarp" option.

Susceptibility artefacts in EPI time series is a consequence of the "field disturbances" caused by the presence of an object in the field. Susceptibility is a property of a material, and can be thought of as the "resistance" put up by that material against being magnetised. At the interface between materials with different susceptibility rather severe field disturbancies will ensue. I.e. the field is no longer nice and homogenous. Since varying field strenght (gradients) is used to encode position in MRI, these disturbances cause spatial misplacement of RF signal, i.e. geometric distortions. Because of the low bandwidth in the phase encode direction these distortions will be appreciable mainly in that direction (typically the Ant-Post direction). The distortions are noticable mainly near air-tissue interfaces inside the scull, i.e. near the sinuses and the auditory canals.

In these areas in particular the observed image is a severly warped version of reality, much like a funny mirror at a fair ground. When one moves in front of such a mirror ones image will distort in different ways and ones head may change from very elongated to seriously flattened. If we were to take digital snapshots of the reflection at these different positions it is rather obvious that realignment alone will not suffice to bring them into a common space.

The situation is similar with EPI images, and an image collected for a given subject position will not be identical to that collected at another. We call this effect suscebtibility-by-movement interaction. The "Unwarp" toolbox is predicated on the assumption that the suscebtibility-by-movement interaction is responsible for a sizeable part of residual movement related variance.

Assume that we know how the deformations change when the subject changes position (i.e. we know the derivatives of the deformations with respect to subject position). That means that for a given time series and a given set of subject movements we should be able to predict the "shape changes" in the object and the ensuing variance in the time series. It also means that, in principle, we should be able to formulate the inverse problem, i.e. given the observed variance (after realignment) and known (estimated) movements we should be able to estimate how deformations change with subject movement.

We have made an attempt at formulating such an inverse model, and at solving for the "derivative fields". A deformation field can be thought of as little vectors at each position in space showing how that particular location has been deflected. A "derivative field" is then the rate of change of those vectors with respect to subject movement. Given these "derivative fields" we should be able to remove the variance caused by the suscebtibility-by-movement interaction. Since the underlying model is so restricted we would also expect experimentally induced variance to be preserved.