# Geoffrey Aguirre Lab

### Site Tools

public:basic_information_on_the_glm

### What is the GLM?

The general linear model is a univariate statistical approach that relates a set of predictor covariates (a model) to a data vector using least-squares estimation. The parameters and errors that result are the basis of such common statistics as the t-test, Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA), regression analysis, and many multivariate tests. The GLM may be expressed as:

`Y = Gβ + ε`

Where Y is a vector of n data points, G is an n x m design matrix, β is a vector of m parameters, and ε is a vector of n errors. The creation of a GLM analysis of fMRI data in VoxBo centers upon the development of the design matrix, G.

There are several excellent texts that introduce the concepts of linear analysis in general, and several journal articles that consider the application of these methods to functional imaging data (Friston et al., 1995).

Within IDL, this expression solves for β given Y and G:

```Betas = (invert(transpose(G)##G)##transpose(G))##Y
Residuals = Y-G##Betas
ErrorSquared = total(transpose(Residuals)##Residuals)```

The corresponding MATLAB code is:

```function [Betas,ErrorSq,Residuals] = getBetas(G,Vec)
% [Betas,ErrorSq,Residuals] = getBetas(G,Vec)
%
% given a G matrix and a vector, gives the betas
% fitting the G matrix to that vector, and the errorsq
%
% 2007 ddrucker@psych.upenn.edu adapted from gka

OrderG=length(Vec);

Fac1 = (inv(G'*G)*G');
Betas = Fac1*Vec;
R=(-1).*(G*Fac1);
R((([1:OrderG]-1)*(OrderG+1))+1)=1+R((([1:OrderG]-1)*(OrderG+1))+1);  %matlab starts indexing at 1
Residuals=R*Vec;
ErrorSq=(Residuals'*Residuals);```

### What is the modified GLM?

One of the assumptions of the GLM is that the data are independent (i.e., uncorrelated) under the null-hypothesis. There is good evidence that fMRI data do not satisfy this assumption (i.e., they are “colored”) (Zarahn et al., 1997a). As a consequence, parametric (Aguirre et al., 1997; Zarahn et al., 1997a) and non-parametric (Aguirre et al., 1998a) analysis methods that assume independence do not provide adequate control of the false-positive rate. In 1995, Worsley and Friston advocated the use of a modified GLM that can accommodate serial correlation in the error terms (Worsley and Friston, Analysis of fMRI Time-Series Revisited - Again, 1995). This is the statistical approach implemented in VoxBo.

### Overview of the analysis process

The first step is to prepare the dependent data for analysis by pre-processing. In most cases, this involves converting the raw scanner files into TES files and performing a series of processing steps (e.g., distortion correction, realignment). If data are to be pooled across subjects using a fixed-effects analysis, conversion of the TES files to a standard spatial frame is necessary. Also, it may be desirable to smooth the dependent data in space prior to analysis.

The second step (usually) is to create condition functions. These text files contain a list of values that code the state of the experiment at each data point (image). As is discussed below, these files may be created for each TES file or a composite condition function may be made.

The third step is to start IDL and use the GLM software to design a modified GLM. There are six primary steps here, including the selection of a location for the analysis, identification of the dependent data, design of the G matrix, defining endogenous filtering and defining an intrinsic noise model. These steps, in particular the design of the G matrix, are described in some detail below.

The fourth step begins when the analysis job is submitted to the VoxBo job scheduler. The analysis will proceed automatically from here and you will receive e-mail when it is complete. You may monitor the status of your analysis job using the VoxQ job tool.

### What are the dependent data of a VoxBo GLM?

In general, TES files contain the four-dimensional (three in space, one in time) data that are acquired from the scanner. However, the GLM only ever considers a single-time series at a time (i.e., it is a univariate statistic). Generally, an analysis of a series of TES files involves obtaining the parameter estimates and residual error for the time-series from each point (voxel) in the brain. Notably, a given model can be applied to any vector of data of the appropriate length. Thus, one might also average together the time-series data over several voxels (perhaps within an a priori region of interest) and use this average fMRI signal as the dependent data.

### What are all these files that are created when I run a GLM?

In the process of conducting a GLM analysis, VoxBo places a number of files within the directory named by the user. Here is a thumb-nail sketch of the contents of a GLM directory, listed roughly in the order that they are created:

FileDescription
.subA text file that lists the full path(s) to the TES file(s) that contain the dependent data for the analysis. Update this file if you move your data to another location.
.refThe concatenated condition function. This text file contains as many values as there are observations in the analysis (i.e., order G). Each value codes for the experimental state during that observation (e.g., flash of light, ITI, button press, etc.). This file is used, for example, to determine the behavior of trial-averaging.
.GThe design matrix. This is a two-dimensional array with the number of columns equal to the number of independent variables in the design, and the number of rows equal to the number of observations in the dependent data. The text header of this file contains a list of the parameters.
.preGA high-temporal resolution version of the “G” design matrix. This is the internal representation used by the “Design G Workshop” software, with a temporal resolution of (usually) 100msecs. This representation is down-sampled to produce the G matrix prior to its use in analysis.
.ExoFiltTime-domain representation of the exogenous filtering to be performed upon data. This may include high and low pass notch filters, and/or a representation of the IRF. Only the magnitude component is used (i.e., this kernel should contain no phase information).
.IntrinCorTime-domain representation of intrinsic noise under the null hypothesis (e.g. fft1 of the square root of the 1/f power spectrum). The mean of the signal must be zero. All phase information in the signal will be discarded.
.KThe identity matrix, convolved with a model of 1) intrinsic temporal autocorrelation under the null-hypothesis and 2) any exogenous smoothing to be applied to the data.
.KGThe G matrix times the K matrix. It contains covariates that reflect both endogenous and exogenous smoothing. Note that the G matrix is de-convolved by the representation of the intrinsic noise model prior to the creation of KG. KG=K•G
.VThe variance-covariance structure of the errors. V=K•KT
.F1, .F3Factors of the covariance matrix of the parameter estimates. F1 is the pseudo-inverse* of KG. F1=(KGT•KG)-1•KGT. F3=V•KG•(KGT•KG)-1
.RThe residual forming matrix. R=I-(KG•F1) (where I is the identity matrix)
.RV, .RVRVMatrices related to the variance-covariance structure of the residuals. These are temporary files that are automatically deleted when the analysis is complete. RV=R•V. RVRV=RV•RV
.tracesScalars that contribute to the calculation of the effective degrees of freedom of the analysis. They are used to render the sum-of-squares of the residuals an unbiased estimate of the true error. TraceRV=trace(RV). TraceRVRV=trace(RVRV) where trace(•) indicates the sum of the diagonal elements
.prmThe parameter estimates (and error term of the model) for the parameters of interest for each voxel in the data set. This file, which is four-dimensional, is effectively a TES file.

* Note that Matlab has a pseudo-inverse function, pinv, which calculates the Moore-Penrose pseudo-inverse of a matrix. This is not the same pseudo-inverse.

### Calculation of a t-value

Within the modified GLM, a t-statistic is calculated as follows (in IDL):

```Betas = (invert(transpose(G)##G)##transpose(G))##Y
Residuals = Y-G##Betas
ErrorSquared = total(transpose(Residuals)##Residuals)
Fact = float(ContrastArray##F1##F3##transpose(ContrastArray))
Fact = Fact(0)
ScaledError = sqrt(ErrorSquared*Fact)
tval = (ContrastArray##transpose(Betas)) / ScaledError```

where `ContrastArray` is a vector describing how the beta values are to be combined for the test of interest.

/var/www/html/aguirreg/html/wiki/data/pages/public/basic_information_on_the_glm.txt · Last modified: 2011/01/21 15:33 by aguirreg

### Page Tools 