Multimodal, Multisubject data fusion¶

Preprocessing M/EEG data¶

We will start by creating pipelines (using SPM’s batch interface) for preprocessing the M/EEG data for a single subject, and then scripting these pipelines to repeat over multiple subjects. For no particular reason, we will start with Subject 15.

Convert (and epoch)¶

The first step is to convert raw M/EEG data from its native format (which depends on the acquisition system) to the format used by SPM. In the present example, the raw data are continuous. They can be converted to continuous SPM data, but to save disk space and time, we can “cut out” time windows (epochs) around the trials during the conversion step, so that the resulting SPM file contains epoched data.

In the batch editor, select SPM on the top toolbar, and from the dropdown menu select M/EEG. At the top of the new dropdown menu, select “Conversion”. Once selected, the Module List on the left of the batch editor window will now list “Conversion” as the first (and only) step. Within the main, Current Module window will be listed several variables. The first variable listed is “File Name”. On the right hand of this pane, you will see “\(<\)-X”, this indicates that you need to update this field. To do so, click on “File Name”, this will then open up your current working directory. For this example we will be using the M/EEG data from subject number 15 (in the Sub15 directory). Select the file named run_01_sss.fif and press “Done”.

Many variables in the Current Module window have default values, but we need to change some of them. For example, we want to epoch during conversion, so highlight the “reading mode” and change from “continuous” to “epoched”. To epoch, we need to define the start and end of each trial. This information normally comes from a trigger channel containing pulses that encode the onset and type of trial. You can ask SPM to read this channel and select the relevant trial codes by hand. However, here we will use “trial definition” files that have been created in advance (given the complexity of the trigger codes in this dataset; contact authors for more information on trial definitions). So select “Trial definition file” under the “epoched” option, click on “Specify” and select the file run_01_trldef.mat. (The contents of this file conform to the FieldTrip format, with one row per trial, each with three columns, which represent the sample numbers from the start of the continuous recording for 1) onset of the epoch, 2) offset of the epoch and 3) pre-stimulus baseline period.) Each trial runs from -500ms to +1200ms relative to the stimulus onset. (In fact, we only really care about the period -100ms to +800ms in the analysis below, and will later “crop” the trials to this period, but we first need the extra 400ms at the start and end to provide “padding” to avoid “end-effects” in the wavelet analysis done later.)

Another change to the defaults is that we do not want to convert all channels in the original file (since many are extraneous), so will select a subset by their type. We first need to delete the default option to convert all channels. To do this, click “channel selection”, and scroll down until you can select the “Delete All(1)” option. Then click the “channel selection” again, but this time choose the option “New: Select channels by type”. This will add “Select channels by type” to the Current Module, and you will see “\(<\)-X” on the right hand side of this, indicating the need for user input. Click the “\(<\)-X” and then select “EEG” from the “Current Item” section. Repeat this process to additionally include the “MEGMAG” and “MEGPLANAR” channels.

The remaining options for conversion can be left with their default values (which includes the output filename, which defaults to the input filename, simply prepended with spmeeg_). Once all variables are specified, the “play” button on the top toolbar will turn green and the batch could be run. However, for this example, we will continue to use the current batch editor window, so do not press the “play” button yet.

Prepare¶

The next step in the current pipeline is to update some other properties of the data using the “Prepare” module. This is a general-purpose “housekeeping” module that includes options like re-defining channel names, types, locations, etc. as specific to the particular laboratory set-up. In our case, some of channels currently labelled EEG were in fact used to record EOG.

Select “Prepare”, from the preprocessing menu. Highlight “Prepare” in the Module list, this will open up the variables in the current module window. Again we need to complete those variables indicated by the “\(<\)-X”. If we had already run the previous conversion stage, we could select the new spmeeg_run_01_sss.mat file produced by that stage as the input for this stage. Here however, we will create a pipeline in which all stages are run in one go, in which case we need to tell SPM that the output of the conversion step, even though not yet created, will be the input of this preparation step. You can do this by selecting the “Dependency” button located further down the window. This will open up a new window, listing all the processing steps up to this point. So far this is just one, the conversion step. Highlight this step and select “OK”.

The next variable to define is the “task(s)”. Clicking this variable will display a variety of options in the “current item” box. Within this, select “New: Set channel type”, then return to the current module window. In here, highlight “Channel selection”, which displays further variables within the current item window. Select “New: Custom channel”. Now return to the current module box and select the variable with an “\(<\)-X”. This should be “Custom channel”. Selecting this and clicking on “Specify” will open up a small text window. Within this, type EEG061. Create a new Custom channel and type EEG062; then select “new channel type” and highlight “EOG”. This is because channels EEG061 and EEG062 in fact represented bipolar horizontal (HEOG) and vertical (VEOG) electroculagrams respectively, which can be used to detect ocular artifacts.

This process then needs to be repeated for two other new channel types. First create a new channel type like above, label it EEG063 and set its channel type as “ECG”. This channel represents the electrocardiogram, which can be used to detect cardiac artifacts. Second, create another new channel labelled EEG064 and set its channel type as “other” (this is just a free-floating electrode, and does not contain data of interest).

One more job we need to do is specify “bad” channels. These only exist for the EEG (MEG bad channels are assumed to be corrected by the prior MaxFiltering step). Bad EEG channels were identified by hand by an experienced EEG operator and saved in a file called bad_channels.mat in each subject’s directory. It is important to note that there are many other ways that bad channels can be defined automatically by SPM, using the “artefact” module for example, but the various options for this are not explored in this chapter. To specify bad channels, add a second Prepare module, select the “New: Set/unset bad channels”, then under “Channel selection”, replace the default “All” with the option “New: Channel file”, and then select the bad_channels.mat file in this subject’s directory (note it is important that you delete the “All (1)” option first). In fact, this subject does not have any bad channels, so this step could be ignored if you only want to analyse this subject. But if you want to use the preprocessing script that we will use below for the remaining 15 subjects, you need to set-up this stage now (i.e, the bad_channels.mat files for the other subjects are not empty).

This will complete the preparation stage, as indicated by the “play” button turning green. (Note that you can also use this “Prepare” option interactively, in order, for example, to create and save a bad channel selection file).

Downsample¶

The data were sampled at 1000Hz, but for the analyses below, we rarely care about frequencies above 100Hz. So to save processing time and disk space (by a factor of 5) we can therefore downsample the data to 200Hz (which includes a lowpass filtering to avoid aliasing). Select “downsampling” from the module list “SPM – M/EEG – Preprocessing”, click on “File Name”, select “Dependency” and in the pop-up window, select the prepared datafile at the bottom of the list. Next, set the downsampling rate by clicking the “New sampling rate” variable within the current module box. Type 200 into the pop-up window that appears and use “OK” to set this value. The output file of this stage will be prepended with a d.

Baseline Correction¶

The next step in this preprocessing pipeline is to baseline correct the data by subtracting from each channel the mean from -100ms to 0ms relative to the stimulus onset. Select “Baseline Correct” from under the “SPM – M/EEG – Preprocessing” menu. Select the input file name as being dependent on the output of the downsampling by selecting the “Dependency”. To define the baseline, highlight the “Baseline” button in the current module window and click on the “Specify” button. The baseline needs to be entered as a 1-by-2 array. Enter [-100 0] (units are milliseconds). The prefix of this file name will be b. Note, that to be on the safe side it might be a good idea to perform baseline correction before downsampling. The reason is that in the general case there might be large DC offsets in the data leading to prolonged ringing that even padding will not protect against. For our data, however, the order shown here works equally well.

Deleting intermediate steps (optional)¶

The four steps (modules) described above create a preprocessing pipeline for the data. If this pipeline is run straight away, there will be four different output files. If you are short of diskspace, you might want to delete some of the intermediate files. To do this, select “SPM” from the top toolbar of the batch editor window and choose “M/EEG – Other – Delete” several times. Then you will need to specify the File Names to delete. Highlight each “Delete” module and set the File Name as the output of the Prepare step using the “Dependency” button to delete any output from the conversion/prepare step onward. In fact, all processing steps up until the most recent step (Baseline-correction) can be deleted. To do this, right click on “Delete” in the Module List and select “Replicate Module”, but change the dependency to the downsampled file.

Creating a script for combining pipelines within a subject.¶

Once you have created a linear pipeline, you might want to repeat it on multiple sessions (runs) within a subject, or even across multiple subjects. In the present case, there were 6 independent MEG runs (separated only by a short period to give the subjects a rest), which can all be processed identically. One option would be to save the batch file, manually alter the “File Name” that is initially loaded into the batch editor, and repeat this process separately for each run. A more powerful approach is to create a script. To do this, select “File” from the Batch Editor window, and select “Save Batch and Script”. This will produce two files: a batch file (same as that created when you save a batch) but also a MATLAB script that calls that batch file. So if you call the batch file batch_preproc_meeg_convert, you will get a batch file called batch_preproc_meeg_convert_job.m and a script file called batch_preproc_meeg_convert.m.

The script file batch_preproc_meeg_convert.m will automatically be loaded into the MATLAB editor window, and should appear something like this:

% Conversion: File Name - cfg_files
% Conversion: Trial File - cfg_files
% Prepare: Channel file - cfg_files
nrun = X; % enter the number of runs here
jobfile = {'batch_preproc_meeg_convert_job.m'};
jobs = repmat(jobfile, 1, nrun);
inputs = cell(3, nrun);
for crun = 1:nrun
    inputs{1, crun} = MATLAB_CODE_TO_FILL_INPUT; % Conversion: File Name - cfg_files
    inputs{2, crun} = MATLAB_CODE_TO_FILL_INPUT; % Conversion: Trial File - cfg_files
    inputs{3, crun} = MATLAB_CODE_TO_FILL_INPUT; % Prepare: Channel file - cfg_files
end
spm('defaults', 'EEG');
spm_jobman('run', jobs, inputs{:});

At the top of this script is listed the variable nrun = X: replace X with 6 for the six runs you wish to convert. You also need to complete the missing MATLAB code needed for each run: here, 1) the raw input file to convert, 2) the trial definition file for that run, and 3) the channel file containing the bad channels (which is actually the same for each run, but differs across subjects, which will matter later when we extend the script to loop over subjects). In order to automate selection of these files, you need to know some basic MATLAB . For example, because the files are named systematically by run, we can complete the relevant lines of the script with:

inputs{1, crun} = cellstr(fullfile(rawpth,'Sub15','MEEG',sprintf('run_%02d_sss.fif',crun)));
     inputs{2, crun} = cellstr(fullfile(rawpth,'Sub15','MEEG','Trials',sprintf('run_%02d_trldef.mat',crun)));
     inputs{3, crun} = cellstr(fullfile(rawpth,'Sub15','MEEG','bad_channels.mat'));

where rawpth refers to your base directory, Sub15 is the subject directory that contains MEG and Trials sub-directories (assuming you mirrored the directory structure from the FTP site) and %02d refers to the run number, expressed as two digits.

The other file – the batch or “job” file – can be reloaded into the batch editor at any time. It can also be viewed in the MATLAB editor. If you type edit batch_preproc_meeg_convert_job.m, you will see your selections from the earlier GUI steps. But we need to change those selections that depend on the run number (so they can be passed from the script instead). To make the batch file accept variables from the script file, we need change three of the specifications to ’<UNDEFINED>’ instead. So edit the following lines so they read:

matlabbatch{1}.spm.meeg.convert.dataset = '<UNDEFINED>';
     matlabbatch{1}.spm.meeg.convert.mode.epoched.trlfile = '<UNDEFINED>';
     ...
     matlabbatch{2}.spm.meeg.preproc.prepare.task{4}.setbadchan.channels{1}.chanfile = '<UNDEFINED>';

Then save these edits (overwriting the previous batch file). If you’re unsure, the batch file should look like the batch_preproc_meeg_convert_job.m file in the SPM12batch part of the SPMscripts directory on the FTP site.

This completes the first part of the preprocessing pipeline. You can then run this script by selecting the green play button on the upper toolbar of the script MATLAB Editor window. The results will be 6 files labelled bdspmeeg_run_%02d_sss.mat, where %02d refers to the run number 1-6. If you want to view any of these output files, press “Display” on the main SPM menu pane, select “M/EEG”, then select one of these files. You will be able to review the preprocessing steps as a pipeline from the “History” section of the “Info” tab, and can view single trials by selecting one of the EEG, MEG (magnetometer) or MPLANAR (gradiometer) tabs (see other chapters for how to use the buttons in the M/EEG Review window).

Merging (concatenating runs)¶

To analyse the data as one file, the six runs need to be merged. To do this, select “Merging” from “SPM – M/EEG – Preprocessing – Merging”, select “File Names”, “specify”, and select the 6 file names bdspmeeg_run_%02d_sss.mat. If you were to run this stage now, the output file would match the first input file, but be prepended with a c, i.e, cbdspmeeg_run_01_sss.mat. However, we will wait to add some more modules before running, as below. At this stage, you could also add delete modules to delete all the previous individual run files (since the concatenated file will contain all trials from all runs, i.e, contain the same data).

Prepare (a montage for re-referencing the EEG)¶

Below, we want to re-reference the EEG data to the average across channels (as is sometimes conventional for ERP analyses; note the MEG data have no reference). We can do this with the “montage” module below, which is a general purpose module for creating new channel data from linear combinations of existing channel data. However, we first need to create a montage file, which includes a matrix that, when multiplied by the existing data, creates the new channel data. There is another sub-function (task) of the “Prepare” module that does this, so add another “Prepare” module, select the dependency on the previous merged file as the “FileName”, but for the “task”, select “Create average reference montage” and enter avref_montage.mat as the output filename. (If you want to look at this montage, you can run this module, load avref_montage.mat into MATLAB and look at the montage.tra matrix, where you can see that each new EEG channel is equal to the old EEG channel minus the average of all other channels.) Note that this montage will differ across subjects because different subjects had different EEG channels marked as bad (from steps above), and bad channels need to be excluded when estimating the average EEG signal across channels.

At this point, we can also do one more bit of house-keeping within the same “Prepare” module, which is simply to re-order the condition labels. This only matters for the final stage of “Contrasting conditions” below, where the contrast weights assume a certain order of the conditions. The current order of conditions is based purely on the order they appear in the raw data (e.g, if the first few trials of the first run were: Scrambled, Unfamiliar, Unfamiliar, Scrambled, Famous..., then the condition labels will be ordered Scrambled-Unfamiliar-Famous), and this may vary across subjects. To set the condition order to be invariant across subjects, add a new task by selecting the “Sort conditions” task, then “Specify conditions lists” add three “New: Condition labels”, and name them “Famous”, “Unfamiliar” and “Scrambled” (in that order). Note that this operation does not physically reorder the trials at this stage, but just defines the order that will be used where required at later steps.

Montage¶

Now we have the montage file, we can apply it, in order to re-reference the EEG data to the average. Select “Montage” from the “Preprocessing” menu, and specify the “File Name” as being dependent on the output of the “Merging” module above. For the “Montage file name”, choose a different dependency, namely the output of the “Prepare” module above. Next, highlight “keep other channels” and select “yes” in the “Current Item” box, in order to keep all the MEG channels (which are unchanged). All other default values can remain the same. The output file will be prepended with M.

As with the previous pipeline, if you are short of diskspace (particularly if you later run all 16 subjects), the outputs produced from the intermediate stages can be deleted using the “SPM – M/EEG – Other – Delete” function (see earlier). However, refrain from deleting the montaged data, as these will be used later in the Section on Time-Frequency analysis.

Save batch and review.¶

This completes the second part of the preprocessing pipeline. At this point, you can run the batch. Alternatively, you can save and script, and run the batch from a script. The resulting script can also be combined with the previous script created (e.g., in the SPMscripts FTP directory, scripts for all the stages are appended into a single master script called master_script.m, which loops over each subject too). Remember that, if you want to pass variables from a script to a batch, you need to first ensure the relevant fields in the batch file are set to ’<UNDEFINED>’ (see for example the batch_preproc_meeg_merge_job.m file in the SPM12batch FTP directory).

To view the output, press “Display” on the main SPM menu pane, select “M/EEG”, then select Mcbdspmeeg_run_01_sss.mat. Again, you will be able to review the preprocessing steps from the “History” section of the “Info” tab.