Renamed fetch_data.ipynb and updated docs

fd5b3b2e · Sean Fitzgibbon · 969f6080 · fd5b3b2e
Commit fd5b3b2e authored 5 years ago by Sean Fitzgibbon
--- a/talks/matlab_vs_python/migp/fetch_data.ipynb
+++ b/talks/matlab_vs_python/migp/fetch_data.ipynb
@@ -4,15 +4,29 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Fetch Data\n",
+    "# MIGP\n",
    "\n",
-    "This notebook will download an open fMRI dataset (~50MB) for use in the MIGP demo. It also regresses confounds from the data and performs spatial smoothing with 10mm FWHM.\n",
+    "For group ICA,  `melodic` uses multi-session temporal concatenation. This will perform a single 2D ICA run on the concatenated data matrix (obtained by stacking all 2D data matrices of every single data set on top of each other).\n",
    "\n",
-    "This data is a derivative from the COBRE sample found in the International Neuroimaging Data-sharing Initiative (http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html), originally released under Creative Commons - Attribution Non-Commercial.\n",
+    "![temporal concatenation](concat_diag.png)\n",
    "\n",
-    "It comprises 10 preprocessed resting-state fMRI selected from 72 patients diagnosed with schizophrenia and 74 healthy controls (6mm isotropic, TR=2s, 150 volumes).\n",
+    "This results in **high-dimensional** datasets!\n",
+    "\n",
+    "Furthermore, with ICA we are typically only interested in a comparatively low-dimensional decomposition so that we can capture spatially extended networks.\n",
+    "\n",
+    "Therefore the first step is to reduce the dimensionality of the data.  This can be achieved in a number of ways, but `melodic`, by default, uses `MIGP`.\n",
+    "\n",
+    "> MIGP is an incremental approach that aims to provide a very close approximation to full temporal concatenation followed by PCA, but without the large memory requirements *(Smith et al., 2014)*.\n",
    "\n",
-    "* [Download the data](#download-the-data)\n",
+    "Essentially, MIGP stacks the datasets incrementally in the temporal dimension, and whenever the temporal dimension exceeds a specified size, a PCA-based temporal reduction is performed.\n",
+    "\n",
+    "> MIGP does not increase at all in memory requirement with increasing numbers of subjects, no large matrices are ever formed, and the computation time scales linearly with the number of subjects. It is easily parallelisable, simply by applying the approach in parallel to subsets of subjects, and then combining across these with the same “concatenate and reduce” approach described above *(Smith et al., 2014)*.\n",
+    "\n",
+    "## This notebook\n",
+    "\n",
+    "This notebook will download an open fMRI dataset (~50MB) for use in the MIGP demo, regress confounds from the data, perform spatial smoothing with 10mm FWHM, and then run group `melodic` with `MIGP`.\n",
+    "\n",
+    "* [Fetch the data](#download-the-data)\n",
    "* [Clean the data](#clean-the-data)\n",
    "* [Run `melodic`](#run-melodic)\n",
    "* [Plot group ICs](#plot-group-ics)\n",
@@ -42,7 +56,11 @@
   "metadata": {},
   "source": [
    "<a class=\"anchor\" id=\"download-the-data\"></a>\n",
-    "## Download the data\n",
+    "## Fetch the data\n",
+    "\n",
+    "This data is a derivative from the COBRE sample found in the International Neuroimaging Data-sharing Initiative (http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html), originally released under Creative Commons - Attribution Non-Commercial.\n",
+    "\n",
+    "It comprises 10 preprocessed resting-state fMRI selected from 72 patients diagnosed with schizophrenia and 74 healthy controls (6mm isotropic, TR=2s, 150 volumes).\n",
    "\n",
    "Create a directory in the user's home directory to store the downloaded data:\n",
    "\n",
@@ -248,13 +266,6 @@
    "# plot\n",
    "fig = map_plot(ics)"
   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
  }
 ],
 "metadata": {
@@ -273,7 +284,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.7.4"
+   "version": "3.7.6"
  }
 },
 "nbformat": 4,

 %% Cell type:markdown id: tags:

-# Fetch Data
+# MIGP

-This notebook will download an open fMRI dataset (~50MB) for use in the MIGP demo. It also regresses confounds from the data and performs spatial smoothing with 10mm FWHM.
+For group ICA,  `melodic` uses multi-session temporal concatenation. This will perform a single 2D ICA run on the concatenated data matrix (obtained by stacking all 2D data matrices of every single data set on top of each other).

-This data is a derivative from the COBRE sample found in the International Neuroimaging Data-sharing Initiative (http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html), originally released under Creative Commons - Attribution Non-Commercial.
+![temporal concatenation](concat_diag.png)

-It comprises 10 preprocessed resting-state fMRI selected from 72 patients diagnosed with schizophrenia and 74 healthy controls (6mm isotropic, TR=2s, 150 volumes).
+This results in **high-dimensional** datasets!
+
+Furthermore, with ICA we are typically only interested in a comparatively low-dimensional decomposition so that we can capture spatially extended networks.
+
+Therefore the first step is to reduce the dimensionality of the data.  This can be achieved in a number of ways, but `melodic`, by default, uses `MIGP`.
+
+> MIGP is an incremental approach that aims to provide a very close approximation to full temporal concatenation followed by PCA, but without the large memory requirements *(Smith et al., 2014)*.
+
+Essentially, MIGP stacks the datasets incrementally in the temporal dimension, and whenever the temporal dimension exceeds a specified size, a PCA-based temporal reduction is performed.
+
+> MIGP does not increase at all in memory requirement with increasing numbers of subjects, no large matrices are ever formed, and the computation time scales linearly with the number of subjects. It is easily parallelisable, simply by applying the approach in parallel to subsets of subjects, and then combining across these with the same “concatenate and reduce” approach described above *(Smith et al., 2014)*.
+
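The "concatenate and reduce" loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the `melodic` implementation: the function names `migp` and `migp_reduce`, the parameter `n_keep`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def migp_reduce(W, n_keep):
    """Reduce W (time x voxels) to its n_keep strongest temporal components."""
    # Work with the small (time x time) matrix rather than a voxel-sized one
    vals, vecs = np.linalg.eigh(W @ W.T)
    top = vecs[:, np.argsort(vals)[::-1][:n_keep]]
    # Project onto the top temporal eigenvectors: result is (n_keep x voxels)
    return top.T @ W

def migp(subject_data, n_keep):
    """Incrementally stack subjects in time, reducing whenever W grows too tall."""
    W = None
    for Y in subject_data:
        Y = Y - Y.mean(axis=0)                      # demean each voxel time series
        W = Y if W is None else np.vstack([W, Y])   # temporal concatenation
        if W.shape[0] > n_keep:
            W = migp_reduce(W, n_keep)              # PCA-based temporal reduction
    return W

# Synthetic data (assumption): 4 subjects, 30 time points, 200 voxels,
# all generated from the same 5 spatial maps
rng = np.random.default_rng(0)
maps = rng.standard_normal((5, 200))
subjects = [rng.standard_normal((30, 5)) @ maps for _ in range(4)]

W = migp(subjects, n_keep=20)
print(W.shape)  # (20, 200)
```

The working matrix never holds more than `n_keep` rows plus one subject's time points, which is why memory use does not grow with the number of subjects.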
+## This notebook
+
+This notebook will download an open fMRI dataset (~50MB) for use in the MIGP demo, regress confounds from the data, perform spatial smoothing with 10mm FWHM, and then run group `melodic` with `MIGP`.

-* [Download the data](#download-the-data)
+* [Fetch the data](#download-the-data)
 * [Clean the data](#clean-the-data)
 * [Run `melodic`](#run-melodic)
 * [Plot group ICs](#plot-group-ics)

 Firstly we will import the necessary packages for this notebook:

 %% Cell type:code id: tags:

 ``` python
 from nilearn import datasets
 from nilearn import image
 from nilearn import plotting
 import nibabel as nb
 import numpy as np
 import os.path as op
 import os
 import glob
 import matplotlib.pyplot as plt
 ```

 %% Cell type:markdown id: tags:

 <a class="anchor" id="download-the-data"></a>
-## Download the data
+## Fetch the data
+
+This data is a derivative from the COBRE sample found in the International Neuroimaging Data-sharing Initiative (http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html), originally released under Creative Commons - Attribution Non-Commercial.
+
+It comprises 10 preprocessed resting-state fMRI selected from 72 patients diagnosed with schizophrenia and 74 healthy controls (6mm isotropic, TR=2s, 150 volumes).

 Create a directory in the user's home directory to store the downloaded data:

 > **NOTE:** `expanduser` will expand the `~` to the user's home directory:

 %% Cell type:code id: tags:

 ``` python
 data_dir = op.expanduser('~/nilearn_data')

 if not op.exists(data_dir):
    os.makedirs(data_dir)
 ```
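
As an aside, the existence check can be folded into the call itself with `exist_ok=True`. The sketch below uses a temporary directory (an assumption, so it does not touch the home directory) in place of `~/nilearn_data`.

```python
import os
import tempfile

# Temporary base directory standing in for the user's home directory
base = tempfile.mkdtemp()
data_dir = os.path.join(base, 'nilearn_data')

# exist_ok=True makes the call safe to repeat: no error if the directory exists
os.makedirs(data_dir, exist_ok=True)
os.makedirs(data_dir, exist_ok=True)

print(os.path.isdir(data_dir))  # True
```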

 %% Cell type:markdown id: tags:

 Download the data (if not already downloaded):

 > **Note:** We use a method from [`nilearn`](https://nilearn.github.io/index.html) called `fetch_cobre` to download the fMRI data

 %% Cell type:code id: tags:

 ``` python
 d = datasets.fetch_cobre(data_dir=data_dir)
 ```

 %% Cell type:markdown id: tags:

 <a class="anchor" id="clean-the-data"></a>
 ## Clean the data

 Regress confounds from the data and spatially smooth it with a Gaussian filter of 10mm FWHM.

 > **Note:**
 > 1. We use `clean_img` from the [`nilearn`](https://nilearn.github.io/index.html) package to regress confounds from the data
 > 2. We use `smooth_img` from the [`nilearn`](https://nilearn.github.io/index.html) package to spatially smooth the data
 > 3. `zip` takes iterables and aggregates their elements into tuples.  Here it is used to iterate through four lists simultaneously
 > 4. We use list comprehension to loop through all the filenames and append suffixes

 %% Cell type:code id: tags:

 ``` python
 # Create a list of filenames for cleaned and smoothed data
 clean = [f.replace('.nii.gz', '_clean.nii.gz') for f in d.func]
 smooth = [f.replace('.nii.gz', '_clean_smooth.nii.gz') for f in d.func]

 # loop through each subject, regress confounds and smooth
 for img, cleaned, smoothed, conf in zip(d.func, clean, smooth, d.confounds):
    print(f'{img}: regress confounds: ', end='')
    image.clean_img(img, confounds=conf).to_filename(cleaned)
    print('smooth.')
    # smooth the cleaned image so the output matches its '_clean_smooth' name
    image.smooth_img(cleaned, 10).to_filename(smoothed)
 ```
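
The list comprehension and `zip` pattern used above can be tried in isolation. The file names below are hypothetical stand-ins for `d.func` and `d.confounds`, so this runs without downloading anything.

```python
# Hypothetical file names standing in for d.func and d.confounds
func = ['sub-01.nii.gz', 'sub-02.nii.gz']
confounds = ['sub-01.tsv', 'sub-02.tsv']

# List comprehension: derive one output name per input name
clean = [f.replace('.nii.gz', '_clean.nii.gz') for f in func]

# zip pairs up the i-th element of each list, yielding one tuple per subject
triples = list(zip(func, clean, confounds))
for img, cleaned, conf in triples:
    print(img, '->', cleaned, 'using', conf)
```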

 %% Cell type:markdown id: tags:

 To run ```melodic``` we will need a brain mask in MNI152 space at the same resolution as the fMRI.

 > **Note:**
 > 1. We use `load_mni152_brain_mask` from the [`nilearn`](https://nilearn.github.io/index.html) package to load the MNI152 mask
 > 2. We use `resample_to_img` from the [`nilearn`](https://nilearn.github.io/index.html) package to resample the mask to the resolution of the fMRI
 > 3. We use `math_img` from the [`nilearn`](https://nilearn.github.io/index.html) package to binarize the resampled mask
 > 4. The mask is plotted using `plot_anat` from the [`nilearn`](https://nilearn.github.io/index.html) package

 %% Cell type:code id: tags:

 ``` python
 # load a single fMRI dataset (func0)
 func0 = nb.load(d.func[0].replace('.nii.gz', '_clean_smooth.nii.gz'))

 # load MNI152 brainmask, resample to func0 resolution, binarize, and save to nifti
 mask = datasets.load_mni152_brain_mask()
 mask = image.resample_to_img(mask, func0)
 mask = image.math_img('img > 0.5', img=mask)
 mask.to_filename(op.join(data_dir, 'brain_mask.nii.gz'))

 # plot brainmask to make sure it looks OK
 disp = plotting.plot_anat(image.index_img(func0, 0))
 disp.add_contours(mask, threshold=0.5)
 ```
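
The thresholding step is needed because resampling with interpolation can leave fractional values at the mask edge. A small NumPy sketch of the same `img > 0.5` rule, using made-up values rather than a real mask:

```python
import numpy as np

# Made-up voxel values, as might appear along a mask edge after interpolation
resampled = np.array([0.0, 0.2, 0.5, 0.7, 1.0])

# Same rule as math_img('img > 0.5', ...): strictly greater than 0.5 is "in"
binary = (resampled > 0.5).astype(np.uint8)
print(binary)  # [0 0 0 1 1]
```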

 %% Cell type:markdown id: tags:

 <a class="anchor" id="run-melodic"></a>
 ## Run ```melodic```

 Generate a command line string and run group ```melodic``` on the smoothed fMRI with a dimension of 10 components:

 > **Note**:
 > 1. Here we use python [f-strings](https://www.python.org/dev/peps/pep-0498/), formally known as literal string interpolation, which allow for easy formatting
 > 2. `op.join` will join path strings using the platform-specific directory separator
 > 3. `','.join(smooth)` will create a comma-separated string of all the items in the list `smooth`

 %% Cell type:code id: tags:

 ``` python
 # generate melodic command line string
 melodic_cmd = f"melodic -i {','.join(smooth)} --mask={op.join(data_dir, 'brain_mask.nii.gz')} -d 10 -v -o cobre.gica "
 print(melodic_cmd)
 ```
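
If any path might contain spaces, one option is to build the command as a list of arguments and quote each one when rendering it as a string. This is a sketch under assumptions: the paths are hypothetical, and only the `melodic` options already used above appear.

```python
import os.path as op
import shlex

# Hypothetical paths; note the space in the first file name
smooth = ['/data/sub 01_clean_smooth.nii.gz', '/data/sub-02_clean_smooth.nii.gz']
data_dir = '/data'

# One list element per argument, so spaces cannot split an argument in two
args = [
    'melodic',
    '-i', ','.join(smooth),
    f"--mask={op.join(data_dir, 'brain_mask.nii.gz')}",
    '-d', '10',
    '-v',
    '-o', 'cobre.gica',
]

# Quote each argument to get a shell-safe string for printing or pasting
cmd = ' '.join(shlex.quote(a) for a in args)
print(cmd)
```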

 %% Cell type:markdown id: tags:

 > **Note:**
 > 1. Here we use the `!` operator to execute the command in the shell
 > 2. The `{}` will expand the contained python variable in the shell

 %% Cell type:code id: tags:

 ``` python
 # run melodic
 ! {melodic_cmd}
 ```
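
The `!` operator only works inside IPython/Jupyter. In a plain Python script, `subprocess.run` does the same job. The sketch below runs a stand-in command (an assumption, so it works without FSL installed); in practice the list would hold the `melodic` arguments.

```python
import subprocess
import sys

# Stand-in command: a Python one-liner in place of the real melodic call
cmd = [sys.executable, '-c', 'print("melodic would run here")']

# check=True raises CalledProcessError if the command exits with a non-zero status
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout.strip())
```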

 %% Cell type:markdown id: tags:

 <a class="anchor" id="plot-group-ics"></a>
 ## Plot group ICs

 Now we can load and plot the group ICs generated by ```melodic```.

 This function will be used to plot ICs:

 > **NOTE:**
 > 1. Here we use `plot_stat_map` from the `nilearn` package to plot the orthographic images
 > 2. `subplots` from `matplotlib.pyplot` creates a figure and multiple subplots
 > 3. `find_xyz_cut_coords` from the `nilearn` package will find the image coordinates of the center of the largest activation connected component
 > 4. `zip` takes iterables and aggregates their elements into tuples.  Here it is used to iterate through two lists simultaneously
 > 5. `iter_img` from the `nilearn` package creates an iterator from an image that steps through each volume/time-point of the image

 %% Cell type:code id: tags:

 ``` python
 def map_plot(d):

    N = d.shape[-1]

    fig, ax = plt.subplots(int(np.ceil((N/2))),2, figsize=(12, N))

    for img, ax0 in zip(image.iter_img(d), ax.ravel()):
        coord = plotting.find_xyz_cut_coords(img, activation_threshold=3.5)
        plotting.plot_stat_map(img, cut_coords=coord, vmax=10, axes=ax0)

    return fig
 ```
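
The grid size in `map_plot` comes from rounding `N/2` up to a whole number of rows. The helper `grid_shape` below is hypothetical and only illustrates that arithmetic: with an odd number of components one axis is left over, and because `zip` stops at the shorter input, that axis is simply never drawn into.

```python
import math

def grid_shape(n_components, n_cols=2):
    """Rows and columns needed to fit n_components plots in n_cols columns."""
    return math.ceil(n_components / n_cols), n_cols

for n in (10, 7):
    rows, cols = grid_shape(n)
    print(n, (rows, cols), 'unused axes:', rows * cols - n)
```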

 %% Cell type:markdown id: tags:

 Hopefully you can see some familiar looking RSN spatial patterns:

 %% Cell type:code id: tags:

 ``` python
 # Load ICs
 ics = nb.load('cobre.gica/melodic_IC.nii.gz')

 # plot
 fig = map_plot(ics)
 ```
-
-%% Cell type:code id: tags:
-
-``` python
-```