%% Cell type:markdown id: tags:

# Modules and packages

Python gives you a lot of flexibility in how you organise your code. If you
want, you can write a Python program just as you would write a Bash script.
You don't _have_ to use functions, classes, modules or packages if you don't
want to, or if the script's task does not require them.

But when your code starts to grow beyond what can reasonably be defined in a
single file, you will (hopefully) want to start arranging it in a more
understandable manner.

For this practical we have prepared a handful of example files - you can find
them alongside this notebook file, in a directory called
`02_modules_and_packages/`.

## Contents

* [What is a module?](#what-is-a-module)
* [Importing modules](#importing-modules)
* [Importing specific items from a module](#importing-specific-items-from-a-module)
* [Importing everything from a module](#importing-everything-from-a-module)
* [Module aliases](#module-aliases)
* [What happens when I import a module?](#what-happens-when-i-import-a-module)
* [How can I make my own modules importable?](#how-can-i-make-my-own-modules-importable)
* [Modules versus scripts](#modules-versus-scripts)
* [What is a package?](#what-is-a-package)
* [`__init__.py`](#init-py)
* [Useful references](#useful-references)

%% Cell type:code id: tags:

```
import os
os.chdir('02_modules_and_packages')
```

%% Cell type:markdown id: tags:

<a class="anchor" id="what-is-a-module"></a>
## What is a module?

Any file ending with `.py` is considered to be a module in Python. Take a look
at `02_modules_and_packages/numfuncs.py` - either open it in your editor, or
run this code block:

%% Cell type:code id: tags:

```
with open('numfuncs.py', 'rt') as f:
    for line in f:
        print(line.rstrip())
```

%% Cell type:markdown id: tags:

This is a perfectly valid Python module, although not a particularly useful
one. It contains an attribute called `PI`, and a function `add`.

<a class="anchor" id="importing-modules"></a>
## Importing modules

Before we can use our module, we must `import` it. Importing a module in
Python will make its contents available in the local scope. We can import the
contents of `numfuncs` like so:

%% Cell type:code id: tags:

```
import numfuncs
```

%% Cell type:markdown id: tags:

This imports `numfuncs` into the local scope - everything defined in the
`numfuncs` module can be accessed by prefixing it with `numfuncs.`:

%% Cell type:code id: tags:

```
print('PI:', numfuncs.PI)
print(numfuncs.add(1, 50))
```

%% Cell type:markdown id: tags:

There are a couple of other ways to import items from a module...

<a class="anchor" id="importing-specific-items-from-a-module"></a>
### Importing specific items from a module

If you only want to use one, or a few items from a module, you can import just
those items - a reference to just those items will be created in the local
scope:

%% Cell type:code id: tags:

```
from numfuncs import add
print(add(1, 3))
```

%% Cell type:markdown id: tags:

<a class="anchor" id="importing-everything-from-a-module"></a>
### Importing everything from a module

It is possible to import _everything_ that is defined in a module like so:

%% Cell type:code id: tags:

```
from numfuncs import *
print('PI: ', PI)
print(add(1, 5))
```

%% Cell type:markdown id: tags:

__PLEASE DON'T DO THIS!__ Because every time you do, somewhere in the world, a
software developer will spontaneously stub his/her toe, and start crying.

Using this approach can make more complicated programs very difficult to read,
because it is not possible to determine the origin of the functions and
attributes that are being used.

And naming collisions are inevitable when importing multiple modules in this
way, making it very difficult for somebody else to figure out what your code
is doing:

%% Cell type:code id: tags:

```
from numfuncs import *
from strfuncs import *

print(add(1, 5))
```

%% Cell type:markdown id: tags:

Instead, it is better to give modules a name when you import them. While this
requires you to type more code, the benefits of doing this far outweigh the
hassle of typing a few extra characters - it becomes much easier to read and
trace through code when the functions you use are accessed through a namespace
for each module:

%% Cell type:code id: tags:

```
import numfuncs
import strfuncs

print('number add: ', numfuncs.add(1, 2))
print('string add: ', strfuncs.add(1, 2))
```

%% Cell type:markdown id: tags:

<a class="anchor" id="module-aliases"></a>
### Module aliases

And Python allows you to define an _alias_ for a module when you import it,
so you don't necessarily need to type out the full module name each time
you want to access something inside:

%% Cell type:code id: tags:

```
import numfuncs as nf
import strfuncs as sf

print('number add: ', nf.add(1, 2))
print('string add: ', sf.add(1, 2))
```

%% Cell type:markdown id: tags:

You have already seen this in the earlier practicals - here are a few
aliases which have become a de-facto standard for commonly used Python
modules:

%% Cell type:code id: tags:

```
import os.path as op
import numpy as np
import nibabel as nib
import matplotlib as mpl
import matplotlib.pyplot as plt
```

%% Cell type:markdown id: tags:

<a class="anchor" id="what-happens-when-i-import-a-module"></a>
### What happens when I import a module?

When you `import` a module, the contents of the module file are literally
executed by the Python runtime, exactly the same as if you had typed its
contents into `ipython`. Any attributes, functions, or classes which are
defined in the module will be bundled up into an object that represents the
module, and through which you can access the module's contents.

When we typed `import numfuncs` in the examples above, the following events
occurred:

1. Python created a `module` object to represent the module.

2. The `numfuncs.py` file was read and executed, and all of the items defined
   inside `numfuncs.py` (i.e. the `PI` attribute and the `add` function) were
   added to the `module` object.

3. A local variable called `numfuncs`, pointing to the `module` object,
   was added to the local scope.
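
You can convince yourself that the imported `numfuncs` really is just an
object by inspecting it - a quick sketch (the exact output will depend on
your setup):

> ```
> import numfuncs
> print(type(numfuncs))     # <class 'module'>
> print(numfuncs.__name__)  # numfuncs
> print(dir(numfuncs))      # lists PI, add, plus some built-in attributes
> ```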

Because module files are literally executed on import, any statements in the
module file which are not encapsulated inside a class or function will be
executed. As an example, take a look at the file `sideeffects.py`. Let's
import it and see what happens:

%% Cell type:code id: tags:

```
import sideeffects
```

%% Cell type:markdown id: tags:

Ok, hopefully that wasn't too much of a surprise. Something which may be less
intuitive, however, is that a module's contents will only be executed the
_first_ time that it is imported. After the first import, Python caches the
module's contents (all loaded modules are accessible through
[`sys.modules`](https://docs.python.org/3.5/library/sys.html#sys.modules)). On
subsequent imports, the cached version of the module is returned:

%% Cell type:code id: tags:

```
import sideeffects
```
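
%% Cell type:markdown id: tags:

If you have edited a module and really do want it to be re-executed, the
standard library provides `importlib.reload` for exactly this purpose - a
minimal sketch:

> ```
> import importlib
> importlib.reload(sideeffects)  # forces sideeffects.py to be executed again
> ```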

%% Cell type:markdown id: tags:

<a class="anchor" id="how-can-i-make-my-own-modules-importable"></a>
### How can I make my own modules importable?

When you `import` a module, Python searches for it in the following locations,
in the following order:

1. Built-in modules (e.g. `os`, `sys`, etc.).
2. In the current directory or, if a script has been executed, in the directory
   containing that script.
3. In directories listed in the `$PYTHONPATH` environment variable.
4. In installed third-party libraries (e.g. `numpy`).

If you are experimenting or developing your program, the quickest and easiest
way to make your module(s) importable is to add their containing directory to
the `PYTHONPATH`. But if you are developing a larger piece of software, you
should probably organise your modules into *packages*, which are [described
below](#what-is-a-package).
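
For example, to make modules in a (hypothetical) `~/my_python_modules`
directory importable, you could either set `PYTHONPATH` in your shell before
starting Python, or append the directory to `sys.path` at runtime - a rough
sketch:

> ```
> # From the shell, before starting Python:
> #   export PYTHONPATH=$HOME/my_python_modules:$PYTHONPATH
> #
> # Or from within Python:
> import sys
> import os.path as op
> sys.path.append(op.expanduser('~/my_python_modules'))
> ```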

<a class="anchor" id="modules-versus-scripts"></a>
## Modules versus scripts

You now know that Python treats all files ending in `.py` as importable
modules. But all files ending in `.py` can also be treated as scripts. In
fact, there is no difference between a _module_ and a _script_ - any `.py`
file can be executed as a script, or imported as a module, or both.

Have a look at the file `02_modules_and_packages/module_and_script.py`:

%% Cell type:code id: tags:

```
with open('module_and_script.py', 'rt') as f:
    for line in f:
        print(line.rstrip())
```

%% Cell type:markdown id: tags:

This file contains two functions `mul` and `main`. The
`if __name__ == '__main__':` clause at the bottom is a standard trick in Python
that allows you to add code to a file that is _only executed when the module is
called as a script_. Try it in a terminal now:

> `python 02_modules_and_packages/module_and_script.py`

But if we `import` this module from another file, or from an interactive
session, the code within the `if __name__ == '__main__':` clause will not be
executed, and we can access its functions just like any other module that we
import.
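
The general shape of a file that follows this pattern looks something like
the sketch below (this is only an illustration of the idiom, not the exact
contents of `module_and_script.py`):

> ```
> import sys
>
> def mul(a, b):
>     return float(a) * float(b)
>
> def main(args=None):
>     if args is None:
>         args = sys.argv[1:]
>     print(mul(args[0], args[1]))
>
> if __name__ == '__main__':
>     main()
> ```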

%% Cell type:code id: tags:

```
import module_and_script as mas

a = 1.5
b = 3

print('mul({}, {}): {}'.format(a, b, mas.mul(a, b)))
print('calling main...')
mas.main([str(a), str(b)])
```

%% Cell type:markdown id: tags:

<a class="anchor" id="what-is-a-package"></a>
## What is a package?

You now know how to split your Python code up into separate files
(a.k.a. *modules*). When your code grows beyond a handful of files, you may
wish for more fine-grained control over the namespaces in which your modules
live. Python has another feature which allows you to organise your modules
into *packages*.

A package in Python is simply a directory which:

* Contains a special file called `__init__.py`
* May contain one or more module files (any other files ending in `.py`)
* May contain other package directories.

For example, the [FSLeyes](https://git.fmrib.ox.ac.uk/fsl/fsleyes/fsleyes)
code is organised into packages and sub-packages as follows (abridged):

> ```
> fsleyes/
>     __init__.py
>     main.py
>     frame.py
>     views/
>         __init__.py
>         orthopanel.py
>         lightboxpanel.py
>     controls/
>         __init__.py
>         locationpanel.py
>         overlaylistpanel.py
> ```

Within a package structure, we will typically still import modules directly,
via their full path within the package:

%% Cell type:code id: tags:

```
import fsleyes.main as fmain
fmain.fsleyes_main()
```

%% Cell type:markdown id: tags:

<a class="anchor" id="init-py"></a>
### `__init__.py`

Every Python package must have an `__init__.py` file. In many cases, this will
actually be an empty file, and you don't need to worry about it any more, apart
from knowing that it is needed. But you can use `__init__.py` to perform some
package-specific initialisation, and/or to customise the package's namespace.

As an example, take a look at the `02_modules_and_packages/fsleyes/__init__.py`
file in our mock FSLeyes package. We have imported the `fsleyes_main` function
from the `fsleyes.main` module, making it available at the package level. So
instead of importing the `fsleyes.main` module, we can just import the
`fsleyes` package:
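
Based on that description, the `__init__.py` file presumably contains
something along these lines (a sketch - open the file to see exactly what is
in it):

> ```
> from fsleyes.main import fsleyes_main
> ```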

%% Cell type:code id: tags:

```
import fsleyes
fsleyes.fsleyes_main()
```

%% Cell type:markdown id: tags:

<a class="anchor" id="useful-references"></a>
## Useful references

* [Modules and packages in Python](https://docs.python.org/3/tutorial/modules.html)
* [Using `__init__.py`](http://mikegrouchy.com/blog/2012/05/be-pythonic-__init__py.html)

%% Cell type:markdown id: tags:

# Object-oriented programming in Python

By now you might have realised that __everything__ in Python is an
object. Strings are objects, numbers are objects, functions are objects,
modules are objects - __everything__ is an object!

But this does not mean that you have to use Python in an object-oriented
fashion. You can stick with functions and statements, and get quite a lot
done. But some problems are just easier to solve, and to reason about, when
you use an object-oriented approach.

* [Objects versus classes](#objects-versus-classes)
* [Defining a class](#defining-a-class)
* [Object creation - the `__init__` method](#object-creation-the-init-method)
* [Our method is called `__init__`, but we didn't actually call the `__init__` method!](#our-method-is-called-init)
* [We didn't specify the `self` argument - what gives?!?](#we-didnt-specify-the-self-argument)
* [Attributes](#attributes)
* [Methods](#methods)
* [Method chaining](#method-chaining)
* [Protecting attribute access](#protecting-attribute-access)
* [A better way - properties](#a-better-way-properties)
* [Inheritance](#inheritance)
* [The basics](#the-basics)
* [Code re-use and problem decomposition](#code-re-use-and-problem-decomposition)
* [Polymorphism](#polymorphism)
* [Multiple inheritance](#multiple-inheritance)
* [Class attributes and methods](#class-attributes-and-methods)
* [Class attributes](#class-attributes)
* [Class methods](#class-methods)
* [Appendix: The `object` base-class](#appendix-the-object-base-class)
* [Appendix: `__init__` versus `__new__`](#appendix-init-versus-new)
* [Appendix: Monkey-patching](#appendix-monkey-patching)
* [Appendix: Method overloading](#appendix-method-overloading)
* [Useful references](#useful-references)

<a class="anchor" id="objects-versus-classes"></a>
## Objects versus classes

If you are versed in C++, Java, C#, or some other object-oriented language,
then this should all hopefully sound familiar, and you can skip to the next
section.

If you have not done any object-oriented programming before, your first step
is to understand the difference between *objects* (also known as
*instances*) and *classes* (also known as *types*).

If you have some experience in C, then you can start off by thinking of a
class as like a `struct` definition - a `struct` is a specification for the
layout of a chunk of memory. For example, here is a typical struct definition:

> ```
> /**
>  * Struct representing a stack.
>  */
> typedef struct __stack {
>     uint8_t capacity; /**< the maximum capacity of this stack */
>     uint8_t size;     /**< the current size of this stack */
>     void **top;       /**< pointer to the top of this stack */
> } stack_t;
> ```

Now, an *object* is not a definition, but rather a thing which resides in
memory. An object can have *attributes* (pieces of information), and *methods*
(functions associated with the object). You can pass objects around your code,
manipulate their attributes, and call their methods.

Returning to our C metaphor, you can think of an object as like an
instantiation of a struct:

> ```
> stack_t stack;
> stack.capacity = 10;
> ```

One of the major differences between a `struct` in C, and a `class` in Python
and other object oriented languages, is that you can't (easily) add functions
to a `struct` - it is just a chunk of memory. Whereas in Python, you can add
functions to your class definition, which will then be added as methods when
you create an object from that class.

Of course there are many more differences between C structs and classes (most
notably [inheritance](#inheritance), [polymorphism](#polymorphism), and
[access protection](#protecting-attribute-access)). But if you can understand
the difference between a *definition* of a C struct, and an *instantiation* of
that struct, then you are most of the way towards understanding the difference
between a *class*, and an *object*.

> But just to confuse you, remember that in Python, **everything** is an
> object - even classes!
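
You can see this for yourself by checking the type of any class - for
example:

> ```
> print(type(str))                # <class 'type'> - the str class is itself an object
> print(isinstance(str, object))  # True
> ```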

<a class="anchor" id="defining-a-class"></a>
## Defining a class

Defining a class in Python is simple. Let's take on a small project, by
developing a class which can be used in place of the `fslmaths` shell command.

%% Cell type:code id: tags:

```
class FSLMaths(object):
    pass
```

%% Cell type:markdown id: tags:

In this statement, we defined a new class called `FSLMaths`, which inherits
from the built-in `object` base-class (see [below](#inheritance) for more
details on inheritance).

Now that we have defined our class, we can create objects - instances of that
class - by calling the class itself, as if it were a function:

%% Cell type:code id: tags:

```
fm1 = FSLMaths()
fm2 = FSLMaths()

print(fm1)
print(fm2)
```

%% Cell type:markdown id: tags:

These objects are not of much use at this stage, though. Let's do some more
work.

<a class="anchor" id="object-creation-the-init-method"></a>
## Object creation - the `__init__` method

The first thing that our `fslmaths` replacement will need is an input image.
It makes sense to pass this in when we create an `FSLMaths` object:

%% Cell type:code id: tags:

```
class FSLMaths(object):
    def __init__(self, inimg):
        self.img = inimg
```

%% Cell type:markdown id: tags:

Here we have added a _method_ called `__init__` to our class (remember that a
_method_ is just a function which is defined in a class, and which can be
called on instances of that class). This method expects two arguments -
`self`, and `inimg`. So now, when we create an instance of the `FSLMaths`
class, we will need to provide an input image:

%% Cell type:code id: tags:

```
import nibabel as nib
import os.path as op

fpath = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')
inimg = nib.load(fpath)
fm = FSLMaths(inimg)
```

%% Cell type:markdown id: tags:

There are a couple of things to note here...

<a class="anchor" id="our-method-is-called-init"></a>
### Our method is called `__init__`, but we didn't actually call the `__init__` method!

`__init__` is a special method in Python - it is called when an instance of a
class is created. And recall that we can create an instance of a class by
calling the class in the same way that we call a function.

There are a number of "special" methods that you can add to a class in Python
to customise various aspects of how instances of the class behave. One of the
first ones you may come across is the `__str__` method, which defines how an
object should be printed (more specifically, how an object gets converted into
a string). For example, we could add a `__str__` method to our `FSLMaths`
class like so:

%% Cell type:code id: tags:

```
class FSLMaths(object):
    def __init__(self, inimg):
        self.img = inimg

    def __str__(self):
        return 'FSLMaths({})'.format(self.img.get_filename())

fpath = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')
inimg = nib.load(fpath)
fm = FSLMaths(inimg)

print(fm)
```

%% Cell type:markdown id: tags:

Refer to the [official
docs](https://docs.python.org/3/reference/datamodel.html#special-method-names)
for details on all of the special methods that can be defined in a class. And
take a look at the appendix for some more details on [how Python objects get
created](#appendix-init-versus-new).

<a class="anchor" id="we-didnt-specify-the-self-argument"></a>
### We didn't specify the `self` argument - what gives?!?

The `self` argument is a special argument for methods in Python. If you are
coming from C++, Java, C# or similar, `self` in Python is equivalent to `this`
in those languages.

In a method, the `self` argument is a reference to the object that the method
was called on. So in this line of code:

%% Cell type:code id: tags:

```
fm = FSLMaths(inimg)
```

%% Cell type:markdown id: tags:

the `self` argument in `__init__` will be a reference to the `FSLMaths` object
that has been created (and is then assigned to the `fm` variable, after the
`__init__` method has finished).

But note that you __do not__ need to explicitly provide the `self` argument
when you call a method on an object, or when you create a new object. The
Python runtime will take care of passing the instance to its method, as the
first argument to the method.

But when you are writing a class, you __do__ need to explicitly list `self` as
the first argument to all of the methods of the class.
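
One way to see what `self` is doing is to note that calling a method on an
instance is (for most purposes) equivalent to calling the function on the
class and passing the instance in explicitly - a quick illustration:

> ```
> # These two calls do the same thing - in the first form,
> # Python passes `fm` as the `self` argument for you.
> print(fm.__str__())
> print(FSLMaths.__str__(fm))
> ```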

<a class="anchor" id="attributes"></a>
## Attributes

In Python, the term __attribute__ is used to refer to a piece of information
that is associated with an object. An attribute is generally a reference to
another object (which might be a string, a number, or a list, or some other
more complicated object).

Remember that we modified our `FSLMaths` class so that it is passed an input
image on creation:

%% Cell type:code id: tags:

```
class FSLMaths(object):
    def __init__(self, inimg):
        self.img = inimg

fpath = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')
fm = FSLMaths(nib.load(fpath))
```

%% Cell type:markdown id: tags:

Take a look at what is going on in the `__init__` method - we take the `inimg`
argument, and create a reference to it called `self.img`. We have added an
_attribute_ to the `FSLMaths` instance, called `img`, and we can access that
attribute like so:

%% Cell type:code id: tags:

```
print('Input for our FSLMaths instance: {}'.format(fm.img.get_filename()))
```

%% Cell type:markdown id: tags:

And that concludes the section on adding attributes to Python objects.

Just kidding. But it really is that simple. This is one aspect of Python which
might be quite jarring to you if you are coming from a language with more
rigid semantics, such as C++ or Java. In those languages, you must pre-specify
all of the attributes and methods that are a part of a class. But Python is
much more flexible - you can simply add attributes to an object after it has
been created. In fact, you can even do this outside of the class
definition<sup>1</sup>:

%% Cell type:code id: tags:

```
fm = FSLMaths(inimg)
fm.another_attribute = 'Haha'
print(fm.another_attribute)
```

%% Cell type:markdown id: tags:

__But ...__ while attributes can be added to a Python object at any time, it is
good practice (and makes for more readable and maintainable code) to add all
of an object's attributes within the `__init__` method.

> <sup>1</sup>This is not possible with many of the built-in types, such as
> `list` and `dict` objects, nor with types that are defined in Python
> extensions (Python modules that are written in C).
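
> For example, attempting the same trick on a built-in `list` will fail with
> an `AttributeError`:
>
> ```
> l = [1, 2, 3]
> l.another_attribute = 'Haha'   # AttributeError!
> ```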

<a class="anchor" id="methods"></a>
## Methods

We've been dilly-dallying on this little `FSLMaths` project for a while now,
but our class still can't actually do anything. Let's start adding some
functionality:

%% Cell type:code id: tags:

```
class FSLMaths(object):
    def __init__(self, inimg):
        self.img = inimg
        self.operations = []

    def add(self, value):
        self.operations.append(('add', value))

    def mul(self, value):
        self.operations.append(('mul', value))

    def div(self, value):
        self.operations.append(('div', value))
```

%% Cell type:markdown id: tags:

Woah woah, [slow down egg-head!](https://www.youtube.com/watch?v=yz-TemWooa4)

We've modified `__init__` so that a second attribute called `operations` is
added to our object - this `operations` attribute is simply a list.

Then, we added a handful of methods - `add`, `mul`, and `div` - which each
append a tuple to that `operations` list.

> Note that, just like in the `__init__` method, the first argument that will
> be passed to these methods is `self` - a reference to the object that the
> method has been called on.

The idea behind this design is that our `FSLMaths` class will not actually do
anything when we call the `add`, `mul` or `div` methods. Instead, it will
*stage* each operation, and then perform them all in one go at a later point
in time. So let's add another method, `run`, which actually does the work:

%% Cell type:code id: tags:

```
import numpy as np
import nibabel as nib

class FSLMaths(object):
    def __init__(self, inimg):
        self.img = inimg
        self.operations = []

    def add(self, value):
        self.operations.append(('add', value))

    def mul(self, value):
        self.operations.append(('mul', value))

    def div(self, value):
        self.operations.append(('div', value))

    def run(self, output=None):
        data = np.array(self.img.get_data())

        for oper, value in self.operations:

            # Value could be an image.
            # If not, we assume that
            # it is a scalar/numpy array.
            if isinstance(value, nib.nifti1.Nifti1Image):
                value = value.get_data()

            if oper == 'add':
                data = data + value
            elif oper == 'mul':
                data = data * value
            elif oper == 'div':
                data = data / value

        # turn final output into a nifti,
        # and save it to disk if an
        # 'output' has been specified.
        outimg = nib.nifti1.Nifti1Image(data, self.img.affine)

        if output is not None:
            nib.save(outimg, output)

        return outimg
```

%% Cell type:markdown id: tags:

We now have a useable (but not very useful) `FSLMaths` class!

%% Cell type:code id: tags:

```
fpath = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')
fmask = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz')
inimg = nib.load(fpath)
mask = nib.load(fmask)
fm = FSLMaths(inimg)

fm.mul(mask)
fm.add(-10)

outimg = fm.run()

norigvox = (inimg .get_data() > 0).sum()
nmaskvox = (outimg.get_data() > 0).sum()

print('Number of voxels >0 in original image: {}'.format(norigvox))
print('Number of voxels >0 in masked image: {}'.format(nmaskvox))
```

%% Cell type:markdown id: tags:

<a class="anchor" id="method-chaining"></a>
## Method chaining

A neat trick, which is used by all the cool kids these days, is to write
classes that allow *method chaining* - writing one line of code which
calls more than one method on an object, e.g.:

> ```
> fm = FSLMaths(img)
> result = fm.add(1).mul(10).run()
> ```

Adding this feature to our budding `FSLMaths` class is easy - all we have
to do is return `self` from each method:

%% Cell type:code id: tags:

```
import numpy as np
import nibabel as nib

class FSLMaths(object):
    def __init__(self, inimg):
        self.img = inimg
        self.operations = []

    def add(self, value):
        self.operations.append(('add', value))
        return self

    def mul(self, value):
        self.operations.append(('mul', value))
        return self

    def div(self, value):
        self.operations.append(('div', value))
        return self

    def run(self, output=None):
        data = np.array(self.img.get_data())

        for oper, value in self.operations:

            # Value could be an image.
            # If not, we assume that
            # it is a scalar/numpy array.
            if isinstance(value, nib.nifti1.Nifti1Image):
                value = value.get_data()

            if oper == 'add':
                data = data + value
            elif oper == 'mul':
                data = data * value
            elif oper == 'div':
                data = data / value

        # turn final output into a nifti,
        # and save it to disk if an
        # 'output' has been specified.
        outimg = nib.nifti1.Nifti1Image(data, self.img.affine)

        if output is not None:
            nib.save(outimg, output)

        return outimg
```

%% Cell type:markdown id: tags:

Now we can chain all of our method calls, and even the creation of our
`FSLMaths` object, into a single line:

%% Cell type:code id: tags:

```
fpath = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')
fmask = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz')
inimg = nib.load(fpath)
mask = nib.load(fmask)

outimg = FSLMaths(inimg).mul(mask).add(-10).run()

norigvox = (inimg .get_data() > 0).sum()
nmaskvox = (outimg.get_data() > 0).sum()

print('Number of voxels >0 in original image: {}'.format(norigvox))
print('Number of voxels >0 in masked image: {}'.format(nmaskvox))
```

%% Cell type:markdown id: tags:

> In fact, this is precisely how the
> [`fsl.wrappers.fslmaths`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.wrappers.fslmaths.html)
> function works.
<a class="anchor" id="protecting-attribute-access"></a> <a class="anchor" id="protecting-attribute-access"></a>
## Protecting attribute access ## Protecting attribute access
In our `FSLMaths` class, the input image was added as an attribute called In our `FSLMaths` class, the input image was added as an attribute called
`img` to `FSLMaths` objects. We saw that it is easy to read the attributes `img` to `FSLMaths` objects. We saw that it is easy to read the attributes
of an object - if we have a `FSLMaths` instance called `fm`, we can read its of an object - if we have a `FSLMaths` instance called `fm`, we can read its
input image via `fm.img`. input image via `fm.img`.
But it is just as easy to write the attributes of an object. What's to stop But it is just as easy to write the attributes of an object. What's to stop
some sloppy research assistant from overwriting our `img` attribute? some sloppy research assistant from overwriting our `img` attribute?
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
inimg = nib.load(op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')) inimg = nib.load(op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz'))
fm = FSLMaths(inimg) fm = FSLMaths(inimg)
fm.img = None fm.img = None
fm.run() fm.run()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Well, the scary answer is ... there is __nothing__ stopping you from doing Well, the scary answer is ... there is __nothing__ stopping you from doing
whatever you want to a Python object. You can add, remove, and modify whatever you want to a Python object. You can add, remove, and modify
attributes at will. You can even replace the methods of an existing object if attributes at will. You can even replace the methods of an existing object if
you like: you like:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
fm = FSLMaths(inimg) fm = FSLMaths(inimg)
def myadd(value): def myadd(value):
print('Oh no, I\'m not going to add {} to ' print('Oh no, I\'m not going to add {} to '
'your image. Go away!'.format(value)) 'your image. Go away!'.format(value))
fm.add = myadd fm.add = myadd
fm.add(123) fm.add(123)
fm.mul = None fm.mul = None
fm.mul(123) fm.mul(123)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
But you really shouldn't get into the habit of doing devious things like But you really shouldn't get into the habit of doing devious things like
this. Think of the poor souls who inherit your code years after you have left this. Think of the poor souls who inherit your code years after you have left
the lab - if you go around overwriting all of the methods and attributes of the lab - if you go around overwriting all of the methods and attributes of
your objects, they are not going to have a hope in hell of understanding what your objects, they are not going to have a hope in hell of understanding what
your code is actually doing, and they are not going to like you very your code is actually doing, and they are not going to like you very
much. Take a look at the appendix for a [brief discussion on this much. Take a look at the appendix for a [brief discussion on this
topic](appendix-monkey-patching). topic](appendix-monkey-patching).
Python tends to assume that programmers are "responsible adults", and hence Python tends to assume that programmers are "responsible adults", and hence
doesn't do much in the way of restricting access to the attributes or methods doesn't do much in the way of restricting access to the attributes or methods
of an object. This is in contrast to languages like C++ and Java, where the of an object. This is in contrast to languages like C++ and Java, where the
notion of a private attribute or method is strictly enforced by the language. notion of a private attribute or method is strictly enforced by the language.
However, there are a couple of conventions in Python that are [universally However, there are a couple of conventions in Python that are
adhered [universally adhered to](https://docs.python.org/3/tutorial/classes.html#private-variables):
to](https://docs.python.org/3.5/tutorial/classes.html#private-variables):
* Class-level attributes and methods, and module-level attributes, functions, * Class-level attributes and methods, and module-level attributes, functions,
and classes, which begin with a single underscore (`_`), should be and classes, which begin with a single underscore (`_`), should be
considered __protected__ - they are intended for internal use only, and considered __protected__ - they are intended for internal use only, and
should not be considered part of the public API of a class or module. This should not be considered part of the public API of a class or module. This
is not enforced by the language in any way<sup>2</sup> - remember, we are is not enforced by the language in any way<sup>2</sup> - remember, we are
all responsible adults here! all responsible adults here!
* Class-level attributes and methods which begin with a double-underscore * Class-level attributes and methods which begin with a double-underscore
(`__`) should be considered __private__. Python provides a weak form of (`__`) should be considered __private__. Python provides a weak form of
enforcement for this rule - any attribute or method with such a name will enforcement for this rule - any attribute or method with such a name will
actually be _renamed_ (in a standardised manner) at runtime, so that it is actually be _renamed_ (in a standardised manner) at runtime, so that it is
not accessible through its original name (it is still accessible via its not accessible through its original name (it is still accessible via its
[mangled [mangled name](https://docs.python.org/3/tutorial/classes.html#private-variables)
name](https://docs.python.org/3.5/tutorial/classes.html#private-variables)
though). though).
> <sup>2</sup> With the exception that module-level fields which begin with a > <sup>2</sup> With the exception that module-level fields which begin with a
> single underscore will not be imported into the local scope via the > single underscore will not be imported into the local scope via the
> `from [module] import *` techinque. > `from [module] import *` technique.
So with all of this in mind, we can adjust our `FSLMaths` class to discourage So with all of this in mind, we can adjust our `FSLMaths` class to discourage
our sloppy research assistant from overwriting the `img` attribute: our sloppy research assistant from overwriting the `img` attribute:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
# remainder of definition omitted for brevity # remainder of definition omitted for brevity
class FSLMaths(object): class FSLMaths(object):
def __init__(self, inimg): def __init__(self, inimg):
self.__img = inimg self.__img = inimg
self.__operations = [] self.__operations = []
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
But now we have lost the ability to read our `__img` attribute: But now we have lost the ability to read our `__img` attribute:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
inimg = nib.load(op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')) inimg = nib.load(op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz'))
fm = FSLMaths(inimg) fm = FSLMaths(inimg)
print(fm.__img) print(fm.__img)
``` ```
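%% Cell type:markdown id: tags:
The attribute has not disappeared, though - it has just been renamed behind the
scenes. A minimal sketch (assuming the `FSLMaths` definition above) which peeks
at the mangled name:
> ```
> fm = FSLMaths(inimg)
>
> # the __img attribute is still there,
> # under its mangled name
> print(fm._FSLMaths__img.get_filename())
> ```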
%% Cell type:markdown id: tags:
<a class="anchor" id="a-better-way-properties"></a>
### A better way - properties
Python has a feature called
[`properties`](https://docs.python.org/3/library/functions.html#property),
which is a nice way of controlling access to the attributes of an object. We
can use properties by defining a "getter" method which can be used to access
our attributes, and "decorating" them with the `@property` decorator (we will
cover decorators in a later practical).
%% Cell type:code id: tags:
```
class FSLMaths(object):
    def __init__(self, inimg):
        self.__img = inimg
        self.__operations = []
    @property
    def img(self):
        return self.__img
```
%% Cell type:markdown id: tags:
So we are still storing our input image as a private attribute, but now we
have made it available in a read-only manner via the public `img` property:
%% Cell type:code id: tags:
```
fpath = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')
inimg = nib.load(fpath)
fm = FSLMaths(inimg)
print(fm.img.get_filename())
```
%% Cell type:markdown id: tags:
Note that, even though we have defined `img` as a method, we can access it
like an attribute - this is due to the magic behind the `@property` decorator.
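Because we have only defined a "getter" for `img`, the property is effectively
read-only - attempting to assign to it now raises an error. A quick sketch
(assuming the class definition above):
> ```
> fm = FSLMaths(inimg)
>
> # this will raise an AttributeError, because
> # no setter has been defined for the property
> fm.img = None
> ```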
We can also define "setter" methods for a property. For example, we might wish
to add the ability for a user of our `FSLMaths` class to change the input
image after creation.
%% Cell type:code id: tags:
```
class FSLMaths(object):
    def __init__(self, inimg):
        self.__img = None
        self.__operations = []
        self.img = inimg
    @property
    def img(self):
        return self.__img
    @img.setter
    def img(self, value):
        if not isinstance(value, nib.nifti1.Nifti1Image):
            raise ValueError('value must be a NIFTI image!')
        self.__img = value
```
%% Cell type:markdown id: tags:
> Note that we used the `img` setter method within `__init__` to validate the
> initial `inimg` that was passed in during creation.
Property setters are a nice way to add validation logic for when an attribute
is assigned a value. In this example, an error will be raised if the new input
is not a NIFTI image.
%% Cell type:code id: tags:
```
fpath = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')
inimg = nib.load(fpath)
fm = FSLMaths(inimg)
print('Input: ', fm.img.get_filename())
# let's change the input
# to a different image
fpath2 = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm_brain.nii.gz')
inimg2 = nib.load(fpath2)
fm.img = inimg2
print('New input: ', fm.img.get_filename())
print('This is going to explode')
fm.img = 'abcde'
```
%% Cell type:markdown id: tags:
<a class="anchor" id="inheritance"></a>
## Inheritance
One of the major advantages of an object-oriented programming approach is
_inheritance_ - the ability to define hierarchical relationships between
classes and instances.
<a class="anchor" id="the-basics"></a>
### The basics
My local veterinary surgery runs some Python code which looks like the
following, to assist the nurses in identifying an animal when it arrives at
the surgery:
%% Cell type:code id: tags:
```
class Animal(object):
    def noiseMade(self):
        raise NotImplementedError('This method must be '
                                  'implemented by sub-classes')
class Dog(Animal):
    def noiseMade(self):
        return 'Woof'
class TalkingDog(Dog):
    def noiseMade(self):
        return 'Hi Homer, find your soulmate!'
class Cat(Animal):
    def noiseMade(self):
        return 'Meow'
class Labrador(Dog):
    pass
class Chihuahua(Dog):
    def noiseMade(self):
        return 'Yap yap yap'
```
%% Cell type:markdown id: tags:
Hopefully this example doesn't need much in the way of explanation - this
collection of classes represents a hierarchical relationship which exists in
the real world (and also represents the inherently annoying nature of
chihuahuas). For example, in the real world, all dogs are animals, but not all
animals are dogs. Therefore in our model, the `Dog` class has specified
`Animal` as its base class. We say that the `Dog` class *extends*, *derives
from*, or *inherits from*, the `Animal` class, and that all `Dog` instances
are also `Animal` instances (but not vice-versa).
What does that `noiseMade` method do? There is a `noiseMade` method defined
on the `Animal` class, but it has been re-implemented, or *overridden* in the
`Dog`,
[`TalkingDog`](https://twitter.com/simpsonsqotd/status/427941665836630016?lang=en),
`Cat`, and `Chihuahua` classes (but not on the `Labrador` class). We can call
the `noiseMade` method on any `Animal` instance, but the specific behaviour
that we get is dependent on the specific type of animal.
%% Cell type:code id: tags:
```
d = Dog()
l = Labrador()
c = Cat()
ch = Chihuahua()
print('Noise made by dogs: {}'.format(d .noiseMade()))
print('Noise made by labradors: {}'.format(l .noiseMade()))
print('Noise made by cats: {}'.format(c .noiseMade()))
print('Noise made by chihuahuas: {}'.format(ch.noiseMade()))
```
%% Cell type:markdown id: tags:
Note that calling the `noiseMade` method on a `Labrador` instance resulted in
the `Dog.noiseMade` implementation being called.
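We can check these hierarchical relationships with the built-in `isinstance`
function - a quick sketch using the instances created above:
> ```
> print(isinstance(d, Animal))  # True  - all dogs are animals
> print(isinstance(l, Dog))     # True  - labradors are dogs
> print(isinstance(c, Dog))     # False - cats are not dogs
> print(isinstance(d, Cat))     # False - and dogs are not cats
> ```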
<a class="anchor" id="code-re-use-and-problem-decomposition"></a> <a class="anchor" id="code-re-use-and-problem-decomposition"></a>
### Code re-use and problem decomposition ### Code re-use and problem decomposition
Inheritance allows us to split a problem into smaller problems, and to re-use Inheritance allows us to split a problem into smaller problems, and to re-use
code. Let's demonstrate this with a more involved (and even more contrived) code. Let's demonstrate this with a more involved (and even more contrived)
example. Imagine that a former colleague had written a class called example. Imagine that a former colleague had written a class called
`Operator`: `Operator`:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class Operator(object): class Operator(object):
def __init__(self): def __init__(self):
super().__init__() # this line will be explained later super().__init__() # this line will be explained later
self.__operations = [] self.__operations = []
self.__opFuncs = {} self.__opFuncs = {}
@property @property
def operations(self): def operations(self):
return list(self.__operations) return list(self.__operations)
@property @property
def functions(self): def functions(self):
return dict(self.__opFuncs) return dict(self.__opFuncs)
def addFunction(self, name, func): def addFunction(self, name, func):
self.__opFuncs[name] = func self.__opFuncs[name] = func
def do(self, name, *values): def do(self, name, *values):
self.__operations.append((name, values)) self.__operations.append((name, values))
def preprocess(self, value): def preprocess(self, value):
return value return value
def run(self, input): def run(self, input):
data = self.preprocess(input) data = self.preprocess(input)
for oper, vals in self.__operations: for oper, vals in self.__operations:
func = self.__opFuncs[oper] func = self.__opFuncs[oper]
vals = [self.preprocess(v) for v in vals] vals = [self.preprocess(v) for v in vals]
data = func(data, *vals) data = func(data, *vals)
return data return data
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
This `Operator` class provides an interface and logic to execute a chain of This `Operator` class provides an interface and logic to execute a chain of
operations - an operation is some function which accepts one or more inputs, operations - an operation is some function which accepts one or more inputs,
and produce one output. and produce one output.
But it stops short of defining any operations. Instead, we can create another But it stops short of defining any operations. Instead, we can create another
class - a sub-class - which derives from the `Operator` class. This sub-class class - a sub-class - which derives from the `Operator` class. This sub-class
will define the operations that will ultimately be executed by the `Operator` will define the operations that will ultimately be executed by the `Operator`
class. All that the `Operator` class does is: class. All that the `Operator` class does is:
- Allow functions to be registered with the `addFunction` method - all - Allow functions to be registered with the `addFunction` method - all
registered functions can be used via the `do` method. registered functions can be used via the `do` method.
- Stage an operation (using a registered function) via the `do` method. Note - Stage an operation (using a registered function) via the `do` method. Note
that `do` allows any number of values to be passed to it, as we used the `*` that `do` allows any number of values to be passed to it, as we used the `*`
operator when specifying the `values` argument. operator when specifying the `values` argument.
- Run all staged operations via the `run` method - it passes an input through - Run all staged operations via the `run` method - it passes an input through
all of the operations that have been staged, and then returns the final all of the operations that have been staged, and then returns the final
result. result.
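Before we define any sub-classes, here is a quick sketch (not part of the
original example) of the `Operator` class being used on its own, with a couple
of throw-away functions registered directly:
> ```
> op = Operator()
> op.addFunction('double', lambda x:    x * 2)
> op.addFunction('plus',   lambda x, y: x + y)
>
> op.do('double')
> op.do('plus', 5)
>
> print(op.run(10))  # (10 * 2) + 5 = 25
> ```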
Let's define a sub-class:
%% Cell type:code id: tags:
```
class NumberOperator(Operator):
    def __init__(self):
        super().__init__()
        self.addFunction('add', self.add)
        self.addFunction('mul', self.mul)
        self.addFunction('negate', self.negate)
    def preprocess(self, value):
        return float(value)
    def add(self, a, b):
        return a + b
    def mul(self, a, b):
        return a * b
    def negate(self, a):
        return -a
```
%% Cell type:markdown id: tags:
The `NumberOperator` is a sub-class of `Operator`, which we can use for basic
numerical calculations. It provides a handful of simple numerical methods, but
the most interesting stuff is inside `__init__`.
> ```
> super().__init__()
> ```
This line invokes `Operator.__init__` - the initialisation method for the
`Operator` base-class.
In Python, we can use the [built-in `super`
method](https://docs.python.org/3/library/functions.html#super) to take care
of correctly calling methods that are defined in an object's base-class (or
classes, in the case of [multiple inheritance](#multiple-inheritance)).
> The `super` function is one thing which changed between Python 2 and 3 -
> in Python 2, it was necessary to pass both the type and the instance
> to `super`. So it is common to see code that looks like this:
>
> ```
> def __init__(self):
>     super(NumberOperator, self).__init__()
> ```
>
> Fortunately things are a lot cleaner in Python 3.
Let's move on to the next few lines in `__init__`:
> ```
> self.addFunction('add', self.add)
> self.addFunction('mul', self.mul)
> self.addFunction('negate', self.negate)
> ```
Here we are registering all of the functionality that is provided by the
`NumberOperator` class, via the `Operator.addFunction` method.
The `NumberOperator` class has also overridden the `preprocess` method, to
ensure that all values handled by the `Operator` are numbers. This method gets
called within the `Operator.run` method - for a `NumberOperator` instance, the
`NumberOperator.preprocess` method will get called<sup>3</sup>.
> <sup>3</sup> When a sub-class overrides a base-class method, it is still
> possible to access the base-class implementation [via the `super()`
> function](https://stackoverflow.com/a/4747427) (the preferred method), or by
> [explicitly calling the base-class
> implementation](https://stackoverflow.com/a/2421325).
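> For example, an overridden method can fall back on the base-class behaviour -
> a hypothetical variant of `preprocess` (not part of the example above) might
> look like this:
>
> ```
> class PickyNumberOperator(NumberOperator):
>     def preprocess(self, value):
>         # re-use the base-class conversion,
>         # then add some extra validation
>         value = super().preprocess(value)
>         if value < 0:
>             raise ValueError('Only non-negative numbers allowed!')
>         return value
> ```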
Now let's see what our `NumberOperator` class does:
%% Cell type:code id: tags:
```
no = NumberOperator()
no.do('add', 10)
no.do('mul', 2)
no.do('negate')
print('Operations on {}: {}'.format(10, no.run(10)))
print('Operations on {}: {}'.format(2.5, no.run(2.5)))
```
%% Cell type:markdown id: tags:
It works! While this is a contrived example, hopefully you can see how
inheritance can be used to break a problem down into sub-problems:
- The `Operator` class provides all of the logic needed to manage and execute
  operations, without caring about what those operations are actually doing.
- This leaves the `NumberOperator` class free to concentrate on implementing
  the functions that are specific to its task, and not having to worry about
  how they are executed.
We could also easily implement other `Operator` sub-classes to work on
different data types, such as arrays, images, or even non-numeric data such as
strings:
%% Cell type:code id: tags:
```
class StringOperator(Operator):
    def __init__(self):
        super().__init__()
        self.addFunction('capitalise', self.capitalise)
        self.addFunction('concat', self.concat)
    def preprocess(self, value):
        return str(value)
    def capitalise(self, s):
        return ' '.join([w[0].upper() + w[1:] for w in s.split()])
    def concat(self, s1, s2):
        return s1 + s2
so = StringOperator()
so.do('capitalise')
so.do('concat', '!')
print(so.run('python is an ok language'))
```
%% Cell type:markdown id: tags:
<a class="anchor" id="polymorphism"></a>
### Polymorphism
Inheritance also allows us to take advantage of *polymorphism*, which refers
to the idea that, in an object-oriented language, we should be able to use an
object without having complete knowledge about the class, or type, of that
object. For example, we should be able to write a function which expects an
`Operator` instance, but which will work on an instance of any `Operator`
sub-class. As an example, let's write a function which prints a summary of an
`Operator` instance:
%% Cell type:code id: tags:
```
def operatorSummary(o):
    print(type(o).__name__)
    print(' All functions: ')
    for fname in o.functions.keys():
        print('   {}'.format(fname))
    print(' Staged operations: ')
    for i, (fname, vals) in enumerate(o.operations):
        vals = ', '.join([str(v) for v in vals])
        print('   {}: {}({})'.format(i + 1, fname, vals))
```
%% Cell type:markdown id: tags:
Because the `operatorSummary` function only uses methods that are defined
in the `Operator` base-class, we can use it on _any_ `Operator` instance,
regardless of its specific type:
%% Cell type:code id: tags:
```
operatorSummary(no)
operatorSummary(so)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="multiple-inheritance"></a>
### Multiple inheritance
Python allows you to define a class which has multiple base classes - this is
known as _multiple inheritance_. For example, we might want to build a
notification mechanism into our `StringOperator` class, so that listeners can
be notified whenever the `capitalise` method gets called. It so happens that
our old colleague of `Operator` class fame also wrote a `Notifier` class which
allows listeners to register to be notified when an event occurs:
%% Cell type:code id: tags:
```
class Notifier(object):
    def __init__(self):
        super().__init__()
        self.__listeners = {}
    def register(self, name, func):
        self.__listeners[name] = func
    def notify(self, *args, **kwargs):
        for func in self.__listeners.values():
            func(*args, **kwargs)
```
%% Cell type:markdown id: tags:
Let's modify the `StringOperator` class to use the functionality of the
`Notifier` class:
%% Cell type:code id: tags:
```
class StringOperator(Operator, Notifier):
    def __init__(self):
        super().__init__()
        self.addFunction('capitalise', self.capitalise)
        self.addFunction('concat', self.concat)
    def preprocess(self, value):
        return str(value)
    def capitalise(self, s):
        result = ' '.join([w[0].upper() + w[1:] for w in s.split()])
        self.notify(result)
        return result
    def concat(self, s1, s2):
        return s1 + s2
```
%% Cell type:markdown id: tags:
Now, anything which is interested in uses of the `capitalise` method can
register as a listener on a `StringOperator` instance:
%% Cell type:code id: tags:
```
so = StringOperator()
def capitaliseCalled(result):
    print('Capitalise operation called: {}'.format(result))
so.register('mylistener', capitaliseCalled)
so.do('capitalise')
so.do('concat', '?')
print(so.run('did you notice that functions are objects too'))
```
%% Cell type:markdown id: tags:
> Simple classes such as the `Notifier` are sometimes referred to as
> [_mixins_](https://en.wikipedia.org/wiki/Mixin).
If you wish to use multiple inheritance in your design, it is important to be
aware of the mechanism that Python uses to determine how base class methods
are called (and which base class method will be called, in the case of naming
conflicts). This is referred to as the Method Resolution Order (MRO) - further
details on the topic can be found
[here](https://www.python.org/download/releases/2.3/mro/), and a more concise
summary
[here](http://python-history.blogspot.co.uk/2010/06/method-resolution-order.html).
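You can inspect the MRO of any class via its `__mro__` attribute - for
example, for the `StringOperator` class defined above:
> ```
> # the order in which Python searches for methods:
> # StringOperator -> Operator -> Notifier -> object
> print(StringOperator.__mro__)
> ```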
Note also that for base class `__init__` methods to be correctly called in a
design which uses multiple inheritance, _all_ classes in the hierarchy must
invoke `super().__init__()`. This can become complicated when some base
classes expect to be passed arguments to their `__init__` method. In scenarios
like this it may be preferable to manually invoke the base class `__init__`
methods instead of using `super()`. For example:
> ```
> class StringOperator(Operator, Notifier):
>     def __init__(self):
>         Operator.__init__(self)
>         Notifier.__init__(self)
> ```
This approach has the disadvantage that if the base classes change, you will
have to change these invocations. But the advantage is that you know exactly
how the class hierarchy will be initialised. In general though, doing
everything with `super()` will result in more maintainable code.
<a class="anchor" id="class-attributes-and-methods"></a>
## Class attributes and methods
Up to this point we have been covering how to add attributes and methods to an
_object_. But it is also possible to add methods and attributes to a _class_
(`static` methods and fields, for those of you familiar with C++ or Java).
Class attributes and methods can be accessed without having to create an
instance of the class - they are not associated with individual objects, but
rather with the class itself.
Class methods and attributes can be useful in several scenarios - as a
hypothetical, not very useful example, let's say that we want to gain usage
statistics for how many times each type of operation is used on instances of
our `FSLMaths` class. We might, for example, use this information in a grant
application to show evidence that more research is needed to optimise the
performance of the `add` operation.
<a class="anchor" id="class-attributes"></a>
### Class attributes
Let's add a `dict` called `opCounters` as a class attribute to the `FSLMaths`
class - whenever an operation is called on a `FSLMaths` instance, the counter
for that operation will be incremented:
%% Cell type:code id: tags:
```
import numpy as np
import nibabel as nib
class FSLMaths(object):
    # It's this easy to add a class-level
    # attribute. This dict is associated
    # with the FSLMaths *class*, not with
    # any individual FSLMaths instance.
    opCounters = {}
    def __init__(self, inimg):
        self.img = inimg
        self.operations = []
    def add(self, value):
        self.operations.append(('add', value))
        return self
    def mul(self, value):
        self.operations.append(('mul', value))
        return self
    def div(self, value):
        self.operations.append(('div', value))
        return self
    def run(self, output=None):
        data = np.array(self.img.get_data())
        for oper, value in self.operations:
            # Code omitted for brevity
            # Increment the usage counter for this operation. We can
            # access class attributes (and methods) through the class
            # itself, as shown here.
            FSLMaths.opCounters[oper] = FSLMaths.opCounters.get(oper, 0) + 1
            # It is also possible to access class-level
            # attributes via instances of the class, e.g.
            # self.opCounters[oper] = self.opCounters.get(oper, 0) + 1
```
%% Cell type:markdown id: tags:
So let's see it in action:
%% Cell type:code id: tags:
```
fpath = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')
fmask = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz')
inimg = nib.load(fpath)
mask = nib.load(fmask)
FSLMaths(inimg).mul(mask).add(25).run()
FSLMaths(inimg).add(15).div(1.5).run()
print('FSLMaths usage statistics')
for oper in ('add', 'div', 'mul'):
    print(' {} : {}'.format(oper, FSLMaths.opCounters.get(oper, 0)))
```
%% Cell type:markdown id: tags:
<a class="anchor" id="class-methods"></a>
### Class methods
It is just as easy to add a method to a class - let's take our reporting code
from above, and add it as a method to the `FSLMaths` class.
A class method is denoted by the
[`@classmethod`](https://docs.python.org/3.5/library/functions.html#classmethod)
decorator. Note that, where a regular method which is called on an instance
will be passed the instance as its first argument (`self`), a class method
will be passed the class itself as the first argument - the standard
convention is to call this argument `cls`:
%% Cell type:code id: tags:
```
class FSLMaths(object):
    opCounters = {}
    @classmethod
    def usage(cls):
        print('FSLMaths usage statistics')
        for oper in ('add', 'div', 'mul'):
            print(' {} : {}'.format(oper, FSLMaths.opCounters.get(oper, 0)))
    def __init__(self, inimg):
        self.img = inimg
        self.operations = []
    def add(self, value):
        self.operations.append(('add', value))
        return self
    def mul(self, value):
        self.operations.append(('mul', value))
        return self
    def div(self, value):
        self.operations.append(('div', value))
        return self
    def run(self, output=None):
        data = np.array(self.img.get_data())
        for oper, value in self.operations:
            FSLMaths.opCounters[oper] = self.opCounters.get(oper, 0) + 1
```
%% Cell type:markdown id: tags:
> There is another decorator -
> [`@staticmethod`](https://docs.python.org/3.5/library/functions.html#staticmethod) -
> which can be used on methods defined within a class. The difference
> between a `@classmethod` and a `@staticmethod` is that the latter will *not*
> be passed the class (`cls`).
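> For example (a small sketch, not part of the practical's code), a static
> method behaves just like a plain function that happens to live in the class
> namespace:
>
> ```
> class Converter(object):
>     @staticmethod
>     def mm_to_voxels(mm, voxsize):
>         # no self, no cls - only the arguments passed in
>         return mm / voxsize
>
> print(Converter.mm_to_voxels(10, 2))  # 5.0
> ```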
Calling a class method is the same as accessing a class attribute:
%% Cell type:code id: tags:
```
fpath = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')
fmask = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz')
inimg = nib.load(fpath)
mask = nib.load(fmask)
fm1 = FSLMaths(inimg).mul(mask).add(25)
fm2 = FSLMaths(inimg).add(15).div(1.5)
fm1.run()
fm2.run()
FSLMaths.usage()
```
%% Cell type:markdown id: tags:
Note that it is also possible to access class attributes and methods through
instances:
%% Cell type:code id: tags:
```
print(fm1.opCounters)
fm1.usage()
```
%% Cell type:markdown id: tags:
<a class="anchor" id="appendix-the-object-base-class"></a>
## Appendix: The `object` base-class
When you are defining a class, you need to specify the base-class from which
your class inherits. If your class does not inherit from a particular class,
then it should inherit from the built-in `object` class:
> ```
> class MyClass(object):
>     ...
> ```
However, in older code bases, you might see class definitions that look like
this, without explicitly inheriting from the `object` base class:
> ```
> class MyClass:
>     ...
> ```
This syntax is a [throwback to older versions of
Python](https://docs.python.org/2/reference/datamodel.html#new-style-and-classic-classes).
In Python 3 there is actually no difference in defining classes in the
"new-style" way we have used throughout this tutorial, or the "old-style" way
mentioned in this appendix.
But if you are writing code which needs to run on both Python 2 and 3, you
__must__ define your classes in the new-style convention, i.e. by explicitly
inheriting from the `object` base class. Therefore, the safest approach is to
always use the new-style format.
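You can check this for yourself in Python 3 - both forms produce a class which
inherits from `object` (a quick sketch):
> ```
> class WithObject(object): pass
> class WithoutObject:      pass
>
> print(WithObject.__bases__)     # (<class 'object'>,)
> print(WithoutObject.__bases__)  # (<class 'object'>,)
> ```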
<a class="anchor" id="appendix-init-versus-new"></a> <a class="anchor" id="appendix-init-versus-new"></a>
## Appendix: `__init__` versus `__new__` ## Appendix: `__init__` versus `__new__`
In Python, object creation is actually a two-stage process - _creation_, and In Python, object creation is actually a two-stage process - *creation*, and
then _initialisation_. The `__init__` method gets called during the then *initialisation*. The `__init__` method gets called during the
_initialisation_ stage - its job is to initialise the state of the object. But *initialisation* stage - its job is to initialise the state of the object. But
note that, by the time `__init__` gets called, the object has already been note that, by the time `__init__` gets called, the object has already been
created. created.
You can also define a method called `__new__` if you need to control the You can also define a method called `__new__` if you need to control the
creation stage, although this is very rarely needed. One example of where you creation stage, although this is very rarely needed. One example of where you
might need to implement the `__new__` method is if you wish to create a might need to implement the `__new__` method is if you wish to create a
[subclass of a [subclass of a
`numpy.array`](https://docs.scipy.org/doc/numpy-1.14.0/user/basics.subclassing.html) `numpy.array`](https://docs.scipy.org/doc/numpy-1.14.0/user/basics.subclassing.html)
(although you might alternatively want to think about redefining your problem (although you might alternatively want to think about redefining your problem
so that this is not necessary). so that this is not necessary).
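The two stages are easy to observe - a minimal sketch (not from the practical)
which prints a message from each stage:
> ```
> class Noisy(object):
>     def __new__(cls, *args, **kwargs):
>         print('__new__ called  - creating the object')
>         return super().__new__(cls)
>
>     def __init__(self, value):
>         print('__init__ called - initialising the object')
>         self.value = value
>
> n = Noisy(123)
> ```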
A brief explanation on
the difference between `__new__` and `__init__` can be found
[here](https://www.reddit.com/r/learnpython/comments/2s3pms/what_is_the_difference_between_init_and_new/cnm186z/),
and you may also wish to take a look at the [official Python
docs](https://docs.python.org/3/reference/datamodel.html#basic-customization).
<a class="anchor" id="appendix-monkey-patching"></a>
## Appendix: Monkey-patching
The act of run-time modification of objects or class definitions is referred
to as [*monkey-patching*](https://en.wikipedia.org/wiki/Monkey_patch) and,
whilst it is allowed by the Python programming language, it is generally
considered quite bad practice.
Just because you *can* do something doesn't mean that you *should*. Python
gives you the flexibility to write your software in whatever manner you deem
suitable. **But** if you want to write software that will be used, adopted,
maintained, and enjoyed by other people, you should be polite, write your code
in a clear, readable fashion, and avoid the use of devious tactics such as
monkey-patching.
**However**, while monkey-patching may seem like a horrific programming
practice to those of you coming from the realms of C++, Java, and the like,
(and it is horrific in many cases), it can be *extremely* useful in certain
circumstances. For instance, monkey-patching makes [unit testing a
breeze in Python](https://docs.python.org/3/library/unittest.mock.html).
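For example, a unit test can temporarily replace a slow or disk-hungry
function with a fake one. A small sketch using `unittest.mock` - here
`analysis` is a hypothetical module of our own that we want to test:
> ```
> from unittest import mock
> import analysis  # hypothetical module under test
>
> # While the patch is active, analysis.load_data
> # returns our fake data instead of touching the disk.
> with mock.patch('analysis.load_data', return_value=[1, 2, 3]):
>     result = analysis.run_pipeline()
> ```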
As another example, consider the scenario where you are dependent on a third As another example, consider the scenario where you are dependent on a third
party library which has bugs in it. No problem - while you are waiting for the party library which has bugs in it. No problem - while you are waiting for the
library author to release a new version of the library, you can write your own library author to release a new version of the library, you can write your own
working implementation and [monkey-patch it working implementation and [monkey-patch it
in!](https://git.fmrib.ox.ac.uk/fsl/fsleyes/fsleyes/blob/0.21.0/fsleyes/views/viewpanel.py#L726) in!](https://git.fmrib.ox.ac.uk/fsl/fsleyes/fsleyes/blob/0.21.0/fsleyes/views/viewpanel.py#L726)
<a class="anchor" id="appendix-method-overloading"></a> <a class="anchor" id="appendix-method-overloading"></a>
## Appendix: Method overloading ## Appendix: Method overloading
Method overloading (defining multiple methods with the same name in a class, Method overloading (defining multiple methods with the same name in a class,
but each accepting different arguments) is one of the only object-oriented but each accepting different arguments) is one of the only object-oriented
features that is not present in Python. Becuase Python does not perform any features that is not present in Python. Becuase Python does not perform any
runtime checks on the types of arguments that are passed to a method, or the runtime checks on the types of arguments that are passed to a method, or the
compatibility of the method to accept the arguments, it would not be possible compatibility of the method to accept the arguments, it would not be possible
to determine which implementation of a method is to be called. In other words, to determine which implementation of a method is to be called. In other words,
in Python only the name of a method is used to identify that method, unlike in in Python only the name of a method is used to identify that method, unlike in
C++ and Java, where the full method signature (name, input types and return C++ and Java, where the full method signature (name, input types and return
types) is used. types) is used.
However, because a Python method can be written to accept any number or type However, because a Python method can be written to accept any number or type
of arguments, it is very easy to to build your own overloading logic by of arguments, it is very easy to to build your own overloading logic by
writing a "dispatch" method<sup>4</sup>. Here is YACE (Yet Another Contrived writing a "dispatch" method<sup>4</sup>. Here is YACE (Yet Another Contrived
Example): Example):
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class Adder(object): class Adder(object):
def add(self, *args): def add(self, *args):
if len(args) == 2: return self.__add2(*args) if len(args) == 2: return self.__add2(*args)
elif len(args) == 3: return self.__add3(*args) elif len(args) == 3: return self.__add3(*args)
elif len(args) == 4: return self.__add4(*args) elif len(args) == 4: return self.__add4(*args)
else: else:
raise AttributeError('No method available to accept {} ' raise AttributeError('No method available to accept {} '
'arguments'.format(len(args))) 'arguments'.format(len(args)))
def __add2(self, a, b): def __add2(self, a, b):
return a + b return a + b
def __add3(self, a, b, c): def __add3(self, a, b, c):
return a + b + c return a + b + c
def __add4(self, a, b, c, d): def __add4(self, a, b, c, d):
return a + b + c + d return a + b + c + d
a = Adder() a = Adder()
print('Add two: {}'.format(a.add(1, 2))) print('Add two: {}'.format(a.add(1, 2)))
print('Add three: {}'.format(a.add(1, 2, 3))) print('Add three: {}'.format(a.add(1, 2, 3)))
print('Add four: {}'.format(a.add(1, 2, 3, 4))) print('Add four: {}'.format(a.add(1, 2, 3, 4)))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> <sup>4</sup>Another option is the [`functools.singledispatch` > <sup>4</sup>Another option is the [`functools.singledispatch`
> decorator](https://docs.python.org/3.5/library/functools.html#functools.singledispatch), > decorator](https://docs.python.org/3/library/functools.html#functools.singledispatch),
> which is more complicated, but may allow you to write your dispatch logic in > which is more complicated, but may allow you to write your dispatch logic in
> a more concise manner. > a more concise manner.
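For illustration, here is a small sketch of `functools.singledispatch` in
action. Note that, unlike the arity-based dispatch in the example above, it
dispatches on the *type* of the first argument (the `describe` function below
is a made-up example, not part of the practical):

```
from functools import singledispatch

@singledispatch
def describe(value):
    # Fallback for types without a registered implementation
    return 'something else: {}'.format(value)

@describe.register(int)
def _(value):
    return 'an integer: {}'.format(value)

@describe.register(str)
def _(value):
    return 'a string: {}'.format(value)

print(describe(5))
print(describe('hello'))
print(describe(4.5))
```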
<a class="anchor" id="useful-references"></a> <a class="anchor" id="useful-references"></a>
## Useful references ## Useful references
The official Python documentation has a wealth of information on the internal The official Python documentation has a wealth of information on the internal
workings of classes and objects, so these pages are worth a read: workings of classes and objects, so these pages are worth a read:
* https://docs.python.org/3.5/tutorial/classes.html * https://docs.python.org/3/tutorial/classes.html
* https://docs.python.org/3.5/reference/datamodel.html * https://docs.python.org/3/reference/datamodel.html
......
...@@ -19,6 +19,7 @@ you use an object-oriented approach. ...@@ -19,6 +19,7 @@ you use an object-oriented approach.
* [We didn't specify the `self` argument - what gives?!?](#we-didnt-specify-the-self-argument) * [We didn't specify the `self` argument - what gives?!?](#we-didnt-specify-the-self-argument)
* [Attributes](#attributes) * [Attributes](#attributes)
* [Methods](#methods) * [Methods](#methods)
* [Method chaining](#method-chaining)
* [Protecting attribute access](#protecting-attribute-access) * [Protecting attribute access](#protecting-attribute-access)
* [A better way - properties](#a-better-way-properties]) * [A better way - properties](#a-better-way-properties])
* [Inheritance](#inheritance) * [Inheritance](#inheritance)
...@@ -46,8 +47,8 @@ section. ...@@ -46,8 +47,8 @@ section.
If you have not done any object-oriented programming before, your first step If you have not done any object-oriented programming before, your first step
is to understand the difference between _objects_ (also known as is to understand the difference between *objects* (also known as
_instances_) and _classes_ (also known as _types_). *instances*) and *classes* (also known as *types*).
If you have some experience in C, then you can start off by thinking of a If you have some experience in C, then you can start off by thinking of a
...@@ -66,8 +67,8 @@ layout of a chunk of memory. For example, here is a typical struct definition: ...@@ -66,8 +67,8 @@ layout of a chunk of memory. For example, here is a typical struct definition:
> ``` > ```
Now, an _object_ is not a definition, but rather a thing which resides in Now, an *object* is not a definition, but rather a thing which resides in
memory. An object can have _attributes_ (pieces of information), and _methods_ memory. An object can have *attributes* (pieces of information), and *methods*
(functions associated with the object). You can pass objects around your code, (functions associated with the object). You can pass objects around your code,
manipulate their attributes, and call their methods. manipulate their attributes, and call their methods.
...@@ -92,12 +93,12 @@ you create an object from that class. ...@@ -92,12 +93,12 @@ you create an object from that class.
Of course there are many more differences between C structs and classes (most Of course there are many more differences between C structs and classes (most
notably [inheritance](todo), [polymorphism](todo), and [access notably [inheritance](todo), [polymorphism](todo), and [access
protection](todo)). But if you can understand the difference between a protection](todo)). But if you can understand the difference between a
_definition_ of a C struct, and an _instantiation_ of that struct, then you *definition* of a C struct, and an *instantiation* of that struct, then you
are most of the way towards understanding the difference between a _class_, are most of the way towards understanding the difference between a *class*,
and an _object_. and an *object*.
> But just to confuse you, remember that in Python, __everything__ is an > But just to confuse you, remember that in Python, **everything** is an
> object - even classes! > object - even classes!
...@@ -206,7 +207,7 @@ print(fm) ...@@ -206,7 +207,7 @@ print(fm)
Refer to the [official Refer to the [official
docs](https://docs.python.org/3.5/reference/datamodel.html#special-method-names) docs](https://docs.python.org/3/reference/datamodel.html#special-method-names)
for details on all of the special methods that can be defined in a class. And for details on all of the special methods that can be defined in a class. And
take a look at the appendix for some more details on [how Python objects get take a look at the appendix for some more details on [how Python objects get
created](appendix-init-versus-new). created](appendix-init-versus-new).
...@@ -352,8 +353,8 @@ append a tuple to that `operations` list. ...@@ -352,8 +353,8 @@ append a tuple to that `operations` list.
The idea behind this design is that our `FSLMaths` class will not actually do The idea behind this design is that our `FSLMaths` class will not actually do
anything when we call the `add`, `mul` or `div` methods. Instead, it will anything when we call the `add`, `mul` or `div` methods. Instead, it will
"stage" each operation, and then perform them all in one go. So let's add *stage* each operation, and then perform them all in one go at a later point
another method, `run`, which actually does the work: in time. So let's add another method, `run`, which actually does the work:
``` ```
...@@ -387,7 +388,6 @@ class FSLMaths(object): ...@@ -387,7 +388,6 @@ class FSLMaths(object):
if isinstance(value, nib.nifti1.Nifti1Image): if isinstance(value, nib.nifti1.Nifti1Image):
value = value.get_data() value = value.get_data()
if oper == 'add': if oper == 'add':
data = data + value data = data + value
elif oper == 'mul': elif oper == 'mul':
...@@ -430,6 +430,99 @@ print('Number of voxels >0 in masked image: {}'.format(nmaskvox)) ...@@ -430,6 +430,99 @@ print('Number of voxels >0 in masked image: {}'.format(nmaskvox))
``` ```
<a class="anchor" id="method-chaining"></a>
## Method chaining
A neat trick, which is used by all the cool kids these days, is to write
classes that allow *method chaining* - writing one line of code which
calls more than one method on an object, e.g.:
> ```
> fm = FSLMaths(img)
> result = fm.add(1).mul(10).run()
> ```
Adding this feature to our budding `FSLMaths` class is easy - all we have
to do is return `self` from each method:
```
import numpy as np
import nibabel as nib
class FSLMaths(object):
def __init__(self, inimg):
self.img = inimg
self.operations = []
def add(self, value):
self.operations.append(('add', value))
return self
def mul(self, value):
self.operations.append(('mul', value))
return self
def div(self, value):
self.operations.append(('div', value))
return self
def run(self, output=None):
data = np.array(self.img.get_data())
for oper, value in self.operations:
# Value could be an image.
# If not, we assume that
# it is a scalar/numpy array.
if isinstance(value, nib.nifti1.Nifti1Image):
value = value.get_data()
if oper == 'add':
data = data + value
elif oper == 'mul':
data = data * value
elif oper == 'div':
data = data / value
# turn final output into a nifti,
# and save it to disk if an
# 'output' has been specified.
outimg = nib.nifti1.Nifti1Image(data, self.img.affine) outimg = nib.nifti1.Nifti1Image(data, self.img.affine)
if output is not None:
nib.save(outimg, output)
return outimg
```
Now we can chain all of our method calls, and even the creation of our
`FSLMaths` object, into a single line:
```
fpath = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm.nii.gz')
fmask = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz')
inimg = nib.load(fpath)
mask = nib.load(fmask)
outimg = FSLMaths(inimg).mul(mask).add(-10).run()
norigvox = (inimg .get_data() > 0).sum()
nmaskvox = (outimg.get_data() > 0).sum()
print('Number of voxels >0 in original image: {}'.format(norigvox))
print('Number of voxels >0 in masked image: {}'.format(nmaskvox))
```
> In fact, this is precisely how the
> [`fsl.wrappers.fslmaths`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.wrappers.fslmaths.html)
> function works.
<a class="anchor" id="protecting-attribute-access"></a> <a class="anchor" id="protecting-attribute-access"></a>
## Protecting attribute access ## Protecting attribute access
...@@ -488,9 +581,8 @@ of an object. This is in contrast to languages like C++ and Java, where the ...@@ -488,9 +581,8 @@ of an object. This is in contrast to languages like C++ and Java, where the
notion of a private attribute or method is strictly enforced by the language. notion of a private attribute or method is strictly enforced by the language.
However, there are a couple of conventions in Python that are [universally However, there are a couple of conventions in Python that are
adhered [universally adhered to](https://docs.python.org/3/tutorial/classes.html#private-variables):
to](https://docs.python.org/3.5/tutorial/classes.html#private-variables):
* Class-level attributes and methods, and module-level attributes, functions, * Class-level attributes and methods, and module-level attributes, functions,
and classes, which begin with a single underscore (`_`), should be and classes, which begin with a single underscore (`_`), should be
...@@ -504,14 +596,13 @@ to](https://docs.python.org/3.5/tutorial/classes.html#private-variables): ...@@ -504,14 +596,13 @@ to](https://docs.python.org/3.5/tutorial/classes.html#private-variables):
enforcement for this rule - any attribute or method with such a name will enforcement for this rule - any attribute or method with such a name will
actually be _renamed_ (in a standardised manner) at runtime, so that it is actually be _renamed_ (in a standardised manner) at runtime, so that it is
not accessible through its original name (it is still accessible via its not accessible through its original name (it is still accessible via its
[mangled [mangled name](https://docs.python.org/3/tutorial/classes.html#private-variables)
name](https://docs.python.org/3.5/tutorial/classes.html#private-variables)
though). though).
> <sup>2</sup> With the exception that module-level fields which begin with a > <sup>2</sup> With the exception that module-level fields which begin with a
> single underscore will not be imported into the local scope via the > single underscore will not be imported into the local scope via the
> `from [module] import *` techinque. > `from [module] import *` technique.
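As a quick illustration of this name mangling (using a made-up `Foo` class,
not part of the practical):

```
class Foo(object):
    def __init__(self):
        self.__secret = 42

f = Foo()

# Accessing f.__secret directly would raise an AttributeError,
# because the attribute is renamed at runtime. But it is still
# accessible via its mangled name:
print(f._Foo__secret)
```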
So with all of this in mind, we can adjust our `FSLMaths` class to discourage So with all of this in mind, we can adjust our `FSLMaths` class to discourage
...@@ -541,7 +632,7 @@ print(fm.__img) ...@@ -541,7 +632,7 @@ print(fm.__img)
Python has a feature called Python has a feature called
[`properties`](https://docs.python.org/3.5/library/functions.html#property), [`properties`](https://docs.python.org/3/library/functions.html#property),
which is a nice way of controlling access to the attributes of an object. We which is a nice way of controlling access to the attributes of an object. We
can use properties by defining a "getter" method which can be used to access can use properties by defining a "getter" method which can be used to access
our attributes, and "decorating" them with the `@property` decorator (we will our attributes, and "decorating" them with the `@property` decorator (we will
...@@ -676,17 +767,17 @@ class Chihuahua(Dog): ...@@ -676,17 +767,17 @@ class Chihuahua(Dog):
Hopefully this example doesn't need much in the way of explanation - this Hopefully this example doesn't need much in the way of explanation - this
collection of classes captures a hierarchical relationship which exists in the collection of classes represents a hierarchical relationship which exists in
real world (and also captures the inherently annoying nature of the real world (and also represents the inherently annoying nature of
chihuahuas). For example, in the real world, all dogs are animals, but not all chihuahuas). For example, in the real world, all dogs are animals, but not all
animals are dogs. Therefore in our model, the `Dog` class has specified animals are dogs. Therefore in our model, the `Dog` class has specified
`Animal` as its base class. We say that the `Dog` class _extends_, _derives `Animal` as its base class. We say that the `Dog` class *extends*, *derives
from_, or _inherits from_, the `Animal` class, and that all `Dog` instances from*, or *inherits from*, the `Animal` class, and that all `Dog` instances
are also `Animal` instances (but not vice-versa). are also `Animal` instances (but not vice-versa).
What does that `noiseMade` method do? There is a `noiseMade` method defined What does that `noiseMade` method do? There is a `noiseMade` method defined
on the `Animal` class, but it has been re-implemented, or _overridden_ in the on the `Animal` class, but it has been re-implemented, or *overridden* in the
`Dog`, `Dog`,
[`TalkingDog`](https://twitter.com/simpsonsqotd/status/427941665836630016?lang=en), [`TalkingDog`](https://twitter.com/simpsonsqotd/status/427941665836630016?lang=en),
`Cat`, and `Chihuahua` classes (but not on the `Labrador` class). We can call `Cat`, and `Chihuahua` classes (but not on the `Labrador` class). We can call
...@@ -819,7 +910,7 @@ This line invokes `Operator.__init__` - the initialisation method for the ...@@ -819,7 +910,7 @@ This line invokes `Operator.__init__` - the initialisation method for the
In Python, we can use the [built-in `super` In Python, we can use the [built-in `super`
method](https://docs.python.org/3.5/library/functions.html#super) to take care method](https://docs.python.org/3/library/functions.html#super) to take care
of correctly calling methods that are defined in an object's base-class (or of correctly calling methods that are defined in an object's base-class (or
classes, in the case of [multiple inheritance](multiple-inheritance)). classes, in the case of [multiple inheritance](multiple-inheritance)).
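As a minimal stand-alone sketch of `super` in action (these classes are
invented for illustration - they are not the `Operator` classes used in the
practical):

```
class Base(object):
    def __init__(self, name):
        self.name = name

class Derived(Base):
    def __init__(self, name, value):
        # Call the base-class initialiser via super()
        super().__init__(name)
        self.value = value

d = Derived('example', 42)
print(d.name, d.value)
```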
...@@ -920,8 +1011,8 @@ print(so.run('python is an ok language')) ...@@ -920,8 +1011,8 @@ print(so.run('python is an ok language'))
### Polymorphism ### Polymorphism
Inheritance also allows us to take advantage of _polymorphism_, which refers Inheritance also allows us to take advantage of *polymorphism*, which refers
to idea that, in an object-oriented language, we should be able to use an to the idea that, in an object-oriented language, we should be able to use an
object without having complete knowledge about the class, or type, of that object without having complete knowledge about the class, or type, of that
object. For example, we should be able to write a function which expects an object. For example, we should be able to write a function which expects an
`Operator` instance, but which will work on an instance of any `Operator` `Operator` instance, but which will work on an instance of any `Operator`
...@@ -1110,12 +1201,15 @@ class FSLMaths(object): ...@@ -1110,12 +1201,15 @@ class FSLMaths(object):
def add(self, value): def add(self, value):
self.operations.append(('add', value)) self.operations.append(('add', value))
return self
def mul(self, value): def mul(self, value):
self.operations.append(('mul', value)) self.operations.append(('mul', value))
return self
def div(self, value): def div(self, value):
self.operations.append(('div', value)) self.operations.append(('div', value))
return self
def run(self, output=None): def run(self, output=None):
...@@ -1125,12 +1219,15 @@ class FSLMaths(object): ...@@ -1125,12 +1219,15 @@ class FSLMaths(object):
# Code omitted for brevity # Code omitted for brevity
# Increment the usage counter # Increment the usage counter for this operation. We can
# for this operation. We can # access class attributes (and methods) through the class
# access class attributes (and # itself, as shown here.
# methods) through the class FSLMaths.opCounters[oper] = FSLMaths.opCounters.get(oper, 0) + 1
# itself.
FSLMaths.opCounters[oper] = self.opCounters.get(oper, 0) + 1 # It is also possible to access class-level
# attributes via instances of the class, e.g.
# self.opCounters[oper] = self.opCounters.get(oper, 0) + 1
``` ```
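To make the distinction clearer, here is a tiny stand-alone sketch (a made-up
`Counter` class, not part of the practical) showing a class-level attribute
being accessed both through the class and through its instances:

```
class Counter(object):

    # count is a class attribute - it lives on
    # the class, and is shared by all instances
    count = 0

    def increment(self):
        # modify the class attribute via the class itself
        Counter.count = Counter.count + 1

c1 = Counter()
c2 = Counter()
c1.increment()
c2.increment()

print(Counter.count)   # 2
print(c1.count)        # class attributes can also be read via instances
```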
...@@ -1143,17 +1240,8 @@ fmask = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz') ...@@ -1143,17 +1240,8 @@ fmask = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz')
inimg = nib.load(fpath) inimg = nib.load(fpath)
mask = nib.load(fmask) mask = nib.load(fmask)
fm1 = FSLMaths(inimg) FSLMaths(inimg).mul(mask).add(25).run()
fm2 = FSLMaths(inimg) FSLMaths(inimg).add(15).div(1.5).run()
fm1.mul(mask)
fm1.add(15)
fm2.add(25)
fm1.div(1.5)
fm1.run()
fm2.run()
print('FSLMaths usage statistics') print('FSLMaths usage statistics')
for oper in ('add', 'div', 'mul'): for oper in ('add', 'div', 'mul'):
...@@ -1194,12 +1282,15 @@ class FSLMaths(object): ...@@ -1194,12 +1282,15 @@ class FSLMaths(object):
def add(self, value): def add(self, value):
self.operations.append(('add', value)) self.operations.append(('add', value))
return self
def mul(self, value): def mul(self, value):
self.operations.append(('mul', value)) self.operations.append(('mul', value))
return self
def div(self, value): def div(self, value):
self.operations.append(('div', value)) self.operations.append(('div', value))
return self
def run(self, output=None): def run(self, output=None):
...@@ -1213,11 +1304,11 @@ class FSLMaths(object): ...@@ -1213,11 +1304,11 @@ class FSLMaths(object):
> There is another decorator - > There is another decorator -
> [`@staticmethod`](https://docs.python.org/3.5/library/functions.html#staticmethod) - > [`@staticmethod`](https://docs.python.org/3.5/library/functions.html#staticmethod) -
> which can be used on methods defined within a class. The difference > which can be used on methods defined within a class. The difference
> between a `@classmethod` and a `@staticmethod` is that the latter will _not_ > between a `@classmethod` and a `@staticmethod` is that the latter will *not*
> be passed the class (`cls`). > be passed the class (`cls`).
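To make the difference concrete, here is a minimal sketch (a made-up class,
not part of the practical):

```
class Example(object):

    @classmethod
    def clsmethod(cls, value):
        # a class method receives the class as its first argument
        return '{} got {}'.format(cls.__name__, value)

    @staticmethod
    def stmethod(value):
        # a static method receives no implicit first argument
        return 'got {}'.format(value)

print(Example.clsmethod(1))
print(Example.stmethod(2))
```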
calling a class method is the same as accessing a class attribute: Calling a class method is the same as accessing a class attribute:
``` ```
...@@ -1226,14 +1317,8 @@ fmask = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz') ...@@ -1226,14 +1317,8 @@ fmask = op.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz')
inimg = nib.load(fpath) inimg = nib.load(fpath)
mask = nib.load(fmask) mask = nib.load(fmask)
fm1 = FSLMaths(inimg) fm1 = FSLMaths(inimg).mul(mask).add(25)
fm2 = FSLMaths(inimg) fm2 = FSLMaths(inimg).add(15).div(1.5)
fm1.mul(mask)
fm1.add(15)
fm2.add(25)
fm1.div(1.5)
fm1.run() fm1.run()
fm2.run() fm2.run()
...@@ -1294,9 +1379,9 @@ always use the new-style format. ...@@ -1294,9 +1379,9 @@ always use the new-style format.
## Appendix: `__init__` versus `__new__` ## Appendix: `__init__` versus `__new__`
In Python, object creation is actually a two-stage process - _creation_, and In Python, object creation is actually a two-stage process - *creation*, and
then _initialisation_. The `__init__` method gets called during the then *initialisation*. The `__init__` method gets called during the
_initialisation_ stage - its job is to initialise the state of the object. But *initialisation* stage - its job is to initialise the state of the object. But
note that, by the time `__init__` gets called, the object has already been note that, by the time `__init__` gets called, the object has already been
created. created.
...@@ -1314,7 +1399,7 @@ A brief explanation on ...@@ -1314,7 +1399,7 @@ A brief explanation on
the difference between `__new__` and `__init__` can be found the difference between `__new__` and `__init__` can be found
[here](https://www.reddit.com/r/learnpython/comments/2s3pms/what_is_the_difference_between_init_and_new/cnm186z/), [here](https://www.reddit.com/r/learnpython/comments/2s3pms/what_is_the_difference_between_init_and_new/cnm186z/),
and you may also wish to take a look at the [official Python and you may also wish to take a look at the [official Python
docs](https://docs.python.org/3.5/reference/datamodel.html#basic-customization). docs](https://docs.python.org/3/reference/datamodel.html#basic-customization).
<a class="anchor" id="appendix-monkey-patching"></a> <a class="anchor" id="appendix-monkey-patching"></a>
...@@ -1322,24 +1407,24 @@ docs](https://docs.python.org/3.5/reference/datamodel.html#basic-customization). ...@@ -1322,24 +1407,24 @@ docs](https://docs.python.org/3.5/reference/datamodel.html#basic-customization).
The act of run-time modification of objects or class definitions is referred The act of run-time modification of objects or class definitions is referred
to as [_monkey-patching_](https://en.wikipedia.org/wiki/Monkey_patch) and, to as [*monkey-patching*](https://en.wikipedia.org/wiki/Monkey_patch) and,
whilst it is allowed by the Python programming language, it is generally whilst it is allowed by the Python programming language, it is generally
considered quite bad practice. considered quite bad practice.
Just because you _can_ do something doesn't mean that you _should_. Python Just because you *can* do something doesn't mean that you *should*. Python
gives you the flexibility to write your software in whatever manner you deem gives you the flexibility to write your software in whatever manner you deem
suitable. __But__ if you want to write software that will be used, adopted, suitable. **But** if you want to write software that will be used, adopted,
maintained, and enjoyed by other people, you should be polite, write your code maintained, and enjoyed by other people, you should be polite, write your code
in a clear, readable fashion, and avoid the use of devious tactics such as in a clear, readable fashion, and avoid the use of devious tactics such as
monkey-patching. monkey-patching.
__However__, while monkey-patching may seem like a horrific programming **However**, while monkey-patching may seem like a horrific programming
practice to those of you coming from the realms of C++, Java, and the like, practice to those of you coming from the realms of C++, Java, and the like,
(and it is horrific in many cases), it can be _extremely_ useful in certain (and it is horrific in many cases), it can be *extremely* useful in certain
circumstances. For instance, monkey-patching makes [unit testing a circumstances. For instance, monkey-patching makes [unit testing a
breeze in Python](https://docs.python.org/3.5/library/unittest.mock.html). breeze in Python](https://docs.python.org/3/library/unittest.mock.html).
As another example, consider the scenario where you are dependent on a third As another example, consider the scenario where you are dependent on a third
...@@ -1398,7 +1483,7 @@ print('Add four: {}'.format(a.add(1, 2, 3, 4))) ...@@ -1398,7 +1483,7 @@ print('Add four: {}'.format(a.add(1, 2, 3, 4)))
``` ```
> <sup>4</sup>Another option is the [`functools.singledispatch` > <sup>4</sup>Another option is the [`functools.singledispatch`
> decorator](https://docs.python.org/3.5/library/functools.html#functools.singledispatch), > decorator](https://docs.python.org/3/library/functools.html#functools.singledispatch),
> which is more complicated, but may allow you to write your dispatch logic in > which is more complicated, but may allow you to write your dispatch logic in
> a more concise manner. > a more concise manner.
...@@ -1411,5 +1496,5 @@ The official Python documentation has a wealth of information on the internal ...@@ -1411,5 +1496,5 @@ The official Python documentation has a wealth of information on the internal
workings of classes and objects, so these pages are worth a read: workings of classes and objects, so these pages are worth a read:
* https://docs.python.org/3.5/tutorial/classes.html * https://docs.python.org/3/tutorial/classes.html
* https://docs.python.org/3.5/reference/datamodel.html * https://docs.python.org/3/reference/datamodel.html
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Operator overloading # Operator overloading
> This practical assumes you are familiar with the basics of object-oriented > This practical assumes you are familiar with the basics of object-oriented
> programming in Python. > programming in Python.
Operator overloading, in an object-oriented programming language, is the Operator overloading, in an object-oriented programming language, is the
process of customising the behaviour of _operators_ (e.g. `+`, `*`, `/` and process of customising the behaviour of _operators_ (e.g. `+`, `*`, `/` and
`-`) on user-defined types. This practical aims to show you that operator `-`) on user-defined types. This practical aims to show you that operator
overloading is __very__ easy to do in Python. overloading is **very** easy to do in Python.
This practical gives a brief overview of the operators which you may be most This practical gives a brief overview of the operators which you may be most
interested in implementing. However, there are many operators (and other interested in implementing. However, there are many operators (and other
special methods) which you can support in your own classes - the [official special methods) which you can support in your own classes - the [official
documentation](https://docs.python.org/3.5/reference/datamodel.html#basic-customization) documentation](https://docs.python.org/3/reference/datamodel.html#basic-customization)
is the best reference if you are interested in learning more. is the best reference if you are interested in learning more.
* [Overview](#overview) * [Overview](#overview)
* [Arithmetic operators](#arithmetic-operators) * [Arithmetic operators](#arithmetic-operators)
* [Equality and comparison operators](#equality-and-comparison-operators) * [Equality and comparison operators](#equality-and-comparison-operators)
* [The indexing operator `[]`](#the-indexing-operator) * [The indexing operator `[]`](#the-indexing-operator)
* [The call operator `()`](#the-call-operator) * [The call operator `()`](#the-call-operator)
* [The dot operator `.`](#the-dot-operator) * [The dot operator `.`](#the-dot-operator)
<a class="anchor" id="overview"></a> <a class="anchor" id="overview"></a>
## Overview ## Overview
In Python, when you add two numbers together: In Python, when you add two numbers together:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
a = 5 a = 5
b = 10 b = 10
r = a + b r = a + b
print(r) print(r)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
What actually goes on behind the scenes is this: What actually goes on behind the scenes is this:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
r = a.__add__(b) r = a.__add__(b)
print(r) print(r)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
In other words, whenever you use the `+` operator on two variables (the In other words, whenever you use the `+` operator on two variables (the
operands to the `+` operator), the Python interpreter calls the `__add__` operands to the `+` operator), the Python interpreter calls the `__add__`
method of the first operand (`a`), and passes the second operand (`b`) as an method of the first operand (`a`), and passes the second operand (`b`) as an
argument. argument.
So it is very easy to use the `+` operator with our own classes - all we have So it is very easy to use the `+` operator with our own classes - all we have
to do is implement a method called `__add__`. to do is implement a method called `__add__`.
<a class="anchor" id="arithmetic-operators"></a> <a class="anchor" id="arithmetic-operators"></a>
## Arithmetic operators ## Arithmetic operators
Let's play with an example - a class which represents a 2D vector: Let's play with an example - a class which represents a 2D vector:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class Vector2D(object): class Vector2D(object):
def __init__(self, x, y): def __init__(self, x, y):
self.x = x self.x = x
self.y = y self.y = y
def __str__(self): def __str__(self):
return 'Vector2D({}, {})'.format(self.x, self.y) return 'Vector2D({}, {})'.format(self.x, self.y)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> Note that we have implemented the special `__str__` method, which allows our > Note that we have implemented the special `__str__` method, which allows our
> `Vector2D` instances to be converted into strings. > `Vector2D` instances to be converted into strings.
If we try to use the `+` operator on this class, we are bound to get an error: If we try to use the `+` operator on this class, we are bound to get an error:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
v1 = Vector2D(2, 3) v1 = Vector2D(2, 3)
v2 = Vector2D(4, 5) v2 = Vector2D(4, 5)
print(v1 + v2) print(v1 + v2)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
But all we need to do to support the `+` operator is to implement a method But all we need to do to support the `+` operator is to implement a method
called `__add__`: called `__add__`:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class Vector2D(object): class Vector2D(object):
def __init__(self, x, y): def __init__(self, x, y):
self.x = x self.x = x
self.y = y self.y = y
def __str__(self): def __str__(self):
return 'Vector2D({}, {})'.format(self.x, self.y) return 'Vector2D({}, {})'.format(self.x, self.y)
def __add__(self, other): def __add__(self, other):
return Vector2D(self.x + other.x, return Vector2D(self.x + other.x,
self.y + other.y) self.y + other.y)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
And now we can use `+` on `Vector2D` objects - it's that easy: And now we can use `+` on `Vector2D` objects - it's that easy:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
v1 = Vector2D(2, 3) v1 = Vector2D(2, 3)
v2 = Vector2D(4, 5) v2 = Vector2D(4, 5)
print('{} + {} = {}'.format(v1, v2, v1 + v2)) print('{} + {} = {}'.format(v1, v2, v1 + v2))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Our `__add__` method creates and returns a new `Vector2D` which contains the Our `__add__` method creates and returns a new `Vector2D` which contains the
sum of the `x` and `y` components of the `Vector2D` on which it is called, and sum of the `x` and `y` components of the `Vector2D` on which it is called, and
the `Vector2D` which is passed in. We could also make the `__add__` method the `Vector2D` which is passed in. We could also make the `__add__` method
work with scalars, by extending its definition a bit: work with scalars, by extending its definition a bit:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class Vector2D(object): class Vector2D(object):
def __init__(self, x, y): def __init__(self, x, y):
self.x = x self.x = x
self.y = y self.y = y
def __add__(self, other): def __add__(self, other):
if isinstance(other, Vector2D): if isinstance(other, Vector2D):
return Vector2D(self.x + other.x, return Vector2D(self.x + other.x,
self.y + other.y) self.y + other.y)
else: else:
return Vector2D(self.x + other, self.y + other) return Vector2D(self.x + other, self.y + other)
def __str__(self): def __str__(self):
return 'Vector2D({}, {})'.format(self.x, self.y) return 'Vector2D({}, {})'.format(self.x, self.y)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
So now we can add both `Vector2D` instances and scalar numbers together: So now we can add both `Vector2D` instances and scalar numbers together:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
v1 = Vector2D(2, 3) v1 = Vector2D(2, 3)
v2 = Vector2D(4, 5) v2 = Vector2D(4, 5)
n = 6 n = 6
print('{} + {} = {}'.format(v1, v2, v1 + v2)) print('{} + {} = {}'.format(v1, v2, v1 + v2))
print('{} + {} = {}'.format(v1, n, v1 + n)) print('{} + {} = {}'.format(v1, n, v1 + n))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Other numeric and logical operators can be supported by implementing the Other numeric and logical operators can be supported by implementing the
appropriate method, for example: appropriate method, for example:
- Multiplication (`*`): `__mul__` - Multiplication (`*`): `__mul__`
- Division (`/`): `__truediv__` - Division (`/`): `__truediv__`
- Negation (`-`): `__neg__` - Negation (`-`): `__neg__`
- In-place addition (`+=`): `__iadd__` - In-place addition (`+=`): `__iadd__`
- Exclusive or (`^`): `__xor__` - Exclusive or (`^`): `__xor__`
When an operator is applied to operands of different types, a set of fall-back When an operator is applied to operands of different types, a set of fall-back
rules are followed depending on the set of methods implemented on the rules are followed depending on the set of methods implemented on the
operands. For example, in the expression `a + b`, if `a.__add__` is not operands. For example, in the expression `a + b`, if `a.__add__` is not
implemented, but `b.__radd__` is implemented, then the latter will be implemented, but `b.__radd__` is implemented, then the latter will be
called. Take a look at the [official called. Take a look at the [official
documentation](https://docs.python.org/3.5/reference/datamodel.html#emulating-numeric-types) documentation](https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types)
for further details, including a full list of the arithmetic and logical for further details, including a full list of the arithmetic and logical
operators that your classes can support. operators that your classes can support.
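As a rough illustration of these fall-back rules, here is a cut-down
`Vector2D` (a sketch, separate from the class above) which supports scalar
multiplication from either side - in the expression `10 * v`, `int.__mul__`
does not know what to do with a `Vector2D`, so `Vector2D.__rmul__` is called
instead:

```
class Vector2D(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return 'Vector2D({}, {})'.format(self.x, self.y)

    # Vector2D * scalar
    def __mul__(self, other):
        return Vector2D(self.x * other, self.y * other)

    # scalar * Vector2D - the fall-back used when the
    # left-hand operand cannot handle the operation
    def __rmul__(self, other):
        return self * other

v = Vector2D(2, 3)
print(v * 10)
print(10 * v)
```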
<a class="anchor" id="equality-and-comparison-operators"></a> <a class="anchor" id="equality-and-comparison-operators"></a>
## Equality and comparison operators ## Equality and comparison operators
Adding support for equality (`==`, `!=`) and comparison (e.g. `>=`) operators Adding support for equality (`==`, `!=`) and comparison (e.g. `>=`) operators
is just as easy. Imagine that we have a class called `Label`, which represents is just as easy. Imagine that we have a class called `Label`, which represents
a label in a lookup table. Our `Label` has an integer label, a name, and an a label in a lookup table. Our `Label` has an integer label, a name, and an
RGB colour: RGB colour:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class Label(object): class Label(object):
def __init__(self, label, name, colour): def __init__(self, label, name, colour):
self.label = label self.label = label
self.name = name self.name = name
self.colour = colour self.colour = colour
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
In order to ensure that a list of `Label` objects is ordered by their label In order to ensure that a list of `Label` objects is ordered by their label
values, we can implement a set of functions, so that `Label` classes can be values, we can implement a set of functions, so that `Label` classes can be
compared using the standard comparison operators: compared using the standard comparison operators:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import functools import functools
# Don't worry about this statement # Don't worry about this statement
# just yet - it is explained below # just yet - it is explained below
@functools.total_ordering @functools.total_ordering
class Label(object): class Label(object):
def __init__(self, label, name, colour): def __init__(self, label, name, colour):
self.label = label self.label = label
self.name = name self.name = name
self.colour = colour self.colour = colour
def __str__(self): def __str__(self):
rgb = ''.join(['{:02x}'.format(c) for c in self.colour]) rgb = ''.join(['{:02x}'.format(c) for c in self.colour])
return 'Label({}, {}, #{})'.format(self.label, self.name, rgb) return 'Label({}, {}, #{})'.format(self.label, self.name, rgb)
def __repr__(self): def __repr__(self):
return str(self) return str(self)
# implement Label == Label # implement Label == Label
def __eq__(self, other): def __eq__(self, other):
return self.label == other.label return self.label == other.label
# implement Label < Label # implement Label < Label
def __lt__(self, other): def __lt__(self, other):
return self.label < other.label return self.label < other.label
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> We also added `__str__` and `__repr__` methods to the `Label` class so that > We also added `__str__` and `__repr__` methods to the `Label` class so that
> `Label` instances will be printed nicely. > `Label` instances will be printed nicely.
Now we can compare and sort our `Label` instances: Now we can compare and sort our `Label` instances:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
l1 = Label(1, 'Parietal', (255, 0, 0)) l1 = Label(1, 'Parietal', (255, 0, 0))
l2 = Label(2, 'Occipital', ( 0, 255, 0)) l2 = Label(2, 'Occipital', ( 0, 255, 0))
l3 = Label(3, 'Temporal', ( 0, 0, 255)) l3 = Label(3, 'Temporal', ( 0, 0, 255))
print('{} > {}: {}'.format(l1, l2, l1 > l2)) print('{} > {}: {}'.format(l1, l2, l1 > l2))
print('{} < {}: {}'.format(l1, l3, l1 <= l3)) print('{} < {}: {}'.format(l1, l3, l1 <= l3))
print('{} != {}: {}'.format(l2, l3, l2 != l3)) print('{} != {}: {}'.format(l2, l3, l2 != l3))
print(sorted((l3, l1, l2))) print(sorted((l3, l1, l2)))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The The
[`@functools.total_ordering`](https://docs.python.org/3.5/library/functools.html#functools.total_ordering) [`@functools.total_ordering`](https://docs.python.org/3/library/functools.html#functools.total_ordering)
is a convenience is a convenience
[decorator](https://docs.python.org/3.5/glossary.html#term-decorator) which, [decorator](https://docs.python.org/3/glossary.html#term-decorator) which,
given a class that implements equality and a single comparison function given a class that implements equality and a single comparison function
(`__lt__` in the above code), will "fill in" the remainder of the comparison (`__lt__` in the above code), will "fill in" the remainder of the comparison
operators. If you need very specific or complicated behaviour, then you can operators. If you need very specific or complicated behaviour, then you can
provide methods for _all_ of the comparison operators, e.g. `__gt__` for `>`, provide methods for _all_ of the comparison operators, e.g. `__gt__` for `>`,
`__ge__` for `>=`, etc. `__ge__` for `>=`, etc.
> Decorators are introduced in another practical. > Decorators are introduced in another practical.
But if you just want the operators to work in the conventional manner, you can But if you just want the operators to work in the conventional manner, you can
simply use the `@functools.total_ordering` decorator, and provide `__eq__`, simply use the `@functools.total_ordering` decorator, and provide `__eq__`,
and just one of `__lt__`, `__le__`, `__gt__` or `__ge__`. and just one of `__lt__`, `__le__`, `__gt__` or `__ge__`.
Refer to the [official Refer to the [official
documentation](https://docs.python.org/3.5/reference/datamodel.html#object.__lt__) documentation](https://docs.python.org/3/reference/datamodel.html#object.__lt__)
for all of the details on supporting comparison operators. for all of the details on supporting comparison operators.
> You may see the `__cmp__` method in older code bases - this provides a > You may see the `__cmp__` method in older code bases - this provides a
> C-style comparison function which returns `<0`, `0`, or `>0` based on > C-style comparison function which returns `<0`, `0`, or `>0` based on
> comparing two items. This has been superseded by the rich comparison > comparing two items. This has been superseded by the rich comparison
> operators introduced here, and is no longer supported in Python 3. > operators introduced here, and is no longer supported in Python 3.
<a class="anchor" id="the-indexing-operator"></a> <a class="anchor" id="the-indexing-operator"></a>
## The indexing operator `[]` ## The indexing operator `[]`
The indexing operator (`[]`) is generally used by "container" types, such as The indexing operator (`[]`) is generally used by "container" types, such as
the built-in `list` and `dict` classes. the built-in `list` and `dict` classes.
At its essence, there are only three types of behaviours that are possible At its essence, there are only three types of behaviours that are possible
with the `[]` operator. All that is needed to support them is to implement with the `[]` operator. All that is needed to support them is to implement
three special methods in your class, regardless of whether your class will be three special methods in your class, regardless of whether your class will be
indexed by sequential integers (like a `list`) or by indexed by sequential integers (like a `list`) or by
[hashable](https://docs.python.org/3.5/glossary.html#term-hashable) values [hashable](https://docs.python.org/3/glossary.html#term-hashable) values
(like a `dict`): (like a `dict`):
- __Retrieval__ is performed by the `__getitem__` method - **Retrieval** is performed by the `__getitem__` method
- __Assignment__ is performed by the `__setitem__` method - **Assignment** is performed by the `__setitem__` method
- __Deletion__ is performed by the `__delitem__` method - **Deletion** is performed by the `__delitem__` method
Note that, if you implement these methods in your own class, there is no Note that, if you implement these methods in your own class, there is no
requirement for them to actually provide any form of data storage or requirement for them to actually provide any form of data storage or
retrieval. However if you don't, you will probably confuse users of your code retrieval. However if you don't, you will probably confuse users of your code
who are used to how the `list` and `dict` types work. Whenever you deviate who are used to how the `list` and `dict` types work. Whenever you deviate
from conventional behaviour, make sure you explain it well in your from conventional behaviour, make sure you explain it well in your
documentation! documentation!
The following contrived example demonstrates all three behaviours: The following contrived example demonstrates all three behaviours:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class TwoTimes(object): class TwoTimes(object):
def __init__(self): def __init__(self):
self.__deleted = set() self.__deleted = set()
self.__assigned = {} self.__assigned = {}
def __getitem__(self, key): def __getitem__(self, key):
if key in self.__deleted: if key in self.__deleted:
raise KeyError('{} has been deleted!'.format(key)) raise KeyError('{} has been deleted!'.format(key))
elif key in self.__assigned: elif key in self.__assigned:
return self.__assigned[key] return self.__assigned[key]
else: else:
return key * 2 return key * 2
def __setitem__(self, key, value): def __setitem__(self, key, value):
self.__assigned[key] = value self.__assigned[key] = value
def __delitem__(self, key): def __delitem__(self, key):
self.__deleted.add(key) self.__deleted.add(key)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Guess what happens whenever we index a `TwoTimes` object: Guess what happens whenever we index a `TwoTimes` object:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
tt = TwoTimes() tt = TwoTimes()
print('TwoTimes[{}] = {}'.format(2, tt[2])) print('TwoTimes[{}] = {}'.format(2, tt[2]))
print('TwoTimes[{}] = {}'.format(6, tt[6])) print('TwoTimes[{}] = {}'.format(6, tt[6]))
print('TwoTimes[{}] = {}'.format('abc', tt['abc'])) print('TwoTimes[{}] = {}'.format('abc', tt['abc']))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The `TwoTimes` class allows us to override the value for a specific key: The `TwoTimes` class allows us to override the value for a specific key:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
print(tt[4]) print(tt[4])
tt[4] = 'this is not 4 * 4' tt[4] = 'this is not 4 * 4'
print(tt[4]) print(tt[4])
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
And we can also "delete" keys: And we can also "delete" keys:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
print(tt['12345']) print(tt['12345'])
del tt['12345'] del tt['12345']
# this is going to raise an error # this is going to raise an error
print(tt['12345']) print(tt['12345'])
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
If you wish to support the Python `start:stop:step` [slice If you wish to support the Python `start:stop:step` [slice
notation](https://docs.python.org/3.5/library/functions.html#slice), you notation](https://docs.python.org/3/library/functions.html#slice), you
simply need to write your `__getitem__` and `__setitem__` methods so that they simply need to write your `__getitem__` and `__setitem__` methods so that they
can detect `slice` objects: can detect `slice` objects:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class TwoTimes(object): class TwoTimes(object):
def __init__(self, max): def __init__(self, max):
self.__max = max self.__max = max
def __getitem__(self, key): def __getitem__(self, key):
if isinstance(key, slice): if isinstance(key, slice):
start = key.start or 0 start = key.start or 0
stop = key.stop or self.__max stop = key.stop or self.__max
step = key.step or 1 step = key.step or 1
else: else:
start = key start = key
stop = key + 1 stop = key + 1
step = 1 step = 1
return [i * 2 for i in range(start, stop, step)] return [i * 2 for i in range(start, stop, step)]
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Now we can "slice" a `TwoTimes` instance: Now we can "slice" a `TwoTimes` instance:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
tt = TwoTimes(10) tt = TwoTimes(10)
print(tt[5]) print(tt[5])
print(tt[3:7]) print(tt[3:7])
print(tt[::2]) print(tt[::2])
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> It is possible to sub-class the built-in `list` and `dict` classes if you > It is possible to sub-class the built-in `list` and `dict` classes if you
> wish to extend their functionality in some way. However, if you are writing > wish to extend their functionality in some way. However, if you are writing
> a class that should mimic one of the `list` or `dict` classes, but work > a class that should mimic one of the `list` or `dict` classes, but work
> in a different way internally (e.g. a `dict`-like object which uses a > in a different way internally (e.g. a `dict`-like object which uses a
> different hashing algorithm), the `Sequence` and `MutableMapping` classes > different hashing algorithm), the `Sequence` and `MutableMapping` classes
> are [a better choice](https://stackoverflow.com/a/7148602) - you can find > are [a better choice](https://stackoverflow.com/a/7148602) - you can find
> them in the > them in the
> [`collections.abc`](https://docs.python.org/3.5/library/collections.abc.html) > [`collections.abc`](https://docs.python.org/3/library/collections.abc.html)
> module. > module.
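For example, here is a rough sketch (not part of the practical) of a
`dict`-like class built on `MutableMapping`, which treats all of its keys as
lower-case strings. The abstract base class fills in methods such as `get`,
`keys` and `items` for free, as long as we implement the five methods shown
below:

```
from collections.abc import MutableMapping

class LowerCaseDict(MutableMapping):

    def __init__(self):
        self.__data = {}

    def __getitem__(self, key):
        return self.__data[key.lower()]

    def __setitem__(self, key, value):
        self.__data[key.lower()] = value

    def __delitem__(self, key):
        del self.__data[key.lower()]

    def __iter__(self):
        return iter(self.__data)

    def __len__(self):
        return len(self.__data)

d = LowerCaseDict()
d['Hello'] = 'world'
print(d['HELLO'])       # retrieval is case-insensitive
print(list(d.items()))  # items() is provided by MutableMapping
```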
<a class="anchor" id="the-call-operator"></a> <a class="anchor" id="the-call-operator"></a>
## The call operator `()` ## The call operator `()`
Remember how everything in Python is an object, even functions? When you call Remember how everything in Python is an object, even functions? When you call
a function, a method called `__call__` is called on the function object. We can a function, a method called `__call__` is called on the function object. We can
implement the `__call__` method on our own class, which will allow us to "call" implement the `__call__` method on our own class, which will allow us to "call"
objects as if they are functions. objects as if they are functions.
For example, the `TimedFunction` class allows us to calculate the execution For example, the `TimedFunction` class allows us to calculate the execution
time of any function: time of any function:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import time import time
class TimedFunction(object): class TimedFunction(object):
def __init__(self, func): def __init__(self, func):
self.func = func self.func = func
def __call__(self, *args, **kwargs): def __call__(self, *args, **kwargs):
print('Timing {}...'.format(self.func.__name__)) print('Timing {}...'.format(self.func.__name__))
start = time.time() start = time.time()
retval = self.func(*args, **kwargs) retval = self.func(*args, **kwargs)
end = time.time() end = time.time()
print('Elapsed time: {:0.2f} seconds'.format(end - start)) print('Elapsed time: {:0.2f} seconds'.format(end - start))
return retval return retval
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Let's see how the `TimedFunction` behaves: Let's see how the `TimedFunction` behaves:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import numpy as np import numpy as np
import numpy.linalg as npla import numpy.linalg as npla
def inverse(data): def inverse(data):
return npla.inv(data) return npla.inv(data)
tf = TimedFunction(inverse) tf = TimedFunction(inverse)
data = np.random.random((5000, 5000)) data = np.random.random((5000, 5000))
# Wait a few seconds after # Wait a few seconds after
# running this code block! # running this code block!
inv = tf(data) inv = tf(data)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> The `TimedFunction` class is conceptually very similar to a > The `TimedFunction` class is conceptually very similar to a
> [decorator](https://docs.python.org/3.5/glossary.html#term-decorator) - > [decorator](https://docs.python.org/3/glossary.html#term-decorator) -
> decorators are covered in another practical. > decorators are covered in another practical.
<a class="anchor" id="the-dot-operator"></a> <a class="anchor" id="the-dot-operator"></a>
## The dot operator `.` ## The dot operator `.`
Python allows us to override the `.` (dot) operator which is used to access Python allows us to override the `.` (dot) operator which is used to access
the attributes and methods of an object. This is very powerful, but is also the attributes and methods of an object. This is very powerful, but is also
quite a niche feature, and it is easy to trip yourself up, so if you wish to quite a niche feature, and it is easy to trip yourself up, so if you wish to
use this in your own project, make sure that you carefully read (and use this in your own project, make sure that you carefully read (and
understand) [the understand) [the
documentation](https://docs.python.org/3.5/reference/datamodel.html#customizing-attribute-access), documentation](https://docs.python.org/3/reference/datamodel.html#customizing-attribute-access),
and test your code comprehensively! and test your code comprehensively!
For this example, we need a little background information. OpenGL includes For this example, we need a little background information. OpenGL includes
the native data types `vec2`, `vec3`, and `vec4`, which can be used to the native data types `vec2`, `vec3`, and `vec4`, which can be used to
represent 2, 3, or 4 component vectors respectively. These data types have a represent 2, 3, or 4 component vectors respectively. These data types have a
neat feature called [_swizzling_][glslref], which allows you to access any neat feature called [_swizzling_][glslref], which allows you to access any
component (`x`,`y`, `z`, `w` for vectors, or `r`, `g`, `b`, `a` for colours) component (`x`,`y`, `z`, `w` for vectors, or `r`, `g`, `b`, `a` for colours)
in any order, with a syntax similar to attribute access in Python. in any order, with a syntax similar to attribute access in Python.
[glslref]: https://www.khronos.org/opengl/wiki/Data_Type_(GLSL)#Swizzling [glslref]: https://www.khronos.org/opengl/wiki/Data_Type_(GLSL)#Swizzling
So here is an example which implements this swizzle-style attribute access on So here is an example which implements this swizzle-style attribute access on
a class called `Vector`, in which we have customised the behaviour of the `.` a class called `Vector`, in which we have customised the behaviour of the `.`
operator: operator:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class Vector(object): class Vector(object):
def __init__(self, xyz): def __init__(self, xyz):
self.__xyz = list(xyz) self.__xyz = list(xyz)
def __str__(self): def __str__(self):
return 'Vector({})'.format(self.__xyz) return 'Vector({})'.format(self.__xyz)
def __getattr__(self, key): def __getattr__(self, key):
# Swizzling behaviour only occurs when # Swizzling behaviour only occurs when
# the attribute name is entirely comprised # the attribute name is entirely comprised
# of 'x', 'y', and 'z'. # of 'x', 'y', and 'z'.
if not all([c in 'xyz' for c in key]): if not all([c in 'xyz' for c in key]):
raise AttributeError(key) raise AttributeError(key)
key = ['xyz'.index(c) for c in key] key = ['xyz'.index(c) for c in key]
return [self.__xyz[c] for c in key] return [self.__xyz[c] for c in key]
def __setattr__(self, key, value): def __setattr__(self, key, value):
# Restrict swizzling behaviour as above # Restrict swizzling behaviour as above
if not all([c in 'xyz' for c in key]): if not all([c in 'xyz' for c in key]):
return super().__setattr__(key, value) return super().__setattr__(key, value)
if len(key) == 1: if len(key) == 1:
value = (value,) value = (value,)
idxs = ['xyz'.index(c) for c in key] idxs = ['xyz'.index(c) for c in key]
for i, v in sorted(zip(idxs, value)): for i, v in sorted(zip(idxs, value)):
self.__xyz[i] = v self.__xyz[i] = v
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
And here it is in action: And here it is in action:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
v = Vector((1, 2, 3)) v = Vector((1, 2, 3))
print('v: ', v) print('v: ', v)
print('xyz: ', v.xyz) print('xyz: ', v.xyz)
print('zy: ', v.zy) print('zy: ', v.zy)
print('xx: ', v.xx) print('xx: ', v.xx)
v.xz = 10, 30 v.xz = 10, 30
print(v) print(v)
v.y = 20 v.y = 20
print(v) print(v)
``` ```
......
...@@ -8,13 +8,13 @@ ...@@ -8,13 +8,13 @@
Operator overloading, in an object-oriented programming language, is the Operator overloading, in an object-oriented programming language, is the
process of customising the behaviour of _operators_ (e.g. `+`, `*`, `/` and process of customising the behaviour of _operators_ (e.g. `+`, `*`, `/` and
`-`) on user-defined types. This practical aims to show you that operator `-`) on user-defined types. This practical aims to show you that operator
overloading is __very__ easy to do in Python. overloading is **very** easy to do in Python.
This practical gives a brief overview of the operators which you may be most This practical gives a brief overview of the operators which you may be most
interested in implementing. However, there are many operators (and other interested in implementing. However, there are many operators (and other
special methods) which you can support in your own classes - the [official special methods) which you can support in your own classes - the [official
documentation](https://docs.python.org/3.5/reference/datamodel.html#basic-customization) documentation](https://docs.python.org/3/reference/datamodel.html#basic-customization)
is the best reference if you are interested in learning more. is the best reference if you are interested in learning more.
...@@ -173,7 +173,7 @@ rules are followed depending on the set of methods implemented on the ...@@ -173,7 +173,7 @@ rules are followed depending on the set of methods implemented on the
operands. For example, in the expression `a + b`, if `a.__add__` is not operands. For example, in the expression `a + b`, if `a.__add__` is not
implemented, but `b.__radd__` is implemented, then the latter will be implemented, but `b.__radd__` is implemented, then the latter will be
called. Take a look at the [official called. Take a look at the [official
documentation](https://docs.python.org/3.5/reference/datamodel.html#emulating-numeric-types) documentation](https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types)
for further details, including a full list of the arithmetic and logical for further details, including a full list of the arithmetic and logical
operators that your classes can support. operators that your classes can support.
...@@ -252,9 +252,9 @@ print(sorted((l3, l1, l2))) ...@@ -252,9 +252,9 @@ print(sorted((l3, l1, l2)))
The The
[`@functools.total_ordering`](https://docs.python.org/3.5/library/functools.html#functools.total_ordering) [`@functools.total_ordering`](https://docs.python.org/3/library/functools.html#functools.total_ordering)
is a convenience is a convenience
[decorator](https://docs.python.org/3.5/glossary.html#term-decorator) which, [decorator](https://docs.python.org/3/glossary.html#term-decorator) which,
given a class that implements equality and a single comparison function given a class that implements equality and a single comparison function
(`__lt__` in the above code), will "fill in" the remainder of the comparison (`__lt__` in the above code), will "fill in" the remainder of the comparison
operators. If you need very specific or complicated behaviour, then you can operators. If you need very specific or complicated behaviour, then you can
...@@ -271,7 +271,7 @@ and just one of `__lt__`, `__le__`, `__gt__` or `__ge__`. ...@@ -271,7 +271,7 @@ and just one of `__lt__`, `__le__`, `__gt__` or `__ge__`.
Refer to the [official Refer to the [official
documentation](https://docs.python.org/3.5/reference/datamodel.html#object.__lt__) documentation](https://docs.python.org/3/reference/datamodel.html#object.__lt__)
for all of the details on supporting comparison operators. for all of the details on supporting comparison operators.
...@@ -293,13 +293,13 @@ At its essence, there are only three types of behaviours that are possible ...@@ -293,13 +293,13 @@ At its essence, there are only three types of behaviours that are possible
with the `[]` operator. All that is needed to support them are to implement with the `[]` operator. All that is needed to support them are to implement
three special methods in your class, regardless of whether your class will be three special methods in your class, regardless of whether your class will be
indexed by sequential integers (like a `list`) or by indexed by sequential integers (like a `list`) or by
[hashable](https://docs.python.org/3.5/glossary.html#term-hashable) values [hashable](https://docs.python.org/3/glossary.html#term-hashable) values
(like a `dict`): (like a `dict`):
- __Retrieval__ is performed by the `__getitem__` method - **Retrieval** is performed by the `__getitem__` method
- __Assignment__ is performed by the `__setitem__` method - **Assignment** is performed by the `__setitem__` method
- __Deletion__ is performed by the `__delitem__` method - **Deletion** is performed by the `__delitem__` method
Note that, if you implement these methods in your own class, there is no Note that, if you implement these methods in your own class, there is no
...@@ -370,7 +370,7 @@ print(tt['12345']) ...@@ -370,7 +370,7 @@ print(tt['12345'])
If you wish to support the Python `start:stop:step` [slice If you wish to support the Python `start:stop:step` [slice
notation](https://docs.python.org/3.5/library/functions.html#slice), you notation](https://docs.python.org/3/library/functions.html#slice), you
simply need to write your `__getitem__` and `__setitem__` methods so that they simply need to write your `__getitem__` and `__setitem__` methods so that they
can detect `slice` objects: can detect `slice` objects:
...@@ -414,7 +414,7 @@ print(tt[::2]) ...@@ -414,7 +414,7 @@ print(tt[::2])
> different hashing algorithm), the `Sequence` and `MutableMapping` classes > different hashing algorithm), the `Sequence` and `MutableMapping` classes
> are [a better choice](https://stackoverflow.com/a/7148602) - you can find > are [a better choice](https://stackoverflow.com/a/7148602) - you can find
> them in the > them in the
> [`collections.abc`](https://docs.python.org/3.5/library/collections.abc.html) > [`collections.abc`](https://docs.python.org/3/library/collections.abc.html)
> module. > module.
...@@ -472,7 +472,7 @@ inv = tf(data) ...@@ -472,7 +472,7 @@ inv = tf(data)
> The `TimedFunction` class is conceptually very similar to a > The `TimedFunction` class is conceptually very similar to a
> [decorator](https://docs.python.org/3.5/glossary.html#term-decorator) - > [decorator](https://docs.python.org/3/glossary.html#term-decorator) -
> decorators are covered in another practical. > decorators are covered in another practical.
...@@ -485,7 +485,7 @@ the attributes and methods of an object. This is very powerful, but is also ...@@ -485,7 +485,7 @@ the attributes and methods of an object. This is very powerful, but is also
quite a niche feature, and it is easy to trip yourself up, so if you wish to quite a niche feature, and it is easy to trip yourself up, so if you wish to
use this in your own project, make sure that you carefully read (and use this in your own project, make sure that you carefully read (and
understand) [the understand) [the
documentation](https://docs.python.org/3.5/reference/datamodel.html#customizing-attribute-access), documentation](https://docs.python.org/3/reference/datamodel.html#customizing-attribute-access),
and test your code comprehensively! and test your code comprehensively!
......
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Context managers # Context managers
The recommended way to open a file in Python is via the `with` statement: The recommended way to open a file in Python is via the `with` statement:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
with open('05_context_managers.md', 'rt') as f: with open('05_context_managers.md', 'rt') as f:
firstlines = f.readlines()[:4] firstlines = f.readlines()[:4]
firstlines = [l.strip() for l in firstlines] firstlines = [l.strip() for l in firstlines]
print('\n'.join(firstlines)) print('\n'.join(firstlines))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
This is because the `with` statement ensures that the file will be closed This is because the `with` statement ensures that the file will be closed
automatically, even if an error occurs inside the `with` statement. automatically, even if an error occurs inside the `with` statement.
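We can check this - file objects have a `closed` attribute:
%% Cell type:code id: tags:
```
with open('05_context_managers.md', 'rt') as f:
    pass

print('Is the file closed?', f.closed)
```
%% Cell type:markdown id: tags: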
The `with` statement is obviously hiding some internal details from us. But The `with` statement is obviously hiding some internal details from us. But
these internals are in fact quite straightforward, and are known as [_context these internals are in fact quite straightforward, and are known as [*context
managers_](https://docs.python.org/3.5/reference/datamodel.html#context-managers). managers*](https://docs.python.org/3/reference/datamodel.html#context-managers).
* [Anatomy of a context manager](#anatomy-of-a-context-manager) * [Anatomy of a context manager](#anatomy-of-a-context-manager)
* [Why not just use `try ... finally`?](#why-not-just-use-try-finally) * [Why not just use `try ... finally`?](#why-not-just-use-try-finally)
* [Uses for context managers](#uses-for-context-managers) * [Uses for context managers](#uses-for-context-managers)
* [Handling errors in `__exit__`](#handling-errors-in-exit) * [Handling errors in `__exit__`](#handling-errors-in-exit)
* [Suppressing errors with `__exit__`](#suppressing-errors-with-exit) * [Suppressing errors with `__exit__`](#suppressing-errors-with-exit)
* [Nesting context managers](#nesting-context-managers) * [Nesting context managers](#nesting-context-managers)
* [Functions as context managers](#functions-as-context-managers) * [Functions as context managers](#functions-as-context-managers)
* [Methods as context managers](#methods-as-context-managers) * [Methods as context managers](#methods-as-context-managers)
* [Useful references](#useful-references) * [Useful references](#useful-references)
<a class="anchor" id="anatomy-of-a-context-manager"></a> <a class="anchor" id="anatomy-of-a-context-manager"></a>
## Anatomy of a context manager ## Anatomy of a context manager
A _context manager_ is simply an object which has two specially named methods A *context manager* is simply an object which has two specially named methods
`__enter__` and `__exit__`. Any object which has these methods can be used in `__enter__` and `__exit__`. Any object which has these methods can be used in
a `with` statement. a `with` statement.
Let's define a context manager class that we can play with: Let's define a context manager class that we can play with:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class MyContextManager(object): class MyContextManager(object):
def __enter__(self): def __enter__(self):
print('In enter') print('In enter')
def __exit__(self, *args): def __exit__(self, *args):
print('In exit') print('In exit')
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Now, what happens when we use `MyContextManager` in a `with` statement? Now, what happens when we use `MyContextManager` in a `with` statement?
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
with MyContextManager(): with MyContextManager():
print('In with block') print('In with block')
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
So the `__enter__` method is called before the statements in the `with` block, So the `__enter__` method is called before the statements in the `with` block,
and the `__exit__` method is called afterwards. and the `__exit__` method is called afterwards.
Context managers are that simple. What makes them really useful though, is Context managers are that simple. What makes them really useful though, is
that the `__exit__` method will be called even if the code in the `with` block that the `__exit__` method will be called even if the code in the `with` block
raises an error. The error will be held, and only raised after the `__exit__` raises an error. The error will be held, and only raised after the `__exit__`
method has finished: method has finished:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
with MyContextManager(): with MyContextManager():
print('In with block') print('In with block')
assert 1 == 0 assert 1 == 0
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
This means that we can use context managers to perform any sort of clean up or This means that we can use context managers to perform any sort of clean up or
finalisation logic that we always want to have executed. finalisation logic that we always want to have executed.
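For example, here is a minimal sketch of a context manager which temporarily
sets an environment variable, restoring the original state on exit (the
`TempEnv` class and the `MY_SETTING` variable are made up for this example):
%% Cell type:code id: tags:
```
import os

class TempEnv(object):
    def __init__(self, name, value):
        self.name  = name
        self.value = value

    def __enter__(self):
        # remember the original value (None if the variable was not set)
        self.old = os.environ.get(self.name)
        os.environ[self.name] = self.value

    def __exit__(self, *args):
        # restore the original state, even if an error occurred
        if self.old is None:
            del os.environ[self.name]
        else:
            os.environ[self.name] = self.old

with TempEnv('MY_SETTING', '1'):
    print('Inside the with block:', os.environ['MY_SETTING'])

print('Outside the with block:', os.environ.get('MY_SETTING'))
```
%% Cell type:markdown id: tags: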
<a class="anchor" id="why-not-just-use-try-finally"></a> <a class="anchor" id="why-not-just-use-try-finally"></a>
### Why not just use `try ... finally`? ### Why not just use `try ... finally`?
Context managers do not provide anything that cannot be accomplished in other Context managers do not provide anything that cannot be accomplished in other
ways. For example, we could accomplish very similar behaviour using ways. For example, we could accomplish very similar behaviour using
[`try` - `finally` logic](https://docs.python.org/3.5/tutorial/errors.html#handling-exceptions) - [`try` - `finally` logic](https://docs.python.org/3/tutorial/errors.html#handling-exceptions) -
the statements in the `finally` clause will *always* be executed, whether an the statements in the `finally` clause will *always* be executed, whether an
error is raised or not: error is raised or not:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
print('Before try block') print('Before try block')
try: try:
print('In try block') print('In try block')
assert 1 == 0 assert 1 == 0
finally: finally:
print('In finally block') print('In finally block')
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
But context managers have the advantage that you can implement your clean-up But context managers have the advantage that you can implement your clean-up
logic in one place, and re-use it as many times as you want. logic in one place, and re-use it as many times as you want.
<a class="anchor" id="uses-for-context-managers"></a> <a class="anchor" id="uses-for-context-managers"></a>
## Uses for context managers ## Uses for context managers
We have already talked about how context managers can be used to perform any
task which requires some initialisation and/or clean-up logic. As an example,
here is a context manager which creates a temporary directory, and then makes
sure that it is deleted afterwards.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import os import os
import shutil import shutil
import tempfile import tempfile
class TempDir(object): class TempDir(object):
def __enter__(self): def __enter__(self):
self.tempDir = tempfile.mkdtemp() self.tempDir = tempfile.mkdtemp()
self.prevDir = os.getcwd() self.prevDir = os.getcwd()
print('Changing to temp dir: {}'.format(self.tempDir)) print('Changing to temp dir: {}'.format(self.tempDir))
print('Previous directory: {}'.format(self.prevDir)) print('Previous directory: {}'.format(self.prevDir))
os.chdir(self.tempDir) os.chdir(self.tempDir)
def __exit__(self, *args): def __exit__(self, *args):
print('Changing back to: {}'.format(self.prevDir)) print('Changing back to: {}'.format(self.prevDir))
print('Removing temp dir: {}'.format(self.tempDir)) print('Removing temp dir: {}'.format(self.tempDir))
os.chdir(self.prevDir)
shutil.rmtree(self.tempDir) shutil.rmtree(self.tempDir)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Now imagine that we have a function which loads data from a file, and performs Now imagine that we have a function which loads data from a file, and performs
some calculation on it: some calculation on it:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import numpy as np import numpy as np
def complexAlgorithm(infile): def complexAlgorithm(infile):
data = np.loadtxt(infile) data = np.loadtxt(infile)
return data.mean() return data.mean()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
We could use the `TempDir` context manager to write a test case for this We could use the `TempDir` context manager to write a test case for this
function, and not have to worry about cleaning up the test data: function, and not have to worry about cleaning up the test data:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
with TempDir(): with TempDir():
print('Testing complex algorithm') print('Testing complex algorithm')
data = np.random.random((100, 100)) data = np.random.random((100, 100))
np.savetxt('data.txt', data) np.savetxt('data.txt', data)
result = complexAlgorithm('data.txt') result = complexAlgorithm('data.txt')
assert result > 0.1 and result < 0.9 assert result > 0.1 and result < 0.9
print('Test passed (result: {})'.format(result)) print('Test passed (result: {})'.format(result))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<a class="anchor" id="handling-errors-in-exit"></a> <a class="anchor" id="handling-errors-in-exit"></a>
### Handling errors in `__exit__` ### Handling errors in `__exit__`
By now you must be [panicking](https://youtu.be/cSU_5MgtDc8?t=9) about why I
haven't mentioned those conspicuous `*args` that get passed to the `__exit__`
method. It turns out that a context manager's [`__exit__`
method](https://docs.python.org/3/reference/datamodel.html#object.__exit__)
is always passed three arguments.
Let's adjust our `MyContextManager` class a little so we can see what these Let's adjust our `MyContextManager` class a little so we can see what these
arguments are for: arguments are for:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class MyContextManager(object): class MyContextManager(object):
def __enter__(self): def __enter__(self):
print('In enter') print('In enter')
def __exit__(self, arg1, arg2, arg3): def __exit__(self, arg1, arg2, arg3):
print('In exit') print('In exit')
print(' arg1: ', arg1) print(' arg1: ', arg1)
print(' arg2: ', arg2) print(' arg2: ', arg2)
print(' arg3: ', arg3) print(' arg3: ', arg3)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
If the code inside the `with` statement does not raise an error, these three If the code inside the `with` statement does not raise an error, these three
arguments will all be `None`. arguments will all be `None`.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
with MyContextManager(): with MyContextManager():
print('In with block') print('In with block')
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
However, if the code inside the `with` statement raises an error, things look However, if the code inside the `with` statement raises an error, things look
a little different: a little different:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
with MyContextManager(): with MyContextManager():
print('In with block') print('In with block')
raise ValueError('Oh no!') raise ValueError('Oh no!')
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
So when an error occurs, the `__exit__` method is passed the following: So when an error occurs, the `__exit__` method is passed the following:
- The [`Exception`](https://docs.python.org/3.5/tutorial/errors.html) - The [`Exception`](https://docs.python.org/3/tutorial/errors.html)
type that was raised. type that was raised.
- The `Exception` instance that was raised. - The `Exception` instance that was raised.
- A [`traceback`](https://docs.python.org/3.5/library/traceback.html) object - A [`traceback`](https://docs.python.org/3/library/traceback.html) object
which can be used to get more information about the exception (e.g. line which can be used to get more information about the exception (e.g. line
number). number).
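For example, the `traceback` object can be turned into a readable stack trace
with the built-in
[`traceback`](https://docs.python.org/3/library/traceback.html) module. Here
is a sketch of a context manager (the `LogError` name is made up) which logs
any error before letting it propagate:
%% Cell type:code id: tags:
```
import traceback

class LogError(object):
    def __enter__(self):
        print('In enter')

    def __exit__(self, etype, evalue, tb):
        # all three arguments are None if no error occurred
        if etype is not None:
            print('Caught {}: {}'.format(etype.__name__, evalue))
            print(''.join(traceback.format_tb(tb)))
        # returning nothing (None) means the error is re-raised

with LogError():
    raise ValueError('Oh no!')
```
%% Cell type:markdown id: tags: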
<a class="anchor" id="suppressing-errors-with-exit"></a> <a class="anchor" id="suppressing-errors-with-exit"></a>
### Suppressing errors with `__exit__` ### Suppressing errors with `__exit__`
The `__exit__` method is also capable of suppressing errors - if it returns a The `__exit__` method is also capable of suppressing errors - if it returns a
value of `True`, then any error that was raised will be ignored. For example, value of `True`, then any error that was raised will be ignored. For example,
we could write a context manager which ignores any assertion errors, but we could write a context manager which ignores any assertion errors, but
allows other errors to halt execution as normal: allows other errors to halt execution as normal:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class MyContextManager(object): class MyContextManager(object):
def __enter__(self): def __enter__(self):
print('In enter') print('In enter')
def __exit__(self, arg1, arg2, arg3): def __exit__(self, arg1, arg2, arg3):
print('In exit') print('In exit')
if issubclass(arg1, AssertionError): if issubclass(arg1, AssertionError):
return True return True
print(' arg1: ', arg1) print(' arg1: ', arg1)
print(' arg2: ', arg2) print(' arg2: ', arg2)
print(' arg3: ', arg3) print(' arg3: ', arg3)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> Note that if a function or method does not explicitly return a value, its > Note that if a function or method does not explicitly return a value, its
> return value is `None` (which would evaluate to `False` when converted to a > return value is `None` (which would evaluate to `False` when converted to a
> `bool`). Also note that we are using the built-in > `bool`). Also note that we are using the built-in
> [`issubclass`](https://docs.python.org/3.5/library/functions.html#issubclass) > [`issubclass`](https://docs.python.org/3/library/functions.html#issubclass)
> function, which allows us to test the type of a class. > function, which allows us to test the type of a class.
Now, when we use `MyContextManager`, any assertion errors are suppressed, Now, when we use `MyContextManager`, any assertion errors are suppressed,
whereas other errors will be raised as normal: whereas other errors will be raised as normal:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
with MyContextManager(): with MyContextManager():
assert 1 == 0 assert 1 == 0
print('Continuing execution!') print('Continuing execution!')
with MyContextManager(): with MyContextManager():
raise ValueError('Oh no!') raise ValueError('Oh no!')
``` ```
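%% Cell type:markdown id: tags:
> As an aside, the standard library already provides a small context manager
> for exactly this pattern -
> [`contextlib.suppress`](https://docs.python.org/3/library/contextlib.html#contextlib.suppress).
> A quick sketch:
%% Cell type:code id: tags:
```
import contextlib

with contextlib.suppress(AssertionError):
    assert 1 == 0

print('Continuing execution!')
```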
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<a class="anchor" id="nesting-context-managers"></a> <a class="anchor" id="nesting-context-managers"></a>
## Nesting context managers ## Nesting context managers
It is possible to nest `with` statements: It is possible to nest `with` statements:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
with open('05_context_managers.md', 'rt') as inf: with open('05_context_managers.md', 'rt') as inf:
with TempDir(): with TempDir():
with open('05_context_managers.md.copy', 'wt') as outf: with open('05_context_managers.md.copy', 'wt') as outf:
outf.write(inf.read()) outf.write(inf.read())
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
You can also use multiple context managers in a single `with` statement: You can also use multiple context managers in a single `with` statement:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
with open('05_context_managers.md', 'rt') as inf, \ with open('05_context_managers.md', 'rt') as inf, \
TempDir(), \ TempDir(), \
open('05_context_managers.md.copy', 'wt') as outf: open('05_context_managers.md.copy', 'wt') as outf:
outf.write(inf.read()) outf.write(inf.read())
``` ```
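%% Cell type:markdown id: tags:
> If the number of context managers is not known until runtime, the
> [`contextlib.ExitStack`](https://docs.python.org/3/library/contextlib.html#contextlib.ExitStack)
> class can enter them for you - a rough sketch (the list of file names here
> is arbitrary):
%% Cell type:code id: tags:
```
import contextlib

filenames = ['05_context_managers.md'] * 3

with contextlib.ExitStack() as stack:
    files = [stack.enter_context(open(f, 'rt')) for f in filenames]
    print('Opened {} files'.format(len(files)))

# all of the files have been closed by this point
print([f.closed for f in files])
```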
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<a class="anchor" id="functions-as-context-managers"></a> <a class="anchor" id="functions-as-context-managers"></a>
## Functions as context managers ## Functions as context managers
In fact, there is another way to create context managers in Python. The In fact, there is another way to create context managers in Python. The
built-in [`contextlib` built-in [`contextlib`
module](https://docs.python.org/3.5/library/contextlib.html#contextlib.contextmanager) module](https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager)
has a decorator called `@contextmanager`, which allows us to turn __any has a decorator called `@contextmanager`, which allows us to turn __any
function__ into a context manager. The only requirement is that the function function__ into a context manager. The only requirement is that the function
must have a `yield` statement<sup>1</sup>. So we could rewrite our `TempDir` must have a `yield` statement<sup>1</sup>. So we could rewrite our `TempDir`
class from above as a function: class from above as a function:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import os import os
import shutil import shutil
import tempfile import tempfile
import contextlib import contextlib
@contextlib.contextmanager @contextlib.contextmanager
def tempdir(): def tempdir():
tdir = tempfile.mkdtemp() tdir = tempfile.mkdtemp()
prevdir = os.getcwd() prevdir = os.getcwd()
try: try:
os.chdir(tdir) os.chdir(tdir)
yield tdir yield tdir
finally: finally:
os.chdir(prevdir) os.chdir(prevdir)
shutil.rmtree(tdir) shutil.rmtree(tdir)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
This new `tempdir` function is used in exactly the same way as our `TempDir` This new `tempdir` function is used in exactly the same way as our `TempDir`
class: class:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
print('In directory: {}'.format(os.getcwd())) print('In directory: {}'.format(os.getcwd()))
with tempdir() as tmp: with tempdir() as tmp:
print('Now in directory: {}'.format(os.getcwd())) print('Now in directory: {}'.format(os.getcwd()))
print('Back in directory: {}'.format(os.getcwd())) print('Back in directory: {}'.format(os.getcwd()))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The `yield tdir` statement in our `tempdir` function causes the `tdir` value The `yield tdir` statement in our `tempdir` function causes the `tdir` value
to be passed to the `with` statement, so in the line `with tempdir() as tmp`, to be passed to the `with` statement, so in the line `with tempdir() as tmp`,
the variable `tmp` will be given the value `tdir`. the variable `tmp` will be given the value `tdir`.
> <sup>1</sup> The `yield` keyword is used in _generator functions_. > <sup>1</sup> The `yield` keyword is used in *generator functions*.
> Functions which are used with the `@contextmanager` decorator must be > Functions which are used with the `@contextmanager` decorator must be
> generator functions which yield exactly one value. > generator functions which yield exactly one value.
> [Generators](https://www.python.org/dev/peps/pep-0289/) and [generator > [Generators](https://www.python.org/dev/peps/pep-0289/) and [generator
> functions](https://docs.python.org/3.5/glossary.html#term-generator) are > functions](https://docs.python.org/3/glossary.html#term-generator) are
> beyond the scope of this practical. > beyond the scope of this practical.
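As another example of a function-based context manager, here is a rough
sketch of a simple timer (the `timer` function below is made up for this
practical, not part of any library):
%% Cell type:code id: tags:
```
import time
import contextlib

@contextlib.contextmanager
def timer(name):
    start = time.time()
    try:
        yield
    finally:
        end = time.time()
        print('{} took {:0.2f} seconds'.format(name, end - start))

with timer('Sleeping'):
    time.sleep(0.5)
```
%% Cell type:markdown id: tags: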
<a class="anchor" id="methods-as-context-managers"></a> <a class="anchor" id="methods-as-context-managers"></a>
## Methods as context managers ## Methods as context managers
Since it is possible to write a function which is a context manager, it is of Since it is possible to write a function which is a context manager, it is of
course also possible to write a _method_ which is a context manager. Let's course also possible to write a _method_ which is a context manager. Let's
play with another example. We have a `Notifier` class which can be used to play with another example. We have a `Notifier` class which can be used to
notify interested listeners when an event occurs. Listeners can be registered notify interested listeners when an event occurs. Listeners can be registered
for notification via the `register` method: for notification via the `register` method:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
from collections import OrderedDict from collections import OrderedDict
class Notifier(object): class Notifier(object):
def __init__(self): def __init__(self):
super().__init__() super().__init__()
self.listeners = OrderedDict() self.listeners = OrderedDict()
def register(self, name, func): def register(self, name, func):
self.listeners[name] = func self.listeners[name] = func
def notify(self): def notify(self):
for listener in self.listeners.values(): for listener in self.listeners.values():
listener() listener()
``` ```
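%% Cell type:markdown id: tags:
Here is a quick (made-up) demonstration of the `Notifier` class on its own:
%% Cell type:code id: tags:
```
n = Notifier()

n.register('listener1', lambda : print('First listener notified!'))
n.register('listener2', lambda : print('Second listener notified!'))

n.notify()
```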
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Now, let's build a little plotting application. First of all, we have a `Line` Now, let's build a little plotting application. First of all, we have a `Line`
class, which represents a line plot. The `Line` class is a sub-class of class, which represents a line plot. The `Line` class is a sub-class of
`Notifier`, so whenever its display properties (`colour`, `width`, or `name`) `Notifier`, so whenever its display properties (`colour`, `width`, or `name`)
change, it emits a notification, and whatever is drawing it can refresh the change, it emits a notification, and whatever is drawing it can refresh the
display: display:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import numpy as np import numpy as np
class Line(Notifier): class Line(Notifier):
def __init__(self, data): def __init__(self, data):
super().__init__() super().__init__()
self.__data = data self.__data = data
self.__colour = '#000000' self.__colour = '#000000'
self.__width = 1 self.__width = 1
self.__name = 'line' self.__name = 'line'
@property @property
def xdata(self): def xdata(self):
return np.arange(len(self.__data)) return np.arange(len(self.__data))
@property @property
def ydata(self): def ydata(self):
return np.copy(self.__data) return np.copy(self.__data)
@property @property
def colour(self): def colour(self):
return self.__colour return self.__colour
@colour.setter @colour.setter
def colour(self, newColour): def colour(self, newColour):
self.__colour = newColour self.__colour = newColour
print('Line: colour changed: {}'.format(newColour)) print('Line: colour changed: {}'.format(newColour))
self.notify() self.notify()
@property @property
def width(self): def width(self):
return self.__width return self.__width
@width.setter @width.setter
def width(self, newWidth): def width(self, newWidth):
self.__width = newWidth self.__width = newWidth
print('Line: width changed: {}'.format(newWidth)) print('Line: width changed: {}'.format(newWidth))
self.notify() self.notify()
@property @property
def name(self): def name(self):
return self.__name return self.__name
@name.setter @name.setter
def name(self, newName): def name(self, newName):
self.__name = newName self.__name = newName
print('Line: name changed: {}'.format(newName)) print('Line: name changed: {}'.format(newName))
self.notify() self.notify()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Now let's write a `Plotter` class, which can plot one or more `Line` Now let's write a `Plotter` class, which can plot one or more `Line`
instances: instances:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
class Plotter(object): class Plotter(object):
def __init__(self, axis): def __init__(self, axis):
self.__axis = axis self.__axis = axis
self.__lines = [] self.__lines = []
def addData(self, data): def addData(self, data):
line = Line(data) line = Line(data)
self.__lines.append(line) self.__lines.append(line)
line.register('plot', self.lineChanged) line.register('plot', self.lineChanged)
self.draw() self.draw()
return line return line
def lineChanged(self): def lineChanged(self):
self.draw() self.draw()
def draw(self): def draw(self):
print('Plotter: redrawing plot') print('Plotter: redrawing plot')
ax = self.__axis ax = self.__axis
ax.clear() ax.clear()
for line in self.__lines: for line in self.__lines:
ax.plot(line.xdata, ax.plot(line.xdata,
line.ydata, line.ydata,
color=line.colour, color=line.colour,
linewidth=line.width, linewidth=line.width,
label=line.name) label=line.name)
ax.legend() ax.legend()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Let's create a `Plotter` object, and add a couple of lines to it (note that Let's create a `Plotter` object, and add a couple of lines to it (note that
the `matplotlib` plot will open in a separate window): the `matplotlib` plot will open in a separate window):
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
# this line is only necessary when # this line is only necessary when
# working in jupyter notebook/ipython
%matplotlib %matplotlib
fig = plt.figure() fig = plt.figure()
ax = fig.add_subplot(111) ax = fig.add_subplot(111)
plotter = Plotter(ax) plotter = Plotter(ax)
l1 = plotter.addData(np.sin(np.linspace(0, 6 * np.pi, 50))) l1 = plotter.addData(np.sin(np.linspace(0, 6 * np.pi, 50)))
l2 = plotter.addData(np.cos(np.linspace(0, 6 * np.pi, 50))) l2 = plotter.addData(np.cos(np.linspace(0, 6 * np.pi, 50)))
fig.show() fig.show()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Now, when we change the properties of our `Line` instances, the plot will be Now, when we change the properties of our `Line` instances, the plot will be
automatically updated: automatically updated:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
l1.colour = '#ff0000' l1.colour = '#ff0000'
l2.colour = '#00ff00' l2.colour = '#00ff00'
l1.width = 2 l1.width = 2
l2.width = 2 l2.width = 2
l1.name = 'sine' l1.name = 'sine'
l2.name = 'cosine' l2.name = 'cosine'
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Pretty cool! However, this seems very inefficient - every time we change the Pretty cool! However, this seems very inefficient - every time we change the
properties of a `Line`, the `Plotter` will refresh the plot. If we were properties of a `Line`, the `Plotter` will refresh the plot. If we were
plotting large amounts of data, this would be unacceptable, as plotting would plotting large amounts of data, this would be unacceptable, as plotting would
simply take too long. simply take too long.
Wouldn't it be nice if we were able to perform batch-updates of `Line` Wouldn't it be nice if we were able to perform batch-updates of `Line`
properties, and only refresh the plot when we are done? Let's add an extra properties, and only refresh the plot when we are done? Let's add an extra
method to the `Plotter` class: method to the `Plotter` class:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import contextlib import contextlib
class Plotter(object): class Plotter(object):
def __init__(self, axis): def __init__(self, axis):
self.__axis = axis self.__axis = axis
self.__lines = [] self.__lines = []
self.__holdUpdates = False self.__holdUpdates = False
def addData(self, data): def addData(self, data):
line = Line(data) line = Line(data)
self.__lines.append(line) self.__lines.append(line)
line.register('plot', self.lineChanged) line.register('plot', self.lineChanged)
if not self.__holdUpdates: if not self.__holdUpdates:
self.draw() self.draw()
return line return line
def lineChanged(self): def lineChanged(self):
if not self.__holdUpdates: if not self.__holdUpdates:
self.draw() self.draw()
def draw(self): def draw(self):
print('Plotter: redrawing plot') print('Plotter: redrawing plot')
ax = self.__axis ax = self.__axis
ax.clear() ax.clear()
for line in self.__lines: for line in self.__lines:
ax.plot(line.xdata, ax.plot(line.xdata,
line.ydata, line.ydata,
color=line.colour, color=line.colour,
linewidth=line.width, linewidth=line.width,
label=line.name) label=line.name)
ax.legend() ax.legend()
@contextlib.contextmanager @contextlib.contextmanager
def holdUpdates(self): def holdUpdates(self):
self.__holdUpdates = True self.__holdUpdates = True
try: try:
yield yield
self.draw() self.draw()
finally: finally:
self.__holdUpdates = False self.__holdUpdates = False
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
This new `holdUpdates` method allows us to temporarily suppress notifications
from all `Line` instances. Let's create a new plot:
%% Cell type:code id: tags:
```
fig     = plt.figure()
ax      = fig.add_subplot(111)
plotter = Plotter(ax)
plt.show()
```
%% Cell type:markdown id: tags:
Now, we can update many `Line` properties without performing any redundant
redraws:
%% Cell type:code id: tags:
```
with plotter.holdUpdates():
    l1 = plotter.addData(np.sin(np.linspace(0, 6 * np.pi, 50)))
    l2 = plotter.addData(np.cos(np.linspace(0, 6 * np.pi, 50)))
    l1.colour = '#0000ff'
    l2.colour = '#ffff00'
    l1.width  = 1
    l2.width  = 1
    l1.name   = '$sin(x)$'
    l2.name   = '$cos(x)$'
```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<a class="anchor" id="useful-references"></a> <a class="anchor" id="useful-references"></a>
## Useful references ## Useful references
* [Context manager classes](https://docs.python.org/3.5/reference/datamodel.html#context-managers) * [Context manager classes](https://docs.python.org/3/reference/datamodel.html#context-managers)
* The [`contextlib` module](https://docs.python.org/3.5/library/contextlib.html) * The [`contextlib` module](https://docs.python.org/3/library/contextlib.html)
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Decorators # Decorators
Remember that in Python, everything is an object, including functions. This Remember that in Python, everything is an object, including functions. This
means that we can do things like: means that we can do things like:
- Pass a function as an argument to another function. - Pass a function as an argument to another function.
- Create/define a function inside another function. - Create/define a function inside another function.
- Write a function which returns another function. - Write a function which returns another function.
These abilities mean that we can do some neat things with functions in Python. These abilities mean that we can do some neat things with functions in Python.
* [Overview](#overview) * [Overview](#overview)
* [Decorators on methods](#decorators-on-methods) * [Decorators on methods](#decorators-on-methods)
* [Example - memoization](#example-memoization) * [Example - memoization](#example-memoization)
* [Decorators with arguments](#decorators-with-arguments) * [Decorators with arguments](#decorators-with-arguments)
* [Chaining decorators](#chaining-decorators) * [Chaining decorators](#chaining-decorators)
* [Decorator classes](#decorator-classes) * [Decorator classes](#decorator-classes)
* [Appendix: Functions are not special](#appendix-functions-are-not-special) * [Appendix: Functions are not special](#appendix-functions-are-not-special)
* [Appendix: Closures](#appendix-closures) * [Appendix: Closures](#appendix-closures)
* [Appendix: Decorators without arguments versus decorators with arguments](#appendix-decorators-without-arguments-versus-decorators-with-arguments) * [Appendix: Decorators without arguments versus decorators with arguments](#appendix-decorators-without-arguments-versus-decorators-with-arguments)
* [Appendix: Per-instance decorators](#appendix-per-instance-decorators) * [Appendix: Per-instance decorators](#appendix-per-instance-decorators)
* [Appendix: Preserving function metadata](#appendix-preserving-function-metadata) * [Appendix: Preserving function metadata](#appendix-preserving-function-metadata)
* [Appendix: Class decorators](#appendix-class-decorators) * [Appendix: Class decorators](#appendix-class-decorators)
* [Useful references](#useful-references) * [Useful references](#useful-references)
<a class="anchor" id="overview"></a> <a class="anchor" id="overview"></a>
## Overview ## Overview
Let's say that we want a way to calculate the execution time of any function Let's say that we want a way to calculate the execution time of any function
(this example might feel familiar to you if you have gone through the (this example might feel familiar to you if you have gone through the
practical on operator overloading). practical on operator overloading).
Our first attempt at writing such a function might look like this: Our first attempt at writing such a function might look like this:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import time import time
def timeFunc(func, *args, **kwargs): def timeFunc(func, *args, **kwargs):
start = time.time() start = time.time()
retval = func(*args, **kwargs) retval = func(*args, **kwargs)
end = time.time() end = time.time()
print('Ran {} in {:0.2f} seconds'.format(func.__name__, end - start)) print('Ran {} in {:0.2f} seconds'.format(func.__name__, end - start))
return retval return retval
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The `timeFunc` function accepts another function, `func`, as its first The `timeFunc` function accepts another function, `func`, as its first
argument. It calls `func`, passing it all of the other arguments, and then argument. It calls `func`, passing it all of the other arguments, and then
prints the time taken for `func` to complete: prints the time taken for `func` to complete:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import numpy as np import numpy as np
import numpy.linalg as npla import numpy.linalg as npla
def inverse(a): def inverse(a):
return npla.inv(a) return npla.inv(a)
data = np.random.random((2000, 2000)) data = np.random.random((2000, 2000))
invdata = timeFunc(inverse, data) invdata = timeFunc(inverse, data)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
But this means that whenever we want to time something, we have to call the
`timeFunc` function directly. Let's take advantage of the fact that we can
define a function inside another function. Look at the next block of code
carefully, and make sure you understand what our new `timeFunc` implementation
is doing.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import time import time
def timeFunc(func): def timeFunc(func):
def wrapperFunc(*args, **kwargs): def wrapperFunc(*args, **kwargs):
start = time.time() start = time.time()
retval = func(*args, **kwargs) retval = func(*args, **kwargs)
end = time.time() end = time.time()
print('Ran {} in {:0.2f} seconds'.format(func.__name__, end - start)) print('Ran {} in {:0.2f} seconds'.format(func.__name__, end - start))
return retval return retval
return wrapperFunc return wrapperFunc
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
This new `timeFunc` function is again passed a function `func`, but this time This new `timeFunc` function is again passed a function `func`, but this time
as its sole argument. It then creates and returns a new function, as its sole argument. It then creates and returns a new function,
`wrapperFunc`. This `wrapperFunc` function calls and times the function that `wrapperFunc`. This `wrapperFunc` function calls and times the function that
was passed to `timeFunc`. But note that when `timeFunc` is called, was passed to `timeFunc`. But note that when `timeFunc` is called,
`wrapperFunc` is _not_ called - it is only created and returned. `wrapperFunc` is *not* called - it is only created and returned.
Let's use our new `timeFunc` implementation: Let's use our new `timeFunc` implementation:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import numpy as np import numpy as np
import numpy.linalg as npla import numpy.linalg as npla
def inverse(a): def inverse(a):
return npla.inv(a) return npla.inv(a)
data = np.random.random((2000, 2000)) data = np.random.random((2000, 2000))
inverse = timeFunc(inverse) inverse = timeFunc(inverse)
invdata = inverse(data) invdata = inverse(data)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Here, we did the following: Here, we did the following:
1. We defined a function called `inverse`: 1. We defined a function called `inverse`:
> ``` > ```
> def inverse(a): > def inverse(a):
> return npla.inv(a) > return npla.inv(a)
> ``` > ```
2. We passed the `inverse` function to the `timeFunc` function, and 2. We passed the `inverse` function to the `timeFunc` function, and
re-assigned the return value of `timeFunc` back to `inverse`: re-assigned the return value of `timeFunc` back to `inverse`:
> ``` > ```
> inverse = timeFunc(inverse) > inverse = timeFunc(inverse)
> ``` > ```
3. We called the new `inverse` function: 3. We called the new `inverse` function:
> ``` > ```
> invdata = inverse(data) > invdata = inverse(data)
> ``` > ```
So now the `inverse` variable refers to an instantiation of `wrapperFunc`, So now the `inverse` variable refers to an instantiation of `wrapperFunc`,
which holds a reference to the original definition of `inverse`. which holds a reference to the original definition of `inverse`.
> If this is not clear, take a break now and read through the appendix on how > If this is not clear, take a break now and read through the appendix on how
> [functions are not special](#appendix-functions-are-not-special). > [functions are not special](#appendix-functions-are-not-special).
Guess what? We have just created a __decorator__. A decorator is simply a Guess what? We have just created a **decorator**. A decorator is simply a
function which accepts a function as its input, and returns another function function which accepts a function as its input, and returns another function
as its output. In the example above, we have _decorated_ the `inverse` as its output. In the example above, we have *decorated* the `inverse`
function with the `timeFunc` decorator. function with the `timeFunc` decorator.
Python provides an alternative syntax for decorating one function with Python provides an alternative syntax for decorating one function with
another, using the `@` character. The approach that we used to decorate another, using the `@` character. The approach that we used to decorate
`inverse` above: `inverse` above:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def inverse(a): def inverse(a):
return npla.inv(a) return npla.inv(a)
inverse = timeFunc(inverse) inverse = timeFunc(inverse)
invdata = inverse(data) invdata = inverse(data)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
is semantically equivalent to this: is semantically equivalent to this:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
@timeFunc @timeFunc
def inverse(a): def inverse(a):
return npla.inv(a) return npla.inv(a)
invdata = inverse(data) invdata = inverse(data)
``` ```
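%% Cell type:markdown id: tags:
To drive the point home, here is one more small decorator written from
scratch with the `@` syntax (the `announce` name is invented for this
example):
%% Cell type:code id: tags:
```
def announce(func):
    def wrapper(*args, **kwargs):
        print('Calling {}...'.format(func.__name__))
        retval = func(*args, **kwargs)
        print('... {} finished'.format(func.__name__))
        return retval
    return wrapper

@announce
def add(a, b):
    return a + b

print(add(1, 2))
```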
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<a class="anchor" id="decorators-on-methods"></a> <a class="anchor" id="decorators-on-methods"></a>
## Decorators on methods ## Decorators on methods
Applying a decorator to the methods of a class works in the same way: Applying a decorator to the methods of a class works in the same way:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import numpy.linalg as npla import numpy.linalg as npla
class MiscMaths(object): class MiscMaths(object):
@timeFunc @timeFunc
def inverse(self, a): def inverse(self, a):
return npla.inv(a) return npla.inv(a)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Now, the `inverse` method of all `MiscMaths` instances will be timed: Now, the `inverse` method of all `MiscMaths` instances will be timed:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
mm1 = MiscMaths() mm1 = MiscMaths()
mm2 = MiscMaths() mm2 = MiscMaths()
i1 = mm1.inverse(np.random.random((1000, 1000))) i1 = mm1.inverse(np.random.random((1000, 1000)))
i2 = mm2.inverse(np.random.random((1500, 1500))) i2 = mm2.inverse(np.random.random((1500, 1500)))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Note that only one `timeFunc` decorator was created here - the `timeFunc` Note that only one `timeFunc` decorator was created here - the `timeFunc`
function was only called once - when the `MiscMaths` class was defined. This function was only called once - when the `MiscMaths` class was defined. This
might be clearer if we re-write the above code in the following (equivalent) might be clearer if we re-write the above code in the following (equivalent)
manner: manner:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class MiscMaths(object): class MiscMaths(object):
def inverse(self, a): def inverse(self, a):
return npla.inv(a) return npla.inv(a)
MiscMaths.inverse = timeFunc(MiscMaths.inverse) MiscMaths.inverse = timeFunc(MiscMaths.inverse)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
So only one `wrapperFunc` function exists, and this function is _shared_ by So only one `wrapperFunc` function exists, and this function is *shared* by
all instances of the `MiscMaths` class - (such as the `mm1` and `mm2` all instances of the `MiscMaths` class - (such as the `mm1` and `mm2`
instances in the example above). In many cases this is not a problem, but instances in the example above). In many cases this is not a problem, but
there can be situations where you need each instance of your class to have its there can be situations where you need each instance of your class to have its
own unique decorator. own unique decorator.
> If you are interested in solutions to this problem, take a look at the > If you are interested in solutions to this problem, take a look at the
> appendix on [per-instance decorators](#appendix-per-instance-decorators). > appendix on [per-instance decorators](#appendix-per-instance-decorators).
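To make this sharing concrete, here is a small sketch (the `countCalls`
decorator and the `Counter` class are made up for illustration, and are not
part of the practical). The call counter lives inside the single `wrapper`
closure, so it is shared by every instance:

%% Cell type:code id: tags:

```
def countCalls(func):
    count = [0]
    def wrapper(*args, **kwargs):
        count[0] += 1
        print('{} called {} time(s) in total'.format(func.__name__, count[0]))
        return func(*args, **kwargs)
    return wrapper

class Counter(object):
    @countCalls
    def poke(self):
        pass

c1 = Counter()
c2 = Counter()

# The counter is shared across c1 and c2,
# because there is only one wrapper function
c1.poke()
c2.poke()
c2.poke()
```

%% Cell type:markdown id: tags: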
<a class="anchor" id="example-memoization"></a> <a class="anchor" id="example-memoization"></a>
## Example - memoization ## Example - memoization
Let's move onto another example. Let's move onto another example.
[Meowmoization](https://en.wikipedia.org/wiki/Memoization) is a common [Meowmoization](https://en.wikipedia.org/wiki/Memoization) is a common
performance optimisation technique used in cats. I mean software. Essentially, performance optimisation technique used in cats. I mean software. Essentially,
memoization refers to the process of maintaining a cache for a function which memoization refers to the process of maintaining a cache for a function which
performs some expensive calculation. When the function is executed with a set performs some expensive calculation. When the function is executed with a set
of inputs, the calculation is performed, and then a copy of the inputs and the of inputs, the calculation is performed, and then a copy of the inputs and the
result are cached. If the function is called again with the same inputs, the result are cached. If the function is called again with the same inputs, the
cached result can be returned. cached result can be returned.
This is a perfect problem to tackle with decorators: This is a perfect problem to tackle with decorators:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def memoize(func): def memoize(func):
cache = {} cache = {}
def wrapper(*args): def wrapper(*args):
# is there a value in the cache # is there a value in the cache
# for this set of inputs? # for this set of inputs?
cached = cache.get(args, None) cached = cache.get(args, None)
# If not, call the function, # If not, call the function,
# and cache the result. # and cache the result.
if cached is None: if cached is None:
cached = func(*args) cached = func(*args)
cache[args] = cached cache[args] = cached
else: else:
print('Cached {}({}): {}'.format(func.__name__, args, cached)) print('Cached {}({}): {}'.format(func.__name__, args, cached))
return cached return cached
return wrapper return wrapper
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
We can now use our `memoize` decorator to add a memoization cache to any We can now use our `memoize` decorator to add a memoization cache to any
function. Let's memoize a function which generates the $n^{th}$ number in the function. Let's memoize a function which generates the $n^{th}$ number in the
[Fibonacci series](https://en.wikipedia.org/wiki/Fibonacci_number): [Fibonacci series](https://en.wikipedia.org/wiki/Fibonacci_number):
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
@memoize @memoize
def fib(n): def fib(n):
if n in (0, 1): if n in (0, 1):
print('fib({}) = {}'.format(n, n)) print('fib({}) = {}'.format(n, n))
return n return n
twoback = 1 twoback = 1
oneback = 1 oneback = 1
val = 1 val = 1
for _ in range(2, n): for _ in range(2, n):
val = oneback + twoback val = oneback + twoback
twoback = oneback twoback = oneback
oneback = val oneback = val
print('fib({}) = {}'.format(n, val)) print('fib({}) = {}'.format(n, val))
return val return val
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
For a given input, when `fib` is called the first time, it will calculate the For a given input, when `fib` is called the first time, it will calculate the
$n^{th}$ Fibonacci number: $n^{th}$ Fibonacci number:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
for i in range(10): for i in range(10):
fib(i) fib(i)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
However, on repeated calls with the same input, the calculation is skipped, However, on repeated calls with the same input, the calculation is skipped,
and instead the result is retrieved from the memoization cache: and instead the result is retrieved from the memoization cache:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
for i in range(10): for i in range(10):
fib(i) fib(i)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> If you are wondering how the `wrapper` function is able to access the > If you are wondering how the `wrapper` function is able to access the
> `cache` variable, refer to the [appendix on closures](#appendix-closures). > `cache` variable, refer to the [appendix on closures](#appendix-closures).
<a class="anchor" id="decorators-with-arguments"></a> <a class="anchor" id="decorators-with-arguments"></a>
## Decorators with arguments ## Decorators with arguments
Continuing with our memoization example, let's say that we want to place a Continuing with our memoization example, let's say that we want to place a
limit on the maximum size that our cache can grow to. For example, the output limit on the maximum size that our cache can grow to. For example, the output
of our function might have large memory requirements, so we can only afford to of our function might have large memory requirements, so we can only afford to
store a handful of pre-calculated results. It would be nice to be able to store a handful of pre-calculated results. It would be nice to be able to
specify the maximum cache size when we define our function to be memoized, specify the maximum cache size when we define our function to be memoized,
like so: like so:
> ``` > ```
> # cache at most 10 results > # cache at most 10 results
> @limitedMemoize(10)
> def fib(n): > def fib(n):
> ... > ...
> ``` > ```
In order to support this, our `memoize` decorator function needs to be In order to support this, our `memoize` decorator function needs to be
modified - it is currently written to accept a function as its sole argument, modified - it is currently written to accept a function as its sole argument,
but we need it to accept a cache size limit. but we need it to accept a cache size limit.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
from collections import OrderedDict from collections import OrderedDict
def limitedMemoize(maxSize): def limitedMemoize(maxSize):
cache = OrderedDict() cache = OrderedDict()
def decorator(func): def decorator(func):
def wrapper(*args): def wrapper(*args):
# is there a value in the cache # is there a value in the cache
# for this set of inputs? # for this set of inputs?
cached = cache.get(args, None) cached = cache.get(args, None)
# If not, call the function, # If not, call the function,
# and cache the result. # and cache the result.
if cached is None: if cached is None:
cached = func(*args) cached = func(*args)
# If the cache has grown too big, # If the cache has grown too big,
# remove the oldest item. In practice # remove the oldest item. In practice
# it would make more sense to remove # it would make more sense to remove
# the item with the oldest access # the item with the oldest access
# time (or remove the least recently # time (or remove the least recently
# used item, as the built-in # used item, as the built-in
# @functools.lru_cache does), but this # @functools.lru_cache does), but this
# is good enough for now! # is good enough for now!
if len(cache) >= maxSize: if len(cache) >= maxSize:
cache.popitem(last=False) cache.popitem(last=False)
cache[args] = cached cache[args] = cached
else: else:
print('Cached {}({}): {}'.format(func.__name__, args, cached)) print('Cached {}({}): {}'.format(func.__name__, args, cached))
return cached return cached
return wrapper return wrapper
return decorator return decorator
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> We used the handy > We used the handy
> [`collections.OrderedDict`](https://docs.python.org/3.5/library/collections.html#collections.OrderedDict) > [`collections.OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict)
> class here which preserves the insertion order of key-value pairs. > class here which preserves the insertion order of key-value pairs.
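As a quick aside (this snippet is not part of the practical), the
`popitem(last=False)` call used above removes and returns the oldest,
i.e. first-inserted, key-value pair:

%% Cell type:code id: tags:

```
from collections import OrderedDict

d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3

# popitem(last=False) removes the first-inserted item
print(d.popitem(last=False))
print(list(d.items()))
```

%% Cell type:markdown id: tags: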
This is starting to look a little complicated - we now have _three_ layers of This is starting to look a little complicated - we now have *three* layers of
functions. This is necessary when you wish to write a decorator which accepts functions. This is necessary when you wish to write a decorator which accepts
arguments (refer to the arguments (refer to the
[appendix](#appendix-decorators-without-arguments-versus-decorators-with-arguments) [appendix](#appendix-decorators-without-arguments-versus-decorators-with-arguments)
for more details). for more details).
But this `limitedMemoize` decorator is used in essentially the same way as our But this `limitedMemoize` decorator is used in essentially the same way as our
earlier `memoize` decorator: earlier `memoize` decorator:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
@limitedMemoize(5) @limitedMemoize(5)
def fib(n): def fib(n):
if n in (0, 1): if n in (0, 1):
        print('fib({}) = {}'.format(n, n))
return n return n
twoback = 1 twoback = 1
oneback = 1 oneback = 1
val = 1 val = 1
for _ in range(2, n): for _ in range(2, n):
val = oneback + twoback val = oneback + twoback
twoback = oneback twoback = oneback
oneback = val oneback = val
print('fib({}) = {}'.format(n, val)) print('fib({}) = {}'.format(n, val))
return val return val
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Except that now, the `fib` function will only cache up to 5 values. Except that now, the `fib` function will only cache up to 5 values.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
fib(10) fib(10)
fib(11) fib(11)
fib(12) fib(12)
fib(13) fib(13)
fib(14) fib(14)
print('The result for 10 should come from the cache') print('The result for 10 should come from the cache')
fib(10) fib(10)
fib(15) fib(15)
print('The result for 10 should no longer be cached') print('The result for 10 should no longer be cached')
fib(10) fib(10)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<a class="anchor" id="chaining-decorators"></a> <a class="anchor" id="chaining-decorators"></a>
## Chaining decorators ## Chaining decorators
Decorators can easily be chained, or nested: Decorators can easily be chained, or nested:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import time import time
@timeFunc @timeFunc
@memoize @memoize
def expensiveFunc(n): def expensiveFunc(n):
time.sleep(n) time.sleep(n)
return n return n
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> Remember that this is semantically equivalent to the following: > Remember that this is semantically equivalent to the following:
> >
> ``` > ```
> def expensiveFunc(n): > def expensiveFunc(n):
> time.sleep(n) > time.sleep(n)
> return n > return n
> >
> expensiveFunc = timeFunc(memoize(expensiveFunc)) > expensiveFunc = timeFunc(memoize(expensiveFunc))
> ``` > ```
Now we can see the effect of our memoization layer on performance: Now we can see the effect of our memoization layer on performance:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
expensiveFunc(0.5) expensiveFunc(0.5)
expensiveFunc(1) expensiveFunc(1)
expensiveFunc(1) expensiveFunc(1)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> Note that in Python 3.2 and newer you can use the > Note that in Python 3.2 and newer you can use the
> [`functools.lru_cache`](https://docs.python.org/3/library/functools.html#functools.lru_cache) > [`functools.lru_cache`](https://docs.python.org/3/library/functools.html#functools.lru_cache)
> to memoize your functions. > to memoize your functions.
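For comparison, here is a rough equivalent of our `limitedMemoize(5)` example
written with the built-in `functools.lru_cache` (note that `lru_cache` evicts
the *least recently used* entry rather than the oldest one, and that `fib2` is
just a made-up name to avoid clobbering the `fib` defined above):

%% Cell type:code id: tags:

```
import functools

@functools.lru_cache(maxsize=5)
def fib2(n):
    if n < 2:
        return n
    return fib2(n - 1) + fib2(n - 2)

print(fib2(10))

# cache_info() reports hits, misses, maxsize and currsize
print(fib2.cache_info())
```

%% Cell type:markdown id: tags: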
<a class="anchor" id="decorator-classes"></a> <a class="anchor" id="decorator-classes"></a>
## Decorator classes ## Decorator classes
By now, you will have gained the impression that a decorator is a function By now, you will have gained the impression that a decorator is a function
which _decorates_ another function. But if you went through the practical on which *decorates* another function. But if you went through the practical on
operator overloading, you might remember the special `__call__` method, which
allows an object to be called as if it were a function.
This feature allows us to write our decorators as classes, instead of This feature allows us to write our decorators as classes, instead of
functions. This can be handy if you are writing a decorator that has functions. This can be handy if you are writing a decorator that has
complicated behaviour, and/or needs to maintain some sort of state which complicated behaviour, and/or needs to maintain some sort of state which
cannot be easily or elegantly written using nested functions. cannot be easily or elegantly written using nested functions.
As an example, let's say we are writing a framework for unit testing. We want As an example, let's say we are writing a framework for unit testing. We want
to be able to "mark" our test functions like so, so they can be easily to be able to "mark" our test functions like so, so they can be easily
identified and executed: identified and executed:
> ``` > ```
> @unitTest > @unitTest
> def testblerk(): > def testblerk():
> """tests the blerk algorithm.""" > """tests the blerk algorithm."""
> ... > ...
> ``` > ```
With a decorator like this, we wouldn't need to worry about where our tests With a decorator like this, we wouldn't need to worry about where our tests
are located - they will all be detected because we have marked them as test are located - they will all be detected because we have marked them as test
functions. What does this `unitTest` decorator look like? functions. What does this `unitTest` decorator look like?
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class TestRegistry(object): class TestRegistry(object):
def __init__(self): def __init__(self):
self.testFuncs = [] self.testFuncs = []
def __call__(self, func): def __call__(self, func):
        self.testFuncs.append(func)
        # Return the function, so that the decorated
        # function can still be called directly
        return func
def listTests(self): def listTests(self):
print('All registered tests:') print('All registered tests:')
for test in self.testFuncs: for test in self.testFuncs:
print(' ', test.__name__) print(' ', test.__name__)
def runTests(self): def runTests(self):
for test in self.testFuncs: for test in self.testFuncs:
print('Running test {:10s} ... '.format(test.__name__), end='') print('Running test {:10s} ... '.format(test.__name__), end='')
try: try:
test() test()
print('passed!') print('passed!')
except Exception as e: except Exception as e:
print('failed!') print('failed!')
# Create our test registry # Create our test registry
registry = TestRegistry() registry = TestRegistry()
# Alias our registry to "unitTest" # Alias our registry to "unitTest"
# so that we can register tests # so that we can register tests
# with a "@unitTest" decorator. # with a "@unitTest" decorator.
unitTest = registry unitTest = registry
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
So we've defined a class, `TestRegistry`, and created an instance of it, So we've defined a class, `TestRegistry`, and created an instance of it,
`registry`, which will manage all of our unit tests. Now, in order to "mark" `registry`, which will manage all of our unit tests. Now, in order to "mark"
any function as being a unit test, we just need to use the `unitTest` any function as being a unit test, we just need to use the `unitTest`
decorator (which is simply a reference to our `TestRegistry` instance): decorator (which is simply a reference to our `TestRegistry` instance):
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
@unitTest @unitTest
def testFoo(): def testFoo():
assert 'a' in 'bcde' assert 'a' in 'bcde'
@unitTest @unitTest
def testBar(): def testBar():
assert 1 > 0 assert 1 > 0
@unitTest @unitTest
def testBlerk(): def testBlerk():
assert 9 % 2 == 0 assert 9 % 2 == 0
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Now that these functions have been registered with our `TestRegistry` Now that these functions have been registered with our `TestRegistry`
instance, we can run them all: instance, we can run them all:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
registry.listTests() registry.listTests()
registry.runTests() registry.runTests()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> Unit testing is something which you must do! This is __especially__ > Unit testing is something which you must do! This is **especially**
> important in an interpreted language such as Python, where there is no > important in an interpreted language such as Python, where there is no
> compiler to catch all of your mistakes. > compiler to catch all of your mistakes.
> >
> Python has a built-in > Python has a built-in
> [`unittest`](https://docs.python.org/3.5/library/unittest.html) module, > [`unittest`](https://docs.python.org/3/library/unittest.html) module,
> however the third-party [`pytest`](https://docs.pytest.org/en/latest/) and > however the third-party [`pytest`](https://docs.pytest.org/en/latest/) and
> [`nose2`](http://nose2.readthedocs.io/en/latest/) are popular. It is also
> wise to combine your unit tests with > wise to combine your unit tests with
> [`coverage`](https://coverage.readthedocs.io/en/coverage-4.5.1/), which > [`coverage`](https://coverage.readthedocs.io/en/coverage-4.5.1/), which
> tells you how much of your code was executed, or _covered_ when your > tells you how much of your code was executed, or *covered* when your
> tests were run. > tests were run.
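As another sketch of a stateful decorator class (again invented for
illustration, rather than taken from the practical), each decorated function
gets its own instance, which can accumulate state across calls:

%% Cell type:code id: tags:

```
import time

class Timed(object):
    """Accumulates the total time spent in the decorated function."""
    def __init__(self, func):
        self.func  = func
        self.total = 0.0
    def __call__(self, *args, **kwargs):
        start       = time.time()
        result      = self.func(*args, **kwargs)
        self.total += time.time() - start
        return result

@Timed
def snooze(secs):
    time.sleep(secs)

snooze(0.1)
snooze(0.2)
print('Total time spent snoozing: {:.2f} seconds'.format(snooze.total))
```

%% Cell type:markdown id: tags: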
<a class="anchor" id="appendix-functions-are-not-special"></a> <a class="anchor" id="appendix-functions-are-not-special"></a>
## Appendix: Functions are not special ## Appendix: Functions are not special
When we write a statement like this: When we write a statement like this:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
a = [1, 2, 3] a = [1, 2, 3]
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
the variable `a` is a reference to a `list`. We can create a new reference to the variable `a` is a reference to a `list`. We can create a new reference to
the same list, and delete `a`: the same list, and delete `a`:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
b = a b = a
del a del a
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Deleting `a` doesn't affect the list at all - the list still exists, and is Deleting `a` doesn't affect the list at all - the list still exists, and is
now referred to by a variable called `b`. now referred to by a variable called `b`.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
print('b: ', b) print('b: ', b)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
`a` has, however, been deleted: `a` has, however, been deleted:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
print('a: ', a) print('a: ', a)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The variables `a` and `b` are just references to a list that is sitting in The variables `a` and `b` are just references to a list that is sitting in
memory somewhere - renaming or removing a reference does not have any effect memory somewhere - renaming or removing a reference does not have any effect
upon the list<sup>2</sup>. upon the list<sup>2</sup>.
If you are familiar with C or C++, you can think of a variable in Python as If you are familiar with C or C++, you can think of a variable in Python as
like a `void *` pointer - it is just a pointer of an unspecified type, which like a `void *` pointer - it is just a pointer of an unspecified type, which
is pointing to some item in memory (which does have a specific type). Deleting is pointing to some item in memory (which does have a specific type). Deleting
the pointer does not have any effect upon the item to which it was pointing. the pointer does not have any effect upon the item to which it was pointing.
> <sup>2</sup> Until no more references to the list exist, at which point it > <sup>2</sup> Until no more references to the list exist, at which point it
> will be > will be
> [garbage-collected](https://www.quora.com/How-does-garbage-collection-in-Python-work-What-are-the-pros-and-cons). > [garbage-collected](https://www.quora.com/How-does-garbage-collection-in-Python-work-What-are-the-pros-and-cons).
Now, functions in Python work in *exactly* the same way as variables. When we
define a function like this: define a function like this:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def inverse(a): def inverse(a):
return npla.inv(a) return npla.inv(a)
print(inverse) print(inverse)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
there is nothing special about the name `inverse` - `inverse` is just a there is nothing special about the name `inverse` - `inverse` is just a
reference to a function that resides somewhere in memory. We can create a new reference to a function that resides somewhere in memory. We can create a new
reference to this function: reference to this function:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
inv2 = inverse inv2 = inverse
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
And delete the old reference: And delete the old reference:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
del inverse del inverse
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
But the function still exists, and is still callable, via our second But the function still exists, and is still callable, via our second
reference: reference:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
print(inv2) print(inv2)
data = np.random.random((10, 10)) data = np.random.random((10, 10))
invdata = inv2(data) invdata = inv2(data)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
So there is nothing special about functions in Python - they are just items So there is nothing special about functions in Python - they are just items
that reside somewhere in memory, and to which we can create as many references that reside somewhere in memory, and to which we can create as many references
as we like. as we like.
> If it bothers you that `print(inv2)` resulted in > If it bothers you that `print(inv2)` resulted in
> `<function inverse at ...>`, and not `<function inv2 at ...>`, then refer to > `<function inverse at ...>`, and not `<function inv2 at ...>`, then refer to
> the appendix on > the appendix on
> [preserving function metdata](#appendix-preserving-function-metadata). > [preserving function metadata](#appendix-preserving-function-metadata).
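Because functions are ordinary objects, they can also be stored in containers
and passed around like any other value - a small sketch (not from the
practical, and the `operations` dictionary is invented for this example):

%% Cell type:code id: tags:

```
import numpy        as np
import numpy.linalg as npla

# Functions are just objects, so they can be
# stored in data structures and looked up later
operations = {
    'inverse'   : npla.inv,
    'transpose' : np.transpose,
}

data = np.random.random((3, 3))
for name, op in operations.items():
    print(name)
    print(op(data))
```

%% Cell type:markdown id: tags: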
<a class="anchor" id="appendix-closures"></a> <a class="anchor" id="appendix-closures"></a>
## Appendix: Closures ## Appendix: Closures
Whenever we define or use a decorator, we are taking advantage of a concept Whenever we define or use a decorator, we are taking advantage of a concept
called a [_closure_][wiki-closure]. Take a second to re-familiarise yourself called a [*closure*][wiki-closure]. Take a second to re-familiarise yourself
with our `memoize` decorator function from earlier - when `memoize` is called, with our `memoize` decorator function from earlier - when `memoize` is called,
it creates and returns a function called `wrapper`: it creates and returns a function called `wrapper`:
[wiki-closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming) [wiki-closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming)
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def memoize(func): def memoize(func):
cache = {} cache = {}
def wrapper(*args): def wrapper(*args):
# is there a value in the cache # is there a value in the cache
# for this set of inputs? # for this set of inputs?
cached = cache.get(args, None) cached = cache.get(args, None)
# If not, call the function, # If not, call the function,
# and cache the result. # and cache the result.
if cached is None: if cached is None:
cached = func(*args) cached = func(*args)
cache[args] = cached cache[args] = cached
else: else:
print('Cached {}({}): {}'.format(func.__name__, args, cached)) print('Cached {}({}): {}'.format(func.__name__, args, cached))
return cached return cached
return wrapper return wrapper
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Then `wrapper` is executed at some arbitrary point in the future. But how does Then `wrapper` is executed at some arbitrary point in the future. But how does
it have access to `cache`, defined within the scope of the `memoize` function, it have access to `cache`, defined within the scope of the `memoize` function,
after the execution of `memoize` has ended? after the execution of `memoize` has ended?
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def nby2(n): def nby2(n):
return n * 2 return n * 2
# wrapper function is created here (and # wrapper function is created here (and
# assigned back to the nby2 reference) # assigned back to the nby2 reference)
nby2 = memoize(nby2) nby2 = memoize(nby2)
# wrapper function is executed here # wrapper function is executed here
print('nby2(2): ', nby2(2)) print('nby2(2): ', nby2(2))
print('nby2(2): ', nby2(2)) print('nby2(2): ', nby2(2))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The trick is that whenever a nested function is defined in Python, the scope The trick is that whenever a nested function is defined in Python, the scope
in which it is defined is preserved for that function's lifetime. So `wrapper` in which it is defined is preserved for that function's lifetime. So `wrapper`
has access to all of the variables within the `memoize` function's scope, that has access to all of the variables within the `memoize` function's scope, that
were defined at the time that `wrapper` was created (which was when we called were defined at the time that `wrapper` was created (which was when we called
`memoize`). This is why `wrapper` is able to access `cache`, even though at `memoize`). This is why `wrapper` is able to access `cache`, even though at
the time that `wrapper` is called, the execution of `memoize` has long since the time that `wrapper` is called, the execution of `memoize` has long since
finished. finished.
This is what is known as a This is what is known as a
[_closure_](https://www.geeksforgeeks.org/python-closures/). Closures are a [*closure*](https://www.geeksforgeeks.org/python-closures/). Closures are a
fundamental, and extremely powerful, aspect of Python and other high level fundamental, and extremely powerful, aspect of Python and other high level
languages. So there's your answer, languages. So there's your answer,
[fishbulb](https://www.youtube.com/watch?v=CiAaEPcnlOg). [fishbulb](https://www.youtube.com/watch?v=CiAaEPcnlOg).
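Closures are not specific to decorators. Here is a minimal sketch showing a
closure in isolation (the `makeAdder` function is invented for this example):

%% Cell type:code id: tags:

```
def makeAdder(n):
    # n is captured by the closure of
    # the returned adder function
    def adder(x):
        return x + n
    return adder

add5  = makeAdder(5)
add10 = makeAdder(10)

# Each closure remembers its own value of n
print(add5(2))
print(add10(2))
```

%% Cell type:markdown id: tags: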
<a class="anchor" id="appendix-decorators-without-arguments-versus-decorators-with-arguments"></a> <a class="anchor" id="appendix-decorators-without-arguments-versus-decorators-with-arguments"></a>
## Appendix: Decorators without arguments versus decorators with arguments ## Appendix: Decorators without arguments versus decorators with arguments
There are three ways to invoke a decorator with the `@` notation: There are three ways to invoke a decorator with the `@` notation:
1. Naming it, e.g. `@mydecorator` 1. Naming it, e.g. `@mydecorator`
2. Calling it, e.g. `@mydecorator()` 2. Calling it, e.g. `@mydecorator()`
3. Calling it, and passing it arguments, e.g. `@mydecorator(1, 2, 3)` 3. Calling it, and passing it arguments, e.g. `@mydecorator(1, 2, 3)`
Python expects a decorator function to behave differently in the second and Python expects a decorator function to behave differently in the second and
third scenarios, when compared to the first: third scenarios, when compared to the first:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def decorator(*args): def decorator(*args):
print(' decorator({})'.format(args)) print(' decorator({})'.format(args))
def wrapper(*args): def wrapper(*args):
print(' wrapper({})'.format(args)) print(' wrapper({})'.format(args))
return wrapper return wrapper
print('Scenario #1: @decorator') print('Scenario #1: @decorator')
@decorator @decorator
def noop(): def noop():
pass pass
print('\nScenario #2: @decorator()') print('\nScenario #2: @decorator()')
@decorator() @decorator()
def noop(): def noop():
pass pass
print('\nScenario #3: @decorator(1, 2, 3)') print('\nScenario #3: @decorator(1, 2, 3)')
@decorator(1, 2, 3) @decorator(1, 2, 3)
def noop(): def noop():
pass pass
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
So if a decorator is "named" (scenario 1), only the decorator function So if a decorator is "named" (scenario 1), only the decorator function
(`decorator` in the example above) is called, and is passed the decorated (`decorator` in the example above) is called, and is passed the decorated
function. function.
But if a decorator function is "called" (scenarios 2 or 3), both the decorator But if a decorator function is "called" (scenarios 2 or 3), both the decorator
function (`decorator`), __and its return value__ (`wrapper`) are called - the function (`decorator`), **and its return value** (`wrapper`) are called - the
decorator function is passed the arguments that were provided, and its return decorator function is passed the arguments that were provided, and its return
value is passed the decorated function. value is passed the decorated function.
This is why, if you are writing a decorator function which expects arguments, This is why, if you are writing a decorator function which expects arguments,
you must use three layers of functions, like so: you must use three layers of functions, like so:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def decorator(*args): def decorator(*args):
def realDecorator(func): def realDecorator(func):
def wrapper(*args, **kwargs): def wrapper(*args, **kwargs):
return func(*args, **kwargs) return func(*args, **kwargs)
return wrapper return wrapper
return realDecorator return realDecorator
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> The author of this practical is angry about this, as he does not understand > The author of this practical is angry about this, as he does not understand
> why the Python language designers couldn't allow a decorator function to be > why the Python language designers couldn't allow a decorator function to be
> passed both the decorated function, and any arguments that were passed when > passed both the decorated function, and any arguments that were passed when
> the decorator was invoked, like so: > the decorator was invoked, like so:
> >
> ``` > ```
> def decorator(func, *args, **kwargs): # args/kwargs here contain > def decorator(func, *args, **kwargs): # args/kwargs here contain
> # whatever is passed to the > # whatever is passed to the
> # decorator > # decorator
> >
> def wrapper(*args, **kwargs): # args/kwargs here contain > def wrapper(*args, **kwargs): # args/kwargs here contain
> # whatever is passed to the > # whatever is passed to the
> # decorated function > # decorated function
> return func(*args, **kwargs) > return func(*args, **kwargs)
> >
> return wrapper > return wrapper
> ``` > ```
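If you really want a single decorator that copes with all three scenarios, one
common (if slightly hacky) trick is to check whether the decorator was handed
a function directly. This `flexible` decorator is just a sketch, not from the
practical, and it breaks down if the sole decorator argument is itself a
callable:

%% Cell type:code id: tags:

```
def flexible(*dargs):
    # Scenario 1: used as @flexible - we have been
    # passed the decorated function directly
    if len(dargs) == 1 and callable(dargs[0]):
        func = dargs[0]
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        return wrapper

    # Scenarios 2 and 3: used as @flexible() or
    # @flexible(1, 2, 3) - we must return the real
    # decorator, which will be passed the function
    def decorator(func):
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

%% Cell type:markdown id: tags: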
<a class="anchor" id="appendix-per-instance-decorators"></a> <a class="anchor" id="appendix-per-instance-decorators"></a>
## Appendix: Per-instance decorators ## Appendix: Per-instance decorators
In the section on [decorating methods](#decorators-on-methods), you saw In the section on [decorating methods](#decorators-on-methods), you saw
that when a decorator is applied to a method of a class, that decorator that when a decorator is applied to a method of a class, that decorator
is invoked just once, and shared by all instances of the class. Consider this is invoked just once, and shared by all instances of the class. Consider this
example: example:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def decorator(func): def decorator(func):
print('Decorating {} function'.format(func.__name__)) print('Decorating {} function'.format(func.__name__))
def wrapper(*args, **kwargs): def wrapper(*args, **kwargs):
print('Calling decorated function {}'.format(func.__name__)) print('Calling decorated function {}'.format(func.__name__))
return func(*args, **kwargs) return func(*args, **kwargs)
return wrapper return wrapper
class MiscMaths(object): class MiscMaths(object):
@decorator @decorator
def add(self, a, b): def add(self, a, b):
return a + b return a + b
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Note that `decorator` was called at the time that the `MiscMaths` class was Note that `decorator` was called at the time that the `MiscMaths` class was
defined. Now, all `MiscMaths` instances share the same `wrapper` function: defined. Now, all `MiscMaths` instances share the same `wrapper` function:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
mm1 = MiscMaths() mm1 = MiscMaths()
mm2 = MiscMaths() mm2 = MiscMaths()
print('1 + 2 =', mm1.add(1, 2)) print('1 + 2 =', mm1.add(1, 2))
print('3 + 4 =', mm2.add(3, 4)) print('3 + 4 =', mm2.add(3, 4))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
This is not an issue in many cases, but it can be problematic in some. Imagine This is not an issue in many cases, but it can be problematic in some. Imagine
if we have a decorator called `ensureNumeric`, which makes sure that arguments if we have a decorator called `ensureNumeric`, which makes sure that arguments
passed to a function are numbers: passed to a function are numbers:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def ensureNumeric(func): def ensureNumeric(func):
def wrapper(*args): def wrapper(*args):
args = tuple([float(a) for a in args]) args = tuple([float(a) for a in args])
return func(*args) return func(*args)
return wrapper return wrapper
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
This all looks well and good - we can use it to decorate a numeric function, This all looks well and good - we can use it to decorate a numeric function,
allowing strings to be passed in as well: allowing strings to be passed in as well:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
@ensureNumeric @ensureNumeric
def mul(a, b): def mul(a, b):
return a * b return a * b
print(mul( 2, 3)) print(mul( 2, 3))
print(mul('5', '10')) print(mul('5', '10'))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
But what will happen when we try to decorate a method of a class? But what will happen when we try to decorate a method of a class?
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class MiscMaths(object): class MiscMaths(object):
@ensureNumeric @ensureNumeric
def add(self, a, b): def add(self, a, b):
return a + b return a + b
mm = MiscMaths() mm = MiscMaths()
print(mm.add('5', 10)) print(mm.add('5', 10))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
What happened here?? Remember that the first argument passed to any instance What happened here?? Remember that the first argument passed to any instance
method is the instance itself (the `self` argument). Well, the `MiscMaths` method is the instance itself (the `self` argument). Well, the `MiscMaths`
instance was passed to the `wrapper` function, which then tried to convert it instance was passed to the `wrapper` function, which then tried to convert it
into a `float`. So we can't actually apply the `ensureNumeric` function as a into a `float`. So we can't actually apply the `ensureNumeric` function as a
decorator on a method in this way. decorator on a method in this way.
There are a few potential solutions here. We could modify the `ensureNumeric` There are a few potential solutions here. We could modify the `ensureNumeric`
function, so that the `wrapper` ignores the first argument. But this would function, so that the `wrapper` ignores the first argument. But this would
mean that we couldn't use `ensureNumeric` with standalone functions. mean that we couldn't use `ensureNumeric` with standalone functions.
But we _can_ manually apply the `ensureNumeric` decorator to `MiscMaths` But we *can* manually apply the `ensureNumeric` decorator to `MiscMaths`
instances when they are initialised. We can't use the nice `@ensureNumeric` instances when they are initialised. We can't use the nice `@ensureNumeric`
syntax to apply our decorators, but this is a viable approach: syntax to apply our decorators, but this is a viable approach:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class MiscMaths(object): class MiscMaths(object):
def __init__(self): def __init__(self):
self.add = ensureNumeric(self.add) self.add = ensureNumeric(self.add)
def add(self, a, b): def add(self, a, b):
return a + b return a + b
mm = MiscMaths() mm = MiscMaths()
print(mm.add('5', 10)) print(mm.add('5', 10))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Another approach is to use a second decorator, which dynamically creates the Another approach is to use a second decorator, which dynamically creates the
real decorator when it is accessed on an instance. This requires the use of an real decorator when it is accessed on an instance. This requires the use of an
advanced Python technique called advanced Python technique called
[_descriptors_](https://docs.python.org/3.5/howto/descriptor.html), which is [*descriptors*](https://docs.python.org/3/howto/descriptor.html), which is
beyond the scope of this practical. But if you are interested, you can see an beyond the scope of this practical. But if you are interested, you can see an
implementation of this approach implementation of this approach
[here](https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/1.6.8/fsl/utils/memoize.py#L249). [here](https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/1.6.8/fsl/utils/memoize.py#L249).
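To give a rough flavour of that approach (this is only a sketch - it ignores
details such as garbage collection of the per-instance wrappers, and the
`perInstance` name is invented), a descriptor-based decorator might look
something like this:

%% Cell type:code id: tags:

```
import functools

class perInstance(object):
    def __init__(self, func):
        self.func     = func
        self.wrappers = {}

    def __get__(self, instance, owner):
        # Accessed on the class rather than
        # an instance - return the raw function
        if instance is None:
            return self.func

        # Build (and cache) a separate wrapper
        # for each instance
        if id(instance) not in self.wrappers:
            @functools.wraps(self.func)
            def wrapper(*args):
                args = tuple([float(a) for a in args])
                return self.func(instance, *args)
            self.wrappers[id(instance)] = wrapper

        return self.wrappers[id(instance)]

class MiscMaths(object):
    @perInstance
    def add(self, a, b):
        return a + b

mm = MiscMaths()
print(mm.add('5', 10))
```

%% Cell type:markdown id: tags: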
<a class="anchor" id="appendix-preserving-function-metadata"></a> <a class="anchor" id="appendix-preserving-function-metadata"></a>
## Appendix: Preserving function metadata ## Appendix: Preserving function metadata
You may have noticed that when we decorate a function, some of its properties You may have noticed that when we decorate a function, some of its properties
are lost. Consider this function: are lost. Consider this function:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def add2(a, b): def add2(a, b):
"""Adds two numbers together.""" """Adds two numbers together."""
return a + b return a + b
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The `add2` function is an object which has some attributes, e.g.: The `add2` function is an object which has some attributes, e.g.:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
print('Name: ', add2.__name__) print('Name: ', add2.__name__)
print('Help: ', add2.__doc__) print('Help: ', add2.__doc__)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
However, when we apply a decorator to `add2`: However, when we apply a decorator to `add2`:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def decorator(func): def decorator(func):
def wrapper(*args, **kwargs): def wrapper(*args, **kwargs):
"""Internal wrapper function for decorator.""" """Internal wrapper function for decorator."""
print('Calling decorated function {}'.format(func.__name__)) print('Calling decorated function {}'.format(func.__name__))
return func(*args, **kwargs) return func(*args, **kwargs)
return wrapper return wrapper
@decorator @decorator
def add2(a, b): def add2(a, b):
"""Adds two numbers together.""" """Adds two numbers together."""
return a + b return a + b
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Those attributes are lost, and instead we get the attributes of the `wrapper` Those attributes are lost, and instead we get the attributes of the `wrapper`
function: function:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
print('Name: ', add2.__name__) print('Name: ', add2.__name__)
print('Help: ', add2.__doc__) print('Help: ', add2.__doc__)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
While this may be inconsequential in most situations, it can be quite annoying While this may be inconsequential in most situations, it can be quite annoying
in some, such as when we are automatically [generating in some, such as when we are automatically [generating
documentation](http://www.sphinx-doc.org/) for our code. documentation](http://www.sphinx-doc.org/) for our code.
Fortunately, there is a workaround, available in the built-in Fortunately, there is a workaround, available in the built-in
[`functools`](https://docs.python.org/3.5/library/functools.html#functools.wraps) [`functools`](https://docs.python.org/3/library/functools.html#functools.wraps)
module: module:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import functools import functools
def decorator(func): def decorator(func):
@functools.wraps(func) @functools.wraps(func)
def wrapper(*args, **kwargs): def wrapper(*args, **kwargs):
"""Internal wrapper function for decorator.""" """Internal wrapper function for decorator."""
print('Calling decorated function {}'.format(func.__name__)) print('Calling decorated function {}'.format(func.__name__))
return func(*args, **kwargs) return func(*args, **kwargs)
return wrapper return wrapper
@decorator @decorator
def add2(a, b): def add2(a, b):
"""Adds two numbers together.""" """Adds two numbers together."""
return a + b return a + b
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
We have applied the `@functools.wraps` decorator to our internal `wrapper` We have applied the `@functools.wraps` decorator to our internal `wrapper`
function - this will essentially replace the `wrapper` function metadata with
the metadata from our `func` function. So the `add2` name and documentation are
now preserved: now preserved:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
print('Name: ', add2.__name__) print('Name: ', add2.__name__)
print('Help: ', add2.__doc__) print('Help: ', add2.__doc__)
``` ```
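%% Cell type:markdown id: tags:

As a small bonus, `functools.wraps` also stores a reference to the original,
undecorated function on a `__wrapped__` attribute (in Python 3.2 and newer),
which can occasionally be useful:

%% Cell type:code id: tags:

```
# The original, undecorated add2 is still
# reachable via the __wrapped__ attribute
print('Wrapped function:  ', add2.__wrapped__)
print('Undecorated result:', add2.__wrapped__(1, 2))
```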
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<a class="anchor" id="appendix-class-decorators"></a> <a class="anchor" id="appendix-class-decorators"></a>
## Appendix: Class decorators ## Appendix: Class decorators
> Not to be confused with [_decorator classes_](#decorator-classes)! > Not to be confused with [*decorator classes*](#decorator-classes)!
In this practical, we have shown how decorators can be applied to functions In this practical, we have shown how decorators can be applied to functions
and methods. But decorators can in fact also be applied to _classes_. This is and methods. But decorators can in fact also be applied to *classes*. This is
a fairly niche feature that you are unlikely to need, so we will
only cover it briefly. only cover it briefly.
Imagine that we want all objects in our application to have a globally unique Imagine that we want all objects in our application to have a globally unique
(within the application) identifier. We could use a decorator which contains (within the application) identifier. We could use a decorator which contains
the logic for generating unique IDs, and defines the interface that we can the logic for generating unique IDs, and defines the interface that we can
use on an instance to obtain its ID: use on an instance to obtain its ID:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import random import random
allIds = set() allIds = set()
def uniqueID(cls): def uniqueID(cls):
class subclass(cls): class subclass(cls):
def getUniqueID(self): def getUniqueID(self):
uid = getattr(self, '_uid', None) uid = getattr(self, '_uid', None)
if uid is not None: if uid is not None:
return uid return uid
            while uid is None or uid in allIds:
                uid = random.randint(1, 100)
            # remember the ID, so that it is
            # never handed out a second time
            allIds.add(uid)
            self._uid = uid
return uid return uid
return subclass return subclass
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Now we can use the `@uniqueID` decorator on any class that we need to Now we can use the `@uniqueID` decorator on any class that we need to
have a unique ID: have a unique ID:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
@uniqueID @uniqueID
class Foo(object): class Foo(object):
pass pass
@uniqueID @uniqueID
class Bar(object): class Bar(object):
pass pass
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
All instances of these classes will have a `getUniqueID` method: All instances of these classes will have a `getUniqueID` method:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
f1 = Foo() f1 = Foo()
f2 = Foo() f2 = Foo()
b1 = Bar() b1 = Bar()
b2 = Bar() b2 = Bar()
print('f1: ', f1.getUniqueID()) print('f1: ', f1.getUniqueID())
print('f2: ', f2.getUniqueID()) print('f2: ', f2.getUniqueID())
print('b1: ', b1.getUniqueID()) print('b1: ', b1.getUniqueID())
print('b2: ', b2.getUniqueID()) print('b2: ', b2.getUniqueID())
``` ```
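%% Cell type:markdown id: tags:

Another common use of class decorators (again just a sketch with invented
names, not something from the practical) is to register classes as they are
defined, for example in a simple plugin registry:

%% Cell type:code id: tags:

```
plugins = {}

def registerPlugin(cls):
    """Registers the decorated class in the plugins
    dictionary, and returns the class unchanged."""
    plugins[cls.__name__] = cls
    return cls

@registerPlugin
class NiftiLoader(object):
    pass

@registerPlugin
class DicomLoader(object):
    pass

print('Registered plugins:', sorted(plugins.keys()))
```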
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<a class="anchor" id="useful-references"></a> <a class="anchor" id="useful-references"></a>
## Useful references ## Useful references
* [Understanding decorators in 12 easy steps](http://simeonfranklin.com/blog/2012/jul/1/python-decorators-in-12-steps/) * [Understanding decorators in 12 easy steps](http://simeonfranklin.com/blog/2012/jul/1/python-decorators-in-12-steps/)
* [The decorators they won't tell you about](https://github.com/hchasestevens/hchasestevens.github.io/blob/master/notebooks/the-decorators-they-wont-tell-you-about.ipynb) * [The decorators they won't tell you about](https://github.com/hchasestevens/hchasestevens.github.io/blob/master/notebooks/the-decorators-they-wont-tell-you-about.ipynb)
* [Closures - Wikipedia][wiki-closure] * [Closures - Wikipedia][wiki-closure]
* [Closures in Python](https://www.geeksforgeeks.org/python-closures/) * [Closures in Python](https://www.geeksforgeeks.org/python-closures/)
* [Garbage collection in Python](https://www.quora.com/How-does-garbage-collection-in-Python-work-What-are-the-pros-and-cons) * [Garbage collection in Python](https://www.quora.com/How-does-garbage-collection-in-Python-work-What-are-the-pros-and-cons)
[wiki-closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming) [wiki-closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming)
......
...@@ -100,7 +100,7 @@ This new `timeFunc` function is again passed a function `func`, but this time ...@@ -100,7 +100,7 @@ This new `timeFunc` function is again passed a function `func`, but this time
as its sole argument. It then creates and returns a new function, as its sole argument. It then creates and returns a new function,
`wrapperFunc`. This `wrapperFunc` function calls and times the function that `wrapperFunc`. This `wrapperFunc` function calls and times the function that
was passed to `timeFunc`. But note that when `timeFunc` is called, was passed to `timeFunc`. But note that when `timeFunc` is called,
`wrapperFunc` is _not_ called - it is only created and returned. `wrapperFunc` is *not* called - it is only created and returned.
Let's use our new `timeFunc` implementation: Let's use our new `timeFunc` implementation:
...@@ -151,9 +151,9 @@ which holds a reference to the original definition of `inverse`. ...@@ -151,9 +151,9 @@ which holds a reference to the original definition of `inverse`.
> [functions are not special](#appendix-functions-are-not-special). > [functions are not special](#appendix-functions-are-not-special).
Guess what? We have just created a __decorator__. A decorator is simply a Guess what? We have just created a **decorator**. A decorator is simply a
function which accepts a function as its input, and returns another function function which accepts a function as its input, and returns another function
as its output. In the example above, we have _decorated_ the `inverse` as its output. In the example above, we have *decorated* the `inverse`
function with the `timeFunc` decorator. function with the `timeFunc` decorator.
...@@ -228,7 +228,7 @@ MiscMaths.inverse = timeFunc(MiscMaths.inverse) ...@@ -228,7 +228,7 @@ MiscMaths.inverse = timeFunc(MiscMaths.inverse)
``` ```
So only one `wrapperFunc` function exists, and this function is _shared_ by So only one `wrapperFunc` function exists, and this function is *shared* by
all instances of the `MiscMaths` class - (such as the `mm1` and `mm2` all instances of the `MiscMaths` class - (such as the `mm1` and `mm2`
instances in the example above). In many cases this is not a problem, but instances in the example above). In many cases this is not a problem, but
there can be situations where you need each instance of your class to have its there can be situations where you need each instance of your class to have its
...@@ -400,11 +400,11 @@ def limitedMemoize(maxSize): ...@@ -400,11 +400,11 @@ def limitedMemoize(maxSize):
``` ```
> We used the handy > We used the handy
> [`collections.OrderedDict`](https://docs.python.org/3.5/library/collections.html#collections.OrderedDict) > [`collections.OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict)
> class here which preserves the insertion order of key-value pairs. > class here which preserves the insertion order of key-value pairs.
This is starting to look a little complicated - we now have _three_ layers of This is starting to look a little complicated - we now have *three* layers of
functions. This is necessary when you wish to write a decorator which accepts functions. This is necessary when you wish to write a decorator which accepts
arguments (refer to the arguments (refer to the
[appendix](#appendix-decorators-without-arguments-versus-decorators-with-arguments) [appendix](#appendix-decorators-without-arguments-versus-decorators-with-arguments)
...@@ -505,7 +505,7 @@ expensiveFunc(1) ...@@ -505,7 +505,7 @@ expensiveFunc(1)
By now, you will have gained the impression that a decorator is a function By now, you will have gained the impression that a decorator is a function
which _decorates_ another function. But if you went through the practical on which *decorates* another function. But if you went through the practical on
operator overloading, you might remember the special `__call__` method, that operator overloading, you might remember the special `__call__` method, that
allows an object to be called as if it were a function. allows an object to be called as if it were a function.
...@@ -596,17 +596,17 @@ registry.runTests() ...@@ -596,17 +596,17 @@ registry.runTests()
``` ```
> Unit testing is something which you must do! This is __especially__ > Unit testing is something which you must do! This is **especially**
> important in an interpreted language such as Python, where there is no > important in an interpreted language such as Python, where there is no
> compiler to catch all of your mistakes. > compiler to catch all of your mistakes.
> >
> Python has a built-in > Python has a built-in
> [`unittest`](https://docs.python.org/3.5/library/unittest.html) module, > [`unittest`](https://docs.python.org/3/library/unittest.html) module,
> however the third-party [`pytest`](https://docs.pytest.org/en/latest/) and > however the third-party [`pytest`](https://docs.pytest.org/en/latest/) and
> [`nose`](http://nose2.readthedocs.io/en/latest/) are popular. It is also > [`nose`](http://nose2.readthedocs.io/en/latest/) are popular. It is also
> wise to combine your unit tests with > wise to combine your unit tests with
> [`coverage`](https://coverage.readthedocs.io/en/coverage-4.5.1/), which > [`coverage`](https://coverage.readthedocs.io/en/coverage-4.5.1/), which
> tells you how much of your code was executed, or _covered_ when your > tells you how much of your code was executed, or *covered* when your
> tests were run. > tests were run.
...@@ -713,7 +713,7 @@ as we like. ...@@ -713,7 +713,7 @@ as we like.
> If it bothers you that `print(inv2)` resulted in > If it bothers you that `print(inv2)` resulted in
> `<function inverse at ...>`, and not `<function inv2 at ...>`, then refer to > `<function inverse at ...>`, and not `<function inv2 at ...>`, then refer to
> the appendix on > the appendix on
> [preserving function metdata](#appendix-preserving-function-metadata). > [preserving function metadata](#appendix-preserving-function-metadata).
<a class="anchor" id="appendix-closures"></a> <a class="anchor" id="appendix-closures"></a>
...@@ -721,7 +721,7 @@ as we like. ...@@ -721,7 +721,7 @@ as we like.
Whenever we define or use a decorator, we are taking advantage of a concept Whenever we define or use a decorator, we are taking advantage of a concept
called a [_closure_][wiki-closure]. Take a second to re-familiarise yourself called a [*closure*][wiki-closure]. Take a second to re-familiarise yourself
with our `memoize` decorator function from earlier - when `memoize` is called, with our `memoize` decorator function from earlier - when `memoize` is called,
it creates and returns a function called `wrapper`: it creates and returns a function called `wrapper`:
...@@ -783,7 +783,7 @@ finished. ...@@ -783,7 +783,7 @@ finished.
This is what is known as a This is what is known as a
[_closure_](https://www.geeksforgeeks.org/python-closures/). Closures are a [*closure*](https://www.geeksforgeeks.org/python-closures/). Closures are a
fundamental, and extremely powerful, aspect of Python and other high level fundamental, and extremely powerful, aspect of Python and other high level
languages. So there's your answer, languages. So there's your answer,
[fishbulb](https://www.youtube.com/watch?v=CiAaEPcnlOg). [fishbulb](https://www.youtube.com/watch?v=CiAaEPcnlOg).
...@@ -834,7 +834,7 @@ function. ...@@ -834,7 +834,7 @@ function.
But if a decorator function is "called" (scenarios 2 or 3), both the decorator But if a decorator function is "called" (scenarios 2 or 3), both the decorator
function (`decorator`), __and its return value__ (`wrapper`) are called - the function (`decorator`), **and its return value** (`wrapper`) are called - the
decorator function is passed the arguments that were provided, and its return decorator function is passed the arguments that were provided, and its return
value is passed the decorated function. value is passed the decorated function.
...@@ -966,7 +966,7 @@ function, so that the `wrapper` ignores the first argument. But this would ...@@ -966,7 +966,7 @@ function, so that the `wrapper` ignores the first argument. But this would
mean that we couldn't use `ensureNumeric` with standalone functions. mean that we couldn't use `ensureNumeric` with standalone functions.
But we _can_ manually apply the `ensureNumeric` decorator to `MiscMaths` But we *can* manually apply the `ensureNumeric` decorator to `MiscMaths`
instances when they are initialised. We can't use the nice `@ensureNumeric` instances when they are initialised. We can't use the nice `@ensureNumeric`
syntax to apply our decorators, but this is a viable approach: syntax to apply our decorators, but this is a viable approach:
...@@ -988,7 +988,7 @@ print(mm.add('5', 10)) ...@@ -988,7 +988,7 @@ print(mm.add('5', 10))
Another approach is to use a second decorator, which dynamically creates the Another approach is to use a second decorator, which dynamically creates the
real decorator when it is accessed on an instance. This requires the use of an real decorator when it is accessed on an instance. This requires the use of an
advanced Python technique called advanced Python technique called
[_descriptors_](https://docs.python.org/3.5/howto/descriptor.html), which is [*descriptors*](https://docs.python.org/3/howto/descriptor.html), which is
beyond the scope of this practical. But if you are interested, you can see an beyond the scope of this practical. But if you are interested, you can see an
implementation of this approach implementation of this approach
[here](https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/1.6.8/fsl/utils/memoize.py#L249). [here](https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/1.6.8/fsl/utils/memoize.py#L249).
...@@ -1053,7 +1053,7 @@ documentation](http://www.sphinx-doc.org/) for our code. ...@@ -1053,7 +1053,7 @@ documentation](http://www.sphinx-doc.org/) for our code.
Fortunately, there is a workaround, available in the built-in Fortunately, there is a workaround, available in the built-in
[`functools`](https://docs.python.org/3.5/library/functools.html#functools.wraps) [`functools`](https://docs.python.org/3/library/functools.html#functools.wraps)
module: module:
...@@ -1091,11 +1091,11 @@ print('Help: ', add2.__doc__) ...@@ -1091,11 +1091,11 @@ print('Help: ', add2.__doc__)
## Appendix: Class decorators ## Appendix: Class decorators
> Not to be confused with [_decorator classes_](#decorator-classes)! > Not to be confused with [*decorator classes*](#decorator-classes)!
In this practical, we have shown how decorators can be applied to functions In this practical, we have shown how decorators can be applied to functions
and methods. But decorators can in fact also be applied to _classes_. This is and methods. But decorators can in fact also be applied to *classes*. This is
a fairly niche feature that you are probably not likely to need, so we will a fairly niche feature that you are probably not likely to need, so we will
only cover it briefly. only cover it briefly.
......
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Threading and parallel processing # Threading and parallel processing
The Python language has built-in support for multi-threading in the The Python language has built-in support for multi-threading in the
[`threading`](https://docs.python.org/3.5/library/threading.html) module, and [`threading`](https://docs.python.org/3/library/threading.html) module, and
true parallelism in the true parallelism in the
[`multiprocessing`](https://docs.python.org/3.5/library/multiprocessing.html) [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html)
module. If you want to be impressed, skip straight to the section on module. If you want to be impressed, skip straight to the section on
[`multiprocessing`](#multiprocessing). [`multiprocessing`](#multiprocessing).
> *Note*: If you are familiar with a "real" programming language such as C++
> or Java, you might be disappointed with the native support for parallelism in
> Python. Python threads do not run in parallel because of the Global
> Interpreter Lock, and if you use `multiprocessing`, be prepared to either
> bear the performance hit of copying data between processes, or jump through
> hoops in order to share data between processes.
>
> This limitation *might* be solved in a future Python release by way of
> [*sub-interpreters*](https://www.python.org/dev/peps/pep-0554/), but the
> author of this practical is not holding his breath.
* [Threading](#threading)
* [Subclassing `Thread`](#subclassing-thread)
* [Daemon threads](#daemon-threads)
* [Thread synchronisation](#thread-synchronisation)
* [`Lock`](#lock)
* [`Event`](#event)
* [The Global Interpreter Lock (GIL)](#the-global-interpreter-lock-gil)
* [Multiprocessing](#multiprocessing)
* [`threading`-equivalent API](#threading-equivalent-api)
* [Higher-level API - the `multiprocessing.Pool`](#higher-level-api-the-multiprocessing-pool)
* [`Pool.map`](#pool-map)
* [`Pool.apply_async`](#pool-apply-async)
* [Sharing data between processes](#sharing-data-between-processes)
* [Read-only sharing](#read-only-sharing)
* [Read/write sharing](#read-write-sharing)
<a class="anchor" id="threading"></a>
## Threading ## Threading
The [`threading`](https://docs.python.org/3.5/library/threading.html) module The [`threading`](https://docs.python.org/3/library/threading.html) module
provides a traditional multi-threading API that should be familiar to you if provides a traditional multi-threading API that should be familiar to you if
you have worked with threads in other languages. you have worked with threads in other languages.
Running a task in a separate thread in Python is easy - simply create a Running a task in a separate thread in Python is easy - simply create a
`Thread` object, and pass it the function or method that you want it to `Thread` object, and pass it the function or method that you want it to
run. Then call its `start` method: run. Then call its `start` method:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import time import time
import threading import threading
def longRunningTask(niters): def longRunningTask(niters):
for i in range(niters): for i in range(niters):
if i % 2 == 0: print('Tick') if i % 2 == 0: print('Tick')
else: print('Tock') else: print('Tock')
time.sleep(0.5) time.sleep(0.5)
t = threading.Thread(target=longRunningTask, args=(8,)) t = threading.Thread(target=longRunningTask, args=(8,))
t.start() t.start()
while t.is_alive(): while t.is_alive():
time.sleep(0.4) time.sleep(0.4)
print('Waiting for thread to finish...') print('Waiting for thread to finish...')
print('Finished!') print('Finished!')
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
You can also `join` a thread, which will block execution in the current thread You can also `join` a thread, which will block execution in the current thread
until the thread that has been `join`ed has finished: until the thread that has been `join`ed has finished:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
t = threading.Thread(target=longRunningTask, args=(6, )) t = threading.Thread(target=longRunningTask, args=(6, ))
t.start() t.start()
print('Joining thread ...') print('Joining thread ...')
t.join() t.join()
print('Finished!') print('Finished!')
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<a class="anchor" id="subclassing-thread"></a>
### Subclassing `Thread` ### Subclassing `Thread`
It is also possible to sub-class the `Thread` class, and override its `run` It is also possible to sub-class the `Thread` class, and override its `run`
method: method:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
class LongRunningThread(threading.Thread): class LongRunningThread(threading.Thread):
def __init__(self, niters, *args, **kwargs): def __init__(self, niters, *args, **kwargs):
super().__init__(*args, **kwargs) super().__init__(*args, **kwargs)
self.niters = niters self.niters = niters
def run(self): def run(self):
for i in range(self.niters): for i in range(self.niters):
if i % 2 == 0: print('Tick') if i % 2 == 0: print('Tick')
else: print('Tock') else: print('Tock')
time.sleep(0.5) time.sleep(0.5)
t = LongRunningThread(6) t = LongRunningThread(6)
t.start() t.start()
t.join() t.join()
print('Done') print('Done')
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<a class="anchor" id="daemon-threads"></a>
### Daemon threads ### Daemon threads
By default, a Python application will not exit until _all_ active threads have By default, a Python application will not exit until _all_ active threads have
finished execution. If you want to run a task in the background for the finished execution. If you want to run a task in the background for the
duration of your application, you can mark it as a `daemon` thread - when all duration of your application, you can mark it as a `daemon` thread - when all
non-daemon threads in a Python application have finished, all daemon threads non-daemon threads in a Python application have finished, all daemon threads
will be halted, and the application will exit. will be halted, and the application will exit.
You can mark a thread as being a daemon by setting an attribute on it after You can mark a thread as being a daemon by setting an attribute on it after
creation: creation:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
t = threading.Thread(target=longRunningTask) t = threading.Thread(target=longRunningTask)
t.daemon = True t.daemon = True
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
See the [`Thread` See the [`Thread`
documentation](https://docs.python.org/3.5/library/threading.html#thread-objects) documentation](https://docs.python.org/3/library/threading.html#thread-objects)
for more details. for more details.
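For example, a background task which should not keep the application alive once
everything else has finished might be started like this (a rough sketch - the
`daemon` flag can also be passed directly to the `Thread` constructor):
> ```
> # hypothetical task which runs for the
> # lifetime of the application
> def heartbeat():
>     while True:
>         print('ping')
>         time.sleep(60)
>
> # equivalent to setting t.daemon = True
> # after creation
> t = threading.Thread(target=heartbeat, daemon=True)
> t.start()
> ```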
<a class="anchor" id="thread-synchronisation"></a>
### Thread synchronisation ### Thread synchronisation
The `threading` module provides some useful thread-synchronisation primitives The `threading` module provides some useful thread-synchronisation primitives
- the `Lock`, `RLock` (re-entrant `Lock`), and `Event` classes. The - the `Lock`, `RLock` (re-entrant `Lock`), and `Event` classes. The
`threading` module also provides `Condition` and `Semaphore` classes - refer `threading` module also provides `Condition` and `Semaphore` classes - refer
to the [documentation](https://docs.python.org/3.5/library/threading.html) for to the [documentation](https://docs.python.org/3/library/threading.html) for
more details. more details.
<a class="anchor" id="lock"></a>
#### `Lock` #### `Lock`
The [`Lock`](https://docs.python.org/3.5/library/threading.html#lock-objects) The [`Lock`](https://docs.python.org/3/library/threading.html#lock-objects)
class (and its re-entrant version, the class (and its re-entrant version, the
[`RLock`](https://docs.python.org/3.5/library/threading.html#rlock-objects)) [`RLock`](https://docs.python.org/3/library/threading.html#rlock-objects))
prevents a block of code from being accessed by more than one thread at a prevents a block of code from being accessed by more than one thread at a
time. For example, if we have multiple threads running this `task` function, time. For example, if we have multiple threads running this `task` function,
their [outputs](https://www.youtube.com/watch?v=F5fUFnfPpYU) will inevitably their [outputs](https://www.youtube.com/watch?v=F5fUFnfPpYU) will inevitably
become intertwined: become intertwined:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def task(): def task():
for i in range(5): for i in range(5):
print('{} Woozle '.format(i), end='') print('{} Woozle '.format(i), end='')
time.sleep(0.1) time.sleep(0.1)
print('Wuzzle') print('Wuzzle')
threads = [threading.Thread(target=task) for i in range(5)] threads = [threading.Thread(target=task) for i in range(5)]
for t in threads: for t in threads:
t.start() t.start()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
But if we protect the critical section with a `Lock` object, the output will But if we protect the critical section with a `Lock` object, the output will
look more sensible: look more sensible:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
lock = threading.Lock() lock = threading.Lock()
def task(): def task():
for i in range(5): for i in range(5):
with lock: with lock:
print('{} Woozle '.format(i), end='') print('{} Woozle '.format(i), end='')
time.sleep(0.1) time.sleep(0.1)
print('Wuzzle') print('Wuzzle')
threads = [threading.Thread(target=task) for i in range(5)] threads = [threading.Thread(target=task) for i in range(5)]
for t in threads: for t in threads:
t.start() t.start()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
> Instead of using a `Lock` object in a `with` statement, it is also possible > Instead of using a `Lock` object in a `with` statement, it is also possible
> to manually call its `acquire` and `release` methods: > to manually call its `acquire` and `release` methods:
> >
> def task(): > def task():
> for i in range(5): > for i in range(5):
> lock.acquire() > lock.acquire()
> print('{} Woozle '.format(i), end='') > print('{} Woozle '.format(i), end='')
> time.sleep(0.1) > time.sleep(0.1)
> print('Wuzzle') > print('Wuzzle')
> lock.release() > lock.release()
Python does not have any built-in constructs to implement `Lock`-based mutual Python does not have any built-in constructs to implement `Lock`-based mutual
exclusion across several functions or methods - each function/method must exclusion across several functions or methods - each function/method must
explicitly acquire/release a shared `Lock` instance. However, it is relatively explicitly acquire/release a shared `Lock` instance. However, it is relatively
straightforward to implement a decorator which does this for you: straightforward to implement a decorator which does this for you:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def mutex(func, lock): def mutex(func, lock):
def wrapper(*args): def wrapper(*args):
with lock: with lock:
func(*args) func(*args)
return wrapper return wrapper
class MyClass(object): class MyClass(object):
def __init__(self): def __init__(self):
lock = threading.Lock() lock = threading.Lock()
self.safeFunc1 = mutex(self.safeFunc1, lock) self.safeFunc1 = mutex(self.safeFunc1, lock)
self.safeFunc2 = mutex(self.safeFunc2, lock) self.safeFunc2 = mutex(self.safeFunc2, lock)
def safeFunc1(self): def safeFunc1(self):
time.sleep(0.1) time.sleep(0.1)
print('safeFunc1 start') print('safeFunc1 start')
time.sleep(0.2) time.sleep(0.2)
print('safeFunc1 end') print('safeFunc1 end')
def safeFunc2(self): def safeFunc2(self):
time.sleep(0.1) time.sleep(0.1)
print('safeFunc2 start') print('safeFunc2 start')
time.sleep(0.2) time.sleep(0.2)
print('safeFunc2 end') print('safeFunc2 end')
mc = MyClass() mc = MyClass()
f1threads = [threading.Thread(target=mc.safeFunc1) for i in range(4)] f1threads = [threading.Thread(target=mc.safeFunc1) for i in range(4)]
f2threads = [threading.Thread(target=mc.safeFunc2) for i in range(4)] f2threads = [threading.Thread(target=mc.safeFunc2) for i in range(4)]
for t in f1threads + f2threads: for t in f1threads + f2threads:
t.start() t.start()
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Try removing the `mutex` lock from the two methods in the above code, and see Try removing the `mutex` lock from the two methods in the above code, and see
what it does to the output. what it does to the output.
<a class="anchor" id="event"></a>
#### `Event` #### `Event`
The The
[`Event`](https://docs.python.org/3.5/library/threading.html#event-objects) [`Event`](https://docs.python.org/3/library/threading.html#event-objects)
class is essentially a boolean [semaphore][semaphore-wiki]. It can be used to class is essentially a boolean [semaphore][semaphore-wiki]. It can be used to
signal events between threads. Threads can `wait` on the event, and be awoken signal events between threads. Threads can `wait` on the event, and be awoken
when the event is `set` by another thread: when the event is `set` by another thread:
[semaphore-wiki]: https://en.wikipedia.org/wiki/Semaphore_(programming) [semaphore-wiki]: https://en.wikipedia.org/wiki/Semaphore_(programming)
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import numpy as np import numpy as np
processingFinished = threading.Event() processingFinished = threading.Event()
def processData(data): def processData(data):
print('Processing data ...') print('Processing data ...')
time.sleep(2) time.sleep(2)
print('Result: {}'.format(data.mean())) print('Result: {}'.format(data.mean()))
processingFinished.set() processingFinished.set()
data = np.random.randint(1, 100, 100) data = np.random.randint(1, 100, 100)
t = threading.Thread(target=processData, args=(data,)) t = threading.Thread(target=processData, args=(data,))
t.start() t.start()
processingFinished.wait() processingFinished.wait()
print('Processing finished!') print('Processing finished!')
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<a class="anchor" id="the-global-interpreter-lock-gil"></a>
### The Global Interpreter Lock (GIL) ### The Global Interpreter Lock (GIL)
The [_Global Interpreter The [*Global Interpreter
Lock_](https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock) Lock*](https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock)
is an implementation detail of [CPython](https://github.com/python/cpython) is an implementation detail of [CPython](https://github.com/python/cpython)
(the official Python interpreter). The GIL means that a multi-threaded (the official Python interpreter). The GIL means that a multi-threaded
program written in pure Python is not able to take advantage of multiple program written in pure Python is not able to take advantage of multiple
cores - this essentially means that only one thread may be executing at any cores - this essentially means that only one thread may be executing at any
point in time. point in time.
The `threading` module does still have its uses though, as this GIL problem The `threading` module does still have its uses though, as this GIL problem
does not affect tasks which involve calls to system or natively compiled does not affect tasks which involve calls to system or natively compiled
libraries (e.g. file and network I/O, Numpy operations, etc.). So you can, libraries (e.g. file and network I/O, Numpy operations, etc.). So you can,
for example, perform some expensive processing on a Numpy array in a thread for example, perform some expensive processing on a Numpy array in a thread
running on one core, whilst having another thread (e.g. user interaction) running on one core, whilst having another thread (e.g. user interaction)
running on another core. running on another core.
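As a simple illustration, blocking calls such as `time.sleep` (and most file or
network I/O) release the GIL, so several of them can be in progress at the same
time, even though only one thread at a time can execute Python code. A minimal
sketch:
> ```
> import time
> import threading
>
> def io_task():
>     # time.sleep releases the GIL, so these
>     # calls can overlap across threads
>     time.sleep(1)
>
> start   = time.time()
> threads = [threading.Thread(target=io_task) for i in range(4)]
> for t in threads: t.start()
> for t in threads: t.join()
>
> # roughly 1 second, not 4
> print('Elapsed: {:0.2f}s'.format(time.time() - start))
> ```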
<a class="anchor" id="multiprocessing"></a>
## Multiprocessing ## Multiprocessing
For true parallelism, you should check out the For true parallelism, you should check out the
[`multiprocessing`](https://docs.python.org/3.5/library/multiprocessing.html) [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html)
module. module.
The `multiprocessing` module spawns sub-processes, rather than threads, and so The `multiprocessing` module spawns sub-processes, rather than threads, and so
is not subject to the GIL constraints that the `threading` module suffers is not subject to the GIL constraints that the `threading` module suffers
from. It provides two APIs - a "traditional" equivalent to that provided by from. It provides two APIs - a "traditional" equivalent to that provided by
the `threading` module, and a powerful higher-level API. the `threading` module, and a powerful higher-level API.
> Python also provides the
> [`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html)
> module, which offers a simpler alternative API to `multiprocessing`. It
> offers no extra functionality over `multiprocessing`, so is not covered here.
<a class="anchor" id="threading-equivalent-api"></a>
### `threading`-equivalent API ### `threading`-equivalent API
The The
[`Process`](https://docs.python.org/3.5/library/multiprocessing.html#the-process-class) [`Process`](https://docs.python.org/3/library/multiprocessing.html#the-process-class)
class is the `multiprocessing` equivalent of the class is the `multiprocessing` equivalent of the
[`threading.Thread`](https://docs.python.org/3.5/library/threading.html#thread-objects) [`threading.Thread`](https://docs.python.org/3/library/threading.html#thread-objects)
class. `multiprocessing` also has equivalents of the [`Lock` and `Event` class. `multiprocessing` also has equivalents of the [`Lock` and `Event`
classes](https://docs.python.org/3.5/library/multiprocessing.html#synchronization-between-processes), classes](https://docs.python.org/3/library/multiprocessing.html#synchronization-between-processes),
and the other synchronisation primitives provided by `threading`. and the other synchronisation primitives provided by `threading`.
So you can simply replace `threading.Thread` with `multiprocessing.Process`, So you can simply replace `threading.Thread` with `multiprocessing.Process`,
and you will have true parallelism. and you will have true parallelism.
Because your "threads" are now independent processes, you need to be a little Because your "threads" are now independent processes, you need to be a little
careful about how to share information across them. Fortunately, the careful about how to share information across them. If you only need to share
`multiprocessing` module provides [`Queue` and `Pipe` small amounts of data, you can use the [`Queue` and `Pipe`
classes](https://docs.python.org/3.5/library/multiprocessing.html#exchanging-objects-between-processes) classes](https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes),
which make it easy to share data across processes. in the `multiprocessing` module. If you are working with large amounts of data
where copying between processes is not feasible, things become more
complicated, but read on...
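For example, a minimal sketch of a `Process` which sends a small result back to
its parent via a `Queue` might look like this (the `__main__` guard is good
practice, and is required on platforms which spawn, rather than fork, child
processes):
> ```
> import multiprocessing as mp
>
> def worker(q):
>     # send a small (pickleable) result
>     # back to the parent process
>     q.put('hello from the child process')
>
> if __name__ == '__main__':
>     q = mp.Queue()
>     p = mp.Process(target=worker, args=(q,))
>     p.start()
>     print(q.get())
>     p.join()
> ```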
<a class="anchor" id="higher-level-api-the-multiprocessing-pool"></a>
### Higher-level API - the `multiprocessing.Pool` ### Higher-level API - the `multiprocessing.Pool`
The real advantages of `multiprocessing` lie in its higher level API, centered The real advantages of `multiprocessing` lie in its higher level API, centered
around the [`Pool` around the [`Pool`
class](https://docs.python.org/3.5/library/multiprocessing.html#using-a-pool-of-workers). class](https://docs.python.org/3/library/multiprocessing.html#using-a-pool-of-workers).
Essentially, you create a `Pool` of worker processes - you specify the number Essentially, you create a `Pool` of worker processes - you specify the number
of processes when you create the pool. of processes when you create the pool. Once you have created a `Pool`, you can
use its methods to automatically parallelise tasks. The most useful are the
`map`, `starmap` and `apply_async` methods.
The `Pool` class is a context manager, so can be used in a `with` statement,
e.g.:
> ```
> with mp.Pool(processes=16) as pool:
> # do stuff with the pool
> ```
It is possible to create a `Pool` outside of a `with` statement, but in this
case you must ensure that you call its `close` method when you are finished.
Using a `Pool` in a `with` statement is therefore recommended, because you know
that it will be shut down correctly, even in the event of an error.
> The best number of processes to use for a `Pool` will depend on the system > The best number of processes to use for a `Pool` will depend on the system
> you are running on (number of cores), and the tasks you are running (e.g. > you are running on (number of cores), and the tasks you are running (e.g.
> I/O bound or CPU bound). > I/O bound or CPU bound).
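If in doubt, `multiprocessing.cpu_count()` tells you how many cores are
available, which is a reasonable starting point for CPU-bound work (a rough
sketch):
> ```
> import multiprocessing as mp
>
> # one worker per core is a sensible
> # default for CPU-bound tasks
> with mp.Pool(processes=mp.cpu_count()) as pool:
>     pass  # do stuff with the pool
> ```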
Once you have created a `Pool`, you can use its methods to automatically <a class="anchor" id="pool-map"></a>
parallelise tasks. The most useful are the `map`, `starmap` and
`apply_async` methods.
#### `Pool.map` #### `Pool.map`
The The
[`Pool.map`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.map) [`Pool.map`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.map)
method is the multiprocessing equivalent of the built-in method is the multiprocessing equivalent of the built-in
[`map`](https://docs.python.org/3.5/library/functions.html#map) function - it [`map`](https://docs.python.org/3/library/functions.html#map) function - it
is given a function, and a sequence, and it applies the function to each is given a function, and a sequence, and it applies the function to each
element in the sequence. element in the sequence.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import time import time
import multiprocessing as mp import multiprocessing as mp
import numpy as np import numpy as np
def crunchImage(imgfile): def crunchImage(imgfile):
# Load a nifti image, do stuff # Load a nifti image, do stuff
# to it. Use your imagination # to it. Use your imagination
# to fill in this function. # to fill in this function.
time.sleep(2) time.sleep(2)
# numpy's random number generator # numpy's random number generator
# will be initialised in the same # will be initialised in the same
# way in each process, so let's # way in each process, so let's
# re-seed it. # re-seed it.
np.random.seed() np.random.seed()
result = np.random.randint(1, 100, 1) result = np.random.randint(1, 100, 1)
print(imgfile, ':', result) print(imgfile, ':', result)
return result return result
imgfiles = ['{:02d}.nii.gz'.format(i) for i in range(20)] imgfiles = ['{:02d}.nii.gz'.format(i) for i in range(20)]
p = mp.Pool(processes=16)
print('Crunching images...') print('Crunching images...')
start = time.time() start = time.time()
results = p.map(crunchImage, imgfiles)
end = time.time() with mp.Pool(processes=16) as p:
results = p.map(crunchImage, imgfiles)
end = time.time()
print('Total execution time: {:0.2f} seconds'.format(end - start)) print('Total execution time: {:0.2f} seconds'.format(end - start))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The `Pool.map` method only works with functions that accept one argument, such The `Pool.map` method only works with functions that accept one argument, such
as our `crunchImage` function above. If you have a function which accepts as our `crunchImage` function above. If you have a function which accepts
multiple arguments, use the multiple arguments, use the
[`Pool.starmap`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.starmap) [`Pool.starmap`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.starmap)
method instead: method instead:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def crunchImage(imgfile, modality): def crunchImage(imgfile, modality):
time.sleep(2) time.sleep(2)
np.random.seed() np.random.seed()
if modality == 't1': if modality == 't1':
result = np.random.randint(1, 100, 1) result = np.random.randint(1, 100, 1)
elif modality == 't2': elif modality == 't2':
result = np.random.randint(100, 200, 1) result = np.random.randint(100, 200, 1)
print(imgfile, ': ', result) print(imgfile, ': ', result)
return result return result
imgfiles = ['t1_{:02d}.nii.gz'.format(i) for i in range(10)] + \ imgfiles = ['t1_{:02d}.nii.gz'.format(i) for i in range(10)] + \
['t2_{:02d}.nii.gz'.format(i) for i in range(10)] ['t2_{:02d}.nii.gz'.format(i) for i in range(10)]
modalities = ['t1'] * 10 + ['t2'] * 10 modalities = ['t1'] * 10 + ['t2'] * 10
pool = mp.Pool(processes=16)
args = [(f, m) for f, m in zip(imgfiles, modalities)] args = [(f, m) for f, m in zip(imgfiles, modalities)]
print('Crunching images...') print('Crunching images...')
start = time.time() start = time.time()
results = pool.starmap(crunchImage, args)
end = time.time() with mp.Pool(processes=16) as pool:
results = pool.starmap(crunchImage, args)
end = time.time()
print('Total execution time: {:0.2f} seconds'.format(end - start)) print('Total execution time: {:0.2f} seconds'.format(end - start))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The `map` and `starmap` methods also have asynchronous equivalents `map_async` The `map` and `starmap` methods also have asynchronous equivalents `map_async`
and `starmap_async`, which return immediately. Refer to the and `starmap_async`, which return immediately. Refer to the
[`Pool`](https://docs.python.org/3.5/library/multiprocessing.html#module-multiprocessing.pool) [`Pool`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool)
documentation for more details. documentation for more details.
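As a rough sketch (using a hypothetical single-argument `crunch` function),
`map_async` hands back an `AsyncResult` object straight away, and its `get`
method blocks until all of the jobs have finished:
> ```
> def crunch(imgfile):
>     time.sleep(1)
>     return imgfile
>
> with mp.Pool(processes=16) as pool:
>     # map_async returns immediately
>     result = pool.map_async(crunch, imgfiles)
>     # ... do other work while the jobs run ...
>     # get() blocks until every job has finished
>     print(result.get())
> ```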
<a class="anchor" id="pool-apply-async"></a>
#### `Pool.apply_async` #### `Pool.apply_async`
The The
[`Pool.apply`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.apply) [`Pool.apply`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply)
method will execute a function on one of the processes, and block until it has method will execute a function on one of the processes, and block until it has
finished. The finished. The
[`Pool.apply_async`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async) [`Pool.apply_async`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async)
method returns immediately, and is thus more suited to asynchronously method returns immediately, and is thus more suited to asynchronously
scheduling multiple jobs to run in parallel. scheduling multiple jobs to run in parallel.
`apply_async` returns an object of type `apply_async` returns an object of type
[`AsyncResult`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.AsyncResult). [`AsyncResult`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.AsyncResult).
An `AsyncResult` object has `wait` and `get` methods which will block until An `AsyncResult` object has `wait` and `get` methods which will block until
the job has completed. the job has completed.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import time import time
import multiprocessing as mp import multiprocessing as mp
import numpy as np import numpy as np
def linear_registration(src, ref): def linear_registration(src, ref):
time.sleep(1) time.sleep(1)
return np.eye(4) return np.eye(4)
def nonlinear_registration(src, ref, affine): def nonlinear_registration(src, ref, affine):
time.sleep(3) time.sleep(3)
# this number represents a non-linear warp # this number represents a non-linear warp
# field - use your imagination people! # field - use your imagination people!
np.random.seed() np.random.seed()
return np.random.randint(1, 100, 1) return np.random.randint(1, 100, 1)
t1s = ['{:02d}_t1.nii.gz'.format(i) for i in range(20)] t1s = ['{:02d}_t1.nii.gz'.format(i) for i in range(20)]
std = 'MNI152_T1_2mm.nii.gz' std = 'MNI152_T1_2mm.nii.gz'
pool = mp.Pool(processes=16)
print('Running structural-to-standard registration ' print('Running structural-to-standard registration '
'on {} subjects...'.format(len(t1s))) 'on {} subjects...'.format(len(t1s)))
# Run linear registration on all the T1s. # Run linear registration on all the T1s.
#
# We build a list of AsyncResult objects
linresults = [pool.apply_async(linear_registration, (t1, std))
for t1 in t1s]
# Then we wait for each job to finish,
# and replace its AsyncResult object
# with the actual result - an affine
# transformation matrix.
start = time.time() start = time.time()
for i, r in enumerate(linresults): with mp.Pool(processes=16) as pool:
linresults[i] = r.get()
# We build a list of AsyncResult objects
linresults = [pool.apply_async(linear_registration, (t1, std))
for t1 in t1s]
# Then we wait for each job to finish,
# and replace its AsyncResult object
# with the actual result - an affine
# transformation matrix.
for i, r in enumerate(linresults):
linresults[i] = r.get()
end = time.time() end = time.time()
print('Linear registrations completed in ' print('Linear registrations completed in '
'{:0.2f} seconds'.format(end - start)) '{:0.2f} seconds'.format(end - start))
# Run non-linear registration on all the T1s, # Run non-linear registration on all the T1s,
# using the linear registrations to initialise. # using the linear registrations to initialise.
nlinresults = [pool.apply_async(nonlinear_registration, (t1, std, aff))
for (t1, aff) in zip(t1s, linresults)]
# Wait for each non-linear reg to finish,
# and store the resulting warp field.
start = time.time() start = time.time()
for i, r in enumerate(nlinresults): with mp.Pool(processes=16) as pool:
nlinresults[i] = r.get() nlinresults = [pool.apply_async(nonlinear_registration, (t1, std, aff))
for (t1, aff) in zip(t1s, linresults)]
# Wait for each non-linear reg to finish,
# and store the resulting warp field.
for i, r in enumerate(nlinresults):
nlinresults[i] = r.get()
end = time.time() end = time.time()
print('Non-linear registrations completed in ' print('Non-linear registrations completed in '
'{:0.2f} seconds'.format(end - start)) '{:0.2f} seconds'.format(end - start))
print('Non linear registrations:') print('Non linear registrations:')
for t1, result in zip(t1s, nlinresults): for t1, result in zip(t1s, nlinresults):
print(t1, ':', result) print(t1, ':', result)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
### Sharing data between processes <a class="anchor" id="sharing-data-between-processes"></a>
## Sharing data between processes
When you use the `Pool.map` method (or any of the other methods we have shown) When you use the `Pool.map` method (or any of the other methods we have shown)
to run a function on a sequence of items, those items must be copied into the to run a function on a sequence of items, those items must be copied into the
memory of each of the child processes. When the child processes are finished, memory of each of the child processes. When the child processes are finished,
the data that they return then has to be copied back to the parent process. the data that they return then has to be copied back to the parent process.
Any items which you wish to pass to a function that is executed by a `Pool` Any items which you wish to pass to a function that is executed by a `Pool`
must be - the built-in must be *pickleable*<sup>1</sup> - the built-in
[`pickle`](https://docs.python.org/3.5/library/pickle.html) module is used by [`pickle`](https://docs.python.org/3/library/pickle.html) module is used by
`multiprocessing` to serialise and de-serialise the data passed into and `multiprocessing` to serialise and de-serialise the data passed to and
returned from a child process. The majority of standard Python types (`list`, returned from a child process. The majority of standard Python types (`list`,
`dict`, `str` etc), and Numpy arrays can be pickled and unpickled, so you only `dict`, `str` etc), and Numpy arrays can be pickled and unpickled, so you only
need to worry about this detail if you are passing objects of a custom type need to worry about this detail if you are passing objects of a custom type
(e.g. instances of classes that you have written, or that are defined in some (e.g. instances of classes that you have written, or that are defined in some
third-party library). third-party library).
> <sup>1</sup>*Pickleable* is the term used in the Python world to refer to
> something that is *serialisable* - basically, the process of converting an
> in-memory object into a binary form that can be stored and/or transmitted.
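A quick way to check whether something can be passed to a `Pool` method is
simply to try pickling it - a rough sketch:
> ```
> import pickle
> import numpy as np
>
> # most built-in types and Numpy arrays can
> # be pickled and unpickled without trouble...
> data     = {'name' : 'T1', 'data' : np.zeros((2, 2))}
> restored = pickle.loads(pickle.dumps(data))
>
> # ...but some objects (e.g. lambdas, open
> # file handles) cannot
> try:
>     pickle.dumps(lambda x: x + 1)
> except Exception as e:
>     print('Cannot pickle:', e)
> ```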
There is obviously some overhead in copying data back and forth between the There is obviously some overhead in copying data back and forth between the
main process and the worker processes. For most computationally intensive main process and the worker processes; this may or may not be a problem. For
tasks, this communication overhead is not important - the performance most computationally intensive tasks, this communication overhead is not
bottleneck is typically going to be the computation time, rather than I/O important - the performance bottleneck is typically going to be the
between the parent and child processes. You may need to spend some time computation time, rather than I/O between the parent and child processes.
adjusting the way in which you split up your data, and the number of
processes, in order to get the best performance.
However, if you have determined that copying data between processes is having However, if you are working with a large dataset, you have determined that
a substantial impact on your performance, the `multiprocessing` module copying data between processes is having a substantial impact on your
provides the [`Value`, `Array`, and `RawArray` performance, and instead wish to *share* a single copy of the data between
classes](https://docs.python.org/3.5/library/multiprocessing.html#shared-ctypes-objects), the processes, you will need to:
which allow you to share individual values, or arrays of values, respectively.
1. Structure your code so that the data you want to share is accessible at
the *module level*.
2. Define/create/load the data *before* creating the `Pool`.
This is because, when you create a `Pool`, what actually happens is that the
process your Pythonn script is running in will [**fork**][wiki-fork] itself -
the child processes that are created are used as the worker processes by the
`Pool`. And if you create/load your data in your main process *before* this
fork occurs, all of the child processes will inherit the memory space of the
main process, and will therefore have (read-only) access to the data, without
any copying required.
The `Array` and `RawArray` classes essentially wrap a typed pointer (from the
built-in [`ctypes`](https://docs.python.org/3.5/library/ctypes.html) module)
to a block of memory. We can use the `Array` or `RawArray` class to share a
Numpy array between our worker processes. The difference between an `Array`
and a `RawArray` is that the former offers synchronised (i.e. process-safe)
access to the shared memory. This is necessary if your child processes will be
modifying the same parts of your data.
[wiki-fork]: https://en.wikipedia.org/wiki/Fork_(system_call)
Due to the way that shared memory works, in order to share a Numpy array
between different processes you need to structure your code so that the <a class="anchor" id="read-only-sharing"></a>
array(s) you want to share are accessible at the _module level_. Furthermore, ### Read-only sharing
we need to make sure that our input and output arrays are located in shared
memory - we can do this via the `Array` or `RawArray`.
Let's see this in action with a simple example. We'll start by defining a
horrible little helper function which allows us to track the total memory
usage:
%% Cell type:code id: tags:
```
import sys
import subprocess as sp
def memusage(msg):
if sys.platform == 'darwin':
total = sp.run(['sysctl', 'hw.memsize'], capture_output=True).stdout.decode()
total = int(total.split()[1]) // 1048576
usage = sp.run('vm_stat', capture_output=True).stdout.decode()
usage = usage.strip().split('\n')
usage = [l.split(':') for l in usage]
usage = {k.strip() : v.strip() for k, v in usage}
usage = int(usage['Pages free'][:-1]) / 256.0
usage = int(total - usage)
else:
stdout = sp.run(['free', '--mega'], capture_output=True).stdout.decode()
stdout = stdout.split('\n')[1].split()
total = int(stdout[1])
usage = int(stdout[2])
print('Memory usage {}: {} / {} MB'.format(msg, usage, total))
```
%% Cell type:markdown id: tags:
Now our task is simply to calculate the sum of a large array of numbers. We're
going to create a big chunk of data, and process it in chunks, keeping track
of memory usage as the task progresses:
%% Cell type:code id: tags:
```
import time
import multiprocessing as mp
import numpy as np
memusage('before creating data')
# allocate 500MB of data
data = np.random.random(500 * (1048576 // 8))
# Assign nelems values to each worker
# process (hard-coded so we need 12
# jobs to complete the task)
nelems = len(data) // 12
memusage('after creating data')
# Each job process nelems values,
# starting from the specified offset
def process_chunk(offset):
time.sleep(1)
return data[offset:offset + nelems].sum()
# Generate an offset into the data for each job -
# we will call process_chunk for each offset
offsets = range(0, len(data), nelems)
# Create our worker process pool
with mp.Pool(4) as pool:
results = pool.map_async(process_chunk, offsets)
# Wait for all of the jobs to finish
elapsed = 0
while not results.ready():
memusage('after {} seconds'.format(elapsed))
time.sleep(1)
elapsed += 1
results = results.get()
print('Total sum: ', sum(results))
print('Sanity check:', data.sum())
```
%% Cell type:markdown id: tags:
You should be able to see that only one copy of `data` is created, and is
shared by all of the worker processes without any copying taking place.
So things are reasonably straightforward if you only need read-only access to
your data. But what if your worker processes need to be able to modify the
data? Go back to the code block above and:
1. Modify the `process_chunk` function so that it modifies every element of
its assigned portion of the data before the call to `time.sleep`. For
example:
> ```
> data[offset:offset + nelems] += 1
> ```
2. Restart the Jupyter notebook kernel (*Kernel -> Restart*) - this example is
somewhat dependent on the behaviour of the Python garbage collector, so it
helps to start afresh
3. Re-run the two code blocks, and watch what happens to the memory usage.
What happened? Well, you are seeing [copy-on-write][wiki-copy-on-write] in
action. When the `process_chunk` function is invoked, it is given a reference
to the original data array in the memory space of the parent process. But as
soon as an attempt is made to modify it, a copy of the data, in the memory
space of the child process, is created. The modifications are then applied to
this child process copy, and not to the original copy. So the total memory
usage has blown out to twice as much as before, and the changes made by each
child process are being lost!
[wiki-copy-on-write]: https://en.wikipedia.org/wiki/Copy-on-write
<a class="anchor" id="read-write-sharing"></a>
### Read/write sharing
> If you have worked with a real programming language with true parallelism
> and shared memory via within-process multi-threading, feel free to take a
> break at this point. Breathe. Relax. Go punch a hole in a wall. I've been
> coding in Python for years, and this still makes me angry. Sometimes
> ... don't tell anyone I said this ... I even find myself wishing I were
> coding in *Java* instead of Python. Ugh. I need to take a shower.
In order to truly share memory between multiple processes, the
`multiprocessing` module provides the [`Value`, `Array`, and `RawArray`
classes](https://docs.python.org/3/library/multiprocessing.html#shared-ctypes-objects),
which allow you to share individual values, or arrays of values, respectively.
The `Array` and `RawArray` classes essentially wrap a typed pointer (from the
built-in [`ctypes`](https://docs.python.org/3/library/ctypes.html) module) to
a block of memory. We can use the `Array` or `RawArray` class to share a Numpy
array between our worker processes. The difference between an `Array` and a
`RawArray` is that the former offers low-level synchronised
(i.e. process-safe) access to the shared memory. This is necessary if your
child processes will be modifying the same parts of your data.
> If you need fine-grained control over synchronising access to shared data by
> multiple processes, all of the [synchronisation
> primitives](https://docs.python.org/3/library/multiprocessing.html#synchronization-between-processes)
> from the `multiprocessing` module are at your disposal.
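For example, the `Array` class wraps its shared memory in a lock, which can be
used to serialise access to it from different processes (a minimal sketch -
`counts` would need to be created at the module level, before the `Pool`, as
described above):
> ```
> import multiprocessing as mp
>
> # a process-safe shared array of four ints
> counts = mp.Array('i', [0, 0, 0, 0])
>
> def increment(idx):
>     # only one process at a time can hold
>     # the lock and update the array
>     with counts.get_lock():
>         counts[idx] += 1
> ```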
The requirements for sharing memory between processes still apply here - we
need to make our data accessible at the *module level*, and we need to create
our data before creating the `Pool`. And to achieve read and write capability,
we also need to make sure that our input and output arrays are located in
shared memory - we can do this via the `Array` or `RawArray`.
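The basic pattern for getting a Numpy view onto a shared block of memory looks
something like this (a sketch, assuming a 10x10 array of doubles):
> ```
> import ctypes
> import multiprocessing as mp
> import numpy as np
>
> # allocate a shared block of 100 doubles,
> # and view it as a 10x10 Numpy array
> shared = mp.RawArray(ctypes.c_double, 100)
> view   = np.ctypeslib.as_array(shared).reshape((10, 10))
> ```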
As an example, let's say we want to parallelise processing of an image by As an example, let's say we want to parallelise processing of an image by
having each worker process perform calculations on a chunk of the image. having each worker process perform calculations on a chunk of the image.
First, let's define a function which does the calculation on a specified set First, let's define a function which does the calculation on a specified set
of image coordinates: of image coordinates:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
import multiprocessing as mp import multiprocessing as mp
import ctypes import ctypes
import numpy as np import numpy as np
np.set_printoptions(suppress=True) np.set_printoptions(suppress=True)
def process_chunk(shape, idxs): def process_chunk(shape, idxs):
# Get references to our # Get references to our
# input/output data, and # input/output data, and
# create Numpy array views # create Numpy array views
# into them. # into them.
sindata = process_chunk.input_data sindata = process_chunk.input_data
soutdata = process_chunk.output_data soutdata = process_chunk.output_data
indata = np.ctypeslib.as_array(sindata) .reshape(shape) indata = np.ctypeslib.as_array(sindata) .reshape(shape)
outdata = np.ctypeslib.as_array(soutdata).reshape(shape) outdata = np.ctypeslib.as_array(soutdata).reshape(shape)
# Do the calculation on # Do the calculation on
# the specified voxels # the specified voxels
outdata[idxs] = indata[idxs] ** 2 outdata[idxs] = indata[idxs] ** 2
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Rather than passing the input and output data arrays in as arguments to the Rather than passing the input and output data arrays in as arguments to the
`process_chunk` function, we set them as attributes of the `process_chunk` `process_chunk` function, we set them as attributes of the `process_chunk`
function. This makes the input/output data accessible at the module level, function. This makes the input/output data accessible at the module level,
which is required in order to share the data between the main process and the which is required in order to share the data between the main process and the
child processes. child processes.
Now let's define a second function which processes an entire image. It does the Now let's define a second function which processes an entire image. It does the
following: following:
1. Initialises shared memory areas to store the input and output data. 1. Initialises shared memory areas to store the input and output data.
2. Copies the input data into shared memory. 2. Copies the input data into shared memory.
3. Sets the input and output data as attributes of the `process_chunk` function. 3. Sets the input and output data as attributes of the `process_chunk` function.
4. Creates sets of indices into the input data which, for each worker process, 4. Creates sets of indices into the input data which, for each worker process,
specifies the portion of the data that it is responsible for. specifies the portion of the data that it is responsible for.
5. Creates a worker pool, and runs the `process_chunk` function for each set 5. Creates a worker pool, and runs the `process_chunk` function for each set
of indices. of indices.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
def process_dataset(data): def process_dataset(data):
nprocs = 8 nprocs = 8
origData = data origData = data
# Create arrays to store the # Create arrays to store the
# input and output data # input and output data
sindata = mp.RawArray(ctypes.c_double, data.size) sindata = mp.RawArray(ctypes.c_double, data.size)
soutdata = mp.RawArray(ctypes.c_double, data.size) soutdata = mp.RawArray(ctypes.c_double, data.size)
data = np.ctypeslib.as_array(sindata).reshape(data.shape) data = np.ctypeslib.as_array(sindata).reshape(data.shape)
outdata = np.ctypeslib.as_array(soutdata).reshape(data.shape) outdata = np.ctypeslib.as_array(soutdata).reshape(data.shape)
# Copy the input data # Copy the input data
# into shared memory # into shared memory
data[:] = origData data[:] = origData
# Make the input/output data # Make the input/output data
# accessible to the process_chunk # accessible to the process_chunk
# function. This must be done # function. This must be done
# *before* the worker pool is created. # *before* the worker pool is
# created - even though we are
# doing things differently to the
# read-only example, we are still
# making the data arrays accessible
# at the *module* level, so the
# memory they are stored in can be
# shared with the child processes.
process_chunk.input_data = sindata process_chunk.input_data = sindata
process_chunk.output_data = soutdata process_chunk.output_data = soutdata
# number of boxels to be computed # number of voxels to be computed
# by each worker process. # by each worker process.
nvox = int(data.size / nprocs) nvox = int(data.size / nprocs)
# Generate coordinates for # Generate coordinates for
# every voxel in the image # every voxel in the image
xlen, ylen, zlen = data.shape xlen, ylen, zlen = data.shape
xs, ys, zs = np.meshgrid(np.arange(xlen), xs, ys, zs = np.meshgrid(np.arange(xlen),
np.arange(ylen), np.arange(ylen),
np.arange(zlen)) np.arange(zlen))
xs = xs.flatten() xs = xs.flatten()
ys = ys.flatten() ys = ys.flatten()
zs = zs.flatten() zs = zs.flatten()
# We're going to pass each worker # We're going to pass each worker
# process a list of indices, which # process a list of indices, which
# specify the data items which that # specify the data items which that
# worker process needs to compute. # worker process needs to compute.
xs = [xs[nvox * i:nvox * i + nvox] for i in range(nprocs)] + \ xs = [xs[nvox * i:nvox * i + nvox] for i in range(nprocs)] + [xs[nvox * nprocs:]]
[xs[nvox * nprocs:]] ys = [ys[nvox * i:nvox * i + nvox] for i in range(nprocs)] + [ys[nvox * nprocs:]]
ys = [ys[nvox * i:nvox * i + nvox] for i in range(nprocs)] + \ zs = [zs[nvox * i:nvox * i + nvox] for i in range(nprocs)] + [zs[nvox * nprocs:]]
[ys[nvox * nprocs:]]
zs = [zs[nvox * i:nvox * i + nvox] for i in range(nprocs)] + \
[zs[nvox * nprocs:]]
# Build the argument lists for # Build the argument lists for
# each worker process. # each worker process.
args = [(data.shape, (x, y, z)) for x, y, z in zip(xs, ys, zs)] args = [(data.shape, (x, y, z)) for x, y, z in zip(xs, ys, zs)]
# Create a pool of worker # Create a pool of worker
# processes and run the jobs. # processes and run the jobs.
pool = mp.Pool(processes=nprocs) with mp.Pool(processes=nprocs) as pool:
pool.starmap(process_chunk, args)
pool.starmap(process_chunk, args)
return outdata return outdata
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Now we can call our `process_dataset` function just like any other function: Now we can call our `process_dataset` function just like any other function:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` ```
data = np.array(np.arange(64).reshape((4, 4, 4)), dtype=np.float64) indata = np.array(np.arange(64).reshape((4, 4, 4)), dtype=np.float64)
outdata = process_dataset(indata)
outdata = process_dataset(data)
print('Input') print('Input')
print(data) print(indata)
print('Output') print('Output')
print(outdata) print(outdata)
``` ```
......
...@@ -2,20 +2,47 @@ ...@@ -2,20 +2,47 @@
The Python language has built-in support for multi-threading in the The Python language has built-in support for multi-threading in the
[`threading`](https://docs.python.org/3.5/library/threading.html) module, and [`threading`](https://docs.python.org/3/library/threading.html) module, and
true parallelism in the true parallelism in the
[`multiprocessing`](https://docs.python.org/3.5/library/multiprocessing.html) [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html)
module. If you want to be impressed, skip straight to the section on module. If you want to be impressed, skip straight to the section on
[`multiprocessing`](#multiprocessing). [`multiprocessing`](#multiprocessing).
> *Note*: If you are familiar with a "real" programming language such as C++
> or Java, you might be disappointed with the native support for parallelism in
> Python. Python threads do not run in parallel because of the Global
> Interpreter Lock, and if you use `multiprocessing`, be prepared to either
> bear the performance hit of copying data between processes, or jump through
> hoops in order to share data between processes.
>
> This limitation *might* be solved in a future Python release by way of
> [*sub-interpreters*](https://www.python.org/dev/peps/pep-0554/), but the
> author of this practical is not holding his breath.
* [Threading](#threading)
* [Subclassing `Thread`](#subclassing-thread)
* [Daemon threads](#daemon-threads)
* [Thread synchronisation](#thread-synchronisation)
* [`Lock`](#lock)
* [`Event`](#event)
* [The Global Interpreter Lock (GIL)](#the-global-interpreter-lock-gil)
* [Multiprocessing](#multiprocessing)
* [`threading`-equivalent API](#threading-equivalent-api)
* [Higher-level API - the `multiprocessing.Pool`](#higher-level-api-the-multiprocessing-pool)
* [`Pool.map`](#pool-map)
* [`Pool.apply_async`](#pool-apply-async)
* [Sharing data between processes](#sharing-data-between-processes)
* [Read-only sharing](#read-only-sharing)
* [Read/write sharing](#read-write-sharing)
<a class="anchor" id="threading"></a>
## Threading ## Threading
The [`threading`](https://docs.python.org/3.5/library/threading.html) module The [`threading`](https://docs.python.org/3/library/threading.html) module
provides a traditional multi-threading API that should be familiar to you if provides a traditional multi-threading API that should be familiar to you if
you have worked with threads in other languages. you have worked with threads in other languages.
...@@ -60,6 +87,7 @@ print('Finished!') ...@@ -60,6 +87,7 @@ print('Finished!')
``` ```
<a class="anchor" id="subclassing-thread"></a>
### Subclassing `Thread` ### Subclassing `Thread`
...@@ -86,6 +114,7 @@ print('Done') ...@@ -86,6 +114,7 @@ print('Done')
``` ```
<a class="anchor" id="daemon-threads"></a>
### Daemon threads ### Daemon threads
...@@ -107,26 +136,28 @@ t.daemon = True ...@@ -107,26 +136,28 @@ t.daemon = True
See the [`Thread` See the [`Thread`
documentation](https://docs.python.org/3.5/library/threading.html#thread-objects) documentation](https://docs.python.org/3/library/threading.html#thread-objects)
for more details. for more details.
<a class="anchor" id="thread-synchronisation"></a>
### Thread synchronisation ### Thread synchronisation
The `threading` module provides some useful thread-synchronisation primitives The `threading` module provides some useful thread-synchronisation primitives
- the `Lock`, `RLock` (re-entrant `Lock`), and `Event` classes. The - the `Lock`, `RLock` (re-entrant `Lock`), and `Event` classes. The
`threading` module also provides `Condition` and `Semaphore` classes - refer `threading` module also provides `Condition` and `Semaphore` classes - refer
to the [documentation](https://docs.python.org/3.5/library/threading.html) for to the [documentation](https://docs.python.org/3/library/threading.html) for
more details. more details.
<a class="anchor" id="lock"></a>
#### `Lock` #### `Lock`
The [`Lock`](https://docs.python.org/3.5/library/threading.html#lock-objects) The [`Lock`](https://docs.python.org/3/library/threading.html#lock-objects)
class (and its re-entrant version, the class (and its re-entrant version, the
[`RLock`](https://docs.python.org/3.5/library/threading.html#rlock-objects)) [`RLock`](https://docs.python.org/3/library/threading.html#rlock-objects))
prevents a block of code from being accessed by more than one thread at a prevents a block of code from being accessed by more than one thread at a
time. For example, if we have multiple threads running this `task` function, time. For example, if we have multiple threads running this `task` function,
their [outputs](https://www.youtube.com/watch?v=F5fUFnfPpYU) will inevitably their [outputs](https://www.youtube.com/watch?v=F5fUFnfPpYU) will inevitably
...@@ -225,11 +256,12 @@ Try removing the `mutex` lock from the two methods in the above code, and see ...@@ -225,11 +256,12 @@ Try removing the `mutex` lock from the two methods in the above code, and see
what it does to the output. what it does to the output.
<a class="anchor" id="event"></a>
#### `Event` #### `Event`
The The
[`Event`](https://docs.python.org/3.5/library/threading.html#event-objects) [`Event`](https://docs.python.org/3/library/threading.html#event-objects)
class is essentially a boolean [semaphore][semaphore-wiki]. It can be used to class is essentially a boolean [semaphore][semaphore-wiki]. It can be used to
signal events between threads. Threads can `wait` on the event, and be awoken signal events between threads. Threads can `wait` on the event, and be awoken
when the event is `set` by another thread: when the event is `set` by another thread:
...@@ -258,11 +290,13 @@ processingFinished.wait() ...@@ -258,11 +290,13 @@ processingFinished.wait()
print('Processing finished!') print('Processing finished!')
``` ```
<a class="anchor" id="the-global-interpreter-lock-gil"></a>
### The Global Interpreter Lock (GIL)
The [*Global Interpreter
Lock*](https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock)
is an implementation detail of [CPython](https://github.com/python/cpython)
(the official Python interpreter). The GIL means that a multi-threaded
program written in pure Python is not able to take advantage of multiple
...@@ -278,11 +312,12 @@ running on one core, whilst having another thread (e.g. user interaction)
running on another core.
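> You can see the GIL in action by timing a CPU-bound function (the
> `countdown` below is just a toy example) run once on the main thread, and
> then split across two threads - the threaded version will typically take
> just as long:
>
> ```
> import threading
> import time
>
> def countdown(n):
>     while n > 0:
>         n -= 1
>
> n = 5000000
>
> start = time.time()
> countdown(2 * n)
> print('One thread:  {:.2f}s'.format(time.time() - start))
>
> start = time.time()
> t1 = threading.Thread(target=countdown, args=(n,))
> t2 = threading.Thread(target=countdown, args=(n,))
> t1.start(); t2.start()
> t1.join();  t2.join()
> print('Two threads: {:.2f}s'.format(time.time() - start))
> ```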
<a class="anchor" id="multiprocessing"></a>
## Multiprocessing
For true parallelism, you should check out the
[`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html)
module.
...@@ -292,15 +327,22 @@ from. It provides two APIs - a "traditional" equivalent to that provided by
the `threading` module, and a powerful higher-level API.
> Python also provides the
> [`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html)
> module, which offers a simpler alternative API to `multiprocessing`. It
> offers no functionality over `multiprocessing`, so is not covered here.
<a class="anchor" id="threading-equivalent-api"></a>
### `threading`-equivalent API
The
[`Process`](https://docs.python.org/3/library/multiprocessing.html#the-process-class)
class is the `multiprocessing` equivalent of the
[`threading.Thread`](https://docs.python.org/3/library/threading.html#thread-objects)
class. `multiprocessing` also has equivalents of the [`Lock` and `Event`
classes](https://docs.python.org/3/library/multiprocessing.html#synchronization-between-processes),
and the other synchronisation primitives provided by `threading`.
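> Creating and running a `Process` looks just like creating and running a
> `Thread` - a minimal sketch, using a toy `crunch` function:
>
> ```
> import multiprocessing as mp
>
> def crunch(jobid):
>     print('Crunching job', jobid)
>
> # the __main__ guard is good practice on platforms
> # which spawn (rather than fork) child processes
> if __name__ == '__main__':
>     procs = [mp.Process(target=crunch, args=(i,)) for i in range(4)]
>     for p in procs: p.start()
>     for p in procs: p.join()
> ```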
...@@ -309,22 +351,41 @@ and you will have true parallelism.
Because your "threads" are now independent processes, you need to be a little
careful about how to share information across them. If you only need to share
small amounts of data, you can use the [`Queue` and `Pipe`
classes](https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes),
in the `multiprocessing` module. If you are working with large amounts of data
where copying between processes is not feasible, things become more
complicated, but read on...
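> For example, a child process can send a small result back to its parent via
> a `Queue` - a minimal sketch:
>
> ```
> import multiprocessing as mp
>
> def worker(q):
>     # send a (small, pickleable) result back to the parent
>     q.put('hello from the child process')
>
> if __name__ == '__main__':
>     q = mp.Queue()
>     p = mp.Process(target=worker, args=(q,))
>     p.start()
>     print(q.get())
>     p.join()
> ```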
<a class="anchor" id="higher-level-api-the-multiprocessing-pool"></a>
### Higher-level API - the `multiprocessing.Pool`
The real advantages of `multiprocessing` lie in its higher level API, centered
around the [`Pool`
class](https://docs.python.org/3/library/multiprocessing.html#using-a-pool-of-workers).
Essentially, you create a `Pool` of worker processes - you specify the number
of processes when you create the pool. Once you have created a `Pool`, you can
use its methods to automatically parallelise tasks. The most useful are the
`map`, `starmap` and `apply_async` methods.
The `Pool` class is a context manager, so can be used in a `with` statement,
e.g.:
> ```
> with mp.Pool(processes=16) as pool:
>     # do stuff with the pool
> ```
It is possible to create a `Pool` outside of a `with` statement, but in this
case you must ensure that you call its `close` method when you are finished.
Using a `Pool` in a `with` statement is therefore recommended, because you know
that it will be shut down correctly, even in the event of an error.
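> If you do need to manage a `Pool` manually, the pattern looks something like
> this (a sketch, using a toy `double` function):
>
> ```
> import multiprocessing as mp
>
> def double(x):
>     return x * 2
>
> pool = mp.Pool(processes=16)
> try:
>     results = pool.map(double, range(10))
> finally:
>     # close() stops new jobs from being submitted,
>     # join() waits for the workers to finish
>     pool.close()
>     pool.join()
> print(results)
> ```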
> The best number of processes to use for a `Pool` will depend on the system
...@@ -332,18 +393,14 @@ of processes when you create the pool.
> I/O bound or CPU bound).
<a class="anchor" id="pool-map"></a>
#### `Pool.map`
The
[`Pool.map`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.map)
method is the multiprocessing equivalent of the built-in
[`map`](https://docs.python.org/3/library/functions.html#map) function - it
is given a function, and a sequence, and it applies the function to each
element in the sequence.
...@@ -373,13 +430,14 @@ def crunchImage(imgfile):
imgfiles = ['{:02d}.nii.gz'.format(i) for i in range(20)]
print('Crunching images...')
start = time.time()
with mp.Pool(processes=16) as p:
    results = p.map(crunchImage, imgfiles)
end = time.time()
print('Total execution time: {:0.2f} seconds'.format(end - start))
```
...@@ -388,7 +446,7 @@ print('Total execution time: {:0.2f} seconds'.format(end - start))
The `Pool.map` method only works with functions that accept one argument, such
as our `crunchImage` function above. If you have a function which accepts
multiple arguments, use the
[`Pool.starmap`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.starmap)
method instead:
...@@ -411,15 +469,16 @@ imgfiles = ['t1_{:02d}.nii.gz'.format(i) for i in range(10)] + \
            ['t2_{:02d}.nii.gz'.format(i) for i in range(10)]
modalities = ['t1'] * 10 + ['t2'] * 10
args = [(f, m) for f, m in zip(imgfiles, modalities)]
print('Crunching images...')
start = time.time()
with mp.Pool(processes=16) as pool:
    results = pool.starmap(crunchImage, args)
end = time.time()
print('Total execution time: {:0.2f} seconds'.format(end - start))
```
...@@ -427,24 +486,25 @@ print('Total execution time: {:0.2f} seconds'.format(end - start))
The `map` and `starmap` methods also have asynchronous equivalents `map_async`
and `starmap_async`, which return immediately. Refer to the
[`Pool`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool)
documentation for more details.
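> For example, `map_async` hands you an `AsyncResult` straight away, which you
> can `get` later on - a small sketch, re-using the `crunchImage` function and
> `imgfiles` list from above:
>
> ```
> import multiprocessing as mp
>
> with mp.Pool(processes=16) as p:
>     async_result = p.map_async(crunchImage, imgfiles)
>     # ... do other work here while the images are being crunched ...
>     results = async_result.get()   # blocks until all jobs have finished
> ```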
<a class="anchor" id="pool-apply-async"></a>
#### `Pool.apply_async`
The
[`Pool.apply`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply)
method will execute a function on one of the processes, and block until it has
finished. The
[`Pool.apply_async`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async)
method returns immediately, and is thus more suited to asynchronously
scheduling multiple jobs to run in parallel.
`apply_async` returns an object of type
[`AsyncResult`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.AsyncResult).
An `AsyncResult` object has `wait` and `get` methods which will block until
the job has completed.
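> A minimal sketch of this pattern, again re-using the `crunchImage` function
> and `imgfiles` list from above:
>
> ```
> import multiprocessing as mp
>
> with mp.Pool(processes=16) as pool:
>     # schedule all of the jobs ...
>     jobs    = [pool.apply_async(crunchImage, (f,)) for f in imgfiles]
>     # ... then block on each one in turn
>     results = [job.get() for job in jobs]
> ```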
...@@ -472,24 +532,24 @@ def nonlinear_registration(src, ref, affine):
t1s = ['{:02d}_t1.nii.gz'.format(i) for i in range(20)]
std = 'MNI152_T1_2mm.nii.gz'
print('Running structural-to-standard registration '
      'on {} subjects...'.format(len(t1s)))
# Run linear registration on all the T1s.
start = time.time()
with mp.Pool(processes=16) as pool:
    # We build a list of AsyncResult objects
    linresults = [pool.apply_async(linear_registration, (t1, std))
                  for t1 in t1s]
    # Then we wait for each job to finish,
    # and replace its AsyncResult object
    # with the actual result - an affine
    # transformation matrix.
    for i, r in enumerate(linresults):
        linresults[i] = r.get()
end = time.time()
print('Linear registrations completed in '
...@@ -497,14 +557,16 @@ print('Linear registrations completed in '
# Run non-linear registration on all the T1s,
# using the linear registrations to initialise.
start = time.time()
with mp.Pool(processes=16) as pool:
    nlinresults = [pool.apply_async(nonlinear_registration, (t1, std, aff))
                   for (t1, aff) in zip(t1s, linresults)]
    # Wait for each non-linear reg to finish,
    # and store the resulting warp field.
    for i, r in enumerate(nlinresults):
        nlinresults[i] = r.get()
end = time.time()
print('Non-linear registrations completed in '
...@@ -516,7 +578,8 @@ for t1, result in zip(t1s, nlinresults):
```
<a class="anchor" id="sharing-data-between-processes"></a>
## Sharing data between processes
When you use the `Pool.map` method (or any of the other methods we have shown)
...@@ -526,9 +589,9 @@ the data that they return then has to be copied back to the parent process.
Any items which you wish to pass to a function that is executed by a `Pool`
must be *pickleable*<sup>1</sup> - the built-in
[`pickle`](https://docs.python.org/3/library/pickle.html) module is used by
`multiprocessing` to serialise and de-serialise the data passed to and
returned from a child process. The majority of standard Python types (`list`,
`dict`, `str` etc), and Numpy arrays can be pickled and unpickled, so you only
need to worry about this detail if you are passing objects of a custom type
...@@ -536,36 +599,196 @@ need to worry about this detail if you are passing objects of a custom type
third-party library).
> <sup>1</sup>*Pickleable* is the term used in the Python world to refer to
> something that is *serialisable* - basically, the process of converting an
> in-memory object into a binary form that can be stored and/or transmitted.
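> For example, you can try the `pickle` module directly - most ordinary Python
> objects survive the round trip:
>
> ```
> import pickle
>
> payload = {'subject': 'sub_01', 'scores': [1.0, 2.5, 3.2]}
> blob    = pickle.dumps(payload)       # serialise to a byte string
> print(type(blob), len(blob))
> print(pickle.loads(blob) == payload)  # de-serialise - prints True
> ```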
There is obviously some overhead in copying data back and forth between the
main process and the worker processes; this may or may not be a problem. For
most computationally intensive tasks, this communication overhead is not
important - the performance bottleneck is typically going to be the
computation time, rather than I/O between the parent and child processes.
However, if you are working with a large dataset, and you have determined that
copying data between processes is having a substantial impact on your
performance, and instead wish to *share* a single copy of the data between
the processes, you will need to:
1. Structure your code so that the data you want to share is accessible at
   the *module level*.
2. Define/create/load the data *before* creating the `Pool`.
This is because, when you create a `Pool`, what actually happens is that the
process your Python script is running in will [**fork**][wiki-fork] itself -
the child processes that are created are used as the worker processes by the
`Pool`. And if you create/load your data in your main process *before* this
fork occurs, all of the child processes will inherit the memory space of the
main process, and will therefore have (read-only) access to the data, without
any copying required.
[wiki-fork]: https://en.wikipedia.org/wiki/Fork_(system_call)
<a class="anchor" id="read-only-sharing"></a>
### Read-only sharing
Let's see this in action with a simple example. We'll start by defining a
horrible little helper function which allows us to track the total memory
usage:
```
import sys
import subprocess as sp
def memusage(msg):
    if sys.platform == 'darwin':
        total = sp.run(['sysctl', 'hw.memsize'], capture_output=True).stdout.decode()
        total = int(total.split()[1]) // 1048576
        usage = sp.run('vm_stat', capture_output=True).stdout.decode()
        usage = usage.strip().split('\n')
        usage = [l.split(':') for l in usage]
        usage = {k.strip() : v.strip() for k, v in usage}
        usage = int(usage['Pages free'][:-1]) / 256.0
        usage = int(total - usage)
    else:
        stdout = sp.run(['free', '--mega'], capture_output=True).stdout.decode()
        stdout = stdout.split('\n')[1].split()
        total = int(stdout[1])
        usage = int(stdout[2])
    print('Memory usage {}: {} / {} MB'.format(msg, usage, total))
```
Now our task is simply to calculate the sum of a large array of numbers. We're
going to create a big chunk of data, and process it in chunks, keeping track
of memory usage as the task progresses:
```
import time
import multiprocessing as mp
import numpy as np
memusage('before creating data')
# allocate 500MB of data
data = np.random.random(500 * (1048576 // 8))
# Assign nelems values to each worker
# process (hard-coded so we need 12
# jobs to complete the task)
nelems = len(data) // 12
memusage('after creating data')
# Each job processes nelems values,
# starting from the specified offset
def process_chunk(offset):
    time.sleep(1)
    return data[offset:offset + nelems].sum()
# Generate an offset into the data for each job -
# we will call process_chunk for each offset
offsets = range(0, len(data), nelems)
# Create our worker process pool
with mp.Pool(4) as pool:
    results = pool.map_async(process_chunk, offsets)
    # Wait for all of the jobs to finish
    elapsed = 0
    while not results.ready():
        memusage('after {} seconds'.format(elapsed))
        time.sleep(1)
        elapsed += 1
    results = results.get()
print('Total sum: ', sum(results))
print('Sanity check:', data.sum())
```
You should be able to see that only one copy of `data` is created, and is
shared by all of the worker processes without any copying taking place.
So things are reasonably straightforward if you only need read-only access to
your data. But what if your worker processes need to be able to modify the
data? Go back to the code block above and:
1. Modify the `process_chunk` function so that it modifies every element of
its assigned portion of the data before the call to `time.sleep`. For
example:
> ```
> data[offset:offset + nelems] += 1
> ```
2. Restart the Jupyter notebook kernel (*Kernel -> Restart*) - this example is
somewhat dependent on the behaviour of the Python garbage collector, so it
helps to start afresh
3. Re-run the two code blocks, and watch what happens to the memory usage.
What happened? Well, you are seeing [copy-on-write][wiki-copy-on-write] in
action. When the `process_chunk` function is invoked, it is given a reference
to the original data array in the memory space of the parent process. But as
soon as an attempt is made to modify it, a copy of the data, in the memory
space of the child process, is created. The modifications are then applied to
this child process copy, and not to the original copy. So the total memory
usage has blown out to twice as much as before, and the changes made by each
child process are being lost!
[wiki-copy-on-write]: https://en.wikipedia.org/wiki/Copy-on-write
<a class="anchor" id="read-write-sharing"></a>
### Read/write sharing
> If you have worked with a real programming language with true parallelism
> and shared memory via within-process multi-threading, feel free to take a
> break at this point. Breathe. Relax. Go punch a hole in a wall. I've been
> coding in Python for years, and this still makes me angry. Sometimes
> ... don't tell anyone I said this ... I even find myself wishing I were
> coding in *Java* instead of Python. Ugh. I need to take a shower.
In order to truly share memory between multiple processes, the
`multiprocessing` module provides the [`Value`, `Array`, and `RawArray`
classes](https://docs.python.org/3/library/multiprocessing.html#shared-ctypes-objects),
which allow you to share individual values, or arrays of values, respectively.
The `Array` and `RawArray` classes essentially wrap a typed pointer (from the
built-in [`ctypes`](https://docs.python.org/3/library/ctypes.html) module) to
a block of memory. We can use the `Array` or `RawArray` class to share a Numpy
array between our worker processes. The difference between an `Array` and a
`RawArray` is that the former offers low-level synchronised
(i.e. process-safe) access to the shared memory. This is necessary if your
child processes will be modifying the same parts of your data.
> If you need fine-grained control over synchronising access to shared data by
> multiple processes, all of the [synchronisation
> primitives](https://docs.python.org/3/library/multiprocessing.html#synchronization-between-processes)
> from the `multiprocessing` module are at your disposal.
The requirements for sharing memory between processes still apply here - we
need to make our data accessible at the *module level*, and we need to create
our data before creating the `Pool`. And to achieve read and write capability,
we also need to make sure that our input and output arrays are located in
shared memory - we can do this via the `Array` or `RawArray`.
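> The basic trick is to allocate the memory as a `RawArray` (or `Array`), and
> then create a Numpy array which is a *view* onto that memory - a minimal
> sketch:
>
> ```
> import multiprocessing as mp
> import numpy as np
>
> # Allocate shared memory for 1000 float64 values ('d' == double)
> shared = mp.RawArray('d', 1000)
>
> # Wrap the shared memory in a numpy array - this is
> # a view onto the same memory, not a copy of it
> data = np.frombuffer(shared, dtype=np.float64)
> data[:] = np.random.random(1000)
> print(data.sum())
> ```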
As an example, let's say we want to parallelise processing of an image by
...@@ -638,11 +861,18 @@ def process_dataset(data):
    # Make the input/output data
    # accessible to the process_chunk
    # function. This must be done
    # *before* the worker pool is
    # created - even though we are
    # doing things differently to the
    # read-only example, we are still
    # making the data arrays accessible
    # at the *module* level, so the
    # memory they are stored in can be
    # shared with the child processes.
    process_chunk.input_data = sindata
    process_chunk.output_data = soutdata
    # number of voxels to be computed
    # by each worker process.
    nvox = int(data.size / nprocs)
...@@ -661,12 +891,9 @@ def process_dataset(data):
    # process a list of indices, which
    # specify the data items which that
    # worker process needs to compute.
    xs = [xs[nvox * i:nvox * i + nvox] for i in range(nprocs)] + [xs[nvox * nprocs:]]
    ys = [ys[nvox * i:nvox * i + nvox] for i in range(nprocs)] + [ys[nvox * nprocs:]]
    zs = [zs[nvox * i:nvox * i + nvox] for i in range(nprocs)] + [zs[nvox * nprocs:]]
    # Build the argument lists for
    # each worker process.
...@@ -674,9 +901,8 @@ def process_dataset(data):
    # Create a pool of worker
    # processes and run the jobs.
    with mp.Pool(processes=nprocs) as pool:
        pool.starmap(process_chunk, args)
    return outdata
```
...@@ -686,12 +912,11 @@ Now we can call our `process_data` function just like any other function:
```
indata = np.array(np.arange(64).reshape((4, 4, 4)), dtype=np.float64)
outdata = process_dataset(indata)
print('Input')
print(indata)
print('Output')
print(outdata)
...
%% Cell type:markdown id: tags:
# `fslpy`
**Important:** Portions of this practical require `fslpy` 2.9.0, due to be
released with FSL 6.0.4, in Spring 2020.
[`fslpy`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/) is a
Python library which is built into FSL, and contains a range of functionality
for working with FSL and with neuroimaging data from Python.
This practical highlights some of the most useful features provided by
`fslpy`. You may find `fslpy` useful if you are writing Python code to
perform analyses and image processing in conjunction with FSL.
* [The `Image` class, and other data types](#the-image-class-and-other-data-types)
* [Creating images](#creating-images)
* [Working with image data](#working-with-image-data)
* [Loading other file types](#loading-other-file-types)
* [NIfTI coordinate systems](#nifti-coordinate-systems)
* [Transformations and resampling](#transformations-and-resampling)
* [FSL wrapper functions](#fsl-wrapper-functions)
* [In-memory images](#in-memory-images)
* [Loading outputs into Python](#loading-outputs-into-python)
* [The `fslmaths` wrapper](#the-fslmaths-wrapper)
* [The `FileTree`](#the-filetree)
* [Describing your data](#describing-your-data)
* [Using the `FileTree`](#using-the-filetree)
* [Building a processing pipeline with `FileTree`](#building-a-processing-pipeline-with-filetree)
* [The `FileTreeQuery`](#the-filetreequery)
* [Calling shell commands](#calling-shell-commands)
* [The `runfsl` function](#the-runfsl-function)
* [Submitting to the cluster](#submitting-to-the-cluster)
* [Redirecting output](#redirecting-output)
* [FSL atlases](#fsl-atlases)
* [Querying atlases](#querying-atlases)
* [Loading atlas images](#loading-atlas-images)
* [Working with atlases](#working-with-atlases)
> **Note**: `fslpy` is distinct from `fslpython` - `fslpython` is the Python
> environment that is baked into FSL. `fslpy` is a Python library which is
> installed into the `fslpython` environment.
Let's start with some standard imports and environment set-up:
%% Cell type:code id: tags:
```
%matplotlib inline
import matplotlib.pyplot as plt
import os
import os.path as op
import nibabel as nib
import numpy as np
import warnings
warnings.filterwarnings("ignore")
np.set_printoptions(suppress=True, precision=4)
```
%% Cell type:markdown id: tags:
And a little function that we can use to generate a simple orthographic plot:
%% Cell type:code id: tags:
```
def ortho(data, voxel, fig=None, cursor=False, **kwargs):
"""Simple orthographic plot of a 3D array using matplotlib.
:arg data: 3D numpy array
:arg voxel: XYZ coordinates for each slice
:arg fig: Existing figure and axes for overlay plotting
:arg cursor: Show a cursor at the `voxel`
All other arguments are passed through to the `imshow` function.
:returns: The figure and orthogaxes (which can be passed back in as the
`fig` argument to plot overlays).
"""
voxel = [int(round(v)) for v in voxel]
data = np.asanyarray(data, dtype=np.float)
data[data <= 0] = np.nan
x, y, z = voxel
xslice = np.flipud(data[x, :, :].T)
yslice = np.flipud(data[:, y, :].T)
zslice = np.flipud(data[:, :, z].T)
if fig is None:
fig = plt.figure()
xax = fig.add_subplot(1, 3, 1)
yax = fig.add_subplot(1, 3, 2)
zax = fig.add_subplot(1, 3, 3)
else:
fig, xax, yax, zax = fig
xax.imshow(xslice, **kwargs)
yax.imshow(yslice, **kwargs)
zax.imshow(zslice, **kwargs)
if cursor:
cargs = {'color' : (0, 1, 0), 'linewidth' : 1}
xax.axvline( y, **cargs)
xax.axhline(data.shape[2] - z, **cargs)
yax.axvline( x, **cargs)
yax.axhline(data.shape[2] - z, **cargs)
zax.axvline( x, **cargs)
zax.axhline(data.shape[1] - y, **cargs)
for ax in (xax, yax, zax):
ax.set_xticks([])
ax.set_yticks([])
fig.tight_layout(pad=0)
return (fig, xax, yax, zax)
```
%% Cell type:markdown id: tags:
And another function which uses FSLeyes for more complex plots:
%% Cell type:code id: tags:
```
def render(cmdline):
    import shlex
    import IPython.display as display
    prefix = '-of screenshot.png -hl -c 2 '
    try:
        from fsleyes.render import main
        main(shlex.split(prefix + cmdline))
    except ImportError:
        # fall-back for macOS - we have to run
        # FSLeyes render in a separate process
        from fsl.utils.run import runfsl
        prefix = 'render ' + prefix
        runfsl(prefix + cmdline, env={})
    return display.Image('screenshot.png')
```
%% Cell type:markdown id: tags:
<a class="anchor" id="the-image-class-and-other-data-types"></a>
## The `Image` class, and other data types
The
[`fsl.data.image`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.image.html#fsl.data.image.Image)
module provides the `Image` class, which sits on top of `nibabel` and contains
some handy functionality if you need to work with coordinate transformations,
or do some FSL-specific processing. The `Image` class provides features such
as:
- Support for NIFTI1, NIFTI2, and ANALYZE image files
- Access to affine transformations between the voxel, FSL and world coordinate
systems
- Ability to load metadata from BIDS sidecar files
> The `Image` class behaves differently to the `nibabel.Nifti1Image`. For
> example, when you create an `Image` object, the default behaviour is to load
> the image data into memory. This is configurable however; take a look at
> [the
> documentation](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.image.html#fsl.data.image.Image)
> to explore all of the options.
Some simple image processing routines are also provided - these are covered
[below](#image-processing).
<a class="anchor" id="creating-images"></a>
### Creating images
It's easy to create an `Image` - you can create one from a file name:
%% Cell type:code id: tags:
```
from fsl.data.image import Image
stddir = op.expandvars('${FSLDIR}/data/standard/')
# load a FSL image - the file
# suffix is optional, just like
# in real FSL-land!
std1mm = Image(op.join(stddir, 'MNI152_T1_1mm'))
print(std1mm)
```
%% Cell type:markdown id: tags:
You can create an `Image` from an existing `nibabel` image:
%% Cell type:code id: tags:
```
# load a nibabel image, and
# convert it into an FSL image
nibimg = nib.load(op.join(stddir, 'MNI152_T1_1mm.nii.gz'))
std1mm = Image(nibimg)
```
%% Cell type:markdown id: tags:
Or you can create an `Image` from a `numpy` array:
%% Cell type:code id: tags:
```
data = np.zeros((182, 218, 182))
img = Image(data, xform=np.eye(4))
```
%% Cell type:markdown id: tags:
If you have generated some data from another `Image` (or from a
`nibabel.Nifti1Image`) you can use the `header` option to set
the header information on the new image:
%% Cell type:code id: tags:
```
img = Image(data, header=std1mm.header)
```
%% Cell type:markdown id: tags:
You can save an image to file via the `save` method:
%% Cell type:code id: tags:
```
img.save('empty')
!ls
```
%% Cell type:markdown id: tags:
`Image` objects have all of the attributes you might expect:
%% Cell type:code id: tags:
```
stddir = op.expandvars('${FSLDIR}/data/standard/')
std1mm = Image(op.join(stddir, 'MNI152_T1_1mm'))
print('name: ', std1mm.name)
print('file: ', std1mm.dataSource)
print('NIfTI version:', std1mm.niftiVersion)
print('ndim: ', std1mm.ndim)
print('shape: ', std1mm.shape)
print('dtype: ', std1mm.dtype)
print('nvals: ', std1mm.nvals)
print('pixdim: ', std1mm.pixdim)
```
%% Cell type:markdown id: tags:
and a number of useful methods:
%% Cell type:code id: tags:
```
std2mm = Image(op.join(stddir, 'MNI152_T1_2mm'))
mask2mm = Image(op.join(stddir, 'MNI152_T1_2mm_brain_mask'))
print(std1mm.sameSpace(std2mm))
print(std2mm.sameSpace(mask2mm))
print(std2mm.getAffine('voxel', 'world'))
```
%% Cell type:markdown id: tags:
An `Image` object is a high-level wrapper around a `nibabel` image object -
you can always work directly with the `nibabel` object via the `nibImage`
attribute:
%% Cell type:code id: tags:
```
print(std2mm)
print(std2mm.nibImage)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="working-with-image-data"></a>
### Working with image data
You can get the image data as a `numpy` array via the `data` attribute:
%% Cell type:code id: tags:
```
data = std2mm.data
print(data.min(), data.max())
ortho(data, (45, 54, 45))
```
%% Cell type:markdown id: tags:
> Note that `Image.data` will give you the data in its underlying type, unlike
> the `nibabel.get_fdata` method, which up-casts image data to floating-point.
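> For example (a small sketch, using the `std2mm` image loaded above):
>
> ```
> print('Image.data dtype: ', std2mm.data.dtype)
> print('get_fdata() dtype:', std2mm.nibImage.get_fdata().dtype)
> ```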
You can also read and write data directly via the `Image` object:
%% Cell type:code id: tags:
```
slc = std2mm[:, :, 45]
std2mm[0:10, :, :] *= 2
```
%% Cell type:markdown id: tags:
Doing so has some advantages that may or may not be useful, depending on your
use-case:
- The image data will be kept on disk - only the parts that you access will
be loaded into RAM (you will also need to pass `loadData=False` when creating
the `Image` to achieve this).
- The `Image` object will keep track of modifications to the data - this can
be queried via the `saveState` attribute.
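> A small sketch illustrating both of these points (re-using the `stddir`
> variable from above - the modification below only affects the in-memory
> data, nothing is written back to the file unless you call `save`):
>
> ```
> from fsl.data.image import Image
>
> img = Image(op.join(stddir, 'MNI152_T1_2mm'), loadData=False)
> slc = img[:, :, 45]      # only this slice is read from disk
> print(img.saveState)     # True  - no unsaved modifications
> img[0, 0, 0] = 99
> print(img.saveState)     # False - the data has been modified
> ```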
<a class="anchor" id="loading-other-file-types"></a>
### Loading other file types
The
[`fsl.data`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.html#module-fsl.data)
package has a number of other classes for working with different types of FSL
and neuroimaging data. Most of these are higher-level wrappers around the
corresponding `nibabel` types:
* The
[`fsl.data.bitmap.Bitmap`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.bitmap.html)
class can be used to load a bitmap image (e.g. `jpg`, `png`, etc) and
convert it to a NIfTI image.
* The
[`fsl.data.dicom.DicomImage`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.dicom.html)
class uses `dcm2niix` to load NIfTI images contained within a DICOM
directory<sup>*</sup>.
* The
[`fsl.data.mghimage.MGHImage`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.mghimage.html)
class can be used to load `.mgh`/`.mgz` images (they are converted into
NIfTI images).
* The
[`fsl.data.dtifit`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.dtifit.html)
module contains functions for loading and working with the output of the
FSL `dtifit` tool.
* The
[`fsl.data.featanalysis`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.featanalysis.html),
[`fsl.data.featimage`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.featimage.html),
and
[`fsl.data.featdesign`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.featdesign.html)
modules contain classes and functions for loading data from FEAT
directories.
* Similarly, the
[`fsl.data.melodicanalysis`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.melodicanalysis.html)
and
[`fsl.data.melodicimage`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.melodicimage.html)
modules contain classes and functions for loading data from MELODIC
directories.
* The
[`fsl.data.gifti`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.gifti.html),
[`fsl.data.freesurfer`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.freesurfer.html),
and
[`fsl.data.vtk`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.vtk.html)
modules contain functionality for loading surface data from GIfTI,
freesurfer, and ASCII VTK files respectively.
> <sup>*</sup>You must make sure that
> [`dcm2niix`](https://github.com/rordenlab/dcm2niix/) is installed on your
> system in order to use this class.
<a class="anchor" id="nifti-coordinate-systems"></a>
### NIfTI coordinate systems
The `Image.getAffine` method gives you access to affine transformations which
can be used to convert coordinates between the different coordinate systems
associated with a NIfTI image. Have some MNI coordinates you'd like to convert
to voxels? Easy!
%% Cell type:code id: tags:
```
stddir = op.expandvars('${FSLDIR}/data/standard/')
std2mm = Image(op.join(stddir, 'MNI152_T1_2mm'))
mnicoords = np.array([[0, 0, 0],
[0, -18, 18]])
world2vox = std2mm.getAffine('world', 'voxel')
vox2world = std2mm.getAffine('voxel', 'world')
# Apply the world->voxel
# affine to the coordinates
voxcoords = (np.dot(world2vox[:3, :3], mnicoords.T)).T + world2vox[:3, 3]
```
%% Cell type:markdown id: tags:
The code above is a bit fiddly, so instead of figuring it out, you can just
use the
[`affine.transform`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.affine.html#fsl.transform.affine.transform)
function:
%% Cell type:code id: tags:
```
from fsl.transform.affine import transform
voxcoords = transform(mnicoords, world2vox)
# just to double check, let's transform
# those voxel coordinates back into world
# coordinates
backtomni = transform(voxcoords, vox2world)
for m, v, b in zip(mnicoords, voxcoords, backtomni):
    print(m, '->', v, '->', b)
```
%% Cell type:markdown id: tags:
> The `Image.getAffine` method can give you transformation matrices
> between any of these coordinate systems:
>
> - `'voxel'`: Image data voxel coordinates
> - `'world'`: mm coordinates, defined by the sform/qform of an image
> - `'fsl'`: The FSL coordinate system, used internally by many FSL tools
> (e.g. FLIRT)
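> For example, to see how the FSL coordinate system relates to the other two
> for the MNI152 template loaded above (a small sketch):
>
> ```
> print('voxel -> fsl:')
> print(std2mm.getAffine('voxel', 'fsl'))
> print('fsl -> world:')
> print(std2mm.getAffine('fsl', 'world'))
> ```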
Oh, that example was too easy I hear you say? Try this one on for size. Let's
say we have run FEAT on some task fMRI data, and want to get the MNI
coordinates of the voxel with peak activation.
> This is what people used to use `Featquery` for, back in the un-enlightened
> days.
Let's start by identifying the voxel with the biggest t-statistic:
%% Cell type:code id: tags:
```
featdir = op.join('08_fslpy', 'fmri.feat')
tstat1 = Image(op.join(featdir, 'stats', 'tstat1')).data
# Recall from the numpy practical that
# argmax gives us a 1D index into a
# flattened view of the array. We can
# use the unravel_index function to
# convert it into a 3D index.
peakvox = np.abs(tstat1).argmax()
peakvox = np.unravel_index(peakvox, tstat1.shape)
print('Peak voxel coordinates for tstat1:', peakvox, tstat1[peakvox])
```
%% Cell type:markdown id: tags:
Now that we've got the voxel coordinates in functional space, we need to
transform them into MNI space. FEAT provides a transformation which goes
directly from functional to standard space, in the `reg` directory:
%% Cell type:code id: tags:
```
func2std = np.loadtxt(op.join(featdir, 'reg', 'example_func2standard.mat'))
```
%% Cell type:markdown id: tags:
But ... wait a minute ... this is a FLIRT matrix! We can't just plug voxel
coordinates into a FLIRT matrix and expect to get sensible results, because
FLIRT works in an internal FSL coordinate system, which is not quite
`'voxel'`, and not quite `'world'`. So we need to do a little more work.
Let's start by loading our functional image, and the MNI152 template (the
source and reference images of our FLIRT matrix):
%% Cell type:code id: tags:
```
func = Image(op.join(featdir, 'reg', 'example_func'))
std = Image(op.expandvars(op.join('$FSLDIR', 'data', 'standard', 'MNI152_T1_2mm')))
```
%% Cell type:markdown id: tags:
Now we can use them to get affines which convert between all of the different
coordinate systems - we're going to combine them into a single uber-affine,
which transforms our functional-space voxels into MNI world coordinates via:
1. functional voxels -> FLIRT source space
2. FLIRT source space -> FLIRT reference space
3. FLIRT reference space -> MNI world coordinates
%% Cell type:code id: tags:
```
vox2fsl = func.getAffine('voxel', 'fsl')
fsl2mni = std .getAffine('fsl', 'world')
```
%% Cell type:markdown id: tags:
Combining two affines into one is just a simple dot-product. There is a
`concat()` function which does this for us, for any number of affines:
%% Cell type:code id: tags:
```
from fsl.transform.affine import concat
# To combine affines together, we
# have to list them in reverse -
# linear algebra is *weird*.
funcvox2mni = concat(fsl2mni, func2std, vox2fsl)
print(funcvox2mni)
```
%% Cell type:markdown id: tags:
> In the next section we will use the
> [`fsl.transform.flirt.fromFlirt`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.flirt.html#fsl.transform.flirt.fromFlirt)
> function, which does all of the above for us.
So we've now got some voxel coordinates from our functional data, and an
affine to transform into MNI world coordinates. The rest is easy:
%% Cell type:code id: tags:
```
mnicoords = transform(peakvox, funcvox2mni)
mnivoxels = transform(mnicoords, std.getAffine('world', 'voxel'))
mnivoxels = [int(round(v)) for v in mnivoxels]
print('Peak activation (MNI coordinates):', mnicoords)
print('Peak activation (MNI voxels): ', mnivoxels)
```
%% Cell type:markdown id: tags:
Note that in the above example we are only applying a linear transformation
into MNI space - in reality you would also want to apply your non-linear
structural-to-standard transformation too. This is covered in the next
section.
<a class="anchor" id="transformations-and-resampling"></a>
### Transformations and resampling
Now, it's all well and good to look at t-statistic values and voxel
coordinates and so on and so forth, but let's spice things up a bit and look
at some images. Let's display our peak activation location in MNI space. To do
this, we're going to resample our functional image into MNI space, so we can
overlay it on the MNI template. This can be done using some handy functions
from the
[`fsl.transform.flirt`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.flirt.html)
and
[`fsl.utils.image.resample`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.image.resample.html)
modules.
Let's make sure we've got our source and reference images loaded:
%% Cell type:code id: tags:
```
featdir = op.join(op.join('08_fslpy', 'fmri.feat'))
tstat1 = Image(op.join(featdir, 'stats', 'tstat1'))
std = Image(op.expandvars(op.join('$FSLDIR', 'data', 'standard', 'MNI152_T1_2mm')))
```
%% Cell type:markdown id: tags:
Now we'll load the `example_func2standard` FLIRT matrix, and adjust it so that
it transforms from functional *world* coordinates into standard *world*
coordinates - this is what is expected by the `resampleToReference` function,
used below:
%% Cell type:code id: tags:
```
from fsl.transform.flirt import fromFlirt
func2std = np.loadtxt(op.join(featdir, 'reg', 'example_func2standard.mat'))
func2std = fromFlirt(func2std, tstat1, std, 'world', 'world')
```
%% Cell type:markdown id: tags:
Now we can use `resampleToReference` to resample our functional data into
MNI152 space. This function returns a `numpy` array containing the resampled
data, and an adjusted voxel-to-world affine transformation. But in this case,
we know that the data will be aligned to MNI152, so we can ignore the affine:
%% Cell type:code id: tags:
```
from fsl.utils.image.resample import resampleToReference
std_tstat1 = resampleToReference(tstat1, std, func2std)[0]
std_tstat1 = Image(std_tstat1, header=std.header)
```
%% Cell type:markdown id: tags:
Now that we have our t-statistic image in MNI152 space, we can plot it in
standard space using `matplotlib`:
%% Cell type:code id: tags:
```
stddir = op.expandvars('${FSLDIR}/data/standard/')
std2mm = Image(op.join(stddir, 'MNI152_T1_2mm'))
std_tstat1 = std_tstat1.data
std_tstat1[std_tstat1 < 3] = 0
fig = ortho(std2mm.data, mnivoxels, cmap=plt.cm.gray)
fig = ortho(std_tstat1, mnivoxels, cmap=plt.cm.inferno, fig=fig, cursor=True)
```
%% Cell type:markdown id: tags:
In the example above, we resampled some data from functional space into
standard space using a linear transformation. But we all know that this is not
how things work in the real world - linear transformations are for kids. The
real world is full of lions and tigers and bears and warp fields.
The
[`fsl.transform.fnirt`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.fnirt.html#fsl.transform.fnirt.fromFnirt)
and
[`fsl.transform.nonlinear`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.nonlinear.html)
modules contain classes and functions for working with FNIRT-style warp fields
(modules for working with lions, tigers, and bears are still under
development).
Let's imagine that we have defined an ROI in MNI152 space, and we want to
project it into the space of our functional data. We can do this by combining
the nonlinear structural to standard registration produced by FNIRT with the
linear functional to structural registration generated by FLIRT. First of
all, we'll load images from each of the functional, structural, and standard
spaces:
%% Cell type:code id: tags:
```
featdir = op.join('08_fslpy', 'fmri.feat')
func = Image(op.join(featdir, 'reg', 'example_func'))
struc = Image(op.join(featdir, 'reg', 'highres'))
std = Image(op.expandvars(op.join('$FSLDIR', 'data', 'standard', 'MNI152_T1_2mm')))
```
%% Cell type:markdown id: tags:
Now, let's say we have obtained our seed location in MNI152 coordinates. Let's
convert them to MNI152 voxels just to double check:
%% Cell type:code id: tags:
```
seedmni = [-48, -74, -9]
seedmnivox = transform(seedmni, std.getAffine('world', 'voxel'))
ortho(std.data, seedmnivox, cursor=True)
```
%% Cell type:markdown id: tags:
Now we'll load the FNIRT warp field, which encodes a nonlinear transformation
from structural space to standard space. FNIRT warp fields are often stored as
*coefficient* fields to reduce the file size, but in order to use it, we must
convert the coefficient field into a *deformation* (a.k.a. *displacement*)
field. This takes a few seconds:
%% Cell type:code id: tags:
```
from fsl.transform.fnirt import readFnirt
from fsl.transform.nonlinear import coefficientFieldToDeformationField
struc2std = readFnirt(op.join(featdir, 'reg', 'highres2standard_warp'), struc, std)
struc2std = coefficientFieldToDeformationField(struc2std)
```
%% Cell type:markdown id: tags:
We'll also load our FLIRT functional to structural transformation, adjust it
so that it transforms between voxel coordinate systems instead of the FSL
coordinate system, and invert so it can transform from structural voxels to
functional voxels:
%% Cell type:code id: tags:
```
from fsl.transform.affine import invert
func2struc = np.loadtxt(op.join(featdir, 'reg', 'example_func2highres.mat'))
func2struc = fromFlirt(func2struc, func, struc, 'voxel', 'voxel')
struc2func = invert(func2struc)
```
%% Cell type:markdown id: tags:
Now we can transform our seed coordinates from MNI152 space into functional
space in two stages. First, we'll use our deformation field to transform from
MNI152 space into structural space:
%% Cell type:code id: tags:
```
seedstruc = struc2std.transform(seedmni, 'world', 'voxel')
seedfunc = transform(seedstruc, struc2func)
print('Seed location in MNI coordinates: ', seedmni)
print('Seed location in functional voxels:', seedfunc)
ortho(func.data, seedfunc, cursor=True)
```
%% Cell type:markdown id: tags:
> FNIRT warp fields kind of work backwards - we can use them to transform
> reference coordinates into source coordinates, but would need to invert the
> warp field using `invwarp` if we wanted to transform from source coordinates
> into reference coordinates.
Of course, we can also use our deformation field to resample an image from
structural space into MNI152 space. The `applyDeformation` function takes an
`Image` and a `DeformationField`, and returns a `numpy` array containing the
resampled data.
%% Cell type:code id: tags:
```
from fsl.transform.nonlinear import applyDeformation
strucmni = applyDeformation(struc, struc2std)
# remove low-valued voxels,
# just for visualisation below
strucmni[strucmni < 1] = 0
fig = ortho(std.data, [45, 54, 45], cmap=plt.cm.gray)
fig = ortho(strucmni, [45, 54, 45], fig=fig)
```
%% Cell type:markdown id: tags:
The `premat` option to `applyDeformation` can be used to specify our linear
functional to structural transformation, and hence resample a functional image
into MNI152 space:
%% Cell type:code id: tags:
```
tstatmni = applyDeformation(tstat1, struc2std, premat=func2struc)
tstatmni[tstatmni < 3] = 0
fig = ortho(std.data, [45, 54, 45], cmap=plt.cm.gray)
fig = ortho(tstatmni, [45, 54, 45], fig=fig)
```
%% Cell type:markdown id: tags:
There are a few other useful functions tucked away in the
[`fsl.utils.image`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.image.html)
and
[`fsl.transform`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.html)
packages, with more to be added in the future.
<a class="anchor" id="fsl-wrapper-functions"></a>
## FSL wrapper functions
The
[fsl.wrappers](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.wrappers.html)
package is the home of "wrapper" functions for a range of FSL tools. You can
use them to call an FSL tool from Python code, without having to worry about
constructing a command-line, or saving/loading input/output images.
> The `fsl.wrappers` functions also allow you to submit jobs to be run on the
> cluster - this is described [below](#submitting-to-the-cluster).
You can use the FSL wrapper functions with file names, similar to calling the
corresponding tool via the command-line:
%% Cell type:code id: tags:
```
from fsl.wrappers import robustfov
robustfov('08_fslpy/bighead', 'bighead_cropped')
render('08_fslpy/bighead bighead_cropped -cm blue')
```
%% Cell type:markdown id: tags:
The `fsl.wrappers` functions strive to provide an interface which is as close
as possible to the command-line tool - most functions use positional arguments
for required options, and keyword arguments for all other options, with
argument names equivalent to command line option names. For example, the usage
for the command-line `bet` tool is as follows:
> ```
> Usage: bet <input> <output> [options]
>
> Main bet2 options:
> -o generate brain surface outline overlaid onto original image
> -m generate binary brain mask
> -s generate approximate skull image
> -n don't generate segmented brain image output
> -f <f> fractional intensity threshold (0->1); default=0.5; smaller values give larger brain outline estimates
> -g <g> vertical gradient in fractional intensity threshold (-1->1); default=0; positive values give larger brain outline at bottom, smaller at top
> -r <r> head radius (mm not voxels); initial surface sphere is set to half of this
> -c <x y z> centre-of-gravity (voxels not mm) of initial mesh surface.
> ...
> ```
So to use the `bet()` wrapper function, pass `<input>` and `<output>` as
positional arguments, and pass the additional options as keyword arguments:
%% Cell type:code id: tags:
```
from fsl.wrappers import bet
bet('bighead_cropped', 'bighead_cropped_brain', f=0.3, m=True, s=True)
render('bighead_cropped -b 40 '
'bighead_cropped_brain -cm hot '
'bighead_cropped_brain_skull -ot mask -mc 0.4 0.4 1 '
'bighead_cropped_brain_mask -ot mask -mc 0 1 0 -o -w 5')
```
%% Cell type:markdown id: tags:
> Some FSL commands accept arguments which cannot be used as Python
> identifiers - for example, the `-2D` option to `flirt` cannot be used as an
> identifier in Python, because it begins with a number. In situations like
> this, an alias is used. So to set the `-2D` option to `flirt`, you can do this:
>
> ```
> # "twod" applies the -2D flag
> flirt('source.nii.gz', 'ref.nii.gz', omat='src2ref.mat', twod=True)
> ```
>
> Some of the `fsl.wrappers` functions also support aliases which may make
> your code more readable. For example, when calling `bet`, you can use either
> `m=True` or `mask=True` to apply the `-m` command line flag.
<a class="anchor" id="in-memory-images"></a>
### In-memory images
It can be quite awkward to combine image processing with FSL tools and image
processing in Python. The `fsl.wrappers` package tries to make this a little
easier for you - if you are working with image data in Python, you can pass
`Image` or `nibabel` objects directly into `fsl.wrappers` functions - they will
be automatically saved to temporary files and passed to the underlying FSL
command:
%% Cell type:code id: tags:
```
cropped = Image('bighead_cropped')
bet(cropped, 'bighead_cropped_brain')
betted = Image('bighead_cropped_brain')
fig = ortho(cropped.data, (80, 112, 85), cmap=plt.cm.gray)
fig = ortho(betted .data, (80, 112, 85), cmap=plt.cm.inferno, fig=fig)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="loading-outputs-into-python"></a>
### Loading outputs into Python
By using the special `fsl.wrappers.LOAD` symbol, you can also have any output
files produced by the tool automatically loaded into memory for you:
%% Cell type:code id: tags:
```
from fsl.wrappers import LOAD
cropped = Image('bighead_cropped')
# The loaded result is called "output",
# because that is the name of the
# argument in the bet wrapper function.
betted = bet(cropped, LOAD).output
fig = ortho(cropped.data, (80, 112, 85), cmap=plt.cm.gray)
fig = ortho(betted .data, (80, 112, 85), cmap=plt.cm.inferno, fig=fig)
```
%% Cell type:markdown id: tags:
You can use the `LOAD` symbol for any output argument - any output files which
are loaded will be available through the return value of the wrapper function:
%% Cell type:code id: tags:
```
from fsl.wrappers import flirt
std2mm = Image(op.expandvars(op.join('$FSLDIR', 'data', 'standard', 'MNI152_T1_2mm')))
tstat1 = Image(op.join('08_fslpy', 'fmri.feat', 'stats', 'tstat1'))
func2std = np.loadtxt(op.join('08_fslpy', 'fmri.feat', 'reg', 'example_func2standard.mat'))
aligned = flirt(tstat1, std2mm, applyxfm=True, init=func2std, out=LOAD)
# Here the resampled tstat image
# is called "out", because that
# is the name of the flirt argument.
aligned = aligned.out.data
aligned[aligned < 1] = 0
peakvox = np.abs(aligned).argmax()
peakvox = np.unravel_index(peakvox, aligned.shape)
fig = ortho(std2mm .data, peakvox, cmap=plt.cm.gray)
fig = ortho(aligned.data, peakvox, cmap=plt.cm.inferno, fig=fig, cursor=True)
```
%% Cell type:markdown id: tags:
For tools like `bet` and `fast`, which expect an output *prefix* or
*basename*, you can just set the prefix to `LOAD` - all output files with that
prefix will be available in the object that is returned:
%% Cell type:code id: tags:
```
img = Image('bighead_cropped')
betted = bet(img, LOAD, f=0.3, mask=True)
fig = ortho(img .data, (80, 112, 85), cmap=plt.cm.gray)
fig = ortho(betted.output .data, (80, 112, 85), cmap=plt.cm.inferno, fig=fig)
fig = ortho(betted.output_mask.data, (80, 112, 85), cmap=plt.cm.summer, fig=fig, alpha=0.5)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="the-fslmaths-wrapper"></a>
### The `fslmaths` wrapper
*Most* of the `fsl.wrappers` functions aim to provide an interface which is as
close as possible to the underlying FSL tool. Ideally, if you read the
command-line help for a tool, you should be able to figure out how to use the
corresponding wrapper function. The wrapper for the `fslmaths` command is a
little different, however. It provides more of an object-oriented interface,
which is hopefully a little easier to use from within Python.
You can apply an `fslmaths` operation by specifying the input image,
*chaining* method calls together, and finally calling the `run()` method. For
example:
%% Cell type:code id: tags:
```
from fsl.wrappers import fslmaths
fslmaths('bighead_cropped') \
.mas( 'bighead_cropped_brain_mask') \
.run( 'bighead_cropped_brain')
render('bighead_cropped bighead_cropped_brain -cm hot')
```
%% Cell type:markdown id: tags:
Of course, you can also use the `fslmaths` wrapper with in-memory images:
%% Cell type:code id: tags:
```
wholehead = Image('bighead_cropped')
brainmask = Image('bighead_cropped_brain_mask')
eroded = fslmaths(brainmask).ero().ero().run()
erodedbrain = fslmaths(wholehead).mas(eroded).run()
fig = ortho(wholehead .data, (80, 112, 85), cmap=plt.cm.gray)
fig = ortho(brainmask .data, (80, 112, 85), cmap=plt.cm.summer, fig=fig)
fig = ortho(erodedbrain.data, (80, 112, 85), cmap=plt.cm.inferno, fig=fig)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="the-filetree"></a>
## The `FileTree`
The
[`fsl.utils.filetree`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.filetree.html)
library provides functionality which allows you to work with *structured data
directories*, such as HCP or BIDS datasets. You can use `filetree` for both
reading and for creating datasets.
This practical gives a very brief introduction to the `filetree` library -
refer to the [full
documentation](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.filetree.html)
to get a feel for how powerful it can be.
<a class="anchor" id="describing-your-data"></a>
### Describing your data
To introduce `filetree`, we'll begin with a small example. Imagine that we
have a dataset which looks like this:
> ```
> mydata
> ├── sub_A
> │ ├── ses_1
> │ │ └── T1w.nii.gz
> │ ├── ses_2
> │ │ └── T1w.nii.gz
> │ └── T2w.nii.gz
> ├── sub_B
> │ ├── ses_1
> │ │ └── T1w.nii.gz
> │ ├── ses_2
> │ │ └── T1w.nii.gz
> │ └── T2w.nii.gz
> └── sub_C
> ├── ses_1
> │ └── T1w.nii.gz
> ├── ses_2
> │ └── T1w.nii.gz
> └── T2w.nii.gz
> ```
(Run the code cell below to create a dummy data set with the above structure):
%% Cell type:code id: tags:
```
%%bash
for sub in A B C; do
subdir=mydata/sub_$sub/
mkdir -p $subdir
cp $FSLDIR/data/standard/MNI152_T1_2mm.nii.gz $subdir/T2w.nii.gz
for ses in 1 2; do
sesdir=$subdir/ses_$ses/
mkdir $sesdir
cp $FSLDIR/data/standard/MNI152_T1_2mm.nii.gz $sesdir/T1w.nii.gz
done
done
```
%% Cell type:markdown id: tags:
To use `filetree` with this dataset, we must first describe its structure - we
do this by creating a `.tree` file:
%% Cell type:code id: tags:
```
%%writefile mydata.tree
sub_{subject}
T2w.nii.gz
ses_{session}
T1w.nii.gz
```
%% Cell type:markdown id: tags:
A `.tree` file is simply a description of the structure of your data
directory - it describes the *file types* (also known as *templates*) which
are present in the dataset (`T1w` and `T2w`), and the *variables* which are
implicitly present in the structure of the dataset (`subject` and `session`).
<a class="anchor" id="using-the-filetree"></a>
### Using the `FileTree`
Now that we have a `.tree` file which describes our data, we can create a
`FileTree` to work with it:
%% Cell type:code id: tags:
```
from fsl.utils.filetree import FileTree
# Create a FileTree, giving
# it our tree specification,
# and the path to our data.
tree = FileTree.read('mydata.tree', 'mydata')
```
%% Cell type:markdown id: tags:
We can list all of the T1 images via the `FileTree.get_all` method. The
`glob_vars='all'` option tells the `FileTree` to fill in the `T1w` template
with all possible combinations of variables. The `FileTree.extract_variables`
method accepts a file path, and gives you back the variable values contained
within:
%% Cell type:code id: tags:
```
for t1file in tree.get_all('T1w', glob_vars='all'):
fvars = tree.extract_variables('T1w', t1file)
print(t1file, fvars)
```
%% Cell type:markdown id: tags:
The `FileTree.update` method allows you to "fill in" variable values; it
returns a new `FileTree` object which can be used on a selection of the
data set:
%% Cell type:code id: tags:
```
treeA = tree.update(subject='A')
for t1file in treeA.get_all('T1w', glob_vars='all'):
fvars = treeA.extract_variables('T1w', t1file)
print(t1file, fvars)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="building-a-processing-pipeline-with-filetree"></a>
### Building a processing pipeline with `FileTree`
Let's say we want to run BET on all of our T1 images. Let's start by modifying
our `.tree` definition to include the BET outputs:
%% Cell type:code id: tags:
```
%%writefile mydata.tree
sub_{subject}
T2w.nii.gz
ses_{session}
T1w.nii.gz
T1w_brain.nii.gz
T1w_brain_mask.nii.gz
```
%% Cell type:markdown id: tags:
Now we can use the `FileTree` to generate the relevant file names for us,
which we can then pass on to BET. Here we'll use the `FileTree.get_all_trees`
method to create a sub-tree for each subject and each session:
%% Cell type:code id: tags:
```
from fsl.wrappers import bet
tree = FileTree.read('mydata.tree', 'mydata')
for subtree in tree.get_all_trees('T1w', glob_vars='all'):
t1file = subtree.get('T1w')
t1brain = subtree.get('T1w_brain')
print('Running BET: {} -> {} ...'.format(t1file, t1brain))
bet(t1file, t1brain, mask=True)
print('Done!')
example = tree.update(subject='A', session='1')
render('{} {} -ot mask -o -w 2 -mc 0 1 0'.format(
example.get('T1w'),
example.get('T1w_brain_mask')))
```
%% Cell type:markdown id: tags:
<a class="anchor" id="the-filetreequery"></a>
### The `FileTreeQuery`
The `filetree` module contains another class called the
[`FileTreeQuery`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.filetree.query.html),
which provides an interface that is more convenient if you are reading data
from large datasets with many different file types and variables.
When you create a `FileTreeQuery`, it scans the entire data directory and
identifies all of the values that are present for each variable defined in the
`.tree` file:
%% Cell type:code id: tags:
```
from fsl.utils.filetree import FileTreeQuery
tree = FileTree.read('mydata.tree', 'mydata')
query = FileTreeQuery(tree)
print('T1w variables:', query.variables('T1w'))
print('T2w variables:', query.variables('T2w'))
```
%% Cell type:markdown id: tags:
The `FileTreeQuery.query` method will return the paths to all existing files
which match a set of variable values:
%% Cell type:code id: tags:
```
print('All files for subject A')
for template in query.templates:
print(' {} files:'.format(template))
for match in query.query(template, subject='A'):
print(' ', match.filename)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="calling-shell-commands"></a>
## Calling shell commands
The
[`fsl.utils.run`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.run.html)
module provides the `run` and `runfsl` functions, which are wrappers around
the built-in [`subprocess`
library](https://docs.python.org/3/library/subprocess.html).
The default behaviour of `run` is to return the standard output of the
command:
%% Cell type:code id: tags:
```
from fsl.utils.run import run
# You can pass the command
# and its arguments as a single
# string, or as a sequence
print('Lines in this notebook:', run('wc -l 08_fslpy.md').strip())
print('Words in this notebook:', run(['wc', '-w', '08_fslpy.md']).strip())
```
%% Cell type:markdown id: tags:
But you can control what `run` returns, depending on your needs. Let's create
a little script to demonstrate the options:
%% Cell type:code id: tags:
```
%%writefile mycmd
#!/usr/bin/env bash
exitcode=$1
echo "Standard output!"
echo "Standard error :(" >&2
exit $exitcode
```
%% Cell type:markdown id: tags:
And let's not forget to make it executable:
%% Cell type:code id: tags:
```
!chmod a+x mycmd
```
%% Cell type:markdown id: tags:
You can use the `stdout`, `stderr` and `exitcode` arguments to control the
return value:
%% Cell type:code id: tags:
```
print('run("./mycmd 0"): ',
run("./mycmd 0").strip())
print('run("./mycmd 0", stdout=False): ',
run("./mycmd 0", stdout=False))
print('run("./mycmd 0", exitcode=True):',
run("./mycmd 0", exitcode=True))
print('run("./mycmd 0", stdout=False, exitcode=True):',
run("./mycmd 0", stdout=False, exitcode=True))
print('run("./mycmd 0", stderr=True): ',
run("./mycmd 0", stderr=True))
print('run("./mycmd 0", stdout=False, stderr=True): ',
run("./mycmd 0", stdout=False, stderr=True).strip())
print('run("./mycmd 0", stderr=True, exitcode=True):',
run("./mycmd 0", stderr=True, exitcode=True))
print('run("./mycmd 1", exitcode=True):',
run("./mycmd 1", exitcode=True))
print('run("./mycmd 1", stdout=False, exitcode=True):',
run("./mycmd 1", stdout=False, exitcode=True))
```
%% Cell type:markdown id: tags:
So if only one of `stdout`, `stderr`, or `exitcode` is `True`, `run` will only
return the corresponding value. Otherwise `run` will return a tuple which
contains the requested outputs.
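For example, we can unpack the requested outputs directly (a small sketch,
assuming the standard output comes first in the returned tuple, followed by
the exit code):
%% Cell type:code id: tags:
```
# stdout is returned first, followed by the
# exit code (assumption based on the outputs
# shown above)
stdout, exitcode = run("./mycmd 0", exitcode=True)
print('stdout:   ', stdout.strip())
print('exit code:', exitcode)
```
%% Cell type:markdown id: tags: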
If you run a command which returns a non-0 exit code, the default behaviour
(if you don't set `exitcode=True`) is for a `RuntimeError` to be raised:
%% Cell type:code id: tags:
```
run("./mycmd 99")
```
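%% Cell type:markdown id: tags:
This means that, in a pipeline, a failing command can be handled like any
other Python exception (a small sketch, not part of the original practical):
%% Cell type:code id: tags:
```
try:
    run("./mycmd 99")
except RuntimeError as e:
    print('Command failed:', e)
```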
%% Cell type:markdown id: tags:
<a class="anchor" id="the-runfsl-function"></a>
### The `runfsl` function
The `runfsl` function is a wrapper around `run` which simply makes sure that
the command you are calling is inside the `$FSLDIR/bin/` directory. It has the
same usage as the `run` function:
%% Cell type:code id: tags:
```
from fsl.utils.run import runfsl
runfsl('bet bighead_cropped bighead_cropped_brain')
runfsl('fslroi bighead_cropped_brain bighead_slices 0 -1 0 -1 90 3')
runfsl('fast -o bighead_fast bighead_slices')
render('-vl 80 112 91 -xh -yh '
'bighead_cropped '
'bighead_slices.nii.gz -cm brain_colours_1hot -b 30 '
'bighead_fast_seg.nii.gz -ot label -o')
```
%% Cell type:markdown id: tags:
<a class="anchor" id="submitting-to-the-cluster"></a>
### Submitting to the cluster
Both the `run` and `runfsl` accept an argument called `submit`, which allows
you to submit jobs to be executed on the cluster via the FSL `fsl_sub`
command.
> Cluster submission is handled by the
> [`fsl.utils.fslsub`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.fslsub.html)
> module - it contains lower level functions for managing and querying jobs
> that have been submitted to the cluster. The functions defined in this
> module can be used directly if you have more complicated requirements.
The semantics of the `run` and `runfsl` functions are slightly different when
you use the `submit` option - when you submit a job, the `run`/`runfsl`
functions will return immediately, and will return a string containing the job
ID:
%% Cell type:code id: tags:
```
jobid = run('ls', submit=True)
print('Job ID:', jobid)
```
%% Cell type:markdown id: tags:
Once the job finishes, we should be able to read the usual `.o` and `.e`
files:
%% Cell type:code id: tags:
```
stdout = f'ls.o{jobid}'
print('Job output')
print(open(stdout).read())
```
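%% Cell type:markdown id: tags:
The standard error of the job should likewise be available in the
corresponding `.e` file (a small sketch, assuming the usual `fsl_sub` naming
convention):
%% Cell type:code id: tags:
```
stderrfile = f'ls.e{jobid}'
print('Job error output')
print(open(stderrfile).read())
```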
%% Cell type:markdown id: tags:
All of the `fsl.wrappers` functions also accept the `submit` argument:
%% Cell type:code id: tags:
```
jobid = bet('08_fslpy/bighead', 'bighead_brain', submit=True)
print('Job ID:', jobid)
```
%% Cell type:markdown id: tags:
> But an error will occur if you try to pass in-memory images, or `LOAD` any
> outputs when you call a wrapper function with `submit=True`.
After submitting a job, you can use the `wait` function to wait until a job
has completed:
%% Cell type:code id: tags:
```
from fsl.utils.run import wait
jobid = bet('08_fslpy/bighead', 'bighead_brain', submit=True)
print('Job ID:', jobid)
wait(jobid)
print('Done!')
render('08_fslpy/bighead bighead_brain -cm hot')
```
%% Cell type:markdown id: tags:
When you use `submit=True`, you can also specify cluster submission options -
you can include any arguments that are accepted by the
[`fslsub.submit`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.fslsub.html#fsl.utils.fslsub.submit)
function
%% Cell type:code id: tags:
```
jobs = []
jobs.append(runfsl('robustfov -i 08_fslpy/bighead -r bighead_cropped', submit=True, queue='short.q'))
jobs.append(runfsl('bet bighead_cropped bighead_brain', submit=True, queue='short.q', wait_for=jobs[-1]))
jobs.append(runfsl('fslroi bighead_brain bighead_slices 0 -1 111 3 0 -1', submit=True, queue='short.q', wait_for=jobs[-1]))
jobs.append(runfsl('fast -o bighead_fast bighead_slices', submit=True, queue='short.q', wait_for=jobs[-1]))
print('Waiting for', jobs, '...')
wait(jobs)
render('-vl 80 112 91 -xh -zh -hc '
'bighead_brain '
'bighead_slices.nii.gz -cm brain_colours_1hot -b 30 '
'bighead_fast_seg.nii.gz -ot label -o')
```
%% Cell type:markdown id: tags:
<a class="anchor" id="redirecting-output"></a>
### Redirecting output
The `log` option, accepted by both `run` and `runfsl`, allows for more
fine-grained control over what is done with the standard output and error
streams.
You can use `'tee'` to redirect the standard output and error streams of the
command to the standard output and error streams of the calling command (your
python script):
%% Cell type:code id: tags:
```
print('Teeing:')
_ = run('./mycmd 0', log={'tee' : True})
```
%% Cell type:markdown id: tags:
Or you can use `'stdout'` and `'stderr'` to redirect the standard output and
error streams of the command to files:
%% Cell type:code id: tags:
```
with open('stdout.log', 'wt') as o, \
open('stderr.log', 'wt') as e:
run('./mycmd 0', log={'stdout' : o, 'stderr' : e})
print('\nRedirected stdout:')
!cat stdout.log
print('\nRedirected stderr:')
!cat stderr.log
```
%% Cell type:markdown id: tags:
Finally, you can use `'cmd'` to log the command itself to a file (useful for
pipeline logging):
%% Cell type:code id: tags:
```
with open('commands.log', 'wt') as cmdlog:
run('./mycmd 0', log={'cmd' : cmdlog})
run('wc -l 08_fslpy.md', log={'cmd' : cmdlog})
print('\nCommand log:')
!cat commands.log
```
%% Cell type:markdown id: tags:
<a class="anchor" id="fsl-atlases"></a>
## FSL atlases
The
[`fsl.data.atlases`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.atlases.html)
module provides access to all of the atlas images that are stored in the
`$FSLDIR/data/atlases/` directory of a standard FSL installation. It can be
used to load and query probabilistic and label-based atlases.
The `atlases` module needs to be initialised using the `rescanAtlases` function:
%% Cell type:code id: tags:
```
import fsl.data.atlases as atlases
atlases.rescanAtlases()
```
%% Cell type:markdown id: tags:
<a class="anchor" id="querying-atlases"></a>
### Querying atlases
You can list all of the available atlases using `listAtlases`:
%% Cell type:code id: tags:
```
for desc in atlases.listAtlases():
print(desc)
```
%% Cell type:markdown id: tags:
`listAtlases` returns a list of `AtlasDescription` objects, each of which
contains descriptive information about one atlas. You can retrieve the
`AtlasDescription` for a specific atlas via the `getAtlasDescription`
function:
%% Cell type:code id: tags:
```
desc = atlases.getAtlasDescription('harvardoxford-cortical')
print(desc.name)
print(desc.atlasID)
print(desc.specPath)
print(desc.atlasType)
```
%% Cell type:markdown id: tags:
Each `AtlasDescription` maintains a list of `AtlasLabel` objects, each of
which represents one region that is defined in the atlas. You can access all
of the `AtlasLabel` objects via the `labels` attribute:
%% Cell type:code id: tags:
```
for lbl in desc.labels[:5]:
print(lbl)
```
%% Cell type:markdown id: tags:
Or you can retrieve a specific label using the `find` method:
%% Cell type:code id: tags:
```
# search by region name
print(desc.find(name='Occipital Pole'))
# or by label value
print(desc.find(value=48))
```
%% Cell type:markdown id: tags:
<a class="anchor" id="loading-atlas-images"></a>
### Loading atlas images
The `loadAtlas` function can be used to load the atlas image:
%% Cell type:code id: tags:
```
# For probabilistic atlases, you
# can ask for the 3D ROI image
# by setting loadSummary=True.
# You can also request a
# resolution - by default the
# highest resolution version
# will be loaded.
lblatlas = atlases.loadAtlas('harvardoxford-cortical',
loadSummary=True,
resolution=2)
# By default you will get the 4D
# probabilistic atlas image (for
# atlases for which this is
# available).
probatlas = atlases.loadAtlas('harvardoxford-cortical',
resolution=2)
print(lblatlas)
print(probatlas)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="working-with-atlases"></a>
### Working with atlases
Both `LabelAtlas` and `ProbabilisticAtlas` objects have a method called `get`,
which can be used to extract ROI images for a specific region:
%% Cell type:code id: tags:
```
stddir = op.expandvars('${FSLDIR}/data/standard/')
std2mm = Image(op.join(stddir, 'MNI152_T1_2mm'))
frontal = lblatlas.get(name='Frontal Pole').data
frontal = np.ma.masked_where(frontal < 1, frontal)
fig = ortho(std2mm.data, (45, 54, 45), cmap=plt.cm.gray)
fig = ortho(frontal, (45, 54, 45), cmap=plt.cm.winter, fig=fig)
```
%% Cell type:markdown id: tags:
Calling `get` on a `ProbabilisticAtlas` will return a probability image:
%% Cell type:code id: tags:
```
stddir = op.expandvars('${FSLDIR}/data/standard/')
std2mm = Image(op.join(stddir, 'MNI152_T1_2mm'))
frontal = probatlas.get(name='Frontal Pole').data
frontal = np.ma.masked_where(frontal < 1, frontal)
fig = ortho(std2mm.data, (45, 54, 45), cmap=plt.cm.gray)
fig = ortho(frontal, (45, 54, 45), cmap=plt.cm.inferno, fig=fig)
```
%% Cell type:markdown id: tags:
The `get` method can be used to retrieve an image for a region by:
- an `AtlasLabel` object
- The region index
- The region value
- The region name
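These alternatives are different ways of asking for the same region image -
here is a quick sketch (only the `name` form is used elsewhere in this
practical; passing an `AtlasLabel`, or using the `value` keyword, is an
assumption based on the list above):
%% Cell type:code id: tags:
```
occ  = desc.find(name='Occipital Pole')
roi1 = lblatlas.get(occ)                    # by AtlasLabel object
roi2 = lblatlas.get(name='Occipital Pole')  # by region name
roi3 = lblatlas.get(value=occ.value)        # by region value
print(roi1.sameSpace(roi2), roi2.sameSpace(roi3))
```
%% Cell type:markdown id: tags: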
`LabelAtlas` objects have a method called `label`, which can be used to
interrogate the atlas at specific locations:
%% Cell type:code id: tags:
```
# The label method accepts 3D
# voxel or world coordinates
val = lblatlas.label((25, 52, 43), voxel=True)
lbl = lblatlas.find(value=val)
print('Region at voxel [25, 52, 43]: {} [{}]'.format(val, lbl.name))
# or a 3D weighted or binary mask
mask = np.zeros(lblatlas.shape)
mask[30:60, 30:60, 30:60] = 1
mask = Image(mask, header=lblatlas.header)
lbls, props = lblatlas.label(mask)
print('Labels in mask:')
for lbl, prop in zip(lbls, props):
lblname = lblatlas.find(value=lbl).name
print(' {} [{}]: {:0.2f}%'.format(lbl, lblname, prop))
```
%% Cell type:markdown id: tags:
`ProbabilisticAtlas` objects have an analogous method called `values`:
%% Cell type:code id: tags:
```
vals = probatlas.values((25, 52, 43), voxel=True)
print('Regions at voxel [25, 52, 43]:')
for idx, val in enumerate(vals):
if val > 0:
lbl = probatlas.find(index=idx)
print(' {} [{}]: {:0.2f}%'.format(lbl.value, lbl.name, val))
print('Average proportions of regions within mask:')
vals = probatlas.values(mask)
for idx, val in enumerate(vals):
if val > 0:
lbl = probatlas.find(index=idx)
print(' {} [{}]: {:0.2f}%'.format(lbl.value, lbl.name, val))
```
# `fslpy`
**Important:** Portions of this practical require `fslpy` 2.9.0, due to be
released with FSL 6.0.4, in Spring 2020.
[`fslpy`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/) is a
Python library which is built into FSL, and contains a range of functionality
for working with FSL and with neuroimaging data from Python.
This practical highlights some of the most useful features provided by
`fslpy`. You may find `fslpy` useful if you are writing Python code to
perform analyses and image processing in conjunction with FSL.
* [The `Image` class, and other data types](#the-image-class-and-other-data-types)
* [Creating images](#creating-images)
* [Working with image data](#working-with-image-data)
* [Loading other file types](#loading-other-file-types)
* [NIfTI coordinate systems](#nifti-coordinate-systems)
* [Transformations and resampling](#transformations-and-resampling)
* [FSL wrapper functions](#fsl-wrapper-functions)
* [In-memory images](#in-memory-images)
* [Loading outputs into Python](#loading-outputs-into-python)
* [The `fslmaths` wrapper](#the-fslmaths-wrapper)
* [The `FileTree`](#the-filetree)
* [Describing your data](#describing-your-data)
* [Using the `FileTree`](#using-the-filetree)
* [Building a processing pipeline with `FileTree`](#building-a-processing-pipeline-with-filetree)
* [The `FileTreeQuery`](#the-filetreequery)
* [Calling shell commands](#calling-shell-commands)
* [The `runfsl` function](#the-runfsl-function)
* [Submitting to the cluster](#submitting-to-the-cluster)
* [Redirecting output](#redirecting-output)
* [FSL atlases](#fsl-atlases)
* [Querying atlases](#querying-atlases)
* [Loading atlas images](#loading-atlas-images)
* [Working with atlases](#working-with-atlases)
> **Note**: `fslpy` is distinct from `fslpython` - `fslpython` is the Python
> environment that is baked into FSL. `fslpy` is a Python library which is
> installed into the `fslpython` environment.
Let's start with some standard imports and environment set-up:
```
%matplotlib inline
import matplotlib.pyplot as plt
import os
import os.path as op
import nibabel as nib
import numpy as np
import warnings
warnings.filterwarnings("ignore")
np.set_printoptions(suppress=True, precision=4)
```
And a little function that we can use to generate a simple orthographic plot:
```
def ortho(data, voxel, fig=None, cursor=False, **kwargs):
"""Simple orthographic plot of a 3D array using matplotlib.
:arg data: 3D numpy array
:arg voxel: XYZ coordinates for each slice
:arg fig: Existing figure and axes for overlay plotting
:arg cursor: Show a cursor at the `voxel`
All other arguments are passed through to the `imshow` function.
    :returns:   The figure and orthogonal axes (which can be passed back in as the
`fig` argument to plot overlays).
"""
voxel = [int(round(v)) for v in voxel]
data = np.asanyarray(data, dtype=np.float)
data[data <= 0] = np.nan
x, y, z = voxel
xslice = np.flipud(data[x, :, :].T)
yslice = np.flipud(data[:, y, :].T)
zslice = np.flipud(data[:, :, z].T)
if fig is None:
fig = plt.figure()
xax = fig.add_subplot(1, 3, 1)
yax = fig.add_subplot(1, 3, 2)
zax = fig.add_subplot(1, 3, 3)
else:
fig, xax, yax, zax = fig
xax.imshow(xslice, **kwargs)
yax.imshow(yslice, **kwargs)
zax.imshow(zslice, **kwargs)
if cursor:
cargs = {'color' : (0, 1, 0), 'linewidth' : 1}
xax.axvline( y, **cargs)
xax.axhline(data.shape[2] - z, **cargs)
yax.axvline( x, **cargs)
yax.axhline(data.shape[2] - z, **cargs)
zax.axvline( x, **cargs)
zax.axhline(data.shape[1] - y, **cargs)
for ax in (xax, yax, zax):
ax.set_xticks([])
ax.set_yticks([])
fig.tight_layout(pad=0)
return (fig, xax, yax, zax)
```
And another function which uses FSLeyes for more complex plots:
```
def render(cmdline):
import shlex
import IPython.display as display
prefix = '-of screenshot.png -hl -c 2 '
try:
from fsleyes.render import main
main(shlex.split(prefix + cmdline))
except ImportError:
# fall-back for macOS - we have to run
# FSLeyes render in a separate process
from fsl.utils.run import runfsl
prefix = 'render ' + prefix
runfsl(prefix + cmdline, env={})
return display.Image('screenshot.png')
```
<a class="anchor" id="the-image-class-and-other-data-types"></a> <a class="anchor" id="the-image-class-and-other-data-types"></a>
## The `Image` class, and other data types ## The `Image` class, and other data types
The
[`fsl.data.image`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.image.html#fsl.data.image.Image)
module provides the `Image` class, which sits on top of `nibabel` and contains
some handy functionality if you need to work with coordinate transformations,
or do some FSL-specific processing. The `Image` class provides features such
as:
- Support for NIFTI1, NIFTI2, and ANALYZE image files - Support for NIFTI1, NIFTI2, and ANALYZE image files
- Access to affine transformations between the voxel, FSL and world coordinate - Access to affine transformations between the voxel, FSL and world coordinate
systems systems
- Ability to load metadata from BIDS sidecar files - Ability to load metadata from BIDS sidecar files
> The `Image` class behaves differently to the `nibabel.Nifti1Image`. For
> example, when you create an `Image` object, the default behaviour is to load
> the image data into memory. This is configurable however; take a look at
> [the
> documentation](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.image.html#fsl.data.image.Image)
> to explore all of the options.
Some simple image processing routines are also provided - these are covered
[below](#image-processing).
<a class="anchor" id="creating-images"></a>
### Creating images
It's easy to create an `Image` - you can create one from a file name:
```
from fsl.data.image import Image
stddir = op.expandvars('${FSLDIR}/data/standard/')
# load a FSL image - the file
# suffix is optional, just like
# in real FSL-land!
std1mm = Image(op.join(stddir, 'MNI152_T1_1mm'))
print(std1mm)
```
You can create an `Image` from an existing `nibabel` image:
```
# load a nibabel image, and
# convert it into an FSL image
nibimg = nib.load(op.join(stddir, 'MNI152_T1_1mm.nii.gz'))
std1mm = Image(nibimg)
```
Or you can create an `Image` from a `numpy` array:
```
data = np.zeros((182, 218, 182))
img = Image(data, xform=np.eye(4))
```
If you have generated some data from another `Image` (or from a
`nibabel.Nifti1Image`) you can use the `header` option to set
the header information on the new image:
```
img = Image(data, header=std1mm.header)
```
<a class="anchor" id="fsl-atlases"></a> You can save an image to file via the `save` method:
## FSL atlases
<a class="anchor" id="the-filetree"></a>
## The `filetree`
<a class="anchor" id="nifti-coordinate-systems"></a> ```
## NIfTI coordinate systems img.save('empty')
!ls
```
<a class="anchor" id="image-processing"></a>
## Image processing
<a class="anchor" id="fsl-wrapper-functions"></a> `Image` objects have all of the attributes you might expect:
## FSL wrapper functions
```
stddir = op.expandvars('${FSLDIR}/data/standard/')
std1mm = Image(op.join(stddir, 'MNI152_T1_1mm'))
print('name: ', std1mm.name)
print('file: ', std1mm.dataSource)
print('NIfTI version:', std1mm.niftiVersion)
print('ndim: ', std1mm.ndim)
print('shape: ', std1mm.shape)
print('dtype: ', std1mm.dtype)
print('nvals: ', std1mm.nvals)
print('pixdim: ', std1mm.pixdim)
```
and a number of useful methods:
```
std2mm = Image(op.join(stddir, 'MNI152_T1_2mm'))
mask2mm = Image(op.join(stddir, 'MNI152_T1_2mm_brain_mask'))
print(std1mm.sameSpace(std2mm))
print(std2mm.sameSpace(mask2mm))
print(std2mm.getAffine('voxel', 'world'))
```
An `Image` object is a high-level wrapper around a `nibabel` image object -
you can always work directly with the `nibabel` object via the `nibImage`
attribute:
```
print(std2mm)
print(std2mm.nibImage)
```
<a class="anchor" id="working-with-image-data"></a>
### Working with image data
You can get the image data as a `numpy` array via the `data` attribute:
```
data = std2mm.data
print(data.min(), data.max())
ortho(data, (45, 54, 45))
```
> Note that `Image.data` will give you the data in its underlying type, unlike
> the `nibabel.get_fdata` method, which up-casts image data to floating-point.
You can also read and write data directly via the `Image` object:
```
slc = std2mm[:, :, 45]
std2mm[0:10, :, :] *= 2
```
Doing so has some advantages that may or may not be useful, depending on your
use-case:
- The image data will be kept on disk - only the parts that you access will
be loaded into RAM (you will also need to pass `loadData=False` when creating
the `Image` to achieve this).
- The `Image` object will keep track of modifications to the data - this can
be queried via the `saveState` attribute.
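A quick sketch of these options (the exact behaviour of `loadData` and
`saveState` is an assumption based on the description above):
```
# defer loading - the data stays on disk
# until it is actually accessed
std2mm_ondisk = Image(op.join(stddir, 'MNI152_T1_2mm'), loadData=False)
slc = std2mm_ondisk[:, :, 45]
# saveState is True while the in-memory image
# still matches what is stored on disk
print('Unmodified?', std2mm_ondisk.saveState)
std2mm_ondisk[0:10, :, :] *= 2
print('Unmodified?', std2mm_ondisk.saveState)
```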
<a class="anchor" id="loading-other-file-types"></a>
### Loading other file types
The
[`fsl.data`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.html#module-fsl.data)
package has a number of other classes for working with different types of FSL
and neuroimaging data. Most of these are higher-level wrappers around the
corresponding `nibabel` types:
* The
[`fsl.data.bitmap.Bitmap`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.bitmap.html)
class can be used to load a bitmap image (e.g. `jpg`, `png`, etc) and
convert it to a NIfTI image.
* The
[`fsl.data.dicom.DicomImage`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.dicom.html)
class uses `dcm2niix` to load NIfTI images contained within a DICOM
directory<sup>*</sup>.
* The
[`fsl.data.mghimage.MGHImage`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.mghimage.html)
class can be used too load `.mgh`/`.mgz` images (they are converted into
NIfTI images).
* The
[`fsl.data.dtifit`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.dtifit.html)
module contains functions for loading and working with the output of the
FSL `dtifit` tool.
* The
[`fsl.data.featanalysis`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.featanalysis.html),
[`fsl.data.featimage`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.featimage.html),
and
[`fsl.data.featdesign`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.featdesign.html)
modules contain classes and functions for loading data from FEAT
directories.
* Similarly, the
[`fsl.data.melodicanalysis`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.melodicanalysis.html)
and
[`fsl.data.melodicimage`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.melodicimage.html)
modules contain classes and functions for loading data from MELODIC
directories.
* The
[`fsl.data.gifti`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.gifti.html),
[`fsl.data.freesurfer`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.freesurfer.html),
and
[`fsl.data.vtk`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.vtk.html)
  modules contain functionality for loading surface data from GIfTI,
freesurfer, and ASCII VTK files respectively.
> <sup>*</sup>You must make sure that
> [`dcm2niix`](https://github.com/rordenlab/dcm2niix/) is installed on your
> system in order to use this class.
<a class="anchor" id="nifti-coordinate-systems"></a> <a class="anchor" id="nifti-coordinate-systems"></a>
## NIfTI coordinate systems ### NIfTI coordinate systems
The `getAffine` method gives you access to affine transformations which can be
used to convert coordinates between the different coordinate systems
associated with an image. Have some MNI coordinates you'd like to convert to
voxels? Easy!
```
stddir = op.expandvars('${FSLDIR}/data/standard/')
std2mm = Image(op.join(stddir, 'MNI152_T1_2mm'))
mnicoords = np.array([[0, 0, 0],
                      [0, -18, 18]])
world2vox = std2mm.getAffine('world', 'voxel')
vox2world = std2mm.getAffine('voxel', 'world')
# Apply the world->voxel
# affine to the coordinates
voxcoords = (np.dot(world2vox[:3, :3], mnicoords.T)).T + world2vox[:3, 3]
```
The code above is a bit fiddly, so instead of figuring it out, you can just
use the
[`affine.transform`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.affine.html#fsl.transform.affine.transform)
function:
```
from fsl.transform.affine import transform
voxcoords = transform(mnicoords, world2vox)
# just to double check, let's transform
# the voxel coordinates back into world
# coordinates
backtomni = transform(voxcoords, vox2world)
for m, v, b in zip(mnicoords, voxcoords, backtomni):
    print(m, '->', v, '->', b)
```
> The `Image.getAffine` method can give you transformation matrices > The `Image.getAffine` method can give you transformation matrices
> between any of these coordinate systems: > between any of these coordinate systems:
> >
> - the `voxel` coordinate system
> - the `world` (millimetre) coordinate system
> - the internal `fsl` coordinate system, used by some FSL tools
>   (e.g. FLIRT)
Oh, that example was too easy I hear you say? Try this one on for size. Let's
say we have run FEAT on some task fMRI data, and want to get the MNI
coordinates of the voxel with peak activation.
> This is what people used to use `Featquery` for, back in the un-enlightened
> days.
Let's start by identifying the voxel with the biggest t-statistic:
```
featdir = op.join('08_fslpy', 'fmri.feat')
# The Image.data attribute returns a
# numpy array containing, well, the
# image data.
tstat1 = Image(op.join(featdir, 'stats', 'tstat1')).data
# Recall from the numpy practical that
# argmax gives us an index into the
# flattened array, which we can convert
# back into voxel coordinates with
# unravel_index
peakvox = np.abs(tstat1).argmax()
peakvox = np.unravel_index(peakvox, tstat1.shape)
```
Now that we've got the voxel coordinates in functional space, we need to
transform them into MNI space. FEAT provides a transformation which goes
directly from functional to standard space, in the `reg` directory:
```
func2std = np.loadtxt(op.join(featdir, 'reg', 'example_func2standard.mat'))
```
But ... wait a minute ... this is a FLIRT matrix! We can't just plug voxel
coordinates into a FLIRT matrix and expect to get sensible results, because
FLIRT works in an internal FSL coordinate system, which is not quite
the same as either the voxel or the world coordinate system.
Let's start by loading our functional image, and the MNI152 template (the
source and reference images of our FLIRT matrix):
```
func = Image(op.join(featdir, 'reg', 'example_func'))
std  = Image(op.expandvars(op.join('$FSLDIR', 'data', 'standard', 'MNI152_T1_2mm')))
vox2fsl = func.getAffine('voxel', 'fsl')
fsl2mni = std .getAffine('fsl', 'world')
```
Combining two affines into one is just a simple dot-product. There is a
`concat()` function which does this for us, for any number of affines:
```
from fsl.transform.affine import concat
# Note that, when combining affines, we
# have to list them in reverse -
# linear algebra is *weird*.
funcvox2mni = concat(fsl2mni, func2std, vox2fsl)
print(funcvox2mni)
```
> In the next section we will use the
> [`fsl.transform.flirt.fromFlirt`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.flirt.html#fsl.transform.flirt.fromFlirt)
> function, which does all of the above for us.
So we've now got some voxel coordinates from our functional data, and an
affine to transform into MNI world coordinates. The rest is easy:
```
mnicoords = transform(peakvox, funcvox2mni)
mnivoxels = transform(mnicoords, std.getAffine('world', 'voxel'))
mnivoxels = [int(round(v)) for v in mnivoxels]
print('Peak activation (MNI coordinates):', mnicoords)
print('Peak activation (MNI voxels):      ', mnivoxels)
```
Note that in the above example we are only applying a linear transformation
into MNI space - in reality you would also want to apply your non-linear
structural-to-standard transformation too. This is covered in the next
section.
<a class="anchor" id="transformations-and-resampling"></a>
### Transformations and resampling
<a class="anchor" id="image-processing"></a>
## Image processing
Now, it's all well and good to look at t-statistic values and voxel
coordinates and so on and so forth, but let's spice things up a bit and look
at some images. Let's display our peak activation location in MNI space. To do
this, we're going to resample our functional image into MNI space, so we can
overlay it on the MNI template. This can be done using some handy functions
from the
[`fsl.transform.flirt`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.flirt.html)
and
[`fsl.utils.image.resample`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.image.resample.html)
modules.
Let's make sure we've got our source and reference images loaded:
```
featdir = op.join(op.join('08_fslpy', 'fmri.feat'))
tstat1 = Image(op.join(featdir, 'stats', 'tstat1'))
std = Image(op.expandvars(op.join('$FSLDIR', 'data', 'standard', 'MNI152_T1_2mm')))
```
Now we'll load the `example_func2standard` FLIRT matrix, and adjust it so that
it transforms from functional *world* coordinates into standard *world*
coordinates - this is what is expected by the `resampleToReference` function,
used below:
```
from fsl.transform.flirt import fromFlirt
func2std = np.loadtxt(op.join(featdir, 'reg', 'example_func2standard.mat'))
func2std = fromFlirt(func2std, tstat1, std, 'world', 'world')
```
Now we can use `resampleToReference` to resample our functional data into
MNI152 space. This function returns a `numpy` array containing the resampled
data, and an adjusted voxel-to-world affine transformation. But in this case,
we know that the data will be aligned to MNI152, so we can ignore the affine:
```
from fsl.utils.image.resample import resampleToReference
std_tstat1 = resampleToReference(tstat1, std, func2std)[0]
std_tstat1 = Image(std_tstat1, header=std.header)
``` ```
Now that we have our t-statistic image in MNI152 space, we can plot it in
standard space using `matplotlib`:
```
stddir = op.expandvars('${FSLDIR}/data/standard/')
std2mm = Image(op.join(stddir, 'MNI152_T1_2mm'))
std_tstat1 = std_tstat1.data
std_tstat1[std_tstat1 < 3] = 0
fig = ortho(std2mm.data, mnivoxels, cmap=plt.cm.gray)
fig = ortho(std_tstat1, mnivoxels, cmap=plt.cm.inferno, fig=fig, cursor=True)
```
In the example above, we resampled some data from functional space into
standard space using a linear transformation. But we all know that this is not
how things work in the real world - linear transformations are for kids. The
real world is full of lions and tigers and bears and warp fields.
The
[`fsl.transform.fnirt`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.fnirt.html#fsl.transform.fnirt.fromFnirt)
and
[`fsl.transform.nonlinear`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.nonlinear.html)
modules contain classes and functions for working with FNIRT-style warp fields
(modules for working with lions, tigers, and bears are still under
development).
Let's imagine that we have defined an ROI in MNI152 space, and we want to
project it into the space of our functional data. We can do this by combining
the nonlinear structural to standard registration produced by FNIRT with the
linear functional to structural registration generated by FLIRT. First of
all, we'll load images from each of the functional, structural, and standard
spaces:
```
featdir = op.join('08_fslpy', 'fmri.feat')
func = Image(op.join(featdir, 'reg', 'example_func'))
struc = Image(op.join(featdir, 'reg', 'highres'))
std = Image(op.expandvars(op.join('$FSLDIR', 'data', 'standard', 'MNI152_T1_2mm')))
```
Now, let's say we have obtained our seed location in MNI152 coordinates. Let's
convert them to MNI152 voxels just to double check:
```
seedmni = [-48, -74, -9]
seedmnivox = transform(seedmni, std.getAffine('world', 'voxel'))
ortho(std.data, seedmnivox, cursor=True)
```
Now we'll load the FNIRT warp field, which encodes a nonlinear transformation
from structural space to standard space. FNIRT warp fields are often stored as
*coefficient* fields to reduce the file size, but in order to use it, we must
convert the coefficient field into a *deformation* (a.k.a. *displacement*)
field. This takes a few seconds:
```
from fsl.transform.fnirt import readFnirt
from fsl.transform.nonlinear import coefficientFieldToDeformationField
struc2std = readFnirt(op.join(featdir, 'reg', 'highres2standard_warp'), struc, std)
struc2std = coefficientFieldToDeformationField(struc2std)
```
We'll also load our FLIRT functional to structural transformation, adjust it
so that it transforms between voxel coordinate systems instead of the FSL
coordinate system, and invert so it can transform from structural voxels to
functional voxels:
```
from fsl.transform.affine import invert
func2struc = np.loadtxt(op.join(featdir, 'reg', 'example_func2highres.mat'))
func2struc = fromFlirt(func2struc, func, struc, 'voxel', 'voxel')
struc2func = invert(func2struc)
```
Now we can transform our seed coordinates from MNI152 space into functional
space in two stages. First, we'll use our deformation field to transform from
MNI152 space into structural space:
```
seedstruc = struc2std.transform(seedmni, 'world', 'voxel')
seedfunc = transform(seedstruc, struc2func)
print('Seed location in MNI coordinates: ', seedmni)
print('Seed location in functional voxels:', seedfunc)
ortho(func.data, seedfunc, cursor=True)
```
> FNIRT warp fields kind of work backwards - we can use them to transform
> reference coordinates into source coordinates, but would need to invert the
> warp field using `invwarp` if we wanted to transform from source coordinates
> into reference coordinates.
Of course, we can also use our deformation field to resample an image from
structural space into MNI152 space. The `applyDeformation` function takes an
`Image` and a `DeformationField`, and returns a `numpy` array containing the
resampled data.
```
from fsl.transform.nonlinear import applyDeformation
strucmni = applyDeformation(struc, struc2std)
# remove low-valued voxels,
# just for visualisation below
strucmni[strucmni < 1] = 0
fig = ortho(std.data, [45, 54, 45], cmap=plt.cm.gray)
fig = ortho(strucmni, [45, 54, 45], fig=fig)
```
The `premat` option to `applyDeformation` can be used to specify our linear
functional to structural transformation, and hence resample a functional image
into MNI152 space:
```
tstatmni = applyDeformation(tstat1, struc2std, premat=func2struc)
tstatmni[tstatmni < 3] = 0
fig = ortho(std.data, [45, 54, 45], cmap=plt.cm.gray)
fig = ortho(tstatmni, [45, 54, 45], fig=fig)
```
There are a few other useful functions tucked away in the
[`fsl.utils.image`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.image.html)
and
[`fsl.transform`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.html)
packages, with more to be added in the future.
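For example, the affine utilities we have already used can be combined freely
without calling out to any FSL tools (a small sketch using functions shown
earlier in this practical):
```
from fsl.transform.affine import concat, invert, transform
vox2world = std.getAffine('voxel', 'world')
world2vox = invert(vox2world)
# a voxel -> world -> voxel round trip should
# give us back the coordinates we started with
roundtrip = concat(world2vox, vox2world)
print(transform([45, 54, 45], roundtrip))
```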
<a class="anchor" id="fsl-wrapper-functions"></a>
## FSL wrapper functions
The
[fsl.wrappers](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.wrappers.html)
package is the home of "wrapper" functions for a range of FSL tools. You can
use them to call an FSL tool from Python code, without having to worry about
constructing a command-line, or saving/loading input/output images.
> The `fsl.wrappers` functions also allow you to submit jobs to be run on the
> cluster - this is described [below](#submitting-to-the-cluster).
You can use the FSL wrapper functions with file names, similar to calling the
corresponding tool via the command-line:
```
from fsl.wrappers import robustfov
robustfov('08_fslpy/bighead', 'bighead_cropped')
render('08_fslpy/bighead bighead_cropped -cm blue')
```
The `fsl.wrappers` functions strive to provide an interface which is as close
as possible to the command-line tool - most functions use positional arguments
for required options, and keyword arguments for all other options, with
argument names equivalent to command line option names. For example, the usage
for the command-line `bet` tool is as follows:
> ```
> Usage: bet <input> <output> [options]
>
> Main bet2 options:
> -o generate brain surface outline overlaid onto original image
> -m generate binary brain mask
> -s generate approximate skull image
> -n don't generate segmented brain image output
> -f <f> fractional intensity threshold (0->1); default=0.5; smaller values give larger brain outline estimates
> -g <g> vertical gradient in fractional intensity threshold (-1->1); default=0; positive values give larger brain outline at bottom, smaller at top
> -r <r> head radius (mm not voxels); initial surface sphere is set to half of this
> -c <x y z> centre-of-gravity (voxels not mm) of initial mesh surface.
> ...
> ```
So to use the `bet()` wrapper function, pass `<input>` and `<output>` as
positional arguments, and pass the additional options as keyword arguments:
```
from fsl.wrappers import bet
bet('bighead_cropped', 'bighead_cropped_brain', f=0.3, m=True, s=True)
render('bighead_cropped -b 40 '
'bighead_cropped_brain -cm hot '
'bighead_cropped_brain_skull -ot mask -mc 0.4 0.4 1 '
'bighead_cropped_brain_mask -ot mask -mc 0 1 0 -o -w 5')
```
> Some FSL commands accept arguments which cannot be used as Python
> identifiers - for example, the `-2D` option to `flirt` cannot be used as an
> identifier in Python, because it begins with a number. In situations like
> this, an alias is used. So to set the `-2D` option to `flirt`, you can do this:
>
> ```
> # "twod" applies the -2D flag
> flirt('source.nii.gz', 'ref.nii.gz', omat='src2ref.mat', twod=True)
> ```
>
> Some of the `fsl.wrappers` functions also support aliases which may make
> your code more readable. For example, when calling `bet`, you can use either
> `m=True` or `mask=True` to apply the `-m` command line flag.
<a class="anchor" id="in-memory-images"></a>
### In-memory images
It can be quite awkward to combine image processing with FSL tools and image
processing in Python. The `fsl.wrappers` package tries to make this a little
easier for you - if you are working with image data in Python, you can pass
`Image` or `nibabel` objects directly into `fsl.wrappers` functions - they will
be automatically saved to temporary files and passed to the underlying FSL
command:
```
cropped = Image('bighead_cropped')
bet(cropped, 'bighead_cropped_brain')
betted = Image('bighead_cropped_brain')
fig = ortho(cropped.data, (80, 112, 85), cmap=plt.cm.gray)
fig = ortho(betted .data, (80, 112, 85), cmap=plt.cm.inferno, fig=fig)
```
<a class="anchor" id="loading-outputs-into-python"></a>
### Loading outputs into Python
By using the special `fsl.wrappers.LOAD` symbol, you can also have any output
files produced by the tool automatically loaded into memory for you:
```
from fsl.wrappers import LOAD
cropped = Image('bighead_cropped')
# The loaded result is called "output",
# because that is the name of the
# argument in the bet wrapper function.
betted = bet(cropped, LOAD).output
fig = ortho(cropped.data, (80, 112, 85), cmap=plt.cm.gray)
fig = ortho(betted .data, (80, 112, 85), cmap=plt.cm.inferno, fig=fig)
```
You can use the `LOAD` symbol for any output argument - any output files which
are loaded will be available through the return value of the wrapper function:
``` ```
from fsl.wrappers import flirt
std2mm = Image(op.expandvars(op.join('$FSLDIR', 'data', 'standard', 'MNI152_T1_2mm')))
tstat1 = Image(op.join('08_fslpy', 'fmri.feat', 'stats', 'tstat1'))
func2std = np.loadtxt(op.join('08_fslpy', 'fmri.feat', 'reg', 'example_func2standard.mat'))
aligned = flirt(tstat1, std2mm, applyxfm=True, init=func2std, out=LOAD)
# Here the resampled tstat image
# is called "out", because that
# is the name of the flirt argument.
aligned = aligned.out.data
aligned[aligned < 1] = 0
peakvox = np.abs(aligned).argmax()
peakvox = np.unravel_index(peakvox, aligned.shape)
fig = ortho(std2mm .data, peakvox, cmap=plt.cm.gray)
fig = ortho(aligned.data, peakvox, cmap=plt.cm.inferno, fig=fig, cursor=True)
```
For tools like `bet` and `fast`, which expect an output *prefix* or
*basename*, you can just set the prefix to `LOAD` - all output files with that
prefix will be available in the object that is returned:
```
img = Image('bighead_cropped')
betted = bet(img, LOAD, f=0.3, mask=True)
fig = ortho(img .data, (80, 112, 85), cmap=plt.cm.gray)
fig = ortho(betted.output .data, (80, 112, 85), cmap=plt.cm.inferno, fig=fig)
fig = ortho(betted.output_mask.data, (80, 112, 85), cmap=plt.cm.summer, fig=fig, alpha=0.5)
```
<a class="anchor" id="the-fslmaths-wrapper"></a>
### The `fslmaths` wrapper
*Most* of the `fsl.wrappers` functions aim to provide an interface which is as
close as possible to the underlying FSL tool. Ideally, if you read the
command-line help for a tool, you should be able to figure out how to use the
corresponding wrapper function. The wrapper for the `fslmaths` command is a
little different, however. It provides more of an object-oriented interface,
which is hopefully a little easier to use from within Python.
You can apply an `fslmaths` operation by specifying the input image,
*chaining* method calls together, and finally calling the `run()` method. For
example:
```
from fsl.wrappers import fslmaths
fslmaths('bighead_cropped') \
.mas( 'bighead_cropped_brain_mask') \
.run( 'bighead_cropped_brain')
render('bighead_cropped bighead_cropped_brain -cm hot')
```
Of course, you can also use the `fslmaths` wrapper with in-memory images:
```
wholehead = Image('bighead_cropped')
brainmask = Image('bighead_cropped_brain_mask')
eroded = fslmaths(brainmask).ero().ero().run()
erodedbrain = fslmaths(wholehead).mas(eroded).run()
fig = ortho(wholehead .data, (80, 112, 85), cmap=plt.cm.gray)
fig = ortho(brainmask .data, (80, 112, 85), cmap=plt.cm.summer, fig=fig)
fig = ortho(erodedbrain.data, (80, 112, 85), cmap=plt.cm.inferno, fig=fig)
```
<a class="anchor" id="the-filetree"></a>
## The `FileTree`
The
[`fsl.utils.filetree`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.filetree.html)
library provides functionality which allows you to work with *structured data
directories*, such as HCP or BIDS datasets. You can use `filetree` for both
reading and for creating datasets.
This practical gives a very brief introduction to the `filetree` library -
refer to the [full
documentation](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.filetree.html)
to get a feel for how powerful it can be.
<a class="anchor" id="describing-your-data"></a>
### Describing your data
To introduce `filetree`, we'll begin with a small example. Imagine that we
have a dataset which looks like this:
> ```
> mydata
> ├── sub_A
> │ ├── ses_1
> │ │ └── T1w.nii.gz
> │ ├── ses_2
> │ │ └── T1w.nii.gz
> │ └── T2w.nii.gz
> ├── sub_B
> │ ├── ses_1
> │ │ └── T1w.nii.gz
> │ ├── ses_2
> │ │ └── T1w.nii.gz
> │ └── T2w.nii.gz
> └── sub_C
> ├── ses_1
> │ └── T1w.nii.gz
> ├── ses_2
> │ └── T1w.nii.gz
> └── T2w.nii.gz
> ```
(Run the code cell below to create a dummy data set with the above structure):
```
%%bash
for sub in A B C; do
subdir=mydata/sub_$sub/
mkdir -p $subdir
cp $FSLDIR/data/standard/MNI152_T1_2mm.nii.gz $subdir/T2w.nii.gz
for ses in 1 2; do
sesdir=$subdir/ses_$ses/
mkdir $sesdir
cp $FSLDIR/data/standard/MNI152_T1_2mm.nii.gz $sesdir/T1w.nii.gz
done
done
```
To use `filetree` with this dataset, we must first describe its structure - we
do this by creating a `.tree` file:
```
%%writefile mydata.tree
sub_{subject}
T2w.nii.gz
ses_{session}
T1w.nii.gz
```
A `.tree` file is simply a description of the structure of your data
directory - it describes the *file types* (also known as *templates*) which
are present in the dataset (`T1w` and `T2w`), and the *variables* which are
implicitly present in the structure of the dataset (`subject` and `session`).
<a class="anchor" id="using-the-filetree"></a>
### Using the `FileTree`
Now that we have a `.tree` file which describes our data, we can create a
`FileTree` to work with it:
```
from fsl.utils.filetree import FileTree
# Create a FileTree, giving
# it our tree specification,
# and the path to our data.
tree = FileTree.read('mydata.tree', 'mydata')
```
We can list all of the T1 images via the `FileTree.get_all` method. The
`glob_vars='all'` option tells the `FileTree` to fill in the `T1w` template
with all possible combinations of variables. The `FileTree.extract_variables`
method accepts a file path, and gives you back the variable values contained
within:
```
for t1file in tree.get_all('T1w', glob_vars='all'):
fvars = tree.extract_variables('T1w', t1file)
print(t1file, fvars)
```
The `FileTree.update` method allows you to "fill in" variable values; it
returns a new `FileTree` object which can be used on a selection of the
data set:
```
treeA = tree.update(subject='A')
for t1file in treeA.get_all('T1w', glob_vars='all'):
fvars = treeA.extract_variables('T1w', t1file)
print(t1file, fvars)
```
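Once all of the variables have been filled in, you can also use the
`FileTree.get` method to look up the path for a single template. Here is a
quick sketch, re-using the tree defined above:

```
# Fill in both variables, then look up
# the path to that session's T1 image
treeA1 = tree.update(subject='A', session='1')
print(treeA1.get('T1w'))
```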
<a class="anchor" id="building-a-processing-pipeline-with-filetree"></a>
### Building a processing pipeline with `FileTree`
Let's say we want to run BET on all of our T1 images. Let's start by modifying
our `.tree` definition to include the BET outputs:
```
%%writefile mydata.tree
sub_{subject}
T2w.nii.gz
ses_{session}
T1w.nii.gz
T1w_brain.nii.gz
T1w_brain_mask.nii.gz
```
Now we can use the `FileTree` to generate the relevant file names for us,
which we can then pass on to BET. Here we'll use the `FileTree.get_all_trees`
method to create a sub-tree for each subject and each session:
```
from fsl.wrappers import bet
tree = FileTree.read('mydata.tree', 'mydata')
for subtree in tree.get_all_trees('T1w', glob_vars='all'):
t1file = subtree.get('T1w')
t1brain = subtree.get('T1w_brain')
print('Running BET: {} -> {} ...'.format(t1file, t1brain))
bet(t1file, t1brain, mask=True)
print('Done!')
example = tree.update(subject='A', session='1')
render('{} {} -ot mask -o -w 2 -mc 0 1 0'.format(
example.get('T1w'),
example.get('T1w_brain_mask')))
```
<a class="anchor" id="the-filetreequery"></a>
### The `FileTreeQuery`
The `filetree` module contains another class called the
[`FileTreeQuery`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.filetree.query.html),
which provides an interface that is more convenient if you are reading data
from large datasets with many different file types and variables.
When you create a `FileTreeQuery`, it scans the entire data directory and
identifies all of the values that are present for each variable defined in the
`.tree` file:
```
from fsl.utils.filetree import FileTreeQuery
tree = FileTree.read('mydata.tree', 'mydata')
query = FileTreeQuery(tree)
print('T1w variables:', query.variables('T1w'))
print('T2w variables:', query.variables('T2w'))
```
The `FileTreeQuery.query` method will return the paths to all existing files
which match a set of variable values:
```
print('All files for subject A')
for template in query.templates:
print(' {} files:'.format(template))
for match in query.query(template, subject='A'):
print(' ', match.filename)
```
<a class="anchor" id="calling-shell-commands"></a>
## Calling shell commands
The
[`fsl.utils.run`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.run.html)
module provides the `run` and `runfsl` functions, which are wrappers around
the built-in [`subprocess`
library](https://docs.python.org/3/library/subprocess.html).
The default behaviour of `run` is to return the standard output of the
command:
```
from fsl.utils.run import run
# You can pass the command
# and its arguments as a single
# string, or as a sequence
print('Lines in this notebook:', run('wc -l 08_fslpy.md').strip())
print('Words in this notebook:', run(['wc', '-w', '08_fslpy.md']).strip())
```
But you can control what `run` returns, depending on your needs. Let's create
a little script to demonstrate the options:
```
%%writefile mycmd
#!/usr/bin/env bash
exitcode=$1
echo "Standard output!"
echo "Standard error :(" >&2
exit $exitcode
```
And let's not forget to make it executable:
```
!chmod a+x mycmd
```
You can use the `stdout`, `stderr` and `exitcode` arguments to control the
return value:
```
print('run("./mycmd 0"): ',
run("./mycmd 0").strip())
print('run("./mycmd 0", stdout=False): ',
run("./mycmd 0", stdout=False))
print('run("./mycmd 0", exitcode=True):',
run("./mycmd 0", exitcode=True))
print('run("./mycmd 0", stdout=False, exitcode=True):',
run("./mycmd 0", stdout=False, exitcode=True))
print('run("./mycmd 0", stderr=True): ',
run("./mycmd 0", stderr=True))
print('run("./mycmd 0", stdout=False, stderr=True): ',
run("./mycmd 0", stdout=False, stderr=True).strip())
print('run("./mycmd 0", stderr=True, exitcode=True):',
run("./mycmd 0", stderr=True, exitcode=True))
print('run("./mycmd 1", exitcode=True):',
run("./mycmd 1", exitcode=True))
print('run("./mycmd 1", stdout=False, exitcode=True):',
run("./mycmd 1", stdout=False, exitcode=True))
```
So if only one of `stdout`, `stderr`, or `exitcode` is `True`, `run` will only
return the corresponding value. Otherwise `run` will return a tuple which
contains the requested outputs.
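For example, when more than one output is requested, you can unpack the
returned tuple directly (a quick sketch, re-using the `mycmd` script from
above - the unpacking order assumes that standard output comes before the
exit code):

```
# stdout is returned by default, and we have also
# asked for the exit code, so here we assume run
# gives us back a (stdout, exitcode) tuple
stdout, exitcode = run('./mycmd 0', exitcode=True)
print('stdout:  ', stdout.strip())
print('exitcode:', exitcode)
```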
If you run a command which returns a non-0 exit code, the default behaviour
(if you don't set `exitcode=True`) is for a `RuntimeError` to be raised:
```
run("./mycmd 99")
```
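If you would rather handle such failures yourself, one option (a minimal
sketch) is to catch the exception:

```
# run raises a RuntimeError when the command
# returns a non-0 exit code, so we can catch
# it and carry on
try:
    run('./mycmd 99')
except RuntimeError as e:
    print('Command failed:', e)
```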
<a class="anchor" id="the-runfsl-function"></a>
### The `runfsl` function
The `runfsl` function is a wrapper around `run` which simply makes sure that
the command you are calling is inside the `$FSLDIR/bin/` directory. It has the
same usage as the `run` function:
```
from fsl.utils.run import runfsl
runfsl('bet bighead_cropped bighead_cropped_brain')
runfsl('fslroi bighead_cropped_brain bighead_slices 0 -1 0 -1 90 3')
runfsl('fast -o bighead_fast bighead_slices')
render('-vl 80 112 91 -xh -yh '
'bighead_cropped '
'bighead_slices.nii.gz -cm brain_colours_1hot -b 30 '
'bighead_fast_seg.nii.gz -ot label -o')
```
<a class="anchor" id="submitting-to-the-cluster"></a>
### Submitting to the cluster
Both the `run` and `runfsl` functions accept an argument called `submit`, which allows
you to submit jobs to be executed on the cluster via the FSL `fsl_sub`
command.
> Cluster submission is handled by the
> [`fsl.utils.fslsub`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.fslsub.html)
> module - it contains lower level functions for managing and querying jobs
> that have been submitted to the cluster. The functions defined in this
> module can be used directly if you have more complicated requirements.
The semantics of the `run` and `runfsl` functions are slightly different when
you use the `submit` option - instead of waiting for the command to complete,
they return immediately, giving you back a string which contains the ID of the
submitted job:
```
jobid = run('ls', submit=True)
print('Job ID:', jobid)
```
Once the job finishes, we should be able to read the usual `.o` and `.e`
files:
```
stdout = f'ls.o{jobid}'
print('Job output')
print(open(stdout).read())
```
All of the `fsl.wrappers` functions also accept the `submit` argument:
```
jobid = bet('08_fslpy/bighead', 'bighead_brain', submit=True)
print('Job ID:', jobid)
```
> But an error will occur if you try to pass in-memory images, or `LOAD` any
> outputs when you call a wrapper function with `submit=True`.
After submitting a job, you can use the `wait` function to wait until a job
has completed:
```
from fsl.utils.run import wait
jobid = bet('08_fslpy/bighead', 'bighead_brain', submit=True)
print('Job ID:', jobid)
wait(jobid)
print('Done!')
render('08_fslpy/bighead bighead_brain -cm hot')
```
When you use `submit=True`, you can also specify cluster submission options -
you can include any arguments that are accepted by the
[`fslsub.submit`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.fslsub.html#fsl.utils.fslsub.submit)
function:
```
jobs = []
jobs.append(runfsl('robustfov -i 08_fslpy/bighead -r bighead_cropped', submit=True, queue='short.q'))
jobs.append(runfsl('bet bighead_cropped bighead_brain', submit=True, queue='short.q', wait_for=jobs[-1]))
jobs.append(runfsl('fslroi bighead_brain bighead_slices 0 -1 111 3 0 -1', submit=True, queue='short.q', wait_for=jobs[-1]))
jobs.append(runfsl('fast -o bighead_fast bighead_slices', submit=True, queue='short.q', wait_for=jobs[-1]))
print('Waiting for', jobs, '...')
wait(jobs)
render('-vl 80 112 91 -xh -zh -hc '
'bighead_brain '
'bighead_slices.nii.gz -cm brain_colours_1hot -b 30 '
'bighead_fast_seg.nii.gz -ot label -o')
```
<a class="anchor" id="redirecting-output"></a>
### Redirecting output
The `log` option, accepted by both `run` and `runfsl`, allows for more
fine-grained control over what is done with the standard output and error
streams.
You can use `'tee'` to redirect the standard output and error streams of the
command to the standard output and error streams of the calling command (your
Python script):
```
print('Teeing:')
_ = run('./mycmd 0', log={'tee' : True})
```
Or you can use `'stdout'` and `'stderr'` to redirect the standard output and
error streams of the command to files:
```
with open('stdout.log', 'wt') as o, \
open('stderr.log', 'wt') as e:
run('./mycmd 0', log={'stdout' : o, 'stderr' : e})
print('\nRedirected stdout:')
!cat stdout.log
print('\nRedirected stderr:')
!cat stderr.log
```
Finally, you can use `'cmd'` to log the command itself to a file (useful for
pipeline logging):
```
with open('commands.log', 'wt') as cmdlog:
run('./mycmd 0', log={'cmd' : cmdlog})
run('wc -l 08_fslpy.md', log={'cmd' : cmdlog})
print('\nCommand log:')
!cat commands.log
```
<a class="anchor" id="fsl-atlases"></a>
## FSL atlases
The
[`fsl.data.atlases`](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.data.atlases.html)
module provides access to all of the atlas images that are stored in the
`$FSLDIR/data/atlases/` directory of a standard FSL installation. It can be
used to load and query probabilistic and label-based atlases.
The `atlases` module needs to be initialised using the `rescanAtlases` function:
```
import fsl.data.atlases as atlases
atlases.rescanAtlases()
```
<a class="anchor" id="querying-atlases"></a>
### Querying atlases
You can list all of the available atlases using `listAtlases`:
```
for desc in atlases.listAtlases():
print(desc)
```
`listAtlases` returns a list of `AtlasDescription` objects, each of which
contains descriptive information about one atlas. You can retrieve the
`AtlasDescription` for a specific atlas via the `getAtlasDescription`
function:
```
desc = atlases.getAtlasDescription('harvardoxford-cortical')
print(desc.name)
print(desc.atlasID)
print(desc.specPath)
print(desc.atlasType)
```
Each `AtlasDescription` maintains a list of `AtlasLabel` objects, each of
which represents one region that is defined in the atlas. You can access all
of the `AtlasLabel` objects via the `labels` attribute:
```
for lbl in desc.labels[:5]:
print(lbl)
```
Or you can retrieve a specific label using the `find` method:
```
# search by region name
print(desc.find(name='Occipital Pole'))
# or by label value
print(desc.find(value=48))
```
<a class="anchor" id="loading-atlas-images"></a>
### Loading atlas images
The `loadAtlas` function can be used to load the atlas image:
```
# For probabilistic atlases, you
# can ask for the 3D ROI image
# by setting loadSummary=True.
# You can also request a
# resolution - by default the
# highest resolution version
# will be loaded.
lblatlas = atlases.loadAtlas('harvardoxford-cortical',
loadSummary=True,
resolution=2)
# By default you will get the 4D
# probabilistic atlas image (for
# atlases for which this is
# available).
probatlas = atlases.loadAtlas('harvardoxford-cortical',
resolution=2)
print(lblatlas)
print(probatlas)
```
<a class="anchor" id="working-with-atlases"></a>
### Working with atlases
Both `LabelAtlas` and `ProbabilisticAtlas` objects have a method called `get`,
which can be used to extract ROI images for a specific region:
```
stddir = op.expandvars('${FSLDIR}/data/standard/')
std2mm = Image(op.join(stddir, 'MNI152_T1_2mm'))
frontal = lblatlas.get(name='Frontal Pole').data
frontal = np.ma.masked_where(frontal < 1, frontal)
fig = ortho(std2mm.data, (45, 54, 45), cmap=plt.cm.gray)
fig = ortho(frontal, (45, 54, 45), cmap=plt.cm.winter, fig=fig)
```
Calling `get` on a `ProbabilisticAtlas` will return a probability image:
```
stddir = op.expandvars('${FSLDIR}/data/standard/')
std2mm = Image(op.join(stddir, 'MNI152_T1_2mm'))
frontal = probatlas.get(name='Frontal Pole').data
frontal = np.ma.masked_where(frontal < 1, frontal)
fig = ortho(std2mm.data, (45, 54, 45), cmap=plt.cm.gray)
fig = ortho(frontal, (45, 54, 45), cmap=plt.cm.inferno, fig=fig)
```
The `get` method can be used to retrieve an image for a region by:
- an `AtlasLabel` object
- the region index
- the region value
- the region name
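For example, here is a small sketch which retrieves the same region by name
and by value (the `value` keyword is an assumption here, mirroring the list
above):

```
# Look up the Frontal Pole label in the atlas
# description, then retrieve the region by name,
# and (assuming a 'value' keyword) by label value
lbl     = desc.find(name='Frontal Pole')
byname  = lblatlas.get(name='Frontal Pole')
byvalue = lblatlas.get(value=lbl.value)
print(np.all(byname.data == byvalue.data))
```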
`LabelAtlas` objects have a method called `label`, which can be used to
interrogate the atlas at specific locations:
```
# The label method accepts 3D
# voxel or world coordinates
val = lblatlas.label((25, 52, 43), voxel=True)
lbl = lblatlas.find(value=val)
print('Region at voxel [25, 52, 43]: {} [{}]'.format(val, lbl.name))
# or a 3D weighted or binary mask
mask = np.zeros(lblatlas.shape)
mask[30:60, 30:60, 30:60] = 1
mask = Image(mask, header=lblatlas.header)
lbls, props = lblatlas.label(mask)
print('Labels in mask:')
for lbl, prop in zip(lbls, props):
lblname = lblatlas.find(value=lbl).name
print(' {} [{}]: {:0.2f}%'.format(lbl, lblname, prop))
```
`ProbabilisticAtlas` objects have an analogous method called `values`:
```
vals = probatlas.values((25, 52, 43), voxel=True)
print('Regions at voxel [25, 52, 43]:')
for idx, val in enumerate(vals):
if val > 0:
lbl = probatlas.find(index=idx)
print(' {} [{}]: {:0.2f}%'.format(lbl.value, lbl.name, val))
print('Average proportions of regions within mask:')
vals = probatlas.values(mask)
for idx, val in enumerate(vals):
if val > 0:
lbl = probatlas.find(index=idx)
print(' {} [{}]: {:0.2f}%'.format(lbl.value, lbl.name, val))
```