Commit cd8499d7 authored by Paul McCarthy's avatar Paul McCarthy :mountain_bicyclist:

Merge branch 'master' into 'master'

New threading/parallel practical

See merge request fsl/pytreat-2018-practicals!37
parents b9d95b74 6ad4b884
%% Cell type:markdown id: tags:
# Decorators
Remember that in Python, everything is an object, including functions. This
means that we can do things like:
- Pass a function as an argument to another function.
- Create/define a function inside another function.
- Write a function which returns another function.
These abilities mean that we can do some neat things with functions in Python.
* [Overview](#overview)
* [Decorators on methods](#decorators-on-methods)
* [Example - memoization](#example-memoization)
* [Decorators with arguments](#decorators-with-arguments)
* [Chaining decorators](#chaining-decorators)
* [Decorator classes](#decorator-classes)
* [Appendix: Functions are not special](#appendix-functions-are-not-special)
* [Appendix: Closures](#appendix-closures)
* [Appendix: Decorators without arguments versus decorators with arguments](#appendix-decorators-without-arguments-versus-decorators-with-arguments)
* [Appendix: Per-instance decorators](#appendix-per-instance-decorators)
* [Appendix: Preserving function metadata](#appendix-preserving-function-metadata)
* [Appendix: Class decorators](#appendix-class-decorators)
* [Useful references](#useful-references)
<a class="anchor" id="overview"></a>
## Overview
Let's say that we want a way to calculate the execution time of any function
(this example might feel familiar to you if you have gone through the
practical on operator overloading).
Our first attempt at writing such a function might look like this:
%% Cell type:code id: tags:
```
import time
def timeFunc(func, *args, **kwargs):
    start  = time.time()
    retval = func(*args, **kwargs)
    end    = time.time()
    print('Ran {} in {:0.2f} seconds'.format(func.__name__, end - start))
    return retval
```
%% Cell type:markdown id: tags:
The `timeFunc` function accepts another function, `func`, as its first
argument. It calls `func`, passing it all of the other arguments, and then
prints the time taken for `func` to complete:
%% Cell type:code id: tags:
```
import numpy as np
import numpy.linalg as npla
def inverse(a):
    return npla.inv(a)
data = np.random.random((2000, 2000))
invdata = timeFunc(inverse, data)
```
%% Cell type:markdown id: tags:
But this means that whenever we want to time something, we have to call the
`timeFunc` function directly. Let's take advantage of the fact that we can
define a function inside another function. Look at the next block of code
carefully, and make sure you understand what our new `timeFunc` implementation
is doing.
%% Cell type:code id: tags:
```
import time
def timeFunc(func):
    def wrapperFunc(*args, **kwargs):
        start  = time.time()
        retval = func(*args, **kwargs)
        end    = time.time()
        print('Ran {} in {:0.2f} seconds'.format(func.__name__, end - start))
        return retval
    return wrapperFunc
```
%% Cell type:markdown id: tags:
This new `timeFunc` function is again passed a function `func`, but this time
as its sole argument. It then creates and returns a new function,
`wrapperFunc`. This `wrapperFunc` function calls and times the function that
was passed to `timeFunc`. But note that when `timeFunc` is called,
`wrapperFunc` is _not_ called - it is only created and returned.
Let's use our new `timeFunc` implementation:
%% Cell type:code id: tags:
```
import numpy as np
import numpy.linalg as npla
def inverse(a):
    return npla.inv(a)
data = np.random.random((2000, 2000))
inverse = timeFunc(inverse)
invdata = inverse(data)
```
%% Cell type:markdown id: tags:
Here, we did the following:
1. We defined a function called `inverse`:
> ```
> def inverse(a):
>     return npla.inv(a)
> ```
2. We passed the `inverse` function to the `timeFunc` function, and
re-assigned the return value of `timeFunc` back to `inverse`:
> ```
> inverse = timeFunc(inverse)
> ```
3. We called the new `inverse` function:
> ```
> invdata = inverse(data)
> ```
So now the `inverse` variable refers to an instantiation of `wrapperFunc`,
which holds a reference to the original definition of `inverse`.
> If this is not clear, take a break now and read through the appendix on how
> [functions are not special](#appendix-functions-are-not-special).
Guess what? We have just created a __decorator__. A decorator is simply a
function which accepts a function as its input, and returns another function
as its output. In the example above, we have _decorated_ the `inverse`
function with the `timeFunc` decorator.
Python provides an alternative syntax for decorating one function with
another, using the `@` character. The approach that we used to decorate
`inverse` above:
%% Cell type:code id: tags:
```
def inverse(a):
    return npla.inv(a)
inverse = timeFunc(inverse)
invdata = inverse(data)
```
%% Cell type:markdown id: tags:
is semantically equivalent to this:
%% Cell type:code id: tags:
```
@timeFunc
def inverse(a):
    return npla.inv(a)
invdata = inverse(data)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="decorators-on-methods"></a>
## Decorators on methods
Applying a decorator to the methods of a class works in the same way:
%% Cell type:code id: tags:
```
import numpy.linalg as npla
class MiscMaths(object):
    @timeFunc
    def inverse(self, a):
        return npla.inv(a)
```
%% Cell type:markdown id: tags:
Now, the `inverse` method of all `MiscMaths` instances will be timed:
%% Cell type:code id: tags:
```
mm1 = MiscMaths()
mm2 = MiscMaths()
i1 = mm1.inverse(np.random.random((1000, 1000)))
i2 = mm2.inverse(np.random.random((1500, 1500)))
```
%% Cell type:markdown id: tags:
Note that only one `timeFunc` decorator was created here - the `timeFunc`
function was only called once - when the `MiscMaths` class was defined. This
might be clearer if we re-write the above code in the following (equivalent)
manner:
%% Cell type:code id: tags:
```
class MiscMaths(object):
    def inverse(self, a):
        return npla.inv(a)

MiscMaths.inverse = timeFunc(MiscMaths.inverse)
```
%% Cell type:markdown id: tags:
So only one `wrapperFunc` function exists, and this function is _shared_ by
all instances of the `MiscMaths` class (such as the `mm1` and `mm2`
instances in the example above). In many cases this is not a problem, but
there can be situations where you need each instance of your class to have its
own unique decorator.
> If you are interested in solutions to this problem, take a look at the
> appendix on [per-instance decorators](#appendix-per-instance-decorators).
<a class="anchor" id="example-memoization"></a>
## Example - memoization
Let's move onto another example.
[Meowmoization](https://en.wikipedia.org/wiki/Memoization) is a common
performance optimisation technique used in cats. I mean software. Essentially,
memoization refers to the process of maintaining a cache for a function which
performs some expensive calculation. When the function is executed with a set
of inputs, the calculation is performed, and then a copy of the inputs and the
result are cached. If the function is called again with the same inputs, the
cached result can be returned.
This is a perfect problem to tackle with decorators:
%% Cell type:code id: tags:
```
def memoize(func):
    cache = {}
    def wrapper(*args):
        # is there a value in the cache
        # for this set of inputs?
        cached = cache.get(args, None)
        # If not, call the function,
        # and cache the result.
        if cached is None:
            cached = func(*args)
            cache[args] = cached
        else:
            print('Cached {}({}): {}'.format(func.__name__, args, cached))
        return cached
    return wrapper
```
%% Cell type:markdown id: tags:
We can now use our `memoize` decorator to add a memoization cache to any
function. Let's memoize a function which generates the $n^{th}$ number in the
[Fibonacci series](https://en.wikipedia.org/wiki/Fibonacci_number):
%% Cell type:code id: tags:
```
@memoize
def fib(n):
    if n in (0, 1):
        print('fib({}) = {}'.format(n, n))
        return n
    twoback = 1
    oneback = 1
    val     = 1
    for _ in range(2, n):
        val     = oneback + twoback
        twoback = oneback
        oneback = val
    print('fib({}) = {}'.format(n, val))
    return val
```
%% Cell type:markdown id: tags:
For a given input, when `fib` is called the first time, it will calculate the
$n^{th}$ Fibonacci number:
%% Cell type:code id: tags:
```
for i in range(10):
    fib(i)
```
%% Cell type:markdown id: tags:
However, on repeated calls with the same input, the calculation is skipped,
and instead the result is retrieved from the memoization cache:
%% Cell type:code id: tags:
```
for i in range(10):
    fib(i)
```
%% Cell type:markdown id: tags:
> If you are wondering how the `wrapper` function is able to access the
> `cache` variable, refer to the [appendix on closures](#appendix-closures).
<a class="anchor" id="decorators-with-arguments"></a>
## Decorators with arguments
Continuing with our memoization example, let's say that we want to place a
limit on the maximum size that our cache can grow to. For example, the output
of our function might have large memory requirements, so we can only afford to
store a handful of pre-calculated results. It would be nice to be able to
specify the maximum cache size when we define our function to be memoized,
like so:
> ```
> # cache at most 10 results
> @limitedMemoize(10)
> def fib(n):
>     ...
> ```
In order to support this, our `memoize` decorator function needs to be
modified - it is currently written to accept a function as its sole argument,
but we need it to accept a cache size limit.
%% Cell type:code id: tags:
```
from collections import OrderedDict
def limitedMemoize(maxSize):
    cache = OrderedDict()
    def decorator(func):
        def wrapper(*args):
            # is there a value in the cache
            # for this set of inputs?
            cached = cache.get(args, None)
            # If not, call the function,
            # and cache the result.
            if cached is None:
                cached = func(*args)
                # If the cache has grown too big,
                # remove the oldest item. In practice
                # it would make more sense to remove
                # the item with the oldest access
                # time, but this is good enough for
                # an introduction!
                if len(cache) >= maxSize:
                    cache.popitem(last=False)
                cache[args] = cached
            else:
                print('Cached {}({}): {}'.format(func.__name__, args, cached))
            return cached
        return wrapper
    return decorator
```
%% Cell type:markdown id: tags:
> We used the handy
> [`collections.OrderedDict`](https://docs.python.org/3.5/library/collections.html#collections.OrderedDict)
> class here which preserves the insertion order of key-value pairs.
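As a quick illustration of the `OrderedDict` behaviour that `limitedMemoize`
relies upon - `popitem(last=False)` removes entries in first-inserted,
first-removed order:

```
from collections import OrderedDict

cache = OrderedDict()
cache['a'] = 1
cache['b'] = 2
cache['c'] = 3

# evict the oldest entry
key, val = cache.popitem(last=False)

print(key, val)            # a 1
print(list(cache.keys()))  # ['b', 'c']
```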
This is starting to look a little complicated - we now have _three_ layers of
functions. This is necessary when you wish to write a decorator which accepts
arguments (refer to the
[appendix](#appendix-decorators-without-arguments-versus-decorators-with-arguments)
for more details).
But this `limitedMemoize` decorator is used in essentially the same way as our
earlier `memoize` decorator:
%% Cell type:code id: tags:
```
@limitedMemoize(5)
def fib(n):
    if n in (0, 1):
        print('fib({}) = {}'.format(n, n))
        return n
    twoback = 1
    oneback = 1
    val     = 1
    for _ in range(2, n):
        val     = oneback + twoback
        twoback = oneback
        oneback = val
    print('fib({}) = {}'.format(n, val))
    return val
```
%% Cell type:markdown id: tags:
Except that now, the `fib` function will only cache up to 5 values.
%% Cell type:code id: tags:
```
fib(10)
fib(11)
fib(12)
fib(13)
fib(14)
print('The result for 10 should come from the cache')
fib(10)
fib(15)
print('The result for 10 should no longer be cached')
fib(10)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="chaining-decorators"></a>
## Chaining decorators
Decorators can easily be chained, or nested:
%% Cell type:code id: tags:
```
import time
@timeFunc
@memoize
def expensiveFunc(n):
    time.sleep(n)
    return n
```
%% Cell type:markdown id: tags:
> Remember that this is semantically equivalent to the following:
>
> ```
> def expensiveFunc(n):
>     time.sleep(n)
>     return n
>
> expensiveFunc = timeFunc(memoize(expensiveFunc))
> ```
Now we can see the effect of our memoization layer on performance:
%% Cell type:code id: tags:
```
expensiveFunc(0.5)
expensiveFunc(1)
expensiveFunc(1)
```
%% Cell type:markdown id: tags:
> Note that in Python 3.2 and newer you can use the
> [`functools.lru_cache`](https://docs.python.org/3/library/functools.html#functools.lru_cache)
> decorator to memoize your functions.
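For example, here is a rough equivalent of our memoized `fib`, re-written to
use `functools.lru_cache` - the `maxsize` argument plays the same role as the
cache size limit in our `limitedMemoize` decorator:

```
import functools

@functools.lru_cache(maxsize=5)
def fib(n):
    # iterative Fibonacci, as in the examples above
    if n in (0, 1):
        return n
    twoback, oneback = 1, 1
    for _ in range(2, n):
        twoback, oneback = oneback, oneback + twoback
    return oneback

print(fib(10))           # calculated
print(fib(10))           # served from the cache
print(fib.cache_info())  # hit/miss statistics
```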
<a class="anchor" id="decorator-classes"></a>
## Decorator classes
By now, you will have gained the impression that a decorator is a function
which _decorates_ another function. But if you went through the practical on
operator overloading, you might remember the special `__call__` method, that
allows an object to be called as if it were a function.
This feature allows us to write our decorators as classes, instead of
functions. This can be handy if you are writing a decorator that has
complicated behaviour, and/or needs to maintain some sort of state which
cannot be easily or elegantly written using nested functions.
As an example, let's say we are writing a framework for unit testing. We want
to be able to "mark" our test functions like so, so they can be easily
identified and executed:
> ```
> @unitTest
> def testblerk():
>     """tests the blerk algorithm."""
>     ...
> ```
With a decorator like this, we wouldn't need to worry about where our tests
are located - they will all be detected because we have marked them as test
functions. What does this `unitTest` decorator look like?
%% Cell type:code id: tags:
```
class TestRegistry(object):

    def __init__(self):
        self.testFuncs = []

    def __call__(self, func):
        self.testFuncs.append(func)
        # return the function unchanged, so
        # that it can still be called directly
        return func

    def listTests(self):
        print('All registered tests:')
        for test in self.testFuncs:
            print(' ', test.__name__)

    def runTests(self):
        for test in self.testFuncs:
            print('Running test {:10s} ... '.format(test.__name__), end='')
            try:
                test()
                print('passed!')
            except Exception:
                print('failed!')

# Create our test registry
registry = TestRegistry()

# Alias our registry to "unitTest"
# so that we can register tests
# with a "@unitTest" decorator.
unitTest = registry
```
%% Cell type:markdown id: tags:
So we've defined a class, `TestRegistry`, and created an instance of it,
`registry`, which will manage all of our unit tests. Now, in order to "mark"
any function as being a unit test, we just need to use the `unitTest`
decorator (which is simply a reference to our `TestRegistry` instance):
%% Cell type:code id: tags:
```
@unitTest
def testFoo():
    assert 'a' in 'bcde'

@unitTest
def testBar():
    assert 1 > 0

@unitTest
def testBlerk():
    assert 9 % 2 == 0
```
%% Cell type:markdown id: tags:
Now that these functions have been registered with our `TestRegistry`
instance, we can run them all:
%% Cell type:code id: tags:
```
registry.listTests()
registry.runTests()
```
%% Cell type:markdown id: tags:
> Unit testing is something which you must do! This is __especially__
> important in an interpreted language such as Python, where there is no
> compiler to catch all of your mistakes.
>
> Python has a built-in
> [`unittest`](https://docs.python.org/3.5/library/unittest.html) module,
> but the third-party [`pytest`](https://docs.pytest.org/en/latest/) and
> [`nose`](http://nose2.readthedocs.io/en/latest/) frameworks are also
> popular. It is also wise to combine your unit tests with
> [`coverage`](https://coverage.readthedocs.io/en/coverage-4.5.1/), which
> tells you how much of your code was executed, or _covered_, when your
> tests were run.
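As a taste of `pytest`: it discovers functions whose names start with `test_`
in files named `test_*.py`, so no registry object is needed at all. A
hypothetical file called `test_maths.py` (the file and function names here
are made up for illustration) might look like this:

```
# Contents of a hypothetical test_maths.py - running
# "pytest" in the same directory will discover and
# execute the test functions automatically.

def add(a, b):
    return a + b

def test_add():
    assert add(1, 2)  == 3
    assert add(-1, 1) == 0
```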
<a class="anchor" id="appendix-functions-are-not-special"></a>
## Appendix: Functions are not special
When we write a statement like this:
%% Cell type:code id: tags:
```
a = [1, 2, 3]
```
%% Cell type:markdown id: tags:
the variable `a` is a reference to a `list`. We can create a new reference to
the same list, and delete `a`:
%% Cell type:code id: tags:
```
b = a
del a
```
%% Cell type:markdown id: tags:
Deleting `a` doesn't affect the list at all - the list still exists, and is
now referred to by a variable called `b`.
%% Cell type:code id: tags:
```
print('b: ', b)
```
%% Cell type:markdown id: tags:
`a` has, however, been deleted:
%% Cell type:code id: tags:
```
print('a: ', a)
```
%% Cell type:markdown id: tags:
The variables `a` and `b` are just references to a list that is sitting in
memory somewhere - renaming or removing a reference does not have any effect
upon the list<sup>2</sup>.
If you are familiar with C or C++, you can think of a variable in Python as
like a `void *` pointer - it is just a pointer of an unspecified type, which
is pointing to some item in memory (which does have a specific type). Deleting
the pointer does not have any effect upon the item to which it was pointing.
> <sup>2</sup> Until no more references to the list exist, at which point it
> will be
> [garbage-collected](https://www.quora.com/How-does-garbage-collection-in-Python-work-What-are-the-pros-and-cons).
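If you are curious, CPython lets you inspect the number of references to an
object with `sys.getrefcount`:

```
import sys

a = [1, 2, 3]
b = a

# "a" and "b" are two references to one list. Note that
# getrefcount reports one extra, temporary reference,
# created for its own function argument.
print(b is a)
print(sys.getrefcount(a))
```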
Now, functions in Python work in _exactly_ the same way as variables. When we
define a function like this:
%% Cell type:code id: tags:
```
def inverse(a):
    return npla.inv(a)
print(inverse)
```
%% Cell type:markdown id: tags:
there is nothing special about the name `inverse` - `inverse` is just a
reference to a function that resides somewhere in memory. We can create a new
reference to this function:
%% Cell type:code id: tags:
```
inv2 = inverse
```
%% Cell type:markdown id: tags:
And delete the old reference:
%% Cell type:code id: tags:
```
del inverse
```
%% Cell type:markdown id: tags:
But the function still exists, and is still callable, via our second
reference:
%% Cell type:code id: tags:
```
print(inv2)
data = np.random.random((10, 10))
invdata = inv2(data)
```
%% Cell type:markdown id: tags:
So there is nothing special about functions in Python - they are just items
that reside somewhere in memory, and to which we can create as many references
as we like.
> If it bothers you that `print(inv2)` resulted in
> `<function inverse at ...>`, and not `<function inv2 at ...>`, then refer to
> the appendix on
> [preserving function metadata](#appendix-preserving-function-metadata).
<a class="anchor" id="appendix-closures"></a>
## Appendix: Closures
Whenever we define or use a decorator, we are taking advantage of a concept
called a [_closure_][wiki-closure]. Take a second to re-familiarise yourself
with our `memoize` decorator function from earlier - when `memoize` is called,
it creates and returns a function called `wrapper`:
[wiki-closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming)
%% Cell type:code id: tags:
```
def memoize(func):
    cache = {}
    def wrapper(*args):
        # is there a value in the cache
        # for this set of inputs?
        cached = cache.get(args, None)
        # If not, call the function,
        # and cache the result.
        if cached is None:
            cached = func(*args)
            cache[args] = cached
        else:
            print('Cached {}({}): {}'.format(func.__name__, args, cached))
        return cached
    return wrapper
```
%% Cell type:markdown id: tags:
Then `wrapper` is executed at some arbitrary point in the future. But how does
it have access to `cache`, defined within the scope of the `memoize` function,
after the execution of `memoize` has ended?
%% Cell type:code id: tags:
```
def nby2(n):
    return n * 2
# wrapper function is created here (and
# assigned back to the nby2 reference)
nby2 = memoize(nby2)
# wrapper function is executed here
print('nby2(2): ', nby2(2))
print('nby2(2): ', nby2(2))
```
%% Cell type:markdown id: tags:
The trick is that whenever a nested function is defined in Python, the scope
in which it is defined is preserved for that function's lifetime. So `wrapper`
has access to all of the variables within the `memoize` function's scope, that
were defined at the time that `wrapper` was created (which was when we called
`memoize`). This is why `wrapper` is able to access `cache`, even though at
the time that `wrapper` is called, the execution of `memoize` has long since
finished.
This is what is known as a
[_closure_](https://www.geeksforgeeks.org/python-closures/). Closures are a
fundamental, and extremely powerful, aspect of Python and other high level
languages. So there's your answer,
[fishbulb](https://www.youtube.com/watch?v=CiAaEPcnlOg).
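If you want to see a closure in action, the closed-over variables of a nested
function can be inspected through its `__closure__` attribute (using a
simplified `memoize`, without the caching message):

```
def memoize(func):
    # simplified memoize, without the "Cached ..." message
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

nby2 = memoize(lambda n: n * 2)
nby2(2)

# wrapper holds references to "cache" and "func"
# in its closure cells
for name, cell in zip(nby2.__code__.co_freevars, nby2.__closure__):
    print(name, '=', cell.cell_contents)
```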
<a class="anchor" id="appendix-decorators-without-arguments-versus-decorators-with-arguments"></a>
## Appendix: Decorators without arguments versus decorators with arguments
There are three ways to invoke a decorator with the `@` notation:
1. Naming it, e.g. `@mydecorator`
2. Calling it, e.g. `@mydecorator()`
3. Calling it, and passing it arguments, e.g. `@mydecorator(1, 2, 3)`
Python expects a decorator function to behave differently in the second and
third scenarios, when compared to the first:
%% Cell type:code id: tags:
```
def decorator(*args):
    print(' decorator({})'.format(args))
    def wrapper(*args):
        print(' wrapper({})'.format(args))
    return wrapper

print('Scenario #1: @decorator')
@decorator
def noop():
    pass

print('\nScenario #2: @decorator()')
@decorator()
def noop():
    pass

print('\nScenario #3: @decorator(1, 2, 3)')
@decorator(1, 2, 3)
def noop():
    pass
```
%% Cell type:markdown id: tags:
So if a decorator is "named" (scenario 1), only the decorator function
(`decorator` in the example above) is called, and is passed the decorated
function.
But if a decorator function is "called" (scenarios 2 or 3), both the decorator
function (`decorator`), __and its return value__ (`wrapper`) are called - the
decorator function is passed the arguments that were provided, and its return
value is passed the decorated function.
This is why, if you are writing a decorator function which expects arguments,
you must use three layers of functions, like so:
%% Cell type:code id: tags:
```
def decorator(*args):
    def realDecorator(func):
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        return wrapper
    return realDecorator
```
%% Cell type:markdown id: tags:
> The author of this practical is angry about this, as he does not understand
> why the Python language designers couldn't allow a decorator function to be
> passed both the decorated function, and any arguments that were passed when
> the decorator was invoked, like so:
>
> ```
> def decorator(func, *args, **kwargs): # args/kwargs here contain
>                                       # whatever is passed to the
>                                       # decorator
>
>     def wrapper(*args, **kwargs):     # args/kwargs here contain
>                                       # whatever is passed to the
>                                       # decorated function
>         return func(*args, **kwargs)
>
>     return wrapper
> ```
<a class="anchor" id="appendix-per-instance-decorators"></a>
## Appendix: Per-instance decorators
In the section on [decorating methods](#decorators-on-methods), you saw
that when a decorator is applied to a method of a class, that decorator
is invoked just once, and shared by all instances of the class. Consider this
example:
%% Cell type:code id: tags:
```
def decorator(func):
    print('Decorating {} function'.format(func.__name__))
    def wrapper(*args, **kwargs):
        print('Calling decorated function {}'.format(func.__name__))
        return func(*args, **kwargs)
    return wrapper

class MiscMaths(object):
    @decorator
    def add(self, a, b):
        return a + b
```
%% Cell type:markdown id: tags:
Note that `decorator` was called at the time that the `MiscMaths` class was
defined. Now, all `MiscMaths` instances share the same `wrapper` function:
%% Cell type:code id: tags:
```
mm1 = MiscMaths()
mm2 = MiscMaths()
print('1 + 2 =', mm1.add(1, 2))
print('3 + 4 =', mm2.add(3, 4))
```
%% Cell type:markdown id: tags:
This is not an issue in many cases, but it can be problematic in some. Imagine
if we have a decorator called `ensureNumeric`, which makes sure that arguments
passed to a function are numbers:
%% Cell type:code id: tags:
```
def ensureNumeric(func):
    def wrapper(*args):
        args = tuple([float(a) for a in args])
        return func(*args)
    return wrapper
```
%% Cell type:markdown id: tags:
This all looks well and good - we can use it to decorate a numeric function,
allowing strings to be passed in as well:
%% Cell type:code id: tags:
```
@ensureNumeric
def mul(a, b):
    return a * b
print(mul( 2, 3))
print(mul('5', '10'))
```
%% Cell type:markdown id: tags:
But what will happen when we try to decorate a method of a class?
%% Cell type:code id: tags:
```
class MiscMaths(object):
    @ensureNumeric
    def add(self, a, b):
        return a + b
mm = MiscMaths()
print(mm.add('5', 10))
```
%% Cell type:markdown id: tags:
What happened here?? Remember that the first argument passed to any instance
method is the instance itself (the `self` argument). Well, the `MiscMaths`
instance was passed to the `wrapper` function, which then tried to convert it
into a `float`. So we can't actually apply the `ensureNumeric` function as a
decorator on a method in this way.
There are a few potential solutions here. We could modify the `ensureNumeric`
function, so that the `wrapper` ignores the first argument. But this would
mean that we couldn't use `ensureNumeric` with standalone functions.
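For illustration, such a method-only variant (here called
`ensureNumericMethod` - a made-up name) might look like this; note that it
would in turn be broken for standalone functions, whose first argument would
escape conversion:

```
def ensureNumericMethod(func):
    def wrapper(self, *args):
        # pass "self" through untouched, and convert
        # the remaining arguments to floats
        args = tuple(float(a) for a in args)
        return func(self, *args)
    return wrapper

class MiscMaths(object):
    @ensureNumericMethod
    def add(self, a, b):
        return a + b

mm = MiscMaths()
print(mm.add('5', 10))   # 15.0
```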
But we _can_ manually apply the `ensureNumeric` decorator to `MiscMaths`
instances when they are initialised. We can't use the nice `@ensureNumeric`
syntax to apply our decorators, but this is a viable approach:
%% Cell type:code id: tags:
```
class MiscMaths(object):

    def __init__(self):
        self.add = ensureNumeric(self.add)

    def add(self, a, b):
        return a + b
mm = MiscMaths()
print(mm.add('5', 10))
```
%% Cell type:markdown id: tags:
Another approach is to use a second decorator, which dynamically creates the
real decorator when it is accessed on an instance. This requires the use of an
advanced Python technique called
[_descriptors_](https://docs.python.org/3.5/howto/descriptor.html), which is
beyond the scope of this practical. But if you are interested, you can see an
implementation of this approach
[here](https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/1.6.8/fsl/utils/memoize.py#L249).
<a class="anchor" id="appendix-preserving-function-metadata"></a>
## Appendix: Preserving function metadata
You may have noticed that when we decorate a function, some of its properties
are lost. Consider this function:
%% Cell type:code id: tags:
```
def add2(a, b):
    """Adds two numbers together."""
    return a + b
```
%% Cell type:markdown id: tags:
The `add2` function is an object which has some attributes, e.g.:
%% Cell type:code id: tags:
```
print('Name: ', add2.__name__)
print('Help: ', add2.__doc__)
```
%% Cell type:markdown id: tags:
However, when we apply a decorator to `add2`:
%% Cell type:code id: tags:
```
def decorator(func):
    def wrapper(*args, **kwargs):
        """Internal wrapper function for decorator."""
        print('Calling decorated function {}'.format(func.__name__))
        return func(*args, **kwargs)
    return wrapper

@decorator
def add2(a, b):
    """Adds two numbers together."""
    return a + b
```
%% Cell type:markdown id: tags:
Those attributes are lost, and instead we get the attributes of the `wrapper`
function:
%% Cell type:code id: tags:
```
print('Name: ', add2.__name__)
print('Help: ', add2.__doc__)
```
%% Cell type:markdown id: tags:
While this may be inconsequential in most situations, it can be quite annoying
in some, such as when we are automatically [generating
documentation](http://www.sphinx-doc.org/) for our code.
Fortunately, there is a workaround, available in the built-in
[`functools`](https://docs.python.org/3.5/library/functools.html#functools.wraps)
module:
%% Cell type:code id: tags:
```
import functools

def decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        """Internal wrapper function for decorator."""
        print('Calling decorated function {}'.format(func.__name__))
        return func(*args, **kwargs)
    return wrapper

@decorator
def add2(a, b):
    """Adds two numbers together."""
    return a + b
```
%% Cell type:markdown id: tags:
We have applied the `@functools.wraps` decorator to our internal `wrapper`
function - this will essentially replace the `wrapper` function metadata with
the metadata from our `func` function. So our `add2` name and documentation
are now preserved:
%% Cell type:code id: tags:
```
print('Name: ', add2.__name__)
print('Help: ', add2.__doc__)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="appendix-class-decorators"></a>
## Appendix: Class decorators
> Not to be confused with [_decorator classes_](#decorator-classes)!
In this practical, we have shown how decorators can be applied to functions
and methods. But decorators can in fact also be applied to _classes_. This is
a fairly niche feature that you are probably not likely to need, so we will
only cover it briefly.
Imagine that we want all objects in our application to have a globally unique
(within the application) identifier. We could use a decorator which contains
the logic for generating unique IDs, and defines the interface that we can
use on an instance to obtain its ID:
%% Cell type:code id: tags:
```
import random

allIds = set()

def uniqueID(cls):
    class subclass(cls):
        def getUniqueID(self):
            uid = getattr(self, '_uid', None)
            if uid is not None:
                return uid
            # generate candidate IDs until we
            # find one that is not already in use
            while uid is None or uid in allIds:
                uid = random.randint(1, 100)
            allIds.add(uid)
            self._uid = uid
            return uid
    return subclass
```
%% Cell type:markdown id: tags:
Now we can use the `@uniqueID` decorator on any class that we need to
have a unique ID:
%% Cell type:code id: tags:
```
@uniqueID
class Foo(object):
    pass

@uniqueID
class Bar(object):
    pass
```
%% Cell type:markdown id: tags:
All instances of these classes will have a `getUniqueID` method:
%% Cell type:code id: tags:
```
f1 = Foo()
f2 = Foo()
b1 = Bar()
b2 = Bar()
print('f1: ', f1.getUniqueID())
print('f2: ', f2.getUniqueID())
print('b1: ', b1.getUniqueID())
print('b2: ', b2.getUniqueID())
```
%% Cell type:markdown id: tags:
<a class="anchor" id="useful-references"></a>
## Useful references
* [Understanding decorators in 12 easy steps](http://simeonfranklin.com/blog/2012/jul/1/python-decorators-in-12-steps/)
* [The decorators they won't tell you about](https://github.com/hchasestevens/hchasestevens.github.io/blob/master/notebooks/the-decorators-they-wont-tell-you-about.ipynb)
* [Closures - Wikipedia][wiki-closure]
* [Closures in Python](https://www.geeksforgeeks.org/python-closures/)
* [Garbage collection in Python](https://www.quora.com/How-does-garbage-collection-in-Python-work-What-are-the-pros-and-cons)
[wiki-closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming)
# Threading and parallel processing
The Python language has built-in support for multi-threading in the
[`threading`](https://docs.python.org/3.5/library/threading.html) module, and
true parallelism in the
[`multiprocessing`](https://docs.python.org/3.5/library/multiprocessing.html)
module. If you want to be impressed, skip straight to the section on
[`multiprocessing`](todo).
## Threading
The [`threading`](https://docs.python.org/3.5/library/threading.html) module
provides a traditional multi-threading API that should be familiar to you if
you have worked with threads in other languages.
Running a task in a separate thread in Python is easy - simply create a
`Thread` object, and pass it the function or method that you want it to
run. Then call its `start` method:
```
import time
import threading
def longRunningTask(niters):
    for i in range(niters):
        if i % 2 == 0: print('Tick')
        else:          print('Tock')
        time.sleep(0.5)
t = threading.Thread(target=longRunningTask, args=(8,))
t.start()
while t.is_alive():
    time.sleep(0.4)
    print('Waiting for thread to finish...')
print('Finished!')
```
You can also `join` a thread, which will block execution in the current thread
until the thread that has been `join`ed has finished:
```
t = threading.Thread(target=longRunningTask, args=(6, ))
t.start()
print('Joining thread ...')
t.join()
print('Finished!')
```
### Subclassing `Thread`
It is also possible to sub-class the `Thread` class, and override its `run`
method:
```
class LongRunningThread(threading.Thread):
def __init__(self, niters, *args, **kwargs):
super().__init__(*args, **kwargs)
self.niters = niters
def run(self):
for i in range(self.niters):
if i % 2 == 0: print('Tick')
else: print('Tock')
time.sleep(0.5)
t = LongRunningThread(6)
t.start()
t.join()
print('Done')
```
### Daemon threads
By default, a Python application will not exit until _all_ active threads have
finished execution. If you want to run a task in the background for the
duration of your application, you can mark it as a `daemon` thread - when all
non-daemon threads in a Python application have finished, all daemon threads
will be halted, and the application will exit.
You can mark a thread as being a daemon by setting an attribute on it after
creation:
```
t = threading.Thread(target=longRunningTask)
t.daemon = True
```
See the [`Thread`
documentation](https://docs.python.org/3.5/library/threading.html#thread-objects)
for more details.
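In Python 3 you can also pass `daemon=True` directly to the `Thread` constructor, rather than setting the attribute after creation - a minimal sketch (the `background_task` function here is made up for illustration):

```python
import threading
import time

def background_task():
    # Pretend to do periodic background work. This loop
    # never returns, but daemon threads are halted when
    # all non-daemon threads have finished.
    while True:
        time.sleep(0.1)

# Mark the thread as a daemon at creation time
t = threading.Thread(target=background_task, daemon=True)
t.start()
print(t.daemon, t.is_alive())
```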
### Thread synchronisation
The `threading` module provides some useful thread-synchronisation primitives
- the `Lock`, `RLock` (re-entrant `Lock`), and `Event` classes. The
`threading` module also provides `Condition` and `Semaphore` classes - refer
to the [documentation](https://docs.python.org/3.5/library/threading.html) for
more details.
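As a brief sketch of one of these primitives, a `Semaphore` can be used to limit how many threads may run a section of code at once - here, at most two of four workers (the `worker` function and the counts are made up for illustration):

```python
import threading
import time

# At most two threads may hold the semaphore at once
sem  = threading.Semaphore(2)
lock = threading.Lock()

active = []   # workers currently inside the section
peak   = [0]  # highest concurrency observed

def worker(wid):
    with sem:
        with lock:
            active.append(wid)
            peak[0] = max(peak[0], len(active))
        time.sleep(0.1)
        with lock:
            active.remove(wid)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('Peak concurrency:', peak[0])
```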
#### `Lock`
The [`Lock`](https://docs.python.org/3.5/library/threading.html#lock-objects)
class (and its re-entrant version, the
[`RLock`](https://docs.python.org/3.5/library/threading.html#rlock-objects))
prevents a block of code from being accessed by more than one thread at a
time. For example, if we have multiple threads running this `task` function,
their [outputs](https://www.youtube.com/watch?v=F5fUFnfPpYU) will inevitably
become intertwined:
```
def task():
for i in range(5):
print('{} Woozle '.format(i), end='')
time.sleep(0.1)
print('Wuzzle')
threads = [threading.Thread(target=task) for i in range(5)]
for t in threads:
t.start()
```
But if we protect the critical section with a `Lock` object, the output will
look more sensible:
```
lock = threading.Lock()
def task():
for i in range(5):
with lock:
print('{} Woozle '.format(i), end='')
time.sleep(0.1)
print('Wuzzle')
threads = [threading.Thread(target=task) for i in range(5)]
for t in threads:
t.start()
```
> Instead of using a `Lock` object in a `with` statement, it is also possible
> to manually call its `acquire` and `release` methods:
>
> def task():
> for i in range(5):
> lock.acquire()
> print('{} Woozle '.format(i), end='')
> time.sleep(0.1)
> print('Wuzzle')
> lock.release()
Python does not have any built-in constructs to implement `Lock`-based mutual
exclusion across several functions or methods - each function/method must
explicitly acquire/release a shared `Lock` instance. However, it is relatively
straightforward to implement a decorator which does this for you:
```
def mutex(func, lock):
def wrapper(*args):
with lock:
func(*args)
return wrapper
class MyClass(object):
def __init__(self):
lock = threading.Lock()
self.safeFunc1 = mutex(self.safeFunc1, lock)
self.safeFunc2 = mutex(self.safeFunc2, lock)
def safeFunc1(self):
time.sleep(0.1)
print('safeFunc1 start')
time.sleep(0.2)
print('safeFunc1 end')
def safeFunc2(self):
time.sleep(0.1)
print('safeFunc2 start')
time.sleep(0.2)
print('safeFunc2 end')
mc = MyClass()
f1threads = [threading.Thread(target=mc.safeFunc1) for i in range(4)]
f2threads = [threading.Thread(target=mc.safeFunc2) for i in range(4)]
for t in f1threads + f2threads:
t.start()
```
Try removing the `mutex` lock from the two methods in the above code, and see
what it does to the output.
#### `Event`
The
[`Event`](https://docs.python.org/3.5/library/threading.html#event-objects)
class is essentially a boolean [semaphore][semaphore-wiki]. It can be used to
signal events between threads. Threads can `wait` on the event, and be awoken
when the event is `set` by another thread:
[semaphore-wiki]: https://en.wikipedia.org/wiki/Semaphore_(programming)
```
import numpy as np
processingFinished = threading.Event()
def processData(data):
print('Processing data ...')
time.sleep(2)
print('Result: {}'.format(data.mean()))
processingFinished.set()
data = np.random.randint(1, 100, 100)
t = threading.Thread(target=processData, args=(data,))
t.start()
processingFinished.wait()
print('Processing finished!')
```
### The Global Interpreter Lock (GIL)
The [_Global Interpreter
Lock_](https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock)
is an implementation detail of [CPython](https://github.com/python/cpython)
(the official Python interpreter). The GIL means that a multi-threaded
program written in pure Python is not able to take advantage of multiple
cores - at any point in time, only one thread may be executing Python
bytecode.
The `threading` module does still have its uses though, as the GIL does not
affect tasks which spend their time in system calls or natively compiled
libraries (e.g. file and network I/O, Numpy operations, etc.) - such code
typically releases the GIL while it runs. So you can,
for example, perform some expensive processing on a Numpy array in a thread
running on one core, whilst having another thread (e.g. user interaction)
running on another core.
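One caveat worth illustrating: even with the GIL, compound operations such as `+=` on shared state are not atomic, so threads that mutate shared data still need explicit synchronisation - a minimal sketch (the counter and thread counts here are made up for illustration):

```python
import threading

counter = 0
lock    = threading.Lock()

def count(n):
    global counter
    for _ in range(n):
        # counter += 1 is a read-modify-write sequence -
        # without the lock, increments from different
        # threads can be lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=count, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```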
## Multiprocessing
For true parallelism, you should check out the
[`multiprocessing`](https://docs.python.org/3.5/library/multiprocessing.html)
module.
The `multiprocessing` module spawns sub-processes, rather than threads, and so
is not subject to the GIL constraints that the `threading` module suffers
from. It provides two APIs - a "traditional" equivalent to that provided by
the `threading` module, and a powerful higher-level API.
### `threading`-equivalent API
The
[`Process`](https://docs.python.org/3.5/library/multiprocessing.html#the-process-class)
class is the `multiprocessing` equivalent of the
[`threading.Thread`](https://docs.python.org/3.5/library/threading.html#thread-objects)
class. `multiprocessing` also has equivalents of the [`Lock` and `Event`
classes](https://docs.python.org/3.5/library/multiprocessing.html#synchronization-between-processes),
and the other synchronisation primitives provided by `threading`.
So you can simply replace `threading.Thread` with `multiprocessing.Process`,
and you will have true parallelism.
Because your "threads" are now independent processes, you need to be a little
careful about how to share information across them. Fortunately, the
`multiprocessing` module provides [`Queue` and `Pipe`
classes](https://docs.python.org/3.5/library/multiprocessing.html#exchanging-objects-between-processes)
which make it easy to share data across processes.
### Higher-level API - the `multiprocessing.Pool`
The real advantages of `multiprocessing` lie in its higher-level API, centred
around the [`Pool`
class](https://docs.python.org/3.5/library/multiprocessing.html#using-a-pool-of-workers).
Essentially, you create a `Pool` of worker processes - you specify the number
of processes when you create the pool.
> The best number of processes to use for a `Pool` will depend on the system
> you are running on (number of cores), and the tasks you are running (e.g.
> I/O bound or CPU bound).
Once you have created a `Pool`, you can use its methods to automatically
parallelise tasks. The most useful are the `map`, `starmap` and
`apply_async` methods.
#### `Pool.map`
The
[`Pool.map`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.map)
method is the multiprocessing equivalent of the built-in
[`map`](https://docs.python.org/3.5/library/functions.html#map) function - it
is given a function, and a sequence, and it applies the function to each
element in the sequence.
```
import time
import multiprocessing as mp
import numpy as np
def crunchImage(imgfile):
# Load a nifti image, do stuff
# to it. Use your imagination
# to fill in this function.
time.sleep(2)
# numpy's random number generator
# will be initialised in the same
# way in each process, so let's
# re-seed it.
np.random.seed()
result = np.random.randint(1, 100, 1)
print(imgfile, ':', result)
return result
imgfiles = ['{:02d}.nii.gz'.format(i) for i in range(20)]
p = mp.Pool(processes=16)
print('Crunching images...')
start = time.time()
results = p.map(crunchImage, imgfiles)
end = time.time()
print('Total execution time: {:0.2f} seconds'.format(end - start))
```
The `Pool.map` method only works with functions that accept one argument, such
as our `crunchImage` function above. If you have a function which accepts
multiple arguments, use the
[`Pool.starmap`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.starmap)
method instead:
```
def crunchImage(imgfile, modality):
time.sleep(2)
np.random.seed()
if modality == 't1':
result = np.random.randint(1, 100, 1)
elif modality == 't2':
result = np.random.randint(100, 200, 1)
print(imgfile, ': ', result)
return result
imgfiles = ['t1_{:02d}.nii.gz'.format(i) for i in range(10)] + \
['t2_{:02d}.nii.gz'.format(i) for i in range(10)]
modalities = ['t1'] * 10 + ['t2'] * 10
pool = mp.Pool(processes=16)
args = [(f, m) for f, m in zip(imgfiles, modalities)]
print('Crunching images...')
start = time.time()
results = pool.starmap(crunchImage, args)
end = time.time()
print('Total execution time: {:0.2f} seconds'.format(end - start))
```
The `map` and `starmap` methods also have asynchronous equivalents `map_async`
and `starmap_async`, which return immediately. Refer to the
[`Pool`](https://docs.python.org/3.5/library/multiprocessing.html#module-multiprocessing.pool)
documentation for more details.
#### `Pool.apply_async`
The
[`Pool.apply`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.apply)
method will execute a function on one of the processes, and block until it has
finished. The
[`Pool.apply_async`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async)
method returns immediately, and is thus more suited to asynchronously
scheduling multiple jobs to run in parallel.
`apply_async` returns an object of type
[`AsyncResult`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.AsyncResult).
An `AsyncResult` object has `wait` and `get` methods which will block until
the job has completed.
```
import time
import multiprocessing as mp
import numpy as np
def linear_registration(src, ref):
time.sleep(1)
return np.eye(4)
def nonlinear_registration(src, ref, affine):
time.sleep(3)
# this number represents a non-linear warp
# field - use your imagination people!
np.random.seed()
return np.random.randint(1, 100, 1)
t1s = ['{:02d}_t1.nii.gz'.format(i) for i in range(20)]
std = 'MNI152_T1_2mm.nii.gz'
pool = mp.Pool(processes=16)
print('Running structural-to-standard registration '
'on {} subjects...'.format(len(t1s)))
# Run linear registration on all the T1s.
#
# We build a list of AsyncResult objects
linresults = [pool.apply_async(linear_registration, (t1, std))
for t1 in t1s]
# Then we wait for each job to finish,
# and replace its AsyncResult object
# with the actual result - an affine
# transformation matrix.
start = time.time()
for i, r in enumerate(linresults):
linresults[i] = r.get()
end = time.time()
print('Linear registrations completed in '
'{:0.2f} seconds'.format(end - start))
# Run non-linear registration on all the T1s,
# using the linear registrations to initialise.
nlinresults = [pool.apply_async(nonlinear_registration, (t1, std, aff))
for (t1, aff) in zip(t1s, linresults)]
# Wait for each non-linear reg to finish,
# and store the resulting warp field.
start = time.time()
for i, r in enumerate(nlinresults):
nlinresults[i] = r.get()
end = time.time()
print('Non-linear registrations completed in '
'{:0.2f} seconds'.format(end - start))
print('Non linear registrations:')
for t1, result in zip(t1s, nlinresults):
print(t1, ':', result)
```
%% Cell type:markdown id: tags:
# Structuring a Python project
If you are writing code that you are sure will never be seen or used by
anybody else, then you can structure your project however you want, and you
can stop reading now.
However, if you are intending to make your code available for others to use,
either as end users, or as a dependency of their own code, you will make their
lives much easier if you spend a little time organising your project
directory.
* [Recommended project structure](#recommended-project-structure)
* [The `mypackage/` directory](#the-mypackage-directory)
* [`README`](#readme)
* [`LICENSE`](#license)
* [`requirements.txt`](#requirements-txt)
* [`setup.py`](#setup-py)
* [Appendix: Tests](#appendix-tests)
* [Appendix: Versioning](#appendix-versioning)
* [Include the version in your code](#include-the-version-in-your-code)
* [Deprecate, don't remove!](#deprecate-dont-remove)
* [Appendix: Cookiecutter](#appendix-cookiecutter)
Official documentation:
https://packaging.python.org/tutorials/distributing-packages/
<a class="anchor" id="recommended-project-structure"></a>
## Recommended project structure
A Python project directory should, at the very least, have a structure that
resembles the following:
> ```
> myproject/
> mypackage/
> __init__.py
> mymodule.py
> README
> LICENSE
> requirements.txt
> setup.py
> ```
This example structure is in the `example_project/` sub-directory - have a
look through it if you like.
<a class="anchor" id="the-mypackage-directory"></a>
### The `mypackage/` directory
The first thing you should do is make sure that all of your python code is
organised into a sensibly-named
[_package_](https://docs.python.org/3.5/tutorial/modules.html#packages). This
is important, because it greatly reduces the possibility of naming collisions
when people install your library alongside other libraries. Hands up those of
you who have ever written a file called `utils.[py|m|c|cpp]`!
Check out the `advanced_topics/02_modules_and_packages.ipynb` practical for
more details on packages in Python.
<a class="anchor" id="readme"></a>
### `README`
Every project should have a README file. This is simply a plain text file
which describes your project and how to use it. It is common and acceptable
for a README file to be written in plain text,
[reStructuredText](http://www.sphinx-doc.org/en/stable/rest.html)
(`README.rst`), or
[markdown](https://guides.github.com/features/mastering-markdown/)
(`README.md`).
<a class="anchor" id="license"></a>
### `LICENSE`
Having a LICENSE file makes it easy for people to understand the constraints
under which your code can be used.
<a class="anchor" id="requirements-txt"></a>
### `requirements.txt`
This file is not strictly necessary, but is very common in Python projects.
It contains a list of the Python-based dependencies of your project, in a
standardised syntax. You can specify the exact version, or range of versions,
that your project requires. For example:
> six==1.*
> numpy==1.*
> scipy>=0.18,<2
> nibabel==2.*
If your project has optional dependencies, i.e. libraries which are not
critical but, if present, will allow your project to offer some extra
features, you can list them in a separate requirements file called, for
example, `requirements-extra.txt`.
Having all your dependencies listed in a file in this way makes it easy for
others to install the dependencies needed by your project, simply by running:
> pip install -r requirements.txt
<a class="anchor" id="setup-py"></a>
### `setup.py`
This is the most important file (apart from your code, of course). Python
projects are installed using
[`setuptools`](https://setuptools.readthedocs.io/en/latest/), which is used
internally during both the creation of, and installation of Python libraries.
The `setup.py` file in a Python project is akin to a `Makefile` in a C/C++
project. But `setup.py` is also the location where you can define project
metadata (e.g. name, author, URL, etc) in a standardised format and, if
necessary, customise aspects of the build process for your library.
You generally don't need to worry about, or interact with, `setuptools` at
all - with one exception: `setup.py` is a Python script, and its main job is
to call the `setuptools.setup` function, passing it information about your
project.
The `setup.py` for our example project might look like this:
> ```
> #!/usr/bin/env python
>
> from setuptools import setup
>
> # Import version number from
> # the project package (see
> # the section on versioning).
> from mypackage import __version__
>
> # Read in requirements from
> # the requirements.txt file.
> with open('requirements.txt', 'rt') as f:
> requirements = [l.strip() for l in f.readlines()]
>
> setup(
>
> name='Example project',
> description='Example Python project for PyTreat',
> url='https://git.fmrib.ox.ac.uk/fsl/pytreat-2018-practicals/',
> author='Paul McCarthy',
> author_email='pauldmccarthy@gmail.com',
> license='Apache License Version 2.0',
>
> version=__version__,
>
> install_requires=requirements,
>
> classifiers=[
> 'Development Status :: 3 - Alpha',
> 'Intended Audience :: Developers',
> 'License :: OSI Approved :: Apache Software License',
> 'Programming Language :: Python :: 2.7',
> 'Programming Language :: Python :: 3.4',
> 'Programming Language :: Python :: 3.5',
> 'Programming Language :: Python :: 3.6',
> 'Topic :: Software Development :: Libraries :: Python Modules'],
> )
> ```
The `setup` function gets passed all of your project's metadata, including its
version number, dependencies, and licensing information. The `classifiers`
argument should contain a list of
[classifiers](https://pypi.python.org/pypi?%3Aaction=list_classifiers) which
are applicable to your project. Classifiers are purely for descriptive
purposes - they can be used to aid people in finding your project on
[`PyPI`](https://pypi.python.org/pypi), if you release it there.
See
[here](https://packaging.python.org/tutorials/distributing-packages/#setup-args)
for more details on `setup.py` and the `setup` function.
<a class="anchor" id="appendix-tests"></a>
## Appendix: Tests
There are no strict rules for where to put your tests (you have tests,
right?). There are two main conventions:
You can store your test files _inside_ your package directory:
> ```
> myproject/
> mypackage/
> __init__.py
> mymodule.py
> tests/
> __init__.py
> test_mymodule.py
> ```
Or, you can store your test files _alongside_ your package directory:
> ```
> myproject/
> mypackage/
> __init__.py
> mymodule.py
> tests/
> test_mymodule.py
> ```
If you want your test code to be completely independent of your project's
code, then go with the second option. However, if you would like your test
code to be distributed as part of your project (e.g. so that end users can run
them), then the first option is probably the best.
But in the end, the standard Python unit testing frameworks
([`pytest`](https://docs.pytest.org/en/latest/) and
[`nose2`](http://nose2.readthedocs.io/en/latest/)) are pretty good at finding
your test functions no matter where you've hidden them, so the choice is
really up to you.
<a class="anchor" id="appendix-versioning"></a>
## Appendix: Versioning
If you are intending to make your project available for public use (e.g. on
[PyPI](https://pypi.python.org/pypi) and/or
[conda](https://anaconda.org/anaconda/repo)), it is __very important__ to
manage the version number of your project. If somebody decides to build their
software on top of your project, they are not going to be very happy with you
if you make substantial, API-breaking changes without changing your version
number in an appropriate manner.
Python has [official standards](https://www.python.org/dev/peps/pep-0440/) on
what constitutes a valid version number. These standards can be quite
complicated but, in the vast majority of cases, a simple three-number
versioning scheme comprising _major_, _minor_, and _patch_ release
numbers should suffice. Such a version number has the form:
> major.minor.patch
For example, a version number of `1.3.2` has a _major_ release of 1, _minor_
release of 3, and a _patch_ release of 2.
If you follow some simple and rational guidelines for versioning
`your_project`, then people who use your project can, for instance, specify
that they depend on `your_project==1.*`, and be sure that their code will work
for _any_ version of `your_project` with a major release of 1. Following these
simple guidelines greatly improves software interoperability, and makes
everybody (i.e. developers of other projects, and end users) much happier!
Many modern Python projects use some form of [_semantic
versioning_](https://semver.org/). Semantic versioning is simply a set of
guidelines on how to manage your version number:
- The _major_ release number should be incremented whenever you introduce any
backwards-incompatible changes. In other words, if you change your code
such that some other code which uses your code would break, you should
increment the major release number.
- The _minor_ release number should be incremented whenever you add any new
(backwards-compatible) features to your project.
- The _patch_ release number should be incremented for backwards-compatible
bug-fixes and other minor changes.
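The reason `major.minor.patch` version numbers must be compared numerically, not as strings, can be sketched in a few lines (this is an illustrative toy, not an implementation of the official PEP 440 rules):

```python
def parse_version(version):
    # Split 'major.minor.patch' into a tuple of ints,
    # so that versions compare numerically rather
    # than lexically.
    return tuple(int(part) for part in version.split('.'))

print(parse_version('1.10.0') > parse_version('1.9.2'))  # numeric: True
print('1.10.0' > '1.9.2')                                # lexical: False!
```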
If you like to automate things,
[`bumpversion`](https://github.com/peritus/bumpversion) is a simple tool that
you can use to help manage your version number.
<a class="anchor" id="include-the-version-in-your-code"></a>
### Include the version in your code
While the version of a library is ultimately defined in `setup.py`, it is
standard practice for a Python library to contain a version string called
`__version__` in the `__init__.py` file of the top-level package. For example,
our `example_project/mypackage/__init__.py` file contains this line:
> __version__ = '0.1.0'
This makes a library's version number programmatically accessible and
queryable.
<a class="anchor" id="deprecate-dont-remove"></a>
### Deprecate, don't remove!
If you really want to change your API, but can't bring yourself to increment
your major release number, consider
[_deprecating_](https://en.wikipedia.org/wiki/Deprecation#Software_deprecation)
the old API, and postponing its removal until you are ready for a major
release. This will allow you to change your API, but retain
backwards-compatibility with the old API until it can safely be removed at
the next major release.
You can use the built-in
[`warnings`](https://docs.python.org/3.5/library/warnings.html)
module to warn about uses of deprecated items. There are also some
[third-party libraries](https://github.com/briancurtin/deprecation) which make
it easy to mark a function, method or class as being deprecated.
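A minimal sketch of deprecating a function with the `warnings` module (`old_function` and `new_function` are hypothetical names):

```python
import warnings

def new_function():
    return 42

def old_function():
    # Warn callers, but keep the old name working
    # until the next major release.
    warnings.warn('old_function is deprecated - use new_function instead',
                  DeprecationWarning, stacklevel=2)
    return new_function()

# Capture the warning so we can inspect it
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = old_function()

print(result)
print(caught[0].category.__name__)
```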
<a class="anchor" id="appendix-cookiecutter"></a>
## Appendix: Cookiecutter
It is worth mentioning
[Cookiecutter](https://github.com/audreyr/cookiecutter), a little utility
program which you can use to generate a skeleton file/directory structure for
a new Python project.
You need to give it a template (there are many available templates, including
for projects in languages other than Python) - a couple of useful templates
are the [minimal Python package
template](https://github.com/kragniz/cookiecutter-pypackage-minimal), and the
[full Python package
template](https://github.com/audreyr/cookiecutter-pypackage) (although the
latter is probably overkill for most).
Here is how to create a skeleton project directory based off the minimal
Python package template:
> ```
> pip install cookiecutter
>
> # tell cookiecutter to create a directory
> # from the pypackage-minimal template
> cookiecutter https://github.com/kragniz/cookiecutter-pypackage-minimal.git
>
> # cookiecutter will then prompt you for
> # basic information (e.g. project name,
> # author name/email), and then create a
> # new directory containing the project
> # skeleton.
> ```