# Operator overloading
> This practical assumes you are familiar with the basics of object-oriented
> programming in Python.
Operator overloading, in an object-oriented programming language, is the
process of customising the behaviour of _operators_ (e.g. `+`, `*`, `/` and
`-`) on user-defined types. This practical aims to show you that operator
overloading is **very** easy to do in Python.
This practical gives a brief overview of the operators which you may be most
interested in implementing. However, there are many operators (and other
special methods) which you can support in your own classes - the [official
documentation](https://docs.python.org/3/reference/datamodel.html#basic-customization)
is the best reference if you are interested in learning more.
* [Overview](#overview)
* [Arithmetic operators](#arithmetic-operators)
* [Equality and comparison operators](#equality-and-comparison-operators)
* [The indexing operator `[]`](#the-indexing-operator)
* [The call operator `()`](#the-call-operator)
* [The dot operator `.`](#the-dot-operator)
<a class="anchor" id="overview"></a>
## Overview
In Python, when you add two numbers together:
```
a = 5
b = 10
r = a + b
print(r)
```
What actually goes on behind the scenes is this:
```
r = a.__add__(b)
print(r)
```
In other words, whenever you use the `+` operator on two variables (the
operands to the `+` operator), the Python interpreter calls the `__add__`
method of the first operand (`a`), and passes the second operand (`b`) as an
argument.
So it is very easy to use the `+` operator with our own classes - all we have
to do is implement a method called `__add__`.
<a class="anchor" id="arithmetic-operators"></a>
## Arithmetic operators
Let's play with an example - a class which represents a 2D vector:
```
class Vector2D(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return 'Vector2D({}, {})'.format(self.x, self.y)
```
> Note that we have implemented the special `__str__` method, which allows our
> `Vector2D` instances to be converted into strings.
If we try to use the `+` operator on this class, we are bound to get an error:
```
v1 = Vector2D(2, 3)
v2 = Vector2D(4, 5)
print(v1 + v2)
```
But all we need to do to support the `+` operator is to implement a method
called `__add__`:
```
class Vector2D(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return 'Vector2D({}, {})'.format(self.x, self.y)

    def __add__(self, other):
        return Vector2D(self.x + other.x,
                        self.y + other.y)
```
And now we can use `+` on `Vector2D` objects - it's that easy:
```
v1 = Vector2D(2, 3)
v2 = Vector2D(4, 5)
print('{} + {} = {}'.format(v1, v2, v1 + v2))
```
Our `__add__` method creates and returns a new `Vector2D` which contains the
sum of the `x` and `y` components of the `Vector2D` on which it is called, and
the `Vector2D` which is passed in. We could also make the `__add__` method
work with scalars, by extending its definition a bit:
```
class Vector2D(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        if isinstance(other, Vector2D):
            return Vector2D(self.x + other.x,
                            self.y + other.y)
        else:
            return Vector2D(self.x + other, self.y + other)

    def __str__(self):
        return 'Vector2D({}, {})'.format(self.x, self.y)
```
So now we can add both `Vector2D` instances and scalar numbers together:
```
v1 = Vector2D(2, 3)
v2 = Vector2D(4, 5)
n = 6
print('{} + {} = {}'.format(v1, v2, v1 + v2))
print('{} + {} = {}'.format(v1, n, v1 + n))
```
Other numeric and logical operators can be supported by implementing the
appropriate method, for example:
- Multiplication (`*`): `__mul__`
- Division (`/`): `__truediv__`
- Negation (`-`): `__neg__`
- In-place addition (`+=`): `__iadd__`
- Exclusive or (`^`): `__xor__`
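As a sketch of how a few of these fit together, here is a cut-down `Vector2D` (the special method names are standard; the arithmetic behaviour is our own choice):

```python
class Vector2D(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # called for  v * scalar
    def __mul__(self, scalar):
        return Vector2D(self.x * scalar, self.y * scalar)

    # called for  -v
    def __neg__(self):
        return Vector2D(-self.x, -self.y)

    # called for  v += other; returning self
    # means the object is modified in place
    def __iadd__(self, other):
        self.x += other.x
        self.y += other.y
        return self

v  = Vector2D(2, 3)
w  = v * 10            # __mul__  -> Vector2D(20, 30)
n  = -v                # __neg__  -> Vector2D(-2, -3)
v += Vector2D(1, 1)    # __iadd__ -> v is now Vector2D(3, 4)
```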
When an operator is applied to operands of different types, a set of fall-back
rules is followed, depending on the methods implemented on the
operands. For example, in the expression `a + b`, if `a.__add__` is not
implemented, but `b.__radd__` is, then the latter will be
called. Take a look at the [official
documentation](https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types)
for further details, including a full list of the arithmetic and logical
operators that your classes can support.
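For instance, with our `Vector2D`, the expression `6 + v` fails, because `int.__add__` does not know about `Vector2D`. A minimal sketch of a reflected-addition fix, re-using the scalar-aware `__add__` from above:

```python
class Vector2D(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        if isinstance(other, Vector2D):
            return Vector2D(self.x + other.x, self.y + other.y)
        return Vector2D(self.x + other, self.y + other)

    # called for "other + self" when other's __add__
    # method does not know how to handle a Vector2D
    def __radd__(self, other):
        return self.__add__(other)

v = Vector2D(2, 3)
r = 6 + v   # int.__add__ returns NotImplemented,
            # so Vector2D.__radd__ is called
```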
<a class="anchor" id="equality-and-comparison-operators"></a>
## Equality and comparison operators
Adding support for equality (`==`, `!=`) and comparison (e.g. `>=`) operators
is just as easy. Imagine that we have a class called `Label`, which represents
a label in a lookup table. Our `Label` has an integer label, a name, and an
RGB colour:
```
class Label(object):
    def __init__(self, label, name, colour):
        self.label  = label
        self.name   = name
        self.colour = colour
```
In order to ensure that a list of `Label` objects is ordered by their label
values, we can implement a set of functions, so that `Label` classes can be
compared using the standard comparison operators:
```
import functools

# Don't worry about this statement
# just yet - it is explained below
@functools.total_ordering
class Label(object):
    def __init__(self, label, name, colour):
        self.label  = label
        self.name   = name
        self.colour = colour

    def __str__(self):
        rgb = ''.join(['{:02x}'.format(c) for c in self.colour])
        return 'Label({}, {}, #{})'.format(self.label, self.name, rgb)

    def __repr__(self):
        return str(self)

    # implement Label == Label
    def __eq__(self, other):
        return self.label == other.label

    # implement Label < Label
    def __lt__(self, other):
        return self.label < other.label
```
> We also added `__str__` and `__repr__` methods to the `Label` class so that
> `Label` instances will be printed nicely.
Now we can compare and sort our `Label` instances:
```
l1 = Label(1, 'Parietal', (255, 0, 0))
l2 = Label(2, 'Occipital', ( 0, 255, 0))
l3 = Label(3, 'Temporal', ( 0, 0, 255))
print('{} > {}: {}'.format(l1, l2, l1 > l2))
print('{} <= {}: {}'.format(l1, l3, l1 <= l3))
print('{} != {}: {}'.format(l2, l3, l2 != l3))
print(sorted((l3, l1, l2)))
```
The
[`@functools.total_ordering`](https://docs.python.org/3/library/functools.html#functools.total_ordering)
is a convenience
[decorator](https://docs.python.org/3/glossary.html#term-decorator) which,
given a class that implements equality and a single comparison function
(`__lt__` in the above code), will "fill in" the remainder of the comparison
operators. If you need very specific or complicated behaviour, then you can
provide methods for _all_ of the comparison operators, e.g. `__gt__` for `>`,
`__ge__` for `>=`, and so on.
> Decorators are introduced in another practical.
But if you just want the operators to work in the conventional manner, you can
simply use the `@functools.total_ordering` decorator, and provide `__eq__`,
and just one of `__lt__`, `__le__`, `__gt__` or `__ge__`.
Refer to the [official
documentation](https://docs.python.org/3/reference/datamodel.html#object.__lt__)
for all of the details on supporting comparison operators.
> You may see the `__cmp__` method in older code bases - this provides a
> C-style comparison function which returns `<0`, `0`, or `>0` based on
> comparing two items. This has been superseded by the rich comparison
> operators introduced here, and is no longer supported in Python 3.
<a class="anchor" id="the-indexing-operator"></a>
## The indexing operator `[]`
The indexing operator (`[]`) is generally used by "container" types, such as
the built-in `list` and `dict` classes.
In essence, only three types of behaviour are possible with the `[]`
operator. All that is needed to support them is to implement three special
methods in your class, regardless of whether your class will be
indexed by sequential integers (like a `list`) or by
[hashable](https://docs.python.org/3/glossary.html#term-hashable) values
(like a `dict`):
- **Retrieval** is performed by the `__getitem__` method
- **Assignment** is performed by the `__setitem__` method
- **Deletion** is performed by the `__delitem__` method
Note that, if you implement these methods in your own class, there is no
requirement for them to actually provide any form of data storage or
retrieval. However if you don't, you will probably confuse users of your code
who are used to how the `list` and `dict` types work. Whenever you deviate
from conventional behaviour, make sure you explain it well in your
documentation!
The following contrived example demonstrates all three behaviours:
```
class TwoTimes(object):

    def __init__(self):
        self.__deleted  = set()
        self.__assigned = {}

    def __getitem__(self, key):
        if key in self.__deleted:
            raise KeyError('{} has been deleted!'.format(key))
        elif key in self.__assigned:
            return self.__assigned[key]
        else:
            return key * 2

    def __setitem__(self, key, value):
        self.__assigned[key] = value

    def __delitem__(self, key):
        self.__deleted.add(key)
```
Guess what happens whenever we index a `TwoTimes` object:
```
tt = TwoTimes()
print('TwoTimes[{}] = {}'.format(2, tt[2]))
print('TwoTimes[{}] = {}'.format(6, tt[6]))
print('TwoTimes[{}] = {}'.format('abc', tt['abc']))
```
The `TwoTimes` class allows us to override the value for a specific key:
```
print(tt[4])
tt[4] = 'this is not 4 * 4'
print(tt[4])
```
And we can also "delete" keys:
```
print(tt['12345'])
del tt['12345']
# this is going to raise an error
print(tt['12345'])
```
If you wish to support the Python `start:stop:step` [slice
notation](https://docs.python.org/3/library/functions.html#slice), you
simply need to write your `__getitem__` and `__setitem__` methods so that they
can detect `slice` objects:
```
class TwoTimes(object):

    def __init__(self, max):
        self.__max = max

    def __getitem__(self, key):
        if isinstance(key, slice):
            start = key.start or 0
            stop  = key.stop  or self.__max
            step  = key.step  or 1
        else:
            start = key
            stop  = key + 1
            step  = 1

        return [i * 2 for i in range(start, stop, step)]
```
Now we can "slice" a `TwoTimes` instance:
```
tt = TwoTimes(10)
print(tt[5])
print(tt[3:7])
print(tt[::2])
```
> It is possible to sub-class the built-in `list` and `dict` classes if you
> wish to extend their functionality in some way. However, if you are writing
> a class that should mimic one of the `list` or `dict` classes, but work
> in a different way internally (e.g. a `dict`-like object which uses a
> different hashing algorithm), the `Sequence` and `MutableMapping` classes
> are [a better choice](https://stackoverflow.com/a/7148602) - you can find
> them in the
> [`collections.abc`](https://docs.python.org/3/library/collections.abc.html)
> module.
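As a brief sketch of the `collections.abc` approach (the `LowerCaseDict` class here is invented for illustration): once we implement the five abstract methods, `MutableMapping` fills in `get`, `keys`, `items`, `in`, `update` and friends for free:

```python
from collections.abc import MutableMapping

class LowerCaseDict(MutableMapping):
    """A dict-like object which treats its keys case-insensitively."""

    def __init__(self):
        self.__data = {}

    def __getitem__(self, key):
        return self.__data[key.lower()]

    def __setitem__(self, key, value):
        self.__data[key.lower()] = value

    def __delitem__(self, key):
        del self.__data[key.lower()]

    def __iter__(self):
        return iter(self.__data)

    def __len__(self):
        return len(self.__data)

d = LowerCaseDict()
d['Hello'] = 'world'
print(d['HELLO'])    # world
print('hello' in d)  # True - __contains__ is provided by MutableMapping
```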
<a class="anchor" id="the-call-operator"></a>
## The call operator `()`
Remember how everything in Python is an object, even functions? When you call
a function, a method called `__call__` is called on the function object. We can
implement the `__call__` method on our own class, which will allow us to "call"
objects as if they are functions.
For example, the `TimedFunction` class allows us to calculate the execution
time of any function:
```
import time

class TimedFunction(object):

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        print('Timing {}...'.format(self.func.__name__))

        start  = time.time()
        retval = self.func(*args, **kwargs)
        end    = time.time()

        print('Elapsed time: {:0.2f} seconds'.format(end - start))
        return retval
```
Let's see how the `TimedFunction` behaves:
```
import numpy as np
import numpy.linalg as npla
def inverse(data):
    return npla.inv(data)
tf = TimedFunction(inverse)
data = np.random.random((5000, 5000))
# Wait a few seconds after
# running this code block!
inv = tf(data)
```
> The `TimedFunction` class is conceptually very similar to a
> [decorator](https://docs.python.org/3/glossary.html#term-decorator) -
> decorators are covered in another practical.
<a class="anchor" id="the-dot-operator"></a>
## The dot operator `.`
Python allows us to override the `.` (dot) operator which is used to access
the attributes and methods of an object. This is very powerful, but is also
quite a niche feature, and it is easy to trip yourself up, so if you wish to
use this in your own project, make sure that you carefully read (and
understand) [the
documentation](https://docs.python.org/3/reference/datamodel.html#customizing-attribute-access),
and test your code comprehensively!
For this example, we need a little background information. OpenGL includes
the native data types `vec2`, `vec3`, and `vec4`, which can be used to
represent 2, 3, or 4 component vectors respectively. These data types have a
neat feature called [_swizzling_][glslref], which allows you to access any
component (`x`,`y`, `z`, `w` for vectors, or `r`, `g`, `b`, `a` for colours)
in any order, with a syntax similar to attribute access in Python.
[glslref]: https://www.khronos.org/opengl/wiki/Data_Type_(GLSL)#Swizzling
So here is an example which implements this swizzle-style attribute access on
a class called `Vector`, in which we have customised the behaviour of the `.`
operator:
```
class Vector(object):
    def __init__(self, xyz):
        self.__xyz = list(xyz)

    def __str__(self):
        return 'Vector({})'.format(self.__xyz)

    def __getattr__(self, key):
        # Swizzling behaviour only occurs when
        # the attribute name is entirely comprised
        # of 'x', 'y', and 'z'.
        if not all([c in 'xyz' for c in key]):
            raise AttributeError(key)

        key = ['xyz'.index(c) for c in key]
        return [self.__xyz[c] for c in key]

    def __setattr__(self, key, value):
        # Restrict swizzling behaviour as above
        if not all([c in 'xyz' for c in key]):
            return super().__setattr__(key, value)

        if len(key) == 1:
            value = (value,)

        idxs = ['xyz'.index(c) for c in key]

        for i, v in sorted(zip(idxs, value)):
            self.__xyz[i] = v
```
And here it is in action:
```
v = Vector((1, 2, 3))
print('v: ', v)
print('xyz: ', v.xyz)
print('zy: ', v.zy)
print('xx: ', v.xx)
v.xz = 10, 30
print(v)
v.y = 20
print(v)
```
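One subtlety worth knowing when customising the `.` operator: `__getattr__` is only called when normal attribute lookup *fails*, whereas `__setattr__` is called for *every* assignment (which is why the `Vector` above had to fall back to `super().__setattr__`). A small sketch of the asymmetry (the class and attribute names are made up for illustration):

```python
class Demo(object):
    def __init__(self):
        self.existing = 1

    def __getattr__(self, key):
        # only called when "key" is not found
        # through normal attribute lookup
        return 'fallback: {}'.format(key)

d = Demo()
print(d.existing)  # 1 - found normally, __getattr__ is NOT called
print(d.missing)   # fallback: missing - normal lookup failed
```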
# Context managers
The recommended way to open a file in Python is via the `with` statement:
```
with open('05_context_managers.md', 'rt') as f:
    firstlines = f.readlines()[:4]
    firstlines = [l.strip() for l in firstlines]
    print('\n'.join(firstlines))
```
This is because the `with` statement ensures that the file will be closed
automatically, even if an error occurs inside the `with` statement.
The `with` statement is obviously hiding some internal details from us. But
these internals are in fact quite straightforward, and are known as [*context
managers*](https://docs.python.org/3/reference/datamodel.html#context-managers).
* [Anatomy of a context manager](#anatomy-of-a-context-manager)
* [Why not just use `try ... finally`?](#why-not-just-use-try-finally)
* [Uses for context managers](#uses-for-context-managers)
* [Handling errors in `__exit__`](#handling-errors-in-exit)
* [Suppressing errors with `__exit__`](#suppressing-errors-with-exit)
* [Nesting context managers](#nesting-context-managers)
* [Functions as context managers](#functions-as-context-managers)
* [Methods as context managers](#methods-as-context-managers)
* [Useful references](#useful-references)
<a class="anchor" id="anatomy-of-a-context-manager"></a>
## Anatomy of a context manager
A *context manager* is simply an object which has two specially named methods
`__enter__` and `__exit__`. Any object which has these methods can be used in
a `with` statement.
Let's define a context manager class that we can play with:
```
class MyContextManager(object):
    def __enter__(self):
        print('In enter')
    def __exit__(self, *args):
        print('In exit')
```
Now, what happens when we use `MyContextManager` in a `with` statement?
```
with MyContextManager():
    print('In with block')
```
So the `__enter__` method is called before the statements in the `with` block,
and the `__exit__` method is called afterwards.
Context managers are that simple. What makes them really useful though, is
that the `__exit__` method will be called even if the code in the `with` block
raises an error. The error will be held, and only raised after the `__exit__`
method has finished:
```
with MyContextManager():
    print('In with block')
    assert 1 == 0
```
This means that we can use context managers to perform any sort of clean up or
finalisation logic that we always want to have executed.
<a class="anchor" id="why-not-just-use-try-finally"></a>
### Why not just use `try ... finally`?
Context managers do not provide anything that cannot be accomplished in other
ways. For example, we could accomplish very similar behaviour using
[`try` - `finally` logic](https://docs.python.org/3/tutorial/errors.html#handling-exceptions) -
the statements in the `finally` clause will *always* be executed, whether an
error is raised or not:
```
print('Before try block')
try:
    print('In try block')
    assert 1 == 0
finally:
    print('In finally block')
```
But context managers have the advantage that you can implement your clean-up
logic in one place, and re-use it as many times as you want.
<a class="anchor" id="uses-for-context-managers"></a>
## Uses for context managers
We have already talked about how context managers can be used to perform any
task which requires some initialisation and/or clean-up logic. As an example,
here is a context manager which creates a temporary directory, and then makes
sure that it is deleted afterwards.
```
import os
import shutil
import tempfile
class TempDir(object):
    def __enter__(self):
        self.tempDir = tempfile.mkdtemp()
        self.prevDir = os.getcwd()

        print('Changing to temp dir: {}'.format(self.tempDir))
        print('Previous directory:   {}'.format(self.prevDir))

        os.chdir(self.tempDir)

    def __exit__(self, *args):
        print('Changing back to:  {}'.format(self.prevDir))
        print('Removing temp dir: {}'.format(self.tempDir))

        os.chdir(self.prevDir)
        shutil.rmtree(self.tempDir)
```
Now imagine that we have a function which loads data from a file, and performs
some calculation on it:
```
import numpy as np
def complexAlgorithm(infile):
    data = np.loadtxt(infile)
    return data.mean()
```
We could use the `TempDir` context manager to write a test case for this
function, and not have to worry about cleaning up the test data:
```
with TempDir():
    print('Testing complex algorithm')

    data = np.random.random((100, 100))
    np.savetxt('data.txt', data)

    result = complexAlgorithm('data.txt')

    assert result > 0.1 and result < 0.9
    print('Test passed (result: {})'.format(result))
```
<a class="anchor" id="handling-errors-in-exit"></a>
### Handling errors in `__exit__`
By now you must be [panicking](https://youtu.be/cSU_5MgtDc8?t=9) about why I
haven't mentioned those conspicuous `*args` that get passed to the `__exit__`
method. It turns out that a context manager's [`__exit__`
method](https://docs.python.org/3/reference/datamodel.html#object.__exit__)
is always passed three arguments.
Let's adjust our `MyContextManager` class a little so we can see what these
arguments are for:
```
class MyContextManager(object):
    def __enter__(self):
        print('In enter')
    def __exit__(self, arg1, arg2, arg3):
        print('In exit')
        print('  arg1: ', arg1)
        print('  arg2: ', arg2)
        print('  arg3: ', arg3)
```
If the code inside the `with` statement does not raise an error, these three
arguments will all be `None`.
```
with MyContextManager():
    print('In with block')
```
However, if the code inside the `with` statement raises an error, things look
a little different:
```
with MyContextManager():
    print('In with block')
    raise ValueError('Oh no!')
```
So when an error occurs, the `__exit__` method is passed the following:
- The [`Exception`](https://docs.python.org/3/tutorial/errors.html)
type that was raised.
- The `Exception` instance that was raised.
- A [`traceback`](https://docs.python.org/3/library/traceback.html) object
which can be used to get more information about the exception (e.g. line
number).
<a class="anchor" id="suppressing-errors-with-exit"></a>
### Suppressing errors with `__exit__`
The `__exit__` method is also capable of suppressing errors - if it returns a
value of `True`, then any error that was raised will be ignored. For example,
we could write a context manager which ignores any assertion errors, but
allows other errors to halt execution as normal:
```
class MyContextManager(object):
    def __enter__(self):
        print('In enter')
    def __exit__(self, arg1, arg2, arg3):
        print('In exit')
        # arg1 is None when no error was raised
        if arg1 is not None and issubclass(arg1, AssertionError):
            return True
        print('  arg1: ', arg1)
        print('  arg2: ', arg2)
        print('  arg3: ', arg3)
```
> Note that if a function or method does not explicitly return a value, its
> return value is `None` (which would evaluate to `False` when converted to a
> `bool`). Also note that we are using the built-in
> [`issubclass`](https://docs.python.org/3/library/functions.html#issubclass)
> function, which tests whether one class is a sub-class of another.
Now, when we use `MyContextManager`, any assertion errors are suppressed,
whereas other errors will be raised as normal:
```
with MyContextManager():
    assert 1 == 0

print('Continuing execution!')

with MyContextManager():
    raise ValueError('Oh no!')
```
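For the common case of simply ignoring one type of error, you don't need to write a class like this at all - the standard library provides [`contextlib.suppress`](https://docs.python.org/3/library/contextlib.html#contextlib.suppress):

```python
import contextlib

result = 'not set'

# any AssertionError raised inside
# this block is silently discarded
with contextlib.suppress(AssertionError):
    assert 1 == 0
    result = 'after assert'   # never reached

print(result)  # not set
```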
<a class="anchor" id="nesting-context-managers"></a>
## Nesting context managers
It is possible to nest `with` statements:
```
with open('05_context_managers.md', 'rt') as inf:
    with TempDir():
        with open('05_context_managers.md.copy', 'wt') as outf:
            outf.write(inf.read())
```
You can also use multiple context managers in a single `with` statement:
```
with open('05_context_managers.md', 'rt') as inf, \
     TempDir(), \
     open('05_context_managers.md.copy', 'wt') as outf:
    outf.write(inf.read())
```
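When the number of context managers is not known until runtime, the standard library's [`contextlib.ExitStack`](https://docs.python.org/3/library/contextlib.html#contextlib.ExitStack) can enter an arbitrary collection of them under a single `with` statement. A small sketch (the file names are made up for illustration):

```python
import contextlib
import os
import tempfile

# Create three small files in a temporary directory, then
# open all of them on a single ExitStack - every file is
# guaranteed to be closed when the with block exits.
with tempfile.TemporaryDirectory() as tdir:
    paths = []
    for i in range(3):
        path = os.path.join(tdir, 'file{}.txt'.format(i))
        with open(path, 'wt') as f:
            f.write(str(i))
        paths.append(path)

    with contextlib.ExitStack() as stack:
        files    = [stack.enter_context(open(p, 'rt')) for p in paths]
        contents = [f.read() for f in files]

print(contents)  # ['0', '1', '2']
```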
<a class="anchor" id="functions-as-context-managers"></a>
## Functions as context managers
In fact, there is another way to create context managers in Python. The
built-in [`contextlib`
module](https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager)
has a decorator called `@contextmanager`, which allows us to turn __any
function__ into a context manager. The only requirement is that the function
must have a `yield` statement<sup>1</sup>. So we could rewrite our `TempDir`
class from above as a function:
```
import os
import shutil
import tempfile
import contextlib
@contextlib.contextmanager
def tempdir():
    tdir    = tempfile.mkdtemp()
    prevdir = os.getcwd()
    try:
        os.chdir(tdir)
        yield tdir
    finally:
        os.chdir(prevdir)
        shutil.rmtree(tdir)
```
This new `tempdir` function is used in exactly the same way as our `TempDir`
class:
```
print('In directory: {}'.format(os.getcwd()))
with tempdir() as tmp:
    print('Now in directory:  {}'.format(os.getcwd()))
print('Back in directory: {}'.format(os.getcwd()))
```
The `yield tdir` statement in our `tempdir` function causes the `tdir` value
to be passed to the `with` statement, so in the line `with tempdir() as tmp`,
the variable `tmp` will be given the value `tdir`.
> <sup>1</sup> The `yield` keyword is used in *generator functions*.
> Functions which are used with the `@contextmanager` decorator must be
> generator functions which yield exactly one value.
> [Generators](https://www.python.org/dev/peps/pep-0289/) and [generator
> functions](https://docs.python.org/3/glossary.html#term-generator) are
> beyond the scope of this practical.
<a class="anchor" id="methods-as-context-managers"></a>
## Methods as context managers
Since it is possible to write a function which is a context manager, it is of
course also possible to write a _method_ which is a context manager. Let's
play with another example. We have a `Notifier` class which can be used to
notify interested listeners when an event occurs. Listeners can be registered
for notification via the `register` method:
```
from collections import OrderedDict
class Notifier(object):

    def __init__(self):
        super().__init__()
        self.listeners = OrderedDict()

    def register(self, name, func):
        self.listeners[name] = func

    def notify(self):
        for listener in self.listeners.values():
            listener()
```
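Before building on it, here is a quick sketch of the `Notifier` on its own (the listener names and `events` list are made up for illustration). Listeners are called in the order they were registered, because `listeners` is an `OrderedDict`:

```python
from collections import OrderedDict

# Notifier, as defined above
class Notifier(object):
    def __init__(self):
        super().__init__()
        self.listeners = OrderedDict()
    def register(self, name, func):
        self.listeners[name] = func
    def notify(self):
        for listener in self.listeners.values():
            listener()

n      = Notifier()
events = []
n.register('first',  lambda: events.append('first'))
n.register('second', lambda: events.append('second'))
n.notify()
print(events)  # ['first', 'second']
```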
Now, let's build a little plotting application. First of all, we have a `Line`
class, which represents a line plot. The `Line` class is a sub-class of
`Notifier`, so whenever its display properties (`colour`, `width`, or `name`)
change, it emits a notification, and whatever is drawing it can refresh the
display:
```
import numpy as np
class Line(Notifier):

    def __init__(self, data):
        super().__init__()
        self.__data   = data
        self.__colour = '#000000'
        self.__width  = 1
        self.__name   = 'line'

    @property
    def xdata(self):
        return np.arange(len(self.__data))

    @property
    def ydata(self):
        return np.copy(self.__data)

    @property
    def colour(self):
        return self.__colour

    @colour.setter
    def colour(self, newColour):
        self.__colour = newColour
        print('Line: colour changed: {}'.format(newColour))
        self.notify()

    @property
    def width(self):
        return self.__width

    @width.setter
    def width(self, newWidth):
        self.__width = newWidth
        print('Line: width changed: {}'.format(newWidth))
        self.notify()

    @property
    def name(self):
        return self.__name

    @name.setter
    def name(self, newName):
        self.__name = newName
        print('Line: name changed: {}'.format(newName))
        self.notify()
```
Now let's write a `Plotter` class, which can plot one or more `Line`
instances:
```
import matplotlib.pyplot as plt
class Plotter(object):
    def __init__(self, axis):
        self.__axis  = axis
        self.__lines = []

    def addData(self, data):
        line = Line(data)
        self.__lines.append(line)
        line.register('plot', self.lineChanged)
        self.draw()
        return line

    def lineChanged(self):
        self.draw()

    def draw(self):
        print('Plotter: redrawing plot')

        ax = self.__axis
        ax.clear()
        for line in self.__lines:
            ax.plot(line.xdata,
                    line.ydata,
                    color=line.colour,
                    linewidth=line.width,
                    label=line.name)
        ax.legend()
```
Let's create a `Plotter` object, and add a couple of lines to it (note that
the `matplotlib` plot will open in a separate window):
```
# this line is only necessary when
# working in jupyter notebook/ipython
%matplotlib
fig = plt.figure()
ax = fig.add_subplot(111)
plotter = Plotter(ax)
l1 = plotter.addData(np.sin(np.linspace(0, 6 * np.pi, 50)))
l2 = plotter.addData(np.cos(np.linspace(0, 6 * np.pi, 50)))
fig.show()
```
Now, when we change the properties of our `Line` instances, the plot will be
automatically updated:
```
l1.colour = '#ff0000'
l2.colour = '#00ff00'
l1.width = 2
l2.width = 2
l1.name = 'sine'
l2.name = 'cosine'
```
Pretty cool! However, this seems very inefficient - every time we change the
properties of a `Line`, the `Plotter` will refresh the plot. If we were
plotting large amounts of data, this would be unacceptable, as plotting would
simply take too long.
Wouldn't it be nice if we were able to perform batch-updates of `Line`
properties, and only refresh the plot when we are done? Let's add an extra
method to the `Plotter` class:
```
import contextlib
class Plotter(object):
    def __init__(self, axis):
        self.__axis        = axis
        self.__lines       = []
        self.__holdUpdates = False

    def addData(self, data):
        line = Line(data)
        self.__lines.append(line)
        line.register('plot', self.lineChanged)
        if not self.__holdUpdates:
            self.draw()
        return line

    def lineChanged(self):
        if not self.__holdUpdates:
            self.draw()

    def draw(self):
        print('Plotter: redrawing plot')

        ax = self.__axis
        ax.clear()
        for line in self.__lines:
            ax.plot(line.xdata,
                    line.ydata,
                    color=line.colour,
                    linewidth=line.width,
                    label=line.name)
        ax.legend()

    @contextlib.contextmanager
    def holdUpdates(self):
        self.__holdUpdates = True
        try:
            yield
            self.draw()
        finally:
            self.__holdUpdates = False
```
This new `holdUpdates` method allows us to temporarily suppress notifications
from all `Line` instances. Let's create a new plot:
```
fig = plt.figure()
ax = fig.add_subplot(111)
plotter = Plotter(ax)
l1 = plotter.addData(np.sin(np.linspace(0, 6 * np.pi, 50)))
l2 = plotter.addData(np.cos(np.linspace(0, 6 * np.pi, 50)))
plt.show()
```
Now, we can update many `Line` properties without performing any redundant
redraws:
```
with plotter.holdUpdates():
    l1.colour = '#0000ff'
    l2.colour = '#ffff00'
    l1.width  = 1
    l2.width  = 1
    l1.name   = '$sin(x)$'
    l2.name   = '$cos(x)$'
```
<a class="anchor" id="useful-references"></a>
## Useful references
* [Context manager classes](https://docs.python.org/3/reference/datamodel.html#context-managers)
* The [`contextlib` module](https://docs.python.org/3/library/contextlib.html)
# Structuring a Python project
If you are writing code that you are sure will never be seen or used by
anybody else, then you can structure your project however you want, and you
can stop reading now.
However, if you are intending to make your code available for others to use,
either as end users, or as a dependency of their own code, you will make their
lives much easier if you spend a little time organising your project
directory.
* [Recommended project structure](#recommended-project-structure)
* [The `mypackage/` directory](#the-mypackage-directory)
* [`README`](#readme)
* [`LICENSE`](#license)
* [`requirements.txt`](#requirements-txt)
* [`setup.py`](#setup-py)
* [Appendix: Tests](#appendix-tests)
* [Appendix: Versioning](#appendix-versioning)
# Structuring a Python project
If you are writing code that you are sure will never be seen or used by
anybody else, then you can structure your project however you want, and you
can stop reading now.
However, if you are intending to make your code available for others to use,
either as end users, or as a dependency of their own code, you will make their
lives much easier if you spend a little time organising your project
directory.
* [Recommended project structure](#recommended-project-structure)
* [The `mypackage/` directory](#the-mypackage-directory)
* [`README`](#readme)
* [`LICENSE`](#license)
* [`requirements.txt`](#requirements-txt)
* [`setup.py`](#setup-py)
* [Appendix: Tests](#appendix-tests)
* [Appendix: Versioning](#appendix-versioning)
* [Include the version in your code](#include-the-version-in-your-code)
* [Deprecate, don't remove!](#deprecate-dont-remove)
* [Appendix: Cookiecutter](#appendix-cookiecutter)
Official documentation:
https://packaging.python.org/tutorials/distributing-packages/
<a class="anchor" id="recommended-project-structure"></a>
## Recommended project structure
A Python project directory should, at the very least, have a structure that
resembles the following:
> ```
> myproject/
> mypackage/
> __init__.py
> mymodule.py
> README
> LICENSE
> requirements.txt
> setup.py
> ```
This example structure is in the `example_project/` sub-directory - have a
look through it if you like.
<a class="anchor" id="the-mypackage-directory"></a>
### The `mypackage/` directory
The first thing you should do is make sure that all of your Python code is
organised into a sensibly-named
[*package*](https://docs.python.org/3/tutorial/modules.html#packages). This
is important, because it greatly reduces the possibility of naming collisions
when people install your library alongside other libraries. Hands up those of
you who have ever written a file called `utils.[py|m|c|cpp]`!
Check out the `advanced_topics/02_modules_and_packages.ipynb` practical for
more details on packages in Python.
<a class="anchor" id="readme"></a>
### `README`
Every project should have a README file. This is simply a plain text file
which describes your project and how to use it. It is common and acceptable
for a README file to be written in plain text,
[reStructuredText](http://www.sphinx-doc.org/en/stable/rest.html)
(`README.rst`), or
[markdown](https://guides.github.com/features/mastering-markdown/)
(`README.md`).
<a class="anchor" id="license"></a>
### `LICENSE`
Having a LICENSE file makes it easy for people to understand the constraints
under which your code can be used.
<a class="anchor" id="requirements-txt"></a>
### `requirements.txt`
This file is not strictly necessary, but is very common in Python projects.
It contains a list of the Python-based dependencies of your project, in a
standardised syntax. You can specify the exact version, or range of versions,
that your project requires. For example:
> ```
> six==1.*
> numpy==1.*
> scipy>=0.18
> nibabel==2.*
> ```
If your project has optional dependencies, i.e. libraries which are not
critical but, if present, will allow your project to offer some extra
features, you can list them in a separate requirements file called, for
example, `requirements-extra.txt`.
Having all your dependencies listed in a file in this way makes it easy for
others to install the dependencies needed by your project, simply by running:
> ```
> pip install -r requirements.txt
> ```
<a class="anchor" id="setup-py"></a>
### `setup.py`
This is the most important file (apart from your code, of course). Python
projects are installed using
[`setuptools`](https://setuptools.readthedocs.io/en/latest/), which is used
internally during both the creation of, and installation of Python libraries.
The `setup.py` file in a Python project is akin to a `Makefile` in a C/C++
project. But `setup.py` is also the location where you can define project
metadata (e.g. name, author, URL, etc) in a standardised format and, if
necessary, customise aspects of the build process for your library.
You generally don't need to worry about, or interact with, `setuptools` at
all, with one exception: `setup.py` is a Python script whose main job is to
call the `setuptools.setup` function, passing it information about your
project.
The `setup.py` for our example project might look like this:
> ```
> #!/usr/bin/env python
>
> from setuptools import setup
> from setuptools import find_packages
>
> # Import version number from
> # the project package (see
> # the section on versioning).
> from mypackage import __version__
>
> # Read in requirements from
> # the requirements.txt file.
> with open('requirements.txt', 'rt') as f:
> requirements = [l.strip() for l in f.readlines()]
>
> # Generate a list of all of the
> # packages that are in your project.
> packages = find_packages()
>
> setup(
>
> name='Example project',
> description='Example Python project for PyTreat',
> url='https://git.fmrib.ox.ac.uk/fsl/pytreat-practicals-2020/',
> author='Paul McCarthy',
> author_email='pauldmccarthy@gmail.com',
> license='Apache License Version 2.0',
>
> packages=packages,
>
> version=__version__,
>
> install_requires=requirements,
>
> classifiers=[
> 'Development Status :: 3 - Alpha',
> 'Intended Audience :: Developers',
> 'License :: OSI Approved :: Apache Software License',
> 'Programming Language :: Python :: 2.7',
> 'Programming Language :: Python :: 3.4',
> 'Programming Language :: Python :: 3.5',
> 'Programming Language :: Python :: 3.6',
> 'Topic :: Software Development :: Libraries :: Python Modules'],
> )
> ```
The `setup` function gets passed all of your project's metadata, including its
version number, dependencies, and licensing information. The `classifiers`
argument should contain a list of
[classifiers](https://pypi.python.org/pypi?%3Aaction=list_classifiers) which
are applicable to your project. Classifiers are purely for descriptive
purposes - they can be used to aid people in finding your project on
[`PyPI`](https://pypi.python.org/pypi), if you release it there.
See
[here](https://packaging.python.org/tutorials/distributing-packages/#setup-args)
for more details on `setup.py` and the `setup` function.
<a class="anchor" id="appendix-tests"></a>
## Appendix: Tests
There are no strict rules for where to put your tests (you have tests,
right?). There are two main conventions:
You can store your test files *inside* your package directory:
> ```
> myproject/
> mypackage/
> __init__.py
> mymodule.py
> tests/
> __init__.py
> test_mymodule.py
> ```
Or, you can store your test files *alongside* your package directory:
> ```
> myproject/
> mypackage/
> __init__.py
> mymodule.py
> tests/
> test_mymodule.py
> ```
If you want your test code to be completely independent of your project's
code, then go with the second option. However, if you would like your test
code to be distributed as part of your project (e.g. so that end users can run
them), then the first option is probably the best.
But in the end, the standard Python unit testing frameworks
([`pytest`](https://docs.pytest.org/en/latest/) and
[`nose2`](http://nose2.readthedocs.io/en/latest/)) are pretty good at finding
your test functions no matter where you've hidden them, so the choice is
really up to you.
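For illustration, a minimal pytest-style test file might look like this (the
`add` function is a hypothetical stand-in for something that would live in
`mymodule`):

```python
# test_mymodule.py - a minimal pytest-style test file.

def add(a, b):
    """Hypothetical stand-in for a function from mymodule."""
    return a + b

def test_add():
    # pytest discovers functions named test_*, and a test
    # passes if it runs without raising an AssertionError.
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
```

Running `pytest` from the project directory will collect and run any such
`test_*` functions automatically.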
<a class="anchor" id="appendix-versioning"></a>
## Appendix: Versioning
If you are intending to make your project available for public use (e.g. on
[PyPI](https://pypi.python.org/pypi) and/or
[conda](https://anaconda.org/anaconda/repo)), it is **very important** to
manage the version number of your project. If somebody decides to build their
software on top of your project, they are not going to be very happy with you
if you make substantial, API-breaking changes without changing your version
number in an appropriate manner.
Python has [official standards](https://www.python.org/dev/peps/pep-0440/) on
what constitutes a valid version number. These standards can be quite
complicated but, in the vast majority of cases, a simple three-number
versioning scheme comprising *major*, *minor*, and *patch* release
numbers should suffice. Such a version number has the form:
> ```
> major.minor.patch
> ```
For example, a version number of `1.3.2` has a *major* release of 1, a
*minor* release of 3, and a *patch* release of 2.
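As a sketch of how such a version string can be handled programmatically, its
components can be split out and compared as integer tuples (comparing the raw
strings would order them lexically, so `'1.9.0'` would wrongly sort after
`'1.10.0'`):

```python
# Parse a major.minor.patch version string into a tuple of
# integers, so that versions compare numerically.

def parse_version(version):
    return tuple(int(part) for part in version.split('.'))

print(parse_version('1.3.2'))                            # (1, 3, 2)
print(parse_version('1.10.0') > parse_version('1.9.0'))  # True
```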
If you follow some simple and rational guidelines for versioning
`your_project`, then people who use your project can, for instance, specify
that they depend on `your_project==1.*`, and be sure that their code will work
for *any* version of `your_project` with a major release of 1. Following these
simple guidelines greatly improves software interoperability, and makes
everybody (i.e. developers of other projects, and end users) much happier!
Many modern Python projects use some form of [*semantic
versioning*](https://semver.org/). Semantic versioning is simply a set of
guidelines on how to manage your version number:
- The *major* release number should be incremented whenever you introduce any
backwards-incompatible changes. In other words, if you change your code
such that some other code which uses your code would break, you should
increment the major release number.
- The *minor* release number should be incremented whenever you add any new
(backwards-compatible) features to your project.
- The *patch* release number should be incremented for backwards-compatible
bug-fixes and other minor changes.
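The three rules above can be sketched as a small helper which bumps a version
string (a toy illustration of the semantics, not a substitute for a real
release tool):

```python
def bump_version(version, release):
    """Increment a major.minor.patch version string according
    to the semantic versioning rules described above."""
    major, minor, patch = (int(p) for p in version.split('.'))
    if release == 'major':
        # Backwards-incompatible change: reset minor and patch.
        return f'{major + 1}.0.0'
    elif release == 'minor':
        # New backwards-compatible feature: reset patch.
        return f'{major}.{minor + 1}.0'
    elif release == 'patch':
        # Backwards-compatible bug-fix.
        return f'{major}.{minor}.{patch + 1}'
    raise ValueError(f'Unknown release type: {release}')

print(bump_version('1.3.2', 'major'))  # 2.0.0
print(bump_version('1.3.2', 'minor'))  # 1.4.0
print(bump_version('1.3.2', 'patch'))  # 1.3.3
```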
If you like to automate things,
[`bumpversion`](https://github.com/peritus/bumpversion) is a simple tool that
you can use to help manage your version number.
<a class="anchor" id="include-the-version-in-your-code"></a>
### Include the version in your code
While the version of a library is ultimately defined in `setup.py`, it is
standard practice for a Python library to contain a version string called
`__version__` in the `__init__.py` file of the top-level package. For example,
our `example_project/mypackage/__init__.py` file contains this line:
> ```
> __version__ = '0.1.0'
> ```
This makes a library's version number programmatically accessible and
queryable.
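For example, a dependent project could check at runtime that it has a
compatible version of your library (a sketch, with the version string inlined
here in place of importing `mypackage.__version__`):

```python
# In real code this would be:  from mypackage import __version__
__version__ = '0.1.0'

# Convert the string to an integer tuple for comparison.
version = tuple(int(p) for p in __version__.split('.'))

if version < (0, 1, 0):
    raise ImportError('mypackage 0.1.0 or newer is required')

print('version check passed:', __version__)
```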
<a class="anchor" id="deprecate-dont-remove"></a>
### Deprecate, don't remove!
If you really want to change your API, but can't bring yourself to increment
your major release number, consider
[*deprecating*](https://en.wikipedia.org/wiki/Deprecation#Software_deprecation)
the old API, and postponing its removal until you are ready for a major
release. This will allow you to change your API, but retain
backwards-compatibility with the old API until it can safely be removed at the
next major release.
You can use the built-in
[`warnings`](https://docs.python.org/3.5/library/exceptions.html#DeprecationWarning)
module to warn about uses of deprecated items. There are also some
[third-party libraries](https://github.com/briancurtin/deprecation) which make
it easy to mark a function, method or class as being deprecated.
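As a sketch, the `warnings` module can be used to flag a deprecated function
while keeping it working (the function names here are hypothetical):

```python
import warnings

def new_api():
    return 42

def old_api():
    """Deprecated - use new_api() instead."""
    warnings.warn('old_api() is deprecated; use new_api() instead',
                  DeprecationWarning, stacklevel=2)
    return new_api()

# The old name keeps working, but emits a DeprecationWarning
# which callers (and test suites) can detect.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = old_api()

print(result)                       # 42
print(caught[0].category.__name__)  # DeprecationWarning
```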
<a class="anchor" id="appendix-cookiecutter"></a>
## Appendix: Cookiecutter
It is worth mentioning
[Cookiecutter](https://github.com/audreyr/cookiecutter), a little utility
program which you can use to generate a skeleton file/directory structure for
a new Python project.
You need to give it a template (there are many available templates, including
for projects in languages other than Python) - a couple of useful templates
are the [minimal Python package
template](https://github.com/kragniz/cookiecutter-pypackage-minimal), and the
[full Python package
template](https://github.com/audreyr/cookiecutter-pypackage) (although the
latter is probably overkill for most).
Here is how to create a skeleton project directory based on the minimal
Python package template:
> ```
> pip install cookiecutter
>
> # tell cookiecutter to create a directory
> # from the pypackage-minimal template
> cookiecutter https://github.com/kragniz/cookiecutter-pypackage-minimal.git
>
> # cookiecutter will then prompt you for
> # basic information (e.g. project name,
> # author name/email), and then create a
> # new directory containing the project
> # skeleton.
> ```