Commit a5b5ea88 authored by Michiel Cottaar

Bug fixes from running notebooks

parent 72845f6d
%% Cell type:markdown id: tags:
# Basic python
This tutorial is aimed at briefly introducing you to the main language
features of python, with emphasis on some of the common difficulties
and pitfalls that you are likely to encounter when moving to python.
When going through this make sure that you _run_ each code block and
look at the output, as these are crucial for understanding the
explanations. You can run each block by using _shift + enter_
(including the text blocks, so you can just move down the document
with shift + enter).
It is also possible to _change_ the contents of each code block (these pages
are completely interactive) so do experiment with the code you see and try
some variations!
> **Important**: We are exclusively using Python 3 in FSL - as of FSL 6.0.4 we
> are using Python 3.7. There are some subtle differences between Python 2 and
> Python 3, but instead of learning about these differences, it is easier to
> simply forget that Python 2 exists. When you are googling for Python help,
> make sure that the pages you find are relevant to Python 3 and *not* Python
> 2! The official Python docs can be found at https://docs.python.org/3/ (note
> the _/3/_ at the end!).
## Contents
* [Basic types](#Basic-types)
  - [Strings](#Strings)
    + [Format](#Format)
    + [String manipulation](#String-manipulation)
  - [Tuples and lists](#Tuples-and-lists)
    + [Adding to a list](#Adding-to-a-list)
    + [Indexing](#Indexing)
    + [Slicing](#Slicing)
    + [List operations](#List-operations)
    + [Looping over elements in a list (or tuple)](#Looping)
    + [Getting help](#Getting-help)
  - [Dictionaries](#Dictionaries)
    + [Adding to a dictionary](#Adding-to-a-dictionary)
    + [Removing elements from a dictionary](#Removing-elements-dictionary)
    + [Looping over everything in a dictionary](#Looping-dictionary)
  - [Copying and references](#Copying-and-references)
* [Control flow](#Control-flow)
  - [Boolean operators](#Boolean-operators)
  - [If statements](#If-statements)
  - [For loops](#For-loops)
  - [While loops](#While-loops)
  - [A quick intro to conditional expressions and list comprehensions](#quick-intro)
    + [Conditional expressions](#Conditional-expressions)
    + [List comprehensions](#List-comprehensions)
* [Functions](#functions)
* [Exercise](#exercise)
---
<a class="anchor" id="Basic-types"></a>
# Basic types
Python has many different types, and variables are dynamically typed, so the same variable can hold values of different types over time (like MATLAB). Some of the most commonly used built-in types are:
* integer and floating point scalars
* strings
* tuples
* lists
* dictionaries
N-dimensional arrays and other types are supported through common modules (e.g., [numpy](https://numpy.org/), [scipy](https://docs.scipy.org/doc/scipy-1.4.1/reference/), [scikit-learn](https://scikit-learn.org/stable/)). These will be covered in subsequent exercises.
%% Cell type:code id: tags:
```
a = 4
b = 3.6
c = 'abc'
d = [10, 20, 30]
e = {'a' : 10, 'b': 20}
print(a)
```
%% Cell type:markdown id: tags:
Any variable or combination of variables can be printed using the function `print()`:
%% Cell type:code id: tags:
```
print(d)
print(e)
print(a, b, c)
```
%% Cell type:markdown id: tags:
> _*Python 3 versus python 2*_:
>
> Print - in python 3, `print` is a function, so the brackets are compulsory; in python 2 it was a statement and the brackets were optional. So you will see plenty of code without the brackets, but you should never use `print` without brackets, as this is incompatible with Python 3.
>
> Division - in python 3 the `/` operator always performs floating-point division (like in MATLAB), even if both values are integers, whereas in python 2 dividing two integers gives an integer result (rounded down), similar to integer division in C.
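
A quick check of the Python 3 division behaviour described above. The `//` (floor division) and `%` (remainder) operators are not mentioned in the note; `//` is what you want if you need the Python 2 style integer result. The numbers are arbitrary:

```python
half = 7 / 2       # 3.5: / always gives a float in Python 3
floored = 7 // 2   # 3: // is floor division
neg = -7 // 2      # -4: // rounds down, not towards zero
remainder = 7 % 2  # 1: % gives the remainder
print(half, floored, neg, remainder)
```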
---
<a class="anchor" id="Strings"></a>
## Strings
Strings can be specified using single quotes *or* double quotes - as long as they are matched.
Strings can be indexed like lists (see later).
For example:
%% Cell type:code id: tags:
```
s1 = "test string"
s2 = 'another test string'
print(s1, ' :: ', s2)
```
%% Cell type:markdown id: tags:
You can also use triple quotes to capture multi-line strings. For example:
%% Cell type:code id: tags:
```
s3 = '''This is
a string over
multiple lines
'''
print(s3)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="Format"></a>
### Format
More interesting strings can be created using an [f-string](https://realpython.com/python-f-strings/),
which is very useful in print statements:
%% Cell type:code id: tags:
```
x = 1
y = 'PyTreat'
s = f'The numerical value is {x} and a name is {y}'
print(s)
print(f'A name is {y} and a number is {x}')
```
%% Cell type:markdown id: tags:
Note the `f` before the initial quote. This lets python know to fill in the variables between the curly brackets.
There are also other options along these lines, which will be discussed in the next practical.
This is the more modern version, although you will see plenty of the other alternatives in "old" code
(to python coders this means anything written before last week).
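Inside the curly brackets of an f-string you can also put an expression, and add a format specification after a colon, e.g. to control the number of decimal places or the field width. A small sketch with made-up values:

```python
x = 3.14159
n = 42
# .2f: fixed-point with 2 decimals; 5d: integer right-aligned in a width of 5
s = f'x is about {x:.2f} and n is {n:5d}'
print(s)
# any expression can go between the curly brackets, not just a variable name
t = f'{n} squared is {n**2}'
print(t)
```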
<a class="anchor" id="String-manipulation"></a>
### String manipulation
The methods `lower()` and `upper()` are useful for strings. For example:
%% Cell type:code id: tags:
```
s = 'This is a Test String'
print(s.upper())
print(s.lower())
```
%% Cell type:markdown id: tags:
Another useful method is `replace()`, which substitutes one substring for another:
%% Cell type:code id: tags:
```
s = 'This is a Test String'
s2 = s.replace('Test', 'Better')
print(s2)
```
%% Cell type:markdown id: tags:
Strings can be concatenated just by using the `+` operator:
%% Cell type:code id: tags:
```
s3 = s + ' :: ' + s2
print(s3)
```
%% Cell type:markdown id: tags:
If you like regular expressions then you're in luck, as these are well supported in python using the `re` module. To use this you need to `import` it, as with many other "extensions" (which are called _modules_ in Python). For example:
%% Cell type:code id: tags:
```
import re
s = 'This is a test of a Test String'
s1 = re.sub(r'a [Tt]est', "an example", s)
print(s1)
```
%% Cell type:markdown id: tags:
where the `r` before the quote is used to force the regular expression specification to be a `raw string` (see [here](https://docs.python.org/3.5/library/re.html) for more info).
For more information on matching and substitutions, look up the regular expression module on the web.
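Besides `re.sub`, two other commonly used functions in the `re` module are `re.findall` (all non-overlapping matches) and `re.search` (the first match, or `None`). The BIDS-style filename below is just an illustrative example:

```python
import re

fname = 'sub-01_ses-02_T1w.nii.gz'
# with a group (...) in the pattern, findall returns only the grouped part
labels = re.findall(r'-(\d+)', fname)
print(labels)
# search returns a match object, or None if nothing matches
match = re.search(r'sub-(\d+)', fname)
if match is not None:
    subject = match.group(1)  # text captured by the first group
    print(subject)
```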
Two common and convenient string methods are `strip()` and `split()`. The
first will remove any whitespace at the beginning and end of a string:
%% Cell type:code id: tags:
```
s2 = ' A very spacy string '
print('*' + s2 + '*')
print('*' + s2.strip() + '*')
```
%% Cell type:markdown id: tags:
With `split()` we can tokenize a string (to turn it into a list of strings) like this:
%% Cell type:code id: tags:
```
print(s.split())
print(s2.split())
```
%% Cell type:markdown id: tags:
By default it splits at whitespace, but it can also split at a specified delimiter:
%% Cell type:code id: tags:
```
s4 = ' This is, as you can see , a very weirdly spaced and punctuated string ... '
print(s4.split(','))
```
%% Cell type:markdown id: tags:
A neat trick, if you want to change the delimiter in some structured data (e.g.
replace `,` with `\t`), is to use `split()` in combination with another string
method, `join()`:
%% Cell type:code id: tags:
```
csvdata = 'some,comma,separated,data'
# split on commas, then join the pieces back together with tabs
tsvdata = '\t'.join(csvdata.split(','))
tsvdata = tsvdata.replace('comma', 'tab')
print('csvdata:', csvdata)
print('tsvdata:', tsvdata)
```
%% Cell type:markdown id: tags:
There are more powerful ways of dealing with structured data such as csv files/strings,
which are covered in later practicals, but even this can get you a long way.
> Note that strings in python 3 are _unicode_, so they can represent Chinese
> characters, etc., and are therefore very flexible. However, in general you
> can just be blissfully ignorant of this fact.
Strings can be converted to integer or floating-point values by using the
`int()` and `float()` calls:
%% Cell type:code id: tags:
```
sint='23'
sfp='2.03'
print(sint + sfp)
print(int(sint) + float(sfp))
print(float(sint) + float(sfp))
```
%% Cell type:markdown id: tags:
> Note that calling `int()` on a non-integer (e.g., on `sfp` above) will raise an error.
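
The error raised is a `ValueError`, which you can catch if the input might not be a valid integer. A minimal sketch (falling back to `float()` first is just one possible choice):

```python
sfp = '2.03'
try:
    value = int(sfp)          # fails: '2.03' is not an integer string
except ValueError:
    # convert via float instead; int() then truncates towards zero
    value = int(float(sfp))
print(value)
```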
---
<a class="anchor" id="Tuples-and-lists"></a>
## Tuples and lists
Both tuples and lists are builtin python types and are like vectors,
but for numerical vectors and arrays it is much better to use `numpy`
arrays (or matrices), which are covered in a later tutorial.
A tuple is like a list or a vector, but with less flexibility than a full list: tuples are immutable, so they cannot be changed after they are created. However, anything can be stored in either a list or a tuple, and the elements do not need to be of the same type. Tuples are defined using round brackets and lists are defined using square brackets. For example:
%% Cell type:code id: tags:
```
xtuple = (3, 7.6, 'str')
xlist = [1, 'mj', -5.4]
print(xtuple)
print(xlist)
```
%% Cell type:markdown id: tags:
They can also be nested:
%% Cell type:code id: tags:
```
x2 = (xtuple, xlist)
x3 = [xtuple, xlist]
print('x2 is: ', x2)
print('x3 is: ', x3)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="Adding-to-a-list"></a>
### Adding to a list
This is easy:
%% Cell type:code id: tags:
```
a = [10, 20, 30]
a = a + [70]
a += [80]
print(a)
```
%% Cell type:markdown id: tags:
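> Similar things can be done for tuples, with two caveats: a one-element tuple
> needs a trailing comma, `(80,)`, since `(80)` is just the number 80; and
> because a tuple is immutable, `a += (80,)` does not change the original tuple
> but creates a new tuple and assigns it to `a`.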
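
A short sketch of growing a tuple. Note the trailing comma for a one-element tuple, and that `b` keeps the old values because `+` and `+=` build a new tuple rather than modifying the existing one:

```python
a = (10, 20, 30)
b = a              # b refers to the same tuple object as a
a = a + (70,)      # (70,) is a one-element tuple
a += (80,)         # builds a new tuple and rebinds the name a to it
print(a)
print(b)           # the original tuple is unchanged
print(type((80)), type((80,)))  # (80) is just an int; (80,) is a tuple
```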
<a class="anchor" id="Indexing"></a>
### Indexing
Square brackets are used to index tuples, lists, strings, dictionaries, etc. For example:
%% Cell type:code id: tags:
```
d = [10, 20, 30]
print(d[1])
```
%% Cell type:markdown id: tags:
> _*Pitfall:*_
> Python uses zero-based indexing, unlike MATLAB.
%% Cell type:code id: tags:
```
a = [10, 20, 30, 40, 50, 60]
print(a[0])
print(a[2])
```
%% Cell type:markdown id: tags:
Indices naturally run from 0 to N-1, _but_ negative numbers can be used to
reference from the end (circular wrap-around).
%% Cell type:code id: tags:
```
print(a[-1])
print(a[-6])
```
%% Cell type:markdown id: tags:
However, this only works for -1 to -N. Any index outside the range -N to N-1 will raise an `IndexError` (`list index out of range`):
%% Cell type:code id: tags:
```
print(a[-7])
```
%% Cell type:code id: tags:
```
print(a[6])
```
%% Cell type:markdown id: tags:
Length of a tuple or list is given by the `len()` function:
%% Cell type:code id: tags:
```
print(len(a))
```
%% Cell type:markdown id: tags:
Nested lists can have nested indexing:
%% Cell type:code id: tags:
```
b = [[10, 20, 30], [40, 50, 60]]
print(b[0][1])
print(b[1][0])
```
%% Cell type:markdown id: tags:
but *not* an index like `b[0, 1]`. However, numpy arrays (covered in a later practical) can be indexed like `b[0, 1]` and similarly for higher dimensions.
> Note that `len` will only give the length of the top level.
> In general, numpy arrays should be preferred to nested lists when the contents are numerical.
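
A quick illustration of `len` on the nested list from above: it only counts the outer list, so counting every element needs a sum over the inner lists:

```python
b = [[10, 20, 30], [40, 50, 60]]
outer = len(b)                     # only the two inner lists are counted
first = len(b[0])                  # length of the first inner list
total = sum(len(row) for row in b) # number of elements across all inner lists
print(outer, first, total)
```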
<a class="anchor" id="Slicing"></a>
### Slicing
A range of values for the indices can be specified to extract values from a list. For example:
%% Cell type:code id: tags:
```
print(a[0:3])
```
%% Cell type:markdown id: tags:
> _*Pitfall:*_
>
> Slicing syntax is different from MATLAB in that the second number is
> exclusive (i.e., one plus the final index) - this is in addition to the zero index difference.
%% Cell type:code id: tags:
```
a = [10, 20, 30, 40, 50, 60]
print(a[0:3]) # same as a(1:3) in MATLAB
print(a[1:3]) # same as a(2:3) in MATLAB
```
%% Cell type:markdown id: tags:
> _*Pitfall:*_
>
> Unlike in MATLAB, you cannot use a list as indices instead of an
> integer or a slice (although this can be done in `numpy`).
%% Cell type:code id: tags:
```
b = [3, 4]
print(a[b])
```
%% Cell type:markdown id: tags:
In python you can leave the start and end values implicit, as it will assume these are the beginning and the end. For example:
%% Cell type:code id: tags:
```
print(a[:3])
print(a[1:])
print(a[:-1])
```
%% Cell type:markdown id: tags:
in the last example remember that negative indices are subject to wrap around so that `a[:-1]` represents all elements up to the penultimate one.
You can also change the step size, which is specified by the third value (not the second one, as in MATLAB). For example:
%% Cell type:code id: tags:
```
print(a[0:4:2])
print(a[::2])
print(a[::-1])
```
%% Cell type:markdown id: tags:
the last example is a simple way to reverse a sequence.
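Slicing works the same way on strings and tuples, and a full slice `[:]` gives a new (shallow) copy of a list, so changing the copy leaves the original alone. A small sketch:

```python
a = [10, 20, 30, 40, 50, 60]
s = 'PyTreat'
start = s[:2]      # first two characters
backwards = s[::-1]  # reversed string
c = a[:]           # c is a new list with the same elements
c[0] = -1
print(start, backwards, a[0], c[0])
```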
<a class="anchor" id="List-operations"></a>
### List operations
Multiplication can be used with lists, where multiplication implements replication.
%% Cell type:code id: tags:
```
d = [10, 20, 30]
print(d * 4)
```
%% Cell type:markdown id: tags:
There are also other operations such as:
%% Cell type:code id: tags:
```
d.append(40)
print(d)
d.extend([50, 60])
print(d)
d = d + [70, 80]
print(d)
d.remove(20)
print(d)
d.pop(0)
print(d)
```
%% Cell type:markdown id: tags:
> Note that `d.append([50,60])` would run but instead of adding two extra elements it only adds a single element, where this element is a list of length two, making a messy list. Try it and see if this is not clear.
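
Here is that comparison written out, using new list names so that `d` itself is left untouched:

```python
d2 = [10, 20, 30]
d2.append([50, 60])   # adds ONE element, which is itself a list
print(d2, len(d2))
d3 = [10, 20, 30]
d3.extend([50, 60])   # adds each element of the argument separately
print(d3, len(d3))
```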
<a class="anchor" id="Looping"></a>
### Looping over elements in a list (or tuple)
%% Cell type:code id: tags:
```
d = [10, 20, 30]
for x in d:
    print(x)
```
%% Cell type:markdown id: tags:
> Note that the indentation within the loop is _*crucial*_. All python control blocks are delineated purely by indentation. We recommend using **four spaces** and no tabs, as this is a standard practice and will help a lot when collaborating with others.
<a class="anchor" id="Getting-help"></a>
### Getting help
The function `help()` can be used to get information about any variable/object/function in python. It lists the possible operations. In `ipython` you can also just type `?<blah>` or `<blah>?` instead:
%% Cell type:code id: tags:
```
help(d)
```
%% Cell type:markdown id: tags:
There is also a `dir()` function that gives a basic listing of the operations:
%% Cell type:code id: tags:
```
dir(d)
```
%% Cell type:markdown id: tags:
> Note that google is often more helpful! At least, as long as you find pages
> relating to Python 3 - Python 2 is no longer supported, but there is still
> lots of information about it on the internet, so be careful!
---
<a class="anchor" id="Dictionaries"></a>
## Dictionaries
These store key-value pairs. For example:
%% Cell type:code id: tags:
```
e = {'a' : 10, 'b': 20}
print(len(e))
print(e.keys())
print(e.values())
print(e['a'])
```
%% Cell type:markdown id: tags:
The keys and values can take on almost any type, even dictionaries!
Python is nothing if not flexible. However, each key must be unique
and [hashable](https://docs.python.org/3.5/glossary.html#term-hashable).
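A short sketch of what "hashable" means in practice: immutable types such as tuples can be keys, while mutable lists cannot. A new dictionary called `lookup` is used so that `e` is left as it is for the following cells:

```python
lookup = {}
lookup[(1, 2)] = 'tuple keys work'       # tuples are immutable, so hashable
try:
    lookup[[1, 2]] = 'list keys fail'    # lists are mutable, so unhashable
except TypeError as err:
    print('TypeError:', err)
print(lookup)
```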
<a class="anchor" id="Adding-to-a-dictionary"></a>
### Adding to a dictionary
This is very easy:
%% Cell type:code id: tags:
```
e['c'] = 555 # just like in Biobank! ;)
print(e)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="Removing-elements-dictionary"></a>
### Removing elements from a dictionary
There are two main approaches - `pop` and `del`:
%% Cell type:code id: tags:
```
e.pop('b')
print(e)
del e['c']
print(e)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="Looping-dictionary"></a>
### Looping over everything in a dictionary
Several variables can jointly work as loop variables in python, which is very convenient. For example:
%% Cell type:code id: tags:
```
e = {'a' : 10, 'b': 20, 'c':555}
for k, v in e.items():
    print((k, v))
```
%% Cell type:markdown id: tags:
Here the argument to `print` is the tuple `(k, v)`; tuples like this are used a lot in python.
Another option is:
%% Cell type:code id: tags:
```
for k in e:
    print((k, e[k]))
```
%% Cell type:markdown id: tags:
> In older versions of Python 3, there was no guarantee of ordering when using dictionaries.
> However, as of Python 3.7, dictionaries will remember the order in which items are inserted,
> and the `keys()`, `values()`, and `items()` methods will return elements in that order.
>
> If you want a dictionary with ordering, *and* you want your code to work with
> Python versions older than 3.7, you can use the
> [`OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict)
> class.
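
A quick check of the insertion-order behaviour (this assumes Python 3.7 or newer, as used in FSL); the keys are deliberately not in alphabetical order:

```python
order = {}
order['zebra'] = 1
order['apple'] = 2
order['mango'] = 3
keys = list(order)   # iteration follows insertion order, not sorted order
print(keys)
```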
---
<a class="anchor" id="Copying-and-references"></a>
## Copying and references
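In python there are immutable types (e.g. numbers) and mutable types (e.g. lists). The main thing to know is that assignment never copies an object; it just makes another name refer to the same object (much like references in C++). For immutable types this behaves just like a copy, because the object can never be changed, but for mutable types such as lists a change made through one name is visible through the other. For example: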
%% Cell type:code id: tags:
```
a = 7
b = a
a = 2348
print(b)
```
%% Cell type:markdown id: tags:
As opposed to:
%% Cell type:code id: tags:
```
a = [7]
b = a
a[0] = 8888
print(b)
```
%% Cell type:markdown id: tags:
But an operation such as `*` creates a new list, so the result does not change when the original does:
%% Cell type:code id: tags:
```
a = [7]
b = a * 2
a[0] = 8888
print(b)
```
%% Cell type:markdown id: tags:
If an explicit copy is necessary then this can be made using the `list()` constructor:
%% Cell type:code id: tags:
```
a = [7]
b = list(a)
a[0] = 8888
print(b)
```
%% Cell type:markdown id: tags:
There is a constructor for each type and this can be useful for converting between types:
%% Cell type:code id: tags:
```
xt = (2, 5, 7)
xl = list(xt)
print(xt)
print(xl)
```
%% Cell type:markdown id: tags:
> _*Pitfall:*_
>
> When writing functions you need to be particularly careful about references and copies.
%% Cell type:code id: tags:
```
def foo1(x):
    x.append(10)
    print('x: ', x)
def foo2(x):
    x = x + [10]
    print('x: ', x)
def foo3(x):
    print('return value: ', x + [10])
    return x + [10]
a = [5]
print('a: ', a)
foo1(a)
print('a: ', a)
foo2(a)
print('a: ', a)
b = foo3(a)
print('a: ', a)
print('b: ', b)
```
%% Cell type:markdown id: tags:
> Note that we have defined some functions here - and the syntax
> should be relatively intuitive. See <a href="#functions">below</a>
> for a bit more detail on function definitions.
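
One more subtlety about copies: `list()` (like `a[:]`) makes a *shallow* copy, so the inner lists of a nested list are still shared with the original. The standard-library `copy.deepcopy` function copies those as well. A minimal sketch:

```python
import copy

a = [[1, 2], [3, 4]]
shallow = list(a)          # new outer list, but the same inner lists
deep = copy.deepcopy(a)    # new outer list and new inner lists
a[0][0] = 999
print(shallow[0][0])       # the inner list is shared with a
print(deep[0][0])          # fully independent copy
```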
---
<a class="anchor" id="Control-flow"></a>
## Control flow
<a class="anchor" id="Boolean-operators"></a>
### Boolean operators
There is a boolean type in python that can be `True` or `False` (note the
capitals). Other values can also be used as True or False in conditions (e.g., `1` for
`True`; `0` or `None` or `[]` or `{}` or `""` for `False`), although most of them are
not 'equal' to `True` or `False` in the sense of the `==` operator (the exception
being `1` and `0`, since `True == 1` and `False == 0`).
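The built-in `bool()` function shows how a value will be treated in a condition, which makes it easy to check the difference between "counts as true" and "is equal to `True`":

```python
# bool() shows whether a value counts as True or False in an if statement
truth = (bool(1), bool(0), bool(None), bool([]), bool(''), bool([0]), bool(' '))
print(truth)
# == asks a different question: only 1 and 0 compare equal to True and False
equal = (1 == True, 0 == False, [] == False, None == False)
print(equal)
```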
Relevant boolean and comparison operators include: `not`, `and`, `or`, `==` and `!=`.
For example:
%% Cell type:code id: tags:
```
a = True
print('Not a is:', not a)
print('Not 1 is:', not 1)
print('Not 0 is:', not 0)
print('Not {} is:', not {})
print('{}==0 is:', {}==0)
```
%% Cell type:markdown id: tags:
There is also the `in` test for strings, lists, etc:
%% Cell type:code id: tags:
```
print('the' in 'a number of words')
print('of' in 'a number of words')
print(3 in [1, 2, 3, 4])
```
%% Cell type:markdown id: tags:
A useful keyword is `None`, which is a bit like "null".
This can be a default value for a variable and should be tested with the `is` operator rather than `==` (for technical reasons that it isn't worth going into here). For example: `a is None` or `a is not None` are the preferred tests.
Do not use `is` instead of `==` for any other comparisons (unless you know what you are doing).
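A typical use of `None` is as a default argument meaning "nothing was given". The function `describe` below is a hypothetical example; it tests `value is None` rather than `not value`, so that a legitimate value of `0` is not mistaken for a missing one:

```python
def describe(value=None):
    # hypothetical helper: None marks "no value given"
    if value is None:
        return 'no value given'
    return f'value is {value}'

print(describe())
print(describe(0))   # 0 counts as False, but it is not None
```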
<a class="anchor" id="If-statements"></a>
### If statements
The basic syntax of `if` statements is fairly standard, though don't forget that you _*must*_ indent the statements within the conditional/loop block as this is the way of delineating blocks of code in python. For example:
%% Cell type:code id: tags:
```
import random
a = random.uniform(-1, 1)
print(a)
if a>0:
    print('Positive')
elif a<0:
    print('Negative')
else:
    print('Zero')
```
%% Cell type:markdown id: tags:
Or more generally:
%% Cell type:code id: tags:
```
a = []  # just one of many examples
if not a:
    print('Variable is false, or empty')
```
%% Cell type:markdown id: tags:
This can be useful for functions where a variety of possible input types are being dealt with.
---
<a class="anchor" id="For-loops"></a>
### For loops
The `for` loop works like in bash:
%% Cell type:code id: tags:
```
for x in [2, 'is', 'more', 'than', 1]:
    print(x)
```
%% Cell type:markdown id: tags:
where a list or any other sequence (e.g. tuple) can be used.
If you want a numerical range then use:
%% Cell type:code id: tags:
```
for x in range(2, 9):
    print(x)
```
%% Cell type:markdown id: tags:
Note that, like slicing, the maximum value is one less than the value specified. Also, `range` actually returns an object that can be iterated over but is not just a list of numbers. If you want a list of numbers then `list(range(2, 9))` will give you this.
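`range` also accepts one argument (the start then defaults to 0) or three arguments (start, stop, step), mirroring slicing. A few variations:

```python
up = list(range(2, 9))           # stop value 9 is excluded
from_zero = list(range(5))       # start defaults to 0
stepped = list(range(0, 10, 3))  # third argument is the step
down = list(range(5, 0, -1))     # a negative step counts down
print(up, from_zero, stepped, down)
```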
A very nice feature of python is that multiple variables can be assigned from a tuple or list:
%% Cell type:code id: tags:
```
x, y = [4, 7]
print(x)
print(y)
```
%% Cell type:markdown id: tags:
and this can be combined with a function called `zip` to make very convenient dual variable loops:
%% Cell type:code id: tags:
```
alist = ['Some', 'set', 'of', 'items']
blist = list(range(len(alist)))
print(list(zip(alist, blist)))
for x, y in zip(alist, blist):
    print(y, x)
```
%% Cell type:markdown id: tags:
This type of loop can be used with any two lists (or similar) to iterate over them jointly.
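One thing to watch out for: `zip` stops as soon as the shortest input runs out, silently dropping any leftover items, so it is worth making sure the lists have the same length. The names and IDs below are made up:

```python
names = ['a', 'b', 'c']
ids = [101, 102]
pairs = list(zip(names, ids))
print(pairs)   # 'c' has no partner, so it is dropped
```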
<a class="anchor" id="While-loops"></a>
### While loops
The syntax for this is pretty standard:
%% Cell type:code id: tags:
```
import random
n = 0
x = 0
while n<100:
    x += random.uniform(0, 1)**2  # where ** is a power operation
    if x>50:
        break
    n += 1
print(x)
```
%% Cell type:markdown id: tags:
You can also use `continue` as in other languages.
> Note that there is no `do ... while` construct.
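
A common way to get the same effect is `while True:` combined with a `break` at the end of the loop body, so that the body always runs at least once. A sketch:

```python
import random

attempts = 0
while True:
    x = random.uniform(0, 1)
    attempts += 1
    if x > 0.5:   # the "while" condition, tested at the end of each pass
        break
print(attempts, x)
```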
---
<a class="anchor" id="quick-intro"></a>
### A quick intro to conditional expressions and list comprehensions
These are more advanced bits of python but are really useful and common, so worth having a little familiarity with at this stage.
<a class="anchor" id="Conditional-expressions"></a>
#### Conditional expressions
A general expression that can be used in python is `A if condition else B`, which gives `A` when `condition` is true and `B` otherwise.
For example:
%% Cell type:code id: tags:
```
import random
x = random.uniform(0, 1)
y = x**2 if x<0.5 else (1 - x)**2
print(x, y)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="List-comprehensions"></a>
#### List comprehensions
This is a shorthand syntax for building a list like a for loop but doing it in one line, and is very popular in python. It is quite similar to mathematical set notation. For example:
%% Cell type:code id: tags:
```
v1 = [ x**2 for x in range(10) ]
print(v1)
v2 = [ x**2 for x in range(10) if x!=7 ]
print(v2)
```
%% Cell type:markdown id: tags:
You'll find that python programmers use this kind of construction _*a lot*_.
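The two constructs above can be combined. Note where the `if` goes: a conditional expression at the front chooses between two values for every item, whereas an `if` at the end filters items out:

```python
# conditional expression inside a list comprehension: one output per item
parity = ['even' if x % 2 == 0 else 'odd' for x in range(5)]
print(parity)
# filtering if at the end: only some items are kept
evens = [x for x in range(5) if x % 2 == 0]
print(evens)
```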
---
<a class="anchor" id="functions"></a>
## Functions
You will find functions pretty familiar in python to start with,
although they have a few options which are really handy and different
from C++ or matlab (to be covered in a later practical). To start
with we'll look at a simple function but note a few key points:
* you _must_ indent everything inside the function (it is a code
block and indentation is the only way of determining this - just
like for the guts of a loop)
* you can return _whatever you want_ from a python function, but only
a single object - it is usual to package up multiple things in a
tuple or list, which is easily unpacked by the calling invocation:
e.g., `a, b, c = myfunc(x)`
* parameters are passed by reference (see section on <a
href="#Copying-and-references">copying and references</a>)
%% Cell type:code id: tags:
```
def myfunc(x, y, z=0):
    r2 = x*x + y*y + z*z
    r = r2**0.5
    return r, r2
rad = myfunc(10, 20)
print(rad)
rad, dummy = myfunc(10, 20, 30)
print(rad)
rad, _ = myfunc(10,20,30)
print(rad)
```
%% Cell type:markdown id: tags:
> Note that the `_` is used as shorthand here for a dummy variable
> that you want to throw away.
>
> The return statement implicitly creates a tuple to return and is equivalent to `return (r, r2)`
One nice feature of python functions is that you can name the
arguments when you call them, rather than only doing it by position.
For example:
%% Cell type:code id: tags:
```
def myfunc(x, y, z=0, flag=''):
    if flag=='L1':
        r = abs(x) + abs(y) + abs(z)
    else:
        r = (x*x + y*y + z*z)**0.5
    return r
rA = myfunc(10, 20)
rB = myfunc(10, 20, flag='L1')
rC = myfunc(10, 20, flag='L1', z=30)
print(rA, rB, rC)
```
%% Cell type:markdown id: tags:
You will often see python functions called with these named arguments. In fact, for functions with more than 2 or 3 variables this naming of arguments is recommended, because it clarifies what each of the arguments does for anyone reading the code.
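When every argument is named, the order no longer matters, and named arguments can follow positional ones. A small sketch repeating the `myfunc` definition from above so the cell is self-contained:

```python
def myfunc(x, y, z=0, flag=''):
    # L1 norm if flag is 'L1', otherwise the Euclidean norm
    if flag == 'L1':
        return abs(x) + abs(y) + abs(z)
    return (x*x + y*y + z*z)**0.5

r1 = myfunc(flag='L1', y=20, x=10)   # all named, in any order
r2 = myfunc(10, 20, flag='L1')       # positional first, then named
print(r1, r2)
```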
---
<a class="anchor" id="exercise"></a>
## Exercise
Let's say you are given a single string with comma separated elements
that represent filenames and ID codes: e.g., `/vols/Data/pytreat/AAC, 165873, /vols/Data/pytreat/AAG, 170285, ...`
Write some code to do the following:
* separate out the filenames and ID codes into separate lists (ID's
should be numerical values, not strings) - you may need several steps for this
* loop over the two and generate a _string_ that could be used to
rename the directories (e.g., `mv /vols/Data/pytreat/AAC /vols/Data/pytreat/S165873`) - we will cover how to actually execute these in a later practical
* convert your dual lists into a dictionary, with ID as the key
* write a small function to determine if an ID is present in this
set or not, and also return the filename if it is
* write a for loop to create a list of all the odd-numbered IDs (you can use the `%` operator for modulus - i.e., `5 % 2` is 1)
* re-write the for loop as a list comprehension
* now generate a list of the filenames corresponding to these odd-numbered IDs
%% Cell type:code id: tags:
```
mstr = '/vols/Data/pytreat/AAC, 165873, /vols/Data/pytreat/AAG, 170285, /vols/Data/pytreat/AAH, 196792, /vols/Data/pytreat/AAK, 212577, /vols/Data/pytreat/AAQ, 385376, /vols/Data/pytreat/AB, 444600, /vols/Data/pytreat/AC6, 454578, /vols/Data/pytreat/V8, 501502, /vols/Data/pytreat/2YK, 667688, /vols/Data/pytreat/C3PO, 821971'
```
%% Cell type:markdown id: tags:
# Text input/output
In this section we will explore how to write and/or retrieve our data from
text files.
Most of the functionality for reading/writing files and manipulating strings
is available without any imports. However, you can find some additional
functionality in the
[`string`](https://docs.python.org/3/library/string.html) module.
Most of the string functions are available as methods on string objects. This
means that you can use the ipython autocomplete to check for them.
%% Cell type:code id: tags:
```
empty_string = ''
```
%% Cell type:code id: tags:
```
# after running the code block above,
# put your cursor after the dot and
# press tab to get a list of methods
empty_string.
```
%% Cell type:markdown id: tags:
* [Reading/writing files](#reading-writing-files)
* [Creating new strings](#creating-new-strings)
* [String syntax](#string-syntax)
* [Unicode versus bytes](#unicode-versus-bytes)
* [Converting objects into strings](#converting-objects-into-strings)
* [Combining strings](#combining-strings)
* [String formatting](#string-formatting)
* [Extracting information from strings](#extracting-information-from-strings)
* [Splitting strings](#splitting-strings)
* [Converting strings to numbers](#converting-strings-to-numbers)
* [Regular expressions](#regular-expressions)
* [Exercises](#exercises)
<a class="anchor" id="reading-writing-files"></a>
## Reading/writing files
The syntax to open a file in python is `with open(<filename>, <mode>) as
<file_object>: <block of code>`, where
* `filename` is a string with the name of the file
* `mode` is one of `'r'` (for read-only access), `'w'` (for writing a file,
this wipes out any existing content), `'a'` (for appending to an existing
file). A `'b'` can be added to any of these to open the file in "byte"-mode,
which prevents python from interpreting non-text (e.g., NIFTI) files as text.
* `file_object` is a variable name which will be used within the `block of
code` to access the opened file.
For example the following will read all the text in `README.md` and print it.
%% Cell type:code id: tags:
```
with open('README.md', 'r') as readme_file:
    print(readme_file.read())
```
%% Cell type:markdown id: tags:
> The `with` statement is an advanced python feature, however you will
> probably only encounter it when opening files. In that context it merely
> ensures that the file will be properly closed as soon as the program leaves
> the `with` statement (even if an error is raised within the `with`
> statement).
You could also use the `readlines()` method to get a list of all the lines, or
simply "loop over" the file object to get the lines one by one:
%% Cell type:code id: tags:
```
with open('README.md', 'r') as readme_file:
    print('First five lines...')
    for i, line in enumerate(readme_file):
        # each line is returned with its
        # newline character still intact,
        # so we use rstrip() to remove it.
        print(f'{i}: {line.rstrip()}')
        if i == 4:
            break
```
%% Cell type:markdown id: tags:
> `enumerate` takes any sequence (or other iterable, such as a file object) and returns 2-element tuples containing the index and the item.
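
A small sketch of `enumerate` on a list, including its optional `start` argument for when you want counting to begin at 1:

```python
items = ['first', 'second', 'third']
for i, item in enumerate(items):
    print(i, item)
# start changes the first index that is returned
numbered = list(enumerate(items, start=1))
print(numbered)
```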
A very similar syntax is used to write files:
%% Cell type:code id: tags:
```
with open('02_text_io/my_file', 'w') as my_file:
    my_file.write('This is my first line\n')
    my_file.writelines(['Second line\n', 'and the third\n'])
```
%% Cell type:markdown id: tags:
Note that new line characters do not get added automatically. We can investigate
the resulting file using
%% Cell type:code id: tags:
```
!cat 02_text_io/my_file
```
%% Cell type:markdown id: tags:
> In Jupyter notebook, (and in `ipython`/`fslipython`), any lines starting
> with `!` will be interpreted as shell commands. It is great when playing
> around in a Jupyter notebook or in the `ipython` terminal, however it is an
> ipython-only feature and hence is not available when writing python
> scripts. How to call shell commands from python will be discussed in the
> `scripts` practical.
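
In a plain python script, the equivalent of `!cat` is simply to open the file and print its contents. The sketch below writes to a temporary directory (via the standard-library `tempfile` module) rather than to `02_text_io/my_file`, and also shows `print(..., file=...)`, which, unlike `write()`, adds the newline for you:

```python
import os
import tempfile

# the temporary directory (and the file in it) is deleted at the end of the block
with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, 'my_file')
    with open(fname, 'w') as my_file:
        my_file.write('This is my first line\n')  # write() needs an explicit \n
        print('Second line', file=my_file)       # print() adds the \n itself
    with open(fname, 'r') as my_file:
        contents = my_file.read()
print(contents)
```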
If we want to add to the existing file we can open it in the append mode:
%% Cell type:code id: tags:
```
with open('02_text_io/my_file', 'a') as my_file:
    my_file.write('More lines is always better\n')
!cat 02_text_io/my_file
```
%% Cell type:markdown id: tags:
Below we will discuss how we can convert python objects to strings to store in
these files and how to extract those python objects from strings again.
<a class="anchor" id="creating-new-strings"></a>
## Creating new strings
<a class="anchor" id="string-syntax"></a>
### String syntax
Single-line strings can be created in python using either single or double
quotes:
%% Cell type:code id: tags:
```
a_string = 'To be or not to be'
same_string = "To be or not to be"
print(a_string == same_string)
```
%% Cell type:markdown id: tags:
The main rationale for choosing between single or double quotes is whether
the string itself will contain any quotes. You can include a single quote in a
string surrounded by single quotes by escaping it with the `\` character,
however in such a case it would be more convenient to use double quotes:
%% Cell type:code id: tags:
```
a_string = "That's the question"
same_string = 'That\'s the question'
print(a_string == same_string)
```
%% Cell type:markdown id: tags:
Newlines (`\n`), tabs (`\t`) and many other special characters are supported:
%% Cell type:code id: tags:
```
a_string = "This is the first line.\nAnd here is the second.\n\tThe third starts with a tab."
print(a_string)
```
%% Cell type:markdown id: tags:
You can even include unicode characters:
%% Cell type:code id: tags:
```
a_string = "Python = 🐍"
print(a_string)
```
%% Cell type:markdown id: tags:
However, the easiest way to create multi-line strings is to use a triple quote (again single or double quotes can be used). Triple quotes allow your string to span multiple lines:
%% Cell type:code id: tags:
```
multi_line_string = """This is the first line.
And here is the second.
\tThird line starts with a tab."""
print(multi_line_string)
```
%% Cell type:markdown id: tags:
If you don't want python to interpret the `\n`, `\t`, etc. in your strings, you can prepend the quotes enclosing the string with an `r`. This tells python to interpret the following string as raw text.
%% Cell type:code id: tags:
```
single_line_string = r"This string is not multiline.\nEven though it contains the \n character"
print(single_line_string)
```
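A quick way to see what the `r` prefix changes is to compare string lengths. In a normal string `\n` is a single newline character, while in a raw string it is two characters: a backslash followed by `n`. Raw strings are commonly used for regular expressions and Windows paths.

```python
normal = "a\nb"   # 'a', newline, 'b'
raw = r"a\nb"     # 'a', backslash, 'n', 'b'
print(len(normal), len(raw))  # 3 4
print(raw == "a\\nb")         # True: the same string written with an escaped backslash
```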
%% Cell type:markdown id: tags:
One pitfall when creating a list of strings is that python automatically concatenates string literals that are separated only by whitespace:
%% Cell type:code id: tags:
```
my_list_of_strings = ['a', 'b', 'c' 'd', 'e']
print("The 'c' and 'd' got concatenated, because we forgot the comma:", my_list_of_strings)
```
%% Cell type:markdown id: tags:
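> Python itself will not warn you about this. The `SyntaxWarning` for missing commas added in python 3.8 only covers cases such as `[(1, 2) (3, 4)]`, where a literal would be called like a function. Linters such as `pylint` can warn about implicit string concatenation.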
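The same behaviour can be used deliberately: adjacent string literals inside round brackets are joined into a single string, which is a common way to split a long message over several lines of source code without using `+`. A minimal example:

```python
message = ('This is a long message that we want to '
           'split over several lines of source code, '
           'without any explicit + operators.')
print(message)
```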
<a class="anchor" id="unicode-versus-bytes"></a>
#### unicode versus bytes
> **Note**: You can safely skip this section if you do not have any plans to
> work with binary files or non-English text in Python, and you do not want
> to know how to insert poop emojis into your code.
To encourage the spread of python around the world, python 3 switched to using
unicode as the default for strings and code (which is one of the main reasons
for the incompatibility between python 2 and 3). This means that each element
in a string is a unicode character which, when encoded with the default [UTF-8
encoding](https://docs.python.org/3/howto/unicode.html), can take up one or
more bytes. The advantage is that any unicode character can now be
used in strings or in the code itself:
%% Cell type:code id: tags:
```
Δ = "café"
print(Δ)
```
%% Cell type:markdown id: tags:
In python 2 each element in a string was a single byte rather than a
potentially multi-byte character. You can convert between unicode strings
and byte arrays using:
* `encode()` called on a string converts it into a bytes array (`bytes` object)
* `decode()` called on a `bytes` array converts it into a unicode string.
%% Cell type:code id: tags:
```
delta = "Δ"
print('The character', delta, 'consists of the following 2 bytes', delta.encode())
```
%% Cell type:markdown id: tags:
These byte arrays can be created directly by prepending the quotes enclosing
the string with a `b`, which tells python 3 to interpret the following as a
byte array:
%% Cell type:code id: tags:
```
a_byte_array = b'\xce\xa9'
print(f'The two bytes {a_byte_array} become a single unicode character ({a_byte_array.decode()}) with UTF-8 encoding')
```
%% Cell type:markdown id: tags:
Especially in code dealing with strings (e.g., reading/writing of files), many
of the errors that arise when running python 2 code in python 3 are caused by
mixing unicode strings with byte arrays. Decoding and/or encoding some of
these objects can often fix these issues.
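To make the distinction concrete, this sketch compares the length of a string (counted in characters) with the length of its UTF-8 encoding (counted in bytes), checks that `decode()` undoes `encode()`, and shows the `TypeError` you get when mixing the two types.

```python
word = "café"
encoded = word.encode()  # a bytes object, UTF-8 encoded by default
print(len(word), 'characters')  # 4 characters
print(len(encoded), 'bytes')    # 5 bytes, because é takes up two bytes
print(encoded.decode() == word)  # True: decode() undoes encode()

# mixing str and bytes raises an error rather than guessing an encoding
try:
    word + encoded
except TypeError as e:
    print('TypeError:', e)
```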
By default, any file opened in python is treated as text, and its contents
are decoded into unicode strings. If you want to treat a file as raw bytes,
you have to include a `'b'` in the `mode` when calling the `open()` function:
%% Cell type:code id: tags:
```
import os.path as op
with open(op.expandvars('${FSLDIR}/data/standard/MNI152_T1_1mm.nii.gz'), 'rb') as gzipped_nifti:
    print('First few bytes of gzipped NIFTI file:', gzipped_nifti.read(10))
```
%% Cell type:markdown id: tags:
> We use the `expandvars()` function here to insert the value of the `FSLDIR`
> environment variable into our string. This function will be presented in
> the file management practical.
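If `$FSLDIR` is not available, the same idea can be tried on a file we create ourselves. This sketch uses the standard `gzip` module to write a small compressed file into a temporary directory, then opens it with `open()` in binary mode (`'rb'`). Every gzip file starts with the two bytes `\x1f\x8b`, which is one way to recognise one.

```python
import gzip
import os.path as op
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = op.join(tmpdir, 'example.txt.gz')
    # gzip.open in 'wt' mode compresses the text we write to it
    with gzip.open(path, 'wt') as f:
        f.write('hello')
    # opening with 'rb' gives us the raw (compressed) bytes
    with open(path, 'rb') as f:
        magic = f.read(2)

print(magic)                 # b'\x1f\x8b'
print(magic == b'\x1f\x8b')  # True
```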
<a class="anchor" id="converting-objects-into-strings"></a>
### Converting objects into strings
There are two functions to convert python objects into strings, `repr()` and
`str()`. All other functions that rely on string-representations of python
objects will use one of these two (for example the `print()` function will
call `str()` on the object).
The goal of the `str()` function is to be readable, while the goal of `repr()`
is to be unambiguous. Compare
%% Cell type:code id: tags:
```
print(str("3"))
print(str(3))
```
%% Cell type:markdown id: tags:
with
%% Cell type:code id: tags:
```
print(repr("3"))
print(repr(3))
```
%% Cell type:markdown id: tags:
In both cases you get the value of the object (3), but only the `repr` returns enough information to actually know the type of the object.
Perhaps the difference is clearer with a more advanced object.
The `datetime` module contains various classes and functions to work with dates (there is also a `time` module).
Here we will look at the alternative string representations of the `datetime` object itself:
%% Cell type:code id: tags:
```
from datetime import datetime
print('str(): ', str(datetime.now()))
print('repr(): ', repr(datetime.now()))
```
%% Cell type:markdown id: tags:
Note that the result from `str()` is human-readable as a date, while the result from `repr()` is more useful if you wanted to recreate the `datetime` object.
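One place where the difference shows up in everyday code: calling `str()` (or `print()`) on a container such as a list uses `repr()` for each element, which is why strings inside a printed list keep their quotes.

```python
values = ['3', 3]
print(values)                          # ['3', 3]
print(str(values) == "['3', 3]")       # True
print(str(values[0]), repr(values[0]))  # 3 '3'
```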
<a class="anchor" id="combining-strings"></a>
### Combining strings
The simplest way to concatenate strings is to simply add them together:
%% Cell type:code id: tags:
```
a_string = "Part 1"
other_string = "Part 2"
full_string = a_string + ", " + other_string
print(full_string)
```
%% Cell type:markdown id: tags:
Given a whole sequence of strings, you can concatenate them together using the `join()` method:
%% Cell type:code id: tags:
```
list_of_strings = ['first', 'second', 'third', 'fourth']
full_string = ', '.join(list_of_strings)
print(full_string)
```
%% Cell type:markdown id: tags:
Note that the string on which the `join()` method is called (`', '` in this case) is used as a delimiter to separate the different strings. If you just want to concatenate the strings you can call `join()` on the empty string:
%% Cell type:code id: tags:
```
list_of_strings = ['first', 'second', 'third', 'fourth']
full_string = ''.join(list_of_strings)
print(full_string)
```
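Note that `join()` only accepts strings; passing a list of numbers raises a `TypeError`. A common workaround is to convert each element with `str()` first, for example using the built-in `map()` function:

```python
numbers = [1, 2, 3]
try:
    ', '.join(numbers)
except TypeError as e:
    print('join() failed:', e)

# map(str, numbers) converts every number to a string before joining
full_string = ', '.join(map(str, numbers))
print(full_string)  # 1, 2, 3
```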
%% Cell type:markdown id: tags:
<a class="anchor" id="string-formatting"></a>
### String formatting
Using the techniques in [Combining strings](#combining-strings) we can build simple strings. For longer strings it is often useful to first write a template string with some placeholders, into which variables are later inserted. Python currently has four different built-in ways of doing this (and many packages provide similar capabilities):
* [formatted string literals](https://docs.python.org/3/reference/lexical_analysis.html#f-strings) (these are only available in python 3.6+)
* [new-style formatting](https://docs.python.org/3/library/string.html#format-string-syntax).
* printf-like [old-style formatting](https://docs.python.org/3/library/stdtypes.html#old-string-formatting)
* bash-like [template-strings](https://docs.python.org/3/library/string.html#template-strings)
Here we provide a single example using the first three methods, so you can recognize them in the future.
First, the old printf-style formatting. Note that this style is invoked by using the modulo (`%`) operator on the string. Every placeholder (starting with `%`) is then replaced by one of the values provided.
%% Cell type:code id: tags:
```
a = 3
b = 1 / 3
print('%.3f = %i + %.3f' % (a + b, a, b))
print('%(total).3f = %(a)i + %(b).3f' % {'a': a, 'b': b, 'total': a + b})
```
%% Cell type:markdown id: tags:
Then the recommended new style formatting (You can find a nice tutorial [here](https://www.digitalocean.com/community/tutorials/how-to-use-string-formatters-in-python-3)). Note that this style is invoked by calling the `format()` method on the string and the placeholders are marked by the curly braces `{}`.
%% Cell type:code id: tags:
```
a = 3
b = 1 / 3
print('{:.3f} = {} + {:.3f}'.format(a + b, a, b))
print('{total:.3f} = {a} + {b:.3f}'.format(a=a, b=b, total=a+b))
```
%% Cell type:markdown id: tags:
Note that the `:` delimiter separates the variable identifier on the left from the formatting rules on the right.
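Because a new-style template is an ordinary string, it can be defined once and filled in many times, for example inside a loop. A small sketch; the template and the values are made up for illustration:

```python
template = 'subject {subject:02d}: mean = {mean:.2f}'

lines = []
for subject, mean in [(1, 0.5), (2, 1 / 3), (10, 2.0)]:
    # the same template is re-used for every subject
    lines.append(template.format(subject=subject, mean=mean))

print('\n'.join(lines))
```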
Finally the new, fancy formatted string literals (only available in python 3.6+).
This new format is very similar to the recommended style, except that all placeholders are automatically evaluated in the local environment at the time the template is defined.
This means that we do not have to explicitly provide the parameters (and we can evaluate the sum inside the string!), although it does mean that we cannot re-use the template.
%% Cell type:code id: tags:
```
a = 3
b = 1/3
print(f'{a + b:.3f} = {a} + {b:.3f}')
```
%% Cell type:markdown id: tags:
These f-strings are extremely useful when creating print or error messages for debugging,
especially with the new support for self-documenting in python 3.8 (see
[here](https://docs.python.org/3/whatsnew/3.8.html#f-strings-support-for-self-documenting-expressions-and-debugging)):
%% Cell type:code id: tags:
```
a = 3
b = 1/3
print(f'{a + b=}')
```
%% Cell type:markdown id: tags:
Note that this prints both the expression `a + b` and the output (this block will raise an error for python <= 3.7).
<a class="anchor" id="extracting-information-from-strings"></a>
## Extracting information from strings
The techniques shown in this section are useful if you are loading data from a
small text file or user input, or parsing a small amount of output from
e.g. `fslstats`. However, if you are working with large structured text data
(e.g. a big `csv` file), you should use the I/O capabilities of `numpy` or
`pandas` instead of doing things manually - this is covered in separate
practicals.
<a class="anchor" id="splitting-strings"></a>
### Splitting strings
The simplest way to extract a sub-string is to use slicing (see previous practical for more details):
%% Cell type:code id: tags:
```
a_string = 'abcdefghijklmnopqrstuvwxyz'
print(a_string[10]) # create a string containing only the 11th character
print(a_string[20:]) # create a string containing the 21st character onward
print(a_string[::-1]) # creating the reverse string
```
%% Cell type:markdown id: tags:
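If you are not sure where to cut a string, you can use the `find()` method to find the first occurrence of a sub-string, or `rfind()` to find the last one. Strings have no `findall()` method; to locate every occurrence you can call `find()` repeatedly with a start position, or use `re.finditer()` from the `re` module (see [Regular expressions](#regular-expressions)).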
%% Cell type:code id: tags:
```
a_string = 'abcdefghijklmnopqrstuvwxyz'
index = a_string.find('fgh')
print(a_string[:index])  # extracts the sub-string up to the first occurrence of 'fgh'
# note that find() returns -1 when it cannot find the sub-string, rather than raising an error
print('index for non-existent sub-string', a_string.find('cats'))
```
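`find()` also accepts an optional start position, so all occurrences of a sub-string can be located by calling it repeatedly. A minimal sketch:

```python
text = 'the cat sat on the mat with the hat'
positions = []
index = text.find('the')
while index != -1:
    positions.append(index)
    # continue searching just after the previous match
    index = text.find('the', index + 1)
print(positions)  # [0, 15, 28]
```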
%% Cell type:markdown id: tags:
You can automate this process of splitting a string at a sub-string using the `split()` method. By default it will split a string at any whitespace.
%% Cell type:code id: tags:
```
print('The split() method\trecognizes a wide variety\nof white space'.split())
```
%% Cell type:markdown id: tags:
To split a comma-separated list, we need to supply the delimiter to the `split()` method. We can then use the `strip()` method to remove any whitespace from the beginning and end of each string:
%% Cell type:code id: tags:
```
scientific_packages_string = "numpy, scipy, pandas, matplotlib, nibabel"
list_with_whitespace = scientific_packages_string.split(',')
print(list_with_whitespace)
list_without_whitespace = [individual_string.strip() for individual_string in list_with_whitespace]
print(list_without_whitespace)
```
%% Cell type:markdown id: tags:
> We use the syntax `[<expr> for <element> in <sequence>]` here which applies the `expr` to each `element` in the `sequence` and returns the resulting list. This is a list comprehension - a convenient form in python to create a new list from the old one.
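For comparison, the list comprehension above is equivalent to the following explicit loop:

```python
scientific_packages_string = "numpy, scipy, pandas, matplotlib, nibabel"
list_with_whitespace = scientific_packages_string.split(',')

list_without_whitespace = []
for individual_string in list_with_whitespace:
    # strip() removes leading and trailing whitespace
    list_without_whitespace.append(individual_string.strip())
print(list_without_whitespace)
```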
<a class="anchor" id="converting-strings-to-numbers"></a>
### Converting strings to numbers
Once you have extracted a number from a string, you can convert it into an
actual integer or float by calling respectively `int()` or `float()` on
it. `float()` understands a wide variety of different ways to write numbers:
%% Cell type:code id: tags:
```
print(int("3"))
print(float("3"))
print(float("3.213"))
print(float("3.213e5"))
print(float("3.213E-25"))
```
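The conversion fails with a `ValueError` if the string does not contain a valid number; note also that `int()` does not accept strings with a decimal point. When parsing input you do not control, you can catch this error. A small sketch, where `to_number` is a hypothetical helper written for illustration:

```python
def to_number(text):
    """Convert text to an int if possible, otherwise to a float.

    Raises ValueError if text is neither.
    """
    try:
        return int(text)
    except ValueError:
        return float(text)

print(to_number('3'))      # 3
print(to_number(' 3.5 '))  # 3.5 (surrounding whitespace is ignored)
try:
    to_number('three')
except ValueError as e:
    print('Could not convert:', e)
```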
%% Cell type:markdown id: tags:
<a class="anchor" id="regular-expressions"></a>
### Regular expressions
Regular expressions are used to look for specific patterns in a longer string. They can be used to extract specific information from a well-formatted string or to modify a string. In python, regular expressions are available in the [re](https://docs.python.org/3/library/re.html#re-syntax) module.
A full discussion of regular expressions goes far beyond this practical. If you are interested, have a look [here](https://docs.python.org/3/howto/regex.html).
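As a small taste, this sketch uses `re.search()` with capture groups to pull two numbers out of a line of text, and `re.findall()` to collect every match of a pattern. The input lines are made up for illustration.

```python
import re

line = 'mean: 104.25 std: 12.5'
# \S+ matches one or more non-whitespace characters;
# the round brackets define groups that we can extract afterwards
match = re.search(r'mean: (\S+) std: (\S+)', line)
if match is not None:
    mean, std = float(match.group(1)), float(match.group(2))
    print(mean, std)  # 104.25 12.5

# \d+ matches one or more digits; findall returns every match as a list
print(re.findall(r'\d+', 'subj_1 and subj_22'))  # ['1', '22']
```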
<a class="anchor" id="exercises"></a>
## Exercises
### Joining/splitting strings
The file 02_text_io/input.txt contains integers in a 2-column format (separated by spaces). Read in this file and write it back out as 2 rows, with the values separated by commas.
%% Cell type:code id: tags:
```
input_filename = '02_text_io/input.txt'
output_filename = '02_text_io/output.txt'
with open(input_filename, 'r') as input_file:
    ...
with open(output_filename, 'w') as output_file:
    ...
```
%% Cell type:markdown id: tags:
### String formatting and regular expressions
Given a template for MRI files:
`s<subject_id>/<modality>_<res>mm.nii.gz`
where `<subject_id>` is a 6-digit subject ID, `<modality>` is one of T1w, T2w, or PD, and `<res>` is the resolution of the image (with at most one digit after the decimal point, e.g. 1.5)
Write a function that takes the subject_id (as an integer), the modality (as a string), and the resolution (as a float) and returns the complete filename (Hint: use one of the formatting techniques mentioned in [String formatting](#string-formatting)).
%% Cell type:code id: tags:
```
def get_filename(subject_id, modality, resolution):
    ...
```
%% Cell type:markdown id: tags:
For a more difficult exercise, write a function that extracts the subject ID, modality, and resolution from a filename (using a regular expression, or by using `find` and `split` to access the relevant parts of the string).
%% Cell type:code id: tags:
```
def get_parameters(filename):
    ...
    return subject_id, modality, resolution
```
%% Cell type:markdown id: tags:
# File management
In this section we will introduce you to file management - how do we find and
manage files, directories and paths in Python?
Most of Python's built-in functionality for managing files and paths is spread
across the following modules:
- [`os`](https://docs.python.org/3/library/os.html)
- [`shutil`](https://docs.python.org/3/library/shutil.html)
- [`os.path`](https://docs.python.org/3/library/os.path.html)
- [`glob`](https://docs.python.org/3/library/glob.html)
- [`fnmatch`](https://docs.python.org/3/library/fnmatch.html)
The `os` and `shutil` modules have functions allowing you to manage _files and
directories_. The `os.path`, `glob` and `fnmatch` modules have functions for
managing file and directory _paths_.
> Another standard library -
> [`pathlib`](https://docs.python.org/3/library/pathlib.html) - was added in
> Python 3.4, and provides an object-oriented interface to path management. We
> aren't going to cover `pathlib` here, but feel free to take a look at it if
> you are into that sort of thing.
## Contents
If you are impatient, feel free to dive straight in to the exercises, and use the
other sections as a reference. You might miss out on some neat tricks though.
* [Managing files and directories](#managing-files-and-directories)
* [Querying and changing the current directory](#querying-and-changing-the-current-directory)
* [Directory listings](#directory-listings)
* [Creating and removing directories](#creating-and-removing-directories)
* [Moving and removing files](#moving-and-removing-files)
* [Walking a directory tree](#walking-a-directory-tree)
* [Copying, moving, and removing directory trees](#copying-moving-and-removing-directory-trees)
* [Managing file paths](#managing-file-paths)
* [File and directory tests](#file-and-directory-tests)
* [Deconstructing paths](#deconstructing-paths)
* [Absolute and relative paths](#absolute-and-relative-paths)
* [Wildcard matching a.k.a. globbing](#wildcard-matching-aka-globbing)
* [Expanding the home directory and environment variables](#expanding-the-home-directory-and-environment-variables)
* [Cross-platform compatibility](#cross-platform-compatbility)
* [FileTree](#filetree)
* [Exercises](#exercises)
* [Re-name subject directories](#re-name-subject-directories)
* [Re-organise a data set](#re-organise-a-data-set)
* [Solutions](#solutions)
* [Appendix: Exceptions](#appendix-exceptions)
<a class="anchor" id="managing-files-and-directories"></a>
## Managing files and directories
The `os` module contains functions for querying and changing the current
working directory, moving and removing individual files, and for listing,
creating, removing, and traversing directories.
%% Cell type:code id: tags:
```
import os
import os.path as op
from pathlib import Path
```
%% Cell type:markdown id: tags:
> If you are using a library with a long name, you can create an alias for it
> using the `as` keyword, as we have done here for the `os.path` module.
<a class="anchor" id="querying-and-changing-the-current-directory"></a>
### Querying and changing the current directory
You can query and change the current directory with the `os.getcwd` and
`os.chdir` functions.
> Here we're also going to use the `expanduser` function from the `os.path`
> module, which allows us to expand the tilde character to the user's home
> directory. This is [covered in more detail
> below](#expanding-the-home-directory-and-environment-variables).
%% Cell type:code id: tags:
```
cwd = os.getcwd()
print(f'Current directory: {cwd}')
os.chdir(op.expanduser('~'))
print(f'Changed to: {os.getcwd()}')
os.chdir(cwd)
print(f'Changed back to: {cwd}')
```
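If an error occurs between the two `os.chdir` calls above, the second one never runs and you are left in the wrong directory. A sketch of a more robust pattern wraps the work in `try`/`finally`; the `finally` block runs whether or not an error was raised.

```python
import os
import os.path as op

cwd = os.getcwd()
try:
    os.chdir(op.expanduser('~'))
    print(f'Working in: {os.getcwd()}')
    # ... do some work here ...
finally:
    # this always runs, even if the code above raised an error
    os.chdir(cwd)
print(f'Back in: {os.getcwd()}')
```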
%% Cell type:markdown id: tags:
For the rest of this practical, we're going to use a little data set that has
been pre-generated, and is located in a sub-directory called
`03_file_management`.
%% Cell type:code id: tags:
```
os.chdir('03_file_management')
```
%% Cell type:markdown id: tags:
<a class="anchor" id="directory-listings"></a>
### Directory listings
Use the `os.listdir` function to get a directory listing (a.k.a. the Unix `ls`
command):
%% Cell type:code id: tags:
```
cwd = os.getcwd()
listing = os.listdir(cwd)
print(f'Directory listing: {cwd}')
print('\n'.join(listing))
print()
datadir = 'raw_mri_data'
listing = os.listdir(datadir)
print(f'Directory listing: {datadir}')
print('\n'.join(listing))
```
%% Cell type:markdown id: tags:
> Check out the `os.scandir` function as an alternative to `os.listdir`, if
> you have performance problems on large data sets.
> In the code above, we used the string `join` method to print each path in
> our directory listing on a new line. If you have a list of strings, the
> `join` method is a handy way to insert a delimiting character or string
> (e.g. newline, space, tab, comma - any string you want), between each string
> in the list.
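A minimal sketch of `os.scandir`: it yields `os.DirEntry` objects, which carry the entry's `name` and `path` and provide `is_file()`/`is_dir()` methods. To keep the example self-contained, it runs on a temporary directory containing one sub-directory and one file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    os.mkdir(os.path.join(tmpdir, 'subdir'))
    with open(os.path.join(tmpdir, 'notes.txt'), 'w') as f:
        f.write('hello\n')
    # scandir can be used as a context manager, which closes it when done
    with os.scandir(tmpdir) as entries:
        kinds = {e.name: ('dir' if e.is_dir() else 'file') for e in entries}

print(kinds)
```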
<a class="anchor" id="creating-and-removing-directories"></a>
### Creating and removing directories
You can, not surprisingly, use the `os.mkdir` function to make a
directory. The `os.makedirs` function is also handy - it is equivalent to
`mkdir -p` in Unix:
%% Cell type:code id: tags:
```
print(os.listdir('.'))
os.mkdir('onedir')
os.makedirs('a/big/tree/of/directories')
print(os.listdir('.'))
```
%% Cell type:markdown id: tags:
The `os.rmdir` and `os.removedirs` functions perform the reverse
operations. The `os.removedirs` function will only remove empty directories,
and you must pass it the _leaf_ directory, just like `rmdir -p` in Unix:
%% Cell type:code id: tags:
```
os.rmdir('onedir')
os.removedirs('a/big/tree/of/directories')
print(os.listdir('.'))
```
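`os.mkdir` and `os.makedirs` raise a `FileExistsError` if the directory already exists. Passing `exist_ok=True` to `os.makedirs` makes it silent in that case, like `mkdir -p`. A sketch in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, 'a', 'b', 'c')
    os.makedirs(target)
    try:
        os.makedirs(target)
    except FileExistsError:
        print('Directory already exists')
    # no error this time
    os.makedirs(target, exist_ok=True)
    created = os.path.isdir(target)

print(created)  # True
```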
%% Cell type:markdown id: tags:
<a class="anchor" id="moving-and-removing-files"></a>
### Moving and removing files
The `os.remove` and `os.rename` functions perform the equivalent of the Unix
`rm` and `mv` commands for files. Just like in Unix, if the destination file
you pass to `os.rename` already exists, it will be silently overwritten!
(On Windows, `os.rename` raises a `FileExistsError` instead.)
%% Cell type:code id: tags:
```
with open('file.txt', 'wt') as f:
    f.write('This file contains nothing of interest')
print(os.listdir())
os.rename('file.txt', 'file2.txt')
print(os.listdir())
os.remove('file2.txt')
print(os.listdir())
```
%% Cell type:markdown id: tags:
The `os.rename` function will also work on directories, but the `shutil.move`
function (covered below) is more flexible.
<a class="anchor" id="walking-a-directory-tree"></a>
### Walking a directory tree
The `os.walk` function is a useful one to know about. It is a bit fiddly to
use, but it is the best option if you need to traverse a directory tree. It
will recursively iterate over all of the files in a directory tree. By
default it works top-down: each directory is visited before any of its
sub-directories.
%% Cell type:code id: tags:
```
# On each iteration of the loop, we get:
# - root: the current directory
# - dirs: a list of all sub-directories in the root
# - files: a list of all files in the root
for root, dirs, files in os.walk('raw_mri_data'):
    print('Current directory: {}'.format(root))
    print(' Sub-directories:')
    print('\n'.join([' {}'.format(d) for d in dirs]))
    print(' Files:')
    print('\n'.join([' {}'.format(f) for f in files]))
```
%% Cell type:markdown id: tags:
> Note that `os.walk` does not guarantee a specific ordering in the lists of
> files and sub-directories that it returns. However, you can force an
> ordering quite easily - see its
> [documentation](https://docs.python.org/3/library/os.html#os.walk) for
> more details.
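One way to force an ordering, as described in the `os.walk` documentation, is to sort the `dirs` list *in place* during a top-down walk; `os.walk` then visits the sub-directories in that order. Sorting `files` only affects the order in which we process the files. A self-contained sketch using a temporary directory:

```python
import os
import os.path as op
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    for name in ['c', 'a', 'b']:
        os.mkdir(op.join(tmpdir, name))
    visited = []
    for root, dirs, files in os.walk(tmpdir):
        # sort in place (dirs.sort(), not dirs = sorted(dirs)),
        # so that os.walk sees the new order
        dirs.sort()
        files.sort()
        visited.append(op.relpath(root, tmpdir))

print(visited)  # ['.', 'a', 'b', 'c']
```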
If you need sub-directories to be visited before their parent directory
(bottom-up), you can set the `topdown` parameter to `False`:
%% Cell type:code id: tags:
```
for root, dirs, files in os.walk('raw_mri_data', topdown=False):
    print('Current directory: {}'.format(root))
    print(' Sub-directories:')
    print('\n'.join([' {}'.format(d) for d in dirs]))
    print(' Files:')
    print('\n'.join([' {}'.format(f) for f in files]))
```
%% Cell type:markdown id: tags:
> Here we have explicitly named the `topdown` argument when passing it to the
> `os.walk` function. This is referred to as a _keyword argument_; unnamed
> arguments are referred to as _positional arguments_. We'll give some more
> examples of positional and keyword arguments below.
<a class="anchor" id="copying-moving-and-removing-directory-trees"></a>
### Copying, moving, and removing directory trees
The `shutil` module contains some higher level functions for copying and
moving files and directories.
%% Cell type:code id: tags:
```
import shutil
```
%% Cell type:markdown id: tags:
The `shutil.copy` and `shutil.move` functions work just like the Unix `cp` and
`mv` commands:
%% Cell type:code id: tags:
```
# copy the source file to a destination file
src = 'raw_mri_data/subj_1/t1.nii'
shutil.copy(src, 'subj_1_t1.nii')
print(os.listdir('.'))
# copy the source file to a destination directory
os.mkdir('data_backup')
shutil.copy('subj_1_t1.nii', 'data_backup')
print(os.listdir('.'))
print(os.listdir('data_backup'))
# Move the file copy into that destination directory
shutil.move('subj_1_t1.nii', 'data_backup/subj_1_t1_backup.nii')
print(os.listdir('.'))
print(os.listdir('data_backup'))
# Move that destination directory into another directory
os.mkdir('data_backup_backup')
shutil.move('data_backup', 'data_backup_backup')
print(os.listdir('.'))
print(os.listdir('data_backup_backup'))
```
%% Cell type:markdown id: tags:
The `shutil.copytree` function allows you to copy entire directory trees - it
is the equivalent of the Unix `cp -r` command. The reverse operation is provided
by the `shutil.rmtree` function:
%% Cell type:code id: tags:
```
shutil.copytree('raw_mri_data', 'raw_mri_data_backup')
print(os.listdir('.'))
shutil.rmtree('raw_mri_data_backup')
shutil.rmtree('data_backup_backup')
print(os.listdir('.'))
```
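`shutil.copytree` also accepts an `ignore` argument; combined with `shutil.ignore_patterns`, this lets you skip files matching glob-style patterns while copying. A self-contained sketch; the file names are made up for illustration:

```python
import os
import os.path as op
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    src = op.join(tmpdir, 'src')
    os.mkdir(src)
    for name in ['t1.nii', 't2.nii', 'notes.tmp']:
        with open(op.join(src, name), 'w') as f:
            f.write('dummy')
    dst = op.join(tmpdir, 'dst')
    # copy everything except files ending in .tmp
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns('*.tmp'))
    copied = sorted(os.listdir(dst))

print(copied)  # ['t1.nii', 't2.nii']
```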
%% Cell type:markdown id: tags:
<a class="anchor" id="managing-file-paths"></a>
## Managing file paths
The `os.path` module contains functions for creating and manipulating file and
directory paths, such as stripping directory prefixes and suffixes, and
joining directory paths in a cross-platform manner. In this code, we are using
`op` to refer to `os.path` - remember that we [created an alias
earlier](#managing-files-and-directories).
> Note that many of the functions in the `os.path` module do not care if your
> path actually refers to a real file or directory - they are just
> manipulating the path string, and will happily generate invalid or
> non-existent paths for you.
<a class="anchor" id="file-and-directory-tests"></a>
### File and directory tests
If you want to know whether a given path is a file, or a directory, or whether
it exists at all, then the `os.path` module has got your back with its
`isfile`, `isdir`, and `exists` functions. Let's define a silly function which
will tell us what a path is:
%% Cell type:code id: tags:
```
def whatisit(path, existonly=False):
    print('Does {} exist? {}'.format(path, op.exists(path)))
    if not existonly:
        print('Is {} a file? {}' .format(path, op.isfile(path)))
        print('Is {} a directory? {}'.format(path, op.isdir( path)))
```
%% Cell type:markdown id: tags:
> This is the first time in a while that we have defined our own function,
> [hooray!](https://www.youtube.com/watch?v=zQiibNVIvK4). Here's a quick
> refresher on how to write functions in Python, in case you have forgotten.
>
> First of all, all function definitions in Python begin with the `def`
> keyword:
>
> ```
> def myfunction():
>     function_body
> ```
>
> Just like with other control flow tools, such as `if`, `for`, and `while`
> statements, the body of a function must be indented (with four spaces
> please!).
>
> Python functions can be written to accept any number of arguments:
>
> ```
> def myfunction(arg1, arg2, arg3):
>     function_body
> ```
>
> Arguments can also be given default values:
>
> ```
> def myfunction(arg1, arg2, arg3=False):
>     function_body
> ```
>
> In our `whatisit` function above, we gave the `existonly` argument (which
> controls whether the path is only tested for existence) a default value.
> This makes the `existonly` argument optional - we can call `whatisit` either
> with or without this argument.
>
> To return a value from a function, use the `return` keyword:
>
> ```
> def add(n1, n2):
>     return n1 + n2
> ```
>
> Take a look at the [official Python
> tutorial](https://docs.python.org/3/tutorial/controlflow.html#defining-functions)
> for more details on defining your own functions.
Now let's use that function to test some paths. Here we are using the
`op.join` function to construct paths - it is [covered
below](#cross-platform-compatbility):
%% Cell type:code id: tags:
```
dirname = op.join('raw_mri_data')
filename = op.join('raw_mri_data', 'subj_1', 't1.nii')
nonexist = op.join('very', 'unlikely', 'to', 'exist')
whatisit(dirname)
whatisit(filename)
whatisit(nonexist)
whatisit(nonexist, existonly=True)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="deconstructing-paths"></a>
### Deconstructing paths
If you are only interested in the directory or file component of a path then
the `os.path` module has the `dirname`, `basename`, and `split` functions.
%% Cell type:code id: tags:
```
path = '/path/to/my/image.nii'
print('Directory name: {}'.format(op.dirname( path)))
print('Base name: {}'.format(op.basename(path)))
print('Directory and base names: {}'.format(op.split( path)))
```
%% Cell type:markdown id: tags:
> Note here that `op.split` returns both the directory and base names - remember
> that it is super easy to define a Python function that returns multiple values,
> simply by having it return a tuple. For example, the implementation of
> `op.split` might look something like this:
>
>
> ```
> def mysplit(path):
>     dirname = op.dirname(path)
>     basename = op.basename(path)
>
>     # It is not necessary to use round brackets here
>     # to denote the tuple - the return values will
>     # be implicitly grouped into a tuple for us.
>     return dirname, basename
> ```
>
>
> When calling a function which returns multiple values, you can _unpack_ those
> values in a single statement like so:
>
>
> ```
> dirname, basename = mysplit(path)
>
> print('Directory name: {}'.format(dirname))
> print('Base name: {}'.format(basename))
> ```
If you want to extract the prefix or suffix of a file, you can use `splitext`:
%% Cell type:code id: tags:
```
prefix, suffix = op.splitext('image.nii')
print('Prefix: {}'.format(prefix))
print('Suffix: {}'.format(suffix))
```
%% Cell type:markdown id: tags:
> Double-barrelled file suffixes (e.g. `.nii.gz`) are the work of the devil.
> Correct handling of them is an open problem in Computer Science, and is
> considered by many to be unsolvable. For `imglob`, `imcp`, and `immv`-like
> functionality, check out the `fsl.utils.path` and `fsl.utils.imcp` modules,
> part of the [`fslpy`
> project](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/). If you
> are using `fslpython`, then you already have access to all of the functions
> in `fslpy`.
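For simple cases, a small helper that checks a list of known double-barrelled suffixes before falling back to `op.splitext` may be enough. This is a hypothetical sketch, not the `fslpy` implementation; the helper name `split_ext` and the `DOUBLE_SUFFIXES` list are assumptions you would adapt to your own data.

```python
import os.path as op

# hypothetical list of multi-part suffixes to treat as a single extension
DOUBLE_SUFFIXES = ['.nii.gz', '.tar.gz']

def split_ext(path):
    """Like op.splitext, but keeps known double suffixes together."""
    for suffix in DOUBLE_SUFFIXES:
        if path.endswith(suffix):
            return path[:-len(suffix)], suffix
    return op.splitext(path)

print(split_ext('image.nii.gz'))    # ('image', '.nii.gz')
print(op.splitext('image.nii.gz'))  # ('image.nii', '.gz')
print(split_ext('image.nii'))       # ('image', '.nii')
```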
<a class="anchor" id="absolute-and-relative-paths"></a>
### Absolute and relative paths
The `os.path` module has three useful functions for converting between
absolute and relative paths. The `op.abspath` and `op.relpath` functions will
respectively turn the provided path into an equivalent absolute or relative
path.
%% Cell type:code id: tags:
```
path = op.abspath('relative/path/to/some/file.txt')
print('Absolutised: {}'.format(path))
print('Relativised: {}'.format(op.relpath(path)))
```
%% Cell type:markdown id: tags:
By default, the `op.abspath` and `op.relpath` functions work relative to the
current working directory. The `op.relpath` function allows you to specify a
different directory to work from, and another function, `op.normpath`,
combined with `op.join`, allows you to create absolute paths with a different
starting point. `op.normpath` will take care of removing duplicate path
separators, and resolving references to `"."` and `".."`:
%% Cell type:code id: tags:
```
path = 'relative/path/to/some/file.txt'
root = '/vols/Data/'
abspath = op.normpath(op.join(root, path))
print('Absolute path: {}'.format(abspath))
print('Relative path: {}'.format(op.relpath(abspath, root)))
```
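To see exactly what `op.normpath` cleans up, here it is applied to a deliberately messy path. The expected output assumes a POSIX system such as Linux or macOS; on Windows the separators would be backslashes. Note that `normpath` works purely on the string: none of these directories need to exist, and `..` is collapsed without following symbolic links.

```python
import os.path as op

messy = 'data//subj_1/./anat/../func/run1.nii'
print(op.normpath(messy))  # data/subj_1/func/run1.nii
```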
%% Cell type:markdown id: tags:
<a class="anchor" id="wildcard-matching-aka-globbing"></a>
### Wildcard matching a.k.a. globbing
The `glob` module has a function, also called `glob`, which allows you to find
files, based on unix-style wildcard pattern matching.
%% Cell type:code id: tags:
```
from glob import glob
root = 'raw_mri_data'
# find all niftis for subject 1
images = glob(op.join(root, 'subj_1', '*.nii*'))
print('Subject #1 images:')
print('\n'.join([' {}'.format(i) for i in images]))
# find all subject directories
subjdirs = glob(op.join(root, 'subj_*'))
print('Subject directories:')
print('\n'.join([' {}'.format(d) for d in subjdirs]))
```
%% Cell type:markdown id: tags:
As with [`os.walk`](#walking-a-directory-tree), the order of the results
returned by `glob` is arbitrary. Unfortunately the undergraduate who
acquired this specific data set did not think to use zero-padded subject IDs
(you'll be pleased to know that this student was immediately kicked out of his
college and banned from ever returning), so we can't simply sort the paths
alphabetically. Instead, let's use some trickery to sort the subject
directories numerically by ID:
Let's define a function which, given a subject directory, returns the numeric
subject ID:
%% Cell type:code id: tags:
```
def get_subject_id(subjdir):
    # Remove any leading directories (e.g. "raw_mri_data/")
    subjdir = op.basename(subjdir)
    # Split "subj_[id]" into two words
    prefix, sid = subjdir.split('_')
    # return the subject ID as an integer
    return int(sid)
```
%% Cell type:markdown id: tags:
This function works like so:
%% Cell type:code id: tags:
```
print(get_subject_id('raw_mri_data/subj_9'))
```
%% Cell type:markdown id: tags:
Now that we have this function, we can sort the directories in one line of
code, via the built-in
[`sorted`](https://docs.python.org/3/library/functions.html#sorted)
function. The directories will be sorted according to the `key` function that
we specify, which provides a mapping from each directory to a sortable
"key":
%% Cell type:code id: tags:
```
subjdirs = sorted(subjdirs, key=get_subject_id)
print('Subject directories, sorted by ID:')
print('\n'.join([' {}'.format(d) for d in subjdirs]))
```
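%% Cell type:markdown id: tags:
To see why the `key` argument matters, here is a self-contained sketch that does not need the data set on disk. The directory names are made up, and `get_subject_id` is re-written more compactly, but with the same logic as the function above.
%% Cell type:code id: tags:
```python
import os.path as op

def get_subject_id(subjdir):
    # Same logic as above: "raw_mri_data/subj_10" -> 10
    prefix, sid = op.basename(subjdir).split('_')
    return int(sid)

# Made-up directory names - sorting only looks at the strings
dirs = ['raw_mri_data/subj_10', 'raw_mri_data/subj_2', 'raw_mri_data/subj_1']
alphabetical = sorted(dirs)
numerical = sorted(dirs, key=get_subject_id)
print(alphabetical)   # subj_1, subj_10, subj_2
print(numerical)      # subj_1, subj_2, subj_10
```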
%% Cell type:markdown id: tags:
> Note that in Python, we can pass a function around just like any other
> variable - we passed the `get_subject_id` function as an argument to the
> `sorted` function. This is possible (and normal) because functions are
> [first class citizens](https://en.wikipedia.org/wiki/First-class_citizen) in
> Python!
As of Python 3.5, `glob` also supports recursive pattern matching via the
`recursive` flag. Let's say we want a list of all resting-state scans in our
data set:
%% Cell type:code id: tags:
```
rscans = glob('raw_mri_data/**/rest.nii.gz', recursive=True)
print('Resting state scans:')
print('\n'.join(rscans))
```
%% Cell type:markdown id: tags:
Internally, the `glob` module uses the `fnmatch` module, which implements the
pattern matching logic.
* If you are searching your file system for files and directories, use
`glob.glob`.
* But if you already have a set of paths, you can use the `fnmatch.fnmatch`
and `fnmatch.filter` functions to identify which paths match your pattern.
Note that the syntax used by `glob` and `fnmatch` is similar, but __not__
identical to the syntax that you are used to from `bash`. Refer to the
[`fnmatch` module](https://docs.python.org/3/library/fnmatch.html)
documentation for details. If you need more complicated pattern matching, you
can use regular expressions, available via the [`re`
module](https://docs.python.org/3/library/re.html).
For example, let's retrieve all images that are in our data set:
%% Cell type:code id: tags:
```
allimages = glob(op.join('raw_mri_data', '**', '*.nii*'), recursive=True)
print('All images in experiment:')
# Let's just print the first and last few
print('\n'.join([' {}'.format(i) for i in allimages[:3]]))
print(' .')
print(' .')
print(' .')
print('\n'.join([' {}'.format(i) for i in allimages[-3:]]))
```
%% Cell type:markdown id: tags:
Now let's reduce that list to only those images which are uncompressed:
%% Cell type:code id: tags:
```
import fnmatch
# filter a list of images
uncompressed = fnmatch.filter(allimages, '*.nii')
print('All uncompressed images:')
print('\n'.join([' {}'.format(i) for i in uncompressed]))
```
%% Cell type:markdown id: tags:
And further reduce the list by identifying which of these images are T1 scans,
and are from our fictional patient group, made up of subjects 1, 4, 7, 8, and
9:
%% Cell type:code id: tags:
```
patients = [1, 4, 7, 8, 9]
print('All uncompressed T1 images from patient group:')
for fullfile in uncompressed:
    dirname, filename = op.split(fullfile)
    subjid = get_subject_id(dirname)
    if subjid in patients and fnmatch.fnmatch(filename, 't1.*'):
        print(' {}'.format(fullfile))
```
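%% Cell type:markdown id: tags:
The differences between `fnmatch` and `bash` patterns mentioned earlier can catch you out. This sketch, using made-up paths, shows two of them: `*` also matches the `/` separator, and `bash`-style brace expansion is not supported. It also shows a regular expression from the `re` module, which can additionally capture part of the match (here, the subject ID).
%% Cell type:code id: tags:
```python
import fnmatch
import re

paths = ['raw_mri_data/subj_1/t1.nii',
         'raw_mri_data/subj_10/t1.nii',
         'raw_mri_data/subj_1/rest.nii.gz']

# Unlike bash, '*' also matches '/', so this reaches into sub-directories
star_matches = fnmatch.filter(paths, 'raw_mri_data/*.nii')
print(star_matches)

# Braces are matched literally - there is no {a,b} expansion
brace_matches = fnmatch.filter(paths, '*.{nii,gz}')
print(brace_matches)   # []

# A regular expression can capture the subject ID of each T1 image
pattern = re.compile(r'subj_(\d+)/t1\.nii$')
ids = [int(m.group(1)) for m in map(pattern.search, paths) if m]
print(ids)             # [1, 10]
```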
%% Cell type:markdown id: tags:
<a class="anchor" id="expanding-the-home-directory-and-environment-variables"></a>
### Expanding the home directory and environment variables
You have [already been
introduced](#querying-and-changing-the-current-directory) to the
`op.expanduser` function. Another handy function is the `op.expandvars`
function, which will expand any environment variables in a path:
%% Cell type:code id: tags:
```
print(op.expanduser('~'))
print(op.expandvars('$HOME'))
```
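%% Cell type:markdown id: tags:
A slightly fuller sketch of `op.expandvars`. The variable name `STUDYDIR` and its value are made up; setting it via `os.environ` only affects the running Python process (and any programs it launches). Both the `$NAME` and `${NAME}` forms are expanded, and variables that are not defined are left untouched rather than raising an error.
%% Cell type:code id: tags:
```python
import os
import os.path as op

# Hypothetical variable, set for this Python process only
os.environ['STUDYDIR'] = '/vols/Data/study'

expanded = op.expandvars('$STUDYDIR/raw_mri_data')
braced = op.expandvars('${STUDYDIR}/raw_mri_data')
# An undefined variable is left as-is
untouched = op.expandvars('$FSLTUTORIAL_UNDEFINED_VAR/data')

print(expanded)    # /vols/Data/study/raw_mri_data
print(braced)      # /vols/Data/study/raw_mri_data
print(untouched)   # $FSLTUTORIAL_UNDEFINED_VAR/data
```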
%% Cell type:markdown id: tags:
<a class="anchor" id="cross-platform-compatbility"></a>
### Cross-platform compatibility
If you are the type of person who likes running code on both Windows and Unix
machines, you will be delighted to learn that the `os` module has a couple
of useful attributes:
- `os.sep` contains the separator character that is used in file paths on
your platform (i.e. `/` on Unix, `\` on Windows).
- `os.pathsep` contains the separator character that is used when creating
path lists (e.g. on your `$PATH` environment variable - `:` on Unix,
and `;` on Windows).
You will also find the `op.join` function handy. Given a set of directory
and/or file names, it will construct a path suited to the platform that you
are running on:
%% Cell type:code id: tags:
```
path = op.join('home', 'fsluser', '.bash_profile')
# If you are on Unix, you will get home/fsluser/.bash_profile
# On Windows, you will get home\fsluser\.bash_profile
print(path)
# To create an absolute path from
# the file system root, use op.sep
# (which is the same as os.sep):
print(op.join(op.sep, 'home', 'fsluser', '.bash_profile'))
```
%% Cell type:markdown id: tags:
> The `Path` object in the built-in [`pathlib`](https://docs.python.org/3/library/pathlib.html) module also has
> excellent cross-platform support. If you write your file management code using this class, you are far less likely
> to get errors on Windows.
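
A short sketch of how some of the `os.path` operations above look with `pathlib`. It uses `PurePosixPath`, which only manipulates path strings and never touches the disk, so that the output is identical on every platform; in your own code you would normally use `Path`, which picks the right flavour for your operating system. The file name is made up.
%% Cell type:code id: tags:
```python
from pathlib import PurePosixPath

# The "/" operator joins path components, like op.join
p = PurePosixPath('raw_mri_data') / 'subj_1' / 'rest.nii.gz'

print(p)            # raw_mri_data/subj_1/rest.nii.gz
print(p.parent)     # raw_mri_data/subj_1   (like op.dirname)
print(p.name)       # rest.nii.gz           (like op.basename)
print(p.suffix)     # .gz                   (last suffix only, like op.splitext)
print(p.suffixes)   # ['.nii', '.gz']       (all suffixes)
print(p.stem)       # rest.nii
```
%% Cell type:markdown id: tags: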
<a class="anchor" id="filetree"></a>
## FileTree
`fslpy` also contains support for `FileTree` objects (docs are
[here](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.utils.filetree.html)).
These `FileTree` objects provide a simple format to define a whole directory structure with multiple subjects, sessions,
scans, etc. In the `FileTree` format the dataset we have been looking at so far would be described by:
%% Cell type:code id: tags:
```
tree_text = """
raw_mri_data
    subj_{subject}
        rest.nii.gz
        t1.nii
        t2.nii
        task.nii.gz
"""
```
%% Cell type:markdown id: tags:
FileTrees are discussed in more detail in the advanced `fslpy` practical.
<a class="anchor" id="exercises"></a>
## Exercises
<a class="anchor" id="re-name-subject-directories"></a>
### Re-name subject directories
Write a function which can rename the subject directories in `raw_mri_data` so
that the subject IDs are padded with zeros, and can therefore be sorted
alphabetically. This function:
- Should accept the path to the parent directory of the data set
(`raw_mri_data` in this case).
- Should be able to handle any number of subjects
> Hint: `numpy.log10`
- May assume that the subject directory names follow the pattern
`subj_[id]`, where `[id]` is the integer subject ID.
<a class="anchor" id="re-organise-a-data-set"></a>
### Re-organise a data set
Write a function which can be used to separate the data for each group
(patients: 1, 4, 7, 8, 9, and controls: 2, 3, 5, 6, 10) into sub-directories
`CON` and `PAT`.
This function should work with any number of groups, and should accept three
parameters:
- The root directory of the data set (e.g. `raw_mri_data`).
- A list of strings, the labels for each group.
- A list of lists, with each list containing the subject IDs for one group.
__Extra exercises:__ If you are looking for something more to do, you can find
some more exercises in the file `03_file_management_extra.md`.
<a class="anchor" id="solutions"></a>
### Solutions
Use the `print_solution` function, defined below, to print the solution for a
specific exercise.
%% Cell type:code id: tags:
```
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter
import IPython
# Pass the title of the exercise you
# are interested in to this function
def print_solution(extitle):
    solfile = ''.join([c.lower() if c.isalnum() else '_' for c in extitle])
    solfile = op.join('.solutions', '{}.py'.format(solfile))
    if not op.exists(solfile):
        print('Can\'t find solution to exercise "{}"'.format(extitle))
        return
    with open(solfile, 'rt') as f:
        code = f.read()
    formatter = HtmlFormatter()
    return IPython.display.HTML('<style type="text/css">{}</style>{}'.format(
        formatter.get_style_defs('.highlight'),
        highlight(code, PythonLexer(), formatter)))
```
%% Cell type:markdown id: tags:
<a class="anchor" id="appendix-exceptions"></a>
## Appendix: Exceptions
At some point in your life, a piece of code that you write is inevitably going
to fail, and you are going to have to deal with it. This is particularly
relevant to file management tasks - many of the functions that have been
introduced in this practical can fail for all kinds of reasons, such as
incorrect permissions or ownership, lack of disk space, or a network file
system going down.
Any statement in Python can potentially result in an error. When a line of
code triggers an error, we say that it _raises_ the error (a.k.a. _throws_ in
other languages). When an error occurs, an `Exception` object is raised,
causing execution to stop at the line that caused the error:
%% Cell type:code id: tags:
```
a = [1, 2, 3]
a.remove(4)
```
%% Cell type:markdown id: tags:
The word `Exception` is used instead of `Error` because not all exceptions are
errors. For example, when you press CTRL+C while a Python program is running, a
`KeyboardInterrupt` exception will be raised.
> There are many different types of exceptions in Python - a list of all the
> built-in ones can be found
> [here](https://docs.python.org/3/library/exceptions.html). It is also easy
> to define your own exceptions by creating a sub-class of `Exception` (beyond
> the scope of this practical).
Fortunately Python gives us the capability to _catch_ exceptions when they are
raised, using the `try` and `except` keywords. As an example, let's say that
the user asked our program to create a directory somewhere on the file system.
A real program would need to handle situations in which that directory cannot
be created - we might do it like this in Python:
%% Cell type:code id: tags:
```
import os
dirpath = '/sbin/foo'
try:
    os.mkdir(dirpath)
except OSError as e:
    print('Could not create {}! Reason: {}'.format(dirpath, e))
```
%% Cell type:markdown id: tags:
In this example, we have put the `os.mkdir` call inside a `try:` block. Now,
if it raises an `Exception` of type `OSError`, that `OSError` will be _caught_
and passed to the `except:` block. A `try` block must always be followed by an
`except` block (and/or a `finally` block - keep reading).
The `except OSError as e:` line means: _if any code in the `try` block raises
an `Exception` of type `OSError`, then catch it, assign it to a variable
called `e`, and pass it to the code inside the `except` block._
### Catching different types of exceptions
It is common for a piece of code to have the potential to raise different
types of exceptions. Python allows you to have multiple `except` blocks
associated with a single `try` block, so you can handle different types of
exceptions in different ways. For example, you might want to print a useful
error message so the user knows what has gone wrong:
%% Cell type:code id: tags:
```
numerator = '123'
denominator = 0
try:
    numerator = float(numerator)
    print(numerator / denominator)
except TypeError as e:
    print('Numerator and/or denominator are of the wrong type!')
    print(' ', e)
except ValueError as e:
    print('Numerator is not a float!')
    print(' ', e)
# Note that specifying a variable to refer
# to the Exception object is optional.
except ZeroDivisionError:
    print('Denominator is zero!')
```
%% Cell type:markdown id: tags:
Experiment with the above code block - try out different values for the
`numerator` and `denominator`, and see what happens.
You can also specify different types of exceptions in a single `except`
statement:
%% Cell type:code id: tags:
```
numerator = '123'
denominator = 0
try:
    numerator = float(numerator)
    print(numerator / denominator)
except (TypeError, ZeroDivisionError) as e:
    print('Numerator and/or denominator are of the '
          'wrong type, or the denominator is zero!')
    print(' ', e)
```
%% Cell type:markdown id: tags:
### The catch-all approach
Instead of specifying all of the different types of exceptions that could
occur, it is possible to simply use a single `except` block to catch all
exceptions of type `Exception`:
%% Cell type:code id: tags:
```
numerator = 'abc'
denominator = 1
try:
    numerator = float(numerator)
    print(numerator / denominator)
except Exception as e:
    print('Something is wrong with numerator or denominator!')
    print(' ', e)
```
%% Cell type:markdown id: tags:
It is generally better practice to be as specific as possible when you are
catching exceptions, but sometimes all you care about is whether your code
worked or didn't, and in this case you can simply use this catch-all
approach.
__Warning:__ Even though it is possible to, you should __never__ write a
`try`-`except` block like this:
%% Cell type:code id: tags:
```
try:
    # do stuff
    pass
except:
    # handle exceptions
    pass
```
%% Cell type:markdown id: tags:
You don't actually have to specify any exception type in an `except`
statement. But you should never do this! As we have already mentioned, not
all exceptions are errors. The above code will catch _all_ exceptions, even
those which do not inherit from the standard `Exception` class. This includes
important exceptions such as `KeyboardInterrupt` and `SystemExit`, which
control important aspects of your program's behaviour.
So you should always, at the very least, specify the `Exception` type:
%% Cell type:code id: tags:
```
try:
    # do stuff
    pass
except Exception:
    # handle exceptions
    pass
```
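%% Cell type:markdown id: tags:
The advice above follows from Python's exception class hierarchy: `KeyboardInterrupt` and `SystemExit` are sub-classes of `BaseException`, but not of `Exception`. You can check this yourself with the built-in `issubclass` function:
%% Cell type:code id: tags:
```python
# Ordinary errors are sub-classes of Exception ...
print(issubclass(ZeroDivisionError, Exception))      # True
print(issubclass(OSError, Exception))                # True
# ... but these two are not, so "except Exception" lets them through
print(issubclass(KeyboardInterrupt, Exception))      # False
print(issubclass(SystemExit, Exception))             # False
# Both derive from BaseException, which a bare "except:" would catch
print(issubclass(KeyboardInterrupt, BaseException))  # True
print(issubclass(SystemExit, BaseException))         # True
```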
%% Cell type:markdown id: tags:
### The `finally` keyword
Sometimes, when you are performing a task, you might have some clean-up logic
that must be executed regardless of whether the task succeeded or failed. The
canonical example here is that if you open a file, you must make sure to
close it when you are finished, otherwise its contents may be corrupted.
%% Cell type:code id: tags:
```
f = open('raw_mri_data/subj_1/t1.nii', 'rb')
try:
    f.write('ho hum')
except IOError as e:
    print('Error occurred!: ', e)
finally:
    print('Closing file')
    f.close()
```
%% Cell type:markdown id: tags:
It is possible to use `try` and `finally` without an `except` block. This is
useful if you have some code that needs some clean-up logic, but you don't
actually want to catch the exception - sometimes it is better for a program
to crash, rather than for errors to be silently suppressed, because it can
be easier to figure out what went wrong:
%% Cell type:code id: tags:
```
f = open('raw_mri_data/subj_1/t1.nii', 'rb')
try:
    f.write('ho hum')
finally:
    print('Closing file')
    f.close()
```
%% Cell type:markdown id: tags:
> The above was just an example - it is generally better practice to use the
> `with` statement when opening files.
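
A minimal sketch of the `with` approach. To avoid touching our data set, it writes a made-up file, `notes.txt`, into a temporary directory created with the standard `tempfile` module; the directory and its contents are deleted when the outer `with` block exits.
%% Cell type:code id: tags:
```python
import os.path as op
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = op.join(tmpdir, 'notes.txt')
    # The file is closed automatically at the end of the with
    # block - even if an exception is raised inside it
    with open(path, 'wt') as f:
        f.write('ho hum')
    print('Closed?', f.closed)   # Closed? True
    with open(path, 'rt') as f:
        contents = f.read()

print(contents)   # ho hum
```
%% Cell type:markdown id: tags: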
You can read more about handling exceptions in Python
[here](https://docs.python.org/3/tutorial/errors.html).
### Raising exceptions
It is possible to generate your own exception at any point by using the
`raise` keyword, and passing it an `Exception` object:
%% Cell type:code id: tags:
```
raise Exception('Kaboom!')
```
%% Cell type:markdown id: tags:
This can be useful if your code detects that something has gone wrong, and
needs to abort.
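%% Cell type:code id: tags:
For example, here is a sketch of a stricter variant of the `get_subject_id` function from earlier, which raises a `ValueError` with a helpful message when it is given something that is not a subject directory. The name `parse_subject_id` is hypothetical, made up for this illustration.
```python
import os.path as op

def parse_subject_id(subjdir):
    """Hypothetical stricter variant of get_subject_id."""
    name = op.basename(subjdir)
    if not name.startswith('subj_') or not name[5:].isdigit():
        # Abort, with a specific exception type and a useful message
        raise ValueError('Not a subject directory: {}'.format(subjdir))
    return int(name[5:])

print(parse_subject_id('raw_mri_data/subj_7'))   # 7
try:
    parse_subject_id('raw_mri_data/README')
except ValueError as e:
    message = str(e)
    print('Caught:', message)
```
%% Cell type:markdown id: tags: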
You can also raise an existing `Exception` from within an `except` block:
%% Cell type:code id: tags:
```
try:
    print(1 / 0)
except Exception:
    print('Some error occurred!')
    raise
```
%% Cell type:markdown id: tags:
This can be useful if you want to print a message when an exception occurs,
but also allow the exception to be propagated upwards.
@@ -49,7 +49,7 @@ other sections as a reference. You might miss out on some neat tricks though.
* [Wildcard matching a.k.a. globbing](#wildcard-matching-aka-globbing)
* [Expanding the home directory and environment variables](#expanding-the-home-directory-and-environment-variables)
* [Cross-platform compatibility](#cross-platform-compatbility)
* [FileTrees](#filetree)
* [FileTree](#filetree)
* [Exercises](#exercises)
* [Re-name subject directories](#re-name-subject-directories)
* [Re-organise a data set](#re-organise-a-data-set)
@@ -97,7 +97,7 @@ cwd = os.getcwd()
print(f'Current directory: {cwd}')
os.chdir(op.expanduser('~'))
print(f'Changed to: {os.get_cwd()}')
print(f'Changed to: {os.getcwd()}')
os.chdir(cwd)
print(f'Changed back to: {cwd}')
%% Cell type:markdown id: tags:
# Numpy
This section introduces you to [`numpy`](http://www.numpy.org/), Python's
numerical computing library.
Numpy is not actually part of the standard Python library. But it is a
fundamental part of the Python ecosystem - it forms the basis for many
important Python libraries, and it (along with its partners
[`scipy`](https://www.scipy.org/), [`matplotlib`](https://matplotlib.org/) and
[`pandas`](https://pandas.pydata.org/)) is what makes Python an attractive
alternative to Matlab as a scientific computing platform.
## Contents
* [The Python list versus the Numpy array](#the-python-list-versus-the-numpy-array)
* [Numpy basics](#numpy-basics)
* [Creating arrays](#creating-arrays)
* [Loading text files](#loading-text-files)
* [Array properties](#array-properties)
* [Descriptive statistics](#descriptive-statistics)
* [Reshaping and rearranging arrays](#reshaping-and-rearranging-arrays)
* [Operating on arrays](#operating-on-arrays)
* [Scalar operations](#scalar-operations)
* [Multi-variate operations](#multi-variate-operations)
* [Matrix multiplication](#matrix-multiplication)
* [Broadcasting](#broadcasting)
* [Linear algebra](#linear-algebra)
* [Array indexing](#array-indexing)
* [Indexing multi-dimensional arrays](#indexing-multi-dimensional-arrays)
* [Boolean indexing](#boolean-indexing)
* [Coordinate array indexing](#coordinate-array-indexing)
* [Exercises](#exercises)
* [Load an array from a file and do stuff with it](#load-an-array-from-a-file-and-do-stuff-with-it)
* [Concatenate affine transforms](#concatenate-affine-transforms)
* [Appendix A: Generating random numbers](#appendix-generating-random-numbers)
* [Appendix B: Importing Numpy](#appendix-importing-numpy)
* [Appendix C: Vectors in Numpy](#appendix-vectors-in-numpy)
* [Appendix D: The Numpy `matrix`](#appendix-the-numpy-matrix)
* [Useful references](#useful-references)
<a class="anchor" id="the-python-list-versus-the-numpy-array"></a>
## The Python list versus the Numpy array
Numpy adds a new data type to the Python language - the `array` (more
specifically, the `ndarray`). A Numpy `array` is an N-dimensional array of
homogeneously-typed numerical data.
You have already been introduced to the Python `list`, which you can easily
use to store a handful of numbers (or anything else):
%% Cell type:code id: tags:
```
data = [10, 8, 12, 14, 7, 6, 11]
```
%% Cell type:markdown id: tags:
You could also emulate a 2D or ND matrix by using lists of lists, for example:
%% Cell type:code id: tags:
```
xyz_coords = [[-11.4, 1.0, 22.6],
[ 22.7, -32.8, 19.1],
[ 62.8, -18.2, -34.5]]
```
%% Cell type:markdown id: tags:
For simple tasks, you could stick with processing your data using Python
lists, and the built-in
[`math`](https://docs.python.org/3/library/math.html) library. And this
might be tempting, because it does look quite a lot like what you might type
into Matlab.
But __BEWARE!__ A Python list is a terrible data structure for scientific
computing!
This is a major source of confusion for people who are learning Python, and
are trying to write efficient code. It is _crucial_ to be able to distinguish
between a Python list and a Numpy array.
___Python list == Matlab cell array:___ A list in Python is akin to a cell
array in Matlab - they can store anything, but are extremely inefficient, and
unwieldy when you have more than a couple of dimensions.
___Numpy array == Matlab matrix:___ These are in contrast to the Numpy array
and Matlab matrix, which are both thin wrappers around a contiguous chunk of
memory, and which provide blazing-fast performance (because behind the scenes
in both Numpy and Matlab, it's C, C++ and FORTRAN all the way down).
So you should strongly consider turning those lists into Numpy arrays:
%% Cell type:code id: tags:
```
import numpy as np
data = np.array([10, 8, 12, 14, 7, 6, 11])
xyz_coords = np.array([[-11.4, 1.0, 22.6],
[ 22.7, -32.8, 19.1],
[ 62.8, -18.2, -34.5]])
```
%% Cell type:markdown id: tags:
If you look carefully at the code above, you will notice that we are still
actually using Python lists. We have declared our data sets in exactly the
same way as we did earlier, by denoting them with square brackets `[` and `]`.
The key difference here is that these lists immediately get converted into
Numpy arrays, by passing them to the `np.array` function. To clarify this
point, we could rewrite this code in the following equivalent manner:
%% Cell type:code id: tags:
```
import numpy as np
# Define our data sets as python lists
data = [10, 8, 12, 14, 7, 6, 11]
xyz_coords = [[-11.4, 1.0, 22.6],
[ 22.7, -32.8, 19.1],
[ 62.8, -18.2, -34.5]]
# Convert them to numpy arrays
data = np.array(data)
xyz_coords = np.array(xyz_coords)
```
%% Cell type:markdown id: tags:
Of course, in practice, we would never create a Numpy array in this way - we
will be loading our data from text or binary files directly into a Numpy
array, thus completely bypassing the use of Python lists and the costly
list-to-array conversion. I'm emphasising this to help you understand the
difference between Python lists and Numpy arrays. Apologies if you've already
got it, [forgiveness
please](https://www.youtube.com/watch?v=ZeHflFNR4kQ&feature=youtu.be&t=128).
<a class="anchor" id="numpy-basics"></a>
## Numpy basics
Let's get started.
%% Cell type:code id: tags:
```
import numpy as np
```
%% Cell type:markdown id: tags:
<a class="anchor" id="creating-arrays"></a>
### Creating arrays
Numpy has quite a few functions which behave similarly to their equivalents in
Matlab:
%% Cell type:code id: tags:
```
print('np.zeros gives us zeros: ', np.zeros(5))
print('np.ones gives us ones: ', np.ones(5))
print('np.arange gives us a range: ', np.arange(5))
print('np.linspace gives us N linearly spaced numbers:', np.linspace(0, 1, 5))
print('np.random.random gives us random numbers [0-1]:', np.random.random(5))
print('np.random.randint gives us random integers: ', np.random.randint(1, 10, 5))
print('np.eye gives us an identity matrix:')
print(np.eye(4))
print('np.diag gives us a diagonal matrix:')
print(np.diag([1, 2, 3, 4]))
```
%% Cell type:markdown id: tags:
> See the [appendix](#appendix-generating-random-numbers) for more information
> on generating random numbers in Numpy.
The `zeros` and `ones` functions can also be used to generate N-dimensional
arrays:
%% Cell type:code id: tags:
```
z = np.zeros((3, 4))
o = np.ones((2, 10))
print(z)
print(o)
```
%% Cell type:markdown id: tags:
> Note that, in a 2D Numpy array, the first axis corresponds to rows, and the
> second to columns - just like in Matlab.
<a class="anchor" id="loading-text-files"></a>
### Loading text files
The `numpy.loadtxt` function is capable of loading numerical data from
plain-text files. By default it expects space-separated data:
%% Cell type:code id: tags:
```
data = np.loadtxt('04_numpy/space_separated.txt')
print('data in 04_numpy/space_separated.txt:')
print(data)
```
%% Cell type:markdown id: tags:
But you can also specify the delimiter to expect<sup>1</sup>:
%% Cell type:code id: tags:
```
data = np.loadtxt('04_numpy/comma_separated.txt', delimiter=',')
print('data in 04_numpy/comma_separated.txt:')
print(data)
```
%% Cell type:markdown id: tags:
> <sup>1</sup> And many other things such as file headers, footers, comments,
> and newline characters - see the
> [docs](https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html)
> for more information.
Of course you can also save data out to a text file just as easily, with
[`numpy.savetxt`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html):
%% Cell type:code id: tags:
```
data = np.random.randint(1, 10, (10, 10))
np.savetxt('mydata.txt', data, delimiter=',', fmt='%i')
with open('mydata.txt', 'rt') as f:
    for line in f:
        print(line.strip())
```
%% Cell type:markdown id: tags:
> The `fmt` argument to the `numpy.savetxt` function uses a specification
> language similar to that used in the C `printf` function - in the example
> above, `'%i'` indicates that the values of the array should be output as
> signed integers. See the [`numpy.savetxt`
> documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html)
> for more details on specifying the output format.
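
As a sketch of a round trip, the example below saves an array with two decimal places (`fmt='%.2f'`) and a `header` line, then loads it back with `numpy.loadtxt`. It writes to an in-memory `io.StringIO` text buffer instead of a file, so nothing is created on disk. The header is written with a leading `#`, which `loadtxt` treats as a comment by default.
%% Cell type:code id: tags:
```python
import io
import numpy as np

data = np.arange(6).reshape((2, 3)) / 4

# Write to an in-memory text buffer instead of a file on disk
buf = io.StringIO()
np.savetxt(buf, data, delimiter=',', fmt='%.2f', header='x,y,z')
text = buf.getvalue()
print(text)

# Rewind the buffer and read it back in - the '#' header line is skipped
buf.seek(0)
loaded = np.loadtxt(buf, delimiter=',')
print(np.allclose(loaded, data))   # True
```
%% Cell type:markdown id: tags: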
<a class="anchor" id="array-properties"></a>
### Array properties
Numpy is a bit different than Matlab in the way that you interact with
arrays. In Matlab, you would typically pass an array to a built-in function,
e.g. `size(M)`, `ndims(M)`, etc. In contrast, a Numpy array is a Python
object which has _attributes_ that contain basic information about the array:
%% Cell type:code id: tags:
```
z = np.zeros((2, 3, 4))
print(z)
print('Shape: ', z.shape)
print('Number of dimensions: ', z.ndim)
print('Number of elements: ', z.size)
print('Data type: ', z.dtype)
print('Number of bytes: ', z.nbytes)
print('Length of first dimension: ', len(z))
```
%% Cell type:markdown id: tags:
> As depicted above, passing a Numpy array to the built-in `len` function will
> only give you the length of the first dimension, so you will typically want
> to avoid using it - instead, use the `size` attribute if you want to know
> how many elements are in an array, or the `shape` attribute if you want to
> know the array shape.
<a class="anchor" id="descriptive-statistics"></a>
### Descriptive statistics
Similarly, a Numpy array has a set of methods<sup>2</sup> which allow you to
calculate basic descriptive statistics on an array:
%% Cell type:code id: tags:
```
a = np.random.random(10)
print('a: ', a)
print('min: ', a.min())
print('max: ', a.max())
print('index of min: ', a.argmin()) # remember that in Python, list indices
print('index of max: ', a.argmax()) # start from zero - Numpy is the same!
print('mean: ', a.mean())
print('variance: ', a.var())
print('stddev: ', a.std())
print('sum: ', a.sum())
print('prod: ', a.prod())
```
%% Cell type:markdown id: tags:
These methods can also be applied to arrays with multiple dimensions:
%% Cell type:code id: tags:
```
a = np.random.randint(1, 10, (3, 3))
print('a:')
print(a)
print('min: ', a.min())
print('row mins: ', a.min(axis=1))
print('col mins: ', a.min(axis=0))
print('Min index : ', a.argmin())
print('Row min indices: ', a.argmin(axis=1))
```
%% Cell type:markdown id: tags:
Note that, for a multi-dimensional array, the `argmin` and `argmax` methods
will, when called without an `axis` argument, return the (0-based) index of
the minimum/maximum values into a
[flattened](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.flatten.html)
view of the array.
> <sup>2</sup> Python, being an object-oriented language, distinguishes
> between _functions_ and _methods_. Hopefully we all know what a function
> is - a _method_ is simply the term used to refer to a function that is
> associated with a specific object. Similarly, the term _attribute_ is used
> to refer to some piece of information that is attached to an object, such as
> `z.shape`, or `z.dtype`.
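
If you need the row/column position rather than the flattened index returned by `argmin`, the `np.unravel_index` function converts one into the other, given the array shape. A small sketch with a hand-written array:
%% Cell type:code id: tags:
```python
import numpy as np

a = np.array([[4, 2, 7],
              [3, 9, 1],
              [8, 5, 6]])

flat_idx = a.argmin()          # index into the flattened array
print(flat_idx)                # 5
row, col = np.unravel_index(flat_idx, a.shape)
print(row, col)                # 1 2
print(a[row, col])             # 1
```
%% Cell type:markdown id: tags: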
<a class="anchor" id="reshaping-and-rearranging-arrays"></a>
### Reshaping and rearranging arrays
A numpy array can be reshaped very easily, using the `reshape` method.
%% Cell type:code id: tags:
```
a = np.arange(16).reshape(4, 4)
b = a.reshape((2, 8))
print('a:')
print(a)
print('b:')
print(b)
```
%% Cell type:markdown id: tags:
Note that this does not modify the underlying data in any way - the `reshape`
method returns a _view_ of the same array (whenever the memory layout allows
it), just indexed differently:
%% Cell type:code id: tags:
```
a[3, 3] = 12345
b[0, 7] = 54321
print('a:')
print(a)
print('b:')
print(b)
```
%% Cell type:markdown id: tags:
If you need to create a reshaped copy of an array, use the `np.array`
function:
%% Cell type:code id: tags:
```
a = np.arange(16).reshape((4, 4))
b = np.array(a.reshape(2, 8))
a[3, 3] = 12345
b[0, 7] = 54321
print('a:')
print(a)
print('b:')
print(b)
```
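%% Cell type:markdown id: tags:
The `copy` method is another way to get an independent array, and the `np.shares_memory` function lets you check whether two arrays refer to the same data. The last lines of this sketch show that `reshape` cannot always return a view: after a transpose the data are no longer laid out in the order that `reshape` needs, so Numpy has to make a copy.
%% Cell type:code id: tags:
```python
import numpy as np

a = np.arange(16).reshape((4, 4))
view = a.reshape((2, 8))
copy = a.reshape((2, 8)).copy()
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False

# The transpose is a view with a different memory order, so
# flattening it row by row forces reshape to copy the data
t = a.T.reshape(16)
print(np.shares_memory(a, t))      # False
```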
%% Cell type:markdown id: tags:
The `T` attribute is a shortcut to obtain the transpose of an array.
%% Cell type:code id: tags:
```
a = np.arange(16).reshape((4, 4))
print(a)
print(a.T)
```
%% Cell type:markdown id: tags:
The `transpose` method allows you to obtain more complicated rearrangements
of an N-dimensional array's axes:
%% Cell type:code id: tags:
```
a = np.arange(24).reshape((2, 3, 4))
b = a.transpose((2, 0, 1))
print('a: ', a.shape)
print(a)
print('b:', b.shape)
print(b)
```
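%% Cell type:markdown id: tags:
One way to read the argument to `transpose`: axis `i` of the result is axis `order[i]` of the original. With `(2, 0, 1)`, the new first axis is the old third axis, and so on, so element `b[k, i, j]` is element `a[i, j, k]`. A quick check:
%% Cell type:code id: tags:
```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))
b = a.transpose((2, 0, 1))
print(b.shape)                  # (4, 2, 3)

# Check the index mapping b[k, i, j] == a[i, j, k] for every element
same = all(b[k, i, j] == a[i, j, k]
           for i in range(2) for j in range(3) for k in range(4))
print(same)                     # True
```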
%% Cell type:markdown id: tags:
> Note again that the `T` attribute and `transpose` method return _views_ of
> your array.
Numpy has some useful functions which allow you to concatenate or stack
multiple arrays into one. The `concatenate` function does what it says on the
tin:
%% Cell type:code id: tags:
```
a = np.zeros(3)
b = np.ones(3)
print('1D concatenation:', np.concatenate((a, b)))
a = np.zeros((3, 3))
b = np.ones((3, 3))
print('2D column-wise concatenation:')
print(np.concatenate((a, b), axis=1))
print('2D row-wise concatenation:')
# The axis parameter defaults to 0,
# so it is not strictly necessary here.
print(np.concatenate((a, b), axis=0))
```
%% Cell type:markdown id: tags:
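The `hstack`, `vstack` and `dstack` functions allow you to concatenate vectors
or arrays horizontally (along the second dimension, or along the only
dimension of 1D vectors), vertically (along the first dimension), or
depth-wise (along the third dimension), respectively: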
%% Cell type:code id: tags:
```
a = np.zeros(3)
b = np.ones(3)
print('a: ', a)
print('b: ', b)
hstacked = np.hstack((a, b))
vstacked = np.vstack((a, b))
dstacked = np.dstack((a, b))
print('hstacked: (shape {}):'.format(hstacked.shape))
print( hstacked)
print('vstacked: (shape {}):'.format(vstacked.shape))
print( vstacked)
print('dstacked: (shape {}):'.format(dstacked.shape))
print( dstacked)
```
%% Cell type:markdown id: tags:
Alternatively, you can use the `stack` function, which joins the arrays along a _new_ dimension whose index is given
by the `axis` keyword (so, for 1D arrays `a` and `b`, `np.vstack((a, b))` is equivalent to `np.stack((a, b), axis=0)`).
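%% Cell type:code id: tags:
Comparing the shapes produced by each function is a quick way to check which axis they join along. A small sketch with 1D and 2D inputs:
```python
import numpy as np

# 1D vectors: hstack joins end-to-end, vstack/stack(axis=0) make rows
a = np.zeros(3)
b = np.ones(3)
print(np.hstack((a, b)).shape)          # (6,)
print(np.vstack((a, b)).shape)          # (2, 3)
print(np.stack((a, b), axis=0).shape)   # (2, 3)
print(np.stack((a, b), axis=1).shape)   # (3, 2)
print(np.dstack((a, b)).shape)          # (1, 3, 2)

# 2D arrays: vstack adds rows, hstack adds columns,
# and stack creates a new first axis
A = np.zeros((3, 3))
B = np.ones((3, 3))
print(np.vstack((A, B)).shape)          # (6, 3)
print(np.hstack((A, B)).shape)          # (3, 6)
print(np.stack((A, B)).shape)           # (2, 3, 3)
```
%% Cell type:markdown id: tags: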
<a class="anchor" id="operating-on-arrays"></a>
## Operating on arrays
If you are coming from Matlab, you should read this section as, while many
Numpy operations behave similarly to Matlab, there are a few important
behaviours which differ from what you might expect.
<a class="anchor" id="scalar-operations"></a>
### Scalar operations
All of the mathematical operators you know and love can be applied to Numpy
arrays:
%% Cell type:code id: tags:
```
a = np.arange(1, 10).reshape((3, 3))
print('a:')
print(a)
print('a + 2:')
print( a + 2)
print('a * 3:')
print( a * 3)
print('a % 2:')
print( a % 2)
```
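A few more operators are worth a quick look, because the type of the result depends on the operator. This sketch (relying only on standard Numpy behaviour) shows that `/` always produces floats, `//` keeps integers, and that an in-place operator cannot silently change the `dtype` of an array:

```python
import numpy as np

a = np.arange(1, 10).reshape((3, 3))

print(a / 2)     # true division always produces floats
print(a // 2)    # floor division keeps integers
print(a ** 2)    # element-wise power
print(a > 4)     # comparisons give a boolean array

# In-place operators keep the array's dtype, so adding a float to
# an integer array in-place raises an error rather than upcasting
try:
    a += 0.5
except TypeError as err:
    print('TypeError:', err)
```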
%% Cell type:markdown id: tags:
<a class="anchor" id="multi-variate-operations"></a>
### Multi-variate operations
Many operations in Numpy operate on an element-wise basis. For example:
%% Cell type:code id: tags:
```
a = np.ones(5)
b = np.random.randint(1, 10, 5)
print('a: ', a)
print('b: ', b)
print('a + b: ', a + b)
print('a * b: ', a * b)
```
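Element-wise behaviour is not limited to operators: most Numpy functions which act on numbers, such as `np.sqrt`, `np.maximum` and `np.minimum`, are also applied element by element. A small illustrative sketch:

```python
import numpy as np

a = np.arange(1, 6)
b = np.array([5, 4, 3, 2, 1])

print(np.sqrt(a))          # square root of each element
print(np.maximum(a, b))    # element-wise maximum: [5 4 3 4 5]
print(np.minimum(a, b))    # element-wise minimum: [1 2 3 2 1]
```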
%% Cell type:markdown id: tags:
This also extends to higher dimensional arrays:
%% Cell type:code id: tags:
```
a = np.ones((4, 4))
b = np.arange(16).reshape((4, 4))
print('a:')
print(a)
print('b:')
print(b)
print('a + b')
print(a + b)
print('a * b')
print(a * b)
```
%% Cell type:markdown id: tags:
Wait ... what's that you say? Oh, I couldn't understand because of all the
froth coming out of your mouth. I guess you're angry that `a * b` didn't give
you the matrix product, like it would have in Matlab. Well all I can say is
that Numpy is not Matlab. Matlab operations are typically consistent with
linear algebra notation. This is not the case in Numpy. Get over it.
[Get yourself a calmative](https://youtu.be/M_w_n-8w3IQ?t=32).
<a class="anchor" id="matrix-multiplication"></a>
### Matrix multiplication
When your heart rate has returned to its normal caffeine-induced state, you
can use the `@` operator or the `dot` method to perform matrix
multiplication:
%% Cell type:code id: tags:
```
a = np.arange(1, 5).reshape((2, 2))
b = a.T
print('a:')
print(a)
print('b:')
print(b)
print('a @ b')
print(a @ b)
print('a.dot(b)')
print(a.dot(b))
print('b.dot(a)')
print(b.dot(a))
```
%% Cell type:markdown id: tags:
> The `@` matrix multiplication operator is a relatively recent addition to
> Python and Numpy (Python 3.5 and Numpy 1.10), so you might not see it all
> that often in existing code. But it's here to stay, so if you don't need to
> worry about backwards-compatibility, go ahead and use it!
One potential source of confusion for those of you who are used to Matlab's
linear algebra-based take on things is that Numpy treats row and column
vectors differently - you should take a break now and skim over the [appendix
on vectors in Numpy](#appendix-vectors-in-numpy).
For matrix-by-vector multiplications, a 1-dimensional Numpy array may be
treated as _either_ a row vector _or_ a column vector, depending on where
it is in the expression:
%% Cell type:code id: tags:
```
a = np.arange(1, 5).reshape((2, 2))
b = np.random.randint(1, 10, 2)
print('a:')
print(a)
print('b:', b)
print('a @ b - b is a column vector:')
print(a @ b)
print('b @ a - b is a row vector:')
print(b @ a)
```
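Two related cases are worth knowing about. As a sketch: when _both_ operands are 1D, `@` gives the inner product (a scalar), so for the outer product you need `np.outer` (or explicit reshaping). The shapes must also still be compatible, otherwise Numpy raises a `ValueError`:

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])

# 1D @ 1D is the inner (dot) product, which is a scalar
print(v @ w)                 # 1*4 + 2*5 + 3*6 = 32

# The outer product needs np.outer (or explicit reshaping)
print(np.outer(v, w).shape)  # (3, 3)

# Shapes must still line up: (2, 2) @ (3,) is an error
a = np.arange(1, 5).reshape((2, 2))
try:
    a @ v
except ValueError as err:
    print('ValueError:', err)
```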
%% Cell type:markdown id: tags:
If you really can't stand using `@` to denote matrix multiplication, and just
want things to be like they were back in Matlab-land, you do have the option
of using a different Numpy data type - the `matrix` - which behaves a bit more
like what you might expect from Matlab. You can find a brief overview of the
`matrix` data type in [the appendix](#appendix-the-numpy-matrix).
<a class="anchor" id="broadcasting"></a>
### Broadcasting
One of the coolest features of Numpy is *broadcasting*<sup>3</sup>.
Broadcasting allows you to perform element-wise operations on arrays which
have different shapes. Along each axis, Numpy will implicitly expand an array
whose length along that axis is 1 to match the length of the other array. You
never need to use `repmat` ever again!
> <sup>3</sup>Mathworks have shamelessly stolen Numpy's broadcasting behaviour
> and included it in Matlab versions from 2016b onwards, referring to it as
> _implicit expansion_.
Broadcasting allows you to, for example, add the elements of a 1D vector to
all of the rows or columns of a 2D array:
%% Cell type:code id: tags:
```
a = np.arange(9).reshape((3, 3))
b = np.arange(1, 4)
print('a:')
print(a)
print('b: ', b)
print('a * b (row-wise broadcasting):')
print(a * b)
print('a * b.reshape(-1, 1) (column-wise broadcasting):')
print(a * b.reshape(-1, 1))
```
%% Cell type:markdown id: tags:
> Here we used a handy feature of the `reshape` method - if you pass `-1` for
> the size of one dimension, it will automatically determine the size to use
> for that dimension.
Here is a more useful example, where we use broadcasting to de-mean the rows
or columns of an array:
%% Cell type:code id: tags:
```
a = np.arange(9).reshape((3, 3))
print('a:')
print(a)
print('a (cols demeaned):')
print(a - a.mean(axis=0))
print('a (rows demeaned):')
print(a - a.mean(axis=1).reshape(-1, 1))
```
%% Cell type:markdown id: tags:
> As demonstrated above, many functions in Numpy accept an `axis` parameter,
> allowing you to apply the function along a specific axis. Omitting the
> `axis` parameter will apply the function to the whole array.
Broadcasting can sometimes be confusing, but the rules which Numpy follows to
align arrays of different sizes, and hence determine how the broadcasting
should be applied, are pretty straightforward. If something is not working,
and you can't figure out why, refer to the [official
documentation](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
In short the broadcasting rules are:
1. If the input arrays have a different number of dimensions, the ones with fewer
dimensions will have new dimensions with length 1 prepended until all arrays
have the same number of dimensions. So adding a 2D array of shape `(3, 3)` to
a 1D array of shape `(3,)` is equivalent to adding two 2D arrays with shapes
`(3, 3)` and `(1, 3)`.
2. Once all the arrays have the same number of dimensions, they are combined
element-wise. Each dimension is compatible between the two arrays if the lengths
are equal or one of them is 1. In the latter case, the length-1 dimension is
repeated (in a manner equivalent to Matlab's `repmat`) to match the other array.
If any dimension is incompatible, Numpy raises an error.
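Both rules can be checked directly by comparing shapes. This is an illustrative sketch; `np.tile` is used here as Numpy's closest equivalent to Matlab's `repmat`:

```python
import numpy as np

# Rule 1: a (3,) array combined with a (3, 3) array is treated as shape (1, 3)
a = np.arange(9).reshape((3, 3))
b = np.arange(3)
print((a + b).shape)                                   # (3, 3)
print(np.array_equal(a + b, a + b.reshape((1, 3))))    # True

# Rule 2: length-1 dimensions are repeated, so (4, 1) and (3,)
# become (4, 1) and (1, 3), which broadcast to (4, 3)
col = np.arange(4).reshape((4, 1))
row = np.arange(3)
print((col + row).shape)                               # (4, 3)
explicit = np.tile(col, (1, 3)) + np.tile(row, (4, 1))
print(np.array_equal(col + row, explicit))             # True

# Shapes (3,) and (4,) are incompatible: neither length is 1
try:
    np.ones(3) + np.ones(4)
except ValueError as err:
    print('ValueError:', err)
```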
<a class="anchor" id="linear-algebra"></a>
### Linear algebra
Numpy is first and foremost a library for general-purpose numerical computing.
But it does have a range of linear algebra functionality, hidden away in the
[`numpy.linalg`](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html)
module. Here are a couple of quick examples:
%% Cell type:code id: tags:
```
import numpy.linalg as npla
a = np.array([[1, 2, 3, 4],
              [0, 5, 6, 7],
              [0, 0, 8, 9],
              [0, 0, 0, 10]])
print('a:')
print(a)
print('inv(a)')
print(npla.inv(a))
eigvals, eigvecs = npla.eig(a)
print('eigenvalues and vectors of a:')
# The eigenvectors are the *columns* of eigvecs,
# so we iterate over the rows of its transpose
for val, vec in zip(eigvals, eigvecs.T):
    print('{:2.0f} - {}'.format(val, vec))
```
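A couple more `numpy.linalg` functions you may find handy are `solve` and `det`. The sketch below reuses the same matrix, solves a linear system, and also checks that each _column_ of the eigenvector array returned by `eig` is an eigenvector. The right-hand side `b` is an arbitrary example vector:

```python
import numpy as np
import numpy.linalg as npla

a = np.array([[1, 2, 3, 4],
              [0, 5, 6, 7],
              [0, 0, 8, 9],
              [0, 0, 0, 10]])
b = np.array([1, 2, 3, 4])   # arbitrary right-hand side

# Solve a @ x = b (usually preferable to computing inv(a) @ b)
x = npla.solve(a, b)
print('x:', x)
print('a @ x == b?', np.allclose(a @ x, b))

# a is upper triangular, so its determinant is the product
# of its diagonal (1 * 5 * 8 * 10 = 400), up to rounding error
print('det(a):', npla.det(a))

# Each *column* of eigvecs satisfies a @ v == lambda * v
eigvals, eigvecs = npla.eig(a)
for i, val in enumerate(eigvals):
    vec = eigvecs[:, i]
    print('{:2.0f}: {}'.format(val, np.allclose(a @ vec, val * vec)))
```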
%% Cell type:markdown id: tags:
<a class="anchor" id="array-indexing"></a>
## Array indexing
Just like in Matlab, slicing up your arrays is a breeze in Numpy. If you are
after some light reading, you might want to check out the [comprehensive Numpy
Indexing
reference](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html).
> As with indexing regular Python lists, array indices start from 0, and end
> indices (if specified) are exclusive.
Let's whet our appetites with some basic 1D array slicing. Numpy supports the
standard Python
[__slice__](https://www.pythoncentral.io/how-to-slice-listsarrays-and-tuples-in-python/)
notation for indexing, where you can specify the start and end indices, and
the step size, via the `start:stop:step` syntax:
%% Cell type:code id: tags:
```
a = np.arange(10)
print('a: ', a)
print('first element: ', a[0])
print('first two elements: ', a[:2])
print('last element: ', a[a.shape[0] - 1])
print('last element again: ', a[-1])
print('last two elements: ', a[-2:])
print('middle four elements: ', a[3:7])
print('Every second element: ', a[1::2])
print('Every second element, reversed: ', a[-1::-2])
```
%% Cell type:markdown id: tags:
Note that slicing an array in this way returns a _view_, not a copy, into that
array:
%% Cell type:code id: tags:
```
a = np.arange(10)
print('a:', a)
every2nd = a[::2]
print('every 2nd:', every2nd)
every2nd += 10
print('a:', a)
```
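If you need an independent copy of a slice, call its `copy` method. The `np.shares_memory` function is a quick way to check whether two arrays refer to the same data, as in this small sketch:

```python
import numpy as np

a = np.arange(10)
view = a[::2]          # a view - shares its data with a
copy = a[::2].copy()   # an independent copy of the same elements

print('view shares memory with a:', np.shares_memory(a, view))  # True
print('copy shares memory with a:', np.shares_memory(a, copy))  # False

copy += 10
print('a is unchanged:', a)
```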
%% Cell type:markdown id: tags:
<a class="anchor" id="indexing-multi-dimensional-arrays"></a>
### Indexing multi-dimensional arrays
Multi-dimensional array indexing works in much the same way as one-dimensional
indexing but with, well, more dimensions. Use commas within the square
brackets to separate the slices for each dimension:
%% Cell type:code id: tags:
```
a = np.arange(25).reshape((5, 5))
print('a:')
print(a)
print(' First row: ', a[ 0, :])
print(' Last row: ', a[ -1, :])
print(' second column: ', a[ :, 1])
print(' Centre:')
print(a[1:4, 1:4])
```
%% Cell type:markdown id: tags:
For arrays with more than two dimensions, the ellipsis (`...`) is a handy
feature - it allows you to specify a slice comprising all elements along
more than one dimension:
%% Cell type:code id: tags:
```
a = np.arange(27).reshape((3, 3, 3))
print('a:')
print(a)
print('All elements at x=0:')
print(a[0, ...])
print('All elements at z=2:')
print(a[..., 2])
print('All elements at x=0, z=2:')
print(a[0, ..., 2])
```
%% Cell type:markdown id: tags:
<a class="anchor" id="boolean-indexing"></a>
### Boolean indexing
A numpy array can be indexed with a boolean array of the same shape. For
example:
%% Cell type:code id: tags:
```
a = np.arange(10)
print('a: ', a)
print('a > 5: ', a > 5)
print('elements in a that are > 5: ', a[a > 5])
```
%% Cell type:markdown id: tags:
In contrast to the simple indexing we have already seen, boolean indexing will
return a _copy_ of the indexed data, __not__ a view. For example:
%% Cell type:code id: tags:
```
a = np.arange(10)
b = a[a > 5]
print('a: ', a)
print('b: ', b)
print('Setting b[0] to 999')
b[0] = 999
print('a: ', a)
print('b: ', b)
```
%% Cell type:markdown id: tags:
> In general, any 'simple' indexing operation on a Numpy array, where the
> indexing object comprises integers, slices (using the standard Python
> `start:stop:step` notation), colons (`:`) and/or ellipses (`...`), will
> result in a __view__ into the indexed array. Any 'advanced' indexing
> operation, where the indexing object contains anything else (e.g. boolean or
> integer arrays, or even python lists), will result in a __copy__ of the
> data.
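One nuance worth knowing: the copy behaviour applies when you _read_ data with an advanced index. When you _assign_ to an array through a boolean index, the values are written into the original array. A small sketch:

```python
import numpy as np

a = np.arange(10)

# Reading with a boolean index gives a copy ...
b = a[a > 5]
b[:] = 0
print('a after modifying the copy:', a)   # unchanged

# ... but assigning through a boolean index writes into a itself
a[a > 5] = 0
print('a after boolean assignment:', a)   # [0 1 2 3 4 5 0 0 0 0]
```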
Logical operators `~` (not), `&` (and) and `|` (or) can be used to manipulate
and combine boolean Numpy arrays:
%% Cell type:code id: tags:
```
a = np.arange(10)
gt5 = a > 5
even = a % 2 == 0
print('a: ', a)
print('elements in a which are > 5: ', a[ gt5])
print('elements in a which are <= 5: ', a[~gt5])
print('elements in a which are even: ', a[ even])
print('elements in a which are odd: ', a[~even])
print('elements in a which are > 5 and even: ', a[gt5 & even])
print('elements in a which are > 5 or odd: ', a[gt5 | ~even])
```
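A common pitfall when combining conditions: `&` and `|` bind more tightly than comparison operators such as `>` and `<`, so each comparison must be wrapped in parentheses. This sketch shows what goes wrong otherwise:

```python
import numpy as np

a = np.arange(10)

# Each comparison is wrapped in parentheses before combining with &
between = a[(a > 2) & (a < 7)]
print(between)      # [3 4 5 6]

# Without parentheses, Python parses this as the chained comparison
# a > (2 & a) < 7, which needs the truth value of a whole array
try:
    a[a > 2 & a < 7]
except ValueError as err:
    print('ValueError:', err)
```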
%% Cell type:markdown id: tags:
Numpy also has two handy functions, `all` and `any`, which respectively allow
you to perform boolean `and` and `or` operations along the axes of an array:
%% Cell type:code id: tags:
```
a = np.arange(9).reshape((3, 3))
print('a:')
print(a)
print('rows with any element divisible by 3: ', np.any(a % 3 == 0, axis=1))
print('cols with any element divisible by 3: ', np.any(a % 3 == 0, axis=0))
print('rows with all elements divisible by 3:', np.all(a % 3 == 0, axis=1))
print('cols with all elements divisible by 3:', np.all(a % 3 == 0, axis=0))
```
%% Cell type:markdown id: tags:
<a class="anchor" id="coordinate-array-indexing"></a>
### Coordinate array indexing
You can index a numpy array using another array containing coordinates into
the first array. As with boolean indexing, this will result in a copy of the
data. Generally, you will need to have a separate array, or list, of
coordinates for each axis of the array you wish to index:
%% Cell type:code id: tags:
```
a = np.arange(16).reshape((4, 4))
print('a:')
print(a)
rows = [0, 2, 3]
cols = [1, 0, 2]
indexed = a[rows, cols]
for r, c, v in zip(rows, cols, indexed):
    print('a[{}, {}] = {}'.format(r, c, v))
```
%% Cell type:markdown id: tags:
The `numpy.where` function can be combined with boolean arrays to easily
generate coordinate arrays for values which meet some condition:
%% Cell type:code id: tags:
```
a = np.arange(16).reshape((4, 4))
print('a:')
print(a)
evenrows, evencols = np.where(a % 2 == 0)
print('even row coordinates:', evenrows)
print('even col coordinates:', evencols)
print(a[evenrows, evencols])
```
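Note that `a[rows, cols]` selects individual _points_, pairing up the coordinates element by element. If you instead want the sub-array formed by _all combinations_ of some rows and columns (like `a(rows, cols)` in Matlab), one option is `np.ix_`, as in this sketch (the `rows` and `cols` lists are the example values from above):

```python
import numpy as np

a = np.arange(16).reshape((4, 4))
rows = [0, 2, 3]
cols = [1, 0, 2]

# Pointwise: picks the 3 elements a[0, 1], a[2, 0] and a[3, 2]
print(a[rows, cols])

# Sub-grid: np.ix_ builds index arrays which select every
# combination of the given rows and columns (a 3x3 block)
block = a[np.ix_(rows, cols)]
print(block)

# Like the pointwise result, the block is a copy, not a view
block[:] = -1
print('a is unchanged:', a.min() == 0)
```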
%% Cell type:markdown id: tags:
<a class="anchor" id="exercises"></a>
## Exercises
The challenge for each of these exercises is to complete them in as few lines
of code as possible!
> You can find example answers to the exercises in `04_numpy/.solutions`.
<a class="anchor" id="load-an-array-from-a-file-and-do-stuff-with-it"></a>
### Load an array from a file and do stuff with it
Load the file `04_numpy/2d_array.txt`, and calculate and print the mean for
each column. If your code doesn't work, you might want to __LOOK AT YOUR
DATA__, as you will have learnt if you have ever attended the FSL course.
> Bonus: Find the hidden message (hint:
> [chr](https://docs.python.org/3/library/functions.html#chr))
<a class="anchor" id="concatenate-affine-transforms"></a>
### Concatenate affine transforms
Given all of the files in `04_numpy/xfms/`, create a transformation matrix
which can transform coordinates from subject 1 functional space to subject 2
functional space<sup>4</sup>.
Write some code to transform the following coordinates from subject 1
functional space to subject 2 functional space, to test that your matrix is
correct:
| __Subject 1 coordinates__ | __Subject 2 coordinates__ |
|:-------------------------:|:-------------------------:|
| `[ 0.0, 0.0, 0.0]` | `[ 3.21, 4.15, -9.89]` |
| `[-5.0, -20.0, 10.0]` | `[-0.87, -14.36, -1.13]` |
| `[20.0, 25.0, 60.0]` | `[29.58, 27.61, 53.37]` |
> <sup>4</sup> Even though these are FLIRT transforms, this is just a toy
> example. Look
> [here](https://users.fmrib.ox.ac.uk/~paulmc/fsleyes/fslpy/latest/fsl.transform.flirt.html)
> and
> [here](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLIRT/FAQ#What_is_the_format_of_the_matrix_used_by_FLIRT.2C_and_how_does_it_relate_to_the_transformation_parameters.3F)
> if you actually need to work with FLIRT transforms.
<a class="anchor" id="appendix-generating-random-numbers"></a>
## Appendix A: Generating random numbers
Numpy's
[`numpy.random`](https://docs.scipy.org/doc/numpy/reference/routines.random.html)
module is where you should go if you want to introduce a little randomness
into your code. You have already seen a couple of functions for generating
uniformly distributed real or integer data:
%% Cell type:code id: tags:
```
import numpy.random as npr
print('Random floats between 0.0 (inclusive) and 1.0 (exclusive):')
print(npr.random((3, 3)))
print('Random integers in a specified range:')
print(npr.randint(1, 100, (3, 3)))
```
%% Cell type:markdown id: tags:
You can also draw random data from other distributions - here are just a few
examples:
%% Cell type:code id: tags:
```
print('Gaussian (mean: 0, stddev: 1):')
print(npr.normal(0, 1, (3, 3)))
print('Gamma (shape: 1, scale: 1):')
print(npr.gamma(1, 1, (3, 3)))
print('Chi-square (dof: 10):')
print(npr.chisquare(10, (3, 3)))
```
%% Cell type:markdown id: tags:
The `numpy.random` module also has a couple of other handy functions for
random sampling of existing data:
%% Cell type:code id: tags:
```
data = np.arange(5)
print('data: ', data)
print('two random values: ', npr.choice(data, 2))
print('random permutation: ', npr.permutation(data))
# The numpy.random.shuffle function
# will shuffle an array *in-place*.
npr.shuffle(data)
print('randomly shuffled: ', data)
```
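If you need reproducible results (e.g. for testing), you can seed the random number generator. The sketch below shows `numpy.random.seed` used with the functions above and, assuming Numpy 1.17 or newer, the `numpy.random.default_rng` function, which creates a separate `Generator` object with its own state. The seed value `42` is arbitrary:

```python
import numpy as np
import numpy.random as npr

# Seeding the global generator makes subsequent draws repeatable
npr.seed(42)
first = npr.random(3)
npr.seed(42)
second = npr.random(3)
print('same draws after re-seeding:', np.array_equal(first, second))  # True

# Numpy >= 1.17: a Generator object keeps its own, independent state
rng = np.random.default_rng(42)
print(rng.normal(0, 1, (2, 2)))
print(rng.integers(1, 100, 5))   # Generator uses integers() rather than randint()
```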
%% Cell type:markdown id: tags:
<a class="anchor" id="appendix-importing-numpy"></a>
## Appendix B: Importing Numpy
For interactive exploration/experimentation, you might want to import
Numpy like this:
%% Cell type:code id: tags:
```
from numpy import *
```
%% Cell type:markdown id: tags:
This makes your Python session very similar to Matlab - you can call all
of the Numpy functions directly:
%% Cell type:code id: tags:
```
e = array([1, 2, 3, 4, 5])
z = zeros((100, 100))
d = diag([2, 3, 4, 5])
print(e)
print(z)
print(d)
```
%% Cell type:markdown id: tags:
But if you are writing a script or application using Numpy, I implore you to
import Numpy (and its commonly used sub-modules) like this instead:
%% Cell type:code id: tags:
```
import numpy as np
import numpy.random as npr
import numpy.linalg as npla
```
%% Cell type:markdown id: tags:
The downside to this is that you will have to prefix all Numpy functions with
`np.`, like so:
%% Cell type:code id: tags:
```
e = np.array([1, 2, 3, 4, 5])
z = np.zeros((100, 100))
d = np.diag([2, 3, 4, 5])
r = npr.random(5)
print(e)
print(z)
print(d)
print(r)
```
%% Cell type:markdown id: tags:
There is a big upside, however, in that other people who have to read/use your
code will like you a lot more. This is because it will be easier for them to
figure out what the hell your code is doing. Namespaces are your friend - use
them!
<a class="anchor" id="appendix-vectors-in-numpy"></a>
## Appendix C: Vectors in Numpy
One aspect of Numpy which might trip you up, and which can be quite
frustrating at times, is that Numpy has no understanding of row or column
vectors. __An array with only one dimension is neither a row, nor a column
vector - it is just a 1D array__. If you have a 1D array, and you want to use
it as a row vector, you need to reshape it to a shape of `(1, N)`. Similarly,
to use a 1D array as a column vector, you must reshape it to have shape
`(N, 1)`.
In general, when you are mixing 1D arrays with 2- or N-dimensional arrays, you
need to make sure that your arrays have the correct shape. For example:
%% Cell type:code id: tags:
```
r = np.random.randint(1, 10, 3)
print('r is a row: ', r)
print('r.T should be a column: ', r.T, ' ... huh?')
print('Ok, make r a 2D array with one row: ', r.reshape(1, -1))
print('We could also use the np.atleast_2d function:', np.atleast_2d(r))
print('Now we can transpose r to get a column:')
print(np.atleast_2d(r).T)
```
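Indexing with `np.newaxis` (an alias for `None`) is another common way to add a length-1 dimension. This sketch compares the resulting shapes, and shows that a column multiplied by a row broadcasts to an outer product:

```python
import numpy as np

v = np.arange(1, 4)        # 1D array, shape (3,)
row = v[np.newaxis, :]     # shape (1, 3) - same as v.reshape(1, -1)
col = v[:, np.newaxis]     # shape (3, 1) - same as v.reshape(-1, 1)

print(v.shape, row.shape, col.shape)    # (3,) (1, 3) (3, 1)

# Transposing only has an effect once the array is 2D
print(v.T.shape, row.T.shape)           # (3,) (3, 1)

# Column times row broadcasts to the (3, 3) outer product
print(col * row)
print(np.array_equal(col * row, np.outer(v, v)))   # True
```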
%% Cell type:markdown id: tags:
<a class="anchor" id="appendix-the-numpy-matrix"></a>
## Appendix D: The Numpy `matrix`
By now you should be aware that a Numpy `array` does not behave in quite the
same way as a Matlab matrix. The primary difference between Numpy and Matlab
is that in Numpy, the `*` operator denotes element-wise multiplication,
whereas in Matlab, `*` denotes matrix multiplication.
Numpy does support the `@` operator for matrix multiplication, but if this is
a complete show-stopper for you - if you just can't bring yourself to write
`A @ B` to denote the matrix product of `A` and `B` - if you _must_ have your
code looking as Matlab-like as possible, then you should look into the Numpy
[`matrix`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html)
data type.
The `matrix` is an alternative to the `array` which essentially behaves more
like a Matlab matrix:
* `matrix` objects always have exactly two dimensions.
* `a * b` denotes matrix multiplication, rather than elementwise
multiplication.
* `matrix` objects have `.H` and `.I` attributes, which are convenient ways to
access the conjugate transpose and inverse of the matrix respectively.
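Here is a brief sketch of those three behaviours. Recent Numpy versions may emit a `PendingDeprecationWarning` when a `matrix` is created, which is in line with the recommendation below to prefer the `array`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix(a)   # may emit a PendingDeprecationWarning

# For matrix objects, * is the matrix product (the same as @ for arrays)
print(m * m)
print(np.array_equal(np.asarray(m * m), a @ a))   # True

# .I is the inverse, and .H the conjugate transpose
print(np.allclose(m * m.I, np.eye(2)))            # True
print(np.array_equal(m.H, m.T))                   # True, as m is real-valued

# A matrix is always 2D, even when created from a 1D list
print(np.matrix([1, 2, 3]).shape)                 # (1, 3)
```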
Note however that use of the `matrix` type is _not_ widespread, and if you use
it you will risk confusing others who are familiar with the much more commonly
used `array`, and who need to work with your code. In fact, the official Numpy
documentation [recommends against using the `matrix`
type](https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html#array-or-matrix-which-should-i-use).
But if you are writing some very maths-heavy code, and you want your code to
be as clear and concise, and maths/Matlab-like as possible, then the `matrix`
type is there for you. Just make sure you document your code well to make it
clear to others what is going on!
<a class="anchor" id="useful-references"></a>
## Useful references
* [The Numpy manual](https://docs.scipy.org/doc/numpy/)
* [Linear algebra in `numpy.linalg`](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html)
* [Broadcasting in Numpy](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
* [Indexing in Numpy](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html)
* [Random sampling in `numpy.random`](https://docs.scipy.org/doc/numpy/reference/routines.random.html)
* [Python slicing](https://www.pythoncentral.io/how-to-slice-listsarrays-and-tuples-in-python/)
* [Numpy for Matlab users](https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html)