Compare revisions

15d75bb5 · 15d75bb5 · 15d75bb5 · 15d75bb5 · 15d75bb5 · 15d75bb5
--- a/talks/matlab_vs_python/partial_fourier/data.mat
+++ b/talks/matlab_vs_python/partial_fourier/data.mat
--- a/talks/matlab_vs_python/partial_fourier/partial_fourier.ipynb
+++ b/talks/matlab_vs_python/partial_fourier/partial_fourier.ipynb
+%% Cell type:markdown id: tags:
+
+Imports
+
+%% Cell type:code id: tags:
+
+``` python
+import h5py
+import matplotlib.pyplot as plt
+import numpy as np
+```
+
+%% Cell type:markdown id: tags:
+
+# Load data
+
+Load complex image data from MATLAB mat-file (v7.3 or later), which is actually an HDF5 format
+
+Complex data is loaded as a (real, imag) tuple, so it neds to be explicitly converted to complex double
+
+In this section:
+- using h5py module
+- np.transpose
+- 1j as imaginary constant
+
+%% Cell type:code id: tags:
+
+``` python
+# get hdf5 object for the mat-file
+h = h5py.File('data.mat','r')
+
+# get img variable from the mat-file
+dat = h.get('img')
+
+# turn array of (real, imag) tuples into an array of complex doubles
+# transpose to keep data in same orientation as MATLAB
+img = np.transpose(dat['real'] + 1j*dat['imag'])
+```
+
+%% Cell type:markdown id: tags:
+
+# 6/8 Partial Fourier sampling
+
+Fourier transform the image to get k-space data, and add complex Gaussian noise
+
+To simulate 6/8 Partial Fourier sampling, zero out the bottom 1/4 of k-space
+
+In this section:
+- np.random.randn
+- np.fft
+- 0-based indexing
+- image plotting
+
+%% Cell type:code id: tags:
+
+``` python
+# generate normally-distributed complex noise
+n = np.random.randn(96,96) + 1j*np.random.randn(96,96)
+
+# Fourier transform the image and add noise
+y = np.fft.fftshift(np.fft.fft2(img), axes=0) + n
+
+# set bottom 24/96 lines to 0
+y[72:,:] = 0
+
+# show sampling
+_, ax = plt.subplots()
+ax.imshow(np.log(np.abs(np.fft.fftshift(y, axes=1))))
+```
+
+%% Cell type:markdown id: tags:
+
+# Estimate phase
+
+Filter the k-space data and extract a low-resolution phase estimate
+
+Filtering can help reduce ringing in the phase image
+
+In this section:
+- np.pad
+- np.hanning
+- reshaping 1D array to 2D array using np.newaxis (or None)
+- subplots with titles
+
+%% Cell type:code id: tags:
+
+``` python
+# create zero-padded hanning filter for ky-filtering
+filt = np.pad(np.hanning(48),24,'constant')
+
+# reshape 1D array into 2D array
+filt = filt[:,np.newaxis]
+# or
+# filt = filt[:,None]
+
+# generate low-res image with inverse Fourier transform
+low = np.fft.ifft2(np.fft.ifftshift(y*filt, axes=0))
+
+# get phase image
+phs = np.exp(1j*np.angle(low))
+
+# show phase estimate alongside true phase
+_, ax = plt.subplots(1,2)
+ax[0].imshow(np.angle(img))
+ax[0].set_title('True image phase')
+ax[1].imshow(np.angle(phs))
+ax[1].set_title('Estimated phase')
+```
+
+%% Cell type:markdown id: tags:
+
+# POCS reconstruction
+
+Perform the projection-onto-convex-sets (POCS) partial Fourier reconstruction method.
+
+POCS is an iterative scheme estimates the reconstructed image as any element in the intersection of the following two (convex) sets:
+1. Set of images consistent with the measured data
+2. Set of images that are non-negative real
+
+This requires prior knowledge of the image phase (hence the estimate above), and it works because although we have less than a full k-space of measurements, we're now only estimating half the number of free parameters (real values only, instead of real + imag), and we're no longer under-determined. Equivalently, consider the fact that real-valued images have conjugate symmetric k-spaces, so we only require half of k-space to reconstruct our image.
+
+In this section:
+- np.zeros
+- range() builtin
+- point-wise multiplication (*)
+- np.fft operations default to last axis, not first
+- np.maximum vs np.max
+
+%% Cell type:code id: tags:
+
+``` python
+# initialise image estimate to be zeros
+est = np.zeros((96,96))
+
+# set the number of iterations
+iters = 10
+
+# each iteration cycles between projections
+for i in range(iters):
+# projection onto data-consistent set:
+    # use a-priori phase to get complex image
+    est = est*phs
+
+    # Fourier transform to get k-space
+    est = np.fft.fftshift(np.fft.fft2(est), axes=0)
+
+    # replace data with measured lines
+    est[:72,:] = y[:72,:]
+
+    # inverse Fourier transform to get back to image space
+    est = np.fft.ifft2(np.fft.ifftshift(est, axes=0))
+
+# projection onto non-negative reals:
+    # remove a-priori phase
+    est = est*np.conj(phs)
+
+    # get real part
+    est = np.real(est)
+
+    # ensure output is non-negative
+    est = np.maximum(est, 0)
+```
+
+%% Cell type:markdown id: tags:
+
+# Display error and plot reconstruction
+
+The POCS reconstruction is compared to a zero-filled reconstruction (i.e., where the missing data is zeroed prior to inverse Fourier Transform)
+
+In this section:
+- print formatted strings to standard output
+- 2D subplots with min/max scales, figure size
+- np.sum sums over all elements by default
+
+%% Cell type:code id: tags:
+
+``` python
+# compute zero-filled recon
+zf = np.fft.ifft2(np.fft.ifftshift(y, axes=0))
+
+# compute rmse for zero-filled and POCS recon
+err_zf = np.sqrt(np.sum(np.abs(zf - img)**2))
+err_pocs = np.sqrt(np.sum(np.abs(est*phs - img)**2))
+
+# print errors
+print(f'RMSE for zero-filled recon: {err_zf}')
+print(f'RMSE for POCS recon: {err_pocs}')
+
+# plot both recons side-by-side
+_, ax = plt.subplots(2,2,figsize=(16,16))
+
+# plot zero-filled
+ax[0,0].imshow(np.abs(zf), vmin=0, vmax=1)
+ax[0,0].set_title('Zero-filled')
+ax[1,0].plot(np.abs(zf[:,47]))
+
+# plot POCS
+ax[0,1].imshow(est, vmin=0, vmax=1)
+ax[0,1].set_title('POCS recon')
+ax[1,1].plot(np.abs(est[:,47]))
+```
+%% Cell type:markdown id: tags:
+
+Imports
+
+%% Cell type:code id: tags:
+
+``` python
+import h5py
+import matplotlib.pyplot as plt
+import numpy as np
+```
+
+%% Cell type:markdown id: tags:
+
+# Load data
+
+Load complex image data from MATLAB mat-file (v7.3 or later), which is actually an HDF5 format
+
+Complex data is loaded as a (real, imag) tuple, so it neds to be explicitly converted to complex double
+
+In this section:
+- using h5py module
+- np.transpose
+- 1j as imaginary constant
+
+%% Cell type:code id: tags:
+
+``` python
+# get hdf5 object for the mat-file
+h = h5py.File('data.mat','r')
+
+# get img variable from the mat-file
+dat = h.get('img')
+
+# turn array of (real, imag) tuples into an array of complex doubles
+# transpose to keep data in same orientation as MATLAB
+img = np.transpose(dat['real'] + 1j*dat['imag'])
+```
+
+%% Cell type:markdown id: tags:
+
+# 6/8 Partial Fourier sampling
+
+Fourier transform the image to get k-space data, and add complex Gaussian noise
+
+To simulate 6/8 Partial Fourier sampling, zero out the bottom 1/4 of k-space
+
+In this section:
+- np.random.randn
+- np.fft
+- 0-based indexing
+- image plotting
+
+%% Cell type:code id: tags:
+
+``` python
+# generate normally-distributed complex noise
+n = np.random.randn(96,96) + 1j*np.random.randn(96,96)
+
+# Fourier transform the image and add noise
+y = np.fft.fftshift(np.fft.fft2(img), axes=0) + n
+
+# set bottom 24/96 lines to 0
+y[72:,:] = 0
+
+# show sampling
+_, ax = plt.subplots()
+ax.imshow(np.log(np.abs(np.fft.fftshift(y, axes=1))))
+```
+
+%% Cell type:markdown id: tags:
+
+# Estimate phase
+
+Filter the k-space data and extract a low-resolution phase estimate
+
+Filtering can help reduce ringing in the phase image
+
+In this section:
+- np.pad
+- np.hanning
+- reshaping 1D array to 2D array using np.newaxis (or None)
+- subplots with titles
+
+%% Cell type:code id: tags:
+
+``` python
+# create zero-padded hanning filter for ky-filtering
+filt = np.pad(np.hanning(48),24,'constant')
+
+# reshape 1D array into 2D array
+filt = filt[:,np.newaxis]
+# or
+# filt = filt[:,None]
+
+# generate low-res image with inverse Fourier transform
+low = np.fft.ifft2(np.fft.ifftshift(y*filt, axes=0))
+
+# get phase image
+phs = np.exp(1j*np.angle(low))
+
+# show phase estimate alongside true phase
+_, ax = plt.subplots(1,2)
+ax[0].imshow(np.angle(img))
+ax[0].set_title('True image phase')
+ax[1].imshow(np.angle(phs))
+ax[1].set_title('Estimated phase')
+```
+
+%% Cell type:markdown id: tags:
+
+# POCS reconstruction
+
+Perform the projection-onto-convex-sets (POCS) partial Fourier reconstruction method.
+
+POCS is an iterative scheme estimates the reconstructed image as any element in the intersection of the following two (convex) sets:
+1. Set of images consistent with the measured data
+2. Set of images that are non-negative real
+
+This requires prior knowledge of the image phase (hence the estimate above), and it works because although we have less than a full k-space of measurements, we're now only estimating half the number of free parameters (real values only, instead of real + imag), and we're no longer under-determined. Equivalently, consider the fact that real-valued images have conjugate symmetric k-spaces, so we only require half of k-space to reconstruct our image.
+
+In this section:
+- np.zeros
+- range() builtin
+- point-wise multiplication (*)
+- np.fft operations default to last axis, not first
+- np.maximum vs np.max
+
+%% Cell type:code id: tags:
+
+``` python
+# initialise image estimate to be zeros
+est = np.zeros((96,96))
+
+# set the number of iterations
+iters = 10
+
+# each iteration cycles between projections
+for i in range(iters):
+# projection onto data-consistent set:
+    # use a-priori phase to get complex image
+    est = est*phs
+
+    # Fourier transform to get k-space
+    est = np.fft.fftshift(np.fft.fft2(est), axes=0)
+
+    # replace data with measured lines
+    est[:72,:] = y[:72,:]
+
+    # inverse Fourier transform to get back to image space
+    est = np.fft.ifft2(np.fft.ifftshift(est, axes=0))
+
+# projection onto non-negative reals:
+    # remove a-priori phase
+    est = est*np.conj(phs)
+
+    # get real part
+    est = np.real(est)
+
+    # ensure output is non-negative
+    est = np.maximum(est, 0)
+```
+
+%% Cell type:markdown id: tags:
+
+# Display error and plot reconstruction
+
+The POCS reconstruction is compared to a zero-filled reconstruction (i.e., where the missing data is zeroed prior to inverse Fourier Transform)
+
+In this section:
+- print formatted strings to standard output
+- 2D subplots with min/max scales, figure size
+- np.sum sums over all elements by default
+
+%% Cell type:code id: tags:
+
+``` python
+# compute zero-filled recon
+zf = np.fft.ifft2(np.fft.ifftshift(y, axes=0))
+
+# compute rmse for zero-filled and POCS recon
+err_zf = np.sqrt(np.sum(np.abs(zf - img)**2))
+err_pocs = np.sqrt(np.sum(np.abs(est*phs - img)**2))
+
+# print errors
+print(f'RMSE for zero-filled recon: {err_zf}')
+print(f'RMSE for POCS recon: {err_pocs}')
+
+# plot both recons side-by-side
+_, ax = plt.subplots(2,2,figsize=(16,16))
+
+# plot zero-filled
+ax[0,0].imshow(np.abs(zf), vmin=0, vmax=1)
+ax[0,0].set_title('Zero-filled')
+ax[1,0].plot(np.abs(zf[:,47]))
+
+# plot POCS
+ax[0,1].imshow(est, vmin=0, vmax=1)
+ax[0,1].set_title('POCS recon')
+ax[1,1].plot(np.abs(est[:,47]))
+```
--- a/talks/matlab_vs_python/partial_fourier/partial_fourier.m
+++ b/talks/matlab_vs_python/partial_fourier/partial_fourier.m
+%% Load Data
+% get matfile object
+h = matfile('data.mat');
+
+% get img variable from the mat-file
+img = h.img;
+
+%% 6/8 Partial Fourier sampling
+% generate normally-distributed complex noise
+n = randn(96) + 1j*randn(96);
+
+% Fourier transform the image and add noise
+y = fftshift(fft2(img),1) + n;
+
+% set bottom 24/96 lines to 0
+y(73:end,:) = 0;
+
+% show sampling
+figure();
+imshow(log(abs(fftshift(y,2))), [], 'colormap', jet)
+
+%% Estimate phase
+% create zero-padded hanning filter for ky-filtering
+filt = padarray(hann(48),24);
+
+% generate low-res image with inverse Fourier transform
+low = ifft2(ifftshift(y.*filt,1));
+
+% get phase image
+phs = exp(1j*angle(low));
+
+% show phase estimate alongside true phase
+figure();
+subplot(1,2,1);
+imshow(angle(img), [-pi,pi], 'colormap', hsv);
+title('True image phase');
+
+subplot(1,2,2);
+imshow(angle(phs), [-pi,pi], 'colormap', hsv)
+title('Estimated phase');
+
+%% POCS reconstruction
+% initialise image estimate to be zeros
+est = zeros(96);
+
+% set number of iterations
+iters = 10;
+
+% each iteration cycles between projections
+for i = 1:iters
+% projection onto data-consistent set:
+    % use a-priori phase to get complex image
+    est = est.*phs;
+
+    % Fourier transform to get k-space
+    est = fftshift(fft2(est), 1);
+
+    % replace data with measured lines
+    est(1:72,:) = y(1:72,:);
+
+    % inverse Fourier transform to get back to image space
+    est = ifft2(ifftshift(est, 1));
+
+% projection onto non-negative reals
+    % remove a-priori phase
+    est = est.*conj(phs);
+
+    % get real part
+    est = real(est);
+
+    % ensure output is non-negative
+    est = max(est, 0);
+end
+
+%% Display error and plot reconstruction
+% compute zero-filled recon
+zf = ifft2(ifftshift(y, 1));
+
+% compute rmse for zero-filled and POCS recon
+err_zf = norm(zf(:) - img(:));
+err_pocs = norm(est(:).*phs(:) - img(:));
+
+% print errors
+fprintf(1, 'RMSE for zero-filled recon: %f\n', err_zf);
+fprintf(1, 'RMSE for POCS recon: %f\n', err_pocs);
+
+% plot both recons side-by-side
+figure();
+
+% plot zero-filled
+subplot(2,2,1);
+imshow(abs(zf), [0 1]);
+title('Zero-Filled');
+subplot(2,2,3);
+plot(abs(zf(:,48)), 'linewidth', 2);
+
+% plot POCS
+subplot(2,2,2);
+imshow(est, [0 1]);
+title('POCS recon');
+subplot(2,2,4);
+plot(abs(est(:,48)), 'linewidth', 2);
--- a/talks/matlab_vs_python/rbf/play_rbf.ipynb
+++ b/talks/matlab_vs_python/rbf/play_rbf.ipynb
+%% Cell type:markdown id: tags:
+
+## Matlab v Python : introduction
+
+This is a very small piece of code that fits RBFs to some data
+
+In this section
+
+- numpy
+- matlab-like functions like np.linspace
+
+%% Cell type:code id: tags:
+
+``` python
+import numpy as np
+
+# Generate noisy sine wave
+x = np.linspace(0,10,100)
+y = np.sin(3*x) + np.random.randn(x.size)*.5
+```
+
+%% Cell type:markdown id: tags:
+
+
+We define our Radial basis functions as $\textrm{RBF}(x) = \exp\left[-\frac{(x-c)^2}{\sigma^2}\right]$ where $c$ is the centre and $\sigma$ the extent.  By having multiple such functions with different centres we can fit an arbitrary function.
+
+In this section
+
+- defining a function (two options)
+- list comprehension
+- power with double *
+- numpy array from list
+- transpose
+
+%% Cell type:code id: tags:
+
+``` python
+# Define RBF atom
+# two options:
+#   inline function definition (not available in matlab)
+#   lambda : like @ in matlab
+
+sig = 2 # extent of an atom
+# option 1
+# def rbf(x,c):
+#     return np.exp(-(x-c)**2/sig**2)
+# option 2
+rbf = lambda x,c : np.exp(-(x-c)**2/sig**2)
+
+
+# create a design matrix
+# (using list comprehension to show off)
+ci     = np.linspace(0,10,20)   # centres of the atoms
+desmat = [rbf(x,c) for c in ci] # desmat contains a list of atoms
+
+# from list to numpy array
+desmat = np.asarray(desmat).T
+```
+
+%% Cell type:markdown id: tags:
+
+Now we can fit to the sine wave with pseudoinverse.
+
+In this section:
+
+- pinv
+- basic plotting and prettifying
+- saving figure to file
+
+%% Cell type:code id: tags:
+
+``` python
+
+# invert model
+beta   = np.linalg.pinv(desmat)@y.T
+
+
+import matplotlib.pyplot as plt
+
+# plot data, RBFs, and fitted model
+plt.figure()
+plt.plot(x,y,'.')
+plt.plot(x,desmat,'k',alpha=.2)
+plt.plot(x,desmat@beta)
+
+# make it pretty
+plt.grid()
+plt.xlabel('x')
+plt.ylabel('y')
+plt.title('RBF fitting')
+plt.savefig('/Users/saad/Desktop/RBF.pdf')
+
+```
+
+%% Output
+
+
+
+%% Cell type:markdown id: tags:
+
+## The end.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+%% Cell type:markdown id: tags:
+
+## Matlab v Python : introduction
+
+This is a very small piece of code that fits RBFs to some data
+
+In this section
+
+- numpy
+- matlab-like functions like np.linspace
+
+%% Cell type:code id: tags:
+
+``` python
+import numpy as np
+
+# Generate noisy sine wave
+x = np.linspace(0,10,100)
+y = np.sin(3*x) + np.random.randn(x.size)*.5
+```
+
+%% Cell type:markdown id: tags:
+
+
+We define our Radial basis functions as $\textrm{RBF}(x) = \exp\left[-\frac{(x-c)^2}{\sigma^2}\right]$ where $c$ is the centre and $\sigma$ the extent.  By having multiple such functions with different centres we can fit an arbitrary function.
+
+In this section
+
+- defining a function (two options)
+- list comprehension
+- power with double *
+- numpy array from list
+- transpose
+
+%% Cell type:code id: tags:
+
+``` python
+# Define RBF atom
+# two options:
+#   inline function definition (not available in matlab)
+#   lambda : like @ in matlab
+
+sig = 2 # extent of an atom
+# option 1
+# def rbf(x,c):
+#     return np.exp(-(x-c)**2/sig**2)
+# option 2
+rbf = lambda x,c : np.exp(-(x-c)**2/sig**2)
+
+
+# create a design matrix
+# (using list comprehension to show off)
+ci     = np.linspace(0,10,20)   # centres of the atoms
+desmat = [rbf(x,c) for c in ci] # desmat contains a list of atoms
+
+# from list to numpy array
+desmat = np.asarray(desmat).T
+```
+
+%% Cell type:markdown id: tags:
+
+Now we can fit to the sine wave with pseudoinverse.
+
+In this section:
+
+- pinv
+- basic plotting and prettifying
+- saving figure to file
+
+%% Cell type:code id: tags:
+
+``` python
+
+# invert model
+beta   = np.linalg.pinv(desmat)@y.T
+
+
+import matplotlib.pyplot as plt
+
+# plot data, RBFs, and fitted model
+plt.figure()
+plt.plot(x,y,'.')
+plt.plot(x,desmat,'k',alpha=.2)
+plt.plot(x,desmat@beta)
+
+# make it pretty
+plt.grid()
+plt.xlabel('x')
+plt.ylabel('y')
+plt.title('RBF fitting')
+plt.savefig('/Users/saad/Desktop/RBF.pdf')
+
+```
+
+%% Output
+
+
+
+%% Cell type:markdown id: tags:
+
+## The end.
+
+%% Cell type:code id: tags:
+
+``` python
+```
--- a/talks/matlab_vs_python/rbf/play_rbf.m
+++ b/talks/matlab_vs_python/rbf/play_rbf.m
+%% Fitting Radial Basis Functions to some data
+
+% Generate a noisy sine wave and plot it
+x = linspace(0,10,100);
+y = sin(3*x) + randn(size(x))*.5;
+    
+figure,
+plot(x,y)
+
+%% Fit RBF
+
+% This defines a RBF atom function
+sig = 2;
+rbf = @(x,c)(exp(-(x-c).^2/sig^2));
+
+% design matrix for fitting
+xi     = linspace(0,10,20);
+desmat = zeros(length(x),length(xi));
+for i=1:length(xi)
+    % each column is an RBF centered around xi
+    desmat(:,i) = rbf(x,xi(i));
+end
+% fit model
+beta = desmat\y';
+
+% plot fit
+figure(1),hold on
+plot(x,y,'.','markersize',10)
+h = plot(x,desmat,'k'); for i =1:20;h(i).Color=[0,0,0,0.2];end
+plot(x,desmat*beta,'-','linewidth',2)
+% make it a little prettier
+grid on
+xlabel('x')
+ylabel('y')
+set(gca,'fontsize',16)
+title('RBF fitting')
+
+%% save figure
+print -depsc ~/Desktop/RBF.eps 
+close(1)
+!open  ~/Desktop/RBF.eps
+
--- a/talks/multiprocessing/Multiprocessing.ipynb
+++ b/talks/multiprocessing/Multiprocessing.ipynb
+%% Cell type:markdown id: tags:
+
+![Running in parallel](parallel.png)
+
+# Multiprocessing and multithreading in Python
+
+## Why use multiprocessing?
+
+%% Cell type:code id: tags:
+
+``` python
+import multiprocessing
+multiprocessing.cpu_count()
+```
+
+%% Output
+
+    4
+
+%% Cell type:markdown id: tags:
+
+*Almost all CPUs these days are multi-core.*
+
+CPU-intensive programs will not be efficient unless they take advantage of this!
+
+%% Cell type:markdown id: tags:
+
+# General plan
+
+Walk through a basic application of multiprocessing, hopefully relevant to the kind of work you might want to do.
+
+Not a comprehensive guide to Python multithreading/multiprocessing.
+
+%% Cell type:markdown id: tags:
+
+![voxel](voxel.png)
+
+## Sample application
+
+Assume we are doing some voxelwise image processing - i.e. running a computationally intensive calculation *independently* on each voxel in a (possibly large) image.
+
+*(Such problems are sometimes called 'embarrassingly parallel')*
+
+This is in a Python module called my_analysis. Here we simulate this by just calculating a large number of exponentials for each voxel.
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis.py
+
+import math
+import numpy
+
+def calculate_voxel(val):
+    # 'Slow' voxelwise calculation
+    for i in range(30000):
+        b = math.exp(val)
+    return b
+```
+
+%% Cell type:markdown id: tags:
+
+We're going to run this on a Numpy array. `numpy.vectorize` is a convenient function to apply a function to every element of the array, but it is *not* doing anything clever - it is no different than looping over the x, y, and z co-ordinates.
+
+We're also giving the data an ID - this will be used later when we have multiple threads.
+
+%% Cell type:code id: tags:
+
+``` python
+def calculate_data(data, id=0):
+    # Run 'calculate_voxel' on each voxel in data
+    print("Id: %i: Processing %i voxels" % (id, data.size))
+    vectorized = numpy.vectorize(calculate_voxel)
+    vectorized(data)
+    print("Id: %i: Done" % id)
+    return data
+
+```
+
+%% Cell type:markdown id: tags:
+
+Here's some Python code to run our analysis on a random Numpy array, and time how long it takes
+
+%% Cell type:code id: tags:
+
+``` python
+import numpy
+import timeit
+
+import my_analysis
+
+def run():
+    data = numpy.random.rand(16, 16, 16)
+    my_analysis.calculate_data(data)
+
+t = timeit.timeit(run, number=1)
+print("Data processing took %.2f seconds" % t)
+```
+
+%% Output
+
+    Id: 0: Processing 4096 voxels
+    Id: 0: Done
+    Data processing took 26.44 seconds
+
+%% Cell type:markdown id: tags:
+
+So, it took a little while.
+
+%% Cell type:markdown id: tags:
+
+If we watch what's going on while this runs, we can see the program is not using all of our CPU. It's only working on one core.
+
+![Running in serial](onecore.png)
+
+%% Cell type:markdown id: tags:
+
+## What we want
+
+It would be nice to split the data up into chunks and give one to each core. Then we could get through the processing 8 times as fast.
+
+![Running in parallel](multicore.png)
+
+%% Cell type:markdown id: tags:
+
+# Multithreading attempt
+
+*Threads* are a way for a program to run more than one task at a time. Let's try using this on our application, using the Python `threading` module.
+
+%% Cell type:markdown id: tags:
+
+## Splitting the data up
+
+We're going to need to split the data up into chunks. Numpy has a handy function `numpy.split` which slices the array up into equal portions along a specified axis:
+
+    chunks = numpy.split(full_data, num_chunks, axis)
+
+*The data must split up equally along this axis! We will solve this problem later*
+
+%% Cell type:markdown id: tags:
+
+## Creating a new thread for each chunk
+
+    def function_to_call(args, arg2, arg3):
+       ...do something
+
+    ...
+
+    import threading
+    thread = threading.Thread(target=function_to_call,
+                              args=[arg1, arg2, arg3])
+
+%% Cell type:markdown id: tags:
+
+## Waiting for the threads to complete
+
+    thread.join()
+
+- This waits until `thread` has completed
+- So, if we have more than one thread we need to keep a list and wait for them all to finish:
+
+
+    for thread in threads:
+        thread.join()
+
+%% Cell type:markdown id: tags:
+
+## Example code
+
+The example code is below - let's see how it does!
+
+%% Cell type:code id: tags:
+
+``` python
+import threading
+
+def multithread_process(data):
+    n_workers = 4
+
+    # Split the data into chunks along axis 0
+    # We are assuming this axis is divisible by the number of workers!
+    chunks = numpy.split(data, n_workers, axis=0)
+
+    # Start a worker for each chunk
+    workers = []
+    for idx, chunk in enumerate(chunks):
+        print("Starting worker for part %i" % idx)
+        w = threading.Thread(target=my_analysis.calculate_data, args=[chunk, idx])
+        workers.append(w)
+        w.start()
+
+    # Wait for each worker to finish
+    for w in workers:
+        w.join()
+
+def run_with_threads():
+    data = numpy.random.rand(16, 16, 16)
+    multithread_process(data)
+
+t = timeit.timeit(run_with_threads, number=1)
+print("Data processing took %.2f seconds" % t)
+```
+
+%% Output
+
+    Starting worker for part 0
+    Starting worker for part 1
+     Id: 0: Processing 1024 voxels
+    Id: 1: Processing 1024 voxels
+    Starting worker for part 2Id: 2: Processing 1024 voxels
+    
+    Starting worker for part 3Id: 3: Processing 1024 voxels
+    
+    Id: 1: DoneId: 0: Done
+    
+    Id: 2: Done
+    Id: 3: Done
+    Data processing took 132.90 seconds
+
+%% Cell type:markdown id: tags:
+
+# The Big Problem with Python threads
+
+%% Cell type:markdown id: tags:
+
+**Only one thread can execute Python code at a time**
+
+![Python multithreading](thread_gil.png)
+
+This is what's really going on.
+
+%% Cell type:markdown id: tags:
+
+The reason is something called the **Global Interpreter Lock (GIL)**. Only one thread can have it, and you can only execute Python code when you have the GIL.
+
+%% Cell type:markdown id: tags:
+
+## So, does that mean Python threads are useless?
+
+No, not completely. They're useful for:
+
+- Making a user interface continue to respond while a calculation takes place in the background
+- A web server handling multiple requests.
+  - *The GIL is not required while waiting for network connections*
+- Doing calculations in parallel which are running in native (C/C++) code
+  - *The GIL is not required while running native code*
+
+%% Cell type:markdown id: tags:
+
+
+### But for doing CPU-intensive Python calculations in parallel, yes Python threads are essentially useless
+
+%% Cell type:markdown id: tags:
+
+## Can multiprocessing help?
+
+%% Cell type:markdown id: tags:
+
+### Differences between threads and processes
+
+- Threads are quicker to start up and generally require fewer resources
+
+- Threads share memory with the main process
+  - Don't need to copy your data to pass it to a thread
+  - Don't need to copy the output data back to the main program
+
+- Processes have their own memory space
+  - Data needs to be copied from the main program to the process
+  - Any output needs to be copied back
+
+- However, importantly for Python, *Each process has its own GIL so they can run at the same time as others*
+
+%% Cell type:markdown id: tags:
+
+## Multiprocessing attempt
+
+Multiprocessing is normally more work than multithreading.
+
+However Python tries *very hard* to make multiprocessing as easy as multithreading.
+
+- `import multiprocessing` instead of `import threading`
+- `multiprocessing.Process()` instead of `threading.Thread()`
+
+%% Cell type:code id: tags:
+
+``` python
+import multiprocessing
+
+def multiprocess_process(data):
+    n_workers = 4
+
+    # Split the data into chunks along axis 0
+    # We are assuming this axis is divisible by the number of workers!
+    chunks = numpy.split(data, n_workers, axis=0)
+
+    workers = []
+    for idx, chunk in enumerate(chunks):
+        print("Starting worker for chunk %i" % idx)
+        w = multiprocessing.Process(target=my_analysis.calculate_data, args=[chunk, idx])
+        workers.append(w)
+        w.start()
+
+    # Wait for workers to complete
+    for w in workers:
+        w.join()
+
+def run_with_processes():
+    data = numpy.random.rand(16, 16, 16)
+    multiprocess_process(data)
+
+if __name__ == "__main__":
+    t = timeit.timeit(run_with_processes, number=1)
+    print("Data processing took %.2f seconds" % t)
+```
+
+%% Output
+
+    Starting worker for chunk 0
+    Starting worker for chunk 1
+    Starting worker for chunk 2
+    Starting worker for chunk 3
+    Data processing took 9.74 seconds
+
+%% Cell type:markdown id: tags:
+
+# Multiprocessing works!
+
+%% Cell type:markdown id: tags:
+
+## BUT
+
+# Caveats and gotchas
+
+Before we just run off and replace all our threads with processes there are a few things we need to bear in mind:
+
+## Data copying
+
+- Python *copied* each chunk of data to each worker. If the data was very large this could be a significant overhead
+- Python needs to know *how* to copy all the data we pass to the process.
+  - This is fine for normal data types (strings, lists, dictionaries, etc) and Numpy arrays
+  - Can get trouble if you try to pass complex objects to your function
+  - Anything you pass to the worker needs to be support the `pickle` module
+
+## The global variable problem
+
+- Can't rely on global variables being copied
+
+%% Cell type:markdown id: tags:
+
+## The output problem
+
+If you change data in a subprocess, your main program will not see it.
+
+## Example:
+
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis.py
+def add_one(data):
+    data += 1
+```
+
+%% Cell type:code id: tags:
+
+``` python
+data = numpy.zeros([2, 2])
+print("Starting with zeros")
+print(data)
+
+#my_analysis.add_one(data)
+#w = threading.Thread(target=my_analysis.add_one, args=[data,])
+w = multiprocessing.Process(target=my_analysis.add_one, args=[data,])
+w.start()
+w.join()
+
+print("I think my worker just added one")
+print(data)
+```
+
+%% Output
+
+    Starting with zeros
+    [[ 0.  0.]
+     [ 0.  0.]]
+    I think my worker just added one
+    [[ 0.  0.]
+     [ 0.  0.]]
+
+%% Cell type:markdown id: tags:
+
+# Making multiprocessing work better
+
+%% Cell type:markdown id: tags:
+
+## Problems to solve:
+
+- Dividing data amongst processes
+- Returning data from process
+- Status updates from process
+
+%% Cell type:markdown id: tags:
+
+## Worker pools
+
+A *Pool* is a fixed number of worker processes which can be given tasks to do.
+
+We can give a pool as many tasks as we like - once a worker finishes a task it will start another, until they're all done.
+
+We can create a pool using:
+
+    multiprocessing.Pool(num_workers)
+
+- If the number of workers in the pool is equal to the number of cores, we should be able to keep our CPU busy.
+- Pools are good for load balancing if some slices are more work than others
+
+%% Cell type:markdown id: tags:
+
+![Worker pool](pool.png)
+
+%% Cell type:markdown id: tags:
+
+## Splitting our data into chunks for the pool
+
+ - Now we can split our data up into as many chunks as we like
+ - Easiest solution is to use 1-voxel slices along one of the axes `split_axis`:
+   - `numpy.split(data, data.shape[split_axis], axis=split_axis)`
+
+%% Cell type:markdown id: tags:
+
+## Giving the tasks to the pool
+
+ - Easiest way is to use:
+
+
+    Pool.map(function, task_args)
+
+ - task_args is a *sequence of sequences*
+   - Each element in `task_args` is a sequence of arguments for one task
+   - The length of `task_args` is the number of tasks
+
+#### Example `task_args` for 5 tasks, each being passed an ID and a chunk of data
+
+    [
+       [0, chunk_1], # task 1
+       [1, chunk_2], # task_2
+       [2, chunk_3], # task 3
+       [3, chunk_4], # task_4
+       [4, chunk_5], # task_5
+    ]
+
+ - If we have a list of chunks we can generate this with `enumerate(chunks)`
+
+
+ - **Arguments are passed to the task in a slightly different way compared to `multiprocessing.Process()`**
+
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis.py
+
+def do_task(args):
+    # Pool.map passes all our arguments as a single tuple, so unpack it
+    # and pass the arguments to the real calculate function.
+    id, data = args
+    return calculate_data(data, id)
+
+```
+
+%% Cell type:markdown id: tags:
+
+ - If you're using Python 3, look into `Pool.starmap`
+
+%% Cell type:markdown id: tags:
+
+## Getting the output and putting it back together
+
+ - `Pool.map()` captures the return value of your worker function
+ - It returns a list of all of the return values for each task
+   - for us this is a list of Numpy arrays, one for each slice
+ - `numpy.concatenate(list_of_slices, split_axis)` will combine them back into a single data item for us
+
+%% Cell type:markdown id: tags:
+
+## The full example
+
+%% Cell type:code id: tags:
+
+``` python
+# Split our data along the x-axis
+SPLIT_AXIS = 0
+
+def pool_process(data):
+    n_workers = 4
+
+    # Split the data into 1-voxel slices along axis 0
+    parts = numpy.split(data, data.shape[SPLIT_AXIS], axis=SPLIT_AXIS)
+
+    print("Input data shape=%s" % str(data.shape))
+    print("Processing %i parts with %i workers" % (len(parts), n_workers))
+
+    # Create a pool - normally this would be 1 worker per CPU core
+    pool = multiprocessing.Pool(n_workers)
+
+    # Send the tasks to the workers
+    list_of_slices = pool.map(my_analysis.do_task, enumerate(parts))
+
+    # Combine the return data back into a single array
+    processed_array = numpy.concatenate(list_of_slices, SPLIT_AXIS)
+
+    print("Processed data, output shape=%s" % str(processed_array.shape))
+
+def run_with_pool():
+    data = numpy.random.rand(23, 13, 11)
+    pool_process(data)
+
+t = timeit.timeit(run_with_pool, number=1)
+print("Data processing took %.2f seconds" % t)
+```
+
+%% Output
+
+    Input data shape=(23L, 13L, 11L)
+    Processing 23 parts with 4 workers
+    Processed data, output shape=(23L, 13L, 11L)
+    Data processing took 8.08 seconds
+
+%% Cell type:markdown id: tags:
+
+# Communication / Status updates?
+
+ - Would be nice if workers could communicate their progress as they work. One way to do this is using a `Queue`.
+
+
+%% Cell type:markdown id: tags:
+
+
+## Queues
+
+A Queue is often used to send status updates from the process to the main program.
+
+ - Shared between the main program and the subprocesses
+ - Create it with `multiprocessing.Manager().Queue()`
+ - Pass it to the worker thread like any other argument
+ - Worker calls `queue.put()` to send some data to the queue
+ - Main program calls `queue.get()` to get data off the queue
+ - Queue is FIFO (First In First Out)
+ - Need to have a thread running which checks the queue for updates every so often
+   - This is a good use for threads!
+
+![Queue](queue_put.png)
+
+
+%% Cell type:markdown id: tags:
+
+## Modify our example to report progress
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis.py
+
+def calculate_data_and_report(args):
+    id, data, queue = args
+
+    # Run 'calculate_voxel' on each voxel in data
+    vectorized = numpy.vectorize(calculate_voxel)
+    vectorized(data)
+
+    # Report our ID and how many voxels we have done to the queue
+    queue.put((id, data.size))
+
+```
+
+%% Cell type:markdown id: tags:
+
+## Create a thread to monitor the queue for updates
+
+I've done this as a class, because it that's the easiest way
+
+%% Cell type:code id: tags:
+
+``` python
+class QueueChecker():
+    def __init__(self, queue, num_voxels, interval_seconds=1):
+        self._queue = queue
+        self._num_voxels = num_voxels
+        self._interval_seconds = interval_seconds
+        self._voxels_done = 0
+        self._cancel = False
+        self._restart_timer()
+
+    def cancel(self):
+        self._cancel = True
+
+    def _restart_timer(self):
+        self._timer = threading.Timer(self._interval_seconds, self._check_queue)
+        self._timer.start()
+
+    def _check_queue(self):
+        while not self._queue.empty():
+            id, voxels_done = self._queue.get()
+            self._voxels_done += voxels_done
+
+        percent = int(100*float(self._voxels_done)/self._num_voxels)
+        print("%i%% complete" % percent)
+        if not self._cancel:
+            self._restart_timer()
+
+```
+
+%% Cell type:markdown id: tags:
+
+## Modify our main program to pass the queue to each of our workers
+
+We need to create the queue and the `QueueChecker` and make sure each task includes a copy of the queue
+
+%% Cell type:code id: tags:
+
+``` python
+# Split our data along the x-axis
+SPLIT_AXIS = 0
+reload(my_analysis)
+
+def pool_process(data):
+    n_workers = 4
+
+    # Split the data into 1-voxel slices along axis 0
+    parts = numpy.split(data, data.shape[SPLIT_AXIS], axis=SPLIT_AXIS)
+
+    print("Input data shape=%s" % str(data.shape))
+    print("We are processing %i parts with %i workers" % (len(parts), n_workers))
+    pool = multiprocessing.Pool(n_workers)
+
+    # Create the queue
+    queue = multiprocessing.Manager().Queue()
+    checker = QueueChecker(queue, data.size)
+
+    # Note that we need to pass the queue as an argument to the worker
+    args = [(id, part, queue) for id, part in enumerate(parts)]
+    list_of_slices = pool.map(my_analysis.calculate_data_and_report, args)
+
+    checker.cancel()
+
+    # Join processed data back together again
+    processed_array = numpy.concatenate(list_of_slices, SPLIT_AXIS)
+    print("Processed data, output shape=%s" % str(processed_array.shape))
+
+def run_with_pool():
+    data = numpy.random.rand(23, 19, 17)
+    pool_process(data)
+
+t = timeit.timeit(run_with_pool, number=1)
+print("Data processing took %.2f seconds" % t)
+```
+
+%% Output
+
+    Input data shape=(23L, 19L, 17L)
+    We are processing 23 parts with 4 workers
+    0% complete
+    0% complete
+    13% complete
+    17% complete
+    17% complete
+    34% complete
+    34% complete
+    47% complete
+    52% complete
+    56% complete
+    69% complete
+    69% complete
+    78% complete
+    86% complete
+    95% complete
+    Processed data, output shape=(23L, 19L, 17L)
+    Data processing took 17.39 seconds
+    100% complete
+
+%% Cell type:markdown id: tags:
+
+# Summary
+
+%% Cell type:markdown id: tags:
+
+## What we've covered
+
+ - Limitations of threading for parallel processing in Python
+ - How to split up a simple voxel-processing task into separate chunks
+   - `numpy.split()`
+ - How to run each chunk in parallel using multiprocessing
+   - `multiprocessing.Process`
+ - How to separate the number of tasks from the number of workers
+   - `multiprocessing.Pool()`
+   - `Pool.map()`
+ - How to get output back from the workers and join it back together again
+   - `numpy.concatenate()`
+ - How to pass back progress information from our worker processes
+   - `multiprocessing.manager.Queue()`
+ - Using a threading.Timer object to monitor the queue and display updates
+
+%% Cell type:markdown id: tags:
+
+## Things I haven't covered
+
+Loads of stuff!
+
+%% Cell type:markdown id: tags:
+
+### Threading
+
+- Locking of shared data (so only one thread can use it at a time)
+- Thread-local storage (see `threading.local()`)
+- See Paul's tutorial on the PyTreat GIT for more information
+- Or see the `threading` Python documentation for full details
+
+%% Cell type:markdown id: tags:
+
+### Multiprocessing
+
+- Passing data *between* workers
+  - Can use `Queue` for one-way traffic
+  - Use `Pipe` for two-way communication between one worker and another
+  - May be required when your problem is not 'embarrasingly parallel'
+- Sharing memory
+  - Way to avoid copying large amounts of data
+  - Look at `multiprocessing.Array`
+  - Need to convert Numpy array into a ctypes array
+  - Shared memory has pitfalls
+  - *Don't go here unless you have aready determined that data copying is a bottleneck*
+- Running workers asynchronously
+  - So main program doesn't have to wait for them to finish
+  - Instead, a function is called every time a task is finished
+  - see `multiprocessing.apply_async()` for more information
+- Error handling
+  - Needs a bit of care - very easy to 'lose' errors
+  - Workers should catch all exceptions
+  - And should return a value to signal when a task has failed
+  - Main program decides how to deal with it
+
+%% Cell type:markdown id: tags:
+
+## Always remember
+
+**Python is not the best tool for every job!**
+
+If you are really after performance, consider implementing your algorithm in multi-threaded C/C++ and then create a Python interface.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+%% Cell type:markdown id: tags:
+
+![Running in parallel](parallel.png)
+
+# Multiprocessing and multithreading in Python
+
+## Why use multiprocessing?
+
+%% Cell type:code id: tags:
+
+``` python
+import multiprocessing
+multiprocessing.cpu_count()
+```
+
+%% Output
+
+    4
+
+%% Cell type:markdown id: tags:
+
+*Almost all CPUs these days are multi-core.*
+
+CPU-intensive programs will not be efficient unless they take advantage of this!
+
+%% Cell type:markdown id: tags:
+
+# General plan
+
+Walk through a basic application of multiprocessing, hopefully relevant to the kind of work you might want to do.
+
+Not a comprehensive guide to Python multithreading/multiprocessing.
+
+%% Cell type:markdown id: tags:
+
+![voxel](voxel.png)
+
+## Sample application
+
+Assume we are doing some voxelwise image processing - i.e. running a computationally intensive calculation *independently* on each voxel in a (possibly large) image.
+
+*(Such problems are sometimes called 'embarrassingly parallel')*
+
+This is in a Python module called my_analysis. Here we simulate this by just calculating a large number of exponentials for each voxel.
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis.py
+
+import math
+import numpy
+
+def calculate_voxel(val):
+    # 'Slow' voxelwise calculation
+    for i in range(30000):
+        b = math.exp(val)
+    return b
+```
+
+%% Cell type:markdown id: tags:
+
+We're going to run this on a Numpy array. `numpy.vectorize` is a convenient function to apply a function to every element of the array, but it is *not* doing anything clever - it is no different than looping over the x, y, and z co-ordinates.
+
+We're also giving the data an ID - this will be used later when we have multiple threads.
+
+%% Cell type:code id: tags:
+
+``` python
+def calculate_data(data, id=0):
+    # Run 'calculate_voxel' on each voxel in data
+    print("Id: %i: Processing %i voxels" % (id, data.size))
+    vectorized = numpy.vectorize(calculate_voxel)
+    vectorized(data)
+    print("Id: %i: Done" % id)
+    return data
+
+```
+
+%% Cell type:markdown id: tags:
+
+Here's some Python code to run our analysis on a random Numpy array, and time how long it takes
+
+%% Cell type:code id: tags:
+
+``` python
+import numpy
+import timeit
+
+import my_analysis
+
+def run():
+    data = numpy.random.rand(16, 16, 16)
+    my_analysis.calculate_data(data)
+
+t = timeit.timeit(run, number=1)
+print("Data processing took %.2f seconds" % t)
+```
+
+%% Output
+
+    Id: 0: Processing 4096 voxels
+    Id: 0: Done
+    Data processing took 26.44 seconds
+
+%% Cell type:markdown id: tags:
+
+So, it took a little while.
+
+%% Cell type:markdown id: tags:
+
+If we watch what's going on while this runs, we can see the program is not using all of our CPU. It's only working on one core.
+
+![Running in serial](onecore.png)
+
+%% Cell type:markdown id: tags:
+
+## What we want
+
+It would be nice to split the data up into chunks and give one to each core. Then we could get through the processing 8 times as fast.
+
+![Running in parallel](multicore.png)
+
+%% Cell type:markdown id: tags:
+
+# Multithreading attempt
+
+*Threads* are a way for a program to run more than one task at a time. Let's try using this on our application, using the Python `threading` module.
+
+%% Cell type:markdown id: tags:
+
+## Splitting the data up
+
+We're going to need to split the data up into chunks. Numpy has a handy function `numpy.split` which slices the array up into equal portions along a specified axis:
+
+    chunks = numpy.split(full_data, num_chunks, axis)
+
+*The data must split up equally along this axis! We will solve this problem later*
+
+%% Cell type:markdown id: tags:
+
+## Creating a new thread for each chunk
+
+    def function_to_call(args, arg2, arg3):
+       ...do something
+
+    ...
+
+    import threading
+    thread = threading.Thread(target=function_to_call,
+                              args=[arg1, arg2, arg3])
+
+%% Cell type:markdown id: tags:
+
+## Waiting for the threads to complete
+
+    thread.join()
+
+- This waits until `thread` has completed
+- So, if we have more than one thread we need to keep a list and wait for them all to finish:
+
+
+    for thread in threads:
+        thread.join()
+
+%% Cell type:markdown id: tags:
+
+## Example code
+
+The example code is below - let's see how it does!
+
+%% Cell type:code id: tags:
+
+``` python
+import threading
+
+def multithread_process(data):
+    n_workers = 4
+
+    # Split the data into chunks along axis 0
+    # We are assuming this axis is divisible by the number of workers!
+    chunks = numpy.split(data, n_workers, axis=0)
+
+    # Start a worker for each chunk
+    workers = []
+    for idx, chunk in enumerate(chunks):
+        print("Starting worker for part %i" % idx)
+        w = threading.Thread(target=my_analysis.calculate_data, args=[chunk, idx])
+        workers.append(w)
+        w.start()
+
+    # Wait for each worker to finish
+    for w in workers:
+        w.join()
+
+def run_with_threads():
+    data = numpy.random.rand(16, 16, 16)
+    multithread_process(data)
+
+t = timeit.timeit(run_with_threads, number=1)
+print("Data processing took %.2f seconds" % t)
+```
+
+%% Output
+
+    Starting worker for part 0
+    Starting worker for part 1
+     Id: 0: Processing 1024 voxels
+    Id: 1: Processing 1024 voxels
+    Starting worker for part 2Id: 2: Processing 1024 voxels
+    
+    Starting worker for part 3Id: 3: Processing 1024 voxels
+    
+    Id: 1: DoneId: 0: Done
+    
+    Id: 2: Done
+    Id: 3: Done
+    Data processing took 132.90 seconds
+
+%% Cell type:markdown id: tags:
+
+# The Big Problem with Python threads
+
+%% Cell type:markdown id: tags:
+
+**Only one thread can execute Python code at a time**
+
+![Python multithreading](thread_gil.png)
+
+This is what's really going on.
+
+%% Cell type:markdown id: tags:
+
+The reason is something called the **Global Interpreter Lock (GIL)**. Only one thread can have it, and you can only execute Python code when you have the GIL.
+
+%% Cell type:markdown id: tags:
+
+## So, does that mean Python threads are useless?
+
+No, not completely. They're useful for:
+
+- Making a user interface continue to respond while a calculation takes place in the background
+- A web server handling multiple requests.
+  - *The GIL is not required while waiting for network connections*
+- Doing calculations in parallel which are running in native (C/C++) code
+  - *The GIL is not required while running native code*
+
+%% Cell type:markdown id: tags:
+
+
+### But for doing CPU-intensive Python calculations in parallel, yes Python threads are essentially useless
+
+%% Cell type:markdown id: tags:
+
+## Can multiprocessing help?
+
+%% Cell type:markdown id: tags:
+
+### Differences between threads and processes
+
+- Threads are quicker to start up and generally require fewer resources
+
+- Threads share memory with the main process
+  - Don't need to copy your data to pass it to a thread
+  - Don't need to copy the output data back to the main program
+
+- Processes have their own memory space
+  - Data needs to be copied from the main program to the process
+  - Any output needs to be copied back
+
+- However, importantly for Python, *Each process has its own GIL so they can run at the same time as others*
+
+%% Cell type:markdown id: tags:
+
+## Multiprocessing attempt
+
+Multiprocessing is normally more work than multithreading.
+
+However Python tries *very hard* to make multiprocessing as easy as multithreading.
+
+- `import multiprocessing` instead of `import threading`
+- `multiprocessing.Process()` instead of `threading.Thread()`
+
+%% Cell type:code id: tags:
+
+``` python
+import multiprocessing
+
+def multiprocess_process(data):
+    n_workers = 4
+
+    # Split the data into chunks along axis 0
+    # We are assuming this axis is divisible by the number of workers!
+    chunks = numpy.split(data, n_workers, axis=0)
+
+    workers = []
+    for idx, chunk in enumerate(chunks):
+        print("Starting worker for chunk %i" % idx)
+        w = multiprocessing.Process(target=my_analysis.calculate_data, args=[chunk, idx])
+        workers.append(w)
+        w.start()
+
+    # Wait for workers to complete
+    for w in workers:
+        w.join()
+
+def run_with_processes():
+    data = numpy.random.rand(16, 16, 16)
+    multiprocess_process(data)
+
+if __name__ == "__main__":
+    t = timeit.timeit(run_with_processes, number=1)
+    print("Data processing took %.2f seconds" % t)
+```
+
+%% Output
+
+    Starting worker for chunk 0
+    Starting worker for chunk 1
+    Starting worker for chunk 2
+    Starting worker for chunk 3
+    Data processing took 9.74 seconds
+
+%% Cell type:markdown id: tags:
+
+# Multiprocessing works!
+
+%% Cell type:markdown id: tags:
+
+## BUT
+
+# Caveats and gotchas
+
+Before we just run off and replace all our threads with processes there are a few things we need to bear in mind:
+
+## Data copying
+
+- Python *copied* each chunk of data to each worker. If the data was very large this could be a significant overhead
+- Python needs to know *how* to copy all the data we pass to the process.
+  - This is fine for normal data types (strings, lists, dictionaries, etc) and Numpy arrays
+  - Can get trouble if you try to pass complex objects to your function
+  - Anything you pass to the worker needs to be support the `pickle` module
+
+## The global variable problem
+
+- Can't rely on global variables being copied
+
+%% Cell type:markdown id: tags:
+
+## The output problem
+
+If you change data in a subprocess, your main program will not see it.
+
+## Example:
+
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis.py
+def add_one(data):
+    data += 1
+```
+
+%% Cell type:code id: tags:
+
+``` python
+data = numpy.zeros([2, 2])
+print("Starting with zeros")
+print(data)
+
+#my_analysis.add_one(data)
+#w = threading.Thread(target=my_analysis.add_one, args=[data,])
+w = multiprocessing.Process(target=my_analysis.add_one, args=[data,])
+w.start()
+w.join()
+
+print("I think my worker just added one")
+print(data)
+```
+
+%% Output
+
+    Starting with zeros
+    [[ 0.  0.]
+     [ 0.  0.]]
+    I think my worker just added one
+    [[ 0.  0.]
+     [ 0.  0.]]
+
+%% Cell type:markdown id: tags:
+
+# Making multiprocessing work better
+
+%% Cell type:markdown id: tags:
+
+## Problems to solve:
+
+- Dividing data amongst processes
+- Returning data from process
+- Status updates from process
+
+%% Cell type:markdown id: tags:
+
+## Worker pools
+
+A *Pool* is a fixed number of worker processes which can be given tasks to do.
+
+We can give a pool as many tasks as we like - once a worker finishes a task it will start another, until they're all done.
+
+We can create a pool using:
+
+    multiprocessing.Pool(num_workers)
+
+- If the number of workers in the pool is equal to the number of cores, we should be able to keep our CPU busy.
+- Pools are good for load balancing if some slices are more work than others
+
+%% Cell type:markdown id: tags:
+
+![Worker pool](pool.png)
+
+%% Cell type:markdown id: tags:
+
+## Splitting our data into chunks for the pool
+
+ - Now we can split our data up into as many chunks as we like
+ - Easiest solution is to use 1-voxel slices along one of the axes `split_axis`:
+   - `numpy.split(data, data.shape[split_axis], axis=split_axis)`
+
+%% Cell type:markdown id: tags:
+
+## Giving the tasks to the pool
+
+ - Easiest way is to use:
+
+
+    Pool.map(function, task_args)
+
+ - task_args is a *sequence of sequences*
+   - Each element in `task_args` is a sequence of arguments for one task
+   - The length of `task_args` is the number of tasks
+
+#### Example `task_args` for 5 tasks, each being passed an ID and a chunk of data
+
+    [
+       [0, chunk_1], # task 1
+       [1, chunk_2], # task_2
+       [2, chunk_3], # task 3
+       [3, chunk_4], # task_4
+       [4, chunk_5], # task_5
+    ]
+
+ - If we have a list of chunks we can generate this with `enumerate(chunks)`
+
+
+ - **Arguments are passed to the task in a slightly different way compared to `multiprocessing.Process()`**
+
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis.py
+
+def do_task(args):
+    # Pool.map passes all our arguments as a single tuple, so unpack it
+    # and pass the arguments to the real calculate function.
+    id, data = args
+    return calculate_data(data, id)
+
+```
+
+%% Cell type:markdown id: tags:
+
+ - If you're using Python 3, look into `Pool.starmap`
+
+%% Cell type:markdown id: tags:
+
+## Getting the output and putting it back together
+
+ - `Pool.map()` captures the return value of your worker function
+ - It returns a list of all of the return values for each task
+   - for us this is a list of Numpy arrays, one for each slice
+ - `numpy.concatenate(list_of_slices, split_axis)` will combine them back into a single data item for us
+
+%% Cell type:markdown id: tags:
+
+## The full example
+
+%% Cell type:code id: tags:
+
+``` python
+# Split our data along the x-axis
+SPLIT_AXIS = 0
+
+def pool_process(data):
+    n_workers = 4
+
+    # Split the data into 1-voxel slices along axis 0
+    parts = numpy.split(data, data.shape[SPLIT_AXIS], axis=SPLIT_AXIS)
+
+    print("Input data shape=%s" % str(data.shape))
+    print("Processing %i parts with %i workers" % (len(parts), n_workers))
+
+    # Create a pool - normally this would be 1 worker per CPU core
+    pool = multiprocessing.Pool(n_workers)
+
+    # Send the tasks to the workers
+    list_of_slices = pool.map(my_analysis.do_task, enumerate(parts))
+
+    # Combine the return data back into a single array
+    processed_array = numpy.concatenate(list_of_slices, SPLIT_AXIS)
+
+    print("Processed data, output shape=%s" % str(processed_array.shape))
+
+def run_with_pool():
+    data = numpy.random.rand(23, 13, 11)
+    pool_process(data)
+
+t = timeit.timeit(run_with_pool, number=1)
+print("Data processing took %.2f seconds" % t)
+```
+
+%% Output
+
+    Input data shape=(23L, 13L, 11L)
+    Processing 23 parts with 4 workers
+    Processed data, output shape=(23L, 13L, 11L)
+    Data processing took 8.08 seconds
+
+%% Cell type:markdown id: tags:
+
+# Communication / Status updates?
+
+ - Would be nice if workers could communicate their progress as they work. One way to do this is using a `Queue`.
+
+
+%% Cell type:markdown id: tags:
+
+
+## Queues
+
+A Queue is often used to send status updates from the process to the main program.
+
+ - Shared between the main program and the subprocesses
+ - Create it with `multiprocessing.Manager().Queue()`
+ - Pass it to the worker thread like any other argument
+ - Worker calls `queue.put()` to send some data to the queue
+ - Main program calls `queue.get()` to get data off the queue
+ - Queue is FIFO (First In First Out)
+ - Need to have a thread running which checks the queue for updates every so often
+   - This is a good use for threads!
+
+![Queue](queue_put.png)
+
+
+%% Cell type:markdown id: tags:
+
+## Modify our example to report progress
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis.py
+
+def calculate_data_and_report(args):
+    id, data, queue = args
+
+    # Run 'calculate_voxel' on each voxel in data
+    vectorized = numpy.vectorize(calculate_voxel)
+    vectorized(data)
+
+    # Report our ID and how many voxels we have done to the queue
+    queue.put((id, data.size))
+
+```
+
+%% Cell type:markdown id: tags:
+
+## Create a thread to monitor the queue for updates
+
+I've done this as a class, because it that's the easiest way
+
+%% Cell type:code id: tags:
+
+``` python
+class QueueChecker():
+    def __init__(self, queue, num_voxels, interval_seconds=1):
+        self._queue = queue
+        self._num_voxels = num_voxels
+        self._interval_seconds = interval_seconds
+        self._voxels_done = 0
+        self._cancel = False
+        self._restart_timer()
+
+    def cancel(self):
+        self._cancel = True
+
+    def _restart_timer(self):
+        self._timer = threading.Timer(self._interval_seconds, self._check_queue)
+        self._timer.start()
+
+    def _check_queue(self):
+        while not self._queue.empty():
+            id, voxels_done = self._queue.get()
+            self._voxels_done += voxels_done
+
+        percent = int(100*float(self._voxels_done)/self._num_voxels)
+        print("%i%% complete" % percent)
+        if not self._cancel:
+            self._restart_timer()
+
+```
+
+%% Cell type:markdown id: tags:
+
+## Modify our main program to pass the queue to each of our workers
+
+We need to create the queue and the `QueueChecker` and make sure each task includes a copy of the queue
+
+%% Cell type:code id: tags:
+
+``` python
+# Split our data along the x-axis
+SPLIT_AXIS = 0
+reload(my_analysis)
+
+def pool_process(data):
+    n_workers = 4
+
+    # Split the data into 1-voxel slices along axis 0
+    parts = numpy.split(data, data.shape[SPLIT_AXIS], axis=SPLIT_AXIS)
+
+    print("Input data shape=%s" % str(data.shape))
+    print("We are processing %i parts with %i workers" % (len(parts), n_workers))
+    pool = multiprocessing.Pool(n_workers)
+
+    # Create the queue
+    queue = multiprocessing.Manager().Queue()
+    checker = QueueChecker(queue, data.size)
+
+    # Note that we need to pass the queue as an argument to the worker
+    args = [(id, part, queue) for id, part in enumerate(parts)]
+    list_of_slices = pool.map(my_analysis.calculate_data_and_report, args)
+
+    checker.cancel()
+
+    # Join processed data back together again
+    processed_array = numpy.concatenate(list_of_slices, SPLIT_AXIS)
+    print("Processed data, output shape=%s" % str(processed_array.shape))
+
+def run_with_pool():
+    data = numpy.random.rand(23, 19, 17)
+    pool_process(data)
+
+t = timeit.timeit(run_with_pool, number=1)
+print("Data processing took %.2f seconds" % t)
+```
+
+%% Output
+
+    Input data shape=(23L, 19L, 17L)
+    We are processing 23 parts with 4 workers
+    0% complete
+    0% complete
+    13% complete
+    17% complete
+    17% complete
+    34% complete
+    34% complete
+    47% complete
+    52% complete
+    56% complete
+    69% complete
+    69% complete
+    78% complete
+    86% complete
+    95% complete
+    Processed data, output shape=(23L, 19L, 17L)
+    Data processing took 17.39 seconds
+    100% complete
+
+%% Cell type:markdown id: tags:
+
+# Summary
+
+%% Cell type:markdown id: tags:
+
+## What we've covered
+
+ - Limitations of threading for parallel processing in Python
+ - How to split up a simple voxel-processing task into separate chunks
+   - `numpy.split()`
+ - How to run each chunk in parallel using multiprocessing
+   - `multiprocessing.Process`
+ - How to separate the number of tasks from the number of workers
+   - `multiprocessing.Pool()`
+   - `Pool.map()`
+ - How to get output back from the workers and join it back together again
+   - `numpy.concatenate()`
+ - How to pass back progress information from our worker processes
+   - `multiprocessing.manager.Queue()`
+ - Using a threading.Timer object to monitor the queue and display updates
+
+%% Cell type:markdown id: tags:
+
+## Things I haven't covered
+
+Loads of stuff!
+
+%% Cell type:markdown id: tags:
+
+### Threading
+
+- Locking of shared data (so only one thread can use it at a time)
+- Thread-local storage (see `threading.local()`)
+- See Paul's tutorial on the PyTreat GIT for more information
+- Or see the `threading` Python documentation for full details
+
+%% Cell type:markdown id: tags:
+
+### Multiprocessing
+
+- Passing data *between* workers
+  - Can use `Queue` for one-way traffic
+  - Use `Pipe` for two-way communication between one worker and another
+  - May be required when your problem is not 'embarrasingly parallel'
+- Sharing memory
+  - Way to avoid copying large amounts of data
+  - Look at `multiprocessing.Array`
+  - Need to convert Numpy array into a ctypes array
+  - Shared memory has pitfalls
+  - *Don't go here unless you have aready determined that data copying is a bottleneck*
+- Running workers asynchronously
+  - So main program doesn't have to wait for them to finish
+  - Instead, a function is called every time a task is finished
+  - see `multiprocessing.apply_async()` for more information
+- Error handling
+  - Needs a bit of care - very easy to 'lose' errors
+  - Workers should catch all exceptions
+  - And should return a value to signal when a task has failed
+  - Main program decides how to deal with it
+
+%% Cell type:markdown id: tags:
+
+## Always remember
+
+**Python is not the best tool for every job!**
+
+If you are really after performance, consider implementing your algorithm in multi-threaded C/C++ and then create a Python interface.
+
+%% Cell type:code id: tags:
+
+``` python
+```
--- a/talks/packages/packages.ipynb
+++ b/talks/packages/packages.ipynb
 %% Cell type:markdown id: tags:

 # Main scientific python libraries
 See https://scipy.org/

 Most of these packages have or are in the progress of dropping support for python2.
 So use python3!

 ## [Numpy](http://www.numpy.org/): arrays
 This is the main library underlying (nearly) all of the scientific python ecosystem.
 See the tutorial in the beginner session or [the official numpy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html) for usage details.

 The usual nickname of numpy is np:

 %% Cell type:code id: tags:

 ``` 
 import numpy as np
 ```

 %% Cell type:markdown id: tags:

 Numpy includes support for:
 - N-dimensional arrays with various datatypes
  - masked arrays
  - matrices
  - structured/record array
 - basic functions (e.g., sin, log, arctan, polynomials)
 - basic linear algebra
 - random number generation

 ## [Scipy](https://scipy.org/scipylib/index.html): most general scientific tools
 At the top level this module includes all of the basic functionality from numpy.
 You could import this as, but you might as well import numpy directly.

 %% Cell type:code id: tags:

 ``` 
 import scipy as sp
 ```

 %% Cell type:markdown id: tags:

 The main strength in scipy lies in its sub-packages:

 %% Cell type:code id: tags:

 ``` 
 from scipy import optimize
 def costfunc(params):
    return (params[0] - 3) ** 2
 optimize.minimize(costfunc, x0=[0], method='l-bfgs-b')
 ```

 %% Cell type:markdown id: tags:

 Tutorials for all sub-packages can be found [here](https://docs.scipy.org/doc/scipy-1.0.0/reference/).

+Alternative for `scipy.ndimage`:
+- [Scikit-image](http://scikit-image.org/docs/stable/auto_examples/) for image manipulation/segmentation/feature detection
+
 ## [Matplotlib](https://matplotlib.org/): Main plotting library

 %% Cell type:code id: tags:

 ``` 
 import matplotlib as mpl
 mpl.use('nbagg')
 import matplotlib.pyplot as plt
 ```

 %% Cell type:markdown id: tags:

 The matplotlib tutorials are [here](https://matplotlib.org/tutorials/index.html)

 %% Cell type:code id: tags:

 ``` 
 x = np.linspace(0, 2, 100)

 plt.plot(x, x, label='linear')
 plt.plot(x, x**2, label='quadratic')
 plt.plot(x, x**3, label='cubic')

 plt.xlabel('x label')
 plt.ylabel('y label')

 plt.title("Simple Plot")

 plt.legend()

 plt.show()
 ```

 %% Cell type:markdown id: tags:

 Alternatives:
 - [Mayavi](http://docs.enthought.com/mayavi/mayavi/): 3D plotting (hard to install)
 - [Bokeh](https://bokeh.pydata.org/en/latest/) among many others: interactive plots in the browser (i.e., in javascript)

 ## [Ipython](http://ipython.org/)/[Jupyter](https://jupyter.org/) notebook: interactive python environments
-Supports:
- run code in multiple languages
-
-%% Cell type:code id: tags:
-
-``` 
-%%bash
-for name in python ruby ; do
-    echo $name
-done
-```
-
-%% Cell type:markdown id: tags:
-
+Features:
 - debugging

 %% Cell type:code id: tags:

 ``` 
 from scipy import optimize
 def costfunc(params):
-    return 1 / params[0] ** 2
-optimize.minimize(costfunc, x0=[0], method='l-bfgs-b')
+    if params[0] <= 0:
+        raise ValueError('Input variable is too low')
+    return 1 / params[0]
+optimize.minimize(costfunc, x0=[3], method='l-bfgs-b')
 ```

 %% Cell type:code id: tags:

 ``` 
 %debug
 ```

 %% Cell type:markdown id: tags:

 - timing/profiling

 %% Cell type:code id: tags:

 ``` 
 %%prun
 plt.plot([0, 3])
 ```

 %% Cell type:markdown id: tags:

 - getting help

 %% Cell type:code id: tags:

 ``` 
 plt.plot?
 ```

 %% Cell type:markdown id: tags:

 - [and much more...](https://ipython.readthedocs.io/en/stable/interactive/magics.html)

 The next generation is already out: [jupyterlab](https://jupyterlab.readthedocs.io/en/latest/)

 There are many [useful extensions available](https://github.com/ipython-contrib/jupyter_contrib_nbextensions).

 ## [Pandas](https://pandas.pydata.org/): Analyzing "clean" data
 Once your data is in tabular form (e.g. Biobank IDP's), you want to use pandas dataframes to analyze them.
 This brings most of the functionality of R into python.
 Pandas has excellent support for:
 - fast IO to many tabular formats
 - accurate handling of missing data
 - Many, many routines to handle data
  - group by categorical data (e.g., male/female)
  - joining/merging data (all SQL-like operations and much more)
  - time series support
 - statistical models through [statsmodels](http://www.statsmodels.org/stable/index.html)
 - plotting though [seaborn](https://seaborn.pydata.org/)
 - Use [dask](https://dask.pydata.org/en/latest/) if your data is too big for memory (or if you want to run in parallel)

 You should also install `numexpr` and `bottleneck` for optimal performance.

 For the documentation check [here](http://pandas.pydata.org/pandas-docs/stable/index.html)

 ### Adjusted example from statsmodels tutorial

 %% Cell type:code id: tags:

 ``` 
 import statsmodels.api as sm
 import statsmodels.formula.api as smf
 import numpy as np
 ```

 %% Cell type:code id: tags:

 ``` 
 df = sm.datasets.get_rdataset("Guerry", "HistData").data
 df
 ```

 %% Cell type:code id: tags:

 ``` 
 df.describe()
 ```

 %% Cell type:code id: tags:

 ``` 
 df.groupby('Region').mean()
 ```

 %% Cell type:code id: tags:

 ``` 
 results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=df).fit()
 results.summary()
 ```

 %% Cell type:code id: tags:

 ``` 
 df['log_pop'] = np.log(df.Pop1831)
 df
 ```

 %% Cell type:code id: tags:

 ``` 
 results = smf.ols('Lottery ~ Literacy + log_pop', data=df).fit()
 results.summary()
 ```

 %% Cell type:code id: tags:

 ``` 
 results = smf.ols('Lottery ~ Literacy + np.log(Pop1831) + Region', data=df).fit()
 results.summary()
 ```

 %% Cell type:code id: tags:

 ``` 
 results = smf.ols('Lottery ~ Literacy + np.log(Pop1831) + Region + Region * Literacy', data=df).fit()
 results.summary()
 ```

 %% Cell type:code id: tags:

 ``` 
 %matplotlib nbagg
 import seaborn as sns
 sns.pairplot(df, hue="Region", vars=('Lottery', 'Literacy', 'log_pop'))
 ```

 %% Cell type:markdown id: tags:

 ## [Sympy](http://www.sympy.org/en/index.html): Symbolic programming

 %% Cell type:code id: tags:

 ``` 
 import sympy as sym  # no standard nickname
 ```

 %% Cell type:code id: tags:

 ``` 
 x, a, b, c = sym.symbols('x, a, b, c')
 sym.solve(a * x ** 2 + b * x + c, x)
 ```

 %% Cell type:code id: tags:

 ``` 
 sym.integrate(x/(x**2+a*x+2), x)
 ```

 %% Cell type:code id: tags:

 ``` 
 f = sym.utilities.lambdify((x, a), sym.integrate((x**2+a*x+2), x))
 f(np.random.rand(10), np.random.rand(10))
 ```

 %% Cell type:markdown id: tags:

 # Other topics
 ## [Argparse](https://docs.python.org/3.6/howto/argparse.html): Command line arguments

 %% Cell type:code id: tags:

 ``` 
 %%writefile test_argparse.py
 import argparse

 def main():
    parser = argparse.ArgumentParser(description="calculate X to the power of Y")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("x", type=int, help="the base")
    parser.add_argument("y", type=int, help="the exponent")
    args = parser.parse_args()
    answer = args.x**args.y

    if args.verbose:
        print("{} to the power {} equals {}".format(args.x, args.y, answer))
    else:
        print("{}^{} == {}".format(args.x, args.y, answer))

 if __name__ == '__main__':
    main()
 ```

 %% Cell type:code id: tags:

 ``` 
 %run test_argparse.py 3 8 -v
 ```

 %% Cell type:code id: tags:

 ``` 
 %run test_argparse.py -h
 ```

 %% Cell type:code id: tags:

 ``` 
 %run test_argparse.py 3 8.5
 ```

 %% Cell type:markdown id: tags:

 Alternatives:
 - [docopt](http://docopt.org/): You write a usage string, docopt will generate the parser
 > ```
 > # example from https://realpython.com/blog/python/comparing-python-command-line-parsing-libraries-argparse-docopt-click/
 > """Greeter.
 >
 > Usage:
 >   commands.py hello
 >   commands.py goodbye
 >   commands.py -h | --help
 >
 > Options:
 >   -h --help     Show this screen.
 > """
 > from docopt import docopt
 >
 > if __name__ == '__main__':
 >     arguments = docopt(__doc__)
 > ```
 - [clize](http://clize.readthedocs.io/en/stable/why.html): You write a function, clize will generate the parser
 > ```
 > from clize import run
 >
 > def echo(word):
 >     return word
 >
 > if __name__ == '__main__':
 >     run(echo)
 > ```

 ### [Gooey](https://github.com/chriskiehl/Gooey): GUI from command line tool

 %% Cell type:code id: tags:

 ``` 
 %%writefile test_gooey.py
 import argparse
 from gooey import Gooey

 @Gooey
 def main():
    parser = argparse.ArgumentParser(description="calculate X to the power of Y")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("x", type=int, help="the base")
    parser.add_argument("y", type=int, help="the exponent")
    args = parser.parse_args()
    answer = args.x**args.y

    if args.verbose:
        print("{} to the power {} equals {}".format(args.x, args.y, answer))
    else:
        print("{}^{} == {}".format(args.x, args.y, answer))

 if __name__ == '__main__':
    main()
 ```

 %% Cell type:code id: tags:

 ``` 
 !python.app test_gooey.py
 ```

 %% Cell type:code id: tags:

 ``` 
 !gcoord_gui
 ```

 %% Cell type:markdown id: tags:

 ## [Jinja2](http://jinja.pocoo.org/docs/2.10/): Templating language
 Jinja2 allows to create templates of files with placeholders, where future content will go.
 This allows for the creation of a large number of similar files.

 This can for example be used to produce static HTML output in a highly flexible manner.

 %% Cell type:code id: tags:

 ``` 
 %%writefile image_list.jinja2
 <!DOCTYPE html>
 <html lang="en">
 <head>
    {% block head %}
    <title>{{ title }}</title>
    {% endblock %}
 </head>
 <body>
    <div id="content">
        {% block content %}
            {% for description, filenames in images %}
                <p>
                    {{ description }}
                </p>
                {% for filename in filenames %}
                    <a href="{{ filename }}">
                        <img src="{{ filename }}">
                    </a>
                {% endfor %}
            {% endfor %}
        {% endblock %}
    </div>
    <footer>
        Created on {{ time }}
    </footer>
 </body>
 </html>
 ```

 %% Cell type:code id: tags:

 ``` 
 import numpy as np
 import matplotlib.pyplot as plt
 plt.ioff()

 def plot_sine(amplitude, frequency):
    x = np.linspace(0, 2 * np.pi, 100)
    y = amplitude * np.sin(frequency * x)
    plt.plot(x, y)
    plt.xticks([0, np.pi, 2 * np.pi], ['0', '$\pi$', '$2 \pi$'])
    plt.ylim(-1.1, 1.1)
    filename = 'plots/A{:.2f}_F{:.2f}.png'.format(amplitude, frequency)
    plt.title('A={:.2f}, F={:.2f}'.format(amplitude, frequency))
    plt.savefig(filename)
    plt.close(plt.gcf())
    return filename

 !mkdir plots
 amplitudes = [plot_sine(A, 1.) for A in [0.1, 0.3, 0.7, 1.0]]
 frequencies = [plot_sine(1., F) for F in [1, 2, 3, 4, 5, 6]]
 plt.ion()
 ```

 %% Cell type:code id: tags:

 ``` 
 from jinja2 import Environment, FileSystemLoader
 from datetime import datetime
 loader = FileSystemLoader('.')
 env = Environment(loader=loader)
 template = env.get_template('image_list.jinja2')

 images = [
    ('Varying the amplitude', amplitudes),
    ('Varying the frequency', frequencies),
 ]

 with open('image_list.html', 'w') as f:
    f.write(template.render(title='Lots of sines',
                            images=images, time=datetime.now()))
 ```

 %% Cell type:code id: tags:

 ``` 
 !open image_list.html
 ```

 %% Cell type:markdown id: tags:

 ## Neuroimage packages
 The [nipy](http://nipy.org/) ecosystem covers most of these.

 ## [networkx](https://networkx.github.io/): graph theory

 ## GUI
 - [tkinter](https://docs.python.org/3.6/library/tkinter.html): thin wrapper around Tcl/Tk; included in python
 - [wxpython](https://www.wxpython.org/): Wrapper around the C++ wxWidgets library

 %% Cell type:code id: tags:

 ``` 
 %%writefile wx_hello_world.py
 """
 Hello World, but with more meat.
 """

 import wx

 class HelloFrame(wx.Frame):
    """
    A Frame that says Hello World
    """

    def __init__(self, *args, **kw):
        # ensure the parent's __init__ is called
        super(HelloFrame, self).__init__(*args, **kw)

        # create a panel in the frame
        pnl = wx.Panel(self)

        # and put some text with a larger bold font on it
        st = wx.StaticText(pnl, label="Hello World!", pos=(25,25))
        font = st.GetFont()
        font.PointSize += 10
        font = font.Bold()
        st.SetFont(font)

        # create a menu bar
        self.makeMenuBar()

        # and a status bar
        self.CreateStatusBar()
        self.SetStatusText("Welcome to wxPython!")


    def makeMenuBar(self):
        """
        A menu bar is composed of menus, which are composed of menu items.
        This method builds a set of menus and binds handlers to be called
        when the menu item is selected.
        """

        # Make a file menu with Hello and Exit items
        fileMenu = wx.Menu()
        # The "\t..." syntax defines an accelerator key that also triggers
        # the same event
        helloItem = fileMenu.Append(-1, "&Hello...\tCtrl-H",
                "Help string shown in status bar for this menu item")
        fileMenu.AppendSeparator()
        # When using a stock ID we don't need to specify the menu item's
        # label
        exitItem = fileMenu.Append(wx.ID_EXIT)

        # Now a help menu for the about item
        helpMenu = wx.Menu()
        aboutItem = helpMenu.Append(wx.ID_ABOUT)

        # Make the menu bar and add the two menus to it. The '&' defines
        # that the next letter is the "mnemonic" for the menu item. On the
        # platforms that support it those letters are underlined and can be
        # triggered from the keyboard.
        menuBar = wx.MenuBar()
        menuBar.Append(fileMenu, "&File")
        menuBar.Append(helpMenu, "&Help")

        # Give the menu bar to the frame
        self.SetMenuBar(menuBar)

        # Finally, associate a handler function with the EVT_MENU event for
        # each of the menu items. That means that when that menu item is
        # activated then the associated handler function will be called.
        self.Bind(wx.EVT_MENU, self.OnHello, helloItem)
        self.Bind(wx.EVT_MENU, self.OnExit,  exitItem)
        self.Bind(wx.EVT_MENU, self.OnAbout, aboutItem)


    def OnExit(self, event):
        """Close the frame, terminating the application."""
        self.Close(True)


    def OnHello(self, event):
        """Say hello to the user."""
        wx.MessageBox("Hello again from wxPython")


    def OnAbout(self, event):
        """Display an About Dialog"""
        wx.MessageBox("This is a wxPython Hello World sample",
                      "About Hello World 2",
                      wx.OK|wx.ICON_INFORMATION)


 if __name__ == '__main__':
    # When this module is run (not imported) then create the app, the
    # frame, show it, and start the event loop.
    app = wx.App()
    frm = HelloFrame(None, title='Hello World 2')
    frm.Show()
    app.MainLoop()
 ```

 %% Cell type:code id: tags:

 ``` 
 !python.app wx_hello_world.py
 ```

 %% Cell type:markdown id: tags:

 ## Machine learning
 - scikit-learn
 - theano/tensorflow/pytorch
  - keras

 ## [pymc3](http://docs.pymc.io/): Pobabilstic programming

 %% Cell type:code id: tags:

 ``` 
 import numpy as np
 import matplotlib.pyplot as plt

 # Initialize random number generator
 np.random.seed(123)

 # True parameter values
 alpha, sigma = 1, 1
 beta = [1, 2.5]

 # Size of dataset
 size = 100

 # Predictor variable
 X1 = np.random.randn(size)
 X2 = np.random.randn(size) * 0.2

 # Simulate outcome variable
 Y = alpha + beta[0]*X1 + beta[1]*X2 + np.random.randn(size)*sigma
 ```

 %% Cell type:code id: tags:

 ``` 
 import pymc3 as pm
 basic_model = pm.Model()

 with basic_model:

    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=2)
    sigma = pm.HalfNormal('sigma', sd=1)

    # Expected value of outcome
    mu = alpha + beta[0]*X1 + beta[1]*X2

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)
 ```

 %% Cell type:code id: tags:

 ``` 
 with basic_model:

    # obtain starting values via MAP
    start = pm.find_MAP(fmin=optimize.fmin_powell)

    # instantiate sampler
    step = pm.Slice()

    # draw 5000 posterior samples
    trace = pm.sample(5000, step=step, start=start)
 ```

 %% Cell type:code id: tags:

 ``` 
 _ = pm.traceplot(trace)
 ```

 %% Cell type:code id: tags:

 ``` 
 pm.summary(trace)
 ```

 %% Cell type:markdown id: tags:

-Alternative: [pystan](https://pystan.readthedocs.io/en/latest/): wrapper around the [Stan](http://mc-stan.org/users/) probabilistic programming language.
+Alternatives:
+- [pystan](https://pystan.readthedocs.io/en/latest/): wrapper around the [Stan](http://mc-stan.org/users/) probabilistic programming language.
+- [emcee](http://dfm.io/emcee/current/): if you just want MCMC


 ## [Pycuda](https://documen.tician.de/pycuda/): Programming the GPU
 Wrapper around [Cuda](https://developer.nvidia.com/cuda-zone).
-The alternative [Pyopencl](https://documen.tician.de/pyopencl/) provides a very similar wrapper around [OpenCL](https://www.khronos.org/opencl/).
-
-%% Cell type:code id: tags:
-
-``` 
-import pycuda.autoinit
-import pycuda.driver as drv
+- The alternative [Pyopencl](https://documen.tician.de/pyopencl/) provides a very similar wrapper around [OpenCL](https://www.khronos.org/opencl/).
+- Also see [pyopenGL](http://pyopengl.sourceforge.net/): graphics programming in python (used in FSLeyes)

-from pycuda.compiler import SourceModule
-mod = SourceModule("""
-__global__ void multiply_them(double *dest, double *a, double *b)
-{
-  const int i = threadIdx.x;
-  dest[i] = a[i] * b[i];
-}
-""")
-
-multiply_them = mod.get_function("multiply_them")
-
-a = np.random.randn(400)
-b = np.random.randn(400)
-
-dest = np.zeros_like(a)
-multiply_them(
-        drv.Out(dest), drv.In(a), drv.In(b),
-        block=(400,1,1), grid=(1,1))
-
-print(dest-a*b)
-```
-
-%% Cell type:markdown id: tags:
-
-Also see [pyopenGL](http://pyopengl.sourceforge.net/): graphics programming in python (used in FSLeyes)
 ## Testing
 - [unittest](https://docs.python.org/3.6/library/unittest.html): python built-in testing

 %% Cell type:code id: tags:

 ``` 
 import unittest

 class TestStringMethods(unittest.TestCase):

    def test_upper(self):
        self.assertEqual('foo'.upper(), 'FOO')

    def test_isupper(self):
        self.assertTrue('FOO'.isupper())
        self.assertFalse('Foo'.isupper())

    def test_split(self):
        s = 'hello world'
        self.assertEqual(s.split(), ['hello', 'world'])
        # check that s.split fails when the separator is not a string
        with self.assertRaises(TypeError):
            s.split(2)

 if __name__ == '__main__':
    unittest.main()
 ```

 %% Cell type:markdown id: tags:

 - [doctest](https://docs.python.org/3.6/library/doctest.html): checks the example usage in the documentation

 %% Cell type:code id: tags:

 ``` 
 def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(30)
    265252859812191058636308480000000
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0

    Factorials of floats are OK, but the float must be an exact integer:
    >>> factorial(30.1)
    Traceback (most recent call last):
        ...
    ValueError: n must be exact integer
    >>> factorial(30.0)
    265252859812191058636308480000000

    It must also not be ridiculously large:
    >>> factorial(1e100)
    Traceback (most recent call last):
        ...
    OverflowError: n too large
    """

    import math
    if not n >= 0:
        raise ValueError("n must be >= 0")
    if math.floor(n) != n:
        raise ValueError("n must be exact integer")
    if n+1 == n:  # catch a value like 1e300
        raise OverflowError("n too large")
    result = 1
    factor = 2
    while factor <= n:
        result *= factor
        factor += 1
    return result


 if __name__ == "__main__":
    import doctest
    doctest.testmod()
 ```

 %% Cell type:markdown id: tags:

 Two external packages provide more convenient unit tests:
 - [py.test](https://docs.pytest.org/en/latest/)
 - [nose2](http://nose2.readthedocs.io/en/latest/usage.html)

 %% Cell type:code id: tags:

 ``` 
 # content of test_sample.py
 def inc(x):
    return x + 1

 def test_answer():
    assert inc(3) == 5
 ```

 %% Cell type:markdown id: tags:

 - [coverage](https://coverage.readthedocs.io/en/coverage-4.5.1/): measures which part of the code is covered by the tests

 ## Linters
 Linters check the code for any syntax errors, [style errors](https://www.python.org/dev/peps/pep-0008/), unused variables, unreachable code, etc.
 - [pylint](https://pypi.python.org/pypi/pylint): most extensive linter
 - [pyflake](https://pypi.python.org/pypi/pyflakes): if you think pylint is too strict
 - [pep8](https://pypi.python.org/pypi/pep8): just checks for style errors
+
 ### Optional static typing
 - Document how your method/function should be called
  - Static checking of whether your type hints are still up to date
  - Static checking of whether you call your own function correctly
 - Even if you don't assign types yourself, static type checking can still check whether you call typed functions/methods from other packages correctly.

 %% Cell type:code id: tags:

 ``` 
 from typing import List

 def greet_all(names: List[str]) -> None:
    for name in names:
        print('Hello, {}'.format(name))

 greet_all(['python', 'java', 'C++'])  # type checker will be fine with this

 greet_all('matlab')  # this will actually run fine, but type checker will raise an error
 ```

 %% Cell type:markdown id: tags:

 Packages:
 - [typing](https://docs.python.org/3/library/typing.html): built-in library containing generics, unions, etc.
 - [mypy](http://mypy-lang.org/): linter doing static type checking
 - [pyAnnotate](https://github.com/dropbox/pyannotate): automatically assign types to most of your functions/methods based on runtime

 ## Web frameworks
 - [Django2](https://www.djangoproject.com/): includes the most features, but also forces you to do things their way
 - [Pyramid](https://trypyramid.com): Intermediate options
 - [Flask](http://flask.pocoo.org/): Bare-bone web framework, but many extensions available

 There are also many, many libraries to interact with databases, but you will have to google those yourself.

 # Quick mentions
 - [trimesh](https://github.com/mikedh/trimesh): Triangular mesh algorithms
 - [Pillow](https://pillow.readthedocs.io/en/latest/): Read/write/manipulate a wide variety of images (png, jpg, tiff, etc.)
 - [psychopy](http://www.psychopy.org/): equivalent of psychtoolbox (workshop coming up in April in Nottingham)
+- [Sphinx](http://www.sphinx-doc.org/en/master/): documentation generator
 - [Buit-in libraries](https://docs.python.org/3/py-modindex.html)
-    - [collections](https://docs.python.org/3.6/library/collections.html): deque, OrderedDict, namedtuple, and more
-    - [datetime](https://docs.python.org/3/library/datetime.html): Basic date and time types
    - [functools](https://docs.python.org/3/library/functools.html): caching, decorators, and support for functional programming
    - [json](https://docs.python.org/3/library/json.html)/[ipaddress](https://docs.python.org/3/library/ipaddress.html)/[xml](https://docs.python.org/3/library/xml.html#module-xml): parsing/writing
-    - [itertools](https://docs.python.org/3/library/itertools.html): more tools to loop over sequences
    - [logging](https://docs.python.org/3/library/logging.htm): log your output to stdout or a file (more flexible than print statements)
    - [multiprocessing](https://docs.python.org/3/library/multiprocessing.html)
    - [os](https://docs.python.org/3/library/os.html#module-os)/[sys](https://docs.python.org/3/library/sys.html): Miscellaneous operating system interfaces
    - [os.path](https://docs.python.org/3/library/os.path.html)/[pathlib](https://docs.python.org/3/library/pathlib.html): utilities to deal with filesystem paths (latter provides an object-oriented interface)
    - [pickle](https://docs.python.org/3/library/pickle.html): Store/load any python object
-    - [shutil](https://docs.python.org/3/library/shutil.html): copy/move files
    - [subprocess](https://docs.python.org/3/library/subprocess.html): call shell commands
-    - [time](https://docs.python.org/3/library/time.html)/[timeit](https://docs.python.org/3/library/timeit.html): Timing your code
-    - [turtle](https://docs.python.org/3/library/turtle.html#module-turtle): teach python to your kids!
    - [warnings](https://docs.python.org/3/library/warnings.html#module-warnings): tell people they are not using your code properly

 %% Cell type:code id: tags:

 ``` 
-from turtle import *
-color('red', 'yellow')
-begin_fill()
-speed(10)
-while True:
-    forward(200)
-    left(170)
-    if abs(pos()) < 1:
-        break
-end_fill()
-done()
-```
-
-%% Cell type:code id: tags:
-
-``` 
 import this
 ```

 %% Cell type:markdown id: tags:

 # Main scientific python libraries
 See https://scipy.org/

 Most of these packages have or are in the progress of dropping support for python2.
 So use python3!

 ## [Numpy](http://www.numpy.org/): arrays
 This is the main library underlying (nearly) all of the scientific python ecosystem.
 See the tutorial in the beginner session or [the official numpy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html) for usage details.

 The usual nickname of numpy is np:

 %% Cell type:code id: tags:

 ``` 
 import numpy as np
 ```

 %% Cell type:markdown id: tags:

 Numpy includes support for:
 - N-dimensional arrays with various datatypes
  - masked arrays
  - matrices
  - structured/record array
 - basic functions (e.g., sin, log, arctan, polynomials)
 - basic linear algebra
 - random number generation

 ## [Scipy](https://scipy.org/scipylib/index.html): most general scientific tools
 At the top level this module includes all of the basic functionality from numpy.
 You could import this as, but you might as well import numpy directly.

 %% Cell type:code id: tags:

 ``` 
 import scipy as sp
 ```

 %% Cell type:markdown id: tags:

 The main strength in scipy lies in its sub-packages:

 %% Cell type:code id: tags:

 ``` 
 from scipy import optimize
 def costfunc(params):
    return (params[0] - 3) ** 2
 optimize.minimize(costfunc, x0=[0], method='l-bfgs-b')
 ```

 %% Cell type:markdown id: tags:

 Tutorials for all sub-packages can be found [here](https://docs.scipy.org/doc/scipy-1.0.0/reference/).

+Alternative for `scipy.ndimage`:
+- [Scikit-image](http://scikit-image.org/docs/stable/auto_examples/) for image manipulation/segmentation/feature detection
+
 ## [Matplotlib](https://matplotlib.org/): Main plotting library

 %% Cell type:code id: tags:

 ``` 
 import matplotlib as mpl
 mpl.use('nbagg')
 import matplotlib.pyplot as plt
 ```

 %% Cell type:markdown id: tags:

 The matplotlib tutorials are [here](https://matplotlib.org/tutorials/index.html)

 %% Cell type:code id: tags:

 ``` 
 x = np.linspace(0, 2, 100)

 plt.plot(x, x, label='linear')
 plt.plot(x, x**2, label='quadratic')
 plt.plot(x, x**3, label='cubic')

 plt.xlabel('x label')
 plt.ylabel('y label')

 plt.title("Simple Plot")

 plt.legend()

 plt.show()
 ```

 %% Cell type:markdown id: tags:

 Alternatives:
 - [Mayavi](http://docs.enthought.com/mayavi/mayavi/): 3D plotting (hard to install)
 - [Bokeh](https://bokeh.pydata.org/en/latest/) among many others: interactive plots in the browser (i.e., in javascript)

 ## [Ipython](http://ipython.org/)/[Jupyter](https://jupyter.org/) notebook: interactive python environments
-Supports:
- run code in multiple languages
-
-%% Cell type:code id: tags:
-
-``` 
-%%bash
-for name in python ruby ; do
-    echo $name
-done
-```
-
-%% Cell type:markdown id: tags:
-
+Features:
 - debugging

 %% Cell type:code id: tags:

 ``` 
 from scipy import optimize
 def costfunc(params):
-    return 1 / params[0] ** 2
-optimize.minimize(costfunc, x0=[0], method='l-bfgs-b')
+    if params[0] <= 0:
+        raise ValueError('Input variable is too low')
+    return 1 / params[0]
+optimize.minimize(costfunc, x0=[3], method='l-bfgs-b')
 ```

 %% Cell type:code id: tags:

 ``` 
 %debug
 ```

 %% Cell type:markdown id: tags:

 - timing/profiling

 %% Cell type:code id: tags:

 ``` 
 %%prun
 plt.plot([0, 3])
 ```

 %% Cell type:markdown id: tags:

 - getting help

 %% Cell type:code id: tags:

 ``` 
 plt.plot?
 ```

 %% Cell type:markdown id: tags:

 - [and much more...](https://ipython.readthedocs.io/en/stable/interactive/magics.html)

 The next generation is already out: [jupyterlab](https://jupyterlab.readthedocs.io/en/latest/)

 There are many [useful extensions available](https://github.com/ipython-contrib/jupyter_contrib_nbextensions).

 ## [Pandas](https://pandas.pydata.org/): Analyzing "clean" data
 Once your data is in tabular form (e.g. Biobank IDP's), you want to use pandas dataframes to analyze them.
 This brings most of the functionality of R into python.
 Pandas has excellent support for:
 - fast IO to many tabular formats
 - accurate handling of missing data
 - Many, many routines to handle data
  - group by categorical data (e.g., male/female)
  - joining/merging data (all SQL-like operations and much more)
  - time series support
 - statistical models through [statsmodels](http://www.statsmodels.org/stable/index.html)
 - plotting though [seaborn](https://seaborn.pydata.org/)
 - Use [dask](https://dask.pydata.org/en/latest/) if your data is too big for memory (or if you want to run in parallel)

 You should also install `numexpr` and `bottleneck` for optimal performance.

 For the documentation check [here](http://pandas.pydata.org/pandas-docs/stable/index.html)

 ### Adjusted example from statsmodels tutorial

 %% Cell type:code id: tags:

 ``` 
 import statsmodels.api as sm
 import statsmodels.formula.api as smf
 import numpy as np
 ```

 %% Cell type:code id: tags:

 ``` 
 df = sm.datasets.get_rdataset("Guerry", "HistData").data
 df
 ```

 %% Cell type:code id: tags:

 ``` 
 df.describe()
 ```

 %% Cell type:code id: tags:

 ``` 
 df.groupby('Region').mean()
 ```

 %% Cell type:code id: tags:

 ``` 
 results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=df).fit()
 results.summary()
 ```

 %% Cell type:code id: tags:

 ``` 
 df['log_pop'] = np.log(df.Pop1831)
 df
 ```

 %% Cell type:code id: tags:

 ``` 
 results = smf.ols('Lottery ~ Literacy + log_pop', data=df).fit()
 results.summary()
 ```

 %% Cell type:code id: tags:

 ``` 
 results = smf.ols('Lottery ~ Literacy + np.log(Pop1831) + Region', data=df).fit()
 results.summary()
 ```

 %% Cell type:code id: tags:

 ``` 
 results = smf.ols('Lottery ~ Literacy + np.log(Pop1831) + Region + Region * Literacy', data=df).fit()
 results.summary()
 ```

 %% Cell type:code id: tags:

 ``` 
 %matplotlib nbagg
 import seaborn as sns
 sns.pairplot(df, hue="Region", vars=('Lottery', 'Literacy', 'log_pop'))
 ```

 %% Cell type:markdown id: tags:

 ## [Sympy](http://www.sympy.org/en/index.html): Symbolic programming

 %% Cell type:code id: tags:

 ``` 
 import sympy as sym  # no standard nickname
 ```

 %% Cell type:code id: tags:

 ``` 
 x, a, b, c = sym.symbols('x, a, b, c')
 sym.solve(a * x ** 2 + b * x + c, x)
 ```

 %% Cell type:code id: tags:

 ``` 
 sym.integrate(x/(x**2+a*x+2), x)
 ```

 %% Cell type:code id: tags:

 ``` 
 f = sym.utilities.lambdify((x, a), sym.integrate((x**2+a*x+2), x))
 f(np.random.rand(10), np.random.rand(10))
 ```

 %% Cell type:markdown id: tags:

 # Other topics
 ## [Argparse](https://docs.python.org/3.6/howto/argparse.html): Command line arguments

 %% Cell type:code id: tags:

 ``` 
 %%writefile test_argparse.py
 import argparse

 def main():
    parser = argparse.ArgumentParser(description="calculate X to the power of Y")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("x", type=int, help="the base")
    parser.add_argument("y", type=int, help="the exponent")
    args = parser.parse_args()
    answer = args.x**args.y

    if args.verbose:
        print("{} to the power {} equals {}".format(args.x, args.y, answer))
    else:
        print("{}^{} == {}".format(args.x, args.y, answer))

 if __name__ == '__main__':
    main()
 ```

 %% Cell type:code id: tags:

 ``` 
 %run test_argparse.py 3 8 -v
 ```

 %% Cell type:code id: tags:

 ``` 
 %run test_argparse.py -h
 ```

 %% Cell type:code id: tags:

 ``` 
 %run test_argparse.py 3 8.5
 ```

 %% Cell type:markdown id: tags:

 Alternatives:
 - [docopt](http://docopt.org/): You write a usage string, docopt will generate the parser
 > ```
 > # example from https://realpython.com/blog/python/comparing-python-command-line-parsing-libraries-argparse-docopt-click/
 > """Greeter.
 >
 > Usage:
 >   commands.py hello
 >   commands.py goodbye
 >   commands.py -h | --help
 >
 > Options:
 >   -h --help     Show this screen.
 > """
 > from docopt import docopt
 >
 > if __name__ == '__main__':
 >     arguments = docopt(__doc__)
 > ```
 - [clize](http://clize.readthedocs.io/en/stable/why.html): You write a function, clize will generate the parser
 > ```
 > from clize import run
 >
 > def echo(word):
 >     return word
 >
 > if __name__ == '__main__':
 >     run(echo)
 > ```

 ### [Gooey](https://github.com/chriskiehl/Gooey): GUI from command line tool

 %% Cell type:code id: tags:

 ``` 
 %%writefile test_gooey.py
 import argparse
 from gooey import Gooey

 @Gooey
 def main():
    parser = argparse.ArgumentParser(description="calculate X to the power of Y")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("x", type=int, help="the base")
    parser.add_argument("y", type=int, help="the exponent")
    args = parser.parse_args()
    answer = args.x**args.y

    if args.verbose:
        print("{} to the power {} equals {}".format(args.x, args.y, answer))
    else:
        print("{}^{} == {}".format(args.x, args.y, answer))

 if __name__ == '__main__':
    main()
 ```

 %% Cell type:code id: tags:

 ``` 
 !python.app test_gooey.py
 ```

 %% Cell type:code id: tags:

 ``` 
 !gcoord_gui
 ```

 %% Cell type:markdown id: tags:

 ## [Jinja2](http://jinja.pocoo.org/docs/2.10/): Templating language
 Jinja2 allows to create templates of files with placeholders, where future content will go.
 This allows for the creation of a large number of similar files.

 This can for example be used to produce static HTML output in a highly flexible manner.

 %% Cell type:code id: tags:

 ``` 
 %%writefile image_list.jinja2
 <!DOCTYPE html>
 <html lang="en">
 <head>
    {% block head %}
    <title>{{ title }}</title>
    {% endblock %}
 </head>
 <body>
    <div id="content">
        {% block content %}
            {% for description, filenames in images %}
                <p>
                    {{ description }}
                </p>
                {% for filename in filenames %}
                    <a href="{{ filename }}">
                        <img src="{{ filename }}">
                    </a>
                {% endfor %}
            {% endfor %}
        {% endblock %}
    </div>
    <footer>
        Created on {{ time }}
    </footer>
 </body>
 </html>
 ```

 %% Cell type:code id: tags:

 ``` 
 import numpy as np
 import matplotlib.pyplot as plt
 plt.ioff()

 def plot_sine(amplitude, frequency):
    x = np.linspace(0, 2 * np.pi, 100)
    y = amplitude * np.sin(frequency * x)
    plt.plot(x, y)
    plt.xticks([0, np.pi, 2 * np.pi], ['0', '$\pi$', '$2 \pi$'])
    plt.ylim(-1.1, 1.1)
    filename = 'plots/A{:.2f}_F{:.2f}.png'.format(amplitude, frequency)
    plt.title('A={:.2f}, F={:.2f}'.format(amplitude, frequency))
    plt.savefig(filename)
    plt.close(plt.gcf())
    return filename

 !mkdir plots
 amplitudes = [plot_sine(A, 1.) for A in [0.1, 0.3, 0.7, 1.0]]
 frequencies = [plot_sine(1., F) for F in [1, 2, 3, 4, 5, 6]]
 plt.ion()
 ```

 %% Cell type:code id: tags:

 ``` 
 from jinja2 import Environment, FileSystemLoader
 from datetime import datetime
 loader = FileSystemLoader('.')
 env = Environment(loader=loader)
 template = env.get_template('image_list.jinja2')

 images = [
    ('Varying the amplitude', amplitudes),
    ('Varying the frequency', frequencies),
 ]

 with open('image_list.html', 'w') as f:
    f.write(template.render(title='Lots of sines',
                            images=images, time=datetime.now()))
 ```

 %% Cell type:code id: tags:

 ``` 
 !open image_list.html
 ```

 %% Cell type:markdown id: tags:

 ## Neuroimage packages
 The [nipy](http://nipy.org/) ecosystem covers most of these.

 ## [networkx](https://networkx.github.io/): graph theory

 ## GUI
 - [tkinter](https://docs.python.org/3.6/library/tkinter.html): thin wrapper around Tcl/Tk; included in python
 - [wxpython](https://www.wxpython.org/): Wrapper around the C++ wxWidgets library

 %% Cell type:code id: tags:

 ``` 
 %%writefile wx_hello_world.py
 """
 Hello World, but with more meat.
 """

 import wx

 class HelloFrame(wx.Frame):
    """
    A Frame that says Hello World
    """

    def __init__(self, *args, **kw):
        # ensure the parent's __init__ is called
        super(HelloFrame, self).__init__(*args, **kw)

        # create a panel in the frame
        pnl = wx.Panel(self)

        # and put some text with a larger bold font on it
        st = wx.StaticText(pnl, label="Hello World!", pos=(25,25))
        font = st.GetFont()
        font.PointSize += 10
        font = font.Bold()
        st.SetFont(font)

        # create a menu bar
        self.makeMenuBar()

        # and a status bar
        self.CreateStatusBar()
        self.SetStatusText("Welcome to wxPython!")


    def makeMenuBar(self):
        """
        A menu bar is composed of menus, which are composed of menu items.
        This method builds a set of menus and binds handlers to be called
        when the menu item is selected.
        """

        # Make a file menu with Hello and Exit items
        fileMenu = wx.Menu()
        # The "\t..." syntax defines an accelerator key that also triggers
        # the same event
        helloItem = fileMenu.Append(-1, "&Hello...\tCtrl-H",
                "Help string shown in status bar for this menu item")
        fileMenu.AppendSeparator()
        # When using a stock ID we don't need to specify the menu item's
        # label
        exitItem = fileMenu.Append(wx.ID_EXIT)

        # Now a help menu for the about item
        helpMenu = wx.Menu()
        aboutItem = helpMenu.Append(wx.ID_ABOUT)

        # Make the menu bar and add the two menus to it. The '&' defines
        # that the next letter is the "mnemonic" for the menu item. On the
        # platforms that support it those letters are underlined and can be
        # triggered from the keyboard.
        menuBar = wx.MenuBar()
        menuBar.Append(fileMenu, "&File")
        menuBar.Append(helpMenu, "&Help")

        # Give the menu bar to the frame
        self.SetMenuBar(menuBar)

        # Finally, associate a handler function with the EVT_MENU event for
        # each of the menu items. That means that when that menu item is
        # activated then the associated handler function will be called.
        self.Bind(wx.EVT_MENU, self.OnHello, helloItem)
        self.Bind(wx.EVT_MENU, self.OnExit,  exitItem)
        self.Bind(wx.EVT_MENU, self.OnAbout, aboutItem)


    def OnExit(self, event):
        """Close the frame, terminating the application."""
        self.Close(True)


    def OnHello(self, event):
        """Say hello to the user."""
        wx.MessageBox("Hello again from wxPython")


    def OnAbout(self, event):
        """Display an About Dialog"""
        wx.MessageBox("This is a wxPython Hello World sample",
                      "About Hello World 2",
                      wx.OK|wx.ICON_INFORMATION)


 if __name__ == '__main__':
    # When this module is run (not imported) then create the app, the
    # frame, show it, and start the event loop.
    app = wx.App()
    frm = HelloFrame(None, title='Hello World 2')
    frm.Show()
    app.MainLoop()
 ```

 %% Cell type:code id: tags:

 ``` 
 !python.app wx_hello_world.py
 ```

 %% Cell type:markdown id: tags:

 ## Machine learning
 - scikit-learn
 - theano/tensorflow/pytorch
  - keras

 ## [pymc3](http://docs.pymc.io/): Pobabilstic programming

 %% Cell type:code id: tags:

 ``` 
 import numpy as np
 import matplotlib.pyplot as plt

 # Initialize random number generator
 np.random.seed(123)

 # True parameter values
 alpha, sigma = 1, 1
 beta = [1, 2.5]

 # Size of dataset
 size = 100

 # Predictor variable
 X1 = np.random.randn(size)
 X2 = np.random.randn(size) * 0.2

 # Simulate outcome variable
 Y = alpha + beta[0]*X1 + beta[1]*X2 + np.random.randn(size)*sigma
 ```

 %% Cell type:code id: tags:

 ``` 
 import pymc3 as pm
 basic_model = pm.Model()

 with basic_model:

    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=2)
    sigma = pm.HalfNormal('sigma', sd=1)

    # Expected value of outcome
    mu = alpha + beta[0]*X1 + beta[1]*X2

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)
 ```

 %% Cell type:code id: tags:

 ``` 
 with basic_model:

    # obtain starting values via MAP
    start = pm.find_MAP(fmin=optimize.fmin_powell)

    # instantiate sampler
    step = pm.Slice()

    # draw 5000 posterior samples
    trace = pm.sample(5000, step=step, start=start)
 ```

 %% Cell type:code id: tags:

 ``` 
 _ = pm.traceplot(trace)
 ```

 %% Cell type:code id: tags:

 ``` 
 pm.summary(trace)
 ```

 %% Cell type:markdown id: tags:

-Alternative: [pystan](https://pystan.readthedocs.io/en/latest/): wrapper around the [Stan](http://mc-stan.org/users/) probabilistic programming language.
+Alternatives:
+- [pystan](https://pystan.readthedocs.io/en/latest/): wrapper around the [Stan](http://mc-stan.org/users/) probabilistic programming language.
+- [emcee](http://dfm.io/emcee/current/): if you just want MCMC


 ## [Pycuda](https://documen.tician.de/pycuda/): Programming the GPU
 Wrapper around [Cuda](https://developer.nvidia.com/cuda-zone).
-The alternative [Pyopencl](https://documen.tician.de/pyopencl/) provides a very similar wrapper around [OpenCL](https://www.khronos.org/opencl/).
-
-%% Cell type:code id: tags:
-
-``` 
-import pycuda.autoinit
-import pycuda.driver as drv
+- The alternative [Pyopencl](https://documen.tician.de/pyopencl/) provides a very similar wrapper around [OpenCL](https://www.khronos.org/opencl/).
+- Also see [pyopenGL](http://pyopengl.sourceforge.net/): graphics programming in python (used in FSLeyes)

-from pycuda.compiler import SourceModule
-mod = SourceModule("""
-__global__ void multiply_them(double *dest, double *a, double *b)
-{
-  const int i = threadIdx.x;
-  dest[i] = a[i] * b[i];
-}
-""")
-
-multiply_them = mod.get_function("multiply_them")
-
-a = np.random.randn(400)
-b = np.random.randn(400)
-
-dest = np.zeros_like(a)
-multiply_them(
-        drv.Out(dest), drv.In(a), drv.In(b),
-        block=(400,1,1), grid=(1,1))
-
-print(dest-a*b)
-```
-
-%% Cell type:markdown id: tags:
-
-Also see [pyopenGL](http://pyopengl.sourceforge.net/): graphics programming in python (used in FSLeyes)
 ## Testing
 - [unittest](https://docs.python.org/3.6/library/unittest.html): python built-in testing

 %% Cell type:code id: tags:

 ``` 
 import unittest

 class TestStringMethods(unittest.TestCase):

    def test_upper(self):
        self.assertEqual('foo'.upper(), 'FOO')

    def test_isupper(self):
        self.assertTrue('FOO'.isupper())
        self.assertFalse('Foo'.isupper())

    def test_split(self):
        s = 'hello world'
        self.assertEqual(s.split(), ['hello', 'world'])
        # check that s.split fails when the separator is not a string
        with self.assertRaises(TypeError):
            s.split(2)

 if __name__ == '__main__':
    unittest.main()
 ```

 %% Cell type:markdown id: tags:

 - [doctest](https://docs.python.org/3.6/library/doctest.html): checks the example usage in the documentation

 %% Cell type:code id: tags:

 ``` 
 def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(30)
    265252859812191058636308480000000
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0

    Factorials of floats are OK, but the float must be an exact integer:
    >>> factorial(30.1)
    Traceback (most recent call last):
        ...
    ValueError: n must be exact integer
    >>> factorial(30.0)
    265252859812191058636308480000000

    It must also not be ridiculously large:
    >>> factorial(1e100)
    Traceback (most recent call last):
        ...
    OverflowError: n too large
    """

    import math
    if not n >= 0:
        raise ValueError("n must be >= 0")
    if math.floor(n) != n:
        raise ValueError("n must be exact integer")
    if n+1 == n:  # catch a value like 1e300
        raise OverflowError("n too large")
    result = 1
    factor = 2
    while factor <= n:
        result *= factor
        factor += 1
    return result


 if __name__ == "__main__":
    import doctest
    doctest.testmod()
 ```

 %% Cell type:markdown id: tags:

 Two external packages provide more convenient unit tests:
 - [py.test](https://docs.pytest.org/en/latest/)
 - [nose2](http://nose2.readthedocs.io/en/latest/usage.html)

 %% Cell type:code id: tags:

 ``` 
 # content of test_sample.py
 def inc(x):
    return x + 1

 def test_answer():
    assert inc(3) == 5
 ```

 %% Cell type:markdown id: tags:

 - [coverage](https://coverage.readthedocs.io/en/coverage-4.5.1/): measures which part of the code is covered by the tests

 ## Linters
 Linters check the code for any syntax errors, [style errors](https://www.python.org/dev/peps/pep-0008/), unused variables, unreachable code, etc.
 - [pylint](https://pypi.python.org/pypi/pylint): most extensive linter
 - [pyflake](https://pypi.python.org/pypi/pyflakes): if you think pylint is too strict
 - [pep8](https://pypi.python.org/pypi/pep8): just checks for style errors
+
 ### Optional static typing
 - Document how your method/function should be called
  - Static checking of whether your type hints are still up to date
  - Static checking of whether you call your own function correctly
 - Even if you don't assign types yourself, static type checking can still check whether you call typed functions/methods from other packages correctly.

 %% Cell type:code id: tags:

 ``` 
 from typing import List

 def greet_all(names: List[str]) -> None:
    for name in names:
        print('Hello, {}'.format(name))

 greet_all(['python', 'java', 'C++'])  # type checker will be fine with this

 greet_all('matlab')  # this will actually run fine, but type checker will raise an error
 ```

 %% Cell type:markdown id: tags:

 Packages:
 - [typing](https://docs.python.org/3/library/typing.html): built-in library containing generics, unions, etc.
 - [mypy](http://mypy-lang.org/): linter doing static type checking
 - [pyAnnotate](https://github.com/dropbox/pyannotate): automatically assign types to most of your functions/methods based on runtime

 ## Web frameworks
 - [Django2](https://www.djangoproject.com/): includes the most features, but also forces you to do things their way
 - [Pyramid](https://trypyramid.com): Intermediate options
 - [Flask](http://flask.pocoo.org/): Bare-bone web framework, but many extensions available

 There are also many, many libraries to interact with databases, but you will have to google those yourself.

 # Quick mentions
 - [trimesh](https://github.com/mikedh/trimesh): Triangular mesh algorithms
 - [Pillow](https://pillow.readthedocs.io/en/latest/): Read/write/manipulate a wide variety of images (png, jpg, tiff, etc.)
 - [psychopy](http://www.psychopy.org/): equivalent of psychtoolbox (workshop coming up in April in Nottingham)
+- [Sphinx](http://www.sphinx-doc.org/en/master/): documentation generator
 - [Buit-in libraries](https://docs.python.org/3/py-modindex.html)
-    - [collections](https://docs.python.org/3.6/library/collections.html): deque, OrderedDict, namedtuple, and more
-    - [datetime](https://docs.python.org/3/library/datetime.html): Basic date and time types
    - [functools](https://docs.python.org/3/library/functools.html): caching, decorators, and support for functional programming
    - [json](https://docs.python.org/3/library/json.html)/[ipaddress](https://docs.python.org/3/library/ipaddress.html)/[xml](https://docs.python.org/3/library/xml.html#module-xml): parsing/writing
-    - [itertools](https://docs.python.org/3/library/itertools.html): more tools to loop over sequences
    - [logging](https://docs.python.org/3/library/logging.htm): log your output to stdout or a file (more flexible than print statements)
    - [multiprocessing](https://docs.python.org/3/library/multiprocessing.html)
    - [os](https://docs.python.org/3/library/os.html#module-os)/[sys](https://docs.python.org/3/library/sys.html): Miscellaneous operating system interfaces
    - [os.path](https://docs.python.org/3/library/os.path.html)/[pathlib](https://docs.python.org/3/library/pathlib.html): utilities to deal with filesystem paths (latter provides an object-oriented interface)
    - [pickle](https://docs.python.org/3/library/pickle.html): Store/load any python object
-    - [shutil](https://docs.python.org/3/library/shutil.html): copy/move files
    - [subprocess](https://docs.python.org/3/library/subprocess.html): call shell commands
-    - [time](https://docs.python.org/3/library/time.html)/[timeit](https://docs.python.org/3/library/timeit.html): Timing your code
-    - [turtle](https://docs.python.org/3/library/turtle.html#module-turtle): teach python to your kids!
    - [warnings](https://docs.python.org/3/library/warnings.html#module-warnings): tell people they are not using your code properly

 %% Cell type:code id: tags:

 ``` 
-from turtle import *
-color('red', 'yellow')
-begin_fill()
-speed(10)
-while True:
-    forward(200)
-    left(170)
-    if abs(pos()) < 1:
-        break
-end_fill()
-done()
-```
-
-%% Cell type:code id: tags:
-
-``` 
 import this
 ```

--- a/talks/packages/packages.md
+++ b/talks/packages/packages.md
@@ -39,6 +39,9 @@ optimize.minimize(costfunc, x0=[0], method='l-bfgs-b')

 Tutorials for all sub-packages can be found [here](https://docs.scipy.org/doc/scipy-1.0.0/reference/).

+Alternative for `scipy.ndimage`:
+- [Scikit-image](http://scikit-image.org/docs/stable/auto_examples/) for image manipulation/segmentation/feature detection
+
 ## [Matplotlib](https://matplotlib.org/): Main plotting library
 ```
 import matplotlib as mpl
@@ -69,22 +72,17 @@ Alternatives:
 - [Bokeh](https://bokeh.pydata.org/en/latest/) among many others: interactive plots in the browser (i.e., in javascript)

 ## [Ipython](http://ipython.org/)/[Jupyter](https://jupyter.org/) notebook: interactive python environments
-Supports:
- run code in multiple languages
-```
-%%bash
-for name in python ruby ; do
-    echo $name
-done
-```
-
+Features:
 - debugging
 ```
 from scipy import optimize
 def costfunc(params):
-    return 1 / params[0] ** 2
-optimize.minimize(costfunc, x0=[0], method='l-bfgs-b')
+    if params[0] <= 0:
+        raise ValueError('Input variable is too low')
+    return 1 / params[0]
+optimize.minimize(costfunc, x0=[3], method='l-bfgs-b')
 ```
+
 ```
 %debug
 ```
@@ -552,39 +550,16 @@ _ = pm.traceplot(trace)
 pm.summary(trace)
 ```

-Alternative: [pystan](https://pystan.readthedocs.io/en/latest/): wrapper around the [Stan](http://mc-stan.org/users/) probabilistic programming language.
+Alternatives:
+- [pystan](https://pystan.readthedocs.io/en/latest/): wrapper around the [Stan](http://mc-stan.org/users/) probabilistic programming language.
+- [emcee](http://dfm.io/emcee/current/): if you just want MCMC


 ## [Pycuda](https://documen.tician.de/pycuda/): Programming the GPU
 Wrapper around [Cuda](https://developer.nvidia.com/cuda-zone).
-The alternative [Pyopencl](https://documen.tician.de/pyopencl/) provides a very similar wrapper around [OpenCL](https://www.khronos.org/opencl/).
-```
-import pycuda.autoinit
-import pycuda.driver as drv
-
-from pycuda.compiler import SourceModule
-mod = SourceModule("""
-__global__ void multiply_them(double *dest, double *a, double *b)
-{
-  const int i = threadIdx.x;
-  dest[i] = a[i] * b[i];
-}
-""")
+- The alternative [Pyopencl](https://documen.tician.de/pyopencl/) provides a very similar wrapper around [OpenCL](https://www.khronos.org/opencl/).
+- Also see [pyopenGL](http://pyopengl.sourceforge.net/): graphics programming in python (used in FSLeyes)

-multiply_them = mod.get_function("multiply_them")
-
-a = np.random.randn(400)
-b = np.random.randn(400)
-
-dest = np.zeros_like(a)
-multiply_them(
-        drv.Out(dest), drv.In(a), drv.In(b),
-        block=(400,1,1), grid=(1,1))
-
-print(dest-a*b)
-```
-
-Also see [pyopenGL](http://pyopengl.sourceforge.net/): graphics programming in python (used in FSLeyes)
 ## Testing
 - [unittest](https://docs.python.org/3.6/library/unittest.html): python built-in testing
 ```
@@ -676,6 +651,7 @@ Linters check the code for any syntax errors, [style errors](https://www.python.
 - [pylint](https://pypi.python.org/pypi/pylint): most extensive linter
 - [pyflake](https://pypi.python.org/pypi/pyflakes): if you think pylint is too strict
 - [pep8](https://pypi.python.org/pypi/pep8): just checks for style errors
+
 ### Optional static typing
 - Document how your method/function should be called
  - Static checking of whether your type hints are still up to date
@@ -710,37 +686,18 @@ There are also many, many libraries to interact with databases, but you will hav
 - [trimesh](https://github.com/mikedh/trimesh): Triangular mesh algorithms
 - [Pillow](https://pillow.readthedocs.io/en/latest/): Read/write/manipulate a wide variety of images (png, jpg, tiff, etc.)
 - [psychopy](http://www.psychopy.org/): equivalent of psychtoolbox (workshop coming up in April in Nottingham)
+- [Sphinx](http://www.sphinx-doc.org/en/master/): documentation generator
 - [Buit-in libraries](https://docs.python.org/3/py-modindex.html)
-    - [collections](https://docs.python.org/3.6/library/collections.html): deque, OrderedDict, namedtuple, and more
-    - [datetime](https://docs.python.org/3/library/datetime.html): Basic date and time types
    - [functools](https://docs.python.org/3/library/functools.html): caching, decorators, and support for functional programming
    - [json](https://docs.python.org/3/library/json.html)/[ipaddress](https://docs.python.org/3/library/ipaddress.html)/[xml](https://docs.python.org/3/library/xml.html#module-xml): parsing/writing
-    - [itertools](https://docs.python.org/3/library/itertools.html): more tools to loop over sequences
    - [logging](https://docs.python.org/3/library/logging.htm): log your output to stdout or a file (more flexible than print statements)
    - [multiprocessing](https://docs.python.org/3/library/multiprocessing.html)
    - [os](https://docs.python.org/3/library/os.html#module-os)/[sys](https://docs.python.org/3/library/sys.html): Miscellaneous operating system interfaces
    - [os.path](https://docs.python.org/3/library/os.path.html)/[pathlib](https://docs.python.org/3/library/pathlib.html): utilities to deal with filesystem paths (latter provides an object-oriented interface)
    - [pickle](https://docs.python.org/3/library/pickle.html): Store/load any python object
-    - [shutil](https://docs.python.org/3/library/shutil.html): copy/move files
    - [subprocess](https://docs.python.org/3/library/subprocess.html): call shell commands
-    - [time](https://docs.python.org/3/library/time.html)/[timeit](https://docs.python.org/3/library/timeit.html): Timing your code
-    - [turtle](https://docs.python.org/3/library/turtle.html#module-turtle): teach python to your kids!
    - [warnings](https://docs.python.org/3/library/warnings.html#module-warnings): tell people they are not using your code properly

-```
-from turtle import *
-color('red', 'yellow')
-begin_fill()
-speed(10)
-while True:
-    forward(200)
-    left(170)
-    if abs(pos()) < 1:
-        break
-end_fill()
-done()
-```
-
 ```
 import this
 ```
--- a/talks/packaging_projects/PyTreat_Final.key
+++ b/talks/packaging_projects/PyTreat_Final.key
--- a/talks/pandas/group_by.png
+++ b/talks/pandas/group_by.png
--- a/talks/pandas/pandas.ipynb
+++ b/talks/pandas/pandas.ipynb
+%% Cell type:markdown id: tags:
+
+# Pandas
+
+Follow along online at: https://git.fmrib.ox.ac.uk/fsl/pytreat-practicals-2020/-/blob/master/talks/pandas/pandas.ipynb
+
+Pandas is a data analysis library focused on the cleaning and exploration of
+tabular data.
+
+Some useful links are:
+- [main website](https://pandas.pydata.org)
+- [documentation](http://pandas.pydata.org/pandas-docs/stable/)<sup>1</sup>
+- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)<sup>1</sup> by
+  Jake van der Plas
+- [List of Pandas tutorials](https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html)
+
+<sup>1</sup> This tutorial borrows heavily from the pandas documentation and
+the Python Data Science Handbook
+
+%% Cell type:code id: tags:
+
+``` 
+%pylab inline
+import pandas as pd  # pd is the usual abbreviation for pandas
+import matplotlib.pyplot as plt # matplotlib for plotting
+import seaborn as sns  # seaborn is the main plotting library for Pandas
+import statsmodels.api as sm  # statsmodels fits linear models to pandas data
+import statsmodels.formula.api as smf
+from IPython.display import Image
+sns.set()  # use the prettier seaborn plotting settings rather than the default matplotlib one
+```
+
+%% Cell type:markdown id: tags:
+
+> We will mostly be using `seaborn` instead of `matplotlib` for
+> visualisation. But `seaborn` is actually an extension to `matplotlib`, so we
+> are still using the latter under the hood.
+
+## Loading in data
+
+Pandas supports a wide range of I/O tools to load from text files, binary files,
+and SQL databases. You can find a table with all formats
+[here](http://pandas.pydata.org/pandas-docs/stable/io.html).
+
+%% Cell type:code id: tags:
+
+``` 
+titanic = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')
+titanic
+```
+
+%% Cell type:markdown id: tags:
+
+This loads the data into a
+[`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)
+object, which is the main object we will be interacting with in pandas. It
+represents a table of data.  The other file formats all start with
+`pd.read_{format}`.  Note that we can provide the URL to the dataset, rather
+than download it beforehand.
+
+We can write out the dataset using `dataframe.to_{format}(<filename>)`:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.to_csv('titanic_copy.csv', index=False)  # we set index to False to prevent pandas from storing the row names
+```
+
+%% Cell type:markdown id: tags:
+
+If you can not connect to the internet, you can run the command below to load
+this locally stored titanic dataset
+
+%% Cell type:code id: tags:
+
+``` 
+titanic = pd.read_csv('titanic.csv')
+titanic
+```
+
+%% Cell type:markdown id: tags:
+
+Note that the titanic dataset was also available to us as one of the standard
+datasets included with seaborn. We could load it from there using
+
+%% Cell type:code id: tags:
+
+``` 
+sns.load_dataset('titanic')
+```
+
+%% Cell type:markdown id: tags:
+
+`Dataframes` can also be created from other python objects, using
+`pd.DataFrame.from_{other type}`. The most useful of these is `from_dict`,
+which converts a mapping of the columns to a pandas `DataFrame` (i.e., table).
+
+%% Cell type:code id: tags:
+
+``` 
+pd.DataFrame.from_dict({
+    'random numbers': np.random.rand(5),
+    'sequence (int)': np.arange(5),
+    'sequence (float)': np.linspace(0, 5, 5),
+    'letters': list('abcde'),
+    'constant_value': 'same_value'
+})
+```
+
+%% Cell type:markdown id: tags:
+
+For many applications (e.g., ICA, machine learning input) you might want to
+extract your data as a numpy array. The underlying numpy array can be accessed
+using the `to_numpy` method
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.to_numpy()
+```
+
+%% Cell type:markdown id: tags:
+
+Note that the type of the returned array is the most common type (in this case
+object). If you just want the numeric parts of the table you can use
+`select_dtypes`, which selects specific columns based on their dtype:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.select_dtypes(include=np.number).to_numpy()
+```
+
+%% Cell type:markdown id: tags:
+
+Note that the numpy array has no information on the column names or row indices.
+Alternatively, when you want to include the categorical variables in your later
+analysis (e.g., for machine learning), you can extract dummy variables using:
+
+%% Cell type:code id: tags:
+
+``` 
+pd.get_dummies(titanic)
+```
+
+%% Cell type:markdown id: tags:
+
+## Accessing parts of the data
+
+[Documentation on indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html)
+
+### Selecting columns by name
+
+Single columns can be selected using the normal python indexing:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic['embark_town']
+```
+
+%% Cell type:markdown id: tags:
+
+If the column names are simple strings (not required) we can also access it
+directly as an attribute
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.embark_town
+```
+
+%% Cell type:markdown id: tags:
+
+Note that this returns a pandas
+[`Series`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html)
+rather than a `DataFrame` object. A `Series` is simply a 1-dimensional array
+representing a single column.  Multiple columns can be returned by providing a
+list of columns names. This will return a `DataFrame`:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic[['class', 'alive']]
+```
+
+%% Cell type:markdown id: tags:
+
+Note that you have to provide a list here (square brackets). If you provide a
+tuple (round brackets) pandas will think you are trying to access a single
+column that has that tuple as a name:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic[('class', 'alive')]
+```
+
+%% Cell type:markdown id: tags:
+
+In this case there is no column called `('class', 'alive')` leading to an
+error.  Later on we will see some uses to having columns named like this.
+
+### Indexing rows by name or integer
+
+Individual rows can be accessed based on their name (i.e., the index) or integer
+(i.e., which row it is in). In our current table this will give the same
+results. To ensure that these are different, let's sort our titanic dataset
+based on the passenger fare:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted = titanic.sort_values('fare')
+titanic_sorted
+```
+
+%% Cell type:markdown id: tags:
+
+Note that the re-sorting did not change the values in the index (i.e., left-most
+column).
+
+We can select the first row of this newly sorted table using `iloc`
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.iloc[0]
+```
+
+%% Cell type:markdown id: tags:
+
+We can select the row with the index 0 using
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.loc[0]
+```
+
+%% Cell type:markdown id: tags:
+
+Note that this gives the same passenger as the first row of the initial table
+before sorting
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.iloc[0]
+```
+
+%% Cell type:markdown id: tags:
+
+Another common way to access the first or last N rows of a table is using the
+head/tail methods
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.head(3)
+```
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.tail(3)
+```
+
+%% Cell type:markdown id: tags:
+
+Note that nearly all methods in pandas return a new `Dataframe`, which means
+that we can easily call another method on them
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.tail(10).head(5)  # select the first 5 of the last 10 passengers in the database
+```
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.iloc[-10:-5]  # alternative way to get the same passengers
+```
+
+%% Cell type:markdown id: tags:
+
+**Exercise**: use sorting and tail/head or indexing to find the 10 youngest
+passengers on the titanic. Try to do this on a single line by chaining calls
+to the titanic `DataFrame` object
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.sort_values...
+```
+
+%% Cell type:markdown id: tags:
+
+### Indexing rows by value
+
+One final way to select specific columns is by their value
+
+%% Cell type:code id: tags:
+
+``` 
+titanic[titanic.sex == 'female']  # selects all females
+```
+
+%% Cell type:code id: tags:
+
+``` 
+# select all passengers older than 60 who departed from Southampton
+titanic[(titanic.age > 60) & (titanic['embark_town'] == 'Southampton')]
+```
+
+%% Cell type:markdown id: tags:
+
+Note that this required typing `titanic` quite often. A quicker way to get the
+same result is using the `query` method, which is described in detail
+[here](http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-query-method)
+(note that using the `query` method is also faster and uses a lot less
+memory).
+
+> You may have trouble using the `query` method with columns which have
+a name that cannot be used as a Python identifier.
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.query('(age > 60) & (embark_town == "Southampton")')
+```
+
+%% Cell type:markdown id: tags:
+
+When selecting a categorical multiple options from a categorical values you
+might want to use `isin`:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic[titanic['class'].isin(['First','Second'])]
+```
+
+%% Cell type:markdown id: tags:
+
+Particularly useful when selecting data like this is the `isna` method which
+finds all missing data
+
+%% Cell type:code id: tags:
+
+``` 
+titanic[~titanic.age.isna()]  # select first few passengers whose age is not N/A
+```
+
+%% Cell type:markdown id: tags:
+
+This removing of missing numbers is so common that it has is own method
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.dropna()  # drops all passengers that have some datapoint missing
+```
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.dropna(subset=['age', 'fare'])  # Only drop passengers with missing ages or fares
+```
+
+%% Cell type:markdown id: tags:
+
+**Exercise**: use sorting, indexing by value, `dropna` and `tail`/`head` or
+indexing to find the 10 oldest female passengers on the titanic. Try to do
+this on a single line by chaining calls to the titanic `DataFrame` object
+
+%% Cell type:code id: tags:
+
+``` 
+titanic...
+```
+
+%% Cell type:markdown id: tags:
+
+## Plotting the data
+
+Before we start analyzing the data, let's play around with visualizing it.
+Pandas does have some basic built-in plotting options:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.fare.hist(bins=20, log=True)
+```
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.age.plot()
+```
+
+%% Cell type:markdown id: tags:
+
+To plot all variables simply call `plot` or `hist` on the full dataframe
+rather than a single Series (i.e., column). You might want to set `subplots=True`
+to plot each variable in a different subplot.
+
+Individual Series are essentially 1D arrays, so we can use them as such in
+`matplotlib`
+
+%% Cell type:code id: tags:
+
+``` 
+plt.scatter(titanic.age, titanic.fare)
+```
+
+%% Cell type:markdown id: tags:
+
+However, for most purposes much nicer plots can be obtained using
+[Seaborn](https://seaborn.pydata.org). Seaborn has support to produce plots
+showing the
+[univariate](https://seaborn.pydata.org/tutorial/distributions.html#plotting-univariate-distributions)
+or
+[bivariate](https://seaborn.pydata.org/tutorial/distributions.html#plotting-bivariate-distributions)
+distribution of data in a single or a grid of plots.  Most of the seaborn
+plotting functions expect to get a pandas `DataFrame` (although they will work
+with Numpy arrays as well). So we can plot age vs. fare like:
+
+%% Cell type:code id: tags:
+
+``` 
+sns.jointplot('age', 'fare', data=titanic)
+```
+
+%% Cell type:markdown id: tags:
+
+**Exercise**: check the documentation from `sns.jointplot` (hover the mouse
+over the text `jointplot` and press shift-tab) to find out how to turn the
+scatter plot into a density (kde) map
+
+%% Cell type:code id: tags:
+
+``` 
+sns.jointplot('age', 'fare', data=titanic, ...)
+```
+
+%% Cell type:markdown id: tags:
+
+Here is just a brief example of how we can use multiple columns to illustrate
+the data in more detail
+
+%% Cell type:code id: tags:
+
+``` 
+sns.relplot(x='age', y='fare', col='class', hue='sex', data=titanic,
+           col_order=('First', 'Second', 'Third'))
+```
+
+%% Cell type:markdown id: tags:
+
+**Exercise**: Split the plot above into two rows with the first row including
+the passengers who survived and the second row those who did not (you might
+have to check the documentation again by using shift-tab while overing the
+mouse over `relplot`)
+
+%% Cell type:code id: tags:
+
+``` 
+sns.relplot(x='age', y='fare', col='class', hue='sex', data=titanic,
+           col_order=('First', 'Second', 'Third')...)
+```
+
+%% Cell type:markdown id: tags:
+
+One of the nice thing of Seaborn is how easy it is to update how these plots
+look. You can read more about that
+[here](https://seaborn.pydata.org/tutorial/aesthetics.html). For example, to
+increase the font size to get a plot more approriate for a talk, you can use:
+
+%% Cell type:code id: tags:
+
+``` 
+sns.set_context('talk')
+sns.violinplot(x='class', y='age', hue='sex', data=titanic, split=True,
+               order=('First', 'Second', 'Third'))
+```
+
+%% Cell type:markdown id: tags:
+
+## Summarizing the data (mean, std, etc.)
+
+There are a large number of built-in methods to summarize the observations in
+a Pandas `DataFrame`. Most of these will return a `Series` with the columns
+names as index:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.mean()
+```
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.quantile(0.75)
+```
+
+%% Cell type:markdown id: tags:
+
+One very useful one is `describe`, which gives an overview of many common
+summary measures
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.describe()
+```
+
+%% Cell type:markdown id: tags:
+
+For a more detailed exploration of the data, you might want to check
+[pandas_profiliing](https://pandas-profiling.github.io/pandas-profiling/docs/)
+(not installed in fslpython, so the following will not run in fslpython):
+
+%% Cell type:code id: tags:
+
+``` 
+from pandas_profiling import ProfileReport
+profile = ProfileReport(titanic, title='Titanic Report', html={'style':{'full_width':True}})
+profile.to_widgets()
+```
+
+%% Cell type:markdown id: tags:
+
+Note that non-numeric columns are ignored when summarizing data in this way.
+
+We can also define our own functions to apply to the columns (in this case we
+have to explicitly set the data types).
+
+%% Cell type:code id: tags:
+
+``` 
+def mad(series):
+    """
+    Computes the median absolute deviatation (MAD)
+
+    This is a outlier-resistant measure of the standard deviation
+    """
+    no_nan = series.dropna()
+    return np.median(abs(no_nan - np.nanmedian(no_nan)))
+
+titanic.select_dtypes(np.number).apply(mad)
+```
+
+%% Cell type:markdown id: tags:
+
+We can also provide multiple functions to the `apply` method (note that
+functions can be provided as strings)
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.select_dtypes(np.number).apply(['mean', np.median, np.std, mad])
+```
+
+%% Cell type:markdown id: tags:
+
+### Grouping by
+
+One of the more powerful features of is `groupby`, which splits the dataset on
+a categorical variable. The book contains a clear tutorial on that feature
+[here](https://jakevdp.github.io/PythonDataScienceHandbook/03.08-aggregation-and-grouping.html). You
+can check the pandas documentation
+[here](http://pandas.pydata.org/pandas-docs/stable/groupby.html) for a more
+formal introduction. One simple use is just to put it into a loop
+
+%% Cell type:code id: tags:
+
+``` 
+for cls, part_table in titanic.groupby('class'):
+    print(f'Mean fare in {cls.lower()} class: {part_table.fare.mean()}')
+```
+
+%% Cell type:markdown id: tags:
+
+However, it is more often combined with one of the aggregation functions
+discussed above as illustrated in this figure from the [Python data science
+handbook](https://jakevdp.github.io/PythonDataScienceHandbook/06.00-figure-code.html#Split-Apply-Combine)
+
+![group by image](group_by.png)
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby('class').mean()
+```
+
+%% Cell type:markdown id: tags:
+
+We can also group by multiple variables at once
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby(['class', 'survived']).mean()  # as always in pandas supply multiple column names as lists, not tuples
+```
+
+%% Cell type:markdown id: tags:
+
+When grouping it can help to use the `cut` method to split a continuous variable
+into a categorical one
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby(['class', pd.cut(titanic.age, bins=(0, 18, 50, np.inf))]).mean()
+```
+
+%% Cell type:markdown id: tags:
+
+We can use the `aggregate` method to apply a different function to each series
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby(['class', 'survived']).aggregate((np.median, mad))
+```
+
+%% Cell type:markdown id: tags:
+
+Note that both the index (on the left) and the column names (on the top) now
+have multiple levels. Such a multi-level index is referred to as `MultiIndex`.
+This does complicate selecting specific columns/rows. You can read more of using
+`MultiIndex` [here](http://pandas.pydata.org/pandas-docs/stable/advanced.html).
+The short version is that columns can be selected using direct indexing (as
+discussed above)
+
+%% Cell type:code id: tags:
+
+``` 
+df_full = titanic.groupby(['class', 'survived']).aggregate((np.median, mad))
+```
+
+%% Cell type:code id: tags:
+
+``` 
+df_full[('age', 'median')]  # selects median age column; note that the round brackets are optional
+```
+
+%% Cell type:code id: tags:
+
+``` 
+df_full['age']  # selects both age columns
+```
+
+%% Cell type:markdown id: tags:
+
+Remember that indexing based on the index was done through `loc`. The rest is
+the same as for the columns above
+
+%% Cell type:code id: tags:
+
+``` 
+df_full.loc[('First', 0)]
+```
+
+%% Cell type:code id: tags:
+
+``` 
+df_full.loc['First']
+```
+
+%% Cell type:markdown id: tags:
+
+More advanced use of the `MultiIndex` is possible through `xs`:
+
+%% Cell type:code id: tags:
+
+``` 
+df_full.xs(0, level='survived') # selects all the zero's from the survived index
+```
+
+%% Cell type:code id: tags:
+
+``` 
+df_full.xs('mad', axis=1, level=1) # selects mad from the second level in the columns (i.e., axis=1)
+```
+
+%% Cell type:markdown id: tags:
+
+## Reshaping tables
+
+If we were interested in how the survival rate depends on the class and sex of
+the passengers we could simply use a groupby:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby(['class', 'sex']).survived.mean()
+```
+
+%% Cell type:markdown id: tags:
+
+However, this single-column table is difficult to read. The reason for this is
+that the indexing is multi-leveled (called `MultiIndex` in pandas), while there
+is only a single column. We would like to move one of the levels in the index to
+the columns. This can be done using `stack`/`unstack`:
+
+- `unstack`: Moves one levels in the index to the columns
+- `stack`: Moves one of levels in the columns to the index
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby(['class', 'sex']).survived.mean().unstack('sex')
+```
+
+%% Cell type:markdown id: tags:
+
+The former table, where the different groups are defined in different rows, is
+often referred to as long-form. After unstacking the table is often referred to
+as wide-form as the different group (sex in this case) is now represented as
+different columns. In pandas some operations are easier on long-form tables
+(e.g., `groupby`) while others require wide_form tables (e.g., making scatter
+plots of two variables). You can go back and forth using `unstack` or `stack` as
+illustrated above, but as this is a crucial part of pandas there are many
+alternatives, such as `pivot_table`, `melt`, and `wide_to_long`, which we will
+discuss below.
+
+We can prettify the table further using seaborn
+
+%% Cell type:code id: tags:
+
+``` 
+ax = sns.heatmap(titanic.groupby(['class', 'sex']).survived.mean().unstack('sex'),
+                 annot=True)
+ax.set_title('survival rate')
+```
+
+%% Cell type:markdown id: tags:
+
+Note that there are also many ways to produce prettier tables in pandas (e.g.,
+color all the negative values). This is documented
+[here](http://pandas.pydata.org/pandas-docs/stable/style.html).
+
+Because this stacking/unstacking is fairly common after a groupby operation,
+there is a shortcut for it: `pivot_table`
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.pivot_table('survived', 'class', 'sex')
+```
+
+%% Cell type:markdown id: tags:
+
+As usual in pandas, where we can also provide multiple column names
+
+%% Cell type:code id: tags:
+
+``` 
+sns.heatmap(titanic.pivot_table('survived', ['class', 'embark_town'], ['sex', pd.cut(titanic.age, (0, 18, np.inf))]), annot=True)
+```
+
+%% Cell type:markdown id: tags:
+
+We can also change the function to be used to aggregate the data
+
+%% Cell type:code id: tags:
+
+``` 
+sns.heatmap(titanic.pivot_table('survived', ['class', 'embark_town'], ['sex', pd.cut(titanic.age, (0, 18, np.inf))],
+                                aggfunc='count'), annot=True)
+```
+
+%% Cell type:markdown id: tags:
+
+As in `groupby` the aggregation function can be a string of a common aggregation
+function, or any function that should be applied.
+
+We can even apply different aggregate functions to different columns
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.pivot_table(index='class', columns='sex',
+                    aggfunc={'survived': 'count', 'fare': np.mean}) # compute number of survivors and mean fare
+```
+
+%% Cell type:markdown id: tags:
+
+The opposite of `pivot_table` is `melt`. This can be used to change a wide-form
+table into a long-form table. This is not particularly useful on the titanic
+dataset, so let's create a new table where this might be useful. Let's say we
+have a dataset listing the FA and MD values in various WM tracts:
+
+%% Cell type:code id: tags:
+
+``` 
+tracts = ('Corpus callosum', 'Internal capsule', 'SLF', 'Arcuate fasciculus')
+df_wide = pd.DataFrame.from_dict(dict({'subject': list('ABCDEFGHIJ')}, **{
+    f'FA({tract})': np.random.rand(10) for tract in tracts }, **{
+    f'MD({tract})': np.random.rand(10) * 1e-3 for tract in tracts
+}))
+df_wide
+```
+
+%% Cell type:markdown id: tags:
+
+This wide-form table (i.e., all the information is in different columns) makes
+it hard to select just all the FA values or only the values associated with the
+SLF. For this it would be easier to list all the values in a single column.
+Most of the tools discussed above (e.g., `group_by` or `seaborn` plotting) work
+better with long-form data, which we can obtain from `melt`:
+
+%% Cell type:code id: tags:
+
+``` 
+df_long = df_wide.melt('subject', var_name='measurement', value_name='dti_value')
+df_long.head(12)
+```
+
+%% Cell type:markdown id: tags:
+
+We can see that `melt` took all the columns (we could also have specified a
+specific sub-set) and returned each measurement as a seperate row. We probably
+want to seperate the measurement column into the measurement type (FA or MD) and
+the tract name. Many string manipulation function are available in the
+`DataFrame` object under `DataFrame.str`
+([tutorial](http://pandas.pydata.org/pandas-docs/stable/text.html))
+
+%% Cell type:code id: tags:
+
+``` 
+df_long['variable'] = df_long.measurement.str.slice(0, 2)  # first two letters correspond to FA or MD
+df_long['tract'] = df_long.measurement.str.slice(3, -1)  # fourth till the second-to-last letter correspond to the tract
+df_long.head(12)
+```
+
+%% Cell type:markdown id: tags:
+
+Finally we probably do want the FA and MD variables as different columns.
+
+**Exercise**: Use `pivot_table` or `stack`/`unstack` to create a column for MD
+and FA.
+
+%% Cell type:code id: tags:
+
+``` 
+df_unstacked = df_long.
+```
+
+%% Cell type:markdown id: tags:
+
+We can now use the tools discussed above to visualize the table (`seaborn`) or
+to group the table based on tract (`groupby` or `pivot_table`).
+
+%% Cell type:code id: tags:
+
+``` 
+# feel free to analyze this random data in more detail
+```
+
+%% Cell type:markdown id: tags:
+
+In general pandas is better at handling long-form than wide-form data, although
+for better visualization of the data an intermediate format is often best. One
+exception is calculating a covariance (`DataFrame.cov`) or correlation
+(`DataFrame.corr`) matrices which computes the correlation between each column:
+
+%% Cell type:code id: tags:
+
+``` 
+sns.heatmap(df_wide.corr(), cmap=sns.diverging_palette(240, 10, s=99, n=300), )
+```
+
+%% Cell type:markdown id: tags:
+
+## Linear fitting (`statsmodels`)
+
+Linear fitting between the different columns is available through the
+[`statsmodels`](https://www.statsmodels.org/stable/index.html) library. A nice
+way to play around with a wide variety of possible models is to use R-style
+functions. The usage of the functions in `statsmodels` is described
+[here](https://www.statsmodels.org/dev/example_formulas.html). You can find a
+more detailed description of the R-style functions
+[here](https://patsy.readthedocs.io/en/latest/formulas.html#the-formula-
+language).
+
+In short these functions describe the linear model as a string. For example,
+`"y ~ x + a + x * a"` fits the variable `y` as a function of `x`, `a`, and the
+interaction between `x` and `a`. The intercept is included by default (you can
+add `"+ 0"` to remove it).
+
+%% Cell type:code id: tags:
+
+``` 
+result = smf.logit('survived ~ age + sex + age * sex', data=titanic).fit()
+print(result.summary())
+```
+
+%% Cell type:markdown id: tags:
+
+Note that `statsmodels` understands categorical variables and automatically
+replaces them with dummy variables.
+
+Above we used logistic regression, which is appropriate for the binary
+survival rate. A wide variety of linear models are available. Let's try a GLM,
+but assume that the fare is drawn from a Gamma distribution:
+
+%% Cell type:code id: tags:
+
+``` 
+age_dmean = titanic.age - titanic.age.mean()
+result = smf.glm('fare ~ age_dmean + embark_town', data=titanic).fit()
+print(result.summary())
+```
+
+%% Cell type:markdown id: tags:
+
+Cherbourg passengers clearly paid a lot more...
+
+
+Note that we did not actually add the `age_dmean` to the
+`DataFrame`. `statsmodels` (or more precisely the underlying
+[patsy](https://patsy.readthedocs.io/en/latest/) library) automatically
+extracted this from our environment. This can lead to confusing behaviour...
+
+# More reading
+
+Other useful features
+
+- [Concatenating and merging tables](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/08_combine_dataframes.html)
+- [Lots of time series support](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/09_timeseries.html)
+- [Rolling Window
+  functions](http://pandas.pydata.org/pandas-docs/stable/computation.html#window-
+  functions) for after you have meaningfully sorted your data
+- and much, much more
+%% Cell type:markdown id: tags:
+
+# Pandas
+
+Follow along online at: https://git.fmrib.ox.ac.uk/fsl/pytreat-practicals-2020/-/blob/master/talks/pandas/pandas.ipynb
+
+Pandas is a data analysis library focused on the cleaning and exploration of
+tabular data.
+
+Some useful links are:
+- [main website](https://pandas.pydata.org)
+- [documentation](http://pandas.pydata.org/pandas-docs/stable/)<sup>1</sup>
+- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)<sup>1</sup> by
+  Jake van der Plas
+- [List of Pandas tutorials](https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html)
+
+<sup>1</sup> This tutorial borrows heavily from the pandas documentation and
+the Python Data Science Handbook
+
+%% Cell type:code id: tags:
+
+``` 
+%pylab inline
+import pandas as pd  # pd is the usual abbreviation for pandas
+import matplotlib.pyplot as plt # matplotlib for plotting
+import seaborn as sns  # seaborn is the main plotting library for Pandas
+import statsmodels.api as sm  # statsmodels fits linear models to pandas data
+import statsmodels.formula.api as smf
+from IPython.display import Image
+sns.set()  # use the prettier seaborn plotting settings rather than the default matplotlib one
+```
+
+%% Cell type:markdown id: tags:
+
+> We will mostly be using `seaborn` instead of `matplotlib` for
+> visualisation. But `seaborn` is actually an extension to `matplotlib`, so we
+> are still using the latter under the hood.
+
+## Loading in data
+
+Pandas supports a wide range of I/O tools to load from text files, binary files,
+and SQL databases. You can find a table with all formats
+[here](http://pandas.pydata.org/pandas-docs/stable/io.html).
+
+%% Cell type:code id: tags:
+
+``` 
+titanic = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')
+titanic
+```
+
+%% Cell type:markdown id: tags:
+
+This loads the data into a
+[`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)
+object, which is the main object we will be interacting with in pandas. It
+represents a table of data.  The other file formats all start with
+`pd.read_{format}`.  Note that we can provide the URL to the dataset, rather
+than download it beforehand.
+
+We can write out the dataset using `dataframe.to_{format}(<filename>)`:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.to_csv('titanic_copy.csv', index=False)  # we set index to False to prevent pandas from storing the row names
+```
+
+%% Cell type:markdown id: tags:
+
+If you can not connect to the internet, you can run the command below to load
+this locally stored titanic dataset
+
+%% Cell type:code id: tags:
+
+``` 
+titanic = pd.read_csv('titanic.csv')
+titanic
+```
+
+%% Cell type:markdown id: tags:
+
+Note that the titanic dataset was also available to us as one of the standard
+datasets included with seaborn. We could load it from there using
+
+%% Cell type:code id: tags:
+
+``` 
+sns.load_dataset('titanic')
+```
+
+%% Cell type:markdown id: tags:
+
+`Dataframes` can also be created from other python objects, using
+`pd.DataFrame.from_{other type}`. The most useful of these is `from_dict`,
+which converts a mapping of the columns to a pandas `DataFrame` (i.e., table).
+
+%% Cell type:code id: tags:
+
+``` 
+pd.DataFrame.from_dict({
+    'random numbers': np.random.rand(5),
+    'sequence (int)': np.arange(5),
+    'sequence (float)': np.linspace(0, 5, 5),
+    'letters': list('abcde'),
+    'constant_value': 'same_value'
+})
+```
+
+%% Cell type:markdown id: tags:
+
+For many applications (e.g., ICA, machine learning input) you might want to
+extract your data as a numpy array. The underlying numpy array can be accessed
+using the `to_numpy` method
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.to_numpy()
+```
+
+%% Cell type:markdown id: tags:
+
+Note that the type of the returned array is the most common type (in this case
+object). If you just want the numeric parts of the table you can use
+`select_dtypes`, which selects specific columns based on their dtype:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.select_dtypes(include=np.number).to_numpy()
+```
+
+%% Cell type:markdown id: tags:
+
+Note that the numpy array has no information on the column names or row indices.
+Alternatively, when you want to include the categorical variables in your later
+analysis (e.g., for machine learning), you can extract dummy variables using:
+
+%% Cell type:code id: tags:
+
+``` 
+pd.get_dummies(titanic)
+```
+
+%% Cell type:markdown id: tags:
+
+## Accessing parts of the data
+
+[Documentation on indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html)
+
+### Selecting columns by name
+
+Single columns can be selected using the normal python indexing:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic['embark_town']
+```
+
+%% Cell type:markdown id: tags:
+
+If the column names are simple strings (not required) we can also access it
+directly as an attribute
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.embark_town
+```
+
+%% Cell type:markdown id: tags:
+
+Note that this returns a pandas
+[`Series`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html)
+rather than a `DataFrame` object. A `Series` is simply a 1-dimensional array
+representing a single column.  Multiple columns can be returned by providing a
+list of columns names. This will return a `DataFrame`:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic[['class', 'alive']]
+```
+
+%% Cell type:markdown id: tags:
+
+Note that you have to provide a list here (square brackets). If you provide a
+tuple (round brackets) pandas will think you are trying to access a single
+column that has that tuple as a name:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic[('class', 'alive')]
+```
+
+%% Cell type:markdown id: tags:
+
+In this case there is no column called `('class', 'alive')` leading to an
+error.  Later on we will see some uses to having columns named like this.
+
+### Indexing rows by name or integer
+
+Individual rows can be accessed based on their name (i.e., the index) or integer
+(i.e., which row it is in). In our current table this will give the same
+results. To ensure that these are different, let's sort our titanic dataset
+based on the passenger fare:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted = titanic.sort_values('fare')
+titanic_sorted
+```
+
+%% Cell type:markdown id: tags:
+
+Note that the re-sorting did not change the values in the index (i.e., left-most
+column).
+
+We can select the first row of this newly sorted table using `iloc`
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.iloc[0]
+```
+
+%% Cell type:markdown id: tags:
+
+We can select the row with the index 0 using
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.loc[0]
+```
+
+%% Cell type:markdown id: tags:
+
+Note that this gives the same passenger as the first row of the initial table
+before sorting
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.iloc[0]
+```
+
+%% Cell type:markdown id: tags:
+
+Another common way to access the first or last N rows of a table is using the
+head/tail methods
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.head(3)
+```
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.tail(3)
+```
+
+%% Cell type:markdown id: tags:
+
+Note that nearly all methods in pandas return a new `Dataframe`, which means
+that we can easily call another method on them
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.tail(10).head(5)  # select the first 5 of the last 10 passengers in the database
+```
+
+%% Cell type:code id: tags:
+
+``` 
+titanic_sorted.iloc[-10:-5]  # alternative way to get the same passengers
+```
+
+%% Cell type:markdown id: tags:
+
+**Exercise**: use sorting and tail/head or indexing to find the 10 youngest
+passengers on the titanic. Try to do this on a single line by chaining calls
+to the titanic `DataFrame` object
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.sort_values...
+```
+
+%% Cell type:markdown id: tags:
+
+### Indexing rows by value
+
+One final way to select specific columns is by their value
+
+%% Cell type:code id: tags:
+
+``` 
+titanic[titanic.sex == 'female']  # selects all females
+```
+
+%% Cell type:code id: tags:
+
+``` 
+# select all passengers older than 60 who departed from Southampton
+titanic[(titanic.age > 60) & (titanic['embark_town'] == 'Southampton')]
+```
+
+%% Cell type:markdown id: tags:
+
+Note that this required typing `titanic` quite often. A quicker way to get the
+same result is using the `query` method, which is described in detail
+[here](http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-query-method)
+(note that using the `query` method is also faster and uses a lot less
+memory).
+
+> You may have trouble using the `query` method with columns which have
+a name that cannot be used as a Python identifier.
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.query('(age > 60) & (embark_town == "Southampton")')
+```
+
+%% Cell type:markdown id: tags:
+
+When selecting a categorical multiple options from a categorical values you
+might want to use `isin`:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic[titanic['class'].isin(['First','Second'])]
+```
+
+%% Cell type:markdown id: tags:
+
+Particularly useful when selecting data like this is the `isna` method which
+finds all missing data
+
+%% Cell type:code id: tags:
+
+``` 
+titanic[~titanic.age.isna()]  # select first few passengers whose age is not N/A
+```
+
+%% Cell type:markdown id: tags:
+
+This removing of missing numbers is so common that it has is own method
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.dropna()  # drops all passengers that have some datapoint missing
+```
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.dropna(subset=['age', 'fare'])  # Only drop passengers with missing ages or fares
+```
+
+%% Cell type:markdown id: tags:
+
+**Exercise**: use sorting, indexing by value, `dropna` and `tail`/`head` or
+indexing to find the 10 oldest female passengers on the titanic. Try to do
+this on a single line by chaining calls to the titanic `DataFrame` object
+
+%% Cell type:code id: tags:
+
+``` 
+titanic...
+```
+
+%% Cell type:markdown id: tags:
+
+## Plotting the data
+
+Before we start analyzing the data, let's play around with visualizing it.
+Pandas does have some basic built-in plotting options:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.fare.hist(bins=20, log=True)
+```
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.age.plot()
+```
+
+%% Cell type:markdown id: tags:
+
+To plot all variables simply call `plot` or `hist` on the full dataframe
+rather than a single Series (i.e., column). You might want to set `subplots=True`
+to plot each variable in a different subplot.
+
+Individual Series are essentially 1D arrays, so we can use them as such in
+`matplotlib`
+
+%% Cell type:code id: tags:
+
+``` 
+plt.scatter(titanic.age, titanic.fare)
+```
+
+%% Cell type:markdown id: tags:
+
+However, for most purposes much nicer plots can be obtained using
+[Seaborn](https://seaborn.pydata.org). Seaborn has support to produce plots
+showing the
+[univariate](https://seaborn.pydata.org/tutorial/distributions.html#plotting-univariate-distributions)
+or
+[bivariate](https://seaborn.pydata.org/tutorial/distributions.html#plotting-bivariate-distributions)
+distribution of data in a single or a grid of plots.  Most of the seaborn
+plotting functions expect to get a pandas `DataFrame` (although they will work
+with Numpy arrays as well). So we can plot age vs. fare like:
+
+%% Cell type:code id: tags:
+
+``` 
+sns.jointplot('age', 'fare', data=titanic)
+```
+
+%% Cell type:markdown id: tags:
+
+**Exercise**: check the documentation from `sns.jointplot` (hover the mouse
+over the text `jointplot` and press shift-tab) to find out how to turn the
+scatter plot into a density (kde) map
+
+%% Cell type:code id: tags:
+
+``` 
+sns.jointplot('age', 'fare', data=titanic, ...)
+```
+
+%% Cell type:markdown id: tags:
+
+Here is just a brief example of how we can use multiple columns to illustrate
+the data in more detail
+
+%% Cell type:code id: tags:
+
+``` 
+sns.relplot(x='age', y='fare', col='class', hue='sex', data=titanic,
+           col_order=('First', 'Second', 'Third'))
+```
+
+%% Cell type:markdown id: tags:
+
+**Exercise**: Split the plot above into two rows with the first row including
+the passengers who survived and the second row those who did not (you might
+have to check the documentation again by using shift-tab while overing the
+mouse over `relplot`)
+
+%% Cell type:code id: tags:
+
+``` 
+sns.relplot(x='age', y='fare', col='class', hue='sex', data=titanic,
+           col_order=('First', 'Second', 'Third')...)
+```
+
+%% Cell type:markdown id: tags:
+
+One of the nice thing of Seaborn is how easy it is to update how these plots
+look. You can read more about that
+[here](https://seaborn.pydata.org/tutorial/aesthetics.html). For example, to
+increase the font size to get a plot more approriate for a talk, you can use:
+
+%% Cell type:code id: tags:
+
+``` 
+sns.set_context('talk')
+sns.violinplot(x='class', y='age', hue='sex', data=titanic, split=True,
+               order=('First', 'Second', 'Third'))
+```
+
+%% Cell type:markdown id: tags:
+
+## Summarizing the data (mean, std, etc.)
+
+There are a large number of built-in methods to summarize the observations in
+a Pandas `DataFrame`. Most of these will return a `Series` with the columns
+names as index:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.mean()
+```
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.quantile(0.75)
+```
+
+%% Cell type:markdown id: tags:
+
+One very useful one is `describe`, which gives an overview of many common
+summary measures
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.describe()
+```
+
+%% Cell type:markdown id: tags:
+
+For a more detailed exploration of the data, you might want to check
+[pandas_profiliing](https://pandas-profiling.github.io/pandas-profiling/docs/)
+(not installed in fslpython, so the following will not run in fslpython):
+
+%% Cell type:code id: tags:
+
+``` 
+from pandas_profiling import ProfileReport
+profile = ProfileReport(titanic, title='Titanic Report', html={'style':{'full_width':True}})
+profile.to_widgets()
+```
+
+%% Cell type:markdown id: tags:
+
+Note that non-numeric columns are ignored when summarizing data in this way.
+
+We can also define our own functions to apply to the columns (in this case we
+have to explicitly set the data types).
+
+%% Cell type:code id: tags:
+
+``` 
+def mad(series):
+    """
+    Computes the median absolute deviatation (MAD)
+
+    This is a outlier-resistant measure of the standard deviation
+    """
+    no_nan = series.dropna()
+    return np.median(abs(no_nan - np.nanmedian(no_nan)))
+
+titanic.select_dtypes(np.number).apply(mad)
+```
+
+%% Cell type:markdown id: tags:
+
+We can also provide multiple functions to the `apply` method (note that
+functions can be provided as strings)
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.select_dtypes(np.number).apply(['mean', np.median, np.std, mad])
+```
+
+%% Cell type:markdown id: tags:
+
+### Grouping by
+
+One of the more powerful features of is `groupby`, which splits the dataset on
+a categorical variable. The book contains a clear tutorial on that feature
+[here](https://jakevdp.github.io/PythonDataScienceHandbook/03.08-aggregation-and-grouping.html). You
+can check the pandas documentation
+[here](http://pandas.pydata.org/pandas-docs/stable/groupby.html) for a more
+formal introduction. One simple use is just to put it into a loop
+
+%% Cell type:code id: tags:
+
+``` 
+for cls, part_table in titanic.groupby('class'):
+    print(f'Mean fare in {cls.lower()} class: {part_table.fare.mean()}')
+```
+
+%% Cell type:markdown id: tags:
+
+However, it is more often combined with one of the aggregation functions
+discussed above as illustrated in this figure from the [Python data science
+handbook](https://jakevdp.github.io/PythonDataScienceHandbook/06.00-figure-code.html#Split-Apply-Combine)
+
+![group by image](group_by.png)
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby('class').mean()
+```
+
+%% Cell type:markdown id: tags:
+
+We can also group by multiple variables at once
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby(['class', 'survived']).mean()  # as always in pandas supply multiple column names as lists, not tuples
+```
+
+%% Cell type:markdown id: tags:
+
+When grouping it can help to use the `cut` method to split a continuous variable
+into a categorical one
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby(['class', pd.cut(titanic.age, bins=(0, 18, 50, np.inf))]).mean()
+```
+
+%% Cell type:markdown id: tags:
+
+We can use the `aggregate` method to apply a different function to each series
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby(['class', 'survived']).aggregate((np.median, mad))
+```
+
+%% Cell type:markdown id: tags:
+
+Note that both the index (on the left) and the column names (on the top) now
+have multiple levels. Such a multi-level index is referred to as `MultiIndex`.
+This does complicate selecting specific columns/rows. You can read more of using
+`MultiIndex` [here](http://pandas.pydata.org/pandas-docs/stable/advanced.html).
+The short version is that columns can be selected using direct indexing (as
+discussed above)
+
+%% Cell type:code id: tags:
+
+``` 
+df_full = titanic.groupby(['class', 'survived']).aggregate((np.median, mad))
+```
+
+%% Cell type:code id: tags:
+
+``` 
+df_full[('age', 'median')]  # selects median age column; note that the round brackets are optional
+```
+
+%% Cell type:code id: tags:
+
+``` 
+df_full['age']  # selects both age columns
+```
+
+%% Cell type:markdown id: tags:
+
+Remember that indexing based on the index was done through `loc`. The rest is
+the same as for the columns above
+
+%% Cell type:code id: tags:
+
+``` 
+df_full.loc[('First', 0)]
+```
+
+%% Cell type:code id: tags:
+
+``` 
+df_full.loc['First']
+```
+
+%% Cell type:markdown id: tags:
+
+More advanced use of the `MultiIndex` is possible through `xs`:
+
+%% Cell type:code id: tags:
+
+``` 
+df_full.xs(0, level='survived') # selects all the zero's from the survived index
+```
+
+%% Cell type:code id: tags:
+
+``` 
+df_full.xs('mad', axis=1, level=1) # selects mad from the second level in the columns (i.e., axis=1)
+```
+
+%% Cell type:markdown id: tags:
+
+## Reshaping tables
+
+If we were interested in how the survival rate depends on the class and sex of
+the passengers we could simply use a groupby:
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby(['class', 'sex']).survived.mean()
+```
+
+%% Cell type:markdown id: tags:
+
+However, this single-column table is difficult to read. The reason for this is
+that the indexing is multi-leveled (called `MultiIndex` in pandas), while there
+is only a single column. We would like to move one of the levels in the index to
+the columns. This can be done using `stack`/`unstack`:
+
+- `unstack`: Moves one levels in the index to the columns
+- `stack`: Moves one of levels in the columns to the index
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.groupby(['class', 'sex']).survived.mean().unstack('sex')
+```
+
+%% Cell type:markdown id: tags:
+
+The former table, where the different groups are defined in different rows, is
+often referred to as long-form. After unstacking the table is often referred to
+as wide-form as the different group (sex in this case) is now represented as
+different columns. In pandas some operations are easier on long-form tables
+(e.g., `groupby`) while others require wide_form tables (e.g., making scatter
+plots of two variables). You can go back and forth using `unstack` or `stack` as
+illustrated above, but as this is a crucial part of pandas there are many
+alternatives, such as `pivot_table`, `melt`, and `wide_to_long`, which we will
+discuss below.
+
+We can prettify the table further using seaborn
+
+%% Cell type:code id: tags:
+
+``` 
+ax = sns.heatmap(titanic.groupby(['class', 'sex']).survived.mean().unstack('sex'),
+                 annot=True)
+ax.set_title('survival rate')
+```
+
+%% Cell type:markdown id: tags:
+
+Note that there are also many ways to produce prettier tables in pandas (e.g.,
+color all the negative values). This is documented
+[here](http://pandas.pydata.org/pandas-docs/stable/style.html).
+
+Because this stacking/unstacking is fairly common after a groupby operation,
+there is a shortcut for it: `pivot_table`
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.pivot_table('survived', 'class', 'sex')
+```
+
+%% Cell type:markdown id: tags:
+
+As usual in pandas, where we can also provide multiple column names
+
+%% Cell type:code id: tags:
+
+``` 
+sns.heatmap(titanic.pivot_table('survived', ['class', 'embark_town'], ['sex', pd.cut(titanic.age, (0, 18, np.inf))]), annot=True)
+```
+
+%% Cell type:markdown id: tags:
+
+We can also change the function to be used to aggregate the data
+
+%% Cell type:code id: tags:
+
+``` 
+sns.heatmap(titanic.pivot_table('survived', ['class', 'embark_town'], ['sex', pd.cut(titanic.age, (0, 18, np.inf))],
+                                aggfunc='count'), annot=True)
+```
+
+%% Cell type:markdown id: tags:
+
+As in `groupby` the aggregation function can be a string of a common aggregation
+function, or any function that should be applied.
+
+We can even apply different aggregate functions to different columns
+
+%% Cell type:code id: tags:
+
+``` 
+titanic.pivot_table(index='class', columns='sex',
+                    aggfunc={'survived': 'count', 'fare': np.mean}) # compute number of survivors and mean fare
+```
+
+%% Cell type:markdown id: tags:
+
+The opposite of `pivot_table` is `melt`. This can be used to change a wide-form
+table into a long-form table. This is not particularly useful on the titanic
+dataset, so let's create a new table where this might be useful. Let's say we
+have a dataset listing the FA and MD values in various WM tracts:
+
+%% Cell type:code id: tags:
+
+``` 
+tracts = ('Corpus callosum', 'Internal capsule', 'SLF', 'Arcuate fasciculus')
+df_wide = pd.DataFrame.from_dict(dict({'subject': list('ABCDEFGHIJ')}, **{
+    f'FA({tract})': np.random.rand(10) for tract in tracts }, **{
+    f'MD({tract})': np.random.rand(10) * 1e-3 for tract in tracts
+}))
+df_wide
+```
+
+%% Cell type:markdown id: tags:
+
+This wide-form table (i.e., all the information is in different columns) makes
+it hard to select just all the FA values or only the values associated with the
+SLF. For this it would be easier to list all the values in a single column.
+Most of the tools discussed above (e.g., `group_by` or `seaborn` plotting) work
+better with long-form data, which we can obtain from `melt`:
+
+%% Cell type:code id: tags:
+
+``` 
+df_long = df_wide.melt('subject', var_name='measurement', value_name='dti_value')
+df_long.head(12)
+```
+
+%% Cell type:markdown id: tags:
+
+We can see that `melt` took all the columns (we could also have specified a
+specific sub-set) and returned each measurement as a seperate row. We probably
+want to seperate the measurement column into the measurement type (FA or MD) and
+the tract name. Many string manipulation function are available in the
+`DataFrame` object under `DataFrame.str`
+([tutorial](http://pandas.pydata.org/pandas-docs/stable/text.html))
+
+%% Cell type:code id: tags:
+
+``` 
+df_long['variable'] = df_long.measurement.str.slice(0, 2)  # first two letters correspond to FA or MD
+df_long['tract'] = df_long.measurement.str.slice(3, -1)  # fourth till the second-to-last letter correspond to the tract
+df_long.head(12)
+```
+
+%% Cell type:markdown id: tags:
+
+Finally we probably do want the FA and MD variables as different columns.
+
+**Exercise**: Use `pivot_table` or `stack`/`unstack` to create a column for MD
+and FA.
+
+%% Cell type:code id: tags:
+
+``` 
+df_unstacked = df_long.
+```
+
+%% Cell type:markdown id: tags:
+
+We can now use the tools discussed above to visualize the table (`seaborn`) or
+to group the table based on tract (`groupby` or `pivot_table`).
+
+%% Cell type:code id: tags:
+
+``` 
+# feel free to analyze this random data in more detail
+```
+
+%% Cell type:markdown id: tags:
+
+In general pandas is better at handling long-form than wide-form data, although
+for better visualization of the data an intermediate format is often best. One
+exception is calculating a covariance (`DataFrame.cov`) or correlation
+(`DataFrame.corr`) matrices which computes the correlation between each column:
+
+%% Cell type:code id: tags:
+
+``` 
+sns.heatmap(df_wide.corr(), cmap=sns.diverging_palette(240, 10, s=99, n=300), )
+```
+
+%% Cell type:markdown id: tags:
+
+## Linear fitting (`statsmodels`)
+
+Linear fitting between the different columns is available through the
+[`statsmodels`](https://www.statsmodels.org/stable/index.html) library. A nice
+way to play around with a wide variety of possible models is to use R-style
+functions. The usage of the functions in `statsmodels` is described
+[here](https://www.statsmodels.org/dev/example_formulas.html). You can find a
+more detailed description of the R-style functions
+[here](https://patsy.readthedocs.io/en/latest/formulas.html#the-formula-
+language).
+
+In short these functions describe the linear model as a string. For example,
+`"y ~ x + a + x * a"` fits the variable `y` as a function of `x`, `a`, and the
+interaction between `x` and `a`. The intercept is included by default (you can
+add `"+ 0"` to remove it).
+
+%% Cell type:code id: tags:
+
+``` 
+result = smf.logit('survived ~ age + sex + age * sex', data=titanic).fit()
+print(result.summary())
+```
+
+%% Cell type:markdown id: tags:
+
+Note that `statsmodels` understands categorical variables and automatically
+replaces them with dummy variables.
+
+Above we used logistic regression, which is appropriate for the binary
+survival rate. A wide variety of linear models are available. Let's try a GLM,
+but assume that the fare is drawn from a Gamma distribution:
+
+%% Cell type:code id: tags:
+
+``` 
+age_dmean = titanic.age - titanic.age.mean()
+result = smf.glm('fare ~ age_dmean + embark_town', data=titanic).fit()
+print(result.summary())
+```
+
+%% Cell type:markdown id: tags:
+
+Cherbourg passengers clearly paid a lot more...
+
+
+Note that we did not actually add the `age_dmean` to the
+`DataFrame`. `statsmodels` (or more precisely the underlying
+[patsy](https://patsy.readthedocs.io/en/latest/) library) automatically
+extracted this from our environment. This can lead to confusing behaviour...
+
+# More reading
+
+Other useful features
+
+- [Concatenating and merging tables](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/08_combine_dataframes.html)
+- [Lots of time series support](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/09_timeseries.html)
+- [Rolling Window
+  functions](http://pandas.pydata.org/pandas-docs/stable/computation.html#window-
+  functions) for after you have meaningfully sorted your data
+- and much, much more
--- a/talks/pandas/pandas.md
+++ b/talks/pandas/pandas.md
+# Pandas
+
+Follow along online at: https://git.fmrib.ox.ac.uk/fsl/pytreat-practicals-2020/-/blob/master/talks/pandas/pandas.ipynb
+
+Pandas is a data analysis library focused on the cleaning and exploration of
+tabular data.
+
+Some useful links are:
+- [main website](https://pandas.pydata.org)
+- [documentation](http://pandas.pydata.org/pandas-docs/stable/)<sup>1</sup>
+- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)<sup>1</sup> by
+  Jake van der Plas
+- [List of Pandas tutorials](https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html)
+
+<sup>1</sup> This tutorial borrows heavily from the pandas documentation and
+the Python Data Science Handbook
+
+```
+%pylab inline
+import pandas as pd  # pd is the usual abbreviation for pandas
+import matplotlib.pyplot as plt # matplotlib for plotting
+import seaborn as sns  # seaborn is the main plotting library for Pandas
+import statsmodels.api as sm  # statsmodels fits linear models to pandas data
+import statsmodels.formula.api as smf
+from IPython.display import Image
+sns.set()  # use the prettier seaborn plotting settings rather than the default matplotlib one
+```
+
+> We will mostly be using `seaborn` instead of `matplotlib` for
+> visualisation. But `seaborn` is actually an extension to `matplotlib`, so we
+> are still using the latter under the hood.
+
+## Loading in data
+
+Pandas supports a wide range of I/O tools to load from text files, binary files,
+and SQL databases. You can find a table with all formats
+[here](http://pandas.pydata.org/pandas-docs/stable/io.html).
+
+```
+titanic = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')
+titanic
+```
+
+This loads the data into a
+[`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)
+object, which is the main object we will be interacting with in pandas. It
+represents a table of data.  The other file formats all start with
+`pd.read_{format}`.  Note that we can provide the URL to the dataset, rather
+than download it beforehand.
+
+We can write out the dataset using `dataframe.to_{format}(<filename>)`:
+
+```
+titanic.to_csv('titanic_copy.csv', index=False)  # we set index to False to prevent pandas from storing the row names
+```
+
+If you can not connect to the internet, you can run the command below to load
+this locally stored titanic dataset
+
+```
+titanic = pd.read_csv('titanic.csv')
+titanic
+```
+
+Note that the titanic dataset was also available to us as one of the standard
+datasets included with seaborn. We could load it from there using
+
+```
+sns.load_dataset('titanic')
+```
+
+`Dataframes` can also be created from other python objects, using
+`pd.DataFrame.from_{other type}`. The most useful of these is `from_dict`,
+which converts a mapping of the columns to a pandas `DataFrame` (i.e., table).
+
+```
+pd.DataFrame.from_dict({
+    'random numbers': np.random.rand(5),
+    'sequence (int)': np.arange(5),
+    'sequence (float)': np.linspace(0, 5, 5),
+    'letters': list('abcde'),
+    'constant_value': 'same_value'
+})
+```
+
+For many applications (e.g., ICA, machine learning input) you might want to
+extract your data as a numpy array. The underlying numpy array can be accessed
+using the `to_numpy` method
+
+```
+titanic.to_numpy()
+```
+
+Note that the type of the returned array is the most common type (in this case
+object). If you just want the numeric parts of the table you can use
+`select_dtypes`, which selects specific columns based on their dtype:
+
+```
+titanic.select_dtypes(include=np.number).to_numpy()
+```
+
+Note that the numpy array has no information on the column names or row indices.
+Alternatively, when you want to include the categorical variables in your later
+analysis (e.g., for machine learning), you can extract dummy variables using:
+
+```
+pd.get_dummies(titanic)
+```
+
+## Accessing parts of the data
+
+[Documentation on indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html)
+
+### Selecting columns by name
+
+Single columns can be selected using the normal python indexing:
+
+```
+titanic['embark_town']
+```
+
+If the column names are simple strings (not required) we can also access it
+directly as an attribute
+
+```
+titanic.embark_town
+```
+
+Note that this returns a pandas
+[`Series`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html)
+rather than a `DataFrame` object. A `Series` is simply a 1-dimensional array
+representing a single column.  Multiple columns can be returned by providing a
+list of columns names. This will return a `DataFrame`:
+
+```
+titanic[['class', 'alive']]
+```
+
+Note that you have to provide a list here (square brackets). If you provide a
+tuple (round brackets) pandas will think you are trying to access a single
+column that has that tuple as a name:
+
+```
+titanic[('class', 'alive')]
+```
+
+In this case there is no column called `('class', 'alive')` leading to an
+error.  Later on we will see some uses to having columns named like this.
+
+### Indexing rows by name or integer
+
+Individual rows can be accessed based on their name (i.e., the index) or integer
+(i.e., which row it is in). In our current table this will give the same
+results. To ensure that these are different, let's sort our titanic dataset
+based on the passenger fare:
+
+```
+titanic_sorted = titanic.sort_values('fare')
+titanic_sorted
+```
+
+Note that the re-sorting did not change the values in the index (i.e., left-most
+column).
+
+We can select the first row of this newly sorted table using `iloc`
+
+```
+titanic_sorted.iloc[0]
+```
+
+We can select the row with the index 0 using
+
+```
+titanic_sorted.loc[0]
+```
+
+Note that this gives the same passenger as the first row of the initial table
+before sorting
+
+```
+titanic.iloc[0]
+```
+
+Another common way to access the first or last N rows of a table is using the
+head/tail methods
+
+```
+titanic_sorted.head(3)
+```
+
+```
+titanic_sorted.tail(3)
+```
+
+Note that nearly all methods in pandas return a new `Dataframe`, which means
+that we can easily call another method on them
+
+```
+titanic_sorted.tail(10).head(5)  # select the first 5 of the last 10 passengers in the database
+```
+
+```
+titanic_sorted.iloc[-10:-5]  # alternative way to get the same passengers
+```
+
+**Exercise**: use sorting and tail/head or indexing to find the 10 youngest
+passengers on the titanic. Try to do this on a single line by chaining calls
+to the titanic `DataFrame` object
+
+```{.python .input}
+titanic.sort_values...
+```
+
+### Indexing rows by value
+
+One final way to select specific columns is by their value
+
+```
+titanic[titanic.sex == 'female']  # selects all females
+```
+
+```
+# select all passengers older than 60 who departed from Southampton
+titanic[(titanic.age > 60) & (titanic['embark_town'] == 'Southampton')]
+```
+
+Note that this required typing `titanic` quite often. A quicker way to get the
+same result is using the `query` method, which is described in detail
+[here](http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-query-method)
+(note that using the `query` method is also faster and uses a lot less
+memory).
+
+> You may have trouble using the `query` method with columns which have
+a name that cannot be used as a Python identifier.
+
+```
+titanic.query('(age > 60) & (embark_town == "Southampton")')
+```
+
+When selecting a categorical multiple options from a categorical values you 
+might want to use `isin`:
+```
+titanic[titanic['class'].isin(['First','Second'])]
+```
+
+Particularly useful when selecting data like this is the `isna` method which
+finds all missing data
+
+```
+titanic[~titanic.age.isna()]  # select first few passengers whose age is not N/A
+```
+
+This removing of missing numbers is so common that it has is own method
+
+```
+titanic.dropna()  # drops all passengers that have some datapoint missing
+```
+
+```
+titanic.dropna(subset=['age', 'fare'])  # Only drop passengers with missing ages or fares
+```
+
+**Exercise**: use sorting, indexing by value, `dropna` and `tail`/`head` or
+indexing to find the 10 oldest female passengers on the titanic. Try to do
+this on a single line by chaining calls to the titanic `DataFrame` object
+
+```
+titanic...
+```
+
+## Plotting the data
+
+Before we start analyzing the data, let's play around with visualizing it.
+Pandas does have some basic built-in plotting options:
+
+```
+titanic.fare.hist(bins=20, log=True)
+```
+
+```
+titanic.age.plot()
+```
+
+To plot all variables simply call `plot` or `hist` on the full dataframe
+rather than a single Series (i.e., column). You might want to set `subplots=True`
+to plot each variable in a different subplot.
+
+Individual Series are essentially 1D arrays, so we can use them as such in
+`matplotlib`
+
+```
+plt.scatter(titanic.age, titanic.fare)
+```
+
+However, for most purposes much nicer plots can be obtained using
+[Seaborn](https://seaborn.pydata.org). Seaborn has support to produce plots
+showing the
+[univariate](https://seaborn.pydata.org/tutorial/distributions.html#plotting-univariate-distributions)
+or
+[bivariate](https://seaborn.pydata.org/tutorial/distributions.html#plotting-bivariate-distributions)
+distribution of data in a single or a grid of plots.  Most of the seaborn
+plotting functions expect to get a pandas `DataFrame` (although they will work
+with Numpy arrays as well). So we can plot age vs. fare like:
+
+```
+sns.jointplot('age', 'fare', data=titanic)
+```
+
+**Exercise**: check the documentation from `sns.jointplot` (hover the mouse
+over the text `jointplot` and press shift-tab) to find out how to turn the
+scatter plot into a density (kde) map
+
+```
+sns.jointplot('age', 'fare', data=titanic, ...)
+```
+
+Here is just a brief example of how we can use multiple columns to illustrate
+the data in more detail
+
+```
+sns.relplot(x='age', y='fare', col='class', hue='sex', data=titanic,
+           col_order=('First', 'Second', 'Third'))
+```
+
+**Exercise**: Split the plot above into two rows with the first row including
+the passengers who survived and the second row those who did not (you might
+have to check the documentation again by using shift-tab while overing the
+mouse over `relplot`)
+
+```
+sns.relplot(x='age', y='fare', col='class', hue='sex', data=titanic,
+           col_order=('First', 'Second', 'Third')...)
+```
+
+One of the nice thing of Seaborn is how easy it is to update how these plots
+look. You can read more about that
+[here](https://seaborn.pydata.org/tutorial/aesthetics.html). For example, to
+increase the font size to get a plot more approriate for a talk, you can use:
+
+```
+sns.set_context('talk')
+sns.violinplot(x='class', y='age', hue='sex', data=titanic, split=True,
+               order=('First', 'Second', 'Third'))
+```
+
+## Summarizing the data (mean, std, etc.)
+
+There are a large number of built-in methods to summarize the observations in
+a Pandas `DataFrame`. Most of these will return a `Series` with the columns
+names as index:
+
+```
+titanic.mean()
+```
+
+```
+titanic.quantile(0.75)
+```
+
+One very useful one is `describe`, which gives an overview of many common
+summary measures
+
+```
+titanic.describe()
+```
+
+For a more detailed exploration of the data, you might want to check 
+[pandas_profiliing](https://pandas-profiling.github.io/pandas-profiling/docs/)
+(not installed in fslpython, so the following will not run in fslpython):
+
+```
+from pandas_profiling import ProfileReport
+profile = ProfileReport(titanic, title='Titanic Report', html={'style':{'full_width':True}})
+profile.to_widgets()
+```
+
+Note that non-numeric columns are ignored when summarizing data in this way.
+
+We can also define our own functions to apply to the columns (in this case we
+have to explicitly set the data types).
+
+```
+def mad(series):
+    """
+    Computes the median absolute deviatation (MAD)
+
+    This is a outlier-resistant measure of the standard deviation
+    """
+    no_nan = series.dropna()
+    return np.median(abs(no_nan - np.nanmedian(no_nan)))
+
+titanic.select_dtypes(np.number).apply(mad)
+```
+
+We can also provide multiple functions to the `apply` method (note that
+functions can be provided as strings)
+
+```
+titanic.select_dtypes(np.number).apply(['mean', np.median, np.std, mad])
+```
+
+### Grouping by
+
+One of the more powerful features of is `groupby`, which splits the dataset on
+a categorical variable. The book contains a clear tutorial on that feature
+[here](https://jakevdp.github.io/PythonDataScienceHandbook/03.08-aggregation-and-grouping.html). You
+can check the pandas documentation
+[here](http://pandas.pydata.org/pandas-docs/stable/groupby.html) for a more
+formal introduction. One simple use is just to put it into a loop
+
+```
+for cls, part_table in titanic.groupby('class'):
+    print(f'Mean fare in {cls.lower()} class: {part_table.fare.mean()}')
+```
+
+However, it is more often combined with one of the aggregation functions
+discussed above as illustrated in this figure from the [Python data science
+handbook](https://jakevdp.github.io/PythonDataScienceHandbook/06.00-figure-code.html#Split-Apply-Combine)
+
+![group by image](group_by.png)
+
+```
+titanic.groupby('class').mean()
+```
+
+We can also group by multiple variables at once
+
+```
+titanic.groupby(['class', 'survived']).mean()  # as always in pandas supply multiple column names as lists, not tuples
+```
+
+When grouping it can help to use the `cut` method to split a continuous variable
+into a categorical one
+
+```
+titanic.groupby(['class', pd.cut(titanic.age, bins=(0, 18, 50, np.inf))]).mean()
+```
+
+We can use the `aggregate` method to apply a different function to each series
+
+```
+titanic.groupby(['class', 'survived']).aggregate((np.median, mad))
+```
+
+Note that both the index (on the left) and the column names (on the top) now
+have multiple levels. Such a multi-level index is referred to as `MultiIndex`.
+This does complicate selecting specific columns/rows. You can read more of using
+`MultiIndex` [here](http://pandas.pydata.org/pandas-docs/stable/advanced.html).
+The short version is that columns can be selected using direct indexing (as
+discussed above)
+
+```
+df_full = titanic.groupby(['class', 'survived']).aggregate((np.median, mad))
+```
+
+```
+df_full[('age', 'median')]  # selects median age column; note that the round brackets are optional
+```
+
+```
+df_full['age']  # selects both age columns
+```
+
+Remember that indexing based on the index was done through `loc`. The rest is
+the same as for the columns above
+
+```
+df_full.loc[('First', 0)]
+```
+
+```
+df_full.loc['First']
+
+```
+
+More advanced use of the `MultiIndex` is possible through `xs`:
+
+```
+df_full.xs(0, level='survived') # selects all the zero's from the survived index
+```
+
+```
+df_full.xs('mad', axis=1, level=1) # selects mad from the second level in the columns (i.e., axis=1)
+```
+
+## Reshaping tables
+
+If we were interested in how the survival rate depends on the class and sex of
+the passengers we could simply use a groupby:
+
+```
+titanic.groupby(['class', 'sex']).survived.mean()
+```
+
+However, this single-column table is difficult to read. The reason for this is
+that the indexing is multi-leveled (called `MultiIndex` in pandas), while there
+is only a single column. We would like to move one of the levels in the index to
+the columns. This can be done using `stack`/`unstack`:
+
+- `unstack`: Moves one levels in the index to the columns
+- `stack`: Moves one of levels in the columns to the index
+
+```
+titanic.groupby(['class', 'sex']).survived.mean().unstack('sex')
+```
+
+The former table, where the different groups are defined in different rows, is
+often referred to as long-form. After unstacking the table is often referred to
+as wide-form as the different group (sex in this case) is now represented as
+different columns. In pandas some operations are easier on long-form tables
+(e.g., `groupby`) while others require wide_form tables (e.g., making scatter
+plots of two variables). You can go back and forth using `unstack` or `stack` as
+illustrated above, but as this is a crucial part of pandas there are many
+alternatives, such as `pivot_table`, `melt`, and `wide_to_long`, which we will
+discuss below.
+
+We can prettify the table further using seaborn
+
+```
+ax = sns.heatmap(titanic.groupby(['class', 'sex']).survived.mean().unstack('sex'),
+                 annot=True)
+ax.set_title('survival rate')
+```
+
+Note that there are also many ways to produce prettier tables in pandas (e.g.,
+color all the negative values). This is documented
+[here](http://pandas.pydata.org/pandas-docs/stable/style.html).
+
+Because this stacking/unstacking is fairly common after a groupby operation,
+there is a shortcut for it: `pivot_table`
+
+```
+titanic.pivot_table('survived', 'class', 'sex')
+```
+
+As usual in pandas, where we can also provide multiple column names
+
+```
+sns.heatmap(titanic.pivot_table('survived', ['class', 'embark_town'], ['sex', pd.cut(titanic.age, (0, 18, np.inf))]), annot=True)
+```
+
+We can also change the function to be used to aggregate the data
+
+```
+sns.heatmap(titanic.pivot_table('survived', ['class', 'embark_town'], ['sex', pd.cut(titanic.age, (0, 18, np.inf))],
+                                aggfunc='count'), annot=True)
+```
+
+As in `groupby` the aggregation function can be a string of a common aggregation
+function, or any function that should be applied.
+
+We can even apply different aggregate functions to different columns
+
+```
+titanic.pivot_table(index='class', columns='sex',
+                    aggfunc={'survived': 'count', 'fare': np.mean}) # compute number of survivors and mean fare
+
+```
+
+The opposite of `pivot_table` is `melt`. This can be used to change a wide-form
+table into a long-form table. This is not particularly useful on the titanic
+dataset, so let's create a new table where this might be useful. Let's say we
+have a dataset listing the FA and MD values in various WM tracts:
+
+```
+tracts = ('Corpus callosum', 'Internal capsule', 'SLF', 'Arcuate fasciculus')
+df_wide = pd.DataFrame.from_dict(dict({'subject': list('ABCDEFGHIJ')}, **{
+    f'FA({tract})': np.random.rand(10) for tract in tracts }, **{
+    f'MD({tract})': np.random.rand(10) * 1e-3 for tract in tracts
+}))
+df_wide
+```
+
+This wide-form table (i.e., all the information is in different columns) makes
+it hard to select just all the FA values or only the values associated with the
+SLF. For this it would be easier to list all the values in a single column.
+Most of the tools discussed above (e.g., `group_by` or `seaborn` plotting) work
+better with long-form data, which we can obtain from `melt`:
+
+```
+df_long = df_wide.melt('subject', var_name='measurement', value_name='dti_value')
+df_long.head(12)
+```
+
+We can see that `melt` took all the columns (we could also have specified a
+specific sub-set) and returned each measurement as a seperate row. We probably
+want to seperate the measurement column into the measurement type (FA or MD) and
+the tract name. Many string manipulation function are available in the
+`DataFrame` object under `DataFrame.str`
+([tutorial](http://pandas.pydata.org/pandas-docs/stable/text.html))
+
+```
+df_long['variable'] = df_long.measurement.str.slice(0, 2)  # first two letters correspond to FA or MD
+df_long['tract'] = df_long.measurement.str.slice(3, -1)  # fourth till the second-to-last letter correspond to the tract
+df_long.head(12)
+```
+
+Finally we probably do want the FA and MD variables as different columns.
+
+**Exercise**: Use `pivot_table` or `stack`/`unstack` to create a column for MD
+and FA.
+
+```
+df_unstacked = df_long.
+```
+
+We can now use the tools discussed above to visualize the table (`seaborn`) or
+to group the table based on tract (`groupby` or `pivot_table`).
+
+```
+# feel free to analyze this random data in more detail
+```
+
+In general pandas is better at handling long-form than wide-form data, although
+for better visualization of the data an intermediate format is often best. One
+exception is calculating a covariance (`DataFrame.cov`) or correlation
+(`DataFrame.corr`) matrices which computes the correlation between each column:
+
+```
+sns.heatmap(df_wide.corr(), cmap=sns.diverging_palette(240, 10, s=99, n=300), )
+```
+
+## Linear fitting (`statsmodels`)
+
+Linear fitting between the different columns is available through the
+[`statsmodels`](https://www.statsmodels.org/stable/index.html) library. A nice
+way to play around with a wide variety of possible models is to use R-style
+functions. The usage of the functions in `statsmodels` is described
+[here](https://www.statsmodels.org/dev/example_formulas.html). You can find a
+more detailed description of the R-style functions
+[here](https://patsy.readthedocs.io/en/latest/formulas.html#the-formula-
+language).
+
+In short these functions describe the linear model as a string. For example,
+`"y ~ x + a + x * a"` fits the variable `y` as a function of `x`, `a`, and the
+interaction between `x` and `a`. The intercept is included by default (you can
+add `"+ 0"` to remove it).
+
+```
+result = smf.logit('survived ~ age + sex + age * sex', data=titanic).fit()
+print(result.summary())
+```
+
+Note that `statsmodels` understands categorical variables and automatically
+replaces them with dummy variables.
+
+Above we used logistic regression, which is appropriate for the binary
+survival rate. A wide variety of linear models are available. Let's try a GLM,
+but assume that the fare is drawn from a Gamma distribution:
+
+```
+age_dmean = titanic.age - titanic.age.mean()
+result = smf.glm('fare ~ age_dmean + embark_town', data=titanic).fit()
+print(result.summary())
+```
+
+Cherbourg passengers clearly paid a lot more...
+
+
+Note that we did not actually add the `age_dmean` to the
+`DataFrame`. `statsmodels` (or more precisely the underlying
+[patsy](https://patsy.readthedocs.io/en/latest/) library) automatically
+extracted this from our environment. This can lead to confusing behaviour...
+
+# More reading
+
+Other useful features
+
+- [Concatenating and merging tables](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/08_combine_dataframes.html)
+- [Lots of time series support](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/09_timeseries.html)
+- [Rolling Window
+  functions](http://pandas.pydata.org/pandas-docs/stable/computation.html#window-
+  functions) for after you have meaningfully sorted your data
+- and much, much more
--- a/talks/pandas/titanic.csv
+++ b/talks/pandas/titanic.csv
+survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
+0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
+1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
+1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
+1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
+0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,8.4583,Q,Third,man,True,,Queenstown,no,True
+0,1,male,54.0,0,0,51.8625,S,First,man,True,E,Southampton,no,True
+0,3,male,2.0,3,1,21.075,S,Third,child,False,,Southampton,no,False
+1,3,female,27.0,0,2,11.1333,S,Third,woman,False,,Southampton,yes,False
+1,2,female,14.0,1,0,30.0708,C,Second,child,False,,Cherbourg,yes,False
+1,3,female,4.0,1,1,16.7,S,Third,child,False,G,Southampton,yes,False
+1,1,female,58.0,0,0,26.55,S,First,woman,False,C,Southampton,yes,True
+0,3,male,20.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,39.0,1,5,31.275,S,Third,man,True,,Southampton,no,False
+0,3,female,14.0,0,0,7.8542,S,Third,child,False,,Southampton,no,True
+1,2,female,55.0,0,0,16.0,S,Second,woman,False,,Southampton,yes,True
+0,3,male,2.0,4,1,29.125,Q,Third,child,False,,Queenstown,no,False
+1,2,male,,0,0,13.0,S,Second,man,True,,Southampton,yes,True
+0,3,female,31.0,1,0,18.0,S,Third,woman,False,,Southampton,no,False
+1,3,female,,0,0,7.225,C,Third,woman,False,,Cherbourg,yes,True
+0,2,male,35.0,0,0,26.0,S,Second,man,True,,Southampton,no,True
+1,2,male,34.0,0,0,13.0,S,Second,man,True,D,Southampton,yes,True
+1,3,female,15.0,0,0,8.0292,Q,Third,child,False,,Queenstown,yes,True
+1,1,male,28.0,0,0,35.5,S,First,man,True,A,Southampton,yes,True
+0,3,female,8.0,3,1,21.075,S,Third,child,False,,Southampton,no,False
+1,3,female,38.0,1,5,31.3875,S,Third,woman,False,,Southampton,yes,False
+0,3,male,,0,0,7.225,C,Third,man,True,,Cherbourg,no,True
+0,1,male,19.0,3,2,263.0,S,First,man,True,C,Southampton,no,False
+1,3,female,,0,0,7.8792,Q,Third,woman,False,,Queenstown,yes,True
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,1,male,40.0,0,0,27.7208,C,First,man,True,,Cherbourg,no,True
+1,1,female,,1,0,146.5208,C,First,woman,False,B,Cherbourg,yes,False
+1,3,female,,0,0,7.75,Q,Third,woman,False,,Queenstown,yes,True
+0,2,male,66.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+0,1,male,28.0,1,0,82.1708,C,First,man,True,,Cherbourg,no,False
+0,1,male,42.0,1,0,52.0,S,First,man,True,,Southampton,no,False
+1,3,male,,0,0,7.2292,C,Third,man,True,,Cherbourg,yes,True
+0,3,male,21.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,female,18.0,2,0,18.0,S,Third,woman,False,,Southampton,no,False
+1,3,female,14.0,1,0,11.2417,C,Third,child,False,,Cherbourg,yes,False
+0,3,female,40.0,1,0,9.475,S,Third,woman,False,,Southampton,no,False
+0,2,female,27.0,1,0,21.0,S,Second,woman,False,,Southampton,no,False
+0,3,male,,0,0,7.8958,C,Third,man,True,,Cherbourg,no,True
+1,2,female,3.0,1,2,41.5792,C,Second,child,False,,Cherbourg,yes,False
+1,3,female,19.0,0,0,7.8792,Q,Third,woman,False,,Queenstown,yes,True
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,,1,0,15.5,Q,Third,man,True,,Queenstown,no,False
+1,3,female,,0,0,7.75,Q,Third,woman,False,,Queenstown,yes,True
+0,3,male,,2,0,21.6792,C,Third,man,True,,Cherbourg,no,False
+0,3,female,18.0,1,0,17.8,S,Third,woman,False,,Southampton,no,False
+0,3,male,7.0,4,1,39.6875,S,Third,child,False,,Southampton,no,False
+0,3,male,21.0,0,0,7.8,S,Third,man,True,,Southampton,no,True
+1,1,female,49.0,1,0,76.7292,C,First,woman,False,D,Cherbourg,yes,False
+1,2,female,29.0,1,0,26.0,S,Second,woman,False,,Southampton,yes,False
+0,1,male,65.0,0,1,61.9792,C,First,man,True,B,Cherbourg,no,False
+1,1,male,,0,0,35.5,S,First,man,True,C,Southampton,yes,True
+1,2,female,21.0,0,0,10.5,S,Second,woman,False,,Southampton,yes,True
+0,3,male,28.5,0,0,7.2292,C,Third,man,True,,Cherbourg,no,True
+1,2,female,5.0,1,2,27.75,S,Second,child,False,,Southampton,yes,False
+0,3,male,11.0,5,2,46.9,S,Third,child,False,,Southampton,no,False
+0,3,male,22.0,0,0,7.2292,C,Third,man,True,,Cherbourg,no,True
+1,1,female,38.0,0,0,80.0,,First,woman,False,B,,yes,True
+0,1,male,45.0,1,0,83.475,S,First,man,True,C,Southampton,no,False
+0,3,male,4.0,3,2,27.9,S,Third,child,False,,Southampton,no,False
+0,1,male,,0,0,27.7208,C,First,man,True,,Cherbourg,no,True
+1,3,male,,1,1,15.2458,C,Third,man,True,,Cherbourg,yes,False
+1,2,female,29.0,0,0,10.5,S,Second,woman,False,F,Southampton,yes,True
+0,3,male,19.0,0,0,8.1583,S,Third,man,True,,Southampton,no,True
+1,3,female,17.0,4,2,7.925,S,Third,woman,False,,Southampton,yes,False
+0,3,male,26.0,2,0,8.6625,S,Third,man,True,,Southampton,no,False
+0,2,male,32.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+0,3,female,16.0,5,2,46.9,S,Third,woman,False,,Southampton,no,False
+0,2,male,21.0,0,0,73.5,S,Second,man,True,,Southampton,no,True
+0,3,male,26.0,1,0,14.4542,C,Third,man,True,,Cherbourg,no,False
+1,3,male,32.0,0,0,56.4958,S,Third,man,True,,Southampton,yes,True
+0,3,male,25.0,0,0,7.65,S,Third,man,True,F,Southampton,no,True
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,2,male,0.83,0,2,29.0,S,Second,child,False,,Southampton,yes,False
+1,3,female,30.0,0,0,12.475,S,Third,woman,False,,Southampton,yes,True
+0,3,male,22.0,0,0,9.0,S,Third,man,True,,Southampton,no,True
+1,3,male,29.0,0,0,9.5,S,Third,man,True,,Southampton,yes,True
+1,3,female,,0,0,7.7875,Q,Third,woman,False,,Queenstown,yes,True
+0,1,male,28.0,0,0,47.1,S,First,man,True,,Southampton,no,True
+1,2,female,17.0,0,0,10.5,S,Second,woman,False,,Southampton,yes,True
+1,3,female,33.0,3,0,15.85,S,Third,woman,False,,Southampton,yes,False
+0,3,male,16.0,1,3,34.375,S,Third,man,True,,Southampton,no,False
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,1,female,23.0,3,2,263.0,S,First,woman,False,C,Southampton,yes,False
+0,3,male,24.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,29.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,20.0,0,0,7.8542,S,Third,man,True,,Southampton,no,True
+0,1,male,46.0,1,0,61.175,S,First,man,True,E,Southampton,no,False
+0,3,male,26.0,1,2,20.575,S,Third,man,True,,Southampton,no,False
+0,3,male,59.0,0,0,7.25,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,1,male,71.0,0,0,34.6542,C,First,man,True,A,Cherbourg,no,True
+1,1,male,23.0,0,1,63.3583,C,First,man,True,D,Cherbourg,yes,False
+1,2,female,34.0,0,1,23.0,S,Second,woman,False,,Southampton,yes,False
+0,2,male,34.0,1,0,26.0,S,Second,man,True,,Southampton,no,False
+0,3,female,28.0,0,0,7.8958,S,Third,woman,False,,Southampton,no,True
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,1,male,21.0,0,1,77.2875,S,First,man,True,D,Southampton,no,False
+0,3,male,33.0,0,0,8.6542,S,Third,man,True,,Southampton,no,True
+0,3,male,37.0,2,0,7.925,S,Third,man,True,,Southampton,no,False
+0,3,male,28.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,3,female,21.0,0,0,7.65,S,Third,woman,False,,Southampton,yes,True
+1,3,male,,0,0,7.775,S,Third,man,True,,Southampton,yes,True
+0,3,male,38.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,3,female,,1,0,24.15,Q,Third,woman,False,,Queenstown,yes,False
+0,1,male,47.0,0,0,52.0,S,First,man,True,C,Southampton,no,True
+0,3,female,14.5,1,0,14.4542,C,Third,child,False,,Cherbourg,no,False
+0,3,male,22.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,female,20.0,1,0,9.825,S,Third,woman,False,,Southampton,no,False
+0,3,female,17.0,0,0,14.4583,C,Third,woman,False,,Cherbourg,no,True
+0,3,male,21.0,0,0,7.925,S,Third,man,True,,Southampton,no,True
+0,3,male,70.5,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+0,2,male,29.0,1,0,21.0,S,Second,man,True,,Southampton,no,False
+0,1,male,24.0,0,1,247.5208,C,First,man,True,B,Cherbourg,no,False
+0,3,female,2.0,4,2,31.275,S,Third,child,False,,Southampton,no,False
+0,2,male,21.0,2,0,73.5,S,Second,man,True,,Southampton,no,False
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,2,male,32.5,1,0,30.0708,C,Second,man,True,,Cherbourg,no,False
+1,2,female,32.5,0,0,13.0,S,Second,woman,False,E,Southampton,yes,True
+0,1,male,54.0,0,1,77.2875,S,First,man,True,D,Southampton,no,False
+1,3,male,12.0,1,0,11.2417,C,Third,child,False,,Cherbourg,yes,False
+0,3,male,,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+1,3,male,24.0,0,0,7.1417,S,Third,man,True,,Southampton,yes,True
+1,3,female,,1,1,22.3583,C,Third,woman,False,F,Cherbourg,yes,False
+0,3,male,45.0,0,0,6.975,S,Third,man,True,,Southampton,no,True
+0,3,male,33.0,0,0,7.8958,C,Third,man,True,,Cherbourg,no,True
+0,3,male,20.0,0,0,7.05,S,Third,man,True,,Southampton,no,True
+0,3,female,47.0,1,0,14.5,S,Third,woman,False,,Southampton,no,False
+1,2,female,29.0,1,0,26.0,S,Second,woman,False,,Southampton,yes,False
+0,2,male,25.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,2,male,23.0,0,0,15.0458,C,Second,man,True,,Cherbourg,no,True
+1,1,female,19.0,0,2,26.2833,S,First,woman,False,D,Southampton,yes,False
+0,1,male,37.0,1,0,53.1,S,First,man,True,C,Southampton,no,False
+0,3,male,16.0,0,0,9.2167,S,Third,man,True,,Southampton,no,True
+0,1,male,24.0,0,0,79.2,C,First,man,True,B,Cherbourg,no,True
+0,3,female,,0,2,15.2458,C,Third,woman,False,,Cherbourg,no,False
+1,3,female,22.0,0,0,7.75,S,Third,woman,False,,Southampton,yes,True
+1,3,female,24.0,1,0,15.85,S,Third,woman,False,,Southampton,yes,False
+0,3,male,19.0,0,0,6.75,Q,Third,man,True,,Queenstown,no,True
+0,2,male,18.0,0,0,11.5,S,Second,man,True,,Southampton,no,True
+0,2,male,19.0,1,1,36.75,S,Second,man,True,,Southampton,no,False
+1,3,male,27.0,0,0,7.7958,S,Third,man,True,,Southampton,yes,True
+0,3,female,9.0,2,2,34.375,S,Third,child,False,,Southampton,no,False
+0,2,male,36.5,0,2,26.0,S,Second,man,True,F,Southampton,no,False
+0,2,male,42.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,2,male,51.0,0,0,12.525,S,Second,man,True,,Southampton,no,True
+1,1,female,22.0,1,0,66.6,S,First,woman,False,C,Southampton,yes,False
+0,3,male,55.5,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,40.5,0,2,14.5,S,Third,man,True,,Southampton,no,False
+0,3,male,,0,0,7.3125,S,Third,man,True,,Southampton,no,True
+0,1,male,51.0,0,1,61.3792,C,First,man,True,,Cherbourg,no,False
+1,3,female,16.0,0,0,7.7333,Q,Third,woman,False,,Queenstown,yes,True
+0,3,male,30.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,8.6625,S,Third,man,True,,Southampton,no,True
+0,3,male,,8,2,69.55,S,Third,man,True,,Southampton,no,False
+0,3,male,44.0,0,1,16.1,S,Third,man,True,,Southampton,no,False
+1,2,female,40.0,0,0,15.75,S,Second,woman,False,,Southampton,yes,True
+0,3,male,26.0,0,0,7.775,S,Third,man,True,,Southampton,no,True
+0,3,male,17.0,0,0,8.6625,S,Third,man,True,,Southampton,no,True
+0,3,male,1.0,4,1,39.6875,S,Third,child,False,,Southampton,no,False
+1,3,male,9.0,0,2,20.525,S,Third,child,False,,Southampton,yes,False
+1,1,female,,0,1,55.0,S,First,woman,False,E,Southampton,yes,False
+0,3,female,45.0,1,4,27.9,S,Third,woman,False,,Southampton,no,False
+0,1,male,,0,0,25.925,S,First,man,True,,Southampton,no,True
+0,3,male,28.0,0,0,56.4958,S,Third,man,True,,Southampton,no,True
+0,1,male,61.0,0,0,33.5,S,First,man,True,B,Southampton,no,True
+0,3,male,4.0,4,1,29.125,Q,Third,child,False,,Queenstown,no,False
+1,3,female,1.0,1,1,11.1333,S,Third,child,False,,Southampton,yes,False
+0,3,male,21.0,0,0,7.925,S,Third,man,True,,Southampton,no,True
+0,1,male,56.0,0,0,30.6958,C,First,man,True,A,Cherbourg,no,True
+0,3,male,18.0,1,1,7.8542,S,Third,man,True,,Southampton,no,False
+0,3,male,,3,1,25.4667,S,Third,man,True,,Southampton,no,False
+0,1,female,50.0,0,0,28.7125,C,First,woman,False,C,Cherbourg,no,True
+0,2,male,30.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,3,male,36.0,0,0,0.0,S,Third,man,True,,Southampton,no,True
+0,3,female,,8,2,69.55,S,Third,woman,False,,Southampton,no,False
+0,2,male,,0,0,15.05,C,Second,man,True,,Cherbourg,no,True
+0,3,male,9.0,4,2,31.3875,S,Third,child,False,,Southampton,no,False
+1,2,male,1.0,2,1,39.0,S,Second,child,False,F,Southampton,yes,False
+1,3,female,4.0,0,2,22.025,S,Third,child,False,,Southampton,yes,False
+0,1,male,,0,0,50.0,S,First,man,True,A,Southampton,no,True
+1,3,female,,1,0,15.5,Q,Third,woman,False,,Queenstown,yes,False
+1,1,male,45.0,0,0,26.55,S,First,man,True,,Southampton,yes,True
+0,3,male,40.0,1,1,15.5,Q,Third,man,True,,Queenstown,no,False
+0,3,male,36.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,2,female,32.0,0,0,13.0,S,Second,woman,False,,Southampton,yes,True
+0,2,male,19.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+1,3,female,19.0,1,0,7.8542,S,Third,woman,False,,Southampton,yes,False
+1,2,male,3.0,1,1,26.0,S,Second,child,False,F,Southampton,yes,False
+1,1,female,44.0,0,0,27.7208,C,First,woman,False,B,Cherbourg,yes,True
+1,1,female,58.0,0,0,146.5208,C,First,woman,False,B,Cherbourg,yes,True
+0,3,male,,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+0,3,male,42.0,0,1,8.4042,S,Third,man,True,,Southampton,no,False
+1,3,female,,0,0,7.75,Q,Third,woman,False,,Queenstown,yes,True
+0,2,female,24.0,0,0,13.0,S,Second,woman,False,,Southampton,no,True
+0,3,male,28.0,0,0,9.5,S,Third,man,True,,Southampton,no,True
+0,3,male,,8,2,69.55,S,Third,man,True,,Southampton,no,False
+0,3,male,34.0,0,0,6.4958,S,Third,man,True,,Southampton,no,True
+0,3,male,45.5,0,0,7.225,C,Third,man,True,,Cherbourg,no,True
+1,3,male,18.0,0,0,8.05,S,Third,man,True,,Southampton,yes,True
+0,3,female,2.0,0,1,10.4625,S,Third,child,False,G,Southampton,no,False
+0,3,male,32.0,1,0,15.85,S,Third,man,True,,Southampton,no,False
+1,3,male,26.0,0,0,18.7875,C,Third,man,True,,Cherbourg,yes,True
+1,3,female,16.0,0,0,7.75,Q,Third,woman,False,,Queenstown,yes,True
+1,1,male,40.0,0,0,31.0,C,First,man,True,A,Cherbourg,yes,True
+0,3,male,24.0,0,0,7.05,S,Third,man,True,,Southampton,no,True
+1,2,female,35.0,0,0,21.0,S,Second,woman,False,,Southampton,yes,True
+0,3,male,22.0,0,0,7.25,S,Third,man,True,,Southampton,no,True
+0,2,male,30.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,3,male,,1,0,7.75,Q,Third,man,True,,Queenstown,no,False
+1,1,female,31.0,1,0,113.275,C,First,woman,False,D,Cherbourg,yes,False
+1,3,female,27.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
+0,2,male,42.0,1,0,27.0,S,Second,man,True,,Southampton,no,False
+1,1,female,32.0,0,0,76.2917,C,First,woman,False,D,Cherbourg,yes,True
+0,2,male,30.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+1,3,male,16.0,0,0,8.05,S,Third,man,True,,Southampton,yes,True
+0,2,male,27.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,3,male,51.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,1,male,38.0,1,0,90.0,S,First,man,True,C,Southampton,yes,False
+0,3,male,22.0,0,0,9.35,S,Third,man,True,,Southampton,no,True
+1,2,male,19.0,0,0,10.5,S,Second,man,True,,Southampton,yes,True
+0,3,male,20.5,0,0,7.25,S,Third,man,True,,Southampton,no,True
+0,2,male,18.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,3,female,,3,1,25.4667,S,Third,woman,False,,Southampton,no,False
+1,1,female,35.0,1,0,83.475,S,First,woman,False,C,Southampton,yes,False
+0,3,male,29.0,0,0,7.775,S,Third,man,True,,Southampton,no,True
+0,2,male,59.0,0,0,13.5,S,Second,man,True,,Southampton,no,True
+1,3,female,5.0,4,2,31.3875,S,Third,child,False,,Southampton,yes,False
+0,2,male,24.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+0,3,female,,0,0,7.55,S,Third,woman,False,,Southampton,no,True
+0,2,male,44.0,1,0,26.0,S,Second,man,True,,Southampton,no,False
+1,2,female,8.0,0,2,26.25,S,Second,child,False,,Southampton,yes,False
+0,2,male,19.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+0,2,male,33.0,0,0,12.275,S,Second,man,True,,Southampton,no,True
+0,3,female,,1,0,14.4542,C,Third,woman,False,,Cherbourg,no,False
+1,3,female,,1,0,15.5,Q,Third,woman,False,,Queenstown,yes,False
+0,2,male,29.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+0,3,male,22.0,0,0,7.125,S,Third,man,True,,Southampton,no,True
+0,3,male,30.0,0,0,7.225,C,Third,man,True,,Cherbourg,no,True
+0,1,male,44.0,2,0,90.0,Q,First,man,True,C,Queenstown,no,False
+0,3,female,25.0,0,0,7.775,S,Third,woman,False,,Southampton,no,True
+1,2,female,24.0,0,2,14.5,S,Second,woman,False,,Southampton,yes,False
+1,1,male,37.0,1,1,52.5542,S,First,man,True,D,Southampton,yes,False
+0,2,male,54.0,1,0,26.0,S,Second,man,True,,Southampton,no,False
+0,3,male,,0,0,7.25,S,Third,man,True,,Southampton,no,True
+0,3,female,29.0,1,1,10.4625,S,Third,woman,False,G,Southampton,no,False
+0,1,male,62.0,0,0,26.55,S,First,man,True,C,Southampton,no,True
+0,3,male,30.0,1,0,16.1,S,Third,man,True,,Southampton,no,False
+0,3,female,41.0,0,2,20.2125,S,Third,woman,False,,Southampton,no,False
+1,3,female,29.0,0,2,15.2458,C,Third,woman,False,,Cherbourg,yes,False
+1,1,female,,0,0,79.2,C,First,woman,False,,Cherbourg,yes,True
+1,1,female,30.0,0,0,86.5,S,First,woman,False,B,Southampton,yes,True
+1,1,female,35.0,0,0,512.3292,C,First,woman,False,,Cherbourg,yes,True
+1,2,female,50.0,0,1,26.0,S,Second,woman,False,,Southampton,yes,False
+0,3,male,,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+1,3,male,3.0,4,2,31.3875,S,Third,child,False,,Southampton,yes,False
+0,1,male,52.0,1,1,79.65,S,First,man,True,E,Southampton,no,False
+0,1,male,40.0,0,0,0.0,S,First,man,True,B,Southampton,no,True
+0,3,female,,0,0,7.75,Q,Third,woman,False,,Queenstown,no,True
+0,2,male,36.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+0,3,male,16.0,4,1,39.6875,S,Third,man,True,,Southampton,no,False
+1,3,male,25.0,1,0,7.775,S,Third,man,True,,Southampton,yes,False
+1,1,female,58.0,0,1,153.4625,S,First,woman,False,C,Southampton,yes,False
+1,1,female,35.0,0,0,135.6333,S,First,woman,False,C,Southampton,yes,True
+0,1,male,,0,0,31.0,S,First,man,True,,Southampton,no,True
+1,3,male,25.0,0,0,0.0,S,Third,man,True,,Southampton,yes,True
+1,2,female,41.0,0,1,19.5,S,Second,woman,False,,Southampton,yes,False
+0,1,male,37.0,0,1,29.7,C,First,man,True,C,Cherbourg,no,False
+1,3,female,,0,0,7.75,Q,Third,woman,False,,Queenstown,yes,True
+1,1,female,63.0,1,0,77.9583,S,First,woman,False,D,Southampton,yes,False
+0,3,female,45.0,0,0,7.75,S,Third,woman,False,,Southampton,no,True
+0,2,male,,0,0,0.0,S,Second,man,True,,Southampton,no,True
+0,3,male,7.0,4,1,29.125,Q,Third,child,False,,Queenstown,no,False
+1,3,female,35.0,1,1,20.25,S,Third,woman,False,,Southampton,yes,False
+0,3,male,65.0,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+0,3,male,28.0,0,0,7.8542,S,Third,man,True,,Southampton,no,True
+0,3,male,16.0,0,0,9.5,S,Third,man,True,,Southampton,no,True
+1,3,male,19.0,0,0,8.05,S,Third,man,True,,Southampton,yes,True
+0,1,male,,0,0,26.0,S,First,man,True,A,Southampton,no,True
+0,3,male,33.0,0,0,8.6625,C,Third,man,True,,Cherbourg,no,True
+1,3,male,30.0,0,0,9.5,S,Third,man,True,,Southampton,yes,True
+0,3,male,22.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,2,male,42.0,0,0,13.0,S,Second,man,True,,Southampton,yes,True
+1,3,female,22.0,0,0,7.75,Q,Third,woman,False,,Queenstown,yes,True
+1,1,female,26.0,0,0,78.85,S,First,woman,False,,Southampton,yes,True
+1,1,female,19.0,1,0,91.0792,C,First,woman,False,B,Cherbourg,yes,False
+0,2,male,36.0,0,0,12.875,C,Second,man,True,D,Cherbourg,no,True
+0,3,female,24.0,0,0,8.85,S,Third,woman,False,,Southampton,no,True
+0,3,male,24.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,1,male,,0,0,27.7208,C,First,man,True,,Cherbourg,no,True
+0,3,male,23.5,0,0,7.2292,C,Third,man,True,,Cherbourg,no,True
+0,1,female,2.0,1,2,151.55,S,First,child,False,C,Southampton,no,False
+1,1,male,,0,0,30.5,S,First,man,True,C,Southampton,yes,True
+1,1,female,50.0,0,1,247.5208,C,First,woman,False,B,Cherbourg,yes,False
+1,3,female,,0,0,7.75,Q,Third,woman,False,,Queenstown,yes,True
+1,3,male,,2,0,23.25,Q,Third,man,True,,Queenstown,yes,False
+0,3,male,19.0,0,0,0.0,S,Third,man,True,,Southampton,no,True
+1,2,female,,0,0,12.35,Q,Second,woman,False,E,Queenstown,yes,True
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,1,male,0.92,1,2,151.55,S,First,child,False,C,Southampton,yes,False
+1,1,female,,0,0,110.8833,C,First,woman,False,,Cherbourg,yes,True
+1,1,female,17.0,1,0,108.9,C,First,woman,False,C,Cherbourg,yes,False
+0,2,male,30.0,1,0,24.0,C,Second,man,True,,Cherbourg,no,False
+1,1,female,30.0,0,0,56.9292,C,First,woman,False,E,Cherbourg,yes,True
+1,1,female,24.0,0,0,83.1583,C,First,woman,False,C,Cherbourg,yes,True
+1,1,female,18.0,2,2,262.375,C,First,woman,False,B,Cherbourg,yes,False
+0,2,female,26.0,1,1,26.0,S,Second,woman,False,,Southampton,no,False
+0,3,male,28.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,2,male,43.0,1,1,26.25,S,Second,man,True,,Southampton,no,False
+1,3,female,26.0,0,0,7.8542,S,Third,woman,False,,Southampton,yes,True
+1,2,female,24.0,1,0,26.0,S,Second,woman,False,,Southampton,yes,False
+0,2,male,54.0,0,0,14.0,S,Second,man,True,,Southampton,no,True
+1,1,female,31.0,0,2,164.8667,S,First,woman,False,C,Southampton,yes,False
+1,1,female,40.0,1,1,134.5,C,First,woman,False,E,Cherbourg,yes,False
+0,3,male,22.0,0,0,7.25,S,Third,man,True,,Southampton,no,True
+0,3,male,27.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,2,female,30.0,0,0,12.35,Q,Second,woman,False,,Queenstown,yes,True
+1,2,female,22.0,1,1,29.0,S,Second,woman,False,,Southampton,yes,False
+0,3,male,,8,2,69.55,S,Third,man,True,,Southampton,no,False
+1,1,female,36.0,0,0,135.6333,C,First,woman,False,C,Cherbourg,yes,True
+0,3,male,61.0,0,0,6.2375,S,Third,man,True,,Southampton,no,True
+1,2,female,36.0,0,0,13.0,S,Second,woman,False,D,Southampton,yes,True
+1,3,female,31.0,1,1,20.525,S,Third,woman,False,,Southampton,yes,False
+1,1,female,16.0,0,1,57.9792,C,First,woman,False,B,Cherbourg,yes,False
+1,3,female,,2,0,23.25,Q,Third,woman,False,,Queenstown,yes,False
+0,1,male,45.5,0,0,28.5,S,First,man,True,C,Southampton,no,True
+0,1,male,38.0,0,1,153.4625,S,First,man,True,C,Southampton,no,False
+0,3,male,16.0,2,0,18.0,S,Third,man,True,,Southampton,no,False
+1,1,female,,1,0,133.65,S,First,woman,False,,Southampton,yes,False
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,1,male,29.0,1,0,66.6,S,First,man,True,C,Southampton,no,False
+1,1,female,41.0,0,0,134.5,C,First,woman,False,E,Cherbourg,yes,True
+1,3,male,45.0,0,0,8.05,S,Third,man,True,,Southampton,yes,True
+0,1,male,45.0,0,0,35.5,S,First,man,True,,Southampton,no,True
+1,2,male,2.0,1,1,26.0,S,Second,child,False,F,Southampton,yes,False
+1,1,female,24.0,3,2,263.0,S,First,woman,False,C,Southampton,yes,False
+0,2,male,28.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,2,male,25.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,2,male,36.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+1,2,female,24.0,0,0,13.0,S,Second,woman,False,F,Southampton,yes,True
+1,2,female,40.0,0,0,13.0,S,Second,woman,False,,Southampton,yes,True
+1,3,female,,1,0,16.1,S,Third,woman,False,,Southampton,yes,False
+1,3,male,3.0,1,1,15.9,S,Third,child,False,,Southampton,yes,False
+0,3,male,42.0,0,0,8.6625,S,Third,man,True,,Southampton,no,True
+0,3,male,23.0,0,0,9.225,S,Third,man,True,,Southampton,no,True
+0,1,male,,0,0,35.0,S,First,man,True,C,Southampton,no,True
+0,3,male,15.0,1,1,7.2292,C,Third,child,False,,Cherbourg,no,False
+0,3,male,25.0,1,0,17.8,S,Third,man,True,,Southampton,no,False
+0,3,male,,0,0,7.225,C,Third,man,True,,Cherbourg,no,True
+0,3,male,28.0,0,0,9.5,S,Third,man,True,,Southampton,no,True
+1,1,female,22.0,0,1,55.0,S,First,woman,False,E,Southampton,yes,False
+0,2,female,38.0,0,0,13.0,S,Second,woman,False,,Southampton,no,True
+1,3,female,,0,0,7.8792,Q,Third,woman,False,,Queenstown,yes,True
+1,3,female,,0,0,7.8792,Q,Third,woman,False,,Queenstown,yes,True
+0,3,male,40.0,1,4,27.9,S,Third,man,True,,Southampton,no,False
+0,2,male,29.0,1,0,27.7208,C,Second,man,True,,Cherbourg,no,False
+0,3,female,45.0,0,1,14.4542,C,Third,woman,False,,Cherbourg,no,False
+0,3,male,35.0,0,0,7.05,S,Third,man,True,,Southampton,no,True
+0,3,male,,1,0,15.5,Q,Third,man,True,,Queenstown,no,False
+0,3,male,30.0,0,0,7.25,S,Third,man,True,,Southampton,no,True
+1,1,female,60.0,1,0,75.25,C,First,woman,False,D,Cherbourg,yes,False
+1,3,female,,0,0,7.2292,C,Third,woman,False,,Cherbourg,yes,True
+1,3,female,,0,0,7.75,Q,Third,woman,False,,Queenstown,yes,True
+1,1,female,24.0,0,0,69.3,C,First,woman,False,B,Cherbourg,yes,True
+1,1,male,25.0,1,0,55.4417,C,First,man,True,E,Cherbourg,yes,False
+0,3,male,18.0,1,0,6.4958,S,Third,man,True,,Southampton,no,False
+0,3,male,19.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,1,male,22.0,0,0,135.6333,C,First,man,True,,Cherbourg,no,True
+0,3,female,3.0,3,1,21.075,S,Third,child,False,,Southampton,no,False
+1,1,female,,1,0,82.1708,C,First,woman,False,,Cherbourg,yes,False
+1,3,female,22.0,0,0,7.25,S,Third,woman,False,,Southampton,yes,True
+0,1,male,27.0,0,2,211.5,C,First,man,True,C,Cherbourg,no,False
+0,3,male,20.0,0,0,4.0125,C,Third,man,True,,Cherbourg,no,True
+0,3,male,19.0,0,0,7.775,S,Third,man,True,,Southampton,no,True
+1,1,female,42.0,0,0,227.525,C,First,woman,False,,Cherbourg,yes,True
+1,3,female,1.0,0,2,15.7417,C,Third,child,False,,Cherbourg,yes,False
+0,3,male,32.0,0,0,7.925,S,Third,man,True,,Southampton,no,True
+1,1,female,35.0,1,0,52.0,S,First,woman,False,,Southampton,yes,False
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,2,male,18.0,0,0,73.5,S,Second,man,True,,Southampton,no,True
+0,3,male,1.0,5,2,46.9,S,Third,child,False,,Southampton,no,False
+1,2,female,36.0,0,0,13.0,S,Second,woman,False,,Southampton,yes,True
+0,3,male,,0,0,7.7292,Q,Third,man,True,,Queenstown,no,True
+1,2,female,17.0,0,0,12.0,C,Second,woman,False,,Cherbourg,yes,True
+1,1,male,36.0,1,2,120.0,S,First,man,True,B,Southampton,yes,False
+1,3,male,21.0,0,0,7.7958,S,Third,man,True,,Southampton,yes,True
+0,3,male,28.0,2,0,7.925,S,Third,man,True,,Southampton,no,False
+1,1,female,23.0,1,0,113.275,C,First,woman,False,D,Cherbourg,yes,False
+1,3,female,24.0,0,2,16.7,S,Third,woman,False,G,Southampton,yes,False
+0,3,male,22.0,0,0,7.7958,S,Third,man,True,,Southampton,no,True
+0,3,female,31.0,0,0,7.8542,S,Third,woman,False,,Southampton,no,True
+0,2,male,46.0,0,0,26.0,S,Second,man,True,,Southampton,no,True
+0,2,male,23.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+1,2,female,28.0,0,0,12.65,S,Second,woman,False,,Southampton,yes,True
+1,3,male,39.0,0,0,7.925,S,Third,man,True,,Southampton,yes,True
+0,3,male,26.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,female,21.0,1,0,9.825,S,Third,woman,False,,Southampton,no,False
+0,3,male,28.0,1,0,15.85,S,Third,man,True,,Southampton,no,False
+0,3,female,20.0,0,0,8.6625,S,Third,woman,False,,Southampton,no,True
+0,2,male,34.0,1,0,21.0,S,Second,man,True,,Southampton,no,False
+0,3,male,51.0,0,0,7.75,S,Third,man,True,,Southampton,no,True
+1,2,male,3.0,1,1,18.75,S,Second,child,False,,Southampton,yes,False
+0,3,male,21.0,0,0,7.775,S,Third,man,True,,Southampton,no,True
+0,3,female,,3,1,25.4667,S,Third,woman,False,,Southampton,no,False
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,6.8583,Q,Third,man,True,,Queenstown,no,True
+1,1,female,33.0,1,0,90.0,Q,First,woman,False,C,Queenstown,yes,False
+0,2,male,,0,0,0.0,S,Second,man,True,,Southampton,no,True
+1,3,male,44.0,0,0,7.925,S,Third,man,True,,Southampton,yes,True
+0,3,female,,0,0,8.05,S,Third,woman,False,,Southampton,no,True
+1,2,female,34.0,1,1,32.5,S,Second,woman,False,,Southampton,yes,False
+1,2,female,18.0,0,2,13.0,S,Second,woman,False,,Southampton,yes,False
+0,2,male,30.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,3,female,10.0,0,2,24.15,S,Third,child,False,,Southampton,no,False
+0,3,male,,0,0,7.8958,C,Third,man,True,,Cherbourg,no,True
+0,3,male,21.0,0,0,7.7333,Q,Third,man,True,,Queenstown,no,True
+0,3,male,29.0,0,0,7.875,S,Third,man,True,,Southampton,no,True
+0,3,female,28.0,1,1,14.4,S,Third,woman,False,,Southampton,no,False
+0,3,male,18.0,1,1,20.2125,S,Third,man,True,,Southampton,no,False
+0,3,male,,0,0,7.25,S,Third,man,True,,Southampton,no,True
+1,2,female,28.0,1,0,26.0,S,Second,woman,False,,Southampton,yes,False
+1,2,female,19.0,0,0,26.0,S,Second,woman,False,,Southampton,yes,True
+0,3,male,,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+1,3,male,32.0,0,0,8.05,S,Third,man,True,E,Southampton,yes,True
+1,1,male,28.0,0,0,26.55,S,First,man,True,C,Southampton,yes,True
+1,3,female,,1,0,16.1,S,Third,woman,False,,Southampton,yes,False
+1,2,female,42.0,1,0,26.0,S,Second,woman,False,,Southampton,yes,False
+0,3,male,17.0,0,0,7.125,S,Third,man,True,,Southampton,no,True
+0,1,male,50.0,1,0,55.9,S,First,man,True,E,Southampton,no,False
+1,1,female,14.0,1,2,120.0,S,First,child,False,B,Southampton,yes,False
+0,3,female,21.0,2,2,34.375,S,Third,woman,False,,Southampton,no,False
+1,2,female,24.0,2,3,18.75,S,Second,woman,False,,Southampton,yes,False
+0,1,male,64.0,1,4,263.0,S,First,man,True,C,Southampton,no,False
+0,2,male,31.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+1,2,female,45.0,1,1,26.25,S,Second,woman,False,,Southampton,yes,False
+0,3,male,20.0,0,0,9.5,S,Third,man,True,,Southampton,no,True
+0,3,male,25.0,1,0,7.775,S,Third,man,True,,Southampton,no,False
+1,2,female,28.0,0,0,13.0,S,Second,woman,False,,Southampton,yes,True
+1,3,male,,0,0,8.1125,S,Third,man,True,,Southampton,yes,True
+1,1,male,4.0,0,2,81.8583,S,First,child,False,A,Southampton,yes,False
+1,2,female,13.0,0,1,19.5,S,Second,child,False,,Southampton,yes,False
+1,1,male,34.0,0,0,26.55,S,First,man,True,,Southampton,yes,True
+1,3,female,5.0,2,1,19.2583,C,Third,child,False,,Cherbourg,yes,False
+1,1,male,52.0,0,0,30.5,S,First,man,True,C,Southampton,yes,True
+0,2,male,36.0,1,2,27.75,S,Second,man,True,,Southampton,no,False
+0,3,male,,1,0,19.9667,S,Third,man,True,,Southampton,no,False
+0,1,male,30.0,0,0,27.75,C,First,man,True,C,Cherbourg,no,True
+1,1,male,49.0,1,0,89.1042,C,First,man,True,C,Cherbourg,yes,False
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,3,male,29.0,0,0,7.8958,C,Third,man,True,,Cherbourg,yes,True
+0,1,male,65.0,0,0,26.55,S,First,man,True,E,Southampton,no,True
+1,1,female,,1,0,51.8625,S,First,woman,False,D,Southampton,yes,False
+1,2,female,50.0,0,0,10.5,S,Second,woman,False,,Southampton,yes,True
+0,3,male,,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+1,1,male,48.0,0,0,26.55,S,First,man,True,E,Southampton,yes,True
+0,3,male,34.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,1,male,47.0,0,0,38.5,S,First,man,True,E,Southampton,no,True
+0,2,male,48.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,38.0,0,0,7.05,S,Third,man,True,,Southampton,no,True
+0,2,male,,0,0,0.0,S,Second,man,True,,Southampton,no,True
+0,1,male,56.0,0,0,26.55,S,First,man,True,,Southampton,no,True
+0,3,male,,0,0,7.725,Q,Third,man,True,,Queenstown,no,True
+1,3,female,0.75,2,1,19.2583,C,Third,child,False,,Cherbourg,yes,False
+0,3,male,,0,0,7.25,S,Third,man,True,,Southampton,no,True
+0,3,male,38.0,0,0,8.6625,S,Third,man,True,,Southampton,no,True
+1,2,female,33.0,1,2,27.75,S,Second,woman,False,,Southampton,yes,False
+1,2,female,23.0,0,0,13.7917,C,Second,woman,False,D,Cherbourg,yes,True
+0,3,female,22.0,0,0,9.8375,S,Third,woman,False,,Southampton,no,True
+0,1,male,,0,0,52.0,S,First,man,True,A,Southampton,no,True
+0,2,male,34.0,1,0,21.0,S,Second,man,True,,Southampton,no,False
+0,3,male,29.0,1,0,7.0458,S,Third,man,True,,Southampton,no,False
+0,3,male,22.0,0,0,7.5208,S,Third,man,True,,Southampton,no,True
+1,3,female,2.0,0,1,12.2875,S,Third,child,False,,Southampton,yes,False
+0,3,male,9.0,5,2,46.9,S,Third,child,False,,Southampton,no,False
+0,2,male,,0,0,0.0,S,Second,man,True,,Southampton,no,True
+0,3,male,50.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,3,female,63.0,0,0,9.5875,S,Third,woman,False,,Southampton,yes,True
+1,1,male,25.0,1,0,91.0792,C,First,man,True,B,Cherbourg,yes,False
+0,3,female,,3,1,25.4667,S,Third,woman,False,,Southampton,no,False
+1,1,female,35.0,1,0,90.0,S,First,woman,False,C,Southampton,yes,False
+0,1,male,58.0,0,0,29.7,C,First,man,True,B,Cherbourg,no,True
+0,3,male,30.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,3,male,9.0,1,1,15.9,S,Third,child,False,,Southampton,yes,False
+0,3,male,,1,0,19.9667,S,Third,man,True,,Southampton,no,False
+0,3,male,21.0,0,0,7.25,S,Third,man,True,,Southampton,no,True
+0,1,male,55.0,0,0,30.5,S,First,man,True,C,Southampton,no,True
+0,1,male,71.0,0,0,49.5042,C,First,man,True,,Cherbourg,no,True
+0,3,male,21.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,14.4583,C,Third,man,True,,Cherbourg,no,True
+1,1,female,54.0,1,0,78.2667,C,First,woman,False,D,Cherbourg,yes,False
+0,3,male,,0,0,15.1,S,Third,man,True,,Southampton,no,True
+0,1,female,25.0,1,2,151.55,S,First,woman,False,C,Southampton,no,False
+0,3,male,24.0,0,0,7.7958,S,Third,man,True,,Southampton,no,True
+0,3,male,17.0,0,0,8.6625,S,Third,man,True,,Southampton,no,True
+0,3,female,21.0,0,0,7.75,Q,Third,woman,False,,Queenstown,no,True
+0,3,female,,0,0,7.6292,Q,Third,woman,False,,Queenstown,no,True
+0,3,female,37.0,0,0,9.5875,S,Third,woman,False,,Southampton,no,True
+1,1,female,16.0,0,0,86.5,S,First,woman,False,B,Southampton,yes,True
+0,1,male,18.0,1,0,108.9,C,First,man,True,C,Cherbourg,no,False
+1,2,female,33.0,0,2,26.0,S,Second,woman,False,,Southampton,yes,False
+1,1,male,,0,0,26.55,S,First,man,True,,Southampton,yes,True
+0,3,male,28.0,0,0,22.525,S,Third,man,True,,Southampton,no,True
+1,3,male,26.0,0,0,56.4958,S,Third,man,True,,Southampton,yes,True
+1,3,male,29.0,0,0,7.75,Q,Third,man,True,,Queenstown,yes,True
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,1,male,36.0,0,0,26.2875,S,First,man,True,E,Southampton,yes,True
+1,1,female,54.0,1,0,59.4,C,First,woman,False,,Cherbourg,yes,False
+0,3,male,24.0,0,0,7.4958,S,Third,man,True,,Southampton,no,True
+0,1,male,47.0,0,0,34.0208,S,First,man,True,D,Southampton,no,True
+1,2,female,34.0,0,0,10.5,S,Second,woman,False,F,Southampton,yes,True
+0,3,male,,0,0,24.15,Q,Third,man,True,,Queenstown,no,True
+1,2,female,36.0,1,0,26.0,S,Second,woman,False,,Southampton,yes,False
+0,3,male,32.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,1,female,30.0,0,0,93.5,S,First,woman,False,B,Southampton,yes,True
+0,3,male,22.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,7.225,C,Third,man,True,,Cherbourg,no,True
+1,1,female,44.0,0,1,57.9792,C,First,woman,False,B,Cherbourg,yes,False
+0,3,male,,0,0,7.2292,C,Third,man,True,,Cherbourg,no,True
+0,3,male,40.5,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+1,2,female,50.0,0,0,10.5,S,Second,woman,False,,Southampton,yes,True
+0,1,male,,0,0,221.7792,S,First,man,True,C,Southampton,no,True
+0,3,male,39.0,0,0,7.925,S,Third,man,True,,Southampton,no,True
+0,2,male,23.0,2,1,11.5,S,Second,man,True,,Southampton,no,False
+1,2,female,2.0,1,1,26.0,S,Second,child,False,,Southampton,yes,False
+0,3,male,,0,0,7.2292,C,Third,man,True,,Cherbourg,no,True
+0,3,male,17.0,1,1,7.2292,C,Third,man,True,,Cherbourg,no,False
+1,3,female,,0,2,22.3583,C,Third,woman,False,,Cherbourg,yes,False
+0,3,female,30.0,0,0,8.6625,S,Third,woman,False,,Southampton,no,True
+1,2,female,7.0,0,2,26.25,S,Second,child,False,,Southampton,yes,False
+0,1,male,45.0,0,0,26.55,S,First,man,True,B,Southampton,no,True
+1,1,female,30.0,0,0,106.425,C,First,woman,False,,Cherbourg,yes,True
+0,3,male,,0,0,14.5,S,Third,man,True,,Southampton,no,True
+1,1,female,22.0,0,2,49.5,C,First,woman,False,B,Cherbourg,yes,False
+1,1,female,36.0,0,2,71.0,S,First,woman,False,B,Southampton,yes,False
+0,3,female,9.0,4,2,31.275,S,Third,child,False,,Southampton,no,False
+0,3,female,11.0,4,2,31.275,S,Third,child,False,,Southampton,no,False
+1,2,male,32.0,1,0,26.0,S,Second,man,True,,Southampton,yes,False
+0,1,male,50.0,1,0,106.425,C,First,man,True,C,Cherbourg,no,False
+0,1,male,64.0,0,0,26.0,S,First,man,True,,Southampton,no,True
+1,2,female,19.0,1,0,26.0,S,Second,woman,False,,Southampton,yes,False
+1,2,male,,0,0,13.8625,C,Second,man,True,,Cherbourg,yes,True
+0,3,male,33.0,1,1,20.525,S,Third,man,True,,Southampton,no,False
+1,2,male,8.0,1,1,36.75,S,Second,child,False,,Southampton,yes,False
+1,1,male,17.0,0,2,110.8833,C,First,man,True,C,Cherbourg,yes,False
+0,2,male,27.0,0,0,26.0,S,Second,man,True,,Southampton,no,True
+0,3,male,,0,0,7.8292,Q,Third,man,True,,Queenstown,no,True
+1,3,male,22.0,0,0,7.225,C,Third,man,True,,Cherbourg,yes,True
+1,3,female,22.0,0,0,7.775,S,Third,woman,False,,Southampton,yes,True
+0,1,male,62.0,0,0,26.55,S,First,man,True,,Southampton,no,True
+1,1,female,48.0,1,0,39.6,C,First,woman,False,A,Cherbourg,yes,False
+0,1,male,,0,0,227.525,C,First,man,True,,Cherbourg,no,True
+1,1,female,39.0,1,1,79.65,S,First,woman,False,E,Southampton,yes,False
+1,3,female,36.0,1,0,17.4,S,Third,woman,False,,Southampton,yes,False
+0,3,male,,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+0,3,male,40.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,2,male,28.0,0,0,13.5,S,Second,man,True,,Southampton,no,True
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,female,,0,0,8.05,S,Third,woman,False,,Southampton,no,True
+0,3,male,24.0,2,0,24.15,S,Third,man,True,,Southampton,no,False
+0,3,male,19.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,3,female,29.0,0,4,21.075,S,Third,woman,False,,Southampton,no,False
+0,3,male,,0,0,7.2292,C,Third,man,True,,Cherbourg,no,True
+1,3,male,32.0,0,0,7.8542,S,Third,man,True,,Southampton,yes,True
+1,2,male,62.0,0,0,10.5,S,Second,man,True,,Southampton,yes,True
+1,1,female,53.0,2,0,51.4792,S,First,woman,False,C,Southampton,yes,False
+1,1,male,36.0,0,0,26.3875,S,First,man,True,E,Southampton,yes,True
+1,3,female,,0,0,7.75,Q,Third,woman,False,,Queenstown,yes,True
+0,3,male,16.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,19.0,0,0,14.5,S,Third,man,True,,Southampton,no,True
+1,2,female,34.0,0,0,13.0,S,Second,woman,False,,Southampton,yes,True
+1,1,female,39.0,1,0,55.9,S,First,woman,False,E,Southampton,yes,False
+0,3,female,,1,0,14.4583,C,Third,woman,False,,Cherbourg,no,False
+1,3,male,32.0,0,0,7.925,S,Third,man,True,,Southampton,yes,True
+1,2,female,25.0,1,1,30.0,S,Second,woman,False,,Southampton,yes,False
+1,1,female,39.0,1,1,110.8833,C,First,woman,False,C,Cherbourg,yes,False
+0,2,male,54.0,0,0,26.0,S,Second,man,True,,Southampton,no,True
+0,1,male,36.0,0,0,40.125,C,First,man,True,A,Cherbourg,no,True
+0,3,male,,0,0,8.7125,C,Third,man,True,,Cherbourg,no,True
+1,1,female,18.0,0,2,79.65,S,First,woman,False,E,Southampton,yes,False
+0,2,male,47.0,0,0,15.0,S,Second,man,True,,Southampton,no,True
+1,1,male,60.0,1,1,79.2,C,First,man,True,B,Cherbourg,yes,False
+0,3,male,22.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,3,male,35.0,0,0,7.125,S,Third,man,True,,Southampton,no,True
+1,1,female,52.0,1,0,78.2667,C,First,woman,False,D,Cherbourg,yes,False
+0,3,male,47.0,0,0,7.25,S,Third,man,True,,Southampton,no,True
+0,3,female,,0,2,7.75,Q,Third,woman,False,,Queenstown,no,False
+0,2,male,37.0,1,0,26.0,S,Second,man,True,,Southampton,no,False
+0,3,male,36.0,1,1,24.15,S,Third,man,True,,Southampton,no,False
+1,2,female,,0,0,33.0,S,Second,woman,False,,Southampton,yes,True
+0,3,male,49.0,0,0,0.0,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,7.225,C,Third,man,True,,Cherbourg,no,True
+1,1,male,49.0,1,0,56.9292,C,First,man,True,A,Cherbourg,yes,False
+1,2,female,24.0,2,1,27.0,S,Second,woman,False,,Southampton,yes,False
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,1,male,,0,0,42.4,S,First,man,True,,Southampton,no,True
+0,3,male,44.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,1,male,35.0,0,0,26.55,C,First,man,True,,Cherbourg,yes,True
+0,3,male,36.0,1,0,15.55,S,Third,man,True,,Southampton,no,False
+0,3,male,30.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,1,male,27.0,0,0,30.5,S,First,man,True,,Southampton,yes,True
+1,2,female,22.0,1,2,41.5792,C,Second,woman,False,,Cherbourg,yes,False
+1,1,female,40.0,0,0,153.4625,S,First,woman,False,C,Southampton,yes,True
+0,3,female,39.0,1,5,31.275,S,Third,woman,False,,Southampton,no,False
+0,3,male,,0,0,7.05,S,Third,man,True,,Southampton,no,True
+1,3,female,,1,0,15.5,Q,Third,woman,False,,Queenstown,yes,False
+0,3,male,,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,2,female,24.0,1,2,65.0,S,Second,woman,False,,Southampton,yes,False
+0,3,male,34.0,1,1,14.4,S,Third,man,True,,Southampton,no,False
+0,3,female,26.0,1,0,16.1,S,Third,woman,False,,Southampton,no,False
+1,2,female,4.0,2,1,39.0,S,Second,child,False,F,Southampton,yes,False
+0,2,male,26.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+0,3,male,27.0,1,0,14.4542,C,Third,man,True,,Cherbourg,no,False
+1,1,male,42.0,1,0,52.5542,S,First,man,True,D,Southampton,yes,False
+1,3,male,20.0,1,1,15.7417,C,Third,man,True,,Cherbourg,yes,False
+0,3,male,21.0,0,0,7.8542,S,Third,man,True,,Southampton,no,True
+0,3,male,21.0,0,0,16.1,S,Third,man,True,,Southampton,no,True
+0,1,male,61.0,0,0,32.3208,S,First,man,True,D,Southampton,no,True
+0,2,male,57.0,0,0,12.35,Q,Second,man,True,,Queenstown,no,True
+1,1,female,21.0,0,0,77.9583,S,First,woman,False,D,Southampton,yes,True
+0,3,male,26.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,7.7333,Q,Third,man,True,,Queenstown,no,True
+1,1,male,80.0,0,0,30.0,S,First,man,True,A,Southampton,yes,True
+0,3,male,51.0,0,0,7.0542,S,Third,man,True,,Southampton,no,True
+1,1,male,32.0,0,0,30.5,C,First,man,True,B,Cherbourg,yes,True
+0,1,male,,0,0,0.0,S,First,man,True,,Southampton,no,True
+0,3,female,9.0,3,2,27.9,S,Third,child,False,,Southampton,no,False
+1,2,female,28.0,0,0,13.0,S,Second,woman,False,,Southampton,yes,True
+0,3,male,32.0,0,0,7.925,S,Third,man,True,,Southampton,no,True
+0,2,male,31.0,1,1,26.25,S,Second,man,True,,Southampton,no,False
+0,3,female,41.0,0,5,39.6875,S,Third,woman,False,,Southampton,no,False
+0,3,male,,1,0,16.1,S,Third,man,True,,Southampton,no,False
+0,3,male,20.0,0,0,7.8542,S,Third,man,True,,Southampton,no,True
+1,1,female,24.0,0,0,69.3,C,First,woman,False,B,Cherbourg,yes,True
+0,3,female,2.0,3,2,27.9,S,Third,child,False,,Southampton,no,False
+1,3,male,,0,0,56.4958,S,Third,man,True,,Southampton,yes,True
+1,3,female,0.75,2,1,19.2583,C,Third,child,False,,Cherbourg,yes,False
+1,1,male,48.0,1,0,76.7292,C,First,man,True,D,Cherbourg,yes,False
+0,3,male,19.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,1,male,56.0,0,0,35.5,C,First,man,True,A,Cherbourg,yes,True
+0,3,male,,0,0,7.55,S,Third,man,True,,Southampton,no,True
+1,3,female,23.0,0,0,7.55,S,Third,woman,False,,Southampton,yes,True
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,2,female,18.0,0,1,23.0,S,Second,woman,False,,Southampton,yes,False
+0,3,male,21.0,0,0,8.4333,S,Third,man,True,,Southampton,no,True
+1,3,female,,0,0,7.8292,Q,Third,woman,False,,Queenstown,yes,True
+0,3,female,18.0,0,0,6.75,Q,Third,woman,False,,Queenstown,no,True
+0,2,male,24.0,2,0,73.5,S,Second,man,True,,Southampton,no,False
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,3,female,32.0,1,1,15.5,Q,Third,woman,False,,Queenstown,no,False
+0,2,male,23.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,1,male,58.0,0,2,113.275,C,First,man,True,D,Cherbourg,no,False
+1,1,male,50.0,2,0,133.65,S,First,man,True,,Southampton,yes,False
+0,3,male,40.0,0,0,7.225,C,Third,man,True,,Cherbourg,no,True
+0,1,male,47.0,0,0,25.5875,S,First,man,True,E,Southampton,no,True
+0,3,male,36.0,0,0,7.4958,S,Third,man,True,,Southampton,no,True
+1,3,male,20.0,1,0,7.925,S,Third,man,True,,Southampton,yes,False
+0,2,male,32.0,2,0,73.5,S,Second,man,True,,Southampton,no,False
+0,2,male,25.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,3,male,,0,0,7.775,S,Third,man,True,,Southampton,no,True
+0,3,male,43.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,1,female,,1,0,52.0,S,First,woman,False,C,Southampton,yes,False
+1,2,female,40.0,1,1,39.0,S,Second,woman,False,,Southampton,yes,False
+0,1,male,31.0,1,0,52.0,S,First,man,True,B,Southampton,no,False
+0,2,male,70.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+1,2,male,31.0,0,0,13.0,S,Second,man,True,,Southampton,yes,True
+0,2,male,,0,0,0.0,S,Second,man,True,,Southampton,no,True
+0,3,male,18.0,0,0,7.775,S,Third,man,True,,Southampton,no,True
+0,3,male,24.5,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,3,female,18.0,0,0,9.8417,S,Third,woman,False,,Southampton,yes,True
+0,3,female,43.0,1,6,46.9,S,Third,woman,False,,Southampton,no,False
+1,1,male,36.0,0,1,512.3292,C,First,man,True,B,Cherbourg,yes,False
+0,3,female,,0,0,8.1375,Q,Third,woman,False,,Queenstown,no,True
+1,1,male,27.0,0,0,76.7292,C,First,man,True,D,Cherbourg,yes,True
+0,3,male,20.0,0,0,9.225,S,Third,man,True,,Southampton,no,True
+0,3,male,14.0,5,2,46.9,S,Third,child,False,,Southampton,no,False
+0,2,male,60.0,1,1,39.0,S,Second,man,True,,Southampton,no,False
+0,2,male,25.0,1,2,41.5792,C,Second,man,True,,Cherbourg,no,False
+0,3,male,14.0,4,1,39.6875,S,Third,child,False,,Southampton,no,False
+0,3,male,19.0,0,0,10.1708,S,Third,man,True,,Southampton,no,True
+0,3,male,18.0,0,0,7.7958,S,Third,man,True,,Southampton,no,True
+1,1,female,15.0,0,1,211.3375,S,First,child,False,B,Southampton,yes,False
+1,1,male,31.0,1,0,57.0,S,First,man,True,B,Southampton,yes,False
+1,3,female,4.0,0,1,13.4167,C,Third,child,False,,Cherbourg,yes,False
+1,3,male,,0,0,56.4958,S,Third,man,True,,Southampton,yes,True
+0,3,male,25.0,0,0,7.225,C,Third,man,True,,Cherbourg,no,True
+0,1,male,60.0,0,0,26.55,S,First,man,True,,Southampton,no,True
+0,2,male,52.0,0,0,13.5,S,Second,man,True,,Southampton,no,True
+0,3,male,44.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,3,female,,0,0,7.7333,Q,Third,woman,False,,Queenstown,yes,True
+0,1,male,49.0,1,1,110.8833,C,First,man,True,C,Cherbourg,no,False
+0,3,male,42.0,0,0,7.65,S,Third,man,True,F,Southampton,no,True
+1,1,female,18.0,1,0,227.525,C,First,woman,False,C,Cherbourg,yes,False
+1,1,male,35.0,0,0,26.2875,S,First,man,True,E,Southampton,yes,True
+0,3,female,18.0,0,1,14.4542,C,Third,woman,False,,Cherbourg,no,False
+0,3,male,25.0,0,0,7.7417,Q,Third,man,True,,Queenstown,no,True
+0,3,male,26.0,1,0,7.8542,S,Third,man,True,,Southampton,no,False
+0,2,male,39.0,0,0,26.0,S,Second,man,True,,Southampton,no,True
+1,2,female,45.0,0,0,13.5,S,Second,woman,False,,Southampton,yes,True
+1,1,male,42.0,0,0,26.2875,S,First,man,True,E,Southampton,yes,True
+1,1,female,22.0,0,0,151.55,S,First,woman,False,,Southampton,yes,True
+1,3,male,,1,1,15.2458,C,Third,man,True,,Cherbourg,yes,False
+1,1,female,24.0,0,0,49.5042,C,First,woman,False,C,Cherbourg,yes,True
+0,1,male,,0,0,26.55,S,First,man,True,C,Southampton,no,True
+1,1,male,48.0,1,0,52.0,S,First,man,True,C,Southampton,yes,False
+0,3,male,29.0,0,0,9.4833,S,Third,man,True,,Southampton,no,True
+0,2,male,52.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,3,male,19.0,0,0,7.65,S,Third,man,True,F,Southampton,no,True
+1,1,female,38.0,0,0,227.525,C,First,woman,False,C,Cherbourg,yes,True
+1,2,female,27.0,0,0,10.5,S,Second,woman,False,E,Southampton,yes,True
+0,3,male,,0,0,15.5,Q,Third,man,True,,Queenstown,no,True
+0,3,male,33.0,0,0,7.775,S,Third,man,True,,Southampton,no,True
+1,2,female,6.0,0,1,33.0,S,Second,child,False,,Southampton,yes,False
+0,3,male,17.0,1,0,7.0542,S,Third,man,True,,Southampton,no,False
+0,2,male,34.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,2,male,50.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+1,1,male,27.0,1,0,53.1,S,First,man,True,E,Southampton,yes,False
+0,3,male,20.0,0,0,8.6625,S,Third,man,True,,Southampton,no,True
+1,2,female,30.0,3,0,21.0,S,Second,woman,False,,Southampton,yes,False
+1,3,female,,0,0,7.7375,Q,Third,woman,False,,Queenstown,yes,True
+0,2,male,25.0,1,0,26.0,S,Second,man,True,,Southampton,no,False
+0,3,female,25.0,1,0,7.925,S,Third,woman,False,,Southampton,no,False
+1,1,female,29.0,0,0,211.3375,S,First,woman,False,B,Southampton,yes,True
+0,3,male,11.0,0,0,18.7875,C,Third,child,False,,Cherbourg,no,True
+0,2,male,,0,0,0.0,S,Second,man,True,,Southampton,no,True
+0,2,male,23.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,2,male,23.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+0,3,male,28.5,0,0,16.1,S,Third,man,True,,Southampton,no,True
+0,3,female,48.0,1,3,34.375,S,Third,woman,False,,Southampton,no,False
+1,1,male,35.0,0,0,512.3292,C,First,man,True,B,Cherbourg,yes,True
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,1,male,,0,0,30.0,S,First,man,True,D,Southampton,yes,True
+0,1,male,36.0,1,0,78.85,S,First,man,True,C,Southampton,no,False
+1,1,female,21.0,2,2,262.375,C,First,woman,False,B,Cherbourg,yes,False
+0,3,male,24.0,1,0,16.1,S,Third,man,True,,Southampton,no,False
+1,3,male,31.0,0,0,7.925,S,Third,man,True,,Southampton,yes,True
+0,1,male,70.0,1,1,71.0,S,First,man,True,B,Southampton,no,False
+0,3,male,16.0,1,1,20.25,S,Third,man,True,,Southampton,no,False
+1,2,female,30.0,0,0,13.0,S,Second,woman,False,,Southampton,yes,True
+0,1,male,19.0,1,0,53.1,S,First,man,True,D,Southampton,no,False
+0,3,male,31.0,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+1,2,female,4.0,1,1,23.0,S,Second,child,False,,Southampton,yes,False
+1,3,male,6.0,0,1,12.475,S,Third,child,False,E,Southampton,yes,False
+0,3,male,33.0,0,0,9.5,S,Third,man,True,,Southampton,no,True
+0,3,male,23.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,2,female,48.0,1,2,65.0,S,Second,woman,False,,Southampton,yes,False
+1,2,male,0.67,1,1,14.5,S,Second,child,False,,Southampton,yes,False
+0,3,male,28.0,0,0,7.7958,S,Third,man,True,,Southampton,no,True
+0,2,male,18.0,0,0,11.5,S,Second,man,True,,Southampton,no,True
+0,3,male,34.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,1,female,33.0,0,0,86.5,S,First,woman,False,B,Southampton,yes,True
+0,3,male,,0,0,14.5,S,Third,man,True,,Southampton,no,True
+0,3,male,41.0,0,0,7.125,S,Third,man,True,,Southampton,no,True
+1,3,male,20.0,0,0,7.2292,C,Third,man,True,,Cherbourg,yes,True
+1,1,female,36.0,1,2,120.0,S,First,woman,False,B,Southampton,yes,False
+0,3,male,16.0,0,0,7.775,S,Third,man,True,,Southampton,no,True
+1,1,female,51.0,1,0,77.9583,S,First,woman,False,D,Southampton,yes,False
+0,1,male,,0,0,39.6,C,First,man,True,,Cherbourg,no,True
+0,3,female,30.5,0,0,7.75,Q,Third,woman,False,,Queenstown,no,True
+0,3,male,,1,0,24.15,Q,Third,man,True,,Queenstown,no,False
+0,3,male,32.0,0,0,8.3625,S,Third,man,True,,Southampton,no,True
+0,3,male,24.0,0,0,9.5,S,Third,man,True,,Southampton,no,True
+0,3,male,48.0,0,0,7.8542,S,Third,man,True,,Southampton,no,True
+0,2,female,57.0,0,0,10.5,S,Second,woman,False,E,Southampton,no,True
+0,3,male,,0,0,7.225,C,Third,man,True,,Cherbourg,no,True
+1,2,female,54.0,1,3,23.0,S,Second,woman,False,,Southampton,yes,False
+0,3,male,18.0,0,0,7.75,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,7.75,Q,Third,man,True,F,Queenstown,no,True
+1,3,female,5.0,0,0,12.475,S,Third,child,False,,Southampton,yes,True
+0,3,male,,0,0,7.7375,Q,Third,man,True,,Queenstown,no,True
+1,1,female,43.0,0,1,211.3375,S,First,woman,False,B,Southampton,yes,False
+1,3,female,13.0,0,0,7.2292,C,Third,child,False,,Cherbourg,yes,True
+1,1,female,17.0,1,0,57.0,S,First,woman,False,B,Southampton,yes,False
+0,1,male,29.0,0,0,30.0,S,First,man,True,D,Southampton,no,True
+0,3,male,,1,2,23.45,S,Third,man,True,,Southampton,no,False
+0,3,male,25.0,0,0,7.05,S,Third,man,True,,Southampton,no,True
+0,3,male,25.0,0,0,7.25,S,Third,man,True,,Southampton,no,True
+1,3,female,18.0,0,0,7.4958,S,Third,woman,False,,Southampton,yes,True
+0,3,male,8.0,4,1,29.125,Q,Third,child,False,,Queenstown,no,False
+1,3,male,1.0,1,2,20.575,S,Third,child,False,,Southampton,yes,False
+0,1,male,46.0,0,0,79.2,C,First,man,True,B,Cherbourg,no,True
+0,3,male,,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
+0,2,male,16.0,0,0,26.0,S,Second,man,True,,Southampton,no,True
+0,3,female,,8,2,69.55,S,Third,woman,False,,Southampton,no,False
+0,1,male,,0,0,30.6958,C,First,man,True,,Cherbourg,no,True
+0,3,male,25.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,2,male,39.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+1,1,female,49.0,0,0,25.9292,S,First,woman,False,D,Southampton,yes,True
+1,3,female,31.0,0,0,8.6833,S,Third,woman,False,,Southampton,yes,True
+0,3,male,30.0,0,0,7.2292,C,Third,man,True,,Cherbourg,no,True
+0,3,female,30.0,1,1,24.15,S,Third,woman,False,,Southampton,no,False
+0,2,male,34.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+1,2,female,31.0,1,1,26.25,S,Second,woman,False,,Southampton,yes,False
+1,1,male,11.0,1,2,120.0,S,First,child,False,B,Southampton,yes,False
+1,3,male,0.42,0,1,8.5167,C,Third,child,False,,Cherbourg,yes,False
+1,3,male,27.0,0,0,6.975,S,Third,man,True,,Southampton,yes,True
+0,3,male,31.0,0,0,7.775,S,Third,man,True,,Southampton,no,True
+0,1,male,39.0,0,0,0.0,S,First,man,True,A,Southampton,no,True
+0,3,female,18.0,0,0,7.775,S,Third,woman,False,,Southampton,no,True
+0,2,male,39.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+1,1,female,33.0,1,0,53.1,S,First,woman,False,E,Southampton,yes,False
+0,3,male,26.0,0,0,7.8875,S,Third,man,True,,Southampton,no,True
+0,3,male,39.0,0,0,24.15,S,Third,man,True,,Southampton,no,True
+0,2,male,35.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+0,3,female,6.0,4,2,31.275,S,Third,child,False,,Southampton,no,False
+0,3,male,30.5,0,0,8.05,S,Third,man,True,,Southampton,no,True
+0,1,male,,0,0,0.0,S,First,man,True,B,Southampton,no,True
+0,3,female,23.0,0,0,7.925,S,Third,woman,False,,Southampton,no,True
+0,2,male,31.0,1,1,37.0042,C,Second,man,True,,Cherbourg,no,False
+0,3,male,43.0,0,0,6.45,S,Third,man,True,,Southampton,no,True
+0,3,male,10.0,3,2,27.9,S,Third,child,False,,Southampton,no,False
+1,1,female,52.0,1,1,93.5,S,First,woman,False,B,Southampton,yes,False
+1,3,male,27.0,0,0,8.6625,S,Third,man,True,,Southampton,yes,True
+0,1,male,38.0,0,0,0.0,S,First,man,True,,Southampton,no,True
+1,3,female,27.0,0,1,12.475,S,Third,woman,False,E,Southampton,yes,False
+0,3,male,2.0,4,1,39.6875,S,Third,child,False,,Southampton,no,False
+0,3,male,,0,0,6.95,Q,Third,man,True,,Queenstown,no,True
+0,3,male,,0,0,56.4958,S,Third,man,True,,Southampton,no,True
+1,2,male,1.0,0,2,37.0042,C,Second,child,False,,Cherbourg,yes,False
+1,3,male,,0,0,7.75,Q,Third,man,True,,Queenstown,yes,True
+1,1,female,62.0,0,0,80.0,,First,woman,False,B,,yes,True
+1,3,female,15.0,1,0,14.4542,C,Third,child,False,,Cherbourg,yes,False
+1,2,male,0.83,1,1,18.75,S,Second,child,False,,Southampton,yes,False
+0,3,male,,0,0,7.2292,C,Third,man,True,,Cherbourg,no,True
+0,3,male,23.0,0,0,7.8542,S,Third,man,True,,Southampton,no,True
+0,3,male,18.0,0,0,8.3,S,Third,man,True,,Southampton,no,True
+1,1,female,39.0,1,1,83.1583,C,First,woman,False,E,Cherbourg,yes,False
+0,3,male,21.0,0,0,8.6625,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
+1,3,male,32.0,0,0,56.4958,S,Third,man,True,,Southampton,yes,True
+1,1,male,,0,0,29.7,C,First,man,True,C,Cherbourg,yes,True
+0,3,male,20.0,0,0,7.925,S,Third,man,True,,Southampton,no,True
+0,2,male,16.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+1,1,female,30.0,0,0,31.0,C,First,woman,False,,Cherbourg,yes,True
+0,3,male,34.5,0,0,6.4375,C,Third,man,True,,Cherbourg,no,True
+0,3,male,17.0,0,0,8.6625,S,Third,man,True,,Southampton,no,True
+0,3,male,42.0,0,0,7.55,S,Third,man,True,,Southampton,no,True
+0,3,male,,8,2,69.55,S,Third,man,True,,Southampton,no,False
+0,3,male,35.0,0,0,7.8958,C,Third,man,True,,Cherbourg,no,True
+0,2,male,28.0,0,1,33.0,S,Second,man,True,,Southampton,no,False
+1,1,female,,1,0,89.1042,C,First,woman,False,C,Cherbourg,yes,False
+0,3,male,4.0,4,2,31.275,S,Third,child,False,,Southampton,no,False
+0,3,male,74.0,0,0,7.775,S,Third,man,True,,Southampton,no,True
+0,3,female,9.0,1,1,15.2458,C,Third,child,False,,Cherbourg,no,False
+1,1,female,16.0,0,1,39.4,S,First,woman,False,D,Southampton,yes,False
+0,2,female,44.0,1,0,26.0,S,Second,woman,False,,Southampton,no,False
+1,3,female,18.0,0,1,9.35,S,Third,woman,False,,Southampton,yes,False
+1,1,female,45.0,1,1,164.8667,S,First,woman,False,,Southampton,yes,False
+1,1,male,51.0,0,0,26.55,S,First,man,True,E,Southampton,yes,True
+1,3,female,24.0,0,3,19.2583,C,Third,woman,False,,Cherbourg,yes,False
+0,3,male,,0,0,7.2292,C,Third,man,True,,Cherbourg,no,True
+0,3,male,41.0,2,0,14.1083,S,Third,man,True,,Southampton,no,False
+0,2,male,21.0,1,0,11.5,S,Second,man,True,,Southampton,no,False
+1,1,female,48.0,0,0,25.9292,S,First,woman,False,D,Southampton,yes,True
+0,3,female,,8,2,69.55,S,Third,woman,False,,Southampton,no,False
+0,2,male,24.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+1,2,female,42.0,0,0,13.0,S,Second,woman,False,,Southampton,yes,True
+1,2,female,27.0,1,0,13.8583,C,Second,woman,False,,Cherbourg,yes,False
+0,1,male,31.0,0,0,50.4958,S,First,man,True,A,Southampton,no,True
+0,3,male,,0,0,9.5,S,Third,man,True,,Southampton,no,True
+1,3,male,4.0,1,1,11.1333,S,Third,child,False,,Southampton,yes,False
+0,3,male,26.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,1,female,47.0,1,1,52.5542,S,First,woman,False,D,Southampton,yes,False
+0,1,male,33.0,0,0,5.0,S,First,man,True,B,Southampton,no,True
+0,3,male,47.0,0,0,9.0,S,Third,man,True,,Southampton,no,True
+1,2,female,28.0,1,0,24.0,C,Second,woman,False,,Cherbourg,yes,False
+1,3,female,15.0,0,0,7.225,C,Third,child,False,,Cherbourg,yes,True
+0,3,male,20.0,0,0,9.8458,S,Third,man,True,,Southampton,no,True
+0,3,male,19.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+1,1,female,56.0,0,1,83.1583,C,First,woman,False,C,Cherbourg,yes,False
+1,2,female,25.0,0,1,26.0,S,Second,woman,False,,Southampton,yes,False
+0,3,male,33.0,0,0,7.8958,S,Third,man,True,,Southampton,no,True
+0,3,female,22.0,0,0,10.5167,S,Third,woman,False,,Southampton,no,True
+0,2,male,28.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
+0,3,male,25.0,0,0,7.05,S,Third,man,True,,Southampton,no,True
+0,3,female,39.0,0,5,29.125,Q,Third,woman,False,,Queenstown,no,False
+0,2,male,27.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
+1,1,female,19.0,0,0,30.0,S,First,woman,False,B,Southampton,yes,True
+0,3,female,,1,2,23.45,S,Third,woman,False,,Southampton,no,False
+1,1,male,26.0,0,0,30.0,C,First,man,True,C,Cherbourg,yes,True
+0,3,male,32.0,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
--- a/talks/python_cpp/python_cpp.ipynb
+++ b/talks/python_cpp/python_cpp.ipynb
+%% Cell type:markdown id: tags:
+
+
+![Translation](trans.png)
+
+# Calling C++ code from Python
+
+## Problem
+
+ - We have some existing C++ code which operates on array/image data
+ - We want to call it from Python
+ - We want to use Numpy arrays to pass input and receive output
+ - **Ideally, want to avoid too much copying of large data**
+
+
+%% Cell type:markdown id: tags:
+
+## Solution I will present
+
+ - Build a **Cython** extension
+
+
+%% Cell type:markdown id: tags:
+
+## Alternative solutions I will mention briefly
+
+ - Create a pure-C API and use `ctypes`
+ - Wrapper-generators (e.g. `swig`)
+ - Wrapping a command line tool
+
+%% Cell type:markdown id: tags:
+
+# Sample C++ code
+    #include "newimage/newimageall.h"
+
+    void process_volume(NEWIMAGE::volume4D<float> &invol)
+    {
+        // Do some clever stuff
+        invol.binarise(0.5);
+    }
+
+    int main(int argc, char **argv)
+    {
+        char *input_file = argv[1];
+        char *output_file = argv[2];
+
+        NEWIMAGE::volume4D<float> invol;
+        read_volume4D(invol, input_file);
+
+        process_volume(invol);
+        save_volume4D(invol, output_file);
+    }
+
+%% Cell type:markdown id: tags:
+
+
+## First provide an entry point using C++ native types
+
+    #include <vector>
+    #include <iostream>
+
+    std::vector<float> process_vectors(std::vector<float> &input, int nx, int ny, int nz, int nt)
+    {
+        // This is just so we can see if the data has been copied
+        std::cerr << "In C++ the input vector starts at address " << &input[0] << std::endl;
+
+        // Here we ought to check that nx, ny, nz, nt is consistent with overall length of input
+
+        // Create a volume4D using an existing data buffer
+        // when we do this, NEWIMAGE will not try to delete the data buffer
+        NEWIMAGE::volume4D<float> invol(nx, ny, nz, nt, &input[0]);
+
+        // Do our processing step
+        process_volume(invol);
+
+        // Input data has been modified, so return it directly
+        return input;
+    }
+
+%% Cell type:markdown id: tags:
+
+
+# Array ordering
+
+![Ordering](abc.png)
+
+If `input` is a 4D image, it's pretty clear that the first element is the voxel with co-ordinates `(0, 0, 0, 0)`
+
+But what is the next element?
+
+Is it voxel `(1, 0, 0, 0)`?
+
+Or `(0, 0, 0, 1)`?
+
+If the *first* axis is the one which varies fastest, we are using **Column-Major** ordering
+If the *last* axis is the one which varies fastest, we are using **Row-Major** ordering
+
+%% Cell type:markdown id: tags:
+
+## So, which is the standard?
+
+| Row-major            | Column-major |
+| ------------         | ---------    |
+| C/C++ native arrays  | Fortran      |
+| Python/Numpy default | Matlab       |
+| SAS                  | FSL NEWIMAGE |
+
+%% Cell type:markdown id: tags:
+
+
+### Here, we will need to make sure our Numpy arrays are passed as 1-dimensional float arrays in Column-major order to match NEWIMAGE
+
+    data.flatten(order='F').astype(np.float32)
+
+ - `'F'` stands for 'Fortran order'
+ - A C++ `float` is *almost* guaranteed to be 32 bits
+
+
+%% Cell type:markdown id: tags:
+
+
+![Cython](cython.png)
+
+# Cython extension
+
+## First, the Cython wrapper
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis_wrapper.pyx
+
+import numpy as np
+cimport numpy as np
+
+from libcpp.vector cimport vector
+
+cdef extern from "my_analysis.h":
+    vector[float] process_vector(vector[float] &, int, int, int, int)
+
+def process_using_vectors(data):
+    # Save the dimensions of the data because we're going to flatten it to 1D array
+    # Should be checking the dimensions at this point!
+    nx, ny, nz, nt = data.shape
+
+    # Convert data to 1D in Column-major (Fortran) order
+    # This always copies the data
+    data = data.flatten(order='F').astype(np.float32)
+
+    # This line is just so we can see if the data is being copied
+    print("In python the input data starts at %X" % data.__array_interface__['data'][0])
+
+    # Call the C++ code
+    output = process_vectors(data, nx, ny, nz, nt)
+
+    # Output is a 1D array in Fortran order - turn it back into a multidimensional array
+    # This should not copy the data
+    output = np.reshape(output, [nx, ny, nz, nt], order='F')
+    print("In python the reshaped data starts at %X" % output.__array_interface__['data'][0])
+
+    return output
+
+```
+
+%% Cell type:markdown id: tags:
+
+
+## Next, build the extension
+
+This would normally go in `setup.py`
+
+%% Cell type:code id: tags:
+
+``` python
+import os
+import sys
+import numpy
+
+from setuptools import setup
+from Cython.Build import cythonize
+from setuptools.extension import Extension
+
+# My Cython extension
+fsldir = os.environ["FSLDIR"]
+
+ext = Extension("my_analysis_wrapper",
+                 sources=['my_analysis_wrapper.pyx',
+                          'my_analysis.cpp'],
+                 language="c++",
+                 include_dirs=[".", numpy.get_include(),
+                               os.path.join(fsldir, "include"),
+                               os.path.join(fsldir, "extras/include"),
+                               os.path.join(fsldir, "extras/include/newmat")],
+                 libraries=['newimage', 'miscmaths', 'fslio', 'niftiio', 'newmat', 'znz', "zlib"],
+                 library_dirs=[os.path.join(fsldir, "lib"), os.path.join(fsldir, "extras/lib")])
+
+# setup parameters
+setup(name='my_app',
+      description='My Python application which calls C++',
+      ext_modules=cythonize(ext))
+```
+
+%% Cell type:code id: tags:
+
+``` python
+%run setup.py build_ext
+```
+
+%% Output
+
+    running build_ext
+    copying build\lib.win-amd64-2.7\my_analysis_wrapper.pyd ->
+
+%% Cell type:code id: tags:
+
+``` python
+import numpy
+import my_analysis_wrapper
+import matplotlib.pyplot as plt
+
+data = numpy.random.rand(10, 10, 10, 10)
+
+plt.imshow(data[:,:,5,5])
+plt.show()
+
+output = my_analysis_wrapper.process_with_vectors(data)
+plt.imshow(output[:,:,5,5])
+plt.show()
+```
+
+%% Output
+
+
+
+    In python the input data starts at A55A980
+    In C++ the input vector starts at address 000000000A87D5A0
+    
+    In python the reshaped data starts at A6B2040
+
+
+
+%% Cell type:markdown id: tags:
+
+# Great, it worked
+
+## But we did copy our data  - several times
+
+ - We copied the input once in Python to flatten it in the right order
+ - Cython copied it again, because the pointer to the memory in the `vector` is different from the pointer to the data in the Numpy array.
+ - Similarly, converting the output `vector` back to a Numpy array would involve a copy
+
+
+## Do we care?
+
+It depends on:
+
+ - Is the processing time per-voxel much greater than the data copying time?
+   - If so, copying will not add significant overhead
+ - Might the data be comparable in size to system memory?
+   - If so, copying may result in swapping and significant slowness
+
+
+
+%% Cell type:markdown id: tags:
+
+# Solution with less copying
+
+We can't use a `vector`, it needs to be free to manage its own memory, not use an existing fixed buffer.
+
+Instead pass a pure C array:
+
+%% Cell type:markdown id: tags:
+
+
+    void process_array(float *input, int nx, int ny, int nz, int nt)
+    {
+        cerr << "In C++ the input array starts at address " << input << std::endl;
+
+        NEWIMAGE::volume4D<float> invol(nx, ny, nz, nt, input);
+
+        process_volume(invol);
+
+        // Volume data buffer is modified directly, so provided it was not copied
+        // we should be able to see the output directly in Python
+    }
+
+%% Cell type:markdown id: tags:
+
+ - Note that we cannot check the size of the `input` buffer! It had better be correct
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis_wrapper.pyx
+
+import numpy as np
+cimport numpy as np
+
+from libcpp.vector cimport vector
+
+cdef extern from "my_analysis.h":
+    void process_array(float *, int, int, int, int)
+
+def process_c(np.ndarray[np.float32_t, ndim=1] input,
+              nx, ny, nz, nt):
+    process_array(&input[0], nx, ny, nz, nt)
+
+def process_with_arrays(data):
+    # Save the dimension of the data because we're going to flatten it to 1D array
+    nx, ny, nz, nt = data.shape
+
+    # Convert data to 1D in Column-major (Fortran) order
+    data = data.flatten(order='F').astype(np.float32)
+
+    print("In python the data starts at %X" % data.__array_interface__['data'][0])
+
+    process_c(data, nx, ny, nz, nt)
+
+    data = np.reshape(data, [nx, ny, nz, nt], order='F')
+    print("In python the reshaped data starts at %X" % data.__array_interface__['data'][0])
+
+    return data
+```
+
+%% Cell type:code id: tags:
+
+``` python
+import numpy
+import my_analysis_wrapper
+
+plt.imshow(data[:,:,5,5])
+plt.show()
+
+output = my_analysis_wrapper.process_with_arrays(data)
+plt.imshow(output[:,:,5,5])
+plt.show()
+```
+
+%% Output
+
+
+
+    In python the data starts at A870040
+    In C++ the input array starts at address 000000000A870040
+    
+    In python the reshaped data starts at A870040
+
+
+
+%% Cell type:markdown id: tags:
+
+ - We copied our input data once when we flattened it into Fortran order
+ - C++ code operated directly on that buffer
+ - Output data was not copied when reshaped
+
+%% Cell type:markdown id: tags:
+
+# Summary
+
+ - Easy-ish recipe for passing Numpy arrays to C++ either as a `std::vector` or as a `float *` array.
+ - Can construct `NEWIMAGE::volume<float>` or other complex containers from within C++
+ - Easy modification to instead use `double` array
+ - Can pass Python strings to `C++ std::string` and other C++ containers in a similar way
+
+
+%% Cell type:markdown id: tags:
+
+# Alternatives (Briefly!)
+
+## Why?
+
+![Compatibility](cython_compat1.png)
+
+## Can we assume that our newly compiled Cython/C++ code will link correctly with `libnewimage.a`?
+
+ - Often, yes, but in general, no
+ - It depends on the compiler used for each - ideally they need to match
+ - The compiler of your Cython extension is **fixed** by the version of Python you are using
+ - Might need to recompile your dependency libraries with this compiler
+ - If you can't do this (e.g. commercial binary) you may be **stuck**
+
+%% Cell type:markdown id: tags:
+
+## Three common problem scenarios
+
+ - On Mac, need to use the same C++ standard library (either `libc++` or `libstdc++`)
+ - On Python 2, C++ compiler will be very old (may not support all of C++11)
+ - On Windows, no two versions of VC++ are binary compatible
+
+
+%% Cell type:markdown id: tags:
+
+# Alternative approach where this is a problem
+
+ - Make your code a shared library with a *Pure C* API
+ - Use `ctypes`
+
+## `ctypes`
+
+ - Part of Python standard library
+ - Allows you to call library functions from 'C' shared library (not C++)
+ - **Pure 'C' libraries are (generally) binary compatible on a given platform**
+ - We have to load the library manually
+ - We have to tell Python about the input and return types
+
+
+%% Cell type:markdown id: tags:
+
+# Pure 'C' API for our processing function
+
+    // my_analysis_purec.h
+
+    #ifdef __cplusplus
+    extern "C" {
+    #endif
+
+    void process_array(float *input, int nx, int ny, int nz, int nt);
+
+    #ifdef __cplusplus
+    }
+    #endif
+
+
+%% Cell type:markdown id: tags:
+
+ - Note need to use `extern "C" { }` if we may want to include this header from C++
+ - **On Windows, additional code is required to make the shared library (DLL) link correctly!**
+ - Note that the implementation *can* use C++
+
+%% Cell type:code id: tags:
+
+``` python
+import ctypes import CDLL, c_int, c_char_p
+import numpy as np
+import numpy.ctypeslib
+
+def process_ctypes(data):
+
+    clib = ctypes.cdll.LoadLibrary("libmy_analysis.so")
+
+    # This is the data type of a 1-D Numpy array
+    c_float_arr = numpy.ctypeslib.ndpointer(dtype=np.float32, ndim=1, flags='CONTIGUOUS')
+
+    # This specifies the argument types for the 'process_array' function
+    # This is not actually required but enables ctypes to do some error checking
+    clib.process_array.argtypes = [c_float_arr, c_int, c_int, c_int, c_int]
+
+    # Put the Numpy data into row-major order and make sure it is contiguous in memory
+    item = np.ascontiguousarray(item.flatten(order='F'), dtype=np.float32)
+
+    clib.process_carray(data, shape[0], shape[1], shape[2], shape[3])
+```
+
+%% Cell type:markdown id: tags:
+
+# Comparison with Cython
+
+## Cython advantages
+
+ - Python wrapper is probably a little quicker and cleaner to write
+ - Don't need to produce a new pure-C API provided we have an entry point using C++ types
+ - Potential for better error-checking
+ - Might integrate well if you are already using Cython
+ - No need to build a shared library
+
+## `ctypes` advantages
+
+ - Part of the Python standard library
+ - No additional compile step in `setup.py`
+ - Binary compatibility - no need to be tied to a single (perhaps old) C++ compiler
+
+## Conclusion?
+
+ - Use Cython when you can, `ctypes` if you have to
+
+
+%% Cell type:markdown id: tags:
+
+# Other alternatives (briefly for completeness)
+
+## Wrapper Generators (SWIG, shiboken, others)
+
+ - Run a preprocessor on your C++ code to generate an 'automatic' Python wrapper
+ - Generally need to write an 'interface specifier' for each C++ header to describe how it interfaces to Python
+ - Great when you have a large, complex C++ API which needs to be consistently exposed to Python (e.g. wx/wxpython, QT/PyQT)
+ - SWIG can support other languages as well as Python
+ - Probably more work than Cython/ctypes if you have a single simple API
+
+%% Cell type:markdown id: tags:
+
+## Just Wrap the Command Line
+
+ - Quick and dirty
+ - Copies all data to/from filesystem
+ - Need to go via command line API, create temp directories, etc
+ - Don't overlook as a way of getting started - can move to other solution later
+
+
+%% Cell type:code id: tags:
+
+``` python
+import os
+import tempfile
+import subprocess
+import tempfile
+import shutil
+
+import numpy as np
+import nibabel as nib
+os.environ["FSLOUTPUTTYPE"] = "NIFTI_GZ"
+
+def binarise(data):
+    # Remember the directory where we started
+    cwd_orig = os.getcwd()
+    try:
+        # Create a temporary directory
+        tempdir = tempfile.mkdtemp("fsl")
+
+        # Save input data in temp directory
+        os.chdir(tempdir)
+        tmpin = nib.Nifti1Image(data, np.identity(4))
+        tmpin.to_filename("in.nii.gz")
+
+        # Run a command from $FSLDIR
+        fslmaths = os.path.join(os.environ["FSLDIR"], "bin", "fslmaths")
+
+        # We could use os.system here if we don't care about returning the stdout/stderr
+        p = subprocess.Popen([fslmaths, "in.nii.gz", "-thr", "0.5", "-bin", "out.nii.gz"],
+                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+        cmd_stdout = ""
+        while 1:
+            retcode = p.poll()
+            cmd_stdout += p.stdout.readline()
+            if retcode is not None: break
+        if retcode != 0:
+            raise RuntimeError("Error: %s" % cmd_stdout)
+
+        # Load the output file and return it with the command standard output
+        out_nii = nib.load("out.nii.gz")
+        return out_nii.get_data(), cmd_stdout
+    finally:
+        # Change back to our starting directory
+        os.chdir(cwd_orig)
+
+data = np.random.rand(10, 10, 10, 10)
+plt.imshow(data[:,:,5,5])
+plt.show()
+
+output, stdout = binarise(data)
+plt.imshow(output[:,:,5,5])
+plt.show()
+
+```
+
+%% Output
+
+
+
+
+
+%% Cell type:markdown id: tags:
+
+# Summary
+
+## What we've done
+
+ - It's not that hard to call existing C++ code from Python
+ - Need to be a bit careful with Numpy arrays
+ - Cython is probably the easiest method
+ - Data copying can be minimised by passing data as C arrays (`float *` etc)
+ - `ctypes` may be a good alternative if you have binary compatibility issues
+ - Can always wrap a command line tool as a way of getting started!
+
+
+
+
+%% Cell type:code id: tags:
+
+``` python
+```
+%% Cell type:markdown id: tags:
+
+
+![Translation](trans.png)
+
+# Calling C++ code from Python
+
+## Problem
+
+ - We have some existing C++ code which operates on array/image data
+ - We want to call it from Python
+ - We want to use Numpy arrays to pass input and receive output
+ - **Ideally, want to avoid too much copying of large data**
+
+
+%% Cell type:markdown id: tags:
+
+## Solution I will present
+
+ - Build a **Cython** extension
+
+
+%% Cell type:markdown id: tags:
+
+## Alternative solutions I will mention briefly
+
+ - Create a pure-C API and use `ctypes`
+ - Wrapper-generators (e.g. `swig`)
+ - Wrapping a command line tool
+
+%% Cell type:markdown id: tags:
+
+# Sample C++ code
+    #include "newimage/newimageall.h"
+
+    void process_volume(NEWIMAGE::volume4D<float> &invol)
+    {
+        // Do some clever stuff
+        invol.binarise(0.5);
+    }
+
+    int main(int argc, char **argv)
+    {
+        char *input_file = argv[1];
+        char *output_file = argv[2];
+
+        NEWIMAGE::volume4D<float> invol;
+        read_volume4D(invol, input_file);
+
+        process_volume(invol);
+        save_volume4D(invol, output_file);
+    }
+
+%% Cell type:markdown id: tags:
+
+
+## First provide an entry point using C++ native types
+
+    #include <vector>
+    #include <iostream>
+
+    std::vector<float> process_vectors(std::vector<float> &input, int nx, int ny, int nz, int nt)
+    {
+        // This is just so we can see if the data has been copied
+        std::cerr << "In C++ the input vector starts at address " << &input[0] << std::endl;
+
+        // Here we ought to check that nx, ny, nz, nt is consistent with overall length of input
+
+        // Create a volume4D using an existing data buffer
+        // when we do this, NEWIMAGE will not try to delete the data buffer
+        NEWIMAGE::volume4D<float> invol(nx, ny, nz, nt, &input[0]);
+
+        // Do our processing step
+        process_volume(invol);
+
+        // Input data has been modified, so return it directly
+        return input;
+    }
+
+%% Cell type:markdown id: tags:
+
+
+# Array ordering
+
+![Ordering](abc.png)
+
+If `input` is a 4D image, it's pretty clear that the first element is the voxel with co-ordinates `(0, 0, 0, 0)`
+
+But what is the next element?
+
+Is it voxel `(1, 0, 0, 0)`?
+
+Or `(0, 0, 0, 1)`?
+
+If the *first* axis is the one which varies fastest, we are using **Column-Major** ordering
+If the *last* axis is the one which varies fastest, we are using **Row-Major** ordering
+
+%% Cell type:markdown id: tags:
+
+## So, which is the standard?
+
+| Row-major            | Column-major |
+| ------------         | ---------    |
+| C/C++ native arrays  | Fortran      |
+| Python/Numpy default | Matlab       |
+| SAS                  | FSL NEWIMAGE |
+
+%% Cell type:markdown id: tags:
+
+
+### Here, we will need to make sure our Numpy arrays are passed as 1-dimensional float arrays in Column-major order to match NEWIMAGE
+
+    data.flatten(order='F').astype(np.float32)
+
+ - `'F'` stands for 'Fortran order'
+ - A C++ `float` is *almost* guaranteed to be 32 bits
+
+
+%% Cell type:markdown id: tags:
+
+
+![Cython](cython.png)
+
+# Cython extension
+
+## First, the Cython wrapper
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis_wrapper.pyx
+
+import numpy as np
+cimport numpy as np
+
+from libcpp.vector cimport vector
+
+cdef extern from "my_analysis.h":
+    vector[float] process_vector(vector[float] &, int, int, int, int)
+
+def process_using_vectors(data):
+    # Save the dimensions of the data because we're going to flatten it to 1D array
+    # Should be checking the dimensions at this point!
+    nx, ny, nz, nt = data.shape
+
+    # Convert data to 1D in Column-major (Fortran) order
+    # This always copies the data
+    data = data.flatten(order='F').astype(np.float32)
+
+    # This line is just so we can see if the data is being copied
+    print("In python the input data starts at %X" % data.__array_interface__['data'][0])
+
+    # Call the C++ code
+    output = process_vectors(data, nx, ny, nz, nt)
+
+    # Output is a 1D array in Fortran order - turn it back into a multidimensional array
+    # This should not copy the data
+    output = np.reshape(output, [nx, ny, nz, nt], order='F')
+    print("In python the reshaped data starts at %X" % output.__array_interface__['data'][0])
+
+    return output
+
+```
+
+%% Cell type:markdown id: tags:
+
+
+## Next, build the extension
+
+This would normally go in `setup.py`
+
+%% Cell type:code id: tags:
+
+``` python
+import os
+import sys
+import numpy
+
+from setuptools import setup
+from Cython.Build import cythonize
+from setuptools.extension import Extension
+
+# My Cython extension
+fsldir = os.environ["FSLDIR"]
+
+ext = Extension("my_analysis_wrapper",
+                 sources=['my_analysis_wrapper.pyx',
+                          'my_analysis.cpp'],
+                 language="c++",
+                 include_dirs=[".", numpy.get_include(),
+                               os.path.join(fsldir, "include"),
+                               os.path.join(fsldir, "extras/include"),
+                               os.path.join(fsldir, "extras/include/newmat")],
+                 libraries=['newimage', 'miscmaths', 'fslio', 'niftiio', 'newmat', 'znz', "zlib"],
+                 library_dirs=[os.path.join(fsldir, "lib"), os.path.join(fsldir, "extras/lib")])
+
+# setup parameters
+setup(name='my_app',
+      description='My Python application which calls C++',
+      ext_modules=cythonize(ext))
+```
+
+%% Cell type:code id: tags:
+
+``` python
+%run setup.py build_ext
+```
+
+%% Output
+
+    running build_ext
+    copying build\lib.win-amd64-2.7\my_analysis_wrapper.pyd ->
+
+%% Cell type:code id: tags:
+
+``` python
+import numpy
+import my_analysis_wrapper
+import matplotlib.pyplot as plt
+
+data = numpy.random.rand(10, 10, 10, 10)
+
+plt.imshow(data[:,:,5,5])
+plt.show()
+
+output = my_analysis_wrapper.process_with_vectors(data)
+plt.imshow(output[:,:,5,5])
+plt.show()
+```
+
+%% Output
+
+
+
+    In python the input data starts at A55A980
+    In C++ the input vector starts at address 000000000A87D5A0
+    
+    In python the reshaped data starts at A6B2040
+
+
+
+%% Cell type:markdown id: tags:
+
+# Great, it worked
+
+## But we did copy our data  - several times
+
+ - We copied the input once in Python to flatten it in the right order
+ - Cython copied it again, because the pointer to the memory in the `vector` is different from the pointer to the data in the Numpy array.
+ - Similarly, converting the output `vector` back to a Numpy array would involve a copy
+
+
+## Do we care?
+
+It depends on:
+
+ - Is the processing time per-voxel much greater than the data copying time?
+   - If so, copying will not add significant overhead
+ - Might the data be comparable in size to system memory?
+   - If so, copying may result in swapping and significant slowness
+
+
+
+%% Cell type:markdown id: tags:
+
+# Solution with less copying
+
+We can't use a `vector`, it needs to be free to manage its own memory, not use an existing fixed buffer.
+
+Instead pass a pure C array:
+
+%% Cell type:markdown id: tags:
+
+
+    void process_array(float *input, int nx, int ny, int nz, int nt)
+    {
+        cerr << "In C++ the input array starts at address " << input << std::endl;
+
+        NEWIMAGE::volume4D<float> invol(nx, ny, nz, nt, input);
+
+        process_volume(invol);
+
+        // Volume data buffer is modified directly, so provided it was not copied
+        // we should be able to see the output directly in Python
+    }
+
+%% Cell type:markdown id: tags:
+
+ - Note that we cannot check the size of the `input` buffer! It had better be correct
+
+%% Cell type:code id: tags:
+
+``` python
+# my_analysis_wrapper.pyx
+
+import numpy as np
+cimport numpy as np
+
+from libcpp.vector cimport vector
+
+cdef extern from "my_analysis.h":
+    void process_array(float *, int, int, int, int)
+
+def process_c(np.ndarray[np.float32_t, ndim=1] input,
+              nx, ny, nz, nt):
+    process_array(&input[0], nx, ny, nz, nt)
+
+def process_with_arrays(data):
+    # Save the dimension of the data because we're going to flatten it to 1D array
+    nx, ny, nz, nt = data.shape
+
+    # Convert data to 1D in Column-major (Fortran) order
+    data = data.flatten(order='F').astype(np.float32)
+
+    print("In python the data starts at %X" % data.__array_interface__['data'][0])
+
+    process_c(data, nx, ny, nz, nt)
+
+    data = np.reshape(data, [nx, ny, nz, nt], order='F')
+    print("In python the reshaped data starts at %X" % data.__array_interface__['data'][0])
+
+    return data
+```
+
+%% Cell type:code id: tags:
+
+``` python
+import numpy
+import my_analysis_wrapper
+
+plt.imshow(data[:,:,5,5])
+plt.show()
+
+output = my_analysis_wrapper.process_with_arrays(data)
+plt.imshow(output[:,:,5,5])
+plt.show()
+```
+
+%% Output
+
+
+
+    In python the data starts at A870040
+    In C++ the input array starts at address 000000000A870040
+    
+    In python the reshaped data starts at A870040
+
+
+
+%% Cell type:markdown id: tags:
+
+ - We copied our input data once when we flattened it into Fortran order
+ - C++ code operated directly on that buffer
+ - Output data was not copied when reshaped
+
+%% Cell type:markdown id: tags:
+
+# Summary
+
+ - Easy-ish recipe for passing Numpy arrays to C++ either as a `std::vector` or as a `float *` array.
+ - Can construct `NEWIMAGE::volume<float>` or other complex containers from within C++
+ - Easy modification to instead use `double` array
+ - Can pass Python strings to `C++ std::string` and other C++ containers in a similar way
+
+
+%% Cell type:markdown id: tags:
+
+# Alternatives (Briefly!)
+
+## Why?
+
+![Compatibility](cython_compat1.png)
+
+## Can we assume that our newly compiled Cython/C++ code will link correctly with `libnewimage.a`?
+
+ - Often, yes, but in general, no
+ - It depends on the compiler used for each - ideally they need to match
+ - The compiler of your Cython extension is **fixed** by the version of Python you are using
+ - Might need to recompile your dependency libraries with this compiler
+ - If you can't do this (e.g. commercial binary) you may be **stuck**
+
+%% Cell type:markdown id: tags:
+
+## Three common problem scenarios
+
+ - On Mac, need to use the same C++ standard library (either `libc++` or `libstdc++`)
+ - On Python 2, C++ compiler will be very old (may not support all of C++11)
+ - On Windows, no two versions of VC++ are binary compatible
+
+
+%% Cell type:markdown id: tags:
+
+# Alternative approach where this is a problem
+
+ - Make your code a shared library with a *Pure C* API
+ - Use `ctypes`
+
+## `ctypes`
+
+ - Part of Python standard library
+ - Allows you to call library functions from 'C' shared library (not C++)
+ - **Pure 'C' libraries are (generally) binary compatible on a given platform**
+ - We have to load the library manually
+ - We have to tell Python about the input and return types
+
+
+%% Cell type:markdown id: tags:
+
+# Pure 'C' API for our processing function
+
+    // my_analysis_purec.h
+
+    #ifdef __cplusplus
+    extern "C" {
+    #endif
+
+    void process_array(float *input, int nx, int ny, int nz, int nt);
+
+    #ifdef __cplusplus
+    }
+    #endif
+
+
+%% Cell type:markdown id: tags:
+
+ - Note need to use `extern "C" { }` if we may want to include this header from C++
+ - **On Windows, additional code is required to make the shared library (DLL) link correctly!**
+ - Note that the implementation *can* use C++
+
+%% Cell type:code id: tags:
+
+``` python
+import ctypes import CDLL, c_int, c_char_p
+import numpy as np
+import numpy.ctypeslib
+
+def process_ctypes(data):
+
+    clib = ctypes.cdll.LoadLibrary("libmy_analysis.so")
+
+    # This is the data type of a 1-D Numpy array
+    c_float_arr = numpy.ctypeslib.ndpointer(dtype=np.float32, ndim=1, flags='CONTIGUOUS')
+
+    # This specifies the argument types for the 'process_array' function
+    # This is not actually required but enables ctypes to do some error checking
+    clib.process_array.argtypes = [c_float_arr, c_int, c_int, c_int, c_int]
+
+    # Put the Numpy data into row-major order and make sure it is contiguous in memory
+    item = np.ascontiguousarray(item.flatten(order='F'), dtype=np.float32)
+
+    clib.process_carray(data, shape[0], shape[1], shape[2], shape[3])
+```
+
+%% Cell type:markdown id: tags:
+
+# Comparison with Cython
+
+## Cython advantages
+
+ - Python wrapper is probably a little quicker and cleaner to write
+ - Don't need to produce a new pure-C API provided we have an entry point using C++ types
+ - Potential for better error-checking
+ - Might integrate well if you are already using Cython
+ - No need to build a shared library
+
+## `ctypes` advantages
+
+ - Part of the Python standard library
+ - No additional compile step in `setup.py`
+ - Binary compatibility - no need to be tied to a single (perhaps old) C++ compiler
+
+## Conclusion?
+
+ - Use Cython when you can, `ctypes` if you have to
+
+
+%% Cell type:markdown id: tags:
+
+# Other alternatives (briefly for completeness)
+
+## Wrapper Generators (SWIG, shiboken, others)
+
+ - Run a preprocessor on your C++ code to generate an 'automatic' Python wrapper
+ - Generally need to write an 'interface specifier' for each C++ header to describe how it interfaces to Python
+ - Great when you have a large, complex C++ API which needs to be consistently exposed to Python (e.g. wx/wxpython, QT/PyQT)
+ - SWIG can support other languages as well as Python
+ - Probably more work than Cython/ctypes if you have a single simple API
+
+%% Cell type:markdown id: tags:
+
+## Just Wrap the Command Line
+
+ - Quick and dirty
+ - Copies all data to/from filesystem
+ - Need to go via command line API, create temp directories, etc
+ - Don't overlook as a way of getting started - can move to other solution later
+
+
+%% Cell type:code id: tags:
+
+``` python
+import os
+import tempfile
+import subprocess
+import tempfile
+import shutil
+
+import numpy as np
+import nibabel as nib
+os.environ["FSLOUTPUTTYPE"] = "NIFTI_GZ"
+
+def binarise(data):
+    # Remember the directory where we started
+    cwd_orig = os.getcwd()
+    try:
+        # Create a temporary directory
+        tempdir = tempfile.mkdtemp("fsl")
+
+        # Save input data in temp directory
+        os.chdir(tempdir)
+        tmpin = nib.Nifti1Image(data, np.identity(4))
+        tmpin.to_filename("in.nii.gz")
+
+        # Run a command from $FSLDIR
+        fslmaths = os.path.join(os.environ["FSLDIR"], "bin", "fslmaths")
+
+        # We could use os.system here if we don't care about returning the stdout/stderr
+        p = subprocess.Popen([fslmaths, "in.nii.gz", "-thr", "0.5", "-bin", "out.nii.gz"],
+                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+        cmd_stdout = ""
+        while 1:
+            retcode = p.poll()
+            cmd_stdout += p.stdout.readline()
+            if retcode is not None: break
+        if retcode != 0:
+            raise RuntimeError("Error: %s" % cmd_stdout)
+
+        # Load the output file and return it with the command standard output
+        out_nii = nib.load("out.nii.gz")
+        return out_nii.get_data(), cmd_stdout
+    finally:
+        # Change back to our starting directory
+        os.chdir(cwd_orig)
+
+data = np.random.rand(10, 10, 10, 10)
+plt.imshow(data[:,:,5,5])
+plt.show()
+
+output, stdout = binarise(data)
+plt.imshow(output[:,:,5,5])
+plt.show()
+
+```
+
+%% Output
+
+
+
+
+
+%% Cell type:markdown id: tags:
+
+# Summary
+
+## What we've done
+
+ - It's not that hard to call existing C++ code from Python
+ - Need to be a bit careful with Numpy arrays
+ - Cython is probably the easiest method
+ - Data copying can be minimised by passing data as C arrays (`float *` etc)
+ - `ctypes` may be a good alternative if you have binary compatibility issues
+ - Can always wrap a command line tool as a way of getting started!
+
+
+
+
+%% Cell type:code id: tags:
+
+``` python
+```
--- a/talks/pytreat_debryf/Debryf.key
+++ b/talks/pytreat_debryf/Debryf.key
--- a/talks/scikit-learn/.gitkeep
+++ b/talks/scikit-learn/.gitkeep
--- a/talks/scikit-learn/Scikits-learn.ipynb
+++ b/talks/scikit-learn/Scikits-learn.ipynb
--- a/talks/speed/speed.ipynb
+++ b/talks/speed/speed.ipynb
 %% Cell type:markdown id: tags:

 # Numpy vectorizing

 %% Cell type:code id: tags:

 ``` python
 import numpy as np
 ```

 %% Cell type:code id: tags:

 ``` python
 def root(a, b, c):
    D = b ** 2 - 4 * a * c
    if D < 0:
        return np.nan, np.nan
    x1 = (-b + np.sqrt(D)) / (2 * a)
    x2 = (-b - np.sqrt(D)) / (2 * a)
    return x1, x2
 root(1, 3, 4), root(-1, 3, 4), root(1, 0, 0)
 ```

-%% Output
-
-    ((nan, nan), (-1.0, 4.0), (0.0, 0.0))
-
 %% Cell type:code id: tags:

 ``` python
 a = np.random.randn(int(1e6))
 b = np.random.randn(int(1e6))
 c = np.random.randn(int(1e6))
 root(a, b, c)
 ```

-%% Output
-
-    ---------------------------------------------------------------------------
-    ValueError                                Traceback (most recent call last)
-    <ipython-input-19-5d1fed3ed2df> in <module>()
-          2 b = np.random.randn(int(1e6))
-          3 c = np.random.randn(int(1e6))
-    ----> 4 root(a, b, c)
-
-    <ipython-input-18-54b500cd66b1> in root(a, b, c)
-          1 def root(a, b, c):
-          2     D = b ** 2 - 4 * a * c
-    ----> 3     if D < 0:
-          4         return np.nan, np.nan
-          5     x1 = (-b + np.sqrt(D)) / (2 * a)
-    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
-
 %% Cell type:code id: tags:

 ``` python
 %%time
 np.array([root(av, bv, cv) for av, bv, cv in zip(a, b, c)])
 ```

-%% Output
-
-    CPU times: user 3.64 s, sys: 65.9 ms, total: 3.71 s
-    Wall time: 3.71 s
-
-    array([[-0.55797188,  1.11121928],
-           [        nan,         nan],
-           [ 0.76166857, -1.70637551],
-           ...,
-           [-0.56859783,  0.98659831],
-           [        nan,         nan],
-           [-0.53531175,  0.18339357]])
-
 %% Cell type:code id: tags:

 ``` python
 root_vec = np.vectorize(root)
 np.vectorize?
 ```

 %% Cell type:code id: tags:

 ``` python
 %%time
 root_vec(a, b, c)
 ```

-%% Output
-
-    CPU times: user 1.85 s, sys: 71.9 ms, total: 1.92 s
-    Wall time: 1.92 s
-
-    (array([-0.55797188,         nan,  0.76166857, ..., -0.56859783,
-                    nan, -0.53531175]),
-     array([ 1.11121928,         nan, -1.70637551, ...,  0.98659831,
-                    nan,  0.18339357]))
-
 %% Cell type:code id: tags:

 ``` python
 %%time
 root_vec(a, b, 7)
 ```

-%% Output
-
-    CPU times: user 1.66 s, sys: 62.7 ms, total: 1.73 s
-    Wall time: 1.73 s
-
-    (array([-1.72246968, -2.63787563,         nan, ..., -2.20722406,
-                    nan, -1.84415698]),
-     array([2.27571708, 2.04353033,        nan, ..., 2.62522454,        nan,
-            1.4922388 ]))
-
 %% Cell type:markdown id: tags:

 # Numba

 %% Cell type:code id: tags:

 ``` python
 import numba
 ```

 %% Cell type:code id: tags:

 ``` python
 root_jit = numba.jit(root)
 ```

 %% Cell type:code id: tags:

 ``` python
 %timeit root(23, 78, 19.0)
 ```

-%% Output
-
-    3.08 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
-
 %% Cell type:code id: tags:

 ``` python
 %timeit root_jit(23, 78, 19.0)
 ```

-%% Output
-
-    The slowest run took 11.09 times longer than the fastest. This could mean that an intermediate result is being cached.
-    786 ns ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
-
 %% Cell type:code id: tags:

 ``` python
 %%time
 root_jit(a, b, 7)
 ```

-%% Output
-
-    ---------------------------------------------------------------------------
-    ValueError                                Traceback (most recent call last)
-    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
-
-The above exception was the direct cause of the following exception:
-    SystemError                               Traceback (most recent call last)
-    <timed eval> in <module>()
-    SystemError: CPUDispatcher(<function root at 0x114035620>) returned a result with an error set
-
 %% Cell type:code id: tags:

 ``` python
 root_jit_vec = np.vectorize(root_jit)
 %time root_jit_vec(a, b, 7)
 ```

-%% Output
-
-    CPU times: user 406 ms, sys: 57.2 ms, total: 464 ms
-    Wall time: 464 ms
-
-    (array([-1.72246968, -2.63787563,         nan, ..., -2.20722406,
-                    nan, -1.84415698]),
-     array([2.27571708, 2.04353033,        nan, ..., 2.62522454,        nan,
-            1.4922388 ]))
-
 %% Cell type:code id: tags:

 ``` python
 @np.vectorize
 @numba.jit
 def fast_root(a, b, c):
    D = b ** 2 - 4 * a * c
    if D < 0:
        return np.nan, np.nan
    x1 = (-b + np.sqrt(D)) / (2 * a)
    x2 = (-b - np.sqrt(D)) / (2 * a)
    return x1, x2
 %time fast_root(a, b, 7)
 ```

-%% Output
-
-    CPU times: user 391 ms, sys: 58.2 ms, total: 449 ms
-    Wall time: 448 ms
-
-    (array([-1.72246968, -2.63787563,         nan, ..., -2.20722406,
-                    nan, -1.84415698]),
-     array([2.27571708, 2.04353033,        nan, ..., 2.62522454,        nan,
-            1.4922388 ]))
-
 %% Cell type:code id: tags:

 ``` python
 @numba.vectorize
 def fast_root(a, b, c):
    D = b ** 2 - 4 * a * c
    if D < 0:
        return np.nan, np.nan
    x1 = (-b + np.sqrt(D)) / (2 * a)
    x2 = (-b - np.sqrt(D)) / (2 * a)
    return x1, x2
 %time fast_root(a, b, 7)
 ```

-%% Output
-
-    ---------------------------------------------------------------------------
-    NotImplementedError                       Traceback (most recent call last)
-    <timed eval> in <module>()
-    ~/miniconda3/lib/python3.6/site-packages/numba/npyufunc/dufunc.py in _compile_for_args(self, *args, **kws)
-        166                     argty = argty.dtype
-        167                 argtys.append(argty)
-    --> 168         return self._compile_for_argtys(tuple(argtys))
-        169
-        170     def _compile_for_argtys(self, argtys, return_type=None):
-    ~/miniconda3/lib/python3.6/site-packages/numba/npyufunc/dufunc.py in _compile_for_argtys(self, argtys, return_type)
-        188             cres, argtys, return_type)
-        189         dtypenums, ptr, env = ufuncbuilder._build_element_wise_ufunc_wrapper(
-    --> 190             cres, actual_sig)
-        191         self._add_loop(utils.longint(ptr), dtypenums)
-        192         self._keepalive.append((ptr, cres.library, env))
-    ~/miniconda3/lib/python3.6/site-packages/numba/npyufunc/ufuncbuilder.py in _build_element_wise_ufunc_wrapper(cres, signature)
-        163     # Get dtypes
-        164     dtypenums = [as_dtype(a).num for a in signature.args]
-    --> 165     dtypenums.append(as_dtype(signature.return_type).num)
-        166     return dtypenums, ptr, env
-        167
-    ~/miniconda3/lib/python3.6/site-packages/numba/numpy_support.py in as_dtype(nbtype)
-        134         return as_dtype(nbtype.dtype)
-        135     raise NotImplementedError("%r cannot be represented as a Numpy dtype"
-    --> 136                               % (nbtype,))
-        137
-        138
-    NotImplementedError: (float64 x 2) cannot be represented as a Numpy dtype
-
 %% Cell type:markdown id: tags:

 # But run the profiler first

 %% Cell type:code id: tags:

 ``` python
 def fib(n):
    if n <= 1:
        return 1
    else:
        return fib(n-2) + fib(n-1)
 %time fib(33)
 ```

-%% Output
-
-    CPU times: user 1.56 s, sys: 3.5 ms, total: 1.56 s
-    Wall time: 1.56 s
-
-    5702887
-
 %% Cell type:code id: tags:

 ``` python
 @numba.jit
 def fib_jit(n):
    if n <= 1:
        return 1
    else:
        return fib_jit(n-2) + fib_jit(n-1)
 %time fib_jit(33)
 ```

-%% Output
-
-    CPU times: user 63.4 ms, sys: 1.79 ms, total: 65.2 ms
-    Wall time: 64.2 ms
-
-    5702887
-
 %% Cell type:code id: tags:

 ``` python
 %time fib_jit(41)
 ```

-%% Output
-
-    CPU times: user 1.28 s, sys: 5.26 ms, total: 1.28 s
-    Wall time: 1.28 s
-
-    267914296
-
 %% Cell type:code id: tags:

 ``` python
-%prun fib(33)
+%prun fib_jit(41)
 ```

-%% Output
-
-    
-
 %% Cell type:code id: tags:

 ``` python
-%prun fib_jit(41)
+%prun fib(33)
 ```

-%% Output
-
-    
-
 %% Cell type:code id: tags:

 ``` python
 from functools import lru_cache
 @lru_cache(None)
 def fib_cache(n):
    if n <= 1:
        return 1
    else:
        return fib_cache(n-2) + fib_cache(n-1)
 %time fib_cache(33)
 ```

-%% Output
-
-    CPU times: user 20 µs, sys: 1 µs, total: 21 µs
-    Wall time: 24.1 µs
-
-%% Cell type:code id: tags:
-
-``` python
-lru_cache?
-```
-
 %% Cell type:code id: tags:

 ``` python
 %time fib_cache(500)
 ```

-%% Output
-
-    CPU times: user 261 µs, sys: 1e+03 ns, total: 262 µs
-    Wall time: 267 µs
-
-    225591516161936330872512695036072072046011324913758190588638866418474627738686883405015987052796968498626
-
 %% Cell type:markdown id: tags:

 # Optimizing numpy algebra

 %% Cell type:markdown id: tags:

 Multiplying/adding arrays incurs a lot of overhead, because lots of intermediate arrays get created and discarded again

 %% Cell type:code id: tags:

 ``` python
 a = np.random.randn(int(1e7))
 b = np.random.randn(int(1e7))
 %timeit a * b + 4.1 * a < 3 * b
 ```

-%% Output
-
-    139 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
-
 %% Cell type:code id: tags:

 ``` python
 import numexpr as ne
 %timeit ne.evaluate('a * b + 4.1 * a < 3 * b')
 ```

-%% Output
-
-    11.4 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
-
 %% Cell type:markdown id: tags:

 Efficiency gain reduces once the operations become more complex

 %% Cell type:code id: tags:

 ``` python
-%timeit a * np.sqrt(abs(b)) < 3 * b
+c = np.random.randn(int(1e7))
+%timeit (-b - np.sqrt(b ** 2 - 4 * a * c) ) / (2 * a)
 ```

-%% Output
-
-    141 ms ± 1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
-
 %% Cell type:code id: tags:

 ``` python
-%timeit ne.evaluate('a * sqrt(abs(b)) < 3 * b')
+%timeit ne.evaluate('(-b - sqrt(b ** 2 - 4 * a * c) ) / (2 * a)')
 ```

-%% Output
-
-    12.6 ms ± 986 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
-
 %% Cell type:markdown id: tags:

 # Cython

 %% Cell type:markdown id: tags:

 http://docs.cython.org/en/latest/

 Compiles most python programs to C code for extra speed (but less introspection). Also allows direct calling of C library code.

 Can include python classes and much more.

 %% Cell type:code id: tags:

 ``` python
 %load_ext cython
 ```

-%% Output
-
-    The cython extension is already loaded. To reload it, use:
-      %reload_ext cython
-
 %% Cell type:code id: tags:

 ``` python
 def primes(kmax):
    p = {}
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
+print(primes(10))
+%time _ = primes(10000)
 ```

-%% Output
-
-    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
-    CPU times: user 67.4 ms, sys: 1.22 ms, total: 68.6 ms
-    Wall time: 68 ms
-
 %% Cell type:code id: tags:

 ``` python
 %%cython -a
-def cprimes(int kmax):
-    cdef int[1000] p
+def cprimes(kmax):
+    p = {}
    result = []
    if kmax > 1000:
        kmax = 1000
-    cdef int k, n, i
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
 ```

-%% Output
-
-    <IPython.core.display.HTML object>
-
 %% Cell type:code id: tags:

 ``` python
 print(cprimes(10))
 %time _ = cprimes(10000)
 ```

-%% Output
-
-    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
-    CPU times: user 2.2 ms, sys: 10 µs, total: 2.21 ms
-    Wall time: 2.22 ms
-
 %% Cell type:markdown id: tags:

 ## Defining C-style function

 %% Cell type:code id: tags:

 ``` python
-#%%cython -a
-def fib(n):
+%%cython -a
+def cfib(n):
    if n <= 1:
        return 1
    else:
-        return fib(n-2) + fib(n-1)
+        return cfib(n-2) + cfib(n-1)
 ```

 %% Cell type:code id: tags:

 ``` python
-%time fib(33)
+%time cfib(33)
 ```

-%% Output
-
-    CPU times: user 1.48 s, sys: 1.79 ms, total: 1.48 s
-    Wall time: 1.48 s
-
-    5702887
-
 %% Cell type:code id: tags:

 ``` python
 %time fib_jit(33)
 ```

-%% Output
-
-    CPU times: user 27.5 ms, sys: 73 µs, total: 27.5 ms
-    Wall time: 27.5 ms
-
-    5702887
-
 %% Cell type:code id: tags:

 ``` python
 %%cython -a
 cdef int cfib(int n) nogil:
    if n <= 1:
        return 1
    else:
        return cfib(n-2) + cfib(n-1)

 def fib(n):
    return cfib(n)
 ```

-%% Output
-
-    <IPython.core.display.HTML object>
-
 %% Cell type:code id: tags:

 ``` python
 %time fib(33)
 ```

-%% Output
-
-    CPU times: user 10.2 ms, sys: 138 µs, total: 10.4 ms
-    Wall time: 10.4 ms
-
-    5702887
-
 %% Cell type:markdown id: tags:

 ## Cython in production code

 %% Cell type:code id: tags:

 ``` python
 def root(a, b, c):
    D = b ** 2 - 4 * a * c
    if D < 0:
-        return NAN, NAN
-    x1 = (-b + sqrt(D)) / (2 * a)
-    x2 = (-b - sqrt(D)) / (2 * a)
+        return np.nan, np.nan
+    x1 = (-b + np.sqrt(D)) / (2 * a)
+    x2 = (-b - np.sqrt(D)) / (2 * a)
    return x1, x2
+root(1, 3, 4), root(-1, 3, 4), root(1, 0, 0)
 ```

 %% Cell type:code id: tags:

 ``` python
+a = np.random.randn(int(1e6))
+b = np.random.randn(int(1e6))
+c = np.random.randn(int(1e6))
 %time np.vectorize(root)(a, b, 7)
 ```

-%% Output
-
-    CPU times: user 236 ms, sys: 53.7 ms, total: 290 ms
-    Wall time: 288 ms
-
-    (array([-1.72246969, -2.6378758 ,         nan, ..., -2.20722389,
-                    nan, -1.84415698]),
-     array([2.27571702, 2.04353046,        nan, ..., 2.62522459,        nan,
-            1.49223876]))
-
 %% Cell type:code id: tags:

 ``` python
 %time root_jit_vec(a, b, 7)
 ```

-%% Output
-
-    CPU times: user 353 ms, sys: 57.8 ms, total: 411 ms
-    Wall time: 410 ms
-
-    (array([-1.72246968, -2.63787563,         nan, ..., -2.20722406,
-                    nan, -1.84415698]),
-     array([2.27571708, 2.04353033,        nan, ..., 2.62522454,        nan,
-            1.4922388 ]))
-
 %% Cell type:code id: tags:

 ``` python
 %%cython -a
 from libc.math cimport sqrt, NAN

 def root(float a, float b, float c):
    cdef float D, x1, x2
    D = b ** 2 - 4 * a * c
    if D < 0:
        return NAN, NAN
    x1 = (-b + sqrt(D)) / (2 * a)
    x2 = (-b - sqrt(D)) / (2 * a)
    return x1, x2
 print(root(1, 3, 4), root(-1, 3, 4), root(1, 0, 0))
 ```

 %% Cell type:markdown id: tags:

-## Cython in production code
+## More cython

 %% Cell type:markdown id: tags:

 http://docs.cython.org/en/latest/src/tutorial/cython_tutorial.html

 %% Cell type:markdown id: tags:

 # Steps to optimizing code:
 - Profile where the bottleneck is (%prun)
+  - You can install a [line profiler](https://github.com/rkern/line_profiler) (available as %lprun after installation)
 - Is there a faster algorithm for the bottleneck?
 - If the bottleneck is vectorized: can we optimize with numexpr?
 - If the internal part of the loop can not be vectorized:
    - numba jit if simple enough otherwise cython
    - the loop itself can be sped up with numpy.vectorize
 - If the bottleneck gets called too often: cache the result
 - If even that fails: pycuda
 - If that is too slow: learn to optimize cuda code or find an easier problem

 %% Cell type:code id: tags:

 ``` python
 ```

 %% Cell type:markdown id: tags:

 # Numpy vectorizing

 %% Cell type:code id: tags:

 ``` python
 import numpy as np
 ```

 %% Cell type:code id: tags:

 ``` python
 def root(a, b, c):
    D = b ** 2 - 4 * a * c
    if D < 0:
        return np.nan, np.nan
    x1 = (-b + np.sqrt(D)) / (2 * a)
    x2 = (-b - np.sqrt(D)) / (2 * a)
    return x1, x2
 root(1, 3, 4), root(-1, 3, 4), root(1, 0, 0)
 ```

-%% Output
-
-    ((nan, nan), (-1.0, 4.0), (0.0, 0.0))
-
 %% Cell type:code id: tags:

 ``` python
 a = np.random.randn(int(1e6))
 b = np.random.randn(int(1e6))
 c = np.random.randn(int(1e6))
 root(a, b, c)
 ```

-%% Output
-
-    ---------------------------------------------------------------------------
-    ValueError                                Traceback (most recent call last)
-    <ipython-input-19-5d1fed3ed2df> in <module>()
-          2 b = np.random.randn(int(1e6))
-          3 c = np.random.randn(int(1e6))
-    ----> 4 root(a, b, c)
-
-    <ipython-input-18-54b500cd66b1> in root(a, b, c)
-          1 def root(a, b, c):
-          2     D = b ** 2 - 4 * a * c
-    ----> 3     if D < 0:
-          4         return np.nan, np.nan
-          5     x1 = (-b + np.sqrt(D)) / (2 * a)
-    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
-
 %% Cell type:code id: tags:

 ``` python
 %%time
 np.array([root(av, bv, cv) for av, bv, cv in zip(a, b, c)])
 ```

-%% Output
-
-    CPU times: user 3.64 s, sys: 65.9 ms, total: 3.71 s
-    Wall time: 3.71 s
-
-    array([[-0.55797188,  1.11121928],
-           [        nan,         nan],
-           [ 0.76166857, -1.70637551],
-           ...,
-           [-0.56859783,  0.98659831],
-           [        nan,         nan],
-           [-0.53531175,  0.18339357]])
-
 %% Cell type:code id: tags:

 ``` python
 root_vec = np.vectorize(root)
 np.vectorize?
 ```

 %% Cell type:code id: tags:

 ``` python
 %%time
 root_vec(a, b, c)
 ```

-%% Output
-
-    CPU times: user 1.85 s, sys: 71.9 ms, total: 1.92 s
-    Wall time: 1.92 s
-
-    (array([-0.55797188,         nan,  0.76166857, ..., -0.56859783,
-                    nan, -0.53531175]),
-     array([ 1.11121928,         nan, -1.70637551, ...,  0.98659831,
-                    nan,  0.18339357]))
-
 %% Cell type:code id: tags:

 ``` python
 %%time
 root_vec(a, b, 7)
 ```

-%% Output
-
-    CPU times: user 1.66 s, sys: 62.7 ms, total: 1.73 s
-    Wall time: 1.73 s
-
-    (array([-1.72246968, -2.63787563,         nan, ..., -2.20722406,
-                    nan, -1.84415698]),
-     array([2.27571708, 2.04353033,        nan, ..., 2.62522454,        nan,
-            1.4922388 ]))
-
 %% Cell type:markdown id: tags:

 # Numba

 %% Cell type:code id: tags:

 ``` python
 import numba
 ```

 %% Cell type:code id: tags:

 ``` python
 root_jit = numba.jit(root)
 ```

 %% Cell type:code id: tags:

 ``` python
 %timeit root(23, 78, 19.0)
 ```

-%% Output
-
-    3.08 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
-
 %% Cell type:code id: tags:

 ``` python
 %timeit root_jit(23, 78, 19.0)
 ```

-%% Output
-
-    The slowest run took 11.09 times longer than the fastest. This could mean that an intermediate result is being cached.
-    786 ns ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
-
 %% Cell type:code id: tags:

 ``` python
 %%time
 root_jit(a, b, 7)
 ```

-%% Output
-
-    ---------------------------------------------------------------------------
-    ValueError                                Traceback (most recent call last)
-    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
-
-The above exception was the direct cause of the following exception:
-    SystemError                               Traceback (most recent call last)
-    <timed eval> in <module>()
-    SystemError: CPUDispatcher(<function root at 0x114035620>) returned a result with an error set
-
 %% Cell type:code id: tags:

 ``` python
 root_jit_vec = np.vectorize(root_jit)
 %time root_jit_vec(a, b, 7)
 ```

-%% Output
-
-    CPU times: user 406 ms, sys: 57.2 ms, total: 464 ms
-    Wall time: 464 ms
-
-    (array([-1.72246968, -2.63787563,         nan, ..., -2.20722406,
-                    nan, -1.84415698]),
-     array([2.27571708, 2.04353033,        nan, ..., 2.62522454,        nan,
-            1.4922388 ]))
-
 %% Cell type:code id: tags:

 ``` python
 @np.vectorize
 @numba.jit
 def fast_root(a, b, c):
    D = b ** 2 - 4 * a * c
    if D < 0:
        return np.nan, np.nan
    x1 = (-b + np.sqrt(D)) / (2 * a)
    x2 = (-b - np.sqrt(D)) / (2 * a)
    return x1, x2
 %time fast_root(a, b, 7)
 ```

-%% Output
-
-    CPU times: user 391 ms, sys: 58.2 ms, total: 449 ms
-    Wall time: 448 ms
-
-    (array([-1.72246968, -2.63787563,         nan, ..., -2.20722406,
-                    nan, -1.84415698]),
-     array([2.27571708, 2.04353033,        nan, ..., 2.62522454,        nan,
-            1.4922388 ]))
-
 %% Cell type:code id: tags:

 ``` python
 @numba.vectorize
 def fast_root(a, b, c):
    D = b ** 2 - 4 * a * c
    if D < 0:
        return np.nan, np.nan
    x1 = (-b + np.sqrt(D)) / (2 * a)
    x2 = (-b - np.sqrt(D)) / (2 * a)
    return x1, x2
 %time fast_root(a, b, 7)
 ```

-%% Output
-
-    ---------------------------------------------------------------------------
-    NotImplementedError                       Traceback (most recent call last)
-    <timed eval> in <module>()
-    ~/miniconda3/lib/python3.6/site-packages/numba/npyufunc/dufunc.py in _compile_for_args(self, *args, **kws)
-        166                     argty = argty.dtype
-        167                 argtys.append(argty)
-    --> 168         return self._compile_for_argtys(tuple(argtys))
-        169
-        170     def _compile_for_argtys(self, argtys, return_type=None):
-    ~/miniconda3/lib/python3.6/site-packages/numba/npyufunc/dufunc.py in _compile_for_argtys(self, argtys, return_type)
-        188             cres, argtys, return_type)
-        189         dtypenums, ptr, env = ufuncbuilder._build_element_wise_ufunc_wrapper(
-    --> 190             cres, actual_sig)
-        191         self._add_loop(utils.longint(ptr), dtypenums)
-        192         self._keepalive.append((ptr, cres.library, env))
-    ~/miniconda3/lib/python3.6/site-packages/numba/npyufunc/ufuncbuilder.py in _build_element_wise_ufunc_wrapper(cres, signature)
-        163     # Get dtypes
-        164     dtypenums = [as_dtype(a).num for a in signature.args]
-    --> 165     dtypenums.append(as_dtype(signature.return_type).num)
-        166     return dtypenums, ptr, env
-        167
-    ~/miniconda3/lib/python3.6/site-packages/numba/numpy_support.py in as_dtype(nbtype)
-        134         return as_dtype(nbtype.dtype)
-        135     raise NotImplementedError("%r cannot be represented as a Numpy dtype"
-    --> 136                               % (nbtype,))
-        137
-        138
-    NotImplementedError: (float64 x 2) cannot be represented as a Numpy dtype
-
 %% Cell type:markdown id: tags:

 # But run the profiler first

 %% Cell type:code id: tags:

 ``` python
 def fib(n):
    if n <= 1:
        return 1
    else:
        return fib(n-2) + fib(n-1)
 %time fib(33)
 ```

-%% Output
-
-    CPU times: user 1.56 s, sys: 3.5 ms, total: 1.56 s
-    Wall time: 1.56 s
-
-    5702887
-
 %% Cell type:code id: tags:

 ``` python
 @numba.jit
 def fib_jit(n):
    if n <= 1:
        return 1
    else:
        return fib_jit(n-2) + fib_jit(n-1)
 %time fib_jit(33)
 ```

-%% Output
-
-    CPU times: user 63.4 ms, sys: 1.79 ms, total: 65.2 ms
-    Wall time: 64.2 ms
-
-    5702887
-
 %% Cell type:code id: tags:

 ``` python
 %time fib_jit(41)
 ```

-%% Output
-
-    CPU times: user 1.28 s, sys: 5.26 ms, total: 1.28 s
-    Wall time: 1.28 s
-
-    267914296
-
 %% Cell type:code id: tags:

 ``` python
-%prun fib(33)
+%prun fib_jit(41)
 ```

-%% Output
-
-    
-
 %% Cell type:code id: tags:

 ``` python
-%prun fib_jit(41)
+%prun fib(33)
 ```

-%% Output
-
-    
-
 %% Cell type:code id: tags:

 ``` python
 from functools import lru_cache
 @lru_cache(None)
 def fib_cache(n):
    if n <= 1:
        return 1
    else:
        return fib_cache(n-2) + fib_cache(n-1)
 %time fib_cache(33)
 ```

-%% Output
-
-    CPU times: user 20 µs, sys: 1 µs, total: 21 µs
-    Wall time: 24.1 µs
-
-%% Cell type:code id: tags:
-
-``` python
-lru_cache?
-```
-
 %% Cell type:code id: tags:

 ``` python
 %time fib_cache(500)
 ```

-%% Output
-
-    CPU times: user 261 µs, sys: 1e+03 ns, total: 262 µs
-    Wall time: 267 µs
-
-    225591516161936330872512695036072072046011324913758190588638866418474627738686883405015987052796968498626
-
 %% Cell type:markdown id: tags:

 # Optimizing numpy algebra

 %% Cell type:markdown id: tags:

 Multiplying/adding arrays incurs a lot of overhead, because lots of intermediate arrays get created and discarded again

 %% Cell type:code id: tags:

 ``` python
 a = np.random.randn(int(1e7))
 b = np.random.randn(int(1e7))
 %timeit a * b + 4.1 * a < 3 * b
 ```

-%% Output
-
-    139 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
-
 %% Cell type:code id: tags:

 ``` python
 import numexpr as ne
 %timeit ne.evaluate('a * b + 4.1 * a < 3 * b')
 ```

-%% Output
-
-    11.4 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
-
 %% Cell type:markdown id: tags:

 Efficiency gain reduces once the operations become more complex

 %% Cell type:code id: tags:

 ``` python
-%timeit a * np.sqrt(abs(b)) < 3 * b
+c = np.random.randn(int(1e7))
+%timeit (-b - np.sqrt(b ** 2 - 4 * a * c) ) / (2 * a)
 ```

-%% Output
-
-    141 ms ± 1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
-
 %% Cell type:code id: tags:

 ``` python
-%timeit ne.evaluate('a * sqrt(abs(b)) < 3 * b')
+%timeit ne.evaluate('(-b - sqrt(b ** 2 - 4 * a * c) ) / (2 * a)')
 ```

-%% Output
-
-    12.6 ms ± 986 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
-
 %% Cell type:markdown id: tags:

 # Cython

 %% Cell type:markdown id: tags:

 http://docs.cython.org/en/latest/

 Compiles most python programs to C code for extra speed (but less introspection). Also allows direct calling of C library code.

 Can include python classes and much more.

 %% Cell type:code id: tags:

 ``` python
 %load_ext cython
 ```

-%% Output
-
-    The cython extension is already loaded. To reload it, use:
-      %reload_ext cython
-
 %% Cell type:code id: tags:

 ``` python
 def primes(kmax):
    p = {}
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
+print(primes(10))
+%time _ = primes(10000)
 ```

-%% Output
-
-    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
-    CPU times: user 67.4 ms, sys: 1.22 ms, total: 68.6 ms
-    Wall time: 68 ms
-
 %% Cell type:code id: tags:

 ``` python
 %%cython -a
-def cprimes(int kmax):
-    cdef int[1000] p
+def cprimes(kmax):
+    p = {}
    result = []
    if kmax > 1000:
        kmax = 1000
-    cdef int k, n, i
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
 ```

-%% Output
-
-    <IPython.core.display.HTML object>
-
 %% Cell type:code id: tags:

 ``` python
 print(cprimes(10))
 %time _ = cprimes(10000)
 ```

-%% Output
-
-    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
-    CPU times: user 2.2 ms, sys: 10 µs, total: 2.21 ms
-    Wall time: 2.22 ms
-
 %% Cell type:markdown id: tags:

 ## Defining C-style function

 %% Cell type:code id: tags:

 ``` python
-#%%cython -a
-def fib(n):
+%%cython -a
+def cfib(n):
    if n <= 1:
        return 1
    else:
-        return fib(n-2) + fib(n-1)
+        return cfib(n-2) + cfib(n-1)
 ```

 %% Cell type:code id: tags:

 ``` python
-%time fib(33)
+%time cfib(33)
 ```

-%% Output
-
-    CPU times: user 1.48 s, sys: 1.79 ms, total: 1.48 s
-    Wall time: 1.48 s
-
-    5702887
-
 %% Cell type:code id: tags:

 ``` python
 %time fib_jit(33)
 ```

-%% Output
-
-    CPU times: user 27.5 ms, sys: 73 µs, total: 27.5 ms
-    Wall time: 27.5 ms
-
-    5702887
-
 %% Cell type:code id: tags:

 ``` python
 %%cython -a
 cdef int cfib(int n) nogil:
    if n <= 1:
        return 1
    else:
        return cfib(n-2) + cfib(n-1)

 def fib(n):
    return cfib(n)
 ```

-%% Output
-
-    <IPython.core.display.HTML object>
-
 %% Cell type:code id: tags:

 ``` python
 %time fib(33)
 ```

-%% Output
-
-    CPU times: user 10.2 ms, sys: 138 µs, total: 10.4 ms
-    Wall time: 10.4 ms
-
-    5702887
-
 %% Cell type:markdown id: tags:

 ## Cython in production code

 %% Cell type:code id: tags:

 ``` python
 def root(a, b, c):
    D = b ** 2 - 4 * a * c
    if D < 0:
-        return NAN, NAN
-    x1 = (-b + sqrt(D)) / (2 * a)
-    x2 = (-b - sqrt(D)) / (2 * a)
+        return np.nan, np.nan
+    x1 = (-b + np.sqrt(D)) / (2 * a)
+    x2 = (-b - np.sqrt(D)) / (2 * a)
    return x1, x2
+root(1, 3, 4), root(-1, 3, 4), root(1, 0, 0)
 ```

 %% Cell type:code id: tags:

 ``` python
+a = np.random.randn(int(1e6))
+b = np.random.randn(int(1e6))
+c = np.random.randn(int(1e6))
 %time np.vectorize(root)(a, b, 7)
 ```

-%% Output
-
-    CPU times: user 236 ms, sys: 53.7 ms, total: 290 ms
-    Wall time: 288 ms
-
-    (array([-1.72246969, -2.6378758 ,         nan, ..., -2.20722389,
-                    nan, -1.84415698]),
-     array([2.27571702, 2.04353046,        nan, ..., 2.62522459,        nan,
-            1.49223876]))
-
 %% Cell type:code id: tags:

 ``` python
 %time root_jit_vec(a, b, 7)
 ```

-%% Output
-
-    CPU times: user 353 ms, sys: 57.8 ms, total: 411 ms
-    Wall time: 410 ms
-
-    (array([-1.72246968, -2.63787563,         nan, ..., -2.20722406,
-                    nan, -1.84415698]),
-     array([2.27571708, 2.04353033,        nan, ..., 2.62522454,        nan,
-            1.4922388 ]))
-
 %% Cell type:code id: tags:

 ``` python
 %%cython -a
 from libc.math cimport sqrt, NAN

 def root(float a, float b, float c):
    cdef float D, x1, x2
    D = b ** 2 - 4 * a * c
    if D < 0:
        return NAN, NAN
    x1 = (-b + sqrt(D)) / (2 * a)
    x2 = (-b - sqrt(D)) / (2 * a)
    return x1, x2
 print(root(1, 3, 4), root(-1, 3, 4), root(1, 0, 0))
 ```

 %% Cell type:markdown id: tags:

-## Cython in production code
+## More cython

 %% Cell type:markdown id: tags:

 http://docs.cython.org/en/latest/src/tutorial/cython_tutorial.html

 %% Cell type:markdown id: tags:

 # Steps to optimizing code:
 - Profile where the bottleneck is (%prun)
+  - You can install a [line profiler](https://github.com/rkern/line_profiler) (available as %lprun after installation)
 - Is there a faster algorithm for the bottleneck?
 - If the bottleneck is vectorized: can we optimize with numexpr?
 - If the internal part of the loop can not be vectorized:
    - numba jit if simple enough otherwise cython
    - the loop itself can be sped up with numpy.vectorize
 - If the bottleneck gets called too often: cache the result
 - If even that fails: pycuda
 - If that is too slow: learn to optimize cuda code or find an easier problem

 %% Cell type:code id: tags:

 ``` python
 ```

--- a/talks/tensorflow/.gitkeep
+++ b/talks/tensorflow/.gitkeep
--- a/talks/tensorflow/demos/.gitkeep
+++ b/talks/tensorflow/demos/.gitkeep
No results found