
Main scientific python libraries

See https://scipy.org/

Most of these packages have dropped, or are in the process of dropping, support for Python 2, so use Python 3!

Numpy: arrays

This is the main library underlying (nearly) all of the scientific python ecosystem. See the tutorial in the beginner session or the official numpy tutorial for usage details.

The usual nickname of numpy is np:

import numpy as np

Numpy includes support for:

  • N-dimensional arrays with various datatypes
    • masked arrays
    • matrices
    • structured/record array
  • basic functions (e.g., sin, log, arctan, polynomials)
  • basic linear algebra
  • random number generation
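A minimal sketch touching a few of these features (array creation, elementwise maths, linear algebra, and random numbers):

import numpy as np

a = np.arange(12).reshape(3, 4)            # 3x4 integer array
print(a.mean(axis=0))                      # column means
print(np.sin(np.linspace(0, np.pi, 5)))    # elementwise basic functions
print(np.linalg.inv(np.eye(3)))            # basic linear algebra
print(np.random.normal(size=3))            # random number generation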

Scipy: most general scientific tools

At the top level this module includes all of the basic functionality from numpy. You could import scipy as shown below, but you might as well import numpy directly.

import scipy as sp

The main strength of scipy lies in its sub-packages:

from scipy import optimize
def costfunc(params):
    return (params[0] - 3) ** 2
optimize.minimize(costfunc, x0=[0], method='l-bfgs-b')

Tutorials for all sub-packages can be found here.

Alternative to scipy.ndimage:

  • Scikit-image for image manipulation/segmentation/feature detection

Matplotlib: Main plotting library

import matplotlib as mpl
mpl.use('nbagg')
import matplotlib.pyplot as plt

The matplotlib tutorials are here.

x = np.linspace(0, 2, 100)

plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')

plt.xlabel('x label')
plt.ylabel('y label')

plt.title("Simple Plot")

plt.legend()

plt.show()

Alternatives:

  • Mayavi: 3D plotting (hard to install)
  • Bokeh among many others: interactive plots in the browser (i.e., in javascript)

IPython/Jupyter notebook: interactive Python environments

Features:

  • debugging
from scipy import optimize
def costfunc(params):
    if params[0] <= 0:
        raise ValueError('Input variable is too low')
    return 1 / params[0]
optimize.minimize(costfunc, x0=[3], method='l-bfgs-b')
%debug
  • timing/profiling
%%prun
plt.plot([0, 3])
  • getting help
plt.plot?

The next generation is already out: jupyterlab

There are many useful extensions available.

Pandas: Analyzing "clean" data

Once your data is in tabular form (e.g. Biobank IDPs), you will want to use pandas dataframes to analyze it. Pandas brings much of the functionality of R into Python. It has excellent support for:

  • fast IO to many tabular formats
  • accurate handling of missing data
  • Many, many routines to handle data
    • group by categorical data (e.g., male/female)
    • joining/merging data (all SQL-like operations and much more)
    • time series support
  • statistical models through statsmodels
  • plotting through seaborn
  • Use dask if your data is too big for memory (or if you want to run in parallel)

You should also install numexpr and bottleneck for optimal performance.

For the documentation, check here.
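As a minimal sketch of the kind of operations pandas makes easy (using a small made-up dataframe):

import numpy as np
import pandas as pd

# small made-up dataframe with a missing value
df = pd.DataFrame({'sex': ['male', 'female', 'female', 'male'],
                   'age': [25, 31, np.nan, 42],
                   'score': [1.2, 3.4, 2.2, 0.8]})
print(df.dropna())                         # handle missing data
print(df.groupby('sex')['score'].mean())   # group by categorical data
df.to_csv('demo.csv', index=False)         # IO to tabular formats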

Adjusted example from the statsmodels tutorial:

import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
df = sm.datasets.get_rdataset("Guerry", "HistData").data
df
df.describe()
df.groupby('Region').mean()
results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=df).fit()
results.summary()
df['log_pop'] = np.log(df.Pop1831)
df
results = smf.ols('Lottery ~ Literacy + log_pop', data=df).fit()
results.summary()
results = smf.ols('Lottery ~ Literacy + np.log(Pop1831) + Region', data=df).fit()
results.summary()
results = smf.ols('Lottery ~ Literacy + np.log(Pop1831) + Region + Region * Literacy', data=df).fit()
results.summary()
%matplotlib nbagg
import seaborn as sns
sns.pairplot(df, hue="Region", vars=('Lottery', 'Literacy', 'log_pop'))

Sympy: Symbolic programming

import sympy as sym  # no standard nickname
x, a, b, c = sym.symbols('x, a, b, c')
sym.solve(a * x ** 2 + b * x + c, x)
sym.integrate(x/(x**2+a*x+2), x)
f = sym.utilities.lambdify((x, a), sym.integrate((x**2+a*x+2), x))
f(np.random.rand(10), np.random.rand(10))

Other topics

Argparse: Command line arguments

%%writefile test_argparse.py
import argparse

def main():
    parser = argparse.ArgumentParser(description="calculate X to the power of Y")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("x", type=int, help="the base")
    parser.add_argument("y", type=int, help="the exponent")
    args = parser.parse_args()
    answer = args.x**args.y

    if args.verbose:
        print("{} to the power {} equals {}".format(args.x, args.y, answer))
    else:
        print("{}^{} == {}".format(args.x, args.y, answer))

if __name__ == '__main__':
    main()
%run test_argparse.py 3 8 -v
%run test_argparse.py -h
%run test_argparse.py 3 8.5

Alternatives:

  • docopt: You write a usage string; docopt will generate the parser
# example from https://realpython.com/blog/python/comparing-python-command-line-parsing-libraries-argparse-docopt-click/
"""Greeter.

Usage:
  commands.py hello
  commands.py goodbye
  commands.py -h | --help

Options:
  -h --help     Show this screen.
"""
from docopt import docopt

if __name__ == '__main__':
    arguments = docopt(__doc__)
  • clize: You write a function; clize will generate the parser
from clize import run

def echo(word):
    return word

if __name__ == '__main__':
    run(echo)

Gooey: GUI from command line tool

%%writefile test_gooey.py
import argparse
from gooey import Gooey

@Gooey
def main():
    parser = argparse.ArgumentParser(description="calculate X to the power of Y")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("x", type=int, help="the base")
    parser.add_argument("y", type=int, help="the exponent")
    args = parser.parse_args()
    answer = args.x**args.y

    if args.verbose:
        print("{} to the power {} equals {}".format(args.x, args.y, answer))
    else:
        print("{}^{} == {}".format(args.x, args.y, answer))

if __name__ == '__main__':
    main()
!python.app test_gooey.py
!gcoord_gui

Jinja2: Templating language

Jinja2 allows you to create templates of files with placeholders where future content will go. This makes it easy to generate a large number of similar files.

This can for example be used to produce static HTML output in a highly flexible manner.

%%writefile image_list.jinja2
<!DOCTYPE html>
<html lang="en">
<head>
    {% block head %}
    <title>{{ title }}</title>
    {% endblock %}
</head>
<body>
    <div id="content">
        {% block content %}
            {% for description, filenames in images %}
                <p>
                    {{ description }}
                </p>
                {% for filename in filenames %}
                    <a href="{{ filename }}">
                        <img src="{{ filename }}">
                    </a>
                {% endfor %}
            {% endfor %}
        {% endblock %}
    </div>
    <footer>
        Created on {{ time }}
    </footer>
</body>
</html>
import numpy as np
import matplotlib.pyplot as plt
plt.ioff()

def plot_sine(amplitude, frequency):
    x = np.linspace(0, 2 * np.pi, 100)
    y = amplitude * np.sin(frequency * x)
    plt.plot(x, y)
    plt.xticks([0, np.pi, 2 * np.pi], ['0', r'$\pi$', r'$2 \pi$'])  # raw strings avoid invalid escape warnings
    plt.ylim(-1.1, 1.1)
    filename = 'plots/A{:.2f}_F{:.2f}.png'.format(amplitude, frequency)
    plt.title('A={:.2f}, F={:.2f}'.format(amplitude, frequency))
    plt.savefig(filename)
    plt.close(plt.gcf())
    return filename

!mkdir plots
amplitudes = [plot_sine(A, 1.) for A in [0.1, 0.3, 0.7, 1.0]]
frequencies = [plot_sine(1., F) for F in [1, 2, 3, 4, 5, 6]]
plt.ion()
from jinja2 import Environment, FileSystemLoader
from datetime import datetime
loader = FileSystemLoader('.')
env = Environment(loader=loader)
template = env.get_template('image_list.jinja2')

images = [
    ('Varying the amplitude', amplitudes),
    ('Varying the frequency', frequencies),
]

with open('image_list.html', 'w') as f:
    f.write(template.render(title='Lots of sines',
                            images=images, time=datetime.now()))
!open image_list.html

Neuroimage packages

The nipy ecosystem covers most of these.

networkx: graph theory
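A minimal networkx sketch: build a small weighted graph, then query it.

import networkx as nx

G = nx.Graph()
G.add_edge('A', 'B', weight=1.0)
G.add_edge('B', 'C', weight=2.0)
G.add_edge('A', 'C', weight=5.0)
print(nx.shortest_path(G, 'A', 'C', weight='weight'))   # ['A', 'B', 'C']
print(dict(G.degree()))                                  # degree of each node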

GUI

  • tkinter: thin wrapper around Tcl/Tk; included in python
  • wxpython: Wrapper around the C++ wxWidgets library
%%writefile wx_hello_world.py
"""
Hello World, but with more meat.
"""

import wx

class HelloFrame(wx.Frame):
    """
    A Frame that says Hello World
    """

    def __init__(self, *args, **kw):
        # ensure the parent's __init__ is called
        super(HelloFrame, self).__init__(*args, **kw)

        # create a panel in the frame
        pnl = wx.Panel(self)

        # and put some text with a larger bold font on it
        st = wx.StaticText(pnl, label="Hello World!", pos=(25,25))
        font = st.GetFont()
        font.PointSize += 10
        font = font.Bold()
        st.SetFont(font)

        # create a menu bar
        self.makeMenuBar()

        # and a status bar
        self.CreateStatusBar()
        self.SetStatusText("Welcome to wxPython!")


    def makeMenuBar(self):
        """
        A menu bar is composed of menus, which are composed of menu items.
        This method builds a set of menus and binds handlers to be called
        when the menu item is selected.
        """

        # Make a file menu with Hello and Exit items
        fileMenu = wx.Menu()
        # The "\t..." syntax defines an accelerator key that also triggers
        # the same event
        helloItem = fileMenu.Append(-1, "&Hello...\tCtrl-H",
                "Help string shown in status bar for this menu item")
        fileMenu.AppendSeparator()
        # When using a stock ID we don't need to specify the menu item's
        # label
        exitItem = fileMenu.Append(wx.ID_EXIT)

        # Now a help menu for the about item
        helpMenu = wx.Menu()
        aboutItem = helpMenu.Append(wx.ID_ABOUT)

        # Make the menu bar and add the two menus to it. The '&' defines
        # that the next letter is the "mnemonic" for the menu item. On the
        # platforms that support it those letters are underlined and can be
        # triggered from the keyboard.
        menuBar = wx.MenuBar()
        menuBar.Append(fileMenu, "&File")
        menuBar.Append(helpMenu, "&Help")

        # Give the menu bar to the frame
        self.SetMenuBar(menuBar)

        # Finally, associate a handler function with the EVT_MENU event for
        # each of the menu items. That means that when that menu item is
        # activated then the associated handler function will be called.
        self.Bind(wx.EVT_MENU, self.OnHello, helloItem)
        self.Bind(wx.EVT_MENU, self.OnExit,  exitItem)
        self.Bind(wx.EVT_MENU, self.OnAbout, aboutItem)


    def OnExit(self, event):
        """Close the frame, terminating the application."""
        self.Close(True)


    def OnHello(self, event):
        """Say hello to the user."""
        wx.MessageBox("Hello again from wxPython")


    def OnAbout(self, event):
        """Display an About Dialog"""
        wx.MessageBox("This is a wxPython Hello World sample",
                      "About Hello World 2",
                      wx.OK|wx.ICON_INFORMATION)


if __name__ == '__main__':
    # When this module is run (not imported) then create the app, the
    # frame, show it, and start the event loop.
    app = wx.App()
    frm = HelloFrame(None, title='Hello World 2')
    frm.Show()
    app.MainLoop()
!python.app wx_hello_world.py

Machine learning

  • scikit-learn
  • theano/tensorflow/pytorch
    • keras
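A minimal scikit-learn sketch (train/test split plus a random forest on the built-in iris dataset):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print(clf.score(X_test, y_test))   # accuracy on the held-out data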

pymc3: Probabilistic programming

import numpy as np
import matplotlib.pyplot as plt

# Initialize random number generator
np.random.seed(123)

# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]

# Size of dataset
size = 100

# Predictor variable
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2

# Simulate outcome variable
Y = alpha + beta[0]*X1 + beta[1]*X2 + np.random.randn(size)*sigma
import pymc3 as pm
basic_model = pm.Model()

with basic_model:

    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=2)
    sigma = pm.HalfNormal('sigma', sd=1)

    # Expected value of outcome
    mu = alpha + beta[0]*X1 + beta[1]*X2

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)
from scipy import optimize  # needed by find_MAP below; repeated here so this cell stands alone

with basic_model:

    # obtain starting values via MAP
    start = pm.find_MAP(fmin=optimize.fmin_powell)

    # instantiate sampler
    step = pm.Slice()

    # draw 5000 posterior samples
    trace = pm.sample(5000, step=step, start=start)
_ = pm.traceplot(trace)
pm.summary(trace)

Alternatives:

  • pystan: wrapper around the Stan probabilistic programming language.
  • emcee: if you just want MCMC

Pycuda: Programming the GPU

Wrapper around CUDA.

  • The alternative Pyopencl provides a very similar wrapper around OpenCL.
  • Also see pyopenGL: graphics programming in python (used in FSLeyes)
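A minimal pycuda sketch (assumes a working CUDA installation): gpuarray provides numpy-like arrays that live on the GPU.

import numpy as np
import pycuda.autoinit              # create a CUDA context
import pycuda.gpuarray as gpuarray

a = np.random.randn(4, 4).astype(np.float32)
a_gpu = gpuarray.to_gpu(a)          # copy the array to GPU memory
print((2 * a_gpu).get())            # elementwise work on the GPU, copied back to the host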

Testing

  • unittest: built-in framework for unit tests
import unittest

class TestStringMethods(unittest.TestCase):

    def test_upper(self):
        self.assertEqual('foo'.upper(), 'FOO')

    def test_isupper(self):
        self.assertTrue('FOO'.isupper())
        self.assertFalse('Foo'.isupper())

    def test_split(self):
        s = 'hello world'
        self.assertEqual(s.split(), ['hello', 'world'])
        # check that s.split fails when the separator is not a string
        with self.assertRaises(TypeError):
            s.split(2)

if __name__ == '__main__':
    unittest.main()
  • doctest: checks the example usage in the documentation
def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(30)
    265252859812191058636308480000000
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0

    Factorials of floats are OK, but the float must be an exact integer:
    >>> factorial(30.1)
    Traceback (most recent call last):
        ...
    ValueError: n must be exact integer
    >>> factorial(30.0)
    265252859812191058636308480000000

    It must also not be ridiculously large:
    >>> factorial(1e100)
    Traceback (most recent call last):
        ...
    OverflowError: n too large
    """

    import math
    if not n >= 0:
        raise ValueError("n must be >= 0")
    if math.floor(n) != n:
        raise ValueError("n must be exact integer")
    if n+1 == n:  # catch a value like 1e300
        raise OverflowError("n too large")
    result = 1
    factor = 2
    while factor <= n:
        result *= factor
        factor += 1
    return result


if __name__ == "__main__":
    import doctest
    doctest.testmod()

Two external packages provide more convenient unit tests:

  • pytest: collects any function named test_* and runs it; plain assert statements give informative failure messages
# content of test_sample.py
def inc(x):
    return x + 1

def test_answer():
    assert inc(3) == 5
  • coverage: measures which part of the code is covered by the tests
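coverage is a command-line tool; a typical invocation (assuming the test_sample.py example above has been saved to disk) wraps a pytest run and then reports which lines were executed:

!coverage run -m pytest test_sample.py
!coverage report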

Linters

Linters check the code for any syntax errors, style errors, unused variables, unreachable code, etc.

  • pylint: most extensive linter
  • pyflakes: if you think pylint is too strict
  • pep8: just checks for style errors
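All three are command-line tools; for example, they can be run on the test_argparse.py file written earlier:

!pylint test_argparse.py
!pyflakes test_argparse.py
!pep8 test_argparse.py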

Optional static typing

  • Document how your method/function should be called
    • Static checking of whether your type hints are still up to date
    • Static checking of whether you call your own function correctly
  • Even if you don't assign types yourself, static type checking can still check whether you call typed functions/methods from other packages correctly.
from typing import List

def greet_all(names: List[str]) -> None:
    for name in names:
        print('Hello, {}'.format(name))

greet_all(['python', 'java', 'C++'])  # type checker will be fine with this

greet_all('matlab')  # this will actually run fine, but type checker will raise an error

Packages:

  • typing: built-in library containing generics, unions, etc.
  • mypy: linter doing static type checking
  • pyAnnotate: automatically assigns types to most of your functions/methods based on runtime behaviour
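mypy is also a command-line tool; assuming the greet_all snippet above were saved as greet_all.py (a made-up filename), the check would look like:

!mypy greet_all.py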

Web frameworks

  • Django2: includes the most features, but also forces you to do things its way
  • Pyramid: intermediate option
  • Flask: bare-bones web framework, but with many extensions available (see the minimal sketch below)

There are also many, many libraries to interact with databases, but you will have to google those yourself.
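As a flavour of how bare-bones Flask is, a minimal app (sketch only) fits in a few lines:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello from Flask!'

if __name__ == '__main__':
    app.run()    # serves on http://127.0.0.1:5000 by default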

Quick mentions

  • trimesh: Triangular mesh algorithms
  • Pillow: Read/write/manipulate a wide variety of images (png, jpg, tiff, etc.)
  • psychopy: equivalent of psychtoolbox (workshop coming up in April in Nottingham)
  • Sphinx: documentation generator
  • Built-in libraries
    • functools: caching, decorators, and support for functional programming
    • json/ipaddress/xml: parsing/writing
    • logging: log your output to stdout or a file (more flexible than print statements)
    • multiprocessing
    • os/sys: Miscellaneous operating system interfaces
    • os.path/pathlib: utilities to deal with filesystem paths (latter provides an object-oriented interface)
    • pickle: Store/load any python object
    • subprocess: call shell commands
    • warnings: tell people they are not using your code properly
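A quick sketch of a few of these built-in libraries in action:

import subprocess
from functools import lru_cache
from pathlib import Path

print(Path.cwd() / 'plots')                # object-oriented path handling
subprocess.run(['ls', '-l'], check=True)   # call a shell command

@lru_cache(maxsize=None)
def fib(n):
    """Cached recursive Fibonacci."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))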
import this  # prints "The Zen of Python"