- Main scientific python libraries
- Numpy: arrays
- Scipy: most general scientific tools
- Matplotlib: Main plotting library
- Ipython/Jupyter notebook: interactive python environments
- Pandas: Analyzing "clean" data
- Adjusted example from statsmodels tutorial
- Sympy: Symbolic programming
- Other topics
- Argparse: Command line arguments
- Gooey: GUI from command line tool
- Jinja2: Templating language
- Neuroimage packages
- networkx: graph theory
- GUI
- Machine learning
- pymc3: Probabilistic programming
- Pycuda: Programming the GPU
- Testing
- Linters
- Optional static typing
- Web frameworks
- Quick mentions
Main scientific python libraries
Most of these packages have dropped or are in the process of dropping support for Python 2, so use Python 3!
Numpy: arrays
This is the main library underlying (nearly) all of the scientific python ecosystem. See the tutorial in the beginner session or the official numpy tutorial for usage details.
The usual nickname of numpy is np:
import numpy as np
Numpy includes support for:
- N-dimensional arrays with various datatypes
- masked arrays
- matrices
- structured/record array
- basic functions (e.g., sin, log, arctan, polynomials)
- basic linear algebra
- random number generation
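A minimal sketch of a few of these features (the array values are arbitrary):
import numpy as np
# N-dimensional array with a chosen datatype
arr = np.arange(12, dtype=np.float64).reshape(3, 4)
# masked array: hide entries that should be ignored (here: negative values)
masked = np.ma.masked_less(arr - 5, 0)
print(masked.mean())
# basic functions and linear algebra
print(np.sin(arr).sum())
print(np.linalg.norm(arr))
# random number generation
print(np.random.normal(size=(2, 2)))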
Scipy: most general scientific tools
At the top level this module includes all of the basic functionality from numpy. You could import it as shown below, but you might as well import numpy directly.
import scipy as sp
The main strength of scipy lies in its sub-packages:
from scipy import optimize
def costfunc(params):
return (params[0] - 3) ** 2
optimize.minimize(costfunc, x0=[0], method='l-bfgs-b')
Tutorials for all sub-packages can be found here.
Alternative for scipy.ndimage:
- Scikit-image for image manipulation/segmentation/feature detection
Matplotlib: Main plotting library
import matplotlib as mpl
mpl.use('nbagg')
import matplotlib.pyplot as plt
The matplotlib tutorials are here
x = np.linspace(0, 2, 100)
plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')
plt.xlabel('x label')
plt.ylabel('y label')
plt.title("Simple Plot")
plt.legend()
plt.show()
Alternatives:
- Mayavi: 3D plotting (hard to install)
- Bokeh among many others: interactive plots in the browser (i.e., in javascript)
Ipython/Jupyter notebook: interactive python environments
Features:
- debugging
from scipy import optimize
def costfunc(params):
if params[0] <= 0:
raise ValueError('Input variable is too low')
return 1 / params[0]
optimize.minimize(costfunc, x0=[3], method='l-bfgs-b')
%debug
- timing/profiling
%%prun
plt.plot([0, 3])
- getting help
plt.plot?
The next generation is already out: jupyterlab
There are many useful extensions available.
Pandas: Analyzing "clean" data
Once your data is in tabular form (e.g. Biobank IDPs), you want to use pandas dataframes to analyze it. This brings most of the functionality of R into python. Pandas has excellent support for:
- fast IO to many tabular formats
- accurate handling of missing data
- Many, many routines to handle data
- group by categorical data (e.g., male/female)
- joining/merging data (all SQL-like operations and much more)
- time series support
- statistical models through statsmodels
- plotting though seaborn
- Use dask if your data is too big for memory (or if you want to run in parallel)
You should also install numexpr and bottleneck for optimal performance.
For the documentation check here
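As a minimal illustration of the dataframe interface with made-up data (the statsmodels example below shows a more realistic workflow):
import numpy as np
import pandas as pd
df_example = pd.DataFrame({
    'sex': ['male', 'female', 'female', 'male'],
    'age': [25, 30, np.nan, 40],          # missing data is handled gracefully
    'score': [1.5, 2.0, 0.5, 3.0],
})
print(df_example.groupby('sex').mean())   # group by categorical data
print(df_example.dropna())                # drop rows with missing values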
Adjusted example from statsmodels tutorial
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
df = sm.datasets.get_rdataset("Guerry", "HistData").data
df
df.describe()
df.groupby('Region').mean()
results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=df).fit()
results.summary()
df['log_pop'] = np.log(df.Pop1831)
df
results = smf.ols('Lottery ~ Literacy + log_pop', data=df).fit()
results.summary()
results = smf.ols('Lottery ~ Literacy + np.log(Pop1831) + Region', data=df).fit()
results.summary()
results = smf.ols('Lottery ~ Literacy + np.log(Pop1831) + Region + Region * Literacy', data=df).fit()
results.summary()
%matplotlib nbagg
import seaborn as sns
sns.pairplot(df, hue="Region", vars=('Lottery', 'Literacy', 'log_pop'))
Sympy: Symbolic programming
import sympy as sym # no standard nickname
x, a, b, c = sym.symbols('x, a, b, c')
sym.solve(a * x ** 2 + b * x + c, x)
sym.integrate(x/(x**2+a*x+2), x)
f = sym.utilities.lambdify((x, a), sym.integrate((x**2+a*x+2), x))
f(np.random.rand(10), np.random.rand(10))
Other topics
Argparse: Command line arguments
%%writefile test_argparse.py
import argparse
def main():
parser = argparse.ArgumentParser(description="calculate X to the power of Y")
parser.add_argument("-v", "--verbose", action="store_true")
parser.add_argument("x", type=int, help="the base")
parser.add_argument("y", type=int, help="the exponent")
args = parser.parse_args()
answer = args.x**args.y
if args.verbose:
print("{} to the power {} equals {}".format(args.x, args.y, answer))
else:
print("{}^{} == {}".format(args.x, args.y, answer))
if __name__ == '__main__':
main()
%run test_argparse.py 3 8 -v
%run test_argparse.py -h
%run test_argparse.py 3 8.5
Alternatives:
- docopt: You write a usage string, docopt will generate the parser
# example from https://realpython.com/blog/python/comparing-python-command-line-parsing-libraries-argparse-docopt-click/
"""Greeter.

Usage:
  commands.py hello
  commands.py goodbye
  commands.py -h | --help

Options:
  -h --help     Show this screen.
"""
from docopt import docopt

if __name__ == '__main__':
    arguments = docopt(__doc__)
- clize: You write a function, clize will generate the parser
from clize import run

def echo(word):
    return word

if __name__ == '__main__':
    run(echo)
Gooey: GUI from command line tool
%%writefile test_gooey.py
import argparse
from gooey import Gooey
@Gooey
def main():
parser = argparse.ArgumentParser(description="calculate X to the power of Y")
parser.add_argument("-v", "--verbose", action="store_true")
parser.add_argument("x", type=int, help="the base")
parser.add_argument("y", type=int, help="the exponent")
args = parser.parse_args()
answer = args.x**args.y
if args.verbose:
print("{} to the power {} equals {}".format(args.x, args.y, answer))
else:
print("{}^{} == {}".format(args.x, args.y, answer))
if __name__ == '__main__':
main()
!python.app test_gooey.py
!gcoord_gui
Jinja2: Templating language
Jinja2 allows you to create template files with placeholders where future content will go. This makes it easy to generate a large number of similar files.
This can for example be used to produce static HTML output in a highly flexible manner.
%%writefile image_list.jinja2
<!DOCTYPE html>
<html lang="en">
<head>
{% block head %}
<title>{{ title }}</title>
{% endblock %}
</head>
<body>
<div id="content">
{% block content %}
{% for description, filenames in images %}
<p>
{{ description }}
</p>
{% for filename in filenames %}
<a href="{{ filename }}">
<img src="{{ filename }}">
</a>
{% endfor %}
{% endfor %}
{% endblock %}
</div>
<footer>
Created on {{ time }}
</footer>
</body>
</html>
import numpy as np
import matplotlib.pyplot as plt
plt.ioff()
def plot_sine(amplitude, frequency):
x = np.linspace(0, 2 * np.pi, 100)
y = amplitude * np.sin(frequency * x)
plt.plot(x, y)
    plt.xticks([0, np.pi, 2 * np.pi], ['0', r'$\pi$', r'$2 \pi$'])
plt.ylim(-1.1, 1.1)
filename = 'plots/A{:.2f}_F{:.2f}.png'.format(amplitude, frequency)
plt.title('A={:.2f}, F={:.2f}'.format(amplitude, frequency))
plt.savefig(filename)
plt.close(plt.gcf())
return filename
!mkdir plots
amplitudes = [plot_sine(A, 1.) for A in [0.1, 0.3, 0.7, 1.0]]
frequencies = [plot_sine(1., F) for F in [1, 2, 3, 4, 5, 6]]
plt.ion()
from jinja2 import Environment, FileSystemLoader
from datetime import datetime
loader = FileSystemLoader('.')
env = Environment(loader=loader)
template = env.get_template('image_list.jinja2')
images = [
('Varying the amplitude', amplitudes),
('Varying the frequency', frequencies),
]
with open('image_list.html', 'w') as f:
f.write(template.render(title='Lots of sines',
images=images, time=datetime.now()))
!open image_list.html
Neuroimage packages
The nipy ecosystem covers most of these.
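For example, nibabel (part of the nipy ecosystem) reads and writes NIfTI and other neuroimaging formats. A minimal sketch, assuming some_image.nii.gz exists:
import nibabel as nib
img = nib.load('some_image.nii.gz')   # hypothetical filename
data = img.get_fdata()                # image data as a numpy array
print(data.shape)
print(img.affine)                     # voxel-to-world coordinate transformation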
networkx: graph theory
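A minimal sketch of building a graph and querying it (the edges are arbitrary):
import networkx as nx
G = nx.Graph()
G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'D'), ('A', 'D')])
print(nx.shortest_path(G, 'A', 'C'))    # e.g., ['A', 'B', 'C']
print(nx.degree_centrality(G))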
GUI
- tkinter: thin wrapper around Tcl/Tk; included in python (a minimal example follows the wxpython code below)
- wxpython: Wrapper around the C++ wxWidgets library
%%writefile wx_hello_world.py
"""
Hello World, but with more meat.
"""
import wx
class HelloFrame(wx.Frame):
"""
A Frame that says Hello World
"""
def __init__(self, *args, **kw):
# ensure the parent's __init__ is called
super(HelloFrame, self).__init__(*args, **kw)
# create a panel in the frame
pnl = wx.Panel(self)
# and put some text with a larger bold font on it
st = wx.StaticText(pnl, label="Hello World!", pos=(25,25))
font = st.GetFont()
font.PointSize += 10
font = font.Bold()
st.SetFont(font)
# create a menu bar
self.makeMenuBar()
# and a status bar
self.CreateStatusBar()
self.SetStatusText("Welcome to wxPython!")
def makeMenuBar(self):
"""
A menu bar is composed of menus, which are composed of menu items.
This method builds a set of menus and binds handlers to be called
when the menu item is selected.
"""
# Make a file menu with Hello and Exit items
fileMenu = wx.Menu()
# The "\t..." syntax defines an accelerator key that also triggers
# the same event
helloItem = fileMenu.Append(-1, "&Hello...\tCtrl-H",
"Help string shown in status bar for this menu item")
fileMenu.AppendSeparator()
# When using a stock ID we don't need to specify the menu item's
# label
exitItem = fileMenu.Append(wx.ID_EXIT)
# Now a help menu for the about item
helpMenu = wx.Menu()
aboutItem = helpMenu.Append(wx.ID_ABOUT)
# Make the menu bar and add the two menus to it. The '&' defines
# that the next letter is the "mnemonic" for the menu item. On the
# platforms that support it those letters are underlined and can be
# triggered from the keyboard.
menuBar = wx.MenuBar()
menuBar.Append(fileMenu, "&File")
menuBar.Append(helpMenu, "&Help")
# Give the menu bar to the frame
self.SetMenuBar(menuBar)
# Finally, associate a handler function with the EVT_MENU event for
# each of the menu items. That means that when that menu item is
# activated then the associated handler function will be called.
self.Bind(wx.EVT_MENU, self.OnHello, helloItem)
self.Bind(wx.EVT_MENU, self.OnExit, exitItem)
self.Bind(wx.EVT_MENU, self.OnAbout, aboutItem)
def OnExit(self, event):
"""Close the frame, terminating the application."""
self.Close(True)
def OnHello(self, event):
"""Say hello to the user."""
wx.MessageBox("Hello again from wxPython")
def OnAbout(self, event):
"""Display an About Dialog"""
wx.MessageBox("This is a wxPython Hello World sample",
"About Hello World 2",
wx.OK|wx.ICON_INFORMATION)
if __name__ == '__main__':
# When this module is run (not imported) then create the app, the
# frame, show it, and start the event loop.
app = wx.App()
frm = HelloFrame(None, title='Hello World 2')
frm.Show()
app.MainLoop()
!python.app wx_hello_world.py
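For comparison, a minimal tkinter sketch of the same idea (tkinter ships with Python, so there is nothing to install):
import tkinter as tk

root = tk.Tk()
root.title('Hello World')
label = tk.Label(root, text='Hello World!', font=('Helvetica', 24))
label.pack(padx=25, pady=25)
button = tk.Button(root, text='Quit', command=root.destroy)
button.pack(pady=10)
root.mainloop()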
Machine learning
- scikit-learn
- theano/tensorflow/pytorch
- keras
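A minimal scikit-learn sketch on simulated data, just to show the fit/predict interface:
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.random.randn(100, 3)                        # 100 samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)
model = LinearRegression().fit(X, y)
print(model.coef_)                                 # should be close to [1, -2, 0.5]
print(model.predict(X[:5]))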
pymc3: Probabilistic programming
import numpy as np
import matplotlib.pyplot as plt
# Initialize random number generator
np.random.seed(123)
# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]
# Size of dataset
size = 100
# Predictor variable
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2
# Simulate outcome variable
Y = alpha + beta[0]*X1 + beta[1]*X2 + np.random.randn(size)*sigma
import pymc3 as pm
basic_model = pm.Model()
with basic_model:
# Priors for unknown model parameters
alpha = pm.Normal('alpha', mu=0, sd=10)
beta = pm.Normal('beta', mu=0, sd=10, shape=2)
sigma = pm.HalfNormal('sigma', sd=1)
# Expected value of outcome
mu = alpha + beta[0]*X1 + beta[1]*X2
# Likelihood (sampling distribution) of observations
Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)
with basic_model:
# obtain starting values via MAP
start = pm.find_MAP(fmin=optimize.fmin_powell)
# instantiate sampler
step = pm.Slice()
# draw 5000 posterior samples
trace = pm.sample(5000, step=step, start=start)
_ = pm.traceplot(trace)
pm.summary(trace)
Alternatives: e.g., PyStan (Python interface to the Stan language) or emcee (MCMC sampling).
Pycuda: Programming the GPU
Wrapper around CUDA.
- The alternative Pyopencl provides a very similar wrapper around OpenCL.
- Also see PyOpenGL: graphics programming in python (used in FSLeyes)
Testing
- unittest: python built-in testing
import unittest
class TestStringMethods(unittest.TestCase):
def test_upper(self):
self.assertEqual('foo'.upper(), 'FOO')
def test_isupper(self):
self.assertTrue('FOO'.isupper())
self.assertFalse('Foo'.isupper())
def test_split(self):
s = 'hello world'
self.assertEqual(s.split(), ['hello', 'world'])
# check that s.split fails when the separator is not a string
with self.assertRaises(TypeError):
s.split(2)
if __name__ == '__main__':
unittest.main()
- doctest: checks the example usage in the documentation
def factorial(n):
"""Return the factorial of n, an exact integer >= 0.
>>> [factorial(n) for n in range(6)]
[1, 1, 2, 6, 24, 120]
>>> factorial(30)
265252859812191058636308480000000
>>> factorial(-1)
Traceback (most recent call last):
...
ValueError: n must be >= 0
Factorials of floats are OK, but the float must be an exact integer:
>>> factorial(30.1)
Traceback (most recent call last):
...
ValueError: n must be exact integer
>>> factorial(30.0)
265252859812191058636308480000000
It must also not be ridiculously large:
>>> factorial(1e100)
Traceback (most recent call last):
...
OverflowError: n too large
"""
import math
if not n >= 0:
raise ValueError("n must be >= 0")
if math.floor(n) != n:
raise ValueError("n must be exact integer")
if n+1 == n: # catch a value like 1e300
raise OverflowError("n too large")
result = 1
factor = 2
while factor <= n:
result *= factor
factor += 1
return result
if __name__ == "__main__":
import doctest
doctest.testmod()
Two external packages, pytest and nose, provide more convenient unit tests. The example below follows the pytest style:
# content of test_sample.py
def inc(x):
return x + 1
def test_answer():
    assert inc(3) == 5  # note: inc(3) == 4, so this assertion fails and pytest reports it
- coverage: measures which part of the code is covered by the tests
Linters
Linters check the code for any syntax errors, style errors, unused variables, unreachable code, etc.
- pylint: most extensive linter
- pyflakes: if you think pylint is too strict
- pycodestyle (formerly called pep8): just checks for style errors
Optional static typing
- Document how your method/function should be called
- Static checking of whether your type hints are still up to date
- Static checking of whether you call your own function correctly
- Even if you don't assign types yourself, static type checking can still check whether you call typed functions/methods from other packages correctly.
from typing import List
def greet_all(names: List[str]) -> None:
for name in names:
print('Hello, {}'.format(name))
greet_all(['python', 'java', 'C++']) # type checker will be fine with this
greet_all('matlab') # this will actually run fine, but type checker will raise an error
Packages:
- typing: built-in library containing generics, unions, etc.
- mypy: linter doing static type checking
- pyAnnotate: automatically assign types to most of your functions/methods based on runtime
Web frameworks
- Django2: includes the most features, but also forces you to do things their way
- Pyramid: intermediate option
- Flask: Bare-bones web framework, but many extensions available (see the minimal sketch below)
There are also many, many libraries to interact with databases, but you will have to google those yourself.
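A minimal Flask sketch (the route and message are just placeholders):
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello, World!'

if __name__ == '__main__':
    app.run(debug=True)    # serves on http://127.0.0.1:5000 by default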
Quick mentions
- trimesh: Triangular mesh algorithms
- Pillow: Read/write/manipulate a wide variety of images (png, jpg, tiff, etc.)
- psychopy: equivalent of psychtoolbox (workshop coming up in April in Nottingham)
- Sphinx: documentation generator
Built-in libraries
- functools: caching, decorators, and support for functional programming
- json/ipaddress/xml: parsing/writing
- logging: log your output to stdout or a file (more flexible than print statements)
- multiprocessing: run python code in parallel
- os/sys: Miscellaneous operating system interfaces
- os.path/pathlib: utilities to deal with filesystem paths (latter provides an object-oriented interface)
- pickle: Store/load any python object
- subprocess: call shell commands
- warnings: tell people they are not using your code properly
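A couple of these in action, as a minimal sketch (the cached Fibonacci function and the glob pattern are just illustrations):
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=None)
def fibonacci(n):
    # results are cached, so the recursion stays fast
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(30))
print(list(Path('.').glob('*.py')))    # all python files in the current directory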
import this