diff --git a/talks/virtual_intro/data/2d_array.txt b/talks/virtual_intro/data/2d_array.txt new file mode 100644 index 0000000000000000000000000000000000000000..923f35ecad43a521cf1de5bbbe73e2fbbe265b4d --- /dev/null +++ b/talks/virtual_intro/data/2d_array.txt @@ -0,0 +1,13 @@ +% The purpose of this comment is purely +% to make life difficult for you. Sorry +% about that. I'm not really sorry. +128.3 100.8 67.2 120.1 150.2 53.0 64.2 139.3 46.7 118.1 125.8 153.1 83.2 115.9 3.4 126.3 92.2 104.7 131.2 29.3 + 89.3 118.8 119.2 80.1 121.2 35.0 66.2 153.3 43.7 102.1 147.8 160.1 94.2 140.9 70.4 124.3 124.2 93.7 100.2 32.3 + 91.3 146.8 137.2 129.1 157.2 -22.0 75.2 90.3 33.7 86.1 74.8 90.1 114.2 81.9 72.4 135.3 85.2 102.7 58.2 55.3 + 75.3 87.8 84.2 112.1 131.2 57.0 150.2 73.3 41.7 110.1 78.8 102.1 116.2 134.9 34.4 74.3 150.2 52.7 89.2 21.3 + 79.3 83.8 78.2 103.1 96.2 26.0 151.2 122.3 46.7 113.1 143.8 80.1 61.2 136.9 8.4 94.3 76.2 123.7 150.2 10.3 + 93.3 157.8 154.2 79.1 155.2 8.0 63.2 120.3 12.7 130.1 105.8 111.1 66.2 99.9 54.4 120.3 113.2 128.7 110.2 46.3 +111.3 126.8 65.2 90.1 101.2 36.0 116.2 139.3 36.7 91.1 88.8 94.1 90.2 92.9 -0.6 46.3 80.2 105.7 88.2 -11.7 +149.3 119.8 134.2 124.1 89.2 21.0 133.2 113.3 28.7 139.1 136.8 82.1 141.2 90.9 11.4 97.3 109.2 137.7 120.2 32.3 +141.3 149.8 115.2 115.1 105.2 51.0 77.2 130.3 -4.3 124.1 141.8 162.1 95.2 130.9 -10.6 114.3 132.2 134.7 79.2 69.3 +141.3 77.8 135.2 167.1 103.2 55.0 153.2 68.3 33.7 136.1 125.8 85.1 148.2 114.9 76.4 57.3 147.2 125.7 153.2 45.3 diff --git a/talks/virtual_intro/data/comma_separated.txt b/talks/virtual_intro/data/comma_separated.txt new file mode 100644 index 0000000000000000000000000000000000000000..9aa1ff615e7cce6765f308a9e2ae05fcab1a2fd9 --- /dev/null +++ b/talks/virtual_intro/data/comma_separated.txt @@ -0,0 +1,10 @@ +13,7,3,18,15,3,12,11,9,9 +13,1,7,17,16,13,18,9,18,6 +9,19,16,3,18,3,19,12,9,6 +2,11,6,12,2,11,15,9,3,9 +2,12,1,7,4,3,6,6,2,4 +10,8,14,1,17,19,8,19,2,9 +18,14,2,11,17,14,6,16,14,18 
+6,8,13,16,11,17,16,5,16,15 +6,5,4,18,6,14,19,8,4,15 +15,17,12,10,17,12,5,9,18,6 diff --git a/talks/virtual_intro/data/file.txt b/talks/virtual_intro/data/file.txt new file mode 100644 index 0000000000000000000000000000000000000000..e1c854300eae248a864f732f5308a22c58650028 --- /dev/null +++ b/talks/virtual_intro/data/file.txt @@ -0,0 +1 @@ +Congratulations, you have read your first file from Python! diff --git a/talks/virtual_intro/data/space_separated.txt b/talks/virtual_intro/data/space_separated.txt new file mode 100644 index 0000000000000000000000000000000000000000..f5fce0dafc3a5ac00aaecf1544e197a2f863eb1f --- /dev/null +++ b/talks/virtual_intro/data/space_separated.txt @@ -0,0 +1,20 @@ +1 3 +14 3 +4 7 +12 5 +15 5 +14 4 +15 19 +7 17 +7 15 +16 3 +15 14 +17 9 +7 8 +4 5 +15 4 +11 5 +17 10 +3 19 +14 4 +18 3 diff --git a/talks/virtual_intro/intro.ipynb b/talks/virtual_intro/intro.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..9e6552da9bfdc2b67944af7256449cbc7e48a4d6 --- /dev/null +++ b/talks/virtual_intro/intro.ipynb @@ -0,0 +1,1412 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Welcome to the WIN Virtual Mini PyTreat 2020!\n", + "\n", + "\n", + "This notebook is available at:\n", + "\n", + "\n", + "https://git.fmrib.ox.ac.uk/fsl/pytreat-practicals-2020/-/tree/master/talks%2Fvirtual_intro/intro.ipynb\n", + "\n", + "\n", + "If you have FSL installed and you'd like to follow along *interactively*,\n", + "follow the instructions for attendees in the `README.md` file of the above\n", + "repository, and then open the `talks/virtual_intro/intro.ipynb` notebook.\n", + "\n", + "\n", + "# Contents\n", + "\n", + "\n", + "* [Introduction](#introduction)\n", + " * [Python in a nutshell](#python-in-a-nutshell)\n", + " * [Different ways of running Python](#different-ways-of-running-python)\n", + "* [Variables and basic types](#variables-and-basic-types)\n", + " * [Integer and floating point 
scalars](#integer-and-floating-point-scalars)\n", + " * [Strings](#strings)\n", + " * [Lists and tuples](#lists-and-tuples)\n", + " * [Dictionaries](#dictionaries)\n", + " * [A note on mutability](#a-note-on-mutability)\n", + "* [Flow control](#flow-control)\n", + " * [List comprehensions](#list-comprehensions)\n", + "* [Reading and writing text files](#reading-and-writing-text-files)\n", + " * [Example: processing lesion counts](#example-processing-lesion-counts)\n", + "* [Functions](#functions)\n", + "* [Working with `numpy`](#working-with-numpy)\n", + " * [The Python list versus the `numpy` array](#the-python-list-versus-the-numpy-array)\n", + " * [Creating arrays](#creating-arrays)\n", + " * [Example: reading arrays from text files](#example-reading-arrays-from-text-files)\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"introduction\"></a>\n", + "# Introduction\n", + "\n", + "\n", + "This talk is an attempt to give a whirlwind overview of the Python programming\n", + "language. It is assumed that you have experience with another programming\n", + "language (e.g. MATLAB).\n", + "\n", + "\n", + "This talk is presented as an interactive [Jupyter\n", + "Notebook](https://jupyter.org/) - you can run all of the code on your own\n", + "machine - click on a code block, and press **SHIFT+ENTER**. 
You can also \"run\"\n", + "the text sections, so you can just move down the document by pressing\n", + "**SHIFT+ENTER**.\n", + "\n", + "\n", + "It is also possible to *change* the contents of each code block (these pages\n", + "are completely interactive) so do experiment with the code you see and try\n", + "some variations!\n", + "\n", + "\n", + "You can get help on any Python object, function, or method by putting a `?`\n", + "before or after the thing you want help on:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 'hello!'\n", + "?a.upper" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And you can explore the available methods on a Python object by using the\n", + "**TAB** key:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Put the cursor after the dot, and press the TAB key...\n", + "a." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"python-in-a-nutshell\"></a>\n", + "## Python in a nutshell\n", + "\n", + "\n", + "**Pros**\n", + "\n", + "\n", + "* _Flexible_ Feel free to use functions, classes, objects, modules and\n", + " packages. Or don't - it's up to you!\n", + "\n", + "* _Fast_ If you do things right (in other words, if you use `numpy`)\n", + "\n", + "* _Dynamically typed_ No need to declare your variables, or specify their\n", + " types.\n", + "\n", + "* _Intuitive syntax_ How do I run some code for each of the elements in my\n", + " list?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "mylist = [1, 2, 3, 4, 5]\n", + "\n", + "for element in mylist:\n", + " print(element)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Cons**\n", + "\n", + "\n", + "* _Dynamically typed_ Easier to make mistakes, harder to catch them\n", + "\n", + "* _No compiler_ See above\n", + "\n", + "* _Slow_ if you don't do things the right way\n", + "\n", + "* _Python 2 is not the same as Python 3_ But there's an easy solution: Forget\n", + " that Python 2 exists.\n", + "\n", + "* _Hard to manage different versions of python_ But we have a solution for\n", + " you: `fslpython`.\n", + "\n", + "\n", + "Python is a widely used language, so you can get lots of help through google\n", + "and [stackoverflow](https://stackoverflow.com). But make sure that the\n", + "information you find is for **Python 3**, and **not** for **Python 2**!\n", + "Python 2 is obsolete, but is still used by many organisations, so you will\n", + "inevitably come across many Python 2 resources.\n", + "\n", + "\n", + "The differences between Python 2 and 3 are small, but important. 
The most\n", + "visible difference is in the `print` function: in Python 3, we write\n", + "`print('hello!')`, but in Python 2, we would write `print 'hello!'`.\n", + "\n", + "\n", + "FSL 5.0.10 and newer comes with its own version of Python, bundled with nearly\n", + "all of the scientific libraries that you are likely to need.\n", + "\n", + "\n", + "So if you use `fslpython` for all of your development, you can be sure that it\n", + "will work in FSL!\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"different-ways-of-running-python\"></a>\n", + "## Different ways of running Python\n", + "\n", + "\n", + "Many of the Pytreat talks and practicals are presented as *Jupyter notebooks*,\n", + "which is a way of running python code in a web browser.\n", + "\n", + "\n", + "Jupyter notebooks are good for presentations and practicals, and some people\n", + "find them very useful for exploratory data analysis. But they're not the only\n", + "way of running Python code.\n", + "\n", + "\n", + "**Run Python from a file**\n", + "\n", + "\n", + "This works just like it does in MATLAB:\n", + "\n", + "\n", + "1. Put your code in a `.py` file (e.g. `mycode.py`).\n", + "2. Run `fslpython mycode.py` in a terminal.\n", + "3. ??\n", + "4. Profit.\n", + "\n", + "\n", + "**Run python in an interpreter**\n", + "\n", + "\n", + "Python is an [*interpreted\n", + "language*](https://en.wikipedia.org/wiki/Interpreted_language), like MATLAB.\n", + "So you can either write your code into a file, and then run that file, or you\n", + "can type code directly into a Python interpreter.\n", + "\n", + "\n", + "Python has a standard interpreter built-in - run `fslpython` in a terminal,\n", + "and see what happens (use CTRL+D to exit).\n", + "\n", + "\n", + "**But** there is another interpreter called [IPython](https://ipython.org/)\n", + "which is vastly superior to the standard Python interpreter. Use IPython\n", + "instead! 
It is already installed in `fslpython`, so if you want to do some\n", + "interactive work, you can use `fslipython` in a terminal.\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"variables-and-basic-types\"></a>\n", + "# Variables and basic types\n", + "\n", + "\n", + "There are many different types of values in Python. Python *variables* do not\n", + "have a type though - a variable can refer to values of any type, and a\n", + "variable can be updated to refer to different values (of different\n", + "types). This is just like how things work in MATLAB.\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"integer-and-floating-point-scalars\"></a>\n", + "## Integer and floating point scalars" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 7\n", + "b = 1 / 3\n", + "c = a + b\n", + "print('a: ', a)\n", + "print('b: ', b)\n", + "print('c: ', c)\n", + "print('b: {:0.4f}'.format(b))\n", + "print('a + b:', a + b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"strings\"></a>\n", + "## Strings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 'Hello'\n", + "b = \"Kitty\"\n", + "c = '''\n", + "Magic\n", + "multi-line\n", + "strings!\n", + "'''\n", + "\n", + "print(a, b)\n", + "print(a + b)\n", + "print('{}, {}!'.format(a, b))\n", + "print(c)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "String objects have a number of useful methods:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'This is a Test String'\n", + "print(s.upper())\n", + "print(s.lower())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another useful method is:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'This is a Test 
String'\n", + "s2 = s.replace('Test', 'Better')\n", + "print(s2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Two common and convenient string methods are `strip()` and `split()`. The\n", + "first will remove any whitespace at the beginning and end of a string:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s2 = ' A very spacy string '\n", + "print('*' + s2 + '*')\n", + "print('*' + s2.strip() + '*')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With `split()` we can tokenize a string (to turn it into a list of strings)\n", + "like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(s.split())\n", + "print(s2.split())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also use the `join` method to re-construct a new string. Imagine that\n", + "we need to reformat some data from being comma-separated to being\n", + "space-separated:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = ' 1,2,3,4,5,6,7 '" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`strip`, `split` and `join` make this job trivial:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('Original: {}'.format(data))\n", + "print('Strip, split, and join: {}'.format(' '.join(data.strip().split(','))))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"lists-and-tuples\"></a>\n", + "## Lists and tuples\n", + "\n", + "\n", + "Both tuples and lists are built-in Python types and are like cell-arrays in\n", + "MATLAB. 
For numerical vectors and arrays it is much better to use *numpy*\n", + "arrays, which are covered later.\n", + "\n", + "\n", + "Tuples are defined using round brackets and lists are defined using square\n", + "brackets. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = (3, 7.6, 'str')\n", + "l = [1, 'mj', -5.4]\n", + "print(t)\n", + "print(l)\n", + "\n", + "t2 = (t, l)\n", + "l2 = [t, l]\n", + "print('t2 is: ', t2)\n", + "print('l2 is: ', l2)\n", + "print(len(t2))\n", + "print(len(l2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The key difference between lists and tuples is that tuples are *immutable*\n", + "(once created, they cannot be changed), whereas lists are *mutable*:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [10, 20, 30]\n", + "a[2] = 999\n", + "print(a)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Square brackets are used to index tuples, lists, strings, dictionaries, etc.\n", + "For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = [10, 20, 30]\n", + "print(d[1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **MATLAB pitfall:** Python uses zero-based indexing, unlike MATLAB, where\n", + "> indices start from 1." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [10, 20, 30, 40, 50, 60]\n", + "print(a[0])\n", + "print(a[2])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A range of values for the indices can be specified to extract values from a\n", + "list or tuple using the `:` character. 
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(a[0:3])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **MATLAB pitfall:** Note that Python's slicing syntax is different from\n", + "> MATLAB in that the second number is *exclusive*, i.e. `a[0:3]` gives us the\n", + "> elements of `a` at positions `0`, `1` and `2` , but *not* at position `3`.\n", + "\n", + "\n", + "When slicing a list or tuple, you can leave the start and end values out -\n", + "when you do this, Python will assume that you want to start slicing from the\n", + "beginning or the end of the list. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(a[:3])\n", + "print(a[1:])\n", + "print(a[:])\n", + "print(a[:-1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also change the step size, which is specified by the third value (not\n", + "the second one, as in MATLAB). For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(a[0:4:2])\n", + "print(a[::2])\n", + "print(a[::-1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Some methods are available on `list` objects for adding and removing items:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(d)\n", + "d.append(40)\n", + "print(d)\n", + "d.extend([50, 60])\n", + "print(d)\n", + "d = d + [70, 80]\n", + "print(d)\n", + "d.remove(20)\n", + "print(d)\n", + "d.pop(0)\n", + "print(d)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What will `d.append([50,60])` do, and how is it different from\n", + "`d.extend([50,60])`?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d.append([50, 60])\n", + "print(d)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"dictionaries\"></a>\n", + "## Dictionaries\n", + "\n", + "\n", + "Dictionaries (or *dicts*) can be used to store key-value pairs. Almost\n", + "anything can be used as a key, and anything can be stored as a value; it is\n", + "common to use strings as keys:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "e = {'a' : 10, 'b': 20}\n", + "print(len(e))\n", + "print(e.keys())\n", + "print(e.values())\n", + "print(e['a'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Like lists (and unlike tuples), dicts are mutable, and have a number of\n", + "methods for manipulating them:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "e['c'] = 30\n", + "e.pop('a')\n", + "e.update({'a' : 100, 'd' : 400})\n", + "print(e)\n", + "e.clear()\n", + "print(e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"a-note-on-mutability\"></a>\n", + "## A note on mutability\n", + "\n", + "\n", + "Python variables can refer to values which are either mutable, or\n", + "immutable. Examples of immutable values are strings, tuples, and integer and\n", + "floating point scalars. Examples of mutable values are lists, dicts, and most\n", + "user-defined types.\n", + "\n", + "\n", + "When you pass an immutable value around (e.g. 
into a function, or to another\n", + "variable), it works the same as if you were to copy the value and pass in the\n", + "copy - the original value is not changed:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 'abcde'\n", + "b = a\n", + "b = b.upper()\n", + "print('a:', a)\n", + "print('b:', b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In contrast, when you pass a mutable value around, you are passing a\n", + "*reference* to that value - there is only ever one value in existence, but\n", + "multiple variables refer to it. You can manipulate the value through any of\n", + "the variables that refer to it:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [1, 2, 3, 4, 5]\n", + "b = a\n", + "\n", + "a[3] = 999\n", + "b.append(6)\n", + "\n", + "print('a', a)\n", + "print('b', b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"flow-control\"></a>\n", + "# Flow control\n", + "\n", + "\n", + "Python also has a boolean type which can be either `True` or `False`. 
Most\n", + "Python types can be implicitly converted into booleans when used in a\n", + "conditional expression.\n", + "\n", + "\n", + "Relevant boolean and comparison operators include: `not`, `and`, `or`, `==`\n", + "and `!=`\n", + "\n", + "\n", + "For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = True\n", + "b = False\n", + "print('Not a is:', not a)\n", + "print('a or b is:', a or b)\n", + "print('a and b is:', a and b)\n", + "print('Not 1 is:', not 1)\n", + "print('Not 0 is:', not 0)\n", + "print('Not {} is:', not {})\n", + "print('{}==0 is:', {}==0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There is also the `in` test for strings, lists, etc:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('the' in 'a number of words')\n", + "print('of' in 'a number of words')\n", + "print(3 in [1, 2, 3, 4])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use boolean values in `if`-`else` conditional expressions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [1, 2, 3, 4]\n", + "val = 3\n", + "if val in a:\n", + " print('Found {}!'.format(val))\n", + "else:\n", + " print('{} not found :('.format(val))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the indentation in the `if`-`else` statement is **crucial**.\n", + "**All** python control blocks are delineated purely by indentation. 
We\n", + "recommend using **four spaces** and no tabs, as this is a standard practice\n", + "and will help a lot when collaborating with others.\n", + "\n", + "\n", + "You can use the `for` statement to loop over elements in a list:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = [10, 20, 30]\n", + "for x in d:\n", + " print(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also loop over the key-value pairs in a dict:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = {'a' : 10, 'b' : 20, 'c' : 30}\n", + "print('a.items()')\n", + "for key, val in a.items():\n", + " print(key, val)\n", + "print('a.keys()')\n", + "for key in a.keys():\n", + " print(key, a[key])\n", + "print('a.values()')\n", + "for val in a.values():\n", + " print(val)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> In older versions of Python 3, there was no guarantee of ordering when using dictionaries.\n", + "> However, as of Python 3.7, dictionaries will remember the order in which items are inserted,\n", + "> and the `keys()`, `values()`, and `items()` methods will return elements in that order.\n", + ">\n", + "\n", + "> If you want a dictionary with ordering, *and* you want your code to work with\n", + "> Python versions older than 3.7, you can use the\n", + "> [`OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict)\n", + "> class.\n", + "\n", + "\n", + "There are some handy built-in functions that you can use with `for` loops:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = [10, 20, 30]\n", + "print('Using the range function')\n", + "for i in range(len(d)):\n", + " print('element at position {}: {}'.format(i, d[i]))\n", + "\n", + "print('Using the enumerate function')\n", + "for i, 
elem in enumerate(d):\n", + " print('element at position {}: {}'.format(i, elem))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"list-comprehensions\"></a>\n", + "## List comprehensions\n", + "\n", + "\n", + "Python has a really neat way to create lists (and dicts), called\n", + "*comprehensions*. Let's say we have some strings, and we want to count the\n", + "number of characters in each of them:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "strings = ['hello', 'howdy', 'hi', 'hey']\n", + "nchars = [len(s) for s in strings]\n", + "for s, c in zip(strings, nchars):\n", + " print('{}: {}'.format(s, c))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> The `zip` function \"zips\" two or more sequences, so you can loop over them\n", + "> together.\n", + "\n", + "\n", + "Or we could store the character counts in a dict:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nchars = { s : len(s) for s in strings }\n", + "\n", + "for s, c in nchars.items():\n", + " print('{}: {}'.format(s, c))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"reading-and-writing-text-files\"></a>\n", + "# Reading and writing text files\n", + "\n", + "\n", + "The syntax to open a file in python is\n", + "`with open(<filename>, <mode>) as <file_object>: <block of code>`, where\n", + "\n", + "\n", + "* `filename` is a string with the name of the file\n", + "* `mode` is one of 'r' (for read-only access), 'w' (for writing a file, this\n", + " wipes out any existing content), 'a' (for appending to an existing file).\n", + "* `file_object` is a variable name which will be used within the `block of\n", + " code` to access the opened file.\n", + "\n", + "\n", + "For example the following will read all the text in `data/file.txt` and print
print\n", + "it:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('data/file.txt', 'r') as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A very similar syntax is used to write files:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('new_file.txt', 'w') as f:\n", + " f.write('This is my first line\\n')\n", + " f.writelines(['Second line\\n', 'and the third\\n'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"example-processing-lesion-counts\"></a>\n", + "## Example: processing lesion counts\n", + "\n", + "\n", + "Imagine that we have written an amazing algorithm in Python which\n", + "automatically counts the number of lesions in an individual's structural MRI\n", + "image." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "subject_ids = ['01', '07', '21', '32']\n", + "lesion_counts = [ 4, 9, 13, 2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We may wish to process this data in another application (e.g. Excel or SPSS).\n", + "Let's save the results out to a CSV (comma-separated value) file:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('lesion_counts.csv', 'w') as f:\n", + " f.write('Subject ID, Lesion count\\n')\n", + " for subj_id, count in zip(subject_ids, lesion_counts):\n", + " f.write('{}, {}\\n'.format(subj_id, count))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now load the `lesion_counts.csv` file into our analysis software of\n", + "choice. 
Or we could load it back into another Python session, and store\n", + "the data in a dict:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "lesion_counts = {}\n", + "\n", + "with open('lesion_counts.csv', 'r') as f:\n", + " # skip the header\n", + " f.readline()\n", + " for line in f.readlines():\n", + " subj_id, count = line.split(',')\n", + " lesion_counts[subj_id] = int(count)\n", + "\n", + "print('Loaded lesion counts:')\n", + "for subj, count in lesion_counts.items():\n", + " print('{}: {}'.format(subj, count))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"functions\"></a>\n", + "# Functions\n", + "\n", + "\n", + "You will find functions pretty familiar in python to start with, although they\n", + "have a few options which are really handy and different from C++ or matlab (to\n", + "be covered in a later practical). To start with we'll look at a simple\n", + "function but note a few key points:\n", + "\n", + "\n", + "* you *must* indent everything inside the function (it is a code block and\n", + " indentation is the only way of determining this - just like for the guts of a\n", + " loop)\n", + "* you can return *whatever you want* from a python function, but only a single\n", + " object - it is usual to package up multiple things in a tuple or list, which\n", + " is easily unpacked by the calling invocation: e.g., `a, b, c = myfunc(x)`\n", + "* parameters are passed by *reference* (more on this below)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def myfunc(x, y, z=0):\n", + " r2 = x*x + y*y + z*z\n", + " r = r2**0.5\n", + " return r, r2\n", + "\n", + "rad = myfunc(10, 20)\n", + "print(rad)\n", + "rad, dummy = myfunc(10, 20, 30)\n", + "print(rad)\n", + "rad, _ = myfunc(10,20,30)\n", + "print(rad)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + 
"> Note that the `_` is used as shorthand here for a dummy variable\n", + "> that you want to throw away.\n", + ">\n", + "> The return statement implicitly creates a tuple to return and is equivalent\n", + "> to `return (r, r2)`\n", + "\n", + "\n", + "One nice feature of python functions is that you can name the arguments when\n", + "you call them, rather than only doing it by position. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def myfunc(x, y, z=0, flag=''):\n", + " if flag=='L1':\n", + " r = abs(x) + abs(y) + abs(z)\n", + " else:\n", + " r = (x*x + y*y + z*z)**0.5\n", + " return r\n", + "\n", + "rA = myfunc(10, 20)\n", + "rB = myfunc(10, 20, flag='L1')\n", + "rC = myfunc(10, 20, flag='L1', z=30)\n", + "print(rA, rB, rC)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You will often see python functions called with these named arguments. In\n", + "fact, for functions with more than 2 or 3 variables this naming of arguments\n", + "is recommended, because it clarifies what each of the arguments does for\n", + "anyone reading the code.\n", + "\n", + "\n", + "Arguments passed into a python function are *passed by reference* - this is\n", + "where the difference between *mutable* and *immutable* types becomes\n", + "important - if you pass a mutable object into a function, the function\n", + "might change it!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def changelist(l):\n", + " l[0] = 'mwahahaha!'\n", + "\n", + "mylist = [1,2,3,4,5]\n", + "\n", + "print('before:', mylist)\n", + "changelist(mylist)\n", + "print('after:', mylist)\n", + "\n", + "mytup = (1,2,3,4,5)\n", + "changelist(mytup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"working-with-numpy\"></a>\n", + "# Working with `numpy`\n", + "\n", + "\n", + "This section introduces you to [`numpy`](http://www.numpy.org/), Python's\n", + "numerical computing library. Numpy adds a new data type to the Python\n", + "language - the `array` (more specifically, the `ndarray`). A Numpy `array`\n", + "is an N-dimensional array of homogeneously-typed numerical data.\n", + "\n", + "\n", + "Pretty much every scientific computing library in Python is built on top of\n", + "Numpy - whenever you want to access some data, you will be accessing it in the\n", + "form of a Numpy array. 
So it is worth getting to know the basics.\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"the-python-list-versus-the-numpy-array\"></a>\n", + "## The Python list versus the `numpy` array\n", + "\n", + "\n", + "You have already been introduced to the Python `list`, which you can easily\n", + "use to store a handful of numbers (or anything else):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = [10, 8, 12, 14, 7, 6, 11]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You could also emulate a 2D or ND matrix by using lists of lists, for example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "xyz_coords = [[-11.4, 1.0, 22.6],\n", + " [ 22.7, -32.8, 19.1],\n", + " [ 62.8, -18.2, -34.5]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For simple tasks, you could stick with processing your data using python\n", + "lists, and the built-in\n", + "[`math`](https://docs.python.org/3.5/library/math.html) library. And this\n", + "might be tempting, because it does look quite a lot like what you might type\n", + "into Matlab.\n", + "\n", + "\n", + "But **BEWARE!** A Python list is a terrible data structure for scientific\n", + "computing!\n", + "\n", + "\n", + "This is a major source of confusion for people who are learning Python, and\n", + "are trying to write efficient code. 
It is _crucial_ to be able to distinguish\n", + "between a Python list and a Numpy array.\n", + "\n", + "\n", + "**Python list == Matlab cell array:** A list in Python is akin to a cell\n", + "array in Matlab - they can store anything, but are extremely inefficient, and\n", + "unwieldy when you have more than a couple of dimensions.\n", + "\n", + "\n", + "**Numpy array == Matlab matrix:** These are in contrast to the Numpy array\n", + "and Matlab matrix, which are both thin wrappers around a contiguous chunk of\n", + "memory, and which provide blazing-fast performance (because behind the scenes\n", + "in both Numpy and Matlab, it's C, C++ and FORTRAN all the way down).\n", + "\n", + "\n", + "So you should strongly consider turning those lists into Numpy arrays:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "data = np.array([10, 8, 12, 14, 7, 6, 11])\n", + "\n", + "xyz_coords = np.array([[-11.4, 1.0, 22.6],\n", + " [ 22.7, -32.8, 19.1],\n", + " [ 62.8, -18.2, -34.5]])\n", + "print('data: ', data)\n", + "print('xyz_coords: ', xyz_coords)\n", + "print('data.shape: ', data.shape)\n", + "print('xyz_coords.shape:', xyz_coords.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Numpy is not a \"built-in\" library, so we have to import it. 
The statement\n", + "> `import numpy as np` tells Python to *Import the `numpy` library, and make\n", + "> it available as a variable called `np`.*\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"creating-arrays\"></a>\n", + "## Creating arrays\n", + "\n", + "\n", + "Numpy has quite a few functions which behave similarly to their equivalents in\n", + "Matlab:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('np.zeros gives us zeros: ', np.zeros(5))\n", + "print('np.ones gives us ones: ', np.ones(5))\n", + "print('np.arange gives us a range: ', np.arange(5))\n", + "print('np.linspace gives us N linearly spaced numbers:', np.linspace(0, 1, 5))\n", + "print('np.random.random gives us random numbers [0-1]:', np.random.random(5))\n", + "print('np.random.randint gives us random integers: ', np.random.randint(1, 10, 5))\n", + "print('np.eye gives us an identity matrix:')\n", + "print(np.eye(4))\n", + "print('np.diag gives us a diagonal matrix:')\n", + "print(np.diag([1, 2, 3, 4]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `zeros` and `ones` functions can also be used to generate N-dimensional\n", + "arrays:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = np.zeros((3, 4))\n", + "o = np.ones((2, 10))\n", + "print(z)\n", + "print(o)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Note that, in a 2D Numpy array, the first axis corresponds to rows, and the\n", + "> second to columns - just like in Matlab.\n", + "\n", + "\n", + "\n", + "> **MATLAB pitfall:** Arithmetic operations on arrays in Numpy work on an\n", + "> *elementwise* basis. In particular, if you multiply two arrays together,\n", + "> you will get the elementwise product. You **won't** get the dot product,\n", + "> like you would in MATLAB. 
You can, however, use the `@` operator to perform\n", + "> matrix multiplication on numpy arrays.\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"example-reading-arrays-from-text-files\"></a>\n", + "## Example: reading arrays from text files\n", + "\n", + "\n", + "The `numpy.loadtxt` function is capable of loading numerical data from\n", + "plain-text files. By default it expects space-separated data:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = np.loadtxt('data/space_separated.txt')\n", + "print('data in data/space_separated.txt:')\n", + "print(data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But you can also specify the delimiter to expect<sup>1</sup>:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = np.loadtxt('data/comma_separated.txt', delimiter=',')\n", + "print('data in data/comma_separated.txt:')\n", + "print(data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> <sup>1</sup> And many other things such as file headers, footers, comments,\n", + "> and newline characters - see the\n", + "> [docs](https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html)\n", + "> for more information.\n", + "\n", + "\n", + "Of course you can also save data out to a text file just as easily, with\n", + "[`numpy.savetxt`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = np.random.randint(1, 10, (10, 10))\n", + "np.savetxt('mydata.txt', data, delimiter=',', fmt='%i')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Jupyter notebooks have a special feature - if you start a line with a `!`\n", + "character, you can run a `bash` command. 
Let's look at the file we just\n", + "generated:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!cat mydata.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> The `!` feature won't work in regular Python scripts.\n", + "\n", + "\n", + "Here's how we can load a 2D array from a file, and calculate the mean of each\n", + "column:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = np.loadtxt('data/2d_array.txt', comments='%')\n", + "colmeans = data.mean(axis=0)\n", + "\n", + "print('Column means')\n", + "print('\\n'.join(['{}: {:0.2f}'.format(i, m) for i, m in enumerate(colmeans)]))" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/talks/virtual_intro/intro.md b/talks/virtual_intro/intro.md new file mode 100644 index 0000000000000000000000000000000000000000..ae6feba4a44b1df274a62eb48dbcee07aa3b648e --- /dev/null +++ b/talks/virtual_intro/intro.md @@ -0,0 +1,1001 @@ +# Welcome to the WIN Virtual Mini PyTreat 2020! + + +This notebook is available at: + + +https://git.fmrib.ox.ac.uk/fsl/pytreat-practicals-2020/-/tree/master/talks%2Fvirtual_intro/intro.ipynb + + +If you have FSL installed and you'd like to follow along *interactively*, +follow the instructions for attendees in the `README.md` file of the above +repository, and then open the `talks/virtual_intro/intro.ipynb` notebook. 
+ + +# Contents + + +* [Introduction](#introduction) + * [Python in a nutshell](#python-in-a-nutshell) + * [Different ways of running Python](#different-ways-of-running-python) +* [Variables and basic types](#variables-and-basic-types) + * [Integer and floating point scalars](#integer-and-floating-point-scalars) + * [Strings](#strings) + * [Lists and tuples](#lists-and-tuples) + * [Dictionaries](#dictionaries) + * [A note on mutability](#a-note-on-mutability) +* [Flow control](#flow-control) + * [List comprehensions](#list-comprehensions) +* [Reading and writing text files](#reading-and-writing-text-files) + * [Example: processing lesion counts](#example-processing-lesion-counts) +* [Functions](#functions) +* [Working with `numpy`](#working-with-numpy) + * [The Python list versus the `numpy` array](#the-python-list-versus-the-numpy-array) + * [Creating arrays](#creating-arrays) + * [Example: reading arrays from text files](#example-reading-arrays-from-text-files) + + +<a class="anchor" id="introduction"></a> +# Introduction + + +This talk is an attempt to give a whirlwind overview of the Python programming +language. It is assumed that you have experience with another programming +language (e.g. MATLAB). + + +This talk is presented as an interactive [Jupyter +Notebook](https://jupyter.org/) - you can run all of the code on your own +machine - click on a code block, and press **SHIFT+ENTER**. You can also "run" +the text sections, so you can just move down the document by pressing +**SHIFT+ENTER**. + + +It is also possible to *change* the contents of each code block (these pages +are completely interactive) so do experiment with the code you see and try +some variations! + + +You can get help on any Python object, function, or method by putting a `?` +before or after the thing you want help on: + + +``` +a = 'hello!' 
+?a.upper +``` + + +And you can explore the available methods on a Python object by using the +**TAB** key: + + +``` +# Put the cursor after the dot, and press the TAB key... +a. +``` + + +<a class="anchor" id="python-in-a-nutshell"></a> +## Python in a nutshell + + +**Pros** + + +* _Flexible_ Feel free to use functions, classes, objects, modules and + packages. Or don't - it's up to you! + +* _Fast_ If you do things right (in other words, if you use `numpy`) + +* _Dynamically typed_ No need to declare your variables, or specify their + types. + +* _Intuitive syntax_ How do I run some code for each of the elements in my + list? + + +``` +mylist = [1, 2, 3, 4, 5] + +for element in mylist: + print(element) +``` + + +**Cons** + + +* _Dynamically typed_ Easier to make mistakes, harder to catch them + +* _No compiler_ See above + +* _Slow_ if you don't do things the right way + +* _Python 2 is not the same as Python 3_ But there's an easy solution: Forget + that Python 2 exists. + +* _Hard to manage different versions of python_ But we have a solution for + you: `fslpython`. + + +Python is a widely used language, so you can get lots of help through google +and [stackoverflow](https://stackoverflow.com). But make sure that the +information you find is for **Python 3**, and **not** for **Python 2**! +Python 2 is obsolete, but is still used by many organisations, so you will +inevitably come across many Python 2 resources. + + +The differences between Python 2 and 3 are small, but important. The most +visible difference is in the `print` function: in Python 3, we write +`print('hello!')`, but in Python 2, we would write `print 'hello!'`. + + +FSL 5.0.10 and newer comes with its own version of Python, bundled with nearly +all of the scientific libraries that you are likely to need. + + +So if you use `fslpython` for all of your development, you can be sure that it +will work in FSL! 
+ + +<a class="anchor" id="different-ways-of-running-python"></a> +## Different ways of running Python + + +Many of the Pytreat talks and practicals are presented as *Jupyter notebooks*, +which is a way of running python code in a web browser. + + +Jupyter notebooks are good for presentations and practicals, and some people +find them very useful for exploratory data analysis. But they're not the only +way of running Python code. + + +**Run Python from a file** + + +This works just like it does in MATLAB: + + +1. Put your code in a `.py` file (e.g. `mycode.py`). +2. Run `fslpython mycode.py` in a terminal. +3. ?? +4. Profit. + + +**Run python in an interpreter** + + +Python is an [*interpreted +language*](https://en.wikipedia.org/wiki/Interpreted_language), like MATLAB. +So you can either write your code into a file, and then run that file, or you +can type code directly into a Python interpreter. + + +Python has a standard interpreter built-in - run `fslpython` in a terminal, +and see what happens (use CTRL+D to exit). + + +**But** there is another interpreter called [IPython](https://ipython.org/) +which is vastly superior to the standard Python interpreter. Use IPython +instead! It is already installed in `fslpython`, so if you want to do some +interactive work, you can use `fslipython` in a terminal. + + +<a class="anchor" id="variables-and-basic-types"></a> +# Variables and basic types + + +There are many different types of values in Python. Python *variables* do not +have a type though - a variable can refer to values of any type, and a +variable can be updated to refer to different values (of different +types). This is just like how things work in MATLAB. 
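As a minimal sketch of this (the variable name `x` is just for illustration), the same name can be re-bound to values of different types, and the built-in `type` function reports the type of whatever value the name currently refers to:

```python
# A variable is just a name; the type belongs to the value it refers to.
x = 7
print(type(x))    # <class 'int'>

x = 'seven'
print(type(x))    # <class 'str'>

x = [7, 'seven']
print(type(x))    # <class 'list'>
```

Nothing needs to be declared in advance; each assignment simply points `x` at a new value.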
+ + +<a class="anchor" id="integer-and-floating-point-scalars"></a> +## Integer and floating point scalars + +``` +a = 7 +b = 1 / 3 +c = a + b +print('a: ', a) +print('b: ', b) +print('c: ', c) +print('b: {:0.4f}'.format(b)) +print('a + b:', a + b) +``` + + +<a class="anchor" id="strings"></a> +## Strings + + +``` +a = 'Hello' +b = "Kitty" +c = ''' +Magic +multi-line +strings! +''' + +print(a, b) +print(a + b) +print('{}, {}!'.format(a, b)) +print(c) +``` + + +String objects have a number of useful methods: + + +``` +s = 'This is a Test String' +print(s.upper()) +print(s.lower()) +``` + + +Another useful method is: + + +``` +s = 'This is a Test String' +s2 = s.replace('Test', 'Better') +print(s2) +``` + + +Two common and convenient string methods are `strip()` and `split()`. The +first will remove any whitespace at the beginning and end of a string: + + +``` +s2 = ' A very spacy string ' +print('*' + s2 + '*') +print('*' + s2.strip() + '*') +``` + + +With `split()` we can tokenize a string (to turn it into a list of strings) +like this: + + +``` +print(s.split()) +print(s2.split()) +``` + + +We can also use the `join` method to re-construct a new string. Imagine that +we need to reformat some data from being comma-separated to being +space-separated: + + +``` +data = ' 1,2,3,4,5,6,7 ' +``` + + +`strip`, `split` and `join` make this job trivial: + + +``` +print('Original: {}'.format(data)) +print('Strip, split, and join: {}'.format(' '.join(data.strip().split(',')))) +``` + + +<a class="anchor" id="lists-and-tuples"></a> +## Lists and tuples + + +Both tuples and lists are built-in Python types and are like cell-arrays in +MATLAB. For numerical vectors and arrays it is much better to use *numpy* +arrays, which are covered later. + + +Tuples are defined using round brackets and lists are defined using square +brackets. 
For example: + + +``` +t = (3, 7.6, 'str') +l = [1, 'mj', -5.4] +print(t) +print(l) + +t2 = (t, l) +l2 = [t, l] +print('t2 is: ', t2) +print('l2 is: ', l2) +print(len(t2)) +print(len(l2)) +``` + + +The key difference between lists and tuples is that tuples are *immutable* +(once created, they cannot be changed), whereas lists are *mutable*: + + +``` +a = [10, 20, 30] +a[2] = 999 +print(a) +``` + + +Square brackets are used to index tuples, lists, strings, dictionaries, etc. +For example: + + +``` +d = [10, 20, 30] +print(d[1]) +``` + + +> **MATLAB pitfall:** Python uses zero-based indexing, unlike MATLAB, where +> indices start from 1. + + +``` +a = [10, 20, 30, 40, 50, 60] +print(a[0]) +print(a[2]) +``` + + +A range of values for the indices can be specified to extract values from a +list or tuple using the `:` character. For example: + + +``` +print(a[0:3]) +``` + + + +> **MATLAB pitfall:** Note that Python's slicing syntax is different from +> MATLAB in that the second number is *exclusive*, i.e. `a[0:3]` gives us the +> elements of `a` at positions `0`, `1` and `2`, but *not* at position `3`. + + +When slicing a list or tuple, you can leave the start and end values out - +when you do this, Python will assume that you want to slice from the very +beginning, or up to the very end, of the list. For example: + + +``` +print(a[:3]) +print(a[1:]) +print(a[:]) +print(a[:-1]) +``` + + +You can also change the step size, which is specified by the third value (not +the second one, as in MATLAB). For example: + + +``` +print(a[0:4:2]) +print(a[::2]) +print(a[::-1]) +``` + + +Some methods are available on `list` objects for adding and removing items: + + +``` +print(d) +d.append(40) +print(d) +d.extend([50, 60]) +print(d) +d = d + [70, 80] +print(d) +d.remove(20) +print(d) +d.pop(0) +print(d) +``` + + +What will `d.append([50,60])` do, and how is it different from +`d.extend([50,60])`? 
+ + +``` +d.append([50, 60]) +print(d) +``` + + +<a class="anchor" id="dictionaries"></a> +## Dictionaries + + +Dictionaries (or *dicts*) can be used to store key-value pairs. Almost +anything can be used as a key, and anything can be stored as a value; it is +common to use strings as keys: + + +``` +e = {'a' : 10, 'b': 20} +print(len(e)) +print(e.keys()) +print(e.values()) +print(e['a']) +``` + + +Like lists (and unlike tuples), dicts are mutable, and have a number of +methods for manipulating them: + + +``` +e['c'] = 30 +e.pop('a') +e.update({'a' : 100, 'd' : 400}) +print(e) +e.clear() +print(e) +``` + + +<a class="anchor" id="a-note-on-mutability"></a> +## A note on mutability + + +Python variables can refer to values which are either mutable, or +immutable. Examples of immutable values are strings, tuples, and integer and +floating point scalars. Examples of mutable values are lists, dicts, and most +user-defined types. + + +When you pass an immutable value around (e.g. into a function, or to another +variable), it works the same as if you were to copy the value and pass in the +copy - the original value is not changed: + + +``` +a = 'abcde' +b = a +b = b.upper() +print('a:', a) +print('b:', b) +``` + + +In contrast, when you pass a mutable value around, you are passing a +*reference* to that value - there is only ever one value in existence, but +multiple variables refer to it. You can manipulate the value through any of +the variables that refer to it: + + +``` +a = [1, 2, 3, 4, 5] +b = a + +a[3] = 999 +b.append(6) + +print('a', a) +print('b', b) +``` + + +<a class="anchor" id="flow-control"></a> +# Flow control + + +Python also has a boolean type which can be either `True` or `False`. Most +Python types can be implicitly converted into booleans when used in a +conditional expression. 
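To make the implicit conversion concrete, here is a small sketch using the built-in `bool` function (the example values and the `mylist` and `result` names are our own). Zero, `None` and empty containers are treated as `False`; most other values are treated as `True`:

```python
print(bool(0), bool(1))          # False True
print(bool(''), bool('hello'))   # False True
print(bool([]), bool([0]))       # False True
print(bool({}), bool(None))      # False False

# The same conversion happens automatically in a conditional expression
mylist = []
if mylist:
    result = 'not empty'
else:
    result = 'empty'
print(result)                    # empty
```

Note that `[0]` counts as `True`: a list is judged by whether it has any elements, not by what those elements are.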
+ + +Relevant boolean and comparison operators include: `not`, `and`, `or`, `==` +and `!=`. + + +For example: +``` +a = True +b = False +print('Not a is:', not a) +print('a or b is:', a or b) +print('a and b is:', a and b) +print('Not 1 is:', not 1) +print('Not 0 is:', not 0) +print('Not {} is:', not {}) +print('{}==0 is:', {}==0) +``` + + +There is also the `in` test for strings, lists, etc: + + +``` +print('the' in 'a number of words') +print('of' in 'a number of words') +print(3 in [1, 2, 3, 4]) +``` + + +We can use boolean values in `if`-`else` conditional expressions: + + +``` +a = [1, 2, 3, 4] +val = 3 +if val in a: + print('Found {}!'.format(val)) +else: + print('{} not found :('.format(val)) +``` + + +Note that the indentation in the `if`-`else` statement is **crucial**. +**All** python control blocks are delineated purely by indentation. We +recommend using **four spaces** and no tabs, as this is a standard practice +and will help a lot when collaborating with others. + + +You can use the `for` statement to loop over elements in a list: + + +``` +d = [10, 20, 30] +for x in d: + print(x) +``` + + +You can also loop over the key-value pairs in a dict: + + +``` +a = {'a' : 10, 'b' : 20, 'c' : 30} +print('a.items()') +for key, val in a.items(): + print(key, val) +print('a.keys()') +for key in a.keys(): + print(key, a[key]) +print('a.values()') +for val in a.values(): + print(val) +``` + + +> In older versions of Python 3, there was no guarantee of ordering when using dictionaries. +> However, as of Python 3.7, dictionaries will remember the order in which items are inserted, +> and the `keys()`, `values()`, and `items()` methods will return elements in that order. +> + +> If you want a dictionary with ordering, *and* you want your code to work with +> Python versions older than 3.7, you can use the +> [`OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict) +> class. 
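A minimal sketch of the `OrderedDict` approach (the `counts` name and its keys and values are made up for illustration). It is used just like a regular dict, but it guarantees insertion order on every Python 3 version:

```python
from collections import OrderedDict

counts = OrderedDict()
counts['c'] = 30
counts['a'] = 10
counts['b'] = 20

# Items come back in the order they were inserted, not in sorted order
for key, val in counts.items():
    print(key, val)
```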
+ + +There are some handy built-in functions that you can use with `for` loops: + + +``` +d = [10, 20, 30] +print('Using the range function') +for i in range(len(d)): + print('element at position {}: {}'.format(i, d[i])) + +print('Using the enumerate function') +for i, elem in enumerate(d): + print('element at position {}: {}'.format(i, elem)) +``` + + +<a class="anchor" id="list-comprehensions"></a> +## List comprehensions + + +Python has a really neat way to create lists (and dicts), called +*comprehensions*. Let's say we have some strings, and we want to count the +number of characters in each of them: + + +``` +strings = ['hello', 'howdy', 'hi', 'hey'] +nchars = [len(s) for s in strings] +for s, c in zip(strings, nchars): + print('{}: {}'.format(s, c)) +``` + + +> The `zip` function "zips" two or more sequences, so you can loop over them +> together. + + +Or we could store the character counts in a dict: + + +``` +nchars = { s : len(s) for s in strings } + +for s, c in nchars.items(): + print('{}: {}'.format(s, c)) +``` + + +<a class="anchor" id="reading-and-writing-text-files"></a> +# Reading and writing text files + + +The syntax to open a file in python is +`with open(<filename>, <mode>) as <file_object>: <block of code>`, where + + +* `filename` is a string with the name of the file +* `mode` is one of 'r' (for read-only access), 'w' (for writing a file, this + wipes out any existing content), or 'a' (for appending to an existing file). +* `file_object` is a variable name which will be used within the `block of + code` to access the opened file. 
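As a rough sketch of how the three modes differ (the file name `modes_demo.txt` is made up, and the file is written to a temporary directory so that nothing in your working directory is touched):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w' creates the file (or wipes it if it already exists)
with open(path, 'w') as f:
    f.write('first line\n')

# 'a' keeps the existing content and adds to the end
with open(path, 'a') as f:
    f.write('second line\n')

# 'r' opens the file for reading only
with open(path, 'r') as f:
    contents = f.read()

print(contents)
```

The file is closed automatically at the end of each `with` block, so there is no need to call `f.close()` yourself.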
+ + +For example the following will read all the text in `data/file.txt` and print +it: + + +``` +with open('data/file.txt', 'r') as f: + print(f.read()) +``` + + +A very similar syntax is used to write files: + + +``` +with open('new_file.txt', 'w') as f: + f.write('This is my first line\n') + f.writelines(['Second line\n', 'and the third\n']) +``` + + +<a class="anchor" id="example-processing-lesion-counts"></a> +## Example: processing lesion counts + + +Imagine that we have written an amazing algorithm in Python which +automatically counts the number of lesions in an individual's structural MRI +image. + + +``` +subject_ids = ['01', '07', '21', '32'] +lesion_counts = [ 4, 9, 13, 2] +``` + + +We may wish to process this data in another application (e.g. Excel or SPSS). +Let's save the results out to a CSV (comma-separated value) file: + + +``` +with open('lesion_counts.csv', 'w') as f: + f.write('Subject ID, Lesion count\n') + for subj_id, count in zip(subject_ids, lesion_counts): + f.write('{}, {}\n'.format(subj_id, count)) +``` + + +We can now load the `lesion_counts.csv` file into our analysis software of +choice. Or we could load it back into another Python session, and store +the data in a dict: + + +``` +lesion_counts = {} + +with open('lesion_counts.csv', 'r') as f: + # skip the header + f.readline() + for line in f.readlines(): + subj_id, count = line.split(',') + lesion_counts[subj_id] = int(count) + +print('Loaded lesion counts:') +for subj, count in lesion_counts.items(): + print('{}: {}'.format(subj, count)) +``` + + +<a class="anchor" id="functions"></a> +## Functions + + +You will find functions pretty familiar in python to start with, although they +have a few options which are really handy and different from C++ or matlab (to +be covered in a later practical). 
To start with we'll look at a simple +function but note a few key points: + + +* you *must* indent everything inside the function (it is a code block and + indentation is the only way of determining this - just like for the guts of a + loop) +* you can return *whatever you want* from a python function, but only a single + object - it is usual to package up multiple things in a tuple or list, which + is easily unpacked by the calling invocation: e.g., `a, b, c = myfunc(x)` +* parameters are passed by *reference* (more on this below) + + +``` +def myfunc(x, y, z=0): + r2 = x*x + y*y + z*z + r = r2**0.5 + return r, r2 + +rad = myfunc(10, 20) +print(rad) +rad, dummy = myfunc(10, 20, 30) +print(rad) +rad, _ = myfunc(10,20,30) +print(rad) +``` + + +> Note that the `_` is used as shorthand here for a dummy variable +> that you want to throw away. +> +> The return statement implicitly creates a tuple to return and is equivalent +> to `return (r, r2)` + + +One nice feature of python functions is that you can name the arguments when +you call them, rather than only doing it by position. For example: + + +``` +def myfunc(x, y, z=0, flag=''): + if flag=='L1': + r = abs(x) + abs(y) + abs(z) + else: + r = (x*x + y*y + z*z)**0.5 + return r + +rA = myfunc(10, 20) +rB = myfunc(10, 20, flag='L1') +rC = myfunc(10, 20, flag='L1', z=30) +print(rA, rB, rC) +``` + + +You will often see python functions called with these named arguments. In +fact, for functions with more than 2 or 3 variables this naming of arguments +is recommended, because it clarifies what each of the arguments does for +anyone reading the code. + + +Arguments passed into a python function are *passed by reference* - this is +where the difference between *mutable* and *immutable* types becomes +important - if you pass a mutable object into a function, the function +might change it! + + +``` +def changelist(l): + l[0] = 'mwahahaha!' 
+ +mylist = [1,2,3,4,5] + +print('before:', mylist) +changelist(mylist) +print('after:', mylist) + +mytup = (1,2,3,4,5) +changelist(mytup) # raises TypeError: tuples are immutable +``` + + +<a class="anchor" id="working-with-numpy"></a> +# Working with `numpy` + + +This section introduces you to [`numpy`](http://www.numpy.org/), Python's +numerical computing library. Numpy adds a new data type to the Python +language - the `array` (more specifically, the `ndarray`). A Numpy `array` +is an N-dimensional array of homogeneously-typed numerical data. + + +Pretty much every scientific computing library in Python is built on top of +Numpy - whenever you want to access some data, you will be accessing it in the +form of a Numpy array. So it is worth getting to know the basics. + + +<a class="anchor" id="the-python-list-versus-the-numpy-array"></a> +## The Python list versus the `numpy` array + + +You have already been introduced to the Python `list`, which you can easily +use to store a handful of numbers (or anything else): + + +``` +data = [10, 8, 12, 14, 7, 6, 11] +``` + + +You could also emulate a 2D or ND matrix by using lists of lists, for example: + + +``` +xyz_coords = [[-11.4, 1.0, 22.6], + [ 22.7, -32.8, 19.1], + [ 62.8, -18.2, -34.5]] +``` + + +For simple tasks, you could stick with processing your data using python +lists, and the built-in +[`math`](https://docs.python.org/3.5/library/math.html) library. And this +might be tempting, because it does look quite a lot like what you might type +into Matlab. + + +But **BEWARE!** A Python list is a terrible data structure for scientific +computing! + + +This is a major source of confusion for people who are learning Python, and +are trying to write efficient code. It is _crucial_ to be able to distinguish +between a Python list and a Numpy array. + + +**Python list == Matlab cell array:** A list in Python is akin to a cell +array in Matlab - they can store anything, but are extremely inefficient, and +unwieldy when you have more than a couple of dimensions. 
+ + +**Numpy array == Matlab matrix:** These are in contrast to the Numpy array +and Matlab matrix, which are both thin wrappers around a contiguous chunk of +memory, and which provide blazing-fast performance (because behind the scenes +in both Numpy and Matlab, it's C, C++ and FORTRAN all the way down). + + +So you should strongly consider turning those lists into Numpy arrays: + + +``` +import numpy as np + +data = np.array([10, 8, 12, 14, 7, 6, 11]) + +xyz_coords = np.array([[-11.4, 1.0, 22.6], + [ 22.7, -32.8, 19.1], + [ 62.8, -18.2, -34.5]]) +print('data: ', data) +print('xyz_coords: ', xyz_coords) +print('data.shape: ', data.shape) +print('xyz_coords.shape:', xyz_coords.shape) +``` + + +> Numpy is not a "built-in" library, so we have to import it. The statement +> `import numpy as np` tells Python to *Import the `numpy` library, and make +> it available as a variable called `np`.* + + +<a class="anchor" id="creating-arrays"></a> +## Creating arrays + + +Numpy has quite a few functions which behave similarly to their equivalents in +Matlab: + + +``` +print('np.zeros gives us zeros: ', np.zeros(5)) +print('np.ones gives us ones: ', np.ones(5)) +print('np.arange gives us a range: ', np.arange(5)) +print('np.linspace gives us N linearly spaced numbers:', np.linspace(0, 1, 5)) +print('np.random.random gives us random numbers [0-1]:', np.random.random(5)) +print('np.random.randint gives us random integers: ', np.random.randint(1, 10, 5)) +print('np.eye gives us an identity matrix:') +print(np.eye(4)) +print('np.diag gives us a diagonal matrix:') +print(np.diag([1, 2, 3, 4])) +``` + + +The `zeros` and `ones` functions can also be used to generate N-dimensional +arrays: + + +``` +z = np.zeros((3, 4)) +o = np.ones((2, 10)) +print(z) +print(o) +``` + + +> Note that, in a 2D Numpy array, the first axis corresponds to rows, and the +> second to columns - just like in Matlab. 
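A short sketch of the rows-then-columns convention (the array `a` is made up for illustration; this assumes `numpy` is installed, as it is in `fslpython`):

```python
import numpy as np

a = np.zeros((2, 3))          # 2 rows, 3 columns
print(a.shape)                # (2, 3)
print(a[0].shape)             # the first row has 3 elements: (3,)
print(a[:, 0].shape)          # the first column has 2 elements: (2,)
print(a.sum(axis=0).shape)    # summing along axis 0 (down the rows): (3,)
```

So `axis=0` refers to the row axis, which is why reducing along it leaves one value per column.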
+ + + +> **MATLAB pitfall:** Arithmetic operations on arrays in Numpy work on an +> *elementwise* basis. In particular, if you multiply two arrays together, +> you will get the elementwise product. You **won't** get the matrix product, +> like you would in MATLAB. You can, however, use the `@` operator to perform +> matrix multiplication on numpy arrays. + + +<a class="anchor" id="example-reading-arrays-from-text-files"></a> +## Example: reading arrays from text files + + +The `numpy.loadtxt` function is capable of loading numerical data from +plain-text files. By default it expects space-separated data: + + +``` +data = np.loadtxt('data/space_separated.txt') +print('data in data/space_separated.txt:') +print(data) +``` + + +But you can also specify the delimiter to expect<sup>1</sup>: + + +``` +data = np.loadtxt('data/comma_separated.txt', delimiter=',') +print('data in data/comma_separated.txt:') +print(data) +``` + + +> <sup>1</sup> And many other things such as file headers, footers, comments, +> and newline characters - see the +> [docs](https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) +> for more information. + + +Of course you can also save data out to a text file just as easily, with +[`numpy.savetxt`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html): + + +``` +data = np.random.randint(1, 10, (10, 10)) +np.savetxt('mydata.txt', data, delimiter=',', fmt='%i') +``` + + +Jupyter notebooks have a special feature - if you start a line with a `!` +character, you can run a `bash` command. Let's look at the file we just +generated: + + +``` +!cat mydata.txt +``` + + +> The `!` feature won't work in regular Python scripts. + + +Here's how we can load a 2D array from a file, and calculate the mean of each +column: + + +``` +data = np.loadtxt('data/2d_array.txt', comments='%') +colmeans = data.mean(axis=0) + +print('Column means') +print('\n'.join(['{}: {:0.2f}'.format(i, m) for i, m in enumerate(colmeans)])) +```