From 0bee0fbffaeb8ddd1967515ec9f17dd004438ce9 Mon Sep 17 00:00:00 2001
From: Paul McCarthy <>
Date: Sat, 27 Jan 2018 16:58:51 +0000
Subject: [PATCH] Working on numpy practical

 getting_started/04_numpy.ipynb | 647 +++++++++++++++++++++++++++++++--
 getting_started/    | 493 +++++++++++++++++++++++--
 2 files changed, 1080 insertions(+), 60 deletions(-)
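
Note on the "Reshaping and rearranging arrays" section added by this patch: the view-versus-copy behaviour it describes can be checked with a short script. This is only a minimal sketch, assuming nothing beyond `numpy`. The variable `c` and the use of `np.shares_memory` are illustrative additions and are not part of the patch.

```python
import numpy as np

a = np.arange(16).reshape((4, 4))

# reshape returns a view here, so b shares its data with a
b = a.reshape((2, 8))

# wrapping the reshaped array in np.array makes an independent copy
c = np.array(a.reshape((2, 8)))

# modify the last element of a
a[3, 3] = 12345

print('b sees the change:     ', b[1, 7] == 12345)
print('c keeps the old value: ', c[1, 7] == 15)
print('a and b share memory:  ', np.shares_memory(a, b))
print('a and c share memory:  ', np.shares_memory(a, c))
```

The last element of `a` maps to `b[1, 7]` because both arrays walk the same 16 values in the same order.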

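Note on the "Appendix: Vectors in Numpy" section added by this patch: the point that a 1D array is neither a row nor a column vector is easiest to see from the shapes involved. A minimal sketch, assuming only `numpy`. The names `row` and `col` are hypothetical and do not appear in the patch.

```python
import numpy as np

r = np.arange(1, 4)       # a 1D array with shape (3,)

row = r.reshape(1, -1)    # explicit row vector, shape (1, 3)
col = r.reshape(-1, 1)    # explicit column vector, shape (3, 1)

# transposing a 1D array leaves its shape unchanged
print('r.T shape:               ', r.T.shape)
print('row shape:               ', row.shape)
print('col shape:               ', col.shape)
print('np.atleast_2d(r).T shape:', np.atleast_2d(r).T.shape)
```

Passing `-1` lets `reshape` work out the remaining dimension, so the same two lines work for a 1D array of any length.
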
diff --git a/getting_started/04_numpy.ipynb b/getting_started/04_numpy.ipynb
index f0a4507..0f8ccf3 100644
--- a/getting_started/04_numpy.ipynb
+++ b/getting_started/04_numpy.ipynb
@@ -14,18 +14,31 @@
     "Numpy is not actually part of the standard Python library. But it is a\n",
     "fundamental part of the Python ecosystem - it forms the basis for many\n",
     "important Python libraries, and it (along with its partners\n",
-    "[`scipy`]( and [`matplotlib`](\n",
-    "is what makes Python a viable alternative to Matlab as a scientific computing\n",
-    "platform.\n",
+    "[`scipy`](, [`matplotlib`]( and\n",
+    "[`pandas`]( is what makes Python an attractive\n",
+    "alternative to Matlab as a scientific computing platform.\n",
     "## Contents\n",
     "* [The Python list versus the Numpy array](#the-python-list-versus-the-numpy-array)\n",
-    "* [Importing Numpy](#importing-numpy)\n",
     "* [Numpy basics](#numpy-basics)\n",
-    "* [Indexing](#indexing)\n",
+    " * [Creating arrays](#creating-arrays)\n",
+    " * [Operating on arrays](#operating-on-arrays)\n",
+    " * [Array properties](#array-properties)\n",
+    " * [Descriptive statistics](#descriptive-statistics)\n",
+    " * [Reshaping and rearranging arrays](#reshaping-and-rearranging-arrays)\n",
+    "* [Array indexing](#array-indexing)\n",
+    " * [Indexing multi-dimensional arrays](#indexing-multi-dimensional-arrays)\n",
+    " * [Boolean indexing](#boolean-indexing)\n",
+    " * [Coordinate array indexing](#coordinate-array-indexing)\n",
+    "* [Array operations and broadcasting](#array-operations-and-broadcasting)\n",
+    "* [Generating random numbers](#generating-random-numbers)\n",
+    "\n",
+    "* [Appendix: Importing Numpy](#appendix-importing-numpy)\n",
+    "* [Appendix: Vectors in Numpy](#appendix-vectors-in-numpy)\n",
+    "\n",
     "<a class=\"anchor\" id=\"the-python-list-versus-the-numpy-array\"></a>\n",
@@ -33,9 +46,12 @@
     "Numpy adds a new data type to the Python language - the `array` (more\n",
-    "specifically, the `ndarray`). You have already been introduced to the Python\n",
-    "`list`, which you can easily use to store a handful of numbers (or anything\n",
-    "else):"
+    "specifically, the `ndarray`). A Numpy `array` is an N-dimensional array of\n",
+    "homogeneously-typed numerical data.\n",
+    "\n",
+    "\n",
+    "You have already been introduced to the Python `list`, which you can easily\n",
+    "use to store a handful of numbers (or anything else):"
@@ -60,7 +76,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "xyz_coords = [[-11.4, 1.0, 22.6], [22.7, -32.8, 19.1], [62.8, -18.2, -34.5]]"
+    "xyz_coords = [[-11.4,   1.0,  22.6],\n",
+    "              [ 22.7, -32.8,  19.1],\n",
+    "              [ 62.8, -18.2, -34.5]]"
@@ -80,19 +98,19 @@
     "This is a major source of confusion for those poor souls who have spent their\n",
     "lives working in Matlab, but have finally seen the light and switched to\n",
-    "Python. It is very important to be able to distinguish between a Python list,\n",
-    "and a Numpy array.\n",
+    "Python. It is _crucial_ to be able to distinguish between a Python list and a\n",
+    "Numpy array.\n",
-    "A list in python is akin to a cell array in Matlab - they can store anything,\n",
-    "but are extremely inefficient, and unwieldy when you have more than a couple\n",
-    "of dimensions.\n",
+    "___Python list == Matlab cell array:___ A list in Python is akin to a cell\n",
+    "array in Matlab - they can store anything, but are extremely inefficient, and\n",
+    "unwieldy when you have more than a couple of dimensions.\n",
-    "These are in contrast to the Numpy array and Matlab matrix, which are both\n",
-    "thin wrappers around a contiguous chunk of memory, and which provide\n",
-    "blazing-fast performance (because behind the scenes in both Numpy and Matlab,\n",
-    "it's C, C++ and FORTRAN all the way down).\n",
+    "___Numpy array == Matlab matrix:___ These are in contrast to the Numpy array\n",
+    "and Matlab matrix, which are both thin wrappers around a contiguous chunk of\n",
+    "memory, and which provide blazing-fast performance (because behind the scenes\n",
+    "in both Numpy and Matlab, it's C, C++ and FORTRAN all the way down).\n",
     "So you should strongly consider turning those lists into Numpy arrays:"
@@ -119,8 +137,7 @@
    "source": [
     "If you look carefully at the code above, you will notice that we are still\n",
     "actually using Python lists. We have declared our data sets in exactly the\n",
-    "same way that we did earlier, by denoting them with square brackets `[` and\n",
-    "`]`.\n",
+    "same way as we did earlier, by denoting them with square brackets `[` and `]`.\n",
     "The key difference here is that these lists immediately get converted into\n",
@@ -151,12 +168,564 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "I'm emphasising this to help you understand the difference between Python\n",
-    "lists and Numpy arrays. Apologies if you've already got it.\n",
+    "Of course, in practice, we would never create a Numpy array in this way - we\n",
+    "will be loading our data from text or binary files directly into a Numpy\n",
+    "array, thus completely bypassing the use of Python lists and the costly\n",
+    "list-to-array conversion.  I'm emphasising this to help you understand the\n",
+    "difference between Python lists and Numpy arrays. Apologies if you've already\n",
+    "got it.\n",
+    "\n",
+    "\n",
+    "<a class=\"anchor\" id=\"numpy-basics\"></a>\n",
+    "## Numpy basics\n",
+    "\n",
+    "\n",
+    "Let's get started."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a class=\"anchor\" id=\"creating-arrays\"></a>\n",
+    "### Creating arrays\n",
+    "\n",
+    "\n",
+    "Numpy has quite a few functions which behave similarly to their equivalents in\n",
+    "Matlab:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print('np.zeros gives us zeros:                       ', np.zeros(5))\n",
+    "print('np.ones gives us ones:                         ', np.ones(5))\n",
+    "print('np.arange gives us a range:                    ', np.arange(5))\n",
+    "print('np.linspace gives us N linearly spaced numbers:', np.linspace(0, 1, 5))\n",
+    "print('np.random.random gives us random numbers:      ', np.random.random(5))\n",
+    "print('np.random.randint gives us random integers:    ', np.random.randint(1, 10, 5))\n",
+    "print('np.eye gives us an identity matrix:')\n",
+    "print(np.eye(4))\n",
+    "print('np.diag gives us a diagonal matrix:')\n",
+    "print(np.diag([1, 2, 3, 4]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> There will be more on random numbers [below](#generating-random-numbers).\n",
+    "\n",
+    "\n",
+    "The `zeros` and `ones` functions can also be used to generate N-dimensional\n",
+    "arrays:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "z = np.zeros((3, 4))\n",
+    "o = np.ones((2, 10))\n",
+    "print(z)\n",
+    "print(o)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> Note that, in a 2D Numpy array, the first axis corresponds to rows, and the\n",
+    "> second to columns - just like in Matlab.\n",
+    "\n",
+    "\n",
+    "<a class=\"anchor\" id=\"operating-on-arrays\"></a>\n",
+    "### Operating on arrays\n",
+    "\n",
+    "\n",
+    "All of the mathematical operators you know and love can be applied to Numpy\n",
+    "arrays:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.random.randint(1, 10, (3, 3))\n",
+    "print('a:')\n",
+    "print(a)\n",
+    "print('a + 2:')\n",
+    "print( a + 2)\n",
+    "print('a * 3:')\n",
+    "print( a * 3)\n",
+    "print('a % 2:')\n",
+    "print( a % 2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We'll cover more advanced array operations\n",
+    "[below](#array-operations-and-broadcasting).\n",
+    "\n",
+    "\n",
+    "<a class=\"anchor\" id=\"array-properties\"></a>\n",
+    "### Array properties\n",
+    "\n",
+    "\n",
+    "Numpy is a bit different than Matlab in the way that you interact with\n",
+    "arrays. In Matlab, you would typically pass an array to a built-in function,\n",
+    "e.g. `size(M)`, `ndims(M)`, etc. In contrast, a Numpy array is a Python\n",
+    "object which has _attributes_ that contain basic information about the array:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "z = np.zeros((2, 3, 4))\n",
+    "print(z)\n",
+    "print('Shape:                     ', z.shape)\n",
+    "print('Number of dimensions:      ', z.ndim)\n",
+    "print('Number of elements:        ', z.size)\n",
+    "print('Data type:                 ', z.dtype)\n",
+    "print('Number of bytes:           ', z.nbytes)\n",
+    "print('Length of first dimension: ', len(z))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> As depicted above, passing a Numpy array to the built-in `len` function will\n",
+    "> only give you the length of the first dimension, so you will typically want\n",
+    "> to avoid using it - use the `size` attribute instead.\n",
+    "\n",
+    "\n",
+    "<a class=\"anchor\" id=\"descriptive-statistics\"></a>\n",
+    "### Descriptive statistics\n",
+    "\n",
+    "\n",
+    "Similarly, a Numpy array has a set of methods<sup>1</sup> which allow you to\n",
+    "calculate basic descriptive statistics on an array:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.random.random(10)\n",
+    "print('a: ', a)\n",
+    "print('min:          ', a.min())\n",
+    "print('max:          ', a.max())\n",
+    "print('index of min: ', a.argmin())  # remember that in Python, list indices\n",
+    "print('index of max: ', a.argmax())  # start from zero - Numpy is the same!\n",
+    "print('mean:         ', a.mean())\n",
+    "print('variance:     ', a.var())\n",
+    "print('stddev:       ', a.std())\n",
+    "print('sum:          ', a.sum())\n",
+    "print('prod:         ', a.prod())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> <sup>1</sup> Python, being an object-oriented language, distinguishes\n",
+    "> between _functions_ and _methods_. _Method_ is simply the term used to refer\n",
+    "> to a function that is associated with a specific object. Similarly, the term\n",
+    "> _attribute_ is used to refer to some piece of information that is attached\n",
+    "> to an object, such as `z.shape`, or `z.dtype`.\n",
-    "<a class=\"anchor\" id=\"importing-numpy\"></a>\n",
-    "## Importing numpy\n",
+    "<a class=\"anchor\" id=\"reshaping-and-rearranging-arrays\"></a>\n",
+    "### Reshaping and rearranging arrays\n",
+    "\n",
+    "\n",
+    "A Numpy array can be reshaped very easily, using the `reshape` method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.random.randint(1, 10, (4, 4))\n",
+    "b = a.reshape((2, 8))\n",
+    "print('a:')\n",
+    "print(a)\n",
+    "print('b:')\n",
+    "print(b)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that this does not modify the underlying data in any way - the `reshape`\n",
+    "method returns a _view_ of the same array, just indexed differently:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a[3, 3] = 12345\n",
+    "b[0, 7] = 54321\n",
+    "print('a:')\n",
+    "print(a)\n",
+    "print('b:')\n",
+    "print(b)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you need to create a reshaped copy of an array, use the `np.array`\n",
+    "function:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.random.randint(1, 10, (4, 4))\n",
+    "b = np.array(a.reshape(2, 8))\n",
+    "a[3, 3] = 12345\n",
+    "b[0, 7] = 54321\n",
+    "print('a:')\n",
+    "print(a)\n",
+    "print('b:')\n",
+    "print(b)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `T` attribute is a shortcut to obtain the transpose of an array."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.random.randint(1, 10, (4, 4))\n",
+    "print(a)\n",
+    "print(a.T)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `transpose` method allows you to obtain more complicated rearrangements\n",
+    "of an array's axes:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.random.randint(1, 10, (2, 3, 4))\n",
+    "b = a.transpose((2, 0, 1))\n",
+    "print('a: ', a.shape)\n",
+    "print(a)\n",
+    "print('b:', b.shape)\n",
+    "print(b)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> Note again that the `T` attribute and `transpose` method return _views_ of\n",
+    "> your array.\n",
+    "\n",
+    "\n",
+    "Numpy has some useful functions which allow you to concatenate or stack\n",
+    "multiple arrays into one. The `concatenate` function does what it says on the\n",
+    "tin:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.zeros(3)\n",
+    "b = np.ones(3)\n",
+    "\n",
+    "print('1D concatenation:', np.concatenate((a, b)))\n",
+    "\n",
+    "a = np.zeros((3, 3))\n",
+    "b = np.ones((3, 3))\n",
+    "\n",
+    "print('2D column-wise concatenation:')\n",
+    "print(np.concatenate((a, b), axis=1))\n",
+    "\n",
+    "print('2D row-wise concatenation:')\n",
+    "\n",
+    "# The axis parameter defaults to 0,\n",
+    "# so it is not strictly necessary here.\n",
+    "print(np.concatenate((a, b), axis=0))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `hstack`, `vstack` and `dstack` functions allow you to concatenate vectors\n",
+    "or arrays horizontally (column-wise), vertically (row-wise), or depth-wise\n",
+    "(along the third dimension) respectively:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.zeros(3)\n",
+    "b = np.ones(3)\n",
+    "\n",
+    "print('a: ', a)\n",
+    "print('b: ', b)\n",
+    "\n",
+    "hstacked = np.hstack((a, b))\n",
+    "vstacked = np.vstack((a, b))\n",
+    "dstacked = np.dstack((a, b))\n",
+    "\n",
+    "print('hstacked: (shape {}):'.format(hstacked.shape))\n",
+    "print( hstacked)\n",
+    "print('vstacked: (shape {}):'.format(vstacked.shape))\n",
+    "print( vstacked)\n",
+    "print('dstacked: (shape {}):'.format(dstacked.shape))\n",
+    "print( dstacked)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a class=\"anchor\" id=\"array-indexing\"></a>\n",
+    "## Array indexing\n",
+    "\n",
+    "\n",
+    "Just like in Matlab, slicing up your arrays is a breeze in Numpy.  If you are\n",
+    "after some light reading, you might want to check out the [comprehensive Numpy\n",
+    "Indexing\n",
+    "reference](\n",
+    "\n",
+    "\n",
+    "> As with indexing regular Python lists, array indices start from 0, and end\n",
+    "> indices (if specified) are exclusive.\n",
+    "\n",
+    "\n",
+    "Let's whet our appetites with some basic 1D array slicing:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.random.randint(1, 10, 10)\n",
+    "\n",
+    "print('a:                              ', a)\n",
+    "print('first element:                  ', a[0])\n",
+    "print('first two elements:             ', a[:2])\n",
+    "print('last element:                   ', a[a.shape[0] - 1])\n",
+    "print('last element again:             ', a[-1])\n",
+    "print('last two elements:              ', a[-2:])\n",
+    "print('middle four elements:           ', a[3:7])\n",
+    "print('Every second element:           ', a[::2])\n",
+    "print('Every second element, reversed: ', a[::-2])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that slicing an array in this way returns a _view_, not a copy, into that\n",
+    "array:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.random.randint(1, 10, 10)\n",
+    "print('a:', a)\n",
+    "every2nd = a[::2]\n",
+    "print('every 2nd:', every2nd)\n",
+    "every2nd += 10\n",
+    "print('a:', a)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a class=\"anchor\" id=\"indexing-multi-dimensional-arrays\"></a>\n",
+    "### Indexing multi-dimensional arrays\n",
+    "\n",
+    "\n",
+    "Multi-dimensional array indexing works in much the same way as one-dimensional\n",
+    "indexing but with, well, more dimensions:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.random.randint(1, 10, (5, 5))\n",
+    "print('a:')\n",
+    "print(a)\n",
+    "print(' First row:     ', a[  0, :])\n",
+    "print(' Last row:      ', a[ -1, :])\n",
+    "print(' second column: ', a[  :, 1])\n",
+    "print(' Centre:')\n",
+    "print(a[1:4, 1:4])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For arrays with more than two dimensions, the ellipsis (`...`) is a handy\n",
+    "feature - it allows you to specify a slice comprising all elements along\n",
+    "more than one dimension:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.random.randint(1, 10, (3, 3, 3))\n",
+    "print('a:')\n",
+    "print(a)\n",
+    "print('All elements at x=0:')\n",
+    "print(a[0, ...])\n",
+    "print('All elements at z=2:')\n",
+    "print(a[..., 2])\n",
+    "print('All elements at x=0, z=2:')\n",
+    "print(a[0, ..., 2])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a class=\"anchor\" id=\"boolean-indexing\"></a>\n",
+    "### Boolean indexing\n",
+    "\n",
+    "\n",
+    "A Numpy array can be indexed with a boolean array of the same shape. For\n",
+    "example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.random.randint(1, 10, 10)\n",
+    "\n",
+    "print('a:                          ', a)\n",
+    "print('a > 5:                      ', a > 5)\n",
+    "print('elements in a that are > 5: ', a[a > 5])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Logical operators `~` (not), `&` (and) and `|` (or) can be used to manipulate\n",
+    "and combine boolean Numpy arrays:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a    = np.random.randint(1, 10, 10)\n",
+    "gt5  = a > 5\n",
+    "even = a % 2 == 0\n",
+    "\n",
+    "print('a:                                    ', a)\n",
+    "print('elements in a which are > 5:          ', a[gt5])\n",
+    "print('elements in a which are <= 5:         ', a[~gt5])\n",
+    "print('elements in a which are even:         ', a[even])\n",
+    "print('elements in a which are odd:          ', a[~even])\n",
+    "print('elements in a which are > 5 and even: ', a[gt5 &  even])\n",
+    "print('elements in a which are > 5 or odd:   ', a[gt5 | ~even])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a class=\"anchor\" id=\"coordinate-array-indexing\"></a>\n",
+    "### Coordinate array indexing"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a class=\"anchor\" id=\"array-operations-and-broadcasting\"></a>\n",
+    "## Array operations and broadcasting\n",
+    "\n",
+    "\n",
+    "\n",
+    "<a class=\"anchor\" id=\"generating-random-numbers\"></a>\n",
+    "## Generating random numbers\n",
+    "\n",
+    "\n",
+    "<a class=\"anchor\" id=\"appendix-importing-numpy\"></a>\n",
+    "## Appendix: Importing Numpy\n",
     "For interactive exploration/experimentation, you might want to import\n",
@@ -245,11 +814,21 @@
-    "<a class=\"anchor\" id=\"numpy-basics\"></a>\n",
-    "## Numpy basics\n",
+    "<a class=\"anchor\" id=\"appendix-vectors-in-numpy\"></a>\n",
+    "## Appendix: Vectors in Numpy\n",
-    "Let's get started."
+    "One aspect of Numpy which might trip you up, and which can be quite\n",
+    "frustrating at times, is that Numpy has no understanding of row or column\n",
+    "vectors.  __An array with only one dimension is neither a row, nor a column\n",
+    "vector - it is just a 1D array__.  If you have a 1D array, and you want to use\n",
+    "it as a row vector, you need to reshape it to a shape of `(1, N)`. Similarly,\n",
+    "to use a 1D array as a column vector, you must reshape it to have shape\n",
+    "`(N, 1)`.\n",
+    "\n",
+    "\n",
+    "In general, when you are mixing 1D arrays with 2- or N-dimensional arrays, you\n",
+    "need to make sure that your arrays have the correct shape. For example:"
@@ -258,15 +837,23 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import numpy as np"
+    "r = np.random.randint(1, 10, 3)\n",
+    "\n",
+    "print('r is a row:                                  ', r)\n",
+    "print('r.T should be a column:                      ', r.T, ' ... huh?')\n",
+    "print('Ok, make r a 2D array with one row:          ', r.reshape(1, -1))\n",
+    "print('We could also use the np.atleast_2d function:', np.atleast_2d(r))\n",
+    "print('Now we can transpose r to get a column:')\n",
+    "print(np.atleast_2d(r).T)"
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<a class=\"anchor\" id=\"indexing\"></a>\n",
-    "## Indexing"
+    "> Here we used a handy feature of the `reshape` method - if you pass `-1` for\n",
+    "> the size of one dimension, it will automatically determine the size to use\n",
+    "> for that dimension."
diff --git a/getting_started/ b/getting_started/
index d395b6a..16bc676 100644
--- a/getting_started/
+++ b/getting_started/
@@ -8,18 +8,31 @@ numerical computing library.
 Numpy is not actually part of the standard Python library. But it is a
 fundamental part of the Python ecosystem - it forms the basis for many
 important Python libraries, and it (along with its partners
-[`scipy`]( and [`matplotlib`](
-is what makes Python a viable alternative to Matlab as a scientific computing
+[`scipy`](, [`matplotlib`]( and
+[`pandas`]( is what makes Python an attractive
+alternative to Matlab as a scientific computing platform.
 ## Contents
 * [The Python list versus the Numpy array](#the-python-list-versus-the-numpy-array)
-* [Importing Numpy](#importing-numpy)
 * [Numpy basics](#numpy-basics)
-* [Indexing](#indexing)
+ * [Creating arrays](#creating-arrays)
+ * [Operating on arrays](#operating-on-arrays)
+ * [Array properties](#array-properties)
+ * [Descriptive statistics](#descriptive-statistics)
+ * [Reshaping and rearranging arrays](#reshaping-and-rearranging-arrays)
+* [Array indexing](#array-indexing)
+ * [Indexing multi-dimensional arrays](#indexing-multi-dimensional-arrays)
+ * [Boolean indexing](#boolean-indexing)
+ * [Coordinate array indexing](#coordinate-array-indexing)
+* [Array operations and broadcasting](#array-operations-and-broadcasting)
+* [Generating random numbers](#generating-random-numbers)
+* [Appendix: Importing Numpy](#appendix-importing-numpy)
+* [Appendix: Vectors in Numpy](#appendix-vectors-in-numpy)
 <a class="anchor" id="the-python-list-versus-the-numpy-array"></a>
@@ -27,9 +40,12 @@ platform.
 Numpy adds a new data type to the Python language - the `array` (more
-specifically, the `ndarray`). You have already been introduced to the Python
-`list`, which you can easily use to store a handful of numbers (or anything
+specifically, the `ndarray`). A Numpy `array` is an N-dimensional array of
+homogeneously-typed numerical data.
+You have already been introduced to the Python `list`, which you can easily
+use to store a handful of numbers (or anything else):
@@ -41,7 +57,9 @@ You could also emulate a 2D or ND matrix by using lists of lists, for example:
-xyz_coords = [[-11.4, 1.0, 22.6], [22.7, -32.8, 19.1], [62.8, -18.2, -34.5]]
+xyz_coords = [[-11.4,   1.0,  22.6],
+              [ 22.7, -32.8,  19.1],
+              [ 62.8, -18.2, -34.5]]
@@ -58,19 +76,19 @@ computing!
 This is a major source of confusion for those poor souls who have spent their
 lives working in Matlab, but have finally seen the light and switched to
-Python. It is very important to be able to distinguish between a Python list,
-and a Numpy array.
+Python. It is _crucial_ to be able to distinguish between a Python list and a
+Numpy array.
-A list in python is akin to a cell array in Matlab - they can store anything,
-but are extremely inefficient, and unwieldy when you have more than a couple
-of dimensions.
+___Python list == Matlab cell array:___ A list in Python is akin to a cell
+array in Matlab - they can store anything, but are extremely inefficient, and
+unwieldy when you have more than a couple of dimensions.
-These are in contrast to the Numpy array and Matlab matrix, which are both
-thin wrappers around a contiguous chunk of memory, and which provide
-blazing-fast performance (because behind the scenes in both Numpy and Matlab,
-it's C, C++ and FORTRAN all the way down).
+___Numpy array == Matlab matrix:___ These are in contrast to the Numpy array
+and Matlab matrix, which are both thin wrappers around a contiguous chunk of
+memory, and which provide blazing-fast performance (because behind the scenes
+in both Numpy and Matlab, it's C, C++ and FORTRAN all the way down).
 So you should strongly consider turning those lists into Numpy arrays:
@@ -89,8 +107,7 @@ xyz_coords = np.array([[-11.4,   1.0,  22.6],
 If you look carefully at the code above, you will notice that we are still
 actually using Python lists. We have declared our data sets in exactly the
-same way that we did earlier, by denoting them with square brackets `[` and
+same way as we did earlier, by denoting them with square brackets `[` and `]`.
 The key difference here is that these lists immediately get converted into
@@ -113,12 +130,410 @@ xyz_coords = np.array(xyz_coords)
-I'm emphasising this to help you understand the difference between Python
-lists and Numpy arrays. Apologies if you've already got it.
+Of course, in practice, we would never create a Numpy array in this way - we
+will be loading our data from text or binary files directly into a Numpy
+array, thus completely bypassing the use of Python lists and the costly
+list-to-array conversion.  I'm emphasising this to help you understand the
+difference between Python lists and Numpy arrays. Apologies if you've already
+got it.
+<a class="anchor" id="numpy-basics"></a>
+## Numpy basics
+Let's get started.
+import numpy as np
+<a class="anchor" id="creating-arrays"></a>
+### Creating arrays
+Numpy has quite a few functions which behave similarly to their equivalents in
+print('np.zeros gives us zeros:                       ', np.zeros(5))
+print('np.ones gives us ones:                         ', np.ones(5))
+print('np.arange gives us a range:                    ', np.arange(5))
+print('np.linspace gives us N linearly spaced numbers:', np.linspace(0, 1, 5))
+print('np.random.random gives us random numbers:      ', np.random.random(5))
+print('np.random.randint gives us random integers:    ', np.random.randint(1, 10, 5))
+print('np.eye gives us an identity matrix:')
+print('np.diag gives us a diagonal matrix:')
+print(np.diag([1, 2, 3, 4]))
+> There will be more on random numbers [below](#generating-random-numbers).
+The `zeros` and `ones` functions can also be used to generate N-dimensional
+z = np.zeros((3, 4))
+o = np.ones((2, 10))
+> Note that, in a 2D Numpy array, the first axis corresponds to rows, and the
+> second to columns - just like in Matlab.
+<a class="anchor" id="operating-on-arrays"></a>
+### Operating on arrays
+All of the mathematical operators you know and love can be applied to Numpy
+a = np.random.randint(1, 10, (3, 3))
+print('a + 2:')
+print( a + 2)
+print('a * 3:')
+print( a * 3)
+print('a % 2:')
+print( a % 2)
+We'll cover more advanced array operations
+<a class="anchor" id="array-properties"></a>
+### Array properties
+Numpy is a bit different than Matlab in the way that you interact with
+arrays. In Matlab, you would typically pass an array to a built-in function,
+e.g. `size(M)`, `ndims(M)`, etc. In contrast, a Numpy array is a Python
+object which has _attributes_ that contain basic information about the array:
+z = np.zeros((2, 3, 4))
+print('Shape:                     ', z.shape)
+print('Number of dimensions:      ', z.ndim)
+print('Number of elements:        ', z.size)
+print('Data type:                 ', z.dtype)
+print('Number of bytes:           ', z.nbytes)
+print('Length of first dimension: ', len(z))
+> As depicted above, passing a Numpy array to the built-in `len` function will
+> only give you the length of the first dimension, so you will typically want
+> to avoid using it - use the `size` attribute instead.
+<a class="anchor" id="descriptive-statistics"></a>
+### Descriptive statistics
+Similarly, a Numpy array has a set of methods<sup>1</sup> which allow you to
+calculate basic descriptive statistics on an array:
+a = np.random.random(10)
+print('a: ', a)
+print('min:          ', a.min())
+print('max:          ', a.max())
+print('index of min: ', a.argmin())  # remember that in Python, list indices
+print('index of max: ', a.argmax())  # start from zero - Numpy is the same!
+print('mean:         ', a.mean())
+print('variance:     ', a.var())
+print('stddev:       ', a.std())
+print('sum:          ', a.sum())
+print('prod:         ', a.prod())
+> <sup>1</sup> Python, being an object-oriented language, distinguishes
+> between _functions_ and _methods_. _Method_ is simply the term used to refer
+> to a function that is associated with a specific object. Similarly, the term
+> _attribute_ is used to refer to some piece of information that is attached
+> to an object, such as `z.shape`, or `z.dtype`.
+<a class="anchor" id="reshaping-and-rearranging-arrays"></a>
+### Reshaping and rearranging arrays
+A Numpy array can be reshaped very easily, using the `reshape` method.
+a = np.random.randint(1, 10, (4, 4))
+b = a.reshape((2, 8))
+Note that this does not modify the underlying data in any way - wherever
+possible, the `reshape` method returns a _view_ of the same array, just
+indexed differently:
+a[3, 3] = 12345
+b[0, 7] = 54321
+If you need to create a reshaped copy of an array, use the `np.array` function:
+a = np.random.randint(1, 10, (4, 4))
+b = np.array(a.reshape(2, 8))
+a[3, 3] = 12345
+b[0, 7] = 54321
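If you are ever unsure whether two arrays refer to the same underlying data, one way to check is the `np.shares_memory` function. This is a small sketch rather than part of the original example - the names `view` and `copy` are chosen just for illustration, and `np.arange` is used instead of random data so that the values are predictable:

```python
import numpy as np

a = np.arange(16).reshape((4, 4))

view = a.reshape((2, 8))            # a view - shares its data with a
copy = np.array(a.reshape((2, 8)))  # np.array makes a copy by default

print('view shares memory with a:', np.shares_memory(a, view))
print('copy shares memory with a:', np.shares_memory(a, copy))

# a[3, 3] is the last element, which
# lives at [1, 7] in the (2, 8) arrays
a[3, 3] = 12345
print('view sees the change:     ', view[1, 7] == 12345)
print('copy sees the change:     ', copy[1, 7] == 12345)
```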
+The `T` attribute is a shortcut to obtain the transpose of an array.
+a = np.random.randint(1, 10, (4, 4))
+The `transpose` method allows you to obtain more complicated rearrangements
+of an array's axes:
+a = np.random.randint(1, 10, (2, 3, 4))
+b = a.transpose((2, 0, 1))
+print('a: ', a.shape)
+print('b:', b.shape)
+> Note again that the `T` attribute and `transpose` method return _views_ of
+> your array.
+Numpy has some useful functions which allow you to concatenate or stack
+multiple arrays into one. The `concatenate` function does what it says on the
+tin:
+a = np.zeros(3)
+b = np.ones(3)
+print('1D concatenation:', np.concatenate((a, b)))
+a = np.zeros((3, 3))
+b = np.ones((3, 3))
+print('2D column-wise concatenation:')
+print(np.concatenate((a, b), axis=1))
+print('2D row-wise concatenation:')
+# The axis parameter defaults to 0,
+# so it is not strictly necessary here.
+print(np.concatenate((a, b), axis=0))
+The `hstack`, `vstack` and `dstack` functions allow you to stack vectors or
+arrays horizontally (column-wise), vertically (row-wise), or depth-wise
+(along the third axis) respectively:
+a = np.zeros(3)
+b = np.ones(3)
+print('a: ', a)
+print('b: ', b)
+hstacked = np.hstack((a, b))
+vstacked = np.vstack((a, b))
+dstacked = np.dstack((a, b))
+print('hstacked: (shape {}):'.format(hstacked.shape))
+print( hstacked)
+print('vstacked: (shape {}):'.format(vstacked.shape))
+print( vstacked)
+print('dstacked: (shape {}):'.format(dstacked.shape))
+print( dstacked)
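For 2D arrays, `hstack` and `vstack` should give the same result as calling `concatenate` with the matching `axis`, while `dstack` adds a third dimension. A minimal sketch to illustrate this, using `np.array_equal` (not used elsewhere in this section) to compare the results:

```python
import numpy as np

a = np.zeros((3, 3))
b = np.ones((3, 3))

# hstack joins 2D arrays along the second axis (columns)
h_same = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))

# vstack joins 2D arrays along the first axis (rows)
v_same = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))

# dstack joins 2D arrays along a new third axis
d_shape = np.dstack((a, b)).shape

print('hstack == concatenate(axis=1):', h_same)
print('vstack == concatenate(axis=0):', v_same)
print('dstack shape:                 ', d_shape)
```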
+<a class="anchor" id="array-indexing"></a>
+## Array indexing
+Just like in Matlab, slicing up your arrays is a breeze in Numpy.  If you are
+after some light reading, you might want to check out the [comprehensive Numpy
+> As with indexing regular Python lists, array indices start from 0, and end
+> indices (if specified) are exclusive.
+Let's whet our appetites with some basic 1D array slicing:
+a = np.random.randint(1, 10, 10)
+print('a:                              ', a)
+print('first element:                  ', a[0])
+print('first two elements:             ', a[:2])
+print('last element:                   ', a[a.shape[0] - 1])
+print('last element again:             ', a[-1])
+print('last two elements:              ', a[-2:])
+print('middle four elements:           ', a[3:7])
+print('Every second element:           ', a[::2])
+print('Every second element, reversed: ', a[-1::-2])
+Note that slicing an array in this way returns a _view_ into that array, not
+a copy:
+a = np.random.randint(1, 10, 10)
+print('a:', a)
+every2nd = a[::2]
+print('every 2nd:', every2nd)
+every2nd += 10
+print('a:', a)
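If you want a slice that is independent of the original array, one option is to call the `copy` method on the slice. A short sketch, using `np.arange` instead of random data so that the effect is easy to see:

```python
import numpy as np

a = np.arange(10)

# copy() gives us a new array, with its own
# data, instead of a view into a
every2nd = a[::2].copy()
every2nd += 10

print('a (unchanged):', a)
print('every 2nd:    ', every2nd)
```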
+<a class="anchor" id="indexing-multi-dimensional-arrays"></a>
+### Indexing multi-dimensional arrays
-<a class="anchor" id="importing-numpy"></a>
-## Importing numpy
+Multi-dimensional array indexing works in much the same way as one-dimensional
+indexing but with, well, more dimensions:
+a = np.random.randint(1, 10, (5, 5))
+print(' First row:     ', a[  0, :])
+print(' Last row:      ', a[ -1, :])
+print(' Second column: ', a[  :, 1])
+print(' Centre:')
+print(a[1:4, 1:4])
+For arrays with more than two dimensions, the ellipsis (`...`) is a handy
+feature - it allows you to specify a slice comprising all elements along
+more than one dimension:
+a = np.random.randint(1, 10, (3, 3, 3))
+print('All elements at x=0:')
+print(a[0, ...])
+print('All elements at z=2:')
+print(a[..., 2])
+print('All elements at x=0, z=2:')
+print(a[0, ..., 2])
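The ellipsis is just shorthand for as many full slices (`:`) as are needed to cover the remaining dimensions. As a quick sanity check, the sketch below compares each ellipsis expression with its fully written-out equivalent (`np.arange` is used here so the array contents are predictable):

```python
import numpy as np

a = np.arange(27).reshape((3, 3, 3))

# Each ellipsis expands to the full slices
# needed to index all three dimensions
same_x  = np.array_equal(a[0, ...],    a[0, :, :])
same_z  = np.array_equal(a[..., 2],    a[:, :, 2])
same_xz = np.array_equal(a[0, ..., 2], a[0, :, 2])

print('a[0, ...]    == a[0, :, :]:', same_x)
print('a[..., 2]    == a[:, :, 2]:', same_z)
print('a[0, ..., 2] == a[0, :, 2]:', same_xz)
```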
+<a class="anchor" id="boolean-indexing"></a>
+### Boolean indexing
+A Numpy array can be indexed with a boolean array of the same shape. For
+example:
+a = np.random.randint(1, 10, 10)
+print('a:                          ', a)
+print('a > 5:                      ', a > 5)
+print('elements in a that are > 5: ', a[a > 5])
+Logical operators `~` (not), `&` (and) and `|` (or) can be used to manipulate
+and combine boolean Numpy arrays:
+a    = np.random.randint(1, 10, 10)
+gt5  = a > 5
+even = a % 2 == 0
+print('a:                                    ', a)
+print('elements in a which are > 5:          ', a[gt5])
+print('elements in a which are <= 5:         ', a[~gt5])
+print('elements in a which are even:         ', a[even])
+print('elements in a which are odd:          ', a[~even])
+print('elements in a which are > 5 and even: ', a[gt5 &  even])
+print('elements in a which are > 5 or odd:   ', a[gt5 | ~even])
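One thing to watch out for when writing the comparisons inline, rather than storing them in variables first: in Python, `&` and `|` bind more tightly than comparison operators such as `>` and `==`, so each comparison needs to be wrapped in parentheses. A minimal sketch, using a fixed array so the result is predictable:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

gt5  = a > 5
even = a % 2 == 0

# When combining comparisons inline, each
# one must be wrapped in parentheses
inline = a[(a > 5) & (a % 2 == 0)]

print('elements in a which are > 5 and even:', inline)
print('same as a[gt5 & even]:               ', np.array_equal(inline, a[gt5 & even]))
```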
+<a class="anchor" id="coordinate-array-indexing"></a>
+### Coordinate array indexing
+<a class="anchor" id="array-operations-and-broadcasting"></a>
+## Array operations and broadcasting
+<a class="anchor" id="generating-random-numbers"></a>
+## Generating random numbers
+<a class="anchor" id="appendix-importing-numpy"></a>
+## Appendix: Importing Numpy
 For interactive exploration/experimentation, you might want to import
@@ -175,17 +590,35 @@ figure out what the hell your code is doing. Namespaces are your friend - use
-<a class="anchor" id="numpy-basics"></a>
-## Numpy basics
+<a class="anchor" id="appendix-vectors-in-numpy"></a>
+## Appendix: Vectors in Numpy
-Let's get started.
+One aspect of Numpy which might trip you up, and which can be quite
+frustrating at times, is that Numpy has no understanding of row or column
+vectors.  __An array with only one dimension is neither a row, nor a column
+vector - it is just a 1D array__.  If you have a 1D array, and you want to use
+it as a row vector, you need to reshape it to a shape of `(1, N)`. Similarly,
+to use a 1D array as a column vector, you must reshape it to have shape
+`(N, 1)`.
+In general, when you are mixing 1D arrays with 2- or N-dimensional arrays, you
+need to make sure that your arrays have the correct shape. For example:
-import numpy as np
+r = np.random.randint(1, 10, 3)
+print('r is a row:                                  ', r)
+print('r.T should be a column:                      ', r.T, ' ... huh?')
+print('Ok, make r a 2D array with one row:          ', r.reshape(1, -1))
+print('We could also use the np.atleast_2d function:', np.atleast_2d(r))
+print('Now we can transpose r to get a column:')
-<a class="anchor" id="indexing"></a>
-## Indexing
+> Here we used a handy feature of the `reshape` method - if you pass `-1` for
+> the size of one dimension, it will automatically determine the size to use
+> for that dimension.
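The shapes involved can make this clearer. The following sketch uses a fixed array, and the variable names `row` and `column` are chosen just for illustration:

```python
import numpy as np

r = np.array([1, 2, 3])

row    = r.reshape(1, -1)  # one row, as many columns as needed
column = r.reshape(-1, 1)  # one column, as many rows as needed

# Transposing a 1D array has no effect on its shape
print('r.shape:      ', r.shape)
print('r.T.shape:    ', r.T.shape)

print('row.shape:    ', row.shape)
print('column.shape: ', column.shape)

# Transposing the 2D row gives us the column
print('row.T is the same as column:', np.array_equal(row.T, column))
```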