From 0bee0fbffaeb8ddd1967515ec9f17dd004438ce9 Mon Sep 17 00:00:00 2001 From: Paul McCarthy <pauldmccarthy@gmail.com> Date: Sat, 27 Jan 2018 16:58:51 +0000 Subject: [PATCH] Working on numpy practical --- getting_started/04_numpy.ipynb | 647 +++++++++++++++++++++++++++++++-- getting_started/04_numpy.md | 493 +++++++++++++++++++++++-- 2 files changed, 1080 insertions(+), 60 deletions(-) diff --git a/getting_started/04_numpy.ipynb b/getting_started/04_numpy.ipynb index f0a4507..0f8ccf3 100644 --- a/getting_started/04_numpy.ipynb +++ b/getting_started/04_numpy.ipynb @@ -14,18 +14,31 @@ "Numpy is not actually part of the standard Python library. But it is a\n", "fundamental part of the Python ecosystem - it forms the basis for many\n", "important Python libraries, and it (along with its partners\n", - "[`scipy`](https://www.scipy.org/) and [`matplotlib`](https://matplotlib.org/))\n", - "is what makes Python a viable alternative to Matlab as a scientific computing\n", - "platform.\n", + "[`scipy`](https://www.scipy.org/), [`matplotlib`](https://matplotlib.org/) and\n", + "[`pandas`](https://pandas.pydata.org/)) is what makes Python an attractive\n", + "alternative to Matlab as a scientific computing platform.\n", "\n", "\n", "## Contents\n", "\n", "\n", "* [The Python list versus the Numpy array](#the-python-list-versus-the-numpy-array)\n", - "* [Importing Numpy](#importing-numpy)\n", "* [Numpy basics](#numpy-basics)\n", - "* [Indexing](#indexing)\n", + " * [Creating arrays](#creating-arrays)\n", + " * [Operating on arrays](#operating-on-arrays)\n", + " * [Array properties](#array-properties)\n", + " * [Descriptive statistics](#descriptive-statistics)\n", + " * [Reshaping and rearranging arrays](#reshaping-and-rearranging-arrays)\n", + "* [Array indexing](#array-indexing)\n", + " * [Indexing multi-dimensional arrays](#indexing-multi-dimensional-arrays)\n", + " * [Boolean indexing](#boolean-indexing)\n", + " * [Coordinate array indexing](#coordinate-array-indexing)\n", + "* [Array operations and broadcasting](#array-operations-and-broadcasting)\n", + "* [Generating random numbers](#generating-random-numbers)\n", + "\n", + "* [Appendix: Importing Numpy](#appendix-importing-numpy)\n", + "* [Appendix: Vectors in Numpy](#appendix-vectors-in-numpy)\n", + "\n", "\n", "\n", "<a class=\"anchor\" id=\"the-python-list-versus-the-numpy-array\"></a>\n", @@ -33,9 +46,12 @@ "\n", "\n", "Numpy adds a new data type to the Python language - the `array` (more\n", - "specifically, the `ndarray`). You have already been introduced to the Python\n", - "`list`, which you can easily use to store a handful of numbers (or anything\n", - "else):" + "specifically, the `ndarray`). A Numpy `array` is a N-dimensional array of\n", + "homogeneously-typed numerical data.\n", + "\n", + "\n", + "You have already been introduced to the Python `list`, which you can easily\n", + "use to store a handful of numbers (or anything else):" ] }, { @@ -60,7 +76,9 @@ "metadata": {}, "outputs": [], "source": [ - "xyz_coords = [[-11.4, 1.0, 22.6], [22.7, -32.8, 19.1], [62.8, -18.2, -34.5]]" + "xyz_coords = [[-11.4, 1.0, 22.6],\n", + " [ 22.7, -32.8, 19.1],\n", + " [ 62.8, -18.2, -34.5]]" ] }, { @@ -80,19 +98,19 @@ "\n", "This is a major source of confusion for those poor souls who have spent their\n", "lives working in Matlab, but have finally seen the light and switched to\n", - "Python. It is very important to be able to distinguish between a Python list,\n", - "and a Numpy array.\n", + "Python. It is _crucial_ to be able to distinguish between a Python list and a\n", + "Numpy array.\n", "\n", "\n", - "A list in python is akin to a cell array in Matlab - they can store anything,\n", - "but are extremely inefficient, and unwieldy when you have more than a couple\n", - "of dimensions.\n", + "___Python list == Matlab cell array:___ A list in python is akin to a cell\n", + "array in Matlab - they can store anything, but are extremely inefficient, and\n", + "unwieldy when you have more than a couple of dimensions.\n", "\n", "\n", - "These are in contrast to the Numpy array and Matlab matrix, which are both\n", - "thin wrappers around a contiguous chunk of memory, and which provide\n", - "blazing-fast performance (because behind the scenes in both Numpy and Matlab,\n", - "it's C, C++ and FORTRAN all the way down).\n", + "___Numy array == Matlab matrix:___ These are in contrast to the Numpy array\n", + "and Matlab matrix, which are both thin wrappers around a contiguous chunk of\n", + "memory, and which provide blazing-fast performance (because behind the scenes\n", + "in both Numpy and Matlab, it's C, C++ and FORTRAN all the way down).\n", "\n", "\n", "So you should strongly consider turning those lists into Numpy arrays:" @@ -119,8 +137,7 @@ "source": [ "If you look carefully at the code above, you will notice that we are still\n", "actually using Python lists. We have declared our data sets in exactly the\n", - "same way that we did earlier, by denoting them with square brackets `[` and\n", - "`]`.\n", + "same way as we did earlier, by denoting them with square brackets `[` and `]`.\n", "\n", "\n", "The key difference here is that these lists immediately get converted into\n", @@ -151,12 +168,564 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "I'm emphasising this to help you understand the difference between Python\n", - "lists and Numpy arrays. Apologies if you've already got it.\n", + "Of course, in practice, we would never create a Numpy array in this way - we\n", + "will be loading our data from text or binary files directly into a Numpy\n", + "array, thus completely bypassing the use of Python lists and the costly\n", + "list-to-array conversion. I'm emphasising this to help you understand the\n", + "difference between Python lists and Numpy arrays. Apologies if you've already\n", + "got it, forgiveness please.\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"numpy-basics\"></a>\n", + "## Numpy basics\n", + "\n", + "\n", + "Let's get started." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"creating-arrays\"></a>\n", + "### Creating arrays\n", + "\n", + "\n", + "Numpy has quite a few functions which behave similarly to their equivalents in\n", + "Matlab:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('np.zeros gives us zeros: ', np.zeros(5))\n", + "print('np.ones gives us ones: ', np.ones(5))\n", + "print('np.arange gives us a range: ', np.arange(5))\n", + "print('np.linspace gives us N linearly spaced numbers:', np.linspace(0, 1, 5))\n", + "print('np.random.random gives us random numbers: ', np.random.random(5))\n", + "print('np.random.randint gives us random integers: ', np.random.randint(1, 10, 5))\n", + "print('np.eye gives us an identity matrix:')\n", + "print(np.eye(4))\n", + "print('np.diag gives us a diagonal matrix:')\n", + "print(np.diag([1, 2, 3, 4]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> There will be more on random numbers [below](#generating-random-numbers).\n", + "\n", + "\n", + "The `zeros` and `ones` functions can also be used to generate N-dimensional\n", + "arrays:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = np.zeros((3, 4))\n", + "o = np.ones((2, 10))\n", + "print(z)\n", + "print(o)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Note that, in a 2D Numpy array, the first axis corresponds to rows, and the\n", + "> second to columns - just like in Matlab.\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"operating-on-arrays\"></a>\n", + "### Operating on arrays\n", + "\n", + "\n", + "All of the mathematical operators you know and love can be applied to Numpy\n", + "arrays:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.randint(1, 10, (3, 3))\n", + "print('a:')\n", + "print(a)\n", + "print('a + 2:')\n", + "print( a + 2)\n", + "print('a * 3:')\n", + "print( a * 3)\n", + "print('a % 2:')\n", + "print( a % 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll cover more advanced array operations\n", + "[below](#array-operations-and-broadcasting).\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"array-properties\"></a>\n", + "### Array properties\n", + "\n", + "\n", + "Numpy is a bit different than Matlab in the way that you interact with\n", + "arrays. In Matlab, you would typically pass an array to a built-in function,\n", + "e.g. `size(M)`, `ndims(M)`, etc. In contrast, a Numpy array is a Python\n", + "object which has _attributes_ that contain basic information about the array:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = np.zeros((2, 3, 4))\n", + "print(z)\n", + "print('Shape: ', z.shape)\n", + "print('Number of dimensions: ', z.ndim)\n", + "print('Number of elements: ', z.size)\n", + "print('Data type: ', z.dtype)\n", + "print('Number of bytes: ', z.nbytes)\n", + "print('Length of first dimension: ', len(z))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> As depicted above, passing a Numpy array to the built-in `len` function will\n", + "> only give you the length of the first dimension, so you will typically want\n", + "> to avoid using it - use the `size` attribute instead.\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"descriptive-statistics\"></a>\n", + "### Descriptive statistics\n", + "\n", + "\n", + "Similarly, a Numpy array has a set of methods<sup>1</sup> which allow you to\n", + "calculate basic descriptive statisics on an array:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.random(10)\n", + "print('a: ', a)\n", + "print('min: ', a.min())\n", + "print('max: ', a.max())\n", + "print('index of min: ', a.argmin()) # remember that in Python, list indices\n", + "print('index of max: ', a.argmax()) # start from zero - Numpy is the same!\n", + "print('mean: ', a.mean())\n", + "print('variance: ', a.var())\n", + "print('stddev: ', a.std())\n", + "print('sum: ', a.sum())\n", + "print('prod: ', a.prod())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> <sup>1</sup> Python, being an object-oriented language, distinguishes\n", + "> between _functions_ and _methods_. _Method_ is simply the term used to refer\n", + "> to a function that is associated with a specific object. Similarly, the term\n", + "> _attribute_ is used to refer to some piece of information that is attached\n", + "> to an object, such as `z.shape`, or `z.dtype`.\n", "\n", "\n", - "<a class=\"anchor\" id=\"importing-numpy\"></a>\n", - "## Importing numpy\n", + "<a class=\"anchor\" id=\"reshaping-and-rearranging-arrays\"></a>\n", + "### Reshaping and rearranging arrays\n", + "\n", + "\n", + "A numpy array can be reshaped very easily, using the `reshape` method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.randint(1, 10, (4, 4))\n", + "b = a.reshape((2, 8))\n", + "print('a:')\n", + "print(a)\n", + "print('b:')\n", + "print(b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that this does not modify the underlying data in any way - the `reshape`\n", + "method returns a _view_ of the same array, just indexed differently:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a[3, 3] = 12345\n", + "b[0, 7] = 54321\n", + "print('a:')\n", + "print(a)\n", + "print('b:')\n", + "print(b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you need to create a reshaped copy of an array, use the `np.array`\n", + "function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.randint(1, 10, (4, 4))\n", + "b = np.array(a.reshape(2, 8))\n", + "a[3, 3] = 12345\n", + "b[0, 7] = 54321\n", + "print('a:')\n", + "print(a)\n", + "print('b:')\n", + "print(b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `T` attribute is a shortcut to obtain the transpose of an array." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.randint(1, 10, (4, 4))\n", + "print(a)\n", + "print(a.T)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `transpose` method allows you to obtain more complicated rearrangements\n", + "of an array's axes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.randint(1, 10, (2, 3, 4))\n", + "b = a.transpose((2, 0, 1))\n", + "print('a: ', a.shape)\n", + "print(a)\n", + "print('b:', b.shape)\n", + "print(b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Note again that the `T` attribute and `transpose` method return _views_ of\n", + "> your array.\n", + "\n", + "\n", + "Numpy has some useful functions which allow you to concatenate or stack\n", + "multiple arrays into one. The `concatenate` function does what it says on the\n", + "tin:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.zeros(3)\n", + "b = np.ones(3)\n", + "\n", + "print('1D concatenation:', np.concatenate((a, b)))\n", + "\n", + "a = np.zeros((3, 3))\n", + "b = np.ones((3, 3))\n", + "\n", + "print('2D column-wise concatenation:')\n", + "print(np.concatenate((a, b), axis=1))\n", + "\n", + "print('2D row-wise concatenation:')\n", + "\n", + "# The axis parameter defaults to 0,\n", + "# so it is not strictly necessary here.\n", + "print(np.concatenate((a, b), axis=0))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `hstack`, `vstack` and `dstack` functions allow you to concatenate vectors\n", + "or arrays along the first, second, or third dimension respectively:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.zeros(3)\n", + "b = np.ones(3)\n", + "\n", + "print('a: ', a)\n", + "print('b: ', b)\n", + "\n", + "hstacked = np.hstack((a, b))\n", + "vstacked = np.vstack((a, b))\n", + "dstacked = np.dstack((a, b))\n", + "\n", + "print('hstacked: (shape {}):'.format(hstacked.shape))\n", + "print( hstacked)\n", + "print('vstacked: (shape {}):'.format(vstacked.shape))\n", + "print( vstacked)\n", + "print('dstacked: (shape {}):'.format(dstacked.shape))\n", + "print( dstacked)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"array-indexing\"></a>\n", + "## Array indexing\n", + "\n", + "\n", + "Just like in Matlab, slicing up your arrays is a breeze in Numpy. If you are\n", + "after some light reading, you might want to check out the [comprehensive Numpy\n", + "Indexing\n", + "reference](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html).\n", + "\n", + "\n", + "> As with indexing regular Python lists, array indices start from 0, and end\n", + "> indices (if specified) are exclusive.\n", + "\n", + "\n", + "Let's whet our appetites with some basic 1D array slicing:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.randint(1, 10, 10)\n", + "\n", + "print('a: ', a)\n", + "print('first element: ', a[0])\n", + "print('first two elements: ', a[:2])\n", + "print('last element: ', a[a.shape[0] - 1])\n", + "print('last element again: ', a[-1])\n", + "print('last two elements: ', a[-2:])\n", + "print('middle four elements: ', a[3:7])\n", + "print('Every second element: ', a[::2])\n", + "print('Every second element, reversed: ', a[1::-2])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that slicing an array in this way returns a _view_, not a copy, into that\n", + "array:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.randint(1, 10, 10)\n", + "print('a:', a)\n", + "every2nd = a[::2]\n", + "print('every 2nd:', every2nd)\n", + "every2nd += 10\n", + "print('a':, a)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"indexing-multi-dimensional-arrays\"></a>\n", + "### Indexing multi-dimensional arrays\n", + "\n", + "\n", + "Multi-dimensional array indexing works in much the same way as one-dimensional\n", + "indexing but with, well, more dimensions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.randint(1, 10, (5, 5))\n", + "print('a:')\n", + "print(a)\n", + "print(' First row: ', a[ 0, :])\n", + "print(' Last row: ', a[ -1, :])\n", + "print(' second column: ', a[ :, 1])\n", + "print(' Centre:')\n", + "print(a[1:4, 1:4])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For arrays with more than two dimensions, the ellipsis (`...`) is a handy\n", + "feature - it allows you to specify a slice comprising all elements along\n", + "more than one dimension:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.randint(1, 10, (3, 3, 3))\n", + "print('a:')\n", + "print(a)\n", + "print('All elements at x=0:')\n", + "print(a[0, ...])\n", + "print('All elements at z=2:')\n", + "print(a[..., 2])\n", + "print('All elements at x=0, z=2:')\n", + "print(a[0, ..., 2])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"boolean-indexing\"></a>\n", + "### Boolean indexing\n", + "\n", + "\n", + "A numpy array can be indexed with a boolean array of the same shape. For\n", + "example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.randint(1, 10, 10)\n", + "\n", + "print('a: ', a)\n", + "print('a > 5: ', a > 4)\n", + "print('elements in a that are > 5: ', a[a > 5])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Logical operators `~` (not), `&` (and) and `|` (or) can be used to manipulate\n", + "and combine boolean Numpy arrays:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.randint(1, 10, 10)\n", + "gt5 = a > 5\n", + "even = a % 2 == 0\n", + "\n", + "print('a: ', a)\n", + "print('elements in a which are > 5: ', a[gt5])\n", + "print('elements in a which are <= 5: ', a[~gt5])\n", + "print('elements in a which are even: ', a[even])\n", + "print('elements in a which are odd: ', a[~even])\n", + "print('elements in a which are > 5 and even: ', a[gt5 & even])\n", + "print('elements in a which are > 5 or odd: ', a[gt5 | ~even])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"coordinate-array-indexing\"></a>\n", + "### Coordinate array indexing" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a class=\"anchor\" id=\"array-operations-and-broadcasting\"></a>\n", + "## Array operations and broadcasting\n", + "\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"generating-random-numbers\"></a>\n", + "## Generating random numbers\n", + "\n", + "\n", + "<a class=\"anchor\" id=\"appendix-importing-numpy\"></a>\n", + "## Appendix: Importing Numpy\n", "\n", "\n", "For interactive exploration/experimentation, you might want to import\n", @@ -245,11 +814,21 @@ "them!\n", "\n", "\n", - "<a class=\"anchor\" id=\"numpy-basics\"></a>\n", - "## Numpy basics\n", + "<a class=\"anchor\" id=\"appendix-vectors-in-numpy\"></a>\n", + "## Appendix: Vectors in Numpy\n", "\n", "\n", - "Let's get started." + "One aspect of Numpy which might trip you up, and which can be quite\n", + "frustrating at times, is that Numpy has no understanding of row or column\n", + "vectors. __An array with only one dimension is neither a row, nor a column\n", + "vector - it is just a 1D array__. If you have a 1D array, and you want to use\n", + "it as a row vector, you need to reshape it to a shape of `(1, N)`. Similarly,\n", + "to use a 1D array as a column vector, you must reshape it to have shape\n", + "`(N, 1)`.\n", + "\n", + "\n", + "In general, when you are mixing 1D arrays with 2- or N-dimensional arrays, you\n", + "need to make sure that your arrays have the correct shape. For example:" ] }, { @@ -258,15 +837,23 @@ "metadata": {}, "outputs": [], "source": [ - "import numpy as np" + "r = np.random.randint(1, 10, 3)\n", + "\n", + "print('r is a row: ', r)\n", + "print('r.T should be a column: ', r.T, ' ... huh?')\n", + "print('Ok, make n a 2D array with one row: ', r.reshape(1, -1))\n", + "print('We could also use the np.atleast_2d function:', np.atleast_2d(r))\n", + "print('Now we can transpose r to get a column:')\n", + "print(np.atleast_2d(r).T)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "<a class=\"anchor\" id=\"indexing\"></a>\n", - "## Indexing" + "> Here we used a handy feature of the `reshape` method - if you pass `-1` for\n", + "> the size of one dimension, it will automatically determine the size to use\n", + "> for that dimension." ] } ], diff --git a/getting_started/04_numpy.md b/getting_started/04_numpy.md index d395b6a..16bc676 100644 --- a/getting_started/04_numpy.md +++ b/getting_started/04_numpy.md @@ -8,18 +8,31 @@ numerical computing library. Numpy is not actually part of the standard Python library. But it is a fundamental part of the Python ecosystem - it forms the basis for many important Python libraries, and it (along with its partners -[`scipy`](https://www.scipy.org/) and [`matplotlib`](https://matplotlib.org/)) -is what makes Python a viable alternative to Matlab as a scientific computing -platform. +[`scipy`](https://www.scipy.org/), [`matplotlib`](https://matplotlib.org/) and +[`pandas`](https://pandas.pydata.org/)) is what makes Python an attractive +alternative to Matlab as a scientific computing platform. ## Contents * [The Python list versus the Numpy array](#the-python-list-versus-the-numpy-array) -* [Importing Numpy](#importing-numpy) * [Numpy basics](#numpy-basics) -* [Indexing](#indexing) + * [Creating arrays](#creating-arrays) + * [Operating on arrays](#operating-on-arrays) + * [Array properties](#array-properties) + * [Descriptive statistics](#descriptive-statistics) + * [Reshaping and rearranging arrays](#reshaping-and-rearranging-arrays) +* [Array indexing](#array-indexing) + * [Indexing multi-dimensional arrays](#indexing-multi-dimensional-arrays) + * [Boolean indexing](#boolean-indexing) + * [Coordinate array indexing](#coordinate-array-indexing) +* [Array operations and broadcasting](#array-operations-and-broadcasting) +* [Generating random numbers](#generating-random-numbers) + +* [Appendix: Importing Numpy](#appendix-importing-numpy) +* [Appendix: Vectors in Numpy](#appendix-vectors-in-numpy) + <a class="anchor" id="the-python-list-versus-the-numpy-array"></a> @@ -27,9 +40,12 @@ platform. Numpy adds a new data type to the Python language - the `array` (more -specifically, the `ndarray`). You have already been introduced to the Python -`list`, which you can easily use to store a handful of numbers (or anything -else): +specifically, the `ndarray`). A Numpy `array` is a N-dimensional array of +homogeneously-typed numerical data. + + +You have already been introduced to the Python `list`, which you can easily +use to store a handful of numbers (or anything else): ``` @@ -41,7 +57,9 @@ You could also emulate a 2D or ND matrix by using lists of lists, for example: ``` -xyz_coords = [[-11.4, 1.0, 22.6], [22.7, -32.8, 19.1], [62.8, -18.2, -34.5]] +xyz_coords = [[-11.4, 1.0, 22.6], + [ 22.7, -32.8, 19.1], + [ 62.8, -18.2, -34.5]] ``` @@ -58,19 +76,19 @@ computing! This is a major source of confusion for those poor souls who have spent their lives working in Matlab, but have finally seen the light and switched to -Python. It is very important to be able to distinguish between a Python list, -and a Numpy array. +Python. It is _crucial_ to be able to distinguish between a Python list and a +Numpy array. -A list in python is akin to a cell array in Matlab - they can store anything, -but are extremely inefficient, and unwieldy when you have more than a couple -of dimensions. +___Python list == Matlab cell array:___ A list in python is akin to a cell +array in Matlab - they can store anything, but are extremely inefficient, and +unwieldy when you have more than a couple of dimensions. -These are in contrast to the Numpy array and Matlab matrix, which are both -thin wrappers around a contiguous chunk of memory, and which provide -blazing-fast performance (because behind the scenes in both Numpy and Matlab, -it's C, C++ and FORTRAN all the way down). +___Numy array == Matlab matrix:___ These are in contrast to the Numpy array +and Matlab matrix, which are both thin wrappers around a contiguous chunk of +memory, and which provide blazing-fast performance (because behind the scenes +in both Numpy and Matlab, it's C, C++ and FORTRAN all the way down). So you should strongly consider turning those lists into Numpy arrays: @@ -89,8 +107,7 @@ xyz_coords = np.array([[-11.4, 1.0, 22.6], If you look carefully at the code above, you will notice that we are still actually using Python lists. We have declared our data sets in exactly the -same way that we did earlier, by denoting them with square brackets `[` and -`]`. +same way as we did earlier, by denoting them with square brackets `[` and `]`. The key difference here is that these lists immediately get converted into @@ -113,12 +130,410 @@ xyz_coords = np.array(xyz_coords) ``` -I'm emphasising this to help you understand the difference between Python -lists and Numpy arrays. Apologies if you've already got it. +Of course, in practice, we would never create a Numpy array in this way - we +will be loading our data from text or binary files directly into a Numpy +array, thus completely bypassing the use of Python lists and the costly +list-to-array conversion. I'm emphasising this to help you understand the +difference between Python lists and Numpy arrays. Apologies if you've already +got it, forgiveness please. + + +<a class="anchor" id="numpy-basics"></a> +## Numpy basics + + +Let's get started. + + +``` +import numpy as np +``` + + +<a class="anchor" id="creating-arrays"></a> +### Creating arrays + + +Numpy has quite a few functions which behave similarly to their equivalents in +Matlab: + + +``` +print('np.zeros gives us zeros: ', np.zeros(5)) +print('np.ones gives us ones: ', np.ones(5)) +print('np.arange gives us a range: ', np.arange(5)) +print('np.linspace gives us N linearly spaced numbers:', np.linspace(0, 1, 5)) +print('np.random.random gives us random numbers: ', np.random.random(5)) +print('np.random.randint gives us random integers: ', np.random.randint(1, 10, 5)) +print('np.eye gives us an identity matrix:') +print(np.eye(4)) +print('np.diag gives us a diagonal matrix:') +print(np.diag([1, 2, 3, 4])) +``` + + +> There will be more on random numbers [below](#generating-random-numbers). + + +The `zeros` and `ones` functions can also be used to generate N-dimensional +arrays: + + +``` +z = np.zeros((3, 4)) +o = np.ones((2, 10)) +print(z) +print(o) +``` + + +> Note that, in a 2D Numpy array, the first axis corresponds to rows, and the +> second to columns - just like in Matlab. + + +<a class="anchor" id="operating-on-arrays"></a> +### Operating on arrays + + +All of the mathematical operators you know and love can be applied to Numpy +arrays: + + +``` +a = np.random.randint(1, 10, (3, 3)) +print('a:') +print(a) +print('a + 2:') +print( a + 2) +print('a * 3:') +print( a * 3) +print('a % 2:') +print( a % 2) +``` + + +We'll cover more advanced array operations +[below](#array-operations-and-broadcasting). + + +<a class="anchor" id="array-properties"></a> +### Array properties + + +Numpy is a bit different than Matlab in the way that you interact with +arrays. In Matlab, you would typically pass an array to a built-in function, +e.g. `size(M)`, `ndims(M)`, etc. In contrast, a Numpy array is a Python +object which has _attributes_ that contain basic information about the array: + + +``` +z = np.zeros((2, 3, 4)) +print(z) +print('Shape: ', z.shape) +print('Number of dimensions: ', z.ndim) +print('Number of elements: ', z.size) +print('Data type: ', z.dtype) +print('Number of bytes: ', z.nbytes) +print('Length of first dimension: ', len(z)) +``` + + +> As depicted above, passing a Numpy array to the built-in `len` function will +> only give you the length of the first dimension, so you will typically want +> to avoid using it - use the `size` attribute instead. + + +<a class="anchor" id="descriptive-statistics"></a> +### Descriptive statistics + + +Similarly, a Numpy array has a set of methods<sup>1</sup> which allow you to +calculate basic descriptive statisics on an array: + + +``` +a = np.random.random(10) +print('a: ', a) +print('min: ', a.min()) +print('max: ', a.max()) +print('index of min: ', a.argmin()) # remember that in Python, list indices +print('index of max: ', a.argmax()) # start from zero - Numpy is the same! +print('mean: ', a.mean()) +print('variance: ', a.var()) +print('stddev: ', a.std()) +print('sum: ', a.sum()) +print('prod: ', a.prod()) +``` + + +> <sup>1</sup> Python, being an object-oriented language, distinguishes +> between _functions_ and _methods_. _Method_ is simply the term used to refer +> to a function that is associated with a specific object. Similarly, the term +> _attribute_ is used to refer to some piece of information that is attached +> to an object, such as `z.shape`, or `z.dtype`. + + +<a class="anchor" id="reshaping-and-rearranging-arrays"></a> +### Reshaping and rearranging arrays + + +A numpy array can be reshaped very easily, using the `reshape` method. + +``` +a = np.random.randint(1, 10, (4, 4)) +b = a.reshape((2, 8)) +print('a:') +print(a) +print('b:') +print(b) +``` + + +Note that this does not modify the underlying data in any way - the `reshape` +method returns a _view_ of the same array, just indexed differently: + + +``` +a[3, 3] = 12345 +b[0, 7] = 54321 +print('a:') +print(a) +print('b:') +print(b) +``` + + +If you need to create a reshaped copy of an array, use the `np.array` +function: + + +``` +a = np.random.randint(1, 10, (4, 4)) +b = np.array(a.reshape(2, 8)) +a[3, 3] = 12345 +b[0, 7] = 54321 +print('a:') +print(a) +print('b:') +print(b) +``` + + +The `T` attribute is a shortcut to obtain the transpose of an array. + + +``` +a = np.random.randint(1, 10, (4, 4)) +print(a) +print(a.T) +``` + + +The `transpose` method allows you to obtain more complicated rearrangements +of an array's axes: + + +``` +a = np.random.randint(1, 10, (2, 3, 4)) +b = a.transpose((2, 0, 1)) +print('a: ', a.shape) +print(a) +print('b:', b.shape) +print(b) +``` + + +> Note again that the `T` attribute and `transpose` method return _views_ of +> your array. + + +Numpy has some useful functions which allow you to concatenate or stack +multiple arrays into one. The `concatenate` function does what it says on the +tin: + + +``` +a = np.zeros(3) +b = np.ones(3) + +print('1D concatenation:', np.concatenate((a, b))) + +a = np.zeros((3, 3)) +b = np.ones((3, 3)) + +print('2D column-wise concatenation:') +print(np.concatenate((a, b), axis=1)) + +print('2D row-wise concatenation:') + +# The axis parameter defaults to 0, +# so it is not strictly necessary here. +print(np.concatenate((a, b), axis=0)) +``` + + +The `hstack`, `vstack` and `dstack` functions allow you to concatenate vectors +or arrays along the first, second, or third dimension respectively: + +``` +a = np.zeros(3) +b = np.ones(3) + +print('a: ', a) +print('b: ', b) + +hstacked = np.hstack((a, b)) +vstacked = np.vstack((a, b)) +dstacked = np.dstack((a, b)) + +print('hstacked: (shape {}):'.format(hstacked.shape)) +print( hstacked) +print('vstacked: (shape {}):'.format(vstacked.shape)) +print( vstacked) +print('dstacked: (shape {}):'.format(dstacked.shape)) +print( dstacked) +``` + + +<a class="anchor" id="array-indexing"></a> +## Array indexing + + +Just like in Matlab, slicing up your arrays is a breeze in Numpy. If you are +after some light reading, you might want to check out the [comprehensive Numpy +Indexing +reference](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html). + + +> As with indexing regular Python lists, array indices start from 0, and end +> indices (if specified) are exclusive. + + +Let's whet our appetites with some basic 1D array slicing: + + +``` +a = np.random.randint(1, 10, 10) + +print('a: ', a) +print('first element: ', a[0]) +print('first two elements: ', a[:2]) +print('last element: ', a[a.shape[0] - 1]) +print('last element again: ', a[-1]) +print('last two elements: ', a[-2:]) +print('middle four elements: ', a[3:7]) +print('Every second element: ', a[::2]) +print('Every second element, reversed: ', a[1::-2]) +``` + + +Note that slicing an array in this way returns a _view_, not a copy, into that +array: + + +``` +a = np.random.randint(1, 10, 10) +print('a:', a) +every2nd = a[::2] +print('every 2nd:', every2nd) +every2nd += 10 +print('a':, a) +``` + +<a class="anchor" id="indexing-multi-dimensional-arrays"></a> +### Indexing multi-dimensional arrays -<a class="anchor" id="importing-numpy"></a> -## Importing numpy + +Multi-dimensional array indexing works in much the same way as one-dimensional +indexing but with, well, more dimensions: + + +``` +a = np.random.randint(1, 10, (5, 5)) +print('a:') +print(a) +print(' First row: ', a[ 0, :]) +print(' Last row: ', a[ -1, :]) +print(' second column: ', a[ :, 1]) +print(' Centre:') +print(a[1:4, 1:4]) +``` + + +For arrays with more than two dimensions, the ellipsis (`...`) is a handy +feature - it allows you to specify a slice comprising all elements along +more than one dimension: + + +``` +a = np.random.randint(1, 10, (3, 3, 3)) +print('a:') +print(a) +print('All elements at x=0:') +print(a[0, ...]) +print('All elements at z=2:') +print(a[..., 2]) +print('All elements at x=0, z=2:') +print(a[0, ..., 2]) +``` + + +<a class="anchor" id="boolean-indexing"></a> +### Boolean indexing + + +A numpy array can be indexed with a boolean array of the same shape. For +example: + + +``` +a = np.random.randint(1, 10, 10) + +print('a: ', a) +print('a > 5: ', a > 4) +print('elements in a that are > 5: ', a[a > 5]) +``` + + +Logical operators `~` (not), `&` (and) and `|` (or) can be used to manipulate +and combine boolean Numpy arrays: + +``` +a = np.random.randint(1, 10, 10) +gt5 = a > 5 +even = a % 2 == 0 + +print('a: ', a) +print('elements in a which are > 5: ', a[gt5]) +print('elements in a which are <= 5: ', a[~gt5]) +print('elements in a which are even: ', a[even]) +print('elements in a which are odd: ', a[~even]) +print('elements in a which are > 5 and even: ', a[gt5 & even]) +print('elements in a which are > 5 or odd: ', a[gt5 | ~even]) +``` + + +<a class="anchor" id="coordinate-array-indexing"></a> +### Coordinate array indexing + + +``` + +``` + + +<a class="anchor" id="array-operations-and-broadcasting"></a> +## Array operations and broadcasting + + + +<a class="anchor" id="generating-random-numbers"></a> +## Generating random numbers + + +<a class="anchor" id="appendix-importing-numpy"></a> +## Appendix: Importing Numpy For interactive exploration/experimentation, you might want to import @@ -175,17 +590,35 @@ figure out what the hell your code is doing. Namespaces are your friend - use them! -<a class="anchor" id="numpy-basics"></a> -## Numpy basics +<a class="anchor" id="appendix-vectors-in-numpy"></a> +## Appendix: Vectors in Numpy -Let's get started. +One aspect of Numpy which might trip you up, and which can be quite +frustrating at times, is that Numpy has no understanding of row or column +vectors. __An array with only one dimension is neither a row, nor a column +vector - it is just a 1D array__. If you have a 1D array, and you want to use +it as a row vector, you need to reshape it to a shape of `(1, N)`. Similarly, +to use a 1D array as a column vector, you must reshape it to have shape +`(N, 1)`. + + +In general, when you are mixing 1D arrays with 2- or N-dimensional arrays, you +need to make sure that your arrays have the correct shape. For example: ``` -import numpy as np +r = np.random.randint(1, 10, 3) + +print('r is a row: ', r) +print('r.T should be a column: ', r.T, ' ... huh?') +print('Ok, make n a 2D array with one row: ', r.reshape(1, -1)) +print('We could also use the np.atleast_2d function:', np.atleast_2d(r)) +print('Now we can transpose r to get a column:') +print(np.atleast_2d(r).T) ``` -<a class="anchor" id="indexing"></a> -## Indexing +> Here we used a handy feature of the `reshape` method - if you pass `-1` for +> the size of one dimension, it will automatically determine the size to use +> for that dimension. -- GitLab