04_numpy.ipynb

  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In contrast to the simple indexing we have already seen, boolean indexing will\n",
    "return a _copy_ of the indexed data, __not__ a view. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.arange(10)\n",
    "b = a[a > 5]\n",
    "print('a: ', a)\n",
    "print('b: ', b)\n",
    "print('Setting b[0] to 999')\n",
    "b[0] = 999\n",
    "print('a: ', a)\n",
    "print('b: ', b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> In general, any 'simple' indexing operation on a Numpy array, where the\n",
    "> indexing object comprises integers, slices (using the standard Python\n",
    "> `start:stop:step` notation), colons (`:`) and/or ellipses (`...`), will\n",
    "> result in a __view__ into the indexed array. Any 'advanced' indexing\n",
    "> operation, where the indexing object contains anything else (e.g. boolean or\n",
    "> integer arrays, or even python lists), will result in a __copy__ of the\n",
    "> data.\n",
    "\n",
    "\n",
    "Logical operators `~` (not), `&` (and) and `|` (or) can be used to manipulate\n",
    "and combine boolean Numpy arrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a    = np.arange(10)\n",
    "gt5  = a > 5\n",
    "even = a % 2 == 0\n",
    "\n",
    "print('a:                                    ', a)\n",
    "print('elements in a which are > 5:          ', a[ gt5])\n",
    "print('elements in a which are <= 5:         ', a[~gt5])\n",
    "print('elements in a which are even:         ', a[ even])\n",
    "print('elements in a which are odd:          ', a[~even])\n",
    "print('elements in a which are > 5 and even: ', a[gt5 &  even])\n",
    "print('elements in a which are > 5 or odd:   ', a[gt5 | ~even])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Numpy also has two handy functions, `all` and `any`, which respectively allow\n",
    "you to perform boolean `and` and `or` operations along the axes of an array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.arange(9).reshape((3, 3))\n",
    "\n",
    "print('a:')\n",
    "print(a)\n",
    "print('rows with any element divisible by 3: ', np.any(a % 3 == 0, axis=1))\n",
    "print('cols with any element divisible by 3: ', np.any(a % 3 == 0, axis=0))\n",
    "print('rows with all elements divisible by 3:', np.all(a % 3 == 0, axis=1))\n",
    "print('cols with all elements divisible by 3:', np.all(a % 3 == 0, axis=0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a class=\"anchor\" id=\"coordinate-array-indexing\"></a>\n",
    "### Coordinate array indexing\n",
    "\n",
    "\n",
    "You can index a numpy array using another array containing coordinates into\n",
    "the first array.  As with boolean indexing, this will result in a copy of the\n",
    "data.  Generally, you will need to have a separate array, or list, of\n",
    "coordinates for each axis of the array you wish to index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.arange(16).reshape((4, 4))\n",
    "print('a:')\n",
    "print(a)\n",
    "\n",
    "rows    = [0, 2, 3]\n",
    "cols    = [1, 0, 2]\n",
    "indexed = a[rows, cols]\n",
    "\n",
    "for r, c, v in zip(rows, cols, indexed):\n",
    "    print('a[{}, {}] = {}'.format(r, c, v))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `numpy.where` function can be combined with boolean arrays to easily\n",
    "generate of coordinate arrays for values which meet some condition:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.arange(16).reshape((4, 4))\n",
    "print('a:')\n",
    "print(a)\n",
    "\n",
    "evenrows, evencols = np.where(a % 2 == 0)\n",
    "\n",
    "print('even row coordinates:', evenx)\n",
    "print('even col coordinates:', eveny)\n",
    "\n",
    "print(a[evenrows, evencols])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a class=\"anchor\" id=\"exercises\"></a>\n",
    "## Exercises\n",
    "\n",
    "\n",
    "The challenge for each of these exercises is to complete them in as few lines\n",
    "of code as possible!\n",
    "\n",
    "\n",
    "> You can find example answers to the exercises in `04_numpy/.solutions`.\n",
    "\n",
    "\n",
    "<a class=\"anchor\" id=\"load-an-array-from-a-file-and-do-stuff-with-it\"></a>\n",
    "### Load an array from a file and do stuff with it\n",
    "\n",
    "\n",
    "Load the file `04_numpy/2d_array.txt`, and calculate and print the mean for\n",
    "each column.  If your code doesn't work, you might want to __LOOK AT YOUR\n",
    "DATA__, as you will have learnt if you have ever attended the FSL course.\n",
    "\n",
    "\n",
    "> Bonus: Find the hidden message (hint:\n",
    "> [chr](https://docs.python.org/3/library/functions.html#chr))\n",
    "\n",
    "\n",
    "<a class=\"anchor\" id=\"concatenate-affine-transforms\"></a>\n",
    "### Concatenate affine transforms\n",
    "\n",
    "\n",
    "Given all of the files in `04_numpy/xfms/`, create a transformation matrix\n",
    "which can transform coordinates from subject 1 functional space to subject 2\n",
    "functional space<sup>4</sup>.\n",
    "\n",
    "Write some code to transform the following coordinates from subject 1\n",
    "functional space to subject 2 functional space, to test that your matrix is\n",
    "correct:\n",
    "\n",
    "\n",
    "| __Subject 1 coordinates__ | __Subject 2 coordinates__ |\n",
    "|:-------------------------:|:-------------------------:|\n",
    "| `[ 0.0,   0.0,   0.0]`    | `[ 3.21,   4.15, -9.89]`  |\n",
    "| `[-5.0, -20.0,  10.0]`    | `[-0.87, -14.36, -1.13]`  |\n",
    "| `[20.0,  25.0,  60.0]`    | `[29.58,  27.61, 53.37]`  |\n",
    "\n",
    "\n",
    "> <sup>4</sup> Even though these are FLIRT transforms, this is just a toy\n",
    "> example.  Look\n",
    "> [here](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLIRT/FAQ#What_is_the_format_of_the_matrix_used_by_FLIRT.2C_and_how_does_it_relate_to_the_transformation_parameters.3F)\n",
    "> and\n",
    "> [here](https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/1.6.2/fsl/utils/transform.py#L537)\n",
    "> if you actually need to work with FLIRT transforms.\n",
    "\n",
    "\n",
    "\n",
    "<a class=\"anchor\" id=\"appendix-generating-random-numbers\"></a>\n",
    "## Appendix A: Generating random numbers\n",
    "\n",
    "\n",
    "Numpy's\n",
    "[`numpy.random`](https://docs.scipy.org/doc/numpy/reference/routines.random.html)\n",
    "module is where you should go if you want to introduce a little randomness\n",
    "into your code.  You have already seen a couple of functions for generating\n",
    "uniformly distributed real or integer data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy.random as npr\n",
    "\n",
    "print('Random floats between 0.0 (inclusive) and 1.0 (exclusive):')\n",
    "print(npr.random((3, 3)))\n",
    "\n",
    "print('Random integers in a specified range:')\n",
    "print(npr.randint(1, 100, (3, 3)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also draw random data from other distributions - here are just a few\n",
    "examples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Gaussian (mean: 0, stddev: 1):')\n",
    "print(npr.normal(0, 1, (3, 3)))\n",
    "\n",
    "print('Gamma (shape: 1, scale: 1):')\n",
    "print(npr.normal(1, 1, (3, 3)))\n",
    "\n",
    "print('Chi-square (dof: 10):')\n",
    "print(npr.chisquare(10, (3, 3)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `numpy.random` module also has a couple of other handy functions for\n",
    "random sampling of existing data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = np.arange(5)\n",
    "\n",
    "print('data:               ', data)\n",
    "print('two random values:  ', npr.choice(data, 2))\n",
    "print('random permutation: ', npr.permutation(data))\n",
    "\n",
    "# The numpy.random.shuffle function\n",
    "# will shuffle an array *in-place*.\n",
    "npr.shuffle(data)\n",
    "print('randomly shuffled: ', data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a class=\"anchor\" id=\"appendix-importing-numpy\"></a>\n",
    "## Appendix B: Importing Numpy\n",
    "\n",
    "\n",
    "For interactive exploration/experimentation, you might want to import\n",
    "Numpy like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numpy import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This makes your Python session very similar to Matlab - you can call all\n",
    "of the Numpy functions directly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "e = array([1, 2, 3, 4, 5])\n",
    "z = zeros((100, 100))\n",
    "d = diag([2, 3, 4, 5])\n",
    "\n",
    "print(e)\n",
    "print(z)\n",
    "print(d)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But if you are writing a script or application using Numpy, I implore you to\n",
    "Numpy (and its commonly used sub-modules) like this instead:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy        as np\n",
    "import numpy.random as npr\n",
    "import numpy.linalg as npla"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The downside to this is that you will have to prefix all Numpy functions with\n",
    "`np.`, like so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "e = np.array([1, 2, 3, 4, 5])\n",
    "z = np.zeros((100, 100))\n",
    "d = np.diag([2, 3, 4, 5])\n",
    "r = npr.random(5)\n",
    "\n",
    "print(e)\n",
    "print(z)\n",
    "print(d)\n",
    "print(r)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is a big upside, however, in that other people who have to read/use your\n",
    "code will like you a lot more. This is because it will be easier for them to\n",
    "figure out what the hell your code is doing. Namespaces are your friend - use\n",
    "them!\n",
    "\n",
    "\n",
    "<a class=\"anchor\" id=\"appendix-vectors-in-numpy\"></a>\n",
    "## Appendix C: Vectors in Numpy\n",
    "\n",
    "\n",
    "One aspect of Numpy which might trip you up, and which can be quite\n",
    "frustrating at times, is that Numpy has no understanding of row or column\n",
    "vectors.  __An array with only one dimension is neither a row, nor a column\n",
    "vector - it is just a 1D array__.  If you have a 1D array, and you want to use\n",
    "it as a row vector, you need to reshape it to a shape of `(1, N)`. Similarly,\n",
    "to use a 1D array as a column vector, you must reshape it to have shape\n",
    "`(N, 1)`.\n",
    "\n",
    "\n",
    "In general, when you are mixing 1D arrays with 2- or N-dimensional arrays, you\n",
    "need to make sure that your arrays have the correct shape. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r = np.random.randint(1, 10, 3)\n",
    "\n",
    "print('r is a row:                                  ', r)\n",
    "print('r.T should be a column:                      ', r.T, ' ... huh?')\n",
    "print('Ok, make n a 2D array with one row:          ', r.reshape(1, -1))\n",
    "print('We could also use the np.atleast_2d function:', np.atleast_2d(r))\n",
    "print('Now we can transpose r to get a column:')\n",
    "print(np.atleast_2d(r).T)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a class=\"anchor\" id=\"useful-references\"></a>\n",
    "## Useful references\n",
    "\n",
    "\n",
    "* [The Numpy manual](https://docs.scipy.org/doc/numpy/)\n",
    "* [Linear algebra in `numpy.linalg`](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html)\n",
    "* [Broadcasting in Numpy](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)\n",
    "* [Indexing in Numpy](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html)\n",
    "* [Random sampling in `numpy.random`](https://docs.scipy.org/doc/numpy/reference/routines.random.html)\n",
    "* [Python slicing](https://www.pythoncentral.io/how-to-slice-listsarrays-and-tuples-in-python/)"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}