From 0727cba6df5269bac44623822d726bab74e27f03 Mon Sep 17 00:00:00 2001
From: Paul McCarthy <pauldmccarthy@gmail.com>
Date: Fri, 6 Mar 2020 19:36:39 +0000
Subject: [PATCH] WIP: threading/multiproc

---
 advanced_topics/07_threading.ipynb | 236 +++++++++++++++++++++++------
 advanced_topics/07_threading.md    | 220 +++++++++++++++++++++------
 2 files changed, 356 insertions(+), 100 deletions(-)

diff --git a/advanced_topics/07_threading.ipynb b/advanced_topics/07_threading.ipynb
index 83167ef..621921e 100644
--- a/advanced_topics/07_threading.ipynb
+++ b/advanced_topics/07_threading.ipynb
@@ -8,20 +8,31 @@
     "\n",
     "\n",
     "The Python language has built-in support for multi-threading in the\n",
-    "[`threading`](https://docs.python.org/3.5/library/threading.html) module, and\n",
+    "[`threading`](https://docs.python.org/3/library/threading.html) module, and\n",
     "true parallelism in the\n",
-    "[`multiprocessing`](https://docs.python.org/3.5/library/multiprocessing.html)\n",
-    "module.  If you want to be impressed, skip straight to the section on\n",
+    "[`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html)\n",
+    "and\n",
+    "[`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html)\n",
+    "modules.  If you want to be impressed, skip straight to the section on\n",
     "[`multiprocessing`](todo).\n",
     "\n",
     "\n",
-    "\n",
+    "> *Note*: If you are familiar with a \"real\" programming language such as C++\n",
+    "> or Java, you will be disappointed with the native support for parallelism in\n",
+    "> Python. Python threads do not run in parallel because of the Global\n",
+    "> Interpreter Lock, and if you use `multiprocessing`, be prepared to either\n",
+    "> bear the performance hit of copying data between processes, or jump through\n",
+    "> hoops order to share data between processes.\n",
+    ">\n",
+    "> This limitation *might* be solved in a future Python release by way of\n",
+    "> [*sub-interpreters*](https://www.python.org/dev/peps/pep-0554/), but the\n",
+    "> author of this practical is not holding his breath.\n",
     "\n",
     "\n",
     "## Threading\n",
     "\n",
     "\n",
-    "The [`threading`](https://docs.python.org/3.5/library/threading.html) module\n",
+    "The [`threading`](https://docs.python.org/3/library/threading.html) module\n",
     "provides a traditional multi-threading API that should be familiar to you if\n",
     "you have worked with threads in other languages.\n",
     "\n",
@@ -145,7 +156,7 @@
    "metadata": {},
    "source": [
     "See the [`Thread`\n",
-    "documentation](https://docs.python.org/3.5/library/threading.html#thread-objects)\n",
+    "documentation](https://docs.python.org/3/library/threading.html#thread-objects)\n",
     "for more details.\n",
     "\n",
     "\n",
@@ -155,16 +166,16 @@
     "The `threading` module provides some useful thread-synchronisation primitives\n",
     "- the `Lock`, `RLock` (re-entrant `Lock`), and `Event` classes.  The\n",
     "`threading` module also provides `Condition` and `Semaphore` classes - refer\n",
-    "to the [documentation](https://docs.python.org/3.5/library/threading.html) for\n",
+    "to the [documentation](https://docs.python.org/3/library/threading.html) for\n",
     "more details.\n",
     "\n",
     "\n",
     "#### `Lock`\n",
     "\n",
     "\n",
-    "The [`Lock`](https://docs.python.org/3.5/library/threading.html#lock-objects)\n",
+    "The [`Lock`](https://docs.python.org/3/library/threading.html#lock-objects)\n",
     "class (and its re-entrant version, the\n",
-    "[`RLock`](https://docs.python.org/3.5/library/threading.html#rlock-objects))\n",
+    "[`RLock`](https://docs.python.org/3/library/threading.html#rlock-objects))\n",
     "prevents a block of code from being accessed by more than one thread at a\n",
     "time. For example, if we have multiple threads running this `task` function,\n",
     "their [outputs](https://www.youtube.com/watch?v=F5fUFnfPpYU) will inevitably\n",
@@ -291,7 +302,7 @@
     "\n",
     "\n",
     "The\n",
-    "[`Event`](https://docs.python.org/3.5/library/threading.html#event-objects)\n",
+    "[`Event`](https://docs.python.org/3/library/threading.html#event-objects)\n",
     "class is essentially a boolean [semaphore][semaphore-wiki]. It can be used to\n",
     "signal events between threads. Threads can `wait` on the event, and be awoken\n",
     "when the event is `set` by another thread:\n",
@@ -332,8 +343,8 @@
     "### The Global Interpreter Lock (GIL)\n",
     "\n",
     "\n",
-    "The [_Global Interpreter\n",
-    "Lock_](https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock)\n",
+    "The [*Global Interpreter\n",
+    "Lock*](https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock)\n",
     "is an implementation detail of [CPython](https://github.com/python/cpython)\n",
     "(the official Python interpreter).  The GIL means that a multi-threaded\n",
     "program written in pure Python is not able to take advantage of multiple\n",
@@ -353,7 +364,7 @@
     "\n",
     "\n",
     "For true parallelism, you should check out the\n",
-    "[`multiprocessing`](https://docs.python.org/3.5/library/multiprocessing.html)\n",
+    "[`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html)\n",
     "module.\n",
     "\n",
     "\n",
@@ -367,11 +378,11 @@
     "\n",
     "\n",
     "The\n",
-    "[`Process`](https://docs.python.org/3.5/library/multiprocessing.html#the-process-class)\n",
+    "[`Process`](https://docs.python.org/3/library/multiprocessing.html#the-process-class)\n",
     "class is the `multiprocessing` equivalent of the\n",
-    "[`threading.Thread`](https://docs.python.org/3.5/library/threading.html#thread-objects)\n",
+    "[`threading.Thread`](https://docs.python.org/3/library/threading.html#thread-objects)\n",
     "class.  `multprocessing` also has equivalents of the [`Lock` and `Event`\n",
-    "classes](https://docs.python.org/3.5/library/multiprocessing.html#synchronization-between-processes),\n",
+    "classes](https://docs.python.org/3/library/multiprocessing.html#synchronization-between-processes),\n",
     "and the other synchronisation primitives provided by `threading`.\n",
     "\n",
     "\n",
@@ -380,10 +391,12 @@
     "\n",
     "\n",
     "Because your \"threads\" are now independent processes, you need to be a little\n",
-    "careful about how to share information across them. Fortunately, the\n",
-    "`multiprocessing` module provides [`Queue` and `Pipe`\n",
-    "classes](https://docs.python.org/3.5/library/multiprocessing.html#exchanging-objects-between-processes)\n",
-    "which make it easy to share data across processes.\n",
+    "careful about how to share information across them. If you only need to share\n",
+    "small amounts of data, you can use the [`Queue` and `Pipe`\n",
+    "classes](https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes),\n",
+    "in the `multiprocessing` module. If you are working with large amounts of data\n",
+    "where copying between processes is not feasible, things become more\n",
+    "complicated, but read on...\n",
     "\n",
     "\n",
     "### Higher-level API - the `multiprocessing.Pool`\n",
@@ -391,11 +404,13 @@
     "\n",
     "The real advantages of `multiprocessing` lie in its higher level API, centered\n",
     "around the [`Pool`\n",
-    "class](https://docs.python.org/3.5/library/multiprocessing.html#using-a-pool-of-workers).\n",
+    "class](https://docs.python.org/3/library/multiprocessing.html#using-a-pool-of-workers).\n",
     "\n",
     "\n",
     "Essentially, you create a `Pool` of worker processes - you specify the number\n",
-    "of processes when you create the pool.\n",
+    "of processes when you create the pool. Once you have created a `Pool`, you can\n",
+    "use its methods to automatically parallelise tasks. The most useful are the\n",
+    "`map`, `starmap` and `apply_async` methods.\n",
     "\n",
     "\n",
     "> The best number of processes to use for a `Pool` will depend on the system\n",
@@ -403,18 +418,13 @@
     "> I/O bound or CPU bound).\n",
     "\n",
     "\n",
-    "Once you have created a `Pool`, you can use its methods to automatically\n",
-    "parallelise tasks. The most useful are the `map`, `starmap` and\n",
-    "`apply_async` methods.\n",
-    "\n",
-    "\n",
     "#### `Pool.map`\n",
     "\n",
     "\n",
     "The\n",
-    "[`Pool.map`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.map)\n",
+    "[`Pool.map`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.map)\n",
     "method is the multiprocessing equivalent of the built-in\n",
-    "[`map`](https://docs.python.org/3.5/library/functions.html#map) function - it\n",
+    "[`map`](https://docs.python.org/3/library/functions.html#map) function - it\n",
     "is given a function, and a sequence, and it applies the function to each\n",
     "element in the sequence."
    ]
@@ -467,7 +477,7 @@
     "The `Pool.map` method only works with functions that accept one argument, such\n",
     "as our `crunchImage` function above. If you have a function which accepts\n",
     "multiple arguments, use the\n",
-    "[`Pool.starmap`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.starmap)\n",
+    "[`Pool.starmap`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.starmap)\n",
     "method instead:"
    ]
   },
@@ -514,7 +524,7 @@
    "source": [
     "The `map` and `starmap` methods also have asynchronous equivalents `map_async`\n",
     "and `starmap_async`, which return immediately. Refer to the\n",
-    "[`Pool`](https://docs.python.org/3.5/library/multiprocessing.html#module-multiprocessing.pool)\n",
+    "[`Pool`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool)\n",
     "documentation for more details.\n",
     "\n",
     "\n",
@@ -522,16 +532,16 @@
     "\n",
     "\n",
     "The\n",
-    "[`Pool.apply`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.apply)\n",
+    "[`Pool.apply`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply)\n",
     "method will execute a function on one of the processes, and block until it has\n",
     "finished.  The\n",
-    "[`Pool.apply_async`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async)\n",
+    "[`Pool.apply_async`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async)\n",
     "method returns immediately, and is thus more suited to asynchronously\n",
     "scheduling multiple jobs to run in parallel.\n",
     "\n",
     "\n",
     "`apply_async` returns an object of type\n",
-    "[`AsyncResult`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.AsyncResult).\n",
+    "[`AsyncResult`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.AsyncResult).\n",
     "An `AsyncResult` object has `wait` and `get` methods which will block until\n",
     "the job has completed."
    ]
@@ -621,9 +631,9 @@
     "\n",
     "\n",
     "Any items which you wish to pass to a function that is executed by a `Pool`\n",
-    "must be - the built-in\n",
-    "[`pickle`](https://docs.python.org/3.5/library/pickle.html) module is used by\n",
-    "`multiprocessing` to serialise and de-serialise the data passed into and\n",
+    "must be *pickleable*<sup>1</sup> - the built-in\n",
+    "[`pickle`](https://docs.python.org/3/library/pickle.html) module is used by\n",
+    "`multiprocessing` to serialise and de-serialise the data passed to and\n",
     "returned from a child process. The majority of standard Python types (`list`,\n",
     "`dict`, `str` etc), and Numpy arrays can be pickled and unpickled, so you only\n",
     "need to worry about this detail if you are passing objects of a custom type\n",
@@ -631,24 +641,150 @@
     "third-party library).\n",
     "\n",
     "\n",
+    "> <sup>1</sup>*Pickleable* is the term used in the Python world to refer to\n",
+    "> something that is *serialisable* - basically, the process of converting an\n",
+    "> in-memory object into a binary form that can be stored and/or transmitted.\n",
+    "\n",
+    "\n",
     "There is obviously some overhead in copying data back and forth between the\n",
-    "main process and the worker processes.  For most computationally intensive\n",
-    "tasks, this communication overhead is not important - the performance\n",
-    "bottleneck is typically going to be the computation time, rather than I/O\n",
-    "between the parent and child processes. You may need to spend some time\n",
-    "adjusting the way in which you split up your data, and the number of\n",
-    "processes, in order to get the best performance.\n",
-    "\n",
-    "\n",
-    "However, if you have determined that copying data between processes is having\n",
-    "a substantial impact on your performance, the `multiprocessing` module\n",
-    "provides the [`Value`, `Array`, and `RawArray`\n",
-    "classes](https://docs.python.org/3.5/library/multiprocessing.html#shared-ctypes-objects),\n",
+    "main process and the worker processes; this may or may not be a problem.  For\n",
+    "most computationally intensive tasks, this communication overhead is not\n",
+    "important - the performance bottleneck is typically going to be the\n",
+    "computation time, rather than I/O between the parent and child processes.\n",
+    "\n",
+    "\n",
+    "However, if you are working with a large dataset, you have determined that\n",
+    "copying data between processes is having a substantial impact on your\n",
+    "performance, and instead wish to *share* a single copy of the data between\n",
+    "the processes, you will need to:\n",
+    "\n",
+    " 1. Structure your code so that the data you want to share is accessible at\n",
+    "    the *module level*.\n",
+    " 2. Define/create/load the data *before* creating the `Pool`.\n",
+    "\n",
+    "\n",
+    "This is because, when you create a `Pool`, what actually happens is that the\n",
+    "process your Pythonn script is running in will [**fork**](wiki-fork) itself -\n",
+    "the child processes that are created are used as the worker processes by the\n",
+    "`Pool`. And if you create/load your data in your main process *before* this\n",
+    "fork occurs, all of the child processes will inherit the memory space of the\n",
+    "main process, and will therefore have (read-only) access to the data, without\n",
+    "any copying required.\n",
+    "\n",
+    "\n",
+    "[wiki-fork]: https://en.wikipedia.org/wiki/Fork_(system_call)\n",
+    "\n",
+    "\n",
+    "Let's see this in action with a simple example. We'll start by defining a\n",
+    "little helper function which allows us to track the total memory usage, using\n",
+    "the unix `free` command:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# todo mac version\n",
+    "import subprocess as sp\n",
+    "def memusage(msg):\n",
+    "    stdout = sp.run(['free', '--mega'], capture_output=True).stdout.decode()\n",
+    "    stdout = stdout.split('\\n')[1].split()\n",
+    "    total  = stdout[1]\n",
+    "    used   = stdout[2]\n",
+    "    print('Memory usage {}: {} / {} MB'.format(msg, used, total))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now our task is simply to calculate the sum of a large array of numbers. We're\n",
+    "going to create a big chunk of data, and process it in chunks, keeping track\n",
+    "of memory usage as the task progresses:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "memusage('before creating data')\n",
+    "\n",
+    "# allocate 500MB of data\n",
+    "data = np.random.random(500 * (1048576 // 8))\n",
+    "\n",
+    "# Assign nelems values to each worker\n",
+    "# process (hard-coded so we need 12\n",
+    "# jobs to complete the task)\n",
+    "nelems =  len(data) // 12\n",
+    "\n",
+    "memusage('after creating data')\n",
+    "\n",
+    "# Each job process nelems values,\n",
+    "# starting from the specified offset\n",
+    "def process_chunk(offset):\n",
+    "    time.sleep(1)\n",
+    "    return data[offset:offset + nelems].mean()\n",
+    "\n",
+    "# Create our worker process pool\n",
+    "pool = mp.Pool(4)\n",
+    "\n",
+    "# Generate an offset into the data for each\n",
+    "# job, and call process_chunk for each offset\n",
+    "offsets = range(0, len(data), nelems)\n",
+    "results = pool.map_async(process_chunk, offsets)\n",
+    "\n",
+    "# Wait for all of the jobs to finish\n",
+    "elapsed = 0\n",
+    "while not results.ready():\n",
+    "    memusage('after {} seconds'.format(elapsed))\n",
+    "    time.sleep(1)\n",
+    "    elapsed += 1\n",
+    "\n",
+    "results = results.get()\n",
+    "print('Total sum:', sum(results))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You should be able to see that only one copy of `data` is created, and is\n",
+    "shared by all of the worker processes without any copying taking place.\n",
+    "\n",
+    "So if you only need read-only acess ...\n",
+    "\n",
+    "But what if your worker processes need ...\n",
+    "\n",
+    "Go back to the code block above and ...\n",
+    "\n",
+    "\n",
+    "\n",
+    "\n",
+    "\n",
+    "\n",
+    "> If you have worked with a real programming language with true parallelism\n",
+    "> and shared memory via within-process multi-threading, feel free to take a\n",
+    "> break at this point. Breathe. Relax. Go punch a hole in a wall. I've been\n",
+    "> coding in Python for years, and this still makes me angry. Sometimes\n",
+    "> ... don't tell anyone I said this ... I even find myself wishing I were\n",
+    "> coding in *Java* instead of Python. Ugh. I need to take a shower.\n",
+    "\n",
+    "\n",
+    "\n",
+    "\n",
+    "The `multiprocessing` module provides the [`Value`, `Array`, and `RawArray`\n",
+    "classes](https://docs.python.org/3/library/multiprocessing.html#shared-ctypes-objects),\n",
     "which allow you to share individual values, or arrays of values, respectively.\n",
     "\n",
     "\n",
     "The `Array` and `RawArray` classes essentially wrap a typed pointer (from the\n",
-    "built-in [`ctypes`](https://docs.python.org/3.5/library/ctypes.html) module)\n",
+    "built-in [`ctypes`](https://docs.python.org/3/library/ctypes.html) module)\n",
     "to a block of memory. We can use the `Array` or `RawArray` class to share a\n",
     "Numpy array between our worker processes. The difference between an `Array`\n",
     "and a `RawArray` is that the former offers synchronised (i.e. process-safe)\n",
diff --git a/advanced_topics/07_threading.md b/advanced_topics/07_threading.md
index 460396d..a021b6d 100644
--- a/advanced_topics/07_threading.md
+++ b/advanced_topics/07_threading.md
@@ -2,20 +2,31 @@
 
 
 The Python language has built-in support for multi-threading in the
-[`threading`](https://docs.python.org/3.5/library/threading.html) module, and
+[`threading`](https://docs.python.org/3/library/threading.html) module, and
 true parallelism in the
-[`multiprocessing`](https://docs.python.org/3.5/library/multiprocessing.html)
-module.  If you want to be impressed, skip straight to the section on
+[`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html)
+and
+[`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html)
+modules.  If you want to be impressed, skip straight to the section on
 [`multiprocessing`](todo).
 
 
-
+> *Note*: If you are familiar with a "real" programming language such as C++
+> or Java, you will be disappointed with the native support for parallelism in
+> Python. Python threads do not run in parallel because of the Global
+> Interpreter Lock, and if you use `multiprocessing`, be prepared to either
+> bear the performance hit of copying data between processes, or jump through
+> hoops in order to share data between processes.
+>
+> This limitation *might* be solved in a future Python release by way of
+> [*sub-interpreters*](https://www.python.org/dev/peps/pep-0554/), but the
+> author of this practical is not holding his breath.
 
 
 ## Threading
 
 
-The [`threading`](https://docs.python.org/3.5/library/threading.html) module
+The [`threading`](https://docs.python.org/3/library/threading.html) module
 provides a traditional multi-threading API that should be familiar to you if
 you have worked with threads in other languages.
 
@@ -107,7 +118,7 @@ t.daemon = True
 
 
 See the [`Thread`
-documentation](https://docs.python.org/3.5/library/threading.html#thread-objects)
+documentation](https://docs.python.org/3/library/threading.html#thread-objects)
 for more details.
 
 
@@ -117,16 +128,16 @@ for more details.
 The `threading` module provides some useful thread-synchronisation primitives
 - the `Lock`, `RLock` (re-entrant `Lock`), and `Event` classes.  The
 `threading` module also provides `Condition` and `Semaphore` classes - refer
-to the [documentation](https://docs.python.org/3.5/library/threading.html) for
+to the [documentation](https://docs.python.org/3/library/threading.html) for
 more details.
 
 
 #### `Lock`
 
 
-The [`Lock`](https://docs.python.org/3.5/library/threading.html#lock-objects)
+The [`Lock`](https://docs.python.org/3/library/threading.html#lock-objects)
 class (and its re-entrant version, the
-[`RLock`](https://docs.python.org/3.5/library/threading.html#rlock-objects))
+[`RLock`](https://docs.python.org/3/library/threading.html#rlock-objects))
 prevents a block of code from being accessed by more than one thread at a
 time. For example, if we have multiple threads running this `task` function,
 their [outputs](https://www.youtube.com/watch?v=F5fUFnfPpYU) will inevitably
@@ -229,7 +240,7 @@ what it does to the output.
 
 
 The
-[`Event`](https://docs.python.org/3.5/library/threading.html#event-objects)
+[`Event`](https://docs.python.org/3/library/threading.html#event-objects)
 class is essentially a boolean [semaphore][semaphore-wiki]. It can be used to
 signal events between threads. Threads can `wait` on the event, and be awoken
 when the event is `set` by another thread:
@@ -261,8 +272,8 @@ print('Processing finished!')
 ### The Global Interpreter Lock (GIL)
 
 
-The [_Global Interpreter
-Lock_](https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock)
+The [*Global Interpreter
+Lock*](https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock)
 is an implementation detail of [CPython](https://github.com/python/cpython)
 (the official Python interpreter).  The GIL means that a multi-threaded
 program written in pure Python is not able to take advantage of multiple
@@ -282,7 +293,7 @@ running on another core.
 
 
 For true parallelism, you should check out the
-[`multiprocessing`](https://docs.python.org/3.5/library/multiprocessing.html)
+[`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html)
 module.
 
 
@@ -296,11 +307,11 @@ the `threading` module, and a powerful higher-level API.
 
 
 The
-[`Process`](https://docs.python.org/3.5/library/multiprocessing.html#the-process-class)
+[`Process`](https://docs.python.org/3/library/multiprocessing.html#the-process-class)
 class is the `multiprocessing` equivalent of the
-[`threading.Thread`](https://docs.python.org/3.5/library/threading.html#thread-objects)
+[`threading.Thread`](https://docs.python.org/3/library/threading.html#thread-objects)
 class.  `multiprocessing` also has equivalents of the [`Lock` and `Event`
-classes](https://docs.python.org/3.5/library/multiprocessing.html#synchronization-between-processes),
+classes](https://docs.python.org/3/library/multiprocessing.html#synchronization-between-processes),
 and the other synchronisation primitives provided by `threading`.
 
 
@@ -309,10 +320,12 @@ and you will have true parallelism.
 
 
 Because your "threads" are now independent processes, you need to be a little
-careful about how to share information across them. Fortunately, the
-`multiprocessing` module provides [`Queue` and `Pipe`
-classes](https://docs.python.org/3.5/library/multiprocessing.html#exchanging-objects-between-processes)
-which make it easy to share data across processes.
+careful about how to share information across them. If you only need to share
+small amounts of data, you can use the [`Queue` and `Pipe`
+classes](https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes)
+in the `multiprocessing` module. If you are working with large amounts of data
+where copying between processes is not feasible, things become more
+complicated, but read on...
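+
+
+As a minimal sketch (the `worker` function here is made up for illustration),
+here is how a child process might send a small result back to its parent
+through a `Queue`:
+
+
+```
+import multiprocessing as mp
+
+# A toy worker which sends its result
+# back to the parent through a Queue
+def worker(q):
+    q.put(sum(range(1000)))
+
+if __name__ == '__main__':
+    q = mp.Queue()
+    p = mp.Process(target=worker, args=(q,))
+    p.start()
+    print('Result from child:', q.get())  # blocks until a result arrives
+    p.join()
+```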
 
 
 ### Higher-level API - the `multiprocessing.Pool`
@@ -320,11 +333,13 @@ which make it easy to share data across processes.
 
 The real advantages of `multiprocessing` lie in its higher level API, centered
 around the [`Pool`
-class](https://docs.python.org/3.5/library/multiprocessing.html#using-a-pool-of-workers).
+class](https://docs.python.org/3/library/multiprocessing.html#using-a-pool-of-workers).
 
 
 Essentially, you create a `Pool` of worker processes - you specify the number
-of processes when you create the pool.
+of processes when you create the pool. Once you have created a `Pool`, you can
+use its methods to automatically parallelise tasks. The most useful are the
+`map`, `starmap` and `apply_async` methods.
 
 
 > The best number of processes to use for a `Pool` will depend on the system
@@ -332,18 +347,13 @@ of processes when you create the pool.
 > I/O bound or CPU bound).
 
 
-Once you have created a `Pool`, you can use its methods to automatically
-parallelise tasks. The most useful are the `map`, `starmap` and
-`apply_async` methods.
-
-
 #### `Pool.map`
 
 
 The
-[`Pool.map`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.map)
+[`Pool.map`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.map)
 method is the multiprocessing equivalent of the built-in
-[`map`](https://docs.python.org/3.5/library/functions.html#map) function - it
+[`map`](https://docs.python.org/3/library/functions.html#map) function - it
 is given a function, and a sequence, and it applies the function to each
 element in the sequence.
 
@@ -388,7 +398,7 @@ print('Total execution time: {:0.2f} seconds'.format(end - start))
 The `Pool.map` method only works with functions that accept one argument, such
 as our `crunchImage` function above. If you have a function which accepts
 multiple arguments, use the
-[`Pool.starmap`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.starmap)
+[`Pool.starmap`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.starmap)
 method instead:
 
 
@@ -427,7 +437,7 @@ print('Total execution time: {:0.2f} seconds'.format(end - start))
 
 The `map` and `starmap` methods also have asynchronous equivalents `map_async`
 and `starmap_async`, which return immediately. Refer to the
-[`Pool`](https://docs.python.org/3.5/library/multiprocessing.html#module-multiprocessing.pool)
+[`Pool`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool)
 documentation for more details.
 
 
@@ -435,16 +445,16 @@ documentation for more details.
 
 
 The
-[`Pool.apply`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.apply)
+[`Pool.apply`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply)
 method will execute a function on one of the processes, and block until it has
 finished.  The
-[`Pool.apply_async`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async)
+[`Pool.apply_async`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async)
 method returns immediately, and is thus more suited to asynchronously
 scheduling multiple jobs to run in parallel.
 
 
 `apply_async` returns an object of type
-[`AsyncResult`](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.AsyncResult).
+[`AsyncResult`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.AsyncResult).
 An `AsyncResult` object has `wait` and `get` methods which will block until
 the job has completed.
 
@@ -526,9 +536,9 @@ the data that they return then has to be copied back to the parent process.
 
 
 Any items which you wish to pass to a function that is executed by a `Pool`
-must be - the built-in
-[`pickle`](https://docs.python.org/3.5/library/pickle.html) module is used by
-`multiprocessing` to serialise and de-serialise the data passed into and
+must be *pickleable*<sup>1</sup> - the built-in
+[`pickle`](https://docs.python.org/3/library/pickle.html) module is used by
+`multiprocessing` to serialise and de-serialise the data passed to and
 returned from a child process. The majority of standard Python types (`list`,
 `dict`, `str` etc), and Numpy arrays can be pickled and unpickled, so you only
 need to worry about this detail if you are passing objects of a custom type
@@ -536,24 +546,134 @@ need to worry about this detail if you are passing objects of a custom type
 third-party library).
 
 
+> <sup>1</sup>*Pickleable* is the term used in the Python world to refer to
+> something that is *serialisable* - *serialisation* being, basically, the
+> process of converting an in-memory object into a binary form that can be
+> stored and/or transmitted.
+
+
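+To illustrate (a contrived example, not part of the practical), most built-in
+types survive a pickle round-trip, but something like a lambda does not:
+
+
+```
+import pickle
+
+# Standard Python types can be pickled and unpickled
+data     = {'numbers': [1, 2, 3], 'name': 'example'}
+restored = pickle.loads(pickle.dumps(data))
+assert restored == data
+
+# But lambdas (and other unpickleable objects) cannot
+# be passed to, or returned from, a Pool function
+try:
+    pickle.dumps(lambda x: x * 2)
+except Exception as e:
+    print('Cannot pickle a lambda:', e)
+```
+
+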
 There is obviously some overhead in copying data back and forth between the
-main process and the worker processes.  For most computationally intensive
-tasks, this communication overhead is not important - the performance
-bottleneck is typically going to be the computation time, rather than I/O
-between the parent and child processes. You may need to spend some time
-adjusting the way in which you split up your data, and the number of
-processes, in order to get the best performance.
-
-
-However, if you have determined that copying data between processes is having
-a substantial impact on your performance, the `multiprocessing` module
-provides the [`Value`, `Array`, and `RawArray`
-classes](https://docs.python.org/3.5/library/multiprocessing.html#shared-ctypes-objects),
+main process and the worker processes; this may or may not be a problem.  For
+most computationally intensive tasks, this communication overhead is not
+important - the performance bottleneck is typically going to be the
+computation time, rather than I/O between the parent and child processes.
+
+
+However, if you are working with a large dataset and have determined that
+copying data between processes is having a substantial impact on your
+performance, and you instead wish to *share* a single copy of the data between
+the processes, you will need to:
+
+ 1. Structure your code so that the data you want to share is accessible at
+    the *module level*.
+ 2. Define/create/load the data *before* creating the `Pool`.
+
+
+This is because, when you create a `Pool`, the process that your Python script
+is running in will [**fork**][wiki-fork] itself (at least on Unix-like
+platforms, where new processes are started via `fork`) -
+the child processes that are created are used as the worker processes by the
+`Pool`. And if you create/load your data in your main process *before* this
+fork occurs, all of the child processes will inherit the memory space of the
+main process, and will therefore have (read-only) access to the data, without
+any copying required.
+
+
+[wiki-fork]: https://en.wikipedia.org/wiki/Fork_(system_call)
+
+
+Let's see this in action with a simple example. We'll start by defining a
+little helper function which allows us to track the total memory usage, using
+the unix `free` command:
+
+
+```
+# todo mac version
+import subprocess as sp
+def memusage(msg):
+    stdout = sp.run(['free', '--mega'], capture_output=True).stdout.decode()
+    stdout = stdout.split('\n')[1].split()
+    total  = stdout[1]
+    used   = stdout[2]
+    print('Memory usage {}: {} / {} MB'.format(msg, used, total))
+```
+
+
+Now our task is simply to calculate the sum of a large array of numbers. We're
+going to create a big chunk of data, and process it in chunks, keeping track
+of memory usage as the task progresses:
+
+
+```
+import time
+import multiprocessing as mp
+import numpy as np
+
+memusage('before creating data')
+
+# allocate 500MB of data
+data = np.random.random(500 * (1048576 // 8))
+
+# Assign nelems values to each worker
+# process (hard-coded so we need 12
+# jobs to complete the task)
+nelems =  len(data) // 12
+
+memusage('after creating data')
+
+# Each job processes nelems values,
+# starting from the specified offset
+def process_chunk(offset):
+    time.sleep(1)
+    return data[offset:offset + nelems].sum()
+
+# Create our worker process pool
+pool = mp.Pool(4)
+
+# Generate an offset into the data for each
+# job, and call process_chunk for each offset
+offsets = range(0, len(data), nelems)
+results = pool.map_async(process_chunk, offsets)
+
+# Wait for all of the jobs to finish
+elapsed = 0
+while not results.ready():
+    memusage('after {} seconds'.format(elapsed))
+    time.sleep(1)
+    elapsed += 1
+
+results = results.get()
+print('Total sum:', sum(results))
+```
+
+
+You should be able to see that only one copy of `data` is created, and is
+shared by all of the worker processes without any copying taking place.
+
+So if you only need read-only access to your data from the worker processes,
+this pattern gives you shared access to it essentially for free.
+
+But what if your worker processes need to modify the data? Go back to the code
+block above, add a line to `process_chunk` which modifies `data`, and watch
+what happens to the memory usage - each worker process will receive its own
+copy of the memory pages that it modifies (and any changes made in a child
+process will not be visible to the parent process).
+
+
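+For example, a modified `process_chunk` might look like this (a sketch which
+re-uses `data`, `nelems` and `time` from the example above):
+
+
+```
+# A version of process_chunk which writes to the shared
+# data - the operating system will copy the affected
+# memory pages for each worker, so memory usage will
+# grow as the jobs progress
+def process_chunk(offset):
+    time.sleep(1)
+    data[offset:offset + nelems] *= 2
+    return data[offset:offset + nelems].sum()
+```
+
+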
+> If you have worked with a real programming language with true parallelism
+> and shared memory via within-process multi-threading, feel free to take a
+> break at this point. Breathe. Relax. Go punch a hole in a wall. I've been
+> coding in Python for years, and this still makes me angry. Sometimes
+> ... don't tell anyone I said this ... I even find myself wishing I were
+> coding in *Java* instead of Python. Ugh. I need to take a shower.
+
+
+If your worker processes do need to *modify* the shared data, the
+`multiprocessing` module provides the [`Value`, `Array`, and `RawArray`
+classes](https://docs.python.org/3/library/multiprocessing.html#shared-ctypes-objects),
 which allow you to share individual values, or arrays of values, between processes.
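+
+
+For example, a shared counter could be implemented with a `Value` (a minimal
+sketch - the `increment` function here is made up for illustration):
+
+
+```
+import multiprocessing as mp
+
+def increment(counter):
+    # Value wraps its contents with a lock,
+    # so updates can be made process-safe
+    with counter.get_lock():
+        counter.value += 1
+
+if __name__ == '__main__':
+    # a shared 32 bit int, initialised to 0
+    counter = mp.Value('i', 0)
+    procs   = [mp.Process(target=increment, args=(counter,))
+               for _ in range(4)]
+    for p in procs: p.start()
+    for p in procs: p.join()
+    print(counter.value)  # prints 4
+```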
 
 
 The `Array` and `RawArray` classes essentially wrap a typed pointer (from the
-built-in [`ctypes`](https://docs.python.org/3.5/library/ctypes.html) module)
+built-in [`ctypes`](https://docs.python.org/3/library/ctypes.html) module)
 to a block of memory. We can use the `Array` or `RawArray` class to share a
 Numpy array between our worker processes. The difference between an `Array`
 and a `RawArray` is that the former offers synchronised (i.e. process-safe)
-- 
GitLab