Commit d0eeace0 authored by Paul McCarthy

Merge branch 'mnt/macos-fork' into 'master'

Work around fork/multiprocessing issues on macOS

See merge request !83
parents e29b3ae4 6d008e1c
[run]
-concurrency = thread multiprocessing
+concurrency = thread,multiprocessing
parallel = True
source = funpack
......
FUNPACK changelog
=================
2.9.0 (Under development)
-------------------------
Changed
^^^^^^^
* Explicitly use the ``fork`` method for the ``multiprocessing`` module.
2.8.0 (Thursday 19th August 2021)
---------------------------------
......
......@@ -285,6 +285,49 @@ Then you can run the test suite using ``pytest``::
pytest
macOS issues
------------
FUNPACK makes extensive use of the Python `multiprocessing
<https://docs.python.org/3/library/multiprocessing.html>`_ module to speed up
certain steps in its processing pipeline. FUNPACK relies on the POSIX `fork()
<https://www.man7.org/linux/man-pages/man2/fork.2.html>`_ mechanism, so that
worker processes may inexpensively inherit the memory space of the main
process (often referred to as *copy-on-write*). This is to avoid having to
serialise the data set being processed (stored internally as a
``pandas.DataFrame``).
In Python 3.8 on macOS, the default method used by the ``multiprocessing``
module was changed from ``fork`` to ``spawn``, due to changes in macOS 10.13
restricting the use of ``fork()`` for safety reasons. Some background
information on this change can be found at https://bugs.python.org/issue33725,
and at `this blog post
<https://wefearchange.org/2018/11/forkmacos.rst.html>`_.
FUNPACK therefore explicitly sets the start method used by the
``multiprocessing`` module to ``fork``, to take advantage of copy-on-write
semantics. Using ``fork()``
on macOS *should* be safe for single-threaded parent processes, but as FUNPACK
calls ``fork()`` numerous times (by creating and discarding
``multiprocessing.Pool()`` objects on an as-needed basis), this assumption may
not be valid, and FUNPACK may crash with an error message resembling the
following::

    +[SomeClass initialize] may have been in progress in another thread
    when fork() was called. We cannot safely call it or ignore it in the
    fork() child process. Crashing instead.

You might be able to work around this error by setting an environment variable
before calling FUNPACK, like so::

    export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

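The copy-on-write behaviour described above can be illustrated with a minimal sketch. This is not FUNPACK's code: ``DATASET``, ``column_sum`` and ``parallel_column_sums`` are hypothetical names, a plain list stands in for the ``pandas.DataFrame``, and the sketch uses ``multiprocessing.get_context('fork')`` rather than setting the process-wide start method.

```python
import multiprocessing as mp

# Hypothetical stand-in for the large in-memory data set (FUNPACK
# uses a pandas.DataFrame; a plain list keeps this stdlib-only).
DATASET = None


def column_sum(col):
    # Runs in a worker process. DATASET was inherited from the parent
    # when it forked, so only the small integer `col` is pickled.
    return sum(row[col] for row in DATASET)


def parallel_column_sums(data, jobs=2):
    global DATASET
    DATASET = data
    # Request fork for this pool only, leaving the process-wide
    # default untouched. Raises ValueError where fork is not
    # available (e.g. Windows).
    ctx = mp.get_context('fork')
    with ctx.Pool(jobs) as pool:
        return pool.map(column_sum, range(len(data[0])))


if __name__ == '__main__':
    rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(parallel_column_sums(rows))
```

With the ``spawn`` method the workers would start from a fresh interpreter and would not see the value assigned to ``DATASET`` in the parent, so the data would have to be serialised and sent to each worker instead.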
Citing
------
......
......@@ -73,7 +73,7 @@ PLUGIN_TYPES = ['loader',
'exporter']
-class PluginRegistry(object):
+class PluginRegistry:
"""The ``PluginRegistry`` keeps track of, and provides access to all
registered plugins.
"""
......@@ -198,7 +198,6 @@ def registerBuiltIns():
import funpack.metaproc_functions as mf
if firstTime:
-loglevel = log.getEffectiveLevel()
log.setLevel(logging.CRITICAL)
importlib.reload(ue)
......@@ -209,7 +208,7 @@ def registerBuiltIns():
importlib.reload(mf)
if firstTime:
-log.setLevel(loglevel)
+log.setLevel(logging.NOTSET)
registry = PluginRegistry()
......
......@@ -110,12 +110,23 @@ def main(argv=None):
if args.num_jobs > 1:
log.debug('Running up to %i jobs in parallel', args.num_jobs)
-mgr = mp.Manager()
-# We need to initialise icd10
-# before the worker processes
-# are created, so its state is
-# shared by all processes.
+# Funpack relies on fork() to share
+# the dataset to child processes
+# when parallelising tasks. This
+# is potentially dangerous on macOS
+# - see the "macOS issues" note in
+# README, and the python bug report:
+# https://bugs.python.org/issue33725
+if mp.get_start_method(True) is None:
+mp.set_start_method('fork')
+# The icd10 module maintains information
+# which is potentially shared (read from
+# and written to) by multiple processes,
+# so we use a mp.Manager to handle
+# shared state.
+mgr = mp.Manager()
icd10.initialise(mgr)
else:
mgr = None
......
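The pattern in the hunk above (set the start method only if none has been chosen, then create a ``Manager`` for shared state) can be sketched in isolation as follows. The ``record`` and ``run`` functions and the use of ``mgr.dict()`` are illustrative assumptions; how ``icd10.initialise`` actually uses the manager is not shown in this diff.

```python
import multiprocessing as mp


def record(shared, key):
    # Hypothetical worker: writes into the manager-backed dict. The
    # update is sent to the manager process, so the parent sees it.
    shared[key] = key * key


def run(jobs=2):
    # Only choose a start method if nothing has set one yet;
    # set_start_method raises RuntimeError if called a second time.
    if mp.get_start_method(allow_none=True) is None:
        mp.set_start_method('fork')

    with mp.Manager() as mgr:
        shared = mgr.dict()
        with mp.Pool(jobs) as pool:
            pool.starmap(record, [(shared, k) for k in range(4)])
        # Copy into a plain dict before the manager shuts down.
        return dict(shared)


if __name__ == '__main__':
    print(run())
```

Unlike data inherited through ``fork()``, which each worker can only modify in its own private copy, writes made through a manager proxy are visible to every process, which is why shared read-write state needs a ``Manager``.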
......@@ -59,11 +59,20 @@ def test_demo():
def test_demo_commands():
# Issues with pytest-cov 3.0.0 / coveragepy 6.2
# cause coverage to spit out error information
# in subprocess-called invocations. Disabling for
# the time being.
env = os.environ.copy()
for k in list(env.keys()):
if 'COVERAGE' in k or k.startswith('COV_'):
env.pop(k)
def eval_cmd(cmd, out):
# TODO extract all funpack calls, and turn
# them into funpack.main function calls.
-result = sp.run(['bash', cmd], stdout=sp.PIPE)
+result = sp.run(['bash', cmd], stdout=sp.PIPE, env=env)
with open(out, 'rt') as f:
out = f.read()
......
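The environment scrubbing added to ``test_demo_commands`` can be tried in isolation with a sketch like the one below. ``scrubbed_env`` and ``child_sees`` are hypothetical helpers, and ``COV_EXAMPLE`` and ``MY_COVERAGE_FLAG`` are made-up variable names chosen only because they match the same filter as the test.

```python
import os
import subprocess as sp
import sys


def scrubbed_env(environ=None):
    """Return a copy of the environment without coverage variables."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items()
            if 'COVERAGE' not in k and not k.startswith('COV_')}


def child_sees(name, env):
    # Ask a child interpreter whether `name` is in its environment.
    code = 'import os, sys; print(sys.argv[1] in os.environ)'
    result = sp.run([sys.executable, '-c', code, name],
                    stdout=sp.PIPE, env=env, check=True)
    return result.stdout.decode().strip() == 'True'


if __name__ == '__main__':
    # COV_EXAMPLE is a made-up variable, not one read by any tool.
    base = dict(os.environ, COV_EXAMPLE='1')
    print(child_sees('COV_EXAMPLE', base))
    print(child_sees('COV_EXAMPLE', scrubbed_env(base)))
```

Building a new dictionary avoids mutating ``os.environ`` itself, so the coverage settings of the parent test process are left alone.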