Python Multiprocessing Quirks on macOS

Prelude

Currently, I’m working on a product built around a large monolithic Django application with a bunch of microservices around it. The codebase is huge and has tens of thousands of tests, which normally run in parallel in the CI environment.

The CPython and Django versions are a little bit stale (3.8 and 3.2 respectively).

For local development, it’s usually enough to run a subset of tests in non-parallel mode or to wait for the whole suite to pass in CI, but for one specific use case I had to run the parallel test suite locally. To my surprise, a bunch of tests failed with a Segmentation Fault.

The traceback itself could be reduced to this:

Current thread 0x00000001d9574bc0 (most recent call first):
  File "/Users/some-user/.asdf/installs/python/3.8.12/lib/python3.8/urllib/request.py", line 2631 in proxy_bypass_macosx_sysconf
  File "/Users/some-user/.asdf/installs/python/3.8.12/lib/python3.8/urllib/request.py", line 2655 in proxy_bypass
  File "/some/work/project/dir/.venv/lib/python3.8/site-packages/requests/utils.py", line 814 in should_bypass_proxies
  File "/some/work/project/dir/.venv/lib/python3.8/site-packages/requests/utils.py", line 830 in get_environ_proxies
  File "/some/work/project/dir/.venv/lib/python3.8/site-packages/requests/sessions.py", line 761 in merge_environment_settings
  File "/some/work/project/dir/.venv/lib/python3.8/site-packages/requests/sessions.py", line 579 in request

That’s quite unfortunate. Segfaults are a rare beast in the Python world, especially around libraries like requests that are written in pure Python, so I had to google my way around a little bit.

Search

The search was quite easy. I quickly found a workaround in this article from 2018 describing the problem and providing a solution: replace Django’s manage.py script with a patched version that refuses to run tests unless the magical OBJC_DISABLE_INITIALIZE_FORK_SAFETY variable is set to YES, and that switches the multiprocessing start method to “fork” via set_start_method.

#!/usr/bin/env python
""" Django's command-line utility for administrative tasks."""
import multiprocessing
import os
import sys


def main():
    try:
        command = sys.argv[1]
    except IndexError:
        command = "help"

    if command == "test" and sys.platform == "darwin":
        # Workaround for https://code.djangoproject.com/ticket/31169
        if os.environ.get("OBJC_DISABLE_INITIALIZE_FORK_SAFETY", "") != "YES":
            print(
                (
                    "Set OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES in your"
                    + " environment to work around use of forking in Django's"
                    + " test runner."
                ),
                file=sys.stderr,
            )
            sys.exit(1)
        multiprocessing.set_start_method("fork")
    ...
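
For context: since Python 3.8 the default multiprocessing start method on macOS is "spawn", which is why the patched script has to force "fork" explicitly. A minimal sketch for checking what a given interpreter would use (the helper name `describe_start_methods` is mine, not part of Django or CPython):

```python
import multiprocessing
import sys


def describe_start_methods():
    """Return the platform, the supported start methods and the current one."""
    return {
        "platform": sys.platform,
        # Supported start methods on this platform; the first one is the default.
        "available": multiprocessing.get_all_start_methods(),
        # allow_none=True avoids fixing the default as a side effect of asking.
        "current": multiprocessing.get_start_method(allow_none=True),
    }


if __name__ == "__main__":
    print(describe_start_methods())
```

If "current" is None, nothing has fixed the start method yet, and the first entry of "available" is what a new Pool would use.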

Wait, what?

Apart from being quite cryptic, this approach had one problem: it was already implemented by someone in our repo, and it did not work for me. I had to dive a little bit deeper into what was actually happening.

How tests are run in Django

Stripped of specifics, the management command boils down to this:

env $(cat .env-base | xargs) coverage run \
    -p manage.py \
    test <some-specific-module> \
    -v 3 \
    --settings <some-specific-settings> \
    --no-input --failfast --parallel 6

So here we run the patched manage.py file I mentioned before, invoking the test management command in parallel mode.

The implementation of the test command can be easily found in the Django repository. Here I use the latest master, but the specific version doesn’t matter much for our investigation.

    def handle(self, *test_labels, **options):
        TestRunner = get_runner(settings, options["testrunner"])

        time_keeper = TimeKeeper() if options.get("timing", False) else NullTimeKeeper()
        parallel = options.get("parallel")
        if parallel == "auto":
            options["parallel"] = get_max_test_processes()
        test_runner = TestRunner(**options)
        with time_keeper.timed("Total run"):
            failures = test_runner.run_tests(test_labels)
        time_keeper.print_results()
        if failures:
            sys.exit(1)

Here we look up the test runner (omitting the details, it’s going to be a subclass of DiscoverRunner) and, as we have the --parallel argument set, we end up in Django’s ParallelTestSuite implementation.

        if self.parallel > 1:
            subsuites = partition_suite_by_case(suite)
            # Since tests are distributed across processes on a per-TestCase
            # basis, there's no need for more processes than TestCases.
            processes = min(self.parallel, len(subsuites))
            # Update also "parallel" because it's used to determine the number
            # of test databases.
            self.parallel = processes
            if processes > 1:
                suite = self.parallel_test_suite(
                    subsuites,
                    processes,
                    self.failfast,
                    self.debug_mode,
                    self.buffer,
                )
        return suite
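
To make the partitioning step more concrete, here is a rough stdlib-only sketch of grouping tests by their TestCase class and capping the number of processes. The helper names and the two toy test cases are hypothetical, and Django’s own partition_suite_by_case may differ in detail:

```python
import itertools
import unittest


def iter_test_cases(suite):
    """Yield individual tests from a possibly nested TestSuite."""
    for test in suite:
        if isinstance(test, unittest.TestSuite):
            yield from iter_test_cases(test)
        else:
            yield test


def partition_by_case(suite):
    """Group consecutive tests that belong to the same TestCase class."""
    return [
        list(group)
        for _, group in itertools.groupby(iter_test_cases(suite), key=type)
    ]


class FirstCase(unittest.TestCase):
    def test_a(self):
        pass

    def test_b(self):
        pass


class SecondCase(unittest.TestCase):
    def test_c(self):
        pass


loader = unittest.TestLoader()
suite = unittest.TestSuite(
    [
        loader.loadTestsFromTestCase(FirstCase),
        loader.loadTestsFromTestCase(SecondCase),
    ]
)
subsuites = partition_by_case(suite)
requested = 6  # the value passed via --parallel
# No point in starting more processes than there are TestCase groups.
processes = min(requested, len(subsuites))
```

With only two TestCase classes, asking for six processes still results in two.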

Under the hood, ParallelTestSuite uses a multiprocessing Pool object to partition TestCases and run them in separate worker processes.

I will not provide the code but rather links, as there’s quite a lot.

Django initializes the multiprocessing.Pool with some worker set-up done in the child processes, passes along pickled data describing the test cases to run, and collects the results of the test runs from its children.
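
The worker set-up part can be illustrated with a shared counter that hands every worker a unique number. This is a simplified sketch under my own assumptions: it uses plain Process objects with the "fork" start method instead of a Pool with an initializer, and the function names are illustrative rather than Django’s:

```python
import ctypes
import multiprocessing

_worker_id = 0  # set once in every worker process


def init_worker(counter):
    """Give the current process a unique, 1-based worker id."""
    global _worker_id
    with counter.get_lock():
        counter.value += 1
        _worker_id = counter.value


def worker_main(counter, results):
    """Entry point of a worker: initialize, then report the id it got."""
    init_worker(counter)
    results.put(_worker_id)


def start_workers(count):
    """Start `count` forked workers and return the ids they were assigned."""
    # "fork" matches the start method discussed here; it is not available on Windows.
    ctx = multiprocessing.get_context("fork")
    counter = ctx.Value(ctypes.c_int, 0)
    results = ctx.Queue()
    workers = [
        ctx.Process(target=worker_main, args=(counter, results))
        for _ in range(count)
    ]
    for worker in workers:
        worker.start()
    # Drain the queue before joining so no worker blocks on a full pipe.
    ids = sorted(results.get(timeout=10) for _ in workers)
    for worker in workers:
        worker.join()
    return ids
```

Calling start_workers(3) should hand out the ids 1, 2 and 3, one per process, regardless of the order in which the workers start.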

How multiprocessing works in CPython

The culprit is somewhere here (remember, we’re using the “fork” way of starting processes), but what actually happens?

        self.initialize_suite()
        counter = multiprocessing.Value(ctypes.c_int, 0)
        pool = multiprocessing.Pool(
            processes=self.processes,
            initializer=self.init_worker.__func__,
            initargs=[
                counter,
                self.initial_settings,
                self.serialized_contents,
                self.process_setup.__func__,
                self.process_setup_args,
                self.debug_mode,
                self.used_aliases,
            ],
        )
        args = [
            (self.runner_class, index, subsuite, self.failfast, self.buffer)
            for index, subsuite in enumerate(self.subsuites)
        ]
        test_results = pool.imap_unordered(self.run_subsuite.__func__, args)

I will not go into much detail here, as there’s quite a lot to tackle and it is better to follow the source code yourself with the help of the amazing CPython Internals book. There are several places where multiprocessing is managed in CPython:

Context classes, which provide a common interface over the platform-specific ways of starting processes.
The popen_fork Popen implementation, the actual heavy lifter here, which forks the processes.
The process Pool implementation, which maintains a pool of forked processes, keeps the pool at its size and, with the help of various queues, passes data between the main and child processes.

    def _launch(self, process_obj):
        code = 1
        parent_r, child_w = os.pipe()
        child_r, parent_w = os.pipe()
        self.pid = os.fork()
        if self.pid == 0:
            try:
                os.close(parent_r)
                os.close(parent_w)
                code = process_obj._bootstrap(parent_sentinel=child_r)
            finally:
                os._exit(code)
        else:
            os.close(child_w)
            os.close(child_r)
            self.finalizer = util.Finalize(self, util.close_fds,
                                           (parent_r, parent_w,))
            self.sentinel = parent_r
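
The same pattern can be tried in isolation. Below is a minimal sketch, assuming a POSIX system, of forking a child that runs a callable, with a pipe acting as the sentinel: the parent sees end-of-file on its read end once the child is gone. The names `launch` and `wait_for` are mine, not CPython’s:

```python
import os


def launch(target):
    """Fork, run target() in the child and return (pid, sentinel_fd)."""
    parent_r, child_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: never return into the parent's code, always leave via os._exit.
        code = 1
        try:
            os.close(parent_r)
            code = int(target() or 0)
        finally:
            os._exit(code)
    # Parent: keep only the read end; the child holds the write end until it exits.
    os.close(child_w)
    return pid, parent_r


def wait_for(pid, sentinel):
    """Block until the child exits and return (sentinel_data, exit_code)."""
    # Nobody writes to the pipe, so read() returns b"" (EOF) once the
    # child has exited and its copy of the write end is closed.
    data = os.read(sentinel, 1)
    os.close(sentinel)
    _, status = os.waitpid(pid, 0)
    return data, os.WEXITSTATUS(status)
```

For example, `wait_for(*launch(lambda: 7))` gives back an empty read and the exit code 7.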

What does `requests` do

So, at the moment we have a Django parent process running the manage.py test command. It has started a multiprocessing.Pool, which initialized a bunch of worker processes via os.fork() calls. Some queues and variables have been set up for the whole coordination mumbo-jumbo, and the tests are running.

Somewhere in a test case, there’s an unlucky requests.get() call that is not mocked in any way (gross, but something we live with sometimes), and we get a SegFault. But why?

Let’s take a look at the last stack trace item before the SegFault:

File "/Users/some-user/.asdf/installs/python/3.8.12/lib/python3.8/urllib/request.py", line 2631 in proxy_bypass_macosx_sysconf

Let’s follow the rabbit.


if sys.platform == 'darwin':
    from _scproxy import _get_proxy_settings, _get_proxies

    def proxy_bypass_macosx_sysconf(host):
        proxy_settings = _get_proxy_settings()
        return _proxy_bypass_macosx_sysconf(host, proxy_settings)

    def getproxies_macosx_sysconf():
        """Return a dictionary of scheme -> proxy server URL mappings.

        This function uses the MacOSX framework SystemConfiguration
        to fetch the proxy information.
        """
        return _get_proxies()

While processing a request, requests wants to resolve the local proxy configuration to figure out how to actually perform the HTTP request and where to connect.
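
The standard-library entry points involved can be called directly. A small sketch (the wrapper `lookup_proxy_settings` is a hypothetical name); note that on macOS, inside a forked child, this is exactly the kind of call that may hit the crash described in this post:

```python
import urllib.request


def lookup_proxy_settings(host):
    """Ask the standard library which proxy settings apply to `host`."""
    return {
        # Mapping of scheme -> proxy URL, taken from the environment or the OS.
        "proxies": urllib.request.getproxies(),
        # True when `host` should be contacted directly, without a proxy.
        "bypass": bool(urllib.request.proxy_bypass(host)),
    }


if __name__ == "__main__":
    print(lookup_proxy_settings("localhost"))
```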

The _scproxy is written in C and can be found here (I refer to the 3.8 version, things may have changed).

/*
 * Helper method for urllib to fetch the proxy configuration settings
 * using the SystemConfiguration framework.
 */
#include <Python.h>
#include <SystemConfiguration/SystemConfiguration.h>

// Some helper functions
// ...
//

// Implementation of the _get_proxies method in C
static PyObject*
get_proxies(PyObject* Py_UNUSED(mod), PyObject *Py_UNUSED(ignored))
{
    PyObject* result = NULL;
    int r;
    CFDictionaryRef proxyDict = NULL;

    // Release the GIL
    Py_BEGIN_ALLOW_THREADS
    // https://developer.apple.com/documentation/systemconfiguration/1517088-scdynamicstorecopyproxies
    proxyDict = SCDynamicStoreCopyProxies(NULL);
    // Acquire the GIL
    Py_END_ALLOW_THREADS

    if (proxyDict == NULL) {
        return PyDict_New();
    }

    result = PyDict_New();
    if (result == NULL) goto error;

    r = set_proxy(result, "http", proxyDict,
        kSCPropNetProxiesHTTPEnable,
        kSCPropNetProxiesHTTPProxy,
        kSCPropNetProxiesHTTPPort);
    if (r == -1) goto error;
    // Repeated copy blocks omitted for brevity

    CFRelease(proxyDict);
    return result;
error:
    if (proxyDict)  CFRelease(proxyDict);
    Py_XDECREF(result);
    return NULL;
}

Well, hmm. We use the API of Apple’s SystemConfiguration framework (which works with Core Foundation types) to get the system proxy settings, map the framework data structures to a Python dictionary, handle some errors and return the dictionary to the caller. Pretty straightforward, so why the segfault all of a sudden?

The problem lies in the way Apple changed the usage of its APIs in a multi-process environment.

More details here, but in general it is not safe to use some Objective-C APIs between the fork() and exec() calls, so Apple put a guard into the runtime. The guard can be disabled with the OBJC_DISABLE_INITIALIZE_FORK_SAFETY environment variable, but it really shouldn’t be, as that may lead to sporadic, irreproducible bugs.

I was unable to reproduce the problem with a small snippet, but I found issues reported by people who were advised to use the variable, and they seem to confirm the problem:
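
The general difference between a bare fork() and fork() followed by exec() can be shown with plain Python. This sketch, assuming a POSIX system, only illustrates inherited process state; it does not touch any Objective-C API or Apple’s guard. A forked child is a copy of the parent and still sees an imported module, while a freshly executed interpreter starts from scratch:

```python
import colorsys  # noqa: F401 - any module a fresh interpreter would not load itself
import os
import subprocess
import sys

CHECK = "import sys; print('colorsys' in sys.modules)"


def module_loaded_after_fork():
    """fork() only: the child is a copy of the parent, loaded modules included."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report what it sees and leave via os._exit.
        code = 1
        try:
            os.close(read_fd)
            os.write(write_fd, str("colorsys" in sys.modules).encode())
            code = 0
        finally:
            os._exit(code)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        answer = reader.read().decode()
    os.waitpid(pid, 0)
    return answer


def module_loaded_after_exec():
    """New interpreter process: nothing from the parent's memory survives."""
    result = subprocess.run(
        [sys.executable, "-c", CHECK],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The first function reports "True" and the second "False". Whatever the parent had set up, including half-initialized state, is carried into a forked child in the same way.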

ETHON: Libcurl initialized
objc[96843]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[96843]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
[96822] - Worker 1 (PID: 96850) booted in 0.0s, phase: 0

There’s some additional traceback output that was captured, and it indicates that the problem really lies in the initialization of Objective-C runtime data structures of the underlying framework in the child process after fork().

The question of fork versus fork/exec patterns is quite non-trivial, so I’ll leave some useful links instead of growing this post even bigger.

Solutions

There are quite a few for my specific case:

Just use a recent Django version. The spawn behavior was fixed somewhere around the 4.2 releases, so you can throw away the OBJC_DISABLE_INITIALIZE_FORK_SAFETY workaround from your codebase.
Use some other solution for parallel test runs, like pytest-xdist.
In general, mock every network connection (well, except for the database if you’re a fan of integration tests). Problems like this often seem to pair with usage of the requests library, which tries to resolve the proxy configuration through low-level APIs.
Move the development of your application into Docker containers or VMs, so that the low-level system APIs on your local machine match those of your production environment.
In general, try to understand the quick fixes you find around the internet a little bit deeper.
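
As an illustration of the mocking advice, here is a minimal sketch that uses only the standard library. `fetch_status` is a hypothetical piece of code under test; with requests, the same idea would apply to patching the call your code makes, such as requests.get:

```python
import unittest
import urllib.request
from unittest import mock


def fetch_status(url):
    """Hypothetical code under test: return the HTTP status for `url`."""
    with urllib.request.urlopen(url) as response:
        return response.status


class FetchStatusTests(unittest.TestCase):
    def test_fetch_status_without_network(self):
        # The fake response works as a context manager yielding status 200.
        fake_response = mock.MagicMock()
        fake_response.__enter__.return_value.status = 200
        with mock.patch(
            "urllib.request.urlopen", return_value=fake_response
        ) as fake_urlopen:
            self.assertEqual(fetch_status("https://example.invalid/"), 200)
        # No socket was opened and no proxy lookup took place.
        fake_urlopen.assert_called_once_with("https://example.invalid/")
```

Saved in a test module, this can be run with python -m unittest. Because urlopen is replaced, the test never reaches the proxy resolution code at all.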
