jobserver.py: _try_read()'s alarm timeout needs to throw an exception.

In python3, os.read() automatically retries after EINTR, which breaks
our ability to interrupt on SIGALRM. Instead, throw an exception from the
SIGALRM handler, which should work on both python2 and python3.

This fixes a rare deadlock during parallel builds on python3.

For background:
https://www.python.org/dev/peps/pep-0475/#backward-compatibility

  "Applications relying on the fact that system calls are interrupted
  with InterruptedError will hang. The authors of this PEP don't think
  that such applications exist [...]"

Well, apparently they were mistaken :)
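A minimal, standalone sketch of the technique described above, not the code from this commit: the SIGALRM handler raises, so a blocked os.read() is abandoned instead of being retried. The names ReadTimeout, _on_alarm, and read_with_timeout are hypothetical and chosen for illustration; the commit itself uses TimeoutError and _timeout. It assumes a Unix system and that it runs in the main thread, since Python only allows signal handlers to be installed there.

```python
import os
import signal


class ReadTimeout(Exception):
    """Hypothetical name; avoids shadowing the builtin TimeoutError."""


def _on_alarm(sig, frame):
    # Raising here propagates out of the blocked os.read(). A handler that
    # merely returns would let Python 3.5+ retry the read (PEP 475).
    raise ReadTimeout()


def read_with_timeout(fd, seconds):
    """Read one byte from fd, or return None if nothing arrives in time."""
    oldh = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        signal.setitimer(signal.ITIMER_REAL, seconds)
        try:
            return os.read(fd, 1)
        except ReadTimeout:
            return None  # caller may try again
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
        signal.signal(signal.SIGALRM, oldh)      # restore previous handler


r, w = os.pipe()
empty = read_with_timeout(r, 0.05)  # nothing written yet, so this times out
os.write(w, b"x")
ready = read_with_timeout(r, 0.05)  # a byte is waiting, so this returns it
```

With a handler that only returns, the first call would block forever on python3, which is the hang the PEP text warns about. This sketch does not try to close the narrow window in which the alarm could fire just after the read completes.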
2020-06-15 02:17:25 -04:00
commit 670abbe305
parent aa920f12ed
1 changed file with 6 additions and 2 deletions
--- a/redo/jobserver.py
+++ b/redo/jobserver.py
@@ -142,8 +142,10 @@ def release_mine():
     _release(1)
 class TimeoutError(Exception): pass
 def _timeout(sig, frame):
-    pass
+    raise TimeoutError()
 # We make the pipes use the first available fd numbers starting at startfd.
@@ -171,11 +173,13 @@ def _try_read(fd, n):
         return None  # try again
     # ok, the socket is readable - but some other process might get there
     # first.  We have to set an alarm() in case our read() gets stuck.
-    oldh = signal.signal(signal.SIGALRM, _timeout)
     try:
+        oldh = signal.signal(signal.SIGALRM, _timeout)
         signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01)  # emergency fallback
         try:
             b = os.read(fd, 1)
+        except TimeoutError:
+            return None  # try again
         except OSError as e:
             if e.errno in (errno.EAGAIN, errno.EINTR):
                 # interrupted or it was nonblocking