jobserver.py: _try_read()'s alarm timeout needs to throw an exception.

In python3 (3.5 and later, per PEP 475), os.read() automatically retries
after EINTR whenever the signal handler returns without raising, which
breaks our ability to interrupt on SIGALRM.

Instead, throw an exception from the SIGALRM handler, which should work
on both python2 and python3.

This fixes a rare deadlock during parallel builds on python3.

For background:
https://www.python.org/dev/peps/pep-0475/#backward-compatibility

"Applications relying on the fact that system calls are interrupted
with InterruptedError will hang. The authors of this PEP don't think
that such applications exist [...]"

Well, apparently they were mistaken :)
Avery Pennarun 2020-06-15 02:17:25 -04:00
commit 670abbe305
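
The technique described in the commit message can be tried in isolation with a sketch like the one below. It is not the code from jobserver.py: the names ReadTimeout, _on_alarm and read_byte_with_timeout are hypothetical, the timer is one-shot rather than repeating, and it assumes a Unix system and that it runs in the main thread (a requirement of signal.signal()).

```python
import os
import signal


class ReadTimeout(Exception):
    """Hypothetical name; the commit itself calls its exception TimeoutError."""


def _on_alarm(sig, frame):
    # Raising is what aborts the blocked os.read(). A handler that merely
    # returns lets Python 3.5+ retry the read transparently (PEP 475).
    raise ReadTimeout()


def read_byte_with_timeout(fd, seconds=0.05):
    """Return one byte read from fd, or None if nothing arrives in time."""
    oldh = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        signal.setitimer(signal.ITIMER_REAL, seconds)
        try:
            return os.read(fd, 1)
        except ReadTimeout:
            return None  # caller may try again
    finally:
        # Disarm the timer before restoring the previous handler.
        # Note: this sketch does not handle the narrow race where the
        # alarm fires just after os.read() has returned.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, oldh)
```

Called on the read end of an empty pipe, read_byte_with_timeout() should give up after roughly the timeout and return None instead of blocking forever. If the handler only returned, the same call would hang on Python 3.5 and later.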

@@ -142,8 +142,10 @@ def release_mine():
     _release(1)
+class TimeoutError(Exception): pass
 def _timeout(sig, frame):
-    pass
+    raise TimeoutError()
 # We make the pipes use the first available fd numbers starting at startfd.
@@ -171,11 +173,13 @@ def _try_read(fd, n):
         return None  # try again
     # ok, the socket is readable - but some other process might get there
     # first. We have to set an alarm() in case our read() gets stuck.
-    oldh = signal.signal(signal.SIGALRM, _timeout)
     try:
+        oldh = signal.signal(signal.SIGALRM, _timeout)
         signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01)  # emergency fallback
         try:
             b = os.read(fd, 1)
+        except TimeoutError:
+            return None  # try again
         except OSError as e:
             if e.errno in (errno.EAGAIN, errno.EINTR):
                 # interrupted or it was nonblocking