Commit graph

52 commits

Author SHA1 Message Date
Avery Pennarun
7f00abc36b jobserver: fix rare race condition in previous timer-exception workaround.
We have to clear the setitimer *before* leaving the try/except clause,
or the timer might fire between the try/except and the try/finally,
leaking a TimeoutError.

Reported-by: Denton Gentry
2021-07-27 20:48:36 -07:00
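The ordering the commit describes can be sketched like this (a hedged illustration with hypothetical names like `_Timeout` and `with_timer`, not redo's actual jobserver code): the disarm has to happen inside the same try/except that catches the timer exception.

```python
import signal

class _Timeout(Exception):
    """Hypothetical stand-in for the timer exception in redo's jobserver."""

def _on_alarm(sig, frame):
    raise _Timeout()

def with_timer(fn, seconds):
    """Run fn() under an interval timer; return None if the timer fires."""
    old = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        signal.setitimer(signal.ITIMER_REAL, seconds)
        try:
            result = fn()
            signal.setitimer(signal.ITIMER_REAL, 0)  # disarm on success...
        except _Timeout:
            # ...and disarm *inside* the except clause, before leaving the
            # try/except: if the disarm lived in a separate try/finally
            # further down, the timer could fire in the gap between the
            # two statements and leak a _Timeout into the caller.
            signal.setitimer(signal.ITIMER_REAL, 0)
            result = None
        return result
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)      # belt and braces
        signal.signal(signal.SIGALRM, old)
```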
Avery Pennarun
670abbe305 jobserver.py: _try_read()'s alarm timeout needs to throw an exception.
In python3, os.read() automatically retries after EINTR, which breaks
our ability to interrupt on SIGALRM.

Instead, throw an exception from the SIGALRM handler, which should work
on both python2 and python3.

This fixes a rare deadlock during parallel builds on python3.

For background:
https://www.python.org/dev/peps/pep-0475/#backward-compatibility

"Applications relying on the fact that system calls are interrupted
with InterruptedError will hang. The authors of this PEP don't think
that such applications exist [...]"

Well, apparently they were mistaken :)
2020-06-15 02:20:02 -04:00
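The PEP 475 behaviour and the workaround can be shown in a few lines (a sketch with assumed names, not redo's `_try_read()` itself): since Python 3.5, `os.read()` is restarted automatically after EINTR, so a SIGALRM handler that returns normally no longer interrupts it; raising from the handler does, because the exception propagates out of the retried system call on both Python 2 and 3.

```python
import os
import signal

class Timeout(Exception):
    """Raised by the SIGALRM handler to break out of a blocked read."""

def try_read(fd, n, seconds):
    """Read up to n bytes from fd, or raise Timeout after `seconds`."""
    def on_alarm(sig, frame):
        raise Timeout()          # returning normally would let os.read()
                                 # silently retry after EINTR (PEP 475)
    old = signal.signal(signal.SIGALRM, on_alarm)
    try:
        signal.setitimer(signal.ITIMER_REAL, seconds)
        try:
            return os.read(fd, n)
        finally:
            signal.setitimer(signal.ITIMER_REAL, 0)
    finally:
        signal.signal(signal.SIGALRM, old)
```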
Avery Pennarun
aa920f12ed jobserver.py: fix very rare python3 failure reported by a user.
Traceback (most recent call last):
  File "/nix/store/i0835myyhrfr13lh4y26r58406kk90xj-redo-apenwarr-0.42a/bin/../lib/redo/cmd_ifchange.py", line 54, in main
    jobserver.force_return_tokens()
  File "/nix/store/i0835myyhrfr13lh4y26r58406kk90xj-redo-apenwarr-0.42a/bin/../lib/redo/jobserver.py", line 482, in force_return_tokens
    os.write(_cheatfds[1], 't' * _cheats)
TypeError: a bytes-like object is required, not 'str'

Unfortunately I wasn't able to replicate it, but this is obviously the right fix.
2020-06-15 00:39:10 -04:00
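The fix amounts to writing a bytes literal. A minimal sketch (the real `force_return_tokens()` takes no arguments and uses the module globals `_cheatfds`/`_cheats`; the parameters here are for illustration):

```python
import os

def force_return_tokens(fd, cheats):
    """Return `cheats` jobserver tokens by writing one byte per token."""
    # Python 3's os.write() accepts only bytes-like objects: 't' * n is a
    # str there and raises TypeError, while b't' * n works on both 2 and 3.
    os.write(fd, b't' * cheats)
```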
Moritz Lell
7f2d04d5ff Prevent iterator being changed while iterating
In Python 3, `zip()` returns an iterator that in turn contains references to
`tparts` and `bparts`. Because those two variables are modified within the
loop, the iterator does not advance properly.

Fix this by explicitly obtaining a list.
2019-10-30 21:59:01 +01:00
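The difference is easy to demonstrate with generic lists standing in for `tparts` and `bparts`:

```python
# In Python 3, zip() returns a lazy iterator that reads its inputs
# element by element *during* the loop, so mutating the underlying
# lists mid-loop changes what the iterator yields.
tparts, bparts = [1, 2, 3], ['a', 'b', 'c']
seen = []
for t, b in zip(tparts, bparts):
    seen.append((t, b))
    del tparts[1:]                 # mutation is visible to the iterator
assert seen == [(1, 'a')]          # iteration stopped early

# Explicitly obtaining a list snapshots the pairs before the loop runs.
tparts = [1, 2, 3]
seen = []
for t, b in list(zip(tparts, bparts)):
    seen.append((t, b))
    del tparts[1:]                 # no effect on the snapshot
assert seen == [(1, 'a'), (2, 'b'), (3, 'c')]
```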
Moritz Lell
efab08fc9f Python 2/3 compatible treatment of max(n, None)
Python 3 no longer allows comparisons between different types. Therefore,
explicitly convert f.checked_runid to a number that's always smaller than
`f.changed_runid`.
2019-10-30 21:59:01 +01:00
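A hedged sketch of the conversion (a hypothetical helper, not redo's exact code): assuming run ids are positive, a missing `checked_runid` maps to 0, which is smaller than any real run id, so the comparison works on both Python 2 and 3.

```python
def runid_is_current(checked_runid, changed_runid):
    """Was the file checked at or after the run in which it changed?"""
    # Python 3 raises TypeError on mixed comparisons like max(None, 2),
    # so None is converted to a number that's always smaller than any
    # real run id before comparing.
    return (checked_runid or 0) >= changed_runid
```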
Moritz Lell
52a8ca25b2 Prevent "Exception ... ignored" in redo-log ... | head
When STDOUT is piped to another program and that program closes the pipe,
this program is supposed to terminate. However, Python 3 prints an error
message because prior to exiting, STDOUT is automatically flushed which
does not work due to it being closed by the other side.

This commit includes code taken from the Python documentation for this case.
The output stream is redirected to /dev/null.
2019-10-30 21:28:49 +01:00
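The pattern from the Python documentation looks roughly like this (function name is illustrative): if the final flush of stdout fails because the reader (`head`) went away, stdout is pointed at /dev/null so the interpreter's implicit flush at exit cannot fail again.

```python
import os
import sys

def flush_stdout_quietly():
    """Flush stdout; on a broken pipe, silence the exit-time flush."""
    try:
        sys.stdout.flush()
    except BrokenPipeError:
        # Redirect stdout to /dev/null so Python's automatic flush at
        # interpreter shutdown doesn't raise on the closed pipe.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)   # conventional non-zero exit for a broken pipe
```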
Moritz Lell
e239820afd Distinguish byte (python2 str type) and unicode strings (python 3 str type)
Python 3 strings are Python 2 unicode strings. Therefore, consistently mark
strings that are sent via pipes or written/read to files as byte strings.
2019-10-30 21:28:49 +01:00
Moritz Lell
0d8d19437e Set file descriptor as inheritable for all pythons >=3.4
This is mandated by PEP 446
2019-10-30 21:28:49 +01:00
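A two-line illustration of what PEP 446 changed: since Python 3.4, newly created file descriptors are non-inheritable by default, so descriptors that must survive into child processes (like the jobserver's token pipe) have to be marked explicitly.

```python
import os

r, w = os.pipe()            # non-inheritable by default on Python >= 3.4
for fd in (r, w):
    os.set_inheritable(fd, True)   # required for fds shared with children
assert os.get_inheritable(r) and os.get_inheritable(w)
```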
Moritz Lell
62845688e5 Unify print function usage for Python 2 and 3 via __future__ import 2019-10-30 21:28:49 +01:00
Moritz Lell
491040ea72 Run 2to3 utility 2019-10-30 19:58:12 +01:00
Moritz Lell
e8d4809bc5 Remove python interpreter selection 2019-10-30 19:58:12 +01:00
Avery Pennarun
8924fa35fa Oops, redo/whichpython.do would fail if python2.7 didn't exist. 2019-06-21 00:27:40 +09:00
Avery Pennarun
4b01cdd9b5 Fix minor pylint warnings. 2019-05-13 23:08:51 +00:00
Avery Pennarun
5684142f11 redo-log: "(resumed)" lines didn't print as often as they should.
If we print a line indicating that we've started building a subprogram,
then that's an "interruption" of the detailed output of the current
program; we need to "resume" the current program's logs before printing
any new detailed information that comes through.
2019-05-13 22:58:42 +00:00
Avery Pennarun
642d6fa193 Unset CDPATH if it is set.
Apparently on some (probably buggy) platforms, when CDPATH is set, the
'cd' command will always print text to stdout.  This caused failures 122
and 123 in shelltest.od.

CDPATH is an interactive-mode option, so let's clear it when running
scripts.
2019-05-01 13:17:35 -04:00
Avery Pennarun
7238b370e4 builder.py: create temp log file in the same directory as the final one.
We're going to rename() it from the temp name to the final name, which
doesn't work across filesystems, so the safest option is to keep it in
the same directory.

Reported-by: spacefrogg@meterriblecrew.net
2019-03-12 00:03:34 -04:00
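A sketch of the pattern (hypothetical helper name, not builder.py's code): the temp file is created in the same directory as the final name, because `os.rename()` cannot move a file across filesystems (it fails with EXDEV) and /tmp may live on a different filesystem than the build tree.

```python
import os
import tempfile

def replace_log(final_path, data):
    """Atomically replace final_path by renaming a same-directory temp file."""
    dirname = os.path.dirname(final_path) or '.'
    fd, tmp = tempfile.mkstemp(prefix='.redo-log.', dir=dirname)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    os.rename(tmp, final_path)   # same filesystem guaranteed: rename works
```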
Avery Pennarun
cead02bd21 redo-log: sometimes print a (resumed) line after ending a level of recursion.
If A calls B, and B produces stderr output, and then A wants to produce
output, the resulting log would be confusing: we'd see 'redo A' and
then 'redo B' and then B's output, but no indicator that B has ended
and we're back in A.  Now we show 'redo A (resumed)' before A's output.

If B didn't produce any output, or A doesn't produce any output, we
don't bother with the (resumed) line.  This seems nice, as it doesn't
clutter the log when there is no ambiguity anyway.
2019-03-02 19:08:47 -05:00
Avery Pennarun
b196315222 Change -x/-v to only affect top-level targets by default, not recursively.
Because redo targets are nicely isolated (unlike make targets), you
usually only want to debug one of them at a time.  Using -x could be
confusing, because you might end up with a dump of output from a
dependency you're not interested in.

Now, by default we'll disable -x when recursing into sub-targets, so
you only see the trace from the targets you are actually trying to
debug.  To get recursive behaviour, specify -x twice, eg. -xx.

Same idea with -v.
2019-03-02 18:46:00 -05:00
Avery Pennarun
7b4e3326bd logs.py: don't print (unchanged) lines with --no-log unless DEBUG.
This accidentally made output look different with --no-log vs with
normal redo-log output, because redo-log has a -u option while plain
redo does not.

(This is on purpose.  When running redo, you only want to see the things
that actually happened, so it never passes -u to the auto-launched
redo-log instance.  But when reviewing logs later, you might want to
look at the past logs from building a component that was unchanged in
the most recent run.)
2019-03-02 18:41:37 -05:00
Avery Pennarun
63230a1ae3 builder.py: atomically replace the log for a given target.
Previously we were truncating the log if it existed.  This would cause
redo-log to produce invalid output if you had the following (admittedly
rare) sequence in a single session:
- start building X
- redo-log starts showing the log for X
- finish building X
- redo-log has not finished showing the log for X yet
- start building X again for some reason
- redo-log sees a truncated logfile.

Now, redo-log can finish reading the original file (which no longer has
a filename since it was overwritten) while the new file is being
created.
2019-03-02 04:05:36 -05:00
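The key property being relied on can be demonstrated directly (a generic sketch, not builder.py's code): `rename()` over an existing file leaves any already-open handle pointing at the old inode, so a reader mid-file can finish the old log while the new one appears under the same name.

```python
import os
import tempfile

d = tempfile.mkdtemp()
log = os.path.join(d, 'target.log')
with open(log, 'w') as f:
    f.write('first build\n')

reader = open(log)                       # redo-log is still mid-read here

tmp = log + '.tmp'
with open(tmp, 'w') as f:                # second build writes a new file...
    f.write('second build\n')
os.rename(tmp, log)                      # ...and atomically replaces the name

assert reader.read() == 'first build\n'  # old contents intact for the reader
reader.close()
```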
Avery Pennarun
8a97b0cb2c redo-ifchange regression: if REDO_LOG is not set, assume it's 1.
At some point this got broken during a refactoring.  The result was
that redo-ifchange, run from the command line (as opposed to inside a
.do script) would fail to start the log prettifier.
2019-03-02 04:05:36 -05:00
Avery Pennarun
83bc49512f Explicitly reject target/source filenames with newlines in them.
This avoids an ugly assertion failure when we try to log a message
containing an inner newline.
2019-03-02 04:05:36 -05:00
Avery Pennarun
e5a27f04e8 If redo searched all the way up to /default.do, it would run ./default.do instead.
This only happened if the containing project was buggy, ie. you tried
to build a target that has no .do file available anywhere. However, it
resulted in a confusing outcome for that case, where we'd run the wrong
default.do file with the wrong parameters.

Extended an existing test to catch this mistake.
2019-03-02 04:05:36 -05:00
Avery Pennarun
90989d1ffb Overridden files were accidentally getting reclassified as static.
This is relatively harmless, since we treat them *almost* identically,
except that we print a warning for overridden files to remind you that
something fishy is going on.

Add a test for the actual warning message to ensure it is printed.  (I
don't like tests for specific warning messages, but it was necessary in
this case.)
2019-03-02 04:05:36 -05:00
Avery Pennarun
8100aa4973 Certain redo post-build failures would still mark a target as built.
If we failed because:
- target dir doesn't exist
- failed to copy from stdout
- failed to rename $3

We would correctly return error 209, but the target would still be
marked as having been built, so redo-ifchange would not try to build it
next time, because we forgot to call sf.set_failed() in those cases.

minimal/do worked correctly.

Added a test to catch this in the future.
2019-03-02 04:05:36 -05:00
Michael Raitza
c18c4b92c9 Fix builder: Reinstate stderr instead of opening /dev/tty
Reconnect the builder's original stderr file descriptor after the logger has
finished its job.

Fixes that redo could not be run without a controlling terminal.
2019-02-14 10:12:44 +01:00
Avery Pennarun
3dbdfbc06f Better handling if parent closes REDO_CHEATFDS or MAKEFLAGS fds.
Silently recover if REDO_CHEATFDS file descriptors are closed, because
they aren't completely essential and MAKEFLAGS-related warnings already
get printed if all file descriptors have been closed.

If MAKEFLAGS --jobserver-auth flags are closed, improve the error
message so that a) it's a normal error instead of an exception and b)
we link to documentation about why it happens.  Also write some more
detailed documentation about what's going on here.
2019-01-18 00:11:48 +00:00
Avery Pennarun
61f3e4672e Workaround for completely broken file locking on Windows 10 WSL.
WSL (Windows Subsystem for Linux) provides a Linux-kernel-compatible ABI
for userspace processes, but the current version doesn't implement
fcntl() locks at all; it just always returns success.  See
https://github.com/Microsoft/WSL/issues/1927.

This causes us three kinds of problem:
  1. sqlite3 in WAL mode gives "OperationalError: locking protocol".
     1b. Other sqlite3 journal modes also don't work when used by
         multiple processes.
  2. redo parallelism doesn't work, because we can't prevent the same
target from being built several times simultaneously.
  3. "redo-log -f" doesn't work, since it can't tell whether the log
     file it's tailing is "done" or not.

To fix #1, we switch the sqlite3 journal back to PERSIST instead of
WAL.  We originally changed to WAL in commit 5156feae9d to reduce
deadlocks on MacOS.  That was never adequately explained, but PERSIST
still acts weird on MacOS, so we'll only switch to PERSIST when we
detect that locking is definitely broken.  Sigh.

To (mostly) fix #2, we disable any -j value > 1 when locking is broken.
This prevents basic forms of parallelism, but doesn't stop you from
re-entrantly starting other instances of redo.  To fix that properly,
we need to switch to a different locking mechanism entirely, which is
tough in python.  flock() locks probably work, for example, but
python's locking wrappers lie and just use fcntl locks for those.

To fix #3, we always force --no-log mode when we find that locking is
broken.
2019-01-02 14:49:33 -05:00
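Detecting that locking is "definitely broken" might be probed like this (a sketch under assumed semantics, not redo's actual detection code): hold an exclusive fcntl lock, then fork a child that attempts a conflicting non-blocking lock. On a working kernel the child's attempt fails with EAGAIN; on old WSL it "succeeds", revealing that locks don't actually exclude.

```python
import fcntl
import os
import tempfile

def fcntl_locks_broken():
    """Return True if conflicting fcntl locks are (wrongly) both granted."""
    f = tempfile.TemporaryFile()
    fcntl.lockf(f, fcntl.LOCK_EX)        # parent holds an exclusive lock
    pid = os.fork()
    if pid == 0:
        try:
            # A conflicting lock from another process must be refused.
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            os._exit(1)                  # granted anyway: locking is broken
        except OSError:
            os._exit(0)                  # refused: locking works
    _, status = os.waitpid(pid, 0)
    f.close()
    return os.WEXITSTATUS(status) == 1
```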
Avery Pennarun
19049d52fc jobserver: allow overriding the parent jobserver in a subprocess.
Previously, if you passed a -j option to a redo process in a redo or
make process hierarchy with MAKEFLAGS already set, it would ignore the
-j option and continue using the jobserver provided by the parent.

With this change, we instead initialize a new jobserver with the
desired number of tokens, which is what GNU make does in the same
situation.  A typical use case for this is to force serialization of
build steps in a subtree (by using -j1).  In make, this is often useful
for "fixing" makefiles that haven't been written correctly for parallel
builds.  In redo, that happens much less often, but it's useful at
least in unit tests.

Passing -j1 is relatively harmless (the redo you are starting inherits
a token anyway, so it doesn't create any new tokens).  Passing -j > 1
is more risky, because it creates new tokens, thus increasing the level
of parallelism in the system.  Because this may not be what you wanted,
we print a warning when you pass -j > 1 to a sub-redo.  GNU make gives
a similar warning in this situation.
2018-12-31 19:24:27 -05:00
Avery Pennarun
e247a72300 jobserver: don't release the very last token in wait_all().
After waiting for children to exit, we would release our own token, and
then the caller would immediately try to obtain a token again.  This
accounted for tokens correctly, but would pass tokens around the call
tree in unexpected ways.

For example, imagine we had only one token.  We call 'redo a1 a2', and
a1 calls 'redo b1 b2', and b1 calls 'redo c1'.  When c1 exits, it
releases its token, then tries to re-acquire it before exiting.  This
also includes 'redo b1 b2' and 'redo a1 a2' in the race for the token,
which means b1 might get suspended while *either* a2 or b2 starts
running.

This never caused a deadlock, even if a2 or b2 depends on b1, because
if they tried to build b1, they would notice it is locked, give up
their token, and wait for the lock.  c1 (and then b1) could then obtain
the token and immediately terminate, allowing progress to continue.

But this is not really the way we expect things to happen.  "Obviously"
what we want here is a straightforward stack unwinding: c1 should finish,
then b1, then b2, then a1, then a2.

The not-very-obvious symptom of this bug is that redo's unit tests
seemed to run in the wrong order when using -j1 --no-log.  (--log would
hide the problem by rearranging logs back into the right order!)
2018-12-31 19:02:55 -05:00
Avery Pennarun
174a093dc5 Don't set_checked() on is_override files.
If a file is overridden and then overridden again, this caused us to
rebuild only the first thing that depends on it, but not any subsequent
things, which is a pretty serious bug.

It turned out that t/350-deps-forget is already supposed to test this,
but I had cleverly encoded the wrong behaviour into the expected
results in the table-driven test.  I blame lack of sleep.  Anyway, I
fixed the test, which made it fail, and then fixed the code, which made
it pass.
2018-12-18 13:01:40 +00:00
Avery Pennarun
686c381109 Fix more inconsistent behaviour with symlinks in paths.
Both redo and minimal/do were doing slightly weird things with
symlinked directories, especially when combined with "..".  For
example, if x is a link to ., then x/x/x/x/../y should resolve to
"../y", which is quite non-obvious.

Added some tests to make sure this stays fixed.
2018-12-17 16:17:37 +00:00
Avery Pennarun
1f64cc4525 shelltest.od: add more "set -e" tests and add a 'skip' return code.
Based on the earlier t/000-set-minus-e bug in minimal/do on some
shells, let's add some extra tests that reveal the weirdness on those
shells.  Unfortunately because they are so popular (including bash and
zsh), we can't reject them outright for failing this one.

While we're here, add a new return code, "skip", which notes that a
test has failed but is not important enough to be considered a warning
or failure.  Previously we just had these commented out, which is not
quite obvious enough.

...and I updated a few comments while reviewing some of the older
tests.
2018-12-17 16:17:37 +00:00
Avery Pennarun
761b77333e redo/sh.do: include the 'lksh' variant of mksh.
This one attempts to be a much closer match to POSIX, and seems to
succeed, giving only warning W118.
2018-12-17 16:17:37 +00:00
Avery Pennarun
6cf06f707a shelltest.od: we accidentally treated some fails as mere warnings.
We were setting a global variable FAIL on failure, but if we failed
inside a subshell (which a very small number of tests might do), this
setting would be lost.  The script output (a series of failed/warning
lines) was still valid, but not the return code, so the shell might be
selected even if one of these tests failed.

To avoid the problem, put the fail/warning state in the filesystem
instead, which is shared across subshells.
2018-12-17 16:17:37 +00:00
Avery Pennarun
29f939013e Add a bunch of missing python docstrings.
This appeases pylint, so un-disable its docstring warning.
2018-12-14 09:03:53 +00:00
Avery Pennarun
39e017869d Ensure correct operation with read-only target dirs and .do file dirs.
Although I expect this is rather rare, some people may want to build in
a read-write subdir of a read-only tree.  Other than some confusing
error reporting, this works fine in redo after the recent changes to
temp file handling, but let's add a test to make sure it stays that
way.  The test found a bug in minimal/do, so let's fix that.

Reported-by: Jeff Stearns <jeff.stearns@gmail.com>
2018-12-13 13:28:44 +00:00
Avery Pennarun
d95277d121 Use mkstemp() to create the stdout temp file, and simplify $3 path.
Previously, we'd try to put the stdout temp file in the same dir as the
target, if that dir exists.  Otherwise we'd walk up the directory tree
looking for a good place.  But this would go wrong if the directory we
chose got *deleted* during the run of the .do file.

Instead, we switch to an entirely new design: we use mkstemp() to
generate a temp file in the standard temp file location (probably
/tmp), then open it and immediately delete it, so the .do file can't
cause any unexpected behaviour.  After the .do file exits, we use our
still-open fd to the stdout file to read the content back out.

In the old implementation, we also put the $3 in the "adjusted"
location that depended on whether the target dir already existed, just for
consistency.  But that was never necessary: we didn't create the $3
file, and if the .do script wants to write to $3, it should create the
target dir first anyway.  So change it to *always* use a $3 temp
filename in the target dir, which is much simpler and so has fewer edge
cases.

Add t/202-del/deltest4 with some tests for all these edge cases.

Reported-by: Jeff Stearns <jeff.stearns@gmail.com>
2018-12-13 13:28:44 +00:00
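The open-then-delete trick at the heart of the new design, in miniature (a generic sketch of the technique, not builder.py itself): once the name is unlinked, a .do script's `rm -rf` can never find or clobber the file, but the still-open fd keeps it alive until we rewind and read the output back.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(prefix='redo-stdout-')
os.unlink(path)                     # anonymous now; lives until fd closes
os.write(fd, b'captured output\n')  # what the .do script printed
os.lseek(fd, 0, os.SEEK_SET)        # rewind to read the content back out
data = os.read(fd, 100)
os.close(fd)
```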
Avery Pennarun
1f79bf1174 Detect when a .do script deletes its stdout tmp file.
This can happen if we create the .tmp file in the same directory as the
target, and the .do file first does "rm -rf" on that directory, then
re-creates it.  The result is that the stdout file is lost.

We'll make this a warning if the .do script *didn't* write to stdout
(so the loss is harmless, just weird), and an error if they *did* write
to stdout, which we can detect because we still have an open fd on the
file, so we can fstat() it.
2018-12-12 03:45:33 +00:00
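The fstat()-based detection can be sketched in a few lines (generic illustration of the technique): with an fd still open, `os.fstat()` works even after the file has been deleted; `st_nlink` dropping to 0 signals the deletion, and `st_size` shows whether the script wrote any output (harmless warning if empty, hard error if not).

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b'some output')
os.unlink(path)              # e.g. the .do script did rm -rf on the dir
st = os.fstat(fd)            # still works: we hold an open fd
os.close(fd)
assert st.st_nlink == 0      # the file was deleted out from under us
assert st.st_size == len(b'some output')   # ...and it had output
```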
Avery Pennarun
2b4fe812e2 Some renaming and comments to try to clarify builder and jobserver.
The code is still a bit spaghetti-like, especially when it comes to
redo-unlocked, but at least the new names are slightly more
comprehensible.
2018-12-11 04:17:27 +00:00
Avery Pennarun
4d2b4cfccb Make calls to logs.setup() explicit in each cmd.
Further reducing magic implicit behaviour to make code easier to
follow.
2018-12-11 02:35:11 +00:00
Avery Pennarun
474e12eed8 Fix minimal/do and tests when built in a path containing spaces.
Basically all just missing quotes around shell strings that use $PWD.
Since redo uses relative paths, most paths inside a project only need
to worry about spaces when project-internal directories or filenames
contain them.

Reported-by: Jeff Stearns <jeff.stearns@gmail.com>
2018-12-11 01:22:29 +00:00
Avery Pennarun
bd8dbfb487 Switch to module-relative import syntax.
Now that the python scripts are all in a "redo" python module, we can
use the "new style" (ahem) package-relative imports.  This appeases
pylint, plus avoids confusion in case more than one package has
similarly-named modules.
2018-12-05 02:34:36 -05:00
Avery Pennarun
0b648521fd Move setproctitle() stuff into title.py.
This removes another instance of magical code running at module import
time.  And the process title wasn't really part of the state database
anyway.

Unfortunately this uncovered a bug: the recent change to use
'python -S' makes it not find the setproctitle module if installed.

My goodness, I hate the horrible python easy_install module gunk that
makes startup linearly slower the more modules you have installed,
whether you import them or not, if you don't use -S.  But oh well,
we're stuck with it for now.
2018-12-05 02:28:34 -05:00
Avery Pennarun
9b6d1eeb6e env and env_init: Eliminate weird auto-initialization of globals.
Merge the two files into env, and make each command explicitly call the
function that sets it up in the way that's needed for that command.

This means we can finally just import all the modules at the top of
each file, without worrying about import order.  Phew.

While we're here, remove the weird auto-appending-'all'-to-targets
feature in env.init().  Instead, do it explicitly, and only from redo and
redo-ifchange, only if is_toplevel and no other targets are given.
2018-12-05 02:27:04 -05:00
Avery Pennarun
99188bef0d Rename redo/python -> redo/py.
This avoids a name overlap with the system-installed copy of python.
Since redo adds the redo/ dir to the $PATH before running .do files,
python.do might see its own previously-created target instead of the
"real" python when testing, and create an infinite loop by accident.
2018-12-05 02:27:04 -05:00
Avery Pennarun
f1305b49eb Move env.{add,get}_lock() into cycles.py, and rename.
They really aren't locks at all, they're a cycle detector.  Also rename
REDO_LOCKS to a more meaningful REDO_CYCLES.  And we'll move the
CyclicDependencyError exception in here as well, instead of state.py
where it doesn't really belong.
2018-12-05 02:26:58 -05:00
Avery Pennarun
ded14507b0 Rename vars{,_init}.py -> env{,_init}.py.
This fixes some pylint 'redefined builtins' warnings.  While I was
here, I fixed the others too by renaming a few local variables.
2018-12-05 02:26:49 -05:00
Avery Pennarun
65cf1c9854 Rename jwack.py -> jobserver.py.
I'm not really sure why I called it jwack.  I think it was kind of a
wack jobserver(tm).  But nowadays most of the wack-ness is gone.
2018-12-05 00:22:10 -05:00
Avery Pennarun
6e96395d48 redo/version: fix pylint warning. 2018-12-05 00:22:10 -05:00