Because redo targets are nicely isolated (unlike make targets), you
usually only want to debug one of them at a time. Using -x could be
confusing, because you might end up with a dump of output from a
dependency you're not interested in.
Now, by default we'll disable -x when recursing into sub-targets, so
you only see the trace from the targets you are actually trying to
debug. To get recursive behaviour, specify -x twice, eg. -xx.
Same idea with -v.
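A rough sketch of the intended semantics, assuming the -x count gets
handed down to sub-invocations (how it's actually stored and passed may
differ):

    def xtrace_for_children(level):
        # -x  (level 1): trace only the targets named on the command line.
        # -xx (level 2+): keep tracing when recursing into sub-targets.
        return level if level >= 2 else 0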
Previously we were truncating the log if it existed. This would cause
redo-log to produce invalid output if you had the following (admittedly
rare) sequence in a single session:
- start building X
- redo-log starts showing the log for X
- finish building X
- redo-log has not finished showing the log for X yet
- start building X again for some reason
- redo-log sees a truncated logfile.
Now, redo-log can finish reading the original file (which no longer has
a filename since it was overwritten) while the new file is being
created.
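Roughly, the writer side now works like this sketch (names are invented;
the point is renaming a fresh file into place instead of truncating):

    import os, tempfile

    def start_log(logname):
        # Create a brand-new file and rename it over the old log, rather
        # than truncating the old one in place.  A redo-log still tailing
        # the old file keeps its fd to the old inode and can finish
        # reading it, even though the name now points at the new file.
        fd, tmpname = tempfile.mkstemp(dir=os.path.dirname(logname) or '.')
        os.rename(tmpname, logname)
        return os.fdopen(fd, 'w')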
This only happened if the containing project was buggy, ie. you tried
to build a target that has no .do file available anywhere. However, it
resulted in a confusing outcome for that case, where we'd run the wrong
default.do file with the wrong parameters.
Extended an existing test to catch this mistake.
This is relatively harmless, since we treat them *almost* identically,
except that we print a warning for overridden files to remind you that
something fishy is going on.
Add a test for the actual warning message to ensure it is printed. (I
don't like tests for specific warning messages, but it was necessary in
this case.)
If we failed because:
- target dir doesn't exist
- failed to copy from stdout
- failed to rename $3
We would correctly return error 209, but the target would still be
marked as having been built, so redo-ifchange would not try to build it
next time, because we forgot to call sf.set_failed() in those cases.
minimal/do worked correctly.
Added a test to catch this in the future.
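For illustration, the shape of the fix is roughly this (sf is the
build-state object mentioned above; everything else is simplified for
the example):

    import os, shutil

    def commit_target(sf, tmp_stdout, tmp3, target):
        try:
            if tmp_stdout is not None:
                shutil.move(tmp_stdout, target)  # may fail: target dir gone, etc.
            elif os.path.exists(tmp3):
                os.rename(tmp3, target)          # may fail the same way
        except OSError:
            sf.set_failed()   # without this, the target looked "built"
            raise             # still reported as error 209 to the caller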
Reconnect the builder's original stderr file descriptor after the logger has
finished its job.
This fixes a bug where redo could not be run without a controlling
terminal.
Silently recover if REDO_CHEATFDS file descriptors are closed, because
they aren't completely essential and MAKEFLAGS-related warnings already
get printed if all file descriptors have been closed.
If the file descriptors named by the MAKEFLAGS --jobserver-auth flag are
closed, improve the error message so that a) it's a normal error instead
of an exception and b)
we link to documentation about why it happens. Also write some more
detailed documentation about what's going on here.
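For example, a closed inherited descriptor can be detected with a check
along these lines (a sketch, not necessarily the exact test redo uses):

    import errno, fcntl

    def fd_is_open(fd):
        # F_GETFD fails with EBADF if the descriptor named in MAKEFLAGS
        # --jobserver-auth (or REDO_CHEATFDS) has already been closed.
        try:
            fcntl.fcntl(fd, fcntl.F_GETFD)
            return True
        except (IOError, OSError) as e:
            if e.errno == errno.EBADF:
                return False
            raise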
WSL (Windows Subsystem for Linux) provides a Linux-kernel-compatible ABI
for userspace processes, but the current version doesn't implement
fcntl() locks at all; it just always returns success. See
https://github.com/Microsoft/WSL/issues/1927.
This causes us three kinds of problem:
1. sqlite3 in WAL mode gives "OperationalError: locking protocol".
1b. Other sqlite3 journal modes also don't work when used by
multiple processes.
2. redo parallelism doesn't work, because we can't prevent the same
target from being built several times simultaneously.
3. "redo-log -f" doesn't work, since it can't tell whether the log
file it's tailing is "done" or not.
To fix #1, we switch the sqlite3 journal back to PERSIST instead of
WAL. We originally changed to WAL in commit 5156feae9d to reduce
deadlocks on MacOS. That was never adequately explained, but PERSIST
still acts weird on MacOS, so we'll only switch to PERSIST when we
detect that locking is definitely broken. Sigh.
To (mostly) fix #2, we disable any -j value > 1 when locking is broken.
This prevents basic forms of parallelism, but doesn't stop you from
re-entrantly starting other instances of redo. To fix that properly,
we need to switch to a different locking mechanism entirely, which is
tough in python. flock() locks probably work, for example, but
python's locks lie and just use fcntl locks for those.
To fix #3, we always force --no-log mode when we find that locking is
broken.
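For reference, broken fcntl() locking can be detected at runtime with a
probe roughly like this (fcntl locks don't conflict within a single
process, so we have to fork; the real detection code may differ):

    import fcntl, os, tempfile

    def fcntl_locks_broken():
        # Take an exclusive lock, then see whether a child process can
        # take it too.  On a working kernel the child's attempt fails
        # with EAGAIN/EACCES; on WSL it bogusly succeeds.
        fd, path = tempfile.mkstemp()
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            pid = os.fork()
            if pid == 0:  # child
                try:
                    cfd = os.open(path, os.O_RDWR)
                    fcntl.lockf(cfd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    os._exit(1)   # second lock "worked": locking is broken
                except (IOError, OSError):
                    os._exit(0)   # second lock refused: locking works
            _, status = os.waitpid(pid, 0)
            return os.WEXITSTATUS(status) == 1
        finally:
            os.close(fd)
            os.unlink(path)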
After waiting for children to exit, we would release our own token, and
then the caller would immediately try to obtain a token again. This
accounted for tokens correctly, but would pass tokens around the call
tree in unexpected ways.
For example, imagine we had only one token. We call 'redo a1 a2', and
a1 calls 'redo b1 b2', and b1 calls 'redo c1'. When c1 exits, it
releases its token, then tries to re-acquire it before exiting. This
also includes 'redo b1 b2' and 'redo a1 a2' in the race for the token,
which means b1 might get suspended while *either* a2 or b2 starts
running.
This never caused a deadlock, even if a2 or b2 depends on b1, because
if they tried to build b1, they would notice it is locked, give up
their token, and wait for the lock. c1 (and then b1) could then obtain
the token and immediately terminate, allowing progress to continue.
But this is not really the way we expect things to happen. "Obviously"
what we want here is a straightforward stack unwinding: c1 should finish,
then b1, then b2, then a1, then a2.
The not-very-obvious symptom of this bug is that redo's unit tests
seemed to run in the wrong order when using -j1 --no-log. (--log would
hide the problem by rearranging logs back into the right order!)
If a file is overridden and then overridden again, we would rebuild
only the first thing that depends on it, but not any subsequent
things, which is a pretty serious bug.
It turned out that t/350-deps-forget is already supposed to test this,
but I had cleverly encoded the wrong behaviour into the expected
results in the table-driven test. I blame lack of sleep. Anyway, I
fixed the test, which made it fail, and then fixed the code, which made
it pass.
Although I expect this is rather rare, some people may want to build in
a read-write subdir of a read-only tree. Other than some confusing
error reporting, this works fine in redo after the recent changes to
temp file handling, but let's add a test to make sure it stays that
way. The test found a bug in minimal/do, so let's fix that.
Reported-by: Jeff Stearns <jeff.stearns@gmail.com>
Previously, we'd try to put the stdout temp file in the same dir as the
target, if that dir exists. Otherwise we'd walk up the directory tree
looking for a good place. But this would go wrong if the directory we
chose got *deleted* during the run of the .do file.
Instead, we switch to an entirely new design: we use mkstemp() to
generate a temp file in the standard temp file location (probably
/tmp), then open it and immediately delete it, so the .do file can't
cause any unexpected behaviour. After the .do file exits, we use our
still-open fd to the stdout file to read the content back out.
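Condensed, the new flow is something like this (run_do stands in for
"run the .do script"; the real code is more involved):

    import os, tempfile

    def capture_stdout(run_do):
        # The temp file lives in the system temp dir and is unlinked right
        # away, so nothing the .do script does to the filesystem (rm -rf,
        # mkdir, ...) can make us lose it; the open fd keeps it alive.
        fd, tmpname = tempfile.mkstemp(prefix='redo-stdout.')
        os.unlink(tmpname)
        try:
            run_do(stdout_fd=fd)                       # .do writes via this fd
            os.lseek(fd, 0, os.SEEK_SET)               # rewind...
            return os.read(fd, os.fstat(fd).st_size)   # ...and read it back
        finally:
            os.close(fd)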
In the old implementation, we also put the $3 in the "adjusted"
location that depended on whether the target dir already existed, just for
consistency. But that was never necessary: we didn't create the $3
file, and if the .do script wants to write to $3, it should create the
target dir first anyway. So change it to *always* use a $3 temp
filename in the target dir, which is much simpler and so has fewer edge
cases.
Add t/202-del/deltest4 with some tests for all these edge cases.
Reported-by: Jeff Stearns <jeff.stearns@gmail.com>
This can happen if we create the .tmp file in the same directory as the
target, and the .do file first does "rm -rf" on that directory, then
re-creates it. The result is that the stdout file is lost.
We'll make this a warning if the .do script *didn't* write to stdout
(so the loss is harmless, just weird), and an error if it *did* write
to stdout, which we can detect because we still have an open fd on the
file, so we can fstat() it.
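The check itself is tiny; something like this (the function name is
invented for the example):

    import os

    def wrote_to_stdout(fd):
        # The .tmp file may be gone from disk, but our fd is still open,
        # so fstat() still works and tells us whether the .do script
        # actually produced any stdout output.
        return os.fstat(fd).st_size > 0

    # missing tmp file, wrote_to_stdout(fd) is True   -> error
    # missing tmp file, nothing was written           -> just a warning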
Now that the python scripts are all in a "redo" python module, we can
use the "new style" (ahem) package-relative imports. This appeases
pylint, plus avoids confusion in case more than one package has
similarly-named modules.
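For example, a command module can now do (module names here are just
examples):

    # in one of the redo/cmd_*.py files:
    from . import env, state   # package-relative, instead of a bare "import state"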
Merge the two files into env, and make each command explicitly call the
function that sets it up in the way that's needed for that command.
This means we can finally just import all the modules at the top of
each file, without worrying about import order. Phew.
While we're here, remove the weird auto-appending-'all'-to-targets
feature in env.init(). Instead, do it explicitly, and only from redo and
redo-ifchange, only if is_toplevel and no other targets are given.
They really aren't locks at all, they're a cycle detector. Also rename
REDO_LOCKS to a more meaningful REDO_CYCLES. And we'll move the
CyclicDependencyError exception in here as well, instead of state.py
where it doesn't really belong.
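Roughly, the cycle detector just remembers which targets are already
being built up the call chain; a sketch (the env var encoding here is an
assumption, not necessarily redo's real format):

    import os

    class CyclicDependencyError(Exception):
        pass

    def check_cycle(target):
        # REDO_CYCLES carries the chain of targets our ancestors are
        # already building; building one of them again means a loop.
        chain = [t for t in os.environ.get('REDO_CYCLES', '').split(':') if t]
        if target in chain:
            raise CyclicDependencyError(target)
        os.environ['REDO_CYCLES'] = ':'.join(chain + [target])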
It's time to start preparing for a version of redo that doesn't work
unless we build it first (because it will rely on C modules, and
eventually be rewritten in C altogether).
To get rolling, remove the old-style symlinks to the main programs, and
rename those programs from redo-*.py to redo/cmd_*.py. We'll also move
all library functions into the redo/ dir, which is a more python-style
naming convention.
Previously, install.do was generating wrappers for installing in
/usr/bin, which extend sys.path and then import+run the right file.
This made "installed" redo work quite differently from running redo
inside its source tree. Instead, let's always generate the wrappers in
bin/, and not make anything executable except those wrappers.
Since we're generating wrappers anyway, let's actually auto-detect the
right version of python for the running system; distros can't seem to
agree on what to call their python2 binaries (sigh). We'll fill in the
right #! shebang lines. Since we're doing that, we can stop using
/usr/bin/env, which will a) make things slightly faster, and b) let us
use "python -S", which tells python not to load a bunch of extra crap
we're not using, thus improving startup times.
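Conceptually, each generated wrapper ends up looking something like this
(the python path, the sys.path entry, and the main() entry point are all
illustrative; the real values are filled in when the wrapper is
generated):

    #!/usr/bin/python2 -S
    import sys
    sys.path.insert(0, '/path/to/redo/lib')   # filled in at build time
    from redo import cmd_ifchange              # hypothetical module name
    cmd_ifchange.main()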
Annoyingly, we now have to build redo using minimal/do, then run the
tests using bin/redo. To make this less annoying, we add a toplevel
./do script that knows the right steps, and a Makefile (whee!) for
people who are used to typing 'make' and 'make test' and 'make clean'.