The builder was holding lock variables in the loop which means that
sometimes a state.Lock object would be created for the same file-id
twice, triggering the assertion. Assign the lock variables to None to
ensure that the state.Lock objects are destroyed before creating the
next one in the loop.
If a depends on b which depends on a, redo would just freeze. Now it
aborts with a somewhat helpful error message.
[Updated by apenwarr for coding style and to add a test.]
Removed information (ctime) that was causing spurious
"you modified it" warnings.
[apenwarr: This was not quite right. ctime includes some things we
don't want to care about, such as link count, so we have to remove it.
But it also notices changes to st_uid, st_gid, and st_mode, which we do
care about, so now we have to include those explicitly.]
[apenwarr's note: ctime includes extra inode attributes like link
count, which are not important for this check, but which could cause
spurious warnings.]
The >& form is only for file descriptors, passing a file name there is
a bash extension.
$ /bin/dash -c 'echo foo >&/dev/null'
/bin/dash: 1: Syntax error: Bad fd number
WAL mode does make the deadlocking on MacOS go away. This suggests
that pysqlite3 was leaving *read* transactions open for a long time; in
old-style sqlite journals, this prevents anyone from obtaining a write
lock, although it doesn't prevent other concurrent reads.
With WAL journals, writes can happen even while readers are holding a
lock, but the journal doesn't flush until the readers have released it.
This is not a "real" fix but it's fairly harmless, since all redo
instances will exit eventually, and when they do, the WAL journal can
be flushed.
I think we were sometimes leaving half-done sqlite transactions sitting
around for a long time (eg. across sub-calls to .do files). This
seemed to be okay on Linux, but caused sqlite deadlocks on MacOS. Most
likely it's not the operating system, but the sqlite version and
journal mode in use.
In any case, the correct thing to do is to actually commit or rollback
transactions, not leave them hanging around.
...unfortunately this doesn't actually fix my MacOS deadlocks, which
makes me rather nervous.
If ./default.do knows how to build x/y/z, then we will run
./default.do x/y/z x/y/z x__y__z.redo2.tmp
which can correctly generate $3, but then we can fail to rename it to
x/y/z because x/y doesn't exist. This would previously through an
exception. Now it prints a helpful error message.
default.do may create x/y, in which case renaming will succeed.
I think this aligns better with how redo works. Otherwise, if a.do
creates a as a symlink, then changes to the symlink's *target* will
change a's stat/stamp information without re-running a.do, which looks
to redo like you modified a by hand, which causes it to stop running
a.do altogether.
With this change, modifications to a's target are okay, but they don't
trigger any redo dependency changes. If you want that, then a.do
should redo-ifchange on its symlink target explicitly.
For example:
$ redo-whichdo a/b/c/.x.y
- a/b/c.x.y.do
- a/b/default.x.y.do
- a/b/default.y.do
- a/b/default.do
- a/default.x.y.do
- a/default.y.do
- a/default.do
- default.x.y.do
- default.y.do
+ default.do
1 a/b/c.x.y
2 a/b/c.x.y
Lines starting with '-' mean a potential .do file that did not exist,
so we moved onto the next choice (but consider using redo-ifcreate in
case it gets created). '+' means the .do file we actually chose. '1'
and '2' are the $1 and $2 to pass along to the given .do file if you want to
call it for the given target.
(The output format is a little weird to make sure it's parseable with
sh 'read x y' calls, even when filenames contain spaces or special
characters.)
GNU make post-4.2 renamed the --jobserver-fds option to
--jobserver-auth. For compatibility with both older and newer
versions, when we set MAKEFLAGS we set both, and when we read MAKEFLAGS
we will accept either one.
Also, when MAKEFLAGS was not already set, redo would set a MAKEFLAGS with a
leading 'None' string, which was incorrect. It should be the empty
string instead.
...or files that contain bytes but not a trailing newline. It's okay if we
don't get any data, but we definitely have to *not* let "set -e" abort us.
Now that we fixed set -e in the previous patch, it revealed this problem.
If you run something like
blah_function || return 1
then everything even *inside* blah_function is *not* subject to the "set -e"
that would otherwise be in effect. That's true even for ". subfile" inside
blah_function - which is exactly how minimal/do runs .do files.
Instead, rewrite it as
blah_function
[ "$?" = "0" ] || return 1
And add a bit to the unit tests to ensure that "set -e" behaviour is enabled
in .do files as we expect, and crash loudly otherwise.
(This weird behaviour may only happen in some shells and not others.)
Also, we had a "helpful" alias of redo() defined at the bottom of the file.
Combined with the way we use '.' to source the .do files, this would make it
not start a new shell just to run a recursive 'redo' command. It almost
works, but this stupid "set -e" bug could cause a nested .do file to not
honour "set -e" if someone ran "redo foo || exit 1" from inside a .do
script. The performance optimization is clearly not worth it here, so
rename it to _redo(); that causes it to actually re-exec the redo program
(which is a symlink to minimal/do).
If you use "redo --old-args", it will switch back to the old
(apenwarr-style) arguments for now, to give you time to update your .do
scripts. This option will go away eventually.
Note: minimal/do doesn't understand the --old-args option. If you're using
minimal/do in your project, keep using the old one until you update your use
of $1/$2, and then update to the new one.
apenwarr-style default.o.do:
$1 foo
$2 .o
$3 whatever.tmp
djb-style default.o.do:
$1 foo.o
$2 foo
$3 whatever.tmp
apenwarr-style foo.o.do:
$1 foo.o
$2 ""
$3 whatever.tmp
djb-style foo.o.do:
$1 foo.o
$2 foo.o (I think?)
$3 whatever.tmp
If the error message only triggered a shelltest warning instead of a
failure, then they're harmless, so we might as well silence them when
running 'redo test' (which runs t/shelltest.do, which enables
SHELLTEST_QUIET). We still want to print them when actually testing out
shelltest.od, though, to help with debugging the script.
If y contains a * character, it's a case-esac style wildcard unless it's in
"" or '', which tells it to match a literal *. But in dash/busybox, the ''
quoting doesn't actually work, only "" does.
Since there's a workaround - just always use "" quoting - it's not that
fatal. If the "" quoting doesn't work, then we'll make it fail.
Previously, if 'redo-ifchange foo' failed last time, then creating foo
manually wouldn't help; 'redo-ifchange foo' would still try to rebuild it.
But if the first run *did* create it, then manually overriding it *did*
work.
That inconsistency is pointless. If the user creates it by hand, it doesn't
matter if it failed to build last time or not; the user wants it overridden.
So this way, something that can't build can at least be manually created as
a hack.