The interaction of REDO_STARTDIR, REDO_PWD, and getcwd() are pretty
complicated. In this case, we accidentally assumed that the current
instance of redo was running with getcwd() == REDO_STARTDIR+REDO_PWD, and so
the new target was REDO_STARTDIR+REDO_PWD+t, but this isn't the case if the
current .do script did chdir().
The correct answer is REDO_STARTDIR+getcwd()+t.
If 'redo clean' deletes the lockfile after trylock() succeeds but before
unlock(), then unlock() won't be able to open the pipe in order to release
readers, and any waiters might end up waiting forever.
We can't open the fifo for write until there's at least one reader, so let's
open a reader *just* to let us open a writer. Then we'll leave them open
until the later unlock(), which can just close them both.
Now t/curse passes again when parallelized (except for the countall
mismatch, since we haven't fixed the source of that problem yet). At least
it's consistent now.
There's a bunch of stuff rearranged in here, but the actual important
problem was that we were doing unlink() on the lock fifo even if ENXIO,
which meant a reader could connect in between ENXIO and unlink(), and thus
never get notified of the disconnection. This would cause the build to
randomly freeze.
The problem is that redo-ifchange has a different $PWD than its
sub-dependencies, so as it's chasing them down, fixing up the relative paths
totally doesn't work at all.
There's probably a much smarter fix than this, but it's too late at night to
think of it right now.
atoi() was getting redundant, and unfortunately we can't easily load
helpers.py in some places where we'd want to, because it depends on vars.py.
So move it to its own module.
It used to say:
redo: t/all
redo: hello
and now it says:
redo t/all
redo t/hello
ie. there's no colon, and the path is intact. That means if the build
fails, you can cut-and-paste 'redo t/hello', add a -v, and try to debug
what went wrong.