In flush-cache.sh, we have to do this, because the sqlite3 command-line tool
sets it to zero. Inevitably during parallel testing, it'll end up
contending for a lock, and we really want it to wait a bit.
In state.py, it's not as important since the default is nonzero. But
python-sqlite3's default of 5 seconds makes me a little too nervous; I can
imagine a disk write waiting for more than 5 seconds sometime. So let's use
60 instead.
It passes all tests when run serialized, but still gives weird errors
(OperationalError: database is locked) when run with -j5. sqlite3 shouldn't
be barfing just because the database is locked, since the default timeout is
5 seconds, and it's dying *way* faster than that.
This allows files to transition from generated to not-generated if the .do
file is ever removed (ie. the user is changing things and the file is now a
source file, not a target).
So if you have a default.do, it might be used to build mydir, even if mydir
already existed when we started.
This might be useful if you have a mydir.setup.do and a mydir.do; mydir.do
depends on mydir.setup.do, but mydir.setup.do creates mydir, it just doesn't
*finish* with mydir. Still, my the time mydir.do runs, mydir already
exists, and redo would get confused.
I think directories are fundamentally special in this way, because it makes
sense to "create" a directory even if that directory isn't "done" at this
phase.
The interaction of REDO_STARTDIR, REDO_PWD, and getcwd() are pretty
complicated. In this case, we accidentally assumed that the current
instance of redo was running with getcwd() == REDO_STARTDIR+REDO_PWD, and so
the new target was REDO_STARTDIR+REDO_PWD+t, but this isn't the case if the
current .do script did chdir().
The correct answer is REDO_STARTDIR+getcwd()+t.
If a and b both depend on c, and c is a static (non-generated) file that has
changed since the last successful build of a and b, we would try to redo
a, but would forget to redo b. Now it does both.
We never chdir() except just as we exec a subprocess, so it's okay to cache
this value. This makes strace output look cleaner, and speeds things up a
little bit when checking a large number of dependencies.
Relatedly, take a debug2() message and put it in an additional if, so that
we don't have to do so much work to calculate it when we're just going to
throw it away anyhow.
If a file previously was generated but now isn't (ie. its .do file
disappears), we would never re-stamp that target, and so all its
dependencies would rebuild continually.
We would build 'somefile' correctly the first time, but we wouldn't
attach the dependency on somefile to the right $TARGET, so our target would
not auto-rebuild in the future based on somefile.
It actually decreases readability of the .do files - by not making it
explicit when you're going into a subdir.
Plus it adds ambiguity: what if there's a dirname.do *and* a dirname/all?
We could resolve the ambiguity if we wanted, but that adds more code, while
taking out this special case makes *less* code and improves readability.
I think it's the right way to go.
Normally, creating the target $1 yourself is bad; create $3 instead. But if
$1 is a directory, we'll allow it. That way 'redo subdir' can call
subdir.do, and subdir.do can both create the directory *and* run a bunch of
sub-.do files on it.
We had a bug (fixed in the previous commit) where doing 'redo-ifchange
dirname' (which runs dirname/all.do) would not create the stamp correctly,
so that it would always show up as dirty.
It's a little bit complicated to simulate, but this does it.
The behaviour is what we wanted, but it shouldn't have worked. So fix the
bug in redo-ifchange, then change test.do to use 'redo' instead so it
continues to do what we want, only for the right reason.
(The bug is that 'redo-ifchange dirname', which runs dirname/all.do, didn't
result in stamps getting checked correctly.)
Unfortunately it failed before the previous patch, so that's why this test
is needed :(
The test is a little ugly, because the bug I'm testing for didn't happen
except if you ran 'redo' two times in a row, not two times inside the same
redo session. That's because dependency caching inside the one session
prevents the accidental rebuild.
...because we deliberately stamp non-generated files as well, and that
doesn't need to imply that we rebuilt them just now. In fact, we know for a
fact that we *didn't* rebuild them just now, but we still need to record the
timestamp for later.
.do files should never modify $1, and should write to *either* $3 or stdout,
but not both. If they write to both, it's probably because they forgot to
redirect stdout to stderr, a very easy mistake to make but a hard one to
detect.
Now redo detects it for you and prints an informative message.
If 'redo clean' deletes the lockfile after trylock() succeeds but before
unlock(), then unlock() won't be able to open the pipe in order to release
readers, and any waiters might end up waiting forever.
We can't open the fifo for write until there's at least one reader, so let's
open a reader *just* to let us open a writer. Then we'll leave them open
until the later unlock(), which can just close them both.
Again, I forgot to make vars.py not crash if the variables aren't set, so we
can print a useful error message. But this time I have the right solution:
vars.py will do the checking for itself, and abort with a nice message.
Otherwise it could be inherited by other jwack jobs started from the same
parent process, resulting in some very-hard-to-debug race conditions, let me
tell you.
This avoids a second fork altogether, which apparently doesn't matter all
that much (5.3s -> 5.15s). However, simply not importing the subprocess
module reduces us further to 4.6s.
And 'ps axf' now looks prettier because we have fewer stupid intermediate
processes.
instead of inside the fork.
Still doesn't seem to affect runtime. Good.
One nice side effect is jwack.py no longer needs to know anything about our
locks.
We can also avoid forking altogether if should_build() returns false. This
doesn't seem to result in any noticeable speedup, but it's cleaner at least.
Previously, the default was to *always* keep going, which is actually not
usually what you want. Now we actually exit correctly after an error. Of
course you still might have multiple errors before existing if you were
building in parallel.
Get rid of the "locked..." and "...unlocked!" messages by default, since
they're not usually interesting. But add a new option to bring them back in
case we end up with trouble debugging the locking stuff. (I don't really
100% trust it yet, although I haven't had a problem for a while now.)
Now 'redo test' runs the tests, but 'redo t' just builds the programs.
Also removed wvtest stuff; we're not really using it properly anyway and
it's not helping our testing right now. It might come back later.