If a checksummed target A used to exist but is now missing, and we tried to
redo-ifchange that exact file, we would unnecessarily run 'redo-oob A A';
that is, we have to build A in order to determine if A needs to be built.
The sub-targets of redo-oob aren't run with REDO_UNLOCKED, so this would
deadlock instantly.
Add an assertion to redo-oob to ensure we never try to redo-ifchange the
primary target (thus converting the deadlock into an exception). And skip
doing redo-oob when the target is already the same as the thing we have to
check.
We were giving up and rebuilding the toplevel object, which did eventually
rebuild our checksummed file, but then the file turned out to be identical
to what it was before, so that nobody *else* who depended on it ended up
getting rebuilt. So the results were indeterminate.
Now we treat it as if its dirtiness is unknown, so we build it using
redo-oob before building any of its dependencies.
If a depends on b depends on c, and c is dirty but b uses redo-stamp
checksums, then 'redo-ifchange a' is indeterminate: we won't know if we need
to run a.do unless we first build b, but the script that *normally* runs
'redo-ifchange b' is a.do, and we don't want to run that yet, because we
don't know for sure if b is dirty, and we shouldn't build a unless one of
its dependencies is dirty. Eek!
Luckily, there's a safe solution. If we *know* a is dirty - eg. because
a.do or one of its children has definitely changed - then we can just run
a.do immediately and there's no problem, even if b is indeterminate, because
we were going to run a.do anyhow.
If a's dependencies are *not* definitely dirty, and all we have is
indeterminate ones like b, then that means a's build process *hasn't
changed*, which means its tree of dependencies still includes b, which means
we can deduce that if we *did* run a.do, it would end up running b.do.
Since we know that anyhow, we can safely just run b.do, which will either
b.set_checked() or b.set_changed(). Once that's done, we can re-parse a's
dependencies and this time conclusively tell if it needs to be redone or
not. Even if it does, b is already up-to-date, so the 'redo-ifchange b'
line in a.do will be fast.
...now take all the above and do it recursively to handle nested
dependencies, etc, and you're done.
We were rebuilding the checksummed file every time because redo-ifchange was
incorrectly assuming that a child's changed_runid that's greater than my
changed_runid means I'm dirty. But if my checked_runid is >= the child's
checked_runid, then I'm clean, because my checksum didn't change.
Clear as mud?
That way the user can modify an auto-generated 'compile' script, for
example, and it'll stay modified.
If they delete the file, we can then generate it for them again.
Also, we have to warn whenever we're doing this, or people might think it's
a bug.
It's really a separate condition. And since we're not removing the target
*file* in case of error - we update it atomically, and keeping it is better
than losing it - there's no reason to wipe the timestamp in that case
either.
However, we do need to know that the build failed, so that anybody else
(especially in a parallel build) who looks at that target knows that it
died. So add a separate flag just for that.
When you have lots of unmodified dependencies, building these printout
strings (which aren't even printed unless you're using -d) ends up taking
something like 5% of the runtime.
It passes all tests when run serialized, but still gives weird errors
(OperationalError: database is locked) when run with -j5. sqlite3 shouldn't
be barfing just because the database is locked, since the default timeout is
5 seconds, and it's dying *way* faster than that.
The interaction of REDO_STARTDIR, REDO_PWD, and getcwd() are pretty
complicated. In this case, we accidentally assumed that the current
instance of redo was running with getcwd() == REDO_STARTDIR+REDO_PWD, and so
the new target was REDO_STARTDIR+REDO_PWD+t, but this isn't the case if the
current .do script did chdir().
The correct answer is REDO_STARTDIR+getcwd()+t.
We would build 'somefile' correctly the first time, but we wouldn't
attach the dependency on somefile to the right $TARGET, so our target would
not auto-rebuild in the future based on somefile.
It actually decreases readability of the .do files - by not making it
explicit when you're going into a subdir.
Plus it adds ambiguity: what if there's a dirname.do *and* a dirname/all?
We could resolve the ambiguity if we wanted, but that adds more code, while
taking out this special case makes *less* code and improves readability.
I think it's the right way to go.
The behaviour is what we wanted, but it shouldn't have worked. So fix the
bug in redo-ifchange, then change test.do to use 'redo' instead so it
continues to do what we want, only for the right reason.
(The bug is that 'redo-ifchange dirname', which runs dirname/all.do, didn't
result in stamps getting checked correctly.)
Again, I forgot to make vars.py not crash if the variables aren't set, so we
can print a useful error message. But this time I have the right solution:
vars.py will do the checking for itself, and abort with a nice message.
We can also avoid forking altogether if should_build() returns false. This
doesn't seem to result in any noticeable speedup, but it's cleaner at least.
dirty_deps() changed its meaning now that we also have to check
state.isbuilt(). Now, just because dirty_deps() returns true doesn't mean
that the file should be unstamped (which forces a rebuild); this might have
happened because of state.isbuilt, which means someone already *did* do a
rebuild.
If we get past state.isbuilt() and into looking at the children, however,
and one of the children is dirty, then we should definitely unstamp the
current target.
If someone else built and marked one of our dependencies, then that
dependency would show up as *clean* in a later redo-ifchange, so other
dependents of that file wouldn't be rebuilt.
We actually have to track two session-specific variables: whether the file
has been checked, and whether it was rebuilt. (Or alternatively, whether it
was dirty when we checked it the first time. But we store the former.)
If a depends on b depends on c, then if when we consider building a, we have
to check b and c. If we then are asked about a2 which depends on b, there
is no reason to re-check b and its dependencies; we already know it's done.
This takes the time to do 'redo t/curse/all' the *second* time down from
1.0s to 0.13s. (make can still do it in 0.07s.)
'redo t/curse/all' the first time is down from 5.4s to to 4.6s. With -j4,
from 3.0s to 2.5s.
This doesn't really seem to change anything, but it's more correct and
should reveal weirdness (especially an incorrect .redo directory in a
sub-redo) sooner.
This makes 'redo -j1000' now run successfully in t/curse, except that we
foolishly generate the same files more than once. But at least not more
than once *in parallel*.
The problem is that redo-ifchange has a different $PWD than its
sub-dependencies, so as it's chasing them down, fixing up the relative paths
totally doesn't work at all.
There's probably a much smarter fix than this, but it's too late at night to
think of it right now.