If a depends on b, which depends on c, then when we consider building a, we
have to check b and c. If we're then asked about a2, which depends on b,
there is no reason to re-check b and its dependencies; we already know it's done.
This takes the time to do 'redo t/curse/all' the *second* time down from
1.0s to 0.13s. (make can still do it in 0.07s.)
'redo t/curse/all' the first time is down from 5.4s to 4.6s. With -j4,
from 3.0s to 2.5s.
redo: 5.4s
redo -j4: 3.0s
make: 2.3s
make -j4: 1.4s
make SHELL=/bin/dash: 1.2s
make SHELL=/bin/dash -j4: 0.83s
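The memoization above can be sketched in Python (the names here are
illustrative, not redo's actual internals): once a target's dependencies
have been verified in this run, we remember that, so a shared dependency
like b is only ever checked once.

```python
def check_target(t, deps, checked, work_counter):
    """Check t and everything it depends on, skipping any target
    already verified earlier in this run (names are hypothetical)."""
    if t in checked:
        return                    # a2 -> b hits this after a -> b -> c
    work_counter[t] = work_counter.get(t, 0) + 1   # count real checks
    for dep in deps.get(t, []):
        check_target(dep, deps, checked, work_counter)
    checked.add(t)
```

With deps = {'a': ['b'], 'a2': ['b'], 'b': ['c']}, checking a and then a2
does real work on b and c exactly once.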
We have some distance to go yet. Of course, redo is still written in
python, not C, so it's very expensive, and the on-disk dependency store is
very inefficient.
This greatly reduces the number of fork+exec calls, so in particular,
t/curse/all.do now runs much faster:
/bin/sh (bash): was 5.9s, now 2.2s
/bin/dash: was 3.2s, now 1.1s
Obviously improving the speed of minimal/do doesn't really matter, except
that it makes a good benchmark to compare the "real" redo against. So far
it's losing badly: 5.4s.
That way, if everything is locked, we can determine that with a single
token, reducing context switches.
But mostly this is good because the code is simpler.
The 'redo' command is supposed to *always* rebuild, not just if nobody else
rebuilt it. (If you want "rebuild sometimes" behaviour, use redo-ifchange.)
Thus, we shouldn't be short-circuiting it just because a file was previously
locked and then built okay.
However, there's still a race condition in parallel builds, because
redo-ifchange only checks the build stamp of each file once, then passes it
to redo. Thus, we end up trying to build the same stuff over and over.
This change actually makes it build *more* times, which seems dumb, but is
one step closer to right.
Doing this broke 'make test', however, because we were unlinking the target
right before building it, rather than replacing it atomically as djb's
original design suggested we should do. Thus, because of the combination of
the above two bugs, CC would appear and then disappear even as people were
trying to actually use it. Now it gets replaced atomically so it should
at least work at all times... even though we're still building it more than
once, which is incorrect.
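The atomic replacement works along these lines (a sketch, not redo's
actual code; the temp-file naming convention is made up):

```python
import os

def build_atomically(target, build_fn):
    """Write the new output to a temporary file, then rename() it into
    place.  rename() is atomic on POSIX, so other processes always see
    either the old target or the new one -- never a missing or
    half-written file."""
    tmp = target + '.redo.tmp'    # hypothetical temp-name convention
    with open(tmp, 'w') as f:
        build_fn(f)
    os.rename(tmp, target)        # the atomic replacement step
```

Compare that with unlink-then-rebuild, where anyone who looks at the
target during the build sees it missing.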
Now t/curse passes again when parallelized (except for the countall
mismatch, since we haven't fixed the source of that problem yet). At least
it's consistent now.
There's a bunch of stuff rearranged in here, but the actual important
problem was that we were doing unlink() on the lock fifo even if ENXIO,
which meant a reader could connect in between ENXIO and unlink(), and thus
never get notified of the disconnection. This would cause the build to
randomly freeze.
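The shape of the fix looks roughly like this (a sketch of the race, with
hypothetical names, not redo's actual lock code): only unlink the fifo
when we know a reader was attached, because after an ENXIO a reader can
still show up and must find the fifo in place.

```python
import errno
import os

def notify_and_cleanup(fifo_path):
    """Open the lock fifo for write to wake any waiting readers, then
    unlink it.  If the open fails with ENXIO there is no reader *right
    now*, but one could connect before an unlink() -- so in that case
    we must NOT unlink, or that late reader would block forever on an
    orphaned fifo."""
    try:
        fd = os.open(fifo_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as e:
        if e.errno == errno.ENXIO:
            return False          # no reader yet: leave the fifo alone
        raise
    os.close(fd)                  # readers see EOF and wake up
    os.unlink(fifo_path)          # safe: a reader was already attached
    return True
```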
This doesn't really seem to change anything, but it's more correct and
should reveal weirdness (especially an incorrect .redo directory in a
sub-redo) sooner.
This makes 'redo -j1000' now run successfully in t/curse, except that we
foolishly generate the same files more than once. But at least not more
than once *in parallel*.
...because it seems my locking isn't very good. It exposes annoying
problems involving rebuilding the same files more than once, screwing up
stamp files with redo -j, and being unnecessarily slow when checking
dependencies. So it's a pretty good test considering how simple it is.
Didn't add it to t/all.do yet, because it would fail.
Now people waiting for a lock can wait for the fifo to be ready, which means
it's instant instead of polled. Very pretty. Probably doesn't work on
Windows though.
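The waiting side can be sketched like this (assuming POSIX fifo
semantics; the function name is made up): the kernel does the blocking
for us, so there's no sleep/retry loop.

```python
import os

def wait_for_lock(fifo_path):
    """Block until the current lock holder finishes.  Opening the fifo
    read-only blocks until a writer appears, and read() then returns
    EOF the moment the holder closes its end.  No polling needed."""
    fd = os.open(fifo_path, os.O_RDONLY)   # blocks until a writer opens
    try:
        os.read(fd, 1)                     # EOF when holder disconnects
    finally:
        os.close(fd)
```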
The problem is that redo-ifchange has a different $PWD than its
sub-dependencies, so as it chases them down, fixing up the relative paths
doesn't work at all.
There's probably a much smarter fix than this, but it's too late at night to
think of it right now.
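One obvious (if unsubtle) approach is to make stored dependency paths
cwd-independent: resolve them against the directory the child actually
ran in, and re-relativize on the way back out. A sketch, with invented
helper names:

```python
import os

def store_dep(dep_path, child_cwd):
    """Record a dependency in a cwd-independent form by resolving it
    against the directory the child process was running in."""
    return os.path.normpath(os.path.join(child_cwd, dep_path))

def load_dep(stored, reader_cwd):
    """Turn the stored path back into one relative to whoever is
    reading it now, regardless of their $PWD."""
    return os.path.relpath(stored, reader_cwd)
```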
atoi() was getting redundant, and unfortunately we can't easily load
helpers.py in some places where we'd want to, because it depends on vars.py.
So move it to its own module.
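For reference, the usual shape of such a helper (redo's real one may
differ in detail) is just int() that swallows garbage instead of
raising:

```python
def atoi(s):
    """Like int(), but return 0 instead of raising on garbage or None
    -- roughly C's atoi() for the cases a build tool cares about."""
    try:
        return int(s)
    except (ValueError, TypeError):
        return 0
```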
The problem is if someone accidentally creates a file called "test" *before*
.redo/gen^test got created, then 'redo test' would do nothing, because redo
would assume it's a source file instead of a destination, according to djb's
rule. But in this case, we know it's not, since test.do exists, so let's
build it anyway. The problem is related to .PHONY rules in make.
This workaround is kind of cheating, because we can't safely apply that rule
if foo and default.do exist, even though default.do can be used to build
foo.
This probably won't happen very often... except with minimal/do, which
creates these empty files even when it shouldn't. I'm not sure if I should
try to fix that or not, though.
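The decision rule ends up looking something like this (a sketch;
`was_generated` stands in for the `.redo/gen^...` bookkeeping, and the
names are hypothetical):

```python
import os

def is_buildable(t, was_generated):
    """djb's rule: an existing file that redo didn't generate is a
    source, so 'redo t' does nothing.  The workaround: if t.do exists,
    t is clearly meant to be a target, so build it even if the file
    appeared by accident.  We deliberately don't extend this to
    default.do, since default.do could also match genuine sources."""
    if not os.path.exists(t):
        return True                    # nothing there yet: must build
    if was_generated(t):
        return True                    # we made it last time: rebuild
    return os.path.exists(t + '.do')   # t.do implies t is a target
```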
Previously, for testing, we were *always* randomizing the build order of
dependencies. That's annoying since it'll make build logs differ randomly
from one run to the next, which could make comparisons harder. However, the
feature is still useful for uncovering hidden dependencies between objects.
Reading the docs for GNU make more closely, it seems they *don't* use the
one from the environment, because the user's interactive shell preferences
shouldn't affect how the Makefile runs. Good point.
In a Makefile, you can define SHELL explicitly, and that works. But let's
worry about that some other time.
This could be good for distributing with your packages, so that people who
don't have redo installed can at least build it. Also, we could use it for
building redo itself.
Will surely need to get slightly bigger as I inevitably discover I've
forgotten a critical feature.
We'll have to stop using nonblocking reads, unfortunately. But this seems
to work better than nothing. There's still a race condition that could
theoretically make GNU make angry, since we briefly set the socket to
nonblocking.
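For context, the handshake GNU make's jobserver expects is a blocking
one: read a byte from the shared pipe to take a job slot, write it back
to release it. A minimal sketch (the function name is invented):

```python
import os

def run_with_token(rfd, wfd, fn):
    """GNU make jobserver protocol: read one byte from the shared pipe
    to acquire a job slot (blocking until one is free), run the job,
    then write the byte back to release the slot."""
    token = os.read(rfd, 1)      # blocks until a token is available
    try:
        return fn()
    finally:
        os.write(wfd, token)     # always return the token to the pool
```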
So that more than one redo doesn't try to build the same thing at the same
time. Kind of dumb, though, since it currently wipes out all the locks at
the toplevel, so running more than one at a time won't give accurate
results. But then, the -j option doesn't do anything yet anyway.
It used to say:

    redo: t/all
    redo: hello

and now it says:

    redo t/all
    redo t/hello

i.e. there's no colon, and the path is intact. That means if the build
fails, you can cut-and-paste 'redo t/hello', add a -v, and try to debug
what went wrong.
This is a departure from how djb seems to have it set up, but I just like it
better. It's more like the reasonably-common Makefile standard. (Although
what make *actually* does is just use the first target declared in the
file.)
This is what GNU make does. If SHELL isn't defined, we still fall back to
calling sh.
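The selection logic amounts to something like this (a sketch with
invented names, not redo's actual code): an explicit SHELL in the build
file wins, the user's environment SHELL is deliberately ignored, and we
default to plain sh.

```python
def pick_shell(file_vars, environ):
    """Mimic GNU make's SHELL handling: honour SHELL set in the build
    file itself, but never the one inherited from the environment --
    the user's interactive shell preference shouldn't change how the
    build runs."""
    if 'SHELL' in file_vars:
        return file_vars['SHELL']    # explicitly set in the build file
    return 'sh'   # NOT environ.get('SHELL'): user prefs must not leak in
```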
Rumour has it that Google has some kind of build system that can be
massively distributed if you just set SHELL to the right program; maybe
it'll work with redo now. (Of course it won't do you any good until we
implement parallel builds...)