redo now saves the stderr from every .do script, for every target, into
a file in the .redo directory. That means you can look up the logs
from the most recent build of any target using the new redo-log
command, for example:
redo-log -r all
The default is to show logs non-recursively, that is, it'll show when a
target does redo-ifchange on another target, but it won't recurse into
the logs for the latter target. With -r (recursive), it does. With -u
(unchanged), it does even if redo-ifchange discovered that the target
was already up-to-date; in that case, it prints the logs of the *most
recent* time the target was generated.
With --no-details, redo-log will show only the 'redo' lines, not the
other log messages. For very noisy build systems (like recursing into
a 'make' instance) this can be helpful to get an overview of what
happened, without all the cruft.
You can use the -f (follow) option like tail -f, to follow a build
that's currently in progress until it finishes. redo itself spins up a
copy of redo-log -r -f while it runs, so you can see what's going on.
Still broken in this version:
- No man page or new tests yet.
- ANSI colors don't yet work (unless you use --raw-logs, which gives
the old-style behaviour).
- You can't redirect the output of a sub-redo to a file or a
pipe right now, because redo-log is eating it.
- The regex for matching 'redo' lines in the log is very gross.
Instead, we should put the raw log files in a more machine-parseable
format, and redo-log should turn that into human-readable format.
- redo-log tries to "linearize" the logs, which makes them
comprehensible even for a large parallel build. It recursively shows
log messages for each target in depth-first tree order (by tracing
into a new target every time it sees a 'redo' line). This works
really well, but in some specific cases, the "topmost" redo instance
can get stuck waiting for a jwack token, which makes it look like the
whole build has stalled, when really redo-log is just waiting a long
time for a particular subprocess to be able to continue. We'll need to
add a specific workaround for that.
On systems where 'python' refers to python3, redo
failed to launch. All invocations of python have been
made explicitly python2 invocations. All tests pass
on an Arch Linux system as of this commit.
I think we were sometimes leaving half-done sqlite transactions sitting
around for a long time (eg. across sub-calls to .do files). This
seemed to be okay on Linux, but caused sqlite deadlocks on MacOS. Most
likely it's not the operating system, but the sqlite version and
journal mode in use.
In any case, the correct thing to do is to actually commit or rollback
transactions, not leave them hanging around.
...unfortunately this doesn't actually fix my MacOS deadlocks, which
makes me rather nervous.
If you use "redo --old-args", it will switch back to the old
(apenwarr-style) arguments for now, to give you time to update your .do
scripts. This option will go away eventually.
Note: minimal/do doesn't understand the --old-args option. If you're using
minimal/do in your project, keep using the old one until you update your use
of $1/$2, and then update to the new one.
apenwarr-style default.o.do:
$1 foo
$2 .o
$3 whatever.tmp
djb-style default.o.do:
$1 foo.o
$2 foo
$3 whatever.tmp
apenwarr-style foo.o.do:
$1 foo.o
$2 ""
$3 whatever.tmp
djb-style foo.o.do:
$1 foo.o
$2 foo.o (I think?)
$3 whatever.tmp
We already did this in minimal/do, and redo-ifchange already did this, but
plain redo didn't. This made constructs like:
for d in *.x; do
echo "${d%.x}"
done | xargs redo
dangerous, because if there were no matching files, we'd try to 'redo all'.
In redo-ifchange, this might be a good idea, since you might just want to
set a dependency on it, so we won't say anything from inside builder.py.
But if you're calling redo.py, that means you expect it to be rebuilt, since
there's no other reason to try. So print a warning.
(This is what make does, more or less.)
We can also avoid forking altogether if should_build() returns false. This
doesn't seem to result in any noticeable speedup, but it's cleaner at least.
Previously, the default was to *always* keep going, which is actually not
usually what you want. Now we actually exit correctly after an error. Of
course you still might have multiple errors before existing if you were
building in parallel.
Get rid of the "locked..." and "...unlocked!" messages by default, since
they're not usually interesting. But add a new option to bring them back in
case we end up with trouble debugging the locking stuff. (I don't really
100% trust it yet, although I haven't had a problem for a while now.)
If a depends on b depends on c, then if when we consider building a, we have
to check b and c. If we then are asked about a2 which depends on b, there
is no reason to re-check b and its dependencies; we already know it's done.
This takes the time to do 'redo t/curse/all' the *second* time down from
1.0s to 0.13s. (make can still do it in 0.07s.)
'redo t/curse/all' the first time is down from 5.4s to to 4.6s. With -j4,
from 3.0s to 2.5s.
That way, if everything is locked, we can determine that with a single
token, reducing context switches.
But mostly this is good because the code is simpler.
The 'redo' command is supposed to *always* rebuild, not just if nobody else
rebuilt it. (If you want "rebuild sometimes" behaviour, use redo-ifchange.)
Thus, we shouldn't be short circuiting it just because a file was previously
locked and then built okay.
However, there's still a race condition in parallel builds, because
redo-ifchange only checks the build stamp of each file once, then passes it
to redo. Thus, we end up trying to build the same stuff over and over.
This change actually makes it build *more* times, which seems dumb, but is
one step closer to right.
Doing this broke 'make test', however, because we were unlinking the target
right before building it, rather than replacing it atomically as djb's
original design suggested we should do. Thus, because of the combination of
the above two bugs, CC would appear and then disappear even as people were
trying to actually use it. Now it gets replaced atomically so it should
at least work at all times... even though we're still building it more than
once, which is incorrect.
Now t/curse passes again when parallelized (except for the countall
mismatch, since we haven't fixed the source of that problem yet). At least
it's consistent now.
There's a bunch of stuff rearranged in here, but the actual important
problem was that we were doing unlink() on the lock fifo even if ENXIO,
which meant a reader could connect in between ENXIO and unlink(), and thus
never get notified of the disconnection. This would cause the build to
randomly freeze.
This doesn't really seem to change anything, but it's more correct and
should reveal weirdness (especially an incorrect .redo directory in a
sub-redo) sooner.
This makes 'redo -j1000' now run successfully in t/curse, except that we
foolishly generate the same files more than once. But at least not more
than once *in parallel*.
Now people waiting for a lock can wait for the fifo to be ready, which means
it's instant instead of polled. Very pretty. Probably doesn't work on
Windows though.
The problem is that redo-ifchange has a different $PWD than its
sub-dependencies, so as it's chasing them down, fixing up the relative paths
totally doesn't work at all.
There's probably a much smarter fix than this, but it's too late at night to
think of it right now.
atoi() was getting redundant, and unfortunately we can't easily load
helpers.py in some places where we'd want to, because it depends on vars.py.
So move it to its own module.
The problem is if someone accidentally creates a file called "test" *before*
.redo/gen^test got created, then 'redo test' would do nothing, because redo
would assume it's a source file instead of a destination, according to djb's
rule. But in this case, we know it's not, since test.do exists, so let's
build it anyway. The problem is related to .PHONY rules in make.
This workaround is kind of cheating, because we can't safely apply that rule
if foo and default.do exist, even though default.do can be used to build
foo.
This probably won't happen very often... except with minimal/do, which
creates these empty files even when it shouldn't. I'm not sure if I should
try to fix that or not, though.
Previously, for testing, we were *always* randomizing the build order of
dependencies. That's annoying since it'll make build logs differ randomly
from one run to the next, which could make comparisons harder. However, the
feature is still useful for uncovering hidden dependencies between objects.
Reading the docs for GNU make more closely, it seems they *don't* use the
one from the environment, because the user's interactive shell preferences
shouldn't affect how the Makefile runs. Good point.
In a Makefile, you can define SHELL explicitly, and that works. But let's
worry about that some other time.
This could be good for distributing with your packages, so that people who
don't have redo installed can at least build it. Also, we could use it for
building redo itself.
Will surely need to get slightly bigger as I inevitably discover I've
forgotten a critical feature.
So that more than one redo doesn't try to build the same thing at the same
time. Kind of dumb, though, since it currently wipes out all the locks at
the toplevel, so running more than one at a time won't give accurate
results, but the -j option doesn't do anything yet.
It used to say:
redo: t/all
redo: hello
and now it says:
redo t/all
redo t/hello
ie. there's no colon, and the path is intact. That means if the build
fails, you can cut-and-paste 'redo t/hello', add a -v, and try to debug
what went wrong.
This is a departure from how djb seems to have it set up, but I just like it
better. It's more like the reasonably-common Makefile standard. (Although
what make *actually* does is just use the first target declared in the
file.)
This is what GNU make does. If SHELL isn't defined, we still fall back to
calling sh.
Rumour has it that Google has some kind of build system that can be
massively distributed if you just set SHELL to the right program; maybe
it'll work with redo now. (Of course it won't do you any good until we
implement parallel builds...)