I think we were sometimes leaving half-done sqlite transactions sitting
around for a long time (eg. across sub-calls to .do files). This
seemed to be okay on Linux, but caused sqlite deadlocks on MacOS. Most
likely it's not the operating system, but the sqlite version and
journal mode in use.
In any case, the correct thing to do is to actually commit or rollback
transactions, not leave them hanging around.
...unfortunately this doesn't actually fix my MacOS deadlocks, which
makes me rather nervous.
If ./default.do knows how to build x/y/z, then we will run
./default.do x/y/z x/y/z x__y__z.redo2.tmp
which can correctly generate $3, but then we can fail to rename it to
x/y/z because x/y doesn't exist. This would previously through an
exception. Now it prints a helpful error message.
default.do may create x/y, in which case renaming will succeed.
I think this aligns better with how redo works. Otherwise, if a.do
creates a as a symlink, then changes to the symlink's *target* will
change a's stat/stamp information without re-running a.do, which looks
to redo like you modified a by hand, which causes it to stop running
a.do altogether.
With this change, modifications to a's target are okay, but they don't
trigger any redo dependency changes. If you want that, then a.do
should redo-ifchange on its symlink target explicitly.
For example:
$ redo-whichdo a/b/c/.x.y
- a/b/c.x.y.do
- a/b/default.x.y.do
- a/b/default.y.do
- a/b/default.do
- a/default.x.y.do
- a/default.y.do
- a/default.do
- default.x.y.do
- default.y.do
+ default.do
1 a/b/c.x.y
2 a/b/c.x.y
Lines starting with '-' mean a potential .do file that did not exist,
so we moved onto the next choice (but consider using redo-ifcreate in
case it gets created). '+' means the .do file we actually chose. '1'
and '2' are the $1 and $2 to pass along to the given .do file if you want to
call it for the given target.
(The output format is a little weird to make sure it's parseable with
sh 'read x y' calls, even when filenames contain spaces or special
characters.)
If you use "redo --old-args", it will switch back to the old
(apenwarr-style) arguments for now, to give you time to update your .do
scripts. This option will go away eventually.
Note: minimal/do doesn't understand the --old-args option. If you're using
minimal/do in your project, keep using the old one until you update your use
of $1/$2, and then update to the new one.
apenwarr-style default.o.do:
$1 foo
$2 .o
$3 whatever.tmp
djb-style default.o.do:
$1 foo.o
$2 foo
$3 whatever.tmp
apenwarr-style foo.o.do:
$1 foo.o
$2 ""
$3 whatever.tmp
djb-style foo.o.do:
$1 foo.o
$2 foo.o (I think?)
$3 whatever.tmp
Previously, if 'redo-ifchange foo' failed last time, then creating foo
manually wouldn't help; 'redo-ifchange foo' would still try to rebuild it.
But if the first run *did* create it, then manually overriding it *did*
work.
That inconsistency is pointless. If the user creates it by hand, it doesn't
matter if it failed to build last time or not; the user wants it overridden.
So this way, something that can't build can at least be manually created as
a hack.
The reason we'd crash is that we tried to pre-create a file called
$target.redo.tmp, which wouldn't work because the directory containing
$target didn't exist.
We now try to generate a smarter filename by using the innermost directory
of target that *does* exist. It's a little messy, but the idea is to make
sure we won't have to rename() across a filesystem boundary if, for example,
there's a mounted filesystem in the middle of the hierarchy somewhere.
We were accidentally including things like the atime in the comparison,
which is obviously silly; someone reading the file shouldn't mark it as a
manual override.
If we're using a .do file from a parent directory, we should set $3 using
the same path prefix as $1. We were previously using just the basename,
which mostly works (since we would rename it to $1$2 eventually anyway) but
is not quite right, and you can't safely rename files across filesystems, so
it could theoretically cause problems.
Also improved t/defaults-nested to test for this behaviour.
Reported by Eric Kow.
The result was that t/deps/dirtest was actually failing in some cases. But
it wasn't failing quite reliably enough, because the failing test was
dirtest/dir1/all, which has the same name as some other 'all' files,
confusing the issue. Renamed dirtest/dir1/all.do to dirtest/dir1/go.do instead.
Reported by Prakhar Goel and Berke Durak.
* master:
Fixed markdown errors in README - code samples now correctly formatted.
Fix use of config.sh in example
log.py, minimal/do: don't use ansi colour codes if $TERM is blank or 'dumb'
Use named constants for terminal control codes.
redo-sh: keep testing even after finding a 'good' shell.
redo-sh.do: hide warning output from 'which' in some shells.
redo-sh.do: wrap long lines.
Handle .do files that start with "#!/" to specify an explicit interpreter.
minimal/do: don't print an error on exit if we don't build anything.
bash completions: also mark 'do' as a completable command.
bash completions: work correctly when $cur is an empty string.
bash completions: call redo-targets for a more complete list.
bash completions: work correctly with subdirs, ie. 'redo t/<tab>'
Sample bash completion rules for redo targets.
minimal/do: faster deletion of stamp files.
minimal/do: delete .tmp files if a build fails.
minimal/do: use ".did" stamp files instead of empty target files.
minimal/do: use posix shell features instead of dirname/basename.
Automatically select a good shell instead of relying on /bin/sh.
Conflicts:
t/clean.do
This includes a fairly detailed test of various known shell bugs from the
autoconf docs.
The idea here is that if redo works on your system, you should be able to
rely on a *good* shell to run your .do files; you shouldn't have to work
around zillions of bugs like autoconf does.
Previously, we would only search for default*.do in the same directory in
the target; now we search parent directories as well.
Let's say we're in a/b/ and trying to build foo.o. If we find
../../default.o.do, then we'll run
cd ../..; sh default.o.do a/b/foo .o $TMPNAME
In other words, we still always chdir to the same directory as the .do file.
But now $1 might have a path in it, not just a basename.
This could happen if you did 'redo foo foo'. Which nobody ever did, I
think, but let's make sure we catch it if they do.
One problem with having multiple locks on the same file is then you have to
remember not to *unlock* it until they're all done. But there are other
problems, such as: why the heck did we think it was a good idea to lock the
same file more than once? So just prevent it from happening for now,
unless/until we somehow come up with a reason it might be a good idea.
We can't just delete all the dependencies at the beginning and re-add them:
other people might be checking the same dependencies in parallel. Instead,
mark them as delete_me up front, and then after the build completes, remove
only the delete_me entries.
We called 'redo' instead of 'redo-ifchange' on our indeterminate objects.
Since other instances of redo-oob might be running at the same time, this
could cause the same object to get rebuilt more than once unnecessarily.
The unit tests caught this, I just didn't notice earlier.
If a depends on b depends on c, and c is dirty but b uses redo-stamp
checksums, then 'redo-ifchange a' is indeterminate: we won't know if we need
to run a.do unless we first build b, but the script that *normally* runs
'redo-ifchange b' is a.do, and we don't want to run that yet, because we
don't know for sure if b is dirty, and we shouldn't build a unless one of
its dependencies is dirty. Eek!
Luckily, there's a safe solution. If we *know* a is dirty - eg. because
a.do or one of its children has definitely changed - then we can just run
a.do immediately and there's no problem, even if b is indeterminate, because
we were going to run a.do anyhow.
If a's dependencies are *not* definitely dirty, and all we have is
indeterminate ones like b, then that means a's build process *hasn't
changed*, which means its tree of dependencies still includes b, which means
we can deduce that if we *did* run a.do, it would end up running b.do.
Since we know that anyhow, we can safely just run b.do, which will either
b.set_checked() or b.set_changed(). Once that's done, we can re-parse a's
dependencies and this time conclusively tell if it needs to be redone or
not. Even if it does, b is already up-to-date, so the 'redo-ifchange b'
line in a.do will be fast.
...now take all the above and do it recursively to handle nested
dependencies, etc, and you're done.
A new redo-stamp program takes whatever you give it as stdin and uses it to
calculate a checksum for the current target. If that checksum is the same
as last time, then we consider the target to be unchanged, and we set
checked_runid and stamp, but leave changed_runid alone. That will make
future callers of redo-ifchange see this target as unmodified.
However, this is only "half" support because by the time we run the .do
script that calls redo-stamp, it's too late; the caller is a dependant of
the stamped program, which is already being rebuilt, even if redo-stamp
turns out to say that this target is unchanged.
The other half is coming up.
This is slightly inelegant, as the old style
echo foo
echo blah
chmod a+x $3
doesn't work anymore; the stuff you wrote to stdout didn't end up in $3.
You can rewrite it as:
exec >$3
echo foo
echo blah
chmod a+x $3
Anyway, it's better this way, because now we can tell the difference between
a zero-length $3 and a nonexistent one. A .do script can thus produce
either one and we'll either delete the target or move the empty $3 to
replace it, whichever is right.
As a bonus, this simplifies our detection of whether you did something weird
with overlapping changes to stdout and $3.
Although we were deadlock-free before, under some circumstances we'd end up
holding a perfectly good token while in sync wait; that would reduce our
parallelism for no good reason. So give back our tokens before waiting for
anybody else.
That way the user can modify an auto-generated 'compile' script, for
example, and it'll stay modified.
If they delete the file, we can then generate it for them again.
Also, we have to warn whenever we're doing this, or people might think it's
a bug.
It's really a separate condition. And since we're not removing the target
*file* in case of error - we update it atomically, and keeping it is better
than losing it - there's no reason to wipe the timestamp in that case
either.
However, we do need to know that the build failed, so that anybody else
(especially in a parallel build) who looks at that target knows that it
died. So add a separate flag just for that.
This should reduce filesystem grinding a bit, and makes the code simpler.
It's also theoretically a bit more portable, since I'm guessing fifo
semantics aren't the same on win32 if we ever get there.
Also, a major problem with the old fifo-based system is that if a redo
process died without cleaning up after itself, it wouldn't delete its
lockfiles, so we had to wipe them all at the beginning of each build. Now
we don't; in theory, you can now have multiple copies of redo poking at the
same tree at the same time and not stepping on each other.
Just commit when we're about to do something blocking. sqlite goes a lot
faster with bigger transactions. This change does show a small percentage
speedup in tests, but not as much as I'd like.
It passes all tests when run serialized, but still gives weird errors
(OperationalError: database is locked) when run with -j5. sqlite3 shouldn't
be barfing just because the database is locked, since the default timeout is
5 seconds, and it's dying *way* faster than that.
This allows files to transition from generated to not-generated if the .do
file is ever removed (ie. the user is changing things and the file is now a
source file, not a target).
So if you have a default.do, it might be used to build mydir, even if mydir
already existed when we started.
This might be useful if you have a mydir.setup.do and a mydir.do; mydir.do
depends on mydir.setup.do, but mydir.setup.do creates mydir, it just doesn't
*finish* with mydir. Still, my the time mydir.do runs, mydir already
exists, and redo would get confused.
I think directories are fundamentally special in this way, because it makes
sense to "create" a directory even if that directory isn't "done" at this
phase.
The interaction of REDO_STARTDIR, REDO_PWD, and getcwd() are pretty
complicated. In this case, we accidentally assumed that the current
instance of redo was running with getcwd() == REDO_STARTDIR+REDO_PWD, and so
the new target was REDO_STARTDIR+REDO_PWD+t, but this isn't the case if the
current .do script did chdir().
The correct answer is REDO_STARTDIR+getcwd()+t.
If a and b both depend on c, and c is a static (non-generated) file that has
changed since the last successful build of a and b, we would try to redo
a, but would forget to redo b. Now it does both.
If a file previously was generated but now isn't (ie. its .do file
disappears), we would never re-stamp that target, and so all its
dependencies would rebuild continually.
It actually decreases readability of the .do files - by not making it
explicit when you're going into a subdir.
Plus it adds ambiguity: what if there's a dirname.do *and* a dirname/all?
We could resolve the ambiguity if we wanted, but that adds more code, while
taking out this special case makes *less* code and improves readability.
I think it's the right way to go.
Normally, creating the target $1 yourself is bad; create $3 instead. But if
$1 is a directory, we'll allow it. That way 'redo subdir' can call
subdir.do, and subdir.do can both create the directory *and* run a bunch of
sub-.do files on it.
...because we deliberately stamp non-generated files as well, and that
doesn't need to imply that we rebuilt them just now. In fact, we know for a
fact that we *didn't* rebuild them just now, but we still need to record the
timestamp for later.
.do files should never modify $1, and should write to *either* $3 or stdout,
but not both. If they write to both, it's probably because they forgot to
redirect stdout to stderr, a very easy mistake to make but a hard one to
detect.
Now redo detects it for you and prints an informative message.