78 commits

Author SHA1 Message Date
Avery Pennarun
b2411fe483 redo-log: capture and linearize the output of redo builds.
redo now saves the stderr from every .do script, for every target, into
a file in the .redo directory.  That means you can look up the logs
from the most recent build of any target using the new redo-log
command, for example:

	redo-log -r all

The default is to show logs non-recursively, that is, it'll show when a
target does redo-ifchange on another target, but it won't recurse into
the logs for the latter target.  With -r (recursive), it does.  With -u
(unchanged), it does even if redo-ifchange discovered that the target
was already up-to-date; in that case, it prints the logs of the *most
recent* time the target was generated.

With --no-details, redo-log will show only the 'redo' lines, not the
other log messages.  For very noisy build systems (like recursing into
a 'make' instance) this can be helpful to get an overview of what
happened, without all the cruft.

You can use the -f (follow) option like tail -f, to follow a build
that's currently in progress until it finishes.  redo itself spins up a
copy of redo-log -r -f while it runs, so you can see what's going on.

Still broken in this version:

- No man page or new tests yet.

- ANSI colors don't yet work (unless you use --raw-logs, which gives
  the old-style behaviour).

- You can't redirect the output of a sub-redo to a file or a
  pipe right now, because redo-log is eating it.

- The regex for matching 'redo' lines in the log is very gross.
  Instead, we should put the raw log files in a more machine-parseable
  format, and redo-log should turn that into human-readable format.

- redo-log tries to "linearize" the logs, which makes them
  comprehensible even for a large parallel build.  It recursively shows
  log messages for each target in depth-first tree order (by tracing
  into a new target every time it sees a 'redo' line).  This works
  really well, but in some specific cases, the "topmost" redo instance
  can get stuck waiting for a jwack token, which makes it look like the
  whole build has stalled, when really redo-log is just waiting a long
  time for a particular subprocess to be able to continue.  We'll need to
  add a specific workaround for that.
2018-11-17 10:27:43 -05:00
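As a rough illustration of the depth-first "linearizing" described in b2411fe483 above, here is a minimal sketch. The log layout (a dict of per-target line lists, with sub-builds marked by lines starting with `redo `) is a hypothetical stand-in, not redo-log's real file format, and `linearize` is an invented name.

```python
def linearize(logs, target, recursive=True, depth=0):
    """Yield log lines for `target`, descending into sub-targets in order."""
    indent = "  " * depth
    for line in logs.get(target, []):
        yield indent + line
        # Hypothetical marker: 'redo <name>' means a sub-build started here.
        if recursive and line.startswith("redo "):
            sub = line[len("redo "):].strip()
            yield from linearize(logs, sub, recursive, depth + 1)


# Logs as they might be captured from a parallel build, one list per target.
logs = {
    "all": ["redo a", "redo b", "all done"],
    "a": ["compiling a"],
    "b": ["redo a", "linking b"],
}
lines = list(linearize(logs, "all"))
```

With `recursive=False` the sketch only shows the top-level lines, which mirrors the non-recursive default described in the commit message.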
Avery Pennarun
bb80118298 Cyclic dependency checker: don't give up token in common case.
The way the code was written, we'd give up our token, detect a cyclic
dependency, and then try to get our token back before exiting.  Even
with -j1, the temporary token release allowed any parent up the tree to
continue running jobs, so it would take an arbitrary amount of time
before we could exit (and report an error code to the parent).

There was no visible symptom of this except that, with -j1, t/355-deps-cyclic
would not finish until some of the later tests finished, which was
surprising.

To fix it, let's just check for a cyclic dependency first, then release
the token only once we're sure things are sane.
2018-11-13 07:00:09 -05:00
Avery Pennarun
2a936a7574 Print a nicer error message when asked to build an empty string ('').
This happens sometimes, for example, if you do
	whatever | while read x; do
		redo-ifchange "$x"
	done
and the input contains blank lines.

We could ignore the request for blankness, but it seems like that
situation might indicate a more serious bug in your parser, so it's
probably better to just abort with a meaningful error.
2018-11-03 22:02:26 -04:00
Avery Pennarun
e40dc5bad2 redo-whichdo: fix a bug where the last dir was checked twice, and add tests.
When we can't find a .do file, we walk all the way back to the root
directory.  When that happens, the root directory is actually searched
twice.  This is harmless (since a .do file doesn't exist there anyway)
but causes redo-whichdo to produce the wrong output.

Also, add a test, which I forgot to do when writing whichdo in the
first place.

To make the test work from the root directory, we need a way to
initialize redo without actually creating a .redo directory.  Add an
init_no_state() function for that purpose, and split the necessary path
functions into their own module so we can avoid importing builder.py.
2018-11-02 02:20:52 -04:00
Avery Pennarun
711b05766f Print a better message when detecting pre-existing cyclic dependencies.
We already printed an error at build time, but added the broken
dependency anyway.  If the .do script decided to succeed despite
redo-ifchange aborting, the target would be successfully created
and we'd end up with an infinite loop when running isdirty() later.

The result was still "correct", because python helpfully aborted
the infinite loop after the recursion got too deep.  But let's
explicitly detect it and print a better error message.

(Thanks to Nils Dagsson Moskopp's redo-testcases repo for exposing this
problem.  Putting a #!/bin/sh header on your .do script means you
need to run 'set -e' yourself if you want the script to abort after an
error, which you almost always do.  Those testcases don't, which
exposed this bug if you ran the tests twice.)
2018-11-02 02:20:52 -04:00
Avery Pennarun
52236a3aed builder.py: don't spin while fighting for a file lock.
If we end up in builder phase 2, where we might need to
build stuff that was previously locked by someone else,
we will need to obtain a job token *and* the lock at the
same time in order to continue.  To prevent deadlocks,
we don't wait synchronously for one lock while holding the
other.

If several instances are fighting over the same lock and
there are insufficient job tokens for everyone, timing
could cause them to fight for a long time.  This seems
to happen a lot in FreeBSD for some reason.  To be a good
citizen, sleep for a while after each loop iteration.
This should ensure that eventually, most of the fighting
instances will be asleep by the time the next one tries to
grab the token, thus breaking the deadlock.
2018-10-29 07:31:28 +00:00
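The retry loop described in 52236a3aed above might look roughly like this. It is a sketch only: `try_token`, `release_token`, and `try_lock` are hypothetical callables standing in for redo's job-token and file-lock code, and the random delay is an assumption about how the sleep could be chosen.

```python
import random
import time


def acquire_token_and_lock(try_token, release_token, try_lock,
                           sleep=time.sleep, max_delay=0.1):
    """Loop until we hold both a job token and the lock.

    try_token() and try_lock() return True on success without blocking;
    release_token() gives the token back.  All three are hypothetical.
    Returns the number of attempts it took.
    """
    attempts = 0
    while True:
        attempts += 1
        if try_token():
            if try_lock():
                return attempts
            # Never wait for the lock while holding the token.
            release_token()
        # Sleep a little so competing instances stop colliding.
        sleep(random.random() * max_delay)
```

Passing `sleep` as a parameter is just a convenience so the loop can be exercised without real delays.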
Avery Pennarun
887df98ead builder.py: refresh the File object after obtaining the lock.
We need to create the File object to get its f.id, then lock that id.
During that gap, another instance of redo may have modified the file or
its state data, so we have to refresh it.

This fixes 'redo -j10 t/stress'.
2018-10-13 01:37:08 -04:00
Avery Pennarun
d8811601f1 Better logging for 'manual override' detection.
The first time we notice a file has been overridden, log the old and
new stamp data, which might give a hint about how this happened.

Currently if I do
    rm t/950-curse/countall
    while :; do redo -j10 t/950-curse/all --shuffle || break; done

it will end up complaining that countall has been overridden within
just a few runs, even though it definitely hasn't been.  There seems to
be someone reading a file stamp while someone else is redoing the
file, but I haven't found it yet.
2018-10-12 05:20:35 -04:00
Alan Falloon
9354e78871 Null out lock vars so that __del__ gets called
The builder was holding lock variables in the loop which means that
sometimes a state.Lock object would be created for the same file-id
twice, triggering the assertion. Assign the lock variables to None to
ensure that the state.Lock objects are destroyed before creating the
next one in the loop.
2018-10-11 23:15:37 -04:00
Alan Falloon
f4b4c400b2 Handle errors on rename of target file.
[apenwarr: this is the remaining part after part of the original was
included in someone else's separate patch.]
2018-10-11 23:12:07 -04:00
Robert L. Bocchino Jr
7dd63efb37 Add cyclic dependence detection.
If a depends on b which depends on a, redo would just freeze.  Now it
aborts with a somewhat helpful error message.

[Updated by apenwarr for coding style and to add a test.]
2018-10-11 03:28:05 -04:00
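For readers unfamiliar with the idea behind 7dd63efb37 above, a cycle check over a dependency graph can be sketched as below. This is an in-memory illustration with a hypothetical `deps` dict; it is not a description of how redo itself tracks targets in progress.

```python
def find_cycle(deps, target, stack=()):
    """Return the dependency chain that forms a cycle, or None."""
    if target in stack:
        # We came back to a target that is still being built.
        return list(stack[stack.index(target):]) + [target]
    for dep in deps.get(target, ()):
        cycle = find_cycle(deps, dep, stack + (target,))
        if cycle:
            return cycle
    return None


cyclic = find_cycle({"a": ["b"], "b": ["a"]}, "a")
acyclic = find_cycle({"a": ["b"], "b": []}, "a")
```

Here `cyclic` is the chain a, b, a, which is the kind of information an error message could report, and `acyclic` is None.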
Robert L. Bocchino Jr
63f9dcb640 Remove deprecated old-args feature.
2018-10-11 03:28:05 -04:00
Robert L. Bocchino Jr
f739a0fc6e Fix mtime/ctime bug
[apenwarr's note: ctime includes extra inode attributes like link
count, which are not important for this check, but which could cause
spurious warnings.]
2018-10-11 03:28:05 -04:00
Travis Cross
cb713bdace Restore SIGPIPE default action before exec(3)
Python chooses to ignore SIGPIPE, however most unix processes expect
to terminate on the signal.  Therefore failing to restore the default
action results in surprising behavior.  For example, we expect
`dd if=/dev/zero | head -c1` to return immediately.  However, prior to
this commit, that pipeline would hang forever.  Insidious forms of
data corruption or loss were also possible.

See:

  http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/2009-07-02-python-sigpipe.html
  http://blog.nelhage.com/2010/02/a-very-subtle-bug/
2018-10-11 03:28:05 -04:00
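The fix in cb713bdace above amounts to resetting SIGPIPE in the child before exec. A minimal sketch using the standard `subprocess` and `signal` modules follows; `run_do_script` is a hypothetical helper name. Note that Python 3's `subprocess` already restores SIGPIPE by default (`restore_signals=True`), so the explicit hook mainly matters on older Pythons or when spawning children by other means.

```python
import signal
import subprocess


def restore_sigpipe():
    # Runs in the child between fork() and exec(): undo Python's SIG_IGN.
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)


def run_do_script(argv):
    """Run argv with the default SIGPIPE action; return its exit code."""
    return subprocess.call(argv, preexec_fn=restore_sigpipe)
```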
Avery Pennarun
613625b580 Add more assertions about uncommitted sqlite transactions.
I think we were sometimes leaving half-done sqlite transactions sitting
around for a long time (eg. across sub-calls to .do files).  This
seemed to be okay on Linux, but caused sqlite deadlocks on MacOS.  Most
likely it's not the operating system, but the sqlite version and
journal mode in use.

In any case, the correct thing to do is to actually commit or rollback
transactions, not leave them hanging around.

...unfortunately this doesn't actually fix my MacOS deadlocks, which
makes me rather nervous.
2018-10-06 05:06:19 -04:00
Avery Pennarun
74f968d6ca Correctly report error when target dir does not exist.
If ./default.do knows how to build x/y/z, then we will run
	./default.do x/y/z x/y/z x__y__z.redo2.tmp
which can correctly generate $3, but then we can fail to rename it to
x/y/z because x/y doesn't exist.  This would previously throw an
exception.  Now it prints a helpful error message.

default.do may create x/y, in which case renaming will succeed.
2018-10-06 02:38:32 -04:00
Avery Pennarun
34669fba65 Use os.lstat() instead of os.stat().
I think this aligns better with how redo works.  Otherwise, if a.do
creates a as a symlink, then changes to the symlink's *target* will
change a's stat/stamp information without re-running a.do, which looks
to redo like you modified a by hand, which causes it to stop running
a.do altogether.

With this change, modifications to a's target are okay, but they don't
trigger any redo dependency changes.  If you want that, then a.do
should redo-ifchange on its symlink target explicitly.
2018-10-06 00:14:02 -04:00
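The difference that 34669fba65 above relies on can be seen in a small experiment. The `stamp` function and the fields it compares are assumptions chosen for illustration, not redo's actual stamp format.

```python
import os
import tempfile


def stamp(path, follow_symlinks):
    """A hypothetical stamp: the stat fields compared between runs."""
    st = os.stat(path) if follow_symlinks else os.lstat(path)
    return (st.st_mtime_ns, st.st_size, st.st_ino)


with tempfile.TemporaryDirectory() as d:
    real = os.path.join(d, "real")
    link = os.path.join(d, "a")
    with open(real, "w") as f:
        f.write("one")
    os.symlink("real", link)          # 'a' is a symlink to 'real'

    before_stat, before_lstat = stamp(link, True), stamp(link, False)
    with open(real, "a") as f:
        f.write(" more data")         # modify the symlink's *target*
    stat_changed = stamp(link, True) != before_stat
    lstat_changed = stamp(link, False) != before_lstat
```

After the write, `stat_changed` is true because os.stat() follows the link to the modified file, while `lstat_changed` is false because the symlink itself was not touched.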
Avery Pennarun
61d35d3972 redo-whichdo: a command that explains the .do search path for a target.
For example:

$ redo-whichdo a/b/c.x.y

- a/b/c.x.y.do
- a/b/default.x.y.do
- a/b/default.y.do
- a/b/default.do
- a/default.x.y.do
- a/default.y.do
- a/default.do
- default.x.y.do
- default.y.do
+ default.do
1 a/b/c.x.y
2 a/b/c.x.y

Lines starting with '-' mean a potential .do file that did not exist,
so we moved onto the next choice (but consider using redo-ifcreate in
case it gets created).  '+' means the .do file we actually chose.  '1'
and '2' are the $1 and $2 to pass along to the given .do file if you want to
call it for the given target.

(The output format is a little weird to make sure it's parseable with
sh 'read x y' calls, even when filenames contain spaces or special
characters.)
2018-10-04 20:20:53 -04:00
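The search order shown in 61d35d3972 above, for a target such as a/b/c.x.y, can be reproduced with a short generator. This is a simplified sketch: it handles only relative paths, stops at the top of the given path rather than walking to the filesystem root, and `do_candidates` is an invented name.

```python
def do_candidates(target):
    """Yield candidate .do file paths for `target`, most specific first."""
    dirname, _, base = target.rpartition("/")
    parts = dirname.split("/") if dirname else []
    # First choice: a .do file named exactly after the target.
    yield "/".join(parts + [base + ".do"])
    exts = base.split(".")[1:]            # 'c.x.y' -> ['x', 'y']
    # Then default*.do, from the target's directory up to the top.
    for depth in range(len(parts), -1, -1):
        prefix = parts[:depth]
        for i in range(len(exts) + 1):
            suffix = ".".join(exts[i:])
            name = "default." + suffix + ".do" if suffix else "default.do"
            yield "/".join(prefix + [name])


candidates = list(do_candidates("a/b/c.x.y"))
```

A caller would test each candidate for existence in turn and stop at the first hit, which corresponds to the '-' and '+' lines in the output.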
Avery Pennarun
21f88094d5 Change definitions of $1,$2,$3 to match djb's redo.
If you use "redo --old-args", it will switch back to the old
(apenwarr-style) arguments for now, to give you time to update your .do
scripts.  This option will go away eventually.

Note: minimal/do doesn't understand the --old-args option.  If you're using
minimal/do in your project, keep using the old one until you update your use
of $1/$2, and then update to the new one.

apenwarr-style default.o.do:
   $1      foo
   $2      .o
   $3      whatever.tmp

djb-style default.o.do:
   $1      foo.o
   $2      foo
   $3      whatever.tmp

apenwarr-style foo.o.do:
   $1      foo.o
   $2      ""
   $3      whatever.tmp

djb-style foo.o.do:
   $1      foo.o
   $2      foo.o  (I think?)
   $3      whatever.tmp
2011-12-31 02:49:39 -05:00
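The two conventions tabulated in 21f88094d5 above can be expressed as one small function. This sketch only computes $1 and $2 ($3 is a temp file name in both styles), `do_args` is a hypothetical name, and the djb-style $2 for a non-default .do file follows the commit's own "(I think?)" guess.

```python
def do_args(target, dofile, old_style=False):
    """Return ($1, $2) for running `dofile` to build `target`."""
    if dofile.startswith("default."):
        # 'default.o.do' -> '.o'; plain 'default.do' -> ''
        ext = dofile[len("default"):-len(".do")]
        base = target[:-len(ext)] if ext else target
    else:
        ext, base = "", target
    if old_style:
        return (base, ext)        # apenwarr-style: basename, extension
    return (target, base)         # djb-style: target, target minus extension


new_default = do_args("foo.o", "default.o.do")
old_default = do_args("foo.o", "default.o.do", old_style=True)
```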
Avery Pennarun
0a5f424ef8 User-overridden targets stay overridden even if the last build failed.
Previously, if 'redo-ifchange foo' failed last time, then creating foo
manually wouldn't help; 'redo-ifchange foo' would still try to rebuild it.
But if the first run *did* create it, then manually overriding it *did*
work.

That inconsistency is pointless.  If the user creates it by hand, it doesn't
matter if it failed to build last time or not; the user wants it overridden.
So this way, something that can't build can at least be manually created as
a hack.
2011-03-30 23:35:20 -04:00
Avery Pennarun
2efbbc26b9 Don't crash on targets in directories that don't exist yet.
The reason we'd crash is that we tried to pre-create a file called
$target.redo.tmp, which wouldn't work because the directory containing
$target didn't exist.

We now try to generate a smarter filename by using the innermost directory
of target that *does* exist.  It's a little messy, but the idea is to make
sure we won't have to rename() across a filesystem boundary if, for example,
there's a mounted filesystem in the middle of the hierarchy somewhere.
2011-03-22 23:00:34 -07:00
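One way to pick a temp file location as described in 2efbbc26b9 above is to walk up from the target until an existing directory is found. The sketch below is an assumption about the approach, not the actual builder.py code, and the flattened naming scheme with "__" is borrowed from the x__y__z example elsewhere in this log purely for illustration.

```python
import os
import tempfile


def innermost_existing_dir(target):
    """Return the deepest ancestor directory of `target` that exists."""
    d = os.path.dirname(target) or "."
    while not os.path.isdir(d):
        parent = os.path.dirname(d) or "."
        if parent == d:           # nothing left to strip
            break
        d = parent
    return d


def tmp_name_for(target):
    """Hypothetical naming: flatten the missing part of the path."""
    base = innermost_existing_dir(target)
    rest = os.path.relpath(target, base)
    return os.path.join(base, rest.replace(os.sep, "__") + ".redo.tmp")


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "x"))              # x exists, x/y does not
    target = os.path.join(top, "x", "y", "z")
    found = innermost_existing_dir(target)
    expected_dir = os.path.join(top, "x")
    tmp_base = os.path.basename(tmp_name_for(target))
```

Keeping the temp file in the deepest existing directory makes it likely that the final rename() stays on one filesystem.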
Avery Pennarun
4bf569c2a4 builder.py: detect overrides by only ctime, not all of struct stat.
We were accidentally including things like the atime in the comparison,
which is obviously silly; someone reading the file shouldn't mark it as a
manual override.
2011-03-10 14:36:45 -08:00
Avery Pennarun
c1a1f32445 MacOS: "-e filename/." returns true even for non-directories.
This has something to do with resource forks.  So use "-d filename/."
instead, which returns false if filename is not a directory.
2011-03-05 19:03:30 -08:00
Avery Pennarun
ea7057d9b6 redo-ifchange: remove special case for zero arguments.
Not sure why I put that there, but special cases aren't worth the hassle.
2011-02-21 03:55:18 -08:00
Avery Pennarun
c077d77285 builder.py: correctly set $3 to include the subdir path.
If we're using a .do file from a parent directory, we should set $3 using
the same path prefix as $1.  We were previously using just the basename,
which mostly works (since we would rename it to $1$2 eventually anyway) but
is not quite right, and you can't safely rename files across filesystems, so
it could theoretically cause problems.

Also improved t/defaults-nested to test for this behaviour.

Reported by Eric Kow.
2011-01-18 00:48:52 -08:00
Avery Pennarun
4c06332ea1 builder.py: we weren't stamping .do files correctly if dodir!='.'.
The result was that t/deps/dirtest was actually failing in some cases.  But
it wasn't failing quite reliably enough, because the failing test was
dirtest/dir1/all, which has the same name as some other 'all' files,
confusing the issue.  Renamed dirtest/dir1/all.do to dirtest/dir1/go.do instead.

Reported by Prakhar Goel and Berke Durak.
2011-01-18 00:48:51 -08:00
Avery Pennarun
e98696caef Merge branch 'master' into search-parent-dirs
* master:
  Fixed markdown errors in README - code samples now correctly formatted.
  Fix use of config.sh in example
  log.py, minimal/do: don't use ansi colour codes if $TERM is blank or 'dumb'
  Use named constants for terminal control codes.
  redo-sh: keep testing even after finding a 'good' shell.
  redo-sh.do: hide warning output from 'which' in some shells.
  redo-sh.do: wrap long lines.
  Handle .do files that start with "#!/" to specify an explicit interpreter.
  minimal/do: don't print an error on exit if we don't build anything.
  bash completions: also mark 'do' as a completable command.
  bash completions: work correctly when $cur is an empty string.
  bash completions: call redo-targets for a more complete list.
  bash completions: work correctly with subdirs, ie. 'redo t/<tab>'
  Sample bash completion rules for redo targets.
  minimal/do: faster deletion of stamp files.
  minimal/do: delete .tmp files if a build fails.
  minimal/do: use ".did" stamp files instead of empty target files.
  minimal/do: use posix shell features instead of dirname/basename.
  Automatically select a good shell instead of relying on /bin/sh.

Conflicts:
	t/clean.do
2011-01-15 16:00:12 -08:00
Avery Pennarun
f641e52e3b Handle .do files that start with "#!/" to specify an explicit interpreter.
Now you can have your .do files interpreted by whatever interpreter you
want.
2011-01-01 22:10:14 -08:00
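The interpreter selection described in f641e52e3b above could be sketched as follows. The function name and the ("sh", "-e") default are assumptions for illustration; the point is only that a "#!/" first line replaces the default shell invocation.

```python
import os
import tempfile


def interpreter_argv(dofile, default=("sh", "-e")):
    """Return the argv used to run `dofile` (default is an assumption)."""
    with open(dofile, "rb") as f:
        first = f.readline()
    if first.startswith(b"#!/"):
        # Use whatever the script asks for, e.g. '#!/usr/bin/env python'.
        return first[2:].decode().split() + [dofile]
    return list(default) + [dofile]


with tempfile.TemporaryDirectory() as d:
    with_header = os.path.join(d, "x.do")
    plain = os.path.join(d, "y.do")
    with open(with_header, "w") as f:
        f.write("#!/usr/bin/env python\nprint('hi')\n")
    with open(plain, "w") as f:
        f.write("echo hi\n")
    header_argv = interpreter_argv(with_header)
    plain_argv = interpreter_argv(plain)
```

A script that names its own interpreter no longer gets any flags the default would have added, so it has to ask for them itself.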
Avery Pennarun
fb388b3dde Automatically select a good shell instead of relying on /bin/sh.
This includes a fairly detailed test of various known shell bugs from the
autoconf docs.

The idea here is that if redo works on your system, you should be able to
rely on a *good* shell to run your .do files; you shouldn't have to work
around zillions of bugs like autoconf does.
2010-12-21 04:44:39 -08:00
Avery Pennarun
0dcc3f61b6 Search parent directories for default*.do.
Previously, we would only search for default*.do in the same directory as
the target; now we search parent directories as well.

Let's say we're in a/b/ and trying to build foo.o.  If we find
../../default.o.do, then we'll run

	cd ../..; sh default.o.do a/b/foo .o $TMPNAME

In other words, we still always chdir to the same directory as the .do file.
But now $1 might have a path in it, not just a basename.
2010-12-19 05:58:49 -08:00
Avery Pennarun
df85b3d163 Move dependency checking from redo-ifchange into deps.py.
In preparation for sharing between multiple commands.
2010-12-19 03:50:38 -08:00
Avery Pennarun
db4c4fc17a Rename redo-oob to redo-unlocked, to more accurately represent its use.
It's still undocumented.  Because you shouldn't run it by hand.  So don't!
It's dangerous!
2010-12-19 01:20:13 -08:00
Avery Pennarun
294945bd0f Assert that one instance never holds multiple locks on the same file at once.
This could happen if you did 'redo foo foo'.  Which nobody ever did, I
think, but let's make sure we catch it if they do.

One problem with having multiple locks on the same file is then you have to
remember not to *unlock* it until they're all done.  But there are other
problems, such as: why the heck did we think it was a good idea to lock the
same file more than once?  So just prevent it from happening for now,
unless/until we somehow come up with a reason it might be a good idea.
2010-12-14 02:19:08 -08:00
Avery Pennarun
c64b8a3eb1 Fix a race condition caused by zap_deps().
We can't just delete all the dependencies at the beginning and re-add them:
other people might be checking the same dependencies in parallel.  Instead,
mark them as delete_me up front, and then after the build completes, remove
only the delete_me entries.
2010-12-11 22:59:55 -08:00
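The mark-then-sweep approach from c64b8a3eb1 above can be illustrated with an in-memory sqlite database. The table layout and function names here are hypothetical; only the delete_me idea comes from the commit message.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table Deps (target, source, delete_me, "
           "primary key (target, source))")


def begin_build(db, target):
    # Mark old deps instead of deleting them, so parallel readers see them.
    db.execute("update Deps set delete_me=1 where target=?", [target])


def add_dep(db, target, source):
    # Re-adding a dependency clears its delete_me mark.
    db.execute("insert or replace into Deps values (?, ?, 0)",
               [target, source])


def finish_build(db, target):
    # Sweep only the deps that were not re-added during this build.
    db.execute("delete from Deps where target=? and delete_me=1", [target])


add_dep(db, "a", "b")
add_dep(db, "a", "c")
begin_build(db, "a")
during = db.execute("select count(*) from Deps").fetchone()[0]
add_dep(db, "a", "b")             # this build only depends on b
finish_build(db, "a")
after = db.execute("select target, source from Deps").fetchall()
```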
Avery Pennarun
49f0a041b2 clean.do: cleanup *.tmp files that might have been left lying around.
...and fix a bug where builder.py can't handle it if its temp file is
deleted out from under it.
2010-12-11 21:10:57 -08:00
Avery Pennarun
e18fa85d58 The only thing in helpers.py that needed vars.py was the log stuff.
So put it in its own file.  Now it's safer to import and use helpers even if
you can't safely touch vars.
2010-12-11 18:34:02 -08:00
Avery Pennarun
91630a892a Whoops, redo-oob was slightly wrong when used with -j.
We called 'redo' instead of 'redo-ifchange' on our indeterminate objects.
Since other instances of redo-oob might be running at the same time, this
could cause the same object to get rebuilt more than once unnecessarily.
The unit tests caught this, I just didn't notice earlier.
2010-12-11 05:54:39 -08:00
Avery Pennarun
f702417ef3 The second half of redo-stamp: out-of-order building.
If a depends on b depends on c, and c is dirty but b uses redo-stamp
checksums, then 'redo-ifchange a' is indeterminate: we won't know if we need
to run a.do unless we first build b, but the script that *normally* runs
'redo-ifchange b' is a.do, and we don't want to run that yet, because we
don't know for sure if b is dirty, and we shouldn't build a unless one of
its dependencies is dirty.  Eek!

Luckily, there's a safe solution.  If we *know* a is dirty - eg. because
a.do or one of its children has definitely changed - then we can just run
a.do immediately and there's no problem, even if b is indeterminate, because
we were going to run a.do anyhow.

If a's dependencies are *not* definitely dirty, and all we have is
indeterminate ones like b, then that means a's build process *hasn't
changed*, which means its tree of dependencies still includes b, which means
we can deduce that if we *did* run a.do, it would end up running b.do.

Since we know that anyhow, we can safely just run b.do, which will either
b.set_checked() or b.set_changed().  Once that's done, we can re-parse a's
dependencies and this time conclusively tell if it needs to be redone or
not.  Even if it does, b is already up-to-date, so the 'redo-ifchange b'
line in a.do will be fast.

...now take all the above and do it recursively to handle nested
dependencies, etc, and you're done.
2010-12-11 05:54:39 -08:00
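The three-way answer described in f702417ef3 above (clean, dirty, or "build these first") can be sketched as a recursive check. The data structures are hypothetical: `deps` maps targets to their dependencies, `changed` holds targets known to have changed, and `stamped` holds targets that use redo-stamp checksums.

```python
CLEAN, DIRTY = "clean", "dirty"


def check(target, deps, changed, stamped):
    """Return CLEAN, DIRTY, or a list of targets to build before deciding."""
    if target in changed:
        return DIRTY
    must_build = []
    for dep in deps.get(target, ()):
        sub = check(dep, deps, changed, stamped)
        if sub == DIRTY:
            if dep in stamped:
                # Indeterminate: dep may turn out unchanged once rebuilt.
                must_build.append(dep)
            else:
                return DIRTY
        elif sub != CLEAN:
            must_build.extend(sub)
    return must_build or CLEAN


deps = {"a": ["b"], "b": ["c"]}
with_stamp = check("a", deps, changed={"c"}, stamped={"b"})
without_stamp = check("a", deps, changed={"c"}, stamped=set())
```

In the a, b, c example from the commit message, `with_stamp` is the list containing b: the caller would build b, then run the check again to get a definite answer for a.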
Avery Pennarun
22617d335c Half-support for using file checksums instead of stamps.
A new redo-stamp program takes whatever you give it as stdin and uses it to
calculate a checksum for the current target.  If that checksum is the same
as last time, then we consider the target to be unchanged, and we set
checked_runid and stamp, but leave changed_runid alone.  That will make
future callers of redo-ifchange see this target as unmodified.

However, this is only "half" support because by the time we run the .do
script that calls redo-stamp, it's too late; the caller is a dependant of
the stamped program, which is already being rebuilt, even if redo-stamp
turns out to say that this target is unchanged.

The other half is coming up.
2010-12-11 05:54:37 -08:00
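The bookkeeping described in 22617d335c above might be sketched like this. The `state` dict and its `csum` key are hypothetical stand-ins for redo's per-target record, and SHA-1 is an arbitrary choice of checksum for the example; `checked_runid` and `changed_runid` are the names used in the commit message.

```python
import hashlib


def stamp(state, data, runid):
    """Record a checksum of `data`; return True if the target changed."""
    csum = hashlib.sha1(data).hexdigest()
    changed = csum != state.get("csum")
    state["csum"] = csum
    state["checked_runid"] = runid        # always: we looked at it this run
    if changed:
        state["changed_runid"] = runid    # only when the content differs
    return changed


state = {}
first = stamp(state, b"hello\n", runid=1)
second = stamp(state, b"hello\n", runid=2)
```

After the second call, `checked_runid` has advanced but `changed_runid` has not, which is what would let later redo-ifchange callers treat the target as unmodified.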
Avery Pennarun
59201dd7a0 $3 and stdout no longer refer to the same file.
This is slightly inelegant, as the old style
	echo foo
	echo blah
	chmod a+x $3

doesn't work anymore; the stuff you wrote to stdout didn't end up in $3.
You can rewrite it as:
	exec >$3
	echo foo
	echo blah
	chmod a+x $3

Anyway, it's better this way, because now we can tell the difference between
a zero-length $3 and a nonexistent one.  A .do script can thus produce
either one and we'll either delete the target or move the empty $3 to
replace it, whichever is right.

As a bonus, this simplifies our detection of whether you did something weird
with overlapping changes to stdout and $3.
2010-12-11 00:29:04 -08:00
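The decision that 59201dd7a0 above makes possible can be written as a small table of cases. This is a sketch under assumptions: the function name and the returned strings are illustrative, and treating "wrote to both" as an error is a guess at what the "overlapping changes" detection refers to.

```python
def choose_output(dollar3_exists, stdout_bytes):
    """Decide what to do after a .do script exits successfully.

    dollar3_exists: whether the script created the $3 temp file.
    stdout_bytes:   how many bytes the script wrote to stdout.
    """
    if dollar3_exists and stdout_bytes > 0:
        return "error: wrote to both stdout and $3"
    if dollar3_exists:
        return "rename $3 to target"          # even if $3 is zero-length
    if stdout_bytes > 0:
        return "rename stdout file to target"
    return "delete target"                    # script produced nothing


empty_file = choose_output(dollar3_exists=True, stdout_bytes=0)
nothing = choose_output(dollar3_exists=False, stdout_bytes=0)
```

The last two cases are the ones that could not be told apart while $3 and stdout were the same file.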
Avery Pennarun
c4be0050f7 Release the jwack token when doing a synchronous lock wait.
Although we were deadlock-free before, under some circumstances we'd end up
holding a perfectly good token while in sync wait; that would reduce our
parallelism for no good reason.  So give back our tokens before waiting for
anybody else.
2010-12-10 23:04:46 -08:00
Avery Pennarun
f6d11d5411 If a user manually changes a generated file, don't ever overwrite it.
That way the user can modify an auto-generated 'compile' script, for
example, and it'll stay modified.

If they delete the file, we can then generate it for them again.

Also, we have to warn whenever we're doing this, or people might think it's
a bug.
2010-12-10 22:43:11 -08:00
Avery Pennarun
0126f6be1e Don't wipe the timestamp when a target fails to redo.
It's really a separate condition.  And since we're not removing the target
*file* in case of error - we update it atomically, and keeping it is better
than losing it - there's no reason to wipe the timestamp in that case
either.

However, we do need to know that the build failed, so that anybody else
(especially in a parallel build) who looks at that target knows that it
died.  So add a separate flag just for that.
2010-12-10 22:41:11 -08:00
Avery Pennarun
16bebd21b5 builder: the (WAITING) message from --debug-locks didn't print every time.
This was misleading; we end up waiting synchronously for a lock more often
than I thought, and it really does slow down builds.
2010-12-10 22:39:25 -08:00
Avery Pennarun
f70c028a8a With --debug-locks, print a message when we stop to wait on a lock.
Helps in seeing why a particular process might be stopped, and in detecting
potential reasons that parallelism might be reduced.
2010-12-10 04:31:22 -08:00
Avery Pennarun
84169c5d27 Change locking stuff from fifos to fcntl.lockf().
This should reduce filesystem grinding a bit, and makes the code simpler.
It's also theoretically a bit more portable, since I'm guessing fifo
semantics aren't the same on win32 if we ever get there.

Also, a major problem with the old fifo-based system is that if a redo
process died without cleaning up after itself, it wouldn't delete its
lockfiles, so we had to wipe them all at the beginning of each build.  Now
we don't; in theory, you can now have multiple copies of redo poking at the
same tree at the same time and not stepping on each other.
2010-12-10 03:55:51 -08:00
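A minimal sketch of byte-range locking with fcntl.lockf(), in the spirit of 84169c5d27 above. The `Lock` class, the single shared lock file, and the use of a numeric file id as the byte offset are assumptions for illustration. The kernel drops these locks when the holding process exits, which is why no cleanup pass is needed.

```python
import errno
import fcntl
import os
import tempfile


class Lock:
    """Lock one byte of a shared lock file, at offset `fid` (hypothetical)."""

    def __init__(self, fd, fid):
        self.fd, self.fid, self.owned = fd, fid, False

    def trylock(self):
        try:
            # Lock 1 byte starting at offset fid, without blocking.
            fcntl.lockf(self.fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, self.fid)
        except OSError as e:
            if e.errno in (errno.EAGAIN, errno.EACCES):
                return False      # someone else holds it
            raise
        self.owned = True
        return True

    def unlock(self):
        fcntl.lockf(self.fd, fcntl.LOCK_UN, 1, self.fid)
        self.owned = False


with tempfile.TemporaryDirectory() as d:
    fd = os.open(os.path.join(d, "locks"), os.O_RDWR | os.O_CREAT, 0o666)
    lock = Lock(fd, fid=42)
    got = lock.trylock()
    held = lock.owned
    lock.unlock()
    os.close(fd)
```

These locks do not conflict within a single process, so a sketch like this cannot by itself stop one process from "locking" the same id twice; that has to be checked separately.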
Avery Pennarun
6e6e453908 Some speedups for doing redo-ifchange on a large number of static files.
Fix some wastage revealed by the (almost useless, sigh) python profiler.
2010-12-10 00:50:53 -08:00
Avery Pennarun
e446d4dd04 builder.py: don't import the 'random' module unless we need it.
Initializing the random number generator involves some pointless reading
from /dev/urandom.
2010-12-10 00:50:53 -08:00
Avery Pennarun
3ef2bd7300 Don't check as often whether the .redo directory exists.
Just check it once after running a subprocess: that's the only way it ought
to be able to disappear (ie. in a 'make clean' setup).
2010-12-10 00:50:52 -08:00
Avery Pennarun
29d6c9a746 Don't db.commit() so frequently.
Just commit when we're about to do something blocking.  sqlite goes a lot
faster with bigger transactions.  This change does show a small percentage
speedup in tests, but not as much as I'd like.
2010-12-10 00:50:52 -08:00