Commit graph

366 commits

Author SHA1 Message Date
Avery Pennarun
2706525fc0 redo-stamp: print a helpful message if stdin is a tty.
Otherwise your redo process might just freeze in the middle, and you'll
wonder why.
2010-12-11 18:13:58 -08:00
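The check described in 2706525fc0 can be sketched roughly as follows. This is a minimal illustration, assuming Python; the helper name `warn_if_tty` and the message wording are hypothetical and are not quoted from redo-stamp itself.

```python
import io
import sys


def warn_if_tty(stream=None, err=None):
    """Warn and return True when `stream` (default: stdin) is a terminal."""
    stream = sys.stdin if stream is None else stream
    err = sys.stderr if err is None else err
    if stream.isatty():
        # Hypothetical wording; the real message is not quoted in the log.
        err.write("redo-stamp: stdin is a tty; pipe the data to be "
                  "stamped into stdin instead\n")
        return True
    return False


# An in-memory stream is never a tty, so this call stays silent.
quiet = warn_if_tty(io.StringIO("some data"))
```

Without a check like this, a stamp command run from an interactive shell would simply block waiting for input, which matches the "freeze in the middle" symptom the commit mentions.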
Avery Pennarun
0da5c7c082 Add a redo-always command: it adds an "always dirty" dependency to your target.
This is mostly useless except when combined with redo-stamp... I think.
2010-12-11 07:02:45 -08:00
Avery Pennarun
1d26d99e0c Fix a deadlock with redo-oob.
If a checksummed target A used to exist but is now missing, and we tried to
redo-ifchange that exact file, we would unnecessarily run 'redo-oob A A';
that is, we have to build A in order to determine if A needs to be built.

The sub-targets of redo-oob aren't run with REDO_UNLOCKED, so this would
deadlock instantly.

Add an assertion to redo-oob to ensure we never try to redo-ifchange the
primary target (thus converting the deadlock into an exception).  And skip
doing redo-oob when the target is already the same as the thing we have to
check.
2010-12-11 06:16:32 -08:00
Avery Pennarun
91630a892a Whoops, redo-oob was slightly wrong when used with -j.
We called 'redo' instead of 'redo-ifchange' on our indeterminate objects.
Since other instances of redo-oob might be running at the same time, this
could cause the same object to get rebuilt more than once unnecessarily.
The unit tests caught this, I just didn't notice earlier.
2010-12-11 05:54:39 -08:00
Avery Pennarun
e7f7119f2e If a checksummed file is deleted, we should still use redo-oob.
We were giving up and rebuilding the toplevel object, which did eventually
rebuild our checksummed file, but then the file turned out to be identical
to what it was before, so that nobody *else* who depended on it ended up
getting rebuilt.  So the results were indeterminate.

Now we treat it as if its dirtiness is unknown, so we build it using
redo-oob before building any of its dependencies.
2010-12-11 05:54:39 -08:00
Avery Pennarun
f702417ef3 The second half of redo-stamp: out-of-order building.
If a depends on b depends on c, and c is dirty but b uses redo-stamp
checksums, then 'redo-ifchange a' is indeterminate: we won't know if we need
to run a.do unless we first build b, but the script that *normally* runs
'redo-ifchange b' is a.do, and we don't want to run that yet, because we
don't know for sure if b is dirty, and we shouldn't build a unless one of
its dependencies is dirty.  Eek!

Luckily, there's a safe solution.  If we *know* a is dirty - eg. because
a.do or one of its children has definitely changed - then we can just run
a.do immediately and there's no problem, even if b is indeterminate, because
we were going to run a.do anyhow.

If a's dependencies are *not* definitely dirty, and all we have is
indeterminate ones like b, then that means a's build process *hasn't
changed*, which means its tree of dependencies still includes b, which means
we can deduce that if we *did* run a.do, it would end up running b.do.

Since we know that anyhow, we can safely just run b.do, which will either
b.set_checked() or b.set_changed().  Once that's done, we can re-parse a's
dependencies and this time conclusively tell if it needs to be redone or
not.  Even if it does, b is already up-to-date, so the 'redo-ifchange b'
line in a.do will be fast.

...now take all the above and do it recursively to handle nested
dependencies, etc, and you're done.
2010-12-11 05:54:39 -08:00
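The reasoning in f702417ef3 can be sketched on a toy in-memory model. Everything here is an assumption made for illustration: `classify`, `needs_rebuild`, the `deps` dictionary and the `build` callback are hypothetical names, and real redo keeps this state in its database rather than in Python dictionaries.

```python
CLEAN, DIRTY = "clean", "dirty"


def classify(target, deps, changed_sources, stamped, built):
    """Return CLEAN, DIRTY, or a list of targets that must be built first."""
    unknown = []
    for dep in deps.get(target, []):
        if dep in built:
            # Already built out of order: we know whether it really changed.
            if built[dep]:
                return DIRTY
            continue
        if dep not in deps:
            # A source file: dirty only if it was modified.
            if dep in changed_sources:
                return DIRTY
            continue
        sub = classify(dep, deps, changed_sources, stamped, built)
        if sub == DIRTY:
            if dep not in stamped:
                return DIRTY
            # A checksummed dep must be built before we can judge `target`.
            unknown.append(dep)
        elif sub != CLEAN:
            unknown.extend(sub)
    return unknown or CLEAN


def needs_rebuild(target, deps, changed_sources, stamped, build):
    """Build indeterminate deps first, then decide conclusively."""
    built = {}
    while True:
        state = classify(target, deps, changed_sources, stamped, built)
        if isinstance(state, str):
            return state == DIRTY
        for dep in state:
            if dep not in built:
                # build() reports True if the dep's checksum changed.
                built[dep] = build(dep)
```

With `deps = {"a": ["b"], "b": ["c"]}`, a modified `c` and a checksummed `b`, the sketch builds `b` first and only reports `a` as needing a rebuild if `b`'s checksum actually changed.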
Avery Pennarun
1355ade7c7 Correctly handle a checksummed file that depends on a non-checksummed file.
We were rebuilding the checksummed file every time because redo-ifchange was
incorrectly assuming that a child's changed_runid that's greater than my
changed_runid means I'm dirty.  But if my checked_runid is >= the child's
checked_runid, then I'm clean, because my checksum didn't change.

Clear as mud?
2010-12-11 05:54:39 -08:00
Avery Pennarun
22617d335c Half-support for using file checksums instead of stamps.
A new redo-stamp program takes whatever you give it as stdin and uses it to
calculate a checksum for the current target.  If that checksum is the same
as last time, then we consider the target to be unchanged, and we set
checked_runid and stamp, but leave changed_runid alone.  That will make
future callers of redo-ifchange see this target as unmodified.

However, this is only "half" support because by the time we run the .do
script that calls redo-stamp, it's too late; the caller is a dependant of
the stamped program, which is already being rebuilt, even if redo-stamp
turns out to say that this target is unchanged.

The other half is coming up.
2010-12-11 05:54:37 -08:00
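A rough sketch of the bookkeeping described in 22617d335c. The field names `checked_runid`, `changed_runid` and `stamp` come from the commit message; the `record` dictionary, the `apply_stamp` helper and the choice of SHA-1 are assumptions for illustration only, as is updating `changed_runid` when the checksum differs.

```python
import hashlib


def apply_stamp(record, data, runid):
    """Update a target record from stamped bytes; return True if unchanged."""
    checksum = hashlib.sha1(data).hexdigest()  # hash choice is an assumption
    unchanged = record.get("stamp") == checksum
    record["stamp"] = checksum
    record["checked_runid"] = runid
    if not unchanged:
        # Assumed behaviour: only a different checksum counts as a change.
        record["changed_runid"] = runid
    return unchanged


record = {}
apply_stamp(record, b"version 1\n", runid=1)   # first run: counts as changed
apply_stamp(record, b"version 1\n", runid=2)   # same bytes: changed_runid stays 1
```

Because `changed_runid` is left alone on the second call, a later redo-ifchange that compares run ids would see the target as unmodified.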
Avery Pennarun
ca67f5e71a redo-ifchange: fix relative pathnames printed in debug messages.
2010-12-11 02:15:42 -08:00
Avery Pennarun
59201dd7a0 $3 and stdout no longer refer to the same file.
This is slightly inelegant, as the old style
	echo foo
	echo blah
	chmod a+x $3

doesn't work anymore; the stuff you wrote to stdout didn't end up in $3.
You can rewrite it as:
	exec >$3
	echo foo
	echo blah
	chmod a+x $3

Anyway, it's better this way, because now we can tell the difference between
a zero-length $3 and a nonexistent one.  A .do script can thus produce
either one and we'll either delete the target or move the empty $3 to
replace it, whichever is right.

As a bonus, this simplifies our detection of whether you did something weird
with overlapping changes to stdout and $3.
2010-12-11 00:29:04 -08:00
Avery Pennarun
c4be0050f7 Release the jwack token when doing a synchronous lock wait.
Although we were deadlock-free before, under some circumstances we'd end up
holding a perfectly good token while in sync wait; that would reduce our
parallelism for no good reason.  So give back our tokens before waiting for
anybody else.
2010-12-10 23:04:46 -08:00
Avery Pennarun
f6d11d5411 If a user manually changes a generated file, don't ever overwrite it.
That way the user can modify an auto-generated 'compile' script, for
example, and it'll stay modified.

If they delete the file, we can then generate it for them again.

Also, we have to warn whenever we're doing this, or people might think it's
a bug.
2010-12-10 22:43:11 -08:00
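One way the rule in f6d11d5411 could be checked is sketched below. The commit does not say how a manual change is detected; comparing the file's current mtime with the one recorded after the last build is an assumption, and `may_overwrite` is a hypothetical helper name.

```python
import os


def may_overwrite(path, recorded_mtime, warn=print):
    """Return True if it is safe to regenerate `path`."""
    try:
        current = os.stat(path).st_mtime
    except FileNotFoundError:
        # The user deleted the file, so we may generate it again.
        return True
    if current != recorded_mtime:
        # Always warn, so the skipped rebuild does not look like a bug.
        warn("warning: %s was modified by hand; not overwriting it" % path)
        return False
    return True
```

A caller would record `os.stat(target).st_mtime` after each successful build and pass it back in on the next run.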
Avery Pennarun
0126f6be1e Don't wipe the timestamp when a target fails to redo.
It's really a separate condition.  And since we're not removing the target
*file* in case of error - we update it atomically, and keeping it is better
than losing it - there's no reason to wipe the timestamp in that case
either.

However, we do need to know that the build failed, so that anybody else
(especially in a parallel build) who looks at that target knows that it
died.  So add a separate flag just for that.
2010-12-10 22:41:11 -08:00
Avery Pennarun
16bebd21b5 builder: the (WAITING) message from --debug-locks didn't print every time.
This was misleading; we end up waiting synchronously for a lock more often
than I thought, and it really does slow down builds.
2010-12-10 22:39:25 -08:00
Avery Pennarun
b1bb48a029 Merge branch 'sqlite'
This replaces the .redo state directory with an sqlite database instead,
improving correctness and sometimes performance.
2010-12-10 05:43:47 -08:00
Avery Pennarun
18b5263db7 jwack: fix a typo in the "wrong number of tokens on exit" error.
Not that we ever see that error, except when I'm screwing around.
2010-12-10 05:19:49 -08:00
Avery Pennarun
49ebea445f jwack: don't ever set the jobserver socket to O_NONBLOCK.
It creates a race condition: GNU Make might try to read while the socket is
O_NONBLOCK, get EAGAIN, and die; or else another redo might set it back to
blocking in between our call to make it O_NONBLOCK and our call to read().

This method - setting an alarm() during the read - is hacky, but should work
every time.  Unfortunately you get a 1s delay - rarely - when this happens.
The good news is it only happens when there are no tokens available anyhow,
so it won't affect performance much in any situation I can imagine.
2010-12-10 04:57:13 -08:00
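The alarm-during-read approach from 49ebea445f might look roughly like this in Python 3. This is a sketch under assumptions: `read_with_timeout` is a hypothetical name, and it uses `signal.setitimer()` rather than `alarm()` so the timeout can be fractional. It works only on Unix and only in the main thread.

```python
import os
import signal


class _Timeout(Exception):
    pass


def _on_alarm(signum, frame):
    # Raising here is what makes the blocked os.read() return to us;
    # otherwise Python 3 would silently retry the interrupted read.
    raise _Timeout()


def read_with_timeout(fd, nbytes=1, seconds=1.0):
    """Blocking read that gives up after `seconds`; returns None on timeout."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return os.read(fd, nbytes)
    except _Timeout:
        return None
    finally:
        # Cancel the timer and restore whatever handler was there before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)
```

The descriptor stays in blocking mode throughout, so another process sharing the same jobserver pipe never observes O_NONBLOCK.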
Avery Pennarun
f70c028a8a With --debug-locks, print a message when we stop to wait on a lock.
Helps in seeing why a particular process might be stopped, and in detecting
potential reasons that parallelism might be reduced.
2010-12-10 04:31:22 -08:00
Avery Pennarun
675a5106d2 dup() the jobserver fds to 100,101 to make debugging a bit easier.
Now if a process is stuck waiting on one of those fds, it'll be obvious from
the strace.
2010-12-10 04:11:44 -08:00
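A small sketch of moving a pipe's descriptors to fixed numbers, as 675a5106d2 describes. The helper `move_fd` is hypothetical, and `os.dup2()` is used here as one way to land on a specific number; the commit itself does not say which call redo uses.

```python
import os


def move_fd(fd, wanted):
    """Duplicate `fd` onto the fixed number `wanted`; close the original."""
    if fd == wanted:
        return fd
    os.dup2(fd, wanted)   # note: silently replaces anything already at `wanted`
    os.close(fd)
    return wanted


# Stand-in for the jobserver pipe; the numbers 100 and 101 are from the commit.
r, w = os.pipe()
r = move_fd(r, 100)
w = move_fd(w, 101)
```

After this, a process blocked in a read on fd 100 is easy to spot in strace output, since ordinary file descriptors rarely get that high.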
Avery Pennarun
84169c5d27 Change locking stuff from fifos to fcntl.lockf().
This should reduce filesystem grinding a bit, and makes the code simpler.
It's also theoretically a bit more portable, since I'm guessing fifo
semantics aren't the same on win32 if we ever get there.

Also, a major problem with the old fifo-based system is that if a redo
process died without cleaning up after itself, it wouldn't delete its
lockfiles, so we had to wipe them all at the beginning of each build.  Now
we don't; in theory, you can now have multiple copies of redo poking at the
same tree at the same time and not stepping on each other.
2010-12-10 03:55:51 -08:00
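A minimal sketch of per-target locking with `fcntl.lockf()`, in the spirit of 84169c5d27. The layout is an assumption: one shared lock file with one byte per target, where `slot` is a hypothetical integer id for the target; `try_lock` and `unlock` are illustrative names.

```python
import fcntl
import os


def try_lock(fd, slot):
    """Try to take an exclusive lock on byte `slot` without blocking."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, slot)
        return True
    except OSError:
        # Another process holds the lock on this byte.
        return False


def unlock(fd, slot):
    """Release the lock on byte `slot`."""
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, slot)


def open_lockfile(path):
    # Exclusive locks need a descriptor opened for writing.
    return os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
```

The kernel drops these locks when the owning process exits, so a crashed process leaves nothing behind to wipe at the start of the next build.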
Avery Pennarun
10afd9000f Add some conditionals around some high-bandwidth debug statements.
When you have lots of unmodified dependencies, building these printout
strings (which aren't even printed unless you're using -d) ends up taking
something like 5% of the runtime.
2010-12-10 00:50:53 -08:00
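The kind of guard 10afd9000f describes can be sketched as follows. `DEBUG`, `debug` and `check_dep` are hypothetical names; the point is only that the format string is not evaluated unless the message will actually be printed.

```python
import sys

DEBUG = 0  # hypothetical module-level level, raised by a -d style flag


def debug(msg):
    if DEBUG:
        sys.stderr.write(msg)


def check_dep(target, dep, mtime):
    """Pretend dependency check that logs only when debugging is on."""
    if DEBUG:
        # The "%" formatting below is skipped entirely when DEBUG is 0,
        # which is where the saving comes from on large dependency lists.
        debug("%s: checking %r (mtime=%r)\n" % (target, dep, mtime))
    return mtime is not None
```

Calling `debug("..." % args)` without the outer `if` would build the string on every call and then throw it away.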
Avery Pennarun
6e6e453908 Some speedups for doing redo-ifchange on a large number of static files.
Fix some wastage revealed by the (almost useless, sigh) python profiler.
2010-12-10 00:50:53 -08:00
Avery Pennarun
b5c02e410e state.py: reorder things so sqlite never does fdatasync().
It was briefly synchronous at data creation time, adding a few ms to
redo startup.
2010-12-10 00:50:53 -08:00
Avery Pennarun
e446d4dd04 builder.py: don't import the 'random' module unless we need it.
Initializing the random number generator involves some pointless reading
from /dev/urandom.
2010-12-10 00:50:53 -08:00
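A deferred import of the sort e446d4dd04 describes could look like this. The log does not say what builder.py uses `random` for, so the backoff delay below is purely a hypothetical use, and `backoff_delay` is an illustrative name.

```python
import time


def backoff_delay(attempt, sleep=time.sleep):
    """Sleep a small random amount on retries; do nothing on the first try."""
    if attempt == 0:
        return 0.0
    # Imported here, not at module level, so runs that never retry
    # never pay for initializing the random number generator.
    import random
    delay = random.random() * min(attempt, 10) * 0.01
    sleep(delay)
    return delay
```

Python caches modules after the first import, so repeating the `import` statement inside the function costs little on later calls.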
Avery Pennarun
e1a0fc9c12 state.File.is_checked() was being too paranoid.
It wasn't allowing us to short circuit a dependency if that dependency had
been built previously, but that was already being checked (more correctly)
in dirty_deps().
2010-12-10 00:50:52 -08:00
Avery Pennarun
94cecc240b Don't abort if 'insert into Files' gives an IntegrityError.
It can happen occasionally if some other parallel redo adds the same file at
the same time.
2010-12-10 00:50:52 -08:00
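Tolerating a duplicate insert, as in 94cecc240b, can be sketched with the standard `sqlite3` module. The `Files` table name comes from the commit title; its single `name` column and the `ensure_file` helper are assumptions for illustration.

```python
import sqlite3


def ensure_file(db, name):
    """Insert `name` if missing and return its rowid either way."""
    try:
        db.execute("insert into Files (name) values (?)", (name,))
    except sqlite3.IntegrityError:
        # Some other writer added the same row first; that is fine.
        pass
    row = db.execute("select rowid from Files where name = ?",
                     (name,)).fetchone()
    return row[0]


db = sqlite3.connect(":memory:")
db.execute("create table Files (name text not null primary key)")
first = ensure_file(db, "all.do")
second = ensure_file(db, "all.do")   # duplicate insert is swallowed
```

Both calls return the same rowid, and the table ends up with a single row for the file.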
Avery Pennarun
3ef2bd7300 Don't check as often whether the .redo directory exists.
Just check it once after running a subprocess: that's the only way it ought
to be able to disappear (ie. in a 'make clean' setup).
2010-12-10 00:50:52 -08:00
Avery Pennarun
29d6c9a746 Don't db.commit() so frequently.
Just commit when we're about to do something blocking.  sqlite goes a lot
faster with bigger transactions.  This change does show a small percentage
speedup in tests, but not as much as I'd like.
2010-12-10 00:50:52 -08:00
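The batching idea in 29d6c9a746 might be sketched like this. The `Deps` table, its columns, and the helper names are hypothetical; the sketch only shows writes accumulating in one transaction until the caller is about to block.

```python
import sqlite3


def add_deps(db, target, deps):
    """Queue dependency rows inside the current transaction (no commit)."""
    db.executemany("insert into Deps (target, source) values (?, ?)",
                   [(target, dep) for dep in deps])


def before_blocking(db):
    """Commit pending writes right before waiting on a lock or a child."""
    if db.in_transaction:
        db.commit()


db = sqlite3.connect(":memory:")
db.execute("create table Deps (target text, source text)")
add_deps(db, "a", ["b", "c"])   # still uncommitted at this point
before_blocking(db)             # one commit covers both rows
```

Committing before blocking also means other processes are not kept waiting on a transaction that this process is holding open while it sleeps.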
Avery Pennarun
fb79851530 Calculate dependencies with fewer sqlite queries.
2010-12-10 00:50:52 -08:00
Avery Pennarun
c339359f04 Schema cleanup.
2010-12-10 00:50:52 -08:00
Avery Pennarun
b86a32d33d flush-cache.sh: for speed, disable sqlite's synchronous mode.
2010-12-10 00:50:52 -08:00
Avery Pennarun
f4535be0cd Fix a deadlock.
We were holding a database open with a read lock while a child redo might
need to open it with a write lock.
2010-12-10 00:50:52 -08:00
Avery Pennarun
9e36106642 sqlite3: configure the timeout explicitly.
In flush-cache.sh, we have to do this, because the sqlite3 command-line tool
sets it to zero.  Inevitably during parallel testing, it'll end up
contending for a lock, and we really want it to wait a bit.

In state.py, it's not as important since the default is nonzero.  But
python-sqlite3's default of 5 seconds makes me a little too nervous; I can
imagine a disk write waiting for more than 5 seconds sometime.  So let's use
60 instead.
2010-12-10 00:50:52 -08:00
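Setting the timeout explicitly from Python, as 9e36106642 describes for state.py, can be sketched as below. The 60-second value is from the commit message; `open_state_db` is a hypothetical helper name.

```python
import sqlite3


def open_state_db(path, timeout=60):
    """Open the state database, waiting up to `timeout` seconds on locks."""
    # sqlite3.connect() defaults to a 5 second timeout; pass it explicitly
    # so a slow disk write elsewhere does not surface as "database is locked".
    return sqlite3.connect(path, timeout=timeout)


db = open_state_db(":memory:")
```

The timeout only controls how long a connection waits for a competing lock to clear; it does not slow down uncontended access.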
Avery Pennarun
a62bd50d44 Switch state.py to use sqlite3 instead of filesystem-based stamps.
It passes all tests when run serialized, but still gives weird errors
(OperationalError: database is locked) when run with -j5.  sqlite3 shouldn't
be barfing just because the database is locked, since the default timeout is
5 seconds, and it's dying *way* faster than that.
2010-12-10 00:50:52 -08:00
Avery Pennarun
8dad223225 flush-cache: run it as a separate program, not using 'source'
That way it doesn't clutter up 'redo -x' as much.
2010-12-10 00:50:52 -08:00
Avery Pennarun
43b74f3220 builder._nice(): show the right filename in the case of chdir().
This only affects cosmetics, not actual behaviour, which is why the unit
tests didn't catch it.
2010-12-10 00:49:30 -08:00
Avery Pennarun
51bbdc6c5a If we can't find a .do file for a target, mark it as not is_generated.
This allows files to transition from generated to not-generated if the .do
file is ever removed (ie. the user is changing things and the file is now a
source file, not a target).
2010-12-06 03:12:53 -08:00
Avery Pennarun
0979a6e666 t/passfailtest.do: just return exit codes, don't print messages.
The exit code numbers are useful enough, and the messages are the sort of
thing that might turn into lies eventually.
2010-12-06 03:12:02 -08:00
Avery Pennarun
b3a14a28c4 When -x or -v is given, print the sh command we're executing.
2010-12-06 02:47:24 -08:00
Avery Pennarun
4669903887 The mtime of a directory is kind of useless, so don't use it.
2010-12-05 03:58:20 -08:00
Avery Pennarun
66187e879e Slightly improve the "if target already existed" rule to ignore directories.
So if you have a default.do, it might be used to build mydir, even if mydir
already existed when we started.

This might be useful if you have a mydir.setup.do and a mydir.do; mydir.do
depends on mydir.setup.do, but mydir.setup.do creates mydir, it just doesn't
*finish* with mydir.  Still, by the time mydir.do runs, mydir already
exists, and redo would get confused.

I think directories are fundamentally special in this way, because it makes
sense to "create" a directory even if that directory isn't "done" at this
phase.
2010-12-04 05:42:07 -08:00
Avery Pennarun
66e7c0db5e toplevel all.do: 'redo t' no longer works.
It's too bad, but it's actually more readable to force people to say
'redo t/all' if they really mean it.  So just fix the help message.
2010-11-27 23:17:41 -08:00
Avery Pennarun
8953260d28 deps/test1.do: fix an == vs. =
In sh's [] command, you should use =, not ==.  I got away with this because
bash accepts ==, but that's non-portable.
2010-11-27 21:48:43 -08:00
Avery Pennarun
a5855641f8 This tests the chdir-related bug from the previous commit.
2010-11-25 06:37:24 -08:00
Avery Pennarun
c29de89051 Fix more trouble with .do scripts that cd to other directories.
The interaction of REDO_STARTDIR, REDO_PWD, and getcwd() is pretty
complicated.  In this case, we accidentally assumed that the current
instance of redo was running with getcwd() == REDO_STARTDIR+REDO_PWD, and so
the new target was REDO_STARTDIR+REDO_PWD+t, but this isn't the case if the
current .do script did chdir().

The correct answer is REDO_STARTDIR+getcwd()+t.
2010-11-25 06:37:24 -08:00
Avery Pennarun
f3413c0f7c doublestatic: fix dependencies if two files depend on one non-generated file.
If a and b both depend on c, and c is a static (non-generated) file that has
changed since the last successful build of a and b, we would try to redo
a, but would forget to redo b.  Now it does both.
2010-11-24 04:52:30 -08:00
Avery Pennarun
9fc5ae1b56 Optimization: don't getcwd() so often.
We never chdir() except just as we exec a subprocess, so it's okay to cache
this value.  This makes strace output look cleaner, and speeds things up a
little bit when checking a large number of dependencies.

Relatedly, take a debug2() message and put it in an additional if, so that
we don't have to do so much work to calculate it when we're just going to
throw it away anyhow.
2010-11-24 03:45:38 -08:00
Avery Pennarun
0ec15eeb09 If a target's .do file disappears, don't forget to stamp it.
If a file previously was generated but now isn't (ie. its .do file
disappears), we would never re-stamp that target, and so all its
dependencies would rebuild continually.
2010-11-24 03:44:37 -08:00
Avery Pennarun
60f5446733 Correctly handle dependencies for "cd somewhere; redo-ifchange somefile"
We would build 'somefile' correctly the first time, but we wouldn't
attach the dependency on somefile to the right $TARGET, so our target would
not auto-rebuild in the future based on somefile.
2010-11-24 03:06:33 -08:00
Avery Pennarun
984ad747f8 Remove special case for "dirname" -> "dirname/all"
It actually decreases readability of the .do files - by not making it
explicit when you're going into a subdir.

Plus it adds ambiguity: what if there's a dirname.do *and* a dirname/all?
We could resolve the ambiguity if we wanted, but that adds more code, while
taking out this special case makes *less* code and improves readability.
I think it's the right way to go.
2010-11-24 02:48:27 -08:00