apenwarr-redo

Author	SHA1	Message	Date
Avery Pennarun	4edb6f78e0	redo-log: add automated tests, and fix some path bugs revealed by them. When a log for X was saying it wanted to refer to Y, we used a relative path, but it was sometimes relative to the wrong starting location, so redo-log couldn't find it later. Two examples: - if default.o.do is handling builds for a/b/x.o, and default.o.do does 'redo a/b/x.h', the log for x.o should refer to ./x.h, not a/b/x.h. - if foo.do is handling builds for foo, and it does "cd a/b && redo x", the log for foo should refer to a/b/x, not just x.	2018-11-19 18:58:36 -05:00
Avery Pennarun	8b5a567b2e	redo-log: prioritize the "foreground" process. When running a parallel build, redo-log -f (which is auto-started by redo) tries to traverse through the logs depth first, in the order parent processes started subprocesses. This works pretty well, but if its dependencies are locked, a process might have to give up its jobserver token while other stuff builds its dependencies. After the dependency finishes, the parent might not be able to get a token for quite some time, and the logs will appear to stop. To prevent this from happening, we can instantiate up to one "cheater" token, only in the foreground process (the one locked by redo-log -f), which will allow it to continue running, albeit a bit slowly (since it only has one token out of possibly many). When the process finishes, we then destroy the fake token. It gets a little complicated; see explanation at the top of jwack.py.	2018-11-17 11:13:20 -05:00
Avery Pennarun	5c4f710f4e	Raw logs contain @@REDO lines instead of formatted data. This makes them more reliable to parse. redo-log can parse each line, format and print it, then recurse if necessary. This got a little ugly because I wanted 'redo --raw-logs' to work, which we want to format the output nicely, but not call redo-log. (As a result, --raw-logs has a different meaning to redo and redo-log, which is kinda dumb. I should fix that.) As an added bonus, redo-log now handles indenting of recursive logs, so if the build was a -> a/b -> a/b/c, and you look at the log for a/b, it can still start at the top level indentation.	2018-11-17 10:27:44 -05:00
Avery Pennarun	b2411fe483	redo-log: capture and linearize the output of redo builds. redo now saves the stderr from every .do script, for every target, into a file in the .redo directory. That means you can look up the logs from the most recent build of any target using the new redo-log command, for example: redo-log -r all The default is to show logs non-recursively, that is, it'll show when a target does redo-ifchange on another target, but it won't recurse into the logs for the latter target. With -r (recursive), it does. With -u (unchanged), it does even if redo-ifchange discovered that the target was already up-to-date; in that case, it prints the logs of the most recent time the target was generated. With --no-details, redo-log will show only the 'redo' lines, not the other log messages. For very noisy build systems (like recursing into a 'make' instance) this can be helpful to get an overview of what happened, without all the cruft. You can use the -f (follow) option like tail -f, to follow a build that's currently in progress until it finishes. redo itself spins up a copy of redo-log -r -f while it runs, so you can see what's going on. Still broken in this version: - No man page or new tests yet. - ANSI colors don't yet work (unless you use --raw-logs, which gives the old-style behaviour). - You can't redirect the output of a sub-redo to a file or a pipe right now, because redo-log is eating it. - The regex for matching 'redo' lines in the log is very gross. Instead, we should put the raw log files in a more machine-parseable format, and redo-log should turn that into human-readable format. - redo-log tries to "linearize" the logs, which makes them comprehensible even for a large parallel build. It recursively shows log messages for each target in depth-first tree order (by tracing into a new target every time it sees a 'redo' line). This works really well, but in some specific cases, the "topmost" redo instance can get stuck waiting for a jwack token, which makes it look like the whole build has stalled, when really redo-log is just waiting a long time for a particular subprocess to be able to continue. We'll need to add a specific workaround for that.	2018-11-17 10:27:43 -05:00
Avery Pennarun	bb80118298	Cyclic dependency checker: don't give up token in common case. The way the code was written, we'd give up our token, detect a cyclic dependency, and then try to get our token back before exiting. Even with -j1, the temporary token release allowed any parent up the tree to continue running jobs, so it would take an arbitrary amount of time before we could exit (and report an error code to the parent). There was no visible symptom of this except that, with -j1, t/355-deps-cyclic would not finish until some of the later tests finished, which was surprising. To fix it, let's just check for a cyclic dependency first, then release the token only once we're sure things are sane.	2018-11-13 07:00:09 -05:00
Avery Pennarun	2a936a7574	Print a nicer error message when asked to build an empty string (''). This happens sometimes, for example, if you do whatever \| while read x; do redo-ifchange "$x" done and the input contains blank lines. We could ignore the request for blankness, but it seems like that situation might indicate a more serious bug in your parser, so it's probably better to just abort with a meaningful error.	2018-11-03 22:02:26 -04:00
Avery Pennarun	711b05766f	Print a better message when detecting pre-existing cyclic dependencies. We already printed an error at build time, but added the broken dependency anyway. If the .do script decided to succeed despite redo-ifchange aborting, the target would be successfully created and we'd end up with an infinite loop when running isdirty() later. The result was still "correct", because python helpfully aborted the infinite loop after the recursion got too deep. But let's explicitly detect it and print a better error message. (Thanks to Nils Dagsson Moskopp's redo-testcases repo for exposing this problem. If you put a #!/bin/sh header on your .do script means you need to run 'set -e' yourself if you want .do scripts to abort after an error, which you almost always do, and those testcases don't, which exposed this bug if you ran the tests twice.)	2018-11-02 02:20:52 -04:00
Avery Pennarun	d143cca7da	class Lock: set lockfile = None before trying to open a file. In case the open causes a delay and we got a KeyboardInterrupt, this prevents a weird error from the destructor about an uninitialized variable.	2018-11-02 02:20:52 -04:00
Daniele Varrazzo	adbaaf38ce	Use py-setproctitle to clean up ps output in redo scripts * Change the process title with a cleaned-up version of the script * Document the use of setproctitle in the README	2018-10-11 03:52:20 -04:00
Robert L. Bocchino Jr	7dd63efb37	Add cyclic dependence detection. If a depends on b which depends on a, redo would just freeze. Now it aborts with a somewhat helpful error message. [Updated by apenwarr for coding style and to add a test.]	2018-10-11 03:28:05 -04:00
Robert L. Bocchino Jr	99873775ba	Revise file stamp in state.py Removed information (ctime) that was causing spurious "you modified it" warnings. [apenwarr: This was not quite right. ctime includes some things we don't want to care about, such as link count, so we have to remove it. But it also notices changes to st_uid, st_gid, and st_mode, which we do care about, so now we have to include those explicitly.]	2018-10-11 03:28:05 -04:00
Alan Falloon	67c1d4f7d8	We sometimes missed deps when more than one dep required a stamp check. If must_build was nonempty when recursively calling isdirty() that returned a list, we'd lose the original value of must_build.	2018-10-11 03:28:05 -04:00
Avery Pennarun	5156feae9d	Switch sqlite3 journal mode to WAL. WAL mode does make the deadlocking on MacOS go away. This suggests that pysqlite3 was leaving read transactions open for a long time; in old-style sqlite journals, this prevents anyone from obtaining a write lock, although it doesn't prevent other concurrent reads. With WAL journals, writes can happen even while readers are holding a lock, but the journal doesn't flush until the readers have released it. This is not a "real" fix but it's fairly harmless, since all redo instances will exit eventually, and when they do, the WAL journal can be flushed.	2018-10-06 05:06:42 -04:00
Avery Pennarun	613625b580	Add more assertions about uncommitted sqlite transactions. I think we were sometimes leaving half-done sqlite transactions sitting around for a long time (eg. across sub-calls to .do files). This seemed to be okay on Linux, but caused sqlite deadlocks on MacOS. Most likely it's not the operating system, but the sqlite version and journal mode in use. In any case, the correct thing to do is to actually commit or rollback transactions, not leave them hanging around. ...unfortunately this doesn't actually fix my MacOS deadlocks, which makes me rather nervous.	2018-10-06 05:06:19 -04:00
Avery Pennarun	34669fba65	Use os.lstat() instead of os.stat(). I think this aligns better with how redo works. Otherwise, if a.do creates a as a symlink, then changes to the symlink's target will change a's stat/stamp information without re-running a.do, which looks to redo like you modified a by hand, which causes it to stop running a.do altogether. With this change, modifications to a's target are okay, but they don't trigger any redo dependency changes. If you want that, then a.do should redo-ifchange on its symlink target explicitly.	2018-10-06 00:14:02 -04:00
Avery Pennarun	4e285607f0	Don't use "insert ... default values" in sqlite3. It isn't supported in older sqlite3 versions. Let's just do something equivalent instead.	2011-05-07 23:47:03 -04:00
Avery Pennarun	705722d612	state.relpath: correct broken paths ending with /, etc. Bug reported by Elliott Hird. Based on a patch by Tim Allen, but I didn't do it the same way.	2011-03-27 15:25:11 -04:00
Tim Allen	e27aaf01e7	Make redo read byte-strings from the database. By default, the database redo uses to store file state returns filenames as Unicode strings, and if redo tries to run a build-script whose fully-qualified path contains non-ASCII characters then redo crashes when trying to promote the path to a Unicode string. This patch ensures that the database always returns byte-strings, not Unicode strings. That way, the fully-qualified path and the target name are both byte-strings and can be joined without issue. (Fixes a bug reported by Zoran Zaric.)	2011-02-14 18:48:33 -08:00
Avery Pennarun	340aad1797	state.Lock: destructor: unlock before closing, not the other way around.	2011-01-18 00:48:51 -08:00
Avery Pennarun	642d60a7c9	state.File.set_static(): barf if we set_static() a nonexistent file. If that ever happens, we probably got our paths mangled (like in the previous commit) so we should die right away rather than allow weird things to happen later.	2011-01-18 00:48:51 -08:00
Avery Pennarun	0dcc3f61b6	Search parent directories for default.do. Previously, we would only search for default.do in the same directory in the target; now we search parent directories as well. Let's say we're in a/b/ and trying to build foo.o. If we find ../../default.o.do, then we'll run cd ../..; sh default.o.do a/b/foo .o $TMPNAME In other words, we still always chdir to the same directory as the .do file. But now $1 might have a path in it, not just a basename.	2010-12-19 05:58:49 -08:00
Avery Pennarun	560f95fd77	Don't update the database during redo-ood. Makes it slightly faster.	2010-12-19 03:50:55 -08:00
Avery Pennarun	df85b3d163	Move dependency checking from redo-ifchange into deps.py. In preparation for sharing between multiple commands.	2010-12-19 03:50:38 -08:00
Avery Pennarun	f2d34fa685	New redo-sources and redo-targets commands. Suggested by djb in personal email, and on the mailing list. redo-targets lists all the targets in the database; redo-sources lists all the existing sources (ie. files that are referred to but which aren't targets). redo-ifcreate filenames aren't included in the redo-sources list.	2010-12-19 03:50:38 -08:00
Avery Pennarun	95680ed7ef	Switch to using a separate lockfile per target. The previous method, using fcntl byterange locks, was very efficient and avoided unnecessarily filesystem metadata churn (ie. creating/deleting inodes). Unfortunately, MacOS X (at least version 10.6.5) apparently has a race condition in its fcntl locking that makes it unusably unreliable (http://apenwarr.ca/log/?m=201012#13). My tests indicate that if you only ever lock a single byterange on a file, the race condition doesn't cause a problem. So let's just use one lockfile per target. Now "redo -j20 test" passes for me on both MacOS and Linux. This doesn't measurably affect the speed on Linux, at least, in my tests. The bad news: it's hard to safely delete those lockfiles when we're done with them, so they tend to accumulate in the .redo dir.	2010-12-14 02:44:29 -08:00
Avery Pennarun	294945bd0f	Assert that one instance never holds multiple locks on the same file at once. This could happen if you did 'redo foo foo'. Which nobody ever did, I think, but let's make sure we catch it if they do. One problem with having multiple locks on the same file is then you have to remember not to unlock it until they're all done. But there are other problems, such as: why the heck did we think it was a good idea to lock the same file more than once? So just prevent it from happening for now, unless/until we somehow come up with a reason it might be a good idea.	2010-12-14 02:19:08 -08:00
Avery Pennarun	c64b8a3eb1	Fix a race condition caused by zap_deps(). We can't just delete all the dependencies at the beginning and re-add them: other people might be checking the same dependencies in parallel. Instead, mark them as delete_me up front, and then after the build completes, remove only the delete_me entries.	2010-12-11 22:59:55 -08:00
Avery Pennarun	1cb000ece1	redo.py: report when you're trying to rebuild a static file. In redo-ifchange, this might be a good idea, since you might just want to set a dependency on it, so we won't say anything from inside builder.py. But if you're calling redo.py, that means you expect it to be rebuilt, since there's no other reason to try. So print a warning. (This is what make does, more or less.)	2010-12-11 21:19:15 -08:00
Avery Pennarun	e18fa85d58	The only thing in helpers.py that needed vars.py was the log stuff. So put it in its own file. Now it's safer to import and use helpers even if you can't safely touch vars.	2010-12-11 18:34:02 -08:00
Avery Pennarun	0da5c7c082	Add a redo-always command: it adds an "always dirty" dependency to your target. This is mostly useless except when combined with redo-stamp... I think.	2010-12-11 07:02:45 -08:00
Avery Pennarun	1d26d99e0c	Fix a deadlock with redo-oob. If a checksummed target A used to exist but is now missing, and we tried to redo-ifchange that exact file, we would unnecessarily run 'redo-oob A A'; that is, we have to build A in order to determine if A needs to be built. The sub-targets of redo-oob aren't run with REDO_UNLOCKED, so this would deadlock instantly. Add an assertion to redo-oob to ensure we never try to redo-ifchange the primary target (thus converting the deadlock into an exception). And skip doing redo-oob when the target is already the same as the thing we have to check.	2010-12-11 06:16:32 -08:00
Avery Pennarun	22617d335c	Half-support for using file checksums instead of stamps. A new redo-stamp program takes whatever you give it as stdin and uses it to calculate a checksum for the current target. If that checksum is the same as last time, then we consider the target to be unchanged, and we set checked_runid and stamp, but leave changed_runid alone. That will make future callers of redo-ifchange see this target as unmodified. However, this is only "half" support because by the time we run the .do script that calls redo-stamp, it's too late; the caller is a dependant of the stamped program, which is already being rebuilt, even if redo-stamp turns out to say that this target is unchanged. The other half is coming up.	2010-12-11 05:54:37 -08:00
Avery Pennarun	f6d11d5411	If a user manually changes a generated file, don't ever overwrite it. That way the user can modify an auto-generated 'compile' script, for example, and it'll stay modified. If they delete the file, we can then generate it for them again. Also, we have to warn whenever we're doing this, or people might think it's a bug.	2010-12-10 22:43:11 -08:00
Avery Pennarun	0126f6be1e	Don't wipe the timestamp when a target fails to redo. It's really a separate condition. And since we're not removing the target file in case of error - we update it atomically, and keeping it is better than losing it - there's no reason to wipe the timestamp in that case either. However, we do need to know that the build failed, so that anybody else (especially in a parallel build) who looks at that target knows that it died. So add a separate flag just for that.	2010-12-10 22:41:11 -08:00
Avery Pennarun	84169c5d27	Change locking stuff from fifos to fcntl.lockf(). This should reduce filesystem grinding a bit, and makes the code simpler. It's also theoretically a bit more portable, since I'm guessing fifo semantics aren't the same on win32 if we ever get there. Also, a major problem with the old fifo-based system is that if a redo process died without cleaning up after itself, it wouldn't delete its lockfiles, so we had to wipe them all at the beginning of each build. Now we don't; in theory, you can now have multiple copies of redo poking at the same tree at the same time and not stepping on each other.	2010-12-10 03:55:51 -08:00
Avery Pennarun	b5c02e410e	state.py: reorder things so sqlite never does fdatasync(). It was briefly synchronous at data creation time, adding a few ms to redo startup.	2010-12-10 00:50:53 -08:00
Avery Pennarun	e1a0fc9c12	state.File.is_checked() was being too paranoid. It wasn't allowing us to short circuit a dependency if that dependency had been built previously, but that was already being checked (more correctly) in dirty_deps().	2010-12-10 00:50:52 -08:00
Avery Pennarun	94cecc240b	Don't abort if 'insert into Files' gives an IntegrityError. It can happen occasionally if some other parallel redo adds the same file at the same time.	2010-12-10 00:50:52 -08:00
Avery Pennarun	3ef2bd7300	Don't check as often whether the .redo directory exists. Just check it once after running a subprocess: that's the only way it ought to be able to disappear (ie. in a 'make clean' setup).	2010-12-10 00:50:52 -08:00
Avery Pennarun	29d6c9a746	Don't db.commit() so frequently. Just commit when we're about to do something blocking. sqlite goes a lot faster with bigger transactions. This change does show a small percentage speedup in tests, but not as much as I'd like.	2010-12-10 00:50:52 -08:00
Avery Pennarun	fb79851530	Calculate dependencies with fewer sqlite queries.	2010-12-10 00:50:52 -08:00
Avery Pennarun	c339359f04	Schema cleanup.	2010-12-10 00:50:52 -08:00
Avery Pennarun	f4535be0cd	Fix a deadlock. We were holding a database open with a read lock while a child redo might need to open it with a write lock.	2010-12-10 00:50:52 -08:00
Avery Pennarun	9e36106642	sqlite3: configure the timeout explicitly. In flush-cache.sh, we have to do this, because the sqlite3 command-line tool sets it to zero. Inevitably during parallel testing, it'll end up contending for a lock, and we really want it to wait a bit. In state.py, it's not as important since the default is nonzero. But python-sqlite3's default of 5 seconds makes me a little too nervous; I can imagine a disk write waiting for more than 5 seconds sometime. So let's use 60 instead.	2010-12-10 00:50:52 -08:00
Avery Pennarun	a62bd50d44	Switch state.py to use sqlite3 instead of filesystem-based stamps. It passes all tests when run serialized, but still gives weird errors (OperationalError: database is locked) when run with -j5. sqlite3 shouldn't be barfing just because the database is locked, since the default timeout is 5 seconds, and it's dying way faster than that.	2010-12-10 00:50:52 -08:00
Avery Pennarun	51bbdc6c5a	If we can't find a .do file for a target, mark it as not is_generated. This allows files to transition from generated to not-generated if the .do file is ever removed (ie. the user is changing things and the file is now a source file, not a target).	2010-12-06 03:12:53 -08:00
Avery Pennarun	c29de89051	Fix more trouble with .do scripts that cd to other directories. The interaction of REDO_STARTDIR, REDO_PWD, and getcwd() are pretty complicated. In this case, we accidentally assumed that the current instance of redo was running with getcwd() == REDO_STARTDIR+REDO_PWD, and so the new target was REDO_STARTDIR+REDO_PWD+t, but this isn't the case if the current .do script did chdir(). The correct answer is REDO_STARTDIR+getcwd()+t.	2010-11-25 06:37:24 -08:00
Avery Pennarun	f3413c0f7c	doublestatic: fix dependencies if two files depend on one non-generated file. If a and b both depend on c, and c is a static (non-generated) file that has changed since the last successful build of a and b, we would try to redo a, but would forget to redo b. Now it does both.	2010-11-24 04:52:30 -08:00
Avery Pennarun	9fc5ae1b56	Optimization: don't getcwd() so often. We never chdir() except just as we exec a subprocess, so it's okay to cache this value. This makes strace output look cleaner, and speeds things up a little bit when checking a large number of dependencies. Relatedly, take a debug2() message and put it in an additional if, so that we don't have to do so much work to calculate it when we're just going to throw it away anyhow.	2010-11-24 03:45:38 -08:00
Avery Pennarun	f337df463d	state.stamp() can't imply state.built(). ...because we deliberately stamp non-generated files as well, and that doesn't need to imply that we rebuilt them just now. In fact, we know for a fact that we didn't rebuild them just now, but we still need to record the timestamp for later.	2010-11-22 22:53:40 -08:00

1 2

60 commits