apenwarr-redo

Author	SHA1	Message	Date
Avery Pennarun	f6fe00db5c	Directory reorg: move code into redo/, generate binaries in bin/. It's time to start preparing for a version of redo that doesn't work unless we build it first (because it will rely on C modules, and eventually be rewritten in C altogether). To get rolling, remove the old-style symlinks to the main programs, and rename those programs from redo-*.py to redo/cmd_*.py. We'll also move all library functions into the redo/ dir, which is a more python-style naming convention. Previously, install.do was generating wrappers for installing in /usr/bin, which extend sys.path and then import+run the right file. This made "installed" redo work quite differently from running redo inside its source tree. Instead, let's always generate the wrappers in bin/, and not make anything executable except those wrappers. Since we're generating wrappers anyway, let's actually auto-detect the right version of python for the running system; distros can't seem to agree on what to call their python2 binaries (sigh). We'll fill in the right #! shebang lines. Since we're doing that, we can stop using /usr/bin/env, which will a) make things slightly faster, and b) let us use "python -S", which tells python not to load a bunch of extra crap we're not using, thus improving startup times. Annoyingly, we now have to build redo using minimal/do, then run the tests using bin/redo. To make this less annoying, we add a toplevel ./do script that knows the right steps, and a Makefile (whee!) for people who are used to typing 'make' and 'make test' and 'make clean'.	2018-12-04 02:53:40 -05:00
Avery Pennarun	df44dc54a2	jwack: _cheatfds error when run from toplevel make -j. Also added a new unit test to confirm that 'make' behaviour works as expected, with and without parallelism.	2018-12-04 02:43:58 -05:00
Avery Pennarun	e1327540fb	Move into the 21st century by fixing some pylint warnings.	2018-12-03 00:11:27 -05:00
Avery Pennarun	8b5a567b2e	redo-log: prioritize the "foreground" process. When running a parallel build, redo-log -f (which is auto-started by redo) tries to traverse through the logs depth first, in the order parent processes started subprocesses. This works pretty well, but if its dependencies are locked, a process might have to give up its jobserver token while other stuff builds its dependencies. After the dependency finishes, the parent might not be able to get a token for quite some time, and the logs will appear to stop. To prevent this from happening, we can instantiate up to one "cheater" token, only in the foreground process (the one locked by redo-log -f), which will allow it to continue running, albeit a bit slowly (since it only has one token out of possibly many). When the process finishes, we then destroy the fake token. It gets a little complicated; see explanation at the top of jwack.py.	2018-11-17 11:13:20 -05:00
Avery Pennarun	b2411fe483	redo-log: capture and linearize the output of redo builds. redo now saves the stderr from every .do script, for every target, into a file in the .redo directory. That means you can look up the logs from the most recent build of any target using the new redo-log command, for example: redo-log -r all The default is to show logs non-recursively, that is, it'll show when a target does redo-ifchange on another target, but it won't recurse into the logs for the latter target. With -r (recursive), it does. With -u (unchanged), it does even if redo-ifchange discovered that the target was already up-to-date; in that case, it prints the logs of the most recent time the target was generated. With --no-details, redo-log will show only the 'redo' lines, not the other log messages. For very noisy build systems (like recursing into a 'make' instance) this can be helpful to get an overview of what happened, without all the cruft. You can use the -f (follow) option like tail -f, to follow a build that's currently in progress until it finishes. redo itself spins up a copy of redo-log -r -f while it runs, so you can see what's going on. Still broken in this version: - No man page or new tests yet. - ANSI colors don't yet work (unless you use --raw-logs, which gives the old-style behaviour). - You can't redirect the output of a sub-redo to a file or a pipe right now, because redo-log is eating it. - The regex for matching 'redo' lines in the log is very gross. Instead, we should put the raw log files in a more machine-parseable format, and redo-log should turn that into human-readable format. - redo-log tries to "linearize" the logs, which makes them comprehensible even for a large parallel build. It recursively shows log messages for each target in depth-first tree order (by tracing into a new target every time it sees a 'redo' line). This works really well, but in some specific cases, the "topmost" redo instance can get stuck waiting for a jwack token, which makes it look like the whole build has stalled, when really redo-log is just waiting a long time for a particular subprocess to be able to continue. We'll need to add a specific workaround for that.	2018-11-17 10:27:43 -05:00
Avery Pennarun	80aafaf290	Use signal.setitimer instead of signal.alarm. This gives us more precise timeouts, so that when _try_read hits a race condition, we don't suffer as badly.	2018-11-17 10:27:43 -05:00
Avery Pennarun	bb80118298	Cyclic dependency checker: don't give up token in common case. The way the code was written, we'd give up our token, detect a cyclic dependency, and then try to get our token back before exiting. Even with -j1, the temporary token release allowed any parent up the tree to continue running jobs, so it would take an arbitrary amount of time before we could exit (and report an error code to the parent). There was no visible symptom of this except that, with -j1, t/355-deps-cyclic would not finish until some of the later tests finished, which was surprising. To fix it, let's just check for a cyclic dependency first, then release the token only once we're sure things are sane.	2018-11-13 07:00:09 -05:00
Avery Pennarun	613625b580	Add more assertions about uncommitted sqlite transactions. I think we were sometimes leaving half-done sqlite transactions sitting around for a long time (eg. across sub-calls to .do files). This seemed to be okay on Linux, but caused sqlite deadlocks on MacOS. Most likely it's not the operating system, but the sqlite version and journal mode in use. In any case, the correct thing to do is to actually commit or rollback transactions, not leave them hanging around. ...unfortunately this doesn't actually fix my MacOS deadlocks, which makes me rather nervous.	2018-10-06 05:06:19 -04:00
Avery Pennarun	484ed925ad	Fix bug setting MAKEFLAGS, and support --jobserver-auth. GNU make post-4.2 renamed the --jobserver-fds option to --jobserver-auth. For compatibility with both older and newer versions, when we set MAKEFLAGS we set both, and when we read MAKEFLAGS we will accept either one. Also, when MAKEFLAGS was not already set, redo would set a MAKEFLAGS with a leading 'None' string, which was incorrect. It should be the empty string instead.	2018-10-03 19:54:54 -04:00
Avery Pennarun	e18fa85d58	The only thing in helpers.py that needed vars.py was the log stuff. So put it in its own file. Now it's safer to import and use helpers even if you can't safely touch vars.	2010-12-11 18:34:02 -08:00
Avery Pennarun	1abaf77d35	jwack: start waitfds around fd#50. That makes it a little easier to tell, in a strace, what the process is waiting on. If it's 100/101, then it's waiting on a token; 50+ means waiting on a subtask. Also, we weren't closing the read side of subtask fds on exec. This didn't cause any problems, but did result in a wasted fd in subprocesses.	2010-12-11 18:25:13 -08:00
Avery Pennarun	c4be0050f7	Release the jwack token when doing a synchronous lock wait. Although we were deadlock-free before, under some circumstances we'd end up holding a perfectly good token while in sync wait; that would reduce our parallelism for no good reason. So give back our tokens before waiting for anybody else.	2010-12-10 23:04:46 -08:00
Avery Pennarun	18b5263db7	jwack: fix a typo in the "wrong number of tokens on exit" error. Not that we ever see that error, except when I'm screwing around.	2010-12-10 05:19:49 -08:00
Avery Pennarun	49ebea445f	jwack: don't ever set the jobserver socket to O_NONBLOCK. It creates a race condition: GNU Make might try to read while the socket is O_NONBLOCK, get EAGAIN, and die; or else another redo might set it back to blocking in between our call to make it O_NONBLOCK and our call to read(). This method - setting an alarm() during the read - is hacky, but should work every time. Unfortunately you get a 1s delay - rarely - when this happens. The good news is it only happens when there are no tokens available anyhow, so it won't affect performance much in any situation I can imagine.	2010-12-10 04:57:13 -08:00
Avery Pennarun	675a5106d2	dup() the jobserver fds to 100,101 to make debugging a bit easier. Now if a process is stuck waiting on one of those fds, it'll be obvious from the strace.	2010-12-10 04:11:44 -08:00
Avery Pennarun	6e6e453908	Some speedups for doing redo-ifchange on a large number of static files. Fix some wastage revealed by the (almost useless, sigh) python profiler.	2010-12-10 00:50:53 -08:00
Avery Pennarun	dcc2edba0c	builder.py: further refactoring to run more stuff in the parent process instead of inside the fork. Still doesn't seem to affect runtime. Good. One nice side effect is jwack.py no longer needs to know anything about our locks.	2010-11-22 00:04:15 -08:00
Avery Pennarun	7aa7c41e38	builder,jwack: slight cleanup to token passing. In rare cases, one process could end up holding onto more than one token.	2010-11-21 22:46:20 -08:00
Avery Pennarun	840a8da1ef	redo-ifchange: return nonzero if one of the dependencies fails to build. Oops! We were just always returning 0 (success) in that case.	2010-11-21 07:15:48 -08:00
Avery Pennarun	116e7e5f13	We now check the lock before releasing our jwack token. That way, if everything is locked, we can determine that with a single token, reducing context switches. But mostly this is good because the code is simpler.	2010-11-19 07:20:55 -08:00
Avery Pennarun	362ca2997a	A whole bunch of cleanups to state.Lock. Now t/curse passes again when parallelized (except for the countall mismatch, since we haven't fixed the source of that problem yet). At least it's consistent now. There's a bunch of stuff rearranged in here, but the actual important problem was that we were doing unlink() on the lock fifo even if ENXIO, which meant a reader could connect in between ENXIO and unlink(), and thus never get notified of the disconnection. This would cause the build to randomly freeze.	2010-11-19 06:07:41 -08:00
Avery Pennarun	2a9a332451	jwack.py: print the full traceback if a task fails to run.	2010-11-19 00:54:36 -08:00
Avery Pennarun	94b0e7166e	Move atoi() into atoi.py and add a new debug2() debug level. atoi() was getting redundant, and unfortunately we can't easily load helpers.py in some places where we'd want to, because it depends on vars.py. So move it to its own module.	2010-11-16 04:13:17 -08:00
Avery Pennarun	93ab32d0f9	jwack: has not been actually executable since the earlier rewrite.	2010-11-16 00:28:01 -08:00
Avery Pennarun	3803da525c	Notice sooner when make has "helpfully" closed its job control file descriptors.	2010-11-13 05:05:48 -08:00
Avery Pennarun	662f53896a	_try_read: hacky fix to make GNU make less angry. We'll have to stop using nonblocking reads, unfortunately. But this seems to work better than nothing. There's still a race condition that could theoretically make GNU make angry, unfortunately, since we briefly set the socket to nonblocking.	2010-11-13 04:50:53 -08:00
Avery Pennarun	bd5daf9754	Totally disgusting support for jobservers. It needs some heavy rethinking and cleanup. But it seems to work! And it's even compatible with GNU make, apparently!	2010-11-13 04:36:44 -08:00
Avery Pennarun	7cbf39d52a	Make jwack... mostly... work with GNU make. But it seems to be pretty unsolvable in the current form; the problem is that when you're nesting one jwack inside the other and the jobserver is GNU make, there's no way to tell the parent jwack not to use up a token. Thus, if you nest too deeply, it just deadlocks. So this approach isn't really going to work the way it is.	2010-11-12 21:09:29 -08:00
Avery Pennarun	f77e4b5c91	Add jwack, a GNU make-like jobserver. Theoretically compatible with GNU make's jobserver pipes. Haven't tested that yet.	2010-11-12 20:10:21 -08:00

29 commits
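
Illustration for commit 484ed925ad (MAKEFLAGS and --jobserver-auth). The sketch below is not the code from that commit; it only shows the behaviour the message describes: accept either option spelling when reading MAKEFLAGS, write both when setting it, and treat an unset MAKEFLAGS as the empty string rather than the text 'None'. The function names are hypothetical, and only the numeric "R,W" form of the option is handled.

```python
import re

# Hypothetical helpers, written for illustration only.
# Matches either spelling of the option, followed by "=R,W".
_JOBSERVER_RE = re.compile(r"--jobserver-(?:fds|auth)=(\d+),(\d+)")


def parse_jobserver_fds(makeflags):
    """Return (read_fd, write_fd) found in a MAKEFLAGS string, or None."""
    match = _JOBSERVER_RE.search(makeflags or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def add_jobserver_flags(makeflags, read_fd, write_fd):
    """Append both spellings so older and newer GNU make can each find one."""
    base = makeflags or ""  # an unset variable becomes "", never "None"
    extra = "--jobserver-fds=%d,%d --jobserver-auth=%d,%d" % (
        read_fd, write_fd, read_fd, write_fd)
    return (base + " " + extra).strip()
```

A caller would typically pass os.environ.get("MAKEFLAGS") as the first argument; that returns None when the variable is unset, which is the case that produced the leading 'None' described in the commit message.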