apenwarr-redo/redo/cmd_redo.py


"""redo: build the listed targets whether they need it or not."""
#
# Copyright 2010-2018 Avery Pennarun and contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import sys, os, traceback
from . import builder, env, helpers, jobserver, logs, options, state
from .atoi import atoi
from .logs import warn, err
optspec = """
redo [targets...]
--
j,jobs= maximum number of jobs to build at once
d,debug print dependency checks as they happen
v,verbose print commands as they are read from .do files (variables intact)
x,xtrace print commands as they are executed (variables expanded)
k,keep-going keep going as long as possible even if some targets fail
shuffle randomize the build order to find dependency bugs
version print the current version and exit
 redo-log options:
no-log don't capture error output, just let it flow straight to stderr
no-details only show 'redo' recursion trace (to see more later, use redo-log)
no-status don't display build summary line at the bottom of the screen
no-pretty don't pretty-print logs, show raw @@REDO output instead
no-color disable ANSI color; --color to force enable (default: auto)
debug-locks print messages about file locking (useful for debugging)
debug-pids print process ids as part of log messages (useful for debugging)
"""
def main():
    o = options.Options(optspec)
    (opt, flags, extra) = o.parse(sys.argv[1:])
    targets = extra
    if opt.version:
        from . import version
        print version.TAG
        sys.exit(0)
    if opt.debug:
        os.environ['REDO_DEBUG'] = str(opt.debug or 0)
    if opt.verbose:
        os.environ['REDO_VERBOSE'] = '1'
    if opt.xtrace:
        os.environ['REDO_XTRACE'] = '1'
    if opt.keep_going:
        os.environ['REDO_KEEP_GOING'] = '1'
    if opt.shuffle:
        os.environ['REDO_SHUFFLE'] = '1'
    if opt.debug_locks:
        os.environ['REDO_DEBUG_LOCKS'] = '1'
    if opt.debug_pids:
        os.environ['REDO_DEBUG_PIDS'] = '1'
    # These might get overridden in subprocesses in builder.py
    def _set_defint(name, val):
        os.environ[name] = os.environ.get(name, str(int(val)))
    _set_defint('REDO_LOG', opt.log)
    _set_defint('REDO_PRETTY', opt.pretty)
    _set_defint('REDO_COLOR', opt.color)
    try:
        state.init(targets)
        if env.is_toplevel and not targets:
            targets = ['all']
        j = atoi(opt.jobs)
        if env.is_toplevel and (env.v.LOG or j > 1):
            builder.close_stdin()
        if env.is_toplevel and env.v.LOG:
            builder.start_stdin_log_reader(
                status=opt.status, details=opt.details,
                pretty=env.v.PRETTY, color=env.v.COLOR,
                debug_locks=opt.debug_locks, debug_pids=opt.debug_pids)
        else:
            logs.setup(
                tty=sys.stderr, parent_logs=env.v.LOG,
                pretty=env.v.PRETTY, color=env.v.COLOR)
        if (env.is_toplevel or j > 1) and env.v.LOCKS_BROKEN:
            warn('detected broken fcntl locks; parallelism disabled.\n')
            warn(' ...details: https://github.com/Microsoft/WSL/issues/1927\n')
            if j > 1:
                j = 1
        for t in targets:
            if os.path.exists(t):
                f = state.File(name=t)
                if not f.is_generated:
                    warn(('%s: exists and not marked as generated; ' +
                          'not redoing.\n') % f.nicename())
        state.rollback()
        if j < 0 or j > 1000:
            err('invalid --jobs value: %r\n' % opt.jobs)
        jobserver.setup(j)
        try:
            assert state.is_flushed()
            retcode = builder.run(targets, lambda t: (True, True))
            assert state.is_flushed()
        finally:
            try:
                state.rollback()
            finally:
                try:
                    jobserver.force_return_tokens()
                except Exception, e:  # pylint: disable=broad-except
                    traceback.print_exc(100, sys.stderr)
                    err('unexpected error: %r\n' % e)
                    retcode = 1
        if env.is_toplevel:
            builder.await_log_reader()
        sys.exit(retcode)
    except (KeyboardInterrupt, helpers.ImmediateReturn):
        if env.is_toplevel:
            builder.await_log_reader()
        sys.exit(200)
if __name__ == '__main__':
    main()