"""redo: build the listed targets whether they need it or not."""
#
# Copyright 2010-2018 Avery Pennarun and contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import sys, os, traceback
from . import builder, env, helpers, jobserver, logs, options, state
from .atoi import atoi
from .logs import warn, err

optspec = """
redo [targets...]
--
j,jobs=       maximum number of jobs to build at once
d,debug       print dependency checks as they happen
v,verbose     print commands as they are read from .do files (variables intact)
x,xtrace      print commands as they are executed (variables expanded)
k,keep-going  keep going as long as possible even if some targets fail
shuffle       randomize the build order to find dependency bugs
version       print the current version and exit

redo-log options:
no-log        don't capture error output, just let it flow straight to stderr
no-details    only show 'redo' recursion trace (to see more later, use redo-log)
no-status     don't display build summary line at the bottom of the screen
no-pretty     don't pretty-print logs, show raw @@REDO output instead
no-color      disable ANSI color; --color to force enable (default: auto)
debug-locks   print messages about file locking (useful for debugging)
debug-pids    print process ids as part of log messages (useful for debugging)
"""


def main():
    o = options.Options(optspec)
    (opt, flags, extra) = o.parse(sys.argv[1:])

    targets = extra

    if opt.version:
        from . import version
        print version.TAG
        sys.exit(0)
    if opt.debug:
        os.environ['REDO_DEBUG'] = str(opt.debug or 0)
    if opt.verbose:
        os.environ['REDO_VERBOSE'] = '1'
    if opt.xtrace:
        os.environ['REDO_XTRACE'] = '1'
    if opt.keep_going:
        os.environ['REDO_KEEP_GOING'] = '1'
    if opt.shuffle:
        os.environ['REDO_SHUFFLE'] = '1'
    if opt.debug_locks:
        os.environ['REDO_DEBUG_LOCKS'] = '1'
    if opt.debug_pids:
        os.environ['REDO_DEBUG_PIDS'] = '1'

    # These might get overridden in subprocesses in builder.py
    def _set_defint(name, val):
        os.environ[name] = os.environ.get(name, str(int(val)))
    _set_defint('REDO_LOG', opt.log)
    _set_defint('REDO_PRETTY', opt.pretty)
    _set_defint('REDO_COLOR', opt.color)

    try:
        state.init(targets)
        if env.is_toplevel and not targets:
            targets = ['all']
        j = atoi(opt.jobs)
        if env.is_toplevel and (env.v.LOG or j > 1):
            builder.close_stdin()
        if env.is_toplevel and env.v.LOG:
            builder.start_stdin_log_reader(
                status=opt.status, details=opt.details,
                pretty=env.v.PRETTY, color=env.v.COLOR,
                debug_locks=opt.debug_locks, debug_pids=opt.debug_pids)
        else:
            logs.setup(
                tty=sys.stderr, parent_logs=env.v.LOG,
                pretty=env.v.PRETTY, color=env.v.COLOR)
        if (env.is_toplevel or j > 1) and env.v.LOCKS_BROKEN:
            warn('detected broken fcntl locks; parallelism disabled.\n')
            warn(' ...details: https://github.com/Microsoft/WSL/issues/1927\n')
            if j > 1:
                j = 1

        for t in targets:
            if os.path.exists(t):
                f = state.File(name=t)
                if not f.is_generated:
                    warn(('%s: exists and not marked as generated; ' +
                          'not redoing.\n') % f.nicename())
        state.rollback()

        if j < 0 or j > 1000:
            err('invalid --jobs value: %r\n' % opt.jobs)
        jobserver.setup(j)
        try:
            assert state.is_flushed()
            retcode = builder.run(targets, lambda t: (True, True))
            assert state.is_flushed()
        finally:
            try:
                state.rollback()
            finally:
                try:
                    jobserver.force_return_tokens()
                except Exception, e:  # pylint: disable=broad-except
                    traceback.print_exc(100, sys.stderr)
                    err('unexpected error: %r\n' % e)
                    retcode = 1
        if env.is_toplevel:
            builder.await_log_reader()
        sys.exit(retcode)
    except (KeyboardInterrupt, helpers.ImmediateReturn):
        if env.is_toplevel:
            builder.await_log_reader()
        sys.exit(200)


if __name__ == '__main__':
    main()