On MacOS (at least 10.11.6), /bin/pwd defaults to using $PWD (i.e. pwd
-L). On most other OSes it defaults to *not* using $PWD (i.e. pwd -P).
We need the latter behaviour. It appears that 'pwd -P' has been
specified by POSIX for quite a few years now, so let's rely on it.
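To illustrate the difference (the paths here are made up):

    mkdir -p /tmp/real && ln -s /tmp/real /tmp/alias
    cd /tmp/alias
    pwd -L    # prints /tmp/alias: the logical path, taken from $PWD
    pwd -P    # prints /tmp/real: the physical path, which is what we need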
shelltest.od will now also check for it, though if your 'sh' doesn't
support this feature, it'll be too late, because shelltest needs
minimal/do in order to run.
This is unfixable when running with -j > 1 because of how the current
t/flush-cache script works. We'll only be able to fix that after
making a more granular flush-cache tool, which is already on my todo
list.
This new test validates that you can pass -j1 and -j2 in a sub-redo to
create a sub-jobserver with exactly the number of jobs you specified.
Now that we have that feature, we can also test for the bug fixed two
commits ago where, with -j1, targets would be built in an unexpected
order.
Previously, if you passed a -j option to a redo process in a redo or
make process hierarchy with MAKEFLAGS already set, it would ignore the
-j option and continue using the jobserver provided by the parent.
With this change, we instead initialize a new jobserver with the
desired number of tokens, which is what GNU make does in the same
situation. A typical use case for this is to force serialization of
build steps in a subtree (by using -j1). In make, this is often useful
for "fixing" makefiles that haven't been written correctly for parallel
builds. In redo, that happens much less often, but it's useful at
least in unit tests.
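For example, a .do file that needs its sub-targets built strictly one
at a time might do something like this (the target names are made up):

    # somewhere in a .do file: these steps must not run in parallel
    redo -j1 step1 step2 step3   # creates a private 1-token jobserver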
Passing -j1 is relatively harmless (the redo you are starting inherits
a token anyway, so it doesn't create any new tokens). Passing -j > 1
is more risky, because it creates new tokens, thus increasing the level
of parallelism in the system. Because this may not be what you wanted,
we print a warning when you pass -j > 1 to a sub-redo. GNU make gives
a similar warning in this situation.
After waiting for children to exit, we would release our own token, and
then the caller would immediately try to obtain a token again. This
accounted for tokens correctly, but would pass tokens around the call
tree in unexpected ways.
For example, imagine we had only one token. We call 'redo a1 a2', and
a1 calls 'redo b1 b2', and b1 calls 'redo c1'. When c1 finishes, it
releases its token, then tries to re-acquire it before returning. That
pulls 'redo b1 b2' and 'redo a1 a2' into the race for the token too,
which means b1 might get suspended while *either* a2 or b2 starts
running.
This never caused a deadlock, even if a2 or b2 depends on b1, because
if they tried to build b1, they would notice it is locked, give up
their token, and wait for the lock. c1 (and then b1) could then obtain
the token and immediately terminate, allowing progress to continue.
But this is not really the way we expect things to happen. "Obviously"
what we want here is a straightforward stack unwinding: c1 should finish,
then b1, then b2, then a1, then a2.
The not-very-obvious symptom of this bug is that redo's unit tests
seemed to run in the wrong order when using -j1 --no-log. (--log would
hide the problem by rearranging logs back into the right order!)
all.do's main job was to print a "nothing much to do" message after
running. Nowadays it actually does do stuff, so we can remove the
warning, making _all.do redundant.
posh will abort the entire script if it detects a syntax error. I
don't know if that's good or not, but you shouldn't be writing scripts
with syntax errors, so that behaviour by itself isn't a good reason to
reject posh.
It still fails some actual tests, but at least now we don't consider it
a 'crash' outcome.
If a file was overridden and then overridden again, we would rebuild
only the first thing that depends on it, but not any subsequent things,
which is a pretty serious bug.
It turned out that t/350-deps-forget is already supposed to test this,
but I had cleverly encoded the wrong behaviour into the expected
results in the table-driven test. I blame lack of sleep. Anyway, I
fixed the test, which made it fail, and then fixed the code, which made
it pass.
Both redo and minimal/do were doing slightly weird things with
symlinked directories, especially when combined with "..". For
example, if x is a link to ., then x/x/x/x/../y should resolve to
"../y", which is quite non-obvious.
Added some tests to make sure this stays fixed.
Based on the earlier t/000-set-minus-e bug in minimal/do on some
shells, let's add some extra tests that reveal the weirdness on those
shells. Unfortunately because they are so popular (including bash and
zsh), we can't reject them outright for failing this one.
While we're here, add a new return code, "skip", which notes that a
test has failed but is not important enough to be considered a warning
or failure. Previously we just had these commented out, which is not
quite obvious enough.
...and I updated a few comments while reviewing some of the older
tests.
We were setting a global variable FAIL on failure, but if we failed
inside a subshell (which a very small number of tests might do), this
setting would be lost. The script output (a series of failed/warning
lines) was still valid, but not the return code, so the shell might be
selected even if one of these tests failed.
To avoid the problem, put the fail/warning state in the filesystem
instead, which is shared across subshells.
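Roughly, the change looks like this (the variable and file names here
are illustrative, not the exact ones used by shelltest):

    # before: the assignment happens inside a subshell, so it's lost
    ( run_one_test || FAIL=1 )

    # after: a marker file survives the subshell boundary
    ( run_one_test || touch "$tmpdir/failed" )
    # ...and at the very end of the script:
    [ -e "$tmpdir/failed" ] && exit 1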
This can happen when $PWD contains a symlink somewhere in the path. In
that case, "cd ..; cat x" could mean something different from "cat ../x".
Notably, this error occurs when running "./do test" if your build
directory is reached through a symlink. For example, on freebsd your
home directory is /home/$USER, but /home is a symlink to /usr/home, which
triggers this problem.
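A minimal illustration of the mismatch (paths are hypothetical):

    mkdir -p /tmp/real/proj /tmp/home
    ln -s /tmp/real/proj /tmp/home/proj
    cd /tmp/home/proj    # $PWD is /tmp/home/proj; physically /tmp/real/proj
    cat ../x             # the kernel resolves "..": reads /tmp/real/x
    (cd .. && cat x)     # the shell's "cd .." is logical: reads /tmp/home/x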
Not adding tests in this commit, because when I added some tests, I
found even more symlink-related bugs, but those ones are much more
unlikely to occur. The additional fixes+tests are in a later commit.
Running commands in "||" context (like "x || return") disables "set -e"
behaviour in that context, even several levels deep in the call
hierarchy. The exact behaviour varies between shells, but this caused
a test failure with at least zsh 5.3.1 on debian.
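A sketch of the suppression on a shell that follows the POSIX rule (the
function name is made up):

    set -e
    f() {
        false             # 'set -e' would normally stop the script here...
        echo "still ran"  # ...but not when f was invoked in a '||' context
    }
    f || echo "f failed"  # prints "still ran"; f returns 0, so no fallback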
Although I expect this is rather rare, some people may want to build in
a read-write subdir of a read-only tree. Other than some confusing
error reporting, this works fine in redo after the recent changes to
temp file handling, but let's add a test to make sure it stays that
way. The test found a bug in minimal/do, so let's fix that.
Reported-by: Jeff Stearns <jeff.stearns@gmail.com>
Previously, we'd try to put the stdout temp file in the same dir as the
target, if that dir exists. Otherwise we'd walk up the directory tree
looking for a good place. But this would go wrong if the directory we
chose got *deleted* during the run of the .do file.
Instead, we switch to an entirely new design: we use mkstemp() to
generate a temp file in the standard temp file location (probably
/tmp), then open it and immediately delete it, so the .do file can't
cause any unexpected behaviour. After the .do file exits, we use our
still-open fd to the stdout file to read the content back out.
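The same idea, expressed as a plain-shell sketch (redo does this from
python using mkstemp(); the command name below is just a placeholder):

    tmp=$(mktemp)            # created in the standard temp dir, e.g. /tmp
    exec 3>"$tmp" 4<"$tmp"   # one fd for writing, one for reading back
    rm -f "$tmp"             # the name is gone, but both fds stay valid
    run_the_do_file >&3      # deleting/re-creating directories can't
                             # take the captured output away from us
    cat <&4                  # afterwards, read the stdout content back out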
In the old implementation, we also put the $3 in the "adjusted"
location that depended on whether the target dir already existed, just for
consistency. But that was never necessary: we didn't create the $3
file, and if the .do script wants to write to $3, it should create the
target dir first anyway. So change it to *always* use a $3 temp
filename in the target dir, which is much simpler and so has fewer edge
cases.
Add t/202-del/deltest4 with some tests for all these edge cases.
Reported-by: Jeff Stearns <jeff.stearns@gmail.com>
This can happen if we create the .tmp file in the same directory as the
target, and the .do file first does "rm -rf" on that directory, then
re-creates it. The result is that the stdout file is lost.
We'll make this a warning if the .do script *didn't* write to stdout
(so the loss is harmless, just weird), and an error if it *did* write
to stdout, which we can detect because we still have an open fd on the
file, so we can fstat() it.
Basically all just missing quotes around shell strings that use $PWD.
Since redo uses relative paths, most paths inside a project are fine;
problems only show up when project-internal directories or filenames
have spaces in them.
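In other words, the fix is mostly just this pattern (the path is only
an example):

    cd $PWD/t      # breaks if $PWD contains a space: the word gets split
    cd "$PWD/t"    # quoted, so spaces in the path are harmless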
Reported-by: Jeff Stearns <jeff.stearns@gmail.com>
Now that the python scripts are all in a "redo" python module, we can
use the "new style" (ahem) package-relative imports. This appeases
pylint, plus avoids confusion in case more than one package has
similarly-named modules.
This removes another instance of magical code running at module import
time. And the process title wasn't really part of the state database
anyway.
Unfortunately this uncovered a bug: the recent change to use
'python -S' means it can no longer find the setproctitle module even if
it's installed.
My goodness, I hate the horrible python easy_install module gunk that
makes startup linearly slower the more modules you have installed,
whether you import them or not, if you don't use -S. But oh well,
we're stuck with it for now.
Merge the two files into env, and make each command explicitly call the
function that sets it up in the way that's needed for that command.
This means we can finally just import all the modules at the top of
each file, without worrying about import order. Phew.
While we're here, remove the weird auto-appending-'all'-to-targets
feature in env.init(). Instead, do it explicitly, and only from redo and
redo-ifchange, only if is_toplevel and no other targets are given.
Previously, we'd try to build all the critical stuff first, and then
run the tests. Nowadays, it takes a little longer to build the docs
(especially some of the docs/cookbook/ stuff), and this isn't needed to
run the tests, so let's allow them to parallelize.
This avoids a name overlap with the system-installed copy of python.
Since redo adds the redo/ dir to the $PATH before running .do files,
python.do might see its own previously-created target instead of the
"real" python when testing, and create an infinite loop by accident.
They really aren't locks at all, they're a cycle detector. Also rename
REDO_LOCKS to a more meaningful REDO_CYCLES. And we'll move the
CyclicDependencyError exception in here as well, instead of leaving it
in state.py, where it doesn't really belong.
It's time to start preparing for a version of redo that doesn't work
unless we build it first (because it will rely on C modules, and
eventually be rewritten in C altogether).
To get rolling, remove the old-style symlinks to the main programs, and
rename those programs from redo-*.py to redo/cmd_*.py. We'll also move
all library functions into the redo/ dir, which is a more python-style
naming convention.
Previously, install.do was generating wrappers for installing in
/usr/bin, which extend sys.path and then import+run the right file.
This made "installed" redo work quite differently from running redo
inside its source tree. Instead, let's always generate the wrappers in
bin/, and not make anything executable except those wrappers.
Since we're generating wrappers anyway, let's actually auto-detect the
right version of python for the running system; distros can't seem to
agree on what to call their python2 binaries (sigh). We'll fill in the
right #! shebang lines. Since we're doing that, we can stop using
/usr/bin/env, which will a) make things slightly faster, and b) let us
use "python -S", which tells python not to load a bunch of extra crap
we're not using, thus improving startup times.
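Roughly the idea (the exact detection logic and file names in the .do
scripts may differ):

    # find whatever this distro calls its python2 interpreter
    for py in python2.7 python2 python; do
        python=$(command -v "$py") && break
    done
    # then start each generated wrapper with a fixed shebang line
    printf '#!%s -S\n' "$python" >bin/redo
    # ...rest of the wrapper body goes here...
    chmod +x bin/redo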
Annoyingly, we now have to build redo using minimal/do, then run the
tests using bin/redo. To make this less annoying, we add a toplevel
./do script that knows the right steps, and a Makefile (whee!) for
people who are used to typing 'make' and 'make test' and 'make clean'.
Because our experiment involves creating a file called redo/sh, and the
redo directory (or some other version of redo's directory) might be on
the path, just testing 'sh' is actually wrong: we might end up with
some earlier version of redo-sh.
Instead, specifically for sh, always demand that we test /bin/sh.
This seems to only affect old zsh on MacOS. But we want to catch it
anyway, because it caused t/351-deps-forget to fail in a weird way on
that version of zsh.
Shells really suck.
Parallelism and redo-log cause lots of confusion for any rules that try
to ask the user questions, so disable stdin altogether.
Arguably, we should just disable stdin all the time, but maybe it's
still occasionally useful (even though you have to pass --no-log to get
it back).
I attempted to fix this in commit c06d1fba40, but it was apparently
incomplete and not all cases were covered by tests.
Let's add a much more thorough test by going through every possible
combination and making sure redo-{sources,targets,ood} all work as
expected, that the "you modified it" warning does or does not show up
when expected, and that dependencies are rebuilt the number of times we
expect.
It seems like we're using these differently than most readthedocs.org
users. Remove the borders and padding so they work better inline, and
prevent confusing word wraps.