444 commits

Author SHA1 Message Date
Avery Pennarun
e40dc5bad2 redo-whichdo: fix a bug where the last dir was checked twice, and add tests.
When we can't find a .do file, we walk all the way back to the root
directory.  When that happens, the root directory is actually searched
twice.  This is harmless (since a .do file doesn't exist there anyway)
but causes redo-whichdo to produce the wrong output.

Also, add a test, which I forgot to do when writing whichdo in the
first place.

To make the test work from the root directory, we need a way to
initialize redo without actually creating a .redo directory.  Add a
init_no_state() function for that purpose, and split the necessary path
functions into their own module so we can avoid importing builder.py.
2018-11-02 02:20:52 -04:00
Avery Pennarun
f835becde4 redo-whichdo: updated output format.
The new format is just a list of .do files we tried, with a newline
after each one.  If we successfully found a .do file, we exit 0, else
we exit 1.

As discussed on the redo-list mailing list, it's easier to parse
without the extra cruft.  This makes users figure out $1 and $2
themselves, but that's not very hard, and maybe for the best.
2018-11-02 02:20:52 -04:00
Avery Pennarun
711b05766f Print a better message when detecting pre-existing cyclic dependencies.
We already printed an error at build time, but added the broken
dependency anyway.  If the .do script decided to succeed despite
redo-ifchange aborting, the target would be successfully created
and we'd end up with an infinite loop when running isdirty() later.

The result was still "correct", because python helpfully aborted
the infinite loop after the recursion got too deep.  But let's
explicitly detect it and print a better error message.

(Thanks to Nils Dagsson Moskopp's redo-testcases repo for exposing this
problem.  Putting a #!/bin/sh header on your .do script means you
need to run 'set -e' yourself if you want the script to abort after an
error, which you almost always do.  Those testcases don't, which
exposed this bug if you ran the tests twice.)
2018-11-02 02:20:52 -04:00
Avery Pennarun
d143cca7da class Lock: set lockfile = None before trying to open a file.
In case the open causes a delay and we got a KeyboardInterrupt, this
prevents a weird error from the destructor about an uninitialized variable.
2018-11-02 02:20:52 -04:00
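A minimal sketch of the pattern described in d143cca7da above, written for Python 3. The class below is a simplified, hypothetical stand-in and not redo's actual Lock implementation; the point is only that the attribute is assigned before anything that can raise or be interrupted.

```python
import os


class Lock:
    """Hypothetical, simplified stand-in for a lock-file wrapper."""

    def __init__(self, path):
        # Assign the attribute first, so __del__ always finds it even if
        # os.open() below raises or is interrupted by Ctrl-C.
        self.lockfile = None
        self.lockfile = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)

    def close(self):
        if self.lockfile is not None:
            os.close(self.lockfile)
            self.lockfile = None

    def __del__(self):
        # Safe even for a half-constructed object: lockfile is None then.
        self.close()
```

If the open fails, the destructor still runs on the partly built object, but it only sees None and does nothing.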
Avery Pennarun
f4f9ed97ec t/clean.do: don't forget to run s??/clean. 2018-11-02 02:20:52 -04:00
Avery Pennarun
eb004d531e shelltest: downgrade #48 to a warning.
This is the test that
   x=y f
does *not* unset x after running f, if f is a shell function.  Apparently
that's the right thing to do, but freebsd dash 0.5.10.2 fails it.  This is
surprising because debian dash 0.5.8-2.4 passes, and it is presumably older.
Downgrade this to a warning because we want freebsd to have some cheap sh
variant that passes with redo, and nobody should be *relying* on this
insane behaviour anyway.

freebsd 11.2 sh (but not freebsd dash) fails test #24.  That one seems
rather serious, so I don't want to downgrade it to a warning.
2018-10-29 07:35:56 +00:00
Avery Pennarun
52236a3aed builder.py: don't spin while fighting for a file lock.
If we end up in builder phase 2, where we might need to
build stuff that was previously locked by someone else,
we will need to obtain a job token *and* the lock at the
same time in order to continue.  To prevent deadlocks,
we don't wait synchronously for one lock while holding the
other.

If several instances are fighting over the same lock and
there are insufficient job tokens for everyone, timing
could cause them to fight for a long time.  This seems
to happen a lot in freebsd for some reason.  To be a good
citizen, sleep for a while after each loop iteration.
This should ensure that eventually, most of the fighting
instances will be asleep by the time the next one tries to
grab the token, thus breaking the deadlock.
2018-10-29 07:31:28 +00:00
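The idea in 52236a3aed above can be sketched roughly as follows. This is not the code from builder.py: threading.Lock objects stand in for the job token and the file lock, and the randomized sleep length is an assumption (the commit message only says to sleep for a while).

```python
import random
import threading
import time


def acquire_both(token, lock, max_sleep=0.01):
    """Take `token` and `lock` together, never blocking on one while
    holding the other."""
    while True:
        if token.acquire(blocking=False):
            if lock.acquire(blocking=False):
                return  # we now hold both
            # Give the token back instead of waiting while holding it.
            token.release()
        # Sleep instead of spinning; the random length (an assumption)
        # helps competing instances drift apart.
        time.sleep(random.uniform(0, max_sleep))
```

Usage: create `token = threading.Lock()` and `lock = threading.Lock()`, call `acquire_both(token, lock)`, and release both when the work is done.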
Avery Pennarun
6c81b110be Tests: use cc instead of gcc.
This fixes problems where llvm is installed without a 'gcc' alias.
2018-10-29 07:20:33 +00:00
Avery Pennarun
0fc2e46708 minimal/do: add support for -x -v -d -c options.
-x, -v, and -d are the same as redo.

-c means "continuable", which disables the feature that deletes (and
forgets) all targets at the start of each run.  This is a little risky,
since minimal/do still doesn't understand dependencies, but it allows
you to run minimal/do several times in succession, so that

    minimal/do -c a
    minimal/do -c b
is the same as
    minimal/do a b
2018-10-17 02:57:33 -04:00
Avery Pennarun
f345eae290 minimal/do: redo vs redo-ifchange, and fix empty target handling.
We previously assumed that redo and redo-ifchange are the same in
minimal/do's design, because it rebuilds all targets on every run, and
so there's no reason to ever build the same target more than once.

Unfortunately that's incorrect: if you run 'redo x' from two points in
a single run (or even twice in the same .do file), we expect x to be
built twice.  If you wanted redo to decide whether to build it the
second time, you should have used redo-ifchange.

t/102-empty/touchtest was trying to test for this.  However, a
second bug in minimal/do made the test pass anyway.  minimal/do would
*always* rebuild any target x that produced no output, not caring
whether it had tried to build before, whether you used redo or
redo-ifchange.  And while we tested that redo would redo a file that
had been deleted, we didn't ensure that it would redo a file that was
*not* deleted, nor that redo-ifchange would *not* redo that file.

Fix both bugs in minimal/do, and make t/102-empty/touchtest cover the
missing cases.
2018-10-17 01:54:29 -04:00
Avery Pennarun
887df98ead builder.py: refresh the File object after obtaining the lock.
We need to create the File object to get its f.id, then lock that id.
During that gap, another instance of redo may have modified the file or
its state data, so we have to refresh it.

This fixes 'redo -j10 t/stress'.
2018-10-13 01:37:08 -04:00
Avery Pennarun
999ba4fb13 t/stress: add a test that usually triggers a bug using 950-curse.
It looks like we're updating the stamp for t/countall while another
task is replacing the file, which suggests a race condition in our
state management database.
2018-10-12 05:48:56 -04:00
Avery Pennarun
d8811601f1 Better logging for 'manual override' detection.
The first time we notice a file has been overridden, log the old and
new stamp data, which might give a hint about how this happened.

Currently if I do
    rm t/950-curse/countall
    while :; do redo -j10 t/950-curse/all --shuffle || break; done

it will end up complaining that countall has been overridden within
just a few runs, even though it definitely hasn't been.  There seems to
be someone reading a file stamp while someone else is redoing the
file, but I haven't found it yet.
2018-10-12 05:20:35 -04:00
Avery Pennarun
aa423a723f t/660-stamp: don't run at the same time as other tests in redo -j.
flush-cache can cause files affected by redo-stamp to get rebuilt
unnecessarily, which the test is specifically trying to validate.
Since other tests run flush-cache at random times when using -j, this
would cause random test failures.
2018-10-12 05:20:27 -04:00
Avery Pennarun
728a19cd52 t/*: some cleanups so switching between redo and minimal/do works.
Because the two programs use separate state databases, it helps if we
clean up some temp files between runs.  Otherwise they might think you
created some targets "by hand" and refuse to rebuild them.
2018-10-12 05:20:27 -04:00
Avery Pennarun
637521070b t/all.do: use redo instead of redo-ifchange.
I'm not quite sure what I was thinking, using redo-ifchange there, but
the result was that some tests wouldn't run if you run 'redo test'
repeatedly, even after modifying redo itself.

Also tweaked t/950-curse so that it always runs, not just the first
time.
2018-10-12 05:19:43 -04:00
Avery Pennarun
a3077a4a4d minimal/do: fix failure with paths containing spaces.
We grew a test for these at some point, but minimal/do didn't actually
pass it.

sh syntax oddity: if you say
    x=$1 y=$2
then it works fine when $1 and $2 contain spaces.  But if you say
    export x=$1 y=$2
(or "local" instead of "export") then $1 and $2 will be split on IFS,
and it won't do what you think.  I guess this is because 'export' and
'local' are like commands, and command arguments are split if not
quoted.
2018-10-12 05:18:44 -04:00
Avery Pennarun
0d174f92c3 redo-sh: downgrade failures that affected dash; add a bash warning.
I feel a little dirty doing this, but the way the code was before, redo
almost always picked bash as the shell.  bash is way too overpowered
and this led to bashisms in do scripts unnecessarily.  The two failures
in dash are things that I would really like to have, but they haven't
materialized after 6 years, so I guess we should be realistic.

To appropriately penalize bash for asking for trouble, I added a
warning about [ 1 == 1 ] syntax being valid (as opposed to the POSIX
correct [ 1 = 1 ]).  This allows dash to be selected ahead of bash.

I also moved 'sh' to the end of the list, because although it's the
weakest shell on some systems, on other systems it's just bash.  And I
put zsh in front of bash, because fewer people have zsh and we want
them to test zsh.
2018-10-12 05:18:25 -04:00
Alan Falloon
9354e78871 Null out lock vars so that __del__ gets called
The builder was holding lock variables in the loop which means that
sometimes a state.Lock object would be created for the same file-id
twice, triggering the assertion. Assign the lock variables to None to
ensure that the state.Lock objects are destroyed before creating the
next one in the loop.
2018-10-11 23:15:37 -04:00
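A small illustration of the problem fixed in 9354e78871 above, assuming CPython's reference counting (where dropping the last reference runs __del__ immediately). IdLock and visit() are hypothetical names, not redo's state.Lock or builder loop.

```python
class IdLock:
    """Hypothetical lock that asserts each id is locked only once."""

    held = set()

    def __init__(self, fid):
        assert fid not in IdLock.held, "lock %r created twice" % fid
        IdLock.held.add(fid)
        self.fid = fid

    def __del__(self):
        IdLock.held.discard(self.fid)


def visit(fids):
    lock = None
    for fid in fids:
        # Drop the previous lock *before* creating the next one.
        # Without this line, `lock = IdLock(fid)` builds the new object
        # while the old one is still alive, and the assertion can fire
        # when the same id comes up twice in a row.
        lock = None
        lock = IdLock(fid)
```

Calling `visit([1, 1, 2])` completes without tripping the assertion; removing the `lock = None` line makes the second iteration fail.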
Alan Falloon
f4b4c400b2 Handle errors on rename of target file.
[apenwarr: this is the remaining part after part of the original was
included in someone else's separate patch.]
2018-10-11 23:12:07 -04:00
Avery Pennarun
c724523473 Docs: Add missing commands to redo.md, and add redo-whichdo.md. 2018-10-11 05:56:21 -04:00
Avery Pennarun
84fb972fb7 Documentation: Fix some markdown formatting bugs. 2018-10-11 05:56:21 -04:00
Daniele Varrazzo
adbaaf38ce Use py-setproctitle to clean up ps output in redo scripts
* Change the process title with a cleaned-up version of the script
* Document the use of setproctitle in the README
2018-10-11 03:52:20 -04:00
Avery Pennarun
2affe20fb2 redo.md: missing .do in do search path description.
[Noticed by mait on github.]
2018-10-11 03:34:25 -04:00
Daniel Benamy
4c09289fb4 Fix missing blank line that breaks README.md formatting. 2018-10-11 03:30:31 -04:00
Rob Donnelly
587ad39b03 Add installation instructions 2018-10-11 03:28:54 -04:00
Avery Pennarun
0d60e4e2ec Missing state flush after checking initial file existence.
This caused an assertion in some error conditions.
2018-10-11 03:28:05 -04:00
Avery Pennarun
93d2515bc5 Missing a couple of rules in t/clean.do files. 2018-10-11 03:28:05 -04:00
Robert L. Bocchino Jr
7dd63efb37 Add cyclic dependence detection.
If a depends on b which depends on a, redo would just freeze.  Now it
aborts with a somewhat helpful error message.

[Updated by apenwarr for coding style and to add a test.]
2018-10-11 03:28:05 -04:00
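One way to detect the kind of cycle described in 7dd63efb37 above is to remember the chain of targets currently being built and abort when a target shows up in its own chain. The sketch below works on an in-memory dict of dependencies, which is an assumption made for illustration; it is not necessarily how redo tracks this.

```python
def find_cycle(deps, target, stack=None):
    """Return the dependency cycle reachable from `target` as a list,
    or None if there is none.

    `deps` maps each target to the list of targets it depends on.
    """
    stack = stack or []
    if target in stack:
        # We came back to something we are already building.
        return stack[stack.index(target):] + [target]
    for dep in deps.get(target, []):
        cycle = find_cycle(deps, dep, stack + [target])
        if cycle:
            return cycle
    return None
```

For the "a depends on b which depends on a" case, `find_cycle({"a": ["b"], "b": ["a"]}, "a")` returns `['a', 'b', 'a']`, which is enough to print a readable error instead of freezing.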
Robert L. Bocchino Jr
63f9dcb640 Remove deprecated old-args feature. 2018-10-11 03:28:05 -04:00
Robert L. Bocchino Jr
99873775ba Revise file stamp in state.py
Removed information (ctime) that was causing spurious
"you modified it" warnings.

[apenwarr: This was not quite right.  ctime includes some things we
don't want to care about, such as link count, so we have to remove it.
But it also notices changes to st_uid, st_gid, and st_mode, which we do
care about, so now we have to include those explicitly.]
2018-10-11 03:28:05 -04:00
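A rough sketch of the stamp idea from 99873775ba above: leave ctime out, but include st_mode, st_uid and st_gid explicitly. The function name, the other fields (mtime, size, inode) and the string format are assumptions for illustration, not the actual layout used in state.py.

```python
import os


def file_stamp(path):
    """Build a change-detection stamp that ignores ctime (and therefore
    link-count changes) but still notices mode and ownership changes."""
    st = os.lstat(path)
    fields = (st.st_mtime, st.st_size, st.st_ino,
              st.st_mode, st.st_uid, st.st_gid)
    return "-".join(str(f) for f in fields)
```

With this stamp, adding a hard link to the file (which changes only the link count and ctime) leaves the stamp unchanged, while a chmod changes it.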
Robert L. Bocchino Jr
f739a0fc6e Fix mtime/ctime bug
[apenwarr's note: ctime includes extra inode attributes like link
count, which are not important for this check, but which could cause
spurious warnings.]
2018-10-11 03:28:05 -04:00
Mildred Ki'Lya
1f32d06c4e Fix t/130-mode: "ls -l" output is not always as expected
[tweaked by apenwarr to remove dependency on non-portable /usr/bin/stat]
2018-10-11 03:28:05 -04:00
Alan Falloon
67c1d4f7d8 We sometimes missed deps when more than one dep required a stamp check.
If must_build was nonempty when recursively calling isdirty() that
returned a list, we'd lose the original value of must_build.
2018-10-11 03:28:05 -04:00
Travis Cross
cb713bdace Restore SIGPIPE default action before exec(3)
Python chooses to ignore SIGPIPE, however most unix processes expect
to terminate on the signal.  Therefore failing to restore the default
action results in surprising behavior.  For example, we expect
`dd if=/dev/zero | head -c1` to return immediately.  However, prior to
this commit, that pipeline would hang forever.  Insidious forms of
data corruption or loss were also possible.

See:

  http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/2009-07-02-python-sigpipe.html
  http://blog.nelhage.com/2010/02/a-very-subtle-bug/
2018-10-11 03:28:05 -04:00
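The fix in cb713bdace above amounts to resetting SIGPIPE in the child between fork and exec. A minimal POSIX-only sketch using the standard subprocess module follows; run_script() is a hypothetical helper, not redo's own launcher. Note that Python 3's subprocess already does this by default (restore_signals=True), so the explicit hook mostly matters for older Pythons or hand-rolled fork/exec code.

```python
import signal
import subprocess


def restore_sigpipe():
    # Runs in the child after fork() and before exec(): undo Python's
    # "ignore SIGPIPE" so the program gets the usual unix behaviour.
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)


def run_script(argv):
    """Run argv with the default SIGPIPE action and return its exit code."""
    return subprocess.call(argv, preexec_fn=restore_sigpipe)
```

A child started this way is terminated by SIGPIPE when it writes to a closed pipe, so a pipeline like the dd example finishes instead of hanging.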
Tommi Virtanen
c2c013970e Avoid bashism >&file
The >& form is only for file descriptors, passing a file name there is
a bash extension.

    $ /bin/dash -c 'echo foo >&/dev/null'
    /bin/dash: 1: Syntax error: Bad fd number
2018-10-11 03:28:05 -04:00
Avery Pennarun
5156feae9d Switch sqlite3 journal mode to WAL.
WAL mode does make the deadlocking on MacOS go away.  This suggests
that pysqlite3 was leaving *read* transactions open for a long time; in
old-style sqlite journals, this prevents anyone from obtaining a write
lock, although it doesn't prevent other concurrent reads.

With WAL journals, writes can happen even while readers are holding a
lock, but the journal doesn't flush until the readers have released it.
This is not a "real" fix but it's fairly harmless, since all redo
instances will exit eventually, and when they do, the WAL journal can
be flushed.
2018-10-06 05:06:42 -04:00
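Switching journal modes as in 5156feae9d above takes a single pragma in sqlite. A minimal sketch with Python's standard sqlite3 module; open_state_db() is a hypothetical helper name and not a function from redo.

```python
import sqlite3


def open_state_db(path):
    """Open (or create) a database file and switch it to WAL journaling.

    Returns the connection and the journal mode sqlite reports back.
    """
    db = sqlite3.connect(path)
    # The pragma returns the mode actually in effect as a one-row result.
    mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return db, mode
```

The setting is stored in the database file, so later connections see WAL mode as well. In-memory databases cannot use WAL and report a different mode.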
Avery Pennarun
613625b580 Add more assertions about uncommitted sqlite transactions.
I think we were sometimes leaving half-done sqlite transactions sitting
around for a long time (eg. across sub-calls to .do files).  This
seemed to be okay on Linux, but caused sqlite deadlocks on MacOS.  Most
likely it's not the operating system, but the sqlite version and
journal mode in use.

In any case, the correct thing to do is to actually commit or rollback
transactions, not leave them hanging around.

...unfortunately this doesn't actually fix my MacOS deadlocks, which
makes me rather nervous.
2018-10-06 05:06:19 -04:00
Avery Pennarun
74f968d6ca Correctly report error when target dir does not exist.
If ./default.do knows how to build x/y/z, then we will run
	./default.do x/y/z x/y/z x__y__z.redo2.tmp
which can correctly generate $3, but then we can fail to rename it to
x/y/z because x/y doesn't exist.  This would previously throw an
exception.  Now it prints a helpful error message.

default.do may create x/y, in which case renaming will succeed.
2018-10-06 02:38:32 -04:00
Avery Pennarun
34669fba65 Use os.lstat() instead of os.stat().
I think this aligns better with how redo works.  Otherwise, if a.do
creates a as a symlink, then changes to the symlink's *target* will
change a's stat/stamp information without re-running a.do, which looks
to redo like you modified a by hand, which causes it to stop running
a.do altogether.

With this change, modifications to a's target are okay, but they don't
trigger any redo dependency changes.  If you want that, then a.do
should redo-ifchange on its symlink target explicitly.
2018-10-06 00:14:02 -04:00
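The difference that 34669fba65 above relies on can be seen directly: os.lstat() describes the symlink itself, while os.stat() follows it. stat_pair() below is a hypothetical helper for illustration only.

```python
import os


def stat_pair(path):
    """Return (size of the link itself, size of what it points to).

    For a regular file both values are the same; for a symlink the
    first one does not change when the link's target is modified.
    """
    return os.lstat(path).st_size, os.stat(path).st_size
```

If a is a symlink and its target is rewritten, only the second value changes. That is why a stamp based on os.lstat() no longer mistakes changes to the target for a manual edit of a.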
Avery Pennarun
61d35d3972 redo-whichdo: a command that explains the .do search path for a target.
For example:

$ redo-whichdo a/b/c.x.y

- a/b/c.x.y.do
- a/b/default.x.y.do
- a/b/default.y.do
- a/b/default.do
- a/default.x.y.do
- a/default.y.do
- a/default.do
- default.x.y.do
- default.y.do
+ default.do
1 a/b/c.x.y
2 a/b/c.x.y

Lines starting with '-' mean a potential .do file that did not exist,
so we moved onto the next choice (but consider using redo-ifcreate in
case it gets created).  '+' means the .do file we actually chose.  '1'
and '2' are the $1 and $2 to pass along to the given .do file if you want to
call it for the given target.

(The output format is a little weird to make sure it's parseable with
sh 'read x y' calls, even when filenames contain spaces or special
characters.)
2018-10-04 20:20:53 -04:00
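The search order shown in 61d35d3972 above can be reproduced with a short generator. This is a sketch derived only from the example listing, not redo's actual implementation; do_candidates() is a hypothetical name, and the '-'/'+' markers and the trailing '1'/'2' lines are left out.

```python
import os


def do_candidates(target):
    """Yield the .do files to try for `target`, most specific first."""
    dirname, base = os.path.split(target)
    # 1. The exact match, next to the target.
    yield os.path.join(dirname, base + ".do")
    # 2. default.x.y.do, default.y.do, default.do for a base of c.x.y.
    parts = base.split(".")
    defaults = [
        "default" + "".join("." + p for p in parts[i:]) + ".do"
        for i in range(1, len(parts) + 1)
    ]
    # 3. Try the defaults in each directory, walking up toward the top.
    while True:
        for name in defaults:
            yield os.path.join(dirname, name)
        parent = os.path.dirname(dirname)
        if not dirname or parent == dirname:
            break
        dirname = parent
```

For the target a/b/c.x.y this yields the ten paths from the listing in the same order, starting with a/b/c.x.y.do and ending with default.do.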
Avery Pennarun
484ed925ad Fix bug setting MAKEFLAGS, and support --jobserver-auth
GNU make post-4.2 renamed the --jobserver-fds option to
--jobserver-auth.  For compatibility with both older and newer
versions, when we set MAKEFLAGS we set both, and when we read MAKEFLAGS
we will accept either one.

Also, when MAKEFLAGS was not already set, redo would set a MAKEFLAGS with a
leading 'None' string, which was incorrect.  It should be the empty
string instead.
2018-10-03 19:54:54 -04:00
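Both halves of 484ed925ad above (accept either option name when reading, emit both when writing, and never leak a literal 'None') can be sketched as below. The function names and the exact ordering of the flags are assumptions; only the two option names come from the commit message.

```python
import re


def find_jobserver_fds(makeflags):
    """Return (read_fd, write_fd) from a MAKEFLAGS string, or None.

    Accepts both the older --jobserver-fds and newer --jobserver-auth.
    """
    m = re.search(r"--jobserver-(?:fds|auth)=(\d+),(\d+)", makeflags or "")
    return (int(m.group(1)), int(m.group(2))) if m else None


def add_jobserver_flags(makeflags, rfd, wfd):
    """Append both spellings of the jobserver option to MAKEFLAGS.

    `makeflags` may be None (variable unset); that is treated as an
    empty string rather than being formatted as the text 'None'.
    """
    parts = [makeflags] if makeflags else []
    parts.append("--jobserver-auth=%d,%d" % (rfd, wfd))
    parts.append("--jobserver-fds=%d,%d" % (rfd, wfd))
    return " ".join(parts)
```

Typical use would be `os.environ["MAKEFLAGS"] = add_jobserver_flags(os.environ.get("MAKEFLAGS"), rfd, wfd)` before starting a child process.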
Avery Pennarun
cb05a7bd98 Nowadays there is a "non-recursive make considered harmful" paper.
Of course there is!  Let's complete the circle by linking to it,
because it links to this project (among many others).
2018-09-18 13:23:57 -04:00
Avery Pennarun
33dadbfe07 minimal/do: some shells return error in "read x <file" for empty files.
...or files that contain bytes but not a trailing newline.  It's okay if we
don't get any data, but we definitely have to *not* let "set -e" abort us.
Now that we fixed set -e in the previous patch, it revealed this problem.
2012-02-09 00:42:41 -05:00
Avery Pennarun
c28181e26f minimal/do: fix a really scary bug in "set -e" behaviour.
If you run something like

  blah_function || return 1

then everything even *inside* blah_function is *not* subject to the "set -e"
that would otherwise be in effect.  That's true even for ". subfile" inside
blah_function - which is exactly how minimal/do runs .do files.

Instead, rewrite it as

  blah_function
  [ "$?" = "0" ] || return 1

And add a bit to the unit tests to ensure that "set -e" behaviour is enabled
in .do files as we expect, and crash loudly otherwise.

(This weird behaviour may only happen in some shells and not others.)

Also, we had a "helpful" alias of redo() defined at the bottom of the file.
Combined with the way we use '.' to source the .do files, this would make it
not start a new shell just to run a recursive 'redo' command.  It almost
works, but this stupid "set -e" bug could cause a nested .do file to not
honour "set -e" if someone ran "redo foo || exit 1" from inside a .do
script.  The performance optimization is clearly not worth it here, so
rename it to _redo(); that causes it to actually re-exec the redo program
(which is a symlink to minimal/do).
2012-02-09 00:42:41 -05:00
Avery Pennarun
ede182cb84 Add a test for install.do. 2012-02-09 00:42:41 -05:00
Avery Pennarun
a9ebabd6b7 Add a test for --keep-going option. 2012-02-09 00:42:41 -05:00
Avery Pennarun
9f6447e2cb Add a test for --shuffle option. 2012-02-09 00:42:41 -05:00
Avery Pennarun
34ce233f5e Rename 111-compile to 111-compile2.
Not a great name, but make it obvious that this is a slightly different test
from 110-compile.
2012-02-09 00:42:40 -05:00
Avery Pennarun
1f304f4d1d t/100-args: add a test for --old-args feature. 2012-02-09 00:42:40 -05:00