Directory reorg: move code into redo/, generate binaries in bin/.

It's time to start preparing for a version of redo that doesn't work unless we build it first (because it will rely on C modules, and eventually be rewritten in C altogether). To get rolling, remove the old-style symlinks to the main programs, and rename those programs from redo-*.py to redo/cmd_*.py. We'll also move all library functions into the redo/ dir, which is a more python-style naming convention. Previously, install.do was generating wrappers for installing in /usr/bin, which extend sys.path and then import+run the right file. This made "installed" redo work quite differently from running redo inside its source tree. Instead, let's always generate the wrappers in bin/, and not make anything executable except those wrappers. Since we're generating wrappers anyway, let's actually auto-detect the right version of python for the running system; distros can't seem to agree on what to call their python2 binaries (sigh). We'll fill in the right #! shebang lines. Since we're doing that, we can stop using /usr/bin/env, which will a) make things slightly faster, and b) let us use "python -S", which tells python not to load a bunch of extra crap we're not using, thus improving startup times. Annoyingly, we now have to build redo using minimal/do, then run the tests using bin/redo. To make this less annoying, we add a toplevel ./do script that knows the right steps, and a Makefile (whee!) for people who are used to typing 'make' and 'make test' and 'make clean'.
2018-12-03 21:39:15 -05:00 · 2018-12-03 21:39:15 -05:00 · f6fe00db5c
commit f6fe00db5c
parent 5bc7c861b6
140 changed files with 256 additions and 99 deletions
--- a/Documentation/Roadmap.md
+++ b/Documentation/Roadmap.md
@ -1,329 +0,0 @@
-We believe that, unlike most programs, it's actually possible to "finish"
-redo, in the sense of eventually not needing to extend its semantics or add
-new features.  That's because redo is a pretty low-level system that just
-provides some specific features (dependency checking, parallelism, log
-linearization, inter-process locking).  It's the job of your build scripts
-to tie those features together in the way you want.
-
-`make` has its own imperative syntax, which creates a temptation to add
-new built-in functions and syntax extensions.  In more
-"declarative" build systems, there's a constant need to write new extension
-modules or features in order to create functionality that wasn't available
-declaratively.  redo avoids that by using a turing-complete language
-to run your builds.  You should be able to build anything at all with redo,
-just by writing your .do scripts the way you want.
-
-Thus, the only things that need to be added to redo (other than portability
-and bug fixes, which will likely be needed forever) are to fix gaps in
-redo's model that prevent you from getting your work done.  This document
-describes the gaps we're currently aware of.
-
-Note that all of the items in this document are still unimplemented.  In
-most cases, that's because we haven't yet settled on a final design, and
-it's still open to discussion.  The place to discuss design issues is the
-[mailing list](../Contributing/#mailing-list).
-
-
-### default.do search path, and separated build directories
-
-One of the most controversial topics in redo development, and for developers
-trying to use redo, is: where do you put all those .do scripts?
-
-redo searches hierarchically up the directory tree from a target's filename,
-hoping to find a `default*.do` file that will match, and then uses the first
-one it finds.  This method is rather elegant when it works.  But many
-developers would like to put their output files into a separate directory
-from their source files, and that output directory might not be a
-subdirectory of the main project (for example, if the main project is on a
-read-only filesystem).
-
-There are already a few ways to make this work, such as placing a single
-`default.do` "proxy" or "delegation" script at the root of the output
-directory, which will bounce requests to .do files it finds elsewhere.  One
-nice thing about this feature is it doesn't require any changes to redo
-itself; redo already knows how to call your toplevel default.do script.
-However, some people find the delegation script to be inelegant and
-complicated.
-
-Other options include searching inside a known subdirectory name (eg.
-`do/`), which could be a symlink; or adding a `.dopath` file which tells
-redo to search elsewhere.
-
-So far, we haven't settled on the best design, and discussion is welcome.
-In the meantime, you can write a delegation script (TODO: link to example)
-for your project.  Because this requires no special redo features, it's
-unlikely to break in some later version of redo, even if we add a new
-method.
-
-
-### .do files that produce directories
-
-Sometimes you want a .do file to produce multiple output files in a single
-step.  One example is an autoconf `./configure` script, which might produce
-multiple files.  Or, for example, look at the [LaTeX typesetting
-example](../cookbook/latex/) in the redo cookbook.
-
-In the purest case, generating multiple outputs from a single .do file
-execution violates the redo semantics.  The design of redo calls for
-generating *one* output file from *zero or more* input files.  And most of
-the time, that works fine. But sometimes it's not enough.
-
-Currently (like in the LaTeX example linked above) we need to resolve this
-problem by taking advantage of "side effects" in redo: creating a set of
-files that are unknown to redo, but sit alongside the "known" files in the
-filesystem.  But this has the annoying limitation that you cannot
-redo-ifchange directly on the file you want, if it was generated this way.
-For example, if `runconfig.do` generates `Makefile` and `config.h`, you
-must not `redo-ifchange config.h` directly; there is no .do file for
-`config.h`.  You must `redo-ifchange runconfig` and then *use*
-`config.h`.
-
-(There are workarounds for that workaround: for example, `runconfig.do`
-could put all its output files in a `config/` directory, and then you could
-have a `config.h.do` that does `redo-ifchange runconfig` and `cp
-config/config.h $3`.  Then other scripts can `redo-ifchange config.h`
-without knowing any more about it.  But this method gets tedious.)
-
-One suggestion for improving the situation would be to teach redo about
-"directory" targets.  For example, maybe we have a `config.dir.do` that
-runs `./configure` and produces files in a directory named `config`.  The
-`.dir.do` magic suffix tells redo that if someone asks for
-`config/config.h`, it must first try to instantiate the directory named
-`config` (using `config.dir.do`), and only then try to depend on the file
-inside that directory.
-
-There are a number of holes in this design, however.  Notably, it's not
-obvious how redo should detect when to activate the magic directory feature.
-It's easy when there is a file named `config.dir.do`, but much less obvious
-for a file like `default.dir.do` that can construct certain directory types,
-but it's not advertised which ones.
-
-This particular cure may turn out to be worse than the disease.
-
-
-### Per-target-directory .redo database
-
-An unexpectedly very useful feature of redo is the ability to "redo from
-anywhere" and get the same results:
-```shell
-	$ cd /a/b/c
-	$ redo /x/y/z/all
-```
-should have the same results as
-```shell
-	$ cd /x/y/z
-	$ redo all
-```
-
-Inside a single project, this already works.  But as redo gets used more
-widely, and in particular when you have multiple redo-using projects that
-want to refer to other redo-using projects, redo can get confused about
-where to put its `.redo` state database.  Normally, it goes into a directory
-called `$REDO_BASE`, the root directory of your project.  But if a .do
-script refers to a directory outside or beside the root, this method doesn't
-work, and redo gets the wrong file state information.
-
-Further complications arise in the case of symlinks.  For example, if you
-ask redo to build `x/y/z/file` but `y` is a symlink to `q`, then redo will
-effectively end up replacing `x/q/z/file` when it replces `x/y/z/file`,
-since they're the same.  If someone then does `redo-ifchange x/q/z/file`,
-redo may become confused about why that file has "unexpectedly" changed.
-
-The fix for both problems is simple: put one `.redo` database in every
-directory that contains target files.  The `.redo` in each directory
-contains information only about the targets in that directory.  As a result,
-`x/y/z/file` and `x/q/z/file` will share the same state database,
-`x/q/z/.redo`, and building either target will update the state database's
-understanding of the file called `file` in the same directory, and there
-will be no confusion.
-
-Similarly, one redo-using project can refer to targets in another redo-using
-project with no problem, because redo will no longer have the concept of a
-`$REDO_BASE`, so there is no way to talk about targets "outside" the
-`$REDO_BASE`.
-
-Note that there is no reason to maintain a `.redo` state database in
-*source* directories (which might be read-only), only target directories.
-This is because we store `stat(2)` information for each dependency anyway, so
-it's harmless if multiple source filenames are aliases for the same
-underlying content.
-
-
-### redo-{sources,targets,ood} should take a list of targets
-
-With the above change to a per-target-directory `.redo` database, the
-original concept of the `redo-sources`, `redo-targets`, and `redo-ood`
-commands needs to change.  Currently they're defined to list "all" the
-sources, targets, and out-of-date targets, respectively.  But when there is
-no single database reflecting the entire filesystem, the concept of "all"
-becomes fuzzy.
-
-We'll have to change these programs to refer to "all (recursive)
-dependencies of targets in the current directory" by default, or of all
-targets listed on the command line otherwise.  This is probably more useful
-than the current behaviour anyway, since in a large project, one rarely
-wants to see a complete list of all sources and targets.
-
-
-### Deprecating "stdout capture" behaviour
-
-The [original design for redo](http://cr.yp.to/redo.html) specified that a
-.do script could produce its output either by writing to stdout, or by
-writing to the file named by the `$3` variable.
-
-Experience has shown that most developers find this very confusing.  In
-particular, results are undefined if you write to *both* stdout and `$3`.
-Also, many programs (including `make`!) write their log messages to stdout
-when they should write to stderr, so many .do scripts need to start with
-`exec >&2` to avoid confusion.
-
-In retrospect, automatically capturing stdout was probably a bad idea.  .do
-scripts should intentionally redirect to `$3`.  To enforce this, we could
-have redo report an error whenever a .do script returns after writing to its
-stdout.  For backward compatibility, we could provide a command-line option
-to downgrade the error to a warning.
-
-
-### Deprecating environment variable sharing
-
-In redo, it's considered a poor practice to pass environment variables (and
-other process attributes, like namespaces) from one .do script to another.
-This is because running `redo-ifchange /path/to/file` should always run
-`file`'s .do script with exactly the same settings, whether you do it from
-the toplevel from from deep inside a tree of dependencies.  If an
-environment variable set in one .do script can change what's seen by an
-inner .do script, this breaks the dependency mechanism and makes builds less
-repeatable.
-
-To make it harder to do this by accident, redo could intentionally wipe all
-but a short whitelist of allowed environment variables before running any
-.do script.
-
-As a bonus, by never sharing any state outside the filesystem, it becomes
-much more possible to make a "distributed redo" that builds different
-targets on different physical computers.
-
-
-### redo-recheck command
-
-Normally, redo only checks any given file dependency at most once per
-session, in order to reduce the number of system calls executed, thus
-greatly speeding up incremental builds.  As a result, `redo-ifchange` of the
-same target will only execute the relevant .do script at most once per
-session.
-
-In some situations, notably integration tests, we want to force redo to
-re-check more often.  Right now there's a hacky script called
-`t/flush-cache` in the redo distribution which does this, but it relies on
-specific knowledge of the .redo directory's database format, which means it
-only works in this specific version of redo; this prevents the integration
-tests from running (and thus checking compatibility with) competing redo
-implementations.
-
-If we standardized a `redo-recheck` command, which would flush the cache for
-the targets given on the command line, and all of their dependencies, this
-sort of integration test could work across multiple redo versions.  For redo
-versions which don't bother caching, `redo-recheck` could be a null
-operation.
-
-
-### tty input
-
-Right now, redo only allows a .do file to request input from the user's
-terminal if using `--no-log` and *not* using the `-j` option.  Terminal
-input is occasionally useful for `make config` interfaces, but parallelism
-and log linearization make the console too cluttered for a UI to work.
-
-The ninja build system has a [console
-pool](https://ninja-build.org/manual.html#_the_literal_console_literal_pool)
-that can contain up to one job at a time.  When a job is in the console
-pool, it takes over the console entirely.
-
-We could probably implement something similar in redo by using POSIX job
-control features, which suspend subtasks whenever they try to read
-from the tty.  If we caught the suspension signal and acquired a lock, we
-could serialize console access.
-
-Whether the complexity of this feature is worthwhile is unclear.  Maybe it
-makes more sense just to have a './configure' script that runs outside the
-redo environment, but still can call into redo-ifchange if needed.
-
-
-### redo-lock command
-
-Because it supports parallelism via recursion, redo automatically handles
-inter-process locking so that only one instance of redo can try to build a
-given target at a time.
-
-This sort of locking turns out to be very useful, but there are a few
-situations where requiring redo to "build a target by calling a .do file" in
-order to acquire a lock becomes awkward.
-
-For example, imagine redo is being used to call into `make` to run arbitrary
-`Makefile` targets.  `default.make.do` might look like this:
-```sh
-make "$2"
-```
-
-redo will automatically prevent two copies of `redo all.make` from running
-at once.  However, if someone runs `redo all.make myprogram.make`, then two
-copies of `make` will execute at once.  This *might* be harmless, but if the
-`all` target in the `Makefile` has a dependency on `myprogram`, then we will
-actually end up implicitly building `myprogram` from two places at once:
-from the `myprogram` part of `all.make` and from `myprogram.make`.
-
-In hairy situations like that, it would be nice to serialize all access
-inside `default.make.do`, perhaps like this:
-```sh
-redo-lock make.lock make "$2"
-```
-
-This would create a redo-style lock on the (virtual) file `make.lock`, but
-then instead of trying to `redo make.lock`, it would run the given command,
-in this case `make "$2"`.
-
-It's unclear whether this feature is really a good idea.  There are other
-(convoluted) ways to achieve the same goal.  Nevertheless, it would be easy
-enough to implement.  And redo versions that don't support parallelism could
-just make redo-lock a no-op, since they guarantee serialization in all cases
-anyway.
-
-
-### Include a (minimal) POSIX shell
-
-A common problem when writing build scripts, both in `make` and in redo, is
-gratuitous incompatibility between all the available POSIX-like unix shells.
-Nowadays, most shells support [various pure POSIX sh
-features](https://apenwarr.ca/log/20110228), but there are always glitches.
-In some cases, POSIX doesn't define the expected behaviour for certain
-situations.  In others, shells like `bash` try to "improve" things by
-changing the syntax in non-POSIX ways.  Or maybe they just add new
-backward-compatible features, which you then rely on accidentally because
-you only tested your scripts with `bash`.
-
-redo on Windows using something like [MSYS](http://www.mingw.org/wiki/msys)
-is especially limited by the lack of (and oddity of) available unix tools.
-
-To avoid all these portability problems for .do script maintainers, we might
-consider bundling redo with a particular (optional) sh implementation, and
-maybe also unix-like tools, that it will use by default.  An obvious
-candidate would be busybox, which has a win32 version called
-[busybox-w32](https://frippery.org/busybox/).
-
-
-### redoconf
-
-redo is fundamentally a low-level tool that doesn't know as much about
-compiling specific programming languages as do higher-level tools like
-[cmake](https://cmake.org/).
-
-Similarly, `make` doesn't know much about specific programming languages
-(and what it does know is hopelessly out of date, but cannot be deleted or
-updated because it would break backward compatibility with old Makefiles).
-This is why `autoconf` and `automake` were created: to automatically fill in
-the language- and platform-specific blanks, while letting `make` still
-handle executing the low level instructions.
-
-It might be useful to have a redo-native autoconf/automake-like system,
-although you can already use autoconf with redo, so this might not be
-essential.