Directory reorg: move code into redo/, generate binaries in bin/.
It's time to start preparing for a version of redo that doesn't work unless we build it first (because it will rely on C modules, and eventually be rewritten in C altogether). To get rolling, remove the old-style symlinks to the main programs, and rename those programs from redo-*.py to redo/cmd_*.py. We'll also move all library functions into the redo/ dir, which is a more python-style naming convention. Previously, install.do was generating wrappers for installing in /usr/bin, which extend sys.path and then import+run the right file. This made "installed" redo work quite differently from running redo inside its source tree. Instead, let's always generate the wrappers in bin/, and not make anything executable except those wrappers. Since we're generating wrappers anyway, let's actually auto-detect the right version of python for the running system; distros can't seem to agree on what to call their python2 binaries (sigh). We'll fill in the right #! shebang lines. Since we're doing that, we can stop using /usr/bin/env, which will a) make things slightly faster, and b) let us use "python -S", which tells python not to load a bunch of extra crap we're not using, thus improving startup times. Annoyingly, we now have to build redo using minimal/do, then run the tests using bin/redo. To make this less annoying, we add a toplevel ./do script that knows the right steps, and a Makefile (whee!) for people who are used to typing 'make' and 'make test' and 'make clean'.
This commit is contained in:
parent
5bc7c861b6
commit
f6fe00db5c
140 changed files with 256 additions and 99 deletions
|
|
@ -1,329 +0,0 @@
|
|||
We believe that, unlike most programs, it's actually possible to "finish"
|
||||
redo, in the sense of eventually not needing to extend its semantics or add
|
||||
new features. That's because redo is a pretty low-level system that just
|
||||
provides some specific features (dependency checking, parallelism, log
|
||||
linearization, inter-process locking). It's the job of your build scripts
|
||||
to tie those features together in the way you want.
|
||||
|
||||
`make` has its own imperative syntax, which creates a temptation to add
|
||||
new built-in functions and syntax extensions. In more
|
||||
"declarative" build systems, there's a constant need to write new extension
|
||||
modules or features in order to create functionality that wasn't available
|
||||
declaratively. redo avoids that by using a turing-complete language
|
||||
to run your builds. You should be able to build anything at all with redo,
|
||||
just by writing your .do scripts the way you want.
|
||||
|
||||
Thus, the only things that need to be added to redo (other than portability
|
||||
and bug fixes, which will likely be needed forever) are to fix gaps in
|
||||
redo's model that prevent you from getting your work done. This document
|
||||
describes the gaps we're currently aware of.
|
||||
|
||||
Note that all of the items in this document are still unimplemented. In
|
||||
most cases, that's because we haven't yet settled on a final design, and
|
||||
it's still open to discussion. The place to discuss design issues is the
|
||||
[mailing list](../Contributing/#mailing-list).
|
||||
|
||||
|
||||
### default.do search path, and separated build directories
|
||||
|
||||
One of the most controversial topics in redo development, and for developers
|
||||
trying to use redo, is: where do you put all those .do scripts?
|
||||
|
||||
redo searches hierarchically up the directory tree from a target's filename,
|
||||
hoping to find a `default*.do` file that will match, and then uses the first
|
||||
one it finds. This method is rather elegant when it works. But many
|
||||
developers would like to put their output files into a separate directory
|
||||
from their source files, and that output directory might not be a
|
||||
subdirectory of the main project (for example, if the main project is on a
|
||||
read-only filesystem).
|
||||
|
||||
There are already a few ways to make this work, such as placing a single
|
||||
`default.do` "proxy" or "delegation" script at the root of the output
|
||||
directory, which will bounce requests to .do files it finds elsewhere. One
|
||||
nice thing about this feature is it doesn't require any changes to redo
|
||||
itself; redo already knows how to call your toplevel default.do script.
|
||||
However, some people find the delegation script to be inelegant and
|
||||
complicated.
|
||||
|
||||
Other options include searching inside a known subdirectory name (eg.
|
||||
`do/`), which could be a symlink; or adding a `.dopath` file which tells
|
||||
redo to search elsewhere.
|
||||
|
||||
So far, we haven't settled on the best design, and discussion is welcome.
|
||||
In the meantime, you can write a delegation script (TODO: link to example)
|
||||
for your project. Because this requires no special redo features, it's
|
||||
unlikely to break in some later version of redo, even if we add a new
|
||||
method.
|
||||
|
||||
|
||||
### .do files that produce directories
|
||||
|
||||
Sometimes you want a .do file to produce multiple output files in a single
|
||||
step. One example is an autoconf `./configure` script, which might produce
|
||||
multiple files. Or, for example, look at the [LaTeX typesetting
|
||||
example](../cookbook/latex/) in the redo cookbook.
|
||||
|
||||
In the purest case, generating multiple outputs from a single .do file
|
||||
execution violates the redo semantics. The design of redo calls for
|
||||
generating *one* output file from *zero or more* input files. And most of
|
||||
the time, that works fine. But sometimes it's not enough.
|
||||
|
||||
Currently (like in the LaTeX example linked above) we need to resolve this
|
||||
problem by taking advantage of "side effects" in redo: creating a set of
|
||||
files that are unknown to redo, but sit alongside the "known" files in the
|
||||
filesystem. But this has the annoying limitation that you cannot
|
||||
redo-ifchange directly on the file you want, if it was generated this way.
|
||||
For example, if `runconfig.do` generates `Makefile` and `config.h`, you
|
||||
must not `redo-ifchange config.h` directly; there is no .do file for
|
||||
`config.h`. You must `redo-ifchange runconfig` and then *use*
|
||||
`config.h`.
|
||||
|
||||
(There are workarounds for that workaround: for example, `runconfig.do`
|
||||
could put all its output files in a `config/` directory, and then you could
|
||||
have a `config.h.do` that does `redo-ifchange runconfig` and `cp
|
||||
config/config.h $3`. Then other scripts can `redo-ifchange config.h`
|
||||
without knowing any more about it. But this method gets tedious.)
|
||||
|
||||
One suggestion for improving the situation would be to teach redo about
|
||||
"directory" targets. For example, maybe we have a `config.dir.do` that
|
||||
runs `./configure` and produces files in a directory named `config`. The
|
||||
`.dir.do` magic suffix tells redo that if someone asks for
|
||||
`config/config.h`, it must first try to instantiate the directory named
|
||||
`config` (using `config.dir.do`), and only then try to depend on the file
|
||||
inside that directory.
|
||||
|
||||
There are a number of holes in this design, however. Notably, it's not
|
||||
obvious how redo should detect when to activate the magic directory feature.
|
||||
It's easy when there is a file named `config.dir.do`, but much less obvious
|
||||
for a file like `default.dir.do` that can construct certain directory types,
|
||||
but it's not advertised which ones.
|
||||
|
||||
This particular cure may turn out to be worse than the disease.
|
||||
|
||||
|
||||
### Per-target-directory .redo database
|
||||
|
||||
An unexpectedly very useful feature of redo is the ability to "redo from
|
||||
anywhere" and get the same results:
|
||||
```shell
|
||||
$ cd /a/b/c
|
||||
$ redo /x/y/z/all
|
||||
```
|
||||
should have the same results as
|
||||
```shell
|
||||
$ cd /x/y/z
|
||||
$ redo all
|
||||
```
|
||||
|
||||
Inside a single project, this already works. But as redo gets used more
|
||||
widely, and in particular when you have multiple redo-using projects that
|
||||
want to refer to other redo-using projects, redo can get confused about
|
||||
where to put its `.redo` state database. Normally, it goes into a directory
|
||||
called `$REDO_BASE`, the root directory of your project. But if a .do
|
||||
script refers to a directory outside or beside the root, this method doesn't
|
||||
work, and redo gets the wrong file state information.
|
||||
|
||||
Further complications arise in the case of symlinks. For example, if you
|
||||
ask redo to build `x/y/z/file` but `y` is a symlink to `q`, then redo will
|
||||
effectively end up replacing `x/q/z/file` when it replces `x/y/z/file`,
|
||||
since they're the same. If someone then does `redo-ifchange x/q/z/file`,
|
||||
redo may become confused about why that file has "unexpectedly" changed.
|
||||
|
||||
The fix for both problems is simple: put one `.redo` database in every
|
||||
directory that contains target files. The `.redo` in each directory
|
||||
contains information only about the targets in that directory. As a result,
|
||||
`x/y/z/file` and `x/q/z/file` will share the same state database,
|
||||
`x/q/z/.redo`, and building either target will update the state database's
|
||||
understanding of the file called `file` in the same directory, and there
|
||||
will be no confusion.
|
||||
|
||||
Similarly, one redo-using project can refer to targets in another redo-using
|
||||
project with no problem, because redo will no longer have the concept of a
|
||||
`$REDO_BASE`, so there is no way to talk about targets "outside" the
|
||||
`$REDO_BASE`.
|
||||
|
||||
Note that there is no reason to maintain a `.redo` state database in
|
||||
*source* directories (which might be read-only), only target directories.
|
||||
This is because we store `stat(2)` information for each dependency anyway, so
|
||||
it's harmless if multiple source filenames are aliases for the same
|
||||
underlying content.
|
||||
|
||||
|
||||
### redo-{sources,targets,ood} should take a list of targets
|
||||
|
||||
With the above change to a per-target-directory `.redo` database, the
|
||||
original concept of the `redo-sources`, `redo-targets`, and `redo-ood`
|
||||
commands needs to change. Currently they're defined to list "all" the
|
||||
sources, targets, and out-of-date targets, respectively. But when there is
|
||||
no single database reflecting the entire filesystem, the concept of "all"
|
||||
becomes fuzzy.
|
||||
|
||||
We'll have to change these programs to refer to "all (recursive)
|
||||
dependencies of targets in the current directory" by default, or of all
|
||||
targets listed on the command line otherwise. This is probably more useful
|
||||
than the current behaviour anyway, since in a large project, one rarely
|
||||
wants to see a complete list of all sources and targets.
|
||||
|
||||
|
||||
### Deprecating "stdout capture" behaviour
|
||||
|
||||
The [original design for redo](http://cr.yp.to/redo.html) specified that a
|
||||
.do script could produce its output either by writing to stdout, or by
|
||||
writing to the file named by the `$3` variable.
|
||||
|
||||
Experience has shown that most developers find this very confusing. In
|
||||
particular, results are undefined if you write to *both* stdout and `$3`.
|
||||
Also, many programs (including `make`!) write their log messages to stdout
|
||||
when they should write to stderr, so many .do scripts need to start with
|
||||
`exec >&2` to avoid confusion.
|
||||
|
||||
In retrospect, automatically capturing stdout was probably a bad idea. .do
|
||||
scripts should intentionally redirect to `$3`. To enforce this, we could
|
||||
have redo report an error whenever a .do script returns after writing to its
|
||||
stdout. For backward compatibility, we could provide a command-line option
|
||||
to downgrade the error to a warning.
|
||||
|
||||
|
||||
### Deprecating environment variable sharing
|
||||
|
||||
In redo, it's considered a poor practice to pass environment variables (and
|
||||
other process attributes, like namespaces) from one .do script to another.
|
||||
This is because running `redo-ifchange /path/to/file` should always run
|
||||
`file`'s .do script with exactly the same settings, whether you do it from
|
||||
the toplevel from from deep inside a tree of dependencies. If an
|
||||
environment variable set in one .do script can change what's seen by an
|
||||
inner .do script, this breaks the dependency mechanism and makes builds less
|
||||
repeatable.
|
||||
|
||||
To make it harder to do this by accident, redo could intentionally wipe all
|
||||
but a short whitelist of allowed environment variables before running any
|
||||
.do script.
|
||||
|
||||
As a bonus, by never sharing any state outside the filesystem, it becomes
|
||||
much more possible to make a "distributed redo" that builds different
|
||||
targets on different physical computers.
|
||||
|
||||
|
||||
### redo-recheck command
|
||||
|
||||
Normally, redo only checks any given file dependency at most once per
|
||||
session, in order to reduce the number of system calls executed, thus
|
||||
greatly speeding up incremental builds. As a result, `redo-ifchange` of the
|
||||
same target will only execute the relevant .do script at most once per
|
||||
session.
|
||||
|
||||
In some situations, notably integration tests, we want to force redo to
|
||||
re-check more often. Right now there's a hacky script called
|
||||
`t/flush-cache` in the redo distribution which does this, but it relies on
|
||||
specific knowledge of the .redo directory's database format, which means it
|
||||
only works in this specific version of redo; this prevents the integration
|
||||
tests from running (and thus checking compatibility with) competing redo
|
||||
implementations.
|
||||
|
||||
If we standardized a `redo-recheck` command, which would flush the cache for
|
||||
the targets given on the command line, and all of their dependencies, this
|
||||
sort of integration test could work across multiple redo versions. For redo
|
||||
versions which don't bother caching, `redo-recheck` could be a null
|
||||
operation.
|
||||
|
||||
|
||||
### tty input
|
||||
|
||||
Right now, redo only allows a .do file to request input from the user's
|
||||
terminal if using `--no-log` and *not* using the `-j` option. Terminal
|
||||
input is occasionally useful for `make config` interfaces, but parallelism
|
||||
and log linearization make the console too cluttered for a UI to work.
|
||||
|
||||
The ninja build system has a [console
|
||||
pool](https://ninja-build.org/manual.html#_the_literal_console_literal_pool)
|
||||
that can contain up to one job at a time. When a job is in the console
|
||||
pool, it takes over the console entirely.
|
||||
|
||||
We could probably implement something similar in redo by using POSIX job
|
||||
control features, which suspend subtasks whenever they try to read
|
||||
from the tty. If we caught the suspension signal and acquired a lock, we
|
||||
could serialize console access.
|
||||
|
||||
Whether the complexity of this feature is worthwhile is unclear. Maybe it
|
||||
makes more sense just to have a './configure' script that runs outside the
|
||||
redo environment, but still can call into redo-ifchange if needed.
|
||||
|
||||
|
||||
### redo-lock command
|
||||
|
||||
Because it supports parallelism via recursion, redo automatically handles
|
||||
inter-process locking so that only one instance of redo can try to build a
|
||||
given target at a time.
|
||||
|
||||
This sort of locking turns out to be very useful, but there are a few
|
||||
situations where requiring redo to "build a target by calling a .do file" in
|
||||
order to acquire a lock becomes awkward.
|
||||
|
||||
For example, imagine redo is being used to call into `make` to run arbitrary
|
||||
`Makefile` targets. `default.make.do` might look like this:
|
||||
```sh
|
||||
make "$2"
|
||||
```
|
||||
|
||||
redo will automatically prevent two copies of `redo all.make` from running
|
||||
at once. However, if someone runs `redo all.make myprogram.make`, then two
|
||||
copies of `make` will execute at once. This *might* be harmless, but if the
|
||||
`all` target in the `Makefile` has a dependency on `myprogram`, then we will
|
||||
actually end up implicitly building `myprogram` from two places at once:
|
||||
from the `myprogram` part of `all.make` and from `myprogram.make`.
|
||||
|
||||
In hairy situations like that, it would be nice to serialize all access
|
||||
inside `default.make.do`, perhaps like this:
|
||||
```sh
|
||||
redo-lock make.lock make "$2"
|
||||
```
|
||||
|
||||
This would create a redo-style lock on the (virtual) file `make.lock`, but
|
||||
then instead of trying to `redo make.lock`, it would run the given command,
|
||||
in this case `make "$2"`.
|
||||
|
||||
It's unclear whether this feature is really a good idea. There are other
|
||||
(convoluted) ways to achieve the same goal. Nevertheless, it would be easy
|
||||
enough to implement. And redo versions that don't support parallelism could
|
||||
just make redo-lock a no-op, since they guarantee serialization in all cases
|
||||
anyway.
|
||||
|
||||
|
||||
### Include a (minimal) POSIX shell
|
||||
|
||||
A common problem when writing build scripts, both in `make` and in redo, is
|
||||
gratuitous incompatibility between all the available POSIX-like unix shells.
|
||||
Nowadays, most shells support [various pure POSIX sh
|
||||
features](https://apenwarr.ca/log/20110228), but there are always glitches.
|
||||
In some cases, POSIX doesn't define the expected behaviour for certain
|
||||
situations. In others, shells like `bash` try to "improve" things by
|
||||
changing the syntax in non-POSIX ways. Or maybe they just add new
|
||||
backward-compatible features, which you then rely on accidentally because
|
||||
you only tested your scripts with `bash`.
|
||||
|
||||
redo on Windows using something like [MSYS](http://www.mingw.org/wiki/msys)
|
||||
is especially limited by the lack of (and oddity of) available unix tools.
|
||||
|
||||
To avoid all these portability problems for .do script maintainers, we might
|
||||
consider bundling redo with a particular (optional) sh implementation, and
|
||||
maybe also unix-like tools, that it will use by default. An obvious
|
||||
candidate would be busybox, which has a win32 version called
|
||||
[busybox-w32](https://frippery.org/busybox/).
|
||||
|
||||
|
||||
### redoconf
|
||||
|
||||
redo is fundamentally a low-level tool that doesn't know as much about
|
||||
compiling specific programming languages as do higher-level tools like
|
||||
[cmake](https://cmake.org/).
|
||||
|
||||
Similarly, `make` doesn't know much about specific programming languages
|
||||
(and what it does know is hopelessly out of date, but cannot be deleted or
|
||||
updated because it would break backward compatibility with old Makefiles).
|
||||
This is why `autoconf` and `automake` were created: to automatically fill in
|
||||
the language- and platform-specific blanks, while letting `make` still
|
||||
handle executing the low level instructions.
|
||||
|
||||
It might be useful to have a redo-native autoconf/automake-like system,
|
||||
although you can already use autoconf with redo, so this might not be
|
||||
essential.
|
||||
Loading…
Add table
Add a link
Reference in a new issue