Generally clean up the README.

This commit is contained in:
Avery Pennarun 2010-12-12 03:50:56 -08:00
commit 14456d5892

README.md

@@ -4,11 +4,11 @@
program. There are many such competitors, because many people over the
years have been dissatisfied with make's limitations. However, of all the
replacements I've seen, only redo captures the essential simplicity and
flexibility of make, while avoiding its flaws. To my great surprise, it
manages to do this while being simultaneously simpler than make, more
flexible than make, and more powerful than make.

Although I wrote redo and I would love to take credit for it, the magical
simplicity and flexibility come because I copied verbatim a design by
Daniel J. Bernstein (creator of qmail and djbdns, among many other useful
things). He posted some very terse notes on his web site at one point
@@ -24,25 +24,27 @@ way](http://thedjbway.b0llix.net/future.html), by Wayne Marshall.
After I found out about djb redo, I searched the Internet for any sign that
other people had discovered what I had: a hidden, unimplemented gem of
brilliant code design. I found only one interesting link: Alan Grosskurth,
whose [Master's thesis at the University of Waterloo](http://grosskurth.ca/papers/mmath-thesis.pdf)
was about top-down software rebuilding, that is, djb redo. He wrote his
own (admittedly slow) implementation in about 250 lines of shell script.

If you've ever thought about rewriting GNU make from scratch, the idea of
doing it in 250 lines of shell script probably didn't occur to you. redo is
so simple that it's actually possible. For testing, I actually wrote an
even more minimal version, which always rebuilds everything instead of
checking dependencies, in 99 lines of shell (less than 2 kbytes).

The design is simply that good.

My implementation of redo is called `redo` for the same reason that there
are 75 different versions of `make` that are all called `make`. It's somehow
easier that way. Hopefully it will turn out to be compatible with the other
implementations, should there be any.

My extremely minimal implementation, called `do`, is in the `minimal/`
directory of this repository.
# License
@@ -60,7 +62,11 @@ get all the speed of non-recursive `make` (only check dependencies once per
run) combined with all the cleanliness of recursive `make` (you don't have
code from one module stomping on code from another module).

(Disclaimer: my current implementation is not as fast as `make` for some
things, because it's written in python. Eventually I'll rewrite it in C and
it'll be very, very fast.)

The easiest way to show it is with an example.

Create a file called default.o.do:

    redo-ifchange $1.c
@@ -110,9 +116,8 @@ the autodependencies? The filename `default.o.do` means "run this script to
generate a .o file unless there's a more specific whatever.o.do script that
applies."

The key thing to understand about redo is that declaring a dependency is just
another shell command. The `redo-ifchange` command means, "build each of my
arguments. If any of them or their dependencies ever change, then I need to
run the *current script* over again."
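Because declaring a dependency is just a command, a .do script can compute its dependency list with ordinary shell logic. A small sketch of what that allows (the `PLATFORM` file and the `config_*.h` header names are hypothetical, invented for illustration; this is not a recipe from the README):

```shell
# default.o.do (sketch): pick a dependency at build time with plain sh
read PLATFORM <PLATFORM               # hypothetical file containing e.g. "linux"
redo-ifchange $1.c config_$PLATFORM.h # depend on the platform-specific header
gcc -I. -o $3 -c $1.c
```

No special syntax is involved: the `redo-ifchange` line runs like any other command, so it can sit inside an `if`, a loop, or a function.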
@@ -126,13 +131,14 @@ just once right at the start of your project build.
But best of all, as you can see in `default.o.do`, you can declare a
dependency *after* building the program. In C, you get your best dependency
information by trying to actually build, since that's how you find out which
headers you need. redo is based on the following simple insight: you don't
actually care what the dependencies are *before* you build the target; if
the target doesn't exist, you obviously need to build it. Then, the build
script itself can provide the dependency information however it wants; unlike
in `make`, you don't need a special dependency syntax at all. You can even
declare some of your dependencies after building, which makes C-style
autodependencies much simpler.
(GNU make supports putting some of your dependencies in include files, and
auto-reloading those include files if they change. But this is very
@@ -142,16 +148,18 @@ changes. With redo, you can just read the script from top to bottom. A
`redo-ifchange` call is like calling a function, which you can also read
from top to bottom.)
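A sketch of how C-style autodependencies can work under this model (my illustration, not the README's canonical recipe; it assumes gcc and that the generated dependency list fits on one line):

```shell
# default.o.do (sketch): compile first, then declare the headers gcc discovered
redo-ifchange $1.c
gcc -MD -MF $1.deps.tmp -o $3 -c $1.c  # -MD/-MF: write "foo.o: foo.c foo.h ..." to a file
read DEPS <$1.deps.tmp                 # read the dependency line
rm -f $1.deps.tmp
redo-ifchange ${DEPS#*:}               # depend on everything after the colon
```

The dependency declaration happens *after* the compile, which is exactly when the compiler knows the true header list.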
# One script per file? Can't I just put it all in one big Redofile like make does?

One of my favourite features of redo is that it doesn't add any new syntax;
the syntax of redo is *exactly* the syntax of sh... because sh is the program
interpreting your .do file.
Also, it's surprisingly useful to have each build script in its own file;
that way, you can declare a dependency on just that one build script instead
of the entire Makefile, and you won't have to rebuild everything just
because of a one-line Makefile change. (Some build tools avoid that same
problem by tracking which variables and commands were used to do the build.
But that's more complex, more error prone, and slower.)

Still, it would be rather easy to make a "Redofile" parser that just has a
@@ -167,9 +175,8 @@ into their own files. You could even write a .do file to do it.
It's not obvious that this would be a real improvement, however.

See djb's [Target files depend on build scripts](http://cr.yp.to/redo/honest-script.html)
article for more information.

# What are the three parameters ($1, $2, $3) to a .do file?
@@ -201,20 +208,23 @@ You should use $1 and $2 only in constructing input
filenames and dependencies; never modify the file named by
$1 in your script. Only ever write to the file named by
$3. That way redo can guarantee proper dependency
management and atomicity. (For convenience, you can write
to stdout instead of $3 if you want.)
For example, you could compile a .c file into a .o file
like this, from a script named `default.o.do`:

    redo-ifchange $1.c
    gcc -o $3 -c $1.c

Note that $2, the output file's .o extension, is rarely useful
since you always know what it is.
FIXME: djb's design documentation doesn't clearly describe
$1 and $2, although it's clear that $3 is the output
filename. We may have guessed $1 and $2, particularly $2,
incorrectly, so we might have to change their meanings
later in order to be compatible with djb's implementation.

# What happens to the stdin/stdout/stderr in a redo file?
@@ -244,13 +254,13 @@ Not currently. There's nothing fundamentally preventing us from allowing
it. However, it seems easier to reason about your build process if you
*aren't* auto-generating your build scripts on the fly.

This might change someday.

# Do end users have to have redo installed in order to build my project?
No. We include a very short (99 lines, as of this writing) shell script
called `do` in the `minimal/` subdirectory of the redo project. `do` is like
`redo` (and it works with the same `*.do` scripts), except it doesn't
understand dependencies; it just always rebuilds everything from the top.
@@ -264,11 +274,14 @@ matters, you could just include it with your project.
# How does redo store dependencies?

At the toplevel of your project, redo creates a directory
named `.redo`. That directory contains a sqlite3 database
with dependency information.

The format of the `.redo` directory is undocumented because
it may change at any time. If you really need to make a
tool that pokes around in there, please ask on the mailing
list if we can standardize something for you.
# If a target didn't change, how do I prevent dependents from being rebuilt?
@@ -281,27 +294,34 @@ identical.
With `make`, which makes build decisions based on timestamps, you would
simply have the ./configure script write to config.h.new, then only
overwrite config.h with that if the two files are different.
However, that's a bit tedious.

With `redo`, there's an easier way. You can have a
config.do script that looks like this:

    redo-ifchange autogen.sh *.ac
    ./autogen.sh
    ./configure
    cat config.h configure Makefile | redo-stamp

Now any of your other .do files can depend on a target called
`config`. `config` gets rebuilt automatically if any of
your autoconf input files are changed (or if someone does
`redo config` to force it). But because of the call to
redo-stamp, `config` is only considered to have changed if
the contents of config.h, configure, or Makefile are
different than they were before.

(Note that you might actually want to break this .do up into a
few phases: for example, one that runs aclocal, one that
runs autoconf, and one that runs ./configure. That way
your build can always do the minimum amount of work
necessary.)
# Why not always use checksum-based dependencies instead of timestamps?

Some build systems keep a checksum of target files and rebuild dependents
only when the target changes. This is appealing in some cases; for example,
with ./configure generating config.h, it could just go ahead and generate
config.h; the build system would be smart enough to rebuild or not rebuild
@@ -309,24 +329,23 @@ dependencies automatically. This keeps build scripts simple and gets rid of
the need for people to re-implement file comparison over and over in every
project or for multiple files in the same project.
There are disadvantages to using checksums for everything,
however:

- calculating checksums for every output file adds time to
  the build;

- it makes it hard to *force* things to rebuild when you
  know you absolutely want that;

- targets that are just used for aggregation (ie. they
  don't produce any output of their own) would always have
  the same checksum - the checksum of a zero-byte file -
  which causes confusing results.

Thus, we made the decision to only use checksums for
targets that explicitly call `redo-stamp` (see previous
question).
# Can my .do files be written in a language other than sh?

@@ -337,29 +356,41 @@ shell to run your .do script. But that opens new problems, like figuring
out what is the equivalent of the `redo-ifchange` command in python. Do you
just run it in a subprocess? That might be unnecessarily slow. And so on.

Right now, `redo` explicitly runs `sh filename.do`. The main reasons for
this are to make the #!/ line optional, and so you don't have to remember to
`chmod +x` your .do files.
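Since the script always starts under `sh`, one workaround (my sketch, not an official redo feature; `build.py` is a hypothetical helper) is to make the .do file a tiny sh stub that hands the real work to another interpreter:

```shell
# hello.do (sketch): the .do file is sh, but the real logic lives in python
redo-ifchange build.py           # rebuild if the generator script itself changes
python build.py "$1" "$2" "$3"   # build.py is expected to write its output to $3
```

The stub still gets dependency tracking for free, because it can `redo-ifchange` the interpreter script before delegating to it.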
# Can a single .do script generate multiple outputs?

FIXME: Yes, but this is a bit imperfect.

For example, compiling a .java file produces a bunch of .class
files, but exactly which files? It depends on the content
of the .java file. Ideally, we would like to allow our .do
file to compile the .java file, note which .class files
were generated, and tell redo about it for dependency
checking.

However, this ends up being confusing; if myprog depends
on foo.class, we know that foo.class was generated from
bar.java only *after* bar.java has been compiled. But how
do you know, the first time someone asks to build myprog,
where foo.class is supposed to come from?

So we haven't thought about this enough yet.

Note that it's *okay* for a .do file to produce targets
other than the advertised one; you just have to be careful.
You could have a default.javac.do that runs 'javac
$1.java', and then have your program depend on a bunch of .javac
files. Just be careful not to depend on the .class files
themselves, since redo won't know how to regenerate them.
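That default.javac.do idea might be sketched like this (the details beyond what the text above says are my guesses):

```shell
# default.javac.do (sketch): asking redo for foo.javac compiles foo.java
redo-ifchange $1.java
javac $1.java    # emits foo.class, plus possibly other .class files as side effects
touch $3         # the .javac target itself is just a stamp marking "compiled"
```

Dependents then `redo-ifchange foo.javac` rather than naming the .class files directly.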
This feature would also be useful, again, with ./configure:
typically running the configure script produces several
output files, and it would be nice to declare dependencies
on all of them.
# Recursive make is considered harmful. Isn't redo even *more* recursive?

@@ -370,33 +401,41 @@ by Peter Miller.
Yes, redo is recursive, in the sense that every target is built by its own
`.do` file, and every `.do` file is a shell script being run recursively
from other shell scripts, which might call back into `redo`. In fact, it's
even more recursive than recursive make. There is no
non-recursive way to use redo.

However, the reason recursive make is considered harmful is that each
instance of make has no access to the dependency information seen by the
other instances. Each one starts from its own Makefile, which only has a
partial picture of what's going on; moreover, each one has to
stat() a lot of the same files over again, leading to slowness. That's
the thesis of the "considered harmful" paper.
Nobody has written a paper about it, but *non-recursive*
make should also be considered harmful! The problem is Makefiles aren't
very "hygienic" or "modular"; if you're not running make recursively, then
your one copy of make has to know *everything* about *everything* in your
entire project. Every variable in make is global, so every variable defined
in *any* of your Makefiles is visible in *all* of your Makefiles. Every
little private function or macro is visible everywhere. In a huge project
made up of multiple projects from multiple vendors, that's just not okay.

Plus, if all your Makefiles are tangled together, make has
to read and parse the entire mess even to build the
smallest, simplest target file, making it slow.
`redo` deftly dodges both the problems of recursive make
and the problems of non-recursive make. First of all,
dependency information is shared through a global persistent `.redo`
database, which is accessed by all your `redo` instances at once.
Dependencies created or checked by one instance can be immediately used by
another instance. And there's locking to prevent two instances from
building the same target at the same time. So you get all the "global
dependency" knowledge of non-recursive make. And it's a
binary file, so you can just grab the dependency
information you need right now, rather than going through
everything linearly.

Also, every `.do` script is entirely hygienic and traceable; `redo`
discourages the use of global environment variables, suggesting that you put
settings into files (which can have timestamps and dependencies) instead.
So you also get all the hygiene and modularity advantages of recursive make.
@@ -408,8 +447,9 @@ non-recursive Makefile setup with a bunch of included files, you end up with
lots and lots of rules that can all be executed in a random order; tracing
becomes impossible. Recursive make tries to compensate for this by breaking
the rules into subsections, but that ends up with all the "considered harmful"
paper's complaints. `redo` runs your scripts from top to bottom in a
nice tree, so it's traceable no matter how many layers you have.

# How do I set environment variables that affect the entire build?
@@ -429,14 +469,9 @@ create a file called `compile.do`:
    redo-ifchange config.sh
    . ./config.sh
    echo "gcc -c -o \$3 $1.c $CFLAGS" >$3
    chmod a+x $3

Then, your `default.o.do` can simply look like this:

    redo-ifchange compile $1.c
@@ -457,13 +492,15 @@ command line is being used.
As a bonus, all the variable expansions only need to be done once: when
generating the ./compile program. With make, it would be recalculating
expansions every time it compiles a file. Because of the
way make does expansions as macros instead of as normal
variables, this can be slow.
# How do I write a default.o.do that works for both C and C++ source?

We can upgrade the compile.do from the previous answer to
look something like this:

    redo-ifchange config.sh
    . ./config.sh
@@ -485,14 +522,18 @@ in make:
Then it has to do all the same checks. Except make has even *more* implicit
rules than that, so it ends up trying and discarding lots of possibilities
before it actually builds your program. Is there a %.s? A
%.cpp? A %.pas? It needs to look for *all* of them, and
it gets slow. The more implicit rules you have, the slower
make gets.

In redo, it's not implicit at all; you're specifying exactly how to
decide whether it's a C program or a C++ program, and what to do in each
case. Plus you can share the two gcc command lines between the two rules,
which is hard in make. (In GNU make you can use macro functions, but the
syntax for those is ugly.)
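The rest of that upgraded compile.do is cut off in this diff; one way it might continue (a sketch, with the variable names assumed to come from config.sh, so treat the details as guesses rather than the actual script):

```shell
# compile.do (sketch): emit a ./compile script that picks g++ or gcc per target
redo-ifchange config.sh
. ./config.sh
{
    echo 'if [ -e "$1.cc" ]; then'        # single quotes: expanded at *run* time
    echo "    redo-ifchange \$1.cc"
    echo "    g++ $CXXFLAGS -o \$3 -c \$1.cc"   # $CXXFLAGS expanded *now*, once
    echo 'else'
    echo "    redo-ifchange \$1.c"
    echo "    gcc $CFLAGS -o \$3 -c \$1.c"
    echo 'fi'
} >$3
chmod a+x $3
```

The flags from config.sh are baked in when ./compile is generated, which is the "expansions only happen once" point made earlier.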
# Can I just rebuild a part of the project?

Absolutely! Although `redo` runs "top down" in the sense of one .do file
@@ -504,7 +545,7 @@ you start, `redo` will be able to build all the dependencies in the right
order.

Unlike non-recursive make, you don't have to jump through any strange hoops
(like adding, in each directory, a fake Makefile that does `make -C ${TOPDIR}`
back up to the main non-recursive Makefile). redo just uses `filename.do`
to build `filename`, or uses `default*.do` if the specific `filename.do`
doesn't exist.
@@ -519,6 +560,31 @@ And it will work exactly like this:
    cd ../utils
    redo foo.o
In make, if you run

    make ../utils/foo.o

it means to look in ./Makefile for a rule called
../utils/foo.o... and it probably doesn't have such a
rule. On the other hand, if you run

    cd ../utils
    make foo.o

it means to look in ../utils/Makefile and look for a rule
called foo.o. And that might do something totally
different! redo combines these two forms and does
the right thing in both cases.

Note: redo will always change to the directory containing
the target before trying to build it. So if you do

    redo ../utils/foo.o

the .do file will be run with its current directory set to
../utils. Thus, the .do file's runtime environment is
always reliable.
# Can my filenames have spaces in them?

@@ -538,8 +604,8 @@ No.
# What if my .c file depends on a generated .h file?
This problem arises as follows. foo.c includes config.h, and config.h is
created by running ./configure. The second part is easy; just write a
config.h.do that depends on the existence of configure (which is created by
configure.do, which probably runs autoconf).

The first part, however, is not so easy. Normally, the headers that a C
@@ -549,46 +615,44 @@ you do
    redo foo.o
There's no way for redo to know that compiling foo.c into foo.o depends on There's no way for redo to *automatically* know that compiling foo.c
first generating config.h. into foo.o depends on first generating config.h.
FIXME: Since most .h files are *not* auto-generated, the easiest
thing to do is probably to just add a line like this to
your default.o.do:

    redo-ifchange config.h

Sometimes a specific solution is much easier than a general
one.
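For instance, a default.o.do with that line added might look like this (a
hypothetical sketch; it uses the $1/$2/$3 argument convention that redo
passes to .do scripts, and assumes gcc):

```shell
# Hypothetical default.o.do sketch.  $2 is the target name minus its
# extension and $3 is the temporary output file that redo renames
# into place.  Every .o now depends on config.h, included or not.
redo-ifchange config.h "$2.c"
gcc -o "$3" -c "$2.c"
```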
If you really want to solve the general case,
[djb has a solution for his own
projects](http://cr.yp.to/redo/honest-nonfile.html), which is a simple
script that looks through C files to pull out #include lines. He assumes
that `#include <file.h>` is a system header (thus not subject to being
built) and `#include "file.h"` is in the current directory (thus easy to
find). Unfortunately this isn't really a complete
solution, but at least it would be able to redo-ifchange a
required header before compiling a program that requires
that header.
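A minimal sh sketch of the same idea (the sed pattern and the demo file
are illustrative, not taken from djb's actual script):

```shell
# Extract the double-quoted #include names from a C file; those are
# the local headers you would redo-ifchange before compiling it.
cat >demo_foo.c <<'EOF'
#include <stdio.h>
#include "config.h"
#include "util.h"
EOF
sed -n 's/^#include "\(.*\)"$/\1/p' demo_foo.c
# prints: config.h
#         util.h
rm -f demo_foo.c
```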
# Why doesn't redo by default print the commands as they are run?
make prints the commands it runs as it runs them. redo doesn't, although
you can get this behaviour with `redo -v` or `redo -x`.
(The difference between -v and -x is the same as it is in
sh... because we simply forward those options onward to sh
as it runs your .do script.)
The main reason we don't do this by default is that the commands get
pretty long winded (a compiler command line might be multiple lines of
repeated gibberish) and, on large projects, it's hard to actually see the
progress of the overall build. Thus, make users often work hard to have
make hide the command output in order to make the log "more readable."
The reduced output is a pain with make, however, because if there's ever a
problem, you're left wondering exactly what commands were run at what time.
With redo, on the other hand, you can cut-and-paste a line from the build
script and rerun it directly.
So if you ever want to debug what happened at a particular step, you can
choose to run only that step in verbose mode:

    $ redo t/c.c.c.b.b -x
    redo  t/c.c.c.b.b
    * sh -ex default.b.do c.c.c.b .b c.c.c.b.b.redo2.tmp
    + redo-ifchange c.c.c.b.b.a
    + echo a-to-b
    + cat c.c.c.b.b.a
    + ./sleep 1.1
    redo  t/c.c.c.b.b (done)
If you're using an autobuilder or something that logs build results for
future examination, you should probably set it to always run redo with
the -x option.
# Is redo compatible with autoconf?
# Is redo compatible with automake?

Hells no. You can thank me later. But see next question.
# Is redo compatible with make?
Yes. If you have an existing Makefile (for example, in one of your
subprojects), you can just call make from a .do script to build that
subproject.
In a file called myproject.stamp.do:
    redo-ifchange $(find myproject -name '*.[ch]')
    make -C myproject all
So, to amend our answer to the previous question, you *can* use
automake-generated Makefiles as part of your redo-based project.
Yes! redo implements the same jobserver protocol as GNU make, which means
that redo running under make -j, or make running under redo -j, will do the
right thing. Thus, it's safe to mix-and-match redo and make in a recursive
build system.
Just make sure you declare your dependencies correctly;
redo won't know all the specific dependencies included in
your Makefile, and make won't know your redo dependencies,
of course.
One way of cheating is to just have your make.do script
depend on *all* the source files of a subproject, like
this:
    make -C subproject all
    find subproject -name '*.[ch]' | xargs redo-ifchange
Now if any of the .c or .h files in subproject are changed,
your make.do will run, which calls into the subproject to
rebuild anything that might be needed. Worst case, if the
dependencies are too generous, we end up calling 'make all'
more often than necessary. But 'make all' probably runs
pretty fast when there's nothing to do, so that's not so
bad.
# What about distributed builds?
At the time, he promised to open source this tool eventually. It would be
pretty fun to play with it.
The problem:
This idea won't work as easily with redo as it did with make. With
make, a separate copy of $SHELL is launched for each step of the build (and
gets migrated to the remote machine), but make runs only on your local
machine, so it can control parallelism and avoid building the same target
from multiple machines, and so on. The key to the above
distribution mechanism is that it can send files to the remote
machine at the beginning of the $SHELL, and send them back
when the $SHELL exits, and know that nobody cares about
them in the meantime. With redo, since the entire script runs
inside a shell (and the shell might not exit until the very
end of the build), we'd have to do the parallelism some
other way.
I'm sure it's doable, however. One nice thing about redo
is that the source code is so small compared to make: you
can just rewrite it.
# How fast is redo compared to make?
FIXME:
The current version of redo is written in python and has not been optimized.
So right now, it's usually a bit slower. Not too embarrassingly slower,
though, and the slowness mostly only strikes when you're
building a project from scratch.
For incrementally building only the changed parts of the project, redo can
be much faster than make, because it can check all the dependencies up
front and doesn't need to repeatedly parse and re-parse the Makefile (as
recursive make needs to do).
redo's sqlite3-based dependency database is very fast (and
it would be even faster if we rewrite redo in C instead of
python). Better still, it would be possible to write an
inotify daemon that can update the dependency database in
real time; if you're running the daemon, you can run 'redo'
from the toplevel and if your build is clean, it could return
instantly, no matter how many dependencies you have.
On my machine, redo can currently check about 10,000
dependencies per second. As an example, a program that
depends on every single .c or .h file in the Linux kernel
2.6.36 repo (about 36000 files) can be checked in about 4
seconds.
Rewritten in C, dependency checking would probably go about
10 times faster still.
This probably isn't too hard; the design of redo is so simple that
it should be easy to write in any language. It's just
*even easier* in python, which was good for writing the
prototype and debugging the parallelism and locking rules.
Most of the slowness at the moment is because redo-ifchange
(and also sh itself) need to be fork'd and exec'd over and
over during the build process.
As a point of reference, on my computer, I can fork-exec
redo-ifchange.py about 87 times per second; an empty python
program, about 100 times per second; an empty C program,
about 1000 times per second; an empty make, about 300 times
per second. So if I could compile 87 files per second with
gcc, which I can't, python overhead would be 50%. Since gcc
is slower than that, the real overhead is generally much
less - more like 10%.
Also, if you're using redo -j on a multicore machine, all
the python forking happens in parallel with everything
else, so that's 87 per second per core. Nevertheless,
that's still slower than make and should be fixed.
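The arithmetic above can be sketched as a back-of-the-envelope model
(not a measurement; the 87-per-second figure is the one quoted above):

```python
# Model: each target pays one python fork-exec (~1/87 sec) on top of
# its real compile time.
FORK_EXEC_PER_SEC = 87.0
overhead_s = 1.0 / FORK_EXEC_PER_SEC  # about 11.5 ms per target

def overhead_fraction(compile_s: float) -> float:
    """Fraction of total per-target time spent on python fork-exec."""
    return overhead_s / (compile_s + overhead_s)

# If gcc could also compile a file in 1/87 sec, overhead would be 50%:
print(round(overhead_fraction(1 / 87.0), 2))  # 0.5
# With a more realistic ~100 ms per compile, it's closer to 10%:
print(round(overhead_fraction(0.1), 2))       # 0.1
```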
(On the other hand, all this measurement is confounded
because redo's more fine-grained dependencies mean you can
have more parallelism. So if you have a lot of CPU cores, redo
might build *faster* than make just because it makes better
use of them.)
# What's missing? How can I help?
redo is incomplete and probably has numerous bugs. Just what you
always wanted in a build system, I know.
What's missing? Search for the word FIXME in this document; anything with a