README: a bunch of improvements to answer recent questions.

Inspired by some questions sent to me in private email.
2011-02-21 05:11:59 -08:00 · 2011-02-21 05:11:59 -08:00 · 0ce949fb81
commit 0ce949fb81
parent a2bce72255
1 changed file with 264 additions and 11 deletions
--- a/README.md
+++ b/README.md
@@ -252,6 +252,19 @@ combination of the above.  And you can put some of your
 targets in default.do and some of them in their own files. 
 Lay it out in whatever way makes sense to you.
 One more thing: if you put all your build rules in a single
 default.do, you'll soon discover that changing *anything*
 in that default.do will cause all your targets to be rebuilt -
 because their .do file has changed.  This is technically
 correct, but you might find it annoying.  To work around
 it, try making your default.do look like this:
 	. ./default.od
 And then put the above case statement in default.od
 instead.  Since you didn't `redo-ifchange default.od`,
 changes to default.od won't cause everything to rebuild.
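Here's a rough sketch of that layout.  The `hello` target and its rule are made up for illustration, and the wrapper is run by hand with `sh` rather than through redo, so all it shows is that the sourced default.od sees the same `$1` as default.do:

```shell
# Sketch only: builds a throwaway directory containing the two-file layout.
workdir=$(mktemp -d)

# default.do is a one-line wrapper, so it almost never changes.
cat >"$workdir/default.do" <<'EOF'
. ./default.od
EOF

# default.od holds the real rules (the 'hello' target is hypothetical).
cat >"$workdir/default.od" <<'EOF'
case $1 in
    hello) echo "hello world" ;;
    *) echo "no rule to build '$1'" >&2; exit 1 ;;
esac
EOF

# Run the wrapper by hand; a sourced file shares the caller's $1.
out=$(cd "$workdir" && sh ./default.do hello)
echo "$out"
```

Editing default.od changes what gets built, but default.do itself stays byte-for-byte the same.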
 # Can I set my dircolors to highlight .do files?
@@ -277,6 +290,13 @@ To activate it, you can add a line like this to your .bashrc:
 # What are the three parameters ($1, $2, $3) to a .do file?
 FIXME:
 These definitions might change.  It turns out that
 djb's original definitions differ from these and we should
 probably change ours in order to maintain compatibility. 
 (In his version, $1 is always the name of the target, and
 $2 is the target with the extension removed.)
 $1 is the name of the target, with the extension removed,
 if any.
@@ -317,14 +337,33 @@ Note that $2, the output file's .o extension, is rarely useful
 since you always know what it is.
 # Why not named variables like $FILE, $EXT, $OUT instead of $1, $2, $3?
 That sounds tempting and easy, but one downside would be
 lack of backward compatibility with djb's original redo
 design.
 Longer names aren't necessarily better.  Learning the
 meanings of the three numbers doesn't take long, and over
 time, those extra few keystrokes can add up.  And remember
 that Makefiles and perl have had strange one-character
 variable names for a long time.  It's not at all clear that
 removing them is an improvement.
 # What happens to the stdin/stdout/stderr in a redo file?
 As with make, stdin is not redirected.  You're probably
 better off not using it, though, because especially with
-parallel builds, it might not do anything useful.
+parallel builds, it might not do anything useful.  We might
 change this behaviour someday since it's such a terrible
 idea for .do scripts to read from stdin.
 As with make, stderr is also not redirected.  You can use
-it to print status messages as your build proceeds.
+it to print status messages as your build proceeds. 
 (Eventually, we might want to capture stderr so it's easier
 to look at the results of parallel builds, but this is
 tricky to do in a user-friendly way.)
 Redo treats stdout specially: it redirects it to point at
 $3 (see previous question).  That is, if your .do file
@@ -338,6 +377,55 @@ will correctly, and atomically, generate an output file
 named `chicken` only if the echo command succeeds.
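To make "atomically" a bit more concrete, here's a minimal sketch of the general idea in plain sh.  This is not redo's actual implementation; the `build_atomically` helper and the temp file naming are assumptions for illustration only:

```shell
# Hypothetical helper: run a command with its stdout captured in a temp
# file, and rename the temp file over the target only if the command
# succeeds.  A failed command leaves no half-written target behind.
build_atomically() {
    target=$1; shift
    tmp="$target.tmp.$$"
    if "$@" >"$tmp"; then
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}

workdir=$(mktemp -d)

# Succeeds: the output of echo becomes the content of 'chicken'.
build_atomically "$workdir/chicken" echo "hello world"

# Fails: 'false' exits nonzero, so 'egg' is never created.
build_atomically "$workdir/egg" false || egg_failed=yes
```

Because the rename happens only after the command exits successfully, nothing ever sees a partially written `chicken`.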
 # Isn't it confusing to have stdout go to the target by default?
 Yes, it is.  It's unlike what almost any other program
 does, especially make, and it's very easy to make a
 mistake.  For example, if you write in your script:
 	echo "Hello world"
 it will go to the target file rather than to the screen.
 A more common mistake is to run a program that writes to
 stdout by accident as it runs.  When you do that, you'll
 produce your target on $3, but it might be intermingled
 with junk you wrote to stdout.  redo is pretty good about
 catching this mistake, and it'll print a message like this:
 	redo  zot.do wrote to stdout *and* created $3.
 	redo  ...you should write status messages to stderr, not stdout.
 	redo  zot: exit code 207
 Despite the disadvantages, though, automatically capturing
 stdout does make certain kinds of .do scripts really
 elegant.  The "simplest possible .do file" can be very
 short.  For example, here's one that produces a sub-list
 from a list:
 	redo-ifchange filelist
 	grep ^src/ filelist
 redo's simplicity is an attempt to capture the "Zen of
 Unix," which has a lot to do with concepts like pipelines
 and stdout.  Why should every program have to implement its
 own -o (output filename) option when the shell already has
 a redirection operator?  Maybe if redo gets more popular,
 more programs in the world will be able to be even simpler
 than they are today.
 By the way, if you're running some programs that might
 misbehave and write garbage to stdout instead of stderr
 (Informational/status messages always belong on stderr, not
 stdout!  Fix your programs!), then just add this line to
 the top of your .do script:
 	exec >&2
 That will redirect your stdout to stderr, so it works more
 like you expect.
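If you want to convince yourself of what `exec >&2` does, here's a small standalone sketch that doesn't need redo at all.  The `noisy_step` function is a made-up stand-in for the body of a .do script:

```shell
# Hypothetical stand-in for a .do script that prints status messages.
noisy_step() {
    (
        exec >&2             # from here on, stdout points at stderr
        echo "compiling..."  # status message; no longer lands on stdout
    )
}

# Capture only stdout: the message went to stderr, so this is empty.
captured_stdout=$(noisy_step 2>/dev/null)

# Capture only stderr: this is where the message actually ended up.
captured_stderr=$(noisy_step 2>&1 >/dev/null)
```

The subshell is only there so the redirection doesn't leak into the rest of the example; in a real .do script you'd put `exec >&2` at the top and leave it at that.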
 # Can a *.do file itself be generated as part of the build process?
 Not currently.  There's nothing fundamentally preventing us from allowing
@@ -375,6 +463,58 @@ tool that pokes around in there, please ask on the mailing
 list if we can standardize something for you.
 # Isn't using sqlite3 overkill?  And un-djb-ish?
 Well, yes.  Sort of.  I think people underestimate how
 "lite" sqlite really is:
 	root root 573376 2010-10-20 09:55 /usr/lib/libsqlite3.so.0.8.6
 573k for a *complete* and *very fast* and *transactional*
 SQL database.  For comparison, libdb is:
 	root root 1256548 2008-09-13 03:23 /usr/lib/libdb-4.6.so
 ...more than twice as big, and it doesn't even have an SQL parser in
 it!  Or if you want to be really horrified:
 	root root 1995612 2009-02-03 13:54 /usr/lib/libmysqlclient.so.15.0.0
 The mysql *client* library is two megs, and it doesn't even
 have a database server in it!  People who think SQL
 databases are automatically bloated and gross have not yet
 actually experienced the joys of sqlite.  SQL has a
 well-deserved bad reputation, but sqlite is another story
 entirely.  It's excellent, and much simpler and better
 written than you'd expect.
 But still, I'm pretty sure it's not very "djbish" to use a
 general-purpose database, especially one that has a *SQL
 parser* in it.  (One of the great things about redo's
 design is that it doesn't ever need to parse anything, so
 a SQL parser is a bit embarrassing.)
 I'm pretty sure djb never would have done it that way.
 However, I don't think we can reach the performance we want
 with dependency/build/lock information stored in plain text
 files; among other things, that results in too much
 fstat/open activity, which is slow in general, and even
 slower if you want to run on Windows.  That leads us to a
 binary database, and if the binary database isn't sqlite or
 libdb or something, that means we have to implement our own
 data structures.  Which is probably what djb would do, of
 course, but I'm just not convinced that I can do a better
 (or even a smaller) job of it than the sqlite guys did.
 Most of the state database stuff has been isolated in
 state.py.  If you're feeling brave, you can try to
 implement your own better state database, with or without
 sqlite.
 It is almost certainly possible to do it much more nicely
 than I have, so if you do, please send it in!
 # If a target didn't change, how do I prevent dependents from being rebuilt?
 For example, running ./configure creates a bunch of files including
@@ -410,7 +550,15 @@ your build can always do the minimum amount of work
 necessary.)
-# Why not always use checksum-based dependencies instead of timestamps?
+# What hash algorithm does redo-stamp use?
 It's intentionally undocumented because you shouldn't need
 to care and it might change at any time.  But trust me,
 it's not the slow part of your build, and you'll never
 accidentally get a hash collision.
 # Why not *always* use checksum-based dependencies instead of timestamps?
 Some build systems keep a checksum of target files and rebuild dependents
 only when the target changes.  This is appealing in some cases; for example,
@@ -420,24 +568,67 @@ dependencies automatically.  This keeps build scripts simple and gets rid of
 the need for people to re-implement file comparison over and over in every
 project or for multiple files in the same project.
-There are disadvantages to using checksums for everything,
+There are disadvantages to using checksums for everything
-however:
+automatically, however:
- calculating checksums for every output file adds time to
+- Building stuff unnecessarily is *much* less dangerous
-  the build;
+  than not building stuff that should be built.  Using
  checksums will 
 - It makes it hard to *force* things to rebuild when you
  know you absolutely want that.  (With timestamps, you can
  just `touch filename` to rebuild everything that depends
  on `filename`.)
- it makes it hard to *force* things to rebuild when you
+- Targets that are just used for aggregation (ie. they
  know you absolutely want that;
 - targets that are just used for aggregation (ie. they
  don't produce any output of their own) would always have
  the same checksum - the checksum of a zero-byte file -
  which causes confusing results.
 - Calculating checksums for every output file adds time to
  the build, even if you don't need that feature.
 - Building stuff unnecessarily and then stamping it is
  much slower than just not building it in the first place,
  so for *almost* every use of redo-stamp, it's not the
  right solution anyway.
 - To steal a line from the Zen of Python: explicit is
  better than implicit.  Making people think about when
  they're using the stamp feature - knowing that it's slow
  and a little annoying to do - will help people design
  better build scripts that depend on this feature as
  little as possible.
 - djb's (as yet unreleased) version of redo doesn't
  implement checksums, so doing that would produce an
  incompatible implementation.  With redo-stamp and
  redo-always being separate programs, you can simply
  choose not to use them if you want to keep maximum
  compatibility for the future.
 - Bonus: the redo-stamp algorithm is interchangeable.  You
  don't have to stamp the target file or the source files
  or anything in particular; you can stamp any data you
  want, including the output of `ls` or the content of a
  web page.  We could never have made things like that
  implicit anyway, so some form of explicit redo-stamp
  would always have been needed, and then we'd have to
  explain when to use the explicit one and when to use the
  implicit one.
 Thus, we made the decision to only use checksums for
 targets that explicitly call `redo-stamp` (see previous
 question).
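As a rough illustration of the "stamp any data you want" point, here's a sketch that checksums a directory listing.  It uses `cksum` purely as a stand-in, since redo-stamp's real hash is intentionally undocumented; in an actual .do script you would pipe the listing into `redo-stamp` instead.  The `stamp_listing` helper and the file names are made up:

```shell
workdir=$(mktemp -d)
touch "$workdir/a.c" "$workdir/b.c"

# Hypothetical stand-in for `ls | redo-stamp`: checksum the list of names.
stamp_listing() {
    ls "$1" | cksum
}

before=$(stamp_listing "$workdir")

# Touching a file changes its timestamp, but not the list of names.
touch "$workdir/a.c"
after_touch=$(stamp_listing "$workdir")

# Adding a file changes the list, and therefore the checksum.
touch "$workdir/c.c"
after_add=$(stamp_listing "$workdir")
```

The stamp follows the thing you actually care about (which files exist) rather than any particular file's content or timestamp.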
 I suggest actually trying it out to see how it feels for
 you.  For myself, before there was redo-stamp and
 redo-always, a few types of problems (in particular,
 depending on a list of which files exist and which don't)
 were really annoying, and I definitely felt it.  Making
 redo-stamp and redo-always work the way they do made the
 pain disappear, so I stopped changing things.
 # Why does 'redo target' always redo the target, even if it's unchanged?
@@ -453,6 +644,41 @@ needs it or not.
 If you really want to only rebuild targets that have
 changed, you can run `redo-ifchange target` instead.
 The reasons I like this arrangement come down to semantics:
 - "make target" implies that if target exists, you're done;
  conversely, "redo target" in English implies you really
  want to *redo* it, not just sit around.
 - If this weren't the rule, `redo` and `redo-ifchange`
  would mean the same thing, which seems rather confusing.
 - If `redo` could refuse to run a .do script, you would
  have no easy one-line way to force a particular target to
  be rebuilt.  You'd have to remove the target and *then*
  redo it, which is more typing.  On the other hand, nobody
  actually types "redo foo.o" if they honestly think foo.o
  doesn't need rebuilding.
 - For "contentless" targets like "test" or "clean", it would
  be extremely confusing if they refused to run just
  because they ran successfully last time.
 In make, things get complicated because it doesn't
 differentiate between these two modes.  Makefile rules
 with no dependencies run every time, *unless* the target
 exists, in which case they run never, *unless* the target
 is marked ".PHONY", in which case they run every time.  But
 targets that *do* have dependencies follow totally
 different rules.  And all this is needed because there's no
 way to tell make, "Listen, I just really want you to run
 the rules for this target *right now*."
 With redo, the semantics are really simple to explain.  If
 your brain has already been fried by make, you might be
 surprised by it at first, but once you get used to it, it's
 really much nicer this way.
 # Can my .do files be written in a language other than sh?
@@ -723,6 +949,33 @@ always be correct no matter where in the hierarchy your
 source files are.
 # Can I put my .o files in a different directory from my .c files?
 Yes.  There's nothing in redo that assumes anything about
 the location of your source files.  You can do all sorts of
 interesting tricks, limited only by your imagination.  For
 example, imagine that you have a toplevel default.o.do that looks
 like this:
 	ARCH=${1#out/}
 	ARCH=${ARCH%%/*}
 	SRC=${1#out/$ARCH/}
 	redo-ifchange $SRC.c
 	$ARCH-gcc -o $3 -c $SRC.c
 If you run `redo out/i586-mingw32msvc/path/to/foo.o`, then
 the above script would end up running
 	i586-mingw32msvc-gcc -o $3 -c path/to/foo.c
 You could also choose to read the compiler name or options from
 out/$ARCH/config.sh, or config.$ARCH.sh, or use any other
 arrangement you want.
 You could use the same technique to have separate build
 directories for out/debug, out/optimized, out/profiled, and so on.
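If the parameter expansions in that default.o.do look cryptic, here's the same logic pulled out so you can poke at it without redo or a cross compiler.  The `split_target` wrapper is a made-up name for illustration; the three expansions are the ones from the script above:

```shell
# Hypothetical helper wrapping the expansions from the default.o.do above.
split_target() {
    ARCH=${1#out/}        # drop the leading "out/"
    ARCH=${ARCH%%/*}      # keep everything before the first remaining "/"
    SRC=${1#out/$ARCH/}   # drop "out/$ARCH/", leaving the source path
}

split_target out/i586-mingw32msvc/path/to/foo

# Show the command the .do script would run ($3 is left unexpanded here).
echo "$ARCH-gcc -o \$3 -c $SRC.c"
```

The same function given `out/debug/main` would set ARCH to `debug` and SRC to `main`, which is how the out/debug and out/optimized variations fall out of the same pattern.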
 # Can my filenames have spaces in them?
 Yes, unlike with make.  For historical reasons, the Makefile syntax doesn't