Refactor the huge README.md into the more structured mkdocs.
I also cleaned up the installation section and added links to various competing redo implementations. The new README.md is basically just link to the docs on readthedocs.org, and a link to the mailing list. These docs need a *lot* more work, but this is enough of an improvement that I'll commit it anyway for now.
This commit is contained in:
parent
bde3b5526f
commit
d0607d0091
11 changed files with 1587 additions and 1477 deletions
311
Documentation/FAQImpl.md
Normal file
311
Documentation/FAQImpl.md
Normal file
|
|
@ -0,0 +1,311 @@
|
|||
# Hey, does redo even *run* on Windows?
|
||||
|
||||
FIXME:
|
||||
Probably under cygwin. But it hasn't been tested, so no.
|
||||
|
||||
If I were going to port redo to Windows in a "native" way,
|
||||
I might grab the source code to a posix shell (like the
|
||||
one in MSYS) and link it directly into redo.
|
||||
|
||||
`make` also doesn't *really* run on Windows (unless you use
|
||||
MSYS or Cygwin or something like that). There are versions
|
||||
of make that do - like Microsoft's version - but their
|
||||
syntax is horrifically different from one vendor to
|
||||
another, so you might as well just be writing for a
|
||||
vendor-specific tool.
|
||||
|
||||
At least redo is simple enough that, theoretically, one
|
||||
day, I can imagine it being cross platform.
|
||||
|
||||
One interesting project that has appeared recently is
|
||||
busybox-w32 (https://github.com/pclouds/busybox-w32). It's
|
||||
a port of busybox to win32 that includes a mostly POSIX
|
||||
shell (ash) and a bunch of standard Unix utilities. This
|
||||
might be enough to get your redo scripts working on a win32
|
||||
platform without having to install a bunch of stuff. But
|
||||
all of this needs more experimentation.
|
||||
|
||||
|
||||
# Can a *.do file itself be generated as part of the build process?
|
||||
|
||||
Not currently. There's nothing fundamentally preventing us from allowing
|
||||
it. However, it seems easier to reason about your build process if you
|
||||
*aren't* auto-generating your build scripts on the fly.
|
||||
|
||||
This might change someday.
|
||||
|
||||
|
||||
# How does redo store dependencies?
|
||||
|
||||
At the toplevel of your project, redo creates a directory
|
||||
named `.redo`. That directory contains a sqlite3 database
|
||||
with dependency information.
|
||||
|
||||
The format of the `.redo` directory is undocumented because
|
||||
it may change at any time. Maybe it will turn out that we
|
||||
can do something simpler than sqlite3. If you really need to make a
|
||||
tool that pokes around in there, please ask on the mailing
|
||||
list if we can standardize something for you.
|
||||
|
||||
|
||||
# Isn't using sqlite3 overkill? And un-djb-ish?
|
||||
|
||||
Well, yes. Sort of. I think people underestimate how
|
||||
"lite" sqlite really is:
|
||||
|
||||
root root 573376 2010-10-20 09:55 /usr/lib/libsqlite3.so.0.8.6
|
||||
|
||||
573k for a *complete* and *very fast* and *transactional*
|
||||
SQL database. For comparison, libdb is:
|
||||
|
||||
root root 1256548 2008-09-13 03:23 /usr/lib/libdb-4.6.so
|
||||
|
||||
...more than twice as big, and it doesn't even have an SQL parser in
|
||||
it! Or if you want to be really horrified:
|
||||
|
||||
root root 1995612 2009-02-03 13:54 /usr/lib/libmysqlclient.so.15.0.0
|
||||
|
||||
The mysql *client* library is two megs, and it doesn't even
|
||||
have a database server in it! People who think SQL
|
||||
databases are automatically bloated and gross have not yet
|
||||
actually experienced the joys of sqlite. SQL has a
|
||||
well-deserved bad reputation, but sqlite is another story
|
||||
entirely. It's excellent, and much simpler and better
|
||||
written than you'd expect.
|
||||
|
||||
But still, I'm pretty sure it's not very "djbish" to use a
|
||||
general-purpose database, especially one that has a *SQL
|
||||
parser* in it. (One of the great things about redo's
|
||||
design is that it doesn't ever need to parse anything, so
|
||||
a SQL parser is a bit embarrassing.)
|
||||
|
||||
I'm pretty sure djb never would have done it that way.
|
||||
However, I don't think we can reach the performance we want
|
||||
with dependency/build/lock information stored in plain text
|
||||
files; among other things, that results in too much
|
||||
fstat/open activity, which is slow in general, and even
|
||||
slower if you want to run on Windows. That leads us to a
|
||||
binary database, and if the binary database isn't sqlite or
|
||||
libdb or something, that means we have to implement our own
|
||||
data structures. Which is probably what djb would do, of
|
||||
course, but I'm just not convinced that I can do a better
|
||||
(or even a smaller) job of it than the sqlite guys did.
|
||||
|
||||
Most of the state database stuff has been isolated in
|
||||
state.py. If you're feeling brave, you can try to
|
||||
implement your own better state database, with or without
|
||||
sqlite.
|
||||
|
||||
It is almost certainly possible to do it much more nicely
|
||||
than I have, so if you do, please send it in!
|
||||
|
||||
|
||||
# What hash algorithm does redo-stamp use?
|
||||
|
||||
It's intentionally undocumented because you shouldn't need
|
||||
to care and it might change at any time. But trust me,
|
||||
it's not the slow part of your build, and you'll never
|
||||
accidentally get a hash collision.
|
||||
|
||||
|
||||
# Why not *always* use checksum-based dependencies instead of timestamps?
|
||||
|
||||
Some build systems keep a checksum of target files and rebuild dependents
|
||||
only when the target changes. This is appealing in some cases; for example,
|
||||
with ./configure generating config.h, it could just go ahead and generate
|
||||
config.h; the build system would be smart enough to rebuild or not rebuild
|
||||
dependencies automatically. This keeps build scripts simple and gets rid of
|
||||
the need for people to re-implement file comparison over and over in every
|
||||
project or for multiple files in the same project.
|
||||
|
||||
There are disadvantages to using checksums for everything
|
||||
automatically, however:
|
||||
|
||||
- Building stuff unnecessarily is *much* less dangerous
|
||||
than not building stuff that should be built. Checksums
|
||||
aren't perfect (think of zero-byte output files); using
|
||||
checksums will cause more builds to be skipped by
|
||||
default, which is very dangerous.
|
||||
|
||||
- It makes it hard to *force* things to rebuild when you
|
||||
know you absolutely want that. (With timestamps, you can
|
||||
just `touch filename` to rebuild everything that depends
|
||||
on `filename`.)
|
||||
|
||||
- Targets that are just used for aggregation (ie. they
|
||||
don't produce any output of their own) would always have
|
||||
the same checksum - the checksum of a zero-byte file -
|
||||
which causes confusing results.
|
||||
|
||||
- Calculating checksums for every output file adds time to
|
||||
the build, even if you don't need that feature.
|
||||
|
||||
- Building stuff unnecessarily and then stamping it is
|
||||
much slower than just not building it in the first place,
|
||||
so for *almost* every use of redo-stamp, it's not the
|
||||
right solution anyway.
|
||||
|
||||
- To steal a line from the Zen of Python: explicit is
|
||||
better than implicit. Making people think about when
|
||||
they're using the stamp feature - knowing that it's slow
|
||||
and a little annoying to do - will help people design
|
||||
better build scripts that depend on this feature as
|
||||
little as possible.
|
||||
|
||||
- djb's (as yet unreleased) version of redo doesn't
|
||||
implement checksums, so doing that would produce an
|
||||
incompatible implementation. With redo-stamp and
|
||||
redo-always being separate programs, you can simply
|
||||
choose not to use them if you want to keep maximum
|
||||
compatibility for the future.
|
||||
|
||||
- Bonus: the redo-stamp algorithm is interchangeable. You
|
||||
don't have to stamp the target file or the source files
|
||||
or anything in particular; you can stamp any data you
|
||||
want, including the output of `ls` or the content of a
|
||||
web page. We could never have made things like that
|
||||
implicit anyway, so some form of explicit redo-stamp
|
||||
would always have been needed, and then we'd have to
|
||||
explain when to use the explicit one and when to use the
|
||||
implicit one.
|
||||
|
||||
Thus, we made the decision to only use checksums for
|
||||
targets that explicitly call `redo-stamp` (see previous
|
||||
question).
|
||||
|
||||
I suggest actually trying it out to see how it feels for
|
||||
you. For myself, before there was redo-stamp and
|
||||
redo-always, a few types of problems (in particular,
|
||||
depending on a list of which files exist and which don't)
|
||||
were really annoying, and I definitely felt it. Adding
|
||||
redo-stamp and redo-always work the way they do made the
|
||||
pain disappear, so I stopped changing things.
|
||||
|
||||
|
||||
# Why doesn't redo by default print the commands as they are run?
|
||||
|
||||
make prints the commands it runs as it runs them. redo doesn't, although
|
||||
you can get this behaviour with `redo -v` or `redo -x`.
|
||||
(The difference between -v and -x is the same as it is in
|
||||
sh... because we simply forward those options onward to sh
|
||||
as it runs your .do script.)
|
||||
|
||||
The main reason we don't do this by default is that the commands get
|
||||
pretty long
|
||||
winded (a compiler command line might be multiple lines of repeated
|
||||
gibberish) and, on large projects, it's hard to actually see the progress of
|
||||
the overall build. Thus, make users often work hard to have make hide the
|
||||
command output in order to make the log "more readable."
|
||||
|
||||
The reduced output is a pain with make, however, because if there's ever a
|
||||
problem, you're left wondering exactly what commands were run at what time,
|
||||
and you often have to go editing the Makefile in order to figure it out.
|
||||
|
||||
With redo, it's much less of a problem. By default, redo produces output
|
||||
that looks like this:
|
||||
|
||||
$ redo t
|
||||
redo t/all
|
||||
redo t/hello
|
||||
redo t/LD
|
||||
redo t/hello.o
|
||||
redo t/CC
|
||||
redo t/yellow
|
||||
redo t/yellow.o
|
||||
redo t/bellow
|
||||
redo t/c
|
||||
redo t/c.c
|
||||
redo t/c.c.c
|
||||
redo t/c.c.c.b
|
||||
redo t/c.c.c.b.b
|
||||
redo t/d
|
||||
|
||||
The indentation indicates the level of recursion (deeper levels are
|
||||
dependencies of earlier levels). The repeated word "redo" down the left
|
||||
column looks strange, but it's there for a reason, and the reason is this:
|
||||
you can cut-and-paste a line from the build script and rerun it directly.
|
||||
|
||||
$ redo t/c
|
||||
redo t/c
|
||||
redo t/c.c
|
||||
redo t/c.c.c
|
||||
redo t/c.c.c.b
|
||||
redo t/c.c.c.b.b
|
||||
|
||||
So if you ever want to debug what happened at a particular step, you can
|
||||
choose to run only that step in verbose mode:
|
||||
|
||||
$ redo t/c.c.c.b.b -x
|
||||
redo t/c.c.c.b.b
|
||||
* sh -ex default.b.do c.c.c.b .b c.c.c.b.b.redo2.tmp
|
||||
+ redo-ifchange c.c.c.b.b.a
|
||||
+ echo a-to-b
|
||||
+ cat c.c.c.b.b.a
|
||||
+ ./sleep 1.1
|
||||
redo t/c.c.c.b.b (done)
|
||||
|
||||
|
||||
If you're using an autobuilder or something that logs build results for
|
||||
future examination, you should probably set it to always run redo with
|
||||
the -x option.
|
||||
|
||||
|
||||
# How fast is redo compared to make?
|
||||
|
||||
FIXME:
|
||||
The current version of redo is written in python and has not been optimized.
|
||||
So right now, it's usually a bit slower. Not too embarrassingly slower,
|
||||
though, and the slowness mostly only strikes when you're
|
||||
building a project from scratch.
|
||||
|
||||
For incrementally building only the changed parts of the project, redo can
|
||||
be much faster than make, because it can check all the dependencies up
|
||||
front and doesn't need to repeatedly parse and re-parse the Makefile (as
|
||||
recursive make needs to do).
|
||||
|
||||
redo's sqlite3-based dependency database is very fast (and
|
||||
it would be even faster if we rewrite redo in C instead of
|
||||
python). Better still, it would be possible to write an
|
||||
inotify daemon that can update the dependency database in
|
||||
real time; if you're running the daemon, you can run 'redo'
|
||||
from the toplevel and if your build is clean, it could return
|
||||
instantly, no matter how many dependencies you have.
|
||||
|
||||
On my machine, redo can currently check about 10,000
|
||||
dependencies per second. As an example, a program that
|
||||
depends on every single .c or .h file in the Linux kernel
|
||||
2.6.36 repo (about 36000 files) can be checked in about 4
|
||||
seconds.
|
||||
|
||||
Rewritten in C, dependency checking would probably go about
|
||||
10 times faster still.
|
||||
|
||||
This probably isn't too hard; the design of redo is so simple that
|
||||
it should be easy to write in any language. It's just
|
||||
*even easier* in python, which was good for writing the
|
||||
prototype and debugging the parallelism and locking rules.
|
||||
|
||||
Most of the slowness at the moment is because redo-ifchange
|
||||
(and also sh itself) need to be fork'd and exec'd over and
|
||||
over during the build process.
|
||||
|
||||
As a point of reference, on my computer, I can fork-exec
|
||||
redo-ifchange.py about 87 times per second; an empty python
|
||||
program, about 100 times per second; an empty C program,
|
||||
about 1000 times per second; an empty make, about 300 times
|
||||
per second. So if I could compile 87 files per second with
|
||||
gcc, which I can't because gcc is slower than that, then
|
||||
python overhead would be 50%. Since gcc is slower than
|
||||
that, python overhead is generally much less - more like
|
||||
10%.
|
||||
|
||||
Also, if you're using redo -j on a multicore machine, all
|
||||
the python forking happens in parallel with everything
|
||||
else, so that's 87 per second per core. Nevertheless,
|
||||
that's still slower than make and should be fixed.
|
||||
|
||||
(On the other hand, all this measurement is confounded
|
||||
because redo's more fine-grained dependencies mean you can
|
||||
have more parallelism. So if you have a lot of CPU cores, redo
|
||||
might build *faster* than make just because it makes better
|
||||
use of them.)
|
||||
Loading…
Add table
Add a link
Reference in a new issue