Directory reorg: move code into redo/, generate binaries in bin/.
It's time to start preparing for a version of redo that doesn't work unless we build it first (because it will rely on C modules, and eventually be rewritten in C altogether). To get rolling, remove the old-style symlinks to the main programs, and rename those programs from redo-*.py to redo/cmd_*.py. We'll also move all library functions into the redo/ dir, which is a more python-style naming convention. Previously, install.do was generating wrappers for installing in /usr/bin, which extend sys.path and then import+run the right file. This made "installed" redo work quite differently from running redo inside its source tree. Instead, let's always generate the wrappers in bin/, and not make anything executable except those wrappers. Since we're generating wrappers anyway, let's actually auto-detect the right version of python for the running system; distros can't seem to agree on what to call their python2 binaries (sigh). We'll fill in the right #! shebang lines. Since we're doing that, we can stop using /usr/bin/env, which will a) make things slightly faster, and b) let us use "python -S", which tells python not to load a bunch of extra crap we're not using, thus improving startup times. Annoyingly, we now have to build redo using minimal/do, then run the tests using bin/redo. To make this less annoying, we add a toplevel ./do script that knows the right steps, and a Makefile (whee!) for people who are used to typing 'make' and 'make test' and 'make clean'.
This commit is contained in:
parent
5bc7c861b6
commit
f6fe00db5c
140 changed files with 256 additions and 99 deletions
|
|
@ -1,252 +0,0 @@
|
|||
### A LaTeX typesetting example
|
||||
|
||||
[LaTeX](https://www.latex-project.org/) is a typesetting system that's
|
||||
especially popular in academia. Among other things, it lets you produce
|
||||
postscript and pdf files from a set of (mostly text) input files.
|
||||
|
||||
LaTeX documents often include images and charts. In our example, we'll show
|
||||
how to auto-generate a chart for inclusion using an [R script with
|
||||
ggplot2](https://ggplot2.tidyverse.org/).
|
||||
|
||||
To play with this code on your own machine, get the [redo
|
||||
source code](https://github.com/apenwarr/redo) and look in the
|
||||
`Documentation/cookbook/latex/` directory.
|
||||
|
||||
|
||||
### Generating a plot from an R script
|
||||
|
||||
First, let's tell redo how to generate our chart. We'll use
|
||||
the R language, and ask it to plot some of its sample data (the mpg, "miles
|
||||
per gallon" data set) and save it to an eps (encapsulated postscript) file.
|
||||
eps files are usually a good format for LaTeX embedded images, because they
|
||||
scale to any printer or display resolution.
|
||||
|
||||
First, let's make an R script that generates a plot:
|
||||
<pre><code lang='r' src='mpg.R'></code></pre>
|
||||
|
||||
And then a .do file to tie that into redo:
|
||||
<pre><code lang='sh' src='mpg.eps.do'></code></pre>
|
||||
|
||||
We can build and view the image:
|
||||
```shell
|
||||
$ redo mpg.eps
|
||||
redo mpg.eps
|
||||
|
||||
# View the file on Linux
|
||||
$ evince mpg.eps
|
||||
|
||||
# View the file on MacOS
|
||||
$ open mpg.eps
|
||||
```
|
||||
|
||||
|
||||
### Running the LaTeX processor
|
||||
|
||||
Here's the first draft of our very important scientific paper:
|
||||
<pre><code lang='tex' src='paper.latex'></code></pre>
|
||||
|
||||
Notice how it refers to the chart from above, `mpg.eps`, and a text file,
|
||||
`discovery.txt`. Let's create the latter as a static file.
|
||||
<pre><code lang='tex' src='discovery.txt'></code></pre>
|
||||
|
||||
With all the parts of our document in places, we can now compile it directly
|
||||
using `pdflatex`:
|
||||
```shell
|
||||
$ pdflatex paper.latex
|
||||
This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016/Debian) (preloaded format=pdflatex)
|
||||
restricted \write18 enabled.
|
||||
entering extended mode
|
||||
...[a lot of unnecessary diagnostic messages]...
|
||||
Output written on paper.pdf (2 pages, 68257 bytes).
|
||||
Transcript written on paper.log.
|
||||
```
|
||||
|
||||
But this has a few problems. First of all, it doesn't understand
|
||||
dependencies; if `mpg.R` changes, it won't know to rebuild `mpg.eps`.
|
||||
Secondly, the TeX/LaTeX toolchain has an idiosyncracy that means you might
|
||||
have to rebuild your document more than once. In our example, we generate a
|
||||
table of contents, but it ends up getting generated *before* processing the
|
||||
rest of the content in the document, so it's initially blank. As it
|
||||
continues, LaTeX produces a file called `paper.aux` with a list of the
|
||||
references needed by the table of contents, and their page numbers. If we
|
||||
run LaTeX over again, it'll use that to build a proper of table of contents.
|
||||
|
||||
Of course, life is not necessarily so easy. Once the table of contents
|
||||
isn't blank, it might start to push content onto the next page. This will
|
||||
change all the page numbers! So we'd have to do it one more time. And that
|
||||
might lead to even more subtle problems, like a reference to page 99
|
||||
changing to page 100, which pushes a word onto the next page, which changes
|
||||
some other page number, and so on. Thus, we need a script that will keep
|
||||
looping, re-running LaTeX until `paper.aux` stabilizes.
|
||||
|
||||
The whole script we'll use is below. Instead of running `pdflatex`
|
||||
directly, we'll use the regular `latex` command, which produces a .dvi
|
||||
(DeVice Independent) intermediate file which we can later turn into a pdf or
|
||||
ps file.
|
||||
|
||||
LaTeX produces a bunch of clutter files (like `paper.aux`) that can be used
|
||||
in future runs, but which also make its execution nondeterministic. To
|
||||
avoid that problem, we tell it to use a temporary `--output-directory` that
|
||||
we delete and recreate before each build (although we might need to run
|
||||
`latex` multiple times in one build, to get `paper.aux` to converge).
|
||||
<pre><code lang='sh' src='default.runtex.do'></code></pre>
|
||||
|
||||
|
||||
### Virtual targets, side effects, and multiple outputs
|
||||
|
||||
Why did we call our script `default.runtex.do`? Why not `default.pdf.do` or
|
||||
`default.dvi.do`, depending what kind of file we ask LaTeX to produce?
|
||||
|
||||
The problem is that the `latex` command actually produces several
|
||||
files in that temporary directory, and we might want to keep them around.
|
||||
If we name our .do file after only *one* of those outputs, things get messy.
|
||||
|
||||
The biggest problem is that redo requires a .do file to write its output to
|
||||
$3 (or stdout), so that it can guarantee the output gets replaced
|
||||
atomically. When there is more than one output, at most one file can
|
||||
be sent to $3; how do you choose which one? Even worse, some programs don't
|
||||
even have the ability to choose the output filename; for an input of
|
||||
`paper.latex`, the `latex` command just writes a bunch of files named
|
||||
`paper.*` directly. You can't ask it to put just one of them in $3.
|
||||
|
||||
The easiest way to handle this situation in redo is to use a "virtual
|
||||
target", which is a target name that doesn't actually get created has a file,
|
||||
and has only side effects. You've seen these before: when we use `all.do`
|
||||
or `clean.do`, we don't expect to produce a file named `all` or `clean`. We
|
||||
expect redo to run a collection of other commands. In `make`, these are
|
||||
sometimes called ".PHONY rules" because of the way they are declared in a
|
||||
`Makefile`. But the rules aren't phony, they really are executed; they just
|
||||
don't produce output. So in redo we call them "virtual."
|
||||
|
||||
When we `redo paper.runtex`, it builds our virtual target. There is no
|
||||
`paper.runtex` file or directory generated. But as a side effect, a
|
||||
directory named `paper.tmp` is created.
|
||||
|
||||
(Side note: it's tempting to name the directory the same as the target. So
|
||||
we could have a `paper.runtex` directory instead of `paper.tmp`. This is
|
||||
not inherently a bad idea, but currently redo behaviour is undefined if you
|
||||
redo-ifchange a directory. Directories are weird. If one file in that
|
||||
directory disappears, does that mean you "modified" the output by hand?
|
||||
What if two redo targets modify the same directory? Should we require
|
||||
scripts to only atomically replace an entire output directory via $3? And
|
||||
so on. We might carefully define this behaviour eventually, but for now,
|
||||
it's better to use a separate directory name and avoid the undefined
|
||||
behaviour.)
|
||||
|
||||
|
||||
### Depending on side effects produced by virtual targets
|
||||
|
||||
Next, we want to produce .pdf and .ps files from the collection of files
|
||||
produced by the `latex` command, particularly `paper.tmp/paper.dvi`. To do
|
||||
that, we have to bring our files back from the "virtual target" world into
|
||||
the real world.
|
||||
|
||||
Depending on virtual targets is easy; we'll just
|
||||
`redo-ifchange paper.runtex`. Then we want to materialize `paper.dvi` from
|
||||
the temporary files in `paper.tmp/paper.dvi`, which we can do with an
|
||||
efficient [hardlink](https://en.wikipedia.org/wiki/Hard_link) (rather than
|
||||
making an unnecessary copy), like this:
|
||||
<pre><code lang='sh' src='default.dvi.do'></code></pre>
|
||||
|
||||
Notice that we *don't* do `redo-ifchange paper.tmp/paper.dvi`. That's
|
||||
because redo has no knowledge of that file. If you ask redo to build that
|
||||
file for you, it doesn't know how to do it. You have to ask for
|
||||
`paper.runtex`, which you know - but redo doesn't know - will produce the
|
||||
input file you want. Then you can safely use it.
|
||||
|
||||
Once we have a .do file that produces the "real" (non-virtual,
|
||||
non-side-effect) `paper.dvi` file, however, it's safe to depend directly on
|
||||
it. Let's use that to produce our .ps and .pdf outputs:
|
||||
<pre><code lang='sh' src='default.ps.do'></code></pre>
|
||||
<pre><code lang='sh' src='default.pdf.do'></code></pre>
|
||||
|
||||
(As above, we include `exec >&2` lines because LaTeX tools incorrectly write
|
||||
their log messages to stdout. We need to redirect it all to stderr. That
|
||||
way [redo-log](../../redo-log) can handle all the log output appropriately.)
|
||||
|
||||
|
||||
### Explicit dependencies
|
||||
|
||||
We've made a generalized script, `default.runtex.do`, that can compile any
|
||||
.latex file and produce a .tmp directory with its output. But that's not
|
||||
quite enough: different .latex files might have extra dependencies that need
|
||||
to exist *before* the compilation can continue. In our case, we need the
|
||||
auto-generated `mpg.eps` that we discussed above.
|
||||
|
||||
To make that work, `default.runtex.do` looks for a .deps file with the same
|
||||
name as the .latex file being processed. It contains just a list of extra
|
||||
dependencies that need to be built. Here's ours:
|
||||
<pre><code src='paper.deps'></code></pre>
|
||||
|
||||
You can use this same ".deps" technique in various different places in redo.
|
||||
For example, you could have a default.do that can link a C program from any
|
||||
set of .o files. To specify the right set of .o files for target `X`,
|
||||
default.do might look in an `X.deps` or `X.list` file. If you later want to
|
||||
get even fancier, you could make an `X.deps.do` that programmatically
|
||||
generates the list of dependencies; for example, it might include one set of
|
||||
files on win32 platforms and a different set on unix platforms.
|
||||
|
||||
|
||||
### Autodependencies
|
||||
|
||||
Our `paper.latex` file actually includes two files: `mpg.eps`, which we
|
||||
explicitly depended upon above, and `discovery.txt`, which we didn't. The
|
||||
latter is a static source file, so we can let redo discover it
|
||||
automatically, based on the set of files that LaTeX opens while it runs.
|
||||
The `latex` command has a `--record` option to do this; it produces a file
|
||||
called `paper.tmp/paper.fls` (.fls is short for "File LiSt").
|
||||
|
||||
One of redo's best features is that you can declare dependencies *after*
|
||||
you've done your build steps, when you have the best knowledge of which
|
||||
files were actually needed. That's why in `default.runtex.do`, we parse the
|
||||
.fls file and then redo-ifchange on its contents right at the end.
|
||||
|
||||
(This brings up a rather subtle point about how redo works. When you run
|
||||
redo-ifchange, redo adds to the list of files which, if they change, mean
|
||||
your target needs to be rebuilt. But unlike make, redo will not actually
|
||||
rebuild those files merely because they're listed as a dependency; it just
|
||||
knows to rebuild your target, which means to run your .do file, which will
|
||||
run redo-ifchange *again* if it still needs those input files to be fresh.
|
||||
|
||||
This avoids an annoying problem in `make` where you can teach it about
|
||||
which .h files your C program depended on last time, but if you change
|
||||
A.c to no longer include X.h, and then delete X.h, make might complain
|
||||
that X.h is missing, because A.c depended on it *last time*. redo will
|
||||
simply notice that since X.h is missing, A.c needs to be recompiled, and let
|
||||
your compilation .do script report an error, or not.)
|
||||
|
||||
Anyway, this feature catches not just our `discovery.txt` dependency, but
|
||||
also the implicit dependencies on various LaTeX template and font files, and
|
||||
so on. If any of those change, our LaTeX file needs to be rebuilt.
|
||||
```shell
|
||||
$ redo --no-detail paper.pdf
|
||||
redo paper.pdf
|
||||
redo paper.dvi
|
||||
redo paper.runtex
|
||||
redo mpg.eps
|
||||
|
||||
$ redo --no-detail paper.pdf
|
||||
redo paper.pdf
|
||||
|
||||
$ touch discovery.txt
|
||||
|
||||
$ redo --no-detail paper.pdf
|
||||
redo paper.pdf
|
||||
redo paper.dvi
|
||||
redo paper.runtex
|
||||
|
||||
$ redo --no-detail paper.pdf
|
||||
redo paper.pdf
|
||||
```
|
||||
|
||||
|
||||
### Housekeeping
|
||||
|
||||
As usual, to polish up our project, let's create an `all.do` and
|
||||
`clean.do`.
|
||||
|
||||
Because this project is included in the redo source and we don't want redo
|
||||
to fail to build just because you don't have LaTeX or R installed, we'll
|
||||
have `all.do` quit politely if the necessary tools are missing.
|
||||
<pre><code lang='sh' src='all.do'></code></pre>
|
||||
<pre><code lang='sh' src='clean.do'></code></pre>
|
||||
Loading…
Add table
Add a link
Reference in a new issue