feat: transparently append to compressed archives

When appending to a compressed archive (gzip, brotli, zstd), the tool
now handles compression automatically. Because some compression formats
don't support appending in place, we write a new compressed file with
all the data and atomically rename it over the original. This needs
enough free space on the same filesystem for the new file.

This means you can work with compressed archives the same way as
uncompressed ones. Point the tool at your .json.gz file and append
values. No manual decompression/recompression needed.
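
For illustration only, the rewrite-and-rename step might look roughly like
the sketch below. The function names, the `.tmp` suffix, and the `decode` /
`encode` closures are hypothetical stand-ins for a real codec; this is not
the code in this commit.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replace `path` with `new_contents` by writing a sibling temp file and
/// renaming it. The temp file sits in the same directory so the rename
/// stays on one filesystem.
fn replace_atomically(path: &Path, new_contents: &[u8]) -> io::Result<()> {
    // Hypothetical naming scheme for the temp file; the real tool may differ.
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = path.with_file_name(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(new_contents)?;
    // Flush to disk before the rename so a crash cannot expose a partial file.
    tmp.sync_all()?;
    drop(tmp);

    // Rename over the original; readers see the old file or the new one.
    fs::rename(&tmp_path, path)
}

/// Append by re-encoding: decode the old bytes, add the new ones, then
/// encode everything and swap the file in.
fn append_by_rewrite<D, E>(path: &Path, extra: &[u8], decode: D, encode: E) -> io::Result<()>
where
    D: Fn(&[u8]) -> io::Result<Vec<u8>>,
    E: Fn(&[u8]) -> io::Result<Vec<u8>>,
{
    let compressed = fs::read(path)?;
    let mut plain = decode(compressed.as_slice())?;
    plain.extend_from_slice(extra);
    let recompressed = encode(plain.as_slice())?;
    replace_atomically(path, &recompressed)
}
```

The sketch holds the whole archive in memory, which is the simplest way to
show the idea; a streaming version would follow the same write-then-rename
shape.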
nobody 2025-11-30 17:09:44 -08:00
commit 2ab1c31993
Signed by: GrocerPublishAgent
GPG key ID: 43B1C298CDDE181C
34 changed files with 4747 additions and 1099 deletions


@ -0,0 +1,141 @@
<!-- Generated by: cargo run --bin pointer_errors_demo > docs/diagnostics/json-pointer.md -->
# JSON Pointer Diagnostics
These are the error messages you'll see when a [JSON Pointer (RFC 6901)](https://datatracker.ietf.org/doc/html/rfc6901)
operation fails.
## Why These Errors Are Limited
The JSON object that failed to index probably doesn't exist anywhere as a file. It's
built by replaying delta events from the archive. The filename and line numbers in
these errors point to the source of the JSON pointer paths—the add/change/remove
events in the archive—not to the object itself.
A proper solution would dump the reconstructed JSON object to a file so you could
inspect it with `jq` or a text editor. That engineering work didn't happen.
Instead, you get:
- The pointer path that failed, with the failing segment underlined
- The actual value at the parent path (truncated)
- Some strings you can grep for in the archive
This is better than nothing, but it's still awkward. You can see *what* failed but
not easily inspect the full object we tried to index into. If you're lucky, the
truncated value shown is enough. If you're developing on this project, at least
you know what the errors look like.
## Contributing
If an error message is confusing or unhelpful for your case, please open an issue
or submit a pull request.
## Key Not Found
Key doesn't exist in the object. Shows the available keys and, when the key looks like a typo, suggests the closest match.
```
error E051: Path not found
I was traversing the JSON path '/user/emial' and got stuck.
I couldn't find the key 'emial'.
/user/emial
      ^^^^^
Value at '/user':
│ "age": ...
│ "email": ...
│ "name": ...
Available keys: age, email, name
Did you mean 'email'?
```
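
A "Did you mean" hint like the one above can be produced by comparing the missing key against the available keys. The sketch below uses Levenshtein edit distance with a cutoff. Both the metric and the `max_distance` parameter are assumptions for illustration; the tool's actual heuristic is not documented here.

```rust
/// Classic Levenshtein edit distance between two strings, counted in chars.
fn edit_distance(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != *cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b_chars.len()]
}

/// Return the closest available key if it is within `max_distance` edits.
/// `max_distance` is a hypothetical tuning knob.
fn suggest_key<'a>(missing: &str, available: &[&'a str], max_distance: usize) -> Option<&'a str> {
    available
        .iter()
        .map(|key| (edit_distance(missing, key), *key))
        .filter(|(dist, _)| *dist <= max_distance)
        .min_by_key(|(dist, _)| *dist)
        .map(|(_, key)| key)
}
```

With the keys from the example, `suggest_key("emial", &["age", "email", "name"], 2)` returns `Some("email")`, because swapping two adjacent letters costs two edits under plain Levenshtein.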
## Type Mismatch
Tried to index into a value that doesn't support it (e.g., `/domain` on a string,
`/0` on a number). Shows the actual type.
```
error E060: Type mismatch
I was traversing the JSON path '/users/0/email/domain' and got stuck.
I can't index into string with 'domain'.
/users/0/email/domain
               ^^^^^^
Value at '/users/0/email':
│ "alice@example.com"
Object keys like '/domain' only work on objects, not string.
```
## Array Index Out of Bounds
Index past the end of the array. Shows the array length.
```
error E051: Path not found
I was traversing the JSON path '/items/5' and got stuck.
I couldn't find index 5 (array length is 3).
/items/5
       ^
Value at '/items':
│ 0: "apple"
│ 1: "banana"
│ 2: "cherry"
Valid indices are 0-2.
```
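
The failure kinds in this document can all be detected in a single walk over the pointer segments. The sketch below is a minimal resolver over a hypothetical `Value` type (the real project presumably uses its own JSON type); the `PointerError` variants and their fields are illustrative names, not the tool's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified JSON value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Num(f64),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// `segment` is the 0-based index of the pointer segment that failed.
#[derive(Debug, PartialEq)]
enum PointerError {
    KeyNotFound { segment: usize, key: String, available: Vec<String> },
    IndexOutOfBounds { segment: usize, index: usize, len: usize },
    InvalidIndex { segment: usize, token: String },
    TypeMismatch { segment: usize, token: String, actual: &'static str },
}

fn type_name(value: &Value) -> &'static str {
    match value {
        Value::Str(_) => "string",
        Value::Num(_) => "number",
        Value::Array(_) => "array",
        Value::Object(_) => "object",
    }
}

fn resolve<'a>(root: &'a Value, pointer: &str) -> Result<&'a Value, PointerError> {
    let mut current = root;
    // An empty pointer means the whole document; otherwise skip the text before the first '/'.
    for (segment, raw) in pointer.split('/').skip(1).enumerate() {
        // RFC 6901 escapes: "~1" is '/', "~0" is '~' (decode "~1" first).
        let token = raw.replace("~1", "/").replace("~0", "~");
        current = match current {
            Value::Object(map) => match map.get(&token) {
                Some(child) => child,
                None => {
                    return Err(PointerError::KeyNotFound {
                        segment,
                        key: token,
                        available: map.keys().cloned().collect(),
                    })
                }
            },
            Value::Array(items) => {
                // Lenient parse for brevity: accepts leading zeros, which RFC 6901 forbids.
                let index: usize = match token.parse() {
                    Ok(i) => i,
                    Err(_) => return Err(PointerError::InvalidIndex { segment, token }),
                };
                match items.get(index) {
                    Some(child) => child,
                    None => {
                        return Err(PointerError::IndexOutOfBounds { segment, index, len: items.len() })
                    }
                }
            }
            // Strings and numbers have no children to index into.
            scalar => {
                return Err(PointerError::TypeMismatch { segment, token, actual: type_name(scalar) })
            }
        };
    }
    Ok(current)
}
```

Each error carries the failing segment index and enough context (available keys, array length, actual type) to render messages like the ones shown here.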
## Invalid Array Index
If you use an object key where the value is actually an array, you'll see this error. Array segments must be non-negative integers.
```
error E052: Invalid array index
I was traversing the JSON path '/items/foo' and got stuck.
I couldn't parse 'foo' as an array index.
/items/foo
       ^^^
Value at '/items':
│ 0: "apple"
│ 1: "banana"
│ 2: "cherry"
Array indices must be non-negative integers. Got 'foo'.
```
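
RFC 6901 defines an array index as either `0` or a run of digits with no leading zero, plus the special token `-` for the position just past the last element. A strict check might look like the sketch below; `ArrayToken` and `parse_array_token` are hypothetical names, and whether this tool accepts `-` is not stated in this document.

```rust
#[derive(Debug, PartialEq)]
enum ArrayToken {
    /// A concrete 0-based index.
    Index(usize),
    /// The "-" token: the position just past the last element.
    PastEnd,
}

/// Parse one pointer segment as an array index using the RFC 6901 grammar.
/// Returns `None` for anything that is not a valid index token.
fn parse_array_token(token: &str) -> Option<ArrayToken> {
    if token == "-" {
        return Some(ArrayToken::PastEnd);
    }
    let all_digits = !token.is_empty() && token.bytes().all(|b| b.is_ascii_digit());
    let leading_zero = token.len() > 1 && token.starts_with('0');
    if !all_digits || leading_zero {
        return None;
    }
    // Digits only, so the parse can fail only on overflow.
    token.parse().ok().map(ArrayToken::Index)
}
```

Checking the characters first matters because `str::parse::<usize>` alone would accept `+1` and `01`, neither of which is a valid index under the RFC grammar.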
## Deep Path Failures
For long paths, the underline shows which segment failed. The full path remains
visible so you can see what you were trying to reach.
```
error E051: Path not found
I was traversing the JSON path '/data/users/0/profile/settings/theme' and got stuck.
I couldn't find the key 'settings'.
/data/users/0/profile/settings/theme
                      ^^^^^^^^
Value at '/data/users/0/profile':
│ "name": ...
Available keys: name
```
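
Placing the underline is a matter of counting characters up to the failing segment. A minimal sketch, assuming the failing segment is identified by its 0-based position in the pointer (`underline_segment` is a hypothetical helper, not the tool's function):

```rust
/// Build the two-line display: the pointer, then carets under segment
/// `failing` (0-based). Returns `None` if the pointer has no such segment.
fn underline_segment(pointer: &str, failing: usize) -> Option<String> {
    let mut offset = 0; // byte offset into `pointer`
    for (i, raw) in pointer.split('/').skip(1).enumerate() {
        offset += 1; // the '/' in front of this segment
        if i == failing {
            // Count chars rather than bytes so non-ASCII keys still line up.
            let pad = pointer[..offset].chars().count();
            // Underline an empty segment with one caret so it stays visible.
            let width = raw.chars().count().max(1);
            return Some(format!("{}\n{}{}", pointer, " ".repeat(pad), "^".repeat(width)));
        }
        offset += raw.len();
    }
    None
}
```

For `/data/users/0/profile/settings/theme` with `failing = 4`, the second line is 22 spaces followed by 8 carets, which lands under `settings`.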

docs/fuzz-testing.md Normal file

@ -0,0 +1,49 @@
# Fuzz Testing
Fuzz testing throws random inputs at your code until something breaks.
## Commands
List available fuzz targets:
```
cargo fuzz list
```
Run a fuzz target:
```
cargo fuzz run fuzz_apply_move
```
Runs until you kill it or it finds a crash.
## Reading the Output
```
#787958 REDUCE cov: 1281 ft: 6423 corp: 1112/621Kb lim: 4096 exec/s: 13823 rss: 584Mb L: 19/3954 MS: 1 EraseBytes-
#788755 REDUCE cov: 1281 ft: 6424 corp: 1113/621Kb lim: 4096 exec/s: 13837 rss: 584Mb L: 767/3954 MS: 2 CMP-CrossOver- DE: "6\000\000\000"-
#789383 REDUCE cov: 1281 ft: 6424 corp: 1113/621Kb lim: 4096 exec/s: 13848 rss: 584Mb L: 59/3954 MS: 3 InsertByte-ShuffleBytes-EraseBytes-
```
The fields:
- `#787958` — test case number. How many inputs have been tried.
- `REDUCE` — what happened. `NEW` means new code was reached. `REDUCE` means an input was shrunk while keeping the same coverage. `pulse` is just a heartbeat.
- `cov: 1281` — coverage. Number of code edges hit. This is what you care about.
- `ft: 6423` — features. Finer-grained coverage metric. Ignore it.
- `corp: 1112/621Kb` — corpus. 1112 interesting inputs saved, 621KB total.
- `exec/s: 13823` — speed. Test cases per second.
- `rss: 584Mb` — memory use.
- `L: 19/3954` — input length. This one was 19 bytes. Largest in corpus is 3954.
- `MS: 1 EraseBytes-` — mutation. How the input was generated. Doesn't matter.
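
If you want to track these numbers over a long run, the status lines are easy to parse. The sketch below pulls out the fields that matter most; it is based only on the sample lines shown above, and `Status` and `parse_status_line` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
struct Status {
    test_case: u64,
    event: String,
    cov: u64,
    execs_per_sec: u64,
}

/// Parse one status line such as
/// `#787958 REDUCE cov: 1281 ft: 6423 ... exec/s: 13823 ...`.
/// Returns `None` for lines that do not have this shape.
fn parse_status_line(line: &str) -> Option<Status> {
    let mut words = line.split_whitespace();
    let test_case = words.next()?.strip_prefix('#')?.parse().ok()?;
    let event = words.next()?.to_string();

    let mut cov = None;
    let mut execs_per_sec = None;
    // The rest of the line is a series of "label: value" pairs.
    while let Some(word) = words.next() {
        match word {
            "cov:" => cov = words.next()?.parse().ok(),
            "exec/s:" => execs_per_sec = words.next()?.parse().ok(),
            _ => {}
        }
    }
    Some(Status { test_case, event, cov: cov?, execs_per_sec: execs_per_sec? })
}
```

Lines that do not start with `#`, such as startup messages, come back as `None` and can be skipped.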
## Is It Working?
Watch `cov`. If it goes up, the fuzzer is finding new code paths. If it stops going up, either you have good coverage or the fuzzer is stuck.
`exec/s` in the thousands is fine. If it drops to double digits, something is wrong.
Seeing `NEW` events means progress. Long stretches without `NEW` mean diminishing returns.
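
One way to put a number on "long stretches" is to count how many test cases have run since `cov` last went up. A minimal sketch, assuming you have already collected `(test case number, cov)` pairs in the order they were printed (`execs_since_last_gain` is a hypothetical helper):

```rust
/// Number of test cases executed since `cov` last increased.
/// `samples` holds (test case number, cov) pairs in output order.
/// Returns `None` when there are no samples.
fn execs_since_last_gain(samples: &[(u64, u64)]) -> Option<u64> {
    let (last_case, _) = *samples.last()?;
    let mut best_cov = 0;
    let mut gained_at = samples.first()?.0;
    for &(case, cov) in samples {
        if cov > best_cov {
            best_cov = cov;
            gained_at = case;
        }
    }
    // Saturate so out-of-order input cannot underflow.
    Some(last_case.saturating_sub(gained_at))
}
```

What counts as "too long" depends on the target; the function only reports the gap and leaves the threshold to you.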
## When to Stop
When `cov` stops increasing and you're bored. Hours for a quick check, days for thoroughness.