feat: transparently append to compressed archives

When appending to a compressed archive (gzip, brotli, zstd), the tool
now handles compression automatically. Since some compression formats
don't support appending to an existing compressed file in place, we write
a new compressed file containing all the data and atomically rename it
over the original. This requires enough free space on the same filesystem
for the rewritten copy.

This means you can work with compressed archives the same way as
uncompressed ones. Point the tool at your .json.gz file and append
values. No manual decompression/recompression needed.
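
A minimal sketch of the rewrite-and-rename step described above, assuming a
gzip backend via the `flate2` crate; the function name `append_via_rewrite`,
the `.tmp` suffix, and the crate choice are illustrative assumptions, not the
tool's actual code:

```rust
// Sketch only: rewrite the whole compressed archive, then swap it into place.
use std::fs::{self, File};
use std::io::{self, Write};

use flate2::write::GzEncoder;
use flate2::Compression;

/// Write `contents` as a freshly compressed file next to `path`, then
/// atomically rename it over `path` so readers never see a half-written archive.
fn append_via_rewrite(path: &str, contents: &[u8]) -> io::Result<()> {
    // Keep the temp file in the same directory as the target: rename(2) is
    // only atomic within a single filesystem.
    let tmp_path = format!("{path}.tmp");

    let file = File::create(&tmp_path)?;
    let mut encoder = GzEncoder::new(file, Compression::default());
    encoder.write_all(contents)?;

    // finish() writes the gzip trailer and hands the File back so it can be
    // synced to disk before the swap.
    encoder.finish()?.sync_all()?;

    fs::rename(&tmp_path, path)
}
```

Writing the temporary file next to the target keeps the rename on a single
filesystem, which is what makes the final swap atomic; the brotli and zstd
paths would follow the same pattern with their respective encoders.
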
commit 2ab1c31993
Author: nobody
Date:   2025-11-30 17:09:44 -08:00
Signed by: GrocerPublishAgent
GPG key ID: 43B1C298CDDE181C

34 changed files with 4747 additions and 1099 deletions

@@ -2,11 +2,11 @@
A practical CLI tool for tracking JSON file changes over time. Instead of keeping multiple copies of JSON files, this creates compact delta-based archives that preserve the complete history.
## What it does
## Why I made this tool
This tool solves a simple problem: you have a JSON file that changes regularly, and you want to track its history without storing dozens of full copies.
The problem I am solving: I have a JSON file that changes regularly (output of a scraping pipeline), and I want to track its history without storing dozens of full copies.
`json-archive` creates a `.json.archive` file next to your original JSON file. Each time you run the tool, it calculates only what changed and appends those deltas to the archive. You get complete history with minimal storage overhead.
`json-archive` creates a `.json.archive` file next to your original JSON file. Each time you run the tool, it calculates only what changed and appends those deltas to the archive. You get complete history with minimal storage overhead. It can move a .json file into the archive or leave it untouched.
The archive format is human-readable JSONL (not binary), making it easy to inspect, debug, and pipe into other scripts or web visualizations.
@@ -18,9 +18,6 @@ json-archive data.json
# Later, append changes to existing archive
json-archive data.json.archive data.json
# Or let it infer again (won't overwrite without --force)
json-archive data.json # Safe: won't overwrite existing data.json.archive
```
## Real-world use case
@@ -63,7 +60,7 @@ While the core design keeps things simple and readable, the tool does work with
This works fine for the happy path with archive files up to a few hundred megabytes, but contradicts the "keep it simple" design philosophy - it's included because it's practically useful.
**Building without compression**: Compression libraries are a security vulnerability vector. The default build includes them because most users want convenience. If you don't want to bundle compression libraries:
**Building without compression**: Compression libraries are a security vulnerability vector. The default build includes them because I want convenience. If you don't want to bundle compression libraries:
```bash
cargo install json-archive --no-default-features
@@ -157,38 +154,11 @@ cargo build --release
Archives use the `.json.archive` extension by default:
- `data.json` -> `data.json.archive`
- `video.info.json` -> `video.info.json.archive`
- `config.json` -> `config.json.archive`
This makes it immediately clear which files are archives and which are source files.
## Error handling
The tool uses descriptive diagnostics instead of cryptic error codes:
```
error: I couldn't find the input file: missing.json
|
= help: Make sure the file path is correct and the file exists.
Check for typos in the filename.
```
Diagnostics are categorized as Fatal, Warning, or Info, and the tool exits with non-zero status only for fatal errors.
## Performance characteristics
- **Memory usage**: Bounded by largest single JSON file, not archive size
- **Append speed**: Fast - only computes deltas, doesn't re-read entire archive
- **Read speed**: Linear scan, but snapshots allow seeking to recent state
- **File size**: Typically 10-30% the size of storing all JSON copies
For very large archives, consider using snapshots (`-s` flag) to enable faster seeking.
- `<filename>.json` -> `<filename>.json.archive`
## Browser compatibility
Archives can be loaded directly in web applications:
The strength of this file format is how easily it can be visualized in the browser:
```javascript
// Parse archive in browser
@@ -205,7 +175,8 @@ fetch('data.json.archive')
});
```
The format uses only standard JSON. No special parsing required.
The format uses only standard JSON and organizes the data into roughly the shape
you would need anyway.
## Contributing
@@ -227,10 +198,10 @@ This project is licensed under the GNU Affero General Public License v3.0 (AGPL-
- You can use, modify, and distribute this software
- If you modify and distribute it, you must share your changes under the same license
- If you run a modified version on a server or embed it in a larger system, you must make the entire system's source code available to users
- No TiVoization - hardware restrictions that prevent users from running modified versions are prohibited
- No TiVoization! Hardware restrictions that prevent users from running
modified versions are prohibited. If you have a setup where you hard-code a
signing key into firmware and refuse to run any user-modified programs
because they aren't signed with that key... then you are not allowed to use
this software.
The AGPL ensures that improvements to this tool remain open and available to everyone, even when used in hosted services or embedded systems.
---
*Built with Rust for reliability and performance. Designed to be simple enough to understand, powerful enough to be useful.*
The AGPL ensures that improvements to this tool remain open and available to
everyone, even when used in hosted services or embedded systems.