# json-archive
A practical CLI tool for tracking JSON file changes over time. Instead of keeping multiple copies of JSON files, this creates compact delta-based archives that preserve the complete history.

## Why I made this tool
The problem I am solving: I have a JSON file that changes regularly (output of a scraping pipeline), and I want to track its history without storing dozens of full copies.
json-archive creates a .json.archive file next to your original JSON file. Each time you run the tool, it calculates only what changed and appends those deltas to the archive. You get complete history with minimal storage overhead. It can move a .json file into the archive or leave it untouched.
The archive format is human-readable JSONL (not binary), making it easy to inspect, debug, and pipe into other scripts or web visualizations.

## Quick example

```sh
# Create initial archive from data.json (infers output: data.json.archive)
json-archive data.json

# Later, append changes to existing archive
json-archive data.json.archive data.json
```

## Installation

```sh
cargo install json-archive
```

Or build from source:

```sh
git clone <repo>
cd json-archive
cargo build --release
```

## Archive format

The format is JSONL with delta-based changes using JSON Pointer paths. For complete technical details, see the file format specification.

```
{"version": 1, "created": "2025-01-15T10:00:00Z", "initial": {"views": 100, "title": "My Video"}}
# First observation
["observe", "obs-001", "2025-01-15T10:05:00Z", 2]
["change", "/views", 100, 150, "obs-001"]
["change", "/title", "My Video", "My Awesome Video", "obs-001"]
# Second observation
["observe", "obs-002", "2025-01-15T11:00:00Z", 1]
["change", "/views", 150, 200, "obs-002"]
```
Each observation records:
- What changed (using JSON Pointer paths like `/views`)
- The old and new values
- When it happened
- A unique observation ID

## Design philosophy
Hackable over efficient: The file format prioritizes human readability and scriptability over binary compactness. You can:
- Open archives in any text editor
- Grep through them for specific changes
- Parse them in JavaScript without special libraries
- Pipe them through standard Unix tools (see the example below)
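
For example, one-liners with standard tools can pull the history of a single field out of an archive. The filename and field below are illustrative:

```sh
# Every recorded change to /views, as raw archive lines
grep '"/views"' data.json.archive

# Or with jq: print old -> new for each change to /views, skipping comment lines
grep -v '^#' data.json.archive \
  | jq -c 'select(type == "array" and .[0] == "change" and .[1] == "/views")
           | {from: .[2], to: .[3]}'
```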
Minimal workflow changes: Archive files sit next to your original JSON files with a .archive extension. Your existing scripts need minimal modification.
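
For instance, a nightly scrape job only needs one extra line after it regenerates its JSON output (the script and file names here are illustrative):

```sh
# Regenerate data.json as usual, then fold the new state into the archive
./run-scraper.sh > data.json
json-archive data.json.archive data.json
```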

## Compression support

Compression libraries are a common vector for security vulnerabilities. The default build includes them because I want the convenience. If you don't want to bundle compression libraries:

```sh
cargo install json-archive --no-default-features
```
The minimal build detects compressed files and errors with a clear message explaining you need the full version or manual decompression.
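
A minimal sketch of the manual route, assuming a gzip-compressed archive (the .gz name here is illustrative):

```sh
# Decompress a copy by hand, then work on the plain archive with the minimal build
gunzip -k data.json.archive.gz   # -k keeps the original compressed file
json-archive data.json.archive data.json
```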
While the file format keeps things simple and readable, the full build of this tool also works with compressed archives. You can read from and write to gzip, deflate, zlib, brotli, and zstd compressed files without special flags.
Important caveat: Compressed archives may require rewriting the entire file during updates (depending on the compression format). If your temporary filesystem is full or too small, updates can fail. In that case, manually specify an output destination with -o to write the new archive elsewhere. This works fine for the happy path with archives up to a few hundred megabytes; consider custom tooling if you want to track gigabytes of changes.
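
If the rewrite can't happen next to the original, point the output somewhere with more room. This sketch assumes -o combines with appending the same way it does with creating a new archive, and the paths and .gz suffix are illustrative:

```sh
# Append to a compressed archive, writing the rewritten result to another disk
json-archive -o /mnt/big-disk/data.json.archive.gz data.json.archive.gz data.json
```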

## Real-world use case

Perfect for tracking YouTube video metadata over time:

```sh
# Download video info with yt-dlp
yt-dlp --write-info-json -o "%(id)s.%(ext)s" "https://youtube.com/watch?v=..."

# Create initial archive (creates videoID.info.json.archive)
json-archive videoID.info.json

# Later, append new observations to existing archive
json-archive videoID.info.json.archive videoID.info.json

# Or safely re-run (won't overwrite existing archive)
json-archive videoID.info.json

# Run daily in a cron job to capture changes
# The archive preserves your title/description experiments and view count history
```

## Commands

The tool infers behavior from filenames: pass .json files to create a new archive, or pass an existing .archive file first to append to it. The sections below walk through both cases.

## Documentation
- Info command - View archive metadata and observation timeline
- State command - Retrieve JSON state at specific observations
- File format specification - Technical details about the archive format

## Creating archives

```sh
# Create archive from JSON files (output inferred from first filename)
json-archive file1.json file2.json file3.json
# Creates: file1.json.archive

# Won't overwrite existing archives (safe to re-run)
json-archive data.json   # Won't overwrite data.json.archive if it exists

# Force overwrite existing archive
json-archive --force data.json

# Specify custom output location
json-archive -o custom.archive data.json
```

## Appending to archives

```sh
# First file is archive, rest are appended
json-archive existing.json.archive new1.json new2.json

# Works with any mix of files
json-archive data.json.archive updated-data.json
```

## Additional options

```sh
# Add snapshots after 50 observations instead of the default 100 for faster append operations
json-archive -s 50 data.json

# Add source metadata
json-archive --source "youtube-metadata" data.json
```

## File naming convention

Archives use the .json.archive extension by default:

```
<filename>.json -> <filename>.json.archive
```

## Browser compatibility
The strength of the file format is easy browser visualization:

```js
// Parse archive in browser
fetch('data.json.archive')
  .then(response => response.text())
  .then(text => {
    const lines = text.split('\n');
    const header = JSON.parse(lines[0]);
    const events = lines.slice(1)
      .filter(line => line && !line.startsWith('#'))
      .map(line => JSON.parse(line));
    // Replay history, build visualizations, etc.
  });
```
The format uses only standard JSON and organizes the data into roughly the shape you would need anyway.

## Contributing
Contributions are welcome! However, you will need to sign a contributor license agreement with Peoples Grocers before we can accept your pull request.
I promise to fix bugs quickly, but the overall design prioritizes being hackable over raw performance. This means many obvious performance improvements won't be implemented as they would compromise the tool's simplicity and inspectability.
Areas where contributions are especially appreciated:
- Additional CLI commands (validate, info, extract)
- Better diff algorithms for arrays
- More compression format support
- Bug fixes and edge case handling

## License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). This means:
- You can use, modify, and distribute this software
- If you modify and distribute it, you must share your changes under the same license
- If you run a modified version on a server or embed it in a larger system, you must make the entire system's source code available to users
- No TiVoization! Hardware restrictions that prevent users from running modified versions are prohibited. If you hard-code a signing key into firmware and refuse to run any user-modified programs that aren't signed with your secret key... then you are not allowed to use this software.
The AGPL ensures that improvements to this tool remain open and available to everyone, even when used in hosted services or embedded systems.