#!/usr/bin/env bash
#
# Test brotli compression workflow:
# 1. Create archive from first state file
# 2. Compress with brotli
# 3. Append remaining state files to the compressed archive
# 4. Decompress and show info
#

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="$(dirname "$(dirname "$SCRIPT_DIR")")"
BINARY="$PROJECT_DIR/target/debug/json-archive"
DATA_DIR="$SCRIPT_DIR/data"
OUT_DIR="$SCRIPT_DIR/out/brotli"
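
# Optional preflight (not part of the original workflow): fail fast with a
# clear message if the debug binary or the brotli CLI is missing, since every
# step below assumes both are available.
if [[ ! -x "$BINARY" ]]; then
  echo "error: $BINARY not found (run 'cargo build' first)" >&2
  exit 1
fi
if ! command -v brotli >/dev/null 2>&1; then
  echo "error: brotli CLI not found on PATH" >&2
  exit 1
fi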

echo "=== Brotli Compression Test ==="

# Setup
rm -rf "$OUT_DIR"
mkdir -p "$OUT_DIR"

# Create initial archive from first state file
echo "Creating archive from state_1.json..."
"$BINARY" "$DATA_DIR/state_1.json" -o "$OUT_DIR/test.json.archive"

# Compress with brotli
echo "Compressing with brotli..."
brotli "$OUT_DIR/test.json.archive"
ls -la "$OUT_DIR/"

# Append remaining files to compressed archive
for i in $(seq 2 9); do
  echo "Appending state_$i.json to compressed archive..."
  "$BINARY" "$OUT_DIR/test.json.archive.br" "$DATA_DIR/state_$i.json"
done

# Show info on the result
echo ""
echo "Final archive info:"
"$BINARY" info "$OUT_DIR/test.json.archive.br"

# Decompress for manual inspection
echo ""
echo "Decompressing for comparison..."
brotli -d -k "$OUT_DIR/test.json.archive.br" -o "$OUT_DIR/test-decompressed.json.archive"

echo ""
echo "Decompressed archive info:"
"$BINARY" info "$OUT_DIR/test-decompressed.json.archive"

echo ""
echo "Files in $OUT_DIR:"
ls -la "$OUT_DIR/"
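
# Optional sanity check (an addition, assuming the `info` subcommand's output
# depends only on archive contents and is deterministic; if it also prints the
# on-disk size or compression format, filter those lines before comparing):
# the compressed archive and its decompressed copy should describe the same
# observations.
if diff <("$BINARY" info "$OUT_DIR/test.json.archive.br") \
        <("$BINARY" info "$OUT_DIR/test-decompressed.json.archive"); then
  echo "OK: info output matches"
fi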