// Integration tests for compressed archive functionality

// refactor: decompose archive read/write into composable building blocks
//
// Deleted archive_context.rs and archive_ops.rs (1200+ lines of duplicated
// logic) and replaced them with four focused modules:
//   1. open_archive() - opens a file, detects compression, returns raw bytes
//   2. read_archive() - parses bytes into validated observations
//   3. CompressionWriter - writes bytes with any compression format
//   4. WriteStrategy - given a list of files, determines the input archive,
//      output archive, output format, and which of four write modes to use:
//      - Create: new archive, no input
//      - Append: uncompressed input, seek to end
//      - AtomicSwap: compressed input, rewrite via temp file
//      - CopyOnWrite: different input/output paths, transcode between formats
//
// Previously the output format could not be specified: appending always
// preserved the input format, and creating compressed archives didn't work.
// Now all four cases work with any supported compression format, and the
// atomic swap writes to a temp file and then renames it, making it crash-safe.
//
// Trade-off: this approach prioritizes code clarity over syscall efficiency.
// The archive file may be opened and read multiple times during a single
// operation (once for format detection, once for reading state, once for
// copying content). A more optimized implementation could reuse file
// handles, but the current approach makes each step's purpose obvious.

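// The mode-selection rules above can be sketched as an enum plus a chooser.
// This is a minimal illustration of the decision table, not the crate's
// actual API: `WriteMode`, `select_mode`, and their parameters are
// hypothetical names, and the real WriteStrategy also tracks output format.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum WriteMode {
    Create,      // new archive, no input
    Append,      // uncompressed input, seek to end
    AtomicSwap,  // compressed input, rewrite via temp file
    CopyOnWrite, // different input/output paths, transcode between formats
}

#[allow(dead_code)]
fn select_mode(input: Option<&str>, output: &str, input_compressed: bool) -> WriteMode {
    match input {
        None => WriteMode::Create,
        Some(path) if path != output => WriteMode::CopyOnWrite,
        Some(_) if input_compressed => WriteMode::AtomicSwap,
        Some(_) => WriteMode::Append,
    }
}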
use json_archive::archive_open::open_archive;
use json_archive::write_observation;
use json_archive::{read_archive, ReadMode};
use serde_json::{json, Value};
use std::fs::File;
use std::io::{BufWriter, Write};
use tempfile::NamedTempFile;

#[test]
#[cfg(feature = "compression")]
fn test_append_to_compressed_archive_basic() {
    use flate2::write::GzEncoder;
    use flate2::Compression;

    // Create initial archive with one state
    let initial_state = create_json_file(&json!({"count": 0}));
    let archive_file = NamedTempFile::with_suffix(".json.archive").unwrap();

    #[allow(unused_assignments)]
    {
        let file = File::create(archive_file.path()).unwrap();
        let mut writer = BufWriter::new(file);
        let mut current_state = Value::Null;
        let mut observation_count: usize = 0;

        current_state = write_observation(
            &mut writer,
            &mut observation_count,
            None,
            &current_state,
            &initial_state.path().to_path_buf(),
            Some("test".to_string()),
        )
        .unwrap();

        writer.flush().unwrap();
    }

    dump_file(archive_file.path(), "Uncompressed archive");

    // Compress it
    let compressed_file = NamedTempFile::with_suffix(".json.archive.gz").unwrap();
    {
        let input = std::fs::read(archive_file.path()).unwrap();
        let mut encoder = GzEncoder::new(
            compressed_file.as_file().try_clone().unwrap(),
            Compression::default(),
        );
        encoder.write_all(&input).unwrap();
        encoder.finish().unwrap();
    }

    dump_file(compressed_file.path(), "Compressed archive");

    // Verify the compressed archive can be read
    let opened = open_archive(compressed_file.path()).unwrap();
    let result = read_archive(
        opened.reader,
        &compressed_file.path().display().to_string(),
        ReadMode::FullValidation,
    )
    .unwrap();

    eprintln!("=== Reader result ===");
    eprintln!("final_state: {:?}", result.final_state);
    eprintln!("observation_count: {}", result.observation_count);
    eprintln!("diagnostics: {:?}", result.diagnostics);
    eprintln!();
2025-11-30 17:09:44 -08:00
|
|
|
|
refactor: decompose archive read/write into composable building blocks
Delete archive_context.rs and archive_ops.rs (1200+ lines of duplicated
logic). Replace with four focused modules:
1. open_archive() - opens a file, detects compression, returns raw bytes
2. read_archive() - parses bytes into validated observations
3. CompressionWriter - writes bytes with any compression format
4. WriteStrategy - given a list of files, determines input archive,
output archive, output format, and which of four write modes to use:
- Create: new archive, no input
- Append: uncompressed input, seek to end
- AtomicSwap: compressed input, rewrite via temp file
- CopyOnWrite: different input/output paths, transcode between formats
Previously you could not specify output format. Appending always
preserved the input format, creating compressed archives didn't work.
Now all four cases work with any supported compression format.
Atomic swap now writes to temp file, then renames. Crash-safe.
Trade-off: This approach prioritizes code clarity over syscall efficiency.
The archive file may be opened and read multiple times during a single
operation (once for format detection, once for reading state, once for
copying content). A more optimized implementation could reuse file
handles, but the current approach makes each step's purpose obvious.
2025-12-01 20:51:50 -08:00
|
|
|
assert_eq!(result.final_state, json!({"count": 0}));
|
|
|
|
|
assert_eq!(result.observation_count, 0);
|
|
|
|
|
}
|
2025-11-30 17:09:44 -08:00
|
|
|
|
refactor: decompose archive read/write into composable building blocks
Delete archive_context.rs and archive_ops.rs (1200+ lines of duplicated
logic). Replace with four focused modules:
1. open_archive() - opens a file, detects compression, returns raw bytes
2. read_archive() - parses bytes into validated observations
3. CompressionWriter - writes bytes with any compression format
4. WriteStrategy - given a list of files, determines input archive,
output archive, output format, and which of four write modes to use:
- Create: new archive, no input
- Append: uncompressed input, seek to end
- AtomicSwap: compressed input, rewrite via temp file
- CopyOnWrite: different input/output paths, transcode between formats
Previously you could not specify output format. Appending always
preserved the input format, creating compressed archives didn't work.
Now all four cases work with any supported compression format.
Atomic swap now writes to temp file, then renames. Crash-safe.
Trade-off: This approach prioritizes code clarity over syscall efficiency.
The archive file may be opened and read multiple times during a single
operation (once for format detection, once for reading state, once for
copying content). A more optimized implementation could reuse file
handles, but the current approach makes each step's purpose obvious.
2025-12-01 20:51:50 -08:00
|
|
|
/// Helper to create a temp file with JSON content
|
|
|
|
|
fn create_json_file(content: &Value) -> NamedTempFile {
|
|
|
|
|
let mut file = NamedTempFile::new().expect("Failed to create temp file");
|
|
|
|
|
writeln!(file, "{}", serde_json::to_string(content).unwrap()).unwrap();
|
|
|
|
|
file
|
|
|
|
|
}

/// Debug helper: print file contents as both hex and text
fn dump_file(path: &std::path::Path, label: &str) {
    let bytes = std::fs::read(path).unwrap();
    eprintln!("=== {} ({} bytes) ===", label, bytes.len());
    eprintln!("Hex: {:02x?}", &bytes[..bytes.len().min(100)]);
    if let Ok(text) = std::str::from_utf8(&bytes) {
        eprintln!("Text:\n{}", &text[..text.len().min(500)]);
    } else {
        eprintln!("(not valid UTF-8)");
    }
    eprintln!();
}
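
// Gzip streams start with the magic bytes 0x1f 0x8b, which is how a format
// sniffer can tell the compressed archive from the plain one. A minimal
// illustrative check (`looks_gzipped` is a hypothetical helper, not part of
// the crate's API; open_archive's actual detection may differ):
#[allow(dead_code)]
fn looks_gzipped(bytes: &[u8]) -> bool {
    bytes.len() >= 2 && bytes[0] == 0x1f && bytes[1] == 0x8b
}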