rust-blobstore
Design
The Larger Problem
Consider someone accumulating video files over several years - a photographer shooting weddings, a YouTuber building their channel, anyone creating video content as a small business or hobby. They start with one external drive, then buy another when that fills up, then another. Different capacities purchased at different times, formatted with whatever filesystem made sense then: NTFS on Windows, ext4 on Linux, exFAT for compatibility. Eventually they have a dozen drives, maybe connected through a USB hub to work with several at once.
Total storage capacity exceeds what they can keep online simultaneously. A proper RAID array large enough to consolidate everything costs $30,000-$100,000 - prohibitive for someone running a small operation. So the drives stay external and disconnected.
Now the problems compound. The same video clip exists on multiple drives with different filenames - original camera files, renamed exports, copies made during editing. Which drives contain which footage? When you plug in a drive, scanning terabytes to build an index every time is impractical. Moving files between drives for consolidation risks data loss or redundant transfers. Deduplication requires identifying files by content, but you cannot compare files when the drives aren't simultaneously connected.
What you want: a software solution working with existing hardware, costing perhaps $100 for an extra 1TB SSD to store metadata, that enables deduplication across intermittently-connected drives and makes it obvious what content lives where.
Decomposing the Solution
The overall system needs several things: a global catalog mapping filenames and metadata to content, a deduplication strategy, a way to track which drives are connected, a user interface, and the actual storage management on each drive.
That last piece - storage management on each drive - becomes simpler if we separate concerns. The global catalog stores human-meaningful information: this video file was shot at this wedding, has this title, these tags. The storage layer just manages blobs identified by content hash.
This separation makes each drive self-describing. Plug in a drive, and a local database immediately tells you which content-addressed blobs exist on it, no filesystem scanning required. The global catalog (separate program, not included here) maps metadata to blob hashes and tracks which blobs exist on which drives.
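To make the split concrete, here is a minimal sketch of the catalog side of the boundary, using only the standard library. The catalog is a separate program and is not included here, so every name below (`Catalog`, `Metadata`, `record_location`, `drives_for`) is hypothetical, and drive identifiers are treated as opaque strings. The point is only that human-meaningful data and blob locations are both keyed by content hash, while the storage layer never sees titles or tags.

```rust
use std::collections::{BTreeSet, HashMap};

/// Content hash in textual form, e.g. "blake3:abc123".
type BlobRef = String;
/// Opaque identifier for one drive (hypothetical; any stable ID works).
type DriveId = String;

/// Hypothetical catalog-side record: the human-meaningful part.
#[derive(Debug, Clone, PartialEq)]
struct Metadata {
    title: String,
    tags: Vec<String>,
}

/// Hypothetical in-memory stand-in for the global catalog.
#[derive(Default)]
struct Catalog {
    /// What each blob *is*, in human terms.
    metadata: HashMap<BlobRef, Metadata>,
    /// Where each blob *lives*: the set of drives that hold it.
    locations: HashMap<BlobRef, BTreeSet<DriveId>>,
}

impl Catalog {
    /// Attach human-meaningful metadata to a content hash.
    fn describe(&mut self, blob: &str, meta: Metadata) {
        self.metadata.insert(blob.to_string(), meta);
    }

    /// Record that `blob` is stored on `drive`. Recording twice is harmless.
    fn record_location(&mut self, blob: &str, drive: &str) {
        self.locations
            .entry(blob.to_string())
            .or_default()
            .insert(drive.to_string());
    }

    /// Drives known to hold `blob`, in sorted order (empty if unknown).
    fn drives_for(&self, blob: &str) -> Vec<&str> {
        self.locations
            .get(blob)
            .map(|set| set.iter().map(String::as_str).collect::<Vec<_>>())
            .unwrap_or_default()
    }
}
```

Calling `record_location` for the same blob on two drives and then `drives_for` on that blob returns both drive identifiers, which is the lookup the catalog needs before deciding which blobstore to ask.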
This program implements the per-drive storage layer. One instance per filesystem. Lightweight process spawned when drive connects, stopped when drive disconnects.
What This Component Does
Each blobstore instance manages one filesystem:
Content addressing: Files stored by BLAKE3 hash. Same content gets same hash regardless of filename. Deduplication falls out naturally - you either have this hash or you don't.
Local index: SQLite database tracking which hashes exist on this filesystem. Query this database instead of scanning the filesystem. The database lives on the same drive, making the drive self-describing.
Filesystem UUID: At database creation, generate random UUID. This permanently identifies the filesystem. The global catalog references "blob blake3:abc123 exists on filesystem uuid:7f3d". Drive can be unplugged, moved between machines, reconnected - same UUID maintains identity.
Directory structure: Store blobs in subdirectories to prevent any single directory from containing too many files. Maximum 4096 items per directory, enforced by using 3 hex characters from the hash as the path component (16^3 = 4096 possible names). Keeps directories browseable in standard file managers and prevents tools that try to index entire directories from choking.
HTTP server: Serve blobs directly to web browsers. The global catalog can proxy requests: "user wants video X, which is blob blake3:abc123 on filesystem uuid:7f3d, that blobstore is online at localhost:8081, proxy GET request to http://localhost:8081/blobs/blake3:abc123". Browser gets the video, no separate web server configuration needed.
Import directory: For bulk transfers between drives, copy files to the import staging directory using standard Unix tools (rsync, cp, whatever). Blobstore detects new files, calculates hashes, moves them into the content-addressed structure, reports the blobrefs. The global catalog updates its index to reflect new locations. Source can be deleted after confirmation. No data loss, no redundant transfers.
Reconciliation: If someone modifies the filesystem directly (copies files in, deletes files, moves a drive between systems), reconciliation scans actual stored content and rebuilds the index. The database reflects reality.
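Two of the mechanisms above are easy to sketch with the standard library alone: the sharded path layout and the reconciliation diff. This is a minimal sketch, not the repository's code. `shard_path`, `reconcile`, and `Reconciliation` are hypothetical names, the two-level depth is an assumption (the text above only fixes 3 hex characters per directory), and the hash is assumed to arrive as a bare hex string.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Build the on-disk path for a blob from its hex-encoded hash.
/// Assumed layout: <root>/<hex 0..3>/<hex 3..6>/<full hex>.
/// The depth (two levels) is a guess; only "3 hex characters" comes from the text.
fn shard_path(root: &str, hash_hex: &str) -> Option<PathBuf> {
    // Reject input that is too short to shard or contains non-hex characters.
    // This also keeps things like "../" out of the path.
    if hash_hex.len() < 6 || !hash_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut path = PathBuf::from(root);
    path.push(&hash_hex[0..3]); // first level: at most 16^3 = 4096 names
    path.push(&hash_hex[3..6]); // second level: at most 4096 names
    path.push(hash_hex); // file name is the full hash
    Some(path)
}

/// Differences between what is on disk and what the index claims.
#[derive(Debug, PartialEq)]
struct Reconciliation {
    /// Present on disk but absent from the index: rows to add.
    missing_from_index: Vec<String>,
    /// Present in the index but absent from disk: rows to drop.
    missing_from_disk: Vec<String>,
}

/// Compare the hashes found by scanning storage with the hashes in the index.
fn reconcile(on_disk: &BTreeSet<String>, indexed: &BTreeSet<String>) -> Reconciliation {
    Reconciliation {
        missing_from_index: on_disk.difference(indexed).cloned().collect(),
        missing_from_disk: indexed.difference(on_disk).cloned().collect(),
    }
}
```

Under these assumptions, `shard_path("/mnt/drive/blobs", "abc123def456")` yields `/mnt/drive/blobs/abc/123/abc123def456`. After a scan, applying both lists from `reconcile` to the index brings the database back in line with the filesystem.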
Why This Design
No shared state between blobstores: Each instance knows only its local filesystem. Makes the component simple. The global coordinator handles distributed state.
Interoperates with existing tools: Standard filesystems, standard directory structures browseable in file managers, standard Unix tools for file movement. No custom file formats locking you into specific software.
Minimal cost: Works with accumulated hardware. No need to replace drives or buy expensive arrays. Computational cost is hashing (unavoidable for content addressing) and database lookups (cheap).
Scales with intermittent connectivity: Drive plugged in? Spawn process. Drive unplugged? Stop process. Global coordinator tracks which blobstores are available. No assumption that all storage is online simultaneously.
Clear component boundary: This program does blob storage, nothing else. Doesn't track filenames, metadata, relationships between files, or user intent. The global catalog handles those concerns. Clean separation makes both components simpler.
Defense in depth for chaotic workflows: The reality of this use case matters. Someone managing video files across a plastic tub of drives labeled with tape and Sharpie is operating in a fundamentally different environment than someone with a proper RAID array and backup strategy. If the data were highly valuable and the workflow well-funded, they would have proper infrastructure. The fact that they're using this approach means higher risk and more variance.
The global catalog tracks everything: metadata for all files across all drives (keyed by content hash), mapping of which content hashes exist on which drives, and the actual filesystem paths for each blobref. But prudent design suggests the global catalog should also sync copies of relevant metadata and the blobref-to-path mapping onto each physical drive.
Why? Because drives in this workflow get disconnected, moved between systems, borrowed, stored in closets, taken to different locations. A drive might be plugged into a different machine where the global catalog isn't available. Someone might need to manually browse a drive to find footage. The drive might outlive the global catalog database (laptop dies, database corrupted, migration gone wrong).
Making each drive self-describing isn't just about performance (avoiding scans). It's about resilience. Each drive carries enough information to be useful independently. The local SQLite database on each drive provides that self-description. The blobstore program provides the interface to that database. Modularity and defense in depth for an inherently messy operational environment.