json-archive/docs/state-command.md

6.8 KiB

State Command

The json-archive state command retrieves the JSON state at a specific observation

Flags and Semantics:

Primary access methods:

  • --id : Get state at specific observation (unambiguous, primary method)
  • --index : Get state at Nth observation in file order (convenience for using outout from info command, with caveat)

Timestamp-based access:

  • --as-of : Most recent observation with timestamp ≤ given time ("state as of this moment")
  • ---before : Most recent observation with timestamp < given time (strictly before)
  • --after : Earliest observation with timestamp > given time (first state after)
  • --latest: Most recent observation by timestamp (default if no flags given)

Access Methods

You must specify exactly one access method to identify which observation's state to retrieve:

json-archive state --id <OBSERVATION_ID> file.archive

Gets the state at the observation with the specified ID. This is the most unambiguous method since observation IDs are unique within the archive.

Example:

json-archive state --id obs-c4636428-1400-44d7-b30f-1c080c608e3c data.json.archive

By File Index

json-archive state --index <INDEX> file.archive

Gets the state at the Nth observation in file order (0-indexed). Note: Observations are not guaranteed to be in chronological order in the file.

Example:

json-archive state --index 0 data.json.archive    # Initial state (first observation)
json-archive state --index 1 data.json.archive    # Second observation

By Timestamp

As-Of Timestamp

json-archive state --as-of <TIMESTAMP> file.archive

Gets the state from the most recent observation with timestamp ≤ the given time.

Example:

json-archive state --as-of "2025-01-15T10:05:00Z" data.json.archive

Right Before Timestamp

json-archive state --before <TIMESTAMP> file.archive

Gets the state from the most recent observation with timestamp < the given time (strictly before).

Example:

json-archive state --before "2025-01-15T10:05:00Z" data.json.archive

After Timestamp

json-archive state --after <TIMESTAMP> file.archive

Gets the state from the earliest observation with timestamp > the given time.

Example:

json-archive state --after "2025-01-15T10:05:00Z" data.json.archive

Latest State (Default)

json-archive state --latest file.archive
# Or simply:
json-archive state file.archive

Gets the state from the observation with the latest timestamp. This is the default behavior when no other access method is specified.

Timestamp Format

All timestamps must be in ISO-8601 format with UTC timezone:

  • 2025-01-15T10:05:00Z
  • 2025-01-15T10:05:00.123Z (with milliseconds)

Output

The command outputs the JSON state as pretty-printed JSON to stdout:

{
  "count": 10,
  "id": 1,
  "lastSeen": "2025-01-16",
  "name": "Bob",
  "score": 95,
  "tags": [
    "initial",
    "final"
  ]
}

Error Cases

Non-existent Observation ID

$ json-archive state --id nonexistent-id file.archive
error E030: Non-existent observation ID

I couldn't find an observation with ID 'nonexistent-id'

Use 'json-archive info' to see available observation IDs

Out-of-bounds Index

$ json-archive state --index 10 file.archive
error E053: Array index out of bounds

Index 10 is out of bounds. The archive has 3 observations (0-2)

Use 'json-archive info' to see available observation indices

Invalid Timestamp

$ json-archive state --as-of "invalid-timestamp" file.archive
error W012: Invalid timestamp

I couldn't parse the timestamp 'invalid-timestamp'. Please use ISO-8601 format like '2025-01-15T10:05:00Z'

No Observations Match Timestamp

$ json-archive state --as-of "2020-01-01T00:00:00Z" file.archive
error E051: Path not found

No observations found as of 2020-01-01 00:00:00 UTC

Try using --after to find the first observation after this time

Multiple Access Methods

$ json-archive state --id obs-123 --index 1 file.archive
error E022: Wrong field count

Please specify only one access method (--id, --index, --as-of, --right-before, --after, or --latest)

Examples:
json-archive state --id obs-123 file.archive
json-archive state --index 2 file.archive
json-archive state --as-of "2025-01-15T10:05:00Z" file.archive

Implementation Details and Design Rationale

Why Timestamp Access Uses More Memory

The timestamp-based flags (--as-of, --before, --after, --latest) are memory-intensive because they must read the entire archive file into memory. Here's why:

The Problem: Since observations are not guaranteed to be in chronological order in the file, we cannot use efficient seeking or leverage snapshots for optimization. To find "the most recent observation ≤ timestamp", we must:

  1. Read every single observation in the file
  2. Parse all their timestamps
  3. Sort them chronologically in memory
  4. Find the target observation
  5. Replay all events to reconstruct the state

This is fundamentally different from index-based access (--index, --id) which can potentially use snapshots and delta compression for efficiency.

Why Files Aren't Chronologically Sorted

The tool accepts observations in any order because data collection is ad hoc. You don't always have perfect control over when or how observations are collected.

Real scenario: You have JSON files scattered across multiple hard drives from different systems. You can't mount all drives simultaneously, so you load them one at a time and absorb the data as you go. The observations end up out of chronological order, but that's fine - you consolidate first, then organize later.

Another scenario: You inherit data from multiple APIs and systems that were collecting observations over time. The timestamps are there, but the files weren't collected in chronological order. You want to merge everything into one archive without having to pre-sort.

The approach: "Octopus-style merging" - absorb all the data as-is, then make sense of it incrementally. The file format doesn't fight this workflow.

Design Philosophy

Timestamp access works the way it does because the tool prioritizes data consolidation over performance optimization. When you use --as-of or --latest, you're asking the tool to figure out chronological relationships that may not exist in the file structure.

Index/ID access is direct - you're referencing observations by their position in the file or unique identifier. This is efficient because it doesn't require chronological analysis.

See Also