Lago

Event-sourced persistence substrate — append-only journal, content-addressed blobs, and knowledge index.

Lago is the persistence substrate of the Agent OS. It provides an append-only event journal, content-addressed blob storage, a knowledge index with graph traversal, and filesystem manifests with branching. Every state change in the system is an immutable event stored in Lago.

The name is the Spanish and Italian word for "lake" -- a deep, still body that preserves everything deposited into it.

Architecture

Lago is structured as a Rust workspace with these crates:

Crate                         Role
lago-core                     Core types, event envelope, stream identifiers
lago-journal                  Append-only event journal trait and redb implementation
lago-store                    Content-addressed blob storage (SHA-256 + zstd compression)
lago-fs                       Filesystem manifests with branching (tree of content-addressed nodes)
lago-ingest                   SSE stream ingestion (OpenAI, Anthropic, Vercel, Lago formats)
lago-api                      HTTP API server (axum, SSE endpoints)
lago-policy                   RBAC policy engine (roles, permissions, hooks)
lago-knowledge                Knowledge index (frontmatter extraction, wikilinks, scored search, graph traversal)
lago-auth                     JWT authentication with per-user vault sessions
lago-aios-eventstore-adapter  Adapter implementing the aiOS EventStore trait
lago-cli                      CLI for journal inspection and management
lagod                         Standalone daemon binary

Event journal

The journal is the heart of Lago. It stores events as immutable, ordered records in a redb v2 embedded database.

Event envelope

Every event is wrapped in an EventEnvelope that provides identity, ordering, integrity, and provenance:

struct EventEnvelope {
    id: Ulid,           // Globally unique, time-ordered identifier
    stream_id: String,  // Logical stream (e.g., session ID)
    kind: EventKind,    // Type of event (from aiOS taxonomy)
    payload: Vec<u8>,   // Serialized event data
    checksum: [u8; 32], // SHA-256 of the payload
    timestamp: u64,     // Unix timestamp in milliseconds
    metadata: Metadata, // Trace context, actor, provenance
}

The id is a ULID -- lexicographically sortable and time-ordered, so events naturally sort in append order. The checksum is a SHA-256 hash of the payload, so corruption of a stored event can be detected when it is read back.

Event kinds

Events follow the aiOS EventKind taxonomy:

Category    Events                                              Description
Input       UserMessage, ExternalSignal                         User input and external triggers
Session     SessionCreated, SessionResumed, SessionClosed       Session lifecycle
Cognition   AssistantMessage, ToolCall, ToolResult              LLM responses and tool execution
Memory      MemoryStored, MemoryRetrieved                       Knowledge persistence
Approval    ApprovalRequested, ApprovalGranted, ApprovalDenied  Human-in-the-loop gates
Custom      Any string prefix                                   Subsystem events (autonomic.*, finance.*)

The Custom kind enables forward-compatible persistence -- new event types (from Autonomic, Haima, or future subsystems) can be stored without schema migrations.
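A sketch of how such a forward-compatible kind can be modeled (the type and variant names here are illustrative, not Lago's actual definitions): known kinds are enum variants, and any unrecognized string round-trips through a Custom variant, so new event types persist without a schema migration.

```rust
#[derive(Debug, PartialEq)]
enum EventKind {
    UserMessage,
    ToolCall,
    Custom(String), // e.g. "autonomic.heartbeat", "finance.ledger_entry"
}

impl EventKind {
    // Unknown kinds are preserved, not rejected.
    fn parse(s: &str) -> EventKind {
        match s {
            "UserMessage" => EventKind::UserMessage,
            "ToolCall" => EventKind::ToolCall,
            other => EventKind::Custom(other.to_string()),
        }
    }

    // The stored string survives the round trip unchanged.
    fn as_str(&self) -> &str {
        match self {
            EventKind::UserMessage => "UserMessage",
            EventKind::ToolCall => "ToolCall",
            EventKind::Custom(s) => s,
        }
    }
}
```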

Append-only guarantee

Events are never modified or deleted. The journal is strictly append-only. This means:

  • State at any point in time can be reconstructed by replaying events up to that timestamp
  • Auditing is inherent -- the complete history of every agent action is preserved
  • Branching is cheap -- create a branch by remembering a cursor position and appending new events from there
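The replay idea can be sketched as a pure fold over the journal; the Event and SessionState types below are simplified stand-ins for the real envelope and state, not Lago's API.

```rust
// Minimal event: just a timestamp and a payload.
struct Event {
    timestamp: u64,
    payload: String,
}

#[derive(Default, Debug, PartialEq)]
struct SessionState {
    messages: Vec<String>,
}

// State at time `up_to` is a fold over all events with timestamp <= up_to.
// Because the journal is append-only, this is deterministic and repeatable.
fn replay(events: &[Event], up_to: u64) -> SessionState {
    events
        .iter()
        .filter(|e| e.timestamp <= up_to)
        .fold(SessionState::default(), |mut state, e| {
            state.messages.push(e.payload.clone());
            state
        })
}
```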

Stream isolation

Events are organized into streams identified by a string ID (typically a session ID). Streams are independent -- appending to one stream does not affect others. Cross-stream queries are supported through the knowledge index.

Blob storage

Large binary content (files, images, documents) is stored separately from events in content-addressed blob storage:

  • Content addressing -- blobs are identified by their SHA-256 hash
  • Deduplication -- identical content is stored only once, regardless of how many events reference it
  • Compression -- all blobs are compressed with zstd before storage, reducing disk usage
  • Integrity -- the hash serves as a checksum; any corruption is immediately detectable

Blobs are referenced from events by their hash. The blob store is backed by the local filesystem with a flat hash-based directory structure:

data/blobs/
  a1/b2c3d4...  (SHA-256 hash prefix for directory sharding)
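As a sketch, the on-disk path can be derived from the hex digest like this (the two-character shard prefix is assumed from the layout above):

```rust
use std::path::PathBuf;

// Map a SHA-256 hex digest to its sharded blob path:
// the first byte (two hex chars) becomes the directory name.
// A real digest is 64 hex chars; this assumes a valid, lowercase hex input.
fn blob_path(root: &str, sha256_hex: &str) -> PathBuf {
    let (shard, rest) = sha256_hex.split_at(2);
    PathBuf::from(root).join(shard).join(rest)
}
```

Sharding keeps any single directory from accumulating millions of entries, which degrades lookup on many filesystems.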

Knowledge index

The knowledge index provides searchable, graph-structured access to the information stored in Lago:

Frontmatter extraction

Documents ingested into Lago have their frontmatter parsed and indexed. Title, tags, dates, and custom fields become searchable metadata. The parser handles YAML frontmatter delimited by ---.
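A minimal sketch of the extraction step, assuming flat key: value frontmatter (a real implementation would use a proper YAML parser):

```rust
use std::collections::HashMap;

// Split a `---`-delimited frontmatter block off the top of a document
// and parse flat `key: value` lines into searchable metadata.
// Documents without frontmatter are returned unchanged with empty fields.
fn parse_frontmatter(doc: &str) -> (HashMap<String, String>, &str) {
    let mut fields = HashMap::new();
    if let Some(rest) = doc.strip_prefix("---\n") {
        if let Some(end) = rest.find("\n---\n") {
            for line in rest[..end].lines() {
                if let Some((k, v)) = line.split_once(':') {
                    fields.insert(k.trim().to_string(), v.trim().to_string());
                }
            }
            return (fields, &rest[end + "\n---\n".len()..]);
        }
    }
    (fields, doc)
}
```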

Wikilinks

Documents can reference each other using [[wikilink]] syntax. These links form a directed graph that can be traversed to discover related content. The graph is maintained incrementally as documents are ingested.

Scored search

Full-text search comes with relevance scoring. Queries are matched against document titles, frontmatter fields, and body content. Results are ranked by a TF-IDF-inspired scoring function that considers:

  • Term frequency in the document
  • Inverse document frequency across the corpus
  • Title match bonus (matches in titles score higher)
  • Frontmatter tag match bonus
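A minimal sketch of such a scoring function; the bonus weights and the Doc shape below are illustrative, not Lago's actual constants or types.

```rust
use std::collections::HashMap;

struct Doc {
    title: String,
    tags: Vec<String>,
    body: Vec<String>, // pre-tokenized body words
}

// TF-IDF-inspired score: per-term tf * idf over the body,
// plus flat bonuses for title and frontmatter-tag matches.
fn score(query: &str, doc: &Doc, doc_freq: &HashMap<String, usize>, corpus_size: usize) -> f64 {
    let mut s = 0.0;
    for term in query.split_whitespace() {
        let tf = doc.body.iter().filter(|w| w.as_str() == term).count() as f64;
        let df = *doc_freq.get(term).unwrap_or(&1) as f64; // floor at 1 to avoid div-by-zero
        let idf = ((corpus_size as f64 + 1.0) / df).ln();
        s += tf * idf;
        if doc.title.contains(term) {
            s += 2.0; // title match bonus (illustrative weight)
        }
        if doc.tags.iter().any(|t| t.as_str() == term) {
            s += 1.0; // tag match bonus (illustrative weight)
        }
    }
    s
}
```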

Graph traversal

Starting from any document, you can traverse the wikilink graph to find related documents at a specified depth. This powers the "related content" and "see also" features in the platform, as well as the memory retrieval system that provides cross-session context.
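One way to sketch this is a depth-limited breadth-first search over the wikilink adjacency map (the representation below is an assumption, not Lago's internal structure):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Collect all documents reachable from `start` within `max_depth` hops
// of the directed wikilink graph. The start document itself is excluded.
fn related(graph: &HashMap<String, Vec<String>>, start: &str, max_depth: usize) -> HashSet<String> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(start.to_string());
    queue.push_back((start.to_string(), 0));
    while let Some((doc, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // do not expand beyond the requested depth
        }
        for next in graph.get(&doc).into_iter().flatten() {
            if seen.insert(next.clone()) {
                queue.push_back((next.clone(), depth + 1));
            }
        }
    }
    seen.remove(start);
    seen
}
```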

Filesystem manifests

Lago provides a virtual filesystem built on content-addressed nodes:

  • Trees -- directory nodes containing references to child nodes
  • Blobs -- file nodes referencing content in blob storage
  • Branches -- named pointers to tree roots, enabling Git-like branching
  • Snapshots -- immutable captures of a tree state at a point in time

This allows agents to maintain a workspace that is fully versioned and branchable without a traditional filesystem.
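A sketch of the node and branch shapes this implies (names are illustrative): because trees and blobs are content-addressed, a branch or snapshot is just a named root hash, so creating one copies a single hash rather than the tree.

```rust
use std::collections::BTreeMap;

type Hash = String; // stand-in for a SHA-256 digest

#[allow(dead_code)]
enum Node {
    Tree(BTreeMap<String, Hash>), // directory: entry name -> child node hash
    Blob(Hash),                   // file: hash of content in blob storage
}

// Branch name -> root tree hash.
struct Branches(BTreeMap<String, Hash>);

impl Branches {
    // Snapshot/branch creation is O(1): it copies only the root pointer,
    // while the underlying content-addressed nodes are shared.
    fn snapshot(&mut self, from: &str, to: &str) {
        if let Some(root) = self.0.get(from).cloned() {
            self.0.insert(to.to_string(), root);
        }
    }
}
```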

RBAC policy

Lago includes a built-in policy engine for access control defined in lago-policy:

  • 3 default roles -- admin, user, reader
  • 5 default rules -- scoping journal access, blob access, and knowledge queries per role
  • 2 hooks -- pre-append and post-append hooks for custom validation and side effects

Policies are evaluated synchronously before each operation. Denied operations return an error without modifying state.

// Policy evaluation example
let policy = Policy::default(); // 3 roles, 5 rules, 2 hooks
let result = policy.evaluate(actor_role, operation);
match result {
    PolicyResult::Allow => { /* proceed */ }
    PolicyResult::Deny(reason) => { /* return error */ }
}

Running Lago

Lago typically runs embedded within Arcan through the arcan-lago bridge. For standalone use:

cd lago
cargo run -p lagod -- --data-dir /path/to/data --port 3001

CLI

# List streams in the journal
cargo run -p lago-cli -- streams

# Count events in a stream
cargo run -p lago-cli -- count --stream my-session

# Cat events as JSON
cargo run -p lago-cli -- cat --stream my-session --format json

# Search the knowledge index
cargo run -p lago-cli -- search "event sourcing"

Critical patterns

redb is synchronous. The redb embedded database does not support async I/O. All redb operations in Lago use tokio::task::spawn_blocking to avoid blocking the async runtime. Never call redb directly from an async context.

The Journal trait uses BoxFuture for dyn-compatibility, allowing different journal implementations to be swapped at runtime (redb for production, in-memory for tests).
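A std-only sketch of the pattern (real code would likely use futures::future::BoxFuture; the trait and type names here are illustrative): a method returning a boxed future keeps the trait dyn-compatible, and an always-ready in-memory implementation can be driven without an executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// `async fn` in a trait is not dyn-compatible; a boxed future is.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

trait Journal {
    // Append a payload to a stream, returning its sequence number.
    fn append(&mut self, stream: &str, payload: Vec<u8>) -> BoxFuture<'_, u64>;
}

// In-memory implementation, e.g. for tests.
struct MemJournal {
    events: Vec<(String, Vec<u8>)>,
}

impl Journal for MemJournal {
    fn append(&mut self, stream: &str, payload: Vec<u8>) -> BoxFuture<'_, u64> {
        self.events.push((stream.to_string(), payload));
        let seq = self.events.len() as u64;
        Box::pin(async move { seq }) // immediately ready
    }
}

// Drive a future that is already ready (enough for the in-memory journal).
fn block_on_ready<T>(mut fut: BoxFuture<'_, T>) -> T {
    fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => panic!("expected a ready future"),
    }
}
```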

JWT authentication

lago-auth provides JWT validation for the HTTP API. Tokens are verified using the same signing key as the platform auth system. Per-user vault sessions allow each authenticated user to have isolated access to their own streams and blobs.
