Vigil

Observability layer — OpenTelemetry-native tracing, GenAI semantic conventions, and contract-derived instrumentation.

Vigil is the observability layer of the Agent OS -- the senses through which the system perceives its own operation. It provides OpenTelemetry-native tracing, GenAI semantic conventions, and contract-derived instrumentation that maps 1:1 to the aiOS kernel lifecycle.

Design principles

Observability should be invisible when not needed, and comprehensive when enabled. Without an OTLP endpoint configured, Vigil only sets up structured logging via tracing-subscriber. No OpenTelemetry SDK overhead, no network calls, no performance impact.

Span hierarchy derives from the kernel contract. Vigil does not invent its own span structure -- it mirrors the aiOS 8-phase tick lifecycle. Agent sessions produce invoke_agent root spans, loop phases produce child spans, and LLM calls produce chat spans with GenAI semantic attributes. This ensures observability and runtime behavior are always in sync.

Dual-write architecture. OTel trace and span IDs are written into EventEnvelope objects via write_trace_context, linking persisted Lago events to their live traces. You can follow a trace from Langfuse or Jaeger back to the exact events in the journal.
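The dual-write round trip can be sketched with the standard library alone. Everything below is a hypothetical stand-in: the real EventEnvelope lives in aios-protocol and its fields are not shown on this page, and Vigil's write_trace_context takes only the envelope (presumably reading the IDs from the active span), whereas this sketch passes them in explicitly.

```rust
// Hypothetical stand-in for the EventEnvelope type from aios-protocol.
// The real struct's fields are not documented on this page.
#[derive(Debug, Default)]
struct EventEnvelope {
    payload: String,
    trace_id: Option<String>,
    span_id: Option<String>,
}

// Illustrative trace context: a 128-bit trace ID and a 64-bit span ID,
// the sizes OpenTelemetry uses.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TraceContext {
    trace_id: u128,
    span_id: u64,
}

// Write side (assumed shape): stamp the IDs as lowercase hex, 32 and 16 chars.
fn write_trace_context(envelope: &mut EventEnvelope, ctx: TraceContext) {
    envelope.trace_id = Some(format!("{:032x}", ctx.trace_id));
    envelope.span_id = Some(format!("{:016x}", ctx.span_id));
}

// Read side (assumed shape): recover the IDs from a persisted envelope.
// Returns None if either ID is missing or is not valid hex.
fn extract_trace_context(envelope: &EventEnvelope) -> Option<TraceContext> {
    let trace_id = u128::from_str_radix(envelope.trace_id.as_deref()?, 16).ok()?;
    let span_id = u64::from_str_radix(envelope.span_id.as_deref()?, 16).ok()?;
    Some(TraceContext { trace_id, span_id })
}

fn main() {
    let ctx = TraceContext {
        trace_id: 0x4bf92f3577b34da6a3ce929d0e0e4736,
        span_id: 0x00f067aa0ba902b7,
    };
    let mut envelope = EventEnvelope {
        payload: "tool_result".to_string(),
        ..Default::default()
    };
    write_trace_context(&mut envelope, ctx);
    println!("payload={} trace_id={:?}", envelope.payload, envelope.trace_id);
    // The persisted event now carries enough to find its live trace again.
    assert_eq!(extract_trace_context(&envelope), Some(ctx));
}
```

Storing the IDs as fixed-width hex strings matches how most trace backends display them, so a journal entry can be pasted directly into a trace search box.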

Architecture

Vigil is a single crate with four modules:

config

VigConfig configures the telemetry pipeline with environment variable overrides:

  • VigConfig::for_service("arcan") -- create config for a named service
  • VigConfig::from_env() -- build config purely from environment variables
  • config.with_env_overrides() -- apply env overrides on top of programmatic values
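The layering these three constructors describe (programmatic values first, environment on top) might look roughly like the sketch below. SketchConfig, its two fields, and with_overrides are hypothetical names chosen for illustration; VigConfig's real fields are not listed on this page.

```rust
// Hypothetical config shape; not VigConfig's actual definition.
#[derive(Debug, Clone, PartialEq)]
struct SketchConfig {
    service_name: String,
    otlp_endpoint: Option<String>,
}

impl SketchConfig {
    // Programmatic defaults for a named service, mirroring for_service("arcan").
    fn for_service(name: &str) -> Self {
        Self { service_name: name.to_string(), otlp_endpoint: None }
    }

    // Apply overrides from a lookup function. Only variables that are set
    // replace the programmatic values; unset ones leave them untouched.
    fn with_overrides(mut self, lookup: impl Fn(&str) -> Option<String>) -> Self {
        if let Some(name) = lookup("OTEL_SERVICE_NAME") {
            self.service_name = name;
        }
        if let Some(endpoint) = lookup("OTEL_EXPORTER_OTLP_ENDPOINT") {
            self.otlp_endpoint = Some(endpoint);
        }
        self
    }

    // Same as above, but reading the real process environment.
    fn with_env_overrides(self) -> Self {
        self.with_overrides(|key| std::env::var(key).ok())
    }
}

fn main() {
    // Simulated environment, so the output does not depend on the shell.
    let config = SketchConfig::for_service("arcan").with_overrides(|key| match key {
        "OTEL_EXPORTER_OTLP_ENDPOINT" => Some("http://localhost:4317".to_string()),
        _ => None,
    });
    println!("{:?}", config);

    // Real environment: the result depends on what is exported.
    println!("{:?}", SketchConfig::for_service("arcan").with_env_overrides());
}
```

Taking the lookup as a closure keeps the override logic testable without mutating process-wide environment variables.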

semconv

Semantic convention constants organized into four namespaces:

| Namespace | Attributes | Purpose |
|---|---|---|
| gen_ai.* | operation name, system, model, tokens, tool name, agent | GenAI semantic conventions (OTel spec) |
| life.* | session/run/branch IDs, loop phase, operating mode, budget, state vector, tool status | Life Agent OS attributes |
| autonomic.* | economic mode, health pillars | Autonomic controller attributes |
| lago.* | stream ID, blob hash, fs branch | Lago persistence attributes |

spans

Contract-derived span builders that create properly-attributed tracing spans:

| Builder | OTel operation | Purpose |
|---|---|---|
| agent_span(session_id, agent_name) | invoke_agent | Root span for agent sessions |
| phase_span(LoopPhase) | per-phase | Child span for loop phases (perceive, deliberate, gate, execute, commit, reflect, sleep) |
| chat_span(model, provider, max_tokens, temperature) | chat | GenAI client span for LLM calls |
| tool_span(tool_name, tool_call_id) | execute_tool | GenAI span for tool calls |
| record_token_usage(span, usage) | -- | Record token counts on a chat span |
| record_finish_reason(span, reason) | -- | Record stop reason |
| write_trace_context(envelope) | -- | Write OTel trace/span IDs into an EventEnvelope (dual-write) |
| extract_trace_context(envelope) | -- | Extract trace context from persisted events |
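To make the nesting concrete, here is a toy model of the hierarchy these builders produce: one invoke_agent root, one child per loop phase, and GenAI spans beneath the phases. It uses no tracing library. The Span struct and session_tree function are hypothetical, the LoopPhase enum only mirrors the phase names listed above (the real one lives in aios-protocol), and placing chat under deliberate and execute_tool under execute is an assumption made for illustration.

```rust
// Hypothetical mirror of the phases named in the table above.
#[derive(Debug, Clone, Copy)]
enum LoopPhase { Perceive, Deliberate, Gate, Execute, Commit, Reflect, Sleep }

impl LoopPhase {
    const ALL: [LoopPhase; 7] = [
        LoopPhase::Perceive, LoopPhase::Deliberate, LoopPhase::Gate,
        LoopPhase::Execute, LoopPhase::Commit, LoopPhase::Reflect, LoopPhase::Sleep,
    ];

    fn span_name(self) -> &'static str {
        match self {
            LoopPhase::Perceive => "perceive",
            LoopPhase::Deliberate => "deliberate",
            LoopPhase::Gate => "gate",
            LoopPhase::Execute => "execute",
            LoopPhase::Commit => "commit",
            LoopPhase::Reflect => "reflect",
            LoopPhase::Sleep => "sleep",
        }
    }
}

// Toy span: just a name and its children.
struct Span {
    name: String,
    children: Vec<Span>,
}

impl Span {
    fn new(name: &str) -> Self {
        Span { name: name.to_string(), children: Vec::new() }
    }

    // Render the tree with two spaces of indentation per level.
    fn render(&self) -> String {
        let mut out = String::new();
        self.render_into(0, &mut out);
        out
    }

    fn render_into(&self, depth: usize, out: &mut String) {
        out.push_str(&"  ".repeat(depth));
        out.push_str(&self.name);
        out.push('\n');
        for child in &self.children {
            child.render_into(depth + 1, out);
        }
    }
}

// Build one session's worth of spans: root, phases, then GenAI spans.
fn session_tree() -> Span {
    let mut root = Span::new("invoke_agent");
    for phase in LoopPhase::ALL {
        let mut span = Span::new(phase.span_name());
        match phase {
            // Assumed placement of LLM and tool calls within the loop.
            LoopPhase::Deliberate => span.children.push(Span::new("chat")),
            LoopPhase::Execute => span.children.push(Span::new("execute_tool")),
            _ => {}
        }
        root.children.push(span);
    }
    root
}

fn main() {
    print!("{}", session_tree().render());
}
```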

metrics

GenAiMetrics provides pre-created OTel metric instruments:

| Instrument | Type | Description |
|---|---|---|
| gen_ai.client.token.usage | Histogram | Token counts per request (input/output breakdown) |
| gen_ai.client.operation.duration | Histogram | LLM call duration in seconds |
| life.tool.executions | Counter | Tool executions by name and status |
| life.budget.tokens_remaining | Gauge | Remaining token budget |
| life.budget.cost_remaining_usd | Gauge | Remaining cost budget |
| life.mode.transitions | Counter | Operating mode transitions |

Platform integration

Vigil works with any OpenTelemetry-compatible backend. Configure the OTLP endpoint and Vigil handles the rest.

Langfuse

export OTEL_EXPORTER_OTLP_ENDPOINT="https://cloud.langfuse.com/api/public/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64(public_key:secret_key)>"
export OTEL_SERVICE_NAME="arcan"
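The `<base64(public_key:secret_key)>` placeholder above is the standard HTTP Basic credential: the two keys joined by a colon, then base64-encoded. A dependency-free sketch of building that header value follows; the helper names and the example key strings are hypothetical, and in practice a base64 crate or the `base64` command-line tool would do the same job.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 with '=' padding (RFC 4648), written out by hand only to
// keep the example free of external crates.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // Missing input bytes are represented by '=' padding.
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

// Build the value for OTEL_EXPORTER_OTLP_HEADERS in the format shown above.
fn langfuse_auth_header(public_key: &str, secret_key: &str) -> String {
    let credentials = format!("{public_key}:{secret_key}");
    format!("Authorization=Basic {}", base64_encode(credentials.as_bytes()))
}

fn main() {
    // Placeholder keys for illustration only; substitute your project's keys.
    println!("{}", langfuse_auth_header("pk-lf-example", "sk-lf-example"));
}
```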

LangSmith

export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.smith.langchain.com/otel"
export OTEL_EXPORTER_OTLP_HEADERS="x-api-key=<langsmith_api_key>"
export OTEL_SERVICE_NAME="arcan"

Jaeger

export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_SERVICE_NAME="arcan"

Grafana Tempo

export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_SERVICE_NAME="arcan"

Environment variables

| Variable | Description | Default |
|---|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP collector endpoint (e.g. http://localhost:4317) | None (logging only) |
| OTEL_EXPORTER_OTLP_HEADERS | Comma-separated key=value pairs for OTLP headers | None |
| OTEL_SERVICE_NAME | Service identity for OTel resource | "vigil" |
| VIGIL_LOG_FORMAT | Log output format: pretty or json | pretty |
| VIGIL_CAPTURE_CONTENT | Capture prompt/completion content in spans: true/1/yes | false |
| VIGIL_SAMPLING_RATIO | Trace sampling ratio (0.0 to 1.0) | 1.0 |
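The value formats in the table can be parsed along the following lines. This is a sketch, not Vigil's parser: the function names are hypothetical, and the handling of edge cases (case-insensitive flags, clamping out-of-range ratios, falling back to 1.0 on unparseable input) is an assumption.

```rust
// Parse "k1=v1,k2=v2" into pairs. Each pair is split on the FIRST '=' only,
// because values such as base64 credentials can themselves end in '='.
fn parse_headers(raw: &str) -> Vec<(String, String)> {
    raw.split(',')
        .filter_map(|pair| pair.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

// Treat true/1/yes as enabled (case-insensitive here, which is an assumption);
// anything else is disabled.
fn parse_flag(raw: &str) -> bool {
    matches!(raw.trim().to_ascii_lowercase().as_str(), "true" | "1" | "yes")
}

// Parse a sampling ratio, clamping it into 0.0..=1.0. Unparseable or
// non-finite input falls back to the documented default of 1.0 (assumed policy).
fn parse_sampling_ratio(raw: &str) -> f64 {
    raw.trim()
        .parse::<f64>()
        .ok()
        .filter(|ratio| ratio.is_finite())
        .map(|ratio| ratio.clamp(0.0, 1.0))
        .unwrap_or(1.0)
}

fn main() {
    let headers = parse_headers("Authorization=Basic dXNlcjpwYXNz,x-api-key=abc");
    for (key, value) in &headers {
        println!("{key}: {value}");
    }
    println!("capture content: {}", parse_flag("yes"));
    println!("sampling ratio: {}", parse_sampling_ratio("0.25"));
}
```

Splitting on the first `=` matters most for the Langfuse header, where the base64 credential frequently ends in one or two padding characters.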

Graceful degradation: Without OTEL_EXPORTER_OTLP_ENDPOINT, Vigil only configures tracing-subscriber for structured logging. No OTel SDK is initialized, and there is no performance overhead from telemetry.

Dependencies

Vigil depends only on aios-protocol from the kernel contract. It has no dependency on Arcan, Lago, Autonomic, Praxis, or Spaces. This makes it safe to use from any subsystem without introducing circular dependencies.

aios-protocol (canonical contract — EventEnvelope, LoopPhase, TokenUsage)
  └── vigil (observability — tracing + metrics + GenAI conventions)

Troubleshooting

No spans appearing in Langfuse or LangSmith

  1. Check that OTEL_EXPORTER_OTLP_ENDPOINT is set correctly (include the full URL path)
  2. Verify OTEL_EXPORTER_OTLP_HEADERS has valid auth credentials
  3. Ensure the VigGuard is kept alive for the application lifetime (it flushes on drop)
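Step 3 is the easiest to get wrong, because a guard that is dropped early flushes early and nothing recorded afterwards is exported. The sketch below imitates the flush-on-drop pattern with a hypothetical SketchGuard; it is not VigGuard's implementation, and the init and record functions are invented for the example.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for VigGuard: buffers span names and hands them to
// the "exporter" (a shared Vec here) only when the guard is dropped.
struct SketchGuard {
    pending: Vec<String>,
    exported: Arc<Mutex<Vec<String>>>,
}

impl SketchGuard {
    fn record(&mut self, span_name: &str) {
        self.pending.push(span_name.to_string());
    }
}

impl Drop for SketchGuard {
    fn drop(&mut self) {
        // Flush whatever is still buffered when the guard goes out of scope.
        self.exported.lock().unwrap().append(&mut self.pending);
    }
}

fn init(exported: Arc<Mutex<Vec<String>>>) -> SketchGuard {
    SketchGuard { pending: Vec::new(), exported }
}

fn main() {
    let exported = Arc::new(Mutex::new(Vec::new()));
    {
        // Bind the guard to a named variable so it lives until the end of
        // the scope. Writing `let _ = init(...)` would drop it immediately.
        let mut guard = init(Arc::clone(&exported));
        guard.record("invoke_agent");
        println!("before drop: {}", exported.lock().unwrap().len()); // prints 0
    }
    println!("after drop: {}", exported.lock().unwrap().len()); // prints 1
}
```

In a real application the equivalent is to hold the guard in a variable in `main` for the whole run, rather than creating it inside a helper function that returns before the work is done.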

"failed to initialize tracing subscriber" error

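This happens when a global tracing subscriber is installed more than once in the same process, for example by calling tracing_subscriber::registry().try_init() a second time. The global subscriber can only be set once, so the second call returns an error instead of replacing it. In tests, where several cases may initialize telemetry in one process, Vigil uses try_init() so that this error can be tolerated rather than causing a panic.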
