Observability & Metrics

The Semantic Observability extension provides structured tracing of agent operations, Prometheus-format metrics export, and an aggregation query engine. All trace events are stored in Cortex as RDF triples for causal analysis.

TraceEmitter

The TraceEmitter buffers trace events in memory and flushes them to the Cortex /api/v1/events endpoint in batches.

Emit methods

Method	Event type	Key fields
`emitToolCall()`	`tool_call`	toolName, input, output, durationMs
`emitLLMCall()`	`llm_call`	model, promptTokens, completionTokens, durationMs
`emitDecision()`	`decision`	description, alternatives, chosen, reasoning
`emitDelegation()`	`delegation`	parentId, childId, task, runId
`emitError()`	`error`	error, context
`emitRaw()`	any	Push a pre-built event (used by SessionForkManager)

Buffer management

Maximum buffer size: 5,000 events; when the buffer is full, the oldest events are dropped first (FIFO)
Event TTL: 5 minutes; stale events are discarded during flush
Flush interval: set by configuration; there is no single fixed default
Backoff: on flush failure, the flush interval doubles, capped at 60 seconds
Final flush: on stop(), the emitter attempts to flush the remaining events; any that cannot be flushed are dropped with a warning
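
The buffer rules above can be sketched as a small class. This is a minimal illustration, not the TraceEmitter implementation: the class name `BoundedTraceBuffer`, the `BufferedEvent` shape, and the reset of the interval after a successful flush are all assumptions made for the example.

```typescript
// Hypothetical event shape; the real TraceEmitter event type may differ.
type BufferedEvent = { id: string; type: string; timestampMs: number };

const MAX_BUFFER_SIZE = 5_000;        // maximum buffered events
const EVENT_TTL_MS = 5 * 60 * 1_000;  // 5 minutes
const MAX_FLUSH_INTERVAL_MS = 60_000; // backoff ceiling

class BoundedTraceBuffer {
  private events: BufferedEvent[] = [];
  private readonly baseIntervalMs: number;
  private intervalMs: number;

  constructor(baseIntervalMs: number) {
    this.baseIntervalMs = baseIntervalMs;
    this.intervalMs = baseIntervalMs;
  }

  // Append an event; when the buffer is full, drop the oldest one first (FIFO).
  push(event: BufferedEvent): void {
    if (this.events.length >= MAX_BUFFER_SIZE) {
      this.events.shift();
    }
    this.events.push(event);
  }

  // Empty the buffer and return only the events still within the TTL.
  drain(nowMs: number): BufferedEvent[] {
    const fresh = this.events.filter((e) => nowMs - e.timestampMs <= EVENT_TTL_MS);
    this.events = [];
    return fresh;
  }

  // Adjust the flush interval after a flush attempt.
  // A failure doubles it up to the ceiling; resetting on success is an assumption.
  recordFlushResult(ok: boolean): void {
    this.intervalMs = ok
      ? this.baseIntervalMs
      : Math.min(this.intervalMs * 2, MAX_FLUSH_INTERVAL_MS);
  }

  get size(): number {
    return this.events.length;
  }

  get currentIntervalMs(): number {
    return this.intervalMs;
  }
}
```

Passing the current time into `drain()` rather than reading the clock inside the class keeps the TTL logic easy to test.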

RDF serialization

Events are serialized with namespace-prefixed subjects:

subject: {ns}:event:{uuid}
type: tool_call
agentId: orchestrator
timestamp: 2026-03-08T12:00:00.000Z
fields: { toolName: "Read", input: "...", output: "..." }
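
As a rough illustration of how one event could map onto triples, the sketch below builds the `{ns}:event:{uuid}` subject and emits one triple per attribute. The predicate names (`{ns}:type`, `{ns}:field:{key}`, and so on), the `Triple` type, and the `eventToTriples` function are hypothetical; the actual predicates used in Cortex are not specified here.

```typescript
// Hypothetical shapes for illustration only.
type TraceEventInput = {
  id: string;                      // UUID of the event
  type: string;                    // e.g. "tool_call"
  agentId: string;
  timestamp: string;               // ISO 8601
  fields: Record<string, unknown>; // type-specific payload
};

type Triple = { subject: string; predicate: string; object: string };

// Convert one event into triples that share a namespace-prefixed subject.
function eventToTriples(ns: string, event: TraceEventInput): Triple[] {
  const subject = `${ns}:event:${event.id}`;
  const triples: Triple[] = [
    { subject, predicate: `${ns}:type`, object: event.type },
    { subject, predicate: `${ns}:agentId`, object: event.agentId },
    { subject, predicate: `${ns}:timestamp`, object: event.timestamp },
  ];
  for (const [key, value] of Object.entries(event.fields)) {
    triples.push({
      subject,
      // Assumed predicate naming; non-string values are stored as JSON text.
      predicate: `${ns}:field:${key}`,
      object: typeof value === "string" ? value : JSON.stringify(value) ?? "null",
    });
  }
  return triples;
}
```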

MetricsExporter

The MetricsExporter collects counters and gauges, exporting them in Prometheus text exposition format with no external dependencies.

Registered metrics

Metric	Type	Labels	Description
`mayros_tool_calls_total`	counter	`tool_name`	Total tool calls by tool name
`mayros_llm_calls_total`	counter	`model`	Total LLM calls by model
`mayros_llm_tokens_total`	counter	`direction`	Total LLM tokens by direction (prompt or completion)
`mayros_skill_queries_total`	counter	`tool`	Skill graph queries
`mayros_cortex_requests_total`	counter	`status`	Cortex requests by success/error
`mayros_active_skills`	gauge	—	Number of active skills

Prometheus output

# HELP mayros_tool_calls_total Total tool calls by tool name
# TYPE mayros_tool_calls_total counter
mayros_tool_calls_total{tool_name="Read"} 42
mayros_tool_calls_total{tool_name="Write"} 15

# HELP mayros_llm_tokens_total Total LLM tokens by direction
# TYPE mayros_llm_tokens_total counter
mayros_llm_tokens_total{direction="prompt"} 125000
mayros_llm_tokens_total{direction="completion"} 45000

The metrics endpoint is registered at the configured path (e.g., /metrics) when metrics.enabled is true.
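
A dependency-free counter that renders the text exposition format shown above might look like the following. This is a sketch under assumptions: `SimpleCounter`, `renderLabels`, and `escapeLabelValue` are hypothetical names, not the MetricsExporter API. The escaping of backslash, double quote, and line feed in label values follows the Prometheus text format rules.

```typescript
type Labels = Record<string, string>;

// Escape label values as the Prometheus text format requires:
// backslash, double quote, and line feed.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render a label set as {a="1",b="2"} with sorted keys, or "" when empty.
function renderLabels(labels: Labels): string {
  const parts = Object.entries(labels)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => `${key}="${escapeLabelValue(value)}"`);
  return parts.length > 0 ? `{${parts.join(",")}}` : "";
}

class SimpleCounter {
  private readonly name: string;
  private readonly help: string;
  // Samples keyed by their rendered label set, in insertion order.
  private readonly samples = new Map<string, number>();

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  // Increase the sample for the given label set (default increment is 1).
  inc(labels: Labels = {}, amount = 1): void {
    const key = renderLabels(labels);
    this.samples.set(key, (this.samples.get(key) ?? 0) + amount);
  }

  // Produce the HELP line, the TYPE line, and one line per sample.
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.samples.forEach((value, key) => {
      lines.push(`${this.name}${key} ${value}`);
    });
    return lines.join("\n") + "\n";
  }
}

const toolCalls = new SimpleCounter("mayros_tool_calls_total", "Total tool calls by tool name");
toolCalls.inc({ tool_name: "Read" }, 42);
toolCalls.inc({ tool_name: "Write" }, 15);
console.log(toolCalls.render());
```

Keying samples by the rendered label string keeps the counter to a single map lookup per increment, at the cost of re-rendering the labels on every call.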

ObservabilityQueryEngine

The query engine provides aggregation and analysis over stored trace events.

AgentStats

```typescript
type AgentStats = {
  agentId: string;
  totalEvents: number;
  toolCalls: number;
  llmCalls: number;
  decisions: number;
  delegations: number;
  errors: number;
  avgToolDurationMs: number;
  avgLLMDurationMs: number;
};
```

Query methods

Method	Description
`aggregateStats(agentId, timeRange?)`	Aggregate event counts and average durations
`findSlowOps(agentId, thresholdMs)`	Find tool and LLM calls whose duration exceeds thresholdMs
`findErrors(agentId, limit?)`	Group and rank error patterns by frequency
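
To make the aggregation concrete, here is a sketch that computes an `AgentStats` value and a slow-operation list from an in-memory array. It is an illustration only: the real engine queries Cortex, and the `StoredEvent` shape, the standalone functions `computeAgentStats` and `filterSlowOps`, and the choice to report an average of 0 when no durations exist are assumptions.

```typescript
type AgentStats = {
  agentId: string;
  totalEvents: number;
  toolCalls: number;
  llmCalls: number;
  decisions: number;
  delegations: number;
  errors: number;
  avgToolDurationMs: number;
  avgLLMDurationMs: number;
};

// Hypothetical in-memory event shape.
type StoredEvent = {
  agentId: string;
  type: "tool_call" | "llm_call" | "decision" | "delegation" | "error";
  timestamp: string;   // ISO 8601
  durationMs?: number; // present on tool_call and llm_call
};

type TimeRange = { from: string; to: string };

// Arithmetic mean; returns 0 for an empty list (an assumption).
function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((a, b) => a + b, 0) / values.length;
}

function computeAgentStats(events: StoredEvent[], agentId: string, range?: TimeRange): AgentStats {
  // Keep this agent's events, restricted to the optional inclusive time range.
  const selected = events.filter((e) => {
    if (e.agentId !== agentId) return false;
    if (!range) return true;
    const t = Date.parse(e.timestamp);
    return t >= Date.parse(range.from) && t <= Date.parse(range.to);
  });
  const count = (type: StoredEvent["type"]): number =>
    selected.filter((e) => e.type === type).length;
  const durations = (type: StoredEvent["type"]): number[] =>
    selected
      .filter((e) => e.type === type)
      .map((e) => e.durationMs)
      .filter((d): d is number => d !== undefined);

  return {
    agentId,
    totalEvents: selected.length,
    toolCalls: count("tool_call"),
    llmCalls: count("llm_call"),
    decisions: count("decision"),
    delegations: count("delegation"),
    errors: count("error"),
    avgToolDurationMs: mean(durations("tool_call")),
    avgLLMDurationMs: mean(durations("llm_call")),
  };
}

// Tool and LLM calls for one agent whose duration exceeds the threshold.
function filterSlowOps(events: StoredEvent[], agentId: string, thresholdMs: number): StoredEvent[] {
  return events.filter(
    (e) =>
      e.agentId === agentId &&
      (e.type === "tool_call" || e.type === "llm_call") &&
      e.durationMs !== undefined &&
      e.durationMs > thresholdMs,
  );
}
```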

Error patterns

The findErrors method groups errors by message and returns:

```typescript
type ErrorPattern = {
  error: string;    // Error message
  count: number;    // Occurrence count
  lastSeen: string; // ISO timestamp of last occurrence
  agentId: string;
};
```
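
Grouping by message and ranking by frequency can be sketched as follows. The `ErrorEvent` input shape, the function name `groupErrorPatterns`, and the default `limit` of 10 are assumptions for the example; only the `ErrorPattern` fields come from the type above.

```typescript
type ErrorPattern = {
  error: string;    // Error message
  count: number;    // Occurrence count
  lastSeen: string; // ISO timestamp of last occurrence
  agentId: string;
};

// Hypothetical input shape for stored error events.
type ErrorEvent = { agentId: string; error: string; timestamp: string };

function groupErrorPatterns(events: ErrorEvent[], agentId: string, limit = 10): ErrorPattern[] {
  const byMessage = new Map<string, ErrorPattern>();
  for (const e of events) {
    if (e.agentId !== agentId) continue;
    const existing = byMessage.get(e.error);
    if (!existing) {
      byMessage.set(e.error, { error: e.error, count: 1, lastSeen: e.timestamp, agentId });
    } else {
      existing.count += 1;
      // Keep the most recent timestamp seen for this message.
      if (Date.parse(e.timestamp) > Date.parse(existing.lastSeen)) {
        existing.lastSeen = e.timestamp;
      }
    }
  }
  // Most frequent patterns first, truncated to the requested limit.
  return Array.from(byMessage.values())
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}
```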

Agent tools

Tool	Description
`trace_query`	Query trace events, optionally filtered by agent, time range, and event type, in a selectable output format
`trace_explain`	Explain why an event occurred by walking its causal chain
`trace_stats`	Show aggregated statistics for an agent
`trace_session_fork`	Fork a session into a new session
`trace_session_rewind`	Rewind a session to a timestamp

Output formats

The trace_query and trace_stats tools support multiple output formats:

terminal — formatted for console display
json — raw JSON output
markdown — formatted for documentation

Hooks wiring

Hook	Condition	Action
`after_tool_call`	`captureToolCalls`	Emit tool_call event + increment metrics
`llm_input`	`captureLLMCalls`	Start LLM call timer
`llm_output`	`captureLLMCalls`	Complete timer, emit llm_call event + token metrics
`subagent_spawned`	`captureDelegations`	Record delegation, start run timer
`subagent_ended`	`captureDelegations`	Complete delegation, emit error if failed
`agent_end`	tracing enabled	Emit error event if agent run failed
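
The `llm_input` / `llm_output` pairing in the table amounts to starting a timer on input and completing it on output. The sketch below shows one way to do that; the `LlmCallTimer` class, the `callId` correlation key, and the `usage` parameter shape are hypothetical, since the hook payloads are not described here.

```typescript
type LlmCallEvent = {
  type: "llm_call";
  model: string;
  promptTokens: number;
  completionTokens: number;
  durationMs: number;
};

type LlmUsage = { model: string; promptTokens: number; completionTokens: number };

class LlmCallTimer {
  // Start times in milliseconds, keyed by a hypothetical call ID.
  private readonly startedAt = new Map<string, number>();

  // Would be called from the llm_input hook.
  start(callId: string, nowMs: number): void {
    this.startedAt.set(callId, nowMs);
  }

  // Would be called from the llm_output hook. Returns the event to emit,
  // or undefined when no matching start was recorded.
  complete(callId: string, nowMs: number, usage: LlmUsage): LlmCallEvent | undefined {
    const start = this.startedAt.get(callId);
    if (start === undefined) return undefined;
    this.startedAt.delete(callId);
    return { type: "llm_call", ...usage, durationMs: nowMs - start };
  }
}
```

Deleting the entry on completion keeps the map from growing when many calls are made over a long session.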

CLI

```bash
mayros observe status                    # Show config, Cortex status, buffered events
mayros observe events [--agent id] [--type tool_call] [--from iso] [--to iso]
mayros observe explain <eventId>         # Causal chain analysis
mayros observe stats [--agent id] [--format json]
```

Related

Decision Graph: causal chain analysis and session trees
Session Fork: fork and rewind sessions
Token Economy: budget tracking and cost metrics
Cortex: AIngle Cortex knowledge graph