Cortex Resilience

All communication between Mayros and Cortex is wrapped in resilience layers: a circuit breaker, retry logic, auto-restart, and graceful degradation.

Circuit breaker

The circuit breaker prevents cascading failures when Cortex is unhealthy:

```mermaid
stateDiagram-v2
    [*] --> Closed
    Closed --> Open : failures >= threshold (5)
    Open --> HalfOpen : reset timeout (30s)
    HalfOpen --> Closed : 2 consecutive successes
    HalfOpen --> Open : any failure
```

| State | Behavior |
| --- | --- |
| Closed | Requests pass through normally. Failures are counted. |
| Open | All requests are blocked immediately. Waits for reset timeout. |
| Half-open | Test requests are allowed. 2 consecutive successes close the circuit. |
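
The state machine above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the Mayros implementation: the class name, method names, and the injectable clock are hypothetical, and resetting the failure count after a success in the closed state is an assumption the source does not confirm.

```typescript
type CircuitState = "closed" | "open" | "half-open";

// Hypothetical sketch of the breaker described above (names are illustrative).
class CircuitBreaker {
  private state: CircuitState = "closed";
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(
    private readonly threshold = 5,        // failures before opening
    private readonly resetMs = 30000,      // time before the half-open test
    private readonly successesToClose = 2, // consecutive successes to close
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  // Returns true if a request may proceed. Moves open -> half-open
  // once the reset timeout has elapsed.
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetMs) {
      this.state = "half-open";
      this.successes = 0;
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    if (this.state === "half-open") {
      this.successes += 1;
      if (this.successes >= this.successesToClose) {
        this.state = "closed";
        this.failures = 0;
      }
    } else if (this.state === "closed") {
      this.failures = 0; // assumption: a success clears the failure count
    }
  }

  recordFailure(): void {
    if (this.state === "half-open") {
      this.trip(); // any failure during the test phase reopens the circuit
      return;
    }
    if (this.state === "closed") {
      this.failures += 1;
      if (this.failures >= this.threshold) this.trip();
    }
  }

  getState(): CircuitState {
    return this.state;
  }

  private trip(): void {
    this.state = "open";
    this.openedAt = this.now();
  }
}
```

Injecting the clock keeps the timeout transition testable without waiting 30 seconds in real time.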

resilientFetch

Every Cortex HTTP call goes through resilientFetch(), which layers:

  1. Request timeout — aborts after 5s (configurable)
  2. Circuit breaker check — blocks if circuit is open
  3. Retry with backoff — retries on 500+ errors and connection failures (up to 2 retries)
  4. Jitter — adds 0–30% random delay to prevent thundering herd
  5. Circuit state update — records success or failure
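
As a rough illustration of how these layers might compose, the sketch below wraps an arbitrary call with a breaker check, retries, and jittered backoff. Everything here is hypothetical: the names `withResilience`, `backoffDelay`, and `Breaker` are not from Mayros, doubling the delay per attempt is an assumption (the source only says "backoff"), and recording one failure per exhausted request rather than per attempt is also a guess. The request timeout is left to the `call` function.

```typescript
interface ResilienceOptions {
  maxRetries: number;   // 2 in the configuration shown on this page
  retryDelayMs: number; // 300 in the configuration shown on this page
}

// Minimal breaker surface this sketch relies on (hypothetical).
interface Breaker {
  canRequest(): boolean;
  recordSuccess(): void;
  recordFailure(): void;
}

// Delay before retry number `attempt` (0-based): base delay, doubled per
// attempt (assumption), plus 0-30% random jitter.
function backoffDelay(
  attempt: number,
  baseMs: number,
  random: () => number = Math.random,
): number {
  const exponential = baseMs * Math.pow(2, attempt);
  return Math.round(exponential * (1 + random() * 0.3));
}

async function withResilience<T extends { status: number }>(
  call: () => Promise<T>,
  breaker: Breaker,
  opts: ResilienceOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  // Circuit breaker check: fail fast while the circuit is open.
  if (!breaker.canRequest()) throw new Error("circuit open");

  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      const res = await call();
      if (res.status < 500) {
        breaker.recordSuccess();
        return res;
      }
      lastError = new Error(`HTTP ${res.status}`); // retryable server error
    } catch (err) {
      lastError = err; // connection failure or timeout
    }
    if (attempt < opts.maxRetries) {
      await sleep(backoffDelay(attempt, opts.retryDelayMs));
    }
  }
  // Circuit state update: all attempts failed.
  breaker.recordFailure();
  throw lastError;
}
```

With the values listed on this page, the first retry in this sketch would wait between 300 ms and 390 ms and the second between 600 ms and 780 ms.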

Auto-restart

The Cortex sidecar process is monitored for crashes:

  • On unexpected exit, Mayros auto-restarts the sidecar
  • Up to 3 restart attempts with exponential backoff
  • Health is verified via GET /health polling after each restart
  • A lock file (~/.mayros/cortex-data/.cortex.lock) prevents concurrent instances
  • The lock is reclaimed automatically on self-restart (same parent PID) or when the process that holds it is detected as stale
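
The restart loop described above might look roughly like the following. This is a sketch under stated assumptions: `RestartDeps`, `restartWithBackoff`, and the 1-second base delay are hypothetical, and the health check is injected so that it can stand in for polling GET /health.

```typescript
// Hypothetical dependencies, injected so the loop can be tested in isolation.
interface RestartDeps {
  spawnSidecar: () => Promise<void>;    // starts the Cortex sidecar
  isHealthy: () => Promise<boolean>;    // e.g. polls GET /health
  sleep: (ms: number) => Promise<void>; // waits between attempts
}

// Tries up to `maxAttempts` restarts, doubling the wait before each one.
// Resolves to true once a restart passes its health check.
async function restartWithBackoff(
  deps: RestartDeps,
  maxAttempts = 3,
  baseDelayMs = 1000, // assumption: the source does not state the base delay
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await deps.sleep(baseDelayMs * Math.pow(2, attempt));
    try {
      await deps.spawnSidecar();
      if (await deps.isHealthy()) return true;
    } catch {
      // Spawn failed; fall through to the next attempt.
    }
  }
  return false; // caller can switch to degraded mode
}
```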

Port conflict detection

Before spawning, Mayros probes the configured port via TCP:

  • If an external Cortex is already listening, Mayros attaches to it instead of spawning a new process
  • If the port is occupied by a non-Cortex process, startup fails with a clear error message
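
A TCP probe of this kind can be written with Node's built-in `net` module. The sketch below is illustrative only: `probePort` and `decideStartup` are hypothetical names, and how Mayros tells a Cortex listener apart from another process is not described here, so that check is passed in as a plain boolean.

```typescript
import * as net from "node:net";

// Resolves to true if something accepts TCP connections on host:port.
function probePort(port: number, host = "127.0.0.1", timeoutMs = 1000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ port, host });
    const finish = (listening: boolean) => {
      socket.destroy();
      resolve(listening);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false)); // e.g. ECONNREFUSED
  });
}

type StartupDecision = "spawn" | "attach" | "fail";

// Maps the probe result to the behavior described above.
function decideStartup(portInUse: boolean, isCortex: boolean): StartupDecision {
  if (!portInUse) return "spawn";     // port is free: start a new sidecar
  return isCortex ? "attach" : "fail"; // reuse external Cortex, else abort
}
```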

Flush before update

During a binary update (mayros update), data is preserved by the following sequence:

  1. POST /api/v1/flush persists graph and Ineru snapshots to disk
  2. Pending write queue is drained (10s timeout)
  3. The sidecar is stopped with SIGTERM, escalating to SIGKILL if it has not exited within 10s
  4. Binary is replaced, sidecar restarts with same --db path
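
The stop step in this sequence follows a common escalation pattern: ask the process to exit, then force it if it does not comply in time. A minimal sketch, assuming a process handle with `kill` and an `exit` event (Node's `ChildProcess` has this shape); `Killable` and `stopGracefully` are hypothetical names.

```typescript
// Minimal process surface needed here; Node's ChildProcess is compatible.
interface Killable {
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  once(event: "exit", listener: () => void): unknown;
}

// Sends SIGTERM, then SIGKILL if the process is still alive after `graceMs`.
// Resolves with the last signal that was sent before the process exited.
function stopGracefully(
  proc: Killable,
  graceMs = 10000,
): Promise<"SIGTERM" | "SIGKILL"> {
  return new Promise((resolve) => {
    let lastSignal: "SIGTERM" | "SIGKILL" = "SIGTERM";
    const timer = setTimeout(() => {
      lastSignal = "SIGKILL";
      proc.kill("SIGKILL"); // grace period elapsed: force termination
    }, graceMs);
    proc.once("exit", () => {
      clearTimeout(timer);
      resolve(lastSignal);
    });
    proc.kill("SIGTERM"); // polite request first
  });
}
```

Flushing and draining before this step matters because a SIGKILL gives the sidecar no chance to persist anything itself.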

Graceful degradation

When Cortex is unavailable, features degrade gracefully:

  • Semantic memory: tools return empty results instead of errors
  • Knowledge graph: queries return no triples
  • P2P sync: sync operations are skipped
  • Observability: trace events are buffered locally
  • Rules engine: rules are not applied

The agent continues functioning — Cortex-dependent features simply become no-ops.
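
One way to get this behavior is to wrap each Cortex-backed call in a helper that substitutes a neutral value on failure. The sketch below shows the idea for semantic memory; `withFallback`, `searchMemory`, and `MemoryHit` are hypothetical names, not the Mayros API.

```typescript
// Runs `op`; if it throws (for example because Cortex is down or the
// circuit is open), returns `fallback` instead of propagating the error.
async function withFallback<T>(op: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await op();
  } catch {
    return fallback;
  }
}

// Hypothetical shape of a semantic-memory search result.
interface MemoryHit {
  id: string;
  text: string;
  score: number;
}

// A memory tool that yields an empty result set when Cortex is unavailable.
async function searchMemory(
  query: string,
  cortexSearch: (q: string) => Promise<MemoryHit[]>,
): Promise<MemoryHit[]> {
  return withFallback<MemoryHit[]>(() => cortexSearch(query), []);
}
```

The trade-off of this pattern is that callers cannot tell "no results" from "Cortex unavailable" unless the failure is also logged or surfaced elsewhere.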

Configuration

```json5
{
  cortex: {
    resilience: {
      timeoutMs: 5000,          // Request timeout
      maxRetries: 2,            // Retry attempts
      retryDelayMs: 300,        // Base delay between retries
      circuitThreshold: 5,      // Failures before opening circuit
      circuitResetMs: 30000,    // Time before half-open test
    },
  },
}
```
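
For code that consumes these settings, a typed shape with defaults can be handy. The sketch below assumes the values shown above are the defaults, which this page implies but does not state outright; `ResilienceConfig` and `resolveResilience` are hypothetical names.

```typescript
interface ResilienceConfig {
  timeoutMs: number;        // request timeout
  maxRetries: number;       // retry attempts
  retryDelayMs: number;     // base delay between retries
  circuitThreshold: number; // failures before opening the circuit
  circuitResetMs: number;   // time before the half-open test
}

// Assumption: these mirror the values in the configuration example.
const DEFAULT_RESILIENCE: ResilienceConfig = {
  timeoutMs: 5000,
  maxRetries: 2,
  retryDelayMs: 300,
  circuitThreshold: 5,
  circuitResetMs: 30000,
};

// Merges user overrides on top of the defaults.
function resolveResilience(overrides: Partial<ResilienceConfig> = {}): ResilienceConfig {
  return { ...DEFAULT_RESILIENCE, ...overrides };
}
```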

Cortex is optional — Mayros works without it, but semantic memory, P2P sync, and knowledge graph features require a running Cortex instance.