Cortex Resilience
All communication between Mayros and Cortex is wrapped in resilience layers: a circuit breaker, retry logic, auto-restart, and graceful degradation.
Circuit breaker
The circuit breaker prevents cascading failures when Cortex is unhealthy:
```mermaid
stateDiagram-v2
    [*] --> Closed
    Closed --> Open : failures >= threshold (5)
    Open --> HalfOpen : reset timeout (30s)
    HalfOpen --> Closed : 2 consecutive successes
    HalfOpen --> Open : any failure
```
| State | Behavior |
|---|---|
| Closed | Requests pass through normally. Failures are counted. |
| Open | All requests are blocked immediately without reaching Cortex. After the reset timeout (30s), the circuit moves to half-open. |
| Half-open | Test requests are allowed. 2 consecutive successes close the circuit; any failure reopens it. |
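The state table above can be sketched as a small state machine. This is a minimal illustration, not the Mayros implementation: the class name, method names, and the injectable clock are hypothetical, and resetting the failure count on a success while closed is an assumption the text does not confirm.

```typescript
type CircuitState = "closed" | "open" | "half-open";

// Hypothetical sketch of the breaker described above; names are illustrative.
class CircuitBreaker {
  private state: CircuitState = "closed";
  private failures = 0;
  private halfOpenSuccesses = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly resetMs: number;
  private readonly successesToClose: number;
  private readonly now: () => number;

  constructor(
    threshold = 5, // failures before opening
    resetMs = 30000, // time spent open before a half-open test
    successesToClose = 2, // consecutive successes needed in half-open
    now: () => number = () => Date.now(), // injectable clock (assumption, eases testing)
  ) {
    this.threshold = threshold;
    this.resetMs = resetMs;
    this.successesToClose = successesToClose;
    this.now = now;
  }

  // Returns true if a request may be sent right now.
  allowRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetMs) {
      this.state = "half-open";
      this.halfOpenSuccesses = 0;
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    if (this.state === "half-open") {
      this.halfOpenSuccesses += 1;
      if (this.halfOpenSuccesses >= this.successesToClose) {
        this.state = "closed";
        this.failures = 0;
      }
    } else if (this.state === "closed") {
      this.failures = 0; // assumption: only consecutive failures count
    }
  }

  recordFailure(): void {
    if (this.state === "half-open") {
      this.trip(); // any failure during the test reopens the circuit
    } else if (this.state === "closed") {
      this.failures += 1;
      if (this.failures >= this.threshold) this.trip();
    }
  }

  getState(): CircuitState {
    return this.state;
  }

  private trip(): void {
    this.state = "open";
    this.openedAt = this.now();
  }
}
```

A caller would check `allowRequest()` before each Cortex call and report the outcome with `recordSuccess()` or `recordFailure()` afterwards.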
resilientFetch
Every Cortex HTTP call goes through resilientFetch(), which layers:
- Request timeout — aborts after 5s (configurable)
- Circuit breaker check — blocks if circuit is open
- Retry with backoff — retries on 500+ errors and connection failures (up to 2 retries)
- Jitter — adds 0–30% random delay to prevent thundering herd
- Circuit state update — records success or failure
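The retry and jitter layers listed above might look roughly like the sketch below. It is not the actual `resilientFetch()` code: the function and option names are hypothetical, the timeout and circuit breaker layers are left out for brevity, and doubling the delay on each retry is an assumption (the text says "backoff" without naming the curve).

```typescript
// Hypothetical option names; the numbers mirror the values quoted on this page.
interface RetryOptions {
  maxRetries: number; // up to 2 retries
  retryDelayMs: number; // base delay between retries
  jitterRatio: number; // 0.3 adds 0-30% random delay
}

const DEFAULT_RETRY: RetryOptions = { maxRetries: 2, retryDelayMs: 300, jitterRatio: 0.3 };

// HTTP 500 and above are retried; connection failures arrive as thrown errors.
function isRetryableStatus(status: number): boolean {
  return status >= 500;
}

// Delay before retry number `attempt` (1-based).
// Assumption: the base delay doubles on each retry.
function backoffDelayMs(
  attempt: number,
  opts: RetryOptions = DEFAULT_RETRY,
  random: () => number = Math.random,
): number {
  const base = opts.retryDelayMs * 2 ** (attempt - 1);
  const jitter = base * opts.jitterRatio * random();
  return Math.round(base + jitter);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// `send` performs one attempt and resolves with the HTTP status code.
// It is injected so the sketch does not depend on a specific HTTP client.
async function sendWithRetry(
  send: () => Promise<number>,
  opts: RetryOptions = DEFAULT_RETRY,
): Promise<number> {
  for (let attempt = 0; ; attempt++) {
    try {
      const status = await send();
      if (!isRetryableStatus(status) || attempt >= opts.maxRetries) return status;
    } catch (err) {
      // Connection failure: give up once the retry budget is spent.
      if (attempt >= opts.maxRetries) throw err;
    }
    await sleep(backoffDelayMs(attempt + 1, opts));
  }
}
```

Spreading retries with random jitter keeps many clients that failed at the same moment from retrying in lockstep.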
Auto-restart
The Cortex sidecar process is monitored for crashes:
- On unexpected exit, Mayros auto-restarts the sidecar
- Up to 3 restart attempts with exponential backoff
- Health is verified via `GET /health` polling after each restart
- Lock file (`~/.mayros/cortex-data/.cortex.lock`) prevents concurrent instances
- Lock is automatically reclaimed on self-restart (same parent PID) or stale process detection
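The restart budget and the lock reclaim rule above can be expressed as two small decision functions. Everything here is a sketch under assumptions: the lock file's actual contents are not documented on this page, so the `LockInfo` shape is hypothetical, as are the function names and the 1000 ms base delay.

```typescript
// Hypothetical view of what the lock file records; the real format may differ.
interface LockInfo {
  pid: number; // PID of the sidecar that holds the lock
  parentPid: number; // PID of the Mayros process that spawned it
}

// Decide whether the current Mayros process may take over an existing lock.
function canReclaimLock(
  lock: LockInfo,
  selfPid: number,
  isAlive: (pid: number) => boolean, // injected liveness check (assumption)
): boolean {
  if (lock.parentPid === selfPid) return true; // self-restart: same parent PID
  return !isAlive(lock.pid); // stale lock: the holder is gone
}

// Delay before restart attempt `attempt` (1-based), or null once the budget
// of 3 attempts is used up. The 1000 ms base is an illustrative assumption.
function restartDelayMs(attempt: number, baseMs = 1000, maxAttempts = 3): number | null {
  if (attempt > maxAttempts) return null;
  return baseMs * 2 ** (attempt - 1);
}
```

With these defaults the three attempts would wait 1 s, 2 s, and 4 s, after which the supervisor stops trying.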
Port conflict detection
Before spawning, Mayros probes the configured port via TCP:
- If an external Cortex is already listening, Mayros attaches to it instead of spawning a new process
- If the port is occupied by a non-Cortex process, startup fails with a clear error message
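The two rules above, plus the default case of a free port, reduce to a three-way decision. The sketch below only models that decision: the TCP probe itself is not shown, the type and function names are hypothetical, and how Mayros tells a Cortex listener from another process is not stated here, so `isCortex` is simply taken as an input.

```typescript
// Result of probing the configured port (hypothetical shape).
type PortProbe =
  | { listening: false }
  | { listening: true; isCortex: boolean };

type StartupAction =
  | { kind: "spawn" } // port is free: start a new sidecar
  | { kind: "attach" } // an external Cortex is already listening
  | { kind: "fail"; message: string }; // something else owns the port

function decideStartup(port: number, probe: PortProbe): StartupAction {
  if (probe.listening === false) return { kind: "spawn" };
  if (probe.isCortex) return { kind: "attach" };
  return {
    kind: "fail",
    message: `Port ${port} is in use by a non-Cortex process`,
  };
}
```

Keeping the decision separate from the probe makes each outcome easy to exercise without opening real sockets.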
Flush before update
During binary updates (mayros update), data is preserved:
- `POST /api/v1/flush` persists graph and Ineru snapshots to disk
- Pending write queue is drained (10s timeout)
- SIGTERM → SIGKILL (10s timeout) stops the sidecar
- Binary is replaced, sidecar restarts with same `--db` path
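The stop sequence above can be sketched as an ordered routine. The `SidecarControl` interface and `stopForUpdate` are hypothetical names introduced for illustration; the real operations (HTTP flush, queue drain, process signals) are injected so the sketch stays self-contained, and the binary replacement and restart steps are left out.

```typescript
// Hypothetical control surface for the sidecar; not a Mayros API.
interface SidecarControl {
  flush(): Promise<void>; // stands in for POST /api/v1/flush
  drainWrites(timeoutMs: number): Promise<void>; // drain the pending write queue
  signal(sig: "SIGTERM" | "SIGKILL"): void;
  waitForExit(timeoutMs: number): Promise<boolean>; // true if the process exited in time
}

// Runs the shutdown steps in order and returns the steps taken.
async function stopForUpdate(sidecar: SidecarControl, timeoutMs = 10000): Promise<string[]> {
  const steps: string[] = [];

  await sidecar.flush();
  steps.push("flush");

  await sidecar.drainWrites(timeoutMs);
  steps.push("drain");

  // Ask politely first, escalate only if the process is still running.
  sidecar.signal("SIGTERM");
  steps.push("SIGTERM");
  const exited = await sidecar.waitForExit(timeoutMs);
  if (!exited) {
    sidecar.signal("SIGKILL");
    steps.push("SIGKILL");
  }
  return steps;
}
```

Flushing and draining before any signal is sent means a forced kill can only happen after the data is already on disk.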
Graceful degradation
When Cortex is unavailable, features degrade gracefully:
- Semantic memory: tools return empty results instead of errors
- Knowledge graph: queries return no triples
- P2P sync: sync operations are skipped
- Observability: trace events are buffered locally
- Rules engine: rules are not applied
The agent continues functioning — Cortex-dependent features simply become no-ops.
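One way to get this no-op behavior is to wrap each Cortex-backed call in a helper that swaps a failure for a neutral value. The helper and the `Triple` shape below are hypothetical and only illustrate the pattern; they are not taken from the Mayros codebase.

```typescript
// Runs `call` and returns `fallback` instead of throwing if it fails.
async function withFallback<T>(call: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await call();
  } catch {
    return fallback; // Cortex unavailable: degrade instead of surfacing an error
  }
}

// Hypothetical triple shape for the knowledge graph example.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// A knowledge graph query that yields no triples when Cortex is down.
// `fetchTriples` stands in for the real Cortex request.
async function queryTriples(fetchTriples: () => Promise<Triple[]>): Promise<Triple[]> {
  return withFallback(fetchTriples, []);
}
```

Callers then handle "no results" and "Cortex is down" through the same code path, which is what lets the agent keep running.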
Configuration
```json5
{
  cortex: {
    resilience: {
      timeoutMs: 5000, // Request timeout
      maxRetries: 2, // Retry attempts
      retryDelayMs: 300, // Base delay between retries
      circuitThreshold: 5, // Failures before opening circuit
      circuitResetMs: 30000, // Time before half-open test
    },
  },
}
```
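For code that consumes these settings, a typed view with fallback values can be handy. The sketch below assumes the numbers shown above are also the defaults, which this page suggests but does not state outright; the interface and function names are hypothetical.

```typescript
// Typed mirror of the cortex.resilience block (hypothetical type name).
interface ResilienceConfig {
  timeoutMs: number;
  maxRetries: number;
  retryDelayMs: number;
  circuitThreshold: number;
  circuitResetMs: number;
}

// Assumption: the example values double as defaults.
const DEFAULT_RESILIENCE: ResilienceConfig = {
  timeoutMs: 5000,
  maxRetries: 2,
  retryDelayMs: 300,
  circuitThreshold: 5,
  circuitResetMs: 30000,
};

// Fill in any setting the user left out.
function resolveResilience(overrides: Partial<ResilienceConfig> = {}): ResilienceConfig {
  return { ...DEFAULT_RESILIENCE, ...overrides };
}
```

For example, `resolveResilience({ timeoutMs: 2000 })` shortens the request timeout and leaves the other four settings at their fallback values.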
Cortex is optional — Mayros works without it, but semantic memory, P2P sync, and knowledge graph features require a running Cortex instance.
Related
- Cortex (AIngle) — sidecar overview and REST API
- P2P Sync — synchronization between instances