Token Economy
The Token Economy extension tracks LLM usage costs across session, daily, and monthly windows. It supports soft and hard enforcement modes, a built-in model pricing catalog, and an LRU prompt cache for estimating savings.
Budget tracking
The BudgetTracker maintains three independent cost counters:
| Scope | Resets | Persistence |
|---|---|---|
session | On each new session | In-memory only |
daily | Midnight (auto-rollover) | ~/.mayros/token-budget.json |
monthly | First of month (auto-rollover) | ~/.mayros/token-budget.json |
Each counter tracks total USD spent, and compares against its configured limit. The status for each scope is one of ok, warn, or exceeded.
Warning threshold
When usage reaches the warnThreshold (default: 80% of the limit), the status changes to warn. A context message is injected into the prompt via the before_prompt_build hook.
Enforcement modes
Soft enforcement (default)
When a budget is exceeded, a polite message is prepended to the prompt asking the agent to wrap up. The agent can still make LLM and tool calls.
Hard enforcement
When a budget is exceeded, a system-level STOP message is injected into the prompt. After a configurable grace period (default: 3 tool calls), subsequent tool calls are blocked entirely via the before_tool_call hook.
typescript// Hard enforcement flow: // 1. Budget exceeded → warn message injected // 2. Agent gets gracePeriodCalls more tool calls // 3. After grace period → tool calls return { block: true, reason: "..." }
Model pricing catalog
A built-in catalog provides fallback pricing (USD per 1M tokens) for models from Anthropic, OpenAI, and Google. User-configured cost entries take priority over the catalog.
| Provider | Models | Context windows |
|---|---|---|
| Anthropic | Opus 4.6, Sonnet 4.6, Sonnet 4.5, Haiku 4.5, 3.5 Sonnet, 3.5 Haiku | 200K |
| OpenAI | GPT-4o, GPT-4o Mini, o1, o1-mini, o3-mini | 128K–200K |
| Gemini 2.0 Flash, Gemini 2.0 Pro, Gemini 1.5 Pro | 1M–2M |
Each entry includes input, output, cacheRead, and cacheWrite rates plus context window size.
The catalog supports prefix matching: a model ID like claude-sonnet-4-6-20260301 matches the claude-sonnet-4-6 entry.
Prompt cache
The PromptCache uses an LRU eviction strategy with TTL-based expiry:
- Key: SHA-256 hash of
provider + model + systemPrompt + prompt - Capacity: configurable
maxEntries(default: 256) - TTL: configurable
ttlMs(default: 300,000 ms / 5 minutes)
The cache is observational — it does not skip LLM calls. Instead, it tracks cache hits and estimates how much would have been saved if the response had been reused. This provides visibility into prompt repetition without altering agent behavior.
Budget persistence and rollover
Budget state is persisted to ~/.mayros/token-budget.json (configurable via persistPath). The file is flushed:
- Every 30 seconds during an active session
- On session end
- On service stop
Rollover is automatic: if the persisted dailyDate does not match today, the daily counter resets. If monthlyKey does not match the current month, the monthly counter resets.
Agent tools
| Tool | Description |
|---|---|
budget_status | Show session, daily, and monthly cost vs limits |
budget_set_limit | Set or update a budget limit at runtime |
budget_model_usage | Per-model cost and token breakdown |
budget_pricing_catalog | List the built-in pricing catalog |
Hooks wiring
| Hook | Action |
|---|---|
session_start | Load persisted budget, rollover, init tracker |
llm_output | Accumulate cost, update prompt cache |
llm_input | Check prompt cache for observational tracking |
before_tool_call | Hard enforcement: block tool calls after grace |
before_prompt_build | Inject budget warnings or stop messages |
session_end | Flush budget to disk, clear cache |
Configuration
typescript{ sessionLimitUsd?: number; // Per-session limit (default: unlimited) dailyLimitUsd?: number; // Daily limit (default: unlimited) monthlyLimitUsd?: number; // Monthly limit (default: unlimited) warnThreshold: number; // 0.0–1.0 (default: 0.8) persistPath: string; // Default: "~/.mayros/token-budget.json" enforcement: "soft" | "hard"; // Default: "soft" gracePeriodCalls: number; // Tool calls allowed after exceed (default: 3) cache: { enabled: boolean; // Default: true maxEntries: number; // Default: 256 ttlMs: number; // Default: 300000 } }
CLI
bashmayros budget status # Print budget summary table mayros budget set <scope> <usd> # Set a budget limit mayros budget reset [scope] # Reset counters (session|daily|monthly|all) mayros budget models # Per-model cost breakdown mayros budget pricing # Show built-in pricing catalog mayros budget cache # Prompt cache stats
Related
- Usage Tracking — token counting and cost display
- Observability & Metrics — Prometheus metrics for LLM calls
- Models — model configuration and selection