Token Economy

The Token Economy extension tracks LLM usage costs across session, daily, and monthly windows. It supports soft and hard enforcement modes, a built-in model pricing catalog, and an LRU prompt cache for estimating savings.

Budget tracking

The BudgetTracker maintains three independent cost counters:

Scope	Resets	Persistence
`session`	On each new session	In-memory only
`daily`	Midnight (auto-rollover)	`~/.mayros/token-budget.json`
`monthly`	First of month (auto-rollover)	`~/.mayros/token-budget.json`

Each counter tracks total USD spent, and compares against its configured limit. The status for each scope is one of ok, warn, or exceeded.

Warning threshold

When usage reaches the warnThreshold (default: 80% of the limit), the status changes to warn. A context message is injected into the prompt via the before_prompt_build hook.

Enforcement modes

Soft enforcement (default)

When a budget is exceeded, a polite message is prepended to the prompt asking the agent to wrap up. The agent can still make LLM and tool calls.

Hard enforcement

When a budget is exceeded, a system-level STOP message is injected into the prompt. After a configurable grace period (default: 3 tool calls), subsequent tool calls are blocked entirely via the before_tool_call hook.

typescript
// Hard enforcement flow:
// 1. Budget exceeded → warn message injected
// 2. Agent gets gracePeriodCalls more tool calls
// 3. After grace period → tool calls return { block: true, reason: "..." }

Model pricing catalog

A built-in catalog provides fallback pricing (USD per 1M tokens) for models from Anthropic, OpenAI, and Google. User-configured cost entries take priority over the catalog.

Provider	Models	Context windows
Anthropic	Opus 4.6, Sonnet 4.6, Sonnet 4.5, Haiku 4.5, 3.5 Sonnet, 3.5 Haiku	200K
OpenAI	GPT-4o, GPT-4o Mini, o1, o1-mini, o3-mini	128K–200K
Google	Gemini 2.0 Flash, Gemini 2.0 Pro, Gemini 1.5 Pro	1M–2M

Each entry includes input, output, cacheRead, and cacheWrite rates plus context window size.

The catalog supports prefix matching: a model ID like claude-sonnet-4-6-20260301 matches the claude-sonnet-4-6 entry.

Prompt cache

The PromptCache uses an LRU eviction strategy with TTL-based expiry:

Key: SHA-256 hash of provider + model + systemPrompt + prompt
Capacity: configurable maxEntries (default: 256)
TTL: configurable ttlMs (default: 300,000 ms / 5 minutes)

The cache is observational — it does not skip LLM calls. Instead, it tracks cache hits and estimates how much would have been saved if the response had been reused. This provides visibility into prompt repetition without altering agent behavior.

Budget persistence and rollover

Budget state is persisted to ~/.mayros/token-budget.json (configurable via persistPath). The file is flushed:

Every 30 seconds during an active session
On session end
On service stop

Rollover is automatic: if the persisted dailyDate does not match today, the daily counter resets. If monthlyKey does not match the current month, the monthly counter resets.

Agent tools

Tool	Description
`budget_status`	Show session, daily, and monthly cost vs limits
`budget_set_limit`	Set or update a budget limit at runtime
`budget_model_usage`	Per-model cost and token breakdown
`budget_pricing_catalog`	List the built-in pricing catalog

Hooks wiring

Hook	Action
`session_start`	Load persisted budget, rollover, init tracker
`llm_output`	Accumulate cost, update prompt cache
`llm_input`	Check prompt cache for observational tracking
`before_tool_call`	Hard enforcement: block tool calls after grace
`before_prompt_build`	Inject budget warnings or stop messages
`session_end`	Flush budget to disk, clear cache

Configuration

typescript
{
  sessionLimitUsd?: number;      // Per-session limit (default: unlimited)
  dailyLimitUsd?: number;        // Daily limit (default: unlimited)
  monthlyLimitUsd?: number;      // Monthly limit (default: unlimited)
  warnThreshold: number;         // 0.0–1.0 (default: 0.8)
  persistPath: string;           // Default: "~/.mayros/token-budget.json"
  enforcement: "soft" | "hard";  // Default: "soft"
  gracePeriodCalls: number;      // Tool calls allowed after exceed (default: 3)
  cache: {
    enabled: boolean;            // Default: true
    maxEntries: number;          // Default: 256
    ttlMs: number;               // Default: 300000
  }
}

CLI

bash
mayros budget status             # Print budget summary table
mayros budget set <scope> <usd>  # Set a budget limit
mayros budget reset [scope]      # Reset counters (session|daily|monthly|all)
mayros budget models             # Per-model cost breakdown
mayros budget pricing            # Show built-in pricing catalog
mayros budget cache              # Prompt cache stats

Usage Tracking — token counting and cost display
Observability & Metrics — Prometheus metrics for LLM calls
Models — model configuration and selection