claude-model-router automatically routes Claude Code sub-tasks to the cheapest capable model — saving 35–50% on token costs without sacrificing quality.
When Claude Code spawns sub-agents for file exploration, code search, or formatting, it defaults to the most capable model. That's like hiring an architect to measure a room.
Install the plugin and everything is automatic. Token tracking starts immediately, routing instructions are injected into your CLAUDE.md, and savings are reported every session.
| Task complexity | Context dependency | Routed model |
|---|---|---|
| Simple | Low | Haiku |
| Simple | High | Sonnet |
| Medium | Low | Sonnet |
| Medium | High | Opus |
| Complex | Any | Opus |
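
The matrix above can be read as a plain lookup. The sketch below is illustrative only: `ROUTING_MATRIX` and `route` are hypothetical names, not the plugin's actual code, and the lowercase model labels stand in for whatever model identifiers the plugin really uses.

```python
# Hypothetical sketch of the routing matrix as a lookup table.
# Names and lowercase labels are assumptions, not the plugin's real API.
ROUTING_MATRIX = {
    ("simple", "low"): "haiku",
    ("simple", "high"): "sonnet",
    ("medium", "low"): "sonnet",
    ("medium", "high"): "opus",
}


def route(complexity: str, context: str) -> str:
    """Return the model tier for a (complexity, context) pair."""
    complexity = complexity.strip().lower()
    context = context.strip().lower()

    # The "Complex | Any" row: context does not matter for complex tasks.
    if complexity == "complex":
        return "opus"

    try:
        return ROUTING_MATRIX[(complexity, context)]
    except KeyError:
        raise ValueError(
            f"unknown combination: complexity={complexity!r}, context={context!r}"
        ) from None
```

Keeping the matrix as data rather than nested conditionals makes it easy to print or expose, which fits the transparency goal described elsewhere in this document.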
Every API interaction is logged — main thread and sub-agents. No manual calls needed. A stop hook processes transcripts after every response.
Two-step classification routes tasks by agent type, complexity, and context dependency. Hard rules catch obvious cases before the classifier runs.
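
One possible reading of "hard rules first, classifier second" is sketched below. The agent-type names in `HARD_RULES`, the function names, and the `classify` callback are all assumptions made for illustration; the plugin's actual rules and classifier are not shown in this document.

```python
from typing import Callable, Optional

# Assumed hard rules: agent types that map straight to a model tier.
# These agent-type names are invented for the example.
HARD_RULES = {
    "file-search": "haiku",
    "formatter": "haiku",
}


def apply_hard_rules(agent_type: str) -> Optional[str]:
    """Return a model tier if an obvious rule matches, otherwise None."""
    return HARD_RULES.get(agent_type)


def choose_model(
    agent_type: str,
    prompt: str,
    classify: Callable[[str], str],
) -> str:
    """Try the hard rules first; call the classifier only if none match."""
    model = apply_hard_rules(agent_type)
    if model is not None:
        return model
    return classify(prompt)
```

The point of the ordering is that the classifier, which is the more expensive step, runs only when no rule settles the question.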
Colorized terminal banners show per-session and all-time cost breakdowns with per-model usage and savings percentages.
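
A savings percentage of this kind can be computed by comparing actual cost against a baseline in which every token went to the most capable model. The sketch below assumes that baseline definition and uses placeholder prices; `PRICE_PER_MTOK`, `cost`, and `savings_percent` are hypothetical names, and a single per-model rate is used instead of separate input and output rates to keep the arithmetic visible.

```python
# Placeholder prices per million tokens. These are NOT real pricing;
# substitute the values reported by the plugin's routing config.
PRICE_PER_MTOK = {"haiku": 1.0, "sonnet": 3.0, "opus": 15.0}


def cost(model: str, tokens: int) -> float:
    """Cost of `tokens` tokens on `model` at the placeholder rate."""
    return tokens / 1_000_000 * PRICE_PER_MTOK[model]


def savings_percent(usage: dict, baseline_model: str = "opus") -> float:
    """Percent saved versus sending every token to `baseline_model`.

    `usage` maps a model name to the number of tokens it processed.
    """
    actual = sum(cost(model, tokens) for model, tokens in usage.items())
    baseline = sum(cost(baseline_model, tokens) for tokens in usage.values())
    if baseline == 0:
        return 0.0  # no usage recorded, so nothing to compare
    return (baseline - actual) / baseline * 100
```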
The get_routing_config tool exposes the full routing matrix and model pricing. No black boxes.
A failed sub-agent retries at the next model tier up, never at the same tier: Haiku → Sonnet → Opus. Quality is never sacrificed.
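
The escalation order can be sketched as a walk up an ordered list of tiers. This is a minimal illustration under assumptions: `run` is a hypothetical callback that returns a result on success and `None` on failure, and the function names are not taken from the plugin.

```python
from typing import Callable, List, Optional, Tuple

TIERS = ["haiku", "sonnet", "opus"]  # cheapest to most capable


def next_tier(model: str) -> Optional[str]:
    """Return the tier above `model`, or None if it is already the top."""
    index = TIERS.index(model)
    return TIERS[index + 1] if index + 1 < len(TIERS) else None


def run_with_escalation(
    task: str,
    start: str,
    run: Callable[[str, str], Optional[str]],
) -> Tuple[str, List[str]]:
    """Run `task`, moving up one tier after each failure.

    Returns the result and the list of tiers that were tried.
    A tier is never retried, matching the rule described above.
    """
    attempts: List[str] = []
    model: Optional[str] = start
    while model is not None:
        attempts.append(model)
        result = run(task, model)
        if result is not None:
            return result, attempts
        model = next_tier(model)
    raise RuntimeError(f"task failed on every tier tried: {attempts}")
```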
All data stays in a local SQLite database. Nothing leaves your machine except standard Anthropic API calls.
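
For a sense of what local storage like this can look like, here is a minimal sketch using Python's standard `sqlite3` module. The `usage` table, its columns, and the helper names are assumptions for illustration; the plugin's real schema is not documented here.

```python
import sqlite3


def open_usage_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local usage database with an assumed schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS usage (
            id            INTEGER PRIMARY KEY,
            session_id    TEXT    NOT NULL,
            model         TEXT    NOT NULL,
            input_tokens  INTEGER NOT NULL,
            output_tokens INTEGER NOT NULL
        )
        """
    )
    return conn


def log_usage(conn, session_id, model, input_tokens, output_tokens):
    """Record one API interaction; the `with` block commits the insert."""
    with conn:
        conn.execute(
            "INSERT INTO usage (session_id, model, input_tokens, output_tokens) "
            "VALUES (?, ?, ?, ?)",
            (session_id, model, input_tokens, output_tokens),
        )


def tokens_by_model(conn, session_id):
    """Return {model: (input_tokens, output_tokens)} for one session."""
    rows = conn.execute(
        "SELECT model, SUM(input_tokens), SUM(output_tokens) "
        "FROM usage WHERE session_id = ? GROUP BY model",
        (session_id,),
    ).fetchall()
    return {model: (inp, out) for model, inp, out in rows}
```

Because the database is a single local file (or in-memory, as in the default above), per-session and all-time summaries can be produced with ordinary SQL and no network access.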
Real-world results from heavy Claude Code usage with agent-based workflows.
That's it. Token tracking starts automatically on your next session.
When paired with claude-mem, the two plugins attack costs from both sides:
ANTHROPIC_API_KEY environment variable