Arbiter documentation
Overview and reading guide.
Arbiter is a drop-in, OpenAI-compatible router. It sits in front of the Bad Theory Labs (BTL) runtime and sends each request to the cheapest model that still does the job well - measuring the savings from the runtime's own cost headers rather than estimating them.
This folder explains what it does, why it works the way it does, and how to run and integrate it.
Where to start
- New here? Read architecture.md for the whole system in one pass, then strategies.md for the reasoning behind each decision.
- Integrating an app? Go straight to integration.md.
- Wiring against the HTTP API? See api-reference.md.
- Looking at the web app? See interface.md.
- Tuning behaviour? See configuration.md.
- Deploying? The repo's
DEPLOY.mdcovers the single-service Docker deploy (one image serves the API, the dashboard, and the docs on one URL).
The documents
| Document | What it covers |
|---|---|
| architecture.md | The request lifecycle and how the pieces fit: Classify -> Filter -> Route -> Grade -> Learn -> React. |
| interface.md | The web app: the landing page, the onboarding flow, the dashboard tabs, and the Playground. |
| strategies.md | Each strategy in depth - classification, the context filter, the routing policy, quality scoring, the judge, and price-shift re-exploration - with the trade-offs behind them. |
| api-reference.md | Every HTTP endpoint, request and response shapes, and the arbiter block added to each completion. |
| integration.md | Running Arbiter, pointing an OpenAI client at it, and the key / auth model. |
| configuration.md | Environment variables, the model registry, and the tunable thresholds. |
| roadmap.md | What's next: hardening, smarter routing, and wider reach - and which limitation each item addresses. |
Conventions used in these docs
- Baseline means the premium model we measure savings against (
gpt-4oby default). "Savings" always means versus running everything on the baseline. - Task type is one of
code,math,structured,factual,open. - Explore / exploit describe the two routing modes: gathering data about a model versus using what has already been learned.
- Model ids (e.g.
deepseek-chat-v3) refer to routes on the BTL runtime that answer on the/v1/chat/completionssurface. - Code references are given as
file:function, e.g.policy.py:choose.
The one-paragraph version
A request arrives dressed as an OpenAI call. Arbiter classifies it, drops any
model whose context window can't hold it, and routes it to the cheapest
remaining model whose learned quality is within tolerance of the best. It sends
that to the runtime, grades the answer, and folds the measured cost and quality
back into a shared, persistent policy - so the next similar request is routed
better. If a model's price moves, Arbiter notices and re-routes. The savings are
read from the runtime's x-btl-* headers, so they are measured, not guessed.