Inference
Any open model, autoscaled. Cold start < 900ms.
The layer your models run on. Routing, inference, and observability with a latency budget you can read.
11ms
We publish p50, p95, and p99. The number nobody else shows you is the one under load. Ours is on the next screen.
PLATFORM / 4 MODULES
Any open model, autoscaled. Cold start < 900ms.
Latency- and cost-aware routing across 14 regions.
Per-request traces, token accounting, replay.
LoRA + full-weight, versioned, one API.
BENCHMARK / vs NAMED BASELINES
| Provider | p50 | p95 | Throughput | Cost / 1M tok |
|---|---|---|---|---|
| Substrate | 11ms | 34ms | 2,400 tok/s | $0.42 |
| Hyperscaler A | 28ms | 91ms | 1,100 tok/s | $0.71 |
| Hyperscaler B | 24ms | 77ms | 1,350 tok/s | $0.68 |
| Self-hosted (ref) | 19ms | 58ms | 1,800 tok/s | $0.55 |
Llama-3.1-70B · eu-central-1 · 1M requests · methodology at /benchmarks
REQUEST LIFECYCLE / 4 STAGES
01
Client call hits the nearest edge.
02
Latency/cost router selects a region + model.
03
Model executes on autoscaled capacity.
04
Streamed tokens, fully traced.
Four stages, summed: under 11ms at p50. Every number on this page is one you can reproduce.
PRICING / TRANSPARENT
Pay per token. 1M free tokens / month. No card.
Start buildingVolume routing, all 14 regions, full observability.
Start buildingDedicated capacity, SLA 99.99%, private regions, SSO.
Book a demo“If you can’t MEASURE it, you can’t sell it to an engineer.”
— Mara Køhl, Founder & CTO, Substrate
TRUSTED BY TEAMS AT
“We cut p95 in half the week we moved. The benchmark page was the only vendor page our staff engineers didn’t roll their eyes at.”
A studio piece by ALO Design Pros.
Swiss precision as the material, for systems whose numbers are the argument.