What the cloud bill would be — over time.
The mesh runs on Intel AI PCs an organization already owns, so the marginal cost of inference is ≈ $0. This models what the same agentic workload would cost on hosted cloud APIs across a fleet over its service life — honestly, with a declining cloud-price curve so it doesn't overstate the gap. Per-device hardware figures are labeled estimates.
| provider | model | yr-1 cloud | 5-yr avoided | payback | 5-yr net |
|---|---|---|---|---|---|
| OpenAI | GPT-4.1 mini | $548k | $1.21M | 0.5y | $903k |
| Gemini 2.5 Flash | $648k | $1.43M | 0.4y | $1.12M | |
| Anthropic | Claude Sonnet 4.6 | $4.65M | $10.31M | 0.1y | $10.00M |
| Dentons mesh | qwen3-8b / llama-8b | $0* | — | — | on-prem |
* marginal inference cost on owned hardware; incremental capex/opex modeled separately in the payback view.
Swing in 5-yr avoided vs Anthropic as each assumption varies. The cloud-price-decline rate dominates — which is exactly why it's the input to argue about.
Assumptions & honesty notes
Cloud prices fall fast. Inference prices have dropped on the order of 10×–50×/yr at fixed quality, so a flat projection would grossly overstate savings. The model applies a declining-price curve (default ~65–70%/yr, the conservative same-tier end); set it to 0 to see the flat upper bound.
8B local ≠ frontier cloud. The comparator defaults to a mid tier, the honest match for 8B-class models. Flagship is available but only fair when the workload genuinely needs that quality.
Hardware is dual-use. The fleet exists for productivity; inference rides on it at ≈ $0 marginal. The payback view therefore amortizes only the incremental AI-capability capex/opex, not the whole device or IT budget.
Per-device hardware figures are estimates. No authoritative public benchmark establishes 8B-on-NPU watts, capex, or throughput; treat capex/opex defaults as adjustable placeholders. Token volumes default to ≈ what a live referral run measured (3,563 in / 1,008 out per run).
Per-employee usage is a proxy, not a measurement. No public dataset reports tokens per enterprise seat — OpenRouter, Anthropic's Economic Index, Ramp, and OpenAI's usage disclosures all lack it. The light / medium / heavy presets are planning estimates anchored to OpenAI's “200+ messages/day” power-user disclosure and published per-interaction token ranges; the ~2×/yr growth ceiling reflects Menlo's $37B / 3.2× enterprise-spend and OpenRouter's ~10× throughput signals. The biggest swing is the agentic share of work, which can move per-seat tokens by orders of magnitude.
- Epoch AI — LLM inference price trends (9–900×/yr fixed-quality, ~50× median)
- a16z — LLMflation (~10×/yr at equivalent performance)
- DeepLearning.AI / Andrew Ng — falling token prices (~79%/yr same-tier)
- Lenovo — on-prem vs cloud GenAI TCO method (5-yr amortization)
- Wall Street Prep — payback period (uneven-cashflow method)
- OpenAI — ChatGPT usage & adoption patterns at work (200+ msgs/day power-users)
- Menlo Ventures — 2025 State of Gen-AI in the enterprise ($37B, 3.2× YoY spend)
- OpenRouter / a16z — State of AI (per-request token intensity, ~10× throughput)
- Stanford Digital Economy Lab — how AI agents spend tokens (agentic 1000×, input-dominated)