Cost model

What the cloud bill would be — over time.

The mesh runs on Intel AI PCs an organization already owns, so the marginal cost of inference is ≈ $0. This models what the same agentic workload would cost on hosted cloud APIs across a fleet over its service life — honestly, with a declining cloud-price curve so it doesn't overstate the gap. Per-device hardware figures are labeled estimates.

compare vs

$10.31M

avoided over 5 years vs Anthropic (Claude Sonnet 4.6) at mid-tier — on hardware already owned (≈ $0 marginal inference)

range across providers: $1.21M – $10.31M

Cumulative cloud spend avoided

focus

OpenAI · GPT-4.1 miniGoogle · Gemini 2.5 FlashAnthropic · Claude Sonnet 4.6

hover the chart for exact values at each year

provider	model	yr-1 cloud	5-yr avoided	payback	5-yr net
OpenAI	GPT-4.1 mini	$548k	$1.21M	0.5y	$903k
Google	Gemini 2.5 Flash	$648k	$1.43M	0.4y	$1.12M
Anthropic	Claude Sonnet 4.6	$4.65M	$10.31M	0.1y	$10.00M
Dentons mesh	qwen3-8b / llama-8b	$0*	—	—	on-prem

* marginal inference cost on owned hardware; incremental capex/opex modeled separately in the payback view.

What moves the result

Swing in 5-yr avoided vs Anthropic as each assumption varies. The cloud-price-decline rate dominates — which is exactly why it's the input to argue about.

cloud price decline (40↔90%/yr)

runs/device/day ±50%

tokens/run ±50%

workload growth (0↔100%/yr)

low ↔ high

$5.15Mbase $10.31M$22.81M

Assumptions & honesty notes

Cloud prices fall fast. Inference prices have dropped on the order of 10×–50×/yr at fixed quality, so a flat projection would grossly overstate savings. The model applies a declining-price curve (default ~65–70%/yr, the conservative same-tier end); set it to 0 to see the flat upper bound.

8B local ≠ frontier cloud. The comparator defaults to a mid tier, the honest match for 8B-class models. Flagship is available but only fair when the workload genuinely needs that quality.

Hardware is dual-use. The fleet exists for productivity; inference rides on it at ≈ $0 marginal. The payback view therefore amortizes only the incremental AI-capability capex/opex, not the whole device or IT budget.

Per-device hardware figures are estimates. No authoritative public benchmark establishes 8B-on-NPU watts, capex, or throughput; treat capex/opex defaults as adjustable placeholders. Token volumes default to ≈ what a live referral run measured (3,563 in / 1,008 out per run).

Per-employee usage is a proxy, not a measurement. No public dataset reports tokens per enterprise seat — OpenRouter, Anthropic's Economic Index, Ramp, and OpenAI's usage disclosures all lack it. The light / medium / heavy presets are planning estimates anchored to OpenAI's “200+ messages/day” power-user disclosure and published per-interaction token ranges; the ~2×/yr growth ceiling reflects Menlo's $37B / 3.2× enterprise-spend and OpenRouter's ~10× throughput signals. The biggest swing is the agentic share of work, which can move per-seat tokens by orders of magnitude.