Dentons
Cost model

What the cloud bill would be — over time.

The mesh runs on Intel AI PCs an organization already owns, so the marginal cost of inference is ≈ $0. This page models what the same agentic workload would cost on hosted cloud APIs across a fleet over its service life. It applies a declining cloud-price curve so the gap is not overstated. Per-device hardware figures are labeled as estimates.

$10.31M
avoided over 5 years vs Anthropic (Claude Sonnet 4.6) at mid-tier — on hardware already owned (≈ $0 marginal inference)
range across providers: $1.21M – $10.31M
Cumulative cloud spend avoided
[Chart: cumulative cloud spend avoided by year, Y0 to Y5, on a $0 to $20.00M axis. Series: OpenAI · GPT-4.1 mini; Google · Gemini 2.5 Flash; Anthropic · Claude Sonnet 4.6.]
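| provider | model | yr-1 cloud | 5-yr avoided | payback | 5-yr net |
| --- | --- | --- | --- | --- | --- |
| OpenAI | GPT-4.1 mini | $548k | $1.21M | 0.5y | $903k |
| Google | Gemini 2.5 Flash | $648k | $1.43M | 0.4y | $1.12M |
| Anthropic | Claude Sonnet 4.6 | $4.65M | $10.31M | 0.1y | $10.00M |
| Dentons mesh | qwen3-8b / llama-8b | $0* | on-prem | | |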

* marginal inference cost on owned hardware; incremental capex/opex modeled separately in the payback view.
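The relationship between the "5-yr avoided" and "5-yr net" columns can be checked with simple arithmetic. The sketch below assumes that net equals avoided spend minus the incremental capex/opex; the page does not state that formula, so the result is an inference from the published rows, not the page's own method. The names `ROWS` and `implied_incremental_cost` are hypothetical.

```python
# Table rows as (5-yr avoided, 5-yr net) in whole dollars, copied from the table above.
ROWS = {
    "OpenAI GPT-4.1 mini": (1_210_000, 903_000),
    "Google Gemini 2.5 Flash": (1_430_000, 1_120_000),
    "Anthropic Claude Sonnet 4.6": (10_310_000, 10_000_000),
}


def implied_incremental_cost(avoided: int, net: int) -> int:
    """Assumed identity: net = avoided - incremental cost, so cost = avoided - net."""
    return avoided - net


implied = {name: implied_incremental_cost(a, n) for name, (a, n) in ROWS.items()}

for name, cost in implied.items():
    print(f"{name}: implied incremental cost ${cost:,}")
```

Under that assumption, all three rows imply roughly the same incremental cost (about $0.31M over five years, within the rounding of the displayed figures), which is what one would expect if the on-prem cost does not depend on the cloud comparator.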

What moves the result

Swing in 5-yr avoided vs Anthropic as each assumption varies. The cloud-price-decline rate dominates — which is exactly why it's the input to argue about.

cloud price decline (40↔90%/yr)
runs/device/day ±50%
tokens/run ±50%
workload growth (0↔100%/yr)
Scale, low ↔ high: $5.15M (low), $10.31M (base), $22.81M (high)
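A one-at-a-time sensitivity sweep of the kind summarized above can be sketched as follows. This is a toy model under stated assumptions, not the page's implementation: the `Inputs` fields, the base values, and the annual-compounding formula are all hypothetical, and only the four swing ranges are taken from the labels above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Inputs:
    # All base values are hypothetical placeholders, not the page's defaults.
    year1_spend: float = 1_000_000.0
    price_decline: float = 0.65    # fraction per year
    workload_growth: float = 0.5   # fraction per year
    runs_scale: float = 1.0        # multiplier on runs/device/day
    tokens_scale: float = 1.0      # multiplier on tokens/run
    years: int = 5


def five_year_spend(p: Inputs) -> float:
    """Assumed model: yearly spend shrinks with price and grows with workload."""
    ratio = (1.0 - p.price_decline) * (1.0 + p.workload_growth)
    series = sum(ratio ** k for k in range(p.years))
    return p.year1_spend * p.runs_scale * p.tokens_scale * series


# Swing ranges taken from the chart labels.
RANGES = {
    "price_decline": (0.40, 0.90),
    "runs_scale": (0.5, 1.5),
    "tokens_scale": (0.5, 1.5),
    "workload_growth": (0.0, 1.0),
}


def tornado(base: Inputs) -> list[tuple[str, float, float]]:
    """Vary one input at a time; return (name, low, high) sorted by swing width."""
    bars = []
    for name, (lo, hi) in RANGES.items():
        a = five_year_spend(replace(base, **{name: lo}))
        b = five_year_spend(replace(base, **{name: hi}))
        bars.append((name, min(a, b), max(a, b)))
    bars.sort(key=lambda bar: bar[2] - bar[1], reverse=True)
    return bars


bars = tornado(Inputs())
for name, low, high in bars:
    print(f"{name:16s} ${low:>12,.0f}  to  ${high:>12,.0f}")
```

Even with these placeholder base values, the price-decline bar comes out widest, because that input changes the compounding ratio across all five years, while the ±50% inputs only scale the total linearly.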
Assumptions & honesty notes

Cloud prices fall fast. Inference prices have dropped on the order of 10×–50×/yr at fixed quality, so a flat projection would grossly overstate savings. The model applies a declining-price curve (default ~65–70%/yr, the conservative same-tier end); set it to 0 to see the flat upper bound.
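To make the effect of the declining-price curve concrete, here is a minimal sketch that compares a flat projection with a declining one. It assumes discrete annual steps and a hypothetical year-1 spend of $1M; the page's own curve may be computed differently (for example, continuously), and the function name and parameters are illustrative only.

```python
def cumulative_cloud_spend(
    year1_spend: float,
    price_decline: float,
    workload_growth: float,
    years: int,
) -> list[float]:
    """Return cumulative spend at the end of each year.

    Assumed model: each year's spend is the previous year's spend times
    (1 - price_decline) * (1 + workload_growth).
    """
    if not 0.0 <= price_decline <= 1.0:
        raise ValueError("price_decline must be a fraction between 0 and 1")
    totals = []
    running = 0.0
    spend = year1_spend
    for _ in range(years):
        running += spend
        totals.append(running)
        spend *= (1.0 - price_decline) * (1.0 + workload_growth)
    return totals


# Hypothetical $1M year-1 spend, no workload growth, five years.
flat = cumulative_cloud_spend(1_000_000, 0.0, 0.0, 5)
declining = cumulative_cloud_spend(1_000_000, 0.65, 0.0, 5)

print(f"flat 5-yr total:      ${flat[-1]:,.0f}")
print(f"declining 5-yr total: ${declining[-1]:,.0f}")
```

With a 65%/yr decline and no workload growth, the five-year total in this sketch is about 1.53× the year-1 spend, against 5× for the flat projection, which is the overstatement the note warns about.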

8B local ≠ frontier cloud. The comparator defaults to a mid tier, the honest match for 8B-class models. Flagship is available but only fair when the workload genuinely needs that quality.

Hardware is dual-use. The fleet exists for productivity; inference rides on it at ≈ $0 marginal. The payback view therefore amortizes only the incremental AI-capability capex/opex, not the whole device or IT budget.

Per-device hardware figures are estimates. No authoritative public benchmark establishes 8B-on-NPU watts, capex, or throughput; treat capex/opex defaults as adjustable placeholders. Token volumes default to approximately what a live referral run measured (3,563 tokens in / 1,008 tokens out per run).
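The measured token volumes translate into a per-run cloud cost once per-token prices are supplied. The sketch below shows that arithmetic; the prices ($3.00 in / $15.00 out per million tokens) and the fleet size and run rate are illustrative placeholders, not figures taken from this page, and the function names are hypothetical.

```python
TOKENS_IN = 3_563    # measured input tokens per run (from the note above)
TOKENS_OUT = 1_008   # measured output tokens per run (from the note above)


def cost_per_run(
    price_in_per_mtok: float,
    price_out_per_mtok: float,
    tokens_in: int = TOKENS_IN,
    tokens_out: int = TOKENS_OUT,
) -> float:
    """Dollar cost of one run, given prices per million tokens."""
    return (tokens_in * price_in_per_mtok + tokens_out * price_out_per_mtok) / 1_000_000


def annual_spend(devices: int, runs_per_device_day: float, per_run: float, days: int = 365) -> float:
    """Fleet-wide yearly spend at a fixed per-run cost."""
    return devices * runs_per_device_day * days * per_run


# Placeholder prices and fleet parameters, chosen only for illustration.
per_run = cost_per_run(3.00, 15.00)
yearly = annual_spend(devices=1_000, runs_per_device_day=10, per_run=per_run)

print(f"cost per run: ${per_run:.6f}")
print(f"annual spend: ${yearly:,.2f}")
```

Because output tokens are typically priced higher than input tokens, the 1,008 output tokens account for more of the per-run cost in this example than the 3,563 input tokens do.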

Per-employee usage is a proxy, not a measurement. No public dataset reports tokens per enterprise seat — OpenRouter, Anthropic's Economic Index, Ramp, and OpenAI's usage disclosures all lack it. The light / medium / heavy presets are planning estimates anchored to OpenAI's “200+ messages/day” power-user disclosure and published per-interaction token ranges; the ~2×/yr growth ceiling reflects Menlo's $37B / 3.2× enterprise-spend and OpenRouter's ~10× throughput signals. The biggest swing is the agentic share of work, which can move per-seat tokens by orders of magnitude.