From $/M-token to delivered capacity
| Mode | Monthly | 3-yr TCO | $/M-tok |
|---|---|---|---|
| Cloud (on-demand) | $35.0K | $1.26M | $13.52 |
| Lease (committed HaaS) | — | — | — |
| CapEx (owned) | — | — | — |
| Hosted-API baseline | $1.3K | $45.5K | $0.487 |
The hosted-API row is the "do nothing, pay per token" comparison. Output tokens are billed at this model's representative hosted rate; input tokens are estimated at 25% of output volume (typical for chat). Input-heavy workloads (RAG, document analysis) will land slightly higher than shown.
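The baseline arithmetic can be sketched as below, assuming separate per-million-token rates for input and output. The function name and any rates passed in are hypothetical placeholders, not this model's actual hosted pricing.

```python
def hosted_api_monthly_cost(output_tokens_per_month: float,
                            output_rate_per_m: float,
                            input_rate_per_m: float,
                            input_ratio: float = 0.25) -> float:
    """Monthly pay-per-token cost for the hosted-API baseline (a sketch).

    Input volume is estimated as a fixed fraction of output volume
    (0.25 by default, the chat-typical assumption). Rates are $ per
    million tokens and are hypothetical inputs, not published prices.
    """
    input_tokens = output_tokens_per_month * input_ratio
    output_cost = output_tokens_per_month / 1e6 * output_rate_per_m
    input_cost = input_tokens / 1e6 * input_rate_per_m
    return output_cost + input_cost
```

Raising `input_ratio` above 0.25 models the input-heavy case: more input tokens at the same rates means a higher monthly bill.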
How it works
Enter model, chip, and throughput; optionally set precision, batch size, context length, and MFU. No signup; results recompute live.
Three-constraint sizing (compute · memory · bandwidth), cluster topology, monthly + 3-year TCO across cloud, lease, and CapEx.
Sign up to size every workload across the fleet, run vendor RFP scenarios, and watch $/M-token shrink in real time.
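A minimal sketch of three-constraint sizing, assuming the chip count is the largest of what compute, memory capacity, and memory bandwidth each require. It uses the common approximation of ~2 FLOPs per parameter per generated token and assumes each decode step streams the weights and KV cache once; it ignores prefill, interconnect, and parallelism overheads. The `Chip` type, `size_chips`, the 0.4 MFU default, and the example figures are all hypothetical, not Mintok's methodology.

```python
import math
from dataclasses import dataclass

@dataclass
class Chip:
    peak_flops: float  # dense FLOP/s at the chosen precision
    hbm_bytes: float   # memory capacity per chip, bytes
    hbm_bw: float      # memory bandwidth per chip, bytes/s

def size_chips(params: float, bytes_per_param: float, kv_cache_bytes: float,
               tokens_per_s: float, batch_size: int, chip: Chip,
               mfu: float = 0.4) -> tuple[int, str]:
    """Return (chips needed, binding constraint) for a decode workload."""
    state_bytes = params * bytes_per_param + kv_cache_bytes
    # Compute: ~2 FLOPs per parameter per generated token, derated by MFU.
    compute = math.ceil(2 * params * tokens_per_s / (chip.peak_flops * mfu))
    # Memory: weights plus KV cache must fit in aggregate HBM.
    memory = math.ceil(state_bytes / chip.hbm_bytes)
    # Bandwidth: each decode step streams weights + KV cache once and
    # emits batch_size tokens, so steps/s = tokens_per_s / batch_size.
    bandwidth = math.ceil(state_bytes * (tokens_per_s / batch_size) / chip.hbm_bw)
    needs = {"compute": compute, "memory": memory, "bandwidth": bandwidth}
    binding = max(needs, key=needs.get)
    return needs[binding], binding

# Hypothetical inputs: 70B params at 2 bytes each, 20 GB of KV cache.
chip = Chip(peak_flops=1e15, hbm_bytes=80e9, hbm_bw=3.35e12)
print(size_chips(70e9, 2, 20e9, tokens_per_s=5_000, batch_size=32, chip=chip))  # → (8, 'bandwidth')
```

In this example compute and memory each need 2 chips, but streaming 160 GB per step at ~156 steps/s needs 8, so bandwidth binds.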
What Mintok optimises
The calculator above covers one workload. The platform covers your entire fleet — every chip, every model, every contract.
Pin two of {workload, hardware, site}; solve the third.
Power-envelope-anchored. Pin your MW budget + facility constraints; plan capacity within them.
Token-throughput-anchored. Forecast utilisation per model, surface exhaustion dates, plan redeployment for retired silicon.
Workload-anchored. Pick a workload; compare silicon from NVIDIA, AMD, Google (TPU), Cerebras, Groq, and AWS head-to-head.
Hardware-config-anchored. Pin chip + MW; explode into the full DC BOM with compute / memory / latency constraint analysis.
Project cost at every level of granularity: chip, model, and cluster.
$/chip-hour, depreciation, $/FLOP. Compare any chip under CapEx versus cloud, including reseller margin.
$/M-token per model, fleet optimisation, what-if sensitivity to utilisation, depreciation, contract structure.
Cluster TCO, $/cluster-hour, $/MW. What-if comparator across rack and cluster shapes.
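The chip-level and model-level projections chain together: an owned chip's $/chip-hour feeds $/M-token, and utilisation divides delivered tokens, which is why the what-if is so sensitive to it. A sketch assuming straight-line depreciation, reseller margin expressed as a share of the selling price, and hypothetical prices and throughput; none of the names are Mintok's API.

```python
HOURS_PER_YEAR = 24 * 365

def capex_chip_hour(purchase_price: float, opex_per_hour: float,
                    life_years: float = 3.0, residual_value: float = 0.0) -> float:
    """Owned $/chip-hour: straight-line depreciation plus hourly opex."""
    depreciation = (purchase_price - residual_value) / (life_years * HOURS_PER_YEAR)
    return depreciation + opex_per_hour

def resold_chip_hour(provider_cost: float, reseller_margin: float) -> float:
    """$/chip-hour when a reseller keeps `reseller_margin` of the selling price."""
    if not 0 <= reseller_margin < 1:
        raise ValueError("margin must be in [0, 1)")
    return provider_cost / (1 - reseller_margin)

def cost_per_m_tokens(cost_per_hour: float, peak_tokens_per_s: float,
                      utilisation: float) -> float:
    """$ per million tokens actually delivered at a given utilisation (0-1]."""
    tokens_per_hour = peak_tokens_per_s * 3600 * utilisation
    return cost_per_hour / tokens_per_hour * 1e6

# What-if sweep: the same owned chip at different utilisation levels.
owned = capex_chip_hour(purchase_price=30_000, opex_per_hour=0.50)
for u in (0.2, 0.4, 0.8):
    print(f"utilisation {u:.0%}: ${cost_per_m_tokens(owned, 2_000, u):.2f}/M-tok")
```

Because utilisation is a divisor, halving it doubles $/M-token at a fixed hourly cost.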
FDE — Forward Deployment Engagement
Same physics, wrapped in the workflow consultancies and in-house AI platform teams use to author recommendations: workload capture, sizing, cost projection, customer-shareable brief.
Versioned, customer-shareable working doc. Draft → in-review → approved → shared. Cites the methodology version it was authored against.
Mintok-curated archetypes (chatbot, RAG, agentic, batch). Forkable per tenant; pre-fills Tier-3 numerics so the workload spec isn’t a blank page.
Token-gated read-only view of the brief, workloads, and open questions. No login required — your customer bookmarks a link.
Async replacement for status meetings. Customers comment, you reply, everything threaded under the engagement. Emails fire on each reply.
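Token-gated, login-free links are commonly built on an unguessable random token that the server checks in constant time. A minimal sketch using Python's standard `secrets`, `hashlib`, and `hmac` modules; the function names and URL shape are hypothetical, not Mintok's API.

```python
import hashlib
import hmac
import secrets

def new_share_token() -> tuple[str, str]:
    """Return (token to embed in the link, SHA-256 digest to store server-side)."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe base64
    digest = hashlib.sha256(token.encode()).hexdigest()
    return token, digest

def token_matches(presented: str, stored_digest: str) -> bool:
    """Constant-time check of a presented token against the stored digest."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(digest, stored_digest)

token, stored = new_share_token()
link = f"https://example.invalid/brief/share/{token}"  # hypothetical URL shape
```

Storing only the digest means a leaked database does not expose working links, and revoking a link is just deleting its stored digest.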
Beyond sizing
The numbers above are the answer. Below is where you run them — orders, supply, fulfilment, and live agent watchers. One platform, end to end.
Capacity, Inference, Reference Architecture, and Rack sizing. $/M-token across every chip and every workload.
Convert sized plans into orders. Available-to-promise (ATP) per order, priority allocation against committed supply.
Demand, supply ledger, POs, approved vendor list (AVL), vendors. PROPOSED → COMMITTED → RELEASED allocation lifecycle.
Order Health, Supply Constraints, Coverage, Schedule. Agent watchers ping you when reality drifts from plan.
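Priority allocation against committed supply, together with the PROPOSED → COMMITTED → RELEASED lifecycle, can be sketched as a greedy pass plus a forward-only state machine. Greedy allocation, the "lower number = higher priority" convention, strictly one-step transitions, and the `Order`/`Allocation` types are all assumptions; the platform may, for example, support cancelling a proposal, which this does not model.

```python
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    PROPOSED = "PROPOSED"
    COMMITTED = "COMMITTED"
    RELEASED = "RELEASED"

# Assumed legal transitions: strictly one step forward.
NEXT_STATE = {State.PROPOSED: State.COMMITTED, State.COMMITTED: State.RELEASED}

@dataclass
class Order:
    order_id: str
    priority: int  # lower number = served first (assumed convention)
    quantity: int  # units requested

@dataclass
class Allocation:
    order_id: str
    quantity: int
    state: State = State.PROPOSED

    def advance(self) -> None:
        """Move one step along PROPOSED -> COMMITTED -> RELEASED, or raise."""
        if self.state not in NEXT_STATE:
            raise ValueError(f"{self.order_id}: {self.state.value} is terminal")
        self.state = NEXT_STATE[self.state]

def allocate(orders: list[Order], committed_supply: int) -> list[Allocation]:
    """Greedy priority allocation; available-to-promise shrinks as it goes."""
    atp = committed_supply
    allocations = []
    # sorted() is stable, so equal-priority orders keep their input order.
    for order in sorted(orders, key=lambda o: o.priority):
        granted = min(order.quantity, atp)
        allocations.append(Allocation(order.order_id, granted))
        atp -= granted
    return allocations
```

Every allocation starts as PROPOSED; committing and releasing are explicit `advance()` calls, so a plan never silently skips a stage.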
Mintok is invite-only during private alpha. Tell us about your fleet — chips you're evaluating, target $/M-token, contract mix — and we'll get you set up.