Economics

Where every dollar goes,
broken out per test.

We charge $5 per attested test on standard benchmarks. This page shows the actual cost columns: inference, attestor compute, ZK proof generation, mainnet gas, hosting, margin. Per-benchmark where it varies. Iteration levers where we can squeeze it down.

01 · The $5 cost stack — one standard test.

A “standard” test = ~1,000 problems (MBPP, HumanEval), 3-run average, Claude-Sonnet-class model, SP1 proof, Aligned Layer batch of 32. These are the numbers behind the headline price.

| Line item | Cost | Paid to | Notes |
|---|---|---|---|
| LLM inference (3× standard bench) | $0.80 | Model provider | Avg 1.5M tokens · Sonnet-tier rate |
| Attestor compute (CPU + I/O) | $0.40 | Attestor operator | Hardware amortized over 50 runs/mo |
| ZK proof generation (SP1) | $1.50 | Attestor operator | SP1 prover, 100M cycles · RTX 5090-class |
| Ethereum L1 gas (batched) | $1.30 | Ethereum validators | $42 batch / 32 runs · scales w/ base fee |
| IPFS + edge hosting | $0.10 | Pinata / Vercel | Transcripts + dataset mirror, amortized |
| Platform margin | $0.90 | Benchlist (Slopshop Inc.) | Team, support, runner + SDK maintenance |
| Total | $5.00 | | Roughly break-even with 1 credit/pack growth; margin expands with scale |

These are our best-case steady-state numbers, not worst case. A complex suite (SWE-bench Verified, τ-Bench) has a different cost profile; see complex suites.
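The line items above can be sanity-checked with a few lines. A minimal sketch; the figures are copied straight from the table and carry no authority of their own:

```python
# Sanity-check: the Section 01 line items should sum to the $5 headline price.
# All figures are copied from the cost-stack table above.
cost_stack = {
    "llm_inference": 0.80,       # 3x standard bench, Sonnet-tier rate
    "attestor_compute": 0.40,    # CPU + I/O, amortized
    "zk_proof_generation": 1.50, # SP1 prover
    "ethereum_l1_gas": 1.30,     # batched, 32 runs per batch
    "ipfs_edge_hosting": 0.10,   # transcripts + dataset mirror
    "platform_margin": 0.90,     # team, support, runner + SDK
}

total = round(sum(cost_stack.values()), 2)
print(f"total per test: ${total:.2f}")  # total per test: $5.00
```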

02 · Per-benchmark matrix.

Not every benchmark costs the same. The inference column dominates; the others scale less aggressively. Prices below are what we internally compute; user-facing prices are $5 for anything in green and quoted separately for yellow/red.

| Benchmark | Problems | Inference | Proof gen | Gas (amortized) | Total cost | User price |
|---|---|---|---|---|---|---|
| HumanEval | 164 | $0.15 | $1.20 | $1.30 | $3.05 | $5 |
| MBPP | 974 | $0.80 | $1.50 | $1.30 | $4.10 | $5 |
| MMLU-Pro | 12,032 | $1.80 | $1.80 | $1.30 | $5.30 | $5 |
| GSM8K | 8,500 | $0.90 | $1.50 | $1.30 | $4.10 | $5 |
| LongMemEval | 500 | $1.50 | $1.80 | $1.30 | $5.10 | $5 |
| LCB (LiveCodeBench) | 400 | $0.40 | $1.50 | $1.30 | $3.70 | $5 |
| FRAMES | 1,170 | $3.60 | $1.80 | $1.30 | $7.20 | $10 |
| τ-Bench (Tau-Bench) | 230 trajectories | $12.00 | $2.40 | $1.30 | $16.10 | $20 |
| SWE-bench Verified | 500 | $28.00 | $3.20 | $1.30 | $33.50 | $50 |
| WebArena | 812 | $38.00 | $3.60 | $1.30 | $44.90 | $60 |

Inference is estimated at mid-tier API prices (Claude Sonnet / GPT-4o-mini class). For Opus-class or o1-style reasoning models the inference column roughly quadruples. The complex-suite quotes above include that premium.
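The per-row arithmetic can be sketched as a small helper. Note this is illustrative, not the internal pricing formula: the `gas` default and the 4× reasoning-model multiplier come from the text above, and the function intentionally omits the amortized overheads that also feed the matrix totals:

```python
# Illustrative per-run cost from the matrix columns (not the internal formula).
# The 4x multiplier for Opus-class / o1-style models is taken from the note above.
def run_cost(inference: float, proof_gen: float, gas: float = 1.30,
             reasoning_model: bool = False) -> float:
    if reasoning_model:
        inference *= 4  # reasoning-tier inference roughly quadruples the bill
    return round(inference + proof_gen + gas, 2)

print(run_cost(0.80, 1.50))                        # MBPP-like columns -> 3.6
print(run_cost(0.80, 1.50, reasoning_model=True))  # same suite, reasoning tier -> 6.0
```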

03 · Per-proof-system cost.

Publishers can pick a proof system. The tradeoff is prove-time cost vs. on-chain verification cost:

| Proof system | Prove time | Prove cost | Proof size | L1 verify gas | Best for |
|---|---|---|---|---|---|
| SP1 (default) | 8–18 min | $1.50 | ~1 KB | ~300k | Complex eval code, unmodified Python |
| Risc0 | 6–14 min | $1.30 | ~900 B | ~280k | GPU-heavy batching |
| Halo2 (KZG) | 25–60 min | $3.20 | ~750 B | ~220k | Post-quantum, long-horizon claims |
| Groth16-BN254 | 2–5 min | $0.80 | ~200 B | ~150k | Simple threshold/mean scoring |
| Plonk (kimchi) | 10–30 min | $2.10 | ~400 B | ~200k | Custom circuits |
| Signed attestation (fallback) | <1 s | $0.05 | 64 B | ~60k | LLM-judged benchmarks (no ZK-friendly score fn) |

Signed attestations carry no ZK guarantee but still get the attestor-stake + community-replay layers. We mark them “Attested” instead of “Verified ⛓” on the UI.
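One way to read the tradeoff table is as a constrained pick: given a prove-time budget, take the cheapest system that fits. A minimal sketch using the upper-bound times and prove costs from the table above (the selection rule itself is an assumption, not how the platform assigns provers):

```python
# Pick the cheapest proof system that fits a prove-time budget.
# Times/costs are the upper bounds from the table above; the selection
# heuristic is illustrative, not the platform's actual assignment logic.
PROOF_SYSTEMS = {
    # name: (max_prove_minutes, prove_cost_usd)
    "SP1": (18, 1.50),
    "Risc0": (14, 1.30),
    "Halo2 (KZG)": (60, 3.20),
    "Groth16-BN254": (5, 0.80),
    "Plonk (kimchi)": (30, 2.10),
}

def cheapest_within(minutes_budget: float) -> str:
    fits = {name: cost for name, (mins, cost) in PROOF_SYSTEMS.items()
            if mins <= minutes_budget}
    return min(fits, key=fits.get)

print(cheapest_within(15))  # Groth16-BN254
```

In this table Groth16-BN254 happens to be both fastest and cheapest, which matches its "simple scoring" role; the budget only starts to matter once circuit constraints rule it out.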

04 · Iteration levers — how we bring the price down.

Three things move the needle, in order of leverage:

Lever 1 · biggest
Batch size

Gas amortizes per batch. 32 → 128 runs per batch drops gas per run from $1.30 → $0.38. Requires more queued volume; comes online as publisher demand grows.

Lever 2
Prover hardware

SP1 + Risc0 have aggressive GPU paths. Moving from 4090-class to H100-class cuts prove time ~40% and per-proof cost ~25%. Capital-intensive but linear.

Lever 3
L2 settlement path

Aligned batches already compress to one L1 proof. Future: an L2 receipt path for dashboards that don’t need mainnet directness. Would drop the gas column to ~$0.10 at the cost of a longer trust path. Not yet live; we prefer L1 honesty.

We publish these internally every month and update this page when the stack shifts. No Ethereum-gas surprise billing — if base fee triples, we eat it for in-flight runs and adjust new quotes.

05 · Batching economics.

Aligned Layer aggregates proofs into a single on-chain verification. The per-run gas cost is:

gas_per_run = (L1_verify_gas × gas_price + batcher_fee) / batch_size

At current mainnet pricing (~25 gwei base fee, ETH ≈ $3,600):

  • Batch of 8 runs: ~$4.20 per run
  • Batch of 32 runs: ~$1.30 per run
  • Batch of 128 runs: ~$0.38 per run
  • Batch of 512 runs: ~$0.12 per run

We default to batches of 32 during launch. The system automatically increases batch size as volume grows; users see their effective price drop accordingly (packs get cheaper per run, pay-as-you-go price stays $5 but margin improves).
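The amortization formula above can be made concrete. A sketch under stated assumptions: the $15 batcher fee is chosen so a batch of 32 lands near the quoted ~$1.30/run, and real Aligned Layer fees and base fees move with demand, so the other rows of the bullet list won't reproduce exactly:

```python
# gas_per_run = (L1_verify_gas * gas_price + batcher_fee) / batch_size
# Assumptions: ~25 gwei base fee, ETH ~= $3,600 (from the text above);
# the $15 batcher fee is an illustrative constant, not a published rate.
GWEI = 1e-9  # ETH per gwei

def gas_per_run(batch_size: int,
                l1_verify_gas: int = 300_000,
                gas_price_gwei: float = 25.0,
                eth_usd: float = 3_600.0,
                batcher_fee_usd: float = 15.0) -> float:
    l1_cost_usd = l1_verify_gas * gas_price_gwei * GWEI * eth_usd
    return (l1_cost_usd + batcher_fee_usd) / batch_size

for n in (8, 32, 128, 512):
    print(f"batch of {n:>3}: ${gas_per_run(n):.2f} per run")
```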

06 · Attestor economics.

Attestors earn a share of each run they process. At $5/test, the split is approximately:

  • $1.90 → attestor (compute reimbursement + margin)
  • $1.30 → Ethereum gas
  • $0.80 → model provider
  • $1.00 → Benchlist (platform + hosting + team)

An attestor's break-even at current pricing is ~50 runs/month per node, assuming a GPU amortized over 36 months. Once fleet demand pushes an attestor above 200 runs/month, the node becomes meaningfully profitable at these rates.
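The break-even figure can be reconstructed roughly. The $1.90/run share is from the split above; the hardware price, 36-month amortization window, and power cost are assumptions chosen to land near the ~50 runs/month figure, not published operator numbers:

```python
import math

# Rough attestor break-even. Only the $1.90/run share and the 36-month
# amortization window come from the text; gpu_cost_usd and power_usd_month
# are illustrative assumptions.
ATTESTOR_SHARE = 1.90  # USD per run, from the split above

def break_even_runs(gpu_cost_usd: float = 2_000.0,
                    amortize_months: int = 36,
                    power_usd_month: float = 40.0) -> int:
    monthly_cost = gpu_cost_usd / amortize_months + power_usd_month
    return math.ceil(monthly_cost / ATTESTOR_SHARE)

print(break_even_runs())  # ~50 runs/month
```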

Operator guide + join flow: /docs#attestors.

07 · Price floor — why not cheaper?

We get this question a lot. The honest answer: Ethereum L1 settlement is the floor. The verification contract on mainnet costs gas we don’t control. A proof batch that doesn’t land on L1 isn’t a Benchlist proof by definition.

Competitors who charge <$1 per “verified” test are either:

  • Not actually settling on a public blockchain (just a signed claim on a private server), or
  • Running on a testnet or proprietary rollup (free / near-zero gas but no real security guarantee), or
  • Using a shared batch that rarely lands on-chain (claim of “on-chain settlement” without actual mainnet cadence).

We prefer to be expensive and honest. For use cases that don’t need mainnet directness, the “Signed attestation” fallback above exists at $0.05 amortized.

08 · Complex suites.

SWE-bench, τ-Bench, WebArena, and anything requiring sandboxed execution, browser automation, or multi-hour agent trajectories are outside the “standard” cost envelope. These are quoted up-front before any run starts.

Typical quotes:

  • SWE-bench Verified (500 tasks, Docker): $50 per run (cost ~$33)
  • τ-Bench (tool-calling trajectories): $20 per run (cost ~$16)
  • WebArena (browser tasks): $60 per run (cost ~$45)
  • Custom compliance benchmark (negotiated): starts $2,999 setup + $499/mo

These are posted publicly the same way simple suites are. The $5/test default is for “green” rows on the matrix above.

Iteration discipline

We re-run this cost table the first of every month with fresh numbers from the attestor fleet. If costs drop, prices drop. If costs rise, we flag it here before changing pricing. The audit trail is in /changelog.