Verification spec

Every score, attested on Ethereum.

Benchlist uses Aligned Layer, a proof aggregation network that settles on Ethereum L1, as its verification backbone. A Verified ⛓ badge means the score is backed by an on-chain proof that you can re-check yourself.

Figure: proofs flowing from the off-chain runner into Ethereum via Aligned Layer.
Live on mainnet

Recent batches.

Every attested run on Benchlist appears here. Each batch opens in the proof viewer, and you can deep-link to Aligned's explorer and to Etherscan for an independent check.

Table columns: Run · benchmark | Batch hash | Block | Proof system | Score | Explore
Step 1 · Run
Attestor executes
A trusted third-party runner (or the publisher's own signed runner) executes the canonical benchmark. Every (prompt, response, judge) tuple is recorded.
Step 2 · Commit
Merkleize
The runner hashes each tuple and builds a Merkle tree. The root, combined with the pinned dataset and methodology hashes, becomes the commitment.
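The commit step can be sketched in a few lines of Python. This is a minimal illustration that assumes a canonical JSON encoding for each (prompt, response, judge) tuple and uses SHA-256 as the hash. The `EvalRecord` type, the leaf and node prefixes, and the way the root is combined with the dataset and methodology hashes are all hypothetical choices. The actual wire format is defined in the integration docs.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvalRecord:
    """One recorded (prompt, response, judge) tuple; field names are hypothetical."""
    prompt: str
    response: str
    judge: str  # e.g. the judge's verdict for this response


def leaf_hash(record: EvalRecord) -> bytes:
    # Canonical JSON (sorted keys, no whitespace) so identical records always hash identically.
    encoded = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(b"\x00" + encoded).digest()  # 0x00 = leaf domain tag


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("cannot build a Merkle tree with no leaves")
    level = leaves
    while len(level) > 1:
        # Hash adjacent pairs; 0x01 = internal-node domain tag.
        nxt = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # promote an unpaired node unchanged
        level = nxt
    return level[0]


def commitment(records: list[EvalRecord], dataset_hash: bytes, methodology_hash: bytes) -> bytes:
    """Bind the transcript root to the pinned dataset and methodology hashes (32 bytes each)."""
    root = merkle_root([leaf_hash(r) for r in records])
    return hashlib.sha256(root + dataset_hash + methodology_hash).digest()


if __name__ == "__main__":
    records = [EvalRecord("Q1", "A1", "correct"), EvalRecord("Q2", "A2", "incorrect")]
    dataset_hash = hashlib.sha256(b"example dataset").digest()
    methodology_hash = hashlib.sha256(b"example methodology").digest()
    print(commitment(records, dataset_hash, methodology_hash).hex())
```

Leaves are prefixed with 0x00 and internal nodes with 0x01, so a leaf hash can never be confused with an internal node. An unpaired node is promoted rather than duplicated. This avoids the ambiguity where duplicating the last leaf makes [a, b, c] and [a, b, c, c] share a root.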
Step 3 · Prove
Generate proof
The runner generates a ZK proof that the scoring function, applied to the transcripts committed under the Merkle root, yields the reported score. Supported systems: SP1, Risc0, Halo2, Groth16-BN254, Plonk.
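The prove step attests to a computation like the one sketched below, which is what a zkVM guest program would run. It recomputes the Merkle root from the private transcripts and computes the score. Only the root and the score are exposed as public output. The sketch is plain Python for readability; an SP1 or Risc0 guest would normally be written in Rust. The record fields, the `"correct"` verdict label, the hashing scheme and the accuracy metric are all assumptions.

```python
import hashlib
import json


def _leaf(record: dict) -> bytes:
    # Canonical JSON so the same record always hashes to the same bytes.
    data = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(b"\x00" + data).digest()


def _root(leaves: list[bytes]) -> bytes:
    while len(leaves) > 1:
        paired = [hashlib.sha256(b"\x01" + leaves[i] + leaves[i + 1]).digest()
                  for i in range(0, len(leaves) - 1, 2)]
        if len(leaves) % 2:
            paired.append(leaves[-1])  # promote an unpaired node
        leaves = paired
    return leaves[0]


def guest_program(records: list[dict]) -> dict:
    """Hypothetical guest logic: records are private inputs, the returned dict is public output."""
    if not records:
        raise ValueError("no records to score")
    root = _root([_leaf(r) for r in records])
    correct = sum(1 for r in records if r["judge"] == "correct")
    # Integer basis points (10_000 = 100%) instead of floats.
    score_bps = correct * 10_000 // len(records)
    return {"merkle_root": root.hex(), "score_bps": score_bps}


if __name__ == "__main__":
    recs = [{"prompt": "Q1", "response": "A1", "judge": "correct"},
            {"prompt": "Q2", "response": "A2", "judge": "incorrect"}]
    print(guest_program(recs))
```

The score is kept in integer basis points because floating point is awkward or unavailable in arithmetic circuits such as Groth16 and Plonk. Integers also make the result easy to compare exactly when someone replays the run.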
Step 4 · Verify
On-chain batch
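The proof is submitted to Aligned's operator set. Aligned batches proofs together under a single Merkle root. Its operators verify every proof in the batch and sign the result, and their aggregated BLS signature is checked on Ethereum. The batch ID is the listing's credential.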
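Aligned commits to a whole batch under one Merkle root. A listing's credential is therefore only meaningful if the listing's proof is actually a leaf of that batch. The sketch below shows a generic Merkle inclusion check. SHA-256, the path format and the function names are assumptions for illustration. Aligned's own batch tree may use a different hash function and leaf encoding.

```python
import hashlib


def node_hash(left: bytes, right: bytes) -> bytes:
    # Stand-in hash; Aligned's own batch tree may use a different hash and leaf encoding.
    return hashlib.sha256(left + right).digest()


def verify_inclusion(leaf: bytes, path: list[tuple[bytes, bool]], batch_root: bytes) -> bool:
    """Walk from a leaf up to the root.

    path lists (sibling_hash, sibling_is_left) pairs from the leaf level upward.
    """
    current = leaf
    for sibling, sibling_is_left in path:
        current = node_hash(sibling, current) if sibling_is_left else node_hash(current, sibling)
    return current == batch_root


if __name__ == "__main__":
    # Four-leaf example batch: root = H(H(l0, l1), H(l2, l3)).
    leaves = [hashlib.sha256(bytes([i])).digest() for i in range(4)]
    n01, n23 = node_hash(leaves[0], leaves[1]), node_hash(leaves[2], leaves[3])
    root = node_hash(n01, n23)
    # Proof for leaf 2: its sibling l3 sits on the right, then n01 on the left.
    print(verify_inclusion(leaves[2], [(leaves[3], False), (n01, True)], root))  # True
```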
Proof systems

Five supported, one default.

Publishers pick the proof system that fits their scoring function. The default is SP1, a RISC-V zkVM that proves ordinary programs compiled to RISC-V, such as Rust scoring code, without a hand-written circuit.

Default
SP1
RISC-V zkVM. Compile your scoring program to the RV32IM target and prove it as written. Works for complex eval harnesses.
zkVM
Risc0
Alternative RISC-V zkVM with GPU-accelerated proving, suited to large workloads.
SNARK
Groth16-BN254
Classic pairing-based SNARK. Shortest proofs, lowest gas. Best for simple threshold/average scoring.
PLONKish
Halo2 (KZG / IPA)
PLONKish SNARK with no per-circuit trusted setup. The IPA variant needs no trusted setup at all. Good for long-horizon credibility.
PLONK
Plonk (kimchi / ultra)
Universal setup. Flexible middle-ground for custom scoring circuits.
Fallback
Signed attestation
Used when ZK proving is impractical, as with LLM-judge benchmarks. An allowlisted attestor's Ed25519 signature is verified by Aligned's general attestation contract. The attestor's stake is slashable.
Why on-chain

Three properties you can't fake.

Tamper-evident

Every commitment is a one-way hash. Edit one character in a transcript and its leaf hash changes, so the Merkle root changes. The proof then no longer matches the commitment, and Aligned rejects it.

Replayable

The dataset hash and methodology hash are pinned in the proof. Anyone can re-run the benchmark and check whether the same inputs produce the same score.
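Before replaying, a checker should confirm that its local copies of the dataset and methodology are byte-identical to what the proof pinned. Otherwise a score mismatch proves nothing. The minimal sketch below assumes the pinned values are hex-encoded SHA-256 digests of the raw files. The real hash function, and exactly which bytes are hashed, come from the Benchlist wire format. The file names in the demo are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def _normalize(hex_hash: str) -> str:
    # Accept "0x"-prefixed or upper-case hex as well.
    return hex_hash.lower().removeprefix("0x")


def inputs_match(dataset_path: str, methodology_path: str,
                 pinned_dataset_hash: str, pinned_methodology_hash: str) -> bool:
    """True only if both local files hash to the values pinned in the proof."""
    return (sha256_file(dataset_path) == _normalize(pinned_dataset_hash)
            and sha256_file(methodology_path) == _normalize(pinned_methodology_hash))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ds = os.path.join(tmp, "dataset.jsonl")
        md = os.path.join(tmp, "methodology.md")
        with open(ds, "wb") as f:
            f.write(b'{"id": 1}\n')
        with open(md, "wb") as f:
            f.write(b"score = exact match\n")
        pinned_ds = "0x" + hashlib.sha256(b'{"id": 1}\n').hexdigest()
        pinned_md = hashlib.sha256(b"score = exact match\n").hexdigest()
        print(inputs_match(ds, md, pinned_ds, pinned_md))  # True
```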

Slashable

Attestors post ETH stake. If a dispute is upheld, meaning a community replay produces a materially different score, the attestor's stake is slashed.
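"Materially different" needs a concrete threshold, because LLM benchmarks are not perfectly deterministic even on identical inputs. The sketch below shows one possible rule. The 100-basis-point default tolerance and the use of the median across replays are assumptions, not Benchlist's published dispute parameters.

```python
from statistics import median


def dispute_upheld(attested_bps: int, replay_bps: list[int], tolerance_bps: int = 100) -> bool:
    """Hypothetical dispute rule.

    Scores are in basis points (10_000 = 100%). The dispute is upheld when the median of
    independent replays differs from the attested score by more than tolerance_bps.
    """
    if not replay_bps:
        raise ValueError("a dispute needs at least one replay")
    return abs(median(replay_bps) - attested_bps) > tolerance_bps


if __name__ == "__main__":
    print(dispute_upheld(7200, [7180, 7210, 7195]))  # False: replays agree within 1 point
    print(dispute_upheld(7200, [6900, 6950, 7000]))  # True: median is 2.5 points lower
```

The median is used so that, with three or more replays, a single outlier can neither trigger a slash nor block one on its own.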

Architecture

End-to-end attestation flow.

publisher.json         (service metadata)
       │
       ▼
attestor runner        (SP1 / Risc0 / signed Ed25519)
       │   ┌─ commits ─▶ datasetHash     ─┐
       │   ├─ commits ─▶ methodologyHash  ├─ merkle root
       │   ├─ commits ─▶ transcripts      ─┘
       │   └─ produces ─▶ zk proof of score
       ▼
aligned-sdk submit     (Holesky testnet or Ethereum mainnet)
       │
       ▼
Aligned operator set   (BLS signatures over batch)
       │
       ▼
ServiceManager.sol     (0xeF2A…606c on Ethereum L1)
       │
       ▼
batch_id + block  ────▶ Benchlist listing credential
Example

Submit a proof.

A simple CLI: run the benchmark, commit, prove, submit the proof, wait for on-chain verification, then publish.

# 1. Run the benchmark locally (or in CI)
$ benchlist run longmemeval \
    --service rem-labs \
    --model claude-opus-4-7 \
    --runs 3 \
    --out run.json

# 2. Commit: hash transcripts, compute Merkle root
$ benchlist commit run.json

# 3. Prove: produce SP1 proof of scoring function
$ benchlist prove run.json --system sp1

# 4. Submit to Aligned Layer
$ benchlist submit run.json --network holesky
  → batch_id: 0x7b3c...2b4c
  → verifier: 0xeF2A...606c
  → waiting for on-chain verification...
  → verified at block 22184921

# 5. Publish the listing
$ benchlist publish run.json
  → https://benchlist.ai/verify/run-rem-lme-001
Verifier contracts

Audit addresses.

Aligned Layer’s AVS contracts live on Ethereum L1 mainnet. They are secured by restaked ETH via EigenLayer and are not deployed on Base or any rollup. Point your node at any of these addresses to validate a batch independently.

Aligned ServiceManager
0xeF2A435e5EE44B2041100EF8cbC8ae035166606c (Etherscan)
Batcher Payment Service
0xb0567184A52cB40956df6333510d6eF35B89C8de (Etherscan)
Aggregator
0x0b9AacA2C28a7ECAcB68BAef0d2F596AC27aaE32 (Etherscan)
Registry Coordinator
0xA8CC0749b4409c3c47012323E625aEcBA92f64b9 (Etherscan)
Operator State Retriever
0x6e0046205cAfA503F6b7465195A6C63C47d214f1 (Etherscan)
Benchlist contracts

Our registry, dispute, stake.

Benchlist’s own on-chain contracts for publisher registration, attestor staking and dispute resolution are in audit. Their addresses will be published here once deployed. Until then, payments settle via Stripe (card) or the crypto endpoint. The crypto endpoint accepts native ETH on Base (low gas fees), on Ethereum L1 (direct settlement) or on Arbitrum.

Benchlist Registry
Planned · post-audit Q3 2026 · Ethereum L1
Dispute Manager
Planned · post-audit Q3 2026 · Ethereum L1
Stake Vault
Planned · post-audit Q3 2026 · Ethereum L1
ETH receiver · Base
0xb7d4d49da62bc3af186de2ee78a59fd3002fdaad (Basescan)
ETH receiver · Ethereum L1
0xb7d4d49da62bc3af186de2ee78a59fd3002fdaad (Etherscan)
ETH receiver · Arbitrum
0xb7d4d49da62bc3af186de2ee78a59fd3002fdaad (Arbiscan)
Spec

The full wire format and the reference runner implementation are in the integration docs. The reference runner is MIT-licensed and free to fork.