Optional ZK anchor, via Aligned Layer.

Every Benchlist run ships with an Ed25519-signed attestation by default; that is the floor of trust. Publishers who want a full ZK proof of the scoring function, settled on Ethereum L1, can opt in via Aligned Layer. Today those proofs are queued but not yet anchored, and the Verified ⛓ pill appears only after a batch confirms. Until then the leaderboard runs on signed attestations alone, and we say so explicitly.

Fund the prover · Aligned mainnet

Send ETH on Ethereum L1 to:
0xb0567184A52cB40956df6333510d6eF35B89C8de

BatcherPaymentService · Ethereum L1 mainnet

Suggested first deposit: 0.05 ETH (~$150), which covers roughly 25–50 batched proof submissions. The contract is verified on Etherscan (linked under Audit addresses); cross-check the address against Aligned's mainnet deployment file before sending. The contract holds your deposit, and the batcher debits it as your proofs land on-chain.
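As a back-of-envelope check, the deposit figures above imply a per-proof cost range. The sketch below uses only the numbers quoted here (0.05 ETH, ~$150, 25–50 submissions); the implied ETH price and per-proof costs are derived estimates, not fees published by Aligned, and real costs move with gas prices and batch size.

```python
# Back-of-envelope arithmetic from the figures quoted above.
# All outputs are derived estimates, not published fees.
DEPOSIT_ETH = 0.05
DEPOSIT_USD = 150.0
MIN_PROOFS, MAX_PROOFS = 25, 50


def per_proof_cost(deposit: float, proofs: int) -> float:
    """Average cost per proof if the deposit covers `proofs` submissions."""
    return deposit / proofs


implied_eth_price = DEPOSIT_USD / DEPOSIT_ETH            # USD per ETH implied by "~$150"
cost_high_eth = per_proof_cost(DEPOSIT_ETH, MIN_PROOFS)  # deposit covers only 25 proofs
cost_low_eth = per_proof_cost(DEPOSIT_ETH, MAX_PROOFS)   # deposit covers 50 proofs

print(f"implied ETH price: ~${implied_eth_price:,.0f}")
print(f"per-proof cost: {cost_low_eth:.4f} to {cost_high_eth:.4f} ETH")
print(f"per-proof cost: ~${cost_low_eth * implied_eth_price:.2f} "
      f"to ~${cost_high_eth * implied_eth_price:.2f}")
```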

[Figure: proofs flowing from the off-chain runner into Ethereum via Aligned Layer]

Live on mainnet

Recent batches.

Every attested run on Benchlist shows up here. Click any batch to jump into the proof viewer, or deep-link to Aligned’s explorer and Etherscan for an independent check.

Run · benchmark | Batch hash | Block | Proof system | Score | Explore

Step 1 · Run

Attestor executes

A trusted third-party runner (or the publisher's own signed runner) executes the canonical benchmark. Every (prompt, response, judge) tuple is recorded.

Step 2 · Commit

Merkleize

The runner hashes each tuple and builds a Merkle tree. The root, combined with pinned dataset + methodology hashes, becomes the commitment.
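The commit step can be sketched in a few lines of Python. This is a minimal illustration, not the reference runner: SHA-256, the JSON leaf encoding, the domain-separation prefixes, the duplicate-last-node padding rule, and the order in which the three hashes are combined are all assumptions made for the example. The wire format in the integration docs is authoritative.

```python
import hashlib
import json


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(prompt: str, response: str, judge: str) -> bytes:
    # Assumed encoding: compact JSON array, UTF-8, with a 0x00 leaf prefix.
    encoded = json.dumps([prompt, response, judge],
                         separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return sha256(b"\x00" + encoded)


def merkle_root(leaves: list) -> bytes:
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed rule: duplicate the last node
        level = [sha256(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def commitment(transcript_root: bytes, dataset_hash: bytes,
               methodology_hash: bytes) -> bytes:
    # Assumed combination order; the real format may differ.
    return sha256(dataset_hash + methodology_hash + transcript_root)


# Made-up example data.
tuples = [
    ("What is 2+2?", "4", "correct"),
    ("Capital of France?", "Paris", "correct"),
    ("Largest planet?", "Saturn", "incorrect"),
]
root = merkle_root([leaf_hash(*t) for t in tuples])
dataset_hash = sha256(b"example dataset bytes")
methodology_hash = sha256(b"example methodology bytes")
print(commitment(root, dataset_hash, methodology_hash).hex())
```

Changing any character in any tuple changes its leaf hash, which propagates up to a different root and therefore a different commitment.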

Step 3 · Prove

Generate proof

A ZK proof of the scoring function over the Merkle root is generated. Supported systems: SP1, Risc0, Halo2, Groth16-BN254, Plonk.

Step 4 · Verify

On-chain batch

The proof is submitted to Aligned's operator set. Aligned batches proofs together, produces a Merkle root over the batch, and verifies the batch on Ethereum. The batch ID is the listing's credential.
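Because a batch is summarized by a Merkle root, checking that one proof belongs to a batch comes down to a Merkle inclusion proof. The sketch below shows the general technique only. It uses SHA-256 and a duplicate-last-node padding rule as stand-ins; Aligned's actual hash function, leaf format, and batch layout are not specified here and should be taken from Aligned's own documentation.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in hash, an assumption


def build_levels(leaves: list) -> list:
    """Return every tree level, leaves first; odd levels are padded in place."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = list(levels[-1])
        if len(cur) % 2 == 1:
            cur.append(cur[-1])
        levels[-1] = cur
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def inclusion_proof(levels: list, index: int) -> list:
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    node = leaf
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Five hypothetical proof digests standing in for a batch.
batch = [h(f"proof-{i}".encode()) for i in range(5)]
levels = build_levels(batch)
batch_root = levels[-1][0]
path = inclusion_proof(levels, 3)
print(verify(batch[3], path, batch_root))
```

The verifier needs only the leaf, the sibling path, and the root, so a listing can be checked against a batch without downloading the other proofs in it.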

Five supported, one default.

Publishers pick the proof system that fits their scoring function. The default is SP1, a RISC-V zkVM that proves scoring code compiled to RISC-V (in practice, Rust) without rewriting it as a circuit.

Default

SP1

RISC-V zkVM. Compile your scoring program to RV32IM and prove it as written. Works for complex eval harnesses.

zkVM

Risc0

Alternative zkVM with GPU prover acceleration. Faster for large workloads.

SNARK

Groth16-BN254

Classic pairing-based SNARK. Shortest proofs, lowest gas. Best for simple threshold/average scoring.

SNARK

Halo2 (KZG / IPA)

PLONKish SNARK. The IPA backend needs no trusted setup; the KZG backend relies on a universal setup. Good for long-horizon credibility.

PLONK

Plonk (kimchi / ultra)

Universal setup. Flexible middle-ground for custom scoring circuits.

Fallback

Signed attestation

When ZK is impractical (for example, LLM-judge benchmarks), an allowlisted attestor's Ed25519 signature is verified by Aligned's general attestation contract. The attestor's stake is slashable.

Three properties you can't fake.

Tamper-evident

Every commitment is a one-way hash. Edit one character in a transcript and the Merkle root changes, the proof fails, and Aligned rejects the batch.

Replayable

The dataset hash and methodology hash are pinned in the proof. Anyone can re-run the benchmark and check whether the same inputs produce the same score.

Slashable

Attestors post ETH stake. If a dispute is upheld, i.e., a community replay produces a materially different score, the attestor's stake is slashed.
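The replay and dispute logic can be illustrated with a small check. This is a hypothetical sketch: the `Attestation` fields, the `dispute_upheld` helper, and the `tolerance` value are invented for the example, and the text above does not define what counts as "materially different". The actual rule will live in the Dispute Manager contract once it is deployed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    """Hypothetical shape: the pinned input hashes plus the resulting score."""
    dataset_hash: str
    methodology_hash: str
    score: float


def dispute_upheld(attested: Attestation, replay: Attestation,
                   tolerance: float = 0.01) -> bool:
    """True if the replay's score differs by more than `tolerance` (assumed value)."""
    same_inputs = (
        attested.dataset_hash == replay.dataset_hash
        and attested.methodology_hash == replay.methodology_hash
    )
    if not same_inputs:
        # A replay over different inputs says nothing about the original run.
        raise ValueError("replay used different pinned inputs; not comparable")
    return abs(attested.score - replay.score) > tolerance


original = Attestation("0xdata", "0xmethod", 0.80)
close_replay = Attestation("0xdata", "0xmethod", 0.795)
far_replay = Attestation("0xdata", "0xmethod", 0.60)
print(dispute_upheld(original, close_replay), dispute_upheld(original, far_replay))
```

Comparing the pinned hashes first is what makes a replay meaningful: only runs over the same dataset and methodology can contradict an attested score.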

End-to-end attestation flow.

publisher.json         (service metadata)
       │
       ▼
attestor runner        (SP1 / Risc0 / signed Ed25519)
       │   ┌─ commits ─▶ datasetHash     ─┐
       │   ├─ commits ─▶ methodologyHash  ├─ merkle root
       │   ├─ commits ─▶ transcripts      ─┘
       │   └─ produces ─▶ zk proof of score
       ▼
aligned-sdk submit     (Holesky → mainnet bridge)
       │
       ▼
Aligned operator set   (BLS signatures over batch)
       │
       ▼
ServiceManager.sol     (0xeF2A…606c on Ethereum L1)
       │
       ▼
batch_id + block  ────▶ Benchlist listing credential

Submit a proof.

A simple CLI: run the benchmark, commit and prove the result, submit the proof, wait for verification, then publish the listing.

# 1. Run the benchmark locally (or in CI)
$ benchlist run longmemeval \
    --service rem-labs \
    --model claude-opus-4-7 \
    --runs 3 \
    --out run.json

# 2. Commit: hash transcripts, compute Merkle root
$ benchlist commit run.json

# 3. Prove: produce SP1 proof of scoring function
$ benchlist prove run.json --system sp1

# 4. Submit to Aligned Layer
$ benchlist submit run.json --network holesky
  → batch_id: 0x7b3c...2b4c
  → verifier: 0xeF2A...606c
  → waiting for on-chain verification...
  → verified at block 22184921

# 5. Publish the listing
$ benchlist publish run.json
  → https://benchlist.ai/verify/run-rem-lme-001

Audit addresses.

Aligned Layer’s AVS contracts live on Ethereum L1 mainnet, not on Base or any rollup, and are secured by restaked ETH via EigenLayer. Point your node at any of these to validate a batch independently.

Aligned ServiceManager

0xeF2A435e5EE44B2041100EF8cbC8ae035166606c etherscan ↗

Batcher Payment Service

0xb0567184A52cB40956df6333510d6eF35B89C8de etherscan ↗

Aggregator

0x0b9AacA2C28a7ECAcB68BAef0d2F596AC27aaE32 etherscan ↗

Registry Coordinator

0xA8CC0749b4409c3c47012323E625aEcBA92f64b9 etherscan ↗

Operator State Retriever

0x6e0046205cAfA503F6b7465195A6C63C47d214f1 etherscan ↗
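A first sanity check against your own node is to confirm that contract code exists at each address listed above. The sketch below only builds the standard `eth_getCode` JSON-RPC request bodies; it does not send them, and it does not validate a batch by itself. The `get_code_request` helper is illustrative. POST the printed payload to your node's RPC endpoint; a result of `"0x"` means no code is deployed at that address.

```python
import json

# Addresses copied from the list above.
AUDIT_ADDRESSES = {
    "Aligned ServiceManager": "0xeF2A435e5EE44B2041100EF8cbC8ae035166606c",
    "Batcher Payment Service": "0xb0567184A52cB40956df6333510d6eF35B89C8de",
    "Aggregator": "0x0b9AacA2C28a7ECAcB68BAef0d2F596AC27aaE32",
    "Registry Coordinator": "0xA8CC0749b4409c3c47012323E625aEcBA92f64b9",
    "Operator State Retriever": "0x6e0046205cAfA503F6b7465195A6C63C47d214f1",
}


def get_code_request(address: str, request_id: int) -> dict:
    """Build one eth_getCode request for the latest block."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getCode",
        "params": [address, "latest"],
    }


# JSON-RPC allows sending several requests as one array (a batch request).
payload = [
    get_code_request(address, i)
    for i, address in enumerate(AUDIT_ADDRESSES.values(), start=1)
]
print(json.dumps(payload, indent=2))
```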

Our registry, dispute, stake.

Benchlist’s own on-chain contracts, for publisher registration, attestor staking, and dispute resolution, are in audit. Addresses will be published here once the contracts are deployed. Until then, payments settle via Stripe (card) or the crypto endpoint (native ETH on Base for pennies of gas, on Ethereum L1 for directness, or on Arbitrum).

Benchlist Registry

Planned · post-audit Q3 2026 · Ethereum L1

Dispute Manager

Planned · post-audit Q3 2026 · Ethereum L1

Stake Vault

Planned · post-audit Q3 2026 · Ethereum L1

ETH receiver · Base

0xb7d4d49da62bc3af186de2ee78a59fd3002fdaad basescan ↗

ETH receiver · Ethereum L1

0xb7d4d49da62bc3af186de2ee78a59fd3002fdaad etherscan ↗

ETH receiver · Arbitrum

0xb7d4d49da62bc3af186de2ee78a59fd3002fdaad arbiscan ↗

Spec

The full wire format and reference runner implementation are in the integration docs. The reference runner is MIT-licensed and forkable.
