OPEN RESEARCH · LAUNCHING 2026

The first open benchmark
for AI-generated parametric CAD

Compare outputs from LLM baselines, academic models, and commercial models, side by side, on a fixed set of 20 curated prompts across 4 difficulty tiers.

173 papers analyzed · 5 models tested · 20 benchmark prompts · 4 difficulty tiers

How it works

Inspired by Chatbot Arena and 3D Arena — but for engineering-grade parametric CAD.

STEP 01

Enter a text prompt

Describe a mechanical part in plain English. From simple primitives to complex functional assemblies.

STEP 02

Compare model outputs

See outputs from multiple models rendered side-by-side in 3D. Inspect geometry, view the generated code, see where each model fails.

STEP 03

Browse the results

Explore the full benchmark grid — 20 prompts × 5 models. Click any cell to see the 3D output, source code, and failure analysis.

Models tested

5 models evaluated on the full 20-prompt benchmark. More are being added.

| MODEL | NOTES | VALID STLs |
|---|---|---|
| Claude Opus 4.6 (LLM Baseline · Anthropic) | Best overall. Perfect on T1–T3. | 19 of 20 |
| Zoo / ML-ephant (Commercial · zoo.dev) | Native geometry engine. Returns KCL. | 19 of 20 |
| Text-to-CadQuery (Academic · arXiv 2025) | Qwen 3B fine-tuned. Unit normalization quirk. | 14 of 20 |
| Gemini 2.5 Flash (LLM Baseline · Google) | Fastest. Hallucinates methods on T4. | 14 of 20 |
| GPT-5 (LLM Baseline · OpenAI) | Token limit cuts off complex prompts. | 12 of 20 |
Have a model to add? contact@cadarena.dev

Benchmark prompts

20 prompts across 4 difficulty tiers. A prompt scores ✓ if it produces a valid, executable 3D part.
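The page does not describe how "valid" is checked, so the following is only a minimal sketch of one plausible structural test for an exported STL file. The function name `looks_like_valid_stl` is hypothetical; the byte layout it relies on (80-byte header, 4-byte triangle count, 50 bytes per triangle) is the standard binary STL format.

```python
import struct


def looks_like_valid_stl(data: bytes) -> bool:
    """Heuristic structural check for STL bytes (ASCII or binary).

    Assumption: this is an illustrative check, not the benchmark's own.
    It confirms the file is well-formed and non-empty, not that the
    geometry matches the prompt.
    """
    # ASCII STL: starts with "solid", contains facets, ends with "endsolid".
    if data[:5].lower() == b"solid" and b"facet" in data:
        return b"endsolid" in data

    # Binary STL: 80-byte header + uint32 triangle count + 50 bytes/triangle.
    if len(data) < 84:
        return False
    (n_triangles,) = struct.unpack_from("<I", data, 80)
    return n_triangles > 0 and len(data) == 84 + 50 * n_triangles
```

A check like this only catches empty or truncated files. Deciding whether a part is the right shape needs a separate geometric comparison.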

TIER 4 · Complex Functional Parts · 5 prompts
A parametric living hinge, 100 mm span, 0.3 mm flex zone
An S-curve pipe fitting, 15 mm inner diameter, 45° bend
A 3-part snap-fit assembly: housing, PCB carrier, and lid
TIER 3 · Multi-Feature Parts · 5 prompts
A flanged shaft with 3 equally-spaced M4 bolt holes on the flange
A box with a snap-fit lid, 50 × 40 × 30 mm
A spur gear: 20 teeth, module 2, 10 mm thick, 8 mm center bore
TIER 2 · Single Part with Features · 5 prompts
A rectangular plate 50 × 30 × 5 mm with a centered hole 8 mm diameter
An L-shaped bracket, 40 mm arms, 5 mm thick, 30 mm tall
A hex bolt head 10 mm across flats, M6 thread, 20 mm shaft
TIER 1 · Simple Primitives · 5 prompts
A cube 20 × 20 × 20 mm
A cylinder 10 mm diameter, 30 mm tall
A hollow sphere, outer radius 20 mm, wall 2 mm
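Because the Tier 1 prompts are fully dimensioned, each has a closed-form volume that a generated part could be compared against. The sketch below computes those reference volumes. The helper names and the 1% tolerance are assumptions made for illustration; the benchmark does not state that it performs such a check.

```python
import math


def cube_volume(side_mm: float) -> float:
    """Volume of a cube in mm^3."""
    return side_mm ** 3


def cylinder_volume(diameter_mm: float, height_mm: float) -> float:
    """Volume of a solid cylinder in mm^3."""
    return math.pi * (diameter_mm / 2) ** 2 * height_mm


def hollow_sphere_volume(outer_radius_mm: float, wall_mm: float) -> float:
    """Material volume of a spherical shell in mm^3."""
    inner_radius = outer_radius_mm - wall_mm
    return 4 / 3 * math.pi * (outer_radius_mm ** 3 - inner_radius ** 3)


def within_tolerance(measured: float, expected: float, rel_tol: float = 0.01) -> bool:
    """True if a measured volume is within rel_tol of the reference.

    The 1% default is an arbitrary illustrative choice.
    """
    return math.isclose(measured, expected, rel_tol=rel_tol)


# Reference volumes for the three Tier 1 prompts listed above.
REFERENCE_MM3 = {
    "cube 20 x 20 x 20 mm": cube_volume(20),
    "cylinder d10 x 30 mm": cylinder_volume(10, 30),
    "hollow sphere R20, wall 2 mm": hollow_sphere_volume(20, 2),
}
```

A comparison of this kind would flag a part emitted at the wrong scale, for example one modelled in inches rather than millimetres, even when the file itself is structurally valid.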

Preliminary results

EARLY DATA · 2026-03-03

20 prompts across 4 difficulty tiers. Metric: % of prompts that produced a valid, executable 3D part. Full leaderboard launching soon.

| RANK | MODEL | TYPE | VALID STL | SYNTAX OK | AVG LATENCY | PROMPTS PASSED | NOTES |
|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.6 | LLM Baseline | 95% | 100% | 6.9s | 19 / 20 | Perfect T1–T3. Only tier 4 failures. |
| #1 | Zoo ML-ephant | Commercial | 95% | 95% | 11.1s | 19 / 20 | Tied with Claude. Returns native geometry. |
| #3 | Gemini 2.5 Flash | LLM Baseline | 70% | 100% | 3.1s | 14 / 20 | Fastest. Hallucinates methods at T4. |
| #4 | GPT-5 | LLM Baseline | 60% | 60% | 16.1s | 12 / 20 | Token truncation kills all T4 prompts. |

These are API-only results on 20 hand-selected prompts, run and reviewed manually. More models and prompts are being added. The Gemini result reflects free-tier rate limiting, not model quality.
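The headline metric is simple arithmetic on the "prompts passed" counts, and the ranking treats equal scores as ties. The following sketch recomputes both from the counts reported above. The function names are hypothetical, and the tie handling (standard competition ranking, where two models tied for first are followed by third) is inferred from the ranks shown rather than stated by the benchmark.

```python
TOTAL_PROMPTS = 20

# Prompts passed per model, taken from the preliminary results above.
PASSED = {
    "Claude Opus 4.6": 19,
    "Zoo ML-ephant": 19,
    "Gemini 2.5 Flash": 14,
    "GPT-5": 12,
}


def pass_rate(n_passed: int, total: int = TOTAL_PROMPTS) -> float:
    """Percentage of prompts that produced a valid part."""
    return 100.0 * n_passed / total


def competition_ranks(scores: dict[str, int]) -> dict[str, int]:
    """Rank models by score, highest first; ties share the better rank."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks: dict[str, int] = {}
    prev_score = None
    prev_rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        if score != prev_score:
            # New score: rank equals the 1-based position in the ordering.
            prev_rank = position
            prev_score = score
        ranks[name] = prev_rank
    return ranks


if __name__ == "__main__":
    ranks = competition_ranks(PASSED)
    for name, n in PASSED.items():
        print(f"#{ranks[name]} {name}: {pass_rate(n):.0f}% ({n} / {TOTAL_PROMPTS})")
```

With only 20 prompts, each prompt is worth 5 percentage points, so a gap of one prompt between two models is within what a single flaky run could produce.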

Stay in the loop

We're benchmarking every AI-for-CAD model. Get notified when we add new models, publish results, or release our paper.

Working on a text-to-CAD model? Reach out at contact@cadarena.dev