Inspired by Chatbot Arena and 3D Arena — but for engineering-grade parametric CAD.
Describe a mechanical part in plain English. From simple primitives to complex functional assemblies.
See outputs from 13+ models rendered side-by-side in 3D. Inspect geometry, download STEP files, view the generated code.
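For a sense of what "the generated code" looks like: a minimal sketch, assuming CadQuery as the scripting backend (the prompt and dimensions here are illustrative, not actual arena outputs):

```python
# Hypothetical output for a prompt like "a 40 mm square mounting plate,
# 5 mm thick, with four 4 mm corner holes" (all dimensions illustrative).
import cadquery as cq

plate = (
    cq.Workplane("XY")
    .box(40, 40, 5)                      # base plate
    .faces(">Z").workplane()
    .rect(30, 30, forConstruction=True)  # construction rect for hole pattern
    .vertices()
    .hole(4)                             # 4 mm through-holes at each corner
)

cq.exporters.export(plate, "plate.step")  # STEP file offered for download
```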
Cast a pairwise vote. Results feed into Elo-based rankings. Automated metrics (validity, Chamfer distance, VLM score) run in parallel.
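On the ranking side, a standard Elo update is enough to turn pairwise votes into ratings. A minimal sketch; the K-factor and starting rating are illustrative, not the production values:

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    """Standard Elo update after one pairwise vote."""
    # Expected win probability for the winner, given the rating gap.
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# e.g. two models starting at 1500; model A wins the vote:
r_a, r_b = elo_update(1500.0, 1500.0)  # -> (1516.0, 1484.0)
```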
7 academic · 3 commercial · 3 LLM baselines
170K models, 4 abstraction levels
Unified controllable generation
Chain-of-thought + geometric reward RL
Self-correction: 53% → 85% exec success
Iterative visual refinement loop
Spatial reasoning multimodal LLM
Foundational baseline — 178K models
$30M+ funded, public API
$4.1M seed, mech. engineering focus
Commercial text-to-CAD API
93% invalid rate (Text2CAD eval)
Strong code model — untested on CAD
85% compile rate on CADPrompt
~200 prompts across 4 difficulty tiers. Fixed set for reproducible evaluation. Models are scored on validity rate, Chamfer distance, and VLM-judged prompt adherence.
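For reference, Chamfer distance here is the usual symmetric point-cloud distance between samples of the generated and reference meshes. A minimal sketch using SciPy; sampling density and the squared-vs-unsquared convention vary across papers:

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    """Symmetric Chamfer distance between two (N, 3) point clouds."""
    d_ab, _ = cKDTree(pts_b).query(pts_a)  # nearest point in B for each point in A
    d_ba, _ = cKDTree(pts_a).query(pts_b)  # nearest point in A for each point in B
    return float(np.mean(d_ab ** 2) + np.mean(d_ba ** 2))
```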
20 prompts across 4 difficulty tiers. Metric: % of prompts that produced a valid, executable 3D part. Full leaderboard launching soon.
| RANK | MODEL | TYPE | VALID STL | SYNTAX OK | AVG LATENCY | PROMPTS PASSED | NOTES |
|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.6 | LLM Baseline | 90% | 100% | 6.9s | 19 / 20 | Perfect on T1–T3; all failures at T4. |
| #1 | Zoo ML-ephant | Commercial | 95% | 95% | 11.1s | 19 / 20 | Tied with Claude. Returns native geometry. |
| #3 | Gemini 2.5 Flash | LLM Baseline | 70% | 100% | 3.1s | 14 / 20 | Fastest. Hallucinates methods at T4. |
| #4 | GPT-5 | LLM Baseline | 60% | 60% | 16.1s | 12 / 20 | Token truncation kills all T4 prompts. |
These are API-only baseline results on 20 prompts. The full benchmark (200 prompts, 13+ models, including academic open-source models) is in progress. The Gemini result reflects free-tier rate limiting, not model quality.
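For context on the VALID STL column: the check reduces to "does the generated script execute and export geometry without error". A minimal sketch, assuming a CadQuery backend and a `result`-variable convention; `run_model` is hypothetical, not the arena's actual harness:

```python
import cadquery as cq

def is_valid(script: str) -> bool:
    """Execute a generated CadQuery script and check it exports an STL."""
    scope: dict = {}
    try:
        exec(script, scope)                     # run the model's generated code
        result = scope["result"]                # assumed convention: script binds `result`
        cq.exporters.export(result, "out.stl")  # raises on invalid geometry
        return True
    except Exception:
        return False

# validity rate = fraction of prompts whose generated script passes, e.g.:
# rate = sum(is_valid(run_model(p)) for p in prompts) / len(prompts)
```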
The 2025 survey *Large Language Models for Computer-Aided Design* explicitly identifies this as the field's most critical gap.
Sequence-based (Text2CAD), code-based (CAD-Coder), and direct B-rep (BrepGen) models are evaluated on different benchmarks with different metrics. You can't compare results across papers.
Commercial tools like Zoo and AdamCAD are never included in academic benchmark tables, and academic models never appear in commercial tool comparisons. Nobody has evaluated both side by side.
Every benchmark is a static snapshot tied to a paper. There's no venue where a new model can be submitted and ranked continuously: no SWE-bench equivalent for CAD.
Unlike image generation (FID, CLIP score) or code (pass@k), CAD has no community-consensus quality metric. Papers pick different metrics, making progress hard to track.
We are preparing a benchmark paper targeting NeurIPS 2026 Datasets & Benchmarks. The paper will evaluate all listed models on the fixed benchmark, propose standardized metrics, and describe the arena platform.
Get notified when the leaderboard launches and the preprint drops.
contact@cadarena.ai →

Are you working on a text-to-CAD model and want it on the leaderboard? We want to hear from you.