00 / 08  ·  VISION

The road to end‑to‑end AI
for mechanical engineering

We have geometry generators. We have simulation tools. We have optimization algorithms. None of them talk to each other. This is what needs to change.

01 / 08  ·  THE PROBLEM

Today, every step still needs a human

The mechanical engineering loop hasn't fundamentally changed in 40 years. An expert sits at every handoff.

Concept · Engineer manually describes requirements — dimensions, loads, constraints (Human)
CAD · Expert models the geometry by hand; hours to weeks per part (Human)
Simulation · FEA/CFD setup, meshing, boundary conditions; specialist required (Human)
DFM Review · Manufacturing engineer checks tolerances, undercuts, wall thickness (Human)
Iteration · Repeat from the CAD step; typically 5–20 cycles before sign-off (Human)
02 / 08  ·  WHERE WE ARE

AI is arriving — but in isolated islands

Each layer is getting AI tooling independently. Nothing connects top to bottom.

Text → CAD · 13+ models exist; results are inconsistent; no benchmark. This is where we're focused. (Active now)
CAD → Sim · Autodesk Fusion AI, Ansys AI, SimScale — AI-assisted setup, but not end-to-end (Early)
Topo Opt · Generative design (Fusion 360, nTopology) — well-established but siloed (Mature)
DFM · Rule-based tools exist (Boothroyd Dewhurst, DFMPro) — not AI-native (Pre-AI)
Assembly · No meaningful AI for multi-part assembly generation yet (Unsolved)
03 / 08  ·  THE CORE PROBLEM

You can't improve what
you can't measure

The text-to-CAD field has 13+ competing models, four different output formats, and no shared benchmark. Papers can't be compared. Progress is invisible.

What exists today · Static paper benchmarks. Each paper tests on its own dataset with its own metrics. Results die with the paper.
What's needed · A living leaderboard. Fixed prompt set. Automatic metrics + human preference votes. New models can submit anytime.
"Lack of comprehensive evaluation frameworks" — identified as the field's most critical gap in the 2025 LLMs for CAD survey (173 papers reviewed).
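"Automatic metrics + human preference votes" implies an aggregation scheme for the votes. A minimal sketch of the preference side, assuming an Elo-style update of the kind chat-model arenas popularized; the model names and K-factor below are illustrative, not from the source:

```python
from collections import defaultdict

def expected_score(ra, rb):
    """Expected win probability of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

def update_ratings(votes, k=32, base=1000.0):
    """Fold a stream of pairwise preference votes into Elo ratings.

    votes: iterable of (winner, loser) model-name pairs, in vote order.
    """
    ratings = defaultdict(lambda: base)
    for winner, loser in votes:
        e_win = expected_score(ratings[winner], ratings[loser])
        delta = k * (1 - e_win)
        ratings[winner] += delta  # winner gains
        ratings[loser] -= delta   # loser loses the same amount (zero-sum)
    return dict(ratings)

# Hypothetical votes: each tuple is (preferred model, other model).
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
ratings = update_ratings(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

Because each update is zero-sum, new models can join at the base rating anytime without distorting existing scores, which is what "open submissions" requires.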
04 / 08  ·  THE STACK

What full-stack AI for mechanical
engineering actually requires

L1 · Valid geometry from text — ~80% solved
L2 · Dimensional accuracy + constraints — ~35% solved
L3 · Manufacturability (DFM-aware output) — ~5% solved
L4 · Physics-valid under load — ~10% solved
L5 · Multi-part assemblies — ~2% solved
L6 · Full product from specification — 0% solved

Estimates based on current SOTA across the 173 papers reviewed. L1 validity: the best models achieve ~80–93% valid geometry on simple shapes.

05 / 08  ·  MISSING PIECES

The three gaps nobody has closed

Manufacturability · No model checks whether a generated part can actually be made. Wall thickness, undercuts, tolerances, process-specific constraints — all ignored. A generated part that looks correct may be physically impossible to manufacture.
Cross-model eval · Sequence-based models (Text2CAD), code-based (CAD-Coder), and B-rep-direct (BrepGen) have never been compared on the same benchmark. We don't know which paradigm wins, or when.
Academic vs. commercial · Zoo, AdamCAD, and CADGPT have never appeared in any academic benchmark table, and academic SOTA models never appear in commercial comparisons. Nobody has done both.
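To make the manufacturability gap concrete, here is roughly what the missing check looks like at its simplest: a rule-based wall-thickness screen. A sketch only, with common rules of thumb as limits (not values from any tool named above); extracting thickness samples from the geometry itself, e.g. by ray casting or a medial-axis transform, is a separate and harder step not shown here:

```python
# Illustrative minimum wall thickness per process, in mm.
# These are typical rules of thumb, not authoritative limits.
MIN_WALL_MM = {
    "injection_molding": 1.0,
    "cnc_milling": 0.8,
    "fdm_printing": 1.2,
    "sheet_metal": 0.5,
}

def check_wall_thickness(measured_walls_mm, process):
    """Return the wall-thickness samples that violate the process minimum.

    measured_walls_mm: thickness samples taken from the generated geometry.
    process: key into MIN_WALL_MM selecting the manufacturing process.
    """
    limit = MIN_WALL_MM[process]
    return [t for t in measured_walls_mm if t < limit]

# A part with one 0.6 mm wall fails the injection-molding screen.
violations = check_wall_thickness([2.4, 0.6, 1.1], "injection_molding")
```

Even this trivial screen is absent from today's text-to-CAD pipelines; a generated part sails through with walls no molder would accept.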
06 / 08  ·  CAD ARENA

The benchmark that drives progress

The history of ML is clear: ImageNet didn't just measure vision, it created it. SWE-bench didn't just measure coding agents, it shaped their development. A good benchmark is a forcing function.

200 benchmark prompts
13+ models evaluated
4 difficulty tiers
Open submissions
First benchmark to compare academic and commercial models side by side on the same fixed prompt set. Automatic validity + geometry metrics. Human preference voting. Living leaderboard.
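As a sketch of what the "automatic validity + geometry metrics" half could compute: sample point clouds from the generated and reference parts and score their symmetric Chamfer distance. This assumes NumPy and uniformly sampled surface points; real pipelines would add validity checks (watertightness, self-intersection) that are out of scope here:

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a (N,3) and b (M,3)."""
    # Pairwise squared distances via broadcasting -> shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Mean nearest-neighbor distance in both directions.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Illustrative check with synthetic clouds standing in for sampled parts.
rng = np.random.default_rng(0)
ref = rng.random((256, 3))              # points from the reference part
gen_good = ref.copy()                   # a perfect generation
gen_off = ref + np.array([0.1, 0.0, 0.0])  # systematically offset generation

assert chamfer_distance(ref, gen_good) == 0.0  # identical clouds score 0
assert chamfer_distance(ref, gen_off) > 0.0    # offset clouds score worse
```

A fixed prompt set plus a metric like this gives the automatic half of the leaderboard; human preference votes cover what geometry metrics miss.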
07 / 08  ·  THE VISION

Where this ends up

A mechanical engineer describes what they need in plain language. The system generates geometry, checks it against manufacturing constraints, runs simulation, optimizes the design, and outputs a production-ready file.

Not a CAD copilot. A CAD engineer.

We're at step one: getting geometry generation right and measurable. But step one has to be done properly for the rest to follow.