Diagram Generation Eval v4

Algorithmic Projection Pipeline — Text → 3D Coords → Oblique Projection → SVG

24 February 2026

Summary

  • Cases: 7 (text-deterministic)
  • v4 checks: 30/30 (all pass)
  • v3 checks: 26/30 (4 fail)
  • Human eval: 5/7 (2 partial)

What changed from v3 to v4: v3 SVGs were hand-drawn by an LLM. v4 SVGs are generated algorithmically: text description → 3D vertex coordinates → oblique projection → SVG. The geometry is correct by construction, because the projection is a deterministic affine map rather than a guess.

What the validator checks:
  • Structure: all vertices labeled, all edges drawn.
  • Parallel edges: edges that are parallel and equal in length in 3D must stay parallel and equal in length in 2D (an invariant of any affine projection, oblique included).
  • Affine consistency: fit a 3D→2D affine map to all vertices; the maximum residual must be under 5 px.
  • Topology: above/below relationships are preserved.
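
The affine-consistency check could be sketched as follows. This is a minimal illustration under stated assumptions, not the validator's actual code: the names `solve_linear` and `max_affine_residual` are hypothetical, the fit is an ordinary least-squares fit solved through the normal equations, and the residual is assumed to be the Euclidean pixel distance between each drawn vertex and its fitted position.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate vertex set (for example, all coplanar)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def max_affine_residual(pts3d, pts2d):
    """Fit u = a*x + b*y + c*z + d (and likewise v); return the worst residual in px."""
    rows = [[x, y, z, 1.0] for x, y, z in pts3d]
    # Normal equations: (R^T R) w = R^T t, one solve per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    coeffs = []
    for k in range(2):
        atb = [sum(r[i] * p[k] for r, p in zip(rows, pts2d)) for i in range(4)]
        coeffs.append(solve_linear(ata, atb))
    worst = 0.0
    for r, p in zip(rows, pts2d):
        du = sum(c * x for c, x in zip(coeffs[0], r)) - p[0]
        dv = sum(c * x for c, x in zip(coeffs[1], r)) - p[1]
        # Assumption: residual is the Euclidean distance in pixels.
        worst = max(worst, (du * du + dv * dv) ** 0.5)
    return worst


# Illustrative data: a 100-unit cube drawn with an exact affine map, then with one vertex moved.
cube = [(x, y, z) for x in (0.0, 100.0) for y in (0.0, 100.0) for z in (0.0, 100.0)]
flat = [(x + 0.35 * y, 300.0 - z - 0.35 * y) for x, y, z in cube]
bent = list(flat)
bent[0] = (bent[0][0] + 40.0, bent[0][1])  # shift one vertex 40 px to the right
print(max_affine_residual(cube, flat) < 5.0)   # consistent drawing passes
print(max_affine_residual(cube, bent) < 5.0)   # deformed drawing fails
```

One property of this formulation is worth keeping in mind: an affine 3D→2D map has 8 free parameters, so four non-coplanar vertices can always be fitted exactly. Under this sketch the residual only becomes informative for figures with five or more vertices, such as the boxes and the cube.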

Case-by-Case Comparison

2012 Q40 — Regular Tetrahedron

v4: 3/3 v3: 3/3

ABCD is a regular tetrahedron with side 2x. E is midpoint of BC. Find tan∠AED.

Figures (not reproduced): ground truth (2012 Q40); v4 projected (vertices A, B, C, D); v3 hand-drawn (vertices A, B, C, D).
v4 Geometric Checks
  • ✓ vertices: All 4 present
  • ✓ edges: 6 lines ≥ 6 edges
  • ✓ affine: max residual 0.0px
v3 Geometric Checks
  • ✓ vertices: All 4 present
  • ✓ edges: 6 lines ≥ 6 edges
  • ✓ affine: max residual 0.0px
All edges equal → unique 3D shape up to rotation. Projection preserves topology and above/below.
Human eval: Good.

2014 Q40 — Perpendicular to Plane

v4: 3/3 v3: 3/3

AB⊥plane BCD. BC=8m, BD=15m, CD=17m, AB=8m. Find ∠AEB.

Figures (not reproduced): ground truth (2014 Q40); v4 projected (labels 8 m, 8 m, 15 m, 17 m; vertices A, B, C, D); v3 hand-drawn (same labels and vertices).
v4 Geometric Checks
  • ✓ vertices: All 4 present
  • ✓ edges: 6 lines ≥ 6 edges
  • ✓ affine: max residual 0.0px
v3 Geometric Checks
  • ✓ vertices: All 4 present
  • ✓ edges: 6 lines ≥ 6 edges
  • ✓ affine: max residual 0.0px
Four edge lengths plus perpendicularity → unique 3D figure. ∠DBC = 90°, since 8² + 15² = 17² (Pythagorean triple 8-15-17).
Human eval: Good.
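
For this case, the text-to-coordinates step might look like the sketch below. The choice of frame (B at the origin, BC along x, BD along y, BA along z) is an assumption made for illustration; any rotation of it satisfies the same constraints.

```python
import math

# Assumed frame: B at the origin, BC along x, BD along y, BA along z.
B = (0.0, 0.0, 0.0)
C = (8.0, 0.0, 0.0)    # BC = 8 m
D = (0.0, 15.0, 0.0)   # BD = 15 m, perpendicular to BC
A = (0.0, 0.0, 8.0)    # AB = 8 m, perpendicular to plane BCD


def sub(p, q):
    """Vector from q to p."""
    return tuple(pi - qi for pi, qi in zip(p, q))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Check every constraint stated in the question text.
print(math.dist(C, D))               # should match CD = 17 m
print(dot(sub(A, B), sub(C, B)))     # 0 means AB is perpendicular to BC
print(dot(sub(A, B), sub(D, B)))     # 0 means AB is perpendicular to BD
```

Checking each stated length and right angle against the chosen coordinates is a cheap guard on the manual extraction step before anything is projected.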

2016 Q39 — Rectangular Box

v4: 6/6 v3: 6/6

ABCDEFGH rectangular box. AB=16cm, BC=12cm, height=15cm. P midpoint of AC. Find ∠PFQ.

Figures (not reproduced): ground truth (2016 Q39); v4 projected (labels 16 cm, 12 cm, 15 cm; vertices A–H); v3 hand-drawn (labels 16 cm, 12 cm, 15 cm, 9 cm; vertices A–H plus P and Q).
v4 Geometric Checks
  • ✓ vertices: All 8 present
  • ✓ edges: 12 lines ≥ 12 edges
  • ✓ parallel-x: max dev 0.0%
  • ✓ parallel-y: max dev 0.0%
  • ✓ parallel-z: max dev 0.0%
  • ✓ affine: max residual 0.0px
v3 Geometric Checks
  • ✓ vertices: All 8 present
  • ✓ edges: 12 lines ≥ 12 edges
  • ✓ parallel-x: max dev 0.0%
  • ✓ parallel-y: max dev 3.5%
  • ✓ parallel-z: max dev 2.3%
  • ✓ affine: max residual 2.5px
3 dimensions → unique box. Parallel edges verified equal in projection.
Human eval: P (midpoint of AC) is missing from the v4 diagram. Q can only be located from the original diagram; the text alone does not define it. This case shows the limit of text-deterministic generation: even when the box is fully specified, auxiliary points like Q may require the diagram to disambiguate.
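
The parallel-x/y/z figures reported for this case could be computed along the following lines. The report does not define "max dev", so this sketch assumes it is the largest relative deviation from the mean 2D length within one group of edges that are parallel and equal in 3D; the function name and sample coordinates are illustrative.

```python
import math


def max_parallel_deviation(edges2d):
    """edges2d: [((x1, y1), (x2, y2)), ...] for edges parallel and equal in 3D.

    Returns the largest relative deviation from the mean 2D length, in percent.
    Assumption: this is how "max dev" is defined; the report does not say.
    """
    lengths = [math.dist(p, q) for p, q in edges2d]
    mean = sum(lengths) / len(lengths)
    return max(abs(length - mean) for length in lengths) / mean * 100.0


# Four x-parallel box edges as drawn in 2D (illustrative pixel coordinates).
exact = [((0, 0), (160, 0)), ((0, 100), (160, 100)),
         ((50, 50), (210, 50)), ((50, 150), (210, 150))]
# Same edges, but the last one is drawn 10 px too long.
sloppy = exact[:3] + [((50, 150), (220, 150))]

print(max_parallel_deviation(exact))
print(max_parallel_deviation(sloppy))
```

A projected drawing yields zero deviation for every group, while a hand-drawn one typically shows a few percent, which matches the pattern in the v3 and v4 columns above.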

2017 Q39 — Vertical Pole on Ground

v4: 3/3 v3: 3/3

AD vertical pole on ground BCD. BD⊥DC. AB=25m, AD=15m, BC=29m, CD=21m.

Figures (not reproduced): ground truth (2017 Q39); v4 projected (labels 15 m, 25 m, 29 m, 21 m, 20 m; vertices A, B, C, D); v3 hand-drawn (labels 15 m, 25 m, 21 m, 29 m; vertices A, B, C, D).
v4 Geometric Checks
  • ✓ vertices: All 4 present
  • ✓ edges: 6 lines ≥ 6 edges
  • ✓ affine: max residual 0.0px
v3 Geometric Checks
  • ✓ vertices: All 4 present
  • ✓ edges: 6 lines ≥ 6 edges
  • ✓ affine: max residual 0.0px
BD = 20 m, from BD² = AB² − AD² = 25² − 15² = 400. This is consistent with BD⊥DC, since 20² + 21² = 29² = BC². Three mutually perpendicular edges meet at D: AD, BD, DC.
Human eval: Good.

2018 Q41 — Rectangular Block with Point X

v4: 6/6 v3: 6/6

ABCDEFGH rectangular block. AB=12cm, BC=8cm. X on DE with DX=9cm.

Figures (not reproduced): ground truth (2018 Q41); v4 projected (labels 12 cm, 8 cm, 9; vertices A–H and X); v3 hand-drawn (labels 12 cm, 8 cm; vertices A–H and X).
v4 Geometric Checks
  • ✓ vertices: All 8 present
  • ✓ edges: 13 lines ≥ 12 edges
  • ✓ parallel-x: max dev 0.0%
  • ✓ parallel-y: max dev 0.0%
  • ✓ parallel-z: max dev 0.0%
  • ✓ affine: max residual 0.0px
v3 Geometric Checks
  • ✓ vertices: All 8 present
  • ✓ edges: 14 lines ≥ 12 edges
  • ✓ parallel-x: max dev 0.0%
  • ✓ parallel-y: max dev 0.0%
  • ✓ parallel-z: max dev 0.0%
  • ✓ affine: max residual 0.0px
3 box dimensions given. X is placed 9/13 of the way along the vertical edge DE, measured from D (DX = 9 cm on a 13 cm edge).
Human eval: There is no Y in the exam question; the question stem's reference to ∠YBX was likely misread from the competency map. The diagram itself is good.

2022 Q40 — Cube (Cross-Sections)

v4: 6/6 v3: 2/6

ABCDEFGH cube. Find dihedral angles α (△AFG vs △AFH) and β (△AFH vs △FGH).

Figures (not reproduced): ground truth (2022 Q40); v4 projected (vertices A–H); v3 hand-drawn (vertices A–H).
v4 Geometric Checks
  • ✓ vertices: All 8 present
  • ✓ edges: 15 lines ≥ 12 edges
  • ✓ parallel-x: max dev 0.0%
  • ✓ parallel-y: max dev 0.0%
  • ✓ parallel-z: max dev 0.0%
  • ✓ affine: max residual 0.0px
v3 Geometric Checks
  • ✓ vertices: All 8 present
  • ✓ edges: 18 lines ≥ 12 edges
  • ✗ parallel-x: max dev 38.3%
  • ✗ parallel-y: max dev 52.4%
  • ✗ parallel-z: max dev 58.0%
  • ✗ affine: max residual 101.8px
All edges equal. v3 had 101.8px affine residual ("completely deformed"). v4: 0.0px.
Human eval: A bit unsure what exactly needs to be drawn beyond the cube itself. The cross-section triangles (AFG, AFH, FGH) are shown as construction lines but the question-specific rendering needs further clarification.

2024 Q40 — Tetrahedron on Ground

v4: 3/3 v3: 3/3

PQR on ground, S above Q (SQ⊥ground). ∠PQR=90°, ∠QPS=30°, ∠QRS=45°. Find ∠PRS.

Figures (not reproduced): ground truth (2024 Q40); v4 projected (vertices P, Q, R, S); v3 hand-drawn (vertices P, Q, R, S).
v4 Geometric Checks
  • ✓ vertices: All 4 present
  • ✓ edges: 6 lines ≥ 6 edges
  • ✓ affine: max residual 0.0px
v3 Geometric Checks
  • ✓ vertices: All 4 present
  • ✓ edges: 6 lines ≥ 6 edges
  • ✓ affine: max residual 0.0px
Angles uniquely determine shape up to scale. S vertically above Q verified.
Human eval: An earlier v4 draft (v4-pre) drew QR and QP as solid lines; they should be dotted, because they are ground-plane edges behind the figure. Fixed.
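
The "unique up to scale" claim for this case can be illustrated with a short sketch. It assumes a frame with Q at the origin, P on the x-axis, R on the y-axis, and S on the z-axis, with the free scale fixed by choosing SQ = h; the helper names are hypothetical.

```python
import math


def tetra_coords(h=1.0):
    """Coordinates satisfying the 2024 Q40 constraints for a chosen height SQ = h."""
    pq = h / math.tan(math.radians(30))  # tan(angle QPS) = SQ / PQ
    qr = h / math.tan(math.radians(45))  # tan(angle QRS) = SQ / QR
    return {"P": (pq, 0.0, 0.0), "Q": (0.0, 0.0, 0.0),
            "R": (0.0, qr, 0.0), "S": (0.0, 0.0, h)}


def angle_deg(vertex, a, b):
    """Angle at `vertex` between the rays towards a and b, in degrees."""
    u = [ai - vi for ai, vi in zip(a, vertex)]
    w = [bi - vi for bi, vi in zip(b, vertex)]
    cos_angle = sum(x * y for x, y in zip(u, w)) / (math.dist(a, vertex) * math.dist(b, vertex))
    return math.degrees(math.acos(cos_angle))


for h in (1.0, 5.0):
    v = tetra_coords(h)
    # The stated angles hold at every scale, and so does the angle being asked for.
    print(h,
          round(angle_deg(v["P"], v["Q"], v["S"]), 6),
          round(angle_deg(v["R"], v["Q"], v["S"]), 6),
          round(angle_deg(v["R"], v["P"], v["S"]), 6))
```

Because every quantity in the question is an angle, the choice of h changes only the size of the drawing, not its shape, so any convenient value can be used for projection.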

Verdict

The algorithmic projection pipeline eliminates the core problem identified in v3: the inability to monitor whether diagrams are drawn correctly. Because the 3D coordinates are derived from the text constraints and then passed through a deterministic oblique projection, the geometry is correct by construction.

v4 achieves 30/30 geometric checks passing (vs 26/30 for v3). The v3 cube had 101.8px affine residual and 38-58% parallel-edge deviations — v4 has 0.0px and 0.0% across all cases.

Key architectural insight: generation and validation now use independent code paths (projection engine vs affine-consistency checker), making the eval loop honest. The ground truth images from past papers serve as visual sanity checks, not the primary evaluation criterion.

Pipeline Architecture

Generation path: Text description → 3D vertex coordinates (manual extraction) → oblique projection (cabinet, 45°, depth scale 0.5) → SVG elements (lines, labels, right-angle marks)
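
A minimal sketch of the projection step in the generation path, using the stated cabinet parameters (45°, depth scale 0.5). The axis convention (x to the right, y receding into the page, z up), the function names, and the SVG scale and origin are assumptions for illustration and are not taken from the pipeline's code.

```python
import math


def cabinet_project(point, angle_deg=45.0, depth_scale=0.5):
    """Cabinet oblique projection of (x, y, z) to 2D.

    Assumed axes: x runs right, y recedes into the page, z points up.
    """
    x, y, z = point
    a = math.radians(angle_deg)
    return (x + depth_scale * y * math.cos(a),
            z + depth_scale * y * math.sin(a))


def to_svg_xy(uv, scale=10.0, origin=(50.0, 250.0)):
    """Map projected coordinates to SVG pixels; SVG y grows downward, so v is flipped."""
    u, v = uv
    return (origin[0] + scale * u, origin[1] - scale * v)


def svg_line(p3d, q3d):
    """Emit one SVG <line> element for the edge between two 3D points."""
    x1, y1 = to_svg_xy(cabinet_project(p3d))
    x2, y2 = to_svg_xy(cabinet_project(q3d))
    return '<line x1="%.1f" y1="%.1f" x2="%.1f" y2="%.1f" stroke="black"/>' % (x1, y1, x2, y2)


# Example: one 16-unit edge along x, as in the 16 x 12 x 15 box of 2016 Q39.
print(svg_line((0, 0, 0), (16, 0, 0)))
# prints <line x1="50.0" y1="250.0" x2="210.0" y2="250.0" stroke="black"/>
```

Since the projection is linear in x, y, and z plus a fixed offset, every drawing it produces is an exact affine image of the 3D figure, which is why the v4 residuals are 0.0 px.
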
Validation path: Parse SVG line endpoints → cluster into vertex positions → match to labels → fit affine 3D→2D map → check residuals + parallel-edge invariants + topology
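
The endpoint-clustering step of the validation path might be implemented roughly as below. This is a sketch under assumptions: a simple greedy merge against running cluster centroids, with a hypothetical tolerance of 2 px; the validator's real clustering method and threshold are not stated in this report.

```python
import math


def cluster_endpoints(segments, tol=2.0):
    """Greedily merge line endpoints within `tol` px of a cluster centroid.

    segments: [((x1, y1), (x2, y2)), ...] as parsed from SVG <line> elements.
    Returns one averaged (x, y) position per detected vertex.
    """
    clusters = []  # each cluster is a list of nearby endpoints
    for segment in segments:
        for pt in segment:
            for members in clusters:
                cx = sum(p[0] for p in members) / len(members)
                cy = sum(p[1] for p in members) / len(members)
                if math.dist(pt, (cx, cy)) <= tol:
                    members.append(pt)
                    break
            else:
                clusters.append([pt])  # no cluster close enough: start a new vertex
    return [(sum(p[0] for p in m) / len(m), sum(p[1] for p in m) / len(m))
            for m in clusters]


# A hand-drawn triangle whose endpoints do not meet exactly.
triangle = [((0.0, 0.0), (100.0, 0.0)),
            ((100.5, 0.2), (50.0, 80.0)),
            ((50.3, 79.8), (0.1, -0.1))]
vertices = cluster_endpoints(triangle)
print(len(vertices))
```

The recovered vertex positions can then be matched to the nearest text labels before the affine fit and parallel-edge checks are run.
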
Limitation: 3D coordinate extraction from text is still manual. The next step is automating this — parsing text constraints into a coordinate system. This is feasible for text-deterministic questions but requires a constraint solver.