MathemaTikZ: A Dataset and Benchmark for Mathematical Diagram Generation

Academic Paper

Jun 1

Written By Jeremy Koren

Academic article introducing MathemaTikZ, a dataset and benchmark for mathematical diagram generation.

Purpose/Abstract

Diagrams play a fundamental role in mathematics education, serving both as essential components of mathematical problems and as powerful scaffolding tools to support student comprehension.

While AI tools have shown promise in supporting teachers with lesson preparation, especially with text-based mathematical content, they still struggle with reliably generating visual diagrams.

This work makes two main contributions. (1) It introduces MathemaTikZ, a dataset derived from the Illustrative Mathematics curriculum, comprising 3,793 mathematical diagrams paired with their natural language descriptions, problem contexts, and TikZ implementations. These diagrams span the full range used in the K-12 math curriculum.

(2) It conducts comprehensive baseline evaluations using state-of-the-art language models, including GPT-4o, Claude 3.5 Sonnet, and Gemini 2.0 Flash, to assess current capabilities in mathematical diagram generation.

The findings reveal that even the best-performing models achieve only a 73.9% success rate in accurately generating mathematical diagrams, with performance varying significantly across different types of visualizations.
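This summary does not describe how the paper scores a generated diagram, so the following is only a minimal sketch of how an overall success rate and a per-type breakdown could be tallied once each output has been judged correct or incorrect. The `DiagramResult` class, its field names, and the `success_rates` helper are hypothetical and are not taken from the MathemaTikZ release.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DiagramResult:
    """One evaluated example. All field names here are hypothetical."""
    diagram_type: str    # illustrative category label, e.g. "number line"
    description: str     # natural language description of the target diagram
    generated_tikz: str  # TikZ code produced by a model
    is_correct: bool     # whether the rendered output was judged accurate


def success_rates(results):
    """Return (overall_rate, {diagram_type: rate}) as fractions in [0, 1]."""
    if not results:
        raise ValueError("results must not be empty")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in results:
        totals[r.diagram_type] += 1
        correct[r.diagram_type] += int(r.is_correct)
    overall = sum(correct.values()) / sum(totals.values())
    per_type = {t: correct[t] / totals[t] for t in totals}
    return overall, per_type
```

Calling `success_rates` on a list of judged results returns the aggregate fraction alongside a dictionary keyed by diagram type, which makes it easy to see where performance varies across kinds of visualizations.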

Through detailed error analysis, the work identifies four key challenge areas that future work should address: spatial reasoning and element placement, adherence to geometric constraints, pedagogical knowledge of mathematical diagrams, and preservation of mathematical relationships.

The results establish baselines for mathematical diagram generation and highlight critical areas for improvement in making AI tools more effective for mathematics education.

Citation

Malik, R., Hao, R. L., Kacholia, R., & Demszky, D. (2025). MathemaTikZ: A dataset and benchmark for mathematical diagram generation. In Proceedings of the Twelfth ACM Conference on Learning @ Scale (L@S '25) (pp. 95–104). Association for Computing Machinery. https://doi.org/10.1145/3698205.3729558

Areas researched: Platform/Program, AI