Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

March 3, 2026
Authors: Dadi Guo, Yuejin Xie, Qingyu Liu, Jiayu Liu, Zhiyuan Fan, Qihan Ren, Shuai Shao, Tianyi Zhou, Dongrui Liu, Yi R. Fung
cs.AI

Abstract

As large language models (LLMs) advance their mathematical capabilities toward the IMO level, the scarcity of challenging, high-quality problems for training and evaluation has become a significant bottleneck. Simultaneously, recent code agents have demonstrated sophisticated skills in agentic coding and reasoning, suggesting that code execution can serve as a scalable environment for mathematical experimentation. In this paper, we investigate the potential of code agents to autonomously evolve existing math problems into more complex variations. We introduce a multi-agent framework designed to perform problem evolution while validating the solvability and increased difficulty of the generated problems. Our experiments demonstrate that, given sufficient test-time exploration, code agents can synthesize new, solvable problems that are structurally distinct from and more challenging than the originals. This work provides empirical evidence that code-driven agents can serve as a viable mechanism for synthesizing high-difficulty mathematical reasoning problems within scalable computational environments. Our data is available at https://github.com/TarferSoul/Code2Math.