Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

Abstract

As the mathematical capabilities of large language models (LLMs) approach the International Mathematical Olympiad (IMO) level, the scarcity of challenging, high-quality problems for training and evaluation has become a significant bottleneck. Meanwhile, recent code agents have demonstrated sophisticated skills in agentic coding and reasoning, suggesting that code execution can serve as a scalable environment for mathematical experimentation. In this paper, we investigate whether code agents can autonomously evolve existing math problems into more complex variations. We introduce a multi-agent framework that performs problem evolution while validating both the solvability and the increased difficulty of the generated problems. Our experiments show that, given sufficient test-time exploration, code agents can synthesize new, solvable problems that are structurally distinct from and more challenging than the originals. This work provides empirical evidence that code-driven agents are a viable mechanism for synthesizing high-difficulty mathematical reasoning problems within scalable computational environments. Our data is available at https://github.com/TarferSoul/Code2Math.
