TPTPエコシステムにおけるLLM数学推論のための飽和駆動型データセット生成

要旨

高品質で論理的に整合性のあるデータの不足は、大規模言語モデル（LLMs）の数学的推論能力を向上させる上で重大なボトルネックとなっています。本研究は、数十年にわたる自動定理証明の研究をスケーラブルなデータエンジンに変換することで、この課題に取り組みます。エラーの発生しやすいLLMsやLeanやIsabelleのような複雑な証明支援システムの構文に依存するのではなく、我々のフレームワークはE-proverの飽和機能をTPTP公理ライブラリに適用し、大規模で保証された有効性を持つ定理のコーパスを導出します。我々のパイプラインは原理的でシンプルです：公理を飽和させ、「興味深い」定理をフィルタリングし、タスクを生成します。LLMsをループに含めないことで、構成的に事実誤認を排除します。この純粋に記号的なデータは、難易度が制御された3つの課題に変換されます：含意検証、前提選択、証明再構築です。最先端モデルに対するゼロショット実験では、深い構造的推論を必要とするタスクにおいて性能が著しく低下するという明確な弱点が明らかになりました。我々のフレームワークは、このギャップを測定する診断ツールと、それを解決するためのスケーラブルな記号的トレーニングデータの両方を提供します。コードとデータは公開されています。 https://github.com/sileod/reasoning_core https://hf.co/datasets/reasoning-core/rc1

English

The scarcity of high-quality, logically sound data is a critical bottleneck for advancing the mathematical reasoning of Large Language Models (LLMs). Our work confronts this challenge by turning decades of automated theorem proving research into a scalable data engine. Rather than relying on error-prone LLMs or complex proof-assistant syntax like Lean and Isabelle, our framework leverages E-prover's saturation capabilities on the vast TPTP axiom library to derive a massive, guaranteed-valid corpus of theorems. Our pipeline is principled and simple: saturate axioms, filter for "interesting" theorems, and generate tasks. With no LLMs in the loop, we eliminate factual errors by construction. This purely symbolic data is then transformed into three difficulty-controlled challenges: entailment verification, premise selection, and proof reconstruction. Our zero-shot experiments on frontier models reveal a clear weakness: performance collapses on tasks requiring deep, structural reasoning. Our framework provides both the diagnostic tool to measure this gap and a scalable source of symbolic training data to address it. We make the code and data publicly available. https://github.com/sileod/reasoning_core https://hf.co/datasets/reasoning-core/rc1

TPTPエコシステムにおけるLLM数学推論のための飽和駆動型データセット生成

Saturation-Driven Dataset Generation for LLM Mathematical Reasoning in the TPTP Ecosystem

要旨

Support