

Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning

September 22, 2025
Authors: Valentin Lacombe, Valentin Quesnel, Damien Sileo
cs.AI

Abstract

We introduce Reasoning Core, a new scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR), designed to advance foundational symbolic reasoning in Large Language Models (LLMs). Unlike existing benchmarks that focus on games or isolated puzzles, Reasoning Core procedurally generates problems across core formal domains, including PDDL planning, first-order logic, context-free grammar parsing, causal reasoning, and system equation solving. The environment is built on key design principles of high-generality problem distributions, verification via external tools, and continuous difficulty control, which together provide a virtually infinite supply of novel training instances. Initial zero-shot evaluations with frontier LLMs confirm the difficulty of Reasoning Core's tasks, positioning it as a promising resource to improve the reasoning capabilities of future models.
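To make the RLVR setup concrete, below is a minimal, hypothetical sketch of the pattern the abstract describes: procedurally generate a symbolic problem with a tunable difficulty parameter, and compute a binary reward by programmatically verifying a proposed answer. The function and variable names are illustrative only and do not reflect Reasoning Core's actual API; the real environment spans richer domains (PDDL planning, first-order logic, CFG parsing, causal reasoning, equation solving) and verifies answers with external tools rather than the toy built-in checker used here.

```python
import random

# Toy analogue of an RLVR task: a random propositional formula whose nesting
# depth acts as the difficulty knob, plus a verifier that yields a 0/1 reward.

VARS = ["p", "q", "r", "s"]

def gen_formula(depth: int, rng: random.Random):
    """Return a nested formula as tuples: ('var', name), ('not', f), ('and'/'or', f, g)."""
    if depth == 0:
        return ("var", rng.choice(VARS))
    op = rng.choice(["not", "and", "or"])
    if op == "not":
        return ("not", gen_formula(depth - 1, rng))
    return (op, gen_formula(depth - 1, rng), gen_formula(depth - 1, rng))

def evaluate(formula, assignment):
    """Evaluate the formula under a {variable: bool} assignment."""
    kind = formula[0]
    if kind == "var":
        return assignment[formula[1]]
    if kind == "not":
        return not evaluate(formula[1], assignment)
    if kind == "and":
        return evaluate(formula[1], assignment) and evaluate(formula[2], assignment)
    return evaluate(formula[1], assignment) or evaluate(formula[2], assignment)

def reward(formula, proposed_assignment):
    """Verifiable reward: 1.0 iff the proposed assignment satisfies the formula."""
    return 1.0 if evaluate(formula, proposed_assignment) else 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    task = gen_formula(depth=3, rng=rng)                   # difficulty grows with depth
    guess = {v: rng.choice([True, False]) for v in VARS}   # stand-in for a model's answer
    print("task:", task)
    print("reward:", reward(task, guess))
```

Because instances are sampled rather than drawn from a fixed dataset, raising the difficulty parameter yields a practically unbounded stream of fresh, automatically gradable training problems, which is the core property the abstract attributes to the environment.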