Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking
December 30, 2025
Authors: Meiqi Chen, Fandong Meng, Jie Zhou
cs.AI
Abstract
Complex reasoning problems often involve implicit spatial, geometric, and structural relationships that are not explicitly encoded in text. While recent reasoning models have achieved strong performance across many domains, purely text-based reasoning struggles to represent global structural constraints in complex settings. In this paper, we introduce FIGR, which integrates active visual thinking into multi-turn reasoning via end-to-end reinforcement learning. FIGR externalizes intermediate structural hypotheses by constructing visual representations during problem solving. By adaptively regulating when and how visual reasoning should be invoked, FIGR enables more stable and coherent reasoning over global structural properties that are difficult to capture from text alone. Experiments on challenging mathematical reasoning benchmarks demonstrate that FIGR outperforms strong text-only chain-of-thought baselines. In particular, FIGR improves the base model by 13.12% on AIME 2025 and 11.00% on BeyondAIME, highlighting the effectiveness of figure-guided multimodal reasoning in enhancing the stability and reliability of complex reasoning.