Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking
December 30, 2025
Authors: Meiqi Chen, Fandong Meng, Jie Zhou
cs.AI
Abstract
Complex reasoning problems often involve implicit spatial, geometric, and structural relationships that are not explicitly encoded in text. While recent reasoning models have achieved strong performance across many domains, purely text-based reasoning struggles to represent global structural constraints in complex settings. In this paper, we introduce FIGR, which integrates active visual thinking into multi-turn reasoning via end-to-end reinforcement learning. FIGR externalizes intermediate structural hypotheses by constructing visual representations during problem solving. By adaptively regulating when and how visual reasoning should be invoked, FIGR enables more stable and coherent reasoning over global structural properties that are difficult to capture from text alone. Experiments on challenging mathematical reasoning benchmarks demonstrate that FIGR outperforms strong text-only chain-of-thought baselines. In particular, FIGR improves the base model by 13.12% on AIME 2025 and 11.00% on BeyondAIME, highlighting the effectiveness of figure-guided multimodal reasoning in enhancing the stability and reliability of complex reasoning.