

Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization

November 27, 2025
Authors: Yifan Du, Kun Zhou, Yingqian Min, Yue Ling, Wayne Xin Zhao, Youbin Wu
cs.AI

Abstract

We study how different Chain-of-Thought (CoT) designs affect the acquisition of the generalizable visual reasoning ability in vision-language models (VLMs). While CoT data, especially long or visual CoT such as "think with image", has been widely used to supervise intermediate reasoning, it remains unclear why specific CoT designs help and which ones truly support generalizable reasoning. To systematically evaluate this, we focus on a controlled maze-solving benchmark where reasoning rules are fully visual, difficulty can be tuned by grid size, and all the intermediate steps can be automatically generated. Using Qwen2.5-VL-7B under a standard SFT-then-RL pipeline, we compare three representative CoT formats: Language CoT, Grounding CoT (with spatial coordinate trajectories), and Visual CoT (with image manipulations). Our experiments reveal that visual and longer CoT mainly accelerate convergence but do not lift the final performance ceiling; concise CoT containing only essential grounding steps outperforms longer traces; and, strikingly, CoT retaining only the minimal grounding results generalizes best across different maze sizes. We further validate these insights on other vision-centric tasks. These findings highlight a "short is long" effect and provide practical guidance for constructing more generalizable SFT datasets for visual reasoning.