
Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing

October 30, 2025
作者: Xin Guo, Zhiheng Xi, Yiwen Ding, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang
cs.AI

Abstract

Self-improvement has emerged as a mainstream paradigm for advancing the reasoning capabilities of large vision-language models (LVLMs), where models explore and learn from successful trajectories iteratively. However, we identify a critical issue during this process: the model excels at generating high-quality trajectories for simple queries (i.e., head data) but struggles with more complex ones (i.e., tail data). This leads to an imbalanced optimization that drives the model to prioritize simple reasoning skills, while hindering its ability to tackle more complex reasoning tasks. Over iterations, this imbalance becomes increasingly pronounced--a dynamic we term the "Matthew effect"--which ultimately hinders further model improvement and leads to performance bottlenecks. To counteract this challenge, we introduce four efficient strategies from two perspectives: distribution-reshaping and trajectory-resampling, to achieve head-tail re-balancing during the exploration-and-learning self-improvement process. Extensive experiments on Qwen2-VL-7B-Instruct and InternVL2.5-4B models across visual reasoning tasks demonstrate that our methods consistently improve visual reasoning capabilities, outperforming vanilla self-improvement by 3.86 points on average.
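To make the re-balancing idea concrete, below is a minimal, hypothetical Python sketch (not the authors' implementation): it caps the successful trajectories contributed by head queries (high solve rate) and over-samples trajectories from tail queries (low solve rate) before the next learning round. The `Query` structure, thresholds, and boost factor are illustrative assumptions only.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Query:
    qid: str
    solve_rate: float  # fraction of sampled rollouts that solved the query
    trajectories: list = field(default_factory=list)  # successful rollouts collected so far

def rebalance_training_pool(queries, head_threshold=0.7, head_keep=1, tail_boost=4, seed=0):
    """Re-balance successful trajectories between head (easy) and tail (hard) queries.

    Head queries (solve_rate >= head_threshold) contribute at most `head_keep`
    trajectories each; tail queries are over-sampled with replacement by
    `tail_boost`, so harder reasoning patterns are not drowned out.
    """
    rng = random.Random(seed)
    pool = []
    for q in queries:
        if not q.trajectories:
            continue  # no successful rollout yet; would need more exploration budget instead
        if q.solve_rate >= head_threshold:
            pool.extend(rng.sample(q.trajectories, min(head_keep, len(q.trajectories))))
        else:
            pool.extend(rng.choices(q.trajectories, k=tail_boost))
    rng.shuffle(pool)
    return pool

# Toy usage: two easy queries dominate the raw collection; after re-balancing,
# the single hard query's trajectory is weighted up in the next training pool.
queries = [
    Query("easy-1", 0.9, [f"traj-e1-{i}" for i in range(8)]),
    Query("easy-2", 0.8, [f"traj-e2-{i}" for i in range(6)]),
    Query("hard-1", 0.2, ["traj-h1-0"]),
]
print(rebalance_training_pool(queries))
```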