Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing
October 30, 2025
Authors: Xin Guo, Zhiheng Xi, Yiwen Ding, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang
cs.AI
Abstract
Self-improvement has emerged as a mainstream paradigm for advancing the
reasoning capabilities of large vision-language models (LVLMs), where models
explore and learn from successful trajectories iteratively. However, we
identify a critical issue during this process: the model excels at generating
high-quality trajectories for simple queries (i.e., head data) but struggles
with more complex ones (i.e., tail data). This leads to an imbalanced
optimization that drives the model to prioritize simple reasoning skills, while
hindering its ability to tackle more complex reasoning tasks. Over iterations,
this imbalance becomes increasingly pronounced--a dynamic we term the "Matthew
effect"--which ultimately hinders further model improvement and leads to
performance bottlenecks. To counteract this challenge, we introduce four
efficient strategies from two perspectives, distribution-reshaping and
trajectory-resampling, to achieve head-tail re-balancing during the
exploration-and-learning self-improvement process. Extensive experiments on
Qwen2-VL-7B-Instruct and InternVL2.5-4B models across visual reasoning tasks
demonstrate that our methods consistently improve visual reasoning
capabilities, outperforming vanilla self-improvement by 3.86 points on average.