

Improving Large Vision and Language Models by Learning from a Panel of Peers

September 1, 2025
作者: Jefferson Hernandez, Jing Shi, Simon Jenni, Vicente Ordonez, Kushal Kafle
cs.AI

Abstract

Traditional alignment methods for Large Vision and Language Models (LVLMs) primarily rely on human-curated preference data. Human-generated preference data is costly; machine-generated preference data is limited in quality; and self-supervised preference data often introduces hallucinations. To overcome these limitations, we propose a novel Panel-of-Peers learning framework inspired by collaborative learning among humans. This approach leverages a panel of LVLMs, each evaluating and learning from their collective outputs through an iterative self-improvement process. By simulating a peer review system, our models generate, assess, and refine outputs in response to a curated set of prompts, mimicking a classroom learning environment. We demonstrate that this methodology enhances model performance without requiring extensive human-labeled datasets. Our experiments show significant improvement across multiple benchmarks, demonstrating the potential of peer evaluations as a scalable alternative to self-supervised alignment. Notably, we show that Panel-of-Peers increases the average score on fifteen benchmarks from 48% to 57%.
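The generate/assess/refine loop described in the abstract can be sketched in miniature. The sketch below is an illustrative toy, not the paper's implementation: the function names, the (chosen, rejected) pairing, and the length-plus-noise judge are all assumptions standing in for real LVLM generation, peer scoring, and preference-based fine-tuning.

```python
import random

def peer_score(judge, answer, rng):
    # Placeholder judge: real peer review would query an LVLM; here we
    # simply favor longer answers plus a bit of noise (an assumption).
    return len(answer) + rng.random()

def panel_of_peers(models, prompts, rounds=2, seed=0):
    """Toy sketch of a Panel-of-Peers loop: every panel member answers
    each prompt, all peers score every answer, and the best/worst
    answers per prompt form (chosen, rejected) preference pairs."""
    rng = random.Random(seed)
    preference_data = []
    for _ in range(rounds):
        for prompt in prompts:
            # 1. Generation: every panel member answers the prompt.
            answers = [model(prompt) for model in models]
            # 2. Peer review: each peer scores every answer; average them.
            scores = [
                sum(peer_score(judge, ans, rng) for judge in models) / len(models)
                for ans in answers
            ]
            # 3. Selection: keep the top- and bottom-scored answers as a
            #    preference pair for a later alignment-training step.
            chosen = answers[scores.index(max(scores))]
            rejected = answers[scores.index(min(scores))]
            preference_data.append((prompt, chosen, rejected))
        # A real system would now fine-tune each panel member on
        # preference_data before starting the next round.
    return preference_data
```

In the full method the "models" are LVLMs and each iteration updates them on the collected pairs, so the panel improves between rounds; this sketch only shows the data-collection side of that loop.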