
Improving Large Vision and Language Models by Learning from a Panel of Peers

September 1, 2025
作者: Jefferson Hernandez, Jing Shi, Simon Jenni, Vicente Ordonez, Kushal Kafle
cs.AI

Abstract

Traditional alignment methods for Large Vision and Language Models (LVLMs) primarily rely on human-curated preference data. Human-generated preference data is costly; machine-generated preference data is limited in quality; and self-supervised preference data often introduces hallucinations. To overcome these limitations, we propose a novel Panel-of-Peers learning framework inspired by collaborative learning among humans. This approach leverages a panel of LVLMs, each evaluating and learning from their collective outputs through an iterative self-improvement process. By simulating a peer review system, our models generate, assess, and refine outputs in response to a curated set of prompts, mimicking a classroom learning environment. We demonstrate that this methodology enhances model performance without requiring extensive human-labeled datasets. Our experiments show significant improvement across multiple benchmarks, demonstrating the potential of peer evaluations as a scalable alternative to self-supervised alignment. Notably, we show that Panel-of-Peers increases the average score on fifteen benchmarks from 48% to 57%.
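The generate → assess → select loop the abstract describes can be sketched as a single panel round. This is a minimal illustrative sketch, not the paper's implementation: the function name, the callable "models", and the scoring stub are all assumptions introduced here for clarity.

```python
def panel_of_peers_round(prompt, models, scorer):
    """One generate -> assess -> select step for a hypothetical panel of peers.

    `models` are callables standing in for LVLMs; `scorer` stands in for a
    peer model judging a candidate answer. Both are illustrative stubs.
    """
    # 1. Generation: every panel member answers the same curated prompt.
    candidates = [m(prompt) for m in models]

    # 2. Peer review: each candidate is scored by all *other* panel members,
    #    and the scores are averaged.
    avg_scores = []
    for i, cand in enumerate(candidates):
        peer_scores = [scorer(prompt, cand)
                       for j in range(len(models)) if j != i]
        avg_scores.append(sum(peer_scores) / len(peer_scores))

    # 3. Selection: the top-rated answer becomes the preferred output,
    #    e.g. as a target for the next fine-tuning iteration.
    best = max(range(len(candidates)), key=lambda i: avg_scores[i])
    return candidates[best], avg_scores[best]
```

In the real framework the preferred/dispreferred pairs selected this way would feed a preference-optimization step; here the selection logic alone is shown.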