
Improving Large Vision and Language Models by Learning from a Panel of Peers

September 1, 2025
作者: Jefferson Hernandez, Jing Shi, Simon Jenni, Vicente Ordonez, Kushal Kafle
cs.AI

Abstract

Traditional alignment methods for Large Vision and Language Models (LVLMs) primarily rely on human-curated preference data. Human-generated preference data is costly; machine-generated preference data is limited in quality; and self-supervised preference data often introduces hallucinations. To overcome these limitations, we propose a novel Panel-of-Peers learning framework inspired by collaborative learning among humans. This approach leverages a panel of LVLMs, each evaluating and learning from their collective outputs through an iterative self-improvement process. By simulating a peer review system, our models generate, assess, and refine outputs in response to a curated set of prompts, mimicking a classroom learning environment. We demonstrate that this methodology enhances model performance without requiring extensive human-labeled datasets. Our experiments show significant improvement across multiple benchmarks, demonstrating the potential of peer evaluations as a scalable alternative to self-supervised alignment. Notably, we show that Panel-of-Peers increases the average score on fifteen benchmarks from 48% to 57%.
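The generate → assess → select loop the abstract describes can be sketched as a single panel round. This is a minimal illustrative sketch, not the paper's implementation: the function name, the callable "models", and the scoring stub are all assumptions introduced here for clarity.

```python
def panel_of_peers_round(prompt, models, scorer):
    """One generate -> assess -> select step for a hypothetical panel of peers.

    `models` are callables standing in for LVLMs; `scorer` stands in for a
    peer model judging a candidate answer. Both are illustrative stubs.
    """
    # 1. Generation: every panel member answers the same curated prompt.
    candidates = [m(prompt) for m in models]

    # 2. Peer review: each candidate is scored by all *other* panel members,
    #    and the scores are averaged.
    avg_scores = []
    for i, cand in enumerate(candidates):
        peer_scores = [scorer(prompt, cand)
                       for j in range(len(models)) if j != i]
        avg_scores.append(sum(peer_scores) / len(peer_scores))

    # 3. Selection: the top-rated answer becomes the preferred output,
    #    e.g. as a target for the next fine-tuning iteration.
    best = max(range(len(candidates)), key=lambda i: avg_scores[i])
    return candidates[best], avg_scores[best]
```

In the real framework the preferred/dispreferred pairs selected this way would feed a preference-optimization step; here the selection logic alone is shown.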