Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
February 6, 2024
Authors: Jianyuan Guo, Hanting Chen, Chengcheng Wang, Kai Han, Chang Xu, Yunhe Wang
cs.AI
Abstract
Recent advancements in large language models have sparked interest in their
extraordinary and near-superhuman capabilities, leading researchers to explore
methods for evaluating and optimizing these abilities, an effort termed
superalignment. In this context, our paper delves into the realm of vision
foundation models, focusing on the concept of weak-to-strong generalization,
which involves using a weaker model to supervise a stronger one, aiming to
enhance the latter's capabilities beyond the former's limits. We introduce a
novel and adaptively adjustable loss function for weak-to-strong supervision.
Our comprehensive experiments span various scenarios, including few-shot
learning, transfer learning, noisy label learning, and common knowledge
distillation settings. The results are striking: our approach not only exceeds
the performance benchmarks set by strong-to-strong generalization but also
surpasses the outcomes of fine-tuning strong models with whole datasets. This
compelling evidence underscores the significant potential of weak-to-strong
generalization, showcasing its capability to substantially elevate the
performance of vision foundation models. The code is available at
https://github.com/ggjy/vision_weak_to_strong.
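The abstract does not spell out the adaptive weak-to-strong loss, so the sketch below is only one plausible form of the general idea: the strong student is trained against a mixture of the weak teacher's soft labels and its own hardened predictions, with a mixing weight `alpha` that an adaptive scheme could raise as the student becomes more confident than the teacher. The function names and the specific mixture are assumptions for illustration, not the paper's actual loss; see the repository above for the real implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def weak_to_strong_loss(strong_logits, weak_probs, alpha):
    """Cross-entropy of the strong model against a mixed target.

    alpha in [0, 1] interpolates between pure weak supervision
    (alpha = 0: trust the weak teacher's soft labels) and pure
    self-training (alpha = 1: trust the strong model's own hardened
    prediction). An adaptive scheme could schedule alpha per example,
    e.g. from the strong model's confidence. This is a hypothetical
    sketch, not the loss proposed in the paper.
    """
    strong_probs = softmax(strong_logits)
    # Harden the strong model's own prediction into a one-hot target.
    hard_self = np.eye(strong_probs.shape[-1])[strong_probs.argmax(axis=-1)]
    target = (1.0 - alpha) * weak_probs + alpha * hard_self
    # Standard cross-entropy against the mixed target, averaged over the batch.
    return float(-(target * np.log(strong_probs + 1e-12)).sum(axis=-1).mean())
```

With `alpha = 0` this reduces to ordinary distillation from the weak teacher; raising `alpha` lets the strong model override weak labels it disagrees with, which is the intuition behind letting the student exceed the teacher's limits.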