
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

February 6, 2024
作者: Jianyuan Guo, Hanting Chen, Chengcheng Wang, Kai Han, Chang Xu, Yunhe Wang
cs.AI

Abstract

Recent advancements in large language models have sparked interest in their extraordinary and near-superhuman capabilities, leading researchers to explore methods for evaluating and optimizing these abilities, which is called superalignment. In this context, our paper delves into the realm of vision foundation models, focusing on the concept of weak-to-strong generalization, which involves using a weaker model to supervise a stronger one, aiming to enhance the latter's capabilities beyond the former's limits. We introduce a novel and adaptively adjustable loss function for weak-to-strong supervision. Our comprehensive experiments span various scenarios, including few-shot learning, transfer learning, noisy label learning, and common knowledge distillation settings. The results are striking: our approach not only exceeds the performance benchmarks set by strong-to-strong generalization but also surpasses the outcomes of fine-tuning strong models with whole datasets. This compelling evidence underscores the significant potential of weak-to-strong generalization, showcasing its capability to substantially elevate the performance of vision foundation models. The code is available at https://github.com/ggjy/vision_weak_to_strong.
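The core idea above is that a strong student learns from a weak teacher's labels, with an adaptively adjustable loss governing how much to trust the weak supervision versus the strong model's own predictions. The paper's actual loss is not reproduced here; the sketch below is only an illustrative confidence-gated mixture (the gating rule, `weak_to_strong_loss` signature, and threshold-free averaging are all assumptions for illustration):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, targets):
    # Mean cross-entropy between predicted logits and soft target distributions.
    log_p = np.log(softmax(logits) + 1e-12)
    return -(targets * log_p).sum(axis=-1).mean()

def weak_to_strong_loss(strong_logits, weak_logits):
    # Illustrative adaptive weighting: trust the weak teacher's soft labels
    # in proportion to its average confidence, and otherwise fall back to
    # the strong model's own hard pseudo-labels. This weighting scheme is
    # a hypothetical stand-in, not the loss proposed in the paper.
    weak_probs = softmax(weak_logits)
    alpha = float(weak_probs.max(axis=-1).mean())  # in [1/K, 1]

    # Strong model's own predictions as one-hot targets.
    num_classes = strong_logits.shape[-1]
    hard = np.eye(num_classes)[strong_logits.argmax(axis=-1)]

    return alpha * cross_entropy(strong_logits, weak_probs) + \
           (1.0 - alpha) * cross_entropy(strong_logits, hard)
```

When the weak teacher is confident, its soft labels dominate the objective; as its confidence drops, the strong model increasingly supervises itself, which is one simple way to let the student exceed the teacher's ceiling.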