Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Abstract

Recent advances in large language models have sparked interest in their extraordinary, near-superhuman capabilities, leading researchers to explore methods for evaluating and optimizing these abilities, a problem known as superalignment. In this context, our paper turns to vision foundation models and focuses on weak-to-strong generalization: using a weaker model to supervise a stronger one, with the aim of enhancing the latter's capabilities beyond the former's limits. We introduce a novel, adaptively adjustable loss function for weak-to-strong supervision. Our comprehensive experiments span few-shot learning, transfer learning, noisy label learning, and common knowledge distillation settings. The results are striking: our approach not only exceeds the performance benchmarks set by strong-to-strong generalization but also surpasses the outcomes of fine-tuning strong models with whole datasets. This evidence underscores the significant potential of weak-to-strong generalization to substantially elevate the performance of vision foundation models. The code is available at https://github.com/ggjy/vision_weak_to_strong.
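The abstract does not spell out the adaptively adjustable loss, so the following is only a minimal sketch of what a confidence-weighted weak-to-strong objective could look like, not the paper's formulation. The function name `adaptive_weak_to_strong_loss`, the choice of the weak teacher's top-class probability as the per-sample weight, and the use of the strong student's own hard prediction as the second target are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one list of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def adaptive_weak_to_strong_loss(student_logits, teacher_logits):
    """Hypothetical confidence-weighted loss, averaged over a batch.

    Illustrative assumption only; this is not the paper's formulation.
    Both arguments are lists of per-sample logit lists of equal shape.
    """
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        student_logp = log_softmax(s_row)
        teacher_p = [math.exp(v) for v in log_softmax(t_row)]

        # Cross-entropy against the weak teacher's soft labels.
        teacher_ce = -sum(p * lp for p, lp in zip(teacher_p, student_logp))
        # Cross-entropy against the strong student's own hard prediction.
        self_ce = -max(student_logp)
        # Assumed adaptive weight: the teacher's confidence in its top class.
        alpha = max(teacher_p)

        total += alpha * teacher_ce + (1.0 - alpha) * self_ce
    return total / len(student_logits)


# Toy batch: the strong student is fairly sure of class 0, then class 1.
student = [[4.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
teacher_agrees = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
teacher_disagrees = [[0.0, 2.0, 0.0], [2.0, 0.0, 0.0]]

loss_agree = adaptive_weak_to_strong_loss(student, teacher_agrees)
loss_disagree = adaptive_weak_to_strong_loss(student, teacher_disagrees)
```

In this sketch, a sample on which the weak teacher is unsure contributes mostly through the student's own prediction, which is one plausible way to let the strong model move past its supervisor's mistakes. A real training setup would compute the same quantities on framework tensors so that gradients flow through the student logits.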
