Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets

Abstract

In studies of transferable learning, scaling laws are obtained for various important foundation models to predict their properties and performance at larger scales. We show here how scaling law derivation can also be used for model and dataset comparison, allowing one to decide which procedure is to be preferred for pre-training. For the first time, full scaling laws based on dense measurements across a wide span of model and samples-seen scales are derived for two important language-vision learning procedures, CLIP and MaMMUT, which use either a contrastive loss only or a contrastive loss combined with a captioning (text-generative) loss. After ensuring sufficient prediction accuracy on held-out points, we use the derived scaling laws to compare both models, obtaining evidence for MaMMUT's stronger improvement with scale and better sample efficiency than standard CLIP. To strengthen the validity of the comparison, we show scaling laws for various downstream tasks (classification, retrieval, and segmentation) and for different open datasets (DataComp, DFN, and Re-LAION), consistently observing the same trends. We also show that the comparison can be performed when deriving scaling laws with a constant learning rate schedule, which reduces compute cost. Accurate derivation of scaling laws thus provides a means to perform model and dataset comparison across scale spans, avoiding misleading conclusions based on measurements from single reference scales only, and paves the way for systematic comparison and improvement of open foundation models and of the datasets used to create them. We release all the pre-trained models with their intermediate checkpoints, including openMaMMUT-L/14, which achieves 80.3% zero-shot ImageNet-1k accuracy when trained on 12.8B samples from DataComp-1.4B. Code for reproducing the experiments in the paper and the raw experimental data can be found at https://github.com/LAION-AI/scaling-laws-for-comparison.
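
To make the idea of comparing procedures through fitted scaling laws, rather than at a single reference scale, more concrete, here is a minimal sketch. It rests on several assumptions that are not taken from the paper: a plain power law `error = a * compute**(-b)` fitted in log-log space (the paper's actual functional form may differ), two hypothetical procedures called "model A" and "model B", and synthetic noise-free measurements invented purely for illustration. The function names `fit_power_law`, `predict`, and `crossover_compute` are likewise made up for this example.

```python
import numpy as np


def fit_power_law(compute, error):
    """Fit error ~= a * compute**(-b) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(compute), np.log(error), deg=1)
    return float(np.exp(intercept)), float(-slope)  # (a, b)


def predict(params, compute):
    """Evaluate the fitted power law at the given compute scale(s)."""
    a, b = params
    return a * np.asarray(compute, dtype=float) ** (-b)


def crossover_compute(p1, p2):
    """Compute scale at which two fitted curves intersect (needs b1 != b2)."""
    (a1, b1), (a2, b2) = p1, p2
    return (a1 / a2) ** (1.0 / (b1 - b2))


# Synthetic, noise-free measurements (illustrative only, NOT results from the paper).
compute = np.array([1e9, 1e10, 1e11, 1e12])
err_a = 4.0 * compute ** (-0.10)  # hypothetical "model A"
err_b = 8.0 * compute ** (-0.13)  # hypothetical "model B"

# Hold out the largest scale and check extrapolation before comparing the fits.
fit_a = fit_power_law(compute[:-1], err_a[:-1])
fit_b = fit_power_law(compute[:-1], err_b[:-1])
heldout_gap_a = abs(predict(fit_a, compute[-1]) - err_a[-1])
heldout_gap_b = abs(predict(fit_b, compute[-1]) - err_b[-1])

# Compare across scales: exponents and the scale where the ranking flips.
c_star = crossover_compute(fit_a, fit_b)
print(f"exponent A = {fit_a[1]:.3f}, exponent B = {fit_b[1]:.3f}")
print(f"held-out gaps: A = {heldout_gap_a:.2e}, B = {heldout_gap_b:.2e}")
print(f"curves cross at compute ~ {c_star:.3g}")
```

In this invented setup, model A has the lower error at the smallest scale while model B has the lower error at the largest one, so a comparison at any single scale could point either way. The fitted exponents and the crossover point summarize the comparison across the whole scale span.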
