Scaling MLPs: A Tale of Inductive Bias

Abstract

In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are important for multiple reasons. (1) Given the recent narrative that "less inductive bias is better", popularized by transformers eclipsing convolutional models, it is natural to explore the limits of this hypothesis. To that end, MLPs offer an ideal test bed, being completely free of any inductive bias. (2) MLPs have almost exclusively been the main protagonist in the deep learning theory literature due to their mathematical simplicity, serving as a proxy to explain empirical phenomena observed for more complex architectures. Surprisingly, experimental data points for MLPs are very difficult to find in the literature, especially when coupled with large pre-training protocols. This discrepancy between practice and theory is worrying: do MLPs reflect the empirical advances exhibited by practical models, or do theorists need to rethink the role of MLPs as a proxy? We provide insights into both of these aspects. We show that the performance of MLPs improves drastically with scale (93% on CIFAR10, 79% on CIFAR100, 69% on TinyImageNet), highlighting that a lack of inductive bias can indeed be compensated. We observe that MLPs faithfully mimic the behaviour of their modern counterparts, although some components of the learning setting surprisingly exhibit stronger or unexpected behaviours. Owing to the inherent computational efficiency of MLPs, large pre-training experiments become more accessible to academic researchers. All of our experiments were run on a single GPU.
