Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Abstract

Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy a negligible fraction of the volume of this distribution, making their discovery reliant on structured optimization methods such as gradient descent. In contrast, in large, well-pretrained models the density of task experts increases dramatically, so that diverse, task-improving specialists populate a substantial fraction of the neighborhood around the pretrained weights. Motivated by this perspective, we explore a simple, fully parallel post-training method that samples N parameter perturbations at random, selects the top K, and ensembles predictions via majority vote. Despite its simplicity, this approach is competitive with standard post-training methods such as PPO, GRPO, and ES for contemporary large-scale models.
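A minimal Python sketch of the sample, select, and vote loop described above may help make it concrete. Everything specific here is an assumption for illustration, not the authors' implementation: perturbations are isotropic Gaussian noise with a single scale `sigma`, candidates are ranked by a user-supplied `score` function (for example accuracy on held-out task data), and the helper names `perturb`, `sample_select_vote`, `predict_linear`, and `accuracy` are hypothetical. A toy three-parameter linear classifier stands in for a large pretrained model.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Params = List[float]


def perturb(theta: Params, sigma: float, rng: random.Random) -> Params:
    """Hypothetical noise model: add i.i.d. Gaussian noise with std sigma to every weight."""
    return [w + rng.gauss(0.0, sigma) for w in theta]


def sample_select_vote(
    theta0: Params,
    predict: Callable[[Params, Sequence[float]], int],
    score: Callable[[Params], float],
    n: int,
    k: int,
    sigma: float,
    seed: int = 0,
) -> Tuple[List[Params], Callable[[Sequence[float]], int]]:
    rng = random.Random(seed)
    # 1. Sample N perturbed parameter vectors around the pretrained weights.
    #    Samples are independent, so this step is trivially parallel.
    candidates = [perturb(theta0, sigma, rng) for _ in range(n)]
    # 2. Score each candidate on task data and keep the K highest-scoring ones.
    experts = sorted(candidates, key=score, reverse=True)[:k]

    # 3. Ensemble: every selected expert votes; the most common label wins
    #    (ties go to the label seen first, per Counter.most_common ordering).
    def ensemble(x: Sequence[float]) -> int:
        votes = Counter(predict(theta, x) for theta in experts)
        return votes.most_common(1)[0][0]

    return experts, ensemble


def predict_linear(theta: Params, x: Sequence[float]) -> int:
    # Toy binary classifier: label 1 if w0*x0 + w1*x1 + b > 0.
    return int(theta[0] * x[0] + theta[1] * x[1] + theta[2] > 0)


# Toy task: the true rule is x0 + x1 > 0. For brevity the same points are used
# for selection and evaluation; a real setup would keep them separate.
data_rng = random.Random(1)
points = [(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)) for _ in range(200)]
labels = [int(x0 + x1 > 0) for x0, x1 in points]


def accuracy(theta: Params) -> float:
    return sum(predict_linear(theta, x) == y for x, y in zip(points, labels)) / len(points)


theta0 = [1.0, 0.2, 0.0]  # stand-in for imperfect "pretrained" weights
experts, ensemble = sample_select_vote(theta0, predict_linear, accuracy, n=64, k=5, sigma=0.5)
print("pretrained:", accuracy(theta0))
print("top-K experts:", [accuracy(e) for e in experts])
print("ensemble:", sum(ensemble(x) == y for x, y in zip(points, labels)) / len(points))
```

Because each candidate is sampled and scored independently, steps 1 and 2 can be spread across N workers with no communication beyond collecting the scores, and only the K selected parameter vectors need to be kept for inference. Choosing an odd K avoids ties in the binary vote.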
