
Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

March 12, 2026
Authors: Yulu Gan, Phillip Isola
cs.AI

Abstract

Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy a negligible fraction of the volume of this distribution, making their discovery reliant on structured optimization methods such as gradient descent. In contrast, in large, well-pretrained models the density of task experts increases dramatically, so that diverse, task-improving specialists populate a substantial fraction of the neighborhood around the pretrained weights. Motivated by this perspective, we explore a simple, fully parallel post-training method that samples N parameter perturbations at random, selects the top K, and ensembles their predictions via majority vote. Despite its simplicity, this approach is competitive with standard post-training methods such as PPO, GRPO, and ES for contemporary large-scale models.
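The sample-select-ensemble procedure described in the abstract can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's implementation: the linear classifier, the `evaluate` scoring function, and all hyperparameters (`n_samples`, `top_k`, `sigma`) are illustrative assumptions standing in for a pretrained model and its task reward.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n=200, d=8):
    """Toy binary classification data with a known linear rule."""
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = (X @ w_true > 0).astype(int)
    return X, y

def evaluate(w, X, y):
    """Task score: accuracy of a linear classifier with weights w."""
    return float(((X @ w > 0).astype(int) == y).mean())

def sample_select_ensemble(w_pre, X, y, n_samples=256, top_k=8, sigma=0.1):
    """Sample N random perturbations of the (stand-in) pretrained weights,
    keep the top K by task score, and predict by majority vote."""
    perturbed = [w_pre + sigma * rng.normal(size=w_pre.shape)
                 for _ in range(n_samples)]
    # Rank the sampled parameter vectors by task performance.
    scored = sorted(perturbed, key=lambda w: evaluate(w, X, y), reverse=True)
    experts = scored[:top_k]
    # Majority vote over the K experts' predictions.
    votes = np.stack([(X @ w > 0).astype(int) for w in experts])
    preds = (votes.mean(axis=0) >= 0.5).astype(int)
    return preds, experts

X, y = make_data()
w_pre = 0.01 * rng.normal(size=X.shape[1])  # stand-in "pretrained" weights
preds, experts = sample_select_ensemble(w_pre, X, y)
ens_acc = float((preds == y).mean())
```

Every step is embarrassingly parallel: the N evaluations are independent, which is the practical appeal of the method relative to iterative post-training loops.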