Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Abstract

Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy a negligible fraction of the volume of this distribution, making their discovery reliant on structured optimization methods such as gradient descent. In contrast, in large, well-pretrained models the density of task experts increases dramatically, so that diverse, task-improving specialists populate a substantial fraction of the neighborhood around the pretrained weights. Motivated by this perspective, we explore a simple, fully parallel post-training method that samples N parameter perturbations at random, selects the top K, and ensembles predictions via majority vote. Despite its simplicity, this approach is competitive with standard post-training methods such as PPO, GRPO, and ES for contemporary large-scale models.
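The sample, select, and vote procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The toy linear classifier, the Gaussian noise with scale `sigma`, the use of validation accuracy as the selection score, and all function and parameter names are assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np


def predict(weights, inputs):
    """Toy stand-in for a model: a linear classifier returning class indices."""
    return np.argmax(inputs @ weights, axis=1)


def sample_select_vote(base_weights, val_x, val_y, test_x,
                       n=64, k=5, sigma=0.1, seed=0):
    """Sample n random perturbations, keep the top k, and majority-vote.

    Assumptions (not taken from the source): isotropic Gaussian noise,
    selection by accuracy on a held-out validation set.
    """
    rng = np.random.default_rng(seed)

    # 1. Sample N perturbed copies of the pretrained weights.
    candidates = [
        base_weights + sigma * rng.standard_normal(base_weights.shape)
        for _ in range(n)
    ]

    # 2. Score every candidate independently. There is no dependency
    #    between candidates, so this loop could run fully in parallel.
    scores = np.array(
        [np.mean(predict(w, val_x) == val_y) for w in candidates]
    )

    # 3. Keep the K best-scoring candidates (ascending order of score).
    top_idx = np.argsort(scores)[-k:]

    # 4. Ensemble by majority vote over the selected candidates.
    num_classes = base_weights.shape[1]
    counts = np.zeros((num_classes, test_x.shape[0]), dtype=int)
    for i in top_idx:
        votes = predict(candidates[i], test_x)
        counts[votes, np.arange(test_x.shape[0])] += 1

    # Ties are broken in favor of the lowest class index.
    return np.argmax(counts, axis=0), scores[top_idx]
```

Calling `sample_select_vote(w0, val_x, val_y, test_x, n=64, k=5)` returns the voted predictions for `test_x` together with the validation scores of the selected candidates. Unlike gradient-based methods, no step depends on the result of a previous one, which is what makes the procedure fully parallel.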
