Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints

Abstract

We propose ERA, a new paradigm that constrains sampling entropy above given thresholds by applying specially designed activations to model outputs. The approach is effective across several domains: 1) for large language models (LLMs), it boosts the AIME 2025 score of Qwen2.5-Math-7B by 37.4%; 2) for continuous control reinforcement learning agents, it improves performance by more than 30% over strong baselines such as SAC on the challenging HumanoidBench; 3) for image classification, it raises the ImageNet top-1 accuracy of ResNet-50 by 0.69%. These gains come with a computational overhead of less than 7%. Our work validates output activation as a powerful tool for entropy control, opening a new direction for designing simpler and more robust algorithms.
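To make the idea of an entropy-constraining output activation concrete, the following is a minimal sketch and not the ERA activation itself, whose exact form the abstract does not give. It assumes a discrete (softmax) output and uses a simple stand-in technique: raising the softmax temperature, only when needed, until the output entropy reaches a floor. The names `entropy_floor_softmax`, `min_entropy`, and `iters` are hypothetical, and the sketch covers the forward pass only.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; terms with p == 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_floor_softmax(logits, min_entropy, iters=60):
    """Hypothetical output activation: a softmax whose entropy is >= min_entropy.

    Stand-in for illustration only (temperature scaling), not the paper's method.
    """
    max_entropy = math.log(len(logits))
    if not 0.0 <= min_entropy < max_entropy:
        raise ValueError("min_entropy must lie in [0, log(num_classes))")
    probs = softmax(logits)
    if entropy(probs) >= min_entropy:
        return probs  # constraint already satisfied; output left untouched
    # Softmax entropy is non-decreasing in temperature, so bracket, then bisect.
    lo, hi = 1.0, 2.0
    while entropy(softmax(logits, hi)) < min_entropy:
        lo, hi = hi, hi * 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy(softmax(logits, mid)) < min_entropy:
            lo = mid
        else:
            hi = mid
    return softmax(logits, hi)  # hi always satisfies the constraint

if __name__ == "__main__":
    sharp = [10.0, 0.0, 0.0]
    print(entropy(softmax(sharp)))                     # entropy before the floor
    print(entropy(entropy_floor_softmax(sharp, 1.0)))  # entropy after the floor
```

In this sketch, outputs that already meet the threshold pass through unchanged, so the constraint only acts on overly peaked distributions. A trainable version would need a differentiable formulation, which this illustration does not attempt.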

