Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
October 9, 2025
Authors: Zilin Kang, Chonghua Liao, Tingqiang Xu, Huazhe Xu
cs.AI
Abstract
We propose ERA, a new paradigm that constrains the sampling entropy above
given thresholds by applying specially designed activations to the outputs of
models. Our approach demonstrates broad effectiveness across different domains:
1) for large language models (LLMs), boosting the AIME 2025 score for
Qwen2.5-Math-7B by 37.4%; 2) for continuous control reinforcement learning
agents, improving performance by more than 30% over strong baselines such as
SAC on the challenging HumanoidBench; 3) for image classification, enhancing
ImageNet top-1 accuracy by 0.69% for ResNet-50. These gains are achieved with a
computational overhead of less than 7%. Our work validates output activation as
a powerful tool for entropy control, opening a new direction for designing
simpler and more robust algorithms.
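To make the core idea concrete, here is a minimal sketch of one way an output activation can enforce an entropy floor. The abstract does not specify ERA's actual activation, so everything below is an illustrative assumption: for a diagonal-Gaussian policy, the entropy is determined entirely by the log-standard-deviations, so mapping the network's raw log-std outputs through a softplus-based activation that bounds each one from below guarantees the total entropy stays above a chosen threshold. The function names and parameterization are hypothetical, not from the paper.

```python
import numpy as np

def gaussian_entropy(log_std):
    # Differential entropy of a diagonal Gaussian:
    # H = 0.5 * d * log(2*pi*e) + sum_i log_std_i
    d = log_std.size
    return 0.5 * d * np.log(2 * np.pi * np.e) + log_std.sum()

def entropy_floor_activation(raw_log_std, min_entropy):
    # Hypothetical ERA-style activation (an assumption, not the paper's
    # design): shift-and-softplus the raw log-std outputs so that each
    # coordinate exceeds (min_entropy - base) / d, which forces the
    # total Gaussian entropy above min_entropy for any network output.
    d = raw_log_std.size
    base = 0.5 * d * np.log(2 * np.pi * np.e)
    per_dim_floor = (min_entropy - base) / d
    # softplus(x) > 0 for all x, so the output is strictly above the floor.
    slack = np.log1p(np.exp(raw_log_std - per_dim_floor))
    return per_dim_floor + slack

# Even extreme raw outputs cannot drive entropy below the threshold.
raw = np.array([-5.0, -3.0, 0.0])
log_std = entropy_floor_activation(raw, min_entropy=1.0)
print(gaussian_entropy(log_std) >= 1.0)  # True
```

The appeal of this formulation, as the abstract suggests, is that the constraint lives in the architecture rather than in an auxiliary loss term: no entropy-bonus coefficient needs tuning, and the bound holds by construction at every training step.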