Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints

Abstract

We propose ERA, a new paradigm that constrains sampling entropy above given thresholds by applying specially designed activations to model outputs. Our approach demonstrates broad effectiveness across domains: 1) for large language models (LLMs), it boosts the AIME 2025 score of Qwen2.5-Math-7B by 37.4%; 2) for continuous-control reinforcement learning agents, it improves performance by more than 30% over strong baselines such as SAC on the challenging HumanoidBench; and 3) for image classification, it raises the ImageNet top-1 accuracy of ResNet-50 by 0.69%. These gains come with a computational overhead of less than 7%. Our work validates output activation as a powerful tool for entropy control, opening a new direction for designing simpler and more robust algorithms.
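To make the idea of an output activation that keeps sampling entropy above a threshold concrete, the following is a minimal sketch for a discrete output distribution. It is not the ERA activation itself, whose form the abstract does not specify. As a stand-in, it rescales the softmax temperature until the entropy floor is met. The names `entropy_floor_activation`, `min_entropy`, and `max_temperature` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_floor_activation(logits, min_entropy, max_temperature=1e4, iters=60):
    """Hypothetical output activation (an assumption, not the paper's method).

    Returns probabilities whose entropy is at least `min_entropy`.
    If the plain softmax already satisfies the floor, it is returned unchanged.
    Otherwise the temperature T >= 1 is found by bisection; the entropy of
    softmax(logits / T) is non-decreasing in T, so bisection is valid.
    """
    probs = softmax(logits)
    if entropy(probs) >= min_entropy:
        return probs

    low, high = 1.0, max_temperature
    if entropy(softmax(logits, high)) < min_entropy:
        raise ValueError("entropy floor not reachable within max_temperature")

    # Invariant: `high` always satisfies the floor, `low` never does.
    for _ in range(iters):
        mid = 0.5 * (low + high)
        if entropy(softmax(logits, mid)) >= min_entropy:
            high = mid
        else:
            low = mid
    return softmax(logits, high)


if __name__ == "__main__":
    peaked = [10.0, 0.0, 0.0, 0.0]
    print("entropy before:", entropy(softmax(peaked)))
    print("entropy after: ", entropy(entropy_floor_activation(peaked, 1.0)))
```

Temperature rescaling preserves the ranking of the logits, so the most likely output stays the most likely one while the distribution is flattened just enough to meet the floor. In a trainable model one would want a differentiable mapping rather than a bisection loop; this sketch only illustrates the constraint.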
