EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling

Abstract

With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computation is often required to generate multiple candidate sequences from the same prompt. This enables exploration of different reasoning paths toward the correct solution; however, it allocates the same compute budget to every prompt. Grounded in the assumption that different prompts carry different degrees of complexity, and thus have different computation needs, we propose EAGer, a training-free generation method that leverages model uncertainty, measured through the token-wise entropy distribution, to reduce redundant computation and concurrently improve overall performance. EAGer branches into multiple reasoning paths only in the presence of high-entropy tokens, and then reallocates the saved compute budget to the instances where exploration of alternative paths is most needed. We find that across multiple open-source models on complex reasoning benchmarks such as AIME 2025, EAGer can reallocate the budget without accessing target labels, achieving the best efficiency-performance trade-off in terms of reasoning length and Pass@k. When target labels are accessible, EAGer generates up to 65% fewer tokens (hence saving compute) and achieves an improvement of up to 37% in Pass@k compared to Full Parallel Sampling.
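As a rough illustration of the two mechanisms described above, the sketch below computes the Shannon entropy of each next-token distribution, marks the steps whose entropy exceeds a threshold as candidate branching points, and pools the unused per-prompt budget for redistribution. It is a minimal sketch under stated assumptions, not the EAGer implementation: the threshold value, the function names (`token_entropy`, `branch_steps`, `reallocate`), and the rule that hands leftover budget to prompts that exhausted their own budget are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def branch_steps(step_probs: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return the generation steps whose entropy exceeds `threshold`.

    `threshold` is a hypothetical parameter. In an entropy-gated scheme only
    these steps would spawn an extra reasoning path; confident (low-entropy)
    steps are decoded once and shared by all paths.
    """
    return [t for t, probs in enumerate(step_probs) if token_entropy(probs) > threshold]


def reallocate(used: Sequence[int], budget: int) -> List[int]:
    """Hypothetical reallocation rule (an assumption, not the paper's exact rule).

    Each prompt was granted `budget` sequences and actually used `used[i]`.
    Unused sequences are pooled and split as evenly as possible among the
    prompts that exhausted their budget, treated here as a label-free proxy
    for "needs more exploration". Returns the extra sequences per prompt.
    """
    saved = sum(max(budget - u, 0) for u in used)
    extra = [0] * len(used)
    saturated = [i for i, u in enumerate(used) if u >= budget]
    if not saturated:
        return extra
    share, rem = divmod(saved, len(saturated))
    for j, i in enumerate(saturated):
        # The first `rem` saturated prompts receive one extra leftover sequence.
        extra[i] = share + (1 if j < rem else 0)
    return extra
```

For example, `branch_steps([[0.97, 0.01, 0.01, 0.01], [0.3, 0.3, 0.2, 0.2]], threshold=0.5)` returns `[1]`: the first step is confident (entropy about 0.17 nats) and would be decoded once, while the second (about 1.37 nats) would trigger a branch. Likewise, `reallocate([2, 8, 8, 3], budget=8)` pools the 11 unused sequences and returns `[0, 6, 5, 0]`.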