Entropy-Based Adaptive Weighting for Self-Training

Abstract

The mathematical problem-solving capabilities of large language models have become a focal point of research, with growing interest in using self-generated reasoning paths to refine and enhance these models. These paths capture step-by-step logical processes while requiring only the correct answer for supervision. Self-training has been shown to be effective on reasoning tasks while eliminating the need for external models and manual annotations. However, how best to use self-generated data for model training remains an open challenge. In this work, we propose Entropy-Based Adaptive Weighting for Self-Training (EAST), an adaptive weighting strategy designed to prioritize uncertain data during self-training. Specifically, EAST employs a mapping function with a tunable parameter that controls the sharpness of the weighting, assigning higher weights to data on which the model exhibits greater uncertainty. This approach guides the model to focus on more informative and challenging examples, thereby enhancing its reasoning ability. We evaluate our approach on the GSM8K and MATH benchmarks. Empirical results show that, while the vanilla method yields virtually no improvement (0%) on MATH, EAST achieves a gain of around 1% over the backbone model. On GSM8K, EAST attains a further 1-2% performance boost over the vanilla method.
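
To make the weighting idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how uncertainty is measured or what form the mapping function takes, so two assumptions are made here: uncertainty is taken to be the Shannon entropy of the final answers sampled for a question, and the mapping is a power function whose exponent plays the role of the tunable sharpness parameter. The names `answer_entropy`, `adaptive_weights`, `sharpness`, and `eps` are hypothetical.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution of sampled answers.

    Assumption: uncertainty is measured over the final answers the model
    produces for one question; the abstract does not specify this.
    """
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def adaptive_weights(entropies, sharpness=1.0, eps=1e-8):
    """Map per-question entropies to training weights with mean 1.

    Assumption: a power mapping (h + eps) ** sharpness, which is only one
    possible choice of mapping function. With sharpness = 0 every weight
    equals 1 (uniform weighting); larger values shift more weight toward
    high-entropy questions.
    """
    raised = [(h + eps) ** sharpness for h in entropies]
    mean = sum(raised) / len(raised)
    return [r / mean for r in raised]


# Hypothetical sampled answers for three questions.
samples = {
    "q1": ["4", "4", "4", "4"],   # model is consistent: low uncertainty
    "q2": ["7", "8", "7", "9"],   # model disagrees with itself: higher uncertainty
    "q3": ["1", "2", "3", "5"],   # every sample differs: highest uncertainty
}

entropies = [answer_entropy(a) for a in samples.values()]
weights = adaptive_weights(entropies, sharpness=2.0)

for question, h, w in zip(samples, entropies, weights):
    print(f"{question}: entropy={h:.3f} weight={w:.3f}")
```

In a training loop, each weight would multiply the loss of the corresponding question's examples. Normalizing the weights to a mean of 1 is a design choice of this sketch that keeps the overall loss scale comparable to unweighted training.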