Believe Your Model: Distribution-Guided Confidence Calibration

Abstract

Large Reasoning Models have demonstrated remarkable performance with the advancement of test-time scaling techniques, which improve prediction accuracy by generating multiple candidate responses and selecting the most reliable answer. Prior work has shown that internal model signals such as confidence scores partly indicate response correctness and exhibit a distributional correlation with accuracy, but this distributional information has not been fully exploited to guide answer selection. Motivated by this, we propose DistriVoting, which incorporates distributional priors as an additional signal alongside confidence during voting. Specifically, our method (1) decomposes the mixed confidence distribution into positive and negative components using Gaussian Mixture Models, and (2) applies a reject filter built from the positive and negative samples drawn from those components to mitigate the overlap between the two distributions. In addition, to reduce the overlap at the level of the distribution itself, we propose SelfStepConf, which uses step-level confidence to dynamically adjust the inference process, increasing the separation between the two distributions and making confidence a more reliable signal in voting. Experiments across 16 models and 5 benchmarks demonstrate that our method significantly outperforms state-of-the-art approaches.
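The two-step idea in the abstract (split the confidence scores into a positive and a negative component, then vote only among responses that look positive) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `fit_two_gaussians`, `posterior_positive`, and `distribution_filtered_vote` are hypothetical, the mixture is fit with a plain 1D EM loop, and the posterior threshold is only a simplified stand-in for the reject filter, whose exact construction the abstract does not give.

```python
import math
from collections import defaultdict


def _density(x, weight, mean, var):
    """Weighted Gaussian density of x under one mixture component."""
    return weight * math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(xs, iters=100):
    """Fit a 2-component 1D Gaussian mixture with plain EM.

    Returns (weights, means, variances); component 1 has the larger mean
    and is treated as the "positive" (likely correct) component.
    """
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, 1e-6)
    weights, means, variances = [0.5, 0.5], [min(xs), max(xs)], [var_all, var_all]
    for _ in range(iters):
        # E-step: responsibility of each component for each score
        resp = []
        for x in xs:
            p = [_density(x, w, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and variance per component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # empty component, keep previous parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, 1e-6)  # floor avoids a collapsed component
    if means[0] > means[1]:
        weights.reverse(); means.reverse(); variances.reverse()
    return weights, means, variances


def posterior_positive(x, weights, means, variances):
    """Posterior probability that score x belongs to the higher-mean component."""
    p = [_density(x, w, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(p)
    return p[1] / total if total > 0 else 0.0


def distribution_filtered_vote(answers, confidences, threshold=0.5):
    """Confidence-weighted vote restricted to responses in the positive component.

    Assumption: thresholding the posterior is a simplified stand-in for the
    reject filter described in the abstract.
    """
    params = fit_two_gaussians(confidences)
    scores = defaultdict(float)
    for answer, conf in zip(answers, confidences):
        if posterior_positive(conf, *params) >= threshold:
            scores[answer] += conf
    if not scores:  # nothing survived the filter: fall back to plain weighting
        for answer, conf in zip(answers, confidences):
            scores[answer] += conf
    return max(scores, key=scores.get)


# Toy data (made up for illustration): a low-confidence answer is more frequent.
answers = ["42"] * 3 + ["17"] * 8
confidences = [0.91, 0.88, 0.93, 0.41, 0.38, 0.45, 0.40, 0.36, 0.43, 0.39, 0.42]
print(distribution_filtered_vote(answers, confidences))
```

In this toy input the frequent answer carries the larger total confidence, so unfiltered confidence-weighted voting would select it. Restricting the vote to the higher-mean component changes the outcome, which is the kind of effect the distributional prior is meant to have.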
