Reliable Chain-of-Thought via Prefix Consistency

Abstract

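Large language models (LLMs) often improve accuracy on reasoning tasks through self-consistency, a test-time technique that samples multiple Chain-of-Thought (CoT) traces and aggregates their final answers by majority voting (MV). We observe that when a CoT is truncated partway through and the remainder is regenerated, traces with correct answers reproduce their original answer more often than traces with wrong answers. We use this gap as a reliability signal, prefix consistency, which weights each candidate answer by how often it reappears under regeneration. It requires no access to token log-probabilities or self-rating prompts. Across five reasoning models and four math and science benchmarks, prefix consistency is the best correctness predictor in most settings, and reweighting votes by it reaches the plateau accuracy of Standard MV with up to 21x fewer tokens (median 4.6x). Our code is available at https://github.com/naoto-iwase/prefix-consistency.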
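
The voting rule described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation (that lives in the linked repository). `regenerate` is a hypothetical callback that would continue a truncated CoT with the model and return the parsed final answer. The truncation point (`cut_fraction`), the number of regenerations per trace (`num_regens`), and the choice to sum each trace's score into its answer's total are illustrative assumptions that the abstract does not specify. A small lookup table stands in for the LLM so the example runs offline. Its entries are made-up illustration data, not results from the paper.

```python
from collections import Counter, defaultdict
from typing import Callable, Hashable, Sequence

# Hypothetical callback: regenerate(prefix_tokens, sample_index) -> final answer.
# A real version would ask the model to continue the truncated CoT and parse its answer.
Regenerator = Callable[[Sequence[str], int], Hashable]


def majority_vote(answers: Sequence[Hashable]) -> Hashable:
    """Standard MV (self-consistency): return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def prefix_consistency(
    trace_tokens: Sequence[str],
    original_answer: Hashable,
    regenerate: Regenerator,
    cut_fraction: float = 0.5,  # assumed truncation point (illustrative)
    num_regens: int = 4,  # assumed regenerations per trace (illustrative)
) -> float:
    """Fraction of regenerations from a truncated prefix that reproduce the original answer."""
    cut = int(len(trace_tokens) * cut_fraction)
    prefix = trace_tokens[:cut]
    hits = sum(regenerate(prefix, i) == original_answer for i in range(num_regens))
    return hits / num_regens


def prefix_weighted_vote(
    traces: Sequence[Sequence[str]],
    answers: Sequence[Hashable],
    regenerate: Regenerator,
    **kwargs,
) -> Hashable:
    """Sum each trace's prefix-consistency score into its answer; return the top answer."""
    scores: defaultdict = defaultdict(float)
    for tokens, answer in zip(traces, answers):
        scores[answer] += prefix_consistency(tokens, answer, regenerate, **kwargs)
    # On ties, max() keeps the first-seen answer (dicts preserve insertion order).
    return max(scores, key=scores.get)


# Toy stand-in for an LLM (made-up illustration data, not results from the paper):
# each trace is identified by its first token, and its regenerated answers are
# read from a fixed table instead of being sampled.
TOY_REGENS = {
    "a": ["7", "3", "12", "5"],
    "b": ["9", "7", "12", "4"],
    "c": ["7", "12", "1", "2"],
    "d": ["12", "12", "12", "7"],
    "e": ["12", "12", "12", "12"],
}


def toy_regenerate(prefix: Sequence[str], i: int) -> str:
    return TOY_REGENS[prefix[0]][i]


traces = [[name] + ["step"] * 9 for name in "abcde"]  # 10 tokens each
answers = ["7", "7", "7", "12", "12"]

print(majority_vote(answers))  # 7
print(prefix_weighted_vote(traces, answers, toy_regenerate))  # 12
```

In this toy run, Standard MV returns "7" (three of five votes), while prefix-weighted voting returns "12": each "7" trace reproduces its answer in only 1 of 4 regenerations (score 0.25), whereas the two "12" traces score 0.75 and 1.0. Because the signal compares only final answers, it needs no token log-probabilities. The extra cost is the regenerated continuations.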

