Recursive Think-Answer Process for LLMs and VLMs

Abstract

Think-Answer reasoners such as DeepSeek-R1 have made notable progress by leveraging interpretable internal reasoning. However, despite frequently emitting self-reflective cues such as "Oops!", they remain vulnerable to output errors during single-pass inference. To address this limitation, we propose an efficient Recursive Think-Answer Process (R-TAP) that lets models engage in iterative reasoning cycles and generate more accurate answers, going beyond conventional single-pass approaches. Central to this approach is a confidence generator that evaluates the certainty of model responses and guides subsequent improvements. By incorporating two complementary rewards, the Recursively Confidence Increase Reward and the Final Answer Confidence Reward, we show that R-TAP-enhanced models consistently outperform conventional single-pass methods for both large language models (LLMs) and vision-language models (VLMs). Moreover, by analyzing the frequency of "Oops"-like expressions in model responses, we find that R-TAP-applied models exhibit significantly fewer self-reflective patterns, resulting in more stable and faster inference-time reasoning. We hope R-TAP paves the way toward efficient and elaborate methods for refining the reasoning processes of future AI.
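
The abstract does not spell out the control flow or the reward formulas, so the following is only a minimal sketch of how a confidence-guided recursive loop might look, not the authors' implementation. Every name here (`Attempt`, `recursive_think_answer`, `threshold`, `max_rounds`, the toy stubs) is hypothetical, and both reward functions are assumed forms chosen purely for illustration: one rewards confidence that rises from round to round, the other returns the confidence of the last answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    thought: str
    answer: str
    confidence: float  # assumed to lie in [0, 1]


def recursive_think_answer(
    question: str,
    think_answer: Callable[[str, List[Attempt]], Tuple[str, str]],
    score_confidence: Callable[[str, str, str], float],
    threshold: float = 0.9,  # hypothetical stopping threshold
    max_rounds: int = 4,     # hypothetical cap on the number of cycles
) -> List[Attempt]:
    """Repeat think-answer cycles until confidence reaches the threshold."""
    history: List[Attempt] = []
    for _ in range(max_rounds):
        # Earlier attempts are passed back in so the model can refine them.
        thought, answer = think_answer(question, history)
        conf = score_confidence(question, thought, answer)
        history.append(Attempt(thought, answer, conf))
        if conf >= threshold:
            break
    return history


def confidence_increase_reward(history: List[Attempt]) -> float:
    """Assumed form: fraction of consecutive rounds where confidence rose."""
    if len(history) < 2:
        return 0.0
    gains = [b.confidence > a.confidence for a, b in zip(history, history[1:])]
    return sum(gains) / len(gains)


def final_answer_confidence_reward(history: List[Attempt]) -> float:
    """Assumed form: the confidence assigned to the last answer."""
    return history[-1].confidence if history else 0.0


# Toy stand-ins for the model and the confidence generator (illustration only).
def toy_think_answer(question: str, history: List[Attempt]) -> Tuple[str, str]:
    n = len(history)
    return f"draft {n}", f"answer {n}"


def toy_confidence(question: str, thought: str, answer: str) -> float:
    return {"answer 0": 0.4, "answer 1": 0.7, "answer 2": 0.95}.get(answer, 0.5)


trace = recursive_think_answer("What is 17 * 24?", toy_think_answer, toy_confidence)
print([a.confidence for a in trace])
```

With the toy stubs, the loop stops on the third cycle, when the stub confidence first crosses the assumed threshold. In a real system the two callables would wrap the reasoning model and the confidence generator, and the rewards would feed a reinforcement-learning objective whose exact form is not given in the abstract.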

