OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation

Abstract

Recent advanced large reasoning models (LRMs) leverage extended chain-of-thought (CoT) reasoning to solve complex tasks, achieving state-of-the-art performance. Despite this success, we identify a critical issue: a substantial portion of the simple tasks solved by LRMs can also be solved by non-reasoning LLMs using significantly fewer tokens, indicating that complex reasoning may not always be necessary. To address this, we systematically analyze the reasoning trajectories of LRMs and present a method that uses the identified paradigms together with an LLM-Judge to classify these trajectories as either Redundant Reasoning or Essential Reasoning. We also introduce OThink-R1, a method that prunes redundant reasoning steps while preserving logical validity. OThink-R1 dynamically employs the non-thinking mode (fast thinking) for straightforward problems and engages in deliberate thinking (slow thinking) for complex ones. Experiments across mathematical and question-answering tasks demonstrate that OThink-R1 reduces reasoning redundancy by almost 23% on average without compromising accuracy, offering practical guidelines for efficient reasoning models. The code is available at https://github.com/AgenticIR-Lab/OThink-R1.
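The classify-then-prune idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Trajectory` fields, the `<think>` tag format, the rule that only questions answered correctly by both models are pruning candidates, and the `toy_judge` heuristic standing in for the LLM-Judge are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trajectory:
    """One LRM answer to a question (hypothetical record layout)."""
    question: str
    reasoning: str      # the chain-of-thought text produced by the LRM
    answer: str         # the final answer produced by the LRM
    lrm_correct: bool   # did the reasoning model answer correctly?
    llm_correct: bool   # did a non-reasoning LLM also answer correctly?


def is_redundant(traj: Trajectory, judge: Callable[[Trajectory], bool]) -> bool:
    """Label a trajectory Redundant (True) or Essential (False).

    Assumption: a trajectory is only a pruning candidate when the
    non-reasoning LLM also solves the question; the judge then decides.
    """
    if not (traj.lrm_correct and traj.llm_correct):
        return False
    return judge(traj)


def build_target(traj: Trajectory, judge: Callable[[Trajectory], bool]) -> str:
    """Build a training target: empty thinking block if redundant,
    full reasoning otherwise (tag format is an assumption)."""
    if is_redundant(traj, judge):
        return f"<think>\n</think>\n{traj.answer}"
    return f"<think>\n{traj.reasoning}\n</think>\n{traj.answer}"


def toy_judge(traj: Trajectory) -> bool:
    """Placeholder for an LLM-Judge: flags long reasoning as redundant."""
    return len(traj.reasoning.split()) > 20


if __name__ == "__main__":
    easy = Trajectory("1+1?", "word " * 30, "2", True, True)
    hard = Trajectory("hard?", "step one then step two", "42", True, False)
    print(build_target(easy, toy_judge))
    print(build_target(hard, toy_judge))
```

A model fine-tuned on targets of this shape would see empty thinking blocks for simple questions and full reasoning for complex ones, which is one plausible way to obtain the fast/slow switching behavior the abstract describes.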
