Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

Abstract

Reasoning capability is pivotal for Large Language Models (LLMs) to solve complex tasks, yet achieving reliable and scalable reasoning remains challenging. While Chain-of-Thought (CoT) prompting has become a mainstream approach, existing methods often suffer from uncontrolled generation, insufficient quality, and limited diversity in reasoning paths. Recent efforts use code to strengthen CoT by grounding reasoning in executable steps, but such methods are typically restricted to predefined mathematical problems, which limits their scalability and generalizability. In this work, we propose Caco (Code-Assisted Chain-of-ThOught), a framework that automates the synthesis of high-quality, verifiable, and diverse instruction-CoT reasoning data through code-driven augmentation. Unlike prior work, Caco first fine-tunes a code-based CoT generator on existing math and programming solutions expressed in a unified code format, then uses that generator to produce a large number of diverse reasoning traces. Crucially, we introduce automated validation via code execution and rule-based filtering to ensure logical correctness and structural diversity, and then reverse-engineer the filtered outputs into natural language instructions and natural language CoTs to enrich task adaptability. This closed-loop process enables fully automated, scalable synthesis of reasoning data with guaranteed executability. Experiments on our Caco-1.3M dataset show that Caco-trained models achieve competitive performance on mathematical reasoning benchmarks, outperforming existing strong baselines. Further analysis indicates that Caco's code-anchored verification and instruction diversity contribute to superior generalization to unseen tasks. Our work establishes a paradigm for building self-sustaining, trustworthy reasoning systems without human intervention.
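The validation step described in the abstract (code execution followed by rule-based filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the actual rules, so the function names (`run_candidate`, `passes_rules`, `filter_candidates`), the minimum-length rule, the exact-duplicate check, and the timeout value are all assumptions made for illustration.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_candidate(code: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run one candidate program in a fresh interpreter.

    Returns the stripped stdout, or None if the program times out,
    exits with a non-zero status, or prints nothing.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    out = proc.stdout.strip()
    return out or None


def passes_rules(code: str, min_lines: int = 2) -> bool:
    """Hypothetical structural rule: require at least `min_lines` non-blank lines."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    return len(lines) >= min_lines


def filter_candidates(candidates: List[str]) -> List[Tuple[str, str]]:
    """Keep (code, answer) pairs that are unique, pass the rules, and execute cleanly."""
    kept: List[Tuple[str, str]] = []
    seen = set()
    for code in candidates:
        key = code.strip()
        # Assumed rule: drop exact duplicates and structurally trivial programs.
        if key in seen or not passes_rules(code):
            continue
        seen.add(key)
        answer = run_candidate(code)
        if answer is not None:
            kept.append((code, answer))
    return kept
```

In a pipeline of this shape, each surviving `(code, answer)` pair would then be handed to a model that writes the matching natural language problem and solution; that back-translation step is outside the scope of this sketch.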
