
Z1: Efficient Test-time Scaling with Code

April 1, 2025
Authors: Zhaojian Yu, Yinghao Wu, Yilun Zhao, Arman Cohan, Xiao-Ping Zhang
cs.AI

Abstract

Large Language Models (LLMs) can achieve enhanced complex problem-solving through test-time compute scaling, yet this often entails longer contexts and substantial reasoning-token costs. In this paper, we propose an efficient test-time scaling method that trains LLMs on code-related reasoning trajectories, enabling them to reduce excess thinking tokens while maintaining performance. First, we create Z1-Code-Reasoning-107K, a curated dataset of simple and complex coding problems paired with short and long solution trajectories. Second, we present a novel Shifted Thinking Window that mitigates overthinking overhead by removing context-delimiting tags (e.g., <think>...</think>) and capping reasoning tokens. Trained on long and short trajectory data and equipped with the Shifted Thinking Window, our model, Z1-7B, adjusts its reasoning level to the complexity of the problem and exhibits efficient test-time scaling across different reasoning tasks, matching R1-Distill-Qwen-7B's performance with about 30% of its average thinking tokens. Notably, although fine-tuned only on code trajectories, Z1-7B generalizes to broader reasoning tasks (47.5% on GPQA Diamond). Our analysis of efficient reasoning elicitation also provides valuable insights for future research.
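
The Shifted Thinking Window described above can be approximated with an ordinary generate-then-continue loop: the model reasons in a single context without <think>...</think> delimiters, and if it exhausts a fixed thinking budget, a short hint is appended to push it to answer directly. The sketch below illustrates this idea with the Hugging Face transformers API; the checkpoint id, token budgets, and hint wording are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a shifted-thinking-window style token budget, assuming a
# Hugging Face causal LM. Model id, budgets, and hint text are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "efficientscaling/Z1-7B"   # hypothetical hub id; replace with the released checkpoint
THINKING_CAP = 4096                      # assumed reasoning-token budget
HINT = "\nTime is limited; I will give the final answer now.\n"  # illustrative hint, not the paper's exact wording

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

def generate_with_shifted_window(problem: str, answer_budget: int = 1024) -> str:
    # No <think>...</think> delimiters: reasoning and answer share one context.
    prompt = f"Question: {problem}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Phase 1: let the model reason, but cap the number of new (thinking) tokens.
    draft = model.generate(**inputs, max_new_tokens=THINKING_CAP, do_sample=False)
    text = tokenizer.decode(draft[0], skip_special_tokens=True)

    # If generation ended on its own (EOS before the cap), return it unchanged.
    if draft.shape[-1] - inputs["input_ids"].shape[-1] < THINKING_CAP:
        return text

    # Phase 2: the cap was hit, so append the hint and force a direct answer.
    cont = tokenizer(text + HINT, return_tensors="pt").to(model.device)
    final = model.generate(**cont, max_new_tokens=answer_budget, do_sample=False)
    return tokenizer.decode(final[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate_with_shifted_window("Write a Python function that reverses a linked list."))
```

Because simple problems tend to terminate well before the cap, this scheme only pays the truncation-and-hint cost on hard inputs, which is consistent with the abstract's claim that reasoning effort scales with problem complexity.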
