Reverse-Engineered Reasoning for Open-Ended Generation
September 7, 2025
Authors: Haozhe Wang, Haoran Que, Qixin Xu, Minghao Liu, Wangchunshu Zhou, Jiazhan Feng, Wanjun Zhong, Wei Ye, Tong Yang, Wenhao Huang, Ge Zhang, Fangzhen Lin
cs.AI
Abstract
While the "deep reasoning" paradigm has spurred significant advances in
verifiable domains like mathematics, its application to open-ended, creative
generation remains a critical challenge. The two dominant methods for
instilling reasoning, reinforcement learning (RL) and instruction
distillation, falter in this area; RL struggles with the absence of clear
reward signals and high-quality reward models, while distillation is
prohibitively expensive and capped by the teacher model's capabilities. To
overcome these limitations, we introduce REverse-Engineered Reasoning (REER), a
new paradigm that fundamentally shifts the approach. Instead of building a
reasoning process "forwards" through trial-and-error or imitation, REER works
"backwards" from known-good solutions to computationally discover the latent,
step-by-step deep reasoning process that could have produced them. Using this
scalable, gradient-free approach, we curate and open-source DeepWriting-20K, a
large-scale dataset of 20,000 deep reasoning trajectories for open-ended tasks.
Our model, DeepWriter-8B, trained on this data, not only surpasses strong
open-source baselines but also achieves performance competitive with, and at
times superior to, leading proprietary models like GPT-4o and Claude 3.5.