Reverse-Engineered Reasoning for Open-Ended Generation
September 7, 2025
Authors: Haozhe Wang, Haoran Que, Qixin Xu, Minghao Liu, Wangchunshu Zhou, Jiazhan Feng, Wanjun Zhong, Wei Ye, Tong Yang, Wenhao Huang, Ge Zhang, Fangzhen Lin
cs.AI
Abstract
While the "deep reasoning" paradigm has spurred significant advances in
verifiable domains like mathematics, its application to open-ended, creative
generation remains a critical challenge. The two dominant methods for
instilling reasoning -- reinforcement learning (RL) and instruction
distillation -- falter in this area; RL struggles with the absence of clear
reward signals and high-quality reward models, while distillation is
prohibitively expensive and capped by the teacher model's capabilities. To
overcome these limitations, we introduce REverse-Engineered Reasoning (REER), a
new paradigm that fundamentally shifts the approach. Instead of building a
reasoning process "forwards" through trial-and-error or imitation, REER works
"backwards" from known-good solutions to computationally discover the latent,
step-by-step deep reasoning process that could have produced them. Using this
scalable, gradient-free approach, we curate and open-source DeepWriting-20K, a
large-scale dataset of 20,000 deep reasoning trajectories for open-ended tasks.
Our model, DeepWriter-8B, trained on this data, not only surpasses strong
open-source baselines but also achieves performance competitive with, and at
times superior to, leading proprietary models like GPT-4o and Claude 3.5.