SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models
June 4, 2025
Authors: Yuhao Wu, Yushi Bai, Zhiqiang Hu, Juanzi Li, Roy Ka-Wei Lee
cs.AI
Abstract
Long-form text generation remains a significant challenge for large language
models (LLMs), particularly in maintaining coherence, ensuring logical
consistency, and preserving text quality as sequence length increases. To
address these limitations, we propose SuperWriter-Agent, an agent-based
framework designed to enhance the quality and consistency of long-form text
generation. SuperWriter-Agent introduces explicit structured thinking through
planning and refinement stages into the generation pipeline, guiding the model
to follow a more deliberate and cognitively grounded process akin to that of a
professional writer. Based on this framework, we construct a supervised
fine-tuning dataset to train a 7B-parameter SuperWriter-LM. We further develop a
hierarchical Direct Preference Optimization (DPO) procedure that uses Monte
Carlo Tree Search (MCTS) to propagate final quality assessments and optimize
each generation step accordingly. Empirical results across diverse benchmarks
demonstrate that SuperWriter-LM achieves state-of-the-art performance,
surpassing even larger-scale baseline models in both automatic evaluation and
human evaluation. Furthermore, comprehensive ablation studies confirm the
effectiveness of hierarchical DPO and underscore the value of incorporating
structured thinking steps to improve the quality of long-form text generation.
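
The abstract describes the hierarchical DPO procedure only at a high level: an MCTS-style backup of the final quality assessment to the intermediate generation steps, which then supply step-level preference pairs. Below is a minimal, hypothetical Python sketch of that general idea; the names (StepNode, backpropagate, make_dpo_pairs) and the specific crediting rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (hypothetical): propagate a final quality score back through
# the intermediate steps of one generation trajectory, then turn sibling steps
# into (chosen, rejected) pairs for step-level DPO training.
from dataclasses import dataclass, field


@dataclass
class StepNode:
    """One planning / writing / refinement step in a generation trajectory."""
    text: str
    children: list["StepNode"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # running mean of propagated final-quality scores


def backpropagate(path: list[StepNode], final_score: float) -> None:
    """MCTS-style backup: credit every step on the path with the final score."""
    for node in path:
        node.visits += 1
        node.value += (final_score - node.value) / node.visits


def make_dpo_pairs(root: StepNode) -> list[tuple[str, str]]:
    """Collect (chosen, rejected) sibling pairs at every depth of the tree."""
    pairs: list[tuple[str, str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        kids = sorted(node.children, key=lambda c: c.value, reverse=True)
        if len(kids) >= 2 and kids[0].value > kids[-1].value:
            pairs.append((kids[0].text, kids[-1].text))
        stack.extend(node.children)
    return pairs


# Toy usage: two alternative plans for the same prompt; the plan whose rollout
# received the higher final score becomes the "chosen" side of the pair.
root = StepNode("prompt")
plan_a, plan_b = StepNode("plan A"), StepNode("plan B")
root.children = [plan_a, plan_b]
backpropagate([root, plan_a], final_score=0.9)
backpropagate([root, plan_b], final_score=0.4)
print(make_dpo_pairs(root))  # [('plan A', 'plan B')]
```

The sketch assumes each intermediate step simply inherits the mean of the final scores of trajectories passing through it; the paper's actual propagation and pairing criteria may differ.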