The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

January 2, 2026
Authors: Max Ruiz Luyten, Mihaela van der Schaar
cs.AI

Abstract

State-of-the-art large language model (LLM) pipelines rely on bootstrapped reasoning loops: they sample diverse chains of thought and reinforce the highest-scoring ones, optimizing mainly for correctness. We analyze how this design choice leaves the model's distribution over reasoning paths prone to collapse, slashing semantic entropy and undermining creative problem-solving. To dissect this failure mode, we introduce Distributional Creative Reasoning (DCR), a unified variational objective that casts training as a gradient flow over probability measures on solution traces. The losses of STaR, GRPO, and DPO, as well as entropy bonuses and other methods, all arise as special cases of this objective. The framework delivers three core results: (i) a diversity decay theorem describing how correctness-based objectives produce distinct modes of diversity decay in STaR, GRPO, and DPO; (ii) design conditions that guarantee convergence to a stable, diverse policy, preventing collapse; and (iii) simple, actionable recipes for achieving this in practice. DCR thus offers the first principled recipe for LLMs that remain both correct and creative.
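The abstract does not reproduce the DCR objective itself, so the following is only an illustrative sketch of the kind of functional it describes: an entropy- and KL-regularized expected reward over the policy's distribution on solution traces, with all notation ($r$, $\beta$, $\lambda$, $\pi_{\mathrm{ref}}$) our own assumption rather than the paper's:

$$\mathcal{J}(\pi) \;=\; \mathbb{E}_{\tau \sim \pi}\big[r(\tau)\big] \;-\; \beta\,\mathrm{KL}\big(\pi \,\|\, \pi_{\mathrm{ref}}\big) \;+\; \lambda\,\mathcal{H}(\pi),$$

where $r(\tau)$ scores the correctness of a trace $\tau$, and training is read as the gradient flow of $\mathcal{J}$ in the space of probability measures over traces. Under this reading, purely correctness-driven objectives correspond to $\lambda = 0$, so nothing opposes the concentration of probability mass on a few high-reward traces, which is the distribution collapse the abstract describes; an explicit entropy bonus ($\lambda > 0$) is one special case that counteracts it.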