

C^2DLM: Causal Concept-Guided Diffusion Large Language Models

November 27, 2025
Authors: Kairong Han, Nuanqiao Shan, Ziyu Zhao, Zijing Hu, Xinpeng Dong, Junjian Ye, Lujia Pan, Fei Wu, Kun Kuang
cs.AI

Abstract

Autoregressive (AR) language models and Diffusion Language Models (DLMs) constitute the two principal paradigms of large language models, yet both suffer from insufficient reasoning capabilities. Human reasoning inherently relies on causal knowledge and thought, which are reflected in natural language. In the AR paradigm, however, language is modeled as next-token prediction (a strictly left-to-right, token-by-token order), whereas natural language itself exhibits more flexible causal structures. In the DLM paradigm, the attention mechanism is fully connected and entirely disregards causal order. To fill this gap, we propose the Causal Concept-Guided Diffusion Language Model (C^2DLM). Starting from the DLM's fully connected attention, C^2DLM first obtains a concept-level causal graph from a teacher model, and then explicitly guides attention to learn causal relationships between concepts. By focusing on causal relationships and avoiding interference from difficult subgoals involving causal inversion, C^2DLM improves performance by 12% with an approximately 3.2x training speedup on the COT-OrderPerturb task, and achieves an average gain of 1.31% across six downstream reasoning tasks. More details are available in the repository: https://github.com/Kairong-Han/C-2-DLM.
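
The core mechanism described above, using a teacher-derived concept-level causal graph to steer the DLM's otherwise fully connected attention, can be illustrated with a minimal sketch. This is not the authors' implementation: the graph encoding, the token-to-concept mapping, and all function and parameter names (causal_concept_bias, guide_strength, etc.) are hypothetical placeholders for how such guidance could be injected as an additive attention bias.

```python
# Minimal sketch (hypothetical, not the authors' code): bias a diffusion LM's
# fully connected self-attention toward cause -> effect edges of a
# concept-level causal graph obtained from a teacher model.
import torch
import torch.nn.functional as F


def causal_concept_bias(num_tokens: int,
                        token_to_concept: torch.Tensor,        # [num_tokens], concept id per token
                        concept_edges: set[tuple[int, int]],   # {(cause_id, effect_id), ...}
                        guide_strength: float = 1.0) -> torch.Tensor:
    """Build an additive attention bias that favors token pairs whose
    concepts are linked by a cause -> effect edge in the teacher's graph."""
    bias = torch.zeros(num_tokens, num_tokens)
    for q_pos in range(num_tokens):
        for k_pos in range(num_tokens):
            c_q = int(token_to_concept[q_pos])
            c_k = int(token_to_concept[k_pos])
            # Encourage an "effect" token to attend to tokens of its "cause" concept.
            if (c_k, c_q) in concept_edges:
                bias[q_pos, k_pos] = guide_strength
    return bias


def guided_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                     bias: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention with the causal-concept bias added to the
    (otherwise fully connected) attention logits."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = F.softmax(scores + bias, dim=-1)
    return weights @ v
```

As a usage example under the same assumptions, one would map each token of a chain-of-thought sequence to a concept id, build the bias once per sequence from the teacher's causal graph, and add it to every attention layer's logits during fine-tuning, leaving the rest of the diffusion training objective unchanged.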