C^2DLM: Causal Concept-Guided Diffusion Large Language Models

Abstract

Autoregressive (AR) language models and Diffusion Language Models (DLMs) constitute the two principal paradigms of large language models. However, both paradigms suffer from insufficient reasoning capabilities. Human reasoning inherently relies on causal knowledge and causal thinking, both of which are reflected in natural language. In the AR paradigm, language is modeled as next-token prediction in a strictly left-to-right, token-by-token order, whereas natural language itself exhibits more flexible causal structures. In the DLM paradigm, the attention mechanism is fully connected and therefore disregards causal order entirely. To fill this gap, we propose the Causal Concept-Guided Diffusion Language Model (C^2DLM). Starting from the DLM's fully connected attention, C^2DLM first obtains a concept-level causal graph from a teacher model and then explicitly guides attention to learn the causal relationships between concepts. By focusing on causal relationships and avoiding interference from difficult subgoals that involve causal inversion, C^2DLM achieves a 12% improvement with roughly a 3.2x training speedup on the COT-OrderPerturb task, and an average gain of 1.31% across six downstream reasoning tasks. More details are available in the repository: https://github.com/Kairong-Han/C-2-DLM
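The contrast the abstract draws between the three attention patterns can be made concrete with a toy example. The sketch below is an illustration only and is not taken from the paper: the function names, the hard 0/1 mask, and the "off-graph attention mass" penalty are all assumptions made for the example, and the abstract does not say whether C^2DLM guides attention through a mask, a loss term, or some other mechanism. The toy sequence states an effect before its cause, which is the situation in which left-to-right order and causal order disagree.

```python
# Toy comparison of attention patterns (illustrative assumptions throughout;
# this is NOT the paper's implementation).

def ar_mask(n):
    """Left-to-right mask: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def full_mask(n):
    """Fully connected mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def concept_guided_mask(token_concept, causal_edges):
    """Hypothetical concept-level mask.

    Token i may attend to token j if both belong to the same concept, or if
    concept(j) is a direct cause of concept(i) in the causal graph.
    """
    allowed = set(causal_edges)
    n = len(token_concept)
    return [
        [
            1
            if token_concept[i] == token_concept[j]
            or (token_concept[j], token_concept[i]) in allowed
            else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


def off_graph_attention_mass(attn, mask):
    """Average attention weight per query that falls on disallowed pairs.

    A quantity of this kind could serve as a soft guidance penalty
    (an assumption for this sketch, not a claim about C^2DLM).
    """
    n = len(attn)
    total = sum(attn[i][j] * (1 - mask[i][j]) for i in range(n) for j in range(n))
    return total / n


# Toy sequence of four tokens: the effect is stated first, the cause second.
token_concept = ["effect", "effect", "cause", "cause"]
causal_edges = [("cause", "effect")]  # (cause concept, effect concept)

n = len(token_concept)
ar = ar_mask(n)
full = full_mask(n)
guided = concept_guided_mask(token_concept, causal_edges)

# Uniform attention, as a stand-in for an untrained fully connected model.
uniform_attn = [[1.0 / n] * n for _ in range(n)]
penalty = off_graph_attention_mass(uniform_attn, guided)

print("AR row for the first effect token:    ", ar[0])
print("Guided row for the first effect token:", guided[0])
print("Guided row for the first cause token: ", guided[2])
print("Off-graph attention mass (uniform):   ", penalty)
```

In this toy setting the left-to-right mask prevents the first effect token from seeing the cause tokens that follow it, the fully connected mask allows every pair regardless of direction, and the hypothetical guided mask lets effect tokens attend to cause tokens while blocking the reverse direction.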
