Training Large Language Models to Reason in a Continuous Latent Space

Abstract

Large language models (LLMs) are restricted to reasoning in the "language space", where they typically express the reasoning process as a chain-of-thought (CoT) to solve a complex reasoning problem. However, we argue that language space may not always be optimal for reasoning. For example, most word tokens serve primarily to maintain textual coherence and are not essential for reasoning, while some critical tokens require complex planning and pose huge challenges to LLMs. To explore the potential of LLM reasoning in an unrestricted latent space instead of natural language, we introduce a new paradigm, Coconut (Chain of Continuous Thought). We use the last hidden state of the LLM as a representation of the reasoning state (termed "continuous thought"). Rather than decoding this state into a word token, we feed it back to the LLM as the subsequent input embedding, directly in the continuous space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. This novel latent reasoning paradigm leads to emergent advanced reasoning patterns: a continuous thought can encode multiple alternative next reasoning steps, allowing the model to perform a breadth-first search (BFS) to solve the problem, rather than prematurely committing to a single deterministic path as CoT does. Coconut outperforms CoT on certain logical reasoning tasks that require substantial backtracking during planning, while using fewer thinking tokens during inference. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research.
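
The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "model" below is a hypothetical stand-in (mean pooling followed by a single tanh layer with random weights), and every name in it (`EMBED`, `W`, `last_hidden_state`, `decode_token`, `reason`) is an assumption made for illustration. It only contrasts the two ways of producing the next input embedding: decoding to a token and re-embedding it, versus feeding the last hidden state back unchanged.

```python
import math
import random

D = 4      # hidden size (toy value, an assumption)
VOCAB = 6  # vocabulary size (toy value, an assumption)

rng = random.Random(0)
# Hypothetical toy parameters; a real LLM would be a trained transformer.
EMBED = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(VOCAB)]
W = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]


def last_hidden_state(input_embeds):
    """Toy stand-in for a forward pass: mean-pool the inputs, then tanh(W @ x)."""
    n = len(input_embeds)
    pooled = [sum(vec[i] for vec in input_embeds) / n for i in range(D)]
    return [math.tanh(sum(W[r][c] * pooled[c] for c in range(D))) for r in range(D)]


def decode_token(hidden):
    """Language-space step: score every vocabulary entry and pick the argmax."""
    logits = [sum(h * e for h, e in zip(hidden, row)) for row in EMBED]
    return max(range(VOCAB), key=lambda t: logits[t])


def reason(prompt_tokens, n_steps, latent):
    """Run n_steps reasoning steps and return the full list of input embeddings."""
    embeds = [list(EMBED[t]) for t in prompt_tokens]
    for _ in range(n_steps):
        hidden = last_hidden_state(embeds)
        if latent:
            # Continuous thought: the hidden state itself is the next input.
            embeds.append(hidden)
        else:
            # CoT-style: collapse to a single token, then re-embed that token.
            embeds.append(list(EMBED[decode_token(hidden)]))
    return embeds


latent_run = reason([0, 3], n_steps=3, latent=True)
token_run = reason([0, 3], n_steps=3, latent=False)
print(len(latent_run), len(token_run))  # 2 prompt positions + 3 reasoning positions each
```

In the token-based run, every reasoning position is forced to be one of the `VOCAB` embedding rows, so whatever else the hidden state encoded is discarded at each step. In the latent run, the full hidden vector is carried forward, which is the property the abstract credits for letting a continuous thought keep several candidate next steps open.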
