LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers

Abstract

Large language models (LLMs) excel at natural language understanding and generation but remain vulnerable to factual errors, which limits their reliability in knowledge-intensive tasks. Decoding-time strategies offer an efficient remedy that requires no training, but existing methods typically treat token-level and layer-level signals in isolation and overlook the joint dynamics between them. In this work, we introduce a token-aware, layer-localized contrastive decoding method that aligns specific token types with their most influential transformer layers to improve factual generation. Through empirical attention analysis, we identify two key patterns: punctuation tokens receive dominant attention in early layers, while conceptual tokens govern semantic reasoning in intermediate layers. By selectively suppressing attention to these token types at their respective depths, we induce controlled factual degradation and derive contrastive signals that guide the final factual decoding. Our method requires no additional training or model modification, and experiments show that it consistently improves factuality across multiple LLMs and various benchmarks.
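The two mechanisms named in the abstract, suppressing attention to selected token positions and contrasting the original prediction against the degraded one, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the rescale-and-renormalize suppression rule, the `alpha` weight, and the `plausibility` cutoff are hypothetical and follow a generic contrastive-decoding formulation, not necessarily the exact scoring rule used by LayerCake.

```python
import math


def suppress_attention(weights, positions, factor=0.0):
    """Scale attention weights at the given positions and renormalize.

    Assumption: suppression is modeled as rescaling one attention row
    and renormalizing it to sum to 1. The paper may do this differently.
    """
    scaled = [w * factor if i in positions else w for i, w in enumerate(weights)]
    total = sum(scaled)
    if total == 0.0:
        # Nothing left to attend to; return the row unchanged.
        return list(weights)
    return [w / total for w in scaled]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_scores(base_logits, degraded_logits, alpha=1.0, plausibility=0.1):
    """Generic contrastive score: base + alpha * (base - degraded).

    Tokens whose base probability falls below `plausibility` times the
    top base probability are excluded (score of -inf). Both
    hyperparameters are hypothetical.
    """
    base = log_softmax(base_logits)
    degraded = log_softmax(degraded_logits)
    cutoff = max(base) + math.log(plausibility)
    return [
        b + alpha * (b - d) if b >= cutoff else float("-inf")
        for b, d in zip(base, degraded)
    ]


def pick_token(base_logits, degraded_logits, alpha=1.0, plausibility=0.1):
    """Greedy choice over the contrastive scores."""
    scores = contrastive_scores(base_logits, degraded_logits, alpha, plausibility)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: the degraded pass (attention suppressed) favors token 0
# more strongly than the original pass does, so the contrast shifts the
# choice toward token 1.
base = [2.0, 1.9, -3.0]
degraded = [2.5, 0.5, -3.0]
chosen = pick_token(base, degraded)
```

In this toy setup, `alpha=0.0` reduces the procedure to ordinary greedy decoding over the base logits, which makes the contribution of the contrastive term easy to isolate. In a real model, the degraded logits would come from a second forward pass in which attention to punctuation tokens is suppressed in early layers and attention to conceptual tokens in intermediate layers, as the abstract describes.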
