LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers

Abstract

Large language models (LLMs) excel at natural language understanding and generation but remain prone to factual errors, which limits their reliability in knowledge-intensive tasks. Decoding-time strategies offer an efficient, training-free remedy, but existing methods typically treat token-level and layer-level signals in isolation and overlook the joint dynamics between them. In this work, we introduce a token-aware, layer-localized contrastive decoding method that aligns specific token types with their most influential transformer layers to improve factual generation. Through empirical attention analysis, we identify two key patterns: punctuation tokens receive dominant attention in early layers, while conceptual tokens govern semantic reasoning in intermediate layers. By selectively suppressing attention to these token types at their respective depths, we induce controlled factual degradation and derive contrastive signals that guide the final factual decoding. Our method requires no additional training or model modification, and experiments show that it consistently improves factuality across multiple LLMs and various benchmarks.
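
The contrastive step described above can be illustrated with a minimal sketch. It assumes a generic contrastive decoding rule (log-probability of the intact pass minus a weighted log-probability of the degraded pass, restricted to plausible tokens); the abstract does not state the exact scoring formula, so this is not necessarily the authors' formulation. The function names and the `alpha` and `plausibility` parameters are hypothetical, and the degraded logits are assumed to come from a separate forward pass with attention to the selected token types suppressed at the selected layers, which is not modeled here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(full_logits, degraded_logits, alpha=1.0, plausibility=0.1):
    """Score next-token candidates by contrasting two forward passes.

    full_logits: logits from the unmodified model.
    degraded_logits: logits from a pass with attention suppressed
        (assumed to be computed elsewhere).
    alpha, plausibility: hypothetical hyperparameters, not from the paper.
    """
    lp_full = log_softmax(full_logits)
    lp_degraded = log_softmax(degraded_logits)
    # Plausibility filter: keep only tokens whose probability under the
    # intact pass is at least `plausibility` times that of the top token.
    cutoff = max(lp_full) + math.log(plausibility)
    return [
        f - alpha * d if f >= cutoff else float("-inf")
        for f, d in zip(lp_full, lp_degraded)
    ]


def pick_token(full_logits, degraded_logits, **kwargs):
    """Greedy choice over the contrastive scores; returns a token index."""
    scores = contrastive_scores(full_logits, degraded_logits, **kwargs)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy three-token vocabulary: the intact pass slightly prefers token 0,
# but the degraded pass prefers token 0 even more strongly.
full = [2.0, 1.9, -3.0]
degraded = [2.5, 0.5, -3.0]
chosen = pick_token(full, degraded)
```

In this toy case the contrast favors the token whose probability drops most when attention is suppressed, while the plausibility filter keeps the very unlikely third token from being selected. Setting `alpha=0` reduces the rule to ordinary greedy decoding over the intact pass.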
