Prism-Δ: Differential Subspace Steering for Prompt Highlighting in Large Language Models

Abstract

Prompt highlighting steers a large language model to prioritize user-specified text spans during generation. A key challenge is extracting steering directions that capture the difference between relevant and irrelevant contexts, rather than structural patterns shared by both. We propose PRISM-Δ (Projection-based Relevance-Informed Steering Method), which decomposes the difference between positive and negative cross-covariance matrices to maximize discriminative energy while eliminating shared directions. Each attention head receives a continuous softplus importance weight, letting weak-but-useful heads contribute at reduced strength. The framework extends naturally to Value representations, capturing content-channel signal that Key-only methods leave unused. Across four benchmarks and five models, PRISM-Δ matches or exceeds the best existing method on 19 of 20 configurations, with relative gains of up to +10.6%, while halving the fluency cost of steering. PRISM-Δ also scales to long-context retrieval, outperforming the best existing method by up to +4.8% relative gain. PRISM-Δ is compatible with FlashAttention and adds negligible memory overhead.
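The abstract gives no formulas, so the following is only a toy sketch of one possible reading of the two ideas it names: taking the difference of positive and negative cross-covariance matrices before extracting a direction, and weighting heads with a softplus. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the use of query/key pairs, the uncentered covariance, the single direction obtained by power iteration, and the head scores are all hypothetical.

```python
import math

def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def cross_cov(xs, ys):
    """Uncentered cross-covariance (1/n) * sum_i x_i y_i^T (a simplification)."""
    n, dx, dy = len(xs), len(xs[0]), len(ys[0])
    return [[sum(xs[i][a] * ys[i][b] for i in range(n)) / n
             for b in range(dy)] for a in range(dx)]

def top_right_singular_vector(m, iters=200):
    """Power iteration on M^T M; returns the dominant right singular vector."""
    d = len(m[0])
    v = [1.0 / math.sqrt(d)] * d  # fixed start; assumed not orthogonal to the answer
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(d)) for row in m]             # u = M v
        w = [sum(m[i][j] * u[i] for i in range(len(m))) for j in range(d)]  # w = M^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def differential_direction(q_pos, k_pos, q_neg, k_neg):
    """Dominant direction of C_pos - C_neg (hypothetical reading of the abstract)."""
    c_pos = cross_cov(q_pos, k_pos)
    c_neg = cross_cov(q_neg, k_neg)
    delta = [[p - n for p, n in zip(rp, rn)] for rp, rn in zip(c_pos, c_neg)]
    return top_right_singular_vector(delta)

# Toy data: every key carries a shared component (1, 0); only the
# "relevant" (positive) keys also carry a discriminative component (0, 1).
queries = [[1.0, 0.0], [0.0, 1.0]]
keys_pos = [[1.0, 1.0], [1.0, 1.0]]
keys_neg = [[1.0, 0.0], [1.0, 0.0]]

direction = differential_direction(queries, keys_pos, queries, keys_neg)

# Hypothetical per-head relevance scores mapped to continuous weights:
# a weak head (negative score) keeps a small positive weight instead of
# being switched off by a hard threshold.
head_scores = [-1.0, 0.0, 2.0]
head_weights = [softplus(s) for s in head_scores]
```

In this toy setup the shared key component (1, 0) appears in both covariance matrices and cancels in the difference, so the recovered direction is (0, 1). Taking the dominant direction of the positive matrix alone would instead mix the shared and discriminative components equally.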
