Prism-Δ: Differential Subspace Steering for Prompt Highlighting in Large Language Models

Abstract

Prompt highlighting steers a large language model to prioritize user-specified text spans during generation. A key challenge is extracting steering directions that capture the difference between relevant and irrelevant contexts, rather than the structural patterns the two share. We propose PRISM-Δ (Projection-based Relevance-Informed Steering Method), which decomposes the difference between positive and negative cross-covariance matrices to maximize discriminative energy while eliminating shared directions. Each attention head receives a continuous softplus importance weight, so that weak but useful heads contribute at reduced strength. The framework extends naturally to Value representations, capturing content-channel signal that Key-only methods leave unused. Across four benchmarks and five models, PRISM-Δ matches or exceeds the best existing method on 19 of 20 configurations, with relative gains of up to 10.6%, while halving the fluency cost of steering. PRISM-Δ also scales to long-context retrieval, with relative gains of up to 4.8% over the best existing method. PRISM-Δ is compatible with FlashAttention and adds negligible memory overhead.
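The abstract gives no formulas, so the following is only a minimal NumPy sketch of the two ideas it names: decomposing the difference of a positive and a negative cross-covariance matrix, and weighting heads with a softplus. Everything beyond those two ideas is an assumption made for illustration: the use of an SVD with the top-k right singular vectors as the steering basis, the additive form of `steer`, and the function names themselves are hypothetical and not taken from the paper.

```python
import numpy as np

def cross_covariance(x, y):
    """Cross-covariance of paired rows; x and y both have shape (n, d)."""
    xc = x - x.mean(axis=0, keepdims=True)
    yc = y - y.mean(axis=0, keepdims=True)
    return xc.T @ yc / (x.shape[0] - 1)

def differential_subspace(c_pos, c_neg, k):
    """Assumed decomposition: SVD of (c_pos - c_neg), keep top-k right singular vectors."""
    delta = c_pos - c_neg              # shared structure cancels in the difference
    _, s, vt = np.linalg.svd(delta)
    basis = vt[:k].T                   # shape (d, k), orthonormal columns
    return basis, s

def softplus(z):
    """Numerically stable log(1 + exp(z)); always positive, near zero for very negative z."""
    return np.logaddexp(0.0, z)

def steer(keys, basis, weight):
    """Hypothetical steering rule: add `weight` times the in-subspace component of each key."""
    return keys + weight * (keys @ basis @ basis.T)

# Toy check: both matrices share `shared`; only c_pos carries the extra rank-1 term.
rng = np.random.default_rng(0)
d = 8
shared = rng.normal(size=(d, d))
u = rng.normal(size=d)
v = rng.normal(size=d)
v /= np.linalg.norm(v)
c_pos = shared + 5.0 * np.outer(u, v)
c_neg = shared
basis, s = differential_subspace(c_pos, c_neg, k=1)

# A per-head weight from a hypothetical raw score; weak heads get a small but nonzero weight.
head_weight = softplus(-1.0)
steered = steer(rng.normal(size=(4, d)), basis, head_weight)
```

In this toy setup the recovered basis vector aligns with `v` up to sign, because `shared` cancels in the difference. Decomposing `c_pos` alone would instead mix `shared` into the leading directions, which is the failure mode the abstract describes.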
