Multi-property Steering of Large Language Models with Dynamic Activation Composition
June 25, 2024
Authors: Daniel Scalena, Gabriele Sarti, Malvina Nissim
cs.AI
Abstract
Activation steering methods were shown to be effective in conditioning
language model generation by additively intervening over models' intermediate
representations. However, the evaluation of these techniques has so far been
limited to single conditioning properties and synthetic settings. In this work,
we conduct a comprehensive evaluation of various activation steering
strategies, highlighting the property-dependent nature of optimal parameters to
ensure a robust effect throughout generation. To address this issue, we propose
Dynamic Activation Composition, an information-theoretic approach to modulate
the steering intensity of one or more properties throughout generation. Our
experiments on multi-property steering show that our method successfully
maintains high conditioning while minimizing the impact of conditioning on
generation fluency.
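To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of additive steering with a per-step intensity. The abstract does not specify how Dynamic Activation Composition computes its intensities, so everything here is an assumption for illustration: the names `compose` and `dynamic_alpha`, the choice of KL divergence as the information-theoretic signal, and the cap `max_alpha` are hypothetical and should not be read as the authors' implementation.

```python
import numpy as np

def compose(hidden, vectors, alphas):
    """Additive intervention: shift a hidden state along one steering
    vector per property, each scaled by its own intensity."""
    out = hidden.copy()
    for vec, alpha in zip(vectors, alphas):
        out = out + alpha * vec
    return out

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def dynamic_alpha(p_plain, p_steered, max_alpha=2.0):
    """Hypothetical rule (an assumption, not taken from the abstract):
    steer harder when the steered next-token distribution diverges more
    from the unsteered one, and cap the intensity at max_alpha."""
    return min(max(kl_divergence(p_steered, p_plain), 0.0), max_alpha)

# Toy setup: a 4-dimensional hidden state and a 3-token vocabulary.
hidden = np.array([1.0, -0.5, 0.25, 2.0])
steer_vec = np.array([0.5, 0.5, -1.0, 0.0])   # one property's direction
unembed = np.array([[1.0, 0.0, 0.0, 0.5],     # maps hidden state -> logits
                    [0.0, 1.0, 0.0, -0.5],
                    [0.0, 0.0, 1.0, 0.0]])

p_plain = softmax(unembed @ hidden)
p_full = softmax(unembed @ compose(hidden, [steer_vec], [2.0]))

alpha = dynamic_alpha(p_plain, p_full)        # intensity for this step
steered_hidden = compose(hidden, [steer_vec], [alpha])
```

With several properties, the same pattern would compute one intensity per steering vector and pass both lists to `compose`. In a real model the intervention would be applied to a chosen layer's activations at every generation step rather than to a standalone vector.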