Multi-property Steering of Large Language Models with Dynamic Activation Composition

Abstract


Activation steering methods have been shown to effectively condition language model generation by additively intervening on the model's intermediate representations. However, evaluations of these techniques have so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies and show that the parameters needed to ensure a robust effect throughout generation depend on the property being steered. To address this issue, we propose Dynamic Activation Composition, an information-theoretic approach that modulates the steering intensity of one or more properties throughout generation. Our experiments on multi-property steering show that our method maintains strong conditioning while minimizing its impact on generation fluency.
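The abstract describes steering as an additive intervention on hidden states whose intensity is modulated by an information-theoretic signal. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a hypothetical intensity rule: for each property, probe the next-token distribution with the steering vector applied at a cap `alpha_max`, then use the KL divergence from the unsteered distribution, clipped to `alpha_max`, as that property's intensity. The toy unembedding matrix `W` and all function names (`softmax`, `kl_divergence`, `add_vectors`, `unembed`, `dynamic_alphas`) are illustrative assumptions. In a real model, `h` would be an intermediate activation at one layer, and the rule would be re-evaluated at every generation step.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; softmax outputs are strictly positive, so q > 0 where p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def add_vectors(h: Sequence[float], vectors: Sequence[Sequence[float]],
                alphas: Sequence[float]) -> Vector:
    """Additive intervention: h + sum_i alpha_i * v_i."""
    out = list(h)
    for v, a in zip(vectors, alphas):
        for j in range(len(out)):
            out[j] += a * v[j]
    return out


def unembed(h: Sequence[float], W: Sequence[Sequence[float]]) -> Vector:
    """Toy stand-in for the rest of the network: logits[k] = W[k] . h (assumption)."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def dynamic_alphas(h: Sequence[float], vectors: Sequence[Sequence[float]],
                   W: Sequence[Sequence[float]], alpha_max: float) -> Vector:
    """Hypothetical rule: per property, probe at alpha_max, measure KL against the
    unsteered distribution, and use that KL (capped at alpha_max) as the intensity."""
    base = softmax(unembed(h, W))
    alphas = []
    for v in vectors:
        probe = softmax(unembed(add_vectors(h, [v], [alpha_max]), W))
        alphas.append(min(kl_divergence(probe, base), alpha_max))
    return alphas


# Usage on a 2-d hidden state and a 3-token toy vocabulary.
h = [1.0, 0.0]
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
vectors = [[0.0, 0.0], [0.0, 3.0]]  # hypothetical steering vectors for two properties
alphas = dynamic_alphas(h, vectors, W, alpha_max=2.0)
h_steered = add_vectors(h, vectors, alphas)
```

With these inputs, the zero vector receives intensity 0.0 because steering with it leaves the next-token distribution unchanged. The second vector receives a positive intensity no larger than `alpha_max`. Computing each property's intensity independently and then summing the scaled vectors is one simple way to compose several properties. It ignores interactions between them.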


