
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

July 22, 2024
作者: Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent
cs.AI

Abstract

Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge here is to develop steerable language models that trade off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditioned Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP can learn steerable models that effectively trade off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through an extensive set of experiments and ablations, we show that the CLP framework learns steerable models that outperform and Pareto-dominate the current state-of-the-art approaches for multi-objective finetuning.
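Below is a minimal sketch of the idea the abstract describes: sample a trade-off weight vector during training, condition the policy on it, and optimize the corresponding weighted combination of reward models, so that a single model can later be steered by choosing the weights at inference time. Everything here is an illustrative assumption rather than the paper's implementation: the reward stubs, the textual-prefix conditioning, and the `policy_generate` / `rl_update` placeholders stand in for whatever reward models, parameter-efficient conditioning mechanism, and RLHF update the actual framework uses.

```python
import random

# Stub per-objective rewards (e.g., creativity and safety). In practice these
# would be learned reward models scoring (prompt, response) pairs.
def reward_creativity(prompt: str, response: str) -> float:
    return random.random()

def reward_safety(prompt: str, response: str) -> float:
    return random.random()

REWARD_FNS = [reward_creativity, reward_safety]

def sample_weights(k: int) -> list[float]:
    """Sample a trade-off vector uniformly from the probability simplex."""
    xs = [random.expovariate(1.0) for _ in range(k)]
    total = sum(xs)
    return [x / total for x in xs]

def condition_on_weights(weights: list[float], prompt: str) -> str:
    """Expose the trade-off weights to the policy. A textual prefix is used
    here purely for illustration; the paper conditions the policy via
    parameter-efficient techniques rather than prompt text."""
    prefix = " ".join(f"<w{i}={w:.2f}>" for i, w in enumerate(weights))
    return f"{prefix} {prompt}"

def scalarized_reward(weights: list[float], prompt: str, response: str) -> float:
    """Weighted-sum scalarization of the conflicting objectives."""
    return sum(w * fn(prompt, response) for w, fn in zip(weights, REWARD_FNS))

def policy_generate(conditioned_prompt: str) -> str:
    """Hypothetical placeholder for sampling from the conditioned policy."""
    return "sampled response"

def rl_update(conditioned_prompt: str, response: str, reward: float) -> None:
    """Hypothetical placeholder for the RL finetuning update step."""
    pass

def training_step(prompt: str) -> None:
    """One finetuning step: sample a trade-off, condition the policy on it,
    and optimize the corresponding scalarized reward."""
    weights = sample_weights(len(REWARD_FNS))
    conditioned = condition_on_weights(weights, prompt)
    response = policy_generate(conditioned)
    reward = scalarized_reward(weights, prompt, response)
    rl_update(conditioned, response, reward)
```

At inference time, the same conditioning step would be reused with user-chosen weights, which is what makes a single finetuned model steerable across trade-offs without training or storing one model per objective mix.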