Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
July 22, 2024
Authors: Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent
cs.AI
Abstract
Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge here is to develop steerable language models that trade off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditioned Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP can learn steerable models that effectively trade off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through an extensive set of experiments and ablations, we show that the CLP framework learns steerable models that outperform and Pareto-dominate the current state-of-the-art approaches for multi-objective finetuning.
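To make the steerability idea concrete, here is a minimal sketch (not the authors' implementation) of how a single policy can be conditioned on a reward weighting during multi-task training: a trade-off weight is sampled per step, the prompt is augmented with that weight, and the completion is scored with the matching linear scalarization of the objective rewards. The reward scorers (creativity_reward, safety_reward), the control-prefix conditioning, and the uniform weight sampling are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of weight-conditioned multi-objective finetuning.
# All function names and reward definitions below are illustrative placeholders.
import random


def creativity_reward(text: str) -> float:
    # Placeholder scorer: rewards lexical variety (stand-in for a learned reward model).
    words = text.split()
    return len(set(words)) / max(len(words), 1)


def safety_reward(text: str) -> float:
    # Placeholder scorer: penalizes a toy list of "unsafe" words.
    unsafe = {"dangerous", "harmful"}
    words = text.lower().split()
    return 1.0 - sum(w in unsafe for w in words) / max(len(words), 1)


def scalarized_reward(text: str, w: float) -> float:
    # Linear scalarization: r_w = w * r_creativity + (1 - w) * r_safety.
    return w * creativity_reward(text) + (1.0 - w) * safety_reward(text)


def conditioned_prompt(prompt: str, w: float) -> str:
    # Condition the policy on the chosen trade-off, e.g. via a control prefix.
    return f"<weights creativity={w:.2f} safety={1.0 - w:.2f}> {prompt}"


def training_step(policy_sample, prompt: str) -> float:
    # One multi-task step: sample a trade-off weight, condition the policy on it,
    # and score the completion with the matching scalarized reward.
    # `policy_sample` is any callable mapping a prompt string to a completion string.
    w = random.random()
    completion = policy_sample(conditioned_prompt(prompt, w))
    return scalarized_reward(completion, w)  # reward that would drive the RL update


if __name__ == "__main__":
    toy_policy = lambda p: "a varied and safe completion"
    print(training_step(toy_policy, "Write a short story."))
```

At inference time, the same conditioning prefix is set to the desired weighting, so one model can be steered across trade-offs without retraining.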