An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift

January 9, 2026
Authors: Constantinos Karouzos, Xingwei Tan, Nikolaos Aletras
cs.AI

Abstract

Preference tuning aligns pretrained language models to human judgments of quality, helpfulness, or safety by optimizing over explicit preference signals rather than likelihood alone. Prior work has shown that preference tuning degrades performance and reduces helpfulness when evaluated outside the training domain. However, the extent to which adaptation strategies mitigate this domain shift remains unexplored. We address this challenge by conducting a comprehensive and systematic study of alignment generalization under domain shift. We compare five popular alignment objectives and various source-to-target adaptation strategies, including target-domain supervised fine-tuning and pseudo-labeling, across summarization and question-answering helpfulness tasks. Our findings reveal systematic differences in generalization across alignment objectives under domain shift. We show that adaptation strategies based on pseudo-labeling can substantially reduce domain-shift degradation.
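Since the abstract does not spell out the pseudo-labeling procedure, the following is a minimal sketch of one common instantiation: score pairs of target-domain model outputs with a source-trained reward model, keep the reward-ranked pairs as synthetic preferences, and optimize a DPO-style preference objective (one widely used alignment objective; the abstract does not name the five it studies) on those pairs. The function names (`sample_fn`, `score_fn`) and the margin filter are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of pseudo-label-based adaptation for preference tuning.
# Assumption: "pseudo-labeling" = ranking target-domain response pairs with a
# source-trained reward model, then preference-tuning on the synthetic pairs.
import torch.nn.functional as F

def pseudo_label_pairs(prompts, sample_fn, score_fn, margin=0.5):
    """Build synthetic preference pairs on the unlabeled target domain.

    sample_fn(prompt) -> two candidate responses from the current policy.
    score_fn(prompt, response) -> scalar reward from a source-domain reward model.
    Pairs whose reward gap falls below `margin` (a hypothetical noise filter)
    are discarded rather than injected as noisy labels.
    """
    pairs = []
    for x in prompts:
        y_a, y_b = sample_fn(x)
        r_a, r_b = score_fn(x, y_a), score_fn(x, y_b)
        if abs(r_a - r_b) < margin:
            continue  # ambiguous pair: the reward model cannot separate them
        chosen, rejected = (y_a, y_b) if r_a > r_b else (y_b, y_a)
        pairs.append((x, chosen, rejected))
    return pairs

def dpo_loss(policy_logp, ref_logp, beta=0.1):
    """Standard DPO objective on a batch of pseudo-labeled pairs.

    policy_logp / ref_logp: dicts with 'chosen' and 'rejected' tensors holding
    summed token log-probabilities under the policy / frozen reference model.
    """
    # Implicit reward of each response = beta * log(pi_theta / pi_ref).
    chosen_logratio = policy_logp["chosen"] - ref_logp["chosen"]
    rejected_logratio = policy_logp["rejected"] - ref_logp["rejected"]
    logits = beta * (chosen_logratio - rejected_logratio)
    # Maximize the likelihood that the pseudo-chosen response is preferred.
    return -F.logsigmoid(logits).mean()
```

The margin filter reflects a common design choice in pseudo-labeling: trading dataset size for label quality, since low-confidence synthetic pairs tend to amplify rather than correct domain-shift degradation.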