Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

September 17, 2024
Authors: Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu
cs.AI

Abstract

Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into three main sections: 1) introduction and preliminaries: an introduction to reinforcement learning frameworks, preference tuning tasks, models, and datasets across various modalities: language, speech, and vision, as well as different policy approaches, 2) in-depth examination of each preference tuning approach: a detailed analysis of the methods used in preference tuning, and 3) applications, discussion, and future directions: an exploration of the applications of preference tuning in downstream tasks, including evaluation methods for different modalities, and an outlook on future research directions. Our objective is to present the latest methodologies in preference tuning and model alignment, enhancing the understanding of this field for researchers and practitioners. We hope to encourage further engagement and innovation in this area.
