On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

June 24, 2024
作者: Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi
cs.AI

Abstract

Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation directions, each of which facilitates a variety of applications. Our work offers a holistic view that unifies numerous existing studies and suggests potential research directions. We envision our work as a useful roadmap for future research on LLMs.
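To make the triangular framework concrete, the minimal sketch below (illustrative only, not code from the paper) enumerates the six transformation directions as the ordered pairs of the three adaptation tools named in the abstract:

```python
from itertools import permutations

# The three adaptation tools the paper treats as interchangeable.
TOOLS = ["reward model", "parameter update", "in-context prompt"]

# The six transformation directions are the ordered pairs of distinct
# tools: each edge of the triangle traversed in both directions.
directions = list(permutations(TOOLS, 2))

for source, target in directions:
    print(f"{source} -> {target}")
```

Running the loop prints six directed edges (e.g. `reward model -> parameter update`), matching the six transformation directions of the triangular framework.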
