
MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies

May 26, 2023
Authors: Shiyue Zhang, Shijie Wu, Ozan Irsoy, Steven Lu, Mohit Bansal, Mark Dredze, David Rosenberg
cs.AI

Abstract

Autoregressive language models are trained by minimizing the cross-entropy of the model distribution Q relative to the data distribution P -- that is, minimizing the forward cross-entropy, which is equivalent to maximum likelihood estimation (MLE). We have observed that models trained in this way may "over-generalize", in the sense that they produce non-human-like text. Moreover, we believe that reverse cross-entropy, i.e., the cross-entropy of P relative to Q, is a better reflection of how a human would evaluate text generated by a model. Hence, we propose learning with MixCE, an objective that mixes the forward and reverse cross-entropies. We evaluate models trained with this objective on synthetic data settings (where P is known) and real data, and show that the resulting models yield better generated text without complex decoding strategies. Our code and models are publicly available at https://github.com/bloomberg/mixce-acl2023
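For reference, the quantities described in the abstract can be written out explicitly. The following is a minimal sketch using the standard definitions of the two cross-entropies; the mixing weight \eta and the loss symbol are illustrative notation, not quoted from the paper:

    % Forward cross-entropy (minimizing it over the model parameters is MLE):
    H(P, Q) = -\mathbb{E}_{x \sim P}\,[\log Q(x)]

    % Reverse cross-entropy (the direction the abstract argues better reflects human evaluation):
    H(Q, P) = -\mathbb{E}_{x \sim Q}\,[\log P(x)]

    % A mixed objective of the kind the abstract describes, with mixing weight \eta \in [0, 1]:
    \mathcal{L}_{\mathrm{mix}}(\eta) = \eta \, H(P, Q) + (1 - \eta) \, H(Q, P)

Note that the reverse term requires \log P(x) for samples x drawn from the model, which is only available in the synthetic setting where P is known; how to train with this mixture on real data is part of the paper's contribution and is not reproduced in this sketch.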