MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies

Abstract

Autoregressive language models are trained by minimizing the cross-entropy of the model distribution Q relative to the data distribution P, that is, by minimizing the forward cross-entropy, which is equivalent to maximum likelihood estimation (MLE). We have observed that models trained in this way may "over-generalize", in the sense that they produce non-human-like text. Moreover, we believe that reverse cross-entropy, i.e., the cross-entropy of P relative to Q, is a better reflection of how a human would evaluate text generated by a model. Hence, we propose learning with MixCE, an objective that mixes the forward and reverse cross-entropies. We evaluate models trained with this objective on synthetic data settings (where P is known) and on real data, and show that the resulting models yield better generated text without complex decoding strategies. Our code and models are publicly available at https://github.com/bloomberg/mixce-acl2023.
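To make the two quantities in the abstract concrete, the following is a minimal sketch on a toy categorical distribution where P is known, as in the synthetic setting. It is not the authors' implementation: the function names, the mixing weight `eta`, the convex-combination form of the mix, and the `eps` clamp for zero probabilities are all assumptions introduced here for illustration.

```python
import math


def cross_entropy(a, b, eps=1e-12):
    """Return -sum_x a(x) * log b(x) for two discrete distributions.

    `eps` is an illustrative clamp (an assumption) so that log(0) stays finite.
    """
    return -sum(ai * math.log(max(bi, eps)) for ai, bi in zip(a, b) if ai > 0)


def mixce(p, q, eta=0.5):
    """Hypothetical mix of forward and reverse cross-entropy.

    forward: expectation under the data distribution P of -log Q (the MLE term)
    reverse: expectation under the model distribution Q of -log P
    `eta` is an assumed mixing weight in [0, 1].
    """
    forward = cross_entropy(p, q)
    reverse = cross_entropy(q, p)
    return eta * forward + (1.0 - eta) * reverse


# Toy data distribution: the third outcome never occurs in the data.
p = [0.5, 0.5, 0.0]
# A model that "over-generalizes" by placing mass on the unseen outcome.
q_spread = [0.4, 0.4, 0.2]
# A model that stays on the support of P.
q_tight = [0.5, 0.5, 0.0]

for name, q in [("spread", q_spread), ("tight", q_tight)]:
    print(name, cross_entropy(p, q), cross_entropy(q, p), mixce(p, q))
```

In this toy case the forward term only mildly penalizes the model that spreads mass onto the outcome the data never produces, while the reverse term penalizes it heavily, which matches the motivation given in the abstract for mixing the two.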
