MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies

Abstract

Autoregressive language models are trained by minimizing the cross-entropy of the model distribution Q relative to the data distribution P, that is, by minimizing the forward cross-entropy, which is equivalent to maximum likelihood estimation (MLE). We have observed that models trained in this way may "over-generalize", in the sense that they produce non-human-like text. Moreover, we believe that the reverse cross-entropy, i.e., the cross-entropy of P relative to Q, better reflects how a human would evaluate text generated by a model. Hence, we propose learning with MixCE, an objective that mixes the forward and reverse cross-entropies. We evaluate models trained with this objective in synthetic data settings (where P is known) and on real data, and show that the resulting models generate better text without complex decoding strategies. Our code and models are publicly available at https://github.com/bloomberg/mixce-acl2023
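To make the two directions concrete, the following is a minimal sketch on a toy discrete distribution, in the spirit of the synthetic setting where P is known. It is not taken from the released code: the function names, the example distributions, and the mixing weight `eta` are assumptions introduced here for illustration, and the abstract does not specify how the two terms are weighted.

```python
import math


def cross_entropy(weights, scored):
    """Return -sum_x weights(x) * log scored(x) for two discrete distributions.

    Assumes scored(x) > 0 wherever weights(x) > 0; otherwise math.log raises.
    """
    return -sum(w * math.log(s) for w, s in zip(weights, scored) if w > 0)


def mixce(p, q, eta):
    """Hypothetical mixture: eta * forward CE + (1 - eta) * reverse CE."""
    forward = cross_entropy(p, q)  # outcomes weighted by the data P, scored by the model Q
    reverse = cross_entropy(q, p)  # outcomes weighted by the model Q, scored by the data P
    return eta * forward + (1.0 - eta) * reverse


# Toy data distribution (an assumption): almost all mass on two of four outcomes.
P = [0.49, 0.49, 0.01, 0.01]
# A toy "over-generalized" model (an assumption) that spreads its mass evenly.
Q_BROAD = [0.25, 0.25, 0.25, 0.25]

forward = cross_entropy(P, Q_BROAD)
reverse = cross_entropy(Q_BROAD, P)
mixed = mixce(P, Q_BROAD, eta=0.5)

print(f"forward CE: {forward:.4f}")
print(f"reverse CE: {reverse:.4f}")
print(f"mixed (eta=0.5): {mixed:.4f}")
```

On this toy example the uniform model is penalized more heavily by the reverse term than by the forward term, because it places a quarter of its mass on each outcome that P considers rare. The sketch computes the reverse term exactly only because P is given; with real data P is unknown, and this sketch does not attempt to handle that case.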
