Length Generalization in Arithmetic Transformers

Abstract

We examine how transformers cope with two challenges: learning basic integer arithmetic, and generalizing to sequences longer than those seen during training. We find that relative position embeddings enable length generalization for simple tasks such as addition: models trained on 5-digit numbers can perform 15-digit sums. However, this method fails for multiplication, so we propose train set priming: adding a few (10 to 50) long sequences to the training set. We show that priming allows models trained on 5-digit × 3-digit multiplications to generalize to 35 × 3 examples. We also show that models can be primed for different generalization lengths, and that the priming sample size scales as the logarithm of the training set size. Finally, we discuss potential applications of priming beyond arithmetic.
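To make train set priming concrete, here is a minimal sketch of how a primed multiplication training set might be assembled: many short 5 × 3 examples plus a handful of long 35 × 3 examples mixed in. The function names `make_example` and `primed_train_set`, the `"a*b="` input encoding, and the choice to draw operands with exactly the stated number of digits are all hypothetical assumptions for illustration, not the paper's actual data pipeline.

```python
import random

def make_example(rng, a_digits, b_digits):
    """Hypothetical helper: one multiplication problem as an (input, target) string pair."""
    # Assumption: operands have exactly the requested number of digits.
    a = rng.randint(10 ** (a_digits - 1), 10 ** a_digits - 1)
    b = rng.randint(10 ** (b_digits - 1), 10 ** b_digits - 1)
    # Assumed encoding: "a*b=" as the model input, the decimal product as the target.
    return f"{a}*{b}=", str(a * b)

def primed_train_set(n_train, n_prime, train_digits=(5, 3), prime_digits=(35, 3), seed=0):
    """Hypothetical helper: short training examples plus a few long 'priming' examples."""
    rng = random.Random(seed)
    # The bulk of the data: short problems (5-digit x 3-digit by default).
    data = [make_example(rng, *train_digits) for _ in range(n_train)]
    # A small number of long problems (35-digit x 3-digit by default) used for priming.
    data += [make_example(rng, *prime_digits) for _ in range(n_prime)]
    # Shuffle so the priming examples are interleaved with the short ones.
    rng.shuffle(data)
    return data
```

For instance, `primed_train_set(100_000, 50)` would yield 100,000 short examples with 50 long ones mixed in. The sketch only covers data construction; the model, its relative position embeddings, and the training loop are outside its scope.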
