Transformers Can Do Arithmetic with the Right Embeddings

Abstract

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that by training on only 20-digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100-digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks, including sorting and multiplication.
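
The per-digit position embedding described in the abstract can be illustrated with a minimal sketch. It assumes character-level tokens and a small randomly initialized lookup table standing in for learned weights; the function names (`digit_positions`, `build_table`, `add_position_embeddings`) and the choice to give non-digit tokens a zero vector are hypothetical illustrations, not the authors' implementation.

```python
import random

DIGITS = "0123456789"


def digit_positions(text):
    """For each character, return its 1-based index within its run of digits.

    Non-digit characters get 0, which also resets the counter so that
    every number starts counting from 1 again.
    """
    positions = []
    run = 0
    for ch in text:
        run = run + 1 if ch in DIGITS else 0
        positions.append(run)
    return positions


def build_table(max_len, dim, seed=0):
    """Hypothetical embedding table: row 0 is all zeros (non-digits),
    rows 1..max_len are random stand-ins for learned position vectors."""
    rng = random.Random(seed)
    rows = [[0.0] * dim]
    for _ in range(max_len):
        rows.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return rows


def add_position_embeddings(token_vecs, positions, table):
    """Add the position vector for each token to its token embedding."""
    return [
        [t + p for t, p in zip(vec, table[pos])]
        for vec, pos in zip(token_vecs, positions)
    ]


if __name__ == "__main__":
    print(digit_positions("123+45="))  # [1, 2, 3, 0, 1, 2, 0]
```

Because the index restarts at every number, the first digit of each operand receives the same position vector regardless of where the operand sits in the sequence, which is the positional information the abstract identifies as missing.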
