Transformers Can Do Arithmetic with the Right Embeddings
May 27, 2024
Authors: Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein
cs.AI
Abstract
The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further.
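As a rough illustration of the embedding idea described above, the sketch below assigns every digit token a learned vector indexed by its offset from the start of the number it belongs to. This is a minimal sketch, not the authors' implementation: the class name, the `digit_token_ids` set, and the reset/index-0 conventions are hypothetical choices made here for clarity.

```python
# Minimal sketch (not the authors' code): give every digit token a learned
# embedding indexed by its offset from the start of the number it is in.
import torch
import torch.nn as nn

class DigitPositionEmbedding(nn.Module):
    def __init__(self, max_digits: int, d_model: int, digit_token_ids: set):
        super().__init__()
        # One learned vector per within-number offset; index 0 means "not a digit".
        self.table = nn.Embedding(max_digits + 1, d_model)
        self.digit_token_ids = digit_token_ids  # hypothetical: ids of the tokens '0'..'9'

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer token ids -> (batch, seq_len, d_model)
        offsets = torch.zeros_like(token_ids)
        for b in range(token_ids.size(0)):
            run = 0
            for t in range(token_ids.size(1)):
                if int(token_ids[b, t]) in self.digit_token_ids:
                    run = min(run + 1, self.table.num_embeddings - 1)
                    offsets[b, t] = run  # 1, 2, 3, ... counted from the first digit
                else:
                    run = 0              # any non-digit token ends the current number
        return self.table(offsets)       # to be added to the usual token embeddings
```

These vectors would simply be added to the token embeddings before the first transformer layer; the abstract does not spell out further details, so the counting and reset logic here are illustrative, not definitive.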
With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that by training on only 20-digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100-digit addition problems.

Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks, including sorting and multiplication.
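The abstract also credits two architectural modifications, input injection and recurrent layers, with further gains once digit positions are resolved. The sketch below shows one plausible way to combine the two, assuming a generic shared block; the additive form of injection, the loop count, and the encoder layer used in the example are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch of recurrent (looped) layers with input injection: one shared block is
# applied several times, and the original embeddings are re-added at each loop.
import torch
import torch.nn as nn

class RecurrentCoreWithInjection(nn.Module):
    def __init__(self, block: nn.Module, num_loops: int):
        super().__init__()
        self.block = block          # hypothetical: any (B, T, D) -> (B, T, D) module
        self.num_loops = num_loops  # effective depth = num_loops x depth of `block`

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        hidden = embeddings
        for _ in range(self.num_loops):
            # Input injection: feed the original embeddings back in at every
            # iteration so they are not washed out by repeated processing.
            hidden = self.block(hidden + embeddings)
        return hidden

# Example wiring with a standard PyTorch layer as the shared block:
core = RecurrentCoreWithInjection(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_loops=4,
)
out = core(torch.randn(2, 32, 256))  # (batch=2, seq_len=32, d_model=256)
```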