Transformers Can Do Arithmetic with the Right Embeddings
May 27, 2024
Authors: Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein
cs.AI
Abstract
The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further.
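As a rough illustration of the embedding idea described above, the sketch below assigns every digit token a learned vector indexed by its offset from the start of the number it belongs to. This is a minimal sketch, not the authors' implementation: the class name, the `digit_token_ids` set, and the reset/index-0 conventions are hypothetical choices made here for clarity.

```python
# Minimal sketch (not the authors' code): give every digit token a learned
# embedding indexed by its offset from the start of the number it is in.
import torch
import torch.nn as nn

class DigitPositionEmbedding(nn.Module):
    def __init__(self, max_digits: int, d_model: int, digit_token_ids: set):
        super().__init__()
        # One learned vector per within-number offset; index 0 means "not a digit".
        self.table = nn.Embedding(max_digits + 1, d_model)
        self.digit_token_ids = digit_token_ids  # hypothetical: ids of the tokens '0'..'9'

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer token ids -> (batch, seq_len, d_model)
        offsets = torch.zeros_like(token_ids)
        for b in range(token_ids.size(0)):
            run = 0
            for t in range(token_ids.size(1)):
                if int(token_ids[b, t]) in self.digit_token_ids:
                    run = min(run + 1, self.table.num_embeddings - 1)
                    offsets[b, t] = run  # 1, 2, 3, ... counted from the first digit
                else:
                    run = 0              # any non-digit token ends the current number
        return self.table(offsets)       # to be added to the usual token embeddings
```

These vectors would simply be added to the token embeddings before the first transformer layer; the abstract does not spell out further details, so the counting and reset logic here are illustrative, not definitive.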
With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that by training on only 20-digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100-digit addition problems.

Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks, including sorting and multiplication.
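The abstract also credits two architectural modifications, input injection and recurrent layers, with further gains once digit positions are resolved. The sketch below shows one plausible way to combine the two, assuming a generic shared block; the additive form of injection, the loop count, and the encoder layer used in the example are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch of recurrent (looped) layers with input injection: one shared block is
# applied several times, and the original embeddings are re-added at each loop.
import torch
import torch.nn as nn

class RecurrentCoreWithInjection(nn.Module):
    def __init__(self, block: nn.Module, num_loops: int):
        super().__init__()
        self.block = block          # hypothetical: any (B, T, D) -> (B, T, D) module
        self.num_loops = num_loops  # effective depth = num_loops x depth of `block`

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        hidden = embeddings
        for _ in range(self.num_loops):
            # Input injection: feed the original embeddings back in at every
            # iteration so they are not washed out by repeated processing.
            hidden = self.block(hidden + embeddings)
        return hidden

# Example wiring with a standard PyTorch layer as the shared block:
core = RecurrentCoreWithInjection(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_loops=4,
)
out = core(torch.randn(2, 32, 256))  # (batch=2, seq_len=32, d_model=256)
```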