Transformers Can Do Arithmetic with the Right Embeddings
May 27, 2024
Authors: Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein
cs.AI
Abstract
The poor performance of transformers on arithmetic tasks seems to stem in
large part from their inability to keep track of the exact position of each
digit inside a large span of digits. We mend this problem by adding an
embedding to each digit that encodes its position relative to the start of the
number. In addition to the boost these embeddings provide on their own, we show
that this fix enables architectural modifications such as input injection and
recurrent layers to improve performance even further.
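
As a concrete illustration, here is a minimal PyTorch sketch of such a digit-position embedding. It is an assumption-laden reconstruction, not the authors' released code: the class name `DigitPositionEmbedding`, the boolean `is_digit` mask interface, the reserved index 0 for non-digit tokens, and the random training-time offset (one plausible way to train embedding slots beyond the digit lengths seen in training) are all illustrative choices.

```python
import torch
import torch.nn as nn

class DigitPositionEmbedding(nn.Module):
    """Learned embedding indexed by each digit's offset from the start of
    the number it belongs to (1 for the first digit, 2 for the second, ...).
    Non-digit tokens share the reserved index 0."""

    def __init__(self, max_digits: int = 128, d_model: int = 512):
        super().__init__()
        self.max_digits = max_digits
        self.emb = nn.Embedding(max_digits + 1, d_model)

    def forward(self, is_digit: torch.Tensor) -> torch.Tensor:
        # is_digit: (batch, seq) bool mask marking which tokens are digits.
        B, T = is_digit.shape
        idx = torch.arange(T, device=is_digit.device).expand(B, T)
        # Index of the most recent non-digit token at or before each position
        # (-1 if none), computed with a cumulative max over masked indices.
        last_nondigit = torch.cummax(
            torch.where(~is_digit, idx, torch.full_like(idx, -1)), dim=-1
        ).values
        # 1-based offset of each digit within its contiguous run; 0 elsewhere.
        pos = (idx - last_nondigit) * is_digit
        if self.training and T < self.max_digits:
            # Illustrative trick: shift digit positions by a random offset so
            # embedding slots beyond the training length also receive gradient.
            offset = torch.randint(0, self.max_digits - T + 1, (B, 1), device=idx.device)
            pos = torch.where(is_digit, pos + offset, pos)
        # Clamp guards against digit runs longer than max_digits at test time.
        return self.emb(pos.clamp(max=self.max_digits))
```

In this sketch the returned tensor would simply be added to the token embeddings before the first transformer block, giving the model a direct handle on which digit of a number each token is.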
With positions resolved, we can study the logical extrapolation ability of
transformers. Can they solve arithmetic problems that are larger and more
complex than those in their training data? We find that, by training on only
20-digit numbers with a single GPU for one day, we can reach state-of-the-art
performance, achieving up to 99% accuracy on 100-digit addition problems.
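
To make the extrapolation protocol concrete, the following hedged sketch generates addition problems longer than anything seen in training and scores exact-match accuracy. Here `model_answer` is a hypothetical stand-in for greedy decoding with the trained model, and the `a+b=` prompt format is an assumption for illustration, not taken from the paper.

```python
import random

def addition_example(n_digits: int) -> tuple[str, str]:
    # Sample two uniformly random n-digit operands; return (prompt, target).
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    a, b = random.randint(lo, hi), random.randint(lo, hi)
    return f"{a}+{b}=", str(a + b)

def exact_match_accuracy(model_answer, n_digits: int, n_samples: int = 200) -> float:
    # model_answer: hypothetical callable mapping a prompt string to the
    # model's decoded answer string.
    hits = sum(model_answer(prompt) == target
               for prompt, target in
               (addition_example(n_digits) for _ in range(n_samples)))
    return hits / n_samples

# Train on operands of at most 20 digits, then probe far beyond that range:
# for n in (20, 40, 60, 80, 100):
#     print(n, exact_match_accuracy(model_answer, n))
```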
Finally, we show that these gains in numeracy also unlock improvements on other
multi-step reasoning tasks including sorting and multiplication.