Transformers Can Do Arithmetic with the Right Embeddings
May 27, 2024
Authors: Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein
cs.AI
Abstract
The poor performance of transformers on arithmetic tasks seems to stem in
large part from their inability to keep track of the exact position of each
digit inside a large span of digits. We mend this problem by adding an
embedding to each digit that encodes its position relative to the start of the
number. In addition to the boost these embeddings provide on their own, we show
that this fix enables architectural modifications such as input injection and
recurrent layers to improve performance even further.
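
As a concrete illustration, here is a minimal PyTorch sketch of such a digit-position embedding. It is an assumption-laden reconstruction, not the authors' released code: the class name `DigitPositionEmbedding`, the boolean `is_digit` mask interface, the reserved index 0 for non-digit tokens, and the random training-time offset (one plausible way to train embedding slots beyond the digit lengths seen in training) are all illustrative choices.

```python
import torch
import torch.nn as nn

class DigitPositionEmbedding(nn.Module):
    """Learned embedding indexed by each digit's offset from the start of
    the number it belongs to (1 for the first digit, 2 for the second, ...).
    Non-digit tokens share the reserved index 0."""

    def __init__(self, max_digits: int = 128, d_model: int = 512):
        super().__init__()
        self.max_digits = max_digits
        self.emb = nn.Embedding(max_digits + 1, d_model)

    def forward(self, is_digit: torch.Tensor) -> torch.Tensor:
        # is_digit: (batch, seq) bool mask marking which tokens are digits.
        B, T = is_digit.shape
        idx = torch.arange(T, device=is_digit.device).expand(B, T)
        # Index of the most recent non-digit token at or before each position
        # (-1 if none), computed with a cumulative max over masked indices.
        last_nondigit = torch.cummax(
            torch.where(~is_digit, idx, torch.full_like(idx, -1)), dim=-1
        ).values
        # 1-based offset of each digit within its contiguous run; 0 elsewhere.
        pos = (idx - last_nondigit) * is_digit
        if self.training and T < self.max_digits:
            # Illustrative trick: shift digit positions by a random offset so
            # embedding slots beyond the training length also receive gradient.
            offset = torch.randint(0, self.max_digits - T + 1, (B, 1), device=idx.device)
            pos = torch.where(is_digit, pos + offset, pos)
        # Clamp guards against digit runs longer than max_digits at test time.
        return self.emb(pos.clamp(max=self.max_digits))
```

In this sketch the returned tensor would simply be added to the token embeddings before the first transformer block, giving the model a direct handle on which digit of a number each token is.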
With positions resolved, we can study the logical extrapolation ability of
transformers. Can they solve arithmetic problems that are larger and more
complex than those in their training data? We find that, by training on only
20-digit numbers with a single GPU for one day, we can reach state-of-the-art
performance, achieving up to 99% accuracy on 100-digit addition problems.
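
To make the extrapolation protocol concrete, the following hedged sketch generates addition problems longer than anything seen in training and scores exact-match accuracy. Here `model_answer` is a hypothetical stand-in for greedy decoding with the trained model, and the `a+b=` prompt format is an assumption for illustration, not taken from the paper.

```python
import random

def addition_example(n_digits: int) -> tuple[str, str]:
    # Sample two uniformly random n-digit operands; return (prompt, target).
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    a, b = random.randint(lo, hi), random.randint(lo, hi)
    return f"{a}+{b}=", str(a + b)

def exact_match_accuracy(model_answer, n_digits: int, n_samples: int = 200) -> float:
    # model_answer: hypothetical callable mapping a prompt string to the
    # model's decoded answer string.
    hits = sum(model_answer(prompt) == target
               for prompt, target in
               (addition_example(n_digits) for _ in range(n_samples)))
    return hits / n_samples

# Train on operands of at most 20 digits, then probe far beyond that range:
# for n in (20, 40, 60, 80, 100):
#     print(n, exact_match_accuracy(model_answer, n))
```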
Finally, we show that these gains in numeracy also unlock improvements on other
multi-step reasoning tasks including sorting and multiplication.