Transformers Can Do Arithmetic with the Right Embeddings

Abstract

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit within a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that by training on only 20-digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100-digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks, including sorting and multiplication.
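
To make the idea of encoding a digit's position relative to the start of its number concrete, here is a minimal sketch of how such position indices might be computed for a character-level token sequence. The function name `digit_position_ids`, the 1-based indexing, and the choice to assign 0 to non-digit tokens are assumptions made for illustration, not details taken from the abstract. In a model, each index would presumably select a learned vector that is added to the corresponding token embedding.

```python
DIGITS = set("0123456789")


def digit_position_ids(tokens):
    """Return, for each token, its 1-based index within the current number.

    Assumptions of this sketch (not stated in the abstract):
    - tokens are single characters;
    - non-digit tokens receive index 0 and end the current number.
    """
    ids = []
    run = 0  # how many digits of the current number have been seen so far
    for tok in tokens:
        if tok in DIGITS:
            run += 1
            ids.append(run)
        else:
            run = 0
            ids.append(0)
    return ids


if __name__ == "__main__":
    tokens = list("123+4567=")
    print(digit_position_ids(tokens))  # [1, 2, 3, 0, 1, 2, 3, 4, 0]
```

Because the counter restarts at every number, the first digit of each operand receives the same index regardless of how far into the sequence it appears, and it is this per-number positional signal that the abstract describes.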
