Length Generalization in Arithmetic Transformers

Abstract

We examine how transformers cope with two challenges: learning basic integer arithmetic, and generalizing to sequences longer than those seen during training. We find that relative position embeddings enable length generalization for simple tasks such as addition: models trained on 5-digit numbers can perform 15-digit sums. However, this method fails for multiplication, and we propose train set priming: adding a few (10 to 50) long sequences to the training set. We show that priming allows models trained on 5-digit × 3-digit multiplications to generalize to 35-digit × 3-digit examples. We also show that models can be primed for different generalization lengths, and that the priming sample size scales as the logarithm of the training set size. Finally, we discuss potential applications of priming beyond arithmetic.
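As a rough illustration of train set priming, the sketch below builds a multiplication training set from short examples and mixes in a handful of long ones. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the `a*b=c` text format, the use of operands with exactly the stated number of digits, and the fixed-size shuffled list are all hypothetical choices made for this example.

```python
import random


def sample_multiplication(rng, digits_a, digits_b):
    """Return one 'a*b=c' string (assumed format) with operands of exactly
    digits_a and digits_b decimal digits."""
    a = rng.randrange(10 ** (digits_a - 1), 10 ** digits_a)
    b = rng.randrange(10 ** (digits_b - 1), 10 ** digits_b)
    return f"{a}*{b}={a * b}"


def build_primed_train_set(n_train, n_priming, train_digits=5,
                           priming_digits=35, digits_b=3, seed=0):
    """Mix n_train short examples with n_priming long 'priming' examples.

    Hypothetical helper: the abstract only states that a few (10 to 50)
    long sequences are added to the training set.
    """
    rng = random.Random(seed)
    short_examples = [sample_multiplication(rng, train_digits, digits_b)
                      for _ in range(n_train)]
    priming_examples = [sample_multiplication(rng, priming_digits, digits_b)
                        for _ in range(n_priming)]
    examples = short_examples + priming_examples
    rng.shuffle(examples)  # interleave priming examples with the short ones
    return examples
```

For instance, `build_primed_train_set(5000, 20)` yields 5,000 examples of 5-digit × 3-digit products plus 20 examples of 35-digit × 3-digit products, so the long sequences make up well under 1% of the data.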

