Emergent properties with repeated examples

Abstract

We study the performance of transformers as a function of the number of repetitions of training examples, using algorithmically generated datasets. On three mathematical problems (the greatest common divisor, modular multiplication, and matrix eigenvalues), we show that for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that two-set training, the repeated use of a small random subset of examples alongside normal sampling on the rest of the training set, provides faster learning and better performance. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.
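As an illustration of the two-set idea described in the abstract, the following is a minimal sketch of a sampler that mixes a small, heavily repeated subset with ordinary sampling from the remaining examples. It is not the authors' implementation: the function name `make_two_set_sampler` and the parameters `small_size` and `p_small` are hypothetical, and mixing the two sets by a fixed probability is an assumption made here for concreteness.

```python
import random


def make_two_set_sampler(examples, small_size, p_small, seed=0):
    """Build a sampler that mixes a small repeated subset with the rest.

    examples   : list of training examples (any objects)
    small_size : size of the small repeated subset (hypothetical knob)
    p_small    : probability of drawing from that subset (hypothetical knob)
    """
    if not 0 < small_size < len(examples):
        raise ValueError("small_size must be between 1 and len(examples) - 1")
    if not 0.0 <= p_small <= 1.0:
        raise ValueError("p_small must be a probability")

    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)

    # The small subset is chosen at random once and then reused throughout.
    small = shuffled[:small_size]
    rest = shuffled[small_size:]

    def sample():
        # With probability p_small, repeat an example from the small subset;
        # otherwise sample normally from the rest of the training set.
        if rng.random() < p_small:
            return rng.choice(small)
        return rng.choice(rest)

    return sample, small


# Usage: draw one batch of 64 examples from a toy dataset of integers.
sampler, small_subset = make_two_set_sampler(
    list(range(10_000)), small_size=50, p_small=0.25
)
batch = [sampler() for _ in range(64)]
```

Under these assumptions, each example in the small subset is seen far more often than any example in the rest of the data, which is the repetition effect the abstract refers to.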
