Emergent properties with repeated examples

Abstract

We study the performance of transformers as a function of the number of repetitions of training examples, using algorithmically generated datasets. On three mathematical problems (the greatest common divisor, modular multiplication, and matrix eigenvalues), we show that for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that two-set training (repeated use of a small random subset of examples, alongside normal sampling from the rest of the training set) leads to faster learning and better performance. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.
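Two-set training can be pictured as a batch sampler. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It uses the GCD task and assumes that each batch slot is filled from the small repeated subset with a fixed probability `repeat_prob`. It also approximates "the rest of the training set" by drawing fresh single-use examples, which is possible because the data are generated algorithmically. The names `make_gcd_example`, `two_set_batches`, `repeated_size`, `repeat_prob` and `max_value` are hypothetical.

```python
import math
import random
from typing import Iterator, List, Tuple

Example = Tuple[int, int, int]  # (a, b, gcd(a, b))


def make_gcd_example(rng: random.Random, max_value: int = 10**6) -> Example:
    """Draw one GCD problem: two random positive integers and their GCD."""
    a = rng.randint(1, max_value)
    b = rng.randint(1, max_value)
    return a, b, math.gcd(a, b)


def two_set_batches(
    repeated_size: int,
    repeat_prob: float,
    batch_size: int,
    num_batches: int,
    seed: int = 0,
) -> Iterator[List[Example]]:
    """Yield mini-batches that mix a small repeated set with single-use examples.

    Hypothetical parameters (assumptions, not taken from the source):
      repeated_size: size of the small random subset that is reused.
      repeat_prob:   probability that a batch slot is drawn from that subset.
    """
    rng = random.Random(seed)
    # The small subset is fixed once, then sampled with replacement throughout training.
    repeated_set = [make_gcd_example(rng) for _ in range(repeated_size)]
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            if rng.random() < repeat_prob:
                # Slot drawn from the repeated subset.
                batch.append(rng.choice(repeated_set))
            else:
                # Assumption: "the rest of the training set" is approximated
                # by a freshly generated, single-use example.
                batch.append(make_gcd_example(rng))
        yield batch
```

Under these assumptions, over `num_batches` training steps each repeated example is seen on average `repeat_prob * batch_size * num_batches / repeated_size` times (for example, 0.25 × 64 × 100 / 50 = 32 times), while the remaining slots hold examples that are almost never seen twice.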
