Transformers learn through gradual rank increase

June 12, 2023
Authors: Enric Boix-Adsera, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind
cs.AI

Abstract

We identify incremental learning dynamics in transformers, where the difference between trained and initial weights progressively increases in rank. We rigorously prove this occurs under the simplifying assumptions of diagonal weight matrices and small initialization. Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.
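The rank claim in the abstract can be made concrete with a small measurement sketch. The code below is not the authors' code: it is a minimal illustration, assuming NumPy, of how one might track the numerical rank of the difference between a weight matrix and its initialization. The helper names (`numerical_rank`, `rank_of_update`), the tolerance `rel_tol`, and the synthetic "checkpoints" are all hypothetical and chosen only for illustration.

```python
import numpy as np


def numerical_rank(matrix, rel_tol=1e-6):
    """Count singular values larger than rel_tol times the largest one."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    if singular_values.size == 0 or singular_values[0] == 0.0:
        return 0
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


def rank_of_update(w_init, w_trained, rel_tol=1e-6):
    """Numerical rank of the weight change (trained minus initial)."""
    return numerical_rank(w_trained - w_init, rel_tol)


# Synthetic stand-in for training checkpoints (an assumption, not real data):
# start from a small random initialization and add one rank-one component
# along a new coordinate direction at each step.
rng = np.random.default_rng(0)
d = 8
w_init = 1e-3 * rng.standard_normal((d, d))

basis = np.eye(d)
checkpoints = []
w = w_init.copy()
for k in range(3):
    e_k = basis[:, [k]]          # column vector, shape (d, 1)
    w = w + e_k @ e_k.T          # adds a rank-one term
    checkpoints.append(w.copy())

ranks = [rank_of_update(w_init, w_t) for w_t in checkpoints]
print(ranks)  # [1, 2, 3]
```

With real models, `w_init` and each checkpoint would be the same weight matrix saved at initialization and at later training steps. The relative tolerance is a judgment call, since trained weights rarely have exactly zero singular values.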