Transformers learn through gradual rank increase
June 12, 2023
Authors: Enric Boix-Adsera, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind
cs.AI
Abstract
We identify incremental learning dynamics in transformers, where the
difference between trained and initial weights progressively increases in rank.
We rigorously prove this occurs under the simplifying assumptions of diagonal
weight matrices and small initialization. Our experiments support the theory
and also show that the phenomenon can occur in practice without the simplifying
assumptions.
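
As a rough illustration of the quantity the abstract refers to, the sketch below tracks the rank of the difference between a weight matrix and its initial value across a few checkpoints. It is not the authors' code or experimental setup: the checkpoints are synthetic, each one built by adding a single rank-one update, and the names `matrix_rank`, `outer`, `w_init`, and `checkpoints` are hypothetical. Exact rational arithmetic is used so that the rank is unambiguous; with real floating-point weights one would instead count singular values above a tolerance.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank of a matrix via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical small initialization (assumption: any fixed matrix works here,
# since only the difference from it is measured).
d = 4
w_init = [[Fraction(i - j, 1000) for j in range(d)] for i in range(d)]

# Synthetic rank-one updates with linearly independent factors (assumption).
us = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
vs = [[1, 2, 3, 4], [0, 1, 1, 0], [1, 0, 0, 1]]

# Each synthetic "checkpoint" adds one more rank-one component.
checkpoints = []
w = w_init
for u, v in zip(us, vs):
    w = add(w, outer(u, v))
    checkpoints.append(w)

# Rank of (trained - initial) at each checkpoint.
ranks = [matrix_rank(sub(w_t, w_init)) for w_t in checkpoints]
print(ranks)  # [1, 2, 3]
```

The rank rises by one per checkpoint only because the updates were constructed that way; the sketch shows how the measurement is taken, not evidence for the paper's claim.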