Better Embeddings with Coupled Adam
February 12, 2025
Authors: Felix Stollenwerk, Tobias Stollenwerk
cs.AI
Abstract
Despite their remarkable capabilities, LLMs learn word representations that
exhibit the undesirable yet poorly understood feature of anisotropy. In this
paper, we argue that the second moment in Adam is a cause of anisotropic
embeddings, and suggest a modified optimizer called Coupled Adam to mitigate
the problem. Our experiments demonstrate that Coupled Adam significantly
improves the quality of embeddings, while also leading to better upstream and
downstream performance on large enough datasets.
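The abstract does not spell out the modification itself, so the following is a minimal Python sketch of the idea as stated here: standard Adam keeps a per-parameter second-moment estimate, so rarely updated embedding rows see very different adaptive scaling, and the "coupling" is assumed to share one second-moment estimate across all embedding rows. The function name `adam_step`, the flag `couple_rows`, and the choice to average over the vocabulary axis are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              couple_rows=False):
    """One Adam step for an embedding matrix `param` of shape (vocab, dim).

    With couple_rows=True, the second moment is averaged over the vocabulary
    axis so every embedding row receives the same adaptive scaling (the
    assumed "coupling"); otherwise this is plain Adam.
    """
    m[:] = b1 * m + (1 - b1) * grad           # first-moment EMA (unchanged)
    v[:] = b2 * v + (1 - b2) * grad ** 2      # per-parameter second-moment EMA
    v_eff = v.mean(axis=0, keepdims=True) if couple_rows else v
    m_hat = m / (1 - b1 ** t)                 # bias corrections
    v_hat = v_eff / (1 - b2 ** t)
    param -= lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy usage: one coupled step on a random embedding matrix and gradient.
rng = np.random.default_rng(0)
E = rng.normal(size=(100, 16))
g = rng.normal(size=E.shape)
m, v = np.zeros_like(E), np.zeros_like(E)
adam_step(E, g, m, v, t=1, couple_rows=True)
```

The only change relative to plain Adam in this sketch is the `v_eff` line; how exactly the second moments are coupled (e.g., weighting by token frequency) is defined in the paper and not reproduced here.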