In-context learning and Occam's razor

Abstract

The goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.
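To make the link between next-token prediction loss and prequential coding concrete, the following is a minimal toy sketch, not the authors' code. It assumes a Bernoulli source and uses a Laplace (add-one) estimator as a stand-in for the learner; in the setting described above, a sequence model such as a Transformer would play that role. The function names and the sample data are hypothetical. The prequential code length is the sum of next-symbol log losses, where each prediction uses only earlier symbols. Comparing it with the code length under the single best-fit model gives a gap that can be read, loosely, as the price paid for having to learn the model from the data.

```python
import math


def prequential_code_length(bits):
    """Bits needed to encode `bits` when each symbol is coded with a
    predictor fit only on the symbols that came before it.

    Assumption: the learner is a Laplace (add-one) estimator of P(bit = 1).
    """
    ones_seen = 0
    total_bits = 0.0
    for i, b in enumerate(bits):
        p_one = (ones_seen + 1) / (i + 2)      # prediction from the first i symbols
        p = p_one if b == 1 else 1.0 - p_one
        total_bits += -math.log2(p)            # next-symbol log loss, in bits
        ones_seen += b                         # "learn" from the symbol just coded
    return total_bits


def best_fit_code_length(bits):
    """Bits needed to encode `bits` under the single maximum-likelihood
    Bernoulli model fit on the whole sequence (training error only)."""
    n = len(bits)
    k = sum(bits)
    total_bits = 0.0
    for count in (k, n - k):
        if count > 0:
            total_bits += -count * math.log2(count / n)
    return total_bits


# Hypothetical toy sequence, chosen only for illustration.
data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1]

preq = prequential_code_length(data)
fit = best_fit_code_length(data)
overhead = preq - fit  # extra bits spent because the model was learned online

print(f"prequential code length: {preq:.3f} bits")
print(f"best-fit code length:    {fit:.3f} bits")
print(f"overhead:                {overhead:.3f} bits")
```

In this toy case the overhead is never negative, because the best-fit model is chosen with hindsight while the prequential coder has to predict each symbol before seeing it. A learner that picks up the pattern from fewer symbols pays a smaller overhead, which is the sense in which minimizing the summed next-token loss favors both low training error and a simple learned model.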
