ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers

Abstract

Recent work suggests that In-Context Learning (ICL) operates in two modes: task retrieval (recalling patterns learned during pre-training) and task learning (inference-time "learning" from demonstrations). However, disentangling these two modes remains a challenging goal. We introduce ICL CIPHERS, a class of task reformulations based on substitution ciphers borrowed from classical cryptography. In this approach, a subset of the tokens in the in-context inputs is substituted with other (irrelevant) tokens, rendering English sentences less comprehensible to the human eye. By design, however, the substitution follows a latent, fixed pattern, which makes it reversible. This bijective (reversible) cipher ensures that the task remains well defined in an abstract sense despite the transformation. An open question is whether LLMs can solve ICL CIPHERS with a BIJECTIVE mapping, which requires deciphering the latent cipher. We show that LLMs are better at solving ICL CIPHERS with BIJECTIVE mappings than with the NON-BIJECTIVE (irreversible) baseline, providing a novel approach to quantifying "learning" in ICL. Although this gap is small, it is consistent across four datasets and six models. Finally, we examine LLMs' internal representations and identify evidence of their ability to decode the ciphered inputs.
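The contrast between a reversible and an irreversible substitution can be illustrated with a toy sketch. This is not the authors' implementation: it assumes whitespace-separated words stand in for tokens, and the names build_cipher, apply_cipher, and invert, the ratio parameter, and the way the non-bijective baseline is constructed (targets drawn independently, so collisions are possible) are all hypothetical choices made for illustration.

```python
import random


def build_cipher(vocab, ratio, bijective, seed=0):
    """Map a random subset of vocab tokens to substitute tokens (toy sketch)."""
    rng = random.Random(seed)
    vocab = sorted(vocab)
    k = int(len(vocab) * ratio)          # how many tokens get ciphered
    chosen = rng.sample(vocab, k)
    if bijective:
        # Permute the chosen tokens among themselves: one-to-one, hence invertible.
        targets = chosen[:]
        rng.shuffle(targets)
    else:
        # Assumption: draw each target independently from the whole vocabulary.
        # Two sources may share a target, so the mapping need not be invertible.
        targets = [rng.choice(vocab) for _ in chosen]
    return dict(zip(chosen, targets))


def apply_cipher(tokens, cipher):
    """Substitute ciphered tokens; leave all other tokens unchanged."""
    return [cipher.get(t, t) for t in tokens]


def invert(cipher):
    """Return the inverse mapping, or raise if two sources share a target."""
    inverse = {v: k for k, v in cipher.items()}
    if len(inverse) != len(cipher):
        raise ValueError("cipher is not one-to-one; cannot invert")
    return inverse


if __name__ == "__main__":
    vocab = ["the", "movie", "was", "great", "bad", "plot", "acting", "boring"]
    cipher = build_cipher(vocab, ratio=0.5, bijective=True, seed=1)
    sentence = "the movie was great".split()
    encoded = apply_cipher(sentence, cipher)
    decoded = apply_cipher(encoded, invert(cipher))
    print(encoded)
    print(decoded == sentence)
```

In the bijective case the substituted tokens are a permutation of the chosen subset, so decoding with the inverse mapping always recovers the original sentence. In the non-bijective case no such guarantee holds, which is what makes it a natural baseline for the comparison described in the abstract.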

