ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers
April 28, 2025
Authors: Zhouxiang Fang, Aayush Mishra, Muhan Gao, Anqi Liu, Daniel Khashabi
cs.AI
Abstract
Recent works have suggested that In-Context Learning (ICL) operates in dual
modes, i.e. task retrieval (recalling learned patterns from pre-training) and
task learning (inference-time "learning" from demonstrations). However,
disentangling these two modes remains a challenging goal. We introduce ICL
CIPHERS, a class of task reformulations based on substitution ciphers borrowed
from classic cryptography. In this approach, a subset of the tokens in the
in-context inputs is substituted with other (irrelevant) tokens, rendering
English sentences less comprehensible to the human eye. However, by design, there
is a latent, fixed pattern to this substitution, making it reversible. This
bijective (reversible) cipher ensures that the task remains well-defined
in some abstract sense, despite the transformation. A natural question is
whether LLMs can solve ICL CIPHERS with a BIJECTIVE mapping, which requires
deciphering the latent cipher. We show that LLMs are better at solving ICL
CIPHERS with BIJECTIVE mappings than the NON-BIJECTIVE (irreversible) baseline,
providing a novel approach to quantify "learning" in ICL. While this gap is
small, it is consistent across the board on four datasets and six models.
Finally, we examine LLMs' internal representations and identify evidence of
their ability to decode the ciphered inputs.
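To make the distinction concrete, here is a minimal illustrative sketch (not the authors' code; all function names are hypothetical) of a word-level substitution cipher: the BIJECTIVE variant draws a random permutation of the vocabulary, so the substitution is invertible, while the NON-BIJECTIVE baseline lets distinct words collide, destroying the information needed to decipher.

```python
import random

def make_bijective_cipher(vocab, seed=0):
    # Bijective (reversible): a random permutation of the vocabulary,
    # so every ciphered word maps back to exactly one original word.
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def make_non_bijective_cipher(vocab, seed=0):
    # Non-bijective (irreversible) baseline: each word is mapped to an
    # independently sampled word, so several words may collide.
    rng = random.Random(seed)
    return {w: rng.choice(vocab) for w in vocab}

def encipher(sentence, cipher):
    # Apply the substitution word by word; unknown words pass through.
    return " ".join(cipher.get(w, w) for w in sentence.split())

vocab = sorted(set("the movie was great the plot was dull".split()))
bij = make_bijective_cipher(vocab, seed=1)
inverse = {v: k for k, v in bij.items()}  # exists only because bij is a permutation

enc = encipher("the movie was great", bij)
dec = encipher(enc, inverse)
assert dec == "the movie was great"
```

Under this framing, improved performance on the bijective variant over the non-bijective one indicates the model is exploiting the latent invertible pattern, i.e. "learning" from the demonstrations rather than only retrieving a memorized task.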