ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers

Abstract

Recent work has suggested that In-Context Learning (ICL) operates in two modes: task retrieval (recalling patterns learned during pre-training) and task learning (inference-time "learning" from demonstrations). However, disentangling these two modes remains a challenging goal. We introduce ICL CIPHERS, a class of task reformulations based on substitution ciphers borrowed from classical cryptography. In this approach, a subset of the tokens in the in-context inputs is substituted with other (irrelevant) tokens, rendering English sentences less comprehensible to the human eye. However, by design, this substitution follows a latent, fixed pattern, which makes it reversible. This bijective (reversible) cipher ensures that the task remains well defined in an abstract sense, despite the transformations. An open question is whether LLMs can solve ICL CIPHERS with a BIJECTIVE mapping, which requires deciphering the latent cipher. We show that LLMs are better at solving ICL CIPHERS with BIJECTIVE mappings than with the NON-BIJECTIVE (irreversible) baseline, providing a novel approach to quantifying "learning" in ICL. While this gap is small, it is consistent across four datasets and six models. Finally, we examine LLMs' internal representations and identify evidence of their ability to decode the ciphered inputs.
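The contrast between the BIJECTIVE cipher and the NON-BIJECTIVE baseline can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the whitespace-level tokens, the function names, the choice to permute the selected tokens among themselves, and the choice to model the irreversible baseline by drawing a fresh random replacement for every occurrence of a selected token.

```python
import random


def build_bijective_cipher(vocab, fraction, seed=0):
    """Select a fraction of the vocabulary and permute it.

    Assumption: the selected tokens are shuffled among themselves, so the
    mapping (identity on all other tokens) is a bijection on the vocabulary.
    """
    rng = random.Random(seed)
    k = int(len(vocab) * fraction)
    chosen = rng.sample(vocab, k)
    targets = chosen[:]
    rng.shuffle(targets)
    return dict(zip(chosen, targets))


def apply_bijective(tokens, cipher):
    """Replace each selected token with its fixed substitute."""
    return [cipher.get(tok, tok) for tok in tokens]


def apply_non_bijective(tokens, cipher, vocab, seed=0):
    """Hypothetical irreversible baseline: every occurrence of a selected
    token gets an independently sampled replacement, so no fixed mapping
    exists to recover."""
    rng = random.Random(seed)
    return [rng.choice(vocab) if tok in cipher else tok for tok in tokens]


def invert(cipher):
    """Inverse mapping; it exists only because the cipher is bijective."""
    return {target: source for source, target in cipher.items()}
```

Under these assumptions, applying `invert(cipher)` to a bijectively ciphered sequence recovers the original tokens exactly, whereas the output of `apply_non_bijective` carries no consistent pattern that a model could learn to undo from the demonstrations.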
