ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers

Abstract

Recent works have suggested that In-Context Learning (ICL) operates in dual modes, i.e., task retrieval (recalling patterns learned during pre-training) and task learning (inference-time "learning" from demonstrations). However, disentangling these two modes remains a challenging goal. We introduce ICL CIPHERS, a class of task reformulations based on substitution ciphers borrowed from classical cryptography. In this approach, a subset of tokens in the in-context inputs is substituted with other (irrelevant) tokens, rendering English sentences less comprehensible to the human eye. However, by design, there is a latent, fixed pattern to this substitution, making it reversible. This bijective (reversible) cipher ensures that the task remains well-defined in some abstract sense, despite the transformations. An intriguing question is whether LLMs can solve ICL CIPHERS with a BIJECTIVE mapping, which requires deciphering the latent cipher. We show that LLMs are better at solving ICL CIPHERS with BIJECTIVE mappings than the NON-BIJECTIVE (irreversible) baseline, providing a novel approach to quantify "learning" in ICL. While this gap is small, it is consistent across four datasets and six models. Finally, we examine LLMs' internal representations and identify evidence of their ability to decode the ciphered inputs.
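The contrast between a bijective (reversible) substitution and a non-bijective (irreversible) one can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the `ratio` and `seed` parameters, the word-level toy vocabulary, and the choice to model the non-bijective baseline as an independent random draw per occurrence are all assumptions made for illustration.

```python
import random


def make_bijective_cipher(vocab, ratio, seed=0):
    """Pick a random subset of the vocabulary and permute it onto itself.

    Hypothetical helper: because the chosen tokens are mapped onto a
    shuffled copy of the same set, the mapping is one-to-one and can be
    inverted.
    """
    rng = random.Random(seed)
    vocab = sorted(set(vocab))
    k = round(len(vocab) * ratio)      # number of tokens to substitute
    chosen = rng.sample(vocab, k)
    shuffled = chosen[:]
    rng.shuffle(shuffled)
    return dict(zip(chosen, shuffled))


def encode_bijective(tokens, cipher):
    """Apply the fixed substitution; tokens outside the cipher are unchanged."""
    return [cipher.get(t, t) for t in tokens]


def decode_bijective(tokens, cipher):
    """Invert the substitution, which is possible only because it is bijective."""
    inverse = {v: k for k, v in cipher.items()}
    return [inverse.get(t, t) for t in tokens]


def encode_non_bijective(tokens, cipher, vocab, seed=0):
    """Assumed baseline: each occurrence of a ciphered token is replaced by an
    independent random draw, so no fixed pattern exists to be recovered."""
    rng = random.Random(seed)
    pool = sorted(set(vocab))
    return [rng.choice(pool) if t in cipher else t for t in tokens]


if __name__ == "__main__":
    vocab = ["the", "movie", "was", "great", "bad", "plot"]
    cipher = make_bijective_cipher(vocab, ratio=0.5, seed=1)
    sentence = ["the", "movie", "was", "great"]
    encoded = encode_bijective(sentence, cipher)
    print(encoded)
    print(decode_bijective(encoded, cipher) == sentence)
```

In this sketch the bijective encoding always round-trips through `decode_bijective`, whereas the non-bijective output carries no consistent mapping that could be learned from demonstrations.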
