Vector-ICL: In-context Learning with Continuous Vector Representations
October 8, 2024
Authors: Yufan Zhuang, Chandan Singh, Liyuan Liu, Jingbo Shang, Jianfeng Gao
cs.AI
Abstract
Large language models (LLMs) have shown remarkable in-context learning (ICL)
capabilities on textual data. We explore whether these capabilities can be
extended to continuous vectors from diverse domains, obtained from black-box
pretrained encoders. By aligning input data with an LLM's embedding space
through lightweight projectors, we observe that LLMs can effectively process
and learn from these projected vectors, which we term Vector-ICL. In
particular, we find that pretraining projectors with general language modeling
objectives enables Vector-ICL, while task-specific finetuning further enhances
performance. In our experiments across various tasks and modalities, including
text reconstruction, numerical function regression, text classification,
summarization, molecule captioning, time-series classification, graph
classification, and fMRI decoding, Vector-ICL often surpasses both few-shot ICL
and domain-specific models or tuning. We further conduct analyses and case
studies, indicating the potential of LLMs to process vector representations
beyond traditional token-based paradigms.
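To make the idea in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of the Vector-ICL setup as described: a lightweight linear projector maps a frozen black-box encoder's vectors into the LLM's token-embedding space, and each projected vector is spliced into the prompt's embedding sequence in place of a placeholder token. This is not the authors' implementation; names such as `Projector`, `build_vector_icl_inputs`, and the `<vec>` placeholder are illustrative assumptions.

```python
# Hypothetical sketch of Vector-ICL (not the paper's official code).
# Assumes a HuggingFace-style causal LLM and tokenizer, with a special
# placeholder token "<vec>" already added to the tokenizer vocabulary.

import torch
import torch.nn as nn


class Projector(nn.Module):
    """Lightweight projector: black-box encoder embedding -> LLM embedding space."""

    def __init__(self, encoder_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(encoder_dim, llm_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


def build_vector_icl_inputs(llm, tokenizer, projector, prompt: str,
                            encoder_vecs: torch.Tensor,
                            placeholder: str = "<vec>") -> torch.Tensor:
    """Replace each placeholder token's embedding with a projected encoder vector.

    encoder_vecs: (num_placeholders, encoder_dim) continuous vectors from a
    frozen, black-box encoder -- one per in-context example or query.
    Returns input embeddings to pass as llm(inputs_embeds=...).
    """
    ids = tokenizer(prompt, return_tensors="pt").input_ids            # (1, seq)
    embeds = llm.get_input_embeddings()(ids).clone()                  # (1, seq, llm_dim)
    placeholder_id = tokenizer.convert_tokens_to_ids(placeholder)
    positions = (ids[0] == placeholder_id).nonzero(as_tuple=True)[0]
    projected = projector(encoder_vecs)                               # (num_placeholders, llm_dim)
    embeds[0, positions] = projected.to(embeds.dtype)
    return embeds
```

In this reading of the abstract, the LLM and encoder stay frozen; only the projector's parameters are trained, first with a general language-modeling (next-token prediction) objective on text paired with its encoder vectors, and optionally finetuned further on task-specific data.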