

How Programming Concepts and Neurons Are Shared in Code Language Models

June 1, 2025
Authors: Amir Hossein Kargaran, Yihong Liu, François Yvon, Hinrich Schütze
cs.AI

Abstract

Several studies have explored the mechanisms of large language models (LLMs) in coding tasks, but most have focused on programming languages (PLs) in a monolingual setting. In this paper, we investigate the relationship between multiple PLs and English in the concept space of LLMs. We perform a few-shot translation task on 21 PL pairs using two Llama-based models. By decoding the embeddings of intermediate layers during this task, we observe that the concept space is closer to English (including PL keywords) and assigns high probabilities to English tokens in the second half of the intermediate layers. We analyze neuron activations for 11 PLs and English, finding that while language-specific neurons are primarily concentrated in the bottom layers, those exclusive to each PL tend to appear in the top layers. For PLs that are highly aligned with multiple other PLs, identifying language-specific neurons is not feasible. These PLs also tend to have a larger keyword set than other PLs and are closer to the model's concept space regardless of the input/output PL in the translation task. Our findings provide insights into how LLMs internally represent PLs, revealing structural patterns in the model's concept space. Code is available at https://github.com/cisnlp/code-specific-neurons.
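The layer-wise observation described above can be reproduced with a standard logit-lens probe: project each intermediate hidden state through the model's final normalization layer and unembedding matrix, then inspect which tokens receive high probability. The sketch below is a minimal illustration assuming a Hugging Face Llama checkpoint and a toy translation prompt; the model name and prompt are placeholders, not the authors' exact setup (see the linked repository for that).

```python
# Minimal logit-lens sketch: decode intermediate-layer hidden states of a
# Llama-style model and inspect which tokens receive high probability.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder Llama-based checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

# A toy few-shot translation prompt (Python -> Java), for illustration only.
prompt = "Python: print('hi')\nJava:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: (embedding layer, layer 1, ..., layer N).
for layer_idx, hidden in enumerate(out.hidden_states):
    last_pos = hidden[:, -1, :]              # hidden state at the final position
    normed = model.model.norm(last_pos)      # apply the final RMSNorm
    logits = model.lm_head(normed)           # project into vocabulary space
    probs = torch.softmax(logits.float(), dim=-1)
    top_prob, top_id = probs.max(dim=-1)
    print(f"layer {layer_idx:2d}: top token = {tokenizer.decode(top_id)!r} "
          f"(p = {top_prob.item():.3f})")
```

Running this on translation prompts and aggregating over layers is one way to see whether English and PL-keyword tokens dominate the decoded distribution in the second half of the intermediate layers.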
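Language-specific neurons are typically identified from per-neuron activation statistics collected while the model processes text in each language. The sketch below shows one simplified heuristic: compare each MLP neuron's firing rate on one programming language against its maximum firing rate on all other languages, using forward hooks on Llama MLP activations. The function names, the firing condition (activation > 0), and the margin are illustrative assumptions, not the paper's exact selection criterion.

```python
# Illustrative sketch of language-specific neuron detection: record how often
# each MLP neuron fires (activation > 0) per programming language, then keep
# neurons whose firing is concentrated in a single language.
import torch

def activation_rates(model, tokenizer, snippets_by_lang, layer_idx):
    """Fraction of tokens on which each neuron in one MLP layer fires, per language."""
    rates = {}
    mlp = model.model.layers[layer_idx].mlp
    for lang, snippets in snippets_by_lang.items():
        counts, total = None, 0
        for code in snippets:
            ids = tokenizer(code, return_tensors="pt")
            acts = []
            # Capture the output of the MLP activation function for this layer.
            handle = mlp.act_fn.register_forward_hook(
                lambda _mod, _inp, out: acts.append(out.detach())
            )
            with torch.no_grad():
                model(**ids)
            handle.remove()
            fired = (acts[0][0] > 0).float().sum(dim=0)   # per-neuron firing count
            counts = fired if counts is None else counts + fired
            total += acts[0].shape[1]                     # number of tokens seen
        rates[lang] = counts / total
    return rates

def language_specific_neurons(rates, lang, margin=0.2):
    """Neurons that fire noticeably more often for `lang` than for any other language."""
    others = torch.stack([r for l, r in rates.items() if l != lang]).max(dim=0).values
    return torch.nonzero(rates[lang] - others > margin).flatten()
```

For PLs that are highly aligned with several other PLs, no neuron clears such a margin against all other languages, which is one way the infeasibility of isolating language-specific neurons can manifest in practice.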