How Programming Concepts and Neurons Are Shared in Code Language Models

Abstract

Several studies have explored the mechanisms of large language models (LLMs) in coding tasks, but most have focused on programming languages (PLs) in a monolingual setting. In this paper, we investigate the relationship between multiple PLs and English in the concept space of LLMs. We perform a few-shot translation task on 21 PL pairs using two Llama-based models. By decoding the embeddings of intermediate layers during this task, we observe that the concept space is closer to English (including PL keywords) and that the model assigns high probabilities to English tokens in the second half of the intermediate layers. We analyze neuron activations for 11 PLs and English, finding that while language-specific neurons are primarily concentrated in the bottom layers, those exclusive to each PL tend to appear in the top layers. For PLs that are highly aligned with multiple other PLs, identifying language-specific neurons is not feasible. These PLs also tend to have a larger keyword set than other PLs and are closer to the model's concept space regardless of the input/output PL in the translation task. Our findings provide insights into how LLMs internally represent PLs, revealing structural patterns in the model's concept space. Code is available at https://github.com/cisnlp/code-specific-neurons.
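
As an illustration of what "decoding the embeddings of intermediate layers" can look like in practice, the following is a minimal logit-lens-style sketch. It is not the authors' implementation (their code is in the repository linked in the abstract). The function name `decode_layers`, the toy dimensions, and the token-id sets are all hypothetical, and the sketch assumes the hidden states can be projected directly through the unembedding matrix, omitting the final normalization layer that a real Llama-style model applies first.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def decode_layers(hidden_states, unembed, token_sets):
    """Project per-layer hidden states onto the vocabulary.

    hidden_states: array of shape (n_layers, d_model), one vector per layer
                   for the position being predicted.
    unembed:       array of shape (d_model, vocab_size).
    token_sets:    dict mapping a label (e.g. "english") to a list of token ids.
                   These sets are assumptions made for illustration.

    Returns a dict mapping each label to an array of shape (n_layers,)
    holding the probability mass assigned to that token set at each layer.
    """
    probs = softmax(hidden_states @ unembed)  # (n_layers, vocab_size)
    return {name: probs[:, ids].sum(axis=1) for name, ids in token_sets.items()}


# Toy data standing in for real model activations (assumption: 4 layers,
# hidden size 8, vocabulary of 10 tokens).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
unembed = rng.normal(size=(8, 10))
token_sets = {"english": [0, 1, 2], "target_pl": [3, 4]}

mass = decode_layers(hidden, unembed, token_sets)
for layer in range(hidden.shape[0]):
    print(layer, float(mass["english"][layer]), float(mass["target_pl"][layer]))
```

With real activations, plotting these per-layer masses would show at which depth English tokens or target-PL tokens dominate the decoded distribution.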
