Beyond Surface: Probing LLaMA Across Scales and Layers

Abstract

This paper presents an in-depth analysis of Large Language Models (LLMs), focusing on LLaMA, a prominent open-source foundation model in natural language processing. Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding in higher-order tasks such as reasoning and computation. We examine the model horizontally, comparing different sizes, and vertically, assessing different layers. Based on the designed probing tasks, we report several key and uncommon findings: (1) Horizontally, enlarging the model does not, for the most part, automatically impart additional knowledge or computational ability. Instead, it can enhance reasoning abilities, especially in math problem solving, and it helps reduce hallucinations, but only beyond certain size thresholds. (2) Vertically, the lower layers of LLaMA lack substantial arithmetic and factual knowledge while still exhibiting logical thinking, multilingual, and recognition abilities, whereas the top layers house most of the computational power and real-world knowledge.
