Beyond Surface: Probing LLaMA Across Scales and Layers
December 7, 2023
Authors: Nuo Chen, Ning Wu, Shining Liang, Ming Gong, Linjun Shou, Dongmei Zhang, Jia Li
cs.AI
Abstract
This paper presents an in-depth analysis of Large Language Models (LLMs),
focusing on LLaMA, a prominent open-source foundational model in natural
language processing. Instead of assessing LLaMA through its generative output,
we design multiple-choice tasks to probe its intrinsic understanding in
higher-order tasks such as reasoning and computation. We examine the model
horizontally, comparing different sizes, and vertically, assessing different
layers. We unveil several key and unexpected findings based on the designed
probing tasks: (1) Horizontally, enlarging the model size rarely imparts
additional knowledge or computational prowess on its own. It can, however,
enhance reasoning ability, especially in mathematical problem solving, and
help reduce hallucinations, but only beyond certain size thresholds; (2)
Vertically, the lower layers of LLaMA lack substantial arithmetic and factual
knowledge but exhibit logical thinking, multilingual, and recognition
abilities, while the top layers house most of the computational power and
real-world knowledge.
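The abstract does not specify how the probes are implemented, but a standard way to realize layer-wise probing is to extract hidden states from every layer and fit an independent lightweight classifier per layer, then compare accuracies across depth (and across checkpoint sizes for the horizontal comparison). The sketch below is a minimal illustration of that idea, not the paper's code; the checkpoint name, the toy `texts`/`labels` data, and the last-token pooling choice are all assumptions.

```python
# Minimal layer-wise probing sketch (assumptions throughout: checkpoint name,
# toy data, and last-token pooling are illustrative, not the paper's setup).
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumption: any LLaMA-family checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:          # LLaMA tokenizers ship without a pad token
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"         # so the last-token index below is valid

model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Toy stand-in for the paper's multiple-choice probing tasks: statements
# labeled 1 (true) or 0 (false), covering arithmetic and factual knowledge.
texts = [
    "7 * 8 = 56",
    "7 * 8 = 54",
    "Paris is the capital of France.",
    "Paris is the capital of Italy.",
]
labels = [1, 0, 1, 0]

enc = tokenizer(texts, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**enc)

# out.hidden_states is a tuple of (num_layers + 1) tensors of shape
# [batch, seq, hidden]; index 0 is the embedding layer, the rest are
# transformer layers. Probe each one independently.
last_idx = enc["attention_mask"].sum(dim=1) - 1  # each sequence's last real token
for layer, hidden in enumerate(out.hidden_states):
    X = hidden[torch.arange(len(texts)), last_idx].float().numpy()
    probe = LogisticRegression(max_iter=1000).fit(X, labels)
    # Training accuracy on 4 toy examples is only illustrative; a real probe
    # would evaluate on held-out data per task and per layer.
    print(f"layer {layer:2d}: probe accuracy = {probe.score(X, labels):.2f}")
```

Running the same loop over checkpoints of different sizes (e.g., 7B vs. 13B) gives the horizontal comparison across scales, while the per-layer accuracy curve gives the vertical comparison across depth.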