Beyond Surface: Probing LLaMA Across Scales and Layers

December 7, 2023
作者: Nuo Chen, Ning Wu, Shining Liang, Ming Gong, Linjun Shou, Dongmei Zhang, Jia Li
cs.AI

Abstract

This paper presents an in-depth analysis of Large Language Models (LLMs), focusing on LLaMA, a prominent open-source foundation model in natural language processing. Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding of high-order tasks such as reasoning and computation. We examine the model horizontally, comparing different sizes, and vertically, assessing different layers. Based on the designed probing tasks, we unveil several key and uncommon findings: (1) Horizontally, enlarging the model size alone hardly imparts additional knowledge or computational prowess; instead, it can enhance reasoning abilities, especially in math problem solving, and help reduce hallucinations, but only beyond certain size thresholds. (2) Vertically, the lower layers of LLaMA lack substantial arithmetic and factual knowledge but showcase logical thinking, multilingual, and recognition abilities, while the top layers house most of the computational power and real-world knowledge.
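
Below is a minimal sketch of the layer-wise probing setup the abstract describes, assuming a Hugging Face LLaMA checkpoint and the transformers library. The checkpoint name, the mean-pooling step, and the idea of a downstream per-layer classifier are illustrative assumptions, not the authors' exact protocol.

```python
# Sketch of horizontal/vertical probing: extract per-layer hidden states from a
# LLaMA-style model and pool them into features for a multiple-choice probe.
# The checkpoint name, pooling, and probe are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # hypothetical choice; swap in any LLaMA size

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
    output_hidden_states=True,
)
model.eval()

@torch.no_grad()
def layer_features(text: str) -> torch.Tensor:
    """Mean-pool token states at every layer -> tensor of shape (num_layers + 1, hidden)."""
    inputs = tok(text, return_tensors="pt").to(model.device)
    out = model(**inputs)
    # out.hidden_states holds the embedding output plus one entry per transformer layer
    return torch.stack([h.mean(dim=1).squeeze(0) for h in out.hidden_states]).float().cpu()

# One toy multiple-choice item: features for each option can be fed to a per-layer
# classifier to see which layers separate the correct answer from the distractors.
question = "What is 127 + 385?"
options = ["492", "502", "512", "612"]
features = [layer_features(f"Question: {question} Answer: {o}") for o in options]
print(features[0].shape)  # e.g. torch.Size([33, 4096]) for a 7B model (32 layers + embeddings)
```

Repeating this across LLaMA checkpoints of different sizes gives the horizontal comparison, while the per-layer axis of the feature tensor supports the vertical, layer-by-layer analysis.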