Beyond Surface: Probing LLaMA Across Scales and Layers

Abstract

This paper presents an in-depth analysis of Large Language Models (LLMs), focusing on LLaMA, a prominent open-source foundation model in natural language processing. Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding in higher-order tasks such as reasoning and computation. We examine the model horizontally, comparing different sizes, and vertically, assessing different layers. Based on the designed probing tasks, we report several key and unexpected findings: (1) Horizontally, increasing model size brings almost no automatic gain in knowledge or computational prowess. Instead, it enhances reasoning abilities, especially in math problem solving, and helps reduce hallucinations, but only beyond certain size thresholds; (2) Vertically, the lower layers of LLaMA lack substantial arithmetic and factual knowledge but exhibit logical thinking, multilingual, and recognition abilities, whereas the top layers hold most of the computational power and real-world knowledge.
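As a rough illustration of multiple-choice probing, the sketch below picks the option a scorer rates highest and reports accuracy over a set of items. The abstract does not state how options are scored, so the `Scorer` interface, the function names, and the toy scorer are all hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical interface: maps (question, option) to a score, for example a
# log-likelihood. The scoring rule is an assumption, not taken from the paper.
Scorer = Callable[[str, str], float]
Item = Tuple[str, Sequence[str], int]  # (question, options, gold index)


def pick_option(question: str, options: Sequence[str], score: Scorer) -> int:
    """Return the index of the highest-scoring option (first one on ties)."""
    scores = [score(question, opt) for opt in options]
    return max(range(len(options)), key=scores.__getitem__)


def accuracy(items: Iterable[Item], score: Scorer) -> float:
    """Fraction of items where the top-scoring option is the gold option."""
    items = list(items)
    correct = sum(pick_option(q, opts, score) == gold for q, opts, gold in items)
    return correct / len(items)


def toy_score(question: str, option: str) -> float:
    """Toy stand-in for a model: favours the correct sum of an 'a+b' question."""
    a, b = question.split("+")
    return 0.0 if int(option) == int(a) + int(b) else -1.0


if __name__ == "__main__":
    items = [("2+3", ["4", "5", "6"], 1), ("1+1", ["2", "3"], 1)]
    print(accuracy(items, toy_score))
```

Under this framing, a horizontal comparison would swap in one scorer per model size, and a vertical comparison one scorer per layer, with the items held fixed.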
