Target-Oriented Pretraining Data Selection via Neuron-Activated Graph

Abstract

Everyday tasks come with a target, and pretraining a model around that target is what turns it into an expert. In this paper, we study target-oriented language model (LM) pretraining by introducing Neuron-Activated Graph Ranking (NAG-based Ranking), a training-free and interpretable framework for target pretraining data selection. Rather than relying on black-box representations, our approach directly characterizes each target input by a sparse set of high-impact neurons in any off-the-shelf LLM. Concretely, we quantify neuron impact, select the most influential neurons across layers into a compact Neuron-Activated Graph (NAG), and rank candidate data by NAG similarity to the target examples. Across six benchmarks, NAG-based Ranking improves target-oriented pretraining by 4.9% on average over random sampling and outperforms state-of-the-art baselines by 5.3% accuracy on HellaSwag. It also remains effective in a more practical multi-target setting, where our best setup surpasses two baselines by 1.1% and 4.1%, respectively. Furthermore, we provide a comprehensive analysis of why and how NAG works. For example, deactivating NAG-selected neurons (only 0.12% of all neurons) causes a 23.5% performance collapse, and restricting NAG to the final layer incurs a 4.1% average drop, indicating that NAG captures a sparse "functional backbone" for learning target features. We release the code at https://github.com/asillycat/NAG.
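
As a rough illustration of the select-then-rank pipeline described above, the following minimal sketch builds a per-layer top-k neuron set and ranks candidates by overlap with the target sets. It makes several assumptions not stated in the abstract: the per-neuron impact scores are taken as a precomputed array (how impact is quantified is not specified here), NAG similarity is assumed to be Jaccard overlap, and the function names build_nag, nag_similarity, and rank_candidates are hypothetical and are not taken from the released code.

```python
import numpy as np

def build_nag(impact, k):
    """Return the set of (layer, neuron) pairs with the top-k impact in each layer.

    `impact` is a (num_layers, num_neurons) array of per-neuron impact scores,
    assumed to be precomputed for a single input.
    """
    top = np.argsort(-impact, axis=1)[:, :k]  # indices of the k largest scores per layer
    return {(layer, int(n)) for layer, row in enumerate(top) for n in row}

def nag_similarity(a, b):
    """Jaccard overlap between two NAGs (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_candidates(target_nags, candidate_nags):
    """Return candidate indices sorted by mean similarity to the target NAGs, best first."""
    scores = [
        np.mean([nag_similarity(c, t) for t in target_nags])
        for c in candidate_nags
    ]
    return sorted(range(len(candidate_nags)), key=lambda i: -scores[i])
```

In use, each target example and each candidate document would be passed through the same off-the-shelf model to obtain its impact array, and the top-ranked candidates would form the pretraining subset. The choice of k controls how sparse the graph is.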
