Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training

Abstract

Continual pre-training on small-scale task-specific data is an effective way to improve large language models in new target fields, but it risks catastrophic forgetting of their original capabilities. A common solution is to re-weight the training data mixture from the source and target fields over a domain space to achieve balanced performance. Previous domain re-weighting strategies rely on manually specified heuristics based on human intuition or empirical results. In this work, we demonstrate that more general heuristics can be parameterized, and we propose Data Mixing Agent, the first model-based, end-to-end framework that learns to re-weight domains. The agent learns generalizable heuristics through reinforcement learning on large quantities of data mixing trajectories with corresponding feedback from an evaluation environment. In continual pre-training experiments on math reasoning, Data Mixing Agent outperforms strong baselines in achieving balanced performance across source-field and target-field benchmarks. Furthermore, it generalizes well across unseen source fields, target models, and domain spaces without retraining. Direct application to the code generation field also indicates its adaptability across target domains. Further analysis shows that the agent's heuristics align well with human intuition and that it achieves superior model performance with less source-field data.
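To make the notion of re-weighting a training mixture over a domain space concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical two-domain space (`source_general`, `target_math`) with hand-picked weights; in the framework described above, an agent trained with reinforcement learning would produce such weights instead. All function and variable names here are illustrative.

```python
import random


def normalize(weights):
    """Scale non-negative domain weights so that they sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("domain weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one domain weight must be positive")
    return {name: w / total for name, w in weights.items()}


def sample_batch(pools, weights, batch_size, rng):
    """Draw a batch whose domain proportions follow `weights` in expectation.

    `pools` maps a domain name to its list of documents; each batch item
    is a (domain_name, document) pair.
    """
    probs = normalize(weights)
    names = list(probs)
    chosen = rng.choices(names, weights=[probs[n] for n in names], k=batch_size)
    return [(name, rng.choice(pools[name])) for name in chosen]


# Hypothetical data pools: placeholders, not data from the paper.
pools = {
    "source_general": ["general doc 1", "general doc 2", "general doc 3"],
    "target_math": ["math doc 1", "math doc 2"],
}

# Hand-picked weights standing in for what a learned agent might output.
weights = {"source_general": 1.0, "target_math": 3.0}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
batch = sample_batch(pools, weights, batch_size=8, rng=rng)
```

Raising the weight on `target_math` shifts the expected share of target-field documents in each batch, which is the knob a re-weighting strategy adjusts to trade off target-field gains against forgetting of source-field capabilities.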
