A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis

Abstract

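Data synthesis and distillation are promising strategies for improving small language models, but current approaches rely heavily on large language models (LLMs). These large models come with high computational costs, environmental inefficiency, and potential biases inherited from a monolithic architecture. Small LLMs, by contrast, are more accessible and sustainable, but a single small model is often not capable enough to generate data that is high-quality, diverse, and reliable. Inspired by collaborative human processes such as peer review, we propose GRA, a framework built from multiple small LLMs. It combines specialized roles across small LLMs to achieve the iterative refinement and quality control that a single large LLM normally provides. Within this framework, the small LLMs take on three distinct roles, Generator, Reviewer, and Adjudicator, to simulate a data synthesis pipeline modeled on peer review. The Generator proposes initial data samples, the Reviewer critiques their quality and diversity, and the Adjudicator resolves conflicts to decide the final output. By decomposing synthesis into specialized sub-tasks, collaborating small LLMs can reach data-level parity with distillation from a large LLM. In experiments across multiple benchmarks, we show that GRA-generated data matches or exceeds the quality of output from a single large LLM (e.g., Qwen-2.5-72B-Instruct). Our results challenge the assumption that a single large model is necessary for high-quality data synthesis and argue instead for the strategic coordination of smaller agents. Our datasets, models, and code are publicly available at https://github.com/GX-XinGao/GRA.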

English

While data synthesis and distillation are promising strategies to enhance small language models, current approaches rely heavily on Large Language Models (LLMs), which suffer from high computational costs, environmental inefficiency, and potential biases inherited from monolithic architectures. In contrast, smaller LLMs are more accessible and sustainable, but their individual capabilities often fall short in generating high-quality, diverse, and reliable data. Inspired by collaborative human processes (e.g., peer review), we propose GRA, a framework of multiple small LLMs that aggregates specialized roles across the models to achieve the iterative refinement and quality control typically provided by a single large LLM. In this collaborative framework, multiple small LLMs assume distinct roles (Generator, Reviewer, and Adjudicator) to simulate a peer-review-inspired data synthesis pipeline. The Generator proposes initial data samples, the Reviewer critiques their quality and diversity, and the Adjudicator resolves conflicts to finalize the output. By decomposing the synthesis process into specialized sub-tasks, collaborative small LLMs can achieve data-level parity with large LLM-based distillation. Through experiments across multiple benchmarks, we demonstrate that GRA-produced data matches or exceeds the quality of single large LLM outputs, e.g., Qwen-2.5-72B-Instruct. Our results challenge the necessity of monolithic large models for high-quality data synthesis, advocating instead for strategic coordination of smaller agents. Our datasets, models, and code are publicly available at https://github.com/GX-XinGao/GRA.
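
As a rough illustration of the Generator, Reviewer, and Adjudicator roles described in the abstract, the following minimal Python sketch wires three kinds of callables into one synthesis step. It is not the authors' implementation: the names `synthesize` and `Verdict`, the numeric review scores, and the `accept_at` and `conflict_gap` thresholds are all assumptions made for this example, and the abstract does not say how a conflict between reviewers is detected. In practice each callable would wrap a small LLM.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List

# Hypothetical role signatures (assumptions, not taken from the paper).
Generator = Callable[[str], str]                   # seed -> candidate sample
Reviewer = Callable[[str], float]                  # candidate -> score in [0, 1]
Adjudicator = Callable[[str, List[float]], bool]   # candidate, scores -> accept?


@dataclass
class Verdict:
    sample: str
    scores: List[float]
    accepted: bool
    adjudicated: bool  # True when the reviewers disagreed and the adjudicator decided


def synthesize(
    seed: str,
    generator: Generator,
    reviewers: List[Reviewer],
    adjudicator: Adjudicator,
    accept_at: float = 0.7,     # assumed acceptance threshold on the mean score
    conflict_gap: float = 0.3,  # assumed score spread that counts as a conflict
) -> Verdict:
    """Run one generate -> review -> adjudicate step for a single seed."""
    sample = generator(seed)
    scores = [review(sample) for review in reviewers]

    # Reviewers disagree strongly: hand the decision to the adjudicator.
    if max(scores) - min(scores) > conflict_gap:
        return Verdict(sample, scores, adjudicator(sample, scores), True)

    # Reviewers roughly agree: accept or reject on the mean score.
    return Verdict(sample, scores, mean(scores) >= accept_at, False)
```

Keeping the roles as plain callables means different small models can be swapped into each role without changing the control flow.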


A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
