SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Abstract

We introduce depth up-scaling (DUS), a novel technique for up-scaling base large language models (LLMs) that is simple, efficient, and effective. In contrast to mixture-of-experts (MoE), DUS does not require complex changes to training and inference. Using DUS, we build SOLAR 10.7B, an LLM with 10.7 billion parameters that demonstrates superior performance on various natural language processing (NLP) tasks. Comparative evaluations show that SOLAR 10.7B outperforms existing open-source pretrained LLMs such as Llama 2 and Mistral 7B. We additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, which surpasses Mixtral-8x7B. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
