MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Abstract

"Bigger is better" has been the predominant trend in recent Large Language Model (LLM) development. However, LLMs are not well suited to scenarios that require on-device processing, energy efficiency, a low memory footprint, and response efficiency. These requirements are crucial for privacy, security, and sustainable deployment. This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices. Our primary contribution is the introduction of an accurate and fully transparent open-source 0.5 billion (0.5B) parameter SLM, named MobiLlama, catering to the specific needs of resource-constrained computing with an emphasis on enhanced performance with reduced resource demands. MobiLlama is an SLM design that starts from a larger model and applies a careful parameter sharing scheme to reduce both the pre-training and the deployment cost. Our work strives not only to bridge the gap in open-source SLMs but also to ensure full transparency: the complete training data pipeline, training code, model weights, and over 300 checkpoints, along with evaluation code, are available at https://github.com/mbzuai-oryx/MobiLlama.

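To illustrate why a parameter sharing scheme can shrink a model, the sketch below counts transformer-block parameters with and without one feed-forward network (FFN) shared across all layers. The abstract does not say which parameters MobiLlama shares, so sharing the FFN is an assumption made purely for illustration. The names `Config`, `attention_params`, `ffn_params`, and `block_stack_params`, and all dimensions, are hypothetical and are not taken from the MobiLlama code base.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical dimensions chosen for illustration only.
    d_model: int = 2048   # hidden size
    d_ff: int = 5632      # FFN inner size
    n_layers: int = 22    # number of transformer blocks


def attention_params(cfg: Config) -> int:
    # Q, K, V and output projections, each d_model x d_model, no biases.
    return 4 * cfg.d_model * cfg.d_model


def ffn_params(cfg: Config) -> int:
    # Gated FFN with three projections (gate, up, down), no biases.
    return 3 * cfg.d_model * cfg.d_ff


def block_stack_params(cfg: Config, share_ffn: bool) -> int:
    # Attention weights stay per-layer in both variants.
    attn_total = cfg.n_layers * attention_params(cfg)
    # With sharing, one FFN is reused by every layer, so it is counted once.
    ffn_copies = 1 if share_ffn else cfg.n_layers
    return attn_total + ffn_copies * ffn_params(cfg)


if __name__ == "__main__":
    cfg = Config()
    unshared = block_stack_params(cfg, share_ffn=False)
    shared = block_stack_params(cfg, share_ffn=True)
    print(f"per-layer FFN: {unshared:,} parameters")
    print(f"shared FFN:    {shared:,} parameters")
    print(f"saved:         {unshared - shared:,} parameters")
```

Under these assumptions the saving is exactly `(n_layers - 1) * ffn_params(cfg)`: the attention weights are untouched, and only the redundant FFN copies are removed. Embedding and normalization parameters are left out of the count to keep the sketch short.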