Mini-GPTs: Efficient Large Language Models through Contextual Pruning

Abstract

In AI research, optimizing Large Language Models (LLMs) remains a significant challenge, crucial for advancing the field's practical applications and sustainability. Building on the foundational work of Professor Song Han's lab at MIT, this paper introduces a novel approach to developing Mini-GPTs via contextual pruning. Our methodology strategically prunes the computational architecture of traditional LLMs, such as Phi-1.5, retaining core functionality while drastically reducing model size. We apply the technique across diverse and complex datasets, including US law, Medical Q&A, Skyrim dialogue, English-Taiwanese translation, and Economics articles. The results underscore the efficiency and effectiveness of contextual pruning, not merely as a theoretical concept but as a practical tool for developing domain-specific, resource-efficient LLMs. Contextual pruning is a promising method for building domain-specific LLMs, and this research is a building block towards future development with more hardware compute, refined fine-tuning, and quantization.
