
Mini-GPTs: Efficient Large Language Models through Contextual Pruning

December 20, 2023
Authors: Tim Valicenti, Justice Vidal, Ritik Patnaik
cs.AI

Abstract

In AI research, the optimization of Large Language Models (LLMs) remains a significant challenge, crucial for advancing the field's practical applications and sustainability. Building upon the foundational work of Professor Song Han's lab at MIT, this paper introduces a novel approach to developing Mini-GPTs via contextual pruning. Our methodology strategically prunes the computational architecture of traditional LLMs, like Phi-1.5, focusing on retaining core functionality while drastically reducing model size. We apply the technique across diverse and complex datasets, including US law, medical Q&A, Skyrim dialogue, English-Taiwanese translation, and economics articles. The results underscore the efficiency and effectiveness of contextual pruning, not merely as a theoretical concept but as a practical tool for developing domain-specific, resource-efficient LLMs. Contextual pruning is a promising method for building domain-specific LLMs, and this research is a building block for future development with greater hardware compute, refined fine-tuning, and quantization.
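
The abstract describes pruning driven by domain-specific calibration data. As a rough illustration of that general idea (not the authors' implementation), the sketch below masks out neurons in a model's linear layers whose average activation magnitude stays near zero on calibration text from the target domain; the `calib_loader` and `threshold` names, and the mask-rather-than-remove strategy, are illustrative assumptions.

```python
# Minimal sketch of contextual pruning, assuming a PyTorch model and a
# DataLoader of domain-specific token-id batches. Hypothetical, simplified:
# neurons that are nearly inactive on the target domain are zeroed out.
import torch
import torch.nn as nn

def contextual_prune(model: nn.Module, calib_loader, threshold: float = 1e-3):
    stats, hooks = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            # Accumulate mean |activation| per output neuron; for a
            # (batch, seq, features) output this reduces over batch and seq.
            act = output.detach().abs().mean(dim=tuple(range(output.dim() - 1)))
            stats[name] = stats.get(name, 0) + act
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(make_hook(name)))

    model.eval()
    with torch.no_grad():
        for batch in calib_loader:  # batches of token ids from the target domain
            model(batch)

    for h in hooks:
        h.remove()

    # Zero the weights (and biases) of neurons that stayed essentially
    # inactive across the calibration set.
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and name in stats:
            inactive = stats[name] / len(calib_loader) < threshold
            module.weight.data[inactive, :] = 0.0
            if module.bias is not None:
                module.bias.data[inactive] = 0.0
```

Masking alone only zeroes parameters; to realize actual memory and latency savings, the masked rows (and the matching input columns of downstream layers) would be physically removed, shrinking the layer dimensions themselves.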