Mini-GPTs: Efficient Large Language Models through Contextual Pruning
December 20, 2023
Authors: Tim Valicenti, Justice Vidal, Ritik Patnaik
cs.AI
Abstract
In AI research, the optimization of Large Language Models (LLMs) remains a
significant challenge, crucial for advancing the field's practical applications
and sustainability. Building upon the foundational work of Professor Song Han's
lab at MIT, this paper introduces a novel approach to developing Mini-GPTs via
contextual pruning. Our methodology strategically prunes the computational
architecture of traditional LLMs, like Phi-1.5, focusing on retaining core
functionalities while drastically reducing model sizes. We employ the technique
across diverse and complex datasets, including US law, Medical Q&A, Skyrim
dialogue, English-Taiwanese translation, and Economics articles. The results
underscore the efficiency and effectiveness of contextual pruning, not merely
as a theoretical concept but as a practical tool in developing domain-specific,
resource-efficient LLMs. Contextual pruning is a promising method for building
domain-specific LLMs, and this research is a building block towards future
development with more hardware compute, refined fine-tuning, and quantization.
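To make the idea concrete, below is a minimal, illustrative sketch of one plausible form of contextual pruning applied to a single linear layer: output neurons are scored by their mean absolute activation on domain-specific calibration inputs, and low-activity neurons are removed. The function name, scoring rule, and keep ratio are assumptions chosen for illustration, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def contextual_prune_linear(layer: nn.Linear,
                            calib_inputs: torch.Tensor,
                            keep_ratio: float = 0.5) -> nn.Linear:
    """Keep only the output neurons with the highest mean absolute
    activation on domain calibration data (hypothetical scoring rule;
    the paper's exact criterion may differ)."""
    with torch.no_grad():
        acts = layer(calib_inputs)                  # (n_samples, out_features)
        scores = acts.abs().mean(dim=0)             # importance per output neuron
        k = max(1, int(keep_ratio * layer.out_features))
        keep = torch.topk(scores, k).indices.sort().values
        pruned = nn.Linear(layer.in_features, k, bias=layer.bias is not None)
        pruned.weight.copy_(layer.weight[keep])     # weight rows = output neurons
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned

# Toy usage: random tensors stand in for hidden states from domain text.
torch.manual_seed(0)
layer = nn.Linear(64, 128)
calib = torch.randn(256, 64)
smaller = contextual_prune_linear(layer, calib, keep_ratio=0.25)
print(tuple(layer.weight.shape), "->", tuple(smaller.weight.shape))  # (128, 64) -> (32, 64)
```

In a full model, any downstream layer consuming the pruned output would need its input dimension reduced to match, and the pruned network would typically be fine-tuned afterward to recover quality, consistent with the fine-tuning step the abstract mentions.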