LLM Modules: Knowledge Transfer from a Large to a Small Model using Enhanced Cross-Attention
February 12, 2025
Author: Konstantin Kolomeitsev
cs.AI
Abstract
In this work, we propose an architecture of LLM Modules that enables the
transfer of knowledge from a large pre-trained model to a smaller model using
an Enhanced Cross-Attention mechanism. In the proposed scheme, the Qwen2-1.5B
model is frozen and its representations are passed through specially designed
attention layers to the GPT-Neo-125M model, which is trained on limited
computational resources. Experimental results on the Bespoke-Stratos-17k
dataset demonstrate that after 15 epochs of training, the combined model
generates responses comparable in quality to those obtained by distillation. We
discuss the advantages of the modular approach, provide examples of input
queries and comparative analysis, and outline prospects for further extension
of the method.
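The abstract describes passing representations from a frozen Qwen2-1.5B model through specially designed attention layers into GPT-Neo-125M. The following PyTorch sketch illustrates what such a cross-attention bridge could look like; it is not the paper's implementation. The class name, the hidden sizes (1536 for the teacher, 768 for the student), the learned gate, and the head count are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): a cross-attention "bridge" that lets a
# small decoder attend to hidden states produced by a frozen large model.
# Dimensions and the gating scheme below are assumptions, not taken from the paper.
import torch
import torch.nn as nn


class EnhancedCrossAttentionBridge(nn.Module):
    """Projects frozen teacher representations into the student's hidden space
    and lets student states attend to them via multi-head cross-attention."""

    def __init__(self, teacher_dim=1536, student_dim=768, num_heads=8):
        super().__init__()
        self.proj = nn.Linear(teacher_dim, student_dim)  # align hidden sizes
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=student_dim, num_heads=num_heads, batch_first=True
        )
        self.gate = nn.Parameter(torch.zeros(1))  # learned gate, starts near zero
        self.norm = nn.LayerNorm(student_dim)

    def forward(self, student_hidden, teacher_hidden):
        # student_hidden: (batch, s_len, student_dim), e.g. from GPT-Neo-125M
        # teacher_hidden: (batch, t_len, teacher_dim), e.g. from frozen Qwen2-1.5B
        teacher_proj = self.proj(teacher_hidden)
        attn_out, _ = self.cross_attn(
            query=student_hidden, key=teacher_proj, value=teacher_proj
        )
        # gated residual connection followed by layer normalization
        return self.norm(student_hidden + torch.tanh(self.gate) * attn_out)


if __name__ == "__main__":
    bridge = EnhancedCrossAttentionBridge()
    student = torch.randn(2, 16, 768)   # toy student hidden states
    teacher = torch.randn(2, 32, 1536)  # toy frozen-teacher hidden states
    print(bridge(student, teacher).shape)  # torch.Size([2, 16, 768])
```

In such a setup, only the bridge (and the small model) would receive gradients, while the large model's parameters stay frozen, which is consistent with the limited-compute training regime the abstract describes.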