

Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference

June 21, 2023
作者: Alessandro Sordoni, Xingdi Yuan, Marc-Alexandre Côté, Matheus Pereira, Adam Trischler, Ziang Xiao, Arian Hosseini, Friederike Niedtner, Nicolas Le Roux
cs.AI

Abstract

We view large language models (LLMs) as stochastic language layers in a network, where the learnable parameters are the natural language prompts at each layer. We stack two such layers, feeding the output of one layer to the next. We call the stacked architecture a Deep Language Network (DLN). We first show how to effectively perform prompt optimization for a 1-layer language network (DLN-1). We then show how to train 2-layer DLNs (DLN-2), where two prompts must be learnt. We consider the output of the first layer as a latent variable to marginalize, and devise a variational inference algorithm for joint prompt training. A DLN-2 reaches higher performance than a single layer, sometimes comparable to few-shot GPT-4, even when each LLM in the network is smaller and less powerful. The DLN code is open source: https://github.com/microsoft/deep-language-networks.
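As a rough illustration of the stacked architecture described above (this is a minimal sketch, not the API of the microsoft/deep-language-networks repository), the Python snippet below treats each layer as an LLM call whose only learnable parameter is its natural-language prompt, and feeds the text output of the first layer into the second. That intermediate text is the latent variable that the variational algorithm marginalizes during joint prompt training. All names here (`LanguageLayer`, `dln2_forward`, the toy stand-in LLM) are hypothetical.

```python
# Minimal DLN-2 forward-pass sketch. A "layer" is one stochastic LLM call;
# its learnable parameter is the natural-language prompt it prepends.

from dataclasses import dataclass
from typing import Callable

# An LLM is modeled as a function mapping a full prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class LanguageLayer:
    """One language layer: `prompt` is the parameter optimized during training."""
    llm: LLM
    prompt: str

    def forward(self, x: str) -> str:
        # Combine the layer prompt with the incoming text and sample the LLM.
        return self.llm(f"{self.prompt}\n\nInput: {x}\nOutput:")


def dln2_forward(layer1: LanguageLayer, layer2: LanguageLayer, x: str) -> str:
    h = layer1.forward(x)      # latent text h, sampled given x and prompt 1
    return layer2.forward(h)   # final output y, sampled given h and prompt 2


if __name__ == "__main__":
    # Toy stand-in LLM (echoes the "Input:" line) so the sketch runs offline;
    # in practice this would be a call to a real LLM API.
    toy_llm: LLM = lambda prompt: prompt.splitlines()[-2].removeprefix("Input: ")
    l1 = LanguageLayer(toy_llm, "Rewrite the question as one explicit reasoning step.")
    l2 = LanguageLayer(toy_llm, "Answer the question using the reasoning step above.")
    print(dln2_forward(l1, l2, "What is 2 + 2?"))
```

In this view, training a DLN-2 amounts to searching over the two prompt strings jointly; because the intermediate text `h` is never observed, the paper treats it as a latent variable to be marginalized rather than optimizing each layer's prompt in isolation.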