Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference

June 21, 2023
Authors: Alessandro Sordoni, Xingdi Yuan, Marc-Alexandre Côté, Matheus Pereira, Adam Trischler, Ziang Xiao, Arian Hosseini, Friederike Niedtner, Nicolas Le Roux
cs.AI

Abstract

We view large language models (LLMs) as stochastic language layers in a network, where the learnable parameters are the natural language prompts at each layer. We stack two such layers, feeding the output of one layer to the next. We call the stacked architecture a Deep Language Network (DLN). We first show how to effectively perform prompt optimization for a 1-layer language network (DLN-1). We then show how to train 2-layer DLNs (DLN-2), where two prompts must be learnt. We consider the output of the first layer as a latent variable to marginalize, and devise a variational inference algorithm for joint prompt training. A DLN-2 reaches higher performance than a single layer, sometimes comparable to few-shot GPT-4 even when each LLM in the network is smaller and less powerful. The DLN code is open source: https://github.com/microsoft/deep-language-networks.
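
As a rough illustration of the stacking described above, the sketch below chains two prompted layers, feeding the text output of the first layer into the second. All names here (call_llm, LanguageLayer, DLN2) and the example prompts are hypothetical placeholders for illustration only, not the API of the released repository:

```python
# Minimal sketch of a 2-layer Deep Language Network (DLN-2).
# Assumptions: call_llm, LanguageLayer, DLN2, and the prompts below are
# illustrative stand-ins, not the authors' released implementation.

def call_llm(prompt: str, text: str) -> str:
    # Placeholder: in a real DLN this would sample text from an LLM
    # conditioned on the layer's prompt and its input.
    return f"[LLM output for prompt={prompt!r} on input={text!r}]"

class LanguageLayer:
    """A stochastic language layer whose learnable parameter is a natural-language prompt."""
    def __init__(self, prompt: str):
        self.prompt = prompt  # learnable parameter, optimized during prompt training

    def forward(self, x: str) -> str:
        # The layer's output is text produced by the LLM given (prompt, input).
        return call_llm(self.prompt, x)

class DLN2:
    """Two stacked language layers; the first layer's output is an intermediate text variable."""
    def __init__(self, prompt1: str, prompt2: str):
        self.layer1 = LanguageLayer(prompt1)
        self.layer2 = LanguageLayer(prompt2)

    def forward(self, x: str) -> str:
        h = self.layer1.forward(x)     # intermediate text, treated as latent during training
        return self.layer2.forward(h)  # final output produced from the intermediate text

# Usage example with made-up prompts:
dln = DLN2(
    prompt1="Read the problem and list the relevant facts.",
    prompt2="Using the listed facts, give the final answer.",
)
print(dln.forward("If Alice has 3 apples and buys 2 more, how many does she have?"))
```

During joint training, the intermediate text h is treated as a latent variable; a variational treatment of this setup would maximize a lower bound of the form E_{q(h|x,y)}[log p(y|h) + log p(h|x) - log q(h|x,y)] with respect to both prompts, though the exact objective and proposal distribution the authors use are given in the full paper rather than the abstract.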