Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning

October 4, 2023
Authors: Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao
cs.AI

Abstract

Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we consider the "answer consistency" of the weaker LLM as a signal of the question difficulty and propose several methods for the answer sampling and consistency checking, including one leveraging a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought). Through experiments on six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 being the weaker and stronger LLMs, respectively, we demonstrate that our proposed LLM cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost.
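
The routing rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the sample count `k`, the agreement threshold, and the pooled majority vote are illustrative choices, and `weak_sample` / `strong_answer` are hypothetical callables standing in for API calls to GPT-3.5-turbo and GPT-4.

```python
from collections import Counter
from typing import Callable, List


def cascade_answer(
    question: str,
    weak_sample: Callable[[str, str, int], List[str]],  # (question, style, n) -> n extracted answers
    strong_answer: Callable[[str], str],                # (question) -> answer from the stronger LLM
    k: int = 6,
    threshold: float = 0.8,
) -> str:
    """Route a question through a weak-then-strong LLM cascade,
    using answer consistency as the difficulty signal."""
    # Sample k answers from the weaker LLM under each thought representation:
    # Chain-of-Thought (natural-language reasoning steps) and
    # Program-of-Thought (reasoning written as executable code).
    cot = weak_sample(question, "chain_of_thought", k)
    pot = weak_sample(question, "program_of_thought", k)

    # Mixture of thoughts: pool both sample sets and take the majority vote.
    pooled = cot + pot
    answer, votes = Counter(pooled).most_common(1)[0]

    # High agreement signals an easy question: accept the weaker LLM's
    # answer and avoid the expensive call.
    if votes / len(pooled) >= threshold:
        return answer

    # Low agreement signals a hard question: escalate to the stronger LLM.
    return strong_answer(question)
```

The cost saving reported in the abstract comes from this asymmetry: consistent (easy) questions are resolved entirely by the cheaper model, and only the inconsistent minority ever reach GPT-4.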