
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning

October 4, 2023
Authors: Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao
cs.AI

Abstract

Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we consider the "answer consistency" of the weaker LLM as a signal of the question difficulty and propose several methods for the answer sampling and consistency checking, including one leveraging a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought). Through experiments on six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 being the weaker and stronger LLMs, respectively, we demonstrate that our proposed LLM cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost.
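The following is a minimal Python sketch of the cascade idea described in the abstract. The routine samples several answers from the weaker LLM, uses their agreement rate as the "answer consistency" signal, and escalates to the stronger LLM only when the samples disagree. The helper names (`sample_weak`, `sample_strong`), the alternation of prompt styles as a stand-in for the mixture of thought representations, and the 0.8 agreement threshold are all illustrative assumptions, not the paper's exact procedure.

```python
from collections import Counter

def cascade_answer(question, sample_weak, sample_strong,
                   n_samples=5, agreement_threshold=0.8):
    """Route a question through a weak-then-strong LLM cascade.

    sample_weak / sample_strong are caller-supplied functions that query
    the weaker (e.g., GPT-3.5-turbo) and stronger (e.g., GPT-4) LLMs and
    return a final answer string. The threshold and sampling scheme here
    are assumptions for illustration, not the paper's tuned settings.
    """
    # Sample several answers from the weaker LLM. The paper mixes two
    # thought representations (Chain-of-Thought and Program-of-Thought);
    # this sketch approximates that by alternating a prompt-style flag.
    answers = []
    for i in range(n_samples):
        style = "cot" if i % 2 == 0 else "pot"
        answers.append(sample_weak(question, style=style))

    # "Answer consistency": if the weaker LLM's samples mostly agree,
    # treat the question as easy and accept the majority answer.
    top_answer, top_count = Counter(answers).most_common(1)[0]
    if top_count / n_samples >= agreement_threshold:
        return top_answer  # cheap path: the weaker LLM suffices

    # Low consistency signals a hard question: pay for the stronger LLM.
    return sample_strong(question)
```

Because only low-consistency questions reach the expensive model, the average cost per question stays close to the weaker model's price, which is how the reported cascade matches the stronger LLM's accuracy at roughly 40% of its cost.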