Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning

Abstract

Large language models (LLMs) such as GPT-4 have exhibited remarkable performance on a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we study building an LLM cascade to reduce the cost of using LLMs, particularly for reasoning tasks (e.g., mathematical, causal). Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we treat the "answer consistency" of the weaker LLM as a signal of question difficulty and propose several methods for answer sampling and consistency checking, including one that leverages a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought). Through experiments on six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 as the weaker and stronger LLMs, respectively, we demonstrate that our proposed LLM cascades achieve performance comparable to using the stronger LLM alone while requiring only 40% of its cost.
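The routing idea in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of "answer consistency": sample several answers from the weaker LLM under both Chain-of-Thought and Program-of-Thought prompts, and accept the majority answer only if enough samples agree. The function names, the sampler callables, the sample count, and the 0.8 threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical interface: a sampler maps a question to one final answer string.
Sampler = Callable[[str], str]


def agreement(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and the fraction of samples that match it."""
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)


def cascade_answer(
    question: str,
    weak_cot: Sampler,   # weaker LLM prompted with Chain-of-Thought (assumed)
    weak_pot: Sampler,   # weaker LLM prompted with Program-of-Thought (assumed)
    strong: Sampler,     # stronger, more expensive LLM (assumed)
    samples_per_prompt: int = 3,   # illustrative value
    threshold: float = 0.8,        # illustrative value
) -> Tuple[str, str]:
    """Answer with the weaker LLM when its samples agree, else escalate."""
    # Mix the two thought representations into a single pool of answers.
    answers = [weak_cot(question) for _ in range(samples_per_prompt)]
    answers += [weak_pot(question) for _ in range(samples_per_prompt)]

    top, score = agreement(answers)
    if score >= threshold:
        # Consistent answers are treated as a sign of an easy question.
        return top, "weak"
    # Inconsistent answers are treated as a sign of a hard question.
    return strong(question), "strong"
```

In this sketch the stronger LLM is called only when the weaker model's samples disagree, which is where the cost saving would come from. How the threshold trades accuracy against cost is something the sketch leaves open.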

Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
