ControlLLM: Augment Language Models with Tools by Searching on Graphs

Abstract

We present ControlLLM, a novel framework that enables large language models (LLMs) to use multi-modal tools to solve complex real-world tasks. Although LLMs perform remarkably well, they still struggle with tool invocation because of ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises three key components: (1) a task decomposer that breaks a complex task down into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches for the optimal solution path on a pre-built tool graph, which specifies the parameter and dependency relations among different tools; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the tools efficiently on different computational devices. We evaluate our framework on diverse tasks involving image, audio, and video processing, demonstrating its superior accuracy, efficiency, and versatility compared to existing methods.
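To make the idea of searching for a solution path on a tool graph more concrete, here is a minimal sketch in Python. It is not the ControlLLM implementation: the `Tool` class, the tool names, the resource types, and the `find_paths` function are all hypothetical, and a plain depth-first search with a shortest-path preference stands in for whatever search and scoring strategy ToG actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    inputs: frozenset  # resource types the tool consumes
    output: str        # resource type the tool produces


# Hypothetical toolbox; these tool names are illustrative only.
TOOLS = [
    Tool("image_captioning", frozenset({"image"}), "text"),
    Tool("text_to_speech", frozenset({"text"}), "audio"),
    Tool("object_detection", frozenset({"image"}), "boxes"),
    Tool("image_cropping", frozenset({"image", "boxes"}), "image_crop"),
]


def find_paths(available, target, tools=TOOLS, max_depth=4):
    """Depth-first search for tool sequences that produce `target`."""
    paths = []

    def dfs(have, path):
        if target in have:
            paths.append([t.name for t in path])
            return
        if len(path) >= max_depth:
            return
        for tool in tools:
            # A tool is applicable when all of its inputs are available
            # and it would add a resource type we do not have yet.
            if tool.inputs <= have and tool.output not in have:
                dfs(have | {tool.output}, path + [tool])

    dfs(frozenset(available), [])
    # Prefer shorter paths, as a simple stand-in for real path scoring.
    return sorted(paths, key=len)
```

Under these assumptions, calling `find_paths({"image"}, "audio")` returns several candidate paths, with the two-step path through `image_captioning` and `text_to_speech` ranked first because it is the shortest.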
