ControlLLM: Augment Language Models with Tools by Searching on Graphs

Abstract

We present ControlLLM, a novel framework that enables large language models (LLMs) to use multi-modal tools to solve complex real-world tasks. Despite their remarkable performance, LLMs still struggle with tool invocation because of ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises three key components: (1) a task decomposer that breaks a complex task down into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches for the optimal solution path on a pre-built tool graph, which specifies the parameter and dependency relations among different tools; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the tools efficiently on different computational devices. We evaluate our framework on diverse tasks involving image, audio, and video processing, demonstrating its superior accuracy, efficiency, and versatility compared to existing methods.
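The second component, searching a tool graph for a solution path, can be illustrated with a minimal sketch. It assumes a simplified graph in which each tool consumes a set of resource types and produces exactly one; the `Tool` class, the `search_solution_path` function, and every tool and resource name below are hypothetical. Plain breadth-first search stands in for the ToG search here, and this sketch does not reproduce ToG's actual search strategy; task decomposition and the execution engine are out of scope.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool node: consumes a set of resource types, produces one."""
    name: str
    inputs: frozenset
    output: str


def search_solution_path(tools, available, target):
    """Breadth-first search for the shortest tool chain that yields `target`.

    A search state is the set of resource types produced so far. A tool can
    be applied only once every one of its input types is in that set.
    Returns a list of tool names, or None if `target` is unreachable.
    """
    start = frozenset(available)
    if target in start:
        return []
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        for tool in tools:
            # Skip tools whose dependencies are unmet or whose output we already have.
            if tool.inputs <= state and tool.output not in state:
                new_state = state | {tool.output}
                new_path = path + [tool.name]
                if tool.output == target:
                    return new_path
                if new_state not in seen:
                    seen.add(new_state)
                    queue.append((new_state, new_path))
    return None


# Hypothetical toolbox; names and resource types are illustrative only.
tools = [
    Tool("image_captioning", frozenset({"image"}), "text"),
    Tool("text_to_speech", frozenset({"text"}), "audio"),
    Tool("image_audio_to_video", frozenset({"image", "audio"}), "video"),
]

print(search_solution_path(tools, {"image"}, "video"))
# prints ['image_captioning', 'text_to_speech', 'image_audio_to_video']
```

Treating a state as the set of resource types produced so far lets a multi-input tool such as the hypothetical `image_audio_to_video` fire only after all of its inputs exist, which mirrors the kind of parameter and dependency relation the tool graph is meant to encode.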
