ControlLLM: Augment Language Models with Tools by Searching on Graphs

Abstract

We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks. Despite the remarkable performance of LLMs, they still struggle with tool invocation due to ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches for the optimal solution path on a pre-built tool graph, which specifies the parameter and dependency relations among different tools; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the tools efficiently on different computational devices. We evaluate our framework on diverse tasks involving image, audio, and video processing, demonstrating its superior accuracy, efficiency, and versatility compared to existing methods.
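The idea of searching for a solution path on a tool graph can be illustrated with a small sketch. This is not the authors' ToG implementation: the abstract does not specify the search strategy, so the example below substitutes a plain breadth-first search, and the `Tool` class, the tool names, and the `find_solution_path` function are all hypothetical. It assumes a simplified graph in which resource types (audio, text, image, video) are nodes and each tool is a directed edge from its input type to its output type.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str          # hypothetical tool name, for illustration only
    input_type: str    # resource type the tool consumes
    output_type: str   # resource type the tool produces


# A toy tool graph: resource types are nodes, tools are directed edges.
TOOLS = [
    Tool("speech_to_text", "audio", "text"),
    Tool("text_to_image", "text", "image"),
    Tool("image_captioning", "image", "text"),
    Tool("image_to_video", "image", "video"),
]


def find_solution_path(tools, source, target):
    """Breadth-first search for the shortest tool chain from source to target type.

    BFS is an assumption made for this sketch; it stands in for whatever
    search the real framework performs on its tool graph.
    """
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        current, path = queue.popleft()
        if current == target:
            return [tool.name for tool in path]
        for tool in tools:
            if tool.input_type == current and tool.output_type not in visited:
                visited.add(tool.output_type)
                queue.append((tool.output_type, path + [tool]))
    return None  # no chain of tools connects the two types


print(find_solution_path(TOOLS, "audio", "video"))
# ['speech_to_text', 'text_to_image', 'image_to_video']
```

Modelling tools as typed edges makes the dependency relations explicit: a tool can only follow another if the types line up, which is the kind of constraint the abstract attributes to the pre-built tool graph. A fuller version would also carry per-tool parameters and score candidate paths rather than return the first shortest one.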
