TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

Abstract

Large Language Models (LLMs) have demonstrated proficiency in addressing tasks that require a blend of task planning and the utilization of external tools, such as APIs. However, real-world complex systems present three prevalent challenges concerning task planning and tool usage: (1) a real system usually has a vast array of APIs, so it is impossible to feed the descriptions of all APIs into the prompt of an LLM, as the token length is limited; (2) a real system is designed for handling complex tasks, and base LLMs can hardly plan a correct sub-task order and API-calling order for such tasks; (3) similar semantics and functionalities among APIs in real systems make it difficult for LLMs, and even for humans, to distinguish between them. In response, this paper introduces a comprehensive framework aimed at enhancing the Task Planning and Tool Usage (TPTU) abilities of LLM-based agents operating within real-world systems. Our framework comprises three key components designed to address these challenges: (1) the API Retriever selects the APIs most pertinent to the user task from the extensive array available; (2) the LLM Finetuner tunes a base LLM so that the finetuned LLM is more capable of task planning and API calling; (3) the Demo Selector adaptively retrieves different demonstrations related to hard-to-distinguish APIs, which are then used for in-context learning to boost the final performance. We validate our methods using a real-world commercial system as well as an open-source academic dataset, and the outcomes clearly showcase the efficacy of each individual component as well as of the integrated framework.
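
As a rough illustration of how an API retriever and a demo selector could keep the prompt within a token budget, the following is a minimal Python sketch and not the authors' implementation. It assumes a plain bag-of-words cosine similarity as a stand-in for whatever retrieval model the framework actually uses, and every name in it (`retrieve_apis`, `select_demos`, `build_prompt`, the example API catalog and demonstrations) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; a crude stand-in for a learned embedding.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_apis(task, api_descriptions, k=2):
    # Score every API description against the task and keep the top k names.
    query = Counter(tokenize(task))
    scored = [
        (cosine(query, Counter(tokenize(desc))), name)
        for name, desc in api_descriptions.items()
    ]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def select_demos(retrieved, demos, limit=1):
    # Assumption: a demonstration is relevant if it uses a retrieved API.
    picked = [d for d in demos if set(d["apis"]) & set(retrieved)]
    return picked[:limit]


def build_prompt(task, api_descriptions, demos, k=2):
    # Only the retrieved APIs and the selected demonstrations enter the prompt.
    retrieved = retrieve_apis(task, api_descriptions, k)
    lines = ["Available APIs:"]
    lines += [f"- {name}: {api_descriptions[name]}" for name in retrieved]
    lines.append("Demonstrations:")
    lines += [d["text"] for d in select_demos(retrieved, demos)]
    lines.append(f"Task: {task}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical catalog and demonstrations, for illustration only.
    example_apis = {
        "get_weather": "Return the current weather forecast for a city",
        "book_flight": "Book a flight ticket between two cities",
        "send_email": "Send an email message to a recipient",
    }
    example_demos = [
        {"apis": ["get_weather"], "text": "Task: weather in Paris -> get_weather(city='Paris')"},
        {"apis": ["send_email"], "text": "Task: mail Bob -> send_email(to='Bob')"},
    ]
    print(build_prompt("What is the weather forecast in Seoul", example_apis, example_demos, k=1))
```

In this sketch the prompt grows with `k` and the demonstration limit, not with the size of the API catalog, which is the property the first challenge calls for. The finetuning component is not covered here.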
