APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

Abstract

Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. In the second phase, these blueprints are transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models, the xLAM-2-fc-r series, with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on the tau-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source both the collected synthetic data and the trained xLAM-2-fc-r models to advance research in AI agents. Models are available on HuggingFace at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4 and the project website is https://apigen-mt.github.io.
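
As an illustration only, the two-phase flow described in the abstract might be sketched as follows. This is not the authors' implementation: the names `Blueprint`, `Review`, `generate_blueprint`, and `simulate_trajectory` are hypothetical, the LLM calls are replaced by plain stub callables, and both the unanimous-approval rule and the exact-match check against ground-truth actions are assumptions made to keep the sketch small.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Blueprint:
    """Hypothetical task blueprint: an instruction plus ground-truth actions."""
    instruction: str
    ground_truth_actions: List[str]


@dataclass
class Review:
    """Hypothetical verdict from one reviewer in the committee."""
    approved: bool
    feedback: str = ""


def generate_blueprint(
    propose: Callable[[List[str]], Blueprint],
    reviewers: List[Callable[[Blueprint], Review]],
    max_rounds: int = 3,
) -> Optional[Blueprint]:
    """Phase 1 (sketch): propose, review, and revise until the committee approves."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        blueprint = propose(feedback)
        reviews = [review(blueprint) for review in reviewers]
        # Assumption: every reviewer must approve; a majority vote is another option.
        if all(r.approved for r in reviews):
            return blueprint
        feedback = [r.feedback for r in reviews if not r.approved]
    return None  # no approved blueprint within the round budget


def simulate_trajectory(
    blueprint: Blueprint,
    simulated_user: Callable[[Blueprint, list], Optional[str]],
    agent: Callable[[list], Tuple[str, Optional[str]]],
    max_turns: int = 8,
) -> Optional[List[Tuple[str, str]]]:
    """Phase 2 (sketch): alternate user and agent turns, then verify the actions."""
    history: List[Tuple[str, str]] = []
    actions: List[str] = []
    for _ in range(max_turns):
        user_message = simulated_user(blueprint, history)
        if user_message is None:  # the simulated user has nothing more to ask
            break
        history.append(("user", user_message))
        reply, action = agent(history)
        history.append(("assistant", reply))
        if action is not None:
            actions.append(action)
    # Assumption: keep the trajectory only if actions match the ground truth exactly.
    return history if actions == blueprint.ground_truth_actions else None


# Stub callables standing in for LLM calls (purely illustrative).
def demo_propose(feedback: List[str]) -> Blueprint:
    if not feedback:  # first draft omits the actions
        return Blueprint("Cancel order 42", [])
    return Blueprint("Cancel order 42", ["cancel_order(42)"])


def demo_reviewer(blueprint: Blueprint) -> Review:
    if blueprint.ground_truth_actions:
        return Review(True)
    return Review(False, "blueprint has no ground-truth actions")


def demo_user(blueprint: Blueprint, history: list) -> Optional[str]:
    return blueprint.instruction if not history else None


def demo_agent(history: list) -> Tuple[str, Optional[str]]:
    return "Order 42 is cancelled.", "cancel_order(42)"
```

With these stubs, the first draft is rejected by the committee, the revised draft is approved, and the simulated conversation is kept because the agent's single action matches the blueprint. In a real pipeline each stub would be replaced by a model call, and the verification step would likely need to be more forgiving than an exact list comparison.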
