FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments

April 28, 2026
Authors: Amir Saeidi, Venkatesh Mishra, Souradeep Mukhopadhyay, Gaowen Liu, Ali Payani, Jayanth Srinivasa, Chitta Baral
cs.AI

Abstract

Large Language Models are being increasingly deployed as the decision-making core of autonomous agents capable of effecting change in external environments. Yet, in conversational benchmarks, which simulate real-world customer-centric issue resolution scenarios, these agents frequently fail due to the cascading effects of incorrect decision-making. These challenges are particularly pronounced for open-source LLMs with smaller parameter sizes, limited context windows, and constrained inference budgets, which contribute to increased error accumulation in agentic settings. To tackle these challenges, we present the Failure-Aware Meta-Agentic (FAMA) framework. FAMA operates in two stages: first, it analyzes failure trajectories from baseline agents to identify the most prevalent errors; second, it employs an orchestration mechanism that activates a minimal subset of specialized agents tailored to address these failures by injecting a targeted context for the tool-use agent before the decision-making step. Experiments across open-source LLMs demonstrate performance gains up to 27% across evaluation modes over standard baselines. These results highlight that targeted curation of context through specialized agents to address common failures is a valuable design principle for building reliable, multi-turn tool-use LLM agents that simulate real-world conversational scenarios.
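
The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the error labels, the `Specialist` class, the trigger functions, and the injected messages are all hypothetical names chosen for illustration, and the abstract does not specify how failures are labeled or how specialists are selected. The sketch assumes failed trajectories have already been tagged with an error category.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


def most_prevalent_errors(failure_labels: List[str], top_k: int = 2) -> List[str]:
    """Stage 1 (sketch): rank error categories seen in failed baseline trajectories."""
    return [label for label, _ in Counter(failure_labels).most_common(top_k)]


@dataclass
class Specialist:
    """Hypothetical specialized agent targeting one failure category."""
    error_type: str
    is_relevant: Callable[[Dict], bool]   # assumed trigger on the dialogue state
    make_context: Callable[[Dict], str]   # text to inject for the tool-use agent


def build_injected_context(state: Dict,
                           specialists: List[Specialist],
                           target_errors: List[str]) -> str:
    """Stage 2 (sketch): activate only the specialists that both target a
    prevalent error and are relevant to the current state, then join
    their context for injection before the decision-making step."""
    active = [s for s in specialists
              if s.error_type in target_errors and s.is_relevant(state)]
    return "\n".join(s.make_context(state) for s in active)


# Made-up labels standing in for an analysis of baseline failure trajectories.
labels = ["policy_violation", "wrong_tool_args", "policy_violation",
          "missing_confirmation", "wrong_tool_args", "policy_violation"]
target_errors = most_prevalent_errors(labels, top_k=2)

specialists = [
    Specialist("policy_violation",
               lambda s: s["pending_tool"] == "cancel_order",
               lambda s: "Policy reminder: re-check the cancellation policy "
                         f"before calling {s['pending_tool']}."),
    Specialist("wrong_tool_args",
               lambda s: any(v is None for v in s["args"].values()),
               lambda s: "Argument check: fill in missing tool arguments."),
    Specialist("missing_confirmation",
               lambda s: True,
               lambda s: "Ask the user to confirm before acting."),
]

state = {"pending_tool": "cancel_order", "args": {"order_id": "A1"}}
injected = build_injected_context(state, specialists, target_errors)
print(injected)
```

In this toy run only the policy specialist fires: the argument checker targets a prevalent error but finds nothing missing, and the confirmation specialist is excluded because its category was not among the top two. That filtering is one simple reading of "a minimal subset of specialized agents"; the paper's actual orchestration mechanism may differ.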