Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning
July 9, 2025
Authors: Konstantinos I. Roumeliotis, Ranjan Sapkota, Manoj Karkee, Nikolaos D. Tselikas
cs.AI
Abstract
Modern Artificial Intelligence (AI) increasingly relies on multi-agent architectures that blend visual and language understanding. Yet a pressing challenge remains: how can we trust these agents, especially in zero-shot settings with no fine-tuning? We introduce a novel modular Agentic AI visual classification framework that integrates generalist multimodal agents with a non-visual reasoning orchestrator and a Retrieval-Augmented Generation (RAG) module. Applied to apple leaf disease diagnosis, we benchmark three configurations: (I) zero-shot with confidence-based orchestration, (II) fine-tuned agents with improved performance, and (III) trust-calibrated orchestration enhanced by CLIP-based image retrieval and re-evaluation loops. Using confidence calibration metrics (ECE, OCR, CCC), the orchestrator modulates trust across agents. Our results demonstrate a 77.94% accuracy improvement in the zero-shot setting using trust-aware orchestration and RAG, achieving 85.63% overall. GPT-4o showed better calibration, while Qwen-2.5-VL displayed overconfidence. Furthermore, image-RAG grounded predictions in visually similar cases, enabling correction of agent overconfidence via iterative re-evaluation. The proposed system separates perception (vision agents) from meta-reasoning (orchestrator), enabling scalable and interpretable multi-agent AI. This blueprint is extensible to diagnostics, biology, and other trust-critical domains. All models, prompts, results, and system components, including the complete software source code, are openly released on GitHub to support reproducibility, transparency, and community benchmarking: https://github.com/Applied-AI-Research-Lab/Orchestrator-Agent-Trust
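
To make the trust-calibration idea concrete, here is a minimal sketch of the Expected Calibration Error (ECE) named in the abstract, using the standard equal-width binning formulation. The function name, the 10-bin scheme, and the toy data are illustrative assumptions, not the paper's implementation (see the linked repository for that).

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: the bin-mass-weighted gap between mean confidence
    and empirical accuracy, summed over equal-width confidence bins.

    confidences: predicted confidences in [0, 1]
    correct:     booleans, True where the prediction was right
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        avg_conf = confidences[mask].mean()   # mean stated confidence in bin
        accuracy = correct[mask].mean()       # fraction actually correct in bin
        ece += mask.mean() * abs(avg_conf - accuracy)  # weight by bin mass
    return ece

# Toy example: an overconfident agent (high confidence, mediocre accuracy)
conf = [0.95, 0.90, 0.92, 0.60, 0.88]
hits = [True, False, True, True, False]
print(f"ECE = {expected_calibration_error(conf, hits):.3f}")
```

A large ECE signals the kind of overconfidence the abstract attributes to Qwen-2.5-VL, which is what would prompt the orchestrator to discount that agent's self-reported confidence.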
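Likewise, a hedged sketch of CLIP-based image retrieval as described for the RAG module: embed labeled reference images once, then rank them by cosine similarity against a query leaf so the orchestrator can ground a re-evaluation in visually similar cases. The checkpoint ("openai/clip-vit-base-patch32"), the Hugging Face transformers API, the file names, and the `embed` helper are assumptions for illustration, not the authors' exact pipeline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    """Return L2-normalized CLIP image embeddings for a list of image paths."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Hypothetical labeled reference set; retrieval is by cosine similarity,
# which reduces to a dot product on the normalized embeddings.
reference_paths = ["scab_01.jpg", "rust_01.jpg", "healthy_01.jpg"]
index = embed(reference_paths)
query = embed(["query_leaf.jpg"])
scores = (query @ index.T).squeeze(0)
top = scores.topk(k=2)
for score, i in zip(top.values, top.indices):
    print(f"{reference_paths[i.item()]}: {score.item():.3f}")
```

The labels of the retrieved neighbors would then be handed back to the vision agents as evidence in the iterative re-evaluation loop the abstract describes.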