Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning

Abstract

Modern Artificial Intelligence (AI) increasingly relies on multi-agent architectures that blend visual and language understanding. Yet a pressing challenge remains: how can we trust these agents, especially in zero-shot settings with no fine-tuning? We introduce a novel modular Agentic AI visual classification framework that integrates generalist multimodal agents with a non-visual reasoning orchestrator and a Retrieval-Augmented Generation (RAG) module. We apply the framework to apple leaf disease diagnosis and benchmark three configurations: (I) zero-shot with confidence-based orchestration, (II) fine-tuned agents with improved performance, and (III) trust-calibrated orchestration enhanced by CLIP-based image retrieval and re-evaluation loops. Using confidence calibration metrics (ECE, OCR, CCC), the orchestrator modulates trust across agents. Our results demonstrate a 77.94% accuracy improvement in the zero-shot setting using trust-aware orchestration and RAG, achieving 85.63% overall. GPT-4o showed better calibration, while Qwen-2.5-VL displayed overconfidence. Furthermore, image-RAG grounded predictions in visually similar cases, enabling correction of agent overconfidence via iterative re-evaluation. The proposed system separates perception (vision agents) from meta-reasoning (orchestrator), enabling scalable and interpretable multi-agent AI. This blueprint is extensible to diagnostics, biology, and other trust-critical domains. All models, prompts, results, and system components, including the complete software source code, are openly released on GitHub to support reproducibility, transparency, and community benchmarking: https://github.com/Applied-AI-Research-Lab/Orchestrator-Agent-Trust
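
To make the calibration idea concrete, the following is a minimal sketch of the standard binned Expected Calibration Error (ECE), followed by one possible way an orchestrator could use per-agent trust weights when combining predictions. It is not taken from the released repository: the function names, the agent names, the class labels, and the rule of scaling each agent's confidence by a trust weight are all assumptions made for illustration, and the trust weights are simply given as inputs rather than derived the way the paper derives them.

```python
from collections import defaultdict


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group predictions into equal-width confidence bins over [0, 1].
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, 1.0 if ok else 0.0))

    ece = 0.0
    for items in bins.values():
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(a for _, a in items) / len(items)
        ece += (len(items) / n) * abs(accuracy - avg_conf)
    return ece


def trust_weighted_vote(predictions, trust):
    """Hypothetical aggregation rule (an assumption, not the paper's method).

    predictions: {agent_name: (label, self_reported_confidence)}
    trust:       {agent_name: trust weight in [0, 1]}
    Returns the label with the largest sum of trust * confidence.
    """
    scores = defaultdict(float)
    for agent, (label, conf) in predictions.items():
        scores[label] += trust[agent] * conf
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Illustrative values only; agent names and labels are made up.
    ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, True, True, False])
    print(f"ECE = {ece:.3f}")

    preds = {"agent_a": ("healthy", 0.99), "agent_b": ("scab", 0.70)}
    weights = {"agent_a": 0.3, "agent_b": 0.9}
    print(trust_weighted_vote(preds, weights))
```

In this sketch an overconfident agent with a low trust weight can be outvoted by a less confident but better-calibrated one, which is the qualitative behaviour the abstract attributes to trust-aware orchestration.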
