Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning

Abstract

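Modern artificial intelligence (AI) increasingly relies on multi-agent architectures that combine visual and language understanding. Yet a pressing challenge remains: how can we trust these agents, especially in zero-shot settings with no fine-tuning? We introduce a novel modular agentic AI visual classification framework that integrates generalist multimodal agents with a non-visual reasoning orchestrator and a Retrieval-Augmented Generation (RAG) module. Applying the framework to apple leaf disease diagnosis, we benchmark three configurations: (I) zero-shot with confidence-based orchestration, (II) fine-tuned agents with improved performance, and (III) trust-calibrated orchestration enhanced by CLIP-based image retrieval and re-evaluation loops. The orchestrator uses confidence calibration metrics (ECE, OCR, CCC) to modulate trust across agents. Our results demonstrate a 77.94% accuracy improvement in the zero-shot setting using trust-aware orchestration and RAG, achieving 85.63% overall. GPT-4o showed better calibration, while Qwen-2.5-VL displayed overconfidence. Furthermore, image-RAG grounded predictions in visually similar cases, enabling correction of agent overconfidence via iterative re-evaluation. The proposed system separates perception (vision agents) from meta-reasoning (orchestrator), enabling scalable and interpretable multi-agent AI. This blueprint is extensible to diagnostics, biology, and other trust-critical domains. All models, prompts, results, and system components, including the complete software source code, are openly released on GitHub to support reproducibility, transparency, and community benchmarking: https://github.com/Applied-AI-Research-Lab/Orchestrator-Agent-Trust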
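To make the calibration idea concrete, the following is a minimal sketch of ECE, taken here to mean expected calibration error in its standard equal-width-bin form. The abstract does not say how the authors compute it, so the function name, the bin count, and the sample data are all illustrative assumptions and are not taken from the released repository.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: 10 equal-width bins, a common default
) -> float:
    """Standard binned ECE: weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    conf_sum = [0.0] * n_bins  # sum of reported confidences per bin
    hit_sum = [0] * n_bins     # number of correct predictions per bin
    count = [0] * n_bins       # number of predictions per bin

    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; clamp so 1.0 lands in the last bin.
        idx = max(0, min(int(conf * n_bins), n_bins - 1))
        conf_sum[idx] += conf
        hit_sum[idx] += 1 if ok else 0
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        accuracy = hit_sum[b] / count[b]
        mean_conf = conf_sum[b] / count[b]
        ece += (count[b] / total) * abs(accuracy - mean_conf)
    return ece


# Hypothetical agent output: four predictions at 0.9 confidence, three correct.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

In this made-up case the agent reports 0.9 confidence but is right 75% of the time, so the gap of roughly 0.15 is the kind of overconfidence signal an orchestrator could use to down-weight that agent.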
