From Features to Actions: Explainability in Traditional and Agentic AI Systems

Abstract

Over the last decade, explainable AI has primarily focused on interpreting individual model predictions, producing post-hoc explanations that relate inputs to outputs under a fixed decision structure. Recent advances in large language models (LLMs) have enabled agentic AI systems whose behaviour unfolds over multi-step trajectories. In these settings, success and failure are determined by sequences of decisions rather than a single output. Although explanation approaches designed for static predictions are useful in that setting, it remains unclear how they translate to agentic settings where behaviour emerges over time. In this work, we bridge the gap between static and agentic explainability by comparing attribution-based explanations with trace-based diagnostics across both settings. To make this distinction explicit, we empirically compare attribution-based explanations used in static classification tasks with trace-based diagnostics used in agentic benchmarks (TAU-bench Airline and AssistantBench). Our results show that while attribution methods achieve stable feature rankings in static settings (Spearman ρ = 0.86), they cannot be applied reliably to diagnose execution-level failures in agentic trajectories. In contrast, trace-grounded rubric evaluation for agentic settings consistently localizes behaviour breakdowns and reveals that state tracking inconsistency is 2.7 times more prevalent in failed runs and reduces success probability by 49%. These findings motivate a shift towards trajectory-level explainability for agentic systems when evaluating and diagnosing autonomous AI behaviour. Resources: https://github.com/VectorInstitute/unified-xai-evaluation-framework https://vectorinstitute.github.io/unified-xai-evaluation-framework
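The three kinds of quantity reported in the abstract (a Spearman rank correlation between feature rankings, a prevalence ratio between failed and successful runs, and a reduction in success probability) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the authors' framework: the function names, the run record layout (`success`, `flags`), the flag name `state_tracking_inconsistency`, and the toy numbers are all hypothetical. In particular, the sketch reads "reduces success probability" as a relative drop in success rate between runs with and without the flag, which may differ from the definition used in the paper.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman_rho(a, b):
    """Spearman correlation, computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def rate(items, predicate):
    """Fraction of items for which predicate(item) is true."""
    return sum(1 for item in items if predicate(item)) / len(items)


def prevalence_ratio(runs, flag):
    """How many times more often `flag` occurs in failed runs than in successful ones."""
    failed = [r for r in runs if not r["success"]]
    passed = [r for r in runs if r["success"]]
    return rate(failed, lambda r: flag in r["flags"]) / rate(passed, lambda r: flag in r["flags"])


def relative_success_drop(runs, flag):
    """Relative reduction in success rate for runs with `flag` versus runs without it."""
    with_flag = [r for r in runs if flag in r["flags"]]
    without_flag = [r for r in runs if flag not in r["flags"]]
    return 1 - rate(with_flag, lambda r: r["success"]) / rate(without_flag, lambda r: r["success"])


# Toy data only: two hypothetical attribution vectors over the same five features.
attributions_run_1 = [0.40, 0.10, 0.25, 0.05, 0.20]
attributions_run_2 = [0.35, 0.05, 0.30, 0.10, 0.20]

# Toy data only: hypothetical rubric flags attached to eight agent runs.
FLAG = "state_tracking_inconsistency"
toy_runs = [
    {"success": False, "flags": {FLAG}},
    {"success": False, "flags": {FLAG}},
    {"success": False, "flags": {FLAG}},
    {"success": False, "flags": set()},
    {"success": True, "flags": {FLAG}},
    {"success": True, "flags": set()},
    {"success": True, "flags": set()},
    {"success": True, "flags": set()},
]

if __name__ == "__main__":
    print("Spearman rho:", spearman_rho(attributions_run_1, attributions_run_2))
    print("Prevalence ratio:", prevalence_ratio(toy_runs, FLAG))
    print("Relative success drop:", relative_success_drop(toy_runs, FLAG))
```

The contrast the sketch is meant to show is that the first statistic compares per-feature scores for a single prediction, while the other two are computed over labelled properties of whole trajectories.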

