AI Agent Behavioral Science

Abstract

Recent advances in large language models (LLMs) have enabled the development of AI agents that exhibit increasingly human-like behaviors, including planning, adaptation, and social dynamics, across diverse, interactive, and open-ended scenarios. These behaviors are not solely the product of the underlying models' internal architectures; they emerge when the models are integrated into agentic systems operating within specific contexts, where environmental factors, social cues, and interaction feedback shape behavior over time. This evolution necessitates a new scientific perspective: AI Agent Behavioral Science. Rather than focusing only on internal mechanisms, this perspective emphasizes the systematic observation of behavior, the design of interventions to test hypotheses, and theory-guided interpretation of how AI agents act, adapt, and interact over time. We systematize a growing body of research across individual-agent, multi-agent, and human-agent interaction settings, and further demonstrate how this perspective informs responsible AI by treating fairness, safety, interpretability, accountability, and privacy as behavioral properties. By unifying recent findings and laying out future directions, we position AI Agent Behavioral Science as a necessary complement to traditional model-centric approaches, providing essential tools for understanding, evaluating, and governing the real-world behavior of increasingly autonomous AI systems.