AI Agent Behavioral Science

Abstract

Recent advances in large language models (LLMs) have enabled the development of AI agents that exhibit increasingly human-like behaviors, including planning, adaptation, and social dynamics, across diverse, interactive, and open-ended scenarios. These behaviors are not solely the product of the internal architectures of the underlying models; they emerge when those models are integrated into agentic systems operating within specific contexts, where environmental factors, social cues, and interaction feedback shape behavior over time. This evolution calls for a new scientific perspective: AI Agent Behavioral Science. Rather than focusing only on internal mechanisms, this perspective emphasizes the systematic observation of behavior, the design of interventions to test hypotheses, and the theory-guided interpretation of how AI agents act, adapt, and interact over time. We systematize a growing body of research across individual-agent, multi-agent, and human-agent interaction settings, and further demonstrate how this perspective informs responsible AI by treating fairness, safety, interpretability, accountability, and privacy as behavioral properties. By unifying recent findings and laying out future directions, we position AI Agent Behavioral Science as a necessary complement to traditional model-centric approaches, providing essential tools for understanding, evaluating, and governing the real-world behavior of increasingly autonomous AI systems.