TextArena

Abstract

TextArena is an open-source collection of competitive text-based games for training and evaluating agentic behavior in Large Language Models (LLMs). It spans 57+ unique environments (including single-player, two-player, and multi-player setups) and allows for easy evaluation of model capabilities via an online-play system (against humans and other submitted models) with real-time TrueSkill scores. Traditional benchmarks rarely assess dynamic social skills such as negotiation, theory of mind, and deception, leaving a gap that TextArena addresses. Designed with research, community, and extensibility in mind, TextArena emphasizes ease of adding new games, adapting the framework, testing models, playing against models, and training models. Detailed documentation of environments, games, the leaderboard, and examples is available at https://github.com/LeonGuertler/TextArena and https://www.textarena.ai/.