Animate Any Character in Any World
December 18, 2025
Authors: Yitong Wang, Fangyun Wei, Hongyang Zhang, Bo Dai, Yan Lu
cs.AI
Abstract
Recent advances in world models have greatly enhanced interactive environment simulation. Existing methods mainly fall into two categories: (1) static world generation models, which construct 3D environments without active agents, and (2) controllable-entity models, which allow a single entity to perform a limited set of actions in an otherwise uncontrollable environment. In this work, we introduce AniX, a framework that retains the realism and structural grounding of static world generation while extending controllable-entity models to support user-specified characters capable of open-ended actions. Users provide a 3D Gaussian Splatting (3DGS) scene and a character, then direct the character through natural-language instructions to perform diverse behaviors, from basic locomotion to object-centric interactions, while freely exploring the environment. We formulate this task as conditional autoregressive video generation: AniX synthesizes temporally coherent video clips that preserve visual fidelity to the provided scene and character. Built upon a pre-trained video generator, our training strategy significantly enhances motion dynamics while maintaining generalization across actions and characters. Our evaluation covers a broad range of aspects, including visual quality, character consistency, action controllability, and long-horizon coherence.
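The conditional autoregressive formulation described above can be sketched as a rollout loop: each short clip is generated conditioned on the scene, the character, the current language instruction, and context carried over from the previous clip (here, its last frame) to maintain long-horizon coherence. This is a minimal illustrative sketch, not the paper's implementation; `generate_clip` is a hypothetical stand-in for the actual conditional video generator.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    frames: list  # frame identifiers; a real system would hold tensors

def generate_clip(scene, character, instruction, context_frame):
    # Hypothetical stand-in for the conditional video generator:
    # it would condition on the 3DGS scene, the character, the
    # natural-language instruction, and the previous clip's tail
    # frame (the autoregressive context). Here we just fabricate
    # labeled frames so the control flow is visible.
    return Clip(frames=[f"{instruction}:{i}" for i in range(4)])

def animate(scene, character, instructions):
    """Autoregressive rollout: each clip is conditioned on the last
    frame of the clip before it, chaining short clips into a
    long-horizon, temporally coherent video."""
    clips, context = [], None
    for instr in instructions:
        clip = generate_clip(scene, character, instr, context)
        context = clip.frames[-1]  # carry context forward
        clips.append(clip)
    return clips

# Illustrative inputs; the scene/character names are placeholders.
video = animate("garden.3dgs", "hero", ["walk forward", "pick up the cup"])
```

The key design point this loop illustrates is that controllability comes from per-clip language conditioning, while coherence across clips comes from the autoregressive context, so arbitrarily long behavior sequences can be composed from short generated clips.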