Towards Interactive Intelligence for Digital Humans

Abstract

We introduce Interactive Intelligence, a novel paradigm for digital humans capable of personality-aligned expression, adaptive interaction, and self-evolution. To realize it, we present Mio (Multimodal Interactive Omni-Avatar), an end-to-end framework composed of five specialized modules: Thinker, Talker, Face Animator, Body Animator, and Renderer. This unified architecture integrates cognitive reasoning with real-time multimodal embodiment to enable fluid, consistent interaction. Furthermore, we establish a new benchmark to rigorously evaluate the capabilities of interactive intelligence. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods across all evaluated dimensions. Together, these contributions move digital humans beyond superficial imitation toward intelligent interaction.
