Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens

Abstract


Recent advances in large language models and their multimodal extensions have demonstrated the effectiveness of unifying generation and understanding through autoregressive next-token prediction. However, despite the critical role of 3D structural generation and understanding (3D GU) in AI for science, these tasks have largely evolved independently, and autoregressive methods for them remain underexplored. To bridge this gap, we introduce Uni-3DAR, a unified framework that integrates 3D GU tasks through autoregressive prediction. At its core, Uni-3DAR employs a novel hierarchical tokenization that compresses 3D space with an octree, exploiting the inherent sparsity of 3D structures. It then applies an additional tokenization for fine-grained structural details, capturing key attributes of microscopic 3D structures such as atom types and precise spatial coordinates. We further propose two optimizations to improve efficiency and effectiveness. The first is a two-level subtree compression strategy that shortens the octree token sequence by up to 8x. The second is a masked next-token prediction mechanism tailored to dynamically varying token positions, which significantly boosts model performance. By combining these strategies, Uni-3DAR unifies diverse 3D GU tasks within a single autoregressive framework. Extensive experiments on multiple microscopic 3D GU tasks, covering molecules, proteins, polymers, and crystals, validate its effectiveness and versatility. Notably, Uni-3DAR outperforms previous state-of-the-art diffusion models by a substantial margin, achieving up to a 256% relative improvement while delivering up to 21.8x faster inference. The code is publicly available at https://github.com/dptech-corp/Uni-3DAR.
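The abstract does not specify the exact token format, so the following is only a minimal sketch of why an octree exploits sparsity. It assumes a standard breadth-first octree occupancy encoding (a common scheme in point-cloud compression, not necessarily the one Uni-3DAR uses): each non-empty node emits one 8-bit token whose bit i marks whether child octant i contains any atom, and empty octants are never expanded. The function names `octree_occupancy_tokens` and `decode_octree_tokens` are hypothetical, and the sketch covers neither the two-level subtree compression nor the fine-grained atom-type and coordinate tokens.

```python
from collections import deque


def octree_occupancy_tokens(points, depth):
    """Hypothetical sketch: breadth-first octree tokenization of a sparse point set.

    Emits one 8-bit occupancy token per non-empty internal node; bit i is set
    when child octant i contains at least one point. Coordinates must lie in [0, 1).
    """
    n = 1 << depth  # number of leaf cells along each axis
    # Quantize points to integer leaf-cell coordinates; points in the same cell collapse.
    cells = {tuple(min(int(c * n), n - 1) for c in p) for p in points}

    tokens = []
    queue = deque([(0, cells)])  # (level, leaf cells contained in this node)
    while queue:
        level, members = queue.popleft()
        if level == depth:
            continue  # leaf cell: nothing left to subdivide
        shift = depth - level - 1  # coordinate bit that selects the child octant
        children = {}
        for x, y, z in members:
            octant = (((x >> shift) & 1) << 2) | (((y >> shift) & 1) << 1) | ((z >> shift) & 1)
            children.setdefault(octant, set()).add((x, y, z))
        code = 0
        for octant in sorted(children):  # fixed child order keeps decoding unambiguous
            code |= 1 << octant
            queue.append((level + 1, children[octant]))
        tokens.append(code)
    return tokens


def decode_octree_tokens(tokens, depth):
    """Inverse of octree_occupancy_tokens: recover the set of occupied leaf cells."""
    stream = iter(tokens)
    occupied = set()
    queue = deque([(0, (0, 0, 0))])  # (level, integer node coordinates at that level)
    while queue:
        level, (x, y, z) = queue.popleft()
        if level == depth:
            occupied.add((x, y, z))
            continue
        code = next(stream)
        for octant in range(8):  # same ascending order as the encoder
            if (code >> octant) & 1:
                child = (2 * x + ((octant >> 2) & 1),
                         2 * y + ((octant >> 1) & 1),
                         2 * z + (octant & 1))
                queue.append((level + 1, child))
    return occupied


atoms = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)]
tokens = octree_occupancy_tokens(atoms, depth=2)
print(tokens)  # [129, 1, 128]
print(sorted(decode_octree_tokens(tokens, depth=2)))  # [(0, 0, 0), (3, 3, 3)]
```

For two atoms at depth 2 the sequence has three tokens, whereas a dense 4×4×4 occupancy grid would need 64 entries; the gap widens with depth because empty octants contribute no tokens at all.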

