Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens
March 20, 2025
Authors: Shuqi Lu, Haowei Lin, Lin Yao, Zhifeng Gao, Xiaohong Ji, Weinan E, Linfeng Zhang, Guolin Ke
cs.AI
Abstract
Recent advancements in large language models and their multi-modal extensions
have demonstrated the effectiveness of unifying generation and understanding
through autoregressive next-token prediction. However, despite the critical
role of 3D structural generation and understanding (3D GU) in AI for science,
these tasks have largely evolved independently, with autoregressive methods
remaining underexplored. To bridge this gap, we introduce Uni-3DAR, a unified
framework that seamlessly integrates 3D GU tasks via autoregressive
prediction. At its core, Uni-3DAR employs a novel hierarchical tokenization
that compresses 3D space using an octree, leveraging the inherent sparsity of
3D structures. It then applies an additional tokenization for fine-grained
structural details, capturing key attributes such as atom types and precise
spatial coordinates in microscopic 3D structures. We further propose two
optimizations to enhance efficiency and effectiveness. The first is a two-level
subtree compression strategy, which reduces the octree token sequence by up to
8x. The second is a masked next-token prediction mechanism tailored for
dynamically varying token positions, significantly boosting model performance.
By combining these strategies, Uni-3DAR successfully unifies diverse 3D GU
tasks within a single autoregressive framework. Extensive experiments across
multiple microscopic 3D GU tasks, including molecules, proteins, polymers,
and crystals, validate its effectiveness and versatility. Notably, Uni-3DAR
surpasses previous state-of-the-art diffusion models by a substantial margin,
achieving up to 256% relative improvement while delivering inference speeds up
to 21.8x faster. The code is publicly available at
https://github.com/dptech-corp/Uni-3DAR.
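
To make the octree idea in the abstract concrete, here is a minimal sketch of how sparse 3D points could be turned into a short token sequence. It is an illustration under stated assumptions, not the Uni-3DAR implementation: the function name `octree_tokens`, the unit-cube normalization, the breadth-first ordering, and the choice of one 8-bit child-occupancy mask per occupied cell are all hypothetical, and the paper's two-level subtree compression and masked next-token prediction are not reproduced here.

```python
from collections import deque


def octree_tokens(points, depth):
    """Serialize points in the unit cube [0, 1)^3 as breadth-first octree tokens.

    Hypothetical scheme (an assumption, not the paper's tokenizer): each token
    is an 8-bit mask (1..255) marking which of a cell's eight children contain
    at least one point. Empty cells are never expanded, so the sequence length
    grows with the number of occupied cells rather than with the grid volume.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    res = 1 << depth  # finest-level cells per axis
    # Quantize every point to integer cell coordinates at the finest level.
    cells = {tuple(min(int(c * res), res - 1) for c in p) for p in points}
    tokens = []
    # Queue entries: (level, cell origin in finest-level units, occupied cells).
    queue = deque([(0, (0, 0, 0), cells)])
    while queue:
        level, origin, occupied = queue.popleft()
        half = res >> (level + 1)  # child edge length in finest-level cells
        children = {}
        for cell in occupied:
            # Child index uses one bit per axis: set if the cell is in the upper half.
            idx = sum(((cell[a] - origin[a]) >= half) << a for a in range(3))
            children.setdefault(idx, set()).add(cell)
        # One token per occupied cell: the occupancy mask of its eight children.
        tokens.append(sum(1 << idx for idx in children))
        if level + 1 < depth:
            for idx in sorted(children):
                child_origin = tuple(
                    origin[a] + half * ((idx >> a) & 1) for a in range(3)
                )
                queue.append((level + 1, child_origin, children[idx]))
    return tokens


if __name__ == "__main__":
    # Two points in opposite corners of the cube, octree depth 2 (a 4x4x4 grid).
    print(octree_tokens([(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)], depth=2))
```

In this toy case the 4x4x4 grid has 64 cells, yet the two points are described by three tokens, because only occupied cells are expanded. Packing the eight child-occupancy flags of a cell into a single token is one simple way a sequence can shrink by a factor of eight compared with emitting one token per child; whether this matches the paper's subtree compression is an assumption.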