UniMesh: Unifying 3D Mesh Understanding and Generation

Abstract

Recent advances in 3D vision have led to specialized models for either 3D understanding (e.g., shape classification, segmentation, reconstruction) or 3D generation (e.g., synthesis, completion, editing). However, these tasks are often tackled in isolation, resulting in fragmented architectures and representations that hinder knowledge transfer and holistic scene modeling. To address these challenges, we propose UniMesh, a unified framework that jointly learns 3D generation and understanding within a single architecture. First, we introduce a novel Mesh Head that acts as a cross-model interface, bridging diffusion-based image generation with implicit shape decoders. Second, we develop Chain of Mesh (CoM), a geometric instantiation of iterative reasoning that enables user-driven semantic mesh editing through a closed-loop latent, prompting, and re-generation cycle. Third, we incorporate a self-reflection mechanism based on an Actor-Evaluator-Self-reflection triad to diagnose and correct failures in high-level tasks such as 3D captioning. Experimental results demonstrate that UniMesh not only achieves competitive performance on standard benchmarks but also unlocks novel capabilities in iterative editing and mutual enhancement between generation and understanding. Code: https://github.com/AIGeeksGroup/UniMesh. Website: https://aigeeksgroup.github.io/UniMesh.
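
The Actor-Evaluator-Self-reflection triad mentioned above can be read as a generic retry loop: an actor proposes an output, an evaluator judges it, and a reflection step turns the evaluator's feedback into a note that conditions the next attempt. The following is a minimal sketch of that control flow only, assuming toy string-based stand-ins; `ReflectionLoop`, `toy_actor`, `toy_evaluator`, and `toy_reflect` are hypothetical names introduced for illustration and are not taken from the UniMesh codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ReflectionLoop:
    """Generic actor / evaluator / self-reflection loop (illustrative only)."""
    actor: Callable[[str, List[str]], str]              # (task, notes) -> candidate output
    evaluator: Callable[[str, str], Tuple[bool, str]]   # (task, output) -> (ok, feedback)
    reflect: Callable[[str, str], str]                  # (output, feedback) -> note for next try
    max_rounds: int = 3
    notes: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        output = ""
        for _ in range(self.max_rounds):
            output = self.actor(task, self.notes)
            ok, feedback = self.evaluator(task, output)
            if ok:
                break
            # Store the reflection so the next attempt can use it.
            self.notes.append(self.reflect(output, feedback))
        return output


# Toy stand-ins (assumptions): a real system would call learned models here.
def toy_actor(task: str, notes: List[str]) -> str:
    caption = "a chair"
    for note in notes:
        caption += " with " + note
    return caption


def toy_evaluator(task: str, output: str) -> Tuple[bool, str]:
    required = "four legs"
    if required in output:
        return True, ""
    return False, "missing: " + required


def toy_reflect(output: str, feedback: str) -> str:
    # Turn the evaluator's complaint into a hint for the actor.
    return feedback.split(": ", 1)[1]


loop = ReflectionLoop(toy_actor, toy_evaluator, toy_reflect)
caption = loop.run("caption this mesh")
print(caption)
```

In this toy run the first caption fails the evaluator's check, the reflection step records the missing detail, and the second attempt passes. The `max_rounds` cap is a design choice of the sketch so that the loop terminates even when the evaluator is never satisfied.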