MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

Abstract

Open-world 3D reconstruction models have recently garnered significant attention. However, without sufficient 3D inductive bias, existing methods typically incur high training costs and struggle to extract high-quality 3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. Specifically, instead of using a triplane representation, we store features in 3D sparse voxels and combine transformers with 3D convolutions to leverage an explicit 3D structure and projective bias. In addition to sparse-view RGB input, we require the network to take normal maps as input and to generate corresponding normal maps. The input normal maps can be predicted by 2D diffusion models, which significantly aids in guiding and refining the learned geometry. Moreover, by combining Signed Distance Function (SDF) supervision with surface rendering, we directly learn to generate high-quality meshes without the need for complex multi-stage training processes. By incorporating these explicit 3D biases, MeshFormer can be trained efficiently and delivers high-quality textured meshes with fine-grained geometric details. It can also be integrated with 2D diffusion models to enable fast single-image-to-3D and text-to-3D tasks. Project page: https://meshformer3d.github.io
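
As a rough illustration of two terms used above, a signed distance function and a sparse voxel grid, the following sketch evaluates the SDF of a sphere on a small regular grid and keeps only the voxels that lie close to the surface. This is not MeshFormer's implementation: the sphere, the grid resolution, the band width, and the function names `sphere_sdf` and `sparse_voxels_near_surface` are all assumptions chosen for illustration.

```python
import math


def sphere_sdf(x, y, z, radius=0.5):
    """Signed distance to a sphere centred at the origin.

    Negative inside the sphere, zero on the surface, positive outside.
    """
    return math.sqrt(x * x + y * y + z * z) - radius


def sparse_voxels_near_surface(resolution=16, band=0.1, radius=0.5):
    """Return {(i, j, k): sdf} for voxel centres within `band` of the surface.

    The grid spans [-1, 1]^3. Voxels far from the surface are simply not
    stored, which is what makes the representation sparse.
    """
    step = 2.0 / resolution
    voxels = {}
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                # Coordinates of the voxel centre in [-1, 1].
                x = -1.0 + (i + 0.5) * step
                y = -1.0 + (j + 0.5) * step
                z = -1.0 + (k + 0.5) * step
                d = sphere_sdf(x, y, z, radius)
                if abs(d) < band:
                    voxels[(i, j, k)] = d
    return voxels


if __name__ == "__main__":
    voxels = sparse_voxels_near_surface()
    print(f"kept {len(voxels)} of {16 ** 3} voxels near the surface")
```

In a learned model, each stored voxel would typically carry a feature vector rather than a single distance value; the dictionary keyed by integer coordinates is only a stand-in for a proper sparse tensor structure.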
