Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks
December 24, 2023
Authors: Christian Simon, Sen He, Juan-Manuel Perez-Rua, Frost Xu, Amine Benhalloum, Tao Xiang
cs.AI
Abstract
Solving image-to-3D from a single view is an ill-posed problem, and current
neural reconstruction methods addressing it through diffusion models still rely
on scene-specific optimization, constraining their generalization capability.
To overcome the limitations of existing approaches regarding generalization and
consistency, we introduce a novel neural rendering technique. Our approach
employs the signed distance function as the surface representation and
incorporates generalizable priors through geometry-encoding volumes and
HyperNetworks. Specifically, our method builds neural encoding volumes from
generated multi-view inputs. We adjust the weights of the SDF network
conditioned on an input image at test-time to allow model adaptation to novel
scenes in a feed-forward manner via HyperNetworks. To mitigate artifacts
derived from the synthesized views, we propose the use of a volume transformer
module to improve the aggregation of image features instead of processing each
viewpoint separately. Through our proposed method, dubbed Hyper-VolTran, we
avoid the bottleneck of scene-specific optimization and maintain consistency
across the images generated from multiple viewpoints. Our experiments show the
advantages of our proposed approach with consistent results and rapid
generation.
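
The core idea of conditioning the SDF network on the input image via a HyperNetwork can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; all layer sizes, the `HyperSDF` class, and the global image embedding are assumptions chosen for illustration. It shows how a HyperNetwork can emit the weights of a small SDF MLP in a single feed-forward pass, so a new scene needs no per-scene optimization.

```python
import torch
import torch.nn as nn

class HyperSDF(nn.Module):
    """Sketch: a HyperNetwork predicts the parameters of a tiny SDF MLP
    from a global image embedding, so adapting to a new scene is one
    forward pass rather than a per-scene optimization loop.
    Layer sizes are illustrative, not the paper's configuration."""

    def __init__(self, embed_dim=256, hidden_dim=64):
        super().__init__()
        self.hidden_dim = hidden_dim
        # Number of parameters of the target SDF MLP:
        # Linear(3 -> hidden) plus Linear(hidden -> 1).
        n_params = (3 * hidden_dim + hidden_dim) + (hidden_dim + 1)
        self.hyper = nn.Sequential(
            nn.Linear(embed_dim, 512), nn.ReLU(),
            nn.Linear(512, n_params),
        )

    def forward(self, image_embed, points):
        # image_embed: (embed_dim,) feature of the input view (assumed given)
        # points: (N, 3) query coordinates; returns (N, 1) signed distances
        h = self.hidden_dim
        params = self.hyper(image_embed)
        w1, params = params[: 3 * h].view(h, 3), params[3 * h:]
        b1, params = params[:h], params[h:]
        w2, b2 = params[:h].view(1, h), params[h:]
        x = torch.relu(points @ w1.t() + b1)
        return x @ w2.t() + b2


# Usage: the predicted SDF weights are conditioned on the (stand-in) image embedding.
sdf = HyperSDF()
image_embed = torch.randn(256)          # placeholder for an image encoder feature
points = torch.rand(1024, 3) * 2 - 1    # query points in [-1, 1]^3
distances = sdf(image_embed, points)    # (1024, 1) signed distances
```

In the full method, the embedding would come from the input image's encoder features and the predicted SDF would be queried against the neural encoding volume built from the generated multi-view images; this sketch only isolates the feed-forward weight-prediction step.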