

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

December 24, 2023
Authors: Christian Simon, Sen He, Juan-Manuel Perez-Rua, Frost Xu, Amine Benhalloum, Tao Xiang
cs.AI

Abstract

Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods addressing it through diffusion models still rely on scene-specific optimization, constraining their generalization capability. To overcome the limitations of existing approaches regarding generalization and consistency, we introduce a novel neural rendering technique. Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks. Specifically, our method builds neural encoding volumes from generated multi-view inputs. We adjust the weights of the SDF network conditioned on an input image at test time, allowing the model to adapt to novel scenes in a feed-forward manner via HyperNetworks. To mitigate artifacts derived from the synthesized views, we propose the use of a volume transformer module to improve the aggregation of image features instead of processing each viewpoint separately. Through our proposed method, dubbed Hyper-VolTran, we avoid the bottleneck of scene-specific optimization and maintain consistency across the images generated from multiple viewpoints. Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
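The abstract's central mechanism is feed-forward adaptation: a HyperNetwork maps an image embedding to the weights of the SDF network, so a new scene needs a single forward pass rather than per-scene optimization. Below is a minimal sketch of this idea, not the authors' implementation; the names (`SDFHyperNetwork`, `embed_dim`, the tiny two-layer SDF MLP) are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): a HyperNetwork predicts
# the weights of a small SDF MLP from an image embedding, so adaptation
# to a novel scene is one feed-forward pass with no gradient steps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDFHyperNetwork(nn.Module):
    """Predicts weights of a 2-layer SDF MLP (3 -> hidden -> 1)."""
    def __init__(self, embed_dim: int = 256, hidden: int = 64):
        super().__init__()
        self.hidden = hidden
        # Parameter counts for each weight/bias of the target SDF MLP.
        self.n_w1, self.n_b1 = 3 * hidden, hidden
        self.n_w2, self.n_b2 = hidden * 1, 1
        n_params = self.n_w1 + self.n_b1 + self.n_w2 + self.n_b2
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 512), nn.ReLU(),
            nn.Linear(512, n_params),
        )

    def forward(self, image_embedding: torch.Tensor,
                points: torch.Tensor) -> torch.Tensor:
        """image_embedding: (embed_dim,), points: (N, 3) -> SDF (N, 1)."""
        theta = self.head(image_embedding)
        w1, b1, w2, b2 = torch.split(
            theta, [self.n_w1, self.n_b1, self.n_w2, self.n_b2])
        w1 = w1.view(self.hidden, 3)
        w2 = w2.view(1, self.hidden)
        h = F.relu(F.linear(points, w1, b1))  # (N, hidden)
        return F.linear(h, w2, b2)            # (N, 1) signed distances

# One forward pass conditions the SDF on the input image's embedding.
hyper = SDFHyperNetwork()
sdf = hyper(torch.randn(256), torch.rand(1024, 3))  # -> (1024, 1)
```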
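The volume transformer idea, letting features from the synthesized views attend to one another rather than being pooled per view, can be sketched in the same spirit. `ViewAggregator`, the layer sizes, and the tensor shapes below are assumptions, not the paper's module.

```python
# Sketch (assumed): self-attention across V views for each sample
# position, as a stand-in for the paper's volume transformer module.
import torch
import torch.nn as nn

class ViewAggregator(nn.Module):
    def __init__(self, feat_dim: int = 32, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        """view_feats: (V, P, C) features of P sample points from V views."""
        p = view_feats.permute(1, 0, 2)  # (P, V, C): batch over positions
        fused = self.encoder(p)          # views attend to each other
        return fused.mean(dim=1)         # (P, C) aggregated feature

agg = ViewAggregator()
fused = agg(torch.randn(8, 4096, 32))    # 8 synthesized views -> (4096, 32)
```

Joint attention over views lets the aggregation down-weight artifacts in any single synthesized view, which per-view pooling cannot do.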