UniSHARP: Universal Sharp Monocular View Synthesis

June 5, 2026
Authors: Meixi Song, Dizhe Zhang, Hao Ren, Ruiyang Zhang, Bo Du, Ming-Hsuan Yang, Lu Qi
cs.AI

Abstract

In this work, we focus on extending SHARP, the popular photorealistic view synthesis method, for universal monocular rendering across a continuum of camera systems, from conventional perspective cameras to wide-field-of-view, fisheye and omnidirectional panoramic settings. To overcome the pinhole-specific assumptions of SHARP, our key idea is to align various images in a unified omnidirectional latent space. Thus, we propose UniSHARP, which performs implicit alignment in both feature and Gaussian spaces. Specifically, Gaussian primitives are arranged along rays and radial distances in a ray-based universal representation, while 2D semantic and 3D spatial features extracted from UniK3D-inspired encoders are jointly decoded to generate the complete Gaussian cloud. To comprehensively evaluate our method, we construct a benchmark covering diverse imaging systems across various scenes. The benchmark is further stratified by field of view (FoV) to enable fine-grained assessment of the universal monocular rendering task. Extensive experiments on the proposed benchmark demonstrate the effectiveness of UniSHARP, outperforming alternative methods by a large margin. The project page can be found at: https://insta360-research-team.github.io/Unisharp-website/
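
To make the idea of a ray-based representation concrete, here is a minimal sketch of how Gaussian centers could be placed along per-pixel rays at predicted radial distances, for both a pinhole and an equirectangular panoramic camera. It is an illustration only, not the UniSHARP implementation: the function names (`pinhole_rays`, `equirect_rays`, `gaussian_centers`), the axis convention (y up, z forward), and the use of a single Gaussian per pixel are all assumptions made for this example.

```python
import numpy as np

def pinhole_rays(height, width, focal):
    """Unit ray directions for an ideal pinhole camera (hypothetical helper).

    Assumes the principal point is at the image center and `focal` is in pixels.
    """
    xs = np.arange(width) + 0.5 - width / 2.0
    ys = np.arange(height) + 0.5 - height / 2.0
    x, y = np.meshgrid(xs, ys)  # both (H, W)
    dirs = np.stack([x, -y, np.full_like(x, focal)], axis=-1)
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

def equirect_rays(height, width):
    """Unit ray directions for an equirectangular panorama (hypothetical helper).

    Columns map to longitude in [-pi, pi), rows to latitude in (pi/2, -pi/2).
    """
    u = (np.arange(width) + 0.5) / width
    v = (np.arange(height) + 0.5) / height
    lon, lat = np.meshgrid((u - 0.5) * 2.0 * np.pi, (0.5 - v) * np.pi)
    return np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )

def gaussian_centers(dirs, radial_distance):
    """Place one Gaussian center per pixel at `radial_distance` along its ray.

    Using radial distance (not z-depth) keeps the formula identical for every
    camera model, including rays that point sideways or backwards.
    """
    return dirs * radial_distance[..., None]

if __name__ == "__main__":
    # Placeholder distances; in a real system a network would predict these.
    r = np.full((4, 8), 2.5)
    print(gaussian_centers(equirect_rays(4, 8), r).shape)
    print(gaussian_centers(pinhole_rays(4, 8, focal=10.0), r).shape)
```

The point of the sketch is that only the ray-generation step depends on the camera model; once each pixel has a unit direction, the center placement is the same for perspective, fisheye, or panoramic inputs. The remaining Gaussian parameters (scale, rotation, opacity, color) are omitted here.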