Learning to Refocus with Video Diffusion Models
December 22, 2025
Authors: SaiKiran Tedla, Zhoutong Zhang, Xuaner Zhang, Shumian Xin
cs.AI
Abstract
Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, our approach generates a perceptually accurate focal stack, represented as a video sequence, enabling interactive refocusing and unlocking a range of downstream applications. We release a large-scale focal stack dataset acquired under diverse real-world smartphone conditions to support this work and future research. Our method consistently outperforms existing approaches in both perceptual quality and robustness across challenging scenarios, paving the way for more advanced focus-editing capabilities in everyday photography. Code and data are available at www.learn2refocus.github.io.
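To make the "interactive refocusing" use of a focal stack concrete, the sketch below shows one simple way an application could exploit a stack like the one the paper generates: given the stack as a list of decoded video frames, pick the frame that is sharpest around a user-selected pixel using a variance-of-Laplacian sharpness proxy. This is a generic illustration under assumed inputs (NumPy/OpenCV frame arrays, a `refocus_from_stack` helper defined here), not the authors' implementation or any part of their released code.

```python
import cv2
import numpy as np


def refocus_from_stack(frames, x, y, patch=15):
    """Pick the focal-stack frame that is sharpest around pixel (x, y).

    frames: list of HxWx3 uint8 images (e.g. the decoded frames of a
            generated focal-stack video), ordered from near to far focus.
    x, y:   pixel coordinates the user clicked on.
    patch:  half-size of the window used to measure local sharpness.
    """
    best_idx, best_score = 0, -1.0
    for i, frame in enumerate(frames):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        h, w = gray.shape
        y0, y1 = max(0, y - patch), min(h, y + patch + 1)
        x0, x1 = max(0, x - patch), min(w, x + patch + 1)
        # Variance of the Laplacian is a common local sharpness proxy:
        # in-focus patches have stronger high-frequency content.
        score = cv2.Laplacian(gray[y0:y1, x0:x1], cv2.CV_64F).var()
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, frames[best_idx]


if __name__ == "__main__":
    # Hypothetical usage: read a focal-stack video and refocus on a click point.
    cap = cv2.VideoCapture("focal_stack.mp4")  # assumed filename
    frames = []
    ok, frame = cap.read()
    while ok:
        frames.append(frame)
        ok, frame = cap.read()
    cap.release()
    idx, refocused = refocus_from_stack(frames, x=640, y=360)
    cv2.imwrite("refocused.png", refocused)
```

In practice a refocusing UI might blend neighboring frames or use a depth-guided selection instead of a single per-click frame, but frame selection over the generated stack conveys the basic interaction the abstract describes.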