Learning to Refocus with Video Diffusion Models

Abstract

Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, our approach generates a perceptually accurate focal stack, represented as a video sequence, enabling interactive refocusing and unlocking a range of downstream applications. We release a large-scale focal stack dataset acquired under diverse real-world smartphone conditions to support this work and future research. Our method consistently outperforms existing approaches in both perceptual quality and robustness across challenging scenarios, paving the way for more advanced focus-editing capabilities in everyday photography. Code and data are available at www.learn2refocus.github.io.
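
To illustrate how a focal stack stored as a video sequence can support interactive refocusing, the following is a minimal sketch and not the method described here. It assumes the stack is already available as a NumPy array of grayscale frames with shape (N, H, W), and it picks the frame that is sharpest around a tapped pixel using the variance of a discrete Laplacian, a common sharpness heuristic. The function names `local_sharpness` and `pick_focus_index` and the `radius` parameter are hypothetical and introduced only for this example.

```python
import numpy as np


def local_sharpness(frame, y, x, radius=8):
    """Variance of a 4-neighbour Laplacian in a window around (y, x).

    Assumes `frame` is a 2D float array at least 3 pixels on each side.
    """
    h, w = frame.shape
    # Clamp the window so every pixel in it has all four neighbours.
    y0, y1 = max(y - radius, 1), min(y + radius + 1, h - 1)
    x0, x1 = max(x - radius, 1), min(x + radius + 1, w - 1)
    center = frame[y0:y1, x0:x1]
    lap = (
        frame[y0 - 1:y1 - 1, x0:x1]
        + frame[y0 + 1:y1 + 1, x0:x1]
        + frame[y0:y1, x0 - 1:x1 - 1]
        + frame[y0:y1, x0 + 1:x1 + 1]
        - 4.0 * center
    )
    return float(lap.var())


def pick_focus_index(stack, y, x, radius=8):
    """Return the index of the frame that is sharpest near pixel (y, x).

    `stack` is assumed to have shape (N, H, W): one frame per focus position.
    """
    scores = [local_sharpness(frame, y, x, radius) for frame in stack]
    return int(np.argmax(scores))
```

In an interactive viewer, a tap at (y, x) would call `pick_focus_index(stack, y, x)` and display the returned frame. A larger `radius` makes the choice less sensitive to noise but less precise near depth boundaries.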