Efficient Hybrid Zoom using Camera Fusion on Mobile Phones
January 2, 2024
Authors: Xiaotong Wu, Wei-Sheng Lai, YiChang Shih, Charles Herrmann, Michael Krainin, Deqing Sun, Chia-Kai Liang
cs.AI
Abstract
DSLR cameras can achieve multiple zoom levels via shifting lens distances or
swapping lens types. However, these techniques are not possible on smartphone
devices due to space constraints. Most smartphone manufacturers adopt a hybrid
zoom system: commonly a Wide (W) camera at a low zoom level and a Telephoto (T)
camera at a high zoom level. To simulate zoom levels between W and T, these
systems crop and digitally upsample images from W, leading to significant
detail loss. In this paper, we propose an efficient system for hybrid zoom
super-resolution on mobile devices, which captures a synchronous pair of W and
T shots and leverages machine learning models to align and transfer details
from T to W. We further develop an adaptive blending method that accounts for
depth-of-field mismatches, scene occlusion, flow uncertainty, and alignment
errors. To minimize the domain gap, we design a dual-phone camera rig to
capture real-world inputs and ground-truths for supervised training. Our method
generates a 12-megapixel image in 500 ms on a mobile platform and compares
favorably against state-of-the-art methods under extensive evaluation on
real-world scenarios.
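
To make the baseline concrete, here is a minimal sketch of the crop-and-upsample digital zoom that the abstract describes for zoom levels between W and T, followed by a generic per-pixel blend of an already aligned T image into W. This is an illustration under stated assumptions, not the authors' implementation: the function names `digital_zoom` and `blend` are hypothetical, images are plain 2D lists of grayscale values, nearest-neighbor interpolation stands in for whatever filter a real pipeline uses, and the `alpha` mask is supplied by hand rather than derived from occlusion, flow uncertainty, defocus, or alignment error as the abstract describes.

```python
def digital_zoom(image, zoom):
    """Center-crop a 2D grayscale image by `zoom`, then upsample it back
    to the original size with nearest-neighbor interpolation.

    Assumption: nearest-neighbor is used only to keep the sketch short;
    production pipelines typically use a smoother filter.
    """
    if zoom < 1:
        raise ValueError("zoom must be >= 1")
    height, width = len(image), len(image[0])
    # Size of the cropped field of view at this zoom factor.
    crop_h = max(1, round(height / zoom))
    crop_w = max(1, round(width / zoom))
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    # Each output pixel copies a pixel of the crop, so no new detail is
    # created. This is the detail loss the abstract refers to.
    return [
        [
            image[top + (y * crop_h) // height][left + (x * crop_w) // width]
            for x in range(width)
        ]
        for y in range(height)
    ]


def blend(wide, tele_aligned, alpha):
    """Per-pixel blend of an aligned T image into W.

    alpha = 1 keeps the T pixel, alpha = 0 falls back to the W pixel.
    Assumption: `alpha` is a hypothetical hand-supplied mask.
    """
    return [
        [a * t + (1.0 - a) * w for w, t, a in zip(w_row, t_row, a_row)]
        for w_row, t_row, a_row in zip(wide, tele_aligned, alpha)
    ]


if __name__ == "__main__":
    w_image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    for row in digital_zoom(w_image, 2):
        print(row)
```

At 2x zoom the 4x4 input is reduced to its central 2x2 region and each of those four pixels is repeated to fill the output, which shows why cropping from W alone cannot recover fine detail. The blend step shows only the fallback idea: wherever the mask is 0, the output is exactly the W pixel, so unreliable T regions can be excluded.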