Efficient Hybrid Zoom using Camera Fusion on Mobile Phones

Abstract

DSLR cameras can achieve multiple zoom levels by shifting lens distances or swapping lens types. However, these techniques are not possible on smartphones due to space constraints. Most smartphone manufacturers adopt a hybrid zoom system: commonly a Wide (W) camera at a low zoom level and a Telephoto (T) camera at a high zoom level. To simulate zoom levels between W and T, these systems crop and digitally upsample images from W, leading to significant detail loss. In this paper, we propose an efficient system for hybrid zoom super-resolution on mobile devices, which captures a synchronous pair of W and T shots and leverages machine learning models to align and transfer details from T to W. We further develop an adaptive blending method that accounts for depth-of-field mismatches, scene occlusion, flow uncertainty, and alignment errors. To minimize the domain gap, we design a dual-phone camera rig to capture real-world inputs and ground truths for supervised training. Our method generates a 12-megapixel image in 500 ms on a mobile platform and compares favorably against state-of-the-art methods under extensive evaluation on real-world scenarios.
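As a rough illustration of two ideas in the abstract, the sketch below shows the baseline digital zoom (center-crop the W image, then upsample it back to full size) and a generic per-pixel blend between W and an already-aligned T image. It is a minimal sketch under stated assumptions, not the proposed system: the function names, the nearest-neighbor upsampling, and the linear blend with a given `alpha` mask are all hypothetical choices made for illustration. The abstract does not specify how alignment is performed or how the blending mask is derived from defocus, occlusion, flow uncertainty, and alignment error.

```python
# Illustrative only: images are 2D grayscale lists of rows (hypothetical format).

def crop_for_zoom(image, zoom):
    """Center-crop the image to 1/zoom of its field of view."""
    h, w = len(image), len(image[0])
    ch, cw = max(1, round(h / zoom)), max(1, round(w / zoom))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in image[top:top + ch]]


def upsample_nearest(image, out_h, out_w):
    """Nearest-neighbor upsampling; no new detail is created by this step."""
    h, w = len(image), len(image[0])
    return [
        [image[y * h // out_h][x * w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def digital_zoom(image, zoom):
    """Baseline digital zoom: crop from W, then upsample to the original size."""
    h, w = len(image), len(image[0])
    return upsample_nearest(crop_for_zoom(image, zoom), h, w)


def blend(wide, tele_aligned, alpha):
    """Per-pixel linear blend: alpha=1 takes the T pixel, alpha=0 keeps W.

    Assumption: tele_aligned is already warped onto the W pixel grid, and
    alpha is a precomputed confidence mask with values in [0, 1].
    """
    return [
        [a * t + (1 - a) * w_px for w_px, t, a in zip(w_row, t_row, a_row)]
        for w_row, t_row, a_row in zip(wide, tele_aligned, alpha)
    ]


if __name__ == "__main__":
    w_img = [[y * 4 + x for x in range(4)] for y in range(4)]
    for row in digital_zoom(w_img, 2):
        print(row)
```

In the zoomed output each source pixel is simply repeated, which is the detail loss the abstract attributes to crop-and-upsample zoom. Setting `alpha` to 0 wherever the T image is unreliable keeps the W pixel unchanged in that region.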
