LuxDiT: Lighting Estimation with Video Diffusion Transformer

Abstract

Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics. Learning-based approaches are constrained by the scarcity of ground-truth HDR environment maps, which are expensive to capture and limited in diversity. Recent generative models offer strong priors for image synthesis. However, lighting estimation remains difficult for three reasons: it relies on indirect visual cues, it requires inferring global (non-local) context, and it must recover high-dynamic-range outputs. We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input. We train the model on a large synthetic dataset with diverse lighting conditions. It learns to infer illumination from indirect visual cues and generalizes effectively to real-world scenes. To improve semantic alignment between the input and the predicted environment map, we introduce a low-rank adaptation (LoRA) fine-tuning strategy that uses a collected dataset of HDR panoramas. Our method produces accurate lighting predictions with realistic high-frequency angular detail. It outperforms existing state-of-the-art techniques in both quantitative and qualitative evaluations.
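The abstract mentions a low-rank adaptation (LoRA) fine-tuning stage. The sketch below is a generic illustration of the LoRA idea, not LuxDiT's implementation. It wraps a frozen pretrained linear layer with a trainable low-rank update, so the adapted weight is W + (alpha / r) B A. The class name `LoRALinear`, the rank/alpha defaults, and the zero-initialized `B` matrix are assumptions that follow the common LoRA recipe. Which transformer layers LuxDiT actually adapts is not stated in this abstract.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer y = x W^T plus a trainable low-rank update.

    Hypothetical, generic LoRA sketch; names and defaults are illustrative
    assumptions, not code from LuxDiT.
    """

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                       # frozen pretrained W, shape (out, in)
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))  # trainable down-projection
        self.B = np.zeros((out_features, rank))                    # trainable up-projection, zero-init
        self.scale = alpha / rank

    def delta(self) -> np.ndarray:
        # Low-rank weight update (alpha / r) * B @ A; its rank is at most `rank`.
        return self.scale * (self.B @ self.A)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, in); base path and adapter path are summed.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # After fine-tuning, the update can be folded into W for inference.
        return self.weight + self.delta()

    def trainable_params(self) -> int:
        # Only A and B are trained; W stays frozen.
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))
x = rng.normal(size=(2, 32))
layer = LoRALinear(W, rank=4)

# With B = 0 at initialization, the adapted layer reproduces the pretrained layer.
print(np.allclose(layer(x), x @ W.T))                      # → True

# Stand-in for a training step: give B nonzero values, then check that the
# merged weight yields the same output as the two-path forward pass.
layer.B = rng.normal(size=(64, 4))
print(np.allclose(layer(x), x @ layer.merged_weight().T))  # → True

# Trainable parameters vs. the frozen weight: 4*32 + 64*4 = 384 vs. 2048.
print(layer.trainable_params(), W.size)                    # → 384 2048
```

The zero initialization of `B` means fine-tuning starts exactly from the pretrained model. The small number of trainable parameters is what makes this kind of adaptation practical on a comparatively small dataset, such as a collection of HDR panoramas.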
