HDR Video Generation via Latent Alignment with Logarithmic Encoding

Abstract

High dynamic range (HDR) imagery offers a rich and faithful representation of scene radiance, but it remains challenging for generative models because it does not match the bounded, perceptually compressed data on which these models are trained. A natural solution is to learn new representations for HDR, but this introduces additional complexity and data requirements. In this work, we show that HDR generation can be achieved in a much simpler way by leveraging the strong visual priors already captured by pretrained generative models. We observe that a logarithmic encoding widely used in cinematic pipelines maps HDR imagery into a distribution that is naturally aligned with the latent space of these models, enabling direct adaptation via lightweight fine-tuning without retraining an encoder. To recover details that are not directly observable in the input, we further introduce a training strategy based on camera-mimicking degradations that encourages the model to infer missing high dynamic range content from its learned priors. Combining these insights, we demonstrate high-quality HDR video generation using a pretrained video model with minimal adaptation, achieving strong results across diverse scenes and challenging lighting conditions. Our results indicate that HDR, despite representing a fundamentally different image formation regime, can be handled effectively without redesigning generative models, provided that the representation is chosen to align with their learned priors.
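To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which cinematic log curve is used, so the curve here is a generic invertible logarithm. The constants `MAX_RADIANCE` and `GAIN` and the function names `log_encode`, `log_decode`, and `simulate_camera` are all hypothetical. The sketch shows how unbounded linear radiance might be compressed into the [0, 1] range that a pretrained encoder expects, and how a camera-style degradation (exposure, clipping, quantization) could discard the highlight detail that the model is then asked to infer.

```python
import math

# Hypothetical constants, chosen for illustration only (not from the paper).
MAX_RADIANCE = 64.0  # assumed peak linear value, about 6 stops above 1.0
GAIN = 256.0         # assumed steepness of the generic log curve


def log_encode(x: float) -> float:
    """Map linear radiance in [0, MAX_RADIANCE] to a code value in [0, 1]."""
    x = min(max(x, 0.0), MAX_RADIANCE)
    return math.log1p(GAIN * x) / math.log1p(GAIN * MAX_RADIANCE)


def log_decode(y: float) -> float:
    """Inverse of log_encode: code value in [0, 1] back to linear radiance."""
    return math.expm1(y * math.log1p(GAIN * MAX_RADIANCE)) / GAIN


def simulate_camera(x: float, exposure: float = 1.0, bits: int = 8) -> float:
    """Toy camera-style degradation: expose, clip to [0, 1], then quantize."""
    clipped = min(max(x * exposure, 0.0), 1.0)
    levels = (1 << bits) - 1
    return round(clipped * levels) / levels


# A bright highlight survives log encoding but is lost by the toy camera.
highlight = 16.0
encoded = log_encode(highlight)        # bounded code value, still invertible
observed = simulate_camera(highlight)  # saturates, so the detail is gone
```

Under these assumptions, one plausible training pair would use `simulate_camera(x)` as the degraded input and `log_encode(x)` as the target, so that the bounded target stays within the value range the pretrained model already handles while the clipped input forces it to rely on its priors.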
