HDR Video Generation via Latent Alignment with Logarithmic Encoding
April 13, 2026
Authors: Naomi Ken Korem, Mohamed Oumoumad, Harel Cain, Matan Ben Yosef, Urska Jelercic, Ofir Bibi, Yaron Inger, Or Patashnik, Daniel Cohen-Or
cs.AI
Abstract
High dynamic range (HDR) imagery offers a rich and faithful representation of scene radiance, but remains challenging for generative models due to its mismatch with the bounded, perceptually compressed data on which these models are trained. A natural solution is to learn new representations for HDR, but this introduces additional complexity and data requirements. In this work, we show that HDR generation can be achieved in a much simpler way by leveraging the strong visual priors already captured by pretrained generative models. We observe that a logarithmic encoding widely used in cinematic pipelines maps HDR imagery into a distribution that is naturally aligned with the latent space of these models, enabling direct adaptation via lightweight fine-tuning without retraining an encoder. To recover details that are not directly observable in the input, we further introduce a training strategy based on camera-mimicking degradations that encourages the model to infer missing high dynamic range content from its learned priors. Combining these insights, we demonstrate high-quality HDR video generation using a pretrained video model with minimal adaptation, achieving strong results across diverse scenes and challenging lighting conditions. Our results indicate that HDR, despite representing a fundamentally different image formation regime, can be handled effectively without redesigning generative models, provided that the representation is chosen to align with their learned priors.
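To make the two core ideas concrete, here is a minimal sketch of (a) a generic logarithmic encoding in the spirit of cinematic "log" curves (e.g. ARRI LogC, S-Log), which compresses linear HDR radiance into a bounded range, and (b) a simple camera-mimicking degradation. The specific curve, mid-gray anchor, stop range, and gamma below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

# Illustrative constants (assumptions, not the paper's values).
MID_GRAY = 0.18                  # linear radiance mapped to mid-range code value
MIN_STOP, MAX_STOP = -6.0, 6.0   # exposure range covered by the encoding

def log_encode(radiance: np.ndarray) -> np.ndarray:
    """Map linear HDR radiance to a bounded [0, 1] log representation."""
    stops = np.log2(np.maximum(radiance, 1e-6) / MID_GRAY)
    return np.clip((stops - MIN_STOP) / (MAX_STOP - MIN_STOP), 0.0, 1.0)

def log_decode(code: np.ndarray) -> np.ndarray:
    """Invert log_encode back to linear radiance (within the stop range)."""
    stops = code * (MAX_STOP - MIN_STOP) + MIN_STOP
    return MID_GRAY * np.exp2(stops)

def camera_degrade(radiance: np.ndarray, exposure: float = 1.0) -> np.ndarray:
    """A camera-mimicking degradation: expose, clip highlights to the sensor
    range, and apply a display gamma, yielding the kind of bounded SDR input
    from which the model must infer the missing HDR content."""
    clipped = np.clip(radiance * exposure, 0.0, 1.0)  # highlight clipping
    return clipped ** (1.0 / 2.2)                     # display gamma
```

With these choices, mid-gray lands at code value 0.5 and 12 stops of radiance fill the unit interval, so encoded frames occupy a bounded, perceptually compressed range similar to the SDR imagery a pretrained video model was trained on.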