対数エンコーディングによる潜在空間アライメントに基づくHDR映像生成

要旨

高ダイナミックレンジ（HDR）画像は、シーンの放射輝度を豊かかつ忠実に再現するが、生成モデルが学習する境界付けされ知覚的に圧縮されたデータとの不一致により、生成モデルにとって依然として課題となっている。自然な解決策はHDRのための新たな表現を学習することであるが、これは追加の複雑さとデータ要件を導入する。本研究では、事前学習済み生成モデルが既に獲得している強力な視覚的事前分布を活用することで、HDR生成がはるかに簡潔に達成可能であることを示す。シネマティックパイプラインで広く用いられる対数符号化が、HDR画像をこれらのモデルの潜在空間と自然に整合する分布に写像し、エンコーダの再学習なしで軽量なファインチューニングによる直接適応を可能にすることを観察した。入力で直接観察できない詳細を回復するため、カメラを模倣した劣化に基づく学習戦略をさらに導入し、モデルが学習済み事前分布から欠落した高ダイナミックレンジコンテンツを推論することを促進する。これらの知見を組み合わせることで、最小限の適応で事前学習済みビデオモデルを用いた高品質なHDRビデオ生成を実証し、多様なシーンと困難な照明条件において優れた結果を達成する。我々の結果は、HDRが根本的に異なる画像形成体制を表現するにも関わらず、表現が学習済み事前分布と整合するように選択されれば、生成モデルを再設計することなく効果的に扱えることを示唆する。

English

High dynamic range (HDR) imagery offers a rich and faithful representation of scene radiance, but remains challenging for generative models due to its mismatch with the bounded, perceptually compressed data on which these models are trained. A natural solution is to learn new representations for HDR, which introduces additional complexity and data requirements. In this work, we show that HDR generation can be achieved in a much simpler way by leveraging the strong visual priors already captured by pretrained generative models. We observe that a logarithmic encoding widely used in cinematic pipelines maps HDR imagery into a distribution that is naturally aligned with the latent space of these models, enabling direct adaptation via lightweight fine-tuning without retraining an encoder. To recover details that are not directly observable in the input, we further introduce a training strategy based on camera-mimicking degradations that encourages the model to infer missing high dynamic range content from its learned priors. Combining these insights, we demonstrate high-quality HDR video generation using a pretrained video model with minimal adaptation, achieving strong results across diverse scenes and challenging lighting conditions. Our results indicate that HDR, despite representing a fundamentally different image formation regime, can be handled effectively without redesigning generative models, provided that the representation is chosen to align with their learned priors.

対数エンコーディングによる潜在空間アライメントに基づくHDR映像生成

HDR Video Generation via Latent Alignment with Logarithmic Encoding

要旨

Support