NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
December 4, 2025
Authors: Yu Zeng, Charles Ochoa, Mingyuan Zhou, Vishal M. Patel, Vitor Guizilini, Rowan McAllister
cs.AI
Abstract
Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page: https://yuzeng-at-tri.github.io/ppd-page/.
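To make the idea of "preserving input phase while randomizing magnitude" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds noise whose Fourier magnitude comes from a Gaussian sample and whose Fourier phase comes from the input image. The function name `phase_preserving_noise`, the `cutoff` parameter (in cycles per pixel), and the hard low-pass mask are all assumptions for illustration. In particular, the abstract does not say which side of the cutoff keeps the input phase or whether the mask is smooth.

```python
import numpy as np


def phase_preserving_noise(image, cutoff=None, rng=None):
    """Return noise that carries the Fourier phase of `image`.

    Hypothetical helper, not taken from the paper's code.
    image  : 2D real array (one channel).
    cutoff : assumed radial frequency threshold in cycles/pixel. Frequencies
             at or below it keep the image phase; higher ones keep the phase
             of the Gaussian sample. None keeps the image phase everywhere.
    """
    rng = np.random.default_rng() if rng is None else rng

    img_spec = np.fft.fft2(image)
    noise_spec = np.fft.fft2(rng.standard_normal(image.shape))

    # Random magnitude from the Gaussian sample, phase from the input image.
    structured = np.abs(noise_spec) * np.exp(1j * np.angle(img_spec))

    if cutoff is not None:
        # Radial frequency of every FFT bin.
        fy = np.fft.fftfreq(image.shape[0])[:, None]
        fx = np.fft.fftfreq(image.shape[1])[None, :]
        keep = np.sqrt(fx**2 + fy**2) <= cutoff
        # Above the cutoff, fall back to ordinary Gaussian noise coefficients.
        structured = np.where(keep, structured, noise_spec)

    # The spectrum is Hermitian up to round-off, so the imaginary part of the
    # inverse transform is numerical noise and can be dropped.
    return np.fft.ifft2(structured).real
```

Under these assumptions, such noise would stand in for the Gaussian sample in the usual forward corruption step, and lowering `cutoff` would loosen how strictly the result follows the input's layout. How φ-PD actually schedules or shapes this is specified in the paper, not here.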