NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
December 4, 2025
Authors: Yu Zeng, Charles Ochoa, Mingyuan Zhou, Vishal M. Patel, Vitor Guizilini, Rowan McAllister
cs.AI
Abstract
Standard diffusion corrupts data with Gaussian noise, whose Fourier coefficients have random magnitudes and random phases. While this is effective for unconditional or text-to-image generation, corrupting the phase components destroys spatial structure, which makes it ill-suited for tasks that require geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves the input phase while randomizing the magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page: https://yuzeng-at-tri.github.io/ppd-page/
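The core idea of keeping the input phase while randomizing the magnitude can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `phase_preserving_noise`, the normalized `cutoff` parameter, and the choice to keep the image phase below the cutoff and use random phase above it are all assumptions made for illustration, since the abstract only states that a single frequency-cutoff parameter controls structural rigidity.

```python
import numpy as np

def phase_preserving_noise(image, cutoff=1.0, rng=None):
    """Hypothetical sketch: noise with Gaussian Fourier magnitudes and image phase.

    image  : 2D float array (H, W), a single-channel image.
    cutoff : assumed parameter in [0, 1]; fraction of the maximum radial
             frequency below which the image phase is kept. Above it, the
             phase of the Gaussian noise is used instead.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape

    # Ordinary Gaussian noise: random magnitude and random phase.
    noise = rng.standard_normal((h, w))
    f_noise = np.fft.fft2(noise)
    f_image = np.fft.fft2(image)

    # Radial frequency of every Fourier coefficient (cycles per pixel).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Assumed selection rule: image phase at low frequencies, noise phase elsewhere.
    keep_image_phase = radius <= cutoff * radius.max()
    phase = np.where(keep_image_phase, np.angle(f_image), np.angle(f_noise))

    # Recombine the random magnitude with the selected phase. The mask is
    # symmetric in frequency, so the spectrum stays Hermitian and the
    # inverse transform is real up to rounding error.
    structured = np.fft.ifft2(np.abs(f_noise) * np.exp(1j * phase))
    return structured.real

# Example: cutoff=1.0 keeps the image phase everywhere, cutoff=0.0 keeps it only at DC.
image = np.random.default_rng(1).random((32, 32))
structured_noise = phase_preserving_noise(image, cutoff=0.5, rng=np.random.default_rng(0))
```

Under these assumptions, noise of this kind would stand in for the usual Gaussian sample in the forward process, so that a noised input still carries the spatial layout of the source. Lower `cutoff` values would loosen the structural constraint; higher values would tighten it.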