Visual Diffusion Models are Geometric Solvers
October 24, 2025
Authors: Nir Goren, Shai Yehezkel, Omer Dahary, Andrey Voynov, Or Patashnik, Daniel Cohen-Or
cs.AI
Abstract
In this paper we show that visual diffusion models can serve as effective
geometric solvers: they can directly reason about geometric problems by working
in pixel space. We first demonstrate this on the Inscribed Square Problem, a
long-standing problem in geometry that asks whether every Jordan curve contains
four points forming a square. We then extend the approach to two other
well-known hard geometric problems: the Steiner Tree Problem and the Simple
Polygon Problem.
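
To make the Inscribed Square Problem concrete, the following minimal sketch shows the check that any candidate solution must pass: four points form a square exactly when their six pairwise distances consist of four equal sides and two equal diagonals of length side times sqrt(2). The function name `is_square` and the tolerance `rel_tol` are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations
from math import cos, dist, isclose, pi, sin


def is_square(points, rel_tol=1e-6):
    """Return True if the four 2D points are the corners of a non-degenerate square.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(points) != 4:
        return False
    # Six pairwise distances, sorted: four sides first, then two diagonals.
    d = sorted(dist(p, q) for p, q in combinations(points, 2))
    side, diag = d[0], d[5]
    if side <= 0:
        return False  # coincident points cannot form a square
    sides_equal = all(isclose(x, side, rel_tol=rel_tol) for x in d[:4])
    diags_equal = isclose(d[4], diag, rel_tol=rel_tol)
    right_angles = isclose(diag, side * 2 ** 0.5, rel_tol=rel_tol)
    return sides_equal and diags_equal and right_angles


# A circle is a Jordan curve, and any four points on it spaced a quarter
# turn apart form an inscribed square.
theta = 0.3
inscribed = [(cos(theta + k * pi / 2), sin(theta + k * pi / 2)) for k in range(4)]
print(is_square(inscribed))

# A 2 x 1 rectangle has equal diagonals but unequal sides.
print(is_square([(0, 0), (2, 0), (2, 1), (0, 1)]))
```

For the circle, the check succeeds for every rotation angle; for a general Jordan curve, finding four such points on the curve is the hard part that the model is trained to approximate.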
Our method treats each problem instance as an image and trains a standard
visual diffusion model that transforms Gaussian noise into an image
representing a valid approximate solution that closely matches the exact one.
The model learns to transform noisy geometric structures into correct
configurations, effectively recasting geometric reasoning as image generation.
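
As a rough illustration of this image-space framing, the sketch below rasterizes a closed curve onto a pixel grid and corrupts it with Gaussian noise. It assumes a DDPM-style forward process with a linear beta schedule, which the abstract does not specify; the names `rasterize_polyline`, `alpha_bar`, and `add_noise`, the 32 x 32 resolution, and the schedule constants are all hypothetical choices. The learned denoiser that maps noise back to a solution image is omitted.

```python
import math
import random


def rasterize_polyline(vertices, size=32, closed=True):
    """Draw a polyline with vertices in [0, 1]^2 on a size x size grid.

    Curve pixels are 1.0 and background pixels are 0.0.
    """
    img = [[0.0] * size for _ in range(size)]
    pts = list(vertices) + ([vertices[0]] if closed else [])
    steps = 2 * size  # dense sampling so no pixel along a segment is skipped
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        for i in range(steps + 1):
            t = i / steps
            col = min(size - 1, int((x0 + t * (x1 - x0)) * size))
            row = min(size - 1, int((y0 + t * (y1 - y0)) * size))
            img[row][col] = 1.0
    return img


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over the first t steps (assumed linear schedule)."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(img, t, rng, num_steps=1000):
    """Sample x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps, with eps ~ N(0, I)."""
    a = alpha_bar(t, num_steps)
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [[signal * px + noise * rng.gauss(0.0, 1.0) for px in row] for row in img]


rng = random.Random(0)
curve = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]
x0 = rasterize_polyline(curve)          # clean problem image
x_mid = add_noise(x0, t=500, rng=rng)   # partially noised
x_end = add_noise(x0, t=1000, rng=rng)  # close to pure Gaussian noise
```

Under these assumptions, a denoising network would be trained to reverse this corruption, so that sampling starts from Gaussian noise and ends at an image depicting a valid configuration.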
Unlike prior work that necessitates specialized architectures and
domain-specific adaptations when applying diffusion to parametric geometric
representations, we employ a standard visual diffusion model that operates on
the visual representation of the problem. This simplicity highlights a
surprising bridge between generative modeling and geometric problem solving.
Beyond the specific problems studied here, our results point toward a broader
paradigm: operating in image space provides a general and practical framework
for approximating notoriously hard problems, and opens the door to tackling a
far wider class of challenging geometric tasks.