Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction
May 11, 2026
Authors: Hao Phung, Hadar Averbuch-Elor
cs.AI
Abstract
Reconstructing a structured vector-graphics representation from a rasterized floorplan image is typically an important prerequisite for computational tasks involving floorplans, such as automated understanding or CAD workflows. However, existing techniques struggle to faithfully generate the structure and semantics conveyed by complex floorplans that depict large indoor spaces with many rooms and a varying number of polygon corners. To this end, we propose Raster2Seq, which frames floorplan reconstruction as a sequence-to-sequence task in which floorplan elements (such as rooms, windows, and doors) are represented as labeled polygon sequences that jointly encode geometry and semantics. Our approach introduces an autoregressive decoder that learns to predict the next corner conditioned on image features and previously generated corners, guided by learnable anchors. These anchors represent spatial coordinates in image space, which allows them to effectively direct the attention mechanism toward informative image regions. By embracing the autoregressive mechanism, our method offers flexibility in the output format, enabling it to efficiently handle complex floorplans with numerous rooms and diverse polygon structures. Our method achieves state-of-the-art performance on standard benchmarks such as Structured3D, CubiCasa5K, and Raster2Graph, while also demonstrating strong generalization to more challenging datasets such as WAFFLE, which contains diverse room structures and complex geometric variations.
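The abstract describes floorplan elements as labeled polygon sequences that are decoded one corner at a time. The sketch below illustrates that idea under assumptions of our own, not the authors' implementation: corner coordinates are quantized onto a hypothetical 64-bin grid, a hypothetical `("sep", label)` token closes each polygon and names its class, and a `predict_next` callable stands in for the decoder (in Raster2Seq the decoder would additionally condition on image features and learnable anchors, which this sketch omits). The names `serialize`, `deserialize`, and `greedy_decode` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical token layout (an assumption for illustration, not the paper's):
#   ("corner", qx, qy)  one polygon corner on a bins x bins grid
#   ("sep", label)      closes the current polygon and names its class
#   ("eos",)            ends the whole floorplan sequence
Token = tuple


@dataclass
class Polygon:
    label: str                          # e.g. "room", "door", "window"
    corners: List[Tuple[float, float]]  # normalized (x, y) in [0, 1]


def quantize(v: float, bins: int) -> int:
    """Map a normalized coordinate to a bin index clamped to [0, bins - 1]."""
    return min(bins - 1, max(0, int(v * bins)))


def dequantize(q: int, bins: int) -> float:
    """Return the center of bin q as a normalized coordinate."""
    return (q + 0.5) / bins


def serialize(polys: List[Polygon], bins: int = 64) -> List[Token]:
    """Flatten all polygons into one sequence: corners, then a labeled separator."""
    seq: List[Token] = []
    for p in polys:
        for x, y in p.corners:
            seq.append(("corner", quantize(x, bins), quantize(y, bins)))
        seq.append(("sep", p.label))
    seq.append(("eos",))
    return seq


def deserialize(seq: List[Token], bins: int = 64) -> List[Polygon]:
    """Rebuild labeled polygons from a token sequence, stopping at "eos"."""
    polys: List[Polygon] = []
    corners: List[Tuple[float, float]] = []
    for tok in seq:
        if tok[0] == "corner":
            corners.append((dequantize(tok[1], bins), dequantize(tok[2], bins)))
        elif tok[0] == "sep":
            polys.append(Polygon(tok[1], corners))
            corners = []
        elif tok[0] == "eos":
            break
    return polys


def greedy_decode(predict_next: Callable[[List[Token]], Token],
                  max_len: int = 512) -> List[Token]:
    """Autoregressive loop: each new token is predicted from the full prefix.

    `predict_next` is a stand-in for the decoder; in Raster2Seq it would also
    see image features and learnable anchors, which this sketch omits.
    """
    seq: List[Token] = []
    while len(seq) < max_len:
        tok = predict_next(seq)
        seq.append(tok)
        if tok[0] == "eos":
            break
    return seq


if __name__ == "__main__":
    plan = [
        Polygon("room", [(0.1, 0.1), (0.6, 0.1), (0.6, 0.5), (0.1, 0.5)]),
        Polygon("door", [(0.6, 0.2), (0.62, 0.2), (0.62, 0.3), (0.6, 0.3)]),
    ]
    target = serialize(plan)
    # Mock decoder that simply replays the target, one token per step.
    decoded = greedy_decode(lambda prefix: target[len(prefix)])
    print(len(decoded), [p.label for p in deserialize(decoded)])
    # prints: 11 ['room', 'door']
```

Because generation stops only when the end token is emitted, the same loop handles any number of rooms and any number of corners per room, which is the output-format flexibility the abstract attributes to the autoregressive formulation. The grid size trades coordinate precision against vocabulary size.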