Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model

Abstract

Room layout estimation from multiple perspective images remains poorly investigated because of the complexities of multi-view geometry, which requires multi-step solutions such as camera intrinsic and extrinsic estimation, image matching, and triangulation. In 3D reconstruction, however, recent 3D foundation models such as DUSt3R have shifted the paradigm from the traditional multi-step structure-from-motion process to an end-to-end, single-step approach. To this end, we introduce Plane-DUSt3R, a novel method for multi-view room layout estimation that leverages the 3D foundation model DUSt3R. Plane-DUSt3R incorporates the DUSt3R framework and is fine-tuned on a room layout dataset (Structure3D) with a modified objective to estimate structural planes. By generating uniform and parsimonious results, Plane-DUSt3R enables room layout estimation with only a single post-processing step and 2D detection results. Unlike previous methods that rely on a single perspective or panorama image, Plane-DUSt3R extends the setting to handle multiple perspective images. Moreover, it offers a streamlined, end-to-end solution that simplifies the process and reduces error accumulation. Experimental results demonstrate that Plane-DUSt3R not only outperforms state-of-the-art methods on the synthetic dataset but also proves robust and effective on in-the-wild data with different image styles, such as cartoons. Our code is available at: https://github.com/justacar/Plane-DUSt3R
