Learn2Fold: Structured Origami Generation with World Model Planning

Abstract

The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence. As a result, origami demands long-horizon constructive reasoning that jointly satisfies precise physical laws and high-level semantic intent. Existing approaches fall into two disjoint paradigms: optimization-based methods enforce physical validity but require dense, precisely specified inputs, making them unsuitable for sparse natural language descriptions, while generative foundation models excel at semantic and perceptual synthesis yet fail to produce long-horizon, physics-consistent folding processes. Consequently, generating valid origami folding sequences directly from text remains an open challenge. To address this gap, we introduce Learn2Fold, a neuro-symbolic framework that formulates origami folding as conditional program induction over a crease-pattern graph. Our key insight is to decouple semantic proposal from physical verification. A large language model generates candidate folding programs from abstract text prompts, while a learned graph-structured world model serves as a differentiable surrogate simulator that predicts physical feasibility and failure modes before execution. Integrated within a lookahead planning loop, Learn2Fold enables robust generation of physically valid folding sequences for complex and out-of-distribution patterns, demonstrating that effective spatial intelligence arises from the synergy between symbolic reasoning and grounded physical simulation.
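The propose-then-verify loop described in the abstract can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration and is not the paper's implementation: the crease names, the `PREREQS` dependency table, and the functions `propose_candidates`, `feasibility`, `lookahead_value`, and `plan` are hypothetical. A rule-based check stands in for the learned graph-structured world model, and an exhaustive enumerator stands in for the language-model proposer.

```python
from typing import Dict, List, Tuple

# Hypothetical toy crease-pattern graph: each crease lists the creases
# that must already be folded before it can be folded without failure.
PREREQS: Dict[str, List[str]] = {
    "valley_center": [],
    "mountain_left": ["valley_center"],
    "mountain_right": ["valley_center"],
    "tuck_tip": ["mountain_left", "mountain_right"],
}

State = Tuple[str, ...]


def propose_candidates(state: State) -> List[str]:
    """Stand-in for the language-model proposer: list every unfolded crease."""
    return [c for c in PREREQS if c not in state]


def feasibility(state: State, action: str) -> float:
    """Stand-in for the world model: 1.0 if all prerequisites are folded."""
    return 1.0 if all(p in state for p in PREREQS[action]) else 0.0


def lookahead_value(state: State, depth: int) -> int:
    """Largest number of feasible folds reachable within `depth` more steps."""
    if depth == 0:
        return 0
    best = 0
    for action in propose_candidates(state):
        if feasibility(state, action) > 0.5:
            best = max(best, 1 + lookahead_value(state + (action,), depth - 1))
    return best


def plan(depth: int = 2) -> List[str]:
    """Greedy planner: keep only feasible proposals, rank them by lookahead."""
    state: State = ()
    while len(state) < len(PREREQS):
        scored = [
            (1 + lookahead_value(state + (a,), depth - 1), a)
            for a in propose_candidates(state)
            if feasibility(state, a) > 0.5  # verification happens before execution
        ]
        if not scored:
            break  # no feasible continuation was proposed
        best_value = max(v for v, _ in scored)
        # Ties are broken by proposal order.
        action = next(a for v, a in scored if v == best_value)
        state += (action,)
    return list(state)


sequence = plan()
print(sequence)
```

The point of the sketch is the separation of roles: the proposer never needs to know the physical rules, and the verifier never needs to understand the prompt. In the framework described above, the verifier is a learned, differentiable surrogate rather than a hand-written rule.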
