Learn2Fold: Structured Origami Generation with World Model Planning
February 2, 2026
Authors: Yanjia Huang, Yunuo Chen, Ying Jiang, Jinru Han, Zhengzhong Tu, Yin Yang, Chenfanfu Jiang
cs.AI
Abstract
The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints: a single invalid crease or collision can invalidate the entire folding sequence. As a result, origami demands long-horizon constructive reasoning that jointly satisfies precise physical laws and high-level semantic intent. Existing approaches fall into two disjoint paradigms. Optimization-based methods enforce physical validity but require dense, precisely specified inputs, which makes them unsuitable for sparse natural-language descriptions. Generative foundation models, by contrast, excel at semantic and perceptual synthesis yet fail to produce long-horizon, physics-consistent folding processes. Consequently, generating valid origami folding sequences directly from text remains an open challenge. To address this gap, we introduce Learn2Fold, a neuro-symbolic framework that formulates origami folding as conditional program induction over a crease-pattern graph. Our key insight is to decouple semantic proposal from physical verification: a large language model generates candidate folding programs from abstract text prompts, while a learned graph-structured world model serves as a differentiable surrogate simulator that predicts physical feasibility and failure modes before execution. Integrated within a lookahead planning loop, Learn2Fold enables robust generation of physically valid folding sequences for complex and out-of-distribution patterns, demonstrating that effective spatial intelligence arises from the synergy between symbolic reasoning and grounded physical simulation.
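The propose-then-verify idea can be illustrated with a small, self-contained sketch. This is not the authors' implementation. The LLM proposer and the learned graph world model are replaced by hand-written stubs (`toy_proposer`, `toy_world_model`), and the planner is a generic depth-limited lookahead search that sums log-feasibility and prunes folds whose predicted feasibility falls below a threshold. All names, the feasibility threshold, and the toy crease pattern are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class FoldState:
    """Hypothetical state: the ordered tuple of crease ids folded so far."""
    folded: Tuple[int, ...] = ()

    def apply(self, crease: int) -> "FoldState":
        return FoldState(self.folded + (crease,))


# Stand-ins: the proposer plays the role of the LLM, the world model the role
# of the learned surrogate simulator (returns a predicted P(feasible)).
Proposer = Callable[[FoldState], List[int]]
WorldModel = Callable[[FoldState, int], float]


def lookahead_value(state: FoldState, proposer: Proposer, world_model: WorldModel,
                    depth: int, threshold: float) -> float:
    """Best sum of log-feasibility reachable within `depth` further folds.

    Returns -inf if every proposed fold is predicted to fail (a dead end).
    """
    if depth == 0:
        return 0.0
    candidates = proposer(state)
    if not candidates:
        return 0.0  # nothing left to fold: terminal state
    best = -math.inf
    for crease in candidates:
        p = world_model(state, crease)
        if p < threshold:
            continue  # predicted failure: prune before "execution"
        value = math.log(p) + lookahead_value(state.apply(crease), proposer,
                                              world_model, depth - 1, threshold)
        best = max(best, value)
    return best


def plan(proposer: Proposer, world_model: WorldModel, depth: int = 2,
         threshold: float = 0.5, max_steps: int = 100) -> Tuple[int, ...]:
    """Commit one fold at a time, choosing the candidate with the best lookahead value."""
    assert depth >= 1
    state = FoldState()
    for _ in range(max_steps):
        candidates = proposer(state)
        if not candidates:
            break  # all creases folded
        scored = []
        for crease in candidates:
            p = world_model(state, crease)
            if p < threshold:
                continue
            value = math.log(p) + lookahead_value(state.apply(crease), proposer,
                                                  world_model, depth - 1, threshold)
            if math.isinf(value):
                continue  # leads to a predicted dead end within the horizon
            scored.append((value, crease))
        if not scored:
            raise RuntimeError(f"no feasible fold predicted from {state.folded}")
        _, best = max(scored, key=lambda t: t[0])  # first maximal candidate wins ties
        state = state.apply(best)
    return state.folded


# Toy problem (hypothetical): three creases; folding crease 0 first looks best
# locally but blocks every later fold.
def toy_proposer(state: FoldState) -> List[int]:
    return [c for c in (0, 1, 2) if c not in state.folded]


def toy_world_model(state: FoldState, crease: int) -> float:
    if state.folded and state.folded[0] == 0:
        return 0.2  # trap: later folds predicted infeasible
    return {0: 0.95, 1: 0.8, 2: 0.9}[crease]


if __name__ == "__main__":
    print(plan(toy_proposer, toy_world_model, depth=2))
```

With `depth=1` the planner behaves greedily: it takes crease 0 because it has the highest one-step feasibility, then raises `RuntimeError` because every remaining fold is pruned. With `depth=2` the lookahead detects the dead end and starts with crease 2 instead, which mirrors, in miniature, why verifying proposals against a world model before committing to them can matter for long-horizon folding.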