Learn2Fold: Structured Origami Generation with World Model Planning

Abstract

The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence. As a result, origami demands long-horizon constructive reasoning that jointly satisfies precise physical laws and high-level semantic intent. Existing approaches fall into two disjoint paradigms. Optimization-based methods enforce physical validity but require dense, precisely specified inputs, making them unsuitable for sparse natural language descriptions. Generative foundation models excel at semantic and perceptual synthesis, yet fail to produce long-horizon, physics-consistent folding processes. Consequently, generating valid origami folding sequences directly from text remains an open challenge. To address this gap, we introduce Learn2Fold, a neuro-symbolic framework that formulates origami folding as conditional program induction over a crease-pattern graph. Our key insight is to decouple semantic proposal from physical verification. A large language model generates candidate folding programs from abstract text prompts, while a learned graph-structured world model serves as a differentiable surrogate simulator that predicts physical feasibility and failure modes before execution. Integrated within a lookahead planning loop, Learn2Fold robustly generates physically valid folding sequences for complex and out-of-distribution patterns, demonstrating that effective spatial intelligence arises from the synergy between symbolic reasoning and grounded physical simulation.
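The decoupling of semantic proposal from physical verification can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the paper's implementation: the `Fold` encoding, the `plan_folds` function, the `threshold` parameter, and the toy stand-ins `toy_propose` (in place of the language model) and `toy_feasibility` (in place of the learned world model) are all hypothetical. The loop is a greedy, one-step lookahead: each candidate is scored before it is committed, and low-scoring candidates are rejected.

```python
from typing import Callable, List, Tuple

# Hypothetical encoding of one fold step: (crease id, fold angle in degrees).
Fold = Tuple[int, float]


def plan_folds(
    propose: Callable[[List[Fold]], List[Fold]],
    feasibility: Callable[[List[Fold], Fold], float],
    horizon: int,
    threshold: float = 0.5,
) -> List[Fold]:
    """Greedy propose-then-verify loop (illustrative sketch only).

    `propose` stands in for the language model that suggests next folds;
    `feasibility` stands in for the world model that scores a candidate
    in [0, 1] before it is executed.
    """
    program: List[Fold] = []
    for _ in range(horizon):
        candidates = propose(program)
        # Score every candidate first, then keep only those judged feasible.
        scored = [(feasibility(program, c), c) for c in candidates]
        scored = [(s, c) for s, c in scored if s >= threshold]
        if not scored:
            break  # no candidate passes verification; stop early
        best = max(scored, key=lambda sc: sc[0])[1]
        program.append(best)
    return program


# Toy stand-ins, purely hypothetical, so the sketch can be run.
CREASES = [0, 1, 2]


def toy_propose(program: List[Fold]) -> List[Fold]:
    used = {crease for crease, _ in program}
    out: List[Fold] = []
    for crease in CREASES:
        if crease not in used:
            out.append((crease, 180.0))
            out.append((crease, 270.0))  # deliberately infeasible proposal
    return out


def toy_feasibility(program: List[Fold], fold: Fold) -> float:
    crease, angle = fold
    if abs(angle) > 180.0:
        return 0.0  # a fold past 180 degrees would push the paper through itself
    return 1.0 / (1 + crease)  # arbitrary toy score, not a physical model


if __name__ == "__main__":
    print(plan_folds(toy_propose, toy_feasibility, horizon=5))
```

In this toy setup the proposer is free to suggest invalid folds, because the verifier filters them out before they enter the program. That separation is the design choice the abstract describes; a real system would replace both stand-ins with learned models and would likely look more than one step ahead.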
