Learn2Fold: Structured Origami Generation with World Model Planning

Abstract

The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence. As a result, origami demands long-horizon constructive reasoning that jointly satisfies precise physical laws and high-level semantic intent. Existing approaches fall into two disjoint paradigms: optimization-based methods enforce physical validity but require dense, precisely specified inputs, making them unsuitable for sparse natural language descriptions, while generative foundation models excel at semantic and perceptual synthesis yet fail to produce long-horizon, physics-consistent folding processes. Consequently, generating valid origami folding sequences directly from text remains an open challenge. To address this gap, we introduce Learn2Fold, a neuro-symbolic framework that formulates origami folding as conditional program induction over a crease-pattern graph. Our key insight is to decouple semantic proposal from physical verification. A large language model generates candidate folding programs from abstract text prompts, while a learned graph-structured world model serves as a differentiable surrogate simulator that predicts physical feasibility and failure modes before execution. Integrated within a lookahead planning loop, Learn2Fold enables robust generation of physically valid folding sequences for complex and out-of-distribution patterns, demonstrating that effective spatial intelligence arises from the synergy between symbolic reasoning and grounded physical simulation.
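
The decoupling of semantic proposal from physical verification can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no interfaces, so every name here (`plan`, `propose`, `feasibility`, `is_goal`, the `Fold` and `State` types, and the toy stand-ins) is a hypothetical placeholder. The proposer stands in for the language model, the scorer stands in for the learned world model, and the loop is simplified to a greedy, one-step check rather than a full lookahead search.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types, assumed for illustration only.
Fold = Tuple[str, int]      # (fold kind, crease id)
State = Tuple[Fold, ...]    # folds committed so far


def plan(
    propose: Callable[[State], Sequence[Fold]],
    feasibility: Callable[[State, Fold], float],
    is_goal: Callable[[State], bool],
    max_steps: int = 8,
    threshold: float = 0.5,
) -> List[Fold]:
    """Greedy propose-then-verify loop: score candidates before committing."""
    state: State = ()
    for _ in range(max_steps):
        if is_goal(state):
            break
        # Ask the surrogate to score each proposed fold before "executing" it.
        scored = [(feasibility(state, fold), fold) for fold in propose(state)]
        viable = [(score, fold) for score, fold in scored if score >= threshold]
        if not viable:
            break  # every candidate was rejected; stop rather than commit
        _, best = max(viable, key=lambda pair: pair[0])
        state = state + (best,)
    return list(state)


# Toy stand-ins (assumptions): a proposer that offers every unused crease,
# and a scorer that only accepts creases folded in index order.
def toy_propose(state: State) -> List[Fold]:
    used = {crease for _, crease in state}
    return [(kind, c) for c in range(3) if c not in used
            for kind in ("valley", "mountain")]


def toy_feasibility(state: State, fold: Fold) -> float:
    kind, crease = fold
    if crease != len(state):
        return 0.0  # out-of-order crease is treated as infeasible
    return 0.9 if kind == "valley" else 0.6


def toy_goal(state: State) -> bool:
    return len(state) == 3


sequence = plan(toy_propose, toy_feasibility, toy_goal)
```

In this toy setup the proposer is deliberately permissive and the scorer does all the filtering, which mirrors the division of labor the abstract describes. A fuller planner would presumably roll the surrogate forward several steps before committing.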
