OneHOI: Unifying Human-Object Interaction Generation and Editing

Abstract

Human-Object Interaction (HOI) modelling captures how humans act upon and relate to objects, typically expressed as <person, action, object> triplets. Existing approaches fall into two disjoint families. HOI generation synthesises scenes from structured triplets and layouts, but cannot integrate mixed conditions such as HOI triplets alongside object-only entities. HOI editing modifies interactions via text, yet struggles to decouple pose from physical contact and to scale to multiple interactions. We introduce OneHOI, a unified diffusion transformer framework that consolidates HOI generation and editing into a single conditional denoising process driven by shared structured interaction representations. At its core, the Relational Diffusion Transformer (R-DiT) models verb-mediated relations through role- and instance-aware HOI tokens, layout-based spatial Action Grounding, Structured HOI Attention to enforce interaction topology, and HOI RoPE to disentangle multi-HOI scenes. Trained jointly with modality dropout on our HOI-Edit-44K, along with HOI and object-centric datasets, OneHOI supports layout-guided, layout-free, arbitrary-mask, and mixed-condition control, and achieves state-of-the-art results in both HOI generation and editing. Code is available at https://jiuntian.github.io/OneHOI/.
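
To make the conditioning described above more concrete, the following is a minimal sketch of how a <person, action, object> triplet with optional layout boxes might be represented, and how modality dropout could turn a layout-guided sample into a layout-free one during training. It is not taken from the OneHOI code: the names HOICondition and drop_layout, the box format, and the choice to drop only the layout are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in normalised image coordinates.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class HOICondition:
    """One conditioning entry: an HOI triplet or an object-only entity.

    An object-only entity is modelled here (as an assumption) by leaving
    `person` and `action` as None.
    """
    obj: str
    person: Optional[str] = None
    action: Optional[str] = None
    person_box: Optional[Box] = None
    object_box: Optional[Box] = None


def drop_layout(cond: HOICondition, p_drop: float, rng: random.Random) -> HOICondition:
    """With probability p_drop, remove the layout boxes from the condition.

    The triplet itself is kept, so the same sample can be used to train
    layout-free control. This is a generic form of modality dropout, not
    necessarily the scheme used by the authors.
    """
    if rng.random() < p_drop:
        return replace(cond, person_box=None, object_box=None)
    return cond


if __name__ == "__main__":
    rng = random.Random(0)
    ride = HOICondition(
        obj="bicycle", person="person", action="ride",
        person_box=(0.1, 0.1, 0.5, 0.9), object_box=(0.2, 0.5, 0.8, 0.95),
    )
    lamp = HOICondition(obj="lamp", object_box=(0.7, 0.1, 0.9, 0.4))  # object-only
    # A mixed-condition batch: one HOI triplet plus one object-only entity.
    batch = [drop_layout(c, p_drop=0.5, rng=rng) for c in (ride, lamp)]
    for c in batch:
        print(c)
```

In this sketch, mixed-condition control simply means that HOI triplets and object-only entities share one record type, so a single model input can contain both.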
