DreamO: A Unified Framework for Image Customization

Abstract

Recent extensive research on image customization (e.g., identity, subject, style, and background) has demonstrated strong customization capabilities in large-scale generative models. However, most approaches are designed for specific tasks, which restricts their ability to generalize to combinations of different types of conditions. Developing a unified framework for image customization remains an open challenge. In this paper, we present DreamO, an image customization framework designed to support a wide range of tasks while facilitating seamless integration of multiple conditions. Specifically, DreamO utilizes a diffusion transformer (DiT) framework to uniformly process inputs of different types. During training, we construct a large-scale training dataset that includes various customization tasks, and we introduce a feature routing constraint to facilitate the precise querying of relevant information from reference images. Additionally, we design a placeholder strategy that associates specific placeholders with conditions at particular positions, enabling control over the placement of conditions in the generated results. Moreover, we employ a progressive training strategy consisting of three stages: an initial stage focused on simple tasks with limited data to establish baseline consistency, a full-scale training stage to comprehensively enhance the customization capabilities, and a final quality alignment stage to correct quality biases introduced by low-quality data. Extensive experiments demonstrate that the proposed DreamO can effectively perform various image customization tasks with high quality and flexibly integrate different types of control conditions.
