OmniGen2: Exploration to Advanced Multimodal Generation

Abstract

In this work, we introduce OmniGen2, a versatile, open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image generation, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for the text and image modalities, with unshared parameters and a decoupled image tokenizer. This design lets OmniGen2 build on existing multimodal understanding models without re-adapting VAE inputs, thereby preserving the original text generation capabilities. To facilitate the training of OmniGen2, we developed comprehensive data construction pipelines covering image editing and in-context generation data. Additionally, we introduce a reflection mechanism tailored to image generation tasks and curate a dedicated reflection dataset based on OmniGen2. Despite its relatively modest parameter size, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image generation and image editing. To further evaluate in-context generation, also referred to as subject-driven tasks, we introduce a new benchmark named OmniContext. OmniGen2 achieves state-of-the-art performance among open-source models in terms of consistency. We will release our models, training code, datasets, and data construction pipeline to support future research in this field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link: https://github.com/VectorSpaceLab/OmniGen2
