JourneyDB: A Benchmark for Generative Image Understanding

Abstract

While recent advances in vision-language models have revolutionized multi-modal understanding, it remains unclear whether these models are capable of comprehending generated images. Compared to real data, synthetic images exhibit a higher degree of diversity in both content and style, which makes them significantly harder for models to fully comprehend. To this end, we present JourneyDB, a large-scale dataset for multi-modal visual understanding of generated images. Our curated dataset covers 4 million diverse, high-quality generated images paired with the text prompts used to produce them. We further design four benchmarks to quantify the performance of generated image understanding in terms of both content and style interpretation: prompt inversion, style retrieval, image captioning, and visual question answering. Lastly, we assess the performance of current state-of-the-art multi-modal models when applied to JourneyDB, and provide an in-depth analysis of their strengths and limitations in understanding generated content. We hope the proposed dataset and benchmarks will facilitate research in the field of generative content understanding. The dataset will be available at https://journeydb.github.io.
