**Topic:** Seeing and fixing visual flaws: improving how VLMs and diffusion models understand visual artifacts through agentic data synthesis

Abstract

Despite recent advances in diffusion models, AI-generated images still often contain visual artifacts that compromise realism. More thorough pre-training and larger models may reduce artifacts, but there is no assurance that they can be eliminated completely, which makes artifact mitigation a crucial area of study. Previous artifact-aware methods depend on human-labeled artifact datasets, which are costly and difficult to scale, underscoring the need for an automated way to reliably acquire artifact-annotated datasets. In this paper, we propose ArtiAgent, which efficiently creates pairs of real and artifact-injected images. It comprises three agents: a perception agent that recognizes and grounds entities and subentities in real images; a synthesis agent that introduces artifacts with artifact injection tools, using a novel patch-wise embedding manipulation within a diffusion transformer; and a curation agent that filters the synthesized artifacts and generates both local and global explanations for each instance. Using ArtiAgent, we synthesize 100K images with rich artifact annotations and demonstrate both efficacy and versatility across diverse applications. Code is available at link.
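The three-agent flow described in the abstract can be pictured with a toy sketch. This is only an illustration of the perceive, inject, curate control flow under strong assumptions, not ArtiAgent's implementation: the names `Region`, `Sample`, `perceive`, `inject`, `curate`, and `run_pipeline` are hypothetical, a grid of integers stands in for patch embeddings, and rotating the values inside a box stands in for the patch-wise embedding manipulation, whose actual form the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Region:
    label: str   # entity or subentity name (hypothetical)
    box: tuple   # (row0, col0, row1, col1) in patch units, end-exclusive


@dataclass
class Sample:
    real: list        # toy patch grid of the real image
    corrupted: list   # same grid after the toy injection
    region: Region
    local_note: str   # explanation tied to the region
    global_note: str  # explanation for the whole image


def perceive(grid):
    """Stand-in perception step: returns one hard-coded grounded region."""
    rows, cols = len(grid), len(grid[0])
    return [Region("object", (0, 0, rows // 2, cols // 2))]


def inject(grid, region):
    """Toy injection: rotate the patch values inside the region by one slot."""
    r0, c0, r1, c1 = region.box
    cells = [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
    values = [grid[r][c] for r, c in cells]
    values = values[1:] + values[:1]
    out = [row[:] for row in grid]  # copy so the real grid is untouched
    for (r, c), v in zip(cells, values):
        out[r][c] = v
    return out


def curate(real, corrupted, region):
    """Toy curation: drop injections that changed nothing, then annotate."""
    if corrupted == real:
        return None
    return Sample(
        real,
        corrupted,
        region,
        local_note=f"patches inside '{region.label}' are displaced",
        global_note="image contains one injected artifact",
    )


def run_pipeline(grid):
    """Chain the three stages and keep only the samples that pass curation."""
    samples = []
    for region in perceive(grid):
        sample = curate(grid, inject(grid, region), region)
        if sample is not None:
            samples.append(sample)
    return samples
```

Calling `run_pipeline` on a small grid of distinct integers yields one real/corrupted pair with its two notes, while a uniform grid yields nothing because the curation step filters out an injection that left the image unchanged.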

See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis
