Panoptic Pairwise Distortion Graph

Abstract

In this work, we introduce a new perspective on comparative image assessment by representing an image pair as a structured composition of its regions. Existing methods, in contrast, focus on whole-image analysis while implicitly relying on region-level understanding. We extend the intra-image notion of a scene graph to the inter-image setting and propose a novel task, the Distortion Graph (DG). A DG treats paired images as a structured topology grounded in regions and represents dense degradation information, such as distortion type, severity, comparison, and quality score, in a compact, interpretable graph structure. To realize the task of learning a distortion graph, we contribute (i) a region-level dataset, PandaSet; (ii) a benchmark suite, PandaBench, with varying region-level difficulty; and (iii) an efficient architecture, Panda, that generates distortion graphs. We demonstrate that PandaBench poses a significant challenge for state-of-the-art multimodal large language models (MLLMs), which fail to understand region-level degradations even when given explicit region cues. We show that training on PandaSet or prompting with DG elicits region-wise distortion understanding, opening a new direction for fine-grained, structured pairwise image assessment.
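To make the idea of a region-grounded graph over an image pair more concrete, the following is a minimal illustrative sketch of what such a structure could look like. It is not the authors' representation: the class names `RegionNode` and `DistortionGraph`, the field names, the distortion labels, the severity scale, and the rule that derives the comparison from the two quality scores are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class RegionNode:
    """One region as it appears in one image of the pair (hypothetical schema)."""
    region: str       # region name shared by both images, e.g. "sky"
    image: str        # which image of the pair: "A" or "B"
    distortion: str   # assumed label set, e.g. "blur", "noise", "none"
    severity: int     # assumed ordinal scale, 0 = no degradation
    quality: float    # assumed score, higher = better


@dataclass
class DistortionGraph:
    """Nodes are per-image regions; one inter-image edge per region."""
    nodes: Dict[Tuple[str, str], RegionNode] = field(default_factory=dict)
    edges: Dict[str, str] = field(default_factory=dict)  # region -> comparison

    def add_region(self, node_a: RegionNode, node_b: RegionNode) -> None:
        if node_a.region != node_b.region:
            raise ValueError("both nodes must describe the same region")
        self.nodes[(node_a.region, node_a.image)] = node_a
        self.nodes[(node_b.region, node_b.image)] = node_b
        # Assumption: the comparison is derived from the quality scores.
        # A learned model could instead predict this label directly.
        if node_a.quality > node_b.quality:
            verdict = "A better"
        elif node_a.quality < node_b.quality:
            verdict = "B better"
        else:
            verdict = "similar"
        self.edges[node_a.region] = verdict


graph = DistortionGraph()
graph.add_region(
    RegionNode("sky", "A", "none", 0, 0.9),
    RegionNode("sky", "B", "blur", 2, 0.4),
)
graph.add_region(
    RegionNode("car", "A", "noise", 1, 0.6),
    RegionNode("car", "B", "none", 0, 0.8),
)
for region, verdict in graph.edges.items():
    print(region, "->", verdict)
```

In this toy example the two images are compared region by region rather than as a whole: image A is better in the "sky" region and image B is better in the "car" region, a distinction that a single whole-image score would hide.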
