GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

Abstract

Despite recent advances in text-to-3D generative methods, there is a notable absence of reliable evaluation metrics. Existing metrics usually focus on a single criterion each, such as how well the asset aligns with the input text. These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences. Conducting user preference studies is an alternative that offers both adaptability and human-aligned results. User studies, however, can be very expensive to scale. This paper presents an automatic, versatile, and human-aligned evaluation metric for text-to-3D generative models. To this end, we first develop a prompt generator using GPT-4V to generate evaluating prompts, which serve as input to compare text-to-3D models. We further design a method instructing GPT-4V to compare two 3D assets according to user-defined criteria. Finally, we use these pairwise comparison results to assign these models Elo ratings. Experimental results suggest our metric strongly aligns with human preference across different evaluation criteria.
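
The last step of the pipeline, turning pairwise comparison results into Elo ratings, can be illustrated with a minimal sketch. The abstract does not say how the ratings are computed, so everything here is an assumption made for illustration: the sequential Elo update rule, the K-factor of 32, the starting rating of 1000, and the function names `expected_score` and `update_elo`.

```python
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(
    ratings: Dict[str, float],
    comparisons: Iterable[Tuple[str, str, float]],
    k: float = 32.0,        # assumed K-factor, not taken from the paper
    initial: float = 1000.0,  # assumed starting rating, not taken from the paper
) -> Dict[str, float]:
    """Apply sequential Elo updates for (model_a, model_b, score_a) triples.

    score_a is 1.0 if A's asset was preferred, 0.0 if B's was, 0.5 for a tie.
    """
    ratings = dict(ratings)  # work on a copy; leave the caller's dict untouched
    for a, b, score_a in comparisons:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        e_a = expected_score(r_a, r_b)
        # The two updates are equal and opposite, so the rating sum is conserved.
        ratings[a] = r_a + k * (score_a - e_a)
        ratings[b] = r_b - k * (score_a - e_a)
    return ratings
```

For example, `update_elo({}, [("model_a", "model_b", 1.0)])` starts both hypothetical models at the assumed initial rating and moves them apart by the same amount. Sequential updates depend on the order of the comparisons; an order-independent fit over all comparisons at once is another common choice.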
