GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

Abstract

Despite recent advances in text-to-3D generative methods, there is a notable absence of reliable evaluation metrics. Existing metrics usually focus on a single criterion each, such as how well the asset aligns with the input text. These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences. Conducting user preference studies is an alternative that offers both adaptability and human-aligned results. User studies, however, can be very expensive to scale. This paper presents an automatic, versatile, and human-aligned evaluation metric for text-to-3D generative models. To this end, we first develop a prompt generator using GPT-4V to generate evaluation prompts, which serve as inputs for comparing text-to-3D models. We further design a method that instructs GPT-4V to compare two 3D assets according to user-defined criteria. Finally, we use these pairwise comparison results to assign Elo ratings to the models. Experimental results suggest that our metric aligns strongly with human preferences across different evaluation criteria.
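
The final step, turning pairwise comparison results into Elo ratings, can be illustrated with a minimal sketch. The abstract does not say how the ratings are fitted, so the code below assumes the standard sequential Elo update. The K-factor of 32, the initial rating of 1000, and the model names are illustrative assumptions, not values from the paper.

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Probability that a model rated r_a beats a model rated r_b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(comparisons, k=32.0, initial=1000.0):
    """Compute Elo ratings from a sequence of (winner, loser) pairs.

    k and initial are assumed defaults, not values taken from the paper.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in comparisons:
        # Expected win probability of the actual winner before the update.
        e_w = expected_score(ratings[winner], ratings[loser])
        # The winner gains exactly what the loser gives up.
        delta = k * (1.0 - e_w)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical outcomes: each tuple is (preferred model, other model).
example = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
ranked = sorted(elo_ratings(example).items(), key=lambda kv: -kv[1])
for name, rating in ranked:
    print(f"{name}: {rating:.1f}")
```

Sequential updates depend on the order in which comparisons are processed. A batch fit over all comparisons at once (for example, by maximum likelihood) removes that dependence at the cost of more code.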
