ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation

Abstract

We propose ChronoMagic-Bench, a novel text-to-video (T2V) generation benchmark that evaluates the temporal and metamorphic capabilities of T2V models (e.g., Sora and Lumiere) in time-lapse video generation. In contrast to existing benchmarks, which focus on the visual quality and textual relevance of generated videos, ChronoMagic-Bench focuses on a model's ability to generate time-lapse videos with significant metamorphic amplitude and temporal coherence. The benchmark probes T2V models for their physics, biology, and chemistry capabilities through free-form text queries. To this end, ChronoMagic-Bench introduces 1,649 prompts with real-world videos as references, categorized into four major types of time-lapse videos (biological, human-created, meteorological, and physical phenomena), which are further divided into 75 subcategories. This categorization enables a comprehensive evaluation of a model's capacity to handle diverse and complex transformations. To align the benchmark with human preference, we introduce two new automatic metrics, MTScore and CHScore, which evaluate the videos' metamorphic attributes and temporal coherence. MTScore measures the metamorphic amplitude, reflecting the degree of change over time, while CHScore assesses temporal coherence, checking that the generated videos maintain logical progression and continuity. Based on ChronoMagic-Bench, we conduct comprehensive manual evaluations of ten representative T2V models, revealing their strengths and weaknesses across different categories of prompts and providing a thorough evaluation framework that addresses current gaps in video generation research. Moreover, we create a large-scale dataset, ChronoMagic-Pro, containing 460k high-quality pairs of 720p time-lapse videos and detailed captions, ensuring high physical pertinence and large metamorphic amplitude.
