PICABench: How Far Are We from Physically Realistic Image Editing?

October 20, 2025
Authors: Yuandong Pu, Le Zhuo, Songhao Han, Jinbo Xing, Kaiwen Zhu, Shuo Cao, Bin Fu, Si Liu, Hongsheng Li, Yu Qiao, Wenlong Zhang, Xi Chen, Yihao Liu
cs.AI

Abstract

Image editing has made remarkable progress recently. Modern editing models can already follow complex instructions to manipulate the original content. However, beyond completing the editing instruction, the accompanying physical effects are key to realistic generation. For example, removing an object should also remove its shadow, its reflections, and its interactions with nearby objects. Unfortunately, existing models and benchmarks mainly focus on instruction completion and overlook these physical effects. So how far are we, at present, from physically realistic image editing? To answer this, we introduce PICABench, which systematically evaluates physical realism across eight sub-dimensions (spanning optics, mechanics, and state transitions) for most common editing operations (add, remove, attribute change, etc.). We further propose PICAEval, a reliable evaluation protocol that uses a VLM as a judge together with per-case, region-level human annotations and questions. Beyond benchmarking, we also explore effective solutions by learning physics from videos and construct a training dataset, PICA-100K. After evaluating most mainstream models, we observe that physical realism remains a challenging problem with ample room for exploration. We hope that our benchmark and proposed solutions can serve as a foundation for future work moving from naive content editing toward physically consistent realism.