PICABench: How Far Are We from Physically Realistic Image Editing?

October 20, 2025
Authors: Yuandong Pu, Le Zhuo, Songhao Han, Jinbo Xing, Kaiwen Zhu, Shuo Cao, Bin Fu, Si Liu, Hongsheng Li, Yu Qiao, Wenlong Zhang, Xi Chen, Yihao Liu
cs.AI

Abstract

Image editing has achieved remarkable progress recently. Modern editing models can already follow complex instructions to manipulate the original content. However, beyond completing the editing instructions, the accompanying physical effects are key to realistic generation. For example, removing an object should also remove its shadow, reflections, and interactions with nearby objects. Unfortunately, existing models and benchmarks mainly focus on instruction completion and overlook these physical effects. So, at this moment, how far are we from physically realistic image editing? To answer this, we introduce PICABench, which systematically evaluates physical realism across eight sub-dimensions (spanning optics, mechanics, and state transitions) for most common editing operations (add, remove, attribute change, etc.). We further propose PICAEval, a reliable evaluation protocol that uses VLM-as-a-judge with per-case, region-level human annotations and questions. Beyond benchmarking, we also explore effective solutions by learning physics from videos and construct a training dataset, PICA-100K. After evaluating most mainstream models, we observe that physical realism remains a challenging problem with ample room for exploration. We hope that our benchmark and proposed solutions can serve as a foundation for future work moving from naive content editing toward physically consistent realism.