GeoRemover: Removing Objects and Their Causal Visual Artifacts

Abstract

Intelligent image editing requires object removal to eliminate both the target object and its causal visual artifacts, such as shadows and reflections. However, existing image appearance-based methods either follow strictly mask-aligned training and fail to remove these causal effects, which are not explicitly masked, or adopt loosely mask-aligned strategies that lack controllability and may unintentionally over-erase other objects. We identify that these limitations stem from ignoring the causal relationship between an object's geometric presence and its visual effects. To address them, we propose a geometry-aware two-stage framework that decouples object removal into (1) geometry removal and (2) appearance rendering. In the first stage, we remove the object directly from the geometry (e.g., depth) using strictly mask-aligned supervision, enabling structure-aware editing with strong geometric constraints. In the second stage, we render a photorealistic RGB image conditioned on the updated geometry, where causal visual effects are handled implicitly as a consequence of the modified 3D geometry. To guide learning in the geometry removal stage, we introduce a preference-driven objective based on positive and negative sample pairs, encouraging the model to remove objects as well as their causal visual artifacts while avoiding new structural insertions. Extensive experiments demonstrate that our method achieves state-of-the-art performance in removing both objects and their associated artifacts on two popular benchmarks. The code is available at https://github.com/buxiangzhiren/GeoRemover.
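
The data flow of the two stages, and the general shape of a pairwise preference objective, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `remove_object`, `geometry_remover`, `renderer`, and `pairwise_preference_loss` are hypothetical, flat lists of floats stand in for images and depth maps, and the loss is a generic Bradley-Terry style formulation chosen for illustration, since the abstract does not give the exact objective.

```python
import math
from typing import Callable, Sequence

# Hypothetical aliases: a flat list of floats stands in for an image or depth map.
Depth = Sequence[float]
Image = Sequence[float]
Mask = Sequence[bool]


def remove_object(
    image: Image,
    depth: Depth,
    mask: Mask,
    geometry_remover: Callable[[Depth, Mask], Depth],  # stage 1 model (hypothetical)
    renderer: Callable[[Image, Depth], Image],         # stage 2 model (hypothetical)
) -> Image:
    """Run the two stages in order: edit the geometry, then render RGB from it."""
    edited_depth = geometry_remover(depth, mask)  # object removed from depth only
    return renderer(image, edited_depth)          # appearance conditioned on new geometry


def pairwise_preference_loss(score_pos: float, score_neg: float, beta: float = 1.0) -> float:
    """Generic pairwise loss: -log(sigmoid(beta * (score_pos - score_neg))).

    Assumption: each sample receives a scalar score; the positive sample
    (clean removal) should score higher than the negative one
    (e.g. a new structure inserted in the masked region).
    """
    margin = beta * (score_pos - score_neg)
    # softplus(-margin) equals -log(sigmoid(margin)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

In this sketch the mask is consumed only by the geometry stage, which mirrors the claim that shadows and reflections need not be masked explicitly: the renderer sees the edited geometry rather than the mask.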
