
CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates

December 11, 2025
Authors: Shresth Grover, Priyank Pathak, Akash Kumar, Vibhav Vineet, Yogesh S Rawat
cs.AI

Abstract

Large-scale Vision-Language Models (VLMs) exhibit impressive complex reasoning capabilities but remain largely unexplored in visual sequential planning, i.e., executing multi-step actions towards a goal. Moreover, practical sequential planning often involves non-optimal (erroneous) steps, challenging VLMs to detect and correct such steps. We propose the Corrective Sequential Planning Benchmark (CoSPlan) to evaluate VLMs on error-prone, vision-based sequential planning tasks across four domains: maze navigation, block rearrangement, image reconstruction, and object reorganization. CoSPlan assesses two key abilities: Error Detection (identifying non-optimal actions) and Step Completion (correcting and completing action sequences to reach the goal). Despite using state-of-the-art reasoning techniques such as Chain-of-Thought and Scene Graphs, VLMs (e.g., Intern-VLM and Qwen2) struggle on CoSPlan, failing to leverage contextual cues to reach goals. To address this, we propose a novel training-free method, Scene Graph Incremental Updates (SGI), which introduces intermediate reasoning steps between the initial and goal states. SGI helps VLMs reason about sequences, yielding an average performance gain of 5.2%. Beyond enhancing reliability in corrective sequential planning, SGI generalizes to traditional planning tasks such as Plan-Bench and VQA.
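
To make the SGI procedure concrete, below is a minimal Python sketch of the incremental-update loop the abstract describes, assuming a generic chat-style VLM interface. The `query_vlm` callable, the prompt wording, and the string-based scene-graph representation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the Scene Graph Incremental Updates (SGI) loop: rather
# than asking a VLM to produce a full plan from the initial state to the goal
# in one shot, the model proposes one action at a time and the scene graph is
# updated after each action, so every intermediate state becomes an explicit
# reasoning step.
#
# `query_vlm` is a hypothetical stand-in for any chat-style VLM call
# (e.g. to Intern-VLM or Qwen2); the prompts and the string-based graph
# representation are illustrative assumptions.

from typing import Callable

def sgi_plan(
    query_vlm: Callable[[str], str],
    initial_graph: str,
    goal_graph: str,
    max_steps: int = 20,
) -> list[str]:
    """Plan by incrementally updating a scene graph toward the goal."""
    graph, actions = initial_graph, []
    for _ in range(max_steps):
        if graph == goal_graph:  # goal state reached
            break
        # Step 1: propose the single next action given current vs. goal graph.
        action = query_vlm(
            f"Current scene graph:\n{graph}\n\n"
            f"Goal scene graph:\n{goal_graph}\n\n"
            "Propose exactly one next action, or reply DONE."
        ).strip()
        if action == "DONE":
            break
        actions.append(action)
        # Step 2: apply the action, yielding the updated (intermediate) graph.
        graph = query_vlm(
            f"Scene graph:\n{graph}\n\nAction taken: {action}\n\n"
            "Return the scene graph after this action."
        ).strip()
    return actions
```

In principle, the same loop could serve CoSPlan's corrective setting: replaying a given (possibly erroneous) action sequence through the graph-update step and flagging the first action whose resulting graph diverges from the expected trajectory toward the goal.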