
CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates

December 11, 2025
Authors: Shresth Grover, Priyank Pathak, Akash Kumar, Vibhav Vineet, Yogesh S Rawat
cs.AI

Abstract

Large-scale Vision-Language Models (VLMs) exhibit impressive complex reasoning capabilities but remain largely unexplored in visual sequential planning, i.e., executing multi-step actions towards a goal. Additionally, practical sequential planning often involves non-optimal (erroneous) steps, challenging VLMs to detect and correct such steps. We propose the Corrective Sequential Planning Benchmark (CoSPlan) to evaluate VLMs on error-prone, vision-based sequential planning tasks across 4 domains: maze navigation, block rearrangement, image reconstruction, and object reorganization. CoSPlan assesses two key abilities: Error Detection (identifying non-optimal actions) and Step Completion (correcting and completing action sequences to reach the goal). Despite using state-of-the-art reasoning techniques such as Chain-of-Thought and Scene Graphs, VLMs (e.g., Intern-VLM and Qwen2) struggle on CoSPlan, failing to leverage contextual cues to reach goals. To address this, we propose a novel training-free method, Scene Graph Incremental updates (SGI), which introduces intermediate reasoning steps between the initial and goal states. SGI helps VLMs reason about sequences, yielding an average performance gain of 5.2%. In addition to enhancing reliability in corrective sequential planning, SGI generalizes to traditional planning tasks such as Plan-Bench and VQA.
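The abstract describes SGI only at a high level. As a rough illustration (not the authors' implementation), the core idea of interleaving single-step VLM queries with incremental scene-graph updates might look like the sketch below; all identifiers (SceneGraph, query_vlm, apply_action) are hypothetical placeholders, not the paper's API.

```python
# Hypothetical sketch of Scene Graph Incremental updates (SGI): instead of
# asking a VLM to plan from the initial state to the goal in one shot, the
# scene graph is updated after every proposed action, so the model always
# reasons over the current intermediate state. Placeholder names throughout.
from typing import Callable

SceneGraph = dict[str, set[str]]  # object -> set of relation strings


def sgi_plan(
    initial: SceneGraph,
    goal: SceneGraph,
    query_vlm: Callable[[str], str],              # black-box VLM call, returns one action
    apply_action: Callable[[SceneGraph, str], SceneGraph],
    max_steps: int = 20,
) -> list[str]:
    """Plan by chaining single-step VLM queries over an incrementally
    updated scene graph, rather than one monolithic query."""
    state, plan = dict(initial), []
    for _ in range(max_steps):
        if state == goal:                          # goal state reached
            break
        prompt = (
            f"Current scene graph: {state}\n"
            f"Goal scene graph: {goal}\n"
            "Propose the single next action that moves the scene toward the goal."
        )
        action = query_vlm(prompt)
        next_state = apply_action(state, action)
        if next_state == state:                    # no progress: flag the step as
            plan.append(f"[error detected] {action}")  # non-optimal and re-query
            continue
        state = next_state                         # incremental scene-graph update
        plan.append(action)
    return plan
```

The design point the abstract emphasizes is the intermediate states: because each query sees the scene graph after the previous action, the model can detect a non-optimal step (here, a step that leaves the graph unchanged) and correct course, rather than committing to a full plan from the initial observation alone.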