AFFORDANCE20Q: Evaluating Affordance Reasoning from Physical Properties

June 12, 2026
Authors: Yifan Jiang, Meige Yang, Zitong Li, Jay Pujara
cs.AI

Abstract

Affordance reasoning, the inference of an object's action possibilities from its physical properties (e.g., shape and material), is fundamental to human physical understanding and increasingly critical for Large Language Models (LLMs). However, existing affordance benchmarks largely expose explicit object identities in the evaluation setup, allowing models to rely on memorized object-affordance mappings rather than reasoning over physical properties. To address this gap, we introduce Affordance20Q, a novel affordance reasoning benchmark formulated as a 20-Questions game that does not expose the object's identity. In each game, the model identifies a hidden object's affordance from a candidate set by asking yes/no questions about its physical properties. Affordance20Q comprises 1,009 games over 454 objects and 59 affordances, all manually filtered, refined, and annotated. We conduct comprehensive experiments with 15 state-of-the-art LLMs and find a substantial gap of roughly 20 points relative to human performance. A KL-based information-gain (IG) analysis further shows that models fail to ask discriminating questions as the game progresses. To close the gap, we develop KB-Anchored Rule Induction (KARI), an LLM-based pipeline that generates affordance rules grounded in evidence from knowledge bases (KBs). KARI improves open-source LLMs by up to 15.2 points, although the limited coverage of KBs hinders further gains. We release all our code and data at https://github.com/1171-jpg/Affordance20Q.git.
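
To make the idea of a KL-based information-gain score concrete, here is a minimal sketch of how one yes/no answer could be scored. It is not the paper's implementation: the abstract does not give the exact IG definition, so the sketch assumes a belief distribution over candidate affordances, an answer that simply rules out inconsistent candidates, and IG measured as the KL divergence from the prior to the updated belief. The function names and the four candidate affordances are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits for two distributions over the same keys."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)


def posterior_after_answer(prior, consistent):
    """Drop candidates that contradict the answer, then renormalize.

    Assumption: an answer is a hard filter. A real system would more
    likely use soft likelihoods for each candidate.
    """
    mass = sum(prior[c] for c in prior if c in consistent)
    if mass == 0:
        raise ValueError("answer rules out every candidate")
    return {c: (prior[c] / mass if c in consistent else 0.0) for c in prior}


def information_gain(prior, consistent):
    """IG of one answered question, taken here as KL(posterior || prior)."""
    posterior = posterior_after_answer(prior, consistent)
    return kl_divergence(posterior, prior)


# Hypothetical candidate affordances with a uniform prior.
candidates = ["cut", "contain", "pour", "write"]
prior = {c: 1 / len(candidates) for c in candidates}

# "Is it hollow?" answered yes: keeps half of the candidates.
ig_split = information_gain(prior, {"contain", "pour"})

# "Is it a physical object?" answered yes: rules nothing out.
ig_weak = information_gain(prior, set(candidates))

print(f"discriminating question: {ig_split:.2f} bits")
print(f"non-discriminating question: {ig_weak:.2f} bits")
```

Under these assumptions, a question that halves a uniform candidate set yields one bit, while a question every candidate satisfies yields zero. The second case is the kind of non-discriminating question the IG analysis is designed to surface.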