SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution
May 21, 2025
Author: Philipp D. Siedler
cs.AI
Abstract
We introduce a novel dataset designed to benchmark the physical and spatial
reasoning capabilities of Large Language Models (LLMs) based on topology
optimization, a method for computing optimal material distributions within a
design space under prescribed loads and supports. In this dataset, LLMs are
provided with conditions such as the 2D boundary, applied forces, and supports,
and must reason about the resulting optimal material distribution. The dataset
includes a variety of tasks, ranging from filling in masked regions within
partial structures to predicting complete material distributions. Solving these
tasks requires understanding the flow of forces and the required material
distribution under given constraints, without access to simulation tools or
explicit physical models, challenging models to reason about structural
stability and spatial organization. Our dataset targets the evaluation of
spatial and physical reasoning abilities in 2D settings, offering a
complementary perspective to traditional language and logic benchmarks.
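
To make the masked-region task described in the abstract more concrete, the following is a minimal sketch of how such a task could be represented and scored. The cell symbols (`L` for load, `S` for support, `0` for void, `1` for solid, `?` for masked), the whitespace-separated grid layout, and the names `parse_grid` and `masked_accuracy` are all assumptions made for illustration; they are not taken from the dataset and its actual encoding and metrics may differ.

```python
from typing import List

# Hypothetical cell symbols (assumed for illustration, not the dataset's format).
LOAD, SUPPORT, VOID, SOLID, MASK = "L", "S", "0", "1", "?"

Grid = List[List[str]]


def parse_grid(text: str) -> Grid:
    """Parse a whitespace-separated text grid into a list of rows."""
    return [row.split() for row in text.strip().splitlines()]


def masked_accuracy(masked: Grid, truth: Grid, pred: Grid) -> float:
    """Return the fraction of masked cells that the prediction fills correctly."""
    total = 0
    correct = 0
    for r, row in enumerate(masked):
        for c, cell in enumerate(row):
            if cell == MASK:
                total += 1
                if pred[r][c] == truth[r][c]:
                    correct += 1
    # With no masked cells there is nothing to get wrong.
    return correct / total if total else 1.0


# Toy 3x3 example: a load at the top centre, supports at the bottom corners.
masked = parse_grid("""
0 L 0
? 1 ?
S ? S
""")
truth = parse_grid("""
0 L 0
1 1 1
S 0 S
""")
pred = parse_grid("""
0 L 0
1 1 1
S 1 S
""")

score = masked_accuracy(masked, truth, pred)
```

In this toy case the prediction matches the reference on two of the three masked cells. Scoring only the masked cells keeps the given boundary, load, and support cells from inflating the result; a full-distribution task could be scored the same way by masking every material cell.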