SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution

Abstract

We introduce a novel dataset designed to benchmark the physical and spatial reasoning capabilities of Large Language Models (LLMs) based on topology optimization, a method for computing optimal material distributions within a design space under prescribed loads and supports. In this dataset, LLMs are provided with conditions such as the 2D boundary, applied forces, and supports, and must reason about the resulting optimal material distribution. The dataset includes a variety of tasks, ranging from filling in masked regions within partial structures to predicting complete material distributions. Solving these tasks requires understanding the flow of forces and the required material distribution under given constraints, without access to simulation tools or explicit physical models. This challenges models to reason about structural stability and spatial organization. Our dataset targets the evaluation of spatial and physical reasoning abilities in 2D settings, offering a complementary perspective to traditional language and logic benchmarks.
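To make the masked-region task concrete, the following is a minimal sketch of how one instance and a simple score might look. The cell symbols, the grid layout, and the function `masked_cell_accuracy` are all assumptions made for illustration; they are not taken from the dataset, and the benchmark's actual encoding and metrics may differ.

```python
from typing import List

# Hypothetical cell symbols (an assumption, not the dataset's real encoding):
# '1' = material, '0' = void, 'L' = applied load, 'S' = support, '?' = masked cell
Grid = List[str]


def masked_cell_accuracy(task: Grid, prediction: Grid, truth: Grid) -> float:
    """Fraction of masked ('?') cells where the prediction matches the ground truth."""
    total = 0
    correct = 0
    for task_row, pred_row, true_row in zip(task, prediction, truth):
        for t, p, g in zip(task_row, pred_row, true_row):
            if t == "?":
                total += 1
                correct += int(p == g)
    # With no masked cells there is nothing to get wrong.
    return correct / total if total else 1.0


# Toy 3x3 instance: a load on top, two supports at the bottom corners.
truth = [
    "0L0",
    "111",
    "S0S",
]
task = [
    "0L0",
    "1?1",  # the model must decide whether the center cell carries material
    "S?S",
]
prediction = [
    "0L0",
    "111",
    "S1S",  # wrong: the ground truth leaves this cell void
]

score = masked_cell_accuracy(task, prediction, truth)
print(score)
```

In this toy case one of the two masked cells is predicted correctly, so the score is 0.5. Scoring only the masked cells keeps the given boundary, load, and support cells from inflating the result.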
