Diverse Model Exploration via Structured Table Discovery

Abstract

Model cards describe model behavior through a mixture of textual descriptions and structured artifacts, including performance, configuration, and dataset tables. Existing model search systems rely predominantly on semantic similarity over text, which can produce homogeneous result sets and limit exploration of alternatives. We argue that model search is inherently comparative: users want models that are task-aligned yet differentiated in measurable ways. We hypothesize that achieving this balance requires retrieval over condensed, high-quality evidence rather than verbose descriptions, and that much of this evidence is concentrated in structured tables. We present StructuredSemanticSearch, a table-driven model search framework built on the ModelTables benchmark. Given a query, StructuredSemanticSearch combines a semantic baseline for task alignment with a structure-aware pipeline that discovers query-related model-card tables using table discovery operators such as unionability, joinability, and keyword search. Retrieved tables are mapped back to model cards under a controlled top-k budget, enabling a fair comparison between text-based and table-based retrieval. Beyond retrieval, StructuredSemanticSearch adapts table integration to the model-table domain through orientation-aware integration, producing compact integrated views from partially overlapping and sometimes transposed evidence tables. For evaluation, we introduce a nugget-based, auditable protocol that extracts compact evidence items (nuggets) from model cards, matches queries to condition- or intent-specific nuggets, and measures evidence coverage and diversity over the retrieved model-card candidate sets. This protocol also provides a scalable path toward approximate, evidence-based labeling in dynamic model lakes. Experiments on 597 model-recommendation queries show that the structure-aware pipeline achieves higher nugget coverage than the semantic baseline.
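The two evaluation-facing steps described above, mapping retrieved tables back to model cards under a top-k budget and measuring nugget coverage over the resulting candidate set, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the StructuredSemanticSearch implementation: the function names `tables_to_model_cards` and `nugget_coverage`, the `(table_id, model_card_id)` input format, the rule that a card is credited at the rank of its best-ranked table, and the modeling of nuggets as plain string identifiers are all hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

# Hypothetical input format: (table_id, model_card_id) pairs, best-ranked first.
RankedTable = Tuple[str, str]


def tables_to_model_cards(ranked_tables: Sequence[RankedTable], k: int) -> List[str]:
    """Map a ranked table list back to at most k distinct model cards.

    Assumption (hypothetical): a card takes the rank of its best-ranked table,
    and further tables from an already selected card do not consume budget.
    """
    if k <= 0:
        return []
    cards: List[str] = []
    seen: Set[str] = set()
    for _table_id, card_id in ranked_tables:
        if card_id in seen:
            continue  # same card already counted; skip without using budget
        seen.add(card_id)
        cards.append(card_id)
        if len(cards) == k:
            break  # top-k budget exhausted
    return cards


def nugget_coverage(
    candidate_cards: Iterable[str],
    card_nuggets: Dict[str, Set[str]],
    query_nuggets: Set[str],
) -> float:
    """Fraction of the query's relevant nuggets found in at least one candidate card."""
    if not query_nuggets:
        return 0.0
    covered: Set[str] = set()
    for card_id in candidate_cards:
        # Keep only the nuggets of this card that are relevant to the query.
        covered |= card_nuggets.get(card_id, set()) & query_nuggets
    return len(covered) / len(query_nuggets)


if __name__ == "__main__":
    # Toy data, purely illustrative.
    ranked = [("t1", "cardA"), ("t2", "cardA"), ("t3", "cardB"), ("t4", "cardC")]
    top = tables_to_model_cards(ranked, k=2)
    print(top)  # ['cardA', 'cardB']
    nuggets = {"cardA": {"n1", "n2"}, "cardB": {"n2", "n3"}, "cardC": {"n4"}}
    print(round(nugget_coverage(top, nuggets, {"n1", "n3", "n4"}), 3))  # 0.667
```

Under this sketch's assumptions, deduplicating by model card before applying k is what lets a text-based and a table-based retriever be compared on the same number of candidates: without it, several tables from a single card could fill the budget and shrink the effective candidate set.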