Diversified Model Discovery through Structured Table Discovery

Abstract

Model cards describe model behavior through a mixture of textual descriptions and structured artifacts, including performance, configuration, and dataset tables. Existing model search systems rely predominantly on semantic similarity over text, which can produce homogeneous result sets and limit exploration of alternatives. We argue that model search is inherently comparative: users want models that are task-aligned yet differ along measurable dimensions. We hypothesize that striking this balance requires retrieval over condensed, high-quality evidence rather than verbose descriptions, and that much of this evidence is concentrated in structured tables. We present StructuredSemanticSearch, a table-driven model search framework built on the ModelTables benchmark. Given a query, StructuredSemanticSearch combines a semantic baseline for task alignment with a structure-aware pipeline that discovers query-related model-card tables using table discovery operators such as unionability, joinability, and keyword search. Retrieved tables are mapped back to model cards under a controlled top-k budget, enabling a fair comparison between text-based and table-based retrieval. Beyond retrieval, StructuredSemanticSearch adapts table integration to the model-table domain through orientation-aware integration, producing compact integrated views from partially overlapping and sometimes transposed evidence tables. For evaluation, we introduce a nugget-based, auditable protocol that extracts compact evidence items from model cards, matches queries to condition- or intent-specific nuggets, and measures evidence coverage and diversity over the retrieved model-card candidate sets. This protocol also provides a scalable path toward approximate, evidence-based labeling in dynamic model lakes. Experiments on 597 model-recommendation queries show that the structure-aware pipeline achieves higher nugget coverage than the semantic baseline.
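To make the table-to-card mapping and the coverage measurement concrete, the following is a minimal sketch under stated assumptions, not the framework's actual implementation. The function names (tables_to_cards, nugget_coverage), the representation of nuggets as plain string identifiers, and the definition of coverage as the fraction of query nuggets found in at least one retrieved card are all illustrative choices that the abstract does not specify. Diversity is not sketched because the abstract gives no definition for it.

```python
from typing import Dict, List, Set, Tuple


def tables_to_cards(ranked_tables: List[Tuple[str, float]],
                    table_to_card: Dict[str, str],
                    k: int) -> List[str]:
    """Map ranked (table_id, score) pairs to at most k distinct model cards.

    Assumption: each table belongs to exactly one model card, and a card
    inherits the rank of its best-scoring table.
    """
    cards: List[str] = []
    seen: Set[str] = set()
    for table_id, _score in sorted(ranked_tables, key=lambda t: t[1], reverse=True):
        card = table_to_card.get(table_id)
        if card is None or card in seen:
            continue  # skip unmapped tables and cards already selected
        seen.add(card)
        cards.append(card)
        if len(cards) == k:
            break  # enforce the top-k budget on cards, not on tables
    return cards


def nugget_coverage(cards: List[str],
                    card_nuggets: Dict[str, Set[str]],
                    query_nuggets: Set[str]) -> float:
    """Fraction of the query's nuggets present in at least one retrieved card."""
    if not query_nuggets:
        return 0.0
    covered: Set[str] = set()
    for card in cards:
        covered |= card_nuggets.get(card, set()) & query_nuggets
    return len(covered) / len(query_nuggets)


# Hypothetical toy data: four retrieved tables belonging to three model cards.
ranked = [("t1", 0.9), ("t2", 0.8), ("t3", 0.7), ("t4", 0.6)]
table_to_card = {"t1": "A", "t2": "A", "t3": "B", "t4": "C"}
card_nuggets = {"A": {"n1"}, "B": {"n2", "n9"}, "C": {"n3"}}
query_nuggets = {"n1", "n2", "n3", "n4"}

cards = tables_to_cards(ranked, table_to_card, k=2)
coverage = nugget_coverage(cards, card_nuggets, query_nuggets)
print(cards, coverage)
```

Applying the budget to distinct cards rather than to tables means a card with many matching tables does not crowd out others, which keeps a text-based and a table-based retriever comparable at the same k.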