STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion

Abstract

Table retrieval is the task of retrieving the most relevant tables from large-scale corpora given natural language queries. Structural and semantic discrepancies between unstructured text and structured tables, however, make embedding alignment particularly challenging. Recent methods such as QGpT enrich table semantics by generating synthetic queries, yet they still rely on coarse partial-table sampling and simple fusion strategies, which limit semantic diversity and hinder effective query-table alignment. We propose STAR (Semantic Table Representation), a lightweight framework that improves semantic table representation through semantic clustering and weighted fusion. STAR first applies header-aware K-means clustering to group semantically similar rows and selects representative centroid instances to construct a diverse partial table. It then generates cluster-specific synthetic queries to cover the table's semantic space comprehensively. Finally, STAR employs weighted fusion strategies to integrate table and query embeddings, enabling fine-grained semantic alignment. This design lets STAR capture complementary information from structured and textual sources, improving the expressiveness of table representations. Experiments on five benchmarks show that STAR consistently achieves higher Recall than QGpT on all datasets, demonstrating the effectiveness of semantic clustering and adaptive weighted fusion for robust table representation. Our code is available at https://github.com/adsl135789/STAR.
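
The pipeline described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: `serialize_row`, `toy_embed`, `select_representatives`, and `weighted_fusion` are hypothetical names introduced here, the hashed bag-of-tokens encoder stands in for a real embedding model, "header-aware" is assumed to mean pairing each cell with its column name before embedding, the K-means initialisation is a simple farthest-point heuristic, and the fusion weight is a fixed constant rather than an adaptive one. The synthetic queries are written by hand in place of generated ones.

```python
import zlib

def serialize_row(header, row):
    # Assumed header-aware view of a row: pair every cell with its column name.
    return " | ".join(f"{h}: {v}" for h, v in zip(header, row))

def toy_embed(text, dim=16):
    # Stand-in encoder (hashed bag of tokens, L2-normalised); not the paper's model.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = sum(x * x for x in vec) ** 0.5 or 1.0
    return [x / norm for x in vec]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, k, iters=20):
    # Lloyd's algorithm with deterministic farthest-point initialisation.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    def assign():
        return [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return centroids, assign()

def select_representatives(points, k):
    # For each non-empty cluster, pick the member closest to its centroid.
    centroids, labels = kmeans(points, k)
    reps = []
    for j, c in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            reps.append(min(members, key=lambda i: sq_dist(points[i], c)))
    return sorted(reps)

def weighted_fusion(table_vec, query_vecs, w_table=0.5):
    # Convex combination of the table embedding and the mean query embedding.
    # The fixed weight is an assumption; the paper describes adaptive weighting.
    q = mean(query_vecs)
    return [w_table * t + (1.0 - w_table) * x for t, x in zip(table_vec, q)]

# Toy table (illustrative data only).
header = ["city", "country", "population"]
rows = [
    ["Paris", "France", "2100000"],
    ["Lyon", "France", "520000"],
    ["Osaka", "Japan", "2700000"],
    ["Kyoto", "Japan", "1460000"],
]

row_vecs = [toy_embed(serialize_row(header, r)) for r in rows]
rep_ids = select_representatives(row_vecs, k=2)
partial_table = [rows[i] for i in rep_ids]

# Hand-written stand-ins for the cluster-specific synthetic queries.
queries = ["which city in France has the largest population", "population of Kyoto"]
table_vec = toy_embed(" ".join(serialize_row(header, r) for r in partial_table))
fused = weighted_fusion(table_vec, [toy_embed(q) for q in queries], w_table=0.6)
```

In this sketch `fused` plays the role of the table's retrieval vector. A real system would replace `toy_embed` with a trained text encoder and choose `k` and the fusion weights per dataset.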

