ST-Raptor: LLM-Powered Semi-Structured Table Question Answering

Abstract

Semi-structured tables are widely used in real-world applications (e.g., financial reports, medical records, transactional orders) and often have flexible, complex layouts (e.g., hierarchical headers and merged cells). Answering natural language questions over these tables generally relies on human analysts to interpret the layout, which is costly and inefficient. Existing methods for automating this process face significant challenges. First, methods like NL2SQL require converting semi-structured tables into structured ones, which often causes substantial information loss. Second, methods like NL2Code and multi-modal LLM QA struggle to understand the complex layouts of semi-structured tables and cannot accurately answer the corresponding questions. To this end, we propose ST-Raptor, a tree-based framework for semi-structured table question answering using large language models. First, we introduce the Hierarchical Orthogonal Tree (HO-Tree), a structural model that captures complex semi-structured table layouts, along with an effective algorithm for constructing the tree. Second, we define a set of basic tree operations to guide LLMs in executing common QA tasks. Given a user question, ST-Raptor decomposes it into simpler sub-questions, generates corresponding tree operation pipelines, and conducts operation-table alignment for accurate pipeline execution. Third, we incorporate a two-stage verification mechanism: forward validation checks the correctness of execution steps, while backward validation evaluates answer reliability by reconstructing queries from predicted answers. To benchmark performance, we present SSTQA, a dataset of 764 questions over 102 real-world semi-structured tables. Experiments show that ST-Raptor outperforms nine baselines by up to 20% in answer accuracy. The code is available at https://github.com/weAIDB/ST-Raptor.
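
As a rough illustration of why a tree can preserve layout information that flattening loses, the sketch below stores a table with hierarchical headers as a tree and answers a lookup by following a header path. It is only a toy under stated assumptions: the `Node` class, `build_tree`, and `get_values` are hypothetical names invented here, and the structure is not the HO-Tree definition or the tree operations from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One header or value cell in a toy header tree (hypothetical, not the paper's HO-Tree)."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def child(self, label: str) -> Optional["Node"]:
        """Return the direct child with the given label, or None if absent."""
        for c in self.children:
            if c.label == label:
                return c
        return None


def build_tree(rows: List[Tuple[List[str], str]]) -> Node:
    """Build a tree from (header_path, value) pairs.

    Paths that share a prefix share nodes, which mimics a merged header cell
    spanning several sub-headers.
    """
    root = Node("ROOT")
    for path, value in rows:
        node = root
        for label in path:
            nxt = node.child(label)
            if nxt is None:
                nxt = Node(label)
                node.children.append(nxt)
            node = nxt
        node.children.append(Node(value))  # the value becomes a leaf
    return root


def get_values(root: Node, path: List[str]) -> List[str]:
    """Follow a header path and return all leaf labels beneath it, in table order."""
    node: Optional[Node] = root
    for label in path:
        node = node.child(label) if node is not None else None
        if node is None:
            return []  # the path does not exist in this table
    leaves: List[str] = []
    stack = [node]
    while stack:
        cur = stack.pop()
        if not cur.children:
            leaves.append(cur.label)
        else:
            # reversed() keeps left-to-right order when popping from the stack
            stack.extend(reversed(cur.children))
    return leaves


# Made-up example data: "Revenue" acts as a merged header over "Q1" and "Q2".
rows = [
    (["Revenue", "Q1"], "120"),
    (["Revenue", "Q2"], "135"),
    (["Costs", "Q1"], "80"),
]
tree = build_tree(rows)
print(get_values(tree, ["Revenue", "Q2"]))
print(get_values(tree, ["Revenue"]))
```

In this toy, a question such as "What was revenue in Q2?" reduces to a single path lookup, while a question about all revenue figures reduces to collecting the leaves under one header node. A real system would additionally need to map the question's wording onto the header labels.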
