ST-Raptor: LLM-Powered Semi-Structured Table Question Answering

Abstract

Semi-structured tables are widely used in real-world applications (e.g., financial reports, medical records, transactional orders) and often have flexible, complex layouts (e.g., hierarchical headers and merged cells). Answering natural language questions over these tables generally requires human analysts to interpret the layout, which is costly and inefficient. Existing methods for automating this process face significant challenges. First, methods like NL2SQL require converting semi-structured tables into structured ones, which often causes substantial information loss. Second, methods like NL2Code and multi-modal LLM QA struggle to understand the complex layouts of semi-structured tables and cannot accurately answer the corresponding questions. To this end, we propose ST-Raptor, a tree-based framework for semi-structured table question answering using large language models. First, we introduce the Hierarchical Orthogonal Tree (HO-Tree), a structural model that captures complex semi-structured table layouts, along with an effective algorithm for constructing the tree. Second, we define a set of basic tree operations to guide LLMs in executing common QA tasks. Given a user question, ST-Raptor decomposes it into simpler sub-questions, generates the corresponding tree operation pipelines, and performs operation-table alignment so that the pipelines execute accurately. Third, we incorporate a two-stage verification mechanism: forward validation checks the correctness of execution steps, while backward validation evaluates answer reliability by reconstructing queries from predicted answers. To benchmark performance, we present SSTQA, a dataset of 764 questions over 102 real-world semi-structured tables. Experiments show that ST-Raptor outperforms nine baselines by up to 20% in answer accuracy. The code is available at https://github.com/weAIDB/ST-Raptor.
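To make the idea of answering a question through a pipeline of tree operations more concrete, the following is a minimal Python sketch. It is not the HO-Tree or the operation set defined by the authors: the `Node` class, `build_tree`, `op_child`, `op_value`, and `run_pipeline` are all hypothetical names chosen for illustration, the table is modeled as a simple nested dictionary, and the early-stop check is only a crude stand-in for a per-step correctness check.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One header or cell in a simplified header tree (hypothetical, not the paper's HO-Tree)."""
    label: str
    value: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def build_tree(label, content):
    """Recursively turn a nested dict (header -> sub-headers or cell value) into a tree."""
    if isinstance(content, dict):
        return Node(label, children=[build_tree(k, v) for k, v in content.items()])
    return Node(label, value=str(content))


def op_child(nodes, label):
    """Hypothetical operation: step from each node to its children named `label`."""
    return [c for n in nodes for c in n.children if c.label == label]


def op_value(nodes):
    """Hypothetical operation: read the cell values of nodes that hold one."""
    return [n.value for n in nodes if n.value is not None]


def run_pipeline(root, pipeline):
    """Execute operations in order and stop early if a step matches nothing.

    The emptiness check is an assumed, simplified stand-in for validating
    each execution step.
    """
    current = [root]
    for op, *args in pipeline:
        current = op(current, *args)
        if not current:
            raise ValueError(f"step {op.__name__}{tuple(args)} matched nothing")
    return current


# A table whose top-level headers each span two quarterly sub-columns.
report = {"Revenue": {"Q1": 120, "Q2": 135}, "Costs": {"Q1": 80, "Q2": 90}}
root = build_tree("Report", report)

# "What was revenue in Q2?" expressed as a pipeline of tree operations.
pipeline = [(op_child, "Revenue"), (op_child, "Q2"), (op_value,)]
answer = run_pipeline(root, pipeline)
```

In this sketch the hierarchy of headers is kept intact rather than flattened into columns, so a question that refers to a nested header maps to a short path through the tree. A pipeline that names a header that does not exist raises an error at the failing step instead of silently returning an empty answer.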

