Beyond True or False: Retrieval-Augmented Hierarchical Analysis of Nuanced Claims

Abstract

Claims made by individuals or entities are often nuanced and cannot be cleanly labeled as entirely "true" or "false", as is frequently the case with scientific and political claims. However, a claim (e.g., "vaccine A is better than vaccine B") can be dissected into its integral aspects and sub-aspects (e.g., efficacy, safety, distribution), each of which is easier to validate individually. This enables a more comprehensive, structured response that provides a well-rounded perspective on a given problem while also allowing the reader to prioritize specific angles of interest within the claim (e.g., safety towards children). We therefore propose ClaimSpect, a retrieval-augmented generation (RAG)-based framework that automatically constructs a hierarchy of the aspects typically considered when addressing a claim and enriches them with corpus-specific perspectives. This hierarchy partitions an input corpus level by level to retrieve relevant segments, which in turn help discover new sub-aspects. These segments also reveal the varying perspectives on each aspect of the claim (e.g., support, neutral, or oppose) and their respective prevalence (e.g., "how many biomedical papers believe vaccine A is more transportable than B?"). We apply ClaimSpect to a wide variety of real-world scientific and political claims in our newly constructed dataset, showcasing its robustness and accuracy in deconstructing nuanced claims and representing the perspectives within a corpus. Through real-world case studies and human evaluation, we validate its effectiveness over multiple baselines.
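To make the idea of an aspect hierarchy with per-aspect stance prevalence concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not ClaimSpect's implementation. Segments are assumed to arrive already labeled with a stance (support, neutral, or oppose). Plain keyword matching stands in for retrieval. Each child aspect only sees the segments assigned to its parent, which mirrors the hierarchical partitioning described above. All names (`Segment`, `AspectNode`, `partition`, `prevalence`) and the toy corpus are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

STANCES = ("support", "neutral", "oppose")


@dataclass
class Segment:
    text: str
    stance: str  # assumption: stance is pre-labeled (e.g., by an LLM classifier)


@dataclass
class AspectNode:
    name: str
    keywords: list[str]  # assumption: keyword matching stands in for retrieval
    children: list["AspectNode"] = field(default_factory=list)
    segments: list[Segment] = field(default_factory=list)


def partition(node: AspectNode, segments: list[Segment]) -> None:
    """Keep the segments matching this aspect, then recurse so each child
    only searches within its parent's segments (hierarchical partitioning)."""
    node.segments = [
        s for s in segments
        if any(k.lower() in s.text.lower() for k in node.keywords)
    ]
    for child in node.children:
        partition(child, node.segments)


def prevalence(node: AspectNode) -> dict[str, int]:
    """Count how many of the aspect's segments take each stance."""
    counts = Counter(s.stance for s in node.segments)
    return {stance: counts.get(stance, 0) for stance in STANCES}


def report(node: AspectNode, depth: int = 0) -> None:
    """Print the aspect tree with stance counts at every node."""
    print("  " * depth + f"{node.name}: {prevalence(node)}")
    for child in node.children:
        report(child, depth + 1)


# Hypothetical toy corpus for the claim "vaccine A is better than vaccine B".
corpus = [
    Segment("Vaccine A showed higher efficacy than vaccine B in trials.", "support"),
    Segment("Vaccine A efficacy was comparable to vaccine B.", "neutral"),
    Segment("Vaccine A requires cold storage, complicating distribution.", "oppose"),
    Segment("Vaccine A safety in children matched vaccine B.", "neutral"),
]

root = AspectNode("vaccine A is better than vaccine B", ["vaccine"], children=[
    AspectNode("efficacy", ["efficacy"]),
    AspectNode("safety", ["safety"], children=[
        AspectNode("safety towards children", ["children"]),
    ]),
    AspectNode("distribution", ["distribution", "storage", "transport"]),
])

partition(root, corpus)
report(root)
```

Restricting each child to its parent's segments keeps the search space shrinking as aspects get more specific. It also means a segment that the parent misses can never reach a sub-aspect. A retrieval-based system would likely replace the keyword filter with ranked similarity search.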