Profiling News Media for Factuality and Bias Using LLMs and the Fact-Checking Methodology of Human Experts

Abstract

In an age characterized by the proliferation of mis- and disinformation online, it is critical to empower readers to understand the content they are reading. Important efforts in this direction rely on manual or automatic fact-checking, which can be challenging for emerging claims with limited information. Such scenarios can be handled by assessing the reliability and the political bias of the source of the claim, i.e., characterizing entire news outlets rather than individual claims or articles. This is an important but understudied research direction. While prior work has looked into linguistic and social contexts, we do not analyze individual articles or information in social media. Instead, we propose a novel methodology that emulates the criteria that professional fact-checkers use to assess the factuality and political bias of an entire outlet. Specifically, we design a variety of prompts based on these criteria and elicit responses from large language models (LLMs), which we aggregate to make predictions. In addition to demonstrating sizable improvements over strong baselines via extensive experiments with multiple LLMs, we provide an in-depth error analysis of the effect of media popularity and region on model performance. Further, we conduct an ablation study to highlight the key components of our dataset that contribute to these improvements. To facilitate future research, we released our dataset and code at https://github.com/mbzuai-nlp/llm-media-profiling.
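The prompt-and-aggregate idea described above can be illustrated with a minimal sketch. The abstract does not say how the LLM responses are aggregated or what the prompts look like, so everything below is an assumption made for illustration: the prompt wording, the three-way label set, the `ask_llm` callable (a stand-in for whichever LLM client is used), and the choice of a simple majority vote.

```python
import re
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical criteria-style prompts; the actual prompts are not given in the abstract.
PROMPTS: List[str] = [
    "Has {outlet} failed fact checks in the past? Rate its factuality as high, mixed, or low.",
    "Does {outlet} cite credible sources? Rate its factuality as high, mixed, or low.",
    "Does {outlet} use loaded or sensational wording? Rate its factuality as high, mixed, or low.",
]

# Assumed label set for this sketch.
_LABEL_RE = re.compile(r"\b(high|mixed|low)\b")


def parse_label(response: str) -> Optional[str]:
    """Return the first whole-word label found in a free-text response, if any."""
    match = _LABEL_RE.search(response.lower())
    return match.group(1) if match else None


def predict_factuality(
    outlet: str,
    ask_llm: Callable[[str], str],
    prompts: List[str] = PROMPTS,
) -> Optional[str]:
    """Query the LLM once per prompt and aggregate the parsed labels by majority vote.

    Majority voting is an assumed aggregation rule, not necessarily the paper's.
    Ties are broken in favor of the label that was seen first.
    """
    votes: Counter = Counter()
    for template in prompts:
        label = parse_label(ask_llm(template.format(outlet=outlet)))
        if label is not None:
            votes[label] += 1
    if not votes:
        return None  # no response contained a recognizable label
    return votes.most_common(1)[0][0]
```

Here `ask_llm` is any function that takes a prompt string and returns the model's text response, so the sketch can be tried with a stub before wiring in a real client. A learned classifier over the responses would be an equally plausible aggregation step; the majority vote is used only because it is the simplest to show.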