Profiling News Media for Factuality and Bias Using LLMs and the Fact-Checking Methodology of Human Experts

Abstract

In an age characterized by the proliferation of mis- and disinformation online, it is critical to empower readers to understand the content they are reading. Important efforts in this direction rely on manual or automatic fact-checking, which can be challenging for emerging claims with limited information. Such scenarios can be handled by assessing the reliability and the political bias of the source of the claim, i.e., characterizing entire news outlets rather than individual claims or articles. This is an important but understudied research direction. While prior work has looked into linguistic and social contexts, we do not analyze individual articles or information in social media. Instead, we propose a novel methodology that emulates the criteria that professional fact-checkers use to assess the factuality and political bias of an entire outlet. Specifically, we design a variety of prompts based on these criteria and elicit responses from large language models (LLMs), which we aggregate to make predictions. In addition to demonstrating sizable improvements over strong baselines via extensive experiments with multiple LLMs, we provide an in-depth error analysis of the effect of media popularity and region on model performance. Further, we conduct an ablation study to highlight the key components of our dataset that contribute to these improvements. To facilitate future research, we released our dataset and code at https://github.com/mbzuai-nlp/llm-media-profiling.
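The prompt-then-aggregate idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the prompt wording, the three-way `high`/`mixed`/`low` label set, the `ask_llm` callable (a stand-in for whatever model client is used), and majority voting as the aggregation rule. The authors' actual prompts and aggregation are in the released repository.

```python
import re
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical criteria-style prompts; the paper's real prompts are not shown here.
PROMPT_TEMPLATES = [
    "Does {outlet} have a record of failed fact checks? Answer high, mixed, or low factuality.",
    "Does {outlet} cite credible sources? Answer high, mixed, or low factuality.",
    "Does {outlet} use loaded or sensational wording? Answer high, mixed, or low factuality.",
]

# Assumed label set for illustration.
LABEL_PATTERN = re.compile(r"\b(high|mixed|low)\b")


def parse_label(response: str) -> Optional[str]:
    """Return the first label word found in the response, or None if there is none."""
    match = LABEL_PATTERN.search(response.lower())
    return match.group(1) if match else None


def profile_outlet(
    outlet: str,
    ask_llm: Callable[[str], str],
    templates: Sequence[str] = PROMPT_TEMPLATES,
) -> Optional[str]:
    """Ask one question per criterion and aggregate the parsed answers by majority vote."""
    votes: Counter = Counter()
    for template in templates:
        label = parse_label(ask_llm(template.format(outlet=outlet)))
        if label is not None:  # skip responses with no recognizable label
            votes[label] += 1
    if not votes:
        return None
    # On a tie, Counter.most_common keeps the label that was counted first.
    return votes.most_common(1)[0][0]
```

In use, `ask_llm` would wrap a real model call; passing a plain function or lambda makes the aggregation logic easy to check offline. Majority voting is only one possible way to combine the responses, chosen here because it is the simplest to read.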
