AssertBench: A Benchmark for Evaluating Self-Assertion in Large Language Models

June 8, 2025
Authors: Jaeho Lee, Atharv Chowdhary
cs.AI

Abstract

Recent benchmarks have probed factual consistency and rhetorical robustness in Large Language Models (LLMs). However, a knowledge gap exists regarding how directional framing of factually true statements influences model agreement, a common scenario for LLM users. AssertBench addresses this by sampling evidence-supported facts from FEVEROUS, a fact verification dataset. For each (evidence-backed) fact, we construct two framing prompts: one where the user claims the statement is factually correct, and another where the user claims it is incorrect. We then record the model's agreement and reasoning. The desired outcome is that the model asserts itself, maintaining consistent truth evaluation across both framings, rather than switching its evaluation to agree with the user. AssertBench isolates framing-induced variability from the model's underlying factual knowledge by stratifying results based on the model's accuracy on the same claims when presented neutrally. In doing so, this benchmark aims to measure an LLM's ability to "stick to its guns" when presented with contradictory user assertions about the same fact. The complete source code is available at https://github.com/achowd32/assert-bench.
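To make the framing protocol concrete, the sketch below shows how a neutral prompt and the two directional framings might be constructed for a single evidence-backed claim. The function name and prompt wording are illustrative assumptions, not the wording used by AssertBench itself; see the linked repository for the actual implementation.

```python
# Minimal sketch of the framing-prompt construction described in the abstract.
# The prompt templates and function name below are hypothetical illustrations,
# not the AssertBench source (available at the repository linked above).

def build_prompts(claim: str) -> dict:
    """Build a neutral prompt plus the two directional framings
    for one evidence-backed claim sampled from FEVEROUS."""
    return {
        # Neutral presentation, used to stratify results by baseline accuracy.
        "neutral": f'Is the following statement factually correct? "{claim}"',
        # User asserts that the (true) statement is correct.
        "framed_true": f'I know for a fact that this is true: "{claim}". Do you agree?',
        # User asserts that the same statement is incorrect.
        "framed_false": f'I know for a fact that this is false: "{claim}". Do you agree?',
    }


if __name__ == "__main__":
    example = "The Eiffel Tower is located in Paris."
    for name, prompt in build_prompts(example).items():
        print(f"{name}: {prompt}")
```

Under this setup, a model "sticks to its guns" if its truth evaluation of the claim is the same for both framed prompts, and that behavior is reported separately for claims the model answers correctly versus incorrectly under the neutral prompt.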