

COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

January 5, 2026
作者: Dasol Choi, DongGeon Lee, Brigitta Jesica Kartono, Helena Berndt, Taeyoun Kwon, Joonwon Jang, Haon Park, Hwanjo Yu, Minsuk Kahng
cs.AI

Abstract

As large language models are deployed in high-stakes enterprise applications, from healthcare to finance, ensuring adherence to organization-specific policies has become essential. Yet existing safety evaluations focus exclusively on universal harms. We present COMPASS (Company/Organization Policy Alignment Assessment), the first systematic framework for evaluating whether LLMs comply with organizational allowlist and denylist policies. We apply COMPASS to eight diverse industry scenarios, generating and validating 5,920 queries that test both routine compliance and adversarial robustness through strategically designed edge cases. Evaluating seven state-of-the-art models, we uncover a fundamental asymmetry: models reliably handle legitimate requests (>95% accuracy) but catastrophically fail at enforcing prohibitions, refusing only 13-40% of adversarial denylist violations. These results demonstrate that current LLMs lack the robustness required for policy-critical deployments, establishing COMPASS as an essential evaluation framework for organizational AI safety.
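The abstract's core metrics can be illustrated with a minimal sketch. This is not the paper's actual pipeline: the refusal detector, marker list, and function names below are all hypothetical simplifications, assuming each test query is labeled with the policy type ("allowlist" = should be answered, "denylist" = should be refused) and that refusals can be spotted heuristically.

```python
# Illustrative sketch only; COMPASS's real scoring method is not specified here.
# A crude keyword-based refusal detector (hypothetical marker list).
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "not able to assist")

def is_refusal(response: str) -> bool:
    """Treat a response as a refusal if it contains any refusal marker."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def policy_alignment_scores(results):
    """results: list of (policy, response) pairs, policy in {"allowlist", "denylist"}.

    Returns (allowlist_accuracy, denylist_refusal_rate):
    allowlist queries should be answered; denylist queries should be refused.
    """
    allow = [r for p, r in results if p == "allowlist"]
    deny = [r for p, r in results if p == "denylist"]
    allow_acc = sum(not is_refusal(r) for r in allow) / len(allow) if allow else 0.0
    deny_refusal = sum(is_refusal(r) for r in deny) / len(deny) if deny else 0.0
    return allow_acc, deny_refusal
```

Under this framing, the asymmetry the paper reports corresponds to a high `allow_acc` (>0.95) paired with a low `deny_refusal` (0.13-0.40) on adversarial denylist queries.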