Selective Control Under Noisy Perception: Governance Failures Masked by Aggregate Metrics in Modular Networks

Abstract

A content-moderation system can score well on every standard accuracy metric and still cause real harm if its mistakes fall on the few users who connect otherwise separate communities (bridge users). We show this in an agent-based model in which N=240 learning agents on a community-structured network each post harmless, productive, or dangerous content, and a regulator removes or penalizes whatever a noisy classifier flags. Overall usefulness barely moves as the noise level varies (one-way ANOVA, p=0.96): by aggregate measures, nothing looks wrong. The damage instead concentrates on bridge users, whose useful posts are wrongly suppressed and whose dangerous posts are wrongly spared. A governance loss (L_gov) that prices these two mistakes separately from the cost of enforcement more than doubles under false-positive-heavy noise. Aggregate accuracy hides who is harmed, and the cheap quantity to audit is how many connections a user has (degree), a near-perfect proxy for the betweenness centrality that defines a bridge (r=0.96).
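To make the contrast between aggregate accuracy and a governance loss concrete, here is a minimal sketch on toy data. It is not the paper's implementation. The additive form of the loss, the weights `w_fp`, `w_fn`, and `w_enf`, the `Outcome` record, and the ten example posts are all assumptions introduced for illustration; the abstract states only that L_gov prices wrongly suppressed useful posts and wrongly spared dangerous posts separately from the cost of enforcement.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """One moderation decision on one post (hypothetical record layout)."""
    is_bridge: bool   # author connects otherwise separate communities
    content: str      # "harmless", "productive", or "dangerous"
    flagged: bool     # the noisy classifier flagged it, so the regulator acted


def accuracy(outcomes):
    """Aggregate accuracy: a decision is correct when flag == (content is dangerous)."""
    correct = sum(o.flagged == (o.content == "dangerous") for o in outcomes)
    return correct / len(outcomes)


def governance_loss(outcomes, w_fp=1.0, w_fn=1.0, w_enf=0.1):
    """Assumed additive loss: each error type and enforcement priced separately."""
    suppressed_useful = sum(o.flagged and o.content == "productive" for o in outcomes)
    spared_dangerous = sum((not o.flagged) and o.content == "dangerous" for o in outcomes)
    enforcement_actions = sum(o.flagged for o in outcomes)
    return (w_fp * suppressed_useful
            + w_fn * spared_dangerous
            + w_enf * enforcement_actions)


# Toy data (fabricated for illustration only): eight correctly handled posts by
# ordinary users, and two mishandled posts by bridge users.
posts = (
    [Outcome(False, "harmless", False)] * 6
    + [Outcome(False, "productive", False), Outcome(False, "dangerous", True)]
    + [Outcome(True, "productive", True), Outcome(True, "dangerous", False)]
)

bridge = [o for o in posts if o.is_bridge]
others = [o for o in posts if not o.is_bridge]

print("aggregate accuracy:", accuracy(posts))
print("loss on bridge users:", governance_loss(bridge))
print("loss on other users:", governance_loss(others))
```

In this toy case eight of ten decisions are correct, yet both errors sit with the two bridge users, so nearly all of the loss lands on them. Splitting the loss by user group, rather than reporting a single pooled number, is what exposes that concentration.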