Selective Control under Noisy Perception: Governance Failures Hidden by Aggregate Metrics in Modular Networks

Abstract

A content-moderation system can score well on every standard accuracy metric and still cause real harm if its mistakes fall on the few users who connect otherwise separate communities (bridge users). We show this in an agent-based model in which N=240 learning agents on a community-structured network each post harmless, productive, or dangerous content, and a regulator removes or penalizes whatever a noisy classifier flags. Overall usefulness barely changes as the noise varies (one-way ANOVA, p=0.96): by aggregate measures, nothing looks wrong. The damage instead concentrates on the bridge users, whose productive posts are wrongly suppressed and whose dangerous posts are wrongly spared. A governance loss (L_gov), which prices these two mistakes separately and independently of the cost of enforcement, more than doubles under false-positive-heavy noise. Aggregate accuracy hides who is harmed. The cheapest quantity to audit is how many connections a user has (degree), which is a near-perfect proxy for the betweenness centrality that defines a bridge (r=0.96).
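The degree-based audit suggested above can be illustrated with a minimal sketch. It is not the model used in this paper: the `Post` record, the `degree_cutoff` threshold, the binary dangerous/not-dangerous label, and the toy counts are all assumptions chosen only to show how a high aggregate accuracy can coexist with errors concentrated on high-degree users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical audit record; field names are illustrative assumptions."""
    degree: int       # number of connections of the posting user
    dangerous: bool   # ground-truth label (simplified to two classes)
    flagged: bool     # output of the noisy classifier


def accuracy(posts):
    """Fraction of posts whose flag matches the ground truth."""
    return sum(p.flagged == p.dangerous for p in posts) / len(posts)


def error_rates(posts):
    """Return (false_positive_rate, false_negative_rate) for a group of posts."""
    safe = [p for p in posts if not p.dangerous]
    dangerous = [p for p in posts if p.dangerous]
    fpr = sum(p.flagged for p in safe) / len(safe) if safe else 0.0
    fnr = sum(not p.flagged for p in dangerous) / len(dangerous) if dangerous else 0.0
    return fpr, fnr


def audit_by_degree(posts, degree_cutoff):
    """Compare aggregate accuracy with error rates split at an assumed degree cutoff."""
    high = [p for p in posts if p.degree >= degree_cutoff]
    low = [p for p in posts if p.degree < degree_cutoff]
    return {
        "overall_accuracy": accuracy(posts),
        "high_degree": error_rates(high),
        "low_degree": error_rates(low),
    }


# Toy data (invented for illustration): 90 posts by low-degree users, all
# classified correctly, and 10 posts by high-degree users with 5 errors.
toy_posts = (
    [Post(degree=3, dangerous=False, flagged=False)] * 80
    + [Post(degree=3, dangerous=True, flagged=True)] * 10
    + [Post(degree=20, dangerous=False, flagged=True)] * 3    # wrongly suppressed
    + [Post(degree=20, dangerous=False, flagged=False)] * 2
    + [Post(degree=20, dangerous=True, flagged=False)] * 2    # wrongly spared
    + [Post(degree=20, dangerous=True, flagged=True)] * 3
)

report = audit_by_degree(toy_posts, degree_cutoff=10)
print(report)
```

In this toy data the overall accuracy is 0.95, while the high-degree group has a false-positive rate of 0.6 and a false-negative rate of 0.4 and the low-degree group has none. Splitting on degree rather than betweenness keeps the audit cheap, since degree needs only a count of each user's connections.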