Selective Control Under Noisy Perception: Governance Failures Hidden by Aggregate Metrics in Modular Networks

Abstract

A content-moderation system can score well on every standard accuracy metric and still cause real harm if its mistakes fall on the few users who connect otherwise separate communities (bridge users). We show this in an agent-based model in which N=240 learning agents on a community-structured network each post harmless, productive, or dangerous content, and a regulator removes or penalizes whatever a noisy classifier flags. Overall usefulness barely moves as the noise changes (one-way ANOVA, p=0.96), so by aggregate measures nothing looks wrong. The damage instead concentrates on the bridge users, whose useful posts are wrongly suppressed and whose dangerous posts are wrongly spared. A governance loss (L_gov) that prices these two mistakes separately from the cost of enforcement more than doubles under false-positive-heavy noise. Aggregate accuracy therefore hides who is harmed. The cheap quantity to audit is a user's number of connections (degree), which is a near-perfect proxy for the betweenness that defines a bridge (r=0.96).
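The degree-as-proxy audit mentioned above can be illustrated with a minimal sketch. It is not the model used in this work: the planted-partition generator, the block layout (6 communities of 40 nodes, chosen only so that the total is 240), the edge probabilities, and the function names `planted_partition`, `betweenness`, and `pearson` are all assumptions made for illustration. The sketch builds a community-structured graph, computes unnormalized betweenness with Brandes' algorithm, and reports the Pearson correlation between degree and betweenness. The value it prints depends on these assumed parameters and should not be expected to match the reported r=0.96.

```python
import random
from collections import deque
from math import sqrt


def planted_partition(n_blocks, block_size, p_in, p_out, seed=0):
    """Undirected graph as adjacency sets: dense inside blocks, sparse between.

    Hypothetical generator; the paper's actual network model is not specified here.
    """
    rng = random.Random(seed)
    n = n_blocks * block_size
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            same_block = (u // block_size) == (v // block_size)
            if rng.random() < (p_in if same_block else p_out):
                adj[u].add(v)
                adj[v].add(u)
    return adj


def betweenness(adj):
    """Unnormalized shortest-path betweenness (Brandes' algorithm, unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # BFS distance from s (-1 = unvisited)
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2.0 for v, b in bc.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Assumed layout: 6 communities x 40 agents = 240 nodes.
adj = planted_partition(6, 40, p_in=0.2, p_out=0.005, seed=1)
nodes = sorted(adj)
degree = [len(adj[v]) for v in nodes]
bc = betweenness(adj)
r = pearson(degree, [bc[v] for v in nodes])
print(f"degree-betweenness Pearson r = {r:.2f}")
```

In a practical audit, degree is read directly from the adjacency structure, whereas betweenness requires a shortest-path computation from every node, which is why degree is the cheaper quantity to monitor when the two are strongly correlated.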