Selective Control Under Noisy Perception: Governance Failures Masked by Aggregate Metrics in Modular Networks

Abstract

A content-moderation system can score well on every standard accuracy metric and still cause real harm if its mistakes fall on the few users who connect otherwise separate communities. We show this in an agent-based model in which N=240 learning agents on a community-structured network each post harmless, productive, or dangerous content, and a regulator removes or penalizes whatever a noisy classifier flags. Overall usefulness barely moves as the noise changes (one-way ANOVA, p=0.96): by aggregate measures, nothing looks wrong. The damage instead concentrates on the bridge users who link communities, whose useful posts are wrongly suppressed and whose dangerous posts are wrongly spared. A governance loss (L_gov) that prices these two mistakes separately from the cost of enforcement more than doubles under false-positive-heavy noise. Aggregate accuracy hides who is harmed; the cheap quantity to audit is a user's number of connections (degree), a near-perfect proxy for the betweenness centrality that defines a bridge (r=0.96).
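
The degree-versus-betweenness comparison in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the graph generator, the choice of 6 communities of 40 nodes (which gives N=240), and the edge probabilities p_in and p_out are all hypothetical. Betweenness is computed with Brandes' algorithm for unweighted, undirected graphs, and the correlation is a plain Pearson coefficient. The printed value depends on these assumed parameters and is not expected to reproduce r=0.96.

```python
import random
from collections import deque
from math import sqrt


def modular_graph(n_comm, size, p_in, p_out, seed=0):
    """Hypothetical generator: dense within communities, sparse between them."""
    rng = random.Random(seed)
    n = n_comm * size
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            same_community = (u // size) == (v // size)
            if rng.random() < (p_in if same_community else p_out):
                adj[u].add(v)
                adj[v].add(u)
    return adj


def betweenness(adj):
    """Unnormalized betweenness (Brandes), unweighted undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def pearson(xs, ys):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Assumed parameters, chosen only so that N = 6 * 40 = 240.
    graph = modular_graph(n_comm=6, size=40, p_in=0.2, p_out=0.005, seed=1)
    nodes = sorted(graph)
    degree = [len(graph[v]) for v in nodes]
    bc = betweenness(graph)
    r = pearson(degree, [bc[v] for v in nodes])
    print(f"Pearson r between degree and betweenness: {r:.2f}")
```

Degree is read directly from the adjacency lists, while betweenness requires a shortest-path search from every node, which is the sense in which degree is the cheaper quantity to audit.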