Measuring the Symmetry-Data Exchange Rate

Abstract

Equivariance theory predicts that an architectural symmetry prior reduces sample complexity by a factor of |G|. The prediction is widely cited but rarely measured as a scaling law with controls that separate the prior from its confounds. On a controlled C_n-symmetric task, we report three findings. First, a wrong-group control with identical orbit size and matched compute performs worse than no constraint at all (joint pairwise CI [+0.79, +3.26] excludes zero, robust across estimators): a misaligned constraint is actively harmful, not merely unhelpful. Second, an augmentation baseline equipped with test-time orbit averaging matches the equivariant model exactly, with bit-identical per-epoch validation curves across matched cells, so the architecture-vs-augmentation gap is conditional on asymmetric test-time computation rather than unconditional. Third, the relative exchange rate beta_diff = 1.28 is consistent in sign and order of magnitude with the theoretical value of 1.0 (single-level CI [+0.92, +2.05]); the more conservative two-level bootstrap (seeds × group sizes) widens this to [-0.63, +1.72], which includes zero, and a finer-N replication on a √2-spaced grid is inconclusive (point estimate -0.82). The methodological contributions (the relative-rate estimator that cancels the shared-difficulty confound, the wrong-group control, and a pre-specified failure taxonomy) transfer to any inductive bias whose strength can be parameterised. Honest scoping: the primary estimator beta_diff was adopted post hoc after the initial analysis revealed a positive-slope identifiability problem; the design was never externally pre-registered; and the headline number rests on an OLS slope over seven group sizes on a coarse N grid. This is an exploratory study, not a confirmatory measurement; the wrong-group result is the cleanest finding and the one we report with the most confidence. A registered replication on fresh seeds is future work.
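
The second finding turns on test-time orbit averaging, which can be illustrated with a minimal sketch. The abstract does not specify how C_n acts on the inputs, so the sketch assumes the action is a cyclic shift of a length-n vector; the names `orbit_average` and `model`, and the toy linear-plus-tanh predictor, are hypothetical and stand in for whatever network the study trains. The sketch shows only that averaging a non-invariant predictor over the orbit makes its output invariant up to floating-point rounding; it does not reproduce the bit-identical validation curves reported above.

```python
import numpy as np

def orbit_average(f, x, n):
    """Average f over the C_n orbit of x.

    Assumption: the k-th group element acts as a cyclic shift by k
    positions along the last axis of x.
    """
    outputs = [f(np.roll(x, k, axis=-1)) for k in range(n)]
    return np.mean(outputs, axis=0)

# Hypothetical non-invariant predictor: a fixed random linear map plus tanh.
rng = np.random.default_rng(0)
n = 8
W = rng.normal(size=(3, n))

def model(x):
    return np.tanh(W @ x)

x = rng.normal(size=n)
shifted = np.roll(x, 3)  # another point on the same C_n orbit

# Gap between predictions on two points of the same orbit,
# without and with test-time orbit averaging.
raw_gap = np.max(np.abs(model(x) - model(shifted)))
avg_gap = np.max(
    np.abs(orbit_average(model, x, n) - orbit_average(model, shifted, n))
)

print(f"raw gap: {raw_gap:.3e}")
print(f"orbit-averaged gap: {avg_gap:.3e}")
```

The averaged predictor costs n forward passes per test input. This is the asymmetric test-time computation that the abstract names as the condition on which the architecture-vs-augmentation gap depends.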