Squeeze-Release: Exact Structural Minimization via Iterative Pruning

Abstract

Unstructured pruning produces sparse weight tensors, but standard implementations leave tensor shapes unchanged, so the deployed model is no smaller than it was before pruning. We present an exact structural rewrite, which we call minimization, that converts a masked network into a smaller dense network whose forward function is identical up to floating-point rounding. The Squeeze-Release cycle alternates pruning and minimization, with an intermediate release step that re-enables the exact-zero positions inside the compacted tensors with small calibrated noise, turning otherwise wasted capacity back into trainable parameters. Successive cycles use that capacity to find structural redundancy that a single pass cannot reach. We additionally introduce CompensatedLayerNorm, a function-preserving replacement for LayerNorm that extends minimization to channel reduction across LayerNorm-equipped residual streams. At comparable accuracy, Squeeze-Release yields deployable networks that are 39x smaller than the unpruned model on a fully connected model network and 14.8x smaller on a modern CNN (ConvNeXt-Tiny). In addition, we prove that the rewrite can be extended to transformer architectures.
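To make the idea of an exact structural rewrite concrete, the sketch below applies two simple rules to a toy two-layer ReLU network whose weights have already been masked. It is only an illustration under stated assumptions, not the algorithm of this paper: the function names (`forward`, `minimize`) and the two rules (drop hidden units with no outgoing weights; fold hidden units with no incoming weights into the output bias) are hypothetical choices made for this example.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # W is a list of rows; returns W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W1, b1, W2, b2, x):
    # y = W2 * relu(W1 * x + b1) + b2
    h = relu([a + b for a, b in zip(matvec(W1, x), b1)])
    return [a + b for a, b in zip(matvec(W2, h), b2)]

def minimize(W1, b1, W2, b2):
    """Return a smaller dense network with the same forward function.

    Illustrative rules only (assumed for this sketch):
      * a hidden unit whose outgoing column in W2 is all zero is dropped;
      * a hidden unit whose incoming row in W1 is all zero emits the
        constant relu(b1[j]), which is folded into the output bias.
    """
    keep = []
    b2_new = list(b2)
    for j in range(len(W1)):
        no_outputs = all(row[j] == 0.0 for row in W2)
        no_inputs = all(w == 0.0 for w in W1[j])
        if no_outputs:
            continue  # never influences the output
        if no_inputs:
            c = max(0.0, b1[j])  # constant activation of unit j
            b2_new = [b + row[j] * c for b, row in zip(b2_new, W2)]
            continue
        keep.append(j)
    W1_small = [W1[j] for j in keep]
    b1_small = [b1[j] for j in keep]
    W2_small = [[row[j] for j in keep] for row in W2]
    return W1_small, b1_small, W2_small, b2_new

# Toy masked network: hidden unit 1 has no inputs, hidden unit 3 has no outputs.
W1 = [[0.5, -1.0, 0.0], [0.0, 0.0, 0.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
b1 = [0.1, 0.7, -0.3, 0.2]
W2 = [[1.0, 2.0, -1.0, 0.0], [0.0, -0.5, 3.0, 0.0]]
b2 = [0.0, 1.0]

small = minimize(W1, b1, W2, b2)
x = [1.0, 2.0, -0.5]
y_full = forward(W1, b1, W2, b2, x)
y_small = forward(*small, x)
```

Here the hidden layer shrinks from four units to two, and `y_full` and `y_small` agree up to floating-point rounding, which is the sense in which such a rewrite is "exact".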