Nacrith: 앙상블 컨텍스트 모델링 및 고정밀 CDF 코딩을 통한 신경망 무손실 압축

초록

본 논문에서는 1억 3,500만 개의 매개변수를 가진 변환기 언어 모델(SmolLM2-135M)과 경량 온라인 예측기 앙상블, 32비트 산술 코더를 결합한 무손실 압축 시스템인 Nacrith를 제안한다. 기본적인 LLM-산술-코딩 패러다임을 넘어 Nacrith는 다음과 같은 여러 기여를 도입한다: (1) CDF 정밀도를 2^16에서 2^24로 향상시켜 대규모 어휘 사전에서 최소 확률 기준값으로 인한 양자화 오버헤드의 약 75%를 제거; (2) 빠른 지역 예측을 위한 토큰 수준 N-gram 모델; (3) 온라인 경사 하강법을 통해 문서별 LLM 오류를 보정하는 적응형 로그 공간 편향 헤드; (4) 예측 가능성이 높은 토큰 가속화를 위한 신뢰도 기반 LLM 스킵; (5) 신경망 기반 압축을 임의의 이진 파일로 확장하는 하이브리드 이진 형식(NC06)—저자들이 아는 한 LLM 기반 압축기 중 최초; (6) PyTorch 대비 단일 토큰 디코딩 속도 약 7배 향상을 달성한 llama.cpp 추론 백엔드; (7) 최대 8개 작업자에 걸친 병렬 다중 GPU 압축; (8) 슬라이드당 비용을 약 37배 절감하는 기본 KV 캐시 슬라이딩 윈도우. 본 시스템은 소비자용 GPU에서 실행 시 약 500MB의 GGUF 가중치와 작업자당 약 1.2GB의 VRAM만을 요구한다. Canterbury Corpus의 alice29.txt(152KB)에서 Nacrith는 바이트당 0.918비트(bpb)를 달성하며, gzip 대비 3.1배, bzip2 대비 2.5배, CMIX v21 대비 44%, ts_zip 대비 20% 우수한 성능을 보였고, 0차, 1차, 2차 바이트 수준 섀넌 엔트로피 하한선 아래로 압축하였다. enwik8(100MB)에서는 0.9389bpb(11.74%)를 달성하여 ts_zip(약 1.11bpb) 대비 15%, FineZip(1.024bpb) 대비 8% 우수한 성능을 보였으며, 이는 미세 조정 없이 60배 더 작은 모델을 사용했음에도 불구한 결과이다. 모델 학습 종료 시점 이후에 발간된 문서에 대한 분포 외 평가에서 0.723bpb를 달성하여 이러한 성능 향상이 단순한 암기 효과가 아님을 확인하였다.

English

We present Nacrith, a lossless compression system that combines a 135M-parameter transformer language model (SmolLM2-135M) with an ensemble of lightweight online predictors and a 32-bit arithmetic coder. Beyond the base LLM-plus-arithmetic-coding paradigm, Nacrith introduces several contributions: (1) a CDF precision upgrade from 2^16 to 2^24 that eliminates ~75% of quantization overhead caused by minimum-probability floors in large vocabularies; (2) a token-level N-gram model for fast local predictions; (3) an adaptive log-space bias head correcting per-document LLM errors via online gradient descent; (4) confidence-based LLM skip for accelerating highly predictable tokens; (5) a hybrid binary format (NC06) extending neural compression to arbitrary binary files--to our knowledge a first among LLM-based compressors; (6) a llama.cpp inference backend achieving ~7x faster single-token decode than PyTorch; (7) parallel multi-GPU compression across up to 8 workers; and (8) native KV cache sliding window reducing per-slide cost by ~37x. The system requires only ~500 MB of GGUF weights and ~1.2 GB VRAM per worker, running on consumer GPUs. On alice29.txt (Canterbury Corpus, 152 KB), Nacrith achieves 0.918 bits per byte (bpb)--outperforming gzip by 3.1x, bzip2 by 2.5x, CMIX v21 by 44%, and ts_zip by 20%, while compressing below the 0th-, 1st-, and 2nd-order byte-level Shannon entropy bounds. On enwik8 (100 MB), Nacrith achieves 0.9389 bpb (11.74%), surpassing ts_zip (~1.11 bpb) by 15% and FineZip (1.024 bpb) by 8% despite using a 60x smaller model with no fine-tuning. An out-of-distribution evaluation on a document published after the model's training cutoff confirms these gains are not memorization artifacts, achieving 0.723 bpb on unseen text.

Nacrith: 앙상블 컨텍스트 모델링 및 고정밀 CDF 코딩을 통한 신경망 무손실 압축

Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding

초록

Support