
LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation

May 7, 2026
Authors: Dan Jacobellis, Neeraja J. Yadwadkar
cs.AI

Abstract

Modern sensors generate rich, high-fidelity data, yet applications running on wearable or remote sensing devices remain constrained by bandwidth and power budgets. Standardized codecs such as JPEG and MPEG achieve efficient trade-offs between bitrate and perceptual quality, but they are designed for human perception, which limits their applicability to machine-perception tasks and to non-traditional modalities such as spatial audio arrays, hyperspectral images, and 3D medical images. General-purpose compression schemes based on scalar quantization or resolution reduction are broadly applicable but fail to exploit inherent signal redundancies, resulting in suboptimal rate-distortion performance. Recent generative neural codecs, or tokenizers, model complex signal dependencies but are often over-parameterized, data-hungry, and modality-specific, making them impractical for resource-constrained environments. We introduce a Lightweight, Versatile, and Asymmetric neural codec architecture (LiVeAction) that addresses these limitations through two key ideas. (1) To reduce encoder complexity enough to meet the resource constraints of the execution environment, we impose an FFT-like structure and reduce the overall size and depth of the neural-network-based analysis transform. (2) To allow arbitrary signal modalities and simplify training, we replace adversarial and perceptual losses with a variance-based rate penalty. Our design produces codecs that deliver superior rate-distortion performance compared to state-of-the-art generative tokenizers, while remaining practical for deployment on low-power sensors. We release our code, experiments, and Python library at https://github.com/UT-SysML/liveaction.
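
To make idea (2) concrete, the following is a minimal sketch of what a rate-distortion objective with a variance-based rate penalty could look like. The abstract does not give the exact form of the penalty, so everything here is an assumption made for illustration: the function names `variance_rate_proxy` and `rd_loss`, the log-variance proxy, the quantization step `step`, and the trade-off weight `lam` are hypothetical and are not taken from the LiVeAction code.

```python
import numpy as np


def variance_rate_proxy(latents: np.ndarray, step: float = 1.0) -> float:
    """Illustrative rate proxy computed from per-channel latent variance.

    latents: array of shape (num_samples, num_channels).
    The proxy 0.5 * log2(1 + var / step**2) is an assumed form: it is zero
    for a constant channel and grows with the log of the channel variance.
    """
    var = latents.var(axis=0)  # population variance per channel
    return float(np.sum(0.5 * np.log2(1.0 + var / step**2)))


def rd_loss(x: np.ndarray, x_hat: np.ndarray, latents: np.ndarray,
            lam: float = 0.01) -> float:
    """Distortion (mean squared error) plus a weighted variance-based rate term."""
    distortion = float(np.mean((x - x_hat) ** 2))
    rate = variance_rate_proxy(latents)
    return distortion + lam * rate
```

A penalty of this kind needs only the signal, its reconstruction, and the latent statistics, so it involves no discriminator or pretrained perceptual network. That is consistent with the abstract's stated goal of supporting arbitrary modalities and simplifying training, though the authors' actual loss may differ.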