LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation
May 7, 2026
Authors: Dan Jacobellis, Neeraja J. Yadwadkar
cs.AI
Abstract
Modern sensors generate rich, high-fidelity data, yet applications running on wearable or remote sensing devices remain constrained by bandwidth and power budgets. Standardized codecs such as JPEG and MPEG achieve efficient trade-offs between bitrate and perceptual quality, but they are designed for human perception, which limits their applicability to machine-perception tasks and to non-traditional modalities such as spatial audio arrays, hyperspectral images, and 3D medical images. General-purpose compression schemes based on scalar quantization or resolution reduction are broadly applicable but fail to exploit inherent signal redundancies, resulting in suboptimal rate-distortion performance. Recent generative neural codecs, or tokenizers, model complex signal dependencies but are often over-parameterized, data-hungry, and modality-specific, making them impractical for resource-constrained environments. We introduce a Lightweight, Versatile, and Asymmetric neural codec architecture (LiVeAction) that addresses these limitations through two key ideas. (1) To reduce encoder complexity and meet the resource constraints of the execution environment, we impose an FFT-like structure and reduce the overall size and depth of the neural-network-based analysis transform. (2) To support arbitrary signal modalities and simplify training, we replace adversarial and perceptual losses with a variance-based rate penalty. Our design produces codecs that deliver superior rate-distortion performance compared to state-of-the-art generative tokenizers, while remaining practical for deployment on low-power sensors. We release our code, experiments, and Python library at https://github.com/UT-SysML/liveaction.
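The abstract does not state the exact form of the variance-based rate penalty. As a rough illustration of the idea only, the sketch below assumes a Gaussian-style proxy in which each latent channel contributes 0.5 * log2(1 + variance) bits, added to a mean-squared-error distortion term. The function names (`variance_rate_penalty`, `training_loss`), the weight `lam`, and the formula itself are assumptions made for this example; they are not taken from the LiVeAction paper or its Python library.

```python
import math
from statistics import pvariance


def variance_rate_penalty(latents):
    """Hypothetical rate proxy: sum over channels of 0.5 * log2(1 + variance).

    `latents` is a list of channels, each a list of floats. A channel with
    zero variance carries no information and contributes zero bits.
    """
    total_bits = 0.0
    for channel in latents:
        var = pvariance(channel)  # population variance of this channel
        total_bits += 0.5 * math.log2(1.0 + var)
    return total_bits


def training_loss(original, reconstruction, latents, lam=0.01):
    """Assumed objective: MSE distortion plus lam times the rate proxy.

    No adversarial or perceptual term appears, so nothing in the loss
    depends on the signal modality.
    """
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)
    return mse + lam * variance_rate_penalty(latents)


# Two latent channels: one with unit variance, one constant.
latents = [[1.0, -1.0], [0.0, 0.0]]
rate = variance_rate_penalty(latents)
loss = training_loss([1.0, 2.0], [1.0, 2.0], latents, lam=0.1)
```

Because a penalty of this kind depends only on latent statistics, the same objective could in principle be applied to audio, images, or volumetric data without a modality-specific discriminator or perceptual network, which is the property the abstract highlights.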