FRAPPE: Full Input, Residual Output Autoencoding with Projection Pursuit Encoder

May 27, 2026
Authors: Dan Jacobellis, Neeraja J. Yadwadkar
cs.AI

Abstract

Media compression standards have plateaued in their rate-distortion-complexity trade-off, limiting the ability to offload expensive AI perception to the cloud in applications like robotics, wearables, and remote sensing. DNN-based codecs improve compression efficiency, but at a cost: they cannot easily adapt to large changes in available bitrate, and real-time encoding requires expensive, power-hungry GPUs, which rules out low-cost or resource-constrained platforms. To address these limitations, we propose a novel autoencoding framework (FRAPPE) that uses the Full input to predict the Residual output via a Projection Pursuit Encoder. FRAPPE's encoding objective naturally sorts latent channels by importance, allowing zero-overhead variable-rate coding. Unlike RNN-based learned codecs, whose encoder consumes the previous reconstruction's residual, or RVQ-style codecs, whose codebooks must be applied sequentially, FRAPPE's analysis path is an embarrassingly parallel DAG of independent input projections. Using FRAPPE, we build a variable-rate RGB image codec (FRAPPE-Image) and evaluate its rate-distortion-complexity trade-off against standard image codecs. At high compression ratios (approx. 0.1 bpp), FRAPPE-Image provides higher perceptual quality than AVIF while encoding 47 times faster, making it capable of real-time 1080p, 30fps CPU-only encoding. Our code and pre-trained models are available at https://github.com/UT-SysML/FRAPPE.
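
The two structural claims in the abstract (latent channels ordered by importance, and an encoder made of independent projections of the full input) can be illustrated with a small linear toy. The sketch below is not the authors' method or code: it assumes a purely linear model in which each channel's encoder vector is fitted greedily so that its projection of the full input best explains the residual left by earlier channels. In this linear setting that reduces to principal component analysis. All names (`fit_greedy_projections`, `W`, `D`, `errors`) are hypothetical.

```python
import numpy as np


def fit_greedy_projections(X, num_channels):
    """Toy linear analogue: fit one (encoder, decoder) vector pair per channel.

    Each stage targets the residual left by earlier stages, but the latent
    code is always computed from the full input X. Assumes num_channels does
    not exceed the rank of X.
    """
    residual = X.copy()
    enc, dec = [], []
    for _ in range(num_channels):
        # Best rank-1 fit to the current residual: top right singular vector.
        _, _, vt = np.linalg.svd(residual, full_matrices=False)
        v = vt[0]
        enc.append(v)
        dec.append(v)
        # Code for this channel is a projection of the full input.
        z = X @ v
        # Subtract this channel's contribution from the target residual.
        residual = residual - np.outer(z, v)
    W = np.stack(enc, axis=1)  # (features, channels): encoder projections
    D = np.stack(dec, axis=0)  # (channels, features): decoder rows
    return W, D


rng = np.random.default_rng(0)
# Synthetic correlated data standing in for image patches (assumption).
X = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 8))

W, D = fit_greedy_projections(X, num_channels=8)

# Encoding: every channel is an independent projection of the input,
# so all channels are computed in one matrix product with no feedback loop.
Z = X @ W

# Variable rate: keep only the first k channels and decode from those.
errors = []
for k in range(Z.shape[1] + 1):
    X_hat = Z[:, :k] @ D[:k]
    errors.append(float(np.mean((X - X_hat) ** 2)))

for k, err in enumerate(errors):
    print(f"channels kept: {k}  mean squared error: {err:.6f}")
```

Because the channels come out in order of decreasing contribution, lowering the bitrate in this toy amounts to truncating the latent vector, with no retraining and no side information. A nonlinear codec such as the one the abstract describes would presumably replace the linear projections with learned ones; that detail is not given in the abstract and is not modeled here.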