FRAPPE: Full Input, Residual Output Autoencoding with Projection Pursuit Encoder

May 27, 2026
Authors: Dan Jacobellis, Neeraja J. Yadwadkar
cs.AI

Abstract

Media compression standards have reached a plateau in the rate-distortion-complexity trade-off, limiting the ability to offload expensive AI perception to the cloud in applications like robotics, wearables, and remote sensing. DNN-based codecs improve compression efficiency, but at a cost: they cannot easily adapt to large changes in available bitrate, and real-time encoding requires expensive, power-hungry GPUs, which rules out low-cost or resource-constrained platforms. To address these limitations, we propose a novel autoencoding framework (FRAPPE) that uses the Full input to predict the Residual output via a Projection Pursuit Encoder. FRAPPE's encoding objective naturally sorts latent channels by importance, allowing zero-overhead variable-rate coding. Unlike RNN-based learned codecs, whose encoder consumes the previous reconstruction's residual, or RVQ-style codecs, whose codebooks must be applied sequentially, FRAPPE's analysis path is an embarrassingly parallel DAG of independent input projections. Using FRAPPE, we build a variable-rate RGB image codec (FRAPPE-Image) and evaluate its rate-distortion-complexity trade-off against standard image codecs. At high compression ratios (approximately 0.1 bpp), FRAPPE-Image provides higher perceptual quality than AVIF while encoding 47 times faster, making it capable of real-time 1080p, 30 fps CPU-only encoding. Our code and pre-trained models are available at https://github.com/UT-SysML/FRAPPE .
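
To make the idea of importance-ordered latent channels concrete, the following is a minimal toy sketch, not the FRAPPE encoder itself. It assumes a fixed orthonormal DCT-II basis on a short 1-D signal as a stand-in for the learned projections. The names `dct_basis`, `encode`, and `decode` are hypothetical and do not come from the FRAPPE repository. Each latent channel is computed from the full input independently of the others, and a decoder that keeps only the first k channels produces a coarser reconstruction, so the rate can be lowered by truncating the code.

```python
import math


def dct_basis(n):
    """Orthonormal DCT-II basis (assumed stand-in for learned projections).

    Row k is the k-th projection vector of length n.
    """
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append(
            [scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)]
        )
    return rows


def encode(x, basis):
    """Project the full input onto every basis vector.

    Each channel depends only on x, never on another channel or on a
    previous reconstruction, so the projections could run in parallel.
    """
    return [sum(b_i * x_i for b_i, x_i in zip(b, x)) for b in basis]


def decode(z, basis, num_channels):
    """Reconstruct from the first num_channels latents only.

    Each kept channel adds a correction on top of the coarser estimate.
    """
    x_hat = [0.0] * len(basis[0])
    for z_k, b in zip(z[:num_channels], basis):
        x_hat = [acc + z_k * b_i for acc, b_i in zip(x_hat, b)]
    return x_hat


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b)) / len(a)


# Hypothetical smooth test signal; any list of floats works.
signal = [10.0, 11.0, 13.0, 16.0, 20.0, 25.0, 31.0, 38.0]
basis = dct_basis(len(signal))
latents = encode(signal, basis)

# Reconstruction error when keeping 0, 1, ..., n channels.
errors = [
    mse(signal, decode(latents, basis, k)) for k in range(len(signal) + 1)
]

for k, err in enumerate(errors):
    print(f"channels kept: {k}  mse: {err:.6f}")
```

Because the basis in this toy is orthonormal, the error cannot increase as more channels are kept, and it reaches zero (up to floating-point rounding) when all channels are used. According to the abstract, FRAPPE obtains the channel ordering from its encoding objective; the fixed basis used here only illustrates the truncation behaviour.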