FRAPPE: Full-Input, Residual-Output Autoencoding with a Projection Pursuit Encoder

Abstract

Media compression standards have reached a plateau in the rate-distortion-complexity trade-off, which limits the ability to offload expensive AI perception to the cloud in applications such as robotics, wearables, and remote sensing. DNN-based codecs improve compression efficiency, but at a cost: they cannot easily adapt to large changes in available bitrate, and real-time encoding requires expensive, power-hungry GPUs, which rules out low-cost or resource-constrained platforms. To address these limitations, we propose a novel autoencoding framework, FRAPPE, that uses the Full input to predict the Residual output via a Projection Pursuit Encoder. FRAPPE's encoding objective naturally sorts latent channels by importance, allowing zero-overhead variable-rate coding. Unlike RNN-based learned codecs, whose encoder consumes the residual of the previous reconstruction, or RVQ-style codecs, whose codebooks must be applied sequentially, FRAPPE's analysis path is an embarrassingly parallel DAG of independent input projections. Using FRAPPE, we build a variable-rate RGB image codec, FRAPPE-Image, and evaluate its rate-distortion-complexity trade-off against standard image codecs. At high compression ratios (approximately 0.1 bpp), FRAPPE-Image provides higher perceptual quality than AVIF while encoding 47 times faster, making it capable of real-time, CPU-only encoding of 1080p video at 30 fps. Our code and pre-trained models are available at https://github.com/UT-SysML/FRAPPE.
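To make the full-input, residual-output idea concrete, the following is a minimal linear sketch in NumPy. It is not the FRAPPE architecture or training procedure, which the abstract does not specify. The names `fit_toy_frappe`, `encode`, and `decode`, the synthetic data, and the greedy SVD-based fit are all assumptions made for illustration. Each stage learns one projection of the full input whose output is decoded to predict the residual left by the earlier stages, so channels come out ordered by importance and all of them can be computed with one matrix product at encode time.

```python
import numpy as np

def fit_toy_frappe(X, num_channels):
    """Greedy linear toy: each stage projects the FULL input to predict the current residual."""
    pinv_X = np.linalg.pinv(X)  # least-squares solver, reused by every stage
    residual = X.copy()
    enc_cols, dec_rows = [], []
    for _ in range(num_channels):
        # Decoder direction: dominant right singular vector of the current residual.
        v = np.linalg.svd(residual, full_matrices=False)[2][0]
        # Encoder projection: least-squares map from the full input to the ideal code.
        w = pinv_X @ (residual @ v)
        # Subtract what this channel explains; the next stage targets what is left.
        residual = residual - np.outer(X @ w, v)
        enc_cols.append(w)
        dec_rows.append(v)
    return np.stack(enc_cols, axis=1), np.stack(dec_rows, axis=0)

def encode(X, W):
    # All channels are independent projections of the input: one matmul, no recurrence.
    return X @ W

def decode(Z, V, k):
    # Variable rate by truncation: reconstruct from the first k channels only.
    return Z[:, :k] @ V[:k]

# Hypothetical data: 512 samples, 16 features with decaying scale.
rng = np.random.default_rng(0)
X = rng.standard_normal((512, 16)) * 0.7 ** np.arange(16)

W, V = fit_toy_frappe(X, num_channels=8)
Z = encode(X, W)
errors = [float(np.mean((X - decode(Z, V, k)) ** 2)) for k in range(9)]
for k, err in enumerate(errors):
    print(f"channels kept: {k}  mse: {err:.5f}")
```

In this purely linear setting the greedy fit coincides with uncentered PCA, so the sketch only illustrates the channel ordering and the parallel encode path, not the gains of a learned nonlinear codec. Dropping trailing channels lowers the rate without retraining or side information, which is the sense in which truncation is zero-overhead here.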