SEAOTTER: Sensor Embedded Autoencoder with One-Time Transcode for Efficient Reconstruction

Abstract

In robotics systems, vast amounts of visual data are easily captured at high resolution using low-cost, low-power hardware. Yet limited bandwidth and on-device compute prevent this data from being fully utilized when it is transmitted with conventional codecs like JPEG/MPEG. Newer codecs, like AV1/AVIF, improve the rate-distortion trade-off but demand far more resources for encoding, making them impractical without custom ASICs. Recent asymmetric autoencoders deliver high quality under extreme power and bandwidth constraints, but they incur prohibitive decoding costs and use bespoke formats that ignore decades of infrastructure built around standards like JPEG. To address these limitations, we introduce a compression framework for cloud robotics based on a Sensor Embedded Autoencoder paired with a One-Time Transcode for Efficient Reconstruction (SEAOTTER). Because the sensor, cloud, and consumer stages face very different power and bandwidth budgets, SEAOTTER combines the compactness of a learned latent with the broad usability of a standard JPEG file. Since naive transcoding degrades performance, we propose a learnable JPEG color and quantization transform that increases accuracy for global, dense, and vision-language-based perception. Using SEAOTTER, we train both general-purpose and task-aware transcoding pipelines for a pre-trained, frozen encoder. At a compression ratio of 200:1, compared to AVIF, we observe 7× faster encoding, 3.5× faster decoding, and +8% ImageNet top-1 accuracy, while retaining compatibility with JPEG infrastructure. Our code is available at https://github.com/UT-SysML/seaotter.
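The abstract does not say how a learned quantization transform stays compatible with standard JPEG decoders. One plausible reading, which is our own assumption and not the authors' implementation, is that learned step sizes end up in an ordinary JPEG quantization table. The minimal sketch below rounds and clamps an 8×8 table of learned, real-valued step sizes into the 1 to 255 range that baseline JPEG allows for 8-bit table entries, then applies standard JPEG quantization and dequantization to one block of DCT coefficients. The names `to_baseline_qtable`, `quantize`, and `dequantize` and the example values are hypothetical.

```python
"""Hypothetical sketch: making a learned quantization table JPEG-compatible.

Assumption (not from the SEAOTTER paper): step sizes are learned as
positive real numbers and must be rounded into a baseline JPEG
quantization table before the file is written.
"""
from typing import List

Block = List[List[float]]
QTable = List[List[int]]

# Baseline JPEG stores 8-bit quantization table entries in the range 1..255.
BASELINE_MIN, BASELINE_MAX = 1, 255


def to_baseline_qtable(learned: Block) -> QTable:
    """Round and clamp an 8x8 table of learned step sizes to the baseline range."""
    if len(learned) != 8 or any(len(row) != 8 for row in learned):
        raise ValueError("quantization table must be 8x8")
    return [
        [min(BASELINE_MAX, max(BASELINE_MIN, round(q))) for q in row]
        for row in learned
    ]


def quantize(coeffs: Block, qtable: QTable) -> List[List[int]]:
    """Encoder side: divide each DCT coefficient by its step size and round.

    Note: Python's round() rounds halves to even; real encoders may differ.
    """
    return [
        [round(c / q) for c, q in zip(crow, qrow)]
        for crow, qrow in zip(coeffs, qtable)
    ]


def dequantize(levels: List[List[int]], qtable: QTable) -> List[List[int]]:
    """Decoder side: multiply the integer levels back by the step sizes."""
    return [
        [lvl * q for lvl, q in zip(lrow, qrow)]
        for lrow, qrow in zip(levels, qtable)
    ]


# Usage with made-up values: step sizes grow with spatial frequency (r + c).
learned = [[0.3 + 40.0 * (r + c) for c in range(8)] for r in range(8)]
qtable = to_baseline_qtable(learned)   # 0.3 -> clamped to 1, 560.3 -> clamped to 255
coeffs = [[90.0] * 8 for _ in range(8)]
levels = quantize(coeffs, qtable)
recon = dequantize(levels, qtable)
```

Because the projected table holds ordinary 8-bit entries, it can be written into the file like any other quantization table, so a stock decoder needs no knowledge of how the table was learned. How the step sizes are optimized during training is outside this sketch.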