SEAOTTER: Sensor Embedded Autoencoder with One-Time Transcode for Efficient Reconstruction

Abstract

In robotics systems, vast amounts of visual data are easily captured at high resolution using low-cost, low-power hardware. Yet limited bandwidth and on-device compute resources prevent this data from being fully utilized when it is transmitted via conventional codecs like JPEG/MPEG. Newer codecs, like AV1/AVIF, improve the rate-distortion trade-off but demand far more resources for encoding, which makes them impractical without custom ASICs. Recent asymmetric autoencoders deliver high quality under extreme power and bandwidth constraints, but they add prohibitive decoding cost and use bespoke formats that ignore decades of infrastructure built around standards like JPEG. To address these limitations, we introduce a compression framework for cloud robotics based on a Sensor Embedded Autoencoder paired with a One-Time Transcode for Efficient Reconstruction (SEAOTTER). Because the sensor, cloud, and consumer stages face very different power and bandwidth budgets, SEAOTTER combines the compactness of a learned latent with the broad usability of a standard JPEG file. Since naive transcoding degrades performance, we propose a learnable JPEG color and quantization transform that increases accuracy for global, dense, and vision-language-based perception. Using SEAOTTER, we train both general-purpose and task-aware transcoding pipelines for a pre-trained, frozen encoder. At a compression ratio of 200:1, compared to AVIF, we observe 7 times faster encoding, 3.5 times faster decoding, and +8% ImageNet top-1 accuracy, while retaining compatibility with JPEG infrastructure. Our code is available at https://github.com/UT-SysML/seaotter.
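
To make the phrase "JPEG color and quantization transform" concrete, the following is a minimal sketch of the two stages of a standard baseline JPEG encoder that the abstract describes as learnable: the 3x3 color matrix and the 8x8 quantization table. It is not the SEAOTTER implementation. The function names, the use of the JFIF matrix as a starting value, and the omission of chroma offsets, level shifting, and entropy coding are all assumptions made for illustration; in a learned variant, the matrix and table entries would presumably be trained rather than fixed.

```python
import math

N = 8  # JPEG operates on 8x8 blocks

# Standard JFIF RGB -> YCbCr weights (chroma offset of 128 omitted).
# Assumption for illustration: a learnable variant would treat these
# nine weights as trainable parameters instead of constants.
JFIF_COLOR = [
    [0.299, 0.587, 0.114],
    [-0.168736, -0.331264, 0.5],
    [0.5, -0.418688, -0.081312],
]


def color_transform(rgb, matrix=JFIF_COLOR):
    """Map one (r, g, b) pixel to three channels with a 3x3 matrix."""
    return [sum(w * c for w, c in zip(row, rgb)) for row in matrix]


def dct_2d(block):
    """Orthonormal 8x8 DCT-II of a block given as a list of 8 rows."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, table):
    """Divide each DCT coefficient by its table entry and round.

    Assumption for illustration: `table` is the 8x8 quantization table
    that a learnable variant would optimize; larger entries discard
    more detail and shrink the file.
    """
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


if __name__ == "__main__":
    flat_block = [[16.0] * N for _ in range(N)]   # constant-valued block
    uniform_table = [[16] * N for _ in range(N)]  # hypothetical table
    quantized = quantize(dct_2d(flat_block), uniform_table)
    print(quantized[0])  # only the first (DC) entry is non-zero
```

Because both stages are stored in a standard JPEG file (the quantization table in the file header, the color matrix implied by the color space), changing their values does not by itself break compatibility with existing decoders, which is consistent with the compatibility claim in the abstract.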