Dedelayed: Deleting remote inference delay via on-device correction
October 15, 2025
Authors: Dan Jacobellis, Mateen Ulhaq, Fabien Racapé, Hyomin Choi, Neeraja J. Yadwadkar
cs.AI
Abstract
Remote inference allows lightweight devices to leverage powerful cloud
models. However, communication network latency makes predictions stale and
unsuitable for real-time tasks. To address this, we introduce Dedelayed, a
delay-corrective method that mitigates arbitrary remote inference delays,
allowing the local device to produce low-latency outputs in real time. Our
method employs a lightweight local model that processes the current frame and
fuses in features that a heavyweight remote model computes from past frames. On
video from the BDD100K driving dataset, Dedelayed improves semantic
segmentation accuracy over the stronger of the local-only and remote-only
baselines across all realistic communication network delays beyond 33 ms.
At a round-trip delay of 100 ms, it improves accuracy by 6.4 mIoU over fully
local inference and by 9.8 mIoU over remote inference, without incurring
additional delay. The advantage grows under longer delays and
higher-motion scenes, as delay-mitigated split inference sustains accuracy more
effectively, providing clear advantages for real-time tasks that must remain
aligned with the current world state.
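
The data flow described in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the authors' architecture: `remote_features`, `local_features`, `fuse`, and `run_stream` are hypothetical names, the "models" are trivial arithmetic stand-ins, the delay is measured in whole frames, and element-wise addition stands in for whatever learned fusion the real system uses. The point it shows is that the device emits an output for every frame immediately, combining current-frame local features with the newest remote features that have already arrived, which were computed from an older frame.

```python
from collections import deque


def remote_features(frame):
    # Hypothetical stand-in for the heavyweight remote model.
    return [2.0 * x for x in frame]


def local_features(frame):
    # Hypothetical stand-in for the lightweight on-device model.
    return [x + 1.0 for x in frame]


def fuse(local, remote):
    # Assumption: element-wise addition replaces a learned fusion step.
    return [a + b for a, b in zip(local, remote)]


def run_stream(frames, delay_frames):
    """Return one (time, remote_age, output) tuple per input frame.

    remote_age is how many frames old the fused remote features are,
    or None while no remote result has arrived yet.
    """
    in_flight = deque()  # (arrival_time, source_time, features)
    latest = None        # newest remote result that has arrived
    outputs = []
    for t, frame in enumerate(frames):
        # Send the current frame; its features return after the round trip.
        in_flight.append((t + delay_frames, t, remote_features(frame)))
        # Collect every remote result that has arrived by time t.
        while in_flight and in_flight[0][0] <= t:
            _, src_t, feats = in_flight.popleft()
            latest = (src_t, feats)
        loc = local_features(frame)
        if latest is None:
            # Nothing from the remote side yet: fall back to local only.
            outputs.append((t, None, loc))
        else:
            src_t, feats = latest
            outputs.append((t, t - src_t, fuse(loc, feats)))
    return outputs


if __name__ == "__main__":
    for row in run_stream([[0.0], [1.0], [2.0], [3.0]], delay_frames=2):
        print(row)
```

With a two-frame delay, the first two outputs in this toy run are local-only, and every later output fuses remote features that are exactly two frames old. The output is never blocked waiting on the network, which is the property the abstract refers to as producing low-latency outputs in real time.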