PRISM: Prior Rectification and Uncertainty-Aware Structure Modeling for Diffusion-Based Text Image Super-Resolution

Abstract

Text image super-resolution (Text-SR) requires more than visually plausible detail synthesis: slight errors in stroke topology can alter character identity and destroy readability. Existing methods improve text fidelity with stronger recognition-based or generative priors, yet two challenges remain unresolved under severe degradation. First, the text condition extracted from a low-quality input can itself be unreliable. Second, a plausible global prior does not fully determine fine-grained stroke boundaries. We present PRISM, a single-step diffusion-based Text-SR framework that addresses both challenges through Flow-Matching Prior Rectification (FMPR) and a Structure-guided Uncertainty-aware Residual Encoder (SURE). FMPR constructs a privileged training-time prior from paired low-quality/high-quality latents and learns a flow-matching transport that moves degraded embeddings toward this restoration-oriented prior space, yielding more accurate and reliable global text guidance. SURE further predicts uncertainty-aware structural residuals that selectively absorb reliable local boundary evidence while suppressing ambiguous stroke cues. Together, these components enable explicit global prior rectification and local structure refinement within a single diffusion restoration pass. Experiments on both synthetic and real-world benchmarks show that PRISM achieves state-of-the-art performance with millisecond-level inference. Our dataset and code will be available at https://github.com/faithxuz/PRISM.
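To make the two ideas in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a linear (rectified-flow style) interpolation path between a degraded embedding and the privileged prior embedding, a plain Euler integrator for inference, and a sigmoid gate driven by a predicted log-variance for the uncertainty-aware residual. All function names (`flow_matching_pair`, `flow_matching_loss`, `rectify`, `uncertainty_gated_residual`) and the specific path and gate are hypothetical choices for illustration; the abstract does not specify them.

```python
import numpy as np


def flow_matching_pair(z_degraded, z_prior, t):
    """Build a training sample for flow matching (assumed linear path).

    Returns the interpolated point z_t and the target velocity that a
    network would be trained to predict at (z_t, t).
    """
    z_degraded = np.asarray(z_degraded, dtype=float)
    z_prior = np.asarray(z_prior, dtype=float)
    t = np.asarray(t, dtype=float).reshape(-1, 1)  # (B, 1) broadcasts over (B, D)
    z_t = (1.0 - t) * z_degraded + t * z_prior
    v_target = z_prior - z_degraded  # constant along a straight path
    return z_t, v_target


def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((np.asarray(v_pred) - np.asarray(v_target)) ** 2))


def rectify(z_degraded, velocity_fn, num_steps=4):
    """Euler-integrate dz/dt = velocity_fn(z, t) from t=0 to t=1."""
    z = np.array(z_degraded, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        z = z + dt * velocity_fn(z, k * dt)
    return z


def uncertainty_gated_residual(features, residual, log_var):
    """Add a residual scaled by a confidence gate (assumed sigmoid(-log_var)).

    High predicted uncertainty (large log_var) pushes the gate toward 0,
    so ambiguous residuals are suppressed; low uncertainty lets them through.
    """
    gate = 1.0 / (1.0 + np.exp(np.asarray(log_var, dtype=float)))
    return np.asarray(features, dtype=float) + gate * np.asarray(residual, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_lq = rng.normal(size=(2, 8))   # stand-in for degraded embeddings
    z_hq = rng.normal(size=(2, 8))   # stand-in for privileged prior embeddings
    z_t, v = flow_matching_pair(z_lq, z_hq, t=rng.uniform(size=2))
    # An oracle velocity field recovers the prior embedding.
    z_rect = rectify(z_lq, lambda z, t: z_hq - z_lq)
    print(np.abs(z_rect - z_hq).max())
```

In this sketch the privileged prior is only needed to build training targets; at inference, `rectify` would call a learned velocity network in place of the oracle lambda, which mirrors the training-time-only role the abstract assigns to the paired high-quality latents.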