PRISM: Prior Rectification and Uncertainty-Aware Structure Modeling for Diffusion-Based Text Image Super-Resolution

Abstract

Text image super-resolution (Text-SR) requires more than visually plausible detail synthesis: slight errors in stroke topology can alter character identity and break readability. Existing methods improve text fidelity with stronger recognition-based or generative priors, yet they still face two unresolved challenges under severe degradation. First, the text condition extracted from a low-quality input can itself be unreliable. Second, a plausible global prior does not fully determine fine-grained stroke boundaries. We present PRISM, a single-step diffusion-based Text-SR framework that addresses both challenges through Flow-Matching Prior Rectification (FMPR) and a Structure-guided Uncertainty-aware Residual Encoder (SURE). FMPR constructs a privileged training-time prior from paired low-quality/high-quality latents and learns, via flow matching, to transport degraded embeddings toward this restoration-oriented prior space, yielding more accurate and reliable global text guidance. SURE further predicts uncertainty-aware structural residuals that selectively absorb reliable local boundary evidence while suppressing ambiguous stroke cues. Together, these components enable explicit global prior rectification and local structure refinement within a single diffusion restoration pass. Experiments on both synthetic and real-world benchmarks show that PRISM achieves state-of-the-art performance with millisecond-level inference. Our dataset and code will be available at https://github.com/faithxuz/PRISM.
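
To make the idea of transporting a degraded embedding toward a prior space more concrete, the following is a minimal sketch of a generic flow-matching objective on a straight-line path, written in plain Python. It is not the FMPR implementation: the linear interpolation path, the function names (`interpolate`, `target_velocity`, `fm_loss`, `euler_transport`), the toy three-dimensional embeddings, and the `oracle` velocity field are all assumptions made for illustration, and the abstract does not specify the path or network actually used.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def fm_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    tgt = target_velocity(x0, x1)
    return sum((p - g) ** 2 for p, g in zip(pred, tgt)) / len(tgt)


def euler_transport(velocity_fn, x0, steps=1):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with explicit Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy data (hypothetical): a degraded embedding and its privileged prior.
degraded = [0.2, -1.0, 0.5]
prior = [1.0, 0.0, -0.5]


def oracle(x, t):
    """Stand-in for a trained network: returns the exact path velocity."""
    return target_velocity(degraded, prior)


# With a perfect velocity field, a single Euler step lands on the prior.
rectified = euler_transport(oracle, degraded, steps=1)
```

In practice the velocity field would be a trained network that only sees the degraded embedding at inference time, with the privileged prior available only during training, as the abstract describes. On a straight path with an accurate velocity estimate, a single Euler step already reaches the target, which is consistent with single-step restoration but is shown here only as a property of this toy setup.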