GlyphPrinter：面向字形精准视觉文本渲染的区域分组直接偏好优化

摘要

生成準確的字形以實現視覺化文字渲染至關重要卻充滿挑戰。現有方法通常通過大量高質量場景文字圖像進行訓練來提升渲染效果，但字形變體覆蓋範圍有限與過度風格化往往會損害字形準確性，尤其對複雜或領域外字符更為明顯。部分方法採用強化學習緩解此問題，但其獎勵模型通常依賴對細粒度字形誤差不敏感的文字識別系統，導致含錯誤字元的圖像仍可能獲得高獎勵。受直接偏好優化（DPO）啟發，我們提出基於偏好的文字渲染方法GlyphPrinter，無需顯式獎勵模型。然而標準DPO目標僅建模樣本間的整體偏好關係，對於字形錯誤常集中於局部區域的視覺文字渲染而言並不充分。為此，我們構建帶區域級字形偏好標註的GlyphCorrector數據集，提出區域分組DPO（R-GDPO），通過基於標註區域的樣本間與樣本內偏好優化顯著提升字形準確性。此外，我們引入區域獎勵引導推理策略，從具有可控字形準確度的最優分佈中採樣。大量實驗表明，GlyphPrinter在保持風格化與精確度良好平衡的同時，字形準確性優於現有方法。

English

Generating accurate glyphs for visual text rendering is essential yet challenging. Existing methods typically enhance text rendering by training on a large amount of high-quality scene text images, but the limited coverage of glyph variations and excessive stylization often compromise glyph accuracy, especially for complex or out-of-domain characters. Some methods leverage reinforcement learning to alleviate this issue, yet their reward models usually depend on text recognition systems that are insensitive to fine-grained glyph errors, so images with incorrect glyphs may still receive high rewards. Inspired by Direct Preference Optimization (DPO), we propose GlyphPrinter, a preference-based text rendering method that eliminates reliance on explicit reward models. However, the standard DPO objective only models overall preference between two samples, which is insufficient for visual text rendering where glyph errors typically occur in localized regions. To address this issue, we construct the GlyphCorrector dataset with region-level glyph preference annotations and propose Region-Grouped DPO (R-GDPO), a region-based objective that optimizes inter- and intra-sample preferences over annotated regions, substantially enhancing glyph accuracy. Furthermore, we introduce Regional Reward Guidance, an inference strategy that samples from an optimal distribution with controllable glyph accuracy. Extensive experiments demonstrate that the proposed GlyphPrinter outperforms existing methods in glyph accuracy while maintaining a favorable balance between stylization and precision.

GlyphPrinter：面向字形精准视觉文本渲染的区域分组直接偏好优化

GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering

摘要

Support