Proofread: Fixes All Errors with One Tap

Abstract

The impressive capabilities of Large Language Models (LLMs) provide a powerful way to reimagine users' typing experience. This paper demonstrates Proofread, a novel Gboard feature powered by a server-side LLM that enables seamless sentence-level and paragraph-level corrections with a single tap. We describe the complete system, from data generation and metrics design to model tuning and deployment. To obtain models of sufficient quality, we implement a careful data synthesis pipeline tailored to online use cases, design multifaceted metrics, and employ a two-stage tuning approach to obtain the dedicated LLM for the feature: Supervised Fine-Tuning (SFT) for foundational quality, followed by Reinforcement Learning (RL) tuning for targeted refinement. Specifically, we find that sequential tuning on the rewrite and proofread tasks yields the best quality in the SFT stage, and we propose global and direct rewards in the RL tuning stage to seek further improvement. Extensive experiments on a human-labeled golden set show that our tuned PaLM2-XS model achieves an 85.56% good ratio. We launched the feature to Pixel 8 devices by serving the model on TPU v5 in Google Cloud, with thousands of daily active users. Serving latency was significantly reduced by quantization, bucket inference, text segmentation, and speculative decoding. A demo video is available on YouTube at https://youtu.be/4ZdcuiwFU7I.
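The abstract names bucket inference as one of the latency optimizations but does not describe it. One common reading of the term is sketched here: each input is padded up to the nearest of a few fixed lengths, so that an accelerator-compiled model only ever sees a small set of input shapes. The bucket sizes, the padding id, and the function names are hypothetical and chosen only for illustration; this is not the system's actual implementation.

```python
from bisect import bisect_left

# Hypothetical bucket boundaries in tokens; not taken from the paper.
BUCKETS = (16, 32, 64, 128)
# Hypothetical padding token id.
PAD_ID = 0


def pick_bucket(length, buckets=BUCKETS):
    """Return the smallest bucket length that can hold `length` tokens."""
    i = bisect_left(buckets, length)
    if i == len(buckets):
        raise ValueError(
            f"input of {length} tokens exceeds largest bucket {buckets[-1]}"
        )
    return buckets[i]


def pad_to_bucket(token_ids, buckets=BUCKETS, pad_id=PAD_ID):
    """Right-pad `token_ids` to its bucket length so the shape is one of a fixed set."""
    target = pick_bucket(len(token_ids), buckets)
    return list(token_ids) + [pad_id] * (target - len(token_ids))
```

Under these assumptions, a 20-token input is padded to 32 tokens, and an input longer than the largest bucket is rejected, which is where a step such as text segmentation could split it into shorter pieces first.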
