Proofread: Fixes All Errors with One Tap

Abstract

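The impressive capabilities of Large Language Models (LLMs) provide a powerful way to reimagine users' typing experience. This paper presents Proofread, a novel Gboard feature powered by a server-side LLM that enables seamless sentence-level and paragraph-level corrections with a single tap. We describe the complete system, from data generation and metrics design to model tuning and deployment. To obtain models of sufficient quality, we implement a careful data synthesis pipeline tailored to online use cases, design multifaceted metrics, and employ a two-stage tuning approach to obtain the dedicated LLM for the feature: Supervised Fine-Tuning (SFT) for foundational quality, followed by Reinforcement Learning (RL) tuning for targeted refinement. Specifically, we find that sequential tuning on the Rewrite and Proofread tasks yields the best quality in the SFT stage, and we propose global and direct rewards in the RL tuning stage to seek further improvement. Extensive experiments on a human-labeled golden set show that our tuned PaLM2-XS model achieves an 85.56% good ratio. We launched the feature on Pixel 8 devices by serving the model on TPU v5 in Google Cloud, with thousands of daily active users. Serving latency was significantly reduced by quantization, bucket inference, text segmentation, and speculative decoding. A demo is available at https://youtu.be/4ZdcuiwFU7I.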
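The abstract names text segmentation and bucket inference as latency optimizations but does not say how they work. The sketch below shows one common reading, offered as an illustration and not as the deployed system: text is split into paragraphs that can be processed independently, and each input is padded to the smallest of a few fixed lengths so the accelerator sees a small set of static shapes. The bucket sizes, the whitespace tokenization, and the names `split_paragraphs`, `pick_bucket`, and `pad_to_bucket` are all assumptions made for this example.

```python
from bisect import bisect_left
from typing import List

# Hypothetical bucket lengths in tokens; the abstract gives no real values.
BUCKETS = [16, 32, 64, 128]


def split_paragraphs(text: str) -> List[str]:
    """Split text on line breaks and drop empty segments."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def pick_bucket(length: int, buckets: List[int] = BUCKETS) -> int:
    """Return the smallest bucket that fits `length`; raise if none does."""
    i = bisect_left(buckets, length)
    if i == len(buckets):
        raise ValueError(f"input of length {length} exceeds the largest bucket")
    return buckets[i]


def pad_to_bucket(tokens: List[str], pad: str = "<pad>") -> List[str]:
    """Right-pad a token list to the length of its bucket."""
    size = pick_bucket(len(tokens))
    return tokens + [pad] * (size - len(tokens))


if __name__ == "__main__":
    text = "i has a apple\n\nshe go to school yesterday"
    for paragraph in split_paragraphs(text):
        tokens = paragraph.split()  # stand-in for a real tokenizer
        print(len(tokens), "->", len(pad_to_bucket(tokens)))
```

Under these assumptions, padding to a handful of lengths trades a little wasted computation for fewer distinct input shapes, and segmenting by paragraph keeps each request short enough to fit a small bucket.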

