Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation

Abstract

Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates keywords and summaries as anchors to support review of and interaction with spoken text. LLM-assisted macro revisions allow users to respeak, split, merge, and transform dictated text without specifying precise editing locations. Together, they pave the way for interactive dictation and revision that help close gaps between spontaneous spoken words and well-structured writing. In a comparative study with 12 participants performing verbal composition tasks, Rambler outperformed the baseline of a speech-to-text editor + ChatGPT, as it better facilitated iterative revisions with enhanced user control over the content while supporting surprisingly diverse user strategies.
