Teach LLMs to Personalize -- An Approach inspired by Writing Education

Abstract

Personalized text generation is an emerging research area that has attracted much attention in recent years. Most studies in this direction focus on a particular domain by designing bespoke features or models. In this work, we propose a general approach for personalized text generation using large language models (LLMs). Inspired by the practice of writing education, we develop a multistage and multitask framework to teach LLMs for personalized generation. In writing instruction, the task of writing from sources is often decomposed into multiple steps that involve finding, evaluating, summarizing, synthesizing, and integrating information. Analogously, our approach to personalized text generation consists of multiple stages: retrieval, ranking, summarization, synthesis, and generation. In addition, we introduce a multitask setting that helps the model improve its generation ability further, which is inspired by the observation in education that a student's reading proficiency and writing ability are often correlated. We evaluate our approach on three public datasets, each of which covers a different and representative domain. Our results show significant improvements over a variety of baselines.
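The five stages named in the abstract (retrieval, ranking, summarization, synthesis, generation) can be pictured as a simple pipeline. The sketch below is only an illustration under strong assumptions: every function name is hypothetical, term overlap stands in for whatever retriever and ranker the authors actually use, first-sentence extraction stands in for summarization, frequent-term counting stands in for synthesis, and the final stage only assembles a prompt rather than calling an LLM. It is not the authors' implementation.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase word tokens; a deliberately crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(context, past_docs):
    """Retrieval (assumed heuristic): keep past documents sharing a term with the context."""
    query = set(tokenize(context))
    return [doc for doc in past_docs if query & set(tokenize(doc))]


def rank(context, docs, top_k=2):
    """Ranking (assumed heuristic): order documents by term overlap with the context."""
    query = set(tokenize(context))
    ordered = sorted(docs, key=lambda d: len(query & set(tokenize(d))), reverse=True)
    return ordered[:top_k]


def summarize(docs):
    """Summarization (stand-in): keep only the first sentence of each ranked document."""
    return [doc.split(".")[0].strip() + "." for doc in docs]


def synthesize(docs, n_terms=3):
    """Synthesis (stand-in): most frequent terms longer than four characters."""
    counts = Counter(t for doc in docs for t in tokenize(doc) if len(t) > 4)
    return [term for term, _ in counts.most_common(n_terms)]


def build_prompt(context, past_docs):
    """Generation input: combine the stage outputs into one prompt for an LLM."""
    ranked = rank(context, retrieve(context, past_docs))
    summary = " ".join(summarize(ranked))
    key_terms = ", ".join(synthesize(ranked))
    return (
        f"Summary of past writing: {summary}\n"
        f"Key terms: {key_terms}\n"
        f"Ranked entries: {' | '.join(ranked)}\n"
        f"Continue this text: {context}"
    )
```

Calling `build_prompt(context, past_docs)` with a user's current context and a list of their past documents returns a single string that could be passed to any text generator. Each stage is a separate function so that one stand-in can be swapped for a learned component without touching the others.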
