ReALM: Reference Resolution As Language Modeling

Abstract

Reference resolution is an important problem, one that is essential for understanding and successfully handling context of different kinds. This context includes both previous conversational turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, they remain underutilized for reference resolution, particularly for non-conversational entities. This paper demonstrates how LLMs can be used to create an extremely effective system for resolving references of various types, by showing how reference resolution can be converted into a language modeling problem, even though it involves forms of entities, such as those on screen, that are not traditionally conducive to being reduced to a text-only modality. We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for on-screen references. We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it.