StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

Abstract

Structured data sources, such as tables, graphs, and databases, are ubiquitous knowledge sources. Despite the demonstrated capabilities of large language models (LLMs) on plain text, their proficiency in interpreting and utilizing structured data remains limited. Our investigation reveals a notable deficiency in LLMs' ability to process structured data; for example, ChatGPT lags behind state-of-the-art (SoTA) models by an average of 35%. To augment the Structured Knowledge Grounding (SKG) capabilities of LLMs, we have developed a comprehensive instruction tuning dataset comprising 1.1 million examples. Using this dataset, we train a series of models, referred to as StructLM, based on the Code-LLaMA architecture and ranging from 7B to 34B parameters. Our StructLM series surpasses task-specific models on 14 out of 18 evaluated datasets and establishes new SoTA results on 7 SKG tasks. Furthermore, StructLM demonstrates exceptional generalization across 6 novel SKG tasks. Contrary to expectations, we observe that scaling model size offers only marginal benefits, with StructLM-34B showing only slight improvements over StructLM-7B. This suggests that structured knowledge grounding is still a challenging task and that more innovative design is needed to push it to a new level.
