Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

Abstract

Large Language Models (LLMs) demonstrate promising capabilities in solving simple scientific problems but often hallucinate on complex ones. Integrating LLMs with tools can increase reliability, but this approach typically leads to over-reliance on tools, diminishing the model's ability to solve simple problems through basic reasoning. In contrast, human experts first assess a problem's complexity using domain knowledge before choosing an appropriate solution approach. Inspired by this human problem-solving process, we propose a novel two-component fine-tuning method. In the first component, World Knowledge Distillation (WKD), LLMs learn directly from solutions generated using tool information, internalizing domain knowledge. In the second component, Tool Usage Adaptation (TUA), we partition problems into easy and hard categories based on the model's direct answering accuracy. For easy problems we keep the same alignment target as in WKD, while for hard problems we train the model to intelligently switch to tool usage. We validate our method on six scientific benchmark datasets spanning mathematics, climate science, and epidemiology. On average, our models demonstrate a 28.18% improvement in answer accuracy and a 13.89% increase in tool usage precision across all datasets, surpassing state-of-the-art models including GPT-4o and Claude-3.5.
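The easy/hard partition described for TUA can be illustrated with a minimal sketch. This is not the authors' implementation: the `Problem` container, the exact-match scoring, the 0.5 threshold, and the toy data are all assumptions made for illustration. The abstract states only that problems are split by the model's direct answering accuracy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Problem:
    """Hypothetical container; field names are illustrative assumptions."""
    question: str
    reference_answer: str
    sampled_answers: List[str]  # answers sampled from the model WITHOUT tools


def direct_accuracy(problem: Problem) -> float:
    """Fraction of tool-free samples that exactly match the reference answer."""
    if not problem.sampled_answers:
        return 0.0
    correct = sum(
        answer.strip() == problem.reference_answer.strip()
        for answer in problem.sampled_answers
    )
    return correct / len(problem.sampled_answers)


def partition(
    problems: List[Problem], threshold: float = 0.5
) -> Tuple[List[Problem], List[Problem]]:
    """Split problems into (easy, hard) by direct answering accuracy.

    The threshold value is an assumption, not a number from the paper.
    """
    easy: List[Problem] = []
    hard: List[Problem] = []
    for p in problems:
        (easy if direct_accuracy(p) >= threshold else hard).append(p)
    return easy, hard


# Toy data, invented purely to exercise the functions above.
problems = [
    Problem("toy question A", "4", ["4", "4", "4", "5"]),
    Problem("toy question B", "1.8", ["2.5", "1.8", "3.0", "2.1"]),
]
easy, hard = partition(problems)
print([p.question for p in easy], [p.question for p in hard])
```

Under this sketch, problems in `easy` would keep a direct-answer training target, while problems in `hard` would be paired with a tool-usage target.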
