Teaching a Language Model to Speak the Language of Tools

June 29, 2025
Author: Simeon Emanuilov
cs.AI

Abstract

External tool integration through function-calling is essential for practical language model applications, yet most multilingual models lack reliable tool-use capabilities in non-English languages. Even state-of-the-art multilingual models struggle with determining when to use tools and generating the structured outputs required for function calls, often exhibiting language confusion when prompted in lower-resource languages. This work presents a methodology for adapting existing language models to enable robust tool use in any target language, using Bulgarian as a case study. The approach involves continued training of the BgGPT model series (2.6B, 9B, 27B parameters) on a novel bilingual dataset of 10,035 function-calling examples designed to support standardized protocols like MCP (Model Context Protocol). The research introduces TUCAN (Tool-Using Capable Assistant Navigator), which achieves up to 28.75% improvement in function-calling accuracy over base models while preserving core language understanding, as verified on established Bulgarian benchmarks. Beyond accuracy gains, TUCAN models demonstrate production-ready response formatting with clean, parsable function calls, contrasting with the verbose and inconsistent outputs of base models. The models, evaluation framework, and dataset are released to enable replication for other languages. This work demonstrates a practical approach for extending tool-augmented capabilities beyond English-centric systems.
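To make "clean, parsable function calls" and "function-calling accuracy" concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a function call is emitted as a single JSON object with `name` and `arguments` keys; this is not the TUCAN output format or the paper's evaluation framework, and the `get_weather` tool and the sample responses are hypothetical. A response counts as correct when the parsed call matches the expected call, or when no call is expected and none is produced.

```python
import json
from typing import Optional


def parse_function_call(response: str) -> Optional[dict]:
    """Return the call if the whole response is one JSON function call, else None.

    Assumed format (hypothetical): {"name": "<tool>", "arguments": {...}}
    """
    try:
        payload = json.loads(response.strip())
    except json.JSONDecodeError:
        # Plain text, or a call buried in surrounding prose, is not parsable.
        return None
    if not isinstance(payload, dict):
        return None
    name = payload.get("name")
    arguments = payload.get("arguments", {})
    if not isinstance(name, str) or not isinstance(arguments, dict):
        return None
    return {"name": name, "arguments": arguments}


def function_call_accuracy(responses: list, expected: list) -> float:
    """Fraction of responses whose parsed call equals the expected call.

    An expected value of None means the model should answer without a tool.
    """
    correct = sum(parse_function_call(r) == e for r, e in zip(responses, expected))
    return correct / len(expected)


# Hypothetical responses: a clean call, a verbose call, and a plain reply.
responses = [
    '{"name": "get_weather", "arguments": {"city": "Sofia"}}',
    'Sure! I will call the tool: {"name": "get_weather", "arguments": {"city": "Sofia"}}',
    "Здравейте! Как мога да помогна?",
]
expected = [
    {"name": "get_weather", "arguments": {"city": "Sofia"}},
    {"name": "get_weather", "arguments": {"city": "Sofia"}},
    None,  # no tool call is expected for a greeting
]

accuracy = function_call_accuracy(responses, expected)
```

In this sketch the verbose second response is scored as a failure even though it contains the right call, which reflects the kind of strict parsing a downstream client would need. A more lenient evaluator could extract embedded JSON instead, at the cost of hiding formatting problems.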
July 1, 2025