FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

Abstract

This paper introduces FinMCP-Bench, a novel benchmark for evaluating how well large language models (LLMs) solve real-world financial problems through tool invocation under the Model Context Protocol (MCP). FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, and it combines real and synthetic user queries to ensure diversity and authenticity. It incorporates 65 real financial MCPs and three types of samples (single-tool, multi-tool, and multi-turn), which allows models to be evaluated across different levels of task complexity. Using this benchmark, we systematically assess a range of mainstream LLMs and propose metrics that explicitly measure tool invocation accuracy and reasoning capabilities. FinMCP-Bench provides a standardized, practical, and challenging testbed for advancing research on financial LLM agents.
