Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
June 24, 2025
Authors: Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang
cs.AI
Abstract
Large Language Models (LLMs) hold promise in automating data analysis tasks,
yet open-source models face significant limitations in these kinds of
reasoning-intensive scenarios. In this work, we investigate strategies to
enhance the data analysis capabilities of open-source LLMs. By curating a seed
dataset of diverse, realistic scenarios, we evaluate models across three
dimensions: data understanding, code generation, and strategic planning. Our
analysis reveals three key findings: (1) Strategic planning quality serves as
the primary determinant of model performance; (2) Interaction design and task
complexity significantly influence reasoning capabilities; (3) Data quality
demonstrates a greater impact than diversity in achieving optimal performance.
We leverage these insights to develop a data synthesis methodology,
demonstrating significant improvements in open-source LLMs' analytical
reasoning capabilities.