Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
June 24, 2025
Authors: Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang
cs.AI
Abstract
Large Language Models (LLMs) hold promise in automating data analysis tasks,
yet open-source models face significant limitations in these kinds of
reasoning-intensive scenarios. In this work, we investigate strategies to
enhance the data analysis capabilities of open-source LLMs. By curating a seed
dataset of diverse, realistic scenarios, we evaluate models across three
dimensions: data understanding, code generation, and strategic planning. Our
analysis reveals three key findings: (1) Strategic planning quality serves as
the primary determinant of model performance; (2) Interaction design and task
complexity significantly influence reasoning capabilities; (3) Data quality
demonstrates a greater impact than diversity in achieving optimal performance.
We leverage these insights to develop a data synthesis methodology,
demonstrating significant improvements in open-source LLMs' analytical
reasoning capabilities.