Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
June 12, 2023
Authors: John J. Nay, David Karamardian, Sarah B. Lawsky, Wenting Tao, Meghana Bhat, Raghav Jain, Aaron Travis Lee, Jonathan H. Choi, Jungo Kasai
cs.AI
Abstract
Better understanding of Large Language Models' (LLMs) legal analysis
abilities can contribute to improving the efficiency of legal services,
governing artificial intelligence, and leveraging LLMs to identify
inconsistencies in law. This paper explores LLM capabilities in applying tax
law. We choose this area of law because it has a structure that allows us to
set up automated validation pipelines across thousands of examples, requires
logical reasoning and maths skills, and enables us to test LLM capabilities in
a manner relevant to real-world economic lives of citizens and companies. Our
experiments demonstrate emerging legal understanding capabilities, with
improved performance in each subsequent OpenAI model release. We experiment
with retrieving and utilising the relevant legal authority to assess the impact
of providing additional legal context to LLMs. Few-shot prompting, presenting
examples of question-answer pairs, is also found to significantly enhance the
performance of the most advanced model, GPT-4. The findings indicate that LLMs,
particularly when combined with prompting enhancements and the correct legal
texts, can perform at high levels of accuracy but not yet at expert tax lawyer
levels. As LLMs continue to advance, their ability to reason about law
autonomously could have significant implications for the legal profession and
AI governance.
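
The abstract mentions two mechanics that a short sketch may make more concrete: few-shot prompting with question-answer pairs, and automated validation of model answers against known correct ones. The Python below is only an illustration under assumptions. The `Example` record, the multiple-choice layout, and the function names are hypothetical and are not taken from the paper, whose actual prompt templates and data format are not given in this abstract.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One multiple-choice question (hypothetical format, not the paper's)."""
    question: str
    choices: dict[str, str]  # choice label -> choice text
    answer: str              # label of the correct choice


def format_example(ex: Example, include_answer: bool) -> str:
    """Render a question; solved examples show the answer, the target leaves it blank."""
    lines = [f"Question: {ex.question}"]
    for label, text in ex.choices.items():
        lines.append(f"({label}) {text}")
    lines.append(f"Answer: {ex.answer}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(shots: list[Example], target: Example) -> str:
    """Place solved question-answer pairs before the unsolved target question."""
    blocks = [format_example(s, include_answer=True) for s in shots]
    blocks.append(format_example(target, include_answer=False))
    return "\n\n".join(blocks)


def accuracy(predictions: list[str], examples: list[Example]) -> float:
    """Fraction of predicted labels matching the gold labels (automated check)."""
    if not examples:
        return 0.0
    correct = sum(
        pred.strip().upper() == ex.answer
        for pred, ex in zip(predictions, examples)
    )
    return correct / len(examples)
```

In use, the string returned by `build_few_shot_prompt` would be sent to a model, the returned choice label collected per question, and `accuracy` computed over the full set. Retrieved legal text, where available, could be prepended to the same prompt in a similar way.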