SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics

October 2, 2024
Authors: Zhiwen You, Kanyao Han, Haotian Zhu, Bertram Ludäscher, Jana Diesner
cs.AI

Abstract

Prompt-based fine-tuning has become an essential method for eliciting information encoded in pre-trained language models for a variety of tasks, including text classification. For multi-class classification tasks, prompt-based fine-tuning under low-resource scenarios has achieved performance comparable to that of full fine-tuning methods. Previous studies have used crafted prompt templates and verbalizers, which map the label-term space to the class space, to solve the classification problem as a masked language modeling task. However, cross-domain and fine-grained prompt-based fine-tuning with an automatically enriched verbalizer remains unexplored, mainly because manually selecting domain label terms for the verbalizer is difficult and costly and requires humans with domain expertise. To address this challenge, we introduce SciPrompt, a framework designed to automatically retrieve scientific topic-related terms for low-resource text classification tasks. To this end, we select semantically correlated and domain-specific label terms within the context of scientific literature for verbalizer augmentation. Furthermore, we propose a new verbalization strategy that uses correlation scores as additional weights to enhance the prediction performance of the language model during model tuning. Our method outperforms state-of-the-art prompt-based fine-tuning methods on scientific text classification tasks under few-shot and zero-shot settings, especially in classifying fine-grained and emerging scientific topics.
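
The idea of a verbalizer whose label terms carry correlation scores as weights can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `class_probabilities`, the weighted-average aggregation, and all logits, terms, and weights below are hypothetical values chosen for illustration. The sketch assumes a masked language model has already produced a logit for each label term at the mask position.

```python
import math

def class_probabilities(mask_logits, verbalizer):
    """Map mask-position token logits to class probabilities.

    mask_logits: dict of label term -> logit at the mask position (assumed given).
    verbalizer:  dict of class name -> list of (label_term, weight) pairs,
                 where each weight stands in for a correlation score.
    """
    scores = {}
    for label, weighted_terms in verbalizer.items():
        total_weight = sum(weight for _, weight in weighted_terms)
        # Weighted average of the logits of this class's label terms.
        scores[label] = sum(
            weight * mask_logits[term] for term, weight in weighted_terms
        ) / total_weight

    # Numerically stable softmax over the per-class scores.
    peak = max(scores.values())
    exp_scores = {label: math.exp(s - peak) for label, s in scores.items()}
    norm = sum(exp_scores.values())
    return {label: value / norm for label, value in exp_scores.items()}

# Hypothetical logits and weights, for illustration only.
mask_logits = {"physics": 2.0, "quantum": 3.0, "biology": 0.5, "genomics": 1.0}
verbalizer = {
    "Physics": [("physics", 1.0), ("quantum", 0.5)],
    "Biology": [("biology", 1.0), ("genomics", 0.8)],
}

probs = class_probabilities(mask_logits, verbalizer)
prediction = max(probs, key=probs.get)
```

In this sketch a term with a higher weight pulls its class score more strongly toward that term's logit, which is one plausible reading of "correlation scores as additional weights"; the paper itself should be consulted for the exact formulation.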
