EduRABSA: An Education Review Dataset for Aspect-based Sentiment Analysis Tasks

August 23, 2025
Authors: Yan Cathy Hua, Paul Denny, Jörg Wicker, Katerina Taskova
cs.AI

Abstract

Every year, most educational institutions seek and receive an enormous volume of text feedback from students on courses, teaching, and overall experience. Yet, turning this raw feedback into useful insights is far from straightforward. It has been a long-standing challenge to adopt automatic opinion mining solutions for such education review text data due to the content complexity and low-granularity reporting requirements. Aspect-based Sentiment Analysis (ABSA) offers a promising solution with its rich, sub-sentence-level opinion mining capabilities. However, existing ABSA research and resources are very heavily focused on the commercial domain. In education, they are scarce and hard to develop due to limited public datasets and strict data protection. A high-quality, annotated dataset is urgently needed to advance research in this under-resourced area. In this work, we present EduRABSA (Education Review ABSA), the first public, annotated ABSA education review dataset that covers three review subject types (course, teaching staff, university) in the English language and all main ABSA tasks, including the under-explored implicit aspect and implicit opinion extraction. We also share ASQE-DPT (Data Processing Tool), an offline, lightweight, installation-free manual data annotation tool that generates labelled datasets for comprehensive ABSA tasks from a single-task annotation. Together, these resources contribute to the ABSA community and education domain by removing the dataset barrier, supporting research transparency and reproducibility, and enabling the creation and sharing of further resources. The dataset, annotation tool, and scripts and statistics for dataset processing and sampling are available at https://github.com/yhua219/edurabsa_dataset_and_annotation_tool.