EduRABSA: An Education Review Dataset for Aspect-based Sentiment Analysis Tasks
August 23, 2025
Authors: Yan Cathy Hua, Paul Denny, Jörg Wicker, Katerina Taskova
cs.AI
Abstract
Every year, most educational institutions seek and receive an enormous volume
of text feedback from students on courses, teaching, and overall experience.
Yet, turning this raw feedback into useful insights is far from
straightforward. It has been a long-standing challenge to adopt automatic
opinion mining solutions for such education review text data due to the content
complexity and low-granularity reporting requirements. Aspect-based Sentiment
Analysis (ABSA) offers a promising solution with its rich, sub-sentence-level
opinion mining capabilities. However, existing ABSA research and resources are
heavily focused on the commercial domain. In education, they are scarce
and hard to develop due to limited public datasets and strict data protection.
A high-quality, annotated dataset is urgently needed to advance research in
this under-resourced area. In this work, we present EduRABSA (Education Review
ABSA), the first public, annotated ABSA education review dataset that covers
three review subject types (course, teaching staff, university) in the English
language and all main ABSA tasks, including the under-explored implicit aspect
and implicit opinion extraction. We also share ASQE-DPT (Data Processing Tool),
an offline, lightweight, installation-free manual data annotation tool that
generates labelled datasets for comprehensive ABSA tasks from a single-task
annotation. Together, these resources contribute to the ABSA community and
education domain by removing the dataset barrier, supporting research
transparency and reproducibility, and enabling the creation and sharing of
further resources. The dataset, annotation tool, and scripts and statistics for
dataset processing and sampling are available at
https://github.com/yhua219/edurabsa_dataset_and_annotation_tool.
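
To illustrate how a single annotation pass could yield labels for several ABSA tasks, the sketch below projects quadruple annotations onto simpler task formats. It is a minimal illustration, not the ASQE-DPT implementation: the quadruple layout (aspect, category, opinion, sentiment), the "null" placeholder for implicit terms, the category names, the sample review, and the function name `derive_task_labels` are all assumptions made for this example.

```python
from typing import NamedTuple

# Assumed placeholder marking an implicit aspect or implicit opinion.
IMPLICIT = "null"


class Quad(NamedTuple):
    """One hypothetical quadruple annotation for a review sentence."""
    aspect: str      # aspect term, or IMPLICIT when not stated in the text
    category: str    # aspect category (names here are illustrative only)
    opinion: str     # opinion term, or IMPLICIT when not stated in the text
    sentiment: str   # e.g. "positive", "negative", "neutral"


def derive_task_labels(quads):
    """Project quadruple annotations onto simpler ABSA task label sets."""
    return {
        # Term extraction tasks keep only explicitly stated terms.
        "aspect_terms": [q.aspect for q in quads if q.aspect != IMPLICIT],
        "opinion_terms": [q.opinion for q in quads if q.opinion != IMPLICIT],
        # Pair and triplet tasks drop fields from each quadruple.
        "aspect_sentiment": [(q.aspect, q.sentiment) for q in quads],
        "triplets": [(q.aspect, q.opinion, q.sentiment) for q in quads],
        # The full quadruple task uses the annotation unchanged.
        "quads": [tuple(q) for q in quads],
    }


# Made-up review with one explicit and one implicit aspect.
review = "The lectures were engaging, but I barely kept up."
annotation = [
    Quad("lectures", "course-content", "engaging", "positive"),
    Quad(IMPLICIT, "course-workload", "barely kept up", "negative"),
]

labels = derive_task_labels(annotation)
for task, values in labels.items():
    print(task, values)
```

Because each simpler label set is a projection of the quadruple, annotating once at the richest level is enough to derive the others; the actual output formats of the released tool may differ.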