Advancing Arabic Reverse Dictionary Systems: A Transformer-Based Approach with Dataset Construction Guidelines

April 30, 2025
Authors: Serry Sibaee, Samar Ahmed, Abdullah Al Harbi, Omer Nacar, Adel Ammar, Yasser Habashi, Wadii Boulila
cs.AI

Abstract

This study addresses a critical gap in Arabic natural language processing by developing an effective Arabic Reverse Dictionary (RD) system that lets users find words from their descriptions or meanings. We present a novel transformer-based approach with a semi-encoder neural network architecture featuring geometrically decreasing layers, which achieves state-of-the-art results on Arabic RD tasks. Our methodology incorporates a comprehensive dataset construction process and establishes formal quality standards for Arabic lexicographic definitions. Experiments with various pre-trained models show that Arabic-specific models significantly outperform general multilingual embeddings, with ARBERTv2 achieving the best ranking score (0.0644). We also provide a formal abstraction of the reverse dictionary task that enhances theoretical understanding, and we develop a modular, extensible Python library (RDTL) with configurable training pipelines. Our analysis of dataset quality reveals insights for improving Arabic definition construction, leading to eight specific standards for building high-quality reverse dictionary resources. This work contributes to Arabic computational linguistics and provides tools for language learning, academic writing, and professional communication in Arabic.
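
The abstract does not give layer sizes or describe how candidate words are ranked, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation and not the RDTL API. It assumes that "geometrically decreasing layers" means layer widths that shrink by a constant ratio, and that a reverse dictionary lookup can be treated as ranking vocabulary words by cosine similarity to a predicted vector. The function names, dimensions, and toy vocabulary are all hypothetical.

```python
import math


def geometric_widths(input_dim, output_dim, num_layers):
    """Layer widths shrinking by a constant ratio from input_dim to output_dim.

    Assumption: this is one plausible reading of "geometrically decreasing
    layers"; the source does not state the actual sizes.
    """
    ratio = (output_dim / input_dim) ** (1.0 / num_layers)
    return [round(input_dim * ratio ** i) for i in range(num_layers + 1)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_words(predicted, vocab):
    """Sort vocabulary words by similarity to the predicted vector, best first."""
    return sorted(vocab, key=lambda w: cosine(predicted, vocab[w]), reverse=True)


# Hypothetical toy data: transliterated words with made-up 2-d embeddings.
toy_vocab = {"kitab": [1.0, 0.0], "qalam": [0.0, 1.0], "bayt": [1.0, 1.0]}
widths = geometric_widths(1024, 128, 3)
ranking = rank_words([0.9, 0.1], toy_vocab)
```

In this toy setup, a model would map an encoded definition through the shrinking layers to a vector in the word-embedding space, and the position of the correct word in `ranking` is the kind of quantity a ranking metric summarizes.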