Instruction-Guided Poetry Generation in Arabic and Its Dialects
April 30, 2026
Authors: Abdelrahman Sadallah, Kareem Elozeiri, Mervat Abassy, Rania Elbadry, Mohamed Anwar, Abed Alhakim Freihat, Preslav Nakov, Fajri Koto
cs.AI
Abstract
Poetry has long been a central art form for Arabic speakers, serving as a powerful medium of emotional expression and cultural identity. Although modern Arabic speakers continue to value poetry, existing research on Arabic poetry with Large Language Models (LLMs) has focused primarily on analysis tasks, such as poem interpretation or metadata prediction (e.g., rhyme schemes and titles). In contrast, our work addresses the practical side of poetry creation in Arabic by introducing controllable generation capabilities that assist users in writing poetry. Specifically, we present a large-scale, carefully curated instruction-based dataset covering Modern Standard Arabic (MSA) and various Arabic dialects. The dataset supports writing, revising, and continuing poems according to predefined criteria such as style and rhyme, as well as poetry analysis tasks. Our experiments show that LLMs fine-tuned on this dataset can effectively generate poetry aligned with user requirements, as measured by both automated metrics and human evaluation by native Arabic speakers. The data and code are available at https://github.com/mbzuai-nlp/instructpoet-ar