Instruction-Guided Poetry Generation in Arabic and Its Dialects

Abstract

Poetry has long been a central art form for Arabic speakers, serving as a powerful medium of expression and cultural identity. While modern Arabic speakers continue to value poetry, existing research on Arabic poetry with Large Language Models (LLMs) has focused primarily on analysis tasks, such as interpretation or the prediction of metadata (e.g., rhyme schemes and titles). In contrast, our work addresses the practical side of poetry creation in Arabic by introducing controllable generation capabilities that assist users in writing poetry. Specifically, we present a large-scale, carefully curated instruction-based dataset covering Modern Standard Arabic (MSA) and various Arabic dialects. The dataset supports writing, revising, and continuing poems according to predefined criteria, including style and rhyme, as well as poetry analysis tasks. Our experiments show that fine-tuning LLMs on this dataset yields models that effectively generate poetry aligned with user requirements, as measured by both automated metrics and human evaluation with native Arabic speakers. The data and code are available at https://github.com/mbzuai-nlp/instructpoet-ar.
