Histoires Morales: A French Dataset for Assessing Moral Alignment

Abstract

Aligning language models with human values is crucial, especially as they become more integrated into everyday life. While models are often adapted to user preferences, it is equally important to ensure they align with moral norms and behaviours in real-world social situations. Despite significant progress in languages such as English and Chinese, French has received little attention in this area, leaving a gap in our understanding of how LLMs handle moral reasoning in French. To address this gap, we introduce Histoires Morales, a French dataset derived from Moral Stories, created through translation and subsequently refined with the assistance of native speakers to ensure grammatical accuracy and adaptation to the French cultural context. We also rely on annotations of the moral values within the dataset to ensure their alignment with French norms. Histoires Morales covers a wide range of social situations, including differences in tipping practices, expressions of honesty in relationships, and responsibilities toward animals. To foster future research, we also conduct preliminary experiments on the alignment of multilingual models on French and English data and on the robustness of that alignment. We find that while LLMs are generally aligned with human moral norms by default, they can be easily influenced by user-preference optimization on either moral or immoral data.