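DIWALI - Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context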

Abstract

Large language models (LLMs) are widely used across diverse tasks and applications. However, despite their broad capabilities, they have been shown to lack cultural alignment (ryan-etal-2024-unintended; alkhamissi-etal-2024-investigating). They also produce biased generations (naous-etal-2024-beer) because they lack sufficient cultural knowledge and competence. Evaluating LLMs for cultural awareness and alignment is particularly challenging for two reasons: proper evaluation metrics are lacking, and no culturally grounded datasets capture the vast complexity of cultures at the regional and sub-regional levels. Existing datasets of culture-specific items (CSIs) focus primarily on region-level concepts and may contain false positives. To address this gap, we introduce a novel CSI dataset for Indian culture spanning 17 cultural facets. The dataset comprises ~8k cultural concepts from 36 sub-regions. We measure the cultural competence of LLMs on a cultural text adaptation task. The adaptations are evaluated with three methods: the created CSIs, LLM-as-a-Judge, and human evaluations by annotators from diverse socio-demographic regions. Furthermore, our quantitative analysis reveals selective sub-regional coverage and surface-level adaptations across all LLMs considered. Our dataset is available at https://huggingface.co/datasets/nlip/DIWALI and the project webpage at https://nlip-lab.github.io/nlip/publications/diwali/. Our codebase with model outputs is at https://github.com/pramitsahoo/culture-evaluation.
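
The abstract mentions scoring adaptations against the created CSIs and observing selective sub-regional coverage. As an illustration only, the sketch below shows one simple way such a check could work. It is not the authors' evaluation code, which lives in the linked repository. It assumes:

- a hypothetical toy table mapping sub-regions to CSIs (not the real DIWALI schema, whose field names are not assumed here);
- whole-word, case-insensitive exact matching;
- coverage defined as the fraction of sub-regions with at least one matched item in any adapted text.

```python
import re
from collections import defaultdict

# Hypothetical toy CSI table: sub-region -> culture-specific items.
# Illustrative only; it does not reflect the actual DIWALI dataset schema.
TOY_CSIS = {
    "Tamil Nadu": ["pongal", "kolam", "filter coffee"],
    "West Bengal": ["durga puja", "rosogolla"],
    "Punjab": ["lohri", "sarson da saag"],
}


def matched_csis(text: str, csis_by_region: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per sub-region, the CSIs occurring in `text` as whole words (case-insensitive)."""
    found: dict[str, list[str]] = defaultdict(list)
    lowered = text.lower()
    for region, items in csis_by_region.items():
        for item in items:
            # \b anchors keep an item from matching inside a longer word (e.g. "kolams").
            if re.search(r"\b" + re.escape(item.lower()) + r"\b", lowered):
                found[region].append(item)
    return dict(found)


def subregion_coverage(texts: list[str], csis_by_region: dict[str, list[str]]) -> float:
    """Fraction of sub-regions with at least one CSI matched in any of the adapted texts."""
    covered: set[str] = set()
    for text in texts:
        # Iterating a dict yields its keys, i.e. the sub-regions with a match.
        covered.update(matched_csis(text, csis_by_region))
    return len(covered) / len(csis_by_region)


if __name__ == "__main__":
    adaptations = [
        "The family made pongal and drew a kolam at the doorstep.",
        "They shared rosogolla after Durga Puja.",
    ]
    print(subregion_coverage(adaptations, TOY_CSIS))
```

With these two toy adaptations, Tamil Nadu and West Bengal are covered but Punjab is not. This mirrors, in miniature, the kind of uneven sub-regional coverage the abstract reports. Exact string matching cannot detect paraphrased or transliterated CSIs, which is one reason complementary judges (LLM-as-a-Judge, human evaluation) are also used.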
