RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval

November 7, 2024
Authors: Aniket Deroy, Subhankar Maity
cs.AI

Abstract

Code-mixing, the integration of lexical and grammatical elements from multiple languages within a single sentence, is a widespread linguistic phenomenon, particularly in multilingual societies. In India, social media users frequently hold code-mixed conversations in the Roman script, especially in migrant communities that form online groups to share relevant local information. This paper focuses on the challenges of extracting relevant information from code-mixed conversations, specifically in Roman-transliterated Bengali mixed with English. We present a novel approach to these challenges: a mechanism that automatically identifies the most relevant answers in code-mixed conversations. We experiment with a dataset comprising queries and documents from Facebook, together with query relevance files (QRels). Our results demonstrate the effectiveness of the approach in extracting pertinent information from complex, code-mixed digital conversations, contributing to the broader field of natural language processing in multilingual and informal text environments. We use GPT-3.5 Turbo via prompting, along with the sequential nature of relevant documents, to frame a mathematical model that helps detect the documents relevant to a query.
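
The abstract does not spell out the mathematical model, so the following is only an illustrative sketch of one way per-document relevance scores could be combined with the sequential nature of relevant documents. Everything in it is an assumption made for illustration, not the authors' formulation: the function name `sequential_relevance`, the blending weight `alpha`, the decision `threshold`, and the idea that the prompted model yields one score in [0, 1] per document.

```python
from typing import List


def sequential_relevance(
    llm_scores: List[float],
    alpha: float = 0.7,
    threshold: float = 0.5,
) -> List[bool]:
    """Label documents as relevant using their own score plus sequence context.

    llm_scores: hypothetical per-document relevance scores in [0, 1],
        listed in conversation order (e.g. obtained by prompting a model).
    alpha: assumed weight on the document's own score; the remaining
        (1 - alpha) is placed on the blended score of the previous document.
    threshold: assumed cut-off above which a document counts as relevant.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    labels: List[bool] = []
    previous_blended = None
    for score in llm_scores:
        if previous_blended is None:
            # The first document has no predecessor, so use its own score.
            blended = score
        else:
            # Later documents inherit part of the previous blended score.
            blended = alpha * score + (1.0 - alpha) * previous_blended
        labels.append(blended >= threshold)
        previous_blended = blended
    return labels


if __name__ == "__main__":
    # Made-up scores for three consecutive documents.
    print(sequential_relevance([0.9, 0.4, 0.1], alpha=0.5))
```

With `alpha=1.0` the sequence context is ignored and each document is judged on its own score, which gives a simple baseline for checking whether the sequential term changes any decisions.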