RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval

Abstract

Code-mixing, the integration of lexical and grammatical elements from multiple languages within a single sentence, is a widespread linguistic phenomenon that is particularly prevalent in multilingual societies. In India, social media users frequently hold code-mixed conversations in the Roman script, especially within migrant communities that form online groups to share relevant local information. This paper focuses on the challenges of extracting relevant information from code-mixed conversations, specifically Roman-transliterated Bengali mixed with English. We present a novel approach to these challenges: a mechanism that automatically identifies the most relevant answers in code-mixed conversations. We experimented with a dataset comprising queries and documents from Facebook, together with Query Relevance files (QRels) to aid in this task. Our results demonstrate the effectiveness of our approach in extracting pertinent information from complex, code-mixed digital conversations, contributing to the broader field of natural language processing in multilingual and informal text environments. We use GPT-3.5 Turbo via prompting, together with the sequential nature of relevant documents, to frame a mathematical model that helps detect the documents relevant to a query.