PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues
January 24, 2026
Authors: Mohammad Rifqi Farhansyah, Hanif Muhammad Zhafran, Farid Adilazuarda, Shamsuddeen Hassan Muhammad, Maryam Ibrahim Mukhtar, Nedjma Ousidhoum, Genta Indra Winata, Ayu Purwarianti, Alham Fikri Aji
cs.AI
Abstract
Code-switching is a widespread practice among the world's multilingual majority, yet few benchmarks accurately reflect its complexity in everyday communication. We present PingPong, a benchmark for natural multi-party code-switching dialogues covering five language-combination variations, some of which are trilingual. Our dataset consists of human-authored conversations among 2 to 4 participants that exhibit authentic, multi-threaded structures in which replies frequently reference much earlier points in the dialogue. We demonstrate that our data is significantly more natural and structurally diverse than machine-generated alternatives, offering greater variation in message length, speaker dominance, and reply distance. Based on these dialogues, we define three downstream tasks: Question Answering, Dialogue Summarization, and Topic Classification. Evaluations of several state-of-the-art language models on PingPong reveal that performance remains limited on code-switched inputs, underscoring the urgent need for more robust NLP systems capable of addressing the intricacies of real-world multilingual discourse.
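The abstract compares dialogues on message length, speaker dominance, and reply distance but does not define these measures here. The following is a minimal sketch under stated assumptions, not the paper's exact definitions. It uses a hypothetical `Message` record with an explicit `reply_to` index, whitespace tokens for message length, the most active speaker's share of messages for dominance, and reply distance as the index gap between a reply and the earlier message it answers. The toy Indonesian-English exchange is invented for illustration and is not drawn from PingPong.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional


@dataclass
class Message:
    speaker: str
    text: str
    # Hypothetical field: index of the earlier message this one replies to, if any.
    reply_to: Optional[int] = None


def structural_stats(dialogue: list[Message]) -> dict[str, float]:
    """Compute illustrative structural statistics for one dialogue (assumed definitions)."""
    if not dialogue:
        raise ValueError("dialogue must contain at least one message")
    # Message length approximated as the number of whitespace-separated tokens.
    lengths = [len(m.text.split()) for m in dialogue]
    # Speaker dominance approximated as the most active speaker's share of messages.
    counts = Counter(m.speaker for m in dialogue)
    # Reply distance: how many positions back the replied-to message sits.
    distances = [i - m.reply_to for i, m in enumerate(dialogue) if m.reply_to is not None]
    return {
        "mean_length": float(mean(lengths)),
        "length_std": pstdev(lengths),
        "top_speaker_share": max(counts.values()) / len(dialogue),
        "mean_reply_distance": float(mean(distances)) if distances else 0.0,
        "max_reply_distance": float(max(distances, default=0)),
    }


# Invented toy dialogue with three participants; message 4 answers message 1 (distance 3).
dialogue = [
    Message("A", "Guys besok meeting jam berapa?"),
    Message("B", "Jam 10 kayaknya", reply_to=0),
    Message("C", "Eh anyone udah makan?"),
    Message("A", "Belum, lapar banget", reply_to=2),
    Message("C", "Wait, jam 10 pagi atau malam?", reply_to=1),
]

stats = structural_stats(dialogue)
print(stats)
```

Representing replies as explicit indices makes long-range references visible: a linear, single-thread exchange yields reply distances of 1, while multi-threaded conversations of the kind the abstract describes produce larger gaps.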