Meeting Delegate: Benchmarking LLMs on Attending Meetings on Our Behalf

Abstract

In contemporary workplaces, meetings are essential for exchanging ideas and keeping teams aligned, but they are often time-consuming, hard to schedule, and marked by inefficient participation. Recent advances in Large Language Models (LLMs) have demonstrated strong capabilities in natural language generation and reasoning, prompting the question: can LLMs effectively stand in for participants in meetings? To explore this, we develop a prototype LLM-powered meeting delegate system and create a comprehensive benchmark using real meeting transcripts. Our evaluation reveals that GPT-4 and GPT-4o strike a balance between active and cautious engagement strategies. In contrast, Gemini 1.5 Pro tends to be more cautious, while Gemini 1.5 Flash and Llama3-8B/70B display more active tendencies. Overall, about 60% of responses address at least one key point from the ground truth. However, improvements are needed to reduce irrelevant or repetitive content and to increase tolerance for the transcription errors commonly found in real-world settings. Additionally, we deploy the system in practical settings and collect real-world feedback from demos. Our findings underscore both the potential and the challenges of using LLMs as meeting delegates, offering valuable insights into their practical application for alleviating the burden of meetings.