MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications

Abstract

Accessibility remains a critical concern in today's society, as many technologies are not developed to support the full range of user needs. Existing multi-agent systems (MAS) often cannot provide comprehensive assistance to users in need because their closed-source designs leave little room for customization. Consequently, individuals with disabilities frequently encounter significant barriers when attempting to interact with digital environments. We introduce MATE, a multimodal accessibility MAS that performs modality conversions based on the user's needs. The system assists people with disabilities by converting data into a format they can understand. For instance, if the user cannot see well and receives an image, the system converts the image into an audio description. MATE can be applied to a wide range of domains, industries, and areas, such as healthcare, and can serve as a useful assistant for various groups of users. The system supports multiple types of models, ranging from LLM API calls to custom machine learning (ML) classifiers. This flexibility allows the system to be adapted to various needs and makes it compatible with a wide variety of hardware. Because the system is expected to run locally, it ensures the privacy and security of sensitive information. In addition, the framework can be integrated with institutional technologies (e.g., digital healthcare services) for real-time user assistance. Furthermore, we introduce ModCon-Task-Identifier, a model capable of extracting the precise modality conversion task from the user input. Extensive experiments show that ModCon-Task-Identifier consistently outperforms other LLMs and statistical models on our custom data. Our code and data are publicly available at https://github.com/AlgazinovAleksandr/Multi-Agent-MATE.
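To make the workflow described above more concrete, the following is a minimal sketch of the two steps the abstract names: identifying the modality conversion task from a user request, then routing the input to a matching converter. It is not the authors' implementation. The regex-based `identify_task` is only a stand-in for a learned model such as ModCon-Task-Identifier, and the modality labels, the `AGENTS` registry, and the placeholder converter functions are all hypothetical names chosen for illustration.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# (source modality, target modality); these labels are assumptions, not the paper's.
Task = Tuple[str, str]

# Hypothetical rule: "<modality> ... to/into ... <modality>".
# A learned classifier would replace this pattern in a real system.
_TASK_PATTERN = re.compile(
    r"\b(text|image|audio|video)\b.*?\b(?:to|into)\b.*?\b(text|image|audio|video)\b"
)


def identify_task(request: str) -> Optional[Task]:
    """Return (source, target) if the request names a conversion, else None."""
    match = _TASK_PATTERN.search(request.lower())
    if match is None:
        return None
    return match.group(1), match.group(2)


def image_to_audio(payload: str) -> str:
    # Placeholder: a real agent might caption the image, then synthesize speech.
    return f"[audio description of {payload}]"


def audio_to_text(payload: str) -> str:
    # Placeholder: a real agent might run a speech recognition model.
    return f"[transcript of {payload}]"


# Hypothetical registry mapping each supported task to its converter agent.
AGENTS: Dict[Task, Callable[[str], str]] = {
    ("image", "audio"): image_to_audio,
    ("audio", "text"): audio_to_text,
}


def handle_request(request: str, payload: str) -> str:
    """Identify the conversion task and dispatch the payload to its agent."""
    task = identify_task(request)
    if task is None or task not in AGENTS:
        return "Sorry, I could not identify a supported conversion."
    return AGENTS[task](payload)


if __name__ == "__main__":
    print(handle_request("Please convert this image to an audio description", "photo.png"))
    print(handle_request("Turn the audio into text", "memo.wav"))
```

Keeping the task identifier separate from the converter registry mirrors the flexibility the abstract describes: either side could be swapped for an LLM API call or a custom ML classifier without changing the dispatch logic.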
