MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications

Abstract

Accessibility remains a critical concern, as many technologies are not designed to support the full range of user needs. Existing multi-agent systems (MAS) often cannot provide comprehensive assistance to users in need because their closed-source designs limit customization. Consequently, people with disabilities frequently encounter significant barriers when interacting with digital environments. We introduce MATE, a multimodal accessibility MAS that performs modality conversions based on the user's needs. The system assists people with disabilities by converting data into a format they can understand. For instance, if a user who cannot see well receives an image, the system converts the image into an audio description. MATE can be applied across a wide range of domains, industries, and areas, such as healthcare, and can serve as a useful assistant for various groups of users. The system supports multiple types of models, from LLM API calls to custom machine learning (ML) classifiers. This flexibility allows the system to be adapted to various needs and to run on a wide variety of hardware. Because the system is expected to run locally, it helps keep sensitive information private and secure. In addition, the framework can be integrated with institutional technologies (e.g., digital healthcare services) for real-time user assistance. Furthermore, we introduce ModCon-Task-Identifier, a model that extracts the precise modality conversion task from the user input. Extensive experiments show that ModCon-Task-Identifier consistently outperforms other LLMs and statistical models on our custom data. Our code and data are publicly available at https://github.com/AlgazinovAleksandr/Multi-Agent-MATE.
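To make the idea of identifying a modality conversion task and routing it to an agent more concrete, the following is a minimal sketch under stated assumptions. It is not the authors' ModCon-Task-Identifier, which is a trained model; it swaps in a simple keyword rule purely for illustration. The task labels (such as `image_to_audio`), the keyword table, and the `identify_task` and `dispatch` helpers are all hypothetical names that do not come from the paper or its repository.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical keyword table: modality name -> words that hint at it.
# These keywords are illustrative assumptions, not taken from the paper.
KEYWORDS: Dict[str, set] = {
    "image": {"image", "picture", "photo"},
    "audio": {"audio", "speech"},
    "text": {"text", "transcript"},
}


def identify_task(request: str) -> Optional[str]:
    """Guess a '<source>_to_<target>' task label from a user request.

    The first modality mentioned is treated as the source and the last
    one as the target. Returns None if fewer than two modalities appear.
    """
    tokens = re.findall(r"[a-z]+", request.lower())
    first_seen: Dict[str, int] = {}
    for position, token in enumerate(tokens):
        for modality, words in KEYWORDS.items():
            if token in words and modality not in first_seen:
                first_seen[modality] = position
    if len(first_seen) < 2:
        return None
    ordered = sorted(first_seen, key=first_seen.get)
    return f"{ordered[0]}_to_{ordered[-1]}"


def dispatch(
    request: str,
    payload: bytes,
    agents: Dict[str, Callable[[bytes], bytes]],
) -> bytes:
    """Route the payload to the agent registered for the identified task."""
    task = identify_task(request)
    if task is None or task not in agents:
        raise ValueError(f"No agent available for request: {request!r}")
    return agents[task](payload)


if __name__ == "__main__":
    # Stub agent standing in for a real image-description + speech pipeline.
    stub_agents = {"image_to_audio": lambda data: b"audio-description:" + data}
    print(identify_task("Please convert this picture to audio"))
    print(dispatch("picture to audio", b"raw-image", stub_agents))
```

In a real deployment the keyword rule would be replaced by a learned classifier or an LLM call, and each registry entry would wrap an actual conversion agent; the dispatch structure is the only part this sketch is meant to illustrate.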
