MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications

Abstract

Accessibility remains a critical concern in today's society, as many technologies are not developed to support the full range of user needs. Existing multi-agent systems (MAS) often cannot provide comprehensive assistance to users who need it, because their closed-source designs leave little room for customization. Consequently, individuals with disabilities frequently encounter significant barriers when they try to interact with digital environments. We introduce MATE, a multimodal accessibility MAS that performs modality conversions based on the user's needs. The system assists people with disabilities by converting data into a format they can understand. For instance, if a user with impaired vision receives an image, the system converts the image into an audio description. MATE can be applied across a wide range of domains and industries, such as healthcare, and can serve as a useful assistant for various groups of users. The system supports multiple types of models, ranging from LLM API calls to custom machine learning (ML) classifiers. This flexibility allows the system to be adapted to various needs and makes it compatible with a wide variety of hardware. Because the system is expected to run locally, it protects the privacy and security of sensitive information. In addition, the framework can be integrated with institutional technologies (e.g., digital healthcare services) for real-time user assistance. Furthermore, we introduce ModCon-Task-Identifier, a model capable of extracting the precise modality conversion task from the user input. Numerous experiments show that ModCon-Task-Identifier consistently outperforms other LLMs and statistical models on our custom data. Our code and data are publicly available at https://github.com/AlgazinovAleksandr/Multi-Agent-MATE.
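As an illustration only, the idea of extracting a modality conversion task from a user request and handing it to a matching agent might be sketched as follows. This is a keyword-based toy and not ModCon-Task-Identifier itself. The modality labels, the `identify_task` and `route` functions, and the `EXAMPLE_AGENTS` registry are all assumptions made for this example and are not taken from the MATE codebase.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical modality vocabulary; the labels used by MATE may differ.
MODALITY_KEYWORDS = {
    "text": ("text", "transcript", "caption", "document"),
    "image": ("image", "picture", "photo"),
    "audio": ("audio", "speech", "sound", "voice"),
    "video": ("video", "clip"),
}
KEYWORD_TO_MODALITY = {
    keyword: modality
    for modality, keywords in MODALITY_KEYWORDS.items()
    for keyword in keywords
}


@dataclass(frozen=True)
class ConversionTask:
    source: str
    target: str

    @property
    def name(self) -> str:
        return f"{self.source}_to_{self.target}"


def identify_task(request: str) -> Optional[ConversionTask]:
    """Find the first '<modality> ... to/into ... <other modality>' pattern."""
    source = None
    marker_seen = False
    for token in re.findall(r"[a-z]+", request.lower()):
        modality = KEYWORD_TO_MODALITY.get(token)
        if modality is None:
            # A "to"/"into" only counts once a source modality is known.
            if token in ("to", "into") and source is not None:
                marker_seen = True
            continue
        if marker_seen and modality != source:
            return ConversionTask(source, modality)
        if not marker_seen:
            source = modality
    return None


def route(request: str, agents: Dict[str, Callable[[str], str]]) -> str:
    """Dispatch the request to the agent registered for the identified task."""
    task = identify_task(request)
    if task is None or task.name not in agents:
        return "unsupported request"
    return agents[task.name](request)


# Placeholder agents; real agents would call an LLM, a TTS model, and so on.
EXAMPLE_AGENTS = {
    "image_to_audio": lambda request: "spoken description of the image",
    "video_to_text": lambda request: "transcript of the video",
}

print(route("Please turn this image into an audio description", EXAMPLE_AGENTS))
```

A learned classifier or an LLM call could replace `identify_task` without changing `route`, which is the kind of model flexibility the abstract describes.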
