MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications

Abstract

Accessibility remains a critical concern, as many technologies are not developed to support the full range of user needs. Existing multi-agent systems (MAS) often cannot provide comprehensive assistance to users in need because their closed-source designs limit customization. Consequently, individuals with disabilities frequently encounter significant barriers when interacting with digital environments. We introduce MATE, a multimodal accessibility MAS that performs modality conversions based on the user's needs. The system assists people with disabilities by converting data into a format they can understand. For instance, if a user with low vision receives an image, the system converts the image into an audio description. MATE can be applied across a wide range of domains, such as healthcare, and can serve as a useful assistant for various user groups. The system supports multiple types of models, ranging from LLM API calls to custom machine learning (ML) classifiers. This flexibility allows the system to be adapted to various needs and makes it compatible with a wide variety of hardware. Because the system is expected to run locally, it protects the privacy and security of sensitive information. In addition, the framework can be integrated with institutional technologies (e.g., digital healthcare services) for real-time user assistance. Furthermore, we introduce ModCon-Task-Identifier, a model that extracts the precise modality conversion task from the user input. Numerous experiments show that ModCon-Task-Identifier consistently outperforms other LLMs and statistical models on our custom data. Our code and data are publicly available at https://github.com/AlgazinovAleksandr/Multi-Agent-MATE.
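The flow described in the abstract (identify the requested modality conversion, then hand the data to a matching agent) can be illustrated with a minimal sketch. Everything below is hypothetical and is not taken from the MATE repository: the task labels, the `AGENTS` registry, the placeholder converters, and the keyword-based `identify_task` function are assumptions made for illustration. In particular, `identify_task` is a toy stand-in and does not represent how ModCon-Task-Identifier works.

```python
from typing import Callable, Dict, Tuple

# Hypothetical task label: (source modality, target modality).
Task = Tuple[str, str]


def image_to_audio(payload: str) -> str:
    # Placeholder: a real agent would describe the image, then synthesize speech.
    return f"[audio description of {payload}]"


def audio_to_text(payload: str) -> str:
    # Placeholder: a real agent would run speech recognition.
    return f"[transcript of {payload}]"


# Hypothetical registry mapping each conversion task to the agent that handles it.
AGENTS: Dict[Task, Callable[[str], str]] = {
    ("image", "audio"): image_to_audio,
    ("audio", "text"): audio_to_text,
}


def identify_task(request: str) -> Task:
    """Toy keyword matcher standing in for a learned task identifier."""
    text = request.lower()
    if "image" in text or "picture" in text:
        return ("image", "audio")
    if "audio" in text or "recording" in text:
        return ("audio", "text")
    raise ValueError(f"no supported conversion found in: {request!r}")


def convert(request: str, payload: str) -> str:
    """Identify the requested conversion and dispatch to the matching agent."""
    task = identify_task(request)
    return AGENTS[task](payload)
```

Under these assumptions, `convert("I cannot see this picture well", "photo.png")` routes the request to the image-to-audio placeholder. Keeping identification separate from conversion means the keyword matcher could be swapped for a trained classifier or an LLM call without touching the agents.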
