Vision-Based Hand Gesture Customization from a Single Demonstration

Abstract

Hand gesture recognition is becoming an increasingly prevalent mode of human-computer interaction, especially as cameras proliferate across everyday devices. Despite continued progress in this field, gesture customization remains underexplored. Customization is crucial because it enables users to define and demonstrate gestures that are more natural, memorable, and accessible. However, customization requires efficient use of user-provided data. We introduce a method that enables users to easily design bespoke gestures with a monocular camera from a single demonstration. We employ transformers and meta-learning techniques to address the challenges of few-shot learning. Unlike prior work, our method supports any combination of one-handed, two-handed, static, and dynamic gestures, including different viewpoints. We evaluated our customization method through a user study with 20 gestures collected from 21 participants, achieving up to 97% average recognition accuracy from one demonstration. Our work provides a viable path for vision-based gesture customization, laying the foundation for future advancements in this domain.
