RobotValues: Evaluating Household Robots When Human Values Conflict

Abstract

Household robots are often evaluated on task completion, yet everyday domestic environments give rise to value-conflict situations in which a robot is expected to choose actions that prioritize values other than task success, such as human autonomy, efficiency, or social appropriateness. No existing benchmark evaluates robots' value preferences in such scenarios. We introduce RobotValues, a benchmark for evaluating household robot planners on 10K value-conflict scenarios. Each instance consists of a realistic household image paired with multiple plausible robot actions that prioritize different human values. We construct RobotValues through LLM-assisted scenario generation, stakeholder-grounded value extraction, image generation, and automatic quality control. Using RobotValues, we evaluate vision-language models (VLMs) used in robotics and find that they exhibit default value preferences, including safety and accommodation, while underselecting privacy-prioritizing actions. When instructed to prioritize a specific value that conflicts with their own preferences, the models often fail to override their default behavior, choosing incorrect actions 80% of the time. These findings suggest that household robot evaluation should measure not only task completion and safety compliance, but also whether robots can choose among plausible actions when human values conflict.
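As an illustration of the two measurements described above (default value preferences and failure to override them under instruction), the following is a minimal sketch, not the benchmark's actual schema or evaluation code. The `Instance` class, its fields, and both function names are hypothetical, and the sketch assumes each candidate action is labeled with exactly one prioritized value.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical benchmark item: one image plus its candidate actions."""
    image_path: str
    # Maps each candidate action to the single human value it prioritizes
    # (an assumption made for this sketch).
    action_values: dict[str, str]


def value_selection_rates(instances, choices):
    """Share of instances in which the chosen action prioritizes each value.

    Assumes `instances` is non-empty and `choices[i]` is an action of
    `instances[i]`.
    """
    counts = Counter(inst.action_values[c] for inst, c in zip(instances, choices))
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def override_error_rate(instances, target_values, choices):
    """Share of instructed trials where the chosen action does not
    prioritize the value the model was told to prioritize."""
    errors = sum(
        inst.action_values[c] != target
        for inst, target, c in zip(instances, target_values, choices)
    )
    return errors / len(instances)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [
        Instance("a.png", {"wait outside": "privacy", "enter room": "safety"}),
        Instance("b.png", {"ask first": "autonomy", "clean up": "safety"}),
    ]
    print(value_selection_rates(toy, ["enter room", "clean up"]))
    print(override_error_rate(toy, ["privacy", "autonomy"], ["enter room", "ask first"]))
```

Under this framing, a skewed output from `value_selection_rates` would correspond to a default value preference, and `override_error_rate` would correspond to the reported rate of incorrect choices under value-specific instructions.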