NeuralOS: Towards Simulating Operating Systems via Neural Generative Models

Abstract

We introduce NeuralOS, a neural framework that simulates the graphical user interface (GUI) of an operating system by directly predicting screen frames in response to user inputs such as mouse movements, clicks, and keyboard events. NeuralOS combines a recurrent neural network (RNN), which tracks the computer's state, with a diffusion-based neural renderer that generates screen images. The model is trained on a large-scale dataset of Ubuntu XFCE recordings that includes both randomly generated interactions and realistic interactions produced by AI agents. Experiments show that NeuralOS successfully renders realistic GUI sequences, accurately captures mouse interactions, and reliably predicts state transitions such as application launches. Although precisely modeling fine-grained keyboard interactions remains challenging, NeuralOS is a step toward fully adaptive, generative neural interfaces for future human-computer interaction systems.
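The abstract describes a per-frame loop. A recurrent network carries the computer's state from frame to frame, and a renderer generates each screen image conditioned on that state. The Python sketch below illustrates only this control flow, under explicit assumptions. The input encoding, the plain tanh RNN cell with random (untrained) weights, the 8×8 frame size, and the names `InputEvent`, `RecurrentStateTracker`, `render_stub`, and `simulate` are all hypothetical and do not come from NeuralOS. The renderer is also a deterministic stub, not a diffusion model.

```python
import math
import random
from dataclasses import dataclass
from typing import List

Frame = List[List[float]]


@dataclass
class InputEvent:
    """One user input per frame (hypothetical encoding)."""
    x: float       # cursor x, normalized to [0, 1]
    y: float       # cursor y, normalized to [0, 1]
    click: bool    # mouse button pressed this frame
    key: int = 0   # hypothetical key code, 0 means no key


def encode(event: InputEvent) -> List[float]:
    """Turn an input event into a small feature vector."""
    return [event.x, event.y, 1.0 if event.click else 0.0, event.key / 255.0]


class RecurrentStateTracker:
    """Plain tanh RNN cell: h_t = tanh(W_h h_{t-1} + W_x x_t).

    Stands in for the learned RNN that tracks computer state; the
    weights here are random, not trained.
    """

    def __init__(self, hidden: int, inputs: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_h = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(hidden)]
        self.w_x = [[rng.gauss(0.0, 0.5) for _ in range(inputs)] for _ in range(hidden)]
        self.h = [0.0] * hidden

    def step(self, x: List[float]) -> List[float]:
        # New state depends on the previous state and the current input.
        self.h = [
            math.tanh(
                sum(w * v for w, v in zip(row_h, self.h))
                + sum(w * v for w, v in zip(row_x, x))
            )
            for row_h, row_x in zip(self.w_h, self.w_x)
        ]
        return self.h


def render_stub(state: List[float], event: InputEvent, size: int = 8) -> Frame:
    """Placeholder renderer conditioned on the RNN state and cursor position.

    NOT a diffusion model: it fills the frame with a state-dependent
    background and marks the cursor pixel. A diffusion renderer would
    instead start from noise and iteratively denoise with a learned network.
    """
    background = 0.5 + 0.5 * sum(state) / len(state)  # in [0, 1] because tanh is in [-1, 1]
    frame = [[background] * size for _ in range(size)]
    col = min(int(event.x * size), size - 1)
    row = min(int(event.y * size), size - 1)
    frame[row][col] = 1.0
    return frame


def simulate(events: List[InputEvent], hidden: int = 16) -> List[Frame]:
    """Per-frame loop: encode input, update state, render the next frame."""
    tracker = RecurrentStateTracker(hidden, inputs=4)
    frames = []
    for event in events:
        state = tracker.step(encode(event))
        frames.append(render_stub(state, event))
    return frames
```

For example, `simulate([InputEvent(0.5, 0.5, True)])` returns one 8×8 frame with the cursor pixel at row 4, column 4. Keeping the state update separate from the renderer mirrors the split the abstract describes. The recurrent state is what would allow events such as an application launch to persist across frames, while the renderer only has to turn the current state into pixels.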