Neural-Driven Image Editing
July 7, 2025
Authors: Pengfei Zhou, Jie Xia, Xiaopeng Peng, Wangbo Zhao, Zilong Ye, Zekai Li, Suorong Yang, Jiadong Pan, Yuanxiang Chen, Ziqiao Wang, Kai Wang, Qian Zheng, Xiaojun Chang, Gang Pan, Shurong Dong, Kaipeng Zhang, Yang You
cs.AI
Abstract
Traditional image editing typically relies on manual prompting, making it
labor-intensive and inaccessible to individuals with limited motor control or
language abilities. Leveraging recent advances in brain-computer interfaces
(BCIs) and generative models, we propose LoongX, a hands-free image editing
approach driven by multimodal neurophysiological signals. LoongX utilizes
state-of-the-art diffusion models trained on a comprehensive dataset of 23,928
image editing pairs, each paired with synchronized electroencephalography
(EEG), functional near-infrared spectroscopy (fNIRS), photoplethysmography
(PPG), and head motion signals that capture user intent. To effectively address
the heterogeneity of these signals, LoongX integrates two key modules. The
cross-scale state space (CS3) module encodes informative modality-specific
features. The dynamic gated fusion (DGF) module further aggregates these
features into a unified latent space, which is then aligned with edit semantics
via fine-tuning on a diffusion transformer (DiT). Additionally, we pre-train
the encoders with contrastive learning to align cognitive states with the
semantic intent embedded in natural language. Extensive experiments demonstrate
that LoongX achieves performance comparable to text-driven methods (CLIP-I:
0.6605 vs. 0.6558; DINO: 0.4812 vs. 0.4636) and outperforms them when neural
signals are combined with speech (CLIP-T: 0.2588 vs. 0.2549). These results
highlight the promise of neural-driven generative models in enabling
accessible, intuitive image editing and open new directions for
cognitive-driven creative technologies. Datasets and code will be released to
support future work and foster progress in this emerging area.
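The abstract describes the dynamic gated fusion (DGF) module only at a high level. As a rough illustration of gated multimodal fusion in general (not the paper's released implementation), the PyTorch sketch below combines per-modality embeddings with learned softmax gates into a single latent; the class name GatedFusion, the feature dimensions, and the gating design are assumptions made for this example.

```python
# Minimal sketch of gated multimodal fusion, assuming each modality encoder
# (EEG, fNIRS, PPG, head motion) already yields a fixed-size embedding.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse per-modality feature vectors into one latent via learned gates."""
    def __init__(self, dim: int, num_modalities: int):
        super().__init__()
        # One gating network scores every modality from the concatenated features.
        self.gate = nn.Sequential(
            nn.Linear(dim * num_modalities, num_modalities),
            nn.Softmax(dim=-1),
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, feats: list) -> torch.Tensor:
        # feats: list of (batch, dim) tensors, one per modality.
        stacked = torch.stack(feats, dim=1)               # (batch, M, dim)
        weights = self.gate(torch.cat(feats, dim=-1))     # (batch, M)
        fused = (weights.unsqueeze(-1) * stacked).sum(1)  # (batch, dim)
        return self.proj(fused)                           # unified latent for conditioning

if __name__ == "__main__":
    batch, dim = 4, 256
    modalities = [torch.randn(batch, dim) for _ in range(4)]  # EEG, fNIRS, PPG, motion
    fusion = GatedFusion(dim=dim, num_modalities=4)
    print(fusion(modalities).shape)  # torch.Size([4, 256])
```

The softmax gate lets the model weight modalities per sample, which is one plausible way to handle the signal heterogeneity the abstract mentions; the actual DGF design may differ.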
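Likewise, the contrastive pre-training step can be pictured with a standard CLIP-style symmetric InfoNCE objective between neural-signal embeddings and language embeddings of the paired edit instruction. The function name contrastive_alignment_loss and the temperature value are hypothetical; the paper's actual objective is not specified in the abstract.

```python
# Hypothetical CLIP-style contrastive objective for aligning neural-signal
# embeddings with language embeddings of the edit instruction (illustrative only).
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(neural_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched (neural, text) pairs score high, others low."""
    neural_emb = F.normalize(neural_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = neural_emb @ text_emb.t() / temperature       # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_n2t = F.cross_entropy(logits, targets)            # neural -> text direction
    loss_t2n = F.cross_entropy(logits.t(), targets)        # text -> neural direction
    return 0.5 * (loss_n2t + loss_t2n)

if __name__ == "__main__":
    neural = torch.randn(8, 256)   # stand-in for fused neural-signal embeddings
    text = torch.randn(8, 256)     # stand-in for instruction embeddings
    print(contrastive_alignment_loss(neural, text).item())
```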