Neural-Driven Image Editing

July 7, 2025
Authors: Pengfei Zhou, Jie Xia, Xiaopeng Peng, Wangbo Zhao, Zilong Ye, Zekai Li, Suorong Yang, Jiadong Pan, Yuanxiang Chen, Ziqiao Wang, Kai Wang, Qian Zheng, Xiaojun Chang, Gang Pan, Shurong Dong, Kaipeng Zhang, Yang You
cs.AI

Abstract

Traditional image editing typically relies on manual prompting, making it labor-intensive and inaccessible to individuals with limited motor control or language abilities. Leveraging recent advances in brain-computer interfaces (BCIs) and generative models, we propose LoongX, a hands-free image editing approach driven by multimodal neurophysiological signals. LoongX utilizes state-of-the-art diffusion models trained on a comprehensive dataset of 23,928 image editing pairs, each paired with synchronized electroencephalography (EEG), functional near-infrared spectroscopy (fNIRS), photoplethysmography (PPG), and head motion signals that capture user intent. To effectively address the heterogeneity of these signals, LoongX integrates two key modules. The cross-scale state space (CS3) module encodes informative modality-specific features. The dynamic gated fusion (DGF) module further aggregates these features into a unified latent space, which is then aligned with edit semantics via fine-tuning on a diffusion transformer (DiT). Additionally, we pre-train the encoders using contrastive learning to align cognitive states with semantic intentions from embedded natural language. Extensive experiments demonstrate that LoongX achieves performance comparable to text-driven methods (CLIP-I: 0.6605 vs. 0.6558; DINO: 0.4812 vs. 0.4636) and outperforms them when neural signals are combined with speech (CLIP-T: 0.2588 vs. 0.2549). These results highlight the promise of neural-driven generative models in enabling accessible, intuitive image editing and open new directions for cognitive-driven creative technologies. Datasets and code will be released to support future work and foster progress in this emerging area.
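The dynamic gated fusion (DGF) described above aggregates heterogeneous modality features (EEG, fNIRS, PPG, head motion) into a single latent conditioning vector. The PyTorch sketch below illustrates one way such input-dependent gating could be wired; the small MLP encoders standing in for the CS3 module, the feature dimensions, and the softmax gate are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Fuse per-modality embeddings (e.g. EEG, fNIRS, PPG, head motion)
    into one latent vector with learned, input-dependent gates."""

    def __init__(self, dim: int = 256, num_modalities: int = 4):
        super().__init__()
        # One small encoder per modality; stand-ins for the paper's CS3 encoders.
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.LazyLinear(dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_modalities)
        )
        # The gate scores each modality from the concatenated features.
        self.gate = nn.Linear(dim * num_modalities, num_modalities)
        self.out = nn.Linear(dim, dim)

    def forward(self, signals):
        # signals: list of (batch, feature_dim_i) tensors, one per modality.
        feats = [enc(x) for enc, x in zip(self.encoders, signals)]       # each (B, dim)
        stacked = torch.stack(feats, dim=1)                              # (B, M, dim)
        weights = torch.softmax(self.gate(torch.cat(feats, dim=-1)), -1) # (B, M)
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)             # (B, dim)
        return self.out(fused)


if __name__ == "__main__":
    batch = 8
    # Toy pre-pooled feature vectors: EEG (64-d), fNIRS (32-d), PPG (8-d), motion (6-d).
    signals = [torch.randn(batch, d) for d in (64, 32, 8, 6)]
    latent = GatedFusion(dim=256, num_modalities=4)(signals)
    print(latent.shape)  # torch.Size([8, 256])
```

In this toy setup each modality is pre-pooled into a fixed-length vector, and the gate can down-weight modalities that carry little intent information for a given sample before the fused latent is used to condition the diffusion transformer.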
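The abstract also mentions contrastive pre-training of the encoders to align cognitive states with the semantics of natural-language edit instructions. A minimal CLIP-style symmetric InfoNCE objective for such alignment is sketched below, assuming paired batches of fused neural embeddings and text embeddings of the corresponding instructions; the symmetric form, temperature value, and embedding dimensions are assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(neural_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: the i-th neural embedding should match the i-th
    instruction embedding and repel the other pairs in the batch."""
    neural_emb = F.normalize(neural_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = neural_emb @ text_emb.t() / temperature           # (B, B) cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_n2t = F.cross_entropy(logits, targets)                # neural -> text direction
    loss_t2n = F.cross_entropy(logits.t(), targets)            # text -> neural direction
    return 0.5 * (loss_n2t + loss_t2n)


# Toy usage: a batch of 8 paired 256-d embeddings.
loss = contrastive_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```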