How to Take a Memorable Picture? Empowering Users with Actionable Feedback

Abstract

Image memorability, i.e., how likely an image is to be remembered, has traditionally been studied in computer vision in one of two ways: as a passive prediction task, with models regressing a scalar score, or with generative methods that alter the visual input to boost the image's likelihood of being remembered. Yet neither paradigm supports users at capture time, when the crucial question is how to improve a photo's memorability. We introduce the task of Memorability Feedback (MemFeed), in which an automated model provides actionable, human-interpretable guidance to users with the goal of enhancing an image's future recall. We also present MemCoach, the first approach designed to provide concrete natural-language suggestions for improving memorability (e.g., "emphasize facial expression," "bring the subject forward"). Our method, based on Multimodal Large Language Models (MLLMs), is training-free and employs a teacher-student steering strategy that aligns the model's internal activations toward more memorable patterns learned from a teacher model progressing along least-to-most memorable samples. To enable systematic evaluation of this novel task, we further introduce MemBench, a new benchmark featuring sequence-aligned photoshoots with annotated memorability scores. Our experiments across multiple MLLMs demonstrate the effectiveness of MemCoach, which consistently improves performance over several zero-shot models. The results indicate that memorability can not only be predicted but also taught and instructed, shifting the focus from mere prediction to actionable feedback for human creators.
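
The abstract does not spell out the steering rule, so the following is only a minimal sketch of one common form of activation steering (a difference-of-means direction added to a hidden state), not MemCoach's actual procedure. The function names, the scaling factor `alpha`, and the toy activation values are all assumptions made for illustration; in a real MLLM the vectors would be hidden states read from a chosen layer.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def steering_vector(less_memorable: Sequence[Sequence[float]],
                    more_memorable: Sequence[Sequence[float]]) -> Vector:
    """Assumed rule: direction from the mean 'less memorable' activation
    to the mean 'more memorable' activation."""
    low = mean_vector(less_memorable)
    high = mean_vector(more_memorable)
    return [h - l for h, l in zip(high, low)]


def steer(hidden: Sequence[float], direction: Sequence[float],
          alpha: float = 1.0) -> Vector:
    """Shift a hidden state along the steering direction; alpha is a
    hypothetical strength knob, and no model weights are changed."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional activations (made-up numbers, for illustration only).
low_acts = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]
high_acts = [[2.0, 3.0, 2.0], [4.0, 3.0, 0.0]]

direction = steering_vector(low_acts, high_acts)
steered = steer([1.0, 1.0, 1.0], direction, alpha=0.5)
print(direction, steered)
```

Because the shift is applied to activations at inference time, a scheme of this kind is consistent with the abstract's description of the method as training-free; how the teacher's least-to-most memorable progression actually defines the direction is left to the paper.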
