How to Take a Memorable Picture? Empowering Users with Actionable Feedback

Abstract

Image memorability, i.e., how likely an image is to be remembered, has traditionally been studied in computer vision either as a passive prediction task, with models regressing a scalar score, or with generative methods that alter the visual input to boost the image's likelihood of being remembered. Yet neither paradigm supports users at capture time, when the crucial question is how to improve a photo's memorability. We introduce the task of Memorability Feedback (MemFeed), in which an automated model provides actionable, human-interpretable guidance to users with the goal of enhancing an image's future recall. We also present MemCoach, the first approach designed to provide concrete natural-language suggestions for improving memorability (e.g., "emphasize facial expression," "bring the subject forward"). Our method, based on Multimodal Large Language Models (MLLMs), is training-free and employs a teacher-student steering strategy, aligning the model's internal activations toward more memorable patterns learned from a teacher model progressing along least-to-most memorable samples. To enable systematic evaluation on this novel task, we further introduce MemBench, a new benchmark featuring sequence-aligned photoshoots with annotated memorability scores. Our experiments across multiple MLLMs demonstrate the effectiveness of MemCoach, which consistently improves performance over several zero-shot models. The results indicate that memorability can not only be predicted but also taught and instructed, shifting the focus from mere prediction to actionable feedback for human creators.
