How to Take a Memorable Picture? Empowering Users with Actionable Feedback

Abstract

Image memorability, i.e., how likely an image is to be remembered, has traditionally been studied in computer vision either as a passive prediction task, with models regressing a scalar score, or with generative methods that alter the visual input to boost the image's likelihood of being remembered. Yet neither paradigm supports users at capture time, when the crucial question is how to improve a photo's memorability. We introduce the task of Memorability Feedback (MemFeed), in which an automated model provides actionable, human-interpretable guidance to users with the goal of enhancing an image's future recall. We also present MemCoach, the first approach designed to provide concrete natural-language suggestions for improving memorability (e.g., "emphasize facial expression," "bring the subject forward"). Our method, based on Multimodal Large Language Models (MLLMs), is training-free and employs a teacher-student steering strategy: it aligns the model's internal activations toward more memorable patterns learned from a teacher model that progresses from the least to the most memorable samples. To enable systematic evaluation of this novel task, we further introduce MemBench, a new benchmark featuring sequence-aligned photoshoots with annotated memorability scores. Our experiments with multiple MLLMs demonstrate the effectiveness of MemCoach, which consistently outperforms several zero-shot models. The results indicate that memorability can not only be predicted but also taught and instructed, shifting the focus from mere prediction to actionable feedback for human creators.
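The abstract does not specify how the steering direction is built or applied. One common, generic way to steer a model's activations is a difference-of-means vector. The sketch below assumes that technique and is not MemCoach's actual procedure. Here the "teacher" signal is approximated by hidden states collected from a single layer on samples ranked by memorability score, and the "student" is a hidden state that gets shifted along the resulting direction at inference time. The function names `steering_vector` and `steer`, the `frac` split and the scale `alpha` are all hypothetical.

```python
import numpy as np


def steering_vector(activations, scores, frac=0.5):
    """Hypothetical difference-of-means direction from less to more memorable samples.

    activations: (n_samples, hidden_dim) hidden states taken from one layer
    scores: (n_samples,) memorability scores for the same samples
    frac: fraction of samples in each low/high group (assumed knob; keep <= 0.5
          so the two groups do not overlap)
    """
    acts = np.asarray(activations, dtype=float)
    order = np.argsort(scores)              # indices sorted least -> most memorable
    k = max(1, int(len(order) * frac))
    low = acts[order[:k]].mean(axis=0)      # centroid of the least memorable samples
    high = acts[order[-k:]].mean(axis=0)    # centroid of the most memorable samples
    v = high - low
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v      # unit-length steering direction


def steer(hidden, v, alpha=1.0):
    """Shift a (student) hidden state by alpha along the steering direction."""
    return np.asarray(hidden, dtype=float) + alpha * v


# Toy example: 4 samples with 2-dimensional "activations" (made-up values)
acts = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]
scores = [0.1, 0.3, 0.7, 0.9]
v = steering_vector(acts, scores)           # unit vector along (2, 1)
shifted = steer([0.0, 0.0], v, alpha=np.sqrt(5.0))
```

In a real MLLM, a shift like this would typically be applied inside the forward pass at a chosen layer (in PyTorch, for example, through a hook registered with `register_forward_hook`). The layer and `alpha` would need tuning, since overly large shifts may degrade the generated text.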