How to Take a Memorable Picture? Empowering Users with Actionable Feedback

Abstract

Image memorability, i.e., how likely an image is to be remembered, has traditionally been studied in computer vision either as a passive prediction task, with models regressing a scalar score, or with generative methods that alter the visual input to boost an image's likelihood of being remembered. Yet neither paradigm supports users at capture time, when the crucial question is how to improve a photo's memorability. We introduce the task of Memorability Feedback (MemFeed), in which an automated model provides actionable, human-interpretable guidance to users with the goal of enhancing an image's future recall. We also present MemCoach, the first approach designed to provide concrete natural-language suggestions for memorability improvement (e.g., "emphasize facial expression," "bring the subject forward"). Our method, based on Multimodal Large Language Models (MLLMs), is training-free and employs a teacher-student steering strategy: it aligns the model's internal activations toward more memorable patterns learned from a teacher model that progresses from least to most memorable samples. To enable systematic evaluation of this novel task, we further introduce MemBench, a new benchmark featuring sequence-aligned photoshoots with annotated memorability scores. Our experiments with multiple MLLMs demonstrate the effectiveness of MemCoach, which consistently outperforms several zero-shot models. The results indicate that memorability can not only be predicted but also taught and instructed, shifting the focus from mere prediction to actionable feedback for human creators.
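The general idea of steering internal activations toward "more memorable" patterns can be illustrated with a generic difference-of-means activation-steering sketch. This is not MemCoach's actual procedure. The use of a single layer's activations, the top-k versus bottom-k contrast, the unit normalization, and the strength `alpha` are all assumptions, and `steering_vector` and `steer` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def steering_vector(acts: np.ndarray, scores: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical difference-of-means steering direction (not the paper's method).

    acts:   (n_samples, hidden_dim) activations taken from one model layer (assumed).
    scores: (n_samples,) memorability scores for the same samples.
    Returns a unit vector pointing from the mean of the k least memorable
    samples toward the mean of the k most memorable ones.
    """
    order = np.argsort(scores)               # ascending: least -> most memorable
    low = acts[order[:k]].mean(axis=0)       # centroid of the k least memorable
    high = acts[order[-k:]].mean(axis=0)     # centroid of the k most memorable
    v = high - low
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v       # avoid dividing by zero


def steer(hidden: np.ndarray, v: np.ndarray, alpha: float) -> np.ndarray:
    """Shift hidden states (..., hidden_dim) along v with assumed strength alpha."""
    return hidden + alpha * v


# Toy data: activations grow with memorability along the first axis only.
scores = np.array([0.2, 0.9, 0.5, 0.7])
acts = np.outer(scores, np.array([1.0, 0.0, 0.0]))
v = steering_vector(acts, scores, k=2)
shifted = steer(np.zeros(3), v, alpha=2.0)
```

In this toy example the recovered direction `v` is the first axis, and `shifted` moves a zero state two units along it. In a real MLLM, such a vector would typically be added to a chosen layer's hidden states during generation, for example through a forward hook.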
