RONA：基于连贯关系的实用多样化图像描述生成

摘要

传统写作助手（如Grammarly、Microsoft Copilot）通常通过运用句法和语义的多样性来描述图像元素，从而生成多样化的图像标题。然而，人类撰写的标题更注重在视觉描述之外，借助语用线索传达核心信息。为了提升语用多样性，探索与视觉内容相结合的其他信息传达方式至关重要。针对这一挑战，我们提出了RONA，一种新颖的多模态大语言模型（MLLM）提示策略，它利用连贯关系作为变化轴。我们证明，与跨多个领域的MLLM基线相比，RONA生成的标题在整体多样性和与真实情况的契合度上表现更优。我们的代码已公开于：https://github.com/aashish2000/RONA。

English

Writing Assistants (e.g., Grammarly, Microsoft Copilot) traditionally generate diverse image captions by employing syntactic and semantic variations to describe image components. However, human-written captions prioritize conveying a central message alongside visual descriptions using pragmatic cues. To enhance pragmatic diversity, it is essential to explore alternative ways of communicating these messages in conjunction with visual content. To address this challenge, we propose RONA, a novel prompting strategy for Multi-modal Large Language Models (MLLM) that leverages Coherence Relations as an axis for variation. We demonstrate that RONA generates captions with better overall diversity and ground-truth alignment, compared to MLLM baselines across multiple domains. Our code is available at: https://github.com/aashish2000/RONA

RONA：基于连贯关系的实用多样化图像描述生成

RONA: Pragmatically Diverse Image Captioning with Coherence Relations

摘要

Support