RONA: Pragmatically Diverse Image Captioning with Coherence Relations

Abstract

Writing assistants (e.g., Grammarly, Microsoft Copilot) traditionally generate diverse image captions by varying the syntax and semantics used to describe image components. Human-written captions, however, prioritize conveying a central message alongside the visual description, and they rely on pragmatic cues to do so. Enhancing pragmatic diversity therefore requires exploring alternative ways of communicating these messages in conjunction with the visual content. To address this challenge, we propose RONA, a novel prompting strategy for Multi-modal Large Language Models (MLLMs) that leverages Coherence Relations as an axis for variation. We demonstrate that RONA generates captions with better overall diversity and ground-truth alignment than MLLM baselines across multiple domains. Our code is available at: https://github.com/aashish2000/RONA
