Prompt Orchestration Markup Language

Abstract

Large Language Models (LLMs) require sophisticated prompting, yet current practices face challenges in structure, data integration, format sensitivity, and tooling. Existing methods lack comprehensive solutions for organizing complex prompts that involve diverse data types (documents, tables, images) or for systematically managing presentation variations. To address these gaps, we introduce POML (Prompt Orchestration Markup Language). POML employs component-based markup for logical structure (roles, tasks, examples), specialized tags for seamless data integration, and a CSS-like styling system that decouples content from presentation, reducing formatting sensitivity. It includes templating for dynamic prompts and a comprehensive developer toolkit (IDE support, SDKs) to improve version control and collaboration. We validate POML through two case studies that demonstrate its impact on complex application integration (PomLink) and on accuracy (TableQA), as well as a user study that assesses its effectiveness in real-world development scenarios.
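
The idea of decoupling content from presentation can be illustrated with a minimal Python sketch. This is not POML's syntax or API: the names `Component`, `render`, and `render_table`, the shape of the `style` dictionary, and the `$name` placeholder convention are all hypothetical, chosen only to show how one logical prompt (role, task, table) might be rendered in different formats without touching its content.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class Component:
    """A logical prompt part such as a role, task, or table (hypothetical model)."""
    kind: str
    content: object


def render_table(rows, syntax):
    """Render the same row data either as CSV or as a Markdown table."""
    headers = list(rows[0].keys())
    if syntax == "csv":
        lines = [",".join(headers)]
        lines += [",".join(str(r[h]) for h in headers) for r in rows]
    else:  # default: markdown
        lines = ["| " + " | ".join(headers) + " |",
                 "| " + " | ".join("---" for _ in headers) + " |"]
        lines += ["| " + " | ".join(str(r[h]) for h in headers) + " |"
                  for r in rows]
    return "\n".join(lines)


def render(components, style, variables):
    """Turn components into prompt text.

    `style` maps a component kind to presentation options (a stand-in for a
    stylesheet); `variables` fills `$name` placeholders (a stand-in for templating).
    """
    parts = []
    for c in components:
        opts = style.get(c.kind, {})
        if c.kind == "table":
            body = render_table(c.content, opts.get("syntax", "markdown"))
        else:
            body = Template(str(c.content)).safe_substitute(variables)
        caption = opts.get("caption")
        parts.append(f"{caption}: {body}" if caption else body)
    return "\n\n".join(parts)


# One logical prompt; its content never changes between renderings.
prompt = [
    Component("role", "You are a data analyst."),
    Component("task", "Answer: $question"),
    Component("table", [{"city": "Oslo", "pop": 700000}]),
]

markdown_prompt = render(prompt, {"table": {"syntax": "markdown"}},
                         {"question": "Which city is listed?"})
csv_prompt = render(prompt, {"table": {"syntax": "csv"}, "role": {"caption": "Role"}},
                    {"question": "Which city is listed?"})
```

Under these assumptions, switching the table from Markdown to CSV or adding a caption to the role is a change to the `style` dictionary alone, which is the kind of systematic presentation variation the abstract describes.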