Libretto: Giving LLM Agents a Sense of Musical Structure

Abstract

Generative music systems can now produce impressive audio from text prompts, but audio output is difficult to inspect, edit, and diagnose as musical structure. We introduce Libretto, an agent-facing framework for symbolic music generation and revision. Libretto represents music in an LLM-native grammar with explicit onset slots, voices, and bar-level organization, and evaluates each piece in a corpus-calibrated statistical space spanning rhythm, harmony, melody, texture, form, and variation. The same structural axes support retrieval, diagnosis, copy-risk control, and iterative self-revision. Across gap filling, reference-guided full-piece generation, gradual morphing, and educational music generation, Libretto turns symbolic music from a raw token sequence into a measurable and editable object for language-model agents.