Libretto: Giving LLM Agents a Sense of Musical Structure

Abstract

Generative music systems can now produce impressive audio from text prompts, but their audio outputs are difficult to inspect, edit, and diagnose at the level of musical structure. We introduce Libretto, an agent-facing framework for symbolic music generation and revision. Libretto uses an LLM-native grammar with explicit onset slots, voices, and bar-level organization. It evaluates each piece in a corpus-calibrated statistical space spanning rhythm, harmony, melody, texture, form, and variation. The same structural axes support retrieval, diagnosis, copy-risk control, and iterative self-revision. Across gap filling, reference-guided full-piece generation, gradual morphing, and educational music generation, Libretto turns symbolic music from a raw token sequence into an object that language-model agents can measure and edit.
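The abstract does not specify the grammar or the feature set. The Python sketch below is only a rough illustration of how onset slots, voices, and corpus calibration might fit together, and it rests on stated assumptions. It uses a hypothetical bar representation in which each voice has a fixed number of onset slots, and it computes one toy rhythm feature (onset density). It then reads "corpus-calibrated" as z-scoring that feature against corpus statistics. `Bar`, `onset_density`, and `calibrate` are illustrative names, not Libretto's actual API or method.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable


@dataclass
class Bar:
    """Hypothetical bar: every voice shares the same grid of onset slots."""
    slots: int                               # onset slots per bar (e.g. 16 sixteenths in 4/4)
    voices: dict[str, list[int | None]]      # voice -> MIDI pitch starting at each slot, None = no onset


def onset_density(bars: list[Bar]) -> float:
    """Toy rhythm-axis feature: fraction of (voice, slot) cells that carry an onset."""
    filled = total = 0
    for bar in bars:
        for line in bar.voices.values():
            filled += sum(pitch is not None for pitch in line)
            total += bar.slots
    return filled / total if total else 0.0


def calibrate(corpus_values: list[float]) -> Callable[[float], float]:
    """Assumed calibration: return a z-score function fitted to corpus feature values."""
    mu = mean(corpus_values)
    sigma = pstdev(corpus_values) or 1.0     # guard against a zero-variance corpus
    return lambda x: (x - mu) / sigma


if __name__ == "__main__":
    bar = Bar(slots=4, voices={"melody": [60, None, 62, None], "bass": [48, None, None, None]})
    density = onset_density([bar])           # 3 onsets in 8 cells
    z = calibrate([0.25, 0.5])(density)      # corpus mean 0.375, std 0.125
    print(density, z)
```

A z-score near 0 places the piece at the corpus mean on that axis, while a large magnitude flags an outlier that an agent could target during revision. Repeating this per axis (harmony, melody, texture, and so on) would yield one coordinate per structural dimension.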