Macaron-A2UI: A Model for Generative UI in Personal Agents

Abstract

As personal agents evolve to handle complex, user-centric tasks, static plain-text chat is rapidly becoming a bottleneck. Generative UI is emerging as a necessary new interface layer: it synthesizes the appropriate controls, options, and state from the interaction context in real time. We present Macaron-A2UI, a model for Generative UI in personal agents. Our goal is to move beyond text-only interaction by enabling agents to generate natural language together with lightweight, executable UI actions for information collection, preference refinement, confirmation, and multi-goal organization. We build a large-scale Generative UI corpus from heterogeneous dialogue sources, introduce A2UI-Bench for controlled evaluation, and train 30B, 235B, and 754B models using parameter-efficient LoRA-based supervised fine-tuning followed by reward-driven reinforcement learning. The best Macaron-A2UI model reaches an overall score of 75.6 on A2UI-Bench without explicit schema hints, surpassing the strongest frontier baseline that is given the full schema. We release the models, benchmark, and evaluation protocol to support future work on Generative UI for personal agents.
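
To make the idea of "natural language together with lightweight, executable UI actions" more concrete, the following is a minimal illustrative sketch of what a single agent turn might look like as a data structure. Everything in it is an assumption made for illustration: the names `ActionKind`, `UIAction`, `AgentTurn`, and `example_turn`, the field layout, and the JSON shape are hypothetical and are not the schema used by Macaron-A2UI or A2UI-Bench. Only the four action purposes come from the abstract.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ActionKind(Enum):
    # The four purposes are named in the abstract; the identifiers are hypothetical.
    COLLECT_INFO = "collect_info"
    REFINE_PREFERENCE = "refine_preference"
    CONFIRM = "confirm"
    ORGANIZE_GOALS = "organize_goals"


@dataclass
class UIAction:
    """One lightweight UI element attached to an agent reply (hypothetical layout)."""
    kind: ActionKind
    label: str
    options: List[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {"kind": self.kind.value, "label": self.label, "options": self.options}


@dataclass
class AgentTurn:
    """A reply that pairs natural-language text with zero or more UI actions."""
    text: str
    actions: List[UIAction] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize to a plain JSON object that a client could render.
        return json.dumps(
            {"text": self.text, "actions": [a.to_dict() for a in self.actions]}
        )


def example_turn() -> AgentTurn:
    # A made-up dinner-booking exchange: refine a preference, then ask to confirm.
    return AgentTurn(
        text="I found a few dinner slots. Which time works best?",
        actions=[
            UIAction(ActionKind.REFINE_PREFERENCE, "Preferred time", ["18:00", "19:30", "21:00"]),
            UIAction(ActionKind.CONFIRM, "Book this table?", ["Confirm", "Cancel"]),
        ],
    )


if __name__ == "__main__":
    print(example_turn().to_json())
```

In this sketch the text and the actions travel in one object, so a client that cannot render controls can fall back to showing only `text`. Whether the actual model output is organized this way is not stated in the abstract.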