Story2Proposal: A Scaffold for Structured Scientific Paper Writing

Abstract

Generating scientific manuscripts requires keeping narrative reasoning, experimental evidence, and visual artifacts aligned across the entire document lifecycle. Existing language-model generation pipelines rely on unconstrained text synthesis and apply validation only after generation. As a result, they often produce structural drift, missing figures or tables, and cross-section inconsistencies. We introduce Story2Proposal, a contract-governed multi-agent framework that converts a research story into a structured manuscript through coordinated agents operating under a persistent, shared visual contract. The system organizes architect, writer, refiner, and renderer agents around a contract state that tracks section structure and registered visual elements. Evaluation agents supply feedback in a generate-evaluate-adapt loop that updates the contract during generation. In experiments on tasks derived from the Jericho research corpus, Story2Proposal achieved an expert evaluation score of 6.145 versus 3.963 for DirectChat (+2.182) across GPT, Claude, Gemini, and Qwen backbones. Compared with the structured-generation baseline Fars, Story2Proposal obtained an average score of 5.705 versus 5.197 (+0.508), indicating improved structural consistency and visual alignment.
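The abstract describes the contract state and the generate-evaluate-adapt loop only at a high level. The Python sketch below shows one way such a loop could be organized. It is not the authors' implementation. Every name in it is hypothetical: `Contract`, its `pending` field, `evaluate`, `generate_evaluate_adapt`, and `toy_writer`. The plain string-matching check also stands in for what would in practice be LLM-based evaluation agents. The contract registers each visual element against a planned section. After every generation round, the evaluator rewrites `contract.pending`, so the contract itself carries the feedback into the next round.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Contract:
    """Hypothetical contract state: planned sections and registered visuals."""
    sections: list[str]
    visuals: dict[str, str] = field(default_factory=dict)         # label -> owning section
    pending: dict[str, list[str]] = field(default_factory=dict)   # section -> unmet labels

    def register(self, label: str, section: str) -> None:
        # Only a section that is already in the plan may own a visual element.
        if section not in self.sections:
            raise ValueError(f"unknown section: {section}")
        self.visuals[label] = section


def evaluate(contract: Contract, draft: dict[str, str]) -> list[str]:
    """Stand-in evaluation agent: report gaps and update the contract's pending items."""
    problems = [f"missing section {s}" for s in contract.sections if s not in draft]
    contract.pending = {}  # adapt step: rebuild outstanding obligations from this draft
    for label, section in contract.visuals.items():
        if label not in draft.get(section, ""):
            contract.pending.setdefault(section, []).append(label)
            problems.append(f"{label} not referenced in {section}")
    return problems


Writer = Callable[[Contract, dict[str, str]], dict[str, str]]


def generate_evaluate_adapt(contract: Contract, write: Writer, max_rounds: int = 3):
    """Repeat generate -> evaluate until the draft satisfies the contract or rounds run out."""
    draft: dict[str, str] = {}
    problems: list[str] = []
    for _ in range(max_rounds):
        draft = write(contract, draft)         # generate (writer/refiner stand-in)
        problems = evaluate(contract, draft)   # evaluate, which also updates contract.pending
        if not problems:
            break
    return draft, problems


def toy_writer(contract: Contract, draft: dict[str, str]) -> dict[str, str]:
    """Hypothetical writer: drafts missing sections, then cites any pending visuals."""
    new = dict(draft)
    for s in contract.sections:
        new.setdefault(s, f"{s} text.")
    for section, labels in contract.pending.items():
        new[section] += "".join(f" See {label}." for label in labels)
    return new


contract = Contract(sections=["Introduction", "Method", "Results"])
contract.register("Figure 1", "Method")
contract.register("Table 1", "Results")
draft, problems = generate_evaluate_adapt(contract, toy_writer)
print(problems)          # → []
print(draft["Method"])   # → Method text. See Figure 1.
```

In this toy run, the first round drafts all three sections without citing the registered visuals. The evaluator then records both omissions in `contract.pending`, and the second round resolves them. This mirrors the abstract's contrast with post-hoc validation: checks run inside the loop rather than once at the end.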