LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

Abstract

Graphic layout generation, a growing research field, plays a significant role in user engagement and information perception. Existing methods primarily treat layout generation as a numerical optimization task, focusing on quantitative aspects while overlooking the semantic information of a layout, such as the relationships between layout elements. In this paper, we propose LayoutNUWA, the first model that treats layout generation as a code generation task to enhance semantic information and harness the hidden layout expertise of large language models (LLMs). More concretely, we develop a Code Instruct Tuning (CIT) approach comprising three interconnected modules: 1) the Code Initialization (CI) module quantifies the numerical conditions and initializes them as HTML code with strategically placed masks; 2) the Code Completion (CC) module employs the formatting knowledge of LLMs to fill in the masked portions within the HTML code; 3) the Code Rendering (CR) module transforms the completed code into the final layout output, ensuring a highly interpretable and transparent layout generation procedure that directly maps code to a visualized layout. We attain significant state-of-the-art performance (even over 50% improvements) on multiple datasets, showcasing the strong capabilities of LayoutNUWA. Our code is available at https://github.com/ProjectNUWA/LayoutNUWA.
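To make the three-module flow more concrete, the following is a minimal Python sketch of how initialization, completion, and rendering could fit together. It is an illustration under stated assumptions, not the authors' implementation: the `[MASK]` token, the `<rect>` template, the 128-bin quantization, and all function names are hypothetical, and the `predict` callable merely stands in for the fine-tuned LLM.

```python
import re

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper
KEYS = ("x", "y", "width", "height")


def quantize(value, extent, bins=128):
    """Map a continuous value in [0, extent] to an integer bin in [0, bins - 1]."""
    return min(bins - 1, max(0, int(value / extent * bins)))


def initialize_code(elements, canvas_w, canvas_h, bins=128):
    """CI step (sketch): known values become HTML attributes, unknown ones become masks."""
    extents = {"x": canvas_w, "y": canvas_h, "width": canvas_w, "height": canvas_h}
    lines = ["<html><body>"]
    for el in elements:
        attrs = " ".join(
            f'{k}="{MASK if el.get(k) is None else quantize(el[k], extents[k], bins)}"'
            for k in KEYS
        )
        lines.append(f'<rect data-category="{el["category"]}" {attrs}/>')
    lines.append("</body></html>")
    return "\n".join(lines)


def complete_code(html, predict):
    """CC step (sketch): `predict` is a stand-in for the LLM; it sees the text
    before each mask and returns the value to fill in."""
    return re.sub(re.escape(MASK), lambda m: str(predict(html[: m.start()])), html)


def render_layout(html):
    """CR step (sketch): parse the completed HTML back into a list of element boxes."""
    layout = []
    for tag in re.findall(r"<rect [^>]*/>", html):
        attrs = dict(re.findall(r'([\w-]+)="([^"]*)"', tag))
        box = {"category": attrs["data-category"]}
        box.update({k: int(attrs[k]) for k in KEYS})
        layout.append(box)
    return layout


if __name__ == "__main__":
    # One element whose height is unknown and therefore masked.
    elements = [{"category": "title", "x": 10, "y": 20, "width": 100, "height": None}]
    masked = initialize_code(elements, canvas_w=200, canvas_h=400)
    completed = complete_code(masked, predict=lambda prefix: 9)  # dummy "model"
    print(masked)
    print(render_layout(completed))
```

Because the intermediate representation is plain HTML, each stage can be inspected as text, which is the interpretability property the abstract emphasizes. In a real system the dummy `predict` would be replaced by a call to the tuned language model.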
