CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization
July 15, 2024
Authors: Yang Zhao, Di Huang, Chongxiao Li, Pengwei Jin, Ziyuan Nan, Tianyun Ma, Lei Qi, Yansong Pan, Zhenxing Zhang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen
cs.AI
Abstract
The increasing complexity and high costs associated with modern processor
design have led to a surge in demand for processor design automation.
Instruction-tuned large language models (LLMs) have demonstrated remarkable
performance in automatically generating code for general-purpose programming
languages like Python. However, these methods fail on hardware description
languages (HDLs) like Verilog due to the scarcity of high-quality instruction
tuning data, as even advanced LLMs like GPT-3.5 exhibit limited performance on
Verilog generation. To address this issue, we observe that (1) Verilog code
collected from the real world is of higher quality than code generated by LLMs.
(2) LLMs like GPT-3.5 excel in summarizing Verilog code rather than generating
it. Based on these observations, this paper introduces CodeV, a series of
open-source instruction-tuned Verilog generation LLMs. Instead of generating
descriptions first and then obtaining the corresponding code from advanced LLMs,
we prompt the LLM with Verilog code and let it generate the corresponding
natural language description through multi-level summarization. Experimental results
show that CodeV surpasses the previous open-source SOTA by relative margins of
14.4% (over BetterV on VerilogEval) and 11.3% (over RTLCoder on RTLLM), and also
outperforms the previous commercial SOTA, GPT-4, by a relative margin of 22.1% on VerilogEval.
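
To make the code-to-description direction concrete, here is a minimal sketch of the multi-level summarization idea: start from real-world Verilog code and ask an LLM (e.g. GPT-3.5) for progressively richer natural-language descriptions, yielding a (description, code) instruction-tuning pair. It assumes an OpenAI-style chat API; the prompt wording, the two summarization levels, and the helper names (`ask`, `describe_verilog`) are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch of multi-level summarization for building (description, code) pairs.
# Assumes the `openai` Python package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Single-turn query against a chat model."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def describe_verilog(code: str) -> str:
    """Generate an instruction for a (description, code) training pair.

    Level 1 condenses the module into a one-sentence summary; level 2
    expands that summary into a detailed functional specification while
    re-reading the code. Both prompts are illustrative assumptions.
    """
    summary = ask(
        "Summarize what this Verilog module does in one sentence:\n" + code
    )
    description = ask(
        "Given this Verilog module and its one-sentence summary, write a "
        "detailed problem description that a hardware engineer could "
        "implement from, without reproducing the code.\n"
        f"Summary: {summary}\nCode:\n{code}"
    )
    return description

# A real-world-style module to summarize (toy example).
verilog_module = """
module counter #(parameter WIDTH = 8) (
    input  wire            clk,
    input  wire            rst,
    output reg [WIDTH-1:0] count
);
    always @(posedge clk) begin
        if (rst) count <= 0;
        else     count <= count + 1;
    end
endmodule
"""

if __name__ == "__main__":
    # The pair (description, verilog_module) would serve as one
    # instruction-tuning example for a CodeV-style model.
    print(describe_verilog(verilog_module))
```

Running the description direction this way sidesteps the scarcity of high-quality instruction data: the code side of each pair is real-world Verilog rather than LLM-generated, and the LLM is only asked to summarize, which the abstract identifies as its stronger capability.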