BERT-VBD: Vietnamese Multi-Document Summarization Framework

September 18, 2024
Authors: Tuan-Cuong Vuong, Trang Mai Xuan, Thien Van Luong
cs.AI

Abstract

Many methods have been proposed for Multi-Document Summarization (MDS), spanning both extractive and abstractive techniques. Each approach has its own limitations, however, so relying on either one alone is less effective. An emerging and promising strategy is to combine extractive and abstractive summarization. Despite the many studies in this domain, research on the combined approach remains scarce, particularly for Vietnamese language processing. This paper presents a novel Vietnamese MDS framework built on a two-component pipeline architecture that integrates extractive and abstractive techniques. The first component takes an extractive approach to identify key sentences within each document. It does so with a modification of the pre-trained BERT network that derives semantically meaningful phrase embeddings using siamese and triplet network structures. The second component uses the VBD-LLaMA2-7B-50b model for abstractive summarization to generate the final summary document. The proposed framework attains a ROUGE-2 score of 39.6% on the VN-MDS dataset, outperforming the state-of-the-art baselines.
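The two-stage flow described in the abstract (extract key sentences per document, then hand the extracts to a generative model) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not say how key sentences are scored, so ranking by cosine similarity to the document centroid is an assumption. `toy_embed` is a hypothetical bag-of-words stand-in for the siamese/triplet BERT encoder, and `generate` is a hypothetical caller-supplied function standing in for VBD-LLaMA2-7B-50b.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(sentence: str) -> Vector:
    """Hypothetical stand-in for a sentence encoder: bag-of-words counts."""
    return dict(Counter(sentence.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of sparse vectors."""
    total: Counter = Counter()
    for vec in vectors:
        total.update(vec)
    return {k: v / len(vectors) for k, v in total.items()}


def extract_key_sentences(
    sentences: List[str],
    k: int,
    embed: Callable[[str], Vector] = toy_embed,
) -> List[str]:
    """Assumed criterion: keep the k sentences closest to the document
    centroid, returned in their original order."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    center = centroid(vectors)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(vectors[i], center),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]


def summarize(
    documents: List[List[str]],
    k: int,
    generate: Callable[[str], str],
) -> str:
    """Stage 1 per document, then one abstractive call over the joined extracts."""
    extracted = [s for doc in documents for s in extract_key_sentences(doc, k)]
    return generate(" ".join(extracted))


if __name__ == "__main__":
    docs = [
        ["the flood hit the town", "the flood damaged the town", "cats sleep"],
        ["rescue teams reached the town", "the town was cut off by the flood"],
    ]
    # A trivial placeholder "model" that just echoes its prompt.
    print(summarize(docs, k=1, generate=lambda prompt: prompt))
```

In a real setup, `embed` would wrap a trained sentence encoder and `generate` would wrap a call to the language model. Passing both in as functions keeps the sketch independent of any particular library.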
