
BERT-VBD: Vietnamese Multi-Document Summarization Framework

September 18, 2024
Authors: Tuan-Cuong Vuong, Trang Mai Xuan, Thien Van Luong
cs.AI

Abstract

Numerous methods have been proposed for Multi-Document Summarization (MDS), spanning both extractive and abstractive techniques. However, each approach has its own limitations, so relying solely on either one is less effective. An emerging and promising strategy is to combine extractive and abstractive summarization. Despite the many studies in this domain, research on the combined methodology remains scarce, particularly for Vietnamese language processing. This paper presents a novel Vietnamese MDS framework built on a two-component pipeline architecture that integrates extractive and abstractive techniques. The first component uses an extractive approach to identify key sentences within each document. It does so with a modification of the pre-trained BERT network that uses siamese and triplet network structures to derive semantically meaningful phrase embeddings. The second component uses the VBD-LLaMA2-7B-50b model for abstractive summarization, generating the final summary document. The proposed framework attains a ROUGE-2 score of 39.6% on the VN-MDS dataset and outperforms the state-of-the-art baselines.
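The extractive stage of such a pipeline can be illustrated with a minimal sketch. The abstract does not say how key sentences are chosen once embeddings are available, so the centroid-similarity ranking below is an assumption, not the authors' method. The embeddings are toy vectors standing in for the output of the modified BERT encoder, and the function names `select_key_sentences` and `build_abstractive_input` are hypothetical. The abstractive step with VBD-LLaMA2-7B-50b is not implemented here; the sketch only assembles the text that would be passed to it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_key_sentences(sentences, embeddings, k=2):
    """Keep the k sentences closest to the document centroid, in original order.

    Assumption: centroid similarity is used as the salience score; the paper
    may rank sentences differently.
    """
    dim = len(embeddings[0])
    centroid = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(embeddings[i], centroid),
        reverse=True,
    )
    keep = sorted(ranked[:k])  # restore document order for readability
    return [sentences[i] for i in keep]


def build_abstractive_input(documents, k=2):
    """Concatenate key sentences from every document in the cluster.

    `documents` is a list of (sentences, embeddings) pairs. The returned string
    is what a second-stage abstractive model would receive as input.
    """
    selected = []
    for sentences, embeddings in documents:
        selected.extend(select_key_sentences(sentences, embeddings, k))
    return " ".join(selected)


if __name__ == "__main__":
    # Toy 2-D embeddings; a real system would obtain these from the encoder.
    doc = (
        ["Sentence A.", "Sentence B.", "Sentence C."],
        [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    )
    print(build_abstractive_input([doc], k=2))
```

Selecting sentences per document before concatenation keeps the second-stage input short, which matters when a cluster contains many long documents.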
