Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation

Abstract

This paper introduces a novel approach to topic modeling that uses the latent codebooks of a Vector-Quantized Variational Auto-Encoder (VQ-VAE), which discretely encapsulate the rich information in pre-trained embeddings such as those of a pre-trained language model. Building on a novel interpretation of the latent codebooks and embeddings as a conceptual bag-of-words (BoW), we propose a new generative topic model, Topic-VQ-VAE (TVQ-VAE), which inversely generates the original documents related to each latent codebook. TVQ-VAE can visualize topics with various generative distributions, including the traditional BoW distribution and autoregressive image generation. Our experimental results on document analysis and image generation demonstrate that TVQ-VAE effectively captures topic context that reveals the underlying structure of the dataset, and that it supports flexible forms of document generation. The official implementation of the proposed TVQ-VAE is available at https://github.com/clovaai/TVQ-VAE.
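As a rough illustration of the "conceptual bag-of-words" idea, the sketch below maps each embedding in a document to its nearest codebook vector (the standard VQ-VAE quantization step) and counts how often each code index is selected. It is not taken from the official implementation: the function names `nearest_code` and `code_bag`, the toy 2-D codebook, and the example embeddings are all hypothetical, and the topic model that would consume these counts is not shown.

```python
from collections import Counter
from typing import List, Sequence


def nearest_code(embedding: Sequence[float],
                 codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook vector closest to `embedding` (squared L2)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(embedding, codebook[k]))


def code_bag(embeddings: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Count how often each codebook index is selected for one document."""
    counts = Counter(nearest_code(e, codebook) for e in embeddings)
    return [counts.get(k, 0) for k in range(len(codebook))]


# Toy data (hypothetical): a 3-entry codebook in 2-D and one "document"
# consisting of four word embeddings.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
document = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (-0.8, 0.9)]

bag = code_bag(document, codebook)
print(bag)  # [1, 2, 1]
```

The resulting count vector plays the role that word counts play in a classical BoW topic model, with codebook indices standing in for vocabulary entries.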
