mStyleDistance: Multilingual Style Embeddings and their Evaluation
February 21, 2025
Authors: Justin Qiu, Jiacheng Zhu, Ajay Patel, Marianna Apidianaki, Chris Callison-Burch
cs.AI
Abstract
Style embeddings are useful for stylistic analysis and style transfer;
however, only English style embeddings have been made available. We introduce
Multilingual StyleDistance (mStyleDistance), a multilingual style embedding
model trained using synthetic data and contrastive learning. We train the model
on data from nine languages and create a multilingual STEL-or-Content benchmark
(Wegmann et al., 2022) that serves to assess the embeddings' quality. We also
employ our embeddings in an authorship verification task involving different
languages. Our results show that mStyleDistance embeddings outperform existing
models on these multilingual style benchmarks and generalize well to unseen
features and languages. We make our model publicly available at
https://huggingface.co/StyleDistance/mstyledistance.
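
To make the STEL-or-Content idea concrete, here is a minimal sketch of how such a check could be scored. It assumes each test item is a triple: an anchor sentence, a sentence in the same style but with different content, and a sentence with the same content but a different style. An embedding passes an item when the anchor is closer to the same-style sentence. The function names, the triple layout, and the toy embedder are illustrative assumptions, not the paper's code; in practice `embed` would wrap a real style embedding model such as mStyleDistance.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# Assumed item layout: (anchor, same_style_different_content, same_content_different_style)
Triple = Tuple[str, str, str]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def stel_or_content_accuracy(
    triples: List[Triple], embed: Callable[[str], Vector]
) -> float:
    """Fraction of triples where the anchor is closer to the same-style
    sentence than to the same-content sentence."""
    correct = 0
    for anchor, same_style, same_content in triples:
        anchor_vec = embed(anchor)
        style_sim = cosine_similarity(anchor_vec, embed(same_style))
        content_sim = cosine_similarity(anchor_vec, embed(same_content))
        if style_sim > content_sim:
            correct += 1
    return correct / len(triples)


def toy_style_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a style model: encodes only how much of
    the text is upper case, so it ignores content entirely."""
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / max(len(letters), 1)
    return [upper_ratio, 1.0 - upper_ratio]


# Toy triples in two languages; the style feature here is all-caps vs. lower case.
triples = [
    ("I LOVE THIS MOVIE", "THE TRAIN IS LATE", "i love this movie"),
    ("que buena pelicula", "el tren llega tarde", "QUE BUENA PELICULA"),
]

accuracy = stel_or_content_accuracy(triples, toy_style_embed)
print(f"STEL-or-Content accuracy (toy embedder): {accuracy:.2f}")
```

The toy embedder succeeds only because it is blind to content by construction; a content-driven sentence encoder would tend to prefer the same-content distractor, which is the failure mode this kind of benchmark is designed to expose.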