From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models

Abstract

The rise of Large Language Models (LLMs) has heightened concerns about the misuse of AI-generated text, making watermarking a promising countermeasure. Mainstream watermarking schemes for LLMs fall into two categories: logits-based and sampling-based. Current schemes, however, force trade-offs among robustness, text quality, and security. To mitigate this, we integrate logits-based and sampling-based schemes so that their respective strengths work in synergy. We propose a versatile symbiotic watermarking framework with three strategies: serial, parallel, and hybrid. The hybrid strategy embeds watermarks adaptively, using token entropy and semantic entropy to balance detectability, robustness, text quality, and security. We validate the approach through comprehensive experiments on a range of datasets and models. The results indicate that our method outperforms existing baselines and achieves state-of-the-art (SOTA) performance. We believe this framework offers new insights into diverse watermarking paradigms. Our code is available at https://github.com/redwyd/SymMark.
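As an illustration of the token entropy signal mentioned above, the following is a minimal sketch, assuming that token entropy means the Shannon entropy of the model's next-token distribution. The function name `token_entropy` and the example logits are hypothetical and are not taken from the SymMark code; how the hybrid strategy actually uses this quantity is not specified in this abstract.

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits).

    Hypothetical helper for illustration only; not part of SymMark.
    """
    # Subtract the maximum logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# A peaked distribution (the model is confident) has low entropy.
peaked = token_entropy([10.0, 0.0, 0.0, 0.0])
# A uniform distribution over four tokens has entropy ln(4).
flat = token_entropy([1.0, 1.0, 1.0, 1.0])
```

Here `peaked` is close to zero while `flat` equals ln(4). An adaptive scheme could use a signal of this kind to distinguish positions where the next token is nearly determined from positions where many continuations are plausible.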
