Language Models are Injective and Hence Invertible

Abstract

Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs could map to the same output and prevent exact recovery of the input from a model's representations. In this paper, we challenge this view. First, we prove mathematically that transformer language models mapping discrete input sequences to their corresponding sequence of continuous representations are injective and therefore lossless, a property established at initialization and preserved during training. Second, we confirm this result empirically through billions of collision tests on six state-of-the-art language models, and observe no collisions. Third, we operationalize injectivity: we introduce SipIt, the first algorithm that provably and efficiently reconstructs the exact input text from hidden activations, establishing linear-time guarantees and demonstrating exact invertibility in practice. Overall, our work establishes injectivity as a fundamental and exploitable property of language models, with direct implications for transparency, interpretability, and safe deployment.
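
As an illustration of what a collision test checks, the sketch below builds a small, randomly initialized single-head attention layer in plain Python, enumerates every prompt up to a fixed length over a tiny vocabulary, and reports the smallest pairwise L2 distance between last-token representations. It is a toy under stated assumptions, not the experimental setup of the paper: the model, the names `make_toy_model`, `last_token_state`, and `min_pairwise_distance`, and all sizes are hypothetical choices made for this example.

```python
import itertools
import math
import random


def random_matrix(rng, rows, cols, scale=1.0):
    """Matrix with i.i.d. Gaussian entries (hypothetical helper)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def make_toy_model(vocab_size, dim, max_len, seed=0):
    """Randomly initialized toy model: embeddings plus one attention head."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return {
        "tok": random_matrix(rng, vocab_size, dim),  # token embeddings
        "pos": random_matrix(rng, max_len, dim),     # position embeddings
        "wq": random_matrix(rng, dim, dim, scale),   # query projection
        "wk": random_matrix(rng, dim, dim, scale),   # key projection
        "wv": random_matrix(rng, dim, dim, scale),   # value projection
    }


def last_token_state(model, tokens):
    """Representation of the last token after causal softmax attention."""
    xs = [
        [t + p for t, p in zip(model["tok"][tok], model["pos"][i])]
        for i, tok in enumerate(tokens)
    ]
    dim = len(xs[-1])
    query = matvec(model["wq"], xs[-1])
    # Scaled dot-product scores of the last position against the whole prefix.
    scores = [
        sum(q * k for q, k in zip(query, matvec(model["wk"], x))) / math.sqrt(dim)
        for x in xs
    ]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    attended = [0.0] * dim
    for w, x in zip(weights, xs):
        value = matvec(model["wv"], x)
        for d in range(dim):
            attended[d] += (w / total) * value[d]
    # Residual connection followed by a non-linear activation.
    return [math.tanh(x + a) for x, a in zip(xs[-1], attended)]


def min_pairwise_distance(model, vocab_size, max_len):
    """Smallest L2 distance between states of any two distinct prompts."""
    prompts = [
        seq
        for n in range(1, max_len + 1)
        for seq in itertools.product(range(vocab_size), repeat=n)
    ]
    states = [last_token_state(model, p) for p in prompts]
    return min(math.dist(a, b) for a, b in itertools.combinations(states, 2))


model = make_toy_model(vocab_size=4, dim=8, max_len=3)
min_dist = min_pairwise_distance(model, vocab_size=4, max_len=3)
print(f"minimum pairwise L2 distance: {min_dist:.3e}")
```

A strictly positive minimum distance means that no two distinct prompts in the enumerated set share a representation. An exhaustive check like this is only feasible at toy scale.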
