OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Abstract

We present OLMoTrace, the first system that traces the outputs of language models back to their full, multi-trillion-token training data in real time. OLMoTrace finds and shows verbatim matches between segments of language model output and documents in the training text corpora. Powered by an extended version of infini-gram (Liu et al., 2024), our system returns tracing results within a few seconds. OLMoTrace can help users understand the behavior of language models through the lens of their training data. We showcase how it can be used to explore fact checking, hallucination, and the creativity of language models. OLMoTrace is publicly available and fully open-source.
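
To make the idea of verbatim matching concrete, the following is a minimal sketch, not the OLMoTrace implementation. It assumes whitespace tokenization and a small in-memory n-gram index in place of infini-gram, and the names `build_ngram_index` and `find_verbatim_spans` are hypothetical. It reports the longest span of a model output, per starting position, that appears verbatim in a toy corpus.

```python
from collections import defaultdict


def build_ngram_index(docs, max_n):
    """Map every token n-gram (lengths 1..max_n) to the ids of documents containing it.

    Toy stand-in for a real corpus index; it holds everything in memory.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        tokens = text.split()  # assumption: whitespace tokenization
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                index[tuple(tokens[i:i + n])].add(doc_id)
    return index


def find_verbatim_spans(output, index, min_n, max_n):
    """Return (start, end, doc_ids) for verbatim spans of the output.

    For each start position, keep the longest n-gram (min_n..max_n tokens)
    found in the index, and skip spans fully covered by an earlier span.
    """
    tokens = output.split()
    spans = []
    furthest_end = 0
    for start in range(len(tokens)):
        longest = min(max_n, len(tokens) - start)
        for n in range(longest, min_n - 1, -1):
            gram = tuple(tokens[start:start + n])
            if gram in index:
                end = start + n
                if end > furthest_end:
                    spans.append((start, end, sorted(index[gram])))
                    furthest_end = end
                break  # longest match at this start was found
    return spans


# Toy corpus and model output, invented for illustration only.
docs = [
    "the quick brown fox jumps over the lazy dog",
    "a stitch in time saves nine",
]
output = "I think the quick brown fox is fast and a stitch in time helps"

index = build_ngram_index(docs, max_n=5)
for start, end, doc_ids in find_verbatim_spans(output, index, min_n=3, max_n=5):
    print(" ".join(output.split()[start:end]), "-> documents", doc_ids)
```

An exhaustive n-gram table like this grows far too quickly for a multi-trillion-token corpus, which is why a system at that scale needs a dedicated index such as the extended infini-gram cited above.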