TensorBLEU: Vectorized GPU-based BLEU Score Implementation for Per-Sentence In-Training Evaluation

Abstract

Modern natural language processing models have achieved unprecedented scale, yet the tools for their evaluation often remain a computational bottleneck, limiting the pace of research. This is particularly acute for in-training evaluation metrics, such as per-sentence reward signals in Reinforcement Learning, which must operate efficiently on batches of token IDs directly on the GPU. In this paper, we introduce TensorBLEU, a novel implementation of the BLEU metric designed from the ground up for this specific use case. Our approach is fully vectorized for GPU-accelerated, per-sentence computation within PyTorch and introduces a memory-efficient counting mechanism. By creating a compact, batch-specific dictionary of n-grams using torch.unique, our method avoids the prohibitive memory costs of traditional hashing-based vectorization, making it practical for large-vocabulary models. We benchmark TensorBLEU against NLTK, the standard library for token-ID-based BLEU calculation on the CPU. Experiments show that TensorBLEU provides speedups of over 13x on consumer-grade GPUs (NVIDIA T4) and over 40x on data-center-class hardware (NVIDIA A100). This performance turns a significant bottleneck into a negligible part of the training loop. By clearly defining its role as a "Token-ID BLEU" for development purposes and open-sourcing our implementation, we provide a powerful tool for accelerating research in areas like RL-based model fine-tuning.
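
The counting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name batched_ngram_counts is hypothetical, padding and reference clipping are ignored, and the offset-plus-bincount step is an assumption about one plausible way to use the dictionary that torch.unique produces. The point it shows is that the dictionary size depends only on the n-grams present in the batch, not on the vocabulary size raised to the power n.

```python
import torch


def batched_ngram_counts(token_ids: torch.Tensor, n: int):
    """Count n-grams per sentence with a compact, batch-specific dictionary.

    token_ids: (batch, seq_len) integer tensor. Padding is NOT handled here
    (an assumption made to keep the sketch short).
    Returns (unique_ngrams, counts), where counts has shape
    (batch, num_unique) and counts[b, k] is how often unique_ngrams[k]
    occurs in sentence b.
    """
    batch_size = token_ids.size(0)

    # All sliding windows of length n: shape (batch, seq_len - n + 1, n).
    ngrams = token_ids.unfold(1, n, 1)
    num_windows = ngrams.size(1)

    # Dictionary built only from n-grams that actually occur in this batch.
    # `inverse` maps every window to its row index in `unique_ngrams`.
    unique_ngrams, inverse = torch.unique(
        ngrams.reshape(-1, n), dim=0, return_inverse=True
    )
    inverse = inverse.reshape(-1)
    num_unique = unique_ngrams.size(0)

    # Offset each sentence's compact IDs so a single bincount call
    # produces per-sentence counts for the whole batch.
    sentence_idx = torch.arange(
        batch_size, device=token_ids.device
    ).repeat_interleave(num_windows)
    flat_ids = sentence_idx * num_unique + inverse
    counts = torch.bincount(flat_ids, minlength=batch_size * num_unique)
    return unique_ngrams, counts.view(batch_size, num_unique)


# Example: two sentences of four token IDs each, counting bigrams.
example_ids = torch.tensor([[1, 2, 1, 2], [3, 3, 3, 3]])
example_unique, example_counts = batched_ngram_counts(example_ids, n=2)
```

In this example the batch contains only three distinct bigrams, so the count matrix has three columns regardless of how large the model's vocabulary is. A full BLEU computation would additionally clip these counts against reference counts and apply the brevity penalty, which this sketch leaves out.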
