Is There a Case for Conversation Optimized Tokenizers in Large Language Models?

Abstract

The computational and energy costs of Large Language Models (LLMs) have increased exponentially, driven by growing model sizes and the massive adoption of LLMs by hundreds of millions of users. The unit cost of an LLM is the computation of a token. The tokenizer therefore plays an important role in the efficiency of a model, and tokenizers are carefully optimized to minimize the number of tokens for the text in their training corpus. One of the most popular applications of LLMs is chatbots that interact with users. A key observation is that, for those chatbots, what matters is the performance of the tokenizer on the user text input and the chatbot responses, which are most likely different from the text in the training corpus. A question that immediately arises is whether there is a potential benefit in optimizing tokenizers for chatbot conversations. In this paper, this idea is explored for different tokenizers by using a publicly available corpus of chatbot conversations to redesign their vocabularies and evaluate their performance in this domain. The results show that conversation-optimized tokenizers consistently reduce the number of tokens in chatbot dialogues, which can lead to meaningful energy savings in the range of 5% to 10%, while having minimal or even slightly positive impact on tokenization efficiency for the original training corpus.
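
As a rough illustration of the metric behind the 5% to 10% figure, the sketch below computes the relative token reduction between two tokenizers on the same text. It assumes, as the abstract does, that cost scales with the number of tokens. The function names and the two stand-in tokenizers are hypothetical and are not taken from the paper; any callable that maps a string to a list of tokens could be substituted.

```python
from typing import Callable, Iterable, List


def count_tokens(tokenize: Callable[[str], List[str]], texts: Iterable[str]) -> int:
    """Total number of tokens a tokenizer produces over a corpus."""
    return sum(len(tokenize(text)) for text in texts)


def token_reduction(baseline_tokens: int, optimized_tokens: int) -> float:
    """Fraction of tokens saved relative to the baseline.

    A negative value means the second tokenizer produced more tokens.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - optimized_tokens) / baseline_tokens


# Stand-in tokenizers for illustration only (NOT real LLM tokenizers):
# one splits text into characters, the other into whitespace-separated words.
def char_tokenizer(text: str) -> List[str]:
    return list(text)


def word_tokenizer(text: str) -> List[str]:
    return text.split()


if __name__ == "__main__":
    # Hypothetical two-turn dialogue used only to exercise the functions.
    dialogue = [
        "how do I reset my password",
        "click forgot password on the login page",
    ]
    baseline = count_tokens(char_tokenizer, dialogue)
    optimized = count_tokens(word_tokenizer, dialogue)
    print(f"{token_reduction(baseline, optimized):.1%} fewer tokens")
```

In practice the two callables would wrap the original tokenizer and one whose vocabulary was rebuilt on conversation data, and the same comparison would be repeated on the original training corpus to check for regressions.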
