Many people I know are interested in experimenting with OpenAI’s large language models (LLMs). But hosting LLMs is expensive, so inference services like OpenAI’s application programming interface (API) are not free, and entering your payment information without knowing what inferencing costs will add up to can be a little intimidating.
In my articles, I usually like to include a small indicator of a walkthrough’s API costs, so my readers know what to expect and can get a feeling for inferencing costs.
This article introduces you to the tiktoken library I use to estimate inferencing costs for OpenAI foundation models.
tiktoken is an open-source byte pair encoding (BPE) tokenizer developed by OpenAI that is used for tokenizing text in their LLMs. It allows developers to count how many tokens are in a text before making calls to the OpenAI endpoint.
It thus helps with estimating the cost of using the OpenAI API, which is billed in units of 1,000 tokens according to OpenAI’s pricing page.
Tokens are common sequences of characters in a text, and tokenization is the process of splitting a text string into a list of tokens. A token can be equal to a word, but usually a word consists of multiple tokens.
Natural language processing (NLP) models are trained on tokens and understand the relationships between them. Thus, the input text is tokenized before an NLP model processes it.
But exactly how words are tokenized depends on the tokenizer used.
Below, you can see an example of how the text “Alice has a parrot. What animal is Alice’s pet?