With the release of LLaMA v1, we saw a Cambrian explosion of fine-tuned models, including Alpaca, Vicuna, and WizardLM, among others. This trend encouraged different businesses to launch their own base models with licenses suitable for commercial use, such as OpenLLaMA, Falcon, XGen, etc. The release of Llama 2 now combines the best elements from both sides: it offers a highly efficient base model along with a more permissive license.
During the first half of 2023, the software landscape was significantly shaped by the widespread use of APIs (like OpenAI API) to create infrastructures based on Large Language Models (LLMs). Libraries such as LangChain and LlamaIndex played a critical role in this trend. Moving into the latter half of the year, the process of fine-tuning (or instruction tuning) these models is set to become a standard procedure in the LLMOps workflow. This trend is driven by various factors: the potential for cost savings, the ability to process confidential data, and even the potential to develop models that exceed the performance of prominent models like ChatGPT and GPT-4 in certain specific tasks.
LLMs are pretrained on an extensive corpus of text. In the case of Llama 2, we know very little about the composition of the training set, besides its length of 2 trillion tokens. In comparison, BERT (2018) was “only” trained on the BookCorpus (800M words) and English Wikipedia (2,500M words). From experience, this is a very costly and long process with a lot of hardware issues. If you want to know more about it, I recommend reading Meta’s logbook about the pretraining of the OPT-175B model.
When the pretraining is complete, auto-regressive models like Llama 2 can predict the next token in a sequence. However…