Retrieval Augmented Generation (RAG): teaching new tricks to old models
OpenAI launched its ChatGPT tool in November 2022, sparking a revolution (well, another one, actually) in the world of artificial intelligence. ChatGPT belongs to the family of so-called Large Language Models (LLMs). These models are based on the Transformer architecture, which we talked about in a previous post, and are trained on huge amounts of text (to get an idea, it is estimated that GPT-4, OpenAI's latest version, was trained on 10,000 GPUs running uninterruptedly for 150 days), thus learning to generate text automatically. Such has been the success of LLMs that, since the release of ChatGPT, all the leading companies have been developing and improving their own: Meta's Llama 3, IBM's Granite, Anthropic's Claude, and so on.

Although these models are very versatile and can answer a wide range of questions, from general knowledge to even mathematical and logical problems (something that was initially their Achilles' heel), sometimes we want our model to have knowledge of a very specific domain, perhaps even information internal to a particular company. This is where several techniques have emerged that allow us not only to use LLMs but also to "teach" them (note the quotation marks) new knowledge.

Retraining our model

If I ask any LLM what the capital of Spain is, for example, it will answer that it is Madrid. The model has been trained on a lot of public information (think of Wikipedia, for example), which includes the answer to the question we asked. However, if you ask it what your dog's name is, the LLM will not know, because it has not been trained with that information (unless you are Cristiano Ronaldo or Taylor Swift).

Since training an LLM from scratch is not an option for the average Joe (we can't all afford to have 10,000 GPUs running for 150 days at a time, as expensive as electricity is!), different techniques have been explored to incorporate extra information into already trained LLMs, thus saving time and money. Especially money.

A first approach, very widespread in the world of Deep Learning in general, is Fine Tuning: taking an already trained LLM and retraining it with our particular data. This is how, for example, we get models such as BloombergGPT, an LLM trained with a large amount of financial information, thus creating a model specially designed for those working in the world of finance. However, since these models are so large (billions and billions of parameters to adjust), this technique is still very expensive. Moreover, things are not always as pretty as they sound: in many cases BloombergGPT performs only slightly better than ChatGPT (the base model any of us can use, with no extra financial knowledge) and worse than GPT-4, the latest version of the model.
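To give an idea of what this retraining looks like in practice, below is a minimal, illustrative sketch of Fine Tuning a causal language model with the Hugging Face transformers library. The model name and the finance_docs.txt file are placeholder assumptions, not the actual recipe behind BloombergGPT; fine-tuning a model with billions of parameters would in reality require far more memory and, typically, parameter-efficient techniques such as LoRA.

```python
# Minimal Fine Tuning sketch with Hugging Face Transformers.
# "finance_docs.txt" and the model name are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Meta-Llama-3-8B"  # any causal LM you have access to
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # some models ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Our "particular data": plain-text files containing the domain knowledge.
dataset = load_dataset("text", data_files={"train": "finance_docs.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-llm",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=tokenized,
    # mlm=False -> plain next-token (causal) language modelling objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Even this toy setup makes the cost problem obvious: every parameter of the model is updated, so memory and compute grow with the size of the model.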
Knowing information vs. knowing how to search for information

If the reader were asked the birthday of a far-off relative, he or she would probably not know it by heart (at least that is my case, I admit). However, having that information memorized hardly matters, since we can have it written down on a calendar (for the more analog among us) or on our cell phone. The same thing happens with LLMs. If we train an LLM with specific knowledge, the LLM will "know" that knowledge, just as we know (I hope) our parents' birthdays.

However, an LLM does not need to know something for us to be able to ask about it: we can give it extra information with which to elaborate its answer, in the same way that, if we are asked something we don't know, we can look up the answer in books, in our notes, on the internet, or by calling a friend. So, if I ask ChatGPT about my birthday, it obviously does not know the answer, as we can see in Figure 1.

Figure 1: Example of ChatGPT response to unknown information.

If, however, I provide it with additional information about me in the prompt (the text we pass to the LLM), the answer about the date of my birthday is correct; ChatGPT even indulges in a little flourish, as we can see in Figure 2. The LLM does not know the information but, given a context, it does know how to look it up and use it to work out the answer.

Figure 2: Example of ChatGPT response when it is provided with additional information.

Finding the right information: RAG systems

At this point, the reader might not be very satisfied with the example above. What is the point of asking ChatGPT about my birthday if I have to give it that information in the prompt myself? It seems rather pointless to ask an LLM something if I must supply the answer myself. However, we could design a system that automates the search for that context, the additional information the LLM needs, and passes it to the model. This is precisely what RAG (Retrieval Augmented Generation) systems do.

A RAG system consists of an LLM and a database that stores the documents we want to ask the model about. So, for example, if we work at an industrial machinery company, we could have a database with all the manuals, both for the machinery itself and for safety, and we could ask questions about any of these topics; the system would answer us by taking the information from those manuals.

The operation of a RAG system, illustrated in Figure 3, is as follows (a minimal code sketch is included at the end of the post):

1. The user types a prompt formulating their question, which is converted into a numeric vector by an embedding model. If you are not familiar with this term, don't worry: you can think of an embedding as nothing more than the "translation" of a text into the language of computers.
2. This "translation" of the prompt is then compared with as many documents as desired (in this case, the manuals) and the most similar fragments are extracted. It must be emphasized that all the documents have been previously "translated" with the same embedding model.
3. Finally, the LLM is given both the prompt and the information extracted from our vector database, automatically providing it with the information it needs to answer our question.

Figure 3: Schematic diagram of a RAG system.

So, just as we can search for information that is unknown to us, an LLM can do the same if we integrate it into a RAG system. We can therefore exploit the text generation capabilities of LLMs while automating the retrieval of the context and information needed to answer the questions we want... as long as we have the necessary documents at hand, of course!
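To make the pipeline described above more concrete, here is a minimal sketch of the retrieval step using the sentence-transformers library. The embedding model, the example manual fragments and the prompt format are assumptions for illustration; a production system would typically keep the embeddings in a dedicated vector database and send the resulting prompt to whichever LLM it uses.

```python
# Minimal RAG sketch: embed the manuals once, retrieve the fragments most
# similar to the question, and build the prompt that would be sent to the LLM.
# The model name and the manual fragments are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # the embedding model

# The documents are "translated" in advance with the same embedding model.
manual_fragments = [
    "To restart the lathe, hold the green button for three seconds.",
    "Safety goggles are mandatory in the welding area.",
    "The hydraulic press must be serviced every 500 hours of use.",
]
doc_vectors = embedder.encode(manual_fragments, convert_to_tensor=True)

def build_rag_prompt(question: str, top_k: int = 2) -> str:
    # 1. The user's question is converted into a numeric vector.
    query_vector = embedder.encode(question, convert_to_tensor=True)
    # 2. The question vector is compared with the stored document vectors and
    #    the most similar fragments are extracted.
    hits = util.semantic_search(query_vector, doc_vectors, top_k=top_k)[0]
    context = "\n".join(manual_fragments[hit["corpus_id"]] for hit in hits)
    # 3. Both the question and the retrieved context are handed to the LLM.
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

print(build_rag_prompt("How often should the hydraulic press be serviced?"))
```

The key design choice is that documents and questions are embedded with the same model, so that "similar meaning" translates into "nearby vectors" when the fragments are compared.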
Image by rawpixel.com / Freepik.

June 17, 2024