Javier Coronado Blazquez


PhD in Theoretical Physics from the Universidad Autónoma de Madrid, specialized in Data Science and Machine Learning. I went from researching dark matter and gamma rays to being hooked on data and its visualization. I am currently part of the AI & Analytics team at Telefónica Tech, working as a Data Scientist and as a trainer in corporate digital transformation.
AI & Data
Creative AI in business: how to adapt ChatGPT (and similar) to my customer's needs
Generative Artificial Intelligence (GenAI) has gone, in the space of a year, from being of purely academic interest to making the front pages of world news, thanks to the democratization of tools like ChatGPT or Stable Diffusion, capable of reaching the general public with free access and a very simple interface. However, from a business point of view, is this wave of attention pure hype, or are we in the early stages of a major revolution? Is GenAI capable of generating new and innovative use cases?

The LLM (Large Language Models) era

GenAI is a branch of AI focused on creating new content. We can consider it a new stage of AI: with traditional statistics we only have descriptive and diagnostic analysis. With AI (especially with machine learning techniques) we can move on to predictive and prescriptive analytics, which are able to foresee patterns or situations and offer recommendations or alternatives to deal with them. With GenAI we have creative analytics, capable not only of studying existing data but also of generating new information.

Although GenAI is applied to all forms of human creativity (text, audio, image, video...), possibly its best-known facet is the so-called Large Language Models (LLMs), especially OpenAI's ChatGPT, which has the honor of being the fastest-growing app in history. Rather than using the language of machines, we can speak to them in our own natural language.

'Traditional' AI (in this field, anything more than five years old already counts as traditional) undoubtedly meant a change of mentality for companies, equivalent to the democratization of computing in the 80s and of the internet in the 90s. This digital transformation makes it possible to optimize processes to make them more efficient and secure, and to make data-driven decisions, to mention two of the main applications. Do we have equivalent use cases with GenAI? Some of those proposed in these early stages include:

Internal documentation search: with large volumes of unstructured information (text), specific information on concepts, strategies or doubts can be queried in natural language. Until now, we had little more than Ctrl+F to search for exact matches, which is inefficient, error-prone, and only provides fragmentary information.

Chatbots: with the precision and versatility achieved by the latest LLMs, chatbots can respond much more fully to user questions. This allows the vast majority of customer problems to be solved with great agility, requiring human intervention only in the most complex cases. It is even possible to perform sentiment analysis on past responses, to understand which strategies and solutions have been the most satisfactory.

Synthetic data generation: from a private dataset, we can increase its volume to provide a larger sample. This is especially useful when obtaining real data requires a lot of time or resources. With GenAI, advanced knowledge of statistical techniques is not needed to do this.

Proposal writing: using existing documents, we can generate new value propositions aligned with the company's strategy, according to the type of client, team, type of use case, deadlines, etc.

Executive summaries: also from documents, or from voice transcriptions (another use of GenAI), we can summarize long and complex texts, often of a technical or legal nature, even adapting the style of the summary to our needs.
Thus, if we do not have time to attend an important committee, to read a 120-page proposal or to digest a new piece of European legislation, GenAI can summarize the most relevant points in a clear and concise way.

Personalized marketing: 'traditional' AI enables customer profiling and segmentation, which facilitates the design of personalized campaigns. Using GenAI, marketing can be created at the individual level, so that no two interactions with a customer are the same, always in line with the company's existing values and campaign style. This opens the door to creating content on a scale that would be impossible to produce manually, as with automatically generated YouTube captions.

Retraining or adapting... or both

It all sounds great, but how can we have an LLM for our company? The most obvious option is brute force: we train our own. Unfortunately, it is no coincidence that the best LLMs come from the so-called hyperscalers (OpenAI, Google, Meta, Amazon...): training one of these models requires an exorbitant amount of data, time, and resources.

◾ As an example, training GPT-4 cost about 100 million dollars, a figure within the reach of very few pockets.

On the other hand, as ChatGPT reminds us every time we ask it about something current, its training was done with data up to September 2021, so it knows nothing more recent. However, its successor GPT-4 did incorporate conversations with GPT-3 as part of its training. This meant that, if a user in one company had revealed secrets in the chat of the previous version, another user of GPT-4 could access that information simply by asking for it. As a result of these incidents, many companies have banned its use to prevent leaks of confidential data.

Therefore, if we cannot train our own model natively, we have to adopt other strategies to make the LLM work with our data and our company's particular circumstances. There are two main strategies: fine-tuning and retrieval-augmented generation (RAG). Neither is inherently better than the other; the choice depends on many factors.

Fine-tuning: as its name suggests, the idea is to refine the model while keeping its base. That is, take an existing LLM like ChatGPT and train it further by incorporating all the data from my company. While it can be somewhat costly, we are talking about orders of magnitude less than the millions of dollars involved in doing it natively, as it is a small training set compared to all the information the base model has already absorbed.

RAG: in this case, all we do is create a database of information relevant to the customer, so that they can ask the LLM questions about it. Faced with a query, the LLM will search through all these documents for pieces of information that seem relevant to what has been asked, rank the most similar ones, and generate an answer from them. Its main advantage is that it will not only give us the answer, but can also indicate which documents and which passages were used to create it, so that the information can be traced and verified.

Which of the two is more suitable for my use case or customer? As we have said, it depends. We must take into account the volume of data available, for example. If our customer is a small company that will only ever have a few hundred or a few thousand documents, it is difficult to do satisfactory fine-tuning (we will probably overfit), and the RAG approach will be much more appropriate.
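To make the retrieval step of RAG more tangible, here is a minimal sketch using sentence embeddings and cosine similarity. The documents, the embedding model choice and the prompt template are illustrative assumptions, not part of any particular product; in production, a vector database would typically replace the in-memory list.

```python
# Minimal RAG retrieval sketch: rank documents by similarity to the query,
# then build a prompt from the top matches. All content here is illustrative.
from sentence_transformers import SentenceTransformer
import numpy as np

documents = [
    "Refunds are processed within 14 days of the return request.",
    "Premium customers have 24/7 phone support.",
    "Invoices can be downloaded from the billing portal.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small open-source embedder
doc_vectors = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2):
    """Rank documents by cosine similarity to the query and return the top k."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q                      # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]
    return [(documents[i], float(scores[i])) for i in top]

query = "How long does a refund take?"
context = retrieve(query)
prompt = "Answer using only these excerpts:\n" + \
         "\n".join(f"- {doc}" for doc, _ in context) + \
         f"\n\nQuestion: {query}"
print(prompt)  # this prompt would then be sent to the LLM of choice
```

The retrieved passages are also what make the answer traceable: the same excerpts that feed the prompt can be shown to the user as sources.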
Privacy is also fundamental

If we need to restrict the information that employees at different levels or departments can access, we will have to use RAG, making different documents available depending on the type of access. The fine-tuning approach is not suitable here, as there would be a potential data leakage similar to the GPT incident discussed above.

How versatile and interpretable should our model be?

This point is crucial, because choosing the wrong strategy for the use case can ruin it. If we want our LLM to summarize a document for us, for instance, or to combine several documents to explain something to us, we want it to be literal and traceable. LLMs are very prone to hallucinate, especially when we ask them something they do not know. Thus, if we use a fine-tuning strategy and ask them to explain something that is not in any document, they may pull a fast one and start making things up. However, if we have a chatbot for customer service, we need it to be very flexible and versatile, because it must deal with many different situations. In this case, fine-tuning would be much more suitable than RAG.

The refresh rate of our data is also important

If we are mainly interested in taking past data into account, knowing that the rate of new data will be very low, it may make sense to adopt fine-tuning; but if we want to incorporate information at high frequency (hours, days), it is much better to go for RAG. The problem is that fine-tuning "freezes" the incoming data, so every time we want to incorporate new information we would have to re-train the model, which is very inefficient (and expensive).

A compromise between the two approaches

There is a middle ground between the two approaches: fine-tuning to incorporate a large corpus of past information from the company or customer, and using RAG on top of this more versatile and adapted model to take recent data into account.

◾ Following the chatbot example, this would allow us to have a virtual assistant capable of dealing with a range of situations based on the history of resolved incidents, while at the same time having up-to-date information on the status of open incidents.

Once we have all these factors in mind, we can choose one of the strategies, or combine them, in order to create a customized ChatGPT for different customers, thus generating an attractive value proposition with different use cases.

References:
https://www.linkedin.com/pulse/rag-vs-finetuning-your-best-approach-boost-llm-application-saha/
https://rito.hashnode.dev/fine-tuning-vs-rag-retrieval-augmented-generation
https://neo4j.com/developer-blog/fine-tuning-retrieval-augmented-generation/
https://towardsdatascience.com/rag-vs-finetuning-which-is-the-best-tool-to-boost-your-llm-application-94654b1eaba7
August 14, 2024
AI & Data
Surviving FOMO in the age of Generative AI
The FOMO ('fear of missing out') or anxiety you are feeling with Generative AI... it's not just you: LinkedIn is flooded daily with new content, every week "the best LLM in the world" is launched, every hyperscaler seems to be the only sensible option for what you want to do... In this hypersonic flight of Generative AI (GAI), it is difficult to distinguish the disruptive from the distracting, but we must not forget that technology is nothing but a tool, and that the focus of this revolution is what you can do and what you can solve, not what you use for it.

FOMO and the speed of information

GAI has been around for less than two years, since the launch of ChatGPT at the end of 2022. Far from having passed its hype peak and starting to wane, interest in the subject keeps growing and seems far from topping out.

Figure 1: Google searches for "Artificial Intelligence" (red line) and "ChatGPT" (blue line) since January 1, 2023. Source: Google Trends.

So far in 2024, there has not been a week without a "bombshell" about GAI, whether a new model, a new platform feature, merger or acquisition moves, unexpected discoveries... For example, HuggingFace, the main public repository for GAI, now hosts over 700,000 models, with more than 40,000 new ones added every month. It is exhausting for anyone working in this field to keep up. In fact, it is very common to feel what is known as FOMO, a type of social anxiety characterized by a desire to be continually connected to what others are doing.

However, in order to develop GAI use cases, is it really necessary to be hyperconnected, always on the lookout for the latest news from every GAI player? Can we be left behind? More importantly, can our use case fail because we did not keep up with the latest trend? The short answer is NO. We should not obsess over all these innovations, because no matter how fast the field evolves (it is still incipient and in its early stages), developing a GAI use case is complex and time-consuming, so it is important to focus on other aspects. In other words, we must make sure we see the wood for the trees. To this end, here are four tips.

Tip 1: Some use cases are not GAI, but 'classic' AI

When the machine learning wave emerged, many companies and organizations wanted to develop AI use cases. The problem is that, fearing they would be left behind, the tool seemed more important than the function: they were more concerned with being able to say that the use case used AI than with the use case itself. This led to many proposals that did not really need AI, but other, more established tools. In many situations, a detailed statistical analysis of the data, with a dashboard to consume the results, covered the needs perfectly and, in many cases, was the first step to mature in the discipline and later tackle an AI use case building on the knowledge gained.

Something similar is happening now: we must understand that GAI does not replace classical (non-generative) AI, but complements it, allowing us to do things that were not possible with a classical model. What's more, there are use cases common to both, such as text classification, where classical AI may have some advantages even if its raw performance is not as good. Dimensions such as the need for explainability, the computational resources required or the hallucinations of large language models (LLMs) can tip the development of a use case towards AI or GAI.

✅ Bottom line: Not all that glitters is gold.
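To illustrate this last point, here is a minimal sketch of a 'classic' text classifier with scikit-learn (TF-IDF plus logistic regression). The tiny dataset and labels are invented purely for illustration; a real project would need far more examples and proper evaluation.

```python
# A 'classic' AI baseline for text classification: TF-IDF + logistic regression.
# The toy dataset is invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "My invoice is wrong, please correct the amount",
    "The app crashes when I open the settings page",
    "I was charged twice this month",
    "The screen freezes after the latest update",
]
labels = ["billing", "technical", "billing", "technical"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["I cannot log in since yesterday's update"]))  # e.g. ['technical']
```

A model like this is cheap to run, easy to explain, and by construction it cannot hallucinate, which is exactly the trade-off mentioned above.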
Tip 2: Do not take the biggest and newest model, but the most suitable one

As mentioned above with the volume of models on HuggingFace, nowadays there are models for all tastes. Here size does matter: in general, the larger a model is (these being neural networks, size means the number of parameters), the better its generic performance. But this performance is generic because it is evaluated against known tests of general knowledge, such as general-culture questions, specialized topics, logical reasoning... All this is fine, but when we develop a specific use case, we will need to adapt the LLM to our own data, where we do not really know how well each model works.

Figure 2: Map of LLMs by size (number of parameters), reflecting their developer and year. Source: Information is Beautiful.

In this sense, it depends a lot (a lot) on what we want to do. For example, larger LLMs are more imaginative and creative, but that also makes them more prone to hallucinate. Also, as logic suggests, the more parameters the model has, the more computational resources are needed, both for training (or retraining) and for inference. Recently there have been efforts to create SLMs (Small Language Models), i.e., smaller models. This allows a huge reduction in operating costs and makes possible such interesting things as running them on devices like smartphones. Such models are much more suitable for simple tasks than a huge LLM. Alain Prost, four-time F1 world champion, used to say that the goal was to win a race at the lowest possible speed. Adapting his quote, the goal here is to make the use case work with the smallest possible model.

✅ Bottom line: Brains over brawn

Tip 3: The important thing is the methodology, not the technology

We have seen that it is not necessary to take the biggest, leading model for our use case to come to fruition. Still, with the wide range of models out there, which one should I choose? And on top of that, if I am going to develop in the cloud, which provider should I choose: Azure, Google, AWS, IBM...?

Again, we should not obsess over either of these two questions. All the hyperscalers offer very similar solutions (and all claim to be the only ones to offer them), and there will be non-technical aspects (financial, legal, corporate...) that determine the best choice. As for the model, it is much the same: whenever a new version of GPT, Claude, Llama, Gemini or Mistral comes out, each claims to be the best model so far. However, we have already seen that this evaluation (even if we believe it) is done on generic tests and may not reflect the real performance for our use case. So the particular model we use may be something to explore, but it is never going to be critical.

It is much more important to focus on the methodology. Understand what the needs of the use case are, what tools we can use, what business and technical requirements exist... and the rest will fall into place. If we have a fuzzy use case, data anarchy, lack of training, poor resource management..., no matter what model or platform we use, we will not get anywhere.

✅ Bottom line: Don't judge a book by its cover

Tip 4: Open-source models are often the best fit

Paid models are generally better than open-source ones, but again, this statement is a bit of a stretch, as "better" is hard to quantify. In a real use case, a complete solution has to be adapted, integrated and deployed, which can be a game changer.
When it comes to our pockets, things get serious, and with paid models the estimated operating cost can skyrocket. If we have a GAI application serving thousands of users, for example, a paid model will charge for inbound and outbound tokens. With an open-source model, we only need the infrastructure (on-premises or cloud), and there are no associated per-token inference fees.

Ethical, regulatory, and privacy issues are becoming more and more important. An open-source model allows both on-premises deployment and transparent fine-tuning. Hyperscalers such as Microsoft or IBM offer their own open-source models in addition to their paid ones. Mitigating possible biases is also much easier with open-source models, since we control each stage of the algorithm's life; for audits, and with the application of the EU Artificial Intelligence Act, the transparency and traceability of models is essential.

✅ Bottom line: Sharing is caring
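As a rough illustration of the token-cost point in this last tip, the back-of-the-envelope sketch below compares a pay-per-token API with a self-hosted open-source model. Every figure (prices, request volumes, GPU cost) is an invented assumption; real numbers depend on the provider, the model and the infrastructure, and the break-even point shifts with usage volume.

```python
# Back-of-the-envelope cost comparison. All figures are illustrative assumptions.
requests_per_month = 1_000_000
tokens_in, tokens_out = 500, 300             # assumed average tokens per request

# Pay-per-token API (hypothetical prices per 1,000 tokens)
price_in, price_out = 0.003, 0.006           # USD per 1k tokens, assumed
api_cost = requests_per_month * (tokens_in * price_in + tokens_out * price_out) / 1000

# Self-hosted open-source model: no per-token fee, only infrastructure
gpu_hourly_cost = 2.5                        # USD per hour for a rented GPU, assumed
hours_per_month = 24 * 30
selfhost_cost = gpu_hourly_cost * hours_per_month

print(f"Pay-per-token API: ~${api_cost:,.0f} / month")    # grows linearly with traffic
print(f"Self-hosted model: ~${selfhost_cost:,.0f} / month")  # roughly flat with traffic
```

The key design observation is that API costs scale with traffic while self-hosting costs are mostly fixed, which is why high-volume applications often tip in favour of open-source deployment.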
June 25, 2024
AI & Data
Behind-the-scenes algorithms: Hollywood and the AI technology revolution
The audiovisual industry, like so many other sectors, is being transformed at a dizzying speed by the growing influence of Artificial Intelligence (AI). The convergence of advanced algorithms and digital film production has led to a revolution, challenging the traditional conventions of the entertainment industry. This change is not free of tensions, as evidenced by the recent actors' and screenwriters' strike that shook Hollywood, marking a turning point in the relationship between human creativity and automation. In this context, regulation presents itself as an inescapable necessity, a crucial measure to balance the limitless possibilities of AI with the core values of cinematic storytelling and the creative workforce. Let's delve into the intricacies of this revolution that is already a reality.

Lights, camera... and strike! Hollywood and the digital transformation

In 2007 the Western film (and series) hub was thrown into chaos when the WGA (Writers Guild of America, the American screenwriters' union) called a historic strike. The dispute centered on an emerging scenario: the digital consumption of content. The irruption of online platforms and streaming services (many of them new digital players on an analog board), which promised a new paradigm, sparked a crucial conflict over fair compensation for screenwriters on this new terrain. Hollywood titans found themselves caught between script pages and lines of code, as screenwriters sought to adapt their contracts to the rapidly evolving digital age. This 100-day strike not only brought production to a standstill, but also exposed the cracks in the traditional business model in the face of the digital transformations that would shape the future of the entertainment industry.

In 2023 we experienced déjà vu: the breakdown of negotiations between the actors' union, SAG-AFTRA, and the Alliance of Motion Picture and Television Producers (AMPTP) triggered a new strike, the first of its kind since 1980. Tensions escalated with the simultaneous strike of screenwriters that had begun in May. SAG-AFTRA president Fran Drescher accused the AMPTP of acting in an "insulting and disrespectful" way and highlighted four key factors in the strike: economic justice, regulation of the use of Artificial Intelligence, adaptation to self-taped auditions, and residuals (additional payments received for the continued use of a work after its initial release, which were a key point in the 2007 strike, recognizing that a production generates value long after its premiere).

There were several points of tension regarding AI. One of the main ones was the use of "digital doubles", in most cases without explicit consent. Extras and background actors reported that on many occasions they had been scanned as part of their contract, without being told what for, only to find that they were not rehired because a digital model of them was available for free. SAG-AFTRA demanded regulation of this technology, with fair compensation for the people from whom a digital double is generated. Similarly, the WGA called for limits on the use of Large Language Models (LLMs) such as ChatGPT to replace real scriptwriters. It all sounds like science fiction, but it is already the daily reality of tens of thousands of professionals in the audiovisual sector, who see their jobs endangered by the unregulated and improper use of AI.
There is no doubt that this is only the beginning of a long road towards mutual understanding and a new production model that enables digital transformation in a humane, sustainable, and ethical way. Here are some recent examples of the use of AI in cinema and television.

We're living in the future

AI has always been part of the cinematic imagination, and science fiction features hundreds of movies on the theme, from Metropolis or Blade Runner to Terminator and WarGames. In recent years these movies have become more realistic, as what once seemed pure fantasy begins to be perceived as a reality that already exists or is very close. Just look at how two of the most critically acclaimed movies of 2023, The Creator and Mission: Impossible - Dead Reckoning, have AI as a central theme (in fact, Joe Biden acknowledged that the latter motivated him to pass the new U.S. regulations on AI).

On the small screen, the very dark Black Mirror posed a great dilemma that seemed to predict (once again) what happened later with the strike. Spoiler alert! In the first episode of its sixth season, Joan Is Awful, a streaming platform (suspiciously similar to Netflix) uses AI to generate, almost in real time, a series narrating the day of a user, starring a digital double of actress Salma Hayek. Seeing her privacy exposed in this way, the only way for the user to get the platform's attention is to do extremely ridiculous and humiliating things, so that Salma Hayek herself complains about appearing on screen performing actions she does not agree with and is not being paid for. Considering the previous section, does it sound familiar?

But we do not even need to talk about movies or series about AI, because AI already creates them (partially, although it is only a matter of time before it can do so completely), generally with some controversy. Disney has clearly taken the lead in this field, as early as 2020 with The Mandalorian and the rejuvenation of Mark Hamill through deepfake. More recently, Marvel's Loki series has AI-generated promotional art, and Secret Invasion, also from Marvel, employs AI in its credits, with a somewhat questionable result (although defended by the showrunners). Lastly, the Disney+ film Prom Pact (https://nypost.com/2023/10/15/disneys-prom-pact-has-audiences-cringing-at-ai-actors/) used AI-generated extras in one scene, with a more than objectionable finish in this case. These digital doubles were also mostly of non-white ethnicities, thus eliminating job opportunities for African American or Asian performers and further deepening the complex ethical dimension of unregulated AI.

A less controversial application of AI in cinema is the possibility of restoring and even rescuing virtually lost films, in particular celluloid prints from before 1930. Where traditional techniques fall short, AI proves capable of going further and sharing with the public a cinema that could not otherwise exist. For example, the mega-documentary Get Back (also on Disney+) used AI to restore footage in very poor condition, especially in terms of sound quality. In this way, it was possible to isolate each instrument separately, reaching an excellent audio standard.
In the same way, these techniques made it possible for us to witness, in 2023, the release of a new song by The Beatles, Now and Then, by repairing John Lennon's voice from a 1977 home recording and making it sound as if it had been recorded in a studio.

An inevitable revolution, an essential regulation

The SAG-AFTRA strike eventually came to an end with the ratification of a new agreement in December 2023. Within it, there are five whole pages dedicated solely to the role of AI, stipulating ethical and legal boundaries and requiring both explicit consent and fair remuneration for the people affected by the application of these tools.

It is clear that AI is here to stay. If we look in the rear-view mirror, we realize that, although it may seem as if ChatGPT has been with us all our lives, it was launched in November 2022. We are still in the early stages of what promises to be a new technological revolution that will completely transform the audiovisual sector. At this fascinating crossroads between art and algorithms, we remember the words of Arthur C. Clarke: "Any sufficiently advanced technology is indistinguishable from magic." And there is a reason why people always talk about "the magic of cinema".

Innovation will always be ahead of our inertia as human beings, so it is normal for there to be tensions at the beginning of any paradigm shift. It is enough to remember that the 1982 classic Tron, a pioneer in the use of CGI (computer-generated imagery), could not be nominated for the Oscar for Best Visual Effects because the Film Academy considered that "using a computer is cheating".

In a year in which we have had these historic strikes in the audiovisual sector, with a strong focus on the impact of AI, we have also witnessed the first commercial applications of this technology, not free of ethical considerations and legal problems. In this sense, clear AI regulation positioned in favor of human workers is completely necessary, to adapt to this new paradigm just as we adapted more than 15 years ago to the arrival of streaming platforms and 100% digital content. The revolution, coming soon to the best cinemas.

References:
SAG-AFTRA's new contract hinges on studios acting responsibly with AI - The Verge
Marvel's "Secret Invasion" AI Art Sparks Controversy in Opening Credits - The Generator (Medium)
AI movie restoration - Scarlett O'Hara HD - deepsense.ai
Actors Approve SAG-AFTRA Deal That Ended 118-Day Strike - People
January 29, 2024
AI & Data
Big Data in basic research: from elementary particles to black holes
The Big Data paradigm has profoundly penetrated every layer of our society, changing the way we interact with each other and the way technological projects are carried out. Basic research, specifically in the field of physics, has not been immune to this change over the last two decades and has adapted to incorporate this new model into the exploitation of data from leading experiments. Here we will look at the impact of Big Data on three of the major milestones of modern physics.

(1) Large Hadron Collider: the precursor of Big Data

One of the buzzwords of 2012 was the "Higgs boson", that mysterious particle that we were told was responsible (more or less) for the mass of all other known particles and that had been discovered that same year. But in terms of media attention, much of the focus was on the instrument that enabled the discovery: the Large Hadron Collider, or LHC, at the European Organization for Nuclear Research (CERN).

The LHC is a particle accelerator and is probably the most complex machine ever built by humans, costing some €7.5 billion. A 27 km long ring buried at an average depth of 100 metres under the border between Switzerland and France, it uses superconducting electromagnets to accelerate protons to 99.9999991% of the speed of light (i.e., in one second they go around the ring more than 11,000 times). By colliding protons at these delirious speeds, we can create new particles and study their properties. One such particle was the Higgs boson. Because protons are so tiny, they are not collided one by one; instead, large bunches are launched against each other, resulting in about 1 billion collisions per second. All these collisions are recorded as individual events. Thousands of individual particles can be produced in a single collision, and they are characterised in real time (well below a millisecond) by the detectors, which collect information such as trajectory, energy, momentum, etc.

Massive amounts of data

As we can imagine, this produces an enormous amount of data: over 50,000-70,000 TB per year of raw data. And that is just from the main detectors, as there are other secondary experiments at the LHC. Because it does not operate every day of the year, this works out to an average of 200-300 TB per day of operation; a complicated, but feasible, volume to handle today. The problem is that the LHC came into operation in 2008, when Big Data was a very new concept, so a lot of technology had to be developed ad hoc. It was not the first time: the World Wide Web itself was born at CERN.

The Worldwide LHC Computing Grid (WLCG), a network of 170 computing centres in 42 countries, was established in 2003, with a total of 250,000 available cores allowing more than 1 billion hours of computing per year. Depending on its technical characteristics, each node in this network can be dedicated to data storage, processing or analysis. To ensure good coordination between them, a three-tier hierarchical system was chosen: Tier 0 at CERN, Tier 1 at several regional sites, and Tier 2 at centres with very good connectivity between them.

Image: the LHC control room. Photo: Maximilien Brice, CERN.

Spain hosts several of these computing centres, both Tier 1 and Tier 2, located in Barcelona, Cantabria, Madrid, Santiago de Compostela and Valencia.
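As a quick sanity check, the figures quoted above are roughly self-consistent; the short calculation below uses only the approximate numbers mentioned in the text.

```python
# Rough consistency check of the LHC figures quoted above (all values approximate).
raw_per_year_tb = 60_000      # midpoint of the 50,000-70,000 TB/year range
daily_average_tb = 250        # midpoint of the 200-300 TB/day range

operating_days = raw_per_year_tb / daily_average_tb
print(f"Implied operating days per year: ~{operating_days:.0f}")  # ~240 days
```

Around 240 operating days a year is consistent with an accelerator that, as the text notes, does not run every day of the year.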
One of the things this large volume of data has fostered is the application of machine learning and artificial intelligence algorithms to search for physics beyond what is currently known, but that is a story for another day...

(2) James Webb Space Telescope: the present and future of astrophysics

The LHC explores the basic building blocks of our Universe: the elementary particles. Now we travel to the opposite extreme, studying stars and entire galaxies. Leaving aside the remarkable advances in neutrino and gravitational-wave astronomy in recent years, if we want to observe the Universe, we will do so with a telescope. Due to the Earth's rotation, a "traditional" telescope can only observe at night. In addition, the atmosphere degrades the quality of the images when we are looking for sharpness in very small or faint signals. Wouldn't it be wonderful to have a telescope in space, where these factors disappear? That is what NASA thought in the late 1980s, launching the Hubble Space Telescope in 1990, which has produced (and continues to produce) the most spectacular images of the cosmos.

A couple of decades ago NASA considered what the next step should be, and began designing its successor, the James Webb Space Telescope (JWST), launched on 25 December 2021 and currently undergoing calibration. With a large number of technical innovations and patents, it was decided to place JWST at the L2 Lagrange point, four times further away from us than the Moon. At such a distance, it is completely unfeasible to send a crewed mission to make repairs, as was done with Hubble, which orbits at "only" 559 km from the Earth's surface.

Image: NASA's James Webb Space Telescope main mirror. Image credit: NASA/MSFC/David Higginbotham.

One of the biggest design challenges was data transmission. Although JWST carries shields to insulate the telescope thermally, because it is so far from the Earth's magnetosphere, the drive that records the data must be an SSD (to ensure transmission speed) with high protection against solar radiation and cosmic rays, since it must be able to operate continuously for at least 10 years. This compromises the capacity of the drive, a modest 60 GB. Given the large volume of data collected during observations, this capacity can be reached after about 3 hours of measurements. JWST is expected to perform two data downloads per day, in addition to receiving pointing instructions and sensor readings from its various components, with a transmission rate of about 30 Mbit/s.

Compared with the LHC's figures this may seem insignificant, but we must not forget that JWST orbits 1.5 million kilometres from Earth, in a tremendously hostile environment, with temperatures of about 30°C on the Sun-facing side and -220°C on the shadow side. It is an unparalleled technical prodigy producing more than 20 TB of raw data per year, which will keep the astrophysical community busy for years to come, with robust and sophisticated machine learning algorithms already in place to exploit all this data.

(3) Event Horizon Telescope: good old-fashioned Big Data

Both the LHC and JWST rely on fast and efficient data transmission for processing. However, sometimes it is not so easy to get "five bars of WiFi".
How many times have we been frustrated when a YouTube video froze and buffered because of a poor connection? Now imagine that instead of a simple video we need to move 5 PB of data. This is the problem encountered by the Event Horizon Telescope (EHT), which in 2019 published the first picture of a black hole. This instrument is actually a network of seven radio telescopes around the world (one of them in Spain), which joined forces to perform a simultaneous observation of the supermassive black hole at the centre of the galaxy M87 over 4 days in 2017.

Over the course of the observations, each telescope generated about 700 TB of data, resulting in a total of 5 PB scattered over three continents. The challenge was to combine all this information in one place for analysis, which it was decided to centralise in Germany. In contrast to the LHC, the infrastructure for data transfer at this level did not exist, nor was it worth developing for a one-off use case. It was therefore decided to physically transport the hard disks by air, sea and land. One of the radio telescopes was located in Antarctica, and it was necessary to wait until the summer for the partial thaw to allow physical access to its hard disks.

Image: researcher Katie Bouman (MIT), who led the development of the algorithm used to obtain the black hole picture with the EHT, posing proudly with the project's hard disks.

In total, half a tonne of storage media was transported, processed and analysed to generate the now-familiar image of less than 1 MB. Explaining the technique required to achieve this would take several posts of its own. What matters here is that sometimes it is more important to be pragmatic than hyper-technological. Although our world has changed radically in so many ways thanks to Big Data, sometimes it is worth giving our project a vintage touch and imitating those observatories of a century ago that transported huge photographic plates from telescopes to universities to be properly studied and analysed.

Featured image: the polarised view of the black hole in M87. The lines mark the orientation of polarisation, which is related to the magnetic field around the shadow of the black hole. Photo: EHT Collaboration.
April 3, 2023
AI & Data
Ghosts in the machine: does Artificial Intelligence suffer from hallucinations?
Artificial Intelligence (AI) content generation tools such as ChatGPT or Midjourney have recently been making a lot of headlines. At first glance, it might even appear that machines "think" like humans when it comes to understanding the instructions given to them. However, details that are elementary for a human being turn out to be completely wrong in these tools. Is it possible that the algorithms are suffering from hallucinations?

Science and (sometimes) fiction

2022 was the year of Artificial Intelligence: we saw, among other things, the democratisation of image generation from text, a Princess of Asturias award, and the world going crazy talking to a machine that is here to stay: OpenAI's ChatGPT. It is not the aim of this article to explain how this tool works (that is outlined in Artificial Intelligence in Fiction: The Bestiary Chronicles, by Steve Coulson; spoiler: that piece was written by the AI itself), but, in short, it tries to imitate a person in any conversation. With the added bonus that it can answer almost any question we ask it, from what the weather is like in California in October to defending or criticising dialectical materialism in an essay (and it would approach both positions with equal confidence). Why browse through pages of results looking for specific information when we can simply ask questions in a natural way?

The same applies to AI image generation algorithms such as Midjourney, Dall-e, Stable Diffusion or BlueWillow. These tools are similar to ChatGPT in that they take text as input, creating high-quality images.

Examples of the consequences of a hallucinating Artificial Intelligence

Leaving aside the crucial ethical aspect of these algorithms (some of which have already been sued for being trained on copyrighted content without permission), the content they generate may sometimes seem real, but only in appearance. For instance, we can ask one of them to picture The Simpsons as a sitcom from the 1980s. Indeed, it all seems disturbingly real, even if those images haunt us in our nightmares. Or we can ask it to generate images of a party. At first glance we would not know whether they are real or not, as they look like photos with an Instagram or Polaroid filter. However, as soon as we start to look more closely, we see details that do not quite add up: mouths with more teeth than usual, hands with 8 fingers, limbs sticking out of unexpected places... none of these fake photos passes a close visual examination.

This is because, basically, all the AI does is learn patterns; it does not really understand what it is seeing. If we train it with 10 million images of people at parties, it will recognise many patterns: people are often talking, in various postures, holding glasses, posing with other people... but it is unable to understand that a human being has 5 fingers, so when it comes to creating an image of someone holding a glass or a camera, it simply "messes up". But perhaps we are asking too much of the AI with images. If you are a drawing hobbyist, you will know how difficult it is to draw realistic hands holding objects.

Photo: Ian Dooley / Unsplash

What about ChatGPT? If it is able to write an article for this blog, surely it does not make mistakes like that. And yet ChatGPT is tremendously easy to fool, which in itself is not particularly serious. The worrying part is that it is also very easy for it to fool us without us realising it.
And if the results of a web search are going to depend on it, that is much more worrying. In fact, ChatGPT has been tested by hundreds of people all over the world in exams ranging from early-childhood education tests to university and entrance exams. In Spain, it was subjected to the History test of the EVAU (the Spanish university entrance exam), in which it got a pass mark. "Ambiguous answers", "digressions into other unrelated subjects", "circular repetitions", "incomplete"... were some of the comments that professional examiners made about its answers.

A few examples: if we ask what the largest country in Central America is, it might credibly tell us that it is Guatemala, when in fact it is Nicaragua. It may also confuse two opposing concepts, so that if we wanted to understand the differences between them, it would mislead us. If, for example, we were to use this tool to find out whether we can eat a certain family of foods when we have diabetes and it gave us the wrong answer, we would have a very serious problem. If we ask it to generate an essay and cite papers on the subject, it is very likely to mix articles that exist with invented ones, with no trivial way of telling them apart. Or, if we ask about a scientific phenomenon that does not exist, such as the "inverted cycloidal electromagnon", it will invent a twisted explanation accompanied by completely non-existent articles that will even make us doubt whether such a concept actually exists. A quick Google search, however, would quickly reveal that the name is an invention.

That is, ChatGPT is suffering from what is called "AI hallucination": a phenomenon that mimics hallucinations in humans, in which the model behaves erratically and asserts as valid statements that are completely false or irrational.

Do androids hallucinate electric sheep?

So, what is going on? As we said before, the problem is that the AI is tremendously clever at some things, but terribly stupid at others. ChatGPT handles lies, irony and other twists of language very badly. The challenge then lies in keeping a critical spirit and distinguishing what is real from what is not (much as happens today with fake news). In short, the AI will not hold back: if the question we ask is direct, precise, and real, it will give us a very good answer. But if not, it will make up an answer with equal confidence. When asked about the lyrics of Bob Dylan's "Like a Rolling Stone", it will give us the full lyrics without any problem. But if we get the wrong Bob and claim that the song is by Bob Marley, it will pull a whole new song out of its hat. A sane human being would reply "I don't know that song" or "isn't that Dylan's?", or something similar. But the AI lacks that basic understanding of the question. As language and AI expert Gary Marcus points out, "current systems suffer from compositionality problems; they are incapable of understanding a whole in terms of its parts".

Platforms such as Stack Overflow, a forum for programming and technology questions, have already banned the use of this tool to generate automatic answers, as in many cases its solutions are incomplete, erroneous or irrelevant. Meanwhile, OpenAI has hundreds of programmers explaining step-by-step solutions to create a training set for the tool.
The phenomenon of hallucination in Artificial Intelligence is not fully understood at a fundamental level. This is partly because the algorithms behind it are sophisticated deep learning neural networks. Although extremely complex, at its core such a network is nothing more than a web of billions of individual "neurons", which are activated or not depending on the input parameters, mimicking the workings of the human brain. In other words, linear algebra, but on a grand scale.

The idea is to break down a very complicated problem into billions of trivial problems. The big advantage is that it gives us incredible answers once the network is trained, but at the cost of having no idea what is going on internally. A Nature study, for example, showed that a neural network was able to distinguish whether an eye belonged to a male or a female person, despite the fact that it is not known whether there are anatomical differences between the two. Or, a potentially very dangerous example, one in which a single facial photo was used to classify people as heterosexual or homosexual.

Who watches the watchmen?

So, if we are not able to understand what is going on behind the scenes, how can we diagnose hallucination, and how can we prevent it? The short answer is that, right now, we cannot. And that is a problem, as AI is increasingly present in our everyday lives. Getting a job, being granted credit by a bank, verifying our identity online, or being considered a threat by the government are all increasingly automated decisions. If our lives are going to have such an intimate relationship with AI, we had better make sure it knows what it is doing. Other algorithms for text generation and image classification have had to be deactivated after turning out to be neo-Nazi, racist, sexist, homophobic... biases they learned from humans. In a sort of Asimov tale, let's imagine that, in an attempt to make politics "objective", we let an AI make government decisions. We can imagine what would happen.

Although some people point to a lack of training data as the cause of hallucinations, this does not seem to be the case in many situations. In fact, we are reaching a point where exhausting the datasphere (the volume of relevant data available) is beginning to appear on the horizon. That is, we will no longer gain much by increasing the training set. The solution may then have to wait for the next revolution in algorithms, a new approach to the problem that is currently unimaginable. This revolution may come in the form of quantum computing.

Perhaps in the near future a machine will be able to really understand any question. Maybe not. It is very difficult and daring to make long-term technological predictions. After all, the New York Times wrote in 1936 that it would be impossible to leave the Earth's atmosphere, and 33 years later Neil Armstrong was walking on the Moon. Who knows, maybe in a few decades it will be AI that diagnoses why humans "hallucinate"...

References:
https://www.unite.ai/preventing-hallucination-in-gpt-3-and-other-complex-language-models/
https://nautil.us/deep-learning-is-hitting-a-wall-238440/
https://medium.com/analytics-vidhya/what-happens-when-neural-networks-hallucinate-9bd0d4594943

Featured photo: Pier Monzon / Unsplash
February 20, 2023
AI & Data
10-minute delivery: how Artificial Intelligence optimises delivery routes
Nowadays, speed and immediacy are a necessity for almost any company, especially those in the logistics sector dedicated to the transport and delivery of goods. Due to the high volume of orders, it is essential to optimise the entire process, including physical delivery, and even to react in real time to possible unforeseen events. This is possible with an Artificial Intelligence of Things (AIoT) platform, which combines the Internet of Things, Big Data and Artificial Intelligence.

Analytics as a planning tool

How many times have we taken the car and found ourselves stuck in an unexpected traffic jam in the city? Especially during rush hour, or if there is an event in the area, it is quite possible for a 10-minute drive to turn into a frustrating half hour of stop-and-go. Now imagine that instead of going from point A to point B, we have to move around the city constantly, as a transport company delivering goods does. In this situation, possible delays would accumulate one after another, seriously affecting our logistical planning. We could fantasise about imitating films such as the remake of The Italian Job (2003), where, in order to get through the city in the shortest possible time, the protagonists hack the traffic lights so that they turn green when needed. The dark side of this idea is also found in cinema: in Die Hard 4.0 (2007), a cyber-terrorist paralyses several cities by turning all the traffic lights green simultaneously, creating hundreds of accidents.

Optimising delivery routes with Smart Mobility

While staying within the law, there are different ways to optimise our routes, both in real time and by predicting possible delays, with so-called Smart Mobility. The first step, if we want to work in real time, is to fit our delivery fleet with sensors: the so-called Internet of Things (IoT). In general, these sensors are connected simply and non-invasively to the vehicle's OBD (On-Board Diagnostics) port. In this way, we can know the status of our entire fleet at all times and have full traceability. If a delivery vehicle deviates from its route, runs out of battery, suffers a breakdown or exceeds the maximum speed, the system will send an immediate alert. In recent years the cost of this IoT infrastructure has fallen drastically. Today, the sensors themselves, the network connection and the information processing platform are very affordable at the enterprise level, with packaged solutions from the leading cloud service providers.

All this, moreover, comes with the highest standards of security and privacy, using technologies such as Blockchain. With it, we can have real-time tracking and tracing of any goods on their route, including environmental conditions (humidity, temperature, pressure, vibrations, etc.), with alerts if certain parameters are exceeded, as well as detection of possible tampering or opening.

The next challenge is to plan the route for each of these delivery vehicles. This is made possible by combining IoT and Artificial Intelligence (AI) in the AIoT platform. By combining IoT sensor data with advanced AI analytics, economic, operational and energy factors are taken into account to increase operational efficiency. The optimal route (i.e., the one with the lowest time or fuel consumption) is not necessarily the shortest in distance.
For example, if there are tolls, the route with the lowest overall cost may be one that involves a small detour to avoid the toll road. When assigning deliveries to the different vehicles and determining the best route, the AI will consider parameters such as the combination of packages to be delivered, delivery or collection times, product characteristics, load volume, vehicle type and the information from its sensors, etc.

All this data is internal, i.e., information generated by the company itself. However, we can enrich it by incorporating external sources, and this new knowledge can be critical when planning our route. In general, the more data we have (as long as it is relevant and of good quality), both in variety and in extent, the better the prediction the AI can make, as it will use more information in its decisions. For example, we can add weather information, to predict whether a big snowstorm or torrential rain may affect the logistics chain; in such a case, the optimal route weather-wise may deviate considerably from the base route. Another important external source is the calendar of public holidays, events or incidents (road closures due to sporting events, demonstrations, festivals, etc.). Finally, statistical traffic data can be used to predict traffic jams, according to geography, time of year, time of day, etc. Thus, the AI will design the optimal route considering all these boundary conditions. Still, this only allows us to plan our route a priori; we will not be able to react in real time to unforeseen events. Or will we?

AI reflexes

Let's now imagine that we have our route perfectly designed and optimised, taking into account all the relevant factors. However, if an accident blocks a street, or a major traffic jam appears that we did not expect, we would suffer an unforeseen delay. Is there a way to react to this in real time? This is where services like Telefónica Tech's Smart Steps come into play. With this technology, it is possible to geolocate mobile devices, either through the mobile network or through the WiFi network.

Video: https://www.youtube.com/watch?v=RrW0c_6svJw

This makes it possible, for example, to see whether a shop or a street is very busy at a given moment, by analysing the movement patterns of individual devices. Always with anonymised data, as it is only relevant in aggregate, it is possible to calculate footfall using both streaming and historical data. This also makes it possible to estimate traffic density in real time. For example, if there is a major traffic jam, Smart Steps will detect that devices on that road are moving in fits and starts, very slowly, and generate a traffic-jam alert. With all this information, the AI can update the planning in real time, i.e., it can be prescriptive, as sketched below. Imagine, for example, that we are in a city centre making deliveries in neighbourhood A, but in a while we will be moving on to neighbourhood B.
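Here is a minimal sketch of this weighted-routing idea, where the cost of each road segment combines distance with factors such as congestion and tolls, and re-running the search with updated weights is what lets the plan change on the go. The tiny road network, the cost formula and all the weights are invented for illustration and are unrelated to any real product.

```python
# Route planning on a weighted graph: edge cost = distance adjusted by traffic, plus tolls.
# The toy road network and all weights are invented for illustration.
import networkx as nx

G = nx.DiGraph()
# (from, to, km, traffic multiplier, toll in EUR)
edges = [
    ("depot", "A", 4.0, 1.0, 0.0),
    ("depot", "B", 2.5, 1.8, 0.0),      # shorter but congested
    ("A", "customer", 3.0, 1.0, 2.0),   # includes a toll
    ("B", "customer", 3.5, 1.2, 0.0),
]
for u, v, km, traffic, toll in edges:
    # Simplistic cost model: distance weighted by congestion, plus any toll.
    G.add_edge(u, v, cost=km * traffic + toll)

print(nx.shortest_path(G, "depot", "customer", weight="cost"))
# e.g. ['depot', 'B', 'customer'] with these assumed weights

# If a sensor or footfall feed reports a jam on B -> customer, update the weight and re-plan:
G["B"]["customer"]["cost"] *= 3
print(nx.shortest_path(G, "depot", "customer", weight="cost"))
# the route switches to go via A
```

In a real deployment the graph would come from a map provider and the traffic multipliers from the IoT and footfall feeds described above; the re-planning step is the same.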
If an accident has occurred on the pre-calculated optimal route and generated a traffic jam, the AI will use all this information in real time to design a new itinerary on the go, modify delivery times, reprioritise the orders, send a message to the end customer with possible updates, etc. The main advantage over a human reaction is that the AI has all the information available and will therefore make a better decision.

In short, the AIoT platform offers differential value to any company seeking to increase the operational efficiency of its logistics processes, with full traceability of its vehicle fleet, optimisation of delivery routes and a system of real-time alerts in case of unforeseen events.
September 26, 2022
AI & Data
Endless worlds, realistic worlds: procedural generation and artificial intelligence in video games
In this post we will talk about how to automatically create realistic environments in virtual worlds. As an example, we will use the video game No Man's Sky, by Hello Games, which in 2016 created entire galaxies and planets on a realistic scale with a simple algorithm, all of them fully accessible and different. As if that were not enough, we can also add artificial intelligence to the equation, which will be a revolution never seen before in the world of video games.

Infinite monkeys, endless worlds

A famous thought experiment known as the "infinite monkey theorem" says that, if we set an infinite number of monkeys typing on a computer for an infinite time, at some point one of them will write Don Quixote. By pure and simple chance. Any book, even Cervantes' magnum opus, is a very long string made up of a finite number of characters, such as the letters of the alphabet. In other words, given infinite time, everything can and must happen.

We can ask ourselves whether this experiment can be extrapolated to other types of content. One approach comes in the form of what is known as procedural generation. This means that, starting from a sufficiently complex algorithm, its result can be randomised so that each time it is executed, the output is different. This kind of "unexpected result" has been applied not only in science, but also in the arts, such as music or painting. However, it is in the world of video games where it has found special appeal.

A sandbox (i.e., open-world) video game requires modelling on a tremendous scale. After all, we are trying to mimic an entire world in a virtual environment. Due to purely technical and development constraints, most sandbox games had a limited number of periodically repeated elements. Just as Neo in The Matrix would see a glitch in the form of a cat déjà vu, we would start to see the same textures, the same trees, the same faces over and over again throughout the game. The possibility of randomising these elements, just as in real life, was all too tempting.

Although there have been more than a few attempts at procedural generation since the 1980s, probably the prime example, owing to its exorbitant scale, is the video game No Man's Sky, developed by Hello Games and released in 2016 for various platforms. In this game, we wake up on an unknown planet with a broken spaceship, and our first mission is to find resources to repair it. So far, so conventional. We quickly realise that, unlike in other games, if we start walking in a straight line there are no invisible barriers, insurmountable obstacles, or anything else preventing us from leaving the modelled area. In fact, we could walk all the way around the planet if we wanted to, encountering extravagant fauna and flora everywhere. When we manage to get off the planet, we see that it has a natural scale, i.e., comparable in size to Mars or the Earth. In this strange solar system we find more planets and moons, to which we can travel by means of a fictitious warp drive (or "hyperspace", as you prefer). Landing on another of these worlds, we find a new place to explore, different from the previous one in climate, landscape, fauna, flora, possible intelligent civilisations, and so on. The final twist comes when we wonder whether we can also leave this solar system, or even the galaxy. We then discover that the game contains 255 individual galaxies, with a total of 18,446,744,073,709,551,616 planets, all of them accessible and different from each other.
That number, in case anyone does not feel like counting commas, is about 18 quintillion. If 100 people visited one planet per second, it would take about 5 billion years to visit them all. That is roughly the age of planet Earth. Hello Games managed to create an entire universe of near-infinite possibilities without having to explicitly model a single planet: it only used procedural generation to combine individual elements in different ways. No two planets are identical, nor do they have the same fauna, flora or civilisations. In fact, with online play, each player can discover planets and name them, or visit a friend in the underwater base he or she has built in a particularly peculiar solar system. The planets are the same for everyone, because the algorithm is deterministic: it is only the initial planet assignment that is completely random (a toy sketch of this idea is shown below). As a curiosity, even the game's soundtrack is procedurally generated, based on thousands of samples from the band 65daysofstatic.

Ender's (video) game

No Man's Sky is an outstanding example of procedural generation in video games, but it has been 6 years since it was released. How can we go further? This is where Artificial Intelligence (AI) comes in. In video games, AI usually refers to the behaviour of NPCs (Non-Playable Characters), whether friends, enemies or neutrals. For example, in a racing game like Gran Turismo, it governs the reaction of the other cars to the player's actions: does the machine drive with excellent skill, or a mediocre one? It is interesting to see how little AI has evolved in video games. Most actions are predictable as soon as we learn the pattern. Even combat games known for their high difficulty (such as Hollow Knight, Cuphead or Dark Souls) present conceptually very simple battles, where the only real challenge lies in our human ability to execute a specific sequence of commands on the controller or keyboard at the exact time. The same goes for the realism of NPCs when talking to the player: they have a limited number of lines of dialogue and animations, and it is typical to exhaust them within a few interactions, something that would never happen in the real world.

This will change radically with the application of AI, specifically deep learning. These algorithms will allow studios not only to have an invaluable programming tool for their works, but also to autonomously generate concept art, dialogue or even entire games from scratch. In other words, procedural generation, but instead of being subject to a deterministic algorithm, it will be done organically and realistically, just as a human being would do it. Character behaviour will be learned from our gameplay and implemented in real time. Realism will be extreme in terms of interaction with NPCs, as there will be infinite lines of dialogue: we will not be limited to choosing from a few predefined options but will be able to engage in natural conversations with any character. In addition, software such as StyleGAN, designed by NVIDIA and released as open source in 2019, allows the creation of photorealistic faces with a Generative Adversarial Network (GAN), exponentially increasing the immersion in the proposed narrative. In a way, each person will play a different game, as the same work will be configured for that particular player. Because the AI will always be learning, not only will it constantly generate new content for the game, but in a way the game will never be "finished"; only when we leave it will it stop building and updating itself.
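As promised above, here is a toy sketch of seed-based deterministic procedural generation: hashing a planet's coordinates yields a seed, so every player who visits the same coordinates sees the same world without it ever being stored. The attribute lists and ranges are invented for illustration and have nothing to do with Hello Games' actual implementation.

```python
# Toy procedural generation: planet attributes derived deterministically from coordinates.
# Attribute lists and ranges are invented; this is not how No Man's Sky actually works.
import hashlib
import random

BIOMES = ["desert", "ocean", "frozen", "lush", "toxic", "volcanic"]
FAUNA = ["none", "sparse", "abundant"]

def generate_planet(galaxy: int, system: int, planet: int) -> dict:
    """The same coordinates always produce the same planet, with no data stored anywhere."""
    seed = hashlib.sha256(f"{galaxy}:{system}:{planet}".encode()).hexdigest()
    rng = random.Random(seed)          # deterministic pseudo-random generator
    return {
        "biome": rng.choice(BIOMES),
        "fauna": rng.choice(FAUNA),
        "radius_km": rng.randint(2_000, 8_000),
        "moons": rng.randint(0, 4),
    }

# Every player asking for galaxy 7, system 1234, planet 2 gets exactly the same world.
print(generate_planet(7, 1234, 2))
print(generate_planet(7, 1234, 2) == generate_planet(7, 1234, 2))  # True
```

The trick scales to any number of planets precisely because nothing is stored: each world is recomputed from its seed on demand.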
However, we must be cautious when applying deep learning to video games. For example, an enemy that learns from our moves could quickly become invincible, as it will quickly spot the weaknesses in our strategy and adapt its style, as is the case with Sophy, the new AI in Gran Turismo, which is capable of defeating professional drivers. Only time will tell how far the combination of procedural generation and AI can go, but it is clear that the future will be very realistic.
August 22, 2022