Eduardo Fernández García

Telecommunications Engineer from the Universidad Politécnica de Madrid with a master's degree in Data Science & Generative AI. I currently work as a Data Scientist at Telefónica Tech. Passionate about mathematics, new technologies and cycling.

AI & Data
Detecting the undetectable: AI tools to identify AI-generated content
In just a few years, Generative AI has gone from being a technological curiosity to something we use every day, from assistants that write emails to chatbots that hold conversations that feel human. AI has radically changed how people work, write and think: what used to take hours can now be done in seconds with a well-crafted prompt. This democratisation of content creation has opened up infinite possibilities. But as with every technological revolution, not everything is positive. The same ease that makes AI such a powerful tool also creates a problem: how can we tell whether a text was written by a human or by an AI?

How AI-generated content detectors work

This question has important practical implications. When content authenticity matters (for example, when verifying authorship in education), it is essential to ensure that the work is original. That's why we need reliable tools, and this is where AI-generated content detectors come into play: tools that, interestingly, use AI itself to detect its own presence. But how exactly do they work?

AI text detectors are based on something that may seem contradictory: using AI to identify content created by other AIs. Although it sounds paradoxical, language models leave distinctive "digital fingerprints" in their texts, patterns that can be recognised and analysed.

AI-generated content detectors use artificial intelligence itself to unmask what other AIs have created.

Perplexity and burstiness: the keys to detecting AI text

To understand how they work, we need to explain two key concepts (both are sketched in code a little further below).

The first is perplexity, which measures how predictable a text is. AI tends to choose likely words, generating predictable texts with low perplexity. Humans, on the other hand, are more unpredictable: we use unexpected turns of phrase or less obvious words, and that's why our texts show higher perplexity.

The second concept is burstiness, or variability: when we write, we alternate between short and long sentences and between simple and complex ones, while AI maintains a fairly uniform rhythm.

AI generates predictable texts with low perplexity, while humans write in a more varied and unpredictable way.

In terms of technology, these detectors use transformer-based models, the same technology that powers systems like GPT or Claude. The most common ones are adapted versions of BERT, RoBERTa or DistilBERT, trained on millions of examples of both human and AI-generated text. They also analyse embeddings (mathematical representations of text), since human and AI-generated texts tend to cluster in different areas of the vector space.

They also rely on complementary statistical techniques: repetition pattern analysis, vocabulary diversity measurement (humans use more variety), and syntactic complexity analysis. There is also watermarking, or digital watermarks: imperceptible patterns that some AIs embed in their texts, although this only works if the text has not been edited afterwards.

Are AI text detectors really reliable?

None of these techniques is perfect. We are in a kind of technological race: as generative models become better and sound more human, they also become harder to detect. Hybrid texts (partially human-edited) complicate things significantly. And then there are false positives (human writers with very structured styles that resemble AI) and false negatives (AI texts that go undetected). The reality is that these tools work reasonably well in many cases, but they have clear limits.
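To make the two signals described above more concrete, here is a minimal Python sketch, assuming the Hugging Face transformers and torch packages are installed. It uses the small public GPT-2 model purely as an illustrative scoring model for perplexity, and a simple coefficient of variation of sentence lengths for burstiness; real detectors rely on larger, purpose-trained classifiers, so treat this as an illustration of the idea rather than a working detector.

```python
# Illustrative sketch: perplexity (how predictable a text is to a language
# model) and burstiness (how much sentence length varies). GPT-2 is an
# arbitrary choice here, used only because it is small and publicly available.
import math
import re

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

MODEL_NAME = "gpt2"  # small public model, chosen only for this example
tokenizer = GPT2TokenizerFast.from_pretrained(MODEL_NAME)
model = GPT2LMHeadModel.from_pretrained(MODEL_NAME)
model.eval()


def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2: lower means more predictable."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # With labels provided, the model returns the average cross-entropy
        # loss over the tokens; exponentiating it gives the perplexity.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return math.exp(loss.item())


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths: higher means more varied."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    var = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return (var ** 0.5) / mean


sample = (
    "Artificial intelligence is transforming how we write. "
    "It drafts, edits and summarises. But can we still tell who wrote what?"
)
print(f"Perplexity: {perplexity(sample):.1f}")
print(f"Burstiness: {burstiness(sample):.2f}")
```

In a detector, scores like these are not used in isolation: they are combined with the output of trained classifiers, and the thresholds depend heavily on text length and genre.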
These detectors perform better with long texts than with short ones, and better with general content than with highly specialised texts. In addition, they need constant updates to keep up with new generative models as they emerge. Even the best AI text detection tools are far from foolproof.

In fact, the numbers confirm this. The study Testing of detection tools for AI-generated text recently evaluated the accuracy of the main available tools, and the results are revealing: Turnitin and Compilatio achieve accuracy rates close to 80%, making them the most reliable.

How reliable are AI-generated text detectors? Source: Testing of detection tools for AI-generated text.

Other tools like Content at Scale or PlagiarismCheck barely reach 50% (equivalent to flipping a coin), while more popular ones like GPTZero or ZeroGPT score between 65 and 70%. This shows that even the best options are still far from infallible.

When does it make sense to use these detectors?

Despite their limitations, these tools are useful in several sectors. In education, many institutions are integrating them into their academic assessment processes. In journalism and digital media, they are used to manage large-scale content production. In human resources, they are used in selection processes that involve analysing large volumes of applications.

They are also gaining relevance in areas where originality is especially critical. In the audiovisual sector, public funding calls for film scripts are beginning to incorporate them in the evaluation phase. In general, they make sense wherever large amounts of text need to be analysed and content authenticity matters, provided they are combined with human judgement.

Conclusion

Detecting AI-generated content is not an exact science. It is a constantly evolving field that must always be complemented by human judgement and contextual analysis. We cannot blindly trust a percentage from a tool.

And now comes the inevitable question: was the article you just read written entirely by a human, or did it have help from an AI? I encourage you to check for yourself using one of the tools we've mentioned!
December 11, 2025
AI & Data
Green algorithms for AI sustainability
When we talk about Artificial Intelligence (AI), we often associate it with sophisticated algorithms that help us solve many of the problems we encounter in our daily lives. We also consider it one of the fundamental pillars of digital transformation. However, it is rarely linked to ethics or sustainability. That's why today we're discussing green algorithms and how they help us work more sustainably by improving energy consumption and reducing carbon emissions.

What are green algorithms?

To begin with, we must understand what green algorithms are. These are algorithms designed to achieve the same results as a more complex algorithm while consuming fewer resources. In short, they are more energy-efficient algorithms with a lower carbon footprint.

But how did we reach the point where these green algorithms became necessary? Today, more than 50% of the world's population has internet access. Billions of people access online content and services daily, demanding a telecommunications infrastructure capable of supporting that load. This demand for services is growing exponentially, and data centers must be prepared to handle it.

⚠️ While this evolution in connectivity and computing power has brought about numerous improvements and significant advances, it has also led to an increase in energy consumption.

Energy consumption impact

As of March 2024, there were 10,593 data centers worldwide. According to estimates from the International Energy Agency, their global energy consumption is around 460 TWh, accounting for nearly 2% of global electricity demand. These figures are predicted to keep rising and, in a worst-case scenario, could double by 2026 due to the advancement of trends like AI and cryptocurrencies. This projected energy consumption is equivalent to Germany's electricity use.

✅ Each data center consumes about 68,000 liters of water daily to cool its servers, which is equivalent to a person's annual water consumption. In recent years, alternative data center designs have been tested to reduce cooling costs.

CO2 emissions impact

In addition to energy consumption, data centers generate significant carbon dioxide emissions. For example:

- Streaming 30 minutes of video on a platform emits 1.6 kg of CO2, equivalent to driving approximately 10 kilometers.
- In the cryptocurrency sector, Bitcoin's carbon footprint is comparable to that of New Zealand (36.95 million tons of CO2).

✅ Each Bitcoin transaction is equivalent to the CO2 emissions of 750,000 credit card payments.

Not only do cryptocurrencies generate emissions: the electricity consumed globally by cryptocurrencies is equivalent to the Netherlands' electricity consumption.

AI's energy impact

In this context of increasing energy consumption, advances in computing power also come into play. The surge in machine learning, particularly deep learning models, is increasing the energy consumption and carbon footprint of the cloud industry.

Image by the author.

Google Flights estimates that a round-trip flight between San Francisco and New York emits 180 tons of CO2. When the emissions from training certain Large Language Models (LLMs) are compared against that flight, the results are as follows:

- The T5 natural language model accounts for 26% of the flight's emissions.
- Meena accounts for 53%.
- GShard-600B represents 2%.
- Switch Transformer accounts for 32%.
- GPT-3, from OpenAI, the company behind ChatGPT, exceeds the flight's emissions by 305%.
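As a quick sanity check on those percentages, the short sketch below converts them back into tons of CO2 using the 180-ton flight baseline quoted above. All of the inputs come from the figures in the text, so the output is illustrative arithmetic rather than new data.

```python
# Back-of-the-envelope check of the figures above: each model's training
# emissions expressed as a share of the ~180 t CO2 estimated for a
# San Francisco-New York round-trip flight.
FLIGHT_EMISSIONS_T = 180  # tons of CO2 for the round trip (figure from the text)

# Percentages of the flight baseline quoted in the article.
MODELS = {
    "T5": 26,
    "Meena": 53,
    "GShard-600B": 2,
    "Switch Transformer": 32,
    "GPT-3": 305,
}

for name, pct in MODELS.items():
    tons = FLIGHT_EMISSIONS_T * pct / 100
    print(f"{name:>18}: {pct:>3}% of the flight ≈ {tons:.0f} t CO2")

# GPT-3 at 305% of 180 t works out to roughly 550 t CO2, consistent with the
# 552 t figure from the study cited just below.
```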
✅ According to a study by the University of California, Berkeley, emissions from training GPT-3 reached 552 tons of CO2, with an energy consumption of 1,287 MWh. These figures are comparable to the average energy consumption of an American household over 120 years.

In short, it is becoming increasingly evident, both at a personal and a business level, that the carbon footprint is a global challenge. As a result, many companies are taking steps to become carbon-neutral, making changes to their activities to reduce pollution. This is where green algorithms come into play.

Green algorithms: a firm step toward sustainability

To track energy consumption during code execution, Python libraries such as CodeCarbon, Eco2AI, and Kiri can be embedded within the code. These libraries calculate the electricity consumed and the CO2 emitted during each execution, giving developers greater visibility into the emissions generated by their code (a minimal usage sketch with CodeCarbon is included at the end of this article).

Green algorithms are those built more efficiently, saving energy and reducing carbon dioxide emissions.

Some of these libraries even compare emissions with everyday equivalents, such as mileage driven in cars, hours of TV watched, or average household consumption. In some cases, they even suggest more efficient solutions, like changing the data center location.

Conclusion

Green algorithms offer a solution to society's sustainability challenge without sacrificing the technological progress enabled by AI. Implementing these tools not only reduces environmental impact but also fosters greater awareness of energy consumption and CO2 emissions in the tech sector.

Developers and tech companies must adopt these sustainable practices. In addition to reducing emissions and energy consumption, they optimize resources and generate significant cost savings. Thus, the transition toward greener, more sustainable technology is a collective effort requiring the collaboration of all stakeholders. By doing so, we not only contribute to the planet's well-being but also unlock new opportunities for environmentally friendly innovation.
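As promised above, here is a minimal sketch of how one of the tracking libraries mentioned earlier, CodeCarbon, can be wrapped around a workload. It assumes the codecarbon package is installed; the project name and the dummy training function are placeholders for illustration, not part of the library.

```python
# Minimal sketch: estimating the energy use and CO2 emissions of a piece of
# code with CodeCarbon's EmissionsTracker.
from codecarbon import EmissionsTracker


def train_model():
    # Placeholder for the real workload (e.g. training a model).
    return sum(i * i for i in range(10_000_000))


tracker = EmissionsTracker(project_name="green-algorithms-demo")  # name is illustrative
tracker.start()
try:
    train_model()
finally:
    emissions_kg = tracker.stop()  # estimated kg of CO2-equivalent for this run

print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```

By default CodeCarbon also writes its measurements to an emissions.csv report, so successive runs can be compared to check whether a code change actually reduced the footprint.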
October 3, 2024