Paloma Recuero de los Santos

Specialist in creating technology content for the digital channels of Telefónica Tech AI of Things. Degree in Physics and Master's in Educational Technology. Passionate about "technologies for life", the ones that make our lives easier (which is not all of them), and about pedagogy.
AI & Data
How should you speak to children about Artificial Intelligence?
In previous years, people's relationship with technology in general, and with artificial intelligence in particular, was based on "text", usually through specialised programming languages. Today, however, artificial intelligence has learned to speak and interpret human language. So, even if we talk to an assistant as if she were a person and say, "Siri, I want a Peppa Pig video", at no point do we doubt that Siri is not a person. However, we are seeing that, for the little ones of the alpha generation, the boundaries between themselves and the technology that has always surrounded them are not so clear.

Figure 2: "Siri, I want a Peppa Pig video"

Sue Shellenbarger, a columnist for the Wall Street Journal, warns that "many children think that robots are smarter than humans or give them magical powers". A 2018 study on the Cozmo toy robot, a toy designed to appear "to have a soul", showed how children between the ages of 4 and 10 thought the toy was "smarter" than they were, and even that it was capable of feeling.

Figure 3: Cozmo robot (source: amazon.com)

In promoting the toy, Boris Sofman, one of the founders of Anki, the company that manufactures it, said: "If you don't play with Cozmo for a week, you will feel like you haven't played with your dog for a week". How can little ones not be confused when their toys are designed this way? Other studies show how children between the ages of 9 and 15 felt emotionally attached to human-looking robots and thought that they "could be their friends"; or they changed their answers to "is it OK to hit other children?" according to their doll's "opinion": "My doll says it's OK".

The solution

As with almost everything else, the best way to help children define the boundaries between technology and reality is through education. Researchers at MIT are working with children of different ages to see how adults can help them perceive artificial intelligence correctly. Although it may seem premature, and it certainly is, very few four-year-old children were able to understand that even if a toy beats them at a game, it is not smarter than they are.

The MIT AI ethics course

Between the ages of 10 and 14, children begin to develop high-level thinking and deal with complex moral reasoning. At that age, too, most have smartphones with all kinds of AI-based applications. MIT has developed an AI ethics course for children, which teaches them how AI-based algorithms work, and how there can be particular intentions behind the answers. For example, they learn why Instagram shows them a certain ad, or why they may receive one piece of information and not another in their news app. They are also challenged to design an "algorithm" in the form of a recipe for the best peanut butter sandwich, or to play bingo (AI Bingo). In short, they learn in a simple and fun way that technology, robots, computers… are nothing more than tools: fast, precise and powerful, but they do nothing more than follow the models, or the algorithms, with which we have programmed them.

Some simple tips to put into practice

Adults are a fundamental reference for children, especially parents. And, without having to take any MIT course (neither they nor we), we can help them understand the limits of AI with these simple tips proposed by Sue Shellenbarger:
Do not refer to assistants, robots, or AI-based toys as if they were people.
Try to convey a positive image of the benefits of AI in general: they make our lives easier in many ways.
Arouse their curiosity about how robots are designed and built.
Help them understand that the "source" of intelligence for AI-based devices is humans.
Discuss ethical aspects of AI design with questions such as: should we build robots that, as we try to teach children, are polite and ask for things with "please", say hello, thank you, etc.?
Encourage their critical thinking about the information they receive through these toys or smart devices, as well as what they receive from social networks and the internet.
Be very careful with toys that are marketed as a child's "best friend": they can create unwanted dependencies.
And most importantly, try to challenge, whenever they arise, ideas such as "machines are superior to humans" or "robots will kill humans", because they can be harmful to the naive minds of children.
Translated by Patrick Buckley
June 30, 2022
Connectivity & IoT
AI & Data
IoT and Big Data: What's the link?
The digital revolution has changed our lives. To begin with, technological advances were linked to the worlds of scientific research, industrial innovation, the space race, defence, health, private companies etc. But nowadays, everyday citizens can see how technology is changing their daily lives, their ways of communicating, learning, making decisions and even getting to know themselves. You no longer have to be a "techie" to use words such as "Big Data" and "IoT" in daily speech. But do we really know what they mean?

What is the IoT? What does it have to do with Big Data?

Put simply, IoT is an acronym for the Internet of Things. The philosophy behind this concept is the connection between the physical world and the digital one through a series of devices connected to the internet. These devices work like an improved version of our sensory organs: they are capable of collecting a large amount of data from the physical world and transmitting it to the digital world, where we store, process and use it to make informed decisions about how to act. These decisions can end up being made automatically, since the IoT opens the door to applications in the fields of automation, detection and machine-to-machine communication. The data collected by these connected devices is characterised by its great Volume (there are millions of sensors continually generating information), Variety (sensors of all types exist, from traffic cameras and radars to temperature and humidity detectors) and the Velocity at which it is generated. These 3 V's are the same ones that define Big Data. To these three we can add Veracity and Value. It is said that this data is the oil of the 21st century but, by itself, it is not very useful. However, if we apply advanced Big Data analytics we can identify trends and patterns. These "insights" carry great value for companies, since they can help them to make decisions based on data, what we at LUCA call being "Data Driven". The application of the IoT has two very different sides. On one hand, the consumer side, comprising applications aimed at creating smart homes, connected vehicles and intelligent healthcare. On the other hand, the business side, with applications relating to retail, manufacturing, smart buildings, agriculture etc.

Which elements make up the IoT?

The Internet of Things is made up of a combination of electronic devices, network protocols and communication interfaces. Among the devices, we can distinguish four different types:
Wearable technology: any object or item of clothing, such as a watch or pair of glasses, that includes sensors which improve its functionality.
Quantifying devices for people's activity: any device designed to be used by those who want to store and monitor data about their habits or lifestyle.
Smart homes: any device that allows you to control or remotely alter an object, or that contains motion sensors, identification systems or other security measures.
Industrial devices: any device that allows you to turn physical variables (temperature, pressure, humidity etc.) into electrical signals.
Figure 1: A graphic representation of the Internet of Things.
These devices can reach certain levels of intelligence. At the most basic level, we see devices which are simply able to identify themselves in a certain way (identity), but then we move on to devices that can define where they are (location).
Further still, there are devices which can communicate the condition they are in (state), and those that can analyse their environment and carry out tasks based on certain criteria. These intelligence levels translate into a series of capabilities:
Communication and cooperation: being able to connect to the internet and/or other devices, and therefore to share data among themselves and establish communication with servers.
Addressing: the ability to be configured and located from anywhere on the network.
Identification: the ability to be identified via technologies such as RFID (Radio Frequency Identification), NFC (Near Field Communication), QR (Quick Response) codes and more.
Localisation: being able to know its own location at any moment.
Intervention: the ability to manipulate its environment.
With regard to protocols, we already know that in order to connect to the internet we need TCP/IP (Transmission Control Protocol / Internet Protocol). The first steps in the IoT were taken using the fourth version (IPv4). But this brought an important limitation, since the number of addresses it could generate was limited. From 2011 onwards, the IPv6 communications protocol was rolled out, which permits a practically unlimited number of addresses. This allowed the IoT to grow: according to Juniper Research, by 2021 there would be over 46 billion connected devices, sensors and actuators. As well as the protocols, a connection interface is needed. On the one hand, there are wireless technologies such as WiFi and Bluetooth. On the other, we have wired connections, such as IEEE 802.3 Ethernet (which lets you set up a cabled connection between the IoT device and the internet), and GPRS/UMTS or NB-IoT, which use mobile networks to connect to the internet. These last ones, in turn, are usually used for devices where a low level of data consumption is expected, such as garage door opening systems or rangefinders in solar farms.

The curious relationship between IoT and toasters: a short history lesson...

In 1990, John Romkey and Simon Hackett, in response to a challenge launched by Interop, presented the first device connected to the internet: a toaster. You could use any computer connected to the internet to turn it on and off and choose the toasting time. The only human interaction needed was to put the bread in. The following year, of course, they added a small robotic arm to fully automate the process. Curiously, in 2001 another toaster became a protagonist in the history of the IoT, when Robin Southgate designed one capable of collecting the weather forecast online and "printing" it onto a slice of toast.
Figure 2: The first connected device was a toaster.
Although Romkey and Hackett's toaster is often referred to as the first IoT device, the true first actually came much earlier. In the 70s, the Department of Computer Science at Carnegie Mellon connected a Coca-Cola machine to the department server via a series of microswitches, meaning that before "taking the walk" to the vending machine, you could check on your computer whether there was stock left and whether the bottles were at the right temperature (since it knew how long they had been stored for). Although this device wasn't technically connected to the internet (since the internet was still in development), it certainly was a "connected device".
Moving on from toasters and vending machines, the Auto-ID Center at the Massachusetts Institute of Technology (MIT) played a crucial role in developing the IoT, thanks to its work on Radio-Frequency Identification (RFID) and the new detection sensor technology it developed. In 2010, thanks to the explosive growth of smartphones and tablets and the falling price of hardware and communications, the number of connected devices per person exceeded 1 for the first time (1.84). It should be noted that this was not evenly distributed at a global level.

The challenges and obstacles that the IoT faces

The rapid innovation that we see in this area brings together a diverse collection of different networks, designed for different and specific purposes. Therefore, one of the main challenges for the IoT is to define common standards that allow these various networks and sensors to work together. On the other hand, every day brings new advances in miniaturisation, with components becoming more powerful and efficient. However, there is something that slows this progress down: energy consumption and, in particular, battery autonomy. When you are talking about a connected device for personal use, such as a smartwatch, it can be somewhat frustrating to have to keep recharging it, but it's not a huge issue. However, for devices located in remote locations, it is vital that they work well on just one charge. In order to solve this problem, research is being done into devices that harvest energy from their surroundings, for example, water level sensors that can recharge their batteries with solar energy.

In conclusion

The IoT significantly increases the amount of data available to process, but this data doesn't become useful until it is collected, stored and understood. It is at this point that Big Data comes into play, with its ability to store and process data at a massive scale. Add to this the falling price and rising availability of connected devices, and we will see an explosion of revolutionary applications that can create "smart cities", help us use energy efficiently, lead to a more comfortable lifestyle where tasks are done for us, make medical diagnoses more precise and even offer us information from space. The Internet of Things and Big Data are two different things, but one would not exist without the other. Together they are the true internet revolution.
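To make the idea of a connected device a little more concrete, here is a purely illustrative Python sketch (the sensor name, fields and values are invented): a simulated sensor produces a small JSON reading, the kind of frequent, lightweight message that, multiplied by millions of devices, generates the Volume, Variety and Velocity described above. A real device would publish this payload over a protocol such as MQTT or HTTP.

```python
import json
import random
from datetime import datetime, timezone

def read_temperature_sensor(sensor_id: str) -> dict:
    """Simulate one reading from a hypothetical temperature/humidity sensor."""
    return {
        "sensor_id": sensor_id,  # invented identifier, for illustration only
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "temperature_c": round(random.uniform(18.0, 30.0), 2),
        "humidity_pct": round(random.uniform(30.0, 70.0), 1),
    }

if __name__ == "__main__":
    # Each reading is tiny, but millions of sensors emitting readings continuously
    # is what produces the Volume and Velocity that Big Data platforms must ingest.
    payload = json.dumps(read_temperature_sensor("greenhouse-01"))
    print(payload)  # in practice this JSON would be published, e.g. over MQTT
```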
January 24, 2022
AI & Data
The 2 types of learning in Machine Learning: supervised and unsupervised
We have already seen in previous posts that Machine Learning techniques basically consist of automating, through specific algorithms, the identification of the patterns or trends that "hide" in the data. Thus, it is very important not only to choose the most suitable algorithm (and its subsequent parameterisation for each particular problem), but also to have a large volume of data of sufficient quality. The selection of the algorithm is not easy. If we look it up on the internet, we can find ourselves in an avalanche of very detailed articles which, at times, rather than helping us, actually confuse us. Therefore, we are going to try to give some basic guidelines to get started. There are two fundamental questions which we must ask ourselves. The first is: what is it that we want to do? To answer this question, it may come in handy to reread two posts that we published earlier on our LUCA blog, "The 9 tasks on which to base Machine Learning" and "The 5 questions which you can answer with Data Science". The crux of the matter is to clearly define the objective. To solve our problem, then, we will consider what kind of task we will have to undertake. This may be, for example, a classification problem, such as deciding whether a message is spam or not spam; or a clustering problem, such as recommending a book to a customer based on their previous purchases (Amazon's recommendation system). We can also try to figure out, for example, how much a customer will use a particular service. In this case, we would be faced with a regression problem (estimating a value). If we consider the classic customer retention problem, we see that we can address it from different approaches. We want to do customer segmentation, yes, but which strategy is best? Is it better to treat it as a classification problem, clustering or even regression? The key is going to be to ask ourselves the second question: what information do I have to achieve my objective? If I ask myself, "Do my clients group together in any way, naturally?", I have not defined any target for the grouping. However, if I ask the question in this other way, "Can we identify groups of customers with a high probability of requesting the service to be stopped as soon as their contract ends?", we have a perfectly defined goal: whether the customer will deregister, and we want to take action based on the response we get. In the first case, we are faced with an example of unsupervised learning, while the second is supervised learning. In the early stages of the Data Science process, it is very important to decide whether the "attack strategy" will be supervised or unsupervised and, in the supervised case, to define precisely what the target variable will be. As we decide, we will work with one family of algorithms or another.

Supervised learning

In supervised learning, algorithms work with "labelled data", trying to find a function that, given the input data variables, assigns them the appropriate output tag. The algorithm is trained with "historical" data and thus "learns" to assign the appropriate output tag to a new value; that is, it predicts the output value. For example, a spam detector analyses the history of messages and works out which function, given the input parameters that are defined (the sender, whether the recipient is an individual or part of a list, whether the subject contains certain terms, etc.), best represents the assignment of the "spam" or "not spam" tag. Once this function is defined, when a new unlabelled message arrives, the algorithm is able to assign it the correct tag.
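As a minimal illustration of this supervised workflow (a sketch only, not code from the original post; it assumes scikit-learn is installed and uses a tiny invented set of messages), we can train a classifier on labelled examples and let it tag a new, unseen message:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Labelled "historical" messages: the output tag is known for each input.
messages = [
    "win a free prize now", "cheap offer, click here",
    "meeting at 10 tomorrow", "please review the attached report",
]
labels = ["spam", "spam", "not spam", "not spam"]

# Learn a function that maps input features (word counts) to the output tag.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

# A new, unlabelled message: the trained model predicts its tag.
print(model.predict(["free prize, click here now"]))  # expected: ['spam']
```

With only a handful of messages the prediction is of course not reliable; the point is simply the shape of the workflow: labelled inputs, a fitted function, and a predicted tag for new data.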
Supervised learning is often used in classification problems, such as digit identification, diagnostics or identity fraud detection. It is also used in regression problems, such as predicting the weather, life expectancy, growth etc. These two main types of supervised learning, classification and regression, are distinguished by the type of target variable: in classification cases it is categorical, while in regression cases the target variable is numeric. Although in previous posts we spoke in more detail about different algorithms, here are some of the most common:
1. Decision trees
2. Naïve Bayes classification
3. Least squares regression
4. Logistic regression
5. Support Vector Machines (SVM)
6. "Ensemble" methods (classifier sets)

Unsupervised learning

Unsupervised learning occurs when no "labelled" data is available for training. We only know the input data, but there is no output data that corresponds to a given input. Therefore, we can only describe the structure of the data, to try to find some kind of organisation that simplifies the analysis. These methods therefore have an exploratory character. For example, clustering tasks look for groupings based on similarities, but there is no guarantee that these will have any meaning or utility. Sometimes, when exploring data without a defined goal, you can find curious but impractical spurious correlations. For example, in the graph below, published on Tyler Vigen's Spurious Correlations website, we can see a strong correlation between per capita chicken consumption in the United States and its oil imports.
Figure 1: Example of a spurious correlation
Unsupervised learning is often used in clustering, co-occurrence grouping and profiling problems. However, problems that involve finding similarity, link prediction or data reduction can be supervised or not. The most common types of algorithms in unsupervised learning are:
1. Clustering algorithms
2. Principal Component Analysis (PCA)
3. Singular Value Decomposition (SVD)
4. Independent Component Analysis (ICA)

Which algorithm to choose?

Once we are clear about whether we are dealing with a supervised or unsupervised learning case, we can use one of the famous algorithm "cheat sheets" (what in Spanish we would call a "chuleta") to help us choose which one to start working with. We leave as an example one of the most well-known, the scikit-learn one, but there are many more, such as the Microsoft Azure Machine Learning Algorithm cheat sheet.
Figure 2: Algorithm selection "cheat sheet" from scikit-learn

So, what is reinforcement learning?

Not all ML algorithms can be classified as supervised or unsupervised learning algorithms. There is a "no man's land", which is where reinforcement learning techniques fit. This type of learning is based on improving the response of the model through a feedback process. It is based on studies of how learning is encouraged in humans and rats through rewards and punishments. The algorithm learns by observing the world around it: its input information is the feedback it obtains from the outside world in response to its actions. Therefore, the system learns by trial and error. It is not a type of supervised learning, because it is not strictly based on a set of labelled data, but on monitoring the response to the actions taken. It is also not unsupervised learning, since when we model our "apprentice" we know in advance what the expected reward is. You can also follow us on Twitter, YouTube and LinkedIn
December 2, 2021
AI & Data
The risks of not having controlled exposure to information (III)
Finally, here is the last and long-awaited post in this series on the risks of uncontrolled overexposure of information. As we saw in the previous post, we know how to minimise the risks of our digital footprint, but now we need to know how to remove existing information.

Practical resources for the removal of information

In recent years, with the entry into force of the General Data Protection Regulation (GDPR), the trend in digital services has been towards trying to preserve the protection of the privacy of citizens and users on the Internet. For this reason, an effective method for deleting our online accounts and associated information is to review the service's privacy policy and find a contact or form to which we can direct our intention to exercise our right of erasure. This right corresponds to the data subject's right to request that the data controller delete his or her personal data, provided that the personal data are no longer necessary for the purposes for which they were collected. To do so, we must send a letter, for example this template from the Spanish Data Protection Agency, stating our intention to delete our account or associated service. Similarly, even if we believe that the information collected by different sites is "public", we can almost always choose to request the removal of our information. This applies to services where, although the information is public, they are making a financial profit from collecting it, or simply present the information in a structured form. We can choose, for example, to search for ourselves in people-indexing tools such as Pipl and request the removal of our information in the corresponding section. Likewise, in the case of Have I Been Pwned, through its opt-out section, we can prevent others from looking up which data breaches our compromised email address appears in. One of the most direct ways to remove personal information displayed among Google's results is to contact the owner of the site where it appears directly. Google also offers a form to remove personal information and to stop indexing pages where personally identifiable information (PII) appears, such as financial information, sensitive personal or health data, contact addresses, or our handwritten signatures, among others. If we have already removed different profiles of ours, or information that we had displayed on pages we do not own, the next step is to ask Google to stop indexing the link, indicating that it is obsolete content and is no longer available. To do this, the search engine provides the user with a tool to remove outdated content. Finally, it should be noted that these measures are intended to control the exposure of information and minimise the risks associated with it, without forgetting the following premise: it is not about having no information exposed about us, but about having a controlled exposure.
February 19, 2021
AI & Data
5 reasons why everyone wants to learn Python
Python is currently the most popular programming language. Many different people learn it and use it in their fields of interest: children, students, teachers, researchers of all kinds (Social Sciences, Biology, Medicine, Economics...), experts in finance, insurance and marketing, developers, analysts and data scientists. In today's post, we will try to explain why.

The most popular programming language at the moment

How do you measure the popularity of a programming language? It depends on who you ask. Developers do it by counting the number of questions asked about it on websites like Stack Overflow, where people can ask questions and share knowledge with the community. Thus, although JavaScript continues to be the language with the highest number of questions accumulated since the creation of Stack Overflow, Python has become the language that has sparked the most interest so far this year.
Projections of future growth for major programming languages (Image credit: Stack Overflow)
(The more "purist" developers prefer to consult indexes such as PYPL or rankings such as the IEEE's.) And for non-programmers, this quote from The Economist will give you an idea: "In the last 12 months, in the US, there have been more Google searches for Python than for Kim Kardashian." "Python has brought computer programming to a vast new audience". Is Python really more popular than Kim Kardashian in the USA? Well, it seems so 😉 We will see some of the reasons, but first of all, let's explain what Python is.

Python in a nutshell

To "warm up", let's start with a curious fact: do you remember "Life of Brian", or the famous "Spam" sketch? The name Python has no zoological connotation but is a tribute by its author to the unforgettable English comedy group Monty Python, protagonists of one of the best comedies in the history of cinema.
Image of the Monty Python group.
Let's get back to being serious and start with a technical definition that we will be "unpacking" little by little: "Python is an open-source, multi-paradigm, but mainly object-oriented, high-level interpreted programming language. Its syntax emphasises code readability, which makes it easy to debug and therefore favours productivity. It offers the power and flexibility of compiled languages with a gentle learning curve" (by the people from the Tango! project). From this definition, we will analyse the 5 reasons why Python has gained so much popularity in the last few years.

1. Python is an interpreted language

Python was created by Guido van Rossum in 1991 as a general-purpose interpreted programming language. What does it mean for a programming language to be interpreted? Low-level languages, such as machine or assembly language, can be run directly on a computer. High-level languages, such as Java, C, C++ or Python itself, on the other hand, have to be translated (compiled or interpreted) into low-level languages before they can be executed; in the case of interpreted languages this happens at run time, which usually results in slower execution. Nowadays, however, this is not a problem, as advances in cloud computing make customised computing capabilities available at very affordable costs. How well the code is optimised also plays a role.

With Python, programming is easy

Programming in machine code is expensive and difficult. Python offers a syntax that is much simpler and closer to human logic. More readable code is easier to generate, debug and maintain. As a result, the learning curve is much smoother.
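As a small taste of that readability (a toy example of our own, not taken from the original post), notice how close idiomatic Python is to plain English:

```python
# Filter and transform a list in one readable line: keep the even numbers, square them.
numbers = [1, 2, 3, 4, 5, 6]
even_squares = [n ** 2 for n in numbers if n % 2 == 0]
print(even_squares)  # [4, 16, 36]

# Iterating over a collection is equally direct.
for name in sorted({"Ada", "Guido", "Grace"}):
    print(f"Hello, {name}!")
```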
2. Python is powerful, flexible and versatile

Let's see where this power and versatility come from; there is no shortage of arguments.
It is a general-purpose language. Being a general-purpose language, and not one created specifically for web development, Python allows you to create all kinds of programs and tools.
It is compatible with other programming languages. Its interoperability with other programming languages such as C, Java, R, etc., is another factor that has helped its widespread use in different fields.
It allows you to work with different programming models. In Python, everything is an object. However, although it is mainly an object-oriented language, it combines properties of different programming models or paradigms (imperative, functional, procedural or reflective).
It offers libraries and environments specialised in a wide range of topics. Python offers very powerful libraries and development environments for Machine Learning, science, data visualisation, etc. For example:
Mathematicians and scientists use SciPy and NumPy in their research.
Linguists analyse texts with NLTK (the Natural Language Toolkit).
Statisticians use pandas to analyse data.
IT teams set up and manage resources in the cloud with OpenStack.
Developers use Django to create web applications, etc.
It is the language of reference in Data Science and Machine Learning. In fact, it has become the reference language in Data Science, being the language of choice for 57% of data scientists and developers. If we take into account the evolution in the last two years of Python environments for Deep Learning, including the creation of TensorFlow and other specialised libraries, it will come as no surprise that it has left behind other languages such as R, Julia, Scala, Ruby, Octave, MATLAB and SAS.
It is the language of reference in education. The fact that it is a language simple enough to be used by everyone from beginners to professional programmers has also made it the programming language par excellence in educational environments. And not only because of its simplicity, but also because it can be run on different operating systems (Microsoft Windows, Mac OS X, Linux) using the corresponding interpreter. It is also accessible through web services such as PythonAnywhere. This is especially important for the education sector because it can be used from computers in school classrooms, or even at home, without the need to install additional software. Thanks to this, Python has been at the centre of several very interesting educational projects. In 2015, the BBC launched the micro:bit project: a small programmable device that aims to inspire a new generation of creators, makers and coders, aimed at children from 11 years of age. Other projects, such as MicroPython, allow you to work with other small devices, such as the Raspberry Pi, and can be used as the basis for many interesting and entertaining electronics projects to control screens, speakers, microphones, motors, etc. You can even create simple robots.
In short, Python can be used to create all kinds of tools, can be run on different operating systems, is compatible with other programming languages and offers libraries and frameworks specialised in different areas of knowledge.
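As a quick illustration of the scientific libraries mentioned above (a minimal sketch of our own, assuming NumPy and pandas are installed; the data is invented):

```python
import numpy as np
import pandas as pd

# NumPy: fast numerical operations on whole arrays at once.
temperatures_c = np.array([21.5, 23.0, 19.8, 25.1])
print("Mean temperature:", temperatures_c.mean())

# pandas: tabular data analysis in a few expressive lines.
df = pd.DataFrame({"city": ["Madrid", "Lima", "Bogotá", "Quito"],
                   "temperature_c": temperatures_c})
print(df.sort_values("temperature_c", ascending=False).head(2))
```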
3. Python is a free software project

The Python project was born as a free software project. Until very recently, it was still run by its creator, Guido van Rossum, who, in another nod to Monty Python, was the "Benevolent Dictator for Life" of the project for almost three decades.
Guido van Rossum. By Alessio Bragadini, originally posted to Flickr.
What characterises free software? Free software is not necessarily free of charge (although Python is), but is characterised by scrupulous respect for the so-called "four freedoms":
The freedom to use the program, for any purpose (freedom 0).
The freedom to study how the program works, and adapt it to your needs (freedom 1).
The freedom to distribute copies, so you can help your neighbour (freedom 2).
The freedom to improve the program and make the improvements public, so that the whole community benefits (freedom 3).
For freedoms 1 and 3 to be possible, users need access to the source code of the programs. In short, free software comprises those programs that, once obtained, can be freely used, copied, studied, modified and redistributed. The "freedom" of software is therefore related to the permissions its author grants, not to its price. Python is released under the Python Software Foundation License. The PSF is a non-profit organisation created in 2001 with the aim of managing the project (development, rights management, fundraising, etc.), and the licence is compatible with the GNU GPL (GNU General Public License) from version 2.1.1 onwards.

4. Python is an open source language

In addition to being free software, Python is an open source language, which is similar, but not the same thing. According to Richard Stallman, both free software and open source pursue a common goal: to give greater freedom and transparency to the software world. However, they differ in the way they go about it. Free software is defined by its ethics: not only programs whose code is open are considered free software, but all programs that respect the four essential user freedoms defined by the Free Software Foundation (1985). The concept of open source software emerged in 1998, when the OSI (Open Source Initiative) was created as an offshoot of the free software movement. In this case, instead of the four freedoms of free software, ten requirements were defined that a piece of software must fulfil in order to be considered open. The main difference between the two types of software is subtle: free software prioritises ethical aspects, while open source software prioritises technical aspects. The latter is therefore usually less strict, so that all free software is also open source, but not necessarily the other way around. In any case, Python is free and it is open. And therein lies one of the keys to Python's success: the Python community. It is a large and very active community, which contributes to the development and improvement of the source code according to the needs and demands of users. Although many companies and organisations, such as Google, Microsoft or Red Hat, make extensive use of this language and have an influence on its evolution, none of them controls it. Its free and open character has undoubtedly also facilitated the versatility, flexibility and power mentioned in the previous point since, in addition to all of the above, Python is a multiplatform language. That is, we can run it on different operating systems such as Windows or Linux simply by using the corresponding interpreter.

5. And also... it's free
As we mentioned before, despite the confusion that may arise from the fact that in English "free" means both "free as in freedom" and "free of charge", free software does not necessarily have to be free of charge. However, it is safe to say that to program in Python you do not need to pay any kind of licence fee. That said, we must never forget that any code not developed by ourselves may be subject to some kind of licensing.

Conclusion

In short, Python's simplicity, versatility and power have made it the all-rounder programming language that can help boost the digital literacy of broad sectors of the population, making programming accessible to people and professionals of all kinds. What about you? Do you dare to try Python? You can read the original post in Spanish here. To keep up to date with Telefónica's Internet of Things, visit our website or follow us on Twitter, LinkedIn and YouTube
February 5, 2020
AI & Data
Artificial Intelligence to fight pandemics
Do you know how Artificial Intelligence can help us fight pandemics like the 2019-nCoV coronavirus? In today's post we tell you how it can become a great tool for national and international health authorities.

The 2019-nCoV coronavirus

Canadian start-up BlueDot alerted its clients to the outbreak of the epidemic in the Chinese city of Wuhan a week before its detection by US health authorities and the World Health Organization. The coronavirus, still unclassified, has infected 7,700 people in just one month and is responsible for 170 deaths. All this in spite of the enormous deployment of strategic resources put in place to stop it spreading. Mobility and transport restrictions have affected 60 million people in an attempt to curb the epidemic. Despite this, the WHO has been forced to declare an international emergency. According to the International Health Regulations, the declaration of an emergency must be made when an event "constitutes a risk to the public health of other States through the international spread of a disease" and "may require a coordinated international response".

What is a virus?

Viruses are microorganisms much smaller than bacteria; we can only see them through an electron microscope. To multiply, they need a host, which is why they emerged on our planet at the same time as living things. It is estimated that there are about 2 million different species of viruses on Earth. They are, therefore, the most numerous and diverse living beings that exist. On the other hand, our planet is populated by 7.7 billion people, and the population keeps growing. We live in a globalised world and, as a consequence, the flow of people, animals and goods, even between continents, is permanent. And with them, viruses travel. For example, the seasonal flu virus jumps from one continent to another, taking advantage of the movements of migratory birds. Others, like HIV, which affected monkeys, leave the natural niches in which they were confined and begin to spread among humans.

How to fight viruses?

To fight them, it is essential to develop antiviral and antibody-based therapies capable of stopping the infection in affected people. It is also very important to use preventive vaccines among health workers and those at risk of infection, in order to prevent its progression. Therefore, early detection of new diseases is key in the fight against global pandemics. It is at this point that artificial intelligence can help us fight viruses.

Artificial intelligence against viruses

BlueDot was founded in 2014 to provide health workers with early warnings so they can identify and treat people who may suffer from epidemic infectious diseases more effectively, thus helping to slow their spread. To do this, they rely on natural language processing techniques that analyse 100,000 articles in 65 languages to track information on more than 100 infectious diseases every day. Their machine learning algorithms scan news, official reports, blogs and forums related to diseases that can affect people, animals and plants. They also analyse weather and mobility information, which allows them to predict the next "jump" of the disease. For example, the algorithm correctly predicted that the coronavirus would jump from Wuhan to Bangkok, Seoul, Taipei and Tokyo in the days following its initial appearance. BlueDot's epidemiologists analyse this information and issue reports that alert their clients (companies, governments, NGOs).
These reports are then forwarded to health centres and public health officials in a dozen countries, helping them to react more quickly and efficiently to these emergencies. Early detection is key in the fight against pandemics and, as we can see, Artificial Intelligence can be a great ally. To stay up to date with LUCA, visit our webpage, subscribe to LUCA Data Speaks and follow us on Twitter, LinkedIn and YouTube.
February 3, 2020
AI & Data
Natural artificial intelligence: not-so-opposite antonyms
What do we mean by “natural"? Some words are so rich in meaning that the dictionary of the Spanish Royal Academy (RAE) lists up to 17 different definitions. And "natural" is one of them. Because although at first it may seem a bit contradictory to associate "natural" with "artificial intelligence", we’ll see that, in reality, they are perfectly compatible. Indeed, the first thing that comes to mind when we hear the word "natural" is something "pertaining or related to nature", the first meaning listed in the RAE. However, something natural can also be something simple, logical, expected, spontaneous, normal... The goal of any technology is to help us overcome—through tools—the challenges imposed by the complexity of our environment or by our own physical limitations. Humans are not the only living beings to develop technology. But human technology is, arguably—with the possible exception of the arts—the most astonishing thing we’ve ever created. Our senses Sight, touch, taste, hearing, smell… our senses are our “windows” to the world. They are complex systems of “sensory perception” that detect changes in the physical properties of our surroundings. These stimuli may be caused by pressure waves (touch, hearing), electromagnetic waves (sight), or thermal changes (touch), as well as by chemical substances (taste and smell). Our nervous system transmits all this encoded sensory information to the center of its processing: our brain. As we mentioned in this other post, Artificial Intelligence: a “very human” technology, biologically speaking, we are not particularly special. We’re not especially strong, fast, or even resilient. However, our intelligence and our social skills have enabled us to develop sophisticated technologies like the Internet of Things and Artificial Intelligence, which help us overcome more and more barriers every day. Thanks to them, we’re no longer as limited by the perception range of our sensory organs or by the processing capacity of our brain: every day, we go a little further. And not only that, every day we make more effort to make technology more human and accessible. So that the complexity enabling so many things does not reach the end user, and instead, interacting with technology feels increasingly natural. A real-life example Many may remember that hilarious video where a medieval monk calls tech support (medieval helpdesk) because, used to working with scrolls, he didn’t know how to use the cutting-edge innovation of the time: the book. It reminds us how many things we take for granted, that seem “natural” to us, do so because they are so simple and convenient that they’ve become second nature. In the early 2000s—not that long ago—there was a new revolution in this same space. The first e-book readers evolved into touch screens, and to turn the page you simply had to tap the right margin with your finger. This simple gesture, which was initially tricky to learn—just like it was for the medieval monk in the video—now comes “built-in” with new generations. It feels natural (it's just so easy!). That’s why we do it so instinctively that we can’t help but laugh at ourselves when we absentmindedly tap the margin of a paper book. The opposite of artificial intelligence isn’t natural—it’s unnecessarily complicated… or having to get off the couch. That’s why we welcome anything that makes life easier. Like changing the TV channel. At first, you had to peel yourself off the couch or argue over who would get up to change it. 
Then, we didn’t need to get up anymore, but our living rooms filled with multiple remote controls—or complex multi-device remotes that weren’t always easy to use. Today, you don’t need to argue with your sibling or get frustrated with a far-from-“intuitive” remote control. It’s as natural and simple as saying “Aura, put on the basketball game”, and that’s it. The beauty lies in the fact that, despite the incredible technological complexity required for this to happen, the ultimate goal—making life easier and more comfortable in a way that feels natural—has undoubtedly been achieved thanks to artificial intelligence. “Artificial is natural”
October 2, 2019
AI & Data
An introduction to Machine Learning: "What are Insights?"
Content originally written in Spanish by Paloma Recuero de los Santos, LUCA Brand Awareness. Within LUCA, we often talk of "Insights". Our tools are designed to obtain valuable "Insights", or "Actionable Insights", that allow a company to make better decisions based on data. But what exactly does the word "Insight" mean?
Figure 1: A spiral representing an Insight as a discovery.

What is an insight?

The Collins dictionary gives the following definition:
insight /ˈɪnˌsaɪt/ noun
1. The ability to perceive clearly or deeply; penetration
2. A penetrating and often sudden understanding, as of a complex situation or problem
As we can see, both definitions talk about perceiving something clearly, about understanding, and also hint at a "complex" situation or problem. By looking at the etymology of the word, we get another interesting definition:
Word origin and history for insight: c. 1200, innsihht, "sight with the eyes of the mind", mental vision, understanding, from in + sight. The sense shifted to "penetrating understanding into character or hidden nature" (1580s).
The phrase "sight with the eyes of the mind" describes vision based on understanding, where expert eyes are able to see crucial things hidden in data, such as behaviour, tendencies and anomalous events. Artificial Intelligence, and especially Machine Learning, allow us to apply such a "vision" in order to obtain a deep understanding of volumes of data that would be impossible to analyse manually. Such technologies make it possible to compare the data, order it, put it in context and convert it into "Insights". These Insights can then be translated into concrete actions that form the foundations of a business strategy. It is no longer good enough to simply justify decisions using data; instead, strategy should be based on such Insights.
Figure 2: The process of turning data into Insights.
Therefore, "Data-Driven" companies are those that speak the language of data and are, as such, capable of making more intelligent decisions, based on their own data as well as other sources that are freely available. One of the challenges for non-English-speaking countries is finding a word that conveys the meaning of "Insight". In Spain, for example, the word "clave" (key) can be used, but it doesn't quite encapsulate everything that "Insight" does. As such, the English word is often used. This is a fairly common occurrence; technical terms such as Big Data, Machine Learning and Data Science are often kept in the original language. Don't miss out on a single post. Subscribe to LUCA Data Speaks.
March 6, 2019
AI & Data
Artificial Intelligence: a "very human" technology
What makes us human? The answer to this question is not simple; it is not easy to define what makes us human. We can try to find the answer from a scientific point of view, from a philosophical one, or choose an area in between. All of these areas lead us to ask ourselves questions such as: is it language that makes us human? Is it our spiritual side? Our creative or cognitive abilities? The fact that we are social beings? What is clear is that humans are social beings, with all that this implies in terms of language, collaboration, transmission of culture and knowledge, empathy and much more. We have a brain that, thanks to its extraordinary plasticity, has allowed us to extend our cognitive abilities far beyond our neurons and develop a culture and technology thanks to which we have managed to adapt, to survive all kinds of changes in our environment, and even to modify it. What would those Homo sapiens who fought to survive in the African savannah 300,000 years ago say if they knew that one day they would be capable of populating an entire planet? And not only that, but that they would be able to move quickly by land, sea and air, multiply their life expectancy, send missions to outer space and create systems based on Artificial Intelligence that would turn them into "superhumans"?

Superhumans?

It is funny to talk about "superhumans", because the truth is that, from a biological perspective, we are nothing special. We are not especially strong or fast, we do not have "eagle eyesight", nor are we especially resistant... However, our intelligence and our social skills have allowed us to develop technologies to remove barriers and overcome our biological limitations. This is how we started to tame and domesticate animals that allowed us to be "stronger and more resilient" and to move faster, and then even to "fly" with vehicles we created ourselves; also to create machines that allowed us to produce more food, and machines capable of working in conditions that humans would not be able to withstand... Having overcome our physical limitations, we have taken a step further. Current technologies have allowed us to connect our physical environment with the digital space through the Internet (yes, the Internet of Things, or IoT), and offer us an improved version of our sensory organs through a large number of sophisticated, ubiquitous and, today, already affordable devices that give us valuable information about our environment. Finally, leaving aside the physical and sensory barriers, why not overcome the cognitive ones too? To do so, progress has been necessary on several fronts. On the one hand, the most tangible part: the development of the transistor, integrated circuits and data storage devices has given us the necessary hardware at an affordable price. The falling price and availability of suitable hardware has allowed the development of Big Data technologies, such as Hadoop, which allow large volumes of information to be captured, stored and efficiently processed. These technologies are what has made the "Golden Age" of Artificial Intelligence possible, which is not something new (it actually emerged in the 1950s), but has experienced spectacular growth in the last few years thanks to, among other accelerators, these technologies. Fundamentally, AI is based on the idea of getting a computer to solve a complex problem in the same way as a human would.
Thus, in the same way that in the Neolithic period humans began to domesticate animals and learned to take advantage of their strength and resistance to cultivate their fields more efficiently, today we use Artificial Intelligence in so many areas of human activity that sometimes we are not even aware of it. It is not only easy to identify in the robots used in heavy industry or in autonomous cars; AI is also used to diagnose diseases, organise staff rotations and assign hospital beds, to make decisions and perform high-speed stock trading, to support users as virtual assistants, to optimise the emergency aid that reaches populations displaced by natural catastrophes, to discover exoplanets, to control epidemics, to improve sports performance, to detect trends and "feelings" in social networks, to offer personalised offers and dynamic prices, to perform preventive maintenance of all kinds, to optimise consumption, to automatically translate any language... The list is never-ending, but the important thing is that technologies based on Artificial Intelligence allow us to perform virtually any task with much more efficiency than we could with our (limited) human capabilities. It's as if Artificial Intelligence gave us superpowers.

Overcoming our biological and cognitive limitations... Superpowers?

Almost everyone likes superheroes: fictional characters capable of outdoing classic heroes thanks to their superhuman powers. Many of them emerged in the late 1930s in the American comic industry and were later adapted to other media, especially film. The character of Superman, created by the American writer Jerry Siegel and the Canadian artist Joe Shuster in 1933, was one of the first. Let's recall his story briefly to set the scene. Superman was born on the planet Krypton. Shortly before the destruction of his planet, when he was still a child, his parents sent him to Earth in a spaceship to save him. There he was found by the Kents, a couple of farmers in Smallville, Kansas, and raised under the name of Clark Kent, and they instilled in him a strict moral code. Soon, young Kent begins to discover his superhuman abilities, his superpowers, which, upon reaching adulthood, he decides to use for the benefit of humanity. And what are Superman's powers? Despite changing over the years, we more or less remember his great speed ("faster than a bullet"), his super strength ("more powerful than a locomotive"), his super-vision (X-ray, infrared) and, above all, his ability to fly. The image of Superman flying over the city with his red cape fluttering in the wind still forms part of the collective memory of those who first saw the film at the end of the 70s.
Figure 2: A Superman comic

Why do we love superheroes?

The thing we love the most, of course, is their superpowers, their ability to do things that are inaccessible to the rest of us mortals. Also, the mythical aura that their vocation to "do good", to work for the good of humanity, gives them. And if we think about it, that is precisely what Artificial Intelligence allows us to do. It allows us to overcome our human limitations and, yes, gives us "superpowers". If, like Superman, we have an elevated ethical sense, we will choose to use them for good. On the other hand, we could also use them exclusively for our own benefit, even at the risk of harming others, and join the long list of "supervillains". For this reason, it is crucial to define an ethical code and regulatory framework for the use of AI.
AI gives us "superpowers": we become superheroes

Thanks to AI, we can therefore, like Superman, "be faster than a bullet" (a lot faster!) when doing calculations, attend to our clients at any time using a bot, or analyse enormous volumes of data to detect, for example, anomalies. We can also boast of our "super-vision" and process a large number of images at high speed to identify a face or a possible tumour in a medical test. And flying... that too is an experience that can be improved thanks to AI: from autopilots that take all types of flight data to optimise parameters, to systems that optimise trajectory calculations in highly saturated airspaces, fuel consumption forecasts, prediction models for adverse weather conditions, etc.

What can't AI do?

Superpowers without a superhero who decides what to do and how to use them have no meaning. In reality, they are just tools that humans created to make life easier. They may be so sophisticated and powerful that we sometimes forget that, in the end, without human intelligence deciding how they should be used, they have their limitations. Thus, an AI-based application can translate a text into another language but cannot understand it. It cannot read between the lines. It can count the number of times certain words considered positive or negative appear and thus assign a "sentiment", but it cannot understand the deep meaning of the words, or the real emotions behind them. Another example: one of the most widely used Deep Learning techniques for image recognition (computer vision) is convolutional neural networks (CNNs). These systems classify the objects that appear in an image based on the detection of patterns that match those learned in previous training processes with many tagged images. So far, so good. The AI algorithm will look for the pattern that best fits the image in question and will offer a result. But this AI is not able to realise whether the result it offers makes sense or not. Any human would have immediately detected a well-known error from Google, where an image of an African-American couple was labelled as "gorillas". The AI gave its best result, as learned from its training data. But this is where the "limitations" of AI that we have mentioned above come into play. Is the data adequate, or is there a bias? In this example, there was a clear racial bias: by not having enough images of African-American people in the training data, the algorithm was not able to give an adequate result.

Conclusion

In conclusion, Artificial Intelligence is one of the most powerful tools that human beings have created, as it can be applied to almost any field of human activity. Like other previous advances in science and technology, it allows us to go beyond our biological limitations and, for that reason, it turns us a little into superheroes. But Artificial Intelligence is a tool created by people, for people. Without human intelligence to define its objective, choose the most appropriate "superpower" for each situation, know its limitations, define boundaries and discard the "exact" but absurd results... it does not make any sense. Added to this, there is a part of human nature that can never be "optimised" by AI: our social dimension, our emotions, empathy and creativity. A "companion" robot can remind you to take your medication, but it can't give you a hug, generate a smile, or have that crazy and original idea that solves the problem or at least prompts a laugh. You can also follow us on Twitter, YouTube and LinkedIn
September 6, 2018
AI & Data
Python for all (5): Finishing your first Machine Learning experiment with Python
We have finally come to the last part of the Machine Learning experiment of Python for all. We have been taking it step by step, and in this last post we will resolve any remaining doubts and keep going through to the end: we will select the algorithms, construct the models and put them to the test against our validation dataset. By the end, we will have built a good model and, above all, we will have lost our fear of Python. And so… we'll keep learning!

The steps we took in the previous post were: loading the data and the modules/libraries we need for this example, and exploring the data. We will now go through the following: evaluating different algorithms to select the most suitable model for this case, and applying the model to make predictions from what it has 'learnt'.

3. Selecting the algorithms

The moment has come to create models from the known data and estimate their precision on new data. For this, we are going to take the following steps: we will separate part of the data to create a validation dataset; we will use cross-validation with 10 iterations to estimate accuracy; we will build six different models to predict (from the flower measurements collected in the dataset) which species a new flower belongs to; and we will select the best model.

3.1 Creation of the validation dataset

How do we know if our model is good? To learn what type of metrics we can use to evaluate the quality of a model based on Machine Learning, we recommend you read the post we published recently about the confusion matrix. We use statistical methods to estimate the precision of the models, but we also have to evaluate them on new data. For this, just as we did in the previous Machine Learning experiment in Azure Machine Learning Studio, we will reserve 20% of the original dataset as a validation set. By applying the model generated from the remaining 80% to this held-back data, we can check how well the model, and the algorithm we chose, really work. This procedure is known as the holdout method. The following code, which, as before, we can type or copy and paste into our Jupyter Notebook, separates the data into the training sets X_train, Y_train and the validation ones X_validation, Y_validation (a sketch of what this split typically looks like appears at the end of this section): https://gist.github.com/PalomaRS/15ba55eda999d7595466d0f91e1ecc9d#file-split-ipynb This method is useful because it is quick to compute. However, it is not very precise, as the results can vary a lot when we choose different training data. To overcome these issues, the concept of cross-validation emerged.

3.2 Cross-validation

The objective of cross-validation is to guarantee that the results we obtain are independent of the partition between training data and validation data, and for this reason it is often used to validate models generated in AI projects. It consists of repeating the evaluation on different partitions and calculating the arithmetic average of the metrics obtained over them. In this case, we are going to use cross-validation with 10 iterations: our training data is divided into 10 parts, the model is trained on 9 and validated on 1, and the process is repeated 10 times. In the image we can see a visual example of the process with 4 iterations.
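The exact split code lives in the gist linked in section 3.1; as a rough sketch of what such an 80/20 holdout split usually looks like with scikit-learn (the UCI URL, the seed value and the column slicing below are assumptions for illustration, not taken from the gist):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed UCI location of the iris data and the column names used earlier in the series
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)

# Hold out 20% of the rows as the validation set (holdout method)
array = dataset.values
X = array[:, 0:4]   # the four flower measurements
Y = array[:, 4]     # the species label
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, Y, test_size=0.20, random_state=7)
```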
Figure 1: Cross-validation (by Joan.domenech91, CC BY-SA 3.0). To evaluate the models we set the scoring parameter to the accuracy metric, which represents the ratio between the number of instances the model has predicted correctly and the total number of instances in the dataset, multiplied by 100 to give a percentage. For this, we add the following code: https://gist.github.com/PalomaRS/5699cd36f689d3447a76641438ab3d1b#file-7-ipynb

3.3 Constructing the models

As before, we don't know which algorithms will work best for this problem, so we will try six different ones: linear (LR, LDA) as well as non-linear (KNN, CART, NB and SVM). The initial graphs suggest they should perform well, because some of the classes appear to be linearly separable in some dimension. We will evaluate the following algorithms: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbours (KNN), Classification and Regression Trees (CART), Naïve Bayes (NB) and Support Vector Machines (SVM). Before each execution we reset the seed value, so that each algorithm is evaluated on exactly the same data split and the results are directly comparable. We add the following code: https://gist.github.com/PalomaRS/9a0ff155f6086a566dae348b5e0f8408#file-prueba-ipynb

3.4 Choosing the model that works best

If we execute the cells (Cell/Run Cells) we can observe the estimates for each model, compare them and choose the best. Looking at the results obtained, we can see that the model with the highest precision value is KNN (98%). Figure 2: Precision results of the different algorithms. We can also plot the results of the model evaluation and compare the distribution and average precision of each model (each algorithm is evaluated over 10 iterations because of the type of cross-validation we have chosen). For this, we add the following code: https://gist.github.com/PalomaRS/f2b2a4249fdeb0879375ab215cd984a7#file-9-ipynb We get this result: Figure 3: Box and whisker plots comparing the algorithms. In the box and whisker diagram it is clear that several of the models (KNN, NB and SVM) reach 100% precision in some runs, whilst the model that offers the least precision is logistic regression (LR).

4. Applying the model to make predictions

The moment has come to put the model we created from the training data to the test. To do so, we apply it to the portion of the original dataset that we separated at the start as the validation dataset. Since we know the correct classification values and they have not been used to train the model, comparing the real values with the ones predicted by the model will tell us whether the model is good or not. We apply the chosen model (one of those that gave us the best accuracy in the previous step) directly to this dataset and summarise the results with a final validation score, a confusion matrix and a classification report. To apply the model based on the SVM algorithm, we only need to run the following code (a sketch of these two steps, model comparison and final validation, appears below): https://gist.github.com/PalomaRS/fc0989f119fe10cfe1ab12f546ffd398#file-validacion-ipynb We get something like this: Figure 4: Evaluation of the algorithm on the validation set. As we can see, accuracy is 0.93, or 93%, a good result. The confusion matrix shows the number of points the model predicted correctly (the diagonal values: 7+10+11=28), while the elements outside the diagonal are the prediction errors (2).
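The gists above contain the exact code; a minimal sketch of these two steps (comparing the six algorithms with 10-fold cross-validation, then testing the chosen SVM on the validation set) might look like the following. It assumes the X_train, Y_train, X_validation and Y_validation variables from the earlier split, and the model parameters shown are illustrative defaults rather than the ones used in the original gists:

```python
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

models = [('LR', LogisticRegression(solver='liblinear')),
          ('LDA', LinearDiscriminantAnalysis()),
          ('KNN', KNeighborsClassifier()),
          ('CART', DecisionTreeClassifier()),
          ('NB', GaussianNB()),
          ('SVM', SVC(gamma='auto'))]

# 10-fold cross-validation on the training data, using accuracy as the metric
for name, model in models:
    kfold = KFold(n_splits=10, shuffle=True, random_state=7)
    scores = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    print('%s: %.3f (%.3f)' % (name, scores.mean(), scores.std()))

# Fit the chosen model on the training data and evaluate it on the held-out 20%
svm = SVC(gamma='auto')
svm.fit(X_train, Y_train)
predictions = svm.predict(X_validation)
print(accuracy_score(Y_validation, predictions))
print(confusion_matrix(Y_validation, predictions))
print(classification_report(Y_validation, predictions))
```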
From this, we can conclude that it is a good model and that we can apply it with confidence to new data. We have based the model on the SVM algorithm, but the precision values for KNN are also very good. Do you dare to repeat this last step and apply the other algorithm? With this step, we can officially say that we have finished our first Machine Learning experiment with Python. Our recommendation: redo the whole experiment, take note of any doubts that arise, try to find the answers, and try to make small changes to the code, like the one we have just proposed, and… keep going. On platforms such as Coursera, edX, DataCamp or CodeAcademy you can find free courses to keep improving. Never stop learning! Other posts from this tutorial: Dare with Python: An experiment for all (intro) Python for all (1): Installation of the Anaconda environment. Python for all (2): What are the Jupyter Notebooks? We created our first notebook and practiced some easy commands. Python for all (3): SciPy, NumPy, Pandas… What libraries do we need? Python for all (4): We start the experiment properly. Data loading, exploratory analysis (dimensions of the dataset, statistics, visualization, etc.) Python for all (5) Final: Creation of the models and estimation of their accuracy You can also follow us on Twitter, YouTube and LinkedIn
April 17, 2018
AI & Data
Python for all (4): Data loading, explorative analysis and visualisation
Now that we have the environment installed, have had some practice with commands, and have learnt which libraries are available and which are the most important, the time has come to start our predictive experiment. We will work with one of the most highly recommended datasets for beginners, the iris dataset. This collection of data is very practical because it is a very manageable size (it only has 4 attributes and 150 rows). The attributes are numerical, and it is not necessary to make any changes to the scale or units, which allows both a simple approach (as a classification problem) and a more advanced one (as a multi-class classification problem). This dataset is also a good example for explaining the difference between supervised and unsupervised learning. The steps we are going to take are as follows: load the data and the modules/libraries we need for this example; explore the data; evaluate different algorithms to select the most suitable model for this case; and apply the model to make predictions from what it has 'learnt'. So that this post isn't too long, we will carry out the first two steps here; in the next and last one, we will carry out the 3rd and 4th.

1. Loading the data/libraries/modules

We saw, in the previous post, the large variety of libraries we have at our disposal, each with its own modules. But in order to use the modules as well as the libraries, we have to import them explicitly (except for the standard library). In the previous post we only imported some of the libraries in order to check their versions. Now, we will import the modules we need for this particular experiment. Create a new Jupyter Notebook for the experiment. We can call it 'Classification of iris'. To load the libraries and modules we need, copy and paste this code: https://gist.github.com/PalomaRS/cc3ca3c0e8a4a9949a4db5403cde9bd3#file-cargamodulos-ipynb Next, we will load the data. We'll do this directly from the UCI Machine Learning repository. For this we use the pandas library, which we have just loaded and which will also be useful for the exploratory analysis of the data, because it includes data visualisation tools and descriptive statistics. We only need to know the dataset URL and specify the names of each column to load the data ('sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class'). To load the data, type or copy and paste this code: https://gist.github.com/PalomaRS/b5f2ad393bea0e438a60ad9d6340d71b#file-cargadataset-ipynb You can also download the csv of the dataset to your working directory and substitute the URL with the name of the local file.

2. Data exploration

In this phase we're going to look at things such as the dimensions of the data and what it looks like. We will do a small statistical analysis of its attributes and group them by class. None of these actions is more difficult than executing a command which, in addition, you can reuse again and again in future projects. In particular, we will work with the function shape, which gives us the dimensions of the dataset; the function head, which shows us the data (we indicate the number of records we want it to show); and the function describe, which gives us the statistical values of the dataset. Our recommendation is that you try the commands one by one as you come across them. You can type them directly or copy and paste them into your Jupyter Notebook (use the vertical scroll bar to get to the end of the cell). A sketch of what these loading and exploration steps look like appears right after this paragraph.
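The exact code is in the gists linked above; a minimal sketch of the loading and exploration steps, assuming the usual UCI URL for the iris data, could look like this:

```python
import pandas as pd

# Assumed UCI location of the iris data; the column names are those listed above
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)

print(dataset.shape)                    # dimensions of the dataset
print(dataset.head(20))                 # the first 20 records
print(dataset.describe())               # count, mean, std, min, max and percentiles
print(dataset.groupby('class').size())  # number of rows per species
```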
Each time you add a function, execute the cell (Menu Cell/Run Cells). https://gist.github.com/PalomaRS/28bc5f7faf918ac7924ef23bc47e6561#file-exploracion-ipynb As a result, you should get something like this: Figure 2: Results of applying the exploration commands to the dataset. And so we see that this dataset has 150 instances with 5 attributes, we see the list of the first 20 records, and we see the distinct values of length and width of the petals and sepals of the flowers which, in this case, correspond to the Iris-setosa class. Finally, we can see the number of records in the dataset, the average, the standard deviation, the maximum and minimum values of each attribute and some percentiles. Now we will visualise the data. We can produce single-variable graphs, which help us to better understand each individual attribute, or multivariable graphs, which allow us to analyse the relationships between attributes. It's our first experiment and we don't want to overcomplicate it, so we will only try the first type. As the input variables are numerical, we can create a box and whisker diagram, which gives us a much clearer idea of the distribution of the input attributes (length and width of the petals and sepals). For this, we just have to type or copy and paste this code: https://gist.github.com/PalomaRS/6a9702ce4cc13f28c49a1e41508af73a#file-caja_bigotes-ipynb By executing this cell, we get this result: Figure 3: Box and whisker plots. We can also create a histogram of each attribute to get an idea of what type of distribution it follows. For this, we only need to add the following commands to our Jupyter Notebook (as in the previous example, it is better to do it one by one): https://gist.github.com/PalomaRS/35e187ab4aeb40be83f462183bf2d663#file-histograma-ipynb We execute the cell, and we get this result: at first glance, we can see that the variables related to the sepals appear to follow a Gaussian distribution. This is very useful, because we can then use algorithms that take advantage of the properties of this family of distributions. Figure 4: Histograms. And so now we're almost finished. In the following post we will finalise our first Machine Learning experiment with Python. We will evaluate different algorithms against a validation dataset and choose the one that offers us the best accuracy metrics to build our predictive model. And, finally, we will use the model. The posts in this tutorial: Dare with Python: An experiment for all (intro) Python for all (1): Installation of the Anaconda environment. Python for all (2): What are the Jupyter Notebooks? We created our first notebook and practiced some easy commands. Python for all (3): SciPy, NumPy, Pandas… What libraries do we need? Python for all (4): Data loading, exploratory analysis (dimensions of the dataset, statistics, visualization, etc.) Python for all (5) Final: Creation of the models and estimation of their accuracy You can also follow us on Twitter, YouTube and LinkedIn
April 10, 2018
AI & Data
Python for all (3): SciPy, NumPy, Pandas… What libraries do we need?
We are taking another step in our learning of Python by studying what modules are and, in particular, libraries. We will see what purpose some of them serve and learn how to import and use them.

What are modules?

Modules are the way Python stores definitions (instructions or variables) in a file, so that they can be used later in a script or in an interactive instance of the interpreter (in our case, Jupyter Notebook). Thus, we don't need to define them again every time. The main advantage of Python allowing us to separate a program into modules is, evidently, that we can reuse them in other programs or modules. For this, as we will see further on, it is necessary to import the modules that we want to use. Python comes with a collection of standard modules that we can use as a base for a new program, or as examples from which we can begin to learn. Python organises modules (.py files) into packages, which are nothing more than folders containing .py files (modules) plus a file named __init__.py. Packages are a way of structuring Python's namespaces using 'dotted module names'. For example, the module name A.B designates a submodule called B in a package called A. Just as the use of modules saves the authors of different modules from having to worry about each other's global variable names, the use of dotted module names saves the authors of multi-module packages, such as NumPy or the Python Imaging Library (PIL), from having to worry about each other's module names.

└── paquete
    ├── __init__.py
    ├── modulo1.py
    ├── modulo2.py
    └── modulo3.py

To import a module, use the instruction 'import' followed by the name of the package (if applicable) plus the name of the module (without the .py) that you wish to import. If the routes (the 'namespaces') are long, you can create an alias with 'as'. Modules should be imported at the start of the program, in alphabetical order: first Python's own, then third-party ones and, finally, those of the application.

The standard Python library

Python comes with a library of standard modules, documented in The Python Standard Library. To learn about syntax and semantics, it is also good to have The Python Language Reference to hand. The standard library is very large and offers a great variety of modules that carry out functions of all kinds, including modules written in C that offer access to system functionality such as file access (file I/O). Python installers for platforms such as Windows normally include the complete standard library, along with some additional components. However, Python installations built from packages may require specific installers.

A stroll through the standard library

The standard library offers a large variety of modules that carry out all types of functions. For example, the os module offers functions for interacting with the operating system, such as finding out which directory you are in, changing directory or getting help; the math module offers trigonometry functions, logarithms, statistics, etc. There are also modules to access the internet and process protocols, such as urllib.request, to download data from a URL, and smtplib, to send emails; modules such as datetime, which lets you manage dates and times; modules for compressing data; and modules for measuring performance.
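As a tiny illustration of these import conventions (a minimal sketch, not taken from the original post): standard-library modules first, in alphabetical order, then third-party libraries, often with an alias:

```python
import datetime   # standard-library module for dates and times
import math       # standard-library maths functions
import os         # interaction with the operating system

import numpy as np   # third-party library, imported with the usual "np" alias

print(os.getcwd())            # which directory are we in?
print(math.sqrt(2))           # a maths function from the standard library
print(datetime.date.today())  # today's date
print(np.arange(5))           # a small NumPy array, created through the alias
```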
We won't include an example for every module so as not to drag this out too much, but if you're really interested in learning, we recommend you try these modules one by one (from the Python shell, or through Jupyter) with the brief tour of the standard library that you can find in the official Python documentation.

Virtual environments

However, Python applications often use packages and modules that are not part of the standard library; in fact, Python is designed to facilitate this interoperability. The problem we run into, common in the open-source world, is that applications frequently need a specific version of a library, because the application requires a particular bug to have been fixed or because it was written against an older version of the library's interface. This means that a single Python installation may not be able to meet the requirements of every application. If application A needs version 1.0 of a particular module and application B needs version 2.0, the requirements conflict, and installing either version 1.0 or 2.0 will stop one of the applications from working. The solution to this problem is to create a virtual environment: a directory that contains a Python installation of a particular version, plus a number of additional packages. In this way, different applications can use different virtual environments. To resolve the example of conflicting requirements cited above, application A can have its own virtual environment with version 1.0 installed, whilst application B has its own with version 2.0.

Non-standard libraries

Given that the objective of our example is to carry out a Machine Learning experiment with Python on a particular dataset, we will need something more than the standard library which, even though it offers us some mathematical functions, falls a little short. For example, we will also need modules for data visualisation. Let's get to know the most common ones in data science: NumPy: short for Numerical Python. Its most powerful feature is the ability to work with n-dimensional arrays and matrices. It also offers basic linear algebra functions, Fourier transforms, advanced random number capabilities, and tools for integration with lower-level languages such as Fortran, C and C++. SciPy: short for Scientific Python. SciPy is built on top of NumPy. It is one of the most useful libraries thanks to its great variety of high-level modules for science and engineering, such as the discrete Fourier transform, linear algebra and optimisation. Matplotlib: a plotting library, covering everything from histograms to line graphs and heat maps. It can also use LaTeX commands to add mathematical expressions to graphs. Pandas: used for operations on and manipulation of structured data. It is a relatively recent library, but its enormous usefulness has propelled the use of Python in the scientific community. Scikit-learn, for machine learning: built on NumPy, SciPy and matplotlib, this library contains a large number of efficient tools for machine learning and statistical modelling, for example classification, regression, clustering and dimensionality reduction algorithms. Statsmodels: for statistical modelling. It is a Python module that allows users to explore data, estimate statistical models and run statistical tests.
It offers an extensive list of descriptive statistics, tests, plotting functions, etc. for different types of data and estimators. Seaborn: based on matplotlib, it is used to create attractive statistical graphics in Python. Its objective is to make visualisation a central part of exploring and interpreting data. Bokeh: allows the generation of attractive, interactive 3D graphics and web applications. It is used in applications with streaming data. Blaze: extends the capabilities of NumPy and Pandas to distributed and streaming data. It can be used to access data from a large number of sources such as Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables, etc. Scrapy: used for web crawling. It is a very useful framework for extracting specific data from websites; starting from the home page URL, you can 'dive' into different pages on the site to compile information. SymPy: used for symbolic computation, from arithmetic to calculus, algebra, discrete mathematics and quantum physics. It can also format the results as LaTeX code. Requests, for accessing the web: it works in a similar way to the standard library's urllib2, but is much simpler to code with. And now we suggest a simple exercise to practise a little: checking the versions of the libraries that Anaconda has installed for us. On Anaconda's web page we can see a diagram showing the different types of available libraries (IDEs, data science, analytics, scientific computing, visualisation and Machine Learning). As you can see, two libraries appear that we have not talked about, Dask and Numba, so we should also investigate their use, and check which versions Anaconda has installed for us. Figure 2: Diagram of the Anaconda environment. For that, you only need to type into your Jupyter notebook, or copy and paste, the following commands (with a slight modification for the libraries that don't appear). With this post we now have everything prepared to start the Machine Learning experiment. In the next one, we will start with the data loading and the exploratory analysis. We're nearly there! All the posts in this tutorial here: Dare with Python: An experiment for all (intro) Python for all (1): Installation of the Anaconda environment. Python for all (2): What are the Jupyter Notebooks? We created our first notebook and practiced some easy commands. Python for all (3): SciPy, NumPy, Pandas… What libraries do we need? Python for all (4): We start the experiment properly. Data loading, exploratory analysis (dimensions of the dataset, statistics, visualization, etc.) Python for all (5) Final: Creation of the models and estimation of their accuracy You can also follow us on Twitter, YouTube and LinkedIn
April 3, 2018
AI & Data
Dare with Python: An experiment for all (intro)
As we did in our experiment on the Titanic dataset in Azure Machine Learning Studio, we will continue with the "learning by doing" strategy, because we believe that the best way to learn is to carry out small projects from start to finish. A Machine Learning project may not be linear, but it has a series of well-defined stages: 1. Define the problem 2. Prepare the data 3. Evaluate different algorithms 4. Refine the results 5. Present them. On the other hand, the best way to get to know a new platform or tool is to work with it. And that is precisely what we are going to do in this tutorial: get to know Python as a language and as a platform.

What is NOT necessary to follow this tutorial?

The objective of this experiment is to show how a simple Machine Learning experiment can be done in Python. People with very different profiles can work with ML models: for example, a social sciences researcher, a financial expert, an insurance broker or a marketing agent, all of whom want to apply a model (and understand how it works); a developer who already knows other languages or programming environments and wants to start learning Python; or a Data Scientist who develops new algorithms in R, for example, and wants to start working in Python. So, instead of making a list of the prerequisites for following the tutorial, we will detail what is not needed: You do not have to understand everything at first. The goal is to follow the example from start to finish and get a real result. You can take note of the questions that arise and use Python's help("FunctionName") function to learn about the functions we are using. You do not need to know exactly how the algorithms work. It is useful to know their limitations and how to configure them, but you can learn that little by little. The objective of this experiment is to lose your fear of the platform and keep learning with other experiments! You do not have to be a programmer. The Python language has a quite intuitive syntax. As a clue to begin to understand it, it helps to look at function calls (e.g. function()) and variable assignments (e.g. a = "b"). The important thing now is to get started; little by little, you can learn all the details. You do not have to be an expert in Machine Learning. You can learn gradually about the advantages and limitations of the different algorithms, how to improve the different stages of the process, or the importance of evaluating accuracy through cross-validation. As it is our first project in Python, let's focus on the basic steps. In other tutorials we can work on other tasks, such as preparing data with Pandas or improving the results with PyBrain.

What is Python?

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its syntax emphasizes the readability of code, which facilitates debugging and, therefore, promotes productivity. It offers the power and flexibility of compiled languages with a smooth learning curve. Although Python was created as a general-purpose programming language, it has a series of libraries and development environments for each of the phases of the Data Science process. This, added to its power, its open-source nature and its ease of learning, has led it to take the lead from other data analytics and Machine Learning languages such as SAS (the leading commercial software so far) and R (also open source, but more typical of academic or research environments).
Python was created by Guido van Rossum in 1991 and, curiously, owes its name to its creator's great fondness for the Monty Python films. In addition to libraries for scientific and numerical computing, data analysis and data structures, and Machine Learning algorithms, such as NumPy, SciPy, Matplotlib, Pandas or PyBrain (which will be discussed in more detail in other posts of this tutorial), Python offers interactive programming environments oriented towards Data Science. Among them we find: 1. The shell or Python interpreter, which can be launched from the Windows menu, is interactive (it executes commands as you write them) and is useful for simple tests and calculations, but not for development. 2. IPython: an extended version of the interpreter that adds highlighting of lines and errors by means of colours, additional shell syntax, and autocompletion via the tab key. 3. IDEs, or Integrated Development Environments, such as Ninja IDE, Spyder, or the one we will work with, Jupyter. Jupyter is a web application that allows you to create and share documents with executable code, equations, visualizations and explanatory text. Besides Python, it is compatible with more than 40 programming languages, including R, Julia and Scala, and it integrates very well with Big Data tools such as Apache Spark.

What steps are we going to take in this tutorial?

So that the posts are not too long, we are going to divide the work as follows: Introduction: An experiment for all Python for all (1): Installation of the Anaconda environment. Python for all (2): What are the Jupyter Notebooks? Create a notebook and practice easy commands. Python for all (3): SciPy, NumPy, Pandas… What libraries do we need? Python for all (4): We start the experiment properly. Data loading, exploratory analysis (dimensions of the dataset, statistics, visualization, etc.) Python for all (5) Final: Creation of the models and estimation of their accuracy You can also follow us on Twitter, YouTube and LinkedIn
March 13, 2018
AI & Data
Artificial Intelligence or Cognitive Intelligence? The buzz words of business
(Original post in Spanish: ¿Inteligencia artificial o cognitiva?) In the last five years, Artificial Intelligence has become the biggest buzzword, with various spin-offs including "Cognitive Intelligence", "Smart Technologies" and "Predictive Technologies". The often negative associations that accompany the idea of Artificial Intelligence mean that some companies are shying away from declaring themselves AI pioneers and are instead creating their own buzzwords. But what is the real difference between these ideas, and how do AI companies deal with the possible negative connotations?

What is AI?

The Encyclopædia Britannica defines the concept of Artificial Intelligence as "the ability of a digital computer or computer-controlled robots to perform tasks commonly associated with intelligent beings. The term is frequently applied to the project of developing systems endowed with the intellectual processes characteristic of humans, such as the ability to reason, discover meaning, generalize, or learn from past experience". The problem with the term "artificial" is that it carries connotations of a lack of authenticity, of something robotic and unnatural, when its aim is quite the opposite. To summarize this rather lengthy explanation in fewer words, one could simply describe AI as creating a computer that can solve complex problems as a human would. It is a vital part of economic sectors such as information technology, health, life sciences, data analysis, digital transformation and security, and now of the consumer sector, with the development of smart homes, etc.

Cognitive Intelligence and Machine Learning

Cognitive Intelligence is an important part of AI that encompasses the technologies and tools that allow our apps, websites and bots to see, hear, speak, understand and interpret a user's needs in a natural way. That is to say, they are the applications of AI that allow machines to learn their users' language so that the users don't have to learn the language of machines. AI is a much wider concept that includes technologies and innovations such as robotics, Machine Learning, Deep Learning, neural networks, NLP, etc. Machine Learning is a branch of Artificial Intelligence that allows researchers, data scientists, data engineers and analysts to build algorithms that learn and can make "data-driven" predictions. Instead of following a series of rules and instructions, these algorithms are trained to identify patterns in large quantities of data. Deep Learning takes this idea one step further and processes the information in layers, so that the result obtained in one layer becomes the input for the next. So, if AI is so important, why is the term often tiptoed around?

The origin of the negative connotations of AI

Firstly, it seems that it has become a "worn-out" word. It has been used so widely that the whole world seems to know about (and have an opinion on) the subject! This widespread use has been accompanied by a lack of information. Many people can only base their understanding on what Hollywood has taught them: that AI is limited to robots and strong AIs. Others think they are talking about AI when, in reality, they are talking about Machine Learning. Secondly, there is the fact that Artificial Intelligence is not a new concept, which means that misconceptions have formed over many decades; in fact, the term has existed since 1956. Over these years there have been different waves (such as the introduction of expert systems in the 80s and the explosion of the internet in the 90s).
In each period, expectations have been greater than reality, and there have been "troughs of disillusionment", the third phase of Gartner's "Hype Cycle". Figure 1: Gartner curve (from IOTpreneur, CC BY-SA 4.0). We currently find ourselves in a period of high expectations with respect to AI. Big companies are promising innovation beyond what we could have imagined 20 years ago. Some technology leaders talk about the dangers and the impact that automation, robotics and AI might have on our lives and future jobs. Despite this, each day we see more and more technologies that make our lives easier. These advancements, which help people see the reality of the technology, may help to reduce the "baggage" of negative connotations surrounding the future of AI.

What areas does AI encompass?

AI is an ecosystem in which we can include technologies such as data mining, natural language processing (NLP), Deep Learning, predictive and prescriptive analytics and many more. In this ecosystem we also find technologies that regularly assist us in our daily lives, such as the recommendation systems on which Netflix and AirBnb are based. All of these technologies are characterized by generating data which, if analyzed correctly, can offer great value and understanding. Because of this, one can say that Artificial Intelligence lies at the convergence of all these solutions. Additionally, AI is closely linked to the four pillars of innovation and digital transformation (cloud computing, mobility, social analytics and Big Data), as it powers some of the main accelerators of this transformation, including Cloud Computing, Cognitive Systems, the Internet of Things (IoT), Cybersecurity and Big Data technologies.

Digital transformation pillars

The technology sector is transforming into a sector of understanding. In order to take anything away from this understanding, it is important to have technologies and "real-life" applications that are deeply connected. This is what we call the "digital economy". As mentioned earlier, this transformation is based on four fundamental pillars: cloud computing, mobility, social analytics and Big Data analytics. These technologies and innovations are the true driving forces behind digital transformation, and they are so closely tied to AI that they sometimes get confused with AI itself. These four pillars support the "accelerators" of innovation. The main accelerators are: cognitive services, cybersecurity, IoT and Big Data.

Cognitive services

All of these technologies are ever-present in our daily lives. Cognitive services aim to imitate rational human processes. They analyze large amounts of data created by connected systems and offer tools with diagnostic, predictive and prescriptive capabilities, capable of observing, learning and offering insights. They are strongly oriented towards context and human interaction. The challenge for Artificial Intelligence here is to design the technology so that people can interact with it naturally. This involves developing applications with "human behavior", such as: listening and speaking, that is, the ability to turn audio into text and text into audio; Natural Language Processing (NLP), since text is not just a combination of keywords and a computer needs to understand grammatical and contextual connections too; understanding emotions and feelings ("sentiment analysis"), to create empathetic systems capable of understanding a person's emotional state and making decisions based on it; and image recognition.
Image recognition consists of finding and identifying objects in an image or video sequence. It is a simple task for humans, but a real challenge for machines.

Cybersecurity

Cybersecurity is also moving towards a more holistic approach, one that considers its environment and has a more human dimension. Above all, it is becoming more proactive. Rather than waiting for a cyber-attack to happen, the key lies in prediction and prevention. Now, AI can be used to detect patterns in the data and take action when alerts arise.

Internet of Things and Big Data

What about the Internet of Things and Big Data? In this case, it is clear that a huge amount of data is being created, rapidly and often in unstructured forms. This can include data from IoT sensors, social networks, text files, images, videos and sound. AI tools such as data mining, machine learning and NLP now make it possible to turn this data into useful information. Artificial Intelligence is a very broad term and encompasses many processes and technologies that can be applied in various industries. Companies must be able to explain simply the type of AI they are incorporating in order to dispel the confusion surrounding these terms, which will make AI a more accessible technology.
February 2, 2018
AI & Data
How can Brand Sponsored Data be used as a marketing tool for video advertising?
(Content written by Cambria HAYASHINO) Connecting with customers while they are on the go is crucial for every successful marketing and sales campaign. More and more companies have realized that the mobile screen is a key element of every media plan for landing their marketing messages. However, as we have seen in previous blogs, many mobile phone users in LATAM are on restricted mobile data plans. This leads to customers who carefully monitor their data usage and usually avoid using most apps, or even websites, without being connected to Wi-Fi. This fact also impacts marketing campaigns, especially video campaigns: users are less likely to watch a video ad when on the go than when they are on Wi-Fi. Brand Sponsored Data can help to overcome this issue.

Showcase: Brand Sponsored Data as a tool to improve the video-advertising experience

Movistar Colombia planned a big mobile video campaign promoting their new service, Movistar Música. In order to create a positive brand experience, they decided to use Brand Sponsored Data to support this goal. So, Movistar ran their whole mobile video campaign in sponsored mode, encouraging more customers to click on the video immediately (whether they were on Wi-Fi or not) and reaching more users with their marketing message.

A seamless customer experience

With the help of a targeted messaging campaign, customers received a direct (and free) SMS informing them about the new music service. Via an embedded link to a dedicated landing page, users were invited to watch a video about the service. At the end of the video, customers could sign up for the service if they were interested. The entire customer journey was free for users in terms of data costs. Figure 1: Process description (click to enlarge).

Impressive results

In order to see the impact of Brand Sponsored Data, Movistar set up a control group, which received the same message but with a non-sponsored video. The uplift of the sponsored group versus the non-sponsored group was impressive. Sponsored users showed a 2.3x higher click rate, a 1.9x higher video start rate, 3x more video completions (65%, a remarkable figure for an outstream format) and a 4x higher CTR. Furthermore, digging into the details, Movistar found that not only did more users respond to the sponsored message, but that these were especially users connected to the cellular network, i.e. on the go, whereas the majority of the non-sponsored group were connected to Wi-Fi when clicking on the video. So Brand Sponsored Data encouraged more users to click on the link immediately to watch the video. Figure 2: Results (click to enlarge).

Conclusions

With this sponsored video approach, Movistar could not only reach more customers but also achieve more video completions, ensuring that more users received the full marketing message. The campaign's success demonstrated that customers show a higher interest in watching a video ad if the obstacle of data costs is removed. So, with a single and very easy adaptation of the video campaign, the results were improved significantly for both customers and the video advertiser. To learn more about Brand Sponsored Data and to see other case studies, visit our website. Don't miss out on a single post. Subscribe to LUCA Data Speaks.
September 7, 2017
AI & Data
Big Data Analytics, but what type?
In today's climate, businesses are already aware that if they don't make the most of their data, they will be left behind by the competition. They know that traditional Business Intelligence (BI) systems are no longer enough. All around them, they hear people talk about Big Data, Data Analytics, Data Science and more. They read leading consultancy reports predicting that an increasing number of businesses will enter these industries in the coming year. They have started to invest resources in their data storage. However, after all this, they don't know how to draw out value from this information. As you can imagine from the title, in this post we are going to explore the different types of Big Data analytics: what they consist of, how they can be used in the market, and how specific businesses are currently using them. All this information will give us an insight into which type is best suited to a specific business. The first step is to decide which type of analytics your business needs. This is not a trivial question, since there are no "universal analytics" that work for every case. Traditionally, one would work with analytical tools in a reactive way. These tools are capable of generating reports and visualizations about what has happened in the past, but they do not offer useful information about possible business opportunities or problems that may arise in the future. This led to a movement towards Predictive Analytics in addition to the Descriptive Analytics that already existed: a move from linear analytics in a controlled environment towards analytics that can be applied in a real-world (i.e. less structured) environment. This need is yet to be fully met, as shown by the fact that the technology consultancy IDC estimates a Compound Annual Growth Rate (CAGR) of 26.4% for the Big Data and Analytics industry through the end of 2018. Figure 1: Analytics. In this article we will define Descriptive, Predictive and Prescriptive analytics in order to reveal what each type can offer to businesses that want to improve their operational capabilities.

1) Descriptive Analytics

This is the most basic area of analytics, and it is currently used by around 90% of businesses. Descriptive Analytics answers the question: what has happened? It analyzes historical data and data collected in real time in order to generate insights about how past business strategies have worked (for example, a marketing campaign). Aim: to identify the causes that led to success or failure in the past in order to understand how they might affect the future. Based on: standard aggregate functions over the database, which require only a basic level of mathematics. Examples: this type of analytics is often used for social analytics and is the result of basic arithmetic operations such as average response time, page views, follower trends, likes, etc. Application: using tools such as Google Analytics to analyze whether a promotional campaign has worked well or not, with the use of basic parameters such as the number of visits to the page. The results are usually visualized on "dashboards" that allow the user to see real-time data and send the reports to others.

2) Predictive Analytics

This is the next step in turning data into insights and, according to Gartner, 13% of organizations use such techniques. This type of analytics answers the question: what may happen in the future, based on what has happened in the past?
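As a purely illustrative sketch of this "extrapolate past behaviour" idea (the monthly figures and the simple linear trend model below are invented for the example, not taken from the article):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly sales for the past year (illustrative numbers only)
months = np.arange(1, 13).reshape(-1, 1)
sales = np.array([10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26])  # in k€

# Fit a simple trend model on the historical data...
model = LinearRegression().fit(months, sales)

# ...and extrapolate it to estimate the next quarter,
# assuming no relevant changes in the environment
future_months = np.array([[13], [14], [15]])
print(model.predict(future_months))  # projected sales for months 13-15
```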
It analyzes historical trends and data models in order to try to predict how they will behave in the future. For example, a company can predict its growth by extrapolating past behaviour, assuming that there will be no relevant changes in its environment. Predictive analytics offers better recommendations and answers to questions that BI cannot answer. Aim: to identify the causes that led to success or failure in the past in order to understand how they might affect the future. This can be useful when setting realistic business objectives and can help businesses to plan more effectively. Based on: statistical algorithms and Machine Learning used to predict the probability of future results. The data that feeds these algorithms comes from CRMs, ERPs or human resources systems. These algorithms are capable of identifying relationships between different variables in the dataset and of filling gaps in the information with the best possible predictions. However, despite being the "best possible", they are still only predictions. Examples: this type of analytics is commonly used for sentiment analysis. Data enters a machine learning model in the form of plain text, and the model is then capable of assigning the text a value indicating whether the emotion expressed is positive, negative or neutral. Application: often used in the financial sector to assign a client a credit score. Retail companies also use this type of analytics to identify patterns in the purchasing behaviour of clients, to make stock predictions or to offer personalized recommendations. Figure 2: The different types of analytics.

3) Prescriptive Analytics

This type of analytics goes one step further, as it aims to influence the future. As such, it is known as "the final frontier of analytical capabilities". Predictive analytics can suggest which actions to take in order to achieve a certain objective; prescriptive analytics does this as well, but also suggests the possible effects of each option. It aims to answer the question: what should our business do? Its complexity means that, despite the immense value it could offer, only 3% of organizations use such analytics (according to Gartner). Figure 3: The three phases of analytics. Aims: prescriptive analytics doesn't just anticipate what is going to happen, and when, but can also tell us why. Further still, it can suggest which decisions we should take in order to make the most of a future business opportunity or to avoid a possible risk, showing the implications of each option for the result. Based on: this type of analytics ingests hybrid data: structured (numbers, categories) and unstructured (videos, images, sound and text). This data may come from an organization's internal sources or from external ones such as social networks. To the data it applies statistical and mathematical models, machine learning and natural language processing, as well as rules, norms, best practices and business regulations. These models can keep collecting data in order to keep making predictions and prescriptions; in this way, the predictions become increasingly precise and can suggest better decisions to the business. Examples: prescriptive analytics is useful when making decisions relating to the exploration and production of oil and natural gas. It captures a large quantity of data, can create models and images of the Earth's structure and describe different characteristics of the process (machine performance, oil flow, temperature, pressure, etc.).
These tools can be used to decide where and when to drill and, therefore, to build wells in a way that minimizes costs and reduces the environmental impact. Application: health service providers can use such analytics to plan future investments in equipment and infrastructure effectively, basing their plans on economic, demographic and public health data; to obtain better results in patient satisfaction surveys and avoid patient churn; and to identify the most appropriate intervention models for specific population groups. Pharmaceutical companies can use it to find the most suitable patients for a clinical trial. We don't have a crystal ball to tell us which numbers will appear in the lottery, but it is evident that Big Data technologies can shed light on current problems in our business, helping us to understand why they are happening. In this way, we can transform data into actionable insights and reinvent business processes. Figure 4: Infographic - from data to decisions and actions. We have moved from the question "what has happened?" to being capable of understanding "why has it happened?". We can predict "what is going to happen?" and prescribe "what should I do now?". Now, we can truly create an intelligent business. Don't miss out on a single post. Subscribe to LUCA Data Speaks.
August 10, 2017
AI & Data
What are the 5 principles of joined-up data?
The definition and principles of 'open data' are quite clear and simple, but the principles of joined-up data are less so. Can we enunciate five principles of joined-up data that could serve as a practical guide for others? The Joined-Up Data Standards (JUDS) project enunciated these 5 principles as concrete guidance towards a commonly recognised list of principles for interoperability – the ability to access and process data from multiple sources without losing meaning, and to integrate them for mapping, visualisation and other forms of analysis – at a global level. Figure 2: The 5 principles of joined-up data infographic (own production)
July 7, 2017