Richard Benjamins


Dr. Richard Benjamins is Chief AI & Data Strategist at Telefónica. Over his professional career he has worked in academia, at several start-ups and in multinational corporations. He was Chief Data Officer at AXA. His passion is creating value from data. He founded Telefónica's Big Data for Social Good area. He is co-founder and vice-president of OdiseIA, the Spanish observatory for the social and ethical impact of Artificial Intelligence, and a member of the European Commission's expert group on the sharing of data between companies and governments in the public interest. He holds a PhD in Cognitive Science from the University of Amsterdam and has published more than 100 scientific articles.
AI & Data
A new organizational role for Artificial Intelligence: the Responsible AI Champion
With the increasing uptake of Artificial Intelligence (AI), more attention is being given to its potential unintended negative consequences. This has led to a proliferation of voluntary ethical guidelines, or AI Principles, through which organizations publicly declare that they want to use AI in a fair, transparent, safe, robust and human-centric way, avoiding negative consequences or harm. Harvard University has analyzed the AI Principles of the first 36 organizations in the world to publish such guidelines and found 8+1 most-used categories[1], including human values, professional responsibility, human control, fairness & non-discrimination, transparency & explainability, safety & security, accountability, and privacy + human rights. The figure below shows the timeline of the publication dates for those 36 organizations. The non-profit organization Algorithm Watch maintains an open inventory of AI Guidelines that currently covers over 160 organizations[2].

Figure: Timeline of the publication dates of the AI Guidelines of the 36 organizations

From Principles to practice

While there is much work dedicated to formulating and analyzing AI Guidelines or Principles, much less is known about the process of turning those principles into organizational practice (Business as Usual, BAU). Initial experiences are being shared and published[3],[4], and know-how is building up[5],[6],[7], with technology and consultancy companies leading. Telefonica's methodology, coined "Responsible AI by Design"[3], includes various ingredients:

- The AI Principles, setting the values and boundaries[8]
- A set of questions and recommendations, ensuring that all AI Principles are duly considered in the product lifecycle (a sketch of how such a question set might be encoded follows this list)
- Tools that help answer some of the questions and help mitigate any problems identified
- Training, both technical and non-technical
- A governance model assigning responsibilities and accountabilities
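As a purely illustrative aside, here is a minimal sketch of how such a set of questions per principle and lifecycle stage might be encoded, so that open items can be routed to a community of practice or escalated. The principles, stages and questions shown are hypothetical examples, not Telefonica's actual questionnaire.

```python
# Hypothetical sketch only: the principles, stages and questions below are
# illustrative examples, not Telefonica's actual question set.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewItem:
    principle: str                # e.g. "fairness", "transparency"
    stage: str                    # product lifecycle stage
    question: str
    answer: Optional[str] = None  # unanswered items need attention

CHECKLIST = [
    ReviewItem("fairness", "design",
               "Could the training data under-represent any group?"),
    ReviewItem("transparency", "development",
               "Can the model's decisions be explained to end users?"),
    ReviewItem("privacy & security", "deployment",
               "Is personal data anonymized or aggregated before use?"),
]

def open_items(items):
    """Items still unanswered: these go first to the community of practice,
    then to the RAI Champion, and are escalated if still unresolved."""
    return [i for i in items if i.answer is None]

for item in open_items(CHECKLIST):
    print(f"[{item.stage}] {item.principle}: {item.question}")
```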
Here we focus on a new organizational role that is essential for implementing the responsible use of AI in an organization. The role plays a critical part in the governance model, and we have coined it "Responsible AI Champion"[9] (RAI Champion).

Introducing the Responsible AI Champion

Why do we need champions? AI & ethics is a new area in many organizations, and identifying champions is a proven strategy for establishing new areas. A champion is knowledgeable about the area, is available to fellow employees in a given geography or business unit, and provides awareness, advice, assistance and, if needed, escalation. Champions are also crucial for turning new practices into BAU, and as such are agents of change. In particular, the responsibilities of a Responsible AI Champion are to inform, educate, advise & escalate, coordinate, connect and manage change.

Inform

A RAI Champion informs fellow employees about the importance of applying ethics to AI and data to avoid unintended harm. He or she raises awareness of the organization's AI Principles.

Educate

A RAI Champion provides and organizes training, both online and face to face, for the corresponding business unit or geography, explaining how to apply the principles to the product lifecycle. He or she also explains the governance model and encourages self-educated experts to form a voluntary community of practice where "local" employees can get first-hand advice.

Advise & escalate

A RAI Champion is the final "local" contact for ethical questions about AI and Big Data applications. If neither the experts of the community of practice nor the RAI Champion can address the issue at hand, it is escalated to a multidisciplinary group of senior experts.

Coordinate

Since AI and Big Data touch on issues dealt with in several other parts of the organization, RAI Champions need to coordinate with all of them: with the DPO (data protection officer) for privacy-related issues; with the CISO (chief information security officer) for security-related aspects; with the CDO (chief data officer) for data- and AI-related topics; with CSR (corporate social responsibility) for reputational and sustainability issues; with the Regulation area for possible future AI regulations; and with the Legal area for other legal issues. In some organizations, the responsible use of AI and Big Data is part of a wider "responsibility" initiative that includes topics such as sustainability (SDGs), climate change, human rights, fair supply chains and reputation. In this case, the RAI Champion should coordinate and be fully aligned with the respective leaders.

Connect

RAI Champions need to connect relevant people to form communities of experts on the subject matter. Those communities are the first place to go when ethical doubts cannot be solved within a product or project team. RAI Champions also need to form a community among themselves, connecting the different geographies and business units of the organization in an active learning and sharing network. Finally, more mature organizations may also consider setting up or joining an external RAI Champion (or similar) network where experiences and practices are shared with other organizations, either from the same sector or across different sectors.

Manage change

Finally, RAI Champions are agents of change. They have to ensure that, over time, ethical considerations become an integral and natural part of any business activity touching AI and Big Data, including design, development, procurement and sales. They have to implement the governance model and turn it into BAU.

The RAI Champion profile

For organizations that are starting out, the RAI Champion is more a role than a full-time job. Typically, the role is taken up by AI or Big Data enthusiasts who have researched ethics topics on their own initiative and stay attentive to the latest developments. But the RAI Champion role is not necessarily the realm of technical people only; champions also come from areas such as regulation, CSR and data protection. Indeed, a good candidate to take up the role is the DPO. RAI Champions need to be communicative, with an interest in teaching and convincing. As with any new role of an interdisciplinary character, RAI Champions will need to be trained before they can exercise their role.
[1] https://cyber.harvard.edu/publication/2020/principled-ai
[2] https://arxiv.org/abs/2001.09758
[3] https://arxiv.org/abs/1909.12838v2
[4] https://inventory.algorithmwatch.org/
[5] https://www.weforum.org/agenda/2020/01/tech-companies-ethics-responsible-ai-microsoft/
[6] https://media.telefonicatech.com/telefonicatech/uploads/2021/1/115731_Decision_Points_AI_Governance.pdf
[7] https://www.ai.mil/blog_04_01_20-shifting_from_principles_to_practice.html
[8] https://www.telefonica.com/en/web/responsible-business/our-commitments/ai-principles
[9] Independently, Telefonica and Microsoft have come up with the same name: RAI Champion.
June 8, 2020
AI & Data
The RAE launches LEIA with Telefónica, Google, Microsoft, Amazon, Twitter and Facebook
Last Friday, Chema Alonso and I had the opportunity to attend the closing session of the 16th Congress of the Association of Spanish Language Academies (ASALE) in Seville, organised by the Real Academia Española (RAE). During the event, presided over by the King and Queen of Spain, the LEIA (Spanish Language and Artificial Intelligence) project was presented, a very exciting plan that Telefónica has supported and promoted since its inception. The aim of LEIA is the defence, projection and correct use of the Spanish language in the field of Artificial Intelligence and current technologies.

The LEIA project also has the collaboration of leading technology companies such as Google, Microsoft, Amazon, Twitter and Facebook, which, like Telefónica with its Artificial Intelligence Aura, are betting on AI-powered virtual assistants such as Alexa and Cortana to enrich the relationship with their customers. This has a clear impact on the language used by these services, and these companies do not want to miss the opportunity to use technology to promote the correct use of Spanish. During the event, these companies jointly showed a video with some of the tools that already exist, or that will be developed in the context of LEIA, as a product of applying Artificial Intelligence to language with the resources of the RAE. In the case of Telefónica, through Aura in Movistar Home, customers can resolve any queries they may have about spelling, grammar or foreign words, providing information that motivates them to learn and enjoy the language from the comfort of their own home.

https://www.youtube.com/watch?v=L_AW7lbz2pY&t=

Telefónica's collaboration with the RAE began with the participation of José María Álvarez-Pallete, Chairman & CEO of Telefónica, in the 8th International Congress of the Spanish Language held in Cordoba, Argentina, on March 27. During his speech he called attention to the rapid evolution of technology, stressed the importance of promoting the use of Spanish in this technology so that it is not relegated to second place behind languages such as English, and underlined the need to train algorithms with appropriate data to preserve the use of correct Spanish.

Chema Alonso emphasised in the framework of the event that "we are exposed to an unprecedented technological avalanche; we have to ensure that Artificial Intelligence not only speaks Spanish, so that it is an inclusive technology that benefits all Spanish speakers, but also that it speaks it correctly. It is therefore very positive that through LEIA we can now take advantage of all the linguistic resources of the RAE".

In addition, Telefónica and all the technological partners have made our collaboration in the project official by signing an agreement in which we undertake to use the RAE's resources, such as its dictionaries, grammar and spelling, in the development of our voice assistants, word processors, search engines, chatbots, instant messaging systems, social networks and any other tools, as well as to follow the criteria for the correct use of Spanish approved by the Royal Spanish Academy.

The RAE was founded more than 300 years ago to maintain the unity of the Spanish language. With LEIA, the aim is now to achieve the same for the Spanish of machines. I would like to conclude by mentioning the special thrill I felt when I heard the King say the name I chose for the project.
November 11, 2019
AI & Data
Artificial Intelligence in small and medium-sized enterprises
We hear of ever more applications of Artificial Intelligence, ranging from improved medical diagnosis to autonomous vehicles, automatic translation and voice recognition. But when we look at the companies behind the development or use of those applications, we mostly see large companies and a few technological start-ups. Most AI applications are currently developed or used by GAFAM[1], BAT[2] or the top 50 of national large enterprises listed on international or national stock exchanges. Think of the best facial recognition programs, currently provided as a service by Amazon, Google and Microsoft: most of those services are used by large organizations, both private and public. But what about the use of advanced AI technology by small and medium enterprises (SMEs), which often represent the large majority of the economy, at least in terms of employees?

AI in Spanish SMEs

In this post, I describe AI in Spanish SMEs, dealing with questions such as: Are SMEs aware of AI? Do they know what it is useful for? Are they aware of the risks? Do they have access to the right skills? What about usage in different sectors?

I have the honour to teach an AI class to more than 250 Spanish SMEs in the context of DigitalXBorder, an ambitious program to help SMEs in their digital transformation, with the participation of experts from Google, eBay, Amazon, Telefonica, Salesforce, Microsoft and others. AI is one of the roughly 25 topics of the course, which takes place in more than 25 cities in Spain, with about 30 SMEs participating in each city, over the course of three years. The AI class consists of a bit of history and definitions of AI along with emblematic examples (General Problem Solver, ELIZA, the Turing Test), enterprise AI examples, ethical challenges, and what all this means, or can mean, for SMEs, which usually have less data, a smaller budget and little access to the right skills. The class ends with practical examples of Machine Learning as a Service, allowing any SME to do Machine Learning on its own data sets within minutes or hours. If you are interested in the full course, here you can access its four videos (in Spanish).

At the start of the program, I designed a simple questionnaire to get a better understanding of the state of play of AI in SMEs. Now that the first year of the course is over, I analyse the initial findings, covering 7 cities and more than 50 SMEs. This is of course a small sample, so the interpretation is more qualitative than quantitative.
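As an aside, tallying the answers of such a multiple-choice questionnaire into the shares reported below takes only a few lines. Here is a minimal sketch with pandas, where the column and answer labels are hypothetical rather than the actual survey schema:

```python
# Minimal sketch: tallying multiple-choice questionnaire answers with pandas.
# Column and answer labels are illustrative, not the actual survey schema.
import pandas as pd

responses = pd.DataFrame({
    "uses_ai":       ["using", "considering", "no plans",
                      "considering", "using", "considering"],
    "has_ai_skills": ["no", "no", "yes", "no", "no", "yes"],
})

# Share of answers per option, as percentages (cf. Figures 4 and 5).
for column in responses.columns:
    print(responses[column].value_counts(normalize=True).mul(100).round(1))
```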
Questionnaire results

Except for one, all SMEs have heard of AI. Asked what they understand by AI, the answers are not surprising: Machine Learning and intelligent software are at the top, followed by "thinking machines" and robots. A significant part also considers that AI can refer to all of the above (see Figure 1).

Figure 1: What do SMEs understand by Artificial Intelligence? (Answer options: robots, intelligent software, thinking machines, Machine Learning, other, all of the above, don't know.)

About 80% of the participants know about some AI use cases, and 75% are aware of the risks associated with this technology (bias, undesired discrimination, explainability, the future of work). The large majority, however, believes that the opportunities significantly outweigh the risks (Figure 2).

Figure 2: Do you consider there are more opportunities or risks related to AI? (Answer options: more opportunities, more risks, don't know.)

Most SMEs think that AI is already here, but that much more is to come. No one thinks that AI is only something of the future (Figure 3).

Figure 3: Do you think AI is already here, or something of the future? (Answer options: AI is already here; AI is here but much more is to come; AI is for the future.)

However, the main challenge SMEs face is getting access to the required technical skills such as data engineering, analytics and machine learning. More than 75% of the SMEs do not have access to the right knowledge (Figure 4). Obviously, that hinders the uptake of AI.

Figure 4: Do you have someone in your company with knowledge of AI or Machine Learning? (Answer options: yes, no.)

Asked whether they are actually using AI, about 30% of SMEs state that they are. Almost 60% have plans to use it, and only a bit more than 10% currently have no plans to use it (Figure 5).

Figure 5: Are you using or considering using AI? (Answer options: I am using AI; I am considering using AI; I have no plans to use AI.)

When asked what problems they solve or plan to solve with AI, the usual applications were reported: improving sales, market understanding, predictive maintenance, machine design, etc.

Figure 6: Implemented and planned AI use cases in Spanish SMEs.

Given the large variety of sectors where SMEs are active, a broad categorization has been used, including services, technology, industry, consumer, and food & agriculture; almost 50% of the participants are from the industrial sector.

Figure 7: To what sector does your SME belong? (Answer options: services, technology, industry, consumer, food & agriculture, other.)

Conclusions

In summary, there is high awareness of Artificial Intelligence among SMEs in Spain; several use cases are known; and over 70% are implementing AI or planning to do so. But we also learned that the main challenge is access to the required AI skills such as Machine Learning, or even simpler analytics skills. This does not mean, however, that SMEs cannot start using AI. Increasingly many tools offer Machine Learning as a Service through intuitive graphical interfaces, allowing non-data scientists to apply, evaluate and execute Machine Learning models directly on the SMEs' own spreadsheets. As with many new technologies, the first applications are pioneered by innovative early-adopter enterprises, but as solutions become more mature and automated, they come within reach of smaller organizations. This is the normal democratization process of new technology, and it will be no different with AI, which is reaching ever more SMEs.

If you prefer to read this post in Spanish, you can find it here: La inteligencia artificial en pequeñas y medianas empresas (15 de octubre de 2019).

[1] Google, Amazon, Facebook, Apple, Microsoft
[2] Baidu, Alibaba, Tencent
October 15, 2019
AI & Data
Artificial Intelligence for warfare or for maintaining peace
On July 3, 2019, I attended an event organized by the Spanish Center for National Defense Studies (CESEDEN) and the Polytechnic University of Madrid (UPM) on the impact of AI on defense and national security. Not coming from the military area, I was asked to speak about my view on what AI will look like 20 years from now. But the interesting part was not my presentation; it was the message conveyed by several generals, in particular Major General José Manuel Roldán Tudela and Major General Juan Antonio Moliner González.

Situations where AI may be helpful in war are often related to the safety of soldiers, fighting at the frontline, situations that require high endurance and persistence, lethal or very dangerous environments, avoiding physical or mental exhaustion, and moments when extremely fast reactions are required. AI can also improve different tactical levels in warfare, related for example to the decision-making cycle, understanding of the situation, capacity to maneuver, protection of soldiers, performance of soldiers, and capacity for perseverance. But there are several rules when applying AI to such situations, relating to supervision, teams and security.

Supervision
- The AI system needs an advanced user interface for fast interaction
- All activities should be registered for later inspection
- The system can be activated and deactivated under human control

Teams
- Humans and AI systems work together in teams
- AI systems should be able to explain themselves

Security
- Hostile manipulation should be avoided
- No intrusion should be possible
- Cybersecurity

AI is improving defense and national security on land, at sea and in the air. Some examples include:

Land
- Removing landmines
- Recognition of important routes
- Battles in urban zones
- Air support

Sea
- Mine detection and removal
- Anti-submarine warfare
- Maritime search and rescue

Air
- Precision attacks
- Search and rescue in combat
- Suppression of enemy air defenses

There are of course also ethical aspects to the use of AI for defense. For instance, the final responsibility for all actions needs to stay with humans. Humans should be "in the loop" (decide everything), "on the loop" (be able to correct), and only in very specific cases "out of the loop". An important lesson seems to be that when ethical principles are relaxed, armed conflicts increase. Specific aspects that were mentioned include:

- The principle of reducing unnecessary risks to one's own soldiers: machines seem to make fewer errors than people
- Discriminating between soldiers and civilians: AI is likely to discriminate better
- Overall, there is aversion to lethal autonomous weapon systems (LAWS)

Raising a specific question about LAWS, the answer was that humans always need to stay in control of life-or-death decisions. But it was also recognized that there is a serious risk of an AI arms race. Even though many countries may be completely against the use of LAWS, if one country starts to develop and threaten with LAWS, other countries might feel obliged to follow. This is probably behind the withdrawal of France and Germany from the "Killer Robots" ban. Humanity has experience with the nuclear arms race and so far has been wise enough to use it only as a threat. However, nuclear arms have a very high entrance barrier, probably much higher than LAWS. Let's hope that humanity is also wise enough with LAWS and that no one has to brace for impact.
July 12, 2019
AI & Data
London AI Summit 2019
The London AI Summit attracted thousands of visitors, many more than when AI was still a niche subject in the 1990s. As always with such big events, it was a mixture of really cool things, hyped promises, important issues, real use cases, fun apps and some cool start-ups.

Among the cool stuff was an RPA (Robotic Process Automation) demo in which, in 15 minutes, a bot was programmed that took a photo of a handwritten message sent to an email address, transcribed it to text (without errors), opened a browser, pasted in the text, clicked the tweet button and finally published the text on Twitter, all without any human intervention.

Hyped promises were made several times; one presentation quipped that "if it is written in Python, it's probably Machine Learning; if it is written in PowerPoint, it's probably AI". One of the overpromises was that AI will make health care accessible and affordable for everybody on earth. While this is a great vision to work towards, achieving it involves many other things besides AI.

I was happy to see that there was a talk on data ethics and the ethics of AI algorithms. Rather than ticking boxes in checklists, what is paramount for aligning organizational values with practice is having meaningful conversations.

An example of a real use case was using AI to evolve the customer relationship, and especially doing so across an organization in a scalable way, across multiple channels and business processes. The pillar is a common data format, which is turned into value through the combination of NLP, Machine Learning and RPA to improve the customer experience and increase efficiency. Another use case showed how AI can help pharmaceutical companies fight the decline of the ROI on their R&D.

A fun app was TasteFace, which uses facial recognition to estimate how much you (dis)like Marmite, a typical UK breakfast spread. A start-up with a purpose was Marhub, which aims to build a platform to support refugees through the use of chatbots. Another social start-up was Access Earth, which aims to provide, by crowdsourcing and AI, a world map of the accessibility of public places for people with reduced mobility.

And London remains a "hip" place, not only for coloured socks, but also for shiny shoes.
June 17, 2019
AI & Data
Towards a new Big Data ecosystem to mitigate Climate Change
Big Data can be used to help fight climate change[1],[2],[3]. Several projects have analysed huge amounts of satellite, weather and climate data to come up with ways to better monitor, understand and predict the course of climate change. Recognising the dramatic impact of climate change on our lives and those of future generations, many governments are designing policy measures to mitigate its effects. It is, however, complex to estimate, and later monitor, the impact of those measures, both on climate change and on economic activities. One of the challenges governments face is to balance the mitigating measures with their impact on the economy.

We believe that a combination of privately held data and public open data can provide valuable insights to both estimate and quickly monitor the impact on economic activities. This is the field of business-to-government (B2G) data sharing, whose value is well recognized as an enabler for solving important societal problems (e.g. by the European Commission, or The GovLab). However, not many such initiatives currently exist, and most of them are in pilot mode. Therefore, the Spanish Observatory for Big Data, Artificial Intelligence and Data Analytics (BIDA) is studying the possibility of a B2G data sharing initiative to provide policy makers with insights into climate change measures and their economic impact.

Figure 1: Spanish Observatory for Big Data, Artificial Intelligence and Data Analytics

BIDA consists of around 20 large Spanish enterprises and public bodies and is a forum for sharing AI and Big Data experiences between peers. The initiative is looking into the possibility of combining public and privately held data (duly anonymized and aggregated) of its members into a common data lake, giving access to recognized climate change experts and data scientists. We believe this would be one of the first occasions on which privately held data is shared for the common good on such a large scale. Applying AI and Machine Learning to such a unique data set has the potential to uncover so-far unknown insights about the relation between economic activities and potential measures to reduce climate change.

One of the key success factors for B2G sharing initiatives is that, from the beginning, potential final users are involved and commit to putting the system into operation if the results of a first pilot are successful. We would therefore like to take advantage of the Climate Change Summit in Madrid to invite policymakers and climate change experts to express their interest in this initiative, and to talk to experts to evaluate the opportunity it represents. Climate change experts, policymakers and data scientists can express their interest by sending an email to richard.benjamins@telefonica.com or to bida@aeca.es.

[1] https://www.weforum.org/agenda/2018/10/how-big-data-can-help-us-fight-climate-change-faster/
[2] https://www.bbva.com/en/using-big-data-fight-climate-change/
[3] https://www.mdpi.com/406212
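As an illustrative postscript on what "duly anonymized and aggregated" could mean in practice before data leaves a member's premises, here is a sketch of aggregating records per group and suppressing groups below a minimum size (a k-anonymity-style rule). The field names and the threshold are assumptions, not BIDA's actual scheme.

```python
# Illustrative only: aggregate privately held records before sharing and
# suppress any group below a minimum size (a k-anonymity-style rule).
# Field names and the threshold K are assumptions, not BIDA's actual scheme.
import pandas as pd

K = 3  # minimum group size below which a cell is suppressed

records = pd.DataFrame({
    "region":   ["Madrid", "Madrid", "Sevilla", "Sevilla", "Sevilla",
                 "Madrid", "Madrid", "Madrid", "Madrid", "Sevilla"],
    "sector":   ["transport"] * 6 + ["retail"] * 4,
    "co2_tons": [1.2, 0.8, 2.1, 1.9, 2.4, 1.1, 0.5, 0.7, 0.6, 1.4],
})

aggregated = (records.groupby(["region", "sector"])
                     .agg(n=("co2_tons", "size"),
                          total_co2=("co2_tons", "sum"))
                     .reset_index())

# Only sufficiently large groups leave the common data lake.
shareable = aggregated[aggregated["n"] >= K]
print(shareable)
```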
February 3, 2019
AI & Data
How to measure your data maturity?
Big Data and Artificial Intelligence (AI) have become very popular, and many organizations have started their data journey to become more data-driven and take automated, intelligent decisions. But a data journey is a complex journey with several intermediate stages. While it is relatively clear what the stages are and what kind of activities they comprise (illustrated in Figure 1), it is less clear how to assess the overall data maturity of an organization with respect to its goal of fuelling Analytics and AI.

Figure 1: The phases of a typical data journey towards becoming a data-driven organization

Indeed, measuring the data maturity of organizations is a multi-dimensional activity covering a wide range of areas. In this article, we provide an overview of those dimensions and how to measure progress on each of them. Figure 2 shows the dimensions, which we explain below using examples of what it means to be less or more mature.

Figure 2: The dimensions of measuring organizational data maturity

IT, platform & tools

Anyone who wants to do something with data and AI needs a platform where data is stored and accessed. Early-stage, immature organizations will likely use whatever platform is at hand, either in the Cloud or on-premise, with no particular strategy. Mature organizations will have a clear strategy for how to support all facets needed for Analytics and AI. The strategy will state whether systems run on-premise, in the Cloud or in a hybrid approach. It will describe the reference architecture for the big data software stack, APIs for accessing data in secure ways, etc. It will also cover the analytics, data visualization and data quality tools available to users across the organization. Mature organizations will have automated most of the processes needed to run the platforms and tools on a daily basis, with minimal manual intervention. Finally, mature companies have a clear budget assigned to all this, along with a data roadmap of new functionalities and new data sources to include.

Data protection

Data protection refers to the privacy and security of the organization's data. It can also be viewed as part of data governance but, due to its importance, is often considered separately. With the European GDPR, it is clear to many organizations what it means to protect the privacy of customer data. For most organizations, however, it is still a major challenge to comply with all aspects of the GDPR. Because GDPR has set the bar high, we can say that organizations that are fully GDPR compliant are mature on the data protection dimension. Data-mature organizations additionally use privacy-enhancing technologies such as encryption, anonymization and pseudonymization, and differential privacy to reduce the risk of revealing personal information. With respect to security, apart from the technological solutions for secure data storage, transfer, access and publishing, mature organizations also have a clear policy on who has access to what types of data, with special attention given to people with administrator rights who might be able to access all data and (encryption, hashing) keys.

Data governance & management

This dimension measures how well data is managed as an asset. Almost all organizations that started their data journey some time ago will recognize that one of the biggest problems is getting access to quality data and understanding what all the data fields mean. Managing data as an asset includes aspects such as having an up-to-date inventory of all data sources, a data dictionary, and a master data management solution with data quality and lineage. But it is also about processes, ownership and stewardship. Data sources typically have an owner who is responsible for the data generation, either as a consequence of an operation (e.g. payment data generated by POS devices) or through explicit data collection. A data steward takes care of the data on a daily basis in terms of availability, quality, updates, etc. Organizations that take data seriously tend to set up a "data management office" that functions as a centre of excellence to advise the different stakeholders in the organization. More advanced organizations manage not only their data but also their analytical models throughout their lifecycle. They will also consider external data, either procured or as Open Data, to increase the value potential. And the most mature organizations have a clear policy on Open Data, stating how Open Data should be managed when used (licence, liability, updates, etc.), and when, under what circumstances and under what licence private data can be published as Open Data.

Organization

The organization dimension refers to how data professionals are organized in the company. Is there a separate organization under a Chief Data Officer? How powerful is this position in terms of distance from the CEO (-1, -2, -3)? Or are the data professionals split between several organizations such as IT, Marketing and Finance? What is the function of the data team: is it a centre of excellence, or is it operational, running all data operations of the company on a daily basis? And how well are the data professionals connected to the different businesses? Is there a company-wide "data board" where data leaders and business leaders share, discuss and take decisions to align business and data priorities? Is there an initiative to "democratize" the data beyond the data professionals to the business people? How is the next layer of people involved in creating value from data?

People

The people dimension is all about how organizations go about acquiring and retaining the skills and profiles required for the data journey towards AI and Analytics. Are these treated as just some of the many profiles, or is there a special focus reflecting their scarcity in the market? If hiring is hard, are there programs for training and upskilling the workforce? How refined are the profile definitions? They should recognize the different essential profiles, including data scientists (analytics and Machine Learning), data engineers (data pre-processing and cleansing), data architects (architectural design of platforms), data "translators" (translating insights into business relevance), and AI engineers.

Business

The final dimension, enabled by all the others, is the business dimension, where the real value creation takes place. Mature organizations have a comprehensive data strategy in which they lay out their plans and objectives for the six dimensions discussed in this article, along with a clear vision of how much needs to be invested in each dimension to achieve the goals. A data-mature organization also has a clear view of what use cases are possible and what the expected benefits are. Moreover, such organizations measure the economic impact of use cases and report it consistently at the company level, so that there is a clear understanding of the value generated by the data investments. This is essential for continuing to invest in data. Finally, the most data-mature organizations are, apart from applying data and AI internally to optimize their business, also looking at new business opportunities. These could be based on insights generated from company data that are of value for other sectors and industries. For example, mobility data generated from mobile antennas, always in an anonymous and aggregated way and combined with external data, has value for the traffic management, retail and tourism sectors. But the new business opportunity could also be based on partnerships with companies from other sectors to combine data and generate differential insights. Data and AI can also be used for social good, that is, to pursue social objectives such as the Sustainable Development Goals of the UN.

How to execute a data maturity assessment?

A common way to perform a data maturity assessment is to translate each dimension into a set of questions with predefined answers ranging from 1 to 5, where 1 represents little maturity and 5 maximal maturity. This gives a questionnaire of fewer than 100 questions, which is still manageable. The questionnaire can be completed through interviews or as a self-assessment, possibly with a session afterwards where the self-assessed answers are challenged and the scores adapted. The resulting scores on each question are then aggregated per dimension, and finally into an overall data-maturity score. If done properly, avoiding tendencies to "look good", this is a powerful tool to manage the data maturity of organizations: it embodies a data-driven way to manage the data journey. It allows you to set objectives, track progress over time, prioritize data investments, and compare or benchmark different units, especially in multinational corporations.
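To make that aggregation concrete, here is a minimal sketch of how such a self-assessment could be scored: answers from 1 to 5 are averaged per dimension and then into one overall score. The scores shown are invented for illustration.

```python
# Minimal sketch of scoring a maturity self-assessment: each answer is a
# score from 1 (little maturity) to 5 (maximal maturity), averaged per
# dimension and then into one overall score. Scores here are invented.
from statistics import mean

answers = {
    "IT, platform & tools":         [3, 4, 2, 3],
    "Data protection":              [5, 4, 4],
    "Data governance & management": [2, 2, 3],
    "Organization":                 [3, 3],
    "People":                       [2, 3, 2],
    "Business":                     [4, 3, 3],
}

per_dimension = {dim: mean(scores) for dim, scores in answers.items()}
overall = mean(per_dimension.values())

for dim, score in per_dimension.items():
    print(f"{dim:30s} {score:.1f} / 5")
print(f"{'Overall data maturity':30s} {overall:.1f} / 5")
```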
November 27, 2018
AI & Data
Artificial Intelligence – Five Fears Explained
While there are many great applications of Artificial Intelligence, a disproportionate amount of attention goes to concerns about AI: robots taking over control, losing our jobs, malicious use, bias, discrimination and black-box algorithms. While we agree that some of those concerns are legitimate, others are unrealistic or not specifically related to AI. Moreover, we should not look at AI in action in isolation, but in comparison with how things happen without AI. Why are we afraid of AI? And should we be? No technology is without risk. The fear of Artificial Intelligence is partly based on legitimate concerns, and partly on movies and limited understanding.

Fear 1 is about humanity losing control to robots that will take over the world. This fear comes from science fiction movies, and from confusing narrow AI (a machine that performs one specific task very well) with general AI (a machine able to perform a wide range of tasks, and conscious). Today and in the foreseeable future, we are in the era of narrow AI. There is no need to fear that machines will take over, unless you believe in the technological singularity, or think that we humans are machines ourselves.

Fear 2 is about AI taking over our jobs by automating many tasks that are currently carried out by people. History has shown that any large technical revolution (electricity, motorised transportation) affects jobs. Some jobs will disappear, but mostly jobs will change in nature and new jobs will be created (many of those new jobs are still unknown to us). Part of this fear is legitimate for those workers whose jobs will be largely automated and who are not able to develop the skills needed for the changing and new jobs.

Fear 3 is that increasingly more decisions about people are taken or supported by AI: decisions about hiring, acceptance by insurers, granting of loans, medical diagnosis and treatment, etc. Such AI systems are trained on large data sets, and those data sets can contain undesired bias or sensitive personal data. The concern is that this might lead to discriminatory impact. Moreover, the algorithms of AI systems are sometimes black boxes, which justifies the concern that decisions are taken without people being able to understand them. Those fears are justified, and creators of AI should be aware of, and transparent about, those concerns, and do everything they can to remove them. If unsuccessful, then the AI systems should not be used for decisions that significantly impact people's lives.

Fear 4 is that sophisticated AI systems can cause huge harm in the hands of malicious people: think of AI-based cyberattacks. This is definitely true, but it is not specific to AI; it applies to any powerful technology.

Fear 5 is about losing one's privacy, caused by all kinds of apps and companies that collect massive amounts of personal data, often in a less than transparent manner. This is a well-recognised issue and one of the reasons the GDPR exists. However, this fear does not apply only to AI systems, but to most digital systems that operate with personal data.

We warn against an unfounded fear of AI: there are many more good uses than bad ones. As we can see, while two of the five fears of AI are legitimate (jobs, and discrimination/transparency), they have limited scope and solutions can be foreseen, either societal/organisational (fear 2) or technical/organisational (fear 3). Fear 1 (super-intelligence) is more a philosophical debate, as well as the topic of movies; it is not a reality. Fear 4 (malicious use) and fear 5 (privacy loss) are very real and will happen, but are not specific to AI.

It is human nature to pay attention to fear; that has contributed to putting us at the top of the evolutionary chain. But let's also not forget that AI can be used for an infinite number of good things to improve our world. Think of AI for Social Good to help achieve the UN's Sustainable Development Goals (no poverty, no hunger, peace, health, education, equality, climate, water, etc.). Sometimes we think that we humans are a great species, but we shouldn't forget that the majority of the misery, pain, destruction and wars in the world has been, and is, created by humans. We do need to watch machines becoming more intelligent and autonomous, but sometimes we could ask ourselves whether the world would be a better place if fewer humans and more machines made decisions.
October 19, 2018
AI & Data
How to fund your data journey
As of 2018, most large multinationals have started their journey to become a more data-driven organization, usually as part of their digital transformation. For most of them, it is also clear that starting a data journey requires funding: a team needs to be created and a data infrastructure needs to be available (cloud or on-premise). The first pilot projects are then selected, usually together with several of the business areas, and if a pilot is successful, it is put into production to obtain the data benefits on a structural basis. For instance, a churn pilot gets a customer dataset from the marketing department, predicts with Machine Learning techniques which customers are at risk of leaving the company, and then tries to retain them. The number of retained customers can be translated into retained revenues. Putting this into production means that the marketing dataset is provided every week or month, the algorithms are executed automatically, and the result is fed into the appropriate marketing channels to reach out to the customers. (A minimal sketch of such a churn pilot appears at the end of this post.)

At the beginning of the journey, there is usually not much discussion about who pays for what: the important thing is that things are happening and moving forward. But when the team grows and more pilots see the light and need to be put into production, questions about funding arise. Should the corporation continue to invest in the team? Should the business pay all or part? If the corporation keeps funding it, should the business be charged, and at what rate? If the work involves a third-party company, who pays for it? Moreover, multinationals are usually formed by different legal entities, and doing things for "free" is not easy to handle from a tax and anti-competition perspective.

There are no unique answers to those questions, but we can see some patterns depending on the data maturity of the organization. In general, as illustrated in Figure 1, corporate funding is available in the beginning; over time, with increasing data maturity, central funding goes down and business funding goes up. Usually, a small part of central funding remains to explore and test new, innovative technology and use cases.

Figure 1: Typical funding evolution of data initiatives

A specific application of this funding strategy is that the corporation funds the central initiative for a few years so the businesses get used to it, and from a certain decision point onwards, joint funding happens, as illustrated in Figure 2. The advantage of this joint funding model is that the corporation can still stimulate strategically relevant local investments, but the businesses also need to invest, avoiding the pitfall that "gifts" are easily accepted but not put into practice.

Figure 2: Funding starts central and, at some point in time, becomes joint funding

Looking at the different stages of a data initiative (pilot, deployment, production), there are two main models in which corporate funding diminishes over time, as illustrated in Figures 3 and 4.

Figure 3: Corporation funds pilot and deployment, business funds production

In earlier stages of the data journey, the corporation might fund the data initiative in the pilot and deployment stages, while the business takes care of funding the production part (Figure 3).
Figure 4: Corporation funds pilot, business funds deployment and production

However, it is more common for the corporation to fund only the pilot part, which will be reusable among many businesses, whereas the deployment and production parts are fully funded by the business, as they are business-specific and not reusable (Figure 4). The latter strategy is also more acceptable from a tax perspective, keeping only the Group functions at the corporation.

As a creative funding approach, the corporation can use the availability of free data assets to stimulate the business to step up its efforts in the data journey by, for instance, investing in data quality or governance, or adopting a centrally developed data model. This implies that the corporation continues to fund data initiatives, but businesses can only take advantage of them if they comply with the corporate data strategy and standards. This model is illustrated in Figure 5.

Figure 5: Stimulating local businesses to comply with a corporate data strategy

There are, however, also situations where the first funding comes from the business, and the corporation steps in at a later stage. This is the case when a leading business in the Group explores a data initiative on its own account and the result is considered a best practice. In this case, the business has funded the pilot and the deployment, and the corporation then steps in to turn the successful initiative into an asset that can be reused (deployed and put into production) by the other businesses of the Group. This situation is illustrated in Figure 6.

Figure 6: Funding starts with a leading business unit; the corporation then steps in to fund the development of a reusable asset that can be used by other businesses in the Group
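Finally, the promised sketch of the churn pilot described at the start of this post: train a model on a marketing extract and flag customers at risk of leaving. This is a hedged illustration under assumed data; the CSV file, feature names and model choice are hypothetical, not a production design.

```python
# Hedged sketch of the churn pilot described above. The CSV file, feature
# names and model choice are hypothetical, not a production design.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("marketing_customers.csv")  # assumed monthly extract
features = ["tenure_months", "monthly_spend", "support_tickets"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# In production, this scoring step would run automatically every week or
# month, feeding the at-risk customers into the retention channels.
df["churn_risk"] = model.predict_proba(df[features])[:, 1]
at_risk = df[df["churn_risk"] > 0.7]
```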
October 3, 2018
AI & Data
How to select AI and Big Data use cases?
Many organizations that start working with Big Data and Artificial Intelligence (AI) ask themselves: where to start? In general, there are two ways: 1) start building the capabilities needed to use AI and Big Data (infrastructure, data, skills, etc.), or 2) start with use cases that show the potential value to the organization. Most organizations choose the second option, since it is easier to invest in capabilities once there is a clearer understanding of the value that can be generated. But how to choose the best use case to start with?

In our experience, the best way to attack this problem is to build an opportunity matrix (also called an Ansoff matrix) with the specific Data and AI opportunities for the organization. Let's say you want to apply Robotic Process Automation (RPA) as part of your AI strategy and need to decide which process to start with. The most successful applications of RPA are on processes that are highly structured and apply to the core business. Figure 1 illustrates what such a matrix could look like, where the size of the circle represents the business value and the colour the risk involved. The ideal processes to start with would be the large, green ones in the top right.

Figure 1: Generic opportunity matrix for RPA applications (source)

But how do we apply the opportunity matrix to Data and AI use cases for organisations that want to start? Usually, the two main axes represent value (or business impact) and feasibility. Value is important because demonstrating a use case on something of only lateral importance to the business will not convince the organization to invest. Feasibility is important because the results should not come in two years, but in months: the patience of businesses for the results of new things is limited. However, other dimensions can be used for the axes (such as urgency to act); this depends on what is most important for the organisation at the time of starting. The additional dimensions (size, form and colour) should represent other important factors to consider in the decision process.

Figure 2 illustrates an opportunity matrix for big data and digital services from when Telefonica Digital started its Big Data journey back in 2012. Here we prioritized the digital services that should include big data (business intelligence) capabilities according to value and urgency. The size of the bubble represented how simple (less complex) it was to work on the topic, promising quicker results. The colour represented the risk of creating a silo solution as opposed to an integrated solution (preferred), where the data of all digital services is stored in one single big data platform.

Figure 2: Big Data opportunity matrix for Telefonica Digital in 2012

Sometimes it is hard to estimate the business value of a use case before actually executing it. A good way to estimate it is to multiply the business volume by the estimated percentage of optimization. For instance, if the churn rate of a company is 1% per month and there are about 10M customers with an ARPU (average monthly revenue per user) of €10, then the business volume amounts to €1M per month, or €12M per year. If Big Data could reduce the churn rate by 25%, that is, from 1% to 0.75%, then the estimated value would be €250,000 per month. (This calculation is worked out in the sketch at the end of this post.)

The other important dimension to estimate is the feasibility of use cases. This is a more qualitative estimation, which might differ per organization.
Basically, it estimates how easy or difficult it is to execute the use case, including factors such as the availability of data (location, ownership, cost), the quality of data, the collaboration with the business area (some are champions, others are defensive), the privacy risk, etc.

But how to get an overview of the use cases to consider for your industry? Many organisations have an initial idea of some use cases (such as upselling or churn reduction) but might lack the deeper understanding needed to come up with a more exhaustive list. Luckily, there is enough sector-based literature to help organizations with this step. For the telecommunications industry, for example, the TM Forum maintains a list of about a hundred use cases along with important characteristics such as data requirements, privacy risk, value, etc. For the insurance industry, the website at http://wisetothenew.com/ai/ provides an overview of many Artificial Intelligence use cases (see Figure 3).

Figure 3: AI use cases for the insurance industry

And then there are of course the usual research reports covering use cases for different sectors, such as the McKinsey report on Artificial Intelligence focusing on the Retail, Utilities, Manufacturing, Healthcare and Education sectors, and the PwC report "Sizing the prize". If you are looking for AI use cases in a particular sector, you can, of course, use a search engine, which will provide you with many suggestions, as illustrated in Figure 4.

Figure 4: Finding the AI use cases in your sector
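And, as promised, the back-of-the-envelope churn value estimate worked out in a few lines; the figures are the post's illustrative numbers, not real company data:

```python
# Worked version of the churn value estimate above; all figures are the
# post's illustrative numbers, not real company data.
customers   = 10_000_000
arpu_eur    = 10.0   # average monthly revenue per user
churn_rate  = 0.01   # 1% of customers leave each month
improvement = 0.25   # Big Data reduces churn by 25% (1% -> 0.75%)

business_volume = customers * churn_rate * arpu_eur  # EUR 1M per month
estimated_value = business_volume * improvement      # EUR 250,000 per month

print(f"Monthly churn volume:        EUR {business_volume:,.0f}")
print(f"Estimated value of use case: EUR {estimated_value:,.0f} per month")
```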
September 18, 2018
AI & Data
Four design principles for developing sustainable AI applications
Artificial Intelligence (AI) has been put forward as the technology that will change the world in the coming decades. Many applications have already seen the light, including content recommendation, spam filtering, search engines, voice recognition, chatbots, computer vision, handwriting recognition, machine translation, financial fraud detection, medical diagnosis, education, transport and logistics, autonomous vehicles, optimization of storage facilities, and many more. However, creating AI applications also introduces challenges, some of which come from related technologies and areas, while others are specific to AI. In order to create sustainable AI systems, several key aspects have to be considered from the beginning of the development process ("by design"), rather than being applied as an afterthought: data, security, privacy and fairness.

Data by Design

Data by Design refers to organizations treating data (collection, storage, analysis and usage) as an integral part of doing business. Many non-digital organizations that do not apply this principle suffer from typical problems such as:

- Data accessibility. Too often, data is hidden in complex IT systems and/or sits with a vendor. Getting access to the data is often costly and time-consuming.
- Data ownership. Organizations work with many service providers to deliver their end-to-end value proposition. Oftentimes, the contracts with those providers do not clearly state the ownership of the data, leading to confusion and complex conversations when the data is needed for a new value proposition.
- Data quality. When data is not managed as an asset, there are no quality procedures in place. Checking data quality as late as the analytics phase is complex and expensive; quality checks should be automated as close to the source as possible.

Organizations that fulfil the Data by Design principle have instant access to all relevant data, with sufficient quality, and are clear on the ownership of the data for the foreseen uses.

Security by Design

AI systems are powerful systems that can do much good in the hands of good people, but, consequently, they can also do much harm in the hands of bad people. Therefore, one of the key aspects of AI development is Security by Design: an approach to software and hardware development that seeks to make systems as free of vulnerabilities and impervious to attack as possible, through measures such as continuous testing, authentication safeguards and adherence to best programming practices. It puts emphasis on security risks at all phases of product development, including the development methodology itself: requirements, design, development, testing, deployment, operation and maintenance. And it is extended to third parties involved in the creation process.

Privacy by Design

AI systems are fuelled by data, and therefore another important principle is Privacy by Design, which calls for privacy and data protection to be considered throughout the whole engineering process. It was originally developed by Dr. Ann Cavoukian, Information and Privacy Commissioner of Ontario, and is based on seven principles:

1. Proactive, not reactive; preventative, not remedial
2. Privacy as the default setting
3. Privacy embedded into the design
4. Full functionality: positive-sum, not zero-sum
5. End-to-end security: full lifecycle protection
6. Visibility and transparency: keep it open
7. Respect for user privacy: keep it user-centric

Figure 1: The seven principles of Privacy by Design (source)

Fairness by Design

AI systems support us in making decisions, or make decisions on our behalf. AI and Machine Learning (a subfield of AI) have proven very effective at analyzing huge amounts of data to come up with "objective" insights, and it is those insights that help us make more objective, data-driven decisions. However, when we let Machine Learning techniques come up with those insights, we need to make sure the results are fair and explainable, especially when decisions have an impact on people's lives, such as medical diagnosis or loan granting. In particular, we need to make sure that:

- The results do not discriminate between different groups of people on the basis of race, nationality, ethnic origin, religion, gender, sexual orientation, marital status, age, disability or family responsibility. We therefore need to minimize the likelihood that the training data sets we use create or reinforce unfair bias or discrimination.
- When optimizing a machine learning algorithm for accuracy in terms of false positives and negatives, the impact on the specific domain is considered. A false positive is when the system "thinks" someone has, for example, a disease, whereas the person is healthy; a false negative is when a sick person is incorrectly diagnosed as healthy. With fewer false positives and negatives, an algorithm is more accurate; however, minimizing one usually increases the other. Depending on the domain, false positives and false negatives have different impacts, which need to be taken into account when optimizing algorithms (see the sketch at the end of this post).
- The AI system is able to explain the "logic" of why it has come to a certain decision, especially for life-impacting decisions. AI systems should not be black boxes.

When we build AI systems using these four principles (Data by Design, Security by Design, Privacy by Design and Fairness by Design), we can be more assured that we build not only performing systems but also secure, privacy-respecting and ethical systems. And this, in turn, will lead to greater acceptance of AI systems by societies and governments in the long run.
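The promised sketch of the false-positive/false-negative trade-off: moving the decision threshold reduces one error type at the cost of the other. The data is synthetic; in a real domain, the threshold should reflect the relative harm of each error (e.g. a missed diagnosis versus a false alarm).

```python
# Minimal sketch of the false-positive / false-negative trade-off on
# synthetic data: raising the decision threshold trades false positives
# for false negatives, so the choice must reflect the domain's harms.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=2000, weights=[0.8], random_state=0)
scores = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

for threshold in (0.3, 0.5, 0.7):
    tn, fp, fn, tp = confusion_matrix(y, scores >= threshold).ravel()
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```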
September 10, 2018
AI & Data
AI, Data and IT - how do they live together?
Many things related to AI and Data are about information technology (IT). Systems, platforms, development, operations, security: all are needed for creating value with AI and Data, and all traditionally sit in the realm of IT. Yet AI and Data involve specific technologies requiring particular profiles and skills. It is therefore no wonder that many organizations struggle with where to place those areas. In an earlier publication, we discussed several alternatives that businesses have to "host" their Chief Data Officers. The conclusion was that the CDO is best placed in areas that are transversal to the business and matter to the business, for example, under the Chief Operating Officer, the Chief Transformation Officer or the Chief Digital Officer. While this is an important organizational decision, wherever the CDO sits, he or she will always need to collaborate extensively with IT (usually the CIO). But what is the best relation between Data and AI on the one hand, and IT on the other?

Data and IT

Given that Big Data as an enterprise phenomenon has existed longer than AI, there is more experience with the relation between Data and IT. Therefore, we will first comment on the alternatives for the relation between Data and IT, and then bring AI into the discussion.

Figure 2. Data reports to IT, reflecting the large technological component of Data

In Figure 2, Data reports to IT, reflecting its strong technological foundations. While this might be good for starting data initiatives, since it is impossible to start with data without technology, it lacks business as a driver of data. For this reason, most organizations are not using this structure today: they recognize that the value of data needs to be driven by business needs.

Figure 3. Data and IT are independent departments reporting into different parts of the organization

Figure 3 illustrates an alternative where the Data and IT departments report into independent parts of the organization. For instance, Data might report into the marketing area, whereas IT sits under the CIO. In such organizations, the relationship is a client-provider relation. This organizational structure usually causes many problems. Data is still a relatively new area and therefore needs many interactions with IT (installing new software and libraries, modifying permissions, installing updates, etc.). Client-provider organizations function through a demand-management system with SLAs, and while that works for commodity IT programs, it does not work for (still) rapidly evolving technology, where simple things might take weeks to complete. (For this reason, Gartner introduced the Bimodal approach for IT.)

Figure 4. Data and IT are reporting to a common "boss"

In Figure 4, Data and IT both report to the same "boss". The benefit of this organizational structure is alignment and coordination by design, and in case of problems, escalation is simple, fostering quick resolution. Organizations that are not constrained by legacy decisions and structures might opt for this approach.

Figure 5. Data and IT report into different organizations, but IT has a specific area dedicated to Data, possibly with a dotted reporting line to Data

In Figure 5, Data and IT both report somewhere in the organization, but IT has reserved (ring-fenced) a dedicated group of IT people to focus on Data. To reinforce this focus, a dotted reporting line to Data can be introduced.
There are several advantages to this structure:

- The Data area will be better served by IT because, in the normal case, it doesn't have to compete with other IT priorities.
- It ensures alignment between the technology that Data is using and the strategic choices of IT.
- The Data IT people are still part of the larger IT organization, allowing for training, rotating to other interesting IT projects, etc.

The disadvantages relate to the fact that the standard, approved IT technology might sometimes not be suitable for the rapidly changing technology in the data space. Moreover, there are challenges with the teams. Do the Data IT people have Data or IT objectives? Or a mix of both? What happens if the larger IT organization is under pressure? Will it still respect the Data focus, when it doesn't necessarily see this effort reflected in its wider objectives? People in the Data IT team might also feel they have two bosses: their Data boss determining their daily work priorities, and their "administrative" IT boss who decides their bonus. If IT and Data get along well, there is no issue, but, unfortunately, in practice that is not always the case. One of the key factors in making this structure work is co-location of the Data and Data IT teams. While this doesn't solve the HR problems, it does create a sense of belonging to one team, which helps smooth out the challenges mentioned.

Figure 6. Data and IT report into different organizations, but Data has its own IT area, possibly with a dotted reporting line to IT

Figure 6 illustrates the situation where Data and IT still report into different organizational units (as in Figure 5), but now the Data team has its own IT department, possibly with a dotted reporting line to IT. This structure has the same advantages as the previous one and solves some of its challenges: notably, for the Data IT team it is now crystal clear who their boss is and who decides on their bonus. The disadvantage is the risk of a disconnect between the technology used for Data and the official IT technology standard. Moreover, it becomes harder for Data IT people to rotate to other interesting IT projects, since they are formally not part of the IT organization. The dotted line will smooth out those issues to some extent, but they still need to be managed carefully.

And what about AI?

Data is fuelling many AI applications, and what is true for the relation between Data and IT is also true for the relation between AI and IT. What we currently see in the industry, however, is that some organizations set up completely new AI departments, independent from Data departments. But since Data is a prerequisite for AI (at least for the part of AI that is based on Machine Learning), the same challenges we have discussed here will show up, along with the same alternative solutions. We wouldn't be surprised, though, to see some political battles about whether AI should be a separate department, be merged with the Data department, or maybe even absorb the Data area completely.
August 28, 2018
AI & Data
Lessons learned from The Cambridge Analytica / Facebook scandal
It has now been some time since the Cambridge Analytica / Facebook scandal was first revealed, on March 17, 2018, by The Guardian and the New York Times. Much has been written in the press about it since then. Cambridge Analytica has closed its business, Facebook lost billions in market value, and Mark Zuckerberg was summoned to appear before the US Senate and the European Parliament to answer all kinds of questions about this case and about privacy at Facebook in general. Part of the reason that the situation exploded into a scandal is that it might have influenced, in a so-far unknown way, the 2016 American elections, and also the Brexit vote. Facebook suspended the Canadian data firm AggregateIQ from its platform due to its involvement with Cambridge Analytica. Nobody knows yet whether this scandal will finally be the explosion of the privacy time bomb, with a profound impact on the data industry, or whether, some time after the storm, everybody will forget about it and life will go on as before.

Figure 1. The 2016 US Presidential elections were intertwined with Cambridge Analytica

But what exactly happened, and why has it become such a large scandal? Is what happened exceptional? Or are similar things happening all the time, but going unnoticed? In this post, we will analyse step by step what happened and then compare it to how the Obama campaign used Facebook in 2012. We leave it to the reader to make up his or her mind. The steps leading to the scandal have been amply described, so we will just summarize them here:

- In 2013, Cambridge University researcher Aleksandr Kogan and his company Global Science Research created an app that asked users questions to establish their psychological profile. The app also asked permission to access the users' Facebook information, including that of their friends. About 300,000 users reportedly agreed to use the app in return for a small economic compensation. Through those 300,000 opted-in users, Kogan got access to the information of tens of millions of users. Using data science, the 300,000 users allowed for establishing a relation between Facebook data and psychological profiles, which was then extrapolated to the tens of millions of users. Those profiles could then be used for "political advertising" or for influencing voting behavior.
- Kogan reportedly sold the profiles of those tens of millions of users to Cambridge Analytica.
- Trump's campaign team hired the services of Cambridge Analytica to launch targeted Facebook ads to influence Americans' voting behaviour against Clinton and in favour of Trump.
- In March 2018, whistleblower Christopher Wylie, a former employee of Cambridge Analytica, revealed to The Guardian and the New York Times the activity of Cambridge Analytica and how it got access to the psychological profiles of tens of millions of American citizens, which were then used for the US elections.

Two other events are important in understanding what happened and what went wrong:

- In 2014, Facebook changed its API so that the consent of individual users no longer extended to their friends. That is, a user could still give consent for apps to access his or her personal information, but not the information of their friends. However, Facebook did not apply this policy retroactively.
- In 2015, through a publication in The Guardian, Facebook learned about the Kogan/Cambridge Analytica relation for political influencing, through Senator Ted Cruz's campaign against Trump for the Republican nomination for the 2016 US elections. Based on this publication, Facebook asked Kogan and Cambridge Analytica to delete the data, as it violated Facebook's T&Cs. Facebook claimed that both Kogan and Cambridge Analytica certified that the data had been deleted.

Figure 2. Tech stocks, including Facebook's, took a hit after the scandal came to light

What did Facebook do wrong? In my opinion, Facebook made two mistakes:

1. Through their API, they gave access, after an opt-in, not only to a specific user's information, but also to the information of that user's friends. It now seems strange that one person can give permission to access the personal Facebook data of 300 other persons, even if those 300 persons are "friends". In a world where "privacy was no longer a social norm" this may have seemed normal, but now we know it is not. Notice that this phenomenon is coming back in the GDPR with the "right to data portability", as we will see later in this post.

2. When Facebook learned about the data transfer from Kogan to Cambridge Analytica through The Guardian, and asked both parties to delete all data, it did not sufficiently check whether this had been done. It was satisfied with a letter stating as much and did not require more serious measures.

Where was the violation of the law? The real violation was Kogan transferring the data to Cambridge Analytica, thereby violating the terms and conditions of the Facebook API.

Facebook and Obama's re-election in 2012

While through this scandal, and all the issues around Fake News, the use of Facebook for influencing important world events is now questioned, Obama was praised for using Facebook and social media for his re-election campaign in 2012. And while there are some differences - in the end, Obama didn't violate the law - there are many commonalities. People who wanted to contribute to Obama's re-election were encouraged to organize and/or notify all their activities by logging in on the Obama website or using Obama's campaign app, via Facebook Connect. This resulted in the person consenting to inject his or her personal Facebook data (home location, date of birth, interests, network of friends) into a central Obama campaign database, along with the personal data of all their friends. Once stored in the central Obama database, all this data was combined with other available voting data, so the campaign could send targeted political ads to people it believed could be mobilised to vote for Obama. While Obama was transparent in encouraging people to log in to the campaign website, or use the app, with Facebook Connect, it remains to be seen whether the volunteering individuals were aware of what happened to their personal data, let alone to the personal data of their friends. Obama exploited the same Facebook API as Kogan, which at that time was publicly available to any developer. Neither Obama nor the press anticipated the far-reaching impact this had on people's privacy. Another question is whether they should have realized this. But the press praised Obama for pioneering a successful digital-first presidential campaign.
But, as we have seen, while for Trump's election the data used was obtained illegally, for Obama's re-election the data was obtained in a legal way, complying with the T&Cs of Facebook's API. Both campaign teams then used the Facebook ad platform to send targeted messages to clusters of voters. But there is also a difference in how Facebook was used. Using all the profiling information, Obama sent political ads on Facebook to clusters of people who the algorithms thought could be mobilised to vote for Obama, and many messages were sent by the supporters themselves. The Trump campaign team distributed targeted stories on Facebook to mobilise potential voters, but also distributed stories to discredit the opponent, Clinton, and some of those stories were claimed to be untrue (Fake News). The table below summarizes the commonalities and differences between the use of Facebook in the Obama and Trump campaigns.

| Data | Obama campaign | Trump campaign | Comments |
|---|---|---|---|
| Consent for use in election | Through the Obama app, or login on the campaign website with Facebook Connect | No consent for this usage, only for scientific research | |
| Access to | Individual & friends' data | Individual & friends' data | Both Obama and Trump exploited Facebook's Open Graph API |
| Usage through Facebook's ad platform | Centrally designed political ads and volunteering user messages | Centrally designed political ads and, reportedly, stories to discredit Clinton | There is debate on whether Cambridge Analytica spread "Fake News" to influence the elections |

Lessons we should learn from this

As said, some consider that we are living on a privacy time bomb. Could this scandal be the bomb that changes the data industry forever? No one knows yet. On the positive side, this scandal has helped people and societies become more aware of the use of personal data for advertising; even though this particular scandal is related to political advertising, the techniques are similar for general online advertising. The lesson for people is that we must be more careful when granting consent for personal data usage to companies whose services are free: "If You're Not Paying For It, You Become The Product". Another lesson we can learn is that people should not be able to give consent for the usage of data of the people they communicate with; consent should only be considered given if both agree. This is, however, easier said than done. For example, the GDPR gives citizens a new right to "data portability", where a user can ask any of his or her service providers for a copy of personal data, or can ask to transfer (port) that data to another organization. But what happens when third parties are included in this personal data? A bank transaction always includes an origin and a destination user/organization. Likewise, a telephone communication includes a caller (the user) and a callee (the destination). Is it allowed to port data on/about the "destination"? Or, in Facebook, if I port my Facebook data to LinkedIn, should I be allowed to convert my "friends" into "connections"? Or should the destinations (receiver of a transaction, callee, friends, etc.) be anonymized? Or asked for consent? The Information Commissioner's Office of the UK gives some advice, but this is not enough in case data portability happens massively.
June 25, 2018
AI & Data
Will GDPR's "right to data portability" change the data industry forever?
After years of preparation, on May 25, 2018, the new General Data Protection Regulation (GDPR) came into force. Much has been written, discussed and speculated about it. Over the past two years, organizations have worked frenetically to understand its implications and to be prepared for this date. In this blog we will explore whether the "right to data portability" will drastically change the data industry. The GDPR is a regulation, not a directive, and this is a main difference with the European data protection directive in force until now: a directive leaves room for national interpretations, whereas a regulation applies directly, like a national law. Apart from this important change, other relevant changes relate to:

- Geography: the geographical scope is extended to all organizations that serve users in the European Union, regardless of citizenship and of where the organization is headquartered.
- Penalties: the maximum fine for breaching the GDPR is 4% of the organization's global revenue or €20 million, whichever is greater.
- Consent: organizations that want to process personal data need to obtain explicit consent (opt-in) through an easily understandable and clear text, defining and explaining the purpose of the processing.
- Data subject rights: including the right to be informed, the right of access, the right to rectification, the right to be forgotten (erasure), the right to data portability and the right to object.
- Data Protection Officers: the appointment of a DPO is mandatory for organizations whose core activities consist of operations which require regular and systematic monitoring of data subjects on a large scale, or of special (sensitive) categories of data.
- Breach notification: notifying the supervisory authority is mandatory within a maximum of 72 hours, and if the breach is likely to result in a high risk of adversely affecting individuals' rights and freedoms, the individuals must also be notified without undue delay.

In this post, we will talk about one of the lesser-known new rights of citizens, namely the right to data portability. This right has not received much attention in all the discussions around the GDPR. However, we believe that it might be a future game changer for many industries. The right to portability allows individuals to obtain a copy of their data and to reuse it for whatever purpose they see fit. They could use it just for their personal interest, or to transfer their data to a new service provider, such as an electricity or insurance company. The new service provider would then "know" the new customer from day one. Not all personal data a company has about a customer falls under the right to portability. Covered are:

- The data the customer has provided to the service provider, such as name, address, bank information, etc., and
- The "observed" data that the service provider sees based on the customer's usage of the service, such as the kWh consumed, claims made or financial transactions made.

What is not covered is any information the service provider infers about the customer. For example, a company may use a Machine Learning model that assigns a score to customers reflecting the likelihood they will leave the company. This inferred "churn score" does not fall under the right to portability (a sketch at the end of this post illustrates this split). Organizations preparing for this might have asked themselves how many customers would exercise their right to portability. This is important, since a low number of requests (in the hundreds) might be manageable manually, whereas a large number (e.g. in the tens of thousands or more) might require automation of the process, and therefore investment. There are two approaches companies have typically used to estimate the number of expected requests:

- They compare it with the right to access data, which is already a right under the current data protection directive, and assume similar volumes. In general, very few people exercise their right to access data, and most companies handle those requests manually.
- They use the (voluntary) churn rate of their customers. Customers who decide to change service provider might see a benefit in bringing their personal data to the new provider, because that makes the onboarding process easier; there is no need to fill in lots of information. Moreover, the new provider can look at the usage behavior and offer tailored services to the new customer from day one of the relation. Not all customers who churn will, however, choose to port their data. For instance, in the insurance industry, customers who have submitted many claims might want to keep that information away from their new insurance company, so that they are not penalized with a higher premium.

In most cases, those two approaches have led organizations to believe that the right to portability will not be exercised much, and therefore they have made no specific investments to prepare for massive portability requests. In the short term, those organizations are probably right and have taken the right decision. However, we think that this particular right might have a huge impact on many businesses across many sectors.

Figure 2: One of the key parts of the GDPR is the new "right to data portability".

Here is why. The right to data portability also means that users can request service providers to transfer their personal data directly to other service providers. Moreover, users can authorize third parties to file the requests on their behalf. And this is the point that might have a game-changing impact on the data industry. Imagine that Amazon reaches out to all its customers to suggest that they authorize Amazon to file a data portability request on their behalf to port their data from all their service providers (e.g. insurance, utilities, telecommunications, etc.) to Amazon. In return, Amazon promises all customers who agree a better and cheaper alternative service, and significant discounts on future purchases. If Amazon were to offer telecommunications and insurance services, then through this campaign Amazon could acquire many new customers. But more importantly, Amazon would have access to the personal data of all those users who accepted its offer and could start creating value from this data. If this happened at a massive scale, then, suddenly, the private data of the "left" service providers would have lost its uniqueness and thus would have become less differential. If we take this scenario to the extreme, we might imagine a data war between companies to gather as much personal data as possible, all in a way that is fully compliant with the GDPR. In the end, users are just exercising their right to data portability. Seen like this, it looks like a major threat for companies that are currently exploiting their proprietary data for business because it is differential data; only they have access to it. Notice, however, that it can also be seen as an opportunity.
Any organization could try to convince customers to port their data to them, thereby increasing their customer base and/or their data assets. If such a scenario happens, we think it is likely that it will be started and led by the likes of the GAFAs and/or startups. Of course, this scenario will not happen overnight. Several things need to be in place for it to become realistic. First of all, the GDPR already states that the data needs to be ported in a structured (e.g. columns and rows), commonly used (e.g. CSV) and machine-readable format. A second requirement is that data portability should be an automated process powered by APIs. This makes it similar to the PSD2 regulation (Payment Services Directive) in the financial sector, which obliges banks to open up their customer information through APIs to support so-called Open Banking. In this scenario, customers can tell their banks to give third parties access to their financial data, and those third parties can then provide them with additional value or even transactional services. Banks might see this as a major threat, but they shouldn't forget that they could charge for API usage and thus create a new revenue stream. Together, the GDPR's data portability right and PSD2 might significantly change the banking and data industries. But neither automation nor APIs are sufficient for the scenario to work. What is still needed is a standard format to interchange data; otherwise, a lot of work needs to be done on the receiving side before the data can be processed. So apart from the data being in a structured, commonly used and machine-readable format, it must also be in a standard format. Only then can ecosystems scale in a transparent way, with a possibly game-changing impact. With this in mind, there are three possible scenarios to consider:

- No standard. Each organization ports data in its own format, and receiving organizations need to build translators from the source format to the destination format. This will cause much data integration work, but on the other hand, it could start today.
- Sector standard. The organizations of a sector agree on a common sector format. For instance, all major telecommunications companies in a country could come together to agree on what data fields to interchange and what the format should be. Examples of this approach include the so-called Green Button in the utility sector in the USA: "The Green Button initiative is an industry-led effort to respond to a White House call-to-action to provide electricity customers with easy access to their energy usage data in a consumer-friendly and computer-friendly format." Another example is the so-called Blue Button for the healthcare sector, also in the USA: "The Blue Button symbol signifies that a site has functionality for customers to download health records. You can use your health data to improve your health and to have more control over your personal health information and your family's healthcare."
- Universal standard. This is a cross-sectorial approach that tries to come up with a universal standard for data portability: the Rainbow Button: "The 'Rainbow Button' project has been initiated … by 8 leading companies …, in order to define a common framework for the deployment of the 'portability right' as described in the GDPR and the guidelines to data portability provided by WG29 in April 2017." According to Fing, the organization that started the Rainbow Button initiative, "The regulators confirm that the right to data portability is at the heart of the creation of a data ecosystem and the services of tomorrow, based on new data usages initiated and controlled by the data subjects. The target is not limited to switching services (churn), but really to spark the creation of a new range of services based on data." Another important initiative promoting the same approach is "midata" in the UK.

When all these requirements have become a reality, the right to data portability will have a game-changing impact on the data industry through the creation of thriving data ecosystems, where data can flow around freely in a transparent way, always under the strict control of the users.
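To close, here is a minimal sketch of what a portability export might look like under the constraints discussed in this post: "provided" and "observed" data are included, inferred data (such as a churn score) is deliberately excluded, and the output is a structured, commonly used, machine-readable format (JSON). All field names and values are hypothetical, not any real provider's schema.

```python
# A minimal sketch (all field names hypothetical) of a GDPR-style
# data portability export.
import json

customer_record = {
    "provided": {                      # data the customer gave us
        "name": "Jane Doe",
        "address": "Example Street 1",
        "iban": "ES00 0000 0000 0000",
    },
    "observed": {                      # data generated by using the service
        "monthly_kwh": [310, 295, 340],
        "claims": [{"date": "2018-01-15", "amount_eur": 420.0}],
    },
    "inferred": {                      # model outputs: NOT portable
        "churn_score": 0.83,
    },
}

def portability_export(record):
    """Return only the data covered by the right to portability."""
    return {key: record[key] for key in ("provided", "observed")}

print(json.dumps(portability_export(customer_record), indent=2))
```

A standard format, as discussed above, would fix the keys and units of such an export across providers, so that the receiving side could process it without building a custom translator.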
June 1, 2018
AI & Data
Artificial Intelligence is disrupting the law as we know it
Years ago, science fiction movies used to seem very futuristic to us, making us feel detached from the remarkable innovations we saw from scene to scene. However, what once seemed impossible is now a reality, and we're slowly but surely learning to live with disruptive technologies. This quote perfectly encapsulates it:

Figure 1: What does AI mean for legislation? The EU is moving fast.

"humankind stands on the threshold of an era when ever more sophisticated robots, bots, androids and other manifestations of artificial intelligence ("AI") seem poised to unleash a new industrial revolution, which is likely to leave no stratum of society untouched"

This could well be the introductory "Star Wars style" text of a science fiction film, shown to the audience so they understand the context of the movie. This quote could be seen the same way:

"within the space of a few decades AI could surpass human intellectual capacity in a manner which, if not prepared for, could pose a challenge to humanity's capacity to control its own creation and, consequently, perhaps also to its capacity to be in charge of its own destiny and to ensure the survival of the species"

Well, those quotes are not from any movie, but rather from an official draft report of the Committee on Legal Affairs of the European Parliament. With the rapid changes taking place in technology and society, many of us sometimes complain about the European Commission being slow to react, with the consequence that when new regulations or laws come into force, the world has already changed again and adaptations are already needed. How long it took to get the GDPR in place is a good example: the first proposal was released in 2012, six years before it took effect in May 2018. However, this is not the case when it comes to legislation around Artificial Intelligence. The European Commission is ahead of its time in thinking about how AI and the resulting autonomous robots might impact our society. And the impact doesn't seem to be small, according to the report. For this reason, the European Parliament states that our laws need to be adapted to deal with those changes as soon as possible.

Figure 2: The European Commission is taking the impact of AI seriously.

But before we break this down, what definitions should be considered? What even is a "smart robot"? According to the Committee on Legal Affairs, a smart robot has the following characteristics:

- Acquires autonomy through sensors and/or by exchanging data with its environment (inter-connectivity), and trades and analyses data.
- Is self-learning (optional criterion).
- Has a physical support.
- Adapts its behaviours and actions to its environment.

Intuitively, this seems a very reasonable definition of a smart robot. From a legal (and above all liability) perspective, all characteristics are equally important: a smart robot can do things in the real world that have impact. From an AI perspective, the second characteristic, self-learning, is the most important. Can a robot learn things during its "life", so that at the date of shipping (delivery to society) its behavior is unpredictable? While the date that new laws governing autonomous robots and AI come into force is still far away, the Committee refers to long-existing fundamental principles to respect with regard to robots, namely Asimov's Laws from his 1942 story "Runaround": (1) A robot may not injure a human being or, through inaction, allow a human being to come to harm.
(2) A robot must obey the orders given to it by human beings, except where such orders would conflict with the First Law. (3) A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws (see "Runaround", I. Asimov, 1942). And (0) A robot may not harm humanity, or, by inaction, allow humanity to come to harm. It says a lot that rules defined in the first half of the previous century are still considered valid 75 years later, especially given the enormous industrial and technological revolutions that have taken, and are still taking, place. The Committee states that:

"until such time, if ever, that robots become or are made self-aware, Asimov's Laws must be regarded as being directed at the designers, producers and operators of robots, since those laws cannot be converted into machine code"

Liability

The Committee goes on to talk about the impact of smart robots on society: "The legal responsibility arising from a robot's harmful action becomes a crucial issue." And this is true. What happens when an autonomous robot does something harmful? Who is to blame? Or, in legal words, who is liable? Are current laws still applicable?

"once technological developments allow the possibility for robots whose degree of autonomy is higher than what is reasonably predictable at present to be developed, to propose an update of the relevant legislation in due time"

The draft report continues:

"whereas in the scenario where a robot can take autonomous decisions, the traditional rules will not suffice to activate a robot's liability, since they would not make it possible to identify the party responsible for providing compensation and to require this party to make good the damage it has caused;"

In normal language: when a robot gets into trouble and causes damage or harm, who should pay the bill, go to jail, or apologize? The conclusion is that new rules are needed to deal with those autonomous robots:

"this, in turn, makes the ordinary rules on liability insufficient and calls for new rules which focus on how a machine can be held – partly or entirely – responsible for its acts or omissions"

And:

"the current legal framework would not be sufficient to cover the damage caused by the new generation of robots, insofar as they can be equipped with adaptive and learning abilities entailing a certain degree of unpredictability in their behaviour, since these robots would autonomously learn from their own, variable experience and interact with their environment in a unique and unforeseeable manner;"

What is clear is that it will be unclear how far the creator, designer or programmer can still be held responsible for the unpredictable behavior of autonomous robots. But what kind of legal status should autonomous robots have?

Figure 3: How will liability work when robots cause harm?

"robots' autonomy raises the question of their nature in the light of the existing legal categories – of whether they should be regarded as natural persons, legal persons, animals or objects"

And:

"creating a specific legal status for robots, so that at least the most sophisticated autonomous robots could be established as having the status of electronic persons with specific rights and obligations, including that of making good any damage they may cause"

While it is unclear what legal status autonomous robots (powered by AI) should have, it is clear that current legislation is not sufficient.
In fact, one suggestion for dealing with the potential damage robots can create is: "Establishing a compulsory insurance scheme whereby, similarly to what already happens with cars, producers or owners of robots would be required to take out insurance cover for the damage potentially caused by their robots." While I agree that the massive appearance of AI-powered robots matters for liability legislation, I think that the real distinguishing factor is the second (optional) characteristic of the Committee's definition of smart robots: "is self-learning (optional criterion)". In my layman's view, the liability for a non-self-learning robot lies with the user in case of wrong use, and with the manufacturer in case of errors. Compare it with the cruise control function that most cars have today. To some extent, it has all the characteristics of a smart robot, except that it doesn't learn. If a driver falls asleep while using cruise control and causes an accident, the driver should be held liable. If the cruise control fails and, despite the driver's efforts to avoid it, causes an accident, the manufacturer should be held liable. It is the "self-learning" aspect that makes the difference.

Economic and labor impact

There is also a lot of debate going on about smart robots taking away people's jobs, and so far estimates diverge enormously. The Committee on Legal Affairs suggests that corporations should perhaps contribute social insurance tax for the employees substituted by robots or AI. So, if without robots 100 persons were needed to get a job done, and with robots or AI only 10 are needed, then the company should still pay social security for 100 employees. This is also referred to in the draft:

"possible need to introduce corporate reporting requirements on the extent and proportion of the contribution of robotics and AI to the economic results of a company for the purpose of taxation and social security contributions"

Or maybe corporations should be obliged to disclose the impact of their use of robots:

"Disclosure of use of robots and artificial intelligence by undertakings. Undertakings should be obliged to disclose: – the number of 'smart robots' they use, – the savings made in social security contributions through the use of robotics in place of human personnel, – an evaluation of the amount and proportion of the revenue of the undertaking that results from the use of robotics and artificial intelligence."

Job destruction by AI is already happening; for example, in Japan an insurance company has substituted 34 claims workers with IBM's Watson. But how does this compare with the automation that started with the Ford Model T and has increased continuously since then? Throughout the past 100 years, tasks have been increasingly automated, and that has destroyed millions of jobs. What makes it different this time? The self-learning aspect? Its massive scale? All questions for which definite answers have yet to be given.

Code of conduct

Whatever the repercussions may be, it is clear that there will be a big impact, and the Committee proposes a code of conduct to ensure, as much as possible, that there will be no threat to humanity. "The Code of Conduct invites all researchers and designers to act responsibly and with absolute consideration for the need to respect the dignity, privacy and safety of humans."
And specifically for researchers it states: "Researchers in the field of robotics should commit themselves to the highest ethical and professional conduct and abide by the following principles: Beneficence – robots should act in the best interests of humans; Non-maleficence – the doctrine of 'first, do no harm', whereby robots should not harm a human; Autonomy – the capacity to make an informed, un-coerced decision about the terms of interaction with robots; Justice – fair distribution of the benefits associated with robotics and affordability of homecare and healthcare robots in particular." Finally, to keep humanity and society safe, the Committee suggests constructing:

- a Code of Ethical Conduct for Robotics Engineers
- a License for Designers
- a License for Users

All this becomes even more important if Technological Singularity is ever reached: the idea that the invention of "Artificial Superintelligence" will suddenly cause runaway technological growth, resulting in drastic changes for human civilization. Our world is changing rapidly - are you ready?
April 19, 2017
AI & Data
Are you ready to become the CDO? Applying Analytics to the CDO role
As we have seen in an earlier post on CDOs, Chief Data Officers are becoming more common. More and more organizations understand that data is a strategic asset and an essential ingredient of their digital transformation. As a consequence, CDOs are the target of many "headhunters" or, as they call themselves, "executive search" agencies. In the last four years (2013-2016), I have been contacted many times by headhunters and have received 14 job descriptions for a CDO or similar position. The advantage of receiving CDO job descriptions is that it allows you to understand how the data world is progressing, what other large organizations are doing, and - why not - the market value of your knowledge and experience. You can learn a lot about different industries just by listening and having a conversation. The figure below clearly shows that the need for senior data professionals is increasing.

Figure 1: Number of CDO job descriptions received (by year).

CDOs are not always called Chief Data Officers. Many of the titles are made up of a small set of terms including data, analytics, director, head, etc. The figure below shows a wordcloud with the most-used terms.

Figure 2: Terms used in the title of senior data professionals.

But what kinds of qualities are companies looking for in a CDO? We have used a simple R program to parse the job descriptions and create wordclouds (a sketch of the same idea appears at the end of this post), and this has resulted in some interesting insights. If we analyze all 14 job descriptions, we get the following wordcloud.

Figure 3: Wordcloud from analyzing all 14 job descriptions.

There are no surprises: "data", "business", "analytics", "management", "team" and "experience" are among the top words used. If we now look at the same job descriptions but hide the words "data" and "analytics", we get a slightly different view, giving more importance to the other relevant accompanying words.

Figure 4: Wordcloud from analyzing all 14 job descriptions, but without the words "data" and "analytics".

Indeed, any job description for a CDO will be about data and analytics, but it is interesting to know what else is relevant for the job. Therefore, in the rest of our analysis, we will remove the words "data" and "analytics" from the wordclouds, to highlight the differences rather than the commonalities.

Consultancy firms

Some of the job descriptions were from consultancy companies, while others (the majority) came from client organizations (those hiring consultants for Big Data projects). Below you can see the wordcloud for the consultancy companies.

Figure 5: Wordcloud of job descriptions for CDO of consultancy firms.

There is much focus on clients, practice, development, relationships and consulting, which is what one would expect for consultancy companies. Now, let's look at some different industries, beginning with the financial industry. Below we see the wordcloud (again without the words "data" and "analytics").

Figure 6: Wordcloud of CDO job description for the financial sector.

"Bank", "management" and "business" are the most relevant words, but "digital", "group" and "team" are also interesting. Almost all banks (multinational groups) are working frenetically on becoming more digital, and data is a large part of that strategy. Team work is key for that. Next, we analyze the CDO job descriptions for the insurance industry:

Figure 7: Wordcloud of CDO job description for the insurance sector.

Here we see much more emphasis on "international", "leadership", "team", "global" and "demonstrated".
This may reflect the fact that many insurance companies are global companies operating in many countries across the world, which adds complexity to a data strategy. It is curious that "technology" appears only in small print. Let's have a look at the telecommunications industry:

Figure 8: Wordcloud of CDO job description for the telecommunications sector.

In the telco industry there is a lot of emphasis on business and experience, but also on technological terms like "architecture" and "systems". "Strategy" also seems to be relevant. Telcos are indeed very technological companies (complex mobile and fiber technology), and Big Data forms part of their current strategy. The publishing industry is as follows:

Figure 9: Wordcloud of CDO job description for the publishing sector.

Relevant here are words like "business", "research" and "change". Many publishing companies are active in the field of research publications, and the sector is undergoing a profound change of its traditional subscription business model. The pharmaceutical industry:

Figure 10: Wordcloud of CDO job description for the pharmaceutical sector.

The importance of the word "development" distinguishes the pharmaceutical industry from the others, probably because one of the main applications of Big Data in this industry is developing new drugs. The automotive industry:

Figure 11: Wordcloud of CDO job description for the automotive sector.

A CDO in the automotive industry is focused on serving the business from a global perspective. Performance is also relevant.

Conclusions

For almost all sectors, apart from data and analytics, business is very important. Beyond that, different sectors highlight different aspects of the CDO role, as summarized below:

- Insurance: global, team, leadership
- Telecommunications: technology and strategy
- Finance: (data) management
- Publishing: change
- Pharma: (drug) development
- Automotive: business, performance

So, for data professionals interested in leading strategy, execution and value creation with Big Data: make sure you have the right experience and knowledge. And remember, if you are a CDO and you need help transforming your organization into a more data-driven one, LUCA is here to support you.
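The wordclouds above were produced with a simple R program; as a minimal sketch of the same idea, here is a Python version that counts term frequencies after removing generic stopwords plus "data" and "analytics". The two job-description snippets are placeholders, not the real texts.

```python
# A minimal sketch of the wordcloud analysis described above (the original
# was done in R). The job_descriptions list is a placeholder.
import re
from collections import Counter

job_descriptions = [
    "The Chief Data Officer leads the data and analytics strategy ...",
    "Head of Analytics with strong business and team leadership ...",
]

STOPWORDS = {"the", "a", "and", "of", "with", "to", "in", "for",
             "data", "analytics"}  # removed to highlight the differences

words = Counter()
for text in job_descriptions:
    for token in re.findall(r"[a-z]+", text.lower()):
        if token not in STOPWORDS:
            words[token] += 1

# The top terms are what a wordcloud would display most prominently.
print(words.most_common(10))
```

A wordcloud is just a visual rendering of these frequencies, so the same counts feed directly into any plotting step.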
April 7, 2017
AI & Data
How these 4 sports are using Data Science
Thousands of companies around the world may have started their journey to become data-driven, harnessing the full potential of Big Data; the world of professional sports, however, is only just starting to explore how Data Science can provide a competitive advantage. Until now, sports coaches have been able to lean on their experience or their gut feelings when making decisions, and have therefore been somewhat resistant to the world of Big Data - something we all saw so perfectly illustrated in Moneyball, where Brad Pitt's character embodies the tension between human experience and data-driven decision-making. However, things are changing: slowly but surely we're starting to see a lot more research on the role of data in sport, as well as an increasing number of jobs working directly with professional sports teams to enhance their performance. But which sports are leading the way? We took a look:

1. Formula 1

As our CDO, Chema Alonso, mentioned the other day in a talk with the Movistar cycling team, Formula 1 teams are pioneers when it comes to data-driven decisions. With every race generating huge amounts of data on the track, vehicles, conditions and drivers, Williams saw a unique opportunity. They optimized team pit stops by taking biometric measurements from the technical team, allowing them to understand when each team member functions optimally. Eventually, they reduced their pit-stop time to 1.92 seconds - the fastest ever recorded.

Figure 1: Formula 1 team

2. Football

Some years ago, we obtained data from the Spanish football league for the 2012-2013 season, allowing our Data Scientists to carry out an in-depth analysis. The data was generated by cameras that take up to 10 photos per second and are post-processed so that individual players can be identified. In the figures below you can see heatmaps of Barcelona vs Atletico Madrid. The area represents the field, with each team's goal located at the pointed end with the darkest colours. The darker the color, the longer the players spent at a certain location. It becomes immediately clear that Barcelona were more of an attacking team throughout that season, unlike Atletico, who tended to have a more defensive approach.

Figure 2: Barcelona's pitch activity (left) vs Atletico Madrid's pitch activity (right).

It was also possible to follow individual players, and in the images below we can see the paths of two players throughout a match. The green points show where the player ran at approximately 5 m/s (the equivalent of running 100m in 20 seconds) and the red points at approximately 7 m/s. It is clear that the first player runs much more than the second, but what does that mean? That the first player is better than the second? That they have different roles? Looking only at this data, if you were the trainer, which player would you prefer to buy?

Figure 3: The "work rate" of Xavi Hernandez (left) vs Leo Messi (right).

Well, the first player is midfielder Xavi Hernandez, and the second player is Leo Messi, who needs no further introduction.

3. Cycling

More recently, we had the opportunity to analyze data from the 2016 "Vuelta a España", looking at Movistar Team's performance. We had access to the data of 8 cyclists from the team throughout the 21 stages, from start to finish. Every second, 7 types of data are captured for each cyclist, resulting in more than 2 million data points. The variables captured include location, altitude, force, speed, heart rate and pedal rate.
Figure 4: The Movistar Team looking at their Big Data.

With this data, apart from analyzing individual cyclists, it becomes possible to analyze how the team works together, and to understand and compare different stages. Looking at the data, it becomes very evident that professional cycling is a team sport with differentiated roles for the different team members: today it is impossible to win one of the main competitions "flying solo". What we have learned is that it is important to:

- Understand when team members peak in terms of performance, so that training can be planned for peaks to coincide with competitions.
- Determine the context variables (altitude, weather), the training variables and the personal cyclist variables which have the biggest impact on the cyclist's performance and subjective experience.
- Combine the roles that cyclists play in the different stages with performance and fatigue variables, to plan the cyclists' recovery and the next stages during the competition.

(A short sketch of how such telemetry can be summarized follows at the end of this post.)

4. Cricket

Cricket, the most popular sport in India and the second most popular sport in the world, is also embracing the growing value of Big Data. IBM launched their #ScorewithData campaign during the Cricket World Cup, which included a Social Sentiment Index that correctly predicted who would win certain phases of the tournament. The England cricket team have also been pioneers, and their ex-coach, Peter Moores, even said: "we use advanced data analytics as the sole basis for some of our decisions – even affecting who we select for the team." Nathan Leamon, hired by the head coach for his expertise in maths and statistics, used Hawk-Eye data and spreadsheets to run match simulations that ended up being accurate to within 5%, breaking the field up into different segments for players to target when batting.

Figure 5: Big Data in the world of cricket.

As you can see, Big Data and Data Science aren't just limited to the world of big business - they are in fact affecting every single part of our lives. In the context of sport, the most successful will embrace data on and off the field if they want to fill up their trophy cabinets any time soon.
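As promised above, here is a minimal sketch of how per-second cycling telemetry of the kind described in the cycling section can be rolled up into per-stage, per-cyclist summaries. All column names and values are hypothetical, not the actual Movistar data schema.

```python
# A minimal sketch (column names hypothetical) of aggregating per-second
# telemetry into per-stage, per-cyclist summaries.
import pandas as pd

telemetry = pd.DataFrame({
    "cyclist":    ["A", "A", "A", "B", "B", "B"],
    "stage":      [1,   1,   2,   1,   1,   2],
    "speed_ms":   [11.2, 12.5, 9.8, 10.9, 11.7, 10.1],  # metres/second
    "heart_rate": [152, 160, 171, 148, 155, 167],       # beats/minute
    "altitude_m": [620, 640, 1180, 620, 640, 1180],
})

summary = telemetry.groupby(["cyclist", "stage"]).agg(
    avg_speed_ms=("speed_ms", "mean"),
    max_heart_rate=("heart_rate", "max"),
    climb_m=("altitude_m", lambda a: a.max() - a.min()),
)
print(summary)
```

The same grouping logic extends naturally to the comparisons mentioned above: grouping by stage instead of cyclist compares stages, and adding fatigue or weather columns supports the recovery planning described in the list.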
February 15, 2017
AI & Data
Artificial Intelligence vs Cognitive Computing: What's the difference?
The media hype around Artificial Intelligence (AI) and Cognitive Computing is unquestionable at the moment. They seem to appear everywhere on the Internet, in the press, blogs, conferences and events, and many companies and startups are now moving towards offering AI or Cognitive solutions. Google shows 44 million hits for AI and 9 million for Cognitive Computing, and the figure below from Google Trends clearly shows that the search term "Artificial Intelligence" is more popular than "Cognitive Computing"; however, I'm sure we'll start to see that gap close in 2017. In our white paper "Surviving in the AI hype", we explained some of the fundamental concepts behind AI, as well as touching on Cognitive Science and Computing, but in this post we want to focus in more detail on the relationship between AI and Cognitive Computing specifically.

Figure 2: Google Trends for Artificial Intelligence (red) and Cognitive Computing (blue).

To start off, what do intelligence and cognition mean if we search for a definition online?

Intelligence: "the ability to learn or understand or to deal with new or trying situations : reason; also : the skilled use of reason (2) : the ability to apply knowledge to manipulate one's environment or to think abstractly as measured by objective criteria (as tests)."

Cognition: "the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses."

The term Cognitive Computing has been popularized by IBM through its Watson program, and we think IBM deserves a lot of credit for that. But what exactly are the differences between Artificial Intelligence and Cognitive Computing? How are they related? Or are they synonyms? The best explanation we have seen comes from the late Herb Simon, one of the early gurus of AI:

"AI can have two purposes. One is to use the power of computers to augment human thinking, just as we use motors to augment human or horse power. Robotics and expert systems are major branches of that. The other is to use a computer's artificial intelligence to understand how humans think. In a humanoid way. If you test your programs not merely by what they can accomplish, but how they accomplish it, then you're really doing cognitive science; you're using AI to understand the human mind."

Stated in other words: AI is about making computers solve complex problems that, if people solved them, would require intelligence - it is the result that counts. Cognitive Science is about making computers solve complex problems in a way similar to how humans solve them - it is the process that counts. So Cognitive Computing aims to mimic human reasoning behavior. Deep Blue is a great example. In 1997, for the first time in history, this IBM computer program beat the world chess champion, Garry Kasparov. The main reason Deep Blue was able to win was pure brute force: it was capable of evaluating 200 million chess positions per second and of looking up to 20 moves ahead, something no human is able to do. So, is Deep Blue AI? Yes, because it solves a complex task, even better than the best human. Is Deep Blue Cognitive Computing? No, because its reasoning process has little to do with how humans play chess. So, what does it mean to "mimic human problem solving" or to "mimic the human brain"? Actually, there are different levels of "mimicking", and here again we see the distinction between symbolic and non-symbolic (or connectionist) AI (as you can see in our white paper).
Originally, symbolic AI tried to mimic logical human problem-solving, while connectionist AI tried to mimic the brain's hardware, as Deep Learning does today. So some symbolic AI may be cognitive computing, if it mimics human problem solving, e.g. through rule-based systems. But not all of it (e.g. Deep Blue). Where does that leave us? Today the terms AI and Cognitive Computing are used as synonyms. We believe that whether a company calls its product AI or Cognitive Computing is a marketing decision. Everybody is offering AI solutions right now, and it becomes increasingly difficult to be perceived as differential. Cognitive Computing is an attractive alternative, as it alludes to the same underlying mystery (human intelligence), but the space is much less crowded.
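As a footnote to the Deep Blue example above, a rough back-of-the-envelope calculation shows why even 200 million evaluations per second is nowhere near enough to search 20 moves ahead exhaustively; the branching factor of ~35 legal moves per chess position is a standard estimate, not a figure from this post, and Deep Blue in fact relied on pruning on top of raw speed.

```python
# A rough back-of-the-envelope calculation (assuming a standard chess
# branching factor of ~35): how long an exhaustive search to depth d
# would take at 200 million positions per second.
BRANCHING = 35
SPEED = 200_000_000  # positions evaluated per second

for depth in (5, 10, 20):
    positions = BRANCHING ** depth
    seconds = positions / SPEED
    print(f"depth {depth:2d}: ~{positions:.1e} positions, "
          f"~{seconds / (3600 * 24 * 365):.1e} years")
```

At depth 20 the naive count runs to around 10^31 positions, which is why chess engines combine raw speed with alpha-beta pruning and evaluation heuristics rather than enumerating everything; even so, the process remains very unlike how a human grandmaster thinks.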
February 13, 2017
AI & Data
Big Data: What's the economic value of it?
How do we put an economic value on Big Data initiatives in our organizations? How can we measure the impact of such projects on our businesses? How can we convince senior leadership to continue and increase their investment? Today on our blog we share our perspective. Most of us who are familiar with the Big Data boom are also familiar with the big and bold promises made about its value for our economies and society. For example, McKinsey estimated in 2011 that Big Data would bring $300bn in value for healthcare, €250bn for the European public sector and $800bn for global personal location data. More recently, McKinsey also published an estimation of what percentage of that originally identified value had become a reality as of December 2016: up to 30%, with an exception of 50-60% for location-based data. These astronomic numbers have convinced, and are still convincing, many organizations to start their Big Data journey. In fact, Forbes and IDC have recently estimated that the market value of Big Data and Analytics technology will grow from $130B in 2016 to $203B in 2020. However, these sky-high numbers do not tell individual companies and institutions how to measure the value they generate with their own Big Data initiatives. Many organizations are struggling to put an economic value on their Big Data investments, which is one of the main reasons why so many initiatives do not reach the ambitious goals they once set. So how can we put numbers on Big Data and Analytics initiatives? From our experience here at LUCA, there are four main sources of economic value.

Reducing costs with Big Data IT infrastructure

There are considerable savings to be made on IT infrastructure by moving from proprietary software to open source. The traditional model of IT providers of Data Warehouses is to charge a license fee for the software and to charge separately for the professional services needed. Some solutions, in addition, come with specific hardware. Before the age of Big Data this model worked well, but with the increasing amount of data (much of it unstructured and real-time), existing solutions have become prohibitively expensive. This, in combination with so-called "vendor lock-in" (due to committed investments and complexity, it becomes very costly and hard to change to another vendor's solution), has forced many organizations to look for alternative, more economical solutions. The most popular alternative is now provided by the open source Hadoop ecosystem of tools for managing Big Data. Open source software has no license cost, and is therefore very attractive. However, in order to take advantage of open source solutions for Big Data, organizations need to have the appropriate skill set and experience available, either in-house or outsourced. The Hadoop ecosystem software runs on commodity hardware and scales linearly, and is therefore much more cost-effective. For those reasons, many organizations have substituted part of their proprietary data infrastructure with open source, potentially saving millions of euros annually. While saving on IT doesn't give you the largest economic value, it is relatively easy to measure in the Total Cost of Ownership (TCO) of your data infrastructure, and it is therefore a good strategy to start with.

Optimization of your business

There is no questioning that Big Data and Analytics can improve your core business. There are two ways to achieve such economic benefits: by generating additional revenues or by reducing costs.
Generating additional revenues means doing more with the same, or in other words, using Big Data to generate more revenues. The problem with this is that it is not easy to decide where to start, and it can be hard to work out how to measure the "more". Reducing costs means doing the same with less, or in other words, using Big Data to make business processes more efficient while maintaining the same results.
Figure 2: Measuring the monetary value of Big Data in different areas of your organization.
External Data Monetization
Here, the economic value of Big Data is not generated from optimizing your business, but from new, data-centric business. This is only for organizations that have reached a certain level of maturity in Big Data. Once organizations are ready to materialize the benefits of Big Data to optimize their business, they can start looking to create new business around data, either by creating new data value propositions, i.e. new products where data is at the heart, or by creating insights from Big Data that help other organizations optimize their business. In this case, measuring the economic value of Big Data is no different from launching new products in the market and managing their P&L. We believe that in the coming three to five years, the lion's share of the value of Big Data will come from business optimization, that is, from turning companies and institutions into data-driven organizations that take data-driven decisions. And those are the kind of Big Data initiatives that organizations struggle to put an economic value on. Savings from IT are a good starting point, but will not scale with the business, while revenues from data monetization will become huge in the future, but are currently still modest compared to the potential value that can be generated from business optimization. Most businesses start their Big Data journey the right way. They make an opportunity-feasibility matrix, which plots the value of a use case against how feasible it is to realize that value. Figure 3 shows an example from EMC; the use cases to select would be those in the upper right quadrant.
Figure 3: Opportunity Matrix for Big Data Use Cases - value versus feasibility.
A good way to estimate the business value of a use case is to multiply the business volume by the estimated percentage of optimization. For instance, if the churn rate of a company is 1% (per month) and there are about 10M customers with an ARPU (average monthly revenue per user) of €10, then the business volume at risk amounts to €1M per month, or €12M per year. If Big Data could reduce the churn rate by 25%, that is, from 1% to 0.75%, then the estimated value would be €250,000 per month. As an example of a cost-saving use case, consider procurement. Suppose an organization spends €100M on procurement every year. Analytics might lead to a 0.5% optimization, which would amount to a potential value of €500,000 per year. There are hundreds of Big Data use cases, and the TM Forum gives an extensive overview of some of the most relevant ones in the telecommunications sector. However, once the initial use cases have been selected, how should you measure the benefits? This is all about comparing the situation before and after, measuring the difference, and knowing how to extrapolate its value if it were applied as business as usual.
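As a quick aside before turning to measurement: the estimation arithmetic above is trivial to express in code. Here is a minimal sketch in Python (our own illustration; the function and figures simply restate the churn and procurement examples):

```python
def use_case_value(business_volume: float, optimization_pct: float) -> float:
    """Estimated value = business volume at stake x expected % of optimization."""
    return business_volume * optimization_pct

# Churn example: 10M customers, 1% monthly churn, €10 ARPU.
customers, churn_rate, arpu = 10_000_000, 0.01, 10.0
volume_at_risk = customers * churn_rate * arpu          # €1M of revenue at risk per month
churn_value = use_case_value(volume_at_risk, 0.25)      # 25% churn reduction -> €250,000/month

# Procurement example: €100M annual spend, 0.5% optimization.
procurement_value = use_case_value(100_000_000, 0.005)  # €500,000/year

print(f"Churn use case: €{churn_value:,.0f} per month")
print(f"Procurement use case: €{procurement_value:,.0f} per year")
```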
Over the years, we have learned that there are two main issues that make it hard to measure and disseminate the economic impact of Big Data in an organization:
1. Big Data is almost never the only reason for an improvement. Other business areas will be involved, and it then becomes hard to decide how much value to assign to Big Data.
2. Telling the whole organization and top management about the results obtained. Giving exposure to the value of Big Data is fundamental for raising awareness and creating a data-driven culture in your company.
With regards to point 1, Big Data is almost never the only reason for creating value. Let's consider the churn use case, and assume you use Analytics to better identify which customers are most likely to leave in the next month. Once the customers have been identified, other parts of the company need to define a retention campaign, and yet another department executes the campaign, e.g. by calling the top 3,000 people at risk. Once the campaign is done and the results are in, it is hard to decide whether the results, or what part of them, are due to Analytics, to the retention offer, or to the execution through the call centres. There are two ways to deal with this issue:
Start with use cases that have never been done before. An example of such a use case would be real-time, contextual campaigns. Real-time campaigns are not yet frequently used in many industries, and require Big Data technology. Imagine you are a mobile customer with a data tariff, watching a video. The use case is to detect in real-time that you are watching a video and that you have almost reached the limit of your data bundle. What usually happens in those cases is that you are either throttled or completely cut off from the Internet. Either situation results in a bad customer experience. In the new situation, you receive a message in real-time telling you that your bundle is running out and asking whether you want to buy an extra 500MB for €2. If you accept the offer, the service is provisioned in real-time and you are able to continue watching your video. The value of this use case is easy to calculate: simply take the number of customers that have accepted the offer and multiply it by the price charged to the customer. Since there is no previous experience with this use case, few people will dispute that the value is due to Big Data and Analytics.
Compare with what would happen if you didn't use Analytics. The second solution is a bit more complex, but applies more often than the previous case. Let's get back to the churn example. It is unlikely that an organization has never done anything about retention, either in a basic or a more sophisticated way. So, when you run your Analytics initiative to identify customers that are likely to leave the company, and you get a good result, you can't just say that it is all due to Analytics. You need to compare it with what would have happened without Analytics, all other things being equal. This requires using control groups. When you select a target customer set for your campaign, you should reserve a small, random part of this set as a control group, treated exactly the same as the target customers but without the Analytics part. If you do so, then any statistically significant difference between the target set and the control group can be assigned to the influence of Analytics.
For instance, if with this you retain 2% more customers than the control group, you can then calculate how much revenue you would retain annually if the retention campaign were run every month. Some companies are able to run control groups for every single campaign, and so are always able to calculate the "uplift", and thus continuously report the economic value that can be assigned to Analytics. However, most companies will only use control groups in the beginning to make and confirm the case; once confirmed, they consider it business as usual (BAU), and a new baseline has been created.
Figure 4: Sharing the impact of Big Data in your organization is fundamental.
With regards to point 2, sharing the results of Big Data within the organization in the right way is fundamental. It is our experience that while business owners love Analytics for the additional revenues or cost reductions, at first they are not always willing to tell the rest of the organization about it. But evangelizing the success of internal Big Data projects across the organization is critical to get top management on board and to change the culture. Why would individual business owners hesitate to share? The reason is as simple as it is human. Showing the wider organization that using Big Data and Analytics creates additional revenue makes some business owners worry about being given higher targets without more resources (apart from Big Data). Similarly, other business owners might not want to share a cost saving of 5%, since it might reduce their next budget accordingly. Haven't they shown - through Big Data - that they can achieve the same goals with less? This is an example of a cultural challenge. Luckily, such a situation is not sustainable for long, and in the end all organizations get used to publishing the value. But it can be a problem especially at the beginning of the Big Data journey, when such economic numbers are most needed. For those organizations that in the end do not succeed in measuring any concrete economic impact, don't worry too much either. Experience teaches us that, whereas organizations in the early phase of their journey are obsessed with measuring value, more mature organizations know that there is value and no longer feel the need to measure improvements. Taking full advantage of Big Data has changed the way departments interact, and that is one of the main value drivers. Big Data has become fully integrated with Business As Usual. Big Data = BAU.
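As a closing technical note on the control-group approach described above, here is a minimal sketch (our own, not a LUCA tool) of how the uplift between a target group and a control group can be tested for statistical significance, assuming a standard two-proportion z-test and made-up campaign numbers:

```python
from math import sqrt
from statistics import NormalDist

def uplift_significance(retained_target: int, n_target: int,
                        retained_control: int, n_control: int) -> tuple[float, float]:
    """Two-proportion z-test: is the retention uplift attributable to Analytics?"""
    p_t, p_c = retained_target / n_target, retained_control / n_control
    p_pool = (retained_target + retained_control) / (n_target + n_control)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_target + 1 / n_control))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test
    return p_t - p_c, p_value

# Hypothetical campaign: 3,000 targeted customers vs. a 1,000-customer control group.
uplift, p = uplift_significance(retained_target=2460, n_target=3000,
                                retained_control=790, n_control=1000)
print(f"Uplift: {uplift:.1%}, p-value: {p:.4f}")  # significant -> assign the value to Analytics
```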
January 30, 2017
AI & Data
54% of organizations now have Chief Data Officers, but should mine?
With Big Data becoming such a big deal in the world of business, it is no surprise that the Chief Data Officer (CDO) has managed to wriggle its way into an extra seat around the boardroom table. Increasingly, organizations in both the private and public sector consider data to be a strategic asset, and for this reason the most forward-thinking companies are appointing CDOs. In fact, according to this survey, 54% of firms now report having appointed a CDO, up from just 12% in 2012. Until the appearance of this new role, Business Intelligence (BI) and Big Data initiatives had often been dispersed throughout organizations, working in isolated departments - even if there was supposedly a central BI department keeping tabs on the overall company data strategy. So, what kind of questions will an organization be asking itself ahead of appointing a CDO? We thought of a few: How far should the CDO be from the CEO: CEO-1 or CEO-n? If it is CEO-1, how does the CDO relate to the other officers, in particular the CIO and CTO? If it is CEO-n, to which officer should the CDO report? The CIO, COO, CMO, CFO, the Chief Transformation Officer, or the Chief Digital Officer? To leverage the full potential of data, the CDO is best placed in an area whose mission is cross-company and that represents a large chunk of the business. In this way, the value creation is not limited to one specific area (e.g. marketing), and the value is relevant for the whole business. Doing otherwise creates a bias towards creating value from data only in a specific area, or in an area that doesn't really matter. Therefore, many argue that the best place for the CDO is at CEO-1, or at CEO-2 under the COO, which is cross-company. Having the CDO report directly to the CEO gets him or her a seat on the Executive Committee, which delivers a strong message both internally and externally. There are two alternative officers who also ensure cross-organizational application and relevance: the Chief Transformation Officer and the Chief Digital Officer. While those two roles are by nature temporary (albeit lasting several years), they work in a cross-organizational manner and are tasked with the mission of adapting their business to the digital world, of which data is a pivotal part. Of course, having the CDO report directly to the CEO is not necessarily suitable for all organizations at all times. It requires a level of "data literacy", and is likely to be reserved for the more forward-looking organizations that really know and embrace the fact that they have to adapt to the digital world in a data-driven way. So why might organizations not yet want a CEO-1 position for the CDO? Some companies may be too immature from a data perspective (i.e. not fully data-literate) and therefore might want to place the CDO under the CIO with IT, to make sure that there is sufficient quality data before starting to exploit it. Some organizations have a very clear idea of where to start exploiting data, so they place the CDO under the corresponding department. For example, companies in sectors such as FMCG with a strong interest in improving their consumer marketing might place the CDO under the CMO. Those who want to innovate with data might even place it under the CTO (R&D), whilst organizations that want to save money might place it under the Global Resources Officer. In general, if the CDO is placed within a specific area, it normally implies that the CDO inherits some of the objectives of that area.
If it is under Marketing, then objectives will probably be phrased in terms of sales or revenues. If it is under Global Resources, then they will likely be related to savings. Helping areas outside of the CDO's specific area then becomes a best-effort endeavour rather than a core responsibility, depending on the bandwidth of the CDO's area. Experience teaches us that it is challenging to see this kind of cooperation extend beyond the day-to-day corporate limits of KPIs. So, if an organization decides to place the CDO under one of the officers without a cross-organizational responsibility, it creates an unnecessary limitation on value creation from data. But why then are most CDOs not CEO-1, but -2 or sometimes even CEO-3 or -4? Below, we briefly list the pros and cons of why an organization might do it this way:
Figure 2: The pros and cons of a CDO's position in the org chart.
Of course, whether a CDO is successful in his or her job does not depend only on where the role is placed in the organization, but it is an important factor. Other relevant factors, such as business sponsorship or a lack of clarity about the role, are discussed in this article. In Telefónica, the CDO function was introduced to the Executive Committee at the end of 2015 and is currently held by Chema Alonso. Five years ago the role sat between CEO-5 and -4; three years ago it became -3, two years ago -2, and now it is CEO-1 - showing just how fundamental data is in our strategy going forwards in our quest to put customers at the center of everything we do.
Figure 3: Chema Alonso, CDO of Telefonica.
Of course, this discussion is much more relevant for organizations that are on their journey to becoming data-driven. However, there are many companies that are already data companies (i.e. their business is the data), and in their case the CDO has very different requirements. Gartner wrote a report on the four types of Chief Data Officer organizations, highlighting that in data companies the CDO is even more critical. We think that in such companies the CDO might even be the CEO. We may not know what the future holds for big corporates, but we do know that it will be driven by data.
January 18, 2017
AI & Data
LUCA's first 70 days
We are almost at the end of 2016, which has been a big year for the world of Big Data. One important milestone was the creation of LUCA, Telefonica’s new Data Unit, which focuses on developing Big Data solutions for private and public sector organizations. Our launch event took place on October 20th, and you can see an overview of it here. LUCA (Last Universal Common Ancestor) has been warmly welcomed by the press, industry analysts and, most importantly, our clients. Analyst firms such as Gartner, IDC, OVUM and 451 said that Telefonica's strategic move with LUCA puts it well ahead of its competitors. Since our launch, the LUCA team has worked frenetically to set up the business in the different countries where we operate, and LUCA is now launched worldwide in Germany (through Telefonica Next), the UK, Brazil, Colombia, Peru, Chile and Argentina. We have also regularly shared our views on the world of Big Data and Artificial Intelligence on our blog "Data Speaks". In our first 70 days of existence, we have shared various types of content, as you can see below:
Proofs of Concept (Demos): Our Data Scientists have shown how mobile data can be combined with alternative data sources (e.g. Open Data) to give actionable insights to private and public sector decision makers. These included our Global Comms demo, the Twitter experiment with Hue lamps on the US elections, and our Mobility Insights posts.
Data Science and Data Scientists: We carried out interviews with our Data Scientists, talking about the 4 requirements to become one, and also shared our prediction of the 5 most in-demand Big Data profiles in 2017.
Big Data for Social Good: We've also been sharing our views on how Big Data can be used to have a social impact, posting about the challenges and opportunities of using Big Data to help achieve the UN’s Sustainable Development Goals. You can read about the 6 challenges of Big Data for Social Good, or how mobile phones can save lives.
Open Data: We also wrote posts discussing the opportunities and challenges of Open Data, as well as some proofs of concept using Open Data. For example, we looked at how Open Data can improve air quality, help commuting, or show the humanitarian impact of a war.
Artificial Intelligence: Given the latest hype around AI, we decided to share a series of posts to help our readers understand the basics, as well as posting a White Paper on the topic. Examples include Chatbots, Big Dating, and our series on AI concepts.
Data Events: We let you know about the events we attended, sharing the highlights of key Big Data conferences all around the world. Examples included LUCA at Big Data Spain, Big Data Castilla y León and Telco Data Analytics Europe.
Our people: We also shared more creative posts, such as our Mannequin Challenge and our Christmas Lip Sync.
LUCA News: We posted news from our different offices about our launches and partnerships, including an overview of our partnership with CARTO, our Smart Energy and Smart Steps products, and our antifraud applications.
Our top 3 posts were our launch in Peru, our Mannequin Challenge and our proof of concept on commuter traffic in Madrid. We'll continue to post on these topics and much more in 2017, so make sure you don't miss out by subscribing here. Our goal is to bring you the best content possible in a timely fashion, sharing insights on cutting-edge technology and hands-on data science with the widest range of data sources possible.
Since we started working towards the launch of LUCA in the summer of 2016, we have participated in 22 external events, giving talks or taking part in panels. Most of the events were related to Big Data, and some to Artificial Intelligence, as you can see here. We also carried out several proofs of concept in our Data Science team, including Global Comms, a US elections analysis using Hue lamps, Global Rider, and a Commuting & Pollution analysis - all of which were published on our blog. We will deliver 12 new proofs of concept in 2017. Last week we also published our first white paper, on the fundamentals of Artificial Intelligence, and we'll be sharing plenty more throughout 2017 to give the latest insights on the world of Big Data and AI. Over the past few months, we have become obsessed with data, measuring as much as we can, in line with our philosophy of making data-driven decisions. We are real fans of Lean Analytics, and we embed analytics in everything we do. We are not afraid to recognize our weaker points (“the KPIs that hurt”) and work hard to improve them, listening carefully to the data. We will also significantly progress our projects on Big Data for Social Good, so watch out for our announcement at the Mobile World Congress in Barcelona in February. We will also be making considerable progress as partners in the OPAL project, as well as applying Big Data to digital education in the Profuturo initiative. We are also working to create what we like to call an extended organization, in the sense of Salim Ismail’s Exponential Organizations. There are many people, foundations and companies who are willing to contribute to help achieve the UN’s Sustainable Development Goals with data, so we believe in bringing those individuals together. We will be working with several universities around the world, including the Pontificia University of Salamanca in Spain and the ESAN Graduate School of Business in Peru. If you are interested in collaborating with us for the greater good, please contact us at hello@luca-d3.com. To give our activities in Big Data for Social Good a boost, we will start a Talentum Lab where 10 recently graduated students will work enthusiastically to accelerate our projects that harness the power of data for social impact. However, at LUCA we also think it's important to have fun whilst working hard. We came together with the rest of our CDO area, collaborating with different teams to work as one big family. For example, we filmed our very own Mannequin Challenge just over a month ago: Video 1: Some of our team at Distrito Telefónica taking part in a Mannequin Challenge. After the success of this viral challenge, with more than 6,500 views, we decided to spread the Christmas spirit by doing a Global Christmas Lip Sync, in which Telefónica colleagues from more than 7 countries participated: Video 2: Telefónica employees all over the world wish you a Merry Christmas. As you can see, it's been a busy 70 days for the LUCA team, and now we're ready to make 2017 a success. We'll be bringing exciting new products and services to our customers, driving forward our Big Data for Social Good initiative and creating the most compelling content possible to make sure data is being used in the most innovative ways. We'll be taking the most data-driven approach possible to our work, ensuring the best experience for our users and clients - so watch this space and keep in touch with us on our blog or website.
December 30, 2016
AI & Data
Can machines think? Or are humans machines?
This is the last in a series of three posts about some fundamental notions of AI. The objective of the series is to equip readers with sufficient understanding of where AI comes from, so they can apply their own criteria when reading about the AI hype. If you missed either of the two previous posts, you can read the first one, about what Artificial Intelligence is, here, and the second one, on how "intelligent" Artificial Intelligence can get, here.
Symbolic vs non-symbolic AI
This dimension for understanding AI refers to how a computer program reaches its conclusions. Symbolic AI refers to the fact that all steps are based on "symbolic", human-readable representations of the problem, which use logic and search to solve problems. Expert Systems are a typical example of symbolic AI, as the knowledge is encoded in IF-THEN rules which are understandable by people. NLP systems which use grammars to parse language are also symbolic AI systems; here the symbolic representation is the grammar of the language. The main advantage of symbolic AI is that the reasoning process can be understood by people, which is a very important factor for making important decisions. A symbolic AI program can explain why a certain conclusion is reached and what the intermediate reasoning steps have been. This is key for AI systems that give advice on medical diagnosis; if doctors cannot understand why an AI system comes to its conclusion, it is harder for them to accept the advice. Non-symbolic AI systems do not manipulate a symbolic representation to find solutions to problems. Instead, they perform calculations according to principles that have demonstrated their capability to solve problems, without exactly understanding how they arrive at their solutions. Examples include genetic algorithms, neural networks and deep learning. The origin of non-symbolic AI comes from the attempt to mimic the workings of the human brain: a complex network of highly interconnected cells whose electrical signal flows decide how we, humans, behave. Figure 2 illustrates the difference between a symbolic and a non-symbolic representation of an apple. Obviously, the symbolic representation is easy for humans to understand, whereas the non-symbolic representation isn't.
Figure 2: A symbolic and non-symbolic representation of an apple (source http://web.media.mit.edu/~minsky/papers/SymbolicVs.Connectionist.html).
Today, non-symbolic AI, through deep learning and other machine learning algorithms, is achieving very promising results, championed by IBM's Watson, Google's work on automatic translation (which has no understanding of the language itself; it "just" looks at co-occurring patterns), Facebook's algorithm for face recognition, self-driving cars, and the popularity of deep learning. The main disadvantage of non-symbolic AI systems is that no "normal" person can understand how those systems come to their conclusions, actions or decisions. See for example Figure 2: on the left we can easily understand why something is an apple, but looking at the right part, we cannot easily understand why the system concludes that it's an apple. When non-symbolic (aka connectionist) systems are applied to critical tasks such as medical diagnosis, self-driving cars, legal decisions, etc., understanding why they come to a certain conclusion through a human-understandable explanation is very important. In the end, in the real world, somebody needs to be accountable or liable for the decisions taken.
But when an AI program takes a decision and no one understands why, then our society has an issue (see FATML, an initiative that investigates Fairness, Accountability, and Transparency in Machine Learning). Probably the most powerful AI systems will come from a combination of both approaches.
The final question: Can machines think? Are humans machines?
It is now clear that machines can certainly perform complex tasks that would require "thinking" if performed by people. But can computers have consciousness? Can they have, feel or express emotions? Or are we, people, machines? After all, our bodies and brains are based on a very complex "machinery" of mechanical, physical and chemical processes that, so far, nobody has fully understood. There is a research field called "computational emotions" which tries to build programs that are able to express emotions. But maybe expressing emotions is different from feeling them? (See the Intentional Stance in this post.)
Figure 3: Can computers express or feel emotions?
Another critical issue for the final question is whether machines can have consciousness. This is an even trickier question than whether machines can think. I will leave you with this MIT Technology Review interview with Christof Koch about "What It Will Take for Computers to Be Conscious", where he says: "Consciousness is a property of complex systems that have a particular “cause-effect” repertoire. They have a particular way of interacting with the world, such as the brain does, or in principle, such as a computer could." In my opinion, there are currently no scientific answers to those questions, and whatever you may think about them is more a belief or conviction than a commonly accepted truth or a scientific result. Maybe we have to wait until 2045, which is when Ray Kurzweil predicts the technological singularity to occur: the point when machines become more intelligent than humans. While this point is still far away, and many believe it will never happen, it is a very intriguing theme, evidenced by movies such as 2001: A Space Odyssey, A.I. (Spielberg), Ex Machina and Her, among others.
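To make the symbolic/non-symbolic contrast discussed above more tangible, here is a toy sketch (our own illustration, not from the figure): a symbolic classifier can return the rule that fired as its explanation, while a non-symbolic one can only report learned weights that mean nothing to a human reader.

```python
# Symbolic: human-readable IF-THEN rules that can explain themselves.
def classify_symbolic(color: str, shape: str) -> str:
    if color == "red" and shape == "round":
        return "apple (rule: red AND round -> apple)"   # the explanation is the rule itself
    return "unknown"

# Non-symbolic: a toy "learned" model; the weights carry no human-readable meaning.
WEIGHTS = [0.73, -1.21, 0.42]          # hypothetical numbers produced by training
def classify_nonsymbolic(features: list[float]) -> str:
    score = sum(w * x for w, x in zip(WEIGHTS, features))
    return "apple" if score > 0 else "not apple"        # no explanation available

print(classify_symbolic("red", "round"))        # apple, with the reasoning attached
print(classify_nonsymbolic([1.0, 0.2, 0.9]))    # apple, but why? Only the weights "know".
```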
December 20, 2016
AI & Data
How "intelligent" can Artificial Intelligence get?
This post is the second in a series of three, each of which discusses fundamental concepts of Artificial Intelligence. In our first post we discussed AI definitions, helping our readers to understand the basic concepts behind AI and giving them the tools required to sift through the many AI articles out there and form their own opinion. In this second post, we will discuss several notions which are important for understanding the limits of AI.
Strong and weak AI
When we speak about how far AI can go, there are two "philosophies": strong AI and weak AI. The most commonly followed philosophy is that of weak AI, which holds that machines can manifest certain intelligent behavior to solve specific (hard) tasks, but that they will never equal the human mind. Strong AI, however, holds that it is indeed possible. The difference hinges on the distinction between simulating a mind and actually having a mind. In the words of John Searle, "according to Strong AI, the correct simulation really is a mind. According to Weak AI, the correct simulation is a model of the mind."
The Turing Test
Figure 2: The set-up of the original Turing Test.
The Turing Test was developed by Alan Turing in the 1950s and was designed to evaluate the intelligence of a computer holding a conversation with a human. The human cannot see the computer and interacts with it through an interface (at that time by typing on a keyboard with a screen). In the test, there is a person who asks questions, and either another person or a computer program responds. There are no limitations as to what the conversation can be about. The computer passes the test if the person cannot distinguish whether the answers in the conversation come from the computer or the person. ELIZA was the first program that challenged the Turing Test, even though it unquestionably failed. A modern version of the Turing Test was recently featured in the 2015 movie Ex Machina, which you can see in the video below. So far, no computer or machine has passed the test.
Video 1: The Turing Test appears in a clip from the movie Ex Machina.
The Chinese Room Argument
A very interesting thought experiment in the context of the Turing Test is the so-called "Chinese Room Experiment", devised by John Searle in 1980. This experiment argues that a program can never give a computer the ability to really "understand", regardless of how human-like or intelligent its behavior is. It goes as follows. Imagine you are inside a closed room with a door. Outside the room there is a Chinese person who slips a note with Chinese characters under the door. You pick up the note and follow the instructions in a large book that tells you exactly, for the symbols on the note, what symbols to write down on a blank piece of paper. You follow the instructions in the book, processing each symbol from the note, and you produce a new note, which you slip under the door. The note is picked up by the Chinese person, who perfectly understands what is written, writes back, and the whole process starts again, meaning that a real conversation is taking place.
Figure 3: The Chinese Room thought experiment. Does the person in the room understand Chinese?
The key question here is whether you understand the Chinese language. What you have done is receive an input note and follow instructions to produce the output, without understanding anything about Chinese. The argument is that a computer can never understand what it does, because - like you - it just executes the instructions of a software program.
The point Searle wanted to make is that even if the behavior of a machine seems intelligent, it will never be really intelligent. And as such, Searle claimed that the Turing Test was invalid.
The Intentional Stance
Related to the Turing Test and the Chinese Room argument, the Intentional Stance, coined by philosopher Daniel Dennett in the seventies, is also relevant to this discussion. The Intentional Stance means that the "intelligent behavior" of machines is not a consequence of how machines come to manifest that behavior (whether it is you following instructions in the Chinese Room or a computer following program instructions). Rather, it is an effect of people attributing intelligence to a machine because the behavior they observe would require intelligence if people did it. A very simple example is that we say our personal computer is "thinking" when it takes more time than we expect to perform an action. The fact that ELIZA was able to fool some people reflects the same phenomenon: due to the reasonable answers that ELIZA sometimes gives, people assume it must have some intelligence. But we know that ELIZA is a simple pattern-matching, rule-based algorithm with no understanding whatsoever of the conversation it is engaging in. The more sophisticated software becomes, the more likely we are to attribute intelligence to that software. From the Intentional Stance perspective, people attribute intelligence to machines when they recognize intelligent behavior in them.
To what extent can machines have "general intelligence"?
One of the main aspects of human intelligence is that we have a general intelligence which always works to some extent. Even if we don't have much knowledge about a specific domain, we are still able to make sense of situations and communicate about them. Computers are usually programmed for specific tasks, such as planning a space trip or diagnosing a specific type of cancer. Within the scope of the subject, computers can exhibit a high degree of knowledge and intelligence, but performance degrades rapidly outside that specific scope. In AI, this phenomenon is called brittleness (as opposed to graceful degradation, which is how humans perform). Computer programs perform very well in the areas they are designed for, outperforming humans, but don't perform well outside of that specific domain. This is one of the main reasons why it is so difficult to pass the Turing Test, as this would require the computer to be able to "fool" the human tester in any conversation, regardless of the subject area. In the history of AI, several attempts have been made to solve the brittleness problem. The first expert systems were based on the rule-based paradigm, representing associations of the type "if X and Y then Z; if Z then A and B", etc. For example, in the area of car diagnostics: if the car doesn't start, then the battery may be flat or the starter motor may be broken. In this case, the expert system would ask the user (who has the problem) to check the battery or to check the starter motor. The computer drives the conversation with the user to confirm observations, and based on the answers, the rule engine leads to the solution of the problem. This type of reasoning was called heuristic or shallow reasoning. However, the program doesn't have any deeper understanding of how a car works; it knows the knowledge that is embedded in the rules, but cannot reflect on this knowledge.
Based on the experience of those limitations, researchers started thinking about ways to equip a computer with more profound knowledge, so that it could still perform (to some extent) even if the specific knowledge was not fully coded. This capability was coined "deep reasoning" or "model-based reasoning", and a new generation of AI systems emerged, called "Knowledge-Based Systems". In addition to specific association rules about the domain, such systems have an explicit model of the subject domain. If the domain is a car, then the model would represent a structural model of the parts of a car and their connections, and a functional model of how the different parts work together to produce the behavior of the car. In the case of the medical domain, the model would represent the structure of the part of the body involved and a functional model of how it works. With such models, the computer can reason about the domain and come to specific conclusions, or can conclude that it doesn't know the answer. The more profound the model a computer can reason about, the less superficial it becomes and the more it approaches the notion of general intelligence. There are two additional important aspects of general intelligence where humans excel compared to computers: qualitative reasoning and reflective reasoning.
Figure 4: Both qualitative reasoning and reflective reasoning differentiate us from computers.
Qualitative reasoning
Qualitative reasoning refers to the ability to reason about continuous aspects of the physical world, such as space, time and quantity, for the purpose of problem solving and planning. Computers usually calculate things in a quantitative manner, while humans often use a more qualitative way of reasoning (if X increases, then Y also increases, thus ...). The qualitative reasoning area of AI develops the formalisms and processes that enable a computer to perform qualitative reasoning steps.
Reflective reasoning
Another important aspect of general intelligence is reflective reasoning. During problem solving, people are able to take a step back and reflect on their own problem-solving process, for instance if they hit a dead end and need to backtrack to try another approach. Computers usually just execute a fixed sequence of steps which the programmer has coded, with no ability to reflect on the steps they take. To enable a computer to reflect on its own reasoning process, it needs to have knowledge about itself: some kind of meta-knowledge. For my PhD research, I built an AI program for diagnostic reasoning that was able to reflect on its own reasoning process and select the optimal method depending on the context of the situation.
Conclusion
Having explained the above concepts, it should be somewhat clearer that there is no concrete answer to the question posed in the title of this post; it depends on what one wants to believe and accept. By reading this series, you will have learned some basic concepts which will enable you to feel more comfortable talking about the rapidly growing world of AI. The third and last post will discuss the question of whether machines can think, or whether humans are in fact machines. Stay tuned and visit our blog soon to find out.
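As an illustration of the heuristic, rule-based reasoning described in this post, here is a minimal forward-chaining sketch in the spirit of early expert systems (the rules and facts are our own toy examples):

```python
# Toy forward-chaining rule engine in the spirit of early expert systems.
RULES = [
    ({"car does not start", "lights are dim"}, "battery is flat"),
    ({"car does not start", "clicking noise"}, "starter motor is broken"),
    ({"battery is flat"}, "recharge or replace the battery"),
    ({"starter motor is broken"}, "replace the starter motor"),
]

def forward_chain(facts: set[str]) -> set[str]:
    """Repeatedly fire any rule whose conditions hold until no new facts appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)   # shallow: no model of *why* this association holds
                changed = True
    return derived - facts

observations = {"car does not start", "lights are dim"}
print(forward_chain(observations))
# {'battery is flat', 'recharge or replace the battery'}
```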
December 13, 2016
AI & Data
Artificial Intelligence: What even is that?
Artificial Intelligence (AI) is the hottest topic out there at the moment, and it is often associated merely with chatbots such as Siri or other cognitive programs such as Watson. However, AI is much broader than just that. To understand what these systems mean for Artificial Intelligence, it is important to understand the "AI basics", which are often lost amid the current AI hype. By understanding these fundamental principles, you will be able to make your own judgment about what you read or hear about AI. This post is the first in a series of three, each of which discusses fundamental concepts of AI. In this first post, we will discuss some definitions of AI and explain what the sub-fields of AI are.
What are the most common definitions of AI?
So, first of all, how does Google (one of the kings of AI) define Artificial Intelligence?
Figure 2: A popular definition of Artificial Intelligence (Google).
There are many definitions of AI available online, and all of them refer to the same idea of machine intelligence; however, they differ in where they put the emphasis, which is what we have analysed below (an overview of these definitions can be found here). For example, Webster gives the following definition:
Figure 3: The official Webster definition of Artificial Intelligence.
All definitions, of course, emphasise the presence of machines which are capable of performing tasks that normally require human intelligence. For example, Nilsson and Minsky define AI in the following ways: "The goal of work in artificial intelligence is to build machines that perform tasks normally requiring human intelligence." (Nilsson, Nils J. (1971), Problem-Solving Methods in Artificial Intelligence (New York: McGraw-Hill): vii.) "The science of making machines do things that would require intelligence if done by humans." (Marvin Minsky) Other definitions put the emphasis on a temporal dimension, such as those of Rich & Knight and Michie: "AI is the study of how to make computers perform things that, at the moment, people do better." (Elaine Rich and Kevin Knight) "AI is a collective name for problems which we do not yet know how to solve properly by computer." (Michie, Donald, "Formation and Execution of Plans by Machine," in N. V. Findler & B. Meltzer (eds.) (1971), Artificial Intelligence and Heuristic Programming (New York: American Elsevier): 101-124; quotation on p. 101.) These definitions portray AI as a moving target: making computers perform things that, at the moment, people do better. Forty years ago, imagining that a computer could beat the world chess champion was considered AI; today, this is considered normal. The same goes for speech recognition: today we have it on our mobile phones, but 40 years ago it seemed impossible to most. On the other hand, other definitions highlight the role of AI as a tool to understand human thinking. Here we enter the territory of Cognitive Science, which is currently being popularized through the term Cognitive Computing (mainly by IBM's Watson). "By Artificial Intelligence I therefore mean the use of computer programs and programming techniques to cast light on the principles of intelligence in general and human thought in particular." (Boden, Margaret (1977), Artificial Intelligence and Natural Man, New York: Basic Books) "AI can have two purposes. One is to use the power of computers to augment human thinking, just as we use motors to augment human or horse power. Robotics and expert systems are major branches of that.
The other is to use a computer's artificial intelligence to understand how humans think. In a humanoid way. If you test your programs not merely by what they can accomplish, but how they accomplish it, then you're really doing cognitive science; you're using AI to understand the human mind." -- Herbert Simon. Some, however, take a much more concise and less scientific approach, with definitions such as: "AI is everything we can't do with today's computers." "AI is making computers act like those in movies." (Her, A.I., Ex Machina, 2001: A Space Odyssey, etc.) From all of these definitions, the important points to remember are: AI can solve complex problems which used to be addressed by people only. What we consider AI today may become commodity software in the not-so-distant future. AI may shed light on how we, people, think and solve problems.
What are the sub-areas of AI?
Looking at the table of contents of any introductory AI textbook will quickly reveal what are considered to be the sub-fields of AI, and there is ample consensus that the following areas definitely belong to it: Reasoning, Knowledge Representation, Planning, Learning, Natural Language Processing (communication), Perception, and the ability to Move and Manipulate objects. But what does it mean for a computer to manifest those capabilities? Reasoning. People are able to deal with facts (who is the president of the United States), but also know how to reason, e.g. how to deduce new facts from existing facts. For instance, if I know that all men are mortal and that Socrates is a man, then I know that Socrates is mortal, even if I have never seen this fact before. There is a difference between Information Retrieval (like Google search: if it's there, I will find it) and reasoning (like Wolfram Alpha: if it's not there, but I can deduce it, I will still find it). Knowledge Representation. Any computer program that reasons about things in the world needs to be able to represent virtually the objects and actions that correspond to the real world. If I want to reason about cats, dogs and animals, I need to represent something like isa(cat, animal), isa(dog, animal), has_legs(animal, 4). This representation allows a computer to deduce that a cat has 4 legs because it is an animal, not because I have represented explicitly that a cat has 4 legs, e.g. has_legs(cat, 4). Planning. People plan constantly: if I have to go from home to work, I plan what route to take to avoid traffic. If I visit a city, I plan where to start, what to see, etc. For a computer to be intelligent, it needs to have this capability too. Planning requires a knowledge representation formalism that allows one to talk about objects, actions, and how those actions change the objects - or, in other words, change the state of the (virtual) world. Robots and self-driving cars incorporate the latest AI technology in their planning processes. One of the first AI planners was STRIPS (Stanford Research Institute Problem Solver), which used a formal language to express states and state changes in the world, as shown in Figure 4. Figure 4: The STRIPS planner building a pile of blocks. Learning. Today this is probably the most popular aspect of AI. Rather than being programmed to do what they are supposed to do, machines are able to learn automatically from data: Machine Learning. Throughout their lives, and especially in the early years, humans learn an enormous number of things, such as talking, writing, mathematics, etc.
Empowering machines with that capability makes them intelligent to a certain extent. Machines are also capable of improving their performance by learning by doing. Thanks to the popularity of Big Data, there is a vast amount of publications on Machine Learning, as well as cloud-based tools to run ML algorithms as you need them, e.g. BigML. Natural Language Processing. We, humans, are masters of language processing, since communication is one of the aspects that makes humans stand out from other living things. Therefore, any computer program that exhibits similar behavior is supposed to possess some intelligence. NLP is already part of our digital life. We can ask Siri questions and get answers, which implies that Siri processes our language and (often) knows what to respond. Perception. Using our five senses, we constantly perceive and interpret things. We have no problem attributing some intelligence to a computer that can "see", e.g. one that can recognize faces and objects in images and videos. This kind of perception is also amply present in our current digital life. Move and Manipulate objects. This capability is above all important for robotics. All our cars are assembled by robots, though those robots do not look like us. Androids, however, look a bit like us and need to manipulate objects all the time. Self-driving cars are another clear example of this intelligent capability. Figure 5: Self-driving cars combine many capabilities of Artificial Intelligence. In this first post (of three), we have explained some key notions about Artificial Intelligence. If you couldn't do so before, you will now be able to read AI publications a bit differently. In the next post, we will elaborate on the question of how intelligent AI can become. Stay tuned!
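As a small illustration of the Knowledge Representation and Reasoning ideas above, here is a sketch (our own) that deduces has_legs(cat, 4) from isa(cat, animal) and has_legs(animal, 4) by simple inheritance:

```python
# Facts in the spirit of isa(cat, animal) and has_legs(animal, 4).
ISA = {"cat": "animal", "dog": "animal"}
HAS_LEGS = {"animal": 4}

def legs(concept: str) -> int | None:
    """Look up a property, inheriting along the isa hierarchy if needed."""
    while concept is not None:
        if concept in HAS_LEGS:
            return HAS_LEGS[concept]   # found directly, or via inheritance
        concept = ISA.get(concept)     # climb the hierarchy: cat -> animal -> None
    return None

print(legs("cat"))  # 4 -- deduced, never stated explicitly for cats
```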
November 29, 2016
AI & Data
Open Data and Business - a paradox?
While Open Data has a wide range of definitions, Wikipedia provides one of the most commonly accepted: "Open Data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." From our perspective, the most important word in this definition is "freely". And we pose the question: does this mean that Open Data and business are incompatible? The short answer: absolutely not. McKinsey stated in a 2013 report that Open Data (public information and shared data from private sources) can help to create $3 trillion a year of value in seven areas of the global economy. The opportunities that arise when data is opened up to the masses are clear. However, the longer answer is that anyone who has tried to get hold of some Open Data and perform an analysis knows that this is not trivial. Open Data varies greatly in terms of quality, formats, frequency of updates, support, etc. Moreover, it is very hard to find the right Open Data you are looking for. Today, most business and value from Open Data is generated through ad hoc consultancy projects that search, find and incorporate Open Data to solve a specific business problem. However, one of the visions of Open Data is to create a thriving ecosystem of, on the one hand, Open Data publishers, and on the other hand, users, developers, startups and businesses that process, combine and analyze this Open Data to create value (e.g. to solve specific problems, or to discover important and actionable insights). The current state of play is that those thriving ecosystems are still being formed, and there are several initiatives and companies trying to position themselves, mostly in specific niche markets. A few players in the field include:
OpenCorporates: a large open database of companies around the world.
Transport API: a digital platform collecting all kinds of transport data, especially in the UK.
Quandl: a financial and economic data portal.
Those companies and organizations focus on aggregating Open Data in a specific niche area, and their business model is built around access to curated, quality data. Other types of companies can then use this Open Data to run a specific business. A typical example of such a business is Claim my Refund, which uses open transport data (e.g. from Transport API) to automatically claim refunds for its customers when there are delays on their underground trips in London. Figure 2: Claim my Refund, a startup based on Open Data. Another business model around Open Data is to help institutions publish their Open Data in a structured way. Such projects are mostly performed for governmental institutions:
Socrata and Junar: cloud platforms that allow government organizations to put their data online.
Localidata: focuses on location data, especially in Spain.
FiWare: an independent, open community building an open, sustainable ecosystem around public, royalty-free and implementation-driven software platform standards.
Once the data is published as Open Data, developers and other companies can access that data and build value-added applications. In the governmental space, it is not uncommon for a Public Administration to pay for having its data published as Open Data, and then to pay again for an innovative application that uses this Open Data to provide value to citizens (e.g. with information about schools). In conclusion, there is definitely a business model for Open Data.
In the short term, it revolves around specific niche areas such as transport, or ad hoc consultancy projects. In the mid-term, business will evolve around ecosystems of Open Data coming from both the public and the private sector. However, the current state of play is relatively immature. The bottom line is that public Open Data still lacks quality, and private Open Data is barely available. But this doesn't mean that Open Data is not powerful yet. A great example is British Airways' use of just three Open Data sets for an amazingly innovative advertising campaign in Piccadilly Circus in London: Video: British Airways uses Open Data for an advertising campaign. On a huge screen in Piccadilly Circus, a boy stands up and points to a passing plane, but only if it is a BA flight and it can actually be seen (i.e. there are no clouds). This advert is based on bringing together three data sources, which are all publicly available: GPS data, plane tracking data and weather data. This work illustrates the power of Open Data when combined with creativity. Here at LUCA, we're fascinated by Open Data, so watch this space for more posts and content on the power of opening up data to bring new value to society.
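The decision logic behind an advert like that is surprisingly simple; here is a toy sketch with hypothetical stub functions standing in for the three open data feeds (the real feeds and their APIs are not something we document here):

```python
# Hypothetical stubs standing in for the three publicly available data feeds.
def planes_overhead(lat: float, lon: float) -> list[dict]:
    """Stub: would query an open plane-tracking feed for aircraft near (lat, lon)."""
    return [{"airline": "BA", "flight": "BA475", "destination": "Barcelona"}]

def sky_is_clear(lat: float, lon: float) -> bool:
    """Stub: would query an open weather feed for cloud cover at (lat, lon)."""
    return True

PICCADILLY = (51.5101, -0.1340)  # screen location, from GPS data

def billboard_action() -> str:
    for plane in planes_overhead(*PICCADILLY):
        # Point at the plane only if it is a BA flight and it can actually be seen.
        if plane["airline"] == "BA" and sky_is_clear(*PICCADILLY):
            return f"Look, it's flight {plane['flight']} to {plane['destination']}!"
    return "idle"

print(billboard_action())
```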
November 17, 2016
AI & Data
Big Data and Elections: We shine a light on Trump and Clinton
Twitter is widely used as a tool to understand and predict phenomena in the real world. Today on our blog, we have been using Twitter to understand the US Presidential Elections of November 8th 2016. There are no conclusive research results on whether it is possible to predict the outcome of elections using tweets, but we decided to investigate. Figure 1: Shine a light on Trump or Clinton. In 2010, an article entitled "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment" was published as part of the Fourth International AAAI Conference on Weblogs and Social Media, concluding that it is indeed possible. This research used tweets from the German federal elections of September 2009 and concluded that the mere number of messages mentioning a party can reflect the election result. Furthermore, joint mentions of two parties are in line with real-world political ties and coalitions. The research also went on to show that an analysis of the tweets’ political sentiment demonstrates close correspondence to the parties' and politicians’ political positions, indicating that the content of Twitter messages "plausibly reflects the offline political landscape." However, another article published one year later at the same Artificial Intelligence conference, "Limits of Electoral Predictions Using Twitter", stated that “unfortunately, they found no correlation between the analysis results and the electoral outcomes, contradicting previous reports”, basing its investigation on tweets from the US 2010 Congressional elections. A Google search will turn up much more research on the feasibility of using Twitter and other social networks for election predictions. Additionally, a Google Trends query checking which candidate generates more search activity provides some insight on the matter: Figure 2: Google Trends candidates' search activity. Whatever the conclusion, there is no doubt that Twitter reflects to some extent what is going on in a country just before important elections. During the two days running up to the elections on the 8th of November 2016, we established a real-time feed from Twitter, filtering relevant hashtags, handle names and keywords. As our technology, we used Sinfonier, an ElevenPaths cybersecurity product for detecting cyberthreats based on real-time information processing. In this case, Sinfonier automates the capture of tweets in real-time (see Figure 3). Sinfonier encapsulates the real-time capabilities of Apache Storm in a very elegant way, so we could build a "digest Twitter-to-MongoDB" pipeline within a few minutes and with almost zero code writing. Figure 3: Sinfonier topology (data extraction pipeline) designed to capture real-time data from Twitter. We then added real-time visualization using Elasticsearch and Kibana. Apart from visualizing the tweets in real-time on a dashboard (see Figure 4), we wanted to try out something more fun, so we also got some Philips Hue lamps involved. Figure 4: Real-time tweets on Clinton (blue) vs. Trump (red). Trump tweets almost double those of Clinton. Trump is slightly more active in replying, and both retweet equally all tweets they are mentioned in. Actually, in the Big Data era, where taking data-driven decisions is the ultimate goal, in a variety of situations it will be unfeasible to use traditional dashboards to convey the status of real-time KPIs. Imagine the case of factories, call center open floors and large retail centers.
It could be very cool to have "real-time transparent dashboards", so that lights (visual perception) adapt to the (big) data produced in the real world in a fast, intelligent and pervasive way. The applications are limitless! In such cases, using dynamic lights could be a good alternative to convey the main insights of dashboards. For instance, in a call center, the light intensity and color could change according to the number and type of calls (complaints, information, products, etc.) from customers. By connecting lamps to the Internet, we enter the world of the Internet of Things. Hue lamps can be instructed to react to phenomena on the Internet using the "If This Then That" (IFTTT) framework. Using IFTTT, you can make the lamps turn on when it is raining in Amsterdam, or when your plane lands safely in another city around the world. So how have we connected the lamps to the tweets related to the US elections? On this occasion, we did not perform a complex analysis or a prediction of the winner; for that purpose, we would first need to ask ourselves whether Twitter is the best data stream to use. Our first step was to design two query sets. In the first query, we count all the tweets containing common words used to refer to Hillary Clinton (e.g. Hillary, Clinton, HillaryClinton) or Donald Trump (e.g. Donald, Trump, DonaldTrump). The results are displayed in Figure 5: Figure 5: Twitter shows much more activity related to Donald Trump, but tweet counts include both positive and negative references, so direct interpretation could be misleading. The second query (Figure 6) is much more specific, as we compare tweets containing their handles, @RealDonaldTrump vs @HillaryClinton: Figure 6: The second query reveals a much more balanced pattern with no clear winner. Additionally, we set rules for the lights as follows: Lamp 1 is connected to a Twitter feed on query 1 (Figure 5), blinking the color of the candidate who had more tweets in the last 2 seconds and showing a stable color for the candidate who had more tweets in the last hour. Lamp 2 has the same behaviour as Lamp 1, but is connected to the second query stream (Figure 6). On the day of the elections, we used the lamps in our office to engage with Election Day. We observed that the lamps were mostly red, reflecting the fact that Trump had many more mentions than Clinton. However, we saw that many tweets mentioning Trump were actually against him, showing negative sentiment, whilst Hillary Clinton attracted fewer negative tweets. Video: Pedro Antonio de Alarcon explains the technology used (video in Spanish). Now the votes have been cast and the United States of America has decided - but in the midst of the global frenzied reaction to Donald Trump's election, our lamps keep on flickering.
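Sinfonier's topology format isn't something we can reproduce here, but the counting-and-color rule driving Lamp 1 is simple; here is a sketch of it (our own, in plain Python with made-up tweet data):

```python
from collections import deque
import time

CLINTON = {"hillary", "clinton", "hillaryclinton"}
TRUMP = {"donald", "trump", "donaldtrump"}

recent = deque()  # (timestamp, candidate) pairs from the live stream

def ingest(tweet_text: str) -> None:
    """Tag each incoming tweet with the candidate(s) it mentions."""
    words = set(tweet_text.lower().split())
    now = time.time()
    if words & CLINTON:
        recent.append((now, "clinton"))
    if words & TRUMP:
        recent.append((now, "trump"))

def lamp_color(window_seconds: float) -> str:
    """Blink the color of whoever had more mentions in the last time window."""
    cutoff = time.time() - window_seconds
    counts = {"clinton": 0, "trump": 0}
    for ts, who in recent:
        if ts >= cutoff:
            counts[who] += 1
    return "blue" if counts["clinton"] > counts["trump"] else "red"

ingest("Donald Trump rally tonight")
ingest("I'm with Hillary Clinton")
ingest("Trump leads in Ohio")
print(lamp_color(window_seconds=2))  # red: more Trump mentions in the last 2 seconds
```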
November 9, 2016
AI & Data
Telco Data Analytics: what's next in Big Data for Telcos
By Lora Mihova, Strategy Manager at LUCA, and Richard Benjamins, Director External Positioning and Big Data for Social Good at LUCA.

The European edition of the Telco Data Analytics Conference took place on October 25 and 26 in Madrid. This annual event also takes place in the USA, and this year there were approximately 100 participants from operators, vendors, startups and OTTs. For the first time, Telefonica was the "host operator" of the event, which was run by KNect 365.

Figure 1: TDA Europe, What's next in Big Data for Telcos

Chema Alonso, Chief Data Officer of Telefonica, gave the opening keynote, arguing that taking data-driven decisions is now a must for all organizations and emphasizing the growing importance of security and privacy for any Big Data initiative. Phil Douty, Director of Partnerships and Strategic Alliances of LUCA, discussed the power of anonymized and aggregated telco data, which enables the understanding of a representative cut of the population that can be extrapolated flexibly to show real behaviours. Phil talked transparently about the learning curve for telcos in the Big Data business: it takes time, money and learning from mistakes, and the accelerating pace of change in telco Data Analytics means that anyone who only starts learning now will struggle to catch up.

Figure 2: Phil Douty discusses the power of anonymized and aggregated telco data - Photo by Lee Tucker

A panel discussion on how to create a first-class Data Science team made clear that there are different types of data professionals, including Data Engineers (data plumbing), Data Scientists (analytics) and data-savvy managers (turning insights into actions), and that, because these profiles are very hard to find in the market, training existing employees is very important. It also became clear that creating a team of Data Scientists is not enough to ensure impact: the rest of the organization needs the right "data-oriented culture" so that insights are put into action across the organization. Another discussion, led by Dr. Sebastian Fisher, Data Scientist at Deutsche Telekom, reminded us that for machines to be truly intelligent, they must be able to perform Reasoning, Knowledge Representation, Planning, Natural Language Processing and Perception, and manifest General Intelligence. As at many Big Data events nowadays, Artificial Intelligence is attracting ever more attention.

Figure 3: Chema Alonso opening a session of TDA Europe

BigML, one of the companies present at the conference, is a startup whose mission is to make Machine Learning easy. BigML offers a tool with several Data Science algorithms so that non-data scientists can run clustering or predictive algorithms on (quality) data. BigML won the Partnership Award for its collaboration with Telefónica Open Future_ in creating PreSeries, which uses BigML's algorithms to predict which startups will be successful; a truly data-driven approach!
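To give a flavour of what that looks like, here is a hedged sketch using BigML's official Python bindings; the CSV file and input field names are invented for illustration, and credentials are read from the BIGML_USERNAME and BIGML_API_KEY environment variables.

```python
from bigml.api import BigML

api = BigML()  # credentials taken from BIGML_USERNAME / BIGML_API_KEY

# Upload a (hypothetical) CSV, turn it into a dataset and train a model.
source = api.create_source("startups.csv")
api.ok(source)                       # block until the resource is ready
dataset = api.create_dataset(source)
api.ok(dataset)
model = api.create_model(dataset)    # a one-click decision tree

# Predict for a new row; field names depend entirely on the uploaded data.
api.ok(model)
prediction = api.create_prediction(model, {"funding_rounds": 3, "sector": "fintech"})
print(prediction["object"]["output"])
```

The same handful of calls covers clustering and anomaly detection as well, which is precisely what makes the tool approachable for non-data scientists.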
Another question, discussed on the IoT analytics panel, was: if you had $100 million to invest in IoT, what would you invest in? All panel members agreed that part of it should go into security: when everything and everyone is connected to the Internet, the risk of abuse and disasters increases significantly, and several studies have shown that security in IoT requires more attention. Another suggestion was to invest in "innovation at the edge", that is, to put some of the money into IoT-related startups so that projects are not hindered by the rules of large organizations. Finally, for IoT analytics to really take off, thriving ecosystems are fundamental, probably built around a few main platforms that will host a large share of the world's IoT data. Such platforms would then give secure, "permissioned" access to the data to developers, startups and businesses in order to create value.

The last topic discussed was Big Data for Social Good: how Big Data, and more specifically telco data, can help measure progress on the 17 Sustainable Development Goals of the United Nations. More details can be read in this post.
October 31, 2016
AI & Data
Big Data Week 2016: Forget Big Data, Artificial Intelligence is the new kid on the block
Yesterday, the LUCA team attended the first day of Big Data Week. BDW is a global community that organizes an annual event on the social, political and technological impacts of data, with events taking place during the same week in 9 cities around the world. That same day, Big Data Week Madrid (#bdwmadrid) kicked off, with the Barcelona edition set to take place tomorrow (October 26th). This is the 4th edition of Big Data Week and the 3rd held in Spain, where it is organized by Synergic Partners.

Figure 1: LUCA attends #BDW16 in Madrid and Barcelona this week

Carme Artigas, CEO and Founder of Synergic Partners, opened by noting that more than 10 years have passed since O'Reilly's Roger Magoulas coined the term "Big Data" in 2005. Roger himself also attended, explaining that, 11 years later, Big Data is everywhere: in the press, on TV and at hundreds of events on the topic. Yet the term has disappeared from Gartner's 2015 Hype Cycle, because it is no longer an emerging technology.

Figure 2: Carme Artigas kicks off Big Data Week Madrid

In contrast to other technologies, Big Data went from emerging to mainstream in record time, and the new kid on the block is Artificial Intelligence. Google acquired the AI startup DeepMind in 2014, a company that built the first general learning system able to learn directly from experience: its system became an expert Atari player just by experimenting with the games, and its AlphaGo program defeated one of the world's best Go players in March this year. Like Big Data some years ago, Artificial Intelligence is now everywhere, and there are predictions that AI will make many jobs obsolete, including that of the Data Scientist. Some people are already talking about "stealing" AIs, and as AI becomes more sophisticated, ethical discussions are starting too, such as Stephen Hawking's warning that "AI could spell end of the human race". Roger Magoulas also mentioned FATML (Fairness, Accountability, and Transparency in Machine Learning), which is expected to become very important as machines take more and more decisions away from people. After all, who can explain how deep learning algorithms come to their conclusions?

Apart from these ethical discussions, there were pragmatic and promising presentations on how Big Data can be used for Social Good. As it turns out, a wide range of data sources (see below) can contribute significantly to monitoring and progressing on the UN's seventeen Sustainable Development Goals for 2030.

Figure 3: Big Data for Social Good Use Cases

Tomorrow we'll be attending the Barcelona edition of the event, where LUCA's Strategic Marketing Manager, Florence Broderick, will expand on Big Data for Social Good and how mobile phone data can bring value to this cause. Follow the conversation online at #bdw16.

Figure 4: Big Data Week video
October 25, 2016
AI & Data
The 6 challenges of Big Data for Social Good
Many of us are familiar with the Sustainable Development Goals set by the United Nations for 2030, and more and more companies and organizations are contributing to their achievement. However, certain companies in certain sectors hold invaluable assets that can be key in accelerating the journey towards these goals. One of those assets is Big Data.

Figure 1: The six challenges of Big Data for Social Good

A data-driven approach can be taken to each and every one of the Sustainable Development Goals, using data to measure how the public and private sector are progressing, as well as helping policy makers shape their decisions for the greatest possible social impact. As we can see below, there are many different use cases that organizations can consider:

Figure 2: Big Data can support the SDGs

However, many of the examples above are one-off projects and pilots; the real acceleration towards the SDGs will come from running these projects on a continuous basis with (near) real-time data feeds, ensuring stability and continuity for the next generation of social Data Scientists. So what are the biggest challenges for companies and organizations that want to contribute their data to the greater good? Is it risky that the data has to leave the company's premises to be analyzed by other organizations? We've outlined the challenges decision makers currently face when it comes to Big Data for Social Good:

1. Privacy & security. Data needs to be anonymized and aggregated. But will the anonymization process be good enough? Is it impossible to re-identify customers or users? Once the data is somewhere else, how secure is it? And if it becomes a constant data feed, how safe is that?

2. Legal. For many companies, most of the relevant data is customer data, and although it is likely to be anonymized, aggregated and extrapolated, there is no full consensus on whether this is allowed. Organizations also face a wide range of different data protection legislation across the countries of their footprint.

3. Corporate reputation. Even if everything is completely legal, professionals may still worry about public opinion, and customers may see things differently. What happens after a data breach, even if the use of the data had a social purpose?

4. Big Data is a key asset. Businesses also struggle with strategic commercial issues. Many companies have only just learned that Big Data is a key asset, so why should they share it with someone else, even for the greater good?

5. Competition. Could the competition get hold of my data (asset) and make inappropriate use of it? How would I explain that one in the boardroom? Competition is tough, and sending data to an external platform has most CSOs concerned.

6. Cannibalization. Does this use of data for social good cannibalize some of my external Big Data revenue? What if I jeopardize an existing business opportunity in order to carry out a Big Data for Social Good project?

Figure 3: Open Government Partnership Global Summit will take place in Paris 7-9 December 2016

However, there is an existing solution that addresses the first three challenges. The OPAL project (which stands for Open Algorithms) doesn't require companies to move their data off their premises; the data stays where it is. Instead, the algorithms are transferred to the data. They are certified (against viruses and malware) and produce only the insights they are designed for (ensuring quality).
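As a purely conceptual illustration (this is not OPAL's actual architecture or code), the toy sketch below shows the principle: the data holder executes only certified algorithms, and only aggregate, small-cell-suppressed results ever leave its premises.

```python
import hashlib

def density_per_region(records):
    """An aggregate-only algorithm: counts per region, never row-level data."""
    counts = {}
    for row in records:
        counts[row["region"]] = counts.get(row["region"], 0) + 1
    # Suppress small cells so that individuals cannot be singled out.
    return {region: n for region, n in counts.items() if n >= 10}

# Certification reduced to its simplest possible form: the data holder
# whitelists algorithms by a hash of their code.
CERTIFIED = {
    hashlib.sha256(density_per_region.__code__.co_code).hexdigest(): density_per_region,
}

def run_certified(algorithm_hash, records):
    """The only entry point on the data holder's side."""
    if algorithm_hash not in CERTIFIED:
        raise PermissionError("algorithm not certified")
    return CERTIFIED[algorithm_hash](records)
```

In the real project, certification, auditing and privacy protection are of course far more involved; the point is simply that the query travels, not the data.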
Albeit simple, this is an extremely powerful idea, and since OPAL is an Open Source project, all software developed will be freely available. The algorithms will be developed by the community and certified by OPAL. OPAL is still at an early stage and keeps a low profile, but we firmly believe it will encourage a wider range of companies to contribute to the Sustainable Development Goals. And while OPAL is an interesting solution to the privacy, legal and reputation concerns, it doesn't yet solve the strategic and business concerns mentioned above.

Until now there has been a general consensus that Big Data for Social Good should be free of charge, meaning that Social Good implies Data Philanthropy: a form of collaboration in which private-sector companies share data for public benefit. However, Big Data for Social Good projects do not necessarily have to be free of charge. While data philanthropy is very important to start the social good movement, in the long run we expect progress to be much quicker if there are also commercial opportunities: companies are simply more willing to invest in something with a business model. There are already several examples of Big Data for Social Good not being free:

- Many international organizations spend a significant part of their budgets on monitoring and achieving the Sustainable Development Goals, including The World Bank, the United Nations, UN Global Pulse, UNICEF and the Inter-American Development Bank. While it may not be appropriate to charge them commercial rates, an "at-cost" model may be possible.

- Several philanthropists are donating large amounts for social purposes, such as the Bill & Melinda Gates Foundation for gender equality, or Facebook's founder, Mark Zuckerberg, who committed to donate $3bn to fight disease.

- Many projects with a social purpose are a high priority for local and national governments: generating a poverty index, anticipating the spread of pandemics, or reducing CO2 emissions in large cities. Governments spend considerable parts of their budgets on such projects, and there is no reason why initiatives with a social purpose couldn't also be charged for.

- Sometimes a freemium model works: pilots (or proofs of concept) are done free of charge, but putting the project into production requires investment. Or insights with limited granularity (in frequency and geography) are free, while more detailed insights carry a bigger price tag.

While the discussion about data for the SDGs and Data Philanthropy is far from over, some visionaries predict that any future commercial business opportunity will have a strong social component. A great read on this is "Breakthrough Business Models: Exponentially more social, lean, integrated and circular", recently commissioned by the Business and Sustainable Development Commission. Will Big Data cause the next revolution in social impact? We believe it can, and we're 100% behind it.
October 25, 2016
AI & Data
From Data Exhaust to Data-Driven: How CEOs face Big Data
Since Big Data became a buzzword in company boardrooms some years ago (thanks to McKinsey's report "Big Data: The next frontier for innovation, competition, and productivity"), many organizations have started Big Data initiatives in the hope of capturing its full potential. Over time, many companies have run pilot projects to address some of their most important business issues. Often, these initial steps have not shown immediate results, for a number of reasons that have been amply documented online. In my experience, one of the main reasons Big Data projects fail is data access and data quality.

Figure 1: From Data-Exhaust to Data-Driven

This is mostly true for "non-digital-native" companies, and stems from the fact that such organizations never considered that the data their systems generate could be of strategic value. In other words, data was considered "exhaust": a side effect, a mere byproduct of running the business. While some things were done with some of this data, such as descriptive business intelligence (i.e. what has happened), data was never treated as a strategic asset. Normally, organizations take meticulous care of their strategic assets, manage them explicitly and keep a close eye on them at all times.

Figure 2: Gartner infographic about CEOs on Data as an Asset

When companies start their data journey, they often don't realize that their data has not been carefully collected or looked after. It might be incomplete, duplicated, hidden, incorrect or simply missing. When Data Scientists first get their hands on the data, they have many questions, and they will find insights that make no sense from a business perspective, perhaps even leading to wrong conclusions. Big Data Analytics and Machine Learning are no exception to the rule: "garbage in, garbage out". A sketch of the kind of basic checks this implies is given at the end of this post.

For all of these reasons, it is important for organizations to have the right expectations when starting their data journey. We are not saying that a large upfront investment in data asset management is required, but organizations must be aware of the potential pitfalls in their Big Data pilots. Ideally, business leaders move on two tracks in parallel: creating value through pilots, while also starting on data management, so that by the time they are ready to scale their Data Science projects, their data is in good shape and a first-class asset.
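As promised above, here is a minimal, illustrative first pass over a new dataset before any modelling, using pandas; the file name and column names are hypothetical.

```python
import pandas as pd

# Hypothetical extract of customer data.
df = pd.read_csv("customers.csv")

# Missing values per column: incomplete data is the most common surprise.
print(df.isna().sum())

# Exact duplicate rows, often left behind by repeated batch loads.
print("duplicate rows:", df.duplicated().sum())

# Values that make no business sense, e.g. negative ages.
if "age" in df.columns:
    print("impossible ages:", (df["age"] < 0).sum())

# Columns with a single constant value carry no signal at all.
constant = [col for col in df.columns if df[col].nunique(dropna=True) <= 1]
print("constant columns:", constant)
```

None of this replaces proper data asset management, but it makes the state of the data visible before any conclusions are drawn from it.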
October 23, 2016