Fran Ramírez Vicente

Master's Degree in ICT Security, Higher Technician in Digital Electronics, and Degree in Computer Systems Engineering. Computer security researcher on the Crazy Ideas team of Telefónica's CDCO.

AI & Data
Women who changed Mathematics
BY FRAN RAMÍREZ & FRAN FENOLL

Since the International Day of Mathematics also falls in March, we wanted to pay tribute to the great influence that women have had on this science from the beginnings of our civilisation to the present day. Of course, there are many others who do not appear in this article, but we want this selection to serve as a tribute and recognition to all of them.

* * *

Pioneering women in history

Theano

One of the first mathematicians we know of is perhaps Theano, who was born in the 6th century BC. Apart from being a mathematician, she also mastered other disciplines such as philosophy, physics and medicine. Treatises on polyhedra are attributed to her, as well as on proportionality, specifically the golden ratio. Theano is also known for being the wife of the mathematician Pythagoras and for belonging to the Pythagorean school. In fact, it is thanks to Theano that we can study Pythagoras today: when Pythagoras died there was a revolt against his school, and Theano and her daughters saved his works, extending and spreading his teachings years later throughout Greece and Egypt.

Hypatia

Moving on in history, we come across another great woman, an innovator in her time and a teacher at the Neoplatonic School of Alexandria: Hypatia. She was born around 350 AD, the daughter of the mathematician and astronomer Theon. From an early age she was taught science, but her quest for knowledge and truth led her to travel to Athens and Rome. This eagerness led her to teaching and oratory as head of Theon's school, also known as the Musaeum. She excelled for years as a teacher of many pupils, both Christian and non-Christian, and contributed writings in fields such as geometry, algebra and, especially, astronomy. Unfortunately, she was murdered by a mob of Christians in 415 AD.

The Age of Enlightenment

Émilie du Châtelet

At the beginning of the 18th century, Gabrielle Émilie Le Tonnelier de Breteuil, Marquise du Châtelet, was born in France. Although she could have enjoyed a life full of luxury and extravagance, she decided to devote herself to research and to the dissemination of her theories, some of which provoked wide debate in Europe. She stood out above all for her role in spreading Newtonian theories and for her work on differential and integral calculus. Given her position, she received instruction from great professors of the time such as Pierre Louis Moreau de Maupertuis, Clairaut and Koenig, among others. Voltaire's influence on the marquise was notable for many years, and the two made a great couple, both sentimentally and in their work. In fact, they both came close to winning the 1737 competition organised by the Academy of Sciences for the best scientific essay on the nature of fire and its propagation, won by the famous mathematician and physicist Leonhard Euler. The Marquise du Châtelet was the first woman to enter the Café Gradot, dressed as a man, to discuss mathematics with Maupertuis; it should be remembered that at that time women were not allowed to enter such places unaccompanied. She was also the first woman to take part in a public scientific debate.

Maria Gaetana Agnesi

During the 18th century we also meet the Italian mathematician Maria Gaetana Agnesi, considered by many to be the first female university professor, as she took charge of her father's courses for two years from 1748.
In 1750, after publishing her work Analytical Institutions, the Pope appointed her to the chair of Higher Mathematics and Natural Philosophy at the University of Bologna.

The magnificence of the Age of Enlightenment spreads…

Sofya Kovalevskaya

Other authors, however, claim that the Russian mathematician Sofya Kovalevskaya was the first female university professor in Europe, appointed in 1884 in Stockholm, Sweden. She was taught by the famous mathematician Weierstrass in Berlin. Her contribution to differential calculus was very important: above all, she managed to improve a result of the mathematician Cauchy, stating and proving what is known today as the Cauchy–Kovalevskaya theorem. This was one of the reasons why she was awarded a doctorate summa cum laude at the University of Göttingen in 1874, becoming, together with Agnesi, one of the first women in the world to achieve such recognition. During her stay in Stockholm, her work on differential calculus solved one of the problems that had most troubled famous mathematicians: the rotation of a solid body around a fixed point, which, together with the known solutions of Euler and Lagrange, settled the problem posed in 1850 by the Berlin Academy of Sciences. Sonia Kovalevsky Day, organised by the Association for Women in Mathematics (AWM), promotes the funding of workshops in the United States to encourage girls to explore mathematics.

Sophie Germain

Towards the end of the 18th century we meet the French mathematician Sophie Germain. She stood out, among other things, for her work on number theory and the theory of elasticity; above all we can highlight Sophie Germain primes and her attempt to prove Fermat's Last Theorem, which, although it did not succeed, produced results such as the theorem that bears her name. During her lifetime she corresponded with the mathematicians Lagrange and Gauss. In both cases, given the times, Sophie Germain passed herself off as a man, and only after some time did she reveal her true identity. In 1816 she won the competition of the French Academy of Sciences with a paper entitled "Mémoire sur les Vibrations des Surfaces Élastiques", and she became the first woman to attend the sessions of the Academy. Today, the Sophie Germain Prize is awarded annually to the researcher who has carried out the most important work in mathematics.

Ada Lovelace's inspiration

Mary Somerville

During the rise of the Scottish universities against other European universities, led by the scientist Lord Kelvin, the figure of Mary Somerville emerged. She was born in Jedburgh, Scotland, in 1780, and although at that time women were not allowed to join universities or mathematical societies, this did not prevent her from disseminating her knowledge and winning a silver medal for the solution of a problem on Diophantine equations in William Wallace's Mathematical Repository. In addition, in 1826 Mary Somerville published her first article, The Magnetic Properties of the Violet Rays of the Solar Spectrum, in the Royal Society's Philosophical Transactions, among the first scientific writings signed by a woman up to that date. Among her most outstanding achievements we can highlight her work in astronomy on the orbit of Uranus, which years later contributed to the discovery of the planet Neptune. This work earned her the medal of honour of the Astronomical Society and various medals and awards from European societies and universities.

Ada Lovelace
Mary Somerville was an inspiration to Ada Lovelace. Ada Augusta Byron, daughter of the poet Lord Byron and the mathematician Anne Isabella Noel Byron, was born in 1815, and is noted for her work with Charles Babbage on the Difference Engine and the Analytical Engine (the latter was never built), possibly the forerunners of the computer. All of Ada's contributions to the operation of Babbage's machine had to be signed under the initials AAL, and these notes have become the basis of what we now call computer algorithms. We can therefore say that Ada Lovelace was the first female programmer in history. Despite her early death, her legacy is recognised today: the Ada programming language is named after her.

Great injustices and modern times

Amalie Emmy Noether

One of the great injustices done to female mathematicians because of their gender is undoubtedly that suffered by Amalie Emmy Noether. Emmy was born in Germany in 1882 and was noted for her work in algebra and topology, but despite her great knowledge and studies, and the help of mathematicians such as David Hilbert and Felix Klein, she did not obtain a regular university chair, either in Germany or in the United States, where she taught at Bryn Mawr, a women's college, and lectured at Princeton.

Maryam Mirzakhani

In recent years we would like to highlight the Iranian mathematician Maryam Mirzakhani, born in 1977 in Tehran, who taught at Stanford until her early death at the age of 40. She devoted herself to the study of geometry, topology and differential calculus, and above all to hyperbolic and Riemann surfaces. Maryam has the honour of being the first woman to receive the Fields Medal, a prize awarded every four years since 1936 and regarded, together with the Abel Prize (awarded since 2003), as the closest mathematics has to a Nobel Prize (Alfred Nobel did not create one, an omission that various legends attribute to "problems" with mathematicians).

* * *

Author's personal note

I would like to highlight a mathematician in my life who was an inspiration and role model: Fuensanta Andreu (1955-2008), Professor of Applied Mathematics at the University of Valencia. I was very lucky to have her as a teacher, not only for all her work in functional analysis and differential equations, but also for the warmth and clarity with which she taught her classes. Thank you for your patience and help.

Featured photo: Max Fischer / Pexels
March 9, 2023
AI & Data
Is your AI system discriminating without knowing it?: The paradox between fairness and privacy
This post assumes that readers are aware of the good things that Artificial Intelligence (AI) can bring to our businesses, societies and lives, and also of the most evident challenges that the massive uptake of this technology implies: bias and unwanted discrimination, lack of algorithmic explainability, automation and the future of work, privacy, and the liability of self-learning autonomous systems, to mention some of them. In this post I will focus on bias and unwanted discrimination, and in particular on supervised machine learning algorithms.

The intrinsic objective of machine learning

Before entering into the matter, we should not forget that the intrinsic objective of machine learning is to discriminate: it is all about finding those customers who intend to leave, finding those X-rays that manifest cancer, finding those photos that contain faces, and so on. What is not allowed in this process, however, is to base those patterns (a collection of certain attributes) on attributes forbidden by law. In Europe, those attributes are defined in the General Data Protection Regulation (GDPR) and include racial or ethnic origin, political opinions, religious beliefs, membership of trade unions, physical or mental health, sexual life, and criminal offenses. In the US, the following characteristics are protected under federal anti-discrimination law: race, religion, national origin, age, sex, pregnancy, familial status, disability status, veteran status, and genetic information.

Different sources of unwanted discrimination by algorithms

As much research has already pointed out, there are different sources of unwanted discrimination in machine learning, which may lead to discriminatory decision making:

Discrimination due to bias in the data set, caused by an unbalanced distribution of so-called protected groups (represented by sensitive variables such as race, ethnic origin, religion, etc., as mentioned above).

Discrimination due to the availability of sensitive variables in the data set, or their proxies: apparently harmless variables that exhibit a high correlation with sensitive variables.

Discrimination due to the algorithm, manifested by the fact that the proportion of false positives and/or false negatives in the outcome is not equal across protected groups.

High-profile cases of unwanted discrimination reported in the media

But let's start by briefly mentioning some of the high-profile cases of unwanted discrimination that have been amply reported in the media:

COMPAS. The US criminal justice system uses an AI system called COMPAS to assess the likelihood of defendants committing future crimes. It turned out that the algorithm used in COMPAS systematically discriminated against black people.

Amazon had to withdraw an AI system that automatically reviewed job applicants' resumes because it discriminated against women.

Google had to change its Google Photos AI algorithm after it labelled black people as gorillas.

Sparked by those high-profile cases, several approaches have seen the light that deal with the identification and mitigation of unwanted discrimination. IBM has developed an open-source toolkit, called AI Fairness 360, that provides tools to detect the rate of bias in data sets and to mitigate it. Pymetrics, a data science company focused on recruiting, developed open-source software to help measure and mitigate bias.
Aequitas, from the University of Chicago, is an open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias.

Main approaches to detect and mitigate unwanted discrimination

In general, there are three major approaches to detecting and mitigating unwanted discrimination in machine learning:

Pre-processing: biased variables are transformed into non-biased variables before the training of the algorithm begins.

In-processing: apart from optimizing for the target variable (the goal of the algorithm), the outcome is also optimized to show no discrimination, or the least discrimination possible.

Post-processing: this approach only acts on the outcome of the model; the output is adjusted in such a way that no undesired discrimination takes place.

There are several criteria for measuring fairness, including independence, separation, and sufficiency. Telefónica is developing LUCA Ethics, a post-processing approach that complies with the separation criterion.

All those approaches help detect and mitigate bias and unwanted discrimination by analyzing data sets or the outcome of the algorithm. However, they share one major assumption that is important to review: they all assume that the sensitive attribute against which the model must not discriminate is included in the data set. In the Amazon case on recruitment, gender formed part of the data set. In the COMPAS case, race was an attribute of the data set. The availability of the sensitive variable enables different kinds of checks on the data set, such as its distribution and how balanced it is; and, once the model is trained, this same variable also makes it possible to check whether the model discriminates based on it, which usually corresponds to a protected group.

Real-world data sets

But what happens when the data set doesn't contain any explicit sensitive variables? Not surprisingly, most real-world data sets do not contain sensitive variables, because they are designed that way. Since it is forbidden by law to discriminate based on certain sensitive attributes (see above for what is considered sensitive in Europe and the US), most organizations make an effort to exclude those variables to prevent the algorithm from using them. Collecting and storing such sensitive personal data also increases the privacy risk for organizations. If we think about the high-profile cases mentioned above, the sensitive variables (gender and race) actually were in the data set. Gender we may expect to be present in many data sets (which is why many, if not most, examples of bias and unwanted discrimination are illustrated with gender). In the COMPAS case, race is in the data set because it is a very specific (criminal justice) domain. However, we wouldn't expect attributes such as religion, sexual life, race or ethnic origin to be part of the typical data sets used by organizations. The question then arises: how can you know that you are not discriminating illegally if you can't check it? The simple technical solution would be to have this sensitive personal data available in the data set, and then check the outcome of the algorithm against it. This, however, seems at odds with many data protection principles (GDPR Art. 5: data minimization, purpose limitation) and best practices.
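If the sensitive attribute were available, such a check could be quite simple. Here is a minimal sketch of the separation criterion mentioned above (an illustrative computation, not the LUCA Ethics implementation, assuming a binary classifier and a binary sensitive attribute), comparing false positive and false negative rates across the two groups:

```python
import numpy as np

def separation_gaps(y_true, y_pred, group):
    """Gaps in false positive / false negative rates between two groups.

    The separation criterion (equalized odds) asks that error rates be
    equal across protected groups; gaps near 0 suggest it holds.
    """
    rates = {}
    for g in np.unique(group):
        yt, yp = y_true[group == g], y_pred[group == g]
        fpr = np.mean(yp[yt == 0])        # false positive rate in group g
        fnr = np.mean(1 - yp[yt == 1])    # false negative rate in group g
        rates[g] = (fpr, fnr)
    (fpr_a, fnr_a), (fpr_b, fnr_b) = rates.values()
    return {"fpr_gap": abs(fpr_a - fpr_b), "fnr_gap": abs(fnr_a - fnr_b)}

# Toy example: two groups of four individuals each.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(separation_gaps(y_true, y_pred, group))
# {'fpr_gap': 0.5, 'fnr_gap': 0.5} -> the model errs very differently per group
```

The whole difficulty, as discussed above, is in legitimately obtaining the group column at all.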
Let's look for a moment at possible ways to obtain sensitive personal data:

The user could be asked for sensitive personal data to be included in the data set. In several UK institutions, sensitive personal data is in fact requested and stored; users can always choose not to provide this information, but the option is there. It seems, however, unlikely that users will consent to this massively.

Organizations could combine their internal data with publicly available data sources to infer the value of sensitive personal data for each of their users.

Organizations could perform a survey with a representative subset of their users and ask them for their sensitive personal data. One could even announce that the survey forms part of an approach to check for unwanted discrimination. Any new machine learning algorithm can then be tested against this subset for unwanted discrimination (as sketched below).

Some "normal" variables are proxies for sensitive variables, such as postal code for race in some regions of the US. If such known (and confirmed) proxies exist and they are in the data set, they can be used to test the algorithm for unwanted discrimination.

The paradox between fairness and privacy

This leads to an interesting paradox between fairness and privacy: in order to ensure that a machine learning algorithm is not discriminating illegally, one needs to store and process highly sensitive personal data. Some might think that the cure is worse than the disease. This paradox was recently highlighted in the context of Facebook advertising for housing, when Facebook was accused of discriminating by race and gender. The automatic process of targeting ads today is very complex, but Facebook could try to infer the race and gender of each of its users (using its own data but also publicly available data sets), and then use this to avoid unwanted or illegal discrimination. But would you like Facebook, or any other private company, to hold so much sensitive personal data? Given existing privacy regulations and risks, most organizations prefer not to store sensitive data unless it is absolutely necessary, as in some medical applications. In that respect, of the four options mentioned above, the "survey" option seems to be the least risky way to give organizations reasonable assurance that they are not discriminating.

Practical implications

So, what does this all mean in practice for organizations? I believe most organizations are only starting to think about these issues. Only a few have something in place and are starting to check their algorithms for discrimination against certain sensitive variables, but only if those variables are available in the data set. For the cases where organizations do not have sensitive personal data in their data sets (and most organizations make an effort to exclude this data, for the obvious reasons we saw), the current state of the art does not allow systematic checks. It is true that organizations are starting awareness campaigns to alert their engineers to the possible risks of AI, and some are aiming to build diverse and inclusive teams to reduce the risk of bias creeping into the machine learning process.
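For organizations that do take the survey route, the audit itself can be lightweight. A minimal sketch follows, reusing the separation_gaps helper from above (and assuming, as there, a binary-coded attribute); the file names, the "gender" column and the choice of model are hypothetical stand-ins, not a prescribed setup:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical setup: the production training data holds no sensitive
# columns; "survey.csv" is a representative subset of users who agreed
# to self-report a sensitive attribute purely for audit purposes.
train = pd.read_csv("train.csv")     # features + "target", nothing sensitive
survey = pd.read_csv("survey.csv")   # same features + "target" + "gender"

features = [c for c in train.columns if c != "target"]
model = RandomForestClassifier(random_state=0)
model.fit(train[features], train["target"])

# The model never sees the sensitive attribute; it is only used after
# the fact, to compare error rates between the self-reported groups.
y_pred = model.predict(survey[features])
print(separation_gaps(survey["target"].to_numpy(), y_pred,
                      survey["gender"].to_numpy()))
```

The sensitive attribute stays confined to the small, consented survey set, which is precisely why this option carries the least privacy risk.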
Conclusion

If sensitive data is included in the data set, organizations can technically know whether or not they are discriminating in an unfair way. But when there is no sensitive data in the data set, they cannot know. This might not seem optimal, but it is the current state of play, which until now we all seem to have accepted. I am, however, convinced that new research will come up with solutions to this problem, thereby tackling one of the most cited undesired and unintended consequences of Artificial Intelligence.

To keep up to date with LUCA, visit our website, subscribe to LUCA Data Speaks or follow us on Twitter, LinkedIn or YouTube.
September 24, 2019
Cyber Security
How to analyze documents with FOCA in ten steps (or fewer)
Every time we create an office document—such as a word processor file (e.g., Microsoft Word), a presentation (PowerPoint), a spreadsheet (Excel), a PDF, or even an image—these files by default store far more information than we might expect. Embedded within these files is additional content known as metadata, which can include details such as the author's name, creation/modification dates, or even the document's title.

While this already provides quite a bit of information, deeper analysis can extract even more data, including highly valuable insights into the infrastructure where the document was created. For instance, it's possible to extract passwords, usernames, folder names, server names, printers, version history, and more—all from a simple office document.

This kind of information can pose a serious threat not only to personal privacy but also to the security of an entire company or organization, since it exposes valuable data that a potential cybercriminal could use to study your infrastructure (a technique known as fingerprinting) and potentially launch a targeted attack. In the case of images, the most sensitive information is often the geolocation data, which could, for example, reveal the route of a trip.

Metadata is more important than it may initially seem. Perhaps the most well-known case is that of Tony Blair and the Word document that supposedly proved Iraq had weapons of mass destruction—but a review of the metadata revealed a host of hidden content, including revisions and comments, that ultimately proved the information was false.

FOCA is a free tool created by ElevenPaths, designed to analyze metadata in both individual documents and across entire organizations. FOCA is open source and available for download from the ElevenPaths GitHub repository. Let's explore how easy it is to extract all the data from an office document and obtain metadata reports for an entire organization in just a few simple steps.

Extracting metadata from one or more local files

Step 1: Once FOCA is open, simply select the "Metadata" option [1]. Then, right-click on the area shown in the image [2] and select "Add file" [3] (if you want to analyze the contents of an entire folder, use "Add folder"). Choose the file whose metadata you want to analyze (you can also drag and drop files or folders directly into the interface).

Step 2: Once the file is loaded, right-click on it [4] and select "Extract Metadata" [5].

Step 3: To view the results, check the left panel, where the "Metadata" section will display the file name and format (e.g., a .docx file called "Test1") [6]. Clicking on it will display a summary of all extracted metadata in the right panel [7].

Extracting all metadata from an organization

Step 1: The first step is to define a project. Go to the "Project" section and select "New Project" [1].

Step 2: Use the "Select Project" field [2] only if you've already created a project and want to reuse it. If starting from scratch, leave this blank. Enter a Project name [3] and the Domain website [4] you want to audit. If there are alternative domains you want FOCA to include in the search, add them in "Alternative domains" [5]. The files you download (we'll go over that process shortly) will be saved to the folder you define under "Folder where save documents" [6]. Then click "Create" [7] to set up your project.

Step 3: You'll now return to the "Metadata" screen. First, select your search engines [8] (in the example, all three are selected).
In the "Extensions" section, choose which file types you want FOCA to look for in your project [9]. Then click "Search All". Depending on the number of files found at the project's URL, a list will appear after a short wait [10].

Step 4: To analyze the files found, the process is similar to the one for a single file. However, here you must first download them. Right-click on the file [11] (you can select multiple files by holding the Shift key), then choose "Download" as shown in [12]. To download everything, use "Download All".

Step 5: Once downloaded, you'll see a dot on the right side along with the date and time of the download. Now extract the metadata using "Extract Metadata" [13], and then analyze it using "Analyze Metadata" [14].

Step 6: Finally, you'll see the output shown in the following image (content hidden for privacy). The analysis reveals the name of the computer where the file was created [15], server data [16], two usernames [17], the software type used [18], and other general information such as the creation date.

Using a tool like FOCA is essential for auditing both personal and organizational files. It helps you understand the kind of information you might be unknowingly exposing and prevents potential data leaks.
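FOCA itself is a point-and-click Windows tool, but it is easy to see the kind of data it surfaces: Office Open XML files (.docx, .xlsx, .pptx) are simply ZIP archives, and one of their XML parts holds the core document properties. Here is a minimal Python sketch that reads them (the "Test1.docx" name echoes the example above; the script is an illustration, not part of FOCA):

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in .docx/.xlsx/.pptx files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def office_metadata(path):
    """Return the core metadata fields of an Office Open XML document."""
    with zipfile.ZipFile(path) as zf:          # a .docx is just a ZIP archive
        root = ET.fromstring(zf.read("docProps/core.xml"))
    return {
        "author": root.findtext("dc:creator", namespaces=NS),
        "last_modified_by": root.findtext("cp:lastModifiedBy", namespaces=NS),
        "created": root.findtext("dcterms:created", namespaces=NS),
        "modified": root.findtext("dcterms:modified", namespaces=NS),
        "title": root.findtext("dc:title", namespaces=NS),
    }

if __name__ == "__main__":
    # Usage: python office_metadata.py Test1.docx
    print(office_metadata(sys.argv[1]))
```

Fields like the author and lastModifiedBy are exactly the kind of details FOCA aggregates across hundreds of downloaded documents to map an organization's users, machines and software.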
June 26, 2019