Franco Piergallini Guida

Security Researcher. Innovation & Laboratory Area at ElevenPaths.
Cyber Security
How to Trick Apps That Use Deep Learning for Melanoma Detection
One of the great achievements of deep learning is image classification using convolutional neural networks. In the article "The Internet of Health" we find a clear example in which this technology, such as Google's GoogLeNet project (which was originally designed to interpret images for intelligent or self-driving cars), is now used in medical image analysis for the detection of melanoma and skin cancer. Simply by searching the mobile app stores, we found several apps that, based on a photo of a spot or mole on your skin, predict whether it is a malignant melanoma or something completely benign.

As we have seen in previous articles, these types of algorithms can be vulnerable to alterations in their behaviour. From a selection of these applications, we set out to perform a black-box attack, strategically adding noise to an image of a melanoma to see whether it is possible to invert the classification made by the applications' internal neural networks, about which we had no information. That is, in this research scenario we did not have access to the internal neural networks of the applications.

Methodology

Given this situation, one possible path was to train our own models in the most intuitive way for this type of problem and generate attacks against them which, thanks to the property called transferability, should also work against the applications we had selected. But we found an even simpler way: to save ourselves the step of training a neural network dedicated to melanoma detection in images, we simply looked for an open-source project on GitHub that addressed this problem and already had a trained neural network ready to use.

The transferability property was discovered by researchers who found that adversarial samples specifically designed to cause misclassification in one model can also cause misclassification in other, independently trained models, even when the two models are supported by distinctly different algorithms or infrastructures.

To establish a baseline, we first used the selected apps in a "normal" way from our device or Android emulator, loading melanoma images randomly selected from Google to see their results. Indeed, we could observe that the apps classified those images as melanomas with high confidence, as we can see in the following image:

Image 1: Classification of images as Melanomas

From there, we proceeded to recreate an adversarial attack. We assumed that all the victim applications used an approach similar to the one proposed in the GitHub repository. Therefore, using the neural network weights provided by the repository, we applied the Fast Gradient Sign Method (FGSM), which we mentioned in another post, to generate the "white noise" needed to fool the neural networks. This noise, almost imperceptible to the human eye, is specifically crafted from the weights of the neural network to have the greatest possible impact on the classification probabilities assigned to the images and to completely change the prediction verdict.

And indeed, the images carefully generated with FGSM from the weights of the open-source neural network had the desired impact on the target applications. The transferability property is clearly fulfilled, since we have no idea what internal structure and weights the applications' networks have.
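As a rough illustration of this step, the sketch below shows how FGSM noise can be generated from a trained Keras model and scaled up until the prediction flips. It is a minimal sketch, not the code used in the research: the model file, the sample image, the input scaling, the single-sigmoid output and the epsilon values are all hypothetical placeholders.

```python
# Minimal FGSM sketch (TensorFlow/Keras). Assumes a binary melanoma classifier
# with a single sigmoid output (1 = melanoma); file names and preprocessing
# are hypothetical placeholders, not the actual open-source repository.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("melanoma_classifier.h5")  # hypothetical path
loss_fn = tf.keras.losses.BinaryCrossentropy()

def fgsm_noise(model, image, label):
    """Return the sign of the loss gradient with respect to the input image."""
    image = tf.convert_to_tensor(image[None, ...], dtype=tf.float32)
    label = tf.convert_to_tensor([[label]], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(image)
        prediction = model(image)
        loss = loss_fn(label, prediction)
    gradient = tape.gradient(loss, image)
    return tf.sign(gradient)[0].numpy()

# Increase epsilon until the predicted class flips, as described in the article.
x = np.load("melanoma_sample.npy")       # hypothetical image scaled to [0, 1]
noise = fgsm_noise(model, x, label=1.0)  # 1.0 = melanoma
for eps in (0.01, 0.02, 0.05, 0.1):
    x_adv = np.clip(x + eps * noise, 0.0, 1.0)
    p = float(model.predict(x_adv[None, ...], verbose=0)[0, 0])
    print(f"eps={eps:.2f} -> melanoma probability {p:.2f}")
```

In the black-box scenario described above, this noise is computed against the open-source substitute model and the resulting image is then submitted to the target apps, relying on transferability.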
In this way we were able to change the prediction of images that had previously been classified as melanoma with a fairly high degree of certainty, simply by adding this "noise" to them.

Image 2: Analysed melanomas, but with a reduced melanoma classification

We successfully recreated this type of attack on several apps found in the Google and Apple stores. In some cases they behaved similarly rather than identically, but at the end of the tests we always obtained the same result: the neural network was tricked in its prediction. In the following image we show the results of the same melanoma image uploaded to the same application, but with the noise increased until we reached the point where the application's internal network changed its prediction.
February 9, 2021
Cyber Security
Thinking About Attacks on WAFs Based on Machine Learning
One of the fundamental pieces for the correct implementation of machine and deep learning is data. These algorithms need to consume, in some cases, a large amount of data in order to find a combination of internal "parameters" that allows them to generalise, or learn, and thus predict correctly on new inputs.

If you are familiar with computer security, you have probably noticed that data is something we have in abundance: security revolves around data, and we find it represented in different forms: files, logs, network packets, etc. Typically, this data is analysed manually, for example using file hashes, custom rules such as signatures, and manually defined heuristics. These techniques require too much manual work to keep up with the changing picture of cyber threats, which grows dramatically every day. In 2016 there were around 597 million unique malware executables known to the security community according to AV-TEST, and in 2020 we are already over a billion.

Picture 1: source: https://www.av-test.org/en/statistics/malware/

For this volume of data, manual analysis of all attacks is humanly impossible. For this reason, deep and machine learning algorithms are widely used in security: antivirus engines detecting malware, firewalls detecting suspicious activity on the network, SIEMs identifying suspicious trends in data, among others. But just as a cybercriminal could exploit a vulnerability in a firewall to gain access to a web server, machine learning algorithms are also susceptible to attack, as we saw in the two previous instalments: Adversarial Attacks: the Enemy of Artificial Intelligence and Adversarial Attacks: the Enemy of Artificial Intelligence (II). Therefore, before putting such solutions in the front line, it is crucial to consider their weaknesses and understand how malleable they are under pressure.

Examples of Attacks on WAFs

Let's have a look at a couple of examples of attacks on two WAFs, each of which fulfils a simple objective: detecting XSS or malicious sites by analysing the text of a given URL. From large data sets in which XSS payloads and malicious sites were correctly labelled, a logistic regression algorithm was trained to predict whether a new input is malicious or not. The data sets used to train these two logistic regression algorithms are basically collections of URLs classified as "good" and "bad":

Picture 2: Malicious URLs
Picture 3: XSS

The malicious sites data set contains about 420,000 URLs, good and bad; the XSS one, about 1,310,000. Since this is a white-box attack, we have access to all the data processing and manipulation used to train the algorithms. We can see that the first step in both scenarios is to apply a technique called TF-IDF (Term Frequency - Inverse Document Frequency), which assigns an importance to each term based on its frequency of appearance across the URLs in our data sets. From the TF-IDF object we can obtain the vocabulary generated in both cases and, once the algorithm is trained, easily see which of these terms it gave the most weight. Armed with these terms, we can easily manipulate the output of the algorithm.
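As a minimal sketch of this setup, assuming a tiny made-up set of labelled URLs rather than the real data sets, the code below fits a TF-IDF vectoriser and a logistic regression, then lists the terms whose coefficients most strongly push a URL towards the "good" class. The URLs, labels and tokenisation are illustrative assumptions and do not reproduce the preprocessing of the actual projects mentioned at the end of this article.

```python
# Sketch of the white-box setup: TF-IDF over URL terms feeding a logistic
# regression, plus inspection of the learned term weights. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

urls = [
    "photobucket.com/albums/summer/2011",      # good
    "en.wikipedia.org/wiki/Machine_learning",  # good
    "free-money.xyz/login.php?update=bank",    # bad (invented example)
    "paypa1-secure.top/verify/account",        # bad (invented example)
]
labels = [0, 0, 1, 1]  # 0 = good, 1 = malicious

vectorizer = TfidfVectorizer()  # default word tokeniser splits the URL into terms
X = vectorizer.fit_transform(urls)
clf = LogisticRegression().fit(X, labels)

# Terms with the most negative coefficients push the prediction towards "good".
terms = vectorizer.get_feature_names_out()
weights = clf.coef_[0]
for w, t in sorted(zip(weights, terms))[:10]:
    print(f"{t!r}: {w:.3f}")
```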
Malicious Site Classification

Let's look first at the case of malicious site classification. According to the algorithm, if any of these terms appears in a URL, there is a high probability that the site is not malicious:

Picture 4: weight of terms to be considered NOT malicious

This means that, simply by adding some of these terms to my malicious URL, I can sway the algorithm in my favour (a toy sketch of this manipulation appears at the end of this article). I start with a malicious URL that the algorithm detects with reasonable certainty and that is, indeed, a malicious site:

Picture 5: malicious URL

With 90% confidence, it classifies the URL as malicious. But if we add the term "photobucket" to the URL, the algorithm already classifies it as "good":

Picture 6: malicious URL with a trustworthy term

We could push that probability even further by simply adding another term to the URL, for example "2011":

Picture 7: malicious URL with two trustworthy terms

Let's move on to the XSS scenario. We have a payload which the algorithm correctly classifies as XSS with 99% confidence (in this example label 1 corresponds to XSS and 0 to non-XSS):

Picture 8: payload of detectable XSS

Let's take a look at the terms with the least weight, in order to reverse that prediction:

Picture 9: weight of the terms that lower the prediction of an XSS attack

As before, we add some of these terms to manipulate the output of the algorithm. After some tests we found a payload that inverts the prediction: we had to add the term "t/s" about 700 times to achieve the objective:

Picture 10: payload capable of reversing the XSS prediction

And, indeed, our algorithm predicts it as non-XSS:

Picture 11: no detection of XSS for the payload used

In case anyone is interested in the subject, we leave some links to the WAF Malicious Sites and WAF XSS projects. Some references were taken from the Malware Data Science book.

Having access to the data pre-processing steps and the models makes creating these types of attacks much easier. If the attacker did not have access to them, it would take considerably more effort to work out the right pre-processing of the data and the architecture or algorithm of the predictive model. However, it is still possible to recreate these attacks through other techniques such as transferability, where adversarial samples specifically designed to cause misclassification in one model can also cause misclassification in other, independently trained models, even when the two models are supported by clearly different algorithms or infrastructures.
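Continuing the toy pipeline from the previous sketch (it reuses the fitted vectorizer and clf objects, so it is not standalone), the snippet below recreates the manipulation shown in the pictures: appending terms the model associates with benign URLs to a malicious URL and watching the predicted probability of "malicious" drop. The malicious URL itself is an invented placeholder; "photobucket" and "2011" are the terms used in the article.

```python
# Append "benign" terms to a malicious URL and observe how the toy model's
# malicious probability decreases. Reuses `vectorizer` and `clf` from the
# previous sketch; values printed depend on the toy training data.
candidates = [
    "free-money.xyz/login.php?update=bank",                   # original
    "free-money.xyz/login.php?update=bank&photobucket",       # + 1 benign term
    "free-money.xyz/login.php?update=bank&photobucket&2011",  # + 2 benign terms
]
for url in candidates:
    p_malicious = clf.predict_proba(vectorizer.transform([url]))[0, 1]
    print(f"P(malicious) = {p_malicious:.2f}  {url}")
```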
October 5, 2020
Cyber Security
Adversarial Attacks: The Enemy of Artificial Intelligence (II)
In machine and deep learning, as in any system, there are vulnerabilities and techniques that allow its behaviour to be manipulated at an attacker's mercy. As we discussed in the first part of this article on adversarial attacks, one of these techniques is adversarial examples: inputs carefully generated by an attacker to alter the response behaviour of a model. Let's look at some examples.

The earliest can be found in the beginnings of spam detection. Standard classifiers like Naive Bayes were very successful against emails containing texts like: "Make rapid money!", "Refinance your mortgage", "Viagra"... As these were automatically detected and classified as spam, the spam generators learned to trick the classifiers by inserting punctuation, special characters or HTML code such as comments or even fake tags. So they started using "disguises" like: v.ia.g.ra, Mα∑e r4p1d mФn €y!... And they went further: once the classifiers had caught up with this, the attackers invented a new trick. To evade the classifiers that relied on text analysis, they simply embedded the message in an image.

Picture 1: Adversarial examples, Ebay

Several countermeasures were quickly developed, based on hashes of images already known to be spam and on OCR to extract the text from images. To evade these defences, attackers began applying filters and transformations with random noise to the images, making the task of recognising characters in them quite difficult.

Picture 2: Random noise

As in cryptography, we find ourselves in an endless game in which defence techniques and attack techniques constantly meet. Let's stop at this point.

Image Classification and Adversarial Attacks

In image classification, attackers learned to meticulously and strategically generate white noise, using algorithms that maximise the impact on the neural network while going unnoticed by the human eye. In other words, they induce activations in the internal layers of the network that completely alter its response. One of the reasons these attacks exist for images is their dimensionality and the practically infinite combinations a neural network can receive as input. While we can apply techniques such as data augmentation to increase both the size and the variety of our training data sets, it is impossible to capture the enormous combinatorial complexity of the actual space of possible images.

Figure 3: https://arxiv.org/abs/1412.6572

But how is this white noise generated? First, let's formulate adversarial examples mathematically, from the perspective of optimisation. Our fundamental objective in supervised learning is to provide an accurate mapping from an input to an output by optimising some parameters of the model. This can be formulated as the following optimisation problem:

$\min_{\theta} \; \mathrm{loss}(\theta, X_i, Y_i)$

which is what we typically call training the neural network. To perform this optimisation, algorithms such as stochastic gradient descent are used, among others. A very similar approach can be used to make a model misclassify a specific input. To generate an adversarial example, we take the parameters to which the network converged during training, fix them, and optimise over the input space instead.
This means that we look for a perturbation that can be added to the input and that maximises the model's loss function:

$\max_{\delta \in \Delta} \; \mathrm{loss}(\theta, X_i + \delta, Y_i)$

Toy Example

Let's think for a moment about a simple example: a linear regression neuron with a 6-dimensional input which, after going through the training process, converged to the weights $W = (0, -1, -2, 0, 3, 1)$ and $b = 0$. For a given input $x$, the neuron outputs $y = Wx + b = -1$. So how do we change $x \to x^*$ so that $y^*$ changes radically while $x^* \approx x$?

The derivative $\partial y / \partial x = W^T$ tells us how small changes in $x$ impact $y$. To generate $x^*$, we add a small perturbation $\varepsilon W^T$, with $\varepsilon = 0.5$, to the input $x$. Forward propagating the new input $x^*$, we see a big difference from the output the model gave for $x$: since $y^* = W(x + \varepsilon W^T) + b = y + \varepsilon \lVert W \rVert^2 = -1 + 0.5 \cdot 15$, we get 6.5 as output for $x^*$, where for $x$ we had -1.

This technique (with some minor differences from the toy example we have just seen) is called the Fast Gradient Sign Method and was introduced in 2015 by Ian Goodfellow in the paper Explaining and Harnessing Adversarial Examples.
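To make the toy example concrete, here is a minimal numeric sketch. The original input vector is not reproduced in the text, so the x used below is an assumed example chosen so that the clean output is also -1.

```python
# Numeric version of the toy example above.
import numpy as np

W = np.array([0.0, -1.0, -2.0, 0.0, 3.0, 1.0])  # weights after training
b = 0.0
x = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])    # hypothetical input with W @ x + b = -1

print(W @ x + b)                 # clean output: -1.0

eps = 0.5
x_adv = x + eps * W              # perturb in the direction of dy/dx = W^T
print(W @ x_adv + b)             # adversarial output: -1 + 0.5 * ||W||^2 = 6.5
print(np.abs(x_adv - x).max())   # largest per-component change to the input: 1.5
```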
Future Adversarial Examples: Autonomous Cars

Adversarial examples are an innate feature of all optimisation problems, including deep learning. But if we go back about ten years, deep learning did not even do a good job on normal, unaltered data. The fact that we are now searching for and investigating ways to "hack" or "break" neural networks means that they have become incredibly advanced. But can these attacks have an impact on the real world, such as the autopilot system in a car?

Elon Musk gave his opinion on Lex Fridman's podcast, assuring that these types of attacks can be easily controlled. In a black-box environment, where attackers do not have access to the internal details of the neural network such as its architecture or parameters, the probability of success is relatively low, approximately 4% on average. However, Keen Labs researchers have managed to generate adversarial examples that alter the behaviour of the Tesla car's autopilot system. Furthermore, in white-box environments, adversarial examples can be generated with an average success rate of 98% (An Analysis of Adversarial Attacks and Defences on Autonomous Driving Models). This implies a high susceptibility in open-source self-driving projects such as comma.ai, where the architecture and parameters of the models are fully exposed.

Waymo, a developer of autonomous vehicles belonging to the Alphabet Inc. conglomerate, makes available a range of high-resolution sensor data collected by its cars in a wide variety of conditions, in order to help the research community move this technology forward. This data could be used to train a wide variety of models and to generate adversarial attacks that, in some cases, could have an effect on the networks used by Waymo due to transferability, a property of neural networks by which two models trained for the same objective tend to rely on the same characteristics.

We must mention that there is a big gap between fooling a model and fooling a system that contains a model. Often, a neural network is just one component in an ecosystem where different types of analysis interact in decision making. In the case of autonomous cars, a decision to reduce speed because a possible nearby object was detected in the analysis of the front camera may not agree with the data obtained from another component, such as a LIDAR, in the event of an adversarial attack. But in other types of decision making, such as analysing traffic signs, only the video analysis is involved, and an attack could have a truly dangerous effect, for example by converting a stop sign into a 50 km/h speed limit sign.

Picture 4: Stop sign

This technique undoubtedly constitutes a latent threat to the world of deep learning. And that is not all, since there are other types of attacks for each of the stages of the machine learning pipeline that an attacker can take advantage of:

Training stage: data set poisoning.
Learned parameters: parameter manipulation attacks.
Inference stage: adversarial attacks.
Outputs: model theft.

Want to know more about adversarial attacks? Find out in the first part of this article: Adversarial Attacks: The Enemy of Artificial Intelligence.
September 8, 2020
Cyber Security
Adversarial Attacks: The Enemy of Artificial Intelligence
A neural network has a simple objective: to recognise patterns inherent in data sets. To achieve this, it must have the ability to "learn" by going through a training process in which thousands of parameters are adjusted until a combination is reached that minimises a given error metric. If it finally finds a combination of parameters that allows it to generalise from the data, it will be able to recognise these patterns and predict, with an acceptable error tolerance, inputs never seen during the training process. This data can be images, videos, audio or tabular data. What if someone knew how to manipulate this data to produce the output most convenient for them?

Imperceptibly and unconsciously, we use neural networks in a multitude of tasks that we perform every day. Some of the simplest examples are the film recommendation systems on Netflix and music recommendations on Spotify, the identification and categorisation of emails, the interpretation of queries and next-word predictions in search engines, virtual assistants and their natural language processing, facial recognition in cameras and, of course, the identification of friends on social networks as well as the funny filters that change our facial features. Neural networks succeed in an immense variety of fields: we can use them to diagnose COVID-19, track down drug dealers on social networks, or even detect fake news. However, it has been shown that they can also be hacked, taking us back to the essence and definition of hacking: manipulating the normal behaviour of a system.

One of the techniques used to arbitrarily manipulate neural networks is what is commonly known as "adversarial attacks". Thanks to it, we can produce a desired output by creating a carefully crafted input. For example, if we have a neural network that, based on the sound of a cough, predicts the probability of having COVID-19, we could manipulate the recorded spectrograms by adding noise to increase or decrease the response probability. We could even generate a spectrogram with no meaning, or similar to those produced by a cough, and thus obtain any desired response probability.

Example with Deepfakes

Let's see a specific example. We have a system that is very good at predicting whether a video is a deepfake or not. One of the traditional solutions to this problem begins with the collection and alignment of the n faces appearing in the video, using a neural network dedicated to this task. Once collected, another network predicts the probability of each face being a deepfake or not.

Source: Deepfake Detection Challenge (DFDC)

The last step is to take the average of the probabilities for the n faces collected. If this average is greater than an established threshold (for example, 0.6), the video is classified as a deepfake; otherwise, it is classified as not a deepfake (a minimal sketch of this aggregation step appears after the list of restrictions below). Clearly, in this example the quality of the generated deepfake is not very good, so the system is very confident when classifying it (0.86). To modify the output probability of the system, we would have to add strategically generated noise and insert it into the video. To achieve this, we have three restrictions:

The noise generated must be subtle enough for the network that identifies the faces to continue to do its job well.
The noise must be generated in such a way as to lower the probability that the second network predicts for all the collected faces.
The modifications should be as unnoticeable to humans as possible.
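Here is the minimal sketch of that final aggregation step referred to above. The individual per-face probabilities are invented so that their average matches the 0.86 score mentioned in the text; the 0.6 threshold comes from the example, everything else is assumed.

```python
# Average the per-face deepfake probabilities and compare against a threshold.
def classify_video(face_probs, threshold=0.6):
    """face_probs: deepfake probability predicted for each detected face."""
    score = sum(face_probs) / len(face_probs)
    return score, ("deepfake" if score > threshold else "not deepfake")

# A poor-quality deepfake: the detector is confident on most faces.
print(classify_video([0.91, 0.88, 0.79]))   # ~0.86 -> deepfake

# The same faces after adversarial noise lowers each per-face probability.
print(classify_video([0.55, 0.48, 0.52]))   # ~0.52 -> not deepfake
```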
Analysing the second network in detail, we can see that the input it receives is always the same size: a 256-pixel high by 256-pixel wide RGB image. Neural networks are deterministic: any input image that fits the first layer will produce an output. Pixels take values between 0 and 255, which means that the space of possible inputs to the second network contains 256^(256×256×3) combinations, but only a very small subset will meet all three restrictions. To generate the noise we use the Fast Gradient Sign Method (live demo), which is a white-box attack requiring full access to the system. But what happens when we have only one chance to fool the system? We could create our own replica model of the original and generate the noise based on it. There is a high probability that the attack will work thanks to transferability, a property that is still being studied but that basically says that two models with the same objective will rely on the same features to accomplish it.

How Can We Protect Ourselves from This Kind of Attack?

One solution may be to add a new neural network that works as a kind of IDS (SafetyNet) in our pipeline: if it detects that the image or video contains this type of attack, it can discard the video and classify it as malicious. Another solution is to generate these attacks ourselves and include them in our data sets and in the training process of our network, so that it learns to tag them as malicious; however, this option is very cost-intensive due to the number of combinations over which they can be generated. A very clever solution from an NVIDIA team, called BaRT (Barrage of Random Transforms), proposes applying a barrage of randomly chosen transformations to the inputs, making it very difficult for an attacker to anticipate how an image will be processed and therefore to mount an effective black-box attack.

CleverHans, built on TensorFlow, and ART (Adversarial Robustness Toolbox), from IBM, are libraries where we can find a starting point, with examples, to learn more about this type of attack on neural networks, as well as ways to mitigate them in our models and increase their robustness (a short sketch of how to get started with ART follows at the end of this article).

There are many places where attackers can exploit these kinds of techniques with significant impact: identity theft in facial recognition systems, tricking detectors of sexual or violent content on social networks, traffic signs used by autonomous vehicles, fake news detectors, etc. Behind all these applications that we use daily there are models that, like any system, can become vulnerable and have their behaviour disrupted.
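As a starting point with ART, here is a rough sketch of how an FGSM attack can be run against a Keras classifier. The model file, the face batch and the epsilon are hypothetical placeholders, and class or argument names may differ slightly between ART versions; treat it as an outline rather than a drop-in recipe.

```python
# Rough sketch: generating FGSM adversarial examples with IBM's ART
# (adversarial-robustness-toolbox). Inputs and model are placeholders.
import numpy as np
import tensorflow as tf
from art.estimators.classification import KerasClassifier
from art.attacks.evasion import FastGradientMethod

model = tf.keras.models.load_model("face_classifier.h5")   # hypothetical model
classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))

attack = FastGradientMethod(estimator=classifier, eps=0.05)

x = np.load("faces.npy")      # hypothetical batch of 256x256x3 faces in [0, 1]
x_adv = attack.generate(x=x)  # adversarial counterparts of the clean batch

print("Mean absolute perturbation:", np.abs(x_adv - x).mean())
print("Clean predictions:      ", classifier.predict(x).argmax(axis=1))
print("Adversarial predictions:", classifier.predict(x_adv).argmax(axis=1))
```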
June 30, 2020