
Thank you — happy to be here, and happy to see that people are still in the room; it's getting late and it has been a long day, with some interesting talks. So, in 2016 Microsoft set out to do an experiment: they built a chatbot — some people probably already know where I'm going with this — and integrated it with Twitter, which is a great platform for chatbots because its API makes integration very easy. The experiment was about human interaction, and about how a new kind of bot could learn from interacting with people. They created a bot called Tay, and Tay started out as a teenage girl: the idea was that as people interacted with the bot, it would learn more from them and become cleverer in its answers. Then a couple of guys from 4chan picked up on Tay and the Microsoft experiment and started trolling it, and within 24 hours the innocent little Tay had become a Trump-loving Nazi supporter and a sex-crazed bot, so Microsoft had to pull it off Twitter because it got too crazy. That's an example of what can go wrong, and a good example of what we call poisoning in data.

So today I'm going to talk about artificial intelligence — or, as I like to call it in my title, "artificial reality" — because there's a lot of talk around artificial intelligence, and while I wouldn't call it pure hype, different families of artificial intelligence are being abused under that one name, even when it's technically correct. If you go to Infosecurity, for example, you'll see that every next booth has "artificial intelligence for security" in its title. So I built this presentation to show people the differences between the domains of artificial intelligence: how we have already been using one kind of AI, which we call machine learning, in the past; how new technologies like deep learning are coming out and how they can be used in cybersecurity; but also what the challenges with deep learning are, and why it is not always a good fit for cybersecurity.

Before I get into more detail, I want to set out the problem. I think everybody is familiar with the cyber kill chain; I'm not here to walk through the whole thing, I just put it up to show you that there are actually plenty of opportunities to detect malicious activity in your network. Even with all those opportunities, why is it that organizations — and thank you, Walter, for the previous presentation,
because the report numbers actually back up what I'm saying here — how come organizations are still getting breached and only find out a couple of months later? And the way they find out is either that security researchers saw records from that company being traded on the dark web — otherwise they would never have found it — or through ransom, of course: some hacker comes to them and says, hey, I have this much of your information, so pay up or I will publish it. And with the new GDPR laws, that ransom can be pretty high.

For me there are actually two reasons for this. The first reason is not enough events: if you don't have the visibility, you of course cannot see the malicious events happening on your network. The most important reason, though, and probably the most prevalent in most networks — because we all have sensors, we all centralize our logs, we all collect the events — is too many events. There's too much noise; you cannot see the really important events among all the noise, and you don't have enough humans, enough experts, to go through all those events and determine whether there is actual malicious intent behind an event or whether it's a false positive.

And that brings me to this: security has always been about minimizing false positives and false negatives. A false negative is an actual malicious event that is going on but that you cannot detect, so it's clear why we want to minimize those. A false positive is something raised as an attack or a malicious event where, after investigation, it turns out nothing was wrong — it was just an application that wasn't 100% according to the RFC, or a new application that came onto the network that you didn't know about up front. So we're all about minimizing false positives and false negatives. However, the thing about false positives and false negatives is that they are related, and not in a nice way. Look at the bottom axis, the sensitivity of detection: the more sensitivity you put into your detection, the more false positives you get. Of course you will also detect the malicious events — you will have a very good system for blocking and detecting attacks — but at the same time you will have so much noise that you cannot pick out the events that matter. On the other side, if you allow everything — a very low detection sensitivity, a firewall that passes everything — you will have no noise, but at the same time it's an open field for the attackers.
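To make that trade-off concrete, here is a small sketch with invented anomaly scores for benign and malicious events — the scoring model and all the numbers are my own assumptions, not from the talk — swept over three detection thresholds (a lower threshold means higher sensitivity):

```python
# Hypothetical anomaly scores for benign and malicious events; all numbers
# are invented purely to illustrate the sensitivity trade-off.
benign_scores = [0.10, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.60]
malicious_scores = [0.35, 0.55, 0.65, 0.70, 0.80, 0.90]

results = {}
for threshold in (0.3, 0.5, 0.7):
    false_positives = sum(s >= threshold for s in benign_scores)    # benign flagged
    false_negatives = sum(s < threshold for s in malicious_scores)  # attacks missed
    results[threshold] = (false_positives, false_negatives)
```

With these numbers, `results` comes out as `{0.3: (5, 0), 0.5: (2, 1), 0.7: (0, 3)}`: cranking the sensitivity up eliminates false negatives at the cost of drowning in false positives, and vice versa — exactly the curve on the slide.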
Now, there are ways to lower that false-negative rate: with a negative security model, using signatures of known attacks, you can push that line down. But the optimum still sits over there, with lots of false positives — so we still have a problem with false-positive events in our detection. Meanwhile, security threats are growing faster; our budgets are, well, maybe not getting smaller, but they're not getting bigger; and for teams and experts there's a huge talent gap on the market. At the same time we are gathering more information, more data, as we become more aware. So there's a big gap.

We already have rule-based event correlation systems, like traditional SIEMs, but traditional rule-based correlation also depends on humans: it's humans putting in the rules — humans finding an event that was a false positive, or recognizing a group of events from an investigation and writing a rule so that next time those events come in they are reduced to a single event. That gives some reduction, but you go from millions of events to a couple of thousand, and no security operations center can handle thousands of events. A large SOC can maybe handle a couple of hundred; and if you give me fifty events in one day, then after eight hours, besides being tired and going crazy, I will analyze the first few events with good motivation, but by event 30 or 40 I'm tired and tempted to say, well, maybe this is a false positive — and I will miss the actual event. So we need more and better automation to get further in that hunt for the real events we want to see, and that automation is where machine learning comes in. Now, artificial intelligence is a very broad domain with lots of subdomains: in those subdomains there are technologies we have been using for quite some time, and there are technologies that all the talk is about, and it's sometimes difficult — in the
press, or when people are talking to you (especially the sales side of the vendors) — to know what they mean by AI. Are they talking about machine learning? About deep learning? One thing I know: it's not real artificial intelligence, because as far as I know there is no application out there using real artificial intelligence. Real artificial intelligence is a system that can sense, reason, and act; that can change itself based on its surroundings, so it is aware of what's happening and can pivot to another problem domain to solve its problem — which I have never seen in security.

What I do see in security, however, is everything under machine learning. Machine learning is an algorithm that gets better at one specific task: it was developed to perform that task, and it gets better as it sees more data from the environment — it customizes itself to get better at prediction. Then there is a newer kind of machine learning, one specific family of algorithms inside machine learning: neural networks, and the deep neural networks of deep learning. That is what all the talk is about — what Google does, what all the hyperscale clouds are doing: face recognition, automatic classification of visuals and objects in photographs. This is the newest trend, and typically when people talk about AI today they are referring to deep learning. But deep learning is just one category of algorithms within machine learning, and machine learning sits inside that whole bigger domain, which is AI.

For some of you this will be an introduction, for some a review, but I want to give you an idea of what deep learning is and how it works. How does a neural network work? It's modeled after the brain — not simulating the brain; it's far from how our brain works, the brain is much more complex — but it's modeled after the idea of the brain, and it turns out that by building those networks we can do some really
interesting things. Now, the technology and the principles we see here are not new: this was written about in the 1940s, and in the 1960s we already had the first neural networks running on computers. Of course they lacked computing power and they lacked memory, but the most important thing they lacked to make really effective use of deep learning is data — the one thing we need for deep learning to work well is lots and lots of data. In the past few years we got the hyperscale clouds — Facebook, Google, Alibaba — and what they all have in common is that they sit on a very large amount of data. That is what made deep learning such a breakthrough technology in recent years, for solving certain problems that were previously too complex for us to solve.

A neural network consists of many perceptrons. A perceptron has an input vector — just some lines coming in. Every input value is first weighted with a weight vector; then the results are summed together and run through a function, typically a non-linear function, and that provides the output. That is just one building block inside such a neural network; by combining many of those perceptrons together, in multiple layers, we can create a network that can be trained on data and that can generalize from information. The whole idea is to train the network, and training means optimizing the global error: we show it an image of a cat, and it says, no, I think it's a dog, and we say, no, it's not a dog, it's a cat — so we change the weights on the inputs to make sure it decides that the image is a cat. You show it another cat; it again comes back with "dog," and again you say no.
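The perceptron just described — weighted inputs, a sum, a non-linearity — and that correct-the-weights step can be sketched in plain Python. The sigmoid, the learning rate, and the toy numbers below are my own choices, not from the talk:

```python
import math

def perceptron(inputs, weights, bias):
    # Weighted sum of the inputs, pushed through a non-linear function
    # (a sigmoid here), producing one output between 0 and 1.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def correct(inputs, weights, bias, target, lr=0.5):
    # The "no, it's a cat" step: nudge each weight in the direction
    # that moves the output toward the desired label.
    error = target - perceptron(inputs, weights, bias)
    new_weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return new_weights, bias + lr * error

weights, bias = [0.0, 0.0], 0.0
sample, label = [1.0, 0.5], 1.0      # one invented "cat" sample
before = perceptron(sample, weights, bias)
weights, bias = correct(sample, weights, bias, label)
after = perceptron(sample, weights, bias)
```

After the correction, `after` is closer to the target than `before` was; repeat this over millions of samples and the weights converge — that is the whole training loop.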
You change the weights, and like this you keep changing the weights continuously. A deep learning network is just many, many layers of those perceptrons, and it is configured through its weights. What you do is present data on the input for which you know the classification — images of cats and dogs, where for each image you know whether it's a cat or a dog. Whenever it comes out wrong, you change the weights until it comes out correct, and you do that with millions of different pictures to train the model. Once it is trained, you test the model: typically you split your images into two sets, one for the actual training and the other for evaluation, to see how well the network performs after training. Then, once you have a performing network, you put it into real production — and real production means you show it a picture it has never seen, and it tells you "that's a cat" or "that's a dog," and most of the time it will be correct. I say most of the time, because sometimes it will come out with really strange decisions. Those decisions are a problem with deep learning: most of the time we cannot explain them, because there are too many interactions in there. Even though it's mathematics — it's just matrix calculation, so it's very deterministic — it is at the same time too complex for our brains to even imagine what is happening inside. I have an example later on to illustrate that.

So that was deep learning, but before I come back to it I want to talk about traditional machine learning, because that is something we have been using for quite some time already, and it is used today to protect our networks. Traditional machine learning solves problems of low to medium complexity — problems that we humans can model in our brains. Think about TCP and the RFCs for TCP: the RFC is like
a contract — a contract that makes communication between two machines possible. TCP is a state table: you can implement it as a state machine and then follow the complete state of how a TCP connection should work — SYN, SYN-ACK, ACK, and at the end a RST or a FIN. We know exactly how it should work, so to track anomalies in TCP we can build a state machine in a program. The problem with that state machine is that if you're working stateful, at some point you need to make a decision: am I under a SYN attack or not? In a SYN flood you will see lots of SYNs coming in and never the SYN-ACK or ACK that should follow, so you need to look at the distribution of those flags. But you don't know exactly how many SYNs constitute an attack: in a transactional environment, in finance, you will have lots of short sessions and therefore lots of SYNs per second, whereas in a streaming service like Netflix you will only see a few SYNs per minute, because those sessions are typically very long. So you need to adapt to the environment, and there you use data and probability: based on what you see in that environment, you adapt the thresholds used for detection. You are still doing rate-invariant analysis on the distribution of the flags, but the actual thresholds change based on data learned from that specific network, that environment. That's an example of machine learning, but the algorithm is very deterministic: a human wrote the program; the human knows very well, when something goes wrong, why it went wrong; and we also know there will be a low rate of false positives, because we know exactly how it behaves — we can even put a number on how many false positives to expect in a given environment.
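A sketch of that idea: learn the normal SYN share of the flag distribution from the environment itself, then alert when the current distribution deviates too far. The 3× factor and all the counters below are assumptions for illustration, not a production detector:

```python
from collections import Counter

def learn_baseline(flag_samples):
    # flag_samples: per-interval Counters of observed TCP flags, collected
    # from the environment itself (a finance network and Netflix would
    # yield very different baselines, as described above).
    ratios = [c["SYN"] / max(1, sum(c.values())) for c in flag_samples]
    return sum(ratios) / len(ratios)

def looks_like_syn_flood(current, baseline, factor=3.0):
    # Rate-invariant check: compare the SYN *share* of the distribution,
    # not the raw SYN rate, so the same code fits both environments.
    ratio = current["SYN"] / max(1, sum(current.values()))
    return ratio > factor * baseline

normal = [Counter(SYN=10, ACK=80, FIN=10) for _ in range(5)]
baseline = learn_baseline(normal)          # about 0.10 for this data
flood = Counter(SYN=90, ACK=10)            # SYNs that are never answered
quiet = Counter(SYN=12, ACK=78, FIN=10)    # ordinary traffic mix
```

Here `looks_like_syn_flood(flood, baseline)` fires while the quiet interval does not — the threshold came from the data, not from a hard-coded number.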
But that typically only works when the problem is not too complex. I was talking about TCP — we can wrap our minds around a TCP session and what happens with TCP sessions in the network. But try to wrap your mind around all the data flows in your network: somebody tells you, OK, write a program that analyzes all the traffic through the perimeter, everything you see coming in, and detects anomalies. You would have to write very specific code for every application, and applications keep getting added — by the time you've written a program like that and done all the investigation, there are already new applications and updates. It's impossible to write code to do that. For this you need something like deep learning, because deep learning is programming by example. The way you use deep learning, versus traditional machine learning, is that here you have a generic model that can be applied to different kinds of problems, and that model is tuned using data. You need a large amount of data, and you need good data — garbage in, garbage out — so only good, labeled data should go in. But if you
have a large amount of good data and you put it into that model, it will create an approximation of what your network is and how it behaves, and then, for new information coming in after training, it will be able to say: this is an anomaly, or this is normal data. The way to think about deep learning and neural networks is curve fitting — it's nothing more than curve fitting. Take a one-dimensional problem: you have a couple of crosses, and each cross is one data point, one sample that you give to the network. You can draw a curve through those points — and how do you draw the best curve through them? Mathematically, it comes down to minimizing the error between each of the crosses and the curve; it's just a problem of minimizing the global error. The same happens in deep learning, only not in one dimension but in thousands, maybe hundreds of thousands of dimensions — something difficult for us humans to picture, but it comes down to the same thing: fitting a hyper-dimensional surface to all the data samples that you feed to the network. So in this case, for me, it's more like programming with data instead of programming with code.
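The one-dimensional case can be written out directly: fit a line to a few sample points by gradient descent on the summed squared error — the "global error" just mentioned. The points, learning rate, and iteration count are arbitrary choices of mine:

```python
# Invented sample points lying roughly on the line y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]

a, b = 0.0, 0.0      # the two "weights" of this one-dimensional model
lr = 0.02
for _ in range(5000):
    # Gradients of the global squared error with respect to a and b.
    grad_a = sum(2 * (a * x + b - y) * x for x, y in points)
    grad_b = sum(2 * (a * x + b - y) for x, y in points)
    a -= lr * grad_a
    b -= lr * grad_b
```

The parameters settle near a ≈ 2.0 and b ≈ 1.0. Training a deep network is the same minimization, only over millions of weights instead of two.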
Now, to put the two side by side: on the left, traditional machine learning; on the right, deep learning. On the left the code is dominant — the program you built, the model you created — and the data you use is only there to provide a baseline. It gets better at prediction as it sees more data, but the actual model does not change: you programmed how it should behave. On the other side there is a generic model, and it gets programmed by data — programming by example. You feed in data, and the reason you do that is because the problem you're trying to solve is too complex.

To give you an example: in the early days of speech recognition there was a company in Belgium — Lernout & Hauspie, I don't know if you know the company — and they approached speech recognition as a science: they analyzed what people said and produced text from that analysis. That worked well if you were American, or at least didn't have a strong accent when talking to the program, and it gave an accuracy of maybe 80 to 90 percent. With the new deep learning technology, what we do instead is take a deep learning neural network, feed it hundreds of thousands of hours of audio that has already been transcribed to text, and then feed it new audio — and out comes complete speech recognition, the text written out. Whether you're Indian or French or American, it works just as well, as long as the data you fed in also came from all those different people. That typically works with an accuracy of 90 to 95, even 98 percent — a big jump in accuracy between traditional machine learning and deep learning.

So you might ask: why don't we use deep learning for every case, then, if it gives better accuracy and much more flexibility for solving difficult problems? Why still create models and use traditional machine learning? Well, there are a couple of challenges with deep learning that matter to us. First of all, you need access to training data. If you have images — if you're Google, or Facebook, where people help you by tagging the right faces in pictures — it's easy to do face recognition and put a name label under a face, because people are continuously feeding you good, tagged data. So that is one big problem: we need access to data that is tagged
good or bad. Now, how do you get that data in your network? You cannot just collect all the data passing through your perimeter and call it good data — you never know whether you're already infected. And what comes from the outside, certainly not: you don't know what is attack data and what is normal data. So training cannot happen live in your network. The only way you could do some training is with honeypots: honeypots are the one place where, if a connection comes in, you know it's bad, so you can label it bad — but the good is much more difficult to learn. So getting access to a massive amount of data specific to your environment, and labeling it, is very hard.

Then there's reproducibility. If you take a deep learning network and put in data to train it, at the end of the training you come out with a configuration of the weights: all the weights you changed on the links of that network are the actual configuration of your model. Those weights can change completely if you just swap the position of two lines in your data; add one new sample, or a few new lines, and your model can end up looking completely different. So it's very hard to transfer models via the data; the only way to transfer a model is to transfer the architecture of the model plus the weights themselves — and we will see there are actual applications of this, used by the anti-malware vendors.

Then transparency. I already told you that sometimes a neural network gives you a strange result and you don't know why — and when you ask the question, why did it do that, nobody can explain what happened inside that network and why it came out wrong. To give you an example: Salesforce used deep learning for a while to figure out what is a good
prospect. And when a good prospect came out of the CRM, the sales people would say: well, I know my environment, I know my customers, and for me this is not a good prospect — tell me why this is a good prospect. Using the deep learning network itself and whatever information was in it, they were unable to explain why it came out as a good prospect. So what they started to do was use more traditional machine learning and build a decision tree — and that decision tree was built by taking the deep learning network, setting some inputs to one and some to zero, changing the values of the different input features, and seeing how a decision tree could be modeled from the responses.
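The probing trick can be sketched like this: treat the trained model as a black box, flip one input feature at a time, and record which features flip the outcome. The `black_box` function and the feature names here are stand-ins I invented — not Salesforce's model:

```python
def black_box(features):
    # Hypothetical opaque model; in reality this would be the deep network.
    return 1 if (features["visits"] > 5 and features["industry_match"]) else 0

base = {"visits": 8, "industry_match": True, "newsletter": False}

influence = {}
for name in base:
    probe = dict(base)
    # Zero out numeric features, toggle boolean ones - one at a time.
    probe[name] = (not base[name]) if isinstance(base[name], bool) else 0
    influence[name] = black_box(probe) != black_box(base)
```

Here `influence` shows that `visits` and `industry_match` flip the prediction while `newsletter` does not; with enough such probes you can fit a decision tree that mimics the network's behaviour.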
So they actually built a second model to explain what the first model predicted. Then there is learning in changing environments: because of that training step, you set up a model, you train it, and once it's trained you put it in production and do real prediction on live traffic — but if your data changes, you need to retrain your model. It's not the case that a model can just keep learning continuously. And then there is learning in an adversarial context, which means we are continuously under attack: we cannot just put a deep learning model out there, let it learn from whatever happens, and assume we will have good anomaly detection in the future.

To give you some numbers on what deep neural networks need in terms of data: state-of-the-art speech recognition takes on the order of a hundred thousand hours of audio — played sequentially, that's about ten years of sound — to train a deep learning network, and face recognition needs about two hundred million images. So a couple of samples is not enough. Another problem, one that comes up quite often and becomes very important in dynamic environments, is overfitting. Just like with curve fitting: this is a
perfect fit; this is underfitting — the complexity of your model is not enough to fit all the points well; and this is a model that was too complex and fits too well — it is actually approximating all the noise, which means whatever data I present here will be flagged as an anomaly. Not good. It depends on the diversity in your data: the more diversity, the more complex a model you need. Now, once you find a good model, you train it, and when you train, the training error always goes down, because you're training on that data. The evaluation error, however — say 70 or 80 percent of your data is used for training and 30 percent for evaluation — first goes down and then starts to go up, and when it starts to go up, that means you're overfitting: the error on new data is bigger than the error on your training set. The impact for our network is this: when we add a new application to our network, we add more diversity to our data — a new protocol, for example — which means the model that was tuned and trained to perfection for anomaly detection is suddenly not complex enough anymore; it will sit over here, because now our data has more diversity.
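The stopping rule implied here can be sketched with two fabricated error curves — training error falling throughout, evaluation error turning upward at the point where overfitting begins (all numbers invented):

```python
# Fabricated per-epoch error curves, purely for illustration.
train_err = [0.90, 0.60, 0.40, 0.30, 0.25, 0.22, 0.20, 0.19]
eval_err  = [0.95, 0.70, 0.50, 0.42, 0.40, 0.43, 0.48, 0.55]

best_epoch = 0
for epoch in range(1, len(eval_err)):
    if eval_err[epoch] < eval_err[best_epoch]:
        best_epoch = epoch   # still generalising better, keep going
    else:
        # Evaluation error rising while training error still falls:
        # the model has started fitting noise, so stop here.
        break
```

With these curves the loop stops after epoch 4: training error is still dropping, but error on held-out data has begun to climb — the signature of overfitting described above.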
So we need to make the model more complex — you always have to maintain that network; it's not something you can deploy and leave to run autonomously. Earlier you saw the example of Tay; here is another example of an attack, an adversarial attack on an image. On this side we have a red sports car; a model trained on pictures of cars recognizes it as a sports car. You add some noise — a perturbation — to the image, and this is the result: for us humans, and especially on the wall with this projector, you don't see the difference, it's still a sports car. The model, however, thinks it's a toaster. Same for Sylvester Stallone: add some adversarial noise and he becomes Keanu Reeves. For pictures that may not matter much, but if you drove here in a Tesla it matters a bit more, because this is an attack on an autonomous vehicle system. One of the things those systems need to do is recognize traffic signs. This is clearly a stop sign, but by adding a small perturbation — these white and black stickers in very specific places — the system decides that this stop sign is a 45-mile-per-hour sign. So when your car comes up to the intersection, instead of slowing down, it actually speeds up and drives through the intersection.
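On a toy linear model the attack is easy to see. The weights and the "image" below are invented; the point is that a small signed step against each weight flips the decision — and in a real image that same total change is spread across thousands of pixels, which is why we cannot see it:

```python
weights = [0.8, -0.4, 0.3, 0.5]    # toy trained model
image   = [0.6, 0.1, 0.7, 0.9]     # classified as "sports car" (score > 0)

def score(v):
    return sum(w * x for w, x in zip(weights, v))

eps = 0.6
# Step each input against the sign of its weight: for a linear model this
# is exactly the direction that lowers the score fastest (the idea behind
# gradient-sign attacks on real networks).
adversarial = [x - eps * (1 if w > 0 else -1) for w, x in zip(weights, image)]

clean_score = score(image)         # positive: "sports car"
adv_score = score(adversarial)     # negative: "toaster"
```

No single input moved by more than `eps`, yet the classification flipped — the same mechanism that turns a stop sign into a speed-limit sign.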
So how do hackers use machine learning — or how will they? Some are already using it, although we have not yet seen a lot of attacks using it — malicious actors, I should say, rather than hackers. First of all, they can use it to create increasingly evasive malware. They use two networks — what we call generative adversarial networks — two neural networks pitched against each other: one generates the malware, the other does the detection, and they loop until a version of the malware is found that the second network cannot detect. Another example is breaking CAPTCHAs: some researchers found that, using deep learning, they could solve CAPTCHAs with 98 percent accuracy, which is something like 48 percent better than I can do.
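The generate-versus-detect malware loop just described can be caricatured in a few lines. This is not a real GAN — a real one trains two neural networks against each other — but the control flow is the same: keep mutating until the detector stops firing. Signature and payload are made up:

```python
import random

random.seed(7)  # fixed seed so the example is deterministic

SIGNATURE = "EVIL"

def detector(payload):
    # Stand-in for the detection network: a fixed signature match.
    return SIGNATURE in payload

def mutate(payload):
    # Stand-in for the generator: insert a junk character at random.
    i = random.randrange(len(payload) + 1)
    return payload[:i] + random.choice("xyz_") + payload[i:]

payload = "EVIL-DOWNLOADER"
steps = 0
while detector(payload):
    payload = mutate(payload)
    steps += 1
```

The loop ends with a payload the detector no longer matches — the adversarial pair has "found" an evasive variant, which is exactly what the two-network setup automates at scale.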
When I see those pictures, I fail most of the time. So if you're using CAPTCHA to make the distinction between a bot and a human, the bot is actually much better at it than the human — it's the inverse now. I'm not saying all of them are using this. Spear phishing: you do your research, collect information, and based on it you craft an email to attack a person. That collection of information from the public web can be done through natural language processing, through deep learning, through all those networks. Here's an example of a human against a machine on Twitter, both trying to get people to click on a certain link: the human got fifty victims in two hours; the machine got more than five times as many — 275 victims. And here is an example of deep learning used in a real attack — by white hats, which is why they gave a presentation about it: they created a deep learning network that automatically looks at a website, tries to find vulnerabilities, and hacks into it, completely automatically. They didn't touch it; it found an SQL injection on its own and broke into the website. And they were not long-experienced deep learning engineers: using TensorFlow and Python, in five lines of code you can write your own deep learning network — your own state-of-the-art face recognition, or your own attack system. So I'm
going to go a little faster through this summary. Looking ahead at machine learning in defense: for now, human-assisted anomaly detection will be the number one application. You use machine learning to reduce the flood to a smaller number of events, and then a human expert treats those events, because you cannot always count on what comes out of the deep learning system. Deep learning can, however, make more complex associations in data than any human can hold in their head — these systems are very good at processing data, and we are not — so they can find far more complex associations. Some technologies already use it: pattern matching, for example, like face recognition, or pattern matching in antivirus. They use deep neural networks, train them in the cloud on a massive amount of malware, then copy the weight vectors down to the same model running on your PC. The client doesn't have to go through the training step, so it needs much less compute, much less memory, much less storage — you just copy the model and run it live. That's how that AI-based anti-malware works. The same goes for malware research and analysis — we saw a presentation on Android earlier — where you can use it to go through malware, classify samples, and cluster related families together.
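That deploy pattern — train centrally, then ship only the weights to an identical model on the endpoint — can be sketched like this; the JSON format and the numbers are my own invention, not any vendor's actual mechanism:

```python
import json

# "Cloud" side: the configuration that the (expensive) training produced.
trained = {"weights": [0.42, -1.30, 0.07], "bias": 0.5}
blob = json.dumps(trained)            # what actually gets shipped

# "Endpoint" side: the exact same architecture, no training step -
# just load the weights and run inference.
endpoint_model = json.loads(blob)

def predict(inputs, model):
    s = sum(w * x for w, x in zip(model["weights"], inputs)) + model["bias"]
    return s > 0
```

The endpoint only pays the cost of the forward pass, which is why the client can run with far less compute, memory, and storage than the training side.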
As for the threat landscape: in the next few years we will see increasingly automated attacks — automation where the threat actors still create their own attacks but automate them and hit more targets — and they will also get more efficient at evasion. If you think we already face a lot of threats today, I can assure you that in the coming years, as attacks get more automated, we will see more and more threats that move faster and spread faster; whenever a new vulnerability is published, attackers will be quicker to abuse it and to use it against more targets. Further out, I believe threat actors will simply become the engineers of the machine: they will no longer perform the actual attacks themselves; they will design and build the AI machine and then just maintain it. Thank you.

Thank you — I think that was a great summary of the state of the art of machine learning, which is not an obvious topic, and I guess you can go through those interesting slides afterwards. So, do any of you have a question about this?

Thank you, that was excellent — I learned more than I expected. Thinking about that image recognition — not just the errors that can be introduced, but the basic ability to do image recognition — it's clear that we're using a lot more watts of power to do it with silicon than with the human
brain and so it feels a little bit like brute force now in the old days before we had this much compute power available a lot of researchers were looking at less brute force e ways to do this as natural language processing one of the examples that you brought up is there any hope that anybody is gonna work on that again now that we have all these computes and we can melt ice caps or is this our future well I 100% agree this is like the brute force way of attacking problems right you take a lots of amount of data you up my computing to it you create a model and then you go and use it now actually
what we do see in those models (and there are nice visualizations online that show this) is that in images they find features, and that is exactly what image-recognition researchers did before. What they did before was hand-craft features in images and then basically try to filter on those features. So for me, in the future it can be an interaction between both, because back then they had a very hard time finding those features; it could take a couple of months to find the right features to detect an object. With deep learning, by just putting brute-force amounts of compute on it, you can discover those features automatically. You don't see
them in the output; you see them in the network itself. Some of those images come back in the neurons themselves, and you will see that a neuron responds to a specific angle or a specific feature. So you could use a combination of both approaches. And even with all that deep learning, there is still a lot of machine learning and cleverness that comes in front of it. It's not like you just have the model; I oversimplified by saying you take the model, throw data at it, and out comes something magic. You have to process the data and find the right features, because some features make sense and other features make
no sense. You try to reduce the dimensionality to make the problem simpler, because, as you said, it takes a lot of compute; we can throw a lot of compute at it, but sometimes even that is too slow if you have millions of features. So a lot of research goes into reducing that, and that is where the human ability to reason comes in: bringing in the right features and doing the right pre-processing of the data before it goes into the deep learning network. That's a part I did not talk about, but it still contains a lot of what you are referring to.

Moderator: Maybe we have time for one last question.
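[Editor's note] The dimensionality-reduction step described in the answer above can be sketched with PCA. This is an illustration in NumPy with hypothetical data and shapes, not the speaker's actual pipeline:

```python
# Sketch: shrink the feature space with PCA before feeding data to a model.
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))          # 1000 samples, 50 raw features
X_small = pca_reduce(X, n_components=10)  # keep the 10 strongest directions
print(X_small.shape)                      # (1000, 10)
```

In a real pipeline, X would be the raw feature matrix (for example, per-flow network statistics or per-sample malware features), and the reduced matrix is what goes into the deep learning network.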
Audience: Thank you. Do you know of any commercially available or open-source network intrusion detection system that is using deep learning?

Speaker: I would say they are all looking into it. Machine learning has been used a lot already; as for deep learning, I had some slides on its actual applications but skipped them. Microsoft, Sophos, and, as far as I know, other anti-malware vendors are already using deep learning: they train models in the cloud on terabytes of malware data, copy off the weights to the exact same model running on the client, and then use that to do malware detection. We use it, for example, in DDoS detection: with
traditional machine learning you can build a model that easily detects unknown DDoS attacks such as SYN floods, but some attacks are so complex that you cannot detect them with traditional machine learning, so you need deep learning. Now, for deep learning you need lots of data, and one customer's data is not good enough, because that customer might already be infected through a backdoor. So what you need is crowdsourcing: take data from many customers, bring it into the cloud (anonymized, preferably), process it there, and build a model. Based on that model, you can take new data from a customer, run it through the model, and
then detect whether it is an attack or not, and feed that information back through threat intelligence. So a lot of the threat intelligence feeds you see are actually already based on deep learning; there is already a lot of deep learning out there, but most of the processing happens in the cloud, and the results come back either as threat intelligence or as weight vectors copied to a model that runs in real time on site. Does that answer the question?

Moderator: Thank you very much, Pascal. This is a really broad topic, so we could discuss it for hours, but unfortunately we now have to switch to the next speaker, the last of the day. Thank you again, it was good.
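[Editor's note] The train-in-the-cloud, copy-the-weights-to-the-client workflow mentioned in both answers can be sketched as follows. This is a toy illustration with a tiny NumPy logistic classifier; all names are invented and this is not any vendor's real API:

```python
# Sketch: train a model server-side, then ship only its parameters to a
# client running the identical architecture, which does inference only.
import numpy as np

class TinyClassifier:
    def __init__(self, n_features):
        self.w = np.zeros(n_features)
        self.b = 0.0

    def predict_proba(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))

    def get_weights(self):
        # "Cloud" side: export only learned parameters, not the data.
        return {"w": self.w.copy(), "b": self.b}

    def set_weights(self, weights):
        # "Client" side: load parameters into an identical architecture.
        self.w = weights["w"].copy()
        self.b = weights["b"]

# Cloud: train on a (toy) corpus with a few crude gradient steps.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))
y = rng.integers(0, 2, size=500)
cloud = TinyClassifier(n_features=8)
for _ in range(100):
    grad = X.T @ (cloud.predict_proba(X) - y) / len(y)
    cloud.w -= 0.1 * grad

# Client: same architecture, no training, no training data; just the weights.
client = TinyClassifier(n_features=8)
client.set_weights(cloud.get_weights())
sample = rng.normal(size=(1, 8))
assert np.allclose(cloud.predict_proba(sample), client.predict_proba(sample))
```

The point of the pattern is that the expensive part (training on terabytes of malware) stays in the cloud, while the client only needs enough resources to run a forward pass over the copied weights.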
[Applause]