
GT - Reducing Inactionable Alerts via Policy Layer - John Seymour

BSides Las Vegas · 15:56 · Published 2019-10
About this talk
Ground Truth · BSidesLV 2019 · Tuscany Hotel · Aug 06, 2019
Transcript

Hi everyone, welcome to my talk, "Reducing Inactionable Alerts via a Policy Layer." My name is John Seymour, aka delta zero, and I'm a lead data scientist at Salesforce, where I work on the detection and response team performing machine learning on security logs: to alert on new attacks, to improve our existing alerts and rules, and to find and make new contextual data for use in investigations. My goal here today is to inspire you to be creative in where you apply data science and machine learning techniques; a major impact can be had with very small amounts of effort. If you come out of here with a nagging feeling that machine learning would help you somewhere it's not normally applied, then I'd call this talk a success.

So, like a lot of good presentations, let's start with definitions. Oftentimes humans analyzing model-generated alerts will throw an alert away immediately, and as we've deployed our models we've found there are two main reasons this happens. First, when there are issues with the data pipeline. These are things like: when necessary logs are missing, when parsers fail, when joins between host and network artifacts behave unexpectedly, when third-party information is bad or corrupted, when there are deployment inconsistencies throughout the fleet, when added contextual information like hostnames is wrong, when added contextual information is stale (it used to be right, and now it's wrong), when added contextual information is right but in an unexpected way; this list goes on and on and on. Contrast this to obvious false positives, where we mean the model is not able to capture the complexity of the instance, where the activity can easily be determined to be a low-priority or nonexistent threat to the business, such as a model alerting on a connection to a company-internal resource. Generally we've seen these handled after the fact: a whitelist is added which says, simply, even if the rule or the model says this is bad, don't alert on it. You can think of whitelists as a simplistic example of a policy layer which addresses the two causes of inactionable alerts.

In this talk we'll demonstrate how even simple modeling improves upon whitelists, and further, we'll argue that modeling whitelists separately from modeling suspicious events is actually a natural approach to the problem.

So here are some reasonable examples for whitelisting we've seen in the past. For a large number of the alerts generated, we don't actually care if a connection is completely internal to the network, so for these it might actually make sense to whitelist anything where both the source and the destination are internal. Or we might only care about a connection attempt that's successful, so we might think filtering alerts where the connection was ultimately unsuccessful (maybe the firewall blocked the connection) is a good idea. Another widespread use for whitelists is filtering when the domain is obviously benign: take the top popular web domains, shove those in as a whitelist, and say don't alert on any of these things. These are obviously common and widespread ideas.
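For contrast, here's a minimal sketch of that rigid whitelist approach in Python; the event fields, heuristics, and domain list are invented for illustration, not from the talk:

```python
# A rigid whitelist discards the alert outright if any rule matches,
# no matter how suspicious the event is for other reasons.
POPULAR_DOMAINS = {"example.com", "example.org"}  # e.g. a top-domains list

def whitelisted(event: dict) -> bool:
    if event["src_internal"] and event["dst_internal"]:
        return True  # completely internal connection
    if not event["connection_succeeded"]:
        return True  # e.g. the firewall blocked it
    if event["domain"] in POPULAR_DOMAINS:
        return True  # "obviously benign" domain
    return False

def should_alert(event: dict, model_score: float, threshold: float = 0.8) -> bool:
    # The whitelist overrides the model entirely: a binary veto.
    return model_score >= threshold and not whitelisted(event)
```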

But whitelists create some unintended side effects down the road. They're extremely rigid: if a whitelisting rule matches an event, the event is just completely discarded, even if the activity is extremely suspicious for other reasons. They're also challenging to maintain; if you've worked with whitelists, you've probably noticed issues with updating them, since each problem set tends to have different whitelisting requirements. For example, take an extremely loud attacker moving laterally through the network: you can't whitelist internal connections for that. Or take failed connections: they could be NXDOMAINs today but future command-and-control instances where the malware just hasn't activated yet, and a large number of them might also indicate an infected host. And benign domains: a major way to exfiltrate information is through standard services that are likely whitelisted, not even considering the fact that the lists are static while domain popularity changes, and that not all popular domains are always benign. So I bet most of you in the room are thinking, OK, we'll just include those as features in our models.

That's definitely a reasonable position to take, but here are some reasons why you might actually want a separate policy layer, at least at first. Let's start easy: whitelists are already accepted by human analysts, and stating "this alert has whitelistable characteristics, but we think it's suspicious even given those" is still generally well received and useful information to the analysts. However, model-centric reasons exist for such a separation too. Google's Rules of Machine Learning recommends splitting spam filtering from quality ranking, putting spam filtering in a policy layer so that quality ranking can focus on ranking content that's actually posted in good faith. The main idea behind that rule is that spammers, the adversaries in that context, will attempt to emulate high-quality posts, so features that indicate high-quality posts today might actually be indicative of spam tomorrow. That concept of adversarial drift also applies here, and the independence of the two types of models really helps with tuning training frequency: we all know that adversaries adapt, and they're likely to adapt faster than whatever makes an alert inactionable changes.

Separation also gives us generalizability, which allows for centralization, reducing code duplication. Most of these heuristics will be relevant to a large number of the rules and models that you use, so even if most models exclude some of them, you can still have one whitelist model which applies to a large number of rules or models. Another reason: good data is actually the limiting factor for model quality in the intrusion detection space a lot of the time, and separating reduces the problem space, so that models don't have to learn both what's likely to be thrown away and what's actually suspicious at the same time. That means you can label data for the different tasks and be a lot more effective with your labeling. And finally, some of our team's models have multiple consumers with contradicting preferences; these might actually be unsatisfiable if you try to push them all into the one detection model.

So how do we actually enforce the heuristics commonly used to filter false positives, in a better way? Here's a simple machine-learning-based approach that we've deployed at scale for doing so. We combine these heuristics using a function which penalizes alerts where any such issues are found, without completely filtering them out. Concretely, if we let x be a list of binary heuristics which are commonly used to filter false positives, then let your policy score be some number λ between 0 and 1 raised to the count of how many of these heuristics are true for a given event: policy score = λ^(number of true heuristics). For example, if λ is 0.5 and you have two issues surrounding the data, then your policy score would be 0.25; if no issues are detected, your policy score would be 1. And we just reweight the alert by this value. So this is a really, really simple approach, and it's also simple to integrate with the models: integrating this score with rules and models is very straightforward, you just multiply the two scores together.
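A minimal sketch of this scoring in Python; the function and variable names are mine, but the formula (λ raised to the count of fired heuristics) and the final multiplication are straight from the talk:

```python
from typing import List

LAMBDA = 0.5  # penalty per triggered heuristic, between 0 and 1

def policy_score(heuristics: List[bool], lam: float = LAMBDA) -> float:
    """lam raised to the count of whitelist-style heuristics that fired."""
    return lam ** sum(heuristics)

def final_score(model_score: float, heuristics: List[bool]) -> float:
    # Integrating with a rule or model is just a multiplication.
    return model_score * policy_score(heuristics)

# Two data-quality issues present -> policy score 0.5 ** 2 = 0.25;
# no issues -> 1.0, so the model score passes through unchanged.
assert policy_score([True, True, False]) == 0.25
assert policy_score([False, False, False]) == 1.0
```

Because the combined score is just a product, a model whose legitimate signal overlaps a heuristic can keep the same policy model and simply adjust its own alerting threshold on the combined score.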

It also has a lot of other benefits, though. The model can be reused for many use cases: even in instances where the heuristics conflict with what the model is actually detecting, such as that exfil detection model, where exfiltration goes over known-good domains, you can just re-tune the final alerting threshold for the combined score. Another benefit is the model is completely unsupervised, requiring zero effort to train, and the model requires very little data science background to actually understand. But perhaps most importantly, it allows you to aggregate all of your different heuristics in one place, which makes it much, much easier to maintain.

But we know there's no free lunch when it comes to these sorts of things, so what do we actually lose when moving to this sort of model for whitelisting? To start, the main change is that we're now allowing through a small number of events that we wouldn't have previously, so there is, by definition, going to be a nonzero number of false positives. We've mitigated this quite a bit through thresholding, but obviously some of the new alerts that you get are going to be true positives and some are going to be false positives.

A piece of low-hanging fruit you might notice is that the model is very, very simplistic right now: you could, for example, train it on historical data like previous case adjudications, and that's definitely something we're looking into. But probably the primary drawback here is how the model handles repeated alerts. With whitelisting you can completely prevent an alert from being generated, but you can't do that here. You can reduce repeats by adding features based on prior case adjudications, but that makes the model complex. You could also eliminate them by adding additional, hopefully temporary, whitelists, but that brings us back to the initial whitelisting problem and reduces a lot of the impact of this approach.

So here are the main things that we're trying to improve upon. The first, again the most obvious, is supervised learning, which would allow us to more finely tune that policy layer. After that, we plan to try out different stacking methods: using the output of the policy model as a feature when training the suspiciousness model, and trying different configurations for that.
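As a rough sketch of what that stacking could look like, assuming scikit-learn, synthetic stand-in data, and labels from past adjudications (none of this is from the talk, which describes stacking only as future work):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(1000, 8))                      # stand-in base features
heuristic_matrix = rng.integers(0, 2, (1000, 3)).astype(bool)  # binary heuristics
y = rng.integers(0, 2, 1000)                            # stand-in adjudications

# Unsupervised policy score, then stacked as one extra feature
# for the suspiciousness model.
policy = 0.5 ** heuristic_matrix.sum(axis=1)
X_stacked = np.column_stack([X_raw, policy])

clf = LogisticRegression().fit(X_stacked, y)
```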

And then finally, we're trying to more formally incorporate the consumer preferences we stated earlier, such as different outputs for threat intel versus our SOC, or for the different responders for risks, vulnerabilities, and policy violations. So that's actually all I had. Does anybody have any questions?

Hi, thank you, a quick question for you. One of the big problems that I've experienced in the past with trying to work with ML systems in general is the problem of introspectability. Coming from an ops background, and having been the person getting woken up at 3:00 in the morning, nothing is worse than having something sometimes tell you one thing and sometimes tell you another, and you can't figure out what the hell's going on. How have you dealt with that kind of problem while working with your customers, of not necessarily being able to introspect the system in an easy-to-understand way?

Yeah, that's definitely a great question. For the actual whitelisting technique here, it's actually very easy to introspect, because you have the values of the different variables: you know how many are set to one, etc., and you can just return those to the ops people. Adding in something like a supervised method would definitely obfuscate that and would harm the introspectability. For this, we have issues, for example, with our hostname mapping, the IP-to-hostname mapping, and for those we actually encode that into our policy layer by saying, OK, this is a known-to-be-faulty IP-to-hostname mapper, and we can surface those to whoever's investigating the alert.
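A small sketch of that kind of surfacing; the heuristic names and alert structure here are invented for illustration:

```python
def explain_policy(event: dict, heuristics: dict) -> dict:
    """Attach the fired heuristics to the alert so an analyst can see
    exactly why the score was penalized (e.g. a known-faulty
    IP-to-hostname mapper)."""
    fired = [name for name, check in heuristics.items() if check(event)]
    return {
        "policy_score": 0.5 ** len(fired),
        "policy_reasons": fired,  # surfaced to whoever investigates
    }

HEURISTICS = {
    "faulty_ip_hostname_mapping": lambda e: e.get("hostname") is None,
    "internal_to_internal": lambda e: e.get("src_internal") and e.get("dst_internal"),
}

print(explain_policy({"hostname": None, "src_internal": True}, HEURISTICS))
# {'policy_score': 0.5, 'policy_reasons': ['faulty_ip_hostname_mapping']}
```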


Hi, so as someone who is completely unaccustomed to ML in general: what strategy would you say is best for determining a proper lambda, and how would you tune that appropriately?

OK, we'd probably do it the same way we actually did it, which is start at lambda equal to 0.5; that means for every extra issue detected on an alert, you basically halve the output score. If you wanted to tune it further, you'd probably need to collect data about how many alerts coming out of your system are good and how many are bad, and that's when you'd want to move to something like a supervised approach. As for other ways to tune that lambda: you could also have, say, a different weight for every single feature, whereas here we only have a single lambda, for efficiency.

Sort of related to that: did you, as maybe a middle ground, look at talking to the analysts, seeing what they thought about the different alert types, and assigning different weights based on their feedback?

Yes, so we did definitely talk to the analysts, and we did consider assigning a different weight to each heuristic being true. The only issue with assigning weights is that it sort of explodes the dimensionality of the problem: you have a different weight for every feature, whereas right now we only have one parameter we're tweaking. So we decided that if we did attempt that route, we would wait until we actually had a collection of data that we could use to label things as being well generated or not, and then just go to a supervised approach.
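For reference, the per-heuristic-weight variant could look like this (hypothetical; the deployed version uses the single shared lambda):

```python
import math

def weighted_policy_score(fired: dict, weights: dict) -> float:
    """Product of per-heuristic penalties: one weight in (0, 1] per
    heuristic, instead of a single shared lambda."""
    return math.prod(weights[name] for name, is_true in fired.items() if is_true)

weights = {"internal_conn": 0.5, "failed_conn": 0.7, "benign_domain": 0.3}
fired = {"internal_conn": True, "failed_conn": False, "benign_domain": True}
print(weighted_policy_score(fired, weights))  # 0.5 * 0.3 = 0.15
```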

So, just to make sure I'm understanding right: each element on a whitelist is going to be a feature in the policy layer?

Sort of. For example, if we have a rule that says the domain is in a whitelist, or a rule that says the hostname is missing from this particular log, or we're missing the parent log, things like that: those are all rules that we would use as separate features.

So do you have any thoughts on how to handle the model expanding as you come up with new heuristics?

Yes, that also comes back to the supervised training approach. If you start with something simple like this, really simple, then you can get a lot of those ideas out and know a lot of the different rules that you have. Even so, I guess you're probably going after the idea that even if you deploy a model today, then later down the road you're going to discover some new inactionable rule or something. Well, I was going to say that we're kind of assuming here that the drift in terms of what makes an alert inactionable changes much, much more slowly than the suspiciousness score: adversaries adapt fast, but your own system doesn't change that fast. Cool.

[Applause]