
Determining Normal: Baselining with Security Log and Event Data - Derek Thomas

BSides Las Vegas · 48:56 · 446 views · Published 2016-08
About this talk
Determining Normal: Baselining with Security Log and Event Data - Derek Thomas. Ground Truth track, BSidesLV 2016, Tuscany Hotel, Aug 03, 2016.
Transcript [en]

We have Derek Thomas speaking next. He's a senior information security consultant at eSentire, so give it up.

Thanks, guys. We've got a packed room coming out after lunch; I'm impressed. Hopefully we can keep you awake here. My name is Derek Thomas, and we'll be talking about security baselines: a fundamental application of statistics and analysis to logs, determining upper and lower thresholds on your log data to look for funny stuff. To start off, who am I? On Twitter I'm dtom, that's with a zero, so say hi anytime. I'm basically a family guy, and a security consultant with eSentire.

My domain knowledge has been in log management and SIEM. For almost my entire security career I've been doing SIEM implementations (pre-sales, post-sales, ongoing services, things like that) with seven different vendors. I've also done vulnerability management, and from a theoretical standpoint I try to stay up with different pen testing techniques. If you're going to be a good blue teamer you've got to understand red team, and if you're going to be a good red teamer you've got to understand blue team, so I think both of those are critical.

I'm a lover of logs. If you know me, I've probably bored you to death on numerous occasions talking about Windows security logs and network hardware logs. With logs I focus primarily on threat detection. I like logs because they tell you everything that's going on in the environment: you can see everything a user does, and that's really the definitive record of activity for that user. Once there's been an exploit, once there's been a compromise, that's what you're going to be going to. I also consider myself an armchair data analyst, so I'm trying out my skills here. I think that's going to be critical if you're in the log management and SIEM field: you're going to need to start applying advanced analysis to the log data, because we're getting to the point where there are ridiculous quantities of it. In my day-to-day life I've got over a petabyte of compressed logs to work with across all our clients, which is kind of impressive to me. Day to day I manage log management sensors, so I'm in the Linux command line and Python. Somehow I got Miyagi'd into learning SQL; I had never touched SQL, but I've become pretty decent at querying and extracting data. And I've been studying R on my own; I think it's a pretty good language for this type of analysis, and most of the things you need are built in.

I volunteer with ultimatewindowssecurity.com; if you've ever looked for Windows events, that's probably the first thing that comes up, and there's tons of great content there for Windows logging. I'm active in the Michigan infosec community, and I see some MiSec people out here, so shout-out to MiSec. In a previous life I was a blackjack dealer, so I've kind of come home here; if you want to talk about card counting afterwards, that's a fun fact.

A word of caution: I heard Ken Westin say he's not a data scientist, and a couple of other people say the same thing, and I'm definitely not one either. I've been trying to up my game and apply these concepts to log data, and this presentation is the application of my research to the data I work with on a day-to-day basis. The experience has been pretty good so far. Clients will often ask us, "Hey, I want to see when weird stuff happens," and if you're like me, generating use cases, you wonder what that actually means. When you're building a use case you need clearly defined events and alerts, and this is one way you can hopefully answer that question.

We're going to focus on obtaining practical results. You can apply these concepts in Excel if you want, or Python, or R, or SQL; whatever you have at your disposal, you should be able to apply them. Covering the agenda: I'll talk about why I'm giving this presentation and what I hope you get out of it; why I like internal log data (I see a lot of external data, web data, DNS data, but I haven't really seen much on operating systems); why determining normal is important; what a baseline is; getting data and exploring it; investigating it; and finally creating those thresholds. We'll run through a use case methodology, probably go through two different use cases, and then I'll follow up with some resources and references.

I feel that log data is highly underutilized. How many people here operate log management or security monitoring on a day-to-day basis? So I'm in good company; I don't normally find that at other conferences, so I like to see it. But I find there's very little defined approach to using log data. When I was doing SIEM implementations in my previous life, you would implement the product and then the client would say, "Now what?" Well, you've got to define what you want; you've got to figure out what's valuable to the company.

I learned way too much the hard way, so hopefully this is one of those things I can help you with. The main reason I'm up here: when I was looking through best practices in the very beginning, everything said "look for normal activity, look for trending," but nothing ever said how to do that. Has anybody ever seen how to actually determine a trend? Usually it's "visualize it and look for the spike," but sometimes that spike is normal and may not be something you need to investigate. I find there's little coverage of security data analysis pertaining to Windows logs and network data, so I'm hoping to cover that gap, or that perceived gap.

I'm trying to provide practical strategies. Like I said, if somebody implements this in Excel, I'm going to consider it a success, because I think that's awesome. I do everything in R or SQL, but hopefully you can do it in anything you're familiar with. We're going to cover methods that are easy to understand and describe; that's really the goal. This material is straightforward, but you need to be able to describe it to the stakeholders in your organization and explain why it's important and what's going on. There's some really cool stuff with deep learning, but how are you going to describe that to clients or management? Those things are sometimes perceived as a black box, so hopefully this is one way you can easily illustrate anomaly detection techniques. Like I said, this can be implemented in Excel, Python, R, SQL, et cetera. I see a lot of content on DNS logs, firewall logs, NetFlow, and web logs; I don't see it on Windows logs. Windows tells you everything that's going on in your environment and everything a user has done, so I think it's very valuable.

So why is determining normal important? Has anybody here been asked to determine what's normal, or to trend data or create baselines? Yeah. If you google log management best practices, everything says to do that, so hopefully this will be a tool for it. Beyond alerting on thresholds, it's also a tool for proactive investigation, what some people call hunting: looking for deviations in traffic in an unknown data set. You'll be able to use this to go back in time and check whether anything abnormal happened during a given period. It aids in investigation, too. From time to time I've been asked, "Derek, I just looked at this report, there are 1,700 authentication failures, please investigate." Well, the first thing I ask is: is that normal? There could be 1,700 authentication failures every single day without the users even being aware. This is a tool for saying, "Yes, this happens every day; on average it happens this many times, and it will increase or decrease by X amount based on the deviations." It also helps you understand your environment: if you got into a new environment, you could apply these techniques to see what's going on.

As security professionals, and specifically as security monitoring professionals, we have very little ability to effect change in terms of policies and things like that. So you have to accept that it might be normal for users to RDP into all your servers. It may not be best practice, but it happens and I can't change it; this is one way to monitor that type of activity.

Let's talk about baselines. Like I said, best practices always state "look for trends and abnormal activity." Here's an example of aggregated Windows event counts per hour. You see a significant spike at six o'clock, but is that a trend? Is it even abnormal? And really, what is normal? Those are some of the things we're going to tackle. Looking at this, at first it might seem like there's a significant spike and something bad is going on, but as you go on you see it happens every day. So either it's normal activity, or you've been pwned and it's been going on for a very long time.

What is a baseline? I give a fundamental definition: it's a measure of the normalness of some set of data; I try not to get too technical. Down here is a reference I pulled from a book called Logging and Log Management by probably the godfather of security monitoring, Anton Chuvakin. It's a great book. It's kind of hard to get through (log management is not the most electrifying reading), but it's definitely valuable in my opinion.

We're going to talk about baselining, and in particular baselining events that follow a normal distribution. What I've found is that a lot of the data we look at will be approximately normal. If you're not familiar, the normal distribution really just means that activity will, on average, occur near the average, and the farther out you go, above or below that count, the lower the probability of it occurring. If the data follows the normal distribution, then about 68% of the time values will fall within one standard deviation above or below the mean, 95% of the time within two standard deviations, and 99.7% within three, give or take. We're going to use those properties to create our thresholds once we've determined that the data is approximately normal.
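Those three coverage figures are the "three sigma rule" the talk returns to later, and they can be verified directly in R (the language the speaker says he works in) using nothing but the built-in pnorm; a minimal sketch:

    # probability mass within k standard deviations of the mean
    k <- 1:3
    coverage <- pnorm(k) - pnorm(-k)
    round(coverage, 4)
    # 0.6827 0.9545 0.9973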

What types of things can you baseline? Really, anything that occurs around an average and deviates above or below it roughly equally; you've got to have good symmetry. Things like event rates, which we'll talk about, authentication failures, and maybe bytes transferred in a VPN session could be good candidates. The first question I always get is: what about non-normally distributed data?

Well, there are other techniques for analyzing that; I'll show a tool at the end, Twitter's AnomalyDetection package, for non-parametric data. But we're going to focus on normally distributed data, creating thresholds and analyzing it.

We'll start with the Windows event rate. A Windows event rate is exactly what it says: the quantity of events coming in. It's often measured in events per second; we aggregate into events per five minutes, but it could be events per hour or per day, whatever you want. Event rates in your log management system are probably not going to catch an APT. An APT is not going to generate a million events that don't normally occur; they're going to try to create as little as possible and blend in with normal traffic. But I still think event rates are extremely valuable for things like detecting log sources that have failed, detecting scripts that have gone haywire (maybe the password changed for a script on an admin's desktop), and catching operational issues. You can't catch an APT if you're not logging, and if your agent on the domain controller failed and you don't know it, that could be a significant problem once that data starts to roll over. Also, you may think you're logging when you're not, due to audit configurations; maybe you don't even produce logs for logon failures.

Here's an example; some people may or may not be familiar with Windows, but this is a standard Windows log. There's a little diagram here: we typically transmit the data through syslog, Windows has Windows Event Forwarding, and there are many ways to get the data. Most log management and SIEM systems will measure events per second, because they use that number to determine whether you're overloading the system; they usually have specifications like "we can handle 10,000 events per second" or "we can handle surges of 3 million events," things like that.

Obtaining this data is probably the most difficult part. In data science and data analysis in general, getting and cleaning the data is probably the most difficult step, from what I'm told. Like I said, though, most systems will have it. Here's an example of ELK measuring events per second, and here's a SIEM measuring events per second. If you don't have either of those, you could look at a custom application using a metrics server, like StatsD from Etsy; Logstash can output to StatsD, so you can start counting your events and visualizing and analyzing them.

The data available to me was aggregated into five-minute periods. I extracted it and pre-processed it in Postgres. I wasn't sure what I was looking for early on; the data really started off with just a timestamp, a Windows count, and a syslog count, so I also extracted the day of the year, the day of the week, and the hour of the day, because as we explore the data I didn't know yet what we'd be comparing: specific hours of weekdays? Full days against all Mondays? Weekdays against other weekdays? Those are things we're going to find out. Basically, though, the data is all just comma-separated, with integer counts for the event rates.
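A minimal sketch of that preparation step in R, assuming a CSV with the three columns described; the column names (timestamp, windows_count, syslog_count) and the English day names are assumptions, not details from the talk:

    events <- read.csv("event_rates.csv", stringsAsFactors = FALSE)
    events$timestamp <- as.POSIXct(events$timestamp, tz = "UTC")

    # derive the calendar fields used for subsetting later
    events$day_of_year <- as.integer(format(events$timestamp, "%j"))
    events$day_of_week <- weekdays(events$timestamp)
    events$hour        <- as.integer(format(events$timestamp, "%H"))
    events$is_weekend  <- events$day_of_week %in% c("Saturday", "Sunday")

    # roll the five-minute counts up to one total per day
    daily <- aggregate(windows_count ~ day_of_year + is_weekend,
                       data = events, FUN = sum)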

Before you start, you always ask some questions, especially for these baselines. Is this data approximately normal? That's a big one for developing statistically relevant thresholds; you can always just calculate standard deviations and add and subtract them, but the result may not be statistically relevant, as we'll discuss. If the data is not normal, can you make it normal? Can you subset it? Here, I assumed weekends would get far less data, which would skew the distribution, so you may have to model them separately. And are there anomalies within the data set, extreme anomalies you're going to have to deal with? We'll talk about how to do that.

The first thing I did was visualize the data. These are the quantities of events that occur per day. You see that most of the data sits up here, and then there are two sequential days far below. My first thought: those are weekends, because traffic will typically be lower when there aren't many people around; it's mostly automated processes generating events, so there will always be some events. A few things immediately stand out. The data is probably skewed due to weekends, and there appear to be several outliers: this one down here is extreme, and we'll talk about it in a minute. But also, when I look at these, are there any other patterns that might be considered a local outlier? Does anybody see it? Yeah, right here. You look at that and say: okay, elsewhere there are two sequential low days, but here there are three. That's something we need to look at, and it was the first thing I keyed in on with this data set.

When I look at the weekdays, the data looks like it could be normally distributed, at least based on this graph. It usually lands around here; sometimes Mondays are lower, sometimes higher, sometimes Tuesday is way high and sometimes it's lower. There seems to be a decent distribution above and below.

[Audience] Could your first cluster be a holiday weekend?

That's a great point, and you're absolutely right: that right there is actually Good Friday, a stat holiday for this client. So now I'll color-code the weekends, and we know definitively that this is the weekend, Saturday and Sunday, and this is a Friday. I checked, and that day is the 85th day of the year, which is Good Friday, and they happen to have a stat holiday for it, so it's far lower than what we would normally see. This one down here, though, is something that needs to be investigated. What's going on here? This is exactly what we want to catch: we want to see when things like this happen. And what it turned out to be was audit configuration.

They didn't have the proper audit configurations enabled, so they weren't getting anywhere near the log data we needed; once they enabled it, the volume jumped, and I think they were trying to get that done before Good Friday. So there are two possible outliers, and we need to figure out how to handle them. Within the data science and statistics communities, handling outliers seems to be a genuinely tough debate: do you get rid of one, do you keep it? In these cases I think we want to get rid of them. Especially this one, because it's not normal traffic, and we don't want to base our baselines on it. It didn't occur naturally, like a big project with everybody working and generating more events; it was strictly a technical deficiency. And this other one is a holiday. When you're doing something like this, take a look and decide whether you even want to include stat holidays, because those are going to be lower. That's a choice, and it's up to you. Really, the worst-case scenario is that such a day generates an alert; do you even have people who would respond to an alert on a stat holiday? If not, I would definitely get rid of it.

There are really three different methods for dealing with outliers. You can exclude the value, which is trimming. There's Winsorizing, which is replacement: you could take the average Friday and replace the outlying value with it. Or you can simply keep it. If it's a naturally occurring outlier, I would definitely keep it; if you really did have a lot of people working hard and doing a lot of administrative activity, that's naturally occurring and should probably stay in the data set.
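A rough illustration of those three options in R; daily_counts and the outlier's index holiday are hypothetical names, not from the talk:

    trimmed <- daily_counts[-holiday]                    # 1. trim: drop the outlier
    winsorized <- daily_counts                           # 2. replace it, here with
    winsorized[holiday] <- mean(daily_counts[-holiday])  #    the other days' average
    kept <- daily_counts                                 # 3. keep the data as-is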

We're still in the exploratory phase here. We think this could be normally distributed, but it's impossible to tell from that graph alone, so we'll look at the histogram. The histogram bins the data into groups and shows you the frequencies. In normally distributed data you'll see some symmetry around the median, but here you can clearly see the skew caused by the weekends; this version includes them. After removing the weekends the data looks a little better, with a little more symmetry, but there's still that Good Friday outlier. Remove that too, and now the data looks much better. We still think it could be slightly skewed, but we're only looking at about 30 data points; this is a sample, and the data is never going to be perfectly distributed. We need to find out whether it's acceptable for our analysis. So I'm going to cover a couple of techniques for determining whether data is normally distributed. The first is a Q-Q plot, which maps the data against the theoretical quantiles. The only thing I need to know is whether the data follows this line: the closer it stays to the line, the better the chance it's normally distributed. On this data there are a few points that start to stray off, but overall it looks pretty good, so I have a better feeling that it's normally distributed.

Still, I'm not totally positive, and you never will be; we're just trying to determine whether the data is approximately normal, so that we can create upper and lower thresholds using the statistics of normal distributions. So the next thing I asked was: what does randomly generated, truly normal data look like on a Q-Q plot? I took the mean and the standard deviation of the data set and pumped them into rnorm in R, generating data drawn, absolutely, from a normal distribution.
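A sketch of both plots in R, assuming the weekday counts live in a vector called counts (a placeholder name):

    # observed data against theoretical normal quantiles
    qqnorm(counts); qqline(counts)

    # the same plot for synthetic data drawn from a true normal
    # distribution with the sample's own mean and standard deviation
    set.seed(1)
    simulated <- rnorm(length(counts), mean = mean(counts), sd = sd(counts))
    qqnorm(simulated); qqline(simulated)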

The random data looks like this: it deviates a little bit too, which makes me feel better about our data. There are a few points straying from the line, but with only 30 data points we see similar deviations in both plots, so I feel better that ours is normally distributed, and you can compare the two directly. This is still a visual interpretation, though; it's up to your judgment. A lot of this is much like outlier detection: there are no hard guidelines for determining what exactly is an outlier or how to handle it. So finally we'll look at an actual test, the Shapiro-Wilk test. Shapiro-Wilk is available in R, and I've also seen it done in Excel; it's available in multiple packages. You really just feed it your list of quantities (here, the list of event counts across the data set) via the function shapiro.test, and it outputs a p-value. If the p-value is over 0.05, it's plausible the data was taken from a normally distributed data set, and the higher the value, the stronger that possibility; if it's below 0.05, we'd have to rule out treating it as normally distributed.
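In R the whole test is one call; a minimal sketch, again with counts as the placeholder vector of event counts:

    result <- shapiro.test(counts)
    result$p.value
    # > 0.05: consistent with normality (the talk's working rule of thumb)
    # <= 0.05: rule out normality and treat the data as non-normal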

So we've determined that the data appears approximately normal, and we're going to go through the exercise of baselining weekday non-holidays. You'd have to analyze the weekend data separately; you'd go through the same kind of methods, but with far less data, since you'd need something like 15 to 30 weeks' worth to collect enough points. The thresholds will be calculated to detect surges and dips in the value. Here I've rescaled the data. This red line is the median, and you can see roughly the same number of events above it as below it; it looks pretty evenly distributed.

To me it looks like more of the data sits in this band, with less as you go farther out. So let's calculate the deviation of this data. This is basic statistics: the standard deviation measures the spread. Here I have lines at one and two standard deviations, and what we see is that 64% of the data is within one standard deviation and 97% is within two. I've provided a couple of examples of how to calculate this in R, SQL, and Excel; it's easy, a basic mathematical function. Going back to our normal curve, 68% of the data should be within one standard deviation, and ours has 64%, so we're pretty close, keeping in mind this is a sample of real data. We see 97% within two standard deviations, which is close to the 95% figure. So I think we're pretty close to what's called the three sigma rule, which this chart illustrates; it's a basic property of normal distributions. For creating thresholds, two standard deviations works pretty well: 95% of the time values should fall within those thresholds, and anything beyond that is somewhat anomalous.
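The threshold calculation itself is a couple of lines of base R; a sketch using the same placeholder counts vector:

    m <- mean(counts)
    s <- sd(counts)
    upper <- m + 2 * s   # values above this are flagged as surges
    lower <- m - 2 * s   # values below this are flagged as dips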

You could use three standard deviations instead, depending on your use case. For events per second, significant deviations usually mean something major: audit configurations have stopped, or network configurations are blocking syslog; those are the things that cause significant drops. So depending on the use case, you might want two or maybe three standard deviations.

Next, let's talk about baselines as a use case. I think this is a fundamental topic across all the Ground Truth talks; I hear a lot of people talking about use cases when it comes to security monitoring, and I feel one of the best things you can do is create well-defined use cases. I hear a lot of complaints, and I think the answer to most of them, with any SIEM anybody has ever used, is a well-defined use case. So we're going to walk through another baseline example. I pulled it from threathunting.net: if you're familiar with threat hunting, a couple of very well-known people have created a GitHub repository of successful hunts, and one of those hunts uses Windows logon events for detecting lateral movement and external RDP access. That's what we'll talk about; it's a good, security-relevant example. What I also like about threathunting.net is that some of the techniques rely on baselining and outlier detection, so both are very applicable to what we're doing right now.

If you're not familiar with RDP, it stands for Remote Desktop Protocol: if you want to log in from computer A to computer B and get a GUI on that computer, you use RDP. That produces Windows event ID 4624 with logon type 10, where type 10 indicates a remote interactive logon; that's what we'll be collecting. We're going to baseline RDP access within the environment. RDP access could be administration, it could be attacker activity (lateral movement, like I said), or it could be excessive connections from a new administrator. To my knowledge RDP shouldn't be necessary for most administrative activities, but I find it's often used for them anyway. That makes it a good use case: administrators will use it, but the use may not be malicious.

First, a use case primer; applying this methodology to anything you do within log management, not just baselines, is critical, I think. I've pulled this from a resource called InfoSec Nirvana, which outlines a good methodology. There was also a great talk at DerbyCon last year or two years ago, a pretty quick 30 minutes, which was a very in-depth, intense use case walkthrough covering documentation, tracking use cases over time, and measuring new use cases as you create them.

Use cases follow what I think of as an eight-step program. We start with developing the requirements of the use case, then defining the scope: is this use case going to apply to your whole environment, a remote office, specific assets? Then: what event sources are needed, and do we have that data? Next we validate that the use case is feasible; in our case that means determining whether the data is normal. Then we define the logic used for generating the outputs: when do we send an email, when do we send a report, when do we create a dashboard event, et cetera. Then we go through implementation and testing, then the actual response to the alert (one of the biggest questions I get is, "Okay, I see these alerts in the dashboard, Derek, what do I do now?", so defining that is critical), and finally ongoing maintenance.

The first step is defining the requirements. Requirements are high level: why are we doing this? We're doing it to enhance security, or for compliance or regulatory reasons, or maybe it's a business use case of enabling availability so the business process can run. For me, RDP access falls into security and compliance: PCI requires monitoring authentication and remote access to systems, and it's also just good housekeeping to monitor remote access in your environment.

Defining the scope: the scope defines the physical or logical group the use case applies to; really, that's what you're asking yourself. In our case it's everything, and that will be true for a lot of cases. When you look for people clearing logs, you don't care about just one specific office. But you may be sending alerts to different people, and in that case you'd have a more granular scope.

What event sources are needed? I've already covered this: event ID 4624, a Windows logon, where type 10 indicates remote interactive. One thing a lot of people don't consider is that Windows audit settings need to be enabled before these events are even logged: under Logon/Logoff you want to be logging successful logons, and that will generate these events. The events occur on the log sources you're collecting from, so keep that in mind too. Here's an example of the local policy editor; this is where you'd change those settings.

In the Advanced Audit Policy Configuration, Windows 2008 and above gives you pretty granular ability to log different items.

Now, validation. This is probably where most of the work goes when you're baselining. During this phase the event sources are validated: you want to make sure the events are coming in and that you're actually receiving them. Common problems include non-uniform audit policies; for example, you might have audit settings applied locally on one domain controller but not across all of them, so one gives you logon successes and another doesn't, which is why a uniform policy is so important. There are incorrect audit policies: the same policy everywhere, but not set to collect the information you need. There are configuration issues, like agent installation or the credentials needed to pull the logs. And there's logging infrastructure: if you send everything through syslog but a firewall blocks syslog, you're obviously not getting the data; maybe syslog is enabled everywhere except your DMZ, and it's being blocked there. In our case, the exploratory data analysis happens in this validation phase: determining whether the data is normally distributed, whether it makes a good use case, and what our thresholds are going to be.

Here again I've plotted RDP data, per day, for a 30-day period. On day 20 we're almost exactly on the average, and so on. This data does not look heavily skewed: you see about the same amount of data above and below the average, most of it lies in this band, and there's less and less as you go out. Off the top of my head it looks pretty good, and when you check the numbers, 93% of days fall within two standard deviations and 70% within one, which closely approximates the three sigma rule of 68 / 95 / 99.7. The histogram looks pretty symmetrical about the mean, with approximately the same amount of data above and below the average; it's hard to say definitively, but I think this is a good candidate. And finally, the Shapiro-Wilk test we talked about: I fed it the RDP count per day, and the p-value is 0.39, far above 0.05, so we fail to reject the hypothesis that this was taken from a normally distributed data set. So we're in good shape: we think this is normally distributed, and we can start creating thresholds.

We create the thresholds in the logic step, where we define the exact details used to generate the desired output. For something like log clearing, the logic would just be that single event; in our case, it's the count exceeding or dropping below our thresholds. We'll use two standard deviations, so 95% of the time the data should be well within that range, and the other 5% will produce an alert or alarm of some sort. The actual logic: compare the current value, yesterday's count, against a baseline generated from the 30 days prior to it, and if it's above or below those bounds, send an alert.
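A sketch of that daily check in R, assuming a date-ordered vector daily_counts whose last element is yesterday; the names and the alerting mechanism are illustrative only:

    n <- length(daily_counts)                    # needs at least 31 days of history
    baseline <- daily_counts[(n - 30):(n - 1)]   # the 30 days prior to yesterday
    yesterday <- daily_counts[n]

    m <- mean(baseline)
    s <- sd(baseline)
    if (yesterday > m + 2 * s || yesterday < m - 2 * s) {
      message("ALERT: yesterday's count ", yesterday,
              " is outside the baseline range [",
              round(m - 2 * s), ", ", round(m + 2 * s), "]")
    }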

[Audience question about the days in the window.] Well, they're all weekdays, so I'm not worried about it; again, in this use case we're baselining only weekday data. But it's a good thought, because if the window included weekends, or if the data changed significantly from day to day (maybe Wednesdays are always big and Mondays and Fridays are always low), it might not be a good choice to baseline all weekday activity together.

Next is implementation and testing, where you actually implement the use case, create the thresholds, and define the output. The output can be an email (we're all familiar with email alerts), a report, a dashboard event (maybe a running tally on your dashboard), or a newly generated security event to be consumed by another rule; maybe you have rules in place so that if enough weird activity happens for a specific user, you trip a bigger offense. There are probably others I'm not thinking of, but you get the idea.

And finally, defining a response. This is a big issue: what do you do with the alerts? During use case development you need to figure that out, so create a formal response procedure.

This is good to have in a document (companies always like documents), and having a use case catalog works really well for showing what you're doing on a day-to-day basis and how you're managing it. I've handed ours to auditors, and it's worked very well for proving that you're actually doing the work. For RDP logon investigations, I'd have a list of questions. What you're trying to determine is whether the RDP logons appear to be a threat: is this an administrator, or is this somebody with compromised credentials moving laterally through your environment? You answer that by asking: Are the logons at a normal time for that user? If you see the network manager logging in on a Sunday at 2 a.m., that may or may not be normal for your environment, and it's important to know which. Are there a large number of unique destinations? Is the user logging into one server repeatedly, or into every server in your environment, and should that user be logging into those servers at all? Are there single or multiple sources? How long were the sessions: were they logging on, looking for something, and logging off, or logging on and doing a variety of tasks? What processes were run by the user? Were there single or multiple users involved? Was it one specific user, or were ten new admins hired, which just pushed the count past the threshold? And finally, are there any major operational activities going on? I'd include all of these in the investigation for this specific use case. It's very important; as someone said earlier, you've got to ask yourself why you're doing this. You don't do it just for fun, or maybe you do, and you just want a nice dashboard at the top of your SOC.

Finally, maintenance. This is going to be a little more involved for baselines. Every use case needs to be maintained, and I find that organizations will have rules but don't know whether they've ever triggered, whether they trigger properly, whether they're correctly configured, or whether they're even applicable anymore. This is where you ask those questions: evaluate the use cases on a schedule, some interval (every month, every quarter, semi-annually, annually), and ask whether this is still applicable, whether you still want to use it, whether you still need the baselines. This is also where you identify and handle outliers and determine whether the baseline data is still normal. I'm showing 30 days, but over a period of 180 days that may change; it definitely will. For this phase we're going to focus primarily on monitoring the health of our baselines, and you do that by evaluating the data the same way we did before: take a bigger swath of it and check whether it's still normally distributed and still relevant. Here's a graph. If you recall the p-values: when the p-value is over 0.05, we can say the data is probably normally distributed.

Every day we generate a baseline, over a period of about 101 days, so there are 101 baselines, and we're graphing the p-value for each. Anything above this line is good, anything below it is bad. We see that 73 of them were built from good data sets and 28 were not. There's a period here, about a month long, where the data starts to change and doesn't look good for creating baselines. When that happens you start getting more alerts, and that's really the worst-case scenario: you get more alerts than you should, more than that 5%.
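One way to reproduce that health graph, sketched in R: recompute the Shapiro-Wilk p-value over a trailing 30-day window for each day (daily_counts is again a placeholder):

    window <- 30
    idx <- seq(window, length(daily_counts))
    pvals <- sapply(idx, function(i) {
      shapiro.test(daily_counts[(i - window + 1):i])$p.value
    })

    plot(pvals, type = "l", ylab = "Shapiro-Wilk p-value")
    abline(h = 0.05, col = "red")   # baselines below this line are suspect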

During this period, then, the baselines were not relevant. Here's the data that contributed to them. When I look at it, I see a clear breakout at about this point, and it lasts about 30 days. Values that would have looked like outliers at the time were actually normal: there were more RDP logons, which could have been due to new admins, or to operational procedures, like a big patching push where everybody had to log on and start patching. But notice that after a while, although the level stays elevated, the data becomes normal again, and the baseline is still relevant. During this phase you ask yourself: can we still create these baselines, and are they working for us? In this case the data deviates but comes back to life, I'd say, so you would still use it; it's still relevant in my opinion, and you'll just get more alerts during that period. This is also where you decide how to handle such data changes. In this case I would leave the data in; these are probably naturally occurring. I didn't investigate here, but really the first step would be to investigate and make sure the changes are naturally occurring and not a compromised credential; it could have been an attacker in your network for that month rather than admins finally patching all their systems. In this situation it's a clear breakout, so I'm not worried about it. You have a few options: accept the deviations and wait for the data to recover, remove the outliers, or scrap the use case entirely if it becomes too chaotic. You're not doing yourself any favors investigating alerts every day that are just wild goose chases.

So, wrapping up. The summary is pretty straightforward: identify the data, make sure it's normally distributed, and you can use two standard deviations as the baseline thresholds. The high-level use cases: regular outlier detection, which is what we went over (if you're managing these thresholds in your SIEM, it falls into that category); proactive investigation and hunting, where you get into a network for the first time, you're trying to see whether there's funny activity going on, and you can use thresholds to determine whether it's normal; and general environment exploration, where you're new to a company and want to figure out what's normal, what you can create rules on, what makes sense and what doesn't.

Going into this presentation I thought it would be very easy, and as I worked through it I kept thinking we could really push this further. I mentioned a breakout earlier; well, there's a Twitter GitHub package, BreakoutDetection, for automatically determining breakouts statistically, so you could detect that a breakout happened and handle it accordingly. There's automated outlier detection: there are strategies for finding outliers but no firm guidelines, so maybe there's a way to automate outlier detection and automatically accept or reject the outliers.

Then there's non-parametric analysis. Data like a stock market chart will trend upward while deviating up and down, and that doesn't follow a normal distribution, but there are still ways to analyze it; you can use a package like Twitter's AnomalyDetection. For my intervals I used 30 days, but you might find better intervals, with the goal of having the fewest outliers and alerts while still remaining statistically relevant.
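Both Twitter packages are open source on GitHub (twitter/AnomalyDetection and twitter/BreakoutDetection) and take a series of quantities. A hedged usage sketch; the argument values are illustrative, not a recommended configuration:

    # install via devtools::install_github("twitter/AnomalyDetection") etc.
    library(AnomalyDetection)
    res <- AnomalyDetectionVec(daily_counts, max_anoms = 0.02,
                               period = 7, direction = "both", plot = TRUE)

    library(BreakoutDetection)
    bo <- breakout(daily_counts, min.size = 7, method = "multi", plot = TRUE)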

There are also additional use cases I'd like to go through. I covered two fairly basic ones, but there are probably some really cool ones that could be applied, so I'm going to be exploring that. And finally, multiple baselines: if you have, say, ten baselines that apply to each user and you're mapping their activity, you could create a sort of weighted system to determine which user is showing the most anomalous activity, and then investigate that user. One user might often trip one baseline, but if they trip nine, that's significant and you need to look into it. So I'm investigating how we can stack those baselines to look for global anomalies for specific users.
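A toy sketch of that stacking idea in R; the names and data are entirely hypothetical, just to show the shape of a weighted per-user score:

    set.seed(42)
    users     <- paste0("user", 1:5)
    baselines <- paste0("baseline", 1:10)

    # which of the 10 baselines each user tripped today (random, for illustration)
    trips <- matrix(runif(50) < 0.2, nrow = 5,
                    dimnames = list(users, baselines))
    weights <- rep(1, 10)                 # equal weights to start; tune per baseline

    scores <- trips %*% weights           # weighted anomaly score per user
    sort(scores[, 1], decreasing = TRUE)  # most anomalous users first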

Here are some references. Like I said, I really like the Logging and Log Management book. If you're interested in more on the Shapiro-Wilk test, there's a reference here too. I definitely recommend InfoSec Nirvana for the SIEM use case methodology; it's high level and pretty straightforward, nothing difficult there. If you're interested in the AnomalyDetection and BreakoutDetection packages, they're easy to use: you just feed in a list of quantities and they'll graph the results for you. And there's threathunting.net, if you want additional use cases for evaluating outlier and baseline detection. That's about it. Do we have any questions?

[Audience] Hi, thanks, good talk. I'm going to ask maybe the most obvious question: what if it's not normally distributed, if it's a fat tail or something like that?

Exactly. Like I said, this only works on normally distributed data; for the rest you can take a look at a package like AnomalyDetection, or you have to look at other methods, and that's something I haven't explored yet. There's a lot of data like that. For example, during the investigation I looked at account lockouts, and in many organizations account lockouts hardly happen at all, so your most frequent value is going to be zero, and you'll only see a tail that rises and then falls. That's not a normally distributed data set, so you'd have to look at it differently; depending on the shape, you might use only an upper limit, or look at some other methods. I haven't gotten into it, but it's something I'm definitely going to explore further.

[Audience] You talked about having a petabyte of data; I don't know how big your user base is, but what are you doing for audit log reduction?

Compression? No? Ah, you mean filtering out data. We're getting into my daily work here, but for most SIEMs you have 11 to 40 terabytes of storage, compressed after some interval, and for the use cases of most small-to-medium organizations, 40 terabytes will store far more data than they ever really need, so I'm not usually worried about reduction. If you are looking at log reduction, though: look at your audit settings and follow the CIS benchmarks. They tell you how your audit configuration should look, and you can tune it to turn off things you're not using. There's also what's called output-driven logging: if you have a list of use cases, work out which events you need and log only those. The catch is that if you later need data for a forensic investigation, you may need additional events that aren't part of that set. So there are multiple techniques; it's just not something I typically worry about, because storage has usually been far more than I need.

[Audience] Instead of drawing a mean on a graph that includes both weekends and weekdays, or other predictable anomalies like that, do you think it makes more sense to separate them out and evaluate, for example, weekend data on a weekend-only graph, so you're not constantly looking and maybe missing a dot that just looks like a weekend dot?

Yeah, absolutely. I think you're talking about one of the earlier slides, this one. I didn't actually know these were weekends at first; that was just my first thought, and you confirm it through the exploratory analysis. In some environments it may not change from weekend to weekday, so you need to figure that out. But this data right here contains only weekdays, so I do separate them out; I just didn't go into a weekend baseline, for brevity, and you'd follow the exact same process. In this graph it's all Windows events; in the other graph it's only the RDP logons. This is really 12.5 million events, which actually isn't that much, but it's pretty stable.

[Audience] This is more of a comment than anything else: if you start looking at autoregressive and moving-average models, you'll be able to actually incorporate those weekends into the baseline you use.

That's a good point, and something I'd definitely like to hear ideas on, because my approach might be rudimentary and there might be better options. So you mean something like a seven-day moving window?

[Audience] Essentially, with an autoregressive model over something like a seven-day window, each point gets normalized against the seven days before it, and that helps remove some of the seasonality.

Oh, that's interesting. And if you look at the Twitter packages, seasonality is a big factor there, because depending on your environment the data can change from season to season. I think the Twitter package uses the ESD algorithm (Seasonal Hybrid ESD), which accounts for seasonality and things like that, so I'll definitely look into it.

Okay.

[Audience] I was wondering if you could tell a story. You showed two styles of events here, the Windows event rate and the RDP logons. When you initially set that up, did looking at the data surface incidents to investigate further, or did you have to go find them?

Kind of. I was unsure about that Friday with the dip, so that's one thing I did check out, and this one as well; I looked into it, and the audit settings had been changed between this day and that day, so that much I could tell. We do look at account lockouts, and I've seen companies in my past that allowed RDP externally without putting the host behind the firewall; all of a sudden accounts were getting locked out left and right, and surges in the account lockouts flagged it. When they investigated, they determined that the appropriate preventative controls were not in place. There are plenty of stories. In my past life doing monitoring, I'd only look at surges within the time-series graph; I wasn't analyzing it like this, but I'd see severe spikes, or the EPS would just stop against the baseline and start falling behind. For one client in my old organization, the logging configuration on the Cisco IDS modules was incorrect: a module was trying to pull down patches, constantly generating DNS requests in a highly secure network, and that actually led to cameras being taken offline. Once we determined that, we nailed down the process and were able to fix it, and we basically used trend analysis to do it. It would have been better if we could have determined it more quickly, with a use case developed around something like that and those thresholds already in place. So, yeah.

Okay, I think that's it. Thank you.