
Alright, I'll just go ahead and start. I've got a lot here; I built this for about 45 minutes, so I'm going to try to squeeze it down and talk fast. I used to work at NASA, helping with their Security Operations Center. I also did some time at Mandiant (hard time), and then did other stuff for a little while, but now I work at Cisco Umbrella, which used to be OpenDNS before it was purchased by Cisco. I'm talking about things like behavior. This is how you might watch the behavior of people visiting a website: you see where people go, where they click, and so on.
I might also look for anomalies; I think of it like anomaly detection. There's a bunch of people walking, and all of a sudden there's some weird, unpredictable, dissimilar behavior. That's the kind of thing you're looking for. What I'm doing isn't yet as cool as that, but I'm trying. Here's a quick overview of a couple of current detection methods: IDS, antivirus, and people. IDS is based on signatures, so it's based on things you already know about. You can maybe be a little bit predictive if you do some regular-expression work, looking for a certain kind of content.
In this case, the signature is trying to find a remote file include vulnerability. It requires human intervention: you've got to work with these signatures, keep them up to date, and adjust them quite a bit. It catches known threats, but, like I said, it won't catch things you don't know about, so it's not really predictive. Antivirus, of course, is the same thing: it uses signatures based on known malware or file variants, and it's host-based. As for data, we're working with things like CSV files. If you do this kind of work, which you probably do, people give you CSV files, or, even worse, Excel files. We also work with pcaps and Active Directory logs, which I'll touch on briefly.
There are also proxy logs, DNS logs, those kinds of things, and I'm also going to be working with streaming logs. Say you get that awful CSV file and it's small enough to work with. You can do certain things to get an idea of what you're seeing in that specific time window: you can make a pivot table (you can probably do it faster than I can), count the number of times each domain has been visited, and then graph that. It takes a while, but you can get an idea of what the bandwidth looks like for certain domains, things like that. Now, we want to find normal.
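The pivot-table step above ("count the number of times each domain has been visited") can be sketched in a few lines of Python. This is a minimal sketch, not the speaker's script: it assumes a hypothetical CSV export with a header row containing `timestamp` and `domain` columns, and the sample rows are made up. With pandas, `df["domain"].value_counts()` gives the same table.

```python
import csv
import io
from collections import Counter

# Hypothetical log export; real logs will have different columns and formats.
SAMPLE_CSV = """timestamp,domain
2017-07-18 09:00:01,www.facebook.com
2017-07-18 09:00:02,static.xx.fbcdn.net
2017-07-18 09:00:05,www.facebook.com
2017-07-18 09:01:10,example.org
"""

def count_domains(csv_text):
    """Count how many times each domain appears (a pivot-table 'count of domain')."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["domain"].strip().lower() for row in reader)

counts = count_domains(SAMPLE_CSV)
for domain, n in counts.most_common():
    print(f"{n:>5}  {domain}")
```

The resulting counts are what you would feed to a bar chart to get the rough per-domain "bandwidth" view described above.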
The point of finding normal is to get rid of it, so we can find the anomalies and get an idea of what weird stuff is going on in the network. So how do you find it? Can you find it in this example? You're looking at a bunch of domains that look really bizarre; if you saw this in your network, you would probably freak out. It's a bunch of DNS tunneling traffic. But what if a stream of standard, regular-looking domains goes by, and then maybe there's one weird one? How do you find that without your signature detection, in a different way?
And how do you find it in this? Of course, you're probably not watching the traffic scroll by like this. The traffic is flowing, you have something monitoring it, and hopefully something is processing it. But how do you find it in this CSV? My first attempt is to count the number of times a domain has been seen. In this case I've got a bunch of Facebook domains that I'm interested in (or maybe I'm not, but it's an example). I count the number of Facebook subdomains and graph that to get an idea of the bandwidth, like I showed before. I could count other domains, too.
I can also remove normal from that and look at the domains that were visited just one time. So far we're doing all of this manually, but let's keep going. I've got this list of domains seen just once (two or fewer, or three or fewer, would also be reasonable, but one is good for now), and I'll grab a random selection from it. In this case I've got some highlighted domains that I just have a gut feeling are probably fine, like military.com. It's nothing, it's fine, so I'm going to remove those. Then I've got these other ones I'm a little more curious about.
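The "seen only once, then pick a random handful to eyeball" step could look like this. A minimal sketch under stated assumptions: the input is a `Counter` of domain hits like the one from a counting pass, `max_hits` is the one/two/three-times cutoff mentioned above, and the domain names and counts are hypothetical placeholders.

```python
import random
from collections import Counter

def rare_domains(counts, max_hits=1):
    """Return domains seen at most max_hits times: the 'not normal' leftovers."""
    return sorted(d for d, n in counts.items() if n <= max_hits)

def sample_for_review(domains, k=10, seed=None):
    """Pick a random subset to eyeball, never asking for more than exist."""
    rng = random.Random(seed)
    return rng.sample(domains, min(k, len(domains)))

# Hypothetical hit counts for illustration only.
counts = Counter({
    "www.google.com": 120,
    "ntp.ubuntu.com": 40,
    "military.com": 1,
    "parenting-blog.example": 1,
    "odd5site.example": 1,
})
candidates = rare_domains(counts, max_hits=1)
print(sample_for_review(candidates, k=2, seed=42))
```

Passing a `seed` makes the review sample reproducible, which helps if two analysts want to look at the same random subset.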
I feel pretty good about most of those curious ones. I'm not sure about mistersteam.com, but it's okay, so I remove it. I'll look at this Medicare supplement one: I visit the site and see it's a Medicare site. That doesn't mean it's not compromised or anything, but I'm going with my gut feeling as an analyst: it's health care, so I cross it off. This site has a 5 in its name and looks weird; it serves a default Apache page, so it's kind of suspicious. I look it up and see some stuff possibly related to SonicWALL, and my gut says it's fine for now. Maybe it's bad, but I'm trying to go through a bunch of domains quickly.
This one is a parenting blog, so I'm not too threatened by it. For this one I'm unable to connect; there's no HTTP content, though there might be other services, so you might run other scans. I look it up, and it's something Microsoft-related. Another one seems to be some kind of phone spam; I don't really care, since I'm looking for something else. Then I've got this one with no content. I look it up and get some more information: ThreatCrowd has something, but I'm not getting a lot of hits. Maybe I'd do some other analysis, but for now I don't care. There's this last domain, and it's the same thing: I do some searches.
I see that there are some files associated with it, and they're bad: they steal information, they do persistence, and they use some file extensions that look like Locky. Using OpenDNS network data, you can also see a sudden spike in traffic, where before there was nothing. So we got one, and we can continue on with our day. It's a long process. It's mostly manual, and it requires expertise: you have to have that gut feeling that this is fine or that something's wrong. You have other indicators, too; maybe you make API calls to various services and so on.
But ultimately it comes down to you eyeballing it. So I'm hoping to automate things a little more, and that's what I've been working on: automating a lot of that process. What do we automate? We want to auto-clean logs and network streams, so you don't have to write scripts yourself every single time you get a new log file to process. We want more efficient ways to remove normal. We want to do categorization, which is obviously very important to behavioral analysis if you want to know what's going on, and not just regular content categorization but security categorization as well. And we want to save everything to a workable database of some sort.
It should be something you can work with, so you can do time-series analysis, which is a lot of what I do. And maybe do some visualization, because other people want to look at it too. The logs are all different. In this case we've got some DNS logs with different timestamp formats. I get DNS logs all the time from customers, from our resolvers, from all over the place, and the fields are in different positions. You have to do some manual work initially. When I show you all the stuff I've built later on, all that code is available publicly, but you might still have to move things around a little bit.
I've got Active Directory logs, which look nice to the eye but aren't easy to handle with Python. Syslogs, HTTP logs: everything is very different. Anyway, I think the manual process needs to go away, so use what makes sense to you. If you use Excel, like in this example, with a really large file I'm trying to open (about 500 megabytes, which isn't that large), it wants to import it into multiple worksheets, Excel runs at 100%, and at this point you want to force-quit Excel and move on to something else, because it's not going to work.
So don't use Excel for this. I would like to use R a little more, since it's a lot faster, but I've been using Python, and I only have space for one programming language at a time. I'm using a lot of libraries: matplotlib for initial studies, pandas for data handling, and Plotly for some graphics. Then I moved on to D3, using some D3 wrappers like C3 and MetricsGraphics, a JavaScript library; it's all in my documentation. There are also some open-source tools I use for taking that data and working with it to find the weird stuff.
These include pyasn and other ASN lookup tools. I tried to make as much as possible work offline, or with tools everyone has access to; unfortunately I'm still using some things that aren't like that. With pyasn, you give it the IP a domain resolves to, and it gives you the ASN that IP lives on, and you can do things with that. I also use python-whois, which is a WHOIS lookup library, and NetworkX to combine the things you find. In this example you'll see I'm using pyasn to go from the domain to the IP to the ASN, and then I can create a graph.
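The domain-to-IP-to-ASN graph can be sketched without any third-party packages. This is a hedged illustration, not the talk's code: it swaps NetworkX for a plain adjacency dictionary so it runs anywhere, and the `(domain, ip, asn)` records are hypothetical, precomputed values. In practice the IP would come from DNS resolution and the ASN from something like pyasn's `lookup(ip)`, which returns an `(asn, prefix)` pair.

```python
from collections import defaultdict

# Hypothetical, precomputed lookups (documentation IPs and private-use ASNs).
RECORDS = [
    ("paypal.com", "192.0.2.10", 64500),
    ("www.paypal.com", "192.0.2.10", 64500),
    ("grumpycat.example", "198.51.100.7", 64501),
]

def build_graph(records):
    """Undirected adjacency sets linking domain -> IP -> ASN nodes."""
    graph = defaultdict(set)
    for domain, ip, asn in records:
        asn_node = f"AS{asn}"
        graph[domain].add(ip)
        graph[ip].update({domain, asn_node})
        graph[asn_node].add(ip)
    return graph

def neighbors_within(graph, start, hops):
    """Breadth-first walk: every node reachable from start in <= hops edges."""
    seen, frontier = {start}, {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph[node]} - seen
        seen |= frontier
    return seen - {start}

g = build_graph(RECORDS)
print(sorted(neighbors_within(g, "paypal.com", 2)))
```

Walking two hops from a domain surfaces its sibling domains on the same IP and the hosting ASN, which is the kind of pivot the graph view is for.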
Here it's a very simple graph; it can get more complex, like this one, or prettier, and you can arrange it in all sorts of ways. I'm also using some paid tools. I'm using OpenDNS, since I work there. This isn't a vendor talk, but I'm using it as an example because it has good content categorization and security categorization. It has an API, which I use heavily, but you can also just search a domain and see all of its traffic, who owns the domain, passive DNS information, and a whole bunch of other things you can pivot off of.
It's not the entire internet, but it's about 3% of the internet, so you get a good idea of what's going on. I'm also using VirusTotal, which you can use for free, but the free API is limited to four requests per minute. In this example, if I put in a domain (the same domain; I'm not sure why I chose that one), you get a bunch of hits, or samples, associated with it. You can pull those down and get other information, and with the API you can grab all of it and work with it as you need.
Now let's talk about creating a baseline. First, I talked about counting things; I'm going to remove what I think is popular. I'll start with some data I took from one of the OpenDNS resolvers. It looks like this: it's just a bunch of stuff. You can see the domains in there, the time-to-live values, the record type (A record or whatever), the IP, and the ASN. Let's go ahead and try it: anything seen more than 10 times I write to a file called normal_traffic.txt, and anything seen fewer than three times goes to suspicious_traffic.txt.
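The more-than-10 / fewer-than-3 split could be written like this. A minimal sketch, assuming the input is simply an iterable of queried domain names already pulled out of the resolver log; the file names follow the ones described above, and the sample data is hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path

NORMAL_ABOVE = 10     # seen more than this many times -> "normal"
SUSPICIOUS_BELOW = 3  # seen fewer than this many times -> "suspicious"

def split_by_frequency(domains, out_dir="."):
    """Write frequent and rare domains to two text files; return both lists."""
    counts = Counter(d.rstrip(".").lower() for d in domains if d)
    normal = sorted(d for d, n in counts.items() if n > NORMAL_ABOVE)
    suspicious = sorted(d for d, n in counts.items() if n < SUSPICIOUS_BELOW)
    out = Path(out_dir)
    (out / "normal_traffic.txt").write_text("\n".join(normal) + "\n")
    (out / "suspicious_traffic.txt").write_text("\n".join(suspicious) + "\n")
    return normal, suspicious

# Hypothetical sample: one chatty domain, one rare one, one in between.
sample = ["time.google.com."] * 12 + ["winestyle.ru"] + ["cdn.example"] * 5
with tempfile.TemporaryDirectory() as tmp:
    normal, suspicious = split_by_frequency(sample, tmp)
    print(Path(tmp, "suspicious_traffic.txt").read_text())
```

Note the gap between the thresholds: a domain seen 3 to 10 times lands in neither file, so it is worth deciding explicitly where that middle band should go.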
It's only nine minutes of data, so it's not a lot of data, but it's a lot of domains: over 664,000. When I run it, the suspicious traffic file is 6.4 megabytes and the normal file is 247 kilobytes, so it really didn't help a lot, but maybe it helped a little. If you look at the normal traffic, you do get an idea of what normal looks like: a lot of Google, DNS, and Network Time Protocol stuff. In the suspicious file you see things that are more suspicious, like winestyle.ru, a Russian site; anything Russian, I'm suspicious of right away. So I can look at that domain's traffic.
To me, it doesn't look like anything crazy is going on, just based on eyeballing it (I'm still doing this manually, of course). Then there's this other domain here, a .kz domain, which is suspicious. I look at it, and the traffic has kind of normalized, but OpenDNS has it blocked for Locky, and if you go to the page, it's been taken down. All right, so I have a better idea, which is to use a top-domains list. Until recently, Alexa provided the top 1 million domains that people visit in the world.
Then they started charging for it, so Cisco decided that, since we at OpenDNS had all that data, we would just give it to people for free. There are other things you can do, like searching Wikipedia and scraping it to pull the top domains, but if you want the free list, you can download it: it's basically a CSV of domains ranked by how often they're visited over a certain amount of time, and I think it's updated every day. So I decided to use that. I basically open it and assign it to a list, which I then reference when I'm running through all the domains in whatever file or streaming network traffic I have.
I'll skip the Python details. When I run my data through that, I get two files, in_top_1_million.txt and not_in_top_1_million.txt, and the not-in-top-1-million file is still bigger. But I get a good result: on the left side there's a lot of Google and other normal stuff, and on the right side a lot of potentially suspicious stuff. It's still a huge list of domains to go through and look at, so there have to be other things you apply to it.
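Since the Python details were skipped, here is one way the top-1-million filter might look. This is a sketch, not the speaker's code: it assumes a two-column `rank,domain` CSV with no header (the layout I understand Umbrella's `top-1m.csv` to use; check your copy), takes the domain from the last column, and matches names exactly. It loads the list into a `set` rather than a list, so each membership test is a hash lookup instead of a scan over a million entries.

```python
import csv
import io

def load_top_domains(csv_file):
    """Read a top-sites CSV into a set; assumes the domain is the last column."""
    return {row[-1].strip().lower().rstrip(".") for row in csv.reader(csv_file) if row}

def partition(domains, top):
    """Split observed domains into (in top list, not in top list), keeping order."""
    in_top, not_in_top = [], []
    for d in domains:
        (in_top if d.lower().rstrip(".") in top else not_in_top).append(d)
    return in_top, not_in_top

# Tiny stand-in for the real million-row file.
TOP_SAMPLE = io.StringIO("1,google.com\n2,netflix.com\n3,microsoft.com\n")
top = load_top_domains(TOP_SAMPLE)
in_top, not_in_top = partition(["google.com", "winestyle.ru", "Netflix.com."], top)
print(not_in_top)
```

Exact matching means `mail.google.com` only counts as "top" if that exact name is in the list; matching on a parent domain instead would cut more noise but also hide lookalike subdomains.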
But I've cut out a huge chunk of it. Then, to add on to this, we can do some security categorization. You usually do need a third-party service to tell you whether something is categorized as bad. In this case, I can see a bunch of botnet and dynamic DNS domains, labeled that way by whatever system; I pull this in from VirusTotal and from OpenDNS. You get a lot of blanks, though, so you might need to use multiple third parties. You could also look at the bandwidth to get an idea of what the traffic looks like on a normal day.
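Because any one provider leaves a lot of blanks, it helps to merge labels from several. A minimal sketch: each provider's results are assumed to have already been fetched and reshaped into a hypothetical `domain -> list of category strings` dict (no real VirusTotal or OpenDNS API calls are made here), and blank labels are skipped so that a domain is "uncategorized" only if every source came back empty.

```python
def merge_categories(*sources):
    """Combine per-domain labels from several providers, skipping blanks.

    Each source is a dict mapping domain -> list of category strings.
    Label order follows the order the sources are passed in.
    """
    merged = {}
    for source in sources:
        for domain, labels in source.items():
            bucket = merged.setdefault(domain, [])
            for label in labels:
                label = label.strip()
                if label and label not in bucket:
                    bucket.append(label)
    return merged

# Hypothetical provider outputs.
provider_a = {"bad.example": ["Botnet"], "blog.example": []}
provider_b = {
    "bad.example": ["Dynamic DNS", "Botnet"],
    "blog.example": [""],
    "news.example": ["News/Media"],
}
merged = merge_categories(provider_a, provider_b)
uncategorized = sorted(d for d, labels in merged.items() if not labels)
print(uncategorized)
```

The `uncategorized` list is the natural queue for a third provider or for manual review.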
The problem is, and I'm sorry to do that to you, that this is random data from multiple sources: nine minutes of data from a resolver that serves the whole world, so it's hard to do behavioral analysis on it. So we're going to narrow it down and try with one organization. This is just one log file, but it's a lot of lines of text; I think it's 24 hours. Looking at it, you can see the time and the domains. There's a lot of DirecTV in there; I don't know why. Anyway, I might go in and manually clean out all those DirecTV domains, go through it again, and keep cleaning it. I'm still doing this manually, by the way.
Then I get down to this list of minimally visited domains, and I can go look at those. There are some mistakes in there, like a bare "com" and a lone dot, but whatever. There's a bunch of Yahoo, which I can remove, and I have defcon.org, which is cool. It took a little while to do that. Then I wanted to try a streaming situation, where I'm capturing traffic and running stuff on it automatically. At first I ran tcpdump on my home system, attached to a network tap, just watching the traffic and writing it out as a tcpdump log.
It looks like this. It's a mess, but there are domains and timestamps in there, which are the things I'm interested in, so I pull those out. I run them through my top-domains filter, I'm down to the remaining ones, and I can find slightly more suspicious activity. It's kind of a weird graph, but I can see some domains, and also IPs, because of how tcpdump logs them. Some of them look more suspicious. This one, for example, is Private Internet Access, which is a VPN provider I use.
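Pulling timestamps and queried names out of a tcpdump text log can be done with one regular expression. This sketch assumes the default text output of something like `tcpdump -n port 53`, where a query line looks like the one in the comment; tcpdump's output varies with flags and versions, so treat the pattern as a starting point. Response lines have no `?` after the record type, so they are skipped.

```python
import re

# Assumed query-line shape from `tcpdump -n port 53`:
# 14:02:11.123456 IP 192.168.1.23.51515 > 192.168.1.1.53: 4242+ A? www.example.com. (33)
QUERY_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s.*?\s(?P<qtype>[A-Z]+)\? (?P<name>\S+?)\.?\s"
)

def extract_queries(lines):
    """Yield (timestamp, record type, lowercased name) for each DNS query line."""
    for line in lines:
        m = QUERY_RE.match(line)
        if m:
            yield m["ts"], m["qtype"], m["name"].lower()

# Hypothetical captured lines: two queries and one response.
SAMPLE = [
    "14:02:11.123456 IP 192.168.1.23.51515 > 192.168.1.1.53: 4242+ A? www.example.com. (33)",
    "14:02:11.130000 IP 192.168.1.1.53 > 192.168.1.23.51515: 4242 1/0/0 A 93.184.216.34 (49)",
    "14:02:12.000001 IP 192.168.1.23.40000 > 192.168.1.1.53: 17+ AAAA? privateinternetaccess.com. (43)",
]
queries = list(extract_queries(SAMPLE))
print(queries)
```

The extracted names can then go straight through the same top-domains filter used for the resolver data.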
But if you see that in your network and you're not expecting it, that may be suspicious. So you can at least get an idea of what's going on within the time frame. That's me when I don't want to be watched, even though here I'm watching myself. You can also do things like compare activity. This is still the tcpdump data, and it's kind of an awkward graph, but each one is a day, from zero to 24 hours, and if you get enough of these, you can get an idea of when I might be home or what I might be doing. You have the advantage of knowing, because I'm telling you what I did: here I was at a concert.
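The zero-to-24-hour view is essentially a histogram of events by hour of day. A minimal sketch, assuming timestamps have already been extracted as `YYYY-MM-DD HH:MM:SS` strings (the sample values are made up); the ASCII bars stand in for the graphs shown in the talk.

```python
from collections import Counter
from datetime import datetime

def hourly_profile(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count events per hour of day (0-23): a crude 'when is someone active' view."""
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    return [hours.get(h, 0) for h in range(24)]

def ascii_chart(profile, width=40):
    """One line per hour, bar length scaled to the busiest hour."""
    peak = max(profile) or 1  # avoid dividing by zero on an empty day
    return "\n".join(
        f"{h:02d} {'#' * round(width * n / peak)}" for h, n in enumerate(profile)
    )

# Hypothetical events spread across two days.
timestamps = [
    "2017-07-18 08:15:00",
    "2017-07-18 08:45:10",
    "2017-07-18 21:05:00",
    "2017-07-19 21:30:00",
]
profile = hourly_profile(timestamps)
print(ascii_chart(profile))
```

Computing one profile per day, rather than pooling days as here, is what lets you line days up and spot "at work", "out of town", or "at a concert" patterns.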
That day is kind of flat; that's someone else in my house. I don't know if the cats go online, but they might. If you keep going, you can get an idea of when I might be at work and when I might be home. Looking at each individual day doesn't give you a lot, but spread out, it gives you something. The colors changed because I used a different server to process this. Here we're out of town, so it's kind of quiet, and then I'm home, and that looks much more like what I'd expect to see in a general graph. Then I'm home right there, and then gone at work. I don't know why this one looks so weird.
Here I'm at a concert again, and then I'm home, and Netflix spikes up because I'm too pumped up from the concert. You can also look at it more like this, which is a slightly better version, where maybe you can see the patterns in my traffic. It's still not really giving you behavior in the sense of what I'm visiting and where I'm going, but you're getting an idea of when I'm going there. Then I decided tcpdump is difficult for this, so I tried using a DNS server on my network: I installed Pi-hole on a Linux server and started sending its log files off to a place where I could analyze them every day.
The data goes from January 29th through July 18th, the last time I updated it (I was going to do it last night). I'm saving this stuff to a database; in my case, MongoDB. I insert the logs manually, though you can set this up to run automatically, and when I insert them manually, the script prints out what it's doing. Then you can present the data in multiple ways. I tried a few different options: InfluxDB, which is a time-series database, with Grafana to display things. I also used Logstash, Elasticsearch, and Kibana, which requires a lot of configuration.
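Before anything goes into MongoDB, each Pi-hole log line has to become a document. This sketch assumes the dnsmasq-style query line shown in the comment (the format Pi-hole's query log has used; newer versions may differ) and only handles `query[...]` lines. Syslog-style timestamps carry no year, so the caller supplies one; the sample lines are hypothetical.

```python
import re
from datetime import datetime

# Assumed dnsmasq-style line from a Pi-hole query log, e.g.
# "Jul 18 21:04:33 dnsmasq[512]: query[A] www.netflix.com from 192.168.1.50"
QUERY_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) dnsmasq\[\d+\]: "
    r"query\[(?P<qtype>\w+)\] (?P<domain>\S+) from (?P<client>\S+)"
)

def parse_pihole(lines, year):
    """Turn query lines into dicts ready for a document store.

    Non-query lines (forwarded, reply, cached, ...) are skipped.
    """
    docs = []
    for line in lines:
        m = QUERY_RE.match(line)
        if not m:
            continue
        docs.append({
            "ts": datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S"),
            "qtype": m["qtype"],
            "domain": m["domain"].lower(),
            "client": m["client"],
        })
    return docs

SAMPLE = [
    "Jul 18 21:04:33 dnsmasq[512]: query[A] www.netflix.com from 192.168.1.50",
    "Jul 18 21:04:33 dnsmasq[512]: forwarded www.netflix.com to 127.0.0.1",
    "Jul  8 07:00:01 dnsmasq[512]: query[AAAA] ntp.ubuntu.com from 192.168.1.20",
]
docs = parse_pihole(SAMPLE, year=2017)
print(len(docs), docs[0]["domain"])
```

With pymongo, a list like `docs` could be passed to `collection.insert_many(docs)`; storing `ts` as a real `datetime` (a BSON date) keeps later time-range queries simple.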
But the primary thing I'm using now is custom work with Flask, D3, and Plotly. If you use InfluxDB, you can get really pretty time-series graphs like this, and you get a much better idea of when I'm home and what I'm doing; or, instead of me at home, maybe a thousand people at work, and what they're doing and when. With Kibana and Elasticsearch, you can search things and build custom queries, along with Logstash, and of course you can create cool graphs.
Now I'm getting into the categorization of things: finding out the primary categories visited on my network, or on the network you're watching. I also look at locations, because people love maps. Maybe you're expecting everything to go to the US or somewhere in Europe and it's all going to China, or you have one little domain in China, and that's something interesting to look at. As for the custom setup, I'm using scripts to automatically send data into the MongoDB database, and then I'm using Flask to automatically process it from MongoDB and serve the content. So I can do things like my count of requests.
I can narrow it down; this is pretty ugly, but you get the idea of where it's headed. You can do some security categorization here: 40 percent is "not determined" (this is kind of a bad chart, too), but there are a lot of whitelisted domains. In this case I didn't have any blacklisted domains in my security categorization, meaning domains that were considered bad. This one is more interesting: with the OpenDNS Investigate categories, you've got this percentage of software/technology, this percentage of news/media, and so on. You can also use VirusTotal categories, which provide a lot more information; I guess they have a better way of categorizing.
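A breakdown like "40 percent not determined" is just a share-of-total per category. A minimal sketch, assuming each domain has already been mapped to a single category string (the mapping below is hypothetical), with empty or missing labels counted as "Not determined".

```python
from collections import Counter

def category_percentages(domain_categories):
    """Percent of domains per category; unlabeled domains count as 'Not determined'."""
    counts = Counter(cat or "Not determined" for cat in domain_categories.values())
    total = sum(counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in counts.most_common()}

# Hypothetical domain -> category mapping.
labels = {
    "a.example": None,
    "b.example": "",
    "c.example": "Software/Technology",
    "d.example": "Software/Technology",
    "e.example": "News/Media",
}
print(category_percentages(labels))
```

These per-domain shares differ from per-request shares; weighting by request counts instead would make a single chatty CDN dominate the pie chart.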
I'm trying to find an open-source source of categorization that is broader and also free. Then I thought it would be really interesting to make a timeline of categories instead of a timeline of domains visited. So I started taking those domains, capturing their categories, attaching a time, and getting things like this: two search engines at this time, six podcasts at that time, and so on. Then I put that into a kind of ugly graph, but it gives you an idea; you can sort of follow along with my day: between this time and this time there's a lot of this activity, and between this time and that time there's a lot of that activity.
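The category timeline amounts to counting visits per (time bucket, category) instead of per domain. A minimal sketch: `categories` is a hypothetical precomputed `domain -> category` lookup, the events are made-up `(datetime, domain)` pairs, and the bucket is one hour (change the `strftime` pattern for coarser or finer buckets).

```python
from collections import Counter, defaultdict
from datetime import datetime

def category_timeline(events, categories, bucket="%Y-%m-%d %H:00"):
    """Count visits per (time bucket, category) instead of per domain."""
    timeline = defaultdict(Counter)
    for ts, domain in events:
        label = categories.get(domain, "Uncategorized")
        timeline[ts.strftime(bucket)][label] += 1
    return dict(timeline)

# Hypothetical lookup and events.
categories = {
    "google.com": "Search Engines",
    "bing.com": "Search Engines",
    "libsyn.com": "Podcasts",
}
events = [
    (datetime(2017, 7, 18, 9, 5), "google.com"),
    (datetime(2017, 7, 18, 9, 40), "bing.com"),
    (datetime(2017, 7, 18, 21, 0), "libsyn.com"),
    (datetime(2017, 7, 18, 21, 10), "unknown.example"),
]
timeline = category_timeline(events, categories)
for hour in sorted(timeline):
    print(hour, dict(timeline[hour]))
```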
If you're looking at more than one person, or even at just one person, you might be able to see a lot of interesting things. The first version I made was this, where you get a count of where you visited, a graph, and a little time-series chart on a web page. It says "not found" because there's no bad traffic in that example, but if there were, there'd be a little spike in the traffic for the domain or IP that was visited.
Then I started building this Flask site, which is actually better looking than this; I'll show you in a moment. But first I want to show you that you can do some category mapping. This one is kind of weird: Grumpy Cat is a node, and you can go from, say, PayPal to the ASN to the IP. You can draw big graphs showing which domains are tied to which category, so you can get an idea of what's going on. It's an interesting way to look at the environment, and it can look like this.
Sorry it's so grainy. In this case I'm actually looking at subdomains: the bottom one is a domain called libsyn.com, and I've got all its subdomains tied to it, which can be useful. This one looks terrible, so let's ignore it. You can also just do mapping. Now I'll show you some of the most recent stuff I've been doing. Let's see here: I'm going to start MongoDB and start the little Flask site. I haven't really optimized it, so be prepared for slowness. It's hard to do this while holding a microphone.
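The subdomain view (libsyn.com with all its subdomains tied to it) starts from grouping names under a parent domain. This sketch uses a deliberately naive rule, the last two labels, which is wrong for multi-part suffixes like `co.uk`; a Public Suffix List based library would be needed for correct registered domains. The input names are hypothetical.

```python
from collections import defaultdict

def naive_parent(domain):
    """Last two labels, e.g. 'traffic.libsyn.com' -> 'libsyn.com'.

    Naive: wrong for multi-part public suffixes such as 'co.uk'.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def group_subdomains(domains):
    """Map each parent domain to the sorted names seen under it."""
    tree = defaultdict(set)
    for d in domains:
        tree[naive_parent(d)].add(d.lower().rstrip("."))
    return {parent: sorted(children) for parent, children in tree.items()}

groups = group_subdomains(
    ["traffic.libsyn.com", "media.libsyn.com", "libsyn.com", "www.paypal.com"]
)
for parent, children in groups.items():
    for child in children:
        print(f"{parent} -> {child}")  # one edge per line, ready for a graph tool
```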
You can do a combined view of domains in and not in the top domains list, with security categories over here, or you can do a timeline of security categories. This is all the domains in my group, this is just the top domains, and this is the not-in-top-domains set. Did you see? I've got two bad domains right here, so on that date I visited two places that were known bad according to at least one or two of the third parties I'm using. You can also get a bandwidth view over here. I'll load this one in a different tab.
It's a map, so it's a little slower. You can look at all the domains in the top 1 million, sort them, and look at the ASNs, and you can search by status, which might be negative one. Oh, there are no bad domains in this particular not-in-top-1-million set; or maybe that's not the search field I have. But you can set it up to do things like that. You can also build maps like this; I actually ran this again on my data just before this presentation. If there were blocked domains, or bad or suspicious domains, they would pop up in red, and you could click on them to get more information.
You can also do things like category metrics. This is a little messed up at the moment, but you can get an idea of the categories being visited in pie-chart form or another form. This one is a little more interesting: here's the tree of categories, and you can click on them and drill down to the domains in each category that's been seen inside the environment. That's a really quick introduction to that part. I've got some other stuff, too.
Let's say you want to take some pcaps, break them apart, and play with them. You can open them in Wireshark and look at them; it looks like this, and I'm sure you're all familiar with it. You can look at the HTTP traffic and follow the stream to see what's happening. In this case, someone is posting email and password information in this pcap. An IDS signature might catch this HTTP POST, but you can also run scripts against this kind of network traffic and pull out all the GETs and all the POSTs.
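Once HTTP request payloads have been pulled out of a pcap (for example with tshark or Scapy, not shown here), flagging POSTs that carry credential-looking fields needs only the standard library. This is a hedged sketch: it assumes each payload is one complete, unchunked HTTP/1.x request with a form-encoded body, and `SENSITIVE_KEYS` is a hypothetical watch list you would tune.

```python
from urllib.parse import parse_qs

SENSITIVE_KEYS = {"password", "passwd", "pass", "pwd"}  # hypothetical watch list

def parse_http_request(payload: bytes):
    """Split a raw HTTP/1.x request into method, path, headers, and body."""
    head, _, body = payload.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers, body

def flag_credential_posts(payloads):
    """Yield (host, path, matched field names) for POSTs with sensitive fields."""
    for payload in payloads:
        method, path, headers, body = parse_http_request(payload)
        if method != "POST":
            continue
        fields = parse_qs(body.decode("latin-1"))
        hits = SENSITIVE_KEYS & {k.lower() for k in fields}
        if hits:
            yield headers.get("host", "?"), path, sorted(hits)

# Hypothetical reassembled requests.
payloads = [
    b"POST /login HTTP/1.1\r\nHost: mail.example\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n\r\n"
    b"email=a%40b.example&password=hunter2",
    b"GET /inbox HTTP/1.1\r\nHost: mail.example\r\n\r\n",
    b"POST /search HTTP/1.1\r\nHost: mail.example\r\n\r\nq=defcon",
]
flagged = list(flag_credential_posts(payloads))
print(flagged)
```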
You can write scripts that automatically look for suspicious things like this. Scrolling down, you can see it's a better way to get this out than just opening things in Wireshark. You can also do Active Directory logs, which is a little interesting. We had a slice of Active Directory logs from a former workplace, and I played with them a little. This is the first attempt, and you can do more things like this, where you build a time series of when logons are failing, when people are logging in, and when they aren't.
It might look like this: you pull out your logs, get them into the shape you need, and then run the included Python script to generate this data.
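An hourly logon/logoff/failure series can be built from a CSV export of Windows Security events. This sketch is not the included script: it assumes a hypothetical export with `TimeCreated`, `EventID`, and `Account` columns, and uses the standard Security event IDs 4624 (successful logon), 4625 (failed logon), and 4634 (logoff); other event IDs are ignored.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

EVENT_NAMES = {"4624": "logon", "4625": "failed_logon", "4634": "logoff"}

def events_per_hour(csv_text, ts_format="%Y-%m-%d %H:%M:%S"):
    """Count logon, failed-logon, and logoff events per hour."""
    series = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        kind = EVENT_NAMES.get(row["EventID"].strip())
        if kind is None:
            continue  # not one of the three event types we chart
        hour = datetime.strptime(row["TimeCreated"], ts_format).strftime("%Y-%m-%d %H:00")
        series[hour][kind] += 1
    return dict(series)

# Hypothetical export.
SAMPLE = """TimeCreated,EventID,Account
2017-07-18 08:01:00,4624,alice
2017-07-18 08:03:00,4625,bob
2017-07-18 08:04:00,4625,bob
2017-07-18 17:30:00,4634,alice
2017-07-18 17:31:00,4688,alice
"""
series = events_per_hour(SAMPLE)
for hour in sorted(series):
    print(hour, dict(series[hour]))
```

A burst of `failed_logon` in an hour where the baseline is near zero is exactly the kind of thing this view is meant to surface.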
Then you can look at it and see if it's what you're expecting: when accounts are logging off, when they're logging on, when logons fail, things like that. I thought it would also be interesting to try the same thing with auth.log analysis. This is a really simple one: I grabbed the auth.log from one of my servers, up to today, and pulled only the invalid SSH login attempts to get an idea of the behavior of what's going on. "Behavior" is a really loose term there, but you can grab more out of those logs.
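Pulling invalid SSH login attempts out of auth.log is a single pattern match. This sketch assumes OpenSSH's usual `Invalid user <name> from <ip>` message (newer versions append `port <n>`, older ones don't; the pattern handles both). The sample lines use made-up hosts and documentation IP ranges.

```python
import re
from collections import Counter

INVALID_RE = re.compile(
    r"sshd\[\d+\]: Invalid user (?P<user>\S*) from (?P<ip>[0-9a-fA-F:.]+)"
)

def invalid_ssh_attempts(lines):
    """Count invalid-user attempts by source IP and by attempted username."""
    by_ip, by_user = Counter(), Counter()
    for line in lines:
        m = INVALID_RE.search(line)
        if m:
            by_ip[m["ip"]] += 1
            by_user[m["user"] or "<empty>"] += 1
    return by_ip, by_user

SAMPLE = [
    "Jul 18 03:12:45 web1 sshd[2211]: Invalid user admin from 203.0.113.5 port 51234",
    "Jul 18 03:12:49 web1 sshd[2213]: Invalid user admin from 203.0.113.5 port 51240",
    "Jul 18 03:13:02 web1 sshd[2215]: Invalid user oracle from 198.51.100.9",
    "Jul 18 03:14:00 web1 sshd[2220]: Accepted publickey for me from 192.0.2.1 port 50000 ssh2",
]
by_ip, by_user = invalid_ssh_attempts(SAMPLE)
print(by_ip.most_common(), by_user.most_common())
```

Adding the syslog timestamp to each match would turn these counts into the same kind of time series used for the DNS and Active Directory data.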
Then I thought something more interesting would be this: you have a log file or streaming data, and you want to see the traffic broken down by each client inside that network. In this example I have the logs from my Infoblox device. It looks really messy, and I don't know why it looks that way, but you can see there's a client IP, where the client went, and some other information that's useful but not what I'm using right now. When I process those, the script goes through all the different clients it sees and creates an HTML file for me that I can go use.
This is a simple example, but it has a bandwidth graph for each individual client, so I can get an idea of what's going on and then do further processing on that. I put all of this on GitHub, so I won't give you the demo. This is a more extreme example, from a much, much larger log file with all those clients. I've written a lot of code for this, and it doesn't look as bad as this anymore; I've fixed it up and cleaned it up a little.
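A per-client report like the one described can be generated from BIND-style query log lines. This sketch assumes the line shape in the comment (Infoblox query logging is BIND-based, but verify against your own device's output), counts queries rather than bytes because DNS logs don't carry bandwidth, and writes plain HTML with CSS bars instead of a charting library. The sample lines are hypothetical.

```python
import html
import re
from collections import Counter, defaultdict

# Assumed shape, e.g.:
# 18-Jul-2017 09:15:02.123 client 10.0.0.23#53422 (www.example.com): query: www.example.com IN A + (10.0.0.1)
QUERY_RE = re.compile(
    r"client (?:@\S+ )?(?P<ip>[0-9.]+)#\d+.*?query: (?P<domain>\S+) IN (?P<qtype>\S+)"
)

def per_client_counts(lines):
    """Map client IP -> Counter of queried domains."""
    clients = defaultdict(Counter)
    for line in lines:
        m = QUERY_RE.search(line)
        if m:
            clients[m["ip"]][m["domain"].lower().rstrip(".")] += 1
    return clients

def render_html(clients, top_n=5):
    """One section per client: total queries plus a bar per top domain."""
    parts = ["<html><body><h1>DNS queries per client</h1>"]
    for ip in sorted(clients):
        counts = clients[ip]
        peak = max(counts.values())
        parts.append(f"<h2>{html.escape(ip)} ({sum(counts.values())} queries)</h2><table>")
        for domain, n in counts.most_common(top_n):
            width = int(200 * n / peak)
            parts.append(
                f"<tr><td>{html.escape(domain)}</td><td>{n}</td>"
                f'<td><div style="background:#69c;width:{width}px">&nbsp;</div></td></tr>'
            )
        parts.append("</table>")
    parts.append("</body></html>")
    return "\n".join(parts)

LOG = [
    "18-Jul-2017 09:15:02.123 client 10.0.0.23#53422 (www.example.com): query: www.example.com IN A + (10.0.0.1)",
    "18-Jul-2017 09:15:03.456 client 10.0.0.23#53423 (www.example.com): query: www.example.com IN AAAA + (10.0.0.1)",
    "18-Jul-2017 09:16:10.000 client 10.0.0.23#53424 (cdn.example.net): query: cdn.example.net IN A + (10.0.0.1)",
    "18-Jul-2017 09:17:00.000 client 10.0.0.42#40001 (updates.example.org): query: updates.example.org IN A + (10.0.0.1)",
]
clients = per_client_counts(LOG)
page = render_html(clients)
print(page.count("<h2>"), "clients rendered")
```

Writing `page` to a file (for example with `pathlib.Path("clients.html").write_text(page)`) gives a report you can open in any browser.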
You know how it is. So, some things to think about. It's not a perfect solution: intrusion detection, AV, and user awareness of course still have their place. This is an ongoing process. I haven't even touched on machine learning, because I haven't been touched by machine learning yet; I'm still learning. But I hope I can make this into something kind of intelligent that decreases the work you have to do. That's the dream. If you do analyst work, I'm pretty sure that in 10 or 20 years we'll all be sysadmins again, because machines will be doing all that work for us. This approach can provide more visibility into your network, because there's so much happening that you don't know about, and it could possibly alert you to anomalies before anything else does.
Just a disclaimer, because I love privacy: if you don't want to be watched, use a VPN on your network. In the US there's a thing now where ISPs can watch our data and sell it to advertisers. So if you don't want to be watched, use your own DNS server and send it through that VPN, and you can learn a lot about whoever you're watching. Here's the GitHub, up here; it has all my stuff in it. I've got a personal email and a work email listed.
You can see all these slides, as you saw them today, at that location, along with the repository. It is a work in progress, so if you're interested, keep track of it and feel free to contribute. I have a lot more work to do, but it's fun. Any questions? A quick one?
Oh, okay, I think that's it. [Applause]