Alex Kirk - Incident response and threat hunting using Bro/Zeek data

Name: Alex Kirk - Incident response and threat hunting using Bro/Zeek data
Uploaded: 2019-10-06
Duration: 29 min 14 s
Description: Alex Kirk presents practical incident response and threat hunting techniques using Zeek (formerly Bro), an open-source network metadata extraction tool. The talk walks through real-world scenarios—from investigating suspicious file downloads to detecting DNS tunnels and malware communication pattern

BSides Augusta · 2019 · 29:14 · 515 views · Published 2019-10

Speakers

Alex Kirk

Tags

Category: Technical

Topic: Detection Engineering, DFIR, Threat Intel

Difficulty: Intermediary

Team: Blue

Research: Case Studies and Incidents Analysis, Technical Deep-dives

Style: Talk

Mentioned in this talk

Tools used

Elasticsearch, Nessus, Snort, Splunk, Wireshark, Zeek

Service

VirusTotal

Concepts

JA3

About this talk

Alex Kirk presents practical incident response and threat hunting techniques using Zeek (formerly Bro), an open-source network metadata extraction tool. The talk walks through real-world scenarios—from investigating suspicious file downloads to detecting DNS tunnels and malware communication patterns—demonstrating how to extract forensic signals from network data without storing full packet captures. Kirk covers detection methods including file analysis, user-agent anomalies, SSL/TLS fingerprinting, and integration with the MITRE ATT&CK framework for systematic threat detection.

Transcript [en]

Thank you, folks. So quickly here, I'm gonna start out with who I am and why it's worth listening to me. I'm an open source security guy. I have been with all the projects listed there in a professional capacity: ten years in research with Sourcefire before I came to the dark side and became a sales engineer right at the Cisco acquisition, and currently I cover the seven-state Southeast for Tenable, or sorry, for Corelight, previously with Tenable. I'm also a big fan of the absurd, especially in the wild, so my wife actually sent me this picture from the gas station this morning. That's a Windows boot error on the pump. I thought that was an entertaining little bit

for the crowd here. So, what the heck is Zeek? I know some people in the room know this, but for those who don't, it is an open source project that came out of Berkeley all the way back in 1995. Vern Paxson was one of the guys that helped write tcpdump, and he was using that for forensics on the Berkeley campus and realized what a crazy needle-in-a-haystack kind of problem that was. NetFlow, on the other end, didn't really give him the data he needed, because yeah, you get this IP on that port for this many bytes, but no real depth in terms of what's actually going on. And so he wrote Zeek,

which at the time was named Bro, for the Big Brother kind of connotation. It's a layer 7 metadata extraction tool that is designed for security purposes. The whole idea is that it gives you all the richness you need in order to be able to track down an incident, but isn't recording every single YouTube stream that crosses the wire and all the other junk that you're going to get in a pcap solution. It actually comes out typically at about 1% the size of packet capture, so it's a lot easier to store for the long term than packet capture. So, talking a little bit about incident response and threat hunting, I want to start out by defining

those two terms to kind of level set for everybody in the room. Incident response is your typical reactive, hair-on-fire situation: you had a three-letter agency maybe knock on your door and say, "Hey, you're beaconing off to a C&C," or there's some kind of time-sensitive indicator of compromise that you need to go track down. The nice thing about it is that it's closed-ended, because you have an incident that you're going after. On the other end, threat hunting is a lot more open-ended. It's a proactive questioning of the data. You don't necessarily know that there's a problem going on, but you might have some kind of hypothesis about what would be normal versus abnormal in your network,

and you're going after those kinds of things and looking for that signal in the noise. The hard part about that, of course, is that because there are no set deadlines and there's no sense of urgency, it can often be deprioritized in large organizations. So we hope to actually make it a little bit easier to do on a regular basis, so that even folks who are budget and time strapped can get into that level of analytics. What I want to start with is walking through a sample incident response scenario. This is the classic: you've got a secretary who's looking for a specialized label template size on Google, clicks a link, gets a bunch of

popups, the computer's all slow, and they come to IT and say, "Yeah, I remember it was, I think it was called 'my business doc,'" but that's really all the information they can give. At that point you would come in and start looking at our data, here in Elastic, because hey, we're open-source friendly and we're big fans of Elastic. You can see, just by querying (the pointer, I guess not), all you've got to do is query this "my business doc" string that the secretary has given you, and you've got DNS records and HTTP records and all that sort of stuff confirming what was going on. If you dig a little bit deeper into there, you'll see that you've got in this

HTTP record the file name of what was actually retrieved, which comes back as this random string dot gif. What's interesting there is that the MIME type on that was actually application/x-dosexec, so that's probably an exploit kit or some other piece of software that's trying to bypass your generic proxy that is only bothering to look at the extension of a file and not actually dig into the magic and see what's going on. At that point you go, "Well, that's definitely suspicious, I need to dig a little further into this." One of the things that Zeek provides is the concept of UIDs that index different layers of a connection together, and in this

case we've got, in that red box, the file UID that allows you to go ahead and pivot and do a search into a file record that's going to give you the SHA-256 automatically. Because we're extracting the full file and doing analysis on it natively, you've got all the hashes pre-populated, you can tell that this is a PE file because that analyzer ran successfully, et cetera. At that point, if you're like me and you've got the VirusTotal extension in Chrome, it is literally a highlight, right click, and boom, you know exactly which piece of malware is on that box. So from an incident response perspective, you can go from something that takes,

you know, hours potentially, and cut that down to a matter of minutes, and save a lot of the cycles going through and touching different systems. The whole point of this really is that you've got network ground truth in a centralized format that is normalized, and it's all coming from one place. So many SIEMs have firewall logs and IDS logs, and Active Directory is giving off DNS information but it doesn't have the responses, and all of this data is difficult to actually put together into a coherent whole, whereas we try to make it a single unified way of looking at things that's gonna make your life easier.
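The pivot described above, from an HTTP record to its file record by way of the file UID, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logs are Zeek JSON output using the standard `resp_fuids`, `fuid`, `mime_type`, and `sha256` field names, the sample records and hostnames are made up, the `EXPECTED_MIME_PREFIX` table is a hypothetical stand-in for whatever extension policy you care about, and `sha256` is only populated when SHA-256 hashing is enabled on the sensor.

```python
import json
import os

# Assumption: the MIME prefix we would expect for a given URI extension.
EXPECTED_MIME_PREFIX = {".gif": "image/", ".jpg": "image/", ".png": "image/", ".txt": "text/"}

# Made-up sample records that use Zeek's JSON field names.
SAMPLE_HTTP_LOG = """\
{"uid": "CHhAvVGS1DHFjwGM9", "id.orig_h": "10.0.0.15", "host": "downloads.example.test", "uri": "/img/a9f3kq.gif", "resp_fuids": ["FdXyZ1aBcD"]}
{"uid": "CbNCgO1MzloHRsXY1", "id.orig_h": "10.0.0.22", "host": "www.example.test", "uri": "/logo.png", "resp_fuids": ["FqRsT2eFgH"]}
"""
SAMPLE_FILES_LOG = """\
{"fuid": "FdXyZ1aBcD", "mime_type": "application/x-dosexec", "sha256": "<sha256 from files.log>"}
{"fuid": "FqRsT2eFgH", "mime_type": "image/png", "sha256": "<sha256 from files.log>"}
"""


def parse_json_lines(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def find_mismatched_downloads(http_records, files_records):
    """Return downloads whose sniffed MIME type disagrees with the URI extension."""
    files_by_fuid = {rec["fuid"]: rec for rec in files_records}
    findings = []
    for http in http_records:
        path = http.get("uri", "").split("?", 1)[0]
        extension = os.path.splitext(path)[1].lower()
        expected = EXPECTED_MIME_PREFIX.get(extension)
        if expected is None:
            continue  # no expectation recorded for this extension
        for fuid in http.get("resp_fuids", []):
            file_rec = files_by_fuid.get(fuid)
            if file_rec is None:
                continue
            mime = file_rec.get("mime_type", "")
            if mime and not mime.startswith(expected):
                findings.append({
                    "client": http.get("id.orig_h"),
                    "uri": http.get("uri"),
                    "fuid": fuid,
                    "mime_type": mime,
                    "sha256": file_rec.get("sha256"),
                })
    return findings


findings = find_mismatched_downloads(
    parse_json_lines(SAMPLE_HTTP_LOG), parse_json_lines(SAMPLE_FILES_LOG)
)
for finding in findings:
    print(finding)
```

The hash in each finding is what you would hand to a reputation service, and the same `fuid` or hash can be searched across all file records to see which other hosts received the file.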

Some of the other things you can do in that incident response scenario are take that file and say, "Okay, who else on my network has touched this thing?" I know at Cisco, selling endpoint and email, having a record of what file touched what endpoints on the network was one of the hottest features that closed deals most often. Again, it's just natively here, because we're extracting all those files that cross the wire and doing this analysis for you. In this case we only have the one host that hit it, so it really was just a simple case of somebody clicked on a bad link and got hit by an exploit kit. Not a big deal. But if this was something that was

moving over SMB within the environment, that had been seen by a lot of folks, or maybe you had a broader phishing campaign, this allows you to really scope the nature of the problem and not say, "Well, I saw it on this box, but I saw it on that box too, and I don't know exactly how it was transferred between those boxes, and so now suddenly that entire subnet is suspect," as opposed to, "Well, I know concretely that only these systems have touched that file, and so they're the only ones I need to go and clean up." There's a lot of other specific digging you could do at this point. You could say, what other websites has that system

visited? What are the other DNS queries that have gone on past that, so you could dig up the C&C from a successful exploitation? All of the information about what that system did is going to be very easily queryable at that point, once you have a confirmed infection. So again, scoping your incident response is gonna make your life easier as a defender. But really, that's the simple part. The more fun part is when you start getting into threat hunting and digging that signal out of the noise. I've got a few different things I want to talk through on that, speaking specifically about the hypothesis that

you start with when you're doing a threat hunt. The first one is that, typically, people on a keyboard don't like to type really long names into a web browser. About 25 characters is roughly the limit for how much a human is gonna type into a browser before they've got a link shortener or some other thing going on. So what you would do at that point (and I've transitioned to Splunk here, because as it turns out doing lengths on the fly in Elastic is problematic, and hey, we as Corelight don't care which SIEM you use, we support all of them), what I've got here is a query that says, give me all of the DNS queries that are greater than 25

bytes for the domain name being queried, along with a count of how many times those were hit across a single connection. And yes, UDP is connectionless, but you can have pairs of systems and ports and essentially derive a connection at that point. If you look at the counts on the right, you see that a lot of these things are just onesie-twosie hits. They're things like this McAfee URL that is clearly encoding some kind of data on the back end, but that's fine, because it's your antivirus client and you trust it, and this actually resolves to IP space owned by McAfee. All right, that's weird, but it only did it once, and so it's not a big

deal. Where it gets more interesting is where you see this one has 13,000 hits that are all the same length. You want to dig into that at that point and say, "Well, what exactly is the query that's being run here?" And you can see that the subdomain off of this sweetcoldwater.com sure looks like a hex-encoded string; those are all legal hex characters. At that point you say, "All right, I need to go a little bit further. Let me look at what kind of answers were coming back on this." Oh look, they weren't even sending IP addresses back, they're sending more domain names back as answers, which, yeah, that's

totally legal. Google will do it for load balancing across its different servers. There's lots of legitimate use cases; that's why it's in the RFC. But in all of these responses, again, those subdomain strings are just hex bytes, and if you look closely at the counts, you'll see that these things are showing up once or twice per each of those subdomains. Across thirteen thousand hits, obviously that's enough entropy that you're transferring data. This has got to be a DNS tunnel of some kind, because there's no other reason to hit that many subdomains within a given domain, particularly with that much entropy, on strings that look like they're hex-encoded data anyway. So it's probably not that easy, necessarily.
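The long-query hunt walked through above was shown as a Splunk search; the same idea can be sketched offline in Python. This is a rough illustration, not the query from the slides. It assumes Zeek dns.log records in JSON with the standard `id.orig_h`, `id.resp_h`, and `query` fields, derives a pseudo-connection from the host pair only, and uses a deliberately naive `base_domain` helper (last two labels) that is a stand-in for a proper public-suffix lookup. The 25-character cutoff comes from the talk; the 16-character hex-label rule is an assumption.

```python
import re
from collections import defaultdict

MIN_QUERY_LEN = 25                              # cutoff quoted in the talk
HEX_LABEL = re.compile(r"^[0-9a-f]{16,}$")      # assumption: what counts as "hex-looking"


def base_domain(query):
    """Naive registered-domain guess: the last two labels of the name."""
    return ".".join(query.lower().rstrip(".").split(".")[-2:])


def summarize_long_queries(dns_records, min_len=MIN_QUERY_LEN):
    """Count long DNS queries per (client, resolver, base domain)."""
    stats = defaultdict(lambda: {"hits": 0, "unique": set(), "hex_like": 0})
    for rec in dns_records:
        query = rec.get("query", "")
        if len(query) <= min_len:
            continue
        key = (rec.get("id.orig_h"), rec.get("id.resp_h"), base_domain(query))
        entry = stats[key]
        entry["hits"] += 1
        entry["unique"].add(query.lower())
        first_label = query.split(".", 1)[0].lower()
        if HEX_LABEL.match(first_label):
            entry["hex_like"] += 1
    rows = [
        {
            "src": src,
            "dst": dst,
            "domain": domain,
            "hits": entry["hits"],
            "unique_queries": len(entry["unique"]),
            "hex_like": entry["hex_like"],
        }
        for (src, dst, domain), entry in stats.items()
    ]
    rows.sort(key=lambda row: row["hits"], reverse=True)
    return rows


# Made-up sample data: 50 hex-labelled lookups, one long but benign-looking
# lookup, and one short lookup that falls under the length cutoff.
sample_dns = [
    {"id.orig_h": "10.0.0.15", "id.resp_h": "10.0.0.1", "query": f"{i:032x}.sweetcoldwater.com"}
    for i in range(50)
]
sample_dns.append({"id.orig_h": "10.0.0.15", "id.resp_h": "10.0.0.1",
                   "query": "telemetry-3f9a-region-east.updates.example.com"})
sample_dns.append({"id.orig_h": "10.0.0.15", "id.resp_h": "10.0.0.1", "query": "www.example.com"})

rows = summarize_long_queries(sample_dns)
for row in rows:
    print(row)
```

A high hit count where nearly every query is unique and hex-looking is the pattern the talk flags; a single long query to a vendor domain lands at the bottom of the list.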

In the wild, you're not going to do this query and find a DNS tunnel your first day, but it's not hard to make this into a dashboard and take a look every morning and say, "Hey, that looks funny, I'm gonna spend five minutes digging into that data and seeing what's going on there." So the next one I want to talk about is user agent strings in HTTP. This is a personal favorite topic of mine, because back when I was with Sourcefire research, I stood up our first malware sandbox back in 2009, and one of the first things I did with all the packet captures out of that was I pulled out user agent

strings, because they were easily parsed by Wireshark and they're human readable. Initially I had figured, well, I'm gonna find people who've made a typo in the process of trying to have a user agent string that looks normal, and maybe I can write a signature around a misspelled Mozilla or something like that. But when I started digging into the pcaps, I was seeing user agents for Zbot and other pieces of malware that were actually declaring themselves right out in the open in the user agent string. So we wrote a number of signatures for Snort back in the day that were as simple as, do you see this user agent string? You have got malware

on your network, plain and simple. The reality of it is that with a simple Elastic query, or Splunk, or whatever the case may be, you can do a search for "what are the least common user agents on my network?" And I would bet anybody in this room $100 that if you got me the hundred most rare user agents out of an enterprise network, I'm gonna find two or three things in there that are gonna make you go "hmm." Whether it's, oh, something with a Unix command in it that looks like it's trying to do a weird injection attack, or it's a cloud mapping experiment that's probably a nice guy, because he's giving you contact information to tell

him what's going on if you don't want him scanning your box, or maybe it's just a scanning tool that some script kiddie is using that you probably want to go pay attention to if you're getting a bunch of hits for it. Really, there's a number of classes of suspect user agent strings that are worth digging into. You see a lot of scanners out there, like good old masscan, that can scan the whole internet from a single box within an hour on commodity hardware. A lot of those are good at telling you what they are. I've seen the Nessus user agent string, as a Tenable guy, in the wild, and

you know, if that's coming from inside your network, fine, whitelist it, it's the security team. If it's coming from outside your network, you probably have a problem on your hands. There's a number of bots and pieces of malware that, again, will just declare themselves out in the wild. Some of them are fine: Baiduspider is just China's Google indexing things, and it shows up in all your web logs. WWW-Mechanize might be somebody who's written a script to go scrape data off of your site and get competitive intel. But maybe you've got something like this packet capture from last month, where you've got somebody who's again declaring themselves as user agent Buran, which is some silly piece of

you know, trojan malware out there in the wild. But again, in the ten years since I opened up this sandbox at Sourcefire, people haven't bothered to do enough inspection of user agent data, so pieces of malware can continue to say "Hi, I'm a bot" and nobody notices. Of course, you're gonna get a lot of device information as well. If you've got a PlayStation 3 updater on the network, at my house that's fine; at your office it might not be. Your Samsung TV that is busy spying on you, you'll get to know about by the user agent strings. All that kind of data about just assets on your network is

easy pickings looking at this stuff. And of course, finally, sometimes you see straight-up exploit attempts in user agents. Shellshock is a classic example of that. You can see down there it's got the triggering conditions, with the braces and the semicolons and all that good foolishness that messes with bash. But even if you didn't know that, seeing a call to echo uname off to a TCP socket is probably a good indicator that this is something you want to go track down. And of course a number of IDSes or other tools might find that with a signature. The nice thing about us is we're going to give you all the context around that, all the

connections before and after, who that box has been talking to, so that it's real easy to validate: okay, fine, that server got hit with that attempt, but did it do anything funny afterwards? That is always the question, and having that at the ready makes your life simpler. Of course, I talked about how nobody focuses on this, and part of the problem is, in the process of doing this talk, I said, "Well, what exactly are the good sources of data out there?" There's a lot of web server block lists, like for .htaccess, for bots that don't pay attention to robots.txt. useragentstring.com is a really good list of legit strings.
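The "least common user agents" search mentioned earlier in this section is a simple stacking (frequency count) exercise. A minimal Python sketch follows, assuming Zeek http.log records in JSON with the standard `user_agent` field; the sample records and the `limit` default of 100 (the number quoted in the talk) are illustrative only.

```python
from collections import Counter


def rarest_user_agents(http_records, limit=100):
    """Return up to `limit` (user_agent, count) pairs, least common first."""
    counts = Counter(
        rec["user_agent"] for rec in http_records if rec.get("user_agent")
    )
    # Sort by ascending count, then alphabetically so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:limit]


# Made-up sample: one common browser string, a scanner, one self-declared
# malware string, and one request that carried no User-Agent header at all.
sample_http = (
    [{"id.orig_h": "10.0.0.15", "user_agent": "Mozilla/5.0 (sample desktop browser)"}] * 50
    + [{"id.orig_h": "203.0.113.7", "user_agent": "masscan/1.0"}] * 3
    + [{"id.orig_h": "10.0.0.99", "user_agent": "BURAN"}]
    + [{"id.orig_h": "10.0.0.15"}]
)

rare = rarest_user_agents(sample_http, limit=2)
for agent, count in rare:
    print(count, agent)
```

The output is meant for a human to eyeball, in the spirit of the talk: the interesting entries are the ones that make you go "hmm," not anything the code decides on its own.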

whatismybrowser.com people have 5 million user agents in there, all of which are legitimate. But there's really not a lot of good information for security. We've got the SSLBL and DNS blacklists out the wazoo, but nobody's got a good public malware user agent string database. I'd love to see somebody in this room make one someday, but for now, part of the reason people aren't looking is because they don't have that data. So the last piece I wanted to walk through on the threat hunt side was SSL. Everybody talks about, "Well, if you're not doing break-and-inspect, then what are you doing? What kind of visibility do you have?" There's a lot of

interesting unencrypted data in an encrypted stream if you look closely. It's everything from the status of the certificate (is it self-signed, has it expired) to the version of TLS in use. If you've got 1.0 running around your network, you've either got a policy problem on your hands or you've got an attacker who's using an old kit and hasn't bothered to update the TLS libraries on things. And if you look again closely at the records that we pull out for these things, one that's particularly useful is the subject of the certificate. You can see here that's "Internet Widgits Pty", which is the default out of the X.509 command-line

tools and a lot of other SSL certificate generation programs. Again, I wrote an IPS signature and did a blog post on this in 2011, and figured maybe I can pick off a couple of lazy bad guys in the process. I've had dozens of people come up to me at conferences and events afterwards and say, "I found more weird stuff on my network with that one IDS signature than anything else." So the bad guys are clearly not bothering to cover their tracks; again, they're just going with defaults and running. A simple query is gonna pull that out of our data just as well. One of the more kind of interesting, geeky bits that we've got is

the JA3 hashes for SSL. This is a relatively new thing that came out of Salesforce, where they're actually taking information about the cipher suites and the type of software that's running and making a hash of a connection, so that you can tell that it's this browser type connecting with that server type. Everything from public blacklists that are available for known malicious connections, to again doing that search of "what are the rare ones in my environment?" and weeding out what's potentially an encrypted malware connection as opposed to your legitimate web servers, is easy to do based upon that kind of hash. And that's something that we as a company, Corelight, are focusing on fairly

closely. Coming out of our research team, we're gonna be releasing in a couple of weeks a package that is going to do things like this: you can see the difference between a successful SSH login on the right and an unsuccessful SSH login pattern on the left, just from byte timing and size. So it's easy to do things like, "All right, I had a thousand SSH logon failures off this one host followed by a success; I probably want to alert on that." There's a lot of context you can get out of the surrounding connections and the unencrypted bits of a session that can be useful from an incident response perspective even when things are not decrypted. So the last kind of portion of

the talk that I wanted to get into was the MITRE ATT&CK framework and how that applies to Bro and Zeek. You guys have probably heard about this. It's an initiative from the MITRE folks, the guys who do the CVE standard, to talk about different attacker techniques and tactics, and it is a giant matrix when you get right down to it. It literally keeps scrolling from here. But it's a hugely useful project, because for each of these different indicators they're breaking down: this is how it's used in an attack, these are the common indicators you might look for on a network or an endpoint, and really giving you a good view into what the bad

guys are doing and how they're doing it, so that you can look for these different techniques on the wire or on the endpoint, as the case may be. And even cooler, they have put together a Bro script called BZAR that implements some of these detections as a package on top of our software. It's open source, of course. It's pre-built into the Corelight commercial appliance, so you don't have to worry about installing it. But it's very straightforward stuff at the end of the day, because the Bro scripting language is a lot easier to work with than a lot of people think. People get intimidated by the concept of a network programming language, but it's the little

things like this dpd.sig file, which is basically saying, "All right, I want to say that if something is on TCP and the packet begins with four bytes and then a hex FD 'SMB', well, that's SMB version 3." I've seen FF 'SMB' for a huge portion of my career as SMB, and they said, "Oh hey, we need to catch 3 as well, so we're gonna make sure to enable that detection for SMB so that it's not constrained to 2." And then they went through and said, "All right, we're gonna define a whole bunch of DCE/RPC named pipe and operation combinations that are potentially suspicious." So in this particular case, for that T1070 indicator that we saw earlier, there are

system shutdown and event log clear DCE/RPC calls that really you probably shouldn't be using in any kind of legitimate network setting. Why would you go across DCE/RPC to say "I want to wipe all the event logs on this Windows server" when you're an admin and you could just do it more directly? Once they've defined what all of these different things are, you come in, and we're an event-driven language, so the underlying core system is going to generate events on everything from a TCP connection being made to, in this case, a DCE/RPC response coming back from a system that has been contacted. And it's really as simple in this case as,

"All right, if the connection is DCE/RPC and it's got the endpoint and the operation data in there, then I'm gonna iterate across all of those different constants that I had earlier for those specific blacklisted operations, and if one of them pops, generate a notice that says, hey, this was an attack for credential access." You're getting the log, and you go with it from there. It is also possible to say, "All right, I've got something that is not necessarily nasty on its own, but if you see a lot of it, it might be a problem." So there's a built-in function called SumStats in Bro/Zeek that allows you to basically say, all right, I'm gonna set an observation

of this type: I saw this thing happening. And then I want to say, all right, I'm gonna apply the reducer that is a sum of these things, so essentially count the things that I've observed. Then from there I want to go ahead and set a time value and a threshold and say, if I get a thousand of these things within a minute, that's when I want to raise an alert, as opposed to generating an alert every time. So realistically, if you're trying to write your own Bro/Zeek scripts, you've got to understand which network events you're going after, and then you've got a fully featured programming language to apply logic on top of that event, based upon a bunch of

characteristics that we're already pulling out of the packets. Because we extract all this information about all these different protocols, it's really just a bunch of variables you're going to check that have already been populated by the system, so it makes things very straightforward. So that's my lightning talk. My contact information is here. We do have a Raspberry Pi, and for that we're gonna have an SD card that has it preloaded, so I'm gonna do this as a trivia question giveaway now. Can anybody name the field in Zeek that is used to pivot between layers of a connection? All right, I saw the hand up. What's that? UID. All right, you are the winner, sir. Do we have, is

the SD card in this already? All right, cool. And you know what, I heard somebody say UID in the back before raising a hand, so if you want to raise your hand, you get a choice of Blue Team Field Manual, a hunter based network tap, or, what's this last piece here, a lockpick set. So, your choice, whoever that was that said UID in the back. All right. Well, since I've got a couple more things to give away here, the other question I guess I would ask, wait, which one did you want? Take your pick. Cool. All right, so again, for your choice: who's the creator of Bro, who wrote it? Hands up. There you go. And then for the

last one, last question, I'm going to say: what's the government agency that funded Bro for 20 years? Anybody? Right in the front. Not the... oh, I know, right? Okay. No, no, not FBI, not NSA. Berkeley? No, Berkeley wasn't it. It's not Berkeley. It came from Berkeley, but it wasn't funded by Berkeley. Sir? No.

You got it, NSF. All right, well, honestly I'm out of questions. I didn't know I was gonna have this many giveaways, so who wants the Blue Team Field Manual? All right, you're right in the front, man, there you go. Thanks, folks. Oh, also, we're gonna be at the Hyatt House rooftop bar later this evening. If anybody wants to stop by and talk Zeek, Corelight, whatever, come on by, we're buying drinks. [Applause]

